
Performance and Productivity Evaluation of HPC Communication Libraries and Programming Models

Johnson, Alex (2021) Performance and Productivity Evaluation of HPC Communication Libraries and Programming Models. Master's Thesis, University of Pittsburgh. (Unpublished)



To reach exascale performance, data centers must scale their systems, increasing the number of nodes and equipping them with high-performance network interconnects. Orchestration of the communication between nodes is one of the most performance-critical aspects of highly distributed application development. While the standard for HPC communication is two-sided communication as represented by the Message Passing Interface (MPI), two-sided communication may not effectively express certain communication patterns. It may also fail to take advantage of key performance-critical features supported by state-of-the-art interconnects, such as remote direct memory access (RDMA). By contrast, one-sided communication libraries such as MPI’s extensions for remote memory access (RMA) and OpenSHMEM can provide developers with the added flexibility of one-sided communication primitives and the capability to exploit RDMA. To investigate these approaches, this research provides a comparative performance and productivity analysis of two-sided MPI, one-sided MPI, and OpenSHMEM, using kernels that simulate communication and computation patterns representative of HPC applications. Performance is measured in terms of latency and achieved throughput using up to 320 nodes on the National Energy Research Scientific Computing Center (NERSC) Cori and Pittsburgh Supercomputing Center (PSC) Bridges-2 systems. Additionally, the productivity of the communication interfaces is analyzed quantitatively and qualitatively. RMA-based APIs are found to show lower latency and efficient scalability across the DAXPY, Cannon’s Algorithm Matrix Multiply, SUMMA Matrix Multiply, and Integer Sort kernels. Similarly, the RMA-based libraries achieve the best throughput, with OpenSHMEM achieving up to double the total concurrent data movement of MPI.
Conversely, MPI’s two-sided API produces the simplest programs in terms of lines of code and API calls, but it generally shows the highest latency across the evaluated kernels. The OpenSHMEM API achieves the highest performance on the four kernels and, by our productivity metrics, is simpler than one-sided MPI for RMA-optimized codes. Despite these findings, two-sided MPI remains a strong library for HPC communication due to its robust set of API calls and optimized collective performance.
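The distinction the abstract draws between two-sided and one-sided communication can be illustrated with a minimal MPI sketch. This is not code from the thesis; it is a generic two-rank example (buffer names and sizes are illustrative) showing that a two-sided transfer requires a matching call on both ranks, while an RMA `MPI_Put` lets the origin rank write directly into the target's exposed memory window, with only fence synchronization on the target side:

```c
/* Illustrative sketch (not from the thesis): two-sided MPI_Send/MPI_Recv
 * versus one-sided MPI_Put into an RMA window. Run with 2 ranks, e.g.
 * mpicc demo.c -o demo && mpirun -np 2 ./demo */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    double buf[4] = {0};

    /* Two-sided: both ranks actively participate in the transfer. */
    if (rank == 0) {
        double data[4] = {1, 2, 3, 4};
        MPI_Send(data, 4, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(buf, 4, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    /* One-sided (RMA): rank 0 puts data directly into rank 1's window;
     * rank 1 issues no matching receive, only the collective fences. */
    MPI_Win win;
    MPI_Win_create(buf, sizeof buf, sizeof(double), MPI_INFO_NULL,
                   MPI_COMM_WORLD, &win);
    MPI_Win_fence(0, win);
    if (rank == 0) {
        double data[4] = {5, 6, 7, 8};
        MPI_Put(data, 4, MPI_DOUBLE, /*target=*/1, /*disp=*/0,
                4, MPI_DOUBLE, win);
    }
    MPI_Win_fence(0, win);
    MPI_Win_free(&win);

    if (rank == 1)
        printf("rank 1 holds %g..%g after MPI_Put\n", buf[0], buf[3]);

    MPI_Finalize();
    return 0;
}
```

On RDMA-capable interconnects, the `MPI_Put` path can map onto the hardware's remote-write primitive, which is the capability the thesis's RMA-based measurements exercise.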




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Johnson, Alex (Email: amj92@pitt.edu; Pitt Username: amj92)
ETD Committee:
Committee Chair: George, Alan
Committee Member: Dallal,
Committee Member: Kerestes,
Date: 13 June 2021
Date Type: Publication
Defense Date: 1 April 2021
Approval Date: 13 June 2021
Submission Date: 19 March 2021
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 57
Institution: University of Pittsburgh
Schools and Programs: Swanson School of Engineering > Electrical and Computer Engineering
Degree: MS - Master of Science
Thesis Type: Master's Thesis
Refereed: Yes
Uncontrolled Keywords: RDMA, MPI, OpenSHMEM, HPC, Supercomputing
Date Deposited: 13 Jun 2021 18:36
Last Modified: 13 Jun 2021 18:36

