Johnson, Alex (2021) Performance and Productivity Evaluation of HPC Communication Libraries and Programming Models. Master's Thesis, University of Pittsburgh. (Unpublished)
Abstract
To reach exascale performance, data centers must scale their systems, increasing the number of nodes and equipping them with high-performance network interconnects. Orchestrating the communication between nodes is one of the most performance-critical aspects of highly distributed application development. While the standard for HPC communication is two-sided communication as represented by the Message Passing Interface (MPI), two-sided communication may not effectively express certain communication patterns, and it may fail to take advantage of key performance-critical features supported by state-of-the-art interconnects such as remote direct memory access (RDMA). By contrast, one-sided communication libraries such as MPI’s extensions for remote memory access (RMA) and OpenSHMEM give developers the added flexibility of one-sided communication primitives and the ability to exploit RDMA. To investigate these approaches, this research provides a comparative performance and productivity analysis of two-sided MPI, one-sided MPI, and OpenSHMEM using kernels that simulate communication and computation patterns representative of HPC applications. Performance is measured in terms of latency and achieved throughput on up to 320 nodes of the National Energy Research Scientific Computing Center (NERSC) Cori and Pittsburgh Supercomputing Center (PSC) Bridges-2 systems. Additionally, the productivity of the communication interfaces is analyzed quantitatively and qualitatively. RMA-based APIs show lower latency and efficient scalability across the DAXPY, Cannon’s Algorithm Matrix Multiply, SUMMA Matrix Multiply, and Integer Sort kernels. Similarly, the RMA-based libraries achieve the best throughput, with OpenSHMEM achieving up to double the total concurrent data movement of MPI. Conversely, MPI’s two-sided API produces the simplest programs in terms of lines of code and API calls, but it generally shows the highest latency across the evaluated kernels. The OpenSHMEM API achieves the highest performance on the four kernels and, for RMA-optimized codes, is simpler by our productivity metrics than one-sided MPI. Nevertheless, two-sided MPI remains a strong choice for HPC communication due to its robust set of API calls and optimized collective performance.
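As a rough illustration of the distinction the abstract draws between two-sided and one-sided communication, the following minimal sketch is not taken from the thesis: the buffer contents, ranks, and the MPI_Win_fence synchronization choice are illustrative assumptions. It contrasts a matching MPI_Send/MPI_Recv pair (both processes participate) with an MPI_Put through an RMA window (only the origin issues the transfer); the OpenSHMEM analogue would be shmem_double_put on a symmetric buffer followed by shmem_quiet.

/*
 * Minimal sketch: two-sided MPI vs. one-sided MPI RMA.
 * Not the thesis's benchmark code; values and synchronization are illustrative.
 * Build/run (example): mpicc demo.c -o demo && mpirun -np 2 ./demo
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double value = (rank == 0) ? 3.14 : 0.0;

    /* Two-sided: sender and receiver must post matching calls. */
    if (rank == 0)
        MPI_Send(&value, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(&value, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* One-sided (RMA): the target only exposes memory through a window;
     * the origin alone issues the data movement, which RDMA-capable
     * interconnects can service without involving the target CPU. */
    double target_buf = 0.0;
    MPI_Win win;
    MPI_Win_create(&target_buf, sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);
    MPI_Win_fence(0, win);                 /* open access epoch */
    if (rank == 0)
        MPI_Put(&value, 1, MPI_DOUBLE, 1 /* target rank */, 0, 1, MPI_DOUBLE, win);
    MPI_Win_fence(0, win);                 /* complete the put on both sides */

    if (rank == 1)
        printf("rank 1 got %.2f (two-sided) and %.2f (RMA)\n", value, target_buf);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
    /* OpenSHMEM analogue (assumption, not shown running here):
     * shmem_double_put(dst, &value, 1, target_pe); shmem_quiet(); */
}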
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Johnson, Alex
ETD Committee:
Date: 13 June 2021
Date Type: Publication
Defense Date: 1 April 2021
Approval Date: 13 June 2021
Submission Date: 19 March 2021
Access Restriction: No restriction; release the ETD for access worldwide immediately.
Number of Pages: 57
Institution: University of Pittsburgh
Schools and Programs: Swanson School of Engineering > Electrical and Computer Engineering
Degree: MS - Master of Science
Thesis Type: Master's Thesis
Refereed: Yes
Uncontrolled Keywords: RDMA, MPI, OpenSHMEM, HPC, Supercomputing
Date Deposited: 13 Jun 2021 18:36
Last Modified: 13 Jun 2021 18:36
URI: http://d-scholarship.pitt.edu/id/eprint/40403