Ganguly, Debashis
(2021)
Adaptive Memory Management for CPU-GPU Heterogeneous Systems.
Doctoral Dissertation, University of Pittsburgh.
(Unpublished)
Abstract
The high compute density and massive thread-level parallelism of Graphics Processing Units (GPUs) are behind their unprecedented adoption in systems ranging from data centers to high-performance computing installations. Currently, discrete GPUs connected to the CPU via a slow CPU-GPU interconnect dominate these computing platforms. The introduction of on-demand paging and fault-driven migration support in newer-generation GPUs, powered by a software-managed unified memory runtime, has simplified memory management in CPU-GPU heterogeneous memory systems and improved programmability. As GPUs are increasingly used to accelerate general-purpose applications beyond traditional graphics processing, these systems raise a number of design challenges spanning smart runtime systems, programming libraries, and micro-architecture.
One of the key challenges this dissertation addresses is the performance slowdown under device memory oversubscription. When the working set of an application exceeds the device's memory capacity, CPU-GPU interconnect traffic from page eviction and software prefetching becomes a major performance bottleneck. First, this dissertation proposes a pre-eviction policy that adapts the semantics of the software prefetcher to reduce the CPU-GPU interconnect traffic caused by unnecessary page thrashing. Second, it proposes an adaptive page migration and pinning strategy for the runtime that adapts to irregularity in the access pattern based on the frequency of memory access. Disparate applications demand special attention for memory management based on their workload characteristics, thread-level parallelism, and memory access patterns. Finally, this dissertation introduces a smart runtime that transparently caters to different classes of applications by unifying a wide array of memory management strategies. As GPUs become an integral part of commodity computing clusters, assuring system throughput and execution fairness is becoming a critical challenge for multi-tenant workloads. To this end, the dissertation proposes a CPU-GPU interconnect scheduler that provisions network traffic according to the disparate computation characteristics and bandwidth demands of the applications in the composed workload. Together, these techniques make significant progress towards the goal of an adaptive, smart, software-managed runtime for CPU-GPU heterogeneous memory systems.
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Ganguly, Debashis
Date: 3 January 2021
Date Type: Publication
Defense Date: 13 October 2020
Approval Date: 3 January 2021
Submission Date: 21 October 2020
Access Restriction: 1 year -- Restrict access to University of Pittsburgh for a period of 1 year.
Number of Pages: 124
Institution: University of Pittsburgh
Schools and Programs: School of Computing and Information > Computer Science
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: heterogeneous systems, adaptive, memory management, GPU, CPU-GPU interconnect, page replacement, page pinning, page migration, multi-tenancy, unified runtime
Date Deposited: 03 Jan 2022 06:00
Last Modified: 03 Jan 2022 06:15
URI: http://d-scholarship.pitt.edu/id/eprint/39808