Adaptive Memory Management for CPU-GPU Heterogeneous Systems

Ganguly, Debashis (2021) Adaptive Memory Management for CPU-GPU Heterogeneous Systems. Doctoral Dissertation, University of Pittsburgh. (Unpublished)



The high compute density and massive thread-level parallelism of Graphics Processing Units (GPUs) are behind their unprecedented adoption in systems ranging from data centers to high-performance computing installations. Currently, these computing platforms are dominated by discrete GPUs connected to the CPU via a slow CPU-GPU interconnect. The introduction of on-demand paging and fault-driven migration in newer-generation GPUs, powered by a software-managed unified memory runtime, has simplified memory management in CPU-GPU heterogeneous memory systems and improved programmability. As GPUs are increasingly used to accelerate general-purpose applications beyond traditional graphics processing, these systems raise a number of design challenges spanning smart runtime systems, programming libraries, and micro-architecture.

One of the key challenges this dissertation addresses is the performance slowdown under device memory oversubscription. When the working set of an application exceeds the device's memory capacity, CPU-GPU interconnect traffic from page eviction and software prefetching becomes a major performance bottleneck. First, this dissertation proposes a pre-eviction policy that adapts to the semantics of the software prefetcher to reduce the CPU-GPU interconnect traffic caused by unnecessary page thrashing. Second, it proposes an adaptive page migration and pinning strategy for the runtime that adapts to irregularity in the access pattern based on the frequency of memory access. Disparate applications demand special attention in memory management based on their workload characteristics, thread-level parallelism, and memory access patterns. Finally, this dissertation introduces a smart runtime that transparently caters to different classes of applications by unifying a wide array of memory management strategies.

As GPUs become an integral part of commodity computing clusters, assuring system throughput and execution fairness is becoming a critical challenge for multi-tenant workloads. To this end, the dissertation proposes a CPU-GPU interconnect scheduler that provisions interconnect traffic according to the disparate computation characteristics and bandwidth demands of the applications in the composed workload. Together, these techniques make significant progress toward an adaptive, smart, software-managed runtime for CPU-GPU heterogeneous memory systems.
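As a rough illustration of frequency-based migration versus pinning under oversubscription, the following toy simulation may help. It is a minimal sketch under stated assumptions, not the dissertation's runtime: the function `simulate`, the fixed `migrate_threshold`, and the FIFO eviction order are all hypothetical choices made only for this example.

```python
from collections import defaultdict


def simulate(accesses, device_capacity, migrate_threshold):
    """Toy model of device memory oversubscription (illustrative only).

    A page is migrated to device memory once it has been touched
    `migrate_threshold` times; until then it stays pinned in host memory
    and each access is served remotely over the interconnect.
    Eviction is plain FIFO, an assumption made for simplicity.
    """
    counts = defaultdict(int)  # per-page access frequency
    resident = []              # pages in device memory, oldest first
    stats = {"local": 0, "remote": 0, "migrations": 0, "evictions": 0}

    for page in accesses:
        counts[page] += 1
        if page in resident:
            stats["local"] += 1           # served from device memory
        elif counts[page] >= migrate_threshold:
            if len(resident) >= device_capacity:
                resident.pop(0)           # evict the oldest resident page
                stats["evictions"] += 1
            resident.append(page)
            stats["migrations"] += 1      # page crosses the interconnect
        else:
            stats["remote"] += 1          # cold page stays pinned on the host
    return stats


# One hot page (0) interleaved with pages that are touched only once.
trace = [0, 1, 0, 2, 0, 3, 0, 4]
eager = simulate(trace, device_capacity=2, migrate_threshold=1)
lazy = simulate(trace, device_capacity=2, migrate_threshold=2)
```

On this trace, first-touch migration (`migrate_threshold=1`) performs 6 migrations and 4 evictions, while a threshold of 2 performs a single migration and no evictions, at the cost of 5 remote accesses. The model ignores transfer sizes, prefetching, and timing, so it only conveys the direction of the trade-off.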



Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Ganguly, Debashis
ETD Committee:
Committee Chair: Melhem
Committee Member: Yang
Committee Member: Zhang
Committee Member: Childers
Date: 3 January 2021
Date Type: Publication
Defense Date: 13 October 2020
Approval Date: 3 January 2021
Submission Date: 21 October 2020
Access Restriction: 1 year -- Restrict access to University of Pittsburgh for a period of 1 year.
Number of Pages: 124
Institution: University of Pittsburgh
Schools and Programs: School of Computing and Information > Computer Science
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: heterogeneous systems, adaptive, memory management, GPU, CPU-GPU interconnect, page replacement, page pinning, page migration, multi-tenancy, unified runtime
Date Deposited: 03 Jan 2022 06:00
Last Modified: 03 Jan 2022 06:15