Efficient Synchronization for GPGPU

Liu, Jiwei (2018) Efficient Synchronization for GPGPU. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

Abstract

High-performance general-purpose graphics processing units (GPGPUs) have exposed bottlenecks in the synchronization of threads and cores. Their massively parallel cores and complex thread hierarchies present new synchronization challenges at different granularities, and GPU performance is hindered by inefficient global and local synchronization. I propose hardware-software cooperative frameworks for efficient GPGPU synchronization that address the following issues.
To provide efficient global synchronization (Gsync), I propose an API with direct hardware support. The GPU cores are synchronized by an on-chip Gsync controller, and a partial context switch guarantees deadlock-free execution. The proposed Gsync avoids expensive API calls and alleviates data thrashing. Prioritized warp scheduling increases the overlap of context switching with kernel execution.
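As a rough host-side analogy for a global synchronization point, the sketch below uses Python threads to stand in for GPU cores and `threading.Barrier` to stand in for the barrier. It is not the Gsync API or its hardware controller; the function `two_phase` and all other names are hypothetical. It only shows the pattern such a barrier serves: every worker must finish phase 1 before any worker reads a neighbour's phase-1 result.

```python
import threading


def two_phase(num_workers):
    """Two-phase computation separated by one barrier across all workers."""
    barrier = threading.Barrier(num_workers)
    partial = [0] * num_workers
    result = [0] * num_workers

    def worker(rank):
        # Phase 1: each worker writes only its own slot.
        partial[rank] = rank * rank
        # Global synchronization: nobody proceeds until all slots are written.
        barrier.wait()
        # Phase 2: each worker reads the slot written by its right neighbour.
        right = (rank + 1) % num_workers
        result[rank] = partial[rank] + partial[right]

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result
```

Without the barrier, a worker could read its neighbour's slot before it is written. In this sketch all workers are live at once, so the barrier cannot deadlock; the abstract's partial context switch addresses the case where that does not hold.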
To efficiently exploit the inherent parallelism of producer-consumer problems, I propose a flexible wait-signal scheme at the thread-block level, with dedicated, hardware-supported APIs for expressing fine-grained static and dynamic dependencies. The scheme can accelerate wavefront, graph, and machine learning applications. The architectural design of the on-chip wait-signal controller eliminates busy-wait loops and long-latency memory operations. I also propose thread-block dispatch scheduling to address load imbalance and large context-switch overhead.
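The producer-consumer pattern behind a wavefront computation can be sketched on the host as follows. This is an illustration under stated assumptions, not the dissertation's API: each Python thread stands in for one thread block, a `threading.Event` per tile stands in for a wait-signal slot, and the function name `wavefront` and the path-counting recurrence are hypothetical. Each tile waits on its upper and left neighbours, computes, then signals its own consumers.

```python
import threading


def wavefront(n):
    """Fill an n x n grid where each tile depends on its upper and left tiles."""
    done = [[threading.Event() for _ in range(n)] for _ in range(n)]
    value = [[0] * n for _ in range(n)]

    def tile(i, j):
        # wait: block (instead of spinning) until both producers have signalled.
        if i > 0:
            done[i - 1][j].wait()
        if j > 0:
            done[i][j - 1].wait()
        up = value[i - 1][j] if i > 0 else 0
        left = value[i][j - 1] if j > 0 else 0
        # Recurrence chosen for illustration: number of monotone paths to (i, j).
        value[i][j] = 1 if (i == 0 and j == 0) else up + left
        # signal: release any consumers waiting on this tile.
        done[i][j].set()

    threads = [threading.Thread(target=tile, args=(i, j))
               for i in range(n) for j in range(n)]
    # Start in reverse order to show that correctness comes from the
    # wait/signal dependencies, not from launch order.
    for t in reversed(threads):
        t.start()
    for t in threads:
        t.join()
    return value
```

For example, `wavefront(4)[3][3]` evaluates to 20. Every tile here is a live host thread, so the sketch sidesteps the dispatch-order and context-switch concerns that the abstract's thread-block dispatch scheduling targets.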
To reduce stalls due to synchronization, I propose synchronization-aware warp scheduling, which coordinates multiple warp schedulers upon synchronization events. Resolving barriers sooner improves both performance and hardware utilization.


Details

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors:
Liu, Jiwei; Email: jil138@pitt.edu; Pitt Username: jil138; ORCID: 0000-0002-8799-9763
ETD Committee:
Committee Chair: Yang, Jun; Email: juy9@pitt.edu; Pitt Username: juy9
Committee CoChair: Melhem, Rami; Email: melhem@cs.pitt.edu; Pitt Username: melhem
Committee Member: Zhang, Youtao; Email: zhangyt@cs.pitt.edu; Pitt Username: zhangyt
Committee Member: Mohanram, Kartik; Email: kmram@pitt.edu; Pitt Username: kmram
Committee Member: Huang, Heng; Email: heng.huang@pitt.edu; Pitt Username: heng.huang
Committee Member: Gao, Wei; Email: weigao@pitt.edu; Pitt Username: weigao
Date: 25 September 2018
Date Type: Publication
Defense Date: 12 June 2018
Approval Date: 25 September 2018
Submission Date: 20 July 2018
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 134
Institution: University of Pittsburgh
Schools and Programs: Swanson School of Engineering > Electrical Engineering
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: GPU Synchronization
Date Deposited: 25 Sep 2018 15:55
Last Modified: 25 Sep 2018 15:55
URI: http://d-scholarship.pitt.edu/id/eprint/34943
