Hammoud, Mohammad Hussein
(2010)
Hardware-Oriented Cache Management for Large-Scale Chip Multiprocessors.
Doctoral Dissertation, University of Pittsburgh.
(Unpublished)
Abstract
One of the key requirements to obtaining high performance from chip multiprocessors (CMPs) is to effectively manage the limited on-chip cache resources shared among co-scheduled threads/processes. This thesis proposes new hardware-oriented solutions for distributed CMP caches. Computer architects are faced with growing challenges when designing cache systems for CMPs. These challenges result from non-uniform access latencies, interference misses, the bandwidth wall problem, and diverse workload characteristics. Our exploration of the CMP cache management problem suggests a CMP caching framework (CC-FR) that defines three main approaches to solve the problem: (1) data placement, (2) data retention, and (3) data relocation. We effectively implement CC-FR's components by proposing and evaluating multiple cache management mechanisms.Pressure and Distance Aware Placement (PDA) decouples the physical locations of cache blocks from their addresses for the sake of reducing misses caused by destructive interferences. Flexible Set Balancing (FSB), on the other hand, reduces interference misses via extending the life time of cache lines through retaining some fraction of the working set at underutilized local sets to satisfy far-flung reuses. PDA implements CC-FR's data placement and relocation components and FSB applies CC-FR's retention approach.To alleviate non-uniform access latencies and adapt to phase changes in programs, Adaptive Controlled Migration (ACM) dynamically and periodically promotes cache blocks towards L2 banks close to requesting cores. ACM lies under CC-FR's data relocation category. Dynamic Cache Clustering (DCC), on the other hand, addresses diverse workload characteristics and growing non-uniform access latencies challenges via constructing a cache cluster for each core and expands/contracts all clusters synergistically to match each core's cache demand. DCC implements CC-FR's data placement and relocation approaches. Lastly, Dynamic Pressure and Distance Aware Placement (DPDA) combines PDA and ACM to cooperatively mitigate interference misses and non-uniform access latencies. Dynamic Cache Clustering and Balancing (DCCB), on the other hand, combines DCC and FSB to employ all CC-FR's categories and achieve higher system performance. Simulation results demonstrate the effectiveness of the proposed mechanisms and show that they compare favorably with related cache designs.
Share
Citation/Export: |
|
Social Networking: |
|
Details
Item Type: |
University of Pittsburgh ETD
|
Status: |
Unpublished |
Creators/Authors: |
Creators | Email | Pitt Username | ORCID |
---|
Hammoud, Mohammad Hussein | moh7@pitt.edu | MOH7 | |
|
ETD Committee: |
|
Date: |
30 September 2010 |
Date Type: |
Completion |
Defense Date: |
1 July 2010 |
Approval Date: |
30 September 2010 |
Submission Date: |
6 July 2010 |
Access Restriction: |
No restriction; Release the ETD for access worldwide immediately. |
Institution: |
University of Pittsburgh |
Schools and Programs: |
Dietrich School of Arts and Sciences > Computer Science |
Degree: |
PhD - Doctor of Philosophy |
Thesis Type: |
Doctoral Dissertation |
Refereed: |
Yes |
Uncontrolled Keywords: |
Data Placement; Data Relocation; Chip Multiprocessors; Data Retention |
Other ID: |
http://etd.library.pitt.edu/ETD/available/etd-07062010-215303/, etd-07062010-215303 |
Date Deposited: |
10 Nov 2011 19:50 |
Last Modified: |
15 Nov 2016 13:45 |
URI: |
http://d-scholarship.pitt.edu/id/eprint/8284 |
Metrics
Monthly Views for the past 3 years
Plum Analytics
Actions (login required)
|
View Item |