Rahman, Musfiq N (2017) CONTINUOUS ONLINE MEMORY DIAGNOSTIC. Doctoral Dissertation, University of Pittsburgh. (Unpublished)
Abstract
Today’s computers have gigabytes of main memory due to improved DRAM density. As density increases, smaller bit cells become more susceptible to errors. With an increase in error susceptibility, the need for memory resiliency also increases. Self-testing of memory health can proactively check for errors to improve resiliency. Developing a memory diagnostic is challenging due to requirements for transparency, scalability and low performance overheads. In my thesis, I developed a software-only self-test to continuously test memory. I present the challenges and the design for two approaches, called COMeT and Asteroid, that are built on a common software framework for memory diagnostic and target chip multiprocessors. COMeT tests memory health simultaneously with single-threaded and multi-threaded application execution in anticipation of memory allocation requests. The approach guarantees that memory is tested within a fixed time interval to limit exposure to lurking errors. On the SPEC CPU2006 and the PARSEC benchmarks, COMeT has a low 4% average performance overhead.
Despite the promising results, COMeT scaled poorly in multi-programmed workload environments with high memory pressure. I therefore developed a second approach, Asteroid, which adapts at runtime to workload behavior and resource availability to maximize test quality while reducing performance overhead. Asteroid supports control policies that dynamically configure the diagnostic. It is integrated with the hierarchical memory allocator found in modern operating systems and is optimized to achieve a higher memory test speed than COMeT. Using an adaptive policy on a 16-core server, Asteroid has a modest overhead of 1% to 4% for workloads with low to high memory demand. For these workloads, Asteroid's adaptive policy shows good error coverage and can thoroughly test memory. A thorough evaluation of my techniques provides experimental justification that a transparent, online, software-based strategy for memory diagnostics is achievable by utilizing over-provisioned system resources.
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Rahman, Musfiq N (Email: mur1@pitt.edu; Pitt Username: mur1)
Date: 30 January 2017
Date Type: Publication
Defense Date: 8 December 2016
Approval Date: 30 January 2017
Submission Date: 2 December 2016
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 116
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Computer Science
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Self-testing, errors, resilience, memory, DRAM, diagnostic
Date Deposited: 30 Jan 2017 14:29
Last Modified: 31 Jan 2017 06:15
URI: http://d-scholarship.pitt.edu/id/eprint/30465