
CONTINUOUS ONLINE MEMORY DIAGNOSTIC

Rahman, Musfiq N (2017) CONTINUOUS ONLINE MEMORY DIAGNOSTIC. Doctoral Dissertation, University of Pittsburgh.

Abstract

Today’s computers have gigabytes of main memory due to improved DRAM density. As density increases, smaller bit cells become more susceptible to errors, and the need for memory resiliency grows with that susceptibility. Self-testing of memory health can proactively check for errors to improve resiliency. Developing a memory diagnostic is challenging because it must be transparent and scalable and must impose low performance overhead. In my thesis, I developed a software-only self-test that continuously tests memory. I present the challenges and the design of two approaches, called COMeT and Asteroid, that are built on a common software framework for memory diagnostics and target chip multiprocessors. COMeT tests memory health concurrently with single-threaded and multi-threaded application execution, in anticipation of memory allocation requests. The approach guarantees that memory is tested within a fixed time interval to limit exposure to lurking errors. On the SPEC CPU2006 and PARSEC benchmarks, COMeT incurs an average performance overhead of 4%.

Despite the promising results, COMeT showed poor scalability in multi-programmed workload environments with high memory pressure. I developed another novel approach, Asteroid, which can adapt at runtime to workload behavior and resource availability to maximize test quality while reducing performance overhead. Asteroid is designed to support control policies that dynamically configure a diagnostic. Asteroid is seamlessly integrated with a hierarchical memory allocator in modern operating systems and is optimized to achieve higher memory test speed than COMeT. Using an adaptive policy on a 16-core server, Asteroid has a modest overhead of 1% to 4% for workloads with low to high memory demand. For these workloads, Asteroid’s adaptive policy shows good error coverage and can thoroughly test memory. Thorough evaluation of my techniques provides experimental justification that a transparent, online, software-based strategy for memory diagnostics is achievable by utilizing over-provisioned system resources.
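
The abstract does not describe Asteroid's actual control policies. Purely as an illustration of the idea of configuring a diagnostic from resource availability, the sketch below assumes a policy that sizes the number of tester threads from idle cores and backs off when free memory is scarce. The function name `choose_test_threads`, its parameters, and the thresholds are all hypothetical.

```python
def choose_test_threads(idle_cores, free_fraction,
                        max_threads=4, low_memory_threshold=0.1):
    """Hypothetical control policy, not the dissertation's.

    idle_cores:    cores not currently running application threads
    free_fraction: fraction of physical memory that is free (0.0 to 1.0)
    Returns how many tester threads to run in the next interval.
    """
    if idle_cores <= 0:
        return 0  # no spare cores: do not compete with the workload
    budget = min(idle_cores, max_threads)
    if free_fraction < low_memory_threshold:
        # Assumed behavior under memory pressure: keep a single tester
        # running so testing continues at a reduced rate.
        return min(budget, 1)
    return budget
```

A policy of this shape would be re-evaluated periodically, so the diagnostic speeds up when the machine is idle and slows down when applications need the cores or the memory.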


Details

Item Type: University of Pittsburgh ETD
Status: Published
Creators/Authors:
Creators | Email | Pitt Username | ORCID
Rahman, Musfiq N | mur1@pitt.edu | mur1 |
ETD Committee:
Title | Member | Email Address | Pitt Username | ORCID
Committee Chair | Childers, Bruce | childers@cs.pitt.edu | childers |
Committee Member | Melhem, Rami | melhem@cs.pitt.edu | melhem |
Committee Member | Ahn, Wonsun | wahn@pitt.edu | wahn |
Committee Member | Mohanram, Kartik | kmram@pitt.edu | kmram |
Date: 30 January 2017
Date Type: Publication
Defense Date: 8 December 2016
Approval Date: 30 January 2017
Submission Date: 2 December 2016
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 116
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Computer Science
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Self-testing, errors, resilience, memory, DRAM, diagnostic
Date Deposited: 30 Jan 2017 14:29
Last Modified: 31 Jan 2017 06:15
URI: http://d-scholarship.pitt.edu/id/eprint/30465
