# Evaluation of Algorithm-Based Fault Tolerance for Machine Learning and Computer Vision under Neutron Radiation

Roffe, Seth (2020) Evaluation of Algorithm-Based Fault Tolerance for Machine Learning and Computer Vision under Neutron Radiation. Master's Thesis, University of Pittsburgh. (Unpublished)


## Abstract

In the past decade, there has been a push to deploy commercial off-the-shelf (COTS) avionics, due in part to lower costs and the desire for more performance. Traditional radiation-hardened processors are expensive and provide only limited processing power. With smaller mission budgets and growing computational demands, low-cost, high-performance COTS solutions become more attractive for these missions. Thanks to the computational capacity of COTS technology, machine-learning and computer-vision applications are now being deployed on modern space missions. However, COTS electronics are highly susceptible to radiation, so the reliability of the underlying computations becomes a concern. Matrix multiplication is the main computation behind decisions in machine-learning and computer-vision applications, which makes it a critical part of these applications. Its large time and memory footprint also makes these applications especially susceptible to single-event upsets.

In this thesis, algorithm-based fault tolerance (ABFT) is investigated as a way to mitigate silent data errors in machine learning and computer vision. ABFT is a methodology for detecting and correcting data errors using redundant information stored in data structures separate from the primary data. In matrix multiplication, ABFT stores checksum data in vectors separate from the matrix and uses them for error detection and correction. Fault injection into a matrix-multiplication kernel was performed prior to irradiation. The kernel was then irradiated with wide-spectrum neutrons at the Los Alamos Neutron Science Center to observe the mitigation effects of ABFT. Fault injections targeting the general-purpose registers show a $48\times$ reduction in data errors when ABFT is used for data-error mitigation, with a negligible change in run-time. Cross-section results from irradiation show a $5.3\times$ improvement in reliability with ABFT compared to no mitigation, at a confidence level above 99.9999%. These results demonstrate that ABFT is a viable solution for run-time error correction in matrix multiplication for machine-learning and computer-vision applications on future spacecraft.
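To illustrate the checksum idea described in the abstract, the sketch below uses the classic row/column-checksum scheme for $C = AB$: the column sums of $C$ can be predicted from $(e^T A)B$ and the row sums from $A(Be)$, both kept in vectors separate from $C$. A single corrupted element shows up as exactly one mismatched row sum and one mismatched column sum, which locates it, and the row discrepancy gives the correction. This is a minimal, hypothetical illustration (the function names `matmul` and `abft_check_and_correct` and the tolerance `tol` are assumptions), not the kernel evaluated in the thesis; it assumes at most one erroneous element in $C$ and that the checksum vectors themselves are not corrupted.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop matrix multiply: the kernel being protected."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def column_sums(x: Matrix) -> List[float]:
    """e^T X: one checksum per column."""
    return [sum(row[j] for row in x) for j in range(len(x[0]))]


def row_sums(x: Matrix) -> List[float]:
    """X e: one checksum per row."""
    return [sum(row) for row in x]


def abft_check_and_correct(a: Matrix, b: Matrix, c: Matrix,
                           tol: float = 1e-9) -> Tuple[Matrix, str]:
    """Check C = A B against checksum vectors; fix a single bad element.

    Returns a corrected copy of C and one of "clean", "corrected",
    or "uncorrectable". Assumes at most one erroneous element in C.
    """
    c = [row[:] for row in c]  # do not modify the caller's matrix
    k = len(b)
    # Predicted checksums of C, computed from the inputs only.
    a_col = column_sums(a)  # e^T A
    b_row = row_sums(b)     # B e
    expected_col = [sum(a_col[p] * b[p][j] for p in range(k))
                    for j in range(len(b[0]))]  # (e^T A) B
    expected_row = [sum(a[i][p] * b_row[p] for p in range(k))
                    for i in range(len(a))]     # A (B e)
    actual_col, actual_row = column_sums(c), row_sums(c)

    bad_rows = [i for i, (x, y) in enumerate(zip(actual_row, expected_row))
                if abs(x - y) > tol]
    bad_cols = [j for j, (x, y) in enumerate(zip(actual_col, expected_col))
                if abs(x - y) > tol]

    if not bad_rows and not bad_cols:
        return c, "clean"
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        # The row discrepancy equals the error added to C[i][j].
        c[i][j] -= actual_row[i] - expected_row[i]
        return c, "corrected"
    return c, "uncorrectable"


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    c = matmul(a, b)
    c[1][0] += 100  # hypothetical single-event upset in the result
    fixed, status = abft_check_and_correct(a, b, c)
    print(status, fixed)  # prints: corrected [[19, 22], [43, 50]]
```

The checks cost $O(n^2)$ extra work against the $O(n^3)$ multiply, which is consistent with the negligible run-time change the abstract reports. With floating-point data, a relative tolerance scaled to the magnitude of the checksums would be a more robust choice than the fixed absolute `tol` used here.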


## Details

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Roffe, Seth (ssr35@pitt.edu)
ETD Committee:
Committee Chair: George, Alan (alan.george@pitt.edu)
Committee Member: Mao, Zhi-Hong (zhm4@pitt.edu)
Committee Member: Dallal, Ahmed (ahd12@pitt.edu)
Date: 29 July 2020
Date Type: Publication
Defense Date: 6 March 2020
Approval Date: 29 July 2020
Submission Date: 25 March 2020
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 46
Institution: University of Pittsburgh
Schools and Programs: Swanson School of Engineering > Electrical and Computer Engineering
Degree: MS - Master of Science
Thesis Type: Master's Thesis
Refereed: Yes
Uncontrolled Keywords: Reliability, Data Errors, ARM, Cortex-A9, Zynq, SoC, ABFT, Neutron, Radiation
Date Deposited: 29 Jul 2020 17:48