
Enabling Deep Neural Networks with Oversized Working Memory on Resource-Constrained MCUs

Wang, Zhepeng (2021) Enabling Deep Neural Networks with Oversized Working Memory on Resource-Constrained MCUs. Master's Thesis, University of Pittsburgh. (Unpublished)

This is the latest version of this item.

Primary Text: PDF (revised ETD), 582 kB

Abstract

Deep neural networks (DNNs) are highly effective at extracting features and making predictions from noisy input data, which has made them among the most widely used models in machine learning applications. Meanwhile, microcontroller units (MCUs) have become some of the most common processors in everyday devices, so integrating DNNs into MCUs promises substantial real-world impact. Despite its importance, the deployment of DNNs onto MCUs has received little attention. DNNs are typically resource-intensive while MCUs are resource-constrained, which often makes it infeasible to run DNNs on MCUs directly. Apart from the low clock frequency (1-16 MHz) and limited storage (e.g., 64 KB to 256 KB of ROM), one of the biggest challenges is the small RAM (e.g., 2 KB to 16 KB), which must hold a DNN's intermediate feature maps at runtime. Most existing DNN compression algorithms aim to reduce the model size so that the model fits into limited storage. However, these algorithms do not significantly reduce the size of the intermediate feature maps, referred to as working memory, which may still exceed the RAM capacity. Consequently, a DNN may fail to run on an MCU even after compression. To address this problem, this work proposes a technique that dynamically prunes the activation values of the output feature maps at runtime when necessary, so that intermediate feature maps fit into the limited RAM. Experimental results on SVHN and CIFAR-10 show that the proposed algorithm significantly reduces the working memory of a DNN to satisfy the hard RAM-size constraint while maintaining satisfactory accuracy with relatively low overhead in memory and runtime latency.
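The core idea the abstract describes, pruning a layer's output activations at runtime so the feature map fits a fixed RAM budget, can be illustrated with a minimal C sketch. Everything below is an assumption for illustration only: the function name prune_feature_map, the magnitude-based top-k selection criterion, and the strategy of zeroing pruned activations are not taken from the thesis, whose actual pruning rule and storage format may differ.

/*
 * Illustrative sketch: dynamic activation pruning under a RAM budget.
 * All names and the magnitude-based criterion are assumptions for
 * illustration; they are not the thesis's actual implementation.
 */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* Comparison helper: sort magnitudes in descending order. */
static int cmp_desc(const void *a, const void *b) {
    float fa = fabsf(*(const float *)a);
    float fb = fabsf(*(const float *)b);
    return (fa < fb) - (fa > fb);
}

/*
 * Keep only the `budget` largest-magnitude activations of a feature
 * map and zero the rest, so that a compact encoding of the map can
 * fit into the RAM left for intermediate results.
 */
static void prune_feature_map(float *fmap, size_t n, size_t budget) {
    if (budget >= n) return;          /* map already fits: nothing to do */

    /* Find the magnitude threshold: copy, sort, take the budget-th value. */
    float *tmp = malloc(n * sizeof *tmp);
    if (!tmp) return;                 /* allocation failed: leave map unpruned */
    for (size_t i = 0; i < n; i++) tmp[i] = fmap[i];
    qsort(tmp, n, sizeof *tmp, cmp_desc);
    float thresh = fabsf(tmp[budget - 1]);
    free(tmp);

    /* Zero every activation below the threshold, keeping at most `budget`. */
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (fabsf(fmap[i]) >= thresh && kept < budget) kept++;
        else fmap[i] = 0.0f;
    }
}

int main(void) {
    float fmap[] = {0.1f, -2.5f, 0.3f, 1.8f, -0.05f, 0.9f, -1.1f, 0.2f};
    size_t n = sizeof fmap / sizeof fmap[0];

    prune_feature_map(fmap, n, 3);    /* pretend RAM only holds 3 values */

    for (size_t i = 0; i < n; i++) printf("%.2f ", fmap[i]);
    printf("\n");  /* prints: 0.00 -2.50 0.00 1.80 0.00 0.00 -1.10 0.00 */
    return 0;
}

In a real deployment the surviving activations would likely be stored in a sparse (index, value) form rather than as a zeroed dense array, since the point is to shrink the buffer itself; the dense form here just keeps the sketch short and verifiable.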



Details

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Wang, Zhepeng (zhw82@pitt.edu; Pitt username: zhw82)
ETD Committee:
Committee Chair: Hu, Jingtong (jthu@pitt.edu)
Committee Member: Mao, Zhi-Hong (zhm4@pitt.edu)
Committee Member: Dickerson, Samuel (dickerson@pitt.edu)
Date: 3 September 2021
Date Type: Publication
Defense Date: 12 July 2021
Approval Date: 3 September 2021
Submission Date: 9 July 2021
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 38
Institution: University of Pittsburgh
Schools and Programs: Swanson School of Engineering > Electrical and Computer Engineering
Degree: MS - Master of Science
Thesis Type: Master's Thesis
Refereed: Yes
Uncontrolled Keywords: Neural Network Deployment, Neural Network Compression, Embedded System, Artificial Intelligence of Things (AIoT), On-Device Artificial Intelligence (AI)
Date Deposited: 03 Sep 2021 15:55
Last Modified: 03 Sep 2021 15:55
URI: http://d-scholarship.pitt.edu/id/eprint/41445
