
Enabling Deep Neural Networks with Oversized Working Memory on Resource-Constrained MCUs

Wang, Zhepeng (2021) Enabling Deep Neural Networks with Oversized Working Memory on Resource-Constrained MCUs. Master's Thesis, University of Pittsburgh. (Unpublished)



Deep neural networks (DNNs) excel at extracting features and making predictions from noisy input data, which has made them among the most widely used models in machine learning applications. Meanwhile, microcontroller units (MCUs) have become the most common processors in everyday devices, so integrating DNNs into MCUs could have a substantial real-world impact. Despite its importance, the deployment of DNNs onto MCUs has received little attention. DNNs are resource-intensive while MCUs are resource-constrained, which often makes it infeasible to run DNNs directly on MCUs. Beyond low clock frequency (1-16 MHz) and limited storage (e.g., 64KB to 256KB ROM), one of the biggest challenges is the small RAM (e.g., 2KB to 16KB), which must hold a DNN's intermediate feature maps at runtime. Most existing DNN compression algorithms aim to reduce model size so that the model fits into limited storage. However, they do not significantly reduce the size of the intermediate feature maps, referred to as working memory, which may exceed the RAM capacity; a DNN may therefore still fail to run on an MCU even after compression. To address this problem, this work proposes a technique that dynamically prunes the activation values of the output feature maps at runtime when necessary, so that the intermediate feature maps fit into the limited RAM. Experimental results on SVHN and CIFAR-10 show that the proposed algorithm significantly reduces the working memory of a DNN to satisfy the hard RAM-size constraint while maintaining satisfactory accuracy with relatively low overhead in memory and runtime latency.
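The abstract describes pruning activations of an output feature map at runtime so that the layer's working memory fits a hard RAM budget. The thesis itself is not reproduced here, so the following is only a minimal sketch of one plausible form of that idea: keep the largest-magnitude activations (stored sparsely as index/value pairs) up to the RAM budget and zero the rest. The function name, the sparse-storage cost model (a 2-byte index per retained value), and all parameters are assumptions for illustration, not the thesis's actual method.

```python
import numpy as np

def prune_activations(feature_map, ram_budget_bytes, bytes_per_value=1):
    """Sketch of runtime activation pruning under a RAM budget.

    Retains only the largest-magnitude activations so that a sparse
    (index, value) representation of the feature map fits within
    ram_budget_bytes; all other activations are zeroed.
    """
    flat = feature_map.flatten()
    # Assumed sparse-storage cost: each retained activation needs its
    # value plus a 2-byte index into the flattened feature map.
    cost_per_entry = bytes_per_value + 2
    max_keep = ram_budget_bytes // cost_per_entry
    if flat.size <= max_keep:
        return feature_map  # already fits; no pruning needed
    # Zero out everything except the max_keep largest-magnitude values.
    drop_indices = np.argsort(np.abs(flat))[:-max_keep]
    pruned = flat.copy()
    pruned[drop_indices] = 0.0
    return pruned.reshape(feature_map.shape)
```

Under this sketch, pruning only fires when a layer's output would exceed the budget, which matches the abstract's "if necessary" qualifier; layers whose feature maps already fit are passed through untouched.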




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Wang, Zhepeng (zhw82@pitt.edu, Pitt username: zhw82)
ETD Committee:
Committee Chair: Hu,
Committee Member: Mao,
Committee Member: Dickerson,
Date: 3 September 2021
Date Type: Publication
Defense Date: 12 July 2021
Approval Date: 3 September 2021
Submission Date: 9 July 2021
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 38
Institution: University of Pittsburgh
Schools and Programs: Swanson School of Engineering > Electrical and Computer Engineering
Degree: MS - Master of Science
Thesis Type: Master's Thesis
Refereed: Yes
Uncontrolled Keywords: Neural Network Deployment, Neural Network Compression, Embedded System, Artificial Intelligence of Things (AIoT), On-Device Artificial Intelligence (AI)
Date Deposited: 03 Sep 2021 15:55
Last Modified: 03 Sep 2021 15:55
