
HMC-Based Accelerator Design For Compressed Deep Neural Networks

Min, Chuhan (2020) HMC-Based Accelerator Design For Compressed Deep Neural Networks. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

PDF (3MB)

Abstract

Deep Neural Networks (DNNs) offer remarkable performance on classification and regression tasks in many high-dimensional problems and have been widely utilized in real-world cognitive applications. However, the high computational cost of DNNs greatly hinders their deployment in resource-constrained applications, real-time systems, and edge computing platforms. Moreover, the energy and performance costs of moving data between the memory hierarchy and computational units are higher than those of the computation itself. To overcome this memory bottleneck, accelerator designs improve data locality and temporal data reuse. In an attempt to further improve data locality, memory manufacturers have developed 3D-stacked memory, in which multiple layers of memory arrays are stacked on top of each other. Building on the concept of Processing-In-Memory (PIM), some 3D-stacked memory architectures also include a logic layer that integrates general-purpose computational logic directly within main memory to take advantage of the high internal bandwidth during computation.
In this dissertation, we investigate hardware/software co-design for a neural network accelerator. Specifically, we introduce a two-phase filter pruning framework for model compression and an accelerator tailored for efficient DNN execution on the HMC (Hybrid Memory Cube), which can dynamically offload primitives and functions to the PIM logic layer through a latency-aware scheduling controller.
In our compression framework, we formulate the filter pruning process as an optimization problem and propose a filter selection criterion measured by conditional entropy. The key idea of our approach is to establish a quantitative connection between filters and model accuracy. We define this connection as the conditional entropy over the filters in a convolutional layer, i.e., the distribution of entropy conditioned on the network loss. Based on this definition, we compare the pruning efficiency of global and layer-wise pruning strategies and propose a two-phase pruning method. The proposed method achieves an 88% reduction in filters and a 46% reduction in inference time on VGG16 within 2% accuracy degradation.
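To illustrate the criterion described above, the following is a minimal sketch of ranking filters by an estimate of the conditional entropy of the network loss given each filter's activation. All names (`conditional_entropy`, `rank_filters`) and the discretization scheme are illustrative assumptions, not the dissertation's actual implementation; the assumed convention is that a filter whose activations leave the loss most uncertain (highest conditional entropy) is the weakest pruning candidate to keep.

```python
import numpy as np

def conditional_entropy(filter_acts, loss_bins, n_act_bins=8):
    """Estimate H(loss | filter activation) in bits from samples.

    filter_acts: 1-D array of (pooled) activations of one filter per example.
    loss_bins:   1-D int array, the per-example loss discretized into bins.
    """
    # Discretize the continuous activations into n_act_bins levels.
    edges = np.histogram_bin_edges(filter_acts, bins=n_act_bins)
    act_bins = np.digitize(filter_acts, edges[1:-1])

    h = 0.0
    for a in np.unique(act_bins):
        mask = act_bins == a
        p_a = mask.mean()                       # P(activation bin = a)
        counts = np.bincount(loss_bins[mask])   # loss histogram within bin a
        probs = counts[counts > 0] / counts.sum()
        h += p_a * -(probs * np.log2(probs)).sum()  # P(a) * H(loss | a)
    return h

def rank_filters(layer_acts, loss_bins):
    """Return filter indices ordered from most to least informative.

    layer_acts: (n_examples, n_filters) array of per-filter activations.
    A low H(loss | filter) means the filter's activation largely determines
    the loss, so (under this assumed convention) it should be kept.
    """
    scores = np.array([conditional_entropy(layer_acts[:, f], loss_bins)
                       for f in range(layer_acts.shape[1])])
    return np.argsort(scores)  # filters at the end are pruning candidates

# Synthetic usage: one filter tracks the loss, one is pure noise.
rng = np.random.default_rng(0)
loss_bins = rng.integers(0, 4, size=1000)
informative = loss_bins + 0.01 * rng.standard_normal(1000)
noise = rng.standard_normal(1000)
acts = np.stack([informative, noise], axis=1)
order = rank_filters(acts, loss_bins)
```

In this toy example the informative filter receives a near-zero conditional entropy (its activation bin pins down the loss bin), while the noise filter scores near the marginal entropy of the loss, so it would be pruned first.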


Details

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Min, Chuhan (chm114@pitt.edu; Pitt username: chm114)
ETD Committee:
Committee Chair: Chen, Yiran (yiran.chen@duke.edu)
Committee Member: Mao, Zhi-Hong (zhm4@pitt.edu)
Committee Member: Miskov-Zivanov, Natasa (nmzivanov@pitt.edu)
Committee Member: Dickerson, Samuel (dickerson@pitt.edu)
Committee Member: Zeng, Bo (bzeng@pitt.edu)
Date: 29 January 2020
Date Type: Publication
Defense Date: 20 November 2019
Approval Date: 29 January 2020
Submission Date: 21 November 2019
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 99
Institution: University of Pittsburgh
Schools and Programs: Swanson School of Engineering > Electrical and Computer Engineering
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Neural network, HMC, model compression, software/hardware co-design
Date Deposited: 29 Jan 2020 16:13
Last Modified: 29 Jan 2020 16:13
URI: http://d-scholarship.pitt.edu/id/eprint/37869
