Towards Efficient Hardware Acceleration of Deep Neural Networks on FPGA

Li, Sicheng (2018) Towards Efficient Hardware Acceleration of Deep Neural Networks on FPGA. Doctoral Dissertation, University of Pittsburgh. (Unpublished)


Abstract

Deep neural networks (DNNs) have achieved remarkable success in many applications because of their powerful data-processing capability. Their performance in computer vision has matched, and in some areas even surpassed, human capabilities. DNNs can capture complex nonlinear features; however, this ability comes at the cost of high computational and memory requirements. State-of-the-art networks require billions of arithmetic operations and millions of parameters. The brute-force computing model of DNNs often requires extremely large hardware resources, raising serious concerns about their scalability on traditional von Neumann architectures. The well-known memory wall and the latency caused by the long-range connectivity and communication of DNNs severely constrain their computational efficiency. Existing DNN acceleration techniques suffer from either poor hardware execution efficiency of the simplified model (software approaches) or unavoidable accuracy degradation and a limited range of supported algorithms (hardware approaches). To preserve inference accuracy while making the hardware implementation more efficient, a close investigation of hardware/software co-design methodologies for DNNs is needed.
The proposed work first presents an FPGA-based implementation framework for recurrent neural network (RNN) acceleration. At the architectural level, we improve the parallelism of the RNN training scheme and reduce computing resource requirements to enhance computational efficiency. The hardware implementation primarily targets reducing the data communication load. Second, we propose a data locality-aware sparse matrix-vector multiplication (SpMV) kernel. At the software level, we reorganize a large sparse matrix into many modest-sized blocks by adopting hypergraph-based partitioning and clustering. Available hardware constraints are taken into account in memory allocation and data access regularization. Third, we present a holistic acceleration approach for sparse convolutional neural networks (CNNs). During network training, data locality is regularized to ease hardware mapping. The distributed architecture enables high computation parallelism and data reuse. The proposed research results in a hardware/software co-design methodology for fast and accurate DNN acceleration, through innovations in algorithm optimization, hardware implementation, and the interactive design process across these two domains.
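As a rough illustration of why reorganizing a sparse matrix into modest-sized blocks improves data locality for SpMV, the Python sketch below groups nonzeros into fixed-size square tiles and multiplies tile by tile. It is a simplified sketch under stated assumptions, not the dissertation's method: the function name `spmv_blocked` and the `block_size` parameter are hypothetical, and the tiles come from a naive uniform grid rather than the hypergraph-based partitioning and clustering described above.

```python
from collections import defaultdict


def spmv_blocked(triplets, x, n_rows, block_size):
    """Compute y = A @ x for a sparse A given as (row, col, value) triplets.

    Hypothetical sketch: uses a uniform grid of square tiles, NOT the
    hypergraph-based partitioning described in the abstract.
    """
    # Group nonzeros by the (row-block, col-block) tile they fall into.
    blocks = defaultdict(list)
    for r, c, v in triplets:
        blocks[(r // block_size, c // block_size)].append((r, c, v))

    y = [0.0] * n_rows
    # Each tile reads only a block_size-wide slice of x and writes only a
    # block_size-tall slice of y, so both slices could fit in small
    # on-chip buffers on an accelerator.
    for (br, bc), entries in sorted(blocks.items()):
        x_slice = x[bc * block_size:(bc + 1) * block_size]
        y_local = [0.0] * block_size
        for r, c, v in entries:
            y_local[r - br * block_size] += v * x_slice[c - bc * block_size]
        # Accumulate the tile's partial sums into the global result.
        for i, val in enumerate(y_local):
            row = br * block_size + i
            if row < n_rows:
                y[row] += val
    return y


# Usage: A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]], x = [1, 2, 3]
A_triplets = [(0, 0, 1), (0, 2, 2), (1, 1, 3), (2, 0, 4), (2, 2, 5)]
print(spmv_blocked(A_triplets, [1, 2, 3], n_rows=3, block_size=2))
# prints [7.0, 6.0, 19.0]
```

A smarter partitioner, such as the hypergraph-based one the abstract mentions, would choose block boundaries so that tiles are denser and reuse more of `x`, rather than cutting at fixed offsets as this sketch does.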


Details

Item Type:

University of Pittsburgh ETD

Status:

Unpublished

Creators/Authors:

Creators	Email	Pitt Username	ORCID
Li, Sicheng	sil27@pitt.edu	sil27

ETD Committee:

Title	Member	Email Address
Committee Chair	Li, Hai	hai.li@duke.edu
Committee CoChair	Chen, Yiran	yiran.chen@duke.edu
Committee Member	Wang, Yu	yuwang@mail.tsinghua.edu.cn
Committee Member	Mao, Zhi-Hong	zhm4@pitt.edu
Committee Member	Sejdic, Ervin	esejdic@pitt.edu

Date:

25 January 2018

Date Type:

Publication

Defense Date:

29 September 2017

Approval Date:

25 January 2018

Submission Date:

29 September 2017

Access Restriction:

No restriction; Release the ETD for access worldwide immediately.

Number of Pages:

104

Institution:

University of Pittsburgh

Schools and Programs:

Swanson School of Engineering > Electrical Engineering

Degree:

PhD - Doctor of Philosophy

Thesis Type:

Doctoral Dissertation

Refereed:

Yes

Uncontrolled Keywords:

Deep Learning

Date Deposited:

25 Jan 2018 21:35

Last Modified:

25 Jan 2018 21:35

URI:

http://d-scholarship.pitt.edu/id/eprint/33238

