Towards Efficient Hardware Acceleration of Deep Neural Networks on FPGA

Li, Sicheng (2018) Towards Efficient Hardware Acceleration of Deep Neural Networks on FPGA. Doctoral Dissertation, University of Pittsburgh. (Unpublished)


Deep neural networks (DNNs) have achieved remarkable success in many applications because of their powerful data-processing capability. Their performance in computer vision has matched, and in some areas even surpassed, human capabilities. Deep neural networks can capture complex nonlinear features; however, this ability comes at the cost of high computational and memory requirements. State-of-the-art networks require billions of arithmetic operations and millions of parameters. The brute-force computing model of DNNs often requires extremely large hardware resources, raising severe concerns about its scalability on traditional von Neumann architectures. The well-known memory wall, together with the latency introduced by the long-range connectivity and communication of DNNs, severely constrains their computation efficiency. Existing acceleration techniques, whether software or hardware, often suffer from poor hardware execution efficiency of the simplified model (software) or from inevitable accuracy degradation and limited algorithm support (hardware). To preserve inference accuracy and make the hardware implementation more efficient, a close investigation of hardware/software co-design methodologies for DNNs is needed.
The proposed work first presents an FPGA-based implementation framework for Recurrent Neural Network (RNN) acceleration. At the architectural level, we improve the parallelism of the RNN training scheme and reduce the computing resource requirement to enhance computation efficiency. The hardware implementation primarily targets reducing the data communication load. Secondly, we propose a data locality-aware sparse matrix-vector multiplication (SpMV) kernel. At the software level, we reorganize a large sparse matrix into many modest-sized blocks by adopting hypergraph-based partitioning and clustering. Available hardware constraints are taken into consideration for memory allocation and data access regularization. Thirdly, we present a holistic acceleration of sparse convolutional neural networks (CNNs). During network training, the data locality is regularized to ease the hardware mapping. The distributed architecture enables high computation parallelism and data reuse. The proposed research results in a hardware/software co-design methodology for fast and accurate DNN acceleration, through innovations in algorithm optimization, hardware implementation, and the interactive design process across these two domains.
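
As a rough illustration of the blocking idea mentioned in the abstract, the sketch below tiles a sparse matrix into fixed-width column blocks and multiplies one block at a time, so that each block reads only a small slice of the input vector. It is not the dissertation's kernel: plain fixed-width tiling stands in for hypergraph-based partitioning and clustering, and the names `partition_columns`, `blocked_spmv`, and `block_width` are hypothetical, chosen only for this example.

```python
from collections import defaultdict


def partition_columns(entries, block_width):
    """Group (row, col, value) triples by column block.

    Assumption: fixed-width column tiling is used here as a simple
    stand-in for the hypergraph-based partitioning in the abstract.
    """
    blocks = defaultdict(list)
    for row, col, value in entries:
        blocks[col // block_width].append((row, col, value))
    return dict(blocks)


def blocked_spmv(entries, x, num_rows, block_width):
    """Compute y = A @ x one column block at a time."""
    y = [0.0] * num_rows
    blocks = partition_columns(entries, block_width)
    for block_id in sorted(blocks):
        start = block_id * block_width
        # Only this slice of x is needed for the current block; a small
        # on-chip buffer could hold it while the block is processed.
        x_slice = x[start:start + block_width]
        for row, col, value in blocks[block_id]:
            y[row] += value * x_slice[col - start]
    return y


# Hypothetical 3x4 matrix in coordinate (COO) form.
A = [(0, 0, 2.0), (0, 3, 1.0), (1, 1, 3.0), (2, 2, 4.0), (2, 3, 5.0)]
x = [1.0, 2.0, 3.0, 4.0]
y = blocked_spmv(A, x, num_rows=3, block_width=2)
```

In this toy version `block_width` plays the role of a hardware constraint: it bounds how much of the input vector must be resident at once. The result does not depend on its value.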


Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Li, Sicheng (Email: sil27@pitt.edu; Pitt Username: sil27)
ETD Committee:
Committee Chair: Li,
Committee CoChair: Chen,
Committee Member: Wang,
Committee Member: Mao,
Committee Member: Sejdic,
Date: 25 January 2018
Date Type: Publication
Defense Date: 29 September 2017
Approval Date: 25 January 2018
Submission Date: 29 September 2017
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 104
Institution: University of Pittsburgh
Schools and Programs: Swanson School of Engineering > Electrical Engineering
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Deep Learning
Date Deposited: 25 Jan 2018 21:35
Last Modified: 25 Jan 2018 21:35

