Kljucaric, Luke
(2019)
Deep-Learning Inferencing with High-Performance Hardware Accelerators.
Master's Thesis, University of Pittsburgh.
(Unpublished)
Abstract
In order to improve their performance-per-watt capabilities over general-purpose architectures, FPGAs are commonly employed to accelerate applications. With the exponential growth of available data, machine-learning apps have generated greater interest in order to better understand that data and increase autonomous processing. As FPGAs become more readily available through cloud services like Amazon Web Services F1 platform, it is worth studying the performance of accelerating machine-learning apps on FPGAs over traditional fixed-logic devices, like CPUs and GPUs. FPGA frameworks for accelerating convolutional neural networks, which are used in many machine-learning apps, have started emerging for accelerated-application development. This thesis aims to compare the performance of these emerging frameworks on two commonly-used convolutional neural networks, GoogLeNet and AlexNet. Specifically, handwritten Chinese character recognition is benchmarked across multiple currently available FPGA frameworks on Xilinx and Intel FPGAs and compared against multiple CPU and GPU architectures featured on AWS, Google’s Cloud platform, the University of Pittsburgh’s Center for Research Computing (CRC), and Intel’s vLab Academic Cluster. All NVIDIA GPUs have proven to have the best performance over every other device in this study. The Zebra framework available for Xilinx FPGAs showed to have an average 8.3× and 9.3× better performance and efficiency, respectively, over the OpenVINO framework available for Intel FPGAs. Although the Zebra framework on the Xilinx VU9P showed better efficiency than the Pascal-based GPUs, the NVIDIA Tesla V100 proved to be the most efficient device at 125.9 and 47.2 images-per-second-per-Watt for AlexNet and GoogLeNet, respectively. Although currently lacking, FPGA frameworks and devices have the potential to compete with GPUs in terms of performance and efficiency.
Share
Citation/Export: |
|
Social Networking: |
|
Details
Item Type: |
University of Pittsburgh ETD
|
Status: |
Unpublished |
Creators/Authors: |
|
ETD Committee: |
|
Date: |
23 January 2019 |
Date Type: |
Publication |
Defense Date: |
28 November 2018 |
Approval Date: |
23 January 2019 |
Submission Date: |
29 November 2018 |
Access Restriction: |
1 year -- Restrict access to University of Pittsburgh for a period of 1 year. |
Number of Pages: |
54 |
Institution: |
University of Pittsburgh |
Schools and Programs: |
Swanson School of Engineering > Electrical and Computer Engineering |
Degree: |
MS - Master of Science |
Thesis Type: |
Master's Thesis |
Refereed: |
Yes |
Uncontrolled Keywords: |
FPGA, GPUs, CPUs, Machine-learning, CNN, Accelerator, HPC |
Date Deposited: |
23 Jan 2019 16:21 |
Last Modified: |
23 Jan 2020 06:15 |
URI: |
http://d-scholarship.pitt.edu/id/eprint/35658 |
Metrics
Monthly Views for the past 3 years
Plum Analytics
Actions (login required)
|
View Item |