Deep-Learning Inferencing with High-Performance Hardware Accelerators

Kljucaric, Luke (2019) Deep-Learning Inferencing with High-Performance Hardware Accelerators. Master's Thesis, University of Pittsburgh. (Unpublished)

PDF (V2), 1MB

Abstract

FPGAs are commonly employed to accelerate applications because of their superior performance-per-watt capabilities compared to general-purpose architectures. With the exponential growth of available data, machine-learning applications have attracted greater interest as a means to better understand that data and to increase autonomous processing. As FPGAs become more readily available through cloud services such as the Amazon Web Services (AWS) F1 platform, it is worth studying how machine-learning applications perform when accelerated on FPGAs compared with traditional fixed-logic devices such as CPUs and GPUs. FPGA frameworks for accelerating convolutional neural networks, which underpin many machine-learning applications, have begun emerging for accelerated-application development. This thesis compares the performance of these emerging frameworks on two commonly used convolutional neural networks, GoogLeNet and AlexNet. Specifically, handwritten Chinese character recognition is benchmarked across multiple currently available FPGA frameworks on Xilinx and Intel FPGAs and compared against multiple CPU and GPU architectures featured on AWS, Google’s Cloud Platform, the University of Pittsburgh’s Center for Research Computing (CRC), and Intel’s vLab Academic Cluster. The NVIDIA GPUs achieved the best performance of all devices in this study. The Zebra framework available for Xilinx FPGAs delivered, on average, 8.3× better performance and 9.3× better efficiency than the OpenVINO framework available for Intel FPGAs. Although the Zebra framework on the Xilinx VU9P showed better efficiency than the Pascal-based GPUs, the NVIDIA Tesla V100 proved to be the most efficient device, at 125.9 and 47.2 images per second per Watt for AlexNet and GoogLeNet, respectively. Although FPGA frameworks and devices currently lag behind, they have the potential to compete with GPUs in terms of performance and efficiency.
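
For reference, the efficiency figures quoted in the abstract (images per second per Watt) follow directly from dividing sustained inference throughput by average power draw. The short Python sketch below illustrates that calculation; the throughput and power values in the example are hypothetical placeholders, not measurements from the thesis.

# Efficiency metric used in the abstract: images per second per Watt.
# The example values below are hypothetical placeholders, not results
# reported in the thesis.

def efficiency(images_per_second: float, average_power_watts: float) -> float:
    """Inference efficiency in images per second per Watt."""
    return images_per_second / average_power_watts

if __name__ == "__main__":
    # e.g., a device sustaining 3000 images/s at 250 W scores 12.0 images/s/W
    print(f"{efficiency(3000.0, 250.0):.1f} images/s/W")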


Details

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Kljucaric, Luke (LEK70@pitt.edu, Pitt Username: LEK70)
ETD Committee:
Committee Chair: George, Alan (alan.george@pitt.edu)
Committee Member: Dickerson, Samuel (dickerson@pitt.edu)
Committee Member: Yang, Jun (juy9@pitt.edu)
Date: 23 January 2019
Date Type: Publication
Defense Date: 28 November 2018
Approval Date: 23 January 2019
Submission Date: 29 November 2018
Access Restriction: 1 year -- Restrict access to University of Pittsburgh for a period of 1 year.
Number of Pages: 54
Institution: University of Pittsburgh
Schools and Programs: Swanson School of Engineering > Electrical and Computer Engineering
Degree: MS - Master of Science
Thesis Type: Master's Thesis
Refereed: Yes
Uncontrolled Keywords: FPGA, GPUs, CPUs, Machine-learning, CNN, Accelerator, HPC
Date Deposited: 23 Jan 2019 16:21
Last Modified: 23 Jan 2020 06:15
URI: http://d-scholarship.pitt.edu/id/eprint/35658
