Deep-Learning Inferencing with High-Performance Hardware Accelerators

Kljucaric, Luke (2019) Deep-Learning Inferencing with High-Performance Hardware Accelerators. Master's Thesis, University of Pittsburgh. (Unpublished)

PDF (V2), 1MB

Abstract

FPGAs are commonly employed to accelerate applications because of their superior performance-per-watt capabilities compared to general-purpose architectures. With the exponential growth of available data, machine-learning applications have attracted greater interest as a means to better understand that data and to increase autonomous processing. As FPGAs become more readily available through cloud services such as the Amazon Web Services (AWS) F1 platform, it is worth studying how machine-learning applications perform when accelerated on FPGAs compared with traditional fixed-logic devices such as CPUs and GPUs. FPGA frameworks for accelerating convolutional neural networks, which underpin many machine-learning applications, have begun emerging for accelerated-application development. This thesis compares the performance of these emerging frameworks on two commonly used convolutional neural networks, GoogLeNet and AlexNet. Specifically, handwritten Chinese character recognition is benchmarked across multiple currently available FPGA frameworks on Xilinx and Intel FPGAs and compared against multiple CPU and GPU architectures featured on AWS, Google’s Cloud Platform, the University of Pittsburgh’s Center for Research Computing (CRC), and Intel’s vLab Academic Cluster. The NVIDIA GPUs achieved the best performance of all devices in this study. The Zebra framework available for Xilinx FPGAs delivered, on average, 8.3× better performance and 9.3× better efficiency than the OpenVINO framework available for Intel FPGAs. Although the Zebra framework on the Xilinx VU9P showed better efficiency than the Pascal-based GPUs, the NVIDIA Tesla V100 proved to be the most efficient device, at 125.9 and 47.2 images per second per Watt for AlexNet and GoogLeNet, respectively. Although FPGA frameworks and devices currently lag behind, they have the potential to compete with GPUs in terms of performance and efficiency.
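
For reference, the efficiency figures quoted in the abstract (images per second per Watt) follow directly from dividing sustained inference throughput by average power draw. The short Python sketch below illustrates that calculation; the throughput and power values in the example are hypothetical placeholders, not measurements from the thesis.

# Efficiency metric used in the abstract: images per second per Watt.
# The example values below are hypothetical placeholders, not results
# reported in the thesis.

def efficiency(images_per_second: float, average_power_watts: float) -> float:
    """Inference efficiency in images per second per Watt."""
    return images_per_second / average_power_watts

if __name__ == "__main__":
    # e.g., a device sustaining 3000 images/s at 250 W scores 12.0 images/s/W
    print(f"{efficiency(3000.0, 250.0):.1f} images/s/W")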


Details

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Kljucaric, Luke (LEK70@pitt.edu, Pitt Username: LEK70)
ETD Committee:
Committee Chair: George, Alan (alan.george@pitt.edu)
Committee Member: Dickerson, Samuel (dickerson@pitt.edu)
Committee Member: Yang, Jun (juy9@pitt.edu)
Date: 23 January 2019
Date Type: Publication
Defense Date: 28 November 2018
Approval Date: 23 January 2019
Submission Date: 29 November 2018
Access Restriction: 1 year -- Restrict access to University of Pittsburgh for a period of 1 year.
Number of Pages: 54
Institution: University of Pittsburgh
Schools and Programs: Swanson School of Engineering > Electrical and Computer Engineering
Degree: MS - Master of Science
Thesis Type: Master's Thesis
Refereed: Yes
Uncontrolled Keywords: FPGA, GPUs, CPUs, Machine-learning, CNN, Accelerator, HPC
Date Deposited: 23 Jan 2019 16:21
Last Modified: 23 Jan 2020 06:15
URI: http://d-scholarship.pitt.edu/id/eprint/35658
