
Exploring ML-Oriented Hardware for Accelerated and Scalable Feature Extraction

Kljucaric, Luke (2024) Exploring ML-Oriented Hardware for Accelerated and Scalable Feature Extraction. Doctoral Dissertation, University of Pittsburgh. (Unpublished)



Machine-learning (ML) algorithms, tools, and devices continue to proliferate with the aim of automating and accelerating many aspects of daily life. Hardware accelerators can enable these ML applications to achieve maximum performance. The first phase of this dissertation explores the maximum throughput of field-programmable gate arrays (FPGAs), CPUs, and GPUs on two architecturally different convolutional neural networks (CNNs), GoogLeNet and AlexNet, which are composed of similar fundamental neural-network operations. Because of their highly parallel nature, GPUs achieved the highest inference throughput across models and devices, with additional tensor acceleration significantly boosting performance.

To better understand the design and impacts of ML-oriented hardware and software, the second phase of this dissertation analyzes, in terms of latency and throughput, subsequent generations of high-performance and embedded devices that feature ML optimizations. Tensor, vision, and other ML-focused architectures are also considered. Because many of these devices feature hardware for quantized and reduced-precision datatypes, GoogLeNet and AlexNet are quantized with more modern ML frameworks and run with state-of-the-art backend acceleration software for optimized performance. Though GPUs dominate in throughput and FPGAs achieve the lowest latencies, all of the devices use significant compute, memory, and power resources to achieve their respective performance.

The final phase of this dissertation explores neuromorphic technology as an alternative to ML object classification that reduces the overall compute, memory, and power required. Neuromorphic sensors capture events at microsecond resolution rather than generating entire frames, which limits the amount of redundant data captured. These events can be related spatially, through algorithms such as k-means clustering, or spatio-temporally, through neuromorphic algorithms such as "A Hierarchy Of event-based Time Surfaces" (HOTS). FPGA accelerators for k-means clustering and HOTS are designed and optimized using state-of-the-art high-level synthesis tools and evaluated on multiple datasets. The highly scalable k-means clustering accelerator achieved an event-processing latency of 65 nanoseconds and a throughput of 15.38 MEvt/s while using less than 2% of available FPGA resources and remaining competitive in accuracy. This dissertation benchmarked many state-of-the-art hardware accelerators, analyzed the impacts of ML hardware and software optimizations, and developed an ultra-low-latency, scalable alternative to ML object classification with neuromorphic technology.
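
As a rough illustration of relating events spatially with k-means, the following is a minimal software sketch of online (per-event) k-means in Python. It is not the dissertation's FPGA design, whose internals are not described in this record; the function names, the learning rate `lr`, and the (x, y) event layout are all assumptions made for illustration. The helper `throughput_mevts` only checks an arithmetic observation: the reported 65 ns latency and 15.38 MEvt/s throughput are consistent with one event completing every 65 ns, which this record does not state explicitly.

```python
import numpy as np

def online_kmeans(events, centroids, lr=0.05):
    """Assign each (x, y) event to its nearest centroid, one event at a time.

    Hypothetical sketch: the winning centroid is nudged toward the event
    by a fixed learning rate (an assumed update rule, not the source's).
    """
    centroids = np.asarray(centroids, dtype=float).copy()
    labels = np.empty(len(events), dtype=int)
    for i, ev in enumerate(events):
        # Squared Euclidean distance from this event to every centroid.
        d2 = ((centroids - ev) ** 2).sum(axis=1)
        k = int(d2.argmin())
        # Move only the winning centroid toward the event.
        centroids[k] += lr * (ev - centroids[k])
        labels[i] = k
    return labels, centroids

def throughput_mevts(latency_ns):
    """Millions of events per second if one event completes every latency_ns."""
    return 1e3 / latency_ns
```

Processing events one at a time, rather than in batches, mirrors the event-driven nature of neuromorphic sensors: each event can be classified as soon as it arrives, without waiting for a full frame.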




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Kljucaric, Luke (Email: kljucaric@pitt.edu; Pitt Username: LEK70; ORCID: 0000-0001-6793-3524)
ETD Committee:
Committee Chair: George, Alan (Email: Alan.George@pitt.edu; Pitt Username: adg91; ORCID: 0000-0001-9665-2879)
Committee Member: Dickerson, Samuel (Email: dickerson@pitt.edu; Pitt Username: sjdst31; ORCID: 0000-0003-2281-5115)
Committee Member: Kubendran, Rajkumar (Email: rajkumar.ece@pitt.edu; Pitt Username: rak196; ORCID: 0000-0003-3066-4898)
Committee Member: Tang, Xulong (Email: xulongtang@pitt.edu; Pitt Username: tax6; ORCID: 0000-0002-3385-2053)
Committee Member: Zhou, Peipei (Email: peipei.zhou@pitt.edu; Pitt Username: pez41; ORCID: 0000-0002-0493-1844)
Date: 11 January 2024
Date Type: Publication
Defense Date: 26 October 2023
Approval Date: 11 January 2024
Submission Date: 29 September 2023
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 135
Institution: University of Pittsburgh
Schools and Programs: Swanson School of Engineering > Electrical and Computer Engineering
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: machine learning, neuromorphics, high-level synthesis, FPGA, neural networks, high-performance computing, architectures, design tools
Date Deposited: 11 Jan 2024 19:32
Last Modified: 11 Jan 2024 19:32

