Bickerstaff, James
(2024)
Design Space Exploration of High-Throughput Graph- and Signal-Processing Architectures using High-Level Synthesis and FPGAs.
Master's Thesis, University of Pittsburgh.
(Unpublished)
Abstract
Data-intensive applications are becoming ever more prevalent due to the increasing amount of information available from sources such as social media and high-resolution sensors. The need to rapidly process this data and provide insights cannot be met easily through traditional computing methods. Accelerating applications with custom hardware and specialized techniques is key to more efficient processing as datasets continue to grow in scale. This research focuses on creating high-throughput acceleration architectures for Intel FPGA devices using the oneAPI high-level synthesis (HLS) toolkit. We target two areas of research: graph processing and signal processing. The two chosen graph operations are breadth-first search (BFS) and minimum-spanning-tree (MST). The signal processing investigation focuses on accelerating the Fast Fourier Transform (FFT). Custom, partition-based methods are designed and developed for the acceleration of BFS and MST. Through design space exploration, we evaluate the overall performance and productivity gains achieved by leveraging the oneAPI tools. Results show BFS performance of up to 75 million traversed edges per second (MTEPS), a speedup of up to 3.0× over the Intel Xeon 6128 CPU baseline. Although this performance falls short of related hardware description language (HDL) research, the HLS designs use 5.85× fewer lines of code than the HDL implementations. MST designs exhibit speedups of ∼1.5× over the CPU baseline. To accelerate FFT using oneAPI and FPGAs, a feedforward architecture is implemented and optimized. A design space exploration is performed to evaluate varying FFT resolutions, from 64k-point up to 512k-point in size. We find that a resolution of 256k-point provides a balance between resource utilization and performance; however, its performance lags behind that of the Fastest Fourier Transform in the West (FFTW) and Intel oneMKL libraries when executed in parallel on an eight-core Intel Xeon Platinum 8256 processor. Through the creation of these architectures, we demonstrate the high productivity available with the oneAPI toolkit by evaluating different configurations of the designs with only minor changes to the code base.
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Bickerstaff, James
Date: 11 January 2024
Date Type: Publication
Defense Date: 9 November 2023
Approval Date: 11 January 2024
Submission Date: 2 October 2023
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 58
Institution: University of Pittsburgh
Schools and Programs: Swanson School of Engineering > Electrical and Computer Engineering
Degree: MS - Master of Science
Thesis Type: Master's Thesis
Refereed: Yes
Uncontrolled Keywords: FPGA, high-level synthesis (HLS), graph, breadth-first search (BFS), minimum-spanning-tree (MST), fast Fourier transform (FFT), million traversed edges per second (MTEPS), oneAPI
Date Deposited: 11 Jan 2024 19:33
Last Modified: 11 Jan 2024 19:33
URI: http://d-scholarship.pitt.edu/id/eprint/45424