
FPGA Acceleration of k-mer Counting using On-Chip HBM2 and oneAPI

Lucas, Owen (2024) FPGA Acceleration of k-mer Counting using On-Chip HBM2 and oneAPI. Master's Thesis, University of Pittsburgh. (Unpublished)


Abstract

Counting substrings of an arbitrary length k (k-mers) is the single most time-consuming step of de novo genome sequencing. Sequencing machines generate large quantities of data (hundreds of gigabytes per genome). Processing this genetic information requires frequent memory accesses into data structures considerably larger than the available cache, which makes the runtime memory-bound. This bottleneck stems from the gap between processor and memory speed and can be alleviated through alternative computing architectures. Recent FPGA devices equipped with on-chip High-Bandwidth Memory (HBM) let custom architectures apply high-capacity, high-bandwidth memory to memory-bound tasks. This research investigates accelerating k-mer counting with one such device, the BittWare 520N-MX, a Stratix 10 FPGA with 16 GB of on-chip HBM2. The architecture was designed using Intel’s oneAPI framework. To maximize throughput, the accelerator exploits the parallelism inherent in the algorithm by using multiple parallel hash functions, partitioning data structures across multiple memory banks, and running multiple independent processing pipelines on the device. The accelerator achieves 57.98M k-mers per second, a 3.80× speedup over the throughput-optimized CPU version and a 5.85× speedup over the original CPU application. It does so even though the clock speeds of the oneAPI design fall well below the board’s maximum frequency. Multiple methods of improving the clock speeds were attempted, but none succeeded. The oneAPI design achieved a speedup over the CPU using the FPGA with on-chip HBM2, and higher FPGA clock speeds could yield additional performance improvement.
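As a rough software illustration of the ideas named in the abstract, the following Python sketch counts k-mers and spreads the counts over several independent tables chosen by a hash of the k-mer. It is not the thesis's accelerator design: the function names, the use of CRC32 as the partitioning hash, the default of four "banks", and the use of `Counter` objects in place of HBM2 banks are all assumptions made only for this example.

```python
from collections import Counter
import zlib


def kmers(seq, k):
    """Yield every length-k substring (k-mer) of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers_partitioned(seq, k, num_banks=4):
    """Count k-mers into num_banks separate tables.

    Each k-mer is routed to exactly one table by a hash of its text.
    The tables stand in for separate memory banks (an assumption of
    this sketch, not a description of the actual hardware layout).
    """
    banks = [Counter() for _ in range(num_banks)]
    for kmer in kmers(seq, k):
        # CRC32 is used here only because it is deterministic and in the
        # standard library; the hash functions in the thesis are not known.
        bank = zlib.crc32(kmer.encode("ascii")) % num_banks
        banks[bank][kmer] += 1
    return banks


def merge_banks(banks):
    """Combine the per-bank tables into one table of total counts."""
    total = Counter()
    for bank in banks:
        total.update(bank)
    return total


if __name__ == "__main__":
    banks = count_kmers_partitioned("ACGTACGT", k=3)
    for kmer, count in sorted(merge_banks(banks).items()):
        print(kmer, count)
```

Because a given k-mer always hashes to the same table, the tables never need to coordinate with one another, which is the property that would let separate pipelines update separate memory banks independently.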


Details

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Lucas, Owen
ETD Committee:
Thesis Advisor: George, Alan D. (Email: alan.george@pitt.edu; Pitt Username: adg91; ORCID: 0000-0001-9665-2879)
Committee Member: Zhou, Peipei (Email: peipei.zhou@pitt.edu; Pitt Username: pez41; ORCID: 0000-0002-0493-1844)
Committee Member: Dickerson, Samuel (Email: dickerson@pitt.edu; Pitt Username: sjd31; ORCID: 0000-0003-2281-5115)
Date: 6 September 2024
Date Type: Publication
Defense Date: 30 July 2024
Approval Date: 6 September 2024
Submission Date: 25 June 2024
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 61
Institution: University of Pittsburgh
Schools and Programs: Swanson School of Engineering > Electrical and Computer Engineering
Degree: MS - Master of Science
Thesis Type: Master's Thesis
Refereed: Yes
Uncontrolled Keywords: Field programmable gate arrays, bioinformatics, high level synthesis, hardware acceleration, design tools, memory management, reconfigurable architectures
Date Deposited: 06 Sep 2024 19:56
Last Modified: 06 Sep 2024 19:56
URI: http://d-scholarship.pitt.edu/id/eprint/46625
