Schubert, Marika E.
(2021)
Benchmarking Transformer-Based Transcription on Embedded GPUs for Space Applications.
Master's Thesis, University of Pittsburgh.
(Unpublished)
Abstract
Speech transcription is a necessary tool for backend applications commonly found in voice assistants. Transcription is typically performed using cloud-based servers or custom hardware, but those resources are not always suitable for space environments due to size, weight, power, and cost constraints. It is therefore important to determine the performance of, and the optimal conditions for, running transcription on hardware that is feasible to deploy in a space application. This research evaluates the performance of an optimized version of the wav2vec2 speech transcription engine, the current state-of-the-art model for this domain. The target hardware, the NVIDIA Jetson Xavier NX embedded GPU, was chosen for its modern GPU architecture and small form factor. In addition to examining how performance scales with input size, we evaluate the hyperparameters of the clustered attention optimization, as well as the average power and energy per inference relative to the operating power mode of the device. The clustered attention model outperformed the improved-clustered model for large input sizes, but the original (vanilla) wav2vec2 model performed better for small input sizes. The energy per inference of the clustered model (13.90 J) was lower than that of the improved-clustered model (15.03 J) and the vanilla model (15.85 J). All models meet the real-time speech processing requirements necessary to perform onboard inference entirely on a space system.
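To put the reported energy figures in proportion, the short sketch below computes the relative energy saving of the clustered model against the other two. Only the three joule values come from the abstract; the variable and function names are illustrative assumptions, and the percentages are plain arithmetic on those values, not results reported by the thesis.

```python
# Energy per inference in joules, as reported in the abstract.
ENERGY_J = {
    "clustered": 13.90,
    "improved_clustered": 15.03,
    "vanilla": 15.85,
}


def relative_saving_percent(baseline_j: float, candidate_j: float) -> float:
    """Percentage by which candidate_j is lower than baseline_j."""
    return (baseline_j - candidate_j) / baseline_j * 100.0


# Saving of the clustered model relative to each of the other two models.
savings = {
    name: relative_saving_percent(energy, ENERGY_J["clustered"])
    for name, energy in ENERGY_J.items()
    if name != "clustered"
}

for name, pct in savings.items():
    print(f"clustered vs {name}: {pct:.1f}% less energy per inference")
```

On these figures the clustered model uses roughly 7.5% less energy per inference than the improved-clustered model and roughly 12.3% less than the vanilla model.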
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Schubert, Marika E.
Date: 13 June 2021
Date Type: Publication
Defense Date: 31 March 2021
Approval Date: 13 June 2021
Submission Date: 18 March 2021
Access Restriction: 2 year -- Restrict access to University of Pittsburgh for a period of 2 years.
Number of Pages: 37
Institution: University of Pittsburgh
Schools and Programs: Swanson School of Engineering > Electrical and Computer Engineering
Degree: MS - Master of Science
Thesis Type: Master's Thesis
Refereed: Yes
Uncontrolled Keywords: Automatic speech recognition, benchmarking, GPU, machine learning, optimization, parallel processing
Date Deposited: 13 Jun 2021 18:38
Last Modified: 13 Jun 2023 05:15
URI: http://d-scholarship.pitt.edu/id/eprint/40387