Schuster, Jeffrey William
(2006)
A Vectorized Processing Algorithm for Continuous Speech Recognition and Associated FPGA-Based Architecture.
Master's Thesis, University of Pittsburgh.
(Unpublished)
Abstract
This work analyzes Continuous Automatic Speech Recognition (CSR) and, in contrast to prior work, shows that the CSR algorithms can be specified in a highly parallel form. Using the MATLAB software package, this parallelism is exploited to create a compact, vectorized algorithm that executes the CSR task. After an in-depth analysis of the SPHINX 3 Large Vocabulary Continuous Speech Recognition (LVCSR) engine, the major functional units were redesigned in the MATLAB environment, with special effort taken to flatten the algorithms and restructure the data to allow for matrix-based computations. This conversion reduced the original 14,000 lines of C++ code to fewer than 200 lines of highly vectorized operations, substantially increasing the potential instruction-level parallelism of the system. Using this vector model as a baseline, a custom hardware system was then created that is capable of performing the speech recognition task in real time on a Xilinx Virtex-4 FPGA device. Independent hardware engines were created for each stage of the speech recognition process, and the throughput of each is maximized by customizing its logic to the specific task. Further, a unique architecture was designed that allows for a static data path throughout the hardware, effectively removing the need for complex bus arbitration in the system. By making use of shared memory resources and applying a token-passing scheme to the system, both the data movement within the design and the amount of active data are continually minimized at run time. These results provide a novel method for performing speech recognition in both hardware and software, helping to further the development of systems capable of recognizing human speech.
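To illustrate the kind of flattening the abstract describes, the following is a minimal sketch, not the thesis code, of scoring one feature frame against a bank of diagonal-covariance Gaussians, first with nested loops and then as a single matrix expression. It is written in Python with NumPy rather than MATLAB; the function names, array shapes, and the sizes `G` and `D` are assumptions chosen only for illustration.

```python
import numpy as np


def score_loop(x, means, variances):
    """Reference version: evaluate each Gaussian one dimension at a time."""
    scores = []
    for mu, var in zip(means, variances):
        ll = 0.0
        for d in range(len(x)):
            ll += -0.5 * (np.log(2 * np.pi * var[d]) + (x[d] - mu[d]) ** 2 / var[d])
        scores.append(ll)
    return np.array(scores)


def score_vectorized(x, means, variances):
    """Same log-likelihoods as one array expression over all Gaussians."""
    diff = x - means  # broadcasts the (D,) frame against the (G, D) means
    return -0.5 * np.sum(np.log(2 * np.pi * variances) + diff ** 2 / variances, axis=1)


# Hypothetical sizes: G Gaussians, D feature dimensions (illustrative only).
G, D = 8, 39
rng = np.random.default_rng(0)
means = rng.normal(size=(G, D))
variances = rng.uniform(0.5, 2.0, size=(G, D))
x = rng.normal(size=D)

# Both versions should agree up to floating-point rounding.
assert np.allclose(score_loop(x, means, variances), score_vectorized(x, means, variances))
```

The vectorized form expresses the whole bank of evaluations as elementwise array operations followed by a row sum, which is the sort of structure that maps naturally onto parallel hardware.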
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Schuster, Jeffrey William (Email: jws52@pitt.edu; Pitt Username: JWS52)
Date: 27 September 2006
Date Type: Completion
Defense Date: 29 April 2006
Approval Date: 27 September 2006
Submission Date: 10 April 2006
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Institution: University of Pittsburgh
Schools and Programs: Swanson School of Engineering > Electrical Engineering
Degree: MSEE - Master of Science in Electrical Engineering
Thesis Type: Master's Thesis
Refereed: Yes
Uncontrolled Keywords: acoustic modeling; Gaussian distributions; hidden Markov models; MATLAB; phoneme evaluation; speech recognition
Other ID: http://etd.library.pitt.edu/ETD/available/etd-04102006-155357/, etd-04102006-155357
Date Deposited: 10 Nov 2011 19:35
Last Modified: 15 Nov 2016 13:39
URI: http://d-scholarship.pitt.edu/id/eprint/6949