A Vectorized Processing Algorithm for Continuous Speech Recognition and Associated FPGA-Based Architecture

Schuster, Jeffrey William (2006) A Vectorized Processing Algorithm for Continuous Speech Recognition and Associated FPGA-Based Architecture. Master's Thesis, University of Pittsburgh. (Unpublished)

Abstract

This work analyzes Continuous Automatic Speech Recognition (CSR) and, in contrast to prior work, shows that the CSR algorithms can be specified in a highly parallel form. Through use of the MATLAB software package, the parallelism is exploited to create a compact, vectorized algorithm that is able to execute the CSR task. After an in-depth analysis of the SPHINX 3 Large Vocabulary Continuous Speech Recognition (LVCSR) engine, the major functional units were redesigned in the MATLAB environment, with special effort taken to flatten the algorithms and restructure the data to allow for matrix-based computations. This conversion reduced the original 14,000 lines of C++ code to fewer than 200 lines of highly vectorized operations, substantially increasing the potential Instruction-Level Parallelism of the system. Using this vector model as a baseline, a custom hardware system was then created that is capable of performing the speech recognition task in real time on a Xilinx Virtex-4 FPGA device. Through the creation of independent hardware engines for each stage of the speech recognition process, the throughput of each is maximized by customizing the logic to the specific task. Further, a unique architecture was designed that allows for the creation of a static data path throughout the hardware, effectively removing the need for complex bus arbitration in the system. By making use of shared memory resources and applying a token-passing scheme to the system, both the data movement within the design and the amount of active data are continually minimized during run time. These results provide a novel method for performing speech recognition in both hardware and software, helping to further the development of systems capable of recognizing human speech.


Details

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Schuster, Jeffrey William (Email: jws52@pitt.edu; Pitt Username: JWS52)
ETD Committee:
Committee Chair: Hoare, Raymond R (Email: hoare@engr.pitt.edu)
Committee Member: Jones, Alex K (Email: akj8@pitt.edu; Pitt Username: AKJ8)
Committee Member: Levitan, Steven P (Email: steve@ee.pitt.edu; Pitt Username: LEVITAN)
Date: 27 September 2006
Date Type: Completion
Defense Date: 29 April 2006
Approval Date: 27 September 2006
Submission Date: 10 April 2006
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Institution: University of Pittsburgh
Schools and Programs: Swanson School of Engineering > Electrical Engineering
Degree: MSEE - Master of Science in Electrical Engineering
Thesis Type: Master's Thesis
Refereed: Yes
Uncontrolled Keywords: acoustic modeling; Gaussian distributions; hidden Markov models; MATLAB; phoneme evaluation; speech recognition
Other ID: http://etd.library.pitt.edu/ETD/available/etd-04102006-155357/, etd-04102006-155357
Date Deposited: 10 Nov 2011 19:35
Last Modified: 15 Nov 2016 13:39
URI: http://d-scholarship.pitt.edu/id/eprint/6949
