Beauford, Jayne Angela
(2010)
IMPROVING THE AUTOMATIC RECOGNITION OF DISTORTED SPEECH.
Doctoral Dissertation, University of Pittsburgh.
(Unpublished)
Abstract
Automatic speech recognition has a wide variety of uses in this technological age, yet speech distortions present many difficulties for accurate recognition. The research presented provides solutions that counter the detrimental effects that some distortions have on the accuracy of automatic speech recognition. Two types of speech distortions are focused on independently. They are distortions due to speech coding and distortions due to additive noise. Compensations for both types of distortion resulted in decreased recognition error.

Distortions due to the speech coding process are countered through recognition of the speech directly from the bitstream, thus eliminating the need for reconstruction of the speech signal and eliminating the distortion caused by it. There is a relative difference of 6.7% between the recognition error rate of uncoded speech and that of speech reconstructed from MELP encoded parameters. The relative difference between the recognition error rate for uncoded speech and that of encoded speech recognized directly from the MELP bitstream is 3.5%. This 3.2 percentage point difference is equivalent to the accurate recognition of an additional 334 words from the 12,863 words spoken.

Distortions due to noise are offset through appropriate modification of an existing noise reduction technique called minimum mean-square error log spectral amplitude enhancement. A relative difference of 28% exists between the recognition error rate of clean speech and that of speech with additive noise. Applying a speech enhancement front-end reduced this difference to 22.2%. This 5.8 percentage point difference is equivalent to the accurate recognition of an additional 540 words from the 12,863 words spoken.
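The arithmetic behind the figures quoted in the abstract can be restated in a few lines. The sketch below is only an illustration: the function names are hypothetical, and the "share of the test set" is a quantity derived here for context, not a figure reported in the dissertation. The abstract does not state the baseline used for its relative differences, so the sketch makes no attempt to recover the underlying error rates.

```python
# Figures taken from the abstract.
TOTAL_WORDS = 12_863  # words spoken in the test set


def gap_in_points(before_pct: float, after_pct: float) -> float:
    """Difference between two percentages, in percentage points."""
    return round(before_pct - after_pct, 1)


def share_of_test_set(words: int, total: int = TOTAL_WORDS) -> float:
    """Word count as a percentage of the test set (derived here, not from the source)."""
    return 100.0 * words / total


# Speech coding: reconstructed speech (6.7%) vs. direct bitstream recognition (3.5%).
coding_gap = gap_in_points(6.7, 3.5)
# Additive noise: noisy speech (28%) vs. noisy speech with enhancement front-end (22.2%).
noise_gap = gap_in_points(28.0, 22.2)

# Additional words recognized, expressed as a share of all words spoken.
coding_share = share_of_test_set(334)
noise_share = share_of_test_set(540)

print(f"coding gap: {coding_gap} points, {coding_share:.2f}% of words")
print(f"noise gap:  {noise_gap} points, {noise_share:.2f}% of words")
```

The derived shares need not equal the gaps in percentage points, since the gaps are differences between relative error-rate figures whose baseline the abstract leaves unstated.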
Details
| Field | Value |
|---|---|
| Item Type | University of Pittsburgh ETD |
| Status | Unpublished |
| Creators/Authors | Beauford, Jayne Angela (Email: axbst2@pitt.edu; Pitt Username: AXBST2) |
| ETD Committee | |
| Date | 26 January 2010 |
| Date Type | Completion |
| Defense Date | 11 September 2009 |
| Approval Date | 26 January 2010 |
| Submission Date | 20 November 2009 |
| Access Restriction | No restriction; Release the ETD for access worldwide immediately. |
| Institution | University of Pittsburgh |
| Schools and Programs | Swanson School of Engineering > Electrical Engineering |
| Degree | PhD - Doctor of Philosophy |
| Thesis Type | Doctoral Dissertation |
| Refereed | Yes |
| Uncontrolled Keywords | linear prediction; mixed-excitation linear prediction; speech analysis; speech processing |
| Other ID | http://etd.library.pitt.edu/ETD/available/etd-11202009-232904/, etd-11202009-232904 |
| Date Deposited | 10 Nov 2011 20:05 |
| Last Modified | 15 Nov 2016 13:51 |
| URI | http://d-scholarship.pitt.edu/id/eprint/9742 |