
IMPROVING THE AUTOMATIC RECOGNITION OF DISTORTED SPEECH

Beauford, Jayne Angela (2010) IMPROVING THE AUTOMATIC RECOGNITION OF DISTORTED SPEECH. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

Abstract

Automatic speech recognition has a wide variety of uses in this technological age, yet speech distortions present many difficulties for accurate recognition. The research presented provides solutions that counter the detrimental effects that some distortions have on the accuracy of automatic speech recognition. Two types of speech distortion are addressed independently: distortions due to speech coding and distortions due to additive noise. Compensations for both types of distortion resulted in decreased recognition error.

Distortions due to the speech coding process are countered through recognition of the speech directly from the bitstream, thus eliminating the need for reconstruction of the speech signal and eliminating the distortion caused by it. There is a relative difference of 6.7% between the recognition error rate of uncoded speech and that of speech reconstructed from MELP-encoded parameters. The relative difference between the recognition error rate for uncoded speech and that of encoded speech recognized directly from the MELP bitstream is 3.5%. This 3.2 percentage point difference is equivalent to the accurate recognition of an additional 334 words from the 12,863 words spoken.

Distortions due to noise are offset through appropriate modification of an existing noise reduction technique called minimum mean-square error log spectral amplitude enhancement. A relative difference of 28% exists between the recognition error rate of clean speech and that of speech with additive noise. Applying a speech enhancement front-end reduced this difference to 22.2%. This 5.8 percentage point difference is equivalent to the accurate recognition of an additional 540 words from the 12,863 words spoken.
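The percentage point figures quoted in the abstract follow from subtracting the two relative differences in each experiment. The short Python sketch below reproduces that subtraction from the abstract's own numbers. The `relative_difference` helper is an assumption: the abstract does not define the term, so the helper uses a common definition (increase in error rate relative to the baseline) purely for illustration, and both function names are hypothetical.

```python
def percentage_point_gap(before: float, after: float) -> float:
    """Difference between two percentages, in percentage points (rounded to 0.1)."""
    return round(before - after, 1)


def relative_difference(baseline_error: float, distorted_error: float) -> float:
    """Assumed definition, not taken from the dissertation:
    increase in error rate relative to the baseline, in percent."""
    return 100.0 * (distorted_error - baseline_error) / baseline_error


# Relative differences quoted in the abstract, in percent.
coding_gap = percentage_point_gap(6.7, 3.5)    # reconstructed speech vs. direct bitstream recognition
noise_gap = percentage_point_gap(28.0, 22.2)   # noisy speech vs. enhanced speech

print(coding_gap, noise_gap)
```

The underlying error rates are not given in the abstract, so `relative_difference` is shown only to make the assumed meaning of the term explicit; it is not applied to the dissertation's data.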


Details

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors:
Creators | Email | Pitt Username | ORCID
Beauford, Jayne Angela | axbst2@pitt.edu | AXBST2 |
ETD Committee:
Title | Member | Email Address | Pitt Username | ORCID
Committee Chair | El-Jaroudi, Amro | amro@pitt.edu | AMRO |
Committee Member | Lennard, Christopher | lennard@pitt.edu | LENNARD |
Committee Member | Chaparro, Luis | lfch@pitt.edu | LFCH |
Committee Member | Loughlin, Patrick | loughlin@pitt.edu | LOUGHLIN |
Committee Member | Boston, Robert | bbn@pitt.edu | BBN |
Committee Member | Mao, Zhi-Hong | zhm4@pitt.edu | ZHM4 |
Date: 26 January 2010
Date Type: Completion
Defense Date: 11 September 2009
Approval Date: 26 January 2010
Submission Date: 20 November 2009
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Institution: University of Pittsburgh
Schools and Programs: Swanson School of Engineering > Electrical Engineering
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: linear prediction; mixed-excitation linear prediction; speech analysis; speech processing
Other ID: http://etd.library.pitt.edu/ETD/available/etd-11202009-232904/, etd-11202009-232904
Date Deposited: 10 Nov 2011 20:05
Last Modified: 15 Nov 2016 13:51
URI: http://d-scholarship.pitt.edu/id/eprint/9742
