
Speech Decomposition and Enhancement

Yoo, Sungyub (2005) Speech Decomposition and Enhancement. Doctoral Dissertation, University of Pittsburgh. (Unpublished)



The goal of this study is to investigate the roles of steady-state speech sounds and the transitions between these sounds in the intelligibility of speech. The motivation for this approach is that the auditory system may be particularly sensitive to time-varying frequency edges, which in speech are produced primarily by transitions between vowels and consonants and within vowels. The possibility that selectively amplifying these edges may enhance speech intelligibility is examined. Computer algorithms were developed to decompose speech into two components. The first, defined as the tonal component, was intended to predominantly include formant activity. The second, defined as the non-tonal component, was intended to predominantly include transitions between and within formants.

The decomposition uses a set of time-varying filters whose center frequencies and bandwidths are controlled to identify the strongest formant components in speech. Each center frequency and bandwidth is estimated from the FM and AM information of the corresponding formant component. The tonal component is the sum of the filter outputs. The non-tonal component is defined as the difference between the original speech signal and the tonal component.

The relative energy and intelligibility of the tonal and non-tonal components were compared to those of the original speech. Psychoacoustic growth functions were used to assess intelligibility. Most of the speech energy was in the tonal component, but this component had significantly lower maximum word recognition than either the original speech or the non-tonal component. The non-tonal component averaged 2% of the original speech energy, but its maximum word recognition was almost equal to that of the original speech.

The non-tonal component was amplified and recombined with the original speech to generate enhanced speech. The energy of the enhanced speech was adjusted to equal that of the original speech, and the intelligibility of the enhanced speech was compared to that of the original speech in background noise. The enhanced speech showed higher recognition scores at lower SNRs, and the differences were significant. The original and enhanced speech showed similar recognition scores at higher SNRs. These results suggest that amplification of transient information can enhance speech in noise and that this enhancement method is more effective in severe noise conditions.
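The subtraction, recombination, and energy-matching steps described in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: the tonal component is taken as a given input (the design of the time-varying formant filters is not reproduced here), and the function names and the `gain` parameter are hypothetical, since the abstract does not state the amplification factor.

```python
import math


def energy(signal):
    """Return the sum of squared samples of a signal."""
    return sum(s * s for s in signal)


def enhance(original, tonal, gain):
    """Amplify the non-tonal residual and recombine it with the original.

    `tonal` stands in for the summed output of the formant-tracking
    filters; `gain` is a hypothetical amplification factor.
    Returns (non_tonal, enhanced).
    """
    if len(original) != len(tonal):
        raise ValueError("signals must have the same length")
    # Non-tonal component: original speech minus the tonal component.
    non_tonal = [x - t for x, t in zip(original, tonal)]
    # Recombine the amplified non-tonal component with the original.
    mixed = [x + gain * n for x, n in zip(original, non_tonal)]
    # Rescale so the enhanced signal has the same energy as the original.
    e_mixed = energy(mixed)
    if e_mixed == 0.0:
        return non_tonal, mixed
    scale = math.sqrt(energy(original) / e_mixed)
    return non_tonal, [scale * m for m in mixed]


# Toy input (an assumption for illustration): a sinusoid plays the role
# of the tonal part, and a single click plays the role of a transient.
tonal = [math.sin(0.1 * n) for n in range(200)]
original = list(tonal)
original[50] += 0.5
non_tonal, enhanced = enhance(original, tonal, gain=4.0)
```

Because the enhanced signal is rescaled to the original energy, boosting the non-tonal component necessarily lowers the relative level of the tonal component rather than making the signal louder overall.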




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Yoo, Sungyub (Email: sungyoo@pitt.edu; Pitt Username: SUNGYOO)
ETD Committee:
Committee Chair: Boston, J. Robert (Email: boston@engr.pitt.edu; Pitt Username: BBN)
Committee Member: El-Jaroudi, Amro A. (Email: amro@ee.pitt.edu; Pitt Username: AMRO)
Committee Member: Li, Ching-Chung (Email: ccl@engr.pitt.edu; Pitt Username: CCL)
Committee Member: Lee,
Committee CoChair: Durrant, John D. (Email: durrant@csd.pitt.edu; Pitt Username: DURRANT)
Date: 14 October 2005
Date Type: Completion
Defense Date: 29 June 2005
Approval Date: 14 October 2005
Submission Date: 1 July 2005
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Institution: University of Pittsburgh
Schools and Programs: Swanson School of Engineering > Electrical Engineering
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: formant; non-tonal; speech; speech decomposition; speech enhancement; tonal; transition
Other ID: etd-07012005-135056
Date Deposited: 10 Nov 2011 19:49
Last Modified: 15 Nov 2016 13:45

