Speech Enhancement using Transient Speech Components

Tantibundhit, Charturong (2006) Speech Enhancement using Transient Speech Components. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

PDF (Speech Enhancement Using Transient Speech Components)
Primary Text

Download (6MB)
Audio (WAV), 91 files
Supplemental Material

Download (8kB to 17kB each)

Abstract

We believe that the auditory system, like the visual system, may be sensitive to abrupt stimulus changes and that the transient component in speech may be particularly critical to speech perception. If this component can be identified and selectively amplified, improved speech perception in background noise may be possible.

This project describes a method to decompose speech into tonal, transient, and residual components. The modified discrete cosine transform (MDCT) and the wavelet transform are the transforms used to capture tonal and transient features in speech. The tonal and transient components were identified by using a small number of MDCT and wavelet coefficients, respectively. In previous studies, all of the MDCT and all of the wavelet coefficients were assumed to be independent, and the significant MDCT and wavelet coefficients were identified by thresholds. However, an appropriate threshold is not known, and the MDCT and wavelet coefficients show statistical dependencies, described by the clustering and persistence properties.

In this work, the hidden Markov chain (HMC) model and the hidden Markov tree (HMT) model were applied to describe the clustering and persistence properties between the MDCT coefficients and between the wavelet coefficients. The MDCT coefficients in each frequency index were modeled as a two-state mixture of two univariate Gaussian distributions. The wavelet coefficients in each scale of each tree were modeled as a two-state mixture of two univariate Gaussian distributions. The initial parameters of the Gaussian mixtures were estimated by the greedy EM algorithm. By using the Viterbi and MAP algorithms to find the optimal state distribution, the significant MDCT and wavelet coefficients were determined without relying on a threshold.

The transient component isolated by our method was selectively amplified and recombined with the original speech to generate enhanced speech, with energy adjusted to equal the energy of the original speech. The intelligibility of the original and enhanced speech was evaluated in eleven human subjects using the modified rhyme protocol. Word recognition rates show that the enhanced speech can improve speech intelligibility at low SNR levels (8% at -15 dB, 14% at -20 dB, and 18% at -25 dB).
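
As an illustration of how a two-state hidden Markov chain can mark coefficients as significant without a threshold, the following is a minimal sketch and not the dissertation's implementation. It assumes zero-mean Gaussian states (state 0 with small variance, state 1 with large variance) and hand-picked variances, transition probabilities, and initial probabilities; in the work described above these parameters were estimated rather than fixed by hand. The function names `gaussian_logpdf` and `viterbi_two_state` are hypothetical.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log density of a univariate Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def viterbi_two_state(coeffs, variances, trans, initial):
    """Most likely state path for a two-state chain over a coefficient sequence.

    Assumption (not from the source): both states are zero-mean Gaussians,
    state 0 = "small" coefficient, state 1 = "large" (significant) coefficient.
    trans[p][s] is the probability of moving from state p to state s.
    """
    # Log score of the best path ending in each state at the first coefficient.
    log_delta = [
        math.log(initial[s]) + gaussian_logpdf(coeffs[0], 0.0, variances[s])
        for s in range(2)
    ]
    backptr = []
    for x in coeffs[1:]:
        new_delta, pointers = [], []
        for s in range(2):
            scores = [log_delta[p] + math.log(trans[p][s]) for p in range(2)]
            best_prev = 0 if scores[0] >= scores[1] else 1
            pointers.append(best_prev)
            new_delta.append(scores[best_prev] + gaussian_logpdf(x, 0.0, variances[s]))
        log_delta = new_delta
        backptr.append(pointers)

    # Trace back from the best final state.
    state = 0 if log_delta[0] >= log_delta[1] else 1
    path = [state]
    for pointers in reversed(backptr):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Illustrative values only: a burst of large coefficients between small ones.
coeffs = [0.1, -0.2, 5.0, -4.0, 6.0, 0.05, 0.1]
path = viterbi_two_state(
    coeffs,
    variances=(0.04, 16.0),
    trans=[[0.9, 0.1], [0.1, 0.9]],
    initial=(0.5, 0.5),
)
print(path)
```

The high self-transition probabilities stand in for the clustering property: a coefficient is more likely to be significant when its neighbor is. The sketch covers only the chain case; the tree-structured (HMT) model over wavelet scales is not shown.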

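The recombination step described in the abstract (amplify the transient, add it to the original, then match the original energy) can be sketched as below. This is a hedged illustration under stated assumptions: the signals are plain lists of samples of equal length, and the amplification factor `gain` is a hypothetical parameter, since the abstract does not state the value used. The function names `energy` and `enhance` are also illustrative.

```python
import math


def energy(signal):
    """Sum of squared samples."""
    return sum(s * s for s in signal)


def enhance(original, transient, gain):
    """Add an amplified transient to the original, then rescale the result
    so its energy equals the energy of the original signal.

    `gain` is an assumed, illustrative amplification factor.
    """
    if len(original) != len(transient):
        raise ValueError("original and transient must have the same length")
    mixed = [o + gain * t for o, t in zip(original, transient)]
    mixed_energy = energy(mixed)
    if mixed_energy == 0.0:
        return mixed  # nothing to rescale
    scale = math.sqrt(energy(original) / mixed_energy)
    return [scale * m for m in mixed]


# Illustrative samples only.
original = [1.0, 0.0, -1.0, 0.5]
transient = [0.0, 0.0, -0.5, 0.5]
enhanced = enhance(original, transient, gain=4.0)
print(energy(original), energy(enhanced))
```

Because the output is rescaled to the original energy, any change in intelligibility comes from the relative emphasis of the transient rather than from an overall increase in level.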

Details

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors:
Creators | Email | Pitt Username | ORCID
Tantibundhit, Charturong | charturt@yahoo.com | |
ETD Committee:
Title | Member | Email Address | Pitt Username | ORCID
Committee Chair | Boston, J. Robert | boston@ee.pitt.edu | BBN |
Committee Member | El-jaroudi, Amro A. | amro@ee.pitt.edu | AMRO |
Committee Member | Li, Ching-Chung | ccl@engr.pitt.edu | CCL |
Committee Member | Lee, Heung-no | hnlee@ee.pitt.edu | |
Committee Member | Durrant, John D. | durrant@csd.pitt.edu | DURRANT |
Date: 2 June 2006
Date Type: Completion
Defense Date: 27 January 2006
Approval Date: 2 June 2006
Submission Date: 10 February 2006
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Institution: University of Pittsburgh
Schools and Programs: Swanson School of Engineering > Electrical Engineering
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: hidden Markov model; speech enhancement; statistical dependencies; the wavelet transform; the modified discrete cosine transform (MDCT); transient speech
Other ID: http://etd.library.pitt.edu/ETD/available/etd-02102006-142400/, etd-02102006-142400
Date Deposited: 10 Nov 2011 19:31
Last Modified: 15 Nov 2016 13:36
URI: http://d-scholarship.pitt.edu/id/eprint/6343
