A Bandpass Transform for Speaker Normalization

Dognin, Pierre L. (2003) A Bandpass Transform for Speaker Normalization. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

Abstract

One of the major challenges for Automatic Speech Recognition is handling speech variability. Inter-speaker variability is partly due to differences in speakers' anatomy, especially in their Vocal Tract geometry. Differences in Vocal Tract Length (VTL) are a known source of speech variation. Vocal Tract Length Normalization is a popular Speaker Normalization technique that can be implemented as a transformation of the frequency axis of a spectrum. This document introduces a new spectral transformation for Speaker Normalization. We use the Bilinear Transformation to derive a new frequency warping from the mapping of a prototype Band-Pass (BP) filter into a general BP filter. This new transformation, called the Bandpass Transformation (BPT), offers two degrees of freedom, enabling complex warpings of the frequency axis that differ from those in previous work with the Bilinear Transform. We then define a procedure for using BPT in Speaker Normalization, based on the Nelder-Mead algorithm for estimating the BPT parameters. We present a detailed study of the performance of our new approach on two test sets with gender-dependent and gender-independent systems. Our results demonstrate clear improvements over standard VTL Normalization methods. We also present a score compensation procedure that refines the BPT parameter estimates and further improves our results.
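For orientation, the sketch below shows the one-parameter bilinear (first-order allpass) frequency warping commonly used in earlier VTL Normalization work, which the abstract contrasts with BPT, together with a plain textbook Nelder-Mead search over two warping parameters. It is an illustration under stated assumptions, not the thesis's method: `toy_warp` is a hypothetical two-parameter family (a bilinear warp followed by a linear scale `beta`), not the Bandpass Transformation, and the least-squares `objective` is a toy stand-in for the recognizer-based criterion a real system would optimize (for example, a likelihood score on a speaker's data). All function names here are hypothetical.

```python
import math


def bilinear_warp(omega, alpha):
    """One-parameter bilinear (first-order allpass) warp of omega in [0, pi].

    Substituting z^-1 -> (z^-1 - alpha) / (1 - alpha * z^-1) maps omega to
    omega + 2 * atan(alpha * sin(omega) / (1 - alpha * cos(omega))).
    Requires |alpha| < 1; the endpoints 0 and pi are left unchanged.
    """
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))


def toy_warp(omega, params):
    # HYPOTHETICAL two-parameter family, NOT the thesis's BPT:
    # a bilinear warp followed by a linear frequency scale beta.
    alpha, beta = params
    return beta * bilinear_warp(omega, alpha)


def nelder_mead(f, x0, step=0.05, max_iter=1000, xtol=1e-10):
    """Textbook Nelder-Mead minimization (reflect 1, expand 2, contract/shrink 0.5)."""
    n = len(x0)
    # Initial simplex: x0 plus one vertex offset along each coordinate axis.
    simplex = [list(x0)]
    for i in range(n):
        vertex = list(x0)
        vertex[i] += step
        simplex.append(vertex)
    values = [f(v) for v in simplex]

    for _ in range(max_iter):
        # Sort vertices from best (lowest f) to worst.
        order = sorted(range(n + 1), key=lambda k: values[k])
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]
        best = simplex[0]
        # Stop once every vertex lies within xtol of the best one.
        if max(abs(a - b) for v in simplex[1:] for a, b in zip(v, best)) < xtol:
            break

        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]

        def along(t):
            # Point centroid + t * (worst - centroid); t = -1 is the reflection.
            return [c + t * (w - c) for c, w in zip(centroid, worst)]

        xr = along(-1.0)
        fr = f(xr)
        if fr < values[0]:
            xe = along(-2.0)  # expansion
            fe = f(xe)
            simplex[-1], values[-1] = (xe, fe) if fe < fr else (xr, fr)
        elif fr < values[-2]:
            simplex[-1], values[-1] = xr, fr
        else:
            # Outside contraction if the reflection beat the worst vertex, else inside.
            xc = along(-0.5 if fr < values[-1] else 0.5)
            fc = f(xc)
            if fc < min(fr, values[-1]):
                simplex[-1], values[-1] = xc, fc
            else:
                # Shrink every vertex halfway toward the best one.
                simplex = [best] + [[b + 0.5 * (a - b) for a, b in zip(v, best)]
                                    for v in simplex[1:]]
                values = [values[0]] + [f(v) for v in simplex[1:]]

    k = min(range(n + 1), key=lambda i: values[i])
    return simplex[k], values[k]


# Toy problem: recover (alpha, beta) from a synthetic warped frequency grid.
omegas = [math.pi * k / 63 for k in range(64)]
target = [toy_warp(w, (0.1, 0.95)) for w in omegas]


def objective(params):
    # Stand-in for a recognizer score: squared error to the synthetic target.
    return sum((toy_warp(w, params) - t) ** 2 for w, t in zip(omegas, target))


best, err = nelder_mead(objective, [0.0, 1.0])
print(best, err)
```

The search starts from the identity warp (alpha = 0, beta = 1) and should move toward the parameters used to build the synthetic target. Nelder-Mead needs only function values, not gradients, which is one reason a derivative-free search is convenient when the criterion is a score returned by a recognizer.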

Details

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors:
Dognin, Pierre L. (Email: dognin@siglab.ee.pitt.edu)
ETD Committee:
Committee Chair: El-Jaroudi, Amro A (Email: amro@pitt.edu; Pitt Username: AMRO)
Committee Member: Li, Ching-Chung (Email: ccl@ee.pitt.edu; Pitt Username: CCL)
Committee Member: Boston, J. Robert (Email: boston@ee.pitt.edu; Pitt Username: BBN)
Committee Member: Chaparro, Luis F (Email: chaparro@ee.pitt.edu; Pitt Username: LFCH)
Committee Member: Anitescu, Mihai (Email: anitescu@mcs.anl.gov)
Date: 3 September 2003
Date Type: Completion
Defense Date: 23 July 2003
Approval Date: 3 September 2003
Submission Date: 6 August 2003
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Institution: University of Pittsburgh
Schools and Programs: Swanson School of Engineering > Electrical Engineering
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Analytical Function; Automatic Speech Recognition; Bilinear Transformation; Feature Transformation; Frequency Warping; Front End Processing; Model Adaptation; Nelder-Mead Optimization; Non-Linear Transformation; Speaker Normalization; Vocal Tract Length Normalization
Other ID: http://etd.library.pitt.edu:80/ETD/available/etd-08062003-112127/, etd-08062003-112127
Date Deposited: 10 Nov 2011 19:57
Last Modified: 15 Nov 2016 13:48
URI: http://d-scholarship.pitt.edu/id/eprint/8926