Dognin, Pierre L. (2003) A Bandpass Transform for Speaker Normalization. Doctoral Dissertation, University of Pittsburgh. (Unpublished)
Abstract
One of the major challenges for Automatic Speech Recognition is handling speech variability. Inter-speaker variability is partly due to differences in speakers' anatomy, especially in their Vocal Tract geometry. Dissimilarities in Vocal Tract Length (VTL) are a known source of speech variation. Vocal Tract Length Normalization is a popular Speaker Normalization technique that can be implemented as a transformation of the frequency axis of a spectrum. In this document we introduce a new spectral transformation for Speaker Normalization. We use the Bilinear Transformation to derive a new frequency warping that results from mapping a prototype Band-Pass (BP) filter into a general BP filter. This new transformation, called the Bandpass Transformation (BPT), offers two degrees of freedom, enabling complex warpings of the frequency axis that differ from those of previous work with the Bilinear Transform. We then define a procedure for using the BPT for Speaker Normalization, based on the Nelder-Mead algorithm for estimating the BPT parameters. We present a detailed study of the performance of our new approach on two test sets with gender-dependent and gender-independent systems. Our results demonstrate clear improvements over the standard methods used in VTL Normalization. We also present a score compensation procedure that yields further improvements by refining our BPT parameter estimation.
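For readers unfamiliar with frequency warping through the Bilinear Transformation, the following minimal sketch shows the classic single-parameter all-pass warp used in earlier warping work. It is not the BPT itself: the abstract does not give the two-parameter BPT formula, so none is reproduced here. The function names `bilinear_warp` and `warp_axis` and the parameter name `alpha` are illustrative assumptions, not names taken from the dissertation.

```python
import math


def bilinear_warp(omega, alpha):
    """Warp a normalized frequency omega (radians, 0..pi).

    Uses the first-order all-pass (bilinear) map with a single
    warping factor alpha, |alpha| < 1. This is the standard
    one-parameter warp, NOT the two-parameter BPT.
    """
    if not -1.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between -1 and 1")
    # omega' = omega + 2 * atan(alpha*sin(omega) / (1 - alpha*cos(omega)))
    # The denominator is always positive for |alpha| < 1.
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))


def warp_axis(n_bins, alpha):
    """Warp n_bins equally spaced frequencies covering 0..pi."""
    return [bilinear_warp(math.pi * k / (n_bins - 1), alpha)
            for k in range(n_bins)]
```

With `alpha = 0` the axis is unchanged; a positive `alpha` pushes interior frequencies upward and a negative one pulls them downward, while the endpoints 0 and pi stay fixed. A speaker-dependent warping factor would normally be chosen by an optimizer such as Nelder-Mead against a recognition score, which is outside the scope of this sketch.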
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Date: 3 September 2003
Date Type: Completion
Defense Date: 23 July 2003
Approval Date: 3 September 2003
Submission Date: 6 August 2003
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Institution: University of Pittsburgh
Schools and Programs: Swanson School of Engineering > Electrical Engineering
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Analytical Function; Automatic Speech Recognition; Bilinear Transformation; Feature Transformation; Frequency Warping; Front End Processing; Model Adaptation; Nelder-Mead Optimization; Non-Linear Transformation; Speaker Normalization; Vocal Tract Length Normalization
Other ID: http://etd.library.pitt.edu:80/ETD/available/etd-08062003-112127/, etd-08062003-112127
Date Deposited: 10 Nov 2011 19:57
Last Modified: 15 Nov 2016 13:48
URI: http://d-scholarship.pitt.edu/id/eprint/8926