
Evaluating the efficacy of Prosody-lab Aligner for a study of vowel variation in Cantonese

Peters, Andrew and Tse, Holman (2016) Evaluating the efficacy of Prosody-lab Aligner for a study of vowel variation in Cantonese. In: Workshop on Innovations in Cantonese Linguistics (WICL-3), 12 March 2016 - 13 March 2016, Columbus, United States of America.

Available under License: See the attached license file.



In this talk, we discuss the effectiveness of Prosody-lab Aligner (Gorman, Howell, & Wagner, 2011) as a tool for studying vowel variation and change in Cantonese. Automated forced-alignment programs have recently been introduced as computational tools that speed up the creation of time-aligned transcripts of speech data. However, the most widely used program, FAVE (Rosenfelder, Fruehwald, Evanini, & Yuan, 2011), is designed to work only on English. We therefore use Prosody-lab Aligner (Gorman et al., 2011) as an alternative, because it can train models for aligning any language. The speech samples used in this project come from sociolinguistic interviews collected as part of the Heritage Language Variation and Change in Toronto (HLVC) project (Nagy, 2011). To evaluate the efficacy of this methodology for a larger project on intergenerational change in Heritage Cantonese vowels, we investigate two questions: 1) Does Prosody-lab Aligner produce transcript alignments accurate enough to permit automated measurement of vowel data? 2) What kind of training data for Prosody-lab Aligner models produces results that require the fewest manual adjustments? We address these questions by running Prosody-lab Aligner on 10 speakers: four GEN 1 speakers (born and raised in Hong Kong) and six GEN 2 speakers (raised in Toronto). For each speaker, 50% of their transcript data was set aside for model training, and the aligner was run on each speaker with three different models: one trained on data from that speaker alone, one trained on data from all speakers in the same generation, and one trained on data from all speakers.
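The three training conditions can be sketched as a function that assembles training pools from per-speaker transcripts. This is a minimal illustration under stated assumptions, not the authors' pipeline: the speaker IDs (`G1_1`, `G2_1`, ...), the utterance-level random split, and the idea that the held-out half is what gets aligned and evaluated are all hypothetical, and Prosody-lab Aligner itself is not called here.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical speaker inventory mirroring the design: 4 GEN 1 and 6 GEN 2 speakers.
GENERATION: Dict[str, str] = {
    **{f"G1_{i}": "GEN1" for i in range(1, 5)},
    **{f"G2_{i}": "GEN2" for i in range(1, 7)},
}


def split_half(utterances: List[str], seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle one speaker's utterances and return (training half, held-out half).

    Assumption: the split is made at the utterance level with a fixed seed.
    """
    items = list(utterances)
    random.Random(seed).shuffle(items)
    mid = len(items) // 2
    return items[:mid], items[mid:]


def training_pools(transcripts: Dict[str, List[str]], target: str,
                   seed: int = 0) -> Dict[str, List[str]]:
    """Build the three training pools used when aligning speaker `target`."""
    train_halves = {spk: split_half(utts, seed)[0] for spk, utts in transcripts.items()}
    gen = GENERATION[target]
    return {
        # Condition 1: the target speaker's own training half only.
        "individual": list(train_halves[target]),
        # Condition 2: training halves of every speaker in the same generation.
        "generation": [u for spk, half in train_halves.items()
                       if GENERATION[spk] == gen for u in half],
        # Condition 3: training halves of all ten speakers.
        "all": [u for half in train_halves.values() for u in half],
    }
```

With 10 utterances per speaker, the individual pool for a GEN 1 speaker holds 5 utterances, the generation pool 20, and the all-speaker pool 50; each pool is a superset of the previous one, which is the trade-off between speaker specificity and training volume that the study weighs.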
The three training conditions were compared quantitatively by measuring the difference between the automatically generated boundaries of 468 monophthong vowel tokens and “gold-standard” manually aligned boundaries for the same vowels. On these data, the root-mean-square deviation (RMSD) was calculated for the time-aligned results of each model type (Chen, Liu, Harper, Maia, & McRoy, 2004). We also calculated, for each model type, the percentage of tokens in which the midpoint of the automatically aligned vowel segment fell within the manually aligned target vowel interval. Our results show that models trained on individual speakers alone produced the least deviation from the manually aligned vowel targets, while the model trained on data from all speakers produced the most. However, because meeting the minimum amount of training data would mean losing up to 50% of the analyzable vowel tokens if that data were taken from a single speaker’s interview alone, an individual-based training model is rejected as impractical. A model trained on data from the speaker’s generational cohort is accepted as the best compromise: among the options that do not sacrifice large amounts of data to model training, it produces results requiring the least manual adjustment after alignment.
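The two evaluation measures can be sketched in Python as follows. The abstract does not specify how boundary deviations were pooled, so this sketch assumes that onset and offset errors (in seconds) are pooled into a single RMSD and that the "center" of a segment is its arithmetic midpoint; the function names and the toy intervals are hypothetical, not data from the study.

```python
import math
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) of a vowel segment, in seconds


def boundary_rmsd(auto: List[Interval], manual: List[Interval]) -> float:
    """RMSD between automatic and manual boundaries, pooling onsets and offsets."""
    if len(auto) != len(manual) or not auto:
        raise ValueError("need equal-length, non-empty token lists")
    squared = []
    for (a_start, a_end), (m_start, m_end) in zip(auto, manual):
        squared.append((a_start - m_start) ** 2)  # onset error
        squared.append((a_end - m_end) ** 2)      # offset error
    return math.sqrt(sum(squared) / len(squared))


def midpoint_hit_rate(auto: List[Interval], manual: List[Interval]) -> float:
    """Percentage of tokens whose automatic midpoint lies inside the manual interval."""
    if len(auto) != len(manual) or not auto:
        raise ValueError("need equal-length, non-empty token lists")
    hits = sum(
        1
        for (a_start, a_end), (m_start, m_end) in zip(auto, manual)
        if m_start <= (a_start + a_end) / 2 <= m_end
    )
    return 100.0 * hits / len(manual)


if __name__ == "__main__":
    # Two toy tokens: the first is slightly early at onset, the second is shifted late.
    auto = [(0.10, 0.20), (0.50, 0.62)]
    manual = [(0.12, 0.20), (0.40, 0.50)]
    print(round(boundary_rmsd(auto, manual), 4))  # 0.0787
    print(midpoint_hit_rate(auto, manual))        # 50.0
```

Running both functions once per training condition on the same token set gives a lower-is-better RMSD and a higher-is-better hit rate, which is the kind of side-by-side comparison the abstract reports across the individual, generation, and all-speaker models.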




Item Type: Conference or Workshop Item (Paper)
Creators:
Peters, Andrew
Tse, Holman (Email: hbt3@pitt.edu; Pitt Username: HBT3; ORCID: 0000-0002-2398-5776)
Date: 12 March 2016
Date Type: Publication
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Event Title: Workshop on Innovations in Cantonese Linguistics (WICL-3)
Event Dates: 12 March 2016 - 13 March 2016
Event Type: Conference
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Linguistics
Refereed: Yes
Additional Information: Full location: The Ohio State University, Columbus, OH
Date Deposited: 16 Mar 2016 18:45
Last Modified: 18 Jun 2019 05:55

