
From categories to gradience: Auto-coding sociophonetic variation with random forests

Villarreal, Dan and Clark, Lynn and Hay, Jennifer and Watson, Kevin (2020) From categories to gradience: Auto-coding sociophonetic variation with random forests. Laboratory Phonology: Journal of the Association for Laboratory Phonology, 11 (1). p. 6. ISSN 1868-6354



The time-consuming nature of coding sociophonetic variables that are typically treated as categorical represents an impediment to addressing research questions around these variables that require large volumes of data. In this paper, we apply a machine learning method, random forest classification (Breiman, 2001), to automate coding (categorical prediction) of two English sociophonetic variables traditionally treated as categorical, non-prevocalic /r/ and word-medial intervocalic /t/, based on tokens’ acoustic signatures. We found good performance for binary classifiers of non-prevocalic /r/ (Absent versus Present) and medial /t/ (Voiced versus Voiceless), but not for medial /t/ with a six-way coding distinction (largely due to some codes being sparsely represented in the training data). This method also yields rankings of acoustic measures in terms of importance in classification. Beyond any individual measures, this method generates probabilistic predictions of variation (classifier probabilities) that represent a composite of the acoustic cues fed into the model. In a listening experiment, we found that not only did classifier probabilities significantly capture gradience in trained listeners’ perceptions of rhoticity, they better predicted listeners’ perceptions than individual acoustic measures. This method thus represents a new approach to reconciling the categorical and continuous dimensions of sociophonetic variation.
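The three ideas in the abstract (binary classification, importance rankings, and classifier probabilities) can be sketched in a few lines. What follows is a toy illustration only, not the authors' implementation. It assumes synthetic data, two hypothetical acoustic measures (`cue_a` is informative, `cue_b` is noise), and a random forest restricted to depth-one trees, written against the Python standard library. All function and variable names are assumptions made for this sketch.

```python
import random

def make_tokens(n, rng):
    """Synthetic tokens as (features, label); 1 = "Present", 0 = "Absent"."""
    tokens = []
    for _ in range(n):
        label = rng.randint(0, 1)
        cue_a = rng.gauss(-1.0 if label else 1.0, 0.7)  # hypothetical informative cue
        cue_b = rng.gauss(0.0, 1.0)                     # hypothetical noise cue
        tokens.append(([cue_a, cue_b], label))
    return tokens

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def fit_stump(sample, feature):
    """Find the threshold on one feature that minimizes weighted Gini impurity."""
    values = sorted({x[feature] for x, _ in sample})
    best = None
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [y for x, y in sample if x[feature] <= thr]
        right = [y for x, y in sample if x[feature] > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(sample)
        if best is None or score < best[0]:
            best = (score, thr, sum(left) / len(left), sum(right) / len(right))
    return best

def fit_forest(data, n_trees, rng):
    """Each tree sees a bootstrap sample and one randomly chosen feature."""
    n_features = len(data[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(data) for _ in data]  # sample with replacement
        feature = rng.randrange(n_features)
        stump = fit_stump(sample, feature)
        if stump is None:  # no split possible (all values identical)
            continue
        score, thr, p_left, p_right = stump
        gain = gini([y for _, y in sample]) - score  # impurity decrease
        forest.append((feature, thr, p_left, p_right, gain))
    return forest

def predict_proba(forest, x):
    """Classifier probability of "Present": mean leaf proportion across trees."""
    leaf = [pl if x[f] <= thr else pr for f, thr, pl, pr, _ in forest]
    return sum(leaf) / len(leaf)

def importances(forest, n_features):
    """Share of total impurity decrease attributable to each feature."""
    totals = [0.0] * n_features
    for f, _, _, _, gain in forest:
        totals[f] += gain
    return [t / sum(totals) for t in totals]

rng = random.Random(0)
train, test = make_tokens(200, rng), make_tokens(100, rng)
forest = fit_forest(train, n_trees=50, rng=rng)
probs = [predict_proba(forest, x) for x, _ in test]
accuracy = sum((p > 0.5) == bool(y) for p, (_, y) in zip(probs, test)) / len(test)
print("held-out accuracy:", accuracy)
print("importance of [cue_a, cue_b]:", importances(forest, 2))
```

In this sketch, the value returned by `predict_proba` plays the role of a classifier probability: a single number between 0 and 1 that combines the cues the forest was trained on, rather than a hard category label.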




Item Type: Article
Status: Published
Creators (Name | Email | Pitt Username | ORCID):
Villarreal, Dan | d.vill@pitt.edu | d.vill |
Clark, Lynn | lclarke@pitt.edu | lclarke |
Hay, Jennifer | | |
Watson, Kevin | | |
Date: 10 June 2020
Date Type: Publication
Journal or Publication Title: Laboratory Phonology: Journal of the Association for Laboratory Phonology
Volume: 11
Number: 1
Publisher: Ubiquity Press
Page Range: p. 6
DOI or Unique Handle: 10.5334/labphon.216
Schools and Programs: Dietrich School of Arts and Sciences > Linguistics
Refereed: Yes
Uncontrolled Keywords: sociophonetic variation, machine learning, rhoticity, New Zealand English
ISSN: 1868-6354
Official URL:
Funders: Royal Society of New Zealand Marsden Research
Article Type: Research Article
Date Deposited: 27 Apr 2021 15:03
Last Modified: 27 Apr 2021 15:03

