
CLASSIFICATION OF VISEMES USING VISUAL CUES

Alothmany, Nazeeh (2009) CLASSIFICATION OF VISEMES USING VISUAL CUES. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

Primary Text: PDF (1MB)

Abstract

Studies have shown that visual features extracted from the lips of a speaker (visemes) can be used to automatically classify the visual representation of phonemes. In this study, different visual features were extracted from audio-visual recordings of a set of phonemes and used to define Linear Discriminant Analysis (LDA) functions for classifying the phonemes. Audio-visual recordings of 12 Vowel-Consonant-Vowel (VCV) sounds were obtained from 18 native speakers of American English, using the consonants /b,v,w,ð,d,z/ and the vowels /ɑ,i/. The visual features used in this study were related to lip height, lip width, motion in the upper lip, and the rate at which the lips move while producing the VCV sequences. Features extracted from half of the speakers were used to design the classifiers, and features extracted from the other half were used to test them.

When each VCV sound was treated as an independent class, resulting in 12 classes, the percentage of correct recognition was 55.3% on the training set and 43.1% on the testing set. This percentage increased as classes were merged based on the level of confusion appearing between them in the results. When the same consonants with different vowels were treated as one class, resulting in 6 classes, the percentage of correct classification was 65.2% on the training set and 61.6% on the testing set. This is consistent with psycho-visual experiments in which subjects were unable to distinguish between visemes associated with VCV words that share the same consonant but have different vowels. When the VCV sounds were grouped into 3 classes, the percentage of correct classification was 84.4% on the training set and 81.1% on the testing set.

In the second part of the study, linear discriminant functions were developed for each speaker, resulting in 18 different sets of LDA functions. For every speaker, five VCV utterances were used to design the LDA functions, and 3 different VCV utterances were used to test these functions. On the training data, the range of correct classification across the 18 speakers was 90-100%, with an average of 96.2%; on the testing data, the range was 50-86%, with an average of 68%.

A step-wise linear discriminant analysis evaluated the contribution of the individual features to the discrimination problem. The analysis indicated that classifiers using only the top 7 features had a performance drop of only 2-5%. The top 7 features were related to the shape of the mouth and the rate of lip motion while the consonant in the VCV sequence was being produced. The results of this work show that visual features extracted from the lips can separate the visual representations of phonemes into distinct classes.
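The train/test design described in the abstract can be sketched in a few lines of NumPy: fit a Fisher linear discriminant on lip-feature vectors from one half of the data and score it on the held-out half. Everything here is illustrative — the feature values, class structure, and split are synthetic placeholders, not the dissertation's measured lip data, and the dissertation used more than two classes.

```python
# Sketch of two-class Fisher LDA on synthetic "lip feature" vectors.
# All data here is invented for illustration; the actual study measured
# lip height, lip width, upper-lip motion, and lip-motion rate from video.
import numpy as np

rng = np.random.default_rng(0)

def fit_lda(X, y):
    """Fit a two-class Fisher discriminant: w = Sw^-1 (m1 - m0)."""
    m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    Sw = np.cov(X[y == 0].T) + np.cov(X[y == 1].T)   # within-class scatter
    w = np.linalg.solve(Sw, m1 - m0)                 # discriminant direction
    threshold = w @ (m0 + m1) / 2.0                  # midpoint of projected means
    return w, threshold

def predict(X, w, threshold):
    """Assign class 1 when the projection exceeds the midpoint threshold."""
    return (X @ w > threshold).astype(int)

# Two synthetic viseme classes described by 4 features
# (stand-ins for lip height, lip width, motion, rate).
X0 = rng.normal(loc=[2.0, 5.0, 0.5, 1.0], scale=0.3, size=(40, 4))
X1 = rng.normal(loc=[3.0, 3.5, 1.5, 2.0], scale=0.3, size=(40, 4))
X = np.vstack([X0, X1])
y = np.array([0] * 40 + [1] * 40)

# Train on the first half of each class, test on the second half,
# loosely mirroring the half-of-speakers split used in the study.
train = np.r_[0:20, 40:60]
test = np.r_[20:40, 60:80]
w, t = fit_lda(X[train], y[train])
acc = (predict(X[test], w, t) == y[test]).mean()
print(f"test accuracy: {acc:.2f}")
```

The design choice mirrors the abstract: the discriminant is designed on one half of the data and evaluated only on unseen data, which is why the study's testing-set percentages are consistently lower than its training-set percentages.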



Details

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors:
Alothmany, Nazeeh (nothmany@gmail.com)
ETD Committee:
Committee Chair: Boston, Robert (boston@ee.pitt.edu, Pitt username: BBN)
Committee Member: El-Jaroudi, Amro (amro@ee.pitt.edu, Pitt username: AMRO)
Committee Member: Li, Ching-Chung (ccl@engr.pitt.edu, Pitt username: CCL)
Committee Member: Durrant, John D. (durrant@pitt.edu, Pitt username: DURRANT)
Committee Member: Chaparro, Luis F. (chaparro@ee.pitt.edu, Pitt username: LFCH)
Committee Member: Shaiman, Susan (shaiman@pitt.edu, Pitt username: SHAIMAN)
Date: 25 September 2009
Date Type: Completion
Defense Date: 17 April 2009
Approval Date: 25 September 2009
Submission Date: 26 May 2009
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Institution: University of Pittsburgh
Schools and Programs: Swanson School of Engineering > Electrical Engineering
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: AUTOMATIC LIP-READING; VISEMES; AUDIO-VISUAL CLASSIFICATION; CLASSIFICATION OF VISEMES
Other ID: http://etd.library.pitt.edu/ETD/available/etd-05262009-085949/, etd-05262009-085949
Date Deposited: 10 Nov 2011 19:45
Last Modified: 15 Nov 2016 13:43
URI: http://d-scholarship.pitt.edu/id/eprint/7955
