CLASSIFICATION OF VISEMES USING VISUAL CUES

Alothmany, Nazeeh (2009) CLASSIFICATION OF VISEMES USING VISUAL CUES. Doctoral Dissertation, University of Pittsburgh. (Unpublished)


Abstract

Studies have shown that visual features extracted from a speaker's lips can be used to automatically classify visemes, the visual representations of phonemes. In this study, different visual features were extracted from audio-visual recordings of a set of phonemes and used to define Linear Discriminant Analysis (LDA) functions to classify the phonemes. Audio-visual recordings of 12 Vowel-Consonant-Vowel (VCV) sounds were obtained from 18 native speakers of American English, using the consonants /b,v,w,ð,d,z/ and the vowels /ɑ,i/. The visual features used in this study were related to lip height, lip width, motion of the upper lip, and the rate at which the lips move while producing the VCV sequences. Features extracted from half of the speakers were used to design the classifiers, and features extracted from the other half were used to test them.

When each VCV sound was treated as an independent class, resulting in 12 classes, the percentage of correct recognition was 55.3% in the training set and 43.1% in the testing set. This percentage increased as classes were merged based on the level of confusion appearing between them in the results. When the same consonants with different vowels were treated as one class, resulting in 6 classes, the percentage of correct classification was 65.2% in the training set and 61.6% in the testing set. This is consistent with psycho-visual experiments in which subjects were unable to distinguish between visemes associated with VCV words with the same consonant but different vowels. When the VCV sounds were grouped into 3 classes, the percentage of correct classification was 84.4% in the training set and 81.1% in the testing set.

In the second part of the study, linear discriminant functions were developed for every speaker, resulting in 18 different sets of LDA functions. For every speaker, five VCV utterances were used to design the LDA functions, and three different VCV utterances were used to test these functions. For the training data, the range of correct classification for the 18 speakers was 90-100% with an average of 96.2%. For the testing data, the range of correct classification was 50-86% with an average of 68%.

A step-wise linear discriminant analysis evaluated the contribution of different features to the discrimination problem. The analysis indicated that classifiers using only the top 7 features in the analysis had a performance drop of 2-5%. The top 7 features were related to the shape of the mouth and the rate of motion of the lips when the consonant in the VCV sequence was being produced. The results of this work showed that visual features extracted from the lips can separate the visual representation of phonemes into different classes.
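
The effect of merging classes on the recognition rate can be illustrated with a small sketch. The Python snippet below is not taken from the dissertation: the sample labels and predictions are invented for illustration, and the helper names (vcv_label, consonant_of, accuracy) are hypothetical. It shows only that when VCV sounds sharing a consonant are counted as one class, predictions that confused the vowel become correct, so the measured accuracy cannot decrease.

```python
# Toy illustration only: the predictions below are invented,
# not data or results from the dissertation.

CONSONANTS = ["b", "v", "w", "ð", "d", "z"]
VOWELS = ["ɑ", "i"]


def vcv_label(vowel, consonant):
    """Build a VCV label such as 'ɑbɑ' from a vowel and a consonant."""
    return f"{vowel}{consonant}{vowel}"


def consonant_of(label):
    """Return the middle consonant of a three-character VCV label."""
    return label[1]


def accuracy(true_labels, predicted_labels, merge=lambda label: label):
    """Fraction of samples whose merged prediction equals the merged truth."""
    pairs = list(zip(true_labels, predicted_labels))
    correct = sum(1 for t, p in pairs if merge(t) == merge(p))
    return correct / len(pairs)


# 6 consonants x 2 vowels = 12 VCV classes, as in the abstract.
ALL_CLASSES = [vcv_label(v, c) for c in CONSONANTS for v in VOWELS]

# Hypothetical classifier output as (true class, predicted class) pairs.
samples = [
    ("ɑbɑ", "ɑbɑ"),
    ("ibi", "ɑbɑ"),  # vowel confused, consonant correct
    ("ɑvɑ", "ɑvɑ"),
    ("ivi", "ɑvɑ"),  # vowel confused, consonant correct
    ("ɑwɑ", "ɑwɑ"),
    ("iwi", "iwi"),
    ("ɑðɑ", "ɑdɑ"),  # consonant confused
    ("iði", "iði"),
    ("ɑdɑ", "ɑzɑ"),  # consonant confused
    ("izi", "ɑzɑ"),  # vowel confused, consonant correct
]
true_labels = [t for t, _ in samples]
predicted_labels = [p for _, p in samples]

acc_12 = accuracy(true_labels, predicted_labels)
acc_6 = accuracy(true_labels, predicted_labels, merge=consonant_of)

print(f"12 classes (each VCV separate): {acc_12:.1%}")
print(f" 6 classes (merged by consonant): {acc_6:.1%}")
```

In this toy data, three of the five errors are vowel confusions, so merging by consonant raises the accuracy from 50% to 80%. The abstract does not state which consonants make up the 3-class grouping, so that grouping is not reproduced here.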

Details

Item Type:

University of Pittsburgh ETD

Status:

Unpublished

Creators/Authors:

Creators	Email	Pitt Username	ORCID
Alothmany, Nazeeh	nothmany@gmail.com

ETD Committee:

Title	Member	Email Address	Pitt Username
Committee Chair	Boston, Robert	boston@ee.pitt.edu	BBN
Committee Member	El-Jaroudi, Amro	amro@ee.pitt.edu	AMRO
Committee Member	Li, Ching-Chung	ccl@engr.pitt.edu	CCL
Committee Member	Durrant, John D.	durrant@pitt.edu	DURRANT
Committee Member	Chaparro, Luis F.	chaparro@ee.pitt.edu	LFCH
Committee Member	Shaiman, Susan	shaiman@pitt.edu	SHAIMAN

Date:

25 September 2009

Date Type:

Completion

Defense Date:

17 April 2009

Approval Date:

25 September 2009

Submission Date:

26 May 2009

Access Restriction:

No restriction; Release the ETD for access worldwide immediately.

Institution:

University of Pittsburgh

Schools and Programs:

Swanson School of Engineering > Electrical Engineering

Degree:

PhD - Doctor of Philosophy

Thesis Type:

Doctoral Dissertation

Refereed:

Yes

Uncontrolled Keywords:

AUTOMATIC LIP-READING; VISEMES; AUDIO-VISUAL CLASSIFICATION; CLASSIFICATION OF VISEMES

Other ID:

http://etd.library.pitt.edu/ETD/available/etd-05262009-085949/, etd-05262009-085949

Date Deposited:

10 Nov 2011 19:45

Last Modified:

15 Nov 2016 13:43

URI:

http://d-scholarship.pitt.edu/id/eprint/7955
