A Temporal Network of Support Vector Machines for the Recognition of Visual Speech. Mihaela Gordan, Constantine Kotropoulos, Ioannis Pitas.


A Temporal Network of Support Vector Machines for the Recognition of Visual Speech
Mihaela Gordan*, Constantine Kotropoulos**, Ioannis Pitas**
* Faculty of Electronics and Telecommunications, Technical University of Cluj-Napoca, 15 C. Daicoviciu, 3400 Cluj-Napoca, Romania
** Department of Informatics, Aristotle University of Thessaloniki, Artificial Intelligence and Information Analysis Laboratory, Box 451, GR Thessaloniki, Greece
This work was supported by the European Union Research Training Network "Multi-modal Human-Computer Interaction" (HPRN-CT).
Department of Informatics, Aristotle University of Thessaloniki

Brief Overview
Visual speech recognition (lipreading): an important component of audiovisual speech recognition systems and an emerging research field.
Support vector machines (SVMs): powerful classifiers for various visual classification tasks (face recognition, medical image processing, object tracking).
Goal of this work: examine the suitability of SVMs for visual speech recognition by developing an SVM-based visual speech recognition system.
In brief: we use SVMs for viseme recognition and integrate them as nodes in a Viterbi decoding lattice.
The good results (slightly higher word recognition rate, WRR, with very simple input features; easy generalization to larger-vocabulary tasks) encourage the continuation of this research.

Contents
1. State of the art & research trends
2. Principles of the proposed visual speech recognition approach
3. SVMs and their use for mouth shape recognition
4. Modeling the temporal dynamics of visual speech
5. Block diagram of the proposed visual speech recognition system
6. Experimental results
7. Conclusions

1. State of the art & research trends
Visual speech recognition = recognizing spoken words based on visual examination of the speaker's face only, mainly the mouth area.
State of the art: many reported methods, differing widely with respect to:
- the feature types (lip contour coordinates, GLDP, gray levels of the mouth image);
- the classifier used (TDNN, HMM);
- the class definition.
Active research trends in the area:
- find the most suitable features and classification techniques for efficient, individual-independent discrimination between different mouth shapes;
- reduce the required processing of the mouth image to increase speed;
- find solutions that facilitate easy integration of the audio and visual recognizers.
Use of SVMs in speech recognition: recently employed in audio speech recognition with very good results; no previous attempts in visual speech recognition.

2. Principles of the proposed visual speech recognition approach - I
Visemes = the basic units of visual speech: the basic shapes of the mouth during speech production (e.g. the mouth shapes for "o" and "f").
Discrimination between visemes is a pattern recognition problem:
- Feature vector = a representation of the mouth image (e.g. at pixel level: the gray levels of the pixels in the mouth image, scanned in row order);
- Pattern classes = the different visemes (mouth shapes) occurring during the pronunciation of the words in the dictionary.

2. Principles of the proposed visual speech recognition approach - II
The proposed strategy: given a visual speech recognition task (i.e. a given dictionary of words),
1. Find the phonetic description of each word;
2. Derive the viseme-to-phoneme mapping for the application (it will be one-to-many, because non-visible parts of the vocal tract are involved in speech production and the mapping depends on the speaker's nationality; no universal viseme-to-phoneme mapping is currently available);
3. Use the phonetic word descriptions and the viseme-to-phoneme mapping to derive visemic word descriptions (visemic models = sequences of mouth shapes that could produce the phonetic word realization).
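Step 3 above reduces to a lookup once the mapping is fixed. A minimal sketch, assuming a purely illustrative phoneme-to-viseme table (as the slide notes, the real mapping is application- and speaker-dependent and no universal one exists):

```python
# Hypothetical phoneme-to-viseme mapping, for illustration only.
PHONEME_TO_VISEME = {
    "w": "w", "ah": "ah", "n": "n",
    "f": "f", "ao": "ao", "r": "r",
    # several phonemes can share a mouth shape (many-to-one this way round)
    "t": "n", "uw": "ao",
}

def visemic_model(phonemes):
    """Derive the visemic word description from the phonetic one."""
    return [PHONEME_TO_VISEME[p] for p in phonemes]

print(visemic_model(["w", "ah", "n"]))  # visemic model of "one": ['w', 'ah', 'n']
```

The inverse (viseme-to-phoneme) direction is one-to-many, which is why several phonetic realizations can share one visemic model.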

2. Principles of the proposed visual speech recognition approach - III
[Figure: viseme-to-phoneme mapping; phonetic and visemic word description models]

3. SVMs and their use for mouth shape recognition - I
SVMs = statistical learning classifiers based on the optimal hyperplane algorithm:
- they minimize a bound on the empirical error and on the complexity of the classifier;
- they are capable of learning in sparse, high-dimensional spaces with few training examples.
Classical SVMs solve 2-class pattern recognition problems:
{(x_i, y_i)}, i = 1, ..., N = training examples; x_i ∈ R^M = M-dimensional pattern; y_i ∈ {-1, +1} indicates whether example i is a negative or positive example.
Linear SVMs: the data to be classified are separable in their original domain.
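A minimal sketch of this 2-class linear setup on toy data. The paper's experiments use the SVMLight toolkit; scikit-learn is substituted here purely for illustration:

```python
import numpy as np
from sklearn.svm import SVC  # illustration only; the paper uses SVMLight

# Toy 2-class problem: x_i in R^M, y_i in {-1, +1}, linearly separable
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
              [3.0, 3.0], [3.0, 4.0], [4.0, 3.0], [4.0, 4.0]])
y = np.array([-1, -1, -1, -1, 1, 1, 1, 1])

clf = SVC(kernel="linear", C=1000.0)  # large C: near-hard margin
clf.fit(X, y)
print(clf.predict([[0.5, 0.5], [3.5, 3.5]]))  # -> [-1  1]
```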

3. SVMs and their use for mouth shape recognition - II
Nonlinear SVMs: the data to be classified are not separable in their original domain.
We project the data into a higher-dimensional Hilbert space H, where the data are linearly separable, via a nonlinear mapping Φ, and express the dot product of the data by a kernel function: K(x_i, x_j) = Φ(x_i) · Φ(x_j).
The decision function of the SVM classifier is:
f(x) = sign( Σ_i α_i y_i K(x_i, x) + b )
where α_i = the non-negative Lagrange multipliers of the quadratic program that maximizes the distance between the classes and the separating hyperplane; w, b = the hyperplane's parameters.
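The decision function above can be computed directly from the support vectors. A sketch with a degree-3 polynomial kernel, the kernel family used in the experiments (the support vectors, multipliers and bias below are made-up values, not a trained model):

```python
import numpy as np

def poly_kernel(x, z, degree=3):
    # Inhomogeneous polynomial kernel K(x, z) = (x·z + 1)^d
    return (np.dot(x, z) + 1.0) ** degree

def svm_decision(x, support_vectors, alphas, labels, b, degree=3):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b.
    The sign gives the class; the real value is the confidence
    later fed into the Viterbi lattice nodes."""
    return sum(a * y * poly_kernel(sv, x, degree)
               for a, y, sv in zip(alphas, labels, support_vectors)) + b

# Made-up support set for illustration
svs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
print(svm_decision(np.array([1.0, 0.0]), svs, [1.0, 1.0], [1, -1], 0.0))  # 8 - 1 = 7.0
```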

3. SVMs and their use for mouth shape recognition - III
The real-valued output of the SVM gives the degree of confidence in the class assignment.
An SVM is a binary classifier, so one SVM must be trained for each mouth shape (viseme).
Features used: the gray levels of the pixels in the mouth image, scanned in row order.
The set of training patterns is common to all SVMs; only the labels assigned to each training pattern differ. Only unambiguous positive and negative examples are used.
Training patterns (mouth images) are preprocessed for normalization with respect to scale, translation and rotation.
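This one-SVM-per-viseme scheme can be sketched as one-vs-rest training over a shared pattern set. Again scikit-learn stands in for SVMLight, and the kernel parameters follow the experiments described later (polynomial, degree 3, C = 1000); the helper name is ours:

```python
import numpy as np
from sklearn.svm import SVC  # stand-in for the SVMLight toolkit

def train_viseme_svms(images, viseme_labels, viseme_set):
    """One binary SVM per viseme. The same normalized mouth images serve
    as training set for every SVM; only the +1/-1 labels change."""
    X = images.reshape(len(images), -1)  # gray levels scanned in row order
    svms = {}
    for v in viseme_set:
        y = np.where(np.array(viseme_labels) == v, 1, -1)
        svms[v] = SVC(kernel="poly", degree=3, coef0=1.0, C=1000.0).fit(X, y)
    return svms
```

At recognition time, `decision_function` (rather than the hard `predict`) supplies the real-valued confidence each lattice node needs.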

4. Modeling the temporal dynamics of visual speech - I
Symbolic visemic description of a word = a left-to-right sequence of visemes; it carries no information about the relative duration of each viseme in the word realization (which is strongly person-dependent).
Given:
- the symbolic visemic description of a word, and
- the total number of frames in the word pronunciation,
we build the word model in the temporal domain by allowing any non-zero duration for each viseme: a temporal network of models for each symbolic visemic description, represented as a Viterbi lattice.
Example: "one" = (w, ah, n).
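Each path through such a lattice corresponds to one assignment of non-zero durations summing to the total frame count. A sketch that enumerates these assignments explicitly (the function name is ours; the lattice itself never materializes the list, but the paths are the same):

```python
from itertools import combinations

def duration_assignments(n_visemes, n_frames):
    """All ways to give each viseme a non-zero duration summing to
    n_frames (compositions of n_frames into n_visemes parts);
    each tuple corresponds to one path through the Viterbi lattice."""
    out = []
    for cuts in combinations(range(1, n_frames), n_visemes - 1):
        bounds = (0,) + cuts + (n_frames,)
        out.append(tuple(bounds[i + 1] - bounds[i] for i in range(n_visemes)))
    return out

# "one" = (w, ah, n): 3 visemes over T = 5 frames -> 6 possible alignments
print(duration_assignments(3, 5))
```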

4. Modeling the temporal dynamics of visual speech - II
[Figure: Viterbi lattice d for the visemic word model w_d, T = 5; nodes k and k+1 and sub-path i, connected between the IN and OUT states]

4. Modeling the temporal dynamics of visual speech - III
Node k = the measure of confidence in the realization of the viseme o_k = "ah" at timeframe t_k = 3; it is given by the real-valued output of the SVM trained to recognize the viseme o_k.
Sub-path i = the transition probability from the state generating o_k = "ah" at timeframe t_k = 3 to the state generating o_{k+1} = "n" at timeframe t_{k+1} = 4. We assume equal transition probabilities.
Path l = any connected path between the IN and OUT states in the Viterbi lattice.
Confidence in path l of the Viterbi lattice d: accumulated from the node confidences and transition probabilities along l.
Plausibility of producing the word model w_d: the maximum confidence over all paths l in lattice d.
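The max-over-paths plausibility is the classic Viterbi recursion. A sketch under two stated assumptions: node scores are summed along a path, and equal transition probabilities are dropped since they do not change the arg max (the slides do not spell out the exact combination rule):

```python
import numpy as np

def word_plausibility(conf):
    """conf[k, t] = SVM confidence that frame t shows viseme k of the model.
    Returns the best left-to-right alignment score: every viseme gets at
    least one frame, visemes appear in order, all frames are consumed."""
    K, T = conf.shape
    score = np.full((K, T), -np.inf)  # -inf marks unreachable lattice nodes
    score[0, 0] = conf[0, 0]
    for t in range(1, T):
        for k in range(min(K, t + 1)):   # viseme k needs at least k earlier frames
            stay = score[k, t - 1]                              # same viseme continues
            advance = score[k - 1, t - 1] if k > 0 else -np.inf  # next viseme starts
            score[k, t] = conf[k, t] + max(stay, advance)
    return score[K - 1, T - 1]
```

The recognized word is then the model whose lattice yields the highest plausibility.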

5. Block diagram of the proposed visual speech recognition system
[Diagram: the mouth-image sequence feeds the viseme SVMs (SVM "oa", SVM "ah", SVM "n", ...); their outputs score the Viterbi lattices of the visemic word models, e.g. (w, ah, n) and (oa, ah, n) for "one", (f, ao, r) for "four"; each lattice d yields a confidence c_d, d = 1, ..., D, and the recognized word is i = arg max_d c_d. In the example shown: i = 1, word "one".]
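The final decision block of the diagram reduces to an arg max over per-model confidences. A trivial sketch (the scores below are made-up values for illustration):

```python
def recognise(confidence_per_model):
    """i = arg max_d c_d: pick the word whose lattice scored highest."""
    return max(confidence_per_model, key=confidence_per_model.get)

# Made-up lattice confidences for the four-digit task
print(recognise({"one": 4.2, "two": 1.1, "three": 0.7, "four": 2.9}))  # -> one
```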

6. Experimental results - I
Task to be solved: visual speech recognition of the first four digits in English.
Experimental data: the visual part of the Tulips1 audiovisual speech database.
Implementation: in C++, using the publicly available SVMLight toolkit; the Viterbi algorithm and additional modules were written and integrated into the visual speech recognizer.
Training strategy: 12 SVMs (one per viseme class) with a polynomial kernel of degree 3, C = 1000.
Test strategy: leave-one-out protocol; the system is trained 12 times on 11 subjects, each time leaving out one subject for testing; 24 test sequences per word x 4 words = 96 test sequences.
Performance evaluation:
- overall (average) WRR, compared to similar results from the literature;
- 95% confidence intervals for the WRR of the proposed approach and for the WRR of similar approaches from the literature.

6. Experimental results - II
Comparison: slightly higher WRR and confidence intervals than those reported in the literature.
Exception: lower WRR than the best result reported without delta features (87.5%), due to that method's much better localization of the ROI around the lip contour. However, our computational complexity is much lower (no need to redefine the ROI in each frame).
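The 95% confidence intervals for a WRR can be sketched with the normal approximation to the binomial; the counts below are made-up placeholders, not the paper's reported results:

```python
import math

def wrr_confidence_interval(correct, total, z=1.96):
    """95% CI for a word recognition rate (normal approximation
    to the binomial proportion), clipped to [0, 1]."""
    p = correct / total
    half = z * math.sqrt(p * (1.0 - p) / total)
    return max(0.0, p - half), min(1.0, p + half)

# Illustrative counts only: 72 of 96 test sequences recognized correctly
lo, hi = wrr_confidence_interval(72, 96)
```

With only 96 test sequences the interval is fairly wide, which is why the slides report intervals alongside the average WRR.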

7. Conclusions
We examined the suitability of SVM classifiers for visual speech recognition.
The temporal character of speech was modeled by integrating SVMs with real-valued outputs as nodes in a Viterbi decoding lattice.
Performance evaluation of the system on a small visual speech recognition task shows:
- better WRR than the rates reported in the literature,
- even with very simple features: the gray levels of the mouth image, used directly.
SVMs are therefore a promising tool for visual speech recognition applications.
Future research goals: increase the WRR by including delta features, examining other SVM kernels, and learning the state transition probabilities in the Viterbi decoding lattice.