Slide 1: Feature selection for audio-visual speech recognition
Mihai Gurban, Signal Processing Institute, Swiss Federal Institute of Technology, Lausanne

Slide 2: Outline
• Feature selection and extraction
– Why select features?
– Information theoretic criteria
• Our approach
– The audio-visual recognizer
– Audio-visual integration
– Features and selection methods
• Experimental results
• Conclusion

Slide 3: Feature selection
• Features and classification
– Features (also called attributes, properties, or characteristics): the different types of measurements that can be taken on the same physical phenomenon
– An instance (or pattern, sample, example): a collection of feature values representing simultaneous measurements
– For classification, each sample has an associated class label
• Feature selection
– Finding, from the original feature set, a subset which retains most of the information that is relevant for a classification task
– This is needed because of the curse of dimensionality
• Why dimensionality reduction?
– The number of samples required to obtain accurate models of the data grows exponentially with its dimensionality
– The computing resources required also grow with the dimensionality of the data
– Irrelevant information can decrease classification performance

Slide 4: Feature selection
• Entropy and mutual information
– H(X), the entropy of X: the amount of uncertainty about the value of X
– I(X;Y), the mutual information between X and Y: the reduction in the uncertainty of X due to the knowledge of Y (or vice versa)
• Maximum dependency
– Mutual information is one of the frequently used selection criteria
– Pick Y_{S_1}, …, Y_{S_m} from the full feature set Y_1, …, Y_n such that I(Y_{S_1}, Y_{S_2}, …, Y_{S_m}; C) is maximal
• How many subsets?
– Checking all subsets is impossible: there are \binom{n}{m} combinations of m features out of n
– As an approximate solution, greedy (forward-selection) algorithms are used
– The number of subsets to evaluate is then reduced to n + (n-1) + … + (n-m+1) = mn - m(m-1)/2
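To make the greedy procedure concrete, here is a minimal sketch (my own illustration, not the authors' implementation) of forward selection under the maximum-dependency criterion. It assumes the features have already been discretized so that mutual information can be estimated by counting; scikit-learn's mutual_info_score is used as one convenient estimator, and the function name is hypothetical.

```python
import numpy as np
from sklearn.metrics import mutual_info_score  # MI between two discrete variables

def greedy_mi_selection(features, labels, m):
    """Forward selection: at each step, add the feature that maximizes
    the joint mutual information of the selected set with the labels.

    features: (n_samples, n_features) array of discrete feature values
    labels:   (n_samples,) array of class labels
    m:        number of features to select
    """
    n_features = features.shape[1]
    selected = []
    for _ in range(m):
        best_score, best_j = -np.inf, None
        for j in range(n_features):
            if j in selected:
                continue
            # Encode the joint variable (Y_S1, ..., Y_Sk, Y_j) by mapping
            # each distinct row of the candidate columns to one symbol.
            cols = features[:, selected + [j]]
            joint = np.unique(cols, axis=0, return_inverse=True)[1]
            score = mutual_info_score(joint, labels)
            if score > best_score:
                best_score, best_j = score, j
        selected.append(best_j)
    return selected
```

Note that estimating the joint mutual information this way degrades quickly as the selected set grows (every added feature multiplies the number of joint cells), which is itself a symptom of the curse of dimensionality and is why practical criteria fall back on pairwise terms, as on the following slides.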

Slide 5: A simple example
• Entropies and mutual information can be represented by Venn diagrams
• We are searching for the features Y_{S_i} with maximum mutual information with the class label
• Assume the complete set of features is the one given on the slide (the defining figure is not preserved in this transcript)

Slides 6-9: A simple example (continued)
[Figures only: step-by-step Venn-diagram illustrations of the selection; no text is preserved in the transcript. A small numeric stand-in follows below.]
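Since the example's figures are lost, here is a small synthetic stand-in (entirely my own construction, not the original slides' example) for the point such Venn-diagram examples usually make: the pair of individually best features is not necessarily the best pair, because of redundancy.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(0)
c = rng.integers(0, 2, 10_000)                 # binary class label
flip = lambda p: (rng.random(10_000) < p).astype(int)

y1 = c ^ flip(0.1)   # strongly informative feature
y2 = y1              # exact duplicate of y1: completely redundant
y3 = c ^ flip(0.3)   # weaker feature, but with independent noise

def joint_mi(cols, labels):
    # Encode each sample's tuple of feature values as a single symbol
    joint = np.unique(np.column_stack(cols), axis=0, return_inverse=True)[1]
    return mutual_info_score(joint, labels)

print(joint_mi([y1, y2], c))   # equals I(y1; c): y2 adds nothing
print(joint_mi([y1, y3], c))   # larger: y3 contributes new information
```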

Slide 10: Which criterion should penalize redundancy?
• Many different criteria have been proposed in the literature
• Our criterion penalizes only relevant redundancy
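For reference, two expressions from the feature-selection literature that make the distinction concrete; the exact formulas on the slide are not preserved, so the second should be read as an illustration of "relevant redundancy" rather than as the authors' definitive criterion. The widely used mRMR criterion (Peng et al.) penalizes all pairwise redundancy with the already selected set S:

```latex
% mRMR: relevance minus average pairwise redundancy with the selected set S
\max_{Y_j \notin S} \; I(Y_j; C) \;-\; \frac{1}{|S|} \sum_{Y_s \in S} I(Y_j; Y_s)
```

whereas the part of that redundancy which actually concerns the class can be isolated with the interaction information:

```latex
% relevant redundancy = total redundancy minus class-conditional redundancy
I(Y_j; Y_s; C) \;=\; I(Y_j; Y_s) \;-\; I(Y_j; Y_s \mid C)
```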

Slide 11: Solutions from the literature
• "Natural" DCT ordering
– Zigzag scanning, as used in compression (JPEG/MPEG)
• Maximum mutual information
– Typically, redundancy is not taken into account
• Linear Discriminant Analysis (LDA)
– A linear transform is applied to the features
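As an illustration of the first option, here is a minimal sketch of zigzag scanning, which orders the coefficients of a 2-D DCT block from low to high spatial frequency (the traversal JPEG uses); selecting the first m visual features then amounts to truncating this order. The function name is my own.

```python
import numpy as np

def zigzag_order(rows, cols):
    """(row, col) index pairs of a rows x cols block in zigzag order:
    by anti-diagonals, alternating the traversal direction."""
    order = []
    for d in range(rows + cols - 1):
        diag = [(r, d - r) for r in range(rows) if 0 <= d - r < cols]
        if d % 2 == 0:
            diag.reverse()  # even diagonals run bottom-left to top-right
        order.extend(diag)
    return order

# Example: keep the 10 lowest-frequency coefficients of an 8x8 DCT block
block = np.arange(64).reshape(8, 8)        # stand-in for a DCT block
idx = zigzag_order(8, 8)[:10]
features = np.array([block[r, c] for r, c in idx])
```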

Slide 12: Our application: AVSR
• Experiments on the CUAVE database
– 36 speakers, 10 words, 5 repetitions per speaker
– Leave-one-out cross-validation
– Audio features: MFCC coefficients
– Visual features: DCT coefficients with first and second temporal derivatives
– Different levels of noise added to the audio
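For the visual features, a quick sketch of how first and second temporal derivatives (delta and delta-delta features) are commonly appended to per-frame DCT coefficients; the regression formula is the standard HTK-style one, and the window half-length N=2 is an assumption, not a value stated on the slide.

```python
import numpy as np

def deltas(feats, N=2):
    """First temporal derivative by linear regression over +/- N frames:
    d_t = sum_{n=1..N} n * (f_{t+n} - f_{t-n}) / (2 * sum_n n^2).
    feats: (T, D) array of per-frame features."""
    T = len(feats)
    padded = np.pad(feats, ((N, N), (0, 0)), mode="edge")  # repeat edge frames
    denom = 2.0 * sum(n * n for n in range(1, N + 1))
    d = np.zeros_like(feats, dtype=float)
    for n in range(1, N + 1):
        d += n * (padded[N + n:N + n + T] - padded[N - n:N - n + T])
    return d / denom

# Full visual feature vector: static DCT + delta + delta-delta
dct = np.random.randn(100, 30)   # stand-in: 100 frames of 30 DCT coefficients
visual = np.hstack([dct, deltas(dct), deltas(deltas(dct))])
```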

Slide 13: The multi-stream HMM
• Two streams: audio (39 MFCCs) and video (DCT features)
• Audio-visual integration with multi-stream HMMs
– States are modeled with Gaussian mixtures
– Each modality is modeled separately
– The emission likelihood is a weighted product of the per-stream likelihoods
– The optimal stream weights are chosen for each SNR
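Written out, the weighted product mentioned above is the standard multi-stream emission likelihood (the notation here is mine; the slide's own symbols are not preserved):

```latex
% Per-state emission likelihood: audio and video GMM likelihoods combined
% with stream-weight exponents tuned for each SNR, lambda_A + lambda_V = 1.
b_j(o_t) \;=\; \big[\, b_j^{A}(o_t^{A}) \,\big]^{\lambda_A} \cdot \big[\, b_j^{V}(o_t^{V}) \,\big]^{\lambda_V}
```

In the log domain this is simply a linear interpolation of the two streams' log-likelihoods, which is how the weighting is typically implemented and tuned.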

Slide 14: Information content of different types of features
Slide 15: Visual-only recognition rate
Slide 16: Audio-visual performance
Slide 17: AV performance with clean audio
Slide 18: AV performance at 10 dB SNR
Slide 19: Noisy AV and visual-only comparison
[Slides 14-19 contain only result figures, which are not preserved in this transcript.]

Slide 20: Conclusion and future work
• Feature selection for audio-visual speech recognition
– The visual-only recognition rate is not a good predictor of audio-visual performance, because of dimensionality
– Maximum audio-visual performance is obtained at small video dimensionalities
– Algorithms that improve performance at small dimensionalities are therefore needed
• Future work
– Better methods to compute the amount of redundancy between features