Slide 1: Feature selection for audio-visual speech recognition
Mihai Gurban
Signal Processing Institute, Swiss Federal Institute of Technology, Lausanne
Slide 2: Outline
Feature selection and extraction
– Why select features?
– Information theoretic criteria
Our approach
– The audio-visual recognizer
– Audio-visual integration
– Features and selection methods
Experimental results
Conclusion
Slide 3: Feature selection
Features and classification
– Features (or attributes, properties, characteristics): different types of measurements that can be taken on the same physical phenomenon
– An instance (or pattern, sample, example): a collection of feature values representing simultaneous measurements
– For classification, each sample has an associated class label
Feature selection
– Finding, within the original feature set, a subset that retains most of the information relevant to a classification task
– This is needed because of the curse of dimensionality
Why dimensionality reduction?
– The number of samples required to obtain accurate models of the data grows exponentially with the dimensionality
– The computing resources required also grow with the dimensionality of the data
– Irrelevant information can decrease performance
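The exponential growth mentioned on this slide can be made concrete with a quick back-of-the-envelope computation. The snippet below is an illustration added to this transcript, not part of the slides: it counts how many cells a joint histogram would need when each feature is quantized into 10 levels.

```python
# Illustration of the curse of dimensionality: with `bins` quantization levels
# per feature, the number of cells in the joint histogram grows exponentially
# with the number of features, and so does the number of samples needed to
# populate those cells.
bins = 10
for dim in (1, 2, 5, 10, 20):
    print(f"{dim:2d} features -> {bins ** dim:.2e} cells in the joint histogram")
```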
Slide 4: Feature selection
Entropy and mutual information
– H(X), the entropy of X: the amount of uncertainty about the value of X
– I(X;Y), the mutual information between X and Y: the reduction in the uncertainty of X due to the knowledge of Y (or vice versa)
Maximum dependency
– A frequently used criterion is mutual information
– Pick Y_S1, ..., Y_Sm from the set Y_1, ..., Y_n of features such that I(Y_S1, Y_S2, ..., Y_Sm; C) is maximal
How many subsets?
– Impossible to check all subsets: there are n! / (m!(n-m)!) combinations of m features out of n
– As an approximate solution, greedy algorithms are used
– The number of subsets to evaluate is then reduced to the order of n·m (one pass over the remaining features at each of the m steps)
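As a concrete sketch of the maximum-dependency criterion and its greedy approximation, the snippet below estimates I(Y_S; C) from co-occurrence counts and, at each step, adds the feature that increases it most. This is an illustration added to the transcript; it assumes the features have already been quantized to a few discrete levels, and the function names (joint_mi, greedy_max_dependency) are illustrative rather than taken from the slides.

```python
import numpy as np
from collections import Counter

def joint_mi(class_labels, feature_columns):
    """Estimate I(Y_S; C): treat each row of the (discretized) selected
    features as a single symbol and count its co-occurrences with the class."""
    n = len(class_labels)
    symbols = [tuple(row) for row in feature_columns]
    p_c = {c: k / n for c, k in Counter(class_labels).items()}
    p_s = {s: k / n for s, k in Counter(symbols).items()}
    p_cs = Counter(zip(class_labels, symbols))
    return sum((k / n) * np.log2((k / n) / (p_c[c] * p_s[s]))
               for (c, s), k in p_cs.items())

def greedy_max_dependency(X, C, m):
    """Greedy forward selection: at each step add the feature that maximizes
    the estimated joint mutual information with the class label C.
    X is an (n_samples, n_features) array of already-quantized features."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(m):
        best = max(remaining, key=lambda j: joint_mi(C, X[:, selected + [j]]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Note that estimating the joint mutual information directly becomes unreliable as the selected subset grows, which is why the later slides turn to criteria built from pairwise terms.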
Slide 5: A simple example
Entropies and mutual information can be represented by Venn diagrams
We are searching for the features Y_Si with maximum mutual information with the class label
Assume a complete set of features is given (shown on the slide)
Slides 6-9: A simple example (continued; Venn diagram figures only)
Slide 10: Which criterion to penalize redundancy?
Many different criteria have been proposed in the literature
Our criterion penalizes only relevant redundancy
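The slide does not spell the criterion out, so the sketch below only illustrates the general idea under one possible reading: whereas an mRMR-style penalty uses the full redundancy I(Yi; Ys) between a candidate feature and an already selected one, a "relevant-only" penalty keeps just the part of that redundancy that also concerns the class, I(Yi; Ys) - I(Yi; Ys | C). The helper names and this exact formula are assumptions added here, not taken from the presentation.

```python
import numpy as np
from collections import Counter

def mi(a, b):
    """Discrete mutual information I(A; B) estimated from empirical counts."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((k / n) * np.log2((k / n) / ((pa[x] / n) * (pb[y] / n)))
               for (x, y), k in pab.items())

def cmi(a, b, c):
    """Conditional mutual information I(A; B | C): average the within-class MI."""
    n = len(c)
    total = 0.0
    for cls in set(c):
        idx = [k for k in range(n) if c[k] == cls]
        total += (len(idx) / n) * mi([a[k] for k in idx], [b[k] for k in idx])
    return total

def relevant_redundancy(y_i, y_s, labels):
    """Assumed reading of 'relevant redundancy': the part of the redundancy
    between candidate y_i and selected y_s that also carries class
    information, I(Y_i; Y_s) - I(Y_i; Y_s | C)."""
    return mi(y_i, y_s) - cmi(y_i, y_s, labels)
```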
Slide 11: Solutions from the literature
"Natural" DCT ordering
– Zigzag scanning, as used in compression (JPEG/MPEG)
Maximum mutual information
– Typically, redundancy is not taken into account
Linear Discriminant Analysis
– A transform is applied to the features
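For reference, the "natural" DCT ordering mentioned here is the JPEG-style zigzag scan of the 2-D coefficient block, which lists low spatial frequencies first. A minimal sketch added to this transcript (the helper name is illustrative):

```python
def zigzag_order(n):
    """JPEG-style zigzag ordering of an n x n block of 2-D DCT coefficients:
    coefficients are visited anti-diagonal by anti-diagonal, alternating
    direction, so low-frequency coefficients come first."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))

# Keeping the first k coefficients in this order gives the "natural" baseline
# feature ranking against which the selection methods are compared.
print(zigzag_order(4)[:6])   # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```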
Slide 12: Our application: AVSR
Experiments on the CUAVE database
– 36 speakers, 10 words, 5 repetitions per speaker
– Leave-one-out cross-validation
– Audio features: MFCC coefficients
– Visual features: DCT with first and second temporal derivatives
– Different levels of noise added to the audio
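As a rough sketch of how such features can be computed: the slides do not name a toolkit, so librosa and scipy below are assumptions, and the function names are illustrative rather than the presenters' pipeline.

```python
import numpy as np
import librosa                  # assumed toolkit for the audio features
from scipy.fft import dctn      # assumed toolkit for the 2-D DCT

def audio_features(wav_path, sr=16000, n_mfcc=13):
    """MFCCs with first and second temporal derivatives (39 dims per frame)."""
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    feats = np.vstack([mfcc,
                       librosa.feature.delta(mfcc),
                       librosa.feature.delta(mfcc, order=2)])
    return feats.T              # one row per frame

def video_features(mouth_frames, n_coeffs=64):
    """2-D DCT of each grayscale mouth ROI, keeping the first n_coeffs
    coefficients (row-major here; the presentation ranks them by zigzag
    order or by mutual information), plus temporal derivatives."""
    coeffs = np.array([dctn(frame, norm='ortho').ravel()[:n_coeffs]
                       for frame in mouth_frames])
    d1 = np.gradient(coeffs, axis=0)    # first temporal derivative
    d2 = np.gradient(d1, axis=0)        # second temporal derivative
    return np.hstack([coeffs, d1, d2])
```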
Slide 13: The multi-stream HMM
Audio stream (39 MFCCs), video stream (DCT features)
Audio-visual integration with multi-stream HMMs
– States are modeled with Gaussian mixtures
– Each modality is modeled separately
– The emission likelihood is a weighted product
– The optimal weights are chosen for each SNR
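The weighted product of per-stream likelihoods amounts to a weighted sum in the log domain. A minimal sketch added here; the function name, and the convention that the two stream weights sum to one, are assumptions rather than details given on the slide.

```python
def multistream_log_likelihood(log_b_audio, log_b_video, lambda_audio):
    """Multi-stream emission score for one HMM state: the weighted product of
    the per-stream GMM likelihoods becomes a weighted sum in the log domain.
    lambda_audio is the audio stream weight (tuned per SNR); the video weight
    is taken here as 1 - lambda_audio."""
    return lambda_audio * log_b_audio + (1.0 - lambda_audio) * log_b_video

# Example: at low SNR the audio stream is down-weighted.
print(multistream_log_likelihood(-42.0, -17.5, lambda_audio=0.3))
```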
Slide 14: Information content of different types of features
Slide 15: Visual-only recognition rate
Slide 16: Audio-visual performance
Slide 17: AV performance with clean audio
Slide 18: AV performance at 10 dB SNR
Slide 19: Noisy AV and visual-only comparison
Slide 20: Conclusion and future work
Feature selection for audio-visual speech recognition
– The visual-only recognition rate is not a good predictor of audio-visual performance, because of dimensionality
– Maximum audio-visual performance is obtained for small video dimensionalities
– Algorithms that improve performance at small dimensionalities are needed
Future work
– Better methods to compute the amount of redundancy between features