Audio Visual Speech Recognition

Presentation transcript:

Audio Visual Speech Recognition
System components: Dictionary, Grammar, Acoustic models, Feature extraction, Decoder
(Research project funded by a GET incentive grant, 2005)

Audio processing
- Feature extraction
- Digit detection
- Digit recognition:
  - Acoustic parameters: MFCC
  - Context-independent HMMs
  - Decoding: time-synchronous algorithm
- Noise condition: babble
- Recognition experiments
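The slide names MFCCs as the acoustic parameters. The standard MFCC pipeline can be sketched as follows; the sample rate, frame sizes, filter count, and coefficient count below are illustrative assumptions, not values taken from the slides:

```python
# Minimal MFCC extraction sketch: frame -> window -> power spectrum ->
# mel filterbank -> log -> DCT. Parameter values are hypothetical.
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    # Frame and window the signal
    window = np.hamming(n_fft)
    frames = np.array([signal[s:s + n_fft] * window
                       for s in range(0, len(signal) - n_fft + 1, hop)])

    # Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2

    # Triangular mel filterbank, equally spaced on the mel scale
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, c, hi = bins[i], bins[i + 1], bins[i + 2]
        for b in range(lo, c):
            fbank[i, b] = (b - lo) / max(c - lo, 1)
        for b in range(c, hi):
            fbank[i, b] = (hi - b) / max(hi - c, 1)

    # Log mel energies, then DCT-II to decorrelate -> cepstral coefficients
    log_mel = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_mels)
    dct_basis = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_mels))
    return log_mel @ dct_basis.T

# One second of noise -> 97 frames of 13 coefficients
feats = mfcc(np.random.randn(16000))
print(feats.shape)  # (97, 13)
```

In a real system such frames would be augmented with delta and delta-delta coefficients before HMM training, but that step is not stated on the slide.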

Video processing
- Video extraction
- Lip localisation
- Image interpolation (to the same frame rate as the speech features)
- Feature extraction:
  - DCT and DCT2 (DCT + LDA)
  - Projections: PRO and PRO2 (PRO + LDA)
- Recognition experiments
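The slide names 2-D DCT features of the lip region as one visual front end. A sketch of such an extractor, assuming a 32x32 grayscale ROI and 30 retained coefficients (both hypothetical sizes); the LDA projection that turns DCT into DCT2 is omitted here:

```python
# 2-D DCT visual features for a lip region of interest (ROI).
# ROI size and number of retained coefficients are assumptions.
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)
    basis = np.cos(np.pi * np.outer(k, 2 * k + 1) / (2 * n))
    basis[0] *= 1.0 / np.sqrt(2.0)
    return basis * np.sqrt(2.0 / n)

def dct2_features(roi, n_coef=30):
    """2-D DCT of the grayscale ROI; keep the n_coef lowest-frequency
    coefficients, ordered by anti-diagonal (row + col), as the feature vector."""
    h, w = roi.shape
    coeffs = dct_matrix(h) @ roi @ dct_matrix(w).T
    idx = sorted(((r, c) for r in range(h) for c in range(w)),
                 key=lambda rc: (rc[0] + rc[1], rc[0]))
    return np.array([coeffs[r, c] for r, c in idx[:n_coef]])

feat = dct2_features(np.random.rand(32, 32))
print(feat.shape)  # (30,)
```

Keeping only low-frequency coefficients concentrates the feature vector on coarse lip shape while discarding pixel-level noise, which is why DCT truncation is a common visual front end.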

Fusion techniques
- Feature fusion:
  - Concatenation
  - Dimensionality reduction: Linear Discriminant Analysis (LDA)
  - Modelling: classical single-stream HMM
- Score fusion: multi-stream HMM
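In a multi-stream HMM, score fusion amounts to combining the per-state log-likelihoods of the audio and video streams with stream weights, log b(o) = w_a log b_a(o_a) + w_v log b_v(o_v). A minimal sketch; the specific weights (e.g. tuned to the SNR) are an assumption, since the slides only name the multi-stream HMM as the score-fusion model:

```python
# Multi-stream score fusion: weighted combination of per-state
# log-likelihoods from the audio and video streams.
import numpy as np

def fuse_stream_scores(log_lik_audio, log_lik_video, w_audio=0.7):
    """Weights sum to 1; a lower audio weight suits noisier audio (e.g. -5 dB)."""
    w_video = 1.0 - w_audio
    return w_audio * np.asarray(log_lik_audio) + w_video * np.asarray(log_lik_video)

# Per-state scores for one observation frame (toy numbers);
# at -5 dB the audio stream is down-weighted.
audio = [-12.0, -9.5, -15.2]
video = [-8.0, -11.0, -7.5]
fused = fuse_stream_scores(audio, video, w_audio=0.3)
best_state = int(np.argmax(fused))
print(fused, best_state)  # best_state = 0
```

The design choice this illustrates: unlike feature concatenation, score fusion lets the decoder trust each modality differently per condition, which is what makes the -5 dB experiment on the next slide meaningful.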

Experimental results: feature fusion

Experimental results: score fusion at -5 dB

Bibliography
- G. Potamianos, C. Neti, G. Gravier, A. Garg, A. W. Senior, "Recent Advances in the Automatic Recognition of Audiovisual Speech", Proceedings of the IEEE, vol. 91, pp. 1306-1326, Sept. 2003.
- J. N. Gowdy, A. Subramanya, C. Bartels, and J. Bilmes, "DBN-Based Multi-Stream Models for Audio-Visual Speech Recognition", IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), Montreal, Canada, May 2004.
- F. Brugger, L. Zouari, H. Bredin, A. Ameheaye, G. Chollet, D. Pastor and Y. Ni, "Reconnaissance de la parole audiovisuelle par VMike" ("Audio-visual speech recognition with VMike"), XVIèmes Journées d'Étude sur la Parole (JEP), Dinard, 2006.