Audiovisual-to-articulatory speech inversion using Hidden Markov Models
Athanassios Katsamanis, George Papandreou, Petros Maragos
School of E.C.E., National Technical University of Athens, Athens 15773, Greece

Speech inversion

What is speech inversion? The goal is to recover the vocal tract geometry from the speech signal and the speaker's face. Applications include language tutoring and speech therapy.

Qualisys-Movetrack database

The database contains parallel recordings from Electromagnetic Articulography (EMA), video, and audio. [Figure: parallel EMA, video (3-D marker coordinates), and audio (spectral characteristics/MFCC) streams for example utterances /p1/ and /p2/.]

Why multistream HMMs?

- The visual-to-articulatory mapping is expected to be nonlinear.
- The visual stream is incorporated following the audiovisual ASR paradigm.
- The multistream HMM determines the hidden state sequence from the audio and visual observations.

Why Canonical Correlation Analysis (CCA)?

- CCA leads to optimal reduced-rank linear regression models.
- It improves predictive performance when training data are limited.

We apply CCA to train a linear mapping at each HMM state between the audiovisual and the articulatory data.

MAP estimation

At time t, in state i, the articulatory parameters x are recovered from the audiovisual observation y by Maximum A Posteriori estimation. Writing the state-dependent linear mapping as a matrix H_i, the estimate is

  x_MAP = (H_i^T Q_i^-1 H_i + S_x^-1)^-1 (H_i^T Q_i^-1 y + S_x^-1 m_x),

where Q_i is the covariance of the approximation error and the prior of x is considered Gaussian, N(m_x, S_x), determined in the training phase.

Evaluation

[Figure: measured (black) and predicted (light color) articulatory trajectories.]
[Figure: generalization error of the linear regression model vs. model order for varying training-set size. Upper row: tongue position from face expression. Lower row: face expression from tongue position. Zero states correspond to the case of a global linear model.]

Performance is improved compared to a global linear model, and to audio-only or visual-only HMMs.

References

H. Kjellstrom, O. Engwall, and O. Balter, "Reconstructing tongue movements from audio and video," in Interspeech, 2006, pp. 2238-2241.
O. Engwall, "Introducing visual cues in acoustic-to-articulatory inversion," in Interspeech, 2005, pp. 3205-3208.
J. Jiang, A. Alwan, P. A. Keating, E. T. Auer Jr., and L. E. Bernstein, "On the relationship between face movements, tongue movements, and speech acoustics," EURASIP Journal on Applied Signal Processing, vol. 11, pp. 1174-1188.
H. Yehia, P. Rubin, and E. Vatikiotis-Bateson, "Quantitative association of vocal-tract and facial behavior," Speech Communication, vol. 26, pp. 23-43.
K. Richmond, S. King, and P. Taylor, "Modelling the uncertainty in recovering articulation from acoustics," Computer Speech and Language, vol. 17, pp. 153-172.
S. Hiroya and M. Honda, "Estimation of articulatory movements from speech acoustics using an HMM-based speech production model," IEEE TSAP, vol. 12, no. 2, pp. 175-185.
O. Engwall and J. Beskow, "Resynthesis of 3D tongue movements from facial data," in Eurospeech.
S. Dupont and J. Luettin, "Audio-visual speech modeling for continuous speech recognition," IEEE Tr. Multimedia, vol. 2, no. 3, pp. 141-151.
K. V. Mardia, J. T. Kent, and J. M. Bibby, Multivariate Analysis. Academic Press.
L. L. Scharf and J. K. Thomas, "Wiener filters in canonical coordinates for transform coding, filtering, and quantizing," IEEE TSAP, vol. 46, no. 3, pp. 647-654.
L. Breiman and J. H. Friedman, "Predicting multivariate responses in multiple linear regression," Journal of the Royal Stat. Soc. (B), vol. 59, no. 1, pp. 3-54.

Acknowledgements

This research was co-financed partially by the E.U.-European Social Fund (75%) and the Greek Ministry of Development-GSRT (25%) under Grant ΠΕΝΕΔ-2003ΕΔ866, and partially by the European research project ASPI under Grant FP. We would also like to thank O. Engwall from KTH for providing us the QSMT database.
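The multistream decoding step can be sketched in a few lines: per-stream (audio and visual) log-likelihoods are combined with stream weights, as in audiovisual ASR, and the state sequence is then found by Viterbi decoding. This is a minimal illustration under assumed toy dimensions, not the authors' implementation; the function names are hypothetical.

```python
import numpy as np

def multistream_log_obs(logp_streams, weights):
    """Combine per-stream log-likelihoods into multistream HMM observation
    scores: sum_s w_s * log b_s(o_t).  logp_streams is a list of (T, N)
    arrays, one per stream (e.g. audio, visual); weights is a list of floats."""
    return sum(w * lp for w, lp in zip(weights, logp_streams))

def viterbi(log_pi, log_A, log_B):
    """Most likely state sequence given initial log-probs log_pi (N,),
    transition log-probs log_A (N, N), and combined scores log_B (T, N)."""
    T, N = log_B.shape
    delta = log_pi + log_B[0]
    psi = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A            # (prev state, next state)
        psi[t] = np.argmax(scores, axis=0)         # best predecessor per state
        delta = scores[psi[t], np.arange(N)] + log_B[t]
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):                  # backtrack
        path.append(int(psi[t][path[-1]]))
    return path[::-1]
```

With equal stream weights this reduces to ordinary log-likelihood addition; unequal weights let the decoder trust the more reliable modality.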
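The CCA-based reduced-rank regression can be illustrated as follows: whiten both feature blocks, take the SVD of the resulting coherence matrix (its singular values are the canonical correlations), and keep only the top canonical directions. The sketch assumes centered matrices and well-conditioned covariances; at full rank it coincides with ordinary least squares, while truncation gives the reduced-rank model in canonical coordinates (cf. Scharf and Thomas; Breiman and Friedman). Function and variable names are illustrative.

```python
import numpy as np

def cca_reduced_rank_regression(Y, X, rank):
    """Fit a rank-constrained linear map X ~= Y @ B via CCA.

    Y: (n, p) centered predictors (e.g. audiovisual features),
    X: (n, q) centered responses (e.g. articulatory parameters).
    Returns B of shape (p, q) with rank <= `rank`."""
    n = Y.shape[0]
    Syy, Sxx, Syx = Y.T @ Y / n, X.T @ X / n, Y.T @ X / n

    def inv_sqrt(S):
        # Symmetric inverse square root via eigendecomposition
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(np.maximum(w, 1e-12))) @ V.T

    Wy, Wx = inv_sqrt(Syy), inv_sqrt(Sxx)
    # SVD of the coherence matrix: singular values = canonical correlations
    U, s, Vt = np.linalg.svd(Wy @ Syx @ Wx)
    Ur, Vr = U[:, :rank], Vt[:rank, :].T          # top canonical directions
    # Map back out of canonical coordinates
    return Wy @ Ur @ np.diag(s[:rank]) @ Vr.T @ np.linalg.inv(Wx)
```

Truncating the rank acts as regularization, which is why the reduced-rank model generalizes better than full regression when training data are limited.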
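The per-state MAP estimate of the articulatory parameters is the standard linear-Gaussian posterior mean. A minimal sketch, assuming the state's trained linear mapping is a matrix H, Q is the covariance of the approximation error, and the Gaussian prior on x has parameters mu_x, Sigma_x estimated during training; the function name is hypothetical.

```python
import numpy as np

def map_articulatory_estimate(y, H, Q, mu_x, Sigma_x):
    """MAP estimate of articulatory parameters x from audiovisual features y
    under the state-conditional model y = H x + e, e ~ N(0, Q), with
    Gaussian prior x ~ N(mu_x, Sigma_x)."""
    Qi = np.linalg.inv(Q)
    Si = np.linalg.inv(Sigma_x)
    A = H.T @ Qi @ H + Si             # posterior precision
    b = H.T @ Qi @ y + Si @ mu_x      # information vector
    return np.linalg.solve(A, b)      # posterior mean = MAP estimate
```

With an uninformative prior (Si -> 0) this reduces to weighted least squares; the prior term pulls the estimate toward typical articulatory configurations when the observation is ambiguous.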