Page 1 Audiovisual Speech Analysis Ouisper Project - Silent Speech Interface

Page 2 - NOLISP 2007, Paris, 23 May 2007

Ouisper (1) - Silent Speech Interface

- A sensor-based system allowing speech communication via the standard articulators, but without glottal activity
- Two distinct types of application:
  - an alternative to tracheo-oesophageal speech (TES) for persons who have undergone a tracheotomy
  - a "silent telephone" for use in situations where quiet must be maintained, or for communication in very noisy environments
- Speech synthesis from ultrasound and optical imagery of the tongue and lips

(1) Oral Ultrasound synthetIc SPEech souRce

Page 3 - Ouisper - System Overview

Training: ultrasound video of the vocal tract, optical video of the speaker's lips, and the recorded audio are collected into an audio-visual speech corpus; the audio is aligned with its text, and visual features are extracted from the images.

Test: visual data alone is decoded by the visual speech recognizer into N-best phonetic or ALISP targets, which drive visual unit selection followed by audio unit concatenation.

Page 4 - Ouisper - Training Data

Page 5 - Ouisper - Video Stream Coding

T. Hueber, G. Aversano, G. Chollet, B. Denby, G. Dreyfus, Y. Oussar, P. Roussel, M. Stone, "EigenTongue Feature Extraction for an Ultrasound-Based Silent Speech Interface," IEEE International Conference on Acoustics, Speech and Signal Processing, Honolulu, Hawaii, USA.

- Build a subset of typical frames
- Perform PCA to obtain the eigenvectors ("EigenTongues")
- Code new frames with their projections onto the set of eigenvectors
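The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration of PCA-based frame coding, not the project's implementation; the function and variable names are invented for the example, and frames are assumed to be flattened pixel vectors.

```python
import numpy as np

def fit_eigentongues(frames, n_components):
    """Fit a PCA basis ("EigenTongues") from a subset of typical
    ultrasound frames, each flattened to a 1-D pixel vector."""
    X = np.asarray(frames, dtype=float)      # shape (n_frames, n_pixels)
    mean = X.mean(axis=0)
    # SVD of the centred data yields the principal axes directly
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]           # rows of vt are the eigenvectors

def encode_frame(frame, mean, basis):
    """Code a new frame by its projections onto the EigenTongue basis."""
    return basis @ (np.ravel(frame).astype(float) - mean)
```

Each incoming frame is thus reduced from thousands of pixels to a short vector of projection coefficients, which serves as the visual feature for recognition.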

Page 6 - Ouisper - Audio Stream Coding

ALISP segmentation:
- Detection of quasi-stationary parts in the parametric representation of speech
- Assignment of segments to classes using unsupervised classification techniques

Phonetic segmentation:
- Forced alignment of speech with the text
- Requires a relevant and correct phonetic transcription of the uttered signal

Corpus-based synthesis:
- Requires a preliminary segmental description of the signal
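The ALISP-style steps above (find quasi-stationary stretches, then label them without supervision) can be caricatured as follows. This is only a sketch under simplifying assumptions: segment boundaries are placed where the frame-to-frame distance jumps, and a plain k-means on segment means stands in for the actual unsupervised classification; all names are hypothetical.

```python
import numpy as np

def segment_quasi_stationary(feats, threshold):
    """Split a (T, D) feature sequence wherever the frame-to-frame
    Euclidean distance exceeds `threshold`; the runs between the
    cuts are the quasi-stationary parts."""
    dists = np.linalg.norm(np.diff(feats, axis=0), axis=1)
    bounds = [0] + [t + 1 for t, d in enumerate(dists) if d > threshold] + [len(feats)]
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]

def assign_classes(feats, segments, n_classes, n_iter=20):
    """Assign each segment to a class by k-means on its mean feature
    vector (a simple stand-in for unsupervised classification)."""
    means = np.array([feats[a:b].mean(axis=0) for a, b in segments])
    centers = means[:n_classes].copy()       # deterministic initialisation
    for _ in range(n_iter):
        labels = np.argmin(((means[:, None] - centers) ** 2).sum(-1), axis=1)
        for k in range(n_classes):
            if np.any(labels == k):
                centers[k] = means[labels == k].mean(axis=0)
    return labels
```

The resulting class sequence plays the role that a phone sequence plays in phonetic segmentation, but requires no transcription of the signal.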

Page 7 - Audiovisual Dictionary Building

- Visual and acoustic data are synchronously recorded
- The audio segmentation is used to bootstrap the visual speech recognizer, yielding the audiovisual dictionary

Page 8 - Visuo-Acoustic Decoding

- Visual speech recognition:
  - Train an HMM model for each visual class, using multistream-based learning techniques
  - Perform a "visuo-phonetic" decoding step, using the N-best list
  - Introduce linguistic constraints: language model, dictionary, multigrams
- Corpus-based speech synthesis:
  - Combine probabilistic and data-driven approaches in the audiovisual unit selection step
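One common way to "introduce linguistic constraints" on an N-best list is to rescore each hypothesis with a language model and re-rank. The sketch below is an assumption-laden toy (add-alpha bigram phone LM, hypothetical names), not the project's decoder, but it shows the mechanics.

```python
import math
from collections import Counter

def make_bigram_lm(corpus, alpha=0.1):
    """Train a tiny add-alpha bigram phone LM from a list of phone
    sequences (hypothetical training data)."""
    bi, uni, vocab = Counter(), Counter(), set()
    for seq in corpus:
        toks = ["<s>"] + list(seq)
        vocab.update(toks)
        for a, b in zip(toks, toks[1:]):
            bi[(a, b)] += 1
            uni[a] += 1
    v = len(vocab) + 1
    def logprob(seq):
        toks = ["<s>"] + list(seq)
        return sum(math.log((bi[(a, b)] + alpha) / (uni[a] + alpha * v))
                   for a, b in zip(toks, toks[1:]))
    return logprob

def rescore_nbest(nbest, lm_logprob, lm_weight=1.0):
    """Pick, from an N-best list of (phones, visual_logprob) pairs,
    the hypothesis maximising visual score + weighted LM score."""
    return max(nbest, key=lambda h: h[1] + lm_weight * lm_logprob(h[0]))[0]
```

A hypothesis that scores slightly worse on the visual stream can thus still win if it is far more plausible as a phone sequence, which is exactly what a dictionary or language-model constraint buys.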

Page 9 - Speech Recognition from Video-Only Data

Ref: ow p ax n y uh r b uh k t uw dh ax f er s t p ey jh
     (Open your book to the first page)
Rec: ax w ih y uh r b uh k sh uw dh ax v er s p ey jh
     (A wear your book shoe the verse page)

Corpus-based synthesis driven by the predicted phonetic lattice is currently under study.
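Outputs like the Ref/Rec pair above are conventionally scored with a phone error rate, i.e. the Levenshtein edit distance between reference and recognised phone strings divided by the reference length. The slide does not report a number, so the sketch below only shows how such a score would be computed; the function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution / match
        prev = cur
    return prev[-1]

def phone_error_rate(ref, hyp):
    """PER = edit distance / number of reference phones."""
    return edit_distance(ref, hyp) / len(ref)
```

Running it on the slide's two transcriptions gives a PER strictly between 0 and 1, quantifying how much of the phone string the video-only recognizer recovers.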

Page 10 - Ouisper - Conclusion

- More information on –
- Contacts