Published by Kerry Bennett. Modified over 9 years ago.
Page 1: Audiovisual Speech Analysis
Ouisper Project - Silent Speech Interface
Page 2: Ouisper (1) - Silent Speech Interface
NOLISP 2007, Montréal, Paris, 23 May 2007
• Sensor-based system allowing speech communication via the standard articulators, but without glottal activity
• Two distinct types of application
  – an alternative to tracheo-oesophageal speech (TES) for persons having undergone a tracheotomy
  – a "silent telephone" for use in situations where quiet must be maintained, or for communication in very noisy environments
• Speech synthesis from ultrasound and optical imagery of the tongue and lips
(1) Ouisper: Oral Ultrasound synthetIc SPEech souRce
Page 3: Ouisper - System Overview
[Block diagram; recoverable labels listed by stage]
TRAINING: Ultrasound video of the vocal tract; Optical video of the speaker's lips; Recorded audio; Text; Speech Alignment; Visual Feature Extraction; Audio-Visual Speech Corpus
TEST: Visual Data; Visual Speech Recognizer; N-best Phonetic or ALISP Targets; Visual Unit Selection; Audio Unit Concatenation
Page 4: Ouisper - Training Data
Page 5: Ouisper - Video Stream Coding
Reference: T. Hueber, G. Aversano, G. Chollet, B. Denby, G. Dreyfus, Y. Oussar, P. Roussel, M. Stone, "EigenTongue Feature Extraction for an Ultrasound-based Silent Speech Interface," IEEE International Conference on Acoustics, Speech and Signal Processing, Honolulu, Hawaii, USA, 2007.
[Figure: Eigenvectors]
• Build a subset of typical frames
• Perform PCA
• Code new frames with their projections onto the set of eigenvectors
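The three coding steps on this slide can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`fit_eigentongues`, `encode`, `decode`), the 16x16 frame size, the random stand-in frames, and the number of retained components are all invented for the example.

```python
import numpy as np

def fit_eigentongues(frames, n_components):
    """Run PCA on a stack of typical frames with shape (n_frames, height, width)."""
    X = frames.reshape(len(frames), -1).astype(np.float64)
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions of the centered data,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def encode(frame, mean, components):
    """Code a frame by its projections onto the retained eigenvectors."""
    return components @ (frame.reshape(-1).astype(np.float64) - mean)

def decode(code, mean, components, shape):
    """Approximate a frame from its projection coefficients."""
    return (mean + components.T @ code).reshape(shape)

# Synthetic stand-in for the subset of typical ultrasound frames (assumption).
rng = np.random.default_rng(0)
typical = rng.random((50, 16, 16))

mean, eig = fit_eigentongues(typical, n_components=10)
code = encode(typical[0], mean, eig)  # 10 coefficients describe one frame
```

Each frame is reduced from 256 pixel values to `n_components` coefficients, which can then be used as visual features; how many components to keep is a tuning choice that the slide does not specify.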
Page 6: Ouisper - Audio Stream Coding
• ALISP segmentation
  – Detection of quasi-stationary parts in the parametric representation of speech
  – Assignment of segments to classes using unsupervised classification techniques
• Phonetic segmentation
  – Forced alignment of speech with the text
  – Requires a relevant and correct phonetic transcription of the uttered signal
• Corpus-based synthesis
  – Requires a preliminary segmental description of the signal
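The two ALISP steps (find quasi-stationary stretches, then group them without labels) can be illustrated with a deliberately crude stand-in. The slide does not describe the ALISP tools themselves, so everything here is an assumption: a distance threshold between consecutive feature frames plays the role of the stationarity detector, a small k-means plays the role of the unsupervised classifier, and the 2-D toy features, threshold, and `k` are invented.

```python
import math

def segment(features, threshold):
    """Return (start, end) index pairs of quasi-stationary runs."""
    boundaries = [0]
    for t in range(1, len(features)):
        # A large jump between consecutive frames ends the current run.
        if math.dist(features[t - 1], features[t]) > threshold:
            boundaries.append(t)
    boundaries.append(len(features))
    return list(zip(boundaries[:-1], boundaries[1:]))

def mean_vector(vectors):
    return tuple(sum(column) / len(vectors) for column in zip(*vectors))

def nearest(point, centers):
    return min(range(len(centers)), key=lambda c: math.dist(point, centers[c]))

def kmeans_labels(points, k, iterations=10):
    """Plain k-means; starts from the first k points to stay deterministic."""
    centers = list(points[:k])
    for _ in range(iterations):
        labels = [nearest(p, centers) for p in points]
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = mean_vector(members)
    return [nearest(p, centers) for p in points]

# Toy parametric representation: four stationary stretches, two sound types.
features = ([(0.0, 0.0)] * 3 + [(5.0, 5.0)] * 3
            + [(0.1, 0.0)] * 3 + [(5.0, 5.1)] * 2)

segments = segment(features, threshold=1.0)
segment_means = [mean_vector(features[a:b]) for a, b in segments]
labels = kmeans_labels(segment_means, k=2)  # one class index per segment
```

The resulting class indices act as automatically derived unit labels, which is what lets this route work without the phonetic transcription that forced alignment requires.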
Page 7: Audiovisual dictionary building
• Visual and acoustic data are recorded synchronously
• Audio segmentation is used to bootstrap the visual speech recognizer
[Figure: Audiovisual dictionary]
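Because the two streams are recorded synchronously, segment boundaries found on the audio can be converted directly into video frame indices. The sketch below shows one way such a dictionary could be assembled. The frame rate, sampling rate, segment labels, and timestamps are assumptions chosen for illustration, and `build_dictionary` is a hypothetical helper, not part of the Ouisper software.

```python
from collections import defaultdict

VIDEO_FPS = 30      # assumed video frame rate (not given on the slide)
AUDIO_RATE = 16000  # assumed audio sampling rate in Hz (not given on the slide)

def build_dictionary(segments, visual_features, audio_samples):
    """Map each segment label to its paired visual and audio slices.

    segments: iterable of (label, start_seconds, end_seconds) from the
    audio segmentation; both streams are assumed to start at time zero.
    """
    dictionary = defaultdict(list)
    for label, start_s, end_s in segments:
        v0, v1 = round(start_s * VIDEO_FPS), round(end_s * VIDEO_FPS)
        a0, a1 = round(start_s * AUDIO_RATE), round(end_s * AUDIO_RATE)
        dictionary[label].append({
            "visual": visual_features[v0:v1],
            "audio": audio_samples[a0:a1],
        })
    return dictionary

# One second of toy data: integers stand in for feature vectors and samples.
visual_features = list(range(30))
audio_samples = list(range(16000))
segments = [("ow", 0.0, 0.2), ("p", 0.2, 0.3)]  # invented labels and times

dictionary = build_dictionary(segments, visual_features, audio_samples)
```

The visual slices labeled this way are what a visual recognizer could be trained on, while the audio slices remain available as units for later concatenation.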
Page 8: Visuo-acoustic decoding
• Visual speech recognition
  – Train an HMM for each visual class
      · Use multistream-based learning techniques
  – Perform a "visuo-phonetic" decoding step
      · Use an N-best list
      · Introduce linguistic constraints: language model, dictionary, multigrams
• Corpus-based speech synthesis
  – Combine probabilistic and data-driven approaches in the audiovisual unit selection step
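The slide does not say how the unit selection step is scored. A common formulation in corpus-based synthesis, shown here only as an assumed illustration, picks one candidate unit per target position by dynamic programming over a target cost (how well a unit fits the recognizer output) and a join cost (how smoothly adjacent units concatenate). The unit tuples, the cost functions, and all numbers below are hypothetical.

```python
def select_units(candidates, target_cost, join_cost):
    """Pick one unit per position minimizing total target + join cost.

    candidates: list (one entry per target position) of lists of units.
    """
    # best[i][j] = (cumulative cost, backpointer) for candidate j at position i
    best = [[(target_cost(0, unit), None) for unit in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for unit in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + join_cost(prev, unit), k)
                for k, prev in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(i, unit), back))
        best.append(row)

    # Backtrack from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return path[::-1], total

# Hypothetical units: (name, visual mismatch score, pitch in Hz).
candidates = [
    [("ow_1", 0.2, 100), ("ow_2", 0.1, 140)],
    [("p_1", 0.3, 105), ("p_2", 0.1, 180)],
    [("ax_1", 0.2, 110)],
]

def target_cost(position, unit):
    return unit[1]  # stands in for a recognizer-derived score

def join_cost(left, right):
    return abs(left[2] - right[2]) / 100  # penalize pitch jumps

path, total = select_units(candidates, target_cost, join_cost)
names = [unit[0] for unit in path]
```

In this toy case the search prefers `ow_1` and `p_1` even though `ow_2` and `p_2` have lower individual mismatch scores, because their pitch jumps make the joins expensive. That trade-off is one way to read "probabilistic and data-driven" on the slide, but the weighting used in Ouisper is not given.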
Page 9: Speech recognition from video-only data
Ref: Open your book to the first page
     ow p ax n y uh r b uh k t uw dh ax f er s t p ey jh
Rec: A wear your book shoe the verse page
     ax w ih y uh r b uh k sh uw dh ax v er s p ey jh
Corpus-based synthesis driven by the predicted phonetic lattice is currently under study.
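One common way to quantify how far the recognized phone string is from the reference is the Levenshtein (edit) distance between the two sequences, normalized by the reference length. The slide reports no such score, so the computation below is an added illustration applied to the single example sentence, with each substitution, insertion, and deletion counted as one error.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference phone
                curr[j - 1] + 1,         # insertion of a recognized phone
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = curr
    return prev[-1]

ref = "ow p ax n y uh r b uh k t uw dh ax f er s t p ey jh".split()
rec = "ax w ih y uh r b uh k sh uw dh ax v er s p ey jh".split()

errors = edit_distance(ref, rec)
phone_error_rate = errors / len(ref)
```

For this one sentence the distance is 7 edits over 21 reference phones, a phone error rate of about 33%. A single utterance says little about overall system accuracy.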
Page 10: Ouisper - Conclusion
• More information: http://www.neurones.espci.fr/ouisper/
• Contacts
  – gerard.chollet@enst.fr
  – denby@ieee.org
  – hueber@ieee.org