
Slide 1: Audiovisual Speech Analysis. Ouisper Project: Silent Speech Interface (NOLISP 2007, 23 May 2007)

Slide 2: Ouisper(1) - Silent Speech Interface

- A sensor-based system allowing speech communication via the standard articulators, but without glottal activity
- Two distinct types of application:
  - an alternative to tracheo-oesophageal speech (TES) for persons who have undergone a tracheotomy
  - a "silent telephone" for use in situations where quiet must be maintained, or for communication in very noisy environments
- Speech synthesis from ultrasound and optical imagery of the tongue and lips

(1) Oral Ultrasound synthetIc SPEech souRce

Slide 3: Ouisper - System Overview

[System block diagram] Training: ultrasound video of the vocal tract, optical video of the speaker's lips, and the recorded audio are time-aligned with the text (speech alignment); visual feature extraction then yields an audio-visual speech corpus. Test: visual data are decoded by a visual speech recognizer into N-best phonetic or ALISP targets, which drive visual unit selection and audio unit concatenation.

Slide 4: Ouisper - Training Data

Slide 5: Ouisper - Video Stream Coding

EigenTongue feature extraction (a minimal sketch follows this slide):
- Build a subset of typical frames
- Perform PCA on this subset to obtain the eigenvectors ("EigenTongues")
- Code new frames by their projections onto the set of eigenvectors

Reference: T. Hueber, G. Aversano, G. Chollet, B. Denby, G. Dreyfus, Y. Oussar, P. Roussel, M. Stone, "EigenTongue Feature Extraction for an Ultrasound-Based Silent Speech Interface," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Honolulu, Hawaii, USA, 2007.
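The EigenTongue coding step amounts to PCA over raw ultrasound frames followed by projection of each new frame onto the retained eigenvectors. The sketch below (Python, numpy/scikit-learn) only illustrates that idea; the frame size, number of components, and function names are assumptions, not the authors' implementation.

    # Minimal EigenTongue-style feature extraction sketch (illustrative only, not
    # the authors' code). Assumes ultrasound frames arrive as fixed-size grayscale
    # numpy arrays; frame size and component count are made up for the example.
    import numpy as np
    from sklearn.decomposition import PCA

    def fit_eigentongues(typical_frames, n_components=30):
        # typical_frames: array of shape (n_frames, height, width).
        # Flatten each frame into a vector and fit PCA; the principal
        # components play the role of the "EigenTongues".
        n_frames = typical_frames.shape[0]
        flat = typical_frames.reshape(n_frames, -1).astype(np.float64)
        return PCA(n_components=n_components).fit(flat)

    def encode_frame(pca, frame):
        # Code a new frame by its projection onto the EigenTongue basis.
        return pca.transform(frame.reshape(1, -1).astype(np.float64))[0]

    # Usage with random data standing in for real ultrasound frames
    rng = np.random.default_rng(0)
    frames = rng.random((200, 64, 64))             # 200 "typical" frames
    pca = fit_eigentongues(frames, n_components=30)
    features = encode_frame(pca, rng.random((64, 64)))
    print(features.shape)                           # (30,)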

Slide 6: Ouisper - Audio Stream Coding

ALISP segmentation:
- Detection of quasi-stationary parts in a parametric representation of the speech (a toy sketch follows this slide)
- Assignment of segments to classes using unsupervised classification techniques

Phonetic segmentation:
- Forced alignment of the speech with its text
- Requires a relevant and correct phonetic transcription of the uttered signal

Either way, corpus-based synthesis needs this preliminary segmental description of the signal.
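To make the "quasi-stationary parts" idea concrete, here is a toy boundary detector over a parametric representation such as an MFCC matrix. It is not the ALISP toolchain; the threshold and synthetic data are assumptions for illustration only.

    # Toy sketch of quasi-stationary segment detection on a parametric
    # representation of speech (e.g. an MFCC matrix). Illustrative only; this
    # is not the ALISP toolchain, and the threshold is arbitrary.
    import numpy as np

    def detect_segments(features, threshold=0.5):
        # features: array of shape (n_frames, n_coeffs), one row per analysis frame.
        # A boundary is placed wherever consecutive frames differ by more than
        # the threshold; each (start, end) pair is a candidate segment.
        jumps = np.linalg.norm(np.diff(features, axis=0), axis=1)
        cuts = [0] + [i + 1 for i, d in enumerate(jumps) if d > threshold] + [len(features)]
        return list(zip(cuts[:-1], cuts[1:]))

    # Synthetic piecewise-constant features standing in for real MFCCs:
    # four blocks of 50 identical frames should yield roughly four segments.
    blocks = np.random.default_rng(0).random((4, 13))
    feats = np.vstack([np.tile(v, (50, 1)) for v in blocks])
    print(detect_segments(feats))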

Slide 7: Audiovisual Dictionary Building

- Visual and acoustic data are recorded synchronously
- The audio segmentation is used to bootstrap the visual speech recognizer (see the sketch below)
- The result is the audiovisual dictionary
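Because the streams are recorded synchronously, segment labels obtained from the audio alignment can be transferred to the video stream simply by converting segment times into frame indices. The sketch below illustrates that bootstrapping step; the frame rate, the "sil" filler label, and the function name are assumptions made for the example.

    # Minimal sketch of the bootstrapping idea: segment labels from the audio
    # alignment are copied onto the synchronized video stream. Illustrative only.
    from typing import List, Tuple

    def audio_segments_to_video_labels(segments: List[Tuple[float, float, str]],
                                       n_video_frames: int,
                                       video_fps: float = 60.0) -> List[str]:
        # segments: (start_time_s, end_time_s, label) triples from the audio alignment.
        # Returns one label per frame of the synchronized ultrasound/lip video.
        labels = ["sil"] * n_video_frames
        for start, end, label in segments:
            first = max(0, int(round(start * video_fps)))
            last = min(n_video_frames, int(round(end * video_fps)))
            for i in range(first, last):
                labels[i] = label
        return labels

    # Example: three phone segments projected onto half a second of 60 fps video
    print(audio_segments_to_video_labels(
        [(0.0, 0.10, "ow"), (0.10, 0.25, "p"), (0.25, 0.40, "ax")], 30))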

Slide 8: Visuo-Acoustic Decoding

- Visual speech recognition:
  - Train an HMM for each visual class, using multistream-based learning techniques (a sketch of per-class HMM training follows this slide)
  - Perform a "visuo-phonetic" decoding step using an N-best list
  - Introduce linguistic constraints: language model, dictionary, multigrams
- Corpus-based speech synthesis:
  - Combine a probabilistic and a data-driven approach in the audiovisual unit selection step
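As a rough illustration of "one HMM per visual class", the sketch below trains Gaussian HMMs on visual feature sequences and scores a test sequence against each model. The hmmlearn toolkit, the state count, and the isolated-unit scoring are assumptions; the presentation does not name a toolkit, and the actual system performs multistream continuous decoding rather than isolated classification.

    # Minimal sketch of per-class HMM training with hmmlearn (an assumption,
    # not the project's actual recognizer).
    import numpy as np
    from hmmlearn.hmm import GaussianHMM

    def train_class_hmms(sequences_by_class, n_states=3):
        # sequences_by_class: dict mapping a visual class label to a list of
        # (n_frames, n_features) arrays of visual features (e.g. EigenTongue
        # coefficients).
        models = {}
        for label, seqs in sequences_by_class.items():
            X = np.concatenate(seqs)
            lengths = [len(s) for s in seqs]
            model = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=20)
            model.fit(X, lengths)
            models[label] = model
        return models

    def classify(models, sequence):
        # Pick the class whose HMM assigns the highest log-likelihood to the sequence.
        return max(models, key=lambda label: models[label].score(sequence))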

Slide 9: Speech Recognition from Video-Only Data

Example of recognition output (phone strings and corresponding words):

Ref (phones): ow p ax n y uh r b uh k t uw dh ax f er s t p ey jh
Rec (phones): ax w ih y uh r b uh k sh uw dh ax v er s p ey jh
Ref (words): "Open your book to the first page"
Rec (words): "A wear your book shoe the verse page"

Corpus-based synthesis driven by the predicted phonetic lattice is currently under study. (A minimal scoring sketch for such Ref/Rec pairs follows this slide.)
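Ref/Rec phone pairs like the one above are normally compared by edit distance. The slide itself reports no error rate, so the sketch below is only a worked example of how such a comparison is computed.

    # Minimal Levenshtein-distance sketch for scoring a recognized phone string
    # against the reference. Illustrative only; no error rate is claimed here.
    def edit_distance(ref, rec):
        d = [[0] * (len(rec) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(rec) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(rec) + 1):
                d[i][j] = min(d[i - 1][j] + 1,                                  # deletion
                              d[i][j - 1] + 1,                                  # insertion
                              d[i - 1][j - 1] + (ref[i - 1] != rec[j - 1]))     # substitution
        return d[-1][-1]

    ref = "ow p ax n y uh r b uh k t uw dh ax f er s t p ey jh".split()
    rec = "ax w ih y uh r b uh k sh uw dh ax v er s p ey jh".split()
    print(edit_distance(ref, rec), "edits against", len(ref), "reference phones")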

Slide 10: Ouisper - Conclusion

- More information: http://www.neurones.espci.fr/ouisper/
- Contacts:
  - gerard.chollet@enst.fr
  - denby@ieee.org
  - hueber@ieee.org

