LORIA Irina Illina Dominique Fohr Christophe Cerisara Torino Meeting March 9-10, 2006.



HIWIRE
– Work package 1: Missing Data
– Work package 2: Non-Native Speech Recognition

WP1: Missing Data
A new approach for noisy speech recognition, in two steps:
– Training of mask models
– Recognition with masks

Missing Data: Training of Mask Models

Computation of mask vectors ("oracle") for each frame
– Spectrum with cube-root compression
– Spectra computed for clean data and noisy data
– For each frequency band f (1..12): if SNR > 0 dB then mask(f) = 0, else mask(f) = 1

Clustering of mask vectors
– Euclidean distance
– N clusters (N = 31): each element of a cluster is represented by its mask vector and the corresponding frame vector (MFCC)

Training of one GMM per cluster
– Observations: the MFCC+Δ vectors associated with the cluster's frames

Training of one ergodic HMM (N states)
– Each state is one of the previous GMMs
– Only the state transition probabilities are trained
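The oracle-mask computation above can be sketched as follows. This is a minimal illustration, not the project's code: the function name, the simple spectral-subtraction noise estimate, and operating on already band-grouped energies are assumptions.

```python
import numpy as np

def oracle_masks(clean_spec, noisy_spec, snr_threshold_db=0.0):
    """Binary "oracle" mask vectors, one per frame.

    clean_spec, noisy_spec: (n_frames, n_bands) band energies of the
    parallel clean and noisy signals (12 bands in the slides).
    Per the slides: mask(f) = 0 if the band SNR > 0 dB (reliable),
    mask(f) = 1 otherwise (noise-dominated, i.e. missing).
    """
    # Crude per-band noise estimate from the parallel clean/noisy data.
    noise = np.maximum(noisy_spec - clean_spec, 1e-10)
    snr_db = 10.0 * np.log10(np.maximum(clean_spec, 1e-10) / noise)
    return (snr_db <= snr_threshold_db).astype(int)
```

These per-frame mask vectors are what gets clustered (Euclidean distance, N = 31) in the next step.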

Missing Data: Recognition with Masks

Compute a mask vector for each frame
– MFCC coefficients
– Viterbi alignment using the ergodic HMM
– Each frame -> one state -> one mask

Perform marginalization with the masked frames
– Spectrum with cube-root compression
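With diagonal-covariance Gaussians, marginalising a masked band amounts to dropping that band's term from each mixture's log-density. A sketch of that likelihood computation (the function name and argument layout are illustrative assumptions, not the project's implementation):

```python
import numpy as np

def marginal_loglik(frame, mask, means, variances, weights):
    """Frame log-likelihood under a diagonal-covariance GMM,
    integrating out the bands flagged as missing (mask == 1).

    frame: (n_bands,); means, variances: (n_mix, n_bands); weights: (n_mix,)
    """
    reliable = mask == 0
    d = frame[reliable]
    mu = means[:, reliable]
    var = variances[:, reliable]
    # Per-mixture log-density computed over the reliable bands only.
    log_comp = (np.log(weights)
                - 0.5 * np.sum(np.log(2.0 * np.pi * var), axis=1)
                - 0.5 * np.sum((d - mu) ** 2 / var, axis=1))
    # Numerically stable log-sum-exp over the mixtures.
    m = log_comp.max()
    return m + np.log(np.sum(np.exp(log_comp - m)))
```

Masking a band simply removes its contribution, so a fully reliable frame reduces to the usual GMM log-likelihood.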

Missing Data: Experiments

Training
– Aurora2
– 4 noises (test A), 4 SNR levels (5 to 20 dB)

Test
– Aurora2
– Test A and Test B

Missing Data: Experiments

[Results table: recognition results for the Baseline (multi-style) system vs. the Missing Data system, each on Test A and Test B; rows: clean, four SNR conditions, Average. The numeric values did not survive the slide export.]

WP2: Non-Native Speech Recognition

Method based on phone confusion (presented at the Granada meeting)
– Extract confusion rules between English phones and the native acoustic models
– English phone -> French phone, e.g. ah -> a (one English phone may map to several French phones)

Method based on graphemic constraint (presented at the Athens meeting)
– The pronunciation of a phone depends on the word's graphemes
– English phone [grapheme] -> French phone
– e.g. ah [A] -> a (Approach), while ah [E] maps to a different French phone (cancEl)
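One way to read the confusion-rule idea is as pronunciation expansion: each English phone keeps its own model and gains the confusable native phones as alternatives (the slides realise this inside the HMM topology instead of the lexicon). A sketch, with the rule format and function name as assumptions:

```python
from itertools import product

def expand_pronunciation(phones, rules):
    """All non-native variants of an English pronunciation.

    rules: {english_phone: [confusable_native_phones...]}.
    At each position the original English phone is kept and its
    confusable native phones are added as alternatives, mirroring
    the extra paths added to the acoustic model.
    """
    options = [[p] + rules.get(p, []) for p in phones]
    return [list(variant) for variant in product(*options)]
```

For example, `expand_pronunciation(["ah", "p", "r", "ow", "ch"], {"ah": ["a"]})` yields the canonical pronunciation plus a variant whose first phone is the French /a/.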

Non-Native Speech Recognition: Method Based on Graphemic Constraint

Idea:
– Example 1: APPROACH /ah p r ow ch/ → (A, ah) (PP, p) (R, r) (OA, ow) (CH, ch)
– Example 2: POSITION /p ah z ih sh ah n/ → (P, p) (O, ah) (S, z) (I, ih) (TI, sh) (O, ah) (N, n)

Alignment between graphemes and phones for each word of the lexicon
– Using a discrete HMM
– Each state of the HMM is a phone symbol

Lexicon modification: add the graphemes for each word (as in examples 1 and 2)

Confusion rule extraction: (grapheme, English phone) → list of non-native phones
– Example: (A, ah) → a

Confusion rule integration into the acoustic models

Recognition
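Once graphemes and phones are aligned, extracting the (grapheme, English phone) → non-native phone rules is essentially counting over the aligned data. A sketch, assuming aligned triples as input and a hypothetical `min_count` pruning threshold (neither the triple format nor the threshold comes from the slides):

```python
from collections import Counter, defaultdict

def extract_graphemic_rules(aligned_words, min_count=2):
    """Collect (grapheme, english_phone) -> [non-native phones] rules.

    aligned_words: iterable of words, each a list of triples
    (grapheme, english_phone, native_phone), i.e. the lexicon
    graphemes aligned with the canonical English phones and with
    the phones decoded from the non-native speech.
    Rules observed fewer than min_count times are discarded.
    """
    counts = defaultdict(Counter)
    for word in aligned_words:
        for grapheme, eng_phone, native_phone in word:
            counts[(grapheme, eng_phone)][native_phone] += 1
    # Keep only the sufficiently frequent (grapheme, phone) rules.
    return {key: [p for p, c in ctr.items() if c >= min_count]
            for key, ctr in counts.items()
            if any(c >= min_count for c in ctr.values())}
```

On the slides' example, repeated observations of (A, ah) realised as /a/ would yield the rule (A, ah) → a, while one-off confusions are pruned.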

[Figure: example of acoustic model modification for an English phone. The extracted rules map the English phone to one or more French phones, and the HMM of the English model is modified to add alternative paths through the corresponding French phone models. The phone symbols in the figure did not survive the slide export.]

Experiments: HIWIRE Database

Training of the French acoustic models: Broadcast News corpus
Training of the English acoustic models: TIMIT
Non-native speech recognition: 50 sentences per speaker for rule extraction, 50 sentences per speaker for test

[Results table: WER and SER for French, Italian and Spanish speakers, under the Thales grammar and a word-loop grammar, each with three systems: baseline, confusion, graphemes + confusion. The numeric values did not survive the slide export.]

Questions about the Prototype
– Which noise robustness approaches will be put in the prototype?
– Which speaker robustness approaches will be put in the prototype?
– How to integrate the noise and speaker robustness approaches at the same time?
– Which grammar to use: the Thales grammar or a large-vocabulary grammar?
– Real-time recognition?