Experiments on the "stir-sir" paradigm using large vocabulary ASR. Kalle Palomäki, Adaptive Informatics Research Centre, Helsinki University of Technology.


2 Introduction Aim: test a large vocabulary ASR system on the stir-sir paradigm Motivation: a large vocabulary ASR system has learned phoneme models that approximate human phoneme categories ASR: a newly trained English-English large vocabulary recogniser –Trained on read Wall Street Journal articles –Sampling rate 16 kHz

3 ASR details Standard features: Mel-frequency cepstral coefficients (MFCCs) + power + deltas + accelerations Triphone HMMs with acoustic likelihoods modelled by Gaussian mixture models Supervised adaptation using constrained maximum likelihood linear regression (CMLLR) –Can be formulated as a linear feature transformation
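The CMLLR-as-feature-transformation point above can be sketched as follows. This is a minimal illustration, not the actual recogniser's code: the function name is hypothetical, and the transform matrix A and bias b are placeholders (in practice they are estimated by maximum likelihood from the adaptation data).

```python
import numpy as np

def apply_cmllr(features, A, b):
    """Apply a constrained MLLR transform to each feature frame: x' = A x + b.

    `features` is a (frames x dims) array of MFCC-based feature vectors;
    A and b would normally be estimated from adaptation data.
    """
    return features @ A.T + b

# Illustration with an identity transform, i.e. no adaptation:
rng = np.random.default_rng(0)
frames = rng.standard_normal((100, 39))  # e.g. 13 MFCCs + deltas + accelerations
A, b = np.eye(39), np.zeros(39)
adapted = apply_cmllr(frames, A, b)
```

Because the adaptation acts purely in feature space, the acoustic models themselves stay untouched, which is what makes CMLLR attractive here.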

4 Experiments Three measures tested: –Free recognition result –Forced choice by the recogniser between "next_you'll_get_sir_to_click_on" and "next_you'll_get_stir_to_click_on" –Temporally averaged log-probability of "t"
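The latter two measures can be sketched with toy numbers. The per-utterance and per-frame scores below are invented placeholders standing in for recogniser output, and the function names are hypothetical:

```python
import numpy as np

def forced_choice(loglik_sir, loglik_stir):
    """Pick whichever fixed transcription gets the higher total log-likelihood."""
    return "sir" if loglik_sir > loglik_stir else "stir"

def averaged_t_logprob(t_frame_logprobs):
    """Temporally averaged log-probability over the frames aligned to /t/."""
    return float(np.mean(t_frame_logprobs))

# Toy values standing in for recogniser scores:
choice = forced_choice(-1201.5, -1198.3)      # the "stir" sentence scores higher here
avg = averaged_t_logprob([-4.0, -3.0, -5.0])  # mean frame log-probability
```

The averaged /t/ log-probability gives a continuous measure along the stir-sir continuum, which is more sensitive than the binary forced-choice outcome.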

5 Experiments Experiment 1: "dry" models with no adaptation Experiment 2: "dry" models adapted to the matched condition –Near-near adapted with near-near –Far-far adapted with far-far –Supervised adaptation with utterances at the ends of the continuum Experiment 3: "dry" models adapted to both near-near and far-far –Supervised adaptation with utterances at the ends of the continuum

6 Exp. 1: "dry" models, no adaptation Free recognition: –near-near: "nantz two-a-days so far", "nursing care so far" –far-far: "nantz th", "NMS death", " " Forced choice between "next_you'll_get_sir_to_click_on", "next_you'll_get_stir_to_click_on" and the silence model –Near-near: change between conditions 08 and 09 –Far-far: silence chosen throughout

7 Exp. 1: "dry" models, no adaptation (results plot)

8 Exp. 2: "dry" models adapted to the matched condition Free recognition: –Near-near: "next month though the khon" –Far-far: "next he'll throw the khon" Forced choice between "next_you'll_get_sir_to_click_on", "next_you'll_get_stir_to_click_on" and the silence model –Near-near: change between conditions 03 and 04 –Far-far: "sir" throughout

9 Exp. 2: "dry" models adapted to the matched condition (results plot)

10 Exp. 3: "dry" models adapted to both conditions Free recognition: –Near-near: "next month though the khon" –Far-far: "next month khon" or "nantz khon" Forced choice between "next_you'll_get_sir_to_click_on", "next_you'll_get_stir_to_click_on" and the silence model –Switches between the two sentences erratically

11 Exp. 3: "dry" models adapted to both conditions (results plot)

12 Discussion & future directions Results are currently unconvincing –Poor free recognition performance –Especially poor far-far performance –It may be hard to obtain sensitivity similar to that of human listeners Ways to work around the poor performance –Cooke (2006) uses a priori masks to find glimpses of speech –Use the forced choice between two sentences rather than free recognition –Measure log-probability instead of recognition performance Open question: how to model compensation, which is the main issue
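A minimal sketch of an a priori glimpse mask in the spirit of Cooke (2006), assuming access to clean and noise power spectrograms; the arrays, threshold, and function name below are illustrative, not taken from that work:

```python
import numpy as np

def a_priori_mask(clean_power, noise_power, snr_threshold_db=3.0):
    """Binary time-frequency mask: 1 where local SNR (in dB) exceeds the threshold."""
    eps = 1e-10  # avoid division by zero and log of zero
    snr_db = 10.0 * np.log10((clean_power + eps) / (noise_power + eps))
    return (snr_db > snr_threshold_db).astype(int)

# Tiny 2x2 spectrogram patches (rows = time frames, columns = frequency bands):
clean = np.array([[10.0, 0.1], [5.0, 0.01]])
noise = np.ones((2, 2))
mask = a_priori_mask(clean, noise)  # glimpses survive only where speech dominates
```

Such a mask requires knowing the clean signal, so it gives an upper bound rather than a deployable system, which is why it is listed here only as a trick to get around the poor performance.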