M4 speech recognition, University of Sheffield. Martin Karafiát*, Steve Renals, Vincent Wan.

Presentation transcript:

Slide 1: M4 speech recognition
University of Sheffield
Martin Karafiát*, Steve Renals, Vincent Wan

Slide 2: Status at last meeting
No working speech recognition system
Use HTK for speech recognition
– No large vocabulary decoder with HTK
Use Ducoder instead of HVite
– Provided by TUM
Use the SRI LM toolkit (SRILM) for language modelling (see the sketch after this slide)
Build initial models using SWITCHBOARD and ICSI meetings data
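As a concrete illustration of the SRILM step above, here is a minimal sketch of building a trigram language model with ngram-count. The file names (swbd.txt, swbd.3gram.lm) are invented placeholders, not from the slides; the flags are standard SRILM options.

```python
import subprocess

# Build a trigram LM with modified Kneser-Ney smoothing using SRILM.
# File names are illustrative placeholders; the flags are standard
# ngram-count options.
subprocess.run(
    [
        "ngram-count",
        "-text", "swbd.txt",     # training transcripts, one sentence per line
        "-order", "3",           # trigram model
        "-kndiscount",           # modified Kneser-Ney discounting
        "-interpolate",          # interpolate higher- and lower-order estimates
        "-lm", "swbd.3gram.lm",  # output LM in ARPA format
    ],
    check=True,
)
```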

Slide 3: Software limitations
Acoustic modelling: HTK 3.2
– No efficient large vocabulary decoder
– Supports only bigram language models
– HVite, its time-synchronous decoder, is slow
Decoding: Ducoder
– Based on HTK version 2
– Only capable of decoding with word-internal context-dependent triphone models
A trade-off between cross-word triphone models and a trigram language model therefore appears necessary (the toy count below shows why cross-word models are so much more expensive)
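To see why cross-word expansion is costly, the following toy sketch (the lexicon and phone sets are invented, not from the M4 system) counts the distinct triphone types needed for a tiny lexicon under word-internal versus cross-word modelling:

```python
# Toy illustration: count distinct triphone types for a small lexicon
# under word-internal vs. cross-word modelling. Lexicon is invented.
lexicon = {
    "speech": ["s", "p", "iy", "ch"],
    "recognise": ["r", "eh", "k", "ax", "g", "n", "ay", "z"],
    "meeting": ["m", "iy", "t", "ih", "ng"],
}

def word_internal(phones):
    # Context never crosses word boundaries: edge phones see a
    # word-boundary symbol "#" instead of neighbouring words.
    return {(phones[i - 1] if i > 0 else "#",
             p,
             phones[i + 1] if i < len(phones) - 1 else "#")
            for i, p in enumerate(phones)}

internal = set()
for phones in lexicon.values():
    internal |= word_internal(phones)

# Cross-word: any word may follow any other, so each edge phone also
# gets one context variant per possible neighbouring phone.
cross = set(internal)
for p1 in lexicon.values():
    for p2 in lexicon.values():
        cross.add((p1[-2] if len(p1) > 1 else "#", p1[-1], p2[0]))
        cross.add((p1[-1], p2[0], p2[1] if len(p2) > 1 else "#"))

print(f"word-internal triphone types: {len(internal)}")
print(f"cross-word triphone types:    {len(cross)}")
```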

Slide 4: System architecture
Front end
N-best lattice generation: best-first decoding (Ducoder) with a trigram language model (SRILM) and word-internal triphone models
MLLR adaptation (HTK)
Lattice rescoring: time-synchronous decoding (HTK) with cross-word triphone models
Recognition output
(an orchestration sketch of this flow follows)
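The two-pass flow might be orchestrated roughly as follows. This is a hypothetical sketch only: every helper function below (extract_features, ducoder_nbest, estimate_mllr, htk_rescore) is an invented stub standing in for the corresponding tool invocation, not a real API of HTK, SRILM, or Ducoder, and the model file names are placeholders.

```python
# Hypothetical orchestration of the two-pass M4 pipeline. All
# helpers are stubs standing in for real tool invocations.

def extract_features(wav_path):
    """Front end: acoustic feature extraction."""
    raise NotImplementedError("placeholder for the front end")

def ducoder_nbest(feats, am, lm, n):
    """Pass 1: best-first decoding, word-internal triphones + trigram LM."""
    raise NotImplementedError("placeholder for Ducoder")

def estimate_mllr(feats, hypothesis):
    """Estimate MLLR transforms from the first-pass hypothesis (HTK)."""
    raise NotImplementedError("placeholder for HTK adaptation")

def htk_rescore(nbest, feats, am, mllr):
    """Pass 2: time-synchronous rescoring with cross-word triphones (HTK)."""
    raise NotImplementedError("placeholder for HTK decoding")

def transcribe(wav_path):
    feats = extract_features(wav_path)
    nbest = ducoder_nbest(feats, am="word_internal.mmf",
                          lm="trigram.lm", n=100)
    transforms = estimate_mllr(feats, hypothesis=nbest[0])
    return htk_rescore(nbest, feats, am="cross_word.mmf", mllr=transforms)
```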

Slide 5: System limitations
N-best list rescoring is not optimal
Many more hyper-parameters to tune manually (see the rescoring sketch below)
Adaptation must be performed on two sets of acoustic models
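Two of the hand-tuned hyper-parameters are the language model scale and the word insertion penalty used when combining scores during rescoring. A minimal sketch of picking the best hypothesis from an n-best list; the hypotheses, scores, and parameter values are invented for illustration:

```python
# Rescore a toy n-best list by combining acoustic and LM log scores.
# lm_scale and word_penalty are exactly the kind of hyper-parameters
# the slide says must be tuned by hand.
nbest = [
    # (hypothesis, acoustic log score, LM log score)
    ("we will meet on monday", -4520.3, -18.2),
    ("we will meet him monday", -4518.9, -21.7),
    ("we wilt meat on monday", -4515.1, -35.4),
]

def total_score(entry, lm_scale=12.0, word_penalty=-2.0):
    hyp, am, lm = entry
    return am + lm_scale * lm + word_penalty * len(hyp.split())

best = max(nbest, key=total_score)
print(best[0])  # "we will meet on monday" with these toy numbers
```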

Slide 6: Method comparison
Ducoder/HTK system compared to a pure HTK system (early comparison obtained on Switchboard data):
Pure HTK: 61.51% WER
– Cross-word triphones, bigram LM
– Decoding time: over a month
Ducoder/HTK: 60.29% WER
– Word-internal triphones and trigram LM, rescored using cross-word triphones
– Decoding time: a few days
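For reference, word error rate is the word-level Levenshtein distance between hypothesis and reference, divided by the reference length. A minimal implementation of the standard algorithm (not code from the M4 system):

```python
def wer(reference, hypothesis):
    """Word error rate: (subs + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("we will meet on monday", "we wilt meat on monday"))  # 0.4
```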

Slide 7: Current recognisers
SWITCHBOARD recogniser
– Acoustic and language models trained on 200 hours of speech
ICSI meetings recogniser
– Acoustic models trained on 40 hours of speech
– Language model is a combination of the SWITCHBOARD and ICSI models (see the interpolation sketch below)
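A common way to combine two language models is linear interpolation of their probabilities, which is what SRILM computes with ngram -lm swb.lm -mix-lm icsi.lm -lambda 0.5 -write-lm mixed.lm. Whether M4 used exactly this scheme is an assumption; the toy sketch below uses invented unigram probabilities and an invented weight:

```python
# Linear interpolation of two toy unigram distributions. Real systems
# interpolate full n-gram models; the numbers here are invented.
p_swb = {"meeting": 0.01, "project": 0.02, "agenda": 0.005}
p_icsi = {"meeting": 0.04, "project": 0.01, "agenda": 0.02}

lam = 0.5  # interpolation weight, itself a tunable hyper-parameter

p_mix = {w: lam * p_swb[w] + (1 - lam) * p_icsi[w] for w in p_swb}
for w, p in p_mix.items():
    print(f"P({w}) = {p:.4f}")
```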

Slide 8: Results

Test condition                                  SWITCHBOARD   ICSI meetings
Test on same data, without adaptation           55.05% WER    52.34% WER
Test on same data, with speaker adaptation      N/A           49.27% WER
Test on M4 data, without adaptation             88.14% WER    74.00% WER
Test on M4 data, with unsupervised adaptation   84.67% WER    N/A

Slide 9: Areas for immediate improvement
Increase the number of hypotheses in the n-best lists
Word-internal triphone models need adaptation
Fully supervised speaker-dependent adaptation
Vocal tract length normalisation (VTLN)
(a sketch of an MLLR-style mean transform follows)
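For context on the adaptation items above: MLLR adapts the Gaussian means of an HMM system with an affine transform estimated from adaptation data, mu' = A mu + b. A minimal numpy sketch of applying (not estimating) such a transform; the dimensionality, means, and transform values are invented:

```python
import numpy as np

# Apply an MLLR-style affine transform to Gaussian mean vectors:
# mu_adapted = A @ mu + b. Estimating A and b from adaptation data
# (the hard part, done by HTK in the M4 system) is not shown.
dim = 3                            # real systems use ~39-dim features
means = np.array([[0.5, -1.2, 0.3],
                  [1.1, 0.4, -0.7]])   # one row per Gaussian, invented

A = np.eye(dim) * 0.9              # toy scaling part of the transform
b = np.array([0.1, -0.05, 0.2])    # toy bias part of the transform

adapted = means @ A.T + b          # apply mu' = A mu + b to every mean
print(adapted)
```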

Slide 10: Coming soon
Release of current models and scripts
Merged Switchboard and ICSI acoustic models
ASR transcriptions of the M4 meetings