Research & Technology Progress in the framework of the RESPITE project at DaimlerChrysler Research & Technology Dr-Ing. Fritz Class and Joan Marí Sheffield,

Research & Technology Progress in the framework of the RESPITE project at DaimlerChrysler Research & Technology Dr-Ing. Fritz Class and Joan Marí Sheffield, June 2002

Contents
- DaimlerChrysler off-line demonstrator
  - Block diagram of our off-line demonstrator
  - Evaluation experiments using our demonstrator
- On-going research in Discriminative Feature Extraction
  - TANDEM acoustic modelling
  - Clustering of HMM states to define a discriminative feature space
- British-English recognizer
- "Online demonstrator"

DC off-line demonstrator: block diagram
(figure: DC ASR system with CTK/QUICKNET/MSTK)

DC off-line demonstrator: results on AURORA 2000

Discriminative Feature Extraction: TANDEM Acoustic Modelling
- The TANDEM approach finds a feature-space transform that reduces the dimensionality while preserving the information that matters for classification.
- Typically, the dimensionality is reduced from N to n (N >> n), where n is the number of phones in the digit set, so the reduced features encode the essential classification information.
- The concept is similar to LDA, but with a different criterion to be minimised and a non-linear mapping instead of a linear one.
- Typically, the LDA transform is found by assigning a high-dimensional Gaussian distribution to each HMM state and optimising a class-separability criterion (for example, maximising between-state scatter relative to within-state scatter) built from the between-state and within-state scatter (covariance) matrices computed from these state Gaussians.
- Both approaches can be linked theoretically through Bayes classifier theory.
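A minimal sketch of the LDA step described above, assuming each HMM state is summarised by one full-covariance Gaussian plus an occupancy weight. It builds the within-state and between-state scatter matrices from the state Gaussians and solves the generalised eigenproblem S_b v = lambda S_w v. The function name and the convention of returning the top n eigenvectors as rows of a projection matrix are illustrative assumptions, not the DaimlerChrysler implementation.

```python
import numpy as np


def lda_from_state_gaussians(means, covs, weights, n_out):
    """Estimate an LDA projection from per-state Gaussians (hypothetical helper).

    means:   (K, N) state means
    covs:    (K, N, N) state covariances (assumed positive definite)
    weights: (K,) state occupancies, normalised internally
    n_out:   target dimensionality n (at most K - 1 directions are informative)
    Returns an (n_out, N) matrix W; a feature vector x maps to y = W @ x.
    """
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()

    # Global mean: occupancy-weighted average of the state means.
    mu = w @ means
    # Within-state scatter: occupancy-weighted average of the state covariances.
    s_w = np.einsum("k,kij->ij", w, covs)
    # Between-state scatter: weighted spread of the state means around mu.
    d = means - mu
    s_b = np.einsum("k,ki,kj->ij", w, d, d)

    # Solve S_b v = lambda S_w v by whitening with the Cholesky factor S_w = L L^T:
    # the problem becomes an ordinary symmetric eigenproblem for L^-1 S_b L^-T.
    L_inv = np.linalg.inv(np.linalg.cholesky(s_w))
    m = L_inv @ s_b @ L_inv.T
    eigvals, eigvecs = np.linalg.eigh(m)          # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_out]       # largest separability first
    v = L_inv.T @ eigvecs[:, top]                 # map back: v = L^-T u
    return v.T


# Three states separated only along the first axis: LDA keeps that axis.
W = lda_from_state_gaussians([[-1, 0], [0, 0], [1, 0]], [np.eye(2)] * 3, [1, 1, 1], 1)
print(W.shape)
```

Because of the whitening step, the returned directions satisfy W S_w W^T = I, so the projected within-state covariance is the identity.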

Discriminative Feature Extraction: Clustering HMM states to define a discriminative feature space
- The idea is to cluster HMM states that have similar Gaussian mixtures, because the state likelihoods of such states for a given frame will then be similar.
- These target clusters encode the essential classification information.
- Mapping each HMM state to its cluster turns the state alignment into a new state-cluster alignment, which can be used to train a neural network (NN) for discriminative feature extraction.
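The slide does not say how GMM similarity is measured or which clustering algorithm is used. The sketch below is one possible realisation under stated assumptions: diagonal-covariance GMMs, each state's mixture collapsed to a single Gaussian by moment matching, the symmetric Kullback-Leibler divergence as the distance, and average-linkage agglomerative clustering. All function names are hypothetical.

```python
import numpy as np


def collapse_gmm(weights, means, variances):
    """Moment-match a diagonal-covariance GMM to one diagonal Gaussian (assumption)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    m = np.asarray(means, dtype=float)       # (M, D) component means
    v = np.asarray(variances, dtype=float)   # (M, D) component variances
    mu = w @ m
    var = w @ (v + m ** 2) - mu ** 2
    return mu, var


def sym_kl_diag(mu1, var1, mu2, var2):
    """Symmetric KL divergence KL(p||q) + KL(q||p) between diagonal Gaussians."""
    d2 = (mu1 - mu2) ** 2
    return 0.5 * np.sum(var1 / var2 + var2 / var1 - 2.0 + d2 * (1.0 / var1 + 1.0 / var2))


def cluster_states(state_gmms, n_clusters):
    """Average-linkage clustering of HMM states; returns state index -> cluster index."""
    stats = [collapse_gmm(*g) for g in state_gmms]
    k = len(stats)
    dist = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            dist[i, j] = dist[j, i] = sym_kl_diag(*stats[i], *stats[j])

    clusters = [[i] for i in range(k)]
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest average pairwise distance.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)   # b > a, so index a stays valid

    mapping = np.empty(k, dtype=int)
    for c, members in enumerate(clusters):
        mapping[members] = c
    return mapping


def map_alignment(state_alignment, mapping):
    """Turn a frame-level HMM-state alignment into cluster targets for NN training."""
    return mapping[np.asarray(state_alignment)]


# Two pairs of near-identical one-dimensional states.
gmms = [
    ([0.5, 0.5], [[-0.5], [0.5]], [[1.0], [1.0]]),
    ([0.5, 0.5], [[-0.4], [0.6]], [[1.0], [1.0]]),
    ([0.5, 0.5], [[4.5], [5.5]], [[1.0], [1.0]]),
    ([0.5, 0.5], [[4.6], [5.6]], [[1.0], [1.0]]),
]
mapping = cluster_states(gmms, 2)
print(mapping, map_alignment([0, 1, 1, 2, 3], mapping))
```

Collapsing each mixture to one Gaussian keeps the distance in closed form; a distance computed on the full mixtures would need an approximation, because the KL divergence between two GMMs has no closed form.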

Discriminative Feature Extraction: results on AURORA 2000

British-English recognizer
- Newly developed, incorporating the latest insights from the RESPITE work
- Trained on an in-house real-car database: about 1000 native English speakers (60% male, 40% female), 16 kHz sampling rate
- Training:
  - vocabulary: commands for handling car functions such as car phone, audio (radio, CD, climate, ...) and navigation system; city and street names; digit strings; spelling; longer sentences (conversation)
  - pre-version; optimisations are in progress
- Tests:
  - test set of 3800 utterances, not included in the training set
  - digit strings (2300), spelling (1000), city/street names (500)
  - test vocabulary: 350 words
  - tests on commands are in progress

British-English recognizer: pre-version results (% word error rate)
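The results above are reported as % word error rate (WER). For reference, a minimal sketch of the standard WER computation: the minimum number of word substitutions, deletions and insertions needed to turn the reference into the hypothesis (a Levenshtein distance over words), divided by the number of reference words. The function name is hypothetical, and the scoring tool actually used for these tests is not stated.

```python
def word_error_rate(reference, hypothesis):
    """Return WER in percent between two whitespace-separated word strings."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # d[i][j] = minimum edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])  # match or substitution
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return 100.0 * d[len(ref)][len(hyp)] / len(ref)


# One deleted word out of four reference words.
print(word_error_rate("call five two seven", "call five seven"))  # 25.0
```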