VQ for ASR
J.-S. Roger Jang (張智星), MIR Lab (多媒體資訊檢索實驗室), CS Dept., Tsing Hua University (清華大學資訊工程系)

Construct VQ-based Classifiers
- Design phase
  - Use k-means or the generalized Lloyd algorithm to design a codebook (a set of code words, centroids, or prototypical vectors) for each class.
- Application phase
  - For a given utterance, compute the average distortion against each class's codebook and pick the class with the minimum score.
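
A minimal sketch of these two phases, using scikit-learn's KMeans as the codebook trainer; the codebook size, feature dimension, and function names are illustrative, not from the slides:

```python
from sklearn.cluster import KMeans
from scipy.spatial.distance import cdist

def train_codebook(frames, codebook_size=16):
    """Design phase: train one codebook (a set of centroids) per class.
    frames: (num_frames, feature_dim) array of, e.g., MFCC vectors."""
    km = KMeans(n_clusters=codebook_size, n_init=10, random_state=0)
    return km.fit(frames).cluster_centers_

def average_distortion(frames, codebook):
    """Quantize each frame to its nearest code word, then average."""
    return cdist(frames, codebook).min(axis=1).mean()

def classify(utterance_frames, codebooks):
    """Application phase: pick the class whose codebook yields the minimum
    average distortion. codebooks: {class_label: codebook} mapping."""
    scores = {label: average_distortion(utterance_frames, cb)
              for label, cb in codebooks.items()}
    return min(scores, key=scores.get)
```

Building the classifier then amounts to codebooks = {label: train_codebook(f) for label, f in training_frames.items()} over the per-class training frames; note that no time alignment is involved, which is exactly the limitation discussed two slides below.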

Vector Quantization (VQ)
- Advantages of VQ for classifier design:
  - Reduced storage
  - Reduced computation
  - An efficient representation that improves generalization

ASR without Time Alignment
- Good for simple vocabularies of highly distinct words, such as the English digits
- Bad for vocabularies
  - with words that can be distinguished only by their temporal, sequential characteristics, such as "car" vs. "rack", or "we" vs. "you"
  - with complex speech classes that span long utterances with rich phonetic content

Centroid Computation in VQ
- The codebook for a class should be designed to minimize the average distortion; different distortion measures lead to different codebook designs:
  - L2 distance → mean vector
  - Mahalanobis distance → mean vector
  - L1 distance → median vector
- Other distortion measures and their corresponding centroid computations can be found in "Fundamentals of Speech Recognition" by L. Rabiner and B.-H. Juang.
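
A small sketch of the correspondence above; the Mahalanobis case assumes a fixed positive-definite weighting matrix, under which the minimizer is still the sample mean:

```python
import numpy as np

def centroid(frames, distortion="l2"):
    """frames: (N, D) vectors in one VQ cell; returns the center minimizing
    the average distortion of the given type."""
    if distortion in ("l2", "mahalanobis"):
        # Both are minimized by the sample mean; for Mahalanobis this holds
        # for any fixed positive-definite weighting matrix.
        return frames.mean(axis=0)
    if distortion == "l1":
        # City-block distortion is minimized component-wise by the median.
        return np.median(frames, axis=0)
    raise ValueError(f"unknown distortion: {distortion}")
```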

Other Uses of VQ in ASR
- VQ can be used to speed up DTW-based comparison:
  - VQ-based classifiers can be used as a preprocessor for DTW-based classifiers.
  - VQ can be used for efficient computation of an (approximate) DTW distance, as sketched below.
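
One hedged sketch of the second point: quantize both frame sequences to code-word indices, precompute the code-word-to-code-word distance table once per codebook, and let DTW do table lookups instead of full vector distance computations. The step pattern and normalization below are common choices, not prescribed by the slides:

```python
import numpy as np
from scipy.spatial.distance import cdist

def quantize(frames, codebook):
    """Replace each frame by the index of its nearest code word."""
    return cdist(frames, codebook).argmin(axis=1)

def vq_dtw(idx_a, idx_b, codeword_dist):
    """DTW over code-word index sequences; codeword_dist[i, j] is the
    precomputed distance between code words i and j."""
    n, m = len(idx_a), len(idx_b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = codeword_dist[idx_a[i - 1], idx_b[j - 1]]   # table lookup
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)   # crude length normalization

# codeword_dist = cdist(codebook, codebook)   # computed once per codebook
```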

DTW for Speaker-independent Tasks
- Basic procedure to adapt DTW to speaker-independent tasks:
  - Collect massive recordings
  - Use MFCCs with vocal tract length normalization
  - Use modified k-means to find cluster centers (or representative utterances) for each class

Modified K-means
- Goal: find representative objects in a set whose elements live in a non-Euclidean space
- Steps (sketched below):
  1. Select initial centers.
  2. Assign each utterance to its nearest cluster center.
  3. Revise the cluster centers, using one of:
     - Minimax centers
     - Pseudoaverage centers
     - Segmental versions of the above two (segmental VQ)
  4. Go back to step 2 until convergence.
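
A minimal sketch of this loop, operating on a precomputed pairwise distance matrix (e.g., DTW distances between utterances) since the objects live in a non-Euclidean space. The pseudoaverage center is read here as the member minimizing the average within-cluster distance, which is an assumption; the segmental variant is not shown:

```python
import numpy as np

def modified_kmeans(dist, k, center_rule="minimax", max_iter=50, seed=0):
    """dist: (N, N) symmetric matrix of utterance-to-utterance distances.
    Returns (center_indices, labels)."""
    rng = np.random.default_rng(seed)
    centers = rng.choice(len(dist), size=k, replace=False)   # step 1
    for _ in range(max_iter):
        labels = dist[:, centers].argmin(axis=1)             # step 2: label
        new_centers = centers.copy()
        for c in range(k):                                   # step 3: revise
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue                                     # keep old center
            within = dist[np.ix_(members, members)]
            if center_rule == "minimax":
                best = within.max(axis=1).argmin()           # min of max distance
            else:                                            # "pseudoaverage"
                best = within.mean(axis=1).argmin()          # min of mean distance
            new_centers[c] = members[best]
        if np.array_equal(new_centers, centers):             # step 4: converged
            break
        centers = new_centers
    return centers, dist[:, centers].argmin(axis=1)
```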

Segmental Vector Quantization
[Figure: utterances 1 through n of the same class are each divided into segments, and a separate codebook (Codebook 1, Codebook 2, ...) is trained for each segment position.]
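
A sketch of one plausible reading of the figure, assuming each utterance is cut into the same number of equal-length segments (the slides do not specify the segmentation):

```python
import numpy as np
from sklearn.cluster import KMeans

def train_segmental_codebooks(utterances, n_segments=3, codebook_size=8):
    """utterances: list of (num_frames_i, feature_dim) arrays of one class.
    Returns one codebook per segment position."""
    codebooks = []
    for s in range(n_segments):
        # Pool the s-th segment's frames across all utterances of the class.
        pooled = np.vstack([np.array_split(u, n_segments)[s] for u in utterances])
        km = KMeans(n_clusters=codebook_size, n_init=10, random_state=0)
        codebooks.append(km.fit(pooled).cluster_centers_)
    return codebooks   # [codebook_1, codebook_2, ..., codebook_n]
```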