Chapter 9 DTW and VQ Algorithm: 9.1 Basic idea of DTW, 9.2 DTW algorithm, 9.3 Basic idea of VQ, 9.4 LBG algorithm, 9.5 Improvement of VQ

Chapter 9 DTW and VQ Algorithm
• 9.1 Basic idea of DTW
• 9.2 DTW algorithm
• 9.3 Basic idea of VQ
• 9.4 LBG algorithm
• 9.5 Improvement of VQ

9.1 Basic idea of DTW (1)
• The frames of T and a reference R_k do not correspond exactly frame-by-frame. They are related by a non-linear correspondence chosen so that the total distance is minimal. This is a natural idea.
• Suppose T(i) is frame i of the test utterance (1 <= i <= N_T) and R_k(j) is frame j of reference utterance k in the vocabulary, written R(j) for short (1 <= j <= N_k). T(i) and R(j) are both vectors, and d(i,j) is the distance between T(i) and R(j).

Basic idea of DTW (2)
• If a set of (i,j) pairs (a path) can be found such that the total distance D along these points is minimal:
      D = min Σ_{(i,j) ∈ path} d(i,j)
• Suppose the points on the path are (n_i, m_i). The path has some constraints: (n_1, m_1) = (1,1) and (n_N, m_N) = (N_T, N_k), where N = N_T.
• To limit the computation, it is assumed that N_k/2 <= N_T <= 2N_k for any k.

Basic idea of DTW (3)
• So the average slope of the path lies between 0.5 and 2.0. To enforce this, if the current point is (n_i, m_i), the next point can be (n_i+1, m_i+2), (n_i+1, m_i+1), or (n_i+1, m_i); the last move is allowed only if m_{i-1} != m_i (the path did not just stay flat).
• Sometimes the initial point is allowed to float to obtain a better match.

9.2 DTW algorithm (1)
• DTW means Dynamic Time Warping.
• It uses the Dynamic Programming method to implement the idea described in 9.1.
• The algorithm can be described as follows:
  (1) i = 1, j = 1, D[i,j] = d(T(i), R(j));
  (2) if ++i <= N_T, calculate j_l and j_h according to the constraint conditions, and calculate d(T(i), R(j)) for j = j_l to j = j_h;
  (3) for all (i, j_l) to (i, j_h) do

DTW algorithm (2)
      D[i,j] = d(T(i), R(j)) + D[i-1, j'],
  where j' is j-2, j-1, or j, chosen so that
      D[i-1, j'] = min { D[i-1, j], D[i-1, j-1], D[i-1, j-2] }.
  Store D[i,j] and j'(i).
  (4) When i > N_T, stop. D[N_T, j'] = min_k D[N_T, k], k = N_k, N_k-1, or N_k-2.
  (5) Start from (N_T, j') and backtrace the points on the path using the stored j'(i) to recover the path.
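As a rough illustration of steps (1)-(5), here is a minimal Python sketch (not from the slides: the names dtw_distance, test_frames, and ref_frames are my own, Euclidean frame distances are assumed, the search band j_l..j_h is not restricted, and the extra condition m_{i-1} != m_i on the flat move is omitted):

```python
import numpy as np

def dtw_distance(test_frames, ref_frames):
    """DTW following the recursion above: predecessors of (i, j) are
    (i-1, j), (i-1, j-1), (i-1, j-2), i.e. the moves (+1,0), (+1,+1), (+1,+2)."""
    n_t, n_k = len(test_frames), len(ref_frames)
    # local frame-to-frame distances d(i, j) (Euclidean, as an assumption)
    d = np.linalg.norm(test_frames[:, None, :] - ref_frames[None, :, :], axis=2)

    D = np.full((n_t, n_k), np.inf)
    D[0, 0] = d[0, 0]                            # the path must start at (1, 1)
    back = np.zeros((n_t, n_k), dtype=int)       # stores j'(i) for backtracing

    for i in range(1, n_t):
        for j in range(n_k):
            cands = [j, j - 1, j - 2]            # j' candidates in the previous column
            costs = [D[i - 1, jp] if jp >= 0 else np.inf for jp in cands]
            best = int(np.argmin(costs))
            if np.isfinite(costs[best]):
                D[i, j] = d[i, j] + costs[best]
                back[i, j] = cands[best]

    # step (4): final distance taken over the last few reference frames
    lo = max(0, n_k - 3)
    j = lo + int(np.argmin(D[n_t - 1, lo:]))
    dist = D[n_t - 1, j]

    # step (5): backtrace from (N_T, j') to recover the path
    path = [(n_t - 1, j)]
    for i in range(n_t - 1, 0, -1):
        j = back[i, j]
        path.append((i - 1, j))
    return dist, path[::-1]

# toy usage with random 12-dimensional frames
dist, path = dtw_distance(np.random.rand(30, 12), np.random.rand(40, 12))
```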

DTW algorithm (3)
• D[N_T, j'] is the distance between T and a reference R_k. By using the same procedure to compute the distances between T and all the R_k's, and applying the minimal-distance principle, we can easily determine the best-matched word for the input word.
• This algorithm was widely used before HMMs came into use. Its disadvantage is the large computing time. To reduce it, the VQ algorithm was introduced.

9.3 Basic idea of VQ algorithm (1)
• VQ stands for Vector Quantization, in contrast to scalar quantization.
• The basic idea is to partition the whole feature space into a certain number (2^n) of regions and to use the center of each region to represent any vector that falls into it. The distance calculation then becomes a table lookup if the table is pre-calculated before recognition, which saves computing time.

Basic idea of VQ algorithm (2)
• To do this, a clustering algorithm such as k-means is used to iteratively compute the cluster centers and the membership of the vectors until convergence.
• The cluster centers form the codebook. The membership of a vector becomes its code label, assigned by the minimal-distance principle. Every vector is thus replaced by a code label.

Basic idea of VQ algorithm (3)
• The distance between two vectors is approximated by the distance between their two centers. This table can be pre-calculated as soon as the codebook is obtained, so during recognition, once the codes are known, the distance computation becomes a table-lookup operation.
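A minimal sketch of this lookup idea (the names quantize and build_dist_table are my own, and Euclidean distance is assumed): each vector is replaced by the index of its nearest codeword, and the frame-to-frame distance used by DTW is then read from a pre-computed table of center-to-center distances.

```python
import numpy as np

def quantize(vectors, codebook):
    """Assign each vector the label of its nearest codeword (minimal-distance principle)."""
    d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
    return np.argmin(d, axis=1)

def build_dist_table(codebook):
    """Pre-compute the center-to-center distance table once the codebook is known."""
    return np.linalg.norm(codebook[:, None, :] - codebook[None, :, :], axis=2)

# toy example: during recognition, d(T(i), R(j)) is approximated by a table lookup
codebook = np.random.rand(64, 12)                     # 64 hypothetical 12-dim codewords
dist_table = build_dist_table(codebook)
test_labels = quantize(np.random.rand(30, 12), codebook)
ref_labels = quantize(np.random.rand(40, 12), codebook)
approx_d = dist_table[test_labels[0], ref_labels[0]]  # replaces a vector-to-vector distance
```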

Basic idea of VQ algorithm (4)
• This speeds up the DTW process.
• VQ finds application not only in speech recognition but also in speech synthesis and speech coding.
• VQ also has many applications in multimedia for data compression; of course, in that case there is some error in the restored data.

9.4 The LBG Algorithm (1)
• (1) S = {x} is the set of all vector samples.
• (2) Set the maximal number of iterations L.
• (3) Set a threshold t.
• (4) Set m initial centers y_1(0), y_2(0), …, y_m(0).
• (5) Set the initial distortion D(0) = ∞.
• (6) Set the iteration number k = 1.
• (7) Partition all samples into S_1, S_2, …, S_m according to the minimum-distance principle.

The LBG Algorithm (2)
• (8) Calculate the total distortion:
      D(k) = Σ_{i=1..m} Σ_{x ∈ S_i(k)} d(x, y_i(k-1))
• (9) Calculate the relative improvement of the distortion:
      δ(k) = |D(k) - D(k-1)| / D(k)
• (10) Calculate the new centers (codewords):
      y_i(k) = ( Σ_{x ∈ S_i(k)} x ) / N_i(k),   i = 1, …, m,
  where N_i(k) is the number of samples in S_i(k).
• (11) If δ(k) < t goto (13), else goto (12).

The LBG Algorithm (3)
• (12) If (++k < L) goto (7), else goto (13).
• (13) Output the codewords y_1(k), y_2(k), …, y_m(k) and D(k).
• (14) End.
• This partition mode is called a 'Voronoi' partition. It implies that at every iteration the total intra-class distortion does not increase.
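A minimal Python sketch of steps (1)-(14), assuming Euclidean distance and an already chosen initial codebook (the function name lbg and the handling of empty cells, which the slides do not address, are my own):

```python
import numpy as np

def lbg(samples, init_centers, L=50, t=1e-3):
    """LBG iteration: partition (7), distortion (8)-(9), new centers (10), stop tests (11)-(12)."""
    centers = init_centers.copy().astype(float)
    prev_D = np.inf                                        # (5) D(0) = infinity
    D = np.inf
    for k in range(1, L + 1):                              # (6) and (12): at most L iterations
        # (7) partition the samples by the minimum-distance principle
        d = np.linalg.norm(samples[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(d, axis=1)
        # (8) total distortion measured with the previous centers y_i(k-1)
        D = float(np.sum(d[np.arange(len(samples)), labels]))
        # (9) relative improvement of the distortion
        delta = abs(D - prev_D) / D
        # (10) new centers = mean of each cell (an empty cell keeps its old center)
        for i in range(len(centers)):
            cell = samples[labels == i]
            if len(cell) > 0:
                centers[i] = cell.mean(axis=0)
        # (11) stop when the relative improvement falls below the threshold t
        if delta < t:
            break
        prev_D = D
    return centers, D                                      # (13) codewords and distortion
```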

The LBG Algorithm (4)
• Approaches for setting the initial codebook:
• (1) Random initial codebook. The initial centers are taken arbitrarily, which may not work well. We can instead require that each new center be farther than a threshold from all centers already chosen; in this way the m initial centers are set (a sketch follows).
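One possible reading of this thresholded random selection, as a hedged Python sketch (the name random_init_codebook and the default min_dist value are assumptions, not from the slides):

```python
import numpy as np

def random_init_codebook(samples, m, min_dist=0.5, seed=0):
    """Pick m initial centers, each farther than min_dist from every center chosen so far.

    Note: if min_dist is too large for the data, no m such centers exist and this loops.
    """
    rng = np.random.default_rng(seed)
    centers = [samples[rng.integers(len(samples))]]
    while len(centers) < m:
        cand = samples[rng.integers(len(samples))]
        if all(np.linalg.norm(cand - c) > min_dist for c in centers):
            centers.append(cand)
    return np.array(centers)
```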

The LBG Algorithm (5)
• (2) Splitting approach. First get the center y_0 of all samples, then take y_1 as the sample with maximal distance to y_0 and y_2 as the sample with maximal distance to y_1. Then form S_1 and S_2 around y_1 and y_2.
• Applying the same procedure to S_1 and S_2 splits them into 4 sets. The loop continues until m = 2^B initial centers are obtained after B iterations.
• This is an often-used approach (a sketch follows).
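A minimal sketch of this splitting initialization under the same assumptions as above (Euclidean distance; split_init is my own name; degenerate empty splits are simply dropped, which the slides do not discuss):

```python
import numpy as np

def split_init(samples, B):
    """Split the sample set B times to get (up to) 2**B initial centers."""
    cells = [samples]
    for _ in range(B):
        new_cells = []
        for S in cells:
            y0 = S.mean(axis=0)                                 # center of the cell
            y1 = S[np.argmax(np.linalg.norm(S - y0, axis=1))]   # farthest sample from y0
            y2 = S[np.argmax(np.linalg.norm(S - y1, axis=1))]   # farthest sample from y1
            to_y1 = np.linalg.norm(S - y1, axis=1) <= np.linalg.norm(S - y2, axis=1)
            for part in (S[to_y1], S[~to_y1]):                  # the subsets S1 and S2
                if len(part) > 0:                               # drop degenerate empty splits
                    new_cells.append(part)
        cells = new_cells
    return np.array([S.mean(axis=0) for S in cells])            # one initial center per cell
```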

9.5 The improvement of VQ (1)
• (1) VQ system with tree search (codebook structure). When doing recognition with VQ, a search is needed for every vector to obtain its code (label). In general every center must be examined, which takes time. If we build a tree-structured codebook and keep the codewords of all levels, the search becomes easy. The cost is roughly doubled storage.

The improvement of VQ (2)
• It can be realized like this: first a codebook of capacity 2 is generated, y_0 and y_1, together with the two corresponding subsets. Then for each of these subsets the next level of centers is created: y_00, y_01, y_10, and y_11. This is the second level.

The improvement of VQ (3)
• By repeating this for k steps, a k-level tree is created. It has 2^k codewords.
• The advantages are: the search is reduced to 2k (vs 2^k) distance calculations and k (vs 2^k - 1) comparisons. The training effort is also reduced (every split only concerns a two-codeword codebook).
• The disadvantages are: the average distortion is worse than for a full-search codebook, and the storage is doubled.
• Besides a binary tree, other numbers of codewords per level can be used. A search sketch follows.
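A minimal sketch of the binary tree search (the dictionary-of-prefixes representation and the name tree_search are my own; per level it does exactly two distance calculations and one comparison, as claimed above):

```python
import numpy as np

def tree_search(x, tree):
    """Descend a binary codeword tree: 2 distance calculations and 1 comparison per level.

    `tree` maps a binary prefix string ('0', '01', ...) to its codeword, so
    level k holds the 2**k codewords y_0..., y_1... described above."""
    code = ""
    while code + "0" in tree:                      # stop when the node has no children
        d0 = np.linalg.norm(x - tree[code + "0"])
        d1 = np.linalg.norm(x - tree[code + "1"])
        code += "0" if d0 <= d1 else "1"
    return code, tree[code]

# toy two-level tree with hypothetical codewords
tree = {
    "0": np.array([0.0, 0.0]),   "1": np.array([1.0, 1.0]),
    "00": np.array([-0.1, 0.0]), "01": np.array([0.1, 0.2]),
    "10": np.array([0.8, 0.9]),  "11": np.array([1.2, 1.1]),
}
label, codeword = tree_search(np.array([0.15, 0.25]), tree)   # label == "01"
```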

The improvement of VQ (4)
• (2) Tree codebook formed from a full-search codebook. First a full-search codebook is created by the LBG algorithm. Then the m codewords are grouped into m/2 pairs by minimal distance, and the center of each pair is computed. The next level up is created in the same way. After k steps the tree is formed. It gives better distortion than the previous approach, but it may sometimes make mistakes (a vector can be routed away from its true nearest codeword).
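One possible reading of the pairing step, as a hedged Python sketch (pair_level is my own name; the slides do not specify how ties or an odd number of codewords are handled):

```python
import numpy as np

def pair_level(codewords):
    """Greedily group codewords into pairs by minimal distance; return the pair centers."""
    d = np.linalg.norm(codewords[:, None, :] - codewords[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    unused = set(range(len(codewords)))
    centers = []
    while len(unused) >= 2:                            # an odd leftover codeword is dropped here
        idx = sorted(unused)
        sub = d[np.ix_(idx, idx)]                      # distances among remaining codewords
        a, b = np.unravel_index(np.argmin(sub), sub.shape)
        i, j = idx[a], idx[b]                          # the closest remaining pair
        centers.append((codewords[i] + codewords[j]) / 2.0)
        unused -= {i, j}
    return np.array(centers)                           # next (upper) level of the tree
```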

The improvement of VQ (5)
• (3) Multi-level VQ systems.
• (4) Split VQ (suitable for LSP vectors).
• (5) Fast search in a full-search system.