Part 3: Vector Quantization and Mixture Density Models
CSE 717, Spring 2008, CUBS, University at Buffalo
Vector Quantization
Quantization: representing a continuous range of values by a set of discrete values. Example: the floating-point representation of real numbers in a computer.
Vector quantization: representing a data space (vector space) by a discrete set of vectors.
Vector Quantizer
A vector quantizer Q is a mapping from a vector space onto a finite subset of that space: Q: IR^k -> Y, where Y = {y_1, y_2, ..., y_N} is a finite subset of IR^k referred to as the codebook of Q. Q is usually determined from training data.
Partition of a Vector Quantizer
The vector quantizer partitions the vector space into N cells, R_i = {x in IR^k : Q(x) = y_i}.
Properties of the Partition
The vector quantizer Q defines a complete and disjoint partition of IR^k into R_1, R_2, ..., R_N: the cells cover IR^k (R_1 ∪ ... ∪ R_N = IR^k) and do not overlap (R_i ∩ R_j = ∅ for i ≠ j).
Quantization Error
The quantization error for a single vector x is d(x, Q(x)), where d is a suitable distance measure, e.g. the squared Euclidean distance d(x, y) = ||x - y||^2. The overall quantization error over training samples x_1, ..., x_n is D = (1/n) Σ_{k=1}^n d(x_k, Q(x_k)).
Nearest-Neighbor Condition
For a given codebook Y, the quantization error is minimized by the partition that maps each x to its nearest code vector: R_i = {x : d(x, y_i) ≤ d(x, y_j) for all j}.
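A minimal sketch of this assignment rule, assuming the squared Euclidean distance and a small made-up codebook:

```python
# Assign a vector to its nearest code vector (squared Euclidean distance).
# The codebook below is a made-up example, not from the slides.

def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def quantize(x, codebook):
    """Return the index i minimizing d(x, y_i), i.e. the cell R_i containing x."""
    return min(range(len(codebook)), key=lambda i: sq_dist(x, codebook[i]))

codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
print(quantize((0.9, 1.2), codebook))  # -> 1, nearest to (1.0, 1.0)
```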
Centroid Condition
For a given partition R_1, ..., R_N, the quantization error is minimized by choosing each code vector y_i as the centroid of its cell R_i. For the squared Euclidean distance d, the centroid is the sample mean of the cell: y_i = (1/|R_i|) Σ_{x in R_i} x.
Vector Quantizer Design – General Steps
1. Determine an initial codebook Y_0.
2. Partition the sample data for the current codebook Y_m using the nearest-neighbor condition.
3. Update the codebook Y_m -> Y_{m+1} using the centroid condition.
4. Check a convergence condition; if it is satisfied, return the current codebook Y_{m+1}, otherwise go to step 2.
Lloyd's Algorithm
Lloyd's algorithm iterates the two conditions on a fixed training set: partition the samples by the nearest-neighbor condition, update the code vectors by the centroid condition, and stop when the codebook (or the overall quantization error) stops changing.
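The alternation can be sketched in plain Python; this is a toy run on synthetic 2-D data, where the random-sample initialization and the iteration cap are assumptions for illustration, not from the slides:

```python
import random

def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def lloyd(samples, N, iters=50):
    """Design an N-vector codebook by alternating the nearest-neighbor
    and centroid conditions (Lloyd's algorithm)."""
    codebook = random.sample(samples, N)  # initial codebook Y_0 (assumed init)
    for _ in range(iters):
        # Nearest-neighbor condition: partition the samples into cells.
        cells = [[] for _ in range(N)]
        for x in samples:
            i = min(range(N), key=lambda i: sq_dist(x, codebook[i]))
            cells[i].append(x)
        # Centroid condition: move each code vector to its cell's mean.
        new = []
        for i, cell in enumerate(cells):
            if cell:
                new.append(tuple(sum(c) / len(cell) for c in zip(*cell)))
            else:
                new.append(codebook[i])  # keep code vectors of empty cells fixed
        if new == codebook:              # converged: codebook unchanged
            break
        codebook = new
    return codebook

random.seed(0)
data = [(random.gauss(0, 0.1), random.gauss(0, 0.1)) for _ in range(50)] + \
       [(random.gauss(3, 0.1), random.gauss(3, 0.1)) for _ in range(50)]
Y = lloyd(data, 2)
print(sorted(Y))  # expect one code vector near (0, 0) and one near (3, 3)
```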
LBG Algorithm
The Linde–Buzo–Gray (LBG) algorithm grows the codebook by splitting: starting from the centroid of the whole training set, each code vector is split into two perturbed copies and the doubled codebook is refined with Lloyd's algorithm; splitting repeats until the desired codebook size N is reached.
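A sketch of the splitting idea on scalar data; the additive ±ε perturbation and the synthetic data are assumptions for illustration, and this version requires N to be a power of two:

```python
import random

def lloyd_refine(samples, codebook, iters=50):
    """One Lloyd refinement pass on scalar samples (squared distance)."""
    for _ in range(iters):
        cells = [[] for _ in codebook]
        for x in samples:
            i = min(range(len(codebook)), key=lambda i: (x - codebook[i]) ** 2)
            cells[i].append(x)
        new = [sum(c) / len(c) if c else codebook[i] for i, c in enumerate(cells)]
        if new == codebook:
            break
        codebook = new
    return codebook

def lbg(samples, N, eps=0.01):
    """Grow the codebook by splitting: 1 -> 2 -> 4 -> ... -> N (N a power of 2)."""
    codebook = [sum(samples) / len(samples)]  # centroid of the whole training set
    while len(codebook) < N:
        # Split each code vector into two perturbed copies, then refine.
        codebook = [y + s for y in codebook for s in (eps, -eps)]
        codebook = lloyd_refine(samples, codebook)
    return codebook

random.seed(1)
data = [random.gauss(-2, 0.2) for _ in range(100)] + \
       [random.gauss(2, 0.2) for _ in range(100)]
print(sorted(lbg(data, 2)))  # expect code vectors near -2 and 2
```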
k-Means Algorithm
k-means applies the same alternation with the squared Euclidean distance: each sample is assigned to the nearest of the k means, each mean is recomputed as the centroid of its assigned samples, and the iteration stops when the means no longer change.
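A minimal 1-D run; the data and initial means are made up for illustration:

```python
def kmeans(xs, means, iters=100):
    """k-means on scalar data: assign each point to the nearest mean,
    then recompute each mean, until the means stop changing."""
    for _ in range(iters):
        clusters = [[] for _ in means]
        for x in xs:
            i = min(range(len(means)), key=lambda i: (x - means[i]) ** 2)
            clusters[i].append(x)
        new = [sum(c) / len(c) if c else means[i] for i, c in enumerate(clusters)]
        if new == means:
            break
        means = new
    return means

xs = [1.0, 1.2, 0.8, 9.0, 9.3, 8.7]
print(kmeans(xs, [0.0, 5.0]))  # means converge near 1.0 and 9.0
```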
Mixture Density Model
A mixture model of N random variables X_1, ..., X_N is defined as X = X_L, where L is a random variable defined on the N labels {1, ..., N} with prior probabilities P(L = i) = π_i, Σ_i π_i = 1.
Mixture Density Model
Suppose the p.d.f.'s of X_1, ..., X_N are p_1(x), ..., p_N(x); then the p.d.f. of the mixture is p(x) = Σ_{i=1}^N π_i p_i(x).
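Evaluating a two-component Gaussian mixture density p(x) = π_1 p_1(x) + π_2 p_2(x) directly; the parameter values here are made up:

```python
import math

def gauss_pdf(x, mu, sigma):
    """Univariate Gaussian density N(x; mu, sigma)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(x, params):
    """p(x) = sum_i pi_i * p_i(x) for a list of (pi_i, mu_i, sigma_i)."""
    return sum(pi * gauss_pdf(x, mu, s) for pi, mu, s in params)

# Made-up parameters for illustration: priors must sum to 1.
params = [(0.6, 0.0, 1.0), (0.4, 3.0, 0.5)]
print(mixture_pdf(0.0, params))
print(sum(pi for pi, _, _ in params))  # priors sum to 1
```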
Example: Gaussian Mixture Model of Two Components
[Figure: histogram of samples and the corresponding mixture density]
Estimation of Gaussian Mixture Model
ML Estimation (both the value x and the label are given). Samples arrive in the format (x, label): (-0.39, 0), (0.12, 0), (0.94, 1), (1.67, 0), (1.76, 1), ...
S_1 (subset with label 1): (0.94, 1), (1.76, 1), ...
S_2 (subset with label 0): (-0.39, 0), (0.12, 0), (1.67, 0), ...
Each component's parameters are estimated from its own subset, and the prior π_i from the fraction of samples falling in that subset.
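With the labels given, ML estimation reduces to per-subset sample statistics; a sketch using only the five samples listed above (treating them as the whole training set for illustration):

```python
def ml_estimate(labeled):
    """ML estimates of (prior, mean, variance) per label from (x, label) pairs."""
    labels = sorted({lab for _, lab in labeled})
    est = {}
    for lab in labels:
        xs = [x for x, l in labeled if l == lab]
        mu = sum(xs) / len(xs)                         # sample mean of the subset
        var = sum((x - mu) ** 2 for x in xs) / len(xs)  # ML (biased) variance
        est[lab] = (len(xs) / len(labeled), mu, var)    # prior = subset fraction
    return est

# The five labeled samples listed on the slide.
samples = [(-0.39, 0), (0.12, 0), (0.94, 1), (1.67, 0), (1.76, 1)]
for lab, (pi, mu, var) in ml_estimate(samples).items():
    print(lab, pi, round(mu, 3), round(var, 3))
```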
Estimation of Gaussian Mixture Model
EM Algorithm (the value x is given, the label is unknown). Samples arrive in the format (x) with the label missing: (-0.39), (0.12), (0.94), (1.67), (1.76), ...
1. Choose initial values of the parameters π_i, μ_i, σ_i.
2. E-Step: for each sample x_k the label is missing, but we can estimate it by its expected value, the posterior probability P(L = i | x_k) = π_i p_i(x_k) / Σ_j π_j p_j(x_k).
Estimation of Gaussian Mixture Model
EM Algorithm (the value x is given, the label is unknown)
3. M-Step: re-estimate the parameters using the soft labels from the E-Step:
π_i = (1/n) Σ_k P(L = i | x_k)
μ_i = Σ_k P(L = i | x_k) x_k / Σ_k P(L = i | x_k)
σ_i^2 = Σ_k P(L = i | x_k) (x_k - μ_i)^2 / Σ_k P(L = i | x_k)
Estimation of Gaussian Mixture Model
EM Algorithm (the value x is given, the label is unknown)
4. Termination: the log likelihood of the n samples is L = Σ_{k=1}^n log p(x_k). At the end of the m-th iteration, if |L_m - L_{m-1}| < ε, terminate; otherwise go to step 2 (the E-Step).
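The full E-step/M-step loop with the log-likelihood stopping rule can be sketched for a 1-D two-component mixture; the synthetic data, initial parameters, and tolerance ε below are assumptions for illustration:

```python
import math, random

def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def em_gmm(xs, params, eps=1e-6, max_iter=500):
    """EM for a 1-D Gaussian mixture. params: list of (pi, mu, sigma)."""
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step: posterior (expected label) for each sample and component.
        resp = []
        for x in xs:
            w = [pi * gauss_pdf(x, mu, s) for pi, mu, s in params]
            tot = sum(w)
            resp.append([wi / tot for wi in w])
        # M-step: re-estimate priors, means, std devs from the soft labels.
        new = []
        for i in range(len(params)):
            ri = [r[i] for r in resp]
            n_i = sum(ri)
            mu = sum(r * x for r, x in zip(ri, xs)) / n_i
            var = sum(r * (x - mu) ** 2 for r, x in zip(ri, xs)) / n_i
            new.append((n_i / len(xs), mu, math.sqrt(var)))
        params = new
        # Termination: stop when the log likelihood improves by less than eps.
        ll = sum(math.log(sum(pi * gauss_pdf(x, mu, s) for pi, mu, s in params))
                 for x in xs)
        if abs(ll - prev_ll) < eps:
            break
        prev_ll = ll
    return params

random.seed(2)
xs = [random.gauss(0, 0.5) for _ in range(150)] + \
     [random.gauss(4, 0.5) for _ in range(50)]
fit = sorted(em_gmm(xs, [(0.5, -1.0, 1.0), (0.5, 5.0, 1.0)]), key=lambda p: p[1])
print(fit)  # expect means near 0 and 4, priors near 0.75 and 0.25
```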