Speech Recognition: Vector Quantization and Clustering
Veton Këpuska

Vector Quantization and Clustering
- Introduction
- K-means clustering
- Clustering issues
- Hierarchical clustering
  - Divisive (top-down) clustering
  - Agglomerative (bottom-up) clustering
- Applications to speech recognition

Acoustic Modeling
[Diagram: Waveform -> Signal Representation -> Feature Vectors -> Vector Quantization -> Symbols]
- Signal representation produces a feature vector sequence
- The multi-dimensional sequence can be processed by:
  - methods that directly model the continuous space, or
  - quantizing and modeling of discrete symbols
- Main advantages and disadvantages of quantization:
  - reduced storage and computation costs
  - potential loss of information due to quantization

Vector Quantization (VQ)
- Used in signal compression and in speech and image coding
- More efficient information transmission than scalar quantization (can achieve less than 1 bit/parameter)
- Used for discrete acoustic modeling since the early 1980s
- Based on standard clustering algorithms:
  - individual cluster centroids are called codewords
  - the set of cluster centroids is called a codebook
  - basic VQ is K-means clustering
  - binary VQ is a form of top-down clustering (used for efficient quantization)
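
As a rough illustration, here is a minimal VQ encoder/decoder in Python with NumPy. The array-based codebook, function names, and sizes are illustrative assumptions, not part of the original lecture:

    import numpy as np

    def vq_encode(vectors, codebook):
        # Map each feature vector to the index of its nearest codeword
        d2 = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        return d2.argmin(axis=1)

    def vq_decode(indices, codebook):
        # Replace each symbol by its codeword: the lossy reconstruction
        return codebook[indices]

    # Example: a 2-bit codebook (4 codewords) for 2-D feature vectors
    rng = np.random.default_rng(0)
    codebook = rng.normal(size=(4, 2))
    features = rng.normal(size=(10, 2))
    symbols = vq_encode(features, codebook)        # discrete symbol sequence
    reconstruction = vq_decode(symbols, codebook)  # quantized feature vectors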

VQ & Clustering
- Clustering is an example of unsupervised learning:
  - the number and form of the classes {Ci} are unknown
  - the available data samples {xi} are unlabeled
- Useful for discovering data structure before classification, or for tuning or adapting classifiers
- Results strongly depend on the clustering algorithm

Acoustic Modeling Example
[Figure]

Clustering Issues
- What defines a cluster?
  - Is there a prototype representing each cluster?
- What defines membership in a cluster?
  - What is the distance metric, d(x, y)?
- How many clusters are there?
  - Is the number of clusters picked before clustering?
- How well do the clusters represent unseen data?
  - How is a new data point assigned to a cluster?

K-Means Clustering
- Used to group data into K clusters, {C1, ..., CK}
- Each cluster is represented by the mean of its assigned data
- An iterative algorithm converges to a local optimum:
  - Select K initial cluster means, {µ1, ..., µK}
  - Iterate until a stopping criterion is satisfied:
    - Assign each data sample to the closest cluster:
      x ∈ Ci if d(x, µi) ≤ d(x, µj), ∀ j ≠ i
    - Update the K means from the assigned samples:
      µi = E(x), x ∈ Ci, 1 ≤ i ≤ K
- A nearest-neighbor quantizer is used for unseen data
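
The algorithm above translates almost directly into code. A minimal sketch in Python with NumPy, assuming a Euclidean distance metric and random initialization from the data samples (names and defaults are illustrative):

    import numpy as np

    def kmeans(data, k, n_iters=100, seed=0):
        rng = np.random.default_rng(seed)
        # Select K initial cluster means: K distinct random data samples
        means = data[rng.choice(len(data), size=k, replace=False)]
        for _ in range(n_iters):
            # Assignment step: each sample goes to the closest mean
            d2 = ((data[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Update step: each mean becomes the average of its samples
            new_means = np.array([
                data[labels == i].mean(axis=0) if np.any(labels == i) else means[i]
                for i in range(k)])
            # Stopping criterion: no change in the means
            if np.allclose(new_means, means):
                break
            means = new_means
        return means, labels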

K-Means Example: K = 3
- Random selection of 3 data samples for the initial means
- Euclidean distance metric between means and samples
[Figure]

K-Means Properties
- Usually used with a Euclidean distance metric
- The total distortion, D, is the sum of squared errors:
  D = Σ_{i=1..K} Σ_{x∈Ci} ||x − µi||²
- D decreases between the nth and (n+1)st iterations
- Also known as Isodata, or the generalized Lloyd algorithm
- Has similarities with the Expectation-Maximization (EM) algorithm for learning parameters from unlabeled data
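
Under the squared-error definition above, the distortion of a clustering is a one-liner; a small helper, assuming the array conventions of the kmeans sketch above:

    import numpy as np

    def total_distortion(data, means, labels):
        # Sum of squared errors between samples and their cluster means
        return float(((data - means[labels]) ** 2).sum())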

K-Means Clustering: Initialization
- K-means converges to a local optimum:
  - the global optimum is not guaranteed
  - initial choices can influence the final result
- Initial means can be chosen randomly from the data
- Clustering can be repeated multiple times with different initializations
- Hierarchical strategies are often used to seed the clusters:
  - top-down (divisive), e.g., binary VQ
  - bottom-up (agglomerative)

K-Means Clustering: Stopping Criterion
Many criteria can be used to terminate K-means:
- No changes in the sample assignments
- Maximum number of iterations exceeded
- Change in the total distortion, D, falls below a threshold
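
The distortion-threshold test can be written as a variant of the earlier kmeans loop; a sketch, where the relative tolerance value is an illustrative assumption:

    import numpy as np

    def kmeans_distortion_stop(data, means, tol=1e-6, max_iters=100):
        # Run Lloyd iterations until the relative drop in D falls below tol
        prev_d = np.inf
        for _ in range(max_iters):
            d2 = ((data[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            means = np.array([
                data[labels == i].mean(axis=0) if np.any(labels == i) else means[i]
                for i in range(len(means))])
            d = float(((data - means[labels]) ** 2).sum())  # total distortion D
            if prev_d - d < tol * prev_d:
                break
            prev_d = d
        return means, labels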

Acoustic Clustering Example
- 12 clusters, seeded with agglomerative clustering
- Spectral representation based on an auditory model
[Figure]

Clustering Issues: Number of Clusters
- In general, the number of clusters is unknown
- It depends on the clustering criterion, on space, computation, or distortion requirements, or on a recognition metric

Clustering Issues: Clustering Criterion
- The criterion used to partition the data into clusters plays a strong role in determining the final results

Distance Threshold
[Figure]

Clustering Issues: Distance Metrics
A distance metric d(x, y) usually has the properties:
- 0 ≤ d(x, y) < ∞
- d(x, y) = 0 iff x = y
- d(x, y) = d(y, x)
- d(x, y) ≤ d(x, z) + d(y, z)
- d(x + z, y + z) = d(x, y) (translation invariance)
In practice, distance measures may not obey some of these properties, but they still serve as measures of dissimilarity.

Clustering Issues: Distance Metrics
Distance metrics strongly influence cluster shapes:
- Normalized dot product: d(x, y) = 1 − (x·y)/(||x|| ||y||)
- Euclidean: d(x, y) = ||x − y||
- Weighted Euclidean: d(x, y) = sqrt((x − y)ᵀ W (x − y)), with W e.g. an inverse covariance
- Minimum distance (chain): min d(x, xi), xi ∈ Ci
- Representation-specific metrics
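
Sketches of these measures in Python, using the standard textbook forms (the normalized dot product is written as one minus the cosine similarity; the weight vector and function names are assumptions):

    import numpy as np

    def normalized_dot(x, y):
        # One minus the cosine similarity of x and y
        return 1.0 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

    def euclidean(x, y):
        return np.linalg.norm(x - y)

    def weighted_euclidean(x, y, w):
        # Per-dimension weights w, e.g. inverse variances
        return np.sqrt(np.sum(w * (x - y) ** 2))

    def chain_distance(x, cluster):
        # Minimum distance from x to any member sample of the cluster
        return min(np.linalg.norm(x - xi) for xi in cluster)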

Clustering Issues: Impact of Scaling
- Scaling the feature-vector dimensions can significantly impact clustering results
- Scaling can be used to normalize the dimensions so that a simple distance metric is a reasonable criterion for similarity
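
One common normalization is to standardize each dimension to zero mean and unit variance, so that plain Euclidean distance weighs all dimensions comparably; a minimal sketch of this per-dimension scaling:

    import numpy as np

    def standardize(data):
        # Scale each feature dimension to zero mean and unit variance
        mu = data.mean(axis=0)
        sigma = data.std(axis=0)
        sigma[sigma == 0] = 1.0  # leave constant dimensions untouched
        return (data - mu) / sigma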

Clustering Issues: Training and Test Data
- Training-data performance can be arbitrarily good, so independent test data are needed to measure performance
- Performance can be measured by the distortion, D, or by some more relevant speech recognition metric
- Robust training degrades minimally from training to testing
- Good training data closely match the test conditions
- Development data are often used for refinements; through iterative testing they can implicitly become a form of training data

Alternative Evaluation Criterion: LPC VQ Example
[Figure]

Hierarchical Clustering
- Clusters data into a hierarchical class structure
- Top-down (divisive) or bottom-up (agglomerative)
- Often based on a stepwise-optimal, or greedy, formulation
- The hierarchical structure is useful for hypothesizing classes
- Used to seed clustering algorithms such as K-means

Divisive Clustering
- Creates a hierarchy by successively splitting clusters into smaller groups
- On each iteration, one or more of the existing clusters are split apart to form new clusters
- The process repeats until a stopping criterion is met
- Divisive techniques can incorporate pruning and merging heuristics, which can improve the final result
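
One common non-uniform scheme splits the worst cluster on each iteration. A sketch, assuming the highest-distortion cluster is split by 2-means; it reuses the kmeans sketch given earlier, and the splitting rule is an illustrative choice:

    import numpy as np

    def divisive_cluster(data, target_k, seed=0):
        # Split the highest-distortion cluster in two until target_k clusters
        clusters = [data]
        while len(clusters) < target_k:
            sse = [float(((c - c.mean(axis=0)) ** 2).sum()) for c in clusters]
            worst = clusters.pop(int(np.argmax(sse)))
            # Assumes the chosen cluster has at least 2 samples
            means, labels = kmeans(worst, 2, seed=seed)  # from the sketch above
            clusters += [worst[labels == 0], worst[labels == 1]]
        return clusters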

Example of Non-Uniform Divisive Clustering
[Figure]

Example of Uniform Divisive Clustering
[Figure]

Divisive Clustering Issues
- Initialization of new clusters:
  - random selection from the cluster's samples
  - selection of member samples far from the center
  - perturbing the dimension of maximum variance
  - perturbing all dimensions slightly
- Uniform or non-uniform tree structures
- Cluster pruning (due to poor expansion)
- Cluster assignment (distance metric)
- Stopping criterion:
  - rate of distortion decrease
  - cannot increase cluster size

Divisive Clustering Example: Binary VQ
- Often used to create a codebook of size M = 2^B (a B-bit codebook)
- Uniform binary divisive clustering is used:
  - on each iteration, each cluster is divided in two
  - K-means is used to determine the cluster centroids
- Also known as the LBG (Linde, Buzo, Gray) algorithm
- A more efficient version runs K-means only within each binary split, and retains the tree for efficient lookup
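
A sketch of the LBG split-and-refine loop, assuming multiplicative perturbation of the codewords and a few Lloyd iterations after each split (the epsilon and iteration counts are illustrative):

    import numpy as np

    def lbg(data, n_bits, eps=1e-3, n_iters=20):
        # Grow the codebook from 1 to 2**n_bits codewords
        codebook = data.mean(axis=0, keepdims=True)
        for _ in range(n_bits):
            # Split step: perturb each codeword into a +/- pair
            codebook = np.concatenate([codebook * (1 + eps),
                                       codebook * (1 - eps)])
            # Refine step: K-means (Lloyd) iterations on the doubled codebook
            for _ in range(n_iters):
                d2 = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
                labels = d2.argmin(axis=1)
                codebook = np.array([
                    data[labels == i].mean(axis=0) if np.any(labels == i)
                    else codebook[i] for i in range(len(codebook))])
        return codebook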

Agglomerative Clustering
- Structures N samples or seed clusters into a hierarchy
- On each iteration, the two most similar clusters are merged together to form a new cluster
- After N − 1 iterations, the hierarchy is complete
- The structure is displayed in the form of a dendrogram:
  - the dendrogram is at level k when the number of clusters C = N − k + 1 (N: total number of samples; C: number of clusters)
- By keeping track of the similarity score when new clusters are created, the dendrogram can often yield insights into the natural grouping of the data

Hierarchical Clustering
[Figure]

Dendrogram Example (One Dimension)
[Figure]

Agglomerative Clustering Issues
Measuring distances between clusters Ci and Cj, with respective numbers of tokens ni and nj:
- Average distance: d(Ci, Cj) = (1/(ni·nj)) Σ_{x∈Ci} Σ_{y∈Cj} d(x, y)
- Maximum distance (compact): d(Ci, Cj) = max d(x, y), x ∈ Ci, y ∈ Cj
- Minimum distance (chain): d(Ci, Cj) = min d(x, y), x ∈ Ci, y ∈ Cj
- Distance between two representative vectors of each cluster, such as their means: d(µi, µj)
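
A naive bottom-up loop with a selectable linkage makes these choices concrete. A minimal O(N³) sketch in Python, fine for small N (the linkage names are illustrative):

    import numpy as np

    def agglomerate(points, target_k, linkage="min"):
        # Merge the two closest clusters until target_k clusters remain
        clusters = [[p] for p in points]
        link = {"min": min, "max": max,
                "avg": lambda ds: sum(ds) / len(ds)}[linkage]
        while len(clusters) > target_k:
            best = None
            for i in range(len(clusters)):
                for j in range(i + 1, len(clusters)):
                    # Inter-cluster distance under the chosen linkage
                    d = link([float(np.linalg.norm(x - y))
                              for x in clusters[i] for y in clusters[j]])
                    if best is None or d < best[0]:
                        best = (d, i, j)
            _, i, j = best
            clusters[i] += clusters.pop(j)  # merge the two most similar
        return clusters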

Stepwise-Optimal Clustering
- Common to minimize the increase in total distortion on each merging iteration: stepwise-optimal, or greedy, clustering
- On each iteration, merge the two clusters that produce the smallest increase in distortion
- The distance metric for minimizing the increase in distortion, D, is:
  d(Ci, Cj) = sqrt(ni·nj/(ni + nj)) · ||µi − µj||
- Tends to combine small clusters with large clusters before merging clusters of similar sizes
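
Equivalently, merging Ci and Cj raises the sum-of-squared-error distortion D by (ni·nj/(ni + nj)) · ||µi − µj||², which is what the metric above minimizes. A one-function sketch (array-valued cluster arguments are an assumption):

    import numpy as np

    def merge_increase(ci, cj):
        # Increase in total distortion D if clusters ci and cj are merged
        ni, nj = len(ci), len(cj)
        mi, mj = ci.mean(axis=0), cj.mean(axis=0)
        return (ni * nj) / (ni + nj) * float(((mi - mj) ** 2).sum())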

Clustering for Segmentation
[Figure]

Speaker Clustering
- 23 female and 53 male speakers from the TIMIT corpus
- Vector based on F1 and F2 averages for 9 vowels
- Distance d(Ci, Cj) is the average of the distances between members
[Figure]

Velar Stop Allophones
[Figure]

Velar Stop Allophones (cont'd)
[Figure]

Acoustic-Phonetic Hierarchy
[Figure]

Word Clustering
[Figure]

VQ Applications
- Usually used to reduce computation
- Can be used alone for classification
- Used in dynamic time warping (DTW) and discrete hidden Markov models (HMMs)
- Multiple codebooks are used when spaces are statistically independent (product codebooks)
- Matrix codebooks are sometimes used to capture correlation between successive frames
- Used for semi-parametric density estimation (e.g., semi-continuous mixtures)

References
- X. Huang, A. Acero, and H. Hon, Spoken Language Processing, Prentice-Hall, 2001.
- R. Duda, P. Hart, and D. Stork, Pattern Classification, 2nd ed., John Wiley & Sons, 2001.
- A. Gersho and R. Gray, Vector Quantization and Signal Compression, Kluwer Academic Press, 1992.
- R. Gray, "Vector Quantization," IEEE ASSP Magazine, 1(2), 1984.
- B. Juang, D. Wang, and A. Gray, "Distortion Performance of Vector Quantization for LPC Voice Coding," IEEE Trans. ASSP, 30(2), 1982.
- J. Makhoul, S. Roucos, and H. Gish, "Vector Quantization in Speech Coding," Proc. IEEE, 73(11), 1985.
- L. Rabiner and B. Juang, Fundamentals of Speech Recognition, Prentice-Hall, 1993.