Speech Recognition: Pattern Classification 2
Pattern Classification
- Introduction
- Parametric classifiers
- Semi-parametric classifiers
- Dimensionality reduction
- Significance testing
Semi-Parametric Classifiers
- Mixture densities
- Maximum likelihood (ML) parameter estimation
- Mixture implementations
- Expectation-maximization (EM)
Mixture Densities
- The PDF is composed of a mixture of m component densities {ω1, …, ωm}:
   p(x) = \sum_{j=1}^{m} p(x|\omega_j)\, P(\omega_j)
- The component PDF parameters and the mixture weights P(ωj) are typically unknown, making parameter estimation a form of unsupervised learning.
- Gaussian mixtures assume Normal components (a numerical sketch follows):
   p(x|\omega_k) \sim N(\mu_k, \Sigma_k)
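To make the mixture definition concrete, here is a minimal NumPy sketch (an addition, not from the original slides) that evaluates a one-dimensional Gaussian mixture density; the parameter values are taken from the example on the next slide, assuming σ = 1 for concreteness.

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    """1-D Normal density N(x; mu, var)."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def mixture_pdf(x, weights, mus, variances):
    """p(x) = sum_j P(w_j) p(x|w_j) for a 1-D Gaussian mixture."""
    return sum(w * gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, mus, variances))

# p(x) = 0.6 N(x; -1, 1) + 0.4 N(x; 1.5, 1), i.e., the next slide's
# example read with sigma = 1 (an assumption made here for concreteness).
x = np.linspace(-5.0, 5.0, 101)
p = mixture_pdf(x, weights=[0.6, 0.4], mus=[-1.0, 1.5], variances=[1.0, 1.0])
```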
Gaussian Mixture Example: One Dimension
   p(x) = 0.6\, p_1(x) + 0.4\, p_2(x)
   p_1(x) \sim N(-\sigma, \sigma^2)
   p_2(x) \sim N(1.5\sigma, \sigma^2)
Gaussian Example
[Figure: first 9 MFCCs from [s], each modeled with a single Gaussian PDF]
Independent Mixtures
[Figure: [s], 2 Gaussian mixture components per dimension]
Mixture Components
[Figure: [s], 2 Gaussian mixture components per dimension]
ML Parameter Estimation: 1D Gaussian Mixture Means
The log likelihood of n independent training samples is
   \log L(\theta) = \sum_{i=1}^{n} \log p(x_i|\theta) = \sum_{i=1}^{n} \log \sum_{j=1}^{m} p(x_i|\omega_j)\, P(\omega_j)
Differentiating with respect to the mean μk of the k-th component and setting the result to zero:
   \frac{\partial \log L(\theta)}{\partial \mu_k} = \sum_{i=1}^{n} P(\omega_k|x_i)\, \frac{x_i - \mu_k}{\sigma_k^2} = 0
   \Rightarrow\quad \hat{\mu}_k = \frac{\sum_{i=1}^{n} P(\omega_k|x_i)\, x_i}{\sum_{i=1}^{n} P(\omega_k|x_i)}
Gaussian Mixtures: ML Parameter Estimation
The maximum likelihood solutions are of the form:
   \hat{P}(\omega_k) = \frac{1}{n} \sum_{i=1}^{n} \hat{P}(\omega_k|x_i)
   \hat{\mu}_k = \frac{\sum_{i=1}^{n} \hat{P}(\omega_k|x_i)\, x_i}{\sum_{i=1}^{n} \hat{P}(\omega_k|x_i)}
   \hat{\Sigma}_k = \frac{\sum_{i=1}^{n} \hat{P}(\omega_k|x_i)\,(x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^t}{\sum_{i=1}^{n} \hat{P}(\omega_k|x_i)}
Gaussian Mixtures: ML Parameter Estimation (cont.)
The ML solutions are typically solved iteratively:
1. Select a set of initial estimates for P̂(ωk), μ̂k, Σ̂k.
2. Use the set of n samples to re-estimate the mixture parameters until some convergence criterion is met.
Clustering procedures, similar to the K-means clustering procedure, are often used to provide the initial parameter estimates (a code sketch of the re-estimation loop follows).
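As a concrete illustration, here is a minimal NumPy sketch of this iterative loop for a one-dimensional Gaussian mixture, using the update equations of the previous slide; the function name and interface are illustrative, not part of the original material.

```python
import numpy as np

def em_gmm_1d(x, mu, var, prior, n_iter=50):
    """Iterative ML re-estimation (EM) for a 1-D Gaussian mixture.

    x: (n,) samples; mu, var, prior: (m,) initial component means,
    variances, and mixture weights.  A bare-bones sketch, with no
    convergence check or safeguards against empty components.
    """
    for _ in range(n_iter):
        # Estimate the posteriors P(w_k | x_i).
        lik = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        post = lik * prior
        post /= post.sum(axis=1, keepdims=True)
        # Recompute the mixture parameters (the ML solutions above).
        nk = post.sum(axis=0)
        prior = nk / len(x)
        mu = (post * x[:, None]).sum(axis=0) / nk
        var = (post * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return mu, var, prior
```

Initialized with the four-sample example of the next slide, a call would look like:

```python
mu, var, prior = em_gmm_1d(x=np.array([2.0, 1.0, -1.0, -2.0]),
                           mu=np.array([1.0, -1.0]),
                           var=np.array([1.0, 1.0]),
                           prior=np.array([0.5, 0.5]))
```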
Example: 4 Samples, 2 Densities
1. Data: X = {x1, x2, x3, x4} = {2, 1, -1, -2}
2. Initialization: p(x|ω1) ~ N(1,1), p(x|ω2) ~ N(-1,1), P(ωi) = 0.5
3. Estimate the posterior probabilities P(ωj|xi):

               x1     x2     x3     x4
   P(ω1|x)    0.98   0.88   0.12   0.02
   P(ω2|x)    0.02   0.12   0.88   0.98

4. Recompute the mixture parameters (shown only for ω1):
   \hat{\mu}_1 = \frac{\sum_i P(\omega_1|x_i)\, x_i}{\sum_i P(\omega_1|x_i)} = \frac{0.98(2) + 0.88(1) + 0.12(-1) + 0.02(-2)}{0.98 + 0.88 + 0.12 + 0.02} = 1.34

The likelihood of the data under the initial parameters is
   p(X) \propto (e^{-0.5} + e^{-4.5})(e^{0} + e^{-2})(e^{0} + e^{-2})(e^{-0.5} + e^{-4.5}) \cdot 0.5^4

(a numerical check of these values follows)
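The posterior table and the re-estimated mean can be verified in a few lines; a short check (an addition, not from the slides):

```python
import numpy as np

x = np.array([2.0, 1.0, -1.0, -2.0])
mu = np.array([1.0, -1.0])      # initial means of w1 and w2

# Unnormalized N(x; mu, 1); the equal priors P(w_i) = 0.5 cancel
# when the posteriors are normalized.
lik = np.exp(-0.5 * (x[:, None] - mu) ** 2)
post = lik / lik.sum(axis=1, keepdims=True)

print(post[:, 0].round(2))      # [0.98 0.88 0.12 0.02], as in the table
print((post[:, 0] * x).sum() / post[:, 0].sum())   # ~1.34, the new mean
```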
Example: 4 Samples, 2 Densities (cont.)
Repeat steps 3 and 4 until convergence.
[s] Duration: 2 Densities
[Figure]

Gaussian Mixture Example: Two Dimensions
[Figure]

Two Dimensional Mixtures...
[Figure]

Two Dimensional Components
[Figure]
Mixture of Gaussians: Implementation Variations
- Diagonal Gaussians are often used instead of full-covariance Gaussians:
  - They reduce the number of parameters.
  - They can potentially model the underlying PDF just as well if enough components are used.
- Mixture parameters are often constrained to be the same in order to reduce the number of parameters that need to be estimated:
  - Richter Gaussians share the same mean μ̂ in order to better model the PDF tails.
  - Tied mixtures share the same Gaussian parameters across all classes; only the mixture weights P(ωi) are class specific. (Also known as semi-continuous.)
Richter Gaussian Mixtures
[Figure: [s] log duration, 2 Richter Gaussians]
Expectation-Maximization (EM)
- Used for determining the parameters, θ, of incomplete data X = {xi} (i.e., unsupervised learning problems).
- Introduces a hidden variable, Z = {zj}, to make the data complete, so that θ can be solved for using conventional ML techniques.
- In reality, zj can only be estimated by P(zj|xi,θ), so we can only compute the expectation of log L(θ).
- EM solutions are computed iteratively until convergence (written out below):
  1. Compute the expectation of log L(θ).
  2. Compute the parameter values θ̂ that maximize this expectation.
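The slides do not write out the expected log likelihood explicitly; in the standard EM notation (supplied here for completeness, not from the original), the two steps are usually expressed through the auxiliary function Q:

```latex
% E-step: expectation of the complete-data log likelihood over the
% hidden variable Z, given the current parameter estimate \hat{\theta}
Q(\theta, \hat{\theta}) = E_Z\!\left[\log L(\theta) \mid X, \hat{\theta}\right]
                        = \sum_i \sum_j P(z_j \mid x_i, \hat{\theta})\,
                          \log p(x_i, z_j \mid \theta)

% M-step: choose the new parameters to maximize this expectation
\theta' = \arg\max_{\theta} Q(\theta, \hat{\theta})
```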
EM Parameter Estimation: 1D Gaussian Mixture Means
- Let zi be the component identity, {ωj}, to which xi belongs.
- Converting to mixture component notation, the expected log likelihood is
   E[\log L(\theta)] = \sum_{i=1}^{n} \sum_{j=1}^{m} P(\omega_j|x_i, \hat{\theta})\, \log p(x_i, \omega_j|\theta)
- Differentiating with respect to μk and setting the result to zero:
   \hat{\mu}_k = \frac{\sum_{i=1}^{n} P(\omega_k|x_i, \hat{\theta})\, x_i}{\sum_{i=1}^{n} P(\omega_k|x_i, \hat{\theta})}
EM Properties
- Each iteration of EM will increase the likelihood of X.
- Using Bayes' rule and the Kullback-Leibler distance metric, the change in log likelihood can be decomposed as:
   \log p(X|\theta') - \log p(X|\theta) = \left[Q(\theta',\theta) - Q(\theta,\theta)\right] + D\!\left(p(Z|X,\theta)\,\|\,p(Z|X,\theta')\right)
  where Q is the expected log likelihood defined above and the Kullback-Leibler distance D(·‖·) is always non-negative.
EM Properties (cont.)
- Since θ' was determined to maximize the expectation of log L(θ):
   Q(\theta', \theta) \ge Q(\theta, \theta)
- Combining these two properties:
   p(X|\theta') \ge p(X|\theta)
Dimensionality Reduction
Dimensionality Reduction
- Given a fixed training set, PDF parameter estimation becomes less robust as dimensionality increases.
- Increasing the number of dimensions can also make it more difficult to obtain insight into any underlying structure.
- Analytical techniques exist which can transform a sample space to a different set of dimensions:
  - If the original dimensions are correlated, the same information may require fewer transformed dimensions.
  - The transformed space will often have a more nearly Normal distribution than the original space.
  - If the new dimensions are orthogonal, it can be easier to model the transformed space.
Principal Component Analysis
The principal component (or Karhunen-Loève) transform is computed on a full training data set that has:
- d-dimensional feature vectors, and
- a d x d dimensional covariance matrix Σ.
Its eigenvalues and eigenvectors are computed as discussed in the following.
Eigenvectors and Eigenvalues
A very important class of matrices has the following property:
   M x = \lambda x
where M is a d x d matrix, x is a d-dimensional vector, and λ is a scalar. A solution vector x = ei and its corresponding scalar value λ = λi are called an eigenvector and its associated eigenvalue.
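A quick NumPy check of the defining property (illustrative, not from the slides):

```python
import numpy as np

# A real, symmetric 2 x 2 matrix (e.g., a small covariance matrix).
M = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh handles symmetric matrices; eigenvalues come back in ascending
# order and the eigenvectors are the columns of E.
lam, E = np.linalg.eigh(M)
print(lam)                       # [1. 3.]

# Verify M e_i = lambda_i e_i for every eigenpair.
for i in range(len(lam)):
    assert np.allclose(M @ E[:, i], lam[i] * E[:, i])
```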
Eigenvectors and Eigenvalues (cont.)
- If M is real and symmetric, there are d (possibly nondistinct) solution vectors {e1, e2, …, ed}, each with an associated eigenvalue {λ1, λ2, …, λd}.
- Under multiplication by M, eigenvectors are changed only in magnitude, not direction.
- If M is diagonal, then the eigenvectors are parallel to the coordinate axes.
Eigenvectors and Eigenvalues (cont.)
One method of finding the eigenvectors and eigenvalues is to solve the characteristic equation:
   |M - \lambda I| = 0
Its d (possibly nondistinct) roots λi are the eigenvalues; each is substituted back into a set of linear equations, (M - \lambda_i I)\, e_i = 0, to find the associated eigenvector.
Principal Components Analysis
- Given the covariance matrix Σ of a full training data set, we compute its eigenvalues and corresponding eigenvectors.
- The eigenvalues are ordered in descending order based on their absolute value.
- The first k out of d (d > k) largest eigenvalues {λ1, λ2, …, λk} and their corresponding eigenvectors {e1, e2, …, ek} are selected.
- A d x k matrix W is formed whose columns consist of the selected eigenvectors.
- The representation of the data with reduced dimensionality is obtained by projecting the original data onto the k-dimensional subspace (a code sketch follows) according to:
   y = W^t (x - \mu)
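The procedure above maps directly onto a few lines of NumPy; this is a sketch under the notation of these slides (the function names are illustrative):

```python
import numpy as np

def pca_fit(X, k):
    """PCA via eigendecomposition of the sample covariance matrix.

    X: (n, d) training data; k: number of components retained.
    Returns the sample mean mu (d,) and the projection matrix W
    (d, k) whose columns are the top-k eigenvectors.
    """
    mu = X.mean(axis=0)
    sigma = np.cov(X, rowvar=False)        # d x d covariance matrix
    lam, E = np.linalg.eigh(sigma)         # ascending eigenvalues
    order = np.argsort(np.abs(lam))[::-1]  # descending |eigenvalue|
    W = E[:, order[:k]]
    return mu, W

def pca_project(X, mu, W):
    """y = W^t (x - mu), applied to each row of X."""
    return (X - mu) @ W

# Example: reduce 40-dimensional vectors to k = 10 dimensions.
X = np.random.randn(100, 40)
mu, W = pca_fit(X, k=10)
Y = pca_project(X, mu, W)                  # shape (100, 10)
```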
Principal Components Analysis (cont.)
- PCA linearly transforms a d-dimensional vector, x, into a k-dimensional vector, y, via orthonormal vectors W:
   y = W^t (x - \mu), \quad W = \{w_1, …, w_k\}, \quad W^t W = I
- If k < d, x can only be partially reconstructed from y:
   \hat{x} = W y + \mu
Principal Components Analysis (cont.)
- The principal components W minimize the distortion, D, between x and x̂ on the training data X = {x1, …, xn}:
   D = \sum_{i=1}^{n} \| x_i - \hat{x}_i \|^2
- This transform is also known as the Karhunen-Loève (K-L) expansion (the wi are sinusoids for some stochastic processes).
PCA Computation
- W corresponds to the first k eigenvectors, taken from the eigenvector matrix P of Σ:
   P = \{e_1, …, e_d\}, \quad \Sigma = P \Lambda P^t, \quad w_i = e_i
- The full covariance structure of the original space, Σ, is transformed into a diagonal covariance structure Σ' = Λ.
- The eigenvalues {λ1, …, λk} represent the variances in Σ'.
PCA Computation (cont.)
The axes of the transformed k-dimensional space capture the maximum amount of variance attainable with any k orthogonal directions.
PCA Example
- Original feature vector: mean-rate response (d = 40).
- Data obtained from 100 speakers from the TIMIT corpus.
- The first 10 components explain 98% of the total variance.
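The "98% of total variance" figure comes from the eigenvalue spectrum; a small sketch of how such a number is computed (data loading omitted, so this is illustrative only):

```python
import numpy as np

def explained_variance(X):
    """Cumulative fraction of total variance captured by the leading
    principal components (eigenvalues of the sample covariance)."""
    lam = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]  # descending
    return np.cumsum(lam) / lam.sum()

# For the 40-dimensional mean-rate data described above, one would
# expect explained_variance(X)[9] to be approximately 0.98.
```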
PCA Example (cont.)
[Figure]
PCA for Boundary Classification
- Eight non-uniform averages were computed from 14 MFCCs.
- The first 50 dimensions were used for classification.
PCA Issues
- PCA can be performed using:
  - the covariance matrix Σ, or
  - the correlation coefficient matrix P.
- P is usually preferred when the input dimensions have significantly different ranges.
- PCA can be used to normalize or whiten the original d-dimensional space to simplify subsequent processing: Σ → I.
- The whitening operation can be done in one step (a sketch follows): z = V^t x
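The slide leaves V unspecified; a standard construction (an assumption here, not stated in the original) is V = PΛ^{-1/2} from the eigendecomposition Σ = PΛP^t, which gives the whitened data identity covariance:

```python
import numpy as np

def whiten(X):
    """Whitening z = V^t (x - mu) with V = P Lambda^(-1/2).

    This choice of V is a standard one, assumed here; the slide only
    names the one-step form z = V^t x.  The output has approximately
    identity sample covariance.
    """
    mu = X.mean(axis=0)
    lam, P = np.linalg.eigh(np.cov(X, rowvar=False))
    V = P / np.sqrt(lam)          # scale each eigenvector by 1/sqrt(lambda)
    return (X - mu) @ V

Z = whiten(np.random.randn(500, 5) @ np.random.randn(5, 5))
print(np.cov(Z, rowvar=False).round(2))   # approximately the identity
```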
Significance Testing
Significance Testing
- To properly compare results from different classifier algorithms, A1 and A2, it is necessary to perform significance tests:
  - Large differences can be insignificant for small test sets.
  - Small differences can be significant for large test sets.
- General significance tests evaluate the hypothesis that the probability of being correct, pi, is the same for both algorithms.
- The most powerful comparisons can be made using common training and test corpora and a common evaluation criterion:
  - Results then reflect differences in the algorithms rather than accidental differences in the test sets.
  - Significance tests can be more precise when identical data are used, since they can focus on the tokens misclassified by only one algorithm rather than on all tokens.
McNemar’s Significance Test
- When algorithms A1 and A2 are tested on identical data, we can collapse the results into a 2x2 matrix of counts:

                    A2 correct   A2 incorrect
   A1 correct          n00           n01
   A1 incorrect        n10           n11

- Suppose the true (unknown) classification error rate of a classifier is p, and that in an experiment one observes k misclassifications out of n independent, randomly drawn samples. Since the random variable k has a binomial distribution B(n,p), the maximum likelihood estimate of p is:
   \hat{p} = k/n
McNemar’s Significance Test (cont.)
The statistical test for the binomial distribution at the 0.05 significance level can be computed from the following equations to get the range (p1, p2):
   \sum_{i=k}^{n} \binom{n}{i}\, p_1^{i} (1-p_1)^{n-i} = 0.025
   \sum_{i=0}^{k} \binom{n}{i}\, p_2^{i} (1-p_2)^{n-i} = 0.025
These equations are cumbersome to solve, so the normal test is used instead (a numerical solution is sketched below).
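These tail-sum equations can also be solved numerically via the standard beta-distribution identity for binomial tails (the Clopper-Pearson interval); a SciPy sketch, added here for illustration:

```python
from scipy.stats import beta

def binomial_interval(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) confidence range (p1, p2) for a
    binomial proportion, equivalent to the two tail-sum equations."""
    p1 = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    p2 = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return p1, p2

# E.g., 72 errors out of 1400 tokens (the example two slides below):
print(binomial_interval(72, 1400))   # approximately (0.040, 0.064)
```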
McNemar’s Significance Test (cont.)
To compare the algorithms, we test the null hypothesis H0 that p1 = p2, or equivalently q01 = q10, where the qij are defined as follows:
- q00 = P(A1 and A2 both classify the data correctly)
- q01 = P(A1 classifies the data correctly and A2 incorrectly)
- q10 = P(A1 classifies the data incorrectly and A2 correctly)
- q11 = P(A1 and A2 both classify the data incorrectly)
McNemar’s Significance Test (cont.)
- Given H0, the probability of observing k tokens asymmetrically classified out of n = n01 + n10 has the binomial PMF:
   P(k) = \binom{n}{k} \left(\frac{1}{2}\right)^{n}
- McNemar’s test measures the probability, P, of all cases that meet or exceed the observed asymmetric distribution, and tests whether P < α.
McNemar’s Significance Test (cont.)
- The probability P is computed by summing up both PMF tails:
   P = 2 \sum_{k'=\max(n_{01}, n_{10})}^{n} \binom{n}{k'} \left(\frac{1}{2}\right)^{n}
- For large n, a Normal approximation to the binomial is often assumed (a code sketch of the exact test follows).
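A short SciPy sketch of the exact tail sum (the counts in the usage line are hypothetical, chosen only to show the call):

```python
from scipy.stats import binom

def mcnemar_exact(n01, n10):
    """Two-sided exact McNemar test: twice the upper binomial(n, 1/2)
    tail at or beyond the larger of the two asymmetric counts."""
    n, k = n01 + n10, max(n01, n10)
    p = 2.0 * (1.0 - binom.cdf(k - 1, n, 0.5))   # 2 * P(K >= k)
    return min(p, 1.0)

# Hypothetical example: of the tokens classified differently by the
# two algorithms, A1 is wrong 30 times and A2 is wrong 20 times.
print(mcnemar_exact(30, 20))   # about 0.2, not significant at 0.05
```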
Significance Test Example (Gillick and Cox, 1989)
- Common test set of 1400 tokens.
- Algorithms A1 and A2 make 72 and 62 errors, respectively.
- Are the differences significant? Under McNemar’s test the answer depends on the asymmetric counts n01 and n10, not just on the error totals.
References
- Huang, Acero, and Hon, Spoken Language Processing, Prentice-Hall, 2001.
- Duda, Hart, and Stork, Pattern Classification, John Wiley & Sons, 2001.
- Jelinek, Statistical Methods for Speech Recognition, MIT Press, 1997.
- Bishop, Neural Networks for Pattern Recognition, Clarendon Press, 1995.
- Gillick and Cox, "Some Statistical Issues in the Comparison of Speech Recognition Algorithms," Proc. ICASSP, 1989.