Digital Systems: Hardware Organization and Design

Presentation transcript:

Speech Recognition: Pattern Classification

Pattern Classification
Introduction
Parametric classifiers
Semi-parametric classifiers
Dimensionality reduction
Significance testing

Pattern Classification
Goal: To classify objects (or patterns) into categories (or classes).
Types of problems:
Supervised: Classes are known beforehand, and data samples of each class are available.
Unsupervised: Classes (and/or the number of classes) are not known beforehand, and must be inferred from the data.
Block diagram: Observation s → Feature Extraction → Feature Vectors x → Classifier → Class ωi

Probability Basics
Discrete probability mass function (PMF): P(ωi)
Continuous probability density function (PDF): p(x)
Expected value: E(x)

Kullback-Leibler Distance
Can be used to compute a distance between two probability mass distributions, P(zi) and Q(zi)
Makes use of the inequality log x ≤ x - 1
Known as relative entropy in information theory
The divergence of P(zi) and Q(zi) is the symmetric sum of the two directed distances
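In standard notation, the KL distance and the symmetric divergence referenced above (assuming the usual definitions over a discrete alphabet {zi}) are:

```latex
D(P \| Q) = \sum_i P(z_i) \log \frac{P(z_i)}{Q(z_i)},
\qquad
\mathrm{Div}(P, Q) = D(P \| Q) + D(Q \| P)
```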

Bayes Theorem
Define:
{ωi} a set of M mutually exclusive classes
P(ωi) a priori probability for class ωi
p(x|ωi) PDF for feature vector x in class ωi
P(ωi|x) a posteriori probability of ωi given x

Bayes Theorem
From Bayes rule:
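The standard statement of Bayes rule for this setup, with the normalizer the slide's "where" clause refers to, is:

```latex
P(\omega_i \mid x) = \frac{p(x \mid \omega_i)\, P(\omega_i)}{p(x)},
\qquad \text{where} \qquad
p(x) = \sum_{j=1}^{M} p(x \mid \omega_j)\, P(\omega_j)
```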

Bayesian Decision Theory
Reference: Pattern Classification – R. Duda, P. Hart & D. Stork, Wiley & Sons, 2001

Bayes Decision Theory
The probability of making an error given x is: P(error|x) = 1 - P(ωi|x) if we decide class ωi
To minimize P(error|x) (and P(error)): choose ωi if P(ωi|x) > P(ωj|x) ∀j≠i

Bayes Decision Theory
For a two-class problem this decision rule means: choose ω1 if P(ω1|x) > P(ω2|x), else choose ω2
This rule can be expressed as a likelihood ratio:
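The likelihood-ratio form of this rule, in standard notation, is:

```latex
\text{choose } \omega_1 \quad \text{if} \quad
\frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} > \frac{P(\omega_2)}{P(\omega_1)},
\quad \text{else choose } \omega_2
```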

Bayes Risk
Define a cost function λij and conditional risk R(ωi|x):
λij is the cost of classifying x as ωi when it is really ωj
R(ωi|x) is the risk of classifying x as class ωi
Bayes risk is the minimum risk that can be achieved: choose ωi if R(ωi|x) < R(ωj|x) ∀j≠i
Bayes risk corresponds to minimum P(error|x) when:
All errors have equal cost (λij = 1, i≠j)
There is no cost for being correct (λii = 0)
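The conditional risk itself, in the usual Duda–Hart notation, is:

```latex
R(\omega_i \mid x) = \sum_{j=1}^{M} \lambda_{ij}\, P(\omega_j \mid x)
```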

Discriminant Functions
Alternative formulation of the Bayes decision rule
Define a discriminant function, gi(x), for each class ωi: choose ωi if gi(x) > gj(x) ∀j≠i
Functions yielding identical classification results:
gi(x) = P(ωi|x)
gi(x) = p(x|ωi)P(ωi)
gi(x) = log p(x|ωi) + log P(ωi)
Choice of function impacts computational costs
Discriminant functions partition feature space into decision regions, separated by decision boundaries.

Density Estimation
Used to estimate the underlying PDF p(x|ωi)
Parametric methods:
Assume a specific functional form for the PDF
Optimize the PDF parameters to fit the data
Non-parametric methods:
Determine the form of the PDF from the data
Grow the parameter set size with the amount of data
Semi-parametric methods:
Use a general class of functional forms for the PDF
Can vary the parameter set size independently of the amount of data
Use unsupervised methods to estimate parameters

Parametric Classifiers
Gaussian distributions
Maximum likelihood (ML) parameter estimation
Multivariate Gaussians
Gaussian classifiers

Maximum Likelihood Parameter Estimation

Gaussian Distributions
Gaussian PDFs are reasonable when a feature vector can be viewed as a perturbation around a reference
Simple estimation procedures for the model parameters
Classification is often reduced to simple distance metrics
Gaussian distributions are also called Normal distributions

Gaussian Distributions: One Dimension
One-dimensional Gaussian PDFs can be expressed as shown below
The PDF is centered around the mean
The spread of the PDF is determined by the variance
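In standard notation, the univariate Gaussian PDF referenced above is:

```latex
p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)
```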

Univariate Gaussians
The Gaussian distribution, also known as the normal distribution, is the bell-curve function familiar from basic statistics. A Gaussian distribution is a function parameterized by a mean, or average value, and a variance, which characterizes the average spread or dispersal from the mean. We will use μ to indicate the mean and σ2 to indicate the variance, giving the Gaussian function shown on the previous slide.

Univariate Gaussian PDFs (figure)

Gaussian PDFs
The mean of a random variable X is the expected value of X. For a discrete variable X, this is the weighted sum over the values of X.
The variance of a random variable X is the weighted squared average deviation from the mean.
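The two definitions referenced above, written out for a discrete variable X:

```latex
E[X] = \sum_{x} x\, P(x),
\qquad
\mathrm{Var}(X) = E\!\left[ (X - E[X])^2 \right] = \sum_{x} (x - E[X])^2\, P(x)
```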

Gaussian PDFs
When a Gaussian function is used as a probability density function, the area under the curve is constrained to be equal to one. The probability that a random variable takes on any particular range of values can then be computed by summing the area under the curve for that range of values. Fig. 9.19 shows the probability expressed by the area under an interval of a Gaussian PDF.

Maximum Likelihood Parameter Estimation
Maximum likelihood parameter estimation determines an estimate θ̂ for parameter θ by maximizing the likelihood L(θ) of observing the data X = {x1,...,xn}
Assuming independent, identically distributed data, ML solutions can often be obtained by setting the derivative of L(θ) with respect to θ to zero

Maximum Likelihood Parameter Estimation
For Gaussian distributions, log L(θ) is easier to solve
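A standard reconstruction of the likelihood and log-likelihood referenced on these two slides (assuming i.i.d. data):

```latex
L(\theta) = p(X \mid \theta) = \prod_{k=1}^{n} p(x_k \mid \theta),
\qquad
\log L(\theta) = \sum_{k=1}^{n} \log p(x_k \mid \theta),
\qquad
\frac{\partial}{\partial \theta} \log L(\theta) = 0
```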

Gaussian ML Estimation: One Dimension
The maximum likelihood estimate for μ is given by:
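In standard notation, the ML estimate of the mean is the sample average:

```latex
\hat{\mu} = \frac{1}{n} \sum_{k=1}^{n} x_k
```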

Gaussian ML Estimation: One Dimension
The maximum likelihood estimate for σ is given by:
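Likewise, the standard ML estimate of the variance (note the 1/n normalization rather than 1/(n-1)):

```latex
\hat{\sigma}^2 = \frac{1}{n} \sum_{k=1}^{n} (x_k - \hat{\mu})^2
```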

Gaussian ML Estimation: One Dimension (figure)

ML Estimation: Alternative Distributions (figure)

ML Estimation: Alternative Distributions (figure)

Gaussian Distributions: Multiple Dimensions (Multivariate)
A multi-dimensional Gaussian PDF can be expressed as shown below, where:
d is the number of dimensions
x = {x1,…,xd} is the input vector
μ = E(x) = {μ1,...,μd} is the mean vector
Σ = E((x-μ)(x-μ)t) is the covariance matrix with elements σij, inverse Σ-1, and determinant |Σ|
σij = σji = E((xi - μi)(xj - μj)) = E(xixj) - μiμj
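In standard notation, the d-dimensional Gaussian density referenced above is:

```latex
p(x) = \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}}
\exp\!\left( -\tfrac{1}{2} (x-\mu)^{t}\, \Sigma^{-1}\, (x-\mu) \right)
```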

Gaussian Distributions: Multi-Dimensional Properties
If the ith and jth dimensions are statistically or linearly independent, then E(xixj) = E(xi)E(xj) and σij = 0
If all dimensions are statistically or linearly independent, then σij = 0 ∀i≠j and Σ has non-zero elements only on the diagonal
If the underlying density is Gaussian and Σ is a diagonal matrix, then the dimensions are statistically independent and the joint PDF factors into a product of univariate Gaussians

Diagonal Covariance Matrix: Σ = σ2I (figure)

Diagonal Covariance Matrix: σij = 0 ∀i≠j (figure)

General Covariance Matrix: σij ≠ 0 (figure)

2D Gaussian PDFs (figure)


Multivariate ML Estimation
The ML estimates for parameters θ = {θ1,...,θl} are determined by maximizing the joint likelihood L(θ) of a set of i.i.d. data X = {x1,...,xn}
To find θ̂ we solve ∇θ L(θ) = 0, or ∇θ log L(θ) = 0
The ML estimates of μ and Σ are the sample mean vector and the sample covariance matrix
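As a concrete illustration, here is a minimal NumPy sketch of the multivariate ML estimates (sample mean and 1/n-normalized sample covariance); the function and variable names are illustrative only, not from the lecture:

```python
import numpy as np

def ml_gaussian_estimates(X):
    """ML estimates of a multivariate Gaussian from the rows of X (n samples x d dims)."""
    n = X.shape[0]
    mu = X.mean(axis=0)          # sample mean vector (ML estimate of the mean)
    Xc = X - mu                  # center the data
    sigma = (Xc.T @ Xc) / n      # ML covariance uses 1/n, not 1/(n-1)
    return mu, sigma

# Example: estimate parameters from 1000 samples of a known 2-D Gaussian
rng = np.random.default_rng(0)
true_mu = np.array([1.0, -2.0])
true_sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
X = rng.multivariate_normal(true_mu, true_sigma, size=1000)
mu_hat, sigma_hat = ml_gaussian_estimates(X)
print(mu_hat, sigma_hat, sep="\n")
```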

Multivariate Gaussian Classifier
Requires a mean vector μi and a covariance matrix Σi for each of M classes {ω1, ··· ,ωM}
The minimum-error discriminant functions are of the form gi(x) = log p(x|ωi) + log P(ωi)
Classification can be reduced to simple distance metrics in many situations.

Gaussian Classifier: Σi = σ2I
Each class has the same covariance structure: statistically independent dimensions with variance σ2
The equivalent discriminant functions are Euclidean-distance based (see below)
If each class is equally likely, this is a minimum-distance classifier, a form of template matching
The discriminant functions can also be written as linear expressions in x
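The standard Duda–Hart forms of these discriminants for the Σi = σ2I case are:

```latex
g_i(x) = -\frac{\lVert x - \mu_i \rVert^{2}}{2\sigma^{2}} + \log P(\omega_i)
\qquad \text{or, linearly,} \qquad
g_i(x) = w_i^{t} x + w_{i0}, \quad
w_i = \frac{\mu_i}{\sigma^{2}}, \quad
w_{i0} = -\frac{\mu_i^{t}\mu_i}{2\sigma^{2}} + \log P(\omega_i)
```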

Gaussian Classifier: Σi = σ2I
For distributions with a common covariance structure the decision boundaries are hyper-planes. (figure)

Gaussian Classifier: Σi = Σ
Each class has the same covariance structure Σ
The equivalent discriminant functions are based on the squared Mahalanobis distance (see below)
If each class is equally likely, the minimum-error decision rule is to minimize the squared Mahalanobis distance
The discriminant functions remain linear expressions in x
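In standard notation, for a shared covariance Σ:

```latex
g_i(x) = -\tfrac{1}{2}\,(x - \mu_i)^{t}\,\Sigma^{-1}\,(x - \mu_i) + \log P(\omega_i)
\qquad \text{or, linearly,} \qquad
g_i(x) = w_i^{t} x + w_{i0}, \quad
w_i = \Sigma^{-1}\mu_i, \quad
w_{i0} = -\tfrac{1}{2}\,\mu_i^{t}\Sigma^{-1}\mu_i + \log P(\omega_i)
```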

Gaussian Classifier: Σi Arbitrary
Each class has a different covariance structure Σi
The equivalent discriminant functions are given below
The discriminant functions are inherently quadratic in x
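The standard quadratic discriminant for the general-covariance case is:

```latex
g_i(x) = -\tfrac{1}{2}\,(x-\mu_i)^{t}\,\Sigma_i^{-1}\,(x-\mu_i)
         -\tfrac{1}{2}\log\lvert\Sigma_i\rvert + \log P(\omega_i)
```

To make the whole classifier concrete, here is a minimal NumPy sketch (illustrative class and variable names, not the lecture's code) that trains per-class ML estimates and classifies with this quadratic discriminant:

```python
import numpy as np

class GaussianClassifier:
    """Gaussian (quadratic) classifier with per-class full covariance, ML-trained."""
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.params = {}
        for c in self.classes:
            Xc = X[y == c]
            mu = Xc.mean(axis=0)
            Sigma = np.cov(Xc, rowvar=False, bias=True)     # ML (1/n) covariance
            self.params[c] = (mu,
                              np.linalg.inv(Sigma),
                              np.linalg.slogdet(Sigma)[1],   # log|Sigma|
                              np.log(len(Xc) / len(X)))      # log prior P(w_i)
        return self

    def predict(self, X):
        scores = []
        for c in self.classes:
            mu, Sinv, logdet, logprior = self.params[c]
            d = X - mu
            # g_i(x) = -1/2 (x-mu)' Sinv (x-mu) - 1/2 log|Sigma_i| + log P(w_i)
            g = -0.5 * np.einsum('ij,jk,ik->i', d, Sinv, d) - 0.5 * logdet + logprior
            scores.append(g)
        return self.classes[np.argmax(np.stack(scores, axis=1), axis=1)]
```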

Gaussian Classifier: Σi Arbitrary
For distributions with arbitrary covariance structures the decision boundaries are general hyper-quadrics (e.g. hyper-ellipsoids or hyper-paraboloids), not simple hyper-planes. (figure)

3-Class Classification (Atal & Rabiner, 1976)
Distinguish between silence, unvoiced, and voiced sounds
Use 5 features:
Zero crossing count
Log energy
Normalized first autocorrelation coefficient
First predictor coefficient
Normalized prediction error
Multivariate Gaussian classifier, ML estimation
Decision by squared Mahalanobis distance
Trained on four speakers (2 sentences/speaker), tested on 2 speakers (1 sentence/speaker)

Maximum A Posteriori Parameter Estimation

Maximum A Posteriori Parameter Estimation
Maximum likelihood estimation approaches assume the form of the PDF p(x|θ) is known, but the value of θ is not
Maximum A Posteriori (MAP) parameter estimation techniques assume that the parameter vector θ is a random variable with a prior distribution p(θ)

Maximum A Posteriori Parameter Estimation
Knowledge of θ is contained in:
An initial a priori PDF p(θ)
A set of independent and identically distributed (i.i.d.) data X = {x1,...,xn} with PDF p(X|θ)
According to Bayes rule, the posterior distribution of θ is defined as shown below
One intuitive interpretation is that the prior PDF p(θ) represents the relative likelihood before the values of x1,...,xn have been observed, while the posterior PDF p(θ|X) represents the relative likelihood after they have been observed. Choosing the value of θ that maximizes p(θ|X) is therefore intuitively consistent.
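Written out in standard notation, the posterior and the resulting MAP estimator are:

```latex
p(\theta \mid X) = \frac{p(X \mid \theta)\, p(\theta)}{p(X)}
= \frac{p(X \mid \theta)\, p(\theta)}{\int p(X \mid \theta)\, p(\theta)\, d\theta},
\qquad
\hat{\theta}_{MAP} = \arg\max_{\theta}\; p(X \mid \theta)\, p(\theta)
```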

Maximum A Posteriori Parameter Estimation
The previous estimator defines the MAP estimate

Gaussian MAP Estimation: One Dimension
For a Gaussian distribution with unknown mean μ, where μ itself has a Gaussian prior, the MAP estimates of μ and of x are given by the expressions below
As n increases, p(μ|X) concentrates around the MAP estimate μ̂, and p(x|X) converges to the ML estimate
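A standard closed-form result for this setup, assuming the prior μ ~ N(μ0, σ0²) and known data variance σ², is the weighted combination of the sample mean and the prior mean:

```latex
\hat{\mu}_{MAP} = \frac{n\sigma_0^{2}}{n\sigma_0^{2} + \sigma^{2}}\,\bar{x}
               + \frac{\sigma^{2}}{n\sigma_0^{2} + \sigma^{2}}\,\mu_0,
\qquad
\bar{x} = \frac{1}{n}\sum_{k=1}^{n} x_k
```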

Significance Testing
Significance testing in parameter estimation and in model comparisons is an important topic in speech recognition and is part of statistical inference. For more information, see reference [1], chapter 3.3.

References
[1] Huang, Acero, and Hon, Spoken Language Processing, Prentice-Hall, 2001.
[2] Duda, Hart, and Stork, Pattern Classification, John Wiley & Sons, 2001.
[3] Atal and Rabiner, "A Pattern Recognition Approach to Voiced-Unvoiced-Silence Classification with Applications to Speech Recognition," IEEE Trans. ASSP, 24(3), 1976.