Speech Processing: Speaker Recognition
Veton Këpuska
Speaker Recognition
Definitions:
  Speaker Identification: given a set of models obtained for a number of known speakers, determine which voice model best characterizes a test speaker.
  Speaker Verification: decide whether a speaker corresponds to a particular known voice or to some other unknown voice.
  Claimant – an individual who is truthfully claiming to be one of the known speakers.
  Impostor – an unknown speaker who is posing as a known speaker.
  Error types: False Acceptance (an impostor is accepted), False Rejection (a true claimant is rejected).
Steps in Speaker Recognition
Model building:
  For each target speaker (claimant) and for a number of background (impostor) speakers.
Speaker-dependent features (ideally, these would be measured accurately from the speech waveform):
  Oral and nasal tract length and cross-section during different sounds
  Vocal fold mass and shape
  Location and size of the false vocal folds
Training Data + Model Building Procedure ⇨ Generate Models
Speaker Recognition (cont'd)
In practice it is difficult to derive speech-anatomy features from the speech waveform. Instead, conventional methods are used to extract features:
  Constant-Q filter bank
  Spectral-based features
Speaker Recognition System
Training: speech data from the target and background speakers (e.g., Linda, Kay, Joe) → Feature Extraction → Training → Target & Background Speaker Models.
Testing: speech data from an unknown speaker (e.g., Tom) → Feature Extraction → Recognition → Decision: Tom / Not Tom.
Spectral Features for Speaker Recognition
Attributes of the human voice:
High-level – difficult to extract from the speech waveform:
  Clarity
  Roughness
  Magnitude
  Animation
  Prosody – pitch intonation, articulation rate, and dialect
Low-level – easy to extract from the speech waveform:
  Vocal tract spectrum
  Instantaneous pitch
  Glottal flow excitation
  Source event onset times
  Modulations in formant trajectories
Spectral Features for Speaker Recognition (cont'd)
We want the feature set to reflect the unique characteristics of a speaker. The short-time Fourier transform (STFT) is the starting point.
STFT magnitude:
  Vocal tract resonances
  Vocal tract anti-resonances – important for speaker identifiability
The general trend of the envelope of the STFT magnitude is influenced by the coarse component of the glottal flow derivative.
The fine structure of the STFT is characterized by speaker-dependent features:
  Pitch
  Glottal flow
  Distributed acoustic effects
Spectral Features for Speaker Recognition (cont'd)
Speaker recognition systems use a smooth representation of the STFT magnitude:
  Vocal tract resonances
  Spectral tilt
Auditory-based features are superior to the conventional features:
  All-pole LPC spectrum
  Homomorphically filtered spectrum
  Homomorphic prediction, etc.
Mel-Cepstrum
Davis & Mermelstein (1980).
Short-Time Fourier Analysis (Time-Dependent Fourier Transform)
Rectangular Window
Hamming Window
Comparison of Windows
Comparison of Windows (cont'd)
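The window slides above contrast the frequency-domain behavior of the rectangular and Hamming windows. As a rough illustration (not the original slide figures), the sketch below measures the main-lobe width and peak side-lobe level of each window from its zero-padded DFT with numpy; the window length and FFT size are arbitrary choices.

```python
import numpy as np

N = 64          # window length (illustrative choice)
n_fft = 4096    # large zero-padded FFT for a smooth spectrum

windows = {
    "rectangular": np.ones(N),
    "hamming": np.hamming(N),   # 0.54 - 0.46*cos(2*pi*n/(N-1))
}

for name, w in windows.items():
    W = np.abs(np.fft.rfft(w, n_fft))
    mag_db = 20 * np.log10(W / W.max() + 1e-12)
    first_null = np.argmax(np.diff(mag_db) > 0)    # magnitude first turns back up at the main-lobe edge
    mainlobe_width = 2 * first_null / n_fft        # two-sided main-lobe width in cycles/sample
    sidelobe_db = mag_db[first_null:].max()        # highest side-lobe level relative to the peak
    print(f"{name:12s} main-lobe width ~ {mainlobe_width:.4f}, peak side lobe ~ {sidelobe_db:.1f} dB")
```

For the rectangular window this gives a main-lobe width of about 2/N and side lobes near −13 dB; the Hamming window roughly doubles the main-lobe width but pushes the side lobes down to around −42 dB.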
A Wideband Spectrogram
A Narrowband Spectrogram
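A wideband spectrogram uses a short analysis window (good time resolution, coarse frequency resolution), while a narrowband spectrogram uses a long window. The sketch below, a hedged example rather than the original figures, computes both from a toy signal with scipy.signal.spectrogram; the window lengths of roughly 5 ms and 25 ms are common illustrative choices.

```python
import numpy as np
from scipy import signal

fs = 16000                                  # sample rate in Hz (illustrative)
t = np.arange(0, 1.0, 1 / fs)
x = np.sin(2 * np.pi * 120 * t) * (1 + np.sin(2 * np.pi * 3 * t))   # toy quasi-periodic signal

def spectrogram_db(x, fs, win_ms):
    nperseg = int(fs * win_ms / 1000)       # window length in samples
    f, tt, Sxx = signal.spectrogram(x, fs=fs, window="hamming",
                                    nperseg=nperseg, noverlap=nperseg // 2)
    return f, tt, 10 * np.log10(Sxx + 1e-12)

f_wb, t_wb, S_wb = spectrogram_db(x, fs, win_ms=5)    # wideband: short window
f_nb, t_nb, S_nb = spectrogram_db(x, fs, win_ms=25)   # narrowband: long window
print("wideband (freq bins x frames):", S_wb.shape, " narrowband:", S_nb.shape)
```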
Discrete Fourier Transform
In general, the number of input points, N, and the number of frequency samples, M, need not be the same (a small numpy sketch follows below):
  If M > N, we must zero-pad the signal.
  If M < N, we must time-alias the signal.
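As a hedged illustration of both cases, the sketch below computes an M-point DFT of an N-point signal, zero-padding when M > N and folding (time-aliasing) the signal modulo M when M < N, and checks the aliased case against direct sampling of the DTFT.

```python
import numpy as np

def dft_m_points(x, M):
    """M-point DFT of an N-point signal (illustrative sketch)."""
    N = len(x)
    if M >= N:
        # M > N: zero-pad the signal to length M.
        xm = np.concatenate([x, np.zeros(M - N)])
    else:
        # M < N: time-alias, x_m[n] = sum_r x[n + r*M].
        pad = (-N) % M                           # pad so the length is a multiple of M
        xm = np.concatenate([x, np.zeros(pad)]).reshape(-1, M).sum(axis=0)
    return np.fft.fft(xm)

x = np.random.randn(12)
X16 = dft_m_points(x, 16)   # zero-padded: finer frequency sampling
X8 = dft_m_points(x, 8)     # time-aliased: coarser frequency sampling

# Sanity check: the time-aliased 8-point DFT equals the DTFT sampled at 2*pi*k/8.
k = np.arange(8)
dtft_samples = np.array([np.sum(x * np.exp(-2j * np.pi * kk * np.arange(len(x)) / 8)) for kk in k])
print(np.allclose(X8, dtft_samples))   # True
```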
Examples of Various Spectral Representations
Cepstral Analysis of Speech
The speech signal is often assumed to be the output of an LTI system; i.e., it is the convolution of the input and the impulse response. If we are interested in characterizing the signal in terms of the parameters of such a model, we must go through the process of de-convolution. Cepstral analysis is a common procedure used for such de-convolution.
Cepstral Analysis
Cepstral analysis for convolution is based on the observation that:
  x[n] = x1[n] * x2[n]  ⇒  X(z) = X1(z) X2(z)
Taking the complex logarithm of X(z):
  X̂(z) = log{X(z)} = log{X1(z)} + log{X2(z)} = X̂1(z) + X̂2(z)
If the complex logarithm is unique, and if X̂(z) is a valid z-transform, then the two convolved signals will be additive in this new, cepstral domain.
If we restrict ourselves to the unit circle, z = e^{jω}, then:
  X̂(e^{jω}) = log|X(e^{jω})| + j·arg{X(e^{jω})}
It can be shown that one approach to dealing with the problem of uniqueness is to require that arg{X(e^{jω})} be a continuous, odd, periodic function of ω.
Cepstral Analysis (cont'd)
To the extent that X̂(z) = log{X(z)} is valid, a real cepstrum c[n] can be defined, and it can easily be shown that c[n] is the even part of x̂[n].
If x̂[n] is real and causal, then x̂[n] can be recovered from c[n]. This is known as the minimum-phase condition.
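As a hedged sketch of the de-convolution idea (not the course's own code), the snippet below computes the real cepstrum as the inverse DFT of the log-magnitude spectrum of a toy voiced frame, then lifters it: low quefrencies capture the vocal-tract envelope, while the high-quefrency peak reveals the pitch period. The frame synthesis and the liftering cutoff are arbitrary illustrative choices.

```python
import numpy as np

def real_cepstrum(x, n_fft=1024):
    """c[n] = IDFT{ log|DFT{x}| } -- the even part of the complex cepstrum."""
    X = np.fft.fft(x, n_fft)
    return np.real(np.fft.ifft(np.log(np.abs(X) + 1e-12)))

fs = 16000
n = np.arange(400)
# Toy "voiced frame": a 125 Hz impulse train convolved with a damped resonance near 800 Hz.
excitation = (n % (fs // 125) == 0).astype(float)
h = np.exp(-n / 40.0) * np.cos(2 * np.pi * 800 * n / fs)
frame = np.convolve(excitation, h)[: len(n)] * np.hamming(len(n))

c = real_cepstrum(frame)
cutoff = 30                                                       # liftering cutoff (quefrency samples)
envelope_cep = np.where(np.arange(len(c)) < cutoff, c, 0.0)       # low quefrency -> vocal-tract envelope
log_envelope = np.real(np.fft.fft(envelope_cep))[: len(c) // 2]   # smoothed log-magnitude spectrum
pitch_quefrency = cutoff + np.argmax(c[cutoff : len(c) // 2])     # high-quefrency peak -> pitch period
print("pitch period estimate:", pitch_quefrency, "samples (expected ~", fs // 125, ")")
```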
Mel-Frequency Cepstral Representation (Mermelstein & Davis, 1980)
Some recognition systems use mel-scale cepstral coefficients to mimic auditory processing. (The mel frequency scale is approximately linear up to 1000 Hz and logarithmic thereafter.) This is done by multiplying the magnitude (or log magnitude) of S(e^{jω}) with a set of triangular filter weights (a mel filter bank).
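The sketch below is a hedged illustration of this pipeline, not Mermelstein & Davis's exact implementation: it builds a triangular mel filter bank, applies it to an STFT magnitude, and takes a DCT of the log filter-bank energies to obtain cepstral coefficients. The number of filters (24) and coefficients (13) are common but arbitrary choices.

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):  return 2595.0 * np.log10(1.0 + f / 700.0)
def mel_to_hz(m):  return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters equally spaced on the mel scale (illustrative sketch)."""
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(fs / 2), n_filters + 2)
    bin_pts = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bin_pts[i - 1], bin_pts[i], bin_pts[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    return fb

fs, n_fft = 16000, 512
frame = np.random.randn(400) * np.hamming(400)            # stand-in for a windowed speech frame
spectrum = np.abs(np.fft.rfft(frame, n_fft))
energies = mel_filterbank(24, n_fft, fs) @ spectrum        # mel filter-bank outputs
mfcc = dct(np.log(energies + 1e-12), norm="ortho")[:13]    # first 13 cepstral coefficients
print(mfcc.round(2))
```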
References
Tohkura, Y., "A Weighted Cepstral Distance Measure for Speech Recognition," IEEE Trans. ASSP, Vol. ASSP-35, No. 10, 1987.
Mermelstein, P. and Davis, S., "Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences," IEEE Trans. ASSP, Vol. ASSP-28, No. 4, 1980.
Meng, H., The Use of Distinctive Features for Automatic Speech Recognition, SM Thesis, MIT EECS, 1991.
Leung, H., Chigier, B., and Glass, J., "A Comparative Study of Signal Representation and Classification Techniques for Speech Recognition," Proc. ICASSP, Vol. II, 1993.
Pattern Classification
Goal: to classify objects (or patterns) into categories (or classes).
Types of problems:
  Supervised: classes are known beforehand, and data samples of each class are available.
  Unsupervised: classes (and/or the number of classes) are not known beforehand, and must be inferred from the data.
Processing chain: Observations s → Feature Extraction → Feature Vectors x → Classifier → Class ωi.
Probability Basics
Discrete probability mass function (PMF): P(ωi)
Continuous probability density function (PDF): p(x)
Expected value: E(x)
Kullback-Leibler Distance
Can be used to compute a distance between two probability mass functions, P(zi) and Q(zi):
  D(P‖Q) = Σi P(zi) log [P(zi)/Q(zi)]
Makes use of the inequality log x ≤ x − 1.
Known as relative entropy in information theory.
The divergence of P(zi) and Q(zi) is the symmetric sum D(P‖Q) + D(Q‖P).
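A minimal sketch of the KL distance and its symmetric divergence for two discrete PMFs (the example PMFs are arbitrary):

```python
import numpy as np

def kl_distance(p, q):
    """D(P||Q) = sum_i P(z_i) * log(P(z_i)/Q(z_i)); assumes q > 0 wherever p > 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def divergence(p, q):
    """Symmetric sum D(P||Q) + D(Q||P)."""
    return kl_distance(p, q) + kl_distance(q, p)

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(kl_distance(p, q), divergence(p, q))   # zero only when the two PMFs are identical
```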
Bayes Theorem
Define:
  {ωi} – a set of M mutually exclusive classes
  P(ωi) – the a priori probability of class ωi
  p(x|ωi) – the PDF of feature vector x in class ωi
  P(ωi|x) – the a posteriori probability of ωi given x
Bayes Theorem (cont'd)
From Bayes rule:
  P(ωi|x) = p(x|ωi) P(ωi) / p(x),  where  p(x) = Σj p(x|ωj) P(ωj)
Bayes Decision Theory
The probability of making an error given x is:
  P(error|x) = 1 − P(ωi|x)  if we decide class ωi
To minimize P(error|x) (and P(error)):
  Choose ωi if P(ωi|x) > P(ωj|x) ∀ j ≠ i
For a two-class problem this decision rule means:
  Choose ω1 if P(ω1|x) > P(ω2|x); otherwise choose ω2
This rule can be expressed as a likelihood ratio (a numerical sketch follows below):
  Choose ω1 if p(x|ω1)/p(x|ω2) > P(ω2)/P(ω1); otherwise choose ω2
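A hedged numerical sketch of the two-class rule with one-dimensional Gaussian class-conditional densities; the means, variance, and priors below are arbitrary illustrative values, and the posterior comparison is checked against the equivalent likelihood-ratio test.

```python
import numpy as np
from scipy.stats import norm

# Illustrative class-conditional densities and priors (not from the slides).
p1, p2 = 0.7, 0.3                       # P(w1), P(w2)
pdf1 = norm(loc=0.0, scale=1.0).pdf     # p(x|w1)
pdf2 = norm(loc=2.0, scale=1.0).pdf     # p(x|w2)

def decide_posterior(x):
    post1 = pdf1(x) * p1                # proportional to P(w1|x)
    post2 = pdf2(x) * p2                # proportional to P(w2|x)
    return 1 if post1 > post2 else 2

def decide_likelihood_ratio(x):
    # Choose w1 if p(x|w1)/p(x|w2) > P(w2)/P(w1), else w2.
    return 1 if pdf1(x) / pdf2(x) > p2 / p1 else 2

for x in (-1.0, 1.0, 1.5, 3.0):
    assert decide_posterior(x) == decide_likelihood_ratio(x)
    print(f"x = {x:+.1f} -> class w{decide_posterior(x)}")
```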
Bayes Risk
Define a cost function λij and the conditional risk R(ωi|x):
  λij is the cost of classifying x as ωi when it is really ωj
  R(ωi|x) = Σj λij P(ωj|x) is the risk of classifying x as class ωi
The Bayes risk is the minimum risk which can be achieved:
  Choose ωi if R(ωi|x) < R(ωj|x) ∀ j ≠ i
The Bayes risk corresponds to minimum P(error|x) when:
  All errors have equal cost (λij = 1, i ≠ j)
  There is no cost for being correct (λii = 0)
Discriminant Functions
An alternative formulation of the Bayes decision rule:
  Define a discriminant function, gi(x), for each class ωi
  Choose ωi if gi(x) > gj(x) ∀ j ≠ i
Functions yielding identical classification results:
  gi(x) = P(ωi|x)
  gi(x) = p(x|ωi) P(ωi)
  gi(x) = log p(x|ωi) + log P(ωi)
The choice of function impacts computation costs.
Discriminant functions partition the feature space into decision regions, separated by decision boundaries.
Density Estimation
Used to estimate the underlying PDF p(x|ωi).
Parametric methods:
  Assume a specific functional form for the PDF
  Optimize the PDF parameters to fit the data
Non-parametric methods:
  Determine the form of the PDF from the data
  The parameter set grows with the amount of data
Semi-parametric methods:
  Use a general class of functional forms for the PDF
  The parameter set can be varied independently of the data
  Use unsupervised methods to estimate parameters
Parametric Classifiers
Gaussian distributions
Maximum likelihood (ML) parameter estimation
Multivariate Gaussians
Gaussian classifiers
Gaussian Distributions
Gaussian PDFs are reasonable when a feature vector can be viewed as a perturbation around a reference value.
Simple estimation procedures exist for the model parameters.
Classification is often reduced to simple distance metrics.
Gaussian distributions are also called Normal distributions.
Gaussian Distributions: One Dimension
A one-dimensional Gaussian PDF can be expressed as:
  p(x) = (1/√(2πσ²)) exp(−(x − μ)²/(2σ²)) ~ N(μ, σ²)
The PDF is centered around the mean μ; the spread of the PDF is determined by the variance σ².
Maximum Likelihood Parameter Estimation
Maximum likelihood parameter estimation determines an estimate θ̂ for parameter θ by maximizing the likelihood L(θ) of observing the data X = {x1,...,xn}.
Assuming independent, identically distributed data:
  L(θ) = p(X|θ) = Πi p(xi|θ)
ML solutions can often be obtained via the derivative:
  ∂L(θ)/∂θ = 0
Maximum Likelihood Parameter Estimation (cont'd)
For Gaussian distributions, log L(θ) is easier to solve:
  log L(θ) = Σi log p(xi|θ)
Gaussian ML Estimation: One Dimension
The maximum likelihood estimate of μ is given by:
  μ̂ = (1/n) Σi xi
Gaussian ML Estimation: One Dimension (cont'd)
The maximum likelihood estimate of the variance σ² is given by:
  σ̂² = (1/n) Σi (xi − μ̂)²
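A minimal sketch that verifies the two one-dimensional ML estimates on synthetic data; the true parameters are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
true_mu, true_sigma = 3.0, 2.0
x = rng.normal(true_mu, true_sigma, size=10_000)    # i.i.d. samples

mu_hat = x.mean()                                   # (1/n) * sum(x_i)
var_hat = np.mean((x - mu_hat) ** 2)                # (1/n) * sum((x_i - mu_hat)^2), the ML (biased) estimate

print(f"mu_hat  = {mu_hat:.3f}  (true {true_mu})")
print(f"var_hat = {var_hat:.3f}  (true {true_sigma**2})")
```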
Gaussian ML Estimation: One Dimension (example)
ML Estimation: Alternative Distributions
ML Estimation: Alternative Distributions (cont'd)
Gaussian Distributions: Multiple Dimensions (Multivariate)
A multi-dimensional Gaussian PDF can be expressed as:
  p(x) = (1/((2π)^{d/2} |Σ|^{1/2})) exp(−½ (x − μ)ᵗ Σ⁻¹ (x − μ))
where:
  d is the number of dimensions
  x = {x1,…,xd} is the input vector
  μ = E(x) = {μ1,...,μd} is the mean vector
  Σ = E((x − μ)(x − μ)ᵗ) is the covariance matrix, with elements σij, inverse Σ⁻¹, and determinant |Σ|
  σij = σji = E((xi − μi)(xj − μj)) = E(xi xj) − μi μj
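A hedged sketch that evaluates the multivariate Gaussian log-PDF directly from this formula and checks it against scipy.stats.multivariate_normal; the mean and covariance values are arbitrary.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_logpdf(x, mu, sigma):
    """log N(x; mu, Sigma) = -0.5*(x-mu)^T Sigma^-1 (x-mu) - 0.5*log|Sigma| - (d/2)*log(2*pi)."""
    d = len(mu)
    diff = x - mu
    _, logdet = np.linalg.slogdet(sigma)
    maha = diff @ np.linalg.solve(sigma, diff)     # (x-mu)^T Sigma^-1 (x-mu)
    return -0.5 * (maha + logdet + d * np.log(2 * np.pi))

mu = np.array([1.0, -1.0])
sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])                     # symmetric, positive definite
x = np.array([0.5, 0.0])

print(gaussian_logpdf(x, mu, sigma))
print(multivariate_normal(mean=mu, cov=sigma).logpdf(x))   # should agree
```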
Gaussian Distributions: Multi-Dimensional Properties
If the ith and jth dimensions are statistically or linearly independent, then E(xixj) = E(xi)E(xj) and σij = 0.
If all dimensions are statistically or linearly independent, then σij = 0 ∀ i ≠ j, and Σ has non-zero elements only on the diagonal.
If the underlying density is Gaussian and Σ is a diagonal matrix, then the dimensions are statistically independent and
  p(x) = Πi p(xi),  with  p(xi) ~ N(μi, σii)
Diagonal Covariance Matrix: Σ = σ²I
Diagonal Covariance Matrix: σij = 0 ∀ i ≠ j
General Covariance Matrix: σij ≠ 0
Multivariate ML Estimation
The ML estimates for parameters θ = {θ1,...,θl} are determined by maximizing the joint likelihood L(θ) of a set of i.i.d. data X = {x1,..., xn}:
  L(θ) = p(X|θ) = Πi p(xi|θ)
To find θ̂ we solve ∇θ L(θ) = 0, or equivalently ∇θ log L(θ) = 0.
The ML estimates of μ and Σ are:
  μ̂ = (1/n) Σi xi,   Σ̂ = (1/n) Σi (xi − μ̂)(xi − μ̂)ᵗ
Multivariate Gaussian Classifier
Requires a mean vector μi and a covariance matrix Σi for each of the M classes {ω1, ··· , ωM}.
The minimum-error discriminant functions are of the form:
  gi(x) = log p(x|ωi) + log P(ωi) = −½ (x − μi)ᵗ Σi⁻¹ (x − μi) − ½ log|Σi| + log P(ωi) + const
Classification can be reduced to simple distance metrics in many situations.
Gaussian Classifier: Σi = σ²I
Each class has the same covariance structure: statistically independent dimensions with variance σ².
The equivalent discriminant functions are:
  gi(x) = −‖x − μi‖² / (2σ²) + log P(ωi)
If each class is equally likely, this is a minimum-distance classifier, a form of template matching.
The discriminant functions can be replaced by the following linear expression (a small numerical sketch follows below):
  gi(x) = wiᵗ x + ωi0,  where  wi = μi/σ²  and  ωi0 = −μiᵗμi/(2σ²) + log P(ωi)
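A hedged sketch of the Σi = σ²I case with equal priors: the decision reduces to choosing the class whose mean is nearest in Euclidean distance, and the equivalent linear discriminant gives the same answer. The class means and test points are arbitrary illustrative values.

```python
import numpy as np

# Illustrative class means (not from the slides); equal priors, shared variance sigma^2.
means = np.array([[0.0, 0.0],
                  [3.0, 1.0],
                  [1.0, 4.0]])
sigma2 = 1.0

def classify_min_distance(x):
    # Equal priors: choose the class whose mean is closest in Euclidean distance.
    return int(np.argmin(np.sum((means - x) ** 2, axis=1)))

def classify_linear(x):
    # Equivalent linear discriminant: g_i(x) = (mu_i/sigma^2)^T x - mu_i^T mu_i / (2*sigma^2).
    w = means / sigma2
    w0 = -np.sum(means ** 2, axis=1) / (2 * sigma2)
    return int(np.argmax(w @ x + w0))

for x in (np.array([0.5, 0.5]), np.array([2.5, 2.5]), np.array([1.0, 3.0])):
    assert classify_min_distance(x) == classify_linear(x)
    print(x, "-> class", classify_min_distance(x))
```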
Gaussian Classifier: Σi = σ²I (cont'd)
For distributions with a common covariance structure, the decision boundaries are hyperplanes.
Gaussian Classifier: Σi = Σ
Each class has the same covariance structure Σ.
The equivalent discriminant functions are:
  gi(x) = −½ (x − μi)ᵗ Σ⁻¹ (x − μi) + log P(ωi)
If each class is equally likely, the minimum-error decision rule is the squared Mahalanobis distance.
The discriminant functions remain linear expressions:
  gi(x) = wiᵗ x + ωi0,  where  wi = Σ⁻¹μi  and  ωi0 = −½ μiᵗ Σ⁻¹ μi + log P(ωi)
Gaussian Classifier: Σi Arbitrary
Each class has a different covariance structure Σi.
The equivalent discriminant functions are:
  gi(x) = −½ (x − μi)ᵗ Σi⁻¹ (x − μi) − ½ log|Σi| + log P(ωi)
The discriminant functions are inherently quadratic:
  gi(x) = xᵗ Wi x + wiᵗ x + ωi0,  where  Wi = −½ Σi⁻¹,  wi = Σi⁻¹ μi,  and  ωi0 = −½ μiᵗ Σi⁻¹ μi − ½ log|Σi| + log P(ωi)
Gaussian Classifier: Σi Arbitrary (cont'd)
For distributions with arbitrary covariance structures, the decision boundaries are hyperquadrics (quadratic surfaces such as hyper-ellipsoids, hyper-paraboloids, or hyperplanes).
3-Class Classification (Atal & Rabiner, 1976)
Distinguish between silence, unvoiced, and voiced sounds.
Use 5 features:
  Zero-crossing count
  Log energy
  Normalized first autocorrelation coefficient
  First predictor coefficient
  Normalized prediction error
Multivariate Gaussian classifier, ML estimation.
Decision by squared Mahalanobis distance.
Trained on four speakers (2 sentences/speaker), tested on 2 speakers (1 sentence/speaker).
Maximum A Posteriori Parameter Estimation
Bayesian estimation approaches assume the form of the PDF p(x|θ) is known, but the value of θ is not.
Knowledge of θ is contained in:
  An initial a priori PDF p(θ)
  A set of i.i.d. data X = {x1,...,xn}
The desired PDF for x is of the form:
  p(x|X) = ∫ p(x|θ) p(θ|X) dθ
The value θ̂ that maximizes p(θ|X) is called the maximum a posteriori (MAP) estimate of θ.
Gaussian MAP Estimation: One Dimension
For a Gaussian distribution with unknown mean μ, the MAP estimates of μ and x are obtained from p(μ|X) and p(x|X).
As n increases, p(μ|X) becomes sharply peaked around the ML estimate μ̂, and p(x|X) converges to the ML estimate ~ N(μ̂, σ²).
References
Huang, Acero, and Hon, Spoken Language Processing, Prentice-Hall, 2001.
Duda, Hart and Stork, Pattern Classification, John Wiley & Sons, 2001.
Atal and Rabiner, "A Pattern Recognition Approach to Voiced-Unvoiced-Silence Classification with Applications to Speech Recognition," IEEE Trans. ASSP, 24(3), 1976.
Speaker Recognition Algorithm
Minimum-distance classifier.
Vector Quantization
Class-dependent distance.
Gaussian Mixture Model (GMM)
Speech production is not deterministic: phones are never produced by a speaker with exactly the same vocal tract shape and glottal flow, due to variations in:
  Context
  Coarticulation
  Anatomy
  Fluid dynamics
One of the best ways to represent this variability is through multi-dimensional Gaussian PDFs. In general, a mixture of Gaussians is used to represent a class PDF.
Mixture Densities
The PDF is composed of a mixture of m component densities {ω1,…,ωm}:
  p(x) = Σ_{j=1..m} p(x|ωj) P(ωj)
Component PDF parameters and mixture weights P(ωj) are typically unknown, making parameter estimation a form of unsupervised learning.
Gaussian mixtures assume Normal components:
  p(x|ωj) ~ N(μj, Σj)
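A hedged sketch that evaluates such a mixture density and the component posteriors P(ωj|x) (the "responsibilities" used later in training); the weights, means, and standard deviations are arbitrary illustrative values.

```python
import numpy as np
from scipy.stats import norm

# Illustrative 1-D Gaussian mixture (values not from the slides).
weights = np.array([0.6, 0.4])        # P(w_j); must sum to 1
means = np.array([-1.5, 1.5])
stds = np.array([1.0, 0.8])

def mixture_pdf(x):
    """p(x) = sum_j P(w_j) * N(x; mu_j, sigma_j^2)."""
    return np.sum(weights * norm.pdf(x, loc=means, scale=stds))

def component_posteriors(x):
    """P(w_j | x), the per-component responsibilities."""
    joint = weights * norm.pdf(x, loc=means, scale=stds)
    return joint / joint.sum()

print(mixture_pdf(0.0), component_posteriors(0.0))
```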
Gaussian Mixture Example: One Dimension
p(x) = 0.6 p1(x) + 0.4 p2(x)
  p1(x) ~ N(−, 2)
  p2(x) ~ N(1.5, 2)
Gaussian Example
First 9 MFCCs from [s]: Gaussian PDF.
Independent Mixtures
[s]: 2 Gaussian mixture components per dimension.
Mixture Components
[s]: 2 Gaussian mixture components per dimension.
ML Parameter Estimation: 1D Gaussian Mixture Means
Gaussian Mixtures: ML Parameter Estimation
The maximum likelihood solutions are of the form:
  μ̂k = Σi P(ωk|xi) xi / Σi P(ωk|xi)
  Σ̂k = Σi P(ωk|xi)(xi − μ̂k)(xi − μ̂k)ᵗ / Σi P(ωk|xi)
  P̂(ωk) = (1/n) Σi P(ωk|xi)
Gaussian Mixtures: ML Parameter Estimation (cont'd)
The ML solutions are typically solved iteratively:
  Select a set of initial estimates for P̂(ωk), μ̂k, Σ̂k
  Use the set of n samples to re-estimate the mixture parameters until some convergence criterion is met
Clustering procedures are often used to provide the initial parameter estimates (similar to the K-means clustering procedure).
Example: 4 Samples, 2 Densities
Data: X = {x1, x2, x3, x4} = {2, 1, −1, −2}
Initialization: p(x|ω1) ~ N(1, 1), p(x|ω2) ~ N(−1, 1), P(ωi) = 0.5
Estimate the posteriors:
  P(ω1|x): 0.98, 0.88, 0.12, 0.02 for x1, x2, x3, x4
  P(ω2|x): 0.02, 0.12, 0.88, 0.98
Recompute the mixture parameters (shown only for ω1), with
  p(X) ∝ (e^{−0.5} + e^{−4.5})(e^{0} + e^{−2})(e^{0} + e^{−2})(e^{−0.5} + e^{−4.5}) · 0.5⁴
Example: 4 Samples, 2 Densities (cont'd)
Repeat steps 3 and 4 until convergence. A code sketch of this iteration follows below.
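A hedged sketch that runs the re-estimation loop on the 4-sample example above (means initialized at +1 and −1, unit variances held fixed for simplicity, equal mixture weights); its first E-step reproduces the posteriors 0.98, 0.88, 0.12, 0.02 shown on the previous slide.

```python
import numpy as np

x = np.array([2.0, 1.0, -1.0, -2.0])        # the 4 samples from the slide
mu = np.array([1.0, -1.0])                  # initial component means
var = np.array([1.0, 1.0])                  # variances (held fixed here for simplicity)
w = np.array([0.5, 0.5])                    # mixture weights P(w_k)

for it in range(10):
    # E-step: responsibilities P(w_k | x_i).
    lik = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    resp = lik / lik.sum(axis=1, keepdims=True)
    if it == 0:
        print("first E-step, P(w1|x):", resp[:, 0].round(2))    # ~ [0.98 0.88 0.12 0.02]
    # M-step: re-estimate the means and weights from the responsibilities.
    mu = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)
    w = resp.mean(axis=0)

print("converged means:", mu.round(3), "weights:", w.round(3))
```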
[s] Duration: 2 Densities
Gaussian Mixture Example: Two Dimensions
Two-Dimensional Mixtures
Two-Dimensional Components
Mixture of Gaussians: Implementation Variations
Diagonal Gaussians are often used instead of full-covariance Gaussians:
  They reduce the number of parameters
  They can potentially model the underlying PDF just as well if enough components are used
Mixture parameters are often constrained to be shared in order to reduce the number of parameters that need to be estimated:
  Richter Gaussians share the same mean in order to better model the PDF tails
  Tied mixtures share the same Gaussian parameters across all classes; only the mixture weights P̂(ωi) are class specific (also known as semi-continuous)
Richter Gaussian Mixtures
[s] log duration: 2 Richter Gaussians.
Expectation-Maximization (EM)
Used for determining the parameters, θ, for incomplete data, X = {xi} (i.e., unsupervised learning problems).
Introduces a hidden variable, Z = {zj}, to make the data complete, so that θ can be solved for using conventional ML techniques.
In reality, zj can only be estimated by P(zj|xi, θ), so we can only compute the expectation of log L(θ).
EM solutions are computed iteratively until convergence:
  Compute the expectation of log L(θ)
  Compute the values of θ which maximize this expectation
EM Parameter Estimation: 1D Gaussian Mixture Means
Let zi be the component identity, ωj, to which xi belongs.
Converting to mixture-component notation and differentiating with respect to μk gives:
  μ̂k = Σi P(ωk|xi, θ) xi / Σi P(ωk|xi, θ)
EM Properties
Each iteration of EM will increase the likelihood of X.
This can be shown using Bayes rule and the Kullback-Leibler distance metric.
EM Properties (cont'd)
Since θ′ was determined to maximize the expectation E(log L(θ)), combining these two properties gives:
  p(X|θ′) ≥ p(X|θ)
Dimensionality Reduction
Given a fixed training set, PDF parameter estimation becomes less robust as dimensionality increases.
Increasing dimensions can make it more difficult to obtain insights into any underlying structure.
Analytical techniques exist which can transform a sample space to a different set of dimensions:
  If the original dimensions are correlated, the same information may require fewer dimensions
  The transformed space will often have a more Normal distribution than the original space
  If the new dimensions are orthogonal, it can be easier to model the transformed space
Principal Components Analysis
Linearly transforms a d-dimensional vector, x, to a d′-dimensional vector, y, via a set of orthonormal vectors, W:
  y = Wᵗx,  W = {w1,…,wd′},  WᵗW = I
If d′ < d, x can be only partially reconstructed from y:
  x̂ = Wy
Principal Components Analysis (cont'd)
The principal components, W, minimize the distortion, D, between x and x̂ on the training data X = {x1,…,xn}:
  D = Σi ‖xi − x̂i‖²
Also known as the Karhunen-Loève (K-L) expansion (the wi's are sinusoids for some stochastic processes).
PCA Computation
W corresponds to the first d′ eigenvectors of the covariance matrix Σ:
  P = {e1,…,ed},  Σ = PΛPᵗ,  wi = ei
The full covariance structure of the original space, Σ, is transformed to a diagonal covariance structure Σ′.
The eigenvalues {λ1,…,λd′} represent the variances in Σ′.
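A hedged sketch of this eigendecomposition view of PCA on synthetic correlated data: the eigenvectors of the sample covariance give W, the eigenvalues are the variances of the transformed space, and keeping d′ < d components gives the partial reconstruction x̂ = Wy. The covariance used to generate the data is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic correlated data (illustrative): d = 3, strong correlation between dims 0 and 1.
cov_true = np.array([[1.0, 0.8, 0.1],
                     [0.8, 1.0, 0.1],
                     [0.1, 0.1, 0.3]])
X = rng.multivariate_normal(mean=np.zeros(3), cov=cov_true, size=2000)

Xc = X - X.mean(axis=0)                        # center the data
Sigma = np.cov(Xc, rowvar=False)               # sample covariance
eigvals, eigvecs = np.linalg.eigh(Sigma)       # eigh: ascending order for symmetric matrices
order = np.argsort(eigvals)[::-1]              # sort eigenpairs by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

d_prime = 2
W = eigvecs[:, :d_prime]                       # first d' principal directions
Y = Xc @ W                                     # y = W^t x  (projected coordinates)
X_hat = Y @ W.T                                # x_hat = W y (partial reconstruction)

explained = eigvals[:d_prime].sum() / eigvals.sum()
print("variance explained by first", d_prime, "components:", round(explained, 3))
print("mean reconstruction error:", round(np.mean(np.sum((Xc - X_hat) ** 2, axis=1)), 4))
```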
PCA Computation (cont'd)
The axes in the d′-dimensional space contain the maximum amount of variance.
PCA Example
Original feature vector: mean-rate response (d = 40).
Data obtained from 100 speakers from the TIMIT corpus.
The first 10 components explain 98% of the total variance.
PCA Example (cont'd)
PCA for Boundary Classification
Eight non-uniform averages computed from 14 MFCCs.
The first 50 dimensions are used for classification.
PCA Issues
PCA can be performed using:
  The covariance matrix Σ
  The correlation coefficient matrix P
P is usually preferred when the input dimensions have significantly different ranges.
PCA can be used to normalize or whiten the original d-dimensional space to simplify subsequent processing, so that the covariance becomes the identity.
The whitening operation can be done in one step: z = Vᵗx.
Significance Testing
To properly compare results from different classifier algorithms, A1 and A2, it is necessary to perform significance tests:
  Large differences can be insignificant for small test sets
  Small differences can be significant for large test sets
General significance tests evaluate the hypothesis that the probability of being correct, pi, is the same for both algorithms.
The most powerful comparisons can be made using common train and test corpora and a common evaluation criterion:
  Results then reflect differences in the algorithms rather than accidental differences in the test sets
  Significance tests can be more precise when identical data are used, since they can focus on tokens misclassified by only one algorithm rather than on all tokens
McNemar's Significance Test
When algorithms A1 and A2 are tested on identical data, we can collapse the results into a 2x2 matrix of counts:
                 A2 correct    A2 incorrect
  A1 correct        n00            n01
  A1 incorrect      n10            n11
To compare the algorithms, we test the null hypothesis H0 that p1 = p2, i.e., that n01 and n10 differ only by chance.
McNemar's Significance Test (cont'd)
Given H0, the probability of observing k tokens asymmetrically classified out of n = n01 + n10 has a Binomial PMF:
  P(k) = C(n, k) (1/2)ⁿ
McNemar's test measures the probability, P, of all cases that meet or exceed the observed asymmetric distribution, and tests whether P < α (the chosen significance level).
McNemar's Significance Test (cont'd)
The probability, P, is computed by summing up the PMF tails (a short sketch of this computation follows below).
For large n, a Normal approximation is often used instead.
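A hedged sketch of the exact two-sided test: under H0 the asymmetric counts follow Binomial(n, 1/2), and P sums both tails. The slides do not give the n01/n10 split for the 1400-token example on the next slide, so the counts below are arbitrary illustrative values.

```python
from scipy.stats import binom

def mcnemar_exact(n01, n10):
    """Two-sided exact McNemar test: P-value for the observed asymmetric counts under H0: p1 = p2."""
    n = n01 + n10
    k = min(n01, n10)
    # Sum both Binomial(n, 0.5) tails: P(K <= k) + P(K >= n - k), capped at 1.
    p = binom.cdf(k, n, 0.5) + binom.sf(n - k - 1, n, 0.5)
    return min(p, 1.0)

# Illustrative counts (not from the slides).
print(mcnemar_exact(n01=30, n10=20))   # ~0.20: not significant at alpha = 0.05
print(mcnemar_exact(n01=45, n10=15))   # ~2e-4: significant at alpha = 0.05
```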
Significance Test Example (Gillick and Cox, 1989)
Common test set of 1400 tokens.
Algorithms A1 and A2 make 72 and 62 errors, respectively.
Are the differences significant?
References
Huang, Acero, and Hon, Spoken Language Processing, Prentice-Hall, 2001.
Duda, Hart and Stork, Pattern Classification, John Wiley & Sons, 2001.
Jelinek, Statistical Methods for Speech Recognition, MIT Press, 1997.
Bishop, Neural Networks for Pattern Recognition, Clarendon Press, 1995.
Gillick and Cox, "Some Statistical Issues in the Comparison of Speech Recognition Algorithms," Proc. ICASSP, 1989.