
Slide 1: EE3J2 Data Mining, Lecture 10: Statistical Modelling (Martin Russell)

Slide 2: Objectives
- To review basic statistical modelling
- To review the notion of a probability distribution
- To review the notion of a probability density function
- To introduce mixture densities
- To introduce the multivariate Gaussian density

Slide 3: Discrete variables
- Suppose that Y is a random variable which can take any value in a discrete set X = {x_1, x_2, ..., x_M}
- Suppose that y_1, y_2, ..., y_N are samples of the random variable Y
- If c_m is the number of times that y_n = x_m, then an estimate of the probability that Y takes the value x_m is given by:

  P(Y = x_m) ~ c_m / N

Slide 4: Discrete Probability Mass Function

  Symbol       1    2    3   4   5   6   7    8    9   Total
  Occurrences  120  231  90  87  63  57  156  203  91  1098
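The counting estimate of slides 3-4 can be sketched in a few lines of Python, using the counts from the slide 4 table (this is a minimal illustration, not course code):

```python
# Estimating a discrete probability mass function by counting:
# P(Y = x_m) ~ c_m / N, with the counts from the slide 4 table.
counts = {1: 120, 2: 231, 3: 90, 4: 87, 5: 63, 6: 57, 7: 156, 8: 203, 9: 91}
N = sum(counts.values())                 # total number of observations
pmf = {symbol: c / N for symbol, c in counts.items()}

print(N)                                 # 1098, matching the table total
print(round(pmf[2], 4))                  # 231/1098, about 0.2104
print(round(sum(pmf.values()), 6))       # the estimated probabilities sum to 1
```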

Slide 5: Continuous Random Variables
- In most practical applications the data are not restricted to a finite set of values: they can take any value in N-dimensional space
- Simply counting the number of occurrences of each value is no longer a viable way of estimating probabilities...
- ...but there are generalisations of this approach which are applicable to continuous variables; these are referred to as non-parametric methods

Slide 6: Continuous Random Variables
- An alternative is to use a parametric model
- In a parametric model, probabilities are defined by a small set of parameters
- The simplest example is a normal, or Gaussian, model
- A Gaussian probability density function (PDF) is defined by two parameters: its mean μ and variance σ²

Slide 7: Gaussian PDF
- 'Standard' 1-dimensional Gaussian PDF:
  - mean μ = 0
  - variance σ² = 1

Slide 8: Gaussian PDF
[Figure: Gaussian PDF with the area under the curve between a and b shaded, showing P(a <= x <= b)]
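The shaded area P(a <= x <= b) on slide 8 is a difference of two Gaussian CDF values, which can be computed with the standard library's math.erf. A minimal sketch (the function names are mine, not from the slides):

```python
import math

def gaussian_cdf(x, mu=0.0, var=1.0):
    """CDF of a Gaussian with mean mu and variance var, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0 * var)))

def prob_between(a, b, mu=0.0, var=1.0):
    """P(a <= x <= b): the shaded area on slide 8."""
    return gaussian_cdf(b, mu, var) - gaussian_cdf(a, mu, var)

# About 68% of a standard Gaussian's mass lies within one standard deviation.
print(round(prob_between(-1.0, 1.0), 4))   # 0.6827
```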

Slide 9: Gaussian PDF
- For a 1-dimensional Gaussian PDF p with mean μ and variance σ²:

  p(x) = (1 / sqrt(2πσ²)) exp(-(x - μ)² / (2σ²))

  The factor 1/sqrt(2πσ²) is a constant that ensures the area under the curve is 1; the exponential term defines the 'bell' shape.

Slide 10: More examples
[Figure: Gaussian PDFs with variances 0.1, 1.0, 5.0 and 10.0]

Slide 11: Fitting a Gaussian PDF to Data
- Suppose y = y_1, ..., y_n, ..., y_N is a set of N data values
- Given a Gaussian PDF p with mean μ and variance σ², define:

  p(y | μ, σ) = product over n = 1, ..., N of p(y_n | μ, σ)

- How do we choose μ and σ to maximise this probability?

Slide 12: Fitting a Gaussian PDF to Data
[Figure: two Gaussian curves over the same data, one labelled 'Poor fit' and one labelled 'Good fit']

Slide 13: Maximum Likelihood Estimation
- Define the best-fitting Gaussian to be the one such that p(y | μ, σ) is maximised.
- Terminology:
  - p(y | μ, σ), thought of as a function of y, is the probability (density) of y
  - p(y | μ, σ), thought of as a function of μ, σ, is the likelihood of μ, σ
- Maximising p(y | μ, σ) with respect to μ, σ is called Maximum Likelihood (ML) estimation of μ, σ

Slide 14: ML estimation of μ, σ
- Intuitively:
  - The maximum likelihood estimate of μ should be the average value of y_1, ..., y_N (the sample mean)
  - The maximum likelihood estimate of σ² should be the variance of y_1, ..., y_N (the sample variance)
- This turns out to be true: p(y | μ, σ) is maximised by setting:

  μ = (1/N) Σ_n y_n        σ² = (1/N) Σ_n (y_n - μ)²
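The ML estimates on slide 14 are just the sample mean and the divide-by-N sample variance; a minimal sketch:

```python
# ML estimation of mu and sigma^2 for a single Gaussian (slide 14):
# the sample mean and the (biased, divide-by-N) sample variance.
def ml_gaussian(samples):
    n = len(samples)
    mu = sum(samples) / n
    var = sum((y - mu) ** 2 for y in samples) / n
    return mu, var

y = [1.0, 2.0, 2.0, 3.0]
mu, var = ml_gaussian(y)
print(mu)    # 2.0
print(var)   # 0.5
```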

Slide 15: Multi-modal distributions
- In practice the distributions of many naturally occurring phenomena do not follow the simple bell-shaped Gaussian curve
- For example, if the data arises from several different sources, there may be several distinct peaks (e.g. the distribution of heights of adults)
- These peaks are the modes of the distribution, and the distribution is called multi-modal

Slide 16: Gaussian Mixture PDFs
- Gaussian mixture PDFs, or Gaussian Mixture Models (GMMs), are commonly used to model multi-modal or other non-Gaussian distributions.
- A GMM is just a weighted average of several Gaussian PDFs, called the component PDFs
- For example, if p_1 and p_2 are Gaussian PDFs, then p(y) = w_1 p_1(y) + w_2 p_2(y) defines a 2-component Gaussian mixture PDF

Slide 17: Gaussian Mixture - Example
- 2-component mixture model:
  - Component 1: μ = 0, σ² = 0.1
  - Component 2: μ = 2, σ² = 1
  - w_1 = w_2 = 0.5
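The mixture density of slide 16 can be evaluated directly; below is a minimal sketch using slide 17's parameters, treating the component parameters as mean and variance:

```python
import math

def gaussian_pdf(y, mu, var):
    """1-D Gaussian density with mean mu and variance var."""
    return math.exp(-(y - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def mixture_pdf(y, components):
    """Weighted sum of component densities: p(y) = sum over m of w_m p_m(y)."""
    return sum(w * gaussian_pdf(y, mu, var) for w, mu, var in components)

# The 2-component example from slide 17:
# component 1: mu=0, var=0.1; component 2: mu=2, var=1; equal weights.
gmm = [(0.5, 0.0, 0.1), (0.5, 2.0, 1.0)]
p0 = mixture_pdf(0.0, gmm)   # near the first (narrow) mode
p2 = mixture_pdf(2.0, gmm)   # near the second (broad) mode
print(p0 > p2)   # True: the narrow component produces the taller peak
```

The narrow component (small variance) concentrates its mass, so the mode at 0 is taller than the mode at 2 even though the weights are equal.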

Slide 18: Example 2
- 2-component mixture model:
  - Component 1: μ = 0, σ² = 0.1
  - Component 2: μ = 2, σ² = 1
  - w_1 = 0.2, w_2 = 0.8

Slide 19: Example 3
- 2-component mixture model:
  - Component 1: μ = 0, σ² = 0.1
  - Component 2: μ = 2, σ² = 1
  - w_1 = 0.2, w_2 = 0.8

Slide 20: Example 4
- 5-component Gaussian mixture PDF

Slide 21: Gaussian Mixture Model
- In general, an M-component Gaussian mixture PDF is defined by:

  p(y) = Σ_m w_m p_m(y),  m = 1, ..., M

  where each p_m is a Gaussian PDF and the weights satisfy Σ_m w_m = 1, w_m >= 0

Slide 22: Estimating the parameters of a Gaussian mixture model
- A Gaussian Mixture Model with M components has:
  - M means: μ_1, ..., μ_M
  - M variances: σ²_1, ..., σ²_M
  - M mixture weights: w_1, ..., w_M
- Given a set of data y = y_1, ..., y_N, how can we estimate these parameters?
- I.e. how do we find a maximum likelihood estimate of μ_1, ..., μ_M, σ_1, ..., σ_M, w_1, ..., w_M?

Slide 23: Parameter Estimation
- If we knew which component each sample y_n came from, then parameter estimation would be easy:
  - Set μ_m to be the average value of the samples which belong to the m-th component
  - Set σ²_m to be the variance of the samples which belong to the m-th component
  - Set w_m to be the proportion of samples which belong to the m-th component
- But we don't know which component each sample belongs to.
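To illustrate slide 23's point, suppose hypothetically that the component labels were known; the samples and labels below are invented for the example:

```python
# If every sample carried a component label, each component's parameters
# would be plain per-group statistics (slide 23's "easy" case).
samples = [0.1, -0.2, 0.0, 1.9, 2.1, 2.3, 1.7]   # invented data
labels  = [0,    0,   0,   1,   1,   1,   1]      # hypothetical assignments

def per_component_estimates(samples, labels, m):
    """Weight, mean and variance of the samples labelled with component m."""
    group = [y for y, lab in zip(samples, labels) if lab == m]
    mu = sum(group) / len(group)
    var = sum((y - mu) ** 2 for y in group) / len(group)
    w = len(group) / len(samples)
    return w, mu, var

w0, mu0, var0 = per_component_estimates(samples, labels, 0)
print(round(w0, 4), round(mu0, 4))   # weight 3/7, mean of the first group
```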

Slide 24: Solution - the E-M algorithm
- Guess initial values for the parameters
- REPEAT:
  - For each n and m, calculate the probability

    P(m | y_n) = w_m p_m(y_n) / p(y_n)

    This is a measure of how much y_n 'belongs to' the m-th component
  - Use these probabilities to re-estimate the means, variances and weights as weighted averages over the samples
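The loop on slide 24 can be sketched for a 1-D GMM. This is one possible minimal implementation, not the lecture's own code:

```python
import math

def gauss(y, mu, var):
    return math.exp(-(y - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_gmm_1d(data, weights, means, variances, iterations=50):
    """E-M for a 1-D Gaussian mixture (a sketch of slide 24's procedure).

    E-step: responsibility of component m for sample y_n is
        P(m | y_n) = w_m p_m(y_n) / sum over k of w_k p_k(y_n)
    M-step: re-estimate each w_m, mu_m, var_m as responsibility-weighted
    proportions, means and variances.
    """
    M, N = len(weights), len(data)
    for _ in range(iterations):
        # E-step: responsibilities r[n][m]
        r = []
        for y in data:
            joint = [weights[m] * gauss(y, means[m], variances[m]) for m in range(M)]
            total = sum(joint)
            r.append([j / total for j in joint])
        # M-step: weighted re-estimates
        for m in range(M):
            rm = sum(r[n][m] for n in range(N))
            means[m] = sum(r[n][m] * data[n] for n in range(N)) / rm
            variances[m] = sum(r[n][m] * (data[n] - means[m]) ** 2
                               for n in range(N)) / rm
            weights[m] = rm / N
    return weights, means, variances

# Two well-separated clusters; start from deliberately poor guesses.
data = [0.0, 0.1, -0.1, 0.05, 5.0, 5.1, 4.9, 5.05]
w, mu, var = em_gmm_1d(data, [0.5, 0.5], [1.0, 4.0], [1.0, 1.0])
print(round(mu[0], 2), round(mu[1], 2))   # the means move to the two cluster centres
```

With well-separated clusters the responsibilities become nearly hard assignments, so the estimates converge to the per-cluster sample statistics.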

Slide 25: The E-M algorithm
[Figure: likelihood p(y | θ) plotted against the parameter set θ; successive E-M estimates θ(0), ..., θ(i) climb towards a local optimum]

Slide 26: Multivariate Gaussian PDFs
- All PDFs so far have been 1-dimensional: they take scalar values
- But most real data will be represented as D-dimensional vectors
- The vector equivalent of a Gaussian PDF is called a multivariate Gaussian PDF

Slide 27: Multivariate Gaussian PDFs
[Figure: contours of equal probability for a multivariate Gaussian PDF, with the corresponding 1-dimensional Gaussian PDFs along each axis]

Slide 28: Multivariate Gaussian PDFs
[Figure: another multivariate Gaussian PDF, again with its 1-dimensional Gaussian PDFs along each axis]

Slide 29: Multivariate Gaussian PDF
- The parameters of a multivariate Gaussian PDF are:
  - The (vector) mean μ
  - The variances and covariances, collected into the covariance matrix Σ
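As a minimal sketch of the density described on slide 29, here is a 2-dimensional multivariate Gaussian PDF, using the closed-form inverse and determinant of a 2x2 covariance matrix (the function name is illustrative):

```python
import math

def mvn_pdf_2d(y, mu, cov):
    """Density of a 2-D Gaussian with mean vector mu and 2x2 covariance
    matrix cov: exp(-0.5 * Mahalanobis) / (2*pi * sqrt(det))."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dy = [y[0] - mu[0], y[1] - mu[1]]
    # Mahalanobis distance (y - mu)^T Sigma^{-1} (y - mu)
    maha = (dy[0] * (inv[0][0] * dy[0] + inv[0][1] * dy[1])
            + dy[1] * (inv[1][0] * dy[0] + inv[1][1] * dy[1]))
    return math.exp(-0.5 * maha) / (2.0 * math.pi * math.sqrt(det))

# Standard 2-D Gaussian: zero mean, identity covariance.
p = mvn_pdf_2d([0.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(round(p, 4))   # 1/(2*pi), about 0.1592
```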

Slide 30: Multivariate Gaussian PDFs
- Multivariate Gaussian PDFs are commonly used in pattern processing and data mining
- Vector data is often not unimodal, so we use mixtures of multivariate Gaussian PDFs
- The E-M algorithm works for multivariate Gaussian mixture PDFs

Slide 31: Summary
- Basic statistical modelling
- Probability distributions
- Probability density functions
- Gaussian PDFs
- Gaussian mixture PDFs and the E-M algorithm
- Multivariate Gaussian PDFs

Slide 32: Summary

