
Slide 1: EE3J2 Data Mining, Lecture 10: Statistical Modelling (Martin Russell)

Slide 2: Objectives
- To review basic statistical modelling
- To review the notion of a probability distribution
- To review the notion of a probability density function
- To introduce mixture densities
- To introduce the multivariate Gaussian density

Slide 3: Discrete variables
- Suppose that Y is a random variable which can take any value in a discrete set X = {x_1, x_2, ..., x_M}
- Suppose that y_1, y_2, ..., y_N are samples of the random variable Y
- If c_m is the number of times that y_n = x_m, then an estimate of the probability that Y takes the value x_m is given by:

  P(Y = x_m) ~ c_m / N

Slide 4: Discrete Probability Mass Function

  Symbol       1    2    3   4   5   6   7    8    9   Total
  Occurrences  120  231  90  87  63  57  156  203  91  1098
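The counting estimate of slides 3-4 can be sketched in a few lines of Python, using the counts from the slide 4 table (this is a minimal illustration, not course code):

```python
# Estimating a discrete probability mass function by counting:
# P(Y = x_m) ~ c_m / N, with the counts from the slide 4 table.
counts = {1: 120, 2: 231, 3: 90, 4: 87, 5: 63, 6: 57, 7: 156, 8: 203, 9: 91}
N = sum(counts.values())                 # total number of observations
pmf = {symbol: c / N for symbol, c in counts.items()}

print(N)                                 # 1098, matching the table total
print(round(pmf[2], 4))                  # 231/1098, about 0.2104
print(round(sum(pmf.values()), 6))       # the estimated probabilities sum to 1
```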

Slide 5: Continuous Random Variables
- In most practical applications the data are not restricted to a finite set of values: they can take any value in N-dimensional space
- Simply counting the number of occurrences of each value is no longer a viable way of estimating probabilities...
- ...but there are generalisations of this approach which are applicable to continuous variables; these are referred to as non-parametric methods

Slide 6: Continuous Random Variables
- An alternative is to use a parametric model
- In a parametric model, probabilities are defined by a small set of parameters
- The simplest example is a normal, or Gaussian, model
- A Gaussian probability density function (PDF) is defined by two parameters: its mean μ and variance σ²

Slide 7: Gaussian PDF
- 'Standard' 1-dimensional Gaussian PDF:
  - mean μ = 0
  - variance σ² = 1

Slide 8: Gaussian PDF
[Figure: Gaussian PDF with the area under the curve between a and b shaded, showing P(a <= x <= b)]
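The shaded area P(a <= x <= b) on slide 8 is a difference of two Gaussian CDF values, which can be computed with the standard library's math.erf. A minimal sketch (the function names are mine, not from the slides):

```python
import math

def gaussian_cdf(x, mu=0.0, var=1.0):
    """CDF of a Gaussian with mean mu and variance var, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0 * var)))

def prob_between(a, b, mu=0.0, var=1.0):
    """P(a <= x <= b): the shaded area on slide 8."""
    return gaussian_cdf(b, mu, var) - gaussian_cdf(a, mu, var)

# About 68% of a standard Gaussian's mass lies within one standard deviation.
print(round(prob_between(-1.0, 1.0), 4))   # 0.6827
```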

Slide 9: Gaussian PDF
- For a 1-dimensional Gaussian PDF p with mean μ and variance σ²:

  p(x) = (1 / sqrt(2πσ²)) exp(-(x - μ)² / (2σ²))

  The factor 1/sqrt(2πσ²) is a constant that ensures the area under the curve is 1; the exponential term defines the 'bell' shape.

Slide 10: More examples
[Figure: Gaussian PDFs with variances 0.1, 1.0, 5.0 and 10.0]

Slide 11: Fitting a Gaussian PDF to Data
- Suppose y = y_1, ..., y_n, ..., y_N is a set of N data values
- Given a Gaussian PDF p with mean μ and variance σ², define:

  p(y | μ, σ) = product over n = 1, ..., N of p(y_n | μ, σ)

- How do we choose μ and σ to maximise this probability?

Slide 12: Fitting a Gaussian PDF to Data
[Figure: two Gaussian curves over the same data, one labelled 'Poor fit' and one labelled 'Good fit']

Slide 13: Maximum Likelihood Estimation
- Define the best-fitting Gaussian to be the one such that p(y | μ, σ) is maximised.
- Terminology:
  - p(y | μ, σ), thought of as a function of y, is the probability (density) of y
  - p(y | μ, σ), thought of as a function of μ, σ, is the likelihood of μ, σ
- Maximising p(y | μ, σ) with respect to μ, σ is called Maximum Likelihood (ML) estimation of μ, σ

Slide 14: ML estimation of μ, σ
- Intuitively:
  - The maximum likelihood estimate of μ should be the average value of y_1, ..., y_N (the sample mean)
  - The maximum likelihood estimate of σ² should be the variance of y_1, ..., y_N (the sample variance)
- This turns out to be true: p(y | μ, σ) is maximised by setting:

  μ = (1/N) Σ_n y_n        σ² = (1/N) Σ_n (y_n - μ)²
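The ML estimates on slide 14 are just the sample mean and the divide-by-N sample variance; a minimal sketch:

```python
# ML estimation of mu and sigma^2 for a single Gaussian (slide 14):
# the sample mean and the (biased, divide-by-N) sample variance.
def ml_gaussian(samples):
    n = len(samples)
    mu = sum(samples) / n
    var = sum((y - mu) ** 2 for y in samples) / n
    return mu, var

y = [1.0, 2.0, 2.0, 3.0]
mu, var = ml_gaussian(y)
print(mu)    # 2.0
print(var)   # 0.5
```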

Slide 15: Multi-modal distributions
- In practice the distributions of many naturally occurring phenomena do not follow the simple bell-shaped Gaussian curve
- For example, if the data arises from several different sources, there may be several distinct peaks (e.g. the distribution of heights of adults)
- These peaks are the modes of the distribution, and the distribution is called multi-modal

Slide 16: Gaussian Mixture PDFs
- Gaussian mixture PDFs, or Gaussian Mixture Models (GMMs), are commonly used to model multi-modal or other non-Gaussian distributions.
- A GMM is just a weighted average of several Gaussian PDFs, called the component PDFs
- For example, if p_1 and p_2 are Gaussian PDFs, then p(y) = w_1 p_1(y) + w_2 p_2(y) defines a 2-component Gaussian mixture PDF

Slide 17: Gaussian Mixture - Example
- 2-component mixture model:
  - Component 1: μ = 0, σ² = 0.1
  - Component 2: μ = 2, σ² = 1
  - w_1 = w_2 = 0.5
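The mixture density of slide 16 can be evaluated directly; below is a minimal sketch using slide 17's parameters, treating the component parameters as mean and variance:

```python
import math

def gaussian_pdf(y, mu, var):
    """1-D Gaussian density with mean mu and variance var."""
    return math.exp(-(y - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def mixture_pdf(y, components):
    """Weighted sum of component densities: p(y) = sum over m of w_m p_m(y)."""
    return sum(w * gaussian_pdf(y, mu, var) for w, mu, var in components)

# The 2-component example from slide 17:
# component 1: mu=0, var=0.1; component 2: mu=2, var=1; equal weights.
gmm = [(0.5, 0.0, 0.1), (0.5, 2.0, 1.0)]
p0 = mixture_pdf(0.0, gmm)   # near the first (narrow) mode
p2 = mixture_pdf(2.0, gmm)   # near the second (broad) mode
print(p0 > p2)   # True: the narrow component produces the taller peak
```

The narrow component (small variance) concentrates its mass, so the mode at 0 is taller than the mode at 2 even though the weights are equal.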

Slide 18: Example 2
- 2-component mixture model:
  - Component 1: μ = 0, σ² = 0.1
  - Component 2: μ = 2, σ² = 1
  - w_1 = 0.2, w_2 = 0.8

Slide 19: Example 3
- 2-component mixture model:
  - Component 1: μ = 0, σ² = 0.1
  - Component 2: μ = 2, σ² = 1
  - w_1 = 0.2, w_2 = 0.8

Slide 20: Example 4
- 5-component Gaussian mixture PDF

Slide 21: Gaussian Mixture Model
- In general, an M-component Gaussian mixture PDF is defined by:

  p(y) = Σ_m w_m p_m(y),  m = 1, ..., M

  where each p_m is a Gaussian PDF and the weights satisfy Σ_m w_m = 1, w_m >= 0

Slide 22: Estimating the parameters of a Gaussian mixture model
- A Gaussian Mixture Model with M components has:
  - M means: μ_1, ..., μ_M
  - M variances: σ²_1, ..., σ²_M
  - M mixture weights: w_1, ..., w_M
- Given a set of data y = y_1, ..., y_N, how can we estimate these parameters?
- I.e. how do we find a maximum likelihood estimate of μ_1, ..., μ_M, σ_1, ..., σ_M, w_1, ..., w_M?

Slide 23: Parameter Estimation
- If we knew which component each sample y_n came from, then parameter estimation would be easy:
  - Set μ_m to be the average value of the samples which belong to the m-th component
  - Set σ²_m to be the variance of the samples which belong to the m-th component
  - Set w_m to be the proportion of samples which belong to the m-th component
- But we don't know which component each sample belongs to.
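To illustrate slide 23's point, suppose hypothetically that the component labels were known; the samples and labels below are invented for the example:

```python
# If every sample carried a component label, each component's parameters
# would be plain per-group statistics (slide 23's "easy" case).
samples = [0.1, -0.2, 0.0, 1.9, 2.1, 2.3, 1.7]   # invented data
labels  = [0,    0,   0,   1,   1,   1,   1]      # hypothetical assignments

def per_component_estimates(samples, labels, m):
    """Weight, mean and variance of the samples labelled with component m."""
    group = [y for y, lab in zip(samples, labels) if lab == m]
    mu = sum(group) / len(group)
    var = sum((y - mu) ** 2 for y in group) / len(group)
    w = len(group) / len(samples)
    return w, mu, var

w0, mu0, var0 = per_component_estimates(samples, labels, 0)
print(round(w0, 4), round(mu0, 4))   # weight 3/7, mean of the first group
```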

Slide 24: Solution - the E-M algorithm
- Guess initial values for the parameters
- REPEAT:
  - For each n and m, calculate the probability

    P(m | y_n) = w_m p_m(y_n) / p(y_n)

    This is a measure of how much y_n 'belongs to' the m-th component
  - Use these probabilities to re-estimate the means, variances and weights as weighted averages over the samples
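The loop on slide 24 can be sketched for a 1-D GMM. This is one possible minimal implementation, not the lecture's own code:

```python
import math

def gauss(y, mu, var):
    return math.exp(-(y - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_gmm_1d(data, weights, means, variances, iterations=50):
    """E-M for a 1-D Gaussian mixture (a sketch of slide 24's procedure).

    E-step: responsibility of component m for sample y_n is
        P(m | y_n) = w_m p_m(y_n) / sum over k of w_k p_k(y_n)
    M-step: re-estimate each w_m, mu_m, var_m as responsibility-weighted
    proportions, means and variances.
    """
    M, N = len(weights), len(data)
    for _ in range(iterations):
        # E-step: responsibilities r[n][m]
        r = []
        for y in data:
            joint = [weights[m] * gauss(y, means[m], variances[m]) for m in range(M)]
            total = sum(joint)
            r.append([j / total for j in joint])
        # M-step: weighted re-estimates
        for m in range(M):
            rm = sum(r[n][m] for n in range(N))
            means[m] = sum(r[n][m] * data[n] for n in range(N)) / rm
            variances[m] = sum(r[n][m] * (data[n] - means[m]) ** 2
                               for n in range(N)) / rm
            weights[m] = rm / N
    return weights, means, variances

# Two well-separated clusters; start from deliberately poor guesses.
data = [0.0, 0.1, -0.1, 0.05, 5.0, 5.1, 4.9, 5.05]
w, mu, var = em_gmm_1d(data, [0.5, 0.5], [1.0, 4.0], [1.0, 1.0])
print(round(mu[0], 2), round(mu[1], 2))   # the means move to the two cluster centres
```

With well-separated clusters the responsibilities become nearly hard assignments, so the estimates converge to the per-cluster sample statistics.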

Slide 25: The E-M algorithm
[Figure: likelihood p(y | θ) plotted against the parameter set θ; successive E-M estimates θ(0), ..., θ(i) climb towards a local optimum]

Slide 26: Multivariate Gaussian PDFs
- All PDFs so far have been 1-dimensional: they take scalar values
- But most real data will be represented as D-dimensional vectors
- The vector equivalent of a Gaussian PDF is called a multivariate Gaussian PDF

Slide 27: Multivariate Gaussian PDFs
[Figure: contours of equal probability for a multivariate Gaussian PDF, with the corresponding 1-dimensional Gaussian PDFs along each axis]

Slide 28: Multivariate Gaussian PDFs
[Figure: another multivariate Gaussian PDF, again with its 1-dimensional Gaussian PDFs along each axis]

Slide 29: Multivariate Gaussian PDF
- The parameters of a multivariate Gaussian PDF are:
  - The (vector) mean μ
  - The variances and covariances, collected into the covariance matrix Σ
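As a minimal sketch of the density described on slide 29, here is a 2-dimensional multivariate Gaussian PDF, using the closed-form inverse and determinant of a 2x2 covariance matrix (the function name is illustrative):

```python
import math

def mvn_pdf_2d(y, mu, cov):
    """Density of a 2-D Gaussian with mean vector mu and 2x2 covariance
    matrix cov: exp(-0.5 * Mahalanobis) / (2*pi * sqrt(det))."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dy = [y[0] - mu[0], y[1] - mu[1]]
    # Mahalanobis distance (y - mu)^T Sigma^{-1} (y - mu)
    maha = (dy[0] * (inv[0][0] * dy[0] + inv[0][1] * dy[1])
            + dy[1] * (inv[1][0] * dy[0] + inv[1][1] * dy[1]))
    return math.exp(-0.5 * maha) / (2.0 * math.pi * math.sqrt(det))

# Standard 2-D Gaussian: zero mean, identity covariance.
p = mvn_pdf_2d([0.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(round(p, 4))   # 1/(2*pi), about 0.1592
```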

Slide 30: Multivariate Gaussian PDFs
- Multivariate Gaussian PDFs are commonly used in pattern processing and data mining
- Vector data is often not unimodal, so we use mixtures of multivariate Gaussian PDFs
- The E-M algorithm works for multivariate Gaussian mixture PDFs

Slide 31: Summary
- Basic statistical modelling
- Probability distributions
- Probability density functions
- Gaussian PDFs
- Gaussian mixture PDFs and the E-M algorithm
- Multivariate Gaussian PDFs

Slide 32: Summary

