A gentle introduction to Gaussian distribution

Review: Random variable. In the coin flip experiment the outcome is encoded as X = 1 (heads) or X = 0 (tails); X is a random variable.

Review: Probability mass function (discrete). Example: the coin flip experiment, where P(x) is defined for x = 0 and x = 1, with P(x) >= 0. Any other constraints? Hint: what is the sum? (The probabilities must sum to 1.)

Review: Probability density function (continuous), f(x), with f(x) >= 0. Unlike the discrete case, the density f(x) does not itself represent a probability; it is the rate at which probability accumulates, and its value at a point is often called the "likelihood" of that point.

Review: Probability density function (continuous), f(x) >= 0. For a small interval, P(x_0 < x < x_0 + dx) ≈ f(x_0) dx, but P(x = x_0) = 0 for any single point, and the density integrates to 1: \int f(x)\,dx = 1.

The Gaussian Distribution
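For reference, the univariate Gaussian density with mean \mu and variance \sigma^2 is

\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right).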

A 2D Gaussian
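A minimal sketch (not from the slides) of how such a 2-D Gaussian surface can be evaluated with numpy and scipy; the mean and covariance values here are purely illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative parameters (assumed, not from the slides)
mean = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.8],
                [0.8, 2.0]])   # positive-definite covariance with correlation

# Evaluate the density on a grid
xs, ys = np.meshgrid(np.linspace(-4, 4, 200), np.linspace(-4, 4, 200))
grid = np.dstack([xs, ys])                        # shape (200, 200, 2)
density = multivariate_normal(mean, cov).pdf(grid)

print(density.shape)  # (200, 200); plot with e.g. matplotlib contour for the familiar ellipses
```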

Central Limit Theorem The distribution of the sum of N i.i.d. random variables becomes increasingly Gaussian as N grows. Example: N uniform [0,1] random variables.
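A small demonstration of this statement in code: sums of N Uniform[0,1] variables become increasingly bell-shaped as N grows. Only numpy is assumed; the sample sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

for N in (1, 2, 10):
    # 100,000 realizations of the sum of N independent Uniform[0,1] variables
    sums = rng.uniform(0.0, 1.0, size=(100_000, N)).sum(axis=1)
    # As N grows, the histogram of `sums` approaches a Gaussian with
    # mean N/2 and variance N/12 (each uniform has mean 1/2, variance 1/12).
    print(N, sums.mean(), sums.var())
```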

Central Limit Theorem (Coin flip). Flip a coin N times. Each outcome has an associated random variable X_i (= 1 if heads, otherwise 0). The number of heads N_H is a random variable: the sum of N i.i.d. random variables, N_H = X_1 + X_2 + ... + X_N.

Central Limit Theorem (Coin flip). Probability mass function of N_H with P(Head) = 0.5 (fair coin). [Figure: the PMF for N = 5, N = 10, and N = 40, looking increasingly Gaussian.]
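A brief sketch of the same coin-flip picture in code, comparing the exact Binomial(N, 0.5) PMF of N_H with its Gaussian approximation; scipy.stats is assumed, and the values of N mirror the slide.

```python
import numpy as np
from scipy.stats import binom, norm

p = 0.5  # fair coin
for N in (5, 10, 40):
    k = np.arange(N + 1)
    pmf = binom.pmf(k, N, p)                        # exact P(N_H = k)
    # Gaussian approximation from the CLT: mean Np, variance Np(1-p)
    approx = norm.pdf(k, loc=N * p, scale=np.sqrt(N * p * (1 - p)))
    print(N, np.abs(pmf - approx).max())            # the gap shrinks as N grows
```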

Geometry of the Multivariate Gaussian
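For completeness, the D-dimensional Gaussian density behind this slide's geometry, together with the squared Mahalanobis distance \Delta^2 that governs its elliptical contours:

\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{D/2}\,|\Sigma|^{1/2}} \exp\!\left( -\frac{1}{2}(x-\mu)^{\top}\Sigma^{-1}(x-\mu) \right), \qquad \Delta^2 = (x-\mu)^{\top}\Sigma^{-1}(x-\mu).

Surfaces of constant density are ellipsoids centred on \mu, with axes along the eigenvectors of \Sigma and axis lengths proportional to the square roots of the corresponding eigenvalues.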

Moments of the Multivariate Gaussian (1). The first moment is E[x] = \mu: writing z = x - \mu, the term linear in z integrates to zero thanks to the anti-symmetry of z (the integrand is an odd function of z).

Moments of the Multivariate Gaussian (2). The second moment is E[x x^{\top}] = \mu\mu^{\top} + \Sigma, so the covariance is cov[x] = E[(x - E[x])(x - E[x])^{\top}] = \Sigma.

Maximum likelihood. Fit a probability density model p(x | \theta) to the data, i.e. estimate \theta. Given independent identically distributed (i.i.d.) data X = (x_1, x_2, ..., x_N): the likelihood is p(X \mid \theta) = \prod_{n=1}^{N} p(x_n \mid \theta), and the log likelihood is \ln p(X \mid \theta) = \sum_{n=1}^{N} \ln p(x_n \mid \theta). Maximum likelihood: maximize \ln p(X \mid \theta) w.r.t. \theta.

Maximum Likelihood for the Gaussian (1). Given i.i.d. data, the log likelihood function is given by \ln p(X \mid \mu, \Sigma) = -\frac{ND}{2}\ln(2\pi) - \frac{N}{2}\ln|\Sigma| - \frac{1}{2}\sum_{n=1}^{N}(x_n - \mu)^{\top}\Sigma^{-1}(x_n - \mu). The sufficient statistics are \sum_{n=1}^{N} x_n and \sum_{n=1}^{N} x_n x_n^{\top}.

Maximum Likelihood for the Gaussian (2). Set the derivative of the log likelihood function with respect to \mu to zero and solve to obtain \mu_{ML} = \frac{1}{N}\sum_{n=1}^{N} x_n. Similarly, \Sigma_{ML} = \frac{1}{N}\sum_{n=1}^{N}(x_n - \mu_{ML})(x_n - \mu_{ML})^{\top}.
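A minimal numpy sketch of these two estimators; the synthetic data and its "true" parameters are only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic i.i.d. data from a 2-D Gaussian (illustrative ground truth)
true_mu = np.array([1.0, -2.0])
true_cov = np.array([[2.0, 0.5],
                     [0.5, 1.0]])
X = rng.multivariate_normal(true_mu, true_cov, size=1000)    # shape (N, D)

# Maximum likelihood estimates
mu_ml = X.mean(axis=0)                                       # (1/N) sum_n x_n
centered = X - mu_ml
sigma_ml = centered.T @ centered / X.shape[0]                # (1/N) sum_n (x_n - mu)(x_n - mu)^T
# Note: the ML covariance divides by N, not N-1, so it is a biased estimate of Sigma.

print(mu_ml)
print(sigma_ml)
```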

Mixtures of Gaussians (1). Old Faithful data set. [Figure: a single Gaussian fit vs. a mixture of two Gaussians.]

Mixtures of Gaussians (2). Combine simple models into a complex model: p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k), where \mathcal{N}(x \mid \mu_k, \Sigma_k) is the k-th component and \pi_k is its mixing coefficient. [Figure: an example with K = 3 components.]

Mixtures of Gaussians (3)

Mixtures of Gaussians (4). Determining the parameters \mu, \Sigma, and \pi by maximizing the log likelihood: the log likelihood contains the log of a sum, so there is no closed-form maximum. Solution: use standard iterative numeric optimization methods, or the expectation maximization (EM) algorithm (Chapter 9).
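A compact, illustrative sketch of EM for a one-dimensional mixture of K Gaussians, using only numpy; the initialization and fixed iteration count are deliberately simple, so treat this as a teaching sketch rather than a production implementation.

```python
import numpy as np

def em_gmm_1d(x, K=2, n_iter=100, seed=0):
    """Fit a 1-D Gaussian mixture by EM; returns mixing weights, means, variances."""
    rng = np.random.default_rng(seed)
    N = x.shape[0]
    # Simple initialization: random means drawn from the data, shared variance, uniform weights
    mu = rng.choice(x, size=K, replace=False)
    var = np.full(K, x.var())
    pi = np.full(K, 1.0 / K)

    for _ in range(n_iter):
        # E-step: responsibilities gamma[n, k] = p(component k | x_n)
        dens = (1.0 / np.sqrt(2 * np.pi * var)) * np.exp(-(x[:, None] - mu) ** 2 / (2 * var))
        gamma = pi * dens
        gamma /= gamma.sum(axis=1, keepdims=True)

        # M-step: re-estimate parameters from responsibility-weighted data
        Nk = gamma.sum(axis=0)
        mu = (gamma * x[:, None]).sum(axis=0) / Nk
        var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
        pi = Nk / N

    return pi, mu, var

# Example usage on synthetic data drawn from two Gaussians
rng = np.random.default_rng(42)
data = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.5, 700)])
print(em_gmm_1d(data, K=2))
```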

Thank you!