A gentle introduction to Gaussian distribution

Review: Random variable. In the coin flip experiment the outcome is encoded as X = 1 (heads) or X = 0 (tails); X is a random variable.

Review: Probability mass function (discrete). Example: the coin flip experiment, where P(x) is defined for x = 0 and x = 1, with P(x) >= 0. Any other constraints? Hint: what is the sum? (The probabilities must sum to 1.)

Review: Probability density function (continuous), f(x), with f(x) >= 0. Unlike the discrete case, the density f(x) does not itself represent a probability; it is the rate at which probability accumulates, and its value at a point is often called the "likelihood" of that point.

Review: Probability density function (continuous), f(x) >= 0. For a small interval, P(x_0 < x < x_0 + dx) ≈ f(x_0) dx, but P(x = x_0) = 0 for any single point, and the density integrates to 1: \int f(x)\,dx = 1.

The Gaussian Distribution
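For reference, the univariate Gaussian density with mean \mu and variance \sigma^2 is

\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right).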

A 2D Gaussian
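A minimal sketch (not from the slides) of how such a 2-D Gaussian surface can be evaluated with numpy and scipy; the mean and covariance values here are purely illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative parameters (assumed, not from the slides)
mean = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.8],
                [0.8, 2.0]])   # positive-definite covariance with correlation

# Evaluate the density on a grid
xs, ys = np.meshgrid(np.linspace(-4, 4, 200), np.linspace(-4, 4, 200))
grid = np.dstack([xs, ys])                        # shape (200, 200, 2)
density = multivariate_normal(mean, cov).pdf(grid)

print(density.shape)  # (200, 200); plot with e.g. matplotlib contour for the familiar ellipses
```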

Central Limit Theorem The distribution of the sum of N i.i.d. random variables becomes increasingly Gaussian as N grows. Example: N uniform [0,1] random variables.
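A small demonstration of this statement in code: sums of N Uniform[0,1] variables become increasingly bell-shaped as N grows. Only numpy is assumed; the sample sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

for N in (1, 2, 10):
    # 100,000 realizations of the sum of N independent Uniform[0,1] variables
    sums = rng.uniform(0.0, 1.0, size=(100_000, N)).sum(axis=1)
    # As N grows, the histogram of `sums` approaches a Gaussian with
    # mean N/2 and variance N/12 (each uniform has mean 1/2, variance 1/12).
    print(N, sums.mean(), sums.var())
```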

Central Limit Theorem (Coin flip). Flip a coin N times. Each outcome has an associated random variable X_i (= 1 if heads, otherwise 0). The number of heads N_H is a random variable: the sum of N i.i.d. random variables, N_H = X_1 + X_2 + ... + X_N.

Central Limit Theorem (Coin flip). Probability mass function of N_H with P(Head) = 0.5 (fair coin). [Figure: the PMF for N = 5, N = 10, and N = 40, looking increasingly Gaussian.]
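A brief sketch of the same coin-flip picture in code, comparing the exact Binomial(N, 0.5) PMF of N_H with its Gaussian approximation; scipy.stats is assumed, and the values of N mirror the slide.

```python
import numpy as np
from scipy.stats import binom, norm

p = 0.5  # fair coin
for N in (5, 10, 40):
    k = np.arange(N + 1)
    pmf = binom.pmf(k, N, p)                        # exact P(N_H = k)
    # Gaussian approximation from the CLT: mean Np, variance Np(1-p)
    approx = norm.pdf(k, loc=N * p, scale=np.sqrt(N * p * (1 - p)))
    print(N, np.abs(pmf - approx).max())            # the gap shrinks as N grows
```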

Geometry of the Multivariate Gaussian
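For completeness, the D-dimensional Gaussian density behind this slide's geometry, together with the squared Mahalanobis distance \Delta^2 that governs its elliptical contours:

\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{D/2}\,|\Sigma|^{1/2}} \exp\!\left( -\frac{1}{2}(x-\mu)^{\top}\Sigma^{-1}(x-\mu) \right), \qquad \Delta^2 = (x-\mu)^{\top}\Sigma^{-1}(x-\mu).

Surfaces of constant density are ellipsoids centred on \mu, with axes along the eigenvectors of \Sigma and axis lengths proportional to the square roots of the corresponding eigenvalues.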

Moments of the Multivariate Gaussian (1). The first moment is E[x] = \mu: writing z = x - \mu, the term linear in z integrates to zero thanks to the anti-symmetry of z (the integrand is an odd function of z).

Moments of the Multivariate Gaussian (2). The second moment is E[x x^{\top}] = \mu\mu^{\top} + \Sigma, so the covariance is cov[x] = E[(x - E[x])(x - E[x])^{\top}] = \Sigma.

Maximum likelihood. Fit a probability density model p(x | \theta) to the data, i.e. estimate \theta. Given independent identically distributed (i.i.d.) data X = (x_1, x_2, ..., x_N): the likelihood is p(X \mid \theta) = \prod_{n=1}^{N} p(x_n \mid \theta), and the log likelihood is \ln p(X \mid \theta) = \sum_{n=1}^{N} \ln p(x_n \mid \theta). Maximum likelihood: maximize \ln p(X \mid \theta) w.r.t. \theta.

Maximum Likelihood for the Gaussian (1). Given i.i.d. data, the log likelihood function is given by \ln p(X \mid \mu, \Sigma) = -\frac{ND}{2}\ln(2\pi) - \frac{N}{2}\ln|\Sigma| - \frac{1}{2}\sum_{n=1}^{N}(x_n - \mu)^{\top}\Sigma^{-1}(x_n - \mu). The sufficient statistics are \sum_{n=1}^{N} x_n and \sum_{n=1}^{N} x_n x_n^{\top}.

Maximum Likelihood for the Gaussian (2). Set the derivative of the log likelihood function with respect to \mu to zero and solve to obtain \mu_{ML} = \frac{1}{N}\sum_{n=1}^{N} x_n. Similarly, \Sigma_{ML} = \frac{1}{N}\sum_{n=1}^{N}(x_n - \mu_{ML})(x_n - \mu_{ML})^{\top}.
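A minimal numpy sketch of these two estimators; the synthetic data and its "true" parameters are only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic i.i.d. data from a 2-D Gaussian (illustrative ground truth)
true_mu = np.array([1.0, -2.0])
true_cov = np.array([[2.0, 0.5],
                     [0.5, 1.0]])
X = rng.multivariate_normal(true_mu, true_cov, size=1000)    # shape (N, D)

# Maximum likelihood estimates
mu_ml = X.mean(axis=0)                                       # (1/N) sum_n x_n
centered = X - mu_ml
sigma_ml = centered.T @ centered / X.shape[0]                # (1/N) sum_n (x_n - mu)(x_n - mu)^T
# Note: the ML covariance divides by N, not N-1, so it is a biased estimate of Sigma.

print(mu_ml)
print(sigma_ml)
```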

Mixtures of Gaussians (1). Old Faithful data set. [Figure: a single Gaussian fit vs. a mixture of two Gaussians.]

Mixtures of Gaussians (2). Combine simple models into a complex model: p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k), where \mathcal{N}(x \mid \mu_k, \Sigma_k) is the k-th component and \pi_k is its mixing coefficient. [Figure: an example with K = 3 components.]

Mixtures of Gaussians (3)

Mixtures of Gaussians (4). Determining the parameters \mu, \Sigma, and \pi by maximizing the log likelihood: the log likelihood contains the log of a sum, so there is no closed-form maximum. Solution: use standard iterative numeric optimization methods, or the expectation maximization (EM) algorithm (Chapter 9).
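A compact, illustrative sketch of EM for a one-dimensional mixture of K Gaussians, using only numpy; the initialization and fixed iteration count are deliberately simple, so treat this as a teaching sketch rather than a production implementation.

```python
import numpy as np

def em_gmm_1d(x, K=2, n_iter=100, seed=0):
    """Fit a 1-D Gaussian mixture by EM; returns mixing weights, means, variances."""
    rng = np.random.default_rng(seed)
    N = x.shape[0]
    # Simple initialization: random means drawn from the data, shared variance, uniform weights
    mu = rng.choice(x, size=K, replace=False)
    var = np.full(K, x.var())
    pi = np.full(K, 1.0 / K)

    for _ in range(n_iter):
        # E-step: responsibilities gamma[n, k] = p(component k | x_n)
        dens = (1.0 / np.sqrt(2 * np.pi * var)) * np.exp(-(x[:, None] - mu) ** 2 / (2 * var))
        gamma = pi * dens
        gamma /= gamma.sum(axis=1, keepdims=True)

        # M-step: re-estimate parameters from responsibility-weighted data
        Nk = gamma.sum(axis=0)
        mu = (gamma * x[:, None]).sum(axis=0) / Nk
        var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
        pi = Nk / N

    return pi, mu, var

# Example usage on synthetic data drawn from two Gaussians
rng = np.random.default_rng(42)
data = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.5, 700)])
print(em_gmm_1d(data, K=2))
```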

Thank you!