Computer vision: models, learning and inference Chapter 3 Common probability distributions.



©2011 Simon J.D. Prince

Why model these complicated quantities? Because we need probability distributions over model parameters as well as over data and world state. Hence, some of the distributions describe the parameters of the others. For example, one distribution models the mean and variance of the data, and a second distribution in turn models those two parameters.

Bernoulli Distribution
Describes a situation with only two possible outcomes, y = 0 or y = 1 (failure/success). Takes a single parameter λ ∈ [0, 1], the probability of success:
Pr(y = 1) = λ, Pr(y = 0) = 1 − λ, or equivalently Pr(y) = λ^y (1 − λ)^(1 − y).
For short we write: Pr(y) = Bern_y[λ].
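As a quick illustration (not from the slides; Python with NumPy, and λ = 0.3 is just an assumed value), a minimal sketch of the Bernoulli pmf and of drawing samples:

```python
import numpy as np

def bernoulli_pmf(y, lam):
    """Pr(y) = lam**y * (1 - lam)**(1 - y) for y in {0, 1}."""
    return lam ** y * (1.0 - lam) ** (1 - y)

lam = 0.3                                            # hypothetical parameter value
print(bernoulli_pmf(1, lam), bernoulli_pmf(0, lam))  # 0.3 0.7

rng = np.random.default_rng(0)
samples = rng.binomial(n=1, p=lam, size=10)          # Bernoulli = binomial with n = 1
print(samples)
```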

Beta Distribution
Defined over λ ∈ [0, 1] (i.e., the parameter of the Bernoulli). Takes two parameters α, β, both > 0. The mean depends on their relative values, E[λ] = α / (α + β); the concentration depends on their overall magnitude. For short we write: Pr(λ) = Beta_λ[α, β].
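A small SciPy sketch of the Beta distribution (not from the slides; the values α = 2, β = 5 are illustrative), showing the mean α / (α + β), the density, and sampling:

```python
from scipy.stats import beta

alpha, beta_param = 2.0, 5.0              # hypothetical parameters, both > 0
dist = beta(alpha, beta_param)            # distribution over lam in [0, 1]
print(dist.mean())                        # alpha / (alpha + beta) = 2/7
print(dist.pdf(0.3))                      # density at lam = 0.3
print(dist.rvs(size=3, random_state=0))   # three draws of lam
```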

Categorical Distribution
Describes a situation with K possible outcomes, y = 1 ... K. Takes K parameters λ_1, ..., λ_K, where each λ_k ≥ 0 and they sum to 1, with Pr(y = k) = λ_k. Alternatively we can think of the data as a vector with all elements zero except the k-th, e.g. e_4 = [0, 0, 0, 1, 0]. For short we write: Pr(y) = Cat_y[λ_1, ..., λ_K].
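A minimal sketch of drawing from a categorical distribution with NumPy (the parameter vector is made up; note the outcomes are 0-based here rather than 1 ... K):

```python
import numpy as np

lam = np.array([0.1, 0.2, 0.4, 0.2, 0.1])   # K = 5 parameters, non-negative, sum to 1
rng = np.random.default_rng(0)
k = rng.choice(len(lam), p=lam)             # draw an outcome index in {0, ..., K-1}
e_k = np.eye(len(lam), dtype=int)[k]        # equivalent one-hot vector, e.g. [0, 0, 0, 1, 0]
print(k, e_k, lam[k])                       # outcome, its one-hot form, its probability
```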

Dirichlet Distribution
Defined over K continuous values λ_1, ..., λ_K where each λ_k ≥ 0 and they sum to 1 (i.e., the parameters of the categorical). Has K parameters α_1, ..., α_K, each α_k > 0. For short we write: Pr(λ_1, ..., λ_K) = Dir_λ[α_1, ..., α_K].
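A short SciPy sketch (the α values are illustrative) showing that a Dirichlet sample is a valid set of categorical parameters, i.e. non-negative and summing to 1:

```python
import numpy as np
from scipy.stats import dirichlet

alpha = np.array([2.0, 3.0, 5.0])              # hypothetical parameters, all > 0
lam = dirichlet.rvs(alpha, random_state=0)[0]  # one sample: K values on the simplex
print(lam, lam.sum())                          # non-negative, sums to 1
print(dirichlet.mean(alpha))                   # mean is alpha / alpha.sum()
```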

Univariate Normal Distribution
Describes a single continuous variable x. Takes two parameters, the mean μ and the variance σ² > 0:
Pr(x) = (1 / √(2πσ²)) exp(−(x − μ)² / (2σ²)).
For short we write: Pr(x) = Norm_x[μ, σ²].
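A small SciPy sketch of the univariate normal (illustrative values; note SciPy is parameterised by the standard deviation, so we pass √σ²):

```python
from scipy.stats import norm

mu, sigma2 = 1.0, 4.0                     # hypothetical mean and variance (sigma^2 > 0)
dist = norm(loc=mu, scale=sigma2 ** 0.5)  # SciPy takes the standard deviation
print(dist.pdf(0.0))                      # density at x = 0
print(dist.mean(), dist.var())            # recovers 1.0 and 4.0
print(dist.rvs(size=3, random_state=0))   # three draws of x
```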

Normal Inverse Gamma Distribution
Defined over two variables, a mean μ and a variance σ² > 0 (i.e., the parameters of the univariate normal). Has four parameters α, β, γ (all > 0) and δ. For short we write: Pr(μ, σ²) = NormInvGam[α, β, γ, δ].
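A hedged sketch of drawing a joint sample (μ, σ²), assuming the standard normal inverse gamma construction σ² ~ InvGamma(α, β), μ | σ² ~ Norm(δ, σ² / γ); the parameter values and the exact correspondence to the book's parameterisation are assumptions:

```python
import numpy as np
from scipy.stats import invgamma, norm

# Hypothetical parameter values; alpha, beta, gamma > 0 and delta is any real number.
alpha, beta, gamma, delta = 3.0, 2.0, 5.0, 0.0

sigma2 = invgamma.rvs(a=alpha, scale=beta, random_state=0)   # draw a variance
mu = norm.rvs(loc=delta, scale=np.sqrt(sigma2 / gamma),
              random_state=1)                                # draw a mean given that variance
print(mu, sigma2)                                            # one joint sample (mu, sigma^2)
```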

Multivariate Normal Distribution
Describes multiple continuous variables x = [x_1, ..., x_D]. Takes two parameters: a vector μ containing the mean position and a symmetric "positive definite" covariance matrix Σ. Positive definite means that zᵀΣz is positive for any real vector z ≠ 0. For short we write: Pr(x) = Norm_x[μ, Σ].
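A small SciPy sketch of the multivariate normal (illustrative mean and covariance), including a check that the covariance is positive definite:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])                      # mean vector
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])                 # symmetric covariance matrix
assert np.all(np.linalg.eigvalsh(Sigma) > 0)   # positive definite: all eigenvalues > 0
dist = multivariate_normal(mean=mu, cov=Sigma)
print(dist.pdf([0.0, 0.0]))                    # density at the origin
print(dist.rvs(size=3, random_state=0))        # three draws of x
```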

Types of covariance
The covariance matrix takes three forms, termed spherical, diagonal, and full.
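A brief NumPy sketch of the three forms as 2-D matrices (the numbers are made up):

```python
import numpy as np

spherical = 2.0 * np.eye(2)             # sigma^2 * I: same variance in every direction
diagonal = np.diag([2.0, 0.5])          # per-dimension variances, axis-aligned contours
full = np.array([[2.0, 0.8],
                 [0.8, 0.5]])           # general case: dimensions may be correlated
for name, Sigma in [("spherical", spherical), ("diagonal", diagonal), ("full", full)]:
    print(name, np.linalg.eigvalsh(Sigma))   # eigenvalues must all stay positive
```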

Normal Inverse Wishart Distribution
Defined over two variables: a mean vector μ and a symmetric positive definite matrix Σ (i.e., the parameters of the multivariate normal). Has four parameters: a positive scalar α, a positive definite matrix Ψ, a positive scalar γ, and a vector δ. For short we write: Pr(μ, Σ) = NorIWis[α, Ψ, γ, δ].

Samples from the Normal Inverse Wishart
[Figure: samples drawn as each of the four parameters varies, controlling the dispersion of the covariances, the average covariance, the dispersion of the means, and the average of the means.]
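A hedged sketch of drawing one joint sample (μ, Σ), assuming the usual normal inverse Wishart construction Σ ~ InvWishart(α, Ψ), μ | Σ ~ Norm(δ, Σ / γ); the parameter values and their correspondence to the book's notation are assumptions:

```python
import numpy as np
from scipy.stats import invwishart, multivariate_normal

# Hypothetical parameters: alpha (scalar), Psi (PD matrix), gamma (scalar), delta (vector).
alpha, gamma = 5.0, 2.0
Psi = np.eye(2)
delta = np.zeros(2)

Sigma = invwishart.rvs(df=alpha, scale=Psi, random_state=0)    # draw a covariance matrix
mu = multivariate_normal.rvs(mean=delta, cov=Sigma / gamma,
                             random_state=1)                   # draw a mean given Sigma
print(mu)
print(Sigma)                                                   # one joint sample (mu, Sigma)
```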

Conjugate Distributions
The pairs of distributions discussed have a special relationship: they are conjugate distributions.
Beta is conjugate to Bernoulli.
Dirichlet is conjugate to categorical.
Normal inverse gamma is conjugate to univariate normal.
Normal inverse Wishart is conjugate to multivariate normal.

Conjugate Distributions
When we take the product of a distribution and its conjugate, the result has the same form as the conjugate. For example, the product of a Bernoulli and a Beta over its parameter λ is a constant times a new Beta distribution:
Bern_y[λ] · Beta_λ[α, β] = κ · Beta_λ[y + α, 1 − y + β].

Example proof
Bern_y[λ] · Beta_λ[α, β] = λ^y (1 − λ)^(1 − y) · [Γ(α + β) / (Γ(α) Γ(β))] λ^(α − 1) (1 − λ)^(β − 1)
= κ · Beta_λ[y + α, 1 − y + β], where κ = B(y + α, 1 − y + β) / B(α, β) is a constant.
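To check the conjugate relation numerically, a small SciPy sketch (the values α = 2, β = 3 and the observation y = 1 are made up): the ratio of the product to the new Beta density is the same constant κ at every λ:

```python
import numpy as np
from scipy.stats import beta

# Beta-Bernoulli conjugacy: Bern_y[lam] * Beta_lam[a, b] = const * Beta_lam[y + a, 1 - y + b].
a, b, y = 2.0, 3.0, 1                   # hypothetical prior parameters and one observation
lam = np.linspace(0.05, 0.95, 5)        # a few test values of lam

product = (lam ** y * (1 - lam) ** (1 - y)) * beta.pdf(lam, a, b)   # likelihood times prior
new_beta = beta.pdf(lam, y + a, 1 - y + b)                          # the conjugate result
print(product / new_beta)               # the same constant at every lam
```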

Bayes' Rule Terminology
Pr(y | x) = Pr(x | y) Pr(y) / Pr(x)
Posterior Pr(y | x): what we know about y after seeing x.
Prior Pr(y): what we know about y before seeing x.
Likelihood Pr(x | y): the propensity for observing a certain value of x given a certain value of y.
Evidence Pr(x): a constant that ensures the left-hand side is a valid distribution.
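A toy discretised sketch of these four terms in Python (the likelihood function is made up purely for illustration):

```python
import numpy as np

y_grid = np.linspace(0.0, 1.0, 101)                 # candidate world states y
prior = np.full_like(y_grid, 1.0 / len(y_grid))     # Pr(y): flat before seeing data
likelihood = y_grid ** 3 * (1.0 - y_grid)           # Pr(x | y) for some observed x
evidence = np.sum(likelihood * prior)               # Pr(x): normalising constant
posterior = likelihood * prior / evidence           # Pr(y | x)
print(posterior.sum())                              # 1.0, so a valid distribution
```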

Importance of the Conjugate Relation 1
Learning parameters:
1. Choose a prior that is conjugate to the likelihood.
2. This implies that the posterior must have the same form as the conjugate prior distribution.
3. The posterior must be a valid distribution, which implies that the evidence must equal the constant κ from the conjugate relation.
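A minimal sketch of this recipe for the Beta-Bernoulli pair (the observations and the uniform Beta[1, 1] prior are made up); the posterior keeps the Beta form with updated parameters:

```python
from scipy.stats import beta

data = [1, 0, 1, 1, 0, 1]                 # observed outcomes
a, b = 1.0, 1.0                           # prior Beta[1, 1] (uniform over lam)
a_post = a + sum(data)                    # posterior is again a Beta,
b_post = b + len(data) - sum(data)        # just with updated parameters
posterior = beta(a_post, b_post)
print(a_post, b_post, posterior.mean())   # 5.0 3.0 0.625
```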

Importance of the Conjugate Relation 2
Marginalizing over parameters:
1. The prior is chosen so that it is conjugate to the other term.
2. The integral then becomes easy: the product becomes a constant times a distribution, and the integral of a constant times a probability distribution = the constant times the integral of the probability distribution = the constant × 1 = the constant.
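A small sketch of such a marginalisation for the Beta-Bernoulli pair (prior values made up): the predictive probability Pr(y* = 1) = ∫ λ Beta_λ[α, β] dλ, computed by brute-force integration and via the closed form α / (α + β):

```python
from scipy.stats import beta
from scipy.integrate import quad

a, b = 3.0, 2.0                                     # hypothetical prior parameters
integrand = lambda lam: lam * beta.pdf(lam, a, b)   # Pr(y* = 1 | lam) * Pr(lam)
numeric, _ = quad(integrand, 0.0, 1.0)              # brute-force integral over lam
analytic = a / (a + b)                              # closed form from the conjugate relation
print(numeric, analytic)                            # both 0.6
```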

Conclusions
We presented four distributions which model useful quantities, and four other distributions which model the parameters of the first four. They are paired in a special way: the second set is conjugate to the first. In the following material we'll see that this relationship is very useful.