Bayesian Wrap-Up (probably)

Administrivia
Office hours tomorrow on schedule. Woo hoo!
Office hours today deferred to 4:30-5:15... [sigh]

Retrospective/prospective
Last time:
- Maximum likelihood
- IID samples
- The MLE recipe
Today:
- Finish up MLE recipe
- Bayesian posterior estimation

Exercise
Find the maximum likelihood estimator of μ for the univariate Gaussian:
$p(x \mid \mu) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$
Find the maximum likelihood estimator of β for the degenerate gamma distribution:
Hint: consider the log of the likelihood functions in both cases

Solutions
PDF for one data point:
$p(x_i \mid \mu) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)$
Joint likelihood of N data points:
$L(\mu) = \prod_{i=1}^{N} p(x_i \mid \mu) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{\!N} \exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{N}(x_i - \mu)^2\right)$

Solutions
Log-likelihood:
$\ln L(\mu) = -\frac{N}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{N}(x_i - \mu)^2$
Differentiate w.r.t. μ and set to 0:
$\frac{\partial \ln L}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{N}(x_i - \mu) = 0 \;\Rightarrow\; \hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} x_i$
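
A quick numerical sanity check of this recipe (a minimal sketch in Python; the grid bounds are arbitrary, and the data are the three points from the example that follows): the closed-form sample mean should match a brute-force maximization of the log-likelihood.

```python
import numpy as np

# Data: the three points from the example that follows (sigma = 1 known)
x = np.array([4.35, 3.12, 4.91])
sigma = 1.0

# Closed-form MLE: the sample mean
mu_hat = x.mean()

# Brute-force check: evaluate the log-likelihood on a dense grid of mu values
mus = np.linspace(0.0, 8.0, 8001)  # grid bounds are arbitrary
loglik = (-len(x) / 2 * np.log(2 * np.pi * sigma**2)
          - ((x[:, None] - mus[None, :]) ** 2).sum(axis=0) / (2 * sigma**2))
mu_grid = mus[loglik.argmax()]

print(mu_hat, mu_grid)  # both ~= 4.127
```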

Example
1-d Gaussian w/ σ=1, unknown μ
$x_1 = 4.35$
[Plot: likelihood $L(\mu, x_1)$ as a function of μ]

Example
1-d Gaussian w/ σ=1, unknown μ
$x_1 = 4.35,\; x_2 = 3.12,\; x_3 = 4.91$
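
Plugging these three points into the MLE result above gives the estimate directly:

$\hat{\mu} = \frac{1}{3}(4.35 + 3.12 + 4.91) = \frac{12.38}{3} \approx 4.127$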

Solutions What about for the gamma PDF?

Putting the parts together
[X, Y]: complete training data

Putting the parts together
Assumed distribution family (hypothesis space) w/ parameters Θ
Parameters for class a: $\Theta_a$
Specific PDF for class a: $p(x \mid \Theta_a)$

Putting the parts together
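
In code, the pipeline these slides assemble looks roughly like this (a hedged sketch, not the course's code; the toy data, variable names, and the choice of a 1-d Gaussian family are all illustrative assumptions): split the complete training data [X, Y] by class, then fit that class's parameters by MLE.

```python
import numpy as np

# Toy complete training data [X, Y]: 1-d features and class labels (made up)
X = np.array([4.35, 3.12, 4.91, 1.10, 0.85, 1.42])
Y = np.array(['a', 'a', 'a', 'b', 'b', 'b'])

# For each class, fit the assumed family (here: 1-d Gaussian) by MLE
params = {}
for c in np.unique(Y):
    Xc = X[Y == c]
    params[c] = {'mu': Xc.mean(), 'sigma': Xc.std()}  # MLE mean and (biased) std

# params[c] now pins down the specific class-conditional PDF p(x | Theta_c)
print(params)
```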

Gaussian Distributions

5 minutes of math...
Recall your friend the Gaussian PDF:
$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$
I asserted that the d-dimensional form is:
$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^{\mathsf{T}} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right)$
Let's look at the parts...

5 minutes of math...
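
A direct transcription of the d-dimensional formula into code (a sketch; the 2-d mean and covariance are arbitrary), checked against scipy's built-in implementation:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_pdf(x, mu, Sigma):
    """Evaluate the d-dimensional Gaussian density at x."""
    d = len(mu)
    diff = x - mu
    norm = 1.0 / ((2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma)))
    return norm * np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))

# Arbitrary 2-d example
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.3, 0.7])

print(gaussian_pdf(x, mu, Sigma))
print(multivariate_normal(mu, Sigma).pdf(x))  # same value
```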

Ok, but what do the parts mean?
Mean vector, $\boldsymbol{\mu}$: mean of the data along each dimension

5 minutes of math...
Covariance matrix, $\Sigma$
Like variance, but describes the spread of the data

5 minutes of math...
Note: covariances on the diagonal of $\Sigma$ are the same as the standard variances on that dimension of the data
But what about skewed data?

5 minutes of math...
Off-diagonal covariances ($\Sigma_{ij}$) describe the pairwise covariance: how much $x_i$ changes as $x_j$ changes (on avg)

5 minutes of math...
Calculating $\Sigma$ from data:
In practice, you want to measure the covariance between every pair of random variables (dimensions):
$\Sigma_{ij} = \frac{1}{N}\sum_{k=1}^{N}(x_{ki} - \mu_i)(x_{kj} - \mu_j)$
Or, in linear algebra:
$\Sigma = \frac{1}{N}\sum_{k=1}^{N}(\mathbf{x}_k - \boldsymbol{\mu})(\mathbf{x}_k - \boldsymbol{\mu})^{\mathsf{T}}$
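
The outer-product form translates directly to code (a sketch on synthetic data; note numpy's np.cov divides by N-1 by default, so bias=True is needed to match the 1/N formula above):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))  # N=500 samples, d=3 dimensions (synthetic)

mu = X.mean(axis=0)
diffs = X - mu
# Sigma = (1/N) * sum_k (x_k - mu)(x_k - mu)^T
Sigma = diffs.T @ diffs / len(X)

print(np.allclose(Sigma, np.cov(X.T, bias=True)))  # True
```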

5 minutes of math...
Marginal probabilities
If you have a joint PDF, $f(x, y)$... and want to know about the probability of just one RV (regardless of what happens to the others)
Marginal PDF of $x$ or $y$:
$f(x) = \int f(x, y)\, dy \qquad f(y) = \int f(x, y)\, dx$
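
For a discrete joint distribution the integrals become sums, which is easy to see on a small made-up probability table:

```python
import numpy as np

# Made-up joint distribution P(X, Y) over 2 values of X and 3 values of Y
joint = np.array([[0.10, 0.20, 0.10],
                  [0.25, 0.15, 0.20]])

p_x = joint.sum(axis=1)  # marginalize out Y: P(X) = sum_y P(X, y)
p_y = joint.sum(axis=0)  # marginalize out X: P(Y) = sum_x P(x, Y)

print(p_x)  # [0.4 0.6]
print(p_y)  # [0.35 0.35 0.3]
```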

5 minutes of math...
Conditional probabilities
Suppose you have a joint PDF, f(H, W)
Now you get to see one of the values, e.g., H = "183cm"
What's your probability estimate of W, given this new knowledge?
$f(W \mid H) = \frac{f(H, W)}{f(H)}$

5 minutes of math...
From the cond. prob. rule, it's 2 steps to Bayes' rule:
$f(W \mid H) = \frac{f(H \mid W)\, f(W)}{f(H)}$
(Often helps algebraically to think of the "given that" operator, "|", as a division operation)
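
Written out, the two steps are just the conditional-probability rule applied in both directions, then a division by $f(H)$:

$$f(W \mid H)\, f(H) = f(H, W) = f(H \mid W)\, f(W)$$
$$\Rightarrow\quad f(W \mid H) = \frac{f(H \mid W)\, f(W)}{f(H)}$$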

Everything's random...
Basic Bayesian viewpoint: treat (almost) everything as a random variable
- Data/independent var: X vector
- Class/dependent var: Y
- Parameters: Θ (e.g., mean, variance, correlations, multinomial params, etc.)
Use Bayes' Rule to assess probabilities of classes
Allows us to say: "It is very unlikely that the mean height is 2 light years"
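
In symbols, the class assessment the slide describes is the standard Bayes-rule posterior over Y given the observed X (the notation here just reuses the slides' X, Y, Θ):

$$P(Y = c \mid X) = \frac{p(X \mid Y = c, \Theta)\, P(Y = c)}{\sum_{c'} p(X \mid Y = c', \Theta)\, P(Y = c')}$$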

Uncertainty over params
Maximum likelihood treats parameters as (unknown) constants
- Job is just to pick the constants so as to maximize the data likelihood
Full-blown Bayesian modeling treats params as random variables
- A PDF over the parameter variables tells us how certain/uncertain we are about the location of that parameter
- Also allows us to express prior beliefs (probabilities) about params, as in the sketch below
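
As a concrete contrast with the MLE above, here is a minimal grid-approximation sketch (the broad N(0, 10²) prior and the reuse of the three example points are illustrative assumptions): the Bayesian answer is a full posterior distribution over μ, not a single constant.

```python
import numpy as np

x = np.array([4.35, 3.12, 4.91])  # data from the earlier example
sigma = 1.0                       # known, as in the example

mus = np.linspace(-10.0, 10.0, 2001)        # grid over the parameter
prior = np.exp(-mus**2 / (2 * 10.0**2))     # broad N(0, 10^2) prior (assumed)
loglik = -((x[:, None] - mus[None, :]) ** 2).sum(axis=0) / (2 * sigma**2)

post = prior * np.exp(loglik)
post /= post.sum() * (mus[1] - mus[0])      # normalize to a density on the grid

print(mus[post.argmax()])                   # posterior mode, close to the MLE ~4.127
print((post * mus).sum() * (mus[1] - mus[0]))  # posterior mean, shrunk toward 0
```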