Bayesian Essentials: Slides by Peter Rossi and David Madigan


1 Bayesian Essentials Slides by Peter Rossi and David Madigan

2 Distribution Theory 101 Marginal and Conditional Distributions: [figure: joint distribution of (X, Y) on the unit square; the marginal of X is the standard triangle distribution on (0, 1) and the conditional of Y given X is uniform on (0, X)]

3 Simulating from Joint To draw from the joint: i. Draw from marginal on X ii. Condition on this draw, and draw from conditional of Y|X
library(triangle)
NumDraws <- 10000                  # number of joint draws (value assumed; not given on the slide)
x <- rtriangle(NumDraws, 0, 1, 1)  # marginal of X: standard triangle on (0, 1) with mode at 1
y <- runif(NumDraws, 0, x)         # conditional of Y given X: uniform on (0, X)
plot(x, y)

4 Triangular Distribution If U ~ unif(0,1), then sqrt(U) has the standard triangle distribution. If U1, U2 ~ unif(0,1), then Y = max{U1, U2} has the standard triangle distribution.
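A quick R check of these two representations (a minimal sketch; the number of draws is an assumed value, not from the slides):
# Two ways to simulate the standard triangle distribution, which has density f(x) = 2x on (0, 1)
n <- 100000                        # assumed number of draws
u <- runif(n)
x1 <- sqrt(u)                      # method 1: square root of a uniform
x2 <- pmax(runif(n), runif(n))     # method 2: max of two independent uniforms
hist(x1, breaks = 50, freq = FALSE); curve(2 * x, add = TRUE)   # both should match f(x) = 2x
hist(x2, breaks = 50, freq = FALSE); curve(2 * x, add = TRUE)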

5 Sampling Importance Resampling [figure: target density f and proposal density g] Draw a big sample from g; sub-sample from that sample with probability proportional to f/g.
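A minimal R sketch of SIR under assumed choices (target f(x) = 2x on (0, 1), the standard triangle; proposal g = unif(0, 1); sample sizes made up):
# SIR: draw a big sample from g, then resample with probability proportional to the weights f/g
BigN <- 100000; SmallN <- 10000        # assumed sample sizes
g_draws <- runif(BigN)                 # big sample from the proposal g
w <- 2 * g_draws                       # importance weights f/g = 2x / 1
resampled <- sample(g_draws, SmallN, replace = TRUE, prob = w)
hist(resampled, breaks = 50, freq = FALSE); curve(2 * x, add = TRUE)   # close to the target f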

6 Metropolis [figure: target density f and proposal density g] Start with current = 0.5. To get the next value: draw a “proposal” from g; keep the proposal with probability f(proposal)/f(current), capped at 1; else keep current.
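A minimal R sketch of this scheme under assumed choices (target f(x) = 2x on (0, 1); independent uniform proposals, for which the acceptance ratio really does reduce to f(proposal)/f(current)):
# Independence Metropolis with uniform(0, 1) proposals and target density f(x) = 2x
f <- function(x) 2 * x
NumIter <- 10000                       # assumed chain length
draws <- numeric(NumIter)
current <- 0.5                         # start with current = 0.5, as on the slide
for (i in 1:NumIter) {
  proposal <- runif(1)                 # draw a proposal from g = unif(0, 1)
  if (runif(1) < f(proposal) / f(current)) current <- proposal   # keep with prob f(prop)/f(curr), capped at 1
  draws[i] <- current
}
hist(draws, breaks = 50, freq = FALSE); curve(2 * x, add = TRUE)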

7 The Goal of Inference
Make inferences about unknown quantities using available information.
Inference -- make probability statements.
Unknowns -- parameters, functions of parameters, states or latent variables, “future” outcomes, outcomes conditional on an action.
Information -- data-based; non data-based: theories of behavior; subjective views; mechanism; parameters are finite or in some range.

8 Bayes theorem
p(θ|D) ∝ p(D|θ) p(θ)
Posterior ∝ “Likelihood” × Prior
Modern Bayesian computing -- simulation methods for generating draws from the posterior distribution p(θ|D).
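A small R illustration of this relation on a grid (a toy setup assumed here: binomial likelihood, flat prior; none of the numbers come from the slides):
# Posterior ∝ likelihood × prior, evaluated pointwise on a grid of θ values
theta <- seq(0.001, 0.999, length.out = 999)
prior <- rep(1, length(theta))                       # flat prior (assumed)
lik <- dbinom(7, size = 10, prob = theta)            # assumed data: 7 successes in 10 trials
post <- lik * prior
post <- post / (sum(post) * (theta[2] - theta[1]))   # normalize so it integrates to about 1
plot(theta, post, type = "l")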

9 Summarizing the posterior Output from Bayesian inference: a possibly high-dimensional distribution p(θ|D). Summarize this object via simulation: marginal distributions of the parameters and of functions of them; don't just compute a single point summary. Contrast with sampling theory: point estimate/standard error; a summary of an irrelevant distribution; a bad (normal) summary; the limitations of asymptotics.

10 Metropolis Start somewhere with θ_current. To get the next value, generate a proposal θ_proposal. Accept with “probability” equal to the ratio of the posterior density at θ_proposal to that at θ_current, capped at 1; else keep θ_current.

11 Example Believe these measurements (D) come from N(μ,1) [the measurements are shown on the slide but not reproduced in this transcript]. Prior for μ? p(μ) = 2μ

12 Example continued What is p(D|μ) for y_1, …, y_10? Switch to R… Other priors? unif(0,1), norm(0,1), norm(0,100). Generating good candidates?
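One way the “switch to R” might look (a hedged sketch, not the original demo: the data vector is made up, since the measurements on the slide are not reproduced in this transcript):
# Independence Metropolis for μ with likelihood N(μ, 1) and prior p(μ) = 2μ on (0, 1)
y <- c(0.4, 1.1, 0.3, 0.8, 0.7, 0.2, 0.9, 0.5, 1.0, 0.6)   # assumed stand-in for y_1, ..., y_10
log_post <- function(mu) {
  if (mu <= 0 || mu >= 1) return(-Inf)                     # the prior p(μ) = 2μ lives on (0, 1)
  sum(dnorm(y, mean = mu, sd = 1, log = TRUE)) + log(2 * mu)
}
NumIter <- 20000
draws <- numeric(NumIter)
current <- 0.5
for (i in 1:NumIter) {
  proposal <- runif(1)                                     # uniform(0, 1) candidates
  if (log(runif(1)) < log_post(proposal) - log_post(current)) current <- proposal
  draws[i] <- current
}
hist(draws, breaks = 50, freq = FALSE)                     # posterior draws for μ
Swapping in the other priors from the slide only changes the log-prior term and the support check.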

13 Prediction See D, compute p(ỹ|D) = ∫ p(ỹ|θ) p(θ|D) dθ: the “Predictive Distribution” of a future observable ỹ.
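Given posterior draws of μ (for example the vector draws from the sketch above), the predictive distribution can be simulated directly under the same assumed N(μ, 1) model:
# Draw ỹ by drawing μ from the posterior and then ỹ | μ ~ N(μ, 1)
y_future <- rnorm(length(draws), mean = draws, sd = 1)
hist(y_future, breaks = 50, freq = FALSE)    # approximates p(ỹ | D)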

14 Bayes/Classical Estimators As the sample size grows, the prior washes out – it acts locally uniform!!! Bayes is consistent unless you have a dogmatic prior.

15 Bayesian Computations Before simulation methods, Bayesians used posterior expectations of various functions as the summary of the posterior: E[h(θ)|D] = ∫ h(θ) p(θ|D) dθ. If p(θ|D) is in a convenient form (e.g. normal), then this integral can be computed analytically for some h.
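With simulation, the same expectation becomes a simple average over posterior draws (a minimal sketch; draws stands for any vector of posterior draws and h for any function of interest, both assumed):
# Monte Carlo estimate of E[h(θ) | D] from posterior draws
h <- function(theta) theta^2     # an example function of the parameter (assumed)
mean(h(draws))                   # approximates the integral of h(θ) p(θ|D) over θ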

16 Conjugate Families Models with convenient analytic properties almost invariably come from conjugate families. Why do I care now? Conjugate models are used as building blocks, and they build intuition for Bayesian inference. Definition: A prior is conjugate to a likelihood if the posterior is in the same class of distributions as the prior. Basically, conjugate priors are like the posterior from some imaginary dataset with a diffuse prior.

17 Beta-Binomial model For n Bernoulli trials with s successes, the likelihood is p(D|θ) ∝ θ^s (1−θ)^(n−s). Need a prior!

18 Beta distribution The Beta(a, b) density: p(θ) ∝ θ^(a−1) (1−θ)^(b−1), θ ∈ (0, 1).

19 Posterior With a Beta(a, b) prior and the binomial likelihood, p(θ|D) ∝ θ^(a+s−1) (1−θ)^(b+n−s−1), i.e. θ|D ~ Beta(a + s, b + n − s).
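A minimal R sketch of this update (the prior parameters and the data are assumed, purely for illustration):
# Prior θ ~ Beta(a, b); data: s successes in n trials; posterior θ | D ~ Beta(a + s, b + n - s)
a <- 2; b <- 2                    # assumed prior parameters
n <- 20; s <- 14                  # assumed data
curve(dbeta(x, a, b), from = 0, to = 1, lty = 2)                  # prior density
curve(dbeta(x, a + s, b + n - s), from = 0, to = 1, add = TRUE)   # posterior density
theta_draws <- rbeta(10000, a + s, b + n - s)                     # posterior draws, if wanted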

20 Prediction Under the Beta-Binomial model, the predictive probability that the next trial is a success is E[θ|D] = (a + s)/(a + b + n).

21 Regression model y = Xβ + ε, ε ~ N(0, σ²I); the unknowns are the coefficient vector β and the error variance σ².

22 Bayesian Regression Prior: β|σ² ~ N(β̄, σ²A⁻¹) and σ² ~ ν s²/χ²_ν (an Inverted Chi-Square). Interpretation: as if from another dataset. Draw from prior?
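A minimal R sketch of drawing from this prior (all hyperparameter values are assumed, and a two-coefficient model is used only for concreteness):
# σ² ~ ν s² / χ²_ν (Inverted Chi-Square), then β | σ² ~ N(betabar, σ² A⁻¹)
nu <- 5; ssq <- 1                          # assumed prior degrees of freedom and scale
betabar <- c(0, 0); A <- 0.01 * diag(2)    # assumed prior mean and precision matrix
NumDraws <- 1000
sigmasq <- nu * ssq / rchisq(NumDraws, nu)                           # Inverted Chi-Square draws
beta <- matrix(0, NumDraws, 2)
for (i in 1:NumDraws) {
  beta[i, ] <- betabar + t(chol(sigmasq[i] * solve(A))) %*% rnorm(2)   # β | σ² draw
}
hist(sqrt(sigmasq), breaks = 50)           # implied prior on the error standard deviation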

23 Posterior p(β, σ²|y, X) ∝ p(y|X, β, σ²) p(β|σ²) p(σ²).

24 Combining quadratic forms Complete the square in β to combine the quadratic form from the likelihood with the quadratic form from the prior.

25 Posterior The conjugate result: β|σ², y, X is normal and σ²|y, X is an Inverted Chi-Square, which is what the simulation scheme on the next slide uses.

26 IID Simulations
Scheme: [y|X, β, σ²] [β|σ²] [σ²]
[β, σ²|y, X] = [σ²|y, X] [β|σ², y, X]
1) Draw [σ²|y, X]
2) Draw [β|σ², y, X]
3) Repeat

27 IID Simulator, cont.
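The slide's own code is not reproduced in this transcript; the following is a minimal sketch of such an IID simulator for the conjugate setup above, with simulated data and assumed prior values:
# IID draws: [σ² | y, X] from an Inverted Chi-Square, then [β | σ², y, X] from a normal
set.seed(1)
n <- 100; k <- 2
X <- cbind(1, rnorm(n))                                  # simulated design matrix (assumed)
beta_true <- c(1, 2); y <- X %*% beta_true + rnorm(n)    # simulated data (assumed)
betabar <- rep(0, k); A <- 0.01 * diag(k)                # assumed prior: β | σ² ~ N(betabar, σ² A⁻¹)
nu <- 3; ssq <- 1                                        # assumed prior: σ² ~ ν ssq / χ²_ν
W <- rbind(X, chol(A)); z <- c(y, chol(A) %*% betabar)   # stack the prior as extra "observations"
V <- solve(crossprod(W))                                 # (X'X + A)⁻¹
btilde <- V %*% crossprod(W, z)                          # posterior mean of β
s <- sum((z - W %*% btilde)^2)                           # residual sum of squares, prior rows included
R <- 5000; betadraw <- matrix(0, R, k); sigmasqdraw <- numeric(R)
for (r in 1:R) {
  sigmasqdraw[r] <- (nu * ssq + s) / rchisq(1, nu + n)                # 1) draw [σ² | y, X]
  betadraw[r, ] <- btilde + t(chol(sigmasqdraw[r] * V)) %*% rnorm(k)  # 2) draw [β | σ², y, X]
}
colMeans(betadraw)                                       # posterior means, close to beta_true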