Bayes for Beginners Luca Chech and Jolanda Malamud Supervisor: Thomas Parr 13th February 2019

Outline: Probability distributions, joint probability, marginal probability, conditional probability, Bayes' theorem, Bayesian inference, coin toss example

“Probability is orderly opinion and inference from data is nothing other than the revision of such opinion in the light of relevant new information.” Eliezer S. Yudkowsky

Some applications

Probability distribution. Discrete: a probability mass function (PMF) assigns a probability P(X=x) to each value, e.g. X ∈ {1, 2, …, 100} with P(X=x) = 1/100 for each x, and Σ_x PMF(x) = 1. Continuous (e.g. height in the UK population): a probability density function (PDF); probabilities are given by areas under the curve, e.g. P(1.75 ≤ X ≤ 1.85) for a height around 1.8 m.
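To make the distinction concrete, here is a minimal Python sketch (not from the slides; the Normal(1.75 m, 0.1 m) height model is assumed purely for illustration) contrasting a discrete PMF, which sums to one, with a continuous PDF, where probability is an area:

```python
from scipy.stats import norm

# Discrete: uniform PMF over X = 1..100, each value with probability 1/100
pmf = {x: 1 / 100 for x in range(1, 101)}
print(sum(pmf.values()))                     # 1.0: a PMF sums to one

# Continuous: height modelled (for illustration) as Normal(mean 1.75 m, sd 0.1 m)
height = norm(loc=1.75, scale=0.1)
# Probability is the area under the PDF, e.g. P(1.75 <= X <= 1.85)
print(height.cdf(1.85) - height.cdf(1.75))   # ~0.34
```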

Probability. Probability of A occurring: P(A). Probability of B occurring: P(B). Joint probability (A and B both occurring): P(A,B). Frequentist, or traditional, statistics calculates probability from the data alone, without taking any previous data or knowledge into account; prior knowledge is therefore ignored.

Marginal probability. Consider a joint distribution over X (disease: 0 or 1) and Y (symptoms: 0 or 1), with P(X=0, Y=0) = 0.5, P(X=0, Y=1) = 0.1, P(X=1, Y=1) = 0.3 (and hence P(X=1, Y=0) = 0.1), so that Σ_{x,y} P(X=x, Y=y) = 1. A joint probability is a single entry, e.g. P(X=0, Y=1) = 0.1. A marginal probability is obtained by summing the joint over the other variable, P(X=x) = Σ_y P(X=x, Y=y), e.g. P(Y=1) = 0.1 + 0.3 = 0.4 and P(X=0) = 0.1 + 0.5 = 0.6.

Conditional probability. What is the probability of A occurring, given that B has occurred? This is the probability of A given B, written P(A|B): the joint probability of A and B occurring, divided by the probability of B.

Conditional probability. Using the same table: P(X=x|Y=y) = P(X=x, Y=y) / P(Y=y). For example, P(X=1|Y=1) = 0.3 / (0.1 + 0.3) = 3/4 and P(X=0|Y=1) = 0.1 / (0.1 + 0.3) = 1/4: given that symptoms are present, the probability of disease is 3/4.
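A small Python sketch of the same 2×2 table (a minimal illustration; the fourth cell P(X=1, Y=0) = 0.1 is implied by the joint probabilities summing to 1):

```python
# Joint distribution P(X, Y): X = disease (0/1), Y = symptoms (0/1)
joint = {(0, 0): 0.5, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.3}
assert abs(sum(joint.values()) - 1.0) < 1e-12      # joint probabilities sum to 1

# Marginals: sum the joint over the other variable
p_y1 = joint[(0, 1)] + joint[(1, 1)]               # P(Y=1) = 0.4
p_x0 = joint[(0, 0)] + joint[(0, 1)]               # P(X=0) = 0.6

# Conditionals: joint probability divided by the marginal of the conditioning variable
p_x1_given_y1 = joint[(1, 1)] / p_y1               # 0.3 / 0.4 = 0.75
p_x0_given_y1 = joint[(0, 1)] / p_y1               # 0.1 / 0.4 = 0.25
print(p_y1, p_x0, p_x1_given_y1, p_x0_given_y1)
```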

Conditional probability: Example. P(C) = 1/100, P(NC) = 99/100, P(+|C) = 90/100, P(+|NC) = 8/100; P(C|+) = ? By the definition of conditional probability, P(C|+) = P(C,+) / P(+). The joint probabilities follow from the product rule: P(C,+) = P(+|C) × P(C) = 90/100 × 1/100 = 9/1000, and P(+,NC) = P(+|NC) × P(NC) = 8/100 × 99/100 = 792/10000. The denominator is obtained by marginalising: P(+) = Σ_x P(X=x, +) = P(C,+) + P(NC,+).

Conditional probability: Example (continued). P(C|+) = P(C,+) / P(+) = (9/1000) / (9/1000 + 792/10000) ≈ 0.1.
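The same calculation as a short Python sketch:

```python
p_c, p_nc = 0.01, 0.99            # P(C), P(NC)
p_pos_c, p_pos_nc = 0.90, 0.08    # P(+|C), P(+|NC)

# Joint probabilities via the product rule, then marginalise to get P(+)
p_c_and_pos = p_pos_c * p_c           # 0.009
p_nc_and_pos = p_pos_nc * p_nc        # 0.0792
p_pos = p_c_and_pos + p_nc_and_pos    # 0.0882

print(p_c_and_pos / p_pos)            # P(C|+) ~ 0.102
```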

Derivation of Bayes' theorem. (1) P(A|B) = P(A∩B) / P(B). (2) P(B|A) = P(B∩A) / P(A) = P(A∩B) / P(A), so P(A∩B) = P(B|A) × P(A). Substituting (2) into (1): P(A|B) = P(B|A) × P(A) / P(B).

Bayes' theorem, alternative form: P(A|B) = P(B|A) × P(A) / P(B), where the denominator can be expanded as P(B) = P(B|A) × P(A) + P(B|¬A) × P(¬A).

Bayes’ theorem problems

Example 1. 10% of patients in a clinic have liver disease. Five percent of the clinic's patients are alcoholics. Amongst those patients diagnosed with liver disease, 7% are alcoholics. You are interested in knowing the probability of a patient having liver disease, given that he is an alcoholic. P(A) = probability of liver disease = 0.10; P(B) = probability of alcoholism = 0.05; P(B|A) = 0.07; P(A|B) = ? By Bayes' theorem, P(A|B) = P(B|A) × P(A) / P(B) = (0.07 × 0.10) / 0.05 = 0.14. In other words, if the patient is an alcoholic, their chance of having liver disease is 0.14 (14%).

Example 2. A disease occurs in 0.5% of the population. A diagnostic test gives a positive result in 99% of people with the disease and in 5% of people without the disease (false positive). A person receives a positive result. What is the probability of them having the disease, given the positive result?

We want P(disease | positive test) = P(positive test | disease) × P(disease) / P(positive test). We know: P(positive test | disease) = 0.99, P(disease) = 0.005, P(positive test) = ?

The denominator is obtained by marginalising over having and not having the disease: P(positive test) = P(PT|D) × P(D) + P(PT|~D) × P(~D) = (0.99 × 0.005) + (0.05 × 0.995) ≈ 0.0547. Where: P(D) = chance of having the disease; P(~D) = chance of not having the disease (remember: P(~D) = 1 − P(D)); P(PT|D) = chance of a positive test given that the disease is present; P(PT|~D) = chance of a positive test given that the disease isn't present.

Therefore: P(disease | positive test) = (0.99 × 0.005) / 0.0547 ≈ 0.09, i.e. 9%.
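Example 2 in a few lines of Python:

```python
p_d = 0.005        # P(disease)
p_pt_d = 0.99      # P(positive test | disease)
p_pt_nd = 0.05     # P(positive test | no disease), the false-positive rate

# Denominator: marginal probability of a positive test
p_pt = p_pt_d * p_d + p_pt_nd * (1 - p_d)    # ~0.0547

# Bayes' theorem
print(p_pt_d * p_d / p_pt)                   # P(disease | positive test) ~ 0.09
```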

Frequentist vs. Bayesian statistics

Frequentist models in practice. The data X are treated as a random variable, while the parameters θ are unknown but fixed. We assume there is a true set of parameters, or true model of the world, and we are concerned with getting the best possible estimate. We are interested in point estimates of the parameters given the data.

Bayesian models in practice. The data X are fixed, while the parameters θ are considered to be random variables. There is no single set of parameters that denotes a true model of the world; rather, we have parameters that are more or less probable. We are interested in the distribution of the parameters given the data.

Bayesian inference provides a dynamic model through which our beliefs are constantly updated as we add more data. The ultimate goal is to calculate the posterior probability density, which is proportional to the likelihood (the probability of the data given the parameters) times our prior knowledge. It can also be used as a model for the brain (the Bayesian brain), for history and for human behaviour.

Bayes' rule: P(θ|D) = P(D|θ) × P(θ) / P(D) ∝ P(D|θ) × P(θ). Posterior P(θ|D): how good are our parameters given the data. Likelihood: P(D|θ). Prior: P(θ). Evidence: P(D) = ∫ P(D|θ) × P(θ) dθ. Prior knowledge is incorporated and used to update our beliefs about the parameters.
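As a concrete numerical illustration of posterior ∝ likelihood × prior (a hypothetical example, not from the slides), here is a grid approximation of the posterior over a coin's bias θ after observing 7 heads in 10 flips, with an assumed mild prior favouring fairness:

```python
import numpy as np
from scipy.stats import binom, beta

theta = np.linspace(0.001, 0.999, 999)     # grid over the parameter
prior = beta.pdf(theta, 2, 2)              # mild prior favouring theta ~ 0.5 (assumed)
likelihood = binom.pmf(7, 10, theta)       # P(D | theta): 7 heads in 10 flips
unnormalised = likelihood * prior          # posterior is proportional to this

# Evidence P(D): numerically integrate likelihood x prior over theta
evidence = np.sum(unnormalised) * (theta[1] - theta[0])
posterior = unnormalised / evidence        # normalised P(theta | D)
print(theta[np.argmax(posterior)])         # posterior mode ~ 0.67
```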

Generative models specify a joint probability distribution over all variables (observations and parameters); this requires a likelihood function and a prior: P(D, θ | m) = P(D | θ, m) × P(θ | m) ∝ P(θ | D, m). Model comparison is based on the model evidence: P(D | m) = ∫ P(D | θ, m) × P(θ | m) dθ.

Principles of Bayesian inference: (1) formulation of a generative model, consisting of a likelihood function P(D|θ) and a prior distribution P(θ); (2) observation of measurement data D; (3) model inversion, i.e. updating one's beliefs: the posterior distribution P(θ|D) ∝ P(D|θ) × P(θ), together with the model evidence.

Priors can be of different sorts, e.g. empirical (based on previous data), uninformative, principled (e.g. positivity constraints), or shrinkage priors. A conjugate prior is one for which the posterior P(θ|D) is in the same family as the prior P(θ).
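A minimal sketch of conjugacy using the Beta-Binomial pair (chosen here for illustration; the slide does not name a specific family): with a Beta prior and a binomial likelihood, the posterior is again a Beta distribution, so updating amounts to adding counts.

```python
from scipy.stats import beta

a, b = 2, 2          # Beta(a, b) prior over the coin bias theta (assumed values)
k, n = 7, 10         # data: k heads in n flips

# Conjugate update: Beta prior + binomial likelihood -> Beta posterior
a_post, b_post = a + k, b + (n - k)      # Beta(9, 5)
posterior = beta(a_post, b_post)
print(posterior.mean())                  # posterior mean ~ 0.64
```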

P(θ|D) ∝ P(D|θ) × P(θ) ∝ likelihood × prior. (Figure: effect of more informative prior distributions on the posterior distribution.)

P(θ|D) ∝ P(D|θ) × P(θ) ∝ likelihood × prior. (Figure: effect of larger sample sizes on the posterior distribution.)
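A hypothetical numerical illustration of both figures using the Beta-Binomial model sketched above: a more informative prior pulls the posterior towards it, and a larger sample narrows the posterior around the data.

```python
from scipy.stats import beta

def summarise(a, b, k, n):
    """Beta(a, b) prior, k heads in n flips -> posterior mean and standard deviation."""
    post = beta(a + k, b + (n - k))
    return round(post.mean(), 3), round(post.std(), 3)

# Same data (70% heads), weak vs strong prior centred on 0.5
print(summarise(1, 1, 7, 10))       # weak prior:   mean ~0.67
print(summarise(50, 50, 7, 10))     # strong prior: mean ~0.52, pulled towards 0.5

# Same weak prior, small vs large sample with 70% heads
print(summarise(1, 1, 7, 10))       # sd ~0.13
print(summarise(1, 1, 700, 1000))   # sd ~0.014: the data dominate the posterior
```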

Example: Coin flipping model. Someone flips a coin. We don't know if the coin is fair or not; we are told only the outcomes of the coin flips.

Example: Coin flipping model 1st Hypothesis: Coin is fair, 50% Heads or Tails 2nd Hypothesis: Both sides of the coin are heads, 100% Heads

Example: Coin flipping model. 1st hypothesis: the coin is fair, 50% heads or tails, P(A = fair coin) = 0.99. 2nd hypothesis: both sides of the coin are heads, 100% heads, P(A = unfair coin) = 0.01.

Example: Coin flipping model. The coin is flipped once and lands heads. The likelihoods under the two hypotheses are P(heads | fair) = 0.5 and P(heads | unfair) = 1.

Example: Coin flipping model. Posterior after the first flip: P(fair | heads) = P(heads | fair) × P(fair) / P(heads) = (0.5 × 0.99) / (0.5 × 0.99 + 1 × 0.01) ≈ 0.98, and P(unfair | heads) ≈ 0.02.

Example: Coin flipping model. The coin is flipped a second time and it is heads again. The posterior from the previous time step becomes the new prior!

Example: Coin flipping model. Posterior after the second flip, using the updated prior: P(fair | heads, heads) = (0.5 × 0.98) / (0.5 × 0.98 + 1 × 0.02) ≈ 0.96.
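The sequential update over the two hypotheses as a short Python sketch; the printed values reproduce the numbers above.

```python
# Two hypotheses about the coin and their prior probabilities
p_fair, p_unfair = 0.99, 0.01
lik_heads = {"fair": 0.5, "unfair": 1.0}     # P(heads | hypothesis)

for flip in range(2):                        # two observed heads
    # Unnormalised posteriors: likelihood x prior
    post_fair = lik_heads["fair"] * p_fair
    post_unfair = lik_heads["unfair"] * p_unfair
    evidence = post_fair + post_unfair       # P(heads) under the current beliefs
    # Normalise; the posterior becomes the prior for the next flip
    p_fair, p_unfair = post_fair / evidence, post_unfair / evidence
    print(round(p_fair, 3), round(p_unfair, 3))   # 0.98 0.02, then 0.961 0.039
```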

Hypothesis testing. Classical: define the null hypothesis, H0: the coin is fair, θ = 0.5. Bayesian inference: define a hypothesis, H: θ > 0.1, and evaluate its posterior probability directly from the posterior distribution over θ.
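For illustration (assuming a uniform Beta(1, 1) prior over θ and the 2-heads-in-10-flips data from the next slide), the Bayesian hypothesis H: θ > 0.1 is assessed by integrating the posterior over that region:

```python
from scipy.stats import beta

# Posterior over theta with a uniform prior and 2 heads, 8 tails: Beta(3, 9)
posterior = beta(3, 9)

# P(H | D) = P(theta > 0.1 | D): posterior mass above 0.1
print(posterior.sf(0.1))    # ~0.91
```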

Example: Coin flipping model. D = T H T H T T T T T T (2 heads, 8 tails), and we think a priori that the coin is fair: P(fair) = 0.8, P(bent) = 0.2. Evidence for the fair model: P(D | fair) = 0.5^10 ≈ 0.001. For the bent model, with a uniform prior over its bias θ: P(D | bent) = ∫ P(D | θ, bent) × P(θ | bent) dθ = ∫ θ^2 (1−θ)^8 dθ = B(3, 9) ≈ 0.002. Posteriors for the models: P(fair | D) ∝ 0.001 × 0.8 = 0.0008 and P(bent | D) ∝ 0.002 × 0.2 = 0.0004.
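The same model comparison in Python (scipy's beta function gives ∫ θ^2 (1−θ)^8 dθ = B(3, 9)):

```python
from scipy.special import beta as beta_fn

p_fair_prior, p_bent_prior = 0.8, 0.2

# Model evidences for D = 2 heads, 8 tails
evidence_fair = 0.5 ** 10                    # ~0.000977
evidence_bent = beta_fn(3, 9)                # integral of theta^2 (1-theta)^8 ~ 0.00202

# Unnormalised model posteriors: evidence x prior
post_fair = evidence_fair * p_fair_prior     # ~0.0008
post_bent = evidence_bent * p_bent_prior     # ~0.0004
print(post_fair / (post_fair + post_bent))   # P(fair | D) ~ 0.66
```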

"A Bayesian is one who, vaguely expecting a horse, and catching a glimpse of a donkey, strongly believes he has seen a mule."

References: Previous MfD slides; "Bayesian statistics (a very brief introduction)" by Ken Rice; http://www.statisticshowto.com/bayes-theorem-problems/; "Bayesian inference and generative models" slides by K.E. Stephan; introductory slides on probabilistic & unsupervised learning by M. Sahani; animations: https://blog.stata.com/2016/11/01/introduction-to-bayesian-statistics-part-1-the-basic-concepts/