Bayes for Beginners
Stephanie Azzopardi & Hrvoje Stojic
Supervisor: Dr Peter Zeidman
7th December 2016

What is Bayes?
Frequentist (traditional) statistics works with p values and confidence intervals and calculates probability from the data alone: prior knowledge is ignored.
Bayesian statistics deals with conditional probability: the parameters are treated as variable and the data as fixed, and we update our beliefs as new knowledge arrives.
Bayesian methods have a wide range of applications.

Probability
Probability of A occurring: P(A)
Probability of B occurring: P(B)
Joint probability (A and B both occurring): P(A,B)

Marginal Probability
                  Diabetes    No diabetes
Neuropathy          0.3          0.1
No neuropathy       0.5          0.1
Say you study a particular ward population and categorise patients into these four distinct groups. Note that the probabilities must always sum to one.

Marginal Probability
What are the chances of a patient having neuropathy?
P(neuropathy) = ∑_diabetes P(neuropathy, diabetes) = 0.3 + 0.1 = 0.4
In general, P(A) = ∑_B P(A, B); this is the sum rule. The comma denotes a joint probability. Marginalising removes a variable: we remove the dependence on B by summing over its possible values. You can infer a marginal probability from the joint probabilities using this equation.
Joint Probability
For example, P(diabetes, no neuropathy) = 0.5, read directly from the table.
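To make the sum rule concrete, here is a minimal Python sketch (hypothetical, using the ward-population table above) that recovers the marginal P(neuropathy) from the joint probabilities:

# Joint probability table for the hypothetical ward population
joint = {
    ("neuropathy", "diabetes"): 0.3,
    ("neuropathy", "no diabetes"): 0.1,
    ("no neuropathy", "diabetes"): 0.5,
    ("no neuropathy", "no diabetes"): 0.1,
}

# Sum rule: marginalise out diabetes status
p_neuropathy = sum(p for (neuro, _), p in joint.items() if neuro == "neuropathy")
print(p_neuropathy)  # 0.4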

Conditional Probability
What is the probability of a patient having neuropathy, given that he has diabetes?
The probability of A given B is P(A|B) = P(A,B) / P(B), i.e. the joint probability of A and B divided by the marginal probability of B.

Conditional Probability
This also works in reverse: P(B|A) = P(A,B) / P(A).
So the joint probability can be expressed either way (the product rule):
P(A,B) = P(A|B) * P(B)
P(A,B) = P(B|A) * P(A)
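Continuing the same hypothetical sketch in Python, the conditional probability asked for above is just the joint divided by the marginal:

# P(neuropathy | diabetes) = P(neuropathy, diabetes) / P(diabetes)
p_joint = 0.3                   # P(neuropathy, diabetes) from the ward table
p_diabetes = 0.3 + 0.5          # marginal: sum over neuropathy status
p_neuro_given_diab = p_joint / p_diabetes
print(round(p_neuro_given_diab, 3))  # 0.375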

Bayes' Equation
P(A|B) = P(B|A) * P(A) / P(B)
Posterior = Likelihood * Prior / Denominator (the marginal probability of the data)
First, prior knowledge is quantified in the form of a prior probability distribution for every parameter. Prior knowledge can come from expert elicitation, from published studies or from other sources (e.g. textbooks). Second, the prior is updated by how likely the raw data are, called the likelihood: given a value of the parameter, how likely are the observed data? This new knowledge is again expressed as a probability distribution. Third, updating the prior with the likelihood yields the posterior probability distribution, which is our final probabilistic estimate for the parameters of interest. The posterior expresses our updated beliefs.

The posterior probability distribution combines the prior and the likelihood using Bayes' formula. When we do not have much data, the posterior is dominated by the prior. In contrast, when we have a lot of new data, the data take the leading role in shaping the posterior.
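As an illustrative sketch (not from the original slides; the prior and data here are made up), a simple grid approximation for a coin's heads probability θ shows the prior dominating when there is little data and the likelihood dominating when there is a lot:

import numpy as np

theta = np.linspace(0.01, 0.99, 99)            # grid of candidate values for theta
prior = np.exp(-(theta - 0.5) ** 2 / 0.005)    # prior belief peaked at a fair coin
prior /= prior.sum()

def posterior(n_heads, n_flips):
    # Binomial likelihood of the data for each theta (constant factor dropped)
    likelihood = theta ** n_heads * (1 - theta) ** (n_flips - n_heads)
    post = prior * likelihood
    return post / post.sum()

# 4 heads in 5 flips: the posterior mode stays close to the prior mean of 0.5
print(theta[np.argmax(posterior(4, 5))])
# 800 heads in 1000 flips: the data dominate and the mode moves towards 0.8
print(theta[np.argmax(posterior(800, 1000))])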

Example 1
10% of patients in a clinic have liver disease. Five percent of the clinic's patients are alcoholics. Amongst those patients diagnosed with liver disease, 7% are alcoholics. You want to know the probability of a patient having liver disease, given that he is an alcoholic.
P(A) = probability of liver disease = 0.10
P(B) = probability of alcoholism = 0.05
P(B|A) = probability of alcoholism given liver disease = 0.07
P(A|B) = ?
P(A|B) = P(B|A) * P(A) / P(B) = (0.07 * 0.10) / 0.05 = 0.14
In other words, if the patient is an alcoholic, the chance of liver disease is 0.14 (14%).
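The same arithmetic as a quick Python check (a sketch of the calculation above):

p_liver = 0.10                 # P(A): liver disease
p_alcoholic = 0.05             # P(B): alcoholism
p_alcoholic_given_liver = 0.07 # P(B|A)

# Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)
p_liver_given_alcoholic = p_alcoholic_given_liver * p_liver / p_alcoholic
print(round(p_liver_given_alcoholic, 2))  # 0.14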

Example 2
A disease occurs in 0.5% of the population. A diagnostic test gives a positive result in 99% of people with the disease and in 5% of people without the disease (false positives). A person receives a positive result. What is the probability that they have the disease, given the positive result?

P(disease | positive test) = P(positive test | disease) x P(disease) / P(positive test)
We know:
P(positive test | disease) = 0.99
P(disease) = 0.005
P(positive test) = ?

In this case you calculate the denominator using the law of total probability:
P(B) = P(B|A) * P(A) + P(B|~A) * P(~A) = (0.99 * 0.005) + (0.05 * 0.995) ≈ 0.055
Where:
P(A) = probability of disease
P(~A) = probability of not having the disease; remember P(~A) = 1 - P(A)
P(B|A) = probability of a positive test given that the disease is present
P(B|~A) = probability of a positive test given that the disease is not present

Therefore:
P(disease | positive test) = (0.99 x 0.005) / 0.055 ≈ 0.09, i.e. about 9%
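Putting Example 2 together in a short Python sketch (the denominator comes from the law of total probability; the slide rounds it to 0.055):

p_disease = 0.005
p_pos_given_disease = 0.99
p_pos_given_healthy = 0.05

# Law of total probability for the denominator P(positive test)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Bayes' rule for the posterior
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_pos, 4))               # 0.0547
print(round(p_disease_given_pos, 2)) # 0.09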

Bayesian Statistics
Provides a dynamic model in which our belief is constantly updated as we add more data.
The ultimate goal is to calculate the posterior probability density, which is proportional to the likelihood (how probable our data are given the parameters) multiplied by our prior knowledge.
Can be used as a model of the brain (the Bayesian brain), of history and of human behaviour.

Frequentist vs. Bayesian statistics

Frequentist models in practice
Model: y = Xθ + ε
The data, X, are a random variable, while the parameters, θ, are fixed. We assume there is a true set of parameters, a true model of the world, and we are concerned with getting the best possible estimate of it. We are interested in point estimates of the parameters given the data.

Bayesian models in practice
Model: y = Xθ + ε
The data, X, are fixed, while the parameters, θ, are considered random variables. There is no single set of parameters that denotes the true model of the world; instead, parameter values are more or less probable. We are interested in the distribution of the parameters given the data.

Bayes' rule in a slightly different form:
P(θ|X) = P(X|θ) x P(θ) / P(X)
Posterior = Likelihood x Prior / Marginal likelihood
The posterior tells us how good our parameters are given the data; prior knowledge is incorporated and used to update our beliefs about the parameters.

Coin flipping model
Someone flips a coin. We don't know whether the coin is fair or not; we are told only the outcome of each flip.

Coin flipping model
1st hypothesis: the coin is fair, 50% heads or tails.
2nd hypothesis: both sides of the coin are heads, 100% heads.

Coin flipping model
1st hypothesis: the coin is fair, 50% heads or tails.
2nd hypothesis: both sides of the coin are heads, 100% heads.
Priors: P(A = fair coin) = 0.99, P(~A = unfair coin) = 0.01

Coin flipping model
The coin is flipped once and comes up heads. Update each hypothesis with Bayes' rule:
P(heads) = P(heads|fair) x P(fair) + P(heads|unfair) x P(unfair) = 0.5 x 0.99 + 1 x 0.01 = 0.505
P(fair|heads) = (0.5 x 0.99) / 0.505 ≈ 0.98
P(unfair|heads) = (1 x 0.01) / 0.505 ≈ 0.02

Coin flipping model
The coin is flipped a second time and it is heads again. The posterior from the previous step becomes the new prior!

Coin flipping model
Using the new prior P(fair) ≈ 0.98, P(unfair) ≈ 0.02:
P(heads) = 0.5 x 0.98 + 1 x 0.02 = 0.51
P(fair|heads) = (0.5 x 0.98) / 0.51 ≈ 0.96
P(unfair|heads) = (1 x 0.02) / 0.51 ≈ 0.04
With every observed head, the two-headed hypothesis gains probability.
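A hedged Python sketch of this sequential updating, using the priors and hypotheses from the slides (each posterior becomes the prior for the next flip):

p_fair, p_two_headed = 0.99, 0.01        # priors for the two hypotheses

def update(p_fair, p_two_headed, outcome):
    # Likelihood of the observed outcome under each hypothesis
    like_fair = 0.5                                   # fair coin
    like_two_headed = 1.0 if outcome == "H" else 0.0  # two-headed coin
    evidence = like_fair * p_fair + like_two_headed * p_two_headed
    return like_fair * p_fair / evidence, like_two_headed * p_two_headed / evidence

for flip in ["H", "H"]:                  # two heads in a row, as in the slides
    p_fair, p_two_headed = update(p_fair, p_two_headed, flip)
    print(round(p_fair, 3), round(p_two_headed, 3))
# prints approximately: 0.98 0.02, then 0.961 0.039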

Hypothesis testing
Classical inference: define the null hypothesis H0: the coin is fair, θ = 0.5, and try to reject it.
Bayesian inference: define a hypothesis directly, e.g. H: θ > 0.1, and compute its posterior probability.

Model Selection

Model Selection
Marginal likelihood: P(X|m), the probability of the data under model m, obtained by averaging the likelihood over the model's parameters (weighted by the prior).
Bayes Factor: the ratio of marginal likelihoods for two models, P(X|m1) / P(X|m2), which measures how strongly the data favour one model over the other.
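For the coin example from the earlier slides the two models have no free parameters, so each marginal likelihood is just the likelihood of the data and the Bayes factor is their ratio. A Python sketch with a made-up data set of five heads in a row:

n_heads = 5                      # hypothetical data: five heads, no tails
ml_fair = 0.5 ** n_heads         # marginal likelihood under the fair-coin model
ml_two_headed = 1.0 ** n_heads   # marginal likelihood under the two-headed model

bayes_factor = ml_two_headed / ml_fair
print(bayes_factor)              # 32.0 -> the data favour the two-headed coin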

Model Selection

Bayesian Models of Cognition

Multi-modal sensory integration
How wide is the pen? A point estimate says "the pen is 8 mm wide"; a probabilistic estimate says "there is a 95% chance that the pen is between 7.5 and 8.49 mm wide".
A probability density function (PDF) represents both the average estimate of the quantity itself and the confidence (precision) in that estimate.
O'Reilly et al., EJN 2012;35:1169-72

Multi-modal sensory integration
Humans show near-Bayesian behaviour in multi-sensory integration tasks, although there is a non-optimal bias to give more weight to one sensory modality (e.g. vision) than another (e.g. proprioception).
Van Beers et al., Exp Brain Res 1999;125:43-9

Multi-modal sensory integration
P(width | touch, vision) ∝ P(touch, vision | width) * P(width)
The posterior estimate is biased towards the prior mean. The prior increases accuracy, which is useful given the uncertainty of the observations.
[Figure: prior centred near 5 mm, observation near 7 mm, posterior in between; x-axis: Width (mm).]
O'Reilly et al., EJN 2012;35:1169-72
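If both the prior over the pen's width and the sensory observation are Gaussian, the posterior is Gaussian with a precision-weighted mean, which is why it sits between the prior and the observation. A Python sketch with made-up numbers in the spirit of the figure:

# Prior belief about the width (mm) and a noisy observation
prior_mean, prior_var = 5.0, 1.0
obs_mean, obs_var = 7.0, 0.5

# Precisions (inverse variances) add; the posterior mean is precision-weighted
post_precision = 1 / prior_var + 1 / obs_var
post_mean = (prior_mean / prior_var + obs_mean / obs_var) / post_precision
post_var = 1 / post_precision

print(round(post_mean, 2), round(post_var, 2))  # 6.33 0.33 (pulled towards the prior)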

Multi-modal sensory integration
The Müller-Lyer illusion: priors could be acquired through long experience with the environment, while some other priors seem to be innate.

References
Previous MfD slides
Bayesian statistics: a comprehensive course (video tutorials): https://www.youtube.com/watch?v=U1HbB0ATZ_A&index=1&list=PLFDbGp5YzjqXQ4oE4w9GVWdiokWB9gEpm
Bayesian statistics (a very brief introduction), Ken Rice
http://www.statisticshowto.com/bayes-theorem-problems/