
FIXETH LIKELIHOODS this is correct

Bayesian methods I: theory

Road map
- Bayes' equations
- Attempt to explain the equations
- Bayesian = priors; maximum likelihood = no priors
- Bayesian = integration; maximum likelihood = maximization
- How to: grid method
- How to: SIR (one variety)
- How to: MCMC (one variety)

Not even enough to be dangerous
- In three lectures I can't fully explain the philosophy, pitfalls, or methods
- I'm a practitioner, not a statistician
- I can give you a flavor for Bayesian methods
- I can remove your fear factor (hopefully not increase it!) when someone shows you a Bayesian analysis
- I can show you how to find Bayesian posteriors using the grid, MCMC, and SIR algorithms

What I want you to learn
- I want you to understand the concepts of likelihood, prior, and posterior
- The theory and philosophy are hard to grasp, and I don't expect everyone to master them
- I do expect everyone to understand how to program the SIR and MCMC algorithms and find Bayesian posteriors (it's easier than understanding the theory!)

Key fisheries references
- Punt AE & Hilborn R (1997) Fisheries stock assessment and decision analysis: the Bayesian approach. Reviews in Fish Biology and Fisheries 7:35-63
- Hilborn R & Mangel M (1997) The ecological detective: confronting models with data. Princeton University Press, Princeton, New Jersey. Chapters 9-10
- McAllister MK, Pikitch EK, Punt AE & Hilborn R (1994) A Bayesian approach to stock assessment and harvest decisions using the Sampling/Importance Resampling algorithm. CJFAS 51:

Probability: If I flip a fair coin 10 times, what is the probability of it landing heads up every time? Given the fixed parameter (p = 0.5), what is the probability of different outcomes? Probabilities add up to 1.

Likelihood: I flipped a coin 10 times and obtained 10 heads. What is the likelihood that the coin is fair? Given the fixed outcomes (data), what is the likelihood of different parameter values? Likelihoods do not add up to 1. Hypotheses (parameter values) are compared using likelihood values (higher = better). (FISH 458 background)
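The probability/likelihood distinction for the coin example can be sketched numerically (the particular set of alternative p values tried below is just an illustrative choice):

```python
from math import comb

# Probability: fix the parameter p = 0.5, ask about all possible outcomes.
n, p = 10, 0.5
probs = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
print(sum(probs))   # probabilities over all outcomes sum to 1
print(probs[10])    # P(10 heads | p = 0.5) = 0.5**10, about 0.001

# Likelihood: fix the data (10 heads in 10 flips), ask about parameter values.
likelihoods = {q: q**10 for q in (0.3, 0.5, 0.7, 0.9, 1.0)}
print(likelihoods)  # higher likelihood = better-supported parameter value
print(sum(likelihoods.values()))  # does NOT sum to 1
```

Note that p = 1.0 has the highest likelihood given 10 heads, even though p = 0.5 is still possible: likelihoods rank hypotheses, they are not probabilities of hypotheses.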

Probability: What is the probability that 5 ≤ x ≤ 10 given a normal distribution with µ = 13 and σ = 4? Answer: ≈0.204 (the area under the curve between 5 and 10). What is the probability that –1000 ≤ x ≤ 1000 given a normal distribution with µ = 13 and σ = 4? Answer: ≈1.

Likelihood: What is the likelihood that µ = 13 and σ = 4 if you observed a value of (a) x = 10 (answer: the likelihood is 0.075, the height of the curve at x = 10); (b) x = 14 (answer: the likelihood is 0.097, the height of the curve at x = 14)?

Conclusion: if the observed value was 14, it is more likely that the parameters are µ = 13 and σ = 4 than if the observed value was 10, because 0.097 is higher than 0.075.
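These numbers can be checked directly: probability is the normal CDF evaluated between two points, likelihood is the normal density height at the observation. A minimal sketch using only the standard library:

```python
from math import erf, exp, pi, sqrt

def norm_cdf(x, mu, sigma):
    """Cumulative probability P(X <= x) for a normal distribution."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def norm_pdf(x, mu, sigma):
    """Density height at x: the likelihood of (mu, sigma) given observation x."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

mu, sigma = 13, 4
# Probability = area under the curve between two values
print(norm_cdf(10, mu, sigma) - norm_cdf(5, mu, sigma))        # ~0.204
print(norm_cdf(1000, mu, sigma) - norm_cdf(-1000, mu, sigma))  # ~1.0
# Likelihood = height of the curve at the observed value
print(norm_pdf(10, mu, sigma))   # ~0.075
print(norm_pdf(14, mu, sigma))   # ~0.097
```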

458: we've focused on likelihoods
If hypothesis H_i is true, what is the likelihood of observing particular types of data? The H_i are different values of the model parameters.
- Maximum likelihood: which hypothesis H_i would result in the maximum likelihood?
- AICc: which hypothesis is the best?
- AICc weight: how much weight should be given to each competing hypothesis?

Some philosophy: What is a 95% confidence interval? Up until now we have answered this within a frequentist philosophy: there is some fixed true value of the parameters that does not move around (e.g. µ = 13). We collect some data and estimate a 95% confidence interval. Meaning: if we were to repeat our experiment an infinite number of times, each time collecting some data and calculating a 95% confidence interval, then the true value of µ would fall within the interval 95% of the time.

What people think 95% CI means: Most people think that a particular calculated 95% CI has a 95% probability of including the true mean µ. This is not true: a particular 95% CI either includes µ (100% probability) or does not include µ (0% probability). The concept only makes sense when you think about repeating your experiment an infinite number of times; this is the frequentist paradigm. In reality we only have a single data set and a single calculated 95% CI.
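The repeated-experiment interpretation can be demonstrated by simulation. This sketch assumes σ is known (so the z-interval applies); the trial count and seed are arbitrary choices:

```python
import random
import statistics

# Repeat the "experiment" many times; roughly 95% of the computed
# intervals should contain the true mean mu = 13.
random.seed(458)
mu, sigma, n = 13, 4, 25
trials = 2000
covered = 0
for _ in range(trials):
    data = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.mean(data)
    half = 1.96 * sigma / n ** 0.5   # known-sigma 95% z-interval half-width
    if xbar - half <= mu <= xbar + half:
        covered += 1
print(covered / trials)  # close to 0.95
```

Any single interval from this loop either covers µ or it does not; the 95% is a property of the procedure across repetitions, not of one interval.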

What does p < 0.05 mean? This concept is also frequentist. If the true values of the parameters were µ = 0 and σ = 1 (normal distribution), and I repeated my experiment an infinite number of times, then the mean of my data would be greater than 1.96 or smaller than –1.96 no more than 5% of the time. Q: If I run the experiment once and obtain a mean of 2.5, is µ = 0? A: the p-value is <0.05, so we conclude µ is not 0.
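The p-value for the slide's example (observed mean 2.5 under µ = 0, σ = 1) works out as follows:

```python
from math import erf, sqrt

def std_norm_cdf(z):
    """Standard normal cumulative probability."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Two-sided p-value for observing a mean of 2.5 when H0: mu = 0, sigma = 1
z = 2.5
p_value = 2 * (1 - std_norm_cdf(z))
print(p_value)  # about 0.012, below the 0.05 threshold
```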

Why do we not trust the detector? We are not frequentists; we are Bayesians. We have some prior knowledge about the sun: it is 4.6 billion years old, so the probability of it exploding on any particular night is less than 1/(365 × 4.6 billion) = 5.9×10⁻¹³. Instinctively we weighed this very small probability against the much higher probability that the neutrino detector lied.

The Bayesian paradigm: What is the probability that the sun exploded? How can we combine our prior knowledge with the new data and likelihood to estimate the posterior probability that the sun exploded? The parameter values are not viewed as fixed, so we can estimate the probability of a parameter taking a particular value.

Bayes' Theorem
What is the posterior probability that hypothesis H_i is correct? This is really what we want to know.

    P(H_i|data) = L(data|H_i) × prior(H_i) / P(data)

(posterior = likelihood × prior ÷ "some horrible term")

Bayes' Theorem: another form

    P(H_i|data) = L(data|H_i) × prior(H_i) / Σ_j [L(data|H_j) × prior(H_j)]

(posterior = likelihood × prior ÷ the sum over all hypotheses of {likelihood of each hypothesis given the data × prior information})
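For a discrete set of hypotheses this form is directly computable. A small sketch with made-up priors and likelihood values (chosen only for illustration):

```python
# Three hypotheses with hypothetical priors and likelihoods.
priors = [0.5, 0.3, 0.2]           # prior(H_i); these sum to 1
likelihoods = [0.01, 0.10, 0.30]   # L(data | H_i); need not sum to 1

# Denominator: sum over all hypotheses of likelihood x prior
normaliser = sum(L * p for L, p in zip(likelihoods, priors))
posterior = [L * p / normaliser for L, p in zip(likelihoods, priors)]
print(posterior)       # posterior probabilities
print(sum(posterior))  # sums to 1
```

Here H_3 ends up with the highest posterior even though it had the lowest prior, because its likelihood dominates.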

A digression on notation

Bayes' Theorem (probabilities)
What is the posterior probability that hypothesis H_i is correct?

    P(H_i|data) = P(data|H_i) × P(H_i) / P(data)

(posterior probability = probability of the data given H_i × prior probability ÷ probability of observing the data)

Bayes' Theorem (likelihoods)

    P(H_i|data) = L(data|H_i) × prior(H_i) / P(data)

(posterior = likelihood × prior ÷ probability of observing the data)

The likelihood of H_i given the observed data equals the probability of the data given known values of H_i: L(data|H_i) = P(data|H_i), because both are the same height on the same probability distribution. Therefore the two forms of Bayes' theorem are equivalent.

Two hypotheses: H_1 (the sun exploded) and H_2 (the sun did not explode).
Prior(H_1) = 5.9×10⁻¹³, Prior(H_2) = 1 – 5.9×10⁻¹³
P(YES|H_1) = 35/36 = 0.972
P(YES|H_2) = 1/36 = 0.028
P(H_1|YES) = 2.1×10⁻¹¹ (Bayesian answer: extremely low probability that the sun exploded)
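The calculation above, worked through numerically:

```python
# Sun-explosion example: Bayes' theorem with two hypotheses.
prior_H1 = 5.9e-13        # prior(sun exploded)
prior_H2 = 1 - prior_H1   # prior(sun did not explode)
p_yes_H1 = 35 / 36        # detector says YES and is not lying
p_yes_H2 = 1 / 36         # detector lies (both dice came up six)

posterior_H1 = (p_yes_H1 * prior_H1) / (
    p_yes_H1 * prior_H1 + p_yes_H2 * prior_H2)
print(posterior_H1)  # about 2.1e-11: still an extremely low probability
```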

Updated Bayesian beliefs
- The prior probability of the sun exploding was 5.9×10⁻¹³
- The posterior probability of the sun exploding is 2.1×10⁻¹¹
- The Bayesian statistician still believes H_1 (the sun exploded) is extremely unlikely, but his belief in it is slightly higher than it was before the experiment
- Given repeated questioning, if the machine keeps on saying YES, the Bayesian will eventually believe the sun has exploded
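Repeated questioning can be simulated by feeding each posterior back in as the next prior (each YES multiplies the odds of H_1 by 35, so belief eventually flips):

```python
# Repeated Bayesian updating: each YES answer turns the current
# posterior into the prior for the next question.
belief = 5.9e-13                      # initial prior that the sun exploded
p_yes_H1, p_yes_H2 = 35 / 36, 1 / 36  # same detector probabilities as before

for i in range(1, 11):
    belief = (p_yes_H1 * belief) / (
        p_yes_H1 * belief + p_yes_H2 * (1 - belief))
    print(i, belief)
# After about 8 YES answers the posterior crosses 0.5, and it keeps rising.
```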

Prosecutor's fallacy

Likelihood framework: A murder is committed; DNA found at the scene is compared with a DNA database of people and a match is found. If the matched person is innocent, the chance of the match is 1 in 3 million. This is L(data|H_i), the likelihood of a match given that the person is innocent. Maximum likelihood conclusion: the matched person is guilty.

Bayesian framework: the DNA database includes 10 million people. The prior probability, prior(H_1), of any one person in the database being innocent is very high: (10⁷ – 1)/10⁷. We would expect 3-4 matches in the database purely by chance, so it is very likely that some poor innocent person will be matched to the DNA evidence! P(data) is the probability of finding a match in the database. The posterior probability P(H_1|data) of the matched person being innocent given the DNA match is 0.77: there is a 77% chance they are innocent!

If the DNA were tested only against the 10 people in the neighborhood, and there was a match, the prior probability of innocence would be 9/10, much lower.
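The 77% figure can be reproduced with the slide's numbers, assuming the one guilty person is in the database and false matches occur at the stated 1-in-3-million rate:

```python
# Prosecutor's fallacy: expected chance matches in a large database.
db_size = 10_000_000
false_match_rate = 1 / 3_000_000

# Innocent people in the database who match purely by chance
expected_false_matches = (db_size - 1) * false_match_rate
print(expected_false_matches)  # about 3.3 expected innocent matches

# A given match is either the one guilty person or one of the chance matches
p_guilty = 1 / (1 + expected_false_matches)
p_innocent = 1 - p_guilty
print(p_innocent)  # about 0.77
```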

Complications
P(data) is the probability of the data: the probability of there being a match (either the murderer is in the database, or there is a false match with an innocent person). In this case the calculation is "straightforward". For most problems we face, it will be very hard or impossible to calculate P(data) directly. Instead of calculating P(data), we can circumvent this calculation by finding the posterior P(H_i|data) numerically. Algorithms for finding the posterior include grid, MCMC, and SIR.

    P(H_i|data) = L(data|H_i) × prior(H_i) / P(data)

(posterior = likelihood × prior ÷ probability of the data)

What is hypothesis H_i? Imagine a grid of parameters r and K, where r = 0, 0.05, 0.1 and K = 1000, 2000, 3000. There are 3×3 = 9 possible hypotheses to evaluate. Now increase the number of values of r and K that are evaluated. With "infinite" points the "hypotheses" become a continuous range of all possible values of r and K. We want the posterior probability of every value of r and K.
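The grid idea can be sketched as follows. The likelihood surface here is a made-up stand-in (a bump peaked near r = 0.05, K = 2000); in a real assessment it would come from fitting the population model to data. Normalising over the grid replaces computing P(data) directly:

```python
import math

def likelihood(r, K):
    # Hypothetical stand-in for L(data | r, K), peaked near r = 0.05, K = 2000
    return math.exp(-((r - 0.05) / 0.02) ** 2 - ((K - 2000) / 500) ** 2)

def prior(r, K):
    return 1.0  # uniform prior over the grid

r_values = [i * 0.005 for i in range(1, 41)]  # r = 0.005 ... 0.200
K_values = [100 * j for j in range(5, 51)]    # K = 500 ... 5000

# Numerator of Bayes' theorem at every grid point...
unnorm = {(r, K): likelihood(r, K) * prior(r, K)
          for r in r_values for K in K_values}
# ...then normalise so the posterior sums to 1 over the grid
total = sum(unnorm.values())
posterior = {pars: v / total for pars, v in unnorm.items()}

best = max(posterior, key=posterior.get)
print(best)  # the grid point with the highest posterior probability
```

Refining the grid approaches the continuous posterior over (r, K); the cost grows as the product of the number of values per parameter, which is why SIR and MCMC replace the grid in higher dimensions.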