A correction on notation (Thanks Emma and Melissa)

Bayes’ Theorem (probabilities) What is the posterior probability that hypothesis $H_i$ is correct? This is the classical Bayes’ Theorem:

$$P(H_i \mid \text{data}) = \frac{P(\text{data} \mid H_i)\,P(H_i)}{P(\text{data})}$$

where $P(H_i \mid \text{data})$ is the posterior probability, $P(H_i)$ is the prior probability, and $P(\text{data})$ is the probability of observing the data.

Bayes’ Theorem (likelihoods) The likelihood of $H_i$ given the observed data equals the probability of the data given known values of $H_i$:

$$L(H_i \mid \text{data}) = P(\text{data} \mid H_i)$$

because this is the same height on the same probability distribution. Therefore Bayes’ Theorem can also be written with the likelihood in place of $P(\text{data} \mid H_i)$:

$$P(H_i \mid \text{data}) = \frac{L(H_i \mid \text{data})\,P(H_i)}{P(\text{data})}$$

so the posterior is the likelihood times the prior, divided by the probability of observing the data.

Bayesian methods II

Bayes in words If you observe some data, you can calculate the posterior probability of a particular hypothesis $H_i$ as a function of the likelihood of that hypothesis given the observed data, combined with the prior information you have about how probable that hypothesis is.

Application of Bayesian methods Gmail uses Bayesian spam filtering:
– Every user has unique prior probabilities that each word is a spam word
– These are calculated from your actions in replying to a message, saving it, or marking it as spam
– E.g. for me the prior probability of spam for different words is probably something like this: “fisheries”, “fish458”, “blue whales”, “ONE HUNDRED MILLION DOLLARS”, 0.1 “journal”, 0.9 “CONGRATULATION!!!”, 0.7 “to formally invite you”
– Take all the prior probabilities for all the words in a message, combine them with the likelihood that each word is spam, and each message gets a posterior probability that the whole message is spam
Spam that sneaks through is likely to be something like “Free blue whale cruise for university professors teaching quantitative modeling, click here!”
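As a rough illustration of how those per-word probabilities combine, here is a minimal naive-Bayes sketch in Python. The function name, the example priors, and the 0.01/0.99 clamping are assumptions for illustration, not Gmail’s actual filter.

```python
import math

def spam_posterior(word_priors, words):
    """Combine per-word spam probabilities into a posterior probability
    that the whole message is spam, assuming words are independent
    (naive Bayes). Unknown words are treated as neutral (0.5)."""
    log_spam = 0.0   # log of the product of P(spam | word)
    log_ham = 0.0    # log of the product of 1 - P(spam | word)
    for w in words:
        p = word_priors.get(w, 0.5)
        p = min(max(p, 0.01), 0.99)  # keep extreme words from dominating
        log_spam += math.log(p)
        log_ham += math.log(1.0 - p)
    # posterior = prod(p) / (prod(p) + prod(1 - p)), computed in log space
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))

# Illustrative priors in the spirit of the slide's examples
priors = {"fisheries": 0.01, "journal": 0.1, "CONGRATULATION!!!": 0.9}
print(spam_posterior(priors, ["fisheries", "journal", "CONGRATULATION!!!"]))
```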

Recovering Antarctic blue whales? The largest animal ever to live. Whaling lasted from 1904–1973, with catches peaking at 30,365 in 1930. Given current abundance, are they recovering? Branch TA (2007) Abundance of Antarctic blue whales south of 60°S from three complete circumpolar sets of surveys. Journal of Cetacean Research and Management 9.

Why Bayesian? A prior on r allows use of outside information:
– What is the fastest they could increase?
– How fast do other whale populations increase?
It also lets us ask: what is the probability of competing hypotheses (i.e. that r > 0)?

Model and data
– Model runs from 1973 onward, with parameters $N_{1973}$ and $r$, the exponential rate of increase: $N_t = N_{1973} e^{r(t-1973)}$
– No catches (whaling ended in 1972)
– Three abundance estimates, each with a known CV
– Lognormal likelihood
– No process error, no additional CV
Branch TA et al. (2004) Evidence for increases in Antarctic blue whales based on Bayesian modelling. Marine Mammal Science 20. Branch TA (2007) Abundance of Antarctic blue whales south of 60°S from three complete circumpolar sets of surveys. Journal of Cetacean Research and Management 9.

Each survey estimate $\hat{N}_t$ contributes one lognormal likelihood term; e.g. for the $N_{1981}$ term, take the log and remove constant terms to get

$$\mathrm{NLL}_{1981} = \frac{\left(\ln \hat{N}_{1981} - \ln N_{1981}\right)^2}{2\sigma_{1981}^2}$$

where $\sigma_t$ comes from the known CV. We will be running the model and calculating the likelihood thousands or even millions of times, so simplifying it really helps.

Total NLL for all data years:

$$\mathrm{NLL} = \sum_t \frac{\left(\ln \hat{N}_t - \ln N_t\right)^2}{2\sigma_t^2}$$

summing over the survey years.
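A minimal Python sketch of this total NLL, under the exponential model $N_t = N_{1973} e^{r(t-1973)}$ and with $\sigma_t^2 = \ln(1 + CV_t^2)$, a standard conversion from a CV to a lognormal variance. The survey years, estimates, and CVs below are placeholders, not the actual circumpolar values.

```python
import math

# Placeholder survey data: year -> (abundance estimate, CV).
surveys = {1981: (450, 0.40), 1988: (560, 0.45), 1998: (2280, 0.36)}

def total_nll(r, n1973):
    """Total lognormal NLL over all survey years, constants dropped:
    sum of (ln N_hat_t - ln N_t)^2 / (2 sigma_t^2)."""
    nll = 0.0
    for year, (n_hat, cv) in surveys.items():
        n_pred = n1973 * math.exp(r * (year - 1973))  # model prediction
        sigma2 = math.log(1.0 + cv * cv)              # lognormal variance from CV
        nll += (math.log(n_hat) - math.log(n_pred)) ** 2 / (2.0 * sigma2)
    return nll

print(total_nll(0.072, 178))  # NLL at one candidate parameter pair
```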

Maximum likelihood estimates MLE: $N_{1973} = 178$, with the corresponding r and NLL values in Antarctic blue grid.xlsx. [Figure: fitted abundance trajectory, Year vs. Abundance]

Grid method (Antarctic blue grid.xlsx)

Grid method for likelihoods
– Each grid cell is the likelihood for one pair of values of r and $N_{1973}$
– Dividing by the highest likelihood gives the scaled likelihood for each cell
– To get an approximate likelihood profile on r, find the highest scaled likelihood for each value of r (see the sketch below)
Antarctic blue grid.xlsx, sheet “many cells”
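A sketch of that grid calculation in Python, reusing total_nll from the earlier sketch; the grid ranges below are illustrative (the spreadsheet uses a 301 × 200 grid).

```python
import numpy as np

r_values = np.linspace(-0.1, 0.2, 301)   # candidate values of r
n_values = np.linspace(50, 1000, 200)    # candidate values of N_1973

# NLL, then likelihood, in each grid cell (rows = r, columns = N_1973)
nll = np.array([[total_nll(r, n) for n in n_values] for r in r_values])
likelihood = np.exp(-nll)

# Divide by the highest likelihood to get the scaled likelihood per cell
scaled = likelihood / likelihood.max()

# Approximate profile on r: highest scaled likelihood in each row
r_profile = scaled.max(axis=1)
```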

Likelihoods: grid surface Each cell in the grid is the scaled likelihood for (a large number of) discrete values of r and $N_{1973}$. The line through the surface follows the discrete $N_{1973}$ producing the highest likelihood for each r; the approximate likelihood profile lies on this line, peaking at the MLE, with scaled likelihood close to zero far from it. [Figure: scaled-likelihood grid surface with the profile line marked; Antarctic blue grid.xlsx, sheet “many cells”]

Grid likelihood: r profile The grid gives only an approximate likelihood profile because for each value of r we are taking the lowest NLL among a finite number of $N_{1973}$ values. As the number of $N_{1973}$ values considered increases, the approximate profile gets smoother and smoother. The true profile would, for each value of r, find the exact $N_{1973}$ that minimizes the NLL. [Figure: scaled likelihood vs. value of r; Antarctic blue grid.xlsx, sheet “many cells”]

Grid likelihood: r profile A small number of discrete r and $N_{1973}$ values leads to a poor approximation of the r profile. [Figure: scaled likelihood vs. value of r; Antarctic blue grid.xlsx, sheet “few cells”]

Grid method for posterior Cells contain likelihood × prior for each value of r and $N_{1973}$. Each cell is a hypothesis $H_i$, and the posterior probability of $H_i$ is its cell value divided by the sum of all cells:

$$P(H_i \mid \text{data}) = \frac{L(H_i \mid \text{data})\,P(H_i)}{\sum_j L(H_j \mid \text{data})\,P(H_j)}$$

To get the marginal posterior for a value of r, take the sum of each column divided by the sum of all cells. This is integration: each cell holds the posterior probability of an individual pair of r and $N_{1973}$ values. Antarctic blue grid.xlsx, sheet “many cells”
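Continuing the grid sketch: to turn the same grid into a posterior, multiply each cell by the prior and normalize by the sum of all cells. The uniform prior here is just a placeholder.

```python
prior = np.ones_like(likelihood)   # placeholder: uniform prior over the grid
cells = likelihood * prior         # likelihood x prior in each cell
posterior = cells / cells.sum()    # divide each cell by the sum of all cells

# Marginal posterior for r: sum each row across all N_1973 values
# (integration, not maximization)
r_marginal = posterior.sum(axis=1)
```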

Bayesian: two differences
– Integration instead of maximization
– Priors

Integration not maximization Likelihood: for each value of r, search for the $N_{1973}$ with the best NLL (the maximization line). Bayesian: for each value of r, integrate (“add up”) cells across all values of $N_{1973}$ (the integration column). Where the high-likelihood (green/yellow) area is very narrow, Bayesian integration will give a smaller summed probability than the maximum value used in a likelihood profile. [Figure: grid with maximization line and integration column marked; Antarctic blue grid.xlsx, sheet “many cells”]

Likelihood profile vs. Bayesian posterior [Figure: scaled likelihood profile and posterior distribution vs. value of r; Antarctic blue grid.xlsx, sheet “many cells”]

Normal prior on r: N[0.062, 0.029²] Punt & Allison (2010) looked at actual increase rates in depleted whale populations and found a mean of 6.2% and SD of 2.9%. Multiply the likelihood by a prior for r that is normal with mean 0.062 and SD 0.029. Dropping constants, the prior adds

$$\frac{(r - 0.062)^2}{2 \times 0.029^2}$$

to the total NLL. [Figure: posterior with no prior vs. normal prior] Punt AE & Allison C (2010) Appendix 2. Revised outcomes from the Bayesian meta-analysis, Annex D: Report of the sub-committee on the revised management procedure. Journal of Cetacean Research and Management (Suppl. 2) 11. Antarctic blue grid.xlsx, sheet “many cells”

Uniform prior on r Zerbini et al. (2010) showed it is impossible for baleen whales to increase at more than 11.8% per year. Multiply the likelihood by the prior: the prior is 0 for r above 0.118 (and below the lower bound of the grid), and a uniform constant for r values in between, where the constant is 1 divided by the width of the allowed range. Likelihood × prior is therefore zero wherever the prior is zero. [Figure: posterior with the zero-likelihood region above r = 0.118 marked] Zerbini AN et al. (2010) Assessing plausible rates of population growth in humpback whales from life-history data. Marine Biology 157. Antarctic blue grid.xlsx, sheet “many cells”

Effect of different priors [Figure: posterior probability vs. value of r for each prior; Antarctic blue grid.xlsx, sheet “compare all priors”]

Likelihood profile vs. Bayesian posterior Three sets of results (values in Antarctic blue grid.xlsx, sheet “many cells”):
– Likelihood profile: MLE estimate and 95% confidence interval
– Bayesian posterior, uniform prior U[-0.1, 0.2]: median and 95% credible interval (integration not maximization)
– Bayesian posterior, prior N(0.062, 0.029²): median and 95% credible interval (integration plus an informative prior)
Likelihood profile 95% confidence interval: “out of 100 experiments, 95 of the calculated intervals will contain the true fixed value of r. Any single interval either contains the value or it does not.” Bayesian 95% credibility interval: “there is a 95% probability that the true value of r is within the interval.”

Grid method With two parameters, we made 301 × 200 = 60,200 complete model calls to get those posteriors, taking about 1 second. With 10 parameters the required number of grid points explodes, and it would take around 30 quadrillion years to calculate them all. We need a more general method. Antarctic blue grid.xlsx, sheet “many cells”

SIR method

Problem with grid method
– You don’t know how fine to make the grid steps; you really want the parameter values to be continuous
– Instead of systematic sampling, the SIR method randomly samples the grid region
– Good guesses (draws) are kept and bad draws are discarded
– When enough draws have been saved from SIR to get a smooth posterior (1,000 or 5,000), stop
Antarctic blue SIR.xlsx, sheet “Normal prior”

SIR: sample-importance resampling
– Find the maximum of likelihood × prior, Y
– Randomly sample pairs of r and $N_{1973}$
– For each pair, calculate X = likelihood × prior
– Accept the pair with probability X/Y, otherwise reject. Note that X/Y = exp([-ln Y] - [-ln X]) = exp(NLL(Y) - NLL(X))
– Accepted pairs are the posterior; repeat until you have sufficient accepted pairs (see the sketch below)
Antarctic blue SIR.xlsx, sheet “Normal prior”
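A minimal sketch of this accept/reject loop, assuming an nll_fn(r, n1973) that returns the NLL of likelihood × prior (e.g. total_nll from earlier plus a prior term) and illustrative sampling ranges.

```python
import math
import random

def sir(n_keep, nll_best, nll_fn):
    """Basic SIR: nll_best is the NLL at the maximum of likelihood x prior
    (Y); each candidate X is accepted with probability
    X/Y = exp(NLL(Y) - NLL(X))."""
    accepted = []
    while len(accepted) < n_keep:
        r = random.uniform(-0.1, 0.2)      # illustrative sampling ranges
        n1973 = random.uniform(50, 1000)
        if random.random() < math.exp(nll_best - nll_fn(r, n1973)):
            accepted.append((r, n1973))    # accepted pairs form the posterior
    return accepted
```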

SIR: accepted, rejected [Figure: accepted and rejected draws plotted as value of r vs. value of $N_{1973}$; Antarctic blue SIR.xlsx, sheet “Normal prior”]

20,000 samples, 296 accepted
– SIR estimates: r = 0.072 with a 95% interval (grid method: 0.072), and $N_{1973}$ = 320 with a 95% interval
– LOTS of rejected function calls (waste)
– Tricks are almost always employed to increase acceptance rates:
– Accept with probability X/Z where Z is smaller than Y; this accepts more draws, and some draws will be duplicated in the posterior (no time now)
– Sample parameter values from the priors and compare ratios of likelihood only (no time now)
Antarctic blue SIR.xlsx, sheet “Normal prior”

SIR threshold to increase acceptance rate
– Choose a threshold Z where Z < the maximum likelihood × prior, Y
– Randomly sample pairs of r and $N_{1973}$
– For each pair, calculate X = likelihood × prior
– If X ≤ Z, accept the pair with probability X/Z
– If X > Z, accept multiple copies of the pair: e.g. if X/Z = 4.6 then save 4 copies with probability 0.4 or 5 copies with probability 0.6, so the expected number of copies is exactly X/Z (see the sketch below)
Antarctic blue SIR.xlsx, sheet “Normal prior”
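The multiple-copies rule is stochastic rounding of the ratio X/Z, so the expected number of saved copies equals X/Z; a minimal sketch:

```python
import math
import random

def copies_to_save(x_over_z):
    """Stochastically round X/Z: e.g. 4.6 saves 4 copies with probability
    0.4 and 5 copies with probability 0.6, so E[copies] = 4.6."""
    whole = math.floor(x_over_z)
    frac = x_over_z - whole
    return whole + (1 if random.random() < frac else 0)
```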

Accepted multiple times, accepted once, rejected [Figure: draws accepted multiple times, accepted once, and rejected, plotted as value of r vs. value of $N_{1973}$; Antarctic blue SIR.xlsx, sheet “Normal prior”]

Advantage of discrete samples Each draw that is saved is a sample from the posterior distribution. We can take these pairs of (r, $N_{1973}$) and project the model into the future for each pair, giving future predictions for the joint values of the parameters. This takes into account the correlations between parameter values.
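A sketch of that projection step, assuming the same exponential model as above; the start year and horizon are illustrative.

```python
import math

def project(draws, start_year=1973, end_year=2030):
    """Project N_t = N_1973 * exp(r * (t - 1973)) for each accepted
    (r, N_1973) pair; each trajectory is one posterior sample of the
    future, preserving the correlation between r and N_1973."""
    years = range(start_year, end_year + 1)
    return [[n1973 * math.exp(r * (y - 1973)) for y in years]
            for r, n1973 in draws]
```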