Bayesian Inference, Review. 4/25/12. Frequentist inference; Bayesian inference; Review. The Bayesian Heresy (pdf). Professor Kari Lock Morgan, Duke University.

Presentation transcript:


To Do: Project 2 Paper (Today, 5pm). Project 2 peer evaluations (Friday, 5pm). FINAL: Monday, 4/30, 9 – 12.

Breast Cancer Screening: 1% of women at age 40 who participate in routine screening have breast cancer. 80% of women with breast cancer get positive mammographies. 9.6% of women without breast cancer get positive mammographies.

[Figure: three bins of balls labeled Everyone, Cancer (C), and Cancer-free (F); red balls are positive results and green balls are negative results.] If we randomly pick a ball from the Cancer bin, it’s more likely to be red/positive. If we randomly pick a ball from the Cancer-free bin, it’s more likely to be green/negative. We randomly pick a ball from the Everyone bin. If the ball is red/positive, is it more likely to be from the Cancer or Cancer-free bin?

Of 100,000 women in the population, 1% (1,000) have cancer and 99% (99,000) are cancer-free. Of the 1,000 with cancer, 80% (800) test positive and 20% (200) test negative. Of the 99,000 cancer-free, 9.6% (9,504) test positive and 90.4% (89,496) test negative. Thus, 800/(800+9,504) = 7.8% of positive results have cancer.
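The counts in the tree above can be reproduced with a few lines of arithmetic. This is a minimal sketch using the rates quoted on the screening slide; the variable names are illustrative and not from the lecture.

```python
# Natural-frequency version of the screening tree (rates taken from the slides).
population = 100_000
prevalence = 0.01             # P(cancer)
sensitivity = 0.80            # P(positive if cancer)
false_positive_rate = 0.096   # P(positive if no cancer)

with_cancer = round(population * prevalence)
cancer_free = population - with_cancer
true_positives = round(with_cancer * sensitivity)
false_positives = round(cancer_free * false_positive_rate)

# Share of all positive results that belong to women who have cancer
share_with_cancer = true_positives / (true_positives + false_positives)

print(with_cancer, cancer_free, true_positives, false_positives)
print(f"{share_with_cancer:.1%}")
```

Working with whole counts rather than probabilities makes it easy to see why most positive results come from the much larger cancer-free group.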

Hypotheses: H0: no cancer; Ha: cancer. Data: positive mammography. p-value = P(statistic as extreme as observed if H0 true) = P(positive mammography if no cancer) = 0.096. The probability of getting a positive mammography just by random chance, if the woman does not have cancer, is 0.096.

Hypotheses: H0: no cancer; Ha: cancer. Data: positive mammography. You don’t really want the p-value; you want the probability that the woman has cancer! You want P(H0 true if data), not P(data if H0 true).

Hypotheses: H0: no cancer; Ha: cancer. Data: positive mammography. Using Bayes Rule: P(Ha true if data) = P(cancer if data) = 0.078. P(H0 true if data) = P(no cancer if data) = 0.922. This tells a very different story than a p-value of 0.096!

Frequentist Inference: Frequentist inference considers what would happen if the data collection process (sampling or experiment) were repeated many times. Probability is considered to be the proportion of times an event would happen if repeated many times. In frequentist inference, we condition on some unknown truth, and find the probability of our data given this unknown truth.

Frequentist Inference: Everything we have done so far in class is based on frequentist inference. A confidence interval is created to capture the truth for a specified proportion of all samples. A p-value is the proportion of times you would get results as extreme as those observed, if the null hypothesis were true.

Bayesian Inference: Bayesian inference does not think about repeated sampling or repeating the experiment, but only what you can tell from your single observed data set. Probability is considered to be the subjective degree of belief in some statement. In Bayesian inference we condition on the data, and find the probability of some unknown parameter, given the data.

Fixed and Random: In frequentist inference, the parameter is considered fixed and the sample statistic is random. In Bayesian inference, the statistic is considered fixed, and the parameter is considered random.

Bayesian Inference: Frequentist: P(data if truth). Bayesian: P(truth if data). How are they connected?

Bayesian Inference: PRIOR Probability, POSTERIOR Probability. Prior probability: probability of a statement being true, before looking at the data. Posterior probability: probability of the statement being true, after updating the prior probability based on the data.

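Breast Cancer: Before getting the positive result from her mammography, the prior probability that the woman has breast cancer is 1%. Given data (the positive mammography), update this probability using Bayes rule: P(cancer if positive) = P(positive if cancer) × P(cancer) / P(positive) = (0.80 × 0.01) / (0.80 × 0.01 + 0.096 × 0.99) ≈ 0.078. The posterior probability of her having breast cancer is about 0.078.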
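The prior-to-posterior update for a yes/no statement can be written as a small function. This is a sketch, not code from the lecture; the function name `posterior` and its parameter names are chosen here for illustration.

```python
def posterior(prior, p_data_if_true, p_data_if_false):
    """Bayes rule for a yes/no statement: P(statement true if data)."""
    numerator = p_data_if_true * prior
    # Law of total probability gives P(data) in the denominator
    p_data = numerator + p_data_if_false * (1 - prior)
    return numerator / p_data


# Breast cancer example: prior 1%, P(positive if cancer) = 0.80,
# P(positive if no cancer) = 0.096
p_cancer_if_positive = posterior(0.01, 0.80, 0.096)
print(round(p_cancer_if_positive, 3))      # 0.078
print(round(1 - p_cancer_if_positive, 3))  # 0.922
```

The same function applies to any two-hypothesis problem: supply the prior and the probability of the observed data under each hypothesis.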

Paternity: A woman is pregnant. However, she slept with two different guys (call them Al and Bob) close to the time of conception, and does not know who the father is. What is the prior probability that Al is the father? The baby is born with blue eyes. Al has brown eyes and Bob has blue eyes. Update based on this information to find the posterior probability that Al is the father.

Eye Color: In reality eye color comes from several genes, and there are several possibilities, but let’s simplify here. Brown is dominant, blue is recessive. One gene comes from each parent. BB, bB, Bb would all result in brown eyes. Only bb results in blue eyes. To make it a bit easier: you know that Al’s mother and the mother of the child both have blue eyes.

Paternity: What is the probability that Al is the father? a) 1/2 b) 1/3 c) 1/4 d) 1/5 e) No idea

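Paternity: Before seeing the data, P(Al) = P(Bob) = 1/2. Al has brown eyes but his mother has blue eyes, so Al must be Bb, and P(blue eyes if Al) = 1/2. Bob and the child’s mother both have blue eyes (bb), so P(blue eyes if Bob) = 1. P(blue eyes) = P(blue eyes and Al) + P(blue eyes and Bob) = P(blue eyes if Al) × P(Al) + P(blue eyes if Bob) × P(Bob) = 1/2 × 1/2 + 1 × 1/2 = 3/4. By Bayes rule, P(Al if blue eyes) = P(blue eyes if Al) × P(Al) / P(blue eyes) = (1/2 × 1/2) / (3/4) = 1/3.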
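The paternity calculation can be checked with exact fractions. A minimal sketch under the simplified one-gene model from the slides; the dictionary names are illustrative.

```python
from fractions import Fraction

# Prior: no information favoring either man
prior = {"Al": Fraction(1, 2), "Bob": Fraction(1, 2)}

# P(blue eyes if father): Al is Bb and passes b half the time; Bob is bb.
# The mother is bb, so she always passes b.
likelihood = {"Al": Fraction(1, 2), "Bob": Fraction(1)}

# Law of total probability, then Bayes rule for each candidate father
p_blue = sum(likelihood[man] * prior[man] for man in prior)
posterior_by_father = {
    man: likelihood[man] * prior[man] / p_blue for man in prior
}

print(p_blue, posterior_by_father["Al"], posterior_by_father["Bob"])  # 3/4 1/3 2/3
```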

Bayesian Inference: Why isn’t everyone a Bayesian? You need some “prior belief” for the probability of the truth. Also, until recently, it was hard to be a Bayesian (it needed complicated math). Now, we can let computers do the work for us!

Inference: Both kinds of inference have the same goal, and it is a goal fundamental to statistics: to use information from the data to gain information about the unknown truth.

REVIEW

Data Collection: The way the data are/were collected determines the scope of inference. For generalizing to the population: was it a random sample? Was there sampling bias? For assessing causality: was it a randomized experiment? Collecting good data is crucial to making good inferences based on the data.

Exploratory Data Analysis: Before doing inference, always explore your data with descriptive statistics. Always visualize your data! Visualize your variables and relationships between variables. Calculate summary statistics for variables and relationships between variables; these will be key for later inference. The type of visualization and summary statistics depends on whether the variable(s) are categorical or quantitative.

Estimation: For good estimation, provide not just a point estimate, but an interval estimate which takes into account the uncertainty of the statistic. Confidence intervals are designed to capture the true parameter for a specified proportion of all samples. A P% confidence interval can be created by bootstrapping (sampling with replacement from the sample) and using the middle P% of bootstrap statistics.
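The bootstrap interval described above might look like the following in code. This is a sketch only: the function name, the default of 5,000 resamples, and the sample values are all made up for illustration.

```python
import random
import statistics


def bootstrap_interval(sample, stat=statistics.mean, level=0.95,
                       reps=5000, seed=0):
    """Percentile bootstrap interval: the middle `level` share of bootstrap statistics."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement from the sample and recompute the statistic
    boot_stats = sorted(stat(rng.choices(sample, k=n)) for _ in range(reps))
    # Cut the same number of bootstrap statistics from each tail
    cut = round((1 - level) / 2 * reps)
    return boot_stats[cut], boot_stats[reps - cut - 1]


# Made-up sample, for illustration only
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.6, 3.9, 3.1]
low, high = bootstrap_interval(sample)
print(low, high)
```

Fixing the seed makes the interval reproducible; a different seed gives slightly different endpoints, which is expected for a simulation-based method.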

Hypothesis Testing: A p-value is the probability of getting a statistic as extreme as observed, if H0 is true. The p-value measures the strength of the evidence the data provide against H0. “If the p-value is low, the H0 must go.” If the p-value is not low, then you cannot reject H0 and have an inconclusive test.

p-value: A p-value can be calculated by (1) a randomization test: simulate statistics assuming H0 is true, and see what proportion of simulated statistics are as extreme as that observed; or (2) calculating a test statistic and comparing that to a theoretical reference distribution (normal, t, χ², F).
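The randomization approach can be sketched for a difference in two group means. The assumptions here are not from the slides: the test is two-tailed, group labels are reshuffled to simulate H0 (no difference), and the data are made up.

```python
import random
import statistics


def randomization_p_value(group_a, group_b, reps=5000, seed=0):
    """Two-tailed randomization test for a difference in means."""
    rng = random.Random(seed)
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(reps):
        # Under H0 the labels are arbitrary, so shuffle and re-split
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Proportion of simulated statistics as extreme as the observed one
    return extreme / reps


# Made-up data, for illustration only
group_a = [12, 15, 14, 16, 13, 17]
group_b = [10, 11, 9, 12, 10, 11]
p_value = randomization_p_value(group_a, group_b)
print(p_value)
```

A small p-value here means that few random reshufflings produced a difference as large as the one observed.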

Regression: Regression is a way to predict one response variable with multiple explanatory variables. Regression fits the coefficients of the model. The model can be used to analyze relationships between the explanatory variables and the response, and to predict Y based on the explanatory variables.

What Next? If you are interested in learning more about REGRESSION AND MODELING: STAT 210. PROBABILITY: STAT 230. The MATHEMATICAL THEORY behind what we’ve learned: STAT 230, 250.