Mitchell Hoffman UC Berkeley. Statistics: Making inferences about populations (infinitely large) using finitely large data. Crucial for Addressing Causal.

Presentation transcript:

Mitchell Hoffman UC Berkeley

Statistics: Making inferences about populations (infinitely large) using finitely large data. Crucial for addressing causal questions, e.g.: - Does smoking cause cancer? - Does pay for performance (P4P) lead to improved healthcare outcomes?

 Are children who get vaccinations more likely to graduate from primary school than children who don’t?  Suppose you hear in the news: “The graduation rate for children with vaccinations was 75%, compared to only 25% for children without.” - What do we learn? 75% is much, much larger than 25%. - What if the study had 4 kids in each group, with 3 of 4 vaccinated kids and 1 of 4 unvaccinated kids graduating from school? - What if the study had 4,000 kids in each group?

 What can we learn from a given observed set of outcomes of a given sample size?  How confident can we be of inference based on 4 kids in each group?  How confident can we be of inference based on 4,000 kids in each group?  Confidence matters for evaluating whether a policy is working or not.
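
One way to make the contrast between 4 and 4,000 kids per group concrete is to put an interval around each graduation rate. The sketch below is an illustration, not part of the original slides: it assumes a 95% Wilson score interval (one of several reasonable choices for a proportion), and the function name `wilson_interval` is made up for this example.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (assumed method)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# 4 kids per group: 3 of 4 vaccinated and 1 of 4 unvaccinated kids graduate.
small_vacc = wilson_interval(3, 4)
small_unvacc = wilson_interval(1, 4)

# 4,000 kids per group with the same 75% and 25% graduation rates.
large_vacc = wilson_interval(3000, 4000)
large_unvacc = wilson_interval(1000, 4000)

print("n=4:   ", small_vacc, small_unvacc)  # wide intervals that overlap
print("n=4000:", large_vacc, large_unvacc)  # narrow intervals, far apart
```

With 4 kids per group the two intervals overlap, so the data cannot rule out equal graduation rates; with 4,000 per group they are clearly separated.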

 Suppose we have data with N observations.  Mean (or average) = (1/N)*(Sum of all Data)  Median: Middle value.  Mode: Most common value.  Minimum: Lowest value.  Maximum: Highest value.

 Suppose we had data on the number of vaccinations provided to women from 5 different clinics: 23, 23, 30, 20, and 14.  Mean = (1/5)*(23 + 23 + 30 + 20 + 14) = 110/5 = 22  Median = 23 (middle value of 14, 20, 23, 23, 30)  Mode = 23  Minimum = 14  Maximum = 30
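
The same summary statistics can be checked in a few lines. This is a minimal sketch using Python's standard `statistics` module; the variable names are chosen for this example.

```python
import statistics

# Vaccinations provided to women at the 5 clinics from the slide.
vaccinations = [23, 23, 30, 20, 14]

mean = statistics.mean(vaccinations)      # (23 + 23 + 30 + 20 + 14) / 5
median = statistics.median(vaccinations)  # middle value after sorting
mode = statistics.mode(vaccinations)      # most common value
minimum = min(vaccinations)
maximum = max(vaccinations)

print(mean, median, mode, minimum, maximum)  # 22 23 23 14 30
```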

 In all hypothesis testing, we test an “alternative hypothesis” against a “null hypothesis.” In most real-world settings, the null hypothesis is one of no effect.

 Let p be the probability that a mother attends all her recommended neonatal visits.  Null Hypothesis: Probability of attendance is the same for the Pay-For-Performance and the Control clinics.  Alternative Hypothesis: Probability of attendance is different between the Pay-For-Performance and the Control clinics.

 Null Hypothesis: Probability of attendance is the same for the Pay-For-Performance and the Control clinics.  Alternative Hypothesis: Probability of attendance is higher for the Pay-For-Performance than for the Control clinics.

 The basic idea is to use a one-sided test only if you have a really strong idea about whether the effect will be positive or negative.  In practice, almost always use two-sided tests (this is what Stata gives as default). For a given significance level, two-sided tests make rejection harder.
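
To see why two-sided tests make rejection harder, compare the two p-values for the same test statistic. The sketch below assumes a test statistic that is approximately standard normal under the null; the value `z = 1.8` is hypothetical and chosen only for illustration.

```python
import math

def normal_upper_tail(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

z = 1.8  # hypothetical test statistic, in the direction the one-sided test expects

one_sided_p = normal_upper_tail(z)           # about 0.036
two_sided_p = 2 * normal_upper_tail(abs(z))  # about 0.072

print(one_sided_p, two_sided_p)
```

At the usual 0.05 threshold, this hypothetical result would be rejected by the one-sided test but not by the two-sided test.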

 If your sample isn’t the entire population, then your estimates will vary based on what sample you draw.  Suppose you want to know whether P4P clinics are providing more services than control clinics. The answer you get to this question will depend on which clinics you sample, and on how many.

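 A p-value = the probability, assuming the null hypothesis were true, of getting an estimate as extreme as or more extreme than the one observed.  A low p-value means the observed data would be unlikely if the null hypothesis were true, so it counts as evidence against the null hypothesis. It is not the probability that the null hypothesis is true.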
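
The definition can be made concrete with the earlier 4-kids-per-group vaccination example. The sketch below is an assumption-laden illustration: it uses an exact permutation test (not a method named in the slides), treating the null hypothesis as "vaccination labels are unrelated to graduation" and enumerating every way to split the 8 kids into two groups of 4.

```python
from itertools import combinations

# 1 = graduated, 0 = did not. Kids 0-3 are vaccinated (3 of 4 graduate),
# kids 4-7 are unvaccinated (1 of 4 graduates).
outcomes = [1, 1, 1, 0, 1, 0, 0, 0]
vaccinated = (0, 1, 2, 3)

def rate_difference(group):
    """Graduation rate inside `group` minus the rate among everyone else."""
    inside = [outcomes[i] for i in group]
    outside = [outcomes[i] for i in range(len(outcomes)) if i not in group]
    return sum(inside) / len(inside) - sum(outside) / len(outside)

observed = rate_difference(vaccinated)  # 0.75 - 0.25

# Under the null, every relabeling of which 4 kids are "vaccinated" is equally likely.
all_groupings = list(combinations(range(len(outcomes)), 4))
as_extreme = [g for g in all_groupings
              if abs(rate_difference(g)) >= abs(observed)]

p_value = len(as_extreme) / len(all_groupings)
print(observed, len(as_extreme), len(all_groupings), p_value)
```

Here 34 of the 70 possible groupings give a difference at least as extreme as the observed one, so the two-sided p-value is about 0.49: a 75% versus 25% gap is unremarkable with only 4 kids per group.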

 A result is usually called statistically significant if it is unlikely to be due to chance alone. ◦ By chance alone, it is possible that P4P clinics will do better than control. However,  Most common standard for whether something is statistically significant is p<0.05.

 Just because you get statistical significance does not mean something is really significant for the real world. This is mostly an issue when you have lots and lots of data.  Suppose in a hypothetical example that P4P increased the share of women receiving vaccines from 46 to 47 percent. With enough data, this will be statistically significant. However, this is a small change, and may be said to lack real-world significance.
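
The 46 versus 47 percent example can be worked through numerically. The sketch below assumes a two-proportion z-test with equal group sizes and a normal approximation; the sample sizes (1,000 and 100,000 women per group) are hypothetical.

```python
import math

def two_sided_p_value(p_control, p_treated, n_per_group):
    """Two-proportion z-test p-value (normal approximation, equal group sizes)."""
    pooled = (p_control + p_treated) / 2
    std_error = math.sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    z = (p_treated - p_control) / std_error
    return math.erfc(abs(z) / math.sqrt(2))

# Same 1-percentage-point effect, two hypothetical sample sizes.
p_small_sample = two_sided_p_value(0.46, 0.47, 1_000)
p_large_sample = two_sided_p_value(0.46, 0.47, 100_000)

print(p_small_sample)  # well above 0.05
print(p_large_sample)  # far below 0.05
```

The effect size is identical in both cases; only the amount of data changes whether it clears the p<0.05 bar.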

 Regression attempts to find the line that best fits the data. It offers the simplest and most common way to do hypothesis testing.

 A small number of “outliers” or unusual points can have a large effect on a regression. - Regression tries to find the line that best fits the data. One point way off the line can really shift things.  It is a good idea to plot the data and to rerun the regression excluding the outliers.
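
A small made-up dataset shows how much one point can move the fitted line. The sketch below computes the ordinary least squares slope by hand; the data and the helper name `ols_slope` are hypothetical.

```python
def ols_slope(xs, ys):
    """Slope of the least-squares line through the points (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx

# Five points lying exactly on the line y = 2x (hypothetical data).
x_clean = [1, 2, 3, 4, 5]
y_clean = [2, 4, 6, 8, 10]

# The same data plus one point far above the line.
x_with_outlier = x_clean + [6]
y_with_outlier = y_clean + [40]

slope_clean = ols_slope(x_clean, y_clean)
slope_with_outlier = ols_slope(x_with_outlier, y_with_outlier)

print(slope_clean, slope_with_outlier)  # roughly 2 versus roughly 6
```

Running the regression both ways, as the slide recommends, makes it obvious when a single observation is driving the result.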

 Frequently, you will see equations like y = a + b*x + u.  Examples for pay for performance: y = share of mothers receiving all prenatal visits; x = whether the person was in treatment or control; u = the error term (everything else that affects y).
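
When x is a 0/1 treatment indicator, the fitted a is the control-group mean and the fitted b is the treatment-minus-control difference in means. The sketch below checks this on made-up clinic-level shares; the numbers and the helper name `ols_fit` are hypothetical.

```python
def ols_fit(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b

# x = 1 for P4P (treatment) clinics, 0 for control clinics.
x = [0, 0, 0, 1, 1, 1]
# y = share of mothers receiving all prenatal visits (hypothetical values).
y = [0.40, 0.50, 0.45, 0.55, 0.60, 0.65]

a, b = ols_fit(x, y)

control_mean = sum(yi for xi, yi in zip(x, y) if xi == 0) / 3
treated_mean = sum(yi for xi, yi in zip(x, y) if xi == 1) / 3

print(a, control_mean)                 # intercept equals the control mean
print(b, treated_mean - control_mean)  # slope equals the difference in means
```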

 Suppose y is a 0 or 1 variable. Instead of share of women who get all prenatal vaccinations, suppose we wanted to model whether each woman got all the vaccinations.  For this, it is best to use a logit or probit regression instead. ◦ Very easy to implement in Stata.

 y = a + b*x + u Null Hypothesis: P4P has no effect on share of women getting vaccinations (b equals zero). Alternative Hypothesis: P4P affects the share of women getting vaccinations (b doesn’t equal zero).

 A t-statistic is the estimated regression coefficient divided by its standard error. Under the null hypothesis, it follows (approximately) a known sampling distribution, the t distribution.  We use t-statistics to get from regression coefficients to p-values. ◦ Stata does this for us.
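
What Stata does behind the scenes can be sketched by hand for the simple regression y = a + b*x + u. The example below uses made-up clinic data and the textbook formulas for a one-regressor model with the usual homoskedastic standard error; it stops at the t-statistic and compares it with 2.776, the standard two-sided 5% critical value of the t distribution with 4 degrees of freedom.

```python
import math

# x = 1 for P4P clinics, 0 for control; y = hypothetical outcome shares.
x = [0, 0, 0, 1, 1, 1]
y = [0.40, 0.50, 0.45, 0.55, 0.60, 0.65]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)

# Least-squares estimates of b and a.
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
a = y_bar - b * x_bar

# Residual variance uses n - 2 degrees of freedom (two estimated parameters).
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
sigma_squared = sum(r ** 2 for r in residuals) / (n - 2)

std_error_b = math.sqrt(sigma_squared / s_xx)
t_stat = b / std_error_b  # tests the null hypothesis b = 0

critical_value = 2.776  # two-sided 5% critical value, t distribution, 4 df
print(b, std_error_b, t_stat, abs(t_stat) > critical_value)
```

In this made-up example the t-statistic is about 3.67, which exceeds the critical value, so the null hypothesis that b equals zero would be rejected at the 5% level.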

 Even if a result is highly statistically significant, it is not necessarily the case that you have demonstrated true causation.

 Regression seems to suggest: High Fat → Cancer.  More likely to be: High Income → High Fat and High Income → Cancer.

 Help separate correlation and causation.