Class 8: Tues., Oct. 5 Causation, Lurking Variables in Regression (Ch. 2.4, 2.5) Inference for Simple Linear Regression (Ch. 10.1) Where we’re headed:

Class 8: Tues., Oct. 5
Causation, Lurking Variables in Regression (Ch. 2.4, 2.5)
Inference for Simple Linear Regression (Ch. 10.1)
Where we're headed:
–This Week: Inference (Ch. 10)
–Next Week: Transformations and Polynomial Regression (Ch. 2.6), Example Regression Analysis
–Tue., Oct. 19: Review for Midterm I
–Thu., Oct. 21: Midterm I
–Fall Break!

Regression without Center City Philadelphia

The Question of Causation The community that ran this regression would like to increase property values. If low crime rates increase property values, the community might be able to cover the costs of increased police protection with the gain in tax revenue from higher property values. The regression without Center City Philadelphia gives the linear fit HousePrice = b0 − 2288.7·CrimeRate (intercept b0 not shown here). The community concludes that if it can cut its crime rate from 30 down to 20 incidents per 1000 population, it will increase its average house price by $2288.7 × 10 = $22,887. Is the community's conclusion justified?
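The community's dollar figure is just the fitted slope times the proposed change in X. A minimal Python sketch of that arithmetic, using the slope of −2288.7 implied by the slide's $22,887 figure for a 10-point drop in crime rate:

```python
# The community's arithmetic: predicted change in mean house price equals
# the fitted slope times the change in crime rate. The slope of -2288.7 is
# implied by the slide's $22,887 figure for a 10-point drop.
slope = -2288.7          # dollars per (incident per 1000 population)
delta_x = 20 - 30        # proposed change in crime rate: 30 -> 20
predicted_change = slope * delta_x
print(round(predicted_change))  # 22887
```

Note that this prediction is only about the association in the data; whether it describes what would happen if the community actually changed its crime rate is exactly the question the following slides take up.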

Potential Outcomes Model Let Y_i^30 denote what the house price for community i would be if its crime rate were 30 and all other aspects of community i were held fixed, and let Y_i^20 denote what the house price for community i would be if its crime rate were 20 and all other aspects of community i were held fixed. X (crime rate) causes a change in Y (house price) for community i if Y_i^30 ≠ Y_i^20. A decrease in crime rate from 30 to 20 causes an increase in house price for community i if Y_i^20 > Y_i^30.
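A small Python sketch of this notation, using invented house prices for three hypothetical communities A, B, and C:

```python
# Hypothetical potential outcomes (all numbers invented, just to illustrate
# the notation): y30[i] is community i's house price if its crime rate were
# 30, y20[i] the price if it were 20, with everything else held fixed.
y30 = {"A": 140_000, "B": 96_000, "C": 150_000}
y20 = {"A": 162_000, "B": 96_000, "C": 147_000}

for i in y30:
    causes_change = y20[i] != y30[i]         # X causes a change in Y for i
    decrease_raises_price = y20[i] > y30[i]  # lower crime -> higher price for i
    print(i, causes_change, decrease_raises_price)
# A: crime rate affects price, and cutting crime raises it
# B: price is the same either way, so crime rate has no effect for B
# C: crime rate affects price, but cutting crime would lower it
```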

Association is Not Causation A regression model tells us how the mean of Y|X is associated with changes in X. A regression model does not tell us what would happen if we actually changed X. Possible explanations for an observed association between Y and X:
1. X causes Y
2. Y causes X
3. There is a lurking variable Z that is associated with changes in both X and Y.
Any combination of the three explanations may apply to an observed association.

Y Causes X Perhaps it is changes in house price that cause changes in crime rate. When house prices increase, the residents of a community have more to lose by engaging in criminal activities; this is called the economic theory of crime.

Lurking Variables Lurking variable for the causal relationship between X and Y: A variable Z that is associated with both X and Y. Example of lurking variable in Philadelphia crime rate data: Level of education. Level of education may be associated with both house prices and crime rate. The effect of crime rate on house price is confounded with the effect of education on house price. If we just look at data on house price and crime rate, we can’t distinguish between the effect of crime rate on house price and the effect of education on house price. Lurking variables are sometimes called confounding variables.
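The confounding can be demonstrated with a short simulation (all numbers invented): here education Z drives both crime rate X and house price Y, X has no direct effect on Y at all, and yet X and Y come out strongly negatively correlated.

```python
# Simulation of a lurking variable: education Z lowers crime rate X and
# raises house price Y. X never enters the formula for Y, yet the two are
# strongly (negatively) correlated through Z. All parameters are invented.
import random
random.seed(0)

n = 1000
z = [random.gauss(12, 2) for _ in range(n)]                      # education (years)
x = [60 - 3 * zi + random.gauss(0, 2) for zi in z]               # crime rate
y = [20_000 + 15_000 * zi + random.gauss(0, 5_000) for zi in z]  # house price

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / (va * vb) ** 0.5

print(corr(x, y))  # strongly negative, even though X has no effect on Y
```

Looking only at (x, y) pairs, a regression would report a large negative slope; the simulation shows why that slope cannot be read causally when Z is lurking.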

Weekly Wages (Y) and Education (X) in March 1988 CPS Will getting an extra year of education cause an increase of $50.41 on average in your weekly wage? What are some potential lurking variables?

Establishing Causation Best method is an experiment, but many times that is not ethically or practically possible (e.g., smoking and cancer, education and earnings).

Establishing Causation from an Observational Study Main strategy for learning about causation when we can’t do an experiment: Consider all lurking variables you can think of. Look at how Y is associated with X when the lurking variables are held “fixed.” We will study methods for doing this when we study multiple regression in Chapter 11.

Statistics and Smoking Doctors had long observed a strong association between smoking and death from lung cancer. But did smoking cause lung cancer? There were many possible lurking variables: smokers have worse diets, drink more alcohol, and get less exercise than nonsmokers. The possibility that there was a genetic factor that predisposes people both to nicotine addiction and lung cancer was also raised. Statistical evidence from observational studies formed an essential part of the Surgeon General's 1964 report declaring that smoking causes lung cancer. How were the objections that this evidence came entirely from observational studies overcome? One smoker said the findings "didn't frighten him at all."

Criteria for Establishing Causation Without an Experiment
–The association is strong.
–The association is consistent.
–Higher doses are associated with stronger responses.
–The alleged cause precedes the effect in time.
–The alleged cause is plausible.

Random Samples and Inference The Current Population Survey is a monthly sample survey of the labor force behavior of American households. The data in cpswages.JMP contains the weekly wages and education for a random sample of 25,631 men from the March 1988 Current Population Survey. Suppose we take random subsamples of size 25 from this data. In JMP, we can take a random sample of the data by clicking Tables, Subset, then clicking Random Sample and putting the size of the sample you want in the Sampling Rate or Sample Size box. Then click OK and a new data table will be created that consists of a random sample of the rows in the original data.
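A stdlib Python sketch of what JMP's Tables > Subset > Random Sample step does, with invented (education, wage) records standing in for the cpswages.JMP file:

```python
# Drawing a random subset of 25 rows without replacement from a table,
# mirroring JMP's Tables > Subset > Random Sample. The records below are
# invented stand-ins for the cpswages.JMP data.
import random
random.seed(1)

full_data = [(random.randint(8, 18), random.uniform(200, 900))
             for _ in range(25_631)]           # (education, weekly wage)

subsample = random.sample(full_data, 25)       # random subset of 25 rows
print(len(subsample))  # 25
```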

Four Random Samples of Size 25 from cpswage.JMP

Least Squares Slopes in 1000 Random Samples of Size 25
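The slide's simulation can be sketched in Python with a synthetic wage-education population; the true slope of 50 is invented to echo the $50.41 estimate and is not taken from the CPS data. Each subsample of 25 gives its own least-squares slope, and the 1000 slopes show how much the estimate varies from sample to sample.

```python
# Draw 1000 subsamples of size 25 from a synthetic population with true
# slope 50 (an invented value echoing the $50.41 estimate) and record the
# least-squares slope from each subsample.
import random
random.seed(2)

def ls_slope(pairs):
    """Least-squares slope for a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    return sxy / sxx

population = []
for _ in range(25_631):
    e = random.randint(8, 18)                          # years of education
    population.append((e, 50 * e + random.gauss(0, 200)))  # weekly wage

slopes = [ls_slope(random.sample(population, 25)) for _ in range(1000)]
mean_slope = sum(slopes) / len(slopes)
print(round(mean_slope, 1))  # centered near the true slope of 50
```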

Inference Based on Sample The whole Current Population Survey (25,631 men ages 18-70) is a random sample from the U.S. population (roughly 75 million men ages 18-70). In most regression analyses, the data we have is a sample from some larger (hypothetical) population. We are interested in the true regression line for the larger population. Inference questions:
–How accurate is the least squares estimate of the slope as an estimate of the true slope in the larger population?
–What is a plausible range of values for the true slope in the larger population based on the sample?
–Is it plausible that the slope equals a particular value (e.g., 0) based on the sample?
Regression Applet: ch/webpage/teachingApplets/ciSLR/index.html

Model for Inference For inference, we assume the simple linear regression model is true. We should first check the assumptions using residual plots and also look for outliers and influential points before making inferences. Simple linear regression model:
–Y_i = β0 + β1·X_i + e_i
–e_i has a normal distribution with mean 0 and standard deviation (SD) σ_e
–The subpopulation of Y with corresponding X = X_i has a normal distribution with mean β0 + β1·X_i and SD σ_e
–Technical note: for inference for simple linear regression, we assume we take repeated samples from the simple linear regression model with the X's set equal to the X's in the data, X_1, …, X_n.
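A quick simulation from this model (illustrative parameter values, not taken from the slides) confirms that the subpopulation of Y at a fixed X is centered at β0 + β1·X:

```python
# Simulate from Y = beta0 + beta1*X + e, with e ~ N(0, sigma_e) and X held
# fixed. Parameter values are illustrative, not from the slides.
import random
random.seed(3)

beta0, beta1, sigma_e = 100.0, 50.0, 200.0

# The subpopulation of Y at X = 12 should be normal with
# mean beta0 + beta1*12 = 700 and SD sigma_e.
draws = [beta0 + beta1 * 12 + random.gauss(0, sigma_e) for _ in range(10_000)]
mean_at_12 = sum(draws) / len(draws)
print(round(mean_at_12))  # close to 700
```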

Standard Error for the Slope True model: Y_i = β0 + β1·X_i + e_i. From the sample of size n, we estimate β1 by the least squares estimate b1. In repeated samples of size n with the X's set equal to X_1, …, X_n, the standard error SE(b1) is the "typical" absolute value of the error made in estimating β1 by b1; it is estimated by SE(b1) = s_e / √Σ(X_i − X̄)², where s_e is the residual standard deviation.
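The textbook formula SE(b1) = s_e/√Σ(X_i − X̄)² can be checked against the spread of b1 across repeated simulated samples. All numbers below are illustrative, and σ is treated as known (so it plays the role of s_e):

```python
# Compare the formula SE(b1) = sigma / sqrt(sum((x - xbar)^2)), with sigma
# known, to the empirical SD of b1 over many simulated samples from the
# same fixed design. All parameter values are illustrative.
import math
import random
random.seed(4)

beta0, beta1, sigma = 0.0, 2.0, 5.0
xs = list(range(1, 26))                      # fixed X's, n = 25

mx = sum(xs) / len(xs)
sxx = sum((x - mx) ** 2 for x in xs)
theoretical_se = sigma / math.sqrt(sxx)      # SE(b1) from the formula

def fit_slope(ys):
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

slopes = []
for _ in range(2000):
    ys = [beta0 + beta1 * x + random.gauss(0, sigma) for x in xs]
    slopes.append(fit_slope(ys))

mean_b1 = sum(slopes) / len(slopes)
sd_b1 = (sum((b - mean_b1) ** 2 for b in slopes) / (len(slopes) - 1)) ** 0.5
print(round(theoretical_se, 3), round(sd_b1, 3))  # the two agree closely
```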

Full Data Set vs. Random Sample of Size 25

Confidence Intervals
Confidence interval: a range of values that are plausible for a parameter given the data.
95% confidence interval: an interval that 95% of the time will contain the true parameter.
Approximate 95% confidence interval: estimate of parameter ± 2·SE(estimate of parameter).
Approximate 95% confidence interval for the slope: b1 ± 2·SE(b1); for the wage-education data, b1 = 50.41.
Interpretation of a 95% confidence interval: it is most plausible that the true slope is in the 95% confidence interval. It is possible that the true slope is outside the 95% confidence interval, but unlikely; the confidence interval will fail to contain the true slope only 5% of the time in repeated samples.
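In code, the approximate interval is one line. The slope estimate 50.41 is from the wage-education regression on the earlier slide; the SE value is a hypothetical stand-in, since the slides' number is not reproduced here.

```python
# Approximate 95% CI: estimate +/- 2 * SE(estimate). The slope 50.41 is
# from the wage-education regression; the SE is a hypothetical stand-in.
b1 = 50.41        # least-squares slope ($ per extra year of education)
se_b1 = 2.0       # hypothetical standard error of the slope

lo, hi = b1 - 2 * se_b1, b1 + 2 * se_b1
print(round(lo, 2), round(hi, 2))  # 46.41 54.41
```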

Conf. Intervals for Slope in JMP After Fit Line, right click in the parameter estimates table, go to Columns and click on Lower 95% and Upper 95%. The exact 95% confidence interval is close to, but not equal to, the approximate interval estimate ± 2·SE(estimate).
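The exact interval replaces the rough multiplier 2 with the t critical value on n − 2 degrees of freedom. A sketch with hypothetical sample values (t_{0.975,23} ≈ 2.069 is a standard t-table value):

```python
# Exact vs. approximate 95% CI for the slope. The exact multiplier is the
# t critical value with n - 2 df; 2.069 is the standard t-table value for
# df = 23. The slope estimate and SE below are hypothetical.
n, b1, se_b1 = 25, 50.41, 2.0
tcrit = 2.069                                  # t_{0.975, n-2}, df = 23

exact = (b1 - tcrit * se_b1, b1 + tcrit * se_b1)
approx = (b1 - 2 * se_b1, b1 + 2 * se_b1)
print(exact, approx)  # close but not equal; the exact interval is slightly wider
```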

Confidence Intervals and the Polls Margin of error = 2·SE(estimate). A 95% confidence interval for the difference between Bush's and Kerry's proportions is the estimated difference ± the margin of error.
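With invented poll numbers (say 51% Bush vs. 48% Kerry among n = 1000 respondents), the margin of error for the difference can be sketched as follows; the SE formula used is the standard one for a difference of two proportions estimated from the same multinomial sample:

```python
# Margin of error for a difference in proportions, with invented poll
# numbers: 51% Bush vs. 48% Kerry among n = 1000 respondents.
import math

n = 1000
p_bush, p_kerry = 0.51, 0.48
diff = p_bush - p_kerry

# SE for the difference of two proportions from the same multinomial
# sample: sqrt((p1 + p2 - (p1 - p2)^2) / n).
se = math.sqrt((p_bush + p_kerry - diff ** 2) / n)
moe = 2 * se
print(round(diff - moe, 3), round(diff + moe, 3))
# The interval spans zero, so the 3-point lead is within the margin of error.
```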

Why Do the Polls Sometimes Disagree So Much?

Assumptions for Validity of Confidence Interval The margin of error in a confidence interval covers only random sampling errors under the assumed random sampling model; the confidence interval's "95% guarantee" assumes the model is correct. In presidential polls, it must be determined who is "likely to vote." Different polls use different models for determining who is likely to vote. The margin of error in the confidence interval assumes that the poll's model for who is likely to vote is correct. For simple linear regression, the confidence interval for the slope assumes the simple linear regression model is correct; if the simple linear regression model is not correct, the confidence interval's "95% guarantee" (that it will contain the true slope 95% of the time) is not valid. Always check the assumptions of the simple linear regression model before doing inference.