Sociology 601 Class 19: November 3, 2008
Review of correlation and standardized coefficients
Statistical inference for the slope (9.5)
Violations of Model Assumptions, and their effects (9.6)

9.5 Inference for a slope.
Problem: we have measures of the strength of a linear association between two variables, but no measure of the statistical significance of that association. We know the slope and intercept for our sample; what can we say about the slope and intercept for the population?
Solution: hypothesis tests for a slope and confidence intervals for a slope. Both need a standard error for the coefficients.
Difficulties: additional assumptions, and complications in estimating a standard error for a slope.

Assumptions needed to make population inferences for slopes:
The sample is selected randomly.
X and Y are interval-scale variables.
The mean of Y is related to X by the linear equation E{Y} = α + βX.
The conditional standard deviation of Y is identical at each X value (no heteroscedasticity).
The conditional distribution of Y at each value of X is normal.
There is no error in the measurement of X.

Common ways to violate these assumptions:
The sample is selected randomly.
 o Cluster sampling (e.g., census tracts / neighborhoods) makes observations within a cluster more similar to each other than to observations outside the cluster.
 o Two or more siblings in the same family.
 o Sample = population (e.g., the states of the U.S.)
X and Y are interval-scale variables.
 o Ordinal-scale attitude measures.
 o Nominal-scale categories (e.g., race/ethnicity, religion).

Common ways to violate these assumptions (2):
The mean of Y is related to X by the linear equation E{Y} = α + βX.
 o U-shapes: e.g., the Kuznets inverted-U curve (inequality <- GDP/capita).
 o Thresholds.
 o Logarithmic relationships (e.g., earnings <- education).
The conditional standard deviation of Y is identical at each X value (no heteroscedasticity).
 o earnings <- education
 o hours worked <- time
 o adult child's occupational status <- parental occupational status

Common ways to violate these assumptions (3):
The conditional distribution of Y at each value of X is normal.
 o earnings (skewed) <- education
 o Y is binary, or a percentage.
There is no error in the measurement of X.
 o Almost everything is measured with some error.
 o What is the effect of measurement error in X on b?

The null hypothesis for slopes.
Null hypothesis: the variables are statistically independent. H0: β = 0.
The null hypothesis is that there is no linear relationship between X and Y.
Implication: under H0, E{Y} = α + 0·X = α, so the mean of Y is the same (α) at every value of X.
(Draw figure of the distribution of Y and X when H0 is true.)

Test statistic for slopes.
What range of b's would we get if we took repeated samples from a population and calculated b for each of those samples? That is, what is the standard error of the sample slopes b?
Test statistic: t = b / σ̂_b,
 o where σ̂_b is the standard error of the sample slope b.
 o df for the t statistic (with one x-variable) is n − 2.
 o When n is large, the t statistic is asymptotically equivalent to a z-statistic.
What would make σ̂_b smaller?

Calculating the s.e. of b.
σ̂_b = σ̂ / (s_X · √(n − 1)), where σ̂ = √(SSE / (n − 2)) (= root MSE).
The standard error of b is smaller when…
 o the sample size is large,
 o the standard deviation of X is large (there is a wide range of X values), and
 o the conditional standard deviation of Y is small.
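As a minimal sketch, the formula above can be coded directly (the function and argument names here are mine for illustration, not from the course materials):

```python
import math

def se_slope(sse, n, s_x):
    """Standard error of the sample slope b.

    sse : sum of squared errors (residuals) from the fitted line
    n   : sample size (error df = n - 2)
    s_x : sample standard deviation of X
    """
    sigma_hat = math.sqrt(sse / (n - 2))        # root MSE
    return sigma_hat / (s_x * math.sqrt(n - 1))
```

Each input matches one bullet above: a larger n, a wider spread of X, or a smaller conditional spread of Y (smaller SSE) all shrink the result.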

Conclusions about the population.
P-value: calculated as in any t-test, but remember df = n − 2.
 o A z-test is appropriate when n > 30 or so.
Conclusions: evaluate the p-value against a previously selected alpha level.
Rule of thumb: b should be at least twice its standard error.

Example of inference about a slope.
In an analysis of poverty and crime in the 50 states plus DC, computer output provides the following:
E{Murder rate} = a + b·{Poverty rate}
(Poverty rate in %, murder rate per 100,000.)
SSE = 3904.3   SST =   N = 51   s_X = 4.585
Do a hypothesis test to determine whether there is a linear relationship between crime rates and poverty rates.

Stata example of inference about a slope.
In an analysis of poverty and crime in the 50 states plus DC, Stata computer output provides the following (values lost in transcription are left blank; the recoverable ones are filled in from the calculations on the next slides):

regress murder poverty

      Source |       SS       df       MS         Number of obs =      51
-------------+------------------------------      F(  1,    49) =   23.1
       Model |                 1                   Prob > F      =  0.0000
    Residual |     3904.3     49     79.68         R-squared     =
-------------+------------------------------      Adj R-squared =
       Total |                50                   Root MSE      =  8.926

------------------------------------------------------------------------------
      murder |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     poverty |       1.32       .275     4.81   0.000         .77        1.87
       _cons |
------------------------------------------------------------------------------

Interpret whether there is a linear relationship between crime rates and poverty rates.

Example of inference about a slope.
SSE = 3904.3   SST =   N = 51   s_X = 4.585   b = 1.32

Example of inference about a slope.
SSE = 3904.3   SST =   N = 51   s_X = 4.585   b = 1.32
se_b = √(SSE / (n − 2)) / (s_X · √(n − 1))
     = √(3904.3 / 49) / (4.585 · √50)
     = √79.68 / (4.585 · 7.071)
     = 8.926 / 32.42
     = 0.275
t = b / se_b = 1.32 / 0.275 = 4.81
p < .001
95% confidence interval for b = 0.77 to 1.87

Confidence interval for a slope.
Confidence interval for a slope: c.i. = b ± t·σ̂_b.
The standard t-score for a 95% confidence interval is t.025, with df = n − 2.
An alternative to a confidence interval is to report both b and σ̂_b.

Example of a confidence interval for a slope.
SSE = 3904.3   SST =   N = 51   s_X = 4.585
b = 1.32   se_b = 0.275
95% confidence interval for b = 1.32 ± 2.010 · 0.275 = 1.32 ± 0.553 = 0.77 to 1.87
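The arithmetic on these example slides can be checked with a short Python sketch. The numeric inputs are the slides' values; the critical value 2.010 is t(.025) for df = 49, taken from a t table:

```python
import math

# Inputs from the slides: murder rate vs. poverty rate, 50 states + DC
sse, n, s_x, b = 3904.3, 51, 4.585, 1.32

sigma_hat = math.sqrt(sse / (n - 2))           # root MSE, sqrt(79.68), about 8.93
se_b = sigma_hat / (s_x * math.sqrt(n - 1))    # about 0.275

t = b / se_b                                   # about 4.8, far past any usual cutoff
t_crit = 2.010                                 # t(.025) with df = 49, from a t table
lo = b - t_crit * se_b                         # lower 95% limit, about 0.77
hi = b + t_crit * se_b                         # upper 95% limit, about 1.87
```

With 51 observations the df = 49 t critical value (2.010) is only slightly larger than the z value 1.96, which is why the slides note that a z-test is fine for n > 30 or so.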

Inference for a slope using Stata.
(Values lost in transcription are left blank; the degrees of freedom imply n = 18.)

regress attend regul

      Source |       SS       df       MS         Number of obs =      18
-------------+------------------------------      F(  1,    16) =    9.65
       Model |                 1                   Prob > F      =
    Residual |                16                   R-squared     =
-------------+------------------------------      Adj R-squared =
       Total |                17                   Root MSE      =

------------------------------------------------------------------------------
      attend |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       regul |
       _cons |
------------------------------------------------------------------------------

The significance test and confidence interval for b appear on the line with the name of the x-variable.
Can you find SSE and SST? df for the model? r?

Inferences for correlation.
Inferences for a Pearson correlation: the t-score for r is the same as the t-score for b.
We don't focus on inferences for correlation in this class.

Things to watch out for: extrapolation.
Extrapolation beyond observed values of X is dangerous.
The pattern may be nonlinear.
Even if the pattern is linear, the standard errors become increasingly wide.
Be especially careful interpreting the Y-intercept: it may lie outside the observed data.
 o e.g., year zero
 o e.g., zero education in the U.S.
 o e.g., zero parity

Things to watch out for: outliers.
Influential observations and outliers may unduly influence the fit of the model.
Both the slope and the standard error of the slope may be affected by influential observations. This is an inherent weakness of least squares regression.
You may wish to evaluate two models: one with and one without the influential observations.
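A quick illustration with toy data (made up for this sketch): a single influential point can reverse the sign of a least-squares slope.

```python
def ols_slope(x, y):
    """Least-squares slope: sum of XY cross-products over sum of squares of X."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

x = [1, 2, 3, 4, 5]
y = [1.0, 2.1, 2.9, 4.2, 5.0]                        # roughly y = x

slope_clean = ols_slope(x, y)                        # close to 1
slope_with_outlier = ols_slope(x + [10], y + [0.0])  # one wild point flips the sign
```

Because the outlier sits far from the mean of X, it gets a very large weight in the cross-product sum, which is exactly why it dominates the fit.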

Things to watch out for: an outlier example.
Example: the exchange between Kahn and Udry 1986 (American Sociological Review 51(5)) and Jasso 1986 (ASR 51(5)).
Topic: time and age trends in marital coital frequency.
Issues: outliers, sample truncation, nonlinear effects.

Things to watch out for: truncated samples.
Truncated samples cause the opposite problem from influential observations and outliers.
Truncation on the X axis reduces the correlation coefficient for the remaining data.
Truncation on the Y axis is a worse problem, because it violates the assumption of normally distributed errors.
Examples: top-coded income data; health as measured by the number of days spent in a hospital in a year.
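A small simulation (with made-up data, purely illustrative) shows the X-truncation effect: cutting off the high-X cases shrinks the correlation among the cases that remain, even though the underlying relationship is unchanged.

```python
import random

def pearson_r(x, y):
    """Pearson correlation from sums of squares and cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

random.seed(1)
x = [random.uniform(0, 100) for _ in range(500)]
y = [xi + random.gauss(0, 15) for xi in x]       # linear with noise; r is high

r_full = pearson_r(x, y)                         # around 0.9

kept = [(a, b) for a, b in zip(x, y) if a < 30]  # truncate on the X axis
xt = [a for a, _ in kept]
yt = [b for _, b in kept]
r_trunc = pearson_r(xt, yt)                      # noticeably smaller
```

Truncation leaves the noise in Y untouched but shrinks the spread of X, so the signal-to-noise ratio, and hence r, falls.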

Things to watch out for: measurement error.
Error in the measurement of the X variable creates a bias that makes the correlation appear weaker (attenuation toward zero).
This problem can be a measurement issue or an interpretation issue.
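The direction of the bias can be seen in another small simulation (again with made-up data): adding measurement error to X pulls the estimated slope toward zero, by roughly the classic errors-in-variables factor Var(X) / (Var(X) + Var(error)).

```python
import random

def ols_slope(x, y):
    """Least-squares slope from sums of squares and cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

random.seed(1)
x_true = [random.uniform(0, 100) for _ in range(1000)]
y = [xi + random.gauss(0, 10) for xi in x_true]      # true slope is 1
x_obs = [xi + random.gauss(0, 30) for xi in x_true]  # X recorded with error

b_true = ols_slope(x_true, y)   # close to 1
b_obs = ols_slope(x_obs, y)     # attenuated toward 0, here roughly 0.5
```

With Var(X) near 833 (uniform on 0–100) and error variance 900, the expected shrinkage factor is about 0.48, which is why b_obs lands near half the true slope.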