Linear Regression 2 Sociology 5811 Lecture 21 Copyright © 2005 by Evan Schofer Do not copy or distribute without permission

Announcements Proposals Due Today

Review: Regression Regression coefficient formulas: $b = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$ and $a = \bar{Y} - b\bar{X}$ Question: What is the interpretation of a regression slope? Answer: It indicates the typical increase in Y for a one-unit increase in X –Note: this information is less useful if the linear association between X and Y is low

Example: Education & Job Prestige The actual SPSS regression results for that data: Estimates of a and b: “Constant” = a = Slope for “Year of School” = b = Equation: Prestige = Education A year of education adds 2.5 points job prestige

Review: Covariance Covariance (s_YX): sum of deviations around Y-bar multiplied by deviations around X-bar: $s_{YX} = \frac{\sum_{i=1}^{N}(X_i - \bar{X})(Y_i - \bar{Y})}{N - 1}$ Measures whether deviation (from the mean) in X tends to be accompanied by similar deviation in Y –Or if cases with positive deviation in X have negative deviation in Y –This is summed up for all cases in the data

Review: Covariance Covariance: based on multiplying each case’s deviation in X by its deviation in Y [scatterplot example, with Y-bar = .5 and X-bar = –1]: a point that deviates a lot from both means (dev in X = 3, dev in Y = 2.5) contributes (3)(2.5) = 7.5; a point that deviates very little from X-bar and Y-bar contributes (.4)(–.25) = –.1

Review: Covariance and Slope The slope formula can be written out in terms of the covariance: $b = \frac{s_{YX}}{s_X^2}$ –The covariance of X and Y, divided by the variance of X
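A quick numerical check of this identity (a sketch, reusing the hypothetical data from the earlier snippet):

```python
import numpy as np

educ = np.array([8, 10, 12, 12, 14, 16, 16, 18, 20])
prest = np.array([28, 33, 39, 41, 44, 50, 52, 55, 60])

s_yx = np.cov(educ, prest)[0, 1]  # sample covariance of X and Y (N-1 denominator)
s_x2 = np.var(educ, ddof=1)       # sample variance of X
print(s_yx / s_x2)                # same value as the least-squares slope:
print(np.polyfit(educ, prest, 1)[0])
```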

Review: R-Square The R-Square statistic indicates how well the regression line “explains” variation in Y It is based on partitioning variance into: 1. Explained (“regression”) variance –The portion of deviation from Y-bar accounted for by the regression line 2. Unexplained (“error”) variance –The portion of deviation from Y-bar that is “error” Formula: $R^2 = \frac{SS_{regression}}{SS_{total}} = 1 - \frac{\sum (Y_i - \hat{Y}_i)^2}{\sum (Y_i - \bar{Y})^2}$
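The partitioning can be computed directly from the residuals; a sketch with the same hypothetical data:

```python
import numpy as np

educ = np.array([8, 10, 12, 12, 14, 16, 16, 18, 20])
prest = np.array([28, 33, 39, 41, 44, 50, 52, 55, 60])
b, a = np.polyfit(educ, prest, 1)

y_hat = a + b * educ                          # predictions from the regression line
ss_total = np.sum((prest - prest.mean())**2)  # total deviation from Y-bar
ss_error = np.sum((prest - y_hat)**2)         # unexplained ("error") portion
print(1 - ss_error / ss_total)                # R-square
```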

Review: R-Square Visually: Deviation is partitioned into two parts [figure: for the fitted line Y = 2 + .5X, the gap between Y-bar and the line is the “explained variance”; the gap between the line and the observed point is the “error variance”]

Correlation Coefficient (r) The R-square is very similar to another important statistic: the correlation coefficient (r) –R-square is literally the square of r Formula for the correlation coefficient: $r = \frac{s_{YX}}{s_X s_Y}$ r is a measure of linear association Ranges from –1 to 1 Zero indicates no linear association 1 = perfect positive linear association –1 = perfect negative linear association
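A sketch of the formula (same hypothetical data), which also confirms that r squared equals the regression R-square in the bivariate case:

```python
import numpy as np

educ = np.array([8, 10, 12, 12, 14, 16, 16, 18, 20])
prest = np.array([28, 33, 39, 41, 44, 50, 52, 55, 60])

r = np.cov(educ, prest)[0, 1] / (np.std(educ, ddof=1) * np.std(prest, ddof=1))
print(r)      # matches np.corrcoef(educ, prest)[0, 1]
print(r**2)   # equals the R-square of the bivariate regression
```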

Correlation Coefficient (r) Example: Education and Job Prestige SPSS can calculate the correlation coefficient –Usually listed in a matrix to allow many comparisons Correlation of “Year of School” and Job Prestige: r = .521

Covariance, R-square, r, and b Covariance, R-square, r, and b are all similar –All provide information about the relationship between X and Y Differences: Covariance, b, and r can be positive or negative –r is scaled from –1 to +1, others range widely b tells you the actual slope –It relates change in X to change in Y in real units R-square is like r, but is never negative –And, it tells you “explained” variance of a regression

Correlation Hypothesis Tests Hypothesis tests can be done on r, R-square, b Example: Correlation (r): linear association Is an observed positive or negative correlation significantly different from zero? –Might the population have no linear association? –Population correlation is denoted by the Greek “r”, rho (ρ) H0: There is no linear association (ρ = 0) H1: There is linear association (ρ ≠ 0) We’ll mainly focus on tests regarding slopes But the process is similar for correlation (r)

Correlation Coefficient (r) Education and Job Prestige hypothesis test: Here, asterisks signify that coefficients are significantly different from zero at α = .01 “Sig.” is a p-value: the probability of observing this r if ρ = 0. Compare it to α!

Hypothesis Tests: Slopes Given: Observed slope relating Education to Job Prestige = 2.47 Question: Can we generalize this to the population of all Americans? –How likely is it that this observed slope was actually drawn from a population with slope = 0? Solution: Conduct a hypothesis test Notation: slope = b, population slope = β H0: Population slope β = 0 H1: Population slope β ≠ 0 (two-tailed test)

Example: Slope Hypothesis Test The actual SPSS regression results for that data: t-value and “sig” (p-value) are for hypothesis tests about the slope Reject H0 if: t-value > critical t (N–2 df) Or, “sig.” (p-value) less than α

Hypothesis Tests: Slopes What information lets us do a hypothesis test? Answer: Estimates of a slope (b) have a sampling distribution, like any other statistic –It is the distribution of every value of the slope, based on all possible samples (of size N) If certain assumptions are met, the sampling distribution approximates the t-distribution –Thus, we can assess the probability that a given value of b would be observed, if β = 0 –If that probability is low – below alpha – we reject H0

Hypothesis Tests: Slopes Visually: If the population slope (β) is zero, then the sampling distribution of the slope would center at zero [figure: sampling distribution of b, centered at 0] –Since the sampling distribution is a probability distribution, we can identify the likely values of b if the population slope is zero If β = 0, observed slopes should commonly fall near zero, too If the observed slope b falls very far from 0, it is improbable that β is really equal to zero. Thus, we can reject H0.
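A small simulation makes the idea tangible (a sketch, not tied to the lecture’s data): draw many samples in which Y is unrelated to X, so the true slope is zero, and look at where the estimated slopes land.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)                      # one fixed set of X values
slopes = []
for _ in range(5000):
    y = rng.normal(size=50)                  # Y unrelated to X: true slope is 0
    slopes.append(np.polyfit(x, y, 1)[0])

slopes = np.array(slopes)
print(slopes.mean())                         # centered near zero
print(np.quantile(slopes, [0.025, 0.975]))   # most estimates fall close to zero
```

An observed slope far outside that central range would be improbable under β = 0, which is exactly the logic of the test.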

Bivariate Regression Assumptions Assumptions for bivariate regression hypothesis tests: 1. Random sample –Ideally N > 20 –But different rules of thumb exist (10, 30, etc.) 2. Variables are linearly related –i.e., the mean of Y changes linearly with X –Check a scatter plot for a general linear trend –Watch out for non-linear relationships (e.g., U-shaped)

Bivariate Regression Assumptions 3. Y is normally distributed at every value of X in the population –“Conditional normality” Ex: X = Years of Education, Y = Job Prestige Suppose we look only at a sub-sample: X = 12 years of education –Is a histogram of Job Prestige approximately normal? –What about for people with X = 4? X = 16? If all are roughly normal, the assumption is met

Bivariate Regression Assumptions Normality: Examine sub-samples at different values of X. Make histograms and check for normality. [figure: two example histograms – one roughly normal (“good”), one not (“not very good”)]
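With real data the check might look like the following sketch. Here the data are simulated (so conditional normality holds by construction); with your own sample you would subset Y at a few observed values of X and inspect the histograms.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
educ = np.repeat([4, 12, 16], 100)                     # three values of X
prest = 10 + 2.5 * educ + rng.normal(0, 8, size=300)   # normal errors by construction

for x_val in (4, 12, 16):
    plt.hist(prest[educ == x_val], bins=15, alpha=0.5, label=f"X = {x_val}")
plt.xlabel("Job prestige"); plt.legend(); plt.show()
```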

Bivariate Regression Assumptions 4. The variances of prediction errors are identical at every value of X –Recall: error is the deviation from the regression line –Is the dispersion of error consistent across values of X? –Definition: “homoskedasticity” = error dispersion is consistent across values of X –Opposite: “heteroskedasticity” = errors vary with X Test: Compare errors for X = 12 years of education with errors for X = 2, X = 8, etc. –Are the errors around the line similar? Or different? (see the sketch below)
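One informal check is to plot the residuals against X and eyeball the spread; a sketch using simulated, deliberately homoskedastic data:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
x = rng.uniform(0, 20, size=200)
y = 10 + 2.5 * x + rng.normal(0, 8, size=200)  # error variance constant in X
b, a = np.polyfit(x, y, 1)

resid = y - (a + b * x)                        # deviations from the fitted line
plt.scatter(x, resid, s=10)
plt.axhline(0, color="k")
plt.xlabel("X"); plt.ylabel("Residual")
plt.show()  # homoskedastic: spread looks similar at every X
```

A fan or funnel shape in this plot – spread growing or shrinking with X – would suggest heteroskedasticity.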

Bivariate Regression Assumptions Homoskedasticity: Equal Error Variance [figure: scatterplot with roughly constant spread around the line] Examine error at different values of X. Is it roughly equal? Here, things look pretty good.

Bivariate Regression Assumptions Heteroskedasticity: Unequal Error Variance [figure: scatterplot where the spread widens as X increases] At higher values of X, error variance increases a lot. This looks pretty bad.

Bivariate Regression Assumptions Notes/Comments: 1. Overall, regression is robust to violations of assumptions –It often gives fairly reasonable results, even when assumptions aren’t perfectly met 2. Variations of OLS regression can handle situations where assumptions aren’t met 3. But, there are also further diagnostics to help ensure that results are meaningful… –We’ll discuss them next week.

Regression Hypothesis Tests If assumptions are met, the sampling distribution of the slope (b) approximates a t-distribution The standard deviation of the sampling distribution is called the standard error of the slope ($\sigma_b$) Population formula of the standard error: $\sigma_b = \sqrt{\dfrac{\sigma_e^2}{\sum (X_i - \bar{X})^2}}$ Where $\sigma_e^2$ is the variance of the regression error

Regression Hypothesis Tests Estimating $\sigma_e^2$ lets us estimate the standard error: $s_e^2 = \dfrac{\sum e_i^2}{N - 2}$ Now we can estimate the S.E. of the slope: $s_b = \sqrt{\dfrac{s_e^2}{\sum (X_i - \bar{X})^2}}$

Regression Hypothesis Tests Finally: A t-value can be calculated: –It is the slope divided by the standard error: $t_{N-2} = \dfrac{b}{s_b}$ Where $s_b$ is the sample point estimate of the standard error The t-value is based on N–2 degrees of freedom
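Putting the pieces together in code (a sketch, again using the hypothetical education/prestige arrays):

```python
import numpy as np

educ = np.array([8, 10, 12, 12, 14, 16, 16, 18, 20])
prest = np.array([28, 33, 39, 41, 44, 50, 52, 55, 60])
b, a = np.polyfit(educ, prest, 1)
n = len(educ)

resid = prest - (a + b * educ)
s_e2 = np.sum(resid**2) / (n - 2)                      # estimated error variance
s_b = np.sqrt(s_e2 / np.sum((educ - educ.mean())**2))  # S.E. of the slope
print(b / s_b)                                         # t-value, df = n - 2
```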

Example: Education & Job Prestige T-values can be compared to critical t... SPSS estimates the standard error of the slope. This is used to calculate a t-value The t-value can be compared to the “critical value” to test hypotheses. Or, just compare “Sig.” to alpha. If t > crit or Sig < alpha, reject H0

Regression Confidence Intervals You can also use the standard error of the slope to estimate confidence intervals: $C.I. = b \pm t_{N-2} \cdot s_b$ Where $t_{N-2}$ is the t-value for a two-tailed test given a desired α-level Example: Observed slope = 2.5, S.E. = .10 The 95% t-value for 102 d.f. is approximately 2 95% C.I. = 2.5 ± 2(.10) Confidence Interval: 2.3 to 2.7
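The same interval in code (a sketch; scipy supplies the exact critical t rather than the “approximately 2” shortcut):

```python
from scipy import stats

b, s_b, n = 2.5, 0.10, 104                 # slope, S.E., and N from the example
t_crit = stats.t.ppf(0.975, df=n - 2)      # two-tailed 95% critical value, 102 d.f.
print(b - t_crit * s_b, b + t_crit * s_b)  # roughly 2.3 to 2.7
```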

Regression Hypothesis Tests You can also use a t-test to determine if the constant (a) is significantly different from zero –But, this is typically less useful to do Hypotheses (α = the population parameter of a): H0: α = 0, H1: α ≠ 0 But most research focuses on slopes

Regression: Outliers Note: Even if regression assumptions are met, slope estimates can have problems Example: Outliers -- cases with extreme values that differ greatly from the rest of your sample Outliers can result from: –Errors in coding or data entry –Highly unusual cases –Or, sometimes they reflect important “real” variation Even a few outliers can dramatically change estimates of the slope (b)

Regression: Outliers Outlier Example: [figure: scatterplot showing an extreme case that pulls the regression line up, compared with the regression line when the extreme case is removed from the sample]

Regression: Outliers Strategy for dealing with outliers: 1. Identify them Look at scatterplots for extreme values Or, ask SPSS to compute outlier diagnostic statistics –There are several statistics to identify cases that are affecting the regression slope a lot –Examples: “Leverage”, Cook’s D, DFBETA –SPSS can even identify “problematic” cases for you… but it is preferable to do it yourself.
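Outside SPSS, the same diagnostics are available in other packages; a sketch using Python’s statsmodels with the hypothetical data from earlier:

```python
import numpy as np
import statsmodels.api as sm

educ = np.array([8, 10, 12, 12, 14, 16, 16, 18, 20])
prest = np.array([28, 33, 39, 41, 44, 50, 52, 55, 60])

fit = sm.OLS(prest, sm.add_constant(educ)).fit()
infl = fit.get_influence()
print(infl.hat_matrix_diag)    # leverage for each case
print(infl.cooks_distance[0])  # Cook's D for each case
print(infl.dfbetas)            # DFBETA-type statistics
```

Cases with unusually large leverage or Cook’s D are the ones worth inspecting by hand.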

Regression: Outliers 2. Depending on the circumstances, either: A) Drop cases from the sample and re-do the regression –Especially for coding errors, very extreme outliers –Or if there is a theoretical reason to drop cases –Example: In analyses of economic activity, communist countries differ a lot… B) Or, sometimes it is reasonable to leave outliers in the analysis –e.g., if there are several that represent an important minority group in your data When writing papers, state whether outliers were excluded (and what effect that had on the analysis).