DTC Quantitative Research Methods Regression I: (Correlation and) Linear Regression Thursday 27th November 2014

The Correlation Coefficient (r)
[Scatterplot of Age at First Childbirth against Age at First Cohabitation]
The correlation coefficient shows the strength/closeness of a relationship. Here r = 0.5 (or perhaps less…)

[Three illustrative scatterplots: a perfect positive relationship (r = +1), a perfect negative relationship (r = -1), and no linear relationship (r = 0)]

Correlation… and Regression
r measures correlation in a linear way… and is connected to linear regression.
More precisely, it is r² (r-squared) that is of relevance: it is the 'variation explained' by the regression line, and is sometimes referred to as the 'coefficient of determination'.
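
The link between r and the 'variation explained' can be checked numerically. A minimal sketch in Python, using made-up data (not from the lecture):

```python
import math

# Invented data with a roughly linear relationship
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)          # overall variation in y
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

r = sxy / math.sqrt(sxx * syy)                 # correlation coefficient

# Least-squares line and the variation it leaves unexplained
B = sxy / sxx
C = my - B * mx
ss_resid = sum((yi - (B * xi + C)) ** 2 for xi, yi in zip(x, y))
r_squared = 1 - ss_resid / syy                 # 'variation explained'

print(abs(r ** 2 - r_squared) < 1e-9)  # True
```

For simple (one-predictor) linear regression, the squared correlation and the coefficient of determination coincide exactly.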

[Scatterplot of y against x, with a horizontal line at the mean of y]
The arrows show the overall variation (variation from the mean of y).

[The same scatterplot with the regression line added]
Some of the overall variation is explained by the regression line (i.e. the arrows tend to be shorter than the dashed lines, because the regression line is closer to the points than the mean line is).

Analysis of Variance (ANOVA): Decomposing variance into Between-Groups and Within-Groups
[Plot of Value against Group (groups 1, 2 and 3), showing each group mean and the overall mean; the lines and arrows mark differences from the group means and the overall mean]
As an aside, it is worth noting that the logic of Analysis of Variance is in essence the same, the difference being that the group means, rather than a regression line, are used to 'explain' some of the variation ('Between-groups' variation).

[Scatterplot of Length of Residence (y) against Age (x), showing the regression line, the constant C (the intercept at x = 0), the slope B, the error term (residual) ε, and an outlier]
y = Bx + C + ε
where B is the slope, C is the constant, and ε is the error term (residual).

Choosing the line that best explains the data
Some variation is explained by the regression line; the residuals constitute the unexplained variation. The regression line is chosen so as to minimise the sum of the squared residuals, i.e. to minimise Σε² (Σ means 'sum of'). The full/specific name for this technique is Ordinary Least Squares (OLS) linear regression.
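
A minimal sketch of OLS in Python, using the closed-form least-squares formulas on invented age/length-of-residence data:

```python
# Pick B (slope) and C (constant) to minimise Σε², the sum of squared residuals
x = [20.0, 25.0, 30.0, 35.0, 40.0, 45.0]   # e.g. age
y = [1.0, 2.5, 4.0, 6.5, 8.0, 10.5]        # e.g. length of residence

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Closed-form least-squares estimates
B = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
C = my - B * mx

residuals = [yi - (B * xi + C) for xi, yi in zip(x, y)]
sum_sq = sum(e ** 2 for e in residuals)    # Σε², the quantity OLS minimises
print(B, C, sum_sq)
```

Any other choice of B or C gives a larger Σε², which is exactly what 'least squares' means.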

Regression assumptions #1 and #2
[Histogram of the residuals: Frequency against ε, centred on 0]
#1: Residuals have the usual symmetric, 'bell-shaped' normal distribution.
#2: Residuals are independent of each other.

Regression assumption #3
[Two scatterplots of y against x, one for each situation]
Homoscedasticity: the spread of the residuals (ε) stays consistent in size (range) as x increases.
Heteroscedasticity: the spread of the residuals (ε) increases as x increases (or varies in some other way). In this situation, use Weighted Least Squares.

Regression assumption #4
Linearity! (We've already assumed this…) In the case of a non-linear relationship, one may be able to use a non-linear regression equation, such as:
y = B1x + B2x² + c
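
Because this equation is still linear in the coefficients B1, B2 and c, the same least-squares machinery fits it. A sketch assuming NumPy is available (the data are invented, an exact quadratic for illustration):

```python
import numpy as np

# y = B1*x + B2*x² + c is non-linear in x but linear in the coefficients
x = np.linspace(0.0, 10.0, 11)
y = 2.0 * x - 0.3 * x ** 2 + 5.0          # true values: B1 = 2.0, B2 = -0.3, c = 5.0

B2, B1, c = np.polyfit(x, y, 2)           # polyfit returns the highest power first
print(round(B1, 4), round(B2, 4), round(c, 4))  # 2.0 -0.3 5.0
```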

Another problem: Multicollinearity
If two 'independent variables', x and z, are perfectly correlated (i.e. identical), it is impossible to tell what the B values corresponding to each should be. E.g. if y = 2x + c, and we add z, should we get:
y = 1.0x + 1.0z + c, or
y = 0.5x + 1.5z + c, or
y = 1.5x + 0.5z + c?
The problem applies if two variables are highly (but not perfectly) correlated too…
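
This non-identifiability is easy to demonstrate: with z identical to x, any split of the total coefficient fits the data equally well. A small illustrative sketch with invented numbers:

```python
# If z is identical to x, different B combinations reproduce y equally well
x = [1.0, 2.0, 3.0, 4.0, 5.0]
z = list(x)                       # perfectly correlated with (identical to) x
c = 3.0
y = [2.0 * xi + c for xi in x]    # true relationship: y = 2x + c

def predict(b_x, b_z, xi, zi):
    return b_x * xi + b_z * zi + c

# Two different splits of the total coefficient 2 fit the data exactly:
fit1 = [predict(1.0, 1.0, xi, zi) for xi, zi in zip(x, z)]
fit2 = [predict(0.5, 1.5, xi, zi) for xi, zi in zip(x, z)]
print(fit1 == fit2 == y)  # True
```

No amount of data can distinguish between the candidate coefficient pairs, which is why OLS has no unique solution here.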

Example of Regression (from Pole and Lampard, 2002, Ch. 9)
GHQ = constant + (-0.69 × INCOME)
Is the B of -0.69 significantly different from 0 (zero)? A test statistic that takes account of the 'accuracy' of this B (by dividing it by its standard error) is t. For the value of t in this example, the significance value is p < 0.05.
r-squared here is (-0.321)² = 0.103 = 10.3%.

B's and Multivariate Regression Analysis
The impact of an independent variable on a dependent variable is B. But how and why does the value of B change when we introduce another independent variable? And if the effects of the two independent variables are inter-related in the sense that they interact (i.e. the effect of one depends on the value of the other), how does B vary?

Multiple Regression
GHQ = constant + (-0.47 × INCOME) + (-1.95 × HOUSING)
For B = -0.47, the t statistic gives p > 0.05 (not significant).
For B = -1.95, the t statistic gives p < 0.05 (significant).
The r-squared value for this regression is 0.236 (23.6%).
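
A small sketch of using an equation of this form for prediction. The intercept is not shown above, so the value below is a placeholder, not a figure from the source:

```python
GHQ0 = 0.0  # placeholder intercept (the slide does not give the constant)

def predict_ghq(income, housing, intercept=GHQ0):
    """Predicted GHQ from the two fitted coefficients above."""
    return intercept - 0.47 * income - 1.95 * housing

# Moving from housing category 0 to 1 lowers predicted GHQ by 1.95,
# whatever the intercept happens to be:
diff = predict_ghq(2.0, 0) - predict_ghq(2.0, 1)
print(round(diff, 2))  # 1.95
```

This is the usual reading of a B coefficient: the change in the dependent variable for a one-unit change in that independent variable, holding the others constant.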

Interaction effects…
[Plot of Length of Residence against Age, with separate lines for Women, All, and Men]
In this situation there is an interaction between the effects of age and of gender, so B (the slope) varies according to gender, and is greater for women.

Dummy variables
Categorical variables can be included in regression analyses via the use of one or more dummy variables (two-category variables with values of 0 and 1). In the case of a comparison of men and women, a dummy variable could compare men (coded 1) with women (coded 0). In general, a variable with n categories can be represented using (n-1) dummy variables.
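
Dummy coding can be done by hand. A sketch with an invented three-category housing-tenure variable, using 'owner' as the reference category (so n = 3 categories need n - 1 = 2 dummies):

```python
tenure = ["owner", "renter", "social", "owner"]

# 'owner' is the reference category: both dummies are 0 for owners
renter_d = [1 if t == "renter" else 0 for t in tenure]
social_d = [1 if t == "social" else 0 for t in tenure]

print(renter_d)  # [0, 1, 0, 0]
print(social_d)  # [0, 0, 1, 0]
```

The B for each dummy is then read as the difference between that category and the reference category.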

Creating a variable to check for an interaction effect
We may want to see whether an effect varies according to the level of another variable. Multiplying the values of two independent variables together, and including this third variable alongside the other two, allows us to do this.

Interaction effects (continued)
[Plot of Length of Residence against Age, with separate lines for Women, All, and Men]
SEXDUMMY = 1 for men & 0 for women
AGESEXD = AGE × SEXDUMMY
For men AGESEXD = AGE & for women AGESEXD = 0
Slope of line for women = B_AGE
Slope of line for men = B_AGE + B_AGESEXD
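
Creating the interaction variable is just element-wise multiplication. A sketch with invented values:

```python
# AGESEXD = AGE × SEXDUMMY, computed case by case
ages = [25, 40, 33, 50]
sexdummy = [1, 0, 0, 1]          # 1 = man, 0 = woman
agesexd = [a * s for a, s in zip(ages, sexdummy)]

print(agesexd)  # [25, 0, 0, 50]
```

For women the interaction term is 0, so their slope is just B_AGE; for men it equals AGE, so their slope becomes B_AGE + B_AGESEXD.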