(Correlation and) (Multiple) Regression Friday 5th March (and Logistic Regression too!)

Presentation transcript:

(Correlation and) (Multiple) Regression Friday 5th March (and Logistic Regression too!)

The Shape of Things to Come… (Rest of Module)
Week 8: Morning: Regression; Afternoon: Logistic Regression
Week 9: Morning: Published Multivariate Analyses; Afternoon: Regression & Logistic Regression (Computing Session)
Week 10: Morning: Log-linear Models; Afternoon: Log-linear Models (Computing Session)
ASSESSMENT D; ASSESSMENT E

The Correlation Coefficient (r)
[Scatterplot: Age at First Childbirth against Age at First Cohabitation]
The correlation coefficient shows the strength/closeness of a relationship: here r = 0.5 (or perhaps less…)

[Three scatterplots illustrating a perfect positive relationship (r = +1), a perfect negative relationship (r = -1), and no linear relationship (r = 0)]

Correlation… and Regression
r measures correlation in a linear way… and is connected to linear regression
More precisely, it is r² (r-squared) that is of relevance
It is the ‘variation explained’ by the regression line
… and is sometimes referred to as the ‘coefficient of determination’
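
As a minimal sketch (not from the slides), assuming numpy and a handful of hypothetical paired ages, r and r² can be computed directly:

```python
import numpy as np

# Hypothetical paired measurements (not real data)
x = np.array([19, 21, 23, 25, 28, 30])  # e.g. age at first cohabitation
y = np.array([22, 24, 23, 29, 30, 33])  # e.g. age at first childbirth

r = np.corrcoef(x, y)[0, 1]  # Pearson correlation coefficient
print(f"r = {r:.3f}, r-squared = {r**2:.3f}")  # r² is the 'variation explained'
```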

[Scatterplot of y against x, with a horizontal line marking the mean of y]
The arrows show the overall variation (variation from the mean of y)

[The same scatterplot with the regression line added]
Some of the overall variation is explained by the regression line (i.e. the arrows tend to be shorter than the dashed lines, because the regression line is closer to the points than the mean line is)

Length of Residence (y) against Age (x)
[Scatterplot with the regression line, showing the constant C where the line meets the y-axis, the slope B, a residual ε, and an outlier]
y = Bx + C + ε, where B is the slope, C is the constant, and ε is the error term (residual)

Choosing the line that best explains the data
Some variation is explained by the regression line
The residuals constitute the unexplained variation
The regression line is chosen so as to minimise the sum of the squared residuals, i.e. to minimise Σε² (Σ means ‘sum of’)
The full/specific name for this technique is Ordinary Least Squares (OLS) linear regression
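
A minimal sketch of OLS, assuming numpy and hypothetical data: the closed-form slope and constant below are exactly the values that minimise Σε².

```python
import numpy as np

x = np.array([25, 30, 35, 40, 50, 60], dtype=float)  # hypothetical ages
y = np.array([2.0, 4.0, 5.0, 9.0, 12.0, 20.0])       # hypothetical lengths of residence

B = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)  # slope = cov(x, y) / var(x)
C = y.mean() - B * x.mean()                         # the line passes through the means

residuals = y - (B * x + C)  # ε = observed - fitted
print(f"y = {B:.2f}x + {C:.2f}, sum of squared residuals = {(residuals ** 2).sum():.2f}")
```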

Regression assumptions #1 and #2
[Histogram: frequency of residuals, centred on ε = 0]
#1: Residuals have the usual symmetric, ‘bell-shaped’ normal distribution
#2: Residuals are independent of each other

Regression assumption #3: Homoscedasticity
[Two scatterplots of y against x]
Homoscedasticity: the spread of the residuals (ε) stays consistent in size (range) as x increases
Heteroscedasticity: the spread of the residuals (ε) increases as x increases (or varies in some other way); in this case, use Weighted Least Squares
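
A minimal sketch of the weighted alternative, assuming statsmodels; the data are simulated so that the residual spread grows with x, and the 1/x² weights are an illustrative choice (weights should be proportional to the inverse of the residual variance), not a general rule.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = np.linspace(1, 10, 100)
y = 2 * x + 1 + rng.normal(scale=0.5 * x)  # noise whose spread grows with x

X = sm.add_constant(x)                      # adds the constant term
wls = sm.WLS(y, X, weights=1 / x**2).fit()  # weights proportional to 1 / residual variance
print(wls.params)                           # estimated constant and slope
```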

Regression assumption #4: Linearity! (We’ve already assumed this…)
In the case of a non-linear relationship, one may be able to use a non-linear regression equation, such as:
y = B1x + B2x² + c
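
A minimal sketch of such a fit, assuming numpy and hypothetical, roughly quadratic data; np.polyfit performs a least-squares fit and returns coefficients from the highest power of x downwards.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2.1, 3.9, 9.2, 15.8, 26.1, 37.9])  # hypothetical, roughly quadratic

B2, B1, c = np.polyfit(x, y, deg=2)  # quadratic least-squares fit
print(f"y = {B1:.2f}x + {B2:.2f}x² + {c:.2f}")
```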

Another problem: Multicollinearity
If two ‘independent variables’, x and z, are perfectly correlated (i.e. identical), it is impossible to tell what the B values corresponding to each should be
e.g. if y = 2x + c, and we add z, should we get:
y = 1.0x + 1.0z + c, or
y = 0.5x + 1.5z + c, or
y = 1.5x + 0.5z + c?
The problem applies if two variables are highly (but not perfectly) correlated too…
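
A minimal sketch of the problem, assuming numpy: z below is an almost exact copy of x, so least squares can pin down the sum of the two B values but not the individual ones.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(10, dtype=float)
z = x + rng.normal(scale=1e-6, size=10)  # z is almost perfectly correlated with x
y = 2 * x + 1                            # true model: y = 2x + c

X = np.column_stack([x, z, np.ones(10)])       # design matrix with a constant column
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares solution
print(coefs)  # many (B_x, B_z) pairs fit almost equally well; only their sum (about 2) is pinned down
```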

Example of Regression (from Pole and Lampard, 2002, Ch. 9)
GHQ = c + (-0.69 x INCOME), where c is the constant
Is the B of -0.69 significantly different from 0 (zero)?
A test statistic that takes account of the ‘accuracy’ of this B, t = B / s.e.(B) (i.e. B divided by its standard error), gives a significance value of p < 0.05 in this example
r-squared here is (-0.321)² = 0.103 = 10.3%

… and of Multiple Regression
GHQ = c + (-0.47 x INCOME) + (-1.95 x HOUSING)
For B = -0.47, the t value gives p > 0.05 (not significant)
For B = -1.95, the t value gives p < 0.05 (significant)
The r-squared value for this regression is 0.236 (23.6%)
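
A minimal sketch of how such output might be produced, assuming statsmodels and pandas; the data below are simulated stand-ins (not the Pole and Lampard data), so the numbers will differ from those on the slide.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "INCOME": rng.normal(10, 3, 200),    # hypothetical income scores
    "HOUSING": rng.integers(0, 2, 200),  # hypothetical binary housing indicator
})
df["GHQ"] = 12 - 0.5 * df["INCOME"] - 2 * df["HOUSING"] + rng.normal(0, 3, 200)

X = sm.add_constant(df[["INCOME", "HOUSING"]])  # adds the constant term
model = sm.OLS(df["GHQ"], X).fit()              # ordinary least squares

print(model.params)    # constant and the two B values
print(model.tvalues)   # t = B / standard error of B
print(model.pvalues)   # significance of each B
print(model.rsquared)  # proportion of variation explained
```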

Interaction effects…
[Plot: square root of length of residence against age, with separate lines for women, all, and men]
In this situation there is an interaction between the effects of age and of gender, so B (the slope) varies according to gender and is greater for women

Logistic regression and odds ratios
Men: 1967/294 = 6.69 (to 1)
Women: 1980/511 = 3.87 (to 1)
Odds ratio: 6.69/3.87 = 1.73
Men: p/(1-p) = 3.87 x 1.73 = 6.69
Women: p/(1-p) = 3.87 x 1 = 3.87

Odds and log odds
Odds = constant x odds ratio
Log odds = log(constant) + log(odds ratio)

Men: log (p/(1-p)) = log(3.87) + log(1.73)
Women: log (p/(1-p)) = log(3.87) + log(1) = log(3.87)
In general: log (p/(1-p)) = constant + log(odds ratio)

Note that:
log(3.87) = 1.354
log(6.69) = 1.900
log(1.73) = 0.546
log(1) = 0
And that the ‘reverse’ of the logarithmic transformation is exponentiation

log (p/(1-p)) = constant + (B x SEX), where B = log(1.73) = 0.546; SEX = 1 for men, SEX = 0 for women
Log odds for men = 1.354 + 0.546 = 1.900
Log odds for women = 1.354 + 0 = 1.354
Exp(1.900) = 6.69 & Exp(1.354) = 3.87
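
A minimal sketch reproducing the arithmetic above, using only Python's math module and the counts from the slide.

```python
import math

odds_men = 1967 / 294    # about 6.69 (to 1)
odds_women = 1980 / 511  # about 3.87 (to 1)
odds_ratio = odds_men / odds_women  # about 1.73

constant = math.log(odds_women)  # about 1.354 (women are the reference category)
B = math.log(odds_ratio)         # about 0.546

log_odds_men = constant + B  # about 1.900
print(math.exp(log_odds_men), math.exp(constant))  # back to about 6.69 and 3.87
```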

Interpreting effects in Logistic Regression
In the above example: Exp(B) = Exp(log(1.73)) = 1.73 (the odds ratio!)
In general, effects in logistic regression analysis take the form of exponentiated Bs (Exp(B)), which are odds ratios
Odds ratios have a multiplicative effect on (the odds of) the outcome
Is a B of 0.546 (= log(1.73)) significant? In this case, p < 0.05 for this B

Back from odds to probabilities
Probability = odds / (1 + odds)
Men: 6.69 / (1 + 6.69) = 0.870
Women: 3.87 / (1 + 3.87) = 0.795
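
The same conversion in code, continuing the sketch above with the slide's rounded odds:

```python
odds_men, odds_women = 6.69, 3.87
print(odds_men / (1 + odds_men))      # about 0.870 for men
print(odds_women / (1 + odds_women))  # about 0.795 for women
```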

‘Multiple’ Logistic regression
log odds = c + (B1 x SEX) + (B2 x AGE) = c + (0.461 x SEX) + (-0.099 x AGE)
For B1 = 0.461, p < 0.05
For B2 = -0.099, p < 0.05
Exp(B1) = Exp(0.461) = 1.59
Exp(B2) = Exp(-0.099) = 0.905
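
A minimal sketch of fitting such a model, assuming statsmodels and simulated stand-in data (the ‘true’ coefficients below are borrowed from the slide only to generate plausible data, so the estimates will merely echo them roughly).

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
df = pd.DataFrame({
    "SEX": rng.integers(0, 2, 500),   # 1 = men, 0 = women
    "AGE": rng.uniform(18, 80, 500),  # hypothetical ages
})
log_odds = 1.0 + 0.461 * df["SEX"] - 0.099 * df["AGE"]  # assumed 'true' model
p = 1 / (1 + np.exp(-log_odds))                         # log odds -> probability
df["OUTCOME"] = (rng.random(500) < p).astype(int)       # simulated binary outcome

X = sm.add_constant(df[["SEX", "AGE"]])
model = sm.Logit(df["OUTCOME"], X).fit()  # logistic regression
print(model.params)          # c, B1 and B2 on the log-odds scale
print(np.exp(model.params))  # Exp(B): the odds ratios
print(model.pvalues)         # significance of each B
```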