Correlation and Linear Regression Peter T. Donnan Professor of Epidemiology and Biostatistics Statistics for Health Research.


CONTENTS
Correlation: coefficients, meaning, values, role, significance
Regression: line of best fit, prediction, significance

INTRODUCTION
Correlation: measures the strength of the linear relationship between two variables.
Regression analysis: determines the nature of the relationship.
Example: is there a relationship between the number of units of alcohol consumed and the likelihood of developing cirrhosis of the liver?

PEARSON’S COEFFICIENT OF CORRELATION (r)
Measures the strength of the linear relationship between one dependent and one independent variable; curvilinear relationships need other techniques.
Values lie between -1 and +1:
perfect positive correlation: r = +1
perfect negative correlation: r = -1
no linear relationship: r = 0
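As a minimal sketch in plain Python (not SPSS; the function name `pearson_r` is hypothetical), r can be computed from the sums of squares and cross-products about the means, $r = S_{xy} / \sqrt{S_{xx} S_{yy}}$. The third example is a U-shaped, curvilinear relationship for which r = 0 even though y depends exactly on x, which is why curvilinear relationships need other techniques.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient r for paired samples x and y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired samples of length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))        # 1.0: perfect positive
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))        # -1.0: perfect negative
print(pearson_r([1, 2, 3, 4, 5], [4, 1, 0, 1, 4]))  # 0.0: y = (x - 3)^2, curvilinear
```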

PEARSON’S COEFFICIENT OF CORRELATION
[Figure: example scatter plots with r = +1, r = -1, r = 0.6 and r = 0]

SCATTER PLOT
[Figure: scatter plot of BMD (dependent variable) against calcium intake (independent variable)]

NON-NORMAL DATA (figure not reproduced)

NORMALISED (figure not reproduced)

SPSS OUTPUT: SCATTER PLOT (output not reproduced)

SPSS OUTPUT: CORRELATIONS (output not reproduced)

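Interpreting correlation
A statistically significant r does not necessarily imply a strong correlation: significance increases with sample size, so in a large sample even a weak correlation can be significant.
A large r does not necessarily imply cause and effect: e.g. a strong correlation between the number of televisions sold and the number of cases of paranoid schizophrenia does not mean that watching TV causes paranoid schizophrenia; it may be due to an indirect relationship.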
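To illustrate the sample-size point, the sketch below (plain Python, hypothetical helper names) uses the usual test statistic for H0: ρ = 0, $t = r\sqrt{(n-2)/(1-r^2)}$ on n - 2 degrees of freedom. To stay within the standard library, the p-value is a large-sample Normal approximation rather than the exact t distribution, so treat it as approximate for small n. The same weak r = 0.1 goes from clearly non-significant to highly significant as n grows.

```python
import math

def t_for_r(r, n):
    """t statistic for H0: rho = 0, referred to a t distribution on n - 2 df."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

def approx_two_sided_p(t):
    """Large-sample Normal approximation to the two-sided p-value (not exact for small n)."""
    return math.erfc(abs(t) / math.sqrt(2))

r = 0.1  # a weak correlation: r^2 = 0.01, so only 1% of the variation is explained
for n in (50, 500, 5000):
    t = t_for_r(r, n)
    print(f"n = {n}: t = {t:.2f}, approximate p = {approx_two_sided_p(t):.2g}")
```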

Interpreting correlation
Variation in the dependent variable is due to:
the relationship with the independent variable: r²
random factors: 1 - r²
r² is the Coefficient of Determination, or variation explained. E.g. r² = 0.44 (|r| ≈ 0.66) means that less than half (44%) of the variation in the dependent variable is explained by the independent variable.
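A small sketch on made-up data (the function name is hypothetical) shows that, for a simple linear regression, r² equals the proportion of the total sum of squares explained by the fitted line, $1 - SS_{res}/SS_{tot}$, and 1 - r² is the unexplained remainder.

```python
def r_squared_two_ways(x, y):
    """Return (r**2, 1 - SS_res/SS_tot) for a simple linear regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)  # total sum of squares, SS_tot
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    r = sxy / (sxx * syy) ** 0.5
    b = sxy / sxx            # least-squares slope
    a = mean_y - b * mean_x  # least-squares intercept
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained
    return r ** 2, 1 - ss_res / syy

explained, check = r_squared_two_ways([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(f"explained: {explained:.2f}, unexplained: {1 - explained:.2f}")  # 0.64 and 0.36
```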

Agreement
Correlation should never be used to determine the level of agreement between repeated measures made with different measuring devices, users or techniques. It measures only the degree of linear relationship: you can have high correlation with poor agreement.

Non-parametric correlation
Makes no assumptions about the distribution of the data; carried out on ranks.
Spearman’s ρ (rho): easy to calculate.
Kendall’s τ (tau): has some advantages over ρ: its distribution has better statistical properties, and it is easier to identify concordant/discordant pairs.
Usually both lead to the same conclusions.

Role of regression
Shows how one variable changes with another, by determining the line of best fit, which may be linear or curvilinear.

Line of best fit
Simplest case: linear. Line of best fit between:
dependent variable Y: BMD
independent variable X: dietary intake of calcium
$Y = a + bX$, where a is the value of Y when X = 0 and b is the change in Y when X increases by 1.

Role of regression
Used to predict the value of the dependent variable when the value(s) of the independent variable(s) are known.
Predict only within the range of the known data: extrapolation is risky! (e.g. the relation between age and bone age)
Does not imply causality.
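One way to respect the "within the range of the known data" rule in code is to refuse predictions outside the observed x values. This is a hypothetical helper, not an SPSS feature; the intercept and slope in the example are made up.

```python
def predict_within_range(a, b, x_new, x_observed):
    """Predict y = a + b*x, refusing to extrapolate beyond the observed x values."""
    lo, hi = min(x_observed), max(x_observed)
    if not lo <= x_new <= hi:
        raise ValueError(
            f"x = {x_new} lies outside the observed range [{lo}, {hi}]; extrapolation is risky"
        )
    return a + b * x_new

# Hypothetical fitted line y = 1 + 2x, estimated from x values between 1 and 5
print(predict_within_range(1.0, 2.0, 3, [1, 2, 5]))
try:
    predict_within_range(1.0, 2.0, 10, [1, 2, 5])
except ValueError as err:
    print(err)
```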

SPSS OUTPUT: REGRESSION (output not reproduced)

Multiple regression
More than one independent variable, e.g. BMD dependent on: age, gender, calorific intake, use of bisphosphonates, exercise, etc.
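The same least-squares idea extends to several independent variables by adding a column per predictor to a design matrix. This sketch assumes NumPy is available and uses two made-up predictors (not the BMD variables) with an outcome built as 1 + 2·x1 - 0.5·x2, so the fit should recover those coefficients.

```python
import numpy as np

def fit_multiple_regression(X, y):
    """Least-squares coefficients [intercept, b1, b2, ...] for y = a + b1*x1 + b2*x2 + ..."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # prepend a column of 1s for the intercept
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Hypothetical data: two made-up predictors, outcome built as 1 + 2*x1 - 0.5*x2
x1 = np.array([1, 2, 3, 4, 5, 6], dtype=float)
x2 = np.array([2, 1, 4, 3, 6, 5], dtype=float)
y = 1 + 2 * x1 - 0.5 * x2
coef = fit_multiple_regression(np.column_stack([x1, x2]), y)
print(np.round(coef, 6))
```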

Summary
Correlation: strength of the linear relationship between two variables
Pearson’s: parametric
Spearman’s / Kendall’s: non-parametric
Interpret with care!
Regression: line of best fit, prediction
Multiple regression; logistic regression

Regression: Checking the Model Peter T. Donnan Professor of Epidemiology and Biostatistics Statistics for Health Research

Objectives of session
Recognise the need to check the fit of the model
Carry out checks of assumptions in SPSS for simple linear regression
Understand the predictive model
Understand residuals

How is the fitted line obtained?
Use the method of least squares (LS): seek to minimise the sum of squared vertical differences between each point and the fitted line.
This results in parameter estimates, or regression coefficients, for the slope (b) and intercept (a): $y = a + bx$.
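For one explanatory variable the least-squares estimates have a closed form, $b = S_{xy}/S_{xx}$ and $a = \bar{y} - b\bar{x}$. The sketch below (toy data, hypothetical function names) computes them and checks that nudging the slope increases the sum of squared vertical differences.

```python
def least_squares_line(x, y):
    """Intercept a and slope b minimising the sum of squared vertical distances."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept: the line passes through (mean_x, mean_y)
    return a, b

def sum_of_squares(a, b, x, y):
    """Sum of squared vertical differences between the points and the line y = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
a, b = least_squares_line(x, y)
print(a, b)
# Any other line has a larger sum of squares, e.g. nudging the slope:
print(sum_of_squares(a, b, x, y) < sum_of_squares(a, b + 0.1, x, y))  # True
```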

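Consider the fitted line $y = a + bx$
[Figure: fitted line with the explanatory variable (x) on the horizontal axis and the dependent variable (y) on the vertical axis; the intercept a is marked]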

Consider the regression of minimum LDL cholesterol achieved on age
Select Regression > Linear…
Dependent (y): Min LDL achieved
Independent (x): Age_Base

Output from SPSS linear regression
[Table: Coefficients(a) for Model 1, with rows (Constant) and Age at baseline, and columns Unstandardized Coefficients (B, Std. Error), Standardized Coefficients (Beta), t and Sig.; values not reproduced. a. Dependent Variable: Min LDL achieved]
N.B. The slope for age may look very small, but it represents the DECREASE in min LDL achieved for each one-unit increase in age, i.e. per ONE year.

H0: slope b = 0
Test statistic t = slope/SE, where SE = 0.002, giving p < 0.001, so the slope is statistically significant.
Predicted LDL = (Constant) + B × Age, using the values in the Coefficients table.
Output from SPSS linear regression: the same Coefficients(a) table (values not reproduced; a. Dependent Variable: Min LDL achieved).
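The t statistic in the output is the slope divided by its standard error, $t = b / \mathrm{SE}(b)$ with $\mathrm{SE}(b) = \sqrt{s^2 / S_{xx}}$ and $s^2 = \sum r_i^2 / (n-2)$. This sketch reproduces that calculation on toy data, not the LDL data. An exact p-value would come from a t distribution on n - 2 df (for example `2 * scipy.stats.t.sf(abs(t), n - 2)` if SciPy is available).

```python
import math

def slope_t_test(x, y):
    """Return (b, se_b, t) for H0: slope b = 0 in the simple regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = ss_res / (n - 2)       # residual variance on n - 2 degrees of freedom
    se_b = math.sqrt(s2 / sxx)  # standard error of the slope
    return b, se_b, b / se_b    # t = slope / se, compared with t on n - 2 df

b, se_b, t = slope_t_test([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(f"b = {b:.3f}, se = {se_b:.3f}, t = {t:.2f} on 3 df")
```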

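Prediction equation from linear regression
Predicted LDL achieved = a - 0.008 × Age, where a is the (Constant) from the Coefficients table.
So for a man aged 65, the predicted LDL achieved = a - 0.008 × 65 = a - 0.52.
[Table: Age vs Predicted Min LDL; values not reproduced]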

Assumptions of Regression
1. Relationship is linear
2. Outcome variable, and hence the residuals or error terms, are approximately Normally distributed

Use Graphs > Scatterplot to obtain the Lowess line of fit

1. Create a scatterplot, then double-click it to enter the Chart Editor.
2. Choose the icon ‘Add Fit Line at Total’.
3. Then select the type of fit, such as Lowess.

Linear assumption: fitted Lowess smoothed line
[Figure: scatterplot with a Lowess smoothed line (red) and the linear fit (green)]
The Lowess smoothed line gives a good eyeball check of the linear assumption.
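A simplified LOWESS can be sketched as follows: at each x, fit a straight line by weighted least squares to its nearest neighbours, with tricube weights that fall to zero at the edge of the neighbourhood. This sketch omits the robustness iterations of the full method, and `frac` (the share of points in each neighbourhood) is an assumed parameter; SPSS's own smoother settings may differ. `statsmodels.nonparametric.smoothers_lowess.lowess` is a fuller implementation if that package is installed.

```python
def lowess(x, y, frac=0.5):
    """Locally weighted linear fit at each x value (tricube weights, no robustness steps)."""
    n = len(x)
    k = min(n, max(3, round(frac * n)))  # number of nearest neighbours per local fit
    fitted = []
    for x0 in x:
        # Bandwidth h: distance from x0 to its k-th nearest neighbour
        h = sorted(abs(xi - x0) for xi in x)[k - 1]
        h = h if h > 0 else 1e-12  # guard against repeated x values
        weights = []
        for xi in x:
            u = abs(xi - x0) / h
            weights.append((1 - u ** 3) ** 3 if u < 1 else 0.0)  # tricube kernel
        sw = sum(weights)
        mx = sum(w * xi for w, xi in zip(weights, x)) / sw
        my = sum(w * yi for w, yi in zip(weights, y)) / sw
        sxx = sum(w * (xi - mx) ** 2 for w, xi in zip(weights, x))
        sxy = sum(w * (xi - mx) * (yi - my) for w, xi, yi in zip(weights, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))  # value of the local line at x0
    return fitted

# Curved made-up data: the smoother follows the curve where a single straight line cannot
ages = list(range(11))
curved = [v ** 2 for v in ages]
print([round(f, 1) for f in lowess(ages, curved)])
```

If the smoothed curve stays close to the straight fitted line, the linear assumption looks reasonable; a clear bend suggests trying another functional form.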

Definition of a residual
A residual is the difference between the actual value and the predicted value (fitted line), i.e. the unexplained variation:
$r_i = y_i - E(y_i)$, or $r_i = y_i - (a + b x_i)$
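A short sketch computing $r_i = y_i - (a + b x_i)$ from the least-squares line on toy data (hypothetical function name). Because the model includes an intercept, the residuals sum, and therefore average, to zero.

```python
def residuals(x, y):
    """Residuals r_i = y_i - (a + b*x_i) from the least-squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    a = mean_y - b * mean_x
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

res = residuals([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print([round(r, 2) for r in res])
print(abs(sum(res)) < 1e-9)  # True: with an intercept, least-squares residuals sum to zero
```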

Residuals

To assess the residuals in SPSS linear regression, select Plots…:
Plot the standardised (normalised) residual against the standardised predicted value of LDL.
Select Histogram of residuals and Normal probability plot.

In SPSS linear regression, select Statistics…:
Select Confidence intervals for the regression coefficients, and Model fit.
Select Durbin-Watson for serial correlation, and Casewise diagnostics for identification of outliers.

Output: scatterplot of residuals vs. predicted values
Note:
1) The mean of the residuals = 0
2) Most of the data lie within ±3 SDs of the mean


Output: histogram of standardised residuals
[Figure: histogram of residuals with a Normal curve superimposed]

Output: cumulative probability plot
Look for deviation from the diagonal line to indicate non-normality.
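The points on a Normal cumulative probability (P-P) plot can be sketched as pairs of observed and expected cumulative probabilities for the sorted, standardised residuals. The plotting position (i - 0.5)/n is an assumption here and SPSS may use a different formula; `max_deviation` is a hypothetical summary of how far the points stray from the diagonal.

```python
from statistics import NormalDist, mean, stdev

def pp_points(residuals):
    """(observed, expected) cumulative probabilities for a Normal P-P plot of residuals."""
    z = sorted((r - mean(residuals)) / stdev(residuals) for r in residuals)
    n = len(z)
    observed = [(i - 0.5) / n for i in range(1, n + 1)]  # assumed plotting positions
    expected = [NormalDist().cdf(zi) for zi in z]        # Normal cumulative probability
    return list(zip(observed, expected))

def max_deviation(points):
    """Largest vertical distance from the diagonal; large values suggest non-normality."""
    return max(abs(obs - exp_p) for obs, exp_p in points)

skewed = [0, 0, 0, 0, 0, 0, 0, 0, 1, 10]  # hypothetical, strongly right-skewed residuals
print(round(max_deviation(pp_points(skewed)), 2))
```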

Output: description of residuals
Descriptive statistics for residuals, and subjects with standardised residuals > 3: worth investigation?
[Table: Casewise Diagnostics(a), columns Case Number, Std. Residual, Min LDL, Predicted Value, Residual; values not reproduced. a. Dependent Variable: Min LDL achieved]
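A sketch of how such cases could be flagged outside SPSS: divide each residual by the standard error of the estimate, $s = \sqrt{SS_{res}/(n-2)}$ (to my understanding this matches SPSS's "standardized residual", but treat that as an assumption), then list the 1-based case numbers whose absolute value exceeds 3. Function names are hypothetical and the data are made up.

```python
import math

def standardised_residuals(x, y):
    """Residuals divided by the standard error of the estimate, sqrt(SS_res / (n - 2))."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    a = mean_y - b * mean_x
    res = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e ** 2 for e in res) / (n - 2))
    return [e / s for e in res]

def flag_cases(std_res, threshold=3.0):
    """1-based case numbers whose |standardised residual| exceeds the threshold."""
    return [case for case, z in enumerate(std_res, start=1) if abs(z) > threshold]

z = standardised_residuals([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print([round(v, 2) for v in z], flag_cases(z))
print(flag_cases([0.5, -3.2, 1.0, 3.5]))  # [2, 4]
```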

Output: model fit and serial correlation
R: correlation between min LDL achieved and age at baseline.
R²: % of variation explained, here 1.5%, not particularly high.
Durbin-Watson test: checks for serial correlation of the residuals; the statistic should be approximately 2 if there is no serial correlation.
[Table: Model Summary, columns R, R Square, Adjusted R Square, Std. Error of the Estimate, Durbin-Watson; values not reproduced. a. Predictors: (Constant), Age at baseline]
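The Durbin-Watson statistic is $\sum_{t=2}^{n}(e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$. It lies between 0 and 4, with values well below 2 suggesting positive and values well above 2 negative serial correlation, and it only makes sense when the rows have a meaningful order. The sketch also includes the standard adjusted R² formula shown in the Model Summary; the inputs are made up.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: about 2 if successive residuals are uncorrelated."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (excluding the intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(durbin_watson([1, -1, 1, -1]))         # 3.0: negative serial correlation
print(durbin_watson([1, 1, 1, -1, -1, -1]))  # well below 2: positive serial correlation
print(adjusted_r_squared(0.64, 5, 1))
```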

Summary
After fitting any regression model, check the assumptions:
Functional form: linearity is the default but often not the best fit; consider a quadratic term…
Check residuals for approximate normality
Check residuals for outliers (> 3 SDs)
All accomplished within SPSS

Practical on Model Checking
Read in ‘LDL Data.sav’.
1) Fit an age-squared term in the min LDL model and check the fit of the model compared with the linear fit. (Hint: use Transform > Compute to create the age-squared term, then fit age and age².)
2) Fit separate linear regressions of min Chol achieved on each of these predictors: (i) baseline Chol, (ii) APOE_lin, (iii) adherence. Check the assumptions and interpret the results.
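For readers working outside SPSS, part 1 can be sketched in Python with NumPy: fit a linear and a quadratic (age + age²) model and compare R². The ages and outcome below are hypothetical, built to curve with age, not the ‘LDL Data.sav’ values (pandas offers `read_spss`, which needs the pyreadstat package, if you want to load the file itself). A quadratic always fits at least as well in-sample, so in practice also compare adjusted R² or test the age² coefficient.

```python
import numpy as np

def r_squared(y, fitted):
    """1 - SS_res / SS_tot."""
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1 - ss_res / ss_tot

# Hypothetical ages and a made-up outcome that curves with age (not the LDL data)
age = np.arange(40, 81, 5, dtype=float)
outcome = 3.0 + 0.001 * (age - 60) ** 2

results = {}
for degree, label in ((1, "linear"), (2, "age + age^2")):
    coefs = np.polyfit(age, outcome, degree)  # highest power first
    results[label] = r_squared(outcome, np.polyval(coefs, age))
    print(label, round(results[label], 4))
```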