Copyright (c)Bani K. Mallick1 STAT 651 Lecture #21.

Presentation transcript:

Copyright (c)Bani K. Mallick1 STAT 651 Lecture #21

Copyright (c)Bani K. Mallick2 Topics in Lecture #21 Correlation

Copyright (c)Bani K. Mallick3 Book Sections Covered in Lecture #21 Chapter 11.7

Copyright (c)Bani K. Mallick4 Lecture 20 Review: Leverage and Outliers Outliers in linear regression are difficult to diagnose: they depend crucially on where X is. (Scatterplot with one point far from the others.) A boxplot of Y would flag this point as an outlier, when in reality it fits the line quite well

Copyright (c)Bani K. Mallick5 Lecture 20 Review: Outliers and Leverage It's also the case that one observation can have a dramatic impact on the fit. (Scatterplot with one point far to the right.) The slope of the line depends crucially on the value far to the right

Copyright (c)Bani K. Mallick6 Lecture 20 Review: Outliers and Leverage But outliers can occur. (Scatterplot showing the fitted line with the outlier and the fitted line without the outlier.) This point is simply too high for its value of X

Copyright (c)Bani K. Mallick7 Lecture 20 Review: Outliers and Leverage A leverage point is an observation with a value of X that is outlying among the X values. An outlier is an observation of Y that seems not to agree with the main trend of the data. Outliers and leverage values can distort the fitted least squares line. It is thus important to have diagnostics to detect when disaster might strike

Copyright (c)Bani K. Mallick8 Lecture 20 Review: Outliers and Leverage We have three methods for diagnosing high leverage values and outliers: leverage plots (for a single X, these are basically the same as boxplots of the X-space), Cook's distance (measures how much the fitted line changes if the observation is deleted), and residual plots
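These diagnostics come from the statistics package; as a rough illustration outside SPSS, here is a minimal Python sketch (statsmodels, with made-up placeholder x and y, not the course data) of how leverage values and Cook's distances could be pulled from a simple linear regression fit.

    # Hypothetical illustration, not part of the lecture.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = rng.normal(size=30)                               # placeholder predictor
    y = 2.0 + 1.5 * x + rng.normal(scale=0.5, size=30)    # placeholder response

    fit = sm.OLS(y, sm.add_constant(x)).fit()   # least squares fit with intercept
    influence = fit.get_influence()

    leverage = influence.hat_matrix_diag        # large values flag high-leverage X's
    cooks_d = influence.cooks_distance[0]       # how much the fit changes if a point is deleted
    residuals = fit.resid                       # raw residuals, for residual plots

    print(leverage.round(2))
    print(cooks_d.round(2))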

Copyright (c)Bani K. Mallick9 Correlation and Measures of Fit You all know the word “correlation”, as in “Height and Weight are positively correlated”. Many of you may also have heard of R-squared, denoted by R². Both are measures of how well an independent variable predicts a dependent variable

Copyright (c)Bani K. Mallick10 Correlation and Measures of Fit R² measures the fraction of variance explained by the least squares line. The relevant sums of squares are SS(Total) = Σ(Yᵢ − Ȳ)², the total variation of Y about its mean, and SS(Residual) = Σ(Yᵢ − Ŷᵢ)², the variation left over after the fit. The fraction of the total sum of squares explained by the fitted line is R² = 1 − SS(Residual)/SS(Total)
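As a quick illustration (a sketch with made-up numbers, not the course software), the decomposition above can be computed directly; np.polyfit stands in for any least squares routine.

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    b1, b0 = np.polyfit(x, y, deg=1)          # slope and intercept of the fitted line
    y_hat = b0 + b1 * x                       # fitted values

    ss_total = np.sum((y - y.mean()) ** 2)    # SS(Total)
    ss_resid = np.sum((y - y_hat) ** 2)       # SS(Residual)
    r_squared = 1.0 - ss_resid / ss_total     # fraction of variance explained

    print(round(r_squared, 3))                # close to 1: nearly a perfect line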

Copyright (c)Bani K. Mallick11 Correlation and Measures of Fit R² measures the fraction of variance explained by the least squares line. If Y and X are perfectly linearly related, then all the variation in Y is explained by the line, and thus R² = 1. If Y and X are completely independent, then the line explains nothing about Y, so R² = 0. However, Y and X can be perfectly related but not linearly, and R² is misleading in this case (see later on)

Copyright (c)Bani K. Mallick12 GPA and Height Note that this is a fairly weak relationship, so little variance explained: suggests R-squared is near zero

Copyright (c)Bani K. Mallick13 GPA and Height

Copyright (c)Bani K. Mallick14 Aortic Valve Area and Body Surface Area Note that this is a stronger relationship: suggests R-squared is higher

Copyright (c)Bani K. Mallick15 AVA and BSA in Healthy Kids

Copyright (c)Bani K. Mallick16 Correlation and Measures of Fit The (Pearson) correlation coefficient measures how well Y and X are linearly related. The correlation is always between –1 and +1

Copyright (c)Bani K. Mallick17 Correlation and Measures of Fit If the correlation = +1, then Y and X are perfectly positively related. If the correlation = -1, then Y and X are perfectly negatively related. If the correlation = 0, then Y and X are not linearly related
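For concreteness, here is a small sketch (made-up numbers) computing the Pearson correlation from its usual definition, with numpy's built-in version as a check.

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.0, 1.5, 3.5, 3.0, 5.0])

    xd = x - x.mean()
    yd = y - y.mean()
    r = np.sum(xd * yd) / np.sqrt(np.sum(xd ** 2) * np.sum(yd ** 2))

    print(round(r, 3), round(np.corrcoef(x, y)[0, 1], 3))   # the two values agree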

Copyright (c)Bani K. Mallick18 Correlation and Measures of Fit The (Spearman) correlation coefficient measures how well Y and X are monotonically related. Replace Y by its rank among the Y’s. Replace X by its rank among the X’s. Compute the (Pearson) correlation. Why would someone do a Spearman correlation?

Copyright (c)Bani K. Mallick19 Correlation and Measures of Fit The (Spearman) correlation coefficient measures how well Y and X are monotonically related. Replace Y by its rank among the Y’s. Replace X by its rank among the X’s. Compute the (Pearson) correlation. Why would someone do a Spearman correlation? Because it is more robust to outliers, and it is not affected by transformations
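The rank recipe above translates directly into code; a sketch (made-up numbers, scipy assumed available) with one extreme X value.

    import numpy as np
    from scipy import stats

    x = np.array([1.0, 2.0, 3.0, 4.0, 50.0])   # note the extreme X value
    y = np.array([2.0, 3.0, 5.0, 6.0, 40.0])

    rx = stats.rankdata(x)                          # replace X by its ranks
    ry = stats.rankdata(y)                          # replace Y by its ranks
    spearman_by_hand = np.corrcoef(rx, ry)[0, 1]    # Pearson correlation of the ranks

    print(spearman_by_hand, stats.spearmanr(x, y)[0])   # identical values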

Copyright (c)Bani K. Mallick20 Correlation and Measures of Fit Both types of correlations are easily obtained in SPSS. Go to “Analyze”, “Correlation” and type in all the variables that you want correlations for. You have to click on Spearman to get it, otherwise you get only Pearson. Confidence intervals for the population correlations are not included. SPSS demonstration using aortic data
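Outside SPSS, the same Pearson and Spearman matrices can be produced in one step; a pandas sketch with placeholder column names and simulated values (not the actual aortic data file):

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(1)
    df = pd.DataFrame({"bsa": rng.normal(1.0, 0.2, 70)})      # placeholder "body surface area"
    df["ava"] = 2.0 * df["bsa"] + rng.normal(0.0, 0.1, 70)    # placeholder "aortic valve area"
    df["log_ava"] = np.log(1.0 + df["ava"])

    print(df.corr(method="pearson"))    # Pearson correlation matrix
    print(df.corr(method="spearman"))   # Spearman correlation matrix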

Copyright (c)Bani K. Mallick21 Correlation and Measures of Fit The diagonals are meaningless: Y is perfectly correlated with Y. (SPSS Pearson correlation matrix for Body Surface Area, Aortic Valve Area, and Log(1+Aortic Valve Area), each entry with its 2-tailed significance and N; the visible off-diagonal correlations are .873 and .982, both flagged ** as significant at the 0.01 level, 2-tailed.)

Copyright (c)Bani K. Mallick22 Correlation and Measures of Fit Note how the Spearman correlation of BSA and AVA is the same as the Spearman correlation of BSA and log(1+AVA)

Copyright (c)Bani K. Mallick23 Correlation and Measures of Fit Both correlations are random variables, i.e., if you redo an experiment, you will get a different Pearson correlation. The population Pearson correlation is denoted by ρ (the correlation you would see for the entire population). The estimated standard error of the sample Pearson correlation r is √((1 − r²)/(n − 2))

Copyright (c)Bani K. Mallick24 Correlation and Measures of Fit Null hypothesis of no linear relationship: H₀: ρ = 0. A (1 − α)100% CI for the population Pearson correlation ρ is r ± z_{α/2} √((1 − r²)/(n − 2)). Since the population correlation must be between –1 and +1, you should restrict your interval to that range: reject the null if the interval does not include 0
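A small sketch of this interval (assuming the large-sample form r ± z·√((1 − r²)/(n − 2)) stated above, and clipping to the legal range):

    import numpy as np
    from scipy import stats

    def pearson_ci(r, n, conf=0.95):
        z = stats.norm.ppf(1.0 - (1.0 - conf) / 2.0)   # z critical value
        se = np.sqrt((1.0 - r ** 2) / (n - 2))         # estimated standard error of r
        lo, hi = r - z * se, r + z * se
        return max(lo, -1.0), min(hi, 1.0)             # restrict to [-1, +1]

    # Illustrative value only (r chosen near the aortic example, not quoted from it):
    print(pearson_ci(0.87, 70))   # roughly (0.75, 0.99)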

Copyright (c)Bani K. Mallick25 Correlation and Measures of Fit Consider the healthy kids in the aortic stenosis data: n = 70. The 95% CI for the population Pearson correlation is (.747, .985). What is the meaning of this interval?

Copyright (c)Bani K. Mallick26 Correlation and Measures of Fit Consider the healthy kids in the aortic stenosis data: n = 70, and the 95% CI is (.747, .985). What is the meaning of this interval? We are 95% certain that the population Pearson correlation ρ is between .747 and .985

Copyright (c)Bani K. Mallick27 Some Warnings About Correlation The Pearson correlation can be greatly affected by outliers and leverage values. This is why it is good to have the Spearman

Copyright (c)Bani K. Mallick28 Aortic Stenosis Data: Note the outlier in the Stenotic Kids

Copyright (c)Bani K. Mallick29 Some Warnings About Correlation The Pearson correlation for the stenotic kids changes noticeably depending on whether the outlier is included or excluded, while the Spearman correlations with and without the outlier are nearly identical. I can make correlations dance

Copyright (c)Bani K. Mallick30 Some Warnings About Correlation The two correlations are dramatically different (1.00 for the right-hand plot), yet only one point differs: a high leverage outlier. (Scatterplots of made up data with (left) and without (right) a high leverage outlier; the left panel is labeled “Outlier added” and shows the fitted linear regression line.)
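A sketch reproducing the same effect with made-up numbers (hypothetical data, not the slide's): eleven points exactly on a line, then one high-leverage outlier added far to the right.

    import numpy as np
    from scipy import stats

    x_clean = np.arange(1.0, 12.0)          # 11 points exactly on a line
    y_clean = 2.0 + 0.5 * x_clean

    x_out = np.append(x_clean, 30.0)        # one added point far to the right...
    y_out = np.append(y_clean, 6.8)         # ...well below where the line would predict

    print(np.corrcoef(x_clean, y_clean)[0, 1], np.corrcoef(x_out, y_out)[0, 1])    # 1.00 vs roughly 0.7
    print(stats.spearmanr(x_clean, y_clean)[0], stats.spearmanr(x_out, y_out)[0])  # Spearman barely moves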

Copyright (c)Bani K. Mallick31 Some Warnings About Correlation The Pearson correlation only measures linear correlation. If your relationship is not linear, then Pearson will get confused

Copyright (c)Bani K. Mallick32 Some Warnings About Correlation Note the perfect quadratic relationship: Pearson correlation = 0
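A minimal sketch of this trap (made-up numbers): Y is an exact function of X, yet the Pearson correlation is essentially zero.

    import numpy as np

    x = np.linspace(-3.0, 3.0, 13)    # symmetric around zero
    y = x ** 2                        # perfect quadratic relationship, but not linear

    print(np.corrcoef(x, y)[0, 1])    # essentially 0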

Copyright (c)Bani K. Mallick33 Construction Data

Copyright (c)Bani K. Mallick34 Construction Data (SPSS Pearson correlation matrix for Age (Modified), Base Pay (modified), and Log(Base Pay modified - $30,000), each entry with its 2-tailed significance and N; the visible off-diagonal correlations are .120, flagged * as significant at the 0.05 level, and .896, flagged ** as significant at the 0.01 level, both 2-tailed.)

Copyright (c)Bani K. Mallick35 Construction Data

Copyright (c)Bani K. Mallick36 Armspan Data (Males)

Copyright (c)Bani K. Mallick37 Armspan Data (Males)

Copyright (c)Bani K. Mallick38 Armspan Data (Males) A 95% confidence interval for the population Pearson correlation is shown. Meaning?

Copyright (c)Bani K. Mallick39 Armspan Data (Males)