Correlation & the Coefficient of Determination

Slides:



Advertisements
Similar presentations
Chapter 3 Examining Relationships Lindsey Van Cleave AP Statistics September 24, 2006.
Advertisements

SADC Course in Statistics Analysis of Variance for comparing means (Session 11)
SADC Course in Statistics Multiple Linear Regresion: Further issues and anova results (Session 07)
SADC Course in Statistics Estimating population characteristics with simple random sampling (Session 06)
SADC Course in Statistics Simple Linear Regression (Session 02)
The Poisson distribution
SADC Course in Statistics Further ideas concerning confidence intervals (Session 06)
SADC Course in Statistics Trends in time series (Session 02)
Assumptions underlying regression analysis
SADC Course in Statistics Basic principles of hypothesis tests (Session 08)
SADC Course in Statistics Inferences about the regression line (Session 03)
SADC Course in Statistics Revision of key regression ideas (Session 10)
SADC Course in Statistics Session 4 & 5 Producing Good Tables.
SADC Course in Statistics Comparing two proportions (Session 14)
SADC Course in Statistics A model for comparing means (Session 12)
SADC Course in Statistics Modelling ideas in general – an appreciation (Session 20)
Probability Distributions
Chapter 4: Basic Estimation Techniques
Multiple Regression. Introduction In this chapter, we extend the simple linear regression model. Any number of independent variables is now allowed. We.
Correlation and Regression
Simple Linear Regression Analysis
Copyright © 2012 by Nelson Education Limited. Chapter 13 Association Between Variables Measured at the Interval-Ratio Level 13-1.
Multiple Regression and Model Building
Chapter 16: Correlation.
Correlation and regression
Regression Analysis Once a linear relationship is defined, the independent variable can be used to forecast the dependent variable. Y ^ = bo + bX bo is.
Warm up Use calculator to find r,, a, b. Chapter 8 LSRL-Least Squares Regression Line.
Chapter 8 Linear Regression © 2010 Pearson Education 1.
CORRELATON & REGRESSION
2.2 Correlation Correlation measures the direction and strength of the linear relationship between two quantitative variables.
The Simple Regression Model
SIMPLE LINEAR REGRESSION
SADC Course in Statistics Comparing Regressions (Session 14)
SIMPLE LINEAR REGRESSION
Correlation and Regression Analysis
Linear Regression/Correlation
So are how the computer determines the size of the intercept and the slope respectively in an OLS regression The OLS equations give a nice, clear intuitive.
Review for Final Exam Some important themes from Chapters 9-11 Final exam covers these chapters, but implicitly tests the entire course, because we use.
Linear Regression Analysis
Correlation & Regression
McGraw-Hill/IrwinCopyright © 2009 by The McGraw-Hill Companies, Inc. All Rights Reserved. Simple Linear Regression Analysis Chapter 13.
February  Study & Abstract StudyAbstract  Graphic presentation of data. Graphic presentation of data.  Statistical Analyses Statistical Analyses.
Overview 4.2 Introduction to Correlation 4.3 Introduction to Regression.
SIMPLE LINEAR REGRESSION
Correlation Scatter Plots Correlation Coefficients Significance Test.
Relationships between Variables. Two variables are related if they move together in some way Relationship between two variables can be strong, weak or.
Chapter 14 – Correlation and Simple Regression Math 22 Introductory Statistics.
September In Chapter 14: 14.1 Data 14.2 Scatterplots 14.3 Correlation 14.4 Regression.
Biostatistics Unit 9 – Regression and Correlation.
1 Chapter 3: Examining Relationships 3.1Scatterplots 3.2Correlation 3.3Least-Squares Regression.
Ms. Khatijahhusna Abd Rani School of Electrical System Engineering Sem II 2014/2015.
Multiple Linear Regression. Purpose To analyze the relationship between a single dependent variable and several independent variables.
Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 8 Linear Regression.
Multiple Regression and Model Building Chapter 15 Copyright © 2014 by The McGraw-Hill Companies, Inc. All rights reserved.McGraw-Hill/Irwin.
Scatterplot and trendline. Scatterplot Scatterplot explores the relationship between two quantitative variables. Example:
© Copyright McGraw-Hill Correlation and Regression CHAPTER 10.
Chapter 4 Summary Scatter diagrams of data pairs (x, y) are useful in helping us determine visually if there is any relation between x and y values and,
1 Virtual COMSATS Inferential Statistics Lecture-25 Ossam Chohan Assistant Professor CIIT Abbottabad.
© Buddy Freeman, 2015 Let X and Y be two normally distributed random variables satisfying the equality of variance assumption both ways. For clarity let.
AP Statistics HW: p. 165 #42, 44, 45 Obj: to understand the meaning of r 2 and to use residual plots Do Now: On your calculator select: 2 ND ; 0; DIAGNOSTIC.
Residuals Recall that the vertical distances from the points to the least-squares regression line are as small as possible.  Because those vertical distances.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Chapter 10 Correlation and Regression 10-2 Correlation 10-3 Regression.
CORRELATION ANALYSIS.
11-1 Copyright © 2014, 2011, and 2008 Pearson Education, Inc.
Regression Analysis AGEC 784.
Correlation and Simple Linear Regression
Regression Analysis.
1) A residual: a) is the amount of variation explained by the LSRL of y on x b) is how much an observed y-value differs from a predicted y-value c) predicts.
Correlation and Regression
Ch 4.1 & 4.2 Two dimensions concept
Presentation transcript:

Correlation & the Coefficient of Determination SADC Course in Statistics Correlation & the Coefficient of Determination (Session 04)

Learning Objectives At the end of this session, you will be able to understand the meaning and limitations of Pearson’s coefficient of correlation (r) describe what is meant by the coefficient of determination (R2) and how it relates to r for a simple linear regression model derive the value of R2 using results of an analysis of variance.

What is Correlation? The term “correlation” refers to a measure of the strength of association between two variables. If the two variables increase or decrease together, they have a positive correlation. If, increases in one variable are associated with decreases in the other, they have a negative correlation.

Linear Correlation (r) For two quantitative variables X and Y, for which n pairs of measurements (xi, yi) are available, Pearson’s correlation coefficient (r) gives a measure of the linear association between X and Y. The formula is given below for reference.

Linear Correlation (r) If X and Y are perfectly positively correlated, r = 1 If there is absolutely no association, r = 0 If X and Y are perfectly positively correlated, r = -1 Thus -1 < r < +1 The closer r is to +1 or -1, the greater is the strength of the association.

Possible values for r

Coefficient of Determination It is often difficult to interpret r without some familiarity with the expected values of r. A more appropriate measure to use when interest lies in the dependence of Y on X, is the Coefficient of Determination, R2. It measures the proportion of variation in Y that is explained by X, and is often expressed as a percentage.

Using anova to find R2 Anova (for 93 rural female headed HHs) of log consumption expenditure versus number of persons per sleeping room is: Source d.f. S.S. M.S. F Prob. Regression 1 4.890 21.9 0.000 Residual 91 20.342 0.2235 Total 92 25.231 0.2743 R2 = Regre. S.S. / Total S.S. = 4.89/25.23 = 0.194

Interpretation of R2 From above, we can say that 19.4% of the variability in the income poverty proxy measure is accounted for by the number of persons per sleeping room. Clearly there are many other factors that influence the poverty proxy since over 80% of the variability is left unexplained!

Relationship of R2 to r When there is just one explanatory variable being considered (as in above example), the squared value of r equals R2. In the above example, value of r = -  0.194 = - 0.44 The negative value is used when taking the square root because the graph indicates a negative relationship (see next slide).

Plot of poverty proxy measure vs. persons per sleeping room

Benefits of R2 and r r is useful as an initial exploratory tool when several variables are being considered. The sign of r gives the direction of the association. R2 is useful in regression studies to check how much of the variability in the key response can be explained. R2 is most valuable when there is more than one explanatory variable. High values of R2 are particularly useful when using the model for predictions! (More on this later!)

Limitations of r Observe that seemingly high values of r, e.g. r=0.70, explain only about 50% of the variability in the response variable y. So take care when interpreting correlation coefficients. a low value for r does not necessarily imply absence of a relationship – could be a curved relationship! So plotting the data is crucial! Tests exist for testing there is no association. But depending on the sample size, even low values of r, e.g. r=0.20 can give significant results – not a very useful finding!

Limitations of R2 Note that R2 is only a descriptive measure to give a quick assessment of the model. Other methods exist for assessing the goodness of fit of the model. Adding explanatory variables to the model always increases R2 . Hence in practice, it is more usual to look at the adjusted R2 The adjusted R2 is calculated as 1 – (Residual M.S./Total M.S.) As with R2 , the adjusted R2 is often expressed as a percentage.

Practical work follows to ensure learning objectives are achieved…