Lecture 16 Correlation and Coefficient of Correlation

Presentation transcript:

Lecture 16 Correlation and Coefficient of Correlation By Aziza Munir

Learning Objectives
What is correlation?
What does it indicate?
What is the purpose of correlation if regression is already available?
What does the coefficient of correlation indicate?
Linear, multiple, and partial correlation

Introduction
Correlation is a LINEAR association between two random variables.
Correlation analysis shows us how to determine both the nature and the strength of the relationship between two variables.
Correlation is applied when variables are dependent on time.
Correlation lies between -1 and +1.

A zero correlation indicates that there is no linear relationship between the variables.
A correlation of –1 indicates a perfect negative correlation.
A correlation of +1 indicates a perfect positive correlation.

Types of Correlation
There are three types of correlation: Type 1, Type 2, and Type 3.

Type 1: Positive, Negative, No, and Perfect Correlation
Positive: two related variables are such that when one increases (decreases), the other also increases (decreases).
Negative: two variables are such that when one increases (decreases), the other decreases (increases).
No correlation: the two variables are independent.

Type 2: Linear and Non-linear
Linear: when plotted on a graph, the points tend to fall on a straight line.
Non-linear: when plotted on a graph, the points do not fall on a straight line.

Type 3: Simple, Multiple, and Partial
Simple: two variables, one independent and one dependent.
Multiple: one dependent variable and more than one independent variable.
Partial: one dependent variable and more than one independent variable, but only one independent variable is considered while the other independent variables are held constant.

Methods of Studying Correlation
Scatter Diagram Method
Karl Pearson's Coefficient of Correlation Method
Spearman's Rank Correlation Method

Correlation: Linear Relationships
Strong relationship = good linear fit
Figure panels: very good fit; moderate fit.
Points clustered closely around a line show a strong correlation. The line is a good predictor (good fit) of the data.
The more spread out the points, the weaker the correlation and the poorer the fit.
The line is a REGRESSION line (Y = bX + a).

Coefficient of Correlation
A measure of the strength of the linear relationship between two variables, defined as the (sample) covariance of the variables divided by the product of their (sample) standard deviations.
Represented by "r".
r lies between -1 and +1.
It conveys both magnitude and direction.
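The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the lecturer's own code: the function name `pearson_r` and the data in `hours` and `scores` are made up for the example, and the sample (n - 1) divisor is assumed for both the covariance and the standard deviations.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation: covariance / (sd_x * sd_y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance and sample standard deviations (divisor n - 1).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # Note: this divides by zero if either variable is constant.
    return cov_xy / (sd_x * sd_y)

# Made-up illustrative data (not from the lecture).
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
r = pearson_r(hours, scores)
print(round(r, 4))
```

The sign of the result gives the direction of the relationship and its absolute value gives the magnitude.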

-1 ≤ r ≤ +1
The + and – signs are used for positive linear correlations and negative linear correlations, respectively.

In the formula for r, the shared variability of the X and Y variables is on the top (numerator), and the individual variability of the X and Y variables is on the bottom (denominator).
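The formula image from this slide did not survive in the transcript. One standard way of writing the Pearson coefficient that matches the description (shared variability on top, individual variability on the bottom) is sketched here. The symbols x_i, y_i, and the bars for sample means are notational assumptions, not taken from the slide.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

Dividing the numerator and denominator by n - 1 gives the equivalent form: sample covariance over the product of the two sample standard deviations.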

Interpreting the Correlation Coefficient r
Strong correlation: r > .70 or r < –.70
Moderate correlation: r is between .30 and .70, or between –.30 and –.70
Weak correlation: r is between 0 and .30, or between 0 and –.30
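These cut-offs can be written as a small helper. This is only a sketch: the function name `describe_strength` is hypothetical, and the slide does not say which label applies exactly at .30 and .70, so the boundary handling below is an assumption.

```python
def describe_strength(r):
    """Label a correlation as strong, moderate, or weak using the slide's cut-offs."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # strength depends on magnitude only, not on sign
    if size > 0.70:
        return "strong"
    if size >= 0.30:  # assumption: exactly .30 and .70 count as moderate
        return "moderate"
    return "weak"

for value in (0.85, -0.50, 0.10):
    print(value, describe_strength(value))
```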

Coefficient of Determination
The coefficient of determination lies between 0 and 1.
It is represented by r².
It is a measure of how well the regression line represents the data.
If the regression line passed exactly through every point on the scatter plot, it would be able to explain all of the variation.
The further the line is from the points, the less it is able to explain.

r² is useful because it gives the proportion of the variance (fluctuation) of one variable that is predictable from the other variable.
It is a measure that allows us to determine how certain one can be in making predictions from a certain model/graph.
The coefficient of determination is the ratio of the explained variation to the total variation.
The coefficient of determination is such that 0 ≤ r² ≤ 1, and it denotes the strength of the linear association between x and y.
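The "explained variation over total variation" ratio can be computed directly from a fitted least-squares line. The sketch below is illustrative only: the function name `r_squared` and the data are made up, and the line is fitted with the usual least-squares slope and intercept.

```python
def r_squared(xs, ys):
    """Coefficient of determination as explained variation / total variation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * x for x in xs]
    # Explained variation: how far the predictions move away from the mean of y.
    explained = sum((p - mean_y) ** 2 for p in predicted)
    # Total variation: how far the observed values move away from the mean of y.
    total = sum((y - mean_y) ** 2 for y in ys)
    return explained / total

# Made-up illustrative data (not from the lecture).
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
print(round(r_squared(hours, scores), 3))
```

For this data the ratio is 0.6, so 60% of the variation in `scores` is explained by the line and 40% is not.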

The coefficient of determination indicates how closely the data lie to the line of best fit.
For example, if r = 0.922, then r² = 0.850, which means that 85% of the total variation in y can be explained by the linear relationship between x and y (as described by the regression equation).
The other 15% of the total variation in y remains unexplained.

Spearman's Rank Coefficient
A method for determining correlation when the data are not available in numerical form; as an alternative, the method of rank correlation is used.
When the values of the two variables are converted to their ranks and the correlation is obtained from those ranks, the correlation is known as rank correlation.

Computation of Rank Correlation
Spearman's rank correlation coefficient ρ can be calculated when:
Actual ranks are given
Ranks are not given, but grades are given and not repeated
Ranks are not given, and grades are given and repeated
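The first two cases can be sketched with the usual textbook formula ρ = 1 - 6Σd² / (n(n² - 1)), where d is the difference between paired ranks. The slides do not show this formula, so treat it as an assumption about what the lecture intends. The helper names `ranks` and `spearman_rho` and the judge data are hypothetical. Repeated grades (the third case) need tied, averaged ranks and are deliberately not handled here.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n. Assumes no repeated values."""
    if len(set(values)) != len(values):
        raise ValueError("repeated values need tied (average) ranks; not handled here")
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rho(rank_x, rank_y):
    """Spearman's rho from two lists of ranks without ties."""
    n = len(rank_x)
    if n != len(rank_y) or n < 2:
        raise ValueError("need two equal-length rank lists with at least 2 items")
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Case 1: actual ranks are given (made-up rankings from two judges).
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
rho = spearman_rho(judge_a, judge_b)

# Case 2: grades are given and not repeated, so convert them to ranks first.
grade_ranks = ranks([10, 30, 20])
print(rho, grade_ranks)
```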

Testing Significance of Correlation
Test for the significance of relationships between two CONTINUOUS variables.
We introduced Pearson correlation as a measure of the STRENGTH of a relationship between two variables, but any relationship should be assessed for its SIGNIFICANCE as well as its strength.
What follows is a general discussion of significance tests for relationships between two continuous variables.
Factors in relationships between two variables:
The strength of the relationship is indicated by the correlation coefficient, r, but is actually measured by the coefficient of determination, r².
The significance of the relationship is expressed in probability levels, p (e.g., significant at p = .05). This tells how unlikely a given correlation coefficient, r, would be to occur given no relationship in the population.
NOTE: The smaller the p-level, the more significant the relationship. BUT: the larger the correlation, the stronger the relationship.

Consider the classical model for testing significance.
It assumes that you have a sample of cases from a population.
The question is whether your observed statistic for the sample is likely to be observed given some assumption about the corresponding population parameter.
If your observed statistic does not exactly match the population parameter, perhaps the difference is due to sampling error.
The fundamental question: is the difference between what you observe and what you expect, given the assumption about the population, large enough to be significant, that is, large enough to reject the assumption?
The greater the difference (the more the sample statistic deviates from the population parameter), the more significant it is; that is, the less likely (small probability values) that the population assumption is true.
The classical model makes some assumptions about the population parameter.
Population parameters are expressed as Greek letters, while corresponding sample statistics are expressed in lower-case Roman letters:
r = correlation between two variables in the sample
ρ (rho) = correlation between the same two variables in the population
A common assumption is that there is NO relationship between X and Y in the population: ρ = 0.0
Under this common null hypothesis in correlational analysis, the expected value of r is 0.0.

Testing for the significance of the correlation coefficient, r
When the test is against the null hypothesis ρxy = 0.0, what is the likelihood of drawing a sample with rxy ≠ 0.0?
The sampling distribution of r is approximately normal (but bounded at -1.0 and +1.0) when N is large, and follows a t distribution when N is small.
The simplest formula for computing the appropriate t value to test the significance of a correlation coefficient employs the t distribution:
t = r × sqrt((N − 2) / (1 − r²))
The degrees of freedom for entering the t distribution are N − 2.
Example: Suppose you observe that r = .50 between literacy rate and political stability in 10 nations.
Is this relationship "strong"? The coefficient of determination = r-squared = .25, which means that 25% of the variance in political stability is "explained" by literacy rate.
Is the relationship "significant"? That remains to be determined using the formula above, with r = .50 and N = 10.

Comments
Set the level of significance (assume .05).
Determine whether to use a one- or two-tailed test (aim for one-tailed).
For 8 df and a one-tailed test, the critical value of t = 1.86.
We observe only t = 1.63, which lies below the critical t of 1.86.
So the null hypothesis of no relationship in the population (ρ = 0) cannot be rejected.
Note that a relationship can be strong and yet not significant. Conversely, a relationship can be weak but significant.
The key factor is the size of the sample.
For small samples, it is easy to produce a strong correlation by chance, and one must pay attention to significance to keep from jumping to conclusions, i.e., rejecting a true null hypothesis, which means making a Type I error.
For large samples, it is easy to achieve significance, and one must pay attention to the strength of the correlation to determine whether the relationship explains very much.
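The worked example can be checked numerically. The sketch below assumes the standard test statistic t = r × sqrt((N − 2) / (1 − r²)), which reproduces the t = 1.63 quoted in the slide. The function name `t_for_correlation` is hypothetical, and the critical value 1.86 is copied from the slide rather than computed.

```python
import math

def t_for_correlation(r, n):
    """t statistic for testing the null hypothesis rho = 0, with n - 2 degrees of freedom."""
    if n < 3 or not -1.0 < r < 1.0:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * math.sqrt((n - 2) / (1 - r ** 2))

r, n = 0.50, 10
t = t_for_correlation(r, n)
critical_t = 1.86  # one-tailed, .05 level, 8 df (value taken from the slide)
reject_null = t > critical_t
print(round(t, 2), reject_null)
```

With the same r = .50 but a larger sample the statistic grows (for example, N = 30 gives t of about 3.06), which is the sample-size effect described in the comments above.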

Correlation Summary
The most common form (Pearson) is used with two continuous variables in a linear association.
Spearman is used with ranked data and with curvilinear (monotonic) associations.
Point-biserial is used whenever an independent-samples t-test can be used.
Phi is used when a chi-square for goodness of fit (with just 2 levels per variable) can be used.
Correlation can vary between -1 and +1.
It does not tell anything about causation.

Difference between Correlation and Regression
The correlation coefficient, R, measures the strength of bivariate association.
The regression line is a prediction equation that estimates the values of y for any given x.

Back to the idea of prediction
With correlation, you can predict the value of one variable based on the value of another variable.
If you know about someone's marital problems, you can predict that person's level of satisfaction.
But if you knew more about that person, you could do an even better job of predicting satisfaction.
Regression: used to predict one quantitative variable from a whole mess of quantitative variables.

Building up to regression
First, the equation for a line: Y = bX + a (AKA Y = mX + b).
Both forms have an intercept and a slope.
Intercept = predicted value of Y when X is zero
Slope = how much Y is predicted to change as X changes by one unit
Goal of the regression line: minimize the discrepancy between predicted and actual values of Y
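A line Y = bX + a that minimizes the discrepancy can be fitted by ordinary least squares, which minimizes the sum of squared differences between predicted and actual Y. The slide does not name the criterion, so least squares is an assumption here, and the function name `fit_line` and the data are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope b and intercept a for the line Y = bX + a."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope: predicted change in Y per unit change in X
    a = mean_y - b * mean_x  # intercept: predicted Y when X is zero
    return b, a

# Made-up illustrative data (not from the lecture).
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
b, a = fit_line(hours, scores)
predicted_at_6 = b * 6 + a
print(round(b, 2), round(a, 2), round(predicted_at_6, 2))
```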

Linking this to correlation
Correlation = slope of the regression line, if the scores are in z-scores.
Predicted z-score for the Y variable = correlation value × z-score for the X variable
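This link can be verified numerically: standardize both variables, fit the least-squares slope, and compare it with r computed from the raw scores. The helper names and data below are hypothetical and serve only to illustrate the claim.

```python
import math
import statistics

def z_scores(values):
    """Convert raw scores to z-scores using the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def ls_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def pearson_r(xs, ys):
    """Pearson correlation computed from the raw scores."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data (not from the lecture).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
z_slope = ls_slope(z_scores(x), z_scores(y))
r = pearson_r(x, y)
print(round(z_slope, 4), round(r, 4))
```

The two printed values agree, so the predicted z-score for Y is r times the z-score for X.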

Difference between regression and correlation
Correlation is a special case of regression, with just one predictor variable.
Regression lets you add in more predictor variables to:
Figure out how much of the Y variable is explained by a whole mess of predictor variables
Figure out how much each predictor variable uniquely tells about the Y variable
There are two tests for significance: one for the whole model, and one for each individual variable.

Limitations of the correlation coefficient
Though R measures how closely the two variables approximate a straight line, it does not validly measure the strength of a nonlinear relationship.
When the sample size, n, is small, we also have to be careful about the reliability of the correlation.
Outliers can have a marked effect on R.
Causal Linear Relationship
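Two of these limitations are easy to see with small made-up datasets. The sketch below is illustrative only and the data are invented: the first dataset has a perfect but nonlinear (quadratic) relationship, and the second shows how a single outlier changes the coefficient.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation from sums of squares and cross-products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# 1) A perfect nonlinear relationship: y is exactly x squared, yet r is zero.
x_curve = [-3, -2, -1, 0, 1, 2, 3]
y_curve = [x ** 2 for x in x_curve]
r_curve = pearson_r(x_curve, y_curve)

# 2) A weak correlation that one outlier turns into a very strong one.
x_small = [1, 2, 3, 4, 5]
y_small = [3, 1, 4, 2, 3]
r_without = pearson_r(x_small, y_small)
r_with = pearson_r(x_small + [15], y_small + [15])

print(r_curve, round(r_without, 2), round(r_with, 2))
```

Plotting the data before interpreting the coefficient is the simplest safeguard against both problems.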

Conclusion
Correlation and regression
Types of correlations
Coefficient of correlation and its interpretation
Difference between regression and correlation