Correlation: Scatter Plots, Correlation Coefficients, Significance Test
Introduction: We are often asked to describe the relationship between two or more variables. Is there a relationship between points in the Leaving Cert and QCA? Is there a relationship between parents' IQ and children's IQ?
What are Scatter Plots? A scatter plot is a two-dimensional plot showing the (X, Y) value for each observation. It is used to determine whether there is any pronounced relationship and, if so, whether the relationship may be treated as approximately linear. Y is usually the response (dependent) variable and X is usually the explanatory (independent) variable. The response variable is the variable whose variation we wish to explain; an explanatory variable is a variable used to explain variation in the response variable.
Positive Linear Relationship
Negative Linear Relationship
No Linear Relationship
No Relationship
Example 1: Two sets of exam results, Maths and Physics, for 11 students. Are they related? Does a good performance in Maths go with a good performance in Physics? Let the Maths mark be X and the Physics mark be Y.
Table of Results (Maths mark X, Physics mark Y): the X total is 440, so the X mean is 440/11 = 40; the Y total is 330, so the Y mean is 330/11 = 30.
Maths vs Physics (scatter plot)
What does the Graph tell us? The lines through the means divide the graph into four quadrants. Most of the data lies in the bottom-left or top-right quadrants; only two points fall outside these quadrants. This indicates a probable relationship between a student's X and Y marks.
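To make the quadrant argument concrete, here is a minimal Python sketch that counts how many points fall in each quadrant formed by the two mean lines. The function name `quadrant_counts` and the five data points are hypothetical; the original Example 1 marks are not reproduced here. Points in the top-right and bottom-left quadrants are exactly those where (x - x̄)(y - ȳ) is positive, which is the quantity the correlation coefficient builds on.

```python
def quadrant_counts(xs, ys):
    """Count points in each quadrant formed by the lines x = mean(x) and y = mean(y).

    Points lying exactly on either mean line are counted separately as 'on_axis'.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    counts = {"top_right": 0, "bottom_left": 0, "top_left": 0, "bottom_right": 0, "on_axis": 0}
    for x, y in zip(xs, ys):
        dx, dy = x - x_bar, y - y_bar
        if dx == 0 or dy == 0:
            counts["on_axis"] += 1
        elif dx > 0 and dy > 0:
            counts["top_right"] += 1      # above average on both variables
        elif dx < 0 and dy < 0:
            counts["bottom_left"] += 1    # below average on both variables
        elif dx < 0:
            counts["top_left"] += 1       # below average X, above average Y
        else:
            counts["bottom_right"] += 1   # above average X, below average Y
    return counts


# Hypothetical marks for five students (not the Example 1 data).
print(quadrant_counts([10, 20, 35, 45, 50], [12, 18, 33, 28, 59]))
# {'top_right': 2, 'bottom_left': 2, 'top_left': 0, 'bottom_right': 1, 'on_axis': 0}
```

Most points landing in the top-right and bottom-left quadrants suggests a positive relationship; a scatter concentrated in the other two quadrants would suggest a negative one.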
Correlation Coefficient: From a diagram we get a general idea of the relationship, but for precision we need a numerical measure of the strength of the relationship. The most common measure is the Pearson Product Moment Correlation Coefficient, usually known simply as the correlation coefficient. We will usually be dealing with samples from a population; the sample correlation coefficient is called r.
Properties of r: r can take values from -1 to +1. r = +1 or r = -1 represents a perfect linear correlation, i.e. a perfect linear relationship between the variables. r = 0 (or close to 0) indicates little or no linear relationship, i.e. as X increases there is no definite tendency for the values of Y to increase or decrease in a straight line. r close to +1 indicates a strong positive correlation, i.e. Y tends to increase as X increases. r close to -1 indicates a strong negative correlation, i.e. Y tends to decrease as X increases. The further r is from 0, the stronger the relationship. The sign of r indicates the direction of the relationship.
Examples of various r values (figure: scatter plots illustrating r = +1, r = -1, r = 0.70 and r = 0)
The formula for Calculating r
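The formula itself did not survive in this copy of the slides. Assuming the slides use the standard computational form of Pearson's r, which matches the table columns (Xi, Yi, XiYi, Xi², Yi²) used in the examples below, it reads:

```latex
S_{xx} = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}, \qquad
S_{yy} = \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n}, \qquad
S_{xy} = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}

r = \frac{S_{xy}}{\sqrt{S_{xx}\, S_{yy}}}
```

These are algebraically the same as S_xx = Σ(x_i - x̄)², S_yy = Σ(y_i - ȳ)² and S_xy = Σ(x_i - x̄)(y_i - ȳ); the column-total form is simply easier to compute by hand.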
Example 2: Find the correlation coefficient r between Y and X for subjects A to G (data table of X and Y values).
Create a table with columns Subject, Xi, Yi, XiYi, Xi² and Yi², one row for each subject A to G, and a Total row.
Calculating Sxx
Calculating Syy
Calculating Sxy
Calculating r
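The four calculation steps above can be sketched in Python as follows. This is a minimal sketch assuming the standard computational formulas; the function names and the sample data are hypothetical (the Example 2 values are not reproduced here).

```python
from math import sqrt


def correlation_table_sums(xs, ys):
    """Column totals of the working table: sum X, sum Y, sum XY, sum X^2, sum Y^2."""
    return (
        sum(xs),
        sum(ys),
        sum(x * y for x, y in zip(xs, ys)),
        sum(x * x for x in xs),
        sum(y * y for y in ys),
    )


def pearson_r(xs, ys):
    """Pearson r computed from Sxx, Syy and Sxy (computational formulas)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    n = len(xs)
    sum_x, sum_y, sum_xy, sum_x2, sum_y2 = correlation_table_sums(xs, ys)
    s_xx = sum_x2 - sum_x * sum_x / n
    s_yy = sum_y2 - sum_y * sum_y / n
    s_xy = sum_xy - sum_x * sum_y / n
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when X or Y is constant")
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical data: Sxx = 10, Syy = 6, Sxy = 6, so r = 6 / sqrt(60).
print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))  # 0.7746
```

The column-total form mirrors the hand calculation; for large values, subtracting big totals can lose precision, and the deviation form Σ(x - x̄)(y - ȳ) is numerically safer.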
Significance Test: H0: no linear relationship exists (the population correlation ρ = 0). HA: there is a linear relationship (ρ ≠ 0). Choose a confidence level, say 90%, 95% or 99%; this corresponds to a significance level α = 0.1, 0.05 or 0.01. Use Table 10: Percentage Points of the Correlation Coefficient. In the left-hand column choose v = n - 2 (n = sample size) and find the critical value. If |r| > critical value, reject H0.
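The test above relies on Table 10, which is not reproduced here. As a cross-check, critical values of r can be recovered from a t critical value: under H0 (and the usual assumption of bivariate normal data), t = r√v / √(1 - r²) follows a t distribution with v = n - 2 degrees of freedom, which rearranges to r_crit = t / √(t² + v). A minimal sketch; the function names are hypothetical, and 2.571 is the standard two-tailed 5% t value for 5 degrees of freedom.

```python
from math import sqrt


def critical_r_from_t(t_crit, v):
    """Convert a two-tailed t critical value with v = n - 2 degrees of freedom
    into the matching critical value of r (valid under H0: rho = 0)."""
    return t_crit / sqrt(t_crit ** 2 + v)


def reject_h0(r, r_crit):
    """Two-sided test: reject H0 (rho = 0) when |r| exceeds the critical value."""
    return abs(r) > r_crit


# Example 2 setting: n = 7, so v = 5; alpha = 0.05 two-tailed.
r_crit = critical_r_from_t(2.571, 5)   # roughly 0.75
print(reject_h0(0.82, r_crit))         # True
```

With v = 9 (as in Example 1, n = 11) and t = 2.262, the same function gives about 0.602.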
Conclusion: r = 0.82; let α = 0.05, and v = n - 2 = 5 (n = 7). Since r = 0.82 exceeds the critical value from the tables, we reject H0 and conclude, at the 5% significance level, that there is a linear relationship between X and Y.
Example 3: Is there an obvious relationship between X and Y? Yes: Y = X + 2. This is a perfect linear relationship, so what will r be? r will be equal to 1. (Data table of X and Y values.)
Set up the data table with columns Subject, Y, X, XY, X² and Y², one row for each subject A to F, and a Total row.
Calculate Sxx
Calculate Syy
Calculate Sxy
Calculate r: r = 1, a perfect positive linear relationship.
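A quick numerical check of Example 3, assuming hypothetical X values for subjects A to F (the slide's own data are not reproduced here). Because Y = X + 2 only shifts every value by a constant, each deviation y - ȳ equals x - x̄, so Sxx, Syy and Sxy are all equal and r = Sxy / √(Sxx·Syy) = 1.

```python
from math import sqrt

# Hypothetical X values for subjects A-F (not the slide's data).
xs = [1, 2, 4, 5, 7, 11]
ys = [x + 2 for x in xs]  # Example 3: Y = X + 2 exactly

n = len(xs)
s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
s_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
r = s_xy / sqrt(s_xx * s_yy)

print(s_xx, s_yy, s_xy)  # 66.0 66.0 66.0
print(r)                 # 1.0
```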
Back to Example 1: In our original example with the student results we drew a scatter plot. From the diagram it looked as if there was a probable positive linear relationship; to be sure, we need to calculate r. Using a significance level of α = 0.05, we will test the claim that there is no linear correlation between Maths results and Physics results.
Create a data table with columns Student, X, Y, XY, X² and Y², one row for each student A to K, and a Total row.
Apply the formulae
Correlation Coefficient is
Conclusion: With v = n - 2 = 9, r exceeds the critical value from the tables, so we reject the claim and conclude that there is a positive linear relationship between results in Maths and results in Physics.
Regression: Least Squares, Predicting Y using X
What is Regression? Regression analysis is used for prediction: it allows us to predict the value of one variable given the value of another. It gives us an equation that uses one variable to help explain variation in another. In this course we deal with simple linear regression.
Simple Linear Regression: The first step in determining a relationship was drawing a scatter plot. If a possible relationship was shown, we found the strength of the relationship by calculating the correlation coefficient r. The next stage is to calculate the equation of the line that best describes the relationship between the two variables; this line is called the regression line.
What is the ‘best fit’ line? (Example 1)
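‘Least Squares’ best fit line: We can draw many straight lines through the data; we want the ‘best’ one, the line whose residuals (the vertical distances from the points to the line) have the smallest sum of squares.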
Least Squares Estimates: the intercept and slope of the best fit line are found by least squares; the slope is closely related to r.
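The estimates themselves are missing from this copy. Writing the fitted line as Ŷ = a + bX (the symbols a and b are assumed here, since the slides' own notation is not recoverable), the standard least squares solution, which minimises the sum of squared residuals, is:

```latex
\min_{a,\,b} \sum_{i=1}^{n} \left(y_i - a - b\,x_i\right)^2
\quad\Longrightarrow\quad
b = \frac{S_{xy}}{S_{xx}}, \qquad a = \bar{y} - b\,\bar{x}

b = \frac{S_{xy}}{S_{xx}} = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}\sqrt{\frac{S_{yy}}{S_{xx}}} = r\sqrt{\frac{S_{yy}}{S_{xx}}}
```

The last line shows the link to r: the slope always has the same sign as r, and the line always passes through the point (x̄, ȳ).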
Example 2
Combining we get
Regression line is
Example 3: We know Y = X + 2 (data table of X and Y values).
Verifying that the equation is correct
Giving
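The same verification can be sketched in Python. This minimal sketch assumes the standard least squares estimates b = Sxy/Sxx and a = ȳ - b·x̄; the function name `least_squares_line` and the X values are hypothetical. It uses the deviation form of Sxx and Sxy, which is algebraically equal to the column-total form used on the slides.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the fitted line Y = a + bX, with b = Sxy/Sxx and a = y_bar - b*x_bar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if s_xx == 0:
        raise ValueError("slope is undefined when all X values are equal")
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical X values; Y follows Example 3 exactly (Y = X + 2).
xs = [1, 2, 4, 5, 7, 11]
ys = [x + 2 for x in xs]
a, b = least_squares_line(xs, ys)
print(a, b)  # 2.0 1.0
```

Recovering intercept 2 and slope 1 confirms that the least squares line reproduces Y = X + 2 when the data lie exactly on that line.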
Example 1
Regression line
Example 1 continued: If a student received a grade of 53 in Maths, what would the expected grade be in Physics? We use the regression line to predict the Physics result.
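Because the least squares line passes through (x̄, ȳ), the Example 1 means (Maths 40, Physics 30) already fix the intercept once the slope is known: a = 30 - 40b. A minimal sketch of the prediction step; the function name and the slope value 0.5 are hypothetical, for illustration only, since the fitted slope from the slides is not reproduced here.

```python
def predict_physics(maths_mark, slope, x_bar=40.0, y_bar=30.0):
    """Predict a Physics mark from a Maths mark using the least squares line.

    Relies on the line passing through (x_bar, y_bar); the defaults are the
    Example 1 means. `slope` must come from the fitted regression line.
    """
    intercept = y_bar - slope * x_bar
    return intercept + slope * maths_mark


# HYPOTHETICAL slope, not the value from the slides.
print(predict_physics(53, slope=0.5))  # 36.5
```

Predictions like this are only reasonable for Maths marks within the range of the observed data; extrapolating well beyond it assumes the linear pattern continues.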
Graphing the Regression Line