Download presentation
Presentation is loading. Please wait.
Published byAdrian Rich Modified over 9 years ago
1
Basic Concepts of Correlation
2
Definition A correlation exists between two variables when the values of one are somehow associated with the values of the other in some way.
3
Definition The linear correlation coefficient r measures the strength of the linear relationship between the paired quantitative x- and y- values in a sample.
4
Exploring the Data We can often see a relationship between two variables by constructing a scatterplot. Figure 10-2 following shows scatterplots with different characteristics.
5
Scatterplots of Paired Data Figure 10-2
6
Scatterplots of Paired Data Figure 10-2
7
Scatterplots of Paired Data Figure 10-2
8
Requirements 1. The sample of paired ( x, y ) data is a simple random sample of quantitative data. 2. Visual examination of the scatterplot must confirm that the points approximate a straight-line pattern. 3. The outliers must be removed ONLY if they are known to be errors. The effects of any other outliers should be considered by calculating r with and without the outliers included.
9
Notation for the Linear Correlation Coefficient n = number of pairs of sample data denotes the addition of the items indicated. x denotes the sum of all x- values. x 2 indicates that each x - value should be squared and then those squares added. ( x ) 2 indicates that the x- values should be added and then the total squared.
10
Notation for the Linear Correlation Coefficient xy indicates that each x -value should be first multiplied by its corresponding y- value. After obtaining all such products, find their sum. r = linear correlation coefficient for sample data. = linear correlation coefficient for population data.
11
Formula 10-1 n xy – ( x)( y) n( x 2 ) – ( x) 2 n( y 2 ) – ( y) 2 r = The linear correlation coefficient r measures the strength of a linear relationship between the paired values in a sample. Computer software or calculators can compute r Formula
12
Interpreting r Using Table A-6: If the absolute value of the computed value of r, denoted | r |, exceeds the value in Table A-6, conclude that there is a linear correlation. Otherwise, there is not sufficient evidence to support the conclusion of a linear correlation. Using Software: If the computed P-value is less than or equal to the significance level, conclude that there is a linear correlation. Otherwise, there is not sufficient evidence to support the conclusion of a linear correlation.
13
- Ken Carroll, 1975 Correlation of incidence of death due to breast cancer with animal fat intake
14
- Ken Carroll, 1975 Correlation of incidence of death due to breast cancer with vegetable fat intake
15
From Francis Anscombe, 1973… each plot has the same mean (y=7.5), SD (4.12), r, (0.816), and linear regression line (y = 3 + 0.5x)…
16
ppppr 2 df0.100.050.020.01 … 80.5490.6320.7160.7650.585 90.5210.6020.6850.7350.540 100.4970.5760.6580.7080.501 110.4760.5530.6340.6840.467 120.4580.5320.6120.6610.437 130.4410.5140.5920.6410.410 140.4260.4970.5740.6230.388 150.4120.4820.5580.6060.367 160.4000.4680.5420.5900.348 170.3890.4560.5280.5750.331 180.3780.4440.5160.5610.315 190.3690.4330.5030.5490.288 200.3600.4230.4920.5370.651 300.2960.3490.4090.4490.202 400.2570.3040.3580.3930.154 500.2310.2730.3220.3540.125 600.2110.2500.2950.3250.106 800.1830.2170.2560.2830.080 1000.1640.1950.2300.2540.065 From the previous slide: note that with df of 9 r = 0.816 is significant at p < 0.01 A meaningless correlation in anyone’s book!!!
17
Interpreting the Linear Correlation Coefficient r Critical Values from Table A-6 and the Computed Value of r
18
Interpreting r : Explained Variation The value of r 2 is the proportion of the variation in y that is explained by the linear relationship between x and y.
19
Common Errors Involving Correlation 1. Causation: It is wrong to conclude that correlation implies causality. 2. Averages: Averages suppress individual variation and may inflate the correlation coefficient. 3. Linearity: There may be some relationship between x and y even when there is no linear correlation.
20
Part 1: Basic Concepts of Regression
21
Regression The typical equation of a straight line y = mx + b is expressed in the form y = b 0 + b 1 x, where b 0 is the y -intercept and b 1 is the slope. ^ The regression equation expresses a relationship between x (called the explanatory variable, predictor variable or independent variable), and y (called the response variable or dependent variable). ^
22
Definitions Regression Equation Given a collection of paired data, the regression equation Regression Line The graph of the regression equation is called the regression line (or line of best fit, or least squares line). y = b 0 + b 1 x ^ algebraically describes the relationship between the two variables.
23
Notation for Regression Equation y -intercept of regression equation Slope of regression equation Equation of the regression line Population Parameter Sample Statistic 0 b 0 1 b 1 y = 0 + 1 x y = b 0 + b 1 x ^
24
Requirements 1. The sample of paired ( x, y ) data is a random sample of quantitative data. 2. Visual examination of the scatterplot shows that the points approximate a straight-line pattern. 3. Any outliers must be removed if they are known to be errors. Consider the effects of any outliers that are not known errors.
25
Formulas for b 0 and b 1 Formula 10-3 (slope) ( y -intercept) Formula 10-4 calculators or computers can compute these values
26
The regression line fits the sample points best. Special Property
27
Rounding the y -intercept b 0 and the Slope b 1 Round to three significant digits. If you use the formulas 10-3 and 10-4, do not round intermediate values.
28
- Ken Carroll, 1975 From Francis Anscombe, 1973… each plot has the same mean (y=7.5), SD (4.12), r, (0.816), and linear regression line (y = 3 + 0.5x)…
29
We should know that the regression equation is an estimate of the true regression equation. This estimate is based on one particular set of sample data, but another sample drawn from the same population would probably lead to a slightly different equation.
30
1.Use the regression equation for predictions only if the graph of the regression line on the scatterplot confirms that the regression line fits the points reasonably well?????. Using the Regression Equation for Predictions 2.Use the regression equation for predictions only if the linear correlation coefficient r indicates that there is a linear correlation between the two variables????? Think… r 2
31
A % Fat % Fat Based on actual BD Based on predicted BD A25.96% 13.92% B 7.91% 13.92% B
33
Be VERY skeptical of individual “results” based on linear regression equations being used to make multiple serial predictions
34
Summary - correlation is used to calculate the strength of a linear relationship between paired variables from a sample - r value > or < 0 indicates a non-zero association between the 2 variables - a scatter plot is used to visualize the relationship between the 2 variables - r CANNOT indicate a causal relationship - interpret correlation by using r 2 … indicating the proportion of variation in variable A accounted for by the variation in variable B “significant r only means NOT a “0 association” - regression equation describes the association between the 2 variables - regression line is a graph of the regression equation describing the best fit line of the scatter plot - only with confidence limits can you properly interpret a linear regression line
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.