Descriptive measures of the strength of a linear association r-squared and the (Pearson) correlation coefficient r.

1 Descriptive measures of the strength of a linear association r-squared and the (Pearson) correlation coefficient r

2 Translating a research question into a statistical procedure
How strong is the linear relationship between skin cancer mortality and latitude?
– (Pearson) correlation coefficient r
– Coefficient of determination r²

3 Where does this topic fit in? Model formulation Model estimation Model evaluation Model use

4 Situation #1 A very weak linear relationship

5 Situation #2 A fairly strong linear relationship

6 Coefficient of determination r²
r² is a number (a proportion!) between 0 and 1.
If r² = 1:
– all data points fall perfectly on the regression line
– the predictor x accounts for all of the variation in y
If r² = 0:
– the fitted regression line is perfectly horizontal
– the predictor x accounts for none of the variation in y

7 Interpretation of r² r² × 100 percent of the variation in y is reduced by taking into account predictor x. r² × 100 percent of the variation in y is “explained by” the variation in predictor x.

8 R-sq in Minitab fitted line plot

9 R-sq in Minitab regression output
The regression equation is
Mort = 389.189 - 5.97764 Lat
S = 19.1150   R-Sq = 68.0%   R-Sq(adj) = 67.3%
Analysis of Variance
Source      DF       SS       MS        F      P
Regression   1  36464.2  36464.2  99.7968  0.000
Error       47  17173.1    365.4
Total       48  53637.3
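
The summary statistics in this output can be recovered from the ANOVA table. A minimal sketch, assuming the usual simple-regression definitions (R-Sq = SSR/SSTO, adjusted R-Sq from the mean squares, S = square root of MSE); the variable names are illustrative only:

```python
import math

# Sums of squares and degrees of freedom copied from the Minitab ANOVA table.
ss_regression = 36464.2
ss_error = 17173.1
ss_total = 53637.3
df_error = 47
df_total = 48

# Proportion of the variation in Mort accounted for by Lat.
r_sq = ss_regression / ss_total

# Adjusted R-Sq compares the error mean square with the total mean square.
r_sq_adj = 1 - (ss_error / df_error) / (ss_total / df_total)

# S is the square root of the error mean square (MSE).
s = math.sqrt(ss_error / df_error)

print(f"R-Sq      = {100 * r_sq:.1f}%")
print(f"R-Sq(adj) = {100 * r_sq_adj:.1f}%")
print(f"S         = {s:.4f}")
```

The three printed values should agree with the S, R-Sq and R-Sq(adj) line of the output up to rounding.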

10 Pearson correlation coefficient r
r is a (unitless) number between -1 and 1, inclusive.
Sign of the correlation coefficient:
– positive if the slope of the fitted regression line is positive
– negative if the slope of the fitted regression line is negative
If r² is represented in decimal form, e.g. 0.39 or 0.87, then r = ±√(r²), taking the sign of the slope.

11 Formulas for the Pearson correlation coefficient r
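
The formulas on this slide did not survive in the transcript. The standard definitions, which are probably what the slide showed, are written out here as a reconstruction. It assumes b_1 denotes the least-squares slope estimate.

```latex
r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
         {\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}
  = b_1\,\sqrt{\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2}},
\qquad
b_1 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}
```

In the second form the square-root factor is always positive, so r and b_1 share the same sign and are zero together.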

12 What do we learn from the formulas for r? The correlation coefficient r gets its sign from the slope b₁. The correlation coefficient r is a unitless measure. The correlation coefficient r = 0 when the estimated slope b₁ = 0 and vice versa.

13 Interpretation of Pearson correlation coefficient r There is no nice practical interpretation for r as there is for r². r = -1 is a perfect negative linear relationship. r = 1 is a perfect positive linear relationship. r = 0 is no linear relationship. For other values of r, how strong the relationship between x and y is deemed to be depends on the research area.

14 Pearson correlation coefficient r in Minitab Correlations: Mort, Lat Pearson correlation of Mort and Lat = -0.825 Correlations: Lat, Mort Pearson correlation of Lat and Mort = -0.825
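
The value -0.825 can be checked against the regression output for the same data: r is the square root of R-Sq, carrying the sign of the fitted slope. A small sketch using the rounded figures from the Minitab output (variable names are illustrative):

```python
import math

slope_b1 = -5.97764   # from "Mort = 389.189 - 5.97764 Lat"
r_sq = 0.680          # from "R-Sq = 68.0%", in decimal form

# r has the magnitude sqrt(r^2) and the sign of the fitted slope.
r = math.copysign(math.sqrt(r_sq), slope_b1)

print(f"r = {r:.3f}")
```

Swapping the roles of Mort and Lat changes the fitted line but not r, which is why both Minitab commands report the same correlation.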

15 How strong is the linear relationship between Celsius and Fahrenheit? Pearson correlation of Celsius and Fahrenheit = 1.000
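
Why the correlation is exactly 1: Fahrenheit is an exact increasing linear function of Celsius. The sketch below uses made-up temperature readings (not the slide's data) and a hand-written `pearson_r` helper. It also illustrates that r is unitless, since re-expressing Celsius in Kelvin leaves r unchanged.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

celsius = [-10.0, 0.0, 10.0, 20.0, 30.0, 40.0]   # hypothetical readings
fahrenheit = [1.8 * c + 32.0 for c in celsius]   # exact linear conversion
kelvin = [c + 273.15 for c in celsius]           # same temperatures, other units

r_cf = pearson_r(celsius, fahrenheit)
r_kf = pearson_r(kelvin, fahrenheit)

print(f"Celsius vs Fahrenheit: r = {r_cf:.3f}")
print(f"Kelvin  vs Fahrenheit: r = {r_kf:.3f}")
```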

16 How strong is the linear relationship between # of stories and height? Pearson correlation of HEIGHT and STORIES = 0.951

17 How strong is the linear relationship between driver age and see distance? Pearson correlation of Distance and DrivAge = -0.801

18 How strong is the linear relationship between height and g.p.a.? Pearson correlation of height and gpa = -0.053

19 Caution #1 The correlation coefficient r quantifies the strength of a linear relationship. It is possible to get r = 0 with a perfect curvilinear relationship.

20 Example of Caution #1 Pearson correlation of x and y = 0.000

21 Clarification of Caution #1 Pearson correlation of x and y = 0.000
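
A toy illustration of Caution #1, using made-up data rather than the data plotted on the slides: y is determined exactly by x through y = x², yet the linear correlation is zero because the x values are symmetric about 0.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-3, -2, -1, 0, 1, 2, 3]   # hypothetical x values, symmetric about 0
ys = [x ** 2 for x in xs]       # perfect curvilinear relationship

r = pearson_r(xs, ys)
print(f"r = {r:.3f}")
```

The positive and negative cross-products cancel, so the fitted least-squares line is horizontal even though y is perfectly predictable from x.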

22 Caution #2 A large r² value should not be interpreted as meaning that the estimated regression line fits the data well. Another function might better describe the trend in the data.

23 Example of Caution #2 Pearson correlation of Year and USPopn = 0.959
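
The same effect can be reproduced with a toy data set (made up for illustration, not the population data): the points follow y = x² exactly, a straight line still gives r² of about 0.95, and the residuals show the curvature that the line misses.

```python
# Hypothetical, exactly quadratic data.
xs = list(range(1, 11))
ys = [x ** 2 for x in xs]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Least-squares line and its coefficient of determination.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
r_sq = sxy ** 2 / (sxx * syy)

# Residuals from the straight line: positive at both ends, negative in the middle.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

print(f"r^2 = {r_sq:.3f}")
print("residuals:", [round(e, 1) for e in residuals])
```

A systematic pattern in the residuals like this one is the signal that another function describes the trend better, whatever the size of r².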

24 Caution #3 The coefficient of determination r² and the correlation coefficient r can both be greatly affected by just one data point (or a few data points).

25 Example of Caution #3 Pearson correlation of Deaths and Magnitude = 0.732

26 Example of Caution #3 Pearson correlation of Deaths and Magnitude = -0.960
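
A sketch of the same phenomenon with made-up numbers (not the earthquake data): five points on a perfectly decreasing line give r = -1, and adding a single distant point flips the correlation to a strongly positive value.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: a perfect negative linear relationship.
xs = [1, 2, 3, 4, 5]
ys = [5, 4, 3, 2, 1]
r_without = pearson_r(xs, ys)

# The same data plus one point far from the rest.
r_with = pearson_r(xs + [20], ys + [20])

print(f"without the extra point: r = {r_without:.3f}")
print(f"with the extra point:    r = {r_with:.3f}")
```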

27 Caution #4 Correlation (association) does not imply causation.

28 Example of Caution #4 Pearson correlation of Wine and Heart = -0.843

29 Caution #5 Ecological correlations are correlations that are based on rates or averages. Ecological correlations tend to overstate the strength of an association.

30 Example of Caution #5
Data from 1988 Current Population Survey
Treating individuals as the units
– Correlation between income and education for men age 25-64 in the U.S. is r ≈ 0.4.
Treating nine regions as the units
– Compute average income and average education for men age 25-64 in each of the nine regions.
– Correlation between the average incomes and the average education in the U.S. is r ≈ 0.7.
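
Why averaging inflates the correlation: it removes the person-to-person scatter within each region. The sketch below is a constructed toy example, not the Current Population Survey data. It has three hypothetical regions whose centers lie on a line, with within-region scatter that is uncorrelated.

```python
import math
from collections import defaultdict

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical regions: (education, income) centers in arbitrary units.
region_centers = {"A": (0, 0), "B": (1, 1), "C": (2, 2)}
# Individual deviations from the regional center; uncorrelated within a region.
offsets = [(2, 2), (2, -2), (-2, 2), (-2, -2)]

people = [(region, cx + dx, cy + dy)
          for region, (cx, cy) in region_centers.items()
          for dx, dy in offsets]

# Individuals as the units.
r_individuals = pearson_r([x for _, x, _ in people], [y for _, _, y in people])

# Regions as the units: average x and y within each region first.
by_region = defaultdict(list)
for region, x, y in people:
    by_region[region].append((x, y))
avg_x = [sum(x for x, _ in pts) / len(pts) for pts in by_region.values()]
avg_y = [sum(y for _, y in pts) / len(pts) for pts in by_region.values()]
r_regions = pearson_r(avg_x, avg_y)

print(f"individuals as units: r = {r_individuals:.3f}")
print(f"regions as units:     r = {r_regions:.3f}")
```

The gap here is far more extreme than the 0.4 versus 0.7 reported on the slide, but the direction of the effect is the same.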

31 Example of Caution #5

33 Caution #6 A “statistically significant” r² does not imply that the slope β₁ is meaningfully different from 0.

34 Caution #7 A large r² does not necessarily mean that a useful prediction of the response y_new (or estimation of the mean response μ_Y) can be made. It is still possible to get prediction (or confidence) intervals that are too wide to be useful.

35 Using the sample correlation r to learn about the population correlation ρ

36 Translating a research question into a statistical procedure
Is there a linear relationship between skin cancer mortality and latitude?
– t-test for testing H₀: β₁ = 0
– ANOVA F-test for testing H₀: β₁ = 0
Is there a linear correlation between husband’s age and wife’s age?
– t-test for testing population correlation coefficient H₀: ρ = 0

37 Where does this topic fit in? Model formulation Model estimation Model evaluation Model use

38 Is there a linear correlation between husband’s age and wife’s age? Pearson correlation of HAge and WAge = 0.939

39 Is there a linear correlation between husband’s age and wife’s age? Pearson correlation of WAge and HAge = 0.939

40 The formal t-test for correlation coefficient ρ
Null hypothesis H₀: ρ = 0
Alternative hypothesis H_A: ρ ≠ 0, or ρ < 0, or ρ > 0
Test statistic: t* = r√(n - 2) / √(1 - r²)
P-value = What is the probability that we’d get a t* statistic as extreme as we did, if the null hypothesis is true?
The P-value is determined by comparing t* to a t distribution with n-2 degrees of freedom.

41 Is there a linear correlation between husband’s age and wife’s age?
Test statistic: t* = 0.939 × √(170 - 2) / √(1 - 0.939²) = 35.39
Help in determining the P-value:
Student's t distribution with 168 DF
      x   P( X <= x )
35.3900        1.0000
Just let Minitab do the work:
Pearson correlation of WAge and HAge = 0.939
P-Value = 0.000
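
The test statistic for this example can be reproduced from r and n alone. A minimal sketch, assuming the standard statistic t* = r√(n - 2)/√(1 - r²); the function name `correlation_t_statistic` is illustrative, not part of any library:

```python
import math

def correlation_t_statistic(r, n):
    """t* for testing H0: rho = 0, to be compared with a t distribution on n - 2 DF."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r = 0.939   # Pearson correlation of WAge and HAge
n = 170     # number of couples with both ages recorded (168 DF + 2)

t_star = correlation_t_statistic(r, n)
print(f"t* = {t_star:.2f} on {n - 2} degrees of freedom")
```

A statistic this far into the tail of a t distribution with 168 degrees of freedom gives a P-value that rounds to 0.000, as Minitab reports.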

42 When is it okay to use the t-test for testing H₀: ρ = 0?
When it is not obvious which variable is the response.
When the (x, y) pairs are a random sample from a bivariate normal population.
– For each x, the y’s are normal with equal variances.
– For each y, the x’s are normal with equal variances.
– Either, y can be considered a linear function of x.
– Or, x can be considered a linear function of y.
The (x, y) pairs are independent.

43 The three tests will always yield similar results.
Pearson correlation of WAge and HAge = 0.939
P-Value = 0.000
The regression equation is
HAge = 3.59 + 0.967 WAge
170 cases used 48 cases contain missing values
Predictor     Coef  SE Coef      T      P
Constant     3.590    1.159   3.10  0.002
WAge       0.96670  0.02742  35.25  0.000
S = 4.069   R-Sq = 88.1%   R-Sq(adj) = 88.0%
Analysis of Variance
Source       DF     SS     MS        F      P
Regression    1  20577  20577  1242.51  0.000
Error       168   2782     17
Total       169  23359

44 The three tests will always yield similar results.
The regression equation is
WAge = 1.57 + 0.911 HAge
170 cases used 48 cases contain missing values
Predictor     Coef  SE Coef      T      P
Constant     1.574    1.150   1.37  0.173
HAge       0.91124  0.02585  35.25  0.000
S = 3.951   R-Sq = 88.1%   R-Sq(adj) = 88.0%
Analysis of Variance
Source       DF     SS     MS        F      P
Regression    1  19396  19396  1242.51  0.000
Error       168   2623     16
Total       169  22019
Pearson correlation of WAge and HAge = 0.939
P-Value = 0.000
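
The agreement between the outputs can be checked numerically. The sketch below uses only rounded values copied from the two regression outputs, so small discrepancies are rounding error. It relies on two standard facts for simple linear regression: the ANOVA F statistic is the square of the slope's t statistic, and the product of the two fitted slopes (y on x, and x on y) equals r².

```python
import math

# Values copied from the two Minitab regression outputs.
t_slope = 35.25                  # t statistic for the slope (same in both fits)
f_anova = 1242.51                # ANOVA F statistic (same in both fits)
slope_hage_on_wage = 0.96670     # HAge regressed on WAge
slope_wage_on_hage = 0.91124     # WAge regressed on HAge
ss_regression, ss_total = 20577, 23359   # from the HAge-on-WAge ANOVA table

# With one predictor, F equals t squared.
t_squared = t_slope ** 2

# r^2 computed two ways: SSR/SSTO, and the product of the two slopes.
r_sq_from_ss = ss_regression / ss_total
r_sq_from_slopes = slope_hage_on_wage * slope_wage_on_hage

# Both slopes are positive, so r is the positive square root.
r = math.sqrt(r_sq_from_slopes)

print(f"t^2 = {t_squared:.2f}, F = {f_anova}")
print(f"r^2 = {r_sq_from_ss:.4f} (SSR/SSTO), {r_sq_from_slopes:.4f} (product of slopes)")
print(f"r   = {r:.3f}")
```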

45 Which results should I report?
If one of the variables can be clearly identified as the response, report the t-test or F-test results for testing H₀: β₁ = 0.
– Does it make sense to use x to predict y?
If it is not obvious which variable is the response, report the t-test results for testing H₀: ρ = 0.
– Does it only make sense to look for an association between x and y?

