Presentation is loading. Please wait.

Presentation is loading. Please wait.

Chapter 9 Correlation. 2 Chapter 9 Correlation Major Points The problemThe problem ScatterplotsScatterplots An exampleAn example The correlation coefficientThe.

Similar presentations


Presentation on theme: "Chapter 9 Correlation. 2 Chapter 9 Correlation Major Points The problemThe problem ScatterplotsScatterplots An exampleAn example The correlation coefficientThe."— Presentation transcript:

1 Chapter 9 Correlation

2 2 Chapter 9 Correlation Major Points The problemThe problem ScatterplotsScatterplots An exampleAn example The correlation coefficientThe correlation coefficient –Correlations on ranks Factors affecting correlationsFactors affecting correlations Cont.

3 3 Chapter 9 Correlation Major Points--cont. Testing for significanceTesting for significance Intercorrelation matricesIntercorrelation matrices Other kinds of correlationsOther kinds of correlations Review questionsReview questions

4 4 Chapter 9 Correlation The Problem Are two variables related?Are two variables related? –Does one increase as the other increases? e. g. skills and incomee. g. skills and income –Does one decrease as the other increases? e. g. health problems and nutritione. g. health problems and nutrition How can we get a numerical measure of the degree of relationship? Remember though, that correlation does not equal causation!!!!How can we get a numerical measure of the degree of relationship? Remember though, that correlation does not equal causation!!!! Up to this point we have been comparing means on one variable. Now we are looking at the relationship between 2 or more variables.Up to this point we have been comparing means on one variable. Now we are looking at the relationship between 2 or more variables.

5 5 Chapter 9 CorrelationScatterplots Examples from text pg 172Examples from text pg 172 –See next three slides Every point on the plot represents a subjects scores on 2 variablesEvery point on the plot represents a subjects scores on 2 variables Typically the “predictor” goes on the abcissa or X axis, the “criterion” or outcome variable goes on the ordinate, or Y axis. Sometimes can’t distinguishTypically the “predictor” goes on the abcissa or X axis, the “criterion” or outcome variable goes on the ordinate, or Y axis. Sometimes can’t distinguish

6 6 Chapter 9 Correlation Each point represents two scores for a country, # physicians and mortality As the number of physicians rises per 100,000 population, so does the infant mortality rate. Notice that previously we plotted a score value and its frequency. Now EACH individual is plotted with his/her two respective scores, on two variables.

7 7 Chapter 9 Correlation How strongly are these related? Not strongly at all. There is virtually no relationship, the line is almost flat (horizontal). So we could say that spending more on healthcare doesn’t seem to be related to greater life expectancy, likewise living longer not related to increased health care costs for men. But what if we looked at this association for men age 20-75? Differences? Outliers in this chart?

8 8 Chapter 9 Correlation What is the trend here? As the amount of solar radiation increases (exposure to sunshine), what happens to the breast cancer rate? This is a negative association, and appears to be fairly strong.

9 9 Chapter 9 Correlation Heart Disease and Cigarettes Landwehr & Watkins report data on heart disease and cigarette smoking in 21 developed countriesLandwehr & Watkins report data on heart disease and cigarette smoking in 21 developed countries Data have been rounded to whole numbers for computational convenience.Data have been rounded to whole numbers for computational convenience. –The results were not affected.

10 10 Chapter 9 Correlation The Data Surprisingly, the U.S. it the first country on the list--the country with the highest consumption and highest mortality. What is the unit of analysis here? Each country.

11 11 Chapter 9 Correlation Scatterplot of Heart Disease CHD Mortality goes on ordinate (y axis)CHD Mortality goes on ordinate (y axis) –Why? Cigarette consumption on abscissa (X axis)Cigarette consumption on abscissa (X axis) –Why? What does each dot represent?What does each dot represent? Best fitting line included for clarityBest fitting line included for clarity

12 12 Chapter 9 Correlation {X = 6, Y = 11}

13 13 Chapter 9 Correlation What Does the Scatterplot Show? As smoking increases, so does coronary heart disease mortality.As smoking increases, so does coronary heart disease mortality. Relationship looks strongRelationship looks strong Not all data points on line.Not all data points on line. –This gives us “residuals” or “errors of prediction” To be discussed laterTo be discussed later

14 14 Chapter 9 Correlation Correlation Coefficient A measure of degree (strength) of relationship. Most common is Pearson rA measure of degree (strength) of relationship. Most common is Pearson r Sign refers to direction. (+ or - )Sign refers to direction. (+ or - ) Based on covarianceBased on covariance –Measure of degree to which large scores go with large scores, and small scores with small scores –As scores on variable increase, scores on the other variable tend to….

15 15 Chapter 9 CorrelationCovariance The formulaThe formula Basically tells if they tend to vary together, when x is above mean, does y tend to be above the mean? X below mean, is y below mean?Basically tells if they tend to vary together, when x is above mean, does y tend to be above the mean? X below mean, is y below mean? When would cov XY be large and positive?When would cov XY be large and positive? When would cov XY be large and negative?When would cov XY be large and negative?

16 16 Chapter 9 Correlation Correlation Coefficient Symbolized by Pearson rSymbolized by Pearson r Covariance is a function of the standard deviations, so we standardize the covariance by dividing by SD’s (just like we did with z’s, to get rid of original units)Covariance is a function of the standard deviations, so we standardize the covariance by dividing by SD’s (just like we did with z’s, to get rid of original units) Covariance ÷ (product of st. dev.)Covariance ÷ (product of st. dev.) Always a value between -1 and 1Always a value between -1 and 1

17 17 Chapter 9 Correlation Calculation: Heart Disease and Smoking Cov XY = 11.13, not meaningfulCov XY = 11.13, not meaningful s X = 2.33s X = 2.33 s Y = 6.69s Y = 6.69 Remember that the covariance divided by the standard deviations cannot be greater that 1.0, or less than –1.00.

18 18 Chapter 9 CorrelationCorrelation--cont. Correlation =.71Correlation =.71 Sign is positiveSign is positive –Why? If sign were negativeIf sign were negative –What would it mean? –Would not alter the degree of relationship. Interpretation: correlations of +/-Interpretation: correlations of +/-.00-.30 weak.30-.65 moderate, +.65 strong strong correlations were defined as an absolute value of r ranging from.65 to 1.00, moderate correlations were defined as an absolute value of r ranging from.30 to.65, and weak correlations were defined as an absolute value of r ranging from.00 to.30.

19 19 Chapter 9 Correlation Factors Affecting r Range restrictionsRange restrictions –See next slide Data only for countries with low consumption, greatly reduces correlationData only for countries with low consumption, greatly reduces correlation NonlinearityNonlinearity –e.g. age and size of vocabulary have a curvilinear association, not a straight line Homogeneity of variance or homoskedasticity: equal spread of data points at each point on the plot, assumed (football or loaf of french bread around line of best fit). Heteroskedasticity: uneven spread/variability (indicated by fan shaped or bowtie shaped spreadHomogeneity of variance or homoskedasticity: equal spread of data points at each point on the plot, assumed (football or loaf of french bread around line of best fit). Heteroskedasticity: uneven spread/variability (indicated by fan shaped or bowtie shaped spread Heterogeneous subsamplesHeterogeneous subsamples –Examples: ex. male and female height and weight, p 187. Different correlations for 2 groups, Combining the data may inflate or reduce the correlation Subsamples: data could be divided into two distinct sets on the basis of some other variable.

20 20 Chapter 9 Correlation Countries With Low Consumptions Data With Restricted Range Truncated at 5 Cigarettes Per Day Cigarette Consumption per Adult per Day 5.55.04.54.03.53.02.5 CHD Mortality per 10,000 20 18 16 14 12 10 8 6 4 2 What did this data look like when the range was broader?

21 21 Chapter 9 Correlation Testing r: Statistical significance of the correlation Population parameter = Population parameter =  Null hypothesis H 0 :  = 0, correlation in population is zeroNull hypothesis H 0 :  = 0, correlation in population is zero –Test of linear independence (no relation) –What would a true null mean here? –What would a false null mean here? Alternative hypothesis (H 1 )   0Alternative hypothesis (H 1 )   0 –Two-tailed: check if r > 0 and r 0 and r < 0

22 22 Chapter 9 Correlation Tables of Significance We could look up the critical r, how large r needs to be to reject null, Table in Appendix D.2We could look up the critical r, how large r needs to be to reject null, Table in Appendix D.2 For N - 2 = 19 df, r crit =.433For N - 2 = 19 df, r crit =.433 Our correlation in smoking study.71 >.433Our correlation in smoking study.71 >.433 Reject H 0Reject H 0 –Correlation is significant. –More cigarette consumption associated with more CHD mortality.

23 23 Chapter 9 Correlation Computer Printout SPSS intercorrelation matrix Correlations 1.000.239**.264**..003.001 132.239**1.000.095.003..139 132.264**.0951.000.001.139. 132 Pearson Correlation Sig. (1-tailed) N Pearson Correlation Sig. (1-tailed) N Pearson Correlation Sig. (1-tailed) N mother init jointattn14 baby init jointattn14 Social play scale mother init jointattn14 baby init jointattn14Social play scale Correlation is significant at the 0.01 level (1-tailed). **. Look at pattern of correlations for a set of variables, diagonal always 1.0 Interpret strength and direction of correlations Sig or p value indicated on line 1 with**, line 2 with actual p value

24 24 Chapter 9 Correlation Checking the assumptions Linearity: scatterplot, line of best fit should be pretty good fit for dataLinearity: scatterplot, line of best fit should be pretty good fit for data Normally distributed variables: histograms, mean+/- 2 ruleNormally distributed variables: histograms, mean+/- 2 rule Homoskedasticty: scatterplot should have football shape or loaf of french breadHomoskedasticty: scatterplot should have football shape or loaf of french bread No Heterogenous subsamples: scatterplot not have 2 distinct patternsNo Heterogenous subsamples: scatterplot not have 2 distinct patterns No Restricted range: descriptive stats should look normal, not skewed, good rangeNo Restricted range: descriptive stats should look normal, not skewed, good range Random sample usedRandom sample used No Outliers: scatterplot should not have any points that are far away from data spreadNo Outliers: scatterplot should not have any points that are far away from data spread

25 25 Chapter 9 Correlation Checking scatterplot

26 26 Chapter 9 Correlation Review Questions What determines what goes on which axis of a scatterplot?What determines what goes on which axis of a scatterplot? What would a correlation of 0 tell us about the relationship between lab grades and exam grades?What would a correlation of 0 tell us about the relationship between lab grades and exam grades? What factors might affect the relationship between smoking and CHD Mortality?What factors might affect the relationship between smoking and CHD Mortality?

27 27 Chapter 9 Correlation Review Questions--cont. Indicate level (high, med., or low) and sign of the correlation for:Indicate level (high, med., or low) and sign of the correlation for: –number of guns in community and number firearm deaths –robberies and incidence of drug abuse –In incidence of protected sex and incidence of AIDS –community education level and crime rate –Frequency of solar flares (bursts of light) and suicide Why would the size of the correlation required for significance (critical r) decrease as N increases? Cont.


Download ppt "Chapter 9 Correlation. 2 Chapter 9 Correlation Major Points The problemThe problem ScatterplotsScatterplots An exampleAn example The correlation coefficientThe."

Similar presentations


Ads by Google