Published by Ezra Owens. Modified over 9 years ago.
1
Statistics for the Behavioral Sciences (5th ed.) Gravetter & Wallnau
Chapter 16 Correlations and Regression University of Guelph Psychology 3320 — Dr. K. Hennig Winter 2003 Term
2
Overview of chapter
Correlations: Pearson r; for non-linear (non-scalar) data: Spearman r (with non-linear data), point-biserial (where one variable is dichotomous), phi-coefficient (where both variables are dichotomous). Regressions.
3
CORRELATIONS: Figure 16-1 (p. 522) The relationship between exam grade and time needed to complete the exam. Notice the general trend in these data: Students who finish the exam early tend to have better grades.
4
Figure (p. 523) The same set of n = 6 pairs of scores (X and Y values) is shown in a table and in a scatterplot. Notice that the scatterplot allows you to see the relationship between X and Y.
5
Three characteristics: 1. Direction: examples of positive and negative relationships. (a) Beer sales are positively related to temperature. (b) Coffee sales are negatively related to temperature.
6
2. Form: Examples of relationships that are not linear: (a) relationship between reaction time and age; (b) relationship between mood and drug dose.
7
3. Degree: Examples of different values for linear correlations: (a) shows a strong positive relationship, approximately +0.90; (b) shows a relatively weak negative correlation, approximately –0.40; (c) shows a perfect negative correlation, –1.00; (d) shows no linear trend, 0.00.
8
Pearson (product-moment) correlation
Sum of products of deviations: $SP = \sum (X - M_X)(Y - M_Y)$, where $M_X$ is the mean of the X scores and $M_Y$ is the mean of the Y scores. Recall: $SS = \sum (X - M)^2 = \sum (X - M)(X - M)$.
9
Pearson (product-moment) correlation
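$r = \dfrac{\text{degree to which X and Y vary together}}{\text{degree to which X and Y vary separately}} = \dfrac{SP}{\sqrt{SS_X \, SS_Y}}$. Computational formula: $SP = \sum XY - \dfrac{(\sum X)(\sum Y)}{n}$. Expressed with z-scores: $r = \dfrac{\sum z_X z_Y}{n}$ (note: the z-scores must be computed with the population standard deviation).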
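A minimal Python sketch of the formulas on this slide, showing that the definitional SP, the computational SP, and the z-score form of r agree. The five score pairs are illustrative values chosen for easy arithmetic and do not come from the slides; the function names are likewise made up for this example.

```python
import math

def sum_of_squares(values):
    """SS = sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def sp_definitional(xs, ys):
    """SP = sum of (X - M_X)(Y - M_Y)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))

def sp_computational(xs, ys):
    """SP = sum(XY) - (sum X)(sum Y) / n."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n

def pearson_r(xs, ys):
    """r = SP / sqrt(SS_X * SS_Y)."""
    return sp_definitional(xs, ys) / math.sqrt(sum_of_squares(xs) * sum_of_squares(ys))

def pearson_r_from_z(xs, ys):
    """r = sum(z_X * z_Y) / n, with z-scores based on the population SD (divide by n)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sd_x = math.sqrt(sum_of_squares(xs) / n)
    sd_y = math.sqrt(sum_of_squares(ys) / n)
    return sum(((x - mx) / sd_x) * ((y - my) / sd_y) for x, y in zip(xs, ys)) / n

# Illustrative data (assumed for this sketch, not taken from the slides).
xs = [0, 10, 4, 8, 8]
ys = [2, 6, 2, 4, 6]

print(sp_definitional(xs, ys))   # SP from deviations
print(sp_computational(xs, ys))  # SP from the computational formula
print(pearson_r(xs, ys))         # r from SP and SS
print(pearson_r_from_z(xs, ys))  # r from z-scores
```

For these scores SP = 28, SS_X = 64, and SS_Y = 16, so r = 28 / 32 = 0.875 by every route.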
10
Understanding and interpreting r
Correlations do not prove causation, but they can disprove causation. The value of a correlation can be affected greatly by the range of scores in the data. Outliers can have a dramatic effect. Do not interpret a correlation as a proportion (e.g., r = 0.50 does not mean 50%); rather, $r^2 = 0.25$, so 25% of the total variability is accounted for. $r^2$ is called the coefficient of determination.
11
The effect of range (a) In this example, the full range of X and Y values shows a strong, positive correlation, but the restricted range of scores produces a correlation near zero. (b) An example in which the full range of X and Y values shows a correlation near zero, but the scores in the restricted range produce a strong, positive correlation.
12
Outliers A demonstration of how one extreme data point (an outlier) can influence the value of a correlation.
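The outlier effect can be illustrated with a short Python sketch. The data below are invented for this example (they are not the values plotted in the figure): five points with almost no linear trend, plus one extreme point.

```python
import math

def pearson_r(xs, ys):
    """Pearson r = SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical scores with little linear relationship.
xs = [1, 2, 3, 4, 5]
ys = [3, 1, 4, 2, 3]
r_without = pearson_r(xs, ys)

# The same scores plus one extreme point far from the cluster.
r_with = pearson_r(xs + [14], ys + [12])

print(round(r_without, 2), round(r_with, 2))
```

With these made-up scores the correlation moves from a weak value (below 0.2) to a very strong one (above 0.9) because of a single added point.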
13
Hypothesis testing. H0: ρ = 0 (there is no population correlation)
H1: ρ ≠ 0 (there is a real correlation)
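One common way to evaluate these hypotheses, which the slide itself does not spell out, is the t statistic $t = r\sqrt{(n-2)/(1-r^2)}$ with df = n - 2. The sketch below assumes that approach; the sample values (r = 0.875, n = 5) are illustrative, and the critical value is the standard two-tailed t-table entry for df = 3 at alpha = .05.

```python
import math

def t_for_correlation(r, n):
    """t statistic for testing H0: rho = 0, with df = n - 2."""
    if n < 3:
        raise ValueError("need at least 3 pairs of scores")
    if abs(r) >= 1:
        raise ValueError("t is undefined for a perfect correlation")
    return r * math.sqrt((n - 2) / (1 - r * r))

# Illustrative sample result (assumed, not from the slides).
r, n = 0.875, 5
t = t_for_correlation(r, n)

# Two-tailed critical t for df = 3, alpha = .05, from a standard t table.
T_CRITICAL = 3.182
reject_h0 = abs(t) > T_CRITICAL

print(round(t, 3), reject_h0)
```

In this illustration even a large sample correlation fails to reach significance, because with only five pairs of scores the test has very few degrees of freedom.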
14
CORRELATIONS: For non-linear relations Relationship between practice and performance. Although this relationship is not linear, there is a consistent positive relationship. An increase in performance tends to accompany an increase in practice.
15
Spearman r: Scatterplots showing (a) the scores and (b) the ranks for the data in the example. Notice that there is a consistent, positive relationship between the X and Y scores, although it is not a linear relationship. Also notice that the scatterplot of the ranks shows a perfect linear relationship. Steps: 1. Rank-order the X scores and the Y scores separately. 2. Apply the Pearson r formula to the ranks, or use the special Spearman formula.
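The two steps can be sketched in Python as follows. The slide does not show the special formula; the sketch assumes it is the usual $r_s = 1 - \dfrac{6\sum D^2}{n(n^2 - 1)}$, where D is the difference between paired ranks, which matches Pearson r on the ranks only when there are no tied scores. The data are invented for illustration.

```python
import math

def average_ranks(values):
    """Rank scores from 1 (smallest); tied scores share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

def spearman_r(xs, ys):
    """Step 1: rank each variable. Step 2: Pearson r on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

def spearman_special(xs, ys):
    """Special formula; assumes no tied scores."""
    n = len(xs)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(average_ranks(xs), average_ranks(ys)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical monotonic but non-linear data.
xs = [1, 2, 3, 4, 5]
ys = [1, 8, 9, 10, 30]
print(pearson_r(xs, ys))   # below 1: the relationship is not linear
print(spearman_r(xs, ys))  # 1.0 up to rounding: the ranks line up perfectly

# Hypothetical data with two rank reversals.
ys2 = [2, 1, 4, 3, 5]
print(spearman_r(xs, ys2), spearman_special(xs, ys2))
```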
16
Other measures of relationship
Point-biserial: where one variable is dichotomous (has two values; male vs. female, first-born vs. later-born, etc.). Phi-coefficient: where both variables are dichotomous (e.g., the two variables above: male vs. female and first-born vs. later-born).
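Both measures can be computed by coding each dichotomous variable as 0 or 1 and applying the ordinary Pearson formula. The sketch below assumes that coding approach; the group labels and scores are hypothetical.

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical data: birth order coded 0 = first-born, 1 = later-born.
birth_order = [0, 0, 0, 1, 1, 1]
score = [2, 3, 4, 6, 7, 8]  # a numerical score for each person

# Point-biserial: one dichotomous variable, one numerical variable.
r_pb = pearson_r(birth_order, score)

# Phi: both variables dichotomous (second variable coded 0 = male, 1 = female).
sex = [0, 0, 1, 0, 1, 1]
phi = pearson_r(birth_order, sex)

print(round(r_pb, 4), round(phi, 4))
```

Which category receives the code 1 is arbitrary, so the sign of either coefficient carries no meaning beyond the coding; only the magnitude describes the strength of the relationship.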
17
Introduction to regression SAT scores and GPA - regression line drawn through the data points. The regression line defines a precise, one-to-one relationship between each X value (SAT score) and its corresponding Y value (GPA).
18
Relationship between total cost and number of hours playing tennis
The tennis club charges a $25 membership fee plus $5 per hour. The relationship is described by a linear equation: Total cost = $5 (number of hours) + $25. This has the general form Y = bX + a, with slope b = 5 and Y-intercept a = 25. The statistical technique for finding a best-fit line is called regression.
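The tennis example written as a small Python function, as a sketch of the linear form Y = bX + a. The function and constant names are made up for this illustration.

```python
MEMBERSHIP_FEE = 25  # a: the Y-intercept, the cost at zero hours
HOURLY_RATE = 5      # b: the slope, the added cost per hour

def total_cost(hours):
    """Y = bX + a for the tennis club example."""
    return HOURLY_RATE * hours + MEMBERSHIP_FEE

for hours in (0, 1, 10):
    print(hours, total_cost(hours))
```

Every additional hour raises the total by exactly $5, which is what the slope b expresses.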
19
The distance between the actual data point (Y) and the predicted point on the line (Ŷ) is defined as Y – Ŷ. The goal of regression is to find the equation for the line that minimizes these distances (specifically, the sum of the squared distances).
20
Best-fit straight line
The predicted Y values (Ŷ) are on the regression line. Unless the correlation is perfect (+1.00 or –1.00), there will be some error between the actual Y values and the predicted Y values. The larger the absolute value of the correlation, the smaller the error will be.
21
(a) Scatterplot showing data points that perfectly fit the regression equation $\hat{Y} = 1.6X - 2$. Note that the correlation is r = +1.00. (b) Scatterplot for the data from the example. Notice that there is error between the actual data points and the predicted Y values of the regression line. Total squared error = $\sum (Y - \hat{Y})^2$; the line that minimizes this total is the least-squares solution.
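A short Python sketch of the total squared error for the line Ŷ = 1.6X – 2. The X values and the "noisy" Y values are hypothetical, chosen so the arithmetic is easy to follow.

```python
def predict(x, b, a):
    """Predicted value Y-hat = bX + a."""
    return b * x + a

def total_squared_error(xs, ys, b, a):
    """Sum of (Y - Y-hat)^2 over all data points."""
    return sum((y - predict(x, b, a)) ** 2 for x, y in zip(xs, ys))

b, a = 1.6, -2.0
xs = [0, 5, 10]

# Points that fall exactly on the line: no error.
ys_perfect = [predict(x, b, a) for x in xs]
error_perfect = total_squared_error(xs, ys_perfect, b, a)

# Hypothetical points that each miss the line by 1 unit.
ys_noisy = [-1, 5, 15]
error_noisy = total_squared_error(xs, ys_noisy, b, a)

print(error_perfect, error_noisy)
```

Squaring the distances keeps positive and negative errors from cancelling, so the total is zero only when every point lies on the line.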
22
Regression (contd.)
The regression equation for Y is the linear equation $\hat{Y} = bX + a$. The goal is to find the values of a and b that give the best-fit line: $b = \dfrac{SP}{SS_X}$ and $a = M_Y - bM_X$, where $SP = \sum (X - M_X)(Y - M_Y)$ and $SS_X = \sum (X - M_X)^2$. Example: X = 1, 3, Y = 4, 9, 8 (from text p. 559). What are the predicted values for 5, 7, 9? SPSS
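The slope and intercept formulas can be sketched in Python as below. The example data on the slide are incomplete in this transcript, so the sketch uses its own illustrative scores rather than the textbook's; the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept: b = SP / SS_X, a = M_Y - b * M_X."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    if ss_x == 0:
        raise ValueError("all X values are identical; the slope is undefined")
    b = sp / ss_x
    a = my - b * mx
    return b, a

def total_squared_error(xs, ys, b, a):
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))

# Illustrative data (assumed for this sketch, not the textbook example).
xs = [0, 10, 4, 8, 8]
ys = [2, 6, 2, 4, 6]

b, a = fit_line(xs, ys)
print(b, a)  # slope and intercept of the best-fit line

# Predicted Y for a new X value.
print(b * 5 + a)

# Nudging the slope or intercept away from the fitted values increases the error.
best = total_squared_error(xs, ys, b, a)
print(best, total_squared_error(xs, ys, b + 0.1, a), total_squared_error(xs, ys, b, a - 0.5))
```

For these scores SP = 28 and SS_X = 64, so b = 0.4375 and a = 4 – 0.4375(6) = 1.375.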
23
A set of 9 data points (X and Y values). The colored lines in part (a) show deviations from the mean for Y. For these data, $SS_Y = 240$ (total variability). In part (b) the colored lines show deviations from the regression line. For these data, $SS_{error} = 86.4$. The regression line reduces the SS value by $r^2 = 0.64$, or 64%. The proportion of variability left as error is $1 - r^2 = 0.36$.
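The figures quoted on this slide can be checked with a few lines of Python. The sketch uses only the slide's own numbers ($SS_Y = 240$, $r^2 = 0.64$); the variable names are made up for this illustration.

```python
ss_y = 240.0      # total variability of Y (from the slide)
r_squared = 0.64  # coefficient of determination (from the slide)

# Variability accounted for by the regression line.
ss_regression = r_squared * ss_y

# Variability left over as error: SS_error = (1 - r^2) * SS_Y.
ss_error = (1 - r_squared) * ss_y

print(round(ss_regression, 1), round(ss_error, 1))
```

The two parts add back up to the total: 153.6 + 86.4 = 240.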