Published by Hugh Smith. Modified over 8 years ago.
1
Linear Regression:
2
The relationship between two variables (e.g. height and weight; age and IQ) can be described graphically with a scatterplot. Each point represents one individual's performance (each person supplies two scores, age and RT).
[Figure: scatterplot. x-axis: reaction time (msec), from short to long; y-axis: age (years), from young to old.]
3
Often in psychology, we are interested in seeing whether or not a linear relationship exists between two variables. Here, there is a strong positive relationship between RT and age:
4
Here is an equally strong but negative relationship between RT and age:
5
Here, there is no relationship between RT and age:
6
If we find a reasonably strong linear relationship between two variables, we might want to fit a straight line to the scatterplot. There are two reasons for wanting to do this: (a) for description: the line acts as a succinct description of the "idealised" relationship between our two variables, a relationship which we assume the real data reflect somewhat imperfectly. (b) for prediction: we could use the line to obtain estimates of values for one of the variables, on the basis of knowledge of the value of the other variable (e.g. if we knew a person's height, we could predict their weight).
7
Linear Regression is an objective method of fitting a line to our scatterplot - better than trying to do it by eye! Which line is the best fit to the data?
8
The recipe for drawing a straight line: To draw a line, we need two values: (a) the intercept - the point at which the line intercepts the vertical axis of the graph; (b) the slope of the line.
[Figure: two panels. One shows lines with the same intercept but different slopes; the other shows lines with different intercepts but the same slope.]
9
The formula for a straight line: Y = a + b * X Y is a value on the vertical (Y) axis; a is the intercept (the point at which the line intersects the vertical axis of the graph); b is the slope of the line; X is any value on the horizontal (X) axis.
10
Linear regression step-by-step: 10 individuals do two tests: a stress test, and a statistics test. What is the relationship between stress and statistics performance?
subject   stress (X)   test score (Y)
A         18           84
B         31           67
C         25           63
D         29           89
E         21           93
F         32           63
G         40           55
H         36           70
I         35           53
J         27           77
11
Draw a scatterplot to see what the data look like:
12
There is a negative relationship between stress scores and statistics scores: people who scored high on the statistics test tend to have low stress levels, and people who scored low on the statistics test tend to have high stress levels.
13
Calculating the regression line: We need to find "a" (the intercept) and "b" (the slope) of the line. Work out "b" first, and "a" second.
14
To calculate "b", the slope of the line:
b = (ΣXY - (ΣX * ΣY) / N) / (ΣX² - (ΣX)² / N)
15
subject   stress X   X²            test Y   XY
A         18         18² = 324     84       18 * 84 = 1512
B         31         31² = 961     67       31 * 67 = 2077
C         25         25² = 625     63       25 * 63 = 1575
D         29         29² = 841     89       29 * 89 = 2581
E         21         21² = 441     93       21 * 93 = 1953
F         32         32² = 1024    63       32 * 63 = 2016
G         40         40² = 1600    55       40 * 55 = 2200
H         36         36² = 1296    70       36 * 70 = 2520
I         35         35² = 1225    53       35 * 53 = 1855
J         27         27² = 729     77       27 * 77 = 2079
Totals: ΣX = 294; ΣX² = 9066; ΣY = 714; ΣXY = 20368
16
We also need: N = the number of pairs of scores = 10 in this case. (ΣX)² = "the sum of X, squared" = 294 * 294 = 86436. NB: (ΣX)² means "square the sum of X": add together all of the X values to get a total, and then square this total. ΣX² means "sum the squared X values": square each X value, and then add together these squared X values to get a total.
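The column totals and the distinction between the two "squared" quantities can be checked with a short sketch. The variable names are illustrative choices, not part of the original slides; the scores are the ten pairs from the data table.

```python
# Stress (X) and test (Y) scores for subjects A to J, taken from the slides.
stress = [18, 31, 25, 29, 21, 32, 40, 36, 35, 27]
test_score = [84, 67, 63, 89, 93, 63, 55, 70, 53, 77]

n = len(stress)                                # N: number of pairs of scores
sum_x = sum(stress)                            # sum of X
sum_y = sum(test_score)                        # sum of Y
sum_x_squared = sum(x * x for x in stress)     # square each X first, then add
sum_xy = sum(x * y for x, y in zip(stress, test_score))  # sum of X * Y
square_of_sum_x = sum_x ** 2                   # add the X values first, then square

print(n, sum_x, sum_y, sum_x_squared, sum_xy, square_of_sum_x)
# prints: 10 294 714 9066 20368 86436
```

Note that the sum of the squared X values (9066) and the square of the summed X values (86436) differ by roughly a factor of ten here, so confusing them changes the result completely.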
17
Working through the formula for b: b = (20368 - (294 * 714) / 10) / (9066 - 86436 / 10) = -623.60 / 422.40 = -1.476
18
b = -1.476. b is negative, because the regression line slopes downwards from left to right: as stress scores (X) increase, statistics scores (Y) decrease.
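As a rough check on the slope, the sketch below plugs the totals from the slides into the usual computational formula for a least-squares slope. The formula image itself did not survive in this transcript, so the exact form used here is an assumption, although it reproduces the slides' intermediate values and final answer. Variable names are illustrative.

```python
# Totals taken from the slides.
n = 10
sum_x, sum_y = 294, 714
sum_x_squared, sum_xy = 9066, 20368

# Assumed form: b = (sum XY - sum X * sum Y / N) / (sum X^2 - (sum X)^2 / N)
numerator = sum_xy - (sum_x * sum_y) / n        # 20368 - 20991.6
denominator = sum_x_squared - sum_x ** 2 / n    # 9066 - 8643.6
b = numerator / denominator

print(round(numerator, 1), round(denominator, 1), round(b, 3))
# prints: -623.6 422.4 -1.476
```

The negative numerator is what makes b negative: the products of X and Y add up to less than would be expected if the two variables were unrelated.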
19
Now work out a: a = Ȳ - (b * X̄). Ȳ is the mean of the Y scores = 71.4. X̄ is the mean of the X scores = 29.4. b = -1.476. Therefore a = 71.4 - (-1.476 * 29.4) = 114.80 (using the unrounded value of b; with b rounded to -1.476 the result is 114.79).
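The intercept calculation can be sketched as follows. It assumes the intercept is the mean of Y minus b times the mean of X, which matches the arithmetic on the slide, and it shows how rounding b before use shifts the last decimal place. Names are illustrative.

```python
# Scores from the slides.
stress = [18, 31, 25, 29, 21, 32, 40, 36, 35, 27]
test_score = [84, 67, 63, 89, 93, 63, 55, 70, 53, 77]

mean_x = sum(stress) / len(stress)              # 29.4
mean_y = sum(test_score) / len(test_score)      # 71.4

# Slope from the slide totals, kept unrounded.
b = (20368 - 294 * 714 / 10) / (9066 - 294 ** 2 / 10)

a_unrounded_b = mean_y - b * mean_x             # intercept using full-precision b
a_rounded_b = mean_y - round(b, 3) * mean_x     # intercept using b = -1.476

print(round(a_unrounded_b, 2), round(a_rounded_b, 2))
# prints: 114.8 114.79
```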
20
The complete regression equation: Y' = 114.80 + ( -1.476 * X) To draw the line, input any three different values for X, in order to get associated values for Y'. For X = 10, Y' = 114.80 + (-1.476 * 10) = 100.04. For X = 30, Y' = 114.80 + (-1.476 * 30) = 70.52. For X = 50, Y' = 114.80 + (-1.476 * 50) = 41.00.
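The three plotting points can be reproduced with a small helper. The function name `predict_test_score` is hypothetical; the default coefficients are the rounded values from the slides.

```python
def predict_test_score(stress_score, a=114.80, b=-1.476):
    """Predicted test score Y' for a given stress score X, using Y' = a + b * X."""
    return a + b * stress_score

# The three X values used on the slide to draw the line.
predictions = {x: round(predict_test_score(x), 2) for x in (10, 30, 50)}
for x, y_hat in predictions.items():
    print(f"X = {x}, Y' = {y_hat:.2f}")
```

Two points are enough to fix a straight line; a third acts as a check, since all three should fall on the same line.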
21
Regression line for predicting test scores (Y) from stress scores (X):
[Figure: regression line. x-axis: stress score (X), 0 to 50; y-axis: test score (Y), 0 to 120. Plotted points: X = 10, Y' = 100.04; X = 30, Y' = 70.52; X = 50, Y' = 41.00. Intercept = 114.80.]
22
This is the regression line for predicting test score on the basis of knowledge of a person's stress score; this is the "regression of Y on X". To predict stress score on the basis of knowledge of test score (the "regression of X on Y"), we can't use this regression line! To predict Y from X requires a line that minimises the deviations of the predicted Y's from actual Y's. To predict X from Y requires a line that minimises the deviations of the predicted X's from actual X's - a different task! Solution: to calculate regression of X on Y, swap the column labels (so that the "X" values are now the "Y" values, and vice versa); and re-do the calculations.
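The label-swapping solution described above can be sketched as one fitting function called twice with the arguments in opposite order. The function `fit_line` is a hypothetical helper, not something from the slides, and it assumes the same least-squares formulas used earlier in the worked example.

```python
def fit_line(predictor, outcome):
    """Return (intercept, slope) of the least-squares line predicting outcome."""
    n = len(predictor)
    sum_p, sum_o = sum(predictor), sum(outcome)
    sum_pp = sum(p * p for p in predictor)
    sum_po = sum(p * o for p, o in zip(predictor, outcome))
    slope = (sum_po - sum_p * sum_o / n) / (sum_pp - sum_p ** 2 / n)
    intercept = sum_o / n - slope * sum_p / n
    return intercept, slope

stress = [18, 31, 25, 29, 21, 32, 40, 36, 35, 27]
test_score = [84, 67, 63, 89, 93, 63, 55, 70, 53, 77]

# Regression of Y on X: predict test score from stress score.
a_test, b_test = fit_line(stress, test_score)
# Regression of X on Y: swap the roles and predict stress score from test score.
a_stress, b_stress = fit_line(test_score, stress)

print(round(a_test, 2), round(b_test, 3))
print(round(a_stress, 2), round(b_stress, 3))
```

The second line is not simply the first one rearranged to solve for X: each fit minimises deviations in a different variable, so the two slopes are only reciprocals of each other when the correlation is perfect.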
24
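Regression lines for predicting test score from stress score, and vice versa. Predicting test score from stress score: Y' = 114.80 + (-1.476 * X). Predicting stress score from test score (with the labels swapped, so that X is now the test score): Y' = 55.04 + (-0.359 * X). (The previous graph redrawn, so that in both cases the predicted variable is on the vertical axis of the graph.)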
25
Linear Regression using SPSS: Analyze... > Regression... > Curve Estimation
26
b: the slope. a: the intercept. R²: how much variation in test score is accounted for by its relationship with stress? ANOVA: is our regression any better at predicting test score than simply using the mean test score?
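The two summary quantities can also be computed by hand for the worked example. This is a sketch under assumptions: it uses the deviation-score form of the least-squares fit (algebraically equivalent to the sums formula), takes R² as the proportion of total variation in test score accounted for by the line, and takes the ANOVA F ratio as regression mean square over residual mean square with 1 and N - 2 degrees of freedom. It is not a reproduction of SPSS output, and the names are illustrative.

```python
stress = [18, 31, 25, 29, 21, 32, 40, 36, 35, 27]
test_score = [84, 67, 63, 89, 93, 63, 55, 70, 53, 77]
n = len(stress)
mean_x = sum(stress) / n
mean_y = sum(test_score) / n

# Least-squares slope and intercept for predicting test score from stress.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(stress, test_score))
s_xx = sum((x - mean_x) ** 2 for x in stress)
b = s_xy / s_xx
a = mean_y - b * mean_x

# Partition the variation in test score.
predicted = [a + b * x for x in stress]
ss_total = sum((y - mean_y) ** 2 for y in test_score)       # around the mean
ss_residual = sum((y - p) ** 2 for y, p in zip(test_score, predicted))
ss_regression = ss_total - ss_residual                      # accounted for by the line

r_squared = ss_regression / ss_total
f_ratio = (ss_regression / 1) / (ss_residual / (n - 2))
print(round(r_squared, 3), round(f_ratio, 2))
```

An R² near 0.53 would mean that roughly half of the variation in test scores is accounted for by stress scores; the F ratio compares the line's predictions with the baseline of always predicting the mean test score.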