Simple Linear Regression

1 Simple Linear Regression

2 Review
Correlation: statistical technique used to describe the relationship between two interval or ratio variables. It is defined by r: the sign tells us the direction of the relationship (+ vs. -), and the value tells us the consistency of the relationship (-1 to +1). But what if we want to do more?

3 Regression = Prediction
Let’s say Amherst declares war on Northampton because Northampton tries to lure Judie's into moving out of Amherst. We are all pacifists here in the Happy Valley, so we decide to settle our differences with a rousing game of Trivial Pursuit! You are elected the captain of Amherst’s team (as if you would be selected instead of me). How are you going to choose the team? Multiple criteria: knowledge, performance under pressure, speed.

4 Life Satisfaction
Correlation: Do any of the following factors predict life satisfaction? Money, grades, relationship status, health.
Regression: If I know about your ________ (finances, GPA, relationships, health), I can predict your life satisfaction.

5 History - WWII
We needed to place a ton of people into positions in which they would excel.

6 Regression
Regression: statistical technique used to define the best fit equation that predicts the DV from one or more IVs.
[Scatterplot: relational aggression (x) vs. popularity (y) for persons A through N, with the best fit line drawn through the points.]
Best fit line: identifies the central tendency of the relationship between the IV and the DV.

7 The Regression Line
Basics of linear equations: Ŷ = b₀ + b₁X
…(or perhaps you know it as y = mx + b)
Ŷ = predicted dependent variable (e.g., popularity)
X = independent (predictor) variable (e.g., relational aggression)
b₁ = slope
b₀ = y-intercept

8 Interpretation of y-intercept and slope
y-intercept (b₀): predicted value of Y when X is zero. The intercept only makes sense if the X values can go down to zero (e.g., heights (x) predicting weights (y)).
Slope (b₁): change in Y for a one-unit change in X. + implies a direct relationship; - implies an inverse relationship.

9 College GPA' = (0.675)(High School GPA) + 1.097
Ŷ = 0.675X + 1.097
What would be the predicted GPA of someone with a 3.0 high school GPA?
College GPA' = (0.675)(3.0) + 1.097 = 3.122

10 College GPA' = (0.675)(High School GPA) + 1.097
Ŷ = 0.675X + 1.097
What would be the predicted GPA of someone with a 2.0 high school GPA?
College GPA' = (0.675)(2.0) + 1.097 = 2.447

11 College GPA' = (0.675)(High School GPA) + 1.097
Ŷ = 0.675X + 1.097
College GPA' = (0.675)(3.0) + 1.097 = 3.122
College GPA' = (0.675)(2.0) + 1.097 = 2.447
3.122 – 2.447 = 0.675 (slope: change in Y for each 1-point change in X)
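As a quick check, the two predictions above can be reproduced in a few lines of Python; the function name is just for illustration, and note how the difference between the two predictions recovers the slope:

```python
# Regression equation from the slides: College GPA' = 0.675 * (High School GPA) + 1.097
def predict_college_gpa(hs_gpa, slope=0.675, intercept=1.097):
    """Return the predicted college GPA for a given high school GPA."""
    return slope * hs_gpa + intercept

gpa_30 = predict_college_gpa(3.0)  # 3.122
gpa_20 = predict_college_gpa(2.0)  # 2.447
# A one-point change in X changes the prediction by exactly the slope:
print(round(gpa_30, 3), round(gpa_20, 3), round(gpa_30 - gpa_20, 3))
```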

12 How do we calculate a regression equation?
Least-squares method: the sum of the vertical distances between each point and the line = 0, and the sum of the squared vertical distances is as small as possible. This yields a line that is unbiased with minimum variability.

13 Calculating the slope and y-intercept
Slope: b₁ = SP / SSx
Intercept: b₀ = MY – (b₁ × MX)

14 Leaving nothing to chance
You’ve worked hard all semester long, but want to make sure you get the best possible grade. Would bribery help? Maybe. Let’s run a regression model to see if there is a relationship between monetary contributions to Jake and Abby’s College Fund and ‘Bonus Points for Special Contributions to Class’*. * Note: this is a humorous (?) example, not an actual invitation to bribery.

15 Defining the regression line
Subject   X  Y  x²  xy
Peeta     4  1  16   4
Rue       8  9  64  72
Katniss   2  5   4  10
Thresh    6  5  36  30
Σx = 20   Σy = 20   Σ(x²) = 120   Σ(xy) = 116
SSx = Σ(x²) – [(Σx)² / n] = 120 – [(20)² / 4] = 120 – (400 / 4) = 120 – 100 = 20
SP = Σ(xy) – [(Σx)(Σy) / n] = 116 – [(20)(20) / 4] = 116 – (400 / 4) = 116 – 100 = 16

16 Defining the regression line
(same data: Σx = 20, Σy = 20, Σ(x²) = 120, Σ(xy) = 116; SSx = 20, SP = 16)
b₁ = SP / SSx = 16 / 20 = 0.8
b₀ = MY – (b₁ × MX) = 5 – (0.8)(5) = 1.0
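The SP / SSx computation on this slide can be verified with a short Python sketch, using the data from the table above:

```python
# Least-squares slope and intercept via SP and SSx (Hunger Games example from the slides).
x = [4, 8, 2, 6]  # Peeta, Rue, Katniss, Thresh
y = [1, 9, 5, 5]
n = len(x)

ss_x = sum(v * v for v in x) - sum(x) ** 2 / n               # 120 - 400/4 = 20
sp = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n  # 116 - 400/4 = 16

b1 = sp / ss_x                       # slope: 16 / 20 = 0.8
b0 = sum(y) / n - b1 * (sum(x) / n)  # intercept: 5 - 0.8 * 5 = 1.0
print(b1, b0)  # 0.8 1.0
```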

17 [Scatterplot: the four subjects (Peeta, Rue, Katniss, Thresh) with the best fit line Ŷ = 0.8X + 1.]

18 Least Squares Method: the sum of the vertical distances between each point and the line = 0; the sum of the squared vertical distances is as small as possible.

19 Measuring Distance from the Regression Line (Error)
[Scatterplot: the line Ŷ = 0.8X + 1 with the vertical distance from each point (Peeta, Rue, Katniss, Thresh) to the line.]

20 Measuring Error
Subject  X  Y  Ŷ = 0.8X + 1  Distance (Y – Ŷ)  Squared distance (Y – Ŷ)²
P        4  1      4.2           -3.2               10.24
R        8  9      7.4            1.6                2.56
K        2  5      2.6            2.4                5.76
T        6  5      5.8           -0.8                0.64
Σ(Y – Ŷ) = 0    Σ(Y – Ŷ)² = 19.20

21 Standard error of the estimate
STANDARD ERROR OF THE ESTIMATE: measure of standard distance between predicted Y values and actual Y values

22 Standard Error of the Estimate
(X, Y, Ŷ, Y – Ŷ, and (Y – Ŷ)² as in the table above; Σ(Y – Ŷ) = 0, Σ(Y – Ŷ)² = 19.20)
Standard error of the estimate = √[Σ(Y – Ŷ)² / (n – 2)] = √(19.20 / 2) = 3.10
“On average, predicted scores differed from actual scores by 3.1 points.”
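A sketch of the error computation, assuming the Ŷ = 0.8X + 1 line from the earlier slides:

```python
import math

# Residuals and standard error of the estimate for the line Y-hat = 0.8*X + 1.
x = [4, 8, 2, 6]
y = [1, 9, 5, 5]

y_hat = [0.8 * xi + 1 for xi in x]
residuals = [yi - yh for yi, yh in zip(y, y_hat)]   # sums to 0, as the slides note

ss_residual = sum(r * r for r in residuals)         # 19.20
see = math.sqrt(ss_residual / (len(x) - 2))         # sqrt(19.2 / 2) ≈ 3.10
print(round(ss_residual, 2), round(see, 2))
```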

23 Standard Error and Correlation
[Three scatterplots: a perfect correlation (STE = 0), a strong correlation (small STE), and a weak correlation (large STE).]

24

25 Let it snow (3x) I’m not a giant fan of winter, but I do kind of like sledding. I’m trying to convince a group of y’all to go sledding with me, but no one is interested. So, I buy a bunch of beer to try to convince you that sledding down Memorial Hill is a good idea.* I let you guys consume what you consider to be an appropriate amount of beer for a Tuesday night and then ask you to rate your interest in sledding on a 10-point scale.

26 Find the regression equation
x (# drinks)  y (sledding)  x²   y²   xy
5             10            25   100  50
3              6             9    36  18
6              7            36    49  42
4              3            16     9  12
2              4             4    16   8
Σx = 20   Σy = 30   Σ(x²) = 90   Σ(y²) = 210   Σ(xy) = 130
Find the regression equation.
What is the predicted interest in sledding for someone who has consumed 2 drinks?
Calculate the standard error of the estimate.

27 Find the regression equation
(Σx = 20, Σy = 30, Σ(x²) = 90, Σ(y²) = 210, Σ(xy) = 130, n = 5)
SSx = Σ(x²) – [(Σx)² / n] = 90 – (400 / 5) = 10
SP = Σ(xy) – [(Σx)(Σy) / n] = 130 – (600 / 5) = 10
b₁ = SP / SSx = 10 / 10 = 1.0
b₀ = MY – (b₁ × MX) = 6 – (1.0)(4) = 2.0

28 Using regression equations for prediction
Ŷ = 1.0(X) + 2
What do we expect Y to be if someone has 4 drinks?
Ŷ = 1.0(4) + 2 = 6
BUT remember…
The predicted value is not perfect (unless the correlation between x and y is -1 or +1).
The equation cannot be used to make predictions about x values outside the range of observed x values (e.g., Y when x is 20).
DO NOT confuse prediction with causation!
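The caveats above can be baked into a small helper. The range check (2 to 6 drinks, the observed values in this example) and the function name are illustrative assumptions, not part of the slides:

```python
# Prediction with an extrapolation guard for Y-hat = 1.0*X + 2.
def predict_sledding(drinks, x_min=2, x_max=6):
    """Predict interest in sledding, refusing x values outside the observed range."""
    if not (x_min <= drinks <= x_max):
        raise ValueError("x is outside the observed range; the equation should not be used here")
    return 1.0 * drinks + 2

print(predict_sledding(4))  # 6.0
```

Calling `predict_sledding(20)` raises an error rather than returning a meaningless extrapolation.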

29

30 Partitioning Variability in Y
Preview: this will be important for hypothesis testing in regression.
Total variability in Y (SSY) = SSregression (variability in Y explained by X) + SSresidual (variability in Y NOT explained by X)

31 Partitioning Variability in Y
Squared correlation (r2) tells us the proportion of variability in Y explained by X e.g. r2 = .65 means X explains 65% of the variability in Y SSregression = (r2)SSY = (.65)SSY

32 Partitioning Variability in Y
Unpredicted variability will therefore = 1- r2 (variability not explained by x) e.g. if r2 = .65 then 35% of the variability in Y is not explained by x (1-.65=.35) SSresidual = (1-r2)SSY = (.35)SSY Alternative calculation of SSresidual = ∑(Y-Ŷ)2

33 r² = .58² = .33
x (# drinks)  y (sledding)  x²   y²   xy   Ŷ   (Y – Ŷ)
5             10            25   100  50   7      3
3              6             9    36  18   5      1
6              7            36    49  42   8     -1
4              3            16     9  12   6     -3
2              4             4    16   8   4      0
Σx = 20   Σy = 30   Σ(x²) = 90   Σ(y²) = 210   Σ(xy) = 130   Σ(Y – Ŷ)² = 20
SSx = 10   SP = 10   SSy = 30
r = SP / √(SSx × SSy) = 10 / √300 = .58
r² = .58² = .33

34 SSregression = (r²)SSY = .33(30) = 9.9
(using SSy = 30 and r² = .33 from the previous slide)
SSregression = (r²)SSY = .33(30) = 9.9
SSresidual = (1 – r²)SSY = (1 – .33)(30) = .67(30) = 20.1
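These SS values can be reproduced directly from the raw data. Note that carrying r² at full precision gives SSregression = 10 and SSresidual = 20 (matching Σ(Y – Ŷ)² = 20); the slides' 9.9 and 20.1 come from rounding r² to .33 first:

```python
x = [5, 3, 6, 4, 2]   # drinks
y = [10, 6, 7, 3, 4]  # interest in sledding
n = len(x)

ss_x = sum(v * v for v in x) - sum(x) ** 2 / n               # 10
ss_y = sum(v * v for v in y) - sum(y) ** 2 / n               # 30
sp = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n  # 10

r_sq = sp ** 2 / (ss_x * ss_y)     # 100/300 ≈ .33
ss_regression = r_sq * ss_y        # 10 (9.9 with r² rounded to .33)
ss_residual = (1 - r_sq) * ss_y    # 20 (20.1 with r² rounded to .33)
print(round(r_sq, 2), round(ss_regression, 1), round(ss_residual, 1))
```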

35 Hypothesis Testing for Regression
If there is no real association between x and y in the population we expect the slope to equal zero If the slope for the regression line from our sample does not equal zero we must decide between two alternatives: The nonzero slope is due to random chance error The nonzero slope suggests a systematic real relationship between x and y that exists in the population

36 Null and Alternative for Regression
Ho: the equation does not account for a significant proportion of the variance in the Y scores (Ho: β₁ = 0)
Ha: the equation does account for a significant proportion of the variance in the Y scores (Ha: β₁ ≠ 0)
Process of testing the regression line: called analysis of regression; very similar to ANOVA. An F ratio is used to determine if the variance predicted is significantly greater than would be expected by chance.

37 Calculating the F ratio
F = MSregression / MSresidual
MSregression = SSregression / dfregression, where dfregression = 1 (for simple regression)
MSresidual = SSresidual / dfresidual, where dfresidual = n – 2
(Recall SSY = SSregression + SSresidual.)
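A minimal helper for the F ratio as defined above:

```python
def f_ratio(ss_regression, ss_residual, n):
    """F for simple regression: df_regression = 1, df_residual = n - 2."""
    ms_regression = ss_regression / 1
    ms_residual = ss_residual / (n - 2)
    return ms_regression / ms_residual

# With the slides' rounded SS values for the sledding example (n = 5):
print(round(f_ratio(9.9, 20.1, 5), 2))  # 1.48
```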

38 Steps for Hypothesis Testing in Simple Regression
Step 1: Specify the null and alternative hypotheses: Ho: β₁ = 0, Ha: β₁ ≠ 0.
Step 2: Designate the rejection region by selecting α.
Step 3: Obtain an F critical value, df (1, n – 2).
Step 4: Collect your data.
Step 5: Use your sample data to calculate (a) the regression equation, (b) r², (c) SS.
Step 6: Calculate the F ratio.
Step 7: Compare Fobs with Fcrit.
Step 8: Interpret the results.

39 Full Sledding Example
(data as in the earlier table: Σx = 20, Σy = 30, Σ(x²) = 90, Σ(y²) = 210, Σ(xy) = 130, Σ(Y – Ŷ)² = 20)
Step 1: Specify the null and alternative hypotheses: Ho: β₁ = 0, Ha: β₁ ≠ 0.
Step 2: Designate the rejection region by selecting α = .05.
Step 3: Obtain an F critical value: df (1, n – 2) = (1, 3), Fcrit = 10.13

40 Full Sledding Example
Step 5: (a) Calculate the regression equation: Ŷ = 1.0X + 2. Already done!

41 Full Sledding Example
Step 5: (b) Calculate r²: r² = .58² = .33. Already done!

42 Full Sledding Example
Step 5: (c) Calculate SS:
SSregression = (r²)SSY = 9.9
SSresidual = (1 – r²)SSY = 20.1
Already done!

43 Full Sledding Example
Step 6: Calculate F.
MSregression = SSregression / dfregression = 9.9 / 1 = 9.9
MSresidual = SSresidual / dfresidual = 20.1 / (5 – 2) = 6.7

44 Full Sledding Example
Step 6 (continued):
F = MSregression / MSresidual = 9.9 / 6.7 = 1.48

45 Full Sledding Example
Step 7: Compare Fobs to Fcrit: Fcrit = 10.13, Fobs = 1.48. Decision: fail to reject the null.
Step 8: Interpretation: number of drinks does not account for a significant proportion of the variance in interest in sledding.

46 Summary Table
Source      SS    df  MS   F
Regression   9.9   1  9.9  1.48
Residual    20.1   3  6.7
Total       30.0   4
Number of drinks did not explain a significant proportion of the variance in willingness to go sledding with me, F(1, 3) = 1.48, p > .05, r² = .33. Money down the toilet!!
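The whole example can be scripted end to end. Fcrit is hard-coded from the slides' table lookup; the unrounded F works out to 1.50, and the 1.48 above reflects rounding r² to .33:

```python
x = [5, 3, 6, 4, 2]   # drinks
y = [10, 6, 7, 3, 4]  # interest in sledding
n = len(x)

# Step 5a: regression equation
ss_x = sum(v * v for v in x) - sum(x) ** 2 / n
ss_y = sum(v * v for v in y) - sum(y) ** 2 / n
sp = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
b1 = sp / ss_x                     # 1.0
b0 = sum(y) / n - b1 * sum(x) / n  # 2.0

# Steps 5b-6: r-squared, SS, and the F ratio
r_sq = sp ** 2 / (ss_x * ss_y)
f_obs = (r_sq * ss_y / 1) / ((1 - r_sq) * ss_y / (n - 2))

# Step 7: compare with the critical value F(1, 3) at alpha = .05 (from the slides)
f_crit = 10.13
decision = "reject H0" if f_obs > f_crit else "fail to reject H0"
print(round(b1, 2), round(b0, 2), round(f_obs, 2), decision)
```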

47 Regression in RC: Does weekend bedtime predict hours of TV viewing?
Predictor (IV): weekend bedtime
Response (DV): hours of TV
Step 1: Specify the null and alternative hypotheses: Ho: β₁ = 0, Ha: β₁ ≠ 0.
Step 2: Designate the rejection region by selecting α = .05.
Step 3: Will judge significance using the p-value method!

48 Regression in RC: WE bed & TV
Coefficients:
            Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)                                         *
WeekEnd                                             **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
Residual standard error: on 28 degrees of freedom
Multiple R-squared: 0.2209, Adjusted R-squared:
F-statistic: 7.94 on 1 and 28 DF, p-value:


50 Writing up results of regression
Weekend bedtime was a significant predictor of television viewing, F(1, 28) = 7.94, p = .01, r² = .2209. The r² indicates that weekend bedtime explains 22.09% of the variance in TV viewing. The slope of the regression equation was significant (p = .01) and indicated that people who went to sleep later tended to watch more TV (shame!). The regression equation relating weekend bedtime and TV viewing was: TV = b₀ + b₁(WE), with the coefficients taken from the output above.

51 Standardized Slopes
Standardized regression coefficient (β, “beta”): standardized slope representing the change of x and y in standard deviation units.
Example: Ŷ = βx + a, e.g., Ŷ = .5x + 10
Interpretation of β: for each one-SD increase in X there will be a corresponding .5-SD increase in Y.
Why would we want a standardized coefficient? It gives us a measure of effect size, and if we have more than one predictor we can compare relative effects on Y.

52 How do we calculate Beta?
By Hand: Transform Y scores into z scores Transform X scores into z scores Compute slope using transformed scores (then change will be in standard deviation units) (I won’t make you do this by hand!)
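The by-hand recipe above can be sketched in Python, reusing the sledding data as an example. In simple regression the slope on z-scored data equals Pearson r, which is ≈ .58 here:

```python
import math

x = [5, 3, 6, 4, 2]
y = [10, 6, 7, 3, 4]
n = len(x)

def z_scores(values):
    """Transform scores to z-scores using the sample SD (n - 1 denominator)."""
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

zx, zy = z_scores(x), z_scores(y)
# Slope on the standardized scores: SP_z / SS_zx (change is now in SD units)
sp_z = sum(a * b for a, b in zip(zx, zy))
ss_zx = sum(a * a for a in zx)
beta = sp_z / ss_zx   # equals Pearson r in simple regression
print(round(beta, 2))  # 0.58
```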

53 Real data examples

54 -What is R2? -What is the standardized slope? -What does the standardized slope suggest about change in X vs. Y?

55 -Is self-esteem more strongly related to positivity or to negativity?
-Calculate and interpret R2 for the association between self-esteem and positivity

56

