Simple Linear Regression


Review
Correlation: a statistical technique used to describe the relationship between two interval or ratio variables.
- Defined by r.
- The sign tells us the direction of the relationship (+ vs. −).
- The value tells us the consistency of the relationship (−1 to +1).
But what if we want to do more?

Regression = Prediction
Let’s say Amherst declares war on Northampton because Northampton tries to lure Judie's into moving out of Amherst. We are all pacifists here in the Happy Valley, so we decide to settle our differences with a rousing game of Trivial Pursuit! You are elected the Captain of Amherst’s team (as if you would be selected instead of me). How are you going to choose the team?
Multiple criteria:
- Knowledge
- Performance under pressure
- Speed

Life Satisfaction
Correlation: Do any of the following factors predict life satisfaction?
- Money
- Grades
- Relationship status
- Health
Regression: If I know about your ________ (finances, GPA, relationships, health), I can predict your life satisfaction.

History - WWII
During WWII, we needed to place a ton of people into positions in which they would excel.

Regression: a statistical technique used to define the best-fit equation that predicts the DV from one or more IVs.
[Scatterplot: relational aggression (x) plotted against popularity (y) for persons A through N.]
Best-fit line: identifies the central tendency of the relationship between the IV and the DV.

The Regression Line
Basics of linear equations: Ŷ = b0 + b1X ….. (or perhaps you know it as y = mx + b)
Ŷ = the predicted dependent variable (e.g., popularity)
X = the independent (predictor) variable (e.g., relational aggression)
b1 = the slope
b0 = the y-intercept

Interpretation of y-intercept and slope
Y-intercept (b0): the predicted value of Y when X is zero. The intercept only makes sense if the X value can go down to zero; e.g., for heights (x) predicting weights (y), a height of zero is impossible, so the intercept is not meaningful.
Slope (b1): the change in Y for a one-unit change in X. A + slope implies a direct relationship; a − slope implies an inverse relationship.

College GPA' = (0.675)(High School GPA) + 1.097
Ŷ = 0.675X + 1.097
What would be the predicted college GPA of someone with a 3.0 high school GPA?
College GPA' = (0.675)(3.0) + 1.097 = 3.122

College GPA' = (0.675)(High School GPA) + 1.097
Ŷ = 0.675X + 1.097
What would be the predicted college GPA of someone with a 2.0 high school GPA?
College GPA' = (0.675)(2.0) + 1.097 = 2.447

College GPA' = (0.675)(High School GPA) + 1.097
Ŷ = 0.675X + 1.097
College GPA' = (0.675)(3.0) + 1.097 = 3.122
College GPA' = (0.675)(2.0) + 1.097 = 2.447
3.122 − 2.447 = 0.675 (the slope: the change in Y for each 1-point change in X)
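As a quick check, a minimal R sketch of these predictions (the function name predict_gpa is ours, for illustration):

    # Predictions from the fitted equation: College GPA' = 0.675(HS GPA) + 1.097
    predict_gpa <- function(hs_gpa) 0.675 * hs_gpa + 1.097
    predict_gpa(3.0)                      # 3.122
    predict_gpa(2.0)                      # 2.447
    predict_gpa(3.0) - predict_gpa(2.0)   # 0.675, the slope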

How do we calculate a regression equation?
The least-squares method:
- The sum of the vertical distances between each point and the line = 0.
- The sum of the squared vertical distances is as small as possible.
This gives a line that is unbiased, with minimum variability.

Calculating the slope and y-intercept
Slope: b1 = SP / SSx
Y-intercept: b0 = My − (b1 · Mx)
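A minimal R sketch of these two formulas, computing SP and SSx from raw scores (the data are the four subjects from the example that follows):

    x <- c(4, 8, 2, 6)                                # X scores (Peeta, Rue, Katniss, Thresh)
    y <- c(1, 9, 5, 5)                                # Y scores
    SSx <- sum(x^2) - sum(x)^2 / length(x)            # 20
    SP  <- sum(x * y) - sum(x) * sum(y) / length(x)   # 16
    b1  <- SP / SSx                                   # slope = 0.8
    b0  <- mean(y) - b1 * mean(x)                     # intercept = 1.0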

Leaving nothing to chance You’ve worked hard all semester long, but want to make sure you get the best possible grade. Would bribery help? Maybe. Let’s run a regression model to see if there is a relationship between monetary contributions to Jake and Abby’s College Fund and ‘Bonus Points for Special Contributions to Class’*. * Note: this is a humorous (?) example, not an actual invitation to bribery.

Defining the regression line

Subject    X    Y    X²    XY
Peeta      4    1    16     4
Rue        8    9    64    72
Katniss    2    5     4    10
Thresh     6    5    36    30
∑X = 20   ∑Y = 20   ∑X² = 120   ∑XY = 116

SSx = ∑X² − [(∑X)² / n] = 120 − [(20)² / 4] = 120 − (400 / 4) = 120 − 100 = 20
SP = ∑XY − [(∑X)(∑Y) / n] = 116 − [(20)(20) / 4] = 116 − (400 / 4) = 116 − 100 = 16

Defining the regression line
From the table above: SSx = 20, SP = 16
b1 = SP / SSx = 16 / 20 = 0.8
b0 = My − (b1 · Mx) = 5 − (0.8)(5) = 1.0
So the regression line is Ŷ = 0.8X + 1
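As a check, R's lm() recovers the same coefficients; a quick sketch:

    x <- c(4, 8, 2, 6)   # Peeta, Rue, Katniss, Thresh
    y <- c(1, 9, 5, 5)
    coef(lm(y ~ x))      # (Intercept) 1.0, x 0.8  ->  Y-hat = 0.8X + 1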

[Scatterplot: the four points (Peeta, Rue, Katniss, Thresh) with the best-fit line Ŷ = 0.8x + 1]

Least Squares Method
- The sum of the vertical distances between each point and the line = 0.
- The sum of the squared vertical distances is as small as possible.

Measuring Distance from the Regression Line (Error)
[Scatterplot: the vertical distance of each point (Peeta, Rue, Katniss, Thresh) from the line Ŷ = 0.8x + 1]

Measuring Error

Subject    x    y    Ŷ = 0.8x + 1    Distance (Y − Ŷ)    Squared Distance (Y − Ŷ)²
Peeta      4    1        4.2              −3.2                 10.24
Rue        8    9        7.4               1.6                  2.56
Katniss    2    5        2.6               2.4                  5.76
Thresh     6    5        5.8              −0.8                  0.64
                                      ∑(Y − Ŷ) = 0        ∑(Y − Ŷ)² = 19.20

Standard Error of the Estimate
Standard error of the estimate: a measure of the standard distance between the predicted Y values and the actual Y values:
standard error of the estimate = √[∑(Y − Ŷ)² / (n − 2)] = √[SSresidual / (n − 2)]

With the distances written out: 1 − 4.2 = −3.2, 9 − 7.4 = 1.6, 5 − 2.6 = 2.4, 5 − 5.8 = −0.8; so ∑(Y − Ŷ) = 0 and ∑(Y − Ŷ)² = 19.20.
Standard error of the estimate = √(19.20 / (4 − 2)) = √9.60 ≈ 3.1
“On average, predicted scores differed from actual scores by 3.1 points.”
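The same number falls out of the residuals of an lm() fit; a sketch with the data above:

    x <- c(4, 8, 2, 6)
    y <- c(1, 9, 5, 5)
    fit <- lm(y ~ x)
    sqrt(sum(resid(fit)^2) / (length(y) - 2))   # 3.098, i.e. ~3.1
    summary(fit)$sigma                          # same quantity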

Standard Error and Correlation
[Three scatterplots: a perfect correlation gives STE = 0; a stronger correlation gives a small STE; a weaker correlation gives a large STE.]

Let it snow (3x) I’m not a giant fan of winter, but I do kind of like sledding. I’m trying to convince a group of y’all to go sledding with me, but no one is interested. So, I buy a bunch of beer to try to convince you that sledding down Memorial Hill is a good idea.* I let you guys consume what you consider to be an appropriate amount of beer for a Tuesday night and then ask you to rate your interest in sledding on a 10-point scale.

Find the regression equation

x (# drinks)   y (sledding)   x²    y²    xy
     5             10         25   100    50
     3              6          9    36    18
     6              7         36    49    42
     4              3         16     9    12
     2              4          4    16     8
∑x = ___   ∑y = ___   ∑x² = ___   ∑y² = ___   ∑xy = ___

1. Find the regression equation.
2. What is the predicted interest in sledding for someone who has consumed 2 drinks?
3. Calculate the standard error of the estimate.

From the table: ∑x = 20, ∑y = 30, ∑x² = 90, ∑y² = 210, ∑xy = 130
SSx = ∑x² − [(∑x)² / n]
SP = ∑xy − [(∑x)(∑y) / n]
b1 = SP / SSx
b0 = My − (b1 · Mx)

Using regression equations for prediction
Ŷ = 1.0(X) + 2
What do we expect Y to be if someone has 4 drinks?
Ŷ = 1.0(X) + 2 = 1.0(4) + 2 = 6
BUT remember…
- The predicted value is not perfect (unless the correlation between x and y = −1 or +1).
- The equation cannot be used to make predictions about x values outside the range of observed x values (e.g., Y when x is 20).
- DO NOT confuse prediction with causation!
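A sketch of the same prediction via predict(); note that R will happily extrapolate, so staying inside the observed range of x is on us:

    x <- c(5, 3, 6, 4, 2)    # drinks
    y <- c(10, 6, 7, 3, 4)   # interest in sledding
    fit <- lm(y ~ x)         # Y-hat = 1.0x + 2
    predict(fit, newdata = data.frame(x = 4))   # 6
    range(x)   # predictions are only trustworthy for x between 2 and 6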

Partitioning Variability in Y (preview: this will be important for hypothesis testing in regression)
The total variability in Y (SSY) splits into:
- SSregression: variability in Y explained by X
- SSresidual: variability in Y NOT explained by X

Partitioning Variability in Y
The squared correlation (r²) tells us the proportion of variability in Y explained by X.
e.g., r² = .65 means X explains 65% of the variability in Y.
SSregression = (r²)SSY = (.65)SSY

Partitioning Variability in Y
The unpredicted variability will therefore be 1 − r² (the variability not explained by X).
e.g., if r² = .65, then 35% of the variability in Y is not explained by X (1 − .65 = .35).
SSresidual = (1 − r²)SSY = (.35)SSY
Alternative calculation: SSresidual = ∑(Y − Ŷ)²
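A sketch of this partition in R with the sledding data; both routes to SSresidual agree (R keeps r² at full precision, 1/3, where the slides round to .33):

    x <- c(5, 3, 6, 4, 2)
    y <- c(10, 6, 7, 3, 4)
    r2  <- cor(x, y)^2            # 0.333...
    SSY <- sum((y - mean(y))^2)   # 30
    r2 * SSY                      # SSregression = 10  (slides: 9.9)
    (1 - r2) * SSY                # SSresidual  = 20  (slides: 20.1)
    sum(resid(lm(y ~ x))^2)       # also 20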

From the table: ∑x = 20, ∑y = 30, ∑x² = 90, ∑y² = 210, ∑xy = 130, ∑(Y − Ŷ)² = 20
SSx = 10, SP = 10, SSy = 30
r = SP / √(SSx · SSy) = 10 / √300 ≈ .58, so r² = .58² = .33

SSregression = (r²)SSY = (.33)(30) = 9.9
SSresidual = (1 − r²)SSY = (1 − .33)(30) = (.67)(30) = 20.1

Hypothesis Testing for Regression
If there is no real association between x and y in the population, we expect the slope to equal zero.
If the slope for the regression line from our sample does not equal zero, we must decide between two alternatives:
1. The nonzero slope is due to random chance error.
2. The nonzero slope reflects a systematic, real relationship between x and y that exists in the population.

Null and Alternative for Regression
H0: the equation does not account for a significant proportion of the variance in the Y scores (H0: β1 = 0)
Ha: the equation does account for a significant proportion of the variance in the Y scores (Ha: β1 ≠ 0)
The process of testing the regression line is called analysis of regression. It is very similar to ANOVA: an F ratio is used to determine whether the variance predicted is significantly greater than would be expected by chance.

Calculating the F ratio
F = MSregression / MSresidual
MSregression = SSregression / dfregression, where dfregression = 1 (for simple regression)
MSresidual = SSresidual / dfresidual, where dfresidual = n − 2

Steps for Hypothesis Testing in Simple Regression
Step 1: Specify the null and alternative hypotheses: H0: β1 = 0; Ha: β1 ≠ 0.
Step 2: Designate the rejection region by selecting α.
Step 3: Obtain an F critical value, df (1, n − 2).
Step 4: Collect your data.
Step 5: Use your sample data to calculate (a) the regression equation, (b) r², (c) the SS values.
Step 6: Calculate the F ratio.
Step 7: Compare Fobs with Fcrit.
Step 8: Interpret the results.

Full Sledding Example
(∑x = 20, ∑y = 30, ∑x² = 90, ∑y² = 210, ∑xy = 130, ∑(Y − Ŷ)² = 20)
Step 1: Specify the null and alternative hypotheses: H0: β1 = 0; Ha: β1 ≠ 0.
Step 2: Designate the rejection region by selecting α = .05.
Step 3: Obtain an F critical value: df (1, n − 2) = (1, 3), Fcrit = 10.13.

Full Sledding Example
Step 5 (a): Calculate the regression equation. Already done! Ŷ = 1.0X + 2

Full Sledding Example
Step 5 (b): Calculate r². Already done! r² = .58² = .33

Full Sledding Example
Step 5 (c): Calculate the SS values. Already done!
SSregression = (r²)SSY = 9.9
SSresidual = (1 − r²)SSY = 20.1

Full Sledding Example
Step 6: Calculate F.
MSregression = SSregression / dfregression = 9.9 / 1 = 9.9
MSresidual = SSresidual / dfresidual = 20.1 / (5 − 2) = 6.7

Full Sledding Example
Step 6 (continued): F = MSregression / MSresidual = 9.9 / 6.7 = 1.48

Full Sledding Example
Step 7: Compare Fobs with Fcrit: Fobs = 1.48, Fcrit = 10.13. Decision: fail to reject the null.
Step 8: Interpretation: the number of drinks does not account for a significant portion of the variance in interest in sledding.

Summary Table

Source        SS     df    MS     F
Regression     9.9    1    9.9    1.48
Residual      20.1    3    6.7
Total         30.0    4

Number of drinks did not explain a significant proportion of the variance in willingness to go sledding with me, F(1, 3) = 1.48, p > .05, r² = .33. Money down the toilet!!
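For comparison, a sketch of the same analysis in R. anova() reproduces the table; because R keeps r² at full precision, it reports F = 1.5 where the hand calculation's rounding gives 1.48:

    x <- c(5, 3, 6, 4, 2)    # drinks
    y <- c(10, 6, 7, 3, 4)   # interest in sledding
    fit <- lm(y ~ x)
    anova(fit)   # SSregression = 10, SSresidual = 20, F = 1.5, p ~ .31 -> not significant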

Regression in RC
Does weekend bedtime predict hours of TV viewing?
Predictor (IV): weekend bedtime
Response (DV): hours of TV
Step 1: Specify the null and alternative hypotheses: H0: β1 = 0; Ha: β1 ≠ 0.
Step 2: Designate the rejection region by selecting α = .05.
Step 3: We will judge significance using the p-value method!

Regression in RC: WE bed & TV

Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)   -5.0234      2.1189   -2.371   0.02487 *
WeekEnd        0.4291      0.1523    2.818   0.00877 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05

Residual standard error: 0.7889 on 28 degrees of freedom
Multiple R-squared: 0.2209, Adjusted R-squared: 0.1931
F-statistic: 7.94 on 1 and 28 DF, p-value: 0.00877
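This output is the form produced by R's summary() of a fitted lm object; a sketch of the call, assuming a data frame with (hypothetical) name tv_data and columns TV and WeekEnd:

    # Fit hours of TV viewing on weekend bedtime (tv_data is a placeholder name)
    model <- lm(TV ~ WeekEnd, data = tv_data)
    summary(model)   # coefficient table, R-squared, F-statistic as shown above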


Writing up results of regression
Weekend bedtime was a significant predictor of television viewing, F(1, 28) = 7.94, p = .01, r² = .2209. The r² indicates that weekend bedtime explains 22.09% of the variance in TV viewing. The slope of the regression equation was significant (b1 = 0.4291, p = .01) and indicated that people who went to sleep later tended to watch more TV (shame!). The regression equation relating weekend bedtime and TV viewing was: TV = −5.0234 + 0.4291(WE).

Standardized Slopes
Standardized regression coefficient (β, "beta"): a standardized slope representing the change of X and Y in standard deviation units.
Example: Ŷ = βx + a, e.g., Ŷ = .5x + 10
Interpretation of β: for each one-SD increase in X, there will be a corresponding .5-SD increase in Y.
Why would we want a standardized coefficient?
- It gives us a measure of effect size.
- If we have more than one predictor, we can compare their relative effects on Y.

How do we calculate Beta?
By hand:
1. Transform the Y scores into z-scores.
2. Transform the X scores into z-scores.
3. Compute the slope using the transformed scores (the change will then be in standard deviation units).
(I won’t make you do this by hand!)
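No hand calculation needed in R; a sketch with the sledding data, plus the shortcut β = b1 · (SDx / SDy), which in simple regression also equals r:

    x <- c(5, 3, 6, 4, 2)
    y <- c(10, 6, 7, 3, 4)
    coef(lm(scale(y) ~ scale(x)))[2]     # standardized slope (beta) = 0.577
    coef(lm(y ~ x))[2] * sd(x) / sd(y)   # same value
    cor(x, y)                            # in simple regression, beta = r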

Real data examples

- What is R²?
- What is the standardized slope?
- What does the standardized slope suggest about change in X vs. Y?

- Is self-esteem more strongly related to positivity or to negativity?
- Calculate and interpret R² for the association between self-esteem and positivity.