Review. Review Statistics Needed Need to find the best place to draw the regression line on a scatter plot Need to quantify the cluster.

Review

Statistics Needed
Need to find the best place to draw the regression line on a scatter plot
Need to quantify the cluster of scores around this regression line (i.e., the correlation coefficient)

Computational formula

Correlation

Hypothesis testing of r
Is there a significant relationship between X and Y (or are they independent)?
Are two independent correlations significantly different from each other?

Regression Equation
Y = a + bX
Where:
Y = value predicted from a particular X value
a = point at which the regression line intersects the Y axis
b = slope of the regression line
X = X value for which you wish to predict a Y value
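
As a rough illustration of the equation above, the sketch below finds a and b by least squares and then predicts Y for a new X. The function names fit_line and predict and the four data points are hypothetical, chosen only so the answer is easy to check by eye.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-products and sum of squared X deviations.
    sp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    b = sp_xy / ss_x       # slope of the regression line
    a = y_bar - b * x_bar  # point where the line crosses the Y axis
    return a, b

def predict(a, b, x):
    """Predicted Y for a particular X value."""
    return a + b * x

# Hypothetical data that lie exactly on Y = 2 + 3X.
xs = [1, 2, 3, 4]
ys = [5, 8, 11, 14]
a, b = fit_line(xs, ys)
y_hat = predict(a, b, 10)
```

With real data the points will not fall exactly on the line, and a and b are the values that make the squared prediction errors as small as possible.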

Regression

How to draw the regression line

Hypothesis Testing
Have learned:
How to calculate r as an estimate of the relationship between two variables
How to calculate b as a measure of the rate of change of Y as a function of X
Next: determine whether these values are significantly different from 0

Testing b
The significance tests for r and b are equivalent
If X and Y are related (r), then it must be true that Y varies with X (b)
Important to learn b significance tests for multiple regression

Calculate t-observed
t = b / Sb
b = Slope
Sb = Standard error of slope
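
The sketch below illustrates the claim that the tests for r and b are equivalent. It assumes the usual textbook definitions, which are not spelled out on the slide: Sb is the standard error of estimate divided by the square root of the sum of squared X deviations, and the t for r is r * sqrt(N - 2) / sqrt(1 - r^2). The function name slope_t and the data are hypothetical.

```python
from math import sqrt
from statistics import mean

def slope_t(xs, ys):
    """Return (t for b, t for r); both have N - 2 degrees of freedom."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    sp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    b = sp_xy / ss_x
    a = y_bar - b * x_bar
    # Residual sum of squares around the regression line.
    ss_resid = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Assumed definition: Sb = standard error of estimate / sqrt(SS_x).
    s_b = sqrt(ss_resid / (n - 2)) / sqrt(ss_x)
    t_b = b / s_b

    r = sp_xy / sqrt(ss_x * ss_y)
    t_r = r * sqrt(n - 2) / sqrt(1 - r ** 2)
    return t_b, t_r

# Hypothetical scores for six people.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 4, 3, 7, 5]
t_b, t_r = slope_t(xs, ys)
```

Under these definitions the two t values agree up to rounding error, so testing the slope and testing the correlation lead to the same decision when there is one predictor.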

Multiple Regression
Good news! No math: too complicated to do by hand
Bad news! Almost all conceptual

Causal Models
X (IV) is the cause of Y (DV): X -> Y
This is an assumption; causation is not demonstrated with statistics!

Remember
            Candy   Depression
Charlie     5       55
Augustus    7       43
Veruca      4       59
Mike                108
Violet              65

Remember
Y = 127 - 13.26(X)
COV = -30.5
N = 5
r = -.81
Sx = 1.52
Sy = 24.82
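
The summary values above can be rechecked from the raw scores. One caution: the candy scores for Mike and Violet did not survive in this transcript, so the sketch ASSUMES Mike = 3 and Violet = 4, the values that reproduce the reported COV, Sx and r. Treat those two numbers as an assumption, not as the original data.

```python
from statistics import mean, stdev

# Candy scores for Charlie, Augustus, Veruca, Mike, Violet.
# ASSUMPTION: the last two values (3 and 4) are missing from the
# transcript and were chosen because they match the reported statistics.
candy = [5, 7, 4, 3, 4]
depression = [55, 43, 59, 108, 65]

n = len(candy)
x_bar, y_bar = mean(candy), mean(depression)

# Sample covariance (divides by N - 1).
cov = sum((x - x_bar) * (y - y_bar)
          for x, y in zip(candy, depression)) / (n - 1)
s_x, s_y = stdev(candy), stdev(depression)

r = cov / (s_x * s_y)   # correlation coefficient
b = cov / s_x ** 2      # slope
a = y_bar - b * x_bar   # intercept
```

Under that assumption, the results round to the figures listed on the slide.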

Causal Models
Candy -> Depression (b = -13.26)

Example
Data collected from 15 people: Salary, Years since Ph.D., Publications

Example Predict the salary of a person from the time since their Ph.D. (in years)

Example
Predict the salary of a person from the time since their Ph.D. (in years)
Y = 51,670 + 1,218(X)
What do these mean?
$51,670 is what a person tends to earn right after graduating (Years = 0)
Each year after that, a person's salary increases by $1,218
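
A minimal sketch of how the two numbers on this slide are used, taking $51,670 as the intercept and $1,218 as the slope. The function name predict_salary is hypothetical.

```python
def predict_salary(years_since_phd):
    """Predicted salary in dollars from years since the Ph.D."""
    intercept = 51_670  # predicted salary when Years = 0
    slope = 1_218       # predicted raise for each additional year
    return intercept + slope * years_since_phd

at_graduation = predict_salary(0)
ten_years_out = predict_salary(10)
```

Ten years out, the prediction is the starting figure plus ten yearly increments of $1,218.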

Causal Models
Years since Ph.D. -> Salary (b = 1,218)

Example Predict the salary of a person from the number of publications they have

Causal Models
Publications -> Salary (b = 334)

What if we have two IVs? It is possible to use two IVs at the same time to predict a DV Use both publications and years since Ph.D. to predict salary

Causal Models
Publications -> Salary (b = 334)
Years since Ph.D. -> Salary (b = 1,218)
How to interpret values if IVs are independent
Problem: Information provided by publications and years is probably somewhat redundant

Causal Models
Publications -> Salary
Years since Ph.D. -> Salary
Publications and Years since Ph.D. are correlated (r = .66)
Must estimate these regression coefficients so this relationship is taken into account (called “partial regression coefficients”)

Regression Coefficients
Basic logic is exactly the same as normal regression
Least squares
Has one intercept, and each of the IVs has one slope

b0 = the intercept
b1 = the slope of the first IV
b2 = the slope of the second IV
bp = the slope of the pth IV
Y = b0 + b1(X1) + b2(X2) + ... + bp(Xp)
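
For the two-IV case, the least-squares intercept and partial slopes can be written in closed form from sums of squares and cross-products. The sketch below is one way to do that in plain Python; the function name fit_two_ivs and the data are hypothetical, and the data were built to lie exactly on Y = 10 + 2(X1) + 3(X2) so the result is easy to verify.

```python
from statistics import mean

def fit_two_ivs(x1, x2, y):
    """Least-squares b0, b1, b2 for Y = b0 + b1(X1) + b2(X2)."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]

    # Sums of squares and cross-products of the deviation scores.
    s11 = sum(u * u for u in d1)
    s22 = sum(u * u for u in d2)
    s12 = sum(u * v for u, v in zip(d1, d2))
    s1y = sum(u * v for u, v in zip(d1, dy))
    s2y = sum(u * v for u, v in zip(d2, dy))

    # The s12 terms adjust each slope for the overlap between the IVs.
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2

# Hypothetical data generated from Y = 10 + 2(X1) + 3(X2).
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [10 + 2 * u + 3 * v for u, v in zip(x1, x2)]
b0, b1, b2 = fit_two_ivs(x1, x2, y)
```

With more than two IVs the same idea is normally handled by statistical software, which solves the equivalent matrix equations.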

Example Predict the salary of a person from the number of publications they have and the years since they got their Ph.D.

Current Problem
Y = Salary
X1 = Years since Ph.D.; X2 = Publications

Causal Models
Publications -> Salary (b = 122)
Years since Ph.D. -> Salary (b = 977)
Publications and Years since Ph.D. are correlated (r = .66)

What does a person who just graduated (years = 0) with 2 publications likely earn?
What does a person who graduated 10 years ago with no publications make?

Question
Current Problem
Y = Salary
X1 = Years since Ph.D.; X2 = Publications
Which IV has a greater “effect” on salary?

Standardized Regression Coefficients
Conceptually the same as standardizing all variables and then doing regression analysis
Why does this work?
Example with Years predicting Salary

With a single predictor (unstandardized): Years since Ph.D. -> Salary (b = 1,218)
With a single predictor (standardized): Years since Ph.D. -> Salary (β = .71)

With a single IV
The correlation between years and salary (r = .71) is the SAME as the standardized regression weight!
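
The point that r equals the standardized weight when there is one IV can be checked numerically. The sketch below uses hypothetical years and salary values (not the 15-person data set, which is not reproduced in this transcript) and hypothetical helper names.

```python
from statistics import mean, stdev

def z_scores(values):
    """Convert raw scores to z scores using the sample SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sp = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sp / sum((x - x_bar) ** 2 for x in xs)

def pearson_r(xs, ys):
    """Correlation as the average cross-product of z scores."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(u * v for u, v in zip(zx, zy)) / (len(xs) - 1)

# Hypothetical data: years since Ph.D. and salary in $1,000s.
years = [1, 3, 4, 6, 8, 10]
salary = [50, 56, 53, 60, 58, 66]

b_raw = slope(years, salary)                      # unstandardized slope
b_std = slope(z_scores(years), z_scores(salary))  # standardized slope
r = pearson_r(years, salary)
```

The standardized slope matches r, and the same value results from rescaling the raw slope by Sx / Sy.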

β1 = Standardized regression weight of the first IV
β2 = Standardized regression weight of the second IV
βp = Standardized regression weight of the pth IV
β0 = Intercept (always = 0)

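Remember (unstandardized)
Publications -> Salary (b = 122)
Years since Ph.D. -> Salary (b = 977)
Publications and Years since Ph.D. are correlated (r = .66)

Standardized
Publications -> Salary (β = .21)
Years since Ph.D. -> Salary (β = .57)
Publications and Years since Ph.D. are correlated (r = .66)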
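
With two IVs, the standardized weights can be sketched from the three pairwise correlations using the standard two-predictor formula. Two of the inputs come from the deck (r = .71 for years and salary, r = .66 between the IVs). The third, r = .59 for publications and salary, is an assumption: it is roughly the square root of the r2 = .35 quoted elsewhere in the deck. Because the inputs are rounded, the results only approximate the slide values.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Standardized weights for two IVs from pairwise correlations."""
    denom = 1 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    return beta1, beta2

r_salary_years = 0.71  # from the deck
r_salary_pubs = 0.59   # ASSUMED: about sqrt(.35), rounded
r_years_pubs = 0.66    # from the deck

beta_years, beta_pubs = standardized_betas(
    r_salary_years, r_salary_pubs, r_years_pubs
)

# R squared for two IVs: each weight times its correlation with Y.
r_squared = beta_years * r_salary_years + beta_pubs * r_salary_pubs
```

The weights come out close to the .57 and .21 shown in the diagram, noticeably smaller than the simple correlations, because each IV gets credit only for what it adds beyond the other.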

Current Problem
Yz = Standardized Salary
Z1 = Years since Ph.D. (Standardized)
Z2 = Publications (Standardized)
Which IV has a greater “effect” on salary? Can interpret in SD units
What would you predict the salary to be if a person's Years = 1.2 and a person's Publications = -.50?
Interpret what these values mean!
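
One way to work the prediction question, assuming the standardized weights from the earlier diagram (.57 for years, .21 for publications) and an intercept of 0. The function name predict_z_salary is hypothetical.

```python
BETA_YEARS = 0.57  # standardized weight for years since Ph.D.
BETA_PUBS = 0.21   # standardized weight for publications

def predict_z_salary(z_years, z_pubs):
    """Predicted salary in SD units; the standardized intercept is 0."""
    return BETA_YEARS * z_years + BETA_PUBS * z_pubs

# Years is 1.2 SD above the mean; publications is .50 SD below the mean.
z_salary = predict_z_salary(1.2, -0.50)
```

The result is about .58, so this person's salary is predicted to sit a little more than half a standard deviation above the mean salary.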

Testing the full model
How well does the model predict?
The fit of the full model and its significance test are the same for the standardized and unstandardized models

Person   Z1      Z2      ZY      Pred
1        -1.26   .35     -.83    -.477
2                -.53    -.24    -.287
3        -.63    -1.40   -.84    -1.097
4        .63     1.23    1.56    1.01
5        1.26    .36             .862

r = .902

Multiple R

Testing for Significance
Once an equation is created (standardized or unstandardized), typically test for significance.
Two levels:
1) Level of each regression coefficient
2) Level of the entire model

Multiple R
Commonly used as R2
Pros and Cons
Can be tested for significance
Does the set of variables (taken together) predict Y at better than chance levels?
H1: R* > 0
Ho: R* = 0

Significance testing for Multiple R
p = number of predictors
N = total number of observations

Fcrit
Page # 737
Need two df
Numerator df = p
Denominator df = N - p - 1
F crit (2, 2) = 19.00

Multiple R
If F > Fcrit, reject Ho and accept H1
If F < or = Fcrit, fail to reject Ho
Current problem: fail to reject Ho
These two variables do not predict the outcome

Practice
The teaching salary example
Based on 15 people
Two IVs

p = number of predictors
N = total number of observations

F crit (2,12) = 3.89

Multiple R
If F > Fcrit, reject Ho and accept H1
If F < or = Fcrit, fail to reject Ho
Current problem: accept H1
These two variables do predict the outcome
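
The F formula itself did not survive in this transcript. The sketch below assumes the usual F ratio for multiple R, F = (R2 / p) / ((1 - R2) / (N - p - 1)), which matches the two df given on the Fcrit slides. It is applied to the five-person example (R = .902, N = 5, p = 2) and to the salary example (N = 15, p = 2), taking R2 = .53 as the salary model's value, a figure quoted elsewhere in the deck. The function name f_for_multiple_r is hypothetical.

```python
def f_for_multiple_r(r_squared, n, p):
    """Return (F, numerator df, denominator df) for a multiple R test.

    Assumes F = (R^2 / p) / ((1 - R^2) / (N - p - 1)).
    """
    df_num = p
    df_den = n - p - 1
    f = (r_squared / df_num) / ((1 - r_squared) / df_den)
    return f, df_num, df_den

# Five-person example: compare with F crit (2, 2) = 19.00.
f_small, df1_small, df2_small = f_for_multiple_r(0.902 ** 2, n=5, p=2)

# Salary example: compare with F crit (2, 12) = 3.89.
f_salary, df1_salary, df2_salary = f_for_multiple_r(0.53, n=15, p=2)

reject_small = f_small > 19.00
reject_salary = f_salary > 3.89
```

Under these assumptions the five-person F falls well short of 19.00 while the salary F exceeds 3.89, in line with the two decisions stated on the slides. The contrast also shows how much a very small N limits the test even when R is large.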

Detour
Moving back to issues of correlation
This will help with . . .

Testing for Significance
Once an equation is created (standardized or unstandardized), typically test for significance.
Two levels:
1) Level of each regression coefficient
2) Level of the entire model

How strong is the relationship between publications and salary if we partial out the effect of years?
What this is saying:

[Venn diagram: Salary circle overlapping Publications circle]
r2SP = .35
r2 is a ratio = Variance explained / Total variance
Total variance of Salary = 1 (standardized)
Overlap with Publications = .35; remainder of Salary = .65

[Venn diagram: Salary overlaps Publications and Years. Within Salary, a = overlap with Publications only, b = overlap with Years only, c = overlap shared by both IVs, e = the part neither IV explains]
How strong is the relationship between publications and salary if we partial out the effect of years?
Semipartial correlation of publications and salary
Multiple R2 = a + b + c

Multiple R

R2 = .53, or a + b + c
r2SY = .50, or b + c
r2SP = .35, or a + c
So what is just “a”?
a = (a + b + c) - (b + c), or R2 - r2SY
a = .53 - .50 = .03
Thus the semipartial correlation = .17 (the square root of .03)

What is the correlation between years and salary controlling for publications?
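
The subtraction logic can be sketched for both IVs, using the values quoted in this section (R2 = .53, r2 for years = .50, r2 for publications = .35). The function name semipartial is hypothetical. Taking the positive square root assumes each IV relates positively to salary, as the positive regression weights in this deck suggest.

```python
from math import sqrt

R2_FULL = 0.53   # both IVs together (a + b + c)
R2_YEARS = 0.50  # years alone (b + c)
R2_PUBS = 0.35   # publications alone (a + c)

def semipartial(r2_full, r2_other_iv):
    """Return (unique variance, semipartial r) for one IV.

    The unique variance is what the full model explains beyond
    the other IV on its own.
    """
    unique = r2_full - r2_other_iv
    return unique, sqrt(unique)

# Publications, with years partialled out: region a.
a, sr_pubs = semipartial(R2_FULL, R2_YEARS)

# Years, with publications partialled out: region b.
b, sr_years = semipartial(R2_FULL, R2_PUBS)
```

By the same reasoning used for a, the unique contribution of years works out to .53 - .35 = .18, which gives a semipartial correlation of about .42.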
