Review. Review Statistics Needed Need to find the best place to draw the regression line on a scatter plot Need to quantify the cluster.


1

2 Review

3

4 Statistics Needed
Need to find the best place to draw the regression line on a scatter plot
Need to quantify the cluster of scores around this regression line (i.e., the correlation coefficient)

5 Computational formula

6 Correlation

7 Hypothesis testing of r
Is there a significant relationship between X and Y (or are they independent)?
Are two independent correlations significantly different from each other?

8 Statistics Needed
Need to find the best place to draw the regression line on a scatter plot
Need to quantify the cluster of scores around this regression line (i.e., the correlation coefficient)

9

10 Regression Equation
Y = a + bX
Where:
Y = value predicted from a particular X value
a = point at which the regression line intersects the Y axis
b = slope of the regression line
X = X value for which you wish to predict a Y value

11 Regression

12 How to draw the regression line

13 Hypothesis Testing
Have learned:
How to calculate r as an estimate of the relationship between two variables
How to calculate b as a measure of the rate of change of Y as a function of X
Next: determine if these values are significantly different from 0

14 Testing b
The significance tests for r and b are equivalent
If X and Y are related (r), then it must be true that Y varies with X (b).
Important to learn b significance tests for multiple regression

15 Calculate t-observed
t = b / Sb
b = Slope
Sb = Standard error of slope
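A minimal sketch of the t-observed calculation, assuming the usual test of the slope against 0 with N - 2 degrees of freedom. The data are made up for illustration and the function name `slope_t_test` is hypothetical. It also computes t from r, to show that the two tests give the same value.

```python
import math

def slope_t_test(x, y):
    """Return (b, s_b, t_from_b, t_from_r) for a simple regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    b = sp_xy / ss_x                      # slope
    r = sp_xy / math.sqrt(ss_x * ss_y)    # correlation

    # Standard error of the slope from the residual variance (df = N - 2).
    ss_residual = ss_y - b * sp_xy
    s_residual = math.sqrt(ss_residual / (n - 2))
    s_b = s_residual / math.sqrt(ss_x)

    t_from_b = b / s_b
    t_from_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return b, s_b, t_from_b, t_from_r

# Made-up data, not from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b, s_b, t_from_b, t_from_r = slope_t_test(x, y)
print(b, s_b, t_from_b, t_from_r)
```

The two t values agree, which is the sense in which the tests for r and b are equivalent with a single predictor.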

16

17 Multiple Regression
Good news! No math
Too complicated to do by hand
Bad news! Almost all conceptual

18 Causal Models
X (IV) is the cause of Y (DV)

19 Causal Models
X (IV) is the cause of Y (DV)
This is an assumption – causation is not demonstrated with statistics!
X -> Y

20 Remember
Name      Candy  Depression
Charlie   5      55
Augustus  7      43
Veruca    4      59
Mike             108
Violet           65

21 Remember
Y = 127 - 13.26(X)
COV = -30.5
N = 5
r = -.81
Sx = 1.52
Sy = 24.82
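As a quick check on how these summary values fit together, here is a small sketch that recovers r and the slope from the covariance and the two standard deviations. The variable names are illustrative. Because Sx is rounded to 1.52 on the slide, the slope comes out slightly different from the -13.26 shown.

```python
# Summary statistics copied from the slide.
cov_xy = -30.5   # covariance of candy (X) and depression (Y)
s_x = 1.52       # standard deviation of X (rounded on the slide)
s_y = 24.82      # standard deviation of Y

# Correlation: covariance scaled by both standard deviations.
r = cov_xy / (s_x * s_y)

# Slope: covariance divided by the variance of X.
b = cov_xy / s_x ** 2

print(round(r, 2))
print(round(b, 2))
```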

22 Causal Models
Candy -> Depression (b = -13.26)

23 Example
Data collected from 15 people:
Salary
Years since Ph.D.
Publications

24 Example Predict the salary of a person from the time since their Ph.D. (in years)

25

26 Example
Predict the salary of a person from the time since their Ph.D. (in years)
Y = 51,670 + 1,218(X)
What do these mean?
$51,670 is what a person tends to earn right after graduating (Years = 0)
Each year after that, a person’s salary increases by $1,218
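The interpretation above can be sketched in a few lines, using the $51,670 intercept and $1,218 slope quoted on the slide. The function name `predict_salary` is just for illustration.

```python
INTERCEPT = 51_670   # predicted salary at Years = 0
SLOPE = 1_218        # predicted increase per year since Ph.D.

def predict_salary(years_since_phd):
    """Predicted salary from the single-predictor regression line."""
    return INTERCEPT + SLOPE * years_since_phd

print(predict_salary(0))    # 51670
print(predict_salary(10))   # 63850
```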

27 Causal Models
Years since Ph.D. -> Salary (b = 1,218)

28 Example Predict the salary of a person from the number of publications they have

29

30 Causal Models
Publications -> Salary (b = 334)

31 What if we have two IVs?
It is possible to use two IVs at the same time to predict a DV
Use both publications and years since Ph.D. to predict salary

32 Causal Models
Publications -> Salary (b = 334)
Years since Ph.D. -> Salary (b = 1,218)

33 Causal Models
Publications -> Salary (b = 334)
Years since Ph.D. -> Salary (b = 1,218)
How to interpret values if IVs are independent

34 Causal Models
Publications -> Salary (b = 334)
Years since Ph.D. -> Salary (b = 1,218)
Problem: Information provided by publications and years is probably somewhat redundant

35

36 Causal Models
Publications -> Salary
Years since Ph.D. -> Salary
Publications and Years since Ph.D.: r = .66

37 Causal Models
Publications -> Salary
Years since Ph.D. -> Salary
Publications and Years since Ph.D.: r = .66
Must estimate these regression coefficients so this relationship is taken into account (called “partial regression coefficients”)

38 Regression Coefficients
Basic logic is exactly the same as normal regression
Least squares
Has one intercept, and each of the IVs has one slope

39 Regression Coefficients
b0 = the intercept
b1 = the slope of the first IV
b2 = the slope of the second IV
bp = the slope of the pth IV
Y = b0 + b1(X1) + b2(X2) + ... + bp(Xp)
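To make the least-squares idea concrete, here is a rough sketch that solves for b0, b1 and b2 with two IVs using only the standard library. The data are made up (generated from Y = 10 + 2(X1) + 3(X2)) and the function name `fit_two_predictors` is hypothetical; real analyses would normally use a statistics package.

```python
def fit_two_predictors(x1, x2, y):
    """Least-squares estimates (b0, b1, b2) for Y = b0 + b1*X1 + b2*X2."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Sums of squares and cross-products around the means.
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    # Solve the two normal equations; det is 0 if the IVs are perfectly correlated.
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2

# Made-up data that lie exactly on Y = 10 + 2*X1 + 3*X2.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [18, 17, 28, 27, 38]
b0, b1, b2 = fit_two_predictors(x1, x2, y)
print(b0, b1, b2)
```

The s12 term is what lets each slope account for the overlap between the two IVs.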

40 Example Predict the salary of a person from the number of publications they have and the years since they got their Ph.D.

41

42 Regression Coefficients
Current Problem
Y = Salary
X1 = Years since Ph.D.; X2 = Publications

43 Causal Models
Publications -> Salary (b = 122)
Years since Ph.D. -> Salary (b = 977)
Publications and Years since Ph.D.: r = .66

44 Regression Coefficients
Current Problem
Y = Salary
X1 = Years since Ph.D.; X2 = Publications
What does a person who just graduated (years = 0) with 2 publications likely earn?

45 Regression Coefficients
Current Problem
Y = Salary
X1 = Years since Ph.D.; X2 = Publications
What does a person who just graduated (years = 0) with 2 publications likely earn?

46 Regression Coefficients
Current Problem
Y = Salary
X1 = Years since Ph.D.; X2 = Publications
What does a person who graduated 10 years ago with no publications make?

47 Regression Coefficients
Current Problem
Y = Salary
X1 = Years since Ph.D.; X2 = Publications
What does a person who graduated 10 years ago with no publications make?

48 Question
Current Problem
Y = Salary
X1 = Years since Ph.D.; X2 = Publications
Which IV has a greater “effect” on salary?

49

50

51 Standardized Regression Coefficients
Conceptually the same as standardizing all variables and then doing regression analysis
Why does this work?
Example with Years predicting Salary

52 Standardized Regression Coefficients
With a single predictor, unstandardized: Years since Ph.D. -> Salary (b = 1,218)
With a single predictor, standardized: Years since Ph.D. -> Salary (β = .71)

53

54 Standardized Regression Coefficients
With a single IV, the correlation between years and salary (r = .71) is the SAME as the standardized regression weight!
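A short sketch of why this holds: with one predictor, multiplying the unstandardized slope by Sx / Sy gives the standardized weight, which equals r. The salary standard deviations are not in the transcript, so this illustration reuses the candy and depression values from earlier in the deck (b = -13.26, Sx = 1.52, Sy = 24.82, r = -.81).

```python
# Values from the earlier candy/depression example in this deck.
b = -13.26    # unstandardized slope
s_x = 1.52    # standard deviation of X
s_y = 24.82   # standard deviation of Y

# Standardized slope: rescale b from raw units to SD units.
beta = b * s_x / s_y

print(round(beta, 2))   # -0.81, the same as r for that example
```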

55 Standardized Regression Coefficients
β1 = Standardized regression coefficient of the first IV
β2 = Standardized regression coefficient of the second IV
βp = Standardized regression coefficient of the pth IV
β0 = Intercept (always = 0)

56 Remember
Publications -> Salary (b = 122)
Years since Ph.D. -> Salary (b = 977)
Publications and Years since Ph.D.: r = .66

57

58 Standardized
Publications -> Salary (β = .21)
Years since Ph.D. -> Salary (β = .57)
Publications and Years since Ph.D.: r = .66
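With two IVs, the standardized weights can be recovered from the three pairwise correlations using the standard two-predictor formula. The sketch below assumes r = .71 for years and salary, r = .66 between the IVs, and takes the publications and salary correlation as the square root of the r² = .35 reported later in the deck. Because those inputs are rounded, the results only approximately match the slide.

```python
import math

r_years_salary = 0.71
r_years_pubs = 0.66
# Assumption: derived from r squared (salary, publications) = .35 given later in the deck.
r_pubs_salary = math.sqrt(0.35)

# Each weight removes the part of the correlation shared with the other IV.
denom = 1 - r_years_pubs ** 2
beta_years = (r_years_salary - r_pubs_salary * r_years_pubs) / denom
beta_pubs = (r_pubs_salary - r_years_salary * r_years_pubs) / denom

# Squared multiple correlation implied by these weights.
r_squared = beta_years * r_years_salary + beta_pubs * r_pubs_salary

print(round(beta_years, 3), round(beta_pubs, 3), round(r_squared, 3))
```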

59 Regression Coefficients
Current Problem
Yz = Standardized Salary
Z1 = Years since Ph.D. (Standardized)
Z2 = Publications (Standardized)
Which IV has a greater “effect” on salary?
Can interpret in SD units

60 Regression Coefficients
Current Problem
Yz = Standardized Salary
Z1 = Years since Ph.D. (Standardized)
Z2 = Publications (Standardized)
What would you predict the salary to be if a person’s Years = 1.2 and a person’s Publications = -.50?
Interpret what these values mean!
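A sketch of this prediction, assuming the standardized weights shown in the earlier diagram (.57 for years, .21 for publications) and an intercept of 0. The function name is illustrative.

```python
BETA_YEARS = 0.57   # standardized weight for Years since Ph.D.
BETA_PUBS = 0.21    # standardized weight for Publications

def predict_standardized_salary(z_years, z_pubs):
    """Predicted salary in SD units; the standardized intercept is 0."""
    return BETA_YEARS * z_years + BETA_PUBS * z_pubs

z_salary = predict_standardized_salary(1.2, -0.50)
print(round(z_salary, 3))   # 0.579
```

Under these assumptions, someone 1.2 SD above the mean on years and half an SD below the mean on publications is predicted to earn about .58 SD above the mean salary.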

61 Testing the full model How well does the model predict?
The fit test for the full model and its significance are equal for both standardized and unstandardized models

62
Person Z1 Z2 ZY
1 -1.26 .35 -.83
2 -.53 -.24
3 -.63 -1.40 -.84
4 .63 1.23 1.56
5 1.26 .36

63
Person Z1 Z2 ZY Pred
1 -1.26 .35 -.83 -.477
2 -.53 -.24 -.287
3 -.63 -1.40 -.84 -1.097
4 .63 1.23 1.56 1.01
5 1.26 .36 .862

64
Person Z1 Z2 ZY Pred
1 -1.26 .35 -.83 -.477
2 -.53 -.24 -.287
3 -.63 -1.40 -.84 -1.097
4 .63 1.23 1.56 1.01
5 1.26 .36 .862
r = .902

65 Multiple R

66 Multiple R

67 Testing for Significance
Once an equation is created (standardized or unstandardized), typically test for significance.
Two levels:
1) Level of each regression coefficient
2) Level of the entire model

68 Testing for Significance
Once an equation is created (standardized or unstandardized), typically test for significance.
Two levels:
1) Level of each regression coefficient
2) Level of the entire model

69 Multiple R
Commonly used as R²
Can be tested for significance
Pros and Cons
Does the set of variables (taken together) predict Y at better than chance levels?
H1 : R* > 0
H0 : R* = 0

70
Person Z1 Z2 ZY Pred
1 -1.26 .35 -.83 -.477
2 -.53 -.24 -.287
3 -.63 -1.40 -.84 -1.097
4 .63 1.23 1.56 1.01
5 1.26 .36 .862
r = .902

71 Significance testing for Multiple R
p = number of predictors N = total number of observations

72 Significance testing for Multiple R
p = number of predictors N = total number of observations

73 Significance testing for Multiple R
p = number of predictors N = total number of observations

74 Significance testing for Multiple R
Fcrit: Page # 737
Need two df
Numerator df = p
Denominator df = N – p - 1

75 Significance testing for Multiple R
Fcrit
Need two df
Numerator df = p
Denominator df = N – p - 1
F crit (2, 2) = 19.00

76 Multiple R
If F > Fcrit, reject H0 and accept H1
If F < or = Fcrit, fail to reject H0
Current problem – fail to reject H0
These two variables do not predict the outcome
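The F formula itself did not survive in this transcript, so the sketch below assumes the standard F test for a squared multiple correlation, F = (R² / p) / ((1 - R²) / (N - p - 1)), with the degrees of freedom given above. It uses R = .902, N = 5 and p = 2 from the five-person example; the function name is hypothetical.

```python
def f_for_multiple_r(r, n, p):
    """F statistic for testing whether multiple R differs from 0.

    r: multiple correlation, n: number of observations, p: number of predictors.
    Numerator df = p, denominator df = n - p - 1.
    """
    r_squared = r ** 2
    return (r_squared / p) / ((1 - r_squared) / (n - p - 1))

f_observed = f_for_multiple_r(r=0.902, n=5, p=2)
f_critical = 19.00   # F crit (2, 2) from the slide

reject_null = f_observed > f_critical
print(round(f_observed, 2), reject_null)
```

With only five people, even a large R does not reach the critical value, which matches the decision on the slide.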

77

78 Practice
The teaching salary example
Based on 15 people
Two IVs

79 Significance testing for Multiple R
p = number of predictors N = total number of observations

80 Significance testing for Multiple R
F crit (2,12) = 3.89

81 Multiple R
If F > Fcrit, reject H0 and accept H1
If F < or = Fcrit, fail to reject H0
Current problem – accept H1
These two variables do predict the outcome

82

83 Detour Moving back to issues of correlation This will help with . . .

84 Testing for Significance
Once an equation is created (standardized or unstandardized), typically test for significance.
Two levels:
1) Level of each regression coefficient
2) Level of the entire model

85

86 How strong is the relationship between publications and salary if we partial out the effect of years? What this is saying

87 Salary

88 Salary Publications

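89 r²SP = .35
(Venn diagram: Salary overlapping Publications)

90 r²SP = .35
(Venn diagram: Salary overlapping Publications)
r² is a ratio = Variance explained / Total variance
Total variance of Salary = 1 (standardized)

91 r²SP = .35; remaining variance of Salary = .65
(Venn diagram: Salary overlapping Publications)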

92
(Venn diagram: Salary, Publications, Years; regions a, b, c, e)

93 ?
(Venn diagram: Salary, Publications, Years; regions a, b, c, e)

94 ?
(Venn diagram: Salary, Publications, Years; regions a, b, c, e)
How strong is the relationship between publications and salary if we partial out the effect of years?

95 Semipartial correlation of publications and salary

96 Semipartial correlation of publications and salary
Multiple R² = a + c + b

97 Multiple R

98
R² = .53 or a + b + c
(Venn diagram: Salary, Publications, Years; regions a, b, c, e)

99
R² = .53 or a + b + c
r²SY = .50 or b + c
r²SP = .35 or a + c
(Venn diagram: Salary, Publications, Years; regions a, b, c, e)

100
R² = .53 or a + b + c
r²SY = .50 or b + c
r²SP = .35 or a + c
So what is just “a”?
(Venn diagram: Salary, Publications, Years; regions a, b, c, e)

101
R² = .53 or a + b + c
r²SY = .50 or b + c
r²SP = .35 or a + c
So what is just “a”?
(Venn diagram: Salary, Publications, Years; regions a, b, c, e)
a = (a + b + c) – (b + c)

102
R² = .53 or a + b + c
r²SY = .50 or b + c
r²SP = .35 or a + c
So what is just “a”?
(Venn diagram: Salary, Publications, Years; regions a, b, c, e)
a = (a + b + c) – (b + c) or R² – r²SY

103
R² = .53 or a + b + c
r²SY = .50 or b + c
r²SP = .35 or a + c
So what is just “a”?
(Venn diagram: Salary, Publications, Years; regions a, b, c, e)
a = R² – r²SY = .53 – .50 = .03
Thus semipartial correlation = square root of .03 = .17

104
R² = .53 or a + b + c
r²SY = .50 or b + c
r²SP = .35 or a + c
(Venn diagram: Salary, Publications, Years; regions a, b, c, e)
What is the correlation between years and salary controlling for publications?
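The subtraction logic can be sketched as follows, using the three squared correlations from the slides. It assumes the question about years is asking for the semipartial correlation, computed the same way as for publications; the function name `semipartial` is illustrative.

```python
import math

R2_FULL = 0.53           # a + b + c: both IVs together
R2_SALARY_YEARS = 0.50   # b + c: years alone
R2_SALARY_PUBS = 0.35    # a + c: publications alone

def semipartial(r2_full, r2_other_iv):
    """Square root of the variance an IV explains beyond the other IV."""
    return math.sqrt(r2_full - r2_other_iv)

# Publications with years removed: region "a".
sr_pubs = semipartial(R2_FULL, R2_SALARY_YEARS)
# Years with publications removed: region "b".
sr_years = semipartial(R2_FULL, R2_SALARY_PUBS)

print(round(sr_pubs, 2))    # 0.17
print(round(sr_years, 2))   # 0.42
```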

105

