Alternatively, dependent variable and independent variable. Alternatively, endogenous variable and exogenous variable.

2 Alternatively, dependent variable and independent variable. Alternatively, endogenous variable and exogenous variable.

3 Association versus causation

6 Scatterplots

7 [Scatterplot: x-axis "Weeks since beginning of semester"; y-axis "Percentage of computers used in computer labs free"]

9 Stata Exercise 1

11 Stata Exercise 2 Suppose we were considering the effect of hiring more people into the firm. On average, what total billings can we expect from a staff of 50? 150?
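The course does this step in Stata. Here is a minimal Python sketch of the same idea: fit a least-squares line of billings on staff size, then evaluate it at 50 and 150. The staff and billings numbers are invented for illustration and are not the exercise data. `least_squares` is a hypothetical helper name. The range check flags predictions outside the observed staff sizes, which are extrapolations.

```python
from statistics import mean

def least_squares(xs, ys):
    """Return (intercept a, slope b) of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical data (invented): staff size and total billings in $ thousands.
staff = [20, 35, 45, 60, 80, 100]
billings = [2100, 3400, 4600, 5900, 8100, 9800]

a, b = least_squares(staff, billings)
for n in (50, 150):
    predicted = a + b * n
    # A staff size outside the observed range (here 20 to 100) is an extrapolation.
    note = "" if min(staff) <= n <= max(staff) else " (extrapolation)"
    print(f"staff={n}: predicted billings = {predicted:,.0f}{note}")
```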

12 Stata Exercise 3

18 Stata Exercise 4

19 Stata Exercise 5: Adding Categorical Values to a Scatterplot. It is often useful to distinguish groups of data in a scatterplot, for example by plotting each group with a different symbol or color.

25 Stata Exercise 6

29 Transforming Data: Data analysts often look for a transformation of the data that simplifies the overall pattern. Typically, the transformation turns a non-Normally distributed variable into a roughly Normally distributed one. Stata Exercise 7
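Consider invented, right-skewed values that double at each step. A log transformation turns them into evenly spaced, symmetric values. This Python sketch measures skewness with the moment coefficient (the mean of cubed z-scores). That choice of skewness measure is an assumption; the slide does not specify one.

```python
import math
from statistics import mean, pstdev

def skewness(values):
    """Moment coefficient of skewness: mean of cubed z-scores (population SD)."""
    m, s = mean(values), pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

# Hypothetical right-skewed data: each value doubles the previous one.
raw = [2 ** k for k in range(10)]
logged = [math.log2(v) for v in raw]  # 0, 1, ..., 9: evenly spaced and symmetric

print(f"skewness before transform: {skewness(raw):.2f}")
print(f"skewness after log2:       {skewness(logged):.2f}")
```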

30 Categorical Explanatory Variable: What if the explanation for the numbers is not another number but a category? For example, investing in a particular sector of the economy might be great in some years and terrible in others. Stata Exercise 8
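When the explanatory variable is a category, one simple summary is to group the numeric responses by category and compare each group's center and spread. This is a Python sketch with invented sector returns. `summarize_by_category` is a hypothetical helper, not part of any library.

```python
from statistics import mean, stdev

# Hypothetical annual returns (%) by sector; invented for illustration only.
returns = [
    ("Technology", 25.0), ("Technology", -30.0), ("Technology", 40.0),
    ("Utilities", 6.0), ("Utilities", 4.0), ("Utilities", 8.0),
]

def summarize_by_category(pairs):
    """Group numeric values by categorical label; return {label: (mean, sd, n)}."""
    groups = {}
    for label, value in pairs:
        groups.setdefault(label, []).append(value)
    return {label: (mean(vals), stdev(vals), len(vals)) for label, vals in groups.items()}

for sector, (m, s, n) in summarize_by_category(returns).items():
    print(f"{sector:<11} mean={m:6.1f}  sd={s:5.1f}  n={n}")
```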

31 More scatterplots Relations between competitors Stata Exercise 9

32 Correlation

33 Which one has the stronger correlation?

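34
$$r = \frac{\operatorname{cov}(x,y)}{s_x\, s_y}, \qquad r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)$$
where $s_x$ and $s_y$ are the standard deviations of x and y.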
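The two formulas give the same number. This short Python check uses `statistics.mean` and `statistics.stdev`; `stdev` uses the n - 1 divisor, matching the formula. The data are invented.

```python
from statistics import mean, stdev

def corr_from_covariance(xs, ys):
    """r = cov(x, y) / (s_x * s_y), with n - 1 in the covariance and the SDs."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    return cov / (stdev(xs) * stdev(ys))

def corr_from_z_scores(xs, ys):
    """r = (1 / (n - 1)) * sum of products of standardized x and y values."""
    n = len(xs)
    x_bar, y_bar, s_x, s_y = mean(xs), mean(ys), stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
print(corr_from_covariance(x, y), corr_from_z_scores(x, y))  # equal up to rounding
```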

36 Correlation: The correlation r between height and weight is positive because people of above-average height tend to be of above-average weight. So when the z-score for height is positive, the z-score for weight also tends to be positive, and when one is negative the other tends to be negative. Most products of z-scores are therefore positive, and so is their sum:
$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)$$
Correlation applet at www.whfreeman.com/pbs

37 Stata Exercise 11

39 Correlation: Correlation coefficients, as well as scatterplots, can be used for comparisons. For example, how well did the Vanguard International Growth Fund (an investment vehicle) do compared with an average of the stocks in Europe, Australasia and the Far East? Stata Exercise 12

40 Correlation: Correlation does not tell you anything about causality. Both variables must be numerical. r does not depend on the units of measurement. r > 0 indicates a positive association; r < 0, a negative one. Always -1 ≤ r ≤ 1: r = 1 means the points lie on a perfectly straight upward-sloping line, and r = -1 on a perfectly straight downward-sloping line. r = 0 means no linear relation. r measures only linear relations. r is not resistant to outliers. Stata Exercise 13
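Several of these properties can be checked numerically. This Python sketch uses invented data to show three things:

- a perfect but curved relation can have r = 0;
- rescaling one variable's units (inches to centimeters, factor 2.54) leaves r unchanged;
- a single outlier can reverse the sign of r.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson correlation as the average product of z-scores (n - 1 divisor)."""
    n = len(xs)
    x_bar, y_bar, s_x, s_y = mean(xs), mean(ys), stdev(xs), stdev(ys)
    return sum((x - x_bar) / s_x * (y - y_bar) / s_y for x, y in zip(xs, ys)) / (n - 1)

x = [-2.0, -1.0, 0.0, 1.0, 2.0]

# y is completely determined by x, but the relation is curved, so r is 0.
r_curved = pearson_r(x, [v ** 2 for v in x])

# Changing units of x leaves r unchanged.
y = [1.0, 3.0, 2.0, 5.0, 4.0]
r_original = pearson_r(x, y)
r_rescaled = pearson_r([2.54 * v for v in x], y)

# Adding one outlying point changes r substantially.
r_outlier = pearson_r(x + [10.0], y + [-10.0])

print(r_curved, r_original, r_rescaled, r_outlier)
```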

43 Regression

44 The Linear Regression Model: Errors have mean 0 and a constant standard deviation σ, and are independent of x.
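One way to see what the model assumes is to simulate from it and check that least squares recovers the parameters. This is a Python sketch under two assumptions not on the slide:

- errors are drawn from a Normal distribution (the slide only requires mean 0, constant SD σ, and independence from x);
- x values are uniform on 0 to 10.

`simulate` and `fit_line` are hypothetical helper names.

```python
import random

def simulate(beta0, beta1, sigma, n, seed=0):
    """Draw (x, y) pairs from y = beta0 + beta1*x + error.

    Assumption for illustration: errors are Normal(0, sigma), drawn independently of x.
    """
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [beta0 + beta1 * x + rng.gauss(0, sigma) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    """Least-squares estimates (b0, b1) of the intercept and slope."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

xs, ys = simulate(beta0=2.0, beta1=0.5, sigma=1.0, n=10_000)
b0, b1 = fit_line(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(f"b0={b0:.3f}, b1={b1:.3f}, mean residual={sum(residuals) / len(residuals):.2e}")
```

With many observations, the estimates land close to the true intercept 2.0 and slope 0.5. The least-squares residuals always average to zero, up to rounding.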

55 Points on the line: (66.5 in, $20,000), (76.5 in, $35,600), (61.5 in, $12,200). Point-slope form: y - 20,000 = 1560(x - 66.5), so y = -83,740 + 1560x (about -84,000 + 1560x). Sketch a scatterplot of the data consistent with this line. Figure labels: $37,694; 95% of values.
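The intercept follows from the point-slope form by setting x = 0: a = 20,000 - 1560 × 66.5 = -83,740. This Python sketch checks that arithmetic and confirms that all three listed points lie on the line. `line_from_point_slope` is a hypothetical helper.

```python
def line_from_point_slope(x0, y0, slope):
    """Return (intercept, slope) for the line y - y0 = slope * (x - x0)."""
    return y0 - slope * x0, slope

# Numbers from the slide: the line passes through (66.5 in, $20,000) with slope $1,560 per inch.
intercept, slope = line_from_point_slope(66.5, 20_000, 1_560)
print(intercept)  # -83740.0, which the slide rounds to -84,000

for height in (61.5, 66.5, 76.5):
    print(height, intercept + slope * height)
```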

57 Draw the best-fitting line through the circles

59 Mark with an “X” the average “y” value for each “x” value. Then draw the best-fitting line through the Xs

64 Fact 1: Regression, unlike correlation, depends on which variable you treat as explanatory and which as response. Regressing sales on the item, Sales = a + b(Item), gives a different line from regressing the item on sales, Item = a + b(Sales). Stata Exercise 14
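A small numerical illustration of Fact 1, in Python with invented data. The slope of Sales on Item differs from the slope implied by regressing Item on Sales (1 divided by that slope). The two lines coincide only when |r| = 1. The product of the two slopes equals r², while r itself is the same whichever variable comes first.

```python
from statistics import mean

def slope(xs, ys):
    """Least-squares slope for regressing ys (response) on xs (explanatory)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sum((x - x_bar) ** 2 for x in xs)

# Hypothetical data: quantity of one item and total sales.
item = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.0, 3.0, 5.0, 4.0, 6.0]

b_sales_on_item = slope(item, sales)  # Sales = a + b(Item)
b_item_on_sales = slope(sales, item)  # Item = a + b(Sales)
print(b_sales_on_item, 1 / b_item_on_sales)  # two different lines
print(b_sales_on_item * b_item_on_sales)     # r squared
```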

65 Facts 2 and 3: If x changes by one standard deviation of x, the predicted y changes by r standard deviations of y, so the slope is b = r·s_y/s_x. For example, with s_x = 1, s_y = 2, and r = 0.61, a change of 1 in x predicts a change of 2 × 0.61 = 1.22 in y. The regression line always passes through the point (x̄, ȳ). With that point and the slope, the point-slope form of the line needs only the information on this slide.
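Facts 2 and 3 are enough to build the least-squares line from summary statistics alone. This Python sketch uses the slide's s_x = 1, s_y = 2, and r = 0.61. The means (x̄ = 10, ȳ = 30) are hypothetical values chosen only to make the example concrete.

```python
def regression_from_summaries(x_bar, y_bar, s_x, s_y, r):
    """Least-squares line y = a + b*x from summary statistics.

    Fact 2: slope b = r * s_y / s_x.
    Fact 3: the line passes through (x_bar, y_bar), so a = y_bar - b * x_bar.
    """
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b

# s_x, s_y and r come from the slide; the means are hypothetical.
a, b = regression_from_summaries(x_bar=10.0, y_bar=30.0, s_x=1.0, s_y=2.0, r=0.61)
print(f"slope = {b:.2f}, intercept = {a:.2f}")
```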

66 Fact 4: The correlation r is related to the slope of the regression line and therefore to the relation between x and y. More precisely, the square of r, written R², is the fraction of the variation in y that is explained by the least-squares regression of y on x.
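"Fraction of the variation explained" can be computed directly as 1 - SSE/SST:

- SST is the total variation of y around its mean;
- SSE is the variation left in the residuals.

For a least-squares line with an intercept, this equals r². This Python sketch checks the identity on invented data. `r_squared_two_ways` is a hypothetical helper.

```python
from statistics import mean

def r_squared_two_ways(xs, ys):
    """Return (1 - SSE/SST, r**2) for the least-squares line of ys on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sst = sum((y - y_bar) ** 2 for y in ys)  # total variation in y
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    r = sxy / (sxx * sst) ** 0.5
    return 1 - sse / sst, r ** 2

print(r_squared_two_ways([1, 2, 3, 4, 5], [2, 3, 5, 4, 6]))  # both about 0.81
```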

68 Temperature explains most of the variation in gas consumption, so the R² of this regression is very high.

71 Excel Exercise 1
tbill98   tbill98_hat   residuals
11.5      10.84649
12.6      12.19961
13.8      14.81564
6.4       5.975251
5.3       6.336083
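A residual is the observed value minus the fitted value: tbill98 - tbill98_hat. This Python sketch fills in the residuals column. It assumes the flattened slide text splits into an observed tbill98 value followed by a fitted tbill98_hat value on each row; that reading of the original layout is an assumption.

```python
# Residual = observed value - fitted value.
# Assumption: each flattened row on the slide is an observed tbill98 value
# followed by the fitted tbill98_hat value.
tbill98 = [11.5, 12.6, 13.8, 6.4, 5.3]
tbill98_hat = [10.84649, 12.19961, 14.81564, 5.975251, 6.336083]

residuals = [obs - fit for obs, fit in zip(tbill98, tbill98_hat)]
for obs, fit, res in zip(tbill98, tbill98_hat, residuals):
    print(f"{obs:5.1f} {fit:10.6f} {res:10.6f}")
```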

72 Stata Exercises 15 and 16

79 [Figure panels: "With influential observation" and "Without influential observation"; point label 21]
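An influential observation is one whose removal changes the fitted line substantially, typically a point far out in the x direction. This Python sketch uses invented data: five points close to y = x plus one distant point. The sketch fits the line with and without that point.

```python
from statistics import mean

def fit(xs, ys):
    """Least-squares (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sum((x - x_bar) ** 2 for x in xs)
    return y_bar - b * x_bar, b

# Hypothetical data: five points near y = x, plus one point far out in the x direction.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 15.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.1, 2.0]

with_point = fit(xs, ys)
without_point = fit(xs[:-1], ys[:-1])
print("with influential observation:   ", with_point)
print("without influential observation:", without_point)
```

With the distant point, the slope drops from about 0.99 to nearly zero, so that one observation determines the line.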

80 Stata Exercise 17

84 Cautions about Correlation and Regression: Don't extrapolate too far beyond the range of the data. Correlations based on averages are usually stronger than correlations for individuals. Beware of lurking (latent, hidden, excluded, neglected) variables. Association is not causation: establishing causation takes a lot of work (see p. 139).
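Why are correlations stronger for averages? Averaging cancels much of the individual-level scatter while keeping the underlying trend. This Python sketch simulates hypothetical individuals: 50 people at each x from 1 to 10, with y = x plus Normal noise of SD 5. The simulated data and noise level are assumptions chosen for illustration. The sketch then compares r for the individuals with r for the ten group averages.

```python
import random
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson correlation (n - 1 divisor)."""
    x_bar, y_bar, s_x, s_y = mean(xs), mean(ys), stdev(xs), stdev(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / ((len(xs) - 1) * s_x * s_y)

rng = random.Random(1)
# Hypothetical individuals: 50 people at each x = 1..10, y = x + Normal noise (SD 5).
individuals = [(x, x + rng.gauss(0, 5)) for x in range(1, 11) for _ in range(50)]
r_individuals = pearson_r([x for x, _ in individuals], [y for _, y in individuals])

# Replace each group by its average y; averaging cancels much of the individual noise.
group_x = list(range(1, 11))
group_means = [mean([y for x, y in individuals if x == g]) for g in group_x]
r_averages = pearson_r(group_x, group_means)

print(f"r for individuals: {r_individuals:.2f}, r for averages: {r_averages:.2f}")
```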
