The Wealth of Nations Jamie Brabston Matt Caulfield Mark Testa.

1 The Wealth of Nations. Jamie Brabston, Matt Caulfield, Mark Testa

2 Overview: Introduction; Regression of Individual Variables; Multicollinearity; Multiple Regression; Stepwise Regression; Final Model

3 Introduction Collected data for 30 countries on 12 variables: life expectancy, median age, population growth, population density, literacy rate, unemployment rate, oil consumption – oil production, cell phones / land lines, military expenditures, area, sex ratio, and external debt. Goal: create a model to predict GDP per capita.

4 Life Expectancy

5 Analysis: R²: 0.45. P-value: highly significant. An outlier was identified using a leverage-residual plot and removed. The Residuals vs. Fitted Values plot showed nonlinearity, so we tried a Box-Cox transform.

6 Life Expectancy Reading the leverage-residual plot: top = influential data points; bottom = non-influential data points; left = non-outliers; right = outliers. Upshot: eliminate points in the top right quadrant as influential outliers.
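The quadrant rule on this slide can be sketched in code. The slides do not say which software or cutoffs were used, so this is only an illustration for a one-predictor fit: the leverage cutoff (2 × number of coefficients / n), the residual cutoff of 2, the function names, and the data are all assumptions.

```python
import math

def leverage_and_residuals(x, y):
    """Return (leverage, standardized residual) lists for a simple linear fit of y on x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual standard error with n - 2 degrees of freedom.
    s = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    # Leverage of each point in a one-predictor regression.
    leverage = [1 / n + (xi - x_mean) ** 2 / sxx for xi in x]
    standardized = [r / (s * math.sqrt(1 - h)) for r, h in zip(residuals, leverage)]
    return leverage, standardized

def influential_outliers(x, y, residual_cutoff=2.0):
    """Indices in the 'top right quadrant': high leverage AND large residual."""
    leverage, standardized = leverage_and_residuals(x, y)
    leverage_cutoff = 2 * 2 / len(x)  # assumed rule of thumb: 2 * (coefficients) / n
    return [
        i
        for i, (h, r) in enumerate(zip(leverage, standardized))
        if h > leverage_cutoff and abs(r) > residual_cutoff
    ]

# Made-up data: nine points on a line plus one distant point that breaks the trend.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [1, 2, 3, 4, 5, 6, 7, 8, 9, 5]
flagged = influential_outliers(x, y)
```

Requiring both conditions mirrors the slide's upshot: a point is dropped only if it is both influential and an outlier.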

7 Life Expectancy Box-Cox Transform: y -> (y^p - 1)/p. Produces a linear fit if the variables are related by a power law. This plot shows the goodness of the fit as a function of p. In this case, the optimal p is fairly small.
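As a minimal sketch of the transform on this slide, the function below applies y -> (y^p - 1)/p to a single positive value. The function name and the data are made up: the data are built so that a power of 0.25 makes them exactly linear.

```python
import math

def box_cox(y, p):
    """Box-Cox transform of a positive value y; the p -> 0 limit is log(y)."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive values")
    if abs(p) < 1e-12:
        return math.log(y)
    return (y ** p - 1) / p

# Hypothetical power-law data: y = (1 + 0.5 * x) ** 4.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [(1 + 0.5 * x) ** 4 for x in xs]

# With p = 0.25 the transformed response is ((1 + 0.5 * x) - 1) / 0.25 = 2 * x,
# which is a straight line in x.
transformed = [box_cox(y, 0.25) for y in ys]
```

In practice p is chosen from the data, which is what the goodness-of-fit plot on the slide shows. This sketch fixes p by hand.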

8 Life Expectancy Linear regression was done on the BC transformed data. Significant nonlinearity remained.

9 Life Expectancy Conclusions: Clearly, there is a significant positive relationship between per capita GDP and life expectancy. We could not identify the precise nature of the relationship. This prevents extrapolation and prediction.

10 Median Age

11 Analysis: R²: 0.58. P-value: highly significant. No suspected outliers. The Residuals vs. Fitted Values plot is consistent with an approximately linear relationship, but the residuals deviate significantly from a normal distribution.

12 Median Age Box-Cox Transform gives:

13 Median Age The Box-Cox transform significantly improved the normality of the residual distribution. The Box-Cox p = 0.15. R² improved to 0.72. Final Model: (GDP^0.15 - 1)/0.15 = -2.1 + 0.17(median age)
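To get a GDP prediction from this model, the Box-Cox transform has to be inverted. The sketch below does that with the coefficients from the slide. The function names are hypothetical, and the slides do not state the units of GDP, so the result is in whatever units the original data used.

```python
def box_cox_inverse(t, p):
    """Invert t = (y**p - 1) / p for y, assuming p != 0."""
    base = p * t + 1
    if base <= 0:
        raise ValueError("transformed value is outside the invertible range")
    return base ** (1 / p)

def predict_gdp_from_median_age(median_age):
    """Median-age model from the slide: (GDP^0.15 - 1)/0.15 = -2.1 + 0.17 * median_age."""
    p = 0.15
    transformed = -2.1 + 0.17 * median_age
    return box_cox_inverse(transformed, p)

# Illustrative input only; 40 is not a value taken from the slides.
example_gdp = predict_gdp_from_median_age(40)
```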

14 Population Growth

15 Analysis: R² = 0.058. P-value: 0.11. The correlation is very low, and the p-value exceeds the conventional 0.05 and 0.10 significance levels. An outlier was found and eliminated using a leverage-residual plot.

16 Population Growth Box-Cox Transform:

17 Population Growth A Box-Cox transform reduced the nonlinearity slightly and gave a significant p-value. From this, we concluded that population growth has a slight negative relationship with GDP. No detailed predictions are possible because significant nonlinearity remains.

18 Population Density

19 Analysis: The outlier on the far right corresponds to Singapore, a country with an exceptionally high population density. A less extreme outlier is China. Both of these data points were removed.

20 Population Density

21 The p-value for the data without outliers is 0.68, far from significant. A Box-Cox transform was attempted, but the p-value did not get close to significance. Conclusion: Population density and GDP are essentially unrelated.

22 Literacy Rate

27 Final model: GDP = -3.320 + 0.0657(literacy rate)

28 Unemployment Rate

33 Final model: GDP = 1.388 - 0.0236(unemployment rate)

34 Oil Consumption – Production

40 Final model: GDP = -3.320 + 0.0657(literacy rate)

41 Cell phones vs. Landlines

48 Final model: GDP = 1.52811 - 0.0928(cells vs landlines)

49 Military Expenditures

51 Analysis: Doesn’t pass conditions for regression. Data isn’t linear. Residuals aren’t random. Q-Q plot is curved. Outliers are present.

56 Analysis of Box-Cox Model: Doesn’t pass conditions for regression. Data isn’t linear.

57 Area

59 Analysis: Doesn’t pass conditions for regression. Data isn’t linear. Residuals aren’t random. Q-Q plot is curved. Outliers are present.

64 Analysis of Box-Cox Model: Doesn’t pass conditions for regression. Data isn’t linear. Residuals are not random. Q-Q plot isn’t normal.

65 Sex Ratio

67 Analysis: Doesn’t pass conditions for regression. Data isn’t linear. Residuals aren’t random. Q-Q plot is curved. Outliers are present.

72 Analysis of Box-Cox Model: Doesn’t pass conditions for regression. Data isn’t linear. Residuals are not random. Q-Q plot isn’t normal.

73 External Debt

75 Analysis: Doesn’t pass conditions for regression. Data isn’t linear. Residuals aren’t random. Q-Q plot is curved. Outliers are present.

80 Analysis of Box-Cox Model: Doesn’t pass conditions for regression. Data isn’t linear. Residuals are not random. Q-Q plot isn’t normal.

81 Multicollinearity Multicollinearity occurs when two or more explanatory variables are strongly linearly related. A stepwise regression may then retain both variables even though the model would work just as well with only one. Variance inflation factors between each pair of explanatory variables were computed, and none were high enough to cause concern. There is no significant multicollinearity.
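For a single pair of explanatory variables, the variance inflation factor reduces to 1 / (1 - r²), where r is their Pearson correlation. The sketch below computes it that way. The slides do not say how the factors were computed or what cutoff was used, so the function names and the data are illustrative assumptions. Commonly quoted rules of thumb treat values above roughly 5 or 10 as a warning sign.

```python
import math

def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def pairwise_vif(a, b):
    """Variance inflation factor for a pair of predictors: 1 / (1 - r^2)."""
    r_squared = pearson_r(a, b) ** 2
    if r_squared >= 1.0:
        return math.inf  # perfectly collinear pair
    return 1 / (1 - r_squared)

# Made-up predictors: a and b are fairly correlated, a and c are uncorrelated.
a = [1, 2, 3, 4, 5]
b = [2, 1, 4, 3, 5]
c = [2, -1, -2, -1, 2]
vif_ab = pairwise_vif(a, b)
vif_ac = pairwise_vif(a, c)
```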

82 Multiple Regression Taking into account all 12 variables at once gives a high R², but the model is not accurate. In our data there are too many variables and too few observations.
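One way to see why a high R² is misleading with 12 variables and 30 observations is the adjusted R², which penalizes each added predictor. The slides do not report the R² of the full model or say whether adjusted R² was used, so the 0.90 below is a made-up value and the function name is hypothetical.

```python
def adjusted_r_squared(r_squared, n_observations, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    dof = n_observations - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r_squared) * (n_observations - 1) / dof

# Hypothetical R^2 of 0.90 with the deck's 30 countries.
with_12_predictors = adjusted_r_squared(0.90, 30, 12)
with_4_predictors = adjusted_r_squared(0.90, 30, 4)
```

For the same raw R², the 12-variable model is penalized more heavily than a 4-variable one, which is the motivation for trimming the variable set.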

83 Stepwise Regression Stepwise regression model: predicted GDP = -6.499e+01 + 2.296(median age) + 9.385(population growth) + 9.723e-04(external debt) + 1.808e-03(population density). R-squared: 80.78% of the variability in GDP per capita is accounted for by the linear association with median age, population growth, external debt, and population density.

86 Removing Outliers One influential outlier: Singapore. It has a very high population density: a small country with a large population that is financially well off.

87 Stepwise Model w/o Outlier New model after removing Singapore: predicted GDP = -6.277e+01 + 2.257(median age) + 8.885(population growth) + 9.274e-04(external debt) + 2.232e-03(population density). R-squared: 83.89% of the variability in GDP per capita is accounted for by the linear association with median age, population growth, external debt, and population density.

89 Box-Cox Transformation

90 Box-Cox Model New model (all data points): ((predicted GDP)^(0.5) - 1) / (0.5) = -1.388e+01 + 5.560e-01(median age) + 1.915(population growth) + 1.665e-04(external debt) + 2.228e-04(population density). R-squared: 82.8% of the variability in GDP per capita is accounted for by the linear association with median age, population growth, external debt, and population density.

93 Box-Cox w/o Outlier New model after removing Singapore: ((predicted GDP)^(0.5) - 1) / (0.5) = -1.258e+01 + 5.382e-01(median age) + 1.686(population growth) + 1.682e-04(external debt) - 3.106e-03(population density). R-squared: 87.35% of the variability in GDP per capita is accounted for by the linear association with median age, population growth, external debt, and population density.

95 Final Model Box-Cox model without outlier: ((predicted GDP)^(0.5) - 1) / (0.5) = -1.258e+01 + 5.382e-01(median age) + 1.686(population growth) + 1.682e-04(external debt) - 3.106e-03(population density). Greece: observed GDP 30.6; predicted GDP 34.6.
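The final model predicts a transformed GDP, so using it means evaluating the right-hand side and then undoing the power-0.5 Box-Cox transform. The sketch below does this with the coefficients from the slide. The slides do not give the units of the inputs or the Greek input values, so the function names and example inputs are hypothetical. Only the Greece observed and predicted figures come from the slide.

```python
def predict_gdp(median_age, population_growth, external_debt, population_density):
    """Evaluate the final Box-Cox model and back-transform to GDP per capita."""
    power = 0.5
    transformed = (
        -1.258e+01
        + 5.382e-01 * median_age
        + 1.686 * population_growth
        + 1.682e-04 * external_debt
        - 3.106e-03 * population_density
    )
    base = power * transformed + 1
    if base <= 0:
        raise ValueError("inputs fall outside the range where the model can be inverted")
    return base ** (1 / power)

def relative_error(observed, predicted):
    """Signed prediction error as a fraction of the observed value."""
    return (predicted - observed) / observed

# Made-up inputs, in whatever units the original data set used (not stated in the slides).
example_gdp = predict_gdp(40.0, 0.2, 0.0, 80.0)

# Greece figures from the slide: observed 30.6, predicted 34.6.
greece_error = relative_error(30.6, 34.6)
```

With the slide's figures, the Greece prediction overshoots the observed value by about 13%.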

