The Wealth of Nations
Jamie Brabston, Matt Caulfield, Mark Testa


Overview: Introduction, Regression of Individual Variables, Multicollinearity, Multiple Regression, Stepwise Regression, Final Model

Introduction: We collected data on 12 variables for 30 countries: life expectancy, median age, population growth, population density, literacy rate, unemployment rate, oil consumption minus oil production, cell phones vs. land lines, military expenditures, area, sex ratio, and external debt. Goal: create a model to predict GDP per capita.

Life Expectancy

Analysis (R² and p-value): the relationship is highly significant. An outlier was identified using a leverage-residual plot and removed. The residuals vs. fitted values plot showed nonlinearity, so we tried a Box-Cox transform.
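For a single predictor, R² and the significance of the slope can be computed directly from the least-squares fit. The following is a minimal sketch, assuming NumPy and a hypothetical helper named `simple_regression_summary`. It is not the authors' software. It returns R² and the slope's t-statistic. A p-value would then come from a t distribution with n - 2 degrees of freedom, for example via `scipy.stats.t.sf`, which is not used here.

```python
import numpy as np

def simple_regression_summary(x, y):
    """Least-squares fit of y = b0 + b1*x; returns coefficients, R^2, and the slope t-statistic."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    xc = x - x.mean()
    sxx = np.sum(xc ** 2)
    slope = np.sum(xc * (y - y.mean())) / sxx
    intercept = y.mean() - slope * x.mean()
    resid = y - (intercept + slope * x)
    sse = np.sum(resid ** 2)                  # unexplained variation
    sst = np.sum((y - y.mean()) ** 2)         # total variation
    r2 = 1.0 - sse / sst
    se_slope = np.sqrt(sse / (n - 2) / sxx)   # standard error of the slope
    return {"intercept": intercept, "slope": slope, "r2": r2, "t": slope / se_slope}

# Hypothetical toy data, not the countries dataset.
summary = simple_regression_summary([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(summary)
```

A large |t| (relative to the t distribution with n - 2 degrees of freedom) corresponds to the "highly significant" p-value reported on the slide.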

Life Expectancy
Leverage-residual plot quadrants:
 - Top: influential data points.
 - Bottom: non-influential data points.
 - Left: non-outliers.
 - Right: outliers.
Upshot: eliminate points in the top-right quadrant as influential outliers.
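The top-right quadrant can also be located numerically. To do this, compute each point's leverage (the diagonal of the hat matrix) and its internally studentized residual. The sketch below assumes rule-of-thumb cutoffs that do not come from the slides: leverage above 2p/n and |studentized residual| above 2. The function name `flag_influential_outliers` is hypothetical.

```python
import numpy as np

def flag_influential_outliers(x, y, lev_mult=2.0, resid_cut=2.0):
    """Leverage and internally studentized residuals for y = b0 + b1*x.

    Assumed rule-of-thumb cutoffs (not from the slides):
      high leverage:  h_i > lev_mult * p / n
      outlier:        |r_i| > resid_cut
    A point in the "top-right quadrant" satisfies both.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])        # design matrix with intercept
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)    # leverages (hat-matrix diagonal)
    s2 = resid @ resid / (n - p)                     # residual variance estimate
    stud = resid / np.sqrt(s2 * (1.0 - h))           # internally studentized residuals
    flags = (h > lev_mult * p / n) & (np.abs(stud) > resid_cut)
    return h, stud, flags

# Hypothetical data: ten points on y = 2x plus one far-out point that breaks the pattern.
x = np.r_[np.arange(1, 11), 30.0]
y = np.r_[2.0 * np.arange(1, 11), 0.0]
h, stud, flags = flag_influential_outliers(x, y)
print(np.flatnonzero(flags))  # -> [10]
```

Only the last point has both high leverage and a large residual, so only it is flagged.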

Life Expectancy
Box-Cox transform: y → (y^p - 1)/p (the limit as p → 0 is log y). This produces a linear fit if the variables are related by a power law. The plot shows the goodness of fit as a function of p; in this case, the optimal p is fairly small.
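The plot of goodness of fit as a function of p can be sketched as a profile log-likelihood. For each candidate p, regress the transformed response on x and compute the score -n/2·log(SSE/n) + (p - 1)·Σ log y. This is the standard Box-Cox criterion. Its Jacobian term makes scores for different p comparable.

The sketch assumes a single predictor and strictly positive y. The p grid and the helper names `boxcox` and `boxcox_profile` are hypothetical. The slides do not say which software or criterion the authors used. `scipy.stats.boxcox` also exists, but it chooses p to make y itself look normal, not the regression residuals.

```python
import numpy as np

def boxcox(y, p):
    """Box-Cox transform (y**p - 1)/p; the p -> 0 limit is log(y). Requires y > 0."""
    y = np.asarray(y, dtype=float)
    if np.isclose(p, 0.0):
        return np.log(y)
    return (y ** p - 1.0) / p

def boxcox_profile(x, y, ps):
    """Profile log-likelihood of p when boxcox(y, p) is regressed on x.

    Score = -n/2 * log(SSE/n) + (p - 1) * sum(log y); the second term is the
    Jacobian of the transform, which makes scores for different p comparable.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    n = len(y)
    sum_log_y = np.log(y).sum()
    scores = []
    for p in ps:
        z = boxcox(y, p)
        beta, *_ = np.linalg.lstsq(X, z, rcond=None)
        sse = np.sum((z - X @ beta) ** 2)
        scores.append(-0.5 * n * np.log(sse / n) + (p - 1.0) * sum_log_y)
    return np.array(scores)

# Hypothetical data where log(y) is linear in x, so the best p should be near 0.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = np.exp(1 + 0.3 * x + rng.normal(0, 0.05, size=50))
ps = np.linspace(-1, 1, 41)
best_p = ps[np.argmax(boxcox_profile(x, y, ps))]
print(f"best p on the grid: {best_p:.2f}")
```

A small optimal p, as found for life expectancy, means the best transform is close to a logarithm.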

Life Expectancy
Linear regression was performed on the Box-Cox-transformed data, but significant nonlinearity remained.

Life Expectancy
Conclusions: There is clearly a significant positive relationship between GDP per capita and life expectancy, but we could not identify the precise form of the relationship. This prevents extrapolation and prediction.

Median Age

Analysis (R² and p-value): the relationship is highly significant. No outliers are suspected. The residuals vs. fitted values plot is consistent with a linear relationship, but the residuals deviate significantly from normality.
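Normality of residuals is usually judged from a normal Q-Q plot. One numeric summary of how straight that plot is can be computed as follows. Take the correlation between the sorted residuals and standard normal quantiles at Blom plotting positions (i - 0.375)/(n + 0.25). Values close to 1 indicate a nearly straight Q-Q plot.

This is offered as an assumption, not as the authors' procedure. The sketch uses only NumPy and the standard library's `statistics.NormalDist`. The name `qq_correlation` is hypothetical.

```python
import numpy as np
from statistics import NormalDist

def qq_correlation(resid):
    """Correlation between sorted residuals and standard normal quantiles.

    Uses Blom plotting positions (i - 0.375) / (n + 0.25); a value near 1
    means the normal Q-Q plot is close to a straight line.
    """
    r = np.sort(np.asarray(resid, dtype=float))
    n = len(r)
    probs = (np.arange(1, n + 1) - 0.375) / (n + 0.25)
    q = np.array([NormalDist().inv_cdf(pr) for pr in probs])
    return float(np.corrcoef(q, r)[0, 1])

# Hypothetical residuals: a symmetric, normal-looking set vs. a right-skewed set.
probs = (np.arange(1, 31) - 0.375) / 30.25
normal_like = np.array([NormalDist().inv_cdf(pr) for pr in probs])
skewed = np.exp(np.linspace(0, 3, 30))
print(qq_correlation(normal_like), qq_correlation(skewed))
```

Recomputing this value on the residuals before and after a Box-Cox transform gives a rough numeric check that the transform improved normality.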

Median Age Box-Cox Transform gives:

Median Age
The Box-Cox transform (p = 0.15) significantly improved the normality of the residual distribution and improved R². Final model: (GDP^0.15 - 1)/0.15 = (Med.Age)

Population Growth

Analysis (R² and p-value): the correlation is very low, and the p-value is well above any reasonable significance level. An outlier was found and eliminated using a leverage-residual plot.

Population Growth Box-Cox Transform:

Population Growth
A Box-Cox transform slightly reduced the nonlinearity and gave a significant p-value. From this, we concluded that population growth has a slight negative relationship with GDP. No detailed predictions are possible because significant nonlinearity remains.

Population Density

Analysis: The outlier on the far right corresponds to Singapore, a country with an exceptionally high population density. A less extreme outlier is China. Both of these data points were removed.

Population Density

Even without the outliers, the p-value is far from significant. A Box-Cox transform was attempted, but the p-value did not come close to significance. Conclusion: population density and GDP are essentially unrelated.

Literacy Rate

Final model: GDP = (literacy rate)

Unemployment Rate

Final model: GDP = (unemployment rate)

Oil Consumption – Production

Final model: GDP = (oil consumption – production)

Cell phones vs. Landlines

Final model: GDP = (cell phones vs. land lines)

Military Expenditures

Analysis: doesn’t pass the conditions for regression. The data isn’t linear, the residuals aren’t random, the Q-Q plot is curved, and there are outliers.

Analysis of Box-Cox model: still doesn’t pass the conditions for regression; the data isn’t linear.

Area

Analysis: doesn’t pass the conditions for regression. The data isn’t linear, the residuals aren’t random, the Q-Q plot is curved, and there are outliers.

Analysis of Box-Cox model: still doesn’t pass the conditions for regression. The data isn’t linear, the residuals are not random, and the Q-Q plot isn’t normal.

Sex Ratio

Analysis: doesn’t pass the conditions for regression. The data isn’t linear, the residuals aren’t random, the Q-Q plot is curved, and there are outliers.

Analysis of Box-Cox model: still doesn’t pass the conditions for regression. The data isn’t linear, the residuals are not random, and the Q-Q plot isn’t normal.

External Debt

Analysis: doesn’t pass the conditions for regression. The data isn’t linear, the residuals aren’t random, the Q-Q plot is curved, and there are outliers.

Analysis of Box-Cox model: still doesn’t pass the conditions for regression. The data isn’t linear, the residuals are not random, and the Q-Q plot isn’t normal.

Multicollinearity
Multicollinearity occurs when two or more explanatory variables are linearly related. The related variables carry largely the same information, which inflates the standard errors of their coefficients. As a result, a stepwise regression may keep both even though the model would work just as well with only one. Variance inflation factors between each pair of explanatory variables were computed, and none were too high, so there is no significant multicollinearity.
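The slides describe pairwise variance inflation factors. The usual VIF for predictor j regresses x_j on all the other predictors and sets VIF_j = 1/(1 - R_j²). With only two predictors this reduces to the pairwise 1/(1 - r²). Cutoffs of 5 or 10 are common rules of thumb, not thresholds stated in the slides. The following is a minimal NumPy sketch; the name `vif` and the example data are hypothetical.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (rows = observations).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on all
    other columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ beta
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Hypothetical predictors: orthogonal columns vs. a near-duplicate pair.
X_orth = np.array([[1, 1], [-1, 1], [1, -1], [-1, -1]], dtype=float)
x1 = np.arange(10, dtype=float)
x2 = x1 + 0.1 * (-1.0) ** np.arange(10)
print(vif(X_orth))                       # both 1: no collinearity
print(vif(np.column_stack([x1, x2])))    # both very large
```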

Multiple Regression
Taking all 12 variables into account at once gives a high R², but the model is not accurate. Our data has too many variables and too few observations (12 variables for 30 countries).
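A high R² with many predictors and few observations can be misleading for two reasons. First, R² never decreases when a predictor is added. Second, even predictors that are pure noise give an expected R² of about k/(n - 1), which here is 12/29 ≈ 0.41.

Adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1) penalizes extra predictors. The sketch below is illustrative only. The R² of 0.80 in the example is hypothetical, not a result from the slides, and the name `adjusted_r2` is likewise hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with an intercept, k predictors, and n observations."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Same hypothetical R^2 of 0.80 with 30 observations:
print(round(adjusted_r2(0.80, 30, 4), 4))   # -> 0.768
print(round(adjusted_r2(0.80, 30, 12), 4))  # -> 0.6588
```

With 12 predictors and only 30 observations, the same raw R² is worth noticeably less than with 4 predictors.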

Stepwise Regression
Stepwise regression model: predicted GDP = e (median age) (population growth) e-04(external debt) e-03(population density)
R-squared: 80.78% of the variability in GDP per capita is accounted for by the linear association with median age, population growth, external debt, and population density.
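The slides do not state which stepwise procedure or selection criterion was used. One possible sketch is forward selection. At each step, it adds the predictor that most lowers AIC = n·log(RSS/n) + 2·(number of coefficients). It stops when no addition helps. This forward-only version is a simplification; bidirectional stepwise procedures may also drop variables. The function name and the simulated data are hypothetical; in practice the columns of X would be the explanatory variables.

```python
import numpy as np

def forward_stepwise(X, y):
    """Forward selection by AIC for an OLS model with an intercept.

    AIC = n * log(RSS / n) + 2 * (number of coefficients), up to a constant.
    Returns the selected column indices in the order they were added.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape

    def aic(cols):
        A = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        rss = np.sum((y - A @ beta) ** 2)
        return n * np.log(rss / n) + 2 * A.shape[1]

    selected, best = [], aic([])
    while len(selected) < k:
        scores = {c: aic(selected + [c]) for c in range(k) if c not in selected}
        c_best = min(scores, key=scores.get)
        if scores[c_best] >= best:      # no candidate improves AIC: stop
            break
        selected.append(c_best)
        best = scores[c_best]
    return selected

# Simulated data: y depends on columns 0 and 2 only.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] + 1.5 * X[:, 2] + 0.5 * rng.normal(size=100)
selected = forward_stepwise(X, y)
print(selected)
```

The two strong predictors enter first. A pure-noise column can occasionally slip in afterwards, which is one reason stepwise output should be checked against the regression conditions.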

Removing Outliers
There is one influential outlier: Singapore, which has a very high population density. It is a small country with many people who are financially well-to-do.

Stepwise Model w/o Outlier
New model after removing Singapore: predicted GDP = e (median age) (population growth) e-04(external debt) e-03(population density)
R-squared: 83.89% of the variability in GDP per capita is accounted for by the linear association with median age, population growth, external debt, and population density.

Box-Cox Transformation

Box-Cox Model
New model (all data points): ((predicted GDP)^(0.5) - 1) / (0.5) = e e-01(median age) (population growth) e-04(external debt) e-04(population density)
R-squared: 82.8% of the variability in GDP per capita is accounted for by the linear association with median age, population growth, external debt, and population density.

Box-Cox w/o Outlier
New model after removing Singapore: ((predicted GDP)^(0.5) - 1) / (0.5) = e e-01(median age) (population growth) e-04(external debt) - 3.106e-03(population density)
R-squared: 87.35% of the variability in GDP per capita is accounted for by the linear association with median age, population growth, external debt, and population density.

Final Model
Box-Cox model without the outlier: ((predicted GDP)^(0.5) - 1) / (0.5) = e e-01(median age) (population growth) e-04(external debt) - 3.106e-03(population density)
Example, Greece: observed GDP: 30.6; predicted GDP: 34.6
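The final model predicts ((GDP)^0.5 - 1)/0.5 rather than GDP itself, so a prediction has to be back-transformed. With p = 0.5 the inverse map is GDP = (0.5·z + 1)^2. In general it is (p·z + 1)^(1/p), or exp(z) when p = 0. The sketch below shows that inverse and the relative error for the Greece example on the slide. The helper names `boxcox` and `inv_boxcox` are hypothetical.

Back-transforming a prediction made on the transformed scale tends to estimate something closer to the median of GDP than its mean.

```python
import numpy as np

def boxcox(y, p):
    """Forward Box-Cox transform (y**p - 1)/p, or log(y) when p == 0."""
    y = np.asarray(y, dtype=float)
    return np.log(y) if p == 0 else (y ** p - 1.0) / p

def inv_boxcox(z, p):
    """Inverse Box-Cox transform; requires p*z + 1 > 0 when p != 0."""
    z = np.asarray(z, dtype=float)
    return np.exp(z) if p == 0 else (p * z + 1.0) ** (1.0 / p)

# Greece, from the final-model slide: observed 30.6, predicted 34.6.
observed, predicted = 30.6, 34.6
z = boxcox(predicted, 0.5)                          # value on the model's transformed scale
assert np.isclose(inv_boxcox(z, 0.5), predicted)    # round trip recovers 34.6
rel_error = (predicted - observed) / observed
print(f"relative error: {rel_error:.1%}")           # -> relative error: 13.1%
```

For Greece, the model overpredicts GDP by roughly 13% of the observed value.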