Auto Accidents: What’s responsible?

Math 70: Group Project, Group 8: Janelle Chang, Helena Jeanty, Rhiana Quail.

Auto Accidents: What’s responsible? Group 8 Janelle Chang Helena Jeanty Rhiana Quail

DISCLAIMER!!!
  Weather conditions
  Drivers' mental health
  Drivers' physical health
  Time of day
  Time of year

Sorting the Data... National vs. Regional
National: all 50 states (not including DC)
Regional:
  Region 1 ~ North East
  Region 2 ~ South East
  Region 3 ~ South MidWest
  Region 4 ~ North MidWest
  Region 5 ~ South West
  Region 6 ~ North West
Reasoning…
  Allows one to view any type of national behavior
  Allows for comparisons to be made within the United States

Normalizing Data
Reason: Every entry needs to be expressed in a "standard" proportion so that the data can be evaluated equally.
  State populations differ
  Number of states per region differs
Basic assumption: more people = more cars = higher number of automobile fatalities.
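A minimal sketch of the normalization idea, assuming "standard proportion" means dividing each raw count by the state's population (the slide suggests this but does not state the exact denominator). The state names and figures below are hypothetical and are not taken from the project data.

```python
# Hypothetical figures for illustration only; not from the project data.
states = {
    "State A": {"drivers_killed": 300, "population": 6_000_000},
    "State B": {"drivers_killed": 90, "population": 1_200_000},
}


def normalize(count, population):
    """Express a raw count as a per-resident rate (assumed normalization)."""
    return count / population


# Per-resident rates make states of different sizes comparable.
rates = {
    name: normalize(s["drivers_killed"], s["population"])
    for name, s in states.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.2e} drivers killed per resident")
```

In this made-up example State A has more fatalities in absolute terms, but State B has the higher rate once population is taken into account.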

Testing
#1: Does alcohol affect the number of drivers killed in car accidents?
#2: Does a combination of age and alcohol affect the number of people (including drivers) killed in car accidents?
#3: Do individual regions mimic national data?
Assumption: Alcohol affects the number of people killed in car accidents BUT is not the only contributing factor. Younger people probably drink more irresponsibly, so they are more likely to be involved in, and be responsible for, fatal car accidents.

t-Test
For each region:
  H0: tot. drivers killed = drunk drivers killed
  H1: tot. drivers killed ≠ drunk drivers killed
t-Test: α = 0.05, 95% confidence, 2-sided test
df = (# obs) - 2 (the residual degrees of freedom of the one-regressor model, e.g. 15 for 17 observations)

t-Test (#1) Rejecting H0

      Source |       SS       df       MS              Number of obs =      17
-------------+------------------------------           F(  1,    15) =   40.43
       Model |  1.1071e-12     1  1.1071e-12           Prob > F      =  0.0000
    Residual |  4.1075e-13    15  2.7383e-14           R-squared     =  0.7294
-------------+------------------------------           Adj R-squared =  0.7114
       Total |  1.5179e-12    16  9.4868e-14           Root MSE      = 1.7e-07

------------------------------------------------------------------------------
reg1normki~d |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
reg1normdr~k |   125.8276   19.78864     6.36   0.000     83.64916    168.0061
       _cons |   1.37e-07   4.70e-08     2.91   0.011     3.66e-08    2.37e-07
------------------------------------------------------------------------------

Reject H0: |t| > t15, i.e. 6.36 > 2.131

Regression: driverskilled = 1.37e-07 + 125.8276 * drunkdriverskilled
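As a rough check on the Region 1 decision, the t statistic can be recomputed from the reported coefficient and standard error. This is only a sketch: the numbers are copied from the Stata output, the critical value 2.131 is the one quoted on the slide, and the variable names are illustrative.

```python
# Values copied from the Region 1 regression output.
coef = 125.8276     # slope on reg1normdr~k
std_err = 19.78864  # standard error of that slope
t_crit = 2.131      # two-sided 5% critical value for 15 df, as quoted on the slide

# t statistic for the slope: estimate divided by its standard error.
t_stat = coef / std_err

# Two-sided decision rule: reject H0 when |t| exceeds the critical value.
reject_h0 = abs(t_stat) > t_crit

# With a single regressor, the overall F statistic equals t squared.
f_from_t = t_stat ** 2

print(f"t = {t_stat:.2f}, reject H0: {reject_h0}, t^2 = {f_from_t:.2f}")
```

The squared t statistic agrees with the F(1, 15) value in the same output, which is a quick way to confirm the table was read correctly.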

t-Test Accepting H0

      Source |       SS       df       MS              Number of obs =       4
-------------+------------------------------           F(  1,     2) =   10.11
       Model |  6.5849e-14     1  6.5849e-14           Prob > F      =  0.0863
    Residual |  1.3030e-14     2  6.5150e-15           R-squared     =  0.8348
-------------+------------------------------           Adj R-squared =  0.7522
       Total |  7.8879e-14     3  2.6293e-14           Root MSE      = 8.1e-08

------------------------------------------------------------------------------
reg5normki~d |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
reg5normdr~k |   90.53328   28.47673     3.18   0.086    -31.99218    213.0587
       _cons |   1.85e-07   5.11e-08     3.62   0.069    -3.50e-08    4.05e-07
------------------------------------------------------------------------------

Accept H0: |t| < t2, i.e. -4.303 < 3.18 < 4.303

Regression: driverskilled = 1.27e-07 + 110.3849 * drunkdriverskilled

F-Test
For each region:
  H0: β1 = β2 = 0
  H1: at least one βi ≠ 0 (i = 1, 2)
F-Test: α = 0.05, 95% confidence, 1-sided test

F-Test (#2) Rejecting H0

      Source |       SS       df       MS              Number of obs =      11
-------------+------------------------------           F(  2,     8) =    8.22
       Model |  7.0241e-13     2  3.5121e-13           Prob > F      =  0.0115
    Residual |  3.4191e-13     8  4.2739e-14           R-squared     =  0.6726
-------------+------------------------------           Adj R-squared =  0.5908
       Total |  1.0443e-12    10  1.0443e-13           Root MSE      = 2.1e-07

------------------------------------------------------------------------------
reg1normki~d |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
reg1normdr~k |   102.7064   39.55021     2.60   0.032     11.50342    193.9093
personskil~d |  -.0043674   .0144079    -0.30   0.770     -.037592    .0288572
       _cons |   2.69e-07   2.30e-07     1.17   0.275    -2.61e-07    7.99e-07
------------------------------------------------------------------------------

Reject H0: F > F0.05, 2, 8 = 4.46, i.e. 8.22 > 4.46
peoplekilled = 2.69e-07 + 102.7064 * drunkdrivers - .0043674 * agekilled

F-Test Accepting H0

      Source |       SS       df       MS              Number of obs =       8
-------------+------------------------------           F(  2,     5) =    3.64
       Model |  3.2398e-14     2  1.6199e-14           Prob > F      =  0.1059
    Residual |  2.2270e-14     5  4.4540e-15           R-squared     =  0.5926
-------------+------------------------------           Adj R-squared =  0.4297
       Total |  5.4668e-14     7  7.8097e-15           Root MSE      = 6.7e-08

------------------------------------------------------------------------------
reg3normki~d |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
reg3normdr~k |   394.0596   146.2681     2.69   0.043     18.06551    770.0537
   agekilled |  -.0011894   .0022405    -0.53   0.618    -.0069489    .0045701
       _cons |   1.38e-07   4.85e-08     2.85   0.036     1.36e-08    2.63e-07
------------------------------------------------------------------------------

Accept H0: F < F0.05, 2, 5 = 5.79, i.e. 3.64 < 5.79
driverskilled = 1.38e-07 + 394.0596 * drunkdrivers - .0011894 * agekilled
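The F statistics in the two outputs can be recomputed from the sums of squares and degrees of freedom in each table. The sketch below does that for Region 1 and Region 3, using the critical values 4.46 and 5.79 quoted on the slides; the function and variable names are illustrative only.

```python
def f_statistic(ss_model, df_model, ss_resid, df_resid):
    """Overall F: model mean square divided by residual mean square."""
    return (ss_model / df_model) / (ss_resid / df_resid)


# Sums of squares and degrees of freedom copied from the Stata tables.
f_region1 = f_statistic(7.0241e-13, 2, 3.4191e-13, 8)
f_region3 = f_statistic(3.2398e-14, 2, 2.2270e-14, 5)

# Critical values as quoted on the slides for (2, 8) and (2, 5) df.
reject_region1 = f_region1 > 4.46
reject_region3 = f_region3 > 5.79

print(f"Region 1: F = {f_region1:.2f}, reject H0: {reject_region1}")
print(f"Region 3: F = {f_region3:.2f}, reject H0: {reject_region3}")
```

Note that the comparison uses the F statistic from the top-right of each table, not the t statistic of an individual coefficient.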

Confidence Intervals (#3)
Confidence Interval of the mean for the National Data
National Mean of drivers killed: 2.96763E-07
Confidence Interval: (2.96763E-07 - 8.46641E-08, 2.96763E-07 + 8.46641E-08) = (2.12099E-07, 3.81428E-07)
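The interval arithmetic, and the "lies within the national CI" check applied to each region, can be sketched as follows. The national mean and margin are the figures from the slide; the two region means are hypothetical placeholders, since the regional values are not given in the transcript.

```python
# National figures copied from the slide.
national_mean = 2.96763e-07
margin = 8.46641e-08

# Interval bounds: mean minus and plus the margin of error.
lower = national_mean - margin
upper = national_mean + margin


def lies_within(region_mean, low=lower, high=upper):
    """True if a region's mean falls inside the national interval."""
    return low <= region_mean <= high


# Hypothetical region means, for illustration only.
example_regions = {"Region X": 2.5e-07, "Region Y": 4.0e-07}

print(f"National CI: ({lower:.5e}, {upper:.5e})")
for name, mean in example_regions.items():
    print(f"{name}: lies within national CI: {lies_within(mean)}")
```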

Region Results with Confidence Intervals
[Table with columns: Region, Mean, Lies within National CI. Values not preserved in the transcript.]

Graph of National Data

ANOVA Test
H0: national = reg 1 = reg 2 = ... = reg 6
The number of drivers killed in car accidents is independent of the region in which the accidents occur.
Reject H0 if F > F0.95, 3, 2 = 19.2
F = 7.7631 < 19.2, so accept H0

Conclusions
Nationally, 4 out of the 6 regions rejected the F-test null hypothesis => there is a correlation between age, BAC, and the number of drivers killed.
Regionally, 4 out of 6 supported the national data trend.
The regressions carried out confirm that the number of people killed depends on the number of drunk drivers.
Regions do not reflect the national trend for the average number of drivers killed.
The number of drivers killed does not depend on the region in which the accidents occur.