Time Series Analysis – Chapter 4 Hypothesis Testing Hypothesis testing is basic to the scientific method and statistical theory gives us a way of conducting tests of scientific hypotheses. Scientific philosophy today rests on the idea of falsification: For a theory to be a valid scientific theory it must be possible, at least in principle, to make observations that would prove the theory false. For example, here is a simple theory: All swans are white
Time Series Analysis – Chapter 4 Hypothesis Testing All swans are white This is a valid scientific theory because there is a way to falsify it: I can observe one black swan and the theory would fall. For more information on the history and philosophy of falsification I suggest reading Karl Popper.
Time Series Analysis – Chapter 4 Hypothesis Testing Besides the idea of falsification, we must keep in mind the other basic tenant of the scientific method: All evidence that supports a theory or falsifies it must be empirically based and reproducible.
In other words, data! Just holding a belief (no matter how firm) that a theory is true or false is not a justifiable stance. This chapter gives us the most basic statistical tools for taking data or empirical evidence and using it to substantiate or nullify (show to be false) a hypothesis.
All evidence that supports a theory or falsifies it must be empirically based and reproducible. I have just used the word hypothesis and this chapter is concerned with hypothesis testing, not theory testing. This is because theories are composed of many hypotheses and, usually, a theory is not directly supported or attacked but one or more of it’s supporting hypotheses are scrutinized.
Discrimination or Not Activity Null Hypothesis H o : No Discrimination Alternative Hypothesis H a : Discrimination How do we choose which hypothesis to support?
Discrimination or Not Activity Null Hypothesis H o : No Discrimination Alternative Hypothesis H a : Discrimination How do we choose which hypothesis to support?
The p-value p-value measures amount of support for alternative hypothesis. The smaller the p-value the more support for the alternative hypothesis. Typical level of support is 5% or 0.05
Fourth Graders Feet Data Set
Fourth Graders Feet Data Set – Minitab Output Predictor variable (x): Childs Age Response variable (y): Foot Length The regression equation is Foot Length = Childs Age Predictor Coef SE Coef T P Constant Childs Age
Fourth Graders Feet Data Set – Minitab Output
Fourth Graders Feet Data Set – One Tailed Alternative
Statistical vs. Practical Significance 401K data set Predictor variables x 1 : mrate x 2 : age x 3 : totemp Response variable (y): prate
Statistical vs. Practical Significance The regression equation is: prate = mrate age totemp Predictor Coef SE Coef T P Constant mrate age totemp
Statistical vs. Practical Significance The regression equation is: prate = mrate age totemp All predictors are statistically significant.
Statistical vs. Practical Significance The regression equation is: prate = mrate age totemp If total number of employees increases by ten thousand then participation rate decreases by *10,000 = 1.3% (other predictors held constant)
Boeing 747 Jet What does an empty Boeing 747 jet weigh?
Boeing 747 Jet What does an empty Boeing 747 jet weigh? My point estimate: 250,000 lbs Answer: 358,000 lbs I am wrong! A point estimate is almost always wrong!
Boeing 747 Jet What does an empty Boeing 747 jet weigh? My confidence interval estimate: (0, ∞) Answer: 358,000 lbs I am right! But, my interval is not useful!
Point and Interval Estimates – Minitab will compute both 401K data set Predictor variables x 1 : age Response variable (y): prate In Minitab go to Regression -> General Regression and select the correct model variables then click on the Results box and make sure the “Display confidence intervals” box is selected.
Point and Interval Estimates – Minitab will compute both 401K data set Predictor variables x 1 : age Response variable (y): prate Regression Equation prate = age Coefficients Term Coef SE Coef T P 95% CI Constant ( , ) age ( , )
Confidence Intervals
Where does come from? t distribution with n – k – 1 degrees of freedom where k is the number of predictors in the model. For our model, n = 1533 and k = 1 We also need to know the confidence level of the interval (typically 95%) Then, use a t table!
Testing Linear Combinations of Parameters TWOYEAR data set Predictor variables x 1 : jc – # years attending a two-year college x 2 : univ – # years attending a four-year college x 3 : exper – months in workforce Response variable (y): log(wage)
Testing Linear Combinations of Parameters
The ANOVA F Test
Multiple Linear Regression Assumptions
MLR Assumption 2: Data comes from a random sample
Multiple Linear Regression Assumptions MLR Assumption 3: None of the independent or predictor variables are perfectly correlated (if they were, Minitab would not run a regression analysis).
Multiple Linear Regression Assumptions MLR Assumption 4: The error, u, has an expected value of zero.
Multiple Linear Regression Assumptions MLR Assumption 5: The error, u, has the same variance given any values of the explanatory variables. This is the assumption of homoskedasticity.
Multiple Linear Regression Assumptions