Part 24: Hypothesis Tests 24-1/33 Statistics and Data Analysis Professor William Greene Stern School of Business IOMS Department Department of Economics
Hypothesis Tests
  Hypothesis Tests in the Regression Model
  Tests of Independence of Random Variables

Application: Monet Paintings
Does the size of the painting really explain the sale prices of Monet's paintings?
Investigate: Compute the regression.
Hypothesis: The slope is actually zero.
Rejection region: Slope estimates that are very far from zero.
Result: The hypothesis that β = 0 is rejected.

Regression Analysis
Investigate: Is the coefficient in a regression model really nonzero?
Testing procedure:
  Model: y = α + βx + ε
  Hypothesis: H0: β = 0
  Rejection region: Least squares coefficient is far from zero.
  Test: α level for the test = 0.05 as usual.
    Compute t = b/StandardError.
    Reject H0 if |t| is above the critical value:
      1.96 if large sample,
      value from the t table if small sample.
    Reject H0 if the reported P value is less than the α level.
Degrees of freedom for the t statistic: N-2.

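The testing procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the course software: the data are hypothetical, the function name `slope_t_test` is made up for the example, and 2.447 is the t table value for 6 degrees of freedom at the 0.05 level (two sided).

```python
import math

def slope_t_test(x, y):
    """Least squares fit of y = a + b*x. Returns (b, standard error of b, t ratio)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                       # least squares slope
    a = y_bar - b * x_bar               # least squares constant
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)                  # residual variance, N-2 degrees of freedom
    se_b = math.sqrt(s2 / sxx)          # standard error of the slope
    return b, se_b, b / se_b

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.2, 8.9]

b, se_b, t = slope_t_test(x, y)
T_CRIT = 2.447  # t table, alpha = 0.05 two sided, N - 2 = 6 degrees of freedom
reject = abs(t) > T_CRIT
print(f"b = {b:.4f}, SE = {se_b:.4f}, t = {t:.2f}, reject H0: {reject}")
```

With only 8 observations the t table value is used instead of 1.96, which applies to large samples.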
An Equivalent Test
Is there a relationship?
H0: No correlation
Rejection region: Large R².
Test: F = [R²/1] / [(1 − R²)/(N − 2)]
Reject H0 if F > 4.
Math result: F = t².
Degrees of freedom for the F statistic are 1 and N-2.

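The result F = t² can be checked numerically. The sketch below assumes the usual simple regression form F = R²/[(1 − R²)/(N − 2)]. The data and the function name `t_and_f` are hypothetical.

```python
import math

def t_and_f(x, y):
    """Return (t ratio for the slope, R squared, F statistic) for y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    sse = syy - b * sxy                      # residual sum of squares
    t = b / math.sqrt(sse / (n - 2) / sxx)   # t = b / StandardError
    r2 = 1.0 - sse / syy
    f = (r2 / 1) / ((1.0 - r2) / (n - 2))    # 1 and N-2 degrees of freedom
    return t, r2, f

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [3.0, 1.5, 4.1, 3.2, 6.0, 4.4, 7.1, 5.9]

t, r2, f = t_and_f(x, y)
print(f"t = {t:.3f}, t^2 = {t * t:.3f}, F = {f:.3f}, R^2 = {r2:.3f}")
```

The two printed values t² and F agree up to rounding. This is also why the large sample rule "F > 4" matches "|t| > 1.96": 1.96² is about 3.84.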
Partial Effect
Hypothesis: If we include the signature effect, size does not explain the sale prices of Monet paintings.
Test: Compute the multiple regression; then H0: β1 = 0.
α level for the test = 0.05 as usual.
Rejection region: Large value of b1 (coefficient).
Test based on t = b1/StandardError.
[Regression output: Regression Analysis: ln (US$) versus ln (SurfaceArea), Signed. Coefficient table with columns Predictor, Coef, SE Coef, T, P and rows Constant, ln (SurfaceArea), Signed. R-Sq = 46.2%, R-Sq(adj) = 46.0%.]
Reject H0.
Degrees of freedom for the t statistic: N-3 = N - number of predictors - 1.

Testing The Regression
Degrees of freedom for the F statistic are K and N-K-1.

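For reference, the standard F statistic for testing the whole regression with K predictors and N observations can be written as follows. This is the textbook form and is assumed, not confirmed, to match the formula shown on the original slide.

```latex
F[K,\; N-K-1] \;=\; \frac{R^2 / K}{(1 - R^2)/(N - K - 1)}
```

With K = 1 the degrees of freedom are 1 and N-2, as in the simple regression test.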
n1 = Number of predictors
n2 = Sample size - number of predictors - 1

Cost Function Regression
The regression is significant. F is huge.
Which variables are significant? Which variables are not significant?

Application: Part of a Regression Model
Regression model includes variables x1, x2, … I am sure of these variables.
Maybe variables z1, z2, … I am not sure of these.
Model: y = α + β1 x1 + β2 x2 + δ1 z1 + δ2 z2 + ε
Hypothesis: δ1 = 0 and δ2 = 0.
Strategy: Start with the model including x1 and x2. Compute R². Compute the new model that also includes z1 and z2.
Rejection region: R² increases a lot.

Test Statistic

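A standard form of the statistic for this kind of test is shown below. The notation is an assumption made for this note: R_0² is the fit of the model with only the x variables, R_1² is the fit of the larger model that adds the z variables, J is the number of added variables, and K is the number of predictors in the larger model.

```latex
F[J,\; N-K-1] \;=\; \frac{\left(R_1^2 - R_0^2\right)/J}{\left(1 - R_1^2\right)/(N - K - 1)}
```

A large increase in R² relative to the unexplained variation that remains produces a large F.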
Gasoline Market

Gasoline Market
[Regression output: Regression Analysis: logG versus logIncome, logPG. Coefficient table with columns Predictor, Coef, SE Coef, T, P and rows Constant, logIncome, logPG. R-Sq = 93.6%, R-Sq(adj) = 93.4%. Analysis of Variance table with columns Source, DF, SS, MS, F, P and rows Regression, Residual Error, Total.]
R² = Regression SS / Total SS

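As a rough check, the overall F statistic for this regression can be recovered from the reported R-Sq. The sketch uses the standard form F = [R²/K] / [(1 − R²)/(N − K − 1)]. The sample size N = 52 is an inference, not a reported figure: it comes from the 46 denominator degrees of freedom quoted later for the five-predictor model. The function name `overall_f` is hypothetical.

```python
def overall_f(r2, n_obs, n_predictors):
    """F statistic for the whole regression, computed from R squared."""
    num_df = n_predictors
    den_df = n_obs - n_predictors - 1
    f = (r2 / num_df) / ((1.0 - r2) / den_df)
    return f, num_df, den_df

# R-Sq = 93.6% with K = 2 predictors (logIncome, logPG).
# N = 52 is inferred, not reported: 46 = N - 5 - 1 for the larger model.
f, num_df, den_df = overall_f(0.936, 52, 2)
print(f"F[{num_df}, {den_df}] = {f:.1f}")
```

Because R-Sq is rounded to three digits, the result only approximates the F value in the analysis of variance table.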
Gasoline Market
[Regression output: Regression Analysis: logG versus logIncome, logPG, logPNC, logPUC, logPPT. Coefficient table with columns Predictor, Coef, SE Coef, T, P and rows Constant, logIncome, logPG, logPNC, logPUC, logPPT. R-Sq = 96.0%, R-Sq(adj) = 95.6%. Analysis of Variance table with columns Source, DF, SS, MS, F, P and rows Regression, Residual Error, Total.]
Now, R-Sq = 96.0%. Previously, R-Sq = 93.6%.

Improvement in R²
[Critical value output: Inverse Cumulative Distribution Function, F distribution with 3 DF in numerator and 46 DF in denominator, P( X <= x ) = 0.95.]
The null hypothesis is rejected.
Notice that none of the three individual variables are significant, but the three of them together are.

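The size of the F statistic can be approximated from the two reported R-Sq values. The sketch below assumes the standard form F = [(R_1² − R_0²)/J] / [(1 − R_1²)/df], with J = 3 added variables and df = 46. The function name `r2_improvement_f` is hypothetical, and the critical value of about 2.81 is read from standard F tables by interpolation, not taken from the slides.

```python
def r2_improvement_f(r2_small, r2_big, n_added, den_df):
    """F statistic for the hypothesis that the added variables have zero coefficients."""
    return ((r2_big - r2_small) / n_added) / ((1.0 - r2_big) / den_df)

# Reported fits: 93.6% (logIncome, logPG) and 96.0% (adding logPNC, logPUC, logPPT).
f = r2_improvement_f(0.936, 0.960, n_added=3, den_df=46)

F_CRIT_APPROX = 2.81  # approximate 95th percentile of F[3, 46], from tables
print(f"F[3, 46] = {f:.2f}, reject H0: {f > F_CRIT_APPROX}")
```

The R-Sq values are rounded, so the computed F of roughly 9.2 is approximate, but it is well above the critical value.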
Application
Health satisfaction depends on many factors: Age, Income, Children, Education, Marital Status.
Do these factors figure differently in a model for women compared to one for men?
Investigation: Multiple regression
Null hypothesis: The regressions are the same.
Rejection region: Estimated regressions that are very different.

Equal Regressions
Setting: Two groups of observations (men/women, countries, two different periods, firms, etc.)
Regression model: y = α + β1 x1 + β2 x2 + … + ε
Hypothesis: The same model applies to both groups.
Rejection region: Large values of F

Procedure: Equal Regressions
There are N1 observations in Group 1 and N2 in Group 2.
There are K variables and the constant term in the model.
This test requires you to compute three regressions and retain the sum of squared residuals from each:
  SS1 = sum of squares from N1 observations in group 1
  SS2 = sum of squares from N2 observations in group 2
  SSALL = sum of squares from NALL = N1+N2 observations when the two groups are pooled.
The hypothesis of equal regressions is rejected if F is larger than the critical value from the F table (K+1 numerator and NALL-2K-2 denominator degrees of freedom; the numerator counts the constant as well as the K slopes).

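The computation can be sketched as follows, assuming the usual form of this test, F = [(SSALL − SS1 − SS2)/numerator df] / [(SS1 + SS2)/denominator df]. The numerator degrees of freedom here count all K+1 restricted coefficients (the constant plus K slopes), which is the usual convention for this test; treat that as an assumption of the sketch. The sample sizes are those of the health satisfaction application, but the three sums of squares are hypothetical placeholders, and the function name `equal_regressions_f` is made up for the example.

```python
def equal_regressions_f(ss1, ss2, ss_all, n1, n2, k):
    """F statistic for the hypothesis that one regression fits both groups.

    k is the number of variables, not counting the constant term.
    """
    n_params = k + 1                      # constant plus k slopes
    num_df = n_params                     # restrictions imposed by pooling
    den_df = n1 + n2 - 2 * n_params       # NALL - 2K - 2
    f = ((ss_all - ss1 - ss2) / num_df) / ((ss1 + ss2) / den_df)
    return f, num_df, den_df

# Sample sizes from the application; sums of squares are HYPOTHETICAL.
f, num_df, den_df = equal_regressions_f(
    ss1=60000.0, ss2=65000.0, ss_all=125500.0,
    n1=13083, n2=14243, k=5,
)
print(f"F[{num_df}, {den_df}] = {f:.2f}")
```

Pooling can never reduce the sum of squares, so SSALL − SS1 − SS2 is nonnegative and F measures how much fit is lost by forcing one model on both groups.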
Health Satisfaction Models: Men vs. Women
German survey data over 7 years, 1984 to 1991 (with a gap). 27,326 observations on Health Satisfaction and several covariates.
[Regression output: columns Variable, Coefficient, Standard Error, T, P value, Mean of X; rows Constant, AGE, EDUC, HHNINC, HHKIDS, MARRIED; reported separately for Women (NW = 13083), Men (NM = 14243), and Both (NALL = 27326).]

Computing the F Statistic
[Regression summary for Women, Men, and All: HEALTH mean and standard deviation, number of observations, number of parameters, degrees of freedom, residual sum of squares, standard error of e, R-squared, and model test F with P values (.000), (.000), (.0000).]

A Test of Independence
In the credit card example, are Own/Rent and Accept/Reject independent?
Hypothesis: Ownership and Acceptance are independent.
Formal hypothesis, based only on the laws of probability: Prob(Own,Accept) = Prob(Own)Prob(Accept) (and likewise for the other three possibilities).
Rejection region: Joint frequencies that do not look like the products of the marginal frequencies.

A Contingency Table Analysis

Independence Test
Step 2: Expected proportions assuming independence: If the factors are independent, then the joint proportions should equal the product of the marginal proportions.
  [Rent,Reject] = proportion(Rent) x proportion(Reject)
  [Rent,Accept] = proportion(Rent) x proportion(Accept)
  [Own,Reject] = proportion(Own) x proportion(Reject)
  [Own,Accept] = proportion(Own) x proportion(Accept)

Comparing Actual to Expected

When is Chi Squared Large?
For a 2x2 table, the critical chi squared value for α = 0.05 is 3.84. (Not a coincidence: 3.84 = 1.96².)
Our chi squared is large, so the hypothesis of independence between the acceptance decision and the own/rent status is rejected.

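The comparison of actual to expected frequencies can be sketched as below. The statistic is the usual sum over cells of (observed − expected)² / expected, with expected counts built from the products of the marginal proportions. The counts are hypothetical and are not the credit card data; the function name `chi_squared_independence` is made up for the example. The critical value is obtained as the square of the normal 97.5th percentile, which is valid only for one degree of freedom (a 2x2 table).

```python
from statistics import NormalDist

def chi_squared_independence(table):
    """Chi squared statistic and degrees of freedom for an R by C table of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: N x row proportion x column proportion.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# HYPOTHETICAL counts: rows = Rent, Own; columns = Reject, Accept.
counts = [[30, 20],
          [10, 40]]

chi2, df = chi_squared_independence(counts)
critical = NormalDist().inv_cdf(0.975) ** 2   # 1.96 squared, for df = 1 only
print(f"chi squared = {chi2:.2f}, df = {df}, critical = {critical:.2f}")
print("reject independence:", chi2 > critical)
```

For larger tables the critical value must come from the chi squared table with the appropriate degrees of freedom; the squared normal shortcut no longer applies.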
Computing the Critical Value
Calc > Probability Distributions > Chi-Square reports the critical value.
For an R by C table, D.F. = (R-1)(C-1).

Analyzing Default
Do renters default more often (at a different rate) than owners?
To investigate, we study the cardholders (only).
We have the raw observations in the data set.
[Cross-tabulation of DEFAULT (0, 1) and OWNRENT, with All totals.]

Hypothesis Test

Treatment Effects in Clinical Trials
Does Phenogyrabluthefentanoel (Zorgrab) work?
Investigate: Carry out a clinical trial.

                  Placebo   Drug Treatment
No Effect         N00       N0T
Positive Effect   N+0       N+T

N+0 = The placebo effect
N+T – N+0 = The treatment effect
Is N+T > N+0 (significantly)?

Confounding Effects

What About Confounding Effects?

             Normal Weight   Obese
Nonsmoker
Smoker

Age and Sex are usually relevant as well. How can all these factors be accounted for at the same time?