1 Power 14 Goodness of Fit & Contingency Tables
2 Outline
• I. Projects
• II. Goodness of Fit & Chi Square
• III. Contingency Tables
3 Part I: Projects
• Teams
• Assignments
• Presentations
• Data Sources
• Grades
4 Team One
• Catherine Wohletz: Project choice
• Joshua Friedberg: Data Retrieval
• Julio Urenda: Statistical Analysis
• Daniel Grund: PowerPoint Presentation
• Takuro Hatanaka: Executive Summary
• Sylvia Salinas: Technical Appendix
5 Assignments
• 1. Project choice
• 2. Data Retrieval
• 3. Statistical Analysis
• 4. PowerPoint Presentation
• 5. Executive Summary
• 6. Technical Appendix
6 PowerPoint Presentations: Member 4
• 1. Introduction: Members 1, 2, 3
  – What
  – Why
  – How
• 2. Executive Summary: Member 5
• 3. Exploratory Data Analysis: Member 3
• 4. Descriptive Statistics: Member 3
• 5. Statistical Analysis: Member 3
• 6. Conclusions: Members 3 & 5
• 7. Technical Appendix: Table of Contents, Member 6
7 Executive Summary and Technical Appendix
9 Grades
10 Data Sources
• FRED: Federal Reserve Bank of St. Louis
  – Business/Fiscal
    · Index of Consumer Sentiment, Monthly (1952:11)
    · Light Weight Vehicle Sales, Auto and Light Truck, Monthly ( )
• Economagic
• U.S. Dept. of Commerce
  – Population
  – Economic Analysis
11 Data Sources (Cont.)
• Bureau of Labor Statistics
• California Dept. of Finance
12 II. Goodness of Fit & Chi Square
• Rolling a Fair Die
• The Multinomial Distribution
• Experiment: 600 Tosses
13 The Expected Frequencies
14 The Expected Frequencies & Empirical Frequencies [Chart: empirical frequency]
15 Hypothesis Test
• Null H_0: Distribution is Multinomial
• Statistic: $\sum_i (O_i - E_i)^2 / E_i$: observed minus expected, squared, divided by expected, summed over the six faces
• Set the Type I error at 5%, for example
• Distribution of the statistic is Chi Square
One throw, side one comes up: multinomial distribution
$P(n_1=1, n_2=0, n_3=0, n_4=0, n_5=0, n_6=0) = \frac{n!}{n_1!\,n_2!\,n_3!\,n_4!\,n_5!\,n_6!}\, p_1^{n_1} p_2^{n_2} p_3^{n_3} p_4^{n_4} p_5^{n_5} p_6^{n_6}$
$= \frac{1!}{1!\,0!\,0!\,0!\,0!\,0!}\, (1/6)^1 (1/6)^0 (1/6)^0 (1/6)^0 (1/6)^0 (1/6)^0 = 1/6$
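The one-throw calculation above can be checked numerically. The following is a minimal sketch, not part of the original slides; the function name `multinomial_pmf` and the variable names are illustrative choices.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """P(n_1, ..., n_k) = n!/(n_1! ... n_k!) * p_1^n_1 * ... * p_k^n_k."""
    n = sum(counts)
    coefficient = factorial(n)
    for c in counts:
        coefficient //= factorial(c)  # exact integer division at every step
    return coefficient * prod(p ** c for p, c in zip(probs, counts))


fair_die = [1 / 6] * 6

# One throw, side one comes up.
one_throw = multinomial_pmf([1, 0, 0, 0, 0, 0], fair_die)
print(one_throw)

# Six throws, each side comes up exactly once: 6!/6^6.
each_side_once = multinomial_pmf([1, 1, 1, 1, 1, 1], fair_die)
print(each_side_once)
```

The same function applies to the 600-toss experiment by passing the six observed counts.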
16 Chi Square: $\chi^2 = \sum_{i=1}^{6} (O_i - E_i)^2 / E_i = 6.15$
17 [Figure: Chi Square density for 5 degrees of freedom]
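The die test can be sketched in a few lines. The observed counts below are hypothetical (the counts from the 600-toss experiment are not preserved in this transcript), so the statistic will not equal 6.15. The value 11.07 is the standard upper 5% point of the Chi Square distribution with 5 degrees of freedom.

```python
def chi_square_statistic(observed, expected):
    """Sum over categories of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for faces 1..6; they sum to 600 tosses.
observed = [90, 110, 95, 105, 112, 88]
tosses = sum(observed)

# Under the null of a fair die every face is expected tosses/6 times.
expected = [tosses / 6] * 6

chi2 = chi_square_statistic(observed, expected)

# Upper 5% critical value, Chi Square with 6 - 1 = 5 degrees of freedom.
CRITICAL_5PCT_DF5 = 11.07
reject_fair_die = chi2 > CRITICAL_5PCT_DF5

print(f"chi-square = {chi2:.2f}, reject fair die at 5%: {reject_fair_die}")
```

The statistic reported on the slides, 6.15, is also below 11.07, so the fair-die hypothesis would not be rejected there either.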
18 Contingency Table Analysis
• Tests for Association vs. Independence for Qualitative Variables
19 Does Consumer Knowledge Affect Purchases? Frost Free Refrigerators Use More Electricity
20 Marginal Counts
21 Marginal Distributions, f(x) & f(y)
22 Joint Distribution Under Independence f(x,y) = f(x)*f(y)
23 Expected Cell Frequencies Under Independence
24 Observed Cell Counts
25 Contribution to Chi Square: (Observed - Expected)^2/Expected
Upper left cell: $(\text{observed} - 324)^2/324 = 100/324 = 0.31$
Chi Square = sum of the four cell contributions = 3.09
$(m-1)(n-1) = 1 \times 1 = 1$ degree of freedom
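26 [Figure: Chi Square density, 1 degree of freedom, with 5.02 marked as the critical value. For 1 degree of freedom, 5.02 is the upper 2.5% point and the upper 5% point is 3.84. The statistic 3.09 is below both.]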
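27 Conclusion
• The Chi Square statistic, 3.09, is below the critical value, so independence is not rejected: there is no evidence of an association between consumer knowledge about electricity use and consumer choice of a frost-free refrigerator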
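The steps of the contingency table analysis (marginal counts, expected cell frequencies under independence, cell contributions, degrees of freedom) can be sketched as follows. The 2x2 table is hypothetical because the refrigerator survey counts are not preserved in this transcript; the function names are illustrative.

```python
def expected_under_independence(table):
    """Expected cell counts n * f(x) * f(y) = row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return the Chi Square statistic and (m-1)*(n-1) degrees of freedom."""
    expected = expected_under_independence(table)
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical counts: rows = knows / does not know about electricity use,
# columns = bought frost-free / bought other.
observed_table = [[28, 22],
                  [22, 28]]

chi2, df = chi_square_independence(observed_table)

# Upper 5% critical value, Chi Square with 1 degree of freedom.
CRITICAL_5PCT_DF1 = 3.84
reject_independence = chi2 > CRITICAL_5PCT_DF1

print(f"chi-square = {chi2:.2f}, df = {df}, reject: {reject_independence}")
```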
28 Using Goodness of Fit to Choose Between Competing Probability Models
• Men on base when a home run is hit
29 Men on base when a home run is hit
30 Conjecture
• Distribution is binomial
31 Average # of men on base
Sum of products (men on base × relative frequency) = n*p = 0.63, so with n = 3 bases, p = 0.63/3 = 0.21
32 Using the binomial: k = men on base, n = # of trials
• $P(k=0) = \frac{3!}{0!\,3!}\,(0.21)^0 (0.79)^3 = 0.493$
• $P(k=1) = \frac{3!}{1!\,2!}\,(0.21)^1 (0.79)^2 = 0.393$
• $P(k=2) = \frac{3!}{2!\,1!}\,(0.21)^2 (0.79)^1 = 0.105$
• $P(k=3) = \frac{3!}{3!\,0!}\,(0.21)^3 (0.79)^0 = 0.009$
33 Assuming the binomial
• The probability of zero men on base is 0.493
• The total number of observations is 765
• So the expected number of observations for zero men on base is 0.493*765 = 377.1
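The binomial probabilities and expected counts can be reproduced from the figures on the slides (mean 0.63, three bases, 765 observations). This is a minimal sketch; the variable names are illustrative.

```python
from math import comb

mean_men_on_base = 0.63   # sum of products from the slides
n_bases = 3               # number of trials n
total_home_runs = 765     # total number of observations

# n*p = 0.63 with n = 3 gives p = 0.21.
p = mean_men_on_base / n_bases

# P(k) = C(n, k) * p^k * (1 - p)^(n - k) for k = 0, 1, 2, 3.
binomial_probs = [
    comb(n_bases, k) * p ** k * (1 - p) ** (n_bases - k)
    for k in range(n_bases + 1)
]

# Expected number of observations in each category.
binomial_expected = [total_home_runs * prob for prob in binomial_probs]

for k, (prob, count) in enumerate(zip(binomial_probs, binomial_expected)):
    print(f"k={k}: P={prob:.3f}, expected={count:.1f}")
```

Computed without intermediate rounding, the expected count for zero men on base is slightly above the 377.1 obtained from the rounded probability 0.493.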
34 Goodness of Fit
35 [Figure: Chi Square density, 3 degrees of freedom; 5% critical value 7.81]
36 Conjecture: Poisson where np = 0.63
• $P(k) = e^{-\mu} \mu^k / k!$ with $\mu = np = 0.63$
• $P(k=0) = e^{-0.63} (0.63)^0 / 0! = 0.533$
• $P(k=1) = e^{-0.63} (0.63)^1 / 1! = 0.336$
• $P(k=2) = e^{-0.63} (0.63)^2 / 2! = 0.106$
• $P(k=3) = 1 - P(k=2) - P(k=1) - P(k=0)$
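The Poisson probabilities can be computed the same way. This sketch follows the slide in treating the last category as the remainder, 1 minus the first three probabilities; the variable names are illustrative.

```python
from math import exp, factorial

mu = 0.63               # Poisson mean, set equal to n*p from the slides
total_home_runs = 765   # total number of observations

# P(k) = e^(-mu) * mu^k / k! for k = 0, 1, 2.
poisson_probs = [exp(-mu) * mu ** k / factorial(k) for k in range(3)]

# Last category as the remainder, as on the slide.
poisson_probs.append(1 - sum(poisson_probs))

poisson_expected = [total_home_runs * prob for prob in poisson_probs]

for k, (prob, count) in enumerate(zip(poisson_probs, poisson_expected)):
    print(f"k={k}: P={prob:.3f}, expected={count:.1f}")
```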
39 Goodness of Fit
40 [Figure: Chi Square density, 3 degrees of freedom; 5% critical value 7.81]
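Choosing between the two conjectures comes down to computing the goodness-of-fit statistic for each model and comparing both with 7.81. The observed counts below are hypothetical, chosen only so that they total 765 and have a mean close to 0.63; the counts from the slides are not preserved in this transcript.

```python
from math import comb, exp, factorial


def chi_square_statistic(observed, expected):
    """Sum over categories of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts of home runs with 0, 1, 2, 3 men on base.
observed = [421, 227, 96, 21]
total = sum(observed)                                    # 765
mean = sum(k * c for k, c in enumerate(observed)) / total  # about 0.63

# Binomial conjecture: n = 3 trials, p = mean / 3.
p = mean / 3
binomial_probs = [comb(3, k) * p ** k * (1 - p) ** (3 - k) for k in range(4)]

# Poisson conjecture: same mean, last category taken as the remainder.
poisson_probs = [exp(-mean) * mean ** k / factorial(k) for k in range(3)]
poisson_probs.append(1 - sum(poisson_probs))

chi2_binomial = chi_square_statistic(observed, [total * q for q in binomial_probs])
chi2_poisson = chi_square_statistic(observed, [total * q for q in poisson_probs])

CRITICAL_5PCT_DF3 = 7.81  # critical value used on the slides

print(f"binomial: {chi2_binomial:.2f}, reject: {chi2_binomial > CRITICAL_5PCT_DF3}")
print(f"Poisson:  {chi2_poisson:.2f}, reject: {chi2_poisson > CRITICAL_5PCT_DF3}")
```

The slides compare both statistics with the critical value for 3 degrees of freedom. Because the mean is estimated from the same data, a stricter test would use one fewer degree of freedom.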
41 Likelihood Functions
• Review OLS Likelihood
• Proceed in a similar fashion for the probit
42 Likelihood function
• The joint density of the estimated residuals can be written as:
• If the sample of observations on the dependent variable, y, and the independent variable, x, is random, then the observations are independent of one another. If the errors are also identically distributed with density f, i.e. i.i.d., then
43 Likelihood function
• Continued: if i.i.d., then
• If the residuals are normally distributed:
• This is one of the assumptions of linear regression: errors are i.i.d. normal
• Then the joint distribution or likelihood function, L, can be written as:
44 Likelihood function
• And taking natural logarithms of both sides, where the logarithm is a monotonically increasing function so that if lnL is maximized, so is L:
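The equations for these slides are not preserved in the transcript. The standard derivation they describe can be sketched as follows, assuming the model $y_i = a + b x_i + e_i$ with residuals $\hat{e}_i = y_i - \hat{a} - \hat{b} x_i$ and errors that are i.i.d. normal with mean zero and variance $\sigma^2$ (the symbols $\sigma^2$ and $n$ are assumptions, since the original notation is lost).

```latex
\begin{align*}
% Joint density under independence and identical distribution
f(\hat{e}_1, \hat{e}_2, \ldots, \hat{e}_n)
  &= f(\hat{e}_1)\, f(\hat{e}_2) \cdots f(\hat{e}_n)
   = \prod_{i=1}^{n} f(\hat{e}_i) \\
% Normal density for one residual
f(\hat{e}_i)
  &= \frac{1}{\sigma\sqrt{2\pi}}
     \exp\!\left( -\frac{\hat{e}_i^{\,2}}{2\sigma^2} \right) \\
% Likelihood function
L &= \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}}
     \exp\!\left( -\frac{(y_i - \hat{a} - \hat{b} x_i)^2}{2\sigma^2} \right) \\
% Log likelihood
\ln L &= -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2
         - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \hat{a} - \hat{b} x_i)^2
\end{align*}
```

Only the last term depends on $\hat{a}$ and $\hat{b}$, and it enters with a negative sign, so maximizing $\ln L$ is the same as minimizing the sum of squared residuals.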
45 Log-Likelihood
• Setting the derivatives of lnL with respect to a-hat and b-hat equal to zero yields the same estimators for the parameters a and b as ordinary least squares, except that the errors are now assumed to be normally distributed.
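A small numerical check illustrates the claim. The sketch below uses hypothetical data and fixes the error standard deviation at 1 (an assumption made only to keep the example short). It computes the closed-form least squares estimates and confirms that the normal log likelihood is lower at nearby parameter values.

```python
from math import log, pi

# Hypothetical sample.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)


def log_likelihood(a, b, sigma=1.0):
    """Normal log likelihood of y_i = a + b*x_i + e_i with e_i ~ N(0, sigma^2)."""
    sum_sq = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return -0.5 * n * log(2 * pi) - n * log(sigma) - sum_sq / (2 * sigma ** 2)


# Ordinary least squares estimates in closed form.
x_bar = sum(x) / n
y_bar = sum(y) / n
b_hat = (
    sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    / sum((xi - x_bar) ** 2 for xi in x)
)
a_hat = y_bar - b_hat * x_bar

best = log_likelihood(a_hat, b_hat)
print(f"a_hat = {a_hat:.3f}, b_hat = {b_hat:.3f}, lnL = {best:.3f}")

# Moving either parameter away from the OLS value lowers lnL.
for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    print(da, db, log_likelihood(a_hat + da, b_hat + db) < best)
```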
46 Probit
• Example: expenditures on lottery as a % of household income
• lottery_i = a + b*income_i + e_i
• If lottery_i > 0, i.e. a + b*income_i + e_i > 0, then Bern_i, the yes-no indicator variable, is equal to one and e_i > -a - b*income_i
• This determines a threshold for observation i in the distribution of the error e_i
• assume
47-49 [Figure: density of the error e_i with the threshold for observation i marked. The area above the threshold is the probability of playing the lottery for observation i, P yes; the remaining area is P no for observation i.]
50 Probit
• Likelihood function for the observed sample
• Log likelihood:
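The equations for this slide are not preserved in the transcript. A standard way to write them, assuming independent observations and writing $P_{yes,i}$ and $P_{no,i} = 1 - P_{yes,i}$ for the two probabilities of observation $i$ (the subscript notation is an assumption), is:

```latex
\begin{align*}
% Each observation contributes P_yes if Bern_i = 1 and P_no if Bern_i = 0
L &= \prod_{i=1}^{n} P_{yes,i}^{\,Bern_i}\; P_{no,i}^{\,1 - Bern_i} \\
\ln L &= \sum_{i=1}^{n} \left[ Bern_i \ln P_{yes,i}
          + (1 - Bern_i) \ln P_{no,i} \right]
\end{align*}
```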
53 Probit
• Substituting these expressions for P no and P yes in the ln Likelihood function gives the complete expression.
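The substitution can be sketched numerically. This example assumes the errors e_i are standard normal (mean zero, variance one), which is the usual probit normalization but is not confirmed by the surviving slide text. P no is then the normal CDF evaluated at the threshold -a - b*income_i, and P yes is the area above it. The data and parameter values are hypothetical.

```python
from math import erf, log, sqrt


def standard_normal_cdf(z):
    """Area under the standard normal density to the left of z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def probit_log_likelihood(a, b, incomes, plays):
    """lnL = sum of Bern_i*ln(P_yes_i) + (1 - Bern_i)*ln(P_no_i)."""
    total = 0.0
    for income, bern in zip(incomes, plays):
        threshold = -a - b * income             # e_i must exceed this to play
        p_no = standard_normal_cdf(threshold)   # area below the threshold
        p_yes = 1.0 - p_no                      # area above the threshold
        total += bern * log(p_yes) + (1 - bern) * log(p_no)
    return total


# Hypothetical sample: income in thousands, 1 = plays the lottery.
incomes = [10, 20, 30, 40, 50]
plays = [1, 1, 0, 1, 0]

ll_null = probit_log_likelihood(0.0, 0.0, incomes, plays)
ll_trial = probit_log_likelihood(0.5, -0.02, incomes, plays)
print(f"lnL at a=0, b=0: {ll_null:.4f}")
print(f"lnL at a=0.5, b=-0.02: {ll_trial:.4f}")
```

In practice the estimates of a and b are the values that maximize this function, found with a numerical optimizer.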