1 Power 14 Goodness of Fit & Contingency Tables
2 Outline
• I. Parting Shots on the Linear Probability Model
• II. Goodness of Fit & Chi Square
• III. Contingency Tables
3 The Vision Thing
• Discriminating Between Two Populations
• Decision Theory and the Regression Line
4 [Figure: income plotted against education, showing mean income and mean education for players and for non-players, with the discriminating line separating the two groups.]
5 Expected Costs of Misclassification
• E(C_MC) = C(n/p)*P(n/p)*P(p) + C(p/n)*P(p/n)*P(n), where P(n) = 23/100, roughly 1/4
• Suppose C(n/p) = C(p/n) = C
• Then E(C_MC) = C*P(n/p)*3/4 + C*P(p/n)*1/4
• And the two costs of misclassification will be balanced if P(p/n) = 3/4 = Bern
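The expected-cost formula can be checked numerically. The following is a minimal sketch, assuming equal costs C = 1 and the priors P(p) = 3/4 and P(n) = 1/4 used on the slide. The function name and the two error rates passed in are illustrative assumptions, not values from the lecture.

```python
def expected_cost(cost_np, cost_pn, p_n_given_p, p_p_given_n, prior_p):
    """Expected cost of misclassification.

    cost_np      -- C(n/p), cost of classifying a p case as n
    cost_pn      -- C(p/n), cost of classifying an n case as p
    p_n_given_p  -- P(n/p), probability of that first error
    p_p_given_n  -- P(p/n), probability of the second error
    prior_p      -- P(p); P(n) is taken to be 1 - P(p)
    """
    prior_n = 1.0 - prior_p
    term_p = cost_np * p_n_given_p * prior_p   # cost from misclassified p cases
    term_n = cost_pn * p_p_given_n * prior_n   # cost from misclassified n cases
    return term_p + term_n


# Hypothetical error rates, chosen only so that the two terms come out equal.
ecmc = expected_cost(1.0, 1.0, p_n_given_p=0.25, p_p_given_n=0.75, prior_p=0.75)
print(ecmc)
```

With these inputs each term equals 0.1875, which is the sense in which the two costs are balanced when P(p/n) = 3/4.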
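6 The Regression Line: Discriminant Function
• Bern = 3/4
• $\text{Bern} = c + b_1\,\text{educ} + b_2\,\text{income}$
• $3/4 = c + b_1\,\text{educ} + b_2\,\text{income}$, or
• $b_1\,\text{educ} = 3/4 - c - b_2\,\text{income}$
• $\text{educ} = (3/4 - c - b_2\,\text{income})/b_1$, the regression line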
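Solving the fitted equation for education at Bern = 3/4 can be sketched in a few lines. The intercept 1.39 is the one shown in the decision rule on the lottery slide. The two slope values below are placeholders (assumptions), since the estimated slopes are not legible in this copy, and the function name is hypothetical.

```python
def educ_on_boundary(income, c, b1, b2, threshold=0.75):
    """Education level at which c + b1*educ + b2*income equals the threshold."""
    if b1 == 0:
        raise ValueError("b1 must be nonzero to solve for educ")
    return (threshold - c - b2 * income) / b1


# Intercept from the slides; slopes are placeholder values, NOT the estimates.
c, b1, b2 = 1.39, -0.03, -0.005

income = 40.0  # income in $000, an arbitrary illustration
educ = educ_on_boundary(income, c, b1, b2)

# Plugging the solution back in should reproduce the threshold of 3/4.
bern = c + b1 * educ + b2 * income
print(educ, bern)
```

Points on one side of this line have fitted Bern above 3/4 and points on the other side have fitted Bern below it, which is what makes the line a decision rule.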
7 Lottery: Players and Non-Players vs. Education & Income
[Figure: income ($000) plotted against education (years); legend: non-players, players, mean of non-players, mean of players.]
Discriminant function or decision rule: Bern = ¾ = 1.39 – *education – *income
8 II. Goodness of Fit & Chi Square
• Rolling a Fair Die
• The Multinomial Distribution
• Experiment: 600 Tosses
9 The Expected Frequencies
10 The Expected Frequencies & Empirical Frequencies
11 Hypothesis Test
• Null $H_0$: distribution is multinomial
• Statistic: $\sum_i (O_i - E_i)^2/E_i$, i.e. observed minus expected, squared, divided by expected, summed over the six faces
• Set the Type I error at 5%, for example
• Distribution of the statistic is chi-square
One throw, side one comes up (multinomial distribution):
$P(n_1=1, n_2=0, n_3=0, n_4=0, n_5=0, n_6=0) = \frac{1!}{1!\,0!\,0!\,0!\,0!\,0!}(1/6)^1(1/6)^0(1/6)^0(1/6)^0(1/6)^0(1/6)^0 = 1/6$
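The one-throw multinomial probability, and the expected frequencies for the 600-toss experiment, can be reproduced with a short sketch. The function name is an assumption made for illustration; the fair-die probabilities and the 600 tosses come from the slides.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """P(n_1, ..., n_m) = n!/(n_1! ... n_m!) * p_1^n_1 * ... * p_m^n_m."""
    n = sum(counts)
    coef = factorial(n)
    for k in counts:
        coef //= factorial(k)  # every partial quotient is an integer
    return coef * prod(p ** k for p, k in zip(probs, counts))


fair_die = [1 / 6] * 6

# One throw, side one comes up.
one_throw = multinomial_pmf([1, 0, 0, 0, 0, 0], fair_die)
print(one_throw)

# Expected frequency of each face in 600 tosses of a fair die.
expected = [600 * p for p in fair_die]
print(expected)
```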
12
13 Chi Square: $\chi^2 = \sum_i (O_i - E_i)^2/E_i = 6.15$
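The statistic is a one-line sum once observed and expected counts are available. The sketch below uses hypothetical observed counts, because the lecture's table of 600 tosses is not reproduced here, so it does not return 6.15. The 5% critical value for 5 degrees of freedom, 11.07, is the standard table value.

```python
def chi_square_stat(observed, expected):
    """Sum over cells of (observed - expected)^2 / expected."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts summing to 600; NOT the lecture's data.
observed = [90, 110, 95, 105, 98, 102]
expected = [100] * 6  # fair die, 600 tosses

stat = chi_square_stat(observed, expected)

CRIT_5PCT_5DF = 11.07  # upper 5% point of chi-square with 5 degrees of freedom
print(stat, stat > CRIT_5PCT_5DF)

# The lecture's own statistic, compared against the same critical value.
lecture_stat = 6.15
print(lecture_stat > CRIT_5PCT_5DF)
```

Since 6.15 is below 11.07, the null hypothesis of a fair die is not rejected at the 5% level.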
14 Chi Square Density for 5 degrees of freedom [Figure: density with the upper-tail rejection region marked.]
15 Contingency Table Analysis
• Tests for Association vs. Independence for Qualitative Variables
16 Does Consumer Knowledge Affect Purchases? Frost Free Refrigerators Use More Electricity
17 Marginal Counts
18 Marginal Distributions, f(x) & f(y)
19 Joint Distribution Under Independence: f(x,y) = f(x)*f(y)
20 Expected Cell Frequencies Under Independence
21 Observed Cell Counts
22 Contribution to Chi Square: $(\text{observed} - \text{expected})^2/\text{expected}$
• Upper left cell: $(O - 324)^2/324 = 100/324 = 0.31$
• Chi Square = sum of the four cell contributions = 3.09
• Degrees of freedom: (m-1)*(n-1) = 1*1 = 1
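The steps from marginal counts to the chi-square statistic can be sketched as follows. The 2x2 counts are hypothetical, since the refrigerator table is not reproduced here, and the function names are assumptions made for illustration.

```python
def expected_under_independence(table):
    """Expected cell counts: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_table(table):
    """Return (chi-square statistic, degrees of freedom) for a two-way table."""
    expected = expected_under_independence(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical observed counts; NOT the lecture's data.
table = [[30, 20],
         [20, 30]]

stat, dof = chi_square_table(table)
print(expected_under_independence(table), stat, dof)
```

Dividing each expected count by the grand total gives the joint distribution under independence, f(x)*f(y), so the same function covers both steps.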
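23 Chi Square Density for 1 degree of freedom [Figure: density with the upper tail marked.] The upper 5% critical value is 3.84 (5.02 is the upper 2.5% value); the statistic 3.09 is below both, so independence is not rejected.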
24 Using Goodness of Fit to Choose Between Competing Probability Models
• Men on base when a home run is hit
25 Men on base when a home run is hit
26 Conjecture
• Distribution is binomial
27 Average # of men on base: sum of products = n*p = 0.63; with n = 3, p = 0.63/3 = 0.21
28 Using the binomial: k = men on base, n = # of trials
• $P(k=0) = \frac{3!}{0!\,3!}(0.21)^0(0.79)^3 = 0.493$
• $P(k=1) = \frac{3!}{1!\,2!}(0.21)^1(0.79)^2 = 0.393$
• $P(k=2) = \frac{3!}{2!\,1!}(0.21)^2(0.79)^1 = 0.105$
• $P(k=3) = \frac{3!}{3!\,0!}(0.21)^3(0.79)^0 = 0.009$
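These binomial probabilities are easy to reproduce. A minimal sketch, using n = 3 and p = 0.21 from the slides; the function name is an assumption.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(k successes in n trials) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 3, 0.21
binom_probs = [binomial_pmf(k, n, p) for k in range(n + 1)]

for k, prob in enumerate(binom_probs):
    print(k, round(prob, 3))

# The mean of the fitted distribution should match n*p = 0.63.
mean = sum(k * prob for k, prob in enumerate(binom_probs))
print(round(mean, 2))
```

Multiplying each probability by the total number of home runs in the sample gives the expected counts used in the goodness-of-fit comparison.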
29 Goodness of Fit
30 Chi Square, 3 degrees of freedom [Figure: density with the upper tail marked.] 5% critical value: 7.81
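31 Conjecture: Poisson, with mean $\lambda = np = 0.63$
• $P(k=0) = e^{-\lambda}\lambda^k/k! = e^{-0.63}(0.63)^0/0! = 0.533$
• $P(k=1) = e^{-\lambda}\lambda^k/k! = e^{-0.63}(0.63)^1/1! = 0.336$
• $P(k=2) = e^{-\lambda}\lambda^k/k! = e^{-0.63}(0.63)^2/2! = 0.106$
• $P(k=3) = 1 - P(k=2) - P(k=1) - P(k=0) = 0.026$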
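The Poisson alternative can be computed the same way. A minimal sketch, using the mean 0.63 from the slides and, as on the slide, assigning the k = 3 cell whatever probability is left after k = 0, 1, 2 (so it absorbs the Poisson tail beyond three). The function name is an assumption.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """P(k) = e^(-mean) * mean^k / k!."""
    return exp(-mean) * mean ** k / factorial(k)


mean = 0.63
pois_probs = [poisson_pmf(k, mean) for k in range(3)]

# Last cell is the remainder, so the four cells sum to one.
pois_probs.append(1.0 - sum(pois_probs))

for k, prob in enumerate(pois_probs):
    print(k, round(prob, 3))
```

Because only three runners can be on base, lumping the tail into the last cell keeps the number of cells the same as in the binomial fit, so the two chi-square statistics are comparable.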
32 Goodness of Fit