Statistical Analysis. Professor Lynne Stokes, Department of Statistical Science. Lecture #2: Chi-square Tests for Homogeneity, Chi-square Goodness of Fit Test,
Chi-square Tests: tests for independence in contingency tables; tests for homogeneity.
Binomial Samples (Product Binomial Sampling). Genetic theory: $H_0: p_W = 0.5$ vs. $H_a: p_W \ne 0.5$. Assumptions: 8 samples with mutually independent counts.
Hypothesis #1: Is $p_W = 0.5$? Binomial inference on $p$; equivalently, an overall goodness-of-fit test (known $p$).
Hypothesis #2: Are all the $p_W$ equal? Test for homogeneity (equal but unknown $p$).
Hypothesis #3: Is each $p_W = 0.5$? Goodness-of-fit test (8 samples, known $p$).
Test of Homogeneity of k Binomial Samples, Specified p. $H_0: p_1 = p_2 = \cdots = p_8 = 0.5$ vs. $H_a: p_j \ne 0.5$ for some $j$. Does not assume homogeneity (see below). $X^2 = 22.96$, df = 8, p = 0.003.
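A minimal Python sketch of the specified-p statistic, assuming k independent binomial samples given as success counts and sample sizes (the names `specified_p_test` and `chi2_sf` are hypothetical). Each sample contributes a success cell and a failure cell; together they simplify to $(O_j - n_j p_0)^2 / [n_j p_0 (1 - p_0)]$, and with nothing estimated, df = k. The upper-tail probability uses the recurrence $Q(\nu + 2, x) = Q(\nu, x) + (x/2)^{\nu/2} e^{-x/2} / \Gamma(\nu/2 + 1)$ so that no statistics library is needed.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for X ~ chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        q, nu = math.exp(-x / 2), 2              # Q(2, x) = exp(-x/2)
    else:
        q, nu = math.erfc(math.sqrt(x / 2)), 1   # Q(1, x) = P(|Z| > sqrt(x))
    while nu < df:
        # Q(nu + 2, x) = Q(nu, x) + (x/2)^(nu/2) * exp(-x/2) / Gamma(nu/2 + 1)
        q += math.exp((nu / 2) * math.log(x / 2) - x / 2 - math.lgamma(nu / 2 + 1))
        nu += 2
    return q


def specified_p_test(successes, sizes, p0):
    """Pearson X^2 for H0: p_1 = ... = p_k = p0 across k independent binomial samples."""
    x2 = sum((o - n * p0) ** 2 / (n * p0 * (1 - p0)) for o, n in zip(successes, sizes))
    df = len(sizes)  # nothing estimated: one df per independent sample
    return x2, df, chi2_sf(x2, df)


# Hypothetical counts for illustration only (not the lecture's data).
x2, df, p = specified_p_test([6, 4], [10, 10], 0.5)
print(round(x2, 3), df, round(p, 3))

# Tail probability for the slide's statistic, X^2 = 22.96 with df = 8.
print(round(chi2_sf(22.96, 8), 3))
```

The second print reproduces the reported p = 0.003 from the slide's $X^2$ and df.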
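Test of Homogeneity of k Binomial Samples: Unspecified p. $H_0: p_1 = p_2 = \cdots = p_8$ vs. $H_a: p_j \ne p_k$ for some $(j, k)$. $X^2 = 20.43$, df = 7, p = 0.005. Note: only one of each pair of expected values is independently estimated (k = 8, not 16), and one more degree of freedom is lost to estimating the common p, so df = 8 − 1 = 7.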
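A minimal Python sketch of the unspecified-p version, assuming the k samples are given as success counts and sample sizes (`homogeneity_test` is a hypothetical name). The common p is estimated by pooling, $\hat p = \sum_j O_j / \sum_j n_j$, each sample contributes $(O_j - n_j \hat p)^2 / [n_j \hat p (1 - \hat p)]$, and one df is lost to the estimate.

```python
def homogeneity_test(successes, sizes):
    """Pearson X^2 for H0: p_1 = ... = p_k with the common p estimated by pooling."""
    p_hat = sum(successes) / sum(sizes)              # pooled estimate of the common p
    x2 = sum((o - n * p_hat) ** 2 / (n * p_hat * (1 - p_hat))
             for o, n in zip(successes, sizes))
    df = len(sizes) - 1                              # k independent cells, minus 1 for p_hat
    return x2, df, p_hat


# Hypothetical counts for illustration only (not the lecture's data).
x2, df, p_hat = homogeneity_test([6, 3, 9], [10, 12, 11])
print(round(x2, 3), df, round(p_hat, 3))
```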
Chi-square Tests: tests for independence in contingency tables; tests for homogeneity; goodness-of-fit tests.
Chi-square Goodness of Fit Test: Specified Probabilities.
Assumptions: n independent observations; k mutually exclusive possible outcomes; $p_j = \Pr(\text{outcome } j)$ is the same on every trial.
Sample size condition: all $np_j \ge 1$, and at least 80% of the $np_j \ge 5$.
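The sample-size condition can be checked before running the test. A minimal sketch; `sample_size_ok` is a hypothetical helper, and it reads "at least 80% of the $np_j \ge 5$" as a share of the k outcome categories.

```python
def sample_size_ok(n, probs):
    """Check that every n*p_j >= 1 and at least 80% of the n*p_j are >= 5."""
    expected = [n * p for p in probs]
    all_at_least_one = all(e >= 1 for e in expected)
    share_at_least_five = sum(e >= 5 for e in expected) / len(expected)
    return all_at_least_one and share_at_least_five >= 0.8


print(sample_size_ok(32, [0.25] * 4))  # True: every expected count is 8
print(sample_size_ok(10, [0.25] * 4))  # False: every expected count is 2.5
```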
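Goodness of Fit Test: Specified Probabilities. Sample size: $n$. Observed count for outcome $j$: $O_j$. Expected count for outcome $j$: $E_j = np_j$.
$H_0: \Pr(\text{outcome } j) = p_j$ for $j = 1, \dots, k$; $H_a: \Pr(\text{outcome } j) \ne p_j$ for at least one $j$.
Test statistic: $X^2 = \sum_{j=1}^{k} (O_j - E_j)^2 / E_j$. Reject $H_0$ if $X^2 > \chi^2_\alpha$, where $\chi^2_\alpha$ is the upper-$\alpha$ critical value of the chi-square distribution with df = k − 1.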
Sufficient Evidence of Cognitive Learning
Path chosen        A   B   C   D   Total
Number of rats     4   5   8  15      32
Expected number    8   8   8   8      32
Sufficient evidence of cognitive learning? Under $H_0$ (all 4 doors equally likely), $E_j = 32/4 = 8$ and $X^2 = 2.0 + 1.125 + 0 + 6.125 = 9.25$ with df = 3, giving p = 0.026. Using a significance level of $\alpha = 0.05$, there is sufficient evidence (p = 0.026) to reject the hypothesis that rats choose the 4 doors with equal probability.
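A minimal Python sketch of the Pearson goodness-of-fit statistic with df = k − 1, applied to the rat counts in the table. The helper names are hypothetical; the chi-square upper tail is computed with the recurrence $Q(\nu + 2, x) = Q(\nu, x) + (x/2)^{\nu/2} e^{-x/2} / \Gamma(\nu/2 + 1)$ in place of a statistics library.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for X ~ chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        q, nu = math.exp(-x / 2), 2              # Q(2, x) = exp(-x/2)
    else:
        q, nu = math.erfc(math.sqrt(x / 2)), 1   # Q(1, x) = P(|Z| > sqrt(x))
    while nu < df:
        q += math.exp((nu / 2) * math.log(x / 2) - x / 2 - math.lgamma(nu / 2 + 1))
        nu += 2
    return q


def goodness_of_fit(observed, probs):
    """Pearson X^2 for H0: Pr(outcome j) = p_j, j = 1..k (single sample of size n)."""
    n = sum(observed)
    expected = [n * p for p in probs]            # E_j = n p_j
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                       # counts are constrained to sum to n
    return x2, df, chi2_sf(x2, df)


# Rat data: doors A, B, C, D; H0: each door chosen with probability 1/4.
x2, df, p = goodness_of_fit([4, 5, 8, 15], [0.25] * 4)
print(x2, df, round(p, 3))  # 9.25 3 0.026
```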
Mendelian Inheritance. Do the genotypes of a cross-breeding occur in the ratio 9:3:3:1? With 4 classes, df = 3: reject $H_0$ if $X^2 > \chi^2_{0.05}(3) = 7.815$ ($\alpha = 0.05$).
Mendelian Inheritance. Contributions $(O_j - E_j)^2 / E_j$ of the four classes: 0.25, 0.08, 1.33, 1.00.
$X^2 = 0.25 + 0.08 + 1.33 + 1.00 = 2.66$, well below the critical value 7.815. There is insufficient evidence (p > 0.10) at a significance level of 0.05 to conclude that the genotypes from this type of cross-breeding occur in proportions that differ from those predicted by Mendelian inheritance theory.
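The Mendelian figures can be checked numerically. A minimal sketch: the 9:3:3:1 ratio gives class probabilities 9/16, 3/16, 3/16, 1/16 and df = 4 − 1 = 3, and the upper tail of the slide's $X^2 = 2.66$ is computed with the same even/odd-df chi-square recurrence (the helper name `chi2_sf` is hypothetical).

```python
import math


def chi2_sf(x, df):
    """P(X > x) for X ~ chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        q, nu = math.exp(-x / 2), 2
    else:
        q, nu = math.erfc(math.sqrt(x / 2)), 1
    while nu < df:
        q += math.exp((nu / 2) * math.log(x / 2) - x / 2 - math.lgamma(nu / 2 + 1))
        nu += 2
    return q


ratio = [9, 3, 3, 1]
probs = [r / sum(ratio) for r in ratio]   # 9/16, 3/16, 3/16, 1/16
df = len(probs) - 1                       # 3

p_value = chi2_sf(0.25 + 0.08 + 1.33 + 1.00, df)   # X^2 = 2.66 from the slide
print(round(p_value, 3))                            # about 0.447, so p > 0.10
print(round(chi2_sf(7.815, df), 3))                 # about 0.05: 7.815 is the alpha = 0.05 cutoff
```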
Chi-Square Goodness of Fit Test: Unknown Parameters
1. Estimate the parameters of the distribution.
2. Divide the range of data values into mutually exclusive and exhaustive classes. Discrete data: often use the values themselves. Continuous data: use $k = n^{1/2}$ or $k = \log(n)$ classes.
3. Estimate the probability of being in each class.
4. Compare the observed counts ($O_i$) in each class with the estimated expected counts ($E_i$).
Chi-Square Goodness of Fit Test for the Poisson Distribution. Data: number of senders (automated telephone equipment) in use at a given time; 23 – 1 = 22 categories. $H_0$: number ~ Poisson; $H_a$: number not Poisson. Reject $H_0$ if $X^2 > \chi^2_{0.05}(20) = 31.4$, with df = 22 – 1 (mutually exclusive and exhaustive categories) – 1 (estimated parameter, the Poisson mean) = 20.
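A minimal sketch of the Poisson version, assuming the raw observations are available as non-negative integers and that values at or above a chosen cutoff `top` are pooled into one open-ended last class (both assumptions; the telephone data and the lecture's pooling are not reproduced here). The mean is the one estimated parameter, so df = k − 1 − 1. Strictly, the k − r − 1 rule assumes the parameter is estimated from the grouped counts; using the raw-data mean, as here, is a common practical shortcut.

```python
import math


def poisson_gof(data, top):
    """Pearson X^2 for H0: data ~ Poisson, classes 0, 1, ..., top-1 and '>= top'."""
    n = len(data)
    lam = sum(data) / n                          # estimated Poisson mean
    observed = [0] * (top + 1)
    for x in data:
        observed[min(x, top)] += 1               # values >= top go to the last class
    probs = [math.exp(-lam) * lam ** j / math.factorial(j) for j in range(top)]
    probs.append(1.0 - sum(probs))               # open-ended last class
    expected = [n * p for p in probs]
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - 1                   # k - 1 - (1 estimated parameter)
    return x2, df, expected


# Hypothetical data for illustration only (not the telephone-sender counts).
data = [0, 1, 1, 2, 2, 2, 3, 3, 4, 6] * 5
x2, df, expected = poisson_gof(data, top=4)
print(round(x2, 2), df, [round(e, 2) for e in expected])
```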
Chi-Square Goodness of Fit Test for the Normal Distribution
1. Divide the data into mutually exclusive and exhaustive (contiguous) classes. The first and last classes are open-ended: $(-\infty, U_1), (L_2, U_2), (L_3, U_3), \dots, (L_k, \infty)$, with $L_j = U_{j-1}$.
2. Estimate the mean and standard deviation.
3. Calculate z-scores for the limits of each class.
4. Estimate the probability content of each class: $p_j = \Pr(z_{L_j} < Z < z_{U_j})$.
5. Estimate the expected frequency of each class: $E_j = np_j$.
With two estimated parameters, df = k – 2 – 1 = k – 3.
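A minimal sketch of these steps, assuming user-supplied interior cut points $U_1 < \dots < U_{k-1}$ and the sample standard deviation (n − 1 divisor) as the estimate of σ; the slide does not say which divisor is used. Class probabilities come from $\Phi(z) = [1 + \operatorname{erf}(z/\sqrt{2})]/2$. The ten data values are hypothetical and far too few to meet the sample-size condition; they only exercise the code.

```python
import math
import statistics


def std_normal_cdf(z):
    """Phi(z), the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_gof(data, cut_points):
    """Classes (-inf, U_1], (U_1, U_2], ..., (U_{k-1}, inf) from increasing cut points."""
    n = len(data)
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)                  # assumption: n - 1 divisor
    # Probability content of each class from the z-scores of its limits.
    cdf = [0.0] + [std_normal_cdf((u - mean) / sd) for u in cut_points] + [1.0]
    probs = [hi - lo for lo, hi in zip(cdf, cdf[1:])]
    expected = [n * p for p in probs]            # E_j = n p_j
    observed = [0] * len(probs)
    for x in data:
        observed[sum(x > u for u in cut_points)] += 1   # index of the class holding x
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(probs) - 2 - 1                      # k - r - 1 with r = 2 (mean, sd)
    return observed, expected, x2, df


# Hypothetical data for illustration only.
data = [1.2, 2.1, 2.7, 3.0, 3.3, 3.8, 4.1, 4.9, 5.6, 2.9]
observed, expected, x2, df = normal_gof(data, [2.5, 3.5, 4.5])
print(observed, [round(e, 2) for e in expected], round(x2, 2), df)
```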
Chi-Square Goodness of Fit Test
Can be applied to any discrete or continuous probability distribution; only the class probabilities need be specified: $E_i = np_i$.
Asymptotic chi-square distribution: requires all $E_i > 1$ and at least 80% of the $E_i > 5$.
Does not have the highest power for specific distributions against specific alternatives.
Degrees of freedom (k classes):
If each class represents an independent sample (i.e., k replicate samples) and all parameters are known (i.e., known probabilities), df = k.
If the classes represent mutually exclusive and exhaustive categories (i.e., expected frequencies must sum to n), the data are independent and from a single sample. With all parameters known, df = k – 1; with r parameters estimated, df = k – r – 1. E.g., $(n - 1)s^2/\sigma^2 \sim \chi^2(n - 1)$: estimating the mean costs one degree of freedom.
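The loss of one degree of freedom per estimated parameter can be seen in a small simulation. A minimal sketch, assuming normal samples of size n with known μ and σ: squared deviations about the true μ, divided by σ², average about n (the mean of $\chi^2(n)$), while deviations about $\bar x$ average about n − 1, matching $(n-1)s^2/\sigma^2 \sim \chi^2(n-1)$. The function name, seed, and sizes are arbitrary choices.

```python
import random
import statistics


def mean_scaled_ss(n, reps, mu=0.0, sigma=1.0, seed=1):
    """Average scaled sums of squares about mu (known) and about x-bar (estimated)."""
    rng = random.Random(seed)
    about_mu, about_xbar = [], []
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(xs)
        about_mu.append(sum((x - mu) ** 2 for x in xs) / sigma ** 2)      # ~ chi2(n)
        about_xbar.append(sum((x - xbar) ** 2 for x in xs) / sigma ** 2)  # ~ chi2(n - 1)
    return statistics.fmean(about_mu), statistics.fmean(about_xbar)


known, estimated = mean_scaled_ss(n=5, reps=20000)
print(round(known, 2), round(estimated, 2))  # close to 5 and 4
```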
Goodness of Fit to the Binomial, Known p: normal theory approximation; chi-square tests.
Binomial Sample, Specified p: Normal Theory Approximation. Genetic theory: $H_0: p_W = 0.5$ vs. $H_a: p_W \ne 0.5$. Greater power by combining the samples (assuming homogeneity): p = 0.110.
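A minimal sketch of the normal-theory test, assuming the samples have been combined into x successes out of n. The counts below (58 of 100) are hypothetical, not the lecture's data; they are chosen only because they give z = 1.6, whose two-sided p-value $2[1 - \Phi(|z|)] = \operatorname{erfc}(|z|/\sqrt{2})$ is about 0.110.

```python
import math


def binomial_z_test(x, n, p0):
    """Normal-theory test of H0: p = p0 from x successes in n trials (two-sided)."""
    z = (x - n * p0) / math.sqrt(n * p0 * (1 - p0))
    p_value = math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical pooled counts for illustration only.
z, p = binomial_z_test(58, 100, 0.5)
print(round(z, 2), round(p, 3))
```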
Alternative to the Binomial Test: Chi-square Goodness of Fit, Specified p. Genetic theory: $H_0: p_W = 0.5$ vs. $H_a: p_W \ne 0.5$. p = 0.110, the same as the normal-theory test, because with two cells $X^2 = z^2$.
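With two outcomes, the chi-square goodness-of-fit statistic is the square of the normal-theory z, so both tests give the same two-sided p-value. A minimal sketch using hypothetical counts (58 successes out of 100, not the lecture's data); the 1-df upper tail is $\Pr(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$.

```python
import math


def two_cell_chi_square(x, n, p0):
    """Chi-square goodness of fit with two cells (successes, failures) and specified p0."""
    observed = [x, n - x]
    expected = [n * p0, n * (1 - p0)]
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(x2 / 2))   # P(chi-square with 1 df > x2)
    return x2, p_value


# Hypothetical counts for illustration only.
x, n, p0 = 58, 100, 0.5
x2, p_value = two_cell_chi_square(x, n, p0)
z = (x - n * p0) / math.sqrt(n * p0 * (1 - p0))
print(round(x2, 2), round(z ** 2, 2), round(p_value, 3))  # 2.56 2.56 0.11
```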
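Overall Binomial Test vs. Test of Homogeneity, Specified p. $H_0: p_1 = p_2 = \cdots = p_8 = 0.5$ vs. $H_a: p_j \ne 0.5$ for some $j$.
Overall binomial test (samples combined): $X^2 = 2.56$, df = 1, p = 0.110; greater power if the samples are homogeneous.
Test of each $p_j$ (8 separate samples): $X^2 = 22.96$, df = 8, p = 0.003; greater power if they are not homogeneous.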
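Homogeneity with unspecified p is equivalent to independence. For binomial samples with $p_W$ unspecified, the chi-square test of homogeneity is the same test as the chi-square test of independence applied to the table of counts (8 samples × 2 outcomes).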
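The equivalence can be checked directly. For a sample with $n_j$ trials, the two cells contribute $(O_j - n_j\hat p)^2 [1/(n_j \hat p) + 1/(n_j \hat q)] = (O_j - n_j \hat p)^2 / (n_j \hat p \hat q)$, which is exactly the homogeneity term. A minimal sketch with hypothetical counts and hypothetical function names:

```python
def independence_x2(table):
    """Pearson X^2 for an r x c table, expected = (row total)(column total)/n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            x2 += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return x2, df


def homogeneity_x2(successes, sizes):
    """Homogeneity X^2 for k binomial samples with the common p estimated by pooling."""
    p_hat = sum(successes) / sum(sizes)
    return sum((o - m * p_hat) ** 2 / (m * p_hat * (1 - p_hat))
               for o, m in zip(successes, sizes))


# Hypothetical counts for illustration only: row 1 successes, row 2 failures.
successes, sizes = [6, 3, 9], [10, 12, 11]
table = [successes, [m - o for o, m in zip(successes, sizes)]]
print(independence_x2(table), homogeneity_x2(successes, sizes))
```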
Some Goodness of Fit Tests
Chi-square goodness-of-fit test: very general, but can have little power.
Kolmogorov-Smirnov goodness-of-fit test: a good general test, especially for continuous random variables.
Wilk-Shapiro (Shapiro-Wilk) test for normality: widely regarded as one of the most powerful tests for normality.
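A minimal sketch of the Kolmogorov-Smirnov statistic $D = \sup_x |F_n(x) - F(x)|$ against a fully specified continuous CDF (a standard normal here, as an assumption). It computes D only, not a p-value; if the normal mean and standard deviation are estimated from the same data, the usual KS critical values no longer apply (the Lilliefors variant addresses that case). The data values are hypothetical.

```python
import math


def ks_statistic(data, cdf):
    """D = max |F_n(x) - F(x)| for a fully specified continuous CDF F."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # F_n jumps from i/n to (i + 1)/n at x; check the gap on both sides of the jump.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical data for illustration only.
print(round(ks_statistic([-1.2, -0.4, 0.1, 0.3, 0.9, 1.7], std_normal_cdf), 3))
```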
Comparing Odds Ratios Across Categories
Race and Death Penalty Punishment: Are the results consistent across aggravation levels?
Mantel-Haenszel Test: several 2 × 2 tables. Assuming a common odds ratio across the tables, test that this common odds ratio equals 1.
Race and Death Penalty Punishment: expected frequencies for the chi-square test of independence in each table. Note: none of the individual tables has a sufficient sample size for a test of independence.
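Mantel-Haenszel Test
Select one cell, e.g., the upper-left.
Calculate the excess for each table: Excess = Observed – Expected, e.g., Excess = $O_{11} - E_{11}$.
Calculate the variance of each excess: $\text{Variance} = R_1 R_2 C_1 C_2 / [n^2 (n - 1)]$, where $R_1, R_2$ are the row totals, $C_1, C_2$ the column totals, and $n$ the table total.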
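A minimal sketch of these steps, assuming each stratum is a 2 × 2 table `[[a, b], [c, d]]` with the selected cell in the upper-left (a hypothetical layout). The slide stops at the variances; the sketch sums excesses and variances over the tables and forms the usual Mantel-Haenszel statistic $(\sum \text{excess})^2 / \sum \text{variance}$, referred to a chi-square distribution with 1 df. Some texts subtract 0.5 from $|\sum \text{excess}|$ as a continuity correction; that is omitted here.

```python
import math


def mantel_haenszel(tables):
    """Return (total excess, total variance, chi-square statistic, p-value), 1 df."""
    total_excess = 0.0
    total_var = 0.0
    for (a, b), (c, d) in tables:
        r1, r2 = a + b, c + d                    # row totals
        c1, c2 = a + c, b + d                    # column totals
        n = r1 + r2
        total_excess += a - r1 * c1 / n          # O11 - E11
        total_var += r1 * r2 * c1 * c2 / (n ** 2 * (n - 1))
    stat = total_excess ** 2 / total_var
    p_value = math.erfc(math.sqrt(stat / 2))     # upper tail of chi-square with 1 df
    return total_excess, total_var, stat, p_value


# Hypothetical strata for illustration only (not the death-penalty data).
tables = [[[10, 5], [5, 10]], [[4, 6], [2, 8]]]
excess, var, stat, p = mantel_haenszel(tables)
print(round(excess, 2), round(stat, 2), round(p, 3))
```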
Race and Death Penalty Punishment. Conclusion: nearly 7 more white-victim murderers received the death penalty than would be expected if the odds were the same for white- and black-victim murderers.
Estimating the Common Odds Ratio: Death Penalty and Race
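The slide does not show which estimator is used; one standard choice is the Mantel-Haenszel estimate of the common odds ratio, $\hat\psi_{MH} = \sum_i (a_i d_i / n_i) / \sum_i (b_i c_i / n_i)$ for strata $[[a_i, b_i], [c_i, d_i]]$. A minimal sketch with hypothetical tables, each of which has odds ratio 4, so the pooled estimate should also be 4:

```python
def mh_common_odds_ratio(tables):
    """Mantel-Haenszel estimate of a common odds ratio across 2 x 2 tables [[a, b], [c, d]]."""
    num = 0.0
    den = 0.0
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den


# Hypothetical strata for illustration only (not the death-penalty data).
tables = [[[10, 5], [5, 10]], [[2, 1], [1, 2]]]
print(round(mh_common_odds_ratio(tables), 6))  # 4.0
```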