© 2008 McGraw-Hill Higher Education The Statistical Imagination Chapter 13: Nominal Variables: The Chi-Square and Binomial Distributions

The Chi-Square Test
Chi-square is a test for a relationship between two nominal variables
Calculations are made using a cross-tabulation (or “crosstab”) table, which reports frequencies of joint occurrences of attributes

Crosstab Tables
Cross-tabulation or “crosstab” tables are designed to compare the frequencies of two nominal/ordinal variables at once

Sample Crosstab Table
Spent night on streets in last 2 weeks by gender among homeless persons
On streets    Male    Female    Total
Yes
No
Total

Reading a Crosstab Table
The number in a cell is the frequency of joint occurrences, where a joint occurrence is the combination of categories of the two variables for a single individual
From the cell, look up, then look to the left
E.g., in the table above, the joint occurrence of “male and on-street” is 28, the number in the sample who are both male and spent a night on the streets

Reading a Crosstab Table (cont.)
The numbers in the margins on the right side and the bottom present marginal totals, the total number of subjects in a category
The grand total (n, the sample size) is presented in the bottom right-hand corner

Crosstab Tables and the Chi-Square Test
For the chi-square test, the categories of the independent variable (X) go in the columns of the table, and those of the dependent variable (Y) go in the rows
E.g.: Is gender a good predictor of who among homeless persons is likely to spend a night on the streets?

Calculating Expected Frequencies
In addition to the observed joint frequencies, the chi-square test involves calculating the expected frequency of each table cell
The expected frequency of a cell is equal to the column marginal total for the cell (look down) times the row marginal total for the cell (look to the right), divided by the grand total
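The expected-frequency rule can be sketched in a few lines of Python. This is a minimal illustration, not the text's own procedure: the function name `expected_frequencies` and the counts are hypothetical and are not taken from the sample table.

```python
def expected_frequencies(observed):
    """Return the expected frequency of every cell in a crosstab table.

    `observed` is a list of rows (categories of Y); each row lists the
    observed counts across the columns (categories of X).
    """
    row_totals = [sum(row) for row in observed]          # right-hand margins
    col_totals = [sum(col) for col in zip(*observed)]    # bottom margins
    grand_total = sum(row_totals)                        # n, the sample size
    # Expected = (column total * row total) / grand total, cell by cell
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical counts, made up for illustration only
observed = [[30, 10],
            [20, 40]]
expected = expected_frequencies(observed)
print(expected)  # [[20.0, 20.0], [30.0, 30.0]]
```

Here the row totals are 40 and 60, the column totals are 50 and 50, and n = 100, so the upper-left expected frequency is (50)(40)/100 = 20.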

Using Expected Frequencies to Test the Hypothesis
The expected frequencies are those that would occur if there is no relationship between the two nominal/ordinal variables
The chi-square statistic measures the gap between expected and observed frequencies
If there is no relationship, then the expected and observed frequencies are the same and chi-square computes to zero

The Chi-Square Statistic
The sampling distribution is generated using the chi-square equation:
$\chi^2 = \sum \left[ (O - E)^2 / E \right]$
where O is the observed frequency of a cell, and E is the expected frequency
Chi-square tells us whether the summed squared differences between the observed and expected cell frequencies are so great that they are not simply the result of sampling error
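As a rough sketch of the chi-square equation, the following Python function sums (O - E)^2 / E over every cell. The function name `chi_square` and both tables of counts are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def chi_square(observed):
    """Sum (O - E)^2 / E over all cells of a crosstab table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected frequency
            total += (o - e) ** 2 / e
    return total


# Hypothetical counts, made up for illustration only
related = [[30, 10],
           [20, 40]]
unrelated = [[10, 20],
             [20, 40]]   # observed equals expected in every cell

print(round(chi_square(related), 2))   # 16.67
print(chi_square(unrelated))           # 0.0
```

The second table shows the no-relationship case: observed and expected frequencies match, so the statistic computes to zero.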

When to Use the Chi-Square Statistic
1) There is one population with a representative sample from it
2) There are two variables, both of a nominal/ordinal level of measurement
3) The expected frequency of each cell in the crosstab table is at least five

Features of the Chi-Square Hypothesis Test
Step 1. The $H_0$ states that there is no relationship between the two variables. When this is the case, chi-square calculates to a value of zero, give or take some sampling error
This null hypothesis asserts no difference in observed and expected frequencies

Features of the Chi-Square Hypothesis Test (cont.)
Step 2. The sampling distribution is the chi-square distribution. It describes all possible outcomes of the chi-square statistic with repeated sampling when there is no relationship between X and Y
Degrees of freedom are determined by the number of rows (r) and columns (c) in the crosstab table: $df = (r - 1)(c - 1)$

Features of the Chi-Square Hypothesis Test (cont.)
Step 4. The test effects are the differences between expected and observed frequencies
The test statistic is the chi-square statistic
The p-value is obtained by comparing the calculated chi-square value to the critical values of the chi-square distribution in Statistical Table G of Appendix B
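For readers without the statistical table at hand, here is a minimal sketch of the degrees-of-freedom rule and of a p-value calculation. It is a swapped-in shortcut, not the table lookup described above, and it works only for df = 1 (a 2 x 2 table), where the chi-square upper-tail probability equals erfc(sqrt(chi-square / 2)). The function names are hypothetical.

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    """df = (r - 1)(c - 1) for a crosstab table with r rows and c columns."""
    return (n_rows - 1) * (n_cols - 1)


def p_value_df1(chi_sq):
    """Upper-tail p-value of a chi-square statistic, valid for df = 1 only."""
    return math.erfc(math.sqrt(chi_sq / 2))


df = degrees_of_freedom(2, 2)
p = p_value_df1(3.841)    # 3.841 is the familiar .05 critical value for df = 1
print(df, round(p, 3))    # 1 0.05
```

For larger tables (df > 1) this shortcut does not apply, and a table of critical values or a statistics library is needed.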

The Existence of a Relationship for the Chi-Square Test
Existence: Test the $H_0$ that $\chi^2 = 0$; that is, there is no relationship between X and Y
If the $H_0$ is rejected, a relationship exists

Direction and Strength of a Relationship for Chi-Square
Direction: Not applicable (because the variables are nominal level)
Strength: These measures exist but are seldom reported because they are prone to misinterpretation

Nature of a Relationship for the Chi-Square Test
Nature: Report the differences between the observed and expected cell frequencies for a couple of outstanding cells
Calculate column percentages for selected cells

Column and Row Percentages
A column percentage is a cell’s frequency as a percentage of the column marginal total
A row percentage is a cell’s frequency as a percentage of the row marginal total
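The two definitions can be sketched as follows. The function names and the counts are hypothetical and serve only to illustrate the arithmetic.

```python
def column_percentages(observed):
    """Each cell as a percentage of its column marginal total."""
    col_totals = [sum(col) for col in zip(*observed)]
    return [[100 * cell / col_totals[j] for j, cell in enumerate(row)]
            for row in observed]


def row_percentages(observed):
    """Each cell as a percentage of its row marginal total."""
    return [[100 * cell / sum(row) for cell in row] for row in observed]


# Hypothetical counts, made up for illustration only
observed = [[30, 10],
            [20, 40]]

print(column_percentages(observed))   # [[60.0, 20.0], [40.0, 80.0]]
print(row_percentages(observed)[0])   # [75.0, 25.0]
```

With the independent variable in the columns, the column percentages (60% versus 20% in the first row here) are the ones that compare groups directly.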

Chi-Square as a Difference of Proportions Test
The chi-square test is frequently used to compare proportions of categories of a nominal/ordinal variable for two or more groups of a second nominal/ordinal variable
Thus, it may be viewed as a difference of proportions test as illustrated in Figure 13-2 in the text

The Binomial Distribution
The binomial distribution test is a small single-sample proportions test. Contrast it to the large single-sample proportions test of Chapter 10
The test hinges on mathematically expanding the binomial distribution equation, $(P + Q)^n$

When to Use the Binomial Distribution
1) There is only one nominal variable and it is dichotomous, with P = p [of success] and Q = p [of failure]
2) There is a single, representative sample from one population
3) Sample size is such that $(p_{smaller})(n) < 5$, where $p_{smaller}$ = the smaller of $P_u$ and $Q_u$
4) There is a target value of the variable to which we may compare the sample proportion
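The sample-size criterion in item 3 can be checked mechanically. This is a small sketch with a hypothetical function name; it assumes Q_u = 1 - P_u, as is the case for a dichotomous variable.

```python
def small_sample_rule_met(p_u, n):
    """True when (p_smaller)(n) < 5, the small-sample condition in item 3."""
    p_smaller = min(p_u, 1 - p_u)   # the smaller of P_u and Q_u
    return p_smaller * n < 5


print(small_sample_rule_met(0.5, 4))    # True:  (.5)(4)  = 2
print(small_sample_rule_met(0.5, 20))   # False: (.5)(20) = 10
```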

Expansion of the Binomial Distribution Equation
Expansion of the binomial distribution equation, $(P + Q)^n$, provides the sampling distribution for dichotomous events. That is, the equation describes all possible sampling outcomes and the probability of each, where there are only two possible categories of a nominal variable

An Example of an Expanded Binomial Equation
The equation reveals, for example, the possible outcomes of the tossing of 4 coins
P = p [heads] = .5; Q = p [tails] = .5; n = 4 coins
$(P + Q)^4 = P^4 + 4P^3Q^1 + 6P^2Q^2 + 4P^1Q^3 + Q^4$
Add the coefficients to get the total number of possible outcomes: 1 + 4 + 6 + 4 + 1 = 16
The probability of 3 heads and 1 tail is the coefficient of $P^3Q^1$ over the sum of coefficients = 4 over 16 = .25
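The coin-toss expansion can be reproduced with a short sketch. The function name `binomial_terms` is hypothetical; `math.comb` (Python 3.8+) supplies the coefficients. Unlike the coefficient-over-sum shortcut, which applies when P = Q = .5, each term here is evaluated in full, so the same code works for unequal P and Q.

```python
from math import comb


def binomial_terms(n, p):
    """Map k successes -> value of the term C(n, k) * P^k * Q^(n - k)."""
    q = 1 - p
    return {k: comb(n, k) * p ** k * q ** (n - k) for k in range(n, -1, -1)}


coefficients = [comb(4, k) for k in range(4, -1, -1)]
probabilities = binomial_terms(4, 0.5)

print(coefficients)        # [1, 4, 6, 4, 1]
print(sum(coefficients))   # 16
print(probabilities[3])    # 0.25  (3 heads and 1 tail)
```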

Pascal’s Triangle
Pascal’s Triangle provides a shortcut method for expanding the binomial equation
It provides the coefficients for small samples and allows a quick computation of the probabilities of all possible outcomes when P and Q are equal to .5
See Table 13-7 in the text
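A minimal sketch of how a row of Pascal’s Triangle is built, with each entry being the sum of the two entries above it. The function name `pascal_row` is hypothetical, and the output is not a reproduction of Table 13-7.

```python
def pascal_row(n):
    """Return row n of Pascal's Triangle (row 0 is [1])."""
    row = [1]
    for _ in range(n):
        # Pad with a zero on each side and add neighbouring pairs
        row = [a + b for a, b in zip([0] + row, row + [0])]
    return row


row = pascal_row(4)
print(row)                           # [1, 4, 6, 4, 1]

# When P = Q = .5, each outcome's probability is coefficient / 2^n
print([c / 2 ** 4 for c in row])     # [0.0625, 0.25, 0.375, 0.25, 0.0625]
```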

Features of the Binomial Distribution Test
Step 1. $H_0$: $P_u$ = a target value
Step 2. The sampling distribution is an expanded binomial equation for the given sample size

Features of the Binomial Distribution Test (cont.)
Step 4. The effect is the observed combination of successes and failures, which corresponds to a term in the equation (e.g., 3 heads and 1 tail is represented by the term $4P^3Q^1$)
The test statistic is the expanded binomial equation
The p-value is taken directly from the equation (not from a statistical table)
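One way to read a p-value off the expanded equation is sketched below. It rests on an assumption not stated on the slide: the p-value is taken to be one-tailed, the probability of the observed number of successes or more, obtained by summing the corresponding terms. The function name is hypothetical.

```python
from math import comb


def upper_tail_p_value(successes, n, p_target):
    """Sum the binomial terms for `successes` or more out of n trials.

    Assumption: a one-tailed test in the direction of more successes.
    """
    q = 1 - p_target
    return sum(comb(n, k) * p_target ** k * q ** (n - k)
               for k in range(successes, n + 1))


# 3 or more heads in 4 tosses of a fair coin: (4 + 1) / 16
print(upper_tail_p_value(3, 4, 0.5))   # 0.3125
```

A test in the other direction would sum the terms for the observed number of successes or fewer instead.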

Statistical Follies: Statistical Power and Sample Size
For a given level of significance, statistical power is a test statistic’s probability of not incurring a Type II error (i.e., unknowingly making the incorrect decision of failing to reject a false null hypothesis)
Low statistical power can result from having too small a sample size