1 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Chapter 22 Comparing Counts.

Slides:



Advertisements
Similar presentations
Chapter 26 Comparing Counts
Advertisements

Chi Square Procedures Chapter 11.
Copyright ©2011 Brooks/Cole, Cengage Learning More about Inference for Categorical Variables Chapter 15 1.
Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. More About Categorical Variables Chapter 15.
1-1 Copyright © 2015, 2010, 2007 Pearson Education, Inc. Chapter 25, Slide 1 Chapter 25 Comparing Counts.
Chapter 26: Comparing Counts
1 Chapter 20 Two Categorical Variables: The Chi-Square Test.
Presentation 12 Chi-Square test.
Lesson Inference for Two-Way Tables. Vocabulary Statistical Inference – provides methods for drawing conclusions about a population parameter from.
GOODNESS OF FIT TEST & CONTINGENCY TABLE
Copyright © 2012 Pearson Education. All rights reserved Copyright © 2012 Pearson Education. All rights reserved. Chapter 15 Inference for Counts:
 Involves testing a hypothesis.  There is no single parameter to estimate.  Considers all categories to give an overall idea of whether the observed.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. Chapter 26 Comparing Counts.
Chapter 26: Comparing Counts AP Statistics. Comparing Counts In this chapter, we will be performing hypothesis tests on categorical data In previous chapters,
Copyright © 2010 Pearson Education, Inc. Warm Up- Good Morning! If all the values of a data set are the same, all of the following must equal zero except.
HAWKES LEARNING SYSTEMS Students Matter. Success Counts. Copyright © 2013 by Hawkes Learning Systems/Quant Systems, Inc. All rights reserved. Section 10.7.
Copyright © 2013 Pearson Education, Inc. All rights reserved Chapter 10 Inferring Population Means.
Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on Categorical Data 12.
CHAPTER 26: COMPARING COUNTS OF CATEGORICAL DATA To test claims and make inferences about counts for categorical variables Objective:
Copyright © 2010, 2007, 2004 Pearson Education, Inc. 1.. Section 11-2 Goodness of Fit.
Chapter 26 Chi-Square Testing
Chapter 26: Comparing counts of categorical data
The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers CHAPTER 11 Inference for Distributions of Categorical.
Chapter 11: Inference for Distributions of Categorical Data Section 11.1 Chi-Square Goodness-of-Fit Tests.
Other Chi-Square Tests
1-1 Copyright © 2015, 2010, 2007 Pearson Education, Inc. Chapter 25, Slide 1 Chapter 26 Comparing Counts.
Slide 26-1 Copyright © 2004 Pearson Education, Inc.
BPS - 5th Ed. Chapter 221 Two Categorical Variables: The Chi-Square Test.
The Practice of Statistics Third Edition Chapter (13.1) 14.1: Chi-square Test for Goodness of Fit Copyright © 2008 by W. H. Freeman & Company Daniel S.
Other Chi-Square Tests
Copyright © 2010 Pearson Education, Inc. Slide
Comparing Counts.  A test of whether the distribution of counts in one categorical variable matches the distribution predicted by a model is called a.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
1 Chapter 11: Analyzing the Association Between Categorical Variables Section 11.1: What is Independence and What is Association?
Chapter 13 Inference for Counts: Chi-Square Tests © 2011 Pearson Education, Inc. 1 Business Statistics: A First Course.
Copyright © 2010 Pearson Education, Inc. Warm Up- Good Morning! If all the values of a data set are the same, all of the following must equal zero except.
12/23/2015Slide 1 The chi-square test of independence is one of the most frequently used hypothesis tests in the social sciences because it can be used.
Lesson Inference for Two-Way Tables. Knowledge Objectives Explain what is mean by a two-way table. Define the chi-square (χ 2 ) statistic. Identify.
11.2 Tests Using Contingency Tables When data can be tabulated in table form in terms of frequencies, several types of hypotheses can be tested by using.
Copyright © 2013, 2009, and 2007, Pearson Education, Inc. Chapter 11 Analyzing the Association Between Categorical Variables Section 11.2 Testing Categorical.
Comparing Counts Chapter 26. Goodness-of-Fit A test of whether the distribution of counts in one categorical variable matches the distribution predicted.
+ Section 11.1 Chi-Square Goodness-of-Fit Tests. + Introduction In the previous chapter, we discussed inference procedures for comparing the proportion.
Chi-Squared Test of Homogeneity Are different populations the same across some characteristic?
11.1 Chi-Square Tests for Goodness of Fit Objectives SWBAT: STATE appropriate hypotheses and COMPUTE expected counts for a chi- square test for goodness.
Chi-Square Chapter 14. Chi Square Introduction A population can be divided according to gender, age group, type of personality, marital status, religion,
Statistics 26 Comparing Counts. Goodness-of-Fit A test of whether the distribution of counts in one categorical variable matches the distribution predicted.
Chapter 11: Categorical Data n Chi-square goodness of fit test allows us to examine a single distribution of a categorical variable in a population. n.
Class Seven Turn In: Chapter 18: 32, 34, 36 Chapter 19: 26, 34, 44 Quiz 3 For Class Eight: Chapter 20: 18, 20, 24 Chapter 22: 34, 36 Read Chapters 23 &
Comparing Observed Distributions A test comparing the distribution of counts for two or more groups on the same categorical variable is called a chi-square.
Chi Square Procedures Chapter 14. Chi-Square Goodness-of-Fit Tests Section 14.1.
 Check the Random, Large Sample Size and Independent conditions before performing a chi-square test  Use a chi-square test for homogeneity to determine.
Goodness-of-Fit A test of whether the distribution of counts in one categorical variable matches the distribution predicted by a model is called a goodness-of-fit.
Copyright © 2009 Pearson Education, Inc. Chapter 26 Comparing Counts.
Comparing Counts Chi Square Tests Independence.
Other Chi-Square Tests
CHAPTER 26 Comparing Counts.
Chapter 12 Tests with Qualitative Data
Chapter 25 Comparing Counts.
1) A bicycle safety organization claims that fatal bicycle accidents are uniformly distributed throughout the week. The table shows the day of the week.
Sun. Mon. Tues. Wed. Thurs. Fri. Sat.
15.1 Goodness-of-Fit Tests
Inference for Relationships
Inference on Categorical Data
Lesson 11 - R Chapter 11 Review:
Paired Samples and Blocks
Analyzing the Association Between Categorical Variables
WELCOME BACK! days From: Tuesday, January 3, 2017
Chapter 26 Comparing Counts.
Chapter 26 Comparing Counts Copyright © 2009 Pearson Education, Inc.
Chapter 26 Comparing Counts.
Presentation transcript:

1 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Chapter 22 Comparing Counts

2 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Objectives 72. Perform chi-square tests for goodness-of-fit, homogeneity, and independence, to include: writing appropriate hypotheses checking the necessary assumptions drawing an appropriate diagram computing the P-value making a decision, and interpreting the results in the context of the problem.

3 Copyright © 2014, 2012, 2009 Pearson Education, Inc Goodness-of-Fit Test

4 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Are CEO Zodiac Signs Uniform? A survey of 256 randomly selected CEOs If uniform, each sign expected to have 256/12 ≈ 21.3 births Pisces has more. Others have fewer. Is the distribution far enough from uniform to conclude the data do not come from a uniform distribution?

5 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Goodness-of-Fit vs. Two-Proportion z-Test To just test for Pisces, the two-proportion z-test works. Performing 12 separate z-tests each with  = 0.05, P(Type I Error) compounded Instead, the Goodness-of-Fit Test properly combines differences from expected into a single hypothesis test.

6 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Hypotheses for a goodness of fit test H 0 : The distribution of counts occurs in a manner consistent with our model. – For example, if our model is a uniform distribution we might say p 1 =p 2 =p 3 =p 4 H A : the distribution of counts occurs in a manner which is inconsistent with our model. – Note: the distribution of counts can vary from our model in many different ways! Slide 1- 6

7 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Finding Expected Counts 1478 baseball players’ birth months. Find the expected count for each month. January: 1478 × 0.08 = February: 1478 × 0.07 =

8 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Assumptions and Conditions Counted Data Condition The values in each cell are counts. Doesn’t work with percents, proportions, or measurements Independence Assumption The counts in each cell must be independent of each other. For random samples, we can generalize to the entire population.

9 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Assumptions and Conditions (Continued) Sample Size Assumption Expected counts for each cell  5. This is called the Expected Cell Frequency Condition. If the assumptions and conditions are met, we can perform a Chi-Square Test for Goodness-of-Fit.

10 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Checking Assumptions and Conditions Are the assumptions and conditions met for the baseball player births study? Counted Data Condition: Each cell is a count. Independence Assumption: The births are independent. They do not come from a random sample, but they are representative of all players past and present. Expected Cell Frequency Condition: Expected counts range from to All  5. Goodness-of-fit does apply.

11 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Chi-Square Calculations Interested in difference between observed and expected: residuals. Make positive by squaring them all. Get relative sizes of the residuals by dividing them by the expected counts. This is a Chi-Square Model with df = n – 1. n is the number of categories, not the sample size.

12 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Slide Calculations (cont.) The chi-square models are actually a family of distributions indexed by degrees of freedom (much like the t-distribution). The number of degrees of freedom for a goodness-of-fit test is n – 1, where n is the number of categories.

13 Copyright © 2014, 2012, 2009 Pearson Education, Inc. The Shape of  2 With df large, the weighted residuals add up quickly. Unlike z or t, a larger  2 is more common. The mode of  2 is df – 2. The expected value of  2 is df, to the right of the mode due to the right skewed distribution.

14 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Slide Chi-Square P-Values The chi-square statistic is used only for testing hypotheses, not for constructing confidence intervals. If the observed counts don’t match the expected, the statistic will be large—it can’t be “too small.” So the chi-square test is always one-sided If the calculated value is large enough, we’ll reject the null hypothesis.

15 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Slide The mechanics may work like a one-sided test, but the interpretation of a chi-square test is in some ways many-sided. There are many ways the null hypothesis could be wrong. There’s no direction to the rejection of the null model—all we know is that it doesn’t fit. Chi-Square P-Values (cont.)

16 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Can’t Prove the Null We often want to see if the distribution fits a model. Unfortunately, fitting the model is H 0. We can never accept the null hypothesis. A smaller sample size will likely result in failing to reject H 0. This does not mean that a small sample size will help prove the null.

17 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Slide The Chi-Square Calculation 1.Find the expected values: Every model gives a hypothesized proportion for each cell. The expected value is the product of the total number of observations times this proportion: n*p 2.Compute the residuals: Once you have expected values for each cell, find the residuals, Observed – Expected. 3.Square the residuals.

18 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Slide The Chi-Square Calculation (cont.) 4.Compute the components. Now find the components for each cell. 5.Find the sum of the components (that’s the chi- square statistic).

19 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Slide The Chi-Square Calculation (cont.) 6.Find the degrees of freedom. It’s equal to the number of cells minus one. 7.Test the hypothesis.  Use your chi-square statistic to find the P-value. (Remember, you’ll always have a one-sided test.)  Large chi-square values mean lots of deviation from the hypothesized model, so they give small P-values.

20 Copyright © 2014, 2012, 2009 Pearson Education, Inc. TI-83+ for x 2 Goodness of Fit test Enter the observed frequencies in L 1 and the expected frequencies in L 2. Highlight L 3 and type (L 1 - L 2 ) 2 / L 2 enter The computed values should appear in L 3 We now need to sum the values in L 3. We do this by going to List (2 nd Stat) -> MATH -> 5: sum Sum(L 3 ) this gives us our x 2 statistic. To find the p-value we need to go to DISTR (2 nd Vars) -> 7: x 2 cdf(Ans, 1E99, df) Where Ans is the x 2 statistic just calculated 1E99 is a very large number, and df is (# categories – 1) Slide 1- 20

21 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Perform the X 2 Goodness of Fit Test 1478 baseball players’ birth months. Find the expected count for each month. Compute the X 2 test statistic Find the p-value

22 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Baseball Birth Months H 0 : The distribution of baseball player birth months is the same as for the general population. H A : The distribution of baseball player birth months is not the same as for the general population. P-value = There’s evidence that major league ballplayers’ birth months have a different distribution from the rest of us.

23 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Using our calculator -- only for TI 84+ χ 2 Goodness-of-Fit Test for the TI-84 Plus Calculator It is relatively easier to conduct a χ2 goodness-of-fit test if you happen to have a TI-84 Plus calculator. Enter the observed frequencies in L1 and the expected frequencies in L2 Go to Stat -> Tests -> D:X 2 GOF-Test Your calculator will prompt you for the list with the observed frequencies (L1 for this example) and the list with the expected frequencies (L2 for this example), and the number of degrees of freedom. You can find the degrees of freedom by subtracting 1 from the number of categories (k – 1). Then press Calculate Slide 1- 23

24 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Chi-Square with StatCrunch Enter Obs and Exp data Stat→Goodness-of-fit→Chi-Square test

25 Copyright © 2014, 2012, 2009 Pearson Education, Inc. A die is filled with a lead weight and then rolled 200 times with the following results: 1: 27 2: 31 3: 42 4: 40 5: 28 6: 32 Use an alpha of 0.05 to test the claim that the outcomes are not equally likely (Triola 2008). H0: The die rolls are evenly distributed over 1-6. HA: The die rolls are not evenly distributed over 1-6 (i.e. the die is not “fair”). Test Statistic (X2) = 5.86 P-Value = 0.32 Conclusion: Since the P-value is greater than alpha, we cannot reject the null hypothesis. Therefore, there is not enough statistical evidence to support the claim that the die is not fair. Slide 1- 25

26 Copyright © 2014, 2012, 2009 Pearson Education, Inc. 39. You are planning to open an old time soda fountain and your partner claims that the public will not prefer any flavor over another. The flavors you serve are cherry, strawberry, orange, lime and grape. After several customers, you stop and take a look at how sales are going and here are the results. The following numbers of people ordered the flavor shown. Cherry 35, Strawberry 32, Orange 29, Lime 26 and Grape 25. Test to see if there was a preference at the 0.05 significance level. H0: The customers have no preference for any flavor. HA: The customers have flavor preferences. Test Statistic (X2) = 2.35 P-Value = 0.67 Conclusion: Since the P-value is larger than alpha, we cannot reject the null hypothesis. Therefore, there is no statistical evidence that customers prefer some flavors more than others. Slide 1- 26

27 Copyright © 2014, 2012, 2009 Pearson Education, Inc. 37. The following data lists automobile fatalities by day of week: Sun: 132 Mon: 98 Tue: 95 Wed: 98 Thu: 105 Fri: 133 Sat: 158 Use an alpha of 0.05 to test the claim that the outcomes are not uniformly spread across the days of the week (Triola 2008). H0: Automobile fatalities are evenly distributed over all seven days of the week. HA: Automobile fatalities are not evenly distributed over all seven days of the week. Test Statistic (X2) = P-Value = Small # Conclusion: Since the P-value is less than alpha, we can reject the null hypothesis. Therefore, the statistical evidence suggests that driving fatalities are not evenly spread throughout the days of the week. Slide 1- 27

28 Copyright © 2014, 2012, 2009 Pearson Education, Inc. 38. The following data lists the birth months of Oscar-winning actors: Jan: 9 Feb: 5 Mar: 7 Apr: 14 May: 8 Jun: 1 Jul:7 Aug: 6 Sep: 4 Oct: 5 Nov: 1 Dec: 9 Use an alpha of 0.05 to test the claim that the outcomes are not uniformly spread across the months (Triola 2008). H0: Actor birth months are uniformly spread out over all 12 months. HA: Actor birth months are not uniformly spread out over all 12 months. Test Statistic (X2) = P-Value = 0.02 Conclusion: Since the P-value is less than alpha, we can reject the null hypothesis. Therefore, the statistical evidence suggests that actor birth months are NOT evenly spread out throughout the year. Slide 1- 28

29 Copyright © 2014, 2012, 2009 Pearson Education, Inc Chi-Square Test for Homogeneity

30 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Plans of Graduates at Different Colleges Are students’ plans the same at different colleges within the same university? Class-size differences cloud the table. Proportions may be helpful.

31 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Test for Homogeneity More than two proportions to compare, so we cannot use the two-proportion z-test. Generalize to the chi-square test for homogeneity. Same mechanics as goodness-of-fit, different hypotheses and conclusions. Chi-square test for homogeneity asks if the distribution is the same for different groups. The test looks for differences large enough to step beyond what we might expect from random sample-to-sample variation

32 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Assumptions and Conditions Same as Goodness-of-Fit Counted Data Condition: Always use counts, not proportions, percents, or measurements. Independence Assumption: Need randomization to generalize to the population. Expected Cell Frequency Condition: All expected counts should be at least 5.

33 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Hypotheses for a X 2 Test of Homogeneity H 0 : The distribution of ________ is the same for __________ – e.g. Students’ post-graduation activities are distributed in the same way for all four colleges. H A : The distribution of ________ is not the same for __________ – e.g. Students’ plans do not have the same distribution.

34 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Calculations Expected Counts 685, or about 47.0%, of the 1456 students who responded to the survey were employed. Of the 448 agriculture students, expect 47% of 448 or to be employed.   Calculation Use the same formula with these observed and expected. df = (R – 1)(C – 1) = (3 – 1)(4 – 1) = 6.

35 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Students’ Choices Same Across Colleges? Think → Plan: I have a table of counts from the college’s class of Hypotheses: H 0 : Students’ post-graduation activities are distributed in the same way for all four colleges. H A : Students’ plans do not have the same distribution.

36 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Students’ Choices Same Across Colleges? Model: The bar chart shows how the percents differ. Counted Data Condition: I will use the table of counts. Independence Assumption: It is not a random sample, but with this large study, most students act independently of the others. Expected Cell Frequency Condition: All  5 Use the  2 -model with df = 6 and do a chi-square test for homogeneity.

37 Copyright © 2014, 2012, 2009 Pearson Education, Inc. X 2 test for Independence/Homogeneity using the TI Enter the contingency table into the TI using the Matrix function. Matrx…Edit…enter to get into the matrix [A]. – A matrix is called by its rows first and then its columns. Enter these numbers accordingly. ○ MATRIX[A] 3x4 – Next fill in the matrix as it appears. Remember to hit enter after each data entry. – Then 2 nd Quit. Stat -> Test -> #C x 2 -test. – We placed the observed data in [A] and the TI will place the expected values in [B] or anywhere you stipulate. – Then ask the TI to calculate. Slide 1- 37

38 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Homogeneity With StatCrunch Enter Data (first column are row labels) Stat → Tables → Contingency → with summary

39 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Students’ Choices Same Across Colleges? Show → Mechanics: I used StatCrunch to produce the expected counts,  2, and P-value.  2 ≈ P-value <

40 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Students’ Choices Same Across Colleges? Tell → Conclusion: The P-value is very small, so I reject the null hypothesis. I conclude that there’s evidence that the post- graduation activities of students from these four colleges don’t have the same distribution.

41 Copyright © 2014, 2012, 2009 Pearson Education, Inc Examining the Residuals

42 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Examining the components of the chi square statistic… After rejecting H 0, we wonder where things differed. The residuals can be helpful. Even better – examine the components: or the standardized residuals: Standardized residuals are like z-scores. Tells us how far the observed is above or below the expected.

43 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Standardized Residuals for College Grads Engineering students disproportionately go to grad school and not pursue other activities. Agriculture students disproportionately pursue other activities.

44 Copyright © 2014, 2012, 2009 Pearson Education, Inc Chi-Square Test of Independence

45 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Race and Traffic Stops Data were collected on drivers’ race and whether they were searched when an officer stopped them. Two categorical variables Contingency tables are used to see if one categorical variable is contingent on another. Are the variables of race and having one’s car searched independent?

46 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Independence Two events are independent if P(A|B) = P(A). If Race and Car Search are independent, we would expect the proportions of “yes” to be the same for the three options for Race. Same criterion as homogeneity The nuances of homogeneity and independence statements are different. Same distribution vs. independent

47 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Hypotheses for a x 2 test of independence H 0 : the variables are independent ○ This is equivalent to saying that the distribution of counts is the same for each group (which was our H 0 for a Test of Homogeneity) H A : the variables are not independent ○ OR can say: (they are dependent, i.e. associated) ○ This is equivalent to saying that the distribution of counts are different among the groups (which was our H A for a Test of Homogeneity. Slide 1- 47

48 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Assumptions and Conditions Counted Data Condition: Always use counts, not proportions, percents, or measurements. Independence Assumption: The data were collected independently. Expected Cell Frequency Condition: All expected counts at least 5. 10% Condition: Data come from a random sample of less than 10% of the population.

49 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Race and Traffic Stops When the assumptions/conditions are met, we can proceed with a Chi-Square Test of Independence. Hypotheses: H 0 : Race and Search are independent. H A : Race and Search are not independent.

50 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Chi-Square Mechanics Find the df, expected frequency for Black drivers who were stopped and that cell’s component of  2. df = (2 – 1)(3 – 1) = 2 Expected Frequency: Component (  2 Contribution):

51 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Chi-Square Mechanics (TI 83/84) Matrix…Edit…enter to get into the matrix [A]. MATRIX[A] 2x3 Fill in the matrix as it appears above. Remember to hit enter after each data entry. Then 2 nd Quit. Stat -> Test -> #C x 2 -test. Calculate

52 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Race and Traffic Stop Searches P-value < There is strong evidence to conclude that police decisions to search are associated with race. Large residuals for Whites suggests that police search Whites less than independence would predict. It appears that Black drivers’ cars are searched more often.

53 Copyright © 2014, 2012, 2009 Pearson Education, Inc. X 2 test for Independence using the TI A sociologist wishes to see whether the number of years of college a person has completed is related to his or her place of residence. A sample of 84 is selected and classified as shown. LocationNo CollegeBachelorsMasters PhD Urban Suburban Rural67127 At  =0.05, can the sociologist conclude that the years of college education are independent of the residence location? We need to get this contingency table into the TI using the Matrx function. Matrx…Edit…enter to get into the matrix [A]. A matrix is called by its rows first and then its columns. So this is a 3 x 4 matrix. Enter these numbers accordingly. Next fill in the matrix as it appears above. Remember to hit enter after each data entry. Then 2 nd Quit. Stat -> Test -> #C x 2 -test. We placed the observed data in [A] and the TI will place the expected values in [B] or anywhere you stipulate. Then ask the TI to calculate. You get a computed x 2 of and a p-value of.398. Thus we cannot reject the null hypothesis. This suggests independence. Therefore, place of residence and education level are independent of each other Slide 1- 53

54 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Tattoos and Hepatitis Are Tattoo Status and Hepatitis Status independent? Think → Plan: Test for independence. I have a contingency table of 626 patients. Hypotheses: H 0 : Tattoo Status and Hepatitis Status are independent. H A : Tattoo Status and Hepatitis Status are not independent.

55 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Tattoos and Hepatitis Model: The bar chart suggests a difference in Hepatitis C based on tattoo status. Counted Data Condition: I have counts of individuals categorized on two variables. Independence Assumption: Not SRS, but likely independent.

56 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Tattoos and Hepatitis Model: (Continued)  Expected Cell Frequency Condition: Two expected counts < 5. Not all the assumptions are met, but I will carefully go ahead with the chi-square test for independence with df = (3 – 1) × (2 – 1) = 2. I will check the residuals carefully. Find the  2 test statistic and the p-value using your TI.

57 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Tattoos and Hepatitis C Show → Mechanics: With df = 2 the  2 graph looks very different. Use StatCrunch or TI-83/84 just like with Homogeneity.  2 = 57.91, P-value <

58 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Tattoos and Hepatitis C Tell → Conclusion: With P-value < , reject H 0. Tattoo Status and Hepatitis Status are not independent. Since the expected counts are < 5, I need to check that the cells with small expected counts did not influence the results too greatly.

59 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Examining the Residuals Think (again) → Parlor/Hepatitis C residual is large positive. More Hepatitis C from those with tattoos from parlors than expected. None/Hepatitis C residual large negative. Less Hepatitis C from those without tattoos than expected. Cell counts small for Parlor/Hepatitis C. We should report this as a warning, or rethink the data.

60 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Combining Groups Show (More) → Combining all those with tattoos gives large enough counts. Compute  2 again. New  2 = (df = 1) New P-value < Tell (All) → We conclude that Tattoo Status and Hepatitis C Status are not independent. We are concerned, but need more evidence to suggest that tattoo parlors are a problem.

61 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Here is a table showing who survived the sinking of the Titanic based on whether they were crew members or passengers booked in first, second or third-class staterooms: CrewFirstSecond ThirdTotal Alive Dead Total Determine if surviving was independent of cabin status (use alpha=0.01). H0: Cabin class and survivorship are independent. HA: Cabin class and survivorship are dependent. Test Statistic (X2) = P-Value = Tiny, tiny # Conclusion: Since the P-value is smaller than alpha, we reject the null hypothesis. Therefore, we can conclude that there is an association (dependence) between cabin class and survivorship. Slide 1- 61

62 Copyright © 2014, 2012, 2009 Pearson Education, Inc. 34. Use the following data to do a test of independence to see if left- handedness is independent of gender (use alpha=0.05): Left-HandedRight-Handed Male1783 Female H0: “Handedness” and gender are independent. HA: Handedness and gender are dependent. Test Statistic (X2) = 5.52 P-Value = Conclusion: Since the P-value is smaller than alpha, we reject the null hypothesis. Therefore, we can conclude that there is an association (dependence) between handedness and gender. Slide 1- 62

63 Copyright © 2014, 2012, 2009 Pearson Education, Inc. 35. Use the following data to do a test of independence to see if height is independent of gender (use alpha=0.05) ShortTall Male325 Female172 H0: Height and gender are independent. HA: Height and gender are not independent. Test Statistic (X2) = P-Value = Small # Conclusion: Since the P-value is smaller than alpha, we reject the null hypothesis. Therefore, we can conclude that there is an association (dependence) between height and gender. Slide 1- 63

64 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Chi-Square and Causation Just like correlation, rejecting H 0 for a chi-square test does not imply causation. Maybe there is a lurking variable. Is there a subculture that tends to have both? Just state independence, not causation.

65 Copyright © 2014, 2012, 2009 Pearson Education, Inc. What Can Go Wrong? Don’t use chi-square methods unless you have counts. Convert percents and proportions back to counts. Beware large samples. There are no confidence intervals to see “how not independent” the categories are. Rarely are categories perfectly independent. Don’t say that one variable “depends” on the other just because they’re not independent “Depends” suggest causation. Just state that the categories are “not independent” or are “associated.”