The χ² (Chi-Squared) Test

Crazy Dice?
You roll a die 60 times and get: 3 ones, 6 twos, 19 threes, 22 fours, 6 fives, and 4 sixes.
Is this a fair die? How could we test it?
– This is not a mean of anything
– There are six proportions to test at once!

Chi-Squared Test
Tests counts of categorical data: does the observed data match what we expect to happen?
Three types of χ² tests:
– Goodness of Fit (one variable)
– Independence (two variables)
– Homogeneity (one variable from two different samples)

χ² Distribution
[Figure: χ² density curves for df = 3, df = 5, and df = 10]

χ² Distribution
– Different df = different curves
– Skewed right
– As df increases, the curve shifts right and becomes more normal
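To make the "more normal" claim concrete, here is a small sketch using standard facts that are not stated on the slides: a χ² distribution with k degrees of freedom has mean k, standard deviation √(2k), and skewness √(8/k). The function name `chi2_shape` is made up for this illustration.

```python
import math

def chi2_shape(df):
    """Return (mean, standard deviation, skewness) of a chi-squared distribution."""
    mean = df                       # the center moves right as df grows
    sd = math.sqrt(2 * df)
    skewness = math.sqrt(8 / df)    # shrinks toward 0 (symmetric) as df grows
    return mean, sd, skewness

for df in (3, 5, 10):
    mean, sd, skew = chi2_shape(df)
    print(f"df={df:2d}  mean={mean}  sd={sd:.2f}  skewness={skew:.2f}")
```

The skewness column falls as df rises, which is the numerical counterpart of the curves flattening out and looking more bell-shaped.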

χ² Conditions
– Reasonably random sample
– Counts of categorical data: we expect each category to happen at least once
– Sample size must be large enough: we expect at least five in each category
***Be sure to list expected counts!
Combine the last two together: all expected counts are at least 5
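The expected-count condition is easy to check mechanically. The sketch below is only an illustration; the helper name `expected_counts_ok` and the 18-roll example are assumptions, not part of the presentation.

```python
def expected_counts_ok(expected, minimum=5):
    """True when every expected count is at least `minimum`."""
    return all(count >= minimum for count in expected)

# Fair die rolled 60 times: each face is expected 60 / 6 = 10 times.
fair_die_expected = [60 / 6] * 6
print(expected_counts_ok(fair_die_expected))

# Hypothetical: only 18 rolls would give expected counts of 3, which is too small.
print(expected_counts_ok([18 / 6] * 6))
```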

χ² Formula
χ² = Σ (observed – expected)² / expected, summed over all categories
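As a worked example, the standard statistic (the sum of (observed – expected)² / expected over all categories) can be applied to the 60 die rolls from the opening slide. This is a minimal sketch; `chi2_statistic` is a name chosen here, and the expected counts assume a fair die.

```python
def chi2_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [3, 6, 19, 22, 6, 4]          # ones through sixes
expected = [sum(observed) / 6] * 6       # fair die: 60 / 6 = 10 per face

stat = chi2_statistic(observed, expected)
print(f"chi-squared = {stat:.1f}")
```

The threes and fours contribute most of the total, since they sit furthest from the expected 10.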

χ² Goodness of Fit Test
– Univariate data
– How well do observed counts “fit” what we expect the counts to be?
– Use χ²cdf to find p-values
– df = number of categories – 1
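For readers without a calculator, the upper-tail area that χ²cdf(statistic, 10^99, df) reports can be sketched in plain Python. This is one possible approach for whole-number df, not the calculator's actual algorithm: it starts from the known tail for df = 1 or df = 2 and steps up two degrees of freedom at a time. The function name `chi2_upper_tail` is an assumption, and the die data come from the opening slide.

```python
import math

def chi2_upper_tail(x, df):
    """P(chi-squared with integer df >= x)."""
    z = x / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)               # df = 2: tail is exp(-x/2)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))    # df = 1: tail is erfc(sqrt(x/2))
    # Each pass adds the term that raises df by 2.
    while a < df / 2:
        q += z ** a * math.exp(-z) / math.gamma(a + 1)
        a += 1
    return q

observed = [3, 6, 19, 22, 6, 4]                  # the 60 die rolls
expected = 60 / 6                                # fair die
stat = sum((o - expected) ** 2 / expected for o in observed)
df = len(observed) - 1                           # number of categories - 1
p_value = chi2_upper_tail(stat, df)
print(f"df = {df}, p-value = {p_value:.2e}")
```

A p-value this small would lead us to reject the hypothesis that the die is fair.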

Hypotheses: Written in words!
H₀: The data fits what we expect
Hₐ: The data doesn't fit what we expect
(In context!)

Does your zodiac sign determine how successful you will be? Fortune magazine collected the zodiac signs of 256 heads of the largest 400 companies. Is there sufficient evidence to suggest successful people are more likely to be born under some signs than others?
Aries   23    Libra        18    Leo       20
Taurus  20    Scorpio      21    Virgo     19
Gemini  18    Sagittarius  19    Aquarius  24
Cancer  23    Capricorn    22    Pisces    29
How many would you expect in each sign if there were no difference between them? How many degrees of freedom?
– We expect CEOs to be born equally under all signs: 256/12 ≈ 21.33
– Since there are 12 signs, df = 12 – 1 = 11

Conditions:
– Reasonably random sample
– All expected counts (21.33) > 5
H₀: The same number of CEOs are born under each sign.
Hₐ: More CEOs are born under some signs than others.
p-value = χ²cdf(5.094, 10^99, 11) = .9265
α = .05
Since p-value > α, we fail to reject H₀. There is not sufficient evidence to suggest that more CEOs are born under some signs than others.
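The 5.094 and 11 passed to χ²cdf above can be reproduced from the zodiac counts. The sketch below assumes the usual statistic, the sum of (observed – expected)² / expected; the variable names are illustrative.

```python
zodiac_counts = {
    "Aries": 23, "Libra": 18, "Leo": 20,
    "Taurus": 20, "Scorpio": 21, "Virgo": 19,
    "Gemini": 18, "Sagittarius": 19, "Aquarius": 24,
    "Cancer": 23, "Capricorn": 22, "Pisces": 29,
}

n = sum(zodiac_counts.values())        # total number of CEOs
expected = n / len(zodiac_counts)      # equal share for each of the 12 signs
stat = sum((obs - expected) ** 2 / expected for obs in zodiac_counts.values())
df = len(zodiac_counts) - 1

print(f"n = {n}, expected per sign = {expected:.2f}")
print(f"chi-squared = {stat:.3f}, df = {df}")
```

Pisces (29) is the only sign far from 21.33, and on its own it is not enough to produce a large statistic.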

A company says its premium mixture of nuts contains 10% Brazil nuts, 20% cashews, 20% almonds, 10% hazelnuts, and 40% peanuts. You buy a large can and separate the nuts. Upon weighing them, you find there are 112g of Brazil nuts, 183g of cashews, 207g of almonds, 71g of hazelnuts, and 446g of peanuts. You wonder: Is your mix significantly different from what the company advertises?
Why is the chi-squared goodness-of-fit test NOT appropriate here?
– We don't have counts of nuts
What could we do to use chi-squared?
– Count the nuts instead of weighing them
The can has 300 total nuts. What are the expected counts of each type of nut?
Type        Brazil  Cashew  Almond  Hazelnut  Peanut
Exp. Count  30      60      60      30        120

Offspring of certain fruit flies may have yellow or ebony bodies and normal wings or short wings. Genetic theory predicts that these traits will appear in the ratio 9:3:3:1 (yellow & normal, yellow & short, ebony & normal, ebony & short). A researcher checks 100 such flies and finds the distribution of traits to be 59, 20, 11, and 10, respectively. What are the expected counts? df?
9 + 3 + 3 + 1 = 16, so we expect 9/16 to be Y & N, 3/16 to be Y & S, 3/16 to be E & N, and 1/16 to be E & S.
Expected counts: Y & N = 56.25, Y & S = 18.75, E & N = 18.75, E & S = 6.25
4 categories, so df = 4 – 1 = 3
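Turning a predicted ratio into expected counts is a one-line calculation. The sketch below works it through for the 9:3:3:1 model and then applies the usual statistic, the sum of (observed – expected)² / expected; the variable names are chosen for illustration.

```python
ratio = [9, 3, 3, 1]            # Y & N, Y & S, E & N, E & S
observed = [59, 20, 11, 10]     # the researcher's 100 flies

n = sum(observed)
expected = [n * part / sum(ratio) for part in ratio]
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(ratio) - 1

print("expected counts:", expected)
print(f"chi-squared = {stat:.3f}, df = {df}")
```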

Are the results consistent with the genetic model's predictions?
Conditions:
– Reasonably random sample
– All expected counts > 5 (Y/N = 56.25, Y/S = 18.75, E/N = 18.75, E/S = 6.25)
H₀: The distribution of flies is the same as the genetic model.
Hₐ: The distribution of flies is different from the genetic model.
p-value = χ²cdf(5.671, 10^99, 3) = .129
α = .05
Since p-value > α, we fail to reject H₀. There is not sufficient evidence to suggest that the distribution of fruit flies is different from the genetic model.

χ² Test for Independence
– Bivariate data from one sample
– Are two categorical variables dependent or independent?
– Use χ²-Test to find p-values

Same conditions and formula!

Hypotheses: Written in words!
H₀: The variables are independent
Hₐ: The variables are dependent
(In context!)

A beef distributor wants to determine whether there is a relationship between geographic region and preferred cut of meat. If there is no relationship, we will say that beef preference is independent of geographic region. Suppose that, in a random sample of 500 customers, 300 are from the North and 200 from the South. Also, 150 prefer cut A, 275 prefer cut B, and 75 prefer cut C.

If beef preference is independent of geographic region, how would we expect this table to be filled in?
        North  South  Total
Cut A                 150
Cut B                 275
Cut C                 75
Total   300    200    500

Expected Counts
Assuming H₀ is true, expected count = (row total × column total) / table total
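Under independence, each cell's expected count is its row total times its column total, divided by the table total. A short sketch applying that standard rule to the beef distributor's margins (variable names are illustrative):

```python
row_totals = {"Cut A": 150, "Cut B": 275, "Cut C": 75}
col_totals = {"North": 300, "South": 200}
table_total = 500

# expected[cut][region] = row total * column total / table total
expected = {
    cut: {region: r * c / table_total for region, c in col_totals.items()}
    for cut, r in row_totals.items()
}

for cut, cells in expected.items():
    print(cut, cells)
```

Each row of the result splits 60% North and 40% South, mirroring the overall regional split, which is exactly what "no relationship" means here.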

Degrees of Freedom
df = (number of rows – 1)(number of columns – 1)
Or:
1. Cover up one row & one column
2. Count the number of cells remaining
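The "cover up" shortcut and the usual formula, df = (rows – 1)(columns – 1), give the same answer. The sketch below checks that for the 3 × 2 beef table; both function names are made up for the example.

```python
def two_way_df(n_rows, n_cols):
    """Degrees of freedom for a two-way table: (rows - 1)(columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)

def cover_up_df(table):
    """Cover up one row and one column, then count the cells that remain."""
    remaining = [row[1:] for row in table[1:]]
    return sum(len(row) for row in remaining)

# Placeholder cells for a table with 3 cuts (rows) and 2 regions (columns).
beef_table = [[None, None] for _ in range(3)]

print(two_way_df(3, 2), cover_up_df(beef_table))
```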

In the actual sample of 500 consumers, the observed counts were as follows:
[Table: observed counts for Cut A, Cut B, and Cut C by North and South, with totals; values missing]
Is there sufficient evidence to suggest that geographic region and beef preference are not independent?

Conditions:
– Reasonably random sample
– All expected counts > 5
Expected Counts:
     N    S
A    90   60
B    165  110
C    45   30
H₀: Geographic region and beef preference are independent
Hₐ: Geographic region and beef preference are dependent
p-value = .0226
α = .05
Since p-value < α, we reject H₀. There is sufficient evidence to suggest that geographic region and beef preference are dependent.

χ² Test for Homogeneity
– One categorical variable from two (or more) independent samples
– Are the two populations the same (homogeneous)?
– Use χ²-Test to find p-values

Conditions: THE SAME!
Formula: THE SAME!
Expected counts & df: THE SAME AS INDEPENDENCE!
Only change? Hypotheses!
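Because the arithmetic is identical for independence and homogeneity, one routine can serve both tests. The sketch below is an illustration only: `chi2_two_way` is a made-up name, the 2 × 2 counts are hypothetical and do not come from the presentation, and it assumes the standard rules (expected = row total × column total / table total, df = (rows – 1)(columns – 1)).

```python
def chi2_two_way(table):
    """Return (chi-squared statistic, df) for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# HYPOTHETICAL counts: rows are two categories, columns are two samples.
hypothetical = [[30, 20],
                [20, 30]]

stat, df = chi2_two_way(hypothetical)
print(f"chi-squared = {stat:.1f}, df = {df}")
```

Only the wording of the hypotheses tells you whether this output answers an independence question or a homogeneity question.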

Hypotheses: Written in words!
H₀: The distributions are the same
Hₐ: The distributions are different
(In context!)

The following data is on drinks per week for independently chosen random samples of male and female students (low = 1-7, moderate = 8-24, high = 25 or more). Does there appear to be a gender difference with respect to drinking behavior?
[Table: observed counts for None, Low, Moderate, and High by Men and Women, with totals; values missing]

Conditions:
– Reasonably random samples
– All expected counts > 5
Expected Counts: [Table: None, Low, Mod, and High by Men and Women; values missing]
H₀: Drinking behavior is the same for men & women
Hₐ: Drinking behavior is not the same for men & women
p-value ≈ 0
α = .05
Since p-value < α, we reject H₀. There is sufficient evidence to suggest drinking behavior is not the same for men & women.