Chi Square Procedures Chapter 14. Chi-Square Goodness-of-Fit Tests Section 14.1.

Slides:



Advertisements
Similar presentations
Chapter 11 Other Chi-Squared Tests
Advertisements

Chi-Squared Hypothesis Testing Using One-Way and Two-Way Frequency Tables of Categorical Variables.
CHAPTER 23: Two Categorical Variables: The Chi-Square Test
CHAPTER 23: Two Categorical Variables The Chi-Square Test ESSENTIAL STATISTICS Second Edition David S. Moore, William I. Notz, and Michael A. Fligner Lecture.
Chapter 11 Inference for Distributions of Categorical Data
Chapter 13: Inference for Tables
AP Statistics Section 14.2 A. The two-sample z procedures of chapter 13 allowed us to compare the proportions of successes in two groups (either two populations.
Chi Square Procedures Chapter 11.
Does Background Music Influence What Customers Buy?
Chapter 13: Inference for Distributions of Categorical Data
Chapter 26: Comparing Counts
CHAPTER 11 Inference for Distributions of Categorical Data
Analysis of Two-Way Tables Inference for Two-Way Tables IPS Chapter 9.1 © 2009 W.H. Freeman and Company.
Chapter 26: Comparing Counts. To analyze categorical data, we construct two-way tables and examine the counts of percents of the explanatory and response.
1 Chapter 20 Two Categorical Variables: The Chi-Square Test.
Presentation 12 Chi-Square test.
+ The Practice of Statistics, 4 th edition – For AP* STARNES, YATES, MOORE Chapter 11: Inference for Distributions of Categorical Data Section 11.2 Inference.
Lesson Inference for Two-Way Tables. Vocabulary Statistical Inference – provides methods for drawing conclusions about a population parameter from.
AP Statistics Section 14.2 A. The two-sample z procedures of chapter 13 allowed us to compare the proportions of successes in two groups (either two populations.
Goodness-of-Fit Tests and Categorical Data Analysis
 Involves testing a hypothesis.  There is no single parameter to estimate.  Considers all categories to give an overall idea of whether the observed.
A random sample of 300 doctoral degree
Analysis of two-way tables - Formulas and models for two-way tables - Goodness of fit IPS chapters 9.3 and 9.4 © 2006 W.H. Freeman and Company.
11.2 Inference for Relationships. Section 11.2 Inference for Relationships COMPUTE expected counts, conditional distributions, and contributions to the.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. 1.. Section 11-2 Goodness of Fit.
Analysis of two-way tables - Formulas and models for two-way tables - Goodness of fit IPS chapters 9.3 and 9.4 © 2006 W.H. Freeman and Company.
Chapter 26 Chi-Square Testing
CHAPTER 11 SECTION 2 Inference for Relationships.
Chapter 11 The Chi-Square Test of Association/Independence Target Goal: I can perform a chi-square test for association/independence to determine whether.
13.2 Chi-Square Test for Homogeneity & Independence AP Statistics.
Analysis of Two-Way tables Ch 9
+ Chi Square Test Homogeneity or Independence( Association)
BPS - 5th Ed. Chapter 221 Two Categorical Variables: The Chi-Square Test.
Chapter 14: Chi-Square Procedures – Test for Goodness of Fit.
Chapter 11 Chi- Square Test for Homogeneity Target Goal: I can use a chi-square test to compare 3 or more proportions. I can use a chi-square test for.
+ The Practice of Statistics, 4 th edition – For AP* STARNES, YATES, MOORE Chapter 11: Inference for Distributions of Categorical Data Section 11.2 Inference.
CHAPTER 23: Two Categorical Variables The Chi-Square Test ESSENTIAL STATISTICS Second Edition David S. Moore, William I. Notz, and Michael A. Fligner Lecture.
Lesson Inference for Two-Way Tables. Knowledge Objectives Explain what is mean by a two-way table. Define the chi-square (χ 2 ) statistic. Identify.
Section 12.2: Tests for Homogeneity and Independence in a Two-Way Table.
Lecture PowerPoint Slides Basic Practice of Statistics 7 th Edition.
Chapter 13- Inference For Tables: Chi-square Procedures Section Test for goodness of fit Section Inference for Two-Way tables Presented By:
Lecture PowerPoint Slides Basic Practice of Statistics 7 th Edition.
The Practice of Statistics Third Edition Chapter 14: Inference for Distributions of Categorical Variables: Chi-Square Procedures Copyright © 2008 by W.
The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers CHAPTER 11 Inference for Distributions of Categorical.
Chapter 14 Inference for Distribution of Categorical Variables: Chi-Squared Procedures.
Chapter 11: Categorical Data n Chi-square goodness of fit test allows us to examine a single distribution of a categorical variable in a population. n.
Class Seven Turn In: Chapter 18: 32, 34, 36 Chapter 19: 26, 34, 44 Quiz 3 For Class Eight: Chapter 20: 18, 20, 24 Chapter 22: 34, 36 Read Chapters 23 &
AP Stats Check In Where we’ve been… Chapter 7…Chapter 8… Where we are going… Significance Tests!! –Ch 9 Tests about a population proportion –Ch 9Tests.
Textbook Section * We already know how to compare two proportions for two populations/groups. * What if we want to compare the distributions of.
11/12 9. Inference for Two-Way Tables. Cocaine addiction Cocaine produces short-term feelings of physical and mental well being. To maintain the effect,
 Check the Random, Large Sample Size and Independent conditions before performing a chi-square test  Use a chi-square test for homogeneity to determine.
Vocabulary Statistical Inference – provides methods for drawing conclusions about a population parameter from sample data Expected Values– row total *
Introduction The two-sample z procedures of Chapter 10 allow us to compare the proportions of successes in two populations or for two treatments. What.
CHAPTER 11 Inference for Distributions of Categorical Data
AP Stats Check In Where we’ve been… Chapter 7…Chapter 8…
AP Stats Check In Where we’ve been… Chapter 7…Chapter 8…
Chapter 11: Inference for Distributions of Categorical Data
CHAPTER 11 Inference for Distributions of Categorical Data
Inference for Relationships
CHAPTER 11 Inference for Distributions of Categorical Data
Chapter 13: Inference for Distributions of Categorical Data
CHAPTER 11 Inference for Distributions of Categorical Data
CHAPTER 11 Inference for Distributions of Categorical Data
CHAPTER 11 Inference for Distributions of Categorical Data
Inference for Two Way Tables
CHAPTER 11 Inference for Distributions of Categorical Data
CHAPTER 11 Inference for Distributions of Categorical Data
11.2 Inference for Relationships
CHAPTER 11 Inference for Distributions of Categorical Data
CHAPTER 11 Inference for Distributions of Categorical Data
Presentation transcript:

Chi Square Procedures Chapter 14

Chi-Square Goodness-of-Fit Tests Section 14.1

 Observed Count – count of your sample  Expected Count – count expected based on proportions  Proportion for each category mult. by sample size  Round to two decimal places  Chi-Square Statistic  Think of it like a distance – either zero or positive, never negative

Properties of Chi Square distributions

Degrees of Freedom and Conditions  Degrees of freedom: number of categories – 1  Conditions:  Large Sample Size All expected count are at least 5

Table D

Calculator  Enter the observed counts in L1  Calculate the expected counts separately and enter them in L2  If you have a TI84 go to Stat, Test, χ 2 GOF and enter info.  TI-83: If that isn’t an option, define L3 as (L1 – L2) 2 /L2 Go to Math/List/Sum (L3) to calculate χ 2 To find the p-value, go to DISTR, χ 2 CDF and enter (your test statistic, very large number, df)

Inference for two way tables Section 14.2

Homogeneity vs. Independence tests  What is important here for what APstat wants is what you SAY about the result. This distinction rests on HOW you get the data.  There will always be 2 factors or variables... you might be looking at the connection between males/females... and if they agree or disagree with some question. So, one variable or factor is gender and the other is response. Now, think of the data you could get.

 A. You could take an SRS of say n= and AFTER the fact, find out if they are M or F and if they said Y or N, and do the cross tabs on the data. For neither of the two factors were you in control of how many you get, how many M or F are in the sample, or how many Y or N responses you get. B. You could decide ahead of time to separate the people into lists of M and F FIRST... and take an SRS of say n=100 from each grouping... n=100 M and n=100 F. Then after that, you sample and ask the question and get a Y or N from each of the 200 people. You still make a cross tabs table... with gender and response as the factors... BUT you know that you will have 100 M and you know you will have 100 F... so those values at the bottom or side headings of the table are KNOWN, ahead of time. In this case, you know what the overall results will be for gender... what you don't know is how they break down WITHIN each of the M or F categories... in terms of response of Y or N. So, here you control n’s for one of the two factors... but not both. APstat says the first will be called a test of independence... and the second will be called a test of homogeneity.

 Homogeneity asks whether the distribution of ONE VARIABLE is the same in TWO (or more) POPULATIONS. The design of the study involves groups that have been sampled (or assigned) separately, and the responses will address that one variable in question. Because the group members were identified in advance of collecting the data, the table's column (or perhaps row) totals are fixed in advance.  Independence asks whether there's an association between TWO VARIABLES in ONE POPULATION. The design of the study involves one group that has been cross-categorized based on responses to two variables. Only the table's total number of respondents is known in advance; the row and column totals appear only after the data have been tallied.

 If we wonder whether there's any association between eye color and handedness, we'd probably choose a random sample of people (one population) and ask them what color their eyes are and what hand they write with (2 variables). That's a question of independence.  We could examine a question about eye color and sex either way.  1) Select one sample of people, then ask about eye color and record the sex of each respondent = independence.  2) Select separate samples of males and females, then ask about eye color = homogeneity.

Homogeneity of populations chi square  The 2 sample Z procedures of last chapter allow us to compare the proportions of successes in two groups. What if we want to compare more than 2 groups?  Example: Does background music influence wine purchases (conditional distributions)  Researchers know that background music can influence the mood and purchasing behavior of customers so they played different music types and recorded the numbers of bottles of each type of wine purchased

Data then Column percents for wine and music Music WineNoneFrenchItalianTotal French Italian Other Total Music WineNoneFrenchItalianTotal French Italian Other Total100

Comparisons of different types of wine sold for different music conditions ss

Comparisons of types of wine sold for different music conditions ss

The problem of Multiple comparisons  Researches expected that music would influence type of wine purchased so music is the explanatory variable and type of wine purchased is the response variable.  In general, easiest way to describe this kind of relationship is to compare the conditional distributions of the response variables for each value of the explanatory variable  This is the column percents that give the conditional distribution of purchases for each type of music played.  But this still doesn’t give us an accurate way of analyzing b/c this becomes multiple comparisons!

Two-Way Tables  First step in comparing several proportions is to arrange the data in a two-way table that gives counts for both successes and failures. Our table in the example is a 3x3 because it has 3 rows and 3 columns (not counting total). It shows the counts for all 9 combinations of our variables.  A table with r rows and c columns is an r x c table

Stating Hypothesis  H 0 : The distribution of the response variable is the same in all “c” populations  For this example we are comparing three populations:  1. bottles of wine sold when no music playing  2. bottles of wine sold when french music playing  3. bottles of wine sold when Italian music is playing  We have 3 independent samples of sizes 84, 75, and 84 from each population.  H 0 : The proportions of each type of wine sold are the same in all 3 populations.

Computing expected cell counts  Chi Square Test: The Chi Square GOF test we did in the first section is the same, just that columns = 1.  With homogeneity tests, df = (r-1)(c-1) but everything else is calculated the same.

Conditions aa

Full example response for Wine  Step 1: Populations and Parameters : We want to use X 2 to compare the distribution of types of wine selected for each type of music. Our hypotheses are  H 0 : The distributions of wine selected are the same in all three populations of music types  H a : The distributions of wine selected are not all the same

 Step 2: Conditions: To use the chi-square test for homogeneity of populations:  The data must come from independent SRS’s from the populations of interest. We are willing to treat the subjects in the three groups as SRS’s from their respective populations.  All expected cell counts are greater than 1, and no more than 20% are less than 5.  Step 3: Calculations:  df (3-1)(3-1) = 4  X 2 = (will show you on calc shortly)  P value =.0019  Step 4: Interpretation:  There is strong evidence to reject the null and conclude that type of music being played has a significant effect on wine sales.

Calculator…finally!  Use a matrix to store observed counts:  Enter observed counts in matrix [A]  Press 2 nd X -1 (MATRIX), to EDIT, choose 1: [A].  Enter the observed counts from the two way table in the matrix in the same locations.  Specify the chi-square test, the matrix where the observed counts are found, and the matrix where expected counts will be stored  Press STAT, TESTS, choose C: χ 2 -Test  Choose calculate or draw.  If you want to see the expected counts, simply display matrix [B]:  2 nd MATRIX choose 2: [B]

Chi Square vs. Z test  They are the same! The 2 prop Z test from section 13.2 and a chi square test (with 1 df) for a 2x2 table  P values will be the same

2. Chi Square test of Association/Independence  The null hypothesis of “no difference among treatments” takes the form of “no association between two categorical variables”  Ex: Exclusive territory clause vs. success of franchise (like mcD). ExclusiveTerritory SuccessYesNoTotal Yes No Total

 1: Hypothesis  H 0 : There is no association between success and exclusive territory  H A : There is an association between success and exclusive territory  2: Conditions  To use the chi-square test of association/independence, we must check that all expected cell counts are at least 1 and that no more than 20% are less than 5 (we’re good)  3: Calculations  The test statisticX 2 is , df = 1, P- value =.013  4: Interpretation  There is sufficient evidence of an association between success and exclusive territory in the population of franchise.  *Association does not imply causation!

How do I tell which Chi Square test to do?  Examine the design of the study  For Association/Independence: There is a single sample from a single population  For homogeneity of populations- there is a sample from each of two or more populations. Each individual is classified based on a single categorical variable.  The statement of the hypothesis differs on the sampling design. It is all in the way you collect the data.  If I survey 1000 females selected at random and 1000 males selected at random to determine political affiliation, then this is a test of homogeneity.  If I call 2000 people at random, then ask their gender and political affiliation, then this is a test for independence.  Asking one question--homogeneity, asking two questions-- independence.