FPP 28 Chi-square test


More types of inference for nominal variables
Nominal data are categorical with more than two categories.
Chi-squared goodness of fit test: compare the observed frequencies of a nominal variable to hypothesized probabilities. Applies to one categorical variable with more than two categories.
Chi-squared test of independence: test whether two nominal variables are independent. Applies to two categorical variables, with at least one having more than two categories.

Goodness of fit test
Do people admit themselves to hospitals more frequently close to their birthday?
Data: a random sample of 200 people admitted to hospitals, with the number of admissions tabulated by days from birthday (within 7, 8-30, 31-90, 91+).

Goodness of fit test
Assume there is no birthday effect, that is, people admit randomly. Then:
Pr(within 7) = 15/365 = .0411
Pr(8-30) = 46/365 = .1260
Pr(31-90) = 120/365 = .3288
Pr(91+) = 184/365 = .5041
So, in a sample of 200 people, we'd expect about 8.2 to be in "within 7", 25.2 to be in "8-30", 65.8 to be in "31-90", and 100.8 to be in "91+".
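The hypothesized probabilities and expected counts can be checked with a few lines of Python. This is a minimal sketch: the day counts per window (15, 46, 120, 184) are an assumption inferred from the probabilities on the slide, and the names `days_in_window`, `probs`, and `expected` are illustrative only.

```python
from fractions import Fraction

# Days of the year in each "days from birthday" window (assumed from the
# slide's probabilities): 15 = the birthday plus 7 days on either side,
# 46 = 2 * 23, 120 = 2 * 60, and 184 = the remaining days of a 365-day year.
days_in_window = {"within 7": 15, "8-30": 46, "31-90": 120, "91+": 184}
n = 200  # sample size given on the slide

# Probability of each window if admissions are spread evenly over the year.
probs = {k: Fraction(d, 365) for k, d in days_in_window.items()}
# Expected count in each window = sample size * hypothesized probability.
expected = {k: float(n * p) for k, p in probs.items()}

for k in days_in_window:
    print(f"{k:>9}: Pr = {float(probs[k]):.4f}, expected = {expected[k]:.2f}")
```

The expected counts need not be whole numbers; they are long-run averages under the null hypothesis, not counts anyone would actually observe.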

Goodness of fit test
If admissions are random, we expect the sample frequencies and hypothesized probabilities to be similar. But, as always, the sample frequencies are affected by chance error. So we need to see whether the sample frequencies could plausibly have resulted from chance error if the hypothesized probabilities are true. Let's build a hypothesis test.

Goodness of fit test: Hypotheses
The claim (alternative hypothesis) is that admission probabilities change according to days from birthday. The opposite of the claim (null hypothesis) is that the probabilities are in accordance with random admissions.
H0: Pr(within 7) = .0411, Pr(8-30) = .1260, Pr(31-90) = .3288, Pr(91+) = .5041
HA: the probabilities are different from those in H0.

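Goodness of fit test: Test statistic
Chi-squared test statistic:
$$X^2 = \sum_{\text{cells}} \frac{(\text{Obs} - \text{Exp})^2}{\text{Exp}}$$

Goodness of fit test: Test statistic
For each cell, tabulate Obs, Exp, Dif = Obs - Exp, Dif^2, and Dif^2/Exp. X^2 is the sum of the Dif^2/Exp column.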
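The cell-by-cell calculation can be sketched in Python as follows. The observed counts used here are HYPOTHETICAL placeholders chosen only so that they sum to 200; they are not the data from the slide. The function name `chi_squared_table` is likewise illustrative.

```python
def chi_squared_table(observed, expected):
    """Return per-cell rows (obs, exp, dif, dif^2, dif^2/exp) and the X^2 total."""
    rows = []
    for obs, exp in zip(observed, expected):
        dif = obs - exp
        rows.append((obs, exp, dif, dif ** 2, dif ** 2 / exp))
    x2 = sum(row[-1] for row in rows)  # X^2 is the sum of the last column
    return rows, x2

# Expected counts under the no-birthday-effect hypothesis: 200 * days / 365,
# with the day counts 15, 46, 120, 184 assumed from the slide's probabilities.
expected = [200 * d / 365 for d in (15, 46, 120, 184)]
# HYPOTHETICAL observed counts, for illustration only (not the slide's data).
observed = [10, 25, 65, 100]

rows, x2 = chi_squared_table(observed, expected)
print("  Obs      Exp     Dif    Dif^2  Dif^2/Exp")
for obs, exp, dif, dif2, contrib in rows:
    print(f"{obs:5d} {exp:8.2f} {dif:7.2f} {dif2:8.2f} {contrib:10.3f}")
print(f"X^2 = {x2:.3f}")
```

Cells where the observed count is far from the expected count, relative to the size of the expected count, contribute most to X^2.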

Goodness of fit test: Calculate p-value
X^2 has a chi-squared distribution with degrees of freedom equal to the number of categories minus 1. In this case, df = 4 - 1 = 3.

Goodness of fit test: Calculate p-value
To get a p-value, calculate the area under the chi-squared curve to the right of the observed X^2. Using JMP, this area is about .70. If the null hypothesis is true, there is a 70% chance of observing a value of X^2 as or more extreme than the one observed. Using the table, the p-value is between 0.90 and 0.70.
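For readers without JMP, the right-tail area can be computed directly. The sketch below relies on the closed-form tail area of the chi-squared distribution with exactly 3 degrees of freedom, so it is not a general-purpose routine. The critical values 0.584 and 1.424 are the standard table entries for df = 3 at tail areas 0.90 and 0.70; the function name `chi2_sf_df3` is illustrative.

```python
import math

def chi2_sf_df3(x):
    """Area to the right of x under the chi-squared curve with 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    # Closed form that holds for df = 3 only.
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Standard table entries for df = 3 at upper-tail areas 0.90 and 0.70.
for crit in (0.584, 1.424):
    print(f"X^2 = {crit}: area to the right = {chi2_sf_df3(crit):.3f}")
```

An observed X^2 that falls between these two table entries therefore has a p-value between 0.90 and 0.70, which is how the table is read on the slide.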

Chi-squared table

JMP output admissions

Goodness of fit test: Judging p-value
The p-value of .70 is large, indicating that the difference between the observed and expected counts could well occur by random chance when the null hypothesis is true. Therefore, we cannot reject the null hypothesis. There is not enough evidence to conclude that admission rates change according to days from birthday.

Independence test
Is birth order related to delinquency? Nye (1958) randomly sampled 1154 high school girls and asked if they had been "delinquent".

Birth order   Delinquent   Not delinquent
Eldest            24            450
In Between        29            312
Youngest          35            211
Only              23             70

Sample conditional frequencies
Proportion delinquent for each birth order status: Oldest .05, Middle .085, Youngest .14, Only .25.
Based on the conditional frequencies, it appears that youngest and only children are more delinquent. Could these sample frequencies have plausibly occurred by chance if there is no relationship between birth order and delinquency?
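The conditional frequencies follow from the counts in the birth-order table. In this sketch, reading the first count in each row as "delinquent" and the second as "not delinquent" is an assumption, though it is the reading that reproduces the proportions on the slide. The names `counts` and `rates` are illustrative.

```python
# (delinquent, not delinquent) counts by birth order, as read from the table.
counts = {
    "Eldest": (24, 450),
    "In Between": (29, 312),
    "Youngest": (35, 211),
    "Only": (23, 70),
}

# Conditional frequency = delinquent count / row total for that birth order.
rates = {k: d / (d + nd) for k, (d, nd) in counts.items()}
for k, r in rates.items():
    print(f"{k:>10}: {r:.3f}")
```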

Test of independence: Hypotheses
We want to show that there is some relationship between birth order and delinquency. The opposite is that there is no relationship.
H0: birth order and delinquency are independent.
HA: birth order and delinquency are dependent.

Implications of independence: Expected counts
Under independence, Pr(oldest and delinquent) = Pr(oldest) * Pr(delinquent).
Estimate Pr(oldest) as the marginal frequency of oldest, 474/1154.
Estimate Pr(delinquent) as the marginal frequency of delinquent, 111/1154.
Hence, estimate Pr(oldest and delinquent) as (474/1154) * (111/1154) = .0395.
The expected number of oldest and delinquent, under independence, equals 1154 * (474/1154) * (111/1154) = 45.6.
This is repeated for all the other cells in the table.
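Repeating the expected-count calculation for every cell is mechanical, so a short Python sketch may help. It assumes the birth-order table is laid out with columns (delinquent, not delinquent); the variable names are illustrative.

```python
# Observed counts; columns are assumed to be (delinquent, not delinquent).
observed = [
    [24, 450],  # Eldest
    [29, 312],  # In Between
    [35, 211],  # Youngest
    [23, 70],   # Only
]

n = sum(sum(row) for row in observed)
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

# Under independence:
# expected = n * (row_total / n) * (col_total / n) = row_total * col_total / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

for row in expected:
    print("  ".join(f"{e:7.2f}" for e in row))
```

The expected counts keep the same row and column totals as the observed table; only the way the counts are spread across the cells changes.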

Test of independence: Expected counts
Expected counts are computed for each birth order category (Oldest, In Between, Youngest, Only). Next we compare the observed counts with the expected counts to get a test statistic.

Use the X^2 statistic as the test statistic:
$$X^2 = \sum_{\text{cells}} \frac{(\text{Obs} - \text{Exp})^2}{\text{Exp}}$$

Test of independence: Calculate the p-value
X^2 has a chi-squared distribution with degrees of freedom df = (number of rows - 1) * (number of columns - 1). In the delinquency problem, df = (4 - 1) * (2 - 1) = 3.
The p-value is the area under the chi-squared curve to the right of the observed X^2, and here it is very small: there is only a very small chance of getting an X^2 as or more extreme than the one observed.
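Putting the pieces together, the whole test of independence can be sketched in plain Python. The p-value helper uses a closed form that is valid only for df = 3, which happens to match this table; the function names are illustrative, and the column order (delinquent, not delinquent) is an assumption about how the table reads.

```python
import math

def chi2_independence(observed):
    """Return (X^2, df) for a two-way table of counts given as a list of rows."""
    n = sum(sum(row) for row in observed)
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    x2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n  # expected under independence
            x2 += (obs - exp) ** 2 / exp
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return x2, df

def chi2_sf_df3(x):
    """Right-tail area of the chi-squared distribution with 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Rows: Eldest, In Between, Youngest, Only; columns assumed (delinquent, not).
delinquency = [[24, 450], [29, 312], [35, 211], [23, 70]]
x2, df = chi2_independence(delinquency)
p_value = chi2_sf_df3(x2)  # valid only because df == 3 for this table
print(f"X^2 = {x2:.2f}, df = {df}, p-value = {p_value:.2e}")
```

For tables with other dimensions the degrees of freedom change, and a general chi-squared tail function from a statistics package would be needed in place of `chi2_sf_df3`.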

JMP output for chi-squared test
This is a small p-value. It is unlikely we'd observe data like this if the null hypothesis were true. There does appear to be an association between delinquency and birth order.

Chi-squared test details
Requires simple random samples.
Works best when expected frequencies in each cell are at least 5.
Should not have zero counts.
How one specifies categories can affect results.

Chi-squared test items
What do I do when expected counts are less than 5? Try to get more data. Barring that, you can collapse categories.
Example: Is baldness related to heart disease? (see JMP for data set)

Baldness   Disease   Number of people
None       Yes       251
None       No        331
Little     Yes       165
Little     No        221
Some       Yes       195
Some       No        185
Much       Yes        50
Much       No         34
Extreme    Yes         2
Extreme    No          1

Combine the "extreme" and "much" categories:

Baldness          Disease   Number of people
Much or extreme   Yes        52
Much or extreme   No         35

This changes the question slightly, since we have a new category.
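The check for small expected counts and the collapsing step can be sketched as follows, using the baldness counts from the slide. The helper names (`expected_counts`, `small_cells`, `collapse`) are hypothetical, and the threshold of 5 is the rule of thumb stated in the slides.

```python
def expected_counts(table):
    """Expected (yes, no) counts under independence for a dict of row -> (yes, no)."""
    n = sum(yes + no for yes, no in table.values())
    yes_total = sum(yes for yes, _ in table.values())
    no_total = n - yes_total
    return {k: ((yes + no) * yes_total / n, (yes + no) * no_total / n)
            for k, (yes, no) in table.items()}

def small_cells(table, threshold=5):
    """Row labels that have at least one expected count below the threshold."""
    return [k for k, exp in expected_counts(table).items() if min(exp) < threshold]

def collapse(table, keys, new_key):
    """Merge the rows named in `keys` into a single row called `new_key`."""
    merged = {k: v for k, v in table.items() if k not in keys}
    merged[new_key] = tuple(sum(table[k][i] for k in keys) for i in range(2))
    return merged

# Heart disease (yes, no) counts by degree of baldness, from the slide.
baldness = {
    "None": (251, 331),
    "Little": (165, 221),
    "Some": (195, 185),
    "Much": (50, 34),
    "Extreme": (2, 1),
}

print("Rows with small expected counts:", small_cells(baldness))
collapsed = collapse(baldness, ["Much", "Extreme"], "Much or extreme")
print("Collapsed row:", collapsed["Much or extreme"])
print("Rows with small expected counts after collapsing:", small_cells(collapsed))
```

Collapsing is a judgment call: merging adjacent categories of an ordered variable such as baldness keeps the result interpretable, whereas merging unrelated categories would not.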

Chi-squared test for collapsed data for baldness example
Based on the p-value, baldness and heart disease are not independent. We see that increasing baldness is associated with increased incidence of heart disease.