Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan 10/30/12 Chi-Square Tests SECTIONS 7.1, 7.2 Testing the distribution of a.

Presentation transcript:

Statistics: Unlocking the Power of Data Lock 5
STAT 101, Dr. Kari Lock Morgan, 10/30/12
Chi-Square Tests
SECTIONS 7.1, 7.2
Testing the distribution of a single categorical variable: χ² goodness of fit (7.1)
Testing for an association between two categorical variables: χ² test for association (7.2)

Comments on Projects
“Random sampling” is different from convenience sampling
Even if the sample you initially select is representative, bias is introduced if people choose not to respond
It is generally a good idea to present the response rate

Comments on Projects
The sample is whatever your variable is measured on (the rows of your dataset)
The type of inference we have learned can only generalize from sample to population (and only if the sample is random)
We have not learned how to account for uncertainty in the data values themselves

Sample and Population
Was your sample the same as your population? (Did you have data on all the cases in your population?)
a) Yes
b) No

Comments on Projects
If you have data on the whole population, inference is not necessary!

The Big Picture
[Diagram labels: Population, Sample, Sampling, Statistical Inference]

Multiple Categories
So far, we’ve learned how to do inference for categorical variables with only two categories
Today, we’ll learn how to do hypothesis tests for categorical variables with multiple categories

Rock-Paper-Scissors (Roshambo)
Play a game!
Can we use statistics to help us win?

Rock-Paper-Scissors
Which did you throw first?
a) Rock
b) Paper
c) Scissors

Rock-Paper-Scissors
ROCK   PAPER   SCISSORS
How would we test whether all of these categories are equally likely?

Hypothesis Testing
1. State hypotheses
2. Calculate a statistic, based on your sample data (the test statistic)
3. Create a distribution of this statistic, as it would be observed if the null hypothesis were true
4. Measure how extreme your test statistic from (2) is, as compared to the distribution generated in (3)

Hypotheses
Let p_i denote the proportion in the i-th category.
H0: All p_i are the same
Ha: At least one p_i differs from the others
OR
H0: Every p_i = 1/3
Ha: At least one p_i ≠ 1/3

Test Statistic
Why can’t we use the familiar formula, (sample statistic – null value) / SE, to get the test statistic?
More than one sample statistic
More than one null value
We need something a bit more complicated…

Observed Counts
The observed counts are the actual counts observed in the study
           ROCK   PAPER   SCISSORS
Observed   36     12      27

Expected Counts
The expected counts are the counts we would expect if the null hypothesis were true
For each cell, the expected count is the sample size (n) times the null proportion, p_i
           ROCK   PAPER   SCISSORS
Observed
Expected   28.3
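The expected-count rule above is a one-line computation. The Python sketch below is only an illustration: the function name is made up here, and the sample size of 85 throws is an assumption inferred from the 28.3 shown on the slide (85 / 3 ≈ 28.3), not a figure stated in the transcript.

```python
def expected_counts(n, null_proportions):
    """Expected count for each category: sample size times null proportion."""
    return [n * p for p in null_proportions]


# Assumption: n = 85 throws, inferred from the slide's 28.3 (85 / 3 is about 28.3).
n = 85
null_proportions = [1 / 3, 1 / 3, 1 / 3]  # H0: rock, paper, scissors equally likely

expected = expected_counts(n, null_proportions)
print([round(e, 1) for e in expected])
```

The expected counts always add up to n, which is a quick sanity check on the null proportions.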

Chi-Square Statistic
A test statistic is one number, computed from the data, which we can use to assess the null hypothesis
The chi-square statistic is a test statistic for categorical variables:
χ² = Σ (observed – expected)² / expected

Rock-Paper-Scissors
           ROCK   PAPER   SCISSORS
Observed
Expected   28.3

What Next?
We have a test statistic. What else do we need to perform the hypothesis test?
A distribution of the test statistic assuming H0 is true
How do we get this? Two options:
1) Simulation
2) Distributional Theory

Simulation
Let’s see what kind of statistics we would get if rock, paper, and scissors are all equally likely.
1) Take 3 scraps of paper and label them Rock, Paper, Scissors. Fold or crumple them so they are indistinguishable. Choose one at random.
2) What did you get? (a) Rock (b) Paper (c) Scissors
3) Calculate the χ² statistic.
4) Form a distribution.
5) How extreme is the actual test statistic?
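The paper-scrap activity above can be mimicked in software. The following Python sketch is one possible version, not the course's own tool (the class used StatKey): it draws many samples under the null hypothesis, computes the χ² statistic for each, and reports the proportion of simulated statistics at least as large as an observed one. The sample size of 85 and the observed counts are hypothetical values chosen for illustration, and all function names are made up here.

```python
import random


def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def simulate_null_statistics(n, null_proportions, num_simulations, rng):
    """Chi-square statistics for samples of size n drawn under the null."""
    categories = list(range(len(null_proportions)))
    expected = [n * p for p in null_proportions]
    stats = []
    for _ in range(num_simulations):
        # One simulated sample: n draws, like picking a paper scrap n times.
        draws = rng.choices(categories, weights=null_proportions, k=n)
        observed = [draws.count(c) for c in categories]
        stats.append(chi_square_statistic(observed, expected))
    return stats


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 85  # assumption: sample size consistent with the slide's expected count of 28.3
null_proportions = [1 / 3, 1 / 3, 1 / 3]
null_stats = simulate_null_statistics(n, null_proportions, 2000, rng)

# Hypothetical observed counts (rock, paper, scissors), for illustration only.
hypothetical_observed = [40, 20, 25]
observed_stat = chi_square_statistic(
    hypothetical_observed, [n * p for p in null_proportions]
)

# Simulated p-value: share of null statistics as extreme as the observed one.
p_value = sum(s >= observed_stat for s in null_stats) / len(null_stats)
print(round(observed_stat, 3), p_value)
```

Only the upper tail is counted, because larger χ² values mean observed counts farther from the expected counts.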

Simulation

Chi-Square (χ²) Distribution
If each of the expected counts is at least 5, AND if the null hypothesis is true, then the χ² statistic follows a χ² distribution, with degrees of freedom equal to
df = number of categories – 1
Rock-Paper-Scissors: df = 3 – 1 = 2

Chi-Square Distribution

p-value using χ² distribution

Chi-Square p-values
1. StatKey – simulation or theoretical
2. RStudio: tail.p("chisquare", stat, df, tail="upper")
3. TI-83: 2nd → DISTR → 7: χ²cdf( → lower bound, upper bound, df
Because the χ² statistic is never negative, the p-value (the area as extreme or more extreme) is always the upper tail
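For readers without StatKey, RStudio, or a TI-83, the upper-tail area can also be computed directly. The Python sketch below is a stand-in written for this transcript, not one of the tools listed above; the function name is hypothetical. It uses only the standard library and relies on the closed-form expressions for the χ² upper tail that exist when df is a whole number (for df = 2 the area is simply exp(-stat / 2)).

```python
import math


def chi2_upper_tail(stat, df):
    """Upper-tail area of a chi-square distribution with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if stat <= 0:
        return 1.0  # the whole distribution lies above zero
    half = stat / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: start from the df = 1 tail, then add one term per extra 2 df.
    total = math.erfc(math.sqrt(half))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (j - 0.5) / math.gamma(j + 0.5)
    return total


# Rock-paper-scissors has df = 3 - 1 = 2; 4.0 is an arbitrary example statistic.
print(round(chi2_upper_tail(4.0, 2), 4))
```

Libraries such as SciPy provide the same quantity for any df; the hand-rolled version is used here only to keep the sketch dependency-free.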

Goodness of Fit
A χ² test for goodness of fit tests whether the distribution of a categorical variable is the same as some null hypothesized distribution
The null hypothesized proportions for each category do not have to be the same

Chi-Square Test for Goodness of Fit
1. State null hypothesized proportions for each category, p_i. The alternative is that at least one of the proportions is different from the value specified in the null.
2. Calculate the expected counts for each cell as n × p_i. Make sure they are all at least 5 to proceed.
3. Calculate the χ² statistic: χ² = Σ (observed – expected)² / expected
4. Compute the p-value as the area in the tail above the χ² statistic, for a χ² distribution with df = (# of categories – 1)
5. Interpret the p-value in context.

Mendel’s Pea Experiment
In 1866, Gregor Mendel, the “father of genetics,” published the results of his experiments on peas
He found that his experimental distribution of peas closely matched the theoretical distribution predicted by his theory of genetics (involving alleles, and dominant and recessive genes)
Source: Mendel, Gregor. (1866). Versuche über Pflanzen-Hybriden. Verh. Naturforsch. Ver. Brünn 4: 3–47 (in English in 1901, Experiments in Plant Hybridization, J. R. Hortic. Soc. 26: 1–32)

Mendel’s Pea Experiment
Mate SSYY with ssyy → 1st Generation: all Ss Yy
Mate 1st Generation → 2nd Generation
S, Y: Dominant
s, y: Recessive
Second Generation
Phenotype          Theoretical Proportion
Round, Yellow      9/16
Round, Green       3/16
Wrinkled, Yellow   3/16
Wrinkled, Green    1/16

Mendel’s Pea Experiment
Phenotype          Theoretical Proportion   Observed Counts
Round, Yellow      9/16                     315
Round, Green       3/16                     101
Wrinkled, Yellow   3/16                     108
Wrinkled, Green    1/16                     32
Let’s test these data against the null hypothesis that each p_i equals its theoretical value, based on genetics

Mendel’s Pea Experiment
Phenotype          Theoretical Proportion   Observed Counts
Round, Yellow      9/16                     315
Round, Green       3/16                     101
Wrinkled, Yellow   3/16                     108
Wrinkled, Green    1/16                     32
The χ² statistic is made up of 4 components (because there are 4 categories). What is the contribution of the first cell (315)?
(a) 0.016
(b) 1.05
(c) 5.21
(d) 107.2

Mendel’s Pea Experiment
Phenotype          p_i    Observed Counts   Expected Counts
Round, Yellow      9/16   315               556 × 9/16 = 312.75
Round, Green       3/16   101               556 × 3/16 = 104.25
Wrinkled, Yellow   3/16   108               556 × 3/16 = 104.25
Wrinkled, Green    1/16   32                556 × 1/16 = 34.75

Mendel’s Pea Experiment
χ² = 0.47
Two options:
o Simulate a randomization distribution
o Compare to a χ² distribution with 4 – 1 = 3 df
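The Mendel numbers can be reproduced with a few lines of Python. This is a sketch of the goodness-of-fit steps applied to the observed counts and theoretical proportions given in the slides; the variable and function names are illustrative, and the tail-area helper uses the closed form that holds specifically for 3 degrees of freedom rather than a general-purpose routine.

```python
import math

# Observed counts and theoretical proportions from the slides.
observed = [315, 101, 108, 32]  # round/yellow, round/green, wrinkled/yellow, wrinkled/green
null_proportions = [9 / 16, 3 / 16, 3 / 16, 1 / 16]

n = sum(observed)  # 556 peas in total
expected = [n * p for p in null_proportions]
assert all(e >= 5 for e in expected)  # condition for using the chi-square distribution

# One contribution per category; the statistic is their sum.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = sum(contributions)
df = len(observed) - 1


def chi2_upper_tail_3df(stat):
    """Upper-tail area of a chi-square distribution with exactly 3 df."""
    half = stat / 2.0
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 * stat / math.pi) * math.exp(-half)


p_value = chi2_upper_tail_3df(chi2)
print(expected)
print([round(c, 3) for c in contributions], round(chi2, 2), df)
print(round(p_value, 2))
```

The first contribution rounds to 0.016 and the statistic to 0.47, in line with the slides. The tail area is large, so the data are quite compatible with the theoretical proportions.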

Mendel’s Pea Experiment
p-value =
Does this prove Mendel’s theory of genetics? Or at least prove that his theoretical proportions for pea phenotypes were correct?
a) Yes
b) No
You can never accept the null hypothesis!!!
The null hypothesis can only be rejected with statistically significant results. A high p-value doesn’t let you conclude anything.

Two Categorical Variables
The statistics behind a χ² test easily extend to two categorical variables
A χ² test for association (often called a χ² test for independence) tests for an association between two categorical variables

Rock-Paper-Scissors
Did reading the example on Rock-Paper-Scissors in the textbook influence the way you play the game?
Did you read the Rock-Paper-Scissors example in Section 6.3 of the textbook?
a) Yes
b) No

Rock-Paper-Scissors
                       Rock   Paper   Scissors   Total
Read Example           8      2       6          16
Did Not Read Example
Total
H0: Whether or not a person read the example in the textbook is not associated with how a person plays rock-paper-scissors
Ha: Whether or not a person read the example in the textbook is associated with how a person plays rock-paper-scissors

Expected Counts
expected count = (row total × column total) / n
Calculate the expected count for your cell.
Note: This maintains the row and column totals, but redistributes the counts as if there were no association

Chi-Square Statistic
χ² = 0.617

Randomization Test

Chi-Square Statistic
If the expected counts are all at least 5, this can be compared to a χ² distribution with (r – 1)(c – 1) degrees of freedom, where r = number of rows, c = number of columns

Chi-Square Distribution
df = (r – 1)(c – 1) = (2 – 1)(3 – 1) = 2

Chi-Square Test for Association
1. H0: The two variables are not associated
   Ha: The two variables are associated
2. Calculate the expected counts for each cell: expected count = (row total × column total) / n. Make sure they are all at least 5 to proceed.
3. Calculate the χ² statistic: χ² = Σ (observed – expected)² / expected
4. Compute the p-value as the area in the tail above the χ² statistic, for a χ² distribution with df = (r – 1) × (c – 1)
5. Interpret the p-value in context.

Metal Tags and Penguins
Let’s return to the question of whether metal tags are detrimental to penguins.
                 Survived   Died   Total
Metal Tag
Electronic Tag
Total
Is there an association between type of tag and survival?
(a) Yes
(b) No
Source: Saraux et al. (2011). “Reliability of flipper-banded penguins as indicators of climate change,” Nature, 469.

Metal Tags and Penguins
Calculate expected counts (shown in parentheses). First cell:
Calculate the chi-square statistic:
p-value:
                 Survived     Died           Total
Metal Tag        33 (47.38)   134 (119.62)   167
Electronic Tag   68 (53.62)   121 (135.38)   189
Total            101          255            356
df = (2 – 1) × (2 – 1) = 1
p-value =
There is strong evidence that metal tags are detrimental to penguins.
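The penguin calculation can be checked with a short Python sketch of the test-for-association steps. The counts are the ones in the table above; the variable names are illustrative, and the p-value line uses a shortcut that is valid only for 1 degree of freedom (the upper tail of a χ² with 1 df equals the two-sided tail of a standard normal).

```python
import math

# Observed counts from the slides: rows are tag type, columns are survived / died.
table = [
    [33, 134],  # metal tag
    [68, 121],  # electronic tag
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Expected count for each cell: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(table, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(table) - 1) * (len(table[0]) - 1)

# Shortcut valid for df = 1 only.
p_value = math.erfc(math.sqrt(chi2 / 2.0))

print([[round(e, 2) for e in row] for row in expected])
print(round(chi2, 2), df, p_value)
```

The expected counts match the values in parentheses in the table, and the very small tail area is what supports the slide's conclusion of strong evidence.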

Metal Tags and Penguins
Difference in proportions test:
z = –3.387 (we got something slightly different due to rounding)
p-value = (two-sided)
Chi-Square test:
χ² =
p-value =
Note: (–3.387)² = 11.47

Two Categorical Variables with Two Categories Each
When testing two categorical variables with two categories each, the χ² statistic is exactly the square of the z-statistic
The χ² distribution with 1 df is exactly the distribution of the square of a standard normal variable
Doing a two-sided test for a difference in proportions or doing a χ² test will produce exactly the same p-value.
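The equivalence described above can be verified numerically on the penguin counts. The Python sketch below assumes the z-statistic uses the pooled proportion in its standard error, which is the usual choice for a test of a difference in proportions but is not spelled out in the transcript; variable names are illustrative.

```python
import math

# Penguin counts from the slides.
metal_survived, metal_total = 33, 167
electronic_survived, electronic_total = 68, 189

# Difference-in-proportions z-statistic (assumption: pooled standard error).
p1 = metal_survived / metal_total
p2 = electronic_survived / electronic_total
pooled = (metal_survived + electronic_survived) / (metal_total + electronic_total)
se = math.sqrt(pooled * (1 - pooled) * (1 / metal_total + 1 / electronic_total))
z = (p1 - p2) / se
z_p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail area

# Chi-square statistic for the same 2 x 2 table.
table = [
    [metal_survived, metal_total - metal_survived],
    [electronic_survived, electronic_total - electronic_survived],
]
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)
chi2 = sum(
    (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(2)
    for j in range(2)
)
chi2_p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail, valid for df = 1

print(round(z, 3), round(z ** 2, 2), round(chi2, 2))
print(z_p_value, chi2_p_value)
```

Up to floating-point error, z squared equals the χ² statistic and the two p-values coincide. Small differences in hand calculations come from rounding intermediate values.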

Summary
The χ² test for goodness of fit tests whether one categorical variable differs from a hypothesized distribution
The χ² test for association tests whether two categorical variables are associated
For both, you calculate expected counts for each cell, compute the χ² statistic as χ² = Σ (observed – expected)² / expected, and find the p-value as the area above this statistic on a χ² distribution

To Do
Read Chapter 7
Do Homework 6 (due Tuesday, 11/6)