Statistics: Unlocking the Power of Data Lock 5 STAT 250 Dr. Kari Lock Morgan SECTION 7.1 Testing the distribution of a single categorical variable : χ.

Slides:



Advertisements
Similar presentations
Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan ANOVA SECTION 8.1 Testing for a difference in means across multiple categories.
Advertisements

Chi-Square Tests 3/14/12 Testing the distribution of a single categorical variable :  2 goodness of fit Testing for an association between two categorical.
Statistics: Unlocking the Power of Data Lock 5 Testing Goodness-of- Fit for a Single Categorical Variable Kari Lock Morgan Section 7.1.
Statistics: Unlocking the Power of Data Lock 5 Hypothesis Testing: p-value STAT 250 Dr. Kari Lock Morgan SECTION 4.2 Randomization distribution p-value.
CHAPTER 11 Inference for Distributions of Categorical Data
CHAPTER 11 Inference for Distributions of Categorical Data
11-2 Goodness-of-Fit In this section, we consider sample data consisting of observed frequency counts arranged in a single row or column (called a one-way.
Statistics: Unlocking the Power of Data Lock 5 Inference for Proportions STAT 250 Dr. Kari Lock Morgan Chapter 6.1, 6.2, 6.3, 6.7, 6.8, 6.9 Formulas for.
1 Chapter 20 Two Categorical Variables: The Chi-Square Test.
ANOVA 3/19/12 Mini Review of simulation versus formulas and theoretical distributions Analysis of Variance (ANOVA) to compare means: testing for a difference.
Chapter 13 Chi-Square Tests. The chi-square test for Goodness of Fit allows us to determine whether a specified population distribution seems valid. The.
Lesson Inference for Two-Way Tables. Vocabulary Statistical Inference – provides methods for drawing conclusions about a population parameter from.
Fundamentals of Hypothesis Testing: One-Sample Tests
 Involves testing a hypothesis.  There is no single parameter to estimate.  Considers all categories to give an overall idea of whether the observed.
1 Desipramine is an antidepressant affecting the brain chemicals that may become unbalanced and cause depression. It was tested for recovery from cocaine.
Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan 11/1/12 ANOVA SECTION 8.1 Testing for a difference in means across multiple.
Chi-square test or c2 test
Chapter 26 Chi-Square Testing
Chapter 20 Testing hypotheses about proportions
The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers CHAPTER 11 Inference for Distributions of Categorical.
Chapter 11: Inference for Distributions of Categorical Data Section 11.1 Chi-Square Goodness-of-Fit Tests.
Testing Hypothesis That Data Fit a Given Probability Distribution Problem: We have a sample of size n. Determine if the data fits a probability distribution.
FPP 28 Chi-square test. More types of inference for nominal variables Nominal data is categorical with more than two categories Compare observed frequencies.
Lecture 9 Chap 9-1 Chapter 2b Fundamentals of Hypothesis Testing: One-Sample Tests.
Testing for an Association between two Categorical Variables
BPS - 5th Ed. Chapter 221 Two Categorical Variables: The Chi-Square Test.
Analysis of two-way tables - Inference for two-way tables IPS chapter 9.1 © 2006 W.H. Freeman and Company.
Chapter 11 Chi- Square Test for Homogeneity Target Goal: I can use a chi-square test to compare 3 or more proportions. I can use a chi-square test for.
Comparing Counts.  A test of whether the distribution of counts in one categorical variable matches the distribution predicted by a model is called a.
Statistics: Unlocking the Power of Data Lock 5 Exam 2 Review STAT 101 Dr. Kari Lock Morgan 11/13/12 Review of Chapters 5-9.
Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan 10/30/12 Chi-Square Tests SECTIONS 7.1, 7.2 Testing the distribution of a.
Chap 8-1 Fundamentals of Hypothesis Testing: One-Sample Tests.
Lesson Inference for Two-Way Tables. Knowledge Objectives Explain what is mean by a two-way table. Define the chi-square (χ 2 ) statistic. Identify.
Copyright © 2013, 2009, and 2007, Pearson Education, Inc. Chapter 11 Analyzing the Association Between Categorical Variables Section 11.2 Testing Categorical.
Statistics: Unlocking the Power of Data Lock 5 Inference for Means STAT 250 Dr. Kari Lock Morgan Sections 6.4, 6.5, 6.6, 6.10, 6.11, 6.12, 6.13 t-distribution.
Lecture 11. The chi-square test for goodness of fit.
Chapter 13- Inference For Tables: Chi-square Procedures Section Test for goodness of fit Section Inference for Two-Way tables Presented By:
Lecture PowerPoint Slides Basic Practice of Statistics 7 th Edition.
BPS - 5th Ed. Chapter 221 Two Categorical Variables: The Chi-Square Test.
Statistics: Unlocking the Power of Data Lock 5 STAT 250 Dr. Kari Lock Morgan SECTION 7.1 Testing the distribution of a single categorical variable : 
Statistics: Unlocking the Power of Data Lock 5 STAT 250 Dr. Kari Lock Morgan SECTION 7.2 χ 2 test for association (7.2) Testing for an Association between.
The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers CHAPTER 11 Inference for Distributions of Categorical.
11.1 Chi-Square Tests for Goodness of Fit Objectives SWBAT: STATE appropriate hypotheses and COMPUTE expected counts for a chi- square test for goodness.
Chapter 13 Section 2. Chi-Square Test 1.Null hypothesis – written in words 2.Alternative hypothesis – written in words – always “different” 3.Alpha level.
Statistics: Unlocking the Power of Data Lock 5 Hypothesis Testing: p-value STAT 250 Dr. Kari Lock Morgan SECTION 4.2 p-value.
CHAPTER 11: INFERENCE FOR DISTRIBUTIONS OF CATEGORICAL DATA 11.1 CHI-SQUARE TESTS FOR GOODNESS OF FIT OUTCOME: I WILL STATE APPROPRIATE HYPOTHESES AND.
Chapter 11: Categorical Data n Chi-square goodness of fit test allows us to examine a single distribution of a categorical variable in a population. n.
Class Seven Turn In: Chapter 18: 32, 34, 36 Chapter 19: 26, 34, 44 Quiz 3 For Class Eight: Chapter 20: 18, 20, 24 Chapter 22: 34, 36 Read Chapters 23 &
The Chi-Square Distribution  Chi-square tests for ….. goodness of fit, and independence 1.
Chi Square Procedures Chapter 14. Chi-Square Goodness-of-Fit Tests Section 14.1.
AP Stats Check In Where we’ve been… Chapter 7…Chapter 8… Where we are going… Significance Tests!! –Ch 9 Tests about a population proportion –Ch 9Tests.
11/12 9. Inference for Two-Way Tables. Cocaine addiction Cocaine produces short-term feelings of physical and mental well being. To maintain the effect,
 Check the Random, Large Sample Size and Independent conditions before performing a chi-square test  Use a chi-square test for homogeneity to determine.
Statistics: Unlocking the Power of Data Lock 5 Section 6.6 Test for a Single Mean.
Chi Square Test of Homogeneity. Are the different types of M&M’s distributed the same across the different colors? PlainPeanutPeanut Butter Crispy Brown7447.
CHAPTER 11 Inference for Distributions of Categorical Data
CHAPTER 11 Inference for Distributions of Categorical Data
Chi-Square Goodness-of-Fit Test
Chapter 10 Analyzing the Association Between Categorical Variables
Chi-Square Goodness-of-Fit Tests
CHAPTER 11 Inference for Distributions of Categorical Data
Inference for Relationships
Analyzing the Association Between Categorical Variables
CHAPTER 11 Inference for Distributions of Categorical Data
CHAPTER 11 Inference for Distributions of Categorical Data
CHAPTER 11 Inference for Distributions of Categorical Data
CHAPTER 11 Inference for Distributions of Categorical Data
CHAPTER 11 Inference for Distributions of Categorical Data
CHAPTER 11 Inference for Distributions of Categorical Data
Chapter 11 Inference for Distributions of Categorical Data
Presentation transcript:

Statistics: Unlocking the Power of Data Lock 5 STAT 250 Dr. Kari Lock Morgan SECTION 7.1 Testing the distribution of a single categorical variable : χ 2 goodness of fit (7.1) Chi-Square Goodness-of-Fit Test

Statistics: Unlocking the Power of Data Lock 5 Multiple Categories So far, we’ve learned how to do inference for categorical variables with only two categories Today, we’ll learn how to do hypothesis tests for categorical variables with multiple categories

Statistics: Unlocking the Power of Data Lock 5 Question of the Day Are children with ADHD younger than their peers?

Statistics: Unlocking the Power of Data Lock 5 ADHD or Just Young? In British Columbia, Canada, the cutoff date for entering school in any year is December 31 st, so those born late in the year are almost a year younger than those born early in the year Is it possible that younger students are being over- diagnosed with ADHD? Are children diagnosed with ADHD more likely to be born late in the year, and so be younger than their peers? Data: Sample of kids aged 6-12 in British Columbia who have been diagnosed with ADHD Morrow, R., et. al., “Influence of relative age on diagnosis and treatment of attention- deficit/hyperactivity disorder in children,” Canadian Medical Association Journal, April 17, 2012; 184(7): Influence of relative age on diagnosis and treatment of attention- deficit/hyperactivity disorder in children

Statistics: Unlocking the Power of Data Lock 5 Categories For simplicity, we’ll break the year into four categories:  January – March  April – June  July – September  October – December Are birthdays for kids with ADHD evenly distributed among these four time periods?

Statistics: Unlocking the Power of Data Lock 5 Hypothesis Testing 1) State Hypotheses 2) Calculate a statistic, based on your sample data 3) Create a distribution of this statistic, as it would be observed if the null hypothesis were true 4) Measure how extreme your test statistic from (2) is, as compared to the distribution generated in (3)

Statistics: Unlocking the Power of Data Lock 5 Hypotheses Define parameters as  p 1 = Proportion of ADHD births in January – March  p 2 = Proportion of ADHD births in April – June  p 3 = Proportion of ADHD births in July – Sep  p 4 = Proportion of ADHD births in Oct – Dec H 0 : p 1 = p 2 = p 3 = p 4 = 1/4 H a : At least one p i ≠ 1/4

Statistics: Unlocking the Power of Data Lock 5 Observed Counts The observed counts are the actual counts observed in the study Jan-MarApr-JunJuly-SepOct-Dec Observed

Statistics: Unlocking the Power of Data Lock 5 Test Statistic Why can’t we use the familiar formula to get the test statistic? We need something a bit more complicated…

Statistics: Unlocking the Power of Data Lock 5 Expected Counts The expected counts are the expected counts if the null hypothesis were true For each cell, the expected count is the sample size (n) times the null proportion, p i

Statistics: Unlocking the Power of Data Lock 5 Expected Counts n = 32,968 boys with ADHD H 0 : p 1 = p 2 = p 3 = p 4 = ¼ Expected count for each category, if the null hypothesis were true: 32,968(1/4) = 8242 Jan-MarApr-JunJuly-SepOct-Dec Observed Expected8242

Statistics: Unlocking the Power of Data Lock 5 Chi-Square Statistic Need a way to measure how far the observed counts are from the expected counts… Use the chi-square statistic : Jan-MarApr-JunJuly-SepOct-Dec Observed Expected8242

Statistics: Unlocking the Power of Data Lock 5 Chi-Square Statistic ObservedExpectedContribution to χ 2 Jan-Mar Apr-Jun Jul-Sep Oct-Dec

Statistics: Unlocking the Power of Data Lock 5 StatKey More Advanced Randomization Tests: χ 2 Goodness-of-Fit

Statistics: Unlocking the Power of Data Lock 5 Minitab Stat > Tables > Chi-Square Goodness-of-Fit Test

Statistics: Unlocking the Power of Data Lock 5 What Next? We have a test statistic. What else do we need to perform the hypothesis test?

Statistics: Unlocking the Power of Data Lock 5 Upper-Tail p-value To calculate the p-value for a chi-square test, we always look in the upper tail Why?  Values of the χ 2 statistic are always positive  The higher the χ 2 statistic is, the farther the observed counts are from the expected counts, and the stronger the evidence against the null

Statistics: Unlocking the Power of Data Lock 5 Randomization Distribution Can we reject the null? a) Yes b) No

Statistics: Unlocking the Power of Data Lock 5 Chi-Square (χ 2 ) Distribution If each of the expected counts are at least 5, AND if the null hypothesis is true, then the χ 2 statistic follows a χ 2 –distribution, with degrees of freedom equal to df = number of categories – 1 ADHD Birth Month: df = 4 – 1 = 3

Statistics: Unlocking the Power of Data Lock 5 Chi-Square Distribution

Statistics: Unlocking the Power of Data Lock 5 χ 2 Null Distribution

Statistics: Unlocking the Power of Data Lock 5 p-value

Statistics: Unlocking the Power of Data Lock 5 Conclusion This is a TINY p-value We have incredibly strong evidence that birthdays of boys with ADHD are not evenly distributed throughout the year.

Statistics: Unlocking the Power of Data Lock 5 Chi-Square Test for Goodness of Fit 1.State null hypothesized proportions for each category, p i. Alternative is that at least one of the proportions is different than specified in the null. 2.Calculate the expected counts for each cell as np i. 3.Calculate the χ 2 statistic: 4.Compute the p-value as the proportion above the χ 2 statistic for either a randomization distribution or a χ 2 distribution with df = (# of categories – 1) if expected counts all > 5 5.Interpret the p-value in context.

Statistics: Unlocking the Power of Data Lock 5 Births Equally Likely? Maybe all birthdays (not just for boys with ADHD) are not evenly distributed throughout the year We have the overall proportion of births for all boys during each time period in British Columbia:  January – March:  April – June:  July – September:  October – December: 0.241

Statistics: Unlocking the Power of Data Lock 5 Hypotheses Define parameters as  p 1 = Proportion of ADHD births in January – March  p 2 = Proportion of ADHD births in April – June  p 3 = Proportion of ADHD births in July – Sep  p 4 = Proportion of ADHD births in Oct – Dec H 0 : p 1 = 0.244, p 2 = 0.258, p 3 = 0.257, p 4 = H a : At least one p i is not as specified in H 0

Statistics: Unlocking the Power of Data Lock 5 ADHD or Just Young? (BOYS) BirthdayProportion of Births ADHDExpected Counts Jan-Mar Apr-Jun Jul-Sep Oct-Dec n = 32,968 The expected count for the Jan-Mar cell is a) b) c) d)

Statistics: Unlocking the Power of Data Lock 5 ADHD or Just Young? (BOYS) BirthdayProportion of Births ADHDExpected Counts Contribution to χ 2 Jan-Mar Apr-Jun Jul-Sep Oct-Dec The contribution to χ 2 for the Oct-Dec cell is a) 32.2 b) 55.9 c) d) 168.5

Statistics: Unlocking the Power of Data Lock 5 ADHD or Just Young? (BOYS) Birth DateProportion of Births ADHD Diagnoses Expected Counts Contribution to χ 2 Jan-Mar Apr-Jun Jul-Sep Oct-Dec

Statistics: Unlocking the Power of Data Lock 5 ADHD or Just Young (Boys) We have VERY (!!!) strong evidence that boys with ADHD do not have the same distribution of birthdays as all boys in British Columbia. p-value ≈ 0

Statistics: Unlocking the Power of Data Lock 5 A Deeper Understanding The p-value only tells you that the counts provide evidence against the null distribution If you want to know more, look at  Which cells contribute most to the χ 2 statistic?  Are the observed or expected counts higher?

Statistics: Unlocking the Power of Data Lock 5 ADHD or Just Young? (BOYS) Birth DateProportion of Births ADHD Diagnoses Expected Counts Contribution to χ 2 Jan-Mar Apr-Jun Jul-Sep Oct-Dec Big contributions to χ 2 : beginning and end of the year Jan-Feb: Many fewer ADHD than would be expected Oct-Dec: Many more ADHD cases than would be expected

Statistics: Unlocking the Power of Data Lock 5 ADHD and Birth Year Stop and think about the implications of this! In British Columbia, 6.9% of all boys 6-12 are diagnosed with ADHD 5.5% of boys 6-12 receive medication for ADHD How many of these diagnoses are simply due to the fact that these kids are younger than their peers??? ARE YOU CONCERNED?

Statistics: Unlocking the Power of Data Lock 5 ADHD or Just Young? (Girls) Birth Date Proportion of Births ADHD Diagnoses Jan-Mar Apr-Jun Jul-Sep Oct-Dec Want more practice? Here is the data for girls. (χ 2 = 236.8)

Statistics: Unlocking the Power of Data Lock 5 To Do Read Section 7.1 Do HW 7.1 (due Friday, 11/20)