Inference for Categorical Data: Chi-Square (Ch. 11)
Facts about Chi-Square ► Takes only nonnegative values, and the graph is skewed to the right ► Test statistic (on the AP formula sheet) ► Conditions: All expected cell counts are at least 5, and the observations come from a random sample.
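The slide points to the AP formula sheet for the test statistic. For reference, the standard form of the chi-square statistic, summed over every cell (category) in the table, is sketched below; the wording of the labels is ours, not the formula sheet's.

```latex
\chi^2 = \sum_{\text{all cells}} \frac{(\text{observed} - \text{expected})^2}{\text{expected}}
```

Every term is a squared difference divided by a positive expected count, which is why the statistic cannot be negative.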
3 types ► Goodness of Fit Test ► Test of Independence ► Homogeneity
Goodness of Fit Test ► Is used to determine how well a set of observed values matches a set of expected values.
Goodness of Fit Test ► 1 Categorical Variable ► 1 Population ► df = n - 1 (n is the number of categories) ► Expected count = hypothesized proportion × sample size ► A large test statistic means more evidence against the null hypothesis
Chi-Square Goodness of Fit Test (Example from 5 Book pg 279) ► The following are the approximate percentages for the different blood types among white Americans: A: 40%, B: 11%, AB: 4%, O: 45%. A random sample of 1000 black Americans yielded the following blood type data: A: 270, B: 200, AB: 40, and O: 490. Does this sample provide evidence that the distribution of blood types among black Americans differs from that of white Americans, or could the sample values simply be due to sampling variation? ► One categorical variable: blood type ► One population: black Americans
Example Continue ► We need to compare the observed values in the sample with the expected values we would get if the sample of black Americans really had the same distribution of blood types as white Americans.

Blood Type   Observed Values   Expected Values
A            270               .40(1000) = 400
B            200               .11(1000) = 110
AB           40                .04(1000) = 40
O            490               .45(1000) = 450
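The comparison of observed and expected values can be carried through to the test statistic. The following is a minimal sketch in plain Python, assuming the counts and percentages given in the example (A 270, B 200, AB 40, O 490 against 40%, 11%, 4%, 45%); the variable names are ours, chosen for illustration.

```python
# Observed counts from the sample of 1000 and the hypothesized proportions.
observed = {"A": 270, "B": 200, "AB": 40, "O": 490}
proportions = {"A": 0.40, "B": 0.11, "AB": 0.04, "O": 0.45}

n = sum(observed.values())  # sample size

# Expected count = hypothesized proportion * sample size.
expected = {k: proportions[k] * n for k in observed}

# Each category contributes (observed - expected)^2 / expected.
contributions = {
    k: (observed[k] - expected[k]) ** 2 / expected[k] for k in observed
}

chi_square = sum(contributions.values())
df = len(observed) - 1  # number of categories minus 1

for k in observed:
    print(f"{k:>2}: observed={observed[k]:4d}  "
          f"expected={expected[k]:6.1f}  contribution={contributions[k]:7.3f}")
print(f"chi-square = {chi_square:.2f}, df = {df}")
```

With these numbers the statistic comes out to about 119.44 on 3 degrees of freedom, driven mostly by types A and B. For comparison, the usual table value for 3 degrees of freedom at the .05 level is about 7.81, so a difference this large is hard to attribute to sampling variation alone.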
Another example (yellow workbook pg 145) ► A Philadelphia newspaper report claims that 24.1% of 18- to 24-year-olds who attend a local college are from Delaware, 15.4% are from New Jersey, 50.7% are from Pennsylvania, and the remaining 9.8% are from other states in the region. Suppose that a random sample (size 150) of 18- to 24-year-olds is taken at the college and the number from each state/region is recorded.
Continue ► The following are our observed values:

State          Number of Students
Delaware       30
New Jersey     39
Pennsylvania   71
Other          10
Continue ► Do these data provide evidence at the α=.05 level that the newspaper report is correct? ► (Answer in workbook pg )
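One way to work this question is sketched below in plain Python. It is an illustration under stated assumptions, not the workbook's answer: the helper `chi_square_gof` and the small critical-value table are ours, with the .05 critical values taken from a standard chi-square table. The null hypothesis is that the newspaper's percentages are correct.

```python
def chi_square_gof(observed, proportions):
    """Return (statistic, df, expected) for a chi-square goodness-of-fit test."""
    n = sum(observed)
    expected = [p * n for p in proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1, expected


# Upper-tail .05 critical values from a standard chi-square table.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

# Order: Delaware, New Jersey, Pennsylvania, Other.
observed = [30, 39, 71, 10]
claimed = [0.241, 0.154, 0.507, 0.098]

statistic, df, expected = chi_square_gof(observed, claimed)

# Condition check: every expected count should be at least 5.
conditions_ok = all(e >= 5 for e in expected)

# Reject the null (report is correct) if the statistic exceeds the critical value.
reject_null = statistic > CRITICAL_05[df]

print(f"expected counts: {[round(e, 2) for e in expected]}")
print(f"chi-square = {statistic:.2f}, df = {df}, reject null: {reject_null}")
```

Here the expected counts are 36.15, 23.1, 76.05, and 14.7, the statistic is about 13.83 with 3 degrees of freedom, and that exceeds 7.815. Under these assumptions the sample gives evidence against the newspaper's claimed distribution, with New Jersey contributing most of the discrepancy.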
Test of Independence ► 1 Population ► 2 Categorical Variables ► df = (r-1)(c-1), where r is the number of rows and c is the number of columns ► Use matrix ► Null hypothesis: The two variables are independent in the population (not related) ► Alternative hypothesis: They are not independent in the population (are related)
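The slide gives the degrees of freedom but not how the expected counts are found for a two-way table. As a sketch of the standard rule, the expected count in each cell under the null hypothesis of independence is:

```latex
E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{\text{grand total}},
\qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
df = (r-1)(c-1)
```

The symbols $O_{ij}$ and $E_{ij}$ (observed and expected count in row $i$, column $j$) are our notation, not the slide's.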
Example of Test of Independence (5 Book pg 284) ► A random sample of 400 residents of a large western city is polled to determine their attitudes concerning the affirmative action admissions policy of the local university. The residents are classified according to ethnicity (white, black, Asian) and whether or not they favor the affirmative action policy. The results are presented in the following table.
Attitude Toward Affirmative Action

         Favor   Do Not Favor   Total
White
Black
Asian
Total
Attitude Toward Affirmative Action ► We are interested in whether or not, in the population from which these 400 residents were sampled, ethnicity and attitude toward affirmative action are related (we have 1 population and 2 categorical variables)
Another Example Test of Indep. (yellow workbook pg 150) ► A survey was taken to determine if there is a relationship between students having computers in their homes and their school division (elementary, middle, secondary). A random sample of size 250 produced the following results:
Continue ► Computer in Home

Division     Yes   No
Elementary   14    61
Middle       50    25
Secondary    86    14
► Is there evidence that school division and having a home computer are independent? ► Use a .05 level of significance.
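The computation for this question can be sketched in plain Python. This is an illustration, not the workbook's solution. It assumes the table reads Elementary 14/61, Middle 50/25, Secondary 86/14 (Yes/No), which matches the stated sample size of 250; the function name `chi_square_independence` is ours, and the .05 critical value for 2 degrees of freedom (5.991) comes from a standard chi-square table.

```python
def chi_square_independence(table):
    """Return (statistic, df, expected) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    expected = []
    for i, row in enumerate(table):
        expected_row = []
        for j, obs in enumerate(row):
            # Expected count = row total * column total / grand total.
            e = row_totals[i] * col_totals[j] / grand_total
            expected_row.append(e)
            statistic += (obs - e) ** 2 / e
        expected.append(expected_row)

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected


# Rows: Elementary, Middle, Secondary. Columns: Yes, No.
computers = [[14, 61], [50, 25], [86, 14]]

statistic, df, expected = chi_square_independence(computers)
CRITICAL_05_DF2 = 5.991  # from a standard chi-square table
reject_null = statistic > CRITICAL_05_DF2

print(f"expected counts: {expected}")
print(f"chi-square = {statistic:.2f}, df = {df}, reject null: {reject_null}")
```

With this reading of the table the expected counts are 45/30, 45/30, and 60/40, and the statistic is about 82.94 on 2 degrees of freedom. That is far beyond 5.991, so the null hypothesis of independence would be rejected: the data suggest that school division and having a home computer are related.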
Test of Homogeneity of Proportions or Populations ► 1 Categorical Variable and 2 or more populations ► Degrees of freedom: (r-1)(c-1), same as the test of independence ► Null hypothesis: p1 = p2 = … (the proportion in each category is the same in every population) ► Alternative hypothesis: at least one of the proportions differs
Example of Homogeneity (5 Book pg 288) ► We have a random sample of 20 males from the population of males in the school and another independent, random sample of 16 females from the population of females in the school. Within each sample we classify the students as Democrat, Republican, and Independent. The results are presented in the following table.
Continue

         Democrat   Republican   Independent   Total
Male
Female   7          8            1             16
Total
Continue ► We are asking if the proportions of Democrats, Republicans, and Independents are the same within the populations of Males and Females.
Another example: Test of Homogeneity (yellow wb pg 148) ► The table shows the number of Central High School students who passed the AP Calculus AB exam. Has the distribution of scores changed over the past 3 years? Give appropriate statistical evidence to support your answer.
Score   Year 1   Year 2   Year 3
► Has there been a change in the distribution of passing grades on the AP Calculus AB exam over these three years?