Presentation transcript: "Chi-Square Distributions"

1 Chi-Square Distributions

2 Recap: We analyze data to test hypotheses. The type of test depends on the data available and the question we need to answer. What do we use to examine patterns between categorical variables such as gender, location, and preferences?

3 t-distribution (figure: density curves for df = 4 and df = 100)

4 F-distribution

5 χ² (chi-square) distribution (figure: density curves for df = 2, 4, and 10)

6 Cumulative Probability
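The cumulative probability P(Χ² ≤ x) and its complement, the upper-tail probability P(Χ² > x) that serves as the p-value in the tests below, can be computed without tables when the degrees of freedom are even. This is a minimal sketch assuming even df (every worked example in these slides has df = 2). It uses the closed form P(Χ² > x) = e^(−x/2) · Σ_{k=0}^{df/2−1} (x/2)^k / k!. The function names are hypothetical. For odd df you would need the regularized incomplete gamma function, for example via `scipy.stats.chi2.sf`.

```python
import math

def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X^2 > x) for a chi-square variable with EVEN df.

    Uses exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!, which holds only for even df.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even degrees of freedom")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 0.0
    for k in range(df // 2):
        if k > 0:
            term *= half / k   # build (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total

def chi2_cdf_even_df(x: float, df: int) -> float:
    """Cumulative probability P(X^2 <= x) = 1 - P(X^2 > x)."""
    return 1.0 - chi2_sf_even_df(x, df)

if __name__ == "__main__":
    # Cumulative probability at x = 5 for the df values plotted on slide 5
    for df in (2, 4, 10):
        print(df, round(chi2_cdf_even_df(5.0, df), 4))
```

As a sanity check, the familiar 5% critical values (5.991 for df = 2, 9.488 for df = 4) give upper-tail probabilities of about 0.05.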

7 Three chi-square tests: goodness of fit, test for homogeneity, test for independence

8 Goodness of fit: testing one categorical variable from a single population. Example: A manufacturer of baseball cards claims that 30% of all cards feature rookies, 60% feature veterans, and 10% feature all-stars. Reference: http://stattrek.com/Lesson3/ChiSquare.aspx

9 Assumptions: the data are collected from a simple random sample (SRS); the population is at least 10 times larger than the sample; the variable is categorical; the expected count for each level of the variable is at least 5.
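Only two of these assumptions can be checked numerically: the expected counts and the population-to-sample ratio. Whether the sample is an SRS and whether the variable is categorical must be judged from the study design. The helper below is a hypothetical sketch, not part of any library. The baseball-card example does not give a population size, so that check is skipped when `population_size` is omitted.

```python
def check_chi_square_conditions(expected_counts, sample_size, population_size=None):
    """Return a list of violated conditions; an empty list means the numeric checks pass.

    Hypothetical helper: SRS and categorical-variable conditions are not checked here.
    """
    problems = []
    for i, e in enumerate(expected_counts):
        if e < 5:
            problems.append(f"expected count {e} for level {i} is below 5")
    if population_size is not None and population_size < 10 * sample_size:
        problems.append("population is less than 10 times the sample size")
    return problems

# Baseball-card example: n = 100 cards, claimed proportions 0.3, 0.6, 0.1
expected = [100 * p for p in (0.3, 0.6, 0.1)]
print(check_chi_square_conditions(expected, sample_size=100))
```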

10 Steps in the process: state the hypotheses, form an analysis plan, analyze the sample data, interpret the results.

11

12 Analysis plan: specify the significance level; determine the test method (goodness of fit, independence, or homogeneity).

13

14 Goodness of fit example. Problem: Acme Toy Company prints baseball cards. The company claims that 30% of the cards are rookies, 60% are veterans, and 10% are All-Stars. The cards are sold in packages of 100. Suppose a randomly selected package of cards has 50 rookies, 45 veterans, and 5 All-Stars. Is this consistent with Acme's claim? Use a 0.05 level of significance.

15 Expected counts, E_i = n × p_i
Category     n     p_i    E_i
Veterans     100   0.6    60
Rookies      100   0.3    30
All-Stars    100   0.1    10

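16 Chi-square statistic, Χ² = Σ (O_i - E_i)² / E_i
Category     n     p_i    E_i    O_i    (O_i - E_i)²/E_i
Veterans     100   0.6    60     45     3.75
Rookies      100   0.3    30     50     13.333
All-Stars    100   0.1    10     5      2.50
Total                                   19.583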
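A minimal Python sketch of the goodness-of-fit computation for the Acme example (the deck itself contains no code). It reproduces the expected counts, the per-category terms, and the total from the tables above. It then computes a p-value with df = k - 1 = 2 for k = 3 categories. The 0.05 significance level comes from the problem statement. The p-value uses the closed form P(Χ² > x) = e^(-x/2), which is exact only for df = 2, so the sketch refuses any other df.

```python
import math

# Acme baseball-card example: claimed proportions and one observed package of 100
n = 100
claimed = {"Veterans": 0.60, "Rookies": 0.30, "All-Stars": 0.10}
observed = {"Veterans": 45, "Rookies": 50, "All-Stars": 5}

chi_square = 0.0
for category, p in claimed.items():
    expected = n * p                                     # E_i = n * p_i
    contribution = (observed[category] - expected) ** 2 / expected
    chi_square += contribution
    print(f"{category:10s} E={expected:5.1f} O={observed[category]:3d} term={contribution:.3f}")

df = len(claimed) - 1                                    # k - 1 categories
if df != 2:
    raise ValueError("the closed-form p-value below assumes df = 2")
p_value = math.exp(-chi_square / 2)                      # P(X^2 > chi_square) for df = 2
alpha = 0.05

print(f"chi-square = {chi_square:.3f}, df = {df}, p-value = {p_value:.6f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Because the p-value is far below 0.05, the sketch rejects the null hypothesis that the package matches Acme's claimed proportions.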

17 Another goodness-of-fit problem. Blemishes: 0 1 2 3 4. # of cars: 242943842

18 Test for homogeneity: a single categorical variable from two (or more) populations. We test whether the frequency counts are distributed identically across the populations. Example: a survey of TV viewing audiences. Do the viewing preferences of men and women differ significantly? We make the same assumptions as for the goodness-of-fit test: the data are collected from a simple random sample (SRS); the population is at least 10 times larger than the sample; the variable is categorical; the expected count for each level of the variable is at least 5. We use the same approach to testing.

19 State the hypotheses. Data are collected from r populations, and the categorical variable has c levels. The null hypothesis is that each population has the same proportion of observations at each level:
$H_0: P_{\text{level }1,\ \text{pop }1} = P_{\text{level }1,\ \text{pop }2} = \cdots = P_{\text{level }1,\ \text{pop }r}$
$H_0: P_{\text{level }2,\ \text{pop }1} = P_{\text{level }2,\ \text{pop }2} = \cdots = P_{\text{level }2,\ \text{pop }r}$
$\vdots$
$H_0: P_{\text{level }c,\ \text{pop }1} = P_{\text{level }c,\ \text{pop }2} = \cdots = P_{\text{level }c,\ \text{pop }r}$
Alternative hypothesis: at least one of the null statements is false.

20 Analyze the sample data

21 Degrees of freedom: $df = (r - 1)(c - 1)$, where r = number of populations and c = number of levels of the categorical variable.

22 Analyze the sample data

23

24 Test for homogeneity problem: In a study of the television viewing habits of children, a developmental psychologist selects a random sample of 300 fifth graders (100 boys and 200 girls). Each child is asked which of the following TV programs they like best.
         Family Guy   South Park   The Simpsons   Total
Boys     50           30           20             100
Girls    50           80           70             200
Total    100          110          90             300

25 State the hypotheses. Null hypothesis: the proportion of boys who prefer Family Guy is identical to the proportion of girls who prefer Family Guy, and similarly for the other programs. Thus:
$H_0: P_{\text{boys who like Family Guy}} = P_{\text{girls who like Family Guy}}$
$H_0: P_{\text{boys who like South Park}} = P_{\text{girls who like South Park}}$
$H_0: P_{\text{boys who like The Simpsons}} = P_{\text{girls who like The Simpsons}}$
Alternative hypothesis: at least one of the null hypothesis statements is false.

26 Analysis plan

27 Compute the expected frequency counts: $E_{r,c} = (n_r \cdot n_c)/n$, where $n_r$ is the row total, $n_c$ the column total, and $n$ the grand total.
$E_{1,1} = (100 \cdot 100)/300 = 10000/300 = 33.3$
$E_{1,2} = (100 \cdot 110)/300 = 11000/300 = 36.7$
$E_{1,3} = (100 \cdot 90)/300 = 9000/300 = 30.0$
$E_{2,1} = (200 \cdot 100)/300 = 20000/300 = 66.7$
$E_{2,2} = (200 \cdot 110)/300 = 22000/300 = 73.3$
$E_{2,3} = (200 \cdot 90)/300 = 18000/300 = 60.0$
         Family Guy   South Park   The Simpsons   Total
Boys     50           30           20             100
Girls    50           80           70             200
Total    100          110          90             300
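The expected-count formula is mechanical enough to sketch directly. This minimal Python version (illustrative only) derives the row totals n_r, the column totals n_c, and the grand total n from the observed table. It then applies E_{r,c} = n_r · n_c / n to every cell.

```python
# Observed counts from the TV-preference table
# rows: boys, girls; columns: Family Guy, South Park, The Simpsons
observed = [
    [50, 30, 20],   # boys
    [50, 80, 70],   # girls
]

row_totals = [sum(row) for row in observed]          # n_r
col_totals = [sum(col) for col in zip(*observed)]    # n_c
n = sum(row_totals)                                  # grand total

# E_{r,c} = (n_r * n_c) / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

for label, row in zip(("Boys", "Girls"), expected):
    print(label, [round(e, 1) for e in row])
```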

28 Analysis plan
Observed counts, with expected counts in parentheses:
         Family Guy   South Park   The Simpsons   Total
Boys     50 (33.3)    30 (36.7)    20 (30.0)      100
Girls    50 (66.7)    80 (73.3)    70 (60.0)      200
Total    100          110          90             300

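29 Analysis plan. p-value: with df = (2 - 1)(3 - 1) = 2 and Χ² = Σ (O - E)²/E ≈ 19.39 (using the rounded expected counts), use the Chi-Square Distribution Calculator to find P(Χ² > 19.39) ≈ 0.00006. Interpret the results: the p-value is far below the usual 0.05 significance level, so we reject the null hypothesis; boys' and girls' program preferences differ significantly.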
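A minimal Python sketch of the full homogeneity calculation. It uses unrounded expected counts, so the statistic comes out near 19.32. Rounding the expected counts to one decimal, as in the table above, gives about 19.39. Either way the conclusion is the same. The p-value again relies on the df = 2 closed form P(Χ² > x) = e^(-x/2), which does not apply to other df.

```python
import math

observed = [
    [50, 30, 20],   # boys:  Family Guy, South Park, The Simpsons
    [50, 80, 70],   # girls
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n   # unrounded expected count
        chi_square += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)   # (r - 1)(c - 1)
if df != 2:
    raise ValueError("the closed-form p-value below assumes df = 2")
p_value = math.exp(-chi_square / 2)                 # P(X^2 > chi_square) for df = 2

print(f"chi-square = {chi_square:.2f}, df = {df}, p-value = {p_value:.5f}")
```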

30 Test for independence: almost identical to the test for homogeneity. Test for homogeneity: a single categorical variable from two (or more) populations. Test for independence: two categorical variables from a single population; we determine whether there is a significant association between the two variables. Example: voters are classified by gender and by party affiliation (D, R, I). Use a χ² test to determine whether gender is related to voting preference (are the variables independent?).

31 Test for independence: same assumptions and same approach to testing. Hypotheses: suppose variable A has r levels and variable B has c levels. The null hypothesis states that knowing the level of A does not help you predict the level of B, i.e., the variables are independent.
$H_0$: variables A and B are independent.
$H_a$: variables A and B are not independent (knowing A helps you predict B).
Note: a relationship does not have to be causal for the variables to be dependent.

32 Test for independence problem: A public opinion poll surveyed a simple random sample of 1000 voters. Respondents were classified by gender (male or female) and by voting preference (Republican, Democrat, or Independent). Do men's preferences differ significantly from women's?
         Republican   Democrat   Independent   Total
Men      200          150        50            400
Women    250          300        50            600
Total    450          450        100           1000

33 Test for independence
Hypotheses:
$H_0$: gender and voting preferences are independent.
$H_a$: gender and voting preferences are not independent.
Analyze the sample data: degrees of freedom, expected frequency counts, chi-square statistic, p-value or critical value.
Interpret the results.
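The analysis steps listed here can be sketched end to end for the voter poll. This is a minimal Python illustration under two stated assumptions. First, a 0.05 significance level, which the problem does not specify. Second, the same df = 2 closed form for the p-value used earlier. Expected counts under independence use the same E_{r,c} = n_r · n_c / n rule as the homogeneity test.

```python
import math

# Voter poll: rows = men, women; columns = Republican, Democrat, Independent
observed = [
    [200, 150, 50],   # men
    [250, 300, 50],   # women
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

df = (len(row_totals) - 1) * (len(col_totals) - 1)   # (r - 1)(c - 1)

# Expected counts under independence: E_{r,c} = n_r * n_c / n
expected = [[r * c / n for c in col_totals] for r in row_totals]
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Upper-tail probability: exp(-x/2) is exact only when df == 2
if df != 2:
    raise ValueError("the closed-form p-value below assumes df = 2")
p_value = math.exp(-chi_square / 2)
alpha = 0.05  # assumed significance level; the problem does not state one

print(f"df = {df}, chi-square = {chi_square:.2f}, p-value = {p_value:.4f}")
print("reject H0: not independent" if p_value < alpha else "fail to reject H0")
```

With these data the statistic is about 16.2 and the p-value about 0.0003, so the sketch rejects independence of gender and voting preference.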

