13.1 The Chi-square Goodness-of-Fit test


1 13.1 The Chi-square Goodness-of-Fit test
Agenda for 4/8 and 4/9
NEW GROUPS – pick your own; CHOOSE wisely (no fewer than 3 members)
Introduction of χ²
Discussion of Homework

2 Warm Up (please have out HW)
1. List the three types of χ² tests and what each one is used for.
2. What are the conditions for running a χ² test?
3. What is the formula for the χ² test statistic?

3 χ² - Chi (pronounced "kai")
The chi-square statistic is used to compare observed counts with the counts expected under a claim, and to decide whether two or more populations, variables, or characteristics are consistent with that claim.

4 χ² - Chi (pronounced "kai")
The shape of the population distributions does not matter, as long as the relative frequencies are known for each population, or for the population and some standard (reference) population.

5 χ² distribution characteristics
[Figure: χ² density curves for df = 3, df = 5, and df = 10]

6 χ² distribution characteristics
Different df give different curves.
The curves are skewed right.
As df increases, the curve shifts toward the right and becomes more like a normal curve.
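
As a rough numerical check of these characteristics, the sketch below uses the standard formulas for a chi-square distribution (mean = df, standard deviation = sqrt(2·df), skewness = sqrt(8/df)). The function name `chi2_summary` and the extra df = 100 row are illustrative additions, not part of the slides.

```python
import math

def chi2_summary(df):
    """Mean, standard deviation, and skewness of a chi-square distribution."""
    return {
        "mean": df,                      # the center moves right as df grows
        "sd": math.sqrt(2 * df),
        "skewness": math.sqrt(8 / df),   # always positive: skewed right
    }

# df = 3, 5, 10 match the curves on the slide; df = 100 is an extra illustration.
summaries = {df: chi2_summary(df) for df in (3, 5, 10, 100)}
for df, s in summaries.items():
    print(f"df={df:>3}  mean={s['mean']:>3}  sd={s['sd']:.2f}  skewness={s['skewness']:.2f}")
```

The skewness shrinks toward 0 as df increases, which is the sense in which the curve becomes more like a normal curve.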

7 1st Test - χ² Goodness of Fit
Goodness of fit – used to test whether a population distribution is the same as the reference distribution stated in the null hypothesis. (ex: is the company's claim actually true?)

8 2nd - χ² Test of Homogeneity
Homogeneity – is an overall test that tells us whether the data give a good indication that the categorical variable is the same in multiple populations. (ex: do python eggs hatch more or less in cold, neutral or warm waters?)

9 3rd – χ² Test of Association/Independence
Test for Independence – used to test the association/independence between categorical variables (ex: is there a relationship between patient survival and pet ownership?)

10 We will focus on GOF today

11 Homework 13.1 p. 736
a) χ² = 1.41, df = 1: the p-value is between .20 and .25, which can be written .20 < p < .25
b) χ² = 19.62, df = 9: .02 < p < .025
c) χ² = 7.04, df = 6: the statistic is off the chart to the left, therefore p > .25
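
These table look-ups can be double-checked in software. The following is a minimal sketch using only Python's standard library; `chi2_sf` is a hypothetical helper name, and it relies on the finite-sum formulas for the chi-square upper tail that hold when df is a whole number.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for chi-square with whole-number df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal-tail term plus a finite correction sum
    term = math.sqrt(x)
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    return math.erfc(math.sqrt(x / 2)) + 2 * density * total

# (statistic, df) pairs from the homework answers
homework = {"a": (1.41, 1), "b": (19.62, 9), "c": (7.04, 6)}
p_values = {part: chi2_sf(x, df) for part, (x, df) in homework.items()}
for part, p in p_values.items():
    print(f"{part}) p = {p:.4f}")
```

Each computed p-value should fall inside the interval read from the table.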

12 Homework #13.2 Are you married? Page 736

13 Step 1 – State the Ho and Ha
Ho: The marital-status distribution of 25- to 29-year-old males is the same as that of the population as a whole (as stated in the 2000 census).
Ha: The marital-status distribution of 25- to 29-year-old males is different from that of the population as a whole.

14 Step 2 - Choose the appropriate test and Check Conditions
We can use a goodness of fit test to measure the strength of evidence against the hypothesized distribution (marital status), provided all expected counts are greater than 5.
Expected counts (np): 500 x .281 = 140.5, 500 x .563 = 281.5, 500 x .064 = 32, 500 x .092 = 46.
Since all expected counts are > 5, we can proceed with the test.
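
The expected-count condition can be checked mechanically. This is a small sketch, assuming the four categories are never married, married, widowed, and divorced in that order; the variable names are illustrative.

```python
n = 500  # sample size used on the slide

# Census proportions from the slide (category order is an assumption)
proportions = {
    "Never married": 0.281,
    "Married": 0.563,
    "Widowed": 0.064,
    "Divorced": 0.092,
}

# Expected count for each category is n * p
expected = {status: n * p for status, p in proportions.items()}
condition_met = all(count > 5 for count in expected.values())

for status, count in expected.items():
    print(f"{status:<14} expected = {count:.1f}")
print("All expected counts > 5:", condition_met)
```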

15 Step 3 – Carry out the Inference procedure
Marital Status   Never Married   Married   Widowed   Divorced
Percent          28.1%           56.3%     6.4%      9.2%
Freq.            260             220       0         20
Expected
df = ?

16 Step 3 – Carry out the Inference procedure
Marital Status   Never Married   Married   Widowed   Divorced
Percent          28.1%           56.3%     6.4%      9.2%
Freq.            260             220       0         20
Expected         140.5           281.5     32        46
df = ?

17 Step 3 – Carry out the Inference procedure
Marital Status   Never Married   Married   Widowed   Divorced
Percent          28.1%           56.3%     6.4%      9.2%
Freq.            260             220       0         20
Expected         140.5           281.5     32        46
(O-E)²/E         101.64          13.436    32        14.696
χ² = 161.77, df = 4-1 = 3
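
The statistic on this slide can be reproduced with a few lines of code. This sketch assumes a widowed count of 0, which is inferred from n = 500 and the total of 161.77 rather than read directly from the slide text.

```python
# Order: never married, married, widowed, divorced
observed = [260, 220, 0, 20]            # widowed = 0 is an inferred assumption
expected = [140.5, 281.5, 32.0, 46.0]   # n * p from Step 2

# Each category contributes (observed - expected)^2 / expected
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)
df = len(observed) - 1                  # categories minus 1

print([round(c, 3) for c in contributions])
print(f"chi-square = {chi_square:.2f}, df = {df}")
```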

18 Step 4-Interpret results in the context of the problem
Since χ² = 161.77 with df = 3, our p-value is off the chart to the right and essentially 0. With an alpha level of 5% or even 1%, this is strong evidence to reject Ho and conclude that the distribution of marital status among 25- to 29-year-old males differs from that of the population as a whole.

19 Crossing Tobacco Plants
13.3 Genetics: Crossing Tobacco Plants
Ho: Green, yellow-green, and albino tobacco plants occur in a 1:2:1 ratio (25%, 50%, 25%).
Ha: Green, yellow-green, and albino tobacco plants do not occur in a 1:2:1 ratio.
We will use a chi-square GOF test, provided all expected counts are > 5. The expected counts are 21, 42, 21, all > 5; we may proceed.
Category       Obs   %     Exp (np)          Chi Stat
Green          22    25%   .25 x 84 = 21     .048
Yellow-green   50    50%   .50 x 84 = 42     1.524
Albino         12    25%   .25 x 84 = 21     3.857
N = 84                                       Sum = 5.4286
df = 3-1 = 2, .05 < p < .10, p = .066
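
The tobacco calculation can be sketched in code as follows. The p-value line uses the closed form exp(-x/2), which is valid only for df = 2, so this snippet is specific to a three-category test and is not a general routine.

```python
import math

observed = [22, 50, 12]   # green, yellow-green, albino
ratio = [1, 2, 1]         # hypothesized 1:2:1 genetic model

n = sum(observed)
expected = [n * r / sum(ratio) for r in ratio]
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# For df = 2 only, the chi-square upper tail is exactly exp(-x/2).
p_value = math.exp(-chi_square / 2)

print(f"expected = {expected}")
print(f"chi-square = {chi_square:.4f}, df = {df}, p = {p_value:.3f}")
```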

20 Step 4-Interpret results in the context of the problem
Since χ² = 5.4286 with df = 2, our p-value is 0.066. At an alpha level of 5%, this is not strong enough evidence, so we fail to reject Ho. The data are consistent with the 1:2:1 genetic model; we cannot conclude that the distribution of tobacco plants differs from it.

21 13.4 Doctoral Degrees and Race/Ethnicity
Ho: The distribution of doctoral degrees in 1994 is the same as in 1981.
Ha: The distribution of doctoral degrees in 1994 is not the same as in 1981.
We will use a chi-square GOF test, provided all expected counts are > 5. Hmmm, 2 are not > 5; we will proceed with caution.
Group                Obs   %       Exp (np)   Chi Stat
White                189   78.9%   236.7      9.6125
Black                10    3.9%    11.7       .24701
Hispanic             6     1.4%    4.2        .77143
Asian/Pacific        14    2.7%    8.1        4.2975
Am. Ind/Alas. Nat    1     0.4%    1.2        .03333
Non alien            80    12.8%   38.4       45.067
N = 300                                       Sum = 60.029
df = 5, p < .0005
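
A short sketch of the same calculation, which also flags the categories that fail the expected-count condition. The group labels are copied from the table; the variable names are illustrative.

```python
n = 300
# (observed count, 1981 percentage) for each group, taken from the table
table = {
    "White": (189, 78.9),
    "Black": (10, 3.9),
    "Hispanic": (6, 1.4),
    "Asian/Pacific": (14, 2.7),
    "Am. Ind/Alas. Nat": (1, 0.4),
    "Non alien": (80, 12.8),
}

expected = {g: n * pct / 100 for g, (_, pct) in table.items()}
chi_square = sum((obs - expected[g]) ** 2 / expected[g] for g, (obs, _) in table.items())
df = len(table) - 1

# Groups whose expected count is not greater than 5
small = [g for g, e in expected.items() if e <= 5]

print(f"chi-square = {chi_square:.3f}, df = {df}")
print("Expected count not > 5:", small)
```

Two groups are flagged, which is why the conclusion has to be read with caution.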

22 Step 4-Interpret results in the context of the problem
Since χ² = 60.029 with df = 5, our p-value is off the chart to the right and essentially zero. With an alpha level of 5% or even 1%, we have strong evidence to reject Ho and claim that the distribution of doctorates by ethnicity is NOT the same in 1994 as it was in 1981. However, we need to be cautious about our findings, since we did not meet the expected-count condition.

23 Homework Read and take notes 13.2
Figure out what type of M&M your group wants to bring or you all can choose skittles. You will need to buy ONE LARGE Family Size Bag of the item. Do #’s 13 (13.1),14,16, 20, 22

24 #7 - Is your random number generator working?
Turn to page 742. We will work a and b as a class.

25 What do we know about random digits?
A table of random digits is a long string of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 with 2 properties:
1. Each entry in the string (or table) is equally likely to be any of the 10 digits 0-9.
2. The entries are independent of each other.

26 a) Step 1 – State the Ho and Ha
Ho: p0 = p1 = p2 = ... = p9 = .1
Ha: At least one of the p's is not equal to .1
You are looking for a uniform distribution, therefore all the proportions are equal.

27 b) RUN SIMULATION
In this case, we want everyone to have the same values. Do this: 123 → rand, then randInt(0,9,200) → store in list 4.
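
For readers without the calculator, the simulation can be sketched in Python. This is an analogue rather than a reproduction: Python's generator differs from the calculator's, so seeding with 123 here will not give the same 200 digits as the class values.

```python
import random
from collections import Counter

rng = random.Random(123)   # fixed seed so reruns give the same digits
digits = [rng.randint(0, 9) for _ in range(200)]   # randint includes both 0 and 9

# Observed count for each digit 0-9, and the expected count under Ho
counts = Counter(digits)
observed = [counts[d] for d in range(10)]
expected = [len(digits) / 10] * 10

print("observed:", observed)
print("expected:", expected)
```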

28 c) and d)
c) Histogram: using trace, get the observed counts and place them in list 1.
d) Expected counts: expected is np, therefore .1 x 200 = 20 → list 2.

29 e) Step 2-Choose the appropriate test and Check Conditions
We can use a goodness of fit test to measure the strength of evidence against the hypothesized distribution (the claim is that the proportions are all equal to .1), provided all expected counts are greater than 5. Expected counts (np) are .1 x 200 = 20, all > 5. Therefore we can proceed with the test.

30 Step 3 – Carry out the Inference procedure
RUN SIMULATION (seed to 123 to get my values): 123 → rand, then randInt(0,9,200) → in list
X: 0 1 2 3 4 5 6 7 8 9
P(X): .1 for each digit
Obs:
Exp:
df = ?

31 Step 3 – Carry out the Inference procedure
RUN SIMULATION (seed to 123 to get my values): 123 → rand, then randInt(0,9,200) → in list
X: 0 1 2 3 4 5 6 7 8 9
P(X): .1 for each digit
Obs: 13 19 25 23 20 17 27 12
Exp:
df = ?

32 Step 3 – Carry out the Inference procedure
RUN SIMULATION (seed to 123 to get my values): 123 → rand, then randInt(0,9,200) → in list
X: 0 1 2 3 4 5 6 7 8 9
P(X): .1 for each digit
Obs: 13 19 25 23 20 17 27 12
Exp: 20 for each digit (n = 200)
df = ?

33 Step 3 – Carry out the Inference procedure
RUN SIMULATION (seed to 123 to get my values): 123 → rand, then randInt(0,9,200) → in list
X: 0 1 2 3 4 5 6 7 8 9
P(X): .1 for each digit
Obs: 13 19 25 23 20 17 27 12
Exp: 20 for each digit (n = 200)
(O-E)²/E: 2.45 .05 1.25 .45 3.2
df = ?

34 Step 3 – Carry out the Inference procedure
RUN SIMULATION (seed to 123 to get my values): 123 → rand, then randInt(0,9,200) → in list
X: 0 1 2 3 4 5 6 7 8 9
P(X): .1 for each digit
Obs: 13 19 25 23 20 17 27 12
Exp: 20 for each digit (n = 200)
(O-E)²/E: 2.45 .05 1.25 .45 3.2
χ² = 13.2, df = 9, p = .1537, .15 < p < .20

35 Step 4-Interpret results in the context of the problem
Since χ² = 13.2 with df = 9, our p-value is .15 < p < .20. With an alpha level of 5%, this is not significant evidence to reject the null hypothesis; we have no evidence that the sample data were generated from a distribution other than the uniform distribution.
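
One way to see what this p-value means is to estimate it by simulation instead of a table: generate many sets of 200 truly uniform digits and count how often the statistic is at least 13.2. This Monte Carlo approach is a swapped-in technique, not the method used on the slides, and the seed, trial count, and function name are arbitrary illustrative choices.

```python
import random

def chi_square_uniform(digits):
    """Chi-square statistic for the claim that all 10 digits are equally likely."""
    expected = len(digits) / 10
    counts = [0] * 10
    for d in digits:
        counts[d] += 1
    # Sum the squared deviations first, then divide once by the common expected count.
    return sum((c - expected) ** 2 for c in counts) / expected

rng = random.Random(2024)   # arbitrary seed, used only for repeatability
observed_stat = 13.2        # statistic reported on the slide
trials = 4000

hits = 0
for _ in range(trials):
    sample = [rng.randrange(10) for _ in range(200)]
    if chi_square_uniform(sample) >= observed_stat:
        hits += 1

p_estimate = hits / trials
print(f"Estimated p-value: {p_estimate:.3f}")
```

The estimate should land close to the table range .15 < p < .20, with some simulation noise.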

36 Warm Up

37 Carnival Games # 13 page 744

38 Run the test
Part        I     II    III   IV
Freq        95    105   135   165
Exp.        125   125   125   125
(O-E)²/E    7.2   3.2   .8    12.8
χ² = 24, off the chart to the right, p < .0005

39 What were we thinking?
Ho: The carnival wheel is balanced; all 4 parts are equally likely.
Ha: The carnival wheel is not balanced; the 4 parts are not all equally likely.
Since all expected counts are greater than 5 (E = 125), we can use the χ² test. Running the test gave a χ² of 24 with 3 df and p < .0005. We have sufficient evidence to reject the null hypothesis and claim that the wheel is not balanced.
Which part makes the largest contribution to χ²?
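
The closing question can be answered by looking at each part's contribution to the statistic. A minimal sketch, using the frequencies from the carnival problem; the variable names are illustrative.

```python
parts = ["I", "II", "III", "IV"]
observed = [95, 105, 135, 165]
expected = sum(observed) / len(observed)   # a balanced wheel gives equal counts

# Contribution of each part: (observed - expected)^2 / expected
contributions = [(o - expected) ** 2 / expected for o in observed]
chi_square = sum(contributions)
largest = parts[contributions.index(max(contributions))]

for part, c in zip(parts, contributions):
    print(f"Part {part}: {c:.1f}")
print(f"chi-square = {chi_square:.1f}, largest contribution from part {largest}")
```

Part IV landed far more often than a balanced wheel predicts, so it accounts for more than half of the total.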

40 2nd - χ² Test of Homogeneity (or two-way tables)
Homogeneity – is an overall test that tells us whether the data give a good indication that the categorical variable is the same in multiple populations. We are testing population proportions for a categorical variable. The null hypothesis states that all the proportions are equal. The alternative states that they are not all equal

41 Expected Counts for two way tables
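
For a two-way table, the standard rule is expected count = (row total x column total) / table total. The sketch below applies that rule to a small made-up table; the counts are hypothetical and do not come from any exercise in this presentation.

```python
# Hypothetical 2 x 2 table of observed counts (rows x columns)
observed = [
    [10, 20],
    [30, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
table_total = sum(row_totals)

# Expected count for each cell: row total * column total / table total
expected = [
    [r * c / table_total for c in col_totals]
    for r in row_totals
]

# Degrees of freedom for a two-way table: (rows - 1) * (columns - 1)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

for row in expected:
    print(row)
print("df =", df)
```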

42 3rd – χ² Test of Association/Independence
Test for Independence – used to test the association/independence between categorical variables. An SRS is drawn from a population and observations are classified according to two categorical variables. The null hypothesis is there is NO relationship between the row variable and the column variable. The alternative would state that there is a relationship.

43 Expected Counts for two way tables

44 Homework Pulling together 13 Do #25,35,39

45 How to quit smoking # 14,16

46 Smoking by Students and Their Parents
# 20, 22

