
CHAPTER 5 INTRODUCTORY CHI-SQUARE TEST


1 CHAPTER 5 INTRODUCTORY CHI-SQUARE TEST
This chapter introduces a new probability distribution called the chi-square distribution. The chi-square distribution will be used in carrying out hypothesis tests to analyze whether:
i. A sample could have come from a given type of population distribution.
ii. Two nominal (categorical) variables are independent of each other, or several populations are homogeneous with respect to one categorical variable.
The chi-square tests that will be discussed are:
i. Goodness-of-fit Test
ii. The Chi-square Test for Homogeneity
iii. The Chi-square Test for Independence

2 1) Goodness-of-fit Test
In the Goodness-of-fit test, chi-square analysis is applied to examine whether sample data could have been drawn from a population having a specific probability distribution.
The test procedure is appropriate when the following conditions are met:
i. The sampling method is simple random sampling.
ii. The population is at least 10 times as large as the sample.
iii. The variable under study is categorical (a qualitative variable).
iv. The expected value ($e_i$) for each level of the variable is at least 5.

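3 The frequency distribution table layout:

Category     1       2       …    k
Frequency    $o_1$   $o_2$   …    $o_k$

where $o_i$ is the observed frequency of category $i$ and $n = \sum_{i=1}^{k} o_i$ is the sample size.

Test procedure to run the Goodness-of-fit test:
1. State the null hypothesis $H_0$ and the alternative hypothesis $H_1$.
2. Determine:
   i. The level of significance, $\alpha$.
   ii. The degrees of freedom, $\nu = k - 1$.
   Find the value of $\chi^2_{\alpha,\nu}$ from the table of the chi-square distribution.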

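4 3. Calculate the value of $\chi^2_{calc} = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$, where the expected frequency is $e_i = n p_i$ and $p_i$ is the probability of category $i$ under $H_0$.
4. Determine the rejection region to reject $H_0$:
   i. $\chi^2_{calc} > \chi^2_{\alpha,\nu}$, or
   ii. $p$-value $< \alpha$.
5. Make decision/conclusion.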

5 Example: The authority claims that the proportions of road accidents occurring in this country according to the categories User Attitude (A), Mechanical Fault (M), Insufficient Sign Board (I) and Fate (F) are 60%, 20%, 15% and 5% respectively. A study by an independent body shows the following data. Can we accept the claim at significance level $\alpha$?

Category     A     M     I     F     Total
Frequency    130   35    30    5     200

Solution:
1. $H_0$: $p_A = 0.60$, $p_M = 0.20$, $p_I = 0.15$, $p_F = 0.05$.
   $H_1$: at least one proportion differs from the claimed value.

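6 2. Degrees of freedom: $\nu = k - 1 = 4 - 1 = 3$; the critical value $\chi^2_{\alpha,3}$ is read from the chi-square table.
3. The expected frequencies $e_i = n p_i$ are 120, 40, 30 and 10, so $\chi^2_{calc} = \frac{(130-120)^2}{120} + \frac{(35-40)^2}{40} + \frac{(30-30)^2}{30} + \frac{(5-10)^2}{10} = 3.958$.
4. Since $\chi^2_{calc} = 3.958 < \chi^2_{\alpha,3}$ (7.815 at $\alpha = 0.05$, for example), we accept $H_0$.
5. We conclude that we have no evidence to reject the claim.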
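The arithmetic in this example can be checked with a few lines of Python. This is only a minimal sketch using the counts and claimed proportions stated in the example; the variable names are illustrative and not part of the original slides.

```python
# Observed accident counts from the example (categories A, M, I, F).
observed = {"A": 130, "M": 35, "I": 30, "F": 5}
# Proportions claimed by the authority (the null hypothesis).
claimed = {"A": 0.60, "M": 0.20, "I": 0.15, "F": 0.05}

n = sum(observed.values())  # sample size, 200 accidents

# Expected frequency of each category under H0: e_i = n * p_i.
expected = {cat: n * p for cat, p in claimed.items()}

# Test statistic: sum over categories of (o_i - e_i)^2 / e_i.
chi_sq = sum(
    (observed[cat] - expected[cat]) ** 2 / expected[cat] for cat in observed
)

df = len(observed) - 1  # degrees of freedom, k - 1

print("expected:", expected)
print("chi-square:", round(chi_sq, 3), "df:", df)
```

The statistic comes out at about 3.958 with 3 degrees of freedom, which is below the usual table values for 3 degrees of freedom (7.815 at the 5% level, for instance), in line with the decision to accept the claim.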

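7 Example: The number of students playing truant in a school over 200 school days is shown below. If X is a random variable representing the number of students playing truant per day, test the hypothesis that X follows the Poisson distribution with mean 3 per day at $\alpha = 0.01$.

No. of truancy    0     1     2     3     4     ≥5
No. of days       12    32    45    50    35    26

Solution:
1) $H_0$: X follows the Poisson distribution with mean 3.
   $H_1$: X does not follow the Poisson distribution with mean 3.
2) $\alpha = 0.01$, $\nu = k - 1 = 6 - 1 = 5$, so $\chi^2_{0.01,5} = 15.086$.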

8 3)

# of truancy   # of days, $o_i$   $p_i$     $e_i = n p_i$          $(o_i - e_i)^2 / e_i$
0              12                 0.0498    200(0.0498) = 9.96     0.4178
1              32                 0.1493    200(0.1493) = 29.86    0.1534
2              45                 0.2241    200(0.2241) = 44.82    0.0007
3              50                 0.2240    200(0.2240) = 44.80    0.6036
4              35                 0.1681    200(0.1681) = 33.62    0.0566
≥5             26                 0.1847    200(0.1847) = 36.94    3.2399

$\chi^2_{calc} = 4.472$

4) Since 4.472 < 15.086, we accept $H_0$.
5) We conclude that there is not enough evidence to reject the claim.
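The table above can be reproduced with a short script. This is a sketch under two assumptions: the last category is read as "5 or more" (its probability 0.1847 is one minus the sum of the others), and the Poisson probabilities are left unrounded, so the statistic differs from the slide's 4.472 in the third decimal place.

```python
import math

# Observed number of days for 0, 1, 2, 3, 4 and "5 or more" truants.
observed = [12, 32, 45, 50, 35, 26]
mean = 3.0           # hypothesised Poisson mean per day
n = sum(observed)    # 200 school days


def poisson_pmf(x, mu):
    """Poisson probability P(X = x) for mean mu."""
    return math.exp(-mu) * mu ** x / math.factorial(x)


# Probabilities for 0..4, then the remaining tail P(X >= 5).
probs = [poisson_pmf(x, mean) for x in range(5)]
probs.append(1.0 - sum(probs))

# Expected frequencies e_i = n * p_i and each cell's contribution.
expected = [n * p for p in probs]
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_sq = sum(contributions)

critical = 15.086    # table value quoted in the solution (alpha = 0.01, df = 5)
reject_h0 = chi_sq > critical

for label, o, p, e, c in zip(
    ["0", "1", "2", "3", "4", ">=5"], observed, probs, expected, contributions
):
    print(f"{label:>3} {o:>3} {p:.4f} {e:6.2f} {c:.4f}")
print("chi-square:", round(chi_sq, 3), "reject H0:", reject_h0)
```

The statistic is about 4.47, well below 15.086, so the script reaches the same decision as the worked solution.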

9 2) The Chi-Square Test for Homogeneity
The homogeneity test is used to determine whether several populations are similar (homogeneous) in some characteristic. This test is applied to a single categorical variable from two or more different populations.
The test procedure is appropriate when the following conditions are satisfied:
i. For each population, the sampling method is simple random sampling.
ii. Each population is at least 10 times as large as its sample.
iii. The variable under study is categorical.
iv. If the sample data are displayed in a contingency table (populations x category levels), the expected value ($e_{ij}$) for each cell of the table is at least 5.

10 Two-dimensional contingency table layout:

                                 Column Variable
                 Category $B_1$   Category $B_2$   …   Category $B_c$   Total
Row Variable
Category $A_1$   $o_{11}$         $o_{12}$         …   $o_{1c}$         $n_{1\cdot}$
Category $A_2$   $o_{21}$         $o_{22}$         …   $o_{2c}$         $n_{2\cdot}$
…                …                …                …   …                …
Category $A_r$   $o_{r1}$         $o_{r2}$         …   $o_{rc}$         $n_{r\cdot}$
Total            $n_{\cdot 1}$    $n_{\cdot 2}$    …   $n_{\cdot c}$    $n$

The above is an ( r x c ) contingency table where
r denotes the number of categories of the row variable,
c denotes the number of categories of the column variable,
$o_{ij}$ is the observed frequency in cell ( i, j ),
$n_{i\cdot}$ is the total frequency for row category i,
$n_{\cdot j}$ is the total frequency for column category j,
$n$ is the grand total frequency over all cells ( i, j ).

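11 Test procedure to run the Chi-square test for homogeneity:
1. State the null hypothesis and the alternative hypothesis.
   E.g. $H_0$: the populations are homogeneous (the category proportions are the same in every population).
        $H_1$: the populations are not homogeneous.
2. Determine:
   i. The level of significance, $\alpha$.
   ii. The degrees of freedom, $\nu = (r - 1)(c - 1)$, where r is the number of rows and c is the number of columns.
   Find the value of $\chi^2_{\alpha,\nu}$ from the table of the chi-square distribution.
3. Calculate the value of $\chi^2_{calc}$ using the formula below:
   $\chi^2_{calc} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}$, where $e_{ij} = \frac{(\text{total of row } i)(\text{total of column } j)}{\text{grand total}}$.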

12 4. Determine the rejection region to reject $H_0$:
   i. If $\chi^2_{calc} > \chi^2_{\alpha,\nu}$ (critical value approach);
   ii. If $p$-value $< \alpha$ ($p$-value approach).
5. Make decision.
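For the p-value approach, the upper-tail probability of the chi-square distribution has a simple closed form when the degrees of freedom are even: $P(\chi^2 > x) = e^{-x/2} \sum_{k=0}^{\nu/2 - 1} \frac{(x/2)^k}{k!}$. The sketch below implements only that special case; the function name is hypothetical, and odd degrees of freedom would need a statistical table or a statistics library instead.

```python
import math


def chi_square_p_value_even_df(chi_sq, df):
    """Upper-tail probability P(X > chi_sq) for a chi-square variable.

    Valid only for positive even df, using the closed form
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    if chi_sq < 0:
        raise ValueError("the chi-square statistic cannot be negative")
    half = chi_sq / 2.0
    series = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * series


# Check against standard 5% table values: 5.991 (df = 2) and 12.592 (df = 6).
print(round(chi_square_p_value_even_df(5.991, 2), 4))
print(round(chi_square_p_value_even_df(12.592, 6), 4))
```

Both calls return values very close to 0.05, which is what the chi-square table entries for the 5% level imply. The decision rule is then to reject $H_0$ whenever the returned p-value is smaller than $\alpha$.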

13 Example: Four machines manufacture cylindrical steel pins. The pins are subject to a diameter specification. A pin may meet the specification, or it may be too thin or too thick. Pins are sampled from each machine and the number of pins in each category is counted. The table below presents the results. Test whether the four machines are homogeneous in the proportions of pins that are too thin, OK, or too thick.

             Too thin   OK     Too thick
Machine 1    10         102    8
Machine 2    34         161    5
Machine 3    12         79     9
Machine 4    10         60     10

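14 Solution:
1. $H_0$: the proportions of pins that are too thin, OK, or too thick are the same for all four machines.
   $H_1$: the proportions are not the same for all four machines.
2. From the table of chi-square with $\nu = (4 - 1)(3 - 1) = 6$ degrees of freedom: $\chi^2_{\alpha,6}$.
3. Construct a contingency table:

             Too thin   OK     Too thick   Total
Machine 1    10         102    8           120
Machine 2    34         161    5           200
Machine 3    12         79     9           100
Machine 4    10         60     10          80
Total        66         402    32          500

Calculation of the expected frequency: $e_{ij} = \frac{(\text{total of row } i)(\text{total of column } j)}{\text{grand total}}$, for example $e_{11} = \frac{(120)(66)}{500} = 15.84$.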

15 Using the observed and expected frequencies in the contingency table, we calculate $\chi^2_{calc} = \sum_i \sum_j \frac{(o_{ij} - e_{ij})^2}{e_{ij}}$ using the formula given.
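The expected frequencies and the test statistic for the four machines can be computed as follows. This is a minimal sketch: the counts are read from the example so that they agree with the row totals 120, 200, 100 and 80, and the variable names are illustrative.

```python
# Observed counts: rows are Machines 1-4, columns are Too thin, OK, Too thick.
observed = [
    [10, 102, 8],
    [34, 161, 5],
    [12, 79, 9],
    [10, 60, 10],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency for each cell: (row total * column total) / grand total.
expected = [
    [r * c / grand_total for c in col_totals] for r in row_totals
]

# Test statistic: sum over all cells of (o_ij - e_ij)^2 / e_ij.
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

for machine, exp_row in enumerate(expected, start=1):
    print(f"Machine {machine}:", [round(e, 2) for e in exp_row])
print("chi-square:", round(chi_sq, 3), "df:", df)
```

The statistic is about 15.58 with 6 degrees of freedom. Every expected frequency is at least 5 (the smallest is 5.12), so the condition on expected values is met.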

16

17 4. 5.

18 Exercise: 200 female owners and 200 male owners of Proton cars are selected at random and the colour of their cars is noted. The following data shows the results. Use a 1% significance level to test whether the proportions of colour preference are the same for females and males.

                  Car Colour
Gender     Black   Dull   Bright
Male       40      110    50
Female     20      80     100

19 3) Chi-Square Test for Independence
This test is applied to a single population which has 2 categorical variables. It is used to determine whether there is a significant association between the 2 categorical variables.
E.g.: In an election survey, voters might be classified by gender (female and male) and voting preference (Democrat, Republican or Independent). This test is used to determine whether gender is related to voting preference.
The test is appropriate if the following conditions are met:
i. The sampling method is simple random sampling.
ii. The population is at least 10 times as large as the sample.
iii. The variables under study are categorical.
iv. If the sample data are displayed in a contingency table (levels of one variable x levels of the other), the expected value for each cell of the table is at least 5.

20 Note: The procedure for the Chi-square test for independence is the same as for the Chi-square test for homogeneity. The only difference between these two tests is in the statement of the null and alternative hypotheses. The rest of the procedure is the same for both tests. This test is useful in testing the following hypotheses:
$H_0$: the two variables are independent.
$H_1$: the two variables are not independent.

21 Example: Insomnia is a disorder in which a person finds it hard to sleep at night. A study is conducted to determine whether the two attributes, smoking habit and insomnia, are dependent. The following data set was obtained. Use a 5% significance level to conduct the study.

                       Insomnia
Habit            Yes    No
Non-smokers      10     70
Ex-smokers       8      32
Smokers          22     38

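22 Solution: The contingency table

                       Insomnia
Habit            Yes    No     Total
Non-smokers      10     70     80
Ex-smokers       8      32     40
Smokers          22     38     60
Total            40     140    180

1. $H_0$: smoking habit and insomnia are independent.
   $H_1$: smoking habit and insomnia are not independent.
2. $\alpha = 0.05$, $\nu = (3 - 1)(2 - 1) = 2$, so $\chi^2_{0.05,2} = 5.991$.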

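23 3. Using the observed and expected frequencies in the contingency table, with $e_{ij} = \frac{(\text{total of row } i)(\text{total of column } j)}{\text{grand total}}$, we calculate $\chi^2_{calc} = \sum_i \sum_j \frac{(o_{ij} - e_{ij})^2}{e_{ij}} = 11.732$ using the formula given.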

24 4. Since $\chi^2_{calc} = 11.732 > \chi^2_{0.05,2} = 5.991$, we reject $H_0$.
5. We conclude that smoking habit and insomnia are not independent.
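The insomnia example can be verified numerically. The sketch below uses the counts from the example; the names are illustrative, 5.991 is the standard table value for a 5% level with 2 degrees of freedom, and the p-value uses the fact that for 2 degrees of freedom the upper-tail probability is exactly $e^{-x/2}$.

```python
import math

# Observed counts (insomnia Yes, insomnia No) for each smoking habit.
observed = {
    "Non-smokers": (10, 70),
    "Ex-smokers": (8, 32),
    "Smokers": (22, 38),
}

row_totals = {habit: sum(counts) for habit, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(col_totals)

# Accumulate (o - e)^2 / e over every cell, with
# e = (row total * column total) / grand total.
chi_sq = 0.0
for habit, counts in observed.items():
    for j, o in enumerate(counts):
        e = row_totals[habit] * col_totals[j] / grand_total
        chi_sq += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(col_totals) - 1)  # (3 - 1)(2 - 1) = 2

alpha = 0.05
critical = 5.991                  # chi-square table value for alpha = 0.05, df = 2
p_value = math.exp(-chi_sq / 2)   # exact upper-tail probability when df = 2

reject_h0 = chi_sq > critical
print("chi-square:", round(chi_sq, 3), "df:", df)
print("p-value:", round(p_value, 4), "reject H0:", reject_h0)
```

Both approaches agree: the statistic (about 11.73) exceeds 5.991 and the p-value (about 0.003) is below 0.05, so the hypothesis of independence is rejected.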

25 Exercise: A study is conducted to determine whether students' academic performance is independent of their participation in co-curricular activities. The following data set was obtained. Use a 5% significance level to conduct the study.

                                 Academic Performance
Co-curricular Activities    Low    Fair   Good
Inactive                    40     80     60
Active                      30     90     60

