EDUCATIONAL STATISTICS EDU5950 SEM1 2015-16 INFERENTIAL STATISTICS: HYPOTHESIS TESTING FOR CHI-SQUARE ANALYSIS Rohani Ahmad Tarmizi - EDU5950


1 EDUCATIONAL STATISTICS EDU5950 SEM1 2015-16 INFERENTIAL STATISTICS: HYPOTHESIS TESTING FOR CHI-SQUARE ANALYSIS Rohani Ahmad Tarmizi - EDU5950

2 Chi-Square (χ²)

3 CHI-SQUARE ANALYSIS This is also an analysis of relationship, but it is better known as an analysis of association. It is used to determine the association between a pair of variables measured on a nominal or ordinal scale, or when one of them is paired with interval or ratio data. Typical variables are:
- Race
- Gender
- Likes/dislikes a food
- High achievement/low achievement
- High anxiety/moderate anxiety/low anxiety
Frequency data are obtained by counting the occurrences of each category. The analysis is suitable for survey studies. From the observed frequencies, chi-square analysis tells us whether or not there is an association between the two variables.

4 CHI-SQUARE ANALYSIS Suppose a researcher collects information on the race of the respondents and on the category of each respondent's eating habits, OR a researcher surveys students at several schools on gender and interest/no interest in the science stream, OR a researcher surveys fathers, collecting their level of education (high/moderate/low) and relating it to salary category. For all three examples the appropriate analysis is a nonparametric one (chi-square analysis), after which a contingency table or cross-tabulation is constructed. From the observed frequencies, chi-square analysis tells us whether or not there is an association between the two variables.

5 CHI-SQUARE ANALYSIS There are two types: the chi-square TEST OF GOODNESS OF FIT and the TEST OF INDEPENDENCE/DEPENDENCE.
TEST OF GOODNESS OF FIT: answers the question "Is there a difference in the rates of some item/event/agreement?"
TEST OF INDEPENDENCE/DEPENDENCE: answers the question "Is there an association/dependence/relationship between two things?"

6 CHI-SQUARE ANALYSIS The results of this analysis are usually presented as a frequency table called a contingency table or cross-tabulation. From the observed frequencies, chi-square analysis tells us whether or not there is a significant association between the two variables studied, or whether or not there is a significant difference in frequencies between the categories studied.

7 From the table we can examine whether there is a relationship or association between the two variables. A hypothesis test must then be carried out to determine whether the association between the two variables is significant. This hypothesis test is the chi-square test. If there is a significant association, the next step is to determine the degree or magnitude of the relationship.

8 For this analysis the data are in the form of frequencies, so the distribution of scores is certainly not normal. The analysis is therefore called distribution-free. The test is also called a nonparametric test because it does not assume that the data are normally distributed. As a rule of thumb, parametric tests are preferred because of their power; however, if the data are nominal or are not normally distributed, nonparametric tests are acceptable. Nonparametric tests: sign test, Mann-Whitney U test, Wilcoxon matched-pairs signed-ranks test, Kruskal-Wallis test, chi-square test.

9 Different Scales, Different Measures of Association
Scale of both variables    Measure of association
Interval or ratio scale    Pearson r
Ordinal scale              Spearman's rho
Nominal scale              Pearson chi-square (χ²)
Pearson chi-square indicates whether there is a significant association between two categorical variables. The measures of association are: phi coefficient, contingency coefficient, Cramer's V coefficient.

10 Types of Chi-Square Test
1. Goodness-of-fit: tests an assumption about one categorical variable.
2. Test of independence: tests the association between two variables arranged in a contingency table.

11 The Chi-Square Distribution
♠ The chi-square distribution has only one parameter, the degrees of freedom (df).
♠ The distribution curve is skewed to the right for small degrees of freedom and becomes nearly symmetric for large degrees of freedom.
♠ The entire chi-square distribution lies to the right of the y-axis.
♠ The chi-square distribution takes only nonnegative values.
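The shape claims on this slide can be checked numerically. The sketch below relies on standard properties of the chi-square distribution that the slide does not state: mean = df, variance = 2·df and skewness = √(8/df). The function name `chi_square_moments` is made up for illustration.

```python
import math

def chi_square_moments(df):
    """Mean, variance and skewness of a chi-square distribution with df degrees of freedom."""
    if df <= 0:
        raise ValueError("df must be positive")
    return {"mean": df, "variance": 2 * df, "skewness": math.sqrt(8 / df)}

# Degrees of freedom used for the curves in the deck's figure: 2, 7 and 12.
for df in (2, 7, 12):
    m = chi_square_moments(df)
    print(df, m["mean"], m["variance"], round(m["skewness"], 3))
```

The skewness stays positive (right skew) and shrinks as df grows, which is the sense in which the curve "becomes symmetric".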

12 The Chi-Square Distribution [Figure: chi-square density curves for df = 2, df = 7 and df = 12; horizontal axis from 0 to 21]

13 χ² Goodness-of-fit

14 Steps in Test of Hypothesis
1. State the null and alternative hypotheses.
2. Determine the appropriate sampling distribution and the critical value.
3. Calculate the test statistic, based on O (observed frequency) and E (expected frequency): $\chi^2 = \sum \frac{(O - E)^2}{E}$
4. Make a decision.
5. State the conclusion.
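The formula in step 3 can be written as a small function. This is a minimal sketch; the name `chi_square_statistic` and the three-category counts are hypothetical and do not come from the slides.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for three categories, with 20 cases expected in each.
print(chi_square_statistic([25, 15, 20], [20, 20, 20]))
```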

15 Goodness-of-fit
♠ Tests an assumption about a categorical variable
♠ Only one variable
♠ Calculation based on: O (observed frequency) and E (expected frequency)
♠ Formula to calculate χ²: $\chi^2 = \sum \frac{(O - E)^2}{E}$

16 Hypothesis Test
1. State H O and H A
2. Determine the sampling distribution and critical value
3. Calculate χ², based on O and E: $\chi^2 = \sum \frac{(O - E)^2}{E}$
4. Decision
5. Conclusion

17 Steps in Testing a Hypothesis
1. State the null and alternative hypotheses. H O: statement of the assumption. H A: statement opposite to the assumption.
2. Calculate the test statistic, the χ² value.
3. Determine the critical value from α and df = k - 1.
4. Make your decision.
5. Make your conclusion.
Decision criteria (manual):
χ² cal > χ² critical: Reject H O, accept H A
χ² cal ≤ χ² critical: Fail to reject H O
Decision criteria (SPSS):
Sig. of χ² < α: Reject H O, accept H A
Sig. of χ² ≥ α: Fail to reject H O
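The two decision rules on this slide translate directly into code. The function names below are illustrative only; the numbers in the calls are the ones reported in the worked examples of this deck (χ² = 16.5 against a critical value of 9.49, and Sig. = 0.052 against α = 0.01).

```python
def decide_manual(chi2_cal, chi2_critical):
    """Manual rule: reject H0 only when the calculated value exceeds the critical value."""
    return "Reject H0" if chi2_cal > chi2_critical else "Fail to reject H0"

def decide_spss(sig, alpha):
    """SPSS rule: reject H0 only when the reported significance is below alpha."""
    return "Reject H0" if sig < alpha else "Fail to reject H0"

print(decide_manual(16.5, 9.49))   # traffic-violation example
print(decide_spss(0.052, 0.01))    # speed-limit example
```

Both rules lead to the same decision for the same data; they differ only in whether the comparison is made on the χ² scale or on the probability scale.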

18 Example 1: The following table displays the age distribution for a sample of people summoned for traffic violations. Test the hypothesis that the proportion of people summoned for traffic violations differs across age groups, at the 0.05 level of significance.
Age       <20   20-29   30-39   40-49   >49
Summons    32      25      19      16     8
Assumption under H O: p1 = p2 = p3 = p4 = p5 = 0.20

19 Answer:
1. Hypotheses
H O: The proportion of people involved in traffic violations is the same for all age groups.
H A: The proportion of people involved in traffic violations is not the same for all age groups.
2. Test statistic
Age      O     E    (O - E)   (O - E)²   (O - E)²/E
<20      32    20     12        144        7.20
20-29    25    20      5         25        1.25
30-39    19    20     -1          1        0.05
40-49    16    20     -4         16        0.80
>49       8    20    -12        144        7.20
Total   100                               16.50

20 3. Critical value: df = k - 1 = 5 - 1 = 4, χ²(4, 0.05) = 9.49
4. Decision: Since χ² cal (16.5) is bigger than χ² critical (9.49), reject H O and accept H A.
5. Conclusion: The proportion of people summoned for traffic violations is significantly different across age groups, χ²(4, n = 100) = 16.5, p < .05. This indicates that people in different age groups differ significantly in the frequency of traffic violations. The findings also indicate that the younger age groups committed more traffic violations.
[Figure: chi-square curve with the "Fail to reject H O" region to the left of the critical value and the "Reject H O" region, α = 0.05, in the right tail]
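The arithmetic of Example 1 can be reproduced in a few lines. This is a sketch using only the counts and the critical value quoted on the slides; the variable names are my own.

```python
# Observed summons per age group, as given in Example 1.
observed = {"<20": 32, "20-29": 25, "30-39": 19, "40-49": 16, ">49": 8}

n = sum(observed.values())
# Under H0 every age group has the same proportion, so E = n / k for each group.
expected = n / len(observed)

chi2_cal = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1
chi2_critical = 9.49  # tabled value for df = 4, alpha = 0.05 (from the slide)

print(round(chi2_cal, 2), df)
print("Reject H0" if chi2_cal > chi2_critical else "Fail to reject H0")
```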

21 SPSS Chi-Square output
Age group involved in traffic offences
         Observed N   Expected N   Residual
<20          32          20.0        12.0
20-29        25          20.0         5.0
30-39        19          20.0        -1.0
40-49        16          20.0        -4.0
>49           8          20.0       -12.0
Total       100
Test Statistics (age group involved in traffic offences)
Chi-Square    16.500
df            4
Asymp. Sig.   0.002
χ² is valid if fewer than 20% of the cells have expected values < 5.
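The validity rule quoted at the end of this slide can be checked mechanically. The helper below is a hypothetical sketch; the threshold of 5 and the 20% limit are the ones stated on the slide.

```python
def expected_count_check(expected, threshold=5.0, max_share=0.20):
    """Count cells whose expected value is below threshold and apply the 20% rule."""
    small = sum(1 for e in expected if e < threshold)
    share = small / len(expected)
    return small, share, share < max_share

# Expected counts from the traffic-offence output above: all five cells are 20.0.
print(expected_count_check([20.0] * 5))
```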

22 Example 2: In a 2008 poll, adults were asked, "Do you agree with the move to increase the highway speed limit to 120 km/hour?" Results revealed that 58% said yes, 31% said no and 11% said do not know. Suppose these results held true for the 2008 population. A recent poll of 500 adults produced the following distribution in response to the same question. Test the hypothesis that the current distribution of adults across the three categories is different from that for 2008, at the 0.01 level of significance.
Table 2
Category       Yes    No     Do not know
Frequency      313    146    41
2008 prob.     0.58   0.31   0.11
Variable: agreement with the move to increase the highway speed limit to 120 km/hour

23 Answer
1. Hypotheses
H O: The current percentage distribution of adults across the three categories is the same as that for 2008.
H A: The current percentage distribution of adults across the three categories is not the same as that for 2008.
2. Test statistic
Opinion        O     E    (O - E)   (O - E)²   (O - E)²/E
Yes            313   290     23       529        1.824
No             146   155     -9        81        0.523
Do not know     41    55    -14       196        3.564
Total          500                               5.911

24 3. Critical value: df = k - 1 = 3 - 1 = 2, χ²(2, 0.01) = 9.21
4. Decision: Since χ² cal (5.911) is smaller than χ² critical (9.21), fail to reject H O.
5. Conclusion: The current percentage distribution of adults across the three categories is not significantly different from that of 2008 at the 0.01 level of significance, χ²(2, n = 500) = 5.911, p > .01. Therefore the agreement of adults in the current survey is similar to that in 2008.
[Figure: chi-square curve with the "Fail to reject H O" region to the left of the critical value 9.21 and the "Reject H O" region, α = 0.01, in the right tail]
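Example 2 differs from Example 1 in that the expected counts come from unequal proportions, E = n·p. The sketch below reproduces the calculation from the figures on the slides; variable names are illustrative.

```python
# Current poll counts and the 2008 proportions, as given in Example 2.
observed = [313, 146, 41]          # yes, no, do not know
props_2008 = [0.58, 0.31, 0.11]

n = sum(observed)
expected = [n * p for p in props_2008]   # about 290, 155 and 55

chi2_cal = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
chi2_critical = 9.21  # tabled value for df = 2, alpha = 0.01 (from the slide)

print(round(chi2_cal, 3), df)
print("Reject H0" if chi2_cal > chi2_critical else "Fail to reject H0")
```

Summing unrounded terms gives about 5.910, the value SPSS reports; the slide's 5.911 comes from adding the three terms after rounding each to three decimals.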

25 SPSS Chi-Square output
Perception
             Observed N   Expected N   Residual
Agree           313          290.0        23.0
Disagree        146          155.0        -9.0
Don't know       41           55.0       -14.0
Total           500
Test Statistics (Perception)
Chi-Square    5.910
df            2
Asymp. Sig.   0.052 (a)
a. 0 cells (0.0%) have expected frequencies less than 5. The minimum expected cell frequency is 55.0.
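The "Asymp. Sig." values in the SPSS outputs are upper-tail probabilities of the chi-square distribution. For even degrees of freedom this tail has a closed form, exp(-x/2) · Σ (x/2)^i / i! for i = 0 … df/2 - 1, which is a standard result not shown on the slides. The sketch below uses it to recover the reported values; the function name is hypothetical and odd df are deliberately not handled.

```python
import math

def chi_square_sig_even_df(x, df):
    """Upper-tail probability P(chi-square > x) for an even number of degrees of freedom."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2
    partial_sum = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * partial_sum

print(round(chi_square_sig_even_df(5.910, 2), 3))   # speed-limit example
print(round(chi_square_sig_even_df(16.5, 4), 3))    # traffic-violation example
```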

26 χ² Test of Independence

27 Assumptions The chi-square test of independence/dependence is used when two variables are measured on a nominal scale. The chi-square goodness-of-fit test is used to test for differences when you have only one variable. It can be applied to interval or ratio data that have been categorized into a small number of groups. It assumes that the observations are randomly sampled from the population.

28 Assumptions All observations are independent (an individual can appear only once in a table and there are no overlapping categories). It does not make any assumptions about the shape of the distribution nor about the homogeneity of variances

29 Steps in Test of Hypothesis
1. State the null and alternative hypotheses.
2. Determine the appropriate sampling distribution and the critical value.
3. Calculate the test statistic, based on O (observed frequency) and E (expected frequency): $\chi^2 = \sum \frac{(O - E)^2}{E}$
4. Make a decision.
5. State the conclusion.

30 The Hypothesis: Whether There is an Association or Not
H o: The two variables are independent
H a: The two variables are associated or dependent

31 Calculating Test Statistics The test contrasts the observed frequencies in each cell of a contingency table with the expected frequencies. The expected frequencies represent the number of cases that would be found in each cell if the null hypothesis were true (i.e. the nominal variables are unrelated). The expected frequency of two unrelated events is the product of the row and column frequencies divided by the total number of cases: $E = F_E = \frac{F_r F_c}{N}$
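The expected-frequency rule on this slide (row total times column total, divided by N) can be sketched as follows. The function name and the 2 x 2 counts are hypothetical and chosen only so the result is easy to check by hand.

```python
def expected_frequencies(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Hypothetical 2 x 2 table: row totals 50 and 50, column totals 40 and 60, N = 100.
print(expected_frequencies([[30, 20], [10, 40]]))
```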

32 Calculating Test Statistics

33 Calculating Test Statistics $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$, where $f_o$ are the observed frequencies and $f_e$ is the expected frequency

34 Determine Degrees of Freedom df = (R-1)(C-1), where R = number of levels in the row variable and C = number of levels in the column variable

35 Compare computed test statistic against a tabled/critical value The computed value of the Pearson chi-square statistic is compared with the critical value to determine whether the computed value is improbable. The critical tabled values are based on sampling distributions of the Pearson chi-square statistic. If the calculated χ² is greater than the tabled χ² value, reject H o.

36 Example Suppose a researcher is interested in voting preferences on environmental control issues. A questionnaire was developed and sent to a random sample of 90 voters. The researcher also collects information about the political party membership of the sample of 90 respondents.

37 Bivariate Frequency Table or Contingency Table
AGREE (V1) / TYPE OF VOTERS (V2)
           Favor   Neutral   Oppose   f row
Barisan     10       10        30      50
Pakatan     15       15        10      40
f column    25       25        40      n = 90

38 Bivariate Frequency Table or Contingency Table (same table as slide 37; highlighted: the observed frequencies, i.e. the six cell counts)

39 Bivariate Frequency Table or Contingency Table (same table as slide 37; highlighted: the row frequencies, 50 and 40)

40 Bivariate Frequency Table or Contingency Table (same table as slide 37; highlighted: the column frequencies, 25, 25 and 40)

41 Determine The Hypothesis
Variables: party membership (2 levels, nominal) and preference (3 levels, nominal)
Ho: There is no difference between Barisan and Pakatan voters in their opinion on the environmental control issue.
Ha: There is a difference between Barisan and Pakatan voters in their opinion on the environmental control issue.
Equivalently:
Ho: There is no association between responses to the environmental survey and party membership in the population.
Ha: There is an association between responses to the environmental survey and party membership in the population.

42 Calculating Test Statistics
           Favor              Neutral            Oppose             f row
Barisan    fo=10, fe=13.9     fo=10, fe=13.9     fo=30, fe=22.2     50
Pakatan    fo=15, fe=11.1     fo=15, fe=11.1     fo=10, fe=17.8     40
f column   25                 25                 40                 n = 90

43 Calculating Test Statistics (same table as slide 42; highlighted: fe for Barisan/Favor = 50*25/90 = 13.9)

44 Calculating Test Statistics (same table as slide 42, with fo = O and fe = E; highlighted: fe for Pakatan/Favor = 40*25/90 = 11.1)

45 Calculating Test Statistics $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 11.03$

46 Determine Degrees of Freedom df = (R-1)(C-1) = (2-1)(3-1) = 2

47 3. Critical value: df = 2, χ²(2, 0.05) = 5.99
4. Decision: Since χ² cal (11.03) is bigger than χ² critical (5.99), reject H O.
5. Conclusion: There is a significant association between responses to the environmental survey and party membership (Barisan or Pakatan) in the population, χ²(2, n = 90) = 11.03, p < .05. OR There is a significant difference between the Barisan and Pakatan voters in their opinion on the environmental control issue, χ²(2, n = 90) = 11.03, p < .05.
[Figure: chi-square curve with the "Fail to reject H O" region to the left of the critical value and the "Reject H O" region, α = 0.05, in the right tail]
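The whole test of independence for the voter example can be reproduced from the contingency table. This is a sketch under the counts shown on the slides (Barisan 10/10/30, Pakatan 15/15/10); the function name is illustrative.

```python
def chi_square_independence(table):
    """Return (chi-square, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

voters = [[10, 10, 30],   # Barisan: favor, neutral, oppose
          [15, 15, 10]]   # Pakatan: favor, neutral, oppose
chi2_cal, df = chi_square_independence(voters)
print(round(chi2_cal, 3), df)
print("Reject H0" if chi2_cal > 5.99 else "Fail to reject H0")  # 5.99 from the slide
```

With unrounded expected frequencies the statistic is 11.025, which is why the deck shows it rounded as both 11.02 and 11.03.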

48 Compare computed test statistic against a tabled/critical value α = 0.05, df = 2, critical tabled value = 5.991. The test statistic, 11.03, exceeds the critical value, so the null hypothesis is rejected. Barisan and Pakatan voters differ significantly in their opinions on environmental control issues.

49 Phi Coefficient The Pearson chi-square provides information about the existence of a relationship between two nominal variables, but not about the magnitude of the relationship. The phi coefficient is a measure of the strength of the association.

50 Cramer's V When the table is larger than 2 by 2, a different index must be used to measure the strength of the relationship between the variables. One such index is Cramer's V. If Cramer's V is large, it means that there is a tendency for particular categories of the first variable to be associated with particular categories of the second variable. $V = \sqrt{\frac{\chi^2}{N(k-1)}}$, where N = total number of cases and k = the smaller of the number of rows or columns

51 Contingency (C) When the table is larger than 2 by 2, a different index must be used to measure the strength of the relationship between the variables. One such index is the contingency coefficient C. If C is large, it means that there is a tendency for particular categories of the first variable to be associated with particular categories of the second variable.
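The three measures of association can be computed directly from χ² and the sample size. The formulas below are the ones used in the worked answers later in this deck (φ = √(χ²/n), C = √(χ²/(χ² + n)), V = √(χ²/(n(k - 1)))); the function names are my own, and the χ² values in the calls are taken from those later examples.

```python
import math

def phi_coefficient(chi2, n):
    """Phi for a 2 x 2 table."""
    return math.sqrt(chi2 / n)

def contingency_coefficient(chi2, n):
    """Contingency coefficient C."""
    return math.sqrt(chi2 / (chi2 + n))

def cramers_v(chi2, n, n_rows, n_cols):
    """Cramer's V, with k the smaller of the number of rows and columns."""
    k = min(n_rows, n_cols)
    return math.sqrt(chi2 / (n * (k - 1)))

print(round(contingency_coefficient(8.252, 300), 3))   # 2 x 3 student-group example
print(round(cramers_v(8.252, 300, 2, 3), 3))           # same example
print(round(phi_coefficient(14.933, 300), 3))          # 2 x 2 gender example
```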

52 SPSS Output for the Environmental Control Example

53 Test of Independence
♠ Tests the null hypothesis that the two variables are not related (independent) against the alternative hypothesis that the two variables are related (dependent)
♠ Contingency table: two variables
♠ Calculation based on: O (observed frequency) and E (expected frequency), with $E = \frac{RT \times CT}{GT}$ (row total times column total, divided by grand total)
♠ Formula to calculate χ²: $\chi^2 = \sum \frac{(O - E)^2}{E}$

54 What to expect?
Hypothesis Test
1. State H O and H A
2. Calculate χ², based on O and E = np: $\chi^2 = \sum \frac{(O - E)^2}{E}$
3. Critical value
4. Decision
5. Conclusion
Measure of Relationship: phi, contingency, Cramer's V

55 Steps in Testing a Hypothesis:
1. State the null and alternative hypotheses. H O: DV is independent of IV. H A: DV is dependent on IV.
2. Calculate the test statistic, the χ² value.
3. Determine the critical value from α and df = (R-1)(C-1).
4. Make your decision.
5. Make your conclusion.
Decision criteria (manual):
χ² cal > χ² critical: Reject H O
χ² cal ≤ χ² critical: Fail to reject H O
Decision criteria (SPSS):
Sig. of χ² < α: Reject H O
Sig. of χ² ≥ α: Fail to reject H O

56 Example 1: A study was conducted to test the relationship between student group (campus and PJJ, i.e. distance learning) and academic performance. Data collected from a randomly selected sample follow. 1. Test the hypothesis on the relationship between the two variables at the 0.01 level of significance. 2. Calculate and describe an appropriate measure of association between the two variables.
Student group   Academic performance
                High   Moderate   Low
Campus           93       70       12
PJJ              87       32        6

57 Answer:
1. Hypothesis testing
a. Hypotheses
H O: Academic performance is independent of student group
H A: Academic performance is dependent on student group
b. Test statistic
Calculated expected value for each cell (in parentheses):
Student group   High          Moderate     Low          Row totals
Campus          93 (105.0)    70 (59.5)    12 (10.5)    175
PJJ             87 (75.0)     32 (42.5)     6 (7.5)     125
Column totals   180           102          18           300

58 Chi-Square:
Cell    O     E       (O - E)   (O - E)²   (O - E)²/E
C-H     93    105.0   -12.0     144.00     1.371
C-M     70     59.5    10.5     110.25     1.853
C-L     12     10.5     1.5       2.25     0.214
P-H     87     75.0    12.0     144.00     1.920
P-M     32     42.5   -10.5     110.25     2.594
P-L      6      7.5    -1.5       2.25     0.300
Total  300                                 8.252

59 3. Critical value: df = (R - 1)(C - 1) = (2 - 1)(3 - 1) = 1 x 2 = 2, χ²(2, 0.01) = 9.21
4. Decision: Since χ² cal (8.252) is smaller than χ² critical (9.21), fail to reject H O.
5. Conclusion: There is not enough evidence from the sample to conclude that the two variables, student group and academic performance, are dependent at the 0.01 level of significance.
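The student-group example can be checked end to end, including the significance value. The sketch below uses the counts from the slides; for df = 2 the upper-tail probability of the chi-square distribution is exactly exp(-χ²/2), a standard result that the slides do not state.

```python
import math

table = [[93, 70, 12],   # Campus: high, moderate, low
         [87, 32, 6]]    # PJJ: high, moderate, low

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

chi2_cal = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2_cal += (observed - expected) ** 2 / expected

df = (len(table) - 1) * (len(table[0]) - 1)
sig = math.exp(-chi2_cal / 2)   # valid only because df == 2

print(round(chi2_cal, 3), df, round(sig, 3))
print("Reject H0" if sig < 0.01 else "Fail to reject H0")
```

The result is sensitive to the chosen α: the same data would lead to rejecting H O at the 0.05 level, since the significance value is about 0.016.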

60 2. Measure of association: for a 2 x 3 contingency table, both the contingency coefficient and Cramer's V are appropriate.
$C = \sqrt{\frac{\chi^2}{\chi^2 + n}} = \sqrt{\frac{8.252}{8.252 + 300}} = 0.164$: negligible association between student group and academic performance
$V = \sqrt{\frac{\chi^2}{n(k-1)}} = \sqrt{\frac{8.252}{300(2-1)}} = 0.166$: negligible association between student group and academic performance

61 SPSS Chi-Square output
Student Group * Performance in Statistics Crosstabulation
                                  Performance in Statistics
                                  High     Moderate   Low     Total
Campus   Count                     93        70        12      175
         Expected Count           105.0      59.5      10.5    175.0
PJJ      Count                     87        32         6      125
         Expected Count            75.0      42.5       7.5    125.0
Total    Count                    180       102        18      300
         Expected Count           180.0     102.0      18.0    300.0

62 Chi-Square Tests
                               Value    df   Asymp. Sig. (2-sided)
Pearson Chi-Square             8.253    2    0.016
Likelihood Ratio               8.370    2    0.015
Linear-by-Linear Association   3.762    1    0.009
N of Valid Cases               300
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 7.50.
Symmetric Measures
                                               Value   Approx. Sig.
Nominal by Nominal   Phi                       0.166   0.016
                     Cramer's V                0.166   0.016
                     Contingency Coefficient   0.164   0.016
N of Valid Cases                               300
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.

63 Example 3: Dr Irwan is interested in testing the relationship between gender and program of study. Data taken from a randomly selected sample follow. 1. Test the hypothesis on the relationship at the 0.01 level of significance. 2. Calculate and describe an appropriate measure of association between the two variables.
Gender    Program of study
          Science   Social Science
Male        60         110
Female      75          55

64 Answer:
1. Hypothesis testing
a. Hypotheses
H O: Program of study is independent of gender
H A: Program of study is dependent on gender
b. Test statistic
Calculated expected value for each cell (in parentheses):
Gender          Science      Social Science   Row totals
Male            60 (76.5)    110 (93.5)       170
Female          75 (58.5)     55 (71.5)       130
Column totals   135          165              300

65 Chi-Square:
Cell    O     E      (O - E)   (O - E)²   (O - E)²/E
M_Sc    60    76.5   -16.5     272.25     3.559
M_SS   110    93.5    16.5     272.25     2.912
F_Sc    75    58.5    16.5     272.25     4.654
F_SS    55    71.5   -16.5     272.25     3.808
Total  300                               14.933

66 With Yates' correction:
Cell    O     E      (O - E)   (|O - E| - 1/2)²   (|O - E| - 1/2)²/E
M_Sc    60    76.5   -16.5     256.0              3.346
M_SS   110    93.5    16.5     256.0              2.738
F_Sc    75    58.5    16.5     256.0              4.376
F_SS    55    71.5   -16.5     256.0              3.580
Total  300                                       14.040
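The effect of Yates' correction on a 2 x 2 table can be seen by computing the statistic both ways. The sketch below uses the gender by program-of-study counts from the slides; the function name and the `yates` flag are illustrative.

```python
def chi_square_2x2(table, yates=False):
    """Chi-square for a 2 x 2 table, optionally subtracting 1/2 from each |O - E|."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)  # never let the correction overshoot zero
            chi2 += diff ** 2 / expected
    return chi2

gender_by_program = [[60, 110],   # male: science, social science
                     [75, 55]]    # female: science, social science
plain = chi_square_2x2(gender_by_program)
corrected = chi_square_2x2(gender_by_program, yates=True)
print(round(plain, 3), round(corrected, 3))
```

The correction always lowers the statistic, which makes the test more conservative; here both values remain well above the critical value of 6.63 used on the next slide.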

67 3. Critical value: df = (R - 1)(C - 1) = (2 - 1)(2 - 1) = 1 x 1 = 1, χ²(1, 0.01) = 6.63
4. Decision: Since χ² cal (14.040) is bigger than χ² critical (6.63), reject H O.
5. Conclusion: There is strong evidence from the sample to conclude that the two variables, gender and program of study, are dependent at the 0.01 level of significance.

68 2. Measure of association: for a 2 x 2 contingency table, the phi coefficient is the most appropriate.
$\phi = \sqrt{\frac{\chi^2}{n}} = \sqrt{\frac{14.933}{300}} = 0.223$: low association between gender and program of study

69 Chi-Square output: contingency table, 2 x 2 (observed and expected values)
Gender * Program of Study Crosstabulation
                             Science   Social Science   Total
Male     Count                 60         110            170
         Expected Count        76.5        93.5          170.0
Female   Count                 75          55            130
         Expected Count        58.5        71.5          130.0
Total    Count                135         165            300
         Expected Count       135.0       165.0          300.0

