Analysis of Discrete Variables
Gender ($x_1$ = male, $x_2$ = female)
Education level ($x_1$ = low, $x_2$ = middle, $x_3$ = high)
5-point scales ($x_1 = 1$, $x_2 = 2$, ..., $x_5 = 5$)
Diagnosis ($x_1$ = neurosis, $x_2$ = schizophrenia, ...)
Distribution of Discrete Variables: The General Case
Values:         $x_1$   $x_2$   $x_3$   ...   $x_k$
Probabilities:  $p_1$   $p_2$   $p_3$   ...   $p_k$
An Example of a Discrete Distribution
$x_i$:  1     2     3     4
$p_i$:  0.20  0.35  0.40  0.05
Analysis of 1 discrete variable in 1 population
Distribution Fitting
Hypothetical example: Which do you like best: Coke ($x_1$), Pepsi ($x_2$), or Fanta ($x_3$)?
Null hypothesis: $H_0: P(x_1) = P(x_2) = P(x_3) = 1/3$
Obtained frequencies ($n_i$): out of 150 subjects, $n_1 = 80$, $n_2 = 50$, $n_3 = 20$
Expected frequencies ($\nu_i$): if $H_0$ were true, one would expect $\nu_1 = \nu_2 = \nu_3 = 50$.
Distribution Fitting with the $\chi^2$-test
The greater the difference between the obtained ($n_i$) and expected ($\nu_i$) frequencies, the stronger the evidence against $H_0$.
A possible measure of the difference:
$$\chi^2 = \frac{(n_1 - \nu_1)^2}{\nu_1} + \frac{(n_2 - \nu_2)^2}{\nu_2} + \dots + \frac{(n_g - \nu_g)^2}{\nu_g}$$
If $H_0$ is true, the distribution of $\chi^2$ is approximately chi-square, with $df = g - 1$.
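Calculations
$n_i$:    80  50  20   ($\Sigma = 150$)
$\nu_i$:  50  50  50   ($\Sigma = 150$)
$$\chi^2 = \frac{(80 - 50)^2}{50} + \frac{(50 - 50)^2}{50} + \frac{(20 - 50)^2}{50} = 18 + 0 + 18 = 36 > 9.210 = \chi^2_{0.01} \quad (df = 2)$$
Thus we reject $H_0$ and say: 'The three proportions differ significantly.'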
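The calculation above can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative, and the critical value 9.210 is copied from the slide rather than computed.

```python
# Goodness-of-fit chi-square for the soft-drink example.
observed = [80, 50, 20]                  # obtained frequencies n_i
null_probs = [1 / 3, 1 / 3, 1 / 3]       # proportions stated by H0

total = sum(observed)
expected = [total * p for p in null_probs]   # expected frequencies nu_i

# Sum of (n_i - nu_i)^2 / nu_i over all categories.
chi_square = sum((n - nu) ** 2 / nu for n, nu in zip(observed, expected))
df = len(observed) - 1

CRITICAL_001_DF2 = 9.210  # tabulated chi-square critical value (alpha = .01, df = 2)
reject_h0 = chi_square >= CRITICAL_001_DF2

print(f"chi-square = {chi_square:.3f}, df = {df}, reject H0: {reject_h0}")
```

Changing `null_probs` lets the same sketch test any hypothesized distribution, provided every expected frequency is at least 5.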
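$\chi^2$-test
$H_0$: $P(x_1) = p_1$, $P(x_2) = p_2$, ..., $P(x_g) = p_g$
$H_A$: For at least one $i$: $P(x_i) \neq p_i$
Condition: $\nu_i \geq 5$
Test statistic: $\chi^2$ computed from the X-sample, with $df = g - 1$
Decision: $\chi^2 < \chi^2_{0.05}$: keep $H_0$; $\chi^2 \geq \chi^2_{0.05}$: reject $H_0$
[Figure: density of the $\chi^2$ distribution; the area below $\chi^2_{0.05}$ is 0.95, the area above it is 0.05]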
Comparing 2 Populations by Means of 1 Discrete Variable
Example: Is there a difference between males and females with respect to education level (EL)?
$H_0$: The distribution of EL is the same among males and females:
$P(x_i \mid \text{Males}) = P(x_i \mid \text{Females})$, $i = 1, 2, 3$
($x_1$ = low, $x_2$ = middle, $x_3$ = high)
Two-Way Frequency Table
          Low   Middle   High   Total
Male       16       32     32   $n_1 = 80$
Female     18       45     27   $n_2 = 90$
Total      34       77     59   $N = 170$
Two-Way Frequency Table: Row Percentages
          Low   Middle   High    Total
Male      20%   40%      40%     100%
Female    20%   50%      30%     100%
Total     20%   45.3%    34.7%
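The row percentages follow directly from the frequency table: each cell is divided by its row total. A short sketch, with illustrative names that are not part of the slides:

```python
# Row percentages for the gender-by-education frequency table.
levels = ["Low", "Middle", "High"]
table = {
    "Male": [16, 32, 32],
    "Female": [18, 45, 27],
}

def row_percentages(counts):
    """Express each count as a percentage of the row total."""
    total = sum(counts)
    return [100 * c / total for c in counts]

row_pct = {group: row_percentages(counts) for group, counts in table.items()}

# The "Total" row uses the column sums of the whole table.
column_totals = [sum(col) for col in zip(*table.values())]
row_pct["Total"] = row_percentages(column_totals)

for group, pcts in row_pct.items():
    cells = ", ".join(f"{lvl}: {p:.1f}%" for lvl, p in zip(levels, pcts))
    print(f"{group}: {cells}")
```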
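$\chi^2$-test for Comparing Groups
If $H_0$ is true, then
$$\chi^2 = \sum_{i=1}^{g} \sum_{j=1}^{h} \frac{(n_{ij} - \nu_{ij})^2}{\nu_{ij}}, \qquad \nu_{ij} = \frac{n_i \cdot m_j}{N}$$
approximately follows the $\chi^2$-distribution with $df = (g-1) \cdot (h-1)$.
Decision:
$\chi^2 < \chi^2_{0.05}$: Keep $H_0$ ($p > .05$, n.s.).
$\chi^2 \geq \chi^2_{0.05}$: Reject $H_0$ ($p < .05$, significant).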
Comparison of Males and Females
Number of rows: $g = 2$
Number of columns: $h = 3$
Degrees of freedom: $df = (2-1) \times (3-1) = 2$
Critical values: $\chi^2_{0.10} = 4.605$; $\chi^2_{0.05} = 5.991$; $\chi^2_{0.01} = 9.210$
Computed chi-square value: $\chi^2 = 2.155$
Decision: Keep $H_0$ ($p > .10$, n.s.).
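The computed value can be checked from the male/female frequency table. The sketch below assumes expected frequencies of the form (row total × column total) / N, as in the general case; names are illustrative and the critical value 4.605 is taken from the slide.

```python
# Chi-square test for comparing two groups on one discrete variable.
observed = [
    [16, 32, 32],   # Male:   Low, Middle, High
    [18, 45, 27],   # Female: Low, Middle, High
]

row_totals = [sum(row) for row in observed]          # n_i
col_totals = [sum(col) for col in zip(*observed)]    # m_j
grand_total = sum(row_totals)                        # N

# Expected frequency of each cell under H0: n_i * m_j / N.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_square = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(row_totals))
    for j in range(len(col_totals))
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

CRITICAL_010_DF2 = 4.605  # tabulated critical value (alpha = .10, df = 2)
keep_h0 = chi_square < CRITICAL_010_DF2

print(f"chi-square = {chi_square:.3f}, df = {df}, keep H0: {keep_h0}")
```

All six expected frequencies exceed 5 here, so the approximation condition is met.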
General Case
Samples     $X = x_1$   $X = x_2$   $X = x_3$   ...   Total
Sample 1    $n_{11}$    $n_{12}$    $n_{13}$    ...   $n_1$
Sample 2    $n_{21}$    $n_{22}$    $n_{23}$    ...   $n_2$
...
Total       $m_1$       $m_2$       $m_3$       ...   $N$
Expected frequencies: $\nu_{ij} = (n_i \times m_j)/N$
Degrees of freedom: $df = (g-1) \times (h-1)$
Condition: $\nu_{ij} \geq 5$
Comparing the Distribution of 2 Variables in 1 Population
Example: Lecture about the disadvantages of smoking.
Outcome: 8 of 36 students give up smoking, 3 start smoking. Any effect?
$H_0$: The proportion of smokers does not change.
Indicator of change: $x_1$ = positive change, $x_2$ = negative change
$H_0$: $P(x_1) = P(x_2)$
Computation: McNemar's test
Frequency table: Smoking
              Time 2: No   Time 2: Yes
Time 1: No    a            b = 8
Time 1: Yes   c = 3        d
Condition: $(b + c)/2 \geq 5$, that is, $b + c \geq 10$
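The test statistic itself is not preserved in the text of this slide. The sketch below assumes the usual uncorrected McNemar statistic, $(b - c)^2/(b + c)$ with $df = 1$, which is what the distribution-fitting $\chi^2$ gives when applied to the two change categories with expected frequency $(b + c)/2$ each. The critical value 3.841 is the standard tabulated $\chi^2_{0.05}$ for $df = 1$ and does not appear on the slide.

```python
# McNemar's test for the smoking example (uncorrected form, an assumption).
b, c = 8, 3   # the two change counts from the frequency table

expected = (b + c) / 2           # expected count per change category under H0
condition_met = expected >= 5    # same as b + c >= 10

# McNemar statistic: only the two change cells enter the computation.
chi_square = (b - c) ** 2 / (b + c)

# The same value obtained as a goodness-of-fit chi-square with two categories.
gof_chi_square = (b - expected) ** 2 / expected + (c - expected) ** 2 / expected

CRITICAL_005_DF1 = 3.841  # standard tabulated value (alpha = .05, df = 1)
reject_h0 = chi_square >= CRITICAL_005_DF1

print(f"condition met: {condition_met}")
print(f"chi-square = {chi_square:.3f}, reject H0: {reject_h0}")
```

Under these assumptions the statistic stays below the critical value, so the change in the proportion of smokers would not be judged significant at the .05 level.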
More General Cases
X is arbitrary, two related samples: Bowker's test
X is dichotomous, h related samples: Cochran's Q test
Relationship of 2 Discrete Variables
[Table: Girls at 15, Makes friends easily]
Test of independence = Comparisons
Table of Row Percentages
[Table: Girls at 15, Makes friends easily]
Table of Column Percentages
[Table: Girls at 15, Makes friends easily]
Strength of Relationship
Cramér's contingency coefficient
Ordinally scaled variables: Kendall's $\Gamma$
Dichotomous variables: $\Gamma$ = Yule's $Q$
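The formulas for these coefficients are not preserved in the text. As a hedged illustration, the sketch below assumes the common definition of Cramér's coefficient, $V = \sqrt{\chi^2 / (N \cdot \min(g-1, h-1))}$, and of Yule's $Q = (ad - bc)/(ad + bc)$ for a 2×2 table with cells a, b, c, d. The function names are hypothetical; the example reuses the chi-square value from the male/female comparison.

```python
import math

def cramers_v(chi_square, n, n_rows, n_cols):
    """Cramér's coefficient from a chi-square value (assumed common definition)."""
    return math.sqrt(chi_square / (n * min(n_rows - 1, n_cols - 1)))

def yules_q(a, b, c, d):
    """Yule's Q for a 2x2 table with rows (a, b) and (c, d)."""
    return (a * d - b * c) / (a * d + b * c)

# Strength of the gender-by-education relationship:
# chi-square = 2.155, N = 170, g = 2 rows, h = 3 columns.
v = cramers_v(2.155, 170, 2, 3)
print(f"Cramér's coefficient = {v:.3f}")
```

Both coefficients are 0 when there is no association; Cramér's coefficient ranges from 0 to 1, while Yule's $Q$ ranges from -1 to 1.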