Categorical data analysis
Types of variables
Numerical: discrete, continuous
Categorical: nominal, ordinal
Confidence interval
two-sided confidence interval
error
table value (normal distribution): u0.05 = 1.96
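The interval formula itself did not survive on this slide. A minimal sketch, assuming the usual large-sample (Wald) interval p ± u·sqrt(p(1 − p)/n) for a proportion; the function name `proportion_ci` and the sample counts are illustrative only.

```python
import math


def proportion_ci(successes, n, u=1.96):
    """Two-sided confidence interval for a proportion (assumed Wald form)."""
    p = successes / n                        # sample proportion
    error = u * math.sqrt(p * (1 - p) / n)   # error term: half-width of the interval
    return p - error, p + error


# Hypothetical data: 40 units with the attribute out of 100.
low, high = proportion_ci(40, 100)
print(round(low, 3), round(high, 3))
```

The default `u=1.96` is the table value quoted on the slide for a 95 % two-sided interval.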
One-sample test (two-sided)
H0: π = π0
test criterion
table value → u0.05 = 1.96
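The test criterion is not legible in the source. A minimal sketch, assuming the standard large-sample statistic u = (p − π0) / sqrt(π0(1 − π0)/n); `one_sample_u` and the counts are hypothetical.

```python
import math


def one_sample_u(successes, n, pi0):
    """Test statistic for H0: pi = pi0 (assumed normal approximation)."""
    p = successes / n
    return (p - pi0) / math.sqrt(pi0 * (1 - pi0) / n)


# Hypothetical data: 40 of 100 units, tested against pi0 = 0.5.
u = one_sample_u(40, 100, 0.5)
reject_h0 = abs(u) > 1.96   # two-sided comparison with the table value
print(round(u, 2), reject_h0)
```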
Two-sample test (two-sided)
H0: π1 = π2
test criterion
table value → u0.05 = 1.96
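The slide's formula is missing here as well. A minimal sketch, assuming the common pooled-proportion statistic u = (p1 − p2) / sqrt(p(1 − p)(1/n1 + 1/n2)); the name `two_sample_u` and the data are illustrative.

```python
import math


def two_sample_u(x1, n1, x2, n2):
    """Test statistic for H0: pi1 = pi2 (assumed pooled normal approximation)."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                        # pooled proportion under H0
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))  # standard error of p1 - p2
    return (p1 - p2) / se


# Hypothetical data: 45 of 100 in sample 1, 30 of 100 in sample 2.
u = two_sample_u(45, 100, 30, 100)
reject_h0 = abs(u) > 1.96
print(round(u, 2), reject_h0)
```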
Contingency tables 2 x 2
Var A/Var B   B1    B2    Total
A1            a     b     a+b
A2            c     d     c+d
Total         a+c   b+d   n
Contingency tables i x j
Var A/Var B   B1    B2    …    Bj    Total
A1            n11   n12   …    n1j   n1.
A2            n21   n22   …    n2j   n2.
…
Ai            ni1   ni2   …    nij   ni.
Total         n.1   n.2   …    n.j   n
Testing independence in two-way CT
Null hypothesis
Significance level α
Computation of the test criterion
Table value: χ2[(m-1).(n-1)]
Conditions for testing:
  n > 40 → χ2 test
  20 < n ≤ 40 → Fisher test if some expected frequency is < 5; χ2 test if all expected frequencies are > 5
  n ≤ 20 → Fisher test
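The computation of the test criterion is not shown in the text. A minimal sketch, assuming the usual Pearson statistic χ2 = Σ (observed − expected)² / expected with (rows − 1)·(columns − 1) degrees of freedom; `chi_square_statistic` and the table are hypothetical.

```python
def chi_square_statistic(table):
    """Return (chi-square statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # frequency expected under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# Hypothetical 2 x 2 table with n = 100.
stat, df = chi_square_statistic([[10, 20], [30, 40]])
reject_h0 = stat > 3.841   # table value of chi-square for 1 degree of freedom, alpha = 0.05
print(round(stat, 3), df, reject_h0)
```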
Theoretical frequencies
Var A/Var B   B1    B2    …    Bj    Total
A1            n11   n12   …    n1j   n1.
A2            n21   n22   …    n2j   n2.
…
Ai            ni1   ni2   …    nij   ni.
Total         n.1   n.2   …    n.j   n
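The slide shows only the table layout. A minimal sketch, assuming the standard rule that the theoretical frequency of a cell under independence is (row total · column total) / n, that is ni. · n.j / n in the table's notation; the function name is illustrative.

```python
def expected_frequencies(table):
    """Theoretical (expected) frequencies under independence."""
    row_totals = [sum(row) for row in table]        # n_i.
    col_totals = [sum(col) for col in zip(*table)]  # n_.j
    n = sum(row_totals)                             # grand total
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical observed table.
expected = expected_frequencies([[10, 20], [30, 40]])
print(expected)  # [[12.0, 18.0], [28.0, 42.0]]
```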
Testing independence in two-way CT
Fisher test
find the cell with the lowest frequency
decrease that cell by 1 step by step until it reaches 0, keeping all marginal frequencies unchanged
compute the probability of each table created this way
∑pi > 0.05 → H0 is not rejected
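The procedure can be sketched for a 2 x 2 table as follows. This is an illustration under the assumption that each table's probability is the hypergeometric probability with fixed margins; the function names and the data are hypothetical.

```python
from math import comb


def table_probability(a, b, c, d):
    """Hypergeometric probability of a 2 x 2 table with fixed margins."""
    return comb(a + b, a) * comb(c + d, c) / comb(a + b + c + d, a + c)


def fisher_probability_sum(a, b, c, d):
    """Sum the probabilities of the observed table and of the tables obtained
    by lowering the smallest cell to 0 while keeping the margins fixed."""
    smallest_on_diagonal = min(a, d) <= min(b, c)
    total = 0.0
    while min(a, b, c, d) >= 0:
        total += table_probability(a, b, c, d)
        if smallest_on_diagonal:
            a, b, c, d = a - 1, b + 1, c + 1, d - 1   # margins stay the same
        else:
            a, b, c, d = a + 1, b - 1, c - 1, d + 1
    return total


# Hypothetical table: a = 1, b = 5, c = 4, d = 2 (n = 12).
p_sum = fisher_probability_sum(1, 5, 4, 2)
reject_h0 = p_sum <= 0.05
print(round(p_sum, 4), reject_h0)
```

Here two tables are evaluated (smallest cell equal to 1, then 0), and their probabilities 90/792 and 6/792 are added.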
Testing independence in CT
table value: χ2[(m-1).(n-1)]
conditions for testing by the χ2 test:
  at most 20 % of expected frequencies are < 5
  no expected frequency is less than 1
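The two conditions can be checked mechanically once the expected frequencies are known. A minimal sketch; `chi_square_conditions_ok` and both example tables are hypothetical.

```python
def chi_square_conditions_ok(expected):
    """Check the two conditions for using the chi-square test."""
    cells = [e for row in expected for e in row]
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    # at most 20 % of expected frequencies below 5, and none below 1
    return share_below_5 <= 0.20 and min(cells) >= 1


dense_ok = chi_square_conditions_ok([[12.0, 18.0], [28.0, 42.0]])
sparse_ok = chi_square_conditions_ok([[0.5, 9.5], [4.5, 85.5]])
print(dense_ok, sparse_ok)  # True False
```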
Strength of dependence
Pearson coefficient
Cramér coefficient V, where h = min(k; m)
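Neither formula is preserved in the source. A minimal sketch, assuming the textbook definitions: Pearson contingency coefficient C = sqrt(χ2 / (χ2 + n)) and Cramér's V = sqrt(χ2 / (n·(h − 1))), with h = min(k; m) as on the slide. The function names and input values are illustrative.

```python
import math


def pearson_contingency(chi2, n):
    """Pearson contingency coefficient C (assumed definition)."""
    return math.sqrt(chi2 / (chi2 + n))


def cramers_v(chi2, n, k, m):
    """Cramer's V for a k x m table (assumed definition)."""
    h = min(k, m)
    return math.sqrt(chi2 / (n * (h - 1)))


# Hypothetical values: chi-square = 10 from a 3 x 4 table with n = 100.
c_coef = pearson_contingency(10.0, 100)
v_coef = cramers_v(10.0, 100, 3, 4)
print(round(c_coef, 3), round(v_coef, 3))
```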
Two-way tables: dependent observations
McNemar test
2 x 2 table
1 group of units, "before/after" observation
                        2nd measure (after)
1st measure (before)    +     -
+                       a     b
-                       c     d
Two-way tables: dependent observations
McNemar test
test criterion
table value: χ2[(m-1).(n-1)]
condition: b + c > 8
correction
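The test criterion and the correction are named but not written out. A minimal sketch, assuming the usual forms χ2 = (b − c)² / (b + c) and, with continuity correction, χ2 = (|b − c| − 1)² / (b + c); which correction the slide intended is an assumption, and `mcnemar_statistic` is a hypothetical name.

```python
def mcnemar_statistic(b, c, corrected=False):
    """McNemar statistic from the discordant cells b and c."""
    if b + c <= 8:
        raise ValueError("condition b + c > 8 is not met")
    diff = abs(b - c) - 1 if corrected else abs(b - c)
    return diff ** 2 / (b + c)


# Hypothetical discordant counts: b = 15, c = 5.
plain = mcnemar_statistic(15, 5)
with_correction = mcnemar_statistic(15, 5, corrected=True)
reject_h0 = plain > 3.841   # chi-square table value, 1 degree of freedom, alpha = 0.05
print(plain, with_correction, reject_h0)
```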
Chances and risks in two-way tables
Threats, exposure
Var A/Var B   B1    B2    Total
A1            a     b     a+b
A2            c     d     c+d
Total         a+c   b+d   n
Chances and risks in two-way tables
Relative risk (part 1)
if the categories of variable B are independent of the categories of variable A, RR1 = 1
RR > 1 → cell "a" holds a higher share of its row total than cell "c"
shows how many times higher the probability of the threat is
Chances and risks in two-way tables Relative risk (part 2)
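The relative risk formula is not preserved. A minimal sketch, assuming RR1 = [a / (a + b)] / [c / (c + d)], which compares the share of cell "a" in row A1 with the share of cell "c" in row A2; the function name and the counts are hypothetical.

```python
def relative_risk(a, b, c, d):
    """RR1: share of B1 in row A1 divided by share of B1 in row A2 (assumed form)."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical 2 x 2 table: a = 20, b = 80, c = 10, d = 90.
rr1 = relative_risk(20, 80, 10, 90)
print(round(rr1, 2))
```

With these counts the share of B1 is 0.2 in A1 and 0.1 in A2, so the probability of the threat is twice as high in A1.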
Chances and risks in two-way tables
Odds ratio
ratio of the two alternative relative risk results (RR1 and RR2)
OR takes values between zero and infinity
independence between the variables → OR = 1
shows how many times higher the chance (odds) of the threat is
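A minimal sketch of the odds ratio, assuming RR1 = [a/(a+b)] / [c/(c+d)] and RR2 = [b/(a+b)] / [d/(c+d)], so that OR = RR1 / RR2 reduces to (a·d) / (b·c). The definition of RR2 is inferred from the slide's wording, and the names and counts are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio of a 2 x 2 table, OR = (a * d) / (b * c)."""
    return (a * d) / (b * c)


# Hypothetical 2 x 2 table: a = 20, b = 80, c = 10, d = 90.
a, b, c, d = 20, 80, 10, 90
rr1 = (a / (a + b)) / (c / (c + d))   # relative risk for column B1
rr2 = (b / (a + b)) / (d / (c + d))   # relative risk for column B2 (assumed definition)
or_value = odds_ratio(a, b, c, d)
print(or_value, round(rr1 / rr2, 2))
```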
Chances and risks in two-way tables
Attributable risk
difference in the probability of occurrence of B1 between the two categories of the first variable, A1 and A2
AR lies in the interval [-1, 1]
indicates the change in the probability of the threat
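Following the verbal definition, a minimal sketch assuming AR = a / (a + b) − c / (c + d); the function name and the counts are hypothetical.

```python
def attributable_risk(a, b, c, d):
    """Difference between the share of B1 in row A1 and in row A2."""
    return a / (a + b) - c / (c + d)


# Hypothetical 2 x 2 table: a = 20, b = 80, c = 10, d = 90.
ar = attributable_risk(20, 80, 10, 90)
print(round(ar, 2))
```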
Chances and risks in two-way tables
Relative attributable risk
based on the attributable risk; it is the percentage change in the probability of occurrence of B1 between the two categories, A1 and A2
the basis for the computation is the share of the frequency in cell "a" in the marginal frequency of category A1 (share a/(a+b))
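A minimal sketch of this definition, assuming RAR = AR / [a / (a + b)] · 100 %, with AR = a / (a + b) − c / (c + d); the exact formula is a reconstruction from the description, and the names and counts are hypothetical.

```python
def relative_attributable_risk(a, b, c, d):
    """Attributable risk as a percentage of the share a / (a + b) (assumed form)."""
    share_a1 = a / (a + b)            # basis: share of cell "a" in row A1
    ar = share_a1 - c / (c + d)       # attributable risk
    return ar / share_a1 * 100


# Hypothetical 2 x 2 table: a = 20, b = 80, c = 10, d = 90.
rar = relative_attributable_risk(20, 80, 10, 90)
print(round(rar, 1))
```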