Contingency (frequency) tables


1 Contingency (frequency) tables
Dependence of two qualitative variables

2 Examples of problems
Is the survival of a person sent to a cholera-affected area dependent on whether the person has been vaccinated against cholera? Is there any connection between hair colour and sex? Are parasite species distributed independently?

3 Contingency table

4 Dependence of survival on vaccination
Mutual dependence of two species

5 Relationship between two categorical variables in a table
The table can be analysed when one of the variables is manipulated; when one of the variables is probably a cause and the other a consequence (response), but the study is based on non-manipulative observations; and finally, when the possible causality is unclear.

6 Basic rules from theory of probability
The probability of the joint occurrence of two independent events is $P_{i,j} = P_i \cdot P_j$. Example: half of the members of a population are male ($P_{male} = 0.5$) and a tenth of all individuals are albino ($P_{albino} = 0.1$). If albinos are equally common in both sexes (i.e. albinism and sex are independent events), then the probability that a randomly chosen individual is an albino male is $P_{male} \cdot P_{albino} = 0.5 \cdot 0.1 = 0.05$.

7 Basic rules from theory of probability
The expected number of successes $E(a)$ in $n$ experiments, where the probability of a success is $P_a$, is $E(a) = P_a \cdot n$. Example: if the probability that a mutation occurs is 0.02, then in 100 randomly chosen individuals we expect 2 individuals with this mutation.
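The two probability rules can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the mutation probability of 0.02 is inferred from the slide's own figures (2 expected carriers among 100 individuals).

```python
def joint_probability(p_a, p_b):
    """Probability that two independent events occur together."""
    return p_a * p_b


def expected_count(p, n):
    """Expected number of successes in n trials with success probability p."""
    return p * n


# Albino male: P_male = 0.5, P_albino = 0.1 (values from the slide).
p_albino_male = joint_probability(0.5, 0.1)

# Mutation example: assumed probability 0.02, 100 individuals.
expected_mutants = expected_count(0.02, 100)

print(p_albino_male, expected_mutants)
```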

8 How do we compute $\chi^2$? How do we obtain expected values?
$H_0$ says that the events are independent, so we use the probability of the joint occurrence of two independent events.

9 Calculation of expected values
Using the marginal sums: $P_{i\cdot} = R_i / n$, $P_{\cdot j} = C_j / n$, $P_{ij} = P_{i\cdot} \cdot P_{\cdot j}$, so $E(f_{ij}) = P_{ij} \cdot n = (R_i / n) \cdot (C_j / n) \cdot n = R_i \cdot C_j / n$.
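The calculation of expected frequencies from row and column sums can be sketched as follows. The table values are made up for illustration, the function names are hypothetical, and the statistic is assumed to be the usual Pearson form, the sum of (observed - expected)² / expected over all cells.

```python
def expected_frequencies(table):
    """Expected counts under independence: E_ij = R_i * C_j / n."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    return [[r * c / n for c in col_sums] for r in row_sums]


def pearson_chi2(table):
    """Pearson chi-square: sum over cells of (O - E)^2 / E."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical 2 x 2 table of observed frequencies.
observed = [[10, 20], [30, 40]]
expected = expected_frequencies(observed)
chi2 = pearson_chi2(observed)
print(expected, chi2)
```

Each expected table keeps the row and column sums of the observed table; only the distribution of counts inside the table changes.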

10 What do I need to know to determine the result of the complete experiment (given fixed marginal frequencies)?
$df = (c - 1) \cdot (r - 1)$, where $r$ is the number of rows and $c$ is the number of columns.
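One way to see why the degrees of freedom equal (c - 1)(r - 1): once the marginal sums are fixed, that many cells determine the whole table. The sketch below uses made-up numbers and hypothetical helper names.

```python
def degrees_of_freedom(n_rows, n_cols):
    """df = (c - 1) * (r - 1) for an r x c contingency table."""
    return (n_cols - 1) * (n_rows - 1)


def complete_table(free_cells, row_sums, col_sums):
    """Rebuild a full r x c table from its top-left (r-1) x (c-1) block.

    The last column follows from the row sums, and the last row
    follows from the column sums.
    """
    rows = [
        list(row) + [row_sum - sum(row)]
        for row, row_sum in zip(free_cells, row_sums)
    ]
    last_row = [
        col_sum - sum(col) for col_sum, col in zip(col_sums, zip(*rows))
    ]
    return rows + [last_row]


# Hypothetical 2 x 3 table with row sums 30, 70 and column sums 25, 35, 40.
df = degrees_of_freedom(2, 3)
full = complete_table([[5, 10]], [30, 70], [25, 35, 40])
print(df, full)
```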

11 Critical value at the 5% level of significance for df = 3.

12 What we usually write in our paper
This area is 0.029, so we write $\chi^2 = 8.99$, df = 3, P = 0.029.
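The reported area can be reproduced without a statistics package, because the upper tail of the chi-square distribution has a closed form when df = 3. The sketch below relies on that special case only; for other degrees of freedom a library routine would be needed. The function name is hypothetical, and 7.815 is the standard tabulated 5% critical value for df = 3.

```python
from math import erfc, exp, pi, sqrt


def chi2_sf_df3(x):
    """Upper-tail area P(X >= x) for a chi-square variable with 3 df.

    Closed form valid for df = 3 only:
    erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)
    """
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)


p_value = chi2_sf_df3(8.99)         # area to the right of the observed value
p_at_critical = chi2_sf_df3(7.815)  # area to the right of the 5% critical value
print(p_value, p_at_critical)
```

Since 8.99 exceeds the critical value, the area to its right is smaller than 0.05 and the null hypothesis of independence is rejected at the 5% level.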

13 Even here, Yates' correction is sometimes used (when expected frequencies are extremely low)
It gives better protection against Type I error, but a weaker test.
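A minimal sketch of the correction, assuming its usual form: 0.5 is subtracted from each absolute difference |O - E| before squaring. The table is made up and the function names are hypothetical.

```python
def expected_frequencies(table):
    """Expected counts under independence: E_ij = R_i * C_j / n."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    return [[r * c / n for c in col_sums] for r in row_sums]


def chi2_statistic(table, yates=False):
    """Pearson chi-square, optionally with Yates' continuity correction."""
    total = 0.0
    for obs_row, exp_row in zip(table, expected_frequencies(table)):
        for o, e in zip(obs_row, exp_row):
            diff = abs(o - e)
            if yates:
                # Shrink each deviation by 0.5, but never below zero.
                diff = max(0.0, diff - 0.5)
            total += diff ** 2 / e
    return total


# Hypothetical 2 x 2 table with low counts.
small_table = [[3, 1], [1, 3]]
chi2_plain = chi2_statistic(small_table)
chi2_yates = chi2_statistic(small_table, yates=True)
print(chi2_plain, chi2_yates)
```

The corrected statistic is always the smaller one, which is why the test is more conservative and less powerful.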

14 Another test criterion, also with a $\chi^2$ distribution
The so-called likelihood ratio (LR) $\chi^2$.
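The likelihood ratio statistic is commonly written as G = 2 · Σ O · ln(O / E). The sketch below assumes that form and compares it with the Pearson statistic on a made-up table; the function names are hypothetical.

```python
from math import log


def expected_frequencies(table):
    """Expected counts under independence: E_ij = R_i * C_j / n."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    return [[r * c / n for c in col_sums] for r in row_sums]


def likelihood_ratio_chi2(table):
    """G = 2 * sum(O * ln(O / E)); empty cells contribute zero."""
    total = 0.0
    for obs_row, exp_row in zip(table, expected_frequencies(table)):
        for o, e in zip(obs_row, exp_row):
            if o > 0:
                total += o * log(o / e)
    return 2 * total


def pearson_chi2(table):
    """Pearson chi-square: sum over cells of (O - E)^2 / E."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected_frequencies(table))
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical 2 x 2 table of observed frequencies.
observed = [[10, 20], [30, 40]]
g_stat = likelihood_ratio_chi2(observed)
x2_stat = pearson_chi2(observed)
print(g_stat, x2_stat)
```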

15 Similar results “Normal” $\chi^2 = 8.99$

16 2 by 2 tables
Note that for a table that matches the null hypothesis exactly, ad = bc holds.

17 Statistical and causal dependence
Causal dependence can be proved only by a manipulative experiment. For a “correct” experiment everyone has to be vaccinated, but half of them get just a placebo (compare what is possible with what statistics demands).

18 Fundamental rules for the experimenter
Every treatment has to have its control. The control differs from the treatment only in the effect that I want to prove (this is often very difficult). I have to have independent replications.

19 Advantages of experiment and observational study
Causality can be proved by an experiment. The range of experimental manipulations is usually limited. Almost every experimental manipulation has side effects, which are sometimes unpredictable.

20 Fisher’s exact test
How large is the probability that I get this table, or one that differs even more, given the marginal frequencies (provided that the null hypothesis is true; computed using combinatorics)? It is used for 2 x 2 tables when the numbers of observations are low.

21 If I have a table, then Fisher’s test directly computes the probability of this table and of all tables that are (from the point of view of $H_0$) more extreme. The sum of all these probabilities is the attained level of significance for a one-tailed test (that is why statistical software also prints 2*p).
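The one-tailed computation can be sketched with the hypergeometric probability of a 2 x 2 table with fixed margins, C(a+b, a) · C(c+d, c) / C(n, a+c). The cell labels a, b (first row) and c, d (second row), the example counts, and the function names are assumptions made for illustration.

```python
from math import comb


def table_probability(a, b, c, d):
    """Probability of one 2 x 2 table given its marginal sums (under H0)."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)


def fisher_one_tailed(a, b, c, d):
    """Sum the probabilities of the observed table and all more extreme ones.

    "More extreme" means moving further in the direction of the observed
    association while keeping every marginal sum fixed.
    """
    step = 1 if a * d >= b * c else -1
    p = 0.0
    while min(a, b, c, d) >= 0:
        p += table_probability(a, b, c, d)
        # Shift one observation along the diagonal; margins stay unchanged.
        a, d = a + step, d + step
        b, c = b - step, c - step
    return p


# Hypothetical table: rows (3, 1) and (1, 3).
p_one_tailed = fisher_one_tailed(3, 1, 1, 3)
print(p_one_tailed, 2 * p_one_tailed)
```

For this table only two arrangements are at least as extreme as the observed one, with probabilities 16/70 and 1/70.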

22 Let us compare two tables:
$\chi^2$ and the power of the test grow with the number of observations; here both tables are, with high probability, samples from the same population.
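The slide's two tables are not preserved in this transcript, so the sketch below uses two made-up tables with identical proportions, the second having ten times as many observations. For a 2 x 2 table (df = 1) the upper-tail area has the closed form erfc(sqrt(x / 2)); the function names are hypothetical.

```python
from math import erfc, sqrt


def pearson_chi2(table):
    """Pearson chi-square with expected counts E_ij = R_i * C_j / n."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    return sum(
        (o - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(table, row_sums)
        for o, c in zip(row, col_sums)
    )


def chi2_sf_df1(x):
    """Upper-tail area for a chi-square variable with 1 df."""
    return erfc(sqrt(x / 2))


small = [[3, 1], [1, 3]]      # 8 observations
large = [[30, 10], [10, 30]]  # same proportions, 80 observations

chi2_small, chi2_large = pearson_chi2(small), pearson_chi2(large)
p_small, p_large = chi2_sf_df1(chi2_small), chi2_sf_df1(chi2_large)
print(chi2_small, p_small)
print(chi2_large, p_large)
```

With proportions held constant, the statistic scales linearly with the number of observations, so the same pattern is non-significant in the small table and highly significant in the large one.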

23 Measures of association strength in a 2 x 2 table, independent of sample size
$Y = ad/bc = f_{11} f_{22} / (f_{21} f_{12})$. Disadvantage: it is asymmetric, ranging from 0 for negative association through 1 for independence to $+\infty$ for positive association. Two further measures range from $-1$ through 0 for independence to $+1$: for the first, $-1$ and $+1$ are the maximal possible association for the given values of the marginal frequencies; for the second, $-1$ and $+1$ are the maximal possible association for any values of the marginal frequencies.
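A short sketch of the ratio ad/bc on made-up tables, showing that it does not change when every count is multiplied by the same factor and that it equals 1 for a table in which ad = bc. The function name is hypothetical.

```python
def odds_ratio(table):
    """Return ad / bc for a 2 x 2 table [[a, b], [c, d]].

    Undefined (division by zero) when b or c is zero.
    """
    (a, b), (c, d) = table
    return (a * d) / (b * c)


observed = [[10, 20], [30, 40]]
scaled = [[100, 200], [300, 400]]    # same proportions, ten times the sample
independent = [[12, 18], [28, 42]]   # here ad = bc = 504

y_observed = odds_ratio(observed)
y_scaled = odds_ratio(scaled)
y_independent = odds_ratio(independent)
print(y_observed, y_scaled, y_independent)
```

A value below 1, as in the first table, indicates negative association; the asymmetry of the scale is visible in that all negative associations are squeezed between 0 and 1.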

24 Multidimensional frequency tables
[Table: Species A (present / absent) by Species B (present / absent) by Years]
Nowadays generalized linear models are used in these cases.

