Analysis of variance (ANOVA).


1 Analysis of variance (ANOVA).
ANOVA compares two or more normally distributed random variables. Their variances are assumed to be equal, so ANOVA in effect compares their (theoretical) means. To compare two means we can use either the t-test or ANOVA; the results (probability levels) are the same.

We want to compare the weights of the clumps for 4 potato varieties:
1st variety: 0.9, 0.8, 0.6, 0.9
2nd variety: 1.3, 1, 1.3, 1.2
3rd variety: 1.3, 1.5, 1.6, 1.1
4th variety: 1.1, 1.2, 1.1, 1
Are there differences among the means of the varieties?

The first calculations. The sample mean of the weights for each variety:
$\bar{X}_i = \frac{1}{n} \sum_{j=1}^{n} X_{ij}, \quad i = 1, \dots, a$ (a = 4 varieties, n = 4 measurements),
$\bar{X}_1 = 0.8,\ \bar{X}_2 = 1.2,\ \bar{X}_3 = 1.375,\ \bar{X}_4 = 1.1$.
The grand mean: $\bar{X} = \frac{1}{a} \sum_{i=1}^{a} \bar{X}_i$.
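The first calculations can be reproduced with a few lines of Python. This is a minimal sketch using only the data from the slide; the names `weights`, `variety_means` and `grand_mean` are illustrative and not part of the presentation.

```python
# Clump weights for the four potato varieties, as listed on the slide.
weights = {
    1: [0.9, 0.8, 0.6, 0.9],
    2: [1.3, 1.0, 1.3, 1.2],
    3: [1.3, 1.5, 1.6, 1.1],
    4: [1.1, 1.2, 1.1, 1.0],
}


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Sample mean of each variety (each column of the design).
variety_means = {variety: mean(xs) for variety, xs in weights.items()}

# With equal group sizes the grand mean is the mean of the variety means.
grand_mean = mean(list(variety_means.values()))

for variety, m in variety_means.items():
    print(f"variety {variety}: mean = {m:.3f}")
print(f"grand mean = {grand_mean:.5f}")
```

Averaging the variety means gives the grand mean only because every variety has the same number of measurements; with unequal group sizes all observations would have to be pooled.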

2 The model of analysis of variance:
$X_{ij} = \mu + \alpha_i + \text{error}_{ij}$, where $\text{error}_{ij}$ is distributed as $N(0, \sigma^2)$.
$\bar{X}$ is an estimate of the common value $\mu$, and we calculate $X_{ij} - \bar{X} = \alpha_i + \text{error}_{ij}$. Averaged over the measurements in column i, the errors cancel out on average, so $\overline{\alpha_i + \text{error}_{ij}} = \alpha_i + \overline{\text{error}_{ij}} \approx \alpha_i$.
ANOVA is able to detect the shift of the columns by $\alpha_i$, in the positive or negative direction, from the common value $\mu$.

Tests in ANOVA:
H0: $\mu_1 = \mu_2 = \mu_3 = \mu_4$ (that is, $\mu + \alpha_1 = \mu + \alpha_2 = \mu + \alpha_3 = \mu + \alpha_4$)
H1: at least one equality in H0 does not hold.
Equivalently,
H0: $\alpha_1 = \alpha_2 = \alpha_3 = \alpha_4$
H1: at least one equality in H0 does not hold.

3 Assumptions of ANOVA:
normal distribution – is not tested
homogeneity of variances – the basic assumption – is tested

Two estimates of the variance $\sigma^2$:
Variability between columns (derived from the standard error, S.E., of the column means):
$S_1^2 = n \cdot \mathrm{S.E.}^2 = \frac{n}{a-1} \sum_{i=1}^{a} (\bar{X}_i - \bar{X})^2, \quad S_1^2 \approx \sigma^2_{\text{between}}$.
Variability within the dataset:
$S_2^2 = \frac{1}{a(n-1)} \sum_{i=1}^{a} \sum_{j=1}^{n} (X_{ij} - \bar{X}_i)^2, \quad S_2^2 \approx \sigma^2_{\text{within}}$.
It is possible to derive that
$\frac{\sigma^2_{\text{between}}}{\sigma^2_{\text{within}}} = \frac{\sigma^2 + \frac{n}{a-1} \sum_{i=1}^{a} \alpha_i^2}{\sigma^2} \equiv \frac{\sigma^2 + \varphi^2}{\sigma^2}$.
If H0 is valid, then $\alpha_i = 0$ for all i, and then $\frac{\sigma^2 + \varphi^2}{\sigma^2} = \frac{\sigma^2}{\sigma^2} = 1$. This is why the method is called analysis of variance (ANOVA).
The estimate of $\sigma^2_{\text{between}}$ is $S_1^2$, and the estimate of $\sigma^2_{\text{within}}$ is $S_2^2$. $S_1^2$ and $S_2^2$ are random variables, and under H0 $\frac{S_1^2}{S_2^2} \sim F(a-1,\ a(n-1))$.
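As an illustration, the two variance estimates and their ratio can be computed for the potato data. The sketch below assumes the usual degrees-of-freedom divisors, a - 1 for the between-column estimate and a(n - 1) for the within-column estimate, which are the ones that match the F(a - 1, a(n - 1)) distribution quoted on the slide. The function name `one_way_f` is hypothetical, and no P value is computed because the standard library has no F distribution.

```python
# Clump weights of the four potato varieties from the first slide.
groups = [
    [0.9, 0.8, 0.6, 0.9],
    [1.3, 1.0, 1.3, 1.2],
    [1.3, 1.5, 1.6, 1.1],
    [1.1, 1.2, 1.1, 1.0],
]


def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way layout."""
    a = len(groups)                                   # number of columns
    n_total = sum(len(g) for g in groups)             # total observations
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Sum of squares between columns and within columns.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)

    df_between = a - 1
    df_within = n_total - a        # equals a * (n - 1) for equal group sizes
    s1_sq = ss_between / df_between   # estimate of sigma^2 between
    s2_sq = ss_within / df_within     # estimate of sigma^2 within
    return s1_sq / s2_sq, df_between, df_within


f_stat, df1, df2 = one_way_f(groups)
print(f"F({df1}, {df2}) = {f_stat:.3f}")
```

An F value far above 1 indicates that the columns vary more than the within-column noise would explain, which is what H0 rules out.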

4 Note. If the columns contain different numbers of measurements (in our example $n_1 = n_2 = n_3 = n_4 = 4$), the degrees of freedom are $(a-1,\ \sum_{i} n_i - a)$.
If a = 2, we can use the t-test or ANOVA. The probability levels are the same, although the test statistics (t, F) are different ($F = t^2$).
For a > 2 a series of pairwise t-tests should not be used ⇒ the probability of a type I error (α) increases with the number of tests.
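The growth of the type I error can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that the pairwise tests are independent (in practice they share data, so the figure is only a rough guide); the function name `familywise_error` is hypothetical.

```python
from math import comb


def familywise_error(a, alpha=0.05):
    """Number of pairwise tests among a columns and the chance of at least
    one false rejection, assuming independent tests at level alpha."""
    m = comb(a, 2)                       # number of pairwise comparisons
    return m, 1 - (1 - alpha) ** m


for a in (2, 3, 4, 6):
    m, fwer = familywise_error(a)
    print(f"a = {a}: {m} t-tests, P(at least one type I error) = {fwer:.3f}")
```

For the four potato varieties there are six pairwise comparisons, and under this assumption the overall error probability rises from 0.05 to about 0.26.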

5 In our example P < 0.05, therefore we reject H0: at least one of the equalities in H0 does not hold.
[Figure: variety means with 95% confidence intervals]
Varieties 1 and 2, and varieties 1 and 3, may differ.

6 Post-hoc tests:
H0: mean of column i = mean of column j, i ≠ j
H1: the means of columns i and j are not equal.
A post-hoc test answers the question of WHERE the difference detected by ANOVA lies. We can see differences between varieties 1 and 2 (averages 0.8 and 1.2, P < 0.05) and between varieties 1 and 3 (averages 0.8 and 1.375, P < 0.05).
Our example had one factor, "variety" ⇒ single-factor ANOVA.

7 Multifactor analysis of variance.
A. Factorial design.
Example: Rats were fed fresh or rancid fat for 73 days. We are interested in how the consumption of fat of different quality depends on sex.

            fresh   rancid    sum
males         709      592
              679      638
              699      476
  subtotal   2087     1706   3793
females       657      508
              594      505
              677      539
  subtotal   1928     1552   3480
total        4015     3258

Judging by the sums, males eat more than females and fresh fat is more attractive than rancid fat.
We have two criteria:
sex: 2 normally distributed random variables (males, females)
fat: 2 normally distributed random variables (rancid, fresh).
We assume homogeneity of all variances.
We will test the difference between the two means for sex and the difference between the two means for fat.

8 Therefore we have 2 null hypotheses:
H01: there is no difference between the means of males and females.
H02: there is no difference between the mean consumption of fresh and rancid fat.
Model: $X_{ij} = \mu + \text{sex}_i + \text{fat}_j + \text{interaction}_{ij} + \text{error}_{ij}$
[Figure: interaction plot of consumption for fresh and rancid fat, with one line for females and one for males]
The lines are approximately parallel, so there is no relation between the factors. The factors are independent ⇒ we can detect the shift of the factor levels.
Hence a third null hypothesis:
H03: there are no significant interactions.
Here the interactions are not significant.
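The three F statistics of this model can be sketched for the rat data from the previous slide. The code assumes a balanced design (the same number of replicates in every cell) and the standard sums-of-squares decomposition for a two-factor layout with interaction; the function name `two_way_anova` and the dictionary layout are illustrative choices, not taken from the presentation. P values are omitted because the standard library has no F distribution.

```python
# cells[sex][fat] -> replicate measurements of fat consumption (slide 7).
cells = {
    "males":   {"fresh": [709, 679, 699], "rancid": [592, 638, 476]},
    "females": {"fresh": [657, 594, 677], "rancid": [508, 505, 539]},
}


def two_way_anova(cells):
    """F statistics for a balanced two-factor design with interaction."""
    rows = list(cells)                          # levels of the first factor
    cols = list(next(iter(cells.values())))     # levels of the second factor
    a, b = len(rows), len(cols)
    r = len(cells[rows[0]][cols[0]])            # replicates per cell

    values = [x for i in rows for j in cols for x in cells[i][j]]
    grand = sum(values) / len(values)
    cell_mean = {(i, j): sum(cells[i][j]) / r for i in rows for j in cols}
    row_mean = {i: sum(cell_mean[i, j] for j in cols) / b for i in rows}
    col_mean = {j: sum(cell_mean[i, j] for i in rows) / a for j in cols}

    ss_row = b * r * sum((row_mean[i] - grand) ** 2 for i in rows)
    ss_col = a * r * sum((col_mean[j] - grand) ** 2 for j in cols)
    ss_inter = r * sum(
        (cell_mean[i, j] - row_mean[i] - col_mean[j] + grand) ** 2
        for i in rows for j in cols
    )
    ss_error = sum(
        (x - cell_mean[i, j]) ** 2
        for i in rows for j in cols for x in cells[i][j]
    )

    df_error = a * b * (r - 1)
    ms_error = ss_error / df_error
    return {
        "F_rows": (ss_row / (a - 1)) / ms_error,
        "F_cols": (ss_col / (b - 1)) / ms_error,
        "F_interaction": (ss_inter / ((a - 1) * (b - 1))) / ms_error,
        "df_error": df_error,
    }


result = two_way_anova(cells)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Here `F_rows` corresponds to H01 (sex), `F_cols` to H02 (fat) and `F_interaction` to H03; each has 1 numerator degree of freedom and shares the error degrees of freedom.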

9 H01 is not rejected ⇒ there is no significant difference between males and females.
H02 is rejected ⇒ there is a difference in the consumption of fresh and rancid fat.
H03 is not rejected ⇒ there are no significant interactions.
Nonsignificant interactions ⇒ the model is OK and ANOVA can be used.
Significant interactions ⇒ something else (another factor) may influence the measurements; this must be justified.
Post-hoc test, for fat only: fresh fat is consumed more than rancid fat.

10 Modification.

            fresh   rancid    sum
males         592      709
              538      679
              476      699
  subtotal   1606     2087   3693
females       657      508
              594      505
              677      539
  subtotal   1928     1552   3480
total        3534     3639

[Figure: interaction plot of consumption for fresh and rancid fat, with one line for females and one for males]
The lines are not parallel, so there is some relation between the factors. The factors are dependent ⇒ we cannot detect the shift of the factor levels, and ANOVA cannot be used.
H03 is rejected ⇒ the interactions are significant.
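Whether the lines of the interaction plot are parallel can be checked numerically from the cell means: if the effect of fat (fresh minus rancid) is the same for both sexes, the difference of the two effects is close to zero. The sketch below compares the original table with the modified one; `interaction_contrast` is a hypothetical helper, and it only describes the pattern in the means, it is not a significance test.

```python
original = {
    "males":   {"fresh": [709, 679, 699], "rancid": [592, 638, 476]},
    "females": {"fresh": [657, 594, 677], "rancid": [508, 505, 539]},
}
modified = {
    "males":   {"fresh": [592, 538, 476], "rancid": [709, 679, 699]},
    "females": {"fresh": [657, 594, 677], "rancid": [508, 505, 539]},
}


def mean(values):
    return sum(values) / len(values)


def interaction_contrast(table):
    """(fresh - rancid for males) minus (fresh - rancid for females)."""
    male_effect = mean(table["males"]["fresh"]) - mean(table["males"]["rancid"])
    female_effect = (mean(table["females"]["fresh"])
                     - mean(table["females"]["rancid"]))
    return male_effect - female_effect


print(f"original table: {interaction_contrast(original):.2f}")
print(f"modified table: {interaction_contrast(modified):.2f}")
```

In the original table the contrast is small compared with the fat effect itself (nearly parallel lines); in the modified table the fat effect changes sign for males, so the contrast is large.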

11 B. Hierarchical ANOVA (nested design).
We have 2 containers: fresh fat is placed in the first one and rancid fat in the second. Six randomly chosen rats are placed in the first container and six randomly chosen rats in the second.

1st container: 6 rats
2nd container: 6 rats

2 factors:
rat - random factor
fat - fixed factor
We are at the level of "individuals", not at the level of "sex": we are talking about differences among individuals.
The factor "rat" is nested in the factor "fat".
This arrangement can be viewed as a one-way ANOVA in which the rats are taken as replicates.

12 ANOVA – random blocks.
Example: The water temperature was measured at depths of 0, 2 and 5 metres. Are there differences among the temperatures at different depths?

1st place   2nd place   3rd place   ...   10th place
0 m         0 m         0 m         ...   0 m
2 m         2 m         2 m         ...   2 m
5 m         5 m         5 m         ...   5 m
(each place is one random block)

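13 Each depth at a given place is measured only once.
The temperature at 10 places:

block   0 m    2 m    5 m
1       20.8   18.5   11.1
2       21.3   20     12
3       19.9   18     9.5
4       10.2 (other values missing in the transcript)
5       20.1   19     10.1
6       19.8, 9.9 (one value missing in the transcript)
7       19.5, 8.5 (one value missing in the transcript)
8       19.3, 8.8 (one value missing in the transcript)
9       (values missing in the transcript)
10      (values missing in the transcript)

This is actually a 1-factor ANOVA with the factor "depth".
1-factor ANOVA: F(2, 27) = …, P ≈ 0
2-factor ANOVA: F(2, 18) = …, P ≈ 0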
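The difference between the two analyses lies in the error term: the block analysis removes the place-to-place variation from the error, which changes the error degrees of freedom from a(b - 1) to (a - 1)(b - 1), where a is the number of depths and b the number of places. The sketch below uses only the four rows that survive complete in the transcript (blocks 1, 2, 3 and 5), so its F values are not those of the slide; the function name `block_anova` is hypothetical.

```python
# Complete rows only (blocks 1, 2, 3 and 5); columns are 0 m, 2 m, 5 m.
rows = [
    [20.8, 18.5, 11.1],
    [21.3, 20.0, 12.0],
    [19.9, 18.0, 9.5],
    [20.1, 19.0, 10.1],
]


def block_anova(rows):
    """rows[k] holds one measurement per treatment level for block k."""
    b = len(rows)          # number of blocks (places)
    a = len(rows[0])       # number of treatment levels (depths)
    grand = sum(sum(r) for r in rows) / (a * b)
    treat_means = [sum(r[i] for r in rows) / b for i in range(a)]
    block_means = [sum(r) / a for r in rows]

    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_treat = b * sum((m - grand) ** 2 for m in treat_means)
    ss_block = a * sum((m - grand) ** 2 for m in block_means)
    ss_error = ss_total - ss_treat - ss_block

    ms_treat = ss_treat / (a - 1)
    # One-factor view: block variation stays inside the error term.
    f_one = ms_treat / ((ss_total - ss_treat) / (a * (b - 1)))
    # Block view: block variation is removed from the error term.
    f_block = ms_treat / (ss_error / ((a - 1) * (b - 1)))
    return {
        "one_factor": (f_one, a - 1, a * (b - 1)),
        "blocks": (f_block, a - 1, (a - 1) * (b - 1)),
    }


for name, (f_stat, df1, df2) in block_anova(rows).items():
    print(f"{name}: F({df1}, {df2}) = {f_stat:.1f}")
```

With a = 3 depths and b = 10 places the same formulas give the degrees of freedom (2, 27) and (2, 18) quoted on the slide.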

14 Violation of assumptions
Homogeneity of variances. If this assumption does not hold, something else has entered the model: some factor that was not included in the design of the experiment. Violating homogeneity of variances leads to "false" conclusions, in the sense of wrongly rejecting or not rejecting the null hypotheses. In practice it means a new experiment.
Additivity of the model. A violation of additivity means significant interactions. The cause can be non-homogeneous experimental conditions, or the omission of some substantial factor of the experiment. The consequence is a low sensitivity of the method. In practice the interactions are often significant. If the significance of the interactions is high (P ≈ 0), a new experiment is necessary.

15 Normality. As with the t-tests, we can transform the data towards normality, or we can use a nonparametric equivalent of ANOVA. The sensitivity of nonparametric tests is lower, so a transformation to a normal distribution is often preferred.
The nonparametric equivalent of one-factor ANOVA is the Kruskal-Wallis test; the nonparametric equivalent of two-factor ANOVA (block design) is the Friedman test.
The design of an experiment should be as simple as possible. The number of factors should not be greater than two, as possible significant interactions are difficult to explain.
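To show what the nonparametric alternative looks like, the Kruskal-Wallis statistic H can be computed from ranks alone. The sketch below applies the standard formula with the usual tie correction to the potato data from the first slide; the function name `kruskal_wallis_h` is hypothetical. Under H0, H is compared with a chi-square distribution with a - 1 degrees of freedom, which is not evaluated here.

```python
# Clump weights of the four potato varieties from the first slide.
groups = [
    [0.9, 0.8, 0.6, 0.9],
    [1.3, 1.0, 1.3, 1.2],
    [1.3, 1.5, 1.6, 1.1],
    [1.1, 1.2, 1.1, 1.0],
]


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with tie correction."""
    pooled = sorted(x for g in groups for x in g)
    n_total = len(pooled)

    # Assign each distinct value the average of the ranks it occupies.
    rank_of = {}
    tie_term = 0
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        t = j - i                              # size of this tie group
        rank_of[pooled[i]] = (i + 1 + j) / 2   # mean of ranks i+1 .. j
        tie_term += t ** 3 - t
        i = j

    rank_sums_sq = sum(
        sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups
    )
    h = 12 / (n_total * (n_total + 1)) * rank_sums_sq - 3 * (n_total + 1)
    correction = 1 - tie_term / (n_total ** 3 - n_total)
    return h / correction


print(f"H = {kruskal_wallis_h(groups):.3f}, df = {len(groups) - 1}")
```

Because only the ranks enter the statistic, the result does not depend on the normality assumption, at the price of the lower sensitivity mentioned above.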

