Accuracy and power of randomization tests in multivariate analysis of variance with vegetation data Valério De Patta Pillar Departamento de Ecologia Universidade.


Accuracy and power of randomization tests in multivariate analysis of variance with vegetation data Valério De Patta Pillar Departamento de Ecologia Universidade Federal do Rio Grande do Sul Porto Alegre, Brazil

Randomization testing:
– Became practical with fast microcomputers.
– Applicable to most cases analyzed by classical methods.
– Applicable to cases not covered by classical methods.

How good is randomization testing? Is it accurate? Is it powerful enough?

Group comparison by randomization testing
– Choose a test criterion (λ) to compare the groups.
– Permute the data according to the conditions stated by the null hypothesis (Ho) that the groups do not differ.
– Calculate the test criterion in the random data (λ°) and compare it to the value found in the observed data (λ).
– After many iterations, the probability P(λ° ≥ λ) is the number of iterations with λ° ≥ λ divided by the total number of iterations.
– Reject Ho if P(λ° ≥ λ) is smaller than a threshold (α).
Manly, B. F. J. Randomization, Bootstrap and Monte Carlo Methods in Biology. 2nd ed. Chapman and Hall.
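The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, NumPy is assumed to be available, and the criterion used here (between-groups sum of squares on Euclidean distances) is just one possible choice of test criterion.

```python
import numpy as np

def between_groups_ss(X, groups):
    """Illustrative criterion: total minus within-group sum of squares."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    total = ((X - X.mean(axis=0)) ** 2).sum()
    within = sum(
        ((X[groups == g] - X[groups == g].mean(axis=0)) ** 2).sum()
        for g in np.unique(groups)
    )
    return total - within

def randomization_test(X, groups, criterion=between_groups_ss,
                       n_iter=1000, seed=0):
    """Return the observed criterion and the proportion of random
    permutations whose criterion is at least as large."""
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    observed = criterion(X, groups)
    hits = 0
    for _ in range(n_iter):
        permuted = rng.permutation(groups)  # shuffle group membership under Ho
        # small tolerance guards against floating-point ties
        if criterion(X, permuted) >= observed - 1e-12:
            hits += 1
    return observed, hits / n_iter
```

Ho would then be rejected when the returned probability is smaller than the chosen threshold.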

Randomization test criteria for multivariate comparisons of any number of groups
Pillar, V. D. & Orlóci, L. J. Veg. Sci. 7:

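An example
Is there a significant effect of N on vegetation composition as defined by these two PFTs?
[Observed squared distance matrix; numeric entries not recoverable.]
– Sum of squares within groups (Qw): for each group, the sum of squared distances between its units divided by the group size, summed over groups.
– Total sum of squares (Qt): the sum of all pairwise squared distances divided by the number of units (10).
– Sum of squares between groups (Qb) = Qt − Qw.
How common is a Qb at least this large if Ho were true (that the composition is unrelated to group)?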
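As a hedged illustration of how these sums of squares follow from a squared distance matrix, the sketch below divides the sum of pairwise squared distances by the number of units, both overall and within each group. The function name and the toy data are hypothetical and are not the values from the slide.

```python
import numpy as np

def sums_of_squares(D2, groups):
    """Return (Qt, Qw, Qb) from a symmetric matrix of squared distances."""
    D2 = np.asarray(D2, dtype=float)
    groups = np.asarray(groups)
    n = D2.shape[0]
    # total: all pairs h < i, divided by the number of units
    qt = D2[np.triu_indices(n, k=1)].sum() / n
    qw = 0.0
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)
        sub = D2[np.ix_(idx, idx)]
        # within-group pairs, divided by the group size
        qw += np.triu(sub, k=1).sum() / idx.size
    return float(qt), float(qw), float(qt - qw)

# Toy data (assumed for illustration): four units on one variable, two groups
x = np.array([0.0, 2.0, 10.0, 12.0])
D2 = (x[:, None] - x[None, :]) ** 2
qt, qw, qb = sums_of_squares(D2, [1, 1, 2, 2])  # Qt = 104, Qw = 4, Qb = 100
```

With Euclidean distances these quantities equal the usual sums of squared deviations from centroids, which is why Qb can serve as a test criterion without computing centroids explicitly.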

Reference set under Ho
If Ho is true, the observation vector in a given sampling unit is independent of the group to which the unit belongs.

A random permutation and corresponding statistics
[Figure panels: observed and permuted data; permuted squared distance matrix. Numeric entries not recoverable.]
The sum of squares within groups (Qw°) and the sum of squares between groups (Qb° = Qt − Qw°) are recomputed on the permuted data; the total sum of squares (Qt) does not change under permutation. Since Qb° < Qb, this iteration adds zero to the frequency of cases in which Qb° ≥ Qb.

After random permutations…

Two-factor designs
Test criterion: Qb = Qt − Qw is based on the groups defined by the joint states of the factors.
Qb is partitioned as Qb = Qb|A + Qb|B + Qb|AB, where
– Qb|A: sum of squares between the l_a groups according to factor A, disregarding factor B;
– Qb|B: sum of squares between the l_b groups according to factor B, disregarding factor A;
– Qb|AB: sum of squares of the interaction AB, obtained by difference.
F-ratio = Qb/Qw
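The partition can be sketched as follows, assuming a squared distance matrix and one label vector per factor. The helper names are hypothetical; the interaction term is obtained by difference, as stated above.

```python
import numpy as np

def q_total(D2):
    """Total sum of squares from a squared distance matrix."""
    n = D2.shape[0]
    return D2[np.triu_indices(n, k=1)].sum() / n

def q_between(D2, labels):
    """Between-groups sum of squares, Qb = Qt - Qw, for one grouping."""
    labels = np.asarray(labels)
    qw = 0.0
    for g in np.unique(labels):
        idx = np.flatnonzero(labels == g)
        qw += np.triu(D2[np.ix_(idx, idx)], k=1).sum() / idx.size
    return q_total(D2) - qw

def two_factor_partition(D2, a, b):
    """Partition Qb (joint groups) into Qb|A, Qb|B and Qb|AB."""
    D2 = np.asarray(D2, dtype=float)
    joint = np.array([f"{x}|{y}" for x, y in zip(a, b)])  # joint factor states
    qb = q_between(D2, joint)
    qb_a = q_between(D2, a)   # factor A, disregarding B
    qb_b = q_between(D2, b)   # factor B, disregarding A
    return {
        "Qt": q_total(D2),
        "Qw": q_total(D2) - qb,
        "Qb": qb,
        "Qb|A": qb_a,
        "Qb|B": qb_b,
        "Qb|AB": qb - qb_a - qb_b,  # interaction, by difference
    }
```

For purely additive data in a balanced design, the interaction term returned by this sketch is zero up to rounding.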

Unrestricted permutation in two-factor design

Two-factor multivariate analysis of variance
[Figure panels: observed data; one random permutation.]
Data: Species (57) composition in 8 vegetation units surveyed in two landscape positions (factor A) and two grazing levels (factor B).

After random permutations…
Data: Species (57) composition in 8 vegetation units surveyed in two landscape positions (factor A) and two grazing levels (factor B). Unrestricted random permutations. Test criterion: F-ratio = Qb/Qw.

Restricted permutations
In two-factor (not nested) designs, for testing one factor, permutations may be restricted to occur within the levels of the other factor (Edgington 1987).
[Figure: restricted permutation within the levels of factor A, for testing factor B.]
Edgington, E. S. 1987. Randomization Tests. Marcel Dekker, New York.
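A restricted permutation can be sketched as an index vector that shuffles units only among units sharing the same level of the other factor. The function name is hypothetical and NumPy is assumed.

```python
import numpy as np

def restricted_permutation(block, rng):
    """Return row indices shuffled only within each level of `block`."""
    block = np.asarray(block)
    perm = np.arange(block.size)
    for level in np.unique(block):
        idx = np.flatnonzero(block == level)
        perm[idx] = rng.permutation(idx)  # shuffle inside this level only
    return perm

# Usage: to test factor B, permute the data rows within the levels of factor A
rng = np.random.default_rng(0)
factor_a = np.array([0, 0, 0, 0, 1, 1, 1, 1])
rows = restricted_permutation(factor_a, rng)
# X_permuted = X[rows]  # X would be the units-by-species data matrix
```

Every unit keeps its level of factor A, so any effect of A is held fixed while the association between the data and factor B is randomized.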

Permutations of residuals instead of raw data

Two-factor multivariate analysis of variance by randomization testing for the effects of landscape position and grazing level in natural grassland, southern Brazil (data from Pillar 1986). The data set contains 16 pooled community stands described by 60 species. Restricted random permutations were used for testing the factors landscape position and grazing level. Residuals obtained after removing both factors were permuted for testing the interaction.
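One way to obtain residuals with both main factors removed is an additive least-squares fit per species, sketched below. This is an assumption made for illustration: the slides do not state how the residuals were computed, and the function name is hypothetical.

```python
import numpy as np

def main_effect_residuals(X, a, b):
    """Residuals of each column of X after fitting additive effects of a and b."""
    X = np.asarray(X, dtype=float)

    def dummies(labels):
        labels = np.asarray(labels)
        levels = np.unique(labels)
        # one indicator column per level, dropping the first as reference
        return (labels[:, None] == levels[None, 1:]).astype(float)

    design = np.hstack([np.ones((X.shape[0], 1)), dummies(a), dummies(b)])
    coef, *_ = np.linalg.lstsq(design, X, rcond=None)
    return X - design @ coef

# The residual rows, rather than the raw data rows, would then be permuted
# without restriction when testing the interaction.
```

Permuting these residuals keeps the main effects out of the reference set, so only the interaction (plus noise) is randomized.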

How good is randomization testing in two-factor multivariate analysis of variance?

Simulation of interaction
For each case, 1000 data sets were generated with the distributional properties of real vegetation data and subjected to multivariate analysis of variance with randomization testing.
When a factor or interaction effect is set to zero, the proportion of Ho rejections at a given α threshold estimates the Type I error, the probability of wrongly rejecting Ho when it is true. If the Type I error is equal to α, the test is exact.
When a factor or interaction effect is > 0, the proportion of Ho rejections estimates the power of the test, which is one minus the Type II error, the probability of not rejecting Ho when it is false.
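The logic of estimating Type I error and power from the proportion of rejections can be sketched with a deliberately simplified setup. Here the data are univariate normal with two groups and the criterion is the absolute difference of means; these are assumptions chosen to keep the example short, not the vegetation-based simulation described in the slides. All names and default values are hypothetical.

```python
import numpy as np

def perm_p_value(x, groups, n_iter, rng):
    """Proportion of label permutations with a mean difference >= observed."""
    observed = abs(x[groups == 0].mean() - x[groups == 1].mean())
    hits = 0
    for _ in range(n_iter):
        g = rng.permutation(groups)
        if abs(x[g == 0].mean() - x[g == 1].mean()) >= observed - 1e-12:
            hits += 1
    return hits / n_iter

def rejection_rate(effect, n_sets=200, n_per_group=8, n_iter=199,
                   alpha=0.05, seed=0):
    """Proportion of simulated data sets in which Ho is rejected.

    effect == 0 estimates the Type I error; effect > 0 estimates power.
    """
    rng = np.random.default_rng(seed)
    groups = np.repeat([0, 1], n_per_group)
    rejections = 0
    for _ in range(n_sets):
        # simulated data: standard normal noise plus a shift for group 1
        x = rng.normal(size=groups.size) + effect * groups
        if perm_p_value(x, groups, n_iter, rng) < alpha:
            rejections += 1
    return rejections / n_sets
```

Calling `rejection_rate(0.0)` should return a value near the chosen threshold, while a large `effect` should push the rate toward 1.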

Simulated data generated with distributional properties of real data
Data set: 16 grassland units described by cover of 60 species. Two factors: landscape position (top-convex, concave-lowland) and grazing level (grazed, ungrazed).
Procedure described by Peres-Neto & Olden (2000):
1. Calculate the mean (μ_ij) and the standard deviation (σ_ij) for each species vector i within each group j defined by the four factor-level combinations;
2. Standardize these vectors to mean 0 and standard deviation 1, t_hij = (x_hij − μ_ij)/σ_ij;
3. Randomly permute whole stand vectors across groups;
4. Restore the original dispersion within each group by computing new observations s_hij = t_hij σ_ij, defining in this way a data set with the conditions specified by Ho;
5. Apply to the species vectors the corresponding group differences for factor and interaction effects;
6. Perform the randomization tests using 1000 random permutations;
7. Repeat steps (3) to (6) 1000 times, recording the proportion of Ho rejection.
Peres-Neto, P.R. & Olden, J.D. Animal Behaviour 61:
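Steps 1 to 4 of the procedure can be sketched as below. This is a minimal illustration under assumptions not stated in the slides: the sample standard deviation (ddof=1) is used, every group has at least two units, and a species that is constant within a group is left at zero after standardization. The function name is hypothetical.

```python
import numpy as np

def null_data_set(X, groups, rng):
    """Generate one data set satisfying Ho (steps 1 to 4)."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    T = np.empty_like(X)
    sd = {}
    for g in np.unique(groups):
        rows = groups == g
        mu = X[rows].mean(axis=0)            # step 1: mean per species and group
        s = X[rows].std(axis=0, ddof=1)      # step 1: standard deviation
        s_safe = np.where(s > 0, s, 1.0)     # avoid dividing by zero
        T[rows] = (X[rows] - mu) / s_safe    # step 2: standardize
        sd[g] = s
    T = T[rng.permutation(X.shape[0])]       # step 3: permute whole stand vectors
    S = np.empty_like(X)
    for g in np.unique(groups):
        rows = groups == g
        S[rows] = T[rows] * sd[g]            # step 4: restore group dispersion
    return S
```

Factor or interaction effects (step 5) would then be added to the returned matrix before running the randomization tests on it.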

With no factor and interaction effects, type I error is not different from 0.05, as expected by using α = 0.05.
Results of power evaluation by data simulation in two-factor MANOVA. The proportion of Ho rejection at α = 0.05 was obtained for 1000 simulated data sets generated on the basis of plant community data with 16 units and 60 species, with increasing difference between the two groups for factor 1, with no interaction. Each factor combination had an equal number of units. For each data set a randomization test was run with 1000 iterations.
As the effect of factor 1 increases, type I error for factor 2 and for the interaction is underestimated with unrestricted permutations with Qb and F-ratio, but not with restricted permutations and residuals.

As the effects of both factors increase, type I error for the interaction is underestimated with unrestricted permutations with Qb and F-ratio, but not with residuals.

As the effect of the interaction increases, type I error for both factors is underestimated with Qb and F-ratio, with both unrestricted and restricted permutations. However, main factors should not be considered at all when an interaction is present!

As the effects of both factors increase, the power of the test for detecting the interaction decreases with permutations of raw data when using Qb and F-ratio, but not when permuting residuals.

Results of power evaluation by data simulation in two-way designs. The proportion of Ho rejection at α = 0.05 was obtained for 1000 simulated data sets generated on the basis of plant community data with 60 units and 60 species, with increasing relative difference between the four-group factor combinations. Factor combinations had unequal numbers of units (11: 31, 12: 9, 21: 9, 22: 11). For each data set a randomization test was run with 1000 iterations.

References
Anderson, M.J. & ter Braak, C. 2003. Permutation tests for multi-factorial analysis of variance. Journal of Statistical Computation and Simulation 73:
Edgington, E. S. 1987. Randomization Tests. Marcel Dekker, New York.
Manly, B. F. J. Randomization, Bootstrap and Monte Carlo Methods in Biology. 2nd ed. Chapman and Hall.
Peres-Neto, P.R. & Olden, J.D. Assessing the robustness of randomization tests: examples from behavioural studies. Animal Behaviour 61:
Pillar, V. D. & Orlóci, L. On randomization testing in vegetation science: multifactor comparisons of relevé groups. Journal of Vegetation Science 7:
Pillar, V. D. MULTIV. Software for multivariate analysis, randomization tests and bootstrapping. Available (minor version, manual included) at: