Spiders on Mazurian lake islands (Wigry – Mikołajki, Nidzkie, Bełdany): Analysis of variance. Photo: Wigierski Park Narodowy. Photo: Ruciane.net. Araneus diadematus.

Presentation transcript:

Spiders on Mazurian lake islands (Wigry – Mikołajki, Nidzkie, Bełdany): Analysis of variance. Photo: Wigierski Park Narodowy. Photo: Ruciane.net. Araneus diadematus. Salticidae. Photo: Eurospiders.com.

Spider species richness on Mazurian lake islands. Does species richness differ with respect to the degree of disturbance? If we apply the same test several times to the same data, we have to apply a Bonferroni correction. [Formulas: significance level for a single test and for n independent tests.]
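The effect of repeated testing can be sketched in a few lines. The function names and the choice of 10 tests are illustrative assumptions, not taken from the slides; the formulas are the standard ones for independent tests.

```python
def familywise_error(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive in n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_level(alpha: float, n_tests: int) -> float:
    """Test-wise level that keeps the experiment-wise error rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    n = 10  # hypothetical number of tests run on the same data
    print("uncorrected experiment-wise error:", familywise_error(0.05, n))
    print("Bonferroni test-wise level:", bonferroni_level(0.05, n))
```

Without correction, ten tests at the 0.05 level would produce at least one false positive in roughly 40% of cases.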

Spider species richness on Mazurian lake islands. One-way analysis of variance (Sir Ronald Aylmer Fisher). Variances shown: $s_H^2$, $s_M^2$, $s_L^2$, $s_P^2$, $s_T^2$. If there were no difference between the sites, the average within-group variance $s_{Within}^2$ should equal the variance between the sites $s_{Between}^2$. We test for significance using Fisher's F-test, $F = s_{Between}^2 / s_{Within}^2$, with $k-1$ (between) and $n-k$ (within) degrees of freedom. The degrees of freedom add up: $df_{Total} = df_{Within} + df_{Between}$, that is $n-1 = (n-k) + (k-1)$.
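As a minimal sketch of the variance comparison described above, the F ratio can be computed directly from the group data. The function name and the toy numbers are assumptions for illustration; they are not the island data.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(x for g in groups for x in g) / n
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


if __name__ == "__main__":
    toy = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # made-up species counts
    print(one_way_anova(toy))
```

The p-value would then come from the F distribution with the returned degrees of freedom, for example via a statistics library.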

Welch test. The Levene test compares the group variances using the F distribution. Variances should not differ too much (the data should not be heteroskedastic). The Tukey test simultaneously compares the means of all combinations of groups. It is a t-test corrected for multiple comparisons (similar to a Bonferroni correction).
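One common form of the Levene statistic is a one-way ANOVA on the absolute deviations of each value from its group mean. The sketch below assumes that mean-based variant (the median-based Brown-Forsythe variant is also widely used); the function name and data are illustrative.

```python
def levene_statistic(groups):
    """Levene's W: one-way ANOVA F ratio on absolute deviations from group means."""
    # Step 1: replace each value by its absolute deviation from its group mean
    z = [[abs(x - sum(g) / len(g)) for x in g] for g in groups]

    # Step 2: ordinary one-way ANOVA on the deviations
    k = len(z)
    n = sum(len(g) for g in z)
    grand = sum(sum(g) for g in z) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in z)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in z for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


if __name__ == "__main__":
    same_spread = [[1, 2, 3], [11, 12, 13]]
    different_spread = [[1, 2, 3], [0, 5, 10]]
    print(levene_statistic(same_spread))       # equal variances
    print(levene_statistic(different_spread))  # unequal variances
```

W is compared with the F distribution on k-1 and n-k degrees of freedom; large values indicate heteroskedasticity.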

We include the effect of island complex (Wigry – Nidzkie, Bełdany, Mikołajki). There must be at least two data points for each combination of groups. We use a simple two-way ANOVA. Main effects. Secondary effects.

The significance levels have to be divided by the number of tests (Bonferroni correction). Spider species richness does not significantly depend on island complex or degree of disturbance.

Correcting for covariates: analysis of covariance. Instead of the raw data we use the residuals. These are the area-corrected species numbers. The comparison of within-group residuals and between-group residuals gives our F-statistic.
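The residual approach can be sketched as follows: fit a regression of species number on area, take the residuals, and run the one-way ANOVA on them. All names and numbers here are made up for illustration, and the sketch does not adjust the error degrees of freedom for the fitted covariate, which a full ANCOVA would do.

```python
def ols_fit(x, y):
    """Least-squares intercept and slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def residuals(x, y):
    """Observed minus predicted y: here, area-corrected species numbers."""
    intercept, slope = ols_fit(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def group_by(values, labels):
    """Collect values into lists keyed by their group label."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    return groups


def f_ratio(groups):
    """Between-group over within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(x for g in groups for x in g) / n
    ss_b = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_w = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_b / (k - 1)) / (ss_w / (n - k))


if __name__ == "__main__":
    # Hypothetical islands: area, species number, disturbance class
    area = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    species = [5.0, 9.0, 10.0, 14.0, 15.0, 19.0]
    disturbance = ["low", "high", "low", "high", "low", "high"]
    res = residuals(area, species)
    print(f_ratio(list(group_by(res, disturbance).values())))
```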

Disturbance does not significantly influence area-corrected species richness. $SS_{total} = SS_{between} + SS_{error}$. We need four regression equations: one from all data points (total residuals) and three within groups (within-group residuals).

Repeated-measures designs. In medical research we test patients before and after medical treatment to infer the influence of the therapy. We have to divide the total variance ($SS_{total}$) into a part that contains the variance between patients ($SS_{between}$) and a part within patients ($SS_{within}$). The latter can be divided into a part that comes from the treatment ($SS_{treat}$) and the error ($SS_{error}$). [Figure: medical treatment, before and after, with $SS_{within}$ and $SS_{between}$ marked.]

Before-after analysis in environmental protection. In the case of unequal variances between groups it is safe to use the conservative ANOVA with $(n-1)$ $df_{Error}$ and only one $df_{Effect}$ in the final F-test. The usual degrees of freedom are $df_{treat} = k-1$ and $df_{Error} = (n-1)(k-1)$.
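The partition of the sums of squares in a before-after design can be written out as a short sketch. The function name, the dictionary keys, and the three-subject data set are assumptions for illustration.

```python
def repeated_measures_anova(data):
    """data[i][j] is the value for subject i under treatment (or time) j."""
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    # Variation between subjects (their means across treatments)
    ss_between = k * sum((sum(row) / k - grand) ** 2 for row in data)
    ss_within = ss_total - ss_between
    # Part of the within-subject variation explained by the treatment
    treat_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_treat = n * sum((m - grand) ** 2 for m in treat_means)
    ss_error = ss_within - ss_treat

    df_treat, df_error = k - 1, (n - 1) * (k - 1)
    f = (ss_treat / df_treat) / (ss_error / df_error)
    return {"ss_total": ss_total, "ss_between": ss_between, "ss_treat": ss_treat,
            "ss_error": ss_error, "df_treat": df_treat, "df_error": df_error, "F": f}


if __name__ == "__main__":
    # Made-up values for three sites, measured before and after
    print(repeated_measures_anova([[1, 2], [3, 5], [4, 6]]))
```

For the conservative variant mentioned above, the same F would be judged against 1 and n-1 degrees of freedom instead of k-1 and (n-1)(k-1).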

Bivariate comparisons in environmental protection. The outlier would disturb direct comparisons of species richness. Because island areas may differ between the two island complexes, we have to use the residuals. A direct t-test on raw data would be erroneous.

Permutation testing. Randomizations of the observed values give a null distribution of t-values and associated probability levels, with which we compare the observed t. This gives the probability level for our t-test. [Figure: null distribution of t with the observed P(t) and the upper 2.5% confidence limit.]
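A minimal sketch of such a randomization test is given below. It assumes a two-sided test on the unequal-variance t statistic, 2000 random permutations, and a fixed seed; these choices, the function names, and the data are illustrative and not taken from the slides.

```python
import random
from math import sqrt


def t_statistic(a, b):
    """Two-sample t statistic without assuming equal variances."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    denom = sqrt(va / len(a) + vb / len(b))
    if denom == 0:  # both groups constant
        return 0.0 if ma == mb else float("inf")
    return (ma - mb) / denom


def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Share of random relabelings whose |t| is at least the observed |t|."""
    rng = random.Random(seed)
    observed = abs(t_statistic(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of values to the two groups
        t = abs(t_statistic(pooled[:len(a)], pooled[len(a):]))
        if t >= observed:
            hits += 1
    # Count the observed arrangement itself so that p is never exactly zero
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    complex_a = [1, 2, 3, 4, 5]       # made-up residuals
    complex_b = [11, 12, 13, 14, 15]  # made-up residuals
    print(permutation_p_value(complex_a, complex_b))
```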

Bivariate comparisons using ANOVA. Both t-tests and F-tests can be used for pairwise comparisons.
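For two groups the two tests are equivalent: the one-way ANOVA F equals the square of the pooled-variance t. The sketch below checks this numerically on made-up data; the function names are illustrative.

```python
from math import sqrt


def pooled_t(a, b):
    """Classical two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    pooled_var = ss / (na + nb - 2)
    return (ma - mb) / sqrt(pooled_var * (1 / na + 1 / nb))


def f_two_groups(a, b):
    """One-way ANOVA F ratio for exactly two groups (df_between = 1)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    grand = (sum(a) + sum(b)) / (na + nb)
    ss_between = na * (ma - grand) ** 2 + nb * (mb - grand) ** 2
    ss_within = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    return ss_between / (ss_within / (na + nb - 2))


if __name__ == "__main__":
    a, b = [3, 5, 4, 6], [7, 9, 8, 10, 6]
    print(pooled_t(a, b) ** 2, f_two_groups(a, b))  # the two values agree
```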

Repeated measures. Species richness of ground-living Hymenoptera in a beech forest. Photos: Tim Murray, Simon van Noort.

Advice for using ANOVA:
• You need a specific hypothesis about your variables. In particular, designs with more than one predictor level (multifactorial designs) have to be stated clearly.
• ANOVA is a hypothesis-testing method. Pattern seeking will in many cases lead to erroneous results.
• Predictor variables should really measure different things; they should not correlate too highly with each other.
• The general assumptions of the GLM should be fulfilled. In particular, predictors should be additive. The distribution of errors should be normal.
• It is often better to use log-transformed values.
• In monofactorial designs, where only one predictor variable is tested, it is often preferable to use the non-parametric alternative to ANOVA, the Kruskal-Wallis test. This test does not rely on the GLM assumptions but is nearly as powerful as the classical ANOVA.
• Another non-parametric alternative for multifactorial designs is to use ranked dependent variables. You lose information but become less dependent on the GLM assumptions.
• ANOVA as the simplest multivariate technique is quite robust against violations of its assumptions.
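The Kruskal-Wallis test mentioned in the list works on ranks instead of raw values. A minimal sketch of the H statistic follows; it assigns mid-ranks to ties but omits the usual tie correction factor, and the function names and data are illustrative.

```python
def midranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for m in range(i, j + 1):
            ranks[order[m]] = average_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H statistic (no tie correction) from the rank sums of each group."""
    pooled = [x for g in groups for x in g]
    ranks = midranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


if __name__ == "__main__":
    print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # made-up data
```

H is compared with a chi-square distribution on k-1 degrees of freedom when group sizes are not too small.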

Starting hypotheses:
• The degree of disturbance (human impact) influences species richness.
• Species richness and abundance depend on island area and environmental factors.
• Island ensembles differ in species richness and abundance.
• Area, abundance, and species richness are non-linearly related.
• Latitude and longitude do not influence species richness.
Sorting:
• Area, abundance, and species richness are non-linearly related.
• Latitude and longitude do not influence species richness.
• Species richness and abundance depend on island area and environmental factors.
• Island ensembles differ in species richness and abundance.
• The degree of disturbance (human impact) influences species richness.
The hypotheses are not independent. Each hypothesis influences how the next one is treated.

Area, abundance, and species richness are non-linearly related. Species-area and individuals-area relationships.
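A common way to handle a non-linear species-area relationship is to assume a power function S = c·A^z and fit it as a straight line on log-log axes. The slides do not state which function was used, so the power law, the function name, and the data below are assumptions for illustration.

```python
from math import exp, log


def fit_power_law(areas, richness):
    """Fit S = c * A**z by least squares on log-transformed values; return (c, z)."""
    x = [log(a) for a in areas]
    y = [log(s) for s in richness]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    z = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    c = exp(my - z * mx)
    return c, z


if __name__ == "__main__":
    areas = [1.0, 10.0, 100.0, 1000.0]       # hypothetical island areas
    richness = [2.0 * a ** 0.25 for a in areas]  # synthetic, exactly on the curve
    print(fit_power_law(areas, richness))
```

The residuals of the log-log fit are then the area-corrected richness values used in the later analyses.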

Latitude and longitude do not influence species richness. Is species richness correlated with longitude and latitude? Does the distance between islands influence species richness? Are geographically near islands also similar in species richness irrespective of island area? R(S-Long) = 0.22 (n.s.), R(S-Lat) = 0.28 (n.s.). The absence of a significant correlation does not mean that latitude and longitude have no influence on the regression model with environmental variables. Spatial autocorrelation: the distance between study sites influences the response (dependent) variable. Spatially adjacent sites are then expected to be more similar with respect to the response variable. [Figure: map of study sites S1 to S6.]

Moran’s I as a measure of spatial autocorrelation. Moran’s I is similar to a correlation coefficient applied to all pairwise combinations of cells of a spatial matrix. It differs by weighting the covariance to account for spatial non-independence of cells with respect to distance. If cell values were randomly distributed (not spatially autocorrelated), the expected I is $-1/(n-1)$. Statistical significance is calculated from a Monte Carlo simulation. [Figure: sites S1 to S6 with all combinations of sites connected.]
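A minimal sketch of Moran's I with a Monte Carlo significance test is shown below. The inverse-distance weights, the one-sided test for positive autocorrelation, the 999 permutations, and all names and data are assumptions for illustration; the slides do not state which weighting was used.

```python
import random
from math import hypot


def inverse_distance_weights(coords):
    """Weight matrix w[i][j] = 1 / distance between sites i and j (0 on the diagonal)."""
    n = len(coords)
    return [[0.0 if i == j else
             1.0 / hypot(coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])
             for j in range(n)] for i in range(n)]


def morans_i(values, weights):
    """Moran's I: weighted cross-products of deviations, scaled by the total variation."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    w_sum = sum(weights[i][j] for i, j in pairs)
    cross = sum(weights[i][j] * dev[i] * dev[j] for i, j in pairs)
    return (n / w_sum) * (cross / sum(d * d for d in dev))


def monte_carlo_p(values, weights, n_perm=999, seed=0):
    """Share of random reshufflings with I at least as large as observed."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between value and location
        if morans_i(shuffled, weights) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]  # made-up sites
    richness = [3, 4, 6, 7, 9, 10]                             # made-up values
    w = inverse_distance_weights(coords)
    print(morans_i(richness, w), monte_carlo_p(richness, w))
```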

The number of individuals per trap is slightly spatially autocorrelated. Latitude and longitude slightly influence species richness. Even this weak effect might influence the outcome of a regression analysis.

High multicollinearity. Too many variables! Solution: a prior factor analysis to reduce the number of correlated predictor variables.

Stepwise variable elimination. Standardized coefficients (β-values) are equivalents of correlation coefficients. They should not have absolute values above 1. Such values point to too high a correlation between the predictor variables (collinearity). Collinearity disturbs any regression model and has to be eliminated prior to analysis. Highly correlated variables essentially contain the same information. Correlations of less than 0.7 can be tolerated. Hence, first check the matrix of correlation coefficients and eliminate variables that do not add information.
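The screening step, checking the correlation matrix before fitting, can be sketched as follows. The 0.7 threshold comes from the slide; the function names, variable names, and values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def collinear_pairs(predictors, threshold=0.7):
    """List (name1, name2, r) for every predictor pair with |r| >= threshold."""
    names = list(predictors)
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = pearson_r(predictors[names[i]], predictors[names[j]])
            if abs(r) >= threshold:
                flagged.append((names[i], names[j], r))
    return flagged


if __name__ == "__main__":
    # Hypothetical predictors measured on five islands
    predictors = {
        "area": [1, 2, 3, 4, 5],
        "perimeter": [2, 4, 6, 8, 10],  # carries the same information as area
        "humidity": [1, -1, 1, -1, 1],
    }
    for name1, name2, r in collinear_pairs(predictors):
        print(name1, name2, round(r, 2))
```

One variable of each flagged pair would then be dropped, or the pair replaced by a combined factor, before the regression is fitted.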

The final model after stepwise variable elimination. The values shown are simple test-wise probability levels; we still have to correct for multiple testing. Bonferroni correction: to get an experiment-wise error rate of 0.05, our test-wise error rates have to be less than 0.05/n. The best model is not always the one with the lowest AIC or the highest $R^2$. Species richness is positively correlated with island area and negatively correlated with soil humidity.

Island ensembles differ in species richness and abundance. Analysis of covariance (ANCOVA): species richness depends on environmental factors that may differ between island ensembles. A simple ANOVA does not detect any difference.

Analysis of covariance (ANCOVA). ANCOVA is the combination of multiple regression and analysis of variance. First we perform a regression analysis and use the residuals of the full model as entries in the ANOVA; ANCOVA is the ANOVA on regression residuals. The metrically scaled variables serve as covariates. Sites with very high positive residuals are particularly species rich even after controlling for environmental factors. These are ecological hot spots, and regression analysis serves to identify them.

ANCOVA: species richness does not differ between island ensembles.

The degree of disturbance (human impact) influences species richness. Species richness of spiders on lake islands appears to be independent of the degree of disturbance.

How does abundance depend on environmental factors? The full model and stepwise variable elimination. Most coefficients are highly significant, but standardized coefficients are above 1. This points to too high collinearity. We further eliminate uninformative variables. Abundance does not significantly depend on environmental variables.

How does abundance depend on the degree of disturbance? Abundance of spiders on lake islands appears to be independent of the degree of disturbance.
