Experimental design and analysis: Graphical Exploration of Data
Gerry Quinn & Mick Keough, 1998
Do not copy or distribute without permission of authors.
Graphical displays
- Exploration
  - assumptions (normality, equal variances)
  - unusual values
  - which analysis?
- Analysis
  - model fitting
- Presentation/communication of results
Space shuttle data
- NASA meeting, January 27, 1986: the day before the launch of the shuttle Challenger.
- Concern that low air temperatures at launch would affect the O-rings that seal the joints of the rocket motors.
- Data from previous launches were studied.
[Figure: O-ring failure vs temperature, pre-1986 data; x-axis: joint temperature (°F); y-axis: number of incidents]
[Figure: O-ring failure vs temperature; x-axis: joint temperature (°F); y-axis: number of incidents]
Checking assumptions: exploratory data analysis (EDA)
- Shape of the sample (and, by inference, of the population): is the distribution normal (symmetrical) or skewed?
- Spread of the sample: are variances similar in different groups?
- Outliers: are any observations very different from the rest of the sample?
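The graphical checks above can be paired with a few numerical summaries. The sketch below is an illustration, not part of the original notes: it assumes Python's standard `statistics` module, uses the moment coefficient of skewness (one of several definitions in use), and the function name `shape_summary` is hypothetical.

```python
import statistics


def shape_summary(values):
    """Centre, spread and skewness of one sample.

    Skewness is the moment coefficient g1 = m3 / m2**1.5 (an assumed
    choice; other skewness definitions exist)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return {
        "mean": mean,
        "median": statistics.median(values),
        "variance": statistics.variance(values),  # sample variance, n - 1 divisor
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
    }


# Hypothetical values, for illustration only
print(shape_summary([1, 2, 3, 4, 5]))   # symmetrical: skewness 0
print(shape_summary([1, 1, 1, 1, 10]))  # right-skewed: mean above median
```

Running it on each group separately and comparing the `variance` entries gives a rough check of the equal-variance assumption; a mean well above the median hints at right skew. Plots remain the primary check.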
Distributions of biological data
- Bell-shaped, symmetrical distribution: normal.
- Skewed, asymmetrical distributions: log-normal, Poisson.
[Figure: Pr(y) plotted against y for symmetrical and skewed distributions]
Common skewed distributions
- Log-normal distribution: standard deviation proportional to the mean; typical of measurement data, e.g. length, weight.
- Poisson distribution: $\mu = \sigma^2$ (mean equals variance); typical of count data, e.g. numbers of individuals.
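For a log-normal variable the standard deviation is proportional to the mean. A quick numerical check, using the standard closed-form moments (mean $e^{\mu+\sigma^2/2}$, SD equal to the mean times $\sqrt{e^{\sigma^2}-1}$), where mu and sigma are the mean and SD of log(Y). The function name and parameter values are hypothetical choices for illustration.

```python
import math


def lognormal_mean_sd(mu, sigma):
    """Mean and standard deviation of Y when log(Y) ~ Normal(mu, sigma)."""
    mean = math.exp(mu + sigma ** 2 / 2)
    sd = mean * math.sqrt(math.exp(sigma ** 2) - 1)
    return mean, sd


# Same sigma, increasing mu: mean and sd both grow, but their ratio stays fixed
for mu in (0.0, 1.0, 2.0):
    mean, sd = lognormal_mean_sd(mu, 0.5)
    print(f"mu={mu}: mean={mean:.3f}, sd={sd:.3f}, sd/mean={sd / mean:.3f}")
```

Because sd/mean does not depend on mu, groups with larger means also have proportionally larger spreads; this is the pattern a log transformation removes.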
Exploring sample data
Example data set: Quinn & Keough (in press)
- Surveys of 8 rocky shores along the Point Nepean coast.
- 10 sampling times.
- 15 quadrats (0.25 m²) at each site.
- Numbers of all gastropod species and % cover of macroalgae recorded from each quadrat.
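Assuming every quadrat was sampled at every site and time, the design implies the following totals (simple arithmetic on the numbers above):

```latex
\begin{aligned}
\text{quadrats per survey} &= 8 \text{ shores} \times 15 \text{ quadrats} = 120\\
\text{quadrats over all surveys} &= 120 \times 10 = 1200\\
\text{area sampled per shore per survey} &= 15 \times 0.25\ \text{m}^2 = 3.75\ \text{m}^2
\end{aligned}
```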
Frequency distributions
- Observations grouped into classes.
[Figure: example frequency distributions, normal and log-normal; x-axis: value of variable (class); y-axis: number of observations]
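Grouping observations into classes can be sketched as follows. This assumes equal-width classes of the form [lower, lower + width); the function name `frequency_table` and the example values are hypothetical.

```python
from collections import Counter


def frequency_table(values, width, start=0):
    """Count observations in classes [start + k*width, start + (k+1)*width).

    Returns {class lower bound: count} for every class from the lowest to
    the highest occupied one, so empty classes appear with count 0."""
    counts = Counter((v - start) // width for v in values)
    lo, hi = int(min(counts)), int(max(counts))
    return {start + k * width: counts.get(k, 0) for k in range(lo, hi + 1)}


# Hypothetical counts per quadrat, for illustration only
table = frequency_table([0, 1, 1, 2, 3, 3, 3, 7], width=2)
for lower, n in table.items():
    print(f"[{lower}, {lower + 2})  {'#' * n}")  # one '#' per observation
```

The class width is a choice: too narrow and the shape is noisy, too wide and it is hidden, so it is worth trying more than one.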
[Figure: frequency distribution of number of Cellana per quadrat; y-axis: frequency; survey 5, all shores combined; total no. quadrats = 120]
Dotplots
- Each observation represented by a dot.
[Figure: dotplot of number of Cellana per quadrat, Cheviot Beach, survey 5; no. quadrats = 15]
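A text-only dotplot makes the "one dot per observation" idea concrete. This sketch assumes integer data (e.g. counts per quadrat) and draws the dots horizontally rather than stacked vertically; the function name and values are hypothetical.

```python
from collections import Counter


def dotplot(values):
    """Text dotplot: one row per integer value, one 'o' per observation."""
    counts = Counter(values)
    lines = []
    for v in range(min(values), max(values) + 1):  # include empty values
        lines.append((f"{v:>3} | " + "o" * counts[v]).rstrip())
    return "\n".join(lines)


# Hypothetical counts per quadrat, for illustration only
print(dotplot([2, 2, 3, 5]))
```

With small samples such as 15 quadrats, a dotplot shows every observation, which a frequency distribution with wide classes would hide.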
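Boxplot
- Median: the line within the box.
- Hinges: the two ends of the box; the distance between them is the spread (hinge spread).
- Roughly 25% of values lie below the lower hinge, 25% between the lower hinge and the median, 25% between the median and the upper hinge, and 25% above the upper hinge.
- Whiskers extend to the largest and smallest values that are not outliers.
- Outliers (*) are plotted individually.
[Figure: annotated boxplot of VARIABLE for each GROUP]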
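The numbers behind a boxplot can be computed directly. This is a minimal sketch assuming Tukey-style hinges (medians of the lower and upper halves) and the common convention of flagging values beyond 1.5 hinge spreads from a hinge as outliers; the notes do not specify the rule, and the function name is hypothetical.

```python
import statistics


def boxplot_stats(values, k=1.5):
    """Tukey-style boxplot summary.

    Hinges are the medians of the lower and upper halves of the sorted data
    (the median is included in both halves when n is odd). Values beyond
    hinge -/+ k * spread are outliers; k = 1.5 is an assumed convention."""
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2  # size of each half, including the median if n is odd
    lower_hinge = statistics.median(xs[:half])
    upper_hinge = statistics.median(xs[n - half:])
    spread = upper_hinge - lower_hinge  # hinge spread
    lo_fence = lower_hinge - k * spread
    hi_fence = upper_hinge + k * spread
    inside = [x for x in xs if lo_fence <= x <= hi_fence]
    return {
        "median": statistics.median(xs),
        "hinges": (lower_hinge, upper_hinge),
        "spread": spread,
        "whiskers": (min(inside), max(inside)),  # smallest/largest non-outliers
        "outliers": [x for x in xs if x < lo_fence or x > hi_fence],
    }


# Hypothetical sample with one extreme value
print(boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 30]))
```

Statistical packages differ in how they compute hinges or quartiles, so small differences from other software are expected.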
[Figure: boxplots illustrating four cases: 1. ideal; 2. skewed; 3. outliers (*); 4. unequal variances]
[Figure: boxplots of Cellana numbers in survey 5, one per site; x-axis: site; y-axis: number of Cellana per quadrat]
Scatterplots
- Plotting bivariate data: values of two variables recorded for each observation.
- Each variable plotted on one axis (x or y).
- Each observation represented by a symbol.
- Used to assess the relationship between the two variables.
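A scatterplot is often accompanied by a numerical summary of the relationship. The sketch below computes the Pearson correlation coefficient, a swapped-in companion that the notes do not mention; it only measures linear association, so it complements the plot rather than replacing it. The function name is hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between paired observations x[i], y[i]."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # cross-products
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired values, for illustration only
print(pearson_r([1, 2, 3], [2, 4, 6]))  # perfect positive linear relationship
```

A curved or outlier-driven pattern can give a misleading r, which is one reason to look at the scatterplot first.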
[Figure: scatterplot of number of Cellana per quadrat and % cover of Hormosira per quadrat; Cheviot Beach, survey 5; n = 15 quadrats]
Scatterplot matrix
- Abbreviated to SPLOM.
- Extension of the scatterplot for plotting relationships among three or more variables on one plot.
- Bivariate plots for each pair of variables arranged in multiple panels.
[Figure: SPLOM for Cheviot Beach, survey 5; n = 15 quadrats]
- CELLANA: numbers of Cellana
- SIPHALL: numbers of Siphonaria
- HORMOS: % cover of Hormosira
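The layout of a SPLOM follows directly from the list of variables: k variables give a k × k grid whose off-diagonal cells hold the bivariate plots. The sketch assumes the diagonal is reserved for labels or a univariate plot, as is common; the function name is hypothetical, and the variable names are those of the Cheviot Beach example.

```python
def splom_panels(names):
    """Off-diagonal cells of a scatterplot matrix, row by row, as (y, x) pairs.

    Diagonal cells (a variable against itself) are left out; they usually
    hold labels or a univariate plot instead of a scatterplot."""
    return [(row, col) for row in names for col in names if row != col]


panels = splom_panels(["CELLANA", "SIPHALL", "HORMOS"])
print(len(panels))  # 6
print(panels)
```

Each pair of variables appears twice, mirrored across the diagonal, so a SPLOM for k variables shows k(k - 1) / 2 distinct relationships.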
Transformations
- Improve normality.
- Remove the relationship between mean and variance.
- Make variances more similar in different populations.
- Reduce the influence of outliers.
- Make relationships between variables more linear (regression analysis).
Log transformation: $y' = \log(y)$
- Lognormal → normal.
- Used for measurement data.
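A small demonstration of why the log transformation makes variances more similar when the spread grows with the mean. The two groups are hypothetical, constructed so that one is exactly ten times the other; base 10 is an arbitrary choice, since any base has the same effect.

```python
import math
import statistics


def log_transform(values):
    """y' = log10(y). Requires every y > 0."""
    if any(v <= 0 for v in values):
        raise ValueError("log transformation needs strictly positive values")
    return [math.log10(v) for v in values]


# Hypothetical groups: each value in `large` is 10x the matching value in `small`,
# so the standard deviation is proportional to the mean
small = [1.0, 2.0, 4.0]
large = [10.0, 20.0, 40.0]
print(statistics.stdev(small), statistics.stdev(large))  # raw: sd differs 10-fold
print(statistics.stdev(log_transform(small)),
      statistics.stdev(log_transform(large)))  # log scale: equal up to rounding
```

Multiplying by 10 adds a constant (1) on the log10 scale, which shifts the values without changing their spread.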
Power transformation: $y' = \sqrt{y}$, i.e. $y' = y^{0.5}$; or $y' = y^{0.25}$
- Poisson → normal.
- Used for count data.
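For Poisson counts the variance equals the mean, so groups with different means have different variances. The sketch below computes the variance of y and of $\sqrt{y}$ directly from the Poisson probabilities (truncated at `kmax`, an assumed cut-off far in the tail) to show that the square root brings the variance close to 0.25 whatever the mean. Function names and the chosen means are hypothetical.

```python
import math


def poisson_pmf(lam, kmax):
    """Poisson probabilities P(Y = k) for k = 0..kmax, built recursively."""
    p = math.exp(-lam)
    probs = [p]
    for k in range(1, kmax + 1):
        p *= lam / k  # P(k) = P(k - 1) * lam / k
        probs.append(p)
    return probs


def variance_of(transform, lam, kmax=400):
    """Variance of transform(Y) for Y ~ Poisson(lam), up to truncation at kmax."""
    probs = poisson_pmf(lam, kmax)
    mean = sum(p * transform(k) for k, p in enumerate(probs))
    return sum(p * (transform(k) - mean) ** 2 for k, p in enumerate(probs))


for lam in (10, 40):
    raw = variance_of(lambda y: y, lam)
    root = variance_of(math.sqrt, lam)
    print(f"mean {lam}: var(y) = {raw:.2f}, var(sqrt(y)) = {root:.3f}")
```

The raw variance quadruples from the first mean to the second, while the variance after the square root stays near 0.25. The stabilisation is weaker for very small means.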
Arcsine transformation: $y' = \sin^{-1}(\sqrt{y})$
- Square → normal.
- Used for proportions and percentages (percentages are first divided by 100 to give proportions).
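The arcsine (angular) transformation in code. This sketch returns radians; some texts report the result in degrees instead, which only rescales the values. The function name and the `percent` flag are hypothetical.

```python
import math


def arcsine_sqrt(p, percent=False):
    """y' = arcsin(sqrt(p)) in radians.

    Percentages are divided by 100 first, because the transformation is
    defined for proportions between 0 and 1."""
    if percent:
        p = p / 100
    if not 0 <= p <= 1:
        raise ValueError("proportion must lie between 0 and 1")
    return math.asin(math.sqrt(p))


# e.g. % cover of macroalgae in three hypothetical quadrats
print([arcsine_sqrt(c, percent=True) for c in (0, 50, 100)])
```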
Outliers
- Observations very different from the rest of the sample; identified in boxplots.
- Check whether they are mistakes (e.g. typos, broken measuring device); if so, omit them.
- Extreme values from a skewed distribution: transform.
- Alternatively, do the analysis twice, once with outliers included and once with them excluded.
- Worry if the outliers are influential, i.e. if the two analyses lead to different conclusions.
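"Do the analysis twice" can be as simple as computing the same statistic with and without the suspect values. A minimal sketch, assuming the outliers have already been identified (for example from a boxplot); the function name and data are hypothetical, and judging how large a change counts as influential is left to the analyst.

```python
import statistics


def with_and_without(values, outliers, stat=statistics.fmean):
    """Apply `stat` to all values, then to the values left after removing
    every occurrence of the listed outlier values."""
    drop = set(outliers)
    kept = [v for v in values if v not in drop]
    return stat(values), stat(kept)


data = [1, 2, 3, 4, 5, 6, 7, 8, 30]  # hypothetical sample; 30 is the outlier
print(with_and_without(data, [30]))                     # means: about 7.33 vs 4.5
print(with_and_without(data, [30], statistics.median))  # medians: 5 vs 4.5
```

The mean shifts a lot when the outlier is dropped while the median changes less, which is why an outlier can be influential for one analysis and not another.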
Assumptions not met?
- Check for and deal with outliers.
- Transformation: might fix non-normality and unequal variances.
- Nonparametric rank test, e.g. Mann-Whitney-Wilcoxon:
  - does not assume normality
  - does assume similar variances
  - only suitable for simple analyses
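A sketch of the statistic behind the Mann-Whitney-Wilcoxon test: pool both samples, rank them (tied values share their average rank), and convert the rank sum of one sample into U. It stops at U; turning U into a P-value needs tables or a normal approximation, which are not shown. Function names and data are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """U statistics for samples a and b; U_a + U_b = len(a) * len(b)."""
    ranks = average_ranks(list(a) + list(b))
    r_a = sum(ranks[: len(a)])  # rank sum of sample a
    u_a = r_a - len(a) * (len(a) + 1) / 2
    return u_a, len(a) * len(b) - u_a


print(mann_whitney_u([1, 2, 3], [4, 5, 6]))  # complete separation: (0.0, 9.0)
```

Because only ranks are used, the size of an extreme value does not matter, which is why the test does not need normality; it still compares distributions that are assumed to have similar spread.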
Category or line plot
[Figure, two panels: mean number of Cellana per quadrat (y-axis) against survey (x-axis) for Cheviot Beach and Sorrento]
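The values plotted in a category or line plot are group means. This sketch computes the mean per (site, survey) combination and orders each site's means by survey, ready to be joined by lines. The record layout, function names and numbers are hypothetical and are not the survey data.

```python
from collections import defaultdict


def means_by_group(records):
    """Mean response for each (site, survey) pair.

    records: iterable of (site, survey, y) tuples, one per quadrat."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for site, survey, y in records:
        sums[(site, survey)] += y
        counts[(site, survey)] += 1
    return {key: sums[key] / counts[key] for key in sums}


def line_series(means, site):
    """(survey, mean) points for one site, sorted by survey."""
    return sorted((survey, m) for (s, survey), m in means.items() if s == site)


# Hypothetical quadrat counts, for illustration only
records = [("Cheviot Beach", 1, 2), ("Cheviot Beach", 1, 4), ("Cheviot Beach", 2, 6),
           ("Sorrento", 1, 3), ("Sorrento", 2, 1), ("Sorrento", 2, 5)]
means = means_by_group(records)
print(line_series(means, "Cheviot Beach"))
print(line_series(means, "Sorrento"))
```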