Designing powerful sampling programs Dr. Dustin Marshall
Review: confounding, replication, randomisation.
Causation or correlation?
Differences between treatments: does this address the hypothesis? (Diagram: farm-origin and forest-origin individuals are each moved to a new place; the farm-origin group is now at the forest, the forest-origin group is now at the farm.)
One treatment could react differently to the movement than the other: that is confounding.
Differences between treatments: what is the unit of replication? (Diagram as above, with an added group that is moved to a new place but kept in the same environment, i.e. a transplant control.)
Summary of how to do a transplant experiment: you must have transplant controls (a manipulation that causes the same disturbance but doesn't change the environment); you must replicate; and you should be able to draw any transplant experiment together with its controls.
The real world: funds and time are limited, and you can't sample everything everywhere. You could sample at multiple scales, but how do you allocate your resources?
Different sources of variation: "real" variability versus error, where error arises from (1) imperfect accuracy and (2) imprecision.
Accuracy vs precision
So, three sources of variation. Minimise the artefactual sources of variation (minimise inaccuracy); get an accurate estimate of the 'real' variation; and maximise precision, but not at the expense of accuracy.
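One standard way to make the accuracy versus precision distinction concrete (not spelled out on the slide, but consistent with it) is the mean-squared-error decomposition of an estimate: the total expected error of an estimator splits into a bias term (inaccuracy) and a variance term (imprecision).

```latex
\mathrm{MSE}(\hat{\theta})
  = \mathbb{E}\!\left[(\hat{\theta} - \theta)^2\right]
  = \underbrace{\bigl(\mathbb{E}[\hat{\theta}] - \theta\bigr)^2}_{\text{bias}^2\ (\text{inaccuracy})}
  \;+\; \underbrace{\mathrm{Var}(\hat{\theta})}_{\text{imprecision}}
```

Shrinking the variance term is pointless if the bias term stays large, which is exactly the warning "maximise precision but not at the expense of accuracy".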
The basics of a powerful experimental design: Ratio = Var(among treatments) / Var(within treatments). The bigger this ratio, and the larger the sample size n, the more likely the result is to be significant.
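As a minimal sketch of that ratio, here is a hypothetical two-treatment fish-length example (the group names, means, and sample sizes are invented for illustration). The among-treatment and within-treatment variances computed below are exactly what a one-way ANOVA compares, so the ratio is the familiar F statistic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical data: fish lengths (mm) under two treatments.
k = 2                                    # number of treatments
n = 20                                   # replicates per treatment
farm = rng.normal(loc=120, scale=15, size=n)
forest = rng.normal(loc=135, scale=15, size=n)

grand_mean = np.mean(np.concatenate([farm, forest]))

# Variation among treatment means (scaled by n, as in ANOVA).
var_among = n * sum((g.mean() - grand_mean) ** 2 for g in (farm, forest)) / (k - 1)

# Variation within treatments (pooled across groups).
var_within = (farm.var(ddof=1) + forest.var(ddof=1)) / k

ratio = var_among / var_within           # the F-ratio
print(f"Var among / Var within = {ratio:.2f}")

# Same ratio straight from scipy's one-way ANOVA.
f, p = stats.f_oneway(farm, forest)
print(f"F = {f:.2f}, p = {p:.4f}")
```

The larger the real difference between treatments (Var among) and the more replicates per treatment, the larger this ratio tends to be relative to the noise, and the smaller the p-value.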
Design I: measure fish length to the nearest micron; catch fish by chasing them; take 400 samples. High precision, but not accurate, because catching fish by chasing them biases the sample.
Design II: measure fish to the nearest micron; sample from the population randomly; take 10 samples. High measurement precision, but an inaccurate estimate, because only a few samples are taken from a highly variable population.
Design III: measure fish to the nearest metre; sample from the population randomly; take 100 samples. Very low precision, and therefore a poor ability to distinguish variation.
Design IV: catch fish randomly; measure to the nearest mm; measure 500 fish. Low precision, but high accuracy.
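The four designs can be caricatured in a quick simulation. Everything below is invented for illustration (the "true" fish population, the chasing-bias model, and the measurement resolutions), but it shows how bias drives inaccuracy while sample size and measurement grain drive the spread of repeated estimates.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical "true" population of fish lengths (mm).
population = rng.normal(loc=1500, scale=300, size=100_000)
true_mean = population.mean()

def estimate_mean(n, biased=False, resolution=1.0):
    """Estimate mean length from n fish, measured to the nearest `resolution` mm.

    biased=True mimics catching fish by chasing them: the slower (here,
    smaller) fish are over-represented in the catch.
    """
    if biased:
        catch = rng.choice(population, size=n * 3, replace=False)
        catch = np.sort(catch)[:n]          # keep the easiest-to-catch third
    else:
        catch = rng.choice(population, size=n, replace=False)
    measured = np.round(catch / resolution) * resolution
    return measured.mean()

designs = {
    "I   (chased, micron, n=400)": dict(n=400, biased=True,  resolution=0.001),
    "II  (random, micron, n=10) ": dict(n=10,  biased=False, resolution=0.001),
    "III (random, metre,  n=100)": dict(n=100, biased=False, resolution=1000),
    "IV  (random, mm,     n=500)": dict(n=500, biased=False, resolution=1),
}

# Repeat each design many times: the mean error of the estimates reflects
# (in)accuracy, their spread reflects the (im)precision of the whole design.
for name, kw in designs.items():
    estimates = np.array([estimate_mean(**kw) for _ in range(1000)])
    bias = estimates.mean() - true_mean
    spread = estimates.std()
    print(f"Design {name}: bias = {bias:7.1f} mm, spread = {spread:6.1f} mm")
```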
The ideal, powerful design: high precision (careful measurement, fancy equipment); high accuracy (unbiased sampling, replication, precision); high power (minimise unexplained variation; more about this later).
Scales of sampling: subsampling can increase precision; replication increases accuracy. Is there ever a reason to sample at higher scales?
Higher levels of sampling: sampling at higher spatial or temporal scales increases our confidence in the generality of our findings and increases the size of your sampling universe.
Continuum of sampling scales (diagram): as the scale of replication moves from subsample to replicate to high-level unit, the emphasis shifts from precision to accuracy to generality.
A trade-off: accuracy versus precision versus generality.
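To see why subsamples and true replicates sit at different points on this continuum, consider the standard variance of a grand mean under a balanced two-stage design (the symbols a, for the number of replicate units, and m, for the number of subsamples per unit, are introduced here for illustration):

```latex
\mathrm{Var}(\bar{y}) \;=\; \frac{\sigma^2_{\text{among units}}}{a} \;+\; \frac{\sigma^2_{\text{within units}}}{a\,m}
```

Adding subsamples (increasing m) only shrinks the second term, so it buys precision; only adding replicate units (increasing a) shrinks both terms and provides the degrees of freedom needed for accurate inference.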
Example: does pollution reduce the quality of the offspring that corals produce on the GBR? Polluted reefs vs unpolluted reefs; corals occur in patches; each coral can spawn 1,000,000 eggs.
What/where to sample? The hierarchy is site, reef, patch, coral, egg. What is the unit of replication?
Where/what to sample: site, reef, patch, coral, egg. Putting most of the effort at the bottom of the hierarchy (eggs) will maximise n, but with no replication at the higher levels, Var within will go up, which is bad; putting effort at the top of the hierarchy (sites) will increase generality.
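A hedged simulation of why the unit of replication matters here (all variance components are invented, and pollution is given no true effect): if eggs are treated as independent replicates, reef-to-reef differences masquerade as a treatment effect, whereas analysing reef means gives an honest test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Hypothetical nested design: reefs within a treatment, corals within reefs,
# eggs within corals. Pollution has NO true effect, but reefs differ a lot.
n_reefs, n_corals, n_eggs = 4, 5, 50

def simulate_treatment():
    """Return egg 'quality' scores with shape (reefs, corals, eggs)."""
    data = np.empty((n_reefs, n_corals, n_eggs))
    for i in range(n_reefs):
        reef = rng.normal(0, 2.0)                  # reef-to-reef variation
        for j in range(n_corals):
            coral = rng.normal(0, 0.5)             # coral-to-coral variation
            data[i, j] = 10 + reef + coral + rng.normal(0, 0.3, n_eggs)
    return data

polluted, unpolluted = simulate_treatment(), simulate_treatment()

# Wrong: every egg treated as an independent replicate (pseudoreplication).
# The p-value is typically wildly overconfident even with no real effect.
_, p_eggs = stats.ttest_ind(polluted.ravel(), unpolluted.ravel())

# Better: the reef is the unit of replication, so compare reef means.
_, p_reefs = stats.ttest_ind(polluted.mean(axis=(1, 2)),
                             unpolluted.mean(axis=(1, 2)))

print(f"eggs as replicates : n = {polluted.size}, p = {p_eggs:.3g}")
print(f"reefs as replicates: n = {n_reefs}, p = {p_reefs:.3g}")
```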
You must replicate at the scales you want to make inferences about! Not just in relation to treatment effects, but also in relation to how far you would like to extrapolate (within reason).
When should you increase your subsamples? When you are estimating only a portion of an entity and that estimate could be unreliable
Time – the fourth dimension: whenever you are measuring change, you must replicate over both space and time. Deciding how to allocate effort to each can be tricky; it depends on what you're interested in.
So we've learnt how to design our sampling program to maximise accuracy, but does that automatically translate into a powerful design? Not necessarily.
Recall the basics of a powerful experimental design: Ratio = Var(among) / Var(within), and a big ratio is what makes a result significant. Getting accurate estimates of both variance components is helpful, as is maximising n.
Summary: precision, accuracy, generality. Blocking – using random factors to partition variation and reduce unexplained variation. Power analyses – used to estimate the chance of missing a real effect (a type II error) and hence how much replication is needed.
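A minimal sketch of what a power analysis can look like in practice, using statsmodels' two-sample t-test power calculator (the effect size, alpha, and power targets below are conventional illustrative defaults, not values from the lecture):

```python
# pip install statsmodels
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# How many replicates per group are needed to detect a medium effect
# (Cohen's d = 0.5) with alpha = 0.05 and 80% power?
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"replicates needed per group: {n_per_group:.1f}")

# Conversely, how much power would a planned design of n = 20 per group have?
power = analysis.solve_power(effect_size=0.5, alpha=0.05, nobs1=20)
print(f"power with n = 20 per group: {power:.2f}")
```

Doing this before sampling tells you whether the planned level of replication has a reasonable chance of detecting the effect you care about, which is the practical safeguard against type II errors.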