Fixed sites and monitoring/bioassessment bias in ephemeral systems
Wayne Robinson, Charles Sturt University
Context
- Aggregating assessments to large or small spatial scales
- More than one sample in time
- For example, assessing a local asset, a region or a national program
- The Living Murray condition monitoring program
Review of typical approaches: fixed sites
- Average differences between sites
- Average trend of sites
- Bad for bias and true status
- Great for trend? Only if the initial bias is constant through time
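The caveat on trend can be written out explicitly. As a sketch, using notation that is not in the original slides: let \mu_t be the true population mean in year t, \bar{y}_t the mean of the fixed sites in that year, and b_t the bias of the fixed panel in that year.

```latex
\bar{y}_t = \mu_t + b_t
\quad\Longrightarrow\quad
\bar{y}_t - \bar{y}_0 = (\mu_t - \mu_0) + (b_t - b_0)
```

The estimated change equals the true change only when b_t = b_0, i.e. when the bias of the initial site selection persists unchanged through time.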
Review of typical approaches: random sites
- Differences between population averages
- Trend of population averages
- Not as good for trend for 8 to 15 years
- Less bias, thus great for status
Review of typical approaches: rotating panel
- Trend + population averages
- Trade-off between fixed and rotating
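A rotating panel can be pictured as a visiting schedule. The sketch below is a simplified variant, not a design taken from the talk: a small fixed panel is revisited every year (for trend) and the rest of the sample is redrawn at random each year (for status). The function name `rotating_panel`, the panel sizes and the 40-site frame are all assumptions for illustration.

```python
import random

def rotating_panel(site_ids, n_fixed, n_rotating, n_years, seed=0):
    """Return the fixed panel and a {year_index: sites visited} schedule."""
    rng = random.Random(seed)
    sites = list(site_ids)
    fixed = rng.sample(sites, n_fixed)            # revisited every year
    pool = [s for s in sites if s not in fixed]   # candidates for the rotating part
    schedule = {}
    for year in range(n_years):
        rotating = rng.sample(pool, n_rotating)   # fresh random draw each year
        schedule[year] = sorted(fixed + rotating)
    return fixed, schedule

# Hypothetical frame of 40 sites, 4 fixed + 4 rotating, 3 survey years.
fixed, schedule = rotating_panel(range(1, 41), n_fixed=4, n_rotating=4, n_years=3)
for year, visited in schedule.items():
    print(year, visited)
```

Shifting sites from the rotating part to the fixed part favours trend detection; shifting them the other way favours unbiased status estimates.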
Sites drop out for numerous reasons
- Access unavailable
- Landholder tolerance
- No longer representative
- Managed
- Act of nature
- Equipment failure
- Logistics
- Etc.
The longer a study runs, the higher site attrition is likely to be, and the less likely the samples are to represent the (initial) population.
These dropouts are generally termed ‘missing at random’ (MAR).
Dropout sites
- Sometimes sites drop out because of the ‘moving sampling frame’, e.g. less watered area. These are missing not at random (MNAR)
- Original sites have an inclusion probability when selected
- New sites have different inclusion probabilities when selected
- Some data providers may add more sites
- This requires statistical adjustments in calculations
Q1. Should we really be worried about dropout sites? i.e. are there consequences for the assessments?
Furthermore, in ephemeral systems, sites are sometimes added because of the moving sampling frame.
- New sites have different inclusion probabilities when selected
Q1. Should we really be worried about the moving sampling frame? i.e. are there consequences for the assessments?
Imagine doing a shopping basket survey for a large city, based on the city limits of 40 years ago!
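To make "different inclusion probabilities" concrete, here is a minimal sketch assuming simple random selection within each frame. The helper `design_weight` and all the site counts are hypothetical, not figures from the monitoring program.

```python
def design_weight(n_selected, frame_size):
    """Weight = 1 / inclusion probability, for simple random selection from a frame."""
    inclusion_probability = n_selected / frame_size
    return 1.0 / inclusion_probability

# Hypothetical numbers: 8 sites drawn from an original frame of 40 wetted sites,
# then 2 more drawn from 20 sites that only became available later.
original_weight = design_weight(8, 40)   # inclusion probability 8/40
new_weight = design_weight(2, 20)        # inclusion probability 2/20

weights = [original_weight] * 8 + [new_weight] * 2
estimated_frame_size = sum(weights)      # weights should add up to the current frame
print(original_weight, new_weight, estimated_frame_size)
```

Each new site here stands in for twice as many sites as each original site, so treating all ten sites as equal would under-represent the newly available habitat.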
Common methods for dealing with dropout sites
- Oversample subjects/sites at the start to allow for attrition, e.g. a common approach in medical studies
- Restrict the sampling frame to subjects/sites less likely to drop out, e.g. only report on more permanent sites (SRA)
- Refresh the samples by adding new subjects/sites, e.g. stock market indices; requires adjustments for different inclusion probabilities*
*This is the only approach that deals with an expanding sampling frame
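The oversampling option reduces to simple arithmetic. The sketch below assumes, purely for illustration, that every site has the same independent chance of dropping out each year; `initial_sites_needed` and the 10% rate are hypothetical.

```python
import math

def initial_sites_needed(target_sites, annual_dropout_rate, n_years):
    """Sites to select at the start so that, on average, target_sites remain."""
    expected_retention = (1.0 - annual_dropout_rate) ** n_years
    return math.ceil(target_sites / expected_retention)

# Hypothetical: keep about 7 sites after 5 years with 10% dropout per year.
print(initial_sites_needed(target_sites=7, annual_dropout_rate=0.10, n_years=5))
```

Oversampling only protects the sample size. The extra sites still come from the original frame, so it does nothing for an expanding sampling frame.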
The Living Murray – Icon Sites
- Large scale, asset based monitoring program
- Three very similar forests relevant to this report:
  - Koondrook-Perricoota Forest
  - Gunbower Forest
  - Barmah-Millewa Forest
- All use similar within-site sampling protocols for sampling fish
- Different site selection protocols
- Prescribed by MDBA to use fixed monitoring sites
$60 million in works to flood KP forest, delivering up to 6 ML/day for up to 100 days/year
Koondrook-Perricoota Forest
- 32,000 ha
- Starting from a dry state
- Missed the brief about using fixed sites (luckily)
- Estimated that 32 sites were needed for decent estimates of the status of native fish
- The forest flooded in 2010, and 80% of available sites were sampled at random (n=32); there has been a census almost every other year
- 54 sites in total sampled for fish since 2011
Koondrook-Perricoota Forest (Fish Sampling)
- All available habitat in the forest is mapped each year before sampling
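Sampling 80% of available sites with n = 32 implies a frame of about 40 sites, so the finite population correction is substantial. A rough sketch, assuming simple random sampling without replacement; `srs_standard_error` is a hypothetical helper and the standard deviation of 0.2 is a made-up value.

```python
import math

def srs_standard_error(sample_sd, n, frame_size):
    """Standard error of a mean under simple random sampling without replacement."""
    fpc = 1.0 - n / frame_size          # finite population correction
    return math.sqrt(fpc) * sample_sd / math.sqrt(n)

frame_size = round(32 / 0.80)           # available sites implied by n = 32 at 80%
se_with_fpc = srs_standard_error(sample_sd=0.2, n=32, frame_size=frame_size)
se_without_fpc = 0.2 / math.sqrt(32)    # what an infinite-population formula gives
print(frame_size, se_with_fpc, se_without_fpc)
```

When every available site is sampled (a census) the correction drives the sampling error to zero, which is why the census years can serve as the reference for the sampled years.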
Barmah-Millewa Forest (Fish Sampling)
- 21 fixed sites, across 3 sampling strata
- Permanent river sites: 1 MAR in two years; 1 refresh site
- Permanent creek sites: 1 MNAR for three years; occasional MAR; 1 refresh site
- Wetland/lake sites: 5 MNAR for various years; no MAR. This stratum is equivalent to KPF IRES/wetlands
- They do not know what their sampling frame is
Koondrook-Perricoota Forest
Figure: proportion of available sites sampled per survey (labels: Census, 80%, 91%, Census)
Retrospective look at KPF fish monitoring data
How ‘bad’ is the bias caused by dropout sites (after 5 years)?
- Randomly select n sites from those sampled in year 1
  A. Follow only these through the study (no refresh)
  B. Supplement with refresh sites where possible, from:
     - the 2011 frame
     - all available sites in any year
  C. Compare with the census data
- Do this lots of times with varying sample sizes
- All samples are subjected to additional calculations for:
  - inclusion probabilities
  - size of waterbody
  - finite population corrections
CAVEAT! Only 80% of the population was sampled in 2011, thus the initial sample is also subject to random sampling bias. This is assumed not to be too large, as such a large proportion of the population was sampled.
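The resampling procedure above can be sketched on synthetic data. This is not the analysis behind the talk: the site scores, the three-year frame history and the function `mean_bias` are invented for illustration, the refresh strategy shown is the "all available sites" variant, and none of the inclusion-probability, waterbody-size or finite population adjustments are applied.

```python
import random
from statistics import mean

def mean_bias(values, available, n, n_reps=500, seed=1):
    """Average (sample mean - census mean) per year for two follow-up strategies.

    values[year][site] is the site score; available[year] is the set of sites
    that could be sampled that year (the sampling frame for that year).
    """
    rng = random.Random(seed)
    years = sorted(available)
    census = {y: mean(values[y][s] for s in available[y]) for y in years}
    diffs = {"no_refresh": {y: [] for y in years},
             "refresh": {y: [] for y in years}}
    for _ in range(n_reps):
        panel = rng.sample(sorted(available[years[0]]), n)
        refreshed = list(panel)
        for y in years:
            # A. Follow only the original sites that can still be sampled.
            kept = [s for s in panel if s in available[y]]
            if kept:  # a replicate with no surviving sites contributes nothing
                diffs["no_refresh"][y].append(
                    mean(values[y][s] for s in kept) - census[y])
            # B. Drop lost sites, then top up at random from this year's frame.
            refreshed = [s for s in refreshed if s in available[y]]
            candidates = sorted(available[y] - set(refreshed))
            n_new = min(n - len(refreshed), len(candidates))
            refreshed += rng.sample(candidates, n_new)
            diffs["refresh"][y].append(
                mean(values[y][s] for s in refreshed) - census[y])
    return {k: {y: (mean(d) if d else float("nan")) for y, d in by_year.items()}
            for k, by_year in diffs.items()}

def score(site):
    """Made-up nativeness scores for three kinds of site."""
    if site < 20:
        return 0.8   # permanent sites
    if site < 40:
        return 0.4   # ephemeral sites in the original frame
    return 0.2       # sites that only appear after the frame moves

# Made-up frame history: some sites dry out, new ones become available.
available = {
    0: set(range(0, 40)),
    1: set(range(0, 20)) | set(range(30, 40)),
    2: set(range(0, 20)) | set(range(40, 50)),
}
values = {y: {s: score(s) for s in sites} for y, sites in available.items()}

result = mean_bias(values, available, n=7)
for strategy, by_year in result.items():
    print(strategy, {y: round(b, 3) for y, b in by_year.items()})
```

In this toy setup the no-refresh panel ends up containing only permanent sites, so it overstates the census mean once the frame has moved. Topping up from the current frame pulls the estimate back toward the census without removing the bias, because retained sites are still over-represented.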
Interpreting the results
- Red reference line is the census mean
- Green reference line is the mean of the initial group of sites (a little bias here)
- Histogram is the distribution of the random samples
- All results are basic fish nativeness scores
Results: N = 7 sites (comparable with BMF; BMF does not use refresh sites)
Index 1: Proportion Native Fish Species Richness
- No bias in my bootstrap sampling methodology (random error)
- No refresh = biased
- Refresh from initial frame = biased
- Refresh from all available sites = [less] biased
- Adjusting site weights using new site inclusion probabilities practically eliminates bias
Bias for each of the five panels of the original figure, in the order no refresh / refresh from all available sites / refresh from initial frame / adjusted weights (ns = not significant):
- +16% / +7% / +17% / +1% ns
- +14% / +8% / +14% / +4% ns
- +13% / +6% / +17% / -2% ns
- -3% ns / -0.3% ns / -2% / -0.3% ns
- -22% / -0.0% ns / -22% / -4%
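Adjusting site weights can be illustrated with a Hájek-style weighted mean, in which each site counts in proportion to the inverse of its inclusion probability. This is a sketch of the general technique rather than the exact calculation used for these results; the scores and probabilities are invented.

```python
def unweighted_mean(scores):
    """Plain average that treats every site as equally representative."""
    return sum(scores) / len(scores)

def weighted_mean(scores, inclusion_probabilities):
    """Hajek-style mean: each site is weighted by 1 / its inclusion probability."""
    weights = [1.0 / p for p in inclusion_probabilities]
    return sum(w * y for w, y in zip(weights, scores)) / sum(weights)

# Hypothetical panel of 7 sites: 5 retained original sites (inclusion
# probability 0.2) and 2 refresh sites from newly available habitat (0.1).
scores = [0.8] * 5 + [0.4] * 2
probabilities = [0.2] * 5 + [0.1] * 2

plain = unweighted_mean(scores)
adjusted = weighted_mean(scores, probabilities)
print(round(plain, 3), round(adjusted, 3))
```

Because the refresh sites each represent more of the current frame, the weighted mean moves toward their scores, which is the direction needed to offset the over-representation of long-lived sites.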
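Results: N = 7 sites (comparable with BMF; BMF does not use refresh sites)
Index 2: Proportion Native Fish Catch
- Adjusting site weights using new site inclusion probabilities practically eliminates bias
Bias for each of the five panels of the original figure, in the order no refresh / refresh from all available sites / refresh from initial frame / adjusted weights (ns = not significant):
- +10% / +3% / +8% / +3% ns
- +2% ns / +1% ns / +2% / +1% ns
- +16% / +6% / +18% / -1% ns
- +12% / +3% / +8% / +1% ns
- -31% / -0.0% ns / -31% / -6%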
Results: N = 7 sites (comparable with BMF; BMF does not use refresh sites)
Index 3: Proportion Native Fish Biomass
- Adjusting site weights using new site inclusion probabilities practically eliminates bias
- (Random) starting bias is not constant through time
Bias for each of the five panels of the original figure, in the order no refresh / refresh from all available sites / refresh from initial frame / adjusted weights (ns = not significant):
- +49% / +7% / +26% / -14% ns
- +46% / +33% / +54% / -26% ns
- -40% / +10% ns / +21% / -6% ns
- -22% / -0.1% ns / -8% / +7% ns
- +25% / -0.0% ns / +25% / +5%
Summary
- It is clear that not reviewing the sampling frame for each survey leaves the results susceptible to bias
- When sites drop out they should be replaced using the current sampling frame
- Inclusion probabilities are required for correct calculations
Questions
My question for you: do you have a good, big census data set that I can borrow?
Data Providers
NSW DPI Fisheries, Arthur Rylah Institute, North Central Catchment Management Authority, Murray-Darling Basin Authority