Bias, Information, Signal and Noise in Citizen Science data. Nick Isaac. Photo credit: Rich

Defaunation in the Anthropocene. Dirzo et al. (2014) Science 345: 401–406

Biological Recording. A rich history; millions of records. Opportunistic recording is biased in time, in space, in detectability and in effort per visit. (Figure axes: Effort; Number of Species.)

The problem: for any research question, how can we extract biological signal from noisy data?

Detecting signal amidst the noise. Methods for trend estimation: aggregation; data selection methods; correction for sampling effort; Bayesian occupancy models (modelling the data collection process).
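As a rough illustration of the simpler end of this list, the sketch below computes a yearly reporting rate (the share of visits on which a focal species was recorded), with an optional minimum list length as a crude data selection step. The function name, the (year, species set) visit format and the threshold are assumptions made for this example; they are not taken from the talk.

```python
from collections import defaultdict

def reporting_rate(visits, focal, min_list_length=1):
    """Proportion of visits per year on which `focal` was recorded.

    visits: iterable of (year, set_of_species) pairs (hypothetical format).
    min_list_length: visits recording fewer species are dropped, a simple
    stand-in for a data selection rule.
    """
    totals = defaultdict(int)  # visits retained per year
    hits = defaultdict(int)    # retained visits that recorded the focal species
    for year, species in visits:
        if len(species) < min_list_length:
            continue
        totals[year] += 1
        hits[year] += focal in species
    return {year: hits[year] / totals[year] for year in sorted(totals)}

# Toy records: four visits over two years.
visits = [
    (2000, {"a", "b"}),
    (2000, {"b"}),
    (2001, {"a"}),
    (2001, {"a", "c"}),
]
rates_all = reporting_rate(visits, "a")
rates_long_lists = reporting_rate(visits, "a", min_list_length=2)
```

On the toy records, dropping single-species lists changes the apparent trend, which is the kind of sensitivity to recording behaviour that motivates modelling the data collection process directly.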

Statistics for Citizen Science. Occupancy models are robust to several forms of bias in opportunistic data, and more powerful than other methods. Isaac et al. (2014) Methods in Ecology & Evolution 5:

Occupancy: modelling data collection. Separation of the “state” and “data generation” processes. Annual estimates of both occupancy and detection probabilities. Observer model: p(Detection) ~ ListLength. (Diagram: unobserved occupancy state, Extant or Extinct, for Year 1 to Year 5; observations arise from the data generation process.)
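The separation of state and data generation on this slide can be sketched in a few lines. The code below is a minimal illustration, not the model used in the talk: it assumes a logit-linear observer model in log list length with made-up coefficients `alpha` and `beta`, and all function names are hypothetical. It simulates visits to one site in one year and evaluates the likelihood of a visit history by summing over the unobserved occupancy state.

```python
import math
import random

def detection_prob(list_length, alpha=-2.0, beta=1.0):
    """Assumed observer model: logit(p) = alpha + beta * log(list_length)."""
    eta = alpha + beta * math.log(list_length)
    return 1.0 / (1.0 + math.exp(-eta))

def simulate_site_year(list_lengths, psi, rng, alpha=-2.0, beta=1.0):
    """Draw the hidden state z ~ Bernoulli(psi), then one detection per visit."""
    z = rng.random() < psi  # occupied this year? (never observed directly)
    return [int(z and rng.random() < detection_prob(ll, alpha, beta))
            for ll in list_lengths]

def site_year_likelihood(detections, list_lengths, psi, alpha=-2.0, beta=1.0):
    """Probability of a visit history, marginalising over the hidden state."""
    if_occupied = 1.0
    for y, ll in zip(detections, list_lengths):
        p = detection_prob(ll, alpha, beta)
        if_occupied *= p if y else (1.0 - p)
    # An unoccupied site can only produce a history with no detections.
    if_absent = 0.0 if any(detections) else 1.0
    return psi * if_occupied + (1.0 - psi) * if_absent

rng = random.Random(1)
history = simulate_site_year([1, 3, 8], psi=0.6, rng=rng)
lik = site_year_likelihood(history, [1, 3, 8], psi=0.6)
```

A history with no detections is ambiguous: it can come from an empty site or from an occupied site where the species was missed. That is why the likelihood has two terms, and why longer lists (higher detection probability under this assumed observer model) make non-detections more informative.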

Occupancy models for British bees. (Image labels: Nick Owens; Bombus bohemicus.)

Bias vs Information. We can’t tell the difference between these! The information content of a dataset is question-dependent and depends on survey effort. Isaac & Pocock (2015) Biol J Linn Soc 115:

Do citizen scientists record assemblages? Biological recording is unstructured, but many citizen science projects have structure. What is the assemblage?

Do citizen scientists record assemblages? Information about the data collection process (meta-data) is critical for making robust inferences from citizen science data. What would happen if I treat the data as biological records?

What does this mean? More sophisticated models? (Pagel et al., 2014, Methods in Ecology & Evolution 5:) Your data will outlive your project! We need to invest in better systems: meta-data, data standards, ontologies and controlled vocabularies. We need to understand more about the behaviour, motivation and aptitude of citizen scientists.

What have we learned? We have the tools to model biodiversity change using citizen science data. We shouldn’t remove the bias but model it. Occupancy models make this possible. A little bit of meta-data would go a long way. = a vast untapped resource (but it could be improved)

Acknowledgments: Tom August, Arco van Strien, Marnix de Zeeuw, David Roy, Michael Pocock, Charlie Outhwaite, Gary Powney, Colin Harrower, Helen Roy, Chris Preston, Mark