Biodiversity Informatics at the Natural History Museum. Ed Baker, Terrestrial Invertebrates, Department of Life Sciences & NHM Informatics Initiative
Science as a Slow Cooker: Only the surface visible; Lid kept on for extended periods of time; Uses cheap cuts of raggy meat; Ingredients lose their nutritional value; Children at risk due to high temperatures
We like data: 70 million+ specimens collected over 400 years; 350,000+ books; ??? Unpublished datasets in archive, notebooks, computers; ??? In the minds of staff
How do we provide access? Digitisation of specimens and associated data; Scanning and transcribing books, journals, archives; Providing tools for managing the data life cycle; Changing the way we publish: data publication
Flowing Data Publication Collection Curation Use
Flowing Data Collection Curation Somebody retires; Somebody dies; Project is cancelled; Sits in desk drawer or on a hard drive until…
Flowing Data Collection Curation Use Data Publication Re-use Publication Re-use
Flowing Data: from collection to reuse Collection Curation Use Data Publication Re-use Publication Re-use
Collection: Citizen Science; Automated identification and monitoring; Traditional taxonomic sources
Flowing Data: from collection to reuse Curation Use Data Publication Re-use Publication Re-use
Curation: Websites for communities to publish and curate: Taxonomy / nomenclature; Bibliographies; Specimen information; Character matrices
Flowing Data: from collection to reuse Use Data Publication Re-use Publication Re-use
Use: Oboe
Flowing Data: from collection to reuse Data Publication Re-use Publication Re-use
Publication (Data): Datasets; Single species descriptions; Checklists; Software
Flowing Data: from collection to reuse Re-use Publication Re-use
Publication (Research): Traditional research; Systematic zoology; Phylogeny; Biogeography
Flowing Data: from collection to reuse Re-use
The Problem of Scale: Data is being generated by tens of thousands of researchers, in thousands of institutions; Hard to find what you need; Hard to know if what you need actually exists; Impossible to go through researcher by researcher
NHM Data Portal: Aggregator for NHM science data; Visualisation tools for datasets; Allows export of NHM data for re-use
The Informatics Landscape: >18K specimen records (local, small-scale coverage); >276M specimen records (worldwide coverage)
The Informatics Landscape: A webpage for every species; Aggregate specimen and observation data globally
Wikimedian in Residence: Make NHM content available under open licences for use on Wikimedia projects (and elsewhere); Reach of Wikipedia: BBC, Encyclopedia of Life; Wikisource: Transcription and translation crowd-sourcing
Flowing Data: from collection to reuse ? ?
"Everybody makes mistakes. And if you don't expose your raw data, nobody will find your mistakes." Jean-Claude Bradley