Computer, what is the trajectory of the planet Ceti Alpha V?
How many algal species can be found on this planet?
What species is this?
BIG = data-centric (like particle physics and astronomy)
 - Characterized by data sharing via a virtual pool
New = new skill sets, tools, cyberinfrastructure to exploit the data pool
 - Data-driven discovery as a new means of understanding
 - GenBank as a model within the Life Sciences
Large number of providers with small amounts of data. Small number of providers with lots of data.
Aa paleacea, Limulus polyphemus, Kiwa hirsuta, Osedax frankpressi, Kingia australis, Pieris japonica, Pieris rapae, Trypanosoma brucei, Homo sapiens
Didimosphenia geminata, Didymosphenia geminata, Rock snot, Didymo, Echinella geminata, Gomphonema geminatum, Gomphonema vulgare
Didymosphenia geminata, Didimosphenia geminata, Didymo, Rock Snot, Echinella geminata, Gomphonema geminatum, Gomphonema vulgare
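The name variants listed above (a misspelling, vernacular names, and older scientific names) can be reconciled to a single name with a simple lookup. The following is a minimal sketch, not the Global Names implementation: the `reconcile` and `normalize` helpers are hypothetical, and treating "Didymosphenia geminata" as the preferred name is an assumption made for the example.

```python
from typing import Optional

# Assumption for this sketch: this is the name all variants resolve to.
PREFERRED = "Didymosphenia geminata"

# Variants taken from the slide: misspelling, vernacular, and other names.
VARIANTS = [
    "Didymosphenia geminata",
    "Didimosphenia geminata",
    "Didymo",
    "Rock Snot",
    "Echinella geminata",
    "Gomphonema geminatum",
    "Gomphonema vulgare",
]


def normalize(name: str) -> str:
    """Collapse whitespace and ignore case so 'Rock snot' matches 'Rock Snot'."""
    return " ".join(name.split()).casefold()


# Hypothetical lookup table: normalized variant -> preferred name.
RECONCILE = {normalize(v): PREFERRED for v in VARIANTS}


def reconcile(name: str) -> Optional[str]:
    """Return the preferred name for a known variant, or None if unknown."""
    return RECONCILE.get(normalize(name))
```

A call such as `reconcile("rock snot")` resolves to the preferred name, while a name outside the table returns `None` so the caller can decide how to handle it.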
Disambiguate by authority, species, contextual data
Contextual data: Diatom, Chloroplast, Frustule, Benthic, Marine
Contextual data: Food, Moth, Wings, Exoskeleton, Caterpillar
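Disambiguation by contextual data can be sketched as keyword overlap: when one name string refers to two different organisms, the words around it hint at which one is meant. This is only an illustration. The two keyword sets come from the slide, while the sense labels, the `disambiguate` function, and the scoring rule are assumptions.

```python
import re
from typing import Optional

# Keyword sets from the slide; the sense labels are hypothetical.
CONTEXT_PROFILES = {
    "diatom sense": {"diatom", "chloroplast", "frustule", "benthic", "marine"},
    "moth sense": {"food", "moth", "wings", "exoskeleton", "caterpillar"},
}


def disambiguate(text: str) -> Optional[str]:
    """Pick the sense whose keywords overlap most with the surrounding text.

    Returns None when no keyword matches or when the senses tie.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {
        sense: len(words & keywords)
        for sense, keywords in CONTEXT_PROFILES.items()
    }
    best = max(scores, key=scores.get)
    top = scores[best]
    if top == 0 or list(scores.values()).count(top) > 1:
        return None
    return best
```

Authority strings and species epithets, the other two signals named on the slide, are not modeled here.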
[Diagram: GNA, with Provider Services for data and service providers, Consumer Services for data and service consumers, and experts]
Managing names to manage biodiversity data
 - All names (scientific, vernacular, surrogate)
 - For all organisms
 - Many names for one species reconciled
 - One name for many species disambiguated
Global Names Architecture - a virtual layer, using names services to link together distributed data
Globalnames.org
Micro*scope (microscope.mbl.edu) and Encyclopedia of Life (eol.org)
Narrative tradition in biology
Too much for a human
Can we get a machine to do the work?
NLP!!!
Use NLP/machine learning to extract names and characters (Hong Cui)
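To give a feel for name extraction, here is a deliberately simple stand-in: a regular expression that looks for a capitalized genus followed by a lowercase epithet. It is not the NLP/machine-learning approach referred to above, only a heuristic sketch; the pattern, the stop-word list, and the `extract_binomials` function are all assumptions.

```python
import re

# Capitalized word + lowercase word of 3+ letters, e.g. "Kiwa hirsuta".
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]{3,})\b")

# Hypothetical stop list: common sentence-initial words that are not genera.
STOP_WORDS = {"The", "This", "These", "We", "It", "In", "Our", "Some"}


def extract_binomials(text: str) -> list:
    """Return candidate two-part scientific names found in running text."""
    names = []
    for match in BINOMIAL.finditer(text):
        genus, epithet = match.groups()
        if genus not in STOP_WORDS:
            names.append(f"{genus} {epithet}")
    return names
```

A heuristic like this misses abbreviated genera ("D. geminata") and vernacular names, and it will still produce false positives, which is part of why learned models are attractive for the narrative literature.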
Spirogyra:chloroplasts:present
Spirogyra:chloroplasts:present:attribution
coffee is a drink
Triple Store
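The statements above ("Spirogyra:chloroplasts:present", optionally with an attribution, and "coffee is a drink") are subject, predicate, object triples. A minimal in-memory sketch of a triple store follows, assuming a colon-separated input format and a fourth field for attribution; the `TripleStore` class and `parse_statement` helper are hypothetical and stand in for a real RDF store.

```python
from collections import namedtuple

# A statement plus an optional source (attribution) for that statement.
Triple = namedtuple("Triple", "subject predicate obj source")


def parse_statement(text: str) -> Triple:
    """Parse 'subject:predicate:object[:source]' into a Triple."""
    parts = text.split(":", 3)
    if len(parts) < 3:
        raise ValueError(f"expected at least three fields, got: {text!r}")
    source = parts[3] if len(parts) == 4 else None
    return Triple(parts[0], parts[1], parts[2], source)


class TripleStore:
    """Toy store: keeps triples in a set and answers pattern queries."""

    def __init__(self):
        self._triples = set()

    def add(self, triple: Triple) -> None:
        self._triples.add(triple)

    def query(self, subject=None, predicate=None, obj=None) -> list:
        """Return triples matching every field that is not None."""
        hits = [
            t for t in self._triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)
        ]
        # Sort on the statement itself; source may be None and is not compared.
        return sorted(hits, key=lambda t: t[:3])


store = TripleStore()
store.add(parse_statement("Spirogyra:chloroplasts:present"))
store.add(parse_statement("Spirogyra:chloroplasts:present:attribution"))
store.add(Triple("coffee", "is a", "drink", None))
```

Querying with `store.query(subject="Spirogyra")` returns both Spirogyra statements, one with and one without a source, which shows how attribution can travel with the data.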
Informatics/computing training
Modified workflows
Importance of data management and preservation
Big New Biology is coming; taxonomy can benefit from being a part of it
Existing data can be made machine-readable using information extraction algorithms
Existing workflows can be modified to capture data close to the source
Data can be shared using the semantic web
Dima Mozzherin, David Shorthouse, Sayeed Choudhury, Pete DeVries