Community-Supported Data Repositories in Paleoecology and Paleoclimatology: The ‘Middle Tail’ between Geoscientific Users and Geoinformatics
Neotoma DB
Jack Williams, Allan Ashworth, Brian Bills, Jessica Blois, Don Charles, Simon Goring, Russ Graham, Eric Grimm, Alison Smith, & Mark Uhen
Part I: Building the Middle Tail: Community-Led Data Repositories
Part II: Interconnecting the Middle Tail: Cyberinfrastructure for the Paleogeosciences
Many Big Questions require assembly of individual paleorecords into larger networks
- Do global temperatures lead or lag CO2 during deglaciations?
  [Figure: Global temperatures & CO2, 22 ka to 0 ka. Shakun et al. (2012) Nature]
- How far and fast can species migrate when climates change?
  [Figure: Spruce distributions, last glacial maximum to present. Panels: 21,000; 15,000; 11,000; 7,000; Modern. Legend: % spruce pollen, ice, no data. Williams et al. (2004) Ecological Monographs]
Paleoecological Data: Key characteristics
- ‘Long Tail’: collected in the field by small scientific teams. Scientists vary with respect to data management expertise, capacity, and interest
- Highly valuable: specimens & samples collected decades ago are still analyzed
- Distributed scientific expertise: by proxy type, region, time period, and/or taxonomic group
[Figure: Datasets vs. Data Size, with “Big Data” and “Long Tail” labeled]
Solution: Community-Led Data Repositories (COLDARs) as ‘middle tail’ for long-tail data
Key Characteristics
- Open Data
- Curated by Community
- Added Value by serving community-specific needs (e.g. age models, taxonomy)
Neotoma DB
Paleobiology DB (paleobiodb.org)
Moving up the Value Chain: Generic Depositories vs. Community-Led Repositories
[Diagram, modified from K. Lehnert. Labels: small data, BIG DATA; Generic Depositories, Community-Led Repositories; findable, accessible, re-usable, interoperable; identification, persistence; authorization, protocols; context, provenance; harmonized, community governance & input]
“… data have no value or meaning in isolation; they exist within a knowledge infrastructure, an ecology of people, practices, technologies, institutions, material objects, and relationships.” - C.L. Borgman
Neotoma Paleoecology Database: Community-Led Repository for Quaternary and Pliocene Data
Design Concepts
- Spatiotemporal Database: species occurrences & abundances in space & time
- Age Controls and Age Models stored
- Centralized IT and Distributed Scientific Governance
- Neotoma composed of several constituent databases (e.g. North American Pollen Database, FAUNMAP)
- Open Data accessible via Explorer, APIs, R
- Broad User Community: paleoecologists, ecosystem modellers, paleoclimatologists, biogeographers, educators, …
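The "spatiotemporal database" idea above can be pictured with a minimal sketch: each record places a taxon and its abundance at a location and a modelled age, and a query filters on taxon, age window, and bounding box. Everything here is an assumption for illustration: the `Occurrence` class, its field names, the `select` helper, and the sample records are hypothetical and do not reflect Neotoma's actual schema, API, or data.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Occurrence:
    """One taxon observation; all field names are hypothetical."""
    taxon: str
    site: str
    lat: float        # decimal degrees north
    lon: float        # decimal degrees east
    age_bp: float     # age from an age model, years before present
    abundance: float  # e.g. a pollen percentage


def select(records: Iterable[Occurrence],
           taxon: str,
           age_range: Tuple[float, float],
           lat_range: Tuple[float, float],
           lon_range: Tuple[float, float]) -> List[Occurrence]:
    """Return records for one taxon inside an age window and bounding box."""
    age_min, age_max = age_range
    south, north = lat_range
    west, east = lon_range
    return [r for r in records
            if r.taxon == taxon
            and age_min <= r.age_bp <= age_max
            and south <= r.lat <= north
            and west <= r.lon <= east]


# Made-up records for illustration only; not real Neotoma data.
RECORDS = [
    Occurrence("Picea", "Site A", 45.0, -90.0, 15000.0, 32.0),
    Occurrence("Picea", "Site B", 52.0, -100.0, 7000.0, 18.0),
    Occurrence("Quercus", "Site A", 45.0, -90.0, 7000.0, 40.0),
]

hits = select(RECORDS, "Picea", (10000, 21000), (40, 50), (-95, -85))
print([(r.site, r.age_bp) for r in hits])  # [('Site A', 15000.0)]
```

Storing the age alongside the location is what lets a single query of this shape answer questions such as where a taxon occurred during a given time slice.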
Neotoma Domain
- Time: Late Neogene (~last 5 million years). Most records: yrs
- Space: North American to Global
- Paleoecological Data: plants & pollen, vertebrates, ostracodes, diatoms, insects, testate amoebae, physical sedimentology
[Figure: Temporal Domains of Paleoecological Databases. Brewer et al TREE]
Neotoma Uploads, Citations, and Usage
[Figures: Recent uploads to Neotoma; Pubs Citing Neotoma & Constituent DBs]
Last updated: July
Usage Statistics
- Neotoma Explorer: 1,918 unique users
- Neotoma APIs: 1,562 unique users
- Neotoma APIs: 241,469 requests
Neotoma Software Ecosystem
[Diagram. Legend: Exists / In Development]
- Functions: Data Preparation & Submission; Data Search & Retrieval; Data Exploration & Visualization; Data Archival
- Components: Neotoma DB, Tilia, Neotoma Explorer, APIs, neotoma (R), Ice Age Mapper, Niche Viewer, Stratigraphic Diagrams, Explorer, Data Submission Web Application, Downloadable Database Snapshots
Neotoma Governance (Proposed)
[Organization chart]
- Neotoma Leadership Council
- Executive Team: Grimm, Williams + 1 more
- Developer Team: Bills (lead), Anderson, Buckland, Davis, Goring, Grimm, Roth, Williams
- Data Stewards, by data type: Diatoms, Insects, Middens, Pollen, Plant Macros, Vertebrates, Biomarkers, Isotopes, Taphonomy, Ostracodes, Amoebae
- Name groups shown on the chart: Graham, Blois, Davis, Barnosky, Colburn, Etnier, Jacisin, Maguire, Milideo, Smith, Warren; Josh Miller, Russ Graham; Grimm, Williams, Bills + 1 Developer & 3 Data Stewards; Bob Booth; Betancourt, Holmgren, Latorre, Rylander; Ashworth, Buckland, Punel; Alison Smith, Brandon Curry; Don Charles, Sonja Hausmann; Bob Booth; Suzanne Pilaar Birch, Chris Widja; Jon Nichols; Grimm, Bradshaw, Giesecke, Williams, Goring, Evans, Fletcher, Hopf, Markgraf, McGeever, Mitchell
- Also shown: Users & Informaticists; Paleobiological Data Consortium; Training Workshops
Next Challenge: Organizing and Interconnecting the Middle Tail
C4P CINERGI Catalog: 224 Databases, 23 with geologic time metadata
EarthCube RCN: Cyberinfrastructure for Paleobioscience (C4P)
Goals
- Build new partnerships and collaborations among geoscientists and technologists
- Survey and catalog existing resources
- Share news of the latest advances in cyberscience and paleogeoinformatics
- Facilitate development of common standards and semantic frameworks
EarthCube RCN: Cyberinfrastructure for Paleobioscience (C4P)
C4P Activities
- Webinars & YouTube Channel: r4paleo
- CINERGI Catalog of paleoresources (databases, software, etc.): c4p-resource-viewer
- Paleobiology Workshop (May 2014)
- Geochronology Workshop (Oct 2014)
- Early Career Workshops: GSA 2014, 2015
- New Initiatives: Paleobiological Data Consortium (Neotoma/PBDB/…), PBDB-iDigBio, Open Core Data (CDSCO/IEDA/Neotoma/…)
PALEOBIOLOGICAL DATA CONSORTIUM
[Diagram. Groupings: COMMUNITY, GEODATA, OPEN-SOURCE, BIODATA]
- Members shown: Paleobiology DB, NOW DB, Continental Scientific Drilling Office (CDSCO), Digimorph, NOAA Paleoclimatology, DarwinCore, iDigPaleo, MorphoBank, Neotoma DB, VertNet, Early Career Members-at-Large, ROpenSci, GBIF/BISON, STEPPE, Open Geospatial Consortium, Integrated Earth Data Alliance, iDigBio, C4P
- Share best practices & protocols
- Build compatibility between geo- & bioinformatics
Current & Future Neotoma, C4P, & PDC Activities
1. Data Uploads (Neotoma; e.g. MIOMAP, Mexican Quaternary Mammal DB, ongoing)
2. All Hands Neotoma Workshop at AGU (Neotoma; Dec 2015)
3. One-Stop Queries for Neotoma & Paleobio DBs (Harmonized APIs & R packages) (PDC, ongoing)
4. Hackathon for Paleobiological Data (C4P; Summer 2016, invitations TBD!)
5. New tools for data visualization & exploration (Neotoma Taxa Mapper & Niche Viewer)
Sounds great! What’s in it for me?
1. Interested in using Neotoma to archive your data and make it available to others?
   - Catch me after session
   - Talk to a Data Steward
   - WebEx training for new Stewards
2. Interested in using Neotoma & other paleobio resources?
   - Neotoma Explorer walkthrough exercise:
   - neotoma (R) paper (Goring et al Open Quaternary)
   - User workshops: ESA 2016, IBS 2017
   - Hackathon Summer 2016
3. Interested in integrating your resource (software/DBs) to Neotoma & other paleobio resources?
   - Catch me after session
   - Hackathon Summer 2016
This talk represents the work of many
- Neotoma PIs & Developers: Eric C. Grimm, Russ Graham, Mike Anderson, Allan Ashworth, Brian Bills, Jessica Blois, Bob Booth, Ed Davis, Don Charles, Simon Goring, Steve Jackson, Alison Smith, Jack Williams
- C4P RCN Steering Committee: Kerstin Lehnert, David Anderson, Doug Fils, Leslie Hsu, Chris Jenkins, Anders Noren, Tom Olsewski, Dena Smith, Mark Uhen, Jack Williams
- Paleobiological Data Consortium: Mark Uhen, Jack Williams, Brian Bills, Jessica Blois, Ed Davis, Simon Goring, Russ Graham, Michael McClennen, Shanan Peters, Alison Smith
NSF-Geoinformatics, NSF-Earth Cube