Building Integrated Data Streams for Large-Scale Paleoclimatology & Biogeography
  Jack Williams, Simon Goring (UW-Madison)
  CSDCO, Neotoma DB
Many Big Questions require assembly of individual records into larger networks
  Do global temperatures lead or lag CO2 during deglaciations?
    [Figure: Global temperatures & CO2, 22 ka to 0 ka. Shakun et al. (2012) Nature]
  How far and fast can species migrate when climates change?
    [Figure: Spruce distributions, last glacial maximum to present. Map panels: 21,000; 15,000; 11,000; 7,000; Modern. Legend: % spruce pollen, ice, no data. Williams et al. (2004) Ecological Monographs]
Paleoecological Data: Key Characteristics
  'Long Tail': collected in the field by small scientific teams
  Workers vary with respect to data management expertise, capacity, and interest
  Highly valuable: specimens & samples collected decades ago are still analyzed
  Scientific expertise distributed by proxy type, region, time period, and/or taxonomic group
Community Data Repositories have emerged to tackle these bigger questions
  Examples: Neotoma DB, Paleobiology DB
  Key Characteristics:
    Open data
    Curated by community
    Standardized taxonomy
    Time: age controls and age models
Paleobiological Data Consortium (C4P)
  [Diagram] Categories: Community, Geodata, Open-Source, Biodata
  [Diagram] Members: Paleobiology DB, NOW DB, Continental Scientific Drilling Coordination Office (CSDCO), Digimorph, NOAA Paleoclimatology, DarwinCore, iDigPaleo, MorphoBank, Neotoma DB, VertNet, Early Career Members-at-Large, ROpenSci, GBIF/BISON, STEPPE, Open Geospatial Consortium, Interdisciplinary Earth Data Alliance, iDigBio
  Share best practices & protocols
  Build compatibility between geo- & bioinformatics
Neotoma Paleoecology Database: Design Concepts
  Spatiotemporal database: species occurrences & abundances in space and time
  Age controls and age models stored
  Centralized IT and distributed scientific governance. Neotoma is composed of several constituent databases (e.g. North American Pollen Database, FAUNMAP)
  Open data accessible via Explorer, APIs, and the neotoma R package
  Broad user community: paleoecologists, ecosystem modellers, paleoclimatologists, biogeographers, educators, ...
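The two design concepts above (occurrences indexed in space and time, with ages derived from stored age controls) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the class names `AgeControl` and `Occurrence`, the field names, and the choice of linear interpolation between age controls are hypothetical and are not Neotoma's actual schema or age-modelling method.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class AgeControl:
    """A dated depth in a core (hypothetical structure, not Neotoma's schema)."""
    depth_cm: float
    age_yr_bp: float


@dataclass(frozen=True)
class Occurrence:
    """A taxon abundance at a place and time (hypothetical structure)."""
    site: str
    taxon: str
    lat: float
    lon: float
    age_yr_bp: float
    percent: float


def interpolate_age(controls, depth_cm):
    """Estimate age at a depth by linear interpolation between age controls.

    Linear interpolation is an assumed, deliberately simple age model.
    """
    pts = sorted(controls, key=lambda c: c.depth_cm)
    depths = [c.depth_cm for c in pts]
    if len(pts) < 2:
        raise ValueError("need at least two age controls")
    if len(set(depths)) != len(depths):
        raise ValueError("age controls must have distinct depths")
    if not depths[0] <= depth_cm <= depths[-1]:
        raise ValueError("depth outside the range of the age controls")
    # Index of the control just below the sample, clamped to the last segment.
    i = min(bisect_right(depths, depth_cm), len(pts) - 1)
    upper, lower = pts[i - 1], pts[i]
    frac = (depth_cm - upper.depth_cm) / (lower.depth_cm - upper.depth_cm)
    return upper.age_yr_bp + frac * (lower.age_yr_bp - upper.age_yr_bp)


def select_occurrences(occurrences, taxon, lat_range, lon_range, age_range):
    """Return occurrences of a taxon inside a space-time window (inclusive)."""
    return [
        o for o in occurrences
        if o.taxon == taxon
        and lat_range[0] <= o.lat <= lat_range[1]
        and lon_range[0] <= o.lon <= lon_range[1]
        and age_range[0] <= o.age_yr_bp <= age_range[1]
    ]


if __name__ == "__main__":
    # Illustrative values only; these are not real measurements.
    controls = [AgeControl(0.0, 0.0), AgeControl(100.0, 5000.0)]
    print(interpolate_age(controls, 50.0))
```

Storing the age controls alongside the derived ages means an age model can be recomputed later without returning to the original publication.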
Neotoma Domain
  Time: Late Neogene (~last 5 million years); most records: [...] yrs
  Space: North American to global
  Datasets: plants & pollen, vertebrates, ostracodes, diatoms, insects, testate amoebae, physical sedimentology
  [Figure: Temporal Domains of Paleoecological Databases. Brewer et al., TREE]
Neotoma as Boundary Organization
  Data Users: paleoecologists, ecosystem modelers, biogeographers, paleoclimatologists
  Paleoecologists: pollen, vertebrates, insects, diatoms, ostracodes, amoebae, packrat middens
  Informatics & Computer Scientists: IEDA, GeoWS, Open Core
  Exchanged across boundaries: best practices, shared protocols, data, new questions
Paleodata Workflows: State of Field
  1. Cores Collected
  2. Cores Split, Sampled, Logged
  3. Proxies Measured by PIs
  4. Papers Written
  5. Data & Metadata Assembled
  6. Data Deposited (Journals, NOAA-Paleo, Neotoma, etc.)
  7. Data Synthesized into Regional-Global Studies
  8. Metadata Gaps Discovered
  9. New Analyses
  Consequences:
    Variably documented data
    Challenging project management
    Multiple inefficiencies and sources of data friction
    Synthetic research hard at anything beyond site scale
Key Need: Integrated Data Workflows
  1. Cores Collected, Tagged with IGSNs, Metadata Logged in Field
  2. Cores Split, Sampled, Logged; Samples Tagged with IGSNs; Data Stored in Common Data Structures (Open Core Data)
  3. Proxies Measured by PIs, Data Stored in Common Data Formats
  4. Papers Written, Embargoed Data Passed to Community Data Repositories (e.g. Neotoma)
  5. Data & Metadata Assembled
  6. Paper Published, Embargo Lifted from Repository
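The integrated workflow above can be sketched as a small Python model: samples carry identifiers from the moment of collection, datasets enter the repository under embargo, and publication lifts the embargo. This is a hedged illustration only. The `Sample`, `Dataset`, and `Repository` classes, their methods, and the identifier strings are hypothetical; they do not represent the IGSN format, Open Core Data structures, or Neotoma's deposit process.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Sample:
    """A core or subsample tagged with an identifier (hypothetical structure)."""
    igsn: str                   # placeholder string, not a real IGSN
    parent_igsn: Optional[str]  # None for a whole core, else the core's identifier
    depth_cm: Optional[float] = None


@dataclass
class Dataset:
    """Proxy measurements linked to tagged samples; embargoed until publication."""
    name: str
    sample_igsns: List[str]
    embargoed: bool = True


class Repository:
    """Toy stand-in for a community data repository (assumed behaviour)."""

    def __init__(self) -> None:
        self._samples: Dict[str, Sample] = {}
        self._datasets: Dict[str, Dataset] = {}

    def register_sample(self, sample: Sample) -> None:
        # Steps 1-2: identifiers are assigned in the field or at splitting.
        if sample.parent_igsn is not None and sample.parent_igsn not in self._samples:
            raise ValueError(f"unknown parent sample: {sample.parent_igsn}")
        self._samples[sample.igsn] = sample

    def deposit(self, dataset: Dataset) -> None:
        # Step 4: data arrive under embargo and must reference known samples.
        missing = [s for s in dataset.sample_igsns if s not in self._samples]
        if missing:
            raise ValueError(f"unregistered samples: {missing}")
        dataset.embargoed = True
        self._datasets[dataset.name] = dataset

    def publish(self, name: str) -> None:
        # Step 6: the paper is published, so the embargo is lifted.
        self._datasets[name].embargoed = False

    def public_datasets(self) -> List[str]:
        return sorted(n for n, d in self._datasets.items() if not d.embargoed)


if __name__ == "__main__":
    repo = Repository()
    repo.register_sample(Sample("EXAMPLE-CORE-1", None))
    repo.register_sample(Sample("EXAMPLE-SAMPLE-1", "EXAMPLE-CORE-1", 12.5))
    repo.deposit(Dataset("example-pollen-counts", ["EXAMPLE-SAMPLE-1"]))
    print(repo.public_datasets())  # empty while the embargo is in place
    repo.publish("example-pollen-counts")
    print(repo.public_datasets())
```

Because deposit happens before publication, the metadata gaps that the current workflow only finds at synthesis time would surface when the dataset is first checked against its registered samples.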
Current & Future Neotoma Activities
  1. Data uploads
  2. Partnership with LacCore/CSDCO et al. to establish common standards & linked data flows
  3. neotoma R package: establishing data models, integration with other R packages
  4. API development, user-driven
  5. New tools for data visualization & exploration
This talk represents the work of many
  Neotoma PIs & Developers: Eric C. Grimm, Russ Graham, Mike Anderson, Allan Ashworth, Brian Bills, Jessica Blois, Bob Booth, Ed Davis, Don Charles, Simon Goring, Steve Jackson, Alison Smith, Jack Williams
  C4P Steering Committee: Kerstin Lehnert, David Anderson, Doug Fils, Leslie Hsu, Chris Jenkins, Anders Noren, Tom Olszewski, Dena Smith, Mark Uhen, Jack Williams
  NSF-Geoinformatics, NSF-EarthCube