Community-Supported Data Repositories in Paleoecology and Paleoclimatology: The ‘Middle Tail’ between Geoscientific Users and Geoinformatics Neotoma DB.


Community-Supported Data Repositories in Paleoecology and Paleoclimatology: The ‘Middle Tail’ between Geoscientific Users and Geoinformatics

Jack Williams, Allan Ashworth, Brian Bills, Jessica Blois, Don Charles, Simon Goring, Russ Graham, Eric Grimm, Alison Smith, & Mark Uhen

Part I: Building the Middle Tail: Community-Led Data Repositories
Part II: Interconnecting the Middle Tail: Cyberinfrastructure for the Paleogeosciences

Many Big Questions require assembly of individual paleorecords into larger networks

How far and fast can species migrate when climates change?
[Figure: Spruce distributions, last glacial maximum to present. Map panels: 21,000; 15,000; 11,000; 7,000; Modern. Legend: % spruce pollen, ice, no data. Williams et al. (2004) Ecological Monographs]

Do global temperatures lead or lag CO2 during deglaciations?
[Figure: Global temperatures & CO2, 22 ka to 0 ka. Shakun et al. (2012) Nature]

Paleoecological Data: Key Characteristics
  ‘Long Tail’: collected in the field by small scientific teams. Scientists vary with respect to data management expertise, capacity, and interest.
  Highly valuable: specimens & samples collected decades ago are still analyzed.
  Distributed scientific expertise: by proxy type, region, time period, and/or taxonomic group.
[Figure: Datasets vs. Data Size, with labels “Big Data” and “Long Tail”]

Solution: Community-Led Data Repositories (COLDARs) as ‘middle tail’ for long-tail data

Key Characteristics
  Open Data
  Curated by Community
  Added Value by serving community-specific needs (e.g. age models, taxonomy)

Neotoma DB; Paleobiology DB (paleobiodb.org)

Moving up the Value Chain: Generic Depositories vs. Community-Led Repositories

“… data have no value or meaning in isolation; they exist within a knowledge infrastructure — an ecology of people, practices, technologies, institutions, material objects, and relationships.” - C.L. Borgman

[Diagram, modified from K. Lehnert: value chain running from small data to BIG DATA, comparing Generic Depositories with Community-Led Repositories. Stage labels: findable, accessible, re-usable, interoperable. Attribute labels: identification, persistence; authorization, protocols; context, provenance; harmonized, community governance & input.]

Neotoma Paleoecology Database: Community-Led Repository for Quaternary and Pliocene Data

Design Concepts
  Spatiotemporal Database: species occurrences & abundances in space & time
  Age Controls and Age Models stored
  Centralized IT and Distributed Scientific Governance
  Neotoma is composed of several constituent databases (e.g. North American Pollen Database, FAUNMAP)
  Open Data accessible via Explorer, APIs, R
  Broad User Community: paleoecologists, ecosystem modellers, paleoclimatologists, biogeographers, educators, …
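To make the first two design concepts concrete, here is a minimal sketch of an occurrence record tied to an age model. It is not Neotoma's actual schema: the class names, the field names, and the choice of linear interpolation between age controls are all assumptions made for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class AgeControl:
    """Hypothetical age control: a dated depth in a sediment core."""
    depth_cm: float
    age_yr_bp: float  # years before present


@dataclass(frozen=True)
class Occurrence:
    """Hypothetical occurrence: a taxon's abundance at a place and depth."""
    taxon: str
    site: str
    lat: float
    lon: float
    depth_cm: float
    abundance_pct: float


def interpolate_age(controls, depth_cm):
    """Assumed age model: linear interpolation between bracketing controls."""
    pts = sorted(controls, key=lambda c: c.depth_cm)
    if len(pts) < 2:
        raise ValueError("need at least two age controls")
    if not pts[0].depth_cm <= depth_cm <= pts[-1].depth_cm:
        raise ValueError("depth outside the range of the age controls")
    depths = [c.depth_cm for c in pts]
    # Index of the control just below the sample, clamped to the last segment.
    i = min(bisect_right(depths, depth_cm), len(pts) - 1)
    lo, hi = pts[i - 1], pts[i]
    frac = (depth_cm - lo.depth_cm) / (hi.depth_cm - lo.depth_cm)
    return lo.age_yr_bp + frac * (hi.age_yr_bp - lo.age_yr_bp)


# Usage with made-up values: assign an age to one pollen sample.
controls = [AgeControl(0, 0), AgeControl(100, 5000), AgeControl(300, 15000)]
sample = Occurrence("Picea", "Hypothetical Lake", 45.0, -90.0, 200.0, 12.5)
sample_age = interpolate_age(controls, sample.depth_cm)
```

Storing the age controls alongside the occurrences, rather than only the derived ages, means the ages can be recomputed when a better age model becomes available.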

Neotoma Domain
  Time: Late Neogene (~last 5 million years). Most records: yrs
  Space: North American to Global
  Paleoecological Data: plants & pollen, vertebrates, ostracodes, diatoms, insects, testate amoebae, physical sedimentology

[Figure: Temporal Domains of Paleoecological Databases. Brewer et al TREE]

Neotoma Uploads, Citations, and Usage
[Figure panels: Recent uploads to Neotoma; Pubs Citing Neotoma & Constituent DBs]
Last updated: July

Usage Statistics
  Neotoma Explorer: 1,918 unique users
  Neotoma APIs: 1,562 unique users
  Neotoma APIs: 241,469 requests

Neotoma Software Ecosystem
[Diagram centered on Neotoma DB. Legend: Exists; In Development]
  Functions: Data Preparation & Submission; Data Search & Retrieval; Data Exploration & Visualization; Data Archival
  Components: Tilia; Data Submission Web Application; Neotoma Explorer; APIs; neotoma (R); Ice Age Mapper; Niche Viewer; Stratigraphic Diagrams; Downloadable Database Snapshots

Neotoma Governance (Proposed)
[Organization chart]
  Executive Team: Grimm, Williams + 1 more
  Developer Team: Bills (lead), Anderson, Buckland, Davis, Goring, Grimm, Roth, Williams
  Neotoma Leadership Council
  Data Stewards, by data type: Amoebae, Diatoms, Insects, Middens, Pollen, Plant Macros, Vertebrates, Biomarkers, Isotopes, Taphonomy, Ostracodes
  Other boxes: Users & Informaticists; Paleobiological Data Consortium; Training Workshops
  Name lists on the chart, in transcript order:
    Graham, Blois, Davis, Barnosky, Colburn, Etnier, Jacisin, Maguire, Milideo, Smith, Warren
    Josh Miller, Russ Graham
    Grimm, Williams, Bills + 1 Developer & 3 Data Stewards
    Bob Booth
    Betancourt, Holmgren, Latorre, Rylander
    Ashworth, Buckland, Punel
    Alison Smith, Brandon Curry
    Don Charles, Sonja Hausmann
    Bob Booth
    Suzanne Pilaar Birch, Chris Widja
    Jon Nichols
    Grimm, Bradshaw, Giesecke, Williams, Goring, Evans, Fletcher, Hopf, Markgraf, McGeever, Mitchell

Next Challenge: Organizing and Interconnecting the Middle Tail
C4P CINERGI Catalog: 224 databases, 23 with geologic time metadata

EarthCube RCN: Cyberinfrastructure for Paleobioscience (C4P)

Goals
  Build new partnerships and collaborations among geoscientists and technologists
  Survey and catalog existing resources
  Share news of the latest advances in cyberscience and paleogeoinformatics
  Facilitate development of common standards and semantic frameworks

EarthCube RCN: Cyberinfrastructure for Paleobioscience (C4P)

C4P Activities
  Webinars & YouTube Channel: r4paleo
  CINERGI Catalog of paleoresources (databases, software, etc.): c4p-resource-viewer
  Paleobiology Workshop (May 2014)
  Geochronology Workshop (Oct 2014)
  Early Career Workshops: GSA 2014, 2015
  New Initiatives: Paleobiological Data Consortium (Neotoma/PBDB/…), PBDB-iDigBio, Open Core Data (CDSCO/IEDA/Neotoma/…)

PALEOBIOLOGICAL DATA CONSORTIUM

Aims
  Share best practices & protocols
  Build compatibility between geo- & bioinformatics

[Diagram groupings: COMMUNITY, GEODATA, OPEN-SOURCE, BIODATA]
Participants shown: Paleobiology DB; NOW DB; Continental Scientific Drilling Office (CDSCO); Digimorph; NOAA Paleoclimatology; DarwinCore; iDigPaleo; MorphoBank; Neotoma DB; VertNet; Early Career Members-at-Large; ROpenSci; GBIF/BISON; STEPPE; Open Geospatial Consortium; Integrated Earth Data Alliance; iDigBio; C4P

Current & Future Neotoma, C4P, & PDC Activities
1. Data Uploads (Neotoma; e.g. MIOMAP, Mexican Quaternary Mammal DB; ongoing)
2. All Hands Neotoma Workshop at AGU (Neotoma; Dec 2015)
3. One-Stop Queries for Neotoma & Paleobio DBs (Harmonized APIs & R packages) (PDC; ongoing)
4. Hackathon for Paleobiological Data (C4P; Summer 2016, invitations TBD!)
5. New tools for data visualization & exploration (Neotoma Taxa Mapper & Niche Viewer)
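The idea behind a one-stop query over harmonized APIs can be sketched as follows: each source gets a small adapter that maps its own field names and units onto one shared record type, and a single query then runs over all sources. Everything here is hypothetical. The two raw layouts, the field names, and the function names are invented for illustration and are not the real Neotoma or Paleobiology DB response formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Shared (harmonized) record type; field names are assumptions."""
    source: str
    taxon: str
    lat: float
    lon: float
    age_yr_bp: float


def from_source_a(raw):
    # Hypothetical source A: ages already in years before present.
    return Record("A", raw["taxonname"], raw["latitude"], raw["longitude"], raw["age"])


def from_source_b(raw):
    # Hypothetical source B: ages in millions of years, converted to years.
    return Record("B", raw["name"], raw["lat"], raw["lng"], raw["age_ma"] * 1_000_000)


def one_stop_query(taxon, sources):
    """Query every source for one taxon; sources is a list of (raw_records, adapter)."""
    hits = [adapter(raw) for raws, adapter in sources for raw in raws]
    wanted = taxon.lower()
    return sorted((r for r in hits if r.taxon.lower() == wanted),
                  key=lambda r: r.age_yr_bp)


# Usage with made-up records from the two hypothetical sources.
raw_a = [{"taxonname": "Picea", "latitude": 45.0, "longitude": -90.0, "age": 11000}]
raw_b = [{"name": "Picea", "lat": 60.0, "lng": -120.0, "age_ma": 0.5},
         {"name": "Equus", "lat": 40.0, "lng": -100.0, "age_ma": 2.0}]
results = one_stop_query("picea", [(raw_a, from_source_a), (raw_b, from_source_b)])
```

Keeping the unit conversion inside each adapter means the query logic never needs to know which source a record came from.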

Sounds great! What’s in it for me?
1. Interested in using Neotoma to archive your data and make it available to others?
   Catch me after the session
   Talk to a Data Steward
   WebEx training for new Stewards
2. Interested in using Neotoma & other paleobio resources?
   Neotoma Explorer walkthrough exercise:
   neotoma (R) paper (Goring et al Open Quaternary)
   User workshops: ESA2016, IBS2017
   Hackathon Summer
3. Interested in integrating your resource (software/DBs) with Neotoma & other paleobio resources?
   Catch me after the session
   Hackathon Summer 2016

This talk represents the work of many

Neotoma PIs & Developers: Eric C. Grimm, Russ Graham, Mike Anderson, Allan Ashworth, Brian Bills, Jessica Blois, Bob Booth, Ed Davis, Don Charles, Simon Goring, Steve Jackson, Alison Smith, Jack Williams

C4P RCN Steering Committee: Kerstin Lehnert, David Anderson, Doug Fils, Leslie Hsu, Chris Jenkins, Anders Noren, Tom Olsewski, Dena Smith, Mark Uhen, Jack Williams

Paleobiological Data Consortium: Mark Uhen, Jack Williams, Brian Bills, Jessica Blois, Ed Davis, Simon Goring, Russ Graham, Michael McClennen, Shanan Peters, Alison Smith

Support: NSF-Geoinformatics (Neotoma DB); NSF-Earth Cube (C4P; Paleobio Data Consortium)