
DataONE: Preserving Data and Enabling Data-Intensive Biological and Environmental Research
Bob Cook, Environmental Sciences Division, Oak Ridge National Laboratory
NACP All-Investigator Meeting, February 6, 2013

2 The DataONE Vision and Approach
Vision: providing universal access to data about life on Earth and the environment that sustains it, as well as the tools needed by researchers.
Approach:
1. Building community
2. Developing sustainable data discovery and interoperability solutions
3. Supporting researcher tools and services

3 The long tail of orphan data
[Figure: volume plotted against rank frequency of data type; specialized repositories (50%) at the head, orphan data (50%) in the long tail. After B. Heidorn.]
Characteristics
Big Science: large volume; automated sensors; well described; well curated; easily discovered
Small Science: small volume; poorly described; rarely indexed; invisible to scientists; rarely used; "dark data"
Other labels on the slide: high spatial resolution; process-based; theory development; model development; benchmarking

4
✔ Check for best practices
✔ Create metadata
✔ Connect to ONEShare
Data & Metadata (EML)
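The first step on this slide, checking a table against best practices before sharing, can be sketched as a small validator. This is a minimal sketch assuming the data are in CSV; the rules (column-name pattern, consistent field counts, no blank cells) and the function name `check_table` are hypothetical illustrations, not the checks any specific tool performs.

```python
import csv
import io
import re

# Illustrative rules only; these are NOT the checks that DataUp actually runs.
NAME_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def check_table(text: str) -> list[str]:
    """Return the best-practice problems found in a CSV table (empty list if none)."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    header = rows[0]
    problems = []
    # Column names: start with a letter, no spaces or special characters, unique.
    seen = set()
    for col, name in enumerate(header, start=1):
        if not NAME_PATTERN.match(name):
            problems.append(f"column {col}: name {name!r} is empty or has special characters")
        if name in seen:
            problems.append(f"column {col}: duplicate name {name!r}")
        seen.add(name)
    # Data rows (the header is row 1): one value per column, and no blank cells.
    for row_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(f"row {row_no}: expected {len(header)} fields, found {len(row)}")
        for col, value in enumerate(row[: len(header)], start=1):
            if not value.strip():
                problems.append(
                    f"row {row_no}, column {col}: blank cell; use an explicit missing-value code"
                )
    return problems


if __name__ == "__main__":
    sample = "site,temp C,date\nA,12.5,2012-06-01\nB,,2012-06-01\n"
    for problem in check_table(sample):
        print(problem)
```

Run on the sample, it reports the space in `temp C` and the blank temperature cell in row 3. Reporting problems rather than silently fixing them leaves the decision with the data producer.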

5
Sponsor requirements for data management
Credit for data through citation, DOI, and Data Citation Index
Training in data management
Improved tools for data preparation: DataUp
Developing a metadata editor
Model-Data Fusion: Harnessing Observations
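Whatever editor is used, its output is a structured metadata record. As a rough sketch of what that looks like in EML (the standard named on slide 4), the function below serializes only a title, a creator, and a contact. The function name, the `packageId` and `system` values, the contact position, and the choice of EML 2.2.0 are assumptions, and the output has not been validated against the EML schema.

```python
import xml.etree.ElementTree as ET

# Namespace URI for EML 2.2.0 (an assumption here); other EML versions use different URIs.
EML_NS = "https://eml.ecoinformatics.org/eml-2.2.0"


def minimal_eml(package_id: str, title: str, creator_surname: str, contact_email: str) -> str:
    """Serialize a bare-bones EML record: dataset title, creator, and contact only.

    A usable record would also describe the abstract, keywords, coverage,
    methods, and each data table and attribute.
    """
    ET.register_namespace("eml", EML_NS)
    # Only the root element is namespace-qualified; EML child elements are unqualified.
    root = ET.Element(f"{{{EML_NS}}}eml", {"packageId": package_id, "system": "example"})
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = title
    creator_name = ET.SubElement(ET.SubElement(dataset, "creator"), "individualName")
    ET.SubElement(creator_name, "surName").text = creator_surname
    contact = ET.SubElement(dataset, "contact")
    ET.SubElement(contact, "positionName").text = "Data Manager"  # hypothetical role
    ET.SubElement(contact, "electronicMailAddress").text = contact_email
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(minimal_eml("example.1.1", "Soil respiration at test plots", "Doe", "data@example.org"))
```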

6 Model-Data Fusion: Data System Characteristics (1)
Dedicated financial support for data management is essential
Close coordination between the data group(s) and the producers (experimentalists) and users (modelers) of the data products
Based on a data management plan and a data policy
Integrated system that delivers a suite of diverse products
Establish standards (file, workflow, network) and promote interoperability
Processes to assure and document data quality to allow proper interpretation and use
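As one concrete example of a process that assures and documents quality, the sketch below applies a plausible-range test and attaches a flag to every value rather than deleting out-of-range ones. The flag codes, the bounds, and the names `range_check` and `flag_summary` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical flag codes; each project documents its own vocabulary.
GOOD, SUSPECT, MISSING = "0", "1", "9"


@dataclass
class FlaggedValue:
    value: Optional[float]
    flag: str


def range_check(values, lower: float, upper: float) -> list[FlaggedValue]:
    """Attach a quality flag to every value instead of silently deleting outliers."""
    flagged = []
    for v in values:
        if v is None:
            flagged.append(FlaggedValue(None, MISSING))
        elif lower <= v <= upper:
            flagged.append(FlaggedValue(v, GOOD))
        else:
            flagged.append(FlaggedValue(v, SUSPECT))
    return flagged


def flag_summary(flagged: list[FlaggedValue]) -> dict[str, int]:
    """Count values per flag, e.g. for the data-quality section of the documentation."""
    return dict(Counter(f.flag for f in flagged))
```

For air temperatures with assumed bounds of -40 to 60 °C, `range_check([12.1, None, 85.0, -3.0], -40.0, 60.0)` flags 85.0 as suspect and the gap as missing; `flag_summary` then yields counts that can go into the dataset's quality documentation.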

7 Model-Data Fusion: Data System Characteristics (2)
Facilitate rapid exchange of data, products, and information, including rapid exchange of large-volume data
Promote the use of best practices to prepare and document data to share and archive
Make efficient use of existing data management infrastructure and resources
Ensure that finalized data and associated documentation are transferred to an appropriate archive
Make numerical models (source code) and descriptions of the models available, along with model parameters and example input and output data
(Thornton et al. 2005)
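Transfer to an archive is commonly checked with file checksums (fixity). The sketch below builds a SHA-256 manifest for a directory and re-verifies it after transfer. Its line format resembles a BagIt (RFC 8493) payload manifest, but it is not a complete BagIt implementation, and the function names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks (1 MiB by default) so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file under root (as a '/'-separated relative path) to its SHA-256."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def manifest_text(manifest: dict[str, str]) -> str:
    """One 'checksum  path' line per file, similar in shape to a BagIt payload manifest."""
    return "".join(f"{digest}  {path}\n" for path, digest in sorted(manifest.items()))


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the paths that are missing or whose checksum changed after transfer."""
    return sorted(
        rel for rel, expected in manifest.items()
        if not (root / rel).is_file() or sha256_of(root / rel) != expected
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "data").mkdir()
        (root / "data" / "obs.csv").write_bytes(b"site,value\nA,1.5\n")
        (root / "README.txt").write_bytes(b"Describes obs.csv\n")
        manifest = build_manifest(root)
        print(manifest_text(manifest), end="")
        print("changed or missing:", verify(root, manifest))
```

The producer would send the manifest along with the files; the archive recomputes checksums on receipt, and an empty list from `verify` indicates that nothing was lost or altered in transit.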

8 Interoperability
[Figure: Coordinating Nodes, with an internal metadata index, metadata extraction, virtual portals, and numerous search capabilities, linked to Member Nodes: KNB, LTER, ORNL DAAC, CDL, USGS CSAS, Dryad, and future nodes. Metadata standards shown at the Member Nodes: EML, FGDC, ISO, METS.]
Metadata has a link to the data, which reside at the Member Nodes.
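The division of labor in this diagram (Coordinating Nodes index metadata, Member Nodes keep the data) can be illustrated with a toy in-memory model. Everything here is hypothetical: the classes, the flat-dictionary records, and the field mapping (loosely modeled on EML `keyword` and FGDC `themekey`). It is not the DataONE API.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified field names per metadata standard (flat dicts, not real XML).
FIELD_MAP = {
    "EML": {"title": "title", "keywords": "keyword"},
    "FGDC": {"title": "title", "keywords": "themekey"},
}


@dataclass
class MemberNode:
    """A repository that keeps both the data objects and their metadata records."""
    node_id: str
    data: dict[str, bytes] = field(default_factory=dict)
    metadata: dict[str, dict] = field(default_factory=dict)

    def publish(self, object_id: str, payload: bytes, record: dict) -> None:
        self.data[object_id] = payload
        self.metadata[object_id] = record


def extract(record: dict) -> dict:
    """Metadata extraction: map a standard-specific record onto common search fields."""
    names = FIELD_MAP[record["standard"]]
    return {"title": record.get(names["title"], ""),
            "keywords": list(record.get(names["keywords"], []))}


@dataclass
class CoordinatingNode:
    """Indexes metadata only; each entry links back to data held at a member node."""
    index: dict[tuple[str, str], dict] = field(default_factory=dict)

    def harvest(self, node: MemberNode) -> None:
        for object_id, record in node.metadata.items():
            entry = extract(record)
            entry["location"] = (node.node_id, object_id)
            self.index[entry["location"]] = entry  # re-harvesting replaces, not duplicates

    def search(self, term: str) -> list[dict]:
        term = term.lower()
        return [e for e in self.index.values()
                if term in e["title"].lower()
                or any(term in k.lower() for k in e["keywords"])]


def resolve(nodes: dict[str, MemberNode], entry: dict) -> bytes:
    """Follow an index entry's link to retrieve the data from its member node."""
    node_id, object_id = entry["location"]
    return nodes[node_id].data[object_id]
```

Searching returns index entries whose `location` points at a Member Node; `resolve` then fetches the bytes from that node, so the data are never copied into the index.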

9 The long tail of orphan data
[Figure: volume plotted against rank frequency of data type; specialized repositories (e.g., remote sensing, NEON) at the head, orphan data in the long tail. After B. Heidorn.]
"Most of the bytes are at the high end, but most of the datasets are at the low end." (Jim Gray)
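Jim Gray's remark can be made concrete with a short calculation. The sketch assumes a made-up Zipf-like size distribution (the dataset at rank r holds `largest / r**alpha` bytes); the parameters are illustrative, not measurements of any real collection, and the function names are hypothetical.

```python
def top_share(sizes: list[float], top_fraction: float) -> float:
    """Fraction of all bytes held by the largest `top_fraction` of datasets."""
    ranked = sorted(sizes, reverse=True)
    k = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)


def zipf_sizes(n: int, alpha: float, largest: float) -> list[float]:
    """Hypothetical sizes: the dataset at rank r holds largest / r**alpha bytes."""
    return [largest / r ** alpha for r in range(1, n + 1)]


if __name__ == "__main__":
    sizes = zipf_sizes(n=1000, alpha=2.0, largest=1e12)
    share = top_share(sizes, top_fraction=0.01)
    print(f"top 1% of datasets hold {share:.0%} of the bytes")
    print(f"the other 99% of datasets hold {1 - share:.0%}")
```

With 1,000 datasets and alpha = 2, the ten largest hold about 94% of the bytes, while the other 990, which make up most of the collection by count, hold about 6%.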

10 "Data intensive science" and the "80:20 rule"
[Figure, adapted from CENR-OSTP: remote sensing; intensive science sites and experiments; extensive science sites; volunteer & education networks, arranged along axes of decreasing spatial coverage and increasing process knowledge.]