Oceanographic Informatics in a Collaborative Environment. Data Management Special Session N12: Strategies for Improved Marine and Synergistic Data Access.


Oceanographic Informatics in a Collaborative Environment. Data Management Special Session N12: Strategies for Improved Marine and Synergistic Data Access and Interoperability. 19 December 2008 San Francisco, CA P.H. Wiebe, R.C. Groman, C. Chandler, M.D. Allison, and D. Glover Woods Hole Oceanographic Institution Woods Hole, MA, USA

A Context
Data and information in oceanography are expanding at a rapid pace, and there is a significant need for more and better management tools and techniques to preserve and serve them.

Talk Objectives
To discuss current developments and new directions to enable better opportunities for data discovery, integration, and synthesis of oceanographic data regardless of origin.
To encourage comprehensive efforts to establish broadly based and accepted best practices in the quest to obtain new information about ocean physics, chemistry, biology, geology, and geophysics.
To highlight some of the changes I have observed during the past four decades and strongly endorse the New Age that is fast approaching in the way we gather, store, access, and analyze information and data.

A Personal Context
I have worked throughout my career as a biological oceanographer on multi-investigator and multi-disciplinary programs and projects. I realized early on that data and information management is an essential element in the design, acquisition, and synthesis of data sets in the oceanographic scientific enterprise. But the technology (hardware/software), resources (funding), and mandates were not in place until recently to do it effectively. The effort now involves more than data and information management. It involves what is termed “data informatics”.

Informatics Defined
“Informatics is the science and engineering that occupies the gap between information and communications technology (ICT) systems and cyberinfrastructure (computers, grids, Web services, etc.), and the use of digital data, information, and related services for research and knowledge generation.”
From: Baker, D.N., C.E. Barton, W.K. Peterson, and P. Fox. 2008. Informatics and the 2007–2008 Electronic Geophysical Year. Eos. 89(48):

Evolution of MOCNESS Data Acquisition
Programs: 1976 CCR Program, 1982 WCR Program, 1999 GLOBEC Program.
Computers: HP2100, CBM 8032, Windows PC.

Sampling in the Cold-Core Ring Program Cruises Total PO, bio-process, & mapping

Sampling in the Warm-Core Ring Program Cruises Total 6 PO 3 bio-process 3 bio-mapping 2 bio-process & mapping Knorr Endeavor Oceanus

Sampling in the U.S. GLOBEC Georges Bank Program Cruises Total 31 Broad-scale 91 process and mooring.

Data Storage
1970s: Honeywell Sigma 7. Simple file storage plus the Sigma 7 Extended Database Management System. MOCNESS data only; terminal access.
1980s: Digital VAX 11/780. Flat file storage; all data; terminal access. Microcomputers with floppies and small hard drives.
1990s: Sun/Unix-Linux servers. GLOBEC Data & Information Management system; project specific; all data; web available. Microcomputers become the mainstay for labs.
2000s: Unix/Linux servers. BCO-DMO Data & Information Management system; multiple projects; web available.

The Biological and Chemical Oceanography Data Management Office (BCO-DMO)
The BCO-DMO was created in late 2006 to serve investigators funded by the NSF Biological and Chemical Oceanography Sections to conduct marine chemical and ecological research. BCO-DMO provides open access to marine biogeochemical and ecological data and information developed in the course of scientific research, so that they can easily be disseminated, protected, and stored on short and intermediate time frames.

Groman’s Theorems
Theorem 1: The probability that all the necessary data and information are collected and preserved to allow another researcher to properly use your data is inversely proportional to the time since the data were collected.
Corollary: Unless data and information are collected and preserved during the experiment (e.g., cruise), subsequent researchers will have a difficult time using those data.
Theorem 2: The longer the time since the data were collected, the less likely the data will ever be considered “final” or available.
Conclusion: It is essential that data and information management begin with the start of a project or program.

The Informatics Imperative
The rise in interdisciplinary oceanography and collaboration in ocean science have been emphasized by Powell (2008) and Briscoe (2008).
Powell: “Ocean science has long been interdisciplinary… Today, one can scarcely conceive of an oceanographic question that does not cut across disciplines.”
Briscoe: “Ocean science must head toward more collaboration, because many of the research and applications questions we face demand teams of scientists and engineers (and probably social scientists and economists)… Collaboration in the ocean sciences is critical to addressing emerging ocean problems, and is worth the effort.”
It will take data informatics to make it possible!
Powell, T.M. 2008. The rise of interdisciplinary oceanography. Oceanography. 21(3):
Briscoe, M.G. 2008. Collaboration in the ocean sciences. Oceanography. 21(3):

What has happened to cause a change?
Computers are more powerful and storage is much larger.
Software and software tools to handle data management are now widely available.
More multi-disciplinary research is happening that builds on the work of earlier programs, and the earlier data are needed for current and future work.
Programs have policies that require data sharing in reasonable time frames (~2 years).
Program managers are requiring that data from previous grants be made publicly accessible on the web in order to get funding for the next grant.

Still resistance to sharing data – Why?
Scientist does not want others to use the data - fear of lost opportunities.
Scientist does not know how to do it.
Other reasons expressed: I’m not done publishing my papers based on the data. My graduate student is almost done analyzing the data. It’s not final yet.
Structural impediments: Lack of positive acknowledgment of data shared (give credit on par with papers? Need for DOIs).

Reasons for sharing data
A scientist’s data are not nearly as valuable by themselves as they are in the context of all the other data sets collected within a program.
Using others’ data within a program without sharing one’s own is not fair.
Data publishing with author-citable references is coming. Scientists will get credit for putting their data in public repositories.
There are real advantages to sharing.

Data Informatics
Semantic Web: RDF, OWL, SPARQL.
BASIN – an example of a prospective new program that will require all the data informatics and management techniques possible.
Web Ontology Language (OWL); Resource Description Framework (RDF); SPARQL Query Language for RDF.
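
As a minimal sketch of the ideas behind RDF and SPARQL named on this slide, the example below stores facts as subject-predicate-object triples and answers a query by pattern matching, which is the core of a SPARQL basic graph pattern. It uses plain Python tuples rather than an RDF library, and every identifier in it (the `ex:` names, the `match` helper, the sample cruises and datasets) is hypothetical and not taken from BCO-DMO or BASIN.

```python
# Illustrative only: a tiny in-memory triple store, not a real RDF/SPARQL engine.
# All names (ex:..., match, TRIPLES, QUERY) are hypothetical examples.

TRIPLES = [
    ("ex:cruise_A", "rdf:type", "ex:Cruise"),
    ("ex:cruise_A", "ex:usedInstrument", "ex:MOCNESS"),
    ("ex:dataset_1", "rdf:type", "ex:Dataset"),
    ("ex:dataset_1", "ex:collectedOn", "ex:cruise_A"),
    ("ex:dataset_2", "rdf:type", "ex:Dataset"),
    ("ex:dataset_2", "ex:collectedOn", "ex:cruise_B"),
]


def is_var(term):
    """Variables start with '?', as in SPARQL."""
    return term.startswith("?")


def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new_binding = dict(binding)
        ok = True
        for p_term, t_term in zip(first, triple):
            if is_var(p_term):
                # Bind the variable, or check it agrees with an earlier binding.
                if new_binding.setdefault(p_term, t_term) != t_term:
                    ok = False
                    break
            elif p_term != t_term:
                ok = False
                break
        if ok:
            yield from match(rest, triples, new_binding)


# Roughly equivalent to the SPARQL query:
#   SELECT ?d WHERE { ?d ex:collectedOn ?c . ?c ex:usedInstrument ex:MOCNESS }
QUERY = [
    ("?d", "ex:collectedOn", "?c"),
    ("?c", "ex:usedInstrument", "ex:MOCNESS"),
]

results = [b["?d"] for b in match(QUERY, TRIPLES)]
print(results)
```

In practice the same question would be written in SPARQL against an RDF store. The sketch only shows why describing datasets, cruises, and instruments as linked triples turns cross-project data discovery into a matter of pattern matching.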

Summary
Research in oceanography proceeds along three major lines: field observation, field and laboratory experimentation, and modeling. Data management and informatics until now have been an afterthought.
Efforts like ecosystem-based management require the integration of oceanographic, biodiversity, fisheries, and other marine environmental data, as well as the development of analysis and assessment tools.
The exponential increase in data sources and the proliferation and distributed nature of databases have created a fourth new and important line of marine research. Data management and informatics is now on par with the other lines of oceanographic research (Baker et al. 2008).
[Figure: lines of research. Past: FO (field observation), EX (experimentation), MO (modeling). Future: FO, EX, MO, and DM&I (data management and informatics).]
Baker, D.N., C.E. Barton, W.K. Peterson, and P. Fox. 2008. Informatics and the 2007–2008 Electronic Geophysical Year. Eos. 89(48):

Summary
Research priorities include: more rapid and efficient data acquisition, enhanced data management, more effective data utilization and reuse, improved data visualization, and development of ontologies.
The ultimate goal is to create a cyberinfrastructure for oceanography that enables open, transparent, interoperable access to data and information, regardless of their location.

Acknowledgments
Thanks to:
Charlton Galvarino for his excellent skill in implementing the MapServer interface.
Huan-Xiang Xu for his help during the metadata database design and his help in the initial loading of the database.
Xiaoyan Ye for her help in the initial attempts to develop comprehensive search options, geospatial displays of all the data, and for updating software to take advantage of the new database.
Julie Allen for her extensive help and support in implementing our BCO-DMO web site using Drupal and in using ColdFusion to provide web access to the database.
The National Science Foundation supported our work under grant numbers OCE and ANT