OPeNDAP: Accessing Data in a Distributed, Heterogeneous Environment
Peter Cornillon, Graduate School of Oceanography, University of Rhode Island
Presented at the NSF-sponsored Cyberinfrastructure Meeting, 31 October 2002

Outline
• DODS
• NVODS & OPeNDAP
• Interoperability: The Core Infrastructure
• How OPeNDAP is being used
• Lessons learned (also throughout)

Distributed Oceanographic Data System (DODS)
• Conceived in 1992 at a workshop held at URI.
• Objectives were:
  – to facilitate access to PI-held data as well as data held in national archives, and
  – to allow the data user to analyze data using the application package with which he or she is most familiar.
• Basic system designed and implemented by Gallagher and Flierl.

Distributed Oceanographic Data System
DODS consisted of two fundamental parts:
• a discipline-independent core infrastructure for moving data on the net, and
• a discipline-specific portion related to the data: population, location, specialized clients, etc.

DODS → OPeNDAP & NVODS
To isolate the discipline-independent part of the system from the discipline-specific part, two entities were formed:
• Open Source Project for a Network Data Access Protocol (OPeNDAP)
• National Virtual Ocean Data System (NVODS)

DODS → NVODS/OPeNDAP
• OPeNDAP was formed to maintain and evolve the DODS core infrastructure.
• OPeNDAP is a non-profit corporation.
• OPeNDAP focuses on the discipline-neutral parts of the DODS data access protocol.

Objective of OPeNDAP
• To provide a data access protocol allowing for machine-to-machine interoperability with semantic meaning in a distributed, heterogeneous data environment.
• That is, the scripted exchange of data between computers, without human intervention.
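As a concrete sketch of such scripted exchange, the following uses pydap, an open-source Python client for the DAP; the server URL and variable name are hypothetical placeholders, not an actual NVODS endpoint.

```python
# A minimal sketch of scripted, machine-to-machine DAP access using the
# pydap client. The URL and variable name are hypothetical placeholders.
from pydap.client import open_url

# Opening the URL retrieves only the dataset's metadata, not its values.
dataset = open_url("http://server.example.org/opendap/sst_2002.nc")

sst = dataset["SST"]            # a lazy handle; no data transferred yet
subset = sst[0, 10:20, 30:40]   # this slice is evaluated on the server,
                                # which returns only the requested values
```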

Considerations in the development of OPeNDAP
• Many data providers
• Many data formats
• Many different semantic representations of the data
• Many different client types

Interoperability: The Core Infrastructure

Interoperability – Metadata
The degree to which machine-to-machine interoperability is achieved depends on the metadata associated with the data.

OPeNDAP and Metadata

Metadata Types
We define two classes of metadata:
• Use metadata – needed to actually use the data.
• Search metadata – used to locate data sets of interest in a distributed data system.

Use Metadata
We divide use metadata into two classes:
• Syntactic use metadata
• Semantic use metadata

Syntactic Use Metadata
Information about the data types and structures at the computer level – the syntax of the data; e.g., variable T represents a 20x40-element floating-point array.
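In the DAP itself, this syntactic description is carried by the Dataset Descriptor Structure (DDS) a server returns before any data values move; for the slide's example it would look roughly like the following (dataset name hypothetical):

```
Dataset {
    Float32 T[lat = 20][lon = 40];
} example;
```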

Semantic Use Metadata
Information about the contents of the data set; e.g., variable T represents sea surface temperature with units of °C.
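In DAP terms, such semantic information travels in the Dataset Attribute Structure (DAS); a rough sketch for the same variable, with illustrative attribute names:

```
Attributes {
    T {
        String long_name "sea surface temperature";
        String units "degC";
    }
}
```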

Semantic Use Metadata
We divide semantic use metadata into two classes:
• Descriptive semantic use metadata
• Translational semantic use metadata

• Metadata required to make use of the data, e.g., to properly label a plot of the data
• Defines the translation from received values to semantically meaningful values
• Examples:
  – Units of the data: °C+4 → °C
  – Variable names in the data set: t → SST
  – Missing value flags: -99 → missing value
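A sketch of how a client might apply translational metadata of exactly this kind; the attribute names and values below mirror the slide's examples and are assumptions, not a fixed convention:

```python
# Illustrative sketch: applying translational semantic use metadata.
# Attribute names and values mirror the slide's examples (assumptions).
import numpy as np

attrs = {
    "standard_name": "SST",      # t -> SST: what variable 't' really is
    "add_offset": -4.0,          # stored values are degC + 4 -> true degC
    "missing_value": -99.0,      # -99 -> missing value
}

raw_t = np.array([[29.0, -99.0], [28.5, 27.9]], dtype=np.float32)

# Mask the missing-value flag, then shift stored values to physical units.
masked = np.where(raw_t == attrs["missing_value"], np.nan, raw_t)
sst_degC = masked + attrs["add_offset"]
```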

OPeNDAP and Metadata

Interoperability – Data Exchange
Interoperability may be defined at any one of a number of levels, ranging from:
• the lowest (hardware) – how computers are linked electronically, to
• the highest – semantically meaningful, machine-to-machine exchanges.

Organizational Complexity
Example: Consider the different ways of organizing a multi-year data set consisting of one global sea surface temperature (SST) field per day:
• one 2-D file per day, sst(lat,lon) – URI
• one 3-D file, sst(lon,lat,time) – PMEL
• one file per year with one variable per day → 365 variables per file, n files for n years – GSFC

Structure Layer
• Provides the capability to reorganize data so that they are in a consistent structural form.
• The objective is to reduce the granularity of the data set.
• Example: one 3-D file, sst(lon,lat,time)
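A sketch of the kind of restructuring this layer performs, collapsing one 2-D grid per day into the single 3-D form above; the file names and NumPy-based storage are hypothetical stand-ins for the real per-site formats:

```python
# Sketch: restructuring daily 2-D SST grids, sst(lat, lon), into a single
# 3-D array, sst(time, lat, lon). File names and the .npy storage format
# are hypothetical placeholders for whatever each site actually serves.
import numpy as np

daily_grids = [np.load(f"sst_2002_day{d:03d}.npy") for d in range(1, 366)]
sst = np.stack(daily_grids, axis=0)   # shape: (365, n_lat, n_lon)
```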

Format Layer
• Data values are not modified.
• Format transformation occurs only between server and client.
• The organizational structure of the data is not modified.

Structure Layer
• Data values are not modified.
• The organizational structure of the data is modified.

An OPeNDAP Structural-Layer Component – The Aggregation Server
• Developed by John Caron of Unidata
• Aggregates grids and arrays only
• Operates at the syntactic structural level

OPeNDAP - NVODS Status

OPeNDAP/NVODS Server Sites

OPeNDAP Client and Server Status

Special Servers

Projects Using OPeNDAP
• GODAE (Global Ocean Data Assimilation Experiment)
• NOMADS (NOAA Operational Model Archive and Distribution System)
• AOIMPS
• ESG II (Earth System Grid II)
• Ocean.US (US-GOOS)
• High Altitude Observatory Community

Institutions Making Heavy Use of OPeNDAP
• Ingrid – Columbia University
• COLA – Center for Ocean-Land-Atmosphere Studies
• Goddard DAAC
• CDC – Climate Diagnostics Center
• PMEL – Pacific Marine Environmental Laboratory

OPeNDAP Monthly Accesses (2002)

Site     April     May       June      July      August
URI        4,856    19,504     3,691    26,693     7,440
LDEO      80,709    62,930    46,092    93,088    32,084
CDC      102,518   153,362    62,395   181,974   107,512
JPL        3,068    34,028    63,309     8,260    13,282
COLA     347,506   412,991   337,310   400,314   638,376
TOTAL    535,589   648,787   502,797   702,069   785,412

OPeNDAP Unique Users (2002)

Site     April   May   June   July   August
URI
CDC
JPL
COLA

Interesting OPeNDAP Access Statistics
IRI data accesses for the 1st quarter of 2002:

Type      Requests    %    Volume (GB)    %
OPeNDAP     191,
Other     2,062,
Total     2,254,

PMEL OPeNDAP accesses: ~35,000, with ~26,000 internal.

Lessons (Re)Learned

Lessons (Re)Learned
1. Modularity provides flexibility.
The more modular the underlying infrastructure, the more flexible the system. This is particularly important for network-based systems, for which the technology, both software and hardware, is changing rapidly.

Lessons (Re)Learned
2. Data of interest will be stored in a variety of formats.
Regardless of how much one might want to define the format to be used by system participants, in the end the data will be stored in a variety of formats.
2a. The same is true of translational use metadata!

Lessons Learned
3. Structural representation of sequence data sets is a major obstacle to interoperability.
Care must be given to the organizational structure (as opposed to the format) of the data. This is the single largest constraint on the use of profile data in NVODS.

Lessons (Re)Learned
4. "Not invented here"
Avoid the "not invented here" trap. The basic concepts of a data system are relatively straightforward to define; implementing them ALWAYS involves substantially more work than originally anticipated. The devil's in the details. Take advantage of existing software wherever possible.

Lessons (Re)Learned
5. Work with those who adopt the system for their own needs.
Take advantage of those who are interested in contributing to the system because it addresses their needs, as opposed to those who are simply doing the work for the associated funding. => Open source.

Lessons Learned
6. There is no well-defined funding structure for community-based operational systems.
It is much easier to obtain funding to develop a system than to obtain funding to maintain and evolve one. This is a major obstacle to the development of a stable cyberinfrastructure that meets the needs of the research community.

Lessons Learned
7. It is more difficult to obtain funding for applied system development than for research related to data systems.
This is another obstacle to the development of cyberinfrastructure that meets the needs of the research community.

Lessons (Re)Learned
8. "Tough to teach old dogs new tricks"
Introducing new technology often requires a cultural change in usage that is difficult to effect. This can negatively affect system development.

Lesser Lessons Learned
9. Some surprises encountered in the NVODS/OPeNDAP effort:
• Heavy within-organization usage.
• The past focus on metadata is appropriate for interoperability at the data level.
• The number of variables increases almost linearly with the number of data sets.
• Users will take advantage of all of the flexibility offered by a system, sometimes to the disadvantage of all.
• Incredible variability in the structural organization of data.

Lessons Learned
10. Metrics suggest:
• increasing use of scripted requests, and
• large-volume transfers.
As data systems offering machine-to-machine interoperability with semantic meaning take hold, we could well see explosive growth in the use of the web.

Lessons Learned
11. Time to maturity is of order 10 years, not 3.
Developing new infrastructure takes time, both to iron out all of the %^*% little details and for the infrastructure to be adopted.

Peter's Law
The more metadata required, the less data delivered. Of course, the less metadata, the harder it is to use the data.