Presentation is loading. Please wait.

Presentation is loading. Please wait.

Peter Cornillon Graduate School of Oceanography University of Rhode Island Presented at the NSF Sponsored Cyberinfrastructure Meeting 31 October 2002 OPeNDAP:

Similar presentations


Presentation on theme: "Peter Cornillon Graduate School of Oceanography University of Rhode Island Presented at the NSF Sponsored Cyberinfrastructure Meeting 31 October 2002 OPeNDAP:"— Presentation transcript:

1 Peter Cornillon Graduate School of Oceanography University of Rhode Island Presented at the NSF Sponsored Cyberinfrastructure Meeting 31 October 2002 OPeNDAP: Accessing Data in a Distributed, Heterogeneous Environment

2 Outline  DODS  NVODS & OPeNDAP  Interoperability: The Core Infrastructure  How OPeNDAP is being used  Lessons learned – also throughout

3 Distributed Oceanographic Data System (DODS)  Conceived in 1992 at a workshop held at URI.  Objectives were: –to facilitate access to PI held data as well as data held in national archives and –to allow the data user to analyze data using the application package with which he or she is the most familiar.  Basic system designed and implemented in 1993-1994 by Gallagher and Flierl

4 Distributed Oceanographic Data System DODS consisted of two fundamental parts:  a discipline independent core infrastructure for moving data on the net,  a discipline specific portion related to data – population, location, specialized clients, etc.

5 DODS  OPeNDAP & NVODS To isolate the discipline independent part of the system from the discipline specific part, two entities have been formed:  Open Source Project for a Network Data Access Protocol (OPeNDAP)  National Virtual Ocean Data System (NVODS )

6 DODS  NVODS/OPeNDAP  OPeNDAP was formed to maintain and evolve the DODS core infrastructure  OPeNDAP is a non-profit corporation  OPeNDAP focuses on the discipline neutral parts of the DODS data access protocol

7 Objective of OPeNDAP  To provide a data access protocol allowing for machine-to-machine interoperability with semantic meaning in a distributed, heterogeneous data environment  The scripted exchange of data between computers, without human intervention.

8 Considerations with regard to the development OPeNDAP  Many data providers  Many data formats  Many different semantic representations of the data  Many different client types

9 The Core Infrastructure Interoperability

10 Interoperability - Metadata The degree to which machine-to-machine interoperability is achieved depends on the metadata associated with the data.

11 OPeNDAP and Metadata

12 Metadata Types We define two classes of metadata: Use metadata –needed to actually use the data. Search metadata – used to locate data sets of interest in a distributed data system.

13 Use Metadata We divide use metadata into two classes: Syntactic use metadata Semantic use metadata

14 Syntactic Use Metadata Information about the data types and structures at the computer level - the syntax of the data; –e.g., variable T represents a 20x40 element floating point array.

15 Semantic Use Metadata Information about the contents of the data set. e.g., variable T represents sea surface temperature with units of ºC

16 Semantic Use Metadata We divide semantic use metadata into two classes: Descriptive Semantic Use Metadata Translational Semantic Use Metadata

17  Metadata required to make use of the data; e.g., to properly label a plot of the data  Define the translation from received values to semantically meaningful values  Examples Units of the data: 0.125  C+4   C Variable names in the data set: t  SST Missing value flags: -99  missing value

18 OPeNDAP and Metadata

19 Interoperability – Data Exchange Interoperability may be defined at any one of a number of levels ranging from:  the lowest (hardware) - how computers are linked electronically, to  the highest – semantically meaningful, machine-to-machine exchanges.

20

21 Organizational Complexity Example: Consider the different ways of organizing a multi-year data set consisting of one global sea surface temperature (SST) field per day:  one 2-d file per day sst(lat,lon) - URI  one 3-d file sst(lon,lat,time) - PMEL  one file per year with one variable per day  365 variables per file, n files for n year - GSFC

22 Structure Layer  Provide the capability to reorganize data so that they are in a consistent structural form.  Objective is to reduce the granularity of the data set  Example: one 3-d file sst(lon,lat,time)

23 Format Layer  Data values are not modified  Format transformation only between server and client  The organizational structure of the data is not modified

24 Structure Layer  Data values are not modified  The organizational structure of the data is modified

25 An OPeNDAP Structural Layer Component – The Aggregation Server  Developed by John Caron of Unidata  Is for the aggregation of grids and arrays only  Operates in the Syntactic Structural Level

26

27

28 OPeNDAP - NVODS Status

29 OPeNDAP/NVODS Server Sites OPeNDAP Server Sites

30 OPeNDAP Client and Server Status

31 Special Servers

32 Projects Using OPeNDAP  GODAE (Global Ocean Data Assimilation Experiment)  NOMADS (NOAA Operational Model Archive and Distribution System)  AOIMPS  ESG II - Earth System Grid II  Ocean. US (US-GOOS)  High Altitude Observatory Community

33 Institutions Making Heavy use of OPeNDAP 2  Ingrid - Columbia University  COLA - Center for Ocean-Land-Atmosphere  Goddard DAAC  CDC - Climate Diagnostic Center  PMEL - Pacific Marine Environment Lab

34 OPeNDAP Monthly Accesses (2002) Site/MonthAprilMayJuneJulyAugust URI4,85619,5043,69126,6937,440 LDEO80,70962,93046,09293,08832,084 CDC102,518153,36262,395181,974107,512 JPL3,06834,02863,3098,26013,282 COLA347,506412,991337,310400,314638,376 TOTAL535,589648,787502,797702,069785,412

35 OPeNDAP Unique Users (2002) SiteAprilMayJuneJulyAugust URI7368724469 CDC12410591116111 JPL122152173198174 COLA158199317197408

36 Interesting OPeNDAP Access Statistics IRI data accesses for 1 st quarter of 2002 TypeRequests%Volume (gb)% OPeNDAP191,6118.5375.269.4 Other2,062,68191.5165.230.6 Total2,254,292100.0540.4100.0 PMEL OPeNDAP 2 ~ 35,000 with ~26,000 internal.

37 Lessons (Re)Learned

38 1. Modularity provides for flexibility The more modular the underlying infrastructure the more flexible the system. This is particularly important for network based systems for which the technology, software and hardware, are changing rapidly.

39 Lessons (Re)Learned 2. Data of interest will be stored in a variety of formats. Regardless of how much one might want to define the format to be used by system participants, in the end the data will be stored in a variety of formats. 2a. The same is true of translational use metadata!

40 Lessons Learned 3. Structural representation of sequence data sets is a major obstacle to interoperability Care must be given to the organizational structure (as opposed to the format) of the data. This is the single largest constraint to the use of profile data in NVODS.

41 Lessons (Re)Learned 4. “Not invented here” Avoid the “not invented here” trap. The basic concepts of a data system are relatively straightforward to define. Implementing these concepts ALWAYS involves substantially more work than originally anticipated. The “Devil’s in the details”. Take advantage of existing software wherever possible.

42 Lessons (Re)Learned 5. Work with those who adopt the system for their own needs. Take advantage of those who are interested in contributing to the system because the system addresses their needs as opposed to those who are simply doing the work for the associated funding. => Open source.

43 Lessons Learned 6. There is no well defined funding structure for community based operational systems. It is much easier to obtain funding to develop a system than it is to obtain funding to maintain and evolve a system. This is a major obstacle to development of a stable cyberinfrastructure that meets the needs of the research community.

44 Lessons Learned 7. It is relatively more difficult to obtain funding for applied system development than for research related to data systems. This is another obstacle to the development of cyberinfrastructure that meets the needs of the research community.

45 Lessons (Re)Learned 8. “Tough to teach old dogs new tricks” Introducing new technology often requires a cultural change in usage that is difficult to effect. This can negatively effect system development.

46 Lesser Lessons Learned 9. Some surprises encountered in the NVODS/ OPeNDAP effort  Heavy within organization usage.  Metadata focus in the past is appropriate for interoperability at the data level.  Number of variables increases almost linearly with the number of data sets.  Users will take advantage of all of the flexibility offered by a system sometimes to the disadvantage of all.  Incredible variability in the structural organization of data.

47 Lessons Learned 10. Metrics suggest  Increasing use of scripted requests  Large volume transfers As data systems offering machine-to- machine interoperability with semantic meaning take hold, we could well see an explosive growth in the use of the web.

48 Lessons Learned 11. Time to maturity is order 10 years not 3 Developing new infrastructure takes time, both to iron out all of the %^*% little details and adoption of the infrastructure takes time.

49 Peter’s Law The more metadata required the less data delivered Of course, the less metadata, the harder it is to use the data

50 http://unidata.ucar.edu/packages/dods http://nvods.org


Download ppt "Peter Cornillon Graduate School of Oceanography University of Rhode Island Presented at the NSF Sponsored Cyberinfrastructure Meeting 31 October 2002 OPeNDAP:"

Similar presentations


Ads by Google