HDF5 OPeNDAP Project. MuQun Yang and Hyo-Kyung Lee (THG), James Gallagher (OPeNDAP, Inc.)
Today I’ll present the HDF5 OPeNDAP project, a joint project between our group and James Gallagher at OPeNDAP, Inc. I’ll start with an introduction to OPeNDAP and then show how HDF5 is served through OPeNDAP.
Question of the Day. How can I get one sub from SUBWAY? (Chicken teriyaki with no onion.) How can I get one subset of an EOS data image from NASA? (Today’s satellite image right above the roof.)
The problem we address in this project is similar to this one: “How can I get one sandwich from Subway?” For example, a foot-long chicken teriyaki sandwich with no onion. I believe most of you find this problem trivial. But how about this question: “How can I get one subset of an EOS data image from NASA?” For example, today’s satellite image of the area right above the roof of this building.
Subway. How can I get one sub from SUBWAY? Easy: anywhere / accepts Visa|MC. Efficient: your own recipe. Cool: eat fresh and look good.
Everyone thinks the first question is easy, for the following reasons. Subway is everywhere; some Subway restaurants are even located inside gas stations and Walmarts. At Subway you can also build your own sub efficiently by selection: you pick the size of the bread and what to include or leave out. As a bonus, Subway gives you a way to eat fresh vegetables and look good.
Dumb-way. How can I get one subset of an EOS data image from NASA? Difficult: bring a USB drive and visit NASA. Inefficient: download 10 GB and search for a 10-byte dataset. Not cool: wrong or useless dataset.
What about the answers to the second question? Here are some really dumb ways of getting a subset of data from NASA. First, you can bring a USB drive and visit a NASA center near you. This is an extremely difficult way to get data unless you have complete access to NASA and a center is conveniently located nearby. A second way is better than the first but very inefficient: you download one 10 GB HDF5 file via FTP and search it for the 10 bytes you care about most. Downloading an entire 10 GB HDF5 file for the 10 bytes of data you need is like going to Niagara Falls to get a cup of water. In both cases, the worst scenario is that you end up finding that the data is either empty or corrupted because of a satellite communication failure. Is there a better way to get what you want?
DAP-way! How can I get one subset of an EOS data image from NASA? Easy: anywhere / accepts IE|FF. Efficient: WYSIWYG – less fat! Cool: visualization clients – see fresh!
Yes, there is, and it’s called the DAP-way. DAP is short for Data Access Protocol. In the DAP-way, getting data is like ordering a pizza online. It is also quite efficient: what you want is what you get. You can browse the table of contents before you pick the data. And since you retrieve only the small dataset you want, you can display it with nice visualization clients as a bonus. Imagine trying to wrap all the food stored in a local Subway restaurant in one tiny sandwich wrapper: it cannot be done. When the data you retrieve is small, a visualization client can wrap it nicely. You can also see your data instantly and tell whether it is useful just by looking at it.
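To make “what you want is what you get” concrete, here is a minimal sketch of how a subset request can be written as a DAP2 URL, where a hyperslab is given per dimension as [start:stride:stop]. The server address, file name, and variable name below are made up for illustration; only the URL pattern is the point.

```python
def subset_url(dataset_url, variable, slices, response=".ascii"):
    """Build a DAP2 request URL for a hyperslab of one variable.

    slices holds one (start, stride, stop) tuple per dimension;
    in DAP2 the stop index is inclusive.
    """
    hyperslab = "".join(
        f"[{start}:{stride}:{stop}]" for start, stride, stop in slices
    )
    return f"{dataset_url}{response}?{variable}{hyperslab}"


# Hypothetical server, file, and variable names, used only for illustration.
url = subset_url(
    "http://example.org/opendap/ozone.h5",
    "O3",
    [(0, 1, 0), (10, 1, 19), (20, 1, 29)],
)
print(url)
# http://example.org/opendap/ozone.h5.ascii?O3[0:1:0][10:1:19][20:1:29]
```

A request like this asks for a 1 x 10 x 10 block instead of the whole file, which is the difference between the cup of water and Niagara Falls.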
DAP-way Implementation. OPeNDAP: free DAP-way software. Server-client model. APIs. Widely used by the Earth Science community.
Now let’s see how the DAP-way idea can be implemented. DAP is just a protocol, like FTP, and OPeNDAP is an open and free software implementation of it. OPeNDAP follows a server-client model and provides APIs for both sides. Its biggest advantage is that it is very popular in the Earth Science community.
OPeNDAP server sites. OPeNDAP Market Dominance: http://www.opendap.org (Courtesy of OPeNDAP, Inc.)
This is a map of OPeNDAP server sites that provide scientific data the DAP-way. Although there are not as many as Subway restaurants, we can see plenty of sites from the East Coast to Hawaii.
OPeNDAP Servers. Matlab, HDF4, JDBC, FreeForm, FITS, CDF, CEDAR, netCDF, DSP, JGOFS, CODAR, ESML; grouped on the slide under Tables, SQL, Flat Binary, and General data. (Courtesy of OPeNDAP, Inc.)
Here is a list of scientific data file formats that can be served the DAP-way.
OPeNDAP Clients. netCDF C client, netCDF Java client, IDV, Ferret, GrADS, VisAD, ncBrowse, Matlab, Excel, IDL, Access. (Courtesy of OPeNDAP, Inc.)
And here is a list of visualization clients that can be used with OPeNDAP.
How OPeNDAP works. Diagram: a NetCDF file behind an OPeNDAP NetCDF server and an HDF4 file behind an OPeNDAP HDF4 server, both serving one OPeNDAP client.
Here’s how OPeNDAP works. The first thing to note is that OPeNDAP servers exist for different scientific data file formats such as NetCDF and HDF4. What users actually see through OPeNDAP clients is only the DAP representation of the data. They don’t have to know how NetCDF or HDF4 stores the data internally.
Architecture of OPeNDAP Server. Server 3 (CGI): C/C++ based HTTP server; insecure / inflexible. Server 4 (Hyrax): Java based Tomcat servlet engine; secure / flexible.
Let’s take a look inside the OPeNDAP server, since this will help us understand how OPeNDAP works. There are two types of server: Server 3 is the old one and Server 4 is the new one. Server 3 is similar to a traditional C/C++ based web server. It will no longer be supported because it is insecure and inflexible.
Server 4 (Hyrax) Architecture. Server 3 (CGI) architecture: HTTP server. Server 4 (Hyrax) architecture: OLFS (Java servlet engine), BES commands, BES (Unix daemon), XML-encapsulated object. Output: DAP, HTML; optional catalogs: XML, GIS, KML. Data store: file system with data files like HDF, NetCDF and SQL Database, … http://docs.opendap.org/index.php/Hyrax (Courtesy of OPeNDAP, Inc.)
Let’s highlight the evolution of the OPeNDAP server. In the old server, a single server does everything: it reads the scientific data and delivers the result in HTML or DAP. In the new server, that role is split in two, a front end and a back end. OLFS is the OPeNDAP Lightweight Front-end Servlet and BES is the Back-End Server. Splitting one server into two makes it more secure and boosts its performance. Also, because the presentation layer is separated from data retrieval, the final response can be formatted more easily through catalogs such as THREDDS (Thematic Realtime Environmental Distributed Data Services). This can help produce output that Google Maps or other GIS tools understand. OPeNDAP’s goal is to support as many clients as possible through this catalog method.
Example Usage. Diagram: an HDF4 file behind an OPeNDAP HDF4 server, viewed through an OPeNDAP client and an OPeNDAP visualization client. Views: Syntactic Structure of Data, Semantic Meaning of Data, Actual Content of Data.
Here’s an example of using OPeNDAP. First, here is an HDF4 file that you want people to be able to see. Once the OPeNDAP HDF4 server is installed and running, this file becomes available worldwide through the Data Access Protocol, and you can view its content with a standard web browser. The Data Access Protocol provides three ways of viewing data. First, you can view the syntactic structure of the data. Second, you can view the semantic meaning of the data. Finally, you can view the actual content of the data you want to retrieve.
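In DAP2 these three views correspond to three responses, each requested by appending a suffix to the dataset URL: .dds for the structure, .das for the attributes, and .dods for the binary data. The sketch below only builds the three URLs; the host and file name are placeholders, not a real server.

```python
# DAP2 response suffixes appended to a dataset URL.
DAP2_VIEWS = {
    "syntactic structure (DDS)": ".dds",
    "semantic meaning (DAS)": ".das",
    "actual content (binary data)": ".dods",
}


def view_urls(dataset_url):
    """Return one request URL per DAP2 view of the same dataset."""
    return {view: dataset_url + suffix for view, suffix in DAP2_VIEWS.items()}


# Placeholder host and file name, for illustration only.
for view, url in view_urls("http://example.org/opendap/sample.hdf").items():
    print(f"{view}: {url}")
```

Typing the .dds or .das URL into a web browser is enough to see the first two views as plain text.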
What about HDF5? Diagram: NetCDF, HDF4, and HDF5 files, each behind its own OPeNDAP server (NetCDF, HDF4, HDF5), serving one OPeNDAP client.
So far we’ve seen how OPeNDAP makes data access easy. So, what about HDF5? Let there be an OPeNDAP HDF5 server: that is what we’ve been working on since 2007.
Why Important? HDF5: NASA EOS / NPOESS. NASA’s mission: access! Our mission: build a DAP-way bridge.
Why is the HDF5 part so important? The main reason is that new satellite data is being stored in HDF5 and will continue to be in the future. One of NASA’s most important missions is to provide full access to its huge data centers. So our mission is to bridge the gap between data providers and data users through OPeNDAP.
Goals. 1st: Transform HDF5 access in DAP-way. 2nd: Yet preserve the beauty of HDF5.
So, here are our project goals. First, we want to provide a DAP-way to access HDF5 data held in remote locations like NASA. Second, we want to lose as little as possible in this transformation.
OPeNDAP HDF5 Server. Goal: Transform HDF5 access in DAP-way. Map HDF5 datatypes in DAP-way. Plus, tweaks for OPeNDAP clients. Plus, tweaks for HDF-EOS files.
We built the HDF5 server to provide a standard DAP-way of accessing HDF5. In addition, we did some extra work for certain OPeNDAP clients and for HDF-EOS files. In the next few slides, we’ll see why such tweaks are necessary.
Mapping HDF5 in DAP-way. Prototype server in 2001. NASA grant from Nov. 2006. Support for compound datatype. Support for group. Support for references / links. First product release in Mar. 2008.
Subway Customers. I want a 3-inch-long sub. I need Wasabi sauce. I eat only wheat bread. I eat only meatballs.
In business, the customer is king and knowing your customers is a key to success. Back to the Subway example: imagine a customer who makes requests like these. What are you going to do? There’s no choice but to cut the bread and bring the special sauce the customer wants.
OPeNDAP Clients. Not all OPeNDAP clients are created equal! I hate foot-long variable names. I need special attributes on datasets. I care about the pre-defined Grid data type. I care only about well-formed attributes.
The same pattern shows up with DAP clients. Some clients are very picky about what a DAP server provides. They may even ask for something that HDF5 doesn’t have among its attributes, like asking for A-1 sauce. They simply reject things that the standard DAP protocol allows.
Tweaks for OPeNDAP Clients. Two configuration options: --enable-short-name = cut bread; --enable-CF = put Wasabi sauce.
Thus, we provide some configuration options at installation time. They can make the pickiest clients happy. However, enabling these options is risky, because the server may then be unable to serve some HDF5 datasets.
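The talk does not say how --enable-short-name actually shortens names, so the following is only a hypothetical scheme to show the kind of work involved: keep the last component of the HDF5 path, cut it to a length limit, and append an index so that two variables with the same leaf name stay distinct. The function name, the 15-character limit, and the sample paths are all assumptions.

```python
def shorten_names(full_paths, max_len=15):
    """Map each full HDF5 path to a short, unique name (hypothetical scheme).

    The numeric suffix is the position of the path in the input list,
    which keeps names unique even when leaf names collide.
    """
    short_names = {}
    for index, path in enumerate(full_paths):
        leaf = path.rstrip("/").split("/")[-1]
        suffix = f"_{index}"
        short_names[path] = leaf[: max_len - len(suffix)] + suffix
    return short_names


# Made-up paths in the style of an HDF-EOS5 file.
paths = [
    "/HDFEOS/GRIDS/OzoneTotalColumn/Data Fields/ColumnAmountO3",
    "/HDFEOS/SWATHS/OzoneProfile/Data Fields/ColumnAmountO3",
]
for path, name in shorten_names(paths).items():
    print(f"{name}  <-  {path}")
```

Any scheme of this kind throws away part of the original name, which is one way to see why turning such an option on is a trade-off rather than a free fix.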
Subway Customers. I want a 3-inch-long sub. I need Wasabi sauce. I eat only wheat bread. I eat only meatballs.
So, by providing options, we can handle the odd requirements that clients impose. Going back to the Subway customer scenario, the bottom two requests do not seem to pose any problem.
Subway Suppliers Problem. Subway customers: OK (I eat only wheat bread; I eat only meatballs). Subway suppliers: problem. No bread, only wheat! No meatballs, only meat!
However, what if Subway’s suppliers have problems in their factories and can’t provide what Subway wants? Then we’re in big trouble.
HDF-EOS. Not all HDF5 files are created for DAP-way! Grid with no dimension data; clients expect Grid with dimension data. Metadata attribute in two strings; clients expect one structured format.
This kind of situation arose with NASA’s HDF-EOS files. The datasets in the HDF-EOS files that NASA produces cannot be served directly the DAP-way. HDF5 data producers like NASA don’t have to keep DAP in mind, so they can create HDF5 files any way they want.
Tweaks for HDF-EOS. Two more configuration options: --enable-eos-grid = bake bread; --enable-eos-meta = make balls.
Thus, we believe it’s our job to turn the raw HDF5 data into a form that clients will like. --enable-eos-grid processes the raw data into a Grid that clients can consume easily. --enable-eos-meta chops the long metadata string into a better format that clients can handle.
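A rough sketch of what these two tweaks involve, under stated assumptions: that the missing dimension data can be generated as evenly spaced cell centers between the grid edges, and that the metadata is ODL-like text (GROUP=... / END_GROUP=... with KEY=VALUE lines) split across two strings. The function names and sample values are invented for illustration and are not the server’s actual code.

```python
def dimension_values(first_edge, last_edge, size):
    """Generate cell-center coordinates for a grid dimension that has none."""
    step = (last_edge - first_edge) / size
    return [first_edge + step * (i + 0.5) for i in range(size)]


def parse_metadata(pieces):
    """Join metadata string pieces and parse ODL-like text into nested dicts."""
    root = {}
    stack = [root]
    for raw_line in "".join(pieces).splitlines():
        line = raw_line.strip()
        if not line or line == "END":
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip().strip('"')
        if key in ("GROUP", "OBJECT"):
            child = {}
            stack[-1][value] = child
            stack.append(child)
        elif key in ("END_GROUP", "END_OBJECT"):
            stack.pop()
        else:
            stack[-1][key] = value
    return root


# Invented sample: one metadata text stored as two strings.
pieces = [
    'GROUP=GridStructure\n  GROUP=GRID_1\n    GridName="Ozone"\n    XDim=4\n',
    "    YDim=2\n  END_GROUP=GRID_1\nEND_GROUP=GridStructure\nEND\n",
]
grid = parse_metadata(pieces)["GridStructure"]["GRID_1"]
lon = dimension_values(-180.0, 180.0, int(grid["XDim"]))
lat = dimension_values(90.0, -90.0, int(grid["YDim"]))
print(lon)  # [-135.0, -45.0, 45.0, 135.0]
print(lat)  # [45.0, -45.0]
```

The point of the sketch is the order of work: the structured metadata has to be readable first, because the grid sizes needed to build the dimension data come out of it.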
Day After Server Tweaks. Finally, happy clients! ncBrowse, Ferret, MATLAB, NCL, IDV, GrADS, ODC.
After applying the tweaks for clients and for HDF-EOS, our server can serve many clients. Here’s a live demo and a movie.
Live Demo. Ozone concentration level over the South Pole.
I’m going to do a live demo with one visualization client. This movie was generated from real NASA data.
Goals. 1st: Transform HDF5 access in DAP-way. 2nd: Yet preserve the beauty of HDF5.
So we have achieved the first goal. Achieving it has some side effects, which bring us to another challenge: preservation.
HDF5 Served in DAP-way (Yet Preserve the beauty of HDF5). There are some things money can’t buy. For everything else, there is MasterCard™. There are some things DAP can’t serve (minimize). For everything else, there is _________? (maximize)
If you’ve ever watched the “priceless” MasterCard commercial, you’ve heard this famous phrase: “There are ….” We found a similar situation when we tried to serve HDF5 the DAP-way. When HDF5 is served through DAP, it is inevitable that some details of the original HDF5 file are lost, simply because DAP has no way to represent them. However, we think it would be great if we could minimize such losses and find a clever solution, like MasterCard, that covers and delivers as many HDF5 features as possible through DAP.
Some Things (that OPeNDAP HDF5 server can’t serve). Hard: Opaque, Bitmap, Enum, 64-bit integer, variable-length types. Illegal: reserved characters in DAP are used in dataset/group names in HDF5.
Here are some limitations of the current server. Certain types such as Opaque, Bitmap, and Enum are not available in the current DAP protocol, so it’s impossible to map them directly. Another interesting case is that some characters are reserved in DAP yet perfectly legal in HDF5 names. To handle these cases, the DAP protocol itself could be changed to add new types, or names could use a special hex encoding of characters.
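As an illustration of the hex-encoding idea, the sketch below replaces every character outside an allowed set with %XX, so the original HDF5 name can be recovered exactly on the client side. The allowed set used here (letters, digits, underscore) is an assumption for the example and not the set defined by DAP.

```python
import re
from urllib.parse import unquote


def escape_name(name):
    """Replace each character outside [A-Za-z0-9_] with %XX hex codes."""
    def to_hex(match):
        return "".join(f"%{byte:02x}" for byte in match.group().encode("utf-8"))

    return re.sub(r"[^A-Za-z0-9_]", to_hex, name)


def unescape_name(escaped):
    """Recover the original name from its %XX-encoded form."""
    return unquote(escaped)


original = "Data Fields/O3[col]"
escaped = escape_name(original)
print(escaped)                 # Data%20Fields%2fO3%5bcol%5d
print(unescape_name(escaped))  # Data Fields/O3[col]
```

Because the percent sign itself is also escaped, the encoding is reversible for any name.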
For Everything Else. Let there be an HDF5-friendly OPeNDAP client library! Package and deliver in DAP-way first. Then, let the client library handle it. Concept of Trojan horse.
However, it’s hard to modify the protocol itself because it is a standard, so this is our current solution for everything else in HDF5. We’d like to build a special HDF5-friendly OPeNDAP client library dedicated to our OPeNDAP HDF5 server. The concept is quite simple: we package and deliver those HDF5 features the DAP-way and let the client library handle them.
Example: Group in HDF5. Traditional OPeNDAP client library: It’s an attribute that I don’t understand. I’ll ignore it. HDF5-friendly OPeNDAP client library: I was waiting for this key attribute to re-construct HDF5.
Here’s one really good example. In DAP, an attribute plays a role like a comment in a computer program. DAP has no concept of a group, but groups can play an important role in HDF5, such as disambiguating variables that share the same name. Our server sends the entire group structure as a single attribute in DAP. Thus, if a user wants to reconstruct the HDF5 file using a DAP client, this group attribute is essential, and the HDF5-friendly OPeNDAP client library should handle it properly.
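Here is a small sketch of what handling the group attribute could look like on the client side. The format of the attribute is not given in this talk, so the sketch assumes the simplest possible one, a list of full HDF5 paths, and rebuilds the group tree from it. The function name and sample paths are hypothetical.

```python
def rebuild_groups(paths):
    """Rebuild a nested group tree from a flat list of full HDF5 paths."""
    tree = {}
    for path in paths:
        node = tree
        for part in path.strip("/").split("/"):
            node = node.setdefault(part, {})
    return tree


# Two variables share the name "temperature"; only the group tells them apart.
attribute_value = ["/surface/temperature", "/upper_air/temperature"]
print(rebuild_groups(attribute_value))
# {'surface': {'temperature': {}}, 'upper_air': {'temperature': {}}}
```

A traditional client that ignores the attribute sees two variables with nothing to distinguish them, while a client that reads it can put each one back in its group.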
Example: Reference in HDF5. Object / regional reference. Map to DAP URL at server. No de-referencing of URL at client library. Important for NPOESS.
Another good example is the reference in HDF5. There are two types of reference in HDF5, object and regional. Both can be mapped to a special DAP data type called URL. However, the current OPeNDAP client library doesn’t support de-referencing the URL. That is, there is no way for an OPeNDAP client to access the dataset that the URL points to. This is particularly important for NPOESS data, since it uses a great many HDF5 references.
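To show what the first step of de-referencing might look like in a client library, the sketch below splits a URL value into the dataset address, the target variable, and an optional region. It assumes the server writes references as a dataset URL followed by a DAP2-style constraint; the exact form the server emits is not described here, so treat the URL layout and the function name as hypothetical.

```python
def split_reference_url(url):
    """Split a reference URL into (dataset, variable, region slices).

    An object reference has no brackets; a regional reference carries one
    [start:stride:stop] group per dimension.
    """
    dataset, _, constraint = url.partition("?")
    variable, bracket, rest = constraint.partition("[")
    slices = []
    if bracket:
        for part in rest.rstrip("]").split("]["):
            start, stride, stop = (int(number) for number in part.split(":"))
            slices.append((start, stride, stop))
    return dataset, variable, slices


# Made-up reference values for illustration.
print(split_reference_url("http://example.org/opendap/f.h5?/g/temp"))
# ('http://example.org/opendap/f.h5', '/g/temp', [])
print(split_reference_url("http://example.org/opendap/f.h5?/g/temp[0:1:4][2:1:3]"))
# ('http://example.org/opendap/f.h5', '/g/temp', [(0, 1, 4), (2, 1, 3)])
```

With the pieces separated, a client library could issue an ordinary DAP request for the referenced variable or region, which is the step that is missing today.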
HDF5-Friendly OPeNDAP Client Library. One more reason: tame clients. Only ODC can display Swath properly. No latitude and longitude. Easy but evil OPeNDAP NC client library! Easy: nc style API. Evil: fixed dimensional attributes. Our client library must be easy but good. (Courtesy of NASA)
There’s another reason we want to pursue an HDF5-friendly OPeNDAP client library. A swath is a 3-D scan of a very small region of the earth. Earth scientists are less interested in seeing 2-D Grid data than in seeing a vertical profile of a swath like this one. However, among the six OPeNDAP clients we tried, only one could display a vertical profile of swath data properly. The main reason is that many DAP clients do not use the generic OPeNDAP client library. They use the well-established NetCDF-style API of the OPeNDAP client library, which imposes extra, unnecessary restrictions on top of the DAP protocol. Visualization tool developers used it because it is quick and easy to adapt to their existing software. Thus, our goal is to build a client library that is similar to the NetCDF API yet doesn’t impose any such restrictions.
NC-Friendly OPeNDAP Client Library vs. HDF5-Friendly OPeNDAP Client Library. Diagram labels: View G/R/Swath ???; View NetCDF; View HDF5; Group/Ref/Swath; NetCDF; HDF5; GrADS; OPeNDAP HDF5 Server; OPeNDAP NC Server; liboc-dap; libnc-dap; dapserver; libdap.
We want to provide a more generic OPeNDAP client library.
Summary. DAP-way access of HDF5: is easy / efficient / cool; loses some things; requires an HDF5-friendly OPeNDAP client library. HDF5-friendly OPeNDAP client library: serves HDF5 better; tames evil visualization clients.
Here’s a summary of our project. Providing DAP-way access to HDF5 is easy, efficient, and cool. However, it can lose some of the information in the original HDF5 file. Fixing this requires either modifying DAP or creating an HDF5-friendly client library. Since it’s hard to modify the well-established DAP, we think it’s easier to implement an HDF5-friendly OPeNDAP client library. When it’s done, we believe it will serve HDF5 better and tame the visualization clients that lean too heavily on the convenience of the NetCDF-friendly OPeNDAP client library.
Future Work. HDF5 DAP mapping document. Finish HDF5-friendly OPeNDAP client library prototype. Test it on GrADS and display Swath.
Here’s our future work. First, we’ll write a detailed document on the mapping between HDF5 and DAP. Second, we’ll finish the HDF5 client library prototype and test it with one client. It will be only a prototype because of limited funding. We’ll replace the part of the GrADS source code that refers to libnc-dap so that GrADS can display a swath as ODC does. According to people at NASA, GrADS is the most popular client. It is also open source software.
Project Website. http://hdfdap.hdfgroup.uiuc.edu/joomla Beta preview. Feedback is more than welcome!
This is our new project website. Everyone is welcome to visit it as a sneak preview and give us feedback.
Credits. Mike Folk (THG); Robert McGrath (NCSA); Peter Leonard, Daniel Kahn, Marghi Hopkins (ADNET); Christopher Lynnes, James Johnson, Denis Nadeau (NASA); Jennifer Adams (GrADS); Dave Brown (UCAR).
We’d like to thank these people. They pointed us in the right direction during development, gave us early access to data files, and provided a great deal of feedback.