Collaboration Tools and Techniques for Large Model Data Sets
  Rich Signell, USGS, Woods Hole, MA
Motivation
  Typical model outputs range from 100 MB to several GB.
  Traditional collaboration method: users grab the whole NetCDF file from your web/ftp site, or you send them a few images.
  There is a better way…
NetCDF
  Machine-independent, self-describing, binary format for multidimensional scientific data
  Interfaces: Fortran, C, C++, Java, Perl, Matlab, IDL, Python
  Free, supported by NSF at Unidata
netcdf swan_short {
dimensions:
    y = 376 ;
    x = 136 ;
    time = UNLIMITED ; // (82 currently)
variables:
    float depth(y, x) ;
        depth:units = "m" ;
        depth:long_name = "water depth" ;
        depth:_FillValue = f ;
        depth:coordinates = "lon lat" ;
    short hsig(time, y, x) ;
        hsig:units = "m" ;
        hsig:long_name = "significant wave height" ;
        hsig:_FillValue = 32767s ;
        hsig:add_offset = 14.5f ;
        hsig:scale_factor = f ;
        hsig:coordinates = "lon lat" ;
    double time(time) ;
        time:units = "days since " ;
        time:long_name = "modified julian day (ROMS-style)" ;
    float lon(y, x) ;
        lon:units = "degrees_east" ;
        lon:long_name = "longitude" ;
    float lat(y, x) ;
        lat:units = "degrees_north" ;
        lat:long_name = "latitude" ;

// global attributes:
        :Conventions = "CF-1.0" ;
        :title = "SWAN driven by 7 km LAMI met model" ;
        :institution = "SACLANT Undersea Research Centre" ;
        :source = "SWAN Wave Model (NRL-SSC OpenMP version 31-Mar-2003)" ;
        :contact = "Rich Signell
}
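The hsig variable in the header above is stored as 16-bit shorts with add_offset and scale_factor attributes, which is the usual netCDF packing convention: physical value = packed value * scale_factor + add_offset. The following is a minimal sketch of that round trip. The 0 to 29 m range is an assumption chosen only because it reproduces the add_offset of 14.5 shown in the header; the actual scale_factor is not given, and the function names are hypothetical.

```python
# Sketch of netCDF-style packing: floats stored as 16-bit shorts plus
# scale_factor and add_offset attributes.
FILL_SHORT = 32767  # matches hsig:_FillValue in the header


def packing_params(vmin, vmax):
    """Return (scale_factor, add_offset) mapping [vmin, vmax] onto shorts.

    Assumption: the range is spread over -32766..32766 so that 32767
    stays free for the fill value.
    """
    add_offset = 0.5 * (vmin + vmax)
    scale_factor = (vmax - vmin) / (2 * 32766)
    return scale_factor, add_offset


def pack(value, scale_factor, add_offset):
    """Convert a physical value to its packed short representation."""
    if value is None:  # missing data
        return FILL_SHORT
    return int(round((value - add_offset) / scale_factor))


def unpack(packed, scale_factor, add_offset):
    """Recover the physical value: packed * scale_factor + add_offset."""
    if packed == FILL_SHORT:
        return None
    return packed * scale_factor + add_offset


# Hypothetical wave-height range of 0 to 29 m (an assumption, see above).
scale, offset = packing_params(0.0, 29.0)
heights = [0.0, 1.37, 14.5, 29.0, None]
packed = [pack(h, scale, offset) for h in heights]
restored = [unpack(p, scale, offset) for p in packed]

for h, p, r in zip(heights, packed, restored):
    print(h, p, r)
```

The price of halving the storage is a quantization error of at most half a scale_factor, which for this assumed range is a fraction of a millimetre.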
      PROGRAM WRITE_NC
C
      INCLUDE 'netcdf.inc'
      INTEGER STATUS, NCID, TIMES
      PARAMETER (TIMES=3, LATS=5, LONS=10)   ! dimension lengths
      INTEGER RHID                           ! variable ID
      DOUBLE PRECISION RHVALS(LONS, LATS, TIMES)
      ...
C     Open the existing file for writing and look up the variable ID
      STATUS = NF_OPEN ('foo.nc', NF_WRITE, NCID)
      STATUS = NF_INQ_VARID (NCID, 'rh', RHID)
C     Fill the array, then write the whole variable in one call
      DO 10 ILON = 1, LONS
        DO 10 ILAT = 1, LATS
          DO 10 ITIME = 1, TIMES
            RHVALS(ILON, ILAT, ITIME) =
   10 CONTINUE
      STATUS = NF_PUT_VAR_DOUBLE (NCID, RHID, RHVALS)
DODS/OpenDAP
  Open Data Access Protocol for delivery of multidimensional scientific data via HTTP
  DODS allows efficient slicing of data via the web, just as NetCDF does for local files. (Putting the “Net” in NetCDF!)
  DODS serves not just NetCDF, but also Matlab and HDF files (as well as GRIB, BUFR, etc.)
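The slicing happens in the URL itself: a DODS request appends a constraint expression to the dataset URL, with one [start:stride:stop] range per dimension. As a rough sketch, the request for a small piece of one variable could be built as follows. The server address and file path are hypothetical, the helper names are invented for illustration, and the .ascii response suffix is assumed to be enabled on the server.

```python
def hyperslab(start, stop, stride=1):
    """Format one dimension of a DAP2 array subset as [start:stride:stop]."""
    if start < 0 or stop < start or stride < 1:
        raise ValueError("need 0 <= start <= stop and stride >= 1")
    return f"[{start}:{stride}:{stop}]"


def subset_url(dataset_url, variable, ranges, response="ascii"):
    """Build a URL asking the server for only part of one variable.

    ranges holds one (start, stop) or (start, stop, stride) tuple per
    dimension; indices are zero-based and stop is inclusive.
    """
    constraint = variable + "".join(hyperslab(*r) for r in ranges)
    return f"{dataset_url}.{response}?{constraint}"


# Hypothetical server and file name, for illustration only.
BASE = "http://example.org/cgi-bin/nph-dods/swan_short.nc"

# First time step of hsig, rows 100-119 and columns 50-69 of the grid.
url = subset_url(BASE, "hsig", [(0, 0), (100, 119), (50, 69)])
print(url)
```

Only the 20 x 20 block named in the constraint crosses the network; the rest of the file stays on the server.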
Accessing DODS data
  DODS APIs (C++, Java)
  Any NetCDF code, relinked with the DODS netCDF library instead of the standard one
    ncdump => dncdump
    ncview => dncview
    Your Fortran, C, C++, Python, Perl, Java code…
DODS & Matlab
  DODS GUI and command line tools
  Relinked mexcdf53.dll, which can enable all Matlab tools that read NetCDF (e.g. the NetCDF/Matlab toolbox):
    >> url='
    >> nc=netcdf(url);
    >> lon=nc{'lon'}(:);
  Google on: “sourceforge” “mexcdf”
DODS/OpenDAP
  Serving DODS data requires almost no effort on the part of the data provider:
  1. Download the DODS server binaries to the cgi-bin directory on the web server
  2. Put your NetCDF files on the web server
  3. Go have a coffee to celebrate!
  (Note: most people don’t know that getting a DODS server going is this easy!)
DODS Success Story
  DODS at sea: in a limited-bandwidth situation, grabbed only the 200 KB OBC (open boundary condition) region instead of the whole 18 MB NetCDF file.
  30-second download instead of 45 minutes!
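The two download times are consistent with a single link speed, which a few lines of arithmetic confirm. This sketch assumes decimal units (18 million bytes and 200 thousand bytes) and a constant transfer rate; neither assumption is stated on the slide.

```python
FULL_FILE_BYTES = 18 * 10**6      # 18 MB NetCDF file
SUBSET_BYTES = 200 * 10**3        # 200 KB open-boundary subset
FULL_DOWNLOAD_SECONDS = 45 * 60   # 45 minutes

# Effective link speed implied by the full-file download time.
bytes_per_second = FULL_FILE_BYTES / FULL_DOWNLOAD_SECONDS

# Time needed for the subset at that same speed.
subset_seconds = SUBSET_BYTES / bytes_per_second

print(f"link speed: {bytes_per_second / 1000:.1f} kB/s")
print(f"subset download: {subset_seconds:.0f} s")
print(f"data reduction: {FULL_FILE_BYTES / SUBSET_BYTES:.0f}x")
```

At roughly 6.7 kB/s the subset takes about 30 seconds, matching the figure quoted on the slide.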
Need for Conventions
  One of the greatest things about NetCDF is that it places few demands on the data provider: they are free to specify whatever attributes they want, or none at all
  This is also one of the worst things, because it makes it hard to develop flexible software
  Software for ROMS won’t work for POM, NCOM, HOPS, ECOM, etc. (and vice versa)
CF Conventions I
  Google: “CF” “ucar”
CF Conventions II
Making ROMS CF-compliant
  Store all information about the grid (lon_u, lat_u, angle) in the .his and .avg files (not just the grid file)
  Add “coordinates” attributes to curvilinear variables (e.g. zeta:coordinates=“lat_rho lon_rho”)
  Add “standard_name=ocean_s_coordinate”
  Make sure dimension names match coordinate variable names (ocean_time, sc_r)
  Units need to be recognized by UDUNITS
NCO I
NCO II
ROMS2CF script
  #!/bin/bash
  GFILE='../adria02_grid2.nc'
  FFILE='adria03_avg.nc'
  ncks -F -d ocean_time,1 $FFILE ${FFILE}_CF
  # Specify horizontal coordinate variables associated with "RHO fields"
  ncatted -O -h -a "coordinates","temp",c,c,"lat_rho lon_rho" ${FFILE}_CF
  ncatted -O -h -a "coordinates","salt",c,c,"lat_rho lon_rho" ${FFILE}_CF
  # Specify horizontal coordinate variables associated with "U fields"
  ncatted -O -h -a "coordinates","u",c,c,"lat_u lon_u" ${FFILE}_CF
  ncatted -O -h -a "coordinates","ubar",c,c,"lat_u lon_u" ${FFILE}_CF
  # Merge the ROMS grid file into the CF file so we
  # have all the coordinate variables we need
  ncks -O -v lon_rho,lat_rho,lon_u,lat_u,lon_v,lat_v,mask_rho,mask_u,mask_v,angle $GFILE $GFILE.tmp
  ncks -A $GFILE.tmp ${FFILE}_CF
  rm $GFILE.tmp
  # Add vertical coordinate info
  ncatted -O -h -a "standard_name","sc_r",c,c,"ocean_s_coordinate" ${FFILE}_CF
  ncatted -O -h -a "positive","sc_r",c,c,"up" ${FFILE}_CF
  ncatted -O -h -a "formula_terms","sc_r",c,c,"s: sc_r eta: zeta depth: h a: theta_s b: theta_b depth_c: hc" ${FFILE}_CF
  # Add data from field file to template
  ncks -A $FFILE ${FFILE}_CF
  # Rename the dimension
  ncrename -O -h -d s_rho,sc_r ${FFILE}_CF

  CF checker: cgi-bin/cf-checker.pl
  Google: “CF” “checker”
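The formula_terms attribute set in the script tells CF-aware software which variables to plug into the ocean_s_coordinate formula from the CF conventions: z = eta*(1+s) + depth_c*s + (depth-depth_c)*C(s), with C(s) = (1-b)*sinh(a*s)/sinh(a) + b*[tanh(a*(s+0.5))/(2*tanh(0.5*a)) - 0.5]. A minimal sketch of the calculation a client would perform follows. The numeric values are hypothetical stand-ins for theta_s, theta_b, hc, zeta and h, and the function names are invented for illustration.

```python
import math


def stretching(s, a, b):
    """C(s) from the CF ocean_s_coordinate definition; s runs from -1 to 0."""
    return ((1.0 - b) * math.sinh(a * s) / math.sinh(a)
            + b * (math.tanh(a * (s + 0.5)) / (2.0 * math.tanh(0.5 * a)) - 0.5))


def level_height(s, eta, depth, a, b, depth_c):
    """Height (m, positive up) of one s-level at one grid point."""
    c = stretching(s, a, b)
    return eta * (1.0 + s) + depth_c * s + (depth - depth_c) * c


# Hypothetical values standing in for theta_s, theta_b, hc, zeta and h.
a, b, depth_c = 5.0, 0.4, 10.0
eta, depth = 0.3, 120.0

# Twenty mid-layer s values between the bottom (-1) and the surface (0).
s_levels = [-1.0 + (k + 0.5) / 20 for k in range(20)]
z = [level_height(s, eta, depth, a, b, depth_c) for s in s_levels]

for s, height in zip(s_levels, z):
    print(f"s = {s:+.3f}   z = {height:9.3f} m")
```

Two limits make a convenient sanity check: s = 0 returns the free surface eta, and s = -1 returns the sea bed at -depth.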
Integrated Data Viewer (IDV)
  Works on local CF-compliant NetCDF files
  Works on THREDDS catalog data
IDV
  Freeware supported by the Unidata Program Center (new app, version 1.2)
  Java, utilizing Java3D and VisAD (VIS5D)
  Runs on Windows, Mac, Solaris (VIS5D is the limitation)
  Reads NetCDF, DODS, ADDE, GeoTIFF, Arc Shapefiles
  Slices, dices, animates
IDV in Action
THREDDS
Recommendations
  Make your model output CF-compliant!
  Distribute your model output via DODS
  Make a THREDDS catalog for DODS data
  Allow “packing” of data for efficient internet delivery (and disk utilization)
  Develop software for CF-compliant data
Abstract
  Collaboration Tools and Techniques for Large Model Data Sets
  Rich Signell, U.S. Geological Survey, Woods Hole, MA, USA
  New tools and standards are emerging that facilitate web-based collaboration with large data sets such as those produced by the ocean model ROMS. Using OpenDAP (a.k.a. DODS), ROMS NetCDF output files can be placed on a web server and users can extract just the data they need (say, the surface temperature from a particular day) from the file without any extra effort by the modeller. This, for example, allows a collaborator to issue a simple command in Matlab that will load just the model output desired from the remote web site into a local Matlab session, avoiding both file format conversion and wasted network bandwidth.
  By linking with the OpenDAP NetCDF library instead of the standard NetCDF library, any NetCDF application can be turned into an OpenDAP application. This approach was used to rebuild the popular Matlab/NetCDF interface “Mexcdf”, so if you get the OpenDAP-enabled version of this interface from the SourceForge MexCDF site, you can use any Matlab/NetCDF application to access OpenDAP data as well.
  If in addition the ROMS NetCDF files are modified to follow the CF Conventions, a set of conventions specifically designed for complex model output (including handling of the ROMS s-coordinate), then public domain software such as Unidata’s Integrated Data Viewer (IDV) will recognize the ROMS output files, and can be used to interactively browse, analyze and visualize the results in 3D. Multiple web users can visualize and manipulate the data interactively through the collaboration facility built into IDV. The conversion to CF-compliant NetCDF can be achieved easily using the NetCDF operator tools (NCO). The NCO tools can also be used to automatically reduce the ROMS output files by a factor of 2 by converting floats to short integers, which have sufficient dynamic range for most variables. This also doubles the speed at which Internet users can obtain their requested data. If the model data provider takes the small additional step of creating a THREDDS catalog (a straightforward XML file) of the CF-compliant ROMS output files, then the model results appear as just another data source to an IDV user. This allows users to browse and create visualizations using model results without knowing that they are using NetCDF.
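To give a sense of how small the THREDDS catalog step is, here is a rough sketch that generates a catalog with one OPeNDAP service and a list of datasets. The element and attribute names follow the THREDDS InvCatalog 1.0 schema as best understood here and should be checked against the schema before use. The catalog name, server base path and second file name are hypothetical, and build_catalog is an invented helper.

```python
import xml.etree.ElementTree as ET

# Namespace of the THREDDS InvCatalog 1.0 schema.
THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"


def build_catalog(name, service_base, datasets):
    """Return catalog XML listing datasets served over OPeNDAP.

    datasets is a list of (title, path) pairs; a client appends each
    path to service_base to form the data URL.
    """
    catalog = ET.Element("catalog", {"xmlns": THREDDS_NS, "name": name})
    ET.SubElement(catalog, "service", {
        "name": "dods", "serviceType": "OPENDAP", "base": service_base})
    for title, path in datasets:
        ET.SubElement(catalog, "dataset", {
            "name": title, "urlPath": path, "serviceName": "dods"})
    return ET.tostring(catalog, encoding="unicode")


# Hypothetical server path and file names, for illustration only.
xml_text = build_catalog(
    "ROMS model output",
    "http://example.org/cgi-bin/nph-dods/",
    [("Adriatic averages", "adria03_avg.nc_CF"),
     ("Adriatic history", "adria03_his.nc_CF")],
)
print(xml_text)
```

Each dataset entry is a single line of XML, so adding a new model run to the catalog amounts to appending one more (title, path) pair.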