Collaboration Tools and Techniques for Large Model Data Sets Rich Signell, USGS Woods Hole, MA.

Presentation transcript:

Collaboration Tools and Techniques for Large Model Data Sets Rich Signell, USGS Woods Hole, MA

Motivation
Typical model outputs range from 100 MB up to several GB.
Traditional collaboration method: users grab the whole NetCDF file from your web/ftp site, or you send them a few images.
There is a better way…

NetCDF
Machine-independent, self-describing, binary format for multidimensional scientific data
Interfaces: Fortran, C, C++, Java, Perl, Matlab, IDL, Python
Free, supported by NSF at Unidata

netcdf swan_short {
dimensions:
    y = 376 ;
    x = 136 ;
    time = UNLIMITED ; // (82 currently)
variables:
    float depth(y, x) ;
        depth:units = "m" ;
        depth:long_name = "water depth" ;
        depth:_FillValue = f ;
        depth:coordinates = "lon lat" ;
    short hsig(time, y, x) ;
        hsig:units = "m" ;
        hsig:long_name = "significant wave height" ;
        hsig:_FillValue = 32767s ;
        hsig:add_offset = 14.5f ;
        hsig:scale_factor = f ;
        hsig:coordinates = "lon lat" ;
    double time(time) ;
        time:units = "days since " ;
        time:long_name = "modified julian day (ROMS-style)" ;
    float lon(y, x) ;
        lon:units = "degrees_east" ;
        lon:long_name = "longitude" ;
    float lat(y, x) ;
        lat:units = "degrees_north" ;
        lat:long_name = "latitude" ;

// global attributes:
        :Conventions = "CF-1.0" ;
        :title = "SWAN driven by 7 km LAMI met model" ;
        :institution = "SACLANT Undersea Research Centre" ;
        :source = "SWAN Wave Model (NRL-SSC OpenMP version 31-Mar-2003)" ;
        :contact = "Rich Signell
}

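      PROGRAM WRITE_NC
c     Write values to the existing variable 'rh' in foo.nc
      INCLUDE 'netcdf.inc'
      INTEGER TIMES, LATS, LONS
      PARAMETER (TIMES=3, LATS=5, LONS=10) ! dimension lengths
      INTEGER STATUS, NCID
      INTEGER RHID                         ! variable ID
      INTEGER ILON, ILAT, ITIME
      DOUBLE PRECISION RHVALS(LONS, LATS, TIMES)
      ...
c     The NF_ routines are functions that return a status code
      STATUS = NF_OPEN ('foo.nc', NF_WRITE, NCID)
      STATUS = NF_INQ_VARID (NCID, 'rh', RHID)
      DO 10 ILON = 1, LONS
      DO 10 ILAT = 1, LATS
      DO 10 ITIME = 1, TIMES
         RHVALS(ILON, ILAT, ITIME) = 0.5D0 ! placeholder value (assumed)
   10 CONTINUE
      STATUS = NF_PUT_VAR_DOUBLE (NCID, RHID, RHVALS)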

DODS/OpenDAP
Open Data Access Protocol for delivery of multidimensional scientific data via HTTP
DODS allows efficient slicing of data via the web, just as NetCDF does for local files. (Putting the “Net” in NetCDF!)
DODS serves not just NetCDF, but also Matlab, HDF (also GRIB, BUFR, etc.)
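
As a rough illustration of the slicing idea above: a DAP request is an ordinary URL with a constraint expression appended. The sketch below builds one in Python. The server address and path are hypothetical; the variable name and shape are borrowed from the swan_short header shown earlier. The `[start:stride:stop]` index syntax and the `.ascii` suffix follow the DAP 2 convention as commonly implemented, so check them against your own server.

```python
def dap_subset_url(base_url, variable, slices, suffix="ascii"):
    """Build a DAP 2 request URL for a hyperslab of one variable.

    slices holds one (start, stride, stop) tuple per dimension,
    using zero-based, inclusive indices.
    """
    hyperslab = "".join(
        f"[{start}:{stride}:{stop}]" for start, stride, stop in slices
    )
    return f"{base_url}.{suffix}?{variable}{hyperslab}"


# Hypothetical server and path, for illustration only.
base = "http://example.org/dods/swan_short.nc"

# First time step of hsig(time, y, x), every 4th point in y and x.
url = dap_subset_url(base, "hsig", [(0, 1, 0), (0, 4, 375), (0, 4, 135)])
print(url)
```

Only the requested subset travels over the network; the file on the server is untouched.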

Accessing DODS data
DODS APIs (C++, Java)
Any NetCDF code, relinked with the DODS netCDF library instead
ncdump => dncdump
ncview => dncview
Your Fortran, C, C++, Python, Perl, Java code…

DODS & Matlab
DODS GUI and command line tools
Relinked mexcdf53.dll, which can enable all Matlab tools that read NetCDF!
(e.g.) NetCDF/Matlab toolbox
>> url=‘
>> nc=netcdf(url);
>> lon=nc{'lon'}(:);
Google on: “sourceforge” “mexcdf”

DODS/OpenDAP
Serving DODS data requires almost no effort on the part of the data provider:
1. Download DODS server binaries to the cgi-bin directory on the web server
2. Put your NetCDF files on the web server
3. Go have a coffee to celebrate!
(Note: most people don’t know that getting a DODS server going is this easy!)

DODS Success Story
DODS at sea: in a limited-bandwidth situation, grabbed only the 200 kB OBC region instead of the whole 18 MB NetCDF file.
30-second download instead of 45 minutes!
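
The two download times quoted above are consistent with each other, which a few lines of arithmetic can confirm. This is only a back-of-the-envelope check; it assumes decimal units (1 MB = 10^6 bytes) and a constant link speed, neither of which is stated in the slide.

```python
full_file_bytes = 18e6      # 18 MB NetCDF file
subset_bytes = 200e3        # 200 kB OBC region
full_download_s = 45 * 60   # 45 minutes, in seconds

# Effective link speed implied by the full-file download time.
rate_bytes_per_s = full_file_bytes / full_download_s

# Time to fetch only the subset at that same speed.
subset_download_s = subset_bytes / rate_bytes_per_s

print(f"implied rate: {rate_bytes_per_s / 1e3:.1f} kB/s")
print(f"subset download: {subset_download_s:.0f} s")
```

The implied link speed is under 7 kB/s, and at that rate the subset takes about 30 seconds, matching the slide.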

Need for Conventions
One of the greatest things about NetCDF is that it places few demands on the data provider: they are free to specify whatever attributes they want, or none at all
This is also one of the worst things, making it hard to develop flexible software
Software for ROMS won’t work for POM, NCOM, HOPS, ECOM, etc. (and vice versa)

CF Conventions I Google: “CF” “ucar”

CF Conventions II

Making ROMS CF-compliant
Store all information about the grid (lon_u, lat_u, angle) in the .his and .avg files (not just the grid file)
Add “coordinates” attributes to curvilinear variables (e.g. zeta:coordinates="lat_rho lon_rho")
Add standard_name="ocean_s_coordinate" to the vertical coordinate variable
Make sure dimension names match coordinate variable names (ocean_time, sc_r)
Units need to be recognized by UDUNITS

NCO I

NCO II

ROMS2CF script
#!/bin/bash
GFILE='../adria02_grid2.nc'
FFILE='adria03_avg.nc'
ncks -F -d ocean_time,1 $FFILE ${FFILE}_CF
# Specify horizontal coordinate variables associated with "RHO fields"
ncatted -O -h -a "coordinates","temp",c,c,"lat_rho lon_rho" ${FFILE}_CF
ncatted -O -h -a "coordinates","salt",c,c,"lat_rho lon_rho" ${FFILE}_CF
# Specify horizontal coordinate variables associated with "U fields"
ncatted -O -h -a "coordinates","u",c,c,"lat_u lon_u" ${FFILE}_CF
ncatted -O -h -a "coordinates","ubar",c,c,"lat_u lon_u" ${FFILE}_CF
# Merge the ROMS grid file into the CF file so we
# have all the coordinate variables we need
ncks -O -v lon_rho,lat_rho,lon_u,lat_u,lon_v,lat_v,mask_rho,mask_u,mask_v,angle $GFILE $GFILE.tmp
ncks -A $GFILE.tmp ${FFILE}_CF
rm $GFILE.tmp
# Add vertical coordinate info
ncatted -O -h -a "standard_name","sc_r",c,c,"ocean_s_coordinate" ${FFILE}_CF
ncatted -O -h -a "positive","sc_r",c,c,"up" ${FFILE}_CF
ncatted -O -h -a "formula_terms","sc_r",c,c,"s: sc_r eta: zeta depth: h a: theta_s b: theta_b depth_c: hc" ${FFILE}_CF
# Add data from field file to template
ncks -A $FFILE ${FFILE}_CF
# rename the dimension
ncrename -O -h -d s_rho,sc_r ${FFILE}_CF
CF checker: cgi-bin/cf-checker.pl
Google: “CF” “checker”
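
The formula_terms attribute set in the script tells CF-aware software how to turn the dimensionless s-coordinate into actual depths. As a sketch of what such software does with those terms, the Python below evaluates the ocean_s_coordinate formula as I understand it from the CF conventions appendix on dimensionless vertical coordinates. The numerical values are made up for illustration and are not taken from any ROMS run.

```python
import math


def stretching(s, a, b):
    """Stretching function C(s) of the CF ocean_s_coordinate definition."""
    return ((1.0 - b) * math.sinh(a * s) / math.sinh(a)
            + b * (math.tanh(a * (s + 0.5)) / (2.0 * math.tanh(0.5 * a)) - 0.5))


def s_to_z(s, eta, depth, a, b, depth_c):
    """Height z (positive up) of an s-level.

    Arguments follow formula_terms: s (sc_r), eta (zeta), depth (h),
    a (theta_s), b (theta_b), depth_c (hc).
    """
    c = stretching(s, a, b)
    return eta * (1.0 + s) + depth_c * s + (depth - depth_c) * c


# Hypothetical values for one grid point.
zeta, h, theta_s, theta_b, hc = 0.5, 100.0, 5.0, 0.4, 10.0

for s in (-1.0, -0.5, 0.0):
    z = s_to_z(s, zeta, h, theta_s, theta_b, hc)
    print(f"s = {s:5.2f}  ->  z = {z:8.3f} m")
```

By construction, s = 0 maps to the free surface (z = zeta) and s = -1 maps to the sea bed (z = -h), which is a quick sanity check for any implementation.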

Integrated Data Viewer (IDV)
Works on local CF-compliant NetCDF files
Works on THREDDS catalog data

IDV
Freeware supported by the Unidata Program Center (new app, version 1.2)
Java, utilizing Java3D and VisAD (VIS5D)
Runs on Windows, Mac, Solaris (VIS5D is limitation)
Reads NetCDF, DODS, ADDE, GeoTiff, Arc Shapefiles
Slices, dices, animates

IDV in Action

THREDDS
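
A THREDDS catalog is an XML file that lists datasets and the services through which they can be reached. The sketch below generates a minimal one with Python's standard library. The element and attribute names follow my reading of the THREDDS InvCatalog 1.0 schema; the service base, dataset name and path are hypothetical, so treat the result as a starting point to validate against your own server rather than a finished catalog.

```python
import xml.etree.ElementTree as ET

# Namespace of the THREDDS inventory catalog schema, version 1.0.
NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"


def make_catalog(catalog_name, service_base, datasets):
    """Return catalog XML listing (name, url_path) pairs under one OPeNDAP service."""
    ET.register_namespace("", NS)
    catalog = ET.Element(f"{{{NS}}}catalog", {"name": catalog_name})
    ET.SubElement(
        catalog,
        f"{{{NS}}}service",
        {"name": "dods", "serviceType": "OPENDAP", "base": service_base},
    )
    for name, url_path in datasets:
        dataset = ET.SubElement(
            catalog, f"{{{NS}}}dataset", {"name": name, "urlPath": url_path}
        )
        # Tie the dataset to the service declared above.
        ET.SubElement(dataset, f"{{{NS}}}serviceName").text = "dods"
    return ET.tostring(catalog, encoding="unicode")


# Hypothetical service base and dataset path.
catalog_xml = make_catalog(
    "ROMS model output",
    "http://example.org/dods/",
    [("Adriatic averages (CF)", "roms/adria03_avg_cf.nc")],
)
print(catalog_xml)
```

A client resolves each dataset by appending its urlPath to the service base.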

Recommendations
Make your model output CF-compliant!
Distribute your model output via DODS
Make a THREDDS catalog for DODS data
Allow “packing” of data for efficient internet delivery (and disk utilization)
Develop software for CF-compliant data
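
To make the packing recommendation concrete, here is a sketch of how 4-byte floats can be stored as 2-byte shorts using the scale_factor and add_offset attributes, as the hsig variable in the swan_short header does. The 0 to 29 m range is an assumption, chosen only because it gives an add_offset of 14.5 like that header; the helper names are illustrative and do not come from NCO or the netCDF library.

```python
from array import array

FILL = 32767  # same value as hsig:_FillValue in the swan_short header


def packing_params(vmin, vmax):
    """Scale and offset mapping [vmin, vmax] onto codes -32766..32766."""
    max_code = FILL - 1  # keep 32767 free for missing data
    scale_factor = (vmax - vmin) / (2 * max_code)
    add_offset = (vmax + vmin) / 2
    return scale_factor, add_offset


def pack(value, scale_factor, add_offset):
    """Float (or None for missing) to a 16-bit integer code."""
    if value is None:
        return FILL
    return round((value - add_offset) / scale_factor)


def unpack(code, scale_factor, add_offset):
    """16-bit integer code back to a float (None for missing)."""
    if code == FILL:
        return None
    return code * scale_factor + add_offset


scale, offset = packing_params(0.0, 29.0)  # assumed range of wave heights, m
heights = [0.0, 1.234, 14.5, 29.0, None]
codes = array("h", (pack(v, scale, offset) for v in heights))
restored = [unpack(c, scale, offset) for c in codes]

print("add_offset:", offset)
print("worst-case rounding error (m):", scale / 2)
print("bytes per value, float vs short:", array("f").itemsize, codes.itemsize)
```

The halving of storage comes at the cost of a bounded rounding error of half a scale step, which for this assumed range is well under a millimetre.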

Abstract
Collaboration Tools and Techniques for Large Model Data Sets
Rich Signell, U.S. Geological Survey, Woods Hole, MA, USA
New tools and standards are emerging that facilitate web-based collaboration with large data sets such as those produced by the ocean model ROMS. Using OpenDAP (a.k.a. DODS), ROMS NetCDF output files can be placed on a web server and users can extract just the data they need (say, the surface temperature from a particular day) from the file without any extra effort by the modeller. This, for example, allows a collaborator to issue a simple command in Matlab that loads just the desired model output from the remote web site into a local Matlab session, avoiding file format conversion and wasted network bandwidth. By linking with the OpenDAP NetCDF library instead of the standard NetCDF library, any NetCDF application can be turned into an OpenDAP application. This approach was used to rebuild the popular Matlab/NetCDF interface “Mexcdf”, so if you get the OpenDAP-enabled version of this interface from the SourceForge MexCDF site, you can use any Matlab/NetCDF application to access OpenDAP data as well.
If in addition the ROMS NetCDF files are modified to follow the CF Conventions, a set of conventions specifically designed for complex model output (including handling of the ROMS s-coordinate), then public domain software such as Unidata’s Integrated Data Viewer (IDV) will recognize the ROMS output files, and can be used to interactively browse, analyze and visualize the results in 3D. Multiple web users can visualize and manipulate the data interactively through the collaboration facility built into IDV. The conversion to CF-compliant NetCDF can be achieved easily using the NetCDF operator tools (NCO). The NCO tools can also be used to automatically reduce the size of the ROMS output files by a factor of 2 by converting floats to short integers, which have sufficient dynamic range for most variables. This also doubles the speed at which Internet users can obtain their requested data.
If the model data provider takes the small additional step of creating a THREDDS catalog (a straightforward XML file) of the CF-compliant ROMS output files, then the model results appear as just another data source to an IDV user. This allows users to browse and create visualizations using model results without knowing that they are using NetCDF.