ESMPy and OpenClimateGIS: Python Interfaces for High Performance Grid Remapping and Geospatial Dataset Manipulation
Ryan O’Kuinghttons, Ben Koziol, Robert Oehmke, Cecelia DeLuca, Gerhard Theurich, Peggy Li, Joseph Jacob
Cooperative Institute for Research in Environmental Sciences
NOAA Environmental Software Infrastructure and Interoperability Project
European Geosciences Union General Assembly, Vienna, Austria, April 22, 2016
ESMF and ESMPy
The Earth System Modeling Framework (ESMF) is open source software for building modeling components and coupling them together to form weather prediction, climate, coastal, and other applications.
- Provides infrastructure for time management, data communications, metadata and I/O, running models as web services, and grid remapping
- Supports a full Fortran interface and limited C and Python interfaces
ESMF provides a mature, high performance regridding package:
- Transforms data from one grid to another by generating and applying interpolation weights
- Supports structured and unstructured, global and regional, 2D and 3D grids, with many options
- Fully parallel and highly scalable
The Python interface to ESMF (ESMPy) offers access to the regridding functionality and other related features of ESMF.
OCGIS
OpenClimateGIS (OCGIS) is a standalone Python package enabling dynamic access to and manipulation of high resolution climate data.
- Subsetting, coordinate transformations, temporal averaging, and other computations
- Data conversions between CSV, Shapefile, GRIDSPEC, and UGRID
Data conversions between ESMPy and OCGIS bring together GIS capabilities with high performance regridding functionality to create a more unified set of Python tools for Earth system modeling.
One area of interest is connecting high resolution hydrological models with high performance climate models.
ESMPy Overview
[Figure panels: 2D Unstructured Mesh, FIM Unstructured Grid, Regional Grid]
- High performance regridding is applied as a callable Python object
- NumPy array access to distributed data (parallelism for FREE)
- Many regridding methods, including first-order conservative
- Data objects can be created from NetCDF files in standard metadata formats
Supported grids and methods for regridding with ESMPy include:
- Bilinear, higher order patch [1,2], first-order conservative [3], or nearest neighbor regridding
- Global or regional 2D or 3D logically rectangular Grids
- 2D or 3D unstructured Meshes composed of triangles, quadrilaterals, or hexahedrons
- 1D streams of observational data or unconnected sets of points (LocStream)
OpenClimateGIS Overview
Developed by the NESII Group in association with the NCPP Project under funding provided by the NOAA Climate Program Office.
Python package designed to ease the "localization" and accessibility of high-dimensional scientific datasets.
Primary features: geospatial subsetting, standardized calculation, bundling, format conversion, and access to OPeNDAP datasets.
Additional dependencies: GDAL, Shapely, Fiona, netCDF4, osgeo
ESMPy – OCGIS Integration
ESMPy and OCGIS have complementary capabilities:
- OCGIS allows access to and manipulation of high resolution data sets
- ESMPy provides high performance regridding and access to distributed NumPy data
There are several ways to create an integrated workflow:
- OCGIS can preprocess data files and convert between data formats
- The ESMPy Field object is an output format of OCGIS
- ESMPy can read OCGIS outputs (NetCDF) in parallel for high performance regridding
- OCGIS offers serial regridding using ESMPy
Parallel processing requires clever use of the integrated capabilities:
- OCGIS is implemented and used in single processor mode
- ESMPy is fully parallel IF objects are created in parallel
- Conversion between serial and distributed objects is next
Integrated Workflow Example
The ESMF command line application allows parallel regrid weight generation with file-based output in a single step.
1: Preprocess files using OCGIS (subsetting)
2: Read distributed ESMPy objects
3: Compute and apply regridding weights
4: Write parallel object to files for use by downstream applications
[Figure legend: Data file; Object; processor ID. ** Green text indicates steps that can be done in serial or parallel]
Supported Data Conventions
ESMPy grid files use the following standard data file formats:
- Climate and Forecast (CF) grid conventions:
  - UGRID: candidate CF convention for unstructured grids [4], used to represent grids with arbitrary polygons with no gaps
  - GRIDSPEC: accepted CF convention for logically rectangular grids [5]
- SCRIP: Spherical Coordinate Remapping and Interpolation Package [6]; legacy format for 2D logically rectangular or 2D unstructured grids
- ESMF: custom format for unstructured grids, with more efficient storage than SCRIP or CF when used with ESMF codes
OCGIS has a rich set of conversion routines between the following:
- CF grid conventions (above)
- Shapefile: geospatial vector data format used by GIS software [7]
- CSV: comma separated values
Interfaces
ESMPy has objects for data (Field) and underlying distribution (Grid/Mesh):
- Grid: logically rectangular discretization object
    grid = ESMF.Grid(filename="gridspec.nc", filetype=ESMF.FileFormat.GRIDSPEC)
    grid = ESMF.Grid(max_index=numpy.array([7, 8, 9]), coord_sys=ESMF.CoordSys.CART)
- Mesh: unstructured mesh discretization object
    mesh = ESMF.Mesh(filename="ugrid.nc", filetype=ESMF.FileFormat.UGRID)
- Field: data object built on a grid or mesh with an optional mask; a derived type of numpy.ndarray
    field = ESMF.Field(dstgrid, "dstfield", meshloc=ESMF.MeshLoc.ELEMENT, ndbounds=[1, 365, 1])
OCGIS has a very compact interface for a wide range of capabilities:
    ops = ocgis.OcgOperations(dataset=rd, geom=path_ugid_shp,
                              select_ugid=select_ugid, agg_selection=True,
                              prefix='subset_nc', output_format='nc',
                              add_auxiliary_files=False)
Regridding
    r1to2 = Regrid(field1, field2, regrid_method=RegridMethod.CONSERVE)
where: f(phi,theta) = 2 + cos(theta)**2 * cos(2*phi)
Source grid: fv1.9x2.5_ nc - 1.9x2.5 CAM finite volume grid
Destination grid: wr50a_ nc - Regional 205x275 grid
Mean relative error = 3.19E-03
Maximum relative error = 1.93E-02
Conservation error = 7.11E-15
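The three error measures reported for this test can be sketched in plain NumPy. This is an illustrative sketch, not the code used to produce the numbers on the slide: the helper names `analytic_field` and `regrid_errors` are hypothetical, the angles are assumed to be in radians, and the conservation error is assumed to be the relative difference between the area-weighted integrals of the source and regridded fields.

```python
import numpy as np

def analytic_field(phi, theta):
    """Test function from the slide: f = 2 + cos(theta)**2 * cos(2*phi)."""
    return 2.0 + np.cos(theta) ** 2 * np.cos(2.0 * phi)

def regrid_errors(regridded, exact, dst_area, src_field, src_area):
    """Return (mean relative error, max relative error, conservation error).

    Assumption: conservation error is the relative mismatch between the
    area-weighted integrals ("mass") of the source and regridded fields.
    """
    rel_err = np.abs(regridded - exact) / np.abs(exact)
    src_mass = np.sum(src_field * src_area)
    dst_mass = np.sum(regridded * dst_area)
    conservation = abs(dst_mass - src_mass) / abs(src_mass)
    return rel_err.mean(), rel_err.max(), conservation

# Toy usage: identical source and destination cells with unit areas, and a
# "regridded" field that is the exact field perturbed by 0.1 percent.
phi = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
theta = np.linspace(-1.0, 1.0, 8)
exact = analytic_field(phi, theta)
areas = np.ones_like(exact)
regridded = exact * 1.001
mean_err, max_err, cons_err = regrid_errors(regridded, exact, areas, exact, areas)
```

Because the test function is bounded below by 1, the relative error is always well defined.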
Conservative Regridding
Conservative regridding is important in Earth system modeling to preserve the total integral of a field throughout the operation (e.g. water content).
The algorithm used by ESMF computes interpolation weights between cell i on the source grid and cell j on the destination grid using:
    $w_{ij} = f_{ij} \, A_i / A_j$
where $f_{ij}$ is the fraction of the source cell contributing to the destination cell, and $A_i$ and $A_j$ are the relative areas of the source and destination cells.
Options exist for:
- Using internally computed (default) or user supplied areas
- Computing areas and distances using great-circle (default) or straight line distances on the surface of the sphere
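A one-dimensional sketch can make the roles of the overlap fraction and the cell areas concrete. It assumes the weight takes the form w_ij = f_ij * A_i / A_j, with f_ij the fraction of source cell i that overlaps destination cell j; in 1D the "areas" are simply cell lengths. The function name `conservative_weights_1d` is hypothetical, and this is not the ESMF implementation, which works on spherical and 3D cells.

```python
import numpy as np

def conservative_weights_1d(src_edges, dst_edges):
    """First-order conservative weights between two 1D grids.

    Returns weights[j, i] (destination j, source i) together with the source
    and destination cell lengths, which play the role of cell areas.
    """
    src_edges = np.asarray(src_edges, dtype=float)
    dst_edges = np.asarray(dst_edges, dtype=float)
    src_lo, src_hi = src_edges[:-1], src_edges[1:]
    dst_lo, dst_hi = dst_edges[:-1], dst_edges[1:]
    area_src = src_hi - src_lo
    area_dst = dst_hi - dst_lo
    # Length of the overlap between destination cell j and source cell i.
    overlap = np.minimum(dst_hi[:, None], src_hi[None, :]) \
        - np.maximum(dst_lo[:, None], src_lo[None, :])
    overlap = np.clip(overlap, 0.0, None)
    frac = overlap / area_src[None, :]                       # f_ij
    weights = frac * area_src[None, :] / area_dst[:, None]   # f_ij * A_i / A_j
    return weights, area_src, area_dst

# Toy usage: 6 source cells and 4 destination cells covering [0, 1].
src_edges = np.linspace(0.0, 1.0, 7)
dst_edges = np.linspace(0.0, 1.0, 5)
weights, area_src, area_dst = conservative_weights_1d(src_edges, dst_edges)

src_mid = 0.5 * (src_edges[:-1] + src_edges[1:])
src_field = 2.0 + np.cos(2.0 * np.pi * src_mid)
dst_field = weights @ src_field

src_integral = np.sum(src_field * area_src)
dst_integral = np.sum(dst_field * area_dst)
```

When the destination grid fully covers the source grid, as here, the two integrals agree to rounding error, which is the property the conservation error measures.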
Enabling Hydrological Studies
Hydrological impact studies can be improved when forced with data from climate models; hydrological feedbacks can affect climate.
A technology and scale gap exists:
- Many hydrological models have limited scalability, run on desktop computers, and have watershed-sized domains
- Many climate models are highly parallel, run on high performance supercomputers, and have global domains
However, scales are slowly converging (e.g. high resolution climate models, hydrological systems of greater extent). This convergence:
- Provides scientists opportunities to explore new coupled model configurations and modes of coupling
- Provides programmers opportunities to develop tools to handle this coupling interface
High Resolution Data Task: Subset high resolution climate precipitation data to local scale and then regrid to catchment basins Source data: CF formatted precipitation data file for the continental United States on a logically rectangular grid (nldas_met_update.obs.daily.pr.1990.nc) Output: Multi-dimensional precip values (including time) on a subset of catchment basins in region of interest after conservative regridding
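The subsetting step of this task can be illustrated with a plain NumPy bounding-box selection. This is only a stand-in to show the idea: it does not use the OCGIS API, the function name `subset_bbox` is hypothetical, and the coordinates and data below are synthetic rather than taken from the NLDAS file. It assumes 1D latitude and longitude coordinate arrays and data ordered as (time, lat, lon).

```python
import numpy as np

def subset_bbox(lat, lon, data, lat_min, lat_max, lon_min, lon_max):
    """Select the part of a (time, lat, lon) array inside a bounding box."""
    lat_mask = (lat >= lat_min) & (lat <= lat_max)
    lon_mask = (lon >= lon_min) & (lon <= lon_max)
    # Apply the masks one axis at a time to keep the result 3D.
    sub = data[:, lat_mask, :][:, :, lon_mask]
    return lat[lat_mask], lon[lon_mask], sub

# Synthetic 1-degree grid roughly spanning the continental United States.
lat = np.arange(25.0, 53.0, 1.0)      # 28 latitudes
lon = np.arange(-125.0, -66.0, 1.0)   # 59 longitudes
data = np.arange(3 * lat.size * lon.size, dtype=float).reshape(3, lat.size, lon.size)

sub_lat, sub_lon, sub = subset_bbox(lat, lon, data, 38.0, 42.0, -110.0, -105.0)
```

The time dimension is carried through untouched, matching the multi-dimensional output described in the task.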
High Performance Results Conservative regridding result with CONUS NHDPlus catchments using exact solution: Test done on IBM iDataPlex (yellowstone) with 128 and 256 cores Source grid has 2,647,454 elements with up to nodes Weight file generation takes minutes, application takes seconds
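Applying precomputed weights is cheap because it amounts to a sparse matrix-vector product. The sketch below assumes the weights are stored as triplets named `row` (destination index), `col` (source index), and `S` (weight) with 1-based indices, modeled on SCRIP-style weight files; the values are made up for illustration, and the helper `apply_weights` is hypothetical.

```python
import numpy as np

def apply_weights(row, col, S, src_field, n_dst):
    """Apply sparse regridding weights: dst[row] += S * src[col]."""
    row = np.asarray(row) - 1   # convert assumed 1-based indices to 0-based
    col = np.asarray(col) - 1
    dst = np.zeros(n_dst)
    # np.add.at accumulates correctly when a destination index repeats.
    np.add.at(dst, row, np.asarray(S) * src_field[col])
    return dst

# Made-up weights mapping 4 source cells onto 3 destination cells.
row = np.array([1, 1, 2, 2, 3])
col = np.array([1, 2, 2, 3, 4])
S = np.array([0.5, 0.5, 0.25, 0.75, 1.0])
src_field = np.array([10.0, 20.0, 30.0, 40.0])

dst_field = apply_weights(row, col, S, src_field, n_dst=3)
```

The cost scales with the number of nonzero weights rather than with the product of the grid sizes, so the same weights can be reused for every time step at little expense.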
Status and Future Work Both ESMPy and OCGIS are in production and fully supported Upcoming development: Read and write ESMF formatted weight files Write ESMF Fields in parallel Seamless conversions between serial and distributed objects in ESMPy Python 3 support
Requirements, Supported Platforms, Limitations, etc...
Supported Platforms:
- Linux, Darwin, and Cray
- Gfortran
- OpenMP
- Linux, Darwin, Windows
Requirements:
ESMPy:
- Python 2.6, 2.7
- NumPy 1.6.1/2 (ctypes)
- ESMF installation (with NetCDF)
OCGIS (additional dependencies):
- netCDF4
- Shapely
- Fiona
- osgeo
Testing:
- Nightly regression testing
- Travis CI integration
Installation:
- ESMPy: python setup.py build --ESMFMKFILE= install
- OCGIS: python setup.py install
conda install -c conda-forge esmpy ocgis
Selected Users
- UV-CDAT (PCMDI): Ultrascale Visualization Climate Data Analysis Tools
- cfpython (University of Reading): Implementation of the CF data model for reading, writing, and processing of data and metadata
- Iris (Met Office): Python library for visualizing meteorological and oceanographic data sets
- PyFerret (NOAA): Python based interactive visualization and analysis environment
- Community Surface Dynamics Modeling System (CU-Boulder): Tools for hydrological and other surface modeling processes
- OCGIS, climate4impact portal (IS-ENES): Tools for climate modelers to tailor high resolution climate data
- OCGIS, ClimatePipes (Kitware): User-friendly data access, manipulation, analysis, and visualization of community climate models
Contact Us!
Website: https://earthsystemcog.org/projects/esmpy/
References:
1. Khoei S.A., Gharehbaghi A.R., The superconvergent patch recovery technique and data transfer operators in 3D plasticity problems. Finite Elements in Analysis and Design, 43(8).
2. Hung K.C., Gu H., Zong Z., A modified superconvergent patch recovery method and its application to large deformation problems. Finite Elements in Analysis and Design, 40(5-6).
3. D. Ramshaw, Conservative rezoning algorithm for generalized two-dimensional meshes. Journal of Computational Physics, 59.
4. UGRID documentation: accessed Dec. 19.
5. GridSpec whitepaper: https://ice.txcorp.com/trac/modave/wiki/CFProposalGridspec, accessed Dec. 19, 2014.
6. Jones, P.W., SCRIP: A Spherical Coordinate Remapping and Interpolation Package. Los Alamos National Laboratory Software Release LACC.
7. Shapefile whitepaper: http://, accessed Dec. 19, 2014.
Jupyter Notebooks
ESMPy Regridding
Plotting the solution with matplotlib shows error on the order of 10^-7
OCGIS Utilities Tech-Stack ipynb
Implementation Details
ctypes bindings to ESMF
ESMPy is connected to ESMF using ctypes bindings to the C interface.
Allocating NumPy array buffers for memory allocated in ESMF:
    # wrap the ESMF-owned memory as a buffer, then view it as a typed array
    buffer = numpy.core.multiarray.int_asbuffer(
        ctypes.addressof(pointer.contents),
        numpy.dtype(ESMF2PythonType[self.type]).itemsize * size)
    array = numpy.frombuffer(buffer, ESMF2PythonType[self.type])
Interfacing with ctypes:
    _ESMF.ESMC_GridGetCoord.restype = ctypes.POINTER(ctypes.c_void_p)
    # one ndpointer each for exclusiveLBound and exclusiveUBound,
    # so that argtypes matches the six arguments passed below
    _ESMF.ESMC_GridGetCoord.argtypes = [ctypes.c_void_p, ctypes.c_int, ctypes.c_uint,
                                        numpy.ctypeslib.ndpointer(dtype=numpy.int32),
                                        numpy.ctypeslib.ndpointer(dtype=numpy.int32),
                                        ctypes.POINTER(ctypes.c_int)]
    gridCoordPtr = _ESMF.ESMC_GridGetCoord(grid.struct.ptr, coordDim, staggerloc,
                                           exclusiveLBound, exclusiveUBound,
                                           ctypes.byref(lrc))
    # adjust bounds to be 0 based
    exclusiveLBound = exclusiveLBound - 1
Switching between Fortran and C array striding:
    array = numpy.reshape(array, self.size_local[stagger], order='F')
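The idea of viewing library-owned memory as a NumPy array without copying can be tried in isolation. The sketch below is not the ESMPy code: it swaps in `numpy.ctypeslib.as_array` in place of `int_asbuffer`, and a ctypes array stands in for memory allocated by ESMF. All variable names are illustrative.

```python
import ctypes
import numpy as np

n_rows, n_cols = 3, 4
size = n_rows * n_cols

# Stand-in for memory owned by a C/Fortran library: 12 doubles 0.0 .. 11.0.
c_buffer = (ctypes.c_double * size)(*range(size))
pointer = ctypes.cast(c_buffer, ctypes.POINTER(ctypes.c_double))

# View the memory as a flat NumPy array; no data is copied.
flat = np.ctypeslib.as_array(pointer, shape=(size,))

# Reinterpret the flat memory with Fortran (column-major) striding.
array = np.reshape(flat, (n_rows, n_cols), order='F')

# Writing through the NumPy view changes the underlying buffer:
# element [0, 1] sits at flat offset 0 + 1 * n_rows = 3.
array[0, 1] = -1.0
```

Because the array is a view, the owner of the memory must outlive it; here the `c_buffer` object keeps the memory alive.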