Published by Oliver Eriksen
1
OpenClimateGIS: A Python Library for Geospatial Manipulations of Climate Datasets
Ben Koziol (NESII/CIRES/NOAA-ESRL/GSD)
February 2018

Hello. My name is Ben Koziol. Today I’ll be giving an overview presentation on OpenClimateGIS. I’d like to quickly thank Tom Landry and the PAVICS/Birdhouse teams for inviting me to speak today. The infrastructure being developed as part of this joint collaboration is very valuable to the geosciences community and will continue to provide an important service as climate change adaptation data requirements evolve. OCGIS’s involvement in this collaboration has greatly benefited the software in terms of bug fixes, feature additions and enhancements, performance improvements, and stress testing. On a personal note, I am looking forward to continued work with all members of this team.
2
NESII Group Overview
NESII → NOAA Environmental Software Infrastructure & Interoperability Group
NESII builds software infrastructure for Earth system modeling, data analysis, and scientific collaboration using open source, community development approaches
NESII has been at ESRL / CIRES since November 2009, formerly the Earth System Modeling Infrastructure section at the National Center for Atmospheric Research
Joined Global Systems Division at ESRL in 2016
Partners and customers are from research and operational centers, weather and climate, across U.S. agencies and international organizations

Before diving into software details, it is worth expanding the work affiliation acronym on the previous slide. NESII stands for the NOAA Environmental Software Infrastructure & Interoperability Group. Our group of about 15 software engineers builds infrastructure for Earth system modeling, data analysis, and scientific collaboration using free and open source software development approaches. We’ve been part of NOAA’s Earth System Research Lab in Boulder, CO since 2009, joining the Global Systems Division there in 2016. We are also affiliated with the Cooperative Institute for Research in Environmental Sciences at the University of Colorado Boulder.
3
Presentation Outline
Overview
Subsetting
Structured Computation
Data Format Conversion
Next Steps
Notebook Demo

Moving on, here is a quick presentation outline. First, I’ll provide a high-level introduction to OpenClimateGIS followed by a discussion of its three core capabilities: subsetting, structured computation, and data format conversion. I’ll then talk about next steps for the software followed by a live Jupyter Notebook demonstration. To leave time for the live demo, I’ll try to move through these slides relatively quickly.
4
What is OpenClimateGIS?
OpenClimateGIS (OCGIS) is a pure Python, open source software library enabling dynamic access to and manipulation of climate data
Software goal is to overcome barriers of usability of climate projections in adaptation planning and resource management
  Translate out of climate data formats
  Select geographical regions of interest
  Select times/levels of interest
  Compute application-relevant indices
  Convert to end-user and analysis-ready formats
  Provide comprehensive metadata
  Enable access to big, high-resolution climate data files
Builds on numerous open source software libraries:
  Required and optional dependencies (as listed on the slide): netCDF4, numpy, shapely, fiona, osgeo, rtree, cf_units, ESMF, mpi4py

So...what is OpenClimateGIS? I’m aware that many of you listening today have probably not heard of this software. OpenClimateGIS is a pure Python library designed for dynamic access to and manipulation of climate data. At its inception in 2011 it was designed as a tightly-coupled, Django-based RESTful web service, and it was later refactored to operate independently as a standalone data processing library. The original mission of the software has been maintained, however: its goal is to overcome barriers of usability of climate projections in adaptation planning and resource management: <list from slide> It should of course be noted that OCGIS is not monolithic and builds on many other open source software projects.
5
Software Architecture
Written in pure Python
Modular/extensible design for data interface, format conversion, and computations
Operations manipulate coordinate variables to limit the amount of “value data” requested
Built with generator functions at the operations API
With few exceptions, all operations parallelized with MPI

I’d like to talk a little bit about OCGIS’s underlying architecture. The software has a modular and extensible design. These abstractions allow it to adapt to the varieties of climate metadata in circulation and handle storage formats outside NetCDF. There is a data interface containing software features familiar to climate data scientists: dimensions, variables, collections or datasets, as well as groups. OCGIS also collects metadata-tagged spatial and temporal information along with data and ancillary variables in a collection object called a “field”. The “field” is the primary data interface object in OCGIS. The lower-level data interface is itself wrapped for ease of use by an operations API that simplifies most OCGIS functionality into a single API call. Operations is built with generators, allowing it to churn through a number of datasets without disintegrating. Also, added in the last release was the ability to execute operations and arbitrary OCGIS scripts in parallel using MPI. I’ll be able to describe these features in slightly more detail during the live demonstration.
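
To make the two architectural points above a little more concrete (subset on coordinates first, then request values; and drive everything through generators), here is a minimal pure-Python sketch. It does not use OCGIS at all: the dictionary-based "datasets", `make_dataset`, `select_indices`, and `iter_subsets` are hypothetical names invented for illustration only.

```python
from typing import Callable, Dict, Iterator, List, Tuple


def make_dataset(name: str, lat: List[float], values: List[float]) -> Dict[str, object]:
    """Build a toy dataset whose values are only touched through load().

    Hypothetical stand-in for a file on disk; 'requested' records which
    value indices were actually read.
    """
    requested: List[int] = []

    def load(indices: List[int]) -> List[float]:
        requested.extend(indices)
        return [values[i] for i in indices]

    return {"name": name, "lat": lat, "load": load, "requested": requested}


def select_indices(coords: List[float], lower: float, upper: float) -> List[int]:
    """Return indices of coordinate values inside [lower, upper]."""
    return [i for i, c in enumerate(coords) if lower <= c <= upper]


def iter_subsets(datasets, lower: float, upper: float) -> Iterator[Tuple[str, List[float]]]:
    """Lazily yield (name, subset values), one dataset at a time."""
    for ds in datasets:
        # Step 1: work on the small coordinate variable only.
        idx = select_indices(ds["lat"], lower, upper)
        # Step 2: request values, and only for the selected indices.
        load: Callable[[List[int]], List[float]] = ds["load"]
        yield ds["name"], load(idx)


datasets = [
    make_dataset("a", [10.0, 20.0, 30.0], [1.0, 2.0, 3.0]),
    make_dataset("b", [15.0, 25.0], [4.0, 5.0]),
]
for name, subset in iter_subsets(datasets, 18.0, 32.0):
    print(name, subset)
```

Because `iter_subsets` is a generator, the second dataset is not read until the first result has been consumed, which is the property that lets a loop like this work through many datasets with bounded memory.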
6
Subsetting
Handles many types of geospatial subsetting:
  Points
  Arbitrary Polygons
  Lines
  Bounding Boxes
  Multi-geometries
Reads geometries directly from OGC and ESRI formats (anything that supports Python)
Temporal subsetting - time ranges or “regions” (i.e. arbitrary month and year combinations)
Level subsetting - lower and upper bounds
Label-based slicing
Reads and writes CF and PROJ.4 coordinate systems
Wrapping and unwrapping for 360-degree geographic coordinate systems

OCGIS is primarily designed around spatial subsetting and can handle a wide variety of geospatial data inputs and configurations. Pretty much anything that can be pushed into a Shapely geometry can be used by OCGIS to spatially subset fields. It also provides time and level subsetting through first-class treatment of these variable types. OCGIS also implements label-based or named dimension slicing to extract arbitrary data sections from variable collections, which is used heavily during coordinate variable subsetting. It can also handle all CF coordinate systems in addition to spherical coordinate system wrapping/unwrapping for grids and geometries.
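
Two of the ideas on this slide are easy to show in a few lines: longitude wrapping/unwrapping and time "regions". The sketch below is plain Python and only illustrates the concepts; `wrap_longitude`, `unwrap_longitude`, and `in_time_region` are hypothetical helper names, not OCGIS functions, and the choice of half-open intervals is an assumption.

```python
from datetime import date
from typing import Iterable, Optional


def wrap_longitude(lon: float) -> float:
    """Map a longitude in degrees onto the [-180, 180) domain."""
    return (lon + 180.0) % 360.0 - 180.0


def unwrap_longitude(lon: float) -> float:
    """Map a longitude in degrees onto the [0, 360) domain."""
    return lon % 360.0


def in_time_region(
    d: date,
    months: Optional[Iterable[int]] = None,
    years: Optional[Iterable[int]] = None,
) -> bool:
    """True if d matches the requested months and years (None means 'any')."""
    month_ok = months is None or d.month in set(months)
    year_ok = years is None or d.year in set(years)
    return month_ok and year_ok


# A 0-360 grid longitude expressed in the "GIS-friendly" domain, and back.
print(wrap_longitude(190.0))     # -170.0
print(unwrap_longitude(-170.0))  # 190.0

# Keep only summer dates, regardless of year.
dates = [date(2001, 1, 15), date(2001, 7, 15), date(2002, 8, 1)]
print([d for d in dates if in_time_region(d, months=[6, 7, 8])])
```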
7
Structured Computation
Framework designed to accommodate a variety of climate indices and metrics:
  Temporally grouped functions → monthly means, annual maximums, seasonal aggregations
  String-based functions → ‘diff=tasmax-tasmin’
  Simple transforms → natural logarithm
  Multivariate functions → heat indices
Provide a “straightforward” and “painless” method for introducing new indices
Integrated ICCLIM ECA indices

Calculations in OCGIS are designed to be both structured and extensible. Each calculation is itself a Python class that allows customizations for things like temporal grouping, unit handling, named variables, metadata, and the actual calculation code itself. The structure of the calculation code developed around applied climate indices, attempting to balance the need for standardization with the need for exceptions. In addition to a standard suite of indices, we’ve also integrated ICCLIM European Climate Assessment indices through a collaboration with CERFACS in France. I’ll run a number of calculations during the notebook demonstration so you can get a feel for how these work.
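
For listeners who want a feel for what "temporally grouped" and "string-based" functions mean before the demo, here is a small standalone sketch. It is not the OCGIS calculation framework: `monthly_means`, `parse_expression`, and `evaluate_difference` are hypothetical names, and the parser deliberately handles only a single `name=a-b` subtraction.

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def monthly_means(times: Sequence[date], values: Sequence[float]) -> Dict[Tuple[int, int], float]:
    """Group values by (year, month) and return the mean of each group."""
    groups: Dict[Tuple[int, int], List[float]] = defaultdict(list)
    for t, v in zip(times, values):
        groups[(t.year, t.month)].append(v)
    return {key: mean(vals) for key, vals in sorted(groups.items())}


def parse_expression(expr: str) -> Tuple[str, str, str]:
    """Split an expression like 'diff=tasmax-tasmin' into (output, left, right)."""
    name, rhs = expr.split("=")
    left, right = rhs.split("-")
    return name.strip(), left.strip(), right.strip()


def evaluate_difference(expr: str, variables: Dict[str, Sequence[float]]) -> Tuple[str, List[float]]:
    """Evaluate a single-subtraction expression element-wise over named variables."""
    name, left, right = parse_expression(expr)
    return name, [a - b for a, b in zip(variables[left], variables[right])]


times = [date(2000, 1, 1), date(2000, 1, 2), date(2000, 2, 1)]
print(monthly_means(times, [1.0, 3.0, 5.0]))

data = {"tasmax": [10.0, 12.0], "tasmin": [4.0, 5.0]}
print(evaluate_difference("diff=tasmax-tasmin", data))
```

Swapping the grouping key to `t.year`, or to a season lookup, gives annual or seasonal aggregations with the same structure.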
8
Data Format Conversion
A general framework for data conversion for streaming to multiple formats
Sequential parallel writes implemented for all formats. Asynchronous writes available for NetCDF provided it is built with parallel support
Common set of headers for tabular output files that may be adjusted to suit a user’s needs (e.g. a user may only be interested in a timestamp and associated data value)
Uses Fiona to write to GIS vector formats
Currently supported formats: CSV, Keyed-CSV Shapefile, GeoJSON, netCDF (with and without geometries!), ESRI Shapefile, array-based collections
NetCDF Metadata Interpretation → CF-Grid, UGRID, SCRIP

OCGIS has a driver-based framework for dealing with data format interoperability. Parallelism up to a point is implemented for all formats. Tabular formats like CSV do not support asynchronous writing, so sequential parallel writes are used. However, asynchronous NetCDF writes will be supported in the next release provided that the NetCDF library is built with parallel support. For OGC-like datasets, Fiona is used to write GIS vector data. A number of popular formats are supported, as well as the CF-Metadata, UGRID, and SCRIP conventions for NetCDF data. For those not familiar with UGRID, it is a metadata convention for storing unstructured grids, meaning grids that are not logically rectangular. SCRIP is one of the original metadata conventions for storing grids in netCDF. David Huard and team, as part of the PAVICS work, contributed a linked netCDF GIS format, which has been needed for quite some time.
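
The "adjustable headers" idea for tabular output can be sketched with the standard library alone. This is an illustration of the concept rather than OCGIS's converter code; `DEFAULT_HEADERS` and `write_rows` are hypothetical names, and the column names are assumptions.

```python
import csv
import io
from typing import Dict, Iterable, Optional, Sequence, TextIO

# Hypothetical full set of output columns.
DEFAULT_HEADERS = ["geometry_id", "time", "variable", "value"]


def write_rows(
    rows: Iterable[Dict[str, object]],
    stream: TextIO,
    headers: Optional[Sequence[str]] = None,
) -> None:
    """Stream rows to CSV, keeping only the requested header columns."""
    fieldnames = list(headers) if headers else DEFAULT_HEADERS
    writer = csv.DictWriter(
        stream,
        fieldnames=fieldnames,
        extrasaction="ignore",  # silently drop columns the user did not ask for
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:  # rows may be a generator, so nothing is buffered up front
        writer.writerow(row)


rows = [
    {"geometry_id": 1, "time": "2000-01-01", "variable": "tas", "value": 1.5},
    {"geometry_id": 1, "time": "2000-01-02", "variable": "tas", "value": 2.5},
]
buf = io.StringIO()
write_rows(rows, buf, headers=["time", "value"])
print(buf.getvalue())
```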
9
Some Recent OCGIS Developments
National Hydrography Dataset Conservative Regridding
Motivation: Develop and profile a high performance regridding solution for complex, irregular meshes and fine resolution rectangular grids over large scales.
Chunked Regridding
Motivation: Develop software infrastructure for grid manipulations that scale to high spatial resolutions and accuracies while accommodating arbitrary compute environments and grid structures.

Before finishing up slides, I want to mention two recent OCGIS applications presented at AGU last year. The first application converted the U.S.’s National Hydrography Dataset hydrologic catchments (~2.5 million catchments, ~485 million nodes) into a netCDF unstructured grid format, handling holes and multi-polygons in the process. We then used ESMF conservative regridding with a 250-meter exact test grid to evaluate mean error, with good results. The second application developed a chunked regrid weight generation method for very high resolution Earth science grids. OCGIS is used to create a spatial decomposition, allowing regridding operations that would take terabytes of memory to be executed on a laptop, for example. ESMF is again used for regrid weight generation.
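
The spatial decomposition behind chunked regridding can be pictured as splitting a grid's index space into tiles that are processed one at a time. The sketch below shows only that index bookkeeping under simplifying assumptions (a 2D logically rectangular grid, no overlap or buffering between chunks); `chunk_slices` and `grid_chunks` are hypothetical names and this is not the OCGIS implementation.

```python
from typing import Iterator, List, Tuple


def chunk_slices(size: int, n_chunks: int) -> List[slice]:
    """Split range(size) into n_chunks contiguous slices of near-equal length."""
    base, extra = divmod(size, n_chunks)
    slices = []
    start = 0
    for i in range(n_chunks):
        # The first 'extra' chunks absorb the remainder, one element each.
        stop = start + base + (1 if i < extra else 0)
        slices.append(slice(start, stop))
        start = stop
    return slices


def grid_chunks(shape: Tuple[int, int], splits: Tuple[int, int]) -> Iterator[Tuple[slice, slice]]:
    """Yield (row_slice, col_slice) pairs that tile a 2D grid exactly once."""
    for row_slice in chunk_slices(shape[0], splits[0]):
        for col_slice in chunk_slices(shape[1], splits[1]):
            yield row_slice, col_slice


# Tile a 4 x 6 grid into 2 x 3 chunks; each chunk could be handled independently.
for rows, cols in grid_chunks((4, 6), (2, 3)):
    print(rows, cols)
```

In a real regridding workflow each destination chunk would also need the overlapping part of the source grid, which this sketch leaves out.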
10
Next Steps
2.1.0 (March 2018)
  UGRID/SCRIP/ESMF Unstructured Format
  Asynchronous NetCDF IO (adds write capability)
  Chunked regrid weight generation for big grids
  Memory and performance improvements
2.2.0 (...future...)
  Memory tiling for computations
  Redistribution
  Parallel regridding w/ ESMF mesh support
  CF-Discrete Sampling Geometries ?

<read bullets> Feature requests are always welcome in addition to outside contributions.
11
Contacts & Links
Questions, comments, suggestions, or “hidden features”: or
Mailing lists and releases:
Software links:

Here are some important links and contact information. And with that, I’ll move on to the live demonstration. Any questions before I do so?
12
Backup Slides
13
Status
Current Release: 2.0.0 (2.1.0 planned in coming months)
Project is fully open source under the University of Illinois-NCSA License
Hosted on GitHub:
Mature test harness (1000+ unit tests)

Here is a quick status slide for the software. We are currently prepping for version 2.1.0. The project is hosted on GitHub with
14
Quick Subset Code Example
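import ocgis

# OcgOperations is the primary entry point most users will encounter.
ops = ocgis.OcgOperations(
    dataset={'uri': '/data/tas_kelvin.nc'},
    time_region={'month': [6, 7, 8]},  # keep June, July, and August only
    geom=[-122, 38, -121, 40],         # bounding box as min x, min y, max x, max y
    conform_units_to='celsius',        # convert from the file's units
    output_format='nc',                # write the subset as netCDF
)
path = ops.execute()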
15
ESMPy Overview
Python interface to the high performance, parallel regridding functionality of ESMF - Uses NumPy for data array management
Supported coordinate representations:
  Grid: 2D/3D, logically rectangular, regional/global, stagger options
  Mesh: 2D/3D unstructured
  LocStream: 2D/3D unconnected points (point cloud)
Source data is represented with Field, built on a Grid, Mesh or LocStream
Regridding uses two Fields (source and destination)
Methods include first order conservative, bilinear, nearest neighbor, and more
Data may be read directly from file. Formats include Gridspec, UGRID, and SCRIP
Other notable features include masking, ignoring unmapped points, options for line paths between points, and a variety of pole handling capabilities
16
ESMPy Integration
OCGIS has been integrated with ESMPy to support bilinear and first order conservative regridding between structured grids
Regridding is part of the operations “ecosystem” and may be chained with subsetting, etc.
Current development is adding support for meshes and location streams in addition to grids
  Use mesh regridding in place of the nonoptimal spatial averaging algorithms inside OCGIS
  Use location stream for unstructured/observational regridding sources and targets
  Automatically selects regridding method based on the presence of corners - conservative with corners
With the release of ESMPy 7.0, ESMPy fields will be interoperable with OCGIS fields - proof-of-concept code in feature branch
17
Extensibility Example Calculation Subclassing
Example NetCDF Data Reading
18
Dataset Bundling
Bundles or packages are groups of data over which to apply a common set of operations → idea is to extend ensembles
OCGIS consolidates coordinate systems for the datasets and subset geometry(s) and applies selected operations to each in sequence
The example data displayed below is from a CSV output from three datasets:
  CMIP5 Decadal Simulation (3 degrees, 360 lat/lon)
  NARCCAP CRCM-CGCM3 (50 km, Polar Stereographic)
  Maurer Gridded Observational (⅛ degrees, 180 lat/lon)
Example description:
  Pull out all January dates
  Spatially subset and area-weight the values for grid cells intersecting the Nebraska state boundary
  Calculate the monthly mean and standard deviation
  Write data to CSV
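
The numeric core of the example description (area-weight the intersecting cells, then summarize over time) can be sketched in plain Python. The function names and the input numbers are hypothetical, and the use of a population standard deviation is an assumption; this is not OCGIS code.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def area_weighted_value(values: Sequence[float], overlap_areas: Sequence[float]) -> float:
    """Combine grid-cell values using each cell's overlap area with the region as its weight."""
    total = sum(overlap_areas)
    return sum(v * a for v, a in zip(values, overlap_areas)) / total


def summarize(series: Sequence[float]) -> Tuple[float, float]:
    """Return the mean and population standard deviation of a time series."""
    return mean(series), pstdev(series)


# Two time steps, two intersecting cells; the second cell overlaps the region more.
overlap = [0.25, 0.75]
steps: List[List[float]] = [[270.0, 280.0], [272.0, 284.0]]
series = [area_weighted_value(step, overlap) for step in steps]
print(series)
print(summarize(series))
```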
19
Core Capabilities of OpenClimateGIS
Read local or remotely served (e.g. OPeNDAP) ~CF-compliant netCDF datasets
Geospatial subsetting by arbitrary vector geometries (e.g. watersheds) and time/level bounds
Common spatial operations such as intersects, clip, and aggregation on point or polygon (e.g. bounded coordinates) data representations
Geometry wrapping and unwrapping to maintain a “GIS-friendly” -180 to 180 longitudinal spatial domain
Support for geographic (e.g. latitude/longitude) and projected climate datasets (e.g. Lambert Conformal)
Option to apply temporally-grouped computations to data subsets
Write climate data to GIS and tabular formats