Data, models and tools: Dealing with any complex hydraulic engineering problem invariably involves data, models and tools.
What is the problem? The quality, quick availability and accessibility of data for analysis purposes are currently not satisfactory. Models and tools used and developed by engineers are not sufficiently documented or version controlled. We can do much better!
Data: data are not under version control, there is a multitude of file formats, and metadata are not available within the data files.
Models and tools: different tool versions on users' PCs, and confusion about which version of a tool was used to perform calculations.
Result: inefficiency!
OpenEarth (BwN) provides the infrastructure to deal with this problem. Basic elements: a SubVersion server and an OPeNDAP server. Paradigm: fixed structure – flexible access.
[Diagram labels: OPeNDAP Server, SubVersion Server, Raw Data, Tools, Models, Detailed, Simplified, User, Supplier]
What is NetCDF? An array-based data structure for storing multidimensional data.
N-dimensional coordinate systems:
– X coordinate (e.g. longitude)
– Y coordinate (e.g. latitude)
– Z coordinate (e.g. altitude)
– Time dimension
– … other dimensions
Variables (support for multiple variables):
– Temperature, humidity, pressure, salinity, etc.
Geometry (implicit or explicit):
– Regular grid (implicit)
– Irregular grid
– Points
NetCDF: NASA's Earth Science Data Systems Standards Process Group recommends NetCDF as a data storage standard. Pros: data exchangeability, platform independence, robustness in use and ease of understanding.
[Figure: axes X, Y, Z, T]
Efficient data storage: The binary NetCDF format enables a complete variable definition with a minimal set of numbers (see example) and minimal repetition of metadata. Result: efficient use of disk space and easy database querying.
[Figure: XYZQ table, 32 numbers, versus separate X, Y, Z axes, 14 numbers]
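The 32-versus-14 count can be reproduced with a small Python sketch. It assumes the figure shows a regular 2 x 2 x 2 grid carrying one quantity Q; that reading, the axis values and the Q values are all assumptions made for illustration, not taken from the slide.

```python
from itertools import product

# Hypothetical 2 x 2 x 2 regular grid; axis values and Q are invented.
x = [0.0, 10.0]
y = [0.0, 5.0]
z = [-1.0, 0.0]
q = [float(i) for i in range(len(x) * len(y) * len(z))]  # one value per grid node

# Flat XYZQ table: every row repeats its X, Y and Z coordinates next to Q.
# Q is taken in the same order in which product(x, y, z) walks the grid.
rows = [(xi, yi, zi, qi) for (xi, yi, zi), qi in zip(product(x, y, z), q)]
flat_count = sum(len(row) for row in rows)

# Array-based storage: each axis is stored once, Q as one 3-D block.
array_count = len(x) + len(y) + len(z) + len(q)

print(flat_count, array_count)
```

The saving grows with grid size: the flat table needs 4 numbers per node, while the array layout needs one number per node plus the axis lengths.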
Example NetCDF file (transect.nc): 198 cross-shore points, 3 timestamps, 3 x 198 surface elevations. Metadata are stored in one file together with the data. NB: transect.nc is a binary file. Its contents:

    netcdf transect.nc {
    dimensions:
        crossshore = 198 ;
        time = 3 ;
    variables:
        float crossshore_distance(crossshore), shape = [198]
            crossshore_distance:unit = "meter"
        float year(time), shape = [3]
            year:unit = "year"
        float height(time,crossshore), shape = [3 198]
            height:unit = "meter"
    data:
        crossshore_distance = (-65:5:920);
        year = (2006:2008);
        height = [ … … … ];
    }

Easy Matlab routines are available: nc_varput, nc_addvar, nc_varget. Example:

    % Read the two coordinate variables and the elevations, then plot them.
    x = nc_varget('transect.nc', 'crossshore_distance');  % 198 cross-shore distances [m]
    y = nc_varget('transect.nc', 'year');                 % 3 survey years
    z = nc_varget('transect.nc', 'height');               % 3 x 198 elevations [m]
    surface(x, y, z);
SubVersion: an open source version control system. Users 'commit' their files to one central database and update their local copy regularly. Every commit receives a unique revision number. Comments indicate per commit which changes were made.
Blame functionality: for each line of code, Subversion knows who changed it, when, and as part of which revision. Colors indicate the age of the code (bluer = older). Any change can be rolled back at any time.
Merge tool: Changes made between any two versions of a tool are easily revealed using the Merge tool. The Merge tool also helps to resolve coding conflicts in case multiple users modified the same code.
Version control: any routine/datafile can automatically be given a comment block with information on: last change date, author, revision number etc. Recording revision info of tools and data used in a project enhances reproducibility of results.
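A comment block of this kind can be read back by a script, so that the revision of every tool used in a calculation is logged with the results. The Python sketch below assumes the block is produced by Subversion keyword substitution (anchors such as $Id$, $Date$, $Author$ and $Revision$, enabled through the svn:keywords property); the header text, file name and author are invented, and read_keywords is a hypothetical helper.

```python
import re

# Example header as it might look after keyword expansion; all values are invented.
HEADER = """\
% $Id: plot_transect.m 1234 2009-03-02 10:15:00Z jdoe $
% $Date: 2009-03-02 11:15:00 +0100 (Mon, 02 Mar 2009) $
% $Author: jdoe $
% $Revision: 1234 $
"""

# An expanded keyword looks like "$Name: value $".
KEYWORD = re.compile(r"\$(\w+):\s(.*?)\s\$")


def read_keywords(text):
    """Return a dict mapping each expanded keyword name to its value."""
    return {name: value for name, value in KEYWORD.findall(text)}


info = read_keywords(HEADER)
print(info["Revision"], info["Author"])
```

Writing info to the project log next to each result file is one simple way to record which revision produced it.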
Statistics: A separate repository can be made per project or tool. Combining reusable tools in one central repository provides large advantages (sharing, cooperation, learning). OpenEarth Tools is open source and freeware.
Data workflow: OpenEarth prescribes the following steps to make data available:
1. put raw data in a SubVersion repository (OpenEarth RawData) to keep track of their history,
2. use scripts to transform the data to NetCDF, including metadata,
3. upload the *.nc files to the OPeNDAP server (OpenEarth OPeNDAP), where the stored files are accessible through the web, and
4. provide easy access through tools and websites (OpenEarth Tools), e.g. charts and maps.
[Diagram labels: Raw data, Scripts, Database; Extract, Transform, Load, Provide]
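Step 2 of the workflow can be sketched in Python without any NetCDF library by writing CDL, the text notation for NetCDF files, which the standard ncgen utility compiles into a binary *.nc file. This is only an illustration under assumptions: to_cdl is a hypothetical helper, the raw values are invented, and the variable names and the "unit" attribute simply follow the transect example; it is not the conversion script OpenEarth itself uses.

```python
def to_cdl(name, distance, years, height, units):
    """Build CDL text for a transect dataset (hypothetical helper)."""

    def values(seq):
        return ", ".join(f"{v:g}" for v in seq)

    # Flatten row by row: the time index varies slowest, crossshore fastest.
    flat_height = [h for row in height for h in row]
    lines = [
        f"netcdf {name} {{",
        "dimensions:",
        f"    crossshore = {len(distance)} ;",
        f"    time = {len(years)} ;",
        "variables:",
        "    float crossshore_distance(crossshore) ;",
        f'        crossshore_distance:unit = "{units["crossshore_distance"]}" ;',
        "    float year(time) ;",
        f'        year:unit = "{units["year"]}" ;',
        "    float height(time, crossshore) ;",
        f'        height:unit = "{units["height"]}" ;',
        "data:",
        f"    crossshore_distance = {values(distance)} ;",
        f"    year = {values(years)} ;",
        f"    height = {values(flat_height)} ;",
        "}",
    ]
    return "\n".join(lines) + "\n"


# Invented raw data: 3 cross-shore points surveyed in 2 years.
cdl = to_cdl(
    "transect",
    distance=[-65, -60, -55],
    years=[2006, 2007],
    height=[[1.5, 1.2, 0.9], [1.4, 1.1, 0.8]],
    units={"crossshore_distance": "meter", "year": "year", "height": "meter"},
)
print(cdl)
```

Saving the printed text as transect.cdl and running `ncgen -o transect.nc transect.cdl` would then produce the binary file to upload in step 3.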
Community of practice: OpenEarth has a wide community of users (Building with Nature, EU FP7 MICORE, Delft Cluster, etc.). A large number of training sessions are available (SubVersion use, programming standards, etc.).