Better data management through NetCDF Jaison Kurian CAOS IISc.



Introduction
Geoscience data come in many formats: HDF, CDF, NetCDF, binary, GRIB, ASCII and many more. Is NetCDF the best one?

Let us start with a few examples: the story of a rainfall dataset.

Features of NetCDF
1. Self-describing: all data and metadata are encapsulated in one file
2. Machine independent: works on almost all platforms
3. Direct access: subsets of large datasets can be read efficiently
4. Appendable: data can be quickly added to existing files
5. Sharable: one writing process and several reading processes can access a file at once
6. Easy to learn
7. Freely available, well documented and well supported
8. Supported by a variety of data analysis, processing and visualization tools

NetCDF conventions
There are several conventions; COARDS (Cooperative Ocean/Atmosphere Research Data Service) is the most widely accepted one.
Filename extension: .cdf or .nc? Use .nc.

Commands: "ncdump" & "ncgen"
ncdump produces a CDL (network Common Data form Language) file, which is the ASCII representation of a NetCDF file. It provides an easy way to look at the structure and contents of a NetCDF file. With the -c option, data values are shown for coordinate variables only.
    ncdump -c ncfile.nc | less
    ncdump -c ncfile.nc > ncfile.cdl
    man ncdump

ncgen can be used
1. to check the syntax of an input CDL file.
2. to generate a Fortran/C program that writes the NetCDF file described in the input CDL file.
3. to make a binary NetCDF file from a given CDL file.
We will see the details later.

Components of a NetCDF file:
1. Header part
   1.1 Dimensions
   1.2 Variables
   1.3 Attributes
2. Data part

Rules
1. A dimension/variable name should start with a letter and can contain letters, digits and "_". Valid: "temp". Invalid: "1temp".
2. Names are case sensitive: "temp", "Temp" and "TEMP" are three different names.
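The naming rule can be checked mechanically. The following is a minimal Python sketch that encodes only the rule as stated on this slide (a leading letter followed by letters, digits or underscores). The helper name `is_valid_name` is hypothetical, and the NetCDF library itself may accept more characters than this.

```python
import re

# Rule as stated on the slide: a letter first, then letters, digits or '_'.
_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

def is_valid_name(name):
    """Return True if `name` satisfies the slide's naming rule (hypothetical helper)."""
    return _NAME_RE.fullmatch(name) is not None

print(is_valid_name("temp"))           # True
print(is_valid_name("1temp"))          # False
print(len({"temp", "Temp", "TEMP"}))   # 3: case-sensitive names are distinct
```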

1.1 Dimensions
Maximum number of dimensions for a file is 512 (netcdf-3.5.1).
Maximum number of dimensions for a variable is 4.
Each dimension has a name and a length/size:
    dimensions:
        time = 31 ;
        height = 1 ;
        latitude = 122 ;
        longitude = 182 ;

Unlimited dimension
    dimensions:
        time = UNLIMITED ; // (31 currently)
        height = 1 ;
        latitude = 122 ;
        longitude = 182 ;
A NetCDF dataset can have at most one unlimited dimension, but need not have any. The NetCDF model does not cater for variables with several changeable dimension sizes. Variables should have rectangular shapes.

1.2 Variables
Maximum number of variables for a file is 4096 (netcdf-3.5.1).
Each variable has a type, a name and a list of dimensions that defines its shape:
    short wind_speed(time, height, latitude, longitude) ;
        wind_speed:long_name = "wind speed" ;
        wind_speed:units = "m/s" ;
    short zonal_wind_speed(time, height, latitude, longitude) ;
        zonal_wind_speed:long_name = "zonal wind speed" ;
        zonal_wind_speed:units = "m/s" ;

Variable data "types"
    Type            Fortran                        NetCDF                Bits
    byte            BYTE                           NF_BYTE               8
    char            CHARACTER                      NF_CHAR               8
    short           INTEGER*2                      NF_SHORT              16
    long            INTEGER*4                      NF_LONG               32
    float (real)    REAL*4                         NF_FLOAT or NF_REAL   32
    double          DOUBLE PRECISION or REAL*8     NF_DOUBLE             64
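The bit widths in the table translate directly into the size of a variable's data on disk. Here is a minimal Python sketch of that arithmetic, using the dimension sizes from the earlier slides. The helper `data_bytes` is hypothetical, and the estimate ignores the file header and any padding.

```python
from math import prod

# Bytes per value, from the bit widths in the table above.
TYPE_BYTES = {"byte": 1, "char": 1, "short": 2, "long": 4, "float": 4, "double": 8}

def data_bytes(cdl_type, shape):
    """Size of one variable's data in bytes (header and padding ignored)."""
    return TYPE_BYTES[cdl_type] * prod(shape)

shape = (31, 1, 122, 182)            # time, height, latitude, longitude
print(data_bytes("short", shape))    # 1376648
print(data_bytes("float", shape))    # 2753296
```

Storing wind_speed as short instead of float halves the data size, which is the motivation for the packing attributes described later.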

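Which dimension varies fastest?
CDL / C:
    short wind_speed(time, height, latitude, longitude)
    time is the slowest varying dimension, longitude the fastest.
Fortran:
    INTEGER*2 wind_speed(longitude, latitude, height, time)
    longitude is the fastest varying dimension, time the slowest.
The dimension order is reversed between the two interfaces, so the layout of the data in the file is the same.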
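To see why the two declarations describe the same layout, it helps to compute the flat position of one element under each convention. This is a small Python sketch with hypothetical helper names; it uses 0-based indices for both conventions to keep the arithmetic simple (real Fortran indices start at 1).

```python
def c_offset(index, shape):
    """Flat offset for C/CDL order: the LAST index varies fastest."""
    offset = 0
    for i, n in zip(index, shape):
        offset = offset * n + i
    return offset

def fortran_offset(index, shape):
    """Flat offset for Fortran order: the FIRST index varies fastest (0-based here)."""
    offset = 0
    for i, n in zip(reversed(index), reversed(shape)):
        offset = offset * n + i
    return offset

c_shape = (31, 1, 122, 182)          # time, height, latitude, longitude
f_shape = tuple(reversed(c_shape))   # longitude, latitude, height, time

# The same element, addressed in both conventions.
c_index = (2, 0, 5, 7)               # t=2, h=0, lat=5, lon=7
f_index = tuple(reversed(c_index))   # lon=7, lat=5, h=0, t=2

print(c_offset(c_index, c_shape))        # 45325
print(fortran_offset(f_index, f_shape))  # 45325
```

Stepping the longitude index by one moves the offset by exactly one element in both cases, which is what "fastest varying" means.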

[Figure: four-dimensional data layout with axes Lon (X), Lat (Y), Depth / Height, and Time running from T0 to Tn]

Coordinate / Independent Variables (with the same name as the dimensions)
    dimensions:
        time = 31 ;
        height = 1 ;
        latitude = 122 ;
        longitude = 182 ;
    variables:
        float time(time) ;
            time:units = "hours since :0:0" ;
            time:time_of_day = "12:00" ;
        float height(height) ;
            height:units = "meters" ;
            height:positive = "up" ;
        float latitude(latitude) ;
            latitude:units = "degrees_N" ;
        float longitude(longitude) ;
            longitude:units = "degrees_E" ;
        short wind_speed(time, height, latitude, longitude) ;

Coordinate variables have no special meaning to the NetCDF library. For software using the library, however, a coordinate variable typically defines the physical coordinate corresponding to that dimension.
Software packages that make use of coordinate variables commonly assume they are numeric vectors and strictly monotonic: all values are different, either increasing or decreasing, with no missing/fill values.
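Before handing a file to an analysis tool, it can be worth checking that each coordinate variable meets these assumptions. This is a minimal Python sketch; the function name and the optional `fill` argument are hypothetical, and real tools may apply additional checks.

```python
def is_strictly_monotonic(values, fill=None):
    """True if values strictly increase or strictly decrease and contain no fill value."""
    if fill is not None and fill in values:
        return False
    pairs = list(zip(values, values[1:]))
    increasing = all(a < b for a, b in pairs)
    decreasing = all(a > b for a, b in pairs)
    return increasing or decreasing

print(is_strictly_monotonic([30.25, 29.75, 29.25, 28.75]))   # True (decreasing)
print(is_strictly_monotonic([29.75, 30.25, 30.25, 31.25]))   # False (repeated value)
print(is_strictly_monotonic([1.0, 2.0, -999.0], fill=-999.0))  # False (fill value present)
```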

Primary / Dependent Variables
    dimensions:
        time = 31 ;
        depth = 1 ;
        latitude = 122 ;
        longitude = 182 ;
    variables:
        short wind_speed(time, depth, latitude, longitude) ;
            wind_speed:long_name = "wind speed" ;
        short zonal_wind_speed(time, depth, latitude, longitude) ;
            zonal_wind_speed:long_name = "zonal wind speed" ;

1.3 Attributes
Variable attributes provide information about a particular variable. Each one is written as variable name, ":", attribute name, "=", attribute data.
    short wind_speed(time, depth, latitude, longitude) ;
        wind_speed:long_name = "wind speed" ;        (character string)
        wind_speed:missing_value = 32767s ;          (numeric value)

Character Variable Attributes
    short wind_speed(time, depth, latitude, longitude) ;
        wind_speed:long_name = "wind speed" ;        (used as the title)
        wind_speed:units = "m/s" ;                   (or "ms-1")
long_name and units are recognized by tools like Ferret.
We can add any other attributes if needed, but these will not be recognized by any tools. Example:
        wind_speed:var_desc = "scalar wind speed" ;
        wind_speed:dataset = "quikscat_01_2001.nc" ;
        wind_speed:level_desc = "Surface" ;
        wind_speed:statistic = "3 day Mean" ;
        wind_speed:parent_stat = "Satellite Observation" ;
        wind_speed:history = "no processing" ;

Character Variable Attributes: latitude and longitude
    float latitude(latitude) ;
        latitude:long_name = "Latitude in Degrees" ;
        latitude:units = "degrees_N" ;
        latitude:point_spacing = "even" ;            (performance improvement)
    float longitude(longitude) ;
        longitude:long_name = "Longitude in Degrees" ;
        longitude:units = "degrees_E" ;
        longitude:modulo = " " ;
        longitude:point_spacing = "even" ;
Accepted longitude units: degrees_E / degrees_east / degree_E / degree_east
Accepted latitude units: degrees_N / degrees_north / degree_N / degree_north

Character Variable Attributes: depth and height
For the ocean:
    float depth(depth) ;
        depth:long_name = "Depth wrt sea surface" ;
        depth:units = "meters" ;
        depth:positive = "down" ;
For the atmosphere:
    float height(height) ;
        height:long_name = "Height wrt Ground" ;
        height:units = "meters" ;
        height:positive = "up" ;

Character Variable Attributes: time
    float time(time) ;
        time:long_name = "Time" ;
        time:units = "hours since :0:0" ;
        time:time_of_day = "12:00" ;
        time:calendar = "JULIAN" ;                   (or calendar_type)
Recommended time units are seconds, minutes, hours and days. Months and years are avoided because they are not of equal length.
Calendars (tool specific):
    GREGORIAN or STANDARD     default calendar
    JULIAN                    with leap years
    NOLEAP or COMMON_YEAR     365 days, no leap years
    360_DAY                   360 days, each month is 30 days
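A time axis in "hours since ..." units is just a list of offsets from a reference date. Decoding it can be sketched in Python as follows. The reference date used here is a hypothetical placeholder, not the one from the slide, and `decode_hours` is an illustrative helper. Python's `datetime` follows the proleptic Gregorian calendar, so this sketch does not cover the JULIAN, NOLEAP or 360_DAY calendars.

```python
from datetime import datetime, timedelta

def decode_hours(values, reference):
    """Convert 'hours since <reference>' offsets to datetime objects."""
    return [reference + timedelta(hours=float(v)) for v in values]

# Hypothetical reference date, chosen only for this illustration.
reference = datetime(2001, 1, 1, 0, 0, 0)

for stamp in decode_hours([12, 36, 60], reference):
    print(stamp.isoformat())
# 2001-01-01T12:00:00
# 2001-01-02T12:00:00
# 2001-01-03T12:00:00
```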

Character Variable Attributes: climatological time axis
    float time(time) ;
        time:long_name = "Climatological Time" ;
        time:units = "hours since :0:0" ;
        time:modulo = " " ;

Numeric Variable Attributes
    short wind_speed(time, depth, latitude, longitude) ;
        wind_speed:long_name = "wind speed" ;
        wind_speed:valid_min = 0.f ;
        wind_speed:valid_max = 60.f ;
    OR
        wind_speed:valid_range = 0.f, 60.f ;
The numeric type of the attribute should be the same as that of the variable.

Numeric Variable Attributes: packing
    short wind_speed(time, depth, latitude, longitude) ;
        wind_speed:long_name = "wind speed" ;
        wind_speed:scale_factor = 0.01f ;
        wind_speed:add_offset = 0.f ;
scale_factor and add_offset together offer "packing" of data.
When a tool reads packed data: first multiply by scale_factor, then add add_offset.
When packing data: first subtract add_offset, then divide by scale_factor.
scale_factor and add_offset should be of the type of the unpacked data (float or double). Packed data is typically of type byte or short.
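The two packing rules are inverses of each other, as a short Python sketch shows. The function names are hypothetical; the scale_factor and add_offset values are the ones from the slide, and the wind value 12.34 is only an illustration.

```python
def pack(value, scale_factor, add_offset):
    """Physical value -> packed integer: subtract the offset, divide by the scale."""
    return int(round((value - add_offset) / scale_factor))

def unpack(packed, scale_factor, add_offset):
    """Packed integer -> physical value: multiply by the scale, add the offset."""
    return packed * scale_factor + add_offset

scale_factor, add_offset = 0.01, 0.0   # values from the slide

packed = pack(12.34, scale_factor, add_offset)
print(packed)                                              # 1234
print(round(unpack(packed, scale_factor, add_offset), 2))  # 12.34
```

With a scale_factor of 0.01 and no offset, a 16-bit short (-32768 to 32767) covers physical values from about -327.68 to 327.67 at a resolution of 0.01, which is ample for wind speed in m/s.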

Numeric Variable Attributes: missing data
    short wind_speed(time, depth, latitude, longitude) ;
        wind_speed:long_name = "wind speed" ;
        wind_speed:missing_value = 32767s ;
        wind_speed:_FillValue = 32767s ;
_FillValue: value used to pre-fill the disk space allocated to the variable (scalar, same type as the variable).
missing_value: value or values indicating missing data (scalar or vector, same type as the variable).
These values should all be outside the valid_range.
If the variable is packed, the missing_value/_FillValue flags are likewise packed.

Global Attributes
Global attributes provide information about the NetCDF dataset as a whole, such as title, processing history and instrument. They can be of character or numeric type. They are a good place to store all the necessary details about the dataset, to make it "really self-describing".
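Because the flag value is stored in packed form, a reader has to test for it before applying scale_factor and add_offset. This is a minimal Python sketch of that order of operations; the function name and the sample values are illustrative assumptions, with 32767 taken from the slide as the flag.

```python
def unpack_with_missing(packed_values, scale_factor, add_offset, missing):
    """Unpack integers, returning None wherever the packed flag value appears."""
    out = []
    for p in packed_values:
        if p == missing:      # compare BEFORE unpacking: the flag is stored packed
            out.append(None)
        else:
            out.append(p * scale_factor + add_offset)
    return out

raw = [1234, 32767, 560]      # illustrative packed shorts
values = unpack_with_missing(raw, 0.01, 0.0, missing=32767)
print([None if v is None else round(v, 2) for v in values])  # [12.34, None, 5.6]
```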

Global Attributes: example
    // global attributes:
        :WOCE_Version = "3.0" ;
        :CONVENTIONS = "COARDS/WOCE" ;
        :long_name = "QuikSCAT daily mean wind fields" ;
        :producer_agency = "IFREMER" ;
        :producer_institution = "CERSAT" ;
        :product_version = "1.0" ;
        :time_resolution = "one day mean" ;
        :spatial_resolution = "0.5 degrees" ;
        :platform_id = "QuikSCAT" ;
        :instrument = "QuikSCAT" ;
        :objective_method = "kriging" ;
        :data_processing = "data missing dates are filled with dummy _FillValue-s" ;
        :time_modification = "to avoid the problems with 12:00 hrs in ferret" ;

2. Data
    time = , , , , , , , , , , , , ;
    height = 10 ;
    latitude = 30.25, 29.75, 29.25, 28.75, 28.25, 27.75, 27.25, 26.75, 26.25, ;
    longitude = 29.75, 30.25, 30.75, 31.25, 31.75, 32.25, 32.75, 33.25, 33.75, ;
    wind_speed = -129, -129, -129, -129, -129, -129, -129, -129, -129, -129, -129, -129, ;

Fortran Interface
How to read/write NetCDF files using Fortran?
- Use the "include" header file to define NetCDF related variables:
      INCLUDE 'netcdf.inc'
- Explicitly specify the NetCDF "include" and "lib" directories if the files "netcdf.inc" and "libnetcdf.a" are not in the default search directories of the compiler (like /usr/include and /usr/lib):
      $ f77 mync_pgm.f -I/home/pkgs/netcdf /include -L/home/pkgs/netcdf-3.5.1/lib -lnetcdf

Fortran Interface: steps to create a new NetCDF file
1. Create a new NetCDF file
       err = NF_CREATE ( 'let_me_learn.nc', NF_CLOBBER, ncid )
2. Define all the required dimensions
       err = NF_DEF_DIM( ncid, 'latitude', 180, dimid_lat )
3. Define all the required variables
       err = NF_DEF_VAR( ncid, 'latitude', NF_REAL, 1, dimid_lat, varid_lat )
4. Define all attributes
       err = NF_PUT_ATT_TEXT( ncid, varid_lat, 'units', 9, 'degrees_N' )
5. Leave define mode (and enter "data" mode). Very important!
       err = NF_ENDDEF (ncid)
6. Write data
       err = NF_PUT_VARA_REAL( ncid, varid_lat, 1, 180, lat )
7. Close the NetCDF file
       err = NF_CLOSE (ncid)

Fortran Interface: steps to read an existing NetCDF file
1. Open the existing NetCDF file
       err = NF_OPEN ( 'let_me_learn.nc', NF_NOWRITE, ncid )
2. Get all the required variable "id"s
       err = NF_INQ_VARID( ncid, 'latitude', varid_lat )
3. Get the variable "data"
       err = NF_GET_VARA_REAL( ncid, varid_lat, start, count, lat )
4. Close the NetCDF file
       err = NF_CLOSE (ncid)
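The start and count arguments of NF_GET_VARA_REAL select a rectangular block: start gives the 1-based index of the first value along each dimension, and count gives the number of values to read along each dimension. The selection logic can be sketched in Python, where slices are 0-based. The helper `hyperslab` and the sample latitude values are hypothetical, and no NetCDF file is read here.

```python
def hyperslab(start, count):
    """Convert Fortran-style 1-based start/count vectors to Python slices."""
    return tuple(slice(s - 1, s - 1 + c) for s, c in zip(start, count))

lat = [90.0 - 0.5 * i for i in range(10)]   # illustrative 1-D coordinate values

# start = 3, count = 4 selects elements 3..6 in 1-based numbering.
(sl,) = hyperslab([3], [4])
print(lat[sl])  # [89.0, 88.5, 88.0, 87.5]
```

Reading the whole latitude variable from the create example corresponds to start = 1 and count = 180.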

Fortran Interface: OMODE flags
NF_CLOBBER   : (NF_CREATE) overwrite any existing dataset with the same file name
NF_NOCLOBBER : (NF_CREATE) do not overwrite (clobber) an existing dataset
NF_WRITE     : (NF_OPEN) open dataset with read-write access
               - add/change dim, var, att & data
               - delete att
NF_NOWRITE   : (NF_OPEN) open dataset with read-only access
NF_SHARE     : for concurrent access, where one process may be writing the dataset and one or more other processes reading it; it can be combined with the other flags

Fortran Interface
How to write the program in an efficient way?
1. Use the IMPLICIT NONE option
2. Use a HANDLE_ERR subroutine
       err = NF_CREATE ( 'let_me_learn.nc', NF_CLOBBER, ncid )
       if (err .NE. NF_NOERR) call HANDLE_ERR(err)

       SUBROUTINE HANDLE_ERR(ERR)
       IMPLICIT NONE
       INCLUDE 'netcdf.inc'
       INTEGER ERR
       PRINT *, 'netcdf error : ', NF_STRERROR(ERR)
       STOP 'Stopped'
       END
3. If you get "Segmentation fault (core dumped)", check the number of arguments in the NetCDF calls.

Fortran Interface: let us see a few examples.

ncgen
1. to check the syntax of an input CDL file.
2. to generate a Fortran/C program that writes the NetCDF file described in the input CDL file.
3. to make a binary NetCDF file from a given CDL file.
    man ncgen

Installing NetCDF (netcdf-3.5.1; RH Linux-9)
Get netcdf tar.Z from Unidata's download site.
Log in as root (if needed).
Set the following environment variables (shown with export for the bash shell; use setenv for csh):
    export CC=/usr/bin/c99
    export CPPFLAGS='-DNDEBUG -Df2cFortran'
    export CFLAGS=-O
    export FC=/usr/bin/f77
    export FFLAGS='-O -w'
    export CXX=/usr/bin/c++
Then unpack, build, test and install:
    # zcat ./netcdf tar.Z | tar xvf -
    # ./configure
    # make
    # make test
    # make install
    # make clean

Installing NetCDF
Let us test whether this new library is working fine.

What about curvilinear data?

Limitations of NetCDF
1. File size increases in some cases even with missing data.
2. Only one UNLIMITED dimension is possible.
3. Limited number of external data types, which leads to inefficient use of disk space.
4. Maximum file size of 2 GB.
5. The extent to which data can be completely self-describing is limited: there is always some assumed context without which sharing and archiving data would be impractical.
6. No support for multiple concurrent writers.
7. Dimensions, variables and data cannot be DELETED!

So, how is NetCDF?

Good?

Beware!
Data being in NetCDF format does not guarantee that it is better than having the data in other formats (unless it is supplied in proper shape with all necessary details and information). Here is an example from the Argo data archive.

Questions Please

Some useful sites
NetCDF home page :
Why NetCDF : netCDF/csm_why_netcdf.html
Software for Manipulating or Displaying netCDF Data :
Documentation :
COARDS NetCDF Convention :

NetCDF man pages :
    man ncdump
    man ncgen
    man netcdf

THANK YOU