Precipitation Data: Format and Access Issues
Tom Hopson
Data availability details
Data Storage and Access: a) Data are stored in a geospatial database at NCAR, b) Data are archived and backed up, c) Storage is network-accessible, d) Services allow for retrieval in SQL, GeoJSON, and WKT
Algorithm modifications: a) raw forecasts, b) bias-corrected, c) calibrated, d) multi-modeled
Formats: a) GRIB2 (native, readable with wgrib2), b) binary, c) netCDF, d) GIS shapefiles (different basin resolutions), e) CSV text files
Different spatial and temporal scales
Spatial: a) 28 catchments, b) 4696 catchments
Temporal: a) 24-hour accumulations: 0 to 10 day lead times, b) 5-day accumulations: 0-4 day, 5-9 day, 10-14 day lead times
Other resolutions can be provided
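As an illustration of how the two temporal scales relate, the sketch below sums consecutive 24-hour accumulations into non-overlapping 5-day windows (0-4, 5-9, 10-14 day lead times). The function name `accumulate_windows` and the rainfall values are hypothetical, and the sketch assumes daily values are available for every lead day in each window; it is not the operational processing code.

```python
def accumulate_windows(daily_mm, window_days=5):
    """Sum consecutive 24-hour accumulations into non-overlapping windows.

    daily_mm: list of 24-hour totals (mm), index = lead day (0, 1, 2, ...).
    Returns a list of (first_lead_day, last_lead_day, total_mm) tuples.
    Trailing days that do not fill a whole window are ignored.
    """
    windows = []
    for start in range(0, len(daily_mm) - window_days + 1, window_days):
        chunk = daily_mm[start:start + window_days]
        windows.append((start, start + window_days - 1, sum(chunk)))
    return windows


# Made-up 24-hour forecast totals for lead days 0..14 (mm).
daily = [2.0, 0.0, 5.5, 1.0, 0.5,
         10.0, 12.0, 3.0, 0.0, 0.0,
         1.0, 1.0, 1.0, 1.0, 1.0]

pentads = accumulate_windows(daily)
for first, last, total in pentads:
    print(f"lead days {first}-{last}: {total:.1f} mm")
```

The same function can produce other resolutions by changing `window_days`.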
Data availability details (cont)
FTP Access: FTP site (automated “push” or “pull”)
Example: ftp://ftp.rap.ucar.edu/incoming/irap/tigge/ provides operational daily ensemble gridded precipitation (and other meteorological variables) forecasts, going out to 16-day lead times, for eight weather centers: CMA [China], CMC [Canada], CPTEC [Brazil], ECMWF [European Union], JMA [Japan], MeteoFrance [France], NCEP [USA], and UKMO [UK].
Over the past year, we have been working with Bhakra Beas hydrologists to provide access.
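An automated “pull” from the FTP site could look roughly like the sketch below, which uses Python's standard `ftplib`. The helper names (`split_ftp_url`, `list_remote_files`, `pull_listing`) are hypothetical, and the sketch assumes the server accepts anonymous logins, which the slides do not state. File names on the server are not assumed; the code only lists whatever is present.

```python
from ftplib import FTP
from urllib.parse import urlparse


def split_ftp_url(url):
    """Split an ftp:// URL into (host, directory path)."""
    parts = urlparse(url)
    return parts.hostname, parts.path


def list_remote_files(ftp, directory):
    """Change to `directory` on an open FTP connection and list its entries."""
    ftp.cwd(directory)
    return sorted(ftp.nlst())


def pull_listing(url):
    """Connect, log in anonymously (an assumption), and return the listing."""
    host, directory = split_ftp_url(url)
    with FTP(host, timeout=30) as ftp:
        ftp.login()  # anonymous login; real access may need credentials
        return list_remote_files(ftp, directory)


# Example call (requires network access, so it is left commented out):
# names = pull_listing("ftp://ftp.rap.ucar.edu/incoming/irap/tigge/")
```

A scheduled job (for example, a daily cron entry) calling `pull_listing` and then downloading new files would give the automated “pull” behaviour; a “push” would instead be initiated from the NCAR side.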
Data availability details (cont)
Other Access Methods: a) web-accessible files, b) web displays, c) email “alerts”
Discharge forecasts are translated from rainfall forecasts
Tech Transfer: All processing steps and web displays can be installed locally here in Bihar
Automated Gauge Quality Assurance System
Characteristics: a) Identifies > 99.99% of bad river gauge data, b) Automatically customizable to each river, c) Uses historical river data to derive QA parameters, d) Same system can be used for small and big rivers, e) Computationally efficient
Techniques employed: a) Identifies anomalously high/low values, b) Identifies unrealistically rapid rises/drops in river level, c) Employs a QA score to identify suspicious data
Future work: a) Compare with gauges upstream/downstream, b) Obtain more historical data to further refine QA parameters, c) Further develop existing QA algorithms to be more proficient, d) Add more complex QA algorithms to identify subtle errors, e) Employ satellite data for rivers/river sections without gauge data
Automated Quality Assurance System, e.g. river gauge
[Figure: Brahmaputra River at Guwahati. Legend: raw river gauge data, possibly bad data, bad data]
“I have a very strong feeling that science exists to serve human welfare. It’s wonderful to have the opportunity given us by society to do basic research, but in return, we have a very important moral responsibility to apply that research to benefiting humanity.”
Dr. Walter Orr Roberts (NCAR founder)
Quality Assurance of India River Level Gauge Data
Joe Grim, NCAR; Tom Hopson, NCAR; Satya Priya, World Bank
QA characteristics: a) Identifies > 99.99% of bad data, b) Automatically customizable to each river, c) Uses historical river data to derive QA parameters, d) Same system can be used for small and big rivers, e) Computationally efficient
QA procedure #1: Extract as much valid data as possible from the raw gauge level data
If a data value is unusable (e.g., level=asdfg), mark it as “missing”
QA procedure #2: Identify multiple reports at the same time at a given station
a) If all values are identical, keep only one
b) If there are only 2 non-identical values, QA flag the one with the greater difference from its nearest time neighbor
c) If there are more than 2 non-identical values, QA flag all except the one with the value closest to the group median
[Figure: Sonai River at Kashithal, example of a non-identical duplicate]
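QA procedures #1 and #2 can be sketched in a few lines of Python. This is a minimal illustration, not the NCAR implementation: the function names are hypothetical, identical repeats are collapsed before the rules are applied, and the "group median" is taken over the distinct values, both of which are assumptions.

```python
import math
from statistics import median


def parse_level(raw):
    """Procedure #1: return the level as a float, or None ("missing")
    when the raw value cannot be interpreted as a finite number."""
    try:
        value = float(str(raw).strip())
    except ValueError:
        return None
    return value if math.isfinite(value) else None


def resolve_duplicates(values, neighbor_value):
    """Procedure #2 for reports that share one timestamp at one station.

    values: levels reported at the same time.
    neighbor_value: level at the nearest time neighbor.
    Returns (kept_values, flagged_values).
    """
    unique = sorted(set(values))  # assumption: collapse identical repeats
    if len(unique) == 1:
        return unique, []  # all identical: keep one
    if len(unique) == 2:
        # Flag the value that differs more from the nearest time neighbor.
        flagged = [max(unique, key=lambda v: abs(v - neighbor_value))]
    else:
        # Flag everything except the value closest to the group median.
        mid = median(unique)
        best = min(unique, key=lambda v: abs(v - mid))
        flagged = [v for v in unique if v != best]
    kept = [v for v in unique if v not in flagged]
    return kept, flagged


print(parse_level("asdfg"))                      # unusable value
print(resolve_duplicates([4.2, 9.7], 4.3))       # two non-identical values
print(resolve_duplicates([4.1, 4.3, 9.7], 4.0))  # more than two values
```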
QA procedure #3: Identify data points with no nearby data points for comparison
If there are no other data within 10 days, mark the point with a QA flag
[Figure: Ganga River at Hathidah, example of an isolated point]
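A minimal sketch of the isolated-point check in procedure #3 follows. The function name `flag_isolated` is hypothetical, and the sketch assumes "within 10 days" means a neighbor exactly 10 days away still counts and that a neighbor on either side is sufficient.

```python
from datetime import datetime, timedelta


def flag_isolated(times, window=timedelta(days=10)):
    """Return one boolean per observation: True if no other observation
    lies within `window` before or after it.

    times: observation times, sorted in ascending order.
    """
    flags = []
    for i, t in enumerate(times):
        gaps = []
        if i > 0:
            gaps.append(t - times[i - 1])      # gap to previous report
        if i < len(times) - 1:
            gaps.append(times[i + 1] - t)      # gap to next report
        has_neighbor = any(gap <= window for gap in gaps)
        flags.append(not has_neighbor)
    return flags


# Made-up observation times: the 15 February report has no nearby data.
obs_times = [
    datetime(2012, 1, 1),
    datetime(2012, 1, 2),
    datetime(2012, 2, 15),
    datetime(2012, 4, 1),
    datetime(2012, 4, 3),
]
isolated = flag_isolated(obs_times)
```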
QA procedure #4: Identify exceptionally low/high values
First calculate the difference between the 10th and 90th percentiles for each station
Data above the upper limit are QA flagged
Data below the lower limit are QA flagged
The limit parameters were optimized to be as small as possible without falsely identifying any good data
Identifying Outliers: too high/low
[Figure: Ganga River at Allahabad. Annotations: too high, upper limit, 90th percentile, 10th percentile, lower limit, too low]
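One way the outlier limits of procedure #4 might be computed is sketched below. The exact form of the limits is an assumption: here the upper limit is the 90th percentile plus a multiple of the 10th-to-90th percentile spread, and the lower limit is the 10th percentile minus a multiple of it. The multipliers `k_high` and `k_low` are hypothetical stand-ins for the optimized parameters, and their default values are placeholders.

```python
from statistics import quantiles


def derive_limits(history, k_high=1.0, k_low=1.0):
    """Derive (lower_limit, upper_limit) from a station's historical levels.

    Assumed form (not confirmed by the slides):
        upper = P90 + k_high * (P90 - P10)
        lower = P10 - k_low  * (P90 - P10)
    """
    deciles = quantiles(history, n=10, method="inclusive")
    p10, p90 = deciles[0], deciles[8]
    spread = p90 - p10
    return p10 - k_low * spread, p90 + k_high * spread


def classify(value, lower, upper):
    """Label a single level against the station limits."""
    if value > upper:
        return "too high"
    if value < lower:
        return "too low"
    return "ok"


# Made-up historical levels (m) for one station.
history = [float(v) for v in range(1, 12)]   # 1.0, 2.0, ..., 11.0
lower, upper = derive_limits(history)
labels = [classify(v, lower, upper) for v in (5.0, 25.0, -10.0)]
```

Because the limits come from each station's own history, the same code applies to small and big rivers without manual tuning of absolute thresholds.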
QA procedure #5: Identify data points that suggest a very rapid time rate of change on either side (e.g., a data “spike”)
a) Calculate the change in level per unit time, before and after, for all data points; then sum the before and after rates
b) Calculate the 95th percentile of all summed rates for each station
c) Identify all summed rates that exceed the upper threshold (parameter = 16) as “bad”
d) Give smaller summed rates that exceed the lower threshold (parameter = 5), but are less than the upper threshold, a 0-99 QC “score”
[Figure: Brahmaputra River at Guwahati. Annotations: “questionable” spikes; “bad” spikes. Sometimes good data are identified as “questionable”]
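The spike test of procedure #5 can be illustrated with the sketch below. Several details are assumptions, since the slides do not give the formulas: both rates are measured relative to the point itself so that they reinforce for a spike and cancel for a steady rise or fall, the two thresholds are taken as 16 and 5 times the station's 95th percentile, and the 0-99 score is scaled linearly between the thresholds. All function names are hypothetical.

```python
from statistics import quantiles


def summed_rates(times, levels):
    """Absolute sum of the rate of change before and after each point.

    times: observation times in days (ascending); levels: gauge levels.
    End points have only one neighbor and get None.
    """
    out = [None] * len(levels)
    for i in range(1, len(levels) - 1):
        before = (levels[i] - levels[i - 1]) / (times[i] - times[i - 1])
        after = (levels[i] - levels[i + 1]) / (times[i + 1] - times[i])
        out[i] = abs(before + after)
    return out


def station_p95(rates):
    """95th percentile of a station's summed rates (needs >= 2 values)."""
    valid = [r for r in rates if r is not None]
    return quantiles(valid, n=20, method="inclusive")[18]


def score_spikes(rates, p95, bad_factor=16.0, suspect_factor=5.0):
    """Label each point "ok", "questionable" (with 0-99 score), or "bad"."""
    results = []
    for r in rates:
        if r is None or r <= suspect_factor * p95:
            results.append(("ok", None))
        elif r > bad_factor * p95:
            results.append(("bad", None))
        else:
            # Assumed linear scaling between the two thresholds.
            frac = (r - suspect_factor * p95) / ((bad_factor - suspect_factor) * p95)
            results.append(("questionable", min(99, int(99 * frac))))
    return results


# Made-up daily levels with a spike on day 2; p95 = 0.2 stands in for a
# value derived from the station's full history via station_p95().
times = [0.0, 1.0, 2.0, 3.0, 4.0]
levels = [10.0, 10.1, 13.0, 10.3, 10.4]
rates = summed_rates(times, levels)
labels = score_spikes(rates, p95=0.2)
```

In this example the spike itself is labelled "bad", while its two good neighbors are labelled "questionable", which mirrors the behaviour noted for the Brahmaputra example.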
QA on satellite-derived river level
M/C ratio is correlated with the fraction of the pixel covered by water
Future work ideas: a) Collect more historical data to capture the full annual monsoon cycle, b) Compare with upstream/downstream gauges, c) Further develop existing QA algorithms to be more proficient, d) Add more complex QA algorithms to identify subtle errors