Published by Imogene Riley. Modified over 9 years ago.
1
Experiences of an Earth Science Data User: Confessions of a Data Hoarder
Rob Carver, The Weather Company
2
“Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.” –Andrew S. Tanenbaum
3
Open Data and The Weather Company
❖ Our business model is taking open data and using it to tell interesting stories that engage our users.
❖ Over the years, we’ve archived over 100 TB of data
❖ GRIB1, GRIB2, NIDS, shapefiles, netCDF, HDF5
❖ NWS/NCEP, NCDC, FEMA, Census Bureau, NASA DAACs
4
Locating Data
1. Google and literature searches
2. ???
3. Data!
5
100+ TB of Weather Models
❖ Most data arrives through Unidata’s LDM and FTP pull scripts; ECMWF pushes data to our FTP site (all GRIB2/GRIB1)
❖ Ingested into the forecast system; GrADS handles the model visualization
❖ Archived to local disk arrays and Amazon S3
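The slide does not show the pull scripts themselves. As a minimal sketch, assuming an anonymous FTP server with a flat directory of files and a flat local archive directory (the host, paths, and function names here are hypothetical), an incremental pull only needs to fetch the names the archive does not already hold:

```python
import ftplib
import os
from pathlib import Path


def files_to_fetch(remote_names, local_names):
    """Return the remote file names missing from the local archive, sorted."""
    return sorted(set(remote_names) - set(local_names))


def pull(host, remote_dir, local_dir, user="anonymous", passwd=""):
    """Fetch every file in remote_dir that local_dir does not already hold (hypothetical helper)."""
    archive = Path(local_dir)
    archive.mkdir(parents=True, exist_ok=True)
    local_names = {p.name for p in archive.iterdir() if p.is_file()}
    with ftplib.FTP(host) as ftp:
        ftp.login(user=user, passwd=passwd)
        ftp.cwd(remote_dir)
        # Some servers return full paths from NLST, so keep only the base name.
        remote_names = [os.path.basename(n) for n in ftp.nlst()]
        for name in files_to_fetch(remote_names, local_names):
            partial = archive / (name + ".part")
            with open(partial, "wb") as fh:
                ftp.retrbinary(f"RETR {name}", fh.write)
            # Rename only after the transfer finishes, so an interrupted
            # download never looks like an archived file.
            partial.replace(archive / name)
```

A call such as `pull("ftp.example.com", "/pub/model", "archive/model")` would be run from cron after each model cycle; because only missing names are fetched, rerunning it after a failure simply resumes the backlog.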
6
Level-III NIDS Archive
❖ NCDC maintains an archive of the WSR-88D radar network’s products from 1995 to the present (>10 TB)
❖ Datasets are ordered from a tape-based archive
❖ It took two years to acquire the archive using a set of PHP scripts
❖ Easier to acquire the entire archive than to figure out which subset to acquire
❖ We already had a NIDS parser for visualization
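The existing NIDS parser is not shown. As a hedged illustration of the first step any such parser performs, this sketch decodes the 18-byte big-endian message header that, as we read the NEXRAD Level-III product specification, opens each product. The function and constant names are hypothetical, and archived files may prepend a WMO text header that has to be skipped first, which is why the sketch takes an `offset`.

```python
import struct
from datetime import datetime, timedelta, timezone

# Message header fields (big-endian): message code, date, time, message length,
# source ID, destination ID, number of blocks. 2+2+4+4+2+2+2 = 18 bytes.
MESSAGE_HEADER = struct.Struct(">HHIIhhH")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_message_header(buf, offset=0):
    """Decode the message header that starts at `offset` in a Level-III product."""
    code, days, seconds, length, source, dest, blocks = MESSAGE_HEADER.unpack_from(buf, offset)
    # The date counts days with 1 January 1970 as day 1, and the time is
    # seconds after midnight UTC.
    when = EPOCH + timedelta(days=days - 1, seconds=seconds)
    return {
        "code": code,
        "time": when,
        "length": length,
        "source_id": source,
        "dest_id": dest,
        "blocks": blocks,
    }
```

The product-specific blocks that follow the header are where most of a real parser's work lies; the header alone is enough to index a large archive by product type and time.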
7
FEMA Flood Maps
❖ Data acquisition method: a DVD for each state
❖ Format: ESRI shapefiles (one shapefile per feature class per state)
❖ Data display: split state shapefiles by county, then pre-render tiles for moderate-to-coarse zoom levels on a map mashup
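The slide does not say which tiling scheme the mashup used. Assuming the common Web Mercator "XYZ" (slippy-map) scheme, the tiles to pre-render for one county follow directly from its bounding box. The function names are hypothetical, and extracting the bounding box from the split county shapefile is not shown.

```python
import math

MAX_LAT = 85.0511287798  # Web Mercator cannot represent latitudes beyond this


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) XYZ tile containing a lon/lat point at a zoom level."""
    n = 2 ** zoom
    lat = max(-MAX_LAT, min(MAX_LAT, lat))
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp to the valid range so points on the east or south edge stay in the last tile.
    return max(0, min(x, n - 1)), max(0, min(y, n - 1))


def tiles_for_bbox(min_lon, min_lat, max_lon, max_lat, zooms):
    """Yield (zoom, x, y) for every tile overlapping a bounding box."""
    for z in zooms:
        # Tile y grows southward, so the north-west corner gives the minimum x and y.
        x0, y0 = lonlat_to_tile(min_lon, max_lat, z)
        x1, y1 = lonlat_to_tile(max_lon, min_lat, z)
        for x in range(x0, x1 + 1):
            for y in range(y0, y1 + 1):
                yield z, x, y
```

The tile count for a fixed area grows roughly fourfold with each zoom level, which is one plausible reason to limit pre-rendering to moderate and coarse zooms.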
8
Suggestions
❖ Data in a difficult or proprietary format just wastes disk space
❖ Please use data formats that are well supported by open-source software packages (e.g., OGR/GDAL)
❖ netCDF, TIFF, ESRI shapefiles, HDF5, GeoJSON
❖ Instead of complex CSV or fixed-width text files, use self-describing formats (JSON, XML, SQLite)
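To illustrate the last point, here is a minimal sketch that converts fixed-width text records into JSON. The column layout and field names are hypothetical; the point is that once converted, each value carries its own name, so a reader no longer needs a separate column specification to interpret the file.

```python
import json

# Hypothetical record layout: (field name, start column, end column, converter),
# with 0-based, end-exclusive column positions.
LAYOUT = [
    ("station", 0, 6, str.strip),
    ("date", 6, 14, str.strip),
    ("tmax_f", 14, 19, float),
    ("precip_in", 19, 25, float),
]


def fixed_width_to_records(lines, layout=LAYOUT):
    """Parse fixed-width lines into dicts keyed by field name, skipping blank lines."""
    records = []
    for line in lines:
        if not line.strip():
            continue
        records.append({name: convert(line[start:end]) for name, start, end, convert in layout})
    return records


def to_json(records):
    """Serialize parsed records as self-describing JSON."""
    return json.dumps(records, indent=2)
```

With a fixed-width file, every consumer has to re-implement `LAYOUT` correctly; with the JSON output, the field names travel with the data.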
9
Suggestions (cont.)
❖ Data and navigation files should use the same naming conventions and sequences
❖ Don’t use overly large archive files
❖ Data pools and FTP servers attached to large disk arrays are awesome data providers (as long as limits are in place)
❖ For really large, static datasets (>10 GB), BitTorrent would be really useful
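A small sketch of why matching naming conventions help: if data and navigation files share the same timestamp token (a hypothetical `name_YYYYMMDD_HHMM` convention, with hypothetical file and function names), pairing them is a simple key lookup, and any file without a partner shows up immediately.

```python
import re

# Hypothetical convention: data and navigation files share a timestamp token,
# e.g. "sst_20140115_1200.dat" and "sst_20140115_1200.nav".
KEY = re.compile(r"(\d{8}_\d{4})")


def index_by_key(names):
    """Map each timestamp token to the file name that contains it."""
    out = {}
    for name in names:
        match = KEY.search(name)
        if match:
            out[match.group(1)] = name
    return out


def pair_files(data_names, nav_names):
    """Return (data, nav) pairs sharing a timestamp, plus timestamps with no partner."""
    data, nav = index_by_key(data_names), index_by_key(nav_names)
    pairs = [(data[k], nav[k]) for k in sorted(data.keys() & nav.keys())]
    orphans = sorted(data.keys() ^ nav.keys())
    return pairs, orphans
```

When the two file types follow different conventions, this lookup turns into bespoke, per-dataset mapping code, which is the cost the suggestion is trying to avoid.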
10
Questions/Comments/Answers?
❖ rob.carver@weather.com