PRP Weather Data Transfer
PRPv2 Workshop, February 21, 2017

Scott L. Sellars (1), Phu Nguyen (2), John Graham (3), Joulien Tatar (3), Larry Smarr (3), Tom DeFanti (3), F. Martin Ralph (1), and Soroosh Sorooshian (2)

(1) Center for Western Weather and Water Extremes, UCSD, La Jolla, CA
(2) Center for Hydrometeorology and Remote Sensing, UCI, Irvine, CA
(3) California Institute for Telecommunications and Information Technology (Calit2), UCSD, La Jolla, CA
Weather Data Transfer

Real-time observations and models
    Satellites, ground-based observations, numerical weather prediction
"Reanalysis" data
    NASA
        (M2I3NPASM): Assimilated Meteorological Fields (~15TB)
        (M2T1NXADG): Aerosol Diagnostics (extended) (~15TB)
        (M2I3NVAER): Aerosol Mixing Ratio (~15TB)
        (M2I3NVASM): Assimilated Meteorological Fields
        (M2I3NVCHM): Carbon Monoxide and Ozone Mixing Ratio (~8TB)
        (M2I6NVANA): Analyzed Meteorological Fields (~15TB)
        (M2T1NXLND): Land Surface Diagnostics (~5TB)
    NCEP/NCAR
    ECMWF (ERA-40)
    NOAA
Climate models
    CMIP5 (2,200TB)
    CMIP6 (?)
1127 variables…
[Figure: NASA MERRA2 IVT (kg m^-1 s^-1); axes: longitude, latitude]
Big Data Transformed Into Insight

Novel approach in Computational Earth Sciences (Sellars et al. 2013, 2015)
Dr. Scott L. Sellars and Dr. Phu Nguyen (among others)

CONNECT: Object Segmentation
Object Storage (PostgreSQL)

[Figure: Data Hypercube, 5mm/hr rainfall; axes: longitude, latitude, time (t=1 to t=5)]

Database Indexes:
    Object ID Number
    Latitude (of each voxel in objects)
    Longitude (of each voxel in objects)
    Time (hour)
    Precipitation Intensity (mm/hour)

Set Object Criteria:
    Each voxel must have 1mm/hr
    Each object must exist for 24 hours
    6 voxel connections (i.e. voxel face connections)

Data:
    60N-60S, 0-360 lat and long
    Hourly time step
    March 1st, 2000 to January 1st, 2011
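The object criteria on this slide can be illustrated with a minimal sketch of face-connected (6-neighbour) segmentation over a sparse set of voxels. This is not the CONNECT implementation: the function name `segment_objects`, the dictionary input keyed by `(time, lat_index, lon_index)`, and the reading of "must have 1mm/hr" as "at least 1 mm/hr" are all assumptions made for illustration, and longitude wraparound at 0/360 is not handled.

```python
from collections import deque

# The six face-sharing neighbours of a voxel in (time, lat, lon) index space.
FACE_OFFSETS = [
    (1, 0, 0), (-1, 0, 0),
    (0, 1, 0), (0, -1, 0),
    (0, 0, 1), (0, 0, -1),
]


def segment_objects(voxels, min_intensity=1.0, min_duration=24):
    """Group wet voxels into face-connected objects (hypothetical helper).

    voxels: dict mapping (t, lat_idx, lon_idx) -> rainfall in mm/hr.
    Returns a sorted list of objects; each object is a sorted list of
    voxel keys. Objects spanning fewer than min_duration hourly time
    steps are dropped.
    """
    # Assumption: a voxel qualifies when its intensity is >= the threshold.
    wet = {key for key, value in voxels.items() if value >= min_intensity}
    objects = []
    while wet:
        seed = wet.pop()
        members = [seed]
        queue = deque([seed])
        # Breadth-first flood fill across shared voxel faces only.
        while queue:
            t, i, j = queue.popleft()
            for dt, di, dj in FACE_OFFSETS:
                neighbour = (t + dt, i + di, j + dj)
                if neighbour in wet:
                    wet.remove(neighbour)
                    members.append(neighbour)
                    queue.append(neighbour)
        # Duration is the number of distinct hourly time steps in the object.
        hours = {t for t, _, _ in members}
        if len(hours) >= min_duration:
            objects.append(sorted(members))
    return sorted(objects)
```

Each returned voxel key maps naturally onto one row of the indexed table described on the slide (object ID, latitude, longitude, time, intensity), although the actual schema is not given in the source.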
Recent Big Data Analysis Pipeline: One Variable

Sites: NASA, CW3E, CHRS

    Stage                                  Data volume       Time
    Download                               2.4TB (CHRS)      7d 10h 49min
    Data organization, variable format     2.4TB to 100GB    10d 5h 05min
    CONNECT Segmentation                   100GB to 50GB     ~1d 14h 00m
    CONNECT Characteristic Calc.           50GB to 100MB     ~1d 5h 00m

Total time: ~20d 11h 0m
    Not including data visualization
    Not including data mining/machine learning jobs
    Assumes we know what we are doing

Downstream (not timed): Data Visualization and Search (CW3E); Data Mining and Discovery, Machine Learning
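As a quick check of the arithmetic, the four stage times quoted on this slide can be summed with Python's standard library. The stage names attached to each time below are an assumed pairing based on the order in which they appear on the slide; only the durations themselves come from the source.

```python
from datetime import timedelta

# Durations as quoted on the slide; the stage labels are an assumed pairing.
STAGE_TIMES = {
    "Download": timedelta(days=7, hours=10, minutes=49),
    "Data organization / variable format": timedelta(days=10, hours=5, minutes=5),
    "CONNECT segmentation": timedelta(days=1, hours=14),
    "CONNECT characteristic calculation": timedelta(days=1, hours=5),
}


def total_pipeline_time(stage_times):
    """Sum the per-stage durations into one timedelta."""
    return sum(stage_times.values(), timedelta())


total = total_pipeline_time(STAGE_TIMES)
print(total)  # 20 days, 10:54:00
```

The sum, 20d 10h 54m, agrees with the slide's rounded total of ~20d 11h.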
Pacific Research Platform (10-100 Gb/s)

[Diagram: SDSC's COMET; Calit2's FIONA; Calit2's FIONA]
Opportunities

    Data download from NASA (among others) to FIONA over the PRP network
        Download speed increased 4x on the PRP network: 40MB/s, versus the 10MB/s standard connection previously used by researchers at SIO
    Live Thematic Real-time Environmental Distributed Data Services (THREDDS) server
    Set up the CONNECT algorithm and workflow (and other research scripts) to harness FIONA technology, supercomputing resources, and the PRP
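To put the 4x figure in perspective, the sketch below estimates how long a 2.4 TB download (the volume quoted on the pipeline slide) would take at each rate. It is an idealized calculation, assuming a perfectly sustained transfer rate and decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes); the helper name `transfer_time_hours` is hypothetical, and real transfers will be slower.

```python
MB = 10**6   # assumption: decimal megabytes
GB = 10**9   # assumption: decimal gigabytes


def transfer_time_hours(size_bytes, rate_bytes_per_s):
    """Idealized transfer time in hours at a constant sustained rate."""
    if rate_bytes_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_bytes / rate_bytes_per_s / 3600.0


dataset = 2400 * GB  # 2.4 TB, the volume quoted on the pipeline slide

standard_hours = transfer_time_hours(dataset, 10 * MB)  # standard connection
prp_hours = transfer_time_hours(dataset, 40 * MB)       # PRP network

print(f"10 MB/s: {standard_hours:.1f} h")  # 10 MB/s: 66.7 h
print(f"40 MB/s: {prp_hours:.1f} h")       # 40 MB/s: 16.7 h
```

Under these assumptions the same dataset moves in roughly 17 hours instead of almost three days.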