Making Climate Change Data Easier to Find and Use Michael Corsello Seshu Vaddey
Climate Change is a Paradigm Shift
Otherwise, we are using old analytical techniques, designed for an old paradigm, and applying them to a new paradigm of problems.
Example
You get new Climate Change data. What’s the first thing you do? Try to put it into Excel.
Take a closer look at Climate Change data
UW CIG CBCCSP:
- 2 emission scenarios
- 10 GCMs
- 3 downscaling methods
From an available total of:
- 6 emission scenarios
- 23 GCMs
- Multiple approaches
Take a closer look at Climate Change data
Total size of data produced: ~32 TB

Subset                                          Size        % of Total
Individual hydrologic projection (297 sites)    ~1.3 GB     0.004 %
Hydrology (297 sites, all projections)          ~18.5 GB    0.06 %
Temp & Precip data (2 of 21 parameters):
  Monthly grids (all HD projections)            ~65 GB      0.20 %
  Daily grids (all HD projections)              ~2.4 TB     7.5 %

The 21 parameters:
- Daily total precipitation
- Daily average temperature
- Daily maximum temperature
- Daily minimum temperature
- Outgoing longwave radiation
- Incoming shortwave radiation
- Relative humidity
- Vapor pressure deficit
- Daily evapotranspiration
- Daily runoff
- Daily baseflow
- Soil moisture, layer 1
- Soil moisture, layer 2
- Soil moisture, layer 3
- Snow water equivalent
- Snow depth
- Potential evapotranspiration 1
- Potential evapotranspiration 2
- Potential evapotranspiration 3
- Potential evapotranspiration 4 (alfalfa)
- Potential evapotranspiration 5
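The percent-of-total figures in the size breakdown above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming decimal units (1 TB = 1000 GB); the dictionary keys and the helper name `percent_of_total` are illustrative and not part of the original slides.

```python
# Sizes as quoted on the slide, in gigabytes.
# Assumption: decimal units, so ~32 TB is treated as 32,000 GB.
TOTAL_GB = 32_000

subsets_gb = {
    "Individual hydrologic projection (297 sites)": 1.3,
    "Hydrology (297 sites, all projections)": 18.5,
    "Temp & precip monthly grids": 65,
    "Temp & precip daily grids": 2_400,
}


def percent_of_total(size_gb, total_gb=TOTAL_GB):
    """Return size_gb expressed as a percentage of total_gb."""
    return 100.0 * size_gb / total_gb


# Share of the full archive taken up by each subset.
shares = {name: percent_of_total(size) for name, size in subsets_gb.items()}

for name, pct in shares.items():
    print(f"{name}: {pct:.3g} %")
```

Even the largest subset listed, the daily temperature and precipitation grids, is well under a tenth of the full archive.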
Working with Climate Change data
The Challenge:
- Volume of data swamps cyber infrastructure
- Steep learning curves to use new tools
- Tools are always changing
Enter the Web and Cloud computing
- Software as a Service
- Platform as a Service
- Infrastructure as a Service
Enterprise Data Management Move away from data living on our computers
Enterprise Data Management
- The data and tools / applications now reside on servers (Cloud)
- The data is now more crucial than ever
- We all “share” common sets of data “through” the cloud
Summary
- The need for a paradigm shift in how we work
- This new paradigm must provide for ease of use and value to the organization (Return on Investment)
- CRF is working towards this goal
- We need users across different domains to work with us
Questions? Blog: Breakout Discussion Session Wednesday at 10am
CRF Developed Solution
CRF Developed Solution
Develop a series of database structures based upon “real-world things” (like flows)
CRF Developed Solution
Organize these structures into separate databases for each “domain aspect”, rather than a single monolithic database.
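The idea of one database per “domain aspect” can be illustrated with a small sketch. The slides do not give the actual CRF schemas, so the domain names (`hydrology`, `climate`), the tables, the columns, and the sample row below are all hypothetical. In-memory SQLite databases stand in for whatever database system is really used.

```python
import sqlite3

# Hypothetical domain aspects, each with its own table built around a
# "real-world thing" (a flow, a temperature). Not the actual CRF schemas.
DOMAIN_SCHEMAS = {
    "hydrology": "CREATE TABLE flow (site_id TEXT, obs_date TEXT, flow_cfs REAL)",
    "climate": "CREATE TABLE temperature (cell_id TEXT, obs_date TEXT, temp_c REAL)",
}


def open_domain_databases(schemas):
    """Open one independent database per domain aspect and create its table."""
    connections = {}
    for domain, ddl in schemas.items():
        # In-memory stand-in for a separate database dedicated to this domain.
        conn = sqlite3.connect(":memory:")
        conn.execute(ddl)
        connections[domain] = conn
    return connections


databases = open_domain_databases(DOMAIN_SCHEMAS)

# Made-up sample row, written only to the hydrology database.
databases["hydrology"].execute(
    "INSERT INTO flow VALUES (?, ?, ?)", ("SITE_001", "2040-06-01", 1250.0)
)
rows = databases["hydrology"].execute(
    "SELECT site_id, flow_cfs FROM flow"
).fetchall()
```

Because each domain has its own connection, the `flow` table exists only in the hydrology database. That separation is the point of avoiding a single monolithic database.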
CRF Developed Solution Cloud Based Data Warehouse
Maximize Value of Climate Data
The real challenge with CC data is keeping track of metadata
- Metadata is data about data
- What about the metadata for the metadata?
- Can the metadata be data itself?
- There is no real “metadata”
- It’s all about perspective
- Metadata from one perspective is data in another
- The data model is the key
Metadata Examples
- An important form of metadata is “chain of custody” (provenance)
  - Talks about the process by which data originates
  - What processing methods were used?
  - What was the source data?
  - Who did the work?
- Another important form of metadata is descriptive
  - When was the sensor last calibrated?
  - What was the nominal error as defined by the manufacturer?
  - What is the temporal nature of the data (does it “expire”)?
  - What about licensing info?
- Metadata can often be “linked” rather than “stored”
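One way to picture “linked” rather than “stored” metadata is a provenance record kept once and referenced by key from every dataset it describes. This is a minimal sketch under assumed names: the classes `Provenance` and `Dataset`, the key `prov-001`, and all field values are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Provenance:
    """Chain-of-custody metadata: source data, methods, and who did the work."""
    source_data: str
    processing_methods: List[str]
    performed_by: str


@dataclass(frozen=True)
class Dataset:
    """A dataset that links to its provenance by key instead of embedding a copy."""
    name: str
    provenance_key: str


# The provenance record is stored once; from the registry's perspective
# it is simply data. All values here are illustrative placeholders.
provenance_registry: Dict[str, Provenance] = {
    "prov-001": Provenance(
        source_data="example GCM output",
        processing_methods=["downscaling", "hydrologic modeling"],
        performed_by="example analyst",
    )
}

# Two datasets share one provenance record through the same link.
datasets = [
    Dataset("monthly_flow_site_A", "prov-001"),
    Dataset("monthly_flow_site_B", "prov-001"),
]


def provenance_of(dataset, registry=provenance_registry):
    """Follow the link from a dataset to its provenance record."""
    return registry[dataset.provenance_key]
```

Both datasets resolve to the same record, so a correction to the provenance is made in one place.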
The real challenge with Climate Change?
- We want the ONE true answer to Climate Change
- The rest of the data is meaningless
- Because the paradigm we work with is deterministic
- We have a hard time dealing with uncertainty
Cloud Computing Basics
- Move computing from device oriented to resource oriented
  - Give me enough computing resources to get an answer
  - I don’t care where
- Software as a Service
  - Software is delivered as an online service
  - Salesforce.com, Mint.com, Office 365
- Platform as a Service
  - A software platform (e.g. SharePoint, Drupal) is provided as a service
  - Your agency customizes the platform to your needs
- Infrastructure as a Service
  - You rent “virtual machines” and set them up as you see fit
  - Basically a “virtual” computer
  - Add or remove machines “on-demand”
Data Models
Workflows
More data to manage as we create more data:
- All of our “final” data
- Much of our “working” data
Management translates to:
- Ease of access to data
- Analysis / modeling with data
- Results & reporting
- Store results for future use
CRF Developed Solution Developed Web and Desktop Tools to Access the Database(s)