Earth Science Datacasting: Informed Pull of GHRSST Data Using Really Simple Syndication (RSS)
Andrew Bingham, Bob Deen, Kevin Hussey, Tim Stough, Sean McCleese, Nick Toole
Funded through NASA ROSES/ACCESS Program
Overview
- Overview of RSS and podcasting
- Datacasting approach
- Demo
- GDAC use and request to data providers to consider setting up Datacasting Feeds
The Datacasting Concept
- Enable discovery and/or automatic download of only the data files that meet a predefined need
  - Need all data for a specific event (flood, fire, storm, oil spill, etc.)
  - Need only cloud-free data
  - Need only data that meet (or fail) a quality threshold
- Datacasting allows users to easily and simply get at the data they need to complete a specific task:
  - Eliminates costly manual data searches
  - Automatically downloads the data that are needed
  - Provides automated data thinning and consequently reduces data volume
  - Enables subscriptions
  - Builds on common web-based standards (RSS)
- A possible solution for integrating and correlating disparate information sources
Really Simple Syndication (RSS)
RSS is a family of web feed formats for publishing frequently updated content such as blog entries, news headlines or podcasts.
Example feed content:
- Channel: NYT > Home Page; New York Times > Breaking News, World News; en-us; Copyright 2007 The New York Times Company; Sun, 21 Oct :05:01 GMT
- Item: Tighter Border Delays Re-entry by U.S. Citizens. After decades of being waved through, Americans returning from Mexico are increasingly being checked and questioned like foreigners, leading to wait times of two hours or more. JULIA PRESTON. Sun, 21 Oct :55:12 GMT
- Item: In Myanmar, Fear Is a Constant Companion. Beneath the ominous calm that has settled since the recent uprising is anger, uncertainty and hopelessness. CHOE SANG-HUN. Sun, 21 Oct :57:29 GMT
Aggregation Services
Podcasting
Example feed content:
- Channel: All About Everything; en-us; A show about everything; John Doe; A show about everything
- Item: Shake Shake Shake Your Spices; John Doe; A short primer on table spices; This week we talk about salt and pepper shakers; Wed, 15 Jun :00:00 GMT; 7:04
Datacasting Approach
Datacasting contributes to the RSS family by publishing updated Earth Science data streams.
Scenario:
- Data providers create RSS feeds for a data stream and update them each time a new data file is available
- The feed uses the RSS standard, with extensions that describe:
  - The web location of the file for downloading
  - The description & structure of the files for data extraction
  - The content & meaning of the data for filtering
- Users subscribe to RSS feeds and are notified when new data files are available
- Users create "filters" based on a need, which in turn select and download the files
- After the file is on the user's hard drive, the user has the option to visually inspect the data
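The scenario above can be sketched in a few lines of Python. This is a minimal illustration, not the Datacasting Feed Reader itself: the namespace `EX_NS`, the `cloudFreePercent` element, the sample feed and all `example.org` URLs are assumptions made up for the example. Only the `item` and `enclosure` elements come from standard RSS 2.0.

```python
import xml.etree.ElementTree as ET

# Hypothetical extension namespace and element name, for illustration only.
EX_NS = "http://example.org/datacasting-sketch"

# A made-up feed with two granules; not a real data stream.
SAMPLE_FEED = f"""<rss version="2.0" xmlns:ex="{EX_NS}">
  <channel>
    <title>Example SST feed</title>
    <item>
      <title>Granule A</title>
      <enclosure url="http://example.org/data/a.nc.gz" length="1000"
                 type="application/x-gzip"/>
      <ex:cloudFreePercent>82.5</ex:cloudFreePercent>
    </item>
    <item>
      <title>Granule B</title>
      <enclosure url="http://example.org/data/b.nc.gz" length="2000"
                 type="application/x-gzip"/>
      <ex:cloudFreePercent>12.0</ex:cloudFreePercent>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return one dict per <item>: title, download URL, and custom metadata."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        cloud = item.find(f"{{{EX_NS}}}cloudFreePercent")
        items.append({
            "title": item.findtext("title"),
            "url": enclosure.get("url") if enclosure is not None else None,
            "cloud_free_percent": float(cloud.text) if cloud is not None else None,
        })
    return items


def select_urls(items, min_cloud_free):
    """Keep only download URLs whose granule meets the cloud-free threshold."""
    return [
        it["url"]
        for it in items
        if it["url"] is not None
        and it["cloud_free_percent"] is not None
        and it["cloud_free_percent"] >= min_cloud_free
    ]


if __name__ == "__main__":
    for url in select_urls(read_items(SAMPLE_FEED), 50.0):
        print("would download:", url)
```

The filter runs on the feed metadata alone, so files that fail the threshold are never transferred.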
RSS Feeds for Datacasting
- Dataset info & definition of custom metadata: GHRSST-L4 Hurricane Sea Surface Temp; Daily Sea Surface Temperatures from GHRSST for Hurricanes; GHRSST: UKMO's OSTIA, NCDC's AVHRR and AMSR+AVHRR; Wed, 17 Oct :52:12 GMT; Datacasting Feed Publishing Tools
- Granule info (including custom metadata values): NARI at :00 GMT; Mon, 17 Sep :00:00 GMT; Mon, 17 Sep :59:00 GMT; <enclosure url=" 12W.nc.gz" length="299728" type="application/x-gzip"/>; OSTIA12W.jpg; Wed, 19 Sep :55:00 GMT; PO.DAAC Hurricanes
System Configuration
- Data provider creates XML feeds and provides access to files using the Datacasting Feed Publishing software
- Users subscribe and download relevant files using the Datacasting Feed Reader software
Datacasting Tools
Publishing Software
- A set of easy-to-use, portable, Python-based tools for publishing your data
- Text-based configuration file specifies location of repository, data format, and other information for the feed
Feed Reader
- Uses a reader-type interface
- Subscribes to feeds
- Filters available data files and downloads only what the user needs
- Stores and manages files for later use
- Previews data for interactive selection
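As a rough sketch of the publishing side, the snippet below reads a text configuration and writes a plain RSS 2.0 feed with one `enclosure` per data file. It does not reproduce the actual Datacasting publishing tools or their configuration format: the `[feed]` section, its keys, the `build_feed` helper and the `example.org` URLs are all hypothetical.

```python
import configparser
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime

# Hypothetical configuration layout; the real tools may use different keys.
EXAMPLE_CONFIG = """
[feed]
title = Example SST feed
link = http://example.org/feeds/sst
description = Illustrative feed, not a real data stream
base_url = http://example.org/data/
"""


def build_feed(config_text, granules):
    """Build an RSS 2.0 document (as a string) from config text and granules.

    Each granule is a dict with 'name', 'size' (bytes) and 'published'
    (a timezone-aware UTC datetime).
    """
    config = configparser.ConfigParser()
    config.read_string(config_text)
    feed = config["feed"]

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    for tag in ("title", "link", "description"):
        ET.SubElement(channel, tag).text = feed[tag]

    for granule in granules:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = granule["name"]
        # RSS expects RFC 822 style dates, e.g. "... 00:00:00 GMT".
        ET.SubElement(item, "pubDate").text = format_datetime(
            granule["published"], usegmt=True
        )
        ET.SubElement(
            item,
            "enclosure",
            url=feed["base_url"] + granule["name"],
            length=str(granule["size"]),
            type="application/x-gzip",
        )
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    example_granules = [
        {
            "name": "example_granule.nc.gz",  # made-up file name
            "size": 1000,
            "published": datetime(2007, 9, 17, tzinfo=timezone.utc),
        }
    ]
    print(build_feed(EXAMPLE_CONFIG, example_granules))
```

Re-running such a script whenever a new file arrives would keep the feed current, which matches the "update each time a new data file is available" behaviour described for providers.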
Rich Metadata
- Filtering is a function of the metadata
- Some metadata are part of the core Datacasting XML specification, e.g.
  - Location or extent
  - Acquisition or start time
- Other metadata can be defined by the data provider and included in the XML feed, for example:
  - % of cloud-free pixels
  - Min, mean & max of data parameter
  - Event information captured in the data (algae blooms, hurricanes, fires, ...)
  - Quality control parameters
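One way to picture filtering as a function of the metadata is to treat each criterion as a small predicate and keep a granule only if every predicate passes. The sketch below assumes granule metadata has already been read into dicts; the key names (`start_time`, `extent`, `cloud_free_percent`) and helper names are hypothetical, and the bounding-box test ignores boxes that cross the antimeridian.

```python
from datetime import datetime, timezone


def in_time_window(start, end):
    """Predicate: granule start time falls inside [start, end]."""
    return lambda g: start <= g["start_time"] <= end


def intersects_box(west, south, east, north):
    """Predicate: granule extent (west, south, east, north) overlaps the box."""
    def check(g):
        g_west, g_south, g_east, g_north = g["extent"]
        return (g_west <= east and g_east >= west
                and g_south <= north and g_north >= south)
    return check


def at_least(key, threshold):
    """Predicate: a provider-defined numeric value meets a minimum."""
    return lambda g: g.get(key) is not None and g[key] >= threshold


def apply_filter(granules, predicates):
    """Keep granules for which all predicates are true."""
    return [g for g in granules if all(p(g) for p in predicates)]


if __name__ == "__main__":
    utc = timezone.utc
    # Made-up granule records for illustration.
    granules = [
        {"name": "a", "start_time": datetime(2007, 9, 17, tzinfo=utc),
         "extent": (-100.0, 10.0, -60.0, 40.0), "cloud_free_percent": 80.0},
        {"name": "b", "start_time": datetime(2007, 9, 18, tzinfo=utc),
         "extent": (100.0, -40.0, 140.0, -10.0), "cloud_free_percent": 95.0},
    ]
    wanted = apply_filter(granules, [
        in_time_window(datetime(2007, 9, 1, tzinfo=utc),
                       datetime(2007, 9, 30, tzinfo=utc)),
        intersects_box(-90.0, 15.0, -70.0, 30.0),
        at_least("cloud_free_percent", 50.0),
    ])
    print([g["name"] for g in wanted])
```

Core metadata (time, extent) and provider-defined metadata (cloud-free percentage) are handled the same way here, so a provider can add a new field and users can filter on it without any change to the filtering logic.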
Demo
Feed Reader start up
Acquire Datacasting Feed URL
Add Datacasting Feed
Interrogate feed
Add RSS feed
Create a filter
Reduced list based on need – configure to download file
Combine other Datacasting Feeds
Mashup - Combining data and information from more than one source into a single integrated tool - Creating a new and distinct web service that was not originally envisaged by either source
Now What?
- Need data providers to create Datacasting Feeds
- Need suggestions on what metadata should be included in the GDAC feeds
  - Current:
    - Time, space extents
    - % of valid pixels
    - Max & Min SST, AOD, SSI, Wind Speed, Deviation from previous day SST
  - Anything else?
    - Quality parameters? E.g. GMPE or Pixie metrics
    - Statistical parameters that summarize the granule content
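To make "statistical parameters that summarize the granule content" concrete, a provider might compute values like the following before publishing a feed item. This is only a sketch on a flat list of numbers: the `summarize` helper, the fill value of -999.0 and the sample values are assumptions, and a real granule would be read from its data file rather than typed in.

```python
def summarize(values, fill_value=None):
    """Return percent of valid pixels plus min, max and mean of valid values.

    A value counts as invalid if it is None or equals fill_value.
    """
    valid = [v for v in values if v is not None and v != fill_value]
    if not valid:
        return {"valid_percent": 0.0, "min": None, "max": None, "mean": None}
    return {
        "valid_percent": 100.0 * len(valid) / len(values),
        "min": min(valid),
        "max": max(valid),
        "mean": sum(valid) / len(valid),
    }


if __name__ == "__main__":
    # Made-up SST values in kelvin, with one fill value.
    stats = summarize([290.0, 292.0, -999.0, 294.0], fill_value=-999.0)
    print(stats)
```

Publishing such summaries in the feed lets users filter on them (for example, a minimum percentage of valid pixels) without opening the data file.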