Long Term Ecological Research Network Office
Trends Project
Spaghetti & Linguine (aka Trends Data Store)
Mark Servilla
14 September 2006
LNO NIS Table of Contents Background System Architecture System Workflow and Architecture Details Demonstration Screen Examples
Message from IMExec - Feb 2006
“IMExec suggests that this activity be used to scope and determine the feasibility of using EML in the development of NIS modules for solving general synthesis problems.”
“The premise of this project is that EML will adequately describe the data set (e.g., entities, attributes, physical characteristics) to allow the capture of distributed data sets into a central SQL database.”
“Determining the nature of this model for dynamic data delivery – whether it is more site-loaded or more (network) service-loaded – is critical.”
“IMExec suggests that the near-term Trends NIS module activity be focused on development of a prototype for demonstration at the ASM in September.”
Prerequisites
Site data are documented with “rich” and “complete” EML
Time-series data must be captured as “snapshots” for EML temporal coverage – i.e., no “continuous end date”
Site data are open and accessible through a standard protocol such as HTTP
Site EML documents are harvested on a regular basis into the LTER Metacat
What is EML?
Ecological Metadata Language is…
An ecological metadata standard
Very extensible; it can be used to describe many different types of data
Comprehensive, supporting a rich set of constructs to fully describe data, including
  – how to access distributed data
  – its logical and physical structure
Defined by an XML Schema
For further information: –
What is Metacat?
Metacat is…
A storage system for metadata and data (optimized for use with EML)
Built on top of a relational database system using Java servlets
Requires metadata to be in XML format
Provides a customizable web interface
Supports point-to-point replication
For further information: –
Trends Data Store Architecture
[Diagram. Components shown: Source A, Source B, Source C; EML; Metacat/Harvester; Dataset Registry; EML Parser/Loader; Primary Database (1°, source data); Data Integration/Transformation f(x); Secondary Database (2°, derived data); EML Factory (Derived Metadata, Source Provenance, Integration Methods, Trends Contact); EML.xml Trends Metadata; Trends Data Warehouse; Store Front (HTML, SOAP)]
Generalized Workflow
1. Sites collect and document time-series data (e.g., climate, social-economics, …)
2. Sites update EML with a new revision
3. EML is harvested into Metacat
4. EML Loader/Parser loads new/updated dataset into primary database
5. Data integration/transformation converts “raw” data into “derived” data
6. Derived data is stored in secondary database
7. EML is generated for derived data and is stored in Metacat
8. Derived data is made available to store front
Decomposed Workflow
LTER Site Data Collection
Time-series data
  – Physical environment (e.g., climate, …)
  – Human population and economy
  – Biogeochemistry
  – Biotic structure
Data/metadata
  – Relational Database
  – Spreadsheet
  – Text file
  – HTML/XML
EML, Metacat, and the Harvester
EML Package ID: knb-lter-site.XX.YY
knb-lter-sev knb-lter-sev knb-lter-sev
Metacat stores the XML of EML; new revisions take precedence – old revisions are deprecated, but not deleted
The Harvester is a time-based update process that “pulls” site EML and inserts it into Metacat
“independent of the Trends Project”
[Diagram: Source A, Source B, Source C; EML; Metacat/Harvester]
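The revision rule above can be illustrated with a minimal sketch. It assumes package IDs follow the scope.identifier.revision pattern shown on the slide, with numeric identifier and revision, and that "takes precedence" means the highest revision number wins. The function and class names are hypothetical and are not part of Metacat or the Harvester.

```python
import re
from typing import Dict, Iterable, NamedTuple, Tuple


class PackageId(NamedTuple):
    scope: str
    identifier: int
    revision: int


# Assumed shape: scope.identifier.revision, e.g. knb-lter-site.XX.YY with numeric XX and YY.
_PATTERN = re.compile(r"^(?P<scope>[A-Za-z0-9-]+)\.(?P<identifier>\d+)\.(?P<revision>\d+)$")


def parse_package_id(text: str) -> PackageId:
    """Split a package ID string into scope, identifier, and revision."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a scope.identifier.revision package ID: {text!r}")
    return PackageId(match["scope"], int(match["identifier"]), int(match["revision"]))


def latest_revisions(package_ids: Iterable[str]) -> Dict[Tuple[str, int], PackageId]:
    """Keep only the highest revision seen for each (scope, identifier) pair."""
    latest: Dict[Tuple[str, int], PackageId] = {}
    for text in package_ids:
        pid = parse_package_id(text)
        key = (pid.scope, pid.identifier)
        if key not in latest or pid.revision > latest[key].revision:
            latest[key] = pid
    return latest
```

Older revisions are simply ignored here rather than deleted, which mirrors the "deprecated, but not deleted" behavior only loosely.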
EML Loader/Parser
Dataset registry identifies Trends data in Metacat
New revisions assert a “new” data load
The EML parser/loader
  – Translates the site EML into the RDBMS DDL
  – Creates a new DB table in the primary database based on the revision
  – Loads the new data into the primary database
  – Triggers the workflow to continue
[Diagram: Source A, Source B, Source C; EML; Metacat/Harvester; Dataset Registry; EML Parser/Loader; Primary Database (1°)]
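A minimal sketch of the translate, create, and load steps, assuming the attribute names and types have already been extracted from the EML. The type mapping, the function names, and the use of SQLite are assumptions made for illustration; the slides do not say which RDBMS or type mapping the actual loader uses.

```python
import sqlite3
from typing import Iterable, List, Sequence, Tuple

# Hypothetical mapping from simplified attribute types to SQL column types.
TYPE_MAP = {"dateTime": "TEXT", "float": "REAL", "integer": "INTEGER", "string": "TEXT"}


def build_ddl(table: str, attributes: List[Tuple[str, str]]) -> str:
    """Translate (name, type) attribute pairs into a CREATE TABLE statement."""
    columns = ", ".join(f'"{name}" {TYPE_MAP[kind]}' for name, kind in attributes)
    return f'CREATE TABLE "{table}" ({columns})'


def load(conn: sqlite3.Connection, table: str,
         attributes: List[Tuple[str, str]], rows: Iterable[Sequence]) -> None:
    """Create a new table for this revision and insert its data rows."""
    conn.execute(build_ddl(table, attributes))
    placeholders = ", ".join("?" for _ in attributes)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
```

A call such as `load(conn, "wind_rev1", [("date_time", "dateTime"), ("wspd", "float")], rows)` would create one table per revision; the table name here is made up.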
Data Transformation
Primary DB (1°) stores site data in native schema
Transformation module reads native schema, performs transformation/integration, and writes to global schema
Secondary DB (2°) stores derived data in consistent global schema
“triggered by data load”

1° (native schema): MCM Canada Glacier Wind
  date_time   Timestamp of observation               15 min interval
  wdir        Wind direction (azimuth)
  wdirstd     Standard deviation of wind direction
  wspd        Wind speed                             meters/second
  wspdmax     Maximum wind speed                     meters/second
  wpsdmin     Minimum wind speed                     meters/second

2° (global schema), produced by f(x):
  Wind direction (knb-eco-trends.1.1)           Timestamp (daily), value
  Wind direction std dev (knb-eco-trends.2.1)   Timestamp (daily), value
  Wind speed max (knb-eco-trends.5.1)           Timestamp (daily), value
  …
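A minimal sketch of one such transformation, assuming the daily "Wind speed max" value is the largest 15-minute maximum wind speed reading recorded on that day. The slide does not state the aggregation rule, so both the rule and the function name are assumptions.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, Iterable, Optional, Tuple


def daily_max(observations: Iterable[Tuple[datetime, Optional[float]]]) -> Dict[date, float]:
    """Reduce (timestamp, value) readings to one maximum value per calendar day.

    Missing readings (None) are skipped; days with no valid readings are omitted.
    """
    by_day = defaultdict(list)
    for timestamp, value in observations:
        if value is not None:
            by_day[timestamp.date()].append(value)
    return {day: max(values) for day, values in sorted(by_day.items())}
```

The same shape works for other derived products by swapping `max` for a different reducer, although circular quantities such as wind direction need a dedicated averaging method rather than a plain mean.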
Global Schema
Table name: knb_eco_trends_1_1
  knb_eco_trends   scope
  1                identifier
  1                revision
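The table name on this slide appears to be the package ID knb-eco-trends.1.1 with hyphens and dots replaced by underscores. A minimal sketch of that mapping in both directions follows, assuming the scope itself never contains an underscore or a dot; the function names are hypothetical.

```python
def package_id_to_table(package_id: str) -> str:
    """Turn scope.identifier.revision into a scope_identifier_revision table name."""
    scope, identifier, revision = package_id.rsplit(".", 2)
    return f"{scope.replace('-', '_')}_{int(identifier)}_{int(revision)}"


def table_to_package_id(table: str) -> str:
    """Recover the package ID from a table name built by package_id_to_table."""
    scope, identifier, revision = table.rsplit("_", 2)
    return f"{scope.replace('_', '-')}.{int(identifier)}.{int(revision)}"
```

Splitting from the right keeps the multi-part scope intact, and the `int` conversions reject names whose last two parts are not numeric.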
EML for the “derived”
EML Factory generates EML metadata for the derived data and inserts it into Metacat
Derived data is now accessible through the Metacat user interface
EML Factory inputs:
  – Derived Metadata
  – Source Provenance
  – Integration Methods
  – Trends Contact
[Diagram: Secondary Database (2°); EML Factory; EML.xml Trends Metadata; EML; Metacat/Harvester]
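A minimal sketch of how a factory might assemble the four pieces listed above into one XML document. The element names are simplified placeholders chosen for illustration and the output is not schema-valid EML; the function name and the example source IDs in the usage note are likewise hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterable


def build_derived_metadata(package_id: str, title: str, source_ids: Iterable[str],
                           method_text: str, contact_name: str) -> str:
    """Assemble a simplified metadata document for one derived data product."""
    root = ET.Element("eml", {"packageId": package_id})
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = title            # derived metadata
    ET.SubElement(dataset, "contact").text = contact_name   # Trends contact
    methods = ET.SubElement(dataset, "methods")
    ET.SubElement(methods, "description").text = method_text  # integration methods
    for source_id in source_ids:                              # source provenance
        ET.SubElement(methods, "source").text = source_id
    return ET.tostring(root, encoding="unicode")
```

For example, `build_derived_metadata("knb-eco-trends.5.1", "Wind speed max", ["knb-lter-mcm.1.1"], "Daily maximum of 15-minute readings", "Trends Contact")` returns a single XML string that could then be inserted into Metacat.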
Store Front
The Store Front provides an API to derived data products in the secondary DB
HTML – today
Web service – tomorrow
Issues:
  – Authentication
  – Authorization
  – Provenance
  – Quality
  – Interactive Plots
(beta site location)
[Diagram: Secondary Database (2°); Store Front; HTML; SOAP]
HTML Store Front (evolution in progress)
Animated Workflow
[Animated diagram of the architecture, stepping through Step 1 to Step 6. Components shown: Source A, Source B, Source C; EML; Metacat/Harvester; Dataset Registry; EML Parser/Loader; Primary Database (1°); f(x); Secondary Database (2°); EML Factory (Derived Metadata, Source Provenance, Integration Methods, Trends Contact); EML.xml Trends Metadata; Store Front (HTML, SOAP)]
Thank You – The End