agINFRA science gateway for workflows and integrated services 07/02/2012 Robert Lovas MTA SZTAKI
Why workflows?
‘Orchestration of Tasks’ (not only in sequence; organized in Directed Acyclic Graphs)
Data-driven
To exploit large computational resources / process large data sets
To make complex applications run faster
–By applying parallelization techniques to them
Parallelization techniques:
–Independent tasks can be executed concurrently
–Execute tasks against LARGE datasets: parameter study, domain decomposition, etc.
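The idea of running independent tasks of a Directed Acyclic Graph concurrently can be illustrated with a minimal sketch. This is not how gUSE schedules jobs; it only uses the Python standard library, and the names `run_dag`, `tasks` and `deps` as well as the four example tasks are hypothetical.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_dag(tasks, deps, max_workers=4):
    """Run callables in dependency order; independent tasks run concurrently.

    tasks: task name -> callable taking a dict of its predecessors' results
    deps:  task name -> set of task names it depends on
    """
    sorter = TopologicalSorter({name: deps.get(name, set()) for name in tasks})
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    results = {}
    running = {}  # future -> task name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every task whose predecessors have all finished.
            for name in sorter.get_ready():
                inputs = {d: results[d] for d in deps.get(name, set())}
                running[pool.submit(tasks[name], inputs)] = name
            # Wait for at least one running task, then unlock its successors.
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                results[name] = future.result()
                sorter.done(name)
    return results


# Hypothetical four-task workflow: "square" and "double" do not depend on
# each other, so they may run at the same time.
tasks = {
    "generate": lambda r: [1, 2, 3],
    "square": lambda r: [x * x for x in r["generate"]],
    "double": lambda r: [2 * x for x in r["generate"]],
    "collect": lambda r: sum(r["square"]) + sum(r["double"]),
}
deps = {
    "square": {"generate"},
    "double": {"generate"},
    "collect": {"square", "double"},
}
results = run_dag(tasks, deps)
```

Each task receives only the results of its direct predecessors, which mirrors the data-driven character of the workflows described above.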
Parameter study / domain decomposition (diagram): a generator job (GEN) generates the input parameter space, parameter sweep jobs (SEQ) process it, and a collector job (COLL) evaluates the results of the simulation
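The generator / parameter sweep / collector pattern can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the portal's implementation: the function names, the placeholder `simulate` formula and the use of a local thread pool in place of grid jobs are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def generate(grid):
    """Generator job: expand named value lists into the full parameter space."""
    names = sorted(grid)
    return [dict(zip(names, values))
            for values in product(*(grid[n] for n in names))]


def simulate(params):
    """Parameter sweep job: placeholder simulation run on one parameter set."""
    return params["a"] * params["x"] ** 2


def collect(param_sets, outputs):
    """Collector job: evaluate all results, here by picking the largest output."""
    best_output, best_params = max(zip(outputs, param_sets),
                                   key=lambda pair: pair[0])
    return best_params, best_output


def parameter_study(grid, max_workers=4):
    param_sets = generate(grid)
    # The thread pool stands in for independent jobs submitted to the grid.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outputs = list(pool.map(simulate, param_sets))  # same order as param_sets
    return collect(param_sets, outputs)


best = parameter_study({"a": [1, 2], "x": [0, 3]})
```

Because every sweep job depends only on its own parameter set, the number of concurrent jobs can grow with the size of the parameter space.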
Liferay-based WS-PGRADE/gUSE portal for agINFRA:
–Open registration for project participants
–X.509 certificate authentication required to submit jobs (NEW: robot certs)
–agINFRA VO is accessible (ca CPU cores, >50 TB storage)
Access modes (diagram): ASM API, WS-PGRADE UI, customized UI, other existing UI → gUSE workflow engine → agINFRA VO, volunteers’ computers
Workflow building blocks (glossary)
“Jobs” operating on data. Jobs can be:
–Grid-enabled applications (e.g. AgrovocTagging)
–Web services
–NEW: REST services
–Other workflows (embedded)
“Ports” representing inputs and outputs for the jobs
–Available port types: value; local file (e.g. from the scientist's laptop); remote file (gsiftp, lfc) in the Grid; database queries (SQL); …
–Extensions or improvements might be required (e.g. Drupal / Dublin Core / SPARQL / CIARD RING support?)
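The glossary above can be read as a small data model: jobs of several kinds, each with typed input and output ports, linked into a workflow. The following sketch is only an illustration of that model; the class names, fields and the "Harvest" job are hypothetical and do not reflect the gUSE internal representation.

```python
from dataclasses import dataclass, field
from enum import Enum


class JobKind(Enum):
    GRID_APPLICATION = "grid-enabled application"
    WEB_SERVICE = "web service"
    REST_SERVICE = "REST service"
    EMBEDDED_WORKFLOW = "embedded workflow"


class PortType(Enum):
    VALUE = "value"
    LOCAL_FILE = "local file"
    REMOTE_FILE = "remote file"  # e.g. a gsiftp or lfc location
    SQL_QUERY = "database query"


@dataclass
class Port:
    name: str
    type: PortType
    source: str = ""  # value, path, URL or query text, depending on the type


@dataclass
class Job:
    name: str
    kind: JobKind
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


@dataclass
class Workflow:
    jobs: dict = field(default_factory=dict)
    links: list = field(default_factory=list)  # (src_job, out_port, dst_job, in_port)

    def add_job(self, job):
        self.jobs[job.name] = job

    def connect(self, src_job, out_port, dst_job, in_port):
        """Link an output port to an input port, checking that both exist."""
        if out_port not in {p.name for p in self.jobs[src_job].outputs}:
            raise ValueError(f"{src_job} has no output port {out_port!r}")
        if in_port not in {p.name for p in self.jobs[dst_job].inputs}:
            raise ValueError(f"{dst_job} has no input port {in_port!r}")
        self.links.append((src_job, out_port, dst_job, in_port))


# Hypothetical two-job workflow: a harvesting web service feeding the tagger.
wf = Workflow()
wf.add_job(Job("Harvest", JobKind.WEB_SERVICE,
               outputs=[Port("records", PortType.REMOTE_FILE)]))
wf.add_job(Job("AgrovocTagging", JobKind.GRID_APPLICATION,
               inputs=[Port("documents", PortType.REMOTE_FILE)],
               outputs=[Port("tags", PortType.LOCAL_FILE)]))
wf.connect("Harvest", "records", "AgrovocTagging", "documents")
```

Keeping port types explicit makes it straightforward to add further types later, such as the SPARQL or Dublin Core support raised as an open question above.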
agINFRA overview
Services to be integrated Harvesting, validation, transformation
The Organic.Edunet Ingest Workflow
Schematic Representation of the AGRIS workflow
Cross-community workflows identified at the Athens
DEMONSTRATION I.
AgroTagger
First demo application
First demo application - details
Job details
Inputs and outputs
Monitoring of execution
Successful execution of AgrovocTagging application
DEMONSTRATION II. Harvesting workflow
ARIADNE aggregation panel
–Select scheduling
–Link to the workflow interface
–Invoke the workflow
–Check the status of the workflow
–Stop the workflow
–Add metadata for the aggregation
gUSE WS-PGRADE
–Harvesting: add parameters of agDataHarvesters web service
–Metadata validation vs target schema: add parameters of agMetadataValidation web service
–Target validation: add parameters of agTargetValidation web service
–Target schema? No: stop the process for the specific target; send a message, or store logs and send them through the gUSE API. Yes: transformation
–Transformation: get parameters for the transformation and invoke agMetadataTransformation web service
–Valid? (Yes / No)
–Store metadata on the GRID (agINFRA VO)
Starting a pre-defined procedure as an agINFRA workflow
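The harvesting flow can be sketched as a chain of service calls with two decision points. This is a rough sketch under several assumptions: the real interfaces of the agDataHarvesters, agTargetValidation, agMetadataTransformation and agMetadataValidation web services are not shown in the slides, so each one is represented by a plain callable passed in by the caller; which check gates which step is a guess from the flowchart labels; and the function name `aggregate` and the stub data are hypothetical.

```python
def aggregate(target, harvest, validate_target, transform, validate_metadata, store):
    """Run one aggregation for one target.

    Every argument after `target` is a callable standing in for a web-service
    call. Returns (stored, log).
    """
    log = []
    records = harvest(target)
    log.append(f"harvested {len(records)} record(s)")

    # "Target schema?" decision: stop for this target only, keeping the log.
    if not validate_target(target):
        log.append("target validation failed, stopping for this target")
        return False, log

    transformed = [transform(record, target) for record in records]

    # "Valid?" decision (assumed): only records that validate against the
    # target schema are stored.
    valid = [r for r in transformed if validate_metadata(r, target)]
    log.append(f"{len(valid)} of {len(transformed)} record(s) valid")
    if not valid:
        return False, log

    store(valid)
    log.append("stored metadata")
    return True, log


# Stub services, for illustration only.
grid_storage = []
services = dict(
    harvest=lambda target: [{"title": "Soil survey"}, {"title": ""}],
    validate_target=lambda target: target == "known-target",
    transform=lambda record, target: {**record, "schema": target},
    validate_metadata=lambda record, target: bool(record["title"]),
    store=grid_storage.extend,
)
ok, ok_log = aggregate("known-target", **services)
failed, failed_log = aggregate("unknown-target", **services)
```

Returning the log alongside the outcome corresponds to the "store logs and send them through the gUSE API" branch: the caller decides where the messages go.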
Integration of multiple components/services
Details in demo…
Plan: Continue the development of aggregation workflow
Plan: Further integration + volunteers…
Questions?