Workflow Management in GridMiner
Günter Kickinger, Jürgen Hofer, Peter Brezany, A Min Tjoa
Institute for Software Science, University of Vienna
The 3rd Cracow Grid Workshop
Outline
– Overview
– The Knowledge Discovery Process
– GridMiner Architecture
– Collaboration of Services
– Workflows
– Dynamic Service Composition
Overview
GridMiner
– a service-oriented, grid-aware data mining system
– designed to cope with
    – very large data sets
    – high-dimensional data sets
    – geographically distributed data sets
    – data sets of different types
– implemented on top of Globus Toolkit 3.0
The Knowledge Discovery Process
[Figure: Cleaning and Integration → DWH → Selection and Transformation → Data Mining → Evaluation and Presentation → Knowledge]
GridMiner Architecture
[Figure: layered architecture with the layers GridMiner Workflow, GridMiner Core, GridMiner Base, Grid Core Services, Grid Core, and Grid Resources / Data Source Fabric]
– GridMiner services: GM DSCE (Dynamic Service Control), GMMS (Mediation), GMDIS (Integration), GMPPS (Pre-Processing), GMDMS (Data Mining), GMPRS (Presentation), GMCMS (OLAP / Cubes), GMOMS (OLAM), GMIS (Information), GMRB (Resource Broker)
– Grid Core Services: Security, File and Database Access, Service Replica Management
Collaboration of GM-Services
Simple Scenario:
Data Sources → GMDIS (Integration) → Intermediate Result 1 → GMPPS (Pre-Processing) → Intermediate Result 2 (e.g. a "flat table") → GMDMS (Data Mining) → Intermediate Result 3 (e.g. PMML) → GMPRS (Presentation) → Final Result
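The simple scenario is a linear chain in which each service consumes the previous service's intermediate result. The sketch below imitates that hand-off with plain Python functions; the function bodies (merging rows, dropping incomplete rows, computing a mean as the "model") are hypothetical placeholders, not GridMiner's actual integration, pre-processing, or mining algorithms, and no Grid Service calls are made.

```python
from typing import Callable, List, Tuple

# Hypothetical local stand-ins for the four GridMiner services. In GridMiner
# these are Grid Services on remote nodes, not Python functions.


def integrate(sources: List[List[dict]]) -> List[dict]:
    """GMDIS stand-in: merge the rows of all data sources (Intermediate Result 1)."""
    return [row for source in sources for row in source]


def preprocess(rows: List[dict]) -> List[dict]:
    """GMPPS stand-in: drop rows with missing values, giving a 'flat table'."""
    return [row for row in rows if all(v is not None for v in row.values())]


def mine(table: List[dict]) -> dict:
    """GMDMS stand-in: build a trivial 'model' (a real service might emit PMML)."""
    values = [row["value"] for row in table]
    return {"model": "mean", "value": sum(values) / len(values)}


def present(model: dict) -> str:
    """GMPRS stand-in: render the model as the final result."""
    return f"{model['model']} = {model['value']:.2f}"


SIMPLE_SCENARIO: List[Tuple[str, Callable]] = [
    ("GMDIS", integrate),
    ("GMPPS", preprocess),
    ("GMDMS", mine),
    ("GMPRS", present),
]


def run_chain(data, chain=SIMPLE_SCENARIO):
    """Feed each service the previous service's output and keep a trace."""
    trace = []
    for name, service in chain:
        data = service(data)
        trace.append((name, data))
    return data, trace


sources = [
    [{"id": 1, "value": 2.0}, {"id": 2, "value": None}],
    [{"id": 3, "value": 4.0}],
]
final, trace = run_chain(sources)
print(final)  # mean = 3.00
```

Keeping the trace mirrors the idea that every intermediate result is a separate artifact that could be inspected or handed to another consumer.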
Collaboration (2)
Complex Scenarios:
[Figure: complex scenarios combining GMDIS, GMPPS, GMDMS, and GMPRS, in some cases together with GMCMS (OLAP / Cubes) and GMOMS (OLAM)]
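One way to reason about complex scenarios, assuming each service instance only needs the results of the instances it depends on, is to treat the collaboration as a directed acyclic graph and group the instances into stages that could execute in parallel. The sketch uses Python's standard graphlib.TopologicalSorter; the scenario graph and the "#1"/"#2" instance labels are hypothetical and are not read off the slide's figure.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def parallel_stages(depends_on):
    """Group service instances into stages; one stage may run in parallel.

    depends_on maps each service instance to the set of instances whose
    results it consumes.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the workflow has a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all dependencies already done
        stages.append(ready)
        sorter.done(*ready)
    return stages


# Hypothetical scenario: one integration result feeds two independent
# pre-processing/mining branches whose models are combined by one GMPRS.
scenario = {
    "GMPPS#1": {"GMDIS"},
    "GMPPS#2": {"GMDIS"},
    "GMDMS#1": {"GMPPS#1"},
    "GMDMS#2": {"GMPPS#2"},
    "GMPRS": {"GMDMS#1", "GMDMS#2"},
}
for number, stage in enumerate(parallel_stages(scenario), start=1):
    print(number, stage)
```

Because the stages are derived from data dependencies alone, the same function covers a plain chain (one instance per stage) as well as branching scenarios.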
Workflow Management
Motivation
– a highly complex and dynamic process
    – order of service execution
    – selection of services
    – sequential and parallel execution
– a long-running process
    – terminating the client would terminate the workflow
=> An additional workflow layer is needed!
Workflow Models
– Static workflows
– Dynamic workflows
Dynamic Workflows
[Figure: the DSCE processes a DSCL document and controls Services A, B, C, and D]
Dynamic Service Control Language (DSCL)
– based on XML
– easy to use
Dynamic Service Control Engine (DSCE)
– processes the workflow as specified in the DSCL document
Dynamic Service Control Language
Features
– Control flow
    – parallel execution of activities
    – sequential execution of activities
– Activities
    – creating new Grid Service instances
    – invoking operations on Grid Service instances
    – querying the Service Data Elements (SDEs) of Grid Service instances
    – assigning and copying variables
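To make the control-flow and activity constructs concrete, here is a minimal interpreter sketch, assuming a workflow can be represented as nested tuples. The tuple encoding, FakeGridService, the FACTORIES registry, and the operation name run are all hypothetical; real DSCL documents are XML and their activities act on remote Grid Service instances. Dynamic invocation is imitated with getattr, and parallel branches run on a thread pool.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeGridService:
    """Hypothetical local stand-in for a Grid Service instance with SDEs."""

    def __init__(self, service_type):
        self.service_type = service_type
        self.sde = {"status": "created"}  # service data elements

    def run(self, arg):
        self.sde["status"] = "done"
        return f"{self.service_type}({arg})"


# Hypothetical registry standing in for Grid Service factories.
FACTORIES = {"Preprocessing": FakeGridService, "Mining": FakeGridService}


class MiniEngine:
    """Interprets a nested-tuple workflow with DSCL-like control flow."""

    def __init__(self, variables=None):
        self.vars = dict(variables or {})
        self._lock = threading.Lock()

    def _set(self, name, value):
        with self._lock:  # parallel branches may write concurrently
            self.vars[name] = value

    def execute(self, node):
        kind = node[0]
        if kind == "sequence":  # ("sequence", [children])
            for child in node[1]:
                self.execute(child)
        elif kind == "parallel":  # ("parallel", [children])
            with ThreadPoolExecutor() as pool:
                # list() waits for all branches and re-raises their errors
                list(pool.map(self.execute, node[1]))
        elif kind == "create":  # ("create", var, service_type)
            _, var, service_type = node
            self._set(var, FACTORIES[service_type](service_type))
        elif kind == "invoke":  # ("invoke", var, operation, arg_var, result_var)
            _, var, operation, arg_var, result_var = node
            operation_fn = getattr(self.vars[var], operation)
            self._set(result_var, operation_fn(self.vars[arg_var]))
        elif kind == "query":  # ("query", var, sde_name, result_var)
            _, var, sde_name, result_var = node
            self._set(result_var, self.vars[var].sde[sde_name])
        elif kind == "assign":  # ("assign", source_var, target_var)
            _, source, target = node
            self._set(target, self.vars[source])
        else:
            raise ValueError(f"unknown activity: {kind!r}")


workflow = ("sequence", [
    ("parallel", [
        ("sequence", [("create", "pps", "Preprocessing"),
                      ("invoke", "pps", "run", "input", "table")]),
        ("create", "dms", "Mining"),
    ]),
    ("invoke", "dms", "run", "table", "model"),
    ("query", "dms", "status", "dmsStatus"),
    ("assign", "model", "result"),
])
engine = MiniEngine({"input": "raw"})
engine.execute(workflow)
print(engine.vars["result"])  # Mining(Preprocessing(raw))
```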
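DSCL - Example
[Figure: structure of a DSCL document. The dscl root element contains variables and a composition; the composition consists of create Service → invoke → query SDE → create Service → invoke → query SDE → create Service → invoke]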
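The example's outline (a dscl root containing variables and a composition of create Service, invoke, and query SDE steps) could be written as XML along the following lines. The element and attribute names (createService, querySDE, var, type, sde, to) and the service variables are hypothetical, since the slide does not show the actual DSCL schema; the Python code uses the standard xml.etree.ElementTree module to list the declared variables and the activities in document order.

```python
import xml.etree.ElementTree as ET

# Hypothetical DSCL-like document; the element and attribute names are
# assumptions that mirror the slide's outline, not the real DSCL schema.
DSCL_DOC = """
<dscl>
  <variables>
    <variable name="pps"/>
    <variable name="dms"/>
    <variable name="prs"/>
    <variable name="state"/>
  </variables>
  <composition>
    <sequence>
      <createService var="pps" type="GMPPS"/>
      <invoke var="pps" operation="preprocess"/>
      <querySDE var="pps" sde="status" to="state"/>
      <createService var="dms" type="GMDMS"/>
      <invoke var="dms" operation="mine"/>
      <querySDE var="dms" sde="status" to="state"/>
      <createService var="prs" type="GMPRS"/>
      <invoke var="prs" operation="present"/>
    </sequence>
  </composition>
</dscl>
"""

ACTIVITY_TAGS = {"createService", "invoke", "querySDE"}


def outline(document):
    """Return the declared variables and the activities in document order."""
    root = ET.fromstring(document.strip())
    variables = [v.get("name") for v in root.find("variables")]
    activities = [
        (el.tag, el.get("var"))
        for el in root.find("composition").iter()  # depth-first, in order
        if el.tag in ACTIVITY_TAGS
    ]
    return variables, activities


variables, activities = outline(DSCL_DOC)
for tag, var in activities:
    print(f"{tag:<14}{var}")
```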
Dynamic Service Control Engine
Features
– processing of DSCL documents
– parallel execution of activities
– hiding complexity from the user
– delivery of intermediate results
– status reporting for executed services
– a built-in caching mechanism
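The slide states only that a caching mechanism is included, without describing it. Below is a minimal sketch of one plausible design, assuming results can be keyed by service, operation, and input so that an identical invocation is answered from the cache; the class name InvocationCache, its key, and the slow_mining stand-in are hypothetical.

```python
from typing import Any, Callable, Dict, Hashable, Tuple


class InvocationCache:
    """Hypothetical result cache keyed by (service, operation, argument).

    Re-running a workflow after a change could then reuse the results of
    activities whose inputs did not change instead of re-invoking them.
    """

    def __init__(self) -> None:
        self._results: Dict[Tuple[str, str, Hashable], Any] = {}
        self.hits = 0
        self.misses = 0

    def invoke(self, service: str, operation: str, arg: Hashable,
               call: Callable[[Hashable], Any]) -> Any:
        key = (service, operation, arg)
        if key in self._results:
            self.hits += 1
            return self._results[key]
        self.misses += 1
        result = call(arg)  # the (expensive) service invocation
        self._results[key] = result
        return result


calls = []


def slow_mining(arg):
    calls.append(arg)  # record each real invocation
    return f"model-of-{arg}"


cache = InvocationCache()
cache.invoke("GMDMS", "mine", "flat-table-v1", slow_mining)
cache.invoke("GMDMS", "mine", "flat-table-v1", slow_mining)  # served from cache
print(cache.hits, cache.misses, calls)  # 1 1 ['flat-table-v1']
```

Keying on the input means a changed intermediate result automatically bypasses the cache, at the cost of requiring inputs that can be hashed or fingerprinted.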
Dynamic Service Control Engine
Implementation
– a transient, stateful OGSA Grid Service
– operations: updateDSCL(), start(), stop(), resume()
– SDE "activities": results, failures, and states for each activity
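The operations listed above suggest a lifecycle in which a workflow is loaded, started, stopped, and resumed while each activity's state, result, and failure stay observable. The following sketch models that lifecycle locally; its semantics (stop takes effect after the current activity, a failure halts the run, resume continues from the first unfinished activity) are assumptions rather than documented DSCE behavior, and no OGSA or Globus API is used. Only the method names mirror the slide.

```python
from typing import Callable, Dict, List, Tuple

Activity = Tuple[str, Callable[[], object]]


class DSCESketch:
    """Hypothetical DSCE-like lifecycle; the semantics are assumptions."""

    def __init__(self) -> None:
        self.activities: List[Activity] = []
        # SDE-like view: state, result and failure for each activity
        self.sde: Dict[str, Dict[str, object]] = {}
        self._next = 0
        self._stop_requested = False

    def updateDSCL(self, activities: List[Activity]) -> None:
        """Load a new workflow and reset every activity to 'pending'."""
        self.activities = list(activities)
        self.sde = {name: {"state": "pending", "result": None, "failure": None}
                    for name, _ in self.activities}
        self._next = 0

    def start(self) -> None:
        """Run the loaded workflow from its first activity."""
        self.updateDSCL(self.activities)
        self._run()

    def stop(self) -> None:
        """Request a halt; it takes effect after the current activity ends."""
        self._stop_requested = True

    def resume(self) -> None:
        """Continue with the first activity that has not finished."""
        self._run()

    def _run(self) -> None:
        self._stop_requested = False
        while self._next < len(self.activities) and not self._stop_requested:
            name, action = self.activities[self._next]
            entry = self.sde[name]
            entry["state"], entry["failure"] = "running", None
            try:
                entry["result"] = action()
            except Exception as exc:
                # Halt on failure; resume() would retry this activity.
                entry["state"], entry["failure"] = "failed", repr(exc)
                return
            entry["state"] = "finished"
            self._next += 1


engine = DSCESketch()


def mine():
    engine.stop()  # simulates a client's stop() request during this activity
    return "PMML model"


engine.updateDSCL([
    ("preprocess", lambda: "flat table"),
    ("mine", mine),
    ("present", lambda: "report"),
])
engine.start()
print({name: e["state"] for name, e in engine.sde.items()})
# {'preprocess': 'finished', 'mine': 'finished', 'present': 'pending'}
engine.resume()
print({name: e["state"] for name, e in engine.sde.items()})
# {'preprocess': 'finished', 'mine': 'finished', 'present': 'finished'}
```

Because the engine object, not the client, holds the progress pointer and the per-activity state, a client can disconnect after stop() and another call to resume() can pick up where the workflow left off.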
DSCE - Architecture
[Figure: DSCE components Service Interface, Factory Interface, DSC Engine, DGS Invocation, and Dynamic Invoker, built on Axis 1.1 and Globus 3.0]
Current and Future Work
This is work in progress. Planned additional features:
– a notification model
– exception handling
Related Work
– BPEL4WS: Business Process Execution Language for Web Services (BEA, IBM, Microsoft, SAP, Siebel)
– GSFL: Grid Services Flow Language (Krishnan, Wagstrom, von Laszewski)
– Data Mining: Concepts and Techniques (Han)
– The Anatomy of the Grid (Foster, Kesselman, Tuecke)
– The Physiology of the Grid (Foster, Kesselman, Nick, Tuecke)
– Open Grid Services Infrastructure (Tuecke, Czajkowski, Foster)
Conclusions
– Dynamic Service Control is an approach that allows the service consumer to specify a workflow
– It is a general approach, not restricted to GridMiner