San Diego Supercomputer Center
National Partnership for Advanced Computational Infrastructure
University of Florida

SDSC Matrix Project: A Passionate Workflow towards Scientific Perfection

Arun Jagatheesan
Architect and Team Lead, SDSC Matrix Project
San Diego Supercomputer Center (SDSC)

Super Computing Conference 2003, Exhibit at SDSC Booth
November 18, Phoenix, Arizona
Slide 2: Credit / Acknowledgements
- Participants: Allen Ding, Jonathan Weinberg, Lucas Gilbert, Reena Mathew, Xi Cynthia Sheng
- Well Wishers (they had the Matrix red pill): Reagan Moore (& SRB Team), SDSC DAKS (Big Team, Big Support!), Kim Baldridge, YOU
- Sponsors: NSF GriPhyN, NSF SCEC, NPACI REU, NIH BIRN
Slide 3: Talk Outline
- Workflow
- Requirements for Grid Workflow
- Data Grid Language
- Matrix as a WfMS
- Demonstrations
  - XQuery (CDL)
  - External Status Requests
Slide 4: Workflow
- Workflow
  - Automation of a business process, in whole or in part
  - Documents, information, or tasks passed between participants
  - Based on a set of procedural rules
- Scientific Computing Workflow
  - Computational research process as pathways or pipelines
  - Gather data, cleanse data, apply different combinations of transformations, simulations, visualization, publish in a digital library, archive data, get a Nobel prize (makes us happy too :-)
Slide 5: What is needed for Grid Workflow
- Yet another standard XML language
  - Describe import and export of workflows in the Grid
  - Peer-to-peer collaboration for workflows
- Looping structures
  - Scientific workflows iterate over millions of data sets
- Generic system
  - Multiple domains: Bio, Physics, Digital Libraries…
- Dynamic status queries
  - Dynamic and robust execution based on prior executions
  - Grid Service Handles to query, publish, or subscribe
  - XQuery subset: uniform query for data and process
- "You too, Arun?" (Becoming anti-standards by issuing a new standard.) But we need it.
Slide 6: What is needed for Grid Workflow II
- Granular metadata
- Context-based workflow, with control-based constructs
  - Context-based flows, apart from purely control-based ones
  - Sequential, Parallel, Multiple Split, Conditional, …
  - Dynamic rules (event-condition-action, or ECA, rules) to update milestones
- Grid data types
  - Schema support to describe data sets and collections
  - Built-in support to describe Grid locations
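To make the ECA-rule requirement on this slide concrete, here is a minimal sketch of an event-condition-action rule that updates a workflow milestone. It is not taken from the Data Grid Language or the Matrix code base: the names `EcaRule`, `RuleEngine`, `mark_ingest_complete`, the event name `step_finished`, and the context keys are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical workflow context: a plain dictionary of named values.
Context = Dict[str, object]


@dataclass
class EcaRule:
    """Event-condition-action rule: on `event`, if `condition` holds, run `action`."""
    event: str
    condition: Callable[[Context], bool]
    action: Callable[[Context], None]


class RuleEngine:
    """Holds rules and applies the matching ones when an event is fired."""

    def __init__(self) -> None:
        self._rules: List[EcaRule] = []

    def register(self, rule: EcaRule) -> None:
        self._rules.append(rule)

    def fire(self, event: str, context: Context) -> int:
        """Run every rule whose event matches and whose condition holds.

        Returns the number of rules whose action was executed.
        """
        fired = 0
        for rule in self._rules:
            if rule.event == event and rule.condition(context):
                rule.action(context)
                fired += 1
        return fired


def mark_ingest_complete(context: Context) -> None:
    # Action: record a (hypothetical) milestone in the workflow context.
    context["milestone"] = "ingest-complete"


engine = RuleEngine()
engine.register(EcaRule(
    event="step_finished",
    condition=lambda ctx: ctx.get("files_ingested", 0) >= 3,
    action=mark_ingest_complete,
))
```

Firing `engine.fire("step_finished", {"files_ingested": 3})` runs the action and sets the milestone; with fewer than three files the condition fails and the context is left untouched.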
Slide 7: Grid Workflow Process I
[Diagram labels: End User; Workflow Description; Data Grid Language]
Slide 8: Grid Workflow Process II
[Diagram labels: Abstract Workflow; Data Grid Language; Concrete Workflow; Planner]
Post-presentation comment (based on questions asked): We are not implementing this planner now. We are implementing the DGL parser and the DGL query interpreter in the Matrix server to manage the workflow state for grid workflows. We are also implementing the protocols for P2P workflow on the grid.
Slide 9: Grid Workflow Process III
[Diagram labels: Concrete Workflow; Export Workflow to Matrix; P2P Grid Workflow Processor]
Slide 10: Matrix Server
- Acts as a peer in a WfMS P2P system *
- Processes Data Grid Requests
- Can maintain state and manage process steps
- Can invoke SRB data grid processes, OGSA services, and WSDL services (OGSA threads to be implemented)
- Implemented as an open-source project
* Being designed/developed as of the presentation date
Slide 11: Implementation Status
- Data Grid Language: schema for basic workflow constructs and data grid operations
- Matrix agents for executing data grid requests
- Basic process pipeline management
- Data Grid Language: rules, embedded query, and OGSA operations to be added
- Matrix: P2P and export/sharing of workflows to be added
Slide 12: SDSC Matrix Architecture
[Diagram labels:]
- Matrix Agent Abstraction
- In-Memory Store
- JDBC
- OGSA Agent
- WSDL Agent
- Persistence (Store) Abstraction
- Termination Handler
- Matrix Data Grid Request Processor
- Transaction Handler
- Status Query Handler
- Data flow pipeline
- Metadata Manager
- JMS Messaging System
- JAXM Wrapper
- OGSA
- RPC-Style for SOAP
- SOAP Service Wrapper Abstraction
- Flow Handler and Execution Manager
- Pipeline Query Processor
- XQuery Processor
- Event Publish Subscribe, Notification
- SRB Agents
- Other Data Services
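The architecture diagram names a "Matrix Agent Abstraction" sitting between the request processor and the SRB, OGSA, and WSDL agents. One way such an abstraction could look is sketched below, assuming a single `execute` method per agent and a registry keyed by service type. The class names, method names, and return strings are hypothetical; the stub agents only format a string and do not contact SRB or any web service.

```python
from abc import ABC, abstractmethod
from typing import Dict


class Agent(ABC):
    """Hypothetical common interface that every agent implements."""

    @abstractmethod
    def execute(self, operation: str, params: Dict[str, str]) -> str:
        """Carry out one datagrid operation and return a result summary."""


class SrbAgentStub(Agent):
    # Stand-in for an SRB agent; no real SRB call is made.
    def execute(self, operation: str, params: Dict[str, str]) -> str:
        return f"srb handled {operation}"


class WsdlAgentStub(Agent):
    # Stand-in for a WSDL service agent; no real service call is made.
    def execute(self, operation: str, params: Dict[str, str]) -> str:
        return f"wsdl handled {operation}"


class RequestProcessor:
    """Routes each operation to the agent registered for its service type."""

    def __init__(self) -> None:
        self._agents: Dict[str, Agent] = {}

    def register(self, service_type: str, agent: Agent) -> None:
        self._agents[service_type] = agent

    def dispatch(self, service_type: str, operation: str,
                 params: Dict[str, str]) -> str:
        if service_type not in self._agents:
            raise LookupError(f"no agent registered for {service_type!r}")
        return self._agents[service_type].execute(operation, params)
```

With this shape, supporting a new backend (for example an OGSA agent) means adding one `Agent` subclass and one `register` call, while the processor itself stays unchanged.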
Slide 13: Data Grid Request (DReq)
- Asynchronous requests for data/process flow in datagrids
- A request is either a Transaction or a Status Query
- Each Transaction consists of one or more Flows
- Each Flow consists of one or more datagrid operations
- Datagrid operation = data transformation or data query
- A flow can be executed sequentially or in parallel
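The containment rules on this slide (request, then Transaction or Status Query, then Flows, then operations) can be sketched as a small data model. This is an illustration only, not the Matrix schema: the class and field names, the `request_id` field, and the `run_flow` helper are assumptions. The "one or more" constraints are enforced at construction time, and `run_flow` shows one possible reading of sequential versus parallel execution using a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Union


class FlowLogic(Enum):
    SEQUENTIAL = "sequential"
    PARALLEL = "parallel"


@dataclass
class Operation:
    """One datagrid operation: a data transformation or a data query."""
    name: str
    kind: str  # "transformation" or "query"


@dataclass
class Flow:
    logic: FlowLogic
    operations: List[Operation]

    def __post_init__(self) -> None:
        if not self.operations:
            raise ValueError("a flow needs at least one datagrid operation")


@dataclass
class Transaction:
    flows: List[Flow]

    def __post_init__(self) -> None:
        if not self.flows:
            raise ValueError("a transaction needs at least one flow")


@dataclass
class StatusQuery:
    request_id: str  # hypothetical: identifies the transaction being queried


@dataclass
class DataGridRequest:
    """A request carries either a Transaction or a Status Query."""
    body: Union[Transaction, StatusQuery]

    def is_transaction(self) -> bool:
        return isinstance(self.body, Transaction)


def run_flow(flow: Flow, execute: Callable[[Operation], str]) -> List[str]:
    """Run a flow's operations in order, or concurrently for parallel flows.

    Results are returned in the order the operations were declared.
    """
    if flow.logic is FlowLogic.SEQUENTIAL:
        return [execute(op) for op in flow.operations]
    with ThreadPoolExecutor() as pool:
        return list(pool.map(execute, flow.operations))
```

A caller would build a `DataGridRequest` around a `Transaction`, then pass each flow to `run_flow` together with a function that performs a single operation.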
Slide 14: Data Grid Request
Remind me to show the new Matrix 3.0 schema.
Slide 15: Data Grid Response
- Either a Transaction Acknowledgement or a Status Response
- A Status Response contains the results of a Transaction
- A response can be received at any level of granularity
- Status responses are used for coordination of flows and inter-process notifications
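"Any granular level" can be read as: status may be requested for a whole transaction, a single flow, or a single operation. The sketch below models that as a tree of status nodes, assuming four state names (`pending`, `running`, `done`, `failed`) and a simple roll-up rule. The `StatusNode` class, the state names, and the roll-up rule are all hypothetical and are not taken from the Matrix response schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StatusNode:
    """Status of a transaction, flow, or operation (hypothetical model)."""
    name: str
    state: str = "pending"  # assumed states: pending, running, done, failed
    children: Dict[str, "StatusNode"] = field(default_factory=dict)

    def add(self, child: "StatusNode") -> "StatusNode":
        self.children[child.name] = child
        return child

    def find(self, path: List[str]) -> "StatusNode":
        """Walk down by child name; an empty path returns this node."""
        node = self
        for part in path:
            node = node.children[part]
        return node

    def rollup(self) -> str:
        """Leaf nodes report their own state; parents summarise their children."""
        if not self.children:
            return self.state
        states = {child.rollup() for child in self.children.values()}
        if "failed" in states:
            return "failed"
        if states == {"done"}:
            return "done"
        if states & {"running", "done"}:
            return "running"
        return "pending"
```

Calling `root.find(["flow1", "op1"]).rollup()` answers a query at operation level, while `root.rollup()` answers the same question for the whole transaction.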
Slide 16: Data Grid Response (DRes)
Remind me to show the new Matrix 3.0 schema.
Slide 17: Conclusion
- Data Grid Language: Grid workflow description
  - Basic foundation ready
  - Solid design to handle more complex stuff
  - Workflow modeling not investigated (like Ptolemy?)
- Matrix Server implementation
  - Create, query, and manage Grid workflows
  - OGSA, rules, and P2P to be implemented
- More support will expedite R&D
Slide 18: Demos?
"He is trying to escape. Where are the demos?"