San Diego Supercomputer Center Grid Physics Network (GriPhyN) University of Florida Programming Gridflows using Matrix Arun Jagatheesan Architect, SDSC.

2 Programming Gridflows using Matrix
Arun Jagatheesan, Architect, SDSC Matrix, San Diego Supercomputer Center
SDSC Tech Talk, SDSC, UCSD

3 Talk Outline
Where do we need this?
Infrastructure-based Execution logic (Concept?)
Matrix Project Overview (Who?)
Data Grid Language and Programming Gridflow
Runnable (flowable)
Flow
Gridflow Metadata
ECAA rules
Other benefits
What Next: Straight Talk

4 Data handling pipeline (data-to-information pipeline)
Metadata derivation
Ingest metadata
Ingest data
Determine analysis pipeline
Initiate automated analysis
Organize result data into distributed data grid collections
Use the optimal set of resources based on the task, on demand
The pipeline could be triggered by input at the data source or by a data request from a user
All gridflow activities are stored for data flow provenance

5 Generic Gridflow Scenario
Application X, Application Y, Application Z
May be different programming languages, programmers, different execution environments
May be in different grid domains (sites)
Pass data between each other during their execution
SDSC
Note: Might use a data grid environment that works!

6 Example for Generic Gridflow
1. Ingest 1 million URLs into a digital library using a URL ingestor (or harvester: App X)
2. For each URL, iterate with 5 parallel executions:
  1. Do some processing on the file (App Y)
  2. Store the output file from App Y in a grid disk resource
  3. Replicate a copy of the same file in a grid archive resource
  4. Calculate the MD5 checksum (App Z) for the file on disk
  5. Calculate the MD5 checksum (App Z) for the file in the archive
  6. If the checksums mismatch, ingest a metadata warning flag
Slide callouts: For each; If checksums mismatch; Pattern; Gridflow metadata processing; Rules; Late binding

7 Traditional way
Write a customized program
Create a common program that can invoke the distributed or localized applications using appropriate client code
Hardwire all the apps (X, Y, Z) together
Have this customized program act as the delegator that invokes all other applications
Declare the necessary variables and also implement the rules/conditions [like checksum1 == checksum2]
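As a rough illustration of this traditional style, the sketch below hardwires the checksum steps of the example (2.4 to 2.6) into one Java program. It is not Matrix code: the class `HardwiredChecksumFlow`, its method names, and the use of in-memory byte arrays in place of real grid disk and archive copies are all assumptions made for the illustration. Note that both the degree of parallelism (5) and the mismatch rule are fixed in compiled code, which is the limitation the gridflow approach addresses.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical hardwired delegator: every step and rule lives in compiled code.
class HardwiredChecksumFlow {

    // Stand-in for App Z: returns the MD5 checksum of the bytes as lowercase hex.
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Compares the disk and archive copy of each file, at most 5 files at a time,
    // and returns the indices of the files whose checksums mismatch.
    static List<Integer> findMismatches(List<byte[]> diskCopies, List<byte[]> archiveCopies) {
        ExecutorService pool = Executors.newFixedThreadPool(5); // parallelism is hardwired
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int i = 0; i < diskCopies.size(); i++) {
                final byte[] disk = diskCopies.get(i);
                final byte[] archive = archiveCopies.get(i);
                // The rule "checksum1 != checksum2" is also hardwired here.
                results.add(pool.submit(() -> !md5Hex(disk).equals(md5Hex(archive))));
            }
            List<Integer> mismatches = new ArrayList<>();
            for (int i = 0; i < results.size(); i++) {
                if (results.get(i).get()) {
                    mismatches.add(i);
                }
            }
            return mismatches;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<byte[]> disk = Arrays.asList(
                "file-a".getBytes(StandardCharsets.UTF_8),
                "file-b".getBytes(StandardCharsets.UTF_8));
        List<byte[]> archive = Arrays.asList(
                "file-a".getBytes(StandardCharsets.UTF_8),
                "corrupted".getBytes(StandardCharsets.UTF_8));
        System.out.println("Mismatched file indices: " + findMismatches(disk, archive));
    }
}
```

Changing the parallelism or the rule in this style means editing and recompiling the program.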

8 Why take the Gridflow approach?
What-if scenarios:
The infrastructure can run more or fewer things in parallel
The cyber-infrastructure has more resources for distribution (an app can be run at multiple places for different parameters: parameter sweep distribution)
Different metadata conditions or milestones: run this till the molecule changes from green to red (or yellow)
Change in the sequence of execution itself (new app)
Process provenance is required
Anyway, you are not coding or changing your application to fit into the gridflow environment (it's the other way around): make simple changes only in the execution logic

9 Infrastructure-based Execution Logic
Each gridflow has different executables
  App X, App Y, App Z: Runnable or “Flowable”
How should these flowables be run?
  Parallel, sequential, for-each input item (pipeline), while, switch
  Capture this as a Flow
Is there a condition?
  Run till exit value = 0 or till the molecule color changes to red
Are there metadata variables? (color)
Describe this execution logic separately
  Loosely coupled, modified without compilation
  Use an XML-based language
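The condition "run till exit value = 0" can be pictured as a small loop around a flowable. The sketch below is only an illustration in plain Java: `MilestoneLoopSketch`, `runUntilZero`, and the `maxRuns` safety bound are hypothetical names, and in Matrix this logic would be described in the XML-based language rather than compiled.

```java
import java.util.function.IntSupplier;

// Hypothetical sketch of a "while" pattern driven by an exit-value condition.
class MilestoneLoopSketch {

    // Re-runs the flowable until it returns exit value 0 or maxRuns is reached.
    // Returns the number of runs performed, or -1 if exit value 0 was never seen.
    static int runUntilZero(IntSupplier flowable, int maxRuns) {
        for (int run = 1; run <= maxRuns; run++) {
            if (flowable.getAsInt() == 0) {
                return run;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] exitValues = {2, 1, 0}; // pretend exit values of three successive runs
        int[] next = {0};             // index of the next exit value to hand out
        int runs = runUntilZero(() -> exitValues[next[0]++], 10);
        System.out.println("Runs needed: " + runs);
    }
}
```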

10 That is why we started the Matrix Project
Movie break ...
Language to describe and execute this infrastructure-based execution logic
Software to design, query, and run this logic

11 “Flowable”
Anything that can run in a gridflow
Not called Runnable, since that name is already taken in Java by the Thread paradigm
Any app (a single execution of App X, Y, Z)
Any SRB-based data grid step (to handle data)

12 “Flowable” in Java
// Step named "executeMD5-Metadata" that runs the md5 executable
ExecuteProcessStep executeMD5 = new ExecuteProcessStep("executeMD5-Metadata", "md5");
// Standard output of the step goes to the gridflow variable $md5Sum
executeMD5.setStdOut(new StreamData("$md5Sum", false));
// The file location is passed in as the expression $locationOfFile
executeMD5.addParameterAsExpression("$locationOfFile");

13 “Flowable” in DGL
md5 $locationOfFile $md5Sum

14 Data Grid Language (DGL)
XML-based gridflow description
  Describes execution flow logic
ECA-based rule description for execution
  ECA = Event, Condition, Action
Querying of the status of a gridflow
  XQuery / simple query of a gridflow execution
Scoped variables and gridflow patterns
  For control of execution flow logic
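To make the Event, Condition, Action idea concrete, here is a minimal plain-Java sketch of one such rule, using the checksum mismatch from the earlier example. It does not show DGL syntax or the Matrix API: `EcaRuleSketch`, `EcaDemo`, the event name `checksumsComputed`, and the variable names are all assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical ECA rule: on a named event, test a condition, then run an action.
class EcaRuleSketch {
    private final String event;
    private final Predicate<Map<String, String>> condition;
    private final Consumer<Map<String, String>> action;

    EcaRuleSketch(String event,
                  Predicate<Map<String, String>> condition,
                  Consumer<Map<String, String>> action) {
        this.event = event;
        this.condition = condition;
        this.action = action;
    }

    // Runs the action only if the event matches and the condition holds.
    // Returns true when the action ran.
    boolean onEvent(String firedEvent, Map<String, String> variables) {
        if (event.equals(firedEvent) && condition.test(variables)) {
            action.accept(variables);
            return true;
        }
        return false;
    }
}

class EcaDemo {
    // Rule from the checksum example: set a warning flag when the checksums differ.
    static EcaRuleSketch checksumMismatchRule() {
        return new EcaRuleSketch(
                "checksumsComputed", // assumed event name
                vars -> !vars.get("diskMd5").equals(vars.get("archiveMd5")),
                vars -> vars.put("warning", "checksum-mismatch"));
    }

    public static void main(String[] args) {
        Map<String, String> vars = new HashMap<>();
        vars.put("diskMd5", "aaa");
        vars.put("archiveMd5", "bbb");
        boolean fired = checksumMismatchRule().onEvent("checksumsComputed", vars);
        System.out.println("fired=" + fired + ", warning=" + vars.get("warning"));
    }
}
```

Because the rule is data (an event name, a condition, an action) rather than an `if` statement buried in a program, it can be swapped without touching the applications it governs.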

15 Gridflow Patterns
Sequential, Parallel, For-Each-Parallel, For-Each-Sequential, Switch, While / Milestone processing
These basic patterns can be combined
  E.g. execute all 9 flowables in parallel
  Switch based on color: Red: App X; Green: App Y
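One way to picture how these patterns combine is as a tree of steps, where a pattern is itself a step that contains other steps. The sketch below shows that idea in plain Java for three of the patterns (sequential, parallel, switch). All type names (`PatternStep`, `AppStep`, `SequentialSketch`, `ParallelSketch`, `SwitchSketch`, `PatternDemo`) are hypothetical and are not the Matrix classes; each "app" simply records its name in a trace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical common interface: anything that can run inside a flow.
interface PatternStep {
    void run(List<String> trace);
}

// Leaf step standing in for one application execution.
class AppStep implements PatternStep {
    private final String name;
    AppStep(String name) { this.name = name; }
    public void run(List<String> trace) { trace.add(name); }
}

// Runs members one after another, in order.
class SequentialSketch implements PatternStep {
    private final List<PatternStep> members;
    SequentialSketch(PatternStep... members) { this.members = Arrays.asList(members); }
    public void run(List<String> trace) {
        for (PatternStep member : members) {
            member.run(trace);
        }
    }
}

// Runs every member on its own thread and waits for all of them.
class ParallelSketch implements PatternStep {
    private final List<PatternStep> members;
    ParallelSketch(PatternStep... members) { this.members = Arrays.asList(members); }
    public void run(List<String> trace) {
        List<Thread> threads = new ArrayList<>();
        for (PatternStep member : members) {
            Thread t = new Thread(() -> member.run(trace));
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }
}

// Runs the member registered for the selector value, if any.
class SwitchSketch implements PatternStep {
    private final String selector;
    private final Map<String, PatternStep> cases;
    SwitchSketch(String selector, Map<String, PatternStep> cases) {
        this.selector = selector;
        this.cases = cases;
    }
    public void run(List<String> trace) {
        PatternStep chosen = cases.get(selector);
        if (chosen != null) {
            chosen.run(trace);
        }
    }
}

class PatternDemo {
    // Three apps in parallel, then a switch on color (red: App X, green: App Y).
    static List<String> runDemo(String color) {
        List<String> trace = Collections.synchronizedList(new ArrayList<>());
        Map<String, PatternStep> cases = new HashMap<>();
        cases.put("red", new AppStep("App X"));
        cases.put("green", new AppStep("App Y"));
        PatternStep flow = new SequentialSketch(
                new ParallelSketch(new AppStep("A"), new AppStep("B"), new AppStep("C")),
                new SwitchSketch(color, cases));
        flow.run(trace);
        return new ArrayList<>(trace);
    }

    public static void main(String[] args) {
        System.out.println(runDemo("red"));
    }
}
```

The order of the three parallel entries in the trace can vary between runs; the switch result always comes last because the outer pattern is sequential.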

16 Gridflow Pattern in Java
// forEach file in the collectionList, do some processing
ForEachFlow forEach = new ForEachFlow("forEachFlow", "file", new CollectionList("$collectionList"));
// could also say how many files to be handled in parallel
// A DGL (XML) code would be generated

17 Flow
Scoped variables that can control the flow
Logic used by the sub-members
Sub-members that are the real execution statements

18 Gridflow Variable in Java
/* Create a variable called "collectionList" with an initial value of "empty".
   This variable is a string now, but will later be used to hold a CollectionList.
   This is OK to do because variables are dynamically typed in DGL. */
processFilesFlow.addVariable("collectionList", "empty");
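The two properties mentioned here, scoping and dynamic typing, can be sketched in a few lines of plain Java. This is an assumption about how such a variable table might behave, not the Matrix implementation: `VariableScopeSketch` and its methods are hypothetical, values are held as `Object` so a variable can change type, and a sub-flow's scope falls back to its parent's.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical variable table for one flow; sub-flows chain to their parent.
class VariableScopeSketch {
    private final VariableScopeSketch parent;
    private final Map<String, Object> values = new HashMap<>();

    VariableScopeSketch(VariableScopeSketch parent) {
        this.parent = parent;
    }

    // Declares a variable in this scope with an initial value of any type.
    void declare(String name, Object value) {
        values.put(name, value);
    }

    // Assigns to the nearest enclosing scope that declares the variable.
    void assign(String name, Object value) {
        for (VariableScopeSketch s = this; s != null; s = s.parent) {
            if (s.values.containsKey(name)) {
                s.values.put(name, value);
                return;
            }
        }
        throw new IllegalArgumentException("undeclared variable: " + name);
    }

    // Looks the variable up in this scope first, then in enclosing scopes.
    Object lookup(String name) {
        for (VariableScopeSketch s = this; s != null; s = s.parent) {
            if (s.values.containsKey(name)) {
                return s.values.get(name);
            }
        }
        throw new IllegalArgumentException("undeclared variable: " + name);
    }

    public static void main(String[] args) {
        VariableScopeSketch flowScope = new VariableScopeSketch(null);
        flowScope.declare("collectionList", "empty"); // starts life as a string

        VariableScopeSketch subFlowScope = new VariableScopeSketch(flowScope);
        // A sub-member later replaces the string with a list of file names.
        subFlowScope.assign("collectionList", Arrays.asList("file1", "file2"));

        System.out.println(flowScope.lookup("collectionList"));
    }
}
```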

19 Data Grid Request
Annotations about the Data Grid Request
Can be either a Flow or a Status Query

20 DGL Requests
Data Grid Flow
  An XML structure that describes the execution logic, associated procedural rules, and grid environment variables
Status Query
  An XML structure used to query the execution status of any gridflow or sub-flow at any level of granularity
A DGL or Matrix client sends either of these to the Matrix Server
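Querying status "at any level of granularity" suggests that execution status is kept as a tree mirroring the flow and its sub-flows. The following is a minimal sketch of that idea under stated assumptions: `FlowStatusSketch`, the identifier strings, and the state names (`RUNNING`, `DONE`, `PENDING`) are invented for illustration and do not come from DGL or the Matrix server.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical status record for one flow or step, with nested sub-flows.
class FlowStatusSketch {
    private final String id;
    private final String state; // assumed state names, e.g. "RUNNING" or "DONE"
    private final List<FlowStatusSketch> children = new ArrayList<>();

    FlowStatusSketch(String id, String state) {
        this.id = id;
        this.state = state;
    }

    // Adds a sub-flow or step and returns this node so calls can be chained.
    FlowStatusSketch add(FlowStatusSketch child) {
        children.add(child);
        return this;
    }

    // Depth-first search for the given id; returns its state, or null if absent.
    String statusOf(String queryId) {
        if (id.equals(queryId)) {
            return state;
        }
        for (FlowStatusSketch child : children) {
            String found = child.statusOf(queryId);
            if (found != null) {
                return found;
            }
        }
        return null;
    }

    // Builds a small example tree: one gridflow with a for-each sub-flow.
    static FlowStatusSketch example() {
        return new FlowStatusSketch("gridflow", "RUNNING")
                .add(new FlowStatusSketch("ingest", "DONE"))
                .add(new FlowStatusSketch("forEachFlow", "RUNNING")
                        .add(new FlowStatusSketch("executeMD5-Metadata", "PENDING")));
    }

    public static void main(String[] args) {
        FlowStatusSketch root = example();
        System.out.println(root.statusOf("gridflow"));            // whole gridflow
        System.out.println(root.statusOf("executeMD5-Metadata")); // a single step
    }
}
```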

21 Grid User
Matrix-demo sdsc ****** /home/Matrix-demo.sdsc sdsc-unix 0 arun@sdsc.edu

22 VO Info

23 SDSC Matrix Project
CS Research & Development
  Gridflow description, data grid administration rules
  Gridflow P2P protocols for gridflow server communication
Development
  SRB data grid web services
  SRB datagrid flow automation and provenance
Theory to Practice
  Help in customized development & deployment of gridflow concepts in scientific / grid applications
  Visibility, and assistance in standardization efforts at GGF

24 Matrix Gridflow Server Architecture
Diagram components:
Matrix Agent Abstraction
In Memory Store
JDBC
Agents for Java, WSDL and other grid executables
Persistence (Store) Abstraction
ECA rules Handler
Matrix Data Grid Request Processor
Transaction Handler
Status Query Handler
Gridflow Metadata Manager
JAXM Wrapper
SOAP Service for Matrix Clients
Flow Handler and Execution Manager
Workflow Query Processor
XQuery Processor
JMS Messaging Interface
Event Publish Subscribe, Notification
SDSC SRB Agents
Other SDSC Data Services
WSDL Description
Sangam P2P Gridflow Broker and Protocols

25 Matrix Folks (Emeritus)
Jonathan Weinberg, Daniel Moore, Allen Ding, Reena Mathew, Erik Vandekieft
Don’t you guys have a group picture?

26 SRB Java Folks
Luke - Jargon Man
One-man development team
Says he works on strategies for SRB Java software
Hey, the guy on the right is all talk and no walk

27 Advantages from SRB Perspective
Reduces client-server communication
  The whole execution logic is sent to the server
  Fewer WAN messages
  Our experiments show a significant increase in performance
Datagrid Information Lifecycle Management
  Autonomic: “Move data at 9:00 PM on weekdays and on weekends”
Data Grid Administration
Power users and sophisticated users
  Data grid administrator (rules to manage the data grid)
  Scientist or librarian (visualized data flow programming)

28 Using DG-Modeler GUI for dataflow programming

29 Gridflow Process I (Vision)
End user using DGBuilder
Gridflow description
Data Grid Language

30 Gridflow Process II (Vision)
Abstract gridflow using Data Grid Language
Planner
Concrete gridflow using Data Grid Language

31 Gridflow Process III (Vision)
Gridflow P2P network
Gridflow processor
Concrete gridflow using Data Grid Language

32 Got ideas/suggestions?
Contact: SDSC Matrix project, arun@sdsc.edu
Google keyword: SDSC Gridflow

