NIÑOS Take Five: The Management Infrastructure for Distributed Event-Driven Workflows. Siddarth Ganesan, Young Yoon, Hans-Arno Jacobsen
Many Workflow Management Systems: IBM WebSphere MQ Workflow; Oracle Business Process Management; Sage ERP X3; SAP NetWeaver Business Process Management; Symantec Workflow. These are for centralized workflows. 12/01/2019
New breed of workflows: distributed tasks; distributed orchestration; event-driven execution; distributed deployment; decentralized coordination.
NIÑOS. [Figure: NIÑOS architecture. Labels: XML (WSDL, BPEL) workflow; BPEL Parser; set of sub/advs; Deployer; inject sub/adv sets; Receive, Assign, Flow, Invoke, Wait, and Reply agents; PADRES pub/sub ESB; WS Gateway; Web service.] 1. Create a novel model for mapping BPEL into the subscription language. 2. Design lightweight agents to execute BPEL activities. 3. Propose a BPEL parser to translate a business process definition into a set of subscriptions and advertisements. 4. Develop a BPEL deployer to inject the mapped subscription sets. 5. Define a Web service interface for PADRES. Scalability; fine-grained monitoring; crossing administrative boundaries; correlation is handled by the underlying network. [Figure: running instance vs. process. Labels: WS client; HTTP/SOAP; Receive agent; new instance created; Invoke agent; Wait agent; Reply agent; execution done.]
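The idea of translating a process definition into subscriptions and advertisements can be illustrated with a minimal sketch. It assumes a purely sequential process in which each activity agent subscribes to the completion event of its predecessor and advertises its own completion event. The message fields (`class`, `process`, `activity`), the `START` trigger, and the function `map_sequence` are hypothetical names chosen for illustration; they are not the NIÑOS mapping or the PADRES subscription language.

```python
from dataclasses import dataclass


@dataclass
class AgentConfig:
    activity: str
    subscription: dict   # event the agent waits for before running
    advertisement: dict  # event the agent announces it will publish when done


def map_sequence(process_id, activities):
    """Map a linear sequence of activities to one sub/adv pair per agent."""
    configs = []
    previous = "START"  # hypothetical trigger name for the first activity
    for name in activities:
        configs.append(AgentConfig(
            activity=name,
            subscription={"class": "ACTIVITY_DONE",
                          "process": process_id,
                          "activity": previous},
            advertisement={"class": "ACTIVITY_DONE",
                           "process": process_id,
                           "activity": name},
        ))
        previous = name
    return configs


configs = map_sequence("p1", ["receive", "assign", "invoke", "reply"])
for cfg in configs:
    print(cfg.activity, "waits for", cfg.subscription["activity"])
```

Because each agent only needs its own subscription and advertisement, the agents in this sketch could be deployed on different brokers without a central engine, which is the property the slide attributes to the distributed design.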
Large-scale Business Processes. [Figure: process map of the case-study company. Labels include Vendor, Sale, Manufactory, Warehouse, Finance, Marketing, Design, Order, Payment, Delivery (FedEx), Monitoring, and Process control.] Case Study (Chinese Electronics Manufacturer): department-level processes with 26 to 47 activities; global processes that compose departmental ones; thousands of concurrent instances; hundreds of collaborating partners; geographically distributed; administrative boundaries.
Research Efforts. Processing infrastructure design and implementation (NIÑOS) [Li et al., TWeb’10]; enforcement of declarative SLAs through fluid workflow processing engines (eQoSystem) [Jacobsen et al., BPM’10; Muthusamy et al., DEBS’11 (Demo)]; conflict-free execution [Yoon et al., WWW’11]. These are selected key efforts on building the infrastructure, supporting performance requirements (SLAs), and preventing violations of functional properties.
What’s missing? Management!
WfMC Workflow Reference Model, “Interface 5”. Supervision: pause, resume, jump, or skip an instance or individual tasks; terminate a workflow or instance; assign or update attributes in a workflow. User management: establish, delete, suspend, or amend privileges or roles of users or workgroups. Audit: query, print, start new, or delete an audit trail or event log for monitoring purposes. Resource control: set, unset, or modify concurrency levels of an entire workflow instance or its individual tasks; interrogate resource control data such as counts, thresholds, and usage parameters. Status function: open and close a workflow or task query with optionally set filter criteria. (Source: http://www.wfmc.org/reference-model.html)
Management Mechanisms. Workflow modification: pause, resume, skip. Discovery: workflow status query. Variable update: resource control, user privilege update.
Management Mechanisms in Distributed Workflow. Here is a sample workflow and its deployment on the PADRES ESB. [Figure: Task A (TaskID = 1, TaskAgentID = 1, InstanceID = x, ProcessID = 1), Task B (TaskID = 2, TaskAgentID = 2, InstanceID = x, ProcessID = 1), and Task C (TaskID = 3, TaskAgentID = 3, InstanceID = x, ProcessID = 1), run by the Task A, Task B, and Task C agents. Labels: Management Client (ADV); task agents (SUB); "Skip Task B"; "Pause/Resume instance 1"; "Fetch status of instance 1"; Instance Ownership; Causal Relationship; Queue.]
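The primitive operations on this slide (skip a task, pause or resume an instance, fetch instance status) can be sketched as management messages that task agents subscribe to. The sketch below is a toy, in-process illustration under stated assumptions: `Broker` is a stand-in for a pub/sub layer and is not the PADRES API, and the topic name `management` and the message fields `op`, `task`, `instance`, and `process` are hypothetical.

```python
from collections import defaultdict


class Broker:
    """Toy in-process stand-in for a pub/sub broker (not the PADRES API)."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, message):
        # Collect handler replies so a client can gather status answers.
        return [handler(message) for handler in self.subscribers[topic]]


class TaskAgent:
    def __init__(self, broker, task_id, process_id):
        self.task_id = task_id
        self.process_id = process_id
        self.paused = set()   # instance IDs currently paused at this agent
        self.skipped = set()  # instance IDs for which this task is skipped
        broker.subscribe("management", self.on_command)

    def on_command(self, cmd):
        if cmd["process"] != self.process_id:
            return None  # command targets another process
        op, instance = cmd["op"], cmd["instance"]
        if op == "pause":
            self.paused.add(instance)
        elif op == "resume":
            self.paused.discard(instance)
        elif op == "skip" and cmd.get("task") == self.task_id:
            self.skipped.add(instance)
        elif op == "status":
            return {"task": self.task_id,
                    "paused": instance in self.paused,
                    "skipped": instance in self.skipped}
        return None


broker = Broker()
agents = [TaskAgent(broker, task_id, process_id=1) for task_id in (1, 2, 3)]

broker.publish("management", {"op": "skip", "task": 2, "instance": "x", "process": 1})
broker.publish("management", {"op": "pause", "instance": "x", "process": 1})
status_query = {"op": "status", "instance": "x", "process": 1}
paused_status = [r for r in broker.publish("management", status_query) if r]

broker.publish("management", {"op": "resume", "instance": "x", "process": 1})
resumed_status = [r for r in broker.publish("management", status_query) if r]
```

In this sketch every agent of the process receives instance-level commands, while a skip is applied only by the agent that owns the named task; a real deployment would express that filtering in the subscriptions themselves.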
Variable Update. Accessing or modifying variables and attributes associated with workflow instances and entire workflows: user management operations; resource control operations such as set, unset, and modify of concurrency levels; condition variables. Distributed variable updates use variable agents (TWeb’10).
Concurrent Composite Operations. [Figure: timeline of operations issued concurrently by Management Client 1 and Management Client 2: pause instance; fetch status of instance; resume instance; amend user privilege; resume instance.]
Isolation and Ordering. [Figure: Management Client 1 issues pause instance, amend user privilege, resume instance; Management Client 2 issues fetch status of instance, resume instance; the management infrastructure emits a single ordered sequence: fetch status of instance, pause instance, amend user privilege, resume instance, resume instance.]
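One simple way to picture isolation of composite operations is to run each composite as an indivisible unit, so that primitives from two clients never interleave. The sketch below does this with a single lock; this is an assumed, deliberately coarse policy for illustration and not necessarily the ordering scheme used by the management infrastructure. The class name `CompositeSequencer` and the operation strings are hypothetical.

```python
import threading


class CompositeSequencer:
    """Runs each composite operation as one isolated, ordered unit."""

    def __init__(self):
        self._lock = threading.Lock()
        self.log = []  # (client, primitive) pairs in global execution order

    def execute(self, client, operations):
        # Holding the lock for the whole composite prevents another
        # client's primitives from being interleaved with this one.
        with self._lock:
            for op in operations:
                self.log.append((client, op))


seq = CompositeSequencer()
t1 = threading.Thread(
    target=seq.execute,
    args=("client1", ["pause instance", "amend user privilege", "resume instance"]),
)
t2 = threading.Thread(
    target=seq.execute,
    args=("client2", ["fetch status of instance", "resume instance"]),
)
t1.start()
t2.start()
t1.join()
t2.join()

order = [client for client, _ in seq.log]
```

Whichever thread acquires the lock first, `order` contains all of one client's primitives followed by all of the other's, so a status fetch can never observe a half-applied composite from the other client.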
[Figure: management architecture. Management clients send operation requests to the Operation Agent Manager (OAM), which tracks pending, running, and conflicting operations, monitors queue length, and dispatches operations to Operating Agents (OAs) in an elastic management cluster; the OAs contact the distributed Task Agents (TAs).]
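The two responsibilities shown for the OAM, sizing the cluster from the queue length and holding back conflicting operations, can be sketched as follows. Both rules are assumptions made for illustration: the scaling rule (one OA per `per_agent_capacity` queued operations, within bounds) and the conflict rule (two operations conflict when they target the same workflow instance) are not taken from the source, and all names are hypothetical.

```python
import math


def target_agent_count(queue_length, per_agent_capacity,
                       min_agents=1, max_agents=16):
    """Number of operating agents to run for the current queue length."""
    needed = math.ceil(queue_length / per_agent_capacity)
    return max(min_agents, min(max_agents, needed))


def dispatch(pending, running_instances):
    """Split pending operations into dispatchable and conflicting ones.

    Assumed rule: an operation conflicts if another operation on the same
    workflow instance is already running or was dispatched just before it.
    """
    dispatchable, conflicting = [], []
    busy = set(running_instances)
    for op in pending:
        if op["instance"] in busy:
            conflicting.append(op)
        else:
            dispatchable.append(op)
            busy.add(op["instance"])
    return dispatchable, conflicting


pending = [
    {"op": "pause", "instance": "i1"},
    {"op": "status", "instance": "i2"},
    {"op": "resume", "instance": "i1"},
    {"op": "skip", "instance": "i3"},
]
ready, held = dispatch(pending, running_instances={"i3"})
agents_needed = target_agent_count(queue_length=len(pending), per_agent_capacity=2)
```

Held operations stay in their original order, so they can be retried once the running operation on the same instance completes.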
Initial Experiment Setup. [Figure: nodes n1, n2, and n3 hosting the management client (MC), the OAM, OAs, and TAs.]
Overhead of the Management Infrastructure. Modification operations incurred an additional overhead of 50 ms compared to discovery operations.
Elastic Management Infrastructure. With an elastic management cluster, response time remained constant.
Dynamic Migration. [Figure: nodes n1, n2, ..., n9, ..., n18, n19, n20 hosting the MC, TAs, OAs, and the OAM.]
Benefit of Dynamic Migration. The OAs and the OAM are dynamically relocated based on the workload and the network topology to minimize the average delay.
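Relocation that minimizes average delay can be pictured as a placement choice: for each candidate node, weight the network delay to every task agent by how often that agent is contacted, and pick the node with the lowest weighted average. The sketch below assumes the delays and per-agent workload are already measured; the node names, numbers, and function names are hypothetical, and the actual migration policy may differ.

```python
def average_delay(candidate, delays, workload):
    """Workload-weighted mean delay from a candidate node to the task agents."""
    total = sum(workload.values())
    weighted = sum(delays[candidate][ta] * count for ta, count in workload.items())
    return weighted / total


def best_host(candidates, delays, workload):
    """Candidate node that minimizes the weighted average delay."""
    return min(candidates, key=lambda node: average_delay(node, delays, workload))


# Delay in ms from each candidate node to each task agent (made-up values).
delays = {
    "n1": {"ta1": 10, "ta2": 40},
    "n2": {"ta1": 30, "ta2": 5},
}

# Operations per interval directed at each task agent (made-up values).
mostly_ta2 = {"ta1": 1, "ta2": 9}
mostly_ta1 = {"ta1": 9, "ta2": 1}

host_a = best_host(["n1", "n2"], delays, mostly_ta2)
host_b = best_host(["n1", "n2"], delays, mostly_ta1)
```

With these numbers the preferred host flips from `n2` to `n1` when the workload shifts from `ta2` to `ta1`, which is the situation in which migrating an agent pays off.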
Summary. Introduced runtime control (management) of distributed event-driven workflows for the first time; observed mechanisms for primitive management operations; identified the isolation and ordering problems of concurrent composite operations; devised a distributed management infrastructure that is elastic and fluid.
Thank You