X. Liu, Z. Ni, Z. Wu, D. Yuan, J. Chen, Y. Yang, ICPADS10, , Shanghai, China

An Effective Framework for Handling Recoverable Temporal Violations in Scientific Workflows

Xiao Liu 1, Zhiwei Ni 2, Zhangjun Wu 2, Dong Yuan 1, Jinjun Chen 1, Yun Yang 1
1 SUCCESS (Centre for Computing and Engineering Software Systems), Swinburne University of Technology, Melbourne, Australia
2 Institute of Intelligent Management, Hefei University of Technology, Hefei, China
Outline
> Background
  – Workflow Technology Group
  – SwinDeW Family, SwinGrid, SwinCloud
> Brief Overview: Workflow Temporal QoS Support
> Handling Temporal Violations in Scientific Workflows
  – Problem Analysis
  – An Effective Light-Weight Handling Framework
  – Two-Stage Local Workflow Rescheduling Strategy
> Evaluation
> Summary
Workflow Technology Group Overview
> The WT group is part of SUCCESS (Centre for Computing and Engineering Software Systems), a Tier-1 university research centre at Swinburne University of Technology. Our group conducts research into workflow technologies for complex software systems and services, including peer-to-peer, grid, and cloud computing based e-science, e-business, transactional and inter-organisational workflows.
Leader: Prof Yun Yang
Visitors (7-8/09): Prof Lee Osterweil, Prof Lori Clarke
Researchers: Dr Jinjun Chen (Senior Lecturer), Xiao Liu (PostDoc), Dong Yuan (PhD), Gaofeng Zhang (PhD), Wenhao Li (PhD), Dahai Cao (PhD), Xuyun Zhang (PhD)
Others: Prof Ryszard Kowalczyk, Prof Chengfei Liu, Dr Jun Yan (Wollongong), Prof Hai Jin (HUST), Prof Mingshu Li (ISCAS), Prof Qing Wang (ISCAS), Prof Zhiwei Ni (HFUT), Prof Jinpeng Huai (BUAA)
SwinDeW Family
> SwinDeW – Swinburne Decentralised Workflow – foundation prototype based on p2p
  – SwinDeW – past
  – SwinDeW-A (for Agents) – ARC DP06
  – SwinDeW-G (for Grid) – past
  – SwinDeW-V (for Verification) – current (ARC DP)
  – SwinDeW-C (for cloud) – current (ARC LP)
  – Others: SwinDeW-B / -S / -P / -G – past
> Current Projects:
  – ARC DP , Cost effective storage of massive intermediate data in cloud computing applications, Duration:
  – ARC LP , Novel cloud computing based on workflow technology for managing large numbers of process instances, Duration:
SwinGrid to SwinCloud
Scientific Workflows
> Scientific workflows underlie many large-scale, complex e-science applications such as climate modeling, astrophysics, structural biology and chemistry, earthquake simulation and disaster recovery.
> Scientific workflows are usually deployed in distributed high performance computing infrastructures such as clusters, grids and clouds.
> Compared with conventional business workflows, most scientific workflows are more data and/or computation intensive, involve less human interaction, are larger in scale, and have more complex process structures.
Temporal QoS Support for Scientific Workflows
> Motivation: most e-science applications are time constrained, with global temporal constraints (deadlines) and local temporal constraints (milestones), so that pre-defined goals are achieved on schedule.
> Basic requirements: automation and cost-effectiveness.
> Challenges: highly dynamic system environments, changing process structures, charges for resource usage.
> Solution: A Novel Probabilistic Temporal Framework and Its Strategies for Cost-Effective Delivery of High QoS in Scientific Cloud Workflow Systems [PhD Thesis - Xiao Liu]
Lifecycle Support of Temporal QoS
> At workflow build-time modeling stage
  – Component 1: temporal constraint setting
    Forecasting activity durations [eScience08], [JSS10b]
    Setting both coarse-grained and fine-grained temporal constraints [BPM08], [CCPE09], [JCSS10]
  – Component 2: temporal consistency monitoring
    Temporal checkpoint selection [ICSE08], [TAAS07]
    Temporal verification [CCPE07], [ToSEM09]
  – Component 3: temporal violation handling
    Temporal violation handling point selection [TSE]
    Temporal violation handling [CCGrid], [JSS10a], [TSE], [ICPADS]
Problem Analysis
> Basic requirements: automation and cost-effectiveness
> 1) How to define fine-grained recoverable temporal violations.
  – Define statistically recoverable and non-recoverable temporal violations, to avoid heavy-weight exception handling strategies and facilitate light-weight ones
  – Divide recoverable temporal violations into fine-grained levels, to facilitate the choice among handling strategies of different capability (higher capability, higher cost)
> 2) Which light-weight, effective exception handling strategies should be provided.
  – Employ or design a set of light-weight handling strategies, from low capability to high capability (low cost to high cost)
An Effective Light-Weight Handling Framework
> Three levels of temporal violations
  – Level I, Level II and Level III
> Corresponding three levels of temporal violation handling strategies
  – TDA, ACOWR and TDA+ACOWR
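The correspondence between violation levels and handling strategies can be written down as a small lookup. This is only a sketch: it assumes the two lists on the slide pair up in order (Level I with TDA, Level II with ACOWR, Level III with TDA+ACOWR), and the names `ViolationLevel`, `STRATEGY_FOR_LEVEL` and `select_strategy` are hypothetical, not taken from the paper.

```python
from enum import Enum


class ViolationLevel(Enum):
    """The three levels of recoverable temporal violations named on the slide."""
    LEVEL_I = 1
    LEVEL_II = 2
    LEVEL_III = 3


# Assumed pairing: the slide lists levels and strategies in the same order.
STRATEGY_FOR_LEVEL = {
    ViolationLevel.LEVEL_I: "TDA",
    ViolationLevel.LEVEL_II: "ACOWR",
    ViolationLevel.LEVEL_III: "TDA+ACOWR",
}


def select_strategy(level: ViolationLevel) -> str:
    """Return the name of the handling strategy assumed for a violation level."""
    return STRATEGY_FOR_LEVEL[level]
```

How a detected violation is classified into one of the three levels is not shown here, since the thresholds are not given on the slide.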
Three Levels of Handling Strategies
> TDA (Time Deficit Allocation) [CCPE07]
  – TDA actively propagates small time deficits to the subsequent workflow activities so that they may be compensated by the execution time those activities save.
> ACOWR (Ant Colony Optimisation based Workflow Rescheduling) [CCGrid10]
  – Based on our general two-stage local workflow rescheduling strategy
  – Using ACO as the metaheuristic algorithm
> TDA+ACOWR (the hybrid strategy of TDA and ACOWR)
  – One round of TDA followed by multiple rounds of ACOWR (normally fewer than 3)
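To make the idea behind TDA concrete, the following sketch spreads a time deficit over the subsequent activities. The allocation rule is an assumption made for illustration: each activity absorbs a share proportional to its slack, i.e. the execution time it is expected to save. The slide does not state the exact rule, and `allocate_time_deficit` is a hypothetical name.

```python
def allocate_time_deficit(deficit: float, slacks: list[float]) -> list[float]:
    """Split `deficit` across subsequent activities in proportion to their slack.

    `slacks[i]` is the time activity i is expected to save (an assumed input,
    e.g. its temporal constraint minus its expected duration).
    """
    if deficit < 0:
        raise ValueError("deficit must be non-negative")
    total_slack = sum(slacks)
    if total_slack <= 0:
        # Nothing can absorb the deficit; a stronger strategy would be needed.
        raise ValueError("no slack available in subsequent activities")
    return [deficit * slack / total_slack for slack in slacks]


# A deficit of 6 time units over activities with slack 1, 2 and 3.
shares = allocate_time_deficit(6.0, [1.0, 2.0, 3.0])
```

The shares always sum to the original deficit, so the deficit is moved downstream rather than removed; it only disappears if the later activities really do finish early.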
A General Two-Stage Workflow Local Rescheduling Strategy
> Handling temporal violations with workflow rescheduling
> Key objective: reduce or ideally remove the time deficit at the current checkpoint, i.e. reduce as much as possible the execution time of the activities that follow the checkpoint in the violated workflow segment
> Requirement 1: striking a good balance between time deficit compensation and the completion time of other activities (workflow activities and general tasks, with or without temporal constraints) – from the overall makespan perspective
> Requirement 2: utilising available resources in the system rather than recruiting additional resources – from the overall cost perspective
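The objective and the two requirements above can be sketched as a two-stage search. Several assumptions are made here: stage one keeps the candidate schedules with the best overall makespan, stage two picks from those the one that finishes the violated segment earliest, and plain random sampling stands in for the metaheuristic (ACO in ACOWR). Only existing resources are used and cost is not modeled. All function and parameter names are hypothetical.

```python
import random


def finish_times(assignment, durations):
    """Finish time of each task when every resource runs its tasks in list order."""
    clock = {}
    finish = []
    for task, resource in enumerate(assignment):
        clock[resource] = clock.get(resource, 0.0) + durations[task][resource]
        finish.append(clock[resource])
    return finish


def two_stage_reschedule(durations, segment, n_resources,
                         n_samples=200, keep=10, seed=0):
    """Return a task-to-resource assignment for the integrated task list.

    durations[t][r] is the run time of task t on resource r; `segment` holds
    the indices of the tasks in the violated workflow segment.
    """
    rng = random.Random(seed)
    n_tasks = len(durations)

    # Stage 1: sample assignments on the existing resources and keep those
    # with the smallest overall makespan (random search replaces ACO here).
    samples = [tuple(rng.randrange(n_resources) for _ in range(n_tasks))
               for _ in range(n_samples)]
    samples.sort(key=lambda a: max(finish_times(a, durations)))
    candidates = samples[:keep]

    # Stage 2: among the candidates, choose the one in which the violated
    # segment completes earliest, i.e. the largest time compensation.
    def segment_finish(a):
        times = finish_times(a, durations)
        return max(times[t] for t in segment)

    return min(candidates, key=segment_finish)


# Three tasks, two resources; task 2 belongs to the violated segment.
plan = two_stage_reschedule([[3.0, 1.0], [2.0, 2.0], [4.0, 1.0]],
                            segment=[2], n_resources=2)
```

Filtering on makespan first and on segment completion second is one way to honour Requirement 1: the violated segment is favoured only among schedules that are already good for every other task.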
Integrated Task Resource List
Pseudo-code for an Abstract Strategy
Evaluation
> Performance analysis and comparison (with GA) for ACOWR
  – Optimisation on Total Makespan
  – Optimisation on Total Cost
  – Time Compensation on Violated Workflow Segment
  – CPU Time
> Effectiveness evaluation of the three-level handling framework
  – Violation Rate of Global Temporal Constraints and Local Temporal Constraints
  – Cost Analysis
Optimisation on Total Makespan
Optimisation on Total Cost
Time Compensation on Violated Workflow Segment
CPU Time
Experiment Results on Temporal Violation Rates
Cost Analysis
Summary
> Temporal QoS Support is Critical in e-Science Applications
> Temporal Violation Handling in Scientific Workflows
  – Automatic, Cost-Effective
  – Level I, Level II and Level III
  – TDA, ACOWR, TDA+ACOWR
> A Two-Stage Workflow Local Rescheduling Strategy
  – ACO, GA, PSO, many other metaheuristics
> Future Work
  – Data movement cost
  – More scheduling algorithms
The End – Thank You!
> Any questions or comments?
> Website:
> An extension of this paper, titled “A Novel General Framework for Automatic and Cost-Effective Handling of Recoverable Temporal Violations in Scientific Workflow Systems,” has been accepted by the Journal of Systems and Software (JSS).