Temporal Verification in Grid/Scientific Workflows. Xiao Liu, CITR - Centre for Information Technology Research, Swinburne University of Technology, Australia


Xiao Liu CITR - Centre for Information Technology Research Swinburne University of Technology, Australia Temporal Verification in Grid/ Scientific Workflows

2 Grid/Scientific Workflows Temporal QOS Framework  Setting Temporal Constraints in Scientific Workflows SwinDeW-G Grid Workflow Management System Additional Information  Research areas in Workflow Technology Program  Data Mining Techniques in Workflow area  Optimization Algorithms in Workflow area Content

3 Grid/Scientific Workflow Grid Workflow Management System  A type of workflow management system aiming at supporting large-scale sophisticated scientific and business processes in complex e-science and e-business applications, by facilitating the resource sharing and computing power of underlying grid infrastructure. Scientific Workflow Management System  A type of workflow management system aiming at supporting complex scientific processes in many e-science applications such as climate modelling, astronomy data processing. It may or may not be built upon grid infrastructure. Can be cluster or P2P.

4 How Are Grids Used: High-performance computing; Collaborative data-sharing; Collaborative design; Drug discovery; Financial modeling; Data center automation; High-energy physics; Life sciences; E-Business; E-Science; Natural language processing & Data Mining; Utility computing. From

5 An Example Grid Application From

6 Grid Architecture From

7 Grid Workflow Engine From

8 Grid/Scientific Workflows Temporal QOS Framework  Setting Temporal Constraints in Scientific Workflows SwinDeW-G Grid Workflow Management System Additional Information  Research areas in Workflow Technology Program  Data Mining Techniques in Workflow area  Optimization Algorithms in Workflow area Where Are We

9 Temporal Verification In reality, complex scientific and business processes are normally time constrained. Hence, time constraints are often set when these processes are modelled as grid workflow specifications. Temporal constraints are mainly of three types: upper bound constraints, lower bound constraints and fixed-time constraints. Temporal verification is used to identify any temporal violations so that they can be handled in time.

10 Temporal QOS Framework Constraint Setting: setting temporal constraints according to temporal QOS specifications. Checkpoint Selection: selecting necessary and sufficient checkpoints at which to conduct temporal verification. Temporal Verification: verifying the consistency states at the selected checkpoints. Temporal consistency states: SC (Strong Consistency), WC (Weak Consistency), WI (Weak Inconsistency), SI (Strong Inconsistency). Temporal Adjustment: handling temporal violations.

11 Grid/Scientific Workflows Temporal QOS Framework  Setting Temporal Constraints in Scientific Workflows SwinDeW-G Grid Workflow Management System Additional Information  Research areas in Workflow Technology Program  Data Mining Techniques in Workflow area  Optimization Algorithms in Workflow area Where Are We

12 Setting Temporal Constraints Problem Statement  In scientific workflow systems, temporal consistency is critical to ensure the timely completion of workflow instances. To monitor and guarantee the correctness of temporal consistency, temporal constraints are often set and then verified. However, most current work adopts user specified temporal constraints without considering system performance, and hence may result in frequent temporal violations that deteriorate the overall workflow execution effectiveness. Granularity of temporal constraints  Coarse-grained constraints refer to those assigned to the entire workflow or workflow segments.  Fine-grained constraints refer to those assigned to individual activities.

13 A Motivating Example This workflow segment contains 12 activities, which are modeled by an SPN (Stochastic Petri Net) with additional graphic notations. For simplicity, we denote these activities as X1 to X12. The workflow process structure is composed of four SPN based building blocks, i.e. a choice block for data collection from two radars at different locations (activities X1 to X4), a compound block of parallelism and iteration for data updating and pre-processing (activities X6 to X10), and two sequence blocks for data transferring (activities X5, X11 to X12).

14 Two Basic Requirements Temporal constraints should be well balanced between user requirements and system performance. Clients often suggest coarse-grained temporal constraints based on their own interests while having limited knowledge of the actual performance of workflow systems. Therefore, user specified constraints are prone to cause frequent temporal violations. Temporal constraints should facilitate both overall coarse-grained control and local fine-grained control. Both coarse-grained and fine-grained temporal constraints should be supported. However, note that coarse-grained and fine-grained temporal constraints are not in a simple relationship of linear accumulation and decomposition. Meanwhile, it is impractical to set fine-grained temporal constraints manually for the large number of activities in scientific workflows.

15 A Probabilistic Strategy Probability based temporal consistency: a novel probability based temporal consistency, which utilises the weighted joint distribution of workflow activity durations, is proposed to facilitate setting temporal constraints. Two assumptions on activity durations. Assumption 1: The distribution of activity durations can be obtained from workflow system logs. Without loss of generality, we assume all activity durations follow the normal distribution model, denoted as N(µ, σ²). Assumption 2: The activity durations are independent of each other. Handling exceptions to the assumptions: use normal transformation and correlation analysis; alternatively, ignore the non-conforming activities when calculating the joint distribution and add their durations afterwards.

16 Weighted Joint Normal Distribution Joint normal distribution: if there are n independent variables X_i ~ N(µ_i, σ_i²) and n real numbers θ_i, where n is a finite natural number, then the joint distribution of these variables can be obtained with the following formula: Weighted joint normal distribution: for a scientific workflow process SW which consists of n activities, we denote the duration distribution of activity a_i as N(µ_i, σ_i²) with 1≤i≤n. Then the weighted joint distribution is defined as: where w_i stands for the weight of activity a_i, which denotes the choice probability or the number of iterations associated with the workflow path to which a_i belongs.
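For reference, the standard result for a linear combination of independent normal variables is given below, in the notation of this slide. Reading the weighted joint distribution as the same expression with θ_i replaced by the weights w_i is an assumption based on the surrounding text, and the symbols µ_sw and σ_sw are introduced here only for illustration.

```latex
% Linear combination of n independent normal variables X_i ~ N(\mu_i, \sigma_i^2)
Z = \sum_{i=1}^{n} \theta_i X_i \sim
    N\!\left( \sum_{i=1}^{n} \theta_i \mu_i ,\; \sum_{i=1}^{n} \theta_i^{2} \sigma_i^{2} \right)

% Assumed form of the weighted joint distribution of workflow process SW
N(\mu_{sw}, \sigma_{sw}^{2}) =
    N\!\left( \sum_{i=1}^{n} w_i \mu_i ,\; \sum_{i=1}^{n} w_i^{2} \sigma_i^{2} \right)
```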

17 Probabilistic Specification of Activity Durations Maximum Duration, Mean Duration, Minimum Duration The 3σ rule states that any sample drawn from a normal distribution has a probability of 99.73% of falling into the range [µ-3σ, µ+3σ], which is a symmetric interval of three standard deviations around the mean. Accordingly, our strategy uses the following specification of activity durations: Maximum Duration D(a_i) = µ+3σ; Mean Duration M(a_i) = µ; Minimum Duration d(a_i) = µ-3σ.
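A minimal sketch of how the three duration values could be derived from logged durations, assuming the samples are roughly normal. The function name `duration_spec` and the sample values are hypothetical, and using the sample standard deviation is a choice made here, not something the slides prescribe.

```python
from statistics import mean, stdev

def duration_spec(samples):
    """Return the 3-sigma duration specification for one activity."""
    mu = mean(samples)        # estimate of µ from the log
    sigma = stdev(samples)    # sample standard deviation as estimate of σ
    return {
        "max": mu + 3 * sigma,   # D(a_i)
        "mean": mu,              # M(a_i)
        "min": mu - 3 * sigma,   # d(a_i)
    }

# Hypothetical logged durations (seconds) for a single activity.
logged = [98.0, 102.0, 100.0, 97.0, 103.0]
spec = duration_spec(logged)
print(spec)
```

For activities with a large σ relative to µ, the minimum can become negative; clamping it at zero would be a reasonable practical adjustment.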

18 Probability based Temporal Consistency

19 Setting Strategy

20 Step 1: Weighted Joint Normal Distribution Here, to illustrate and facilitate the calculation of the weighted joint distribution, we analyse the basic SPN based building blocks, i.e. sequence, iteration, parallelism and choice. These four building blocks cover the basic control flow patterns and are widely used in workflow modelling and structure analysis. Most workflow process models can be easily built by composing them, and the weighted joint distribution of most workflow processes can be obtained in the same way.
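The weighted combination can be sketched as follows, assuming the weighted joint distribution has mean Σ w_i µ_i and variance Σ w_i² σ_i². The `Activity` class, the example segment and its numbers are hypothetical. Only sequence, choice and iteration are covered through weights; parallelism needs separate treatment that is not shown here.

```python
from dataclasses import dataclass
from math import sqrt

@dataclass
class Activity:
    name: str
    mu: float            # mean duration
    sigma: float         # standard deviation of the duration
    weight: float = 1.0  # 1 for sequence, choice probability, or iteration count

def weighted_joint(activities):
    """Return (mu_sw, sigma_sw) of the weighted joint normal distribution."""
    mu_sw = sum(a.weight * a.mu for a in activities)
    var_sw = sum((a.weight * a.sigma) ** 2 for a in activities)
    return mu_sw, sqrt(var_sw)

segment = [
    Activity("a1", 100.0, 10.0),             # plain sequence step
    Activity("a2", 60.0, 6.0, weight=0.7),   # choice branch taken 70% of the time
    Activity("a3", 80.0, 8.0, weight=0.3),   # alternative branch, 30%
    Activity("a4", 20.0, 2.0, weight=3.0),   # activity iterated three times
]
mu_sw, sigma_sw = weighted_joint(segment)
print(mu_sw, sigma_sw)
```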

21 Step 2: Setting Coarse-grained Constraints (the negotiation process) Client: "I want the process to be completed in 48 hours." Service provider: "Let me check the probability."

22 Step 2: Setting Coarse-grained Constraints (adjust the constraint) Service provider: "Sir, it's 70%, do you agree?" Client: "That's not good, how about 52 hours?"

23 Step 2: Setting Coarse-grained Constraints (adjust the probability) Service provider: "Then it increases to 85%." Client: "Err… how long will it take if I want to have 90%?"

24 Step 2: Setting Coarse-grained Constraints (negotiation result) Service provider: "It will take us 54 hours." Client: "Ok, that's the deal! Let's do it!"

25 Step 2: Setting Coarse-grained Constraints Service provider: "Ok! But, sir, I need to remind you that this is only a guarantee in a statistical sense. If we cannot make it, please blame the stupid guy who invented the strategy! Sorry, statistically, no prediction can be 100% sure!"

26 Step 3: Setting Fine-grained Constraints Setting fine-grained constraints for individual activities: assume the probability obtained in the last step is θ%, which corresponds to a normal percentile of λ. Then the fine-grained constraint for each individual activity a_i is (µ_i + λσ_i). For example, if the coarse-grained temporal constraint has 90% consistency, i.e. a normal percentile of 1.28, then the fine-grained constraint for activity a_i with a distribution of N(µ_i, σ_i²) is (µ_i + 1.28σ_i).
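The propagation rule above can be sketched with the standard library's `statistics.NormalDist`. The function name `fine_grained_constraints` and the activity figures are hypothetical; only the rule µ_i + λσ_i comes from the slide.

```python
from statistics import NormalDist

def fine_grained_constraints(activities, probability):
    """Map each activity name to its constraint mu_i + lambda * sigma_i.

    activities: dict of name -> (mu_i, sigma_i)
    probability: consistency probability agreed for the coarse-grained constraint
    """
    lam = NormalDist().inv_cdf(probability)  # normal percentile λ
    return {name: mu + lam * sigma for name, (mu, sigma) in activities.items()}

# Hypothetical activity duration models (seconds).
activities = {"a1": (100.0, 10.0), "a2": (250.0, 20.0)}
constraints = fine_grained_constraints(activities, 0.90)
lam_90 = NormalDist().inv_cdf(0.90)
print(round(lam_90, 2), constraints)
```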

27 Evaluation--Specification

28 Setting Results: Coarse-grained Constraint
Negotiation for coarse-grained constraint, with WS ~ N(6210, 218²):
Constraint    6300s  6360s  6390s  6400s
Probability   66%    75%    79%    81%
Result: U(WS) = 6400, λ = 0.87
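The negotiation figures can be cross-checked with a short sketch, assuming the workflow segment duration follows N(6210, 218²) as stated on the slide. The helper names are hypothetical. The computed probabilities agree with the slide's percentages to within about one percentage point.

```python
from statistics import NormalDist

# Weighted joint distribution of the workflow segment, taken from the slide.
ws = NormalDist(mu=6210, sigma=218)

def consistency_probability(constraint_s):
    """Probability that the segment completes within the given constraint."""
    return ws.cdf(constraint_s)

def constraint_for(probability):
    """Smallest constraint (seconds) that reaches the requested probability."""
    return ws.inv_cdf(probability)

for c in (6300, 6360, 6390, 6400):
    print(f"{c}s -> {consistency_probability(c):.1%}")

# Normal percentile λ of the negotiated constraint U(WS) = 6400.
lam = (6400 - ws.mean) / ws.stdev
print(f"lambda = {lam:.2f}")
```

The same two helpers cover both negotiation moves from Step 2: adjusting the constraint calls `consistency_probability`, and adjusting the probability calls `constraint_for`.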

29 Setting Results: Fine-grained Constraint

30 Grid/Scientific Workflows Temporal QOS Framework  Setting Temporal Constraints in Scientific Workflows SwinDeW-G Grid Workflow Management System Additional Information  Research areas in CITR Workflow Technology Program  Data Mining Techniques in Workflow area  Optimization Algorithms in Workflow area Where Are We

31 SwinDeW-G Grid Workflow System SwinDeW-G stands for Swinburne Decentralised Workflow for Grid. SwinDeW-G is a peer-to-peer based scientific grid workflow system running on the SwinGrid (Swinburne service Grid) platform. The SwinGrid nodes include the Swinburne CITR (Centre for Information Technology Research) Node, the Swinburne ESR (Enterprise Systems Research laboratory) Node, the Swinburne Astrophysics Supercomputer Node, and the Beihang CROWN (China R&D environment Over Wide-area Network) Node in China. They run Linux with GT4 (Globus Toolkit) or the CROWN grid toolkit 2.5, where CROWN is an extension of GT4 with more middleware and is hence compatible with GT4.

32 Grid/Scientific Workflows Temporal QOS Framework  Setting Temporal Constraints in Scientific Workflows SwinDeW-G Grid Workflow Management System Additional Information  Research areas in CITR Workflow Technology Program  Data Mining Techniques in Workflow area  Optimization Algorithms in Workflow area Where Are We

33 Research Areas in WT
Peer-to-peer based, service oriented and grid workflows:
SwinDeW-A: SwinDeW with agent enhanced negotiation
SwinDeW-B: SwinDeW incorporating BPEL4WS (past)
SwinDeW-G: peer-to-peer based service grid workflow system
SwinDeW-S: SwinDeW incorporating Web services (past)
SwinDeW-V: temporal constraint verification in grid workflows
SwinDeW: peer-to-peer based decentralised workflow system (past)
Service-oriented computing:
SwinGrid: a Swinburne Service Grid Platform which connects Swinburne CITR nodes and the Swinburne Supercomputer with external nodes nationally and internationally, forming a Grid computing environment.

34 Recent Publications in WT
X. Liu, J. Chen and Y. Yang, A Probabilistic Strategy for Setting Temporal Constraints in Scientific Workflows, Proc. 6th International Conference on Business Process Management (BPM2008), Sept Milan, Italy.
K. Ren, X. Liu, J. Chen, N. Xiao, J. Song, W. Zhang, A QSQL-based efficient Planning Algorithm for fully-automated Service Composition in Dynamic Service Environments, Proc. of IEEE International Conference on Services Computing (SCC2008), Honolulu, Hawaii, USA, July
J. Chen and Y. Yang, A Taxonomy of Grid Workflow Verification and Validation. Concurrency and Computation: Practice and Experience, Wiley, 20(4): ,
J. Chen and Y. Yang, Adaptive Selection of Necessary and Sufficient Checkpoints for Dynamic Verification of Temporal Constraints in Grid Workflow Systems. ACM Transactions on Autonomous and Adaptive Systems, 2(2):Article6, June
Q. He, J. Yan, R. Kowalczyk, H. Jin, Y. Yang, Lifetime Service Level Agreement Management with Autonomous Agents for Services Provision. Information Sciences, Elsevier, to appear.
K. Liu, J. Chen, Y. Yang and H. Jin, A Throughput Maximisation Strategy for Scheduling Transaction Intensive Workflows on SwinDeW-G. Concurrency and Computation: Practice and Experience, Wiley, to appear.
J. Yan, Y. Yang and G. K. Raikundalia. SwinDeW - A Peer-to-peer based Decentralized Workflow Management System. IEEE Transactions on Systems, Man and Cybernetics, Part A, 36(5): , 2006.

35 Grid/Scientific Workflows Temporal QOS Framework  Setting Temporal Constraints in Scientific Workflows SwinDeW-G Grid Workflow Management System Additional Information  Research areas in CITR Workflow Technology Program  Data Mining Techniques in Workflow area  Optimization Algorithms in Workflow area Where Are We

36 Data Mining Techniques in Workflow area Process Mining Overview: 1) basic performance metrics 2) process model 3) organizational model 4) social network 5) performance characteristics (If … then …) 6) auditing/security From

37 Process Mining: 1. Process discovery 2. Conformance testing 3. Log based verification. From

38 ProM Framework From

39 Other Workflow Mining Topics Successful Termination Prediction: to choose, from a given set of potential activities, the activity that in past executions most frequently led to a desired final configuration. Identification of Critical Activities: to discover those activities that can be considered critical in the sense that they are scheduled by the system in every successful execution. Failure/Success Characterization: by analysing past experience, a workflow administrator may be interested in knowing which discriminant factors characterize the failure or the success of executions. Workflow Optimization: the information collected in the system logs can be profitably used to reason about the "optimality" of workflow executions. Workflow Performance Related Analysis and Prediction: time series mining used in the prediction of activity durations, in setting temporal constraints and in dynamic temporal verification.
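As an illustration of the critical-activity idea, the sketch below intersects the activity sets of all successful executions. The log format (a list of `(trace, succeeded)` pairs) and the activity names are hypothetical, and this is a simplification rather than the algorithm used in the cited work.

```python
def critical_activities(log):
    """Return the activities that occur in every successful execution.

    log: iterable of (trace, succeeded) pairs, where trace is a list of
    activity names and succeeded is a bool.
    """
    successful = [set(trace) for trace, succeeded in log if succeeded]
    if not successful:
        return set()  # nothing can be called critical without a success
    return set.intersection(*successful)

# Hypothetical execution log.
log = [
    (["collect", "update", "transfer"], True),
    (["collect", "preprocess", "transfer"], True),
    (["collect", "update"], False),
]
critical = critical_activities(log)
print(sorted(critical))
```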

40 References on Workflow Mining
G. Greco, A. Guzzo, G. Manco and D. Sacca, Mining and Reasoning on Workflows, IEEE Trans. on Knowledge and Data Engineering, Vol. 17, No. 4, pp , APRIL
W.M.P. van der Aalst, B.F. van Dongen, J. Herbst, L. Maruster, G. Schimm, and A.J.M.M. Weijters, Workflow Mining: A Survey of Issues and Approaches. Data and Knowledge Engineering, Vol. 47, No. 2, pp ,
A.K.A. de Medeiros, W.M.P. van der Aalst, and A.J.M.M. Weijters, Workflow Mining: Current Status and Future Directions, CoopIS 2003, volume 2888 of Lecture Notes in Computer Science, pages Springer-Verlag, Berlin,
W.M.P. van der Aalst, H.T. de Beer, and B.F. van Dongen, Process Mining and Verification of Properties: An Approach based on Temporal Logic, CoopIS 2005, volume 3760 of Lecture Notes in Computer Science, pages Springer-Verlag, Berlin, 2005.

41 Grid/Scientific Workflows Temporal QOS Framework  Setting Temporal Constraints in Scientific Workflows SwinDeW-G Grid Workflow Management System Additional Information  Research areas in CITR Workflow Technology Program  Data Mining Techniques in Workflow area  Optimization Algorithms in Workflow area Where Are We

42 Grid Resource Management System Resource Broker; Grid Resource Manager; Information Services; Monitoring Services; Security Services; Core Grid Infrastructure Services; Grid Middleware; PBS, LSF, …; Resource; Local Resource Management; Higher-Level Services; User/Application. From

43 Grid Workflow Scheduling [Figure: Grid User; Grid-Scheduler; Machines 1 to 3, each with its own Scheduler, Schedule (over time) and Job-Queue.] From

44 A taxonomy of Grid workflow scheduling algorithms

45 GA based Scheduling Fundamentals for GA based Scheduling 1. Encoding/Decoding 2. Genetic Operators: Crossover, Mutation and Selection. 3. Fitness Evaluation Function
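The three fundamentals can be illustrated with a small genetic algorithm that assigns independent tasks to machines and minimises the makespan. This is a toy sketch under stated assumptions: the encoding (one machine index per task), tournament selection, one-point crossover, per-gene mutation, the parameter values and the task durations are all choices made here, and task dependencies of a real workflow are ignored.

```python
import random

def makespan(assignment, durations, n_machines):
    """Fitness: completion time of the most loaded machine (lower is better)."""
    loads = [0.0] * n_machines
    for task, machine in enumerate(assignment):
        loads[machine] += durations[task]
    return max(loads)

def ga_schedule(durations, n_machines, pop_size=30, generations=100,
                mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    n = len(durations)  # assumes at least two tasks

    def cost(individual):
        return makespan(individual, durations, n_machines)

    def select(population):
        # Tournament selection between two random individuals.
        a, b = rng.sample(population, 2)
        return a if cost(a) <= cost(b) else b

    # Encoding: gene i holds the machine index assigned to task i.
    population = [[rng.randrange(n_machines) for _ in range(n)]
                  for _ in range(pop_size)]
    best = min(population, key=cost)
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best individual over
        while len(children) < pop_size:
            p1, p2 = select(population), select(population)
            cut = rng.randrange(1, n)          # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [rng.randrange(n_machines) if rng.random() < mutation_rate
                     else gene for gene in child]  # per-gene mutation
            children.append(child)
        population = children
        best = min(population, key=cost)
    return best, cost(best)

durations = [4, 7, 2, 9, 5, 3, 8, 6]  # hypothetical task durations
plan, span = ga_schedule(durations, n_machines=3)
print(plan, span)
```

Decoding is trivial here because the chromosome is the schedule itself; richer encodings (for example, task orderings plus machine choices) would need an explicit decoding step before fitness evaluation.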

46 Others Simulated Annealing; Ant Colony; Workflow Rescheduling: when any QOS constraints are violated, how to handle those violations by rescheduling the current task list to compensate for, e.g., time or budget deficits.

47 Summary Grid/Scientific Workflows Temporal Verification and Temporal Adjustment to Support Temporal QOS Framework Workflow Mining (More than process mining ) Optimization Algorithms for Workflow Scheduling and Rescheduling

48 Useful Links  Our work on temporal verification in scientific/grid workflows  Home page of Prof. Wil van der Aalst, Workflow Research  Home page of Dr. Rajkumar Buyya, Grid Research  Home page of Eamonn Keogh, Time Series Mining  UCR Time Series Database

49 The End Any questions or comments?