Development of meta-heuristics for workflow scheduling based on quality of service requirements
Jérémie Sublime, Sonia Yassa
Plan
- Introduction
- Related works
- Problem modeling
- Work on algorithms
- Ongoing work
Introduction: Cloud Computing
Grid computing enables the sharing, exchange, selection and aggregation of geographically distributed “autonomous” resources belonging to different domains. Cloud computing provides infrastructure, platforms and software to consumers as subscription-based services in a pay-as-you-go model. The functional and non-functional requirements placed on these services are called Quality of Service (QoS) requirements. QoS levels are negotiated and expressed by means of Service Level Agreements (SLAs).
Introduction: Cloud Computing
A workflow is a set of tasks and the dependencies between them. Workflow scheduling is the process of mapping interdependent tasks onto different resources and managing their execution.
Introduction: Motivations & Challenges
Workflow technology has been introduced to help scientists carry out their work. Scientific workflows usually contain a large number of tasks and complex data, and therefore require high computing power, which grids and, more recently, clouds can provide.
Finding an effective schedule for a scientific workflow on a grid or a cloud is a difficult problem. It becomes even harder when several criteria have to be taken into account:
- heterogeneity, dynamicity and elasticity of resources;
- performance constraints (minimum execution time);
- resources shared between multiple users;
- transfer of large volumes of data;
- scalability, security, etc.
Introduction: Motivations & Challenges
Workflow scheduling algorithms fall into two categories:
- Best-effort scheduling: attempts to minimize execution time while ignoring other factors, such as the cost of accessing resources and the users' QoS satisfaction levels.
- QoS-based scheduling: tries to improve performance under QoS constraints, for example minimizing execution time under a budget constraint, or minimizing cost under a deadline constraint.
Several algorithms have already been proposed for the first category, but the second has received less attention.
Introduction: Objectives
- Merge and adapt existing work based on single-objective and best-effort optimization, and extend it to larger workflows.
- Develop multi-objective optimization algorithms for SLA-based workflow scheduling in cloud environments. The scheduling algorithms will be based on either Particle Swarm Optimization (PSO) or Genetic Algorithms (GA) and must be able to:
  - analyze the users' QoS parameters;
  - negotiate with service providers to establish an SLA;
  - map the workflow tasks onto appropriate resources so that the execution completes, the users' QoS constraints are satisfied, and the use of cloud resources is optimized.
- Try algorithm hybridization, as well as combinations of different algorithms, to improve the performance of the meta-heuristics.
- Compare the performance of the algorithms for SLA-based workflow scheduling.
Related Works
- R. Buyya, “Economic-based Distributed Resource Management and Scheduling for Grid Computing,” Ph.D. dissertation, 2002.
- M. Rahman, S. Venugopal, R. Buyya, “A Dynamic Critical Path Algorithm for Scheduling Scientific Workflow Applications on Global Grids,” Proc. of the 3rd IEEE International Conference on e-Science and Grid Computing, 2007.
- J. Yu, R. Buyya, “Scheduling Scientific Workflow Applications with Deadline and Budget Constraints Using Genetic Algorithms,” Scientific Programming, 2006.
- W. Chen, J. Zhang, “An Ant Colony Optimization Approach to a Grid Workflow Scheduling Problem With Various QoS Requirements,” IEEE Transactions on Systems, …
Problem Modeling: Workflow Model
A workflow can be modeled as a Directed Acyclic Graph (DAG) $G = (V, E)$, where:
- $V = \{T_1, T_2, \dots, T_n\}$ is the set of workflow tasks;
- $E$ represents the data dependencies between these tasks: $F_{j,k} = (T_j, T_k) \in E$ denotes the data produced by $T_j$ and consumed by $T_k$.
Assumptions:
- a child task waits for all its parent tasks to complete and for the corresponding data transfers to finish;
- a resource can handle only one task at a time;
- the time needed to compute a given task on a given resource at a given frequency is known;
- the volume of data transferred between two tasks is known.
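The DAG model can be sketched with an edge dictionary mapping each pair $(T_j, T_k)$ to its data volume $F_{j,k}$. Any scheduler has to respect the first assumption (a child waits for its parents), which amounts to processing tasks in a topological order; the sketch computes one with Kahn's algorithm. The task names and data volumes are illustrative only.

```python
from collections import deque

# Hypothetical toy workflow: edges map (T_j, T_k) to the data volume F_{j,k}.
tasks = ["T1", "T2", "T3", "T4"]
edges = {("T1", "T2"): 10.0, ("T1", "T3"): 5.0, ("T2", "T4"): 8.0, ("T3", "T4"): 2.0}

def parents(task, edges):
    """Tasks whose output data is consumed by `task`."""
    return [j for (j, k) in edges if k == task]

def topological_order(tasks, edges):
    """Kahn's algorithm: every parent appears before its children.

    Raises ValueError if the graph contains a cycle (i.e. it is not a DAG).
    """
    indegree = {t: 0 for t in tasks}
    children = {t: [] for t in tasks}
    for (j, k) in edges:
        indegree[k] += 1
        children[j].append(k)
    ready = deque(t for t in tasks if indegree[t] == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for c in children[t]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    if len(order) != len(tasks):
        raise ValueError("workflow graph contains a cycle")
    return order

print(topological_order(tasks, edges))  # ['T1', 'T2', 'T3', 'T4']
```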
Problem Modeling: Resource Grid
The target environment is a set of heterogeneous, interconnected resources. It will also be modeled as a graph, whose nodes are the resources and whose edges carry the transfer speeds between them.
Assumptions: for each resource, the following parameters are known:
- reliability;
- ranges of available frequencies and voltages (which work in pairs);
- transfer speed to and from the other resources.
[Figure: a resource, characterized by its frequency, voltage, reliability and usage cost; and a network of three resources whose links are labeled $Speed_{1,2}$, $Speed_{1,3}$ and $Speed_{2,3}$.]
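The resource model and the transfer time DataSize / Speed between two resources can be sketched as follows. This is a hypothetical Python illustration, not the project's C++ code: the `Resource` fields, the sample values, and the rule that a transfer between two tasks on the same resource costs nothing are assumptions. Resources are indexed from 0, so `speed[0][2]` plays the role of $Speed_{1,3}$ in the figure.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    vf_pairs: list      # available (voltage, frequency) pairs
    reliability: float  # reliability parameter used by the QoS formulas
    cost: float         # usage cost (the cost model is still open)

def transfer_time(data_size, ra, rb, speed):
    """DataSize / Speed[ra][rb]; assumed to be zero when ra == rb."""
    if ra == rb:
        return 0.0
    return data_size / speed[ra][rb]

def is_symmetric(speed):
    """Assumption: each link has the same speed in both directions."""
    m = len(speed)
    return all(speed[a][b] == speed[b][a] for a in range(m) for b in range(m))

# Three hypothetical resources, as in the figure (0-based indices).
resources = [
    Resource([(1.0, 1.0), (1.2, 1.5)], 0.001, 1.0),
    Resource([(0.9, 0.8), (1.1, 1.2)], 0.002, 0.5),
    Resource([(1.0, 1.0)], 0.0005, 2.0),
]
speed = [[0.0, 2.0, 4.0],
         [2.0, 0.0, 1.0],
         [4.0, 1.0, 0.0]]
print(is_symmetric(speed), transfer_time(10.0, 0, 2, speed))  # True 2.5
```

Storing speeds in a full matrix keeps the lookup O(1) per transfer, at the price of duplicating each symmetric link.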
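Problem Modeling: Formulas
- Makespan: $T = \mathit{func}\left(\sum (T_{exec} + T_{transfer})\right)$, i.e. the finish time of the last task, which depends on the execution and transfer times along the schedule.
- Transfer time between resources $R_a$ and $R_b$: $T_{transfer}(R_a, R_b) = \frac{DataSize}{Speed_{R_a,R_b}}$
- Energy cost of the executions on resource $R_i$: $E^{i}_{exec} = \sum_{T_j \to R_i} T_{exec}(T_j)\, f_j V_j^2 + \lambda \left(Makespan - \sum_{T_j \to R_i} T_{exec}(T_j)\right)$, where $T_j \to R_i$ ranges over the tasks assigned to $R_i$, $(V_j, f_j)$ is the voltage/frequency pair chosen for $T_j$, and the $\lambda$ term accounts for idle time.
- Total energy: $E = \sum_i \left(E^{i}_{exec} + E^{i}_{trsf}\right)$
- Speed factor of a resource: $SpeedFactor = f / f_{nom}$
- Execution time: $T_{exec}(T_i, R_j) = \frac{NominalExecTime(T_i, R_j)}{SpeedFactor_j}$, so a lower frequency gives a longer execution time.
- Error coefficient of resource $R_i$: $Coeff_i = Reliability_i \cdot \sum_{T_j \to R_i} T_{exec}(T_j)$
- Overall reliability (expressed in %): $Reliability = \exp\left(-\sum_i Coeff_i\right)$
- Theoretical availability of resource $R_k$: $Av_k = \frac{Makespan - \sum_{T_i \to R_k} T_{exec}(T_i, R_k)}{Makespan}$
- Theoretical global workflow availability: $A = \prod_k Av_k$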
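A minimal sketch of several of these formulas, under interpretations stated in the comments: `lam` ($\lambda$) is treated as an idle-power coefficient, the per-resource reliability parameter acts as a failure rate inside the exponential, and execution time is taken as the nominal time divided by the speed factor $f / f_{nom}$ (a lower frequency means a longer run). The transfer energy $E_{trsf}$ has no formula on the slide, so it is left out. All function names are hypothetical.

```python
import math

def exec_time(nominal_time, f, f_nom):
    """T_exec = NominalExecTime / SpeedFactor, with SpeedFactor = f / f_nom."""
    return nominal_time / (f / f_nom)

def resource_energy(runs, makespan, lam):
    """E_exec of one resource. runs: list of (t_exec, f, V) for its tasks.
    Dynamic part sum(t_exec * f * V^2) plus lam times the idle time."""
    busy = sum(t for t, _, _ in runs)
    dynamic = sum(t * f * v ** 2 for t, f, v in runs)
    return dynamic + lam * (makespan - busy)

def overall_reliability(busy_times, rel_params):
    """exp(-sum_i Coeff_i) with Coeff_i = Reliability_i * (busy time of R_i)."""
    return math.exp(-sum(r * b for r, b in zip(rel_params, busy_times)))

def global_availability(busy_times, makespan):
    """A = prod_k Av_k with Av_k = (Makespan - busy time of R_k) / Makespan."""
    a = 1.0
    for b in busy_times:
        a *= (makespan - b) / makespan
    return a

print(exec_time(10.0, 1.0, 2.0))                    # 20.0
print(resource_energy([(2.0, 1.0, 2.0)], 5.0, 0.5))  # 9.5
print(overall_reliability([2.0, 3.0], [0.01, 0.02]))
print(global_availability([2.0, 3.0], 5.0))
```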
Problem Modeling: Structure of the Solutions
Task (index) | T1 | T2 | T3 | T4 | … | TN
Resource | R3 | R2 | R1 | | … | R3
(Volt, Freq) | (V1,f1) | (V3,f3) | (V1,f1) | (V2,f2) | … | (V1,f1)
Ranking (optional) | 5 | 4 | 3 | 2 | … | 1
A solution is composed of a series of substructures, one for each task. A substructure contains the task, the resource assigned to it, the voltage/frequency pair used on that resource and, for some algorithms, a ranking priority for the task. For a solution to be valid, each task must be assigned to exactly one resource. A scheduler then applies the solution and computes the various QoS values: makespan, reliability, cost, energy consumption, …
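The solution encoding and the step where a scheduler applies it can be sketched as below. The `Gene` record mirrors one column of the table. The list-scheduling policy (ready tasks dispatched by decreasing rank, one task at a time per resource, no insertion into idle gaps) is an assumption made for illustration, not the scheduler implemented in the project.

```python
import heapq
from dataclasses import dataclass

@dataclass
class Gene:
    task: int          # task index
    resource: int      # index of the resource the task is assigned to
    vf: tuple          # (voltage, frequency) pair used on that resource
    rank: float = 0.0  # optional priority; assumed: higher rank is dispatched first

def is_valid(solution, n_tasks):
    """Valid if every task 0..n_tasks-1 appears exactly once (one resource per task)."""
    return sorted(g.task for g in solution) == list(range(n_tasks))

def makespan(solution, parents, exec_time, data, speed):
    """Hypothetical list scheduler returning the finish time of the last task."""
    genes = {g.task: g for g in solution}
    children = {t: [] for t in genes}
    missing = {t: len(parents[t]) for t in genes}  # unfinished parents per task
    for k in genes:
        for j in parents[k]:
            children[j].append(k)
    ready_heap = [(-genes[t].rank, t) for t in genes if missing[t] == 0]
    heapq.heapify(ready_heap)
    finish, free_at = {}, {}
    while ready_heap:
        _, t = heapq.heappop(ready_heap)
        g = genes[t]
        # Data from each parent arrives after its finish time plus the transfer time.
        ready = 0.0
        for j in parents[t]:
            gj = genes[j]
            comm = 0.0 if gj.resource == g.resource else data[(j, t)] / speed[gj.resource][g.resource]
            ready = max(ready, finish[j] + comm)
        start = max(ready, free_at.get(g.resource, 0.0))
        finish[t] = start + exec_time(t, g.resource, g.vf)
        free_at[g.resource] = finish[t]
        for c in children[t]:
            missing[c] -= 1
            if missing[c] == 0:
                heapq.heappush(ready_heap, (-genes[c].rank, c))
    return max(finish.values())

# Toy example: T0 -> T1 and T0 -> T2 on two resources (hypothetical values).
base = [2.0, 3.0, 4.0]  # nominal execution times

def toy_exec_time(t, r, vf):
    return base[t] / vf[1]  # f_nom = 1, so T_exec = nominal / f

solution = [Gene(0, 0, (1.0, 1.0), 3), Gene(1, 0, (1.0, 1.0), 2), Gene(2, 1, (1.0, 1.0), 1)]
deps = {0: [], 1: [0], 2: [0]}
data = {(0, 1): 4.0, (0, 2): 4.0}
speed = [[0.0, 2.0], [2.0, 0.0]]
print(is_valid(solution, 3), makespan(solution, deps, toy_exec_time, data, speed))  # True 8.0
```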
Work on Algorithms: GA
- Generate a random population of solutions (or retrieve a population generated by another algorithm)
- While (the constraints are not met && n < NbMaxLoop):
  - select the X% best solutions according to a fitness function;
  - apply random crossover between the Y% best solutions to create a new population;
  - apply random mutations to the new population;
  - replace the old population with the new one;
  - n = n + 1
- End loop
Work to be done:
- choice of the fitness function model for multi-objective optimization: penalties, Pareto front, weighted sum of the fitness functions of the different parameters, …
- choice of a crossover method: how should solutions be mixed? Elitism or not? …
- mutations: how? At what rate?
- replacement of the old population: should the old best solutions be kept or not?
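The GA loop above can be sketched in Python on a deliberately simplified problem: independent tasks (no transfers), a chromosome holding the resource index of each task, and a fitness equal to minus the largest resource load. Uniform crossover, elitism of the single best solution, and one `select_rate` standing for both the X% and Y% fractions are choices made for this sketch; they correspond to the open questions listed above.

```python
import random

def fitness(chrom, W):
    """Toy fitness (higher is better): minus the largest resource load,
    i.e. minus the makespan when the tasks are independent."""
    load = [0.0] * len(W[0])
    for t, r in enumerate(chrom):
        load[r] += W[t][r]
    return -max(load)

def ga(W, pop_size=40, select_rate=0.5, mutation_rate=0.05, nb_max_loop=50,
       target=None, init_pop=None, seed=0):
    rng = random.Random(seed)
    n_tasks, n_res = len(W), len(W[0])
    if init_pop:  # e.g. a population produced by another algorithm
        pop = [c[:] for c in init_pop]
    else:
        pop = [[rng.randrange(n_res) for _ in range(n_tasks)] for _ in range(pop_size)]
    best = max(pop, key=lambda c: fitness(c, W))
    n = 0
    while (target is None or fitness(best, W) < target) and n < nb_max_loop:
        # Selection: keep the select_rate fraction of best solutions as parents.
        pop.sort(key=lambda c: fitness(c, W), reverse=True)
        parents = pop[:max(2, int(select_rate * len(pop)))]
        children = [best[:]]  # elitism: the best solution found so far is kept
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each gene comes from one of the two parents.
            child = [a[i] if rng.random() < 0.5 else b[i] for i in range(n_tasks)]
            # Mutation: reassign each gene to a random resource with probability mutation_rate.
            for i in range(n_tasks):
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(n_res)
            children.append(child)
        pop = children  # the new population replaces the old one
        best = max(pop, key=lambda c: fitness(c, W))
        n += 1
    return best

W = [[3.0, 5.0], [2.0, 4.0], [4.0, 1.0], [6.0, 2.0], [1.0, 1.0]]
init = [[0, 0, 1, 1, 1]] + [[1] * 5 for _ in range(9)]
best = ga(W, init_pop=init)
print(best, fitness(best, W))
```

Because the best solution is always copied into the next population, the returned fitness can never be worse than the best solution of the initial population, which is what makes seeding with another algorithm's output safe.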
Work on Algorithms: PSO/DPSO
- Generate a random population of solutions (or retrieve a population generated by another algorithm)
- While (the constraints are not met && n < NbMaxLoop):
  - for each solution $i$:
    - compute the velocity: $V_{i,n} = \omega V_{i,n-1} + \varphi_p r_p (xbest_i - X_{i,n-1}) + \varphi_g r_g (swarmbest - X_{i,n-1})$, where $r_p$ and $r_g$ are random numbers drawn uniformly in $[0, 1]$
    - update the position: $X_{i,n} = X_{i,n-1} + V_{i,n}$
    - if $fitness(X_{i,n}) > fitness(xbest_i)$, then $xbest_i \leftarrow X_{i,n}$
    - if $fitness(X_{i,n}) > fitness(swarmbest)$, then $swarmbest \leftarrow X_{i,n}$
  - end for
  - n = n + 1
- End loop
Work to be done:
- choice of the fitness function model for multi-objective optimization: penalties, Pareto front, weighted sum of the fitness functions of the different parameters, …
- choice of a discrete model for PSO: overloading the +, × and − operators and adapting the algorithm to the solution model;
- choice of the parameters $\omega$, $\varphi_p$, $\varphi_g$.
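The velocity and position updates above, in their continuous form, are sketched below for a maximization problem. The discrete model (overloaded operators) is still open, so this sketch works on real-valued vectors and is exercised on a toy objective (minus the sphere function). The default $\omega$, $\varphi_p$, $\varphi_g$ are commonly used constriction-derived values, not values chosen in this work.

```python
import random

def pso(fitness, dim, bounds, n_particles=30, nb_max_loop=200,
        omega=0.7298, phi_p=1.49618, phi_g=1.49618, seed=0):
    """Continuous PSO that maximizes `fitness`, following the update rules above."""
    rng = random.Random(seed)
    lo, hi = bounds
    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    xbest = [x[:] for x in X]
    xbest_fit = [fitness(x) for x in X]
    g = max(range(n_particles), key=lambda i: xbest_fit[i])
    swarmbest, swarmbest_fit = xbest[g][:], xbest_fit[g]
    for _ in range(nb_max_loop):
        for i in range(n_particles):
            for d in range(dim):
                r_p, r_g = rng.random(), rng.random()  # uniform in [0, 1)
                V[i][d] = (omega * V[i][d]
                           + phi_p * r_p * (xbest[i][d] - X[i][d])
                           + phi_g * r_g * (swarmbest[d] - X[i][d]))
                X[i][d] += V[i][d]
            f = fitness(X[i])
            if f > xbest_fit[i]:       # compare with the particle's best fitness
                xbest[i], xbest_fit[i] = X[i][:], f
            if f > swarmbest_fit:      # compare with the swarm's best fitness
                swarmbest, swarmbest_fit = X[i][:], f
    return swarmbest, swarmbest_fit

def neg_sphere(x):
    return -sum(v * v for v in x)  # maximum 0 at the origin

best, best_fit = pso(neg_sphere, dim=2, bounds=(-5.0, 5.0))
print(best, best_fit)
```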
Work on Algorithms: Associations and Hybridization
Associations of algorithms: with many parameters to optimize, the work can be divided among different algorithms according to their performance on a given problem:
- generate an initial set of solutions with a classical algorithm such as HEFT (Heterogeneous Earliest Finish Time), then refine it with GA or PSO to optimize several criteria;
- run several GAs or PSOs in parallel, each working on a specific criterion, and merge their work through population migrations between them;
- use GA or PSO to obtain a set of candidate solutions, then refine the results with other algorithms such as the Nelder-Mead polytope method.
Hybridization:
- add a mutation factor to PSO by using some GA operators;
- add a memory to GA by using the same features as PSO;
- adapt and merge ACO (Ant Colony Optimization) and PSO algorithms.
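The idea of parallel populations exchanging solutions can be illustrated with a ring migration step, in which each island ranks its solutions with its own fitness function (its own QoS criterion). The ring topology, the number `k` of migrants, and replacing each island's worst solutions are assumptions of this sketch.

```python
def migrate(islands, fitnesses, k=1):
    """Ring migration between islands (sub-populations).

    islands[i] is a list of solutions (lists); fitnesses[i] is the fitness function
    (higher is better) of island i, so each island can optimize its own criterion.
    The k best solutions of island i-1 replace the k worst of island i.
    """
    emigrants = [sorted(pop, key=fit, reverse=True)[:k]
                 for pop, fit in zip(islands, fitnesses)]
    new_islands = []
    for i, (pop, fit) in enumerate(zip(islands, fitnesses)):
        survivors = sorted(pop, key=fit, reverse=True)[:len(pop) - k]
        incoming = [s[:] for s in emigrants[i - 1]]  # i - 1 == -1 wraps to the last island
        new_islands.append(survivors + incoming)
    return new_islands

# Two toy islands: one maximizes the sum, the other minimizes the first gene.
islands = [[[1], [5], [3]], [[2], [0], [9]]]
fitnesses = [sum, lambda s: -s[0]]
print(migrate(islands, fitnesses))  # [[[5], [3], [0]], [[0], [2], [5]]]
```

In a full run, each island would execute a few GA or PSO iterations between two calls to `migrate`.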
Ongoing Work: Simulation Environment
A C++ application with a Qt GUI will be used to simulate workflow scheduling operations and to test several meta-heuristics. The application will include:
- a basic implementation of the workflow data structures;
- implementations of several heuristics and meta-heuristics;
- interfaces to set up the algorithms, show and modify workflow parameters, choose appropriate QoS constraints, and display results.
The task graph, as well as the resource graph and its characteristics, will be represented as matrices. The reliability of the software will be tested by comparing the results of already implemented algorithms with those obtained by other universities.
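The matrix representation can be sketched with plain nested lists. The names `D` (data volumes between tasks), `W` (execution time of each task on each resource) and `S` (transfer speeds between resources), as well as the sample values, are hypothetical; the project's C++ structures may differ.

```python
# Hypothetical 4-task workflow (T1 -> T2, T1 -> T3, T2 -> T4, T3 -> T4), 0-based indices.
# D[j][k] > 0 is the data volume F_{j,k} sent from task j to task k; 0 means no edge.
D = [
    [0.0, 10.0, 5.0, 0.0],
    [0.0, 0.0, 0.0, 8.0],
    [0.0, 0.0, 0.0, 2.0],
    [0.0, 0.0, 0.0, 0.0],
]
# W[t][r]: nominal execution time of task t on resource r.
W = [[3.0, 5.0], [2.0, 4.0], [4.0, 1.0], [6.0, 2.0]]
# S[a][b]: transfer speed between resources a and b (diagonal unused).
S = [[0.0, 2.0], [2.0, 0.0]]

def parents(D, k):
    return [j for j in range(len(D)) if D[j][k] > 0]

def children(D, j):
    return [k for k in range(len(D)) if D[j][k] > 0]

def entry_tasks(D):
    """Tasks without parents: they can start immediately."""
    return [t for t in range(len(D)) if not parents(D, t)]

def exit_tasks(D):
    """Tasks without children: with positive execution times, the last task to finish is one of them."""
    return [t for t in range(len(D)) if not children(D, t)]

def check_shapes(D, W, S):
    """Raise ValueError if the matrices do not describe n tasks on m resources."""
    n, m = len(D), len(S)
    if any(len(row) != n for row in D):
        raise ValueError("D must be n x n")
    if len(W) != n or any(len(row) != m for row in W):
        raise ValueError("W must be n x m")
    if any(len(row) != m for row in S):
        raise ValueError("S must be m x m")

check_shapes(D, W, S)
print(parents(D, 3), entry_tasks(D), exit_tasks(D))  # [1, 2] [0] [3]
```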
Ongoing Work: Current State of the Test Program
Work done:
- basic interfaces: implemented;
- scheduler: implemented;
- currently implemented and stable algorithms: HEFT, NGAII;
- current GA crossover methods: random shuffle, double-point crossover, elitism-based crossover;
- current fitness function models: penalties, weighted QoS;
- implemented QoS variables: time, energy, reliability. The cost model is not yet clearly defined.
To be done:
- implementation of the missing QoS;
- interfaces for complex combinations of algorithms;
- multithreading of the application;
- implementation of the missing algorithms: PSO and variants;
- documentation.
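Two of the listed building blocks, double-point crossover and a penalty-based fitness built on a weighted QoS sum, can be sketched as follows. The penalty coefficient, the sign convention (negative weights for QoS to be minimized) and the constraint format are assumptions of this sketch, not the project's settings.

```python
import random

def double_point_crossover(a, b, rng):
    """Swap the gene segment [i, j) between two parent chromosomes."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

def weighted_fitness(qos, weights):
    """Weighted sum of QoS values; higher is better, so QoS to be
    minimized (time, energy) get negative weights."""
    return sum(w * qos[name] for name, w in weights.items())

def penalized_fitness(qos, weights, constraints, penalty=1000.0):
    """Weighted sum minus penalty * violation for each constraint.
    constraints maps a QoS name to (bound, "max") or (bound, "min")."""
    f = weighted_fitness(qos, weights)
    for name, (bound, kind) in constraints.items():
        violation = qos[name] - bound if kind == "max" else bound - qos[name]
        f -= penalty * max(0.0, violation)
    return f

rng = random.Random(1)
c1, c2 = double_point_crossover([0] * 6, [1] * 6, rng)
weights = {"time": -1.0, "reliability": 10.0}
deadline = {"time": (10.0, "max")}
print(c1, c2)
print(penalized_fitness({"time": 8.0, "reliability": 0.5}, weights, deadline))   # -3.0
print(penalized_fitness({"time": 12.0, "reliability": 0.5}, weights, deadline))  # -2007.0
```

A large penalty makes any deadline violation dominate the weighted sum, so feasible solutions are ranked first without discarding infeasible ones outright.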
Ongoing Work: Current Tests
Comparison of the performance of different GA versions for different workflow sizes.
Analysis process:
- generation of a pool of solutions with the HEFT algorithm;
- improvement of the set using the NGAII algorithm, with population size 100, mutation rate 5%, selection rate 50% and NbMaxLoop 25.
The process is repeated several times to obtain an average result. HEFT can be used as a reference to compare the performance of the different algorithms.
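The first step of the analysis process, building a pool from HEFT, can be sketched as below. This is a simplified HEFT (upward ranks, then earliest-finish-time assignment without insertion into idle gaps, with the mean link speed used for average transfer costs). How the project builds its pool from HEFT is not stated, so filling it with copies of the HEFT solution mutated at the 5% rate used in the tests is an assumption. The returned makespan can serve as the reference value.

```python
import random

def heft(W, D, S):
    """Simplified HEFT (non-insertion variant). W[t][r]: execution time,
    D[j][k]: data volume (0 = no edge), S[a][b]: transfer speed, at least 2 resources.
    Returns (resource of each task, makespan)."""
    n, m = len(W), len(W[0])
    links = [S[a][b] for a in range(m) for b in range(m) if a != b]
    mean_speed = sum(links) / len(links)
    mean_w = [sum(row) / m for row in W]
    children = [[k for k in range(n) if D[j][k] > 0] for j in range(n)]
    parents = [[j for j in range(n) if D[j][k] > 0] for k in range(n)]
    rank = [None] * n

    def upward_rank(t):
        # rank_u(t) = mean exec time + max over children (mean transfer time + rank_u(child))
        if rank[t] is None:
            rank[t] = mean_w[t] + max(
                (D[t][c] / mean_speed + upward_rank(c) for c in children[t]), default=0.0)
        return rank[t]

    # With positive execution times, parents always rank above their children.
    order = sorted(range(n), key=upward_rank, reverse=True)
    assign, finish, free_at = [None] * n, [0.0] * n, [0.0] * m
    for t in order:
        best_eft, best_r = None, None
        for r in range(m):
            ready = max((finish[j] + (0.0 if assign[j] == r else D[j][t] / S[assign[j]][r])
                         for j in parents[t]), default=0.0)
            eft = max(ready, free_at[r]) + W[t][r]  # earliest finish time on r
            if best_eft is None or eft < best_eft:
                best_eft, best_r = eft, r
        assign[t], finish[t] = best_r, best_eft
        free_at[best_r] = best_eft
    return assign, max(finish)

def seed_pool(chrom, m, pop_size=100, mutation_rate=0.05, seed=0):
    """Hypothetical pool construction: the HEFT solution plus mutated copies of it."""
    rng = random.Random(seed)
    pool = [chrom[:]]
    while len(pool) < pop_size:
        pool.append([rng.randrange(m) if rng.random() < mutation_rate else r for r in chrom])
    return pool

W = [[3.0, 5.0], [2.0, 4.0], [4.0, 1.0], [6.0, 2.0]]
D = [[0, 10, 5, 0], [0, 0, 0, 8], [0, 0, 0, 2], [0, 0, 0, 0]]
S = [[0.0, 2.0], [2.0, 0.0]]
assign, reference = heft(W, D, S)
pool = seed_pool(assign, m=2)
print(assign, reference, len(pool))  # [0, 0, 1, 1] 11.0 100
```

Averaging over repeated runs then amounts to calling the improvement algorithm on `pool` with different seeds and comparing the mean makespan with `reference`.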
Questions?