Published by Donald Neil Allison. Modified over 9 years ago.
1
OPTIMAL SERVICE TASK PARTITION AND DISTRIBUTION IN GRID SYSTEM WITH STAR TOPOLOGY
Gregory Levitin, Yuan-Shun Dai
Adviser: Frank, Yeong-Sung Lin
Presented by Sean Chou
2
AGENDA
1. Introduction
2. The model
3. Algorithm for determining the pmf of the service time
4. Numerical example
5. Conclusions
4
INTRODUCTION
Grid computing is a newly developed technology for complex systems with large-scale resource sharing, wide-area communication, and multi-institutional collaboration [1]. This kind of sharing is required by a range of collaborative problem-solving and resource-brokering strategies emerging in industry, science, and engineering.
5
INTRODUCTION
The sharing is controlled by a resource management system (RMS) [2].
When the RMS receives a service request from a user, the task can be divided into a set of execution blocks (EBs) that are executed in parallel.
The RMS assigns those EBs to available resources for execution.
After the resources finish the assigned jobs, they return the results to the RMS.
6
INTRODUCTION
The above grid service process can be approximated by a structure with star topology.
7
INTRODUCTION
The performance of grid computing is of great concern. The usual measure of grid performance is the task execution time (service time). This index can be significantly improved by an RMS that divides a task into a set of EBs which can be executed in parallel by multiple online resources. Many complicated and time-consuming tasks that could not be carried out before now run well in the grid computing environment.
8
INTRODUCTION
The service time is a random variable affected by many factors [3].
1. There are many resources available online that have different task processing speeds.
2. Some resources can fail when running the jobs.
3. The communication links in the grid service can fail during data transmission.
4. The choice of the group of subtasks assigned to the same EB and running on the same resource can influence the total amount of data transmitted between the RMS and the resource, since different subtasks can use common input data blocks.
9
INTRODUCTION
Most previous research treated performance and reliability as two different fields and studied them individually. In fact, performance and reliability are closely related and affect each other, particularly in grid computing.
10
INTRODUCTION
For example, when a task is fully parallelized into n different EBs executed by n resources simultaneously, the performance is high but the reliability can be low, because the failure of any resource leaves the entire task incomplete. Therefore, it is worth having some redundant resources execute the same EB, especially for failure-prone resources. However, too much redundancy, even though it improves the reliability, can decrease the performance because the task is no longer fully parallelized.
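The trade-off described above can be illustrated with a small sketch. It assumes, purely for illustration, that every resource completes its EB with the same fixed probability (0.9); in the model presented later this probability depends on the EB execution time, so the numbers below are not taken from the paper. The function name `task_reliability` is hypothetical.

```python
def task_reliability(replica_success_probs):
    """Probability that every EB is completed by at least one of its resources.

    replica_success_probs: one list per EB holding the (illustrative)
    success probabilities of the resources assigned to that EB.
    Resource failures are assumed to be independent.
    """
    reliability = 1.0
    for probs in replica_success_probs:
        all_fail = 1.0
        for p in probs:
            all_fail *= (1.0 - p)          # every replica of this EB fails
        reliability *= (1.0 - all_fail)    # at least one replica succeeds
    return reliability

# Four resources, each succeeding with probability 0.9 (illustrative value).
fully_parallel = task_reliability([[0.9], [0.9], [0.9], [0.9]])  # 4 EBs, no redundancy
with_redundancy = task_reliability([[0.9, 0.9], [0.9, 0.9]])     # 2 EBs, 2 resources each
```

With these illustrative numbers the redundant layout is more reliable, but each EB is now twice as large, so a successful run takes longer.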
11
INTRODUCTION
Performance and reliability should be studied together in grid service analysis. The first model for evaluating the performance (service time) of a grid with star topology while taking the service reliability into account was presented in [4].
12
INTRODUCTION
Optimizing the division of a service task into EBs and the distribution of these EBs among the available grid resources can considerably improve the service performance. This paper presents an algorithm for solving these optimization problems based on the model developed in [4].
14
THE MODEL
2.1. Service execution by the grid system with star architecture
2.2. Assumptions
2.3. Service execution time
2.4. Service reliability and expected performance
15
THE MODEL
Service execution by the grid system with star architecture
Different resources are distributed in the grid system. The considered service can use a given set of resources. All the resources and communication channels from this set are available at the time when the request for service arrives at the RMS.
16
THE MODEL
Each resource is directly connected to the RMS by a single communication channel, forming the star topology.
17
THE MODEL
The service task consists of subtasks that can be independently executed by different resources.
Different subtasks may need some common input data blocks for their execution.
The subtasks can be grouped into EBs.
The input data for any EB consists of the input data blocks necessary for executing all the subtasks belonging to this EB.
18
THE MODEL
The request for service (task execution) arrives at the RMS, which forms the EBs and assigns them to different resources for processing.
Each resource gets no more than one EB for processing.
The same EB can be assigned to several resources for parallel execution.
If the same EB is processed by several resources, it is completed when the first output is returned to the RMS.
The entire task is completed when all of the EBs are completed and their results are returned to the RMS from the resources.
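The completion rules above amount to a minimum over the resources that share an EB, followed by a maximum over all EBs. A minimal sketch, assuming a failed resource is represented by `math.inf` (a convention chosen here for illustration); the function name `task_completion_time` is hypothetical.

```python
import math

def task_completion_time(times_by_eb):
    """times_by_eb[i] lists the completion times reported for EB i,
    one per assigned resource; math.inf marks a resource that failed."""
    eb_times = [min(times) for times in times_by_eb]  # first returned output wins
    return max(eb_times)                              # task waits for the slowest EB

# Three EBs; the first has one failed replica, the second has two live replicas.
example = task_completion_time([[12.0, math.inf], [20.0, 15.0], [9.0]])
# An EB whose replicas all fail makes the whole task incomplete.
failed = task_completion_time([[12.0], [math.inf, math.inf]])
```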
19
THE MODEL
Assumptions
Each resource starts processing the assigned EB immediately after it gets all the necessary input data from the RMS through the corresponding communication channel.
Each resource sends the output data to the RMS through the same communication channel immediately after it completes the EB.
Each resource has a given constant processing speed when it is available.
Each resource has a given constant failure rate.
20
THE MODEL
Each communication channel has a constant data transmission speed (bandwidth) when it is available.
Each communication channel has a constant failure rate.
The subtasks belonging to an EB are processed in sequence.
The subtask processing time is proportional to its computational complexity.
The data transmission time is proportional to the amount of data transmitted between the RMS and a resource.
21
THE MODEL
The failure rates of the communication channels and resources are the same whether they are idle or loaded (hot standby model).
The failures at different resources and communication channels are independent.
The RMS is fully reliable.
The time of task processing by the RMS (formation and assignment of EBs, sending them to the resources, receiving the results and integrating them into the entire task output) is negligible compared with the EBs' processing time.
22
THE MODEL
Service execution time
The entire task consists of m subtasks that can be executed independently.
Any EB i consists of a set $s_i$ of subtasks.
The EB's computational complexity is the sum of the computational complexities of the subtasks in $s_i$.
23
THE MODEL
Each subtask j needs a set $B_j$ of data blocks as its input and produces an amount $O_j$ of output data.
The set of input data blocks necessary for the execution of EB i is $\bigcup_{j \in s_i} B_j$.
The amount of data to be transmitted from the RMS to the resource executing this EB is the total size of the data blocks in this set.
24
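THE MODEL
The total amount of data (input and output) $D_i$ that should be transmitted between the RMS and a resource executing EB i is the total size of the input data blocks required by the EB plus the output amounts $O_j$ of all its subtasks.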
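The quantities on the last three slides can be sketched as follows. The point of the example is that a data block shared by several subtasks of the same EB is transmitted only once. The dictionary names (`complexity`, `input_blocks`, `block_size`, `output_size`) and all numbers are assumptions made for illustration; they are not the data of the numerical example.

```python
def eb_complexity(subtasks, complexity):
    """Computational complexity of an EB: sum over its subtasks."""
    return sum(complexity[j] for j in subtasks)

def eb_data_volume(subtasks, input_blocks, block_size, output_size):
    """Total data exchanged for an EB: each needed input block once, plus all outputs."""
    needed = set().union(*(input_blocks[j] for j in subtasks))  # union of the B_j
    data_in = sum(block_size[b] for b in needed)
    data_out = sum(output_size[j] for j in subtasks)
    return data_in + data_out

# Illustrative data: subtasks 1 and 2 share input block "b".
complexity = {1: 50, 2: 70}
input_blocks = {1: {"a", "b"}, 2: {"b", "c"}}
block_size = {"a": 10, "b": 20, "c": 5}
output_size = {1: 3, 2: 4}

c_grouped = eb_complexity([1, 2], complexity)
d_grouped = eb_data_volume([1, 2], input_blocks, block_size, output_size)
d_separate = (eb_data_volume([1], input_blocks, block_size, output_size)
              + eb_data_volume([2], input_blocks, block_size, output_size))
```

Grouping the two subtasks into one EB transmits less data in total than running them as separate EBs, because block "b" is sent once instead of twice.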
25
THE MODEL
The EB execution time is defined as the time from the beginning of input data transmission from the RMS to a resource to the end of output data transmission from the resource to the RMS.
Therefore, the random time $t_{ij}$ of EB i completion by resource j can take two possible values: a finite value, the sum of the processing time and the data transmission time, if resource j and communication channel j do not fail until the EB is completed, and infinity otherwise.
26
THE MODEL
EB i can be successfully completed by resource j if this resource and communication link j do not fail before the end of the EB execution. For constant failure rates of resource j and communication link j, the probability of EB success is the exponential of the negated product of the summed failure rates and the failure-free EB execution time.
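The two statements above can be written out as follows. The equations on the original slides were not preserved, so this is a reconstruction from the stated assumptions (constant speeds, constant and independent failure rates), and the symbols $C_i$ (EB complexity), $x_j$ (processing speed), $w_j$ (bandwidth), $\lambda_j$ (resource failure rate), $\pi_j$ (channel failure rate) and $\hat t_{ij}$ (failure-free execution time) are notation assumed here, not necessarily the authors' own.

```latex
\hat t_{ij} = \frac{C_i}{x_j} + \frac{D_i}{w_j},
\qquad
p_{ij} = e^{-(\lambda_j + \pi_j)\,\hat t_{ij}},
\qquad
t_{ij} =
\begin{cases}
\hat t_{ij} & \text{with probability } p_{ij},\\
\infty      & \text{with probability } 1 - p_{ij}.
\end{cases}
```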
27
THE MODEL
Assume that each EB i is assigned to the resources composing set $\omega_i$ such that $\omega_i \cap \omega_j = \emptyset$ for any $i \neq j$.
The random time of EB i completion is the minimum of the completion times over the resources in $\omega_i$.
The entire task is completed when all of the EBs (including the slowest one) are completed, so the random task execution time is the maximum of the EB completion times.
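In symbols, and assuming the notation $T_i$ for the completion time of EB i, $\Theta$ for the task execution time and h for the number of EBs, the two random times described on this slide can be sketched as:

```latex
T_i = \min_{j \in \omega_i} t_{ij},
\qquad
\Theta = \max_{1 \le i \le h} T_i .
```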
28
THE MODEL
Service reliability and expected performance
In order to estimate both the service reliability and performance of a grid system, different measures can be used depending on the application. The system reliability $R(\theta^*)$ is defined (according to the performability concept [5,6]) as the probability that the correct output is produced in time less than $\theta^*$.
29
THE MODEL
The service reliability is defined as the probability that the service produces correct outputs regardless of the service time. This index can be referred to as $R(\infty)$.
The conditional expected service time W is considered to be a measure of its performance.
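If the pmf of the task execution time is written as pairs $(\theta_k, q_k)$, $k = 1, \dots, K$ (notation assumed here), the three indices can be sketched as below. W is read as the expected service time given that the service does not fail; this reading is an interpretation of "conditional", since the original equations were not preserved.

```latex
R(\theta^*) = \sum_{k=1}^{K} q_k \,\mathbf{1}(\theta_k < \theta^*),
\qquad
R(\infty) = \sum_{k:\,\theta_k < \infty} q_k,
\qquad
W = \frac{\sum_{k:\,\theta_k < \infty} \theta_k\, q_k}{R(\infty)} .
```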
30
THE MODEL
The service task partition into EBs (represented by the sets $s_i$, $1 \le i \le h$) and the distribution of the EBs among the resources (represented by the sets $\omega_i$, $1 \le i \le h$) determine the service reliability and performance.
Two optimization problems follow from this, referred to later as formulations (9) and (10).
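The formulations themselves were not preserved in the slide text. A plausible reading, consistent with the later statement that formulation (9) is solved for different allowed service times, is sketched below. The second problem, including the required reliability level $R^*$, is an assumption and may differ from the authors' formulation (10).

```latex
\text{(9)}\quad \max_{\{s_i\},\{\omega_i\}} \; R(\theta^*) \quad \text{for a given } \theta^*,
\qquad\qquad
\text{(10)}\quad \min_{\{s_i\},\{\omega_i\}} \; W \quad \text{subject to } R(\infty) \ge R^* .
```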
32
ALGORITHM FOR DETERMINING THE PMF OF THE SERVICE TIME
The procedure used for the evaluation of the service time distribution is based on the universal generating function (u-function) technique.
Its high computational efficiency allows it to be used in optimization procedures where a large number of different solutions must be evaluated.
33
ALGORITHM FOR DETERMINING THE PMF OF THE SERVICE TIME
The u-function $u_{i,\{j\}}(z)$ can define the pmf of the total completion time $t_{ij}$ for EB i assigned to resource j. This u-function is a two-term polynomial in z, with one term for successful completion and one for failure.
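A sketch of this u-function, assuming the notation $\hat t_{ij}$ for the failure-free completion time and $p_{ij}$ for the success probability of EB i on resource j (both symbols are assumptions, since the original equation was not preserved): each term pairs a probability (the coefficient) with a possible completion time (the exponent of z).

```latex
u_{i,\{j\}}(z) = p_{ij}\, z^{\hat t_{ij}} + (1 - p_{ij})\, z^{\infty} .
```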
34
ALGORITHM FOR DETERMINING THE PMF OF THE SERVICE TIME
The total completion time of EB i assigned to a pair of resources k and j is equal to the minimum of the completion times for the two resources.
To obtain the u-function representing the pmf of this time, a composition operator with the minimum function should be used.
35
ALGORITHM FOR DETERMINING THE PMF OF THE SERVICE TIME
The u-function representing the pmf of the completion time of EB i assigned to all of the resources from set $\omega_i$ can be obtained recursively, by applying the composition operator to one resource at a time.
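The composition step can be sketched by storing each u-function as a dictionary that maps a completion time to its probability, with `math.inf` standing for failure. This representation and the names `compose` and `eb_pmf` are choices made for this illustration, and the probabilities are made up.

```python
import math
from functools import reduce

def compose(u1, u2, op):
    """Combine the pmfs of two independent times under the function op.

    Each pmf is a dict mapping time -> probability. Terms that lead to
    the same resulting time are collected into one entry.
    """
    result = {}
    for t1, p1 in u1.items():
        for t2, p2 in u2.items():
            t = op(t1, t2)
            result[t] = result.get(t, 0.0) + p1 * p2
    return result

def eb_pmf(single_resource_pmfs):
    """pmf of an EB's completion time: fold the min-composition over its resources."""
    return reduce(lambda a, b: compose(a, b, min), single_resource_pmfs)

# Illustrative u-functions of one EB on two resources k and j.
u_k = {10.0: 0.9, math.inf: 0.1}
u_j = {14.0: 0.8, math.inf: 0.2}
u_both = eb_pmf([u_k, u_j])
```

The EB finishes at time 10 whenever resource k succeeds, at time 14 only if k fails and j succeeds, and never if both fail.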
36
ALGORITHM FOR DETERMINING THE PMF OF THE SERVICE TIME
Having the u-functions $u_{i,\omega_i}(z)$ for each EB i ($1 \le i \le h$), one can obtain the u-function representing the pmf of the entire task completion time $\Theta$ by composing them with the maximum function.
37
ALGORITHM FOR DETERMINING THE PMF OF THE SERVICE TIME
The final u-function $U_h(z)$ represents the pmf of the random task completion time $\Theta$ in the form of a sum of terms, each pairing a possible completion time with its probability.
38
ALGORITHM FOR DETERMINING THE PMF OF THE SERVICE TIME
Algorithm for determining service performance/reliability indices for an arbitrary task partition and distribution
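The algorithm steps were not preserved in the slide text. The sketch below strings together the pieces described on the preceding slides: one pmf per EB and resource, a min-composition per EB, a max-composition over EBs, and the three indices. The function names, the data layout and all numbers are assumptions for illustration, not the authors' procedure or data.

```python
import math
from functools import reduce

def compose(u1, u2, op):
    """Combine two pmfs (dict time -> probability) of independent times under op."""
    result = {}
    for t1, p1 in u1.items():
        for t2, p2 in u2.items():
            t = op(t1, t2)
            result[t] = result.get(t, 0.0) + p1 * p2
    return result

def resource_pmf(C, D, speed, bandwidth, res_rate, link_rate):
    """Two-valued pmf of an EB (complexity C, data D) on a single resource."""
    t_hat = C / speed + D / bandwidth              # processing + transmission time
    p = math.exp(-(res_rate + link_rate) * t_hat)  # neither resource nor link fails
    return {t_hat: p, math.inf: 1.0 - p}

def service_indices(ebs, resources, assignment, theta_star):
    """Return (pmf of task time, R(theta*), R(inf), W).

    ebs:        list of (C_i, D_i) pairs, one per EB
    resources:  dict j -> (speed, bandwidth, resource rate, link rate)
    assignment: list of resource-id lists, one per EB (disjoint, non-empty)
    """
    eb_pmfs = []
    for (C, D), omega in zip(ebs, assignment):
        singles = [resource_pmf(C, D, *resources[j]) for j in omega]
        eb_pmfs.append(reduce(lambda a, b: compose(a, b, min), singles))
    task = reduce(lambda a, b: compose(a, b, max), eb_pmfs)
    r_theta = sum(q for t, q in task.items() if t < theta_star)
    r_inf = sum(q for t, q in task.items() if t < math.inf)
    w = (sum(t * q for t, q in task.items() if t < math.inf) / r_inf
         if r_inf > 0 else math.inf)
    return task, r_theta, r_inf, w

# Illustrative data: two EBs, the first one duplicated on resources 0 and 1.
ebs = [(10.0, 4.0), (6.0, 2.0)]
resources = {0: (2.0, 1.0, 0.010, 0.005),
             1: (1.0, 2.0, 0.020, 0.005),
             2: (3.0, 1.0, 0.010, 0.010)}
task_pmf, r_theta, r_inf, w = service_indices(ebs, resources, [[0, 1], [2]], theta_star=10.0)
```

Because every candidate partition and distribution is evaluated with a few dictionary merges, a search procedure can afford to call such a function many times.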
40
NUMERICAL EXAMPLE
Formulations (9) and (10) define a complicated NP-complete partitioning/allocation problem. An exhaustive examination of all possible solutions is not realistic within reasonable time limits.
41
NUMERICAL EXAMPLE
A heuristic search algorithm is needed that uses only estimates of solution quality and does not require derivative information to determine the next direction of the search.
The genetic algorithm (GA) has been proven to be an effective optimization tool for a large number of complicated problems in reliability engineering [10,11].
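A GA only needs a way to turn a candidate string into a solution whose quality can be estimated. One possible encoding, sketched below, is an integer string with one gene per subtask followed by one gene per resource. This encoding and the function `decode` are hypothetical illustrations and are not claimed to be the representation used by the authors.

```python
def decode(chromosome, m, n):
    """Split an integer string of length m + n into a partition and a distribution.

    chromosome[:m]  -> EB label chosen for each of the m subtasks
    chromosome[m:]  -> EB label chosen for each of the n resources
    Labels that receive no subtasks are dropped; resources pointing at a
    dropped label stay idle. Each resource carries one gene, so it can
    serve at most one EB.
    """
    subtask_genes, resource_genes = chromosome[:m], chromosome[m:]
    labels = sorted(set(subtask_genes))
    partition = [[j for j in range(m) if subtask_genes[j] == lab] for lab in labels]
    distribution = [[r for r in range(n) if resource_genes[r] == lab] for lab in labels]
    return partition, distribution

# Five subtasks and three resources (illustrative sizes).
partition, distribution = decode([0, 0, 2, 2, 2, 0, 1, 2], m=5, n=3)
```

Here subtasks 0 and 1 form one EB served by resource 0, subtasks 2 to 4 form another EB served by resource 2, and resource 1 stays idle. A fitness function would then evaluate the decoded solution, for example by its reliability for the allowed service time.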
42
NUMERICAL EXAMPLE
Consider a grid service that uses six resources distributed in the grid system.
43
NUMERICAL EXAMPLE
The entire service task can be divided into eight independent subtasks.
44
NUMERICAL EXAMPLE
The amount of data in each input data block is presented in Table 4.
45
NUMERICAL EXAMPLE
First, the optimal task partition and distribution problem was solved by the GA for formulation (9). The solutions for different values of the allowed service time $\theta^*$ are presented in Tables 5 and 6.
46
NUMERICAL EXAMPLE
Table 5 contains the obtained task partitions into EBs and their distribution among the resources.
47
NUMERICAL EXAMPLE
Table 6 contains the minimal and maximal possible service times, the service reliability and the conditional expected service time for each obtained solution.
48
NUMERICAL EXAMPLE
The functions $R(\theta^*)$ for the obtained solutions are presented in Fig. 2. It can be seen that the best solution obtained for a certain $\theta^*$ provides the greatest reliability for this value of the service time, whereas for other values of $\theta^*$ it provides lower reliability than the solutions obtained for those values.
52
CONCLUSIONS
Grid technology is a newly developed method for large-scale distributed systems.
This technology allows effective distribution of computational tasks among the different resources present in the grid.
The resource management system (RMS) can divide a service task into subtasks and send the subtasks to different resources for parallel execution.
53
CONCLUSIONS
For any given service task, the service reliability and performance indices depend on the task partition into EBs and their distribution among the available resources.
The suggested optimization algorithm aims to achieve the greatest reliability/performance through the optimal task partition and distribution.
54
CONCLUSIONS
Most previous research treated performance and reliability as two different fields and studied them individually.
In fact, performance and reliability are closely related and affect each other, particularly in grid computing.
This paper presents an algorithm for solving the task partition and distribution problems, based on evaluating the performance (service time) of a grid with star topology while taking the service reliability into account.
55
Thanks for listening.