Task Scheduling in Distributed Data Mining for Medical Applications
Rex E. Gantenbein, University of Wyoming, Laramie WY
Chang-Oan Sung, Indiana University Southeast, New Albany IN
Gantenbein & Sung, CAINE 2003

Data mining & medical applications
Data mining: exploration of large data sets looking for patterns or relationships.
Can be done centrally or across a distributed framework.
The latter approach looks for previously unknown patterns among dispersed data sets.
Valuable in medical applications (epidemiology, clinical research, etc.).

Task scheduling & data mining
Performance is an issue with data mining.
Scheduling algorithms determine which processors on the network execute a particular task.
Scheduling may be static (predetermined task-processor mapping) or dynamic (look for free processors at time of task creation).
Distributed data mining typically uses a simple dynamic algorithm.
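The difference between static and dynamic scheduling can be illustrated with a minimal sketch. The processor names, task names, and the "first free processor" policy are assumptions made for illustration; the slides do not specify them.

```python
from typing import Dict, List, Optional, Set

# Hypothetical processors and a hypothetical predetermined mapping.
PROCESSORS: List[str] = ["P0", "P1"]
STATIC_MAP: Dict[str, str] = {"t1": "P0", "t2": "P1", "t3": "P0"}


def static_schedule(task: str) -> str:
    """Static: the task-processor mapping is fixed before execution."""
    return STATIC_MAP[task]


def dynamic_schedule(busy: Set[str]) -> Optional[str]:
    """Dynamic: at task creation, pick the first processor that is free.

    Returns None when every processor is busy.
    """
    for p in PROCESSORS:
        if p not in busy:
            return p
    return None


static_choice = static_schedule("t3")     # always the premapped processor
dynamic_choice = dynamic_schedule({"P0"})  # depends on who is free right now
```

The static choice ignores the current load, while the dynamic choice ignores which data a processor already holds; the resource-aware approach discussed next addresses the second limitation.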

Task scheduling & data mining
Resource-aware dynamic scheduling requires knowledge of the resources available at each processor in the network.
Resources can be estimated from a list of known resources (such as data sets) and the tasks previously executed by a processor.
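One way to picture this estimate is as a set union, sketched below. It assumes that a processor keeps every resource it acquired for an earlier task (no eviction), which the slides do not state; all names and data are hypothetical.

```python
from typing import Dict, Iterable, Set


def estimate_held(
    known: Iterable[str],
    executed: Iterable[str],
    requirements: Dict[str, Set[str]],
) -> Set[str]:
    """Estimate the resources held by a processor.

    known:        resources known to reside on the processor (e.g. data sets)
    executed:     tasks the processor has already run
    requirements: resources each task needs (assumed retained after the run)
    """
    held = set(known)
    for task in executed:
        held |= requirements[task]
    return held


# Hypothetical example: one local data set plus what two past tasks fetched.
requirements = {"t1": {"age", "lab_panel"}, "t2": {"radiograph"}}
held = estimate_held({"patients_wy"}, ["t1", "t2"], requirements)
```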

Resource-aware scheduling
Parthasarathy and Subramonian proposed an algorithm for scheduling distributed data mining tasks.
Tasks are distributed based on the resources each task requires.
Tasks are monitored interactively.
Good for simple distributed data mining tasks.

Resource-aware scheduling
The PS algorithm assumes that all resources are of equal size and “value”.
This may not be true in complex applications or with varying resource types.
Resources may differ significantly in size or importance (age vs. radiograph, for example).

Improved scheduling algorithm
Need modifications to account for varying resource types.
Improve the usefulness of the scheduling algorithm for medical applications (among others).
Avoid hampering performance.

Improved scheduling algorithm
The cost of executing a task t on processor P is defined as the sum of the lengths of the resources to be acquired before the task can execute.
Cost here represents acquisition of resources, not actual computation time.
If there is a task of zero cost for a processor, then that task is scheduled on that processor.
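The cost definition can be written as a short function. This is a sketch under stated assumptions: resources are identified by name, "length" is an integer size in arbitrary units, and the example values are invented for illustration.

```python
from typing import Dict, Set


def task_cost(required: Set[str], held: Set[str], length: Dict[str, int]) -> int:
    """Sum of the lengths of the resources the processor must still acquire.

    This measures acquisition only, not computation time.
    """
    return sum(length[r] for r in required - held)


# Hypothetical lengths: a scalar field is tiny, an image is large.
length = {"age": 1, "lab_panel": 40, "radiograph": 5000}
held_by_p = {"age", "lab_panel"}

cost_a = task_cost({"age", "lab_panel"}, held_by_p, length)   # nothing to fetch
cost_b = task_cost({"age", "radiograph"}, held_by_p, length)  # must fetch the image
```

Here `cost_a` is zero, so under the rule above that task would be scheduled on the processor immediately.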

Improved scheduling algorithm
If the cost of a task is non-zero, then we choose the task that maximizes the profit in the system.
Gain is determined by the decrease in cost to perform all remaining tasks.
Profit is then the difference between the gain and the cost.
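A minimal sketch of the profit computation follows. It assumes that gain is measured on the same processor that runs the candidate task, and that the processor retains the resources it acquires; the slides leave both points open. Task and resource names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set

Task = FrozenSet[str]  # a task is represented by the resources it needs


def cost(task: Task, held: Set[str], length: Dict[str, int]) -> int:
    """Total length of the resources still to be acquired for the task."""
    return sum(length[r] for r in task - held)


def profit(task: Task, remaining: Iterable[Task], held: Set[str],
           length: Dict[str, int]) -> int:
    """Profit = gain - cost, where gain is the drop in cost of remaining tasks."""
    c = cost(task, held, length)
    held_after = held | task  # assumption: acquired resources are kept
    gain = sum(cost(t, held, length) - cost(t, held_after, length)
               for t in remaining)
    return gain - c


length = {"a": 1, "x": 10, "y": 4}
held = {"a"}
t1, t2, t3 = frozenset({"a", "x"}), frozenset({"x"}), frozenset({"y"})

profit_t1 = profit(t1, [t2, t3], held, length)  # fetching x also helps t2
profit_t3 = profit(t3, [t1, t2], held, length)  # fetching y helps nobody else
```

In this example `t1` costs 10 but removes 10 from the future cost of `t2`, so its profit is 0, while `t3` costs 4 and yields no gain, so its profit is -4; the scheduler would pick `t1`.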

Improved scheduling algorithm
If there are schedules that have the same profit, then we break the tie by:
Choosing the task with the minimum length of resources needed.
Choosing the task that minimizes the resource overlap among processors possessing the resources needed.
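The tie-breaking rules can be sketched as a lexicographic sort key. Two assumptions are made here: the two rules are applied in the order listed, and the overlap measure is already available as a number for each candidate (the slides do not define how it is computed). The `Candidate` record and its values are hypothetical.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    profit: int
    needed_length: int  # total length of resources the task needs
    overlap: int        # assumed precomputed overlap measure


def pick(candidates: List[Candidate]) -> Candidate:
    """Highest profit first; ties go to smaller needed length, then overlap."""
    return min(candidates,
               key=lambda c: (-c.profit, c.needed_length, c.overlap))


candidates = [
    Candidate("A", profit=5, needed_length=10, overlap=2),
    Candidate("B", profit=5, needed_length=10, overlap=1),
    Candidate("C", profit=5, needed_length=12, overlap=0),
    Candidate("D", profit=3, needed_length=1, overlap=0),
]
chosen = pick(candidates)
```

A, B, and C tie on profit; A and B tie again on needed length, so the lower overlap of B decides.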

Load balancing
Sometimes it may be better to assign a task to an idle processor, even if the assignment costs more.
Our algorithm includes a factor strictly between 0 and 1 to account for the relative increase in costs due to balancing.
Computing this value is still an open problem.
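The slides do not say how the factor enters the scheduling decision. The sketch below shows one possible reading, offered purely as an assumption: the factor is treated as the largest relative cost increase that is tolerated in exchange for using an idle processor. The function name and the decision rule are hypothetical and are not taken from the presentation.

```python
def prefer_idle(cost_best: float, cost_idle: float, factor: float) -> bool:
    """Return True if the task should go to the idle processor.

    Assumed rule: accept the idle processor when its cost exceeds the
    cheapest (busy) processor's cost by at most `factor`, relatively.
    """
    if not 0 < factor < 1:
        raise ValueError("factor must lie strictly between 0 and 1")
    return cost_idle <= (1 + factor) * cost_best


# With a factor of 0.5, a cost increase of up to 50% is tolerated.
accept = prefer_idle(cost_best=10, cost_idle=14, factor=0.5)
reject = prefer_idle(cost_best=10, cost_idle=18, factor=0.5)
```

Under this reading a larger factor favors balance over acquisition cost, and a smaller one keeps tasks close to the data they need.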

Performance evaluation
We have implemented the new algorithm on a 4-computer network.
Load-balancing factor chosen at 0.5.
Tested on four separate data sets.
In one case, the new algorithm was slightly slower than the old one.
In three of four cases, performance improved by 3-6%.

Performance evaluation
Scheduling performance (seconds): results not preserved in this transcript.

Conclusions
Resource-aware scheduling for distributed data mining must allow for varying resource size and value.
An algorithm based on length of resources improves flexibility and does not hamper performance significantly (and in some cases, improves it).

Questions?
This publication was made possible by NIH Grant P20 RR16474 from the BRIN Program of the National Center for Research Resources.
