Job Scheduling in a Grid Computing Environment

Colton Lewis

Agenda
  Last presentation: introduced grid computing
  This presentation: job scheduling techniques in detail
    Review of what grid computing is
    Job scheduling challenges in grid computing
    Case study in approaching these challenges

Components of Grid Computing
  Multiple computers
    Independently functioning hardware
    Multiple locations and/or owners
  Shared computational goal
  Distributed resources over a network
    Typically an already existing network

Benefits of Grid Computing
  Large pool of resources
    Large grids are comparable in FLOP/s to top 500 supercomputers
  Distributed costs
    Administration
    Maintenance
    Electricity
    Space
  Utilize existing infrastructure and avoid specialized hardware

Inherent Parallelism
  Large numbers of computers mean lots of possible parallelism
  Great for handling large numbers of easily separable tasks
    Easily parallelizable problems with little communication
    Signal processing, graphics and animation, search and simulation, etc.
    High volume of very similar tasks

Example Grid Project: SETI@home

Job Scheduling
  NP-hard computer science problem
    Optimality is computationally intractable in general
    Combinatorial optimization
  Grids must consider even more factors

Heterogeneous Machines
  Machines on the grid may have vastly different resources
    Dedicated clusters
    Desktop computers donating spare cycles
    Embedded devices
  Must account for this to balance load

Dynamic Network
  Resources may not be available
    Computers may be shut off, software uninstalled
  Resources may not be reliable
    Hardware errors, malicious participants returning incorrect results

General Strategies
  Know as much as possible
    Job intensity
    Client capabilities
  Use heuristics
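To make "use heuristics" concrete, here is a minimal sketch of one common greedy heuristic: hand each job, largest first, to the host that would finish it earliest. This is a generic list-scheduling idea, not something BOINC is stated to use. The job sizes, host speeds, and the function name `greedy_assign` are all hypothetical, and the host dictionary is assumed to be non-empty.

```python
def greedy_assign(job_flops, host_speeds):
    """Assign each job to the host with the earliest resulting finish time.

    job_flops:   dict of job name -> estimated work (the "job intensity")
    host_speeds: dict of host name -> speed in the same work units per hour
                 (the "client capabilities"); assumed non-empty
    """
    finish = {host: 0.0 for host in host_speeds}
    assignment = {}
    # Place the largest jobs first so small ones can fill the gaps.
    for job, flops in sorted(job_flops.items(), key=lambda kv: -kv[1]):
        best = min(finish, key=lambda h: finish[h] + flops / host_speeds[h])
        finish[best] += flops / host_speeds[best]
        assignment[job] = best
    return assignment, max(finish.values())


# Hypothetical workload: one fast host, one slow host.
jobs = {"j1": 8.0, "j2": 4.0, "j3": 4.0}
hosts = {"fast": 4.0, "slow": 2.0}
print(greedy_assign(jobs, hosts))
```

The result is not guaranteed to be optimal; the point is that it runs in polynomial time and uses exactly the two kinds of knowledge listed on the slide.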

Examining the BOINC Scheduler
  Berkeley Open Infrastructure for Network Computing
  Software behind many volunteer computing projects

Terminology
  Host – a worker machine
    May work on multiple projects
  Client – program for fetching jobs from servers
    All server communication is issued by the client
  Server – a task assignment program
  Project – a long-running computation on the grid
    May have its own server or share one
    SETI@home is a BOINC project
  Job – subtask of a project assigned to a host
  Application – program for performing a job
    Supplied by project

BOINC Host Architecture

User Preferences
  Inform many scheduling decisions
  Owner of host can specify
    Resource share of projects
    Limits on CPU, RAM, network bandwidth
    Connection interval to server(s)

Credit
  Hosts are assigned credit for jobs completed before deadline
    Based on estimated number of FLOPs
    Each project awards credit
  Provides a way to rank performance of hosts
  Points toward possible grid improvements

Host Perspective
  Each host must solve two related problems
    CPU scheduling – when to run currently assigned jobs
    Work fetch – when to ask a project for more work
  Works to maximize credit subject to constraints
    User preferences
    Hardware

Early Policies
  CPU scheduling – weighted round robin
    Each project given CPU time according to user-specified percentage
    Does not account for deadlines, may waste lots of work
  Work fetch scheduling – keep enough work for full connection interval for all projects
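A minimal sketch of the weighted round-robin idea, assuming CPU time is split once per scheduling period in proportion to the user's resource shares. The project names, shares, and function names are hypothetical, and this is an illustration of the policy rather than BOINC's actual code.

```python
def weighted_round_robin(shares, period_hours):
    """Split one scheduling period among projects in proportion to their shares."""
    total = sum(shares.values())
    return {name: period_hours * share / total for name, share in shares.items()}


def wall_hours_to_finish(cpu_hours, share_fraction):
    """Wall-clock hours a job needs when its project only gets a fraction of the CPU."""
    return cpu_hours / share_fraction


# Hypothetical shares, chosen only for illustration.
shares = {"project_a": 50, "project_b": 30, "project_c": 20}
print(weighted_round_robin(shares, period_hours=10.0))

# A 20 CPU-hour job in a project holding half the CPU needs 40 wall hours,
# so it misses any deadline shorter than that even though the CPU is never idle.
print(wall_hours_to_finish(20.0, 0.5))
```

Nothing in the split looks at deadlines, which is the weakness the slide points out.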

Example Failure
  Consider the table to the right
  Jobs complete in 250, 20, and 10 hours
  CPU is never idle, but all work is wasted

Earliest Deadline First
  Does not enforce desired resource sharing
  Projects with long jobs will starve
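A minimal sketch of earliest-deadline-first on a single CPU, assuming jobs run to completion one at a time. The `Job` fields and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    project: str
    cpu_hours: float       # estimated CPU time still needed
    deadline_hours: float  # hours from now until the deadline


def edf_order(jobs):
    """Return jobs sorted so the nearest deadline runs first."""
    return sorted(jobs, key=lambda job: job.deadline_hours)


def missed_deadlines(jobs):
    """Simulate running the jobs back to back in EDF order; return those that finish late."""
    clock = 0.0
    missed = []
    for job in edf_order(jobs):
        clock += job.cpu_hours
        if clock > job.deadline_hours:
            missed.append(job)
    return missed


# Hypothetical queue: the short-deadline job runs first and both finish on time.
queue = [Job("long_project", 5.0, 10.0), Job("short_project", 3.0, 4.0)]
print([job.project for job in edf_order(queue)])
print(missed_deadlines(queue))
```

Resource shares never appear in the ordering, so a project whose jobs always carry later deadlines keeps getting pushed back whenever shorter-deadline work arrives.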

Estimating CPU Time
  Knowing CPU time means knowing which jobs can be completed by deadline
  Project-supplied FLOPs estimate divided by host CPU benchmark
    Can be consistently wrong; real projects need memory, I/O, etc.
  Duration correction factor per project
    How much CPU time did the last job take compared to its estimate?
  CPU efficiency factor
    How does actual CPU time compare to wall time?
  Applications may periodically report percentage done
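The estimate above can be sketched as a few small formulas. This is a minimal sketch under stated assumptions: the duration correction factor is taken to be a simple ratio of actual to estimated CPU time, the CPU efficiency is taken to be CPU time divided by wall time, and all function names and numbers are hypothetical rather than BOINC's own.

```python
def estimate_cpu_seconds(flops_estimate, host_flops_per_sec, duration_correction=1.0):
    """Project-supplied FLOPs estimate divided by the host benchmark, then corrected."""
    return flops_estimate / host_flops_per_sec * duration_correction


def update_duration_correction(actual_cpu_seconds, estimated_cpu_seconds):
    """Assumed form: ratio of what a finished job really took to what was predicted."""
    return actual_cpu_seconds / estimated_cpu_seconds


def estimate_wall_seconds(cpu_seconds, cpu_efficiency):
    """Assumed form: cpu_efficiency = CPU time / wall time, so wall = CPU / efficiency."""
    return cpu_seconds / cpu_efficiency


def remaining_cpu_seconds(elapsed_cpu_seconds, fraction_done):
    """Extrapolate remaining time from the fraction done reported by the application."""
    return elapsed_cpu_seconds * (1.0 - fraction_done) / fraction_done


# Hypothetical job: 1e12 FLOPs on a host benchmarked at 1e9 FLOP/s.
raw = estimate_cpu_seconds(1e12, 1e9)
print(raw)
# If the last job took twice its estimate, later estimates are doubled.
factor = update_duration_correction(2000.0, 1000.0)
print(estimate_cpu_seconds(1e12, 1e9, factor))
```

A job is worth starting only if the estimated wall time fits before its deadline.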

Debt
  The amount of work “owed” to a project
  Long-term enforcement of resource shares while still attending to deadlines
    Short-term debt controls CPU scheduling over one connection interval
    Long-term debt controls work fetching

CPU Scheduling
  Periodically calculate debt to each project
    CPU time expected by resource sharing minus CPU time spent
    Deduct expected payoff from currently running jobs
  Run earliest-deadline job from project with most debt
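The debt calculation and job choice can be sketched as follows. This is a simplified illustration, not BOINC's implementation: it omits the step that deducts the expected payoff of currently running jobs, and the data layout and function names are hypothetical.

```python
def update_debts(debts, shares, cpu_spent, period_hours):
    """Add (CPU time expected by resource share - CPU time actually spent) to each debt."""
    total = sum(shares.values())
    updated = {}
    for project, share in shares.items():
        expected = period_hours * share / total
        updated[project] = debts.get(project, 0.0) + expected - cpu_spent.get(project, 0.0)
    return updated


def pick_next_job(debts, queued_jobs):
    """Run the earliest-deadline job from the project that is owed the most.

    queued_jobs: dict of project -> list of (deadline_hours, job_name) tuples.
    Returns (project, job_name), or None if nothing is queued.
    """
    candidates = [p for p in debts if queued_jobs.get(p)]
    if not candidates:
        return None
    project = max(candidates, key=lambda p: debts[p])
    _deadline, job_name = min(queued_jobs[project])
    return project, job_name


# Hypothetical period: equal shares, but project "a" used 3 of the 4 hours.
debts = update_debts({}, {"a": 1, "b": 1}, {"a": 3.0, "b": 1.0}, period_hours=4.0)
print(debts)
queue = {"a": [(5.0, "a1")], "b": [(9.0, "b2"), (7.0, "b1")]}
print(pick_next_job(debts, queue))
```

Project "b" ends the period with positive debt, so its earliest-deadline job is chosen next.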

Work Fetching
  Same general method as CPU scheduling
  Controls new jobs requested rather than CPU time

Server Perspective
  Must ensure correctness of results, if needed
  Must deliver reasonable jobs to hosts requesting work

BOINC Server Architecture

Credit and Redundancy
  Many jobs require error checking
  Solution: assign same job to two or more hosts
    Answers are compared by project server
    If enough hosts agree, answer is accepted
    Credit is awarded to all correct hosts
  When assigning new work, prioritize jobs waiting for an answer
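A minimal sketch of the agreement check, assuming answers can be compared for exact equality and that a fixed quorum of matching answers is required. Real projects may need a tolerance-based comparison for floating-point results. The `validate` function and host names are hypothetical.

```python
from collections import Counter


def validate(results, quorum):
    """Accept the most common answer if at least `quorum` hosts returned it.

    results: dict of host name -> returned answer (must be hashable).
    Returns (accepted_answer, hosts_to_credit), or (None, []) if no quorum yet.
    """
    if not results:
        return None, []
    answer, votes = Counter(results.values()).most_common(1)[0]
    if votes < quorum:
        return None, []
    credited = sorted(host for host, value in results.items() if value == answer)
    return answer, credited


# Hypothetical job replicated to three hosts; one returns a bad result.
print(validate({"h1": 42, "h2": 42, "h3": 7}, quorum=2))
# No agreement yet: the job keeps waiting, so the server prioritizes sending it out again.
print(validate({"h1": 1, "h2": 2}, quorum=2))
```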

Job Size Matching
  Assume jobs can be created in various size classes
  Keep order statistics of known host performance
  When assigning new work, prioritize jobs that are the right size for the requesting host
  If possible, create jobs according to the distribution of known hosts
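One way to read "order statistics" is to rank the requesting host among all known hosts and map that rank to a size class. The sketch below assumes equal-width quantile classes and a nearest-class fallback when no exact match is queued. Both choices, and all names, are illustrative assumptions rather than the documented BOINC behavior.

```python
from bisect import bisect_right


def size_class(host_speed, known_speeds, n_classes):
    """Map a host to a size class (0 = smallest jobs) by its rank among known hosts."""
    ranked = sorted(known_speeds)
    if not ranked:
        return 0
    quantile = bisect_right(ranked, host_speed) / len(ranked)
    return min(int(quantile * n_classes), n_classes - 1)


def pick_job(host_class, jobs_by_class):
    """Prefer a job of the host's own class, otherwise the nearest non-empty class."""
    nonempty = [c for c, jobs in jobs_by_class.items() if jobs]
    if not nonempty:
        return None
    best = min(nonempty, key=lambda c: (abs(c - host_class), c))
    return jobs_by_class[best][0]


# Hypothetical benchmark scores for eight known hosts, four size classes.
known = [1, 2, 3, 4, 5, 6, 7, 8]
cls = size_class(5, known, n_classes=4)
print(cls)
print(pick_job(cls, {0: ["small_job"], 1: [], 3: ["big_job"]}))
```

Creating jobs in proportion to how many known hosts fall into each class would keep every class's queue stocked.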

Summary
  Effective grid computing must consider both host and server
    Nature of grid means different interests may control each
  Long-running projects allow for predictive statistics
    CPU time, job matching, etc.
  The best known methods use heuristics to decide what to do
    Human-like notions of “credit”, “debt”, etc.

