December 8 & 9, 2005, Austin, TX SURA Cyberinfrastructure Workshop Series: Grid Technology: The Rough Guide Configuring Resources for the Grid Jerry Perez.


1 Configuring Resources for the Grid, Jerry Perez, Senior Administrator, Texas Tech University

2 Outline
- What is a Job Manager?
- Types of Job Managers
- PBS Pro
- SGE
- LSF
- Condor/Condor-DAGMan
- Rocks + Rolls (quick overview)

3 What is a Job Manager?
A Job Management System is a software component that ensures:
- Balanced use of cluster resources.
- Fair allocation of these resources to users' jobs, through a process that determines which job to run, and when and where to run it.

4 What is a Job Manager?

5 Components of a Job Manager
- Resource Management System – a process that maintains the current state of all the resources under its control, including the physical resources of the cluster and account information such as relative priorities and account balances.
- Queuing System – a process that maintains the current state of jobs that have been submitted but not completed.
- Scheduler – a system that assigns jobs to resources.
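The three components can be sketched in a few lines of Python. This is a toy model for illustration only: the class names (`ResourceManager`, `JobQueue`), the `schedule` function, and the "free CPUs per node" view of a cluster are assumptions made for this sketch and do not correspond to any particular job manager.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Job:
    name: str
    cpus: int  # processors the job asks for


@dataclass
class ResourceManager:
    """Tracks the current state of the resources (here: free CPUs per node)."""
    free_cpus: Dict[str, int]

    def find_node(self, cpus: int) -> Optional[str]:
        # Return the first node with enough free CPUs, or None.
        for node, free in self.free_cpus.items():
            if free >= cpus:
                return node
        return None

    def allocate(self, node: str, cpus: int) -> None:
        self.free_cpus[node] -= cpus


@dataclass
class JobQueue:
    """Tracks jobs that have been submitted but not yet started."""
    pending: deque = field(default_factory=deque)

    def submit(self, job: Job) -> None:
        self.pending.append(job)


def schedule(queue: JobQueue, resources: ResourceManager) -> Dict[str, str]:
    """Assign pending jobs to nodes in submission order; return job -> node."""
    placements = {}
    still_waiting = deque()
    while queue.pending:
        job = queue.pending.popleft()
        node = resources.find_node(job.cpus)
        if node is None:
            still_waiting.append(job)  # no room yet, keep it queued
        else:
            resources.allocate(node, job.cpus)
            placements[job.name] = node
    queue.pending = still_waiting
    return placements
```

With two nodes offering 2 and 4 free CPUs and jobs asking for 2, 4, and 2 CPUs, the first two jobs are placed and the third stays in the queue until resources are released.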

6 Why do we need a Job Manager?
A Job Management System should always be used for a cluster:
- That is operated as a public resource.
- That has a large number of users, or users who don't know each other.
- That has a large number of nodes and processors.
- That runs a large number of jobs.
- Whose nodes are heterogeneous in terms of memory, speed, number of processors, software licenses, networking, and other features.
Note: Most clusters are homogeneous with respect to hardware and software.

7 Types of Job Managers
Feature comparison. Table columns: PBS Pro, SGE, Condor, LSF. Values are listed as they appear in the transcript; rows with fewer than four values do not preserve their column assignment.
- Single-process preemptive jobs: No, Yes
- Multi-process preemptive jobs: No, Yes, No, Yes
- Single-process interactive jobs: Yes
- Multi-process interactive jobs: No, Yes, No, Yes
- Single-process preemptive, interactive jobs: No, Yes
- Multi-process preemptive, interactive jobs: No, Yes, No, Yes
- Costs: Free academic, Free academic, Free academic, Commercial
- Users’ desktops included in cluster (Cycle Scavenging Grid): Yes

8 PBS Pro
Components:
- PBS Pro is made up of a number of components: the server, and clients such as user commands.
- The server component manages a number of different objects, such as queues or jobs.
- Each object consists of a number of data items, or attributes.
- Scheduling is policy based and operates in a FIFO, round-robin fashion.
- Specific queues can be configured for priority queuing.
- Minimal queue/scheduler configuration.
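The combination of FIFO ordering and priority queues can be illustrated with a simplified model: jobs leave a queue in the order they arrived, and higher-priority queues are served first. The `PriorityQueues` class and the queue names below are hypothetical and are not PBS Pro's configuration interface or its actual scheduling algorithm.

```python
from collections import deque


class PriorityQueues:
    """Several named FIFO queues, each with a numeric priority."""

    def __init__(self):
        self._queues = {}  # queue name -> (priority, deque of jobs)

    def add_queue(self, name, priority):
        self._queues[name] = (priority, deque())

    def submit(self, queue_name, job):
        # Jobs join the back of their queue (FIFO within a queue).
        self._queues[queue_name][1].append(job)

    def next_job(self):
        # Visit queues from highest to lowest priority and take the
        # oldest job from the first queue that is not empty.
        ordered = sorted(self._queues.values(), key=lambda q: q[0], reverse=True)
        for _, jobs in ordered:
            if jobs:
                return jobs.popleft()
        return None  # nothing is waiting
```

If `job3` is submitted to a high-priority queue after `job1` and `job2` were submitted to a normal queue, `next_job()` hands out `job3` first, then `job1` and `job2` in arrival order.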

9 PBS Pro

10 PBS Pro Graphical User Interface

11 PBS Pro Graphical User Interface

12 SGE – Sun Grid Engine
- The SGE version 6 queue configuration allows a queue to span more than one execution host, so that a single queue can be configured across multiple hosts.
- Uses the concept of an SGE Master node controlling “pools” of compute clients.
- Can manage up to 10,000 clients per SGE Master node.
- SGE can provide load leveling on the fly.
- Scheduling can be policy based or topologically based.
- Addresses the “Backfill” problem. (More on that later.)
- Queue optimization is not automatic. It requires “tuning”.

13 SGE - Basic Cluster Configuration
- The cluster configuration reflects site dependencies and influences batch system behavior.
- Site dependencies include valid paths for programs such as mail or xterm.
- A global configuration is provided for the Master Host as well as for every host in the grid engine system pool.
- The system can be configured to use a configuration local to each host that overrides particular entries in the global configuration.
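The global-plus-local override rule can be pictured as a dictionary merge: a host starts from the global settings and replaces only the entries its local configuration names. The function name, keys, and paths below are illustrative assumptions, not a dump of a real SGE configuration.

```python
def effective_config(global_conf, local_conf):
    """Return the settings a host actually sees.

    Local entries win; everything else falls back to the global value.
    """
    merged = dict(global_conf)   # copy so the global settings stay untouched
    merged.update(local_conf)    # local entries override matching global ones
    return merged


# Hypothetical global settings shared by every host in the pool.
global_conf = {"mailer": "/bin/mail", "xterm": "/usr/bin/X11/xterm"}

# Hypothetical host whose xterm lives somewhere else.
node17_local = {"xterm": "/usr/bin/xterm"}

node17_conf = effective_config(global_conf, node17_local)
```

Here `node17_conf` keeps the global mailer path but uses the host's own xterm path.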

14 SGE – Cluster Configuration GUI

15 SGE – Host Configuration GUI

16 (L)oad (S)haring (F)acility - LSF

17 LSF
- Scheduling can be policy based or topologically based.
- Queue optimization is not automatic. It requires “tuning”.
- Topologically based scheduling can use load information to schedule jobs.
- Addresses the “Backfill” problem.
- Jobs in a backfill queue cannot be preempted (a job in a backfill queue might be running in a reserved job slot, and starting a new job in that slot might delay the start of the big parallel job):
  - A backfill queue cannot be preemptable.
  - A preemptive queue whose priority is higher than the backfill queue cannot preempt the jobs in the backfill queue.

18 LSF - How backfilling works
- LSF assumes that a job will run until its run limit expires.
- Backfill scheduling works most efficiently when all the jobs in the cluster have a run limit.
- Since jobs with a shorter run limit have more chance of being scheduled as backfill jobs, users who specify appropriate run limits in a backfill queue will be rewarded by improved turnaround time.
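A minimal sketch of the backfill idea, under simplifying assumptions: every job runs exactly until its run limit, the cluster is a single pool of identical slots, and only the first waiting job holds a reservation. The names `reservation_time` and `backfill_candidates` are invented for this illustration and are not LSF functions or parameters.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    slots: int      # job slots requested
    run_limit: int  # minutes; assumed here to be the actual run time


def reservation_time(running, free_slots, needed, now=0):
    """Earliest time at which `needed` slots are free.

    `running` is a list of (end_time, slots) pairs for jobs already running.
    """
    available = free_slots
    if available >= needed:
        return now
    for end_time, slots in sorted(running):
        available += slots  # slots come back when a running job ends
        if available >= needed:
            return end_time
    raise ValueError("job needs more slots than the cluster has")


def backfill_candidates(head, others, running, free_slots, now=0):
    """Jobs from `others` that can start now without delaying `head`."""
    start = reservation_time(running, free_slots, head.slots, now)
    chosen = []
    for job in others:
        fits_now = job.slots <= free_slots
        ends_before_reservation = now + job.run_limit <= start
        if fits_now and ends_before_reservation:
            chosen.append(job)
            free_slots -= job.slots
    return chosen
```

For example, on an 8-slot cluster where a 6-slot job ends at minute 60, an 8-slot job at the head of the queue is reserved to start at minute 60. A 1-slot job with a 30-minute run limit can be backfilled into the idle slots, while a 1-slot job with a 120-minute run limit cannot, which is why a shorter, accurate run limit improves turnaround.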

19 LSF - How backfilling works

20 LSF - How backfilling works

21 LSF GUI

22 LSF – Cluster Monitoring GUI

23 Condor
Provides:
- A job queuing mechanism
- Scheduling policy
- Priority scheme
- Resource monitoring
- Resource management

24 Users submit their serial or parallel jobs to Condor. Condor places them into a queue, chooses when and where to run the jobs based upon a policy, carefully monitors their progress, and informs the user upon completion. It uses FIFO round-robin scheduling out of the box and can also use attribute-based scheduling.
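Attribute-based scheduling can be sketched as matching what a job requires against what each machine advertises. The example below uses plain Python dictionaries and functions; it is loosely modeled on the idea of requirement and rank expressions and is not Condor's own matchmaking language. The attribute names (`arch`, `memory_mb`) and the `match` function are assumptions for illustration.

```python
def match(job, machines):
    """Pick a machine for the job, or None if no machine qualifies.

    job["requirements"] is a predicate over a machine's attributes;
    job["rank"] scores acceptable machines (higher is preferred).
    """
    candidates = [m for m in machines if job["requirements"](m)]
    if not candidates:
        return None
    return max(candidates, key=job["rank"])


# Hypothetical machines advertising their attributes.
machines = [
    {"name": "desk01", "arch": "x86_64", "memory_mb": 1024},
    {"name": "node07", "arch": "x86_64", "memory_mb": 4096},
    {"name": "node12", "arch": "ppc64", "memory_mb": 8192},
]

# Hypothetical job: needs x86_64 with at least 2 GB, prefers more memory.
job = {
    "requirements": lambda m: m["arch"] == "x86_64" and m["memory_mb"] >= 2048,
    "rank": lambda m: m["memory_mb"],
}

chosen = match(job, machines)
```

Only `node07` satisfies both requirements, so it is the machine selected for this job.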

25 Condor can be used to build Grid-style computing environments that cross administrative boundaries. Condor's "flocking" technology allows multiple Condor compute installations to work together. Condor incorporates many of the emerging Grid-based computing methodologies and protocols. For instance, Condor-G is fully interoperable with resources managed by Globus.

26 Condor-DAGMan
- DAGMan (Directed Acyclic Graph Manager) is a meta-scheduler for Condor.
- It manages dependencies between jobs at a higher level than the Condor Scheduler.
- DAGMan is responsible for scheduling, recovery, and reporting for the set of programs submitted to Condor.
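Dependency management over a directed acyclic graph can be illustrated with Python's standard `graphlib` module (Python 3.9 or later). The sketch below groups jobs into "waves" that could be submitted together once their parents have finished. It shows the ordering idea only; it is not DAGMan's input format or its implementation, and the job names are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_waves(parents):
    """Group jobs into waves that respect their dependencies.

    `parents` maps each job to the set of jobs that must finish first.
    """
    sorter = TopologicalSorter(parents)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose parents are all done
        waves.append(ready)
        sorter.done(*ready)  # mark the wave as finished
    return waves


# Hypothetical diamond-shaped workflow: A feeds B and C, which both feed D.
waves = execution_waves({"B": {"A"}, "C": {"A"}, "D": {"B", "C"}})
```

For the diamond workflow, job A runs first, B and C can then run side by side, and D runs last.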

27 Rocks + Rolls

28 Rocks + Rolls
- The complexity of cluster management (e.g., determining if all nodes have a consistent set of software) often overwhelms part-time cluster administrators, who are usually domain application scientists.
- Rocks is a complete clustering solution with a goal to help deliver the computational power of clusters to a wide range of scientific users.

29 Rocks + Rolls
- Before you install Rocks, be sure you have decided which Rolls you wish to include in your installation.
- You may install whatever you like; however, remember that you can only choose one scheduler: LSF, SGE, PBS, or Condor.
- Schedulers do not like being used together due to resource conflicts.

30 Rocks + Rolls
Required Rolls:
- Base
- Hpc
- Kernel
- Web-server
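The two planning rules from these slides (include every required Roll, choose at most one scheduler) are easy to check before starting an installation. The helper below is a hypothetical pre-install checklist written for this transcript; it is not part of Rocks, and the lower-case roll names are an assumption made for easy comparison.

```python
REQUIRED_ROLLS = {"base", "hpc", "kernel", "web-server"}
SCHEDULER_ROLLS = {"lsf", "sge", "pbs", "condor"}


def check_roll_selection(selected):
    """Return a list of problems with a planned roll selection (empty if OK)."""
    chosen = {name.lower() for name in selected}
    problems = []

    missing = sorted(REQUIRED_ROLLS - chosen)
    if missing:
        problems.append("missing required rolls: " + ", ".join(missing))

    schedulers = sorted(chosen & SCHEDULER_ROLLS)
    if len(schedulers) > 1:
        problems.append("more than one scheduler: " + ", ".join(schedulers))

    return problems
```

A selection such as Base, Hpc, Kernel, Web-server, Ganglia, and Sge passes, while adding Condor to the same selection is reported as a scheduler conflict.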

31 Rocks + Rolls
List of various rolls:
- Area51: System security related services and utilities
- Ganglia: Cluster monitoring system from UCB
- Grid: Globus 4.0.1 (GT4)
- Condor Roll
- Java: Sun Java SDK and JVM
- Myrinet: Myricom’s Myrinet drivers and MPICH environments
- Pbs: PBS job queueing system
- Ninf: Ninf-G, a simple, yet powerful, client-server-based standard RPC mechanism
- Sge: Sun Grid Engine job queueing system
- Viz: Support for building visualization clusters
- LSF: comes with Platform Rocks

32 Rocks + Rolls

33 Thank You. Questions?

