Gridengine Configuration review ● Gridengine overview ● Our current setup ● The scheduler ● Scheduling policies ● Stats from the clusters.


Gridengine Overview ● Accepts jobs from the outside world ● Puts jobs in a holding area until they can be run ● Sends jobs from the holding area to an execution device ● Manages running jobs ● Records details about finished jobs

Gridengine Overview (2) ● Four types of hosts – Execution: runs jobs. – Submit: jobs can be submitted from it. – Master: schedules jobs. – Admin: cluster administration commands can be run from it. ● A host can have several roles, but there is only one master (a hot spare is possible). ● Could run everything on one host... silly but possible.

Queues (Cluster Queues) ● Container for a class of jobs ● Can define specific resources – large memory machines – specific processor – architecture – time restricted (runtime or time of day/week) ● Contain one or more execution hosts ● Can be preemptive ● Can contain subqueues

Queues (2) ● Queue instance – Each cluster queue is bound to each execution host it includes via a queue instance. – Each execution host can have multiple queue instances attached. – Each queue instance can have one or more job slots.
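The relationship between cluster queues, execution hosts and queue instances can be sketched as a small data model. This is only an illustration of the structure described on the slide; the class names, the queue names and the host names are all made up for the example and are not Grid Engine's own API.

```python
from dataclasses import dataclass, field


@dataclass
class QueueInstance:
    """One cluster queue bound to one execution host (hypothetical model)."""
    queue: str
    host: str
    slots: int = 1


@dataclass
class ClusterQueue:
    """A cluster queue that creates one queue instance per included host."""
    name: str
    slots_per_host: int = 1
    instances: dict = field(default_factory=dict)

    def add_host(self, host):
        # Binding a host to the queue yields a queue instance on that host.
        instance = QueueInstance(self.name, host, self.slots_per_host)
        self.instances[host] = instance
        return instance


def instances_on_host(queues, host):
    """All queue instances attached to a given execution host."""
    return [q.instances[host] for q in queues if host in q.instances]


# Illustrative setup: two queues sharing one execution host.
all_q = ClusterQueue("all.q", slots_per_host=2)
all_q.add_host("node01")
all_q.add_host("node02")
short_q = ClusterQueue("short.q")
short_q.add_host("node01")
```

Here `node01` ends up with two queue instances (one per cluster queue) and three job slots in total, while `node02` has one instance with two slots.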

Simple configuration ● One cluster queue ● Each execution host has one queue instance ● Jobs are scheduled in FIFO order. ● This is the default configuration gridengine ships with.
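FIFO scheduling in this simple configuration can be sketched as follows. It is a minimal illustration, assuming jobs are plain identifiers held in submission order and each host reports a count of free slots; the function name and data shapes are hypothetical.

```python
from collections import deque


def dispatch_fifo(pending, free_slots):
    """Place jobs on hosts strictly in submission order.

    pending: deque of job ids, oldest first (consumed as jobs are placed).
    free_slots: dict mapping host name to its number of free job slots.
    Returns a list of (job_id, host) placements.
    """
    placements = []
    for host, free in free_slots.items():
        while free > 0 and pending:
            placements.append((pending.popleft(), host))
            free -= 1
    return placements
```

Jobs that do not fit stay in `pending` and wait for the next scheduling run.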

Our Hardware ● 4 clusters running gridengine – Lion: 64+ nodes (GX240) – Lutzow: 16 nodes (PE530) – Townhill: 34 nodes (PE1425 dual CPU) – Hermes: 24 nodes (PE1425 single CPU) ● 4 head nodes (1 per cluster) ● 1 TB local home directories ● 1 TB “scratch” space

Current setup ● All hosts are admin hosts ● Single “head node” configured as submit/master ● Execution hosts have ssh blocked ● Users ssh onto head node and submit jobs. – Actually they tend to run scripts which submit jobs – Lots of jobs – Not all of them will run properly.

The Scheduler Process

Prioritisation ● Prioritisation based on – Entitlement – Urgency – Custom ● Generates a Dispatch priority ● Real number based on combination of above.
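One way to picture how three contributions become a single real number is a weighted sum of normalised values. The sketch below is an assumption about the general shape only: the slides do not give the formula, and the weights, field names and min-max normalisation are all illustrative.

```python
def normalise(values):
    """Scale values to the range 0..1; identical values all map to 0.5."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def dispatch_priorities(jobs, w_entitlement=1.0, w_urgency=1.0, w_custom=1.0):
    """Combine entitlement, urgency and custom values into one number per job.

    jobs: list of dicts with keys "name", "entitlement", "urgency", "custom".
    The weights are hypothetical tuning knobs, not documented defaults.
    """
    if not jobs:
        return {}
    ent = normalise([j["entitlement"] for j in jobs])
    urg = normalise([j["urgency"] for j in jobs])
    cus = normalise([j["custom"] for j in jobs])
    return {
        job["name"]: w_entitlement * e + w_urgency * u + w_custom * c
        for job, e, u, c in zip(jobs, ent, urg, cus)
    }
```

Pending jobs would then be dispatched in descending order of the resulting value.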

Entitlement ● Priority based on users/groups ● Can be explicit (user A's jobs run before user B's) ● Can allocate a ratio of resources (group A gets 60% of CPU usage, group B gets 40%) ● Share tree allows the allocation to be spread over a defined time period. ● Need to configure information for users/groups
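The 60%/40% example can be sketched as a deficit calculation: whichever group is furthest below its target share of recent usage goes next. This is a simplified illustration, not the share tree algorithm itself; in particular it ignores how usage is aged over the configured time period, and all names are hypothetical.

```python
def next_group(targets, usage):
    """Pick the group furthest below its target share.

    targets: dict of group -> target fraction of CPU (e.g. 0.6 and 0.4).
    usage: dict of group -> CPU seconds consumed in the accounting window.
    """
    total = sum(usage.values())

    def deficit(group):
        actual = usage.get(group, 0) / total if total else 0.0
        return targets[group] - actual

    return max(targets, key=deficit)
```

With targets of 0.6 for group A and 0.4 for group B, a window in which A has used 70% of the CPU time favours B until the ratio drifts back towards 60/40.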

Share tree example

Urgency ● Deadline contribution – Priority rises closer to the deadline specified at submission ● Wait time contribution – Priority rises with time ● Resource contribution – Can assign urgency to a resource (MATLAB licenses)
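The three urgency contributions can be sketched as a sum. The exact formulas are not given in the slides, so the shapes below (linear in wait time, inversely proportional to time left before the deadline, per-resource constants) and all weights and names are assumptions chosen only to reproduce the described behaviour.

```python
def urgency(now, submit_time, deadline=None, resources=None,
            resource_urgency=None, weight_wait=1.0, weight_deadline=3600.0):
    """Sum of wait-time, deadline and resource contributions (times in seconds).

    resources: dict of resource name -> amount requested by the job.
    resource_urgency: dict of resource name -> urgency per unit requested.
    """
    # Wait time contribution: grows steadily while the job is pending.
    wait = weight_wait * (now - submit_time)

    # Deadline contribution: grows as the remaining time shrinks.
    dl = 0.0
    if deadline is not None:
        remaining = max(deadline - now, 1.0)  # avoid division by zero
        dl = weight_deadline / remaining

    # Resource contribution: requesting a scarce resource adds urgency.
    table = resource_urgency or {}
    res = sum(table.get(name, 0.0) * amount
              for name, amount in (resources or {}).items())

    return wait + dl + res
```

For example, a job requesting one unit of a hypothetical `matlab` resource with an urgency of 500 starts 500 points ahead of an otherwise identical job.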

Custom ● Allows for prioritisation based on site-specific requirements ● Run an arbitrary script which alters priority. ● Defaults to POSIX priority (like nice) – Users can lower priority – Admin can raise priority
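The "users can lower, admin can raise" rule can be sketched as a validation step. The function is hypothetical, and the bounds of -1023 to 1024 with a default of 0 are an assumption about the permitted range, not something stated in the slides.

```python
def validate_priority(requested, is_admin, lo=-1023, hi=1024):
    """Accept a requested priority, enforcing who may raise it above 0."""
    if not lo <= requested <= hi:
        raise ValueError(f"priority must be between {lo} and {hi}")
    if requested > 0 and not is_admin:
        # Ordinary users may only lower their jobs below the default of 0.
        raise PermissionError("only an administrator can raise priority")
    return requested
```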

Summary ● Can control job execution based on – Queues: assign specific execution hosts for specific tasks or users/groups. Queues can be calendar controlled. – Scheduler: prioritise jobs based on who submitted them or what resources they require.

Current setup ● Single queue containing all nodes in a cluster ● Limited user/group support (FC5) ● Allocates equal priority to each user with jobs in pending queue.
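One reading of "equal priority to each user with jobs in the pending queue" is that pending jobs are interleaved across users, so a single user with many jobs cannot hold everyone else up. The sketch below shows that interpretation only; it is an assumption about the effect, not a description of how the scheduler implements it.

```python
from collections import deque


def interleave_by_user(pending):
    """Reorder pending jobs round-robin across users.

    pending: list of (user, job_id) tuples in submission order.
    Returns the same tuples, taking one job per user per round.
    """
    per_user = {}
    for user, job in pending:
        per_user.setdefault(user, deque()).append(job)

    order = []
    while per_user:
        for user in list(per_user):
            order.append((user, per_user[user].popleft()))
            if not per_user[user]:
                del per_user[user]
    return order
```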

It's mostly downhill from here

Gathering job data ● Sun dbwriter ● A Java program reads the accounting/reporting file and populates a PostgreSQL database (42GB footprint). ● Data from Dec/Jan until yesterday, “with holes” ● Some jobs are difficult to analyse (parallel and stopped jobs)

How many jobs

Hmm, that's a lot of short jobs

That's really a lot of short jobs ● Remember all those scripts? ● How many of these jobs actually run for any length of time?

How many jobs (>3min)

Remove the <3min jobs
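Separating out the jobs that ran for under 3 minutes can be sketched as below. The record layout (dicts with `start` and `end` times in seconds) is hypothetical and stands in for whatever the reporting database actually returns.

```python
def split_by_runtime(jobs, threshold_s=180):
    """Split job records into (short, long) lists by wall-clock runtime.

    jobs: list of dicts with numeric "start" and "end" times in seconds.
    threshold_s: runtime cut-off; 180 seconds matches the 3 minute limit.
    """
    short, long_running = [], []
    for job in jobs:
        runtime = job["end"] - job["start"]
        if runtime < threshold_s:
            short.append(job)
        else:
            long_running.append(job)
    return short, long_running
```

Counting only the second list gives the ">3min" view of the job numbers.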

Tentative conclusions ● Could add more submit hosts and/or a backup scheduler for redundancy (virtualisation). ● Need to set up a queue to handle short jobs with quick turnaround. ● Also need a preemptible queue for longer-running jobs. ● User scripts can muddy the water; we can't assume a quiet time for system admin tasks.