Resource Management for High-Throughput Computing at the ESRF


Resource Management for High-Throughput Computing at the ESRF
G. Förstner and R. Wilcke, ESRF, Grenoble
(parallel) Computing for Neutron and Photon Science, DESY, 04-05/03/2013

Why and how high-throughput computing?
Basic problem: process user data “fast”
– bigger and faster detectors
– data rate increases
– time available for analysis gets shorter
Brute-force solution: buy more and faster computers
– Moore’s law: transistor density doubles every 18 months
– limits: structures approach mono-atomic layers
– higher clock speed ⇒ exponential temperature rise
– may need more support staff, new software licenses, electrical power, cooling, rooms...
⇒ total cost of ownership can become unacceptable

Thus: brute force is no longer sufficient ⇒ look at the precise needs of “fast data processing”
Two scenarios are typically found:
– many (possibly small) sets of independent data
  ⇒ what matters is the overall throughput, not the execution time of each job
– (possibly few) sequence-dependent data sets with much data and/or complex processing
  ⇒ what matters is the elapsed time per job, as job n must finish before job n + 1 can start

Improve overall throughput:
– split the task into several (many) independent jobs
– distribute the jobs to several (many) different processors
– select the “most appropriate” processor for each job
– scales well with the number of processors
– no change to the program code needed
⇒ simple parallel processing (see the sketch below)
Reduce elapsed time for each job:
– distribute each job over several processors
– (re-)structure the program into independent loops
– optimize data access for each processor
– needs changes (possibly restructuring / rewriting) of the program code
⇒ task for parallel programming
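A minimal sketch of the “simple parallel processing” case: one independent OAR job per data set, submitted from a shell loop. The data path, file pattern and program name are hypothetical; oarsub is the OAR submission command introduced later in the talk.

  #!/bin/bash
  # submit one independent 1-core job per data file; throughput scales with the cluster size
  for f in /data/experiment/*.edf; do
    oarsub -l "core=1,walltime=0:30:00" "./process_image $f"
  done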

Even parallelization has its limits

Very expensive to provide for the “worst case”
Better: use existing resources more efficiently
– select the “most appropriate” processor for each job
– balance the processing load between processors
– fill under-used periods (nights, weekends, ...)
⇒ task for a resource management system
Required features:
– resource distribution
– resource monitoring
– job queuing mechanism
– scheduling policy
– priority scheme
ESRF uses OAR (ENSIMAG Grenoble: oar.imag.fr)

Resource Manager OAR: Basic Use
– OAR features:
  – interactive or batch mode
  – controls processor and memory placement of jobs (cpuset)
  – request resources (cores, memory, time...)
  – specify properties (manufacturer, speed, network access...)
  – can define installation-specific properties and rules
– manage OAR with 3 basic commands:
  – request resources (submit a job) (oarsub)
  – inquire the status of requests (oarstat)
  – if necessary, cancel requests (oardel)
– after submitting jobs, the user can log off and go home
– OAR starts the jobs when convenient and delivers the results
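A hedged illustration of the three basic commands listed above; the resource request, script name and job id are examples, not ESRF defaults:

  # submit a batch job asking for 4 cores on one node for 2 hours (prints a job id)
  oarsub -l "nodes=1/core=4,walltime=2:00:00" ./my_analysis.sh

  # inquire the status of one's own requests
  oarstat -u $USER

  # if necessary, cancel a request (123456 is a placeholder job id)
  oardel 123456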

OAR at ESRF: present status
– 1 public, 2 private clusters: in total 109 nodes, 803 CPUs
– access to the clusters only via front-end computers (30 nodes)
– no direct login to the high-performance computers
– dedicated clusters are a strong incentive for using OAR
– front-ends are deliberately low-performance: discourages production jobs
– any computer can be a front-end: only the OAR client needs to be installed

OAR cluster: usage statistics (January - February 2013)
[plot: jobs submitted, jobs running and jobs waiting over the period]

OAR cluster: runtime distribution
– < 5 min: 42.5%
– 5 min - 3 h: 51.1%
– > 3 h: 6.4%
– interactive: < 0.4%

OAR monitoring tools: for efficient resource management, one needs to know:
– the load on the computing system (active / waiting jobs, available / free / used resources, ...)
  important for users:
  – see the available resources
  – estimate the wait time for jobs
– the hardware status of the computing system (hosts down, memory use, network traffic, ...)
  important for administrators to spot problems
– DrawOarGantt (part of OAR): shows the load on the computing system
– Ganglia (ganglia.sourceforge.net): monitors the hardware of the system

OAR Status Monitoring with DrawOarGantt:
– shows the scheduling of jobs to computers: user, number of cores requested, times of submission, start and end
– users also see the free resources: useful for starting interactive jobs

Hardware Monitoring with Ganglia:
– shows performance (e.g. load, memory, network traffic...; can be configured)
– select the time period (hour, day, week, month, year)

How well is OAR filling underused periods?
[daily load plots: interactive vs. OAR-managed]
– OAR fills the evening / night much better (interactive is almost empty)
– low OAR use in the morning (users looking at results?)
– few long-running jobs (> 1 day)
[weekly load plots: OAR-managed vs. interactive]
– the OAR effect is even stronger on weekends
– but the system is frequently still underused

OAR cluster: spotting problems with Ganglia
Obviously something is wrong!

But problems can be more subtle:
– constant but small “Memory Swapped”
– slight “1-min Load” overload around 17:30
– incoming network traffic spikes around 17:25

Looking at the node list gives a clearer picture:
rnice32 heavily overloaded (up to a factor of 17) for the last 10 minutes

Confirm by looking at the node:
– the node obviously has far too many jobs
– now one can try to find out who is responsible and do something about it
– but: the administrator must be aware of the problem first!
Impossible to monitor manually ⇒ an alarm tool is needed

Icinga alarm monitor (www.icinga.org, fork from Nagios)
– Zulu for “it examines” (with a click sound for the “c”)
– monitors network services, host resources and software problems
– integrates installation-developed services and checks
– notifies of problems via email and a monitor window (Nagstamon)
– more detailed status information from the web interface

OAR: Problems / FAQs (1)
Syntax of the submit command is not user-friendly
– true (for non-SQL specialists)
– in a sample of 618428 requests, ≈ 12% terminated with an error
– simple request (gives 1 core and the default time of 2 h): interactive (oarsub -I) or not (oarsub prog_name)
– complicated request: write a script file using the documentation, modify the script as needed
– the most efficient way to accomplish a task is often not obvious
⇒ users need help to set up and run OAR jobs
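A sketch contrasting the simple and the complicated case; the property names in the -p expression (cpu_vendor, mem) are hypothetical, since OAR properties are installation-specific, and the SQL-like property syntax is what makes the command unfriendly to non-SQL specialists:

  # simple request: 1 core, default 2 h walltime
  oarsub -I              # interactive
  oarsub ./prog_name     # batch

  # complicated request: explicit resources plus an SQL-like property filter
  oarsub -l "nodes=2/core=8,walltime=12:00:00" \
         -p "cpu_vendor='INTEL' AND mem >= 64" \
         ./prog_name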

OAR: Problems / FAQs (2)
OAR manages short jobs badly
– startup overhead ≈ 25 s / request
– “normal” OAR is not suited for requests of < 2 min runtime
– but, in a sample of 541938 requests, ≈ 30% had < 2 min (23% even < 1 min)!
– a very inefficient way of operating
– many short requests create a big load: the OAR scheduler essentially stops
– had to create a small cluster without OAR for short jobs: access via ssh hostname prog_name
– can be handled better in OAR: e.g. submit one script that starts many short jobs (see the sketch below)
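A sketch of that workaround, assuming a hypothetical file pattern and program name: one OAR job works through many short tasks in sequence, so the ≈ 25 s submission overhead is paid only once.

  #!/bin/bash
  # run_batch.sh: one OAR job that processes many short tasks in sequence
  for f in /data/run42/frame_*.h5; do
    ./short_task "$f"     # each task runs well under 2 minutes
  done

  # submitted once instead of hundreds of times:
  #   oarsub -l "core=1,walltime=4:00:00" ./run_batch.sh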

OAR: Problems / FAQs (3)
OAR kills jobs after the time limit
– do not set the time limit too tight
– but not too generous either: long requests get low priority from the scheduler
– the average OAR user overestimates the time by ≈ a factor of 10
– use checkpointing to send a signal to the process before the time limit
– the job needs to catch the signal and terminate in a controlled fashion
– also usable to stop jobs without losing all results
– useful to stop / reboot computers with long-running jobs
– in a sample of 541938 requests, ≈ 6% had > 3 h
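A hedged sketch of the checkpointing mechanism described above, assuming OAR's --checkpoint option (number of seconds before the walltime at which a signal is sent; SIGUSR2 by default to my understanding, --signal can select another) and placeholder script, program and file names:

  #!/bin/bash
  # long_job.sh: terminate in a controlled fashion when OAR warns us
  save_and_exit() {
    echo "walltime almost reached, saving partial results" >&2
    kill "$pid" 2>/dev/null                 # stop the computation
    cp partial_results.h5 /data/results/    # keep what exists so far
    exit 0
  }
  trap save_and_exit SIGUSR2

  ./long_running_computation &              # run in the background so the trap fires promptly
  pid=$!
  wait "$pid"

  # submit with a 10-minute warning before a 24 h limit:
  #   oarsub -l "core=1,walltime=24:00:00" --checkpoint 600 ./long_job.sh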

Conclusions:
– OAR helps to fill underused periods (nights / weekends)
– efficient and relatively easy for batch jobs
– administration of interactive jobs is more complex
– hardware and OAR queue monitoring are needed for efficiency
– the system at ESRF is frequently underused
– problems in particular with very short and very long jobs
– users need help to set up OAR jobs
Still room for improvement ⇒ watch this space!
Acknowledgements:
– C. Ferrero, J. Kieffer, A. Mirone and B. Rousselle
– ESRF Systems & Communications - Unix Unit
– ESRF Data Analysis Unit