QoS in the Tier1 batch system (LSF)


Alessandro Italiano (INFN-CNAF), Tier1 Farming Group

QoS definition
From Wikipedia (http://en.wikipedia.org/wiki/QoS): quality of service is the ability to provide different priorities to different applications and users in resource usage. QoS mechanisms are not required if there is no resource contention.

Tier1 scenario
- More than 20 different experiments (applications)
- Each experiment has several computing activities with different priorities
- Each year the Tier1 committee defines the maximum amount of resources that each experiment can use

FairShare definition
From the LSF documentation: fairshare scheduling divides the processing power of the LSF cluster among users to provide fair access to resources, so that no user or subgroup of users can monopolize the resources of the cluster.

Hierarchical FairShare: a first level of QoS
- Defines dynamic priorities for every group/subgroup
- Dynamically grants a resource quota to each group/subgroup
- Takes effect only where there is resource contention
- Optimizes resource usage

Hierarchical FairShare: parameters
- Share: resource percentage assigned to every group and subgroup
- Resource usage time window: time slot used to compute the total amount of resources used by every group
- Normalization factors (Nf)

Dynamic priority formula:

    DP = Share / ((ResourceUsage x Nf) + 1)
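In LSF terms, the normalization factors correspond to the usage weights set in lsb.params (CPU_TIME_FACTOR, RUN_TIME_FACTOR, RUN_JOB_FACTOR). As a hedged sketch of how a share tree like the one on the next slide could be configured, the group names and share values below are illustrative assumptions, not the actual CNAF setup:

    # lsb.users -- hypothetical hierarchical share tree (illustrative values)
    Begin UserGroup
    GROUP_NAME   GROUP_MEMBER           USER_SHARES
    cms_all      (cms cmsprd cmssgm)    ([cmsprd, 70] [cms, 15] [cmssgm, 15])
    alice_all    (alice alicesgm)       ([alicesgm, 85] [alice, 15])
    End UserGroup

    # lsb.queues -- enable fairshare across the top-level groups
    Begin Queue
    QUEUE_NAME = grid
    FAIRSHARE  = USER_SHARES[[cms_all, 30] [alice_all, 27]]
    End Queue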

Hierarchical FairShare: how it works

Example share tree:
- Available resources
  - CMS: share = 30
    - cms: rel share = 15 (abs share = 4.5)
    - cmsprd: rel share = 70 (abs share = 21)
    - cmssgm
  - ALICE: share = 27
    - alice: abs share = 4.05
    - alicesgm: rel share = 85 (abs share = 22.95)

Share information for the SLC4_GLOBAL host partition:

    SHARE_INFO_FOR: SLC4_GLOBAL/
    USER/GROUP       SHARES   PRIORITY  STARTED  RESERVED     CPU_TIME    RUN_TIME
    group_test        10000   3236.948        0         0       3621.6        4588
    group_admin        1000    333.333        0         0          0.0           0
    group_dteam        1000    333.325        0         0          0.8           4
    group_egee         1000    325.245        0         0         16.4        3836
    group_ops          1000    142.119        1         0      26107.9       53260
    group_magic         174     55.994        0         0       4420.6        5522
    group_ams            45     15.000        0         0          0.0           0
    group_ingv           31     10.333        0         0          0.0           0
    group_theophys       31     10.333        0         0          0.0           0
    group_biomed         31     10.333        0         0          0.0           0
    group_t1bio          31     10.333        0         0          0.0           0
    group_cdfcaf       1616      3.266       34         0    7585199.5    20034766
    group_infngrid        6      2.000        0         0          0.3           6
    group_pamela         35      1.111        0         0     821776.2     1464160
    group_lhcb         1355      0.883      153         0     813004.1    55167538
    u_cms              1665      0.809      449         0   14599556.0    36415071
    group_babar        1691      0.784      388         0   15616334.0    50845584
    u_atlas            1514      0.640      451         0   21058258.0    51814853
    group_alice        1041      0.638      439         0    1478367.4    15974277
    group_argo          401      0.637      159         0    3155703.0     7686056
    group_virgo         348      0.392       33         0      51287.3    40428843
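This listing has the format of LSF's bhpart output; assuming the host partition is really named SLC4_GLOBAL as the header suggests, it could be reproduced with:

    # display the recursive share tree of the host partition
    # (partition name assumed from the header above)
    bhpart -r SLC4_GLOBAL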

Hierarchical FairShare: constraint
When there is no intra-VO resource contention, a single user can consume all the resources available to his experiment. All the other users, including those belonging to a high-priority group, may then wait a long time before their jobs run.

LSF SLA: a second level of QoS
LSF Service Level Agreements (SLAs) are batch-system functionalities that provide goal-oriented service levels. Four kinds of goals are available:
- Deadline: complete a specified number of jobs within a time window
- Velocity: keep a given number of jobs running in a time window (used for short jobs)
- Throughput: complete a given number of jobs per hour (used for medium and long jobs)
- A combination of different goals
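A minimal sketch of what such a service class might look like in lsb.serviceclasses; the class name, priority, and goal values are assumptions for illustration:

    # lsb.serviceclasses -- illustrative service class with a throughput goal
    Begin ServiceClass
    NAME        = cms_high
    PRIORITY    = 20
    GOALS       = [THROUGHPUT 100 timeWindow ()]   # 100 finished jobs/hour, always active
    DESCRIPTION = "Example high-priority service class for cmssgm jobs"
    End ServiceClass

Jobs would then be submitted against it with bsub -sla cms_high.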

LSF SLA: constraint
You can't bind a specific queue or user subgroup to an SLA, because SLAs can only be invoked at submission time. To overcome this limitation, the batch manager can easily provide an automatic hook that grants the appropriate SLA to each user or subgroup.
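One hedged sketch of such a hook: a submission wrapper that derives the -sla option from the caller's primary Unix group. The group-to-SLA mapping and the SLA names are assumptions; a site could equally implement this inside LSF's esub mechanism:

    #!/bin/sh
    # Hypothetical bsub wrapper: pick the SLA from the submitter's primary group.
    group=$(id -gn)
    case "$group" in
        cmssgm) sla=cms_high   ;;  # assumed SLA names, see the service class sketch above
        cmsprd) sla=cms_medium ;;
        cms)    sla=cms_low    ;;
        *)      exec bsub "$@" ;;  # no mapping: submit without an SLA
    esac
    exec bsub -sla "$sla" "$@"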

A detail which can improve QoS: one queue for each application
A dedicated queue per application makes it possible to customize the execution environment and eases the administration of application requirements (an illustrative queue definition is sketched below):
- Run-time resource limits
- Dedicated computing resources
- Specific computing architectures
- Queue administrators
- Scheduling parameters
- Pre- and post-execution scripts
- …
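For illustration, such a per-application queue could be declared in lsb.queues roughly as follows; the queue name is taken from the CMS example, while the limits, host group, and script paths are assumed values:

    # lsb.queues -- illustrative per-application queue
    Begin Queue
    QUEUE_NAME     = cms
    PRIORITY       = 40                                # scheduling parameter (assumed)
    RUNLIMIT       = 2880                              # run-time limit in minutes (assumed)
    HOSTS          = cms_hosts                         # dedicated host group (assumed)
    USERS          = cms_all                           # user group allowed to submit
    ADMINISTRATORS = cmsadmin                          # queue administrator (assumed account)
    PRE_EXEC       = /usr/local/bin/cms_pre_exec.sh    # hypothetical environment setup script
    POST_EXEC      = /usr/local/bin/cms_post_exec.sh   # hypothetical cleanup script
    DESCRIPTION    = "Example queue dedicated to CMS jobs"
    End Queue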

How can GRID match the right service class?
Grid roles map to LSF QoS levels:
- Role cms → QoS: low priority
- Role cmsprd → QoS: medium priority
- Role cmssgm → QoS: high priority

Matching the service class: statically
The grid role is mapped statically onto the corresponding local group (and hence QoS level) in the lcmaps configuration file; entries whose local name starts with "." map to pool accounts in that group:

    "/VO=cms/GROUP=/cms/ROLE=lcgadmin"    cmssgm
    "/VO=cms/GROUP=/cms/ROLE=production"  .cmsprd
    "/VO=cms/GROUP=/cms/HeavyIons/"       .cms

Matching the service class: dynamically
The same role-to-QoS mapping (cms → low priority, cmsprd → medium priority, cmssgm → high priority) can also be driven dynamically, with GPBox mediating between the GRID layer and the LSF QoS levels.