Job-Property-Based Scheduling at Sites

This is about the way sites present their local resources at the Grid Interface level.

- So far, sites have maintained a 1:1 correspondence between PanDA queues, Globus RSL queues (and other parameters), and internal site batch queues.
- With an increasing number of functionally specialized PanDA queues (e.g. MCORE, LMEM, SHORT), this may no longer be desirable.
  - It directly affects how pilot submission is handled: APF sits in the middle, between the grid view of the site and the state of the WMS queues.
- We could improve by eliminating grid-level queues and always submitting jobs with their requirements expressed as standard attributes (e.g. cpucount, maxwalltime, maxmemory), as sketched in the example below.
  - E.g. this would allow us to get rid of "PROD" vs. "ANALY".
  - If the job properties are roughly the same, we could let PanDA broker both types to the same PanDA pilot. PanDA already does this with different sub-job types (e.g. evgen vs. reco vs. simul), using priorities and desired percentages in AGIS.
- If we don't simplify, PanDA queue and pilot factory entities will become more complicated and numerous: with R distinct job requirement profiles and T types of work, we end up with R × T PanDA and factory queues (e.g. 4 profiles × 3 work types = 12 queues).
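A minimal sketch of what attribute-based submission could look like. The attribute names (cpucount, maxwalltime, maxmemory) come from the slide; the mapping onto HTCondor submit keywords, the wall-time units, and the pilot_wrapper.sh executable are illustrative assumptions, not a fixed PanDA/APF interface.

```python
# Sketch: build one pilot submit description directly from a job's
# requirement attributes, instead of choosing among many pre-defined
# grid-level queues. Submit keywords and units are assumptions.

def submit_description(job: dict) -> str:
    """Render an HTCondor-style submit description from job properties."""
    lines = [
        "universe         = grid",
        f"request_cpus     = {job['cpucount']}",
        f"request_memory   = {job['maxmemory']}",           # MB (assumed)
        f"+MaxWallTimeMins = {job['maxwalltime'] // 60}",   # attr in seconds (assumed)
        "executable       = pilot_wrapper.sh",              # hypothetical pilot wrapper
        "queue",
    ]
    return "\n".join(lines)

# One submission path serves what used to be separate PROD/ANALY/MCORE/... queues:
print(submit_description({"cpucount": 8, "maxwalltime": 172800, "maxmemory": 16000}))
```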

Conclusion (1/2)

- The diagram assumes a static number of bins, represented by PanDA queues.
  - This saves needing PROD or ANALY queues (or any queues at all) at the Grid Interface to the sites.
  - Work can be submitted to the site's default queue.
- APF/PanDA interactions (see the sketch below):
  - APF must be able to query PanDA for ready/activated jobs within an equivalence class, so that each pilot is guaranteed to have a job dispatched to it.
  - Any system where equivalence classes can change, are inconsistent, or overlap is unworkable.
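A sketch of the equivalence-class idea, assuming hypothetical job field names: activated jobs are grouped by their requirement tuple, and if the classes are static and non-overlapping, APF can submit exactly one pilot per queued job in each class, so every pilot finds a matching job.

```python
from collections import defaultdict

# Sketch: group PanDA's activated jobs into equivalence classes keyed
# by their resource requirements. Field names are illustrative.

def equivalence_classes(activated_jobs):
    classes = defaultdict(list)
    for job in activated_jobs:
        key = (job["cpucount"], job["maxwalltime"], job["maxmemory"])
        classes[key].append(job)
    return classes

jobs = [
    {"id": 1, "cpucount": 1, "maxwalltime": 86400,  "maxmemory": 2000},
    {"id": 2, "cpucount": 8, "maxwalltime": 172800, "maxmemory": 16000},
    {"id": 3, "cpucount": 1, "maxwalltime": 86400,  "maxmemory": 2000},
]

# One pilot per job in each class: every pilot is guaranteed a matching job.
for key, members in equivalence_classes(jobs).items():
    print(f"class {key}: submit {len(members)} pilot(s)")
```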

Conclusion (2/2)

- PROD vs. ANALY:
  - The goal is to dispense with the PROD vs. ANALY concept.
  - ANALY vs. PROD is currently used as a proxy for I/O intensity; better would be to let scout jobs measure walltime and cputime and derive an I/O intensity factor from them (see the first sketch below).
- Site processing:
  - The proposed concept works with popular batch systems.
  - E.g. the HTCondor-CE job router provides hooks for scripts to transform an incoming ClassAd into a local batch ClassAd (see the second sketch below).
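The slide only says that scouts would measure walltime and cputime, so the exact metric is an assumption; one plausible definition takes the fraction of wall-clock time not accounted for by CPU time (per core) as a proxy for I/O intensity.

```python
# Sketch of one plausible I/O-intensity metric from scout-job
# measurements; this exact formula is an assumption, not the slide's.

def io_intensity(walltime_s: float, cputime_s: float, cores: int = 1) -> float:
    """Fraction of walltime not spent on CPU, clamped to [0, 1]."""
    cpu_efficiency = cputime_s / (walltime_s * cores)
    return min(1.0, max(0.0, 1.0 - cpu_efficiency))

# An analysis-like job: 10 h walltime but only 4 h of CPU -> I/O-heavy.
print(io_intensity(36000, 14400))   # 0.6
```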
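A toy stand-in for the kind of transform script such a hook could run, using the classad module that ships with the HTCondor Python bindings. The attribute names, the rewrites, and the stdin/stdout protocol shown here are illustrative assumptions; the exact hook contract is defined by the HTCondor job router documentation and the site's configuration.

```python
import sys
import classad  # ships with the HTCondor Python bindings

# Toy sketch: read an incoming grid job's ClassAd on stdin, rewrite a
# few attributes for the local batch system, and print the routed ad.
# Attribute names and the I/O protocol are illustrative assumptions.

def route(ad: classad.ClassAd) -> classad.ClassAd:
    if "RequestMemory" in ad:
        # Add a local safety margin (assumed site convention).
        ad["RequestMemory"] = int(ad.eval("RequestMemory") * 1.1)
    ad["Requirements"] = classad.ExprTree('OpSys == "LINUX"')
    ad["AccountingGroup"] = "group_atlas.prod"  # hypothetical local group
    return ad

if __name__ == "__main__":
    incoming = classad.parseOne(sys.stdin.read())
    print(route(incoming))
```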