Migration to 7.4, Group Quotas, and More William Strecker-Kellogg Brookhaven National Lab.

RHIC/ATLAS Computing Facility Overview
- Physics Dept. at Brookhaven National Lab: provides computing and storage to the active RHIC experiments
- Serves as a Tier-1 for ATLAS computing
- Uses Condor to manage the RHIC and ATLAS compute clusters
- 14.4k cores running SL5.3 currently
- With new hyper-threaded 12-core Westmere systems, will grow to over 20k cores

RHIC/ATLAS Computing Facility Overview
- One instance for ATLAS: 5100 slots
  - 2 submit nodes manage all production/analysis jobs
  - Other, smaller queues managed on 3 other submit nodes
- One instance each for the STAR and PHENIX experiments: 4300 and 4500 slots respectively, 20 submit nodes each
- "General Queue": flocking between the RHIC pools
- Smaller experiments grouped into another instance

New Since Last Year
- New Condor administrator
- Migration to 7.4, up from 6.8.9 (long overdue)
- Move ATLAS to group quotas
  - Easier configuration: from 16 configuration files down to 5 (90% of slots use just 1)
  - Management via web interface
- Some problems we've had to solve... more later

Upgrade to 7.4
- Get rid of the suspension model
  - Undesirable to have slots =!= real cores
  - Simplifies the START expression
- Better negotiator performance (results later)
- Bugfixes all around
[Slide shows before/after configuration snippets]

Group Quotas
- ATLAS only for now; PHENIX to follow suit in a few months; no plans for STAR
- What groups buy us:
  - Manage ATLAS production/analysis jobs separately from the many other smaller queues
  - Unified configuration files: one config for the vast majority of ATLAS nodes
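As a rough illustration of the mechanism (group names and quota numbers here are hypothetical, not the facility's actual values), a static group-quota setup of this kind lives in the negotiator's Condor configuration:

```
# Negotiator configuration sketch -- hypothetical groups and quotas
GROUP_NAMES = group_atlas.prod, group_atlas.analysis
GROUP_QUOTA_group_atlas.prod     = 3000
GROUP_QUOTA_group_atlas.analysis = 2000
```

Jobs then opt in to a group by carrying a matching AccountingGroup attribute; slots no longer need to be hard-partitioned per queue.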

ATLAS Group Quotas
- Reallocation of resources between queues is managed via a web interface
[Chart: "A Day in the Life of ATLAS"]

Issues Moving to Group Quotas
- Backwards compatibility with our accounting and monitoring systems
  - Solution: retain the previous job-type flag that used to hard-partition the slots
- How does it interact with flocking?
- Fairness / enforcement of group membership
  - "We rely on societal enforcement" is not good enough
  - Solution for ATLAS: ATLAS uses PanDA, and we control local submission; the other users are few enough to monitor individually

Issues Moving to Group Quotas
- PHENIX: two classes of jobs, user and special
  - Special jobs are submitted from a few machines by separate users; user jobs come from 20 submit nodes
- Two solutions:
  1. Submit-node-based partition: regex-match the GlobalJobId ClassAd against a list of valid sources in the START expression
  2. Coexist with users: three configs; user nodes with no AccountingGroup flag, shared nodes that run anything but are ranked last by both user and special jobs, and special nodes requiring the AG flag
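A sketch of the first option, with hypothetical submit-host names: a job's GlobalJobId begins with the name of the schedd that owns it, so a startd START expression can regex-match it against the allowed sources:

```
# startd config sketch: only accept jobs submitted from sub01/sub02
# (hypothetical hosts, not the facility's actual submit nodes)
START = $(START) && regexp("^(sub01|sub02)\.phenix\.example\.gov#", TARGET.GlobalJobId)
```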

Group Priorities
- ATLAS setup: three main groups
  1. Production: highest priority, hard quota, no preemption
  2. Short analysis: medium priority, auto-regroup on, preemption enabled
  3. Long analysis: lowest priority, auto-regroup on, preemption enabled
- Idea: short and long analysis spill over into each other as needed, without being squeezed out by production
- Problem: sometimes short and long will "eat into" production even when they are over quota and production is under its quota
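A hedged sketch of how such a three-group policy might be written; the quota numbers, priority factors, and the exact preemption expression are illustrative assumptions, not the facility's actual configuration:

```
GROUP_NAMES = group_atlas.prod, group_atlas.short, group_atlas.long
GROUP_QUOTA_group_atlas.prod  = 3000   # hard quota
GROUP_QUOTA_group_atlas.short = 800
GROUP_QUOTA_group_atlas.long  = 800

GROUP_PRIO_FACTOR_group_atlas.prod  = 1.0    # best priority
GROUP_PRIO_FACTOR_group_atlas.short = 10.0
GROUP_PRIO_FACTOR_group_atlas.long  = 100.0  # worst priority

GROUP_AUTOREGROUP = TRUE  # let short/long spill past their quotas

# Never preempt running production jobs (sketch; RemoteGroup is assumed
# to be the machine-ad attribute holding the running job's group):
PREEMPTION_REQUIREMENTS = (RemoteGroup =!= "group_atlas.prod")
```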

ATLAS Priority Inversion
- Group priority affects only the order of negotiation
- When an analysis queue starts up after a quiet period, production starts to lose out: even though production is below its quota, it loses slots to analysis jobs because they get negotiated for first
- Negotiation should stop for a queue that is over quota (with auto-regroup on) while other queues with waiting jobs are below their quotas
[Chart annotation: problem area]

ATLAS Priority Inversion
- Solution? Increase the spread of priority factors as more slots get added to production
- The required spread scales with the size of the largest queue, and if another queue quiesces for long enough it will still outrank production
- E.g. production grows from 3k to 4k slots: its usage increases by 33%, making its priority that much worse and an inversion that much more likely to occur...
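The scaling problem can be seen in a toy model of Condor's fair-share ordering. Real Condor multiplies an exponentially decayed usage by the group's priority factor and negotiates the lowest effective priority first; this sketch (all numbers hypothetical) keeps just those two ingredients:

```python
def effective_priority(recent_usage_slots: float, priority_factor: float) -> float:
    """Toy model: Condor negotiates for the best (lowest) effective
    priority first. Real Condor decays usage over time; we use raw slots."""
    return recent_usage_slots * priority_factor

# Production holds 3000 slots at factor 1; an analysis queue ramping up
# after a quiet period has ~35 slots of recent usage at factor 100.
assert effective_priority(3000, 1.0) < effective_priority(35, 100.0)  # prod first

# Grow production to 4000 slots: the same factor spread no longer
# protects it, so analysis is negotiated first -- the inversion.
assert effective_priority(4000, 1.0) > effective_priority(35, 100.0)
```

This is why the required factor spread scales with the size of the largest queue.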

Negotiator Performance
- Performance from one day of the STAR negotiator
- The cycle occasionally blocks waiting for a schedd; many "successes" land near the default 30-second timeout
- When many submitters are spread across many machines each, much time is wasted

Issues with Scheduler/Negotiator
- A user frequently polling a large queue: the schedd would fork a child that used 5-10 s of CPU time to answer each query (1.6 GB image size!)
- Auto-clustering sometimes doesn't skip similar jobs
- Globally-scoped ClassAds would be nice, e.g. to advertise the usage of a shared scratch NFS filesystem

Puppet-ize Configuration Files
- New Puppet-based centralized configuration-management system for general-purpose servers
- Will templatize the Condor configuration
- Configuration is done with a Ruby-based, object-oriented templating language
- Suitably scary at first... but worth the effort

Motivation to use Puppet
- Configuration is similar in structure between experiments:
  - Memory limits for regular and flocked jobs
  - Preemption/retirement time on a per-job-type basis
  - Policy expressions (RANK/START/etc.)
  - List of currently blocked users
- A recent blocking/unblocking of users took editing 6 different files and a reconfig everywhere
- Puppet separates each logical entity, making per-entity changes easy, and automates pushing changes and reconfiguring
- All changes are versioned in git: accountability and reliability
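For example, the blocked-user list could live in one place and be expanded into every policy file by a Puppet ERB template. This is a sketch with hypothetical variable names, not the facility's actual manifests:

```
# condor_policy.erb -- @blocked_users is a Puppet array, e.g. ["alice", "bob"]
# Refuse to start jobs owned by any blocked user:
START = $(START) && (stringListMember(Owner, "<%= @blocked_users.join(',') %>") =!= TRUE)
```

One edit to the Puppet variable then regenerates all six files and triggers the reconfig.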

Questions? Comments?

CRS Job System
- Written in Python; submits to Condor
- Asynchronous I/O for staging data from tape
- Stage-in and stage-out are done outside of Condor
  - Previously done with extra slots: not good, aesthetically and otherwise
- Can combine stage requests for jobs intelligently
- Abstraction layer for I/O, similar to plugins
- Own basic workflow tools; DAGs were not suitable
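The "combine stage requests" idea can be sketched in Python (hypothetical data structures, not the actual CRS code): if several queued jobs need files on the same tape volume, one mount can serve all of them:

```python
from collections import defaultdict

def combine_stage_requests(jobs):
    """Group requested input files by tape volume so each tape is
    mounted once and read sequentially, instead of once per job."""
    by_tape = defaultdict(set)
    for job in jobs:
        for path, tape in job["inputs"].items():
            by_tape[tape].add(path)
    return {tape: sorted(files) for tape, files in by_tape.items()}

jobs = [
    {"inputs": {"/raw/run1.dat": "VOL001", "/raw/run2.dat": "VOL002"}},
    {"inputs": {"/raw/run1.dat": "VOL001", "/raw/run3.dat": "VOL001"}},
]
plan = combine_stage_requests(jobs)
# VOL001 is staged once for three requests across two jobs
```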

ATLAS Shared Pool
- Allow smaller ATLAS queues and OSG grid jobs to run in their own queues, utilizing a small shared pool of resources
- Implemented with group quotas
- Jobs "compete" for extra slots made available by ATLAS
- Requires users to add the AccountingGroup (AG) flag themselves (the user base is small enough that this works)
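From a user's side, the opt-in is a line in the submit file tagging the job with the shared group. A sketch with a hypothetical group name:

```
# condor_submit file sketch -- group and user names are illustrative
universe         = vanilla
executable       = analyze.sh
+AccountingGroup = "group_atlas.shared.jdoe"
queue
```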