Experience on HTCondor batch system for HEP and other research fields at KISTI-GSDC
Sang Un Ahn, Sangwook Bae, Amol Jaikar, Jin Kim, Byungyun Kong, Ilyeon Yeo
13 October 2016 @ CHEP2016, San Francisco
Introduction
- GSDC job profiling for 2015 showed that resource utilization of global services ranged from 65% (B) to 90% (A), while that of local services was below 25% (D), with a worst case of 0.7% (C)
- Consistent job throughput from the Grid used resources effectively (A, B)
- Chaotic job activity by local users led to low resource utilization (C, D)
- A: ALICE, B: Belle II, C: CMS, D: LIGO
- S. U. Ahn, J. Kim, "Profiling Job Activities of Batch Systems in the Data Center", PlatCon 2016
Goal
- Improve resource utilization by allocating idle resources where they are demanded (dynamic resource management)
- Constraints:
  - Resources are dedicated to each experiment based on the MoU, and the physical allocation is audited regularly
  - Users prefer dedicated resources, since they secure capacity when the system is crowded, especially before conferences
  - The batch systems differ: Torque for EMI (ALICE and Belle II), HTCondor for OSG (CMS) and LDG (LIGO)
  - Queues cannot be shared between Torque and HTCondor
Goal #2
- Use one batch system for all: HTCondor
  - We already have operational experience with it
  - It is actively developed and widely used
- Issues with Torque
  - Instability of the Maui scheduler, which occasionally has to be restarted
    - Workaround: a cron job checks the health of the maui service and restarts it if required (see the sketch below)
  - Any change in the "nodes" file requires Torque to be restarted
    - Not practical for a large pool (more than a thousand nodes)
- Constraint: HTCondor is incompatible with CREAM-CE
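As an illustration, the cron-based workaround could look something like the sketch below. This is not the script actually used at GSDC; the health check via Maui's showq command and the service name are assumptions.

    # /etc/cron.d/check_maui: run the health check every 5 minutes (sketch)
    */5 * * * * root /usr/local/sbin/check_maui.sh

    # /usr/local/sbin/check_maui.sh (sketch)
    #!/bin/bash
    # showq hangs or returns non-zero when the Maui scheduler is unhealthy;
    # in that case restart the maui service.
    if ! timeout 30 showq > /dev/null 2>&1; then
        service maui restart
    fi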
Procedure
- Step 1: Set up an HTCondor pool for all local farms
  - In addition to CMS and LIGO, we support a few more experiments locally: Genome (HTCondor), RENO and TEM (Torque)
- Step 2: Set up a bigger pool that also includes the Grid farms
  - Either replace CREAM-CE with HTCondor-CE
  - Or modify the part of CREAM-CE that communicates with the batch system
Obstacles
- Step 1: We have some experience with HTCondor, but resource dedication did not require a complicated scheduling policy, i.e. we have little knowledge of HTCondor configuration
- Step 2: CREAM-CE is not easily replaced by HTCondor-CE, since EMI middleware does not support HTCondor-CE
  - Modifying CREAM-CE would require additional manpower
Requirements
- Dynamic resource allocation within a single HTCondor pool
  - The resources written in the MoU must be guaranteed when users demand them
- Separate user interfaces (UIs)
  - Sharing a UI among different user groups can cause problems:
    - Compilation on the UI before job submission may badly affect the overall performance of the machine
    - Exposing mount points for experiment data or scratch space is a potential security weakness, even though access by other groups is not allowed
- Remote submission (see the sketch below)
  - Independent schedd machines managing shared queues
- High availability
  - Schedd
  - Central manager: collector, negotiator
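As a sketch of what remote submission looks like from a UI: the schedd name had_schedd@ comes from the HA configuration shown later, and the submit file name "job" is arbitrary.

    # submit from a UI to the shared highly-available schedd, spooling input files
    [alice_user1@ui1 ~]$ condor_submit -remote had_schedd@ job

    # query the shared queue from the UI
    [alice_user1@ui1 ~]$ condor_q -name had_schedd@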
Test-bed Setup
- UIs (ui1.example.com, ui2.example.com): users run condor_submit and condor_q here
- Schedds: sched1.example.com, sched2.example.com
- Central managers (matchmaking, machine info, condor_status): cm1.example.com, cm2.example.com
- Worker nodes (startd): wn1.example.com … wn9.example.com, 72 slots in total
- Common configuration:
  UID_DOMAIN = example.com
  CENTRAL_MANAGER1 = cm1.example.com
  CENTRAL_MANAGER2 = cm2.example.com
  CONDOR_HOST = $(CENTRAL_MANAGER1), $(CENTRAL_MANAGER2)
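In addition to the common configuration above, each role gets its own DAEMON_LIST. A minimal sketch is shown below; 8 slots per worker node is an assumption (72 slots over 9 nodes), and the central-manager daemon list appears on the next slide.

    # submit machines (sched1, sched2)
    DAEMON_LIST = MASTER, SCHEDD

    # worker nodes (wn1 ... wn9): 8 static slots each, 72 slots in total
    DAEMON_LIST = MASTER, STARTD
    NUM_SLOTS   = 8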
HA Daemons
- Highly available schedd (sched1, sched2):
  SCHEDD_NAME = had_schedd@
  MASTER_HA_LIST = SCHEDD
  HA_LOCK_URL = file:/var/lib/condor/spool   (NFS exported)
  VALID_SPOOL_FILES = $(VALID_SPOOL_FILES) SCHEDD.lock
- Highly available central manager (cm1, cm2):
  DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, HAD, REPLICATION
  STATE_FILE = $(SPOOL)/Accountantnew.log
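The HAD and REPLICATION daemons on the two central managers also need to know about each other. A sketch following the HA example in the HTCondor manual is shown below; the port numbers are the manual's example values, not necessarily those used on the test-bed.

    # on cm1 and cm2: addresses of the HAD and REPLICATION daemons of both central managers
    HAD_PORT            = 51450
    HAD_LIST            = $(CENTRAL_MANAGER1):$(HAD_PORT), $(CENTRAL_MANAGER2):$(HAD_PORT)
    REPLICATION_PORT    = 41450
    REPLICATION_LIST    = $(CENTRAL_MANAGER1):$(REPLICATION_PORT), $(CENTRAL_MANAGER2):$(REPLICATION_PORT)
    HAD_USE_REPLICATION = True
    # the HAD decides on which machine the negotiator runs
    MASTER_NEGOTIATOR_CONTROLLER = HAD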
Accounting Group
- Define accounting groups to share resources among several user groups (negotiator configuration):
  GROUP_NAMES = group_alice, group_cms, group_ligo, group_reno, …
- Place a quota on each group and allow groups to exceed their quota when surplus is available:
  GROUP_QUOTA_group_alice, GROUP_QUOTA_group_cms, …
  GROUP_ACCEPT_SURPLUS = True
- Enable preemption to guarantee the quotas:
  NEGOTIATOR_CONSIDER_PREEMPTION = True
  PREEMPTION_REQUIREMENTS = $(PREEMPTION_REQUIREMENTS) && (((SubmitterGroupResourcesInUse < SubmitterGroupQuota) && (RemoteGroupResourcesInUse > RemoteGroupQuota)) || (SubmitterGroup =?= RemoteGroup))
- Ref: http://erikerlandson.github.io/blog/2012/06/27/maintaining-accounting-group-quotas-with-preemption-policy/
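To spell out what the preemption expression does, a commented version of the same configuration is sketched below; line continuations are added for readability, the logic is unchanged from the fragment above.

    NEGOTIATOR_CONSIDER_PREEMPTION = True

    # A running job may be preempted only if
    #  - the submitting group is still below its quota, AND
    #  - the group owning the running job is above its quota,
    # or if both jobs belong to the same group (ordinary in-group preemption).
    PREEMPTION_REQUIREMENTS = $(PREEMPTION_REQUIREMENTS) && \
        ( ( (SubmitterGroupResourcesInUse < SubmitterGroupQuota) && \
            (RemoteGroupResourcesInUse   > RemoteGroupQuota) ) \
          || (SubmitterGroup =?= RemoteGroup) )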
DEMO
- Group quotas used for the demo (negotiator configuration):
  GROUP_QUOTA_group_alice = 40
  GROUP_QUOTA_group_cms = 12
  GROUP_QUOTA_group_ligo = 8
  GROUP_QUOTA_group_belle = 4
  GROUP_QUOTA_group_reno = 4
  GROUP_QUOTA_group_genome = 4
- Each job description file sets its accounting group:
  accounting_group = group_<exp>
  accounting_group_user = <exp>_user1
- With no activity from group_alice, everybody freely shares the resources as much as they want
- A group_alice user logs in and checks the status; only 12 slots are available:
  [alice_user1@sched1 ~]$ condor_status -format "%s" AccountingGroup -format " | %s" State -format " | %s\n" Activity -constraint 'True' | sort | uniq -c | awk '{print $0; t += $1 } END { printf("%7d total\n",t)}'
       10 group_belle.belle_user1@example.com | Claimed | Busy
       10 group_cms.cms_user1@example.com | Claimed | Busy
       10 group_genome.genome_user1@example.com | Claimed | Busy
       20 group_ligo.ligo_user1@example.com | Claimed | Busy
       10 group_reno.reno_user1@example.com | Claimed | Busy
       12  | Unclaimed | Idle
       72 total
- Regardless of the number of slots available, alice_user1 claims 40 slots, i.e. the group_alice quota:
  [alice_user1@sched1 ~]$ condor_submit job
  Submitting job(s)........................................
  40 job(s) submitted to cluster 32.
       40 group_alice.alice_user1@example.com | Claimed | Busy
        4 group_belle.belle_user1@example.com | Claimed | Busy
       10 group_cms.cms_user1@example.com | Claimed | Busy
        3 group_genome.genome_user1@example.com | Claimed | Busy
       11 group_ligo.ligo_user1@example.com | Claimed | Busy
        4 group_reno.reno_user1@example.com | Claimed | Busy
       72 total
- Notes: the LIGO user keeps more than its quota by taking unused CMS slots; preempted jobs go back to the idle state
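A sketch of the submit description file "job" used in the demo is shown below. Only the accounting lines and the number of queued jobs come from the slide; the universe, executable and log file are assumptions.

    # job: submit description file (sketch)
    universe              = vanilla
    executable            = /bin/sleep
    arguments             = 600
    accounting_group      = group_alice
    accounting_group_user = alice_user1
    log                   = job.log
    queue 40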
Remarks
- A delicate allocation policy is required when preemption is enforced
  - Treatment of preempted jobs: kill or suspend?
  - Checkpointing would help preempted jobs resume elsewhere
    - Checkpointing is known to work only with the Standard Universe
    - But there is a way to make it work with the Vanilla Universe: http://www.ucs.cam.ac.uk/scientific/camgrid/technical/blcr
- Fairshare based on user priority can interfere with preemption so that it no longer triggers as intended
  - Setting PRIORITY_HALFLIFE high enough makes user priority (effectively) constant (see the sketch below)
- By default, remote job submission requires stronger security
  - Password and FS (or FS_REMOTE) authentication do not work
  - GSI or Kerberos methods have to be set up
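A sketch of the two configuration knobs mentioned above; the concrete values are illustrative, not the test-bed's actual settings.

    # negotiator: make user priorities decay so slowly that they are effectively
    # constant, so group quotas rather than fairshare drive matchmaking
    PRIORITY_HALFLIFE = 31536000        # one year, in seconds (default is 86400)

    # all machines: allow strong authentication methods for remote submission;
    # FS keeps working for local submission on the schedd machines
    SEC_DEFAULT_AUTHENTICATION_METHODS = GSI, KERBEROS, FS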
To do
- Fine-tune the negotiator configuration before deploying the test-bed setup at production level
  - This must come with a well-defined resource policy
- Set up and test a dedicated scheduler for Parallel Universe jobs, so that a single job can run on two or more physical machines at the same time (see the sketch below)
  - About 30% of local LIGO user jobs run with MPI
- Study HTCondor-CE for the Grid
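A minimal sketch of the planned dedicated-scheduler setup, following the recipe in the HTCondor manual. Choosing sched1.example.com as the dedicated scheduler and the wrapper script name are assumptions.

    # on the worker nodes: advertise which schedd may schedule parallel jobs here
    DedicatedScheduler = "DedicatedScheduler@sched1.example.com"
    STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler

    # submit description file for an MPI-style job spanning two machines (sketch)
    universe      = parallel
    executable    = mpi_wrapper.sh
    machine_count = 2
    queue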
Conclusions
- We set up a test-bed with HTCondor to achieve the following:
  - Dynamic resource allocation
  - One batch system for all
- A simple and quick setup showed that we can do what we want
- This is just one step into the HTCondor world; there is a lot left to study
References
- HTCondor Manual v8.4.x: http://research.cs.wisc.edu/htcondor/manual/v8.4/
- HTCondor: How To Admin Recipes: https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToAdminRecipes
- Maintaining Accounting Group Quotas with Preemption Policy: http://erikerlandson.github.io/blog/2012/06/27/maintaining-accounting-group-quotas-with-preemption-policy/
- Checkpointing Vanilla Jobs with BLCR: http://www.ucs.cam.ac.uk/scientific/camgrid/technical/blcr