Accounting in HTCondor

Accounting in HTCondor Greg Thain INFN Workshop 2016

HTCondor Architecture

Overview of Condor Architecture [Diagram: Schedd A queues jobs for Greg (Job1 to Job3) and Ann (Job1 to Job3); Schedd B queues jobs for Greg (Job4 to Job6), Ann (Job7, Job8), and Joe (Job1 to Job3). The Central Manager keeps the Usage History. The pool has several worker nodes.]

2 Steps to Scheduling

Step 1: The CM assigns SLOTS to USERS based on historical fair share. It keeps track of usage persistently and informs the schedds of the slots for their users. This is called the negotiation cycle. [Diagram: the same pool as in the architecture overview, with Schedd A, Schedd B, the Central Manager with its Usage History, and the workers.]

Step 2: Schedds assign SLOTS to JOBS based on job prio and fit, and will REUSE slots for more than one job. There are many schedds in pools. [Diagram: the same pool as in the architecture overview, with Schedd A, Schedd B, the Central Manager with its Usage History, and the workers.]

Consequences: All accounting happens in the CM, both monitoring and control. Accounting data is persistent. Accounting is rolled up, not a log. The CM knows NOTHING about jobs!

What’s a user? Is Bob in schedd1 the same as Bob in schedd2? If they have the same UID_DOMAIN, they are. This prevents cheating by adding schedds. Map files can define the local user name.
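As a rough illustration of the point above, the sketch below keys usage on a "user@UID_DOMAIN" string, so that submissions from two schedds sharing a UID_DOMAIN land in one bucket. The function names, the record layout, and the example domains are hypothetical and only stand in for whatever the negotiator does internally.

```python
from collections import defaultdict


def accounting_name(local_user, uid_domain):
    """Build the identity used for accounting (assumed form: user@UID_DOMAIN)."""
    return f"{local_user}@{uid_domain}"


def total_usage(records):
    """Sum usage hours per accounting identity.

    records: iterable of (local_user, uid_domain, hours) tuples (hypothetical layout).
    """
    totals = defaultdict(float)
    for local_user, uid_domain, hours in records:
        totals[accounting_name(local_user, uid_domain)] += hours
    return dict(totals)


records = [
    ("bob", "example.org", 10.0),    # submitted from schedd1
    ("bob", "example.org", 5.0),     # submitted from schedd2, same UID_DOMAIN
    ("bob", "other.example", 2.0),   # different UID_DOMAIN: counted as a different user
]
usage = total_usage(records)
print(usage)
```

Adding a second schedd with the same UID_DOMAIN does not give Bob a fresh identity, which is the anti-cheating property the slide describes.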

condor_userprio: Command usage to view current priorities:

condor_userprio -most

                                Effective  Priority
User Name                        Priority    Factor In Use (wghted-hrs) Last Usage
------------------------------- --------- --------- ------ ------------ ----------
lmichael@submit-3.chtc.wisc.edu      5.00     10.00      0        16.37    0+23:46
blin@osghost.chtc.wisc.edu           7.71     10.00      0      5412.38    0+01:05
osgtest@osghost.chtc.wisc.edu       90.57     10.00     47     45505.99      <now>
cxiong36@submit-3.chtc.wisc.edu    500.00   1000.00      0         0.29    0+00:09
ojalvo@hep.wisc.edu                500.00   1000.00      0    398148.56    0+05:37
wjiang4@submit-3.chtc.wisc.edu     500.00   1000.00      0         0.22    0+21:25
cxiong36@submit.chtc.wisc.edu      500.00   1000.00      0        63.38    0+21:42

Tool to view/change user prio.

Metric: Effective Priority. The Negotiator computes and stores the user prio. It is inversely related to machines allocated (a lower number is a better priority). A user with a priority of 10 will be able to claim twice as many machines as a user with a priority of 20.
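The "twice as many machines" statement can be sketched as a share that is inversely proportional to effective priority. This is a simplified model for illustration only: the function name and user names are made up, and the real negotiator also accounts for job requirements, quotas, and slot availability.

```python
def fair_share(priorities, total_slots):
    """Split total_slots among users in inverse proportion to effective priority.

    priorities: dict mapping user name to effective priority (lower is better).
    Returns a dict mapping user name to a fractional slot share.
    """
    weights = {user: 1.0 / prio for user, prio in priorities.items()}
    total_weight = sum(weights.values())
    return {user: total_slots * w / total_weight for user, w in weights.items()}


shares = fair_share({"greg": 10.0, "ann": 20.0}, total_slots=30)
print(shares)
```

With 30 slots, the user at priority 10 ends up with roughly twice the share of the user at priority 20.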

Effective User Priority: (Effective) User Priority is determined by multiplying two components: Real Priority * Priority Factor.

Real Priority: Based on actual usage, starts at 0.5. Approaches the actual number of machines used over time. Configuration setting: PRIORITY_HALFLIFE. If PRIORITY_HALFLIFE = +Inf, no history. Default is one day (in seconds). Asymptotically grows/shrinks to current usage.
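A minimal sketch of how a real priority could drift toward current usage with a half-life, and how the effective priority follows from it. The exponential-smoothing form and the 0.5 floor are assumptions chosen to match the slide's description ("starts at .5", "asymptotically grows/shrinks to current usage"); the function names are hypothetical and this is not HTCondor's source code.

```python
def update_real_priority(previous, resources_in_use, elapsed_seconds,
                         halflife=86400.0):
    """Move the real priority toward current usage (assumed smoothing form).

    After one half-life, the gap between the old priority and the current
    usage is cut in half. The result is floored at 0.5 (assumption).
    """
    beta = 0.5 ** (elapsed_seconds / halflife)
    updated = beta * previous + (1.0 - beta) * resources_in_use
    return max(0.5, updated)


def effective_priority(real_priority, priority_factor=1000.0):
    """Effective priority = real priority * priority factor."""
    return real_priority * priority_factor


# A new user starts at 0.5 and then holds 8 slots for ten days.
prio = 0.5
for _day in range(10):
    prio = update_real_priority(prio, resources_in_use=8, elapsed_seconds=86400)

print(prio, effective_priority(prio))
```

An idle user at the starting real priority of 0.5 with a factor of 1000 has an effective priority of 500, which is consistent with the 500.00 rows in the condor_userprio sample output.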

Priority Factor: Assigned by the administrator. Set/viewed with condor_userprio. Persistently stored in the CM. Defaults to 1000 (DEFAULT_PRIO_FACTOR). Allows admins to give prio to sets of users, while still having fair share within a group. “Nice users” have Prio Factors of 1,000,000.

Condor principle #2: Condor provides access to operational info. You need to make it pretty…

condor_userprio -l: For machine-parseable formats. Parse it yourself, and create graphs and reports.

condor_userprio -l

Name = "gthain@chevre.cs.wisc.edu"
ResourcesUsed = 11
WeightedResourcesUsed = 11.0
LastHeardFrom = 1477379405
LastUpdate = 1477379405
Priority = 500.0799865722656
WeightedAccumulatedUsage = 3878596.0
UpdateSequenceNumber = 0
PriorityFactor = 1000.0
MyType = "Accounting"
IsAccountingGroup = false
AccumulatedUsage = 3861736.0
BeginUsageTime = 1469125737
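One way to "parse yourself" is sketched below: a small reader for "Attribute = value" lines such as the sample above. It assumes one attribute per line and a blank line between records, and it only handles quoted strings, booleans, and plain numbers, not full ClassAd expressions. The function names are hypothetical.

```python
def parse_value(raw):
    """Convert a raw attribute value to str, bool, int, or float."""
    if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
        return raw[1:-1]
    if raw in ("true", "false"):
        return raw == "true"
    try:
        return int(raw)
    except ValueError:
        pass
    try:
        return float(raw)
    except ValueError:
        return raw  # leave anything unrecognized as text


def parse_ads(text):
    """Parse 'Attr = value' lines into a list of dicts, one per record.

    Assumes records are separated by blank lines.
    """
    ads, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if current:
                ads.append(current)
                current = {}
            continue
        key, sep, raw = line.partition(" = ")
        if sep:
            current[key] = parse_value(raw)
    if current:
        ads.append(current)
    return ads


sample = """\
Name = "gthain@chevre.cs.wisc.edu"
ResourcesUsed = 11
PriorityFactor = 1000.0
IsAccountingGroup = false
AccumulatedUsage = 3861736.0
"""
ads = parse_ads(sample)
print(ads[0]["Name"], ads[0]["AccumulatedUsage"])
```

In practice the `sample` string would be replaced by the captured output of the command, and the resulting dicts fed to whatever graphing or reporting tool is in use.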

Overview of Condor Architecture: Schedds also keep logs, rolled, not snapshots. They have per-job data. Gratia uses these. [Diagram: the same pool as in the first architecture overview, with Schedd A, Schedd B, the Central Manager with its Usage History, and the workers.]

condor_history

ID       OWNER   SUBMITTED    RUN_TIME    ST  COMPLETED    CMD
10973.8  gthain  10/25 02:09  0+00:00:51  C   10/25 02:10  /scratch/gthai
10973.7  gthain  10/25 02:09  0+00:00:50  C   10/25 02:10  /scratch/gthai
10973.9  gthain  10/25 02:09  0+00:00:50  C   10/25 02:10  /scratch/gthai
10973.6  gthain  10/25 02:09  0+00:00:49  C   10/25 02:10  /scratch/gthai
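Because the schedd history is per-job, it can be rolled up into per-user totals. The sketch below sums RUN_TIME by OWNER from rows shaped like the sample above. It assumes whitespace-separated columns in exactly that order and a days+HH:MM:SS run time; the function names are hypothetical.

```python
def runtime_seconds(run_time):
    """Convert a 'days+HH:MM:SS' run time string to seconds."""
    days, _, clock = run_time.partition("+")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return int(days) * 86400 + hours * 3600 + minutes * 60 + seconds


def runtime_by_owner(rows):
    """Sum run time in seconds per owner.

    Each row is one data line of the listing; OWNER is assumed to be
    field 1 and RUN_TIME field 4 after splitting on whitespace.
    """
    totals = {}
    for row in rows:
        fields = row.split()
        owner, run_time = fields[1], fields[4]
        totals[owner] = totals.get(owner, 0) + runtime_seconds(run_time)
    return totals


rows = [
    "10973.8 gthain 10/25 02:09 0+00:00:51 C 10/25 02:10 /scratch/gthai",
    "10973.7 gthain 10/25 02:09 0+00:00:50 C 10/25 02:10 /scratch/gthai",
    "10973.9 gthain 10/25 02:09 0+00:00:50 C 10/25 02:10 /scratch/gthai",
    "10973.6 gthain 10/25 02:09 0+00:00:49 C 10/25 02:10 /scratch/gthai",
]
totals = runtime_by_owner(rows)
print(totals)
```

Splitting on fixed positions is fragile if the column layout differs, so a real report would more likely request specific attributes in a structured output format.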

Worker nodes also have history: condor_history -f startd_history

Summary: HTCondor uses two steps for scheduling. All accounting is handled in the central manager. Monitoring data is also available in the schedd and startd.