The Scheduling Strategy and Experience of IHEP HTCondor Cluster


The Scheduling Strategy and Experience of IHEP HTCondor Cluster
Jingyan Shi (shijy@ihep.ac.cn)
On behalf of the scheduling group, Computing Center, IHEP

Outline
1. Migration to HTCondor
2. Scheduling Policy of the HTCondor Cluster
3. Works Designed and Developed
4. Problems We Met
5. Summary and Future Work

The Migration to HTCondor
Motivation
  - PBS had been used at IHEP for more than 10 years
  - Limited scalability in the face of growing resources and users
  - 10,000+ job slots and 20,000+ jobs: performance bottleneck
  - Migration to HTCondor: better performance and an active community
Migration step by step, with risk control
  - Jan 2015: ~1,100 CPU cores
  - May 2016: ~3,500 CPU cores
  - Dec 2016: ~11,000 CPU cores

Current Status
Architecture
  - 28 submitting nodes
  - 2 scheduler machines (local cluster, virtual cluster)
  - 2 central managers (local cluster, virtual cluster)
  - ~10,000 physical CPU cores plus an elastic number of virtual slots
Jobs
  - On average 100,000 jobs/day; 60,000 jobs in the queue at peak time
  - Serial and single-core jobs

Outline
1. Migration to HTCondor
2. Scheduling Policy of the HTCondor Cluster
3. Works Designed and Developed
4. Problems We Met
5. Summary and Future Work

Resource Division in the PBS Cluster
- Several HEP experiments supported: BES, Daya Bay, JUNO, LHAASO, HXMT, etc.
- Resources are funded by and dedicated to individual experiments
  - No resource sharing among experiments
  - 55 job queues with group permission limits set in PBS
- Low resource utilization: busy queues coexisted with free resources
(Chart: slot usage of a busy group vs. an idle group)

Scheduling Strategy in the HTCondor Cluster
Resource sharing
  - Break the resource separation
  - Busy groups can occupy resources from idle groups
Fairness guarantee
  - Peak computing demands from different experiments usually occur in different time periods
  - Jobs from idle groups get higher priority
  - The more resources an experiment contributes to the shared pool, the more of its jobs can be scheduled to run

Resource Sharing in HTCondor
- Based on job slots (mainly CPU cores)
- As a first step, resources are only partially shared
- Exclusive resources are kept by the experiments that own them
  - They only run jobs from the resource owner
- Shared resource pool
  - Resources contributed by all experiments
  - Slots accept jobs from all experiments
- At least 20% of each experiment's slots are shared, to encourage experiments to share more resources
(Chart: the exclusive and shared slots of different groups)
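A minimal sketch of how the exclusive/shared slot split could be tallied with the HTCondor Python bindings. The Experiment and IsShared slot attributes are illustrative names, not necessarily the ClassAd attributes used at IHEP:

    import collections
    import htcondor

    # Query all startd (slot) ads from the central manager.
    coll = htcondor.Collector()
    slots = coll.query(
        htcondor.AdTypes.Startd,
        projection=["Name", "Experiment", "IsShared"],
    )

    # Count exclusive vs. shared slots per experiment.
    counts = collections.Counter()
    for ad in slots:
        experiment = ad.get("Experiment", "unknown")
        kind = "shared" if ad.get("IsShared", False) else "exclusive"
        counts[(experiment, kind)] += 1

    for (experiment, kind), n in sorted(counts.items()):
        print(f"{experiment:10s} {kind:9s} {n}")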

Fairness and Priority
Scheduling preference
  - Jobs prefer to run on the exclusive slots of their own experiment
  - The shared slots are kept for busy experiments
Experiment quota
  - Users from the same experiment are in the same Linux group
  - The initial group quota is set to the amount of real resources contributed by the experiment
  - The quota can be exceeded if there are idle slots in the shared pool
Group priority and user priority
  - Group priority is correlated with the group quota and the group's slot occupancy
  - User priority is effective among users within the same group
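Group and user priorities come from HTCondor's accountant, the same data shown by condor_userprio. A small hedged sketch that inspects them through the Python bindings (the Negotiator class is from the classic htcondor module; attribute names can vary slightly between versions):

    import htcondor

    # Fetch accounting ads from the negotiator (same data as condor_userprio).
    ads = htcondor.Negotiator().getPriorities()

    for ad in ads:
        kind = "group" if ad.get("IsAccountingGroup", False) else "user"
        print(f"{kind:5s} {ad.get('Name', '?'):30s} "
              f"priority={ad.get('Priority', 0.0):10.2f} "
              f"in_use={ad.get('ResourcesUsed', 0)}")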

Outline
1. Migration to HTCondor
2. Scheduling Policy of the HTCondor Cluster
3. Works Designed and Developed
4. Problems We Met
5. Summary and Future Work

Central Controller
- Central control of groups, users and work nodes
- All information is collected and saved into the central database
- Necessary information is updated and published to the relevant services
- Work nodes update their configuration via httpd periodically
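A minimal sketch of the node-side update step, under stated assumptions: the endpoint URL, the local fragment path and the per-host fragment idea are illustrative, not the exact IHEP mechanism:

    import socket
    import subprocess
    import urllib.request

    # Hypothetical endpoint served by the central controller's httpd.
    CONFIG_URL = "http://central-controller.ihep.ac.cn/condor-config/{host}"
    LOCAL_FRAGMENT = "/etc/condor/config.d/99-central-controller.conf"

    def refresh_condor_config():
        url = CONFIG_URL.format(host=socket.getfqdn())

        # Fetch the config fragment generated for this work node.
        with urllib.request.urlopen(url, timeout=30) as resp:
            fragment = resp.read().decode()

        # Rewrite the local fragment and ask the daemons to reload it.
        with open(LOCAL_FRAGMENT, "w") as fh:
            fh.write(fragment)
        subprocess.run(["condor_reconfig"], check=True)

    if __name__ == "__main__":
        refresh_condor_config()  # typically run from cron every few minutes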

Error Detection and Recovery
- The health status of all work nodes is collected from the monitoring system and saved into the central database
- The central controller updates the work nodes' attributes automatically
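On the controller side, one simple way to take an unhealthy node out of scheduling is to publish a config fragment that stops it from matching new jobs. A hedged sketch; the NODE_HEALTHY macro and the health records are illustrative:

    def config_fragment(healthy: bool) -> str:
        """Per-node HTCondor config fragment published by the central controller."""
        lines = [f"NODE_HEALTHY = {healthy}"]
        if not healthy:
            # Keep running jobs, but refuse new matches until the node recovers.
            lines.append("START = False")
        return "\n".join(lines) + "\n"

    # Example records as they might come from the monitoring database.
    health = {"bws0472.ihep.ac.cn": False, "bws0473.ihep.ac.cn": True}
    for node, ok in health.items():
        print(f"--- {node} ---\n{config_fragment(ok)}")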

The Toolkit: hep_job
Motivation
  - Smooth migration from PBS to HTCondor for users
  - Simplify users' work
  - Help to enforce our scheduling strategy
Implementation
  - Based on the HTCondor Python API
  - Integrated with the IHEP computing platform (server names, group names)
  - Several job templates according to the experiments' requirements
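A hedged sketch of what an hep_job-style submission helper can look like on top of the HTCondor Python bindings; the template contents, group names and the hep_sub-like entry point are illustrative, not the actual hep_job code (the transaction-based submit shown here is the classic bindings API):

    import getpass
    import htcondor

    # Illustrative per-experiment templates; the real hep_job templates differ.
    TEMPLATES = {
        "juno": {"request_memory": "2048", "accounting_group": "group_juno"},
        "bes":  {"request_memory": "1024", "accounting_group": "group_bes"},
    }

    def hep_sub(executable: str, experiment: str, arguments: str = "") -> int:
        """Submit one job with the experiment's template applied; returns the ClusterId."""
        tmpl = TEMPLATES[experiment]
        sub = htcondor.Submit({
            "executable": executable,
            "arguments": arguments,
            "universe": "vanilla",
            "request_memory": tmpl["request_memory"],
            # The accounting group drives the group-quota based fair share.
            "accounting_group": tmpl["accounting_group"],
            "accounting_group_user": getpass.getuser(),
            "output": "job.$(ClusterId).out",
            "error": "job.$(ClusterId).err",
            "log": "job.$(ClusterId).log",
        })
        schedd = htcondor.Schedd()
        with schedd.transaction() as txn:
            return sub.queue(txn)

    if __name__ == "__main__":
        print("ClusterId:", hep_sub("/bin/hostname", "juno"))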

Job Monitoring
- Queueing and running statistics
  - For the overall cluster
  - For each group/experiment
- Statistics for exclusive and shared resources
- Based on Nagios and Ganglia
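For the per-group queue statistics, a minimal sketch that counts idle and running jobs per accounting group straight from the schedd (AcctGroup is the standard job attribute set by accounting_group):

    import collections
    import htcondor

    IDLE, RUNNING = 1, 2  # JobStatus codes in the job ClassAds

    schedd = htcondor.Schedd()
    jobs = schedd.query(projection=["JobStatus", "AcctGroup"])

    stats = collections.Counter()
    for job in jobs:
        group = job.get("AcctGroup", "none")
        if job.get("JobStatus") == IDLE:
            stats[(group, "idle")] += 1
        elif job.get("JobStatus") == RUNNING:
            stats[(group, "running")] += 1

    for (group, state), n in sorted(stats.items()):
        print(f"{group:15s} {state:7s} {n}")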

Global Accounting
- Detailed accounting for each group and each user
- Slots are weighted by slow/fast CPU, memory, disk, etc.
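The weighting itself is simple arithmetic: usage is scaled by a per-machine power factor rather than counted as raw slot hours. The factors below are made-up numbers for illustration, not the IHEP weights:

    # Relative power factors per machine type (illustrative values).
    MACHINE_POWER = {"old-blade": 0.7, "new-blade": 1.3}

    def weighted_slot_hours(machine_type: str, cpus: int, wall_hours: float) -> float:
        """Weighted usage = cores * wall time * relative machine power."""
        return cpus * wall_hours * MACHINE_POWER[machine_type]

    # Example: 8 cores for 10 hours on a fast node vs. a slow node.
    print(weighted_slot_hours("new-blade", 8, 10.0))  # 104.0
    print(weighted_slot_hours("old-blade", 8, 10.0))  # 56.0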

Putting It All Together
(Architecture diagram: user jobs and hep_job commands flow into the HTCondor daemons (schedd, negotiator, collector, startd); the Central Controller System distributes the sharing policy and dynamic configuration over HTTP and feeds accounting and monitoring (Nagios/web pages), with an interface to the Computing Center accounting system.)

Outline
1. Migration to HTCondor
2. Scheduling Policy of the HTCondor Cluster
3. Works Designed and Developed
4. Problems We Met
5. Summary and Future Work

Problems We Met – the dishonest user
- Claimed membership of another group to obtain more job slots
- Ran an sshd daemon on the work nodes
  - Logged into work nodes via ssh without a password and secretly ran big MPI tasks from the login node
  - Occupied more CPU cores than the slots declared in the job
How we dealt with it
  - Added a group priority check to the job wrapper on the work nodes
  - Deployed a zombie-process check on the work nodes to kill processes that do not belong to jobs running on that node (see the sketch below)
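A hedged sketch of such a rogue-process check, not the IHEP script: it assumes psutil is available and that legitimate payload processes are descendants of a condor_starter:

    import psutil

    SYSTEM_USERS = {"root", "condor"}  # processes of these users are left alone

    def is_condor_job_process(proc: psutil.Process) -> bool:
        """Legitimate payload processes descend from a condor_starter."""
        return any(p.name() == "condor_starter" for p in proc.parents())

    def kill_rogue_processes(dry_run: bool = True):
        for proc in psutil.process_iter(["pid", "name", "username"]):
            try:
                if proc.info["username"] in SYSTEM_USERS:
                    continue
                if is_condor_job_process(proc):
                    continue
                print(f"rogue process: pid={proc.info['pid']} "
                      f"user={proc.info['username']} name={proc.info['name']}")
                if not dry_run:
                    proc.kill()
            except (psutil.NoSuchProcess, psutil.AccessDenied):
                continue

    if __name__ == "__main__":
        kill_rogue_processes(dry_run=True)  # flip to False once the output is trusted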

Problems We Met – job hung
- The schedd daemon suddenly could not be contacted for a short time
  - Jobs hung and were re-queued unexpectedly
- Reason: the default open-file limit of 1024
How we dealt with it
  - Increased the system limit
  - Restarted the schedd process
Log excerpt:
03/10/17 17:56:47 (pid:1105883) Started shadow for job 7339826.0 on slot1@bws0472.ihep.ac.cn <192.168.57.232:6795?addrs=192.168.57.232-6795> for physics.mahl, (shadow pid = 3809619)
03/11/17 01:50:56 (pid:1105883) ERROR: Child pid 3809619 appears hung! Killing it hard.
03/11/17 01:50:56 (pid:1105883) Shadow pid 3809619 successfully killed because it was hung.
03/11/17 01:50:56 (pid:1105883) Shadow pid 3809619 for job 7339826.0 exited with status 4
03/11/17 01:50:56 (pid:1105883) ERROR: Shadow exited with job exception code!
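A small hedged sketch of a check that could catch this condition earlier by comparing the schedd's open descriptors against its soft limit; it is Linux-specific and the warning threshold is arbitrary:

    import os
    import subprocess

    def check_schedd_fd_usage(threshold: float = 0.8):
        """Warn when condor_schedd approaches its open-file limit (Linux only)."""
        pid = subprocess.check_output(["pidof", "condor_schedd"]).split()[0].decode()

        open_fds = len(os.listdir(f"/proc/{pid}/fd"))

        # Soft limit from the "Max open files" row of /proc/<pid>/limits.
        soft_limit = 0
        with open(f"/proc/{pid}/limits") as fh:
            for line in fh:
                if line.startswith("Max open files"):
                    soft_limit = int(line.split()[3])
                    break

        print(f"condor_schedd pid={pid}: {open_fds}/{soft_limit} open files")
        if soft_limit and open_fds > threshold * soft_limit:
            print("WARNING: schedd is close to its open-file limit")

    if __name__ == "__main__":
        check_schedd_fd_usage()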

Problems We Met – schedd owner changed
- The owner of the condor_schedd process changed from condor to a normal user
- Reason: a disk mounted on the schedd server became inaccessible
How we dealt with it
  - Added a disk check that reports to monitoring
  - Considering a version upgrade
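A hedged sketch of such a disk check: the mount points are illustrative, and stat is run in a child process with a timeout so that a hung mount cannot block the check itself:

    import subprocess

    MOUNT_POINTS = ["/scratchfs", "/workfs", "/home"]  # illustrative paths

    def mount_is_accessible(path: str, timeout: int = 10) -> bool:
        """stat the mount point in a child process; a hung NFS mount just times out."""
        try:
            subprocess.run(["stat", "-f", path], timeout=timeout,
                           check=True, capture_output=True)
            return True
        except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
            return False

    if __name__ == "__main__":
        for mp in MOUNT_POINTS:
            status = "OK" if mount_is_accessible(mp) else "INACCESSIBLE"
            print(f"{mp}: {status}")  # feed this into the Nagios alerting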

Outline
1. Migration to HTCondor
2. Scheduling Policy of the HTCondor Cluster
3. Works Designed and Developed
4. Problems We Met
5. Summary and Future Work

Summary and Future Work
- Resource utilization has been significantly improved by the resource sharing policy
- A number of tools were implemented to enhance system interaction and robustness
Future work
  - Automatically tune the resource sharing ratio according to the workload of each group
  - Integrate the job monitoring with the central controller
  - A union of HTCondor sites (cross-site federation)

Thank you! Questions?