HTCondor and TORQUE/Maui

HTCondor and TORQUE/Maui Resource Management Integration
Grigoriy Abramov, Systems Administrator, Research Data Center
Office of Technology Services | Research Data Center, Chicago, IL
HTCondor Week 2017

Problems:
Low efficiency of available computational resources
Disproportionate utilization of individual clusters
No mechanism for sharing computational resources between clusters
Absence of a unified resource-management platform
Variation in Linux OS releases across clusters
Distributed ownership of computational resources


Advantage:
TORQUE/Maui resource management and the Portable Batch System (PBS) are already available on all HPC computational resources

Goals:
Cost-efficient optimization of computational resources
Greater access to computational resources for faculty members and research groups who cannot obtain clusters of their own

Working Principles:
Departments and research groups retain full ownership of their clusters and have priority in running their own computations
Running jobs submitted by "guests" on shared resources are preempted and removed from the queue when the owning group needs the nodes

Challenges:
Finding a platform that does not require system reinstallation or significant configuration changes that would interrupt already-running computations

OS and Applications:
ROCKS cluster OS
TORQUE/Maui (Adaptive Computing)
HTCondor, HTCondor-CE-BOSCO, or both

Open Science Grid IIT Compute Resource


GridIIT/OSG Computing Grid Diagram (components: HTCondor-CE-Bosco, SQUID proxy, HTCondor, IIT Campus Grid, Research Data Center computational resources)

Implementation:
Single user account and GID on all HPC clusters: guestuser/guestuser
PBS queue manager (qmgr) configuration
Maui configuration
Installation of HTCondor-CE-Bosco or HTCondor
Access to computational resources via Secure Shell (SSH) (see the sketch below)
Testing computing grid resources for incoming/outgoing traffic
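
A minimal sketch of the guest account and SSH setup, assuming a ROCKS frontend and a plain local account (the key file name and host name are placeholders, not values from the talk):

# On each cluster's frontend: create the shared guest account and group
groupadd guestuser
useradd -m -g guestuser -c "GridIIT/OSG guest user" guestuser
rocks sync users   # ROCKS: propagate the account to the compute nodes

# From the submit/BOSCO host: install an SSH key so submission works without a password
ssh-keygen -t rsa -f ~/.ssh/bosco_key.rsa
ssh-copy-id -i ~/.ssh/bosco_key.rsa.pub guestuser@your_cluster_name.edu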

Implementation | TORQUE configuration
Create and define queue grid at the qmgr prompt (a non-interactive way to apply the same definition is sketched below):
create queue grid
set queue grid queue_type = Execution
set queue grid max_user_queuable = 14
set queue grid resources_default.walltime = 48:00:00
set queue grid resources_default.ncpus = 1
set queue grid acl_group_enable = True
set queue grid acl_groups = guestuser
set queue grid kill_delay = 120
set queue grid keep_completed = 120
set queue grid enabled = True
set queue grid started = True
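
For reference, the same queue definition can be applied and checked from the shell, assuming the commands above are saved to a file (grid-queue.qmgr is an illustrative name):

# Apply the queue definition on the TORQUE server host
qmgr < grid-queue.qmgr
# Or issue individual commands, for example:
qmgr -c "create queue grid"
qmgr -c "set queue grid queue_type = Execution"
# Verify the resulting configuration
qmgr -c "print queue grid"
qstat -Qf grid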

Implementation | Maui configuration
Priority
Preemption
Preemption policy
Partitioning
QOS (Quality of Service)

Implementation | Maui configuration (cont.)
RMCFG[base] TYPE=PBS SUSPENDSIG=15
PREEMPTPOLICY SUSPEND
NODEALLOCATIONPOLICY PRIORITY
QOSCFG[hi] QFLAGS=PREEMPTOR
QOSCFG[low] QFLAGS=NOBF:PREEMPTEE
CLASSWEIGHT 10
CLASSCFG[batch] QDEF=hi PRIORITY=1000
CLASSCFG[grid] QDEF=low PRIORITY=1

Implementation | Maui configuration (cont.)
GROUPCFG[users] PRIORITY=1000 QLIST=hi QDEF=hi QFLAGS=PREEMPTOR
GROUPCFG[guestuser] PRIORITY=1 QLIST=low QDEF=low QFLAGS=PREEMPTEE
USERCFG[guestuser] PRIORITY=1 QLIST=low QDEF=low QFLAGS=PREEMPTEE
PARTITIONMODE ON
NODECFG[compute-1-1] PARTITION=grid
NODECFG[compute-1-2] PARTITION=grid
SYSCFG[base] PLIST=default,grid&
USERCFG[DEFAULT] PLIST=default
GROUPCFG[guestuser] PLIST=default:grid PDEF=default
* The Maui service needs to be restarted after these changes (see the sketch below).
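
A minimal sketch of applying these settings, assuming a default Maui installation (the maui.cfg path and init-script name vary between installs and are assumptions here):

# Edit the scheduler configuration on the cluster head node
vi /opt/maui/maui.cfg
# Restart Maui so the partition, QOS, and preemption settings take effect
service maui restart
# Sanity checks
showconfig | less        # dump the active Maui configuration
checknode compute-1-1    # confirm the node is in the grid partition
showq                    # confirm jobs are still being scheduled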

Implementation | Testing
Test job submission via PBS as guestuser on the compute cluster. The submit script must include the following options (a full sample script is sketched below):
#PBS -q grid
#PBS -W x="PARTITION:grid"
Verify that preemption works reliably.
Install and configure HTCondor or HTCondor-CE-BOSCO.
On the computational cluster's head node, add the following lines to the file ../bosco/glite/bin/pbs_local_submit_attributes.sh:
#!/bin/sh
echo '#PBS -q grid'
echo '#PBS -W x="PARTITION:grid"'
Submit a test job from the remote server with:
bosco_cluster -t guestuser@your_cluster_name.edu
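
A minimal sketch of a local test job, assuming the queue and partition configured above; it could be submitted on the cluster as guestuser with qsub test_grid.pbs (the file and job names are illustrative):

#!/bin/sh
#PBS -q grid
#PBS -W x="PARTITION:grid"
#PBS -l nodes=1:ppn=1,walltime=00:05:00
#PBS -N grid_test
# Trivial payload: report the execute node, then idle briefly
hostname
sleep 60

From the remote HTCondor submit host, the same resource can also be exercised through BOSCO with a grid universe job; the grid_resource line follows the usual "batch pbs user@host" form (file names again illustrative):

# test_grid.sub -- submit with: condor_submit test_grid.sub
universe      = grid
grid_resource = batch pbs guestuser@your_cluster_name.edu
executable    = /bin/hostname
output        = grid_test.out
error         = grid_test.err
log           = grid_test.log
queue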

GridIIT and OSG Shared Computational Resources Over a 6-Month Period (chart of opportunistic vs. dedicated usage; source: http://gracc.opensciencegrid.org)

Questions?
