Running DiFX with SGE/OGE
Helge Rottmann, Max-Planck-Institut für Radioastronomie, Bonn, Germany
DiFX Meeting, 24.9. - 28.9.2012, Sydney

What is SGE/OGE?
SGE = Sun Grid Engine, now Oracle Grid Engine (= OGE), but many online resources still refer to SGE.
OGE is a Distributed Resource Management (DRM) system.
Goal: maximize resource utilization by matching the incoming workload to the available resources.
Read more:

DRM overview
Users can:
- specify minimum requirements (e.g. number of nodes, available memory)
- assign the start time of the job
- assign job priorities (requires the right to do so)
Job flow:
1. The user submits a job to the master host.
2. The master host schedules the job according to the requested and available resources.
3. The master host assigns the job to one or more execution hosts.
4. The execution hosts execute the job.

Why OGE?
- There are numerous DRM systems available (e.g. Torque).
- OGE (at least theoretically) meets all DiFX requirements.
- OGE is very well documented.
- OGE is very simple to install (part of RHEL and many other distributions).
- OGE is freely available.

Submitting jobs with OGE
Submit “simple” (non-parallelized) jobs:
qsub myjobscript
qsub -l m_core=8 myjobscript

Example myjobscript:
#!/bin/csh
#$ -M                  (send an e-mail when the job starts / finishes)
#$ -o flow.out -j y    (redirect standard output and standard error to a file)
cd TEST
f77 flow.f -o flow     (commands to execute)
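(Note on the -M line: in SGE/OGE, -M only defines the mail recipient address, which is not reproduced in this transcript; the events that trigger mail are selected separately with -m, e.g. a line such as "#$ -m be" requests mail at job begin and end.)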

Submitting jobs with OGE – cont.
Submit “advanced” (parallel) jobs:
qsub -pe parallel_environment slots myjobscript
The administrator provides a “parallel environment” for user submission of jobs.
The administrator can set various constraints (e.g. which nodes to make available, and many others).
OGE tightly integrates with the most common parallelization frameworks, e.g. OpenMPI.
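The details of a parallel environment are site-specific. Purely as a sketch (the attribute values are illustrative, not the actual Bonn setup), an environment like the difxpe used on the next slide would be created with "qconf -ap difxpe" and could contain something like:

pe_name            difxpe
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $round_robin
control_slaves     TRUE
job_is_first_task  FALSE

control_slaves TRUE and job_is_first_task FALSE are the usual settings for tight OpenMPI integration; start_proc_args is the natural place to attach a per-job script such as the genmachines.oge hook described later.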

Submitting parallel jobs
qsub -pe difxpe 4 myjobscript
OGE chooses 4 execution nodes to start the job on. No machine file is needed.

Example myjobscript:
#!/bin/csh
mpirun hostname

Why use OGE with DiFX?
- Maximize cluster utilization: use the cluster for other projects when not correlating. At MPIfR the cluster is regularly used for pulsar searches, numerical simulations of jets, and FPGA routing.
- Simultaneously run multiple DiFX correlations.
- Schedule the execution of multiple correlation runs.
- Out-of-the-box suspend/restart facilities.
- Maybe you must correlate at an external computing facility.
The utilization of the Bonn cluster is only about 20% most of the time; the pulsar people would like to consume DiFX resources when they are available.

Special DiFX requirements
- OGE must support DiFX threaded operation: never start more than one job per node.
- OGE must obey the special DiFX machine file order (1. head node, 2. datastream nodes, 3. compute nodes).
- Operational requirement: must allow immediate execution of DiFX jobs even if other jobs are running on the nodes.

Limit processes per node
DiFX typically starts N-1 threads on each node (N = number of cores). To prevent overbooking it is necessary to tell OGE never to start more than one DiFX process per node. Non-DiFX jobs should still be allowed to start more than one process per node.
OGE lets you define resource quotas, e.g.:
limit users {difx, oper} to slots=1
or
limit projects {difx} to slots=1
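For illustration, such a limit can be written as an OGE resource quota set (created with "qconf -arqs"); the set name below is invented, and per-host scoping is assumed via the hosts filter, mirroring the rule quoted above:

{
   name         difx_slots_per_host
   description  "At most one slot per execution host for DiFX users"
   enabled      TRUE
   limit        users {difx,oper} hosts {*} to slots=1
}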

Enforce DiFX node order
The OGE “tight” integration with OpenMPI is convenient, but does not allow the user to influence e.g. the node selection. DiFX requires a special node order: head node first, then datastream nodes (in fixed order in the case of Mark5 units), then compute nodes.
OGE provides a hook to loosen up the tight integration: the parallel environment can execute a script every time a job is submitted to it. This script, genmachines.oge, can produce a custom machine file. Make use of the “loose” integration by providing the custom machine file to OpenMPI.

Enforce DiFX node order – cont.
(Workflow: on the master host, fxmanager, the job is submitted with "qsub -pe difx 20 startdifx.oge"; the parallel environment difx invokes genmachines.oge, which writes the per-job machine file, e.g. mark5fx01 ... mark5fx08, node21, node35, ...; startdifx.oge then starts the correlation with that machine file.)

Example startdifx.oge:
#!/bin/csh
mpirun -machinefile $TMPDIR/machine runmpifxcorr.d21
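The production genmachines.oge is still to be written (see Summary); the following csh fragment is only a hypothetical sketch of the idea, assuming the head node is called fxmanager and that datastream and compute nodes can be told apart by their host names (mark5fx*, node*). A real implementation would take the required datastream order from the DiFX job description rather than from a simple sort.

#!/bin/csh
# genmachines.oge - hypothetical sketch
# Invoked by the parallel environment (e.g. via start_proc_args) with the
# SGE-generated host file; writes a DiFX-ordered machine file for mpirun.
set pe_hostfile = $1
set machinefile = $TMPDIR/machine

# 1. head node first
echo fxmanager > $machinefile
# 2. datastream nodes (Mark5 units), here simply sorted by name
grep '^mark5fx' $pe_hostfile | awk '{print $1}' | sort >> $machinefile
# 3. remaining compute nodes
grep '^node' $pe_hostfile | awk '{print $1}' | sort >> $machinefile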

Giving DiFX priority
We happily share correlator resources with other projects... but when a correlation is scheduled it should be executed immediately.
OGE provides the concept of hierarchical queues:
- A queue is a group of resources (e.g. nodes) that jobs can be submitted to.
- A node can belong to multiple queues.
- Queues can be subordinate to other queues.
- Jobs in a subordinate queue get suspended automatically if a job is submitted to its master queue.
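For illustration only (queue and host-group names are invented), the subordination is configured on the superior queue: its subordinate_list names the queues to suspend when it becomes busy. Edited with "qconf -mq difx.q", the relevant lines might read:

qname             difx.q
hostlist          @difxnodes
slots             1
subordinate_list  cluster.q=1

With this, as soon as one DiFX slot is occupied on a host, the cluster.q instance on that host is suspended; when the DiFX job finishes, the suspended jobs resume automatically.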

Hierarchical queues
(Diagram: queue A, the master queue, and queue B, a subordinate queue, both spanning the execution hosts node01 ... node11.)
Scenario 1: a 6-process job is submitted to queue B, then a 5-process job is submitted to queue A. => The jobs can run concurrently.
Scenario 2: a 6-process job is submitted to queue B, then an 8-process job is submitted to queue A. => The queue B processes running on nodes 6, 7 and 8 get suspended. => When the job in queue A finishes, the suspended processes resume automatically.

Summary
Proof of principle that DiFX can be run from within OGE has been demonstrated at MPIfR.
Various things remain to be explored:
- How do suspended processes on Mark5 units behave?
- Explore queue setups that reflect the workflow on the Bonn cluster.
- Review the requirements of the non-DiFX cluster users.
- ...
Things to be done:
- Write production versions of genmachines.oge and startdifx.oge.