HUBbub 2013: Developing hub tools that submit HPC jobs Rob Campbell Purdue University Thursday, September 5, 2013.

Example: the “SubmitR” tool running on the DiaGrid hub
- DiaGrid: a distributed research computing network
- SubmitR: a hub tool for running R scripts on DiaGrid

SubmitR workflow: move files, run the job on a remote system, and view the results back on the hub.

Building a job: files, options/arguments, and job parameters.
Job types:
- One process
- Multiple processes, communicating
- Multiple independent processes (parameter sweep)

The “submit” command runs a user's command on a remote system. Its steps:
1. Connect to the remote system.
2. Transfer the input files and program.
3. Create a script for the user's command.
4. Talk to the batch or workflow system.
5. Output periodic status updates.
6. Transfer files back to the hub.
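Step 3 can be illustrated with a minimal Python sketch that renders a PBS batch script for the user's command. The helper `make_pbs_script` and the script layout are hypothetical. The slides only say that submit creates a script, not what it contains. The `#PBS -l nodes=1:ppn=N` and `#PBS -l walltime=HH:MM:SS` lines are standard PBS resource requests. Placing every processor on one node is an assumption of this sketch.

```python
import shlex


def make_pbs_script(command, ncpus=1, walltime_minutes=60, job_name="hubjob"):
    """Render a PBS batch script that runs `command` (a list of argv strings).

    Hypothetical helper: assumes all requested processors fit on one node.
    """
    hours, minutes = divmod(walltime_minutes, 60)
    lines = [
        "#!/bin/sh",
        f"#PBS -N {job_name}",
        # Request processors on a single node (assumption for this sketch).
        f"#PBS -l nodes=1:ppn={ncpus}",
        # PBS expects walltime as HH:MM:SS.
        f"#PBS -l walltime={hours:02d}:{minutes:02d}:00",
        # Run from the directory the job was submitted from.
        'cd "$PBS_O_WORKDIR"',
        # Quote each argument so embedded spaces survive the shell.
        " ".join(shlex.quote(arg) for arg in command),
    ]
    return "\n".join(lines) + "\n"
```

The example below uses the R command from the SubmitR example:

```python
make_pbs_script(["R", "CMD", "BATCH", "-q", "--args inp.dat", "myscript.R"], ncpus=2, walltime_minutes=60)
```

The script it returns contains `#PBS -l walltime=01:00:00`, and its final line is `R CMD BATCH -q '--args inp.dat' myscript.R`. `shlex.quote` keeps the R option as a single argument.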

For SubmitR, submit uses:
- PBS job scheduling on Purdue's Hansen cluster (single or parallel jobs)
- Pegasus workflow management with HTCondor (parameter sweeps)
submit options:
- VENUES: remote systems
- MANAGERS: commands that can be run on remote systems
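The split between the two back ends can be captured in a small lookup. This is a hypothetical sketch. `JobType`, `backend_for`, and the returned labels are illustrative names. Only the mapping itself comes from the slide: PBS on Hansen for single or parallel jobs, and Pegasus with HTCondor for sweeps.

```python
from enum import Enum


class JobType(Enum):
    SINGLE = "one process"
    PARALLEL = "multiple communicating processes"
    SWEEP = "multiple independent processes (parameter sweep)"


def backend_for(job_type):
    """Return (label, description) of the back end SubmitR uses for a job type.

    The mapping follows the slide; the label strings are illustrative only.
    """
    if job_type is JobType.SWEEP:
        return ("pegasus", "Pegasus workflow management with HTCondor")
    # Single and parallel jobs both go to PBS on the Hansen cluster.
    return ("pbs", "PBS job scheduling on the Hansen cluster")
```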

Building the submit command:

submit -n 2 -w 60 -v hansen -M -i inp.dat R CMD BATCH -q "--args inp.dat" myscript.R

- -n 2 -w 60 -v hansen -M: the job should use 2 processors and 60 minutes of walltime, run on the Hansen cluster, and collect metrics.
- -i inp.dat: the file “inp.dat” should be included (transported to the remote system).
- R: use manager “R”, which causes the R interpreter to run on the remote system.
- CMD BATCH -q "--args inp.dat": options for the R interpreter.
Note: submit detects that “myscript.R” is used and transports it to the remote system.
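A hub tool written in Python might assemble the command above as an argument list and run it without a shell. This is a sketch under assumptions. `build_submit_args` and `run_submit` are hypothetical helpers, and they cover only the flags shown on the slide. Repeating `-i` for several input files is assumed here, not confirmed.

```python
import subprocess


def build_submit_args(command, ncpus=1, walltime_minutes=60, venue=None,
                      metrics=True, input_files=()):
    """Assemble an argv list for `submit` from the options on the slide:
    -n (processors), -w (walltime in minutes), -v (venue), -M (metrics)
    and -i (extra input file). Repeating -i per file is an assumption."""
    args = ["submit", "-n", str(ncpus), "-w", str(walltime_minutes)]
    if venue:
        args += ["-v", venue]
    if metrics:
        args.append("-M")
    for path in input_files:
        args += ["-i", path]
    # The user's command (here, the R interpreter invocation) goes last.
    return args + list(command)


def run_submit(args, cwd=None):
    """Run submit without a shell, so "--args inp.dat" stays one argument."""
    return subprocess.run(args, cwd=cwd, capture_output=True, text=True)
```

The following call returns the same arguments as the command on the slide:

```python
build_submit_args(["R", "CMD", "BATCH", "-q", "--args inp.dat", "myscript.R"], ncpus=2, walltime_minutes=60, venue="hansen", input_files=["inp.dat"])
```

Passing a list to `subprocess.run` avoids re-quoting the R options for a shell.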

Executing the submit command, getting status updates:
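Because submit prints its status updates to standard output, a tool can read them line by line while the job runs. This is a minimal sketch, assuming the tool launches submit itself with Python's `subprocess` module. `run_with_updates` is a hypothetical name, and the demo uses the Python interpreter as a stand-in for submit.

```python
import subprocess
import sys


def run_with_updates(args, on_line, cwd=None):
    """Run `args` and pass each line of its output to `on_line` as it arrives.

    stderr is merged into stdout so error messages show up as updates too.
    Returns the process exit code."""
    with subprocess.Popen(args, cwd=cwd, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True,
                          bufsize=1) as proc:
        for line in proc.stdout:
            on_line(line.rstrip("\n"))
    # Leaving the `with` block waits for the process, so returncode is set.
    return proc.returncode


if __name__ == "__main__":
    # Stand-in for submit so the sketch runs anywhere.
    demo = [sys.executable, "-c", "print('Job Submitted'); print('Simulation Done')"]
    print("exit code:", run_with_updates(demo, lambda s: print("update:", s)))
```

How promptly lines arrive depends on the child process flushing its output.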

Tips for using submit:
- Test submit from the hub's command line (workspace):
$> submit -n 1 -w 5 -v hansen -M R CMD BATCH -q "--args 1 2" testargs.R
=SUBMIT-METRICS=> job=
( ) Job Submitted at hansen-a Mon Sep 2 17:38:
( ) Simulation Queued at hansen-a Mon Sep 2 17:39:
( ) Simulation Complete at hansen-a Mon Sep 2 17:39:
( ) Simulation Done at hansen-a Mon Sep 2 17:39:
=SUBMIT-METRICS=> job= venue=1:sshPBS: :diagrid- status=0 cpu= real= wait=
(end of output)
- Use submit's notification feature to alert the user when the job finishes:
$> submit mail2self -s 'Hey' -t 'Your job is done.'
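With `-M`, submit reports a `=SUBMIT-METRICS=>` line such as the one in the sample output. The sketch below turns such a line into a dictionary. `parse_metrics_line` is a hypothetical helper, and it assumes only what the sample shows:
- whitespace-separated `key=value` tokens, some with empty values (several values were lost from the sample);
- occasional tokens without `=`, which it skips.

```python
METRICS_PREFIX = "=SUBMIT-METRICS=>"


def parse_metrics_line(line):
    """Parse a '=SUBMIT-METRICS=>' line into a dict of key -> value strings.

    Tokens of the form key=value become entries (values may be empty);
    tokens without '=' are ignored. Returns None for any other line."""
    line = line.strip()
    if not line.startswith(METRICS_PREFIX):
        return None
    fields = {}
    for token in line[len(METRICS_PREFIX):].split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields
```

On the sample line, the parser returns `status` as `'0'` and `venue` as `'1:sshPBS:'`. Treating `status=0` as success would follow the usual exit-status convention, but that reading is an assumption.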

Additional submit feature: automatic breakout of parameter combinations (for sweeps), requested with “submit … -p …”. Example: the user wants six runs. Parameters:
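Automatic breakout means a single submission expands into one run per combination of parameter values, i.e. the Cartesian product. For instance, two hypothetical parameters with 2 and 3 values give 2 × 3 = 6 runs. The sketch shows only the idea. `expand_sweep`, `alpha`, and `n` are illustrative, and the sketch does not reproduce submit's `-p` syntax.

```python
from itertools import product


def expand_sweep(params):
    """Expand {name: [values, ...]} into one dict per combination.

    The first parameter varies slowest, as in itertools.product."""
    names = list(params)
    return [dict(zip(names, combo))
            for combo in product(*(params[name] for name in names))]
```

For example, `expand_sweep({"alpha": [0.1, 0.2], "n": [10, 20, 30]})` returns six dictionaries, starting with `{"alpha": 0.1, "n": 10}` and ending with `{"alpha": 0.2, "n": 30}`.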

Directories:
- “Run” directory: a tool-specific directory under the hub's session directory. It is the current working directory for executing submit and isolates job-related files. Example: “~/data/sessions/6716/submitr”
- Parameter sweep output: a job directory is created under the run directory. Pegasus puts each run's (sub-job's) output in a separate directory under the job directory and keeps its bookkeeping files in the job directory.
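A tool can create its run directory once and then launch submit with that directory as the working directory. This is a minimal sketch, and `make_run_dir` and `sweep_outputs` are hypothetical helpers. The session directory is passed in because the slides do not say how a tool locates it. `sweep_outputs` assumes the sub-job outputs are the only subdirectories of the job directory.

```python
import tempfile
from pathlib import Path


def make_run_dir(session_dir, tool_name):
    """Create (if needed) and return a tool-specific run directory under the
    hub session directory, e.g. ~/data/sessions/6716/submitr."""
    run_dir = Path(session_dir).expanduser() / tool_name
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir


def sweep_outputs(job_dir):
    """Return the subdirectories of a sweep's job directory, sorted by name;
    plain files such as bookkeeping logs are skipped."""
    return sorted(p for p in Path(job_dir).iterdir() if p.is_dir())


if __name__ == "__main__":
    # Demo in a throwaway directory standing in for ~/data/sessions/<id>.
    with tempfile.TemporaryDirectory() as tmp:
        run_dir = make_run_dir(Path(tmp) / "sessions" / "6716", "submitr")
        print("submit would run with cwd =", run_dir)
```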

Exiting the tool, canceling the job:
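One way to make exiting the tool cancel the job is to keep a handle on the submit process and stop it on exit. This sketch rests on an assumption the slides do not spell out: that terminating the local submit process is how a tool requests cancellation. `JobRunner` is a hypothetical class, and the demo uses a sleeping Python process in place of submit.

```python
import atexit
import subprocess
import sys


class JobRunner:
    """Hold the running submit process so the tool can cancel it.

    Assumption: stopping the local submit process is how cancellation is
    requested; the remote-side behavior is not shown in the slides."""

    def __init__(self):
        self.proc = None

    def start(self, args, cwd=None):
        self.proc = subprocess.Popen(args, cwd=cwd)
        return self.proc

    def cancel(self, timeout=10):
        """Terminate the process (SIGTERM on POSIX); kill it if it lingers."""
        if self.proc is None or self.proc.poll() is not None:
            return  # nothing running
        self.proc.terminate()
        try:
            self.proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            self.proc.kill()
            self.proc.wait()


if __name__ == "__main__":
    runner = JobRunner()
    # Stop a still-running job when the interpreter exits normally.
    atexit.register(runner.cancel)
    # Stand-in for submit: a process that would otherwise run for a minute.
    runner.start([sys.executable, "-c", "import time; time.sleep(60)"])
```

`atexit` handlers run only on normal interpreter exit, so a GUI tool would typically also call `cancel()` from its quit handler.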

Moving files
Concept: file “import / export”, bringing files into and out of the tool. Two flavors:
1. Browse: moving files between directories on the hub (os.rename(pathname, newpath)).
2. Upload / download: moving files between the workstation and the hub. Hub commands: importfile and exportfile. Execute importfile from a separate thread to handle user-canceled uploads.
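Both flavors can be sketched in Python, using two hypothetical helpers:
- `browse_move` wraps `os.rename`, as on the slide.
- `import_in_background` runs an import command in a worker thread, so the tool's interface is not blocked if the user cancels the upload.

The slides do not give importfile's arguments, so the command is passed in and the demo uses a stand-in.

```python
import os
import subprocess
import sys
import threading


def browse_move(pathname, newpath):
    """Browse flavor: move a file between directories on the hub."""
    os.rename(pathname, newpath)


def import_in_background(args, on_done):
    """Upload flavor: run an import command in a worker thread so the tool
    stays responsive if the user cancels; call on_done(returncode) at the end."""
    def worker():
        result = subprocess.run(args)
        on_done(result.returncode)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


if __name__ == "__main__":
    # Stand-in for importfile; the real command and arguments depend on the hub.
    stand_in = [sys.executable, "-c", "print('upload finished')"]
    import_in_background(stand_in, lambda rc: print("exit code:", rc)).join()
```

Two caveats apply:
- `os.rename` only works within one filesystem; `shutil.move` is the usual fallback.
- If `on_done` updates a GUI, it should hand the result back to the GUI thread rather than touch widgets from the worker.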

Information:
- Rob
- Research Computing at Purdue
- DiaGrid Hub: http://diagrid.org
- SubmitR: https://diagrid.org/tools/submitr
- Tool Developers Guide: http://hubzero.org/documentation/1.1.0/tooldevs
- The submit command: http://hubzero.org/documentation/1.1.0/tooldevs/grid.submitcmd
- Pegasus: http://pegasus.isi.edu/
- HTCondor: http://research.cs.wisc.edu/htcondor/