

Condor
Tugba Taskaya-Temizel
6 March 2006

What is Condor Technology?

Condor is a high-throughput distributed batch computing system that provides facilities such as job management, scheduling policies, a priority scheme, and resource monitoring and management (Thain et al., 2005). It offers the following features:

ClassAds: A framework for matching the available resources against the requirements stated in job descriptions.

Job Checkpoint and Migration: For some applications, Condor can resume a job from its last saved state using a checkpoint file. This provides a means of fault tolerance. For example, if a machine fails, the job can be safely transferred to another machine.

Remote System Calls: Condor supports I/O-related jobs (processes, executables) that read input files and generate output files. With remote system calls, the job's file I/O is handled for you on the machine where the job was submitted, so you do not need to transfer the files manually or rely on a shared file system.
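The matching idea behind ClassAds can be sketched in a few lines of Python. This is only a toy model and not Condor's ClassAd language: the machine names, the attribute values and the `matches` / `find_matches` helpers are hypothetical, and the job requirements simply mirror a typical expression such as Memory >= 20 && Arch == "INTEL" && OpSys == "LINUX".

```python
# Toy matchmaking in the spirit of ClassAds (not Condor's real ClassAd language).
# All machine names and attribute values below are made up for illustration.

def matches(job, machine):
    """Return True if the machine satisfies every requirement of the job."""
    return all(check(machine) for check in job["requirements"])


def find_matches(job, machines):
    """Return the names of all machines that satisfy the job's requirements."""
    return [m["Name"] for m in machines if matches(job, m)]


machines = [
    {"Name": "linux-a", "OpSys": "LINUX", "Arch": "INTEL", "Memory": 512},
    {"Name": "linux-b", "OpSys": "LINUX", "Arch": "INTEL", "Memory": 16},
    {"Name": "nt-a", "OpSys": "WINNT", "Arch": "INTEL", "Memory": 1024},
]

# Each requirement is a predicate over a machine's advertised attributes.
job = {
    "requirements": [
        lambda m: m["OpSys"] == "LINUX",
        lambda m: m["Arch"] == "INTEL",
        lambda m: m["Memory"] >= 20,
    ]
}

print(find_matches(job, machines))
```

In real Condor the machines advertise their own requirements as well, so matching is two-sided; the sketch only checks the job's side.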

Condor in our Department

There are 30 machines in our departmental Condor pool: 19 run Linux (concorde01-concorde06, tornado01-13) and 11 run the NT operating system. The pool has 107 CPUs in total.

To connect to one of the Condor machines, type:

    telnet concorde03.mcs.surrey.ac.uk

To inspect the Condor pool, run:

    condor_status

The output will be:

    Name  OpSys  Arch   State      Activity  LoadAv  Mem  ActvtyTime
          LINUX  INTEL  Owner      Idle                   :30:56
          LINUX  INTEL  Owner      Idle                   :45:42
          LINUX  INTEL  Claimed    Busy                   :14:40
          LINUX  INTEL  Claimed    Busy                   :48:20
          LINUX  INTEL  Unclaimed  Idle                   :50:13
          LINUX  INTEL  Unclaimed  Idle                   :50:05

To see only the available machines, run:

    condor_status -available
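If you want a quick summary of the pool, the State column of the condor_status listing can be tallied with a short script. The sketch below assumes the simple whitespace-separated layout shown in this tutorial; the host names and the load, memory and time values in SAMPLE are invented placeholders, and `count_states` is a hypothetical helper, not part of Condor.

```python
from collections import Counter

# Hypothetical sample in the column layout used in this tutorial.
# Host names and numeric values are placeholders, not real pool data.
SAMPLE = """\
Name    OpSys  Arch   State      Activity  LoadAv  Mem  ActvtyTime
host-a  LINUX  INTEL  Owner      Idle      0.000   512  0+00:30:56
host-b  LINUX  INTEL  Claimed    Busy      1.000   512  0+00:14:40
host-c  LINUX  INTEL  Unclaimed  Idle      0.000   512  0+00:50:13
host-d  LINUX  INTEL  Unclaimed  Idle      0.000   512  0+00:50:05
"""


def count_states(text):
    """Count machines per State, skipping the header line."""
    counts = Counter()
    for line in text.strip().splitlines()[1:]:
        fields = line.split()
        if len(fields) >= 5:          # ignore blank or malformed lines
            counts[fields[3]] += 1    # fourth column is State
    return counts


print(count_states(SAMPLE))
```

Real condor_status output also contains blank lines and a totals section, so a production script would need more careful parsing than this.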

How to Run a Job in Condor

Jobs that run in the Condor environment are background jobs, so they cannot accept any input from the user while they run.

Choose a universe that suits the type of your application. A universe is an execution environment in Condor. Condor provides many universes, such as 'Standard', 'Vanilla', 'PVM', 'MPI', 'Globus', 'Java' and 'Scheduler'. The universe is specified in the job's submit description file.

How to Run a Job in Condor: Standard Universe

The standard universe provides a checkpoint mechanism that saves the most recent state of a job. This is useful when a long-running job has to migrate to another machine.

Create a working directory such as $HOME/gt3/samples/condor.
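To make the checkpoint idea concrete, here is a minimal application-level sketch: a counter that records its progress in a file and resumes from it after a simulated failure. This is only an analogy. In the standard universe Condor checkpoints the whole process transparently, without any code like this in your program. The file name counter.ckpt, the `fail_at` parameter and both functions are hypothetical.

```python
import json
import os
import tempfile


def run_counter(start, stop, checkpoint_path, output, fail_at=None):
    """Count from start to stop-1, saving progress after every step.

    If a checkpoint file exists, counting resumes from the saved position.
    """
    i = start
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            i = json.load(f)["next"]
    while i < stop:
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated machine failure")
        output.append(i)
        i += 1
        with open(checkpoint_path, "w") as f:
            json.dump({"next": i}, f)   # checkpoint: where to resume


def demo():
    """Run once with a failure, then resume on 'another machine'."""
    with tempfile.TemporaryDirectory() as workdir:
        ckpt = os.path.join(workdir, "counter.ckpt")
        output = []
        try:
            run_counter(1, 10, ckpt, output, fail_at=5)
        except RuntimeError:
            pass                         # the first run dies at i == 5
        run_counter(1, 10, ckpt, output)  # second run resumes from the checkpoint
        return output


print(demo())
```

The second call does not repeat the values already produced, which is the property that makes checkpointing worthwhile for long-running jobs.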

How to Run a Job in Condor: Standard Universe

Create a file called counter.c and write the following lines to it:

    #include <stdio.h>
    #include <stdlib.h>

    /* Print the integers from argv[1] up to, but not including, argv[2]. */
    int main(int argc, char *argv[])
    {
        int i;
        for (i = atoi(argv[1]); i < atoi(argv[2]); i++) {
            printf("%d \n", i);
        }
        return 0;
    }

Compile the file and link it with Condor:

    condor_compile cc counter.c -o counter

How to Run a Job in Condor: Standard Universe

Once the program has been linked, create a submit description file to execute the job. Create a file with the extension 'cmd', such as 'standardunitest.cmd', and write the following lines:

    Executable = counter
    Arguments = 1 30
    Output = counterc1.out
    Log = counterc1.log
    Queue 1
    Arguments =
    Output = counterc2.out
    Queue 1

How to Run a Job in Condor: Standard Universe

To restrict the job to particular machines, add a Requirements expression on the machine Name to the same file, 'standardunitest.cmd':

    Executable = counter
    Requirements = (Name== || Name== || Name== || Name== || Name== || Name== || Name== || Name== || Name== || Name== || Name== || Name== || Name== || Name== || Name== || Name== || Name== || Name== || Name== || Name== || Name== || Name== || Name== || Name==)
    Arguments = 1 30
    Output = counterc1.out
    Log = counterc1.log
    Queue 1
    Arguments =
    Output = counterc2.out
    Queue 1
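Writing a long Requirements expression by hand is error-prone, so it can help to generate it. The sketch below builds a Name-based expression from a list of host names. The host name pattern is an assumption: it combines the machine ranges mentioned earlier (concorde01-06, tornado01-13) with the domain seen in concorde03.mcs.surrey.ac.uk, and the Name values advertised in your pool may differ (for example, on multi-CPU machines). Both helper functions are hypothetical.

```python
def numbered_hosts(prefix, first, last, domain):
    """Build names such as concorde01.<domain> ... concorde06.<domain>."""
    return ["{}{:02d}.{}".format(prefix, n, domain) for n in range(first, last + 1)]


def name_requirements(hostnames):
    """Join host names into a Requirements expression on the Name attribute."""
    terms = ['Name == "{}"'.format(h) for h in hostnames]
    return "(" + " || ".join(terms) + ")"


# Assumed naming scheme for the Linux machines in the pool.
DOMAIN = "mcs.surrey.ac.uk"
hosts = numbered_hosts("concorde", 1, 6, DOMAIN) + numbered_hosts("tornado", 1, 13, DOMAIN)

print("Requirements = " + name_requirements(hosts))
```

The printed line can be pasted into the submit description file in place of a hand-written expression.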

How to Run a Job in Condor: Standard Universe

To submit the job to the Condor pool, run the following command:

    condor_submit standardunitest.cmd

The output will be:

    Submitting job(s)..
    Logging submit event(s)..
    2 job(s) submitted to cluster 92.

To inspect your jobs, run:

    condor_q

This displays your jobs. At first, you should expect to see that they are idle:

    -- Submitter: concorde03.mcs.surrey.ac.uk : : concorde03.mcs.surrey.ac.uk
    ID    OWNER   SUBMITTED  RUN_TIME  ST PRI SIZE CMD
    92.0  css1tt  3/3 14:    :00:00    I           counter
          css1tt  3/3 14:    :00:00    I           counter
    jobs; 2 idle, 0 running, 0 held

After a couple of minutes, when you run 'condor_q' again, you should expect to see the following:

    -- Submitter: concorde03.mcs.surrey.ac.uk : : concorde03.mcs.surrey.ac.uk
    ID    OWNER   SUBMITTED  RUN_TIME  ST PRI SIZE CMD
    0 jobs; 0 idle, 0 running, 0 held
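Rather than reading the condor_q listing by eye, a script can check the summary line to see whether any jobs remain in the queue. The sketch below assumes the summary has the form "N jobs; N idle, N running, N held" as in this tutorial; other Condor versions may print a different summary, and `parse_summary` and `all_done` are hypothetical helpers.

```python
import re

# Matches a summary such as "2 jobs; 2 idle, 0 running, 0 held".
SUMMARY_RE = re.compile(r"(\d+) jobs; (\d+) idle, (\d+) running, (\d+) held")


def parse_summary(text):
    """Return the job counts from condor_q output, or None if no summary is found."""
    match = SUMMARY_RE.search(text)
    if match is None:
        return None
    total, idle, running, held = (int(g) for g in match.groups())
    return {"total": total, "idle": idle, "running": running, "held": held}


def all_done(text):
    """True when the summary reports that no jobs remain in the queue."""
    summary = parse_summary(text)
    return summary is not None and summary["total"] == 0


print(parse_summary("2 jobs; 2 idle, 0 running, 0 held"))
print(all_done("0 jobs; 0 idle, 0 running, 0 held"))
```

An empty queue means the jobs have left it, so it is still worth checking the log and output files to confirm that they completed successfully.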

How to Run a Job in Condor: Java Universe

A Java program can run on any machine with a JVM. Unlike in the standard universe, jobs cannot be suspended and moved to another machine. However, if a failure occurs, the jobs can be restarted on another machine.

Create a Java file called 'Counter.java' and write the following lines to it:

    import java.lang.*;

    public class Counter {
        // Print the integers from args[0] up to, but not including, args[1].
        public static void main(String[] args) {
            int startt = Integer.parseInt(args[0]);
            int stopp = Integer.parseInt(args[1]);
            for (int i = startt; i < stopp; i++) {
                System.out.println(i);
            }
        }
    }

Then, compile the program:

    javac Counter.java

How to Run a Job in Condor: Java Universe

Next, create a submit description file. Recall that the file extension should be 'cmd', as in 'javaunitest.cmd'. Add the following lines to the file:

    universe = java
    executable = Counter.class
    log = counter.log
    arguments = Counter 1 30
    output = counter1.output
    error = counter1.error
    should_transfer_files = YES
    when_to_transfer_output = ON_EXIT
    queue
    arguments = Counter
    output = counter2.output
    error = counter2.error
    should_transfer_files = YES
    when_to_transfer_output = ON_EXIT
    queue
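When several runs differ only in their arguments and output files, the submit description file can be generated instead of typed. The sketch below writes one queue block per argument range, using the same commands as the file above. The `java_submit_file` helper is hypothetical, and the second range (30 to 60) is an invented example value.

```python
def java_submit_file(class_name, ranges, log="counter.log"):
    """Return the text of a Java-universe submit file with one job per range."""
    lines = [
        "universe = java",
        "executable = {}.class".format(class_name),
        "log = {}".format(log),
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        "",
    ]
    # Settings above apply to every job; each block below queues one job.
    for n, (start, stop) in enumerate(ranges, start=1):
        lines += [
            "arguments = {} {} {}".format(class_name, start, stop),
            "output = counter{}.output".format(n),
            "error = counter{}.error".format(n),
            "queue",
            "",
        ]
    return "\n".join(lines)


# The first range comes from the tutorial; the second is a made-up example.
text = java_submit_file("Counter", [(1, 30), (30, 60)])
print(text)
```

Because submit commands stay in effect until they are changed, the transfer settings only need to appear once at the top rather than in every block.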

How to Run a Job in Condor: Java Universe

To submit the jobs to Condor, run:

    condor_submit javaunitest.cmd

To inspect their status, type:

    condor_q

The output of the command will look like this:

    -- Submitter: concorde03.mcs.surrey.ac.uk : : concorde03.mcs.surrey.ac.uk
    ID    OWNER   SUBMITTED  RUN_TIME  ST PRI SIZE CMD
    98.0  css1tt  3/3 15:    :00:00    I           java Counter
          css1tt  3/3 15:    :00:00    I           java Counter
    jobs; 2 idle, 0 running, 0 held

If you notice a problem with a job, remove it from the Condor pool by running the following command, where ID is the job ID shown by condor_q:

    condor_rm ID

How to Run a Job in Condor: Vanilla Universe

Some applications, such as shell scripts, cannot be run in the standard or Java universes. Shell scripts can be used to call external applications such as Matlab.

Create a file called 'count.m' and write the following lines to it:

    function count(startt, stopp)
    for i=startt:stopp-1
        i
    end

To run the Matlab program, we have to start the Matlab application and then call our function from it. We do this with a script. Create a file called 'runmatlab.sh':

    #!/bin/sh
    echo "Number of arguments: $#"
    matlab -r "addpath /user/csckmst/css1tt/gt3/samples/csm23_2006/Tutorials/condortutorial/; count($1,$2);quit;"

How to Run a Job in Condor: Vanilla Universe

As a final step, prepare the description file. Create a file with the extension '.cmd', such as 'matlabtest.cmd':

    Universe = vanilla
    executable = /a/filer2/home/filer2/csckmst/css1tt/gt3/samples/csm23_2006/Tutorials/condortutorial/runmatlab.sh
    Initialdir = /a/filer2/home/filer2/csckmst/css1tt/gt3/samples/csm23_2006/Tutorials/condortutorial
    Requirements = Memory>=20 && Arch == "INTEL" && OpSys == "LINUX"
    Getenv = True
    Log = matlabpro.log

    # main matlab file to execute
    Arguments = 1 30
    Output = matlab1.out
    Error = matlab1.err
    transfer_input_files = count.m
    should_transfer_files = YES
    when_to_transfer_output = ON_EXIT_OR_EVICT
    Queue 1

    # main matlab file to execute
    Arguments =
    Output = matlab2.out
    Error = matlab2.err
    transfer_input_files = count.m
    should_transfer_files = YES
    when_to_transfer_output = ON_EXIT_OR_EVICT
    Queue 1

How to Run a Job in Condor: Vanilla Universe

To submit the job, run:

    condor_submit matlabtest.cmd

To see the output of your jobs, run:

    more matlab1.out
    more matlab2.out

EXERCISE: Submit the Matlab and Java counter programs together using the same description file. Both programs should be specified in the vanilla universe.