Track 1: Cluster and Grid Computing NBCR Summer Institute Session 2.2: Cluster and Grid Computing: Case studies Condor introduction August 9, 2006 Nadya.


1 Track 1: Cluster and Grid Computing
NBCR Summer Institute
Session 2.2: Cluster and Grid Computing: Case studies
Condor introduction
August 9, 2006
Nadya Williams, nadya@sdsc.edu

2 Condor
  - developed at the University of Wisconsin; see http://www.cs.wisc.edu/condor
  - a system that creates a High-Throughput Computing (HTC) environment
  - a specialized workload management system for compute-intensive jobs
© 2005 UC Regents

3 Condor is:
  - a software system that runs on a cluster of workstations to harness wasted CPU cycles
  - a Condor pool consists of any number of machines
      - possibly of different architectures
      - possibly running different operating systems
      - connected by a network

4 Unique features
  - Transparent process checkpoint and migration
      - only migrates processes between machines of the same architecture
      - only migrates processes within its own server pool
  - Remote system calls
      - system calls are executed on the submit machine
  - ClassAds
  - Use of idle resources
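The slide only names ClassAds, so the following is a toy Python model of the general idea behind them: jobs and machines each advertise attributes and requirements, and a match needs both sides to accept each other. It is not Condor's ClassAd language or its negotiator; every attribute name, machine name, and the ranking rule here is an illustrative assumption.

```python
# Toy model of two-sided matchmaking in the spirit of ClassAds.
# NOT Condor's ClassAd syntax or negotiator logic; all names are hypothetical.

def matches(job, machine):
    """A match requires the job to accept the machine and vice versa."""
    return job["requirements"](machine) and machine["requirements"](job)


def best_match(job, machines):
    """Return the acceptable machine that the job ranks highest, or None."""
    candidates = [m for m in machines if matches(job, m)]
    if not candidates:
        return None
    return max(candidates, key=job["rank"])


machines = [
    {"name": "compute-0-0", "arch": "INTEL", "memory_mb": 512,
     "requirements": lambda job: True},
    {"name": "compute-0-1", "arch": "INTEL", "memory_mb": 2048,
     "requirements": lambda job: job["owner"] != "guest"},
    {"name": "compute-0-2", "arch": "PPC", "memory_mb": 4096,
     "requirements": lambda job: True},
]

job = {
    "owner": "alice",
    # The job insists on one architecture and a minimum amount of memory.
    "requirements": lambda m: m["arch"] == "INTEL" and m["memory_mb"] >= 256,
    # Among acceptable machines, prefer the one with the most memory.
    "rank": lambda m: m["memory_mb"],
}

chosen = best_match(job, machines)
print(chosen["name"])
```

The machine with the most memory overall is skipped because it fails the job's architecture requirement, which is the point of evaluating requirements before rank.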

5 When to use Condor?
  - parameter studies
  - embarrassingly parallel problems
  - high-throughput computing where subjobs do not need to communicate
  - long computations
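A parameter study maps naturally onto one submit description that queues many independent jobs. The Python sketch below builds such a file as text. The executable name `./simulate` and the `sweep.*` file names are hypothetical, and it assumes the program accepts its parameter index as a single command-line argument. `$(Process)` is the per-job index macro, which takes the values 0 to N-1 for `queue N`.

```python
def sweep_submit_description(executable, n_jobs, universe="vanilla"):
    """Return submit-description text that queues n_jobs independent runs.

    Each run receives its own index through $(Process) and writes to its
    own output and error files, so the subjobs never need to communicate.
    """
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    lines = [
        f"universe   = {universe}",
        f"executable = {executable}",
        "arguments  = $(Process)",
        "output     = sweep.$(Process).out",
        "error      = sweep.$(Process).err",
        "log        = sweep.log",
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical example: ten runs of a program called ./simulate.
print(sweep_submit_description("./simulate", 10), end="")
```

The resulting text would be saved to a file and passed to `condor_submit`.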

6 Condor Pool on Rocks
  - one Condor pool per Rocks cluster
  - frontend:
      - central manager
      - submit
  - compute node:
      - submit
      - execute

7 Condor daemons
  - frontend
      condor_master      manages other daemons
      condor_collector   collects info about computers and jobs
      condor_negotiator  decides what/where to run
      condor_schedd      allows job submission
      condor_shadow      watches the running job (only when jobs are active)
  - compute node
      condor_master      manages other daemons
      condor_startd      allows jobs to be started
      condor_schedd      allows job submission

8 Basic commands
      condor_q           shows the job queue
      condor_submit      submits a job
      condor_rm          removes jobs from the queue
      condor_compile     links with the Condor libraries
      condor_config_val  queries configuration values
      condor_status      shows pool status

9 Roadmap to run Condor jobs
  - code preparation
      - the job must run as a background batch job (no user I/O)
      - if the program must read input, create files with the needed input/keystrokes
      - if possible, relink with the Condor libraries
  - create submit description files
  - submit jobs
  - monitor jobs
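The code-preparation step above can be sketched in Python as follows: the keystrokes an interactive program would expect are written to a file, and the submit description points the job's standard input at that file with the `input` command. The file names `job.in` and `job.sub`, the executable `./ask_and_compute`, and the sample keystrokes are all hypothetical.

```python
import tempfile
from pathlib import Path


def prepare_batch_job(workdir, executable, keystrokes):
    """Write a stdin file and a submit description; return the submit path.

    keystrokes is a list of lines the program would otherwise read from
    the keyboard, in the order it asks for them.
    """
    workdir = Path(workdir)
    (workdir / "job.in").write_text("\n".join(keystrokes) + "\n")

    submit_text = "\n".join([
        "universe   = vanilla",
        f"executable = {executable}",
        "input      = job.in",   # replaces interactive keyboard input
        "output     = job.out",
        "error      = job.err",
        "log        = job.log",
        "queue",
    ]) + "\n"
    submit_path = workdir / "job.sub"
    submit_path.write_text(submit_text)
    return submit_path


# Demo in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    path = prepare_batch_job(tmp, "./ask_and_compute", ["y", "100", "quit"])
    print(path.read_text(), end="")
```

Submitting and monitoring would then use `condor_submit` on the generated file and `condor_q`, as listed under the basic commands.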

10 Condor universes
Universe: the run-time environment
  - Standard
      - Handles system calls by returning them to the submit machine
      - Provides mechanisms to checkpoint and migrate a partially completed job
      - Must relink with the Condor libraries
  - Vanilla
      - No checkpoint or migration
      - No relinking (3rd-party binary)
      - Input/output files reside on a shared file system or use the Condor transfer mechanism
  - PVM
  - MPI (deprecated)
  - Java
  - Parallel
  - Globus

11 DAG jobs
  - Complex sequence of jobs
  [Figure: job dependency graph with nodes A, B1, B2, B3, C]
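The figure's edges do not survive in this transcript, so the Python sketch below assumes a diamond shape: A runs first, B1, B2 and B3 run after A, and C runs after all three. That shape is an assumption, as are the per-node submit file names. The sketch emits `JOB` and `PARENT ... CHILD` lines in the DAGMan input-file style and computes one valid run order.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# ASSUMPTION: the edges below are a guess at the slide's figure.
children = {
    "A": ["B1", "B2", "B3"],
    "B1": ["C"],
    "B2": ["C"],
    "B3": ["C"],
    "C": [],
}


def dag_file(children):
    """Return DAG description text; submit file names are hypothetical."""
    lines = [f"JOB {name} {name.lower()}.sub" for name in children]
    for parent, kids in children.items():
        if kids:
            lines.append(f"PARENT {parent} CHILD {' '.join(kids)}")
    return "\n".join(lines) + "\n"


def run_order(children):
    """Return one order in which every job follows all of its parents."""
    predecessors = {name: set() for name in children}
    for parent, kids in children.items():
        for kid in kids:
            predecessors[kid].add(parent)
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    return list(TopologicalSorter(predecessors).static_order())


print(dag_file(children), end="")
print(run_order(children))
```

A file like this is what `condor_submit_dag` takes as input; the ordering function is only there to show what the dependencies imply.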

12 More info on jobs
  - Condor documentation: http://www.cs.wisc.edu/condor/manual/v6.7.12
  - For security reasons, do not run jobs
      - as root
      - as any user with GID=0 (wheel)
  - Length limits in submit files:
      - path names < 256
      - command line args < 4096
  - If there is an error, check the log files
      - specified by the user for a job
      - specified by the admin in the config files
  - To find the config files in use: condor_config_val -config
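The length limits above are easy to check before submitting. The Python sketch below is a minimal pre-flight check; it assumes the limits are counted in characters and apply to each path and to the whole argument string, which the slide does not spell out. The function name and messages are hypothetical.

```python
MAX_PATH_LEN = 256    # path names must be shorter than this (from the slide)
MAX_ARGS_LEN = 4096   # command-line args must be shorter than this (from the slide)


def check_submit_limits(paths, arguments):
    """Return a list of human-readable problems; empty means no violation.

    paths     -- iterable of path strings used in the submit file
    arguments -- the full command-line argument string
    """
    problems = []
    for path in paths:
        if len(path) >= MAX_PATH_LEN:
            problems.append(
                f"path too long ({len(path)} chars): {path[:40]}..."
            )
    if len(arguments) >= MAX_ARGS_LEN:
        problems.append(f"arguments too long ({len(arguments)} chars)")
    return problems


# Hypothetical example: one acceptable path and one that is too long.
issues = check_submit_limits(["/home/user/run.out", "/x" * 200], "--steps 100")
for issue in issues:
    print(issue)
```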

