Published by Avis Wells. Modified over 9 years ago.
1
Interactive MPI on Demand
Greg Thain
Computer Sciences Department
University of Wisconsin-Madison
gthain@cs.wisc.edu
http://www.cs.wisc.edu/condor
2
Unix Tool Philosophy
› 1) Individual tools do one thing well
› 2) Communicate via ASCII streams
› 3) Are composable
3
The Paradox
› Universal assent that it's good
› No one uses it
    (Except for shell one-liners)
    grep ^abc | sort | uniq -c | sort -n
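As a small illustration of the one-liner on this slide, the sketch below wraps the same pipeline in a function and feeds it a few sample lines. The function name `count_abc` and the sample input are made up for the example; only the pipeline itself comes from the slide.

```shell
# Count how often each line starting with "abc" occurs, rarest first.
# Each tool does one job and passes plain text to the next.
count_abc() {
    grep '^abc' | sort | uniq -c | sort -n
}

# Sample input (invented for this example): two distinct "abc" lines.
printf 'abc one\nxyz\nabc two\nabc one\n' | count_abc
```

With this input the pipeline reports one occurrence of `abc two` followed by two of `abc one`; the exact column padding depends on the local `uniq`.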
4
More than just shell scripts
Division into Unix processes provides:
    Restartability
    Better security
    Scalability across multiple cores
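Restartability is the easiest of these three to show in a few lines. The sketch below is a toy supervisor, not anything taken from Condor: `supervise` and `flaky` are hypothetical names, and the worker is a stand-in that fails twice before succeeding. The point is only that a separate process can be rerun by its parent without the parent itself failing.

```shell
# supervise MAX CMD...: rerun CMD until it exits 0 or MAX attempts are used.
supervise() {
    max=$1; shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            echo "ok after $attempt attempt(s)"
            return 0
        fi
        echo "attempt $attempt failed, restarting" >&2
        attempt=$((attempt + 1))
    done
    return 1
}

# Stand-in worker (hypothetical): fails on its first two runs.
flaky() {
    tries=$((tries + 1))
    [ "$tries" -ge 3 ]
}

tries=0
supervise 5 flaky    # prints "ok after 3 attempt(s)" on stdout
```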
5
For example…
› Qmail: secure, stable
    Implemented across about a dozen processes
6
Getting back to Condor…
› Condor uses this in some places
    x-GAHPs
    condor_master
    Replaceable shadow/starter pairs
    Multi-shadow vs. many shadows
› But not everywhere
    schedd
7
Condor Daemons as Components
› Very successful strategy:
    Glide-in
    Personal Condor
    "Hoffman" and schedds as jobs
    Condor-C
8
Case Study: MPI on Demand
› The problem:
    Have a pool with lots of machines
    Very long-running (weeks) vanilla jobs
    Need to run big but short MPI jobs
    Can't reboot startds
› MPI needs the dedicated scheduler
    Requires dedicated machines
9
Possible Solutions
› Add a "suspension slot"
    Requires reboot
› Submit MPI job normally
    Preempts vanilla job
10
COD refresher
› COD: Computing On Demand
    No scheduling
    No file transfer
    When a COD job runs, the vanilla job suspends
    "Checkpoint to swap"
    Needs security turned on to work
    COD access must be explicitly allowed
11
Startd as COD job
› Overview:
› Launch personal Condor
› Run startds as COD jobs on base pool
    They report to the personal Condor
    Base jobs suspend
› Submit parallel job to personal Condor
› Remove COD startds
12
Startd under COD: Details
› Two condor_config files: careful!
› COD provides no file transfer
    Can re-use existing startd binary
    Need to pre-stage the config file or put it on NFS
› Don't lose the claim ID!
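To make the "two condor_config files" point concrete, here is a rough sketch of what the second file (the one read only by the startd launched under COD) might contain, written out by a small shell function. The function name, its arguments, and the assumption that the personal Condor's schedd runs on its central manager host are all illustrative. The macro names are standard Condor configuration settings, but the list is not complete: a real file would also carry over site-specific settings from the base pool's config.

```shell
# write_cod_startd_config FILE CENTRAL_MANAGER LOCAL_DIR
# Writes a minimal config for a startd that reports to the personal Condor.
# Assumption: the personal Condor's schedd runs on CENTRAL_MANAGER.
write_cod_startd_config() {
    cat > "$1" <<EOF
# Second config file: read only by the startd launched under COD.
CONDOR_HOST = $2
LOCAL_DIR = $3
LOG = \$(LOCAL_DIR)/log
SPOOL = \$(LOCAL_DIR)/spool
EXECUTE = \$(LOCAL_DIR)/execute
DAEMON_LIST = MASTER, STARTD
# Advertise this machine as owned by the personal Condor's dedicated scheduler.
DedicatedScheduler = "DedicatedScheduler@$2"
STARTD_ATTRS = \$(STARTD_ATTRS), DedicatedScheduler
START = True
EOF
}
```

A call such as `write_cod_startd_config /nfs/new_config cm.example.org /tmp/p-condor` (hostname invented) would put the file on NFS, which covers the pre-staging requirement because COD itself transfers nothing.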
13
Example code
    HOSTS="a b c"
    # Claim each machine and save its claim ID
    for h in $HOSTS; do
        condor_cod request -name $h > claimid.$h
    done
    # Start the job described in the job ad file "ja" on each claim
    for n in claimid.*; do
        condor_cod activate -id `cat $n` -jobad ja
    done
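The slide stops at activation. The last step of the overview, removing the COD startds, needs the saved claim IDs, which is why losing them is a problem. The sketch below shows one way that cleanup could look. `condor_cod release -id` is the real subcommand for giving up a claim; the `run` helper, the `DRY_RUN` switch, and the function name are hypothetical additions so the loop can be read and tried without a pool.

```shell
DRY_RUN=${DRY_RUN:-1}   # 1: print commands instead of running them
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# release_claims DIR: release every claim recorded in DIR/claimid.* so the
# COD startds exit and the suspended base-pool jobs can resume.
release_claims() {
    for f in "$1"/claimid.*; do
        [ -s "$f" ] || continue      # skip missing or empty claim files
        run condor_cod release -id "$(cat "$f")"
    done
}
```

Running `release_claims .` in the directory that holds the `claimid.*` files prints the release commands; setting `DRY_RUN=0` would execute them.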
14
COD JOB_AD
    CMD = "/nfs/path/run-startd.sh"
    IWD = "/tmp"
    Out = "startd.out"
    Err = "startd.err"
    JobUniverse = 5
15
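run-startd.sh
    #!/bin/bash
    # Local directories for the COD-launched Condor (relative to IWD)
    mkdir -p p-condor/{spool,log,execute}
    # Point the daemons at the second config file
    export CONDOR_CONFIG=/nfs/new_config
    exec /usr/sbin/condor_master -f -t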
16
Summary
› Use Condor daemons as components
› Mix and match as needed
17
Questions?
› Thank You!