Greg Thain Computer Sciences Department University of Wisconsin-Madison cs.wisc.edu Interactive MPI on Demand.

1 Interactive MPI on Demand
Greg Thain, Computer Sciences Department, University of Wisconsin-Madison
gthain@cs.wisc.edu
http://www.cs.wisc.edu/condor

2 Unix Tool Philosophy
› 1) Individual tools do one thing well
› 2) Communicate via ASCII streams
› 3) Are composable

3 The Paradox
› Universal assent that it's good
› No one uses it
  - (Except for shell one-liners)
    grep ^abc | sort | uniq -c | sort -n
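To make the one-liner concrete, here is a small self-contained sketch of the same pipeline wrapped in a shell function. The function name `count_prefix` and the sample input are illustrative assumptions, not part of the talk. The final `awk` stage is an addition that only normalizes the column padding `uniq -c` emits.

```shell
#!/bin/sh
# count_prefix PREFIX: read lines on stdin, keep those starting with PREFIX,
# and print "count line" pairs in ascending order of count.
# (Hypothetical helper name, for illustration only.)
count_prefix() {
    grep "^$1" | sort | uniq -c | sort -n | awk '{ print $1, $2 }'
}

# Sample input piped through the composed tools.
printf 'abc1\nxyz\nabc2\nabc1\nabc1\nabc2\n' | count_prefix abc
```

Each stage does one job (filter, order, count, rank), and any stage can be swapped out without touching the others.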

4 More than just shell scripts
Division into Unix processes provides:
› Restartability
› Better security
› Scalability across multiple cores

5 For example…
› Qmail:
  - Secure, stable
  - Implemented across about a dozen processes

6 Getting back to Condor…
› Condor uses this in some places
  - x-GAHPs
  - condor_master
  - Replaceable shadow/starter pairs
  - Multi_shadow vs. many shadows
› But not everywhere
  - schedd

7 Condor Daemons as Components
› Very successful strategy:
  - Glide-in
  - Personal Condor
  - "Hoffman" and schedds as jobs
  - Condor-C

8 Case Study: MPI on Demand
› The problem:
  - Have a pool with lots of machines
  - Very long-running (weeks) vanilla jobs
  - Need to run big, but short, MPI jobs
  - Can't reboot startds
› Need dedicated scheduler
  - Requires dedicated machines

9 Possible Solutions
› Add "suspension slot"
  - Requires reboot
› Submit MPI job normally
  - Preempts vanilla job

10 COD refresher
› COD: Computing On Demand
  - No scheduling
  - No file transfer
  - When COD runs, vanilla job suspends: "checkpoint to swap"
  - Needs security on to work
  - Explicitly allowed

11 Startd as COD job
› Overview:
› Launch personal Condor
› Run startds as COD jobs on base pool
  - Report to personal Condor
  - Base jobs suspend
› Submit parallel job to personal Condor
› Remove COD startds

12 Startd under COD: Details
› Two condor_config files: careful!
› COD provides no file transfer
  - Can re-use existing startd binary
  - Need to pre-stage the config file or put it on NFS
› Don't lose the claimid!

13 Example code
    HOSTS="a b c"
    # Request a COD claim on each host and save its claim ID
    for h in $HOSTS; do
        condor_cod request -name $h > claimid.$h
    done
    # Activate each claim with the job ad in file "ja"
    for n in claimid.*; do
        condor_cod activate -id `cat $n` -jobad ja
    done
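The example covers requesting and activating claims, but the overview's last step (remove COD startds) is not shown. A minimal sketch of that cleanup follows, assuming the `claimid.*` files written by the request loop each hold one claim ID and that claims are given back with `condor_cod release -id`. The `CONDOR_COD` variable, the `release_claims` function name, and the dry-run default are illustrative assumptions so the sketch can be tried without a pool.

```shell
#!/bin/sh
# Command used to talk to COD. Defaults to a dry run that only prints the
# command line; set CONDOR_COD=condor_cod to really release the claims.
CONDOR_COD=${CONDOR_COD:-"echo condor_cod"}

# release_claims DIR: release every COD claim whose ID was saved in
# DIR/claimid.<host>, and delete the file only if the release succeeded,
# so a failed release can be retried later. (Hypothetical helper.)
release_claims() {
    dir=${1:-.}
    for f in "$dir"/claimid.*; do
        [ -f "$f" ] || continue      # glob matched nothing
        id=$(cat "$f")
        if $CONDOR_COD release -id "$id"; then
            rm -f "$f"
        else
            echo "release failed for $f; keeping claim ID" >&2
        fi
    done
}
```

Keeping the claim file until the release succeeds is one way to honor the "don't lose claimid" warning: without the ID there is no handle left for giving the machine back.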

14 COD JOB_AD
    CMD = "/nfs/path/run-startd.sh"
    IWD = "/tmp"
    Out = "startd.out"
    Err = "startd.err"
    Universe = 5
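Since COD does no file transfer, the job ad is just a small text file handed to `condor_cod activate -jobad`. One way to generate it from the driver script is sketched below. The function name `write_jobad` is hypothetical, and the attribute names and values are copied from the slide unchanged, so they should be checked against the COD documentation for the Condor version in use.

```shell
#!/bin/sh
# write_jobad FILE SCRIPT: write a COD job ad like the one on the slide,
# with SCRIPT as the command to run. Only the command path is a parameter;
# everything else is taken from the slide as-is.
write_jobad() {
    cat > "$1" <<EOF
CMD = "$2"
IWD = "/tmp"
Out = "startd.out"
Err = "startd.err"
Universe = 5
EOF
}
```

For the example above this would be called as `write_jobad ja /nfs/path/run-startd.sh` before the activate loop runs.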

15 run-startd.sh
    # Per-machine directories for the second Condor instance
    mkdir -p p-condor/{spool,log,execute}
    # Must be exported so condor_master sees it
    export CONDOR_CONFIG=/nfs/new_config
    exec /usr/sbin/condor_master -f -t
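The slides do not show the contents of `/nfs/new_config`. The sketch below is a guess at a minimal version, not the author's file. It assumes the `p-condor` directories end up under `/tmp` (the job ad's IWD), that the startd should report to the personal Condor's host, and that the standard `CONDOR_HOST`, `LOCAL_DIR`, `DAEMON_LIST`, `DedicatedScheduler` and `STARTD_ATTRS` settings are sufficient. The function name and host argument are hypothetical, and a real deployment will likely need more (security settings in particular).

```shell
#!/bin/sh
# write_startd_config FILE CM_HOST: write a minimal config for the
# COD-launched startd. CM_HOST is the machine running the personal Condor
# (hypothetical; the slides do not name it).
write_startd_config() {
    cat > "$1" <<EOF
# Report to the personal Condor, not the base pool.
CONDOR_HOST = $2
# Directories created by run-startd.sh (assumed to be under /tmp).
LOCAL_DIR = /tmp/p-condor
LOG = \$(LOCAL_DIR)/log
SPOOL = \$(LOCAL_DIR)/spool
EXECUTE = \$(LOCAL_DIR)/execute
# Only a master and a startd run on the borrowed machine.
DAEMON_LIST = MASTER, STARTD
# Advertise the personal Condor's schedd as the dedicated scheduler.
DedicatedScheduler = "DedicatedScheduler@$2"
STARTD_ATTRS = \$(STARTD_ATTRS), DedicatedScheduler
EOF
}
```

Keeping this file separate from the base pool's config is the "two condor_config files" hazard from the details slide: the COD-launched master must read this one, which is why `run-startd.sh` sets `CONDOR_CONFIG` before `exec`.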

16 Summary
› Use Condor daemons as components
› Mix and match as needed

17 Questions?
› Thank you!

