Published by Isabelle Caswell. Modified over 9 years ago.
1
Condor Parallel Universe
Greg Thain
Computer Sciences Department
University of Wisconsin-Madison
gthain@cs.wisc.edu
http://www.cs.wisc.edu/condor
2
Overview
› Task vs. Job Parallelism
› New Condor support for Task-Parallelism
› Other goodies
3
The Talk in One Slide
Parallel Universe can run any* task-parallel job
  Not just MPICH 1.2.4
  Not just MPI…
4
Job vs Task Parallelism
› Condor has historically focused on job parallelism
› Job parallelism is set up either manually or via DAGMan
› The rest of this talk covers task parallelism
› Task parallelism is also available via PVM or MW
5
Parallel Universe
› Adaptation of MPI universe
› Modifications based on experience with MPI
› User feedback
› But, more than just MPI
6
MPI lifecycle without Condor
› LAM version
1. lamboot
     lamboot -ssi boot ssh machine_file
2. mpirun
     mpirun -np 8 exe arg1 arg2...
3. lamhalt
     lamhalt
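The three manual steps above can be sketched as a single wrapper script. This is only an illustration under stated assumptions: the `run` helper, the `lam_lifecycle` function name, and the `DRY_RUN` switch are hypothetical additions (not part of LAM or Condor), and the command lines are copied from the slide.

```shell
#!/bin/sh
# Sketch of the three-step LAM lifecycle from the slide, wrapped in one script.
# DRY_RUN is a hypothetical switch added for illustration: when it is 1,
# commands are printed instead of executed.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

lam_lifecycle() {
    machine_file=$1   # file listing one host per line
    nprocs=$2         # number of MPI processes
    shift 2           # remaining arguments: executable and its arguments

    # 1. Start the LAM daemons on every host; give up if that fails.
    run lamboot -ssi boot ssh "$machine_file" || return 1
    # 2. Run the MPI job and remember its exit status.
    run mpirun -np "$nprocs" "$@"
    status=$?
    # 3. Shut LAM down even when the job failed.
    run lamhalt
    return "$status"
}

# Example: print the commands without running them.
DRY_RUN=1
lam_lifecycle machine_file 8 exe arg1 arg2
```

Running `lamhalt` regardless of the `mpirun` exit status is a design choice of this sketch, so that a failed job does not leave daemons behind.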
7
Scheduling
› Need "Dedicated Scheduler"
    "Dedicated" has a specific Condor meaning
    Nodes running MPI require a dedicated scheduler
    A given machine can have many opportunistic schedulers
    ... but only 1 dedicated scheduler
8
DedicatedScheduler surprises
› DedicatedScheduler co-opts the normal negotiation cycle
› Preemption and scheduling work differently than in opportunistic scheduling
› DedicatedScheduler schedules First-Fit, sorted by UserJobPrio
› condor_q -analyze mystery!
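To make "First-Fit, sorted by UserJobPrio" concrete, here is a toy first-fit pass. It is not Condor's scheduler code: the input format, the `first_fit` function, and the rule that a larger number means higher priority are all assumptions made for this illustration.

```shell
#!/bin/sh
# Illustrative first-fit pass, NOT Condor's actual scheduler code.
# Input lines (hypothetical format): "<job_name> <priority> <machines_needed>"
# Assumption: a larger priority number is considered first.
# Jobs are visited in priority order; each job that fits in the machines
# still free is started, and jobs that do not fit are skipped.

first_fit() {
    free=$1   # number of idle dedicated machines
    sort -k2,2nr | while read -r name prio need; do
        if [ "$need" -le "$free" ]; then
            echo "start $name ($need machines)"
            free=$((free - need))
        else
            echo "skip $name (needs $need, $free free)"
        fi
    done
}

# Three hypothetical jobs competing for 8 machines.
printf '%s\n' 'jobA 5 6' 'jobB 10 4' 'jobC 1 2' | first_fit 8
```

In this toy run the lower-priority `jobC` starts while `jobA` is skipped, because `jobA` no longer fits once `jobB` has taken its machines. That kind of outcome is one way a first-fit policy can surprise users.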
9
Job startup
› Same file transfer, etc. as Vanilla
› One shadow, many starters
› Starter runs sshd on all machines, does key exchange
› Starter runs the exe on first machine (head node, rank 0)
10
Your Script Here
› Script on the head node has contact file
› We provide samples for LAM, MPICH
› We try to mimic "by hand" startup
› Use condor_ssh to start remote jobs
› When script exits, Condor cleans up
11
Parallel Example
[Diagram: Submit Machine (Schedd, Shadow) and Execute Machines (Startd, Starter, Sshd, Script, Job)]
12
Example submit file
    Universe = Parallel
    # executable is a script
    executable = script
    # the real binary
    transfer_input_files = executable
    arguments = arg1 arg2 arg3
    machine_count = 8
    output = out.$(Cluster).$(NODE)
    queue
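The `output = out.$(Cluster).$(NODE)` line gives each node its own output file. The sketch below lists the names that pattern would yield. It assumes that node numbers run from 0 to machine_count - 1; the `output_names` function and the cluster ID 42 are hypothetical.

```shell
#!/bin/sh
# Prints the output file names that "out.$(Cluster).$(NODE)" would produce.
# Assumption: $(NODE) takes the values 0 .. machine_count - 1.

output_names() {
    cluster=$1
    machine_count=$2
    node=0
    while [ "$node" -lt "$machine_count" ]; do
        echo "out.$cluster.$node"
        node=$((node + 1))
    done
}

# Hypothetical cluster ID 42 with the slide's machine_count of 8.
output_names 42 8
```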
13
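Example Script
    # the transferred binary may have lost its execute bit
    chmod 755 simple
    # boot LAM on the machines Condor allocated
    lamboot -ssi boot rsh $MACHINE_FILE
    # run one process per allocated machine, then shut LAM down
    mpirun -np $NO_MACHINES simple
    lamhalt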
14
Example submit file 2
    Universe = Parallel
    Requirements = (Hostname == "somemachine")
    queue
    Requirements = (Hostname != "somemachine")
    queue 7
15
Example Script 2
    mach1=`sed -n 1p $MACHINE_FILE`
    mach2=`sed -n 2p $MACHINE_FILE`
    ./server &
    ssh $mach1 client_app
    ssh $mach2 client_app
    wait
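The script above names two machines explicitly. A possible generalization, sketched here, starts one client on every host listed in the machine file. The `start_clients` function and the `LAUNCHER` override are hypothetical names for this illustration; plain `ssh` is the default, as on the slide.

```shell
#!/bin/sh
# Sketch: start one client on every host listed in the machine file.
# LAUNCHER is a hypothetical override; it defaults to ssh as in the slide.

start_clients() {
    machine_file=$1
    shift                        # remaining arguments: the remote command
    while read -r host; do
        [ -n "$host" ] || continue   # skip blank lines
        # Redirect stdin so the launcher cannot consume the host list.
        ${LAUNCHER:-ssh} "$host" "$@" </dev/null &
    done < "$machine_file"
    wait                         # block until every client has exited
}

# Typical use inside the head-node script (not executed here):
#   start_clients "$MACHINE_FILE" client_app
```

Unlike the slide's version, this sketch runs all clients in the background and waits for them together, so one slow client does not delay the start of the others.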
16
Summary
› With Parallel Universe in Condor 6.8 comes:
    › Support for most MPI implementations (some scripting required)
    › Somewhat better MPI scheduling
    › Better node placement via Condor matchmaking
17
Questions?
› Thank you!