Master-Worker Tutorial Condor Week 2006
Agenda
  - What is M-W
  - When to use M-W
  - How to build a simple M-W application
  - Q & A
Why M-W?
  - M-W addresses a weakness in Condor: short jobs
  - Also useful for dynamic, parallel workflows
A Condor Job… A Condor job is like money in the bank
An easy solution: Why not just wrap up smaller jobs into a bigger Condor job?
  - Partial failures?
  - Load balancing?
  - Dynamic creation of work?
Solution: Lightweight Tasks Multiplexed on top of Jobs
  - Process : Thread :: Condor Job : MW Task
  - An MW task is dispatched in milliseconds; a Condor job can take minutes
  - An MW Task is like money in your pocket!
MW is…
  - A C++ framework
  - That reuses Condor worker jobs
  - So that each one runs many tasks
  - Resulting in a highly parallel application
MW is not
  - MPI
  - A general parallel programming scheme
MW in action
  [Figure: diagram labeled with the submit machine, condor_submit, the master exe, workers, and tasks (T)]
You Must Write 3 Classes
  Subclasses of …
  - MWDriver
  - MWTask
  - MWWorker
  [Figure labels: Master exe, Worker exe]
Your_MWTask
  - Subclass MWTask
  - Data members for inputs
  - Data members for results
  - Serialization of inputs and results
  - Distinct instances on the master side and the worker side
The Four Task Methods
  void MyTask::pack_work(void);
  void MyTask::unpack_work(void);
  void MyTask::pack_results(void);
  void MyTask::unpack_results(void);
  Also ctor/dtor!
RMComms
  - Abstraction for communication (and some other stuff…)
  RMC->pack(int *array, int length);
  RMC->unpack(int *array, int length);
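The four task methods and the pack/unpack calls fit together roughly as sketched below. This is a minimal single-process illustration, not the MW library itself: SketchComm, RangeSumTask, and round_trip_sum are hypothetical stand-ins, and the FIFO buffer only imitates what a real communication layer would send between master and worker.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the communication layer (NOT the real RMComm).
// Packed ints go into a FIFO buffer so one process can play both sides.
class SketchComm {
public:
    void pack(const int *array, int length) {
        buffer_.insert(buffer_.end(), array, array + length);
    }
    void unpack(int *array, int length) {
        for (int i = 0; i < length; ++i) {
            array[i] = buffer_.at(read_pos_++);
        }
    }
private:
    std::vector<int> buffer_;
    std::size_t read_pos_ = 0;
};

// Global handle named RMC to mirror the slide; assumed to be set before use.
static SketchComm *RMC = nullptr;

// Hypothetical task: sum the integers in [first, last].
class RangeSumTask {
public:
    int first = 0;  // input
    int last = 0;   // input
    int sum = 0;    // result

    void pack_work()      { RMC->pack(&first, 1); RMC->pack(&last, 1); }
    void unpack_work()    { RMC->unpack(&first, 1); RMC->unpack(&last, 1); }
    void pack_results()   { RMC->pack(&sum, 1); }
    void unpack_results() { RMC->unpack(&sum, 1); }
};

// Simulates one master -> worker -> master round trip in a single process.
int round_trip_sum(int first, int last) {
    SketchComm channel;
    RMC = &channel;

    RangeSumTask master_copy;          // instance on the master side
    master_copy.first = first;
    master_copy.last = last;
    master_copy.pack_work();           // master serializes the inputs

    RangeSumTask worker_copy;          // distinct instance on the worker side
    worker_copy.unpack_work();         // worker deserializes the inputs
    for (int i = worker_copy.first; i <= worker_copy.last; ++i) {
        worker_copy.sum += i;
    }
    worker_copy.pack_results();        // worker serializes the result

    master_copy.unpack_results();      // master deserializes the result
    RMC = nullptr;
    return master_copy.sum;
}
```

Calling round_trip_sum(1, 10) returns 55. The point of the sketch is the symmetry: whatever pack_work writes, unpack_work must read back in the same order, and likewise for the results pair.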
MWWorker
  - Just one method: executeTask(MWTask *t)
  - Also ctor/dtor!
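A worker's executeTask might look something like the sketch below. The base classes SketchTask and SketchWorker are hypothetical stand-ins for MWTask and MWWorker so that the example compiles on its own; only the method name executeTask comes from the slide.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for MWTask (polymorphic so it can be downcast).
struct SketchTask {
    virtual ~SketchTask() = default;
};

// Hypothetical user task: square one integer.
struct SquareTask : SketchTask {
    int input = 0;
    int result = 0;
};

// Hypothetical stand-in for MWWorker.
class SketchWorker {
public:
    virtual ~SketchWorker() = default;
    virtual void executeTask(SketchTask *t) = 0;
};

class SquareWorker : public SketchWorker {
public:
    void executeTask(SketchTask *t) override {
        // The framework hands over a base-class pointer; recover the user type.
        SquareTask *task = dynamic_cast<SquareTask *>(t);
        if (task == nullptr) {
            return;  // not a task this worker understands
        }
        task->result = task->input * task->input;
    }
};

// One worker object runs many tasks, which is the reuse MW is built around.
int sum_of_squares_via_worker(const std::vector<int> &inputs) {
    SquareWorker worker;
    int total = 0;
    for (int value : inputs) {
        SquareTask task;
        task.input = value;
        worker.executeTask(&task);
        total += task.result;
    }
    return total;
}
```

With inputs {1, 2, 3} the helper returns 14. The worker holds no per-task state here, so the same object can be handed task after task.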
MWDriver
  get_userinfo(int argc, char **argv)
  RMC->add_executable(char *exe, char *requirements);
  setup_initial_tasks(int num_tasks, MWTask ***init_tasks)
  act_on_completed_task(MWTask *t)
  RMC->add_task(MWTask *t)
  Also ctor/dtor
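The driver callbacks can be pictured with the in-process sketch below. Everything here is a hypothetical stand-in (SketchDriver, SketchTask, DoublingDriver, run); the loop in run() only imitates what the framework would do with remote workers. As an assumption of this sketch, the task count is passed by pointer so the subclass can report how many initial tasks it created, and add_task is a method of the stand-in driver rather than a call through RMC.

```cpp
#include <cassert>
#include <deque>

// Hypothetical stand-in for MWTask.
struct SketchTask {
    int input = 0;
    int result = 0;
};

// Hypothetical stand-in for MWDriver with an in-process "master loop".
class SketchDriver {
public:
    virtual ~SketchDriver() = default;
    virtual void setup_initial_tasks(int *num_tasks, SketchTask ***init_tasks) = 0;
    virtual void act_on_completed_task(SketchTask *t) = 0;

    // Queue more work at any time, including from act_on_completed_task.
    void add_task(SketchTask *t) { pending_.push_back(t); }

    void run() {
        int count = 0;
        SketchTask **initial = nullptr;
        setup_initial_tasks(&count, &initial);
        for (int i = 0; i < count; ++i) {
            add_task(initial[i]);
        }
        delete[] initial;

        while (!pending_.empty()) {
            SketchTask *task = pending_.front();
            pending_.pop_front();
            task->result = task->input * 2;  // pretend a worker ran the task
            act_on_completed_task(task);
            delete task;
        }
    }

private:
    std::deque<SketchTask *> pending_;
};

class DoublingDriver : public SketchDriver {
public:
    int total = 0;

    void setup_initial_tasks(int *num_tasks, SketchTask ***init_tasks) override {
        *num_tasks = 3;
        *init_tasks = new SketchTask *[3];
        for (int i = 0; i < 3; ++i) {
            (*init_tasks)[i] = new SketchTask;
            (*init_tasks)[i]->input = i + 1;
        }
    }

    void act_on_completed_task(SketchTask *t) override {
        total += t->result;
        if (t->input == 3) {
            // Dynamic creation of work: one result spawns a follow-up task.
            SketchTask *extra = new SketchTask;
            extra->input = 10;
            add_task(extra);
        }
    }
};

int run_doubling_driver() {
    DoublingDriver driver;
    driver.run();
    return driver.total;
}
```

The three initial tasks contribute 2 + 4 + 6 and the follow-up task contributes 20, so run_doubling_driver() returns 32. Creating work from inside act_on_completed_task is what a single wrapped-up Condor job cannot easily do.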
Putting it all together: new_skel
  ./new_skel MY_PROJECT
  Use configure --help for options
  make
Debugging with Independent Mode
  - Special RMComm for debugging
  - Runs as a single process, so it can run under gdb
Running on the Grid…
  - Just launch the appropriate master
  - Run condor_q to see it in action
Advice for Large Runs
  - Use a personal Condor
  - Use checkpointing!
  - Flock, glide-in, schedd-on-side, hobblein
  - Use checkpointing!
  - Set_worker_increment high
User-level Checkpointing
  MWTask::write_chkpt_info(FILE *)
  MWTask::read_chkpt_info(FILE *)
  MWDriver::read_master_state(FILE *)
  MWDriver::write_master_state(FILE *)
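A rough sketch of what the four checkpoint hooks do: each object writes its own state to a FILE and can read it back later. The structs and field names below are hypothetical stand-ins, the text format is an arbitrary choice, and the bool return values on the read side are an assumption of this sketch rather than the MW signatures.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for a user task with checkpoint hooks.
struct SketchTask {
    int input = 0;
    int result = 0;

    void write_chkpt_info(std::FILE *f) const {
        std::fprintf(f, "%d %d\n", input, result);
    }
    bool read_chkpt_info(std::FILE *f) {
        return std::fscanf(f, "%d %d", &input, &result) == 2;
    }
};

// Hypothetical stand-in for the master-side state a driver would save.
struct SketchMasterState {
    int tasks_completed = 0;
    int best_value = 0;

    void write_master_state(std::FILE *f) const {
        std::fprintf(f, "%d %d\n", tasks_completed, best_value);
    }
    bool read_master_state(std::FILE *f) {
        return std::fscanf(f, "%d %d", &tasks_completed, &best_value) == 2;
    }
};

// Writes master state and one task to a temporary file, then restores both.
bool checkpoint_round_trip() {
    std::FILE *f = std::tmpfile();
    if (f == nullptr) {
        return false;
    }

    SketchMasterState saved_state;
    saved_state.tasks_completed = 41;
    saved_state.best_value = 7;
    SketchTask saved_task;
    saved_task.input = 5;
    saved_task.result = 25;

    saved_state.write_master_state(f);
    saved_task.write_chkpt_info(f);

    std::rewind(f);  // simulate a restart that reads the checkpoint back

    SketchMasterState restored_state;
    SketchTask restored_task;
    bool ok = restored_state.read_master_state(f) &&
              restored_task.read_chkpt_info(f);
    std::fclose(f);

    return ok &&
           restored_state.tasks_completed == saved_state.tasks_completed &&
           restored_state.best_value == saved_state.best_value &&
           restored_task.input == saved_task.input &&
           restored_task.result == saved_task.result;
}
```

The read and write functions must agree on order and format, in the same way the pack and unpack methods do. Here the master state is written first and therefore read first.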
Example codes with MW
  - Matmul
  - Blackbox
  - knapsack
MW Philosophy
  - Reuse either code or concept
  - Key idea: Late binding
Other resources
  - http://www.cs.wisc.edu/condor/mw
  - Online manual
  - MW-users mailing list
Thank You! Questions? MW Home page: http://www.cs.wisc.edu/condor/mw