1
Master-Worker Tutorial Condor Week 2006
2
Agenda What is M-W When to use M-W
How to build a simple M-W application Q & A
3
Why M-W? M-W addresses a weakness in Condor: short jobs.
It is also suited to dynamic, parallel workflows.
4
A Condor Job… A Condor job is like money in the bank
5
An easy solution: Why not just wrap up smaller jobs into a bigger Condor job? Partial failures? Load balancing? Dynamic creation of work?
6
Solution: Lightweight Tasks Multiplexed on top of Jobs
Process : Thread :: Condor Job : MW Task. An MWTask dispatches in milliseconds; a Condor job can take minutes. An MW Task is like money in your pocket!
7
MW is… a C++ framework for re-using Condor worker jobs,
so that each one runs many tasks. The result is a very parallel application.
8
MW is not… MPI, or a general parallel programming scheme
9
MW in action
[Figure; labels: Master exe, Worker, T (task), condor_submit, Submit machine]
10
You Must Write 3 Classes: subclasses of MWDriver, MWTask, and MWWorker
[Figure labels: Master exe, Worker exe]
11
Your_MWTask: subclass MWTask. Data members for inputs.
Data member for results. Serialization of inputs and results. Distinct instances on each side.
12
The Four Task Methods
void MyTask::pack_work(void);
void MyTask::unpack_work(void);
void MyTask::pack_results(void);
void MyTask::unpack_results(void);
Also ctor/dtor!
13
RMComms: abstraction for communication (and some other stuff…)
RMC->pack(int *array, int length);
RMC->unpack(int *array, int length);
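The four task methods and the pack/unpack calls fit together roughly as in the sketch below. It is not MW code: SketchComm, SumTask and round_trip_sum are hypothetical stand-ins invented for illustration, and the FIFO byte buffer only imitates the idea of a communication layer. What it shows is that the master-side instance packs its inputs, a distinct worker-side instance unpacks them in the same order, and the result travels back the same way.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the communication layer: a FIFO byte buffer.
// The real RMComm classes are not reproduced here.
class SketchComm {
public:
    // Append `length` ints to the buffer.
    void pack(const int *array, int length) {
        if (length <= 0) return;
        const unsigned char *p = reinterpret_cast<const unsigned char *>(array);
        buf_.insert(buf_.end(), p, p + sizeof(int) * static_cast<std::size_t>(length));
    }
    // Read `length` ints back, in the order they were packed.
    void unpack(int *array, int length) {
        if (length <= 0) return;
        std::size_t n = sizeof(int) * static_cast<std::size_t>(length);
        assert(read_ + n <= buf_.size());
        std::memcpy(array, buf_.data() + read_, n);
        read_ += n;
    }
private:
    std::vector<unsigned char> buf_;
    std::size_t read_ = 0;
};

// Hypothetical task: inputs are a list of ints, the result is their sum.
class SumTask {
public:
    explicit SumTask(SketchComm *rmc) : RMC(rmc) {}

    std::vector<int> inputs;  // data members for inputs
    int result = 0;           // data member for results

    void pack_work() {        // master side: serialize inputs
        int n = static_cast<int>(inputs.size());
        RMC->pack(&n, 1);
        RMC->pack(inputs.data(), n);
    }
    void unpack_work() {      // worker side: same order as pack_work
        int n = 0;
        RMC->unpack(&n, 1);
        inputs.resize(static_cast<std::size_t>(n));
        RMC->unpack(inputs.data(), n);
    }
    void pack_results() { RMC->pack(&result, 1); }      // worker side
    void unpack_results() { RMC->unpack(&result, 1); }  // master side

private:
    SketchComm *RMC;
};

// One full round trip through a single in-memory channel.
int round_trip_sum(const std::vector<int> &values) {
    SketchComm wire;

    SumTask master_copy(&wire);
    master_copy.inputs = values;
    master_copy.pack_work();

    SumTask worker_copy(&wire);  // distinct instance on the worker side
    worker_copy.unpack_work();
    for (int v : worker_copy.inputs) worker_copy.result += v;
    worker_copy.pack_results();

    master_copy.unpack_results();
    return master_copy.result;
}
```

The point to carry over is symmetry: every unpack call must mirror a pack call in the same order and with the same length.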
14
MWWorker Just one method: executeTask(MWTask *t) Also ctor/dtor!
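A minimal sketch of what the worker's single method might look like. SketchTask, SketchWorker, SquareSumTask and SquareSumWorker are hypothetical stand-ins rather than MW classes; only the shape of executeTask(task pointer) follows the slide. The worker receives a base-class pointer, recovers its own task type, and fills in the result member.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the framework base classes.
struct SketchTask {
    virtual ~SketchTask() {}
};

class SketchWorker {
public:
    virtual ~SketchWorker() {}
    virtual void executeTask(SketchTask *t) = 0;
};

// Application task: inputs in, one result out.
struct SquareSumTask : SketchTask {
    std::vector<int> inputs;
    long result = 0;
};

// Application worker: the only logic it needs is executeTask.
class SquareSumWorker : public SketchWorker {
public:
    void executeTask(SketchTask *t) override {
        // The framework hands over a base pointer; recover the concrete type.
        SquareSumTask *task = dynamic_cast<SquareSumTask *>(t);
        assert(task != nullptr);
        task->result = 0;
        for (int v : task->inputs) task->result += static_cast<long>(v) * v;
    }
};

// Helper for illustration: run one task through a worker.
long square_sum_via_worker(const std::vector<int> &inputs) {
    SquareSumTask task;
    task.inputs = inputs;
    SquareSumWorker worker;
    worker.executeTask(&task);
    return task.result;
}
```

The same worker object can be handed one task after another, which is how a single worker job ends up running many tasks.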
15
MWDriver
get_userinfo(int argc, char **argv)
RMC->add_executable(char *exe, char *requirements);
setup_initial_tasks(int num_tasks, MWTask ***init_tasks)
act_on_completed_task(MWTask *t)
RMC->add_task(MWTask *t)
Also ctor/dtor
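The driver's role can be sketched as a loop over a task queue. Everything here is a hypothetical stand-in (SketchDriver, RangeTask, kChunk, execute_task, run_sketch); the method names echo the slide, but the signatures are simplified and the tasks run in one process instead of on remote workers. The sketch sums the integers in a range: each task handles at most kChunk numbers, and act_on_completed_task adds a follow-up task for the remainder, which illustrates dynamic creation of work.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>

// Hypothetical task: sum the integers in [lo, hi), at most kChunk per task.
struct RangeTask {
    int lo;
    int hi;
    long result;
};

const int kChunk = 10;

// Stand-in for what a worker would do with one task.
void execute_task(RangeTask &t) {
    int stop = std::min(t.lo + kChunk, t.hi);
    t.result = 0;
    for (int i = t.lo; i < stop; ++i) t.result += i;
}

// Hypothetical driver; simplified signatures, not the MWDriver API.
class SketchDriver {
public:
    // Create the first batch of work: task i covers [25*i, 25*(i+1)).
    void setup_initial_tasks(int num_tasks) {
        for (int i = 0; i < num_tasks; ++i)
            add_task(RangeTask{i * 25, (i + 1) * 25, 0});
    }
    // Called once per finished task; may create more work.
    void act_on_completed_task(const RangeTask &t) {
        total_ += t.result;
        ++completed_;
        if (t.lo + kChunk < t.hi)
            add_task(RangeTask{t.lo + kChunk, t.hi, 0});
    }
    void add_task(const RangeTask &t) { todo_.push_back(t); }

    // Single-process loop standing in for dispatch to workers.
    void go() {
        while (!todo_.empty()) {
            RangeTask t = todo_.front();
            todo_.pop_front();
            execute_task(t);
            act_on_completed_task(t);
        }
    }
    long total() const { return total_; }
    int completed() const { return completed_; }

private:
    std::deque<RangeTask> todo_;
    long total_ = 0;
    int completed_ = 0;
};

// Helper for illustration: set up, run to completion, return the driver.
SketchDriver run_sketch(int num_tasks) {
    SketchDriver d;
    d.setup_initial_tasks(num_tasks);
    d.go();
    return d;
}
```

With four initial tasks the driver covers 0 through 99; each range of 25 numbers is finished in three tasks because the remainder is re-queued on completion.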
16
Putting it all together: new_skel
./new_skel MY_PROJECT
Use configure --help for options
make
17
Debugging with Independent Mode
Special RMComm for debugging Single process, can run under gdb
18
Running on the Grid… Just launch the appropriate master
condor_q to see it in action
19
Advice for Large Runs Use personal condor Use checkpointing!
Flock, glide-in, schedd-on-side, hobblein Use checkpointing! Set_worker_increment high
20
User-level Checkpointing
MWTask::write_chkpt_info(FILE *)
MWTask::read_chkpt_info(FILE *)
MWDriver::read_master_state(FILE *)
MWDriver::write_master_state(FILE *)
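A sketch of how such checkpoint hooks could be written, assuming a plain-text format. ChkptTask, ChkptMaster and chkpt_round_trip are hypothetical stand-ins, the member fields are invented, and the bool return values are an assumption made so the sketch can report read failures; the slide gives only the method names and the FILE * parameter. Each object writes its own fields to the stream, and reads them back in the same order.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical task state worth saving across a restart.
struct ChkptTask {
    int lo = 0;
    int hi = 0;
    long result = 0;

    void write_chkpt_info(std::FILE *f) const {
        std::fprintf(f, "%d %d %ld\n", lo, hi, result);
    }
    // Returns true if all three fields were read (assumed convention).
    bool read_chkpt_info(std::FILE *f) {
        return std::fscanf(f, "%d %d %ld", &lo, &hi, &result) == 3;
    }
};

// Hypothetical master state: whatever the driver accumulates.
struct ChkptMaster {
    long best = 0;
    int tasks_done = 0;

    void write_master_state(std::FILE *f) const {
        std::fprintf(f, "%ld %d\n", best, tasks_done);
    }
    bool read_master_state(std::FILE *f) {
        return std::fscanf(f, "%ld %d", &best, &tasks_done) == 2;
    }
};

// Write master and task state to a temporary file, then restore both.
bool chkpt_round_trip() {
    std::FILE *f = std::tmpfile();
    if (f == nullptr) return false;

    ChkptMaster m;
    m.best = 42;
    m.tasks_done = 7;
    ChkptTask t;
    t.lo = 10;
    t.hi = 25;
    t.result = 145;

    m.write_master_state(f);
    t.write_chkpt_info(f);
    std::rewind(f);  // restart: read back in the order written

    ChkptMaster m2;
    ChkptTask t2;
    bool ok = m2.read_master_state(f) && t2.read_chkpt_info(f);
    std::fclose(f);

    return ok && m2.best == 42 && m2.tasks_done == 7 &&
           t2.lo == 10 && t2.hi == 25 && t2.result == 145;
}
```

As with pack and unpack, the read side has to mirror the write side field for field.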
21
Example codes with MW: Matmul, Blackbox, knapsack
22
MW Philosophy: Reuse either code or concept. Key idea: late binding
23
Other resources http://www.cs.wisc.edu/condor/mw Online manual
MW-users mailing list
24
Thank You! Questions? MW Home page: