Condor Project Computer Sciences Department University of Wisconsin-Madison Master/Worker and Condor.

Condor Project Computer Sciences Department University of Wisconsin-Madison Master/Worker and Condor Barcelona, 2006

2 Agenda
- Extended user’s tutorial
- Advanced uses of Condor
  - Java programs
  - DAGMan
  - Stork
  - MW
  - Grid computing
- Case studies, and a discussion of your application’s needs

3 Why Master Worker?
- MW addresses a weakness in Condor: short jobs
- Excellent for dynamic, parallel workflows

4 A Workflow Problem
A problem requires that we do A 60,000 times and B 100,000 times.
- A takes 1 second
- B takes 3 seconds
Computation time for the problem is (60,000 x 1) + (100,000 x 3) = 360,000 seconds, or 100 hours.

5 Condor Runs the Workflow
Assume that the overhead Condor adds to running each instance of A or B is 20 seconds (in practice this figure is much too small).
Time for Condor to do the problem is (60,000 x 21) + (100,000 x 23) = 3,560,000 seconds, or about 989 hours.
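The arithmetic on the two slides above can be checked with a short C++ sketch. The function names `total_seconds` and `rounded_hours` are hypothetical helpers introduced here for illustration; they are not part of Condor or MW. The sketch assumes the instances run one after another and that the overhead is the same for every instance.

```cpp
#include <cassert>

// Total seconds to run count_a instances of A and count_b instances of B
// one after another, adding a fixed per-instance overhead to each run.
long long total_seconds(long long count_a, long long secs_a,
                        long long count_b, long long secs_b,
                        long long overhead_secs) {
    assert(count_a >= 0 && count_b >= 0 && overhead_secs >= 0);
    return count_a * (secs_a + overhead_secs) +
           count_b * (secs_b + overhead_secs);
}

// Convert seconds to hours, rounded to the nearest whole hour.
long long rounded_hours(long long seconds) {
    assert(seconds >= 0);
    return (seconds + 1800) / 3600;
}
```

Calling `total_seconds(60000, 1, 100000, 3, 0)` gives the pure computation time, and passing 20 as the last argument gives the time with the assumed Condor overhead.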

6 A Condor Job…

7 An Often Considered Solution
- Bundle several As or Bs into a single Condor job
- Must address further issues:
  - Partial failures
  - Load balancing
  - Dynamic creation of work
[Diagram: several As bundled into one Condor job]

8 Basics of MW The master gives tasks to the workers.

9 Workers and Tasks
Each worker serially takes on tasks, as assigned by the master.
[Diagram: one worker with the tasks "feed me", "change diaper", and "bathe me"]

10 Relating MW to Condor
- There is 1 master
- The master determines the number of workers
- Each worker is a Condor job
- Each worker receives tasks serially
- Many workers do tasks at the same time (in parallel)
- Workers communicate only with the master

11 Solution: Lightweight Tasks Multiplexed on top of Jobs
The analogy: a process is to a thread as a Condor job is to an MW task.
A Condor job may take minutes to create and dispatch; an MWTask dispatch takes milliseconds.

12 MW is
- A C++ framework
- A way to re-use Condor worker jobs
- Each worker may run many tasks
- Results in a very parallel application

13 MW is not
- MPI (Message Passing Interface)
- A general parallel programming scheme

14 MW in action
[Diagram labels: condor_submit, submit machine, master exe holding tasks (T), worker running tasks (T)]

15 You Must Write 3 Classes, the Subclasses of...
- MWDriver
- MWTask
- MWWorker
[Diagram labels: Master exe, Worker exe]

16 An MWTask
- Subclass MWTask
- Data members for inputs
- Data member for results
- Serialization of inputs and results
- Distinct instances on each side

17 The Four Task Methods
- void MyTask::pack_work(void);
- void MyTask::unpack_work(void);
- void MyTask::pack_results(void);
- void MyTask::unpack_results(void);
- Also constructors and destructors!

18 RMC
- Resource Management and Communication
- An abstraction to set up communication, to specify resource requirements, etc.
- RMC->pack(int *array, int length);
- RMC->unpack(int *array, int length);
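The task slides above can be sketched as a small self-contained C++ example. Everything here is a stand-in written for illustration: `FakeComm` is a hypothetical in-memory replacement for the real RMC layer, the `MWTask` base class is reduced to the four methods named on the slide, and `SumTask` and `round_trip_sum` are invented names. The point is only to show how the inputs and results travel between the distinct task instances on the master side and the worker side.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for MW's RMC layer: an in-memory FIFO of ints.
// The real RMC moves the packed data between master and worker.
class FakeComm {
public:
    void pack(const int *array, int length) {
        assert(length >= 0);
        buffer_.insert(buffer_.end(), array, array + length);
    }
    void unpack(int *array, int length) {
        assert(length >= 0 &&
               read_pos_ + static_cast<std::size_t>(length) <= buffer_.size());
        for (int i = 0; i < length; ++i) array[i] = buffer_[read_pos_++];
    }
private:
    std::vector<int> buffer_;
    std::size_t read_pos_ = 0;
};

FakeComm *RMC = nullptr;  // stands in for the RMC pointer used on the slide

// Hypothetical, reduced version of the MWTask base class.
class MWTask {
public:
    virtual ~MWTask() = default;
    virtual void pack_work() = 0;       // master: send inputs
    virtual void unpack_work() = 0;     // worker: receive inputs
    virtual void pack_results() = 0;    // worker: send results
    virtual void unpack_results() = 0;  // master: receive results
};

// Example task: sum the integers in [first, last].
class SumTask : public MWTask {
public:
    SumTask() = default;
    SumTask(int first, int last) : first_(first), last_(last) {}

    void pack_work() override { int in[2] = {first_, last_}; RMC->pack(in, 2); }
    void unpack_work() override {
        int in[2];
        RMC->unpack(in, 2);
        first_ = in[0];
        last_ = in[1];
    }
    void pack_results() override { RMC->pack(&sum_, 1); }
    void unpack_results() override { RMC->unpack(&sum_, 1); }

    void compute() { sum_ = 0; for (int i = first_; i <= last_; ++i) sum_ += i; }
    int sum() const { return sum_; }
private:
    int first_ = 0, last_ = 0;  // inputs
    int sum_ = 0;               // result
};

// Simulate one round trip: master packs work, worker computes, master reads results.
int round_trip_sum(int first, int last) {
    FakeComm to_worker, to_master;
    SumTask master_copy(first, last);  // instance on the master side
    SumTask worker_copy;               // distinct instance on the worker side

    RMC = &to_worker;
    master_copy.pack_work();
    worker_copy.unpack_work();
    worker_copy.compute();

    RMC = &to_master;
    worker_copy.pack_results();
    master_copy.unpack_results();
    RMC = nullptr;
    return master_copy.sum();
}
```

In a real MW application the master and worker copies live in different processes, so the only state they share is what the four pack and unpack methods transfer.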

19 MWWorker
Just one method: executeTask(MWTask *t)
Also constructor and destructor!
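A worker subclass might look like the following minimal sketch. The `MWTask` and `MWWorker` base classes here are hypothetical stand-ins reduced to what the slide names, and `SquareTask`, `SquareWorker`, and `run_one` are invented for illustration. The sketch assumes the worker casts the generic task pointer to its own task type before doing the work.

```cpp
#include <cassert>

// Hypothetical stand-ins for the MW base classes.
class MWTask {
public:
    virtual ~MWTask() = default;
};

class MWWorker {
public:
    virtual ~MWWorker() = default;
    virtual void executeTask(MWTask *t) = 0;
};

// Example task: holds one input and one result.
class SquareTask : public MWTask {
public:
    explicit SquareTask(int value) : input(value) {}
    int input;
    long result = 0;
};

// The worker's single job: turn a task's inputs into its results.
class SquareWorker : public MWWorker {
public:
    void executeTask(MWTask *t) override {
        SquareTask *task = dynamic_cast<SquareTask *>(t);
        assert(task != nullptr);  // this worker only understands SquareTask
        task->result = static_cast<long>(task->input) * task->input;
    }
};

// Run a single task through the worker, as the framework would.
long run_one(int value) {
    SquareTask task(value);
    SquareWorker worker;
    worker.executeTask(&task);
    return task.result;
}
```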

20 MWDriver (the master)
- get_userinfo(int argc, char **argv)
- RMC->add_executable(char *exe, char *requirements);
- setup_initial_tasks(int num_tasks, MWTask ***init_tasks)
- act_on_completed_task(MWTask *t)
- RMC->add_task(MWTask *t)
- Also constructor and destructor

21 MWTask ***init_tasks
[Diagram: a pointer to the array, the array of pointers to tasks, and the tasks themselves]
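The three levels of indirection can be sketched as follows. The signature follows the slide (`int num_tasks, MWTask ***init_tasks`); the free-function form, the reduced `MWTask` stand-in, and the names `NumberedTask` and `sum_ids_and_free` are assumptions made for this illustration only.

```cpp
#include <cassert>

// Hypothetical stand-in for the MWTask base class.
class MWTask {
public:
    virtual ~MWTask() = default;
};

// Example task that only carries an id.
class NumberedTask : public MWTask {
public:
    explicit NumberedTask(int task_id) : id(task_id) {}
    int id;
};

// init_tasks is a pointer to the caller's "array of task pointers" variable.
// Writing through it hands the newly allocated array back to the caller.
void setup_initial_tasks(int num_tasks, MWTask ***init_tasks) {
    assert(num_tasks >= 0 && init_tasks != nullptr);
    *init_tasks = new MWTask *[num_tasks];        // the array of pointers
    for (int i = 0; i < num_tasks; ++i) {
        (*init_tasks)[i] = new NumberedTask(i);   // each task itself
    }
}

// Caller side: pass the address of an MWTask** and then walk the array.
int sum_ids_and_free(int num_tasks) {
    MWTask **tasks = nullptr;
    setup_initial_tasks(num_tasks, &tasks);
    int id_sum = 0;
    for (int i = 0; i < num_tasks; ++i) {
        id_sum += static_cast<NumberedTask *>(tasks[i])->id;
        delete tasks[i];
    }
    delete[] tasks;
    return id_sum;
}
```

The extra level of pointer exists so that the function can allocate the array itself and still return it through an argument.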

23 Putting it all together: examples/new_skel
./new_app MY_PROJECT
A Perl script to create appropriately named files containing skeleton code
- Use configure --help for options
- make

24 Running an application
- Just launch the appropriate master
- Use condor_q to see it in action

25 Real MW Applications
- MWFATCOP (Chen, Ferris, Linderoth): a branch and cut code for linear integer programming
- MWMINLP (Goux, Leyffer, Nocedal): a branch and bound code for nonlinear integer programming
- MWQPBB (Linderoth): a (simplicial) branch and bound code for solving quadratically constrained quadratic programs
- MWAND (Linderoth, Shen): a nested decomposition based solver for multistage stochastic linear programming
- MWATR (Linderoth, Shapiro, Wright): a trust-region-enhanced cutting plane code for linear stochastic programming and statistical verification of solution quality
- MWQAP (Anstreicher, Brixius, Goux, Linderoth): a branch and bound code for solving the quadratic assignment problem

26 Other resources
- Online manual
- MW-users mailing list

27 Extra Slides

28 Advice for Large Runs
- Use Personal Condor
- Flock, glidein, schedd-on-side, hobblein
- Use checkpoints!
- Set worker_increment high

29 Debugging with Independent Mode
- Special RMComm for debugging
- Single process, can run under gdb

30 MW Philosophy
- Reuse either code or concept
- Key idea: late binding

31 User-level Checkpoints
- MWTask::write_chkpt_info(FILE *)
- MWTask::read_chkpt_info(FILE *)
- MWDriver::read_master_state(FILE *)
- MWDriver::write_master_state(FILE *)
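A task's checkpoint methods could be written along these lines. This is a sketch under stated assumptions: the slide gives only the method names and the `FILE *` argument, so the `void` return types, the text format, and the `PartialSumTask` class and `checkpoint_round_trip` helper are all invented here for illustration. A temporary file from `std::tmpfile` stands in for the real checkpoint file.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical task whose state is a range plus a partial result.
class PartialSumTask {
public:
    int first = 0;
    int last = 0;
    long partial = 0;

    // Write every data member needed to rebuild the task later.
    void write_chkpt_info(std::FILE *fp) const {
        std::fprintf(fp, "%d %d %ld\n", first, last, partial);
    }

    // Read the members back in the same order they were written.
    void read_chkpt_info(std::FILE *fp) {
        int fields = std::fscanf(fp, "%d %d %ld", &first, &last, &partial);
        assert(fields == 3);
        (void)fields;  // silence unused warning when asserts are disabled
    }
};

// Save a task, restore it into a fresh instance, and compare the two.
bool checkpoint_round_trip(int first, int last, long partial) {
    std::FILE *fp = std::tmpfile();
    if (fp == nullptr) return false;

    PartialSumTask saved;
    saved.first = first;
    saved.last = last;
    saved.partial = partial;
    saved.write_chkpt_info(fp);

    std::rewind(fp);
    PartialSumTask restored;
    restored.read_chkpt_info(fp);
    std::fclose(fp);

    return restored.first == first && restored.last == last &&
           restored.partial == partial;
}
```

The read and write methods must agree on field order and format, since the restored task is a new instance with no other source of state.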

32 Example codes with MW
- Matmul
- Blackbox
- knapsack

33 More on MW
- Version 0.2 is the latest
- It is more stable than the version number suggests!
- Mailing list available for discussion
- Active development by the Condor team