Master-Worker Tutorial Condor Week 2006

Agenda: What is M-W? When to use M-W. How to build a simple M-W application. Q & A

Why M-W? M-W addresses a weakness in Condor: short jobs. It also suits dynamic, parallel workflows.

A Condor Job… A Condor job is like money in the bank: getting at it takes time, since starting a Condor job can take minutes.

An easy solution: why not just wrap smaller jobs into one bigger Condor job? But what about partial failures? Load balancing? Dynamic creation of work?

Solution: Lightweight Tasks Multiplexed on Top of Jobs. Process : Thread :: Condor Job : MW Task. An MWTask is dispatched in milliseconds, while a Condor job can take minutes. An MW Task is like money in your pocket!
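The sketch below makes the Process : Thread :: Condor Job : MW Task analogy concrete. It is plain C++, not MW code, and it assumes that threads can stand in for long-lived worker jobs. Each worker starts once and then pulls many short tasks from a shared queue, so the start-up cost is paid once per worker rather than once per task. The function name run_pool is hypothetical.

```cpp
// Sketch only, not MW code: threads stand in for long-lived worker jobs,
// and each one runs many short tasks taken from a shared queue.
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: runs every task in `tasks` on `num_workers`
// long-lived workers and returns how many tasks each worker executed.
std::vector<int> run_pool(int num_workers, std::deque<std::function<void()>> tasks) {
    assert(num_workers > 0);
    std::mutex m;
    std::vector<int> executed(num_workers, 0);
    std::vector<std::thread> workers;
    for (int w = 0; w < num_workers; ++w) {
        // Starting a worker is the expensive step; it happens once per worker.
        workers.emplace_back([&, w] {
            for (;;) {
                std::function<void()> task;
                {
                    std::lock_guard<std::mutex> lock(m);
                    if (tasks.empty()) return;  // no work left: the worker exits
                    task = std::move(tasks.front());
                    tasks.pop_front();
                }
                task();          // dispatching a task is just a queue pop
                ++executed[w];   // only this worker touches executed[w]
            }
        });
    }
    for (auto& t : workers) t.join();
    return executed;
}
```

A worker asks for its next task only after it finishes the previous one. As a result, faster workers naturally end up running more tasks.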

MW is… a C++ framework that re-uses Condor worker jobs so that each one runs many tasks. The result is a highly parallel application.

MW is not MPI, and it is not a general parallel programming scheme.

MW in action [figure: the master exe on the submit machine, condor_submit, Worker jobs, and many tasks (T)]

You Must Write 3 Classes: subclasses of MWDriver, MWTask, and MWWorker. These are built into two programs, a master exe and a worker exe.

Your_MWTask: a subclass of MWTask with data members for the inputs, data members for the results, and serialization of both inputs and results. The master and the worker each hold distinct instances.

The Four Task Methods:
void MyTask::pack_work(void);
void MyTask::unpack_work(void);
void MyTask::pack_results(void);
void MyTask::unpack_results(void);
You also need a constructor and destructor!
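Here is a minimal sketch of a task class with the four methods. It rests on a few assumptions. MyTask is a plain class rather than a real MWTask subclass, and the hypothetical Wire struct stands in for RMComm. In MW the four methods take no arguments and use the global RMC object; this sketch passes the buffer explicitly so that it compiles on its own. The task sums the integers in [lo, hi).

```cpp
// Sketch only: Wire is a hypothetical stand-in for RMComm, and MyTask does
// not derive from the real MWTask.
#include <cassert>
#include <cstddef>
#include <vector>

// FIFO buffer of ints: pack() appends, unpack() reads in the same order.
struct Wire {
    std::vector<int> data;
    std::size_t pos = 0;
    void pack(const int* a, int n) { data.insert(data.end(), a, a + n); }
    void unpack(int* a, int n) {
        assert(pos + static_cast<std::size_t>(n) <= data.size());  // no over-read
        for (int i = 0; i < n; ++i) a[i] = data[pos++];
    }
};

class MyTask {
public:
    int lo = 0, hi = 0;  // inputs, set on the master
    int sum = 0;         // result, computed on the worker

    // Master side: serialize the inputs.
    void pack_work(Wire& w) { w.pack(&lo, 1); w.pack(&hi, 1); }
    // Worker side: read the inputs in the same order they were packed.
    void unpack_work(Wire& w) { w.unpack(&lo, 1); w.unpack(&hi, 1); }
    // Worker side: serialize the result.
    void pack_results(Wire& w) { w.pack(&sum, 1); }
    // Master side: read the result back into the master's own instance.
    void unpack_results(Wire& w) { w.unpack(&sum, 1); }
};
```

A round trip works like this. The master's instance packs lo and hi. A separate worker-side instance unpacks them, computes sum, and packs it. The master's instance then unpacks the result. The only contract is that each unpack_* call reads fields in exactly the order the matching pack_* call wrote them.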

RMComm: an abstraction for communication (and a few other things).
RMC->pack(int *array, int length);
RMC->unpack(int *array, int length);
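The unpacking side has to know how many elements to read. For variable-length data, a common pattern is therefore to pack the length first and the elements after it. The sketch below assumes an in-memory FIFO buffer. MsgBuffer, pack_vector and unpack_vector are hypothetical names, not part of MW; only pack/unpack mimic the signatures on the slide.

```cpp
// Sketch only: MsgBuffer is a hypothetical in-memory stand-in that mimics
// the slide's RMC->pack(int *array, int length) / RMC->unpack(...) calls.
#include <cassert>
#include <cstddef>
#include <vector>

class MsgBuffer {
public:
    void pack(int* array, int length) { buf_.insert(buf_.end(), array, array + length); }
    void unpack(int* array, int length) {
        assert(pos_ + static_cast<std::size_t>(length) <= buf_.size());  // no over-read
        for (int i = 0; i < length; ++i) array[i] = buf_[pos_++];
    }
private:
    std::vector<int> buf_;
    std::size_t pos_ = 0;
};

// Pack the length first so the receiver knows how many elements follow.
void pack_vector(MsgBuffer& b, std::vector<int>& v) {
    int n = static_cast<int>(v.size());
    b.pack(&n, 1);
    if (n > 0) b.pack(v.data(), n);
}

// Read the length, size the vector, then read exactly that many elements.
std::vector<int> unpack_vector(MsgBuffer& b) {
    int n = 0;
    b.unpack(&n, 1);
    std::vector<int> v(n);
    if (n > 0) b.unpack(v.data(), n);
    return v;
}
```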

MWWorker: just one method, executeTask(MWTask *t), plus a constructor and destructor!
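A sketch of what executeTask typically does: downcast the generic task pointer to your own task type, read the inputs, and fill in the results. MWTaskBase, MWWorkerBase, DotTask and DotWorker are hypothetical stand-ins so that the example compiles without MW. The dot-product task is purely illustrative.

```cpp
// Sketch only: MWTaskBase and MWWorkerBase are hypothetical stand-ins for
// MW's MWTask and MWWorker, reduced to what this example needs.
#include <cassert>
#include <cstddef>
#include <vector>

struct MWTaskBase { virtual ~MWTaskBase() = default; };

struct MWWorkerBase {
    virtual ~MWWorkerBase() = default;
    virtual void executeTask(MWTaskBase* t) = 0;
};

// Hypothetical task whose inputs have already been unpacked on the worker.
struct DotTask : MWTaskBase {
    std::vector<int> a, b;  // inputs
    long long dot = 0;      // result, to be packed and sent back
};

struct DotWorker : MWWorkerBase {
    // Receives a base-class pointer and downcasts to the concrete task type.
    void executeTask(MWTaskBase* t) override {
        auto* task = dynamic_cast<DotTask*>(t);
        assert(task != nullptr && task->a.size() == task->b.size());
        task->dot = 0;
        for (std::size_t i = 0; i < task->a.size(); ++i)
            task->dot += static_cast<long long>(task->a[i]) * task->b[i];
    }
};
```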

MWDriver:
get_userinfo(int argc, char **argv);
RMC->add_executable(char *exe, char *requirements);
setup_initial_tasks(int num_tasks, MWTask ***init_tasks);
act_on_completed_task(MWTask *t);
RMC->add_task(MWTask *t);
Plus a constructor and destructor.
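The driver hooks fit together roughly as in the single-process simulation below. It is a sketch, not MW's actual scheduling loop. SimDriver, Task and execute_task are hypothetical, and setup_initial_tasks here takes a problem size rather than MW's (int, MWTask ***) pair. The point it illustrates is dynamic creation of work: act_on_completed_task can call add_task to enqueue new tasks based on the outcome of a finished one.

```cpp
// Sketch only: a hypothetical single-process simulation of the driver hooks.
// Method names echo the slide; none of this is MW's real implementation.
#include <cassert>
#include <deque>

struct Task {
    int lo, hi;          // input range [lo, hi)
    bool done = false;   // set by the "worker"
    long long sum = 0;   // result when done
};

// Stand-in for a worker's executeTask: only ranges of at most `chunk`
// integers are summed; larger ones are reported back as not done.
void execute_task(Task& t, int chunk) {
    if (t.hi - t.lo > chunk) { t.done = false; return; }
    t.sum = 0;
    for (int i = t.lo; i < t.hi; ++i) t.sum += i;
    t.done = true;
}

class SimDriver {
public:
    explicit SimDriver(int chunk) : chunk_(chunk) { assert(chunk > 0); }

    // Like setup_initial_tasks: seed the queue with one task covering [0, n).
    void setup_initial_tasks(int n) { add_task(Task{0, n}); }

    // Like act_on_completed_task: fold in a result, or split a range that
    // was too large into two new tasks (dynamic creation of work).
    void act_on_completed_task(const Task& t) {
        if (t.done) { total_ += t.sum; return; }
        int mid = t.lo + (t.hi - t.lo) / 2;
        add_task(Task{t.lo, mid});
        add_task(Task{mid, t.hi});
    }

    // Like add_task: queue more work.
    void add_task(const Task& t) { queue_.push_back(t); }

    // Hand out tasks until none remain, then return the combined result.
    long long go() {
        while (!queue_.empty()) {
            Task t = queue_.front();
            queue_.pop_front();
            execute_task(t, chunk_);
            act_on_completed_task(t);
        }
        return total_;
    }

private:
    int chunk_;
    std::deque<Task> queue_;
    long long total_ = 0;
};
```

For example, `SimDriver d(10); d.setup_initial_tasks(100);` followed by `d.go()` returns 4950, the sum of 0 through 99. Along the way, the driver splits the original range into pieces of at most 10 integers.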

Putting it all together with new_skel: run ./new_skel MY_PROJECT, use configure --help to see the options, then run make.

Debugging with Independent Mode: a special RMComm for debugging. Everything runs in a single process, which you can run under gdb.

Running on the Grid… Just launch the appropriate master, then use condor_q to see it in action.

Advice for Large Runs: Use a personal Condor. Use checkpointing! Get more machines through flocking, glide-in, schedd-on-the-side, and hobblein. Set_worker_increment high.

User-level Checkpointing:
MWTask::write_chkpt_info(FILE *);
MWTask::read_chkpt_info(FILE *);
MWDriver::read_master_state(FILE *);
MWDriver::write_master_state(FILE *);
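The idea behind these four hooks is that every field needed to rebuild a task, or the master's list of outstanding work, is written to the FILE * and later read back in the same order. The sketch below assumes a plain text format written with fprintf and read with fscanf. CkptTask and the free functions are hypothetical stand-ins whose names only echo the MW methods.

```cpp
// Sketch only: hypothetical stand-ins for the MW checkpoint hooks, using a
// plain text format. Fields are read back in exactly the order written.
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

struct CkptTask {
    int lo = 0, hi = 0;   // inputs
    long long sum = 0;    // result (may be partial)

    void write_chkpt_info(std::FILE* f) const {
        std::fprintf(f, "%d %d %lld\n", lo, hi, sum);
    }
    void read_chkpt_info(std::FILE* f) {
        int n = std::fscanf(f, "%d %d %lld", &lo, &hi, &sum);
        assert(n == 3);  // all three fields must be present
        (void)n;
    }
};

// Master state: the number of outstanding tasks, then each task in turn.
void write_master_state(std::FILE* f, const std::vector<CkptTask>& todo) {
    std::fprintf(f, "%zu\n", todo.size());
    for (const auto& t : todo) t.write_chkpt_info(f);
}

std::vector<CkptTask> read_master_state(std::FILE* f) {
    std::size_t n = 0;
    int ok = std::fscanf(f, "%zu", &n);
    assert(ok == 1);
    (void)ok;
    std::vector<CkptTask> todo(n);
    for (auto& t : todo) t.read_chkpt_info(f);
    return todo;
}
```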

Example codes with MW: Matmul, Blackbox, knapsack.

MW Philosophy: reuse either the code or the concept. Key idea: late binding.
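Here "late binding" is read as choosing a worker for a task only when that worker becomes free. A small simulated-time comparison with binding every task to a worker up front shows why it matters. Both functions are hypothetical, make no Condor or MW calls, and use arbitrary time units for task durations.

```cpp
// Sketch only: simulated time, no Condor or MW calls. Compares binding
// tasks to workers up front with binding each task when a worker is free.
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Early binding: task i is fixed to worker i % workers before anything runs.
int makespan_static(const std::vector<int>& durations, int workers) {
    assert(workers > 0);
    std::vector<int> busy(workers, 0);
    for (std::size_t i = 0; i < durations.size(); ++i)
        busy[i % workers] += durations[i];
    return *std::max_element(busy.begin(), busy.end());
}

// Late binding: the next task goes to whichever worker frees up first.
int makespan_late(const std::vector<int>& durations, int workers) {
    assert(workers > 0);
    // Min-heap of the times at which each worker becomes free.
    std::priority_queue<int, std::vector<int>, std::greater<int>> free_at;
    for (int w = 0; w < workers; ++w) free_at.push(0);
    int finish = 0;
    for (int d : durations) {
        int start = free_at.top();
        free_at.pop();
        free_at.push(start + d);
        finish = std::max(finish, start + d);
    }
    return finish;
}
```

Consider one long task followed by eight short ones, {8, 1, 1, 1, 1, 1, 1, 1, 1}, on two workers. The up-front round-robin split finishes at time 12. Late binding finishes at time 8, because the second worker absorbs all the short tasks while the first is busy with the long one.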

Other resources: http://www.cs.wisc.edu/condor/mw, the online manual, and the MW-users mailing list.

Thank You! Questions? MW Home page: http://www.cs.wisc.edu/condor/mw