MWDriver: An Object-Oriented Library for Master-Worker Applications Mike Yoder, Jeff Linderoth, Jean-Pierre Goux June 3rd, 1999.

Talk Outline
  MWDriver review
  New features
    – Logical checkpointing of the master
    – Task list management
    – Miscellaneous
  Future work

Introduction to MW
  Provide an object-oriented framework to develop master/worker applications
  Use Condor-PVM to handle acquiring / releasing nodes, message passing
  MW is fault tolerant
    – Handles workers arriving / leaving
    – Handles workers getting suspended and resumed
  Assigns tasks to workers

More MW
  User must implement three classes
    – MWDriver
        Setup; packing initial data; acting on completed tasks
    – MWTask
        Definition of one unit of work
        Holds the work to be done and the results
    – MWWorker
        How to execute some work, given a task
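The division of labour among the three classes can be sketched as follows. This is a self-contained toy, not the MW headers: the base classes `Task`, `Worker`, and `Driver`, the `run()` loop, and the `sum_of_squares()` helper are hypothetical stand-ins, and the single "worker" runs in-process rather than over Condor-PVM. Only the name `act_on_completed_task()` is taken from the slides. It assumes a C++14 compiler.

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <utility>

// Hypothetical stand-ins for MWTask, MWWorker and MWDriver (not the real MW API).
struct Task {                        // one unit of work plus its result
    virtual ~Task() = default;
};

struct Worker {                      // knows how to execute some work, given a task
    virtual ~Worker() = default;
    virtual void execute(Task& t) = 0;
};

class Driver {                       // setup, task list, acting on completed tasks
public:
    virtual ~Driver() = default;
    virtual void setup() = 0;
    virtual void act_on_completed_task(Task& t) = 0;

    // Toy master loop: hand each to-do task to the worker, then act on the result.
    void run(Worker& w) {
        setup();
        while (!todo_.empty()) {
            std::unique_ptr<Task> t = std::move(todo_.front());
            todo_.pop_front();
            w.execute(*t);
            act_on_completed_task(*t);
        }
    }

protected:
    void add_task(std::unique_ptr<Task> t) { todo_.push_back(std::move(t)); }

private:
    std::deque<std::unique_ptr<Task>> todo_;
};

// Example application: sum the squares of 1..n, one task per number.
struct SquareTask : Task {
    explicit SquareTask(int x) : input(x) {}
    int input;
    long result = 0;
};

struct SquareWorker : Worker {
    void execute(Task& t) override {
        SquareTask& st = static_cast<SquareTask&>(t);
        st.result = static_cast<long>(st.input) * st.input;
    }
};

struct SumDriver : Driver {
    explicit SumDriver(int n) : n_(n) {}
    void setup() override {
        for (int i = 1; i <= n_; ++i) add_task(std::make_unique<SquareTask>(i));
    }
    void act_on_completed_task(Task& t) override {
        total += static_cast<SquareTask&>(t).result;
    }
    long total = 0;

private:
    int n_;
};

// Convenience wrapper that wires one driver to one worker.
long sum_of_squares(int n) {
    assert(n >= 0);
    SumDriver d(n);
    SquareWorker w;
    d.run(w);
    return d.total;
}
```

The application code touches only the three derived classes; scheduling lives entirely in the base `Driver`, which is the separation the slide describes.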

[Figure: the Master holds the global data, a table of worker IDs (Wid 1 to Wid 4) for workers W1 to W4, a Running list (T2, T3, T4, T5) and a To Do list (T6, T7, T8, ...).]

Checkpointing - Goals
  Save progress in case of failure
  Ease of use
  Hide details from the user
  Automatic restart from a checkpoint file

Checkpointing the Master
  Virtual functions to implement:
    – In the MWDriver:
        write_master_state()
        read_master_state()
    – In the MWTask:
        write_ckpt_info()
        read_ckpt_info()
  Use of set_checkpoint_frequency()

Checkpointing - Internals
  Write periodic checkpoints
  Upon restart:
    – Check for a checkpoint file
    – If it exists, restart from it
    – All tasks are on the to-do list at start
  Checkpoint file:
    MWDriver internal info
    write_master_state()
    write_ckpt_info() of all tasks on the running list
    write_ckpt_info() of all tasks on the to-do list
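The checkpoint layout and restart rule above can be illustrated with a minimal sketch. The four function names come from the slides, but their stream-based signatures are an assumption, and `ToyTask`, `ToyMaster`, `checkpoint()`, `restart()`, and `round_trip()` are hypothetical names invented for this example. The "internal info" is reduced here to a task count.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <list>
#include <ostream>
#include <sstream>

// Toy task. The slides name write_ckpt_info()/read_ckpt_info();
// the stream signatures used here are assumed.
struct ToyTask {
    int id;
    double bound;
    void write_ckpt_info(std::ostream& out) const { out << id << ' ' << bound << '\n'; }
    void read_ckpt_info(std::istream& in) { in >> id >> bound; }
};

// Toy master holding application state plus the two task lists.
struct ToyMaster {
    double best_solution = 0.0;            // user-level master state
    std::list<ToyTask> running, todo;

    void write_master_state(std::ostream& out) const { out << best_solution << '\n'; }
    void read_master_state(std::istream& in) { in >> best_solution; }

    // Layout follows the slide: internal info, master state,
    // tasks on the running list, tasks on the to-do list.
    void checkpoint(std::ostream& out) const {
        out << running.size() + todo.size() << '\n';   // stand-in for "internal info"
        write_master_state(out);
        for (const ToyTask& t : running) t.write_ckpt_info(out);
        for (const ToyTask& t : todo) t.write_ckpt_info(out);
    }

    // On restart every saved task goes back on the to-do list.
    void restart(std::istream& in) {
        std::size_t n = 0;
        in >> n;
        read_master_state(in);
        running.clear();
        todo.clear();
        for (std::size_t i = 0; i < n; ++i) {
            ToyTask t{};
            t.read_ckpt_info(in);
            todo.push_back(t);
        }
        assert(!in.fail());                // a truncated checkpoint is a bug in this toy
    }
};

// Checkpoint one master into a string and restart a fresh master from it.
ToyMaster round_trip(const ToyMaster& m) {
    std::ostringstream out;
    m.checkpoint(out);
    std::istringstream in(out.str());
    ToyMaster restored;
    restored.restart(in);
    return restored;
}
```

Because tasks that were running at checkpoint time are simply re-queued, the sketch needs no record of which worker held which task.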

Task List Management - Goals
  Allow users to specify how to access the task list
  Hide implementation details
  Not efficiency!
    – Implemented as singly linked list
    – Sorting done in O(n^2) time

Task List Management - Implementation
  Each task is assigned an “MWKey” through a user-specified function.
    – MWKey is simply a double
    – Lower is better
    – The function takes an MWTask and returns an MWKey
  The user can change the key function at any time
    – set_task_key_function( MWKey(*)( MWTask * ) );

Using MWKeys
  Adding tasks
    – ADD_AT_BEGIN, ADD_AT_END, ADD_BY_KEY
  Retrieving tasks
    – GET_FROM_BEGIN, GET_BY_KEY
  Sort the task list:
    – int sort_task_list()
  The task list can be “fathomed”
    – int delete_tasks_worse_than( MWKey );
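How the add modes, retrieval modes, and fathoming interact can be sketched with a small keyed list. The mode constants, `MWKey`, and the names `set_task_add_mode`, `set_task_retrieval_mode`, and `delete_tasks_worse_than` come from the slides; everything else is assumed for illustration, including the `TaskList` class, the `Node` task type, the `add()`/`get()`/`size()` members, and the key function taking a `const Node&` rather than an `MWTask *`. It uses `std::list` instead of the singly linked list the slides mention, and it omits `sort_task_list()`.

```cpp
#include <cassert>
#include <list>

typedef double MWKey;                          // per the slides: simply a double, lower is better

struct Node { int id; double lower_bound; };   // hypothetical task type

enum AddMode { ADD_AT_BEGIN, ADD_AT_END, ADD_BY_KEY };
enum GetMode { GET_FROM_BEGIN, GET_BY_KEY };

class TaskList {
public:
    explicit TaskList(MWKey (*key)(const Node&)) : key_(key) {}

    void set_task_add_mode(AddMode m) { add_ = m; }
    void set_task_retrieval_mode(GetMode m) { get_ = m; }

    void add(const Node& n) {
        if (add_ == ADD_AT_BEGIN) { tasks_.push_front(n); return; }
        if (add_ == ADD_AT_END)   { tasks_.push_back(n);  return; }
        // ADD_BY_KEY: insert before the first task with a strictly larger key,
        // which keeps an already sorted list sorted.
        std::list<Node>::iterator it = tasks_.begin();
        while (it != tasks_.end() && key_(*it) <= key_(n)) ++it;
        tasks_.insert(it, n);
    }

    // Removes and returns the next task. Precondition: the list is not empty.
    Node get() {
        assert(!tasks_.empty());
        std::list<Node>::iterator best = tasks_.begin();
        if (get_ == GET_BY_KEY) {              // linear scan for the lowest key
            for (std::list<Node>::iterator it = tasks_.begin(); it != tasks_.end(); ++it)
                if (key_(*it) < key_(*best)) best = it;
        }
        Node n = *best;
        tasks_.erase(best);
        return n;
    }

    // "Fathom" the list: drop every task whose key is higher than the cutoff.
    // Returns the number of tasks removed.
    int delete_tasks_worse_than(MWKey cutoff) {
        int removed = 0;
        for (std::list<Node>::iterator it = tasks_.begin(); it != tasks_.end();) {
            if (key_(*it) > cutoff) { it = tasks_.erase(it); ++removed; }
            else ++it;
        }
        return removed;
    }

    int size() const { return static_cast<int>(tasks_.size()); }

private:
    MWKey (*key_)(const Node&);
    AddMode add_ = ADD_AT_END;
    GetMode get_ = GET_FROM_BEGIN;
    std::list<Node> tasks_;
};

// Key function: the node's lower bound.
MWKey bound_key(const Node& n) { return n.lower_bound; }
```

With `ADD_BY_KEY` and `GET_FROM_BEGIN` the list behaves as a best-first queue; with `ADD_AT_BEGIN` and `GET_FROM_BEGIN` it behaves as a stack, and switching to `GET_BY_KEY` picks the best-keyed task out of that stack on demand.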

Example 1
  Implementing “Best Bound”

    // Key function: a task's key is its lower bound (lower is better).
    MWKey taskKey1( MWTask *t ) {
        BBTask *bt = dynamic_cast<BBTask *>( t );
        return (MWKey) bt->lower_bound;
    }

    // In get_userinfo()
    set_task_add_mode( ADD_BY_KEY );
    set_task_retrieval_mode( GET_FROM_BEGIN );
    set_task_key_function( &taskKey1 );

Example 2
  A branch and bound strategy that goes depth-first, but occasionally chooses to evaluate the node with the best lower bound value.

    // In get_userinfo()
    …
    set_task_add_mode( ADD_AT_BEGIN );
    set_task_retrieval_mode( GET_FROM_BEGIN );
    set_task_key_function( &taskKey1 );

    // In act_on_completed_task()
    if ( best_bound_move )
        set_task_retrieval_mode( GET_BY_KEY );
    else
        set_task_retrieval_mode( GET_FROM_BEGIN );

Other MW Goodies
  rematch_task_to_workers()
  {un}pack_driver_task_data()
  New statistics class
    – Stats now collected across checkpoints
  MWprintf()
    – Just like printf(), but has “debug levels”
  Code skeleton
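The idea of a printf() with debug levels can be sketched as below. The slides do not give MWprintf()'s signature or its level semantics, so `leveled_printf()`, `set_debug_level()`, the default threshold of 50, and the rule "print only when the message level does not exceed the threshold" are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>

// Hypothetical global threshold: messages with a higher level are suppressed.
static int g_debug_level = 50;

void set_debug_level(int level) { g_debug_level = level; }

// printf-style output gated by a level.
// Returns the number of characters written, or 0 if the message was suppressed.
int leveled_printf(int level, const char* fmt, ...) {
    assert(fmt != nullptr);
    if (level > g_debug_level) return 0;
    va_list args;
    va_start(args, fmt);
    int written = std::vprintf(fmt, args);
    va_end(args);
    return written;
}
```

A call such as `leveled_printf(10, "%d tasks\n", 3)` prints under the default threshold, while `leveled_printf(90, "noise\n")` stays silent until the threshold is raised with `set_debug_level(90)` or higher.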

Future Work
  Heterogeneous worker support
  Gathering workers faster
  Ranking workers by abilities
    – Processor speed, memory, etc.
    – Communication bandwidth/latency
  More... suggestions?