Work Queue: A Scalable Master/Worker Framework
Peter Bui, June 29, 2010

Master/Worker Model
Central Master application:
 o Divides work into tasks
 o Sends tasks to Workers
 o Gathers results
Distributed collection of Workers:
 o Receives input and executable files
 o Runs executable files
 o Returns output files

Work Queue versus MPI
Work Queue:
 o Number of workers is dynamic
 o Scales up to a large number of workers (100s or more)
 o Reliable and fault tolerant at the task level
 o Allows heterogeneous deployment environments
 o Workers communicate only with the Master
MPI:
 o Number of workers is static
 o Scales to a limited number of workers (16, 32, 64)
 o Reliable at the application level, but no fault tolerance
 o Requires a homogeneous deployment environment
 o Workers can communicate with anyone

Success Stories: Makeflow, SAND, Wavefront, All-Pairs

Architecture (Overview)

Architecture (Master)
Uses the Work Queue library:
 o Creates a queue
 o Submits tasks (command, input files, output files)
 o The library keeps track of tasks: when a Worker is available, the library sends it a task
 o When tasks complete, the master retrieves the output files

Architecture (Workers)
Users start workers on any machine
Workers contact the Master and request work
When a task is received, the worker performs the computation and returns the results
After a set idle timeout, the worker quits and cleans up

API Overview (Work Queue)
Simple C API:
 o work_queue_create(int port)
   Create a new work queue.
 o work_queue_delete(struct work_queue *q)
   Delete a work queue.
 o work_queue_empty(struct work_queue *q)
   Determine whether there are any known tasks queued, running, or waiting to be collected.

API Overview (Task)
Simple C API:
 o work_queue_task_create(const char *command)
   Create a new task specification.
 o work_queue_task_delete(struct work_queue_task *t)
   Delete a task specification.
 o work_queue_task_specify_input_file(struct work_queue_task *t, const char *fname, const char *rname)
   Add an input file specification.
 o work_queue_task_specify_output_file(struct work_queue_task *t, const char *rname, const char *fname)
   Add an output file specification.

API Overview (Execution)
Simple C API:
 o work_queue_submit(struct work_queue *q, struct work_queue_task *t)
   Submit a task to a work queue.
 o work_queue_wait(struct work_queue *q, int timeout)
   Wait for tasks to complete.
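Putting the three API slides together, a minimal master might look like the sketch below. It uses only the calls listed above; the command and file names are hypothetical placeholders, and error handling is kept to a minimum.

```c
#include <stdio.h>
#include "work_queue.h"

int main(void) {
    /* Create a queue; 0 selects the default port, as in the tutorial code. */
    struct work_queue *q = work_queue_create(0);
    if (!q) {
        fprintf(stderr, "could not create work queue\n");
        return 1;
    }

    /* Build one task: run a command, ship one input, collect one output.
       "in.png" and "out.jpg" are placeholder names for this sketch. */
    struct work_queue_task *t = work_queue_task_create("./convert in.png out.jpg");
    work_queue_task_specify_input_file(t, "in.png", "in.png");
    work_queue_task_specify_output_file(t, "out.jpg", "out.jpg");
    work_queue_submit(q, t);

    /* Wait until all submitted tasks have come back. */
    while (!work_queue_empty(q)) {
        t = work_queue_wait(q, 10);   /* timeout in seconds */
        if (t) work_queue_task_delete(t);
    }

    work_queue_delete(q);
    return 0;
}
```

A real application would submit many tasks in a loop before entering the wait loop, as Example 1 below does.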

Software Configuration
Web Information
AFS:
$ setenv PATH ~ccl/software/cctools/bin:$PATH
$ setenv PATH ~condor/software/bin:$PATH
CRC:
$ module use /afs/nd.edu/user37/ccl/software/modulefiles
$ module load cctools
$ module load condor

Example 1: DConvert
Goal: convert a set of input images to a specified format in parallel
 o Input: ...
 o Output: converted images in the specified format
Skeleton:
 o ~pbui/www/scratch/workqueue-tutorial.tar.gz

DConvert (Preparation)
Set up a scratch workspace:
$ mkdir /tmp/$USER-scratch
$ cd /tmp/$USER-scratch
$ pwd
Copy the source tarball and extract it:
$ cp ~pbui/www/scratch/workqueue-tutorial.tar.gz .
$ tar xzvf workqueue-tutorial.tar.gz
$ cd workqueue-tutorial
$ ls
Open the dconvert.c source file for editing:
$ gedit dconvert.c &

DConvert (TODO 1, 2, and 3)
// TODO 1: include work queue header file
#include "work_queue.h"

// TODO 2: declare work queue and task structs
struct work_queue *q;
struct work_queue_task *t;

// TODO 3: create work queue using default port
q = work_queue_create(0);

DConvert (TODO 4, 5, 6)
// TODO 4: create task, specify input and output files, submit task
t = work_queue_task_create(command);
work_queue_task_specify_input_file(t, input_file, input_file);
work_queue_task_specify_output_file(t, output_file, output_file);
work_queue_submit(q, t);

// TODO 5: while the work queue is not empty, wait for a task, then delete the returned task
while (!work_queue_empty(q)) {
    t = work_queue_wait(q, 10);
    if (t) work_queue_task_delete(t);
}

// TODO 6: delete work queue
work_queue_delete(q);

DConvert (Demonstration)
Build and prepare the application:
$ make
$ cp /usr/share/pixmaps/*.png .
Start a batch of workers:
$ condor_submit_workers `hostname`
Start the application:
$ ./dconvert jpg *.png

Tips and Tricks (Debugging)
Enable the cctools debugging system:
 o In the master application:
   debug_flags_set("wq");
   debug_flags_set("debug");
 o In workers:
   work_queue_worker -d debug -d wq
Incrementally test with a growing number of workers
Failed execution:
 o Include the executable and its dependencies as input files
 o Make sure workers match the target platform (32-bit vs 64-bit, OS, etc.)

Tips and Tricks (Tasks)
Tag tasks:
 o Give a task an identifying tag so the Master can keep track of it
Use input and output buffers:
 o work_queue_task_specify_input_buf: contents of the buffer will be materialized as a file at the worker
 o task->output: buffer that contains the standard output of the task
Check task results:
 o task->result: result of the task
 o task->return_status: exit code of the command line
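As a fragment (not a complete program), tagging a task and attaching an input buffer might look like the sketch below. The tag string, buffer contents, and remote file name are made up for illustration; the calls are the task-level functions named on this slide, plus work_queue_task_specify_tag for tagging.

```c
/* Tag the task so the master can identify it when it comes back. */
work_queue_task_specify_tag(t, "row-42");

/* Materialize an in-memory parameter string as the file "params.txt"
   in the worker's sandbox, instead of writing a local file first. */
const char *params = "-2.0 1.5 1000";
work_queue_task_specify_input_buf(t, params, strlen(params), "params.txt");

/* After work_queue_wait() returns the task, inspect the results: */
printf("task %s exited with status %d\n", t->tag, t->return_status);
printf("stdout was: %s\n", t->output);
```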

Tips and Tricks (Batch)
Custom worker environment: modify the batch-system-specific submit scripts
 o condor_submit_workers: set requirements
 o sge_submit_workers: set environment, set modules

Tips and Tricks (CRC)
Submit the master, find its host, then submit workers:
$ qsub myscript.sh
  (myscript.sh: "#!/bin/csh" followed by the master command)
$ qstat -u | grep myscript.sh
$ sge_submit_workers

Example 2: Mandelbrot Generator
Goal: generate a Mandelbrot image
 o Input:
 o Output: Mandelbrot image in PPM format
Skeleton:
 o ~pbui/www/scratch/workqueue-tutorial.tar.gz

Mandelbrot (Overview)
z(n+1) = z(n)^2 + c
Escape Time Algorithm: for each pixel (r, c) in the image, calculate whether the corresponding point (x, y) escapes a boundary
Iterative algorithm where each pixel computation is independent
Application design:
 o Master partitions the image into tasks
 o Workers compute the Escape Time Algorithm on their partitions

Mandelbrot (Naive Approach): Master
For each pixel (r, c) in the image (width x height):
 o Compute the corresponding x, y
 o Submit a task for the pixel with x, y
   - Pass the x, y parameters as an input buffer
   - Tag the task with the r, c values
Wait for each task to complete:
 o Retrieve the worker's output from task->output
 o Retrieve r, c from task->tag
 o Store pixel[r, c] = output
Output the pixels in PPM format

Mandelbrot (Naive Approach): Worker
Read parameters from the input file:
 o x0, y0, max_iterations, black_value
Perform the Mandelbrot computation as specified on Wikipedia
Output the result (iterations) to standard out

Mandelbrot (Analysis)
Problem: processing each pixel as a single task is inefficient
 o Too fine-grained
 o The overhead of sending parameters, running tasks, and retrieving results is greater than the computation time
Work Queue Golden Rule:
 Computation Time > Data Transfer Time + Task Setup Overhead

Mandelbrot (Better Approach)
Send rows: process groups of pixels rather than individual ones
 o Send a row and have the worker return a series of results
 o Perhaps send multiple rows?
Should take execution time from minutes to seconds

Mandelbrot (Demonstration)
Build the application:
$ make
Start a batch of workers:
$ condor_submit_workers `hostname`
Start the application:
$ ./mandelbrot_master > output.ppm
$ display output.ppm

Advanced Features
Fast Abort: allow Work Queue to pre-emptively kill slow tasks
 o work_queue_activate_fast_abort(q, X)
 o X is the fast abort multiplier: if (runtime >= average_runtime * X), the task is aborted
Scheduling: change how workers are selected
 o FCFS: first come, first served
 o FILES: worker that has the most cached files
 o TIME: worker with the fastest average turnaround time
 o Can be set for the queue or per task
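As a fragment, enabling both features on an existing queue might look like this. The multiplier value is illustrative; work_queue_specify_algorithm and the WORK_QUEUE_SCHEDULE_* constants are assumed to be the queue-level scheduling interface in work_queue.h.

```c
/* Abort any task running longer than 3x the average task runtime. */
work_queue_activate_fast_abort(q, 3.0);

/* Prefer workers that already hold the most cached input files. */
work_queue_specify_algorithm(q, WORK_QUEUE_SCHEDULE_FILES);
```

Fast abort trades wasted work for lower tail latency: a straggler's task is resubmitted elsewhere rather than holding up the whole batch.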

Advanced Features (More)
Automatic Master Detection
 o Start the master with a project name:
   setenv WORK_QUEUE_NAME="project_name"
 o Enable master auto-selection mode in the workers:
   work_queue_worker -a -N "project_name"
   work_queue_pool -T condor -a -N "project_name"
 o Checkout master at
Shut down workers:
 o work_queue_shut_down_workers

Web Resources
Website: user manual and C API documentation
Bug reports and suggestions
Python-API: experimental Python binding