What’s New in Work Queue


Michael Albrecht, University of Notre Dame
CCL Workshop, June 2012

Overview
- New “Batch Job” Systems
  - Moab / Cluster
  - MPI Queue
- Work Queue Enhancements
  - Hierarchical Work Queue

Batch Job
- Systems abstraction layer
- Library for generic task submission
- Used by Makeflow

Moab & other Clusters
- Similar to “SGE”
- Explicitly support the Moab scheduler
- Support other, similar schedulers
- Set name, submit, and remove commands
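The "name, submit, and remove commands" idea can be sketched as a small record per scheduler. This is a hypothetical illustration, not the Batch Job library's actual interface: the class `ClusterSystem` and its methods are invented here, and the command names (`qsub`/`qdel` for SGE, `msub`/`canceljob` for Moab) are assumed defaults that a site may override.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSystem:
    """Hypothetical description of a batch scheduler: a name plus two commands."""
    name: str
    submit_cmd: str   # command that submits a job script
    remove_cmd: str   # command that removes a job by id

    def submit_argv(self, script: str) -> list[str]:
        # Build the argument vector for submitting one job script.
        return shlex.split(self.submit_cmd) + [script]

    def remove_argv(self, job_id: str) -> list[str]:
        # Build the argument vector for removing one queued or running job.
        return shlex.split(self.remove_cmd) + [job_id]


# Assumed command names; a real site may use different ones.
SGE = ClusterSystem(name="sge", submit_cmd="qsub", remove_cmd="qdel")
MOAB = ClusterSystem(name="moab", submit_cmd="msub", remove_cmd="canceljob")

if __name__ == "__main__":
    print(MOAB.submit_argv("worker.sh"))
    print(MOAB.remove_argv("12345"))
```

Under this sketch, supporting another SGE-like scheduler only means supplying three strings, which is the point of the generic cluster mode described above.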

MPI Queue
[Diagram: a Workflow connected to a set of nodes labeled W.]

MPI Queue
[Diagram: the same Workflow and W nodes, with one node labeled F.]

MPI Queue
- Enables arbitrary computation on “MPI-only” clusters
- Ranks 1-N talk to Rank 0, which acts as a “foreman”
- Assumes a shared parallel filesystem for the cluster
- API very similar to Work Queue
- Fully supported by Batch Job/Makeflow
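The foreman pattern above can be sketched without MPI. In this minimal simulation, threads stand in for ranks 1-N and the calling thread plays rank 0; the function name `run_foreman` and the use of zero-argument callables as tasks are assumptions made for illustration, not the MPI Queue API.

```python
import queue
import threading


def run_foreman(tasks, num_workers):
    """Simulate rank 0 (the foreman) handing tasks to ranks 1..N.

    `tasks` is a list of zero-argument callables. Threads stand in for
    MPI ranks; a real MPI Queue job would exchange messages instead.
    """
    pending = queue.Queue()   # work held by the foreman
    results = queue.Queue()   # (task_id, rank, value) reported back
    for task_id, task in enumerate(tasks):
        pending.put((task_id, task))

    def worker(rank):
        # Each worker rank keeps asking the foreman for work until none is left.
        while True:
            try:
                task_id, task = pending.get_nowait()
            except queue.Empty:
                return
            results.put((task_id, rank, task()))

    ranks = [threading.Thread(target=worker, args=(r,))
             for r in range(1, num_workers + 1)]
    for t in ranks:
        t.start()
    for t in ranks:
        t.join()

    # The foreman gathers results and returns them in submission order.
    collected = {}
    while not results.empty():
        task_id, _rank, value = results.get()
        collected[task_id] = value
    return [collected[i] for i in range(len(tasks))]


if __name__ == "__main__":
    squares = [lambda n=n: n * n for n in range(6)]
    print(run_foreman(squares, num_workers=3))
```

Because every worker talks only to the foreman, the tasks themselves need no MPI calls, which is what lets arbitrary programs run on a cluster that only accepts MPI jobs.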

Work Queue

Work Queue is Wonderful
- Easily harness 100s-1000s of cores
- Combine multiple resources for one project
- Dynamically scale computational resources
[Diagram: a Makefile, Makeflow, and Local Files and Programs submit tasks to Hundreds of Workers (W) in a Personal Cloud. Workers are started with sge_submit_workers, ssh, and condor_submit_workers. Resource labels: Private Cluster, Campus Condor Pool, Public Cloud Provider, Shared SGE.]

Work Queue has Limits
Bandwidth, file size, and computation length constrain the potential number of workers
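A back-of-the-envelope model makes this limit concrete. The sketch below assumes a deliberately simple setup that is not taken from the slides: one master sends each task's input serially over a single link, output transfer is ignored, and every task has the same size and runtime. The function name `max_busy_workers` is hypothetical.

```python
def max_busy_workers(task_seconds, input_bytes, bandwidth_bytes_per_s):
    """Estimate how many workers one master can keep busy.

    Assumed model: the master spends input_bytes / bandwidth seconds sending
    each task, and a worker then computes for task_seconds. While one worker
    computes, the master can feed task_seconds / transfer_time others, so
    roughly 1 + task_seconds * bandwidth / input_bytes workers stay busy.
    """
    if input_bytes <= 0 or bandwidth_bytes_per_s <= 0:
        raise ValueError("input size and bandwidth must be positive")
    return 1 + int((task_seconds * bandwidth_bytes_per_s) // input_bytes)


if __name__ == "__main__":
    gigabit = 125_000_000  # bytes per second on a 1 Gb/s link
    # 100 MB of input per task, 60-second tasks
    print(max_busy_workers(60, 100_000_000, gigabit))
    # Same input, 10-second tasks: far fewer workers can be kept busy
    print(max_busy_workers(10, 100_000_000, gigabit))
```

Under these assumptions, larger inputs, a slower link, or shorter tasks all lower the number of workers worth attaching to a single master.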

Work Queue has Limits
Natural parallelism of a workflow leaves extra workers idle
[Diagram: tasks T0 to T5 and workers W1 to W5, with one entry marked "???".]
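One way to see the idle workers is to count how many tasks can run at each stage of a workflow. The dependency graph below is hypothetical (the slide's actual graph is not recoverable from the transcript), and the sketch assumes tasks run in lockstep waves, which real schedulers do not require. The function name `level_widths` is invented for illustration.

```python
from collections import Counter
from functools import lru_cache


def level_widths(deps):
    """Return the number of tasks at each dependency level of a DAG.

    `deps` maps each task name to the list of tasks it depends on.
    A task's level is 0 if it has no dependencies, otherwise one more
    than the deepest of its dependencies.
    """
    @lru_cache(maxsize=None)
    def level(task):
        if not deps[task]:
            return 0
        return 1 + max(level(d) for d in deps[task])

    counts = Counter(level(t) for t in deps)
    return [counts[i] for i in range(len(counts))]


if __name__ == "__main__":
    # Hypothetical workflow: T0 fans out to T1..T4, which all feed T5.
    deps = {
        "T0": [],
        "T1": ["T0"], "T2": ["T0"], "T3": ["T0"], "T4": ["T0"],
        "T5": ["T1", "T2", "T3", "T4"],
    }
    workers = 5
    widths = level_widths(deps)
    print(widths)
    print([workers - w for w in widths])  # idle workers per stage
```

With five workers and this graph, at least one worker is idle at every stage and four are idle at the first and last, regardless of how fast the master dispatches.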

Work Queue has Limits
Synchronous transfer leaves network resources idle and increases dispatch time

Add more indirection!
“All problems in computer science can be solved by another layer of indirection” - David Wheeler
“…except for the problem of too many layers of indirection” - Kevlin Henney

Hierarchical Work Queue
[Diagram, built up over five slides. Slide 1: a Master (Makeflow) and eight Workers. Slide 2: two Foremen added between the Master and the Workers. Slides 3 and 4: one, then two Shared FS labels added. Slide 5: Master (Makeflow), four Foremen, three Shared FS labels, and eight Workers.]
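The effect of inserting foremen can be illustrated by counting direct connections. This sketch assumes that each foreman relays tasks between the master and an even share of the workers; the slides show only the diagram, so that behavior and the function name `direct_connections` are assumptions.

```python
def direct_connections(num_workers, num_foremen=0):
    """Return (connections handled by the master, most handled by one foreman).

    Assumed topology: with no foremen, every worker connects to the master.
    With foremen, the master talks only to foremen, and workers are split
    as evenly as possible among them.
    """
    if num_workers < 0 or num_foremen < 0:
        raise ValueError("counts must be non-negative")
    if num_foremen == 0:
        return num_workers, 0
    per_foreman = -(-num_workers // num_foremen)  # ceiling division
    return num_foremen, per_foreman


if __name__ == "__main__":
    print(direct_connections(8))                   # flat: master serves all 8
    print(direct_connections(8, num_foremen=2))    # two foremen, 4 workers each
    print(direct_connections(1000, num_foremen=10))
```

In this model the master's load depends on the number of foremen rather than the number of workers, which is how an extra layer of indirection addresses the limits listed earlier.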

Hierarchical Work Queue
Coming Soon!
http://www.nd.edu/~ccl