Scaling Up Scientific Workflows with Makeflow
Li Yu, University of Notre Dame
Overview
Distributed systems are hard to use! An abstraction is a regular structure that can be efficiently scaled up to large problem sizes. We have implemented abstractions such as AllPairs and Wavefront.
Today – Makeflow and Work Queue: Makeflow is a workflow engine for executing large complex workflows on clusters, grids, and clouds. Work Queue is a Master/Worker framework. Together they are compact, portable, and data oriented, handle lots of small jobs well, and use a familiar syntax.
Specific Abstractions: AllPairs & Wavefront
[Figure: AllPairs applies a function F to every pair of elements from two sets A0..A3 and B0..B3, producing a matrix of results. Wavefront fills a result matrix R, where each cell R[i,j] is computed by applying F(x, y, d) to its already-computed neighbors to the left, below, and on the diagonal, starting from the initial row and column.]
Makeflow
Makeflow is a workflow engine for executing large complex workflows on clusters, grids, and clouds. It can express any arbitrary Directed Acyclic Graph (DAG). It is good at lots of small jobs. Data is treated as a first-class citizen. It has a syntax similar to traditional UNIX Make, and it is fault-tolerant.
Don’t We Already Have DAGMan?
DAGMan is great! But Makeflow specifies the workflow in just ONE file, uses the Master/Worker model, and treats data as a first-class citizen.
Experiment: create a 1M-job DAG.
DAGMan: 6197 s just to write the files.
Makeflow: 69 s to write the Makeflow.
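Because the whole workflow lives in one file, a DAG of that size can be emitted by a trivial script. A rough sketch of such a generator in Python is below; the job.sh command and the out.* file names are placeholders for illustration, not the actual experiment from the talk.

    # Sketch: write a Makeflow with one million independent rules.
    # job.sh and out.* are hypothetical names used only for illustration.
    N = 1000000

    with open("million.makeflow", "w") as f:
        for i in range(N):
            f.write("out.%d: job.sh\n" % i)
            f.write("\t./job.sh %d > out.%d\n\n" % (i, i))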
Makeflow ≈ DAGMan + Master/Worker
An Example – Image Processing
1. Download an image from the Internet
2. Convert it several times
3. Combine the results into a movie
An Example - Makeflow Script

# This is an example of Makeflow.
CURL=/usr/bin/curl
CONVERT=/usr/bin/convert
URL=http://www.cse.nd.edu/~ccl/images/a.jpg

a.montage.gif: a.jpg a.90.jpg a.180.jpg a.270.jpg a.360.jpg
    LOCAL $CONVERT -delay 10 -loop 0 a.jpg a.90.jpg a.180.jpg a.270.jpg a.360.jpg a.270.jpg a.180.jpg a.90.jpg a.montage.gif

a.90.jpg: a.jpg
    $CONVERT -swirl 90 a.jpg a.90.jpg

a.180.jpg: a.jpg
    $CONVERT -swirl 180 a.jpg a.180.jpg

a.270.jpg: a.jpg
    $CONVERT -swirl 270 a.jpg a.270.jpg

a.360.jpg: a.jpg
    $CONVERT -swirl 360 a.jpg a.360.jpg

a.jpg:
    LOCAL $CURL -o a.jpg $URL
Running the Makeflow
Just use the local machine:
% makeflow sample.makeflow
Use a distributed system with the ‘-T’ option.
‘-T condor’: uses the Condor batch system:
% makeflow -T condor sample.makeflow
Take advantage of the Condor MatchMaker:
BATCH_OPTIONS = Requirements = (Memory>1024) \n Arch = x86_64
‘-T sge’: uses the Sun Grid Engine:
% makeflow -T sge sample.makeflow
‘-T wq’: uses the Work Queue framework:
% makeflow -T wq sample.makeflow
Makeflow with Work Queue
[Figure: the Makeflow master walks the DAG and drives each worker over the Work Queue protocol: put App, put Input, work “App < Input > Output”, then get Output once the worker has executed the job.]
Start workers on local machines, clusters, via a campus grid, etc.
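The same protocol can also be driven directly from an application rather than through Makeflow. Below is a minimal sketch of a standalone master using the Work Queue Python binding that ships with cctools; the method names reflect my understanding of that API, and the App/Input/Output file names are assumptions, so check them against the version you have installed.

    from work_queue import WorkQueue, Task, WORK_QUEUE_INPUT, WORK_QUEUE_OUTPUT

    # Listen for workers on port 9012, the same port the workers are told to contact.
    q = WorkQueue(port=9012)

    # One task: stage App and Input to the worker, run the command, bring Output back.
    t = Task("./App < Input > Output")
    t.specify_file("App", "App", WORK_QUEUE_INPUT, cache=True)
    t.specify_file("Input", "Input", WORK_QUEUE_INPUT, cache=False)
    t.specify_file("Output", "Output", WORK_QUEUE_OUTPUT, cache=False)
    q.submit(t)

    # Wait for a worker to execute the task and return its output file.
    while not q.empty():
        done = q.wait(5)
        if done:
            print("task finished with status", done.return_status)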
Application – Data Mining: Betweenness Centrality
Vertices that occur on many shortest paths between other vertices have higher betweenness than those that do not.
Application: social network analysis.
Complexity: O(n³), where n is the number of vertices.
[Figure: example graph with the highest-betweenness vertex highlighted.]
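To make the computation concrete, here is a sketch of one single-source betweenness pass in Python (a standard Brandes-style BFS accumulation). The actual 'algr' binary used in the talk is not shown anywhere, so treat this as an illustration of the idea, not its implementation; repeating this pass from every vertex and summing the results is what makes the problem expensive enough to split across many jobs.

    from collections import deque

    def source_contribution(graph, s):
        """Betweenness contributions of shortest paths that start at source s.
        graph: dict mapping each vertex to a list of neighbors (unweighted)."""
        sigma = dict.fromkeys(graph, 0)    # number of shortest paths from s
        dist = dict.fromkeys(graph, -1)    # BFS distance from s
        preds = {v: [] for v in graph}     # predecessors on shortest paths
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in reverse BFS order.
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += (sigma[v] / sigma[w]) * (1.0 + delta[w])
        delta[s] = 0.0
        return delta

A job in the following workflow would run this for its assigned batch of source vertices and emit the partial sums as its output file.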
The Workflow
[Figure: the input table of (Vertex, Neighbors) pairs (V1 → V2, V5, …; V2 → V10, V13, …; up to V5500000) is partitioned across many algr jobs producing Output1 … OutputN; a final Add step merges them into the Final Output, a table of (Vertex, Credits) pairs (V1 → 23; V2 → 2355; …; V5.5M → 46923).]
Size of the Problem
About 5.5 million vertices and about 20 million edges. Each job computes 50 vertices (110K jobs).
Input data format (Raw: 250 MB, Gzipped: 93 MB):
Vertex        Neighbors
V1            V2, V5, …
V2            V10, V13, …
…
V5500000      V1000, …
Output data format (Raw: 30 MB, Gzipped: 13 MB):
Vertex        Credits
V1            23
V2            2355
…
V5.5M         46923
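A Makeflow with 110K rules is generated rather than written by hand. The sketch below shows one plausible generator; the graph.gz input, the arguments passed to algr, and the final add step are all assumptions made for illustration.

    BATCH = 50              # source vertices computed per job
    N_VERTICES = 5500000    # total vertices in the graph

    with open("betweenness.makeflow", "w") as f:
        outputs = []
        for i, start in enumerate(range(0, N_VERTICES, BATCH)):
            out = "output.%d" % i
            outputs.append(out)
            # Each rule depends on the compressed graph and the algr binary.
            f.write("%s: graph.gz algr\n" % out)
            f.write("\t./algr graph.gz %d %d > %s\n\n" % (start, BATCH, out))
        # Final rule sums all partial outputs into the table of credits.
        f.write("final.credits: add %s\n" % " ".join(outputs))
        f.write("\t./add %s > final.credits\n" % " ".join(outputs))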
The Result
Resources used: 300 Condor CPU cores and 250 SGE CPU cores.
Runtime: 2000 CPU-days reduced to 4 days, a 500X speedup!
Make Your Own Cloud
[Figure: makeflow -T wq script.makeflow draws workers from several systems at once: an SGE cluster with 4000 cores (but you can only have 250), a Condor pool with 1100 cores and no per-user limit, and cloud resources, all presented to Makeflow as a single personal cloud.]
Making a Cloud is as simple as:
$> condor_submit_workers master.nd.edu 9012 300
$> sge_submit_workers master.nd.edu 9012 250
$> makeflow -T wq script.makeflow
Application - Biocompute
Sequence Search and Alignment by Hashing Algorithm (SSAHA)
Short Read Mapping Package (SHRiMP)
Genome alignment:
CGGAAATAATTATTAAGCAA
   |||||||||
GTCAAATAATTACTGGATCG
Single nucleotide polymorphism (SNP) discovery
The Workflow
[Figure: the query reads are Split into pieces; each piece is Aligned against the Reference, producing Matches1 … MatchesN; a Combine step merges them into All Matches.]
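As one illustration of the Split step, the sketch below partitions a FASTA file of query reads into fixed-size chunks for the Align rules to consume. The chunk size, the file naming, and the assumption that the queries arrive in FASTA format are mine, not from the talk.

    READS_PER_CHUNK = 10000   # illustrative chunk size

    def split_fasta(path, prefix):
        """Write the reads in 'path' into prefix.0.fa, prefix.1.fa, ..."""
        count, idx, out = 0, 0, None
        with open(path) as f:
            for line in f:
                if line.startswith(">"):              # header starts a new read
                    if count % READS_PER_CHUNK == 0:  # start a new chunk file
                        if out:
                            out.close()
                        out = open("%s.%d.fa" % (prefix, idx), "w")
                        idx += 1
                    count += 1
                out.write(line)
        if out:
            out.close()
        return idx   # number of chunks written

    # e.g. split_fasta("query.fa", "query") before emitting one Align rule per chunk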
Sizes of some real workloads
Anopheles gambiae: 273 million bases. 2.5 million reads consisting of 1.5 billion bases were aligned using SSAHA.
Sorghum bicolor: 738.5 million bases. 11.5 million sequences consisting of 11 billion bases were aligned using SSAHA.
Oryza rufipogon: 7 million query reads were aligned to the Oryza sativa genome using SHRiMP.
Performance
Google “Makeflow”
Extra Slides
A Quick Review of Abstractions
[Figure: the user supplies a simple specification ("Here is my function: F(x,y)"; "Here is a folder of files: set S") and the abstraction applies the binary function F to the set S of files on whatever resources are available: 1 CPU, multicore, a cluster, grids, or a supercomputer.]
Sometimes …
[Figure: an irregular DAG of data files (D1 … D18) and functions (F1, F2, F4) producing a Final Output.]