Introduction to Makeflow
Li Yu
University of Notre Dame



Overview
• Distributed systems are hard to use!
• An abstraction is a regular structure that can be efficiently scaled up to large problem sizes.
• Today: Makeflow and Work Queue
  ◦ Makeflow is a workflow engine for executing large, complex workflows on clusters, grids, and clouds.
  ◦ Work Queue is a master/worker framework.
  ◦ Together they are compact, portable, and data-oriented, handle large numbers of small jobs well, and use a familiar syntax.

General Workflow
[Figure: a directed acyclic graph in which data files (D1, D2, ...) feed functions (F1 to F5), whose outputs feed further functions until a final output is produced.]

Makeflow
• Makeflow is a workflow engine for executing large, complex workflows on clusters, grids, and clouds.
• It can express an arbitrary directed acyclic graph (DAG).
• It is good at running lots of small jobs.
• Data is treated as a first-class citizen.
• Its syntax is similar to traditional UNIX Make.
• It is fault-tolerant.
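To make the DAG idea concrete, here is a minimal Python sketch (not Makeflow's own code) that orders the files of a small, made-up workflow so that every file comes after the files it depends on. The file names and the `execution_order` helper are assumptions for illustration; the ordering uses the standard library's `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each key is a file, each value is the set of
# files that must exist before it can be produced.
deps = {
    "part1": {"input.data"},
    "part2": {"input.data"},
    "out1": {"part1"},
    "out2": {"part2"},
    "result": {"out1", "out2"},
}

def execution_order(deps):
    """Return one valid order in which the files can be produced.

    Raises graphlib.CycleError if the dependencies are not acyclic.
    """
    return list(TopologicalSorter(deps).static_order())

order = execution_order(deps)
print(order)
```

Files with no ordering constraint between them (here out1 and out2) are the ones a workflow engine can run at the same time.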

Application: Data Mining
• Betweenness centrality
  ◦ Vertices that occur on many shortest paths between other vertices have higher betweenness than those that do not.
  ◦ Application: social network analysis.
  ◦ Complexity: O(n^3), where n is the number of vertices.
[Figure: example graph with the vertex of highest betweenness labeled.]
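As an illustration of the measure, the following Python sketch computes betweenness for a small undirected, unweighted graph. It uses a breadth-first search from every vertex with Brandes-style accumulation, which is an assumption made for brevity and not necessarily the algorithm used in the work described here. The graph and vertex names are made up.

```python
from collections import deque

def betweenness(graph):
    """graph: dict mapping each vertex to a list of its neighbours."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma)
        # and recording each vertex's predecessors on those paths.
        dist = {s: 0}
        sigma = {v: 0 for v in graph}
        sigma[s] = 1
        preds = {v: [] for v in graph}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest vertices, accumulating how much
        # each vertex contributes to shortest paths starting at s.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every unordered pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}

# Hypothetical path graph V1 - V2 - V3: only V2 lies between two others.
path = {"V1": ["V2"], "V2": ["V1", "V3"], "V3": ["V2"]}
print(betweenness(path))
```

Because the search from each source vertex is independent of the others, the outer loop is what gets divided among many jobs.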

The Workflow
[Figure: the input table of vertices and their neighbors is fed to many parallel "algr" jobs, which produce Output1 through OutputN; an "Add" step combines these into the final output, a table of vertices and their credits.]

Size of the Problem
• About 5.5 million vertices
• About 20 million edges
• Each job computes 50 vertices (110K jobs)

Input data format (raw: 250 MB; gzipped: 93 MB):
    Vertex    Neighbors
    V1        V2, V5, …
    V2        V10, V13, …
    …         …
    V         V1000, …

Output data format (raw: 30 MB; gzipped: 13 MB):
    Vertex    Credits
    V1        23
    V2        2355
    …         …
    V5.5M     46923
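The job count follows from the partitioning: 5.5 million vertices at 50 per job gives 110,000 jobs. A short Python sketch of that arithmetic, using hypothetical helper names and treating the approximate vertex count as exact:

```python
def job_count(num_vertices, per_job):
    """Number of jobs needed when each job handles per_job vertices."""
    return (num_vertices + per_job - 1) // per_job  # ceiling division

def job_ranges(num_vertices, per_job):
    """Yield (first, last) 1-based vertex index ranges, one per job."""
    for start in range(1, num_vertices + 1, per_job):
        yield start, min(start + per_job - 1, num_vertices)

print(job_count(5_500_000, 50))
print(list(job_ranges(120, 50)))
```

The second call shows that the last job simply receives whatever vertices remain.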

The Result
• Resources used:
  ◦ 300 Condor CPU cores
  ◦ 250 SGE CPU cores
• Runtime:
  ◦ 2000 CPU days reduced to 4 days
  ◦ 500x speedup!
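The speedup figure is the ratio of the two runtimes. The sketch below also derives a parallel efficiency under the assumption that all 550 cores (300 Condor plus 250 SGE) were busy for the whole run; the talk does not state this.

```python
def speedup(serial_days, parallel_days):
    """How many times faster the parallel run finished."""
    return serial_days / parallel_days

def efficiency(serial_days, parallel_days, cores):
    """Fraction of the ideal speedup achieved (assumes all cores were in use)."""
    return speedup(serial_days, parallel_days) / cores

total_cores = 300 + 250
print(speedup(2000, 4))
print(round(efficiency(2000, 4, total_cores), 3))
```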

Application: Biocompute
• Sequence Search and Alignment by Hashing Algorithm (SSAHA)
• Short Read Mapping Package (SHRiMP)
• Genome alignment:
    CGGAAATAATTATTAAGCAA
       |||||||||
    GTCAAATAATTACTGGATCG
• Single nucleotide polymorphism (SNP) discovery
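For intuition only, this Python sketch compares the two sequences from the slide position by position and finds the longest run of matching bases. It assumes an ungapped comparison starting at offset zero, which is far simpler than what SSAHA or SHRiMP do; the function names are made up.

```python
def match_line(a, b):
    """Return a string with '|' wherever a and b hold the same base."""
    return "".join("|" if x == y else " " for x, y in zip(a, b))

def longest_run(a, b):
    """Return (start, length) of the longest run of matching positions."""
    best_start = best_len = run_start = run_len = 0
    for i, (x, y) in enumerate(zip(a, b)):
        if x == y:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return best_start, best_len

read = "CGGAAATAATTATTAAGCAA"
reference = "GTCAAATAATTACTGGATCG"
print(read)
print(match_line(read, reference))
print(reference)
print(longest_run(read, reference))
```

Under this assumption the longest shared run is the nine-base stretch AAATAATTA beginning at the fourth base.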

The Workflow
[Figure: the query is split into reads (Read1, ...); each read is aligned against the reference by an "Align" job, producing Matches1 through MatchesN; a "Combine" step merges them into All Matches.]

Sizes of some real workloads
• Anopheles gambiae: 273 million bases
• 2.5 million reads consisting of 1.5 billion bases were aligned using SSAHA
• Sorghum bicolor: million bases
• 11.5 million sequences consisting of 11 billion bases were aligned using SSAHA
• 7 million query reads of Oryza rufipogon were aligned to the Oryza sativa genome using SHRiMP

Performance

Makeflow Example

    part1 part2 part3: input.data split.py
            ./split.py input.data

    out1: part1 mysim.exe
            ./mysim.exe part1 >out1

    out2: part2 mysim.exe
            ./mysim.exe part2 >out2

    out3: part3 mysim.exe
            ./mysim.exe part3 >out3

    result: out1 out2 out3 join.py
            ./join.py out1 out2 out3 > result
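The slides do not show split.py or join.py. As a rough idea of what such helpers might do, here is a hypothetical Python sketch that deals the lines of an input file round-robin into parts and later concatenates files back together. The function names, the round-robin policy, and the demo data are all assumptions, and the demo joins the parts directly and skips the mysim.exe step.

```python
import tempfile
from pathlib import Path

def split_file(src, parts):
    """Deal the lines of src round-robin into the files listed in parts."""
    lines = Path(src).read_text().splitlines(keepends=True)
    for i, part in enumerate(parts):
        Path(part).write_text("".join(lines[i::len(parts)]))

def join_files(pieces, dest):
    """Concatenate the files in pieces, in order, into dest."""
    Path(dest).write_text("".join(Path(p).read_text() for p in pieces))

# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "input.data").write_text("".join(f"record {i}\n" for i in range(6)))
    parts = [tmp / f"part{i}" for i in (1, 2, 3)]
    split_file(tmp / "input.data", parts)
    part1_text = parts[0].read_text()
    join_files(parts, tmp / "result")
    result_lines = sorted((tmp / "result").read_text().splitlines())

print(part1_text)
print(result_lines)
```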

Makeflow Syntax
• A Makeflow script consists of a set of rules.
• Each rule specifies:
  ◦ a set of target files to create;
  ◦ a set of source files needed to create them;
  ◦ a command that generates the target files from the source files.

    out1: part1 mysim.exe
            ./mysim.exe part1 >out1

The first line lists the target file(s), then a colon, then the source file(s); the indented line is the command.
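To show the three parts of a rule as data, here is a small Python sketch that parses rule text of the shape used in these slides. It is a toy, not Makeflow's parser: it assumes one command line per rule and ignores variable definitions, and the `Rule` type and `parse_rules` function are invented for this example.

```python
from typing import NamedTuple

class Rule(NamedTuple):
    targets: list
    sources: list
    command: str

def parse_rules(text):
    """Parse 'targets: sources' lines, each followed by an indented command."""
    rules, header = [], None
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        if line[0] in " \t":
            if header is None:
                raise ValueError(f"command without a rule: {line!r}")
            targets, sources = header
            rules.append(Rule(targets, sources, line.strip()))
            header = None
        else:
            left, _, right = line.partition(":")
            header = (left.split(), right.split())
    return rules

example = """\
out1: part1 mysim.exe
\t./mysim.exe part1 >out1
"""
for rule in parse_rules(example):
    print(rule)
```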

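No Phony Rules
• A correct rule:

    out1: part1 mysim.exe
            ./mysim.exe part1 >out1

• An incorrect rule (the command uses part1 and mysim.exe, but they are not declared as source files):

    out1:
            ./mysim.exe part1 >out1

• Another incorrect rule (clean is not a file that the command creates):

    clean:
            rm -rf *.o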
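The two mistakes above can be caught mechanically. The Python sketch below is a heuristic written for this illustration, not a check that Makeflow itself is known to perform in this form: it flags a target that the command never mentions and any known workflow file that the command uses without declaring it. The `check_rule` name and the `known_files` parameter are assumptions.

```python
import re

def check_rule(targets, sources, command, known_files):
    """Return a list of problems found in one rule (empty if none)."""
    # Rough tokenization: runs of word characters, dots and dashes.
    words = set(re.findall(r"[\w.\-]+", command))
    problems = []
    for target in targets:
        if target not in words:
            problems.append(f"target {target!r} is not mentioned by the command")
    declared = set(targets) | set(sources)
    for name in sorted((words & set(known_files)) - declared):
        problems.append(f"command uses {name!r} without declaring it")
    return problems

known = {"part1", "mysim.exe", "out1"}
print(check_rule(["out1"], ["part1", "mysim.exe"], "./mysim.exe part1 >out1", known))
print(check_rule(["out1"], [], "./mysim.exe part1 >out1", known))
print(check_rule(["clean"], [], "rm -rf *.o", known))
```

A rule with no sources is not wrong in itself; a download step, for example, only creates a file.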

    part1 part2 part3: input.data split.py
            ./split.py input.data

    out3: part3 mysim.exe
            ./mysim.exe part3 >out3

    result: out1 out2 out3 join.py
            ./join.py out1 out2 out3 > result

A Real Example: Image Processing
1. Download (from the Internet)
2. Convert
3. Combine into Movie

Image Processing - Makeflow Script

    # This is an example of Makeflow.
    CURL=/usr/bin/curl
    CONVERT=/afs/nd.edu/user37/ccl/software/external/imagemagick/bin/convert
    URL=

    a.montage.gif: a.jpg a.90.jpg a.180.jpg a.270.jpg a.360.jpg
            LOCAL $CONVERT -delay 10 -loop 0 a.jpg a.90.jpg a.180.jpg a.270.jpg a.360.jpg a.270.jpg a.180.jpg a.90.jpg a.montage.gif

    a.90.jpg: a.jpg
            $CONVERT -swirl 90 a.jpg a.90.jpg

    a.180.jpg: a.jpg
            $CONVERT -swirl 180 a.jpg a.180.jpg

    a.270.jpg: a.jpg
            $CONVERT -swirl 270 a.jpg a.270.jpg

    a.360.jpg: a.jpg
            $CONVERT -swirl 360 a.jpg a.360.jpg

    a.jpg:
            LOCAL $CURL -o a.jpg $URL

Comments start with ‘#’.

In the script, $CONVERT stands for /afs/nd.edu/user37/ccl/software/external/imagemagick/bin/convert.

The LOCAL keyword in front of a command forces that job to run on the controlling machine.
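The four swirl rules differ only in the angle, so rules of this kind are easy to generate with a short program. The Python sketch below produces the same rule text as the script above for a list of angles; the `swirl_rules` helper is hypothetical, and the variable definitions and the download and montage rules would still have to be added separately.

```python
def swirl_rules(source, angles):
    """Return Makeflow rule text that swirls source by each angle."""
    stem = source.rsplit(".", 1)[0]
    rules = []
    for angle in angles:
        target = f"{stem}.{angle}.jpg"
        # One rule: target, its source, and a tab-indented command.
        rules.append(
            f"{target}: {source}\n"
            f"\t$CONVERT -swirl {angle} {source} {target}\n"
        )
    return "\n".join(rules)

text = swirl_rules("a.jpg", [90, 180, 270, 360])
print(text)
```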


Get the example.makeflow script

    % mkdir /tmp/makeflow
    % cd /tmp/makeflow
    % cp ~lyu2/Public/example.makeflow .
    % cat example.makeflow
    # This is an example of Makeflow.
    CURL=/usr/bin/curl
    CONVERT=/usr/bin/convert
    URL=

    a.montage.gif: a.jpg a.90.jpg a.180.jpg a.270.jpg a.360.jpg
            LOCAL $CONVERT -delay 10 -loop 0 a.jpg a.90.jpg a.180.jpg a.270.jpg a.360.jpg a.270.jpg a.180.jpg a.90.jpg a.montage.gif
    ……

Set up the cctools environment (in csh)
• Set the PATH to use cctools:

    % setenv PATH ~ccl/software/cctools/bin:$PATH

• If the PATH is set correctly:

    % makeflow -h
    Use: makeflow [options]
    Where options are:
     -c Clean up: remove logfile and all targets.
    ……

• If the PATH is NOT set correctly:

    % makeflow -h
    makeflow: Command not found.

Run the Makeflow Script
• Just use the local machine:

    % makeflow example.makeflow

• Output:

    makeflow: checking for duplicate targets...
    makeflow: DAG created.
    makeflow: checking rules for consistency...
    makeflow: Width of DAG: 4
    ………………
    makeflow: nothing left to do.

• Now we can check whether the target file a.montage.gif was created successfully:

    % display a.montage.gif

Re-run a Makeflow Script
• If you run it a second time, nothing happens, because all of the target files have already been created:

    % makeflow example.makeflow
    makeflow: nothing left to do

• Use the -c option to clean everything up before trying it again:

    % makeflow -c example.makeflow
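The "nothing left to do" behaviour can be pictured as a check over the rules. The Python sketch below treats a rule as pending when any of its targets is missing. This is a simplification for illustration: as the help text for -c indicates, Makeflow also keeps a logfile, which this sketch ignores. The rule dictionaries and the `pending_rules` helper are assumptions.

```python
import os

def pending_rules(rules, exists=os.path.exists):
    """Return the rules that still have at least one missing target."""
    return [r for r in rules if not all(exists(t) for t in r["targets"])]

rules = [
    {"targets": ["a.jpg"], "command": "LOCAL $CURL -o a.jpg $URL"},
    {"targets": ["a.90.jpg"], "command": "$CONVERT -swirl 90 a.jpg a.90.jpg"},
]

# Pretend file system: first nothing exists, then everything does.
first_run = pending_rules(rules, exists=lambda path: False)
second_run = pending_rules(rules, exists={"a.jpg", "a.90.jpg"}.__contains__)
print(len(first_run), len(second_run))
```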

Run the Makeflow Script with a Distributed System
• Use a distributed system with the ‘-T’ option:
  ◦ ‘-T condor’: uses the Condor batch system

    % makeflow -T condor example.makeflow

  ◦ Take advantage of the Condor MatchMaker:

    BATCH_OPTIONS=Requirements=(Memory>1024)\n Arch= x86_64

  ◦ ‘-T sge’: uses the Sun Grid Engine

    % makeflow -T sge example.makeflow

  ◦ ‘-T wq’: uses the Work Queue framework

    % makeflow -T wq example.makeflow
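When the same script is launched against different systems from a driver program, it helps to build the command line in one place. The Python sketch below only assembles the argument list and does not run anything; it uses just the options shown in these slides (-T and -c), and the `makeflow_command` helper is hypothetical.

```python
BATCH_TYPES = {"condor", "sge", "wq"}  # the systems named on the slide

def makeflow_command(script, batch_type=None, clean=False):
    """Return the argv list for running makeflow on script."""
    if batch_type is not None and batch_type not in BATCH_TYPES:
        raise ValueError(f"unknown batch type: {batch_type!r}")
    argv = ["makeflow"]
    if clean:
        argv.append("-c")
    if batch_type is not None:
        argv += ["-T", batch_type]
    argv.append(script)
    return argv

print(" ".join(makeflow_command("example.makeflow")))
print(" ".join(makeflow_command("example.makeflow", batch_type="condor")))
print(" ".join(makeflow_command("example.makeflow", clean=True)))
```

The resulting list could be handed to `subprocess.run` on a machine where makeflow is installed.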

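Makeflow with Work Queue
Start workers on local machines, clusters, via campus grid, etc.
[Figure: Makeflow reads the DAG and drives a worker. For each job it sends "put App", "put Input", and work “App Output”; the worker executes (exec) the application on the input; Makeflow then issues "get Output".]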
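The master/worker pattern in the figure can be imitated on a single machine with threads. The Python sketch below is not the Work Queue API; it only mirrors the pattern: a master fills a queue with tasks, workers pull tasks until none remain, and the master collects the outputs. All names are made up for this illustration.

```python
import queue
import threading

def run_master(tasks, num_workers, work):
    """Run work(task) for every task using num_workers worker threads."""
    todo, done = queue.Queue(), queue.Queue()
    for task in tasks:
        todo.put(task)

    def worker():
        while True:
            try:
                task = todo.get_nowait()  # ask the master for more work
            except queue.Empty:
                return  # nothing left to do
            done.put((task, work(task)))  # send the output back

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    results = {}
    while not done.empty():
        task, output = done.get()
        results[task] = output
    return results

results = run_master(range(10), num_workers=3, work=lambda x: x * x)
print(results)
```

Because workers pull tasks as they become free, faster workers naturally take on more of the load.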

Ways of starting workers
• Start one worker on your local machine:

    work_queue_worker hostname port

• Start some Condor workers:

    condor_submit_workers hostname port

• Start some SGE workers:

    sge_submit_workers hostname port

Make Your Own Cloud
[Figure: "makeflow -T wq example.makeflow" drawing workers from Condor, SGE, and a cloud. Figure labels: 1100 cores; unlimited; 4000 cores (but you can only have 250).]

Set up the Condor environment
• Set the PATH to use Condor:

    % setenv PATH ~condor/software/bin:$PATH

• If the PATH is set correctly:

    % condor_q
    -- Submitter: cclsubmit00.cse.nd.edu: : cclsubmit00.cse.nd.edu
     ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
    0 jobs; 0 idle, 0 running, 0 held

• If the PATH is NOT set correctly:

    % condor_q
    condor_q: Command not found.

Re-run the Makeflow with Work Queue
Go to the experiment directory and clean things up:

    % cd /tmp/makeflow
    % makeflow -c example.makeflow

Run the example with Work Queue:

    % condor_submit_workers `hostname`
    % makeflow -T wq example.makeflow

Google “Makeflow”

Contact us
• Li Yu
• Peter Bui
• Prof. Douglas Thain
• Cooperative Computing Lab