Building Scalable Scientific Applications using Makeflow Dinesh Rajan and Peter Sempolinski University of Notre Dame.


Cooperative Computing Lab University of Notre Dame

The Cooperative Computing Lab
- We collaborate with people who have large-scale computing problems in science, engineering, and other fields.
- We operate computer systems on the order of 10,000 cores: clusters, clouds, grids.
- We conduct computer science research in the context of real people and problems.
- We develop open source software for large-scale distributed computing.

Plan for Today's Tutorial
1. Our CCTools Software
   i. Makeflow, Work Queue, Parrot, Chirp
2. Makeflow
   i. Lecture: Overview, features
   ii. Tutorial: Write simple Makeflows
3. Work Queue
   i. Lecture: Overview, features
   ii. Tutorial: Write simple WQ programs

Science Depends on Computing!

The Good News: Computing is Plentiful!

Superclusters by the Hour

I have a standard, debugged, trusted application that runs on my laptop. A toy problem completes in one hour. A real problem will take a month (I think). Can I get a single result faster? Can I get more results in the same time? Last year, I heard about this grid thing. This year, I heard about this cloud thing.

I have allocations on clusters (unlimited) + grids (limited) + clouds ($)! How do I run my application on those machines?

Should I port my program to MPI or Hadoop?
- Learn MPI / Hadoop
- Learn C / Java
- Re-architect
- Re-write
- Re-test
- Re-debug
- Re-certify

And my application looks like this…

Makeflow & Work Queue Easy to scale from one desktop to national scale infrastructure. Harness all available resources: desktops, clusters, clouds, grids. Portable across operating systems, storage systems, batch systems. No special privileges required.

Makeflow

part1 part2 part3: input.data split.py
	./split.py input.data

out1: part1 mysim.exe
	./mysim.exe part1 >out1

out2: part2 mysim.exe
	./mysim.exe part2 >out2

out3: part3 mysim.exe
	./mysim.exe part3 >out3

result: out1 out2 out3 join.py
	./join.py out1 out2 out3 > result

Work Queue Library

#include "work_queue.h"

while( not done ) {
    while (more work ready) {
        task = work_queue_task_create();
        // add some details to the task
        work_queue_submit(queue, task);
    }
    task = work_queue_wait(queue);
    // process the completed task
}

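Parrot and Chirp
[Figure: an ordinary application uses the filesystem interface (open/read/write/close) through the Parrot Virtual File System, which has drivers for Local, HTTP, CVMFS, Chirp, and iRODS. These connect to web servers, the CVMFS network, a Chirp server, and an iRODS server.]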

Source code on GitHub

Makeflow & Work Queue
- Federate/harness all available resources: desktops, clusters, clouds, grids.
- Simple interfaces & API.
- Part of the CCTools software: no special privileges required to install.

Makeflow Lecture: Outline
1. What is Makeflow?
   - Portable: One Makeflow program for SGE, Condor, PBS
2. How to write an application using Makeflow?
   - Simple rule-based syntax
3. How to run Makeflow?
   - Features, commands, using Work Queue

An Old Idea: Makefiles

part1 part2 part3: input.data split.py
	./split.py input.data

out1: part1 mysim.exe
	./mysim.exe part1 >out1

out2: part2 mysim.exe
	./mysim.exe part2 >out2

out3: part3 mysim.exe
	./mysim.exe part3 >out3

result: out1 out2 out3 join.py
	./join.py out1 out2 out3 > result

Makeflow Language - Rules
Each rule specifies:
- a set of target files to create;
- a set of source files needed to create them;
- a command that generates the target files from the source files.

For example, in the split/simulate/join workflow, the rule that produces out1 is:

out1 : part1 mysim.exe
	mysim.exe part1 > out1
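Because every rule names its targets and sources, the order of execution and the available parallelism follow from the rules alone. The sketch below is only an illustration of that idea, not Makeflow's own scheduler: it encodes the split/simulate/join rules as Python tuples and groups them into stages whose members do not depend on one another. The function name `parallel_stages` and the tuple layout are assumptions made for this example.

```python
# Each rule is (targets, sources, command), mirroring the split/simulate/join example.
RULES = [
    (["part1", "part2", "part3"], ["input.data", "split.py"], "./split.py input.data"),
    (["out1"], ["part1", "mysim.exe"], "./mysim.exe part1 >out1"),
    (["out2"], ["part2", "mysim.exe"], "./mysim.exe part2 >out2"),
    (["out3"], ["part3", "mysim.exe"], "./mysim.exe part3 >out3"),
    (["result"], ["out1", "out2", "out3", "join.py"], "./join.py out1 out2 out3 > result"),
]


def parallel_stages(rules):
    """Group rule indices into stages; rules in one stage are independent."""
    # Map each target file to the index of the rule that produces it.
    producer = {t: i for i, (targets, _, _) in enumerate(rules) for t in targets}
    done = set()  # indices of rules already placed in a stage
    stages = []
    while len(done) < len(rules):
        ready = [
            i
            for i, (_, sources, _) in enumerate(rules)
            if i not in done
            # A source is satisfied if no rule produces it (a plain input file)
            # or if the rule that produces it is already scheduled.
            and all(s not in producer or producer[s] in done for s in sources)
        ]
        if not ready:
            raise ValueError("cyclic dependency between rules")
        stages.append(ready)
        done.update(ready)
    return stages


STAGES = parallel_stages(RULES)
for number, stage in enumerate(STAGES, start=1):
    print(f"stage {number}:", [RULES[i][2] for i in stage])
```

Under these assumptions the split rule forms the first stage, the three simulations form the second (and could run at the same time), and the join rule forms the third.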

You must state all the files needed by the command.

sims.mf

out.10 : in.dat calib.dat sim.exe
	sim.exe -p 10 in.dat > out.10

out.20 : in.dat calib.dat sim.exe
	sim.exe -p 20 in.dat > out.20

out.30 : in.dat calib.dat sim.exe
	sim.exe -p 30 in.dat > out.30
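Rules that differ only in a parameter value are tedious to write by hand, so they are often generated by a short script. The following is a minimal sketch, assuming the same file names as the sims.mf slide; the helper name `sweep_rules` is hypothetical and is not part of CCTools. It lists every file the command reads in the rule header, as the previous slide requires.

```python
def sweep_rules(params, inputs=("in.dat", "calib.dat"), program="sim.exe"):
    """Return Makeflow rule text with one rule per parameter value."""
    # Every file the command needs, including the program itself, is a source.
    sources = " ".join(list(inputs) + [program])
    rules = []
    for p in params:
        target = f"out.{p}"
        # Rule header, then a tab-indented command line.
        rules.append(f"{target}: {sources}\n\t{program} -p {p} {inputs[0]} > {target}\n")
    return "\n".join(rules)


TEXT = sweep_rules([10, 20, 30])
print(TEXT)
```

The printed text could be redirected to a file such as sims.mf and passed to makeflow.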

Makeflow = Make + Workflow
- Provides portability across batch systems.
- Enables parallelism (but not too much!).
- Fault tolerance at multiple scales.
- Data and resource management.
[Figure: Makeflow dispatching to Local, Condor, SGE, and Work Queue.]

Makeflow + Batch System
[Figure: Makeflow, given a Makefile plus local files and programs, submits jobs with makeflow -T sge and makeflow -T condor. Resources shown: private cluster, campus Condor pool, public cloud provider, XSEDE cluster, and Work Queue.]

How to run a Makeflow

Run a workflow locally:
% makeflow -T local sims.mf

Run the workflow on SGE:
% makeflow -T sge sims.mf

Run the workflow on Condor:
% makeflow -T condor sims.mf

Clean up the workflow outputs:
% makeflow -c sims.mf

Makeflow Syntax Checker

Makeflow can verify whether your Makeflow file is syntactically correct:
% makeflow -k sims.mf
Makeflow: Syntax OK.

Makeflow will point out syntax errors, if any:
% makeflow -k sims.mf
makeflow: out10 is defined multiple times at out.10:1 and out.10:4

Makeflow Visualization

Makeflow can output a Makeflow file as a Dot graph.

% makeflow -D dot sims.mf
digraph {
node [shape=ellipse,color = green,style = unfilled,fixedsize = false];
N2 [label="sim.exe"];
N1 [label="sim.exe"];
N0 [label="sim.exe"];
node [shape=box,color=blue,style=unfilled,fixedsize=false];
F3 [label = "out.30"];
F0 [label = "sim.exe"];
F5 [label = "out.10"];
F2 [label = "in.dat"];
F1 [label = "calib.dat"];
F4 [label = "out.20"];
..

Example App: Biocompute Portal
[Figure: the portal generates a Makefile and runs the workflow with Makeflow, which submits tasks to a Condor pool. A transaction log updates the status shown in a progress bar. Supported tools: BLAST, SSAHA, SHRIMP, EST, MAKER, …]

Makeflow + Work Queue

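Makeflow + Batch System
[Figure: Makeflow, given a Makefile plus local files and programs, submits jobs with makeflow -T sge and makeflow -T condor. Resources shown: private cluster, campus Condor pool, public cloud provider, XSEDE cluster. How to reach the remaining resources is marked "???".]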

Makeflow + Work Queue
[Figure: Makeflow, given a Makefile plus local files and programs, submits tasks to workers (W) running on an XSEDE cluster, a campus Condor pool, a public cloud provider, and a private cluster. Workers are started with ssh, sge_submit_workers, and condor_submit_workers. Caption: Thousands of Workers in a Personal Cloud.]

Advantages of Work Queue
- Scalability: Harness multiple infrastructures simultaneously.
- Elasticity: Scale resources up and down as needed.
- Data Management: Remote data caching.
- Data Locality: Matches tasks to nodes with data.

Fault Tolerance
Makeflow + Work Queue is fault tolerant:
- If Makeflow crashes (or is killed), it recovers by reading its log and continues where it left off.
- If a worker crashes, the master detects the failure and restarts the task elsewhere.
- Workers can be added and removed at any time during execution.
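The log-based recovery described above can be illustrated with a small sketch. This is not Makeflow's implementation and the one-target-per-line log format is an assumption invented for the example; commands are passed to a caller-supplied function rather than actually executed. The point is only that recording each completed target lets a restarted run skip work that already finished.

```python
import os
import tempfile


def run_with_log(rules, log_path, execute):
    """Run rules in order, skipping targets already recorded in the log."""
    finished = set()
    if os.path.exists(log_path):
        with open(log_path) as log:
            finished = {line.strip() for line in log if line.strip()}
    ran = []
    for target, command in rules:
        if target in finished:
            continue  # completed before the crash; do not repeat it
        execute(command)
        # Record completion only after the command has succeeded.
        with open(log_path, "a") as log:
            log.write(target + "\n")
        ran.append(target)
    return ran


# Hypothetical workflow: (target, command) pairs.
RULES = [
    ("out.10", "sim.exe -p 10"),
    ("out.20", "sim.exe -p 20"),
    ("out.30", "sim.exe -p 30"),
]


def crash_on_20(command):
    """Stand-in executor that simulates a crash on the second task."""
    if "-p 20" in command:
        raise RuntimeError("simulated crash")


LOG = os.path.join(tempfile.mkdtemp(), "workflow.log")
try:
    run_with_log(RULES, LOG, crash_on_20)  # first run stops at out.20
except RuntimeError:
    pass
RESUMED = run_with_log(RULES, LOG, lambda command: None)  # second run resumes
print(RESUMED)
```

In this sketch the second run executes only the tasks that were not logged as complete before the simulated crash.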

Makeflow and Work Queue

To start the Makeflow:
% makeflow -T wq sims.mf
Could not create work_queue on port
% makeflow -T wq -p 0 sims.mf
Listening for workers on port 8374…

To start one worker:
% work_queue_worker ccl.cse.nd.edu 8374

Start Workers Everywhere!

Submit workers to SGE:
% sge_submit_workers ccl.cse.nd.edu

Submit workers to Condor:
% condor_submit_workers ccl.cse.nd.edu

Submit workers to Torque:
% torque_submit_workers ccl.cse.nd.edu

Keeping track of port numbers gets old fast…

Project Names
[Figure: Makeflow (port 4057), started with makeflow … -a -N myproject, advertises to the catalog. A worker started with work_queue_worker -a -N myproject queries the catalog, learns that "myproject" is at ccl.cse.nd.edu:4057, and connects to ccl.cse.nd.edu:4057.]

Makeflow with Project Names

Start Makeflow with a project name:
% makeflow -T wq -p 0 -a -N xsede-tutorial sims.mf
Listening for workers on port XYZ…

Start one worker:
% work_queue_worker -N xsede-tutorial

Start many workers:
% sge_submit_workers -N xsede-tutorial 5

Makeflow
- Portable: one program for clusters, grids, clouds.
- Simple syntax: inputs, outputs, command.
- All files needed by the command must be specified.
Makeflow with Work Queue
- Federation, elasticity, data management.
Project Names
- Easy-to-remember locations of Makeflow masters.

Acknowledgements: Chris Hempel (TACC), David Gignac (TACC)

Go to: Click on “Tutorial at XSEDE 2013”

Click on “Tutorial” under Makeflow