Building Scalable Scientific Applications using Makeflow Dinesh Rajan and Douglas Thain University of Notre Dame.

Slides:



Advertisements
Similar presentations
1 Real-World Barriers to Scaling Up Scientific Applications Douglas Thain University of Notre Dame Trends in HPDC Workshop Vrije University, March 2012.
Advertisements

Lobster: Personalized Opportunistic Computing for CMS at Large Scale Douglas Thain (on behalf of the Lobster team) University of Notre Dame CVMFS Workshop,
Experience with Adopting Clouds at Notre Dame Douglas Thain University of Notre Dame IEEE CloudCom, November 2010.
Introduction to Scalable Programming using Makeflow and Work Queue Dinesh Rajan and Mike Albrecht University of Notre Dame October 24 and November 7, 2012.
Scaling Up Without Blowing Up Douglas Thain University of Notre Dame.
1 Condor Compatible Tools for Data Intensive Computing Douglas Thain University of Notre Dame Condor Week 2011.
1 High Throughput Scientific Computing with Condor: Computer Science Challenges in Large Scale Parallelism Douglas Thain University of Notre Dame UAB 27.
1 Opportunities and Dangers in Large Scale Data Intensive Computing Douglas Thain University of Notre Dame Large Scale Data Mining Workshop at SIGKDD August.
1 Scaling Up Data Intensive Science with Application Frameworks Douglas Thain University of Notre Dame Michigan State University September 2011.
1 Models and Frameworks for Data Intensive Cloud Computing Douglas Thain University of Notre Dame IDGA Cloud Computing 8 February 2011.
1 Scaling Up Data Intensive Science to Campus Grids Douglas Thain Clemson University 25 Septmber 2009.
Deconstructing Clusters for High End Biometric Applications NSF CCF June Douglas Thain and Patrick Flynn University of Notre Dame 5 August.
Office of Science U.S. Department of Energy Grids and Portals at NERSC Presented by Steve Chan.
Getting Beyond the Filesystem: New Models for Data Intensive Scientific Computing Douglas Thain University of Notre Dame HEC FSIO Workshop 6 August 2009.
Cooperative Computing for Data Intensive Science Douglas Thain University of Notre Dame NSF Bridges to Engineering 2020 Conference 12 March 2008.
An Introduction to Grid Computing Research at Notre Dame Prof. Douglas Thain University of Notre Dame
Introduction to Makeflow Li Yu University of Notre Dame 1.
High Throughput Computing with Condor at Notre Dame Douglas Thain 30 April 2009.
Building Scalable Elastic Applications using Makeflow Dinesh Rajan and Douglas Thain University of Notre Dame Tutorial at CCGrid, May Delft, Netherlands.
Building Scalable Scientific Applications using Makeflow Dinesh Rajan and Peter Sempolinski University of Notre Dame.
Building Scalable Applications on the Cloud with Makeflow and Work Queue Douglas Thain and Patrick Donnelly University of Notre Dame Science Cloud Summer.
Introduction to Makeflow and Work Queue CSE – Cloud Computing – Fall 2014 Prof. Douglas Thain.
Portable Resource Management for Data Intensive Workflows Douglas Thain University of Notre Dame.
Apache Airavata GSOC Knowledge and Expertise Computational Resources Scientific Instruments Algorithms and Models Archived Data and Metadata Advanced.
Design of an Active Storage Cluster File System for DAG Workflows Patrick Donnelly and Douglas Thain University of Notre Dame 2013 November 18 th DISCS-2013.
1 1 Hybrid Cloud Solutions (Private with Public Burst) Accelerate and Orchestrate Enterprise Applications.
GRID job tracking and monitoring Dmitry Rogozin Laboratory of Particle Physics, JINR 07/08/ /09/2006.
Track 1: Cluster and Grid Computing NBCR Summer Institute Session 2.2: Cluster and Grid Computing: Case studies Condor introduction August 9, 2006 Nadya.
Elastic Applications in the Cloud Dinesh Rajan University of Notre Dame CCL Workshop, June 2012.
1 Swiss Grid Day 2009 Campus Grid Working Group Update Pascal Jermini.
Toward a Common Model for Highly Concurrent Applications Douglas Thain University of Notre Dame MTAGS Workshop 17 November 2013.
Introduction to Work Queue Applications CSE – Cloud Computing – Fall 2014 Prof. Douglas Thain.
Large Scale Sky Computing Applications with Nimbus Pierre Riteau Université de Rennes 1, IRISA INRIA Rennes – Bretagne Atlantique Rennes, France
Building Scalable Scientific Applications with Makeflow Douglas Thain and Dinesh Rajan University of Notre Dame Applied Cyber Infrastructure Concepts University.
The Cooperative Computing Lab  We collaborate with people who have large scale computing problems in science, engineering, and other fields.  We operate.
Introduction to Scalable Programming using Work Queue Dinesh Rajan and Ben Tovar University of Notre Dame October 10, 2013.
1 Computational Abstractions: Strategies for Scaling Up Applications Douglas Thain University of Notre Dame Institute for Computational Economics University.
Introduction to Work Queue Applications Applied Cyberinfrastructure Concepts Course University of Arizona 2 October 2014 Douglas Thain and Nicholas Hazekamp.
Grid Computing at Yahoo! Sameer Paranjpye Mahadev Konar Yahoo!
 Apache Airavata Architecture Overview Shameera Rathnayaka Graduate Assistant Science Gateways Group Indiana University 07/27/2015.
Building Scalable Scientific Applications with Work Queue Douglas Thain and Dinesh Rajan University of Notre Dame Applied Cyber Infrastructure Concepts.
Review of Condor,SGE,LSF,PBS
Introduction to Makeflow and Work Queue Prof. Douglas Thain, University of Notre Dame
Ruth Pordes November 2004TeraGrid GIG Site Review1 TeraGrid and Open Science Grid Ruth Pordes, Fermilab representing the Open Science.
Experiment Management from a Pegasus Perspective Jens-S. Vöckler Ewa Deelman
© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 1 Automate your way to.
Introduction to Scalable Programming using Work Queue Dinesh Rajan and Mike Albrecht University of Notre Dame October 24 and November 7, 2012.
Experiments in Utility Computing: Hadoop and Condor Sameer Paranjpye Y! Web Search.
PROGRESS: GEW'2003 Using Resources of Multiple Grids with the Grid Service Provider Michał Kosiedowski.
Millions of Jobs or a few good solutions …. David Abramson Monash University MeSsAGE Lab X.
Building Scalable Elastic Applications using Work Queue Dinesh Rajan and Douglas Thain University of Notre Dame Tutorial at CCGrid, May Delft,
Demonstration of Scalable Scientific Applications Peter Sempolinski and Dinesh Rajan University of Notre Dame.
Building Scalable Scientific Applications with Work Queue Douglas Thain and Dinesh Rajan University of Notre Dame Applied Cyber Infrastructure Concepts.
Tutorial on Science Gateways, Roma, Catania Science Gateway Framework Motivations, architecture, features Riccardo Rotondo.
Northwest Indiana Computational Grid Preston Smith Rosen Center for Advanced Computing Purdue University - West Lafayette West Lafayette Calumet.
Introduction to Makeflow and Work Queue Nicholas Hazekamp and Ben Tovar University of Notre Dame XSEDE 15.
Building on virtualization capabilities for ExTENCI Carol Song and Preston Smith Rosen Center for Advanced Computing Purdue University ExTENCI Kickoff.
Introduction to Makeflow and Work Queue
Hadoop Aakash Kag What Why How 1.
Working With Azure Batch AI
Scaling Up Scientific Workflows with Makeflow
Introduction to Makeflow and Work Queue
Introduction to Makeflow and Work Queue
Haiyan Meng and Douglas Thain
Introduction to Makeflow and Work Queue
Weaving Abstractions into Workflows
Introduction to Makeflow and Work Queue
Introduction to Makeflow and Work Queue with Containers
What’s New in Work Queue
Creating Custom Work Queue Applications
Presentation transcript:

Building Scalable Scientific Applications using Makeflow Dinesh Rajan and Douglas Thain University of Notre Dame

The Cooperative Computing Lab

The Cooperative Computing Lab We collaborate with people who have large scale computing problems in science, engineering, and other fields. We operate computer systems on the O(10,000) cores: clusters, clouds, grids. We conduct computer science research in the context of real people and problems. We develop open source software for large scale distributed computing. 3

Science Depends on Computing! AGTCCGTACGATGCTATTAGCGAGCGTGA…

The Good News: Computing is Plentiful! 5

7 I have a standard, debugged, trusted application that runs on my laptop. A toy problem completes in one hour. A real problem will take a month (I think.) Can I get a single result faster? Can I get more results in the same time? Last year, I heard about this grid thing. What do I do next? This year, I heard about this cloud thing.

Should I port my program to MPI or Hadoop? Learn C / Java Learn MPI / Hadoop Re-architect Re-write Re-test Re-debug Re-certify

What if my application looks like this?

Our Philosophy: Harness all the resources that are available: desktops, clusters, clouds, and grids. Make it easy to scale up from one desktop to national scale infrastructure. Provide familiar interfaces that make it easy to connect existing apps together. Allow portability across operating systems, storage systems, middleware… Make simple things easy, and complex things possible. No special privileges required.

I can get as many machines on the cluster/grid/cloud as I want! How do I organize my application to run on those machines?

Makeflow: A Portable Workflow System

An Old Idea: Makefiles 13 part1 part2 part3: input.data split.py./split.py input.data out1: part1 mysim.exe./mysim.exe part1 >out1 out2: part2 mysim.exe./mysim.exe part2 >out2 out3: part3 mysim.exe./mysim.exe part3 >out3 result: out1 out2 out3 join.py./join.py out1 out2 out3 > result

Makeflow = Make + Workflow Provides portability across batch systems. Enable parallelism (but not too much!) Fault tolerance at multiple scales. Data and resource management. 14 Makeflow LocalCondorSGE Work Queue

Makeflow Language - Rules Each rule specifies: – a set of target files to create; – a set of source files needed to create them; – a command that generates the target files from the source files. part1 part2 part3: input.data split.py./split.py input.data out1: part1 mysim.exe./mysim.exe part1 >out1 out2: part2 mysim.exe./mysim.exe part2 >out2 out3: part3 mysim.exe./mysim.exe part3 >out3 result: out1 out2 out3 join.py./join.py out1 out2 out3 > result out1 : part1 mysim.exe mysim.exe part1 > out1

You must state all the files needed by the command.

sims.mf out.10 : in.dat calib.dat sim.exe sim.exe –p 10 in.data > out.10 out.20 : in.dat calib.dat sim.exe sim.exe –p 20 in.data > out.20 out.30 : in.dat calib.dat sim.exe sim.exe –p 30 in.data > out.30

Private Cluster Campus Condor Pool Public Cloud Provider CRC SGE Cluster Makefile Makeflow Local Files and Programs Makeflow + Batch System makeflow –T sge makeflow –T condor Work Queue

Drivers Local Condor SGE Batch Hadoop WorkQueue Torque MPI-Queue XGrid Moab

How to run a Makeflow Run a workflow locally (multicore?) – makeflow -T local sims.mf Clean up the workflow outputs: – makeflow –c sims.mf Run the workflow on Torque: – makeflow –T torque sims.mf Run the workflow on Condor: – makeflow –T condor sims.mf

Example: Biocompute Portal Generate Makefile Make flow Run Workflow Progress Bar Transaction Log Update Status Condor Pool Submit Tasks BLAST SSAHA SHRIMP EST MAKER …

Makeflow Documentation