Computational Abstractions: Strategies for Scaling Up Applications
Douglas Thain, University of Notre Dame
Institute for Computational Economics, University of Chicago, 27 July 2012

The Cooperative Computing Lab

We collaborate with people who have large scale computing problems in science, engineering, and other fields. We operate computer systems on the scale of O(10,000) cores: clusters, clouds, and grids. We conduct computer science research in the context of real people and problems. We release open source software for large scale distributed computing.

Our Collaborators

Why Work with Science Apps? Highly motivated to get a result that is bigger, faster, or higher resolution. Willing to take risks and move rapidly, but don't have the effort/time for major retooling. Often already have access to thousands of machines in various forms. Keep us CS types honest about what solutions actually work!

Today's Message: Large scale computing is plentiful. Scaling up is a real pain (even for experts!) Strategy: Computational abstractions. Examples:
– All-Pairs for combinatorial problems.
– Wavefront for dynamic programming.
– Makeflow for irregular graphs.
– Work Queue for iterative algorithms.

What this talk is not: How to use our software.
What this talk is about: How to think about designing a large scale computation.

The Good News: Computing is Plentiful!

greencloud.crc.nd.edu

Superclusters by the Hour

The Bad News: It is inconvenient.

I have a standard, debugged, trusted application that runs on my laptop. A toy problem completes in one hour. A real problem will take a month (I think). Can I get a single result faster? Can I get more results in the same time? Last year, I heard about this grid thing. This year, I heard about this cloud thing. What do I do next?

What you want. What you get.

What goes wrong? Everything!
– Scaling up from 10 to 10,000 tasks violates ten different hard-coded limits in the kernel, the filesystem, the network, and the application.
– Failures are everywhere! Exposing error messages is confusing, but hiding errors causes unbounded delays.
– The user didn't know that the program relies on 1TB of configuration files, all scattered around the home filesystem.
– The user discovers that the program only runs correctly on Blue Sock Linux!
– The user discovers that the program generates different results when run on different machines.

Example: Biometrics Research. Goal: design a robust face comparison function F. (The slide shows F scoring one pair of face images 0.05 and another pair 0.97.)

Similarity Matrix Construction. Challenge workload: 60,000 images, 1 MB each; 0.02 s per F; 833 CPU-days; 600 TB of I/O.
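The 833 CPU-days figure follows directly from the size of the all-pairs workload:

    60,000 × 60,000 comparisons × 0.02 s = 7.2 × 10^7 s ≈ 833 CPU-days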

This is easy, right?

    for all a in list A
        for all b in list B
            qsub compare.exe a b >output

This is easy, right?
– Try 1: Each F is a batch job. Failure: dispatch latency >> F runtime.
– Try 2: Each row is a batch job. Failure: too many small operations on the filesystem.
– Try 3: Bundle all files into one package. Failure: everyone loads 1GB at once.
– Try 4: The user gives up and attempts to solve an easier or smaller problem.

Distributed systems always have unexpected costs/limits that are not exposed in the programming model.

Strategy: Identify an abstraction that solves a specific category of problems very well. Plug your computational kernel into that abstraction.

All-Pairs Abstraction
AllPairs( set A, set B, function F ) returns matrix M where M[i][j] = F( A[i], B[j] ) for all i,j
Command line: allpairs A B F.exe
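To make the abstraction concrete, here is a minimal sequential sketch of its semantics in C. The function and array names are illustrative, and the real allpairs tool takes F as an external program rather than a function pointer.

    /* Sequential meaning of AllPairs: M[i][j] = F(A[i], B[j]) for all i, j.
       The distributed implementation computes the same matrix, but blocks
       it into sub-matrices that are dispatched to many workers. */
    void all_pairs(int n, int m, const char *A[], const char *B[],
                   double (*F)(const char *a, const char *b), double M[n][m])
    {
        for (int i = 0; i < n; i++)
            for (int j = 0; j < m; j++)
                M[i][j] = F(A[i], B[j]);
    }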

How Does the Abstraction Help? The custom workflow engine:
– Chooses the right data transfer strategy.
– Chooses the right number of resources.
– Chooses the blocking of functions into jobs.
– Recovers from a large number of failures.
– Predicts overall runtime accurately.
All of these tasks are nearly impossible for arbitrary workloads, but are tractable (not trivial) to solve for a specific abstraction.

Choose the Right # of CPUs

All-Pairs in Production. Our All-Pairs implementation has provided over 57 CPU-years of computation to the ND biometrics research group in the first year. Largest run so far: 58,396 irises from the Face Recognition Grand Challenge. The largest experiment ever run on publicly available data. Competing biometric research relies on samples of images, which can miss important population effects. Reduced computation time from 833 days to 10 days, making it feasible to repeat multiple times for a graduate thesis. (We can go faster yet.)

All-Pairs Abstraction
AllPairs( set A, set B, function F ) returns matrix M where M[i][j] = F( A[i], B[j] ) for all i,j
Command line: allpairs A B F.exe

Division of Concerns. The end user provides an ordinary program that contains the algorithmic kernel that they care about. (Scholarship) The abstraction provides the coordination, parallelism, and resource management. (Plumbing) Keep the scholarship and the plumbing separate wherever possible!

Strategy: Identify an abstraction that solves a specific category of problems very well. Plug your computational kernel into that abstraction.

Are there other abstractions?

The Wavefront Abstraction
Wavefront( matrix M, function F(x,y,d) ) returns matrix M such that M[i,j] = F( M[i-1,j], M[i,j-1], M[i-1,j-1] )
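Likewise, a minimal sequential sketch of the Wavefront semantics in C (names are illustrative; row 0 and column 0 are assumed to already hold the boundary values):

    /* Each cell depends on three previously computed neighbors, so all
       cells on the same anti-diagonal are independent and can be
       computed in parallel. */
    void wavefront(int n, double M[n][n],
                   double (*F)(double x, double y, double d))
    {
        for (int i = 1; i < n; i++)
            for (int j = 1; j < n; j++)
                M[i][j] = F(M[i-1][j], M[i][j-1], M[i-1][j-1]);
    }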

The Performance Problem
Dispatch latency really matters: a delay in one task holds up all of its children.
If we dispatch larger sub-problems: concurrency on each node increases, but distributed concurrency decreases.
If we dispatch smaller sub-problems: concurrency on each node decreases, and more time is spent waiting for jobs to be dispatched.
So, model the system to choose the block size. And, build a fast-dispatch execution system.
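To make "model the system" concrete, here is one illustrative back-of-the-envelope model (not necessarily the model used in the actual implementation). If dispatching any job costs a fixed latency D and computing one cell costs t, then for an s × s block:

    T_block ≈ D + s^2 · t        efficiency ≈ s^2 · t / (D + s^2 · t)

So s should be large enough that s^2 · t dominates D, while still leaving enough blocks along each anti-diagonal to keep all workers busy.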

Wavefront implementation: the wavefront master keeps a queue of ready tasks and a record of tasks done, and dispatches work to 100s of workers started via Condor/SGE/SSH. For each task, a worker receives the function and its input (put F.exe, put in.txt), executes it to produce out.txt (exec F.exe), and returns the result (get out.txt).

500x500 Wavefront on ~200 CPUs

Wavefront on a 200-CPU Cluster

Wavefront on a 32-Core CPU

What if you don't have a regular graph? Use a directed graph abstraction.

An Old Idea: Make

    part1 part2 part3: input.data split.py
        ./split.py input.data

    out1: part1 mysim.exe
        ./mysim.exe part1 >out1

    out2: part2 mysim.exe
        ./mysim.exe part2 >out2

    out3: part3 mysim.exe
        ./mysim.exe part3 >out3

    result: out1 out2 out3 join.py
        ./join.py out1 out2 out3 > result

Makeflow = Make + Workflow. Makeflow runs the same workflow on Local, Condor, Torque, or Work Queue back ends. Provides portability across batch systems. Enables parallelism (but not too much!). Fault tolerance at multiple scales. Data and resource management.
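As a usage sketch (the workflow file name is illustrative, and the flags should be checked against the current CCTools manual), the same workflow can be retargeted by changing only the batch-system option:

    makeflow sim.makeflow              # run on the local machine
    makeflow -T condor sim.makeflow    # dispatch jobs to Condor
    makeflow -T wq sim.makeflow        # dispatch jobs to Work Queue workers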

Makeflow Applications

Why Users Like Makeflow
– Use existing applications without change.
– Use an existing language everyone knows. (Some apps are already in Make.)
– Via Workers, harness all available resources: desktop to cluster to cloud.
– Transparent fault tolerance means you can harness unreliable resources.
– Transparent data movement means no shared filesystem is required.

What if you have a dynamic algorithm? Use a submit-wait abstraction.

Work Queue API

    #include "work_queue.h"

    while( not done ) {
        while (more work ready) {
            task = work_queue_task_create();
            // add some details to the task
            work_queue_submit(queue, task);
        }
        task = work_queue_wait(queue);
        // process the completed task
    }
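Filling in that pseudocode, here is a minimal self-contained sketch of a Work Queue master in C. The port, command, and file names are illustrative, and the calls are from the CCTools Work Queue C API; check work_queue.h for the exact signatures in your version.

    #include "work_queue.h"
    #include <stdio.h>

    int main(void)
    {
        /* Listen on an (illustrative) port; workers connect to it. */
        struct work_queue *q = work_queue_create(9123);
        if (!q) { fprintf(stderr, "couldn't create queue\n"); return 1; }

        /* Submit one task per input file. */
        for (int i = 0; i < 100; i++) {
            char cmd[1024], infile[64], outfile[64];
            snprintf(infile, sizeof(infile), "in.%d.txt", i);
            snprintf(outfile, sizeof(outfile), "out.%d.txt", i);
            snprintf(cmd, sizeof(cmd), "./P.exe < %s > %s", infile, outfile);

            struct work_queue_task *t = work_queue_task_create(cmd);
            work_queue_task_specify_file(t, "P.exe", "P.exe",
                                         WORK_QUEUE_INPUT, WORK_QUEUE_CACHE);
            work_queue_task_specify_file(t, infile, infile,
                                         WORK_QUEUE_INPUT, WORK_QUEUE_NOCACHE);
            work_queue_task_specify_file(t, outfile, outfile,
                                         WORK_QUEUE_OUTPUT, WORK_QUEUE_NOCACHE);
            work_queue_submit(q, t);
        }

        /* Wait for completed tasks and clean up after each one. */
        while (!work_queue_empty(q)) {
            struct work_queue_task *t = work_queue_wait(q, 60);
            if (t) {
                printf("task %d finished with status %d\n",
                       t->taskid, t->return_status);
                work_queue_task_delete(t);
            }
        }
        work_queue_delete(q);
        return 0;
    }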

Work Queue System: a Work Queue program written in C, Python, or Perl links against the Work Queue library and dispatches tasks to 1000s of workers on clusters, clouds, and grids. For each task, a worker receives the program and its input (put P.exe, put in.txt), executes it to produce out.txt (exec P.exe), and returns the output (get out.txt).

Adaptive Weighted Ensemble. Proteins fold into a number of distinctive states, each of which affects their function in the organism. How common is each state? How does the protein transition between states? How common are those transitions?

AWE Using Work Queue
Simplified algorithm:
1. Submit N short simulations in various states.
2. Wait for them to finish.
3. When done, record all state transitions.
4. If too many are in one state, redistribute them.
5. Stop if enough data has been collected.
6. Otherwise, continue back at step 2.

AWE on Clusters, Clouds, and Grids: the Work Queue application, together with its local files and programs, submits tasks through the Work Queue API to hundreds of workers forming a personal cloud. The workers are started with sge_submit_workers, condor_submit_workers, and ssh on a private cluster, a shared SGE cluster, the campus Condor pool, and a public cloud provider.
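As a usage sketch (the host name, port, and worker count are illustrative), the submit scripts named above take the master's host, its port, and the number of workers to start:

    condor_submit_workers master.example.edu 9123 100
    sge_submit_workers master.example.edu 9123 100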

AWE on Clusters, Clouds, and Grids

New Pathway Found! Credit: Joint work in progress with Badi Abdul-Wahid, Dinesh Rajan, Haoyun Feng, Jesus Izaguirre, and Eric Darve.

Cooperative Computing Tools: All-Pairs, Wavefront, Makeflow, and custom apps are all built on the Work Queue library, which drives hundreds of workers in a personal cloud spanning a private cluster, the campus Condor pool, a shared SGE cluster, and a public cloud provider.

Ruminations

I would like to posit that computing's central challenge, "how not to make a mess of it", has not yet been met. - Edsger Dijkstra

The Most Common Programming Model? Every program attempts to grow until it can read mail. - Jamie Zawinski

An Old Idea: The Unix Model (input → output)

Advantages of Little Processes
– Easy to distribute across machines.
– Easy to develop and test independently.
– Easy to checkpoint halfway.
– Easy to troubleshoot and continue.
– Easy to observe the dependencies between components.
– Easy to control resource assignments from an outside process.

Avoid writing new code! Instead, create coordinators that organize multiple existing programs. (Keeps the scholarly logic separate from the plumbing.)

Distributed Computing is a Social Activity: it involves the end user, the system designer, and the system operators.

In allocating resources, strive to avoid disaster, rather than obtain an optimum. - Butler Lampson

Strategy: Identify an abstraction that solves a specific category of problems very well. Plug your computational kernel into that abstraction.

Research is a Team Sport
Faculty Collaborators: Patrick Flynn (ND), Scott Emrich (ND), Jesus Izaguirre (ND), Eric Darve (Stanford), Vijay Pande (Stanford), Sekou Remy (Clemson)
Current Graduate Students: Michael Albrecht, Patrick Donnelly, Dinesh Rajan, Peter Sempolinski, Li Yu
Recent CCL PhDs: Peter Bui (UWEC), Hoang Bui (Rutgers), Chris Moretti (Princeton)
Summer REU Students: Chris Bauschka, Iheanyi Ekechuku, Joe Fetsch

Papers, Software, Manuals, …
This work was supported by NSF Grants CCF, CNS, and CNS.