Programming Distributed Systems with High Level Abstractions
Douglas Thain, University of Notre Dame
Cloud Computing and Applications (CCA-08), University of Chicago, 23 October 2008

An Assembly Language of Distributed Computing
Fundamental Operations:
–TransferFile( source, destination )
–ExecuteJob( host, exe, input, output )
–AllocateVM( cpu, mem, disk, opsys )
Semantics of Assembly are Subtle:
–When do instructions commit?
–Delay slots before control transfers?
–What exceptions are valid for each opcode?
–Precise or imprecise exceptions?
–What is the cost of each instruction?
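To make the analogy concrete, here is a minimal sketch (in Python, with hypothetical signatures and placeholder bodies, not a real API) of what these fundamental operations might look like:

    # Hypothetical sketch of the "assembly language" operations named above.
    # Signatures follow the slide; bodies are placeholders, not a real API.

    def transfer_file(source, destination):
        """Copy a file between two hosts; may fail partway and need a retry."""
        ...

    def execute_job(host, exe, inputs, outputs):
        """Run one executable on a host with the given input/output files.
        When the outputs actually commit is one of the subtle semantic questions."""
        ...

    def allocate_vm(cpu, mem_gb, disk_gb, opsys):
        """Provision a virtual machine with the requested resources."""
        ...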

Programming in Assembly Stinks
You know the problems:
–Stack management.
–Garbage collection.
–Type checking.
–Co-location of data and computation.
–Query optimization.
–Function shipping or data shipping?
–How many nodes should I harness?

Abstractions for Distributed Computing
Abstraction: a declarative specification of the computation and data of a workload.
–A restricted pattern, not meant to be a general-purpose programming language.
–Avoid the really terrible cases.
–Provide users with a bright path.
–Data structures instead of file systems.

All-Pairs Abstraction
AllPairs( set A, set B, function F ) returns matrix M, where M[i][j] = F( A[i], B[j] ) for all i, j.
[Figure: F applied to every pairing of A1…An with B1…Bn, filling in the matrix.]
Moretti, Bulosan, Flynn, Thain, AllPairs: An Abstraction… IPDPS 2008.
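The semantics are simple enough to state in a few lines of sequential code. The sketch below (ordinary Python, for illustration only; the real abstraction distributes the work across a cluster) is a reference implementation of that definition:

    # Reference semantics of All-Pairs: M[i][j] = F(A[i], B[j]).
    # Sequential and illustrative only; the real system runs F across many machines.
    def all_pairs(A, B, F):
        return [[F(a, b) for b in B] for a in A]

    # Toy usage with a stand-in comparison function:
    A = ["face1.jpg", "face2.jpg"]
    B = ["face3.jpg", "face4.jpg"]
    M = all_pairs(A, B, lambda a, b: 1.0 if a == b else 0.0)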

Example Application
Goal: design a robust face comparison function F.
[Figure: F applied to two example image pairs, producing scores of 0.05 and 0.97.]

Similarity Matrix Construction
Current workload: 4,000 images, 256 KB each, 10 s per F, for a total of 1,851 CPU-days.
Future workload: 60,000 images, 1 MB each, 1 s per F, for a total of 114 CPU-years.
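As a sanity check on these figures, an N-image workload requires N×N evaluations of F. A short back-of-the-envelope calculation (Python, just reproducing the slide's arithmetic) gives the same totals:

    # N images -> N*N evaluations of F.
    def cpu_time(n_images, seconds_per_f):
        total = n_images * n_images * seconds_per_f
        return total / 86400.0, total / (86400.0 * 365)   # (CPU-days, CPU-years)

    days, _ = cpu_time(4000, 10)    # ~1,852 CPU-days (the slide rounds to 1,851)
    _, years = cpu_time(60000, 1)   # ~114 CPU-years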

Non-Expert User Using 500 CPUs
–Try 1: Each F is a batch job. Failure: dispatch latency >> F runtime.
–Try 2: Each row is a batch job. Failure: too many small operations on the shared filesystem.
–Try 3: Bundle all files into one package. Failure: everyone loads 1 GB at once.
–Try 4: User gives up and attempts to solve an easier or smaller problem.

All-Pairs Abstraction
AllPairs( set A, set B, function F ) returns matrix M, where M[i][j] = F( A[i], B[j] ) for all i, j.

What is the right metric?
Speedup?
–Sequential runtime / parallel runtime.
Parallel efficiency?
–Speedup / N CPUs.
Neither works, because the number of CPUs varies over time and between runs.
Cost efficiency:
–Work completed / resources consumed.
–Person-miles / gallon.
–Results / CPU-hours.
–Results / $$$.
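Cost efficiency can be tallied directly from per-task accounting, with no need to know how many CPUs were attached at any moment. A minimal sketch (Python, with a made-up record format) of that bookkeeping:

    # Hypothetical per-task records: (results_produced, cpu_seconds_consumed).
    tasks = [(1, 12.0), (1, 9.5), (0, 30.0), (1, 11.2)]   # sample data, not real measurements

    results = sum(r for r, _ in tasks)
    cpu_hours = sum(s for _, s in tasks) / 3600.0
    print(f"cost efficiency = {results / cpu_hours:.1f} results per CPU-hour")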

All-Pairs Abstraction

Classify Abstraction
Classify( T, R, N, P, F )
–T = testing set, R = training set
–N = # of partitions, F = classifier
[Figure: partitions T1, T2, T3; F applied to each along with T and R; partial results V1, V2, V3 combined (CV).]
Moretti, Steinhaeuser, Thain, Chawla, Scaling up Classifiers to Cloud Computers, ICDM 2008.
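A rough sketch (Python) of how the pattern reads from the slide; treating P as a partitioning routine and concatenating the per-partition results are assumptions here, not details taken from the talk:

    # Rough, assumption-laden sketch of the Classify pattern.
    # P is assumed to split the work into N pieces; the combination step is assumed
    # to concatenate per-partition results in order.
    def classify(T, R, N, P, F):
        partitions = P(T, N)                          # split the testing set into N pieces
        votes = [F(part, R) for part in partitions]   # each piece could run as a separate job
        return [label for part in votes for label in part]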

BXGrid Abstractions
S = Select( color = "brown" )
B = Transform( S, F )
M = AllPairs( A, B, F )
[Figure: Select filters the repository by eye color, Transform applies F to each selected item, and All-Pairs compares the resulting sets to produce an ROC curve.]
Bui, Thomas, Kelly, Lyon, Flynn, Thain, BXGrid: A Repository and Experimental Abstraction… in review 2008.
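A toy sketch (ordinary Python; the repository records and comparison function are invented for illustration) of how the three abstractions chain together:

    # Made-up repository records for illustration only.
    repository = [
        {"id": 1, "color": "brown", "image": "s1.jpg"},
        {"id": 2, "color": "blue",  "image": "s2.jpg"},
        {"id": 3, "color": "brown", "image": "s3.jpg"},
    ]

    def select(records, **criteria):
        return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]

    def transform(records, F):
        return [F(r) for r in records]

    def all_pairs(A, B, F):
        return [[F(a, b) for b in B] for a in A]

    S = select(repository, color="brown")               # S = Select( color = "brown" )
    B = transform(S, lambda r: r["image"])               # B = Transform( S, F )
    M = all_pairs(B, B, lambda a, b: float(a == b))      # compares the set against itself here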

Implementing Abstractions
S = Select( color = "brown" )
B = Transform( S, F )
M = AllPairs( A, B, F )
[Figure: the steps map onto different resources: a relational database (DBMS, 2x), an active storage cluster (16x), and a Condor pool (500x).]

Compatibility of Abstractions?
[Figure: Map-Reduce, All-Pairs, and Classify each built on top of the assembly language.]

Compatibility of Abstractions?
Can one abstraction be implemented on top of another?
–Mismatch: Map-Reduce relies on data partitioning; All-Pairs relies on data re-use.
–Mismatch: Classify partitions logically; Map-Reduce partitions physically.

Compatibility of Abstractions?
[Figure: Swift and Dryad shown alongside Map-Reduce, All-Pairs, and Classify above the assembly language.]
More general, less optimized?

From Clouds to Multicore
Next step: an All-Pairs implementation that runs well on a single CPU, a multicore machine, a cloud, or a cloud of multicores.
[Figure: the abstraction stack (Map-Reduce, All-Pairs, Classify, Dryad, Swift over the assembly language) repeated over the target hardware, annotated with CPU, RAM, and cost ($$$).]
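The idea is one abstraction call whose dispatch strategy matches the resources at hand. The toy version below (Python, not the actual All-Pairs implementation; it only covers the sequential and local-multicore cases) illustrates the shape of such an interface:

    # Toy sketch: the same call runs sequentially or across local cores.
    # The real system would also dispatch to clusters and clouds.
    from concurrent.futures import ProcessPoolExecutor
    from itertools import product

    def all_pairs(A, B, F, multicore=False):
        pairs = list(product(A, B))
        if multicore:
            # F must be a module-level (picklable) function for a process pool.
            with ProcessPoolExecutor() as pool:
                flat = list(pool.map(F, [a for a, _ in pairs], [b for _, b in pairs]))
        else:
            flat = [F(a, b) for a, b in pairs]
        return [flat[i * len(B):(i + 1) * len(B)] for i in range(len(A))]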

Acknowledgments
Cooperative Computing Lab
Graduate Students:
–Chris Moretti
–Hoang Bui
–Michael Albrecht
–Li Yu
NSF Grants CCF, CNS
Undergraduate Students:
–Mike Kelly
–Rory Carmichael
–Mark Pasquier
–Christopher Lyon
–Jared Bulosan