Programming Distributed Systems with High Level Abstractions Douglas Thain University of Notre Dame 23 October 2008.

Presentation transcript:

Programming Distributed Systems with High Level Abstractions Douglas Thain University of Notre Dame 23 October 2008

Distributed Systems
Scale: 2 – 100s – 1000s – millions
Domains: Single or Multi
Users: 1 – 10 – 100 – 1000 –
Naming: Direct, Virtual
Scheduling: Timesharing / Space Sharing
Interface: Allocate CPU / Execute Job
Security: None / IP / PKI / KRB …
Storage: Embedded / External

Cloud Computing?
Scale: 2 – 100s – 1000s – 10000s
Domains: Single or Multi
Users: 1 – 10 – 100 – 1000 –
Naming: Direct, Virtual
Scheduling: Timesharing / Spacesharing
Interface: Allocate CPU / Execute Job
Security: None / IP / PKI / KRB …
Storage: Embedded / External

Grid Computing?
Scale: 2 – 100s – 1000s – 10000s
Domains: Single or Multi
Users: 1 – 10 – 100 – 1000 –
Naming: Direct, Virtual
Scheduling: Timesharing / Spacesharing
Interface: Allocate CPU / Execute Job
Security: None / IP / PKI / KRB …
Storage: Embedded / External

An Assembly Language of Distributed Computing
Fundamental Operations:
–TransferFile( source, destination )
–ExecuteJob( host, exe, input, output )
–AllocateVM( cpu, mem, disk, opsys )
Semantics of Assembly are Subtle:
–When do instructions commit?
–Delay slots before control transfers?
–What exceptions are valid for each opcode?
–Precise or imprecise exceptions?
–What is the cost of each instruction?
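To make the "assembly language" analogy concrete, here is a minimal Python sketch of how even one remote run has to be composed by hand from the fundamental operations. Everything in it is an assumption for illustration: `RecordingBackend`, `run_remote`, the `/tmp` paths, and the host and file names are hypothetical, and the backend only records the operations instead of contacting any real system.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RecordingBackend:
    """Hypothetical backend: records each operation instead of performing it."""
    log: List[Tuple[str, ...]] = field(default_factory=list)

    def transfer_file(self, source: str, destination: str) -> None:
        # Stands in for TransferFile( source, destination ).
        self.log.append(("TransferFile", source, destination))

    def execute_job(self, host: str, exe: str, input_path: str, output_path: str) -> None:
        # Stands in for ExecuteJob( host, exe, input, output ).
        self.log.append(("ExecuteJob", host, exe, input_path, output_path))


def run_remote(backend: RecordingBackend, host: str, exe: str,
               local_input: str, local_output: str) -> None:
    """Stage input in, run the job, stage output out (paths are assumed)."""
    backend.transfer_file(local_input, f"{host}:/tmp/input")
    backend.execute_job(host, exe, "/tmp/input", "/tmp/output")
    backend.transfer_file(f"{host}:/tmp/output", local_output)


backend = RecordingBackend()
run_remote(backend, "node1", "compare.exe", "a.jpg", "result.txt")
for op in backend.log:
    print(op)
```

Even this tiny sequence leaves the questions on the slide open: the sketch says nothing about what happens if the second transfer fails after the job has already run.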

Programming in Assembly Stinks
You know the problems:
–Stack management.
–Garbage collection.
–Type checking.
–Co-location of data and computation.
–Query optimizations.
–Function shipping or data shipping?
–How many nodes should I harness?

Abstractions for Distributed Computing
Abstraction: a declarative specification of the computation and data of a workload.
A restricted pattern, not meant to be a general purpose programming language.
Avoid the really terrible cases.
Provide users with a bright path.
Data structures instead of file systems.

All-Pairs Abstraction
AllPairs( set A, set B, function F ) returns matrix M where M[i][j] = F( A[i], B[j] ) for all i,j
[Figure: F applied to every pairing of A1 … An with B1 … Bn, as computed by AllPairs(A,B,F).]
Moretti, Bulosan, Flynn, Thain, AllPairs: An Abstraction… IPDPS 2008
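The definition above can be written as a short sequential reference in Python. This is only a sketch of the semantics, not the distributed implementation the talk describes; `toy_similarity` is a hypothetical stand-in for F, chosen so the example runs without image data.

```python
from typing import Callable, List, Sequence, TypeVar

A = TypeVar("A")
B = TypeVar("B")
R = TypeVar("R")


def all_pairs(set_a: Sequence[A], set_b: Sequence[B],
              f: Callable[[A, B], R]) -> List[List[R]]:
    """Return matrix M where M[i][j] = f(set_a[i], set_b[j]) for all i, j."""
    return [[f(a, b) for b in set_b] for a in set_a]


def toy_similarity(x: str, y: str) -> float:
    """Hypothetical F: fraction of positions at which two strings agree."""
    if not x and not y:
        return 1.0
    matches = sum(1 for cx, cy in zip(x, y) if cx == cy)
    return matches / max(len(x), len(y))


matrix = all_pairs(["abcd", "abxx"], ["abcd", "zzzz", "abcx"], toy_similarity)
for row in matrix:
    print(row)
```

The nested comprehension makes the cost visible: the number of calls to F is the product of the two set sizes, which is why the workload grows so quickly with the number of images.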

Example Application
Goal: Design robust face comparison function.
[Figure: F yields scores of 0.05 and 0.97 in the two examples shown.]

Similarity Matrix Construction
Current Workload: 4000 images, 256 KB each, 10s per F (five days)
Future Workload: images, 1MB each, 1s per F (three months)

Non-Expert User Using 500 CPUs
Try 1: Each F is a batch job. Failure: Dispatch latency >> F runtime.
Try 2: Each row is a batch job. Failure: Too many small ops on FS.
Try 3: Bundle all files into one package. Failure: Everyone loads 1GB at once.
Try 4: User gives up and attempts to solve an easier or smaller problem.
[Figures: HN dispatching F tasks to CPUs in each attempt.]

All-Pairs Abstraction
AllPairs( set A, set B, function F ) returns matrix M where M[i][j] = F( A[i], B[j] ) for all i,j
[Figure: F applied to every pairing of A1 … An with B1 … Bn, as computed by AllPairs(A,B,F).]

What is the right metric?
Speedup?
–Seq Runtime / Parallel Runtime
Parallel Efficiency?
–Speedup / N CPUs?
Neither works, because the number of CPUs varies over time and between runs.
Cost Efficiency
–Work Completed / Resources Consumed
–Person-Miles / Gallon
–Results / CPU-hours
–Results / $$$
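The three metrics can be compared side by side in a few lines of Python. This is a sketch under stated assumptions: the function names and the usage trace are hypothetical and the numbers are made up, chosen only to show that cost efficiency stays well defined when the CPU count changes during a run.

```python
from typing import List, Tuple


def speedup(seq_runtime: float, par_runtime: float) -> float:
    """Sequential runtime divided by parallel runtime."""
    return seq_runtime / par_runtime


def parallel_efficiency(seq_runtime: float, par_runtime: float, n_cpus: int) -> float:
    """Speedup divided by a single, fixed CPU count."""
    return speedup(seq_runtime, par_runtime) / n_cpus


def cpu_hours(usage: List[Tuple[int, float]]) -> float:
    """Total CPU-hours from (cpus_in_use, hours) intervals."""
    return sum(cpus * hours for cpus, hours in usage)


def cost_efficiency(results: int, usage: List[Tuple[int, float]]) -> float:
    """Results per CPU-hour; needs no fixed CPU count."""
    return results / cpu_hours(usage)


# Hypothetical run: the pool grows, then shrinks, while the job is running.
usage_trace = [(100, 2.0), (400, 1.0), (50, 4.0)]
total_cpu_hours = cpu_hours(usage_trace)
results_per_cpu_hour = cost_efficiency(16000, usage_trace)
print(total_cpu_hours, results_per_cpu_hour)
```

Parallel efficiency needs one value for `n_cpus`, and the trace has three, so there is no obvious number to plug in. Cost efficiency only needs the total resources consumed.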

All-Pairs Abstraction

Classify Abstraction
Classify( T, R, N, P, F )
T = testing set
R = training set
N = # of partitions
F = classifier
[Figure: P splits T into T1, T2, T3; F is applied to each partition with R, producing V1, V2, V3, which feed CV.]
Moretti, Steinhaeuser, Thain, Chawla, Scaling up Classifiers to Cloud Computers, ICDM 2008.

BXGrid Abstractions
S = Select( color=“brown” )
B = Transform( S,F )
M = AllPairs( A, B, F )
[Figure: records labeled by eye color (Lbrown, Lblue, Rbrown, R) are selected into S1, S2, S3, transformed by F, and compared with All-Pairs to produce a ROC curve.]
Bui, Thomas, Kelly, Lyon, Flynn, Thain BXGrid: A Repository and Experimental Abstraction… in review 2008.
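The three-step pipeline above can be sketched in memory with plain Python lists. This is an illustration of how the abstractions chain together, not how BXGrid implements them. The record fields, `to_template`, and `compare` are hypothetical, and because the slide does not say what A is, the sketch assumes A = B (every selected record compared against every other).

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def select(records: List[Record], **criteria: Any) -> List[Record]:
    """Keep the records whose fields equal every given criterion."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


def transform(items: List[Any], f: Callable[[Any], Any]) -> List[Any]:
    """Apply f to each item independently."""
    return [f(item) for item in items]


def all_pairs(a: List[Any], b: List[Any], f: Callable[[Any, Any], Any]) -> List[List[Any]]:
    """M[i][j] = f(a[i], b[j]) for all i, j."""
    return [[f(x, y) for y in b] for x in a]


# Hypothetical repository contents.
records = [
    {"id": "S1", "color": "brown", "pixels": [1, 2, 3]},
    {"id": "S2", "color": "blue", "pixels": [4, 5, 6]},
    {"id": "S3", "color": "brown", "pixels": [1, 2, 4]},
]


def to_template(record: Record) -> tuple:
    """Hypothetical transform step: reduce a record to a comparable template."""
    return tuple(record["pixels"])


def compare(t1: tuple, t2: tuple) -> float:
    """Hypothetical comparison: fraction of equal positions."""
    return sum(1 for p, q in zip(t1, t2) if p == q) / len(t1)


S = select(records, color="brown")
B = transform(S, to_template)
M = all_pairs(B, B, compare)  # assumes A = B
print(M)
```

Note that the slide uses F for both the transform and the comparison. The sketch uses two differently named functions because the two steps take different numbers of arguments.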

Implementing Abstractions
S = Select( color=“brown” )
B = Transform( S,F )
M = AllPairs( A, B, F )
[Diagram labels: DBMS, Relational Database (2x), Active Storage Cluster (16x), Condor Pool (500x), CPU.]

Compatibility of Abstractions?
[Diagram labels: Assembly Language, Map-Reduce, All-Pairs, Classify.]

Compatibility of Abstractions?
[Diagram labels: Assembly Language, Map-Reduce, All-Pairs, Classify, ???]
Mismatch: MR relies on data partition. AP relies on data re-use.
Mismatch: Classify partitions logically. MR partitions physically.

Compatibility of Abstractions?
[Diagram labels: Assembly Language, Map-Reduce, All-Pairs, Classify, Swift, Dryad.]
More General, Less Optimized?

From Clouds to Multicore
Next Step: AP Implementation that runs well on Single CPU, Multicore, Cloud, or Cloud of Multicores.
[Diagram labels, shown twice: Assembly Language, Map-Reduce, All-Pairs, Classify, Dryad, Swift, CPU; the second copy adds $$$ and RAM.]

Acknowledgments
Cooperative Computing Lab
Grad Students:
–Chris Moretti
–Hoang Bui
–Michael Albrecht
–Li Yu
Undergraduate Students:
–Mike Kelly
–Rory Carmichael
–Mark Pasquier
–Christopher Lyon
–Jared Bulosan
NSF Grants CCF , CNS