Distributed computing using Dryad
Michael Isard, Microsoft Research Silicon Valley

Distributed Data-Parallel Computing
Cloud
– Transparent scaling
– Resource virtualization
Commodity clusters
– Fault tolerance with good performance
Workloads beyond standard SQL, HPC
– Data mining, graph analysis, …
– Semi-structured/unstructured data

Execution layer
This talk: system-level middleware
– Yuan Yu will describe the DryadLINQ programming model on Saturday
Algorithm -> execution plan "by magic"

Problem domain
Large inputs
– Tens of GB is a "small test dataset"
– A single job can be up to hundreds of TB
– Semi-structured data is common
Not latency sensitive
– Overhead of seconds for a trivial job
– A large job could take days
– Batch computation, not online queries
Simplifies fault tolerance, caching, etc.

Talk overview
– Some typical computations
– DAG implementation choices
– The Dryad execution engine
– Comparison with MapReduce
– Discussion

Map
Independent transformation of a dataset
– for each x in S, output x’ = f(x)
E.g. simple grep for word w
– output line x only if x contains w

Map
[diagram: a single map vertex f transforming dataset S into S’]

Map
[diagram: f applied independently to partitions S1, S2, S3, producing S1’, S2’, S3’]
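The following is a minimal Python sketch of the map pattern above, with in-memory lists standing in for a partitioned dataset; the helper names (grep_line, apply_map) are illustrative, not Dryad APIs.

```python
# Minimal sketch of the map pattern above: apply f independently to every
# record of every partition. Names are illustrative.

def grep_line(line, word):
    """f(x): keep the line only if it contains the word."""
    return line if word in line else None

def apply_map(partitions, f):
    """For each partition S_i, produce S_i' by applying f to every record."""
    return [[y for y in (f(x) for x in part) if y is not None]
            for part in partitions]

if __name__ == "__main__":
    S = [["the cat", "a dog"], ["cat nap", "bird"], ["dog day"]]  # S1, S2, S3
    S_prime = apply_map(S, lambda x: grep_line(x, "cat"))
    print(S_prime)  # [['the cat'], ['cat nap'], []]
```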

Reduce
Grouping plus aggregation
– 1) Group x in S according to key selector k(x)
– 2) For each group g, output r(g)
E.g. simple word count
– group by k(x) = x
– for each group g, output the key (word) and the count of g

Reduce
[diagram: S grouped (G) and reduced (r) to produce S’]

Reduce
[diagram: each input partition Si is distributed (D); the pieces are grouped (G) and reduced (r) to produce the output partitions Si’]
D is distribute, e.g. by hash or range

Reduce
[diagram: each partition Si is grouped (G), partially reduced (ir), distributed (D), then grouped (G) and reduced (r) to produce Si’]
ir is initial reduce, e.g. compute a partial sum
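Below is a hedged Python sketch of this distributed reduce for word count, showing the ir, D, G, and r roles; the function names and the two-partition output are illustrative assumptions, not Dryad code.

```python
# Sketch of the distributed reduce above for word count: initial reduce (ir)
# inside each input partition, distribute (D) by hash of the key, then the
# final group (G) and reduce (r) in each output partition.
from collections import Counter, defaultdict

def initial_reduce(partition):
    """ir: partial word counts computed locally within one partition."""
    return Counter(partition)

def distribute(partials, n_out):
    """D: route each (word, partial_count) pair to an output partition by hash."""
    buckets = [[] for _ in range(n_out)]
    for partial in partials:
        for word, count in partial.items():
            buckets[hash(word) % n_out].append((word, count))
    return buckets

def final_reduce(bucket):
    """G + r: group by word and sum the partial counts."""
    totals = defaultdict(int)
    for word, count in bucket:
        totals[word] += count
    return dict(totals)

if __name__ == "__main__":
    S = [["cat", "dog", "cat"], ["dog", "bird"], ["cat"]]   # input partitions
    partials = [initial_reduce(p) for p in S]                # one ir per partition
    outputs = [final_reduce(b) for b in distribute(partials, 2)]
    print(outputs)  # counts split across 2 output partitions (hash-dependent)
```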

K-means
Set of points P, initial set of cluster centres C
Iterate until convergence:
– For each c in C: initialize count_c and centre_c to 0
– For each p in P: find the c in C that minimizes dist(p,c); update count_c += 1, centre_c += p
– For each c in C: replace c <- centre_c / count_c

K-means
[diagram: k-means iterations; each iteration combines the current centres with the point set P to produce the next centre set (C0 -> C1 -> C2 -> C3)]

K-means
[diagram: partitioned k-means; each iteration processes point partitions P1, P2, P3 and combines the partial results to produce the next centre set (C0 -> C1 -> C2 -> C3)]
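A rough single-process Python sketch of one partitioned k-means iteration, following the pseudocode and diagram above: each partition computes partial (count_c, centre_c) sums, which are then combined. All names are illustrative.

```python
# One k-means iteration with the point set split into partitions, so each
# partition can accumulate partial sums independently before combining.

def dist2(p, c):
    """Squared Euclidean distance between a point and a centre."""
    return sum((pi - ci) ** 2 for pi, ci in zip(p, c))

def accumulate(points, centres):
    """Per-partition pass: partial count_c and centre_c for each centre c."""
    counts = [0] * len(centres)
    sums = [[0.0] * len(centres[0]) for _ in centres]
    for p in points:
        j = min(range(len(centres)), key=lambda i: dist2(p, centres[i]))
        counts[j] += 1
        sums[j] = [s + x for s, x in zip(sums[j], p)]
    return counts, sums

def combine(partials, centres):
    """Combine per-partition partials and replace c <- centre_c / count_c."""
    counts = [0] * len(centres)
    sums = [[0.0] * len(centres[0]) for _ in centres]
    for pc, ps in partials:
        for j in range(len(centres)):
            counts[j] += pc[j]
            sums[j] = [s + x for s, x in zip(sums[j], ps[j])]
    return [[x / counts[j] for x in sums[j]] if counts[j] else list(centres[j])
            for j in range(len(centres))]

if __name__ == "__main__":
    P = [[(0.0, 0.0), (1.0, 0.0)], [(10.0, 10.0)], [(11.0, 9.0)]]  # partitions
    C = [(0.0, 0.0), (10.0, 10.0)]                                  # initial centres
    C = combine([accumulate(part, C) for part in P], C)
    print(C)  # [[0.5, 0.0], [10.5, 9.5]]
```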

Graph algorithms
Set N of nodes with data (n,x)
Set E of directed edges (n,m)
Iterate until convergence:
– For each node (n,x) in N: for each outgoing edge n->m in E, compute n_m = f(x,n,m)
– For each node (m,x) in N: find the set of incoming updates i_m = {n_m : n->m in E}; replace (m,x) <- (m, r(i_m))
E.g. power iteration (PageRank)

PageRank
[diagram: PageRank iterations; each iteration combines the current rank set with the edge set E to produce the next (N0 -> N1 -> N2 -> N3)]

PageRank
[diagram: partitioned PageRank; each iteration joins the partitioned ranks with edge partitions E1, E2, E3, distributes (D) the updates, and combines them into the next partitioned rank set]
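Here is a small Python sketch of one power-iteration (PageRank) step in the pattern above, run in a single process with the edge partitioning and distribution elided; the damping constant and function names are assumptions for illustration.

```python
# One power-iteration step: each node sends an update along its outgoing
# edges (n_m = f(x, n, m)), then each node replaces its value with a
# reduction of its incoming updates ((m, x) <- (m, r(i_m))).
from collections import defaultdict

DAMPING = 0.85  # illustrative choice

def step(ranks, edges):
    out_degree = defaultdict(int)
    for n, _ in edges:
        out_degree[n] += 1
    incoming = defaultdict(list)                          # i_m = {n_m : n -> m}
    for n, m in edges:
        incoming[m].append(ranks[n] / out_degree[n])      # n_m = f(x, n, m)
    base = (1.0 - DAMPING) / len(ranks)
    return {m: base + DAMPING * sum(incoming[m]) for m in ranks}   # r(i_m)

if __name__ == "__main__":
    E = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")]
    N = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}
    for _ in range(20):
        N = step(N, E)
    print(N)
```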

DAG abstraction
Absence of cycles
– Allows re-execution for fault tolerance
– Simplifies scheduling: no deadlock
Cycles can often be replaced by unrolling
– Unsuitable for fine-grained inner loops
Very popular
– Databases, functional languages, …

Rewrite the graph at runtime
Loop unrolling with convergence tests
Adapt the partitioning scheme at run time
– Choose #partitions based on runtime data volume
– Broadcast join vs. hash join, etc.
Adaptive aggregation and distribution trees
– Based on data skew and network topology
Load balancing
– Data/processing skew (cf. work-stealing)

Push vs Pull
Databases typically ‘pull’ using the iterator model
– Avoids buffering
– Can prevent unnecessary computation
But the DAG must be fully materialized
– Complicates rewriting
– Prevents resource virtualization in a shared cluster
[diagram: a two-partition distributed-reduce plan with D, G, and r vertices]
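A toy Python sketch of the contrast: the pull pipeline chains lazy iterators with no buffering, while the push pipeline materializes every edge so it could be re-read (e.g. after a failure). Purely illustrative, not how either a database or Dryad is implemented.

```python
# 'Pull': the consumer drives lazy iterators one record at a time.
# 'Push': each vertex runs to completion and buffers its output edge.

def source():
    for x in range(10):
        yield x

def pull_pipeline():
    doubled = (x * 2 for x in source())          # no buffering anywhere
    evens = (x for x in doubled if x % 4 == 0)
    return sum(evens)

def push_pipeline():
    edge1 = [x * 2 for x in source()]            # materialized; could be
    edge2 = [x for x in edge1 if x % 4 == 0]     # re-read after a failure
    return sum(edge2)

if __name__ == "__main__":
    print(pull_pipeline(), push_pipeline())      # 40 40
```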

Fault tolerance
– Buffer data in (some) edges
– Re-execute on failure using the buffered data
– Speculatively re-execute for stragglers
– The ‘push’ model makes this very simple

Dryad
General-purpose execution engine
– Batch processing on immutable datasets
– Well-tested on large clusters
Automatically handles
– Fault tolerance
– Distribution of code and intermediate data
– Scheduling of work to resources

Dryad System Architecture
[diagram: Dryad system architecture, with a scheduler dispatching vertices to cluster machines]

Dryad Job Model
Directed acyclic graph (DAG)
Clean abstraction
– Hides cluster services
– Clients manipulate graphs
Flexible and expressive
– General-purpose programs
– Complicated execution plans

Dryad Inputs and Outputs
Partitioned data set
– Records do not cross partition boundaries
– Data on compute machines: NTFS, SQLServer, …
Optional semantics
– Hash-partition, range-partition, sorted, etc.
Loading external data
– Partitioning is "automatic"
– File system chooses sensible partition sizes
– Or known partitioning from the user
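As an illustration of partitioned data sets where records never cross partition boundaries, here is a small Python sketch of hash-partitioning by a key selector; the partition count and key are arbitrary choices, not Dryad defaults.

```python
# Hash-partition a record set: each record lands in exactly one partition,
# chosen by the hash of its key, so records never cross partition boundaries.

def hash_partition(records, key, n_partitions):
    partitions = [[] for _ in range(n_partitions)]
    for rec in records:
        partitions[hash(key(rec)) % n_partitions].append(rec)
    return partitions

if __name__ == "__main__":
    records = [("alice", 3), ("bob", 5), ("alice", 7), ("carol", 1)]
    parts = hash_partition(records, key=lambda r: r[0], n_partitions=2)
    # All records with the same key land in the same partition.
    print(parts)
```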

Channel abstraction
[diagram: the distributed-reduce plan from earlier, with channels connecting the G, ir, D, and r vertices across partitions]

Push vs Pull
Channel types define connected components
– Shared-memory or TCP channels must be gang-scheduled
Pull within a gang, push between gangs

MapReduce (Hadoop)
MapReduce restricts
– The topology of the DAG
– The semantics of the function in the compute vertex
Non-trivial tasks need a sequence of MapReduce instances
[diagram: the fixed pipeline f -> G -> ir -> D -> G -> r applied to input partitions S1, S2, S3 to produce S1’, S2’, S3’]
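The sketch below expresses that fixed pipeline in Python: every stage is map (f), local group plus initial reduce (G, ir), distribute (D), then group plus reduce (G, r), so anything more complex must be chained as a sequence of such stages. Names and the word-count example are illustrative.

```python
# A single MapReduce stage as the fixed pipeline f -> G, ir -> D -> G, r.
from collections import defaultdict

def map_reduce(partitions, f, ir, r, n_out):
    shuffled = [defaultdict(list) for _ in range(n_out)]
    for part in partitions:
        mapped = [kv for x in part for kv in f(x)]           # f: map
        local = defaultdict(list)                            # G: local group
        for k, v in mapped:
            local[k].append(v)
        for k, vs in local.items():
            shuffled[hash(k) % n_out][k].append(ir(vs))      # ir + D
    return [{k: r(vs) for k, vs in bucket.items()}           # G + r
            for bucket in shuffled]

if __name__ == "__main__":
    docs = [["cat dog cat"], ["dog bird"], ["cat"]]
    out = map_reduce(
        docs,
        f=lambda line: [(w, 1) for w in line.split()],       # mapper
        ir=sum,                                               # combiner
        r=sum,                                                # reducer
        n_out=2,
    )
    print(out)  # word counts, split across 2 output partitions by hash
```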

MapReduce complexity
The MapReduce system is simple to describe
It can be hard to map an algorithm to the framework
– cf. k-means: combine C+P, broadcast C, iterate, …
– HIVE, PigLatin, etc. mitigate the programming issues
Implementation is not uniform
– Different fault tolerance for mappers and reducers
– More special cases added for performance
– Hadoop introducing TCP channels, pipelines, …
– Dryad has the same state machine everywhere

Discussion
The DAG abstraction supports many computations
– Can be targeted by high-level languages!
– Run-time rewriting extends its applicability
DAG-structured jobs scale to large clusters
– Over 10k computers in large Dryad clusters
– Transient failures are common, disk failures daily
Trade off fault tolerance against performance
– Buffer vs. TCP is still a manual choice in the Dryad system
– Also external vs. in-memory working set

Conclusion
Dryad is well-tested and scalable
– In daily use supporting Bing for over 3 years
Applicable to a large number of computations
– 250-computer cluster at MSR SVC, Mar->Nov
– Distinct users: ~50 lab members + interns
– 15k jobs (tens of millions of processes executed)
– Hundreds of distinct programs
– Network trace analysis, privacy-preserving inference, light-transport simulation, decision-tree training, deep belief network training, image feature extraction, …