
Presenters: Abhishek Verma, Nicolas Zea

 Map Reduce
  ▪ Clean abstraction
  ▪ Extremely rigid 2-stage group-by aggregation
  ▪ Code reuse and maintenance difficult
 Google → MapReduce, Sawzall
 Yahoo → Hadoop, Pig Latin
 Microsoft → Dryad, DryadLINQ
 Improving MapReduce in heterogeneous environments

[Figure: MapReduce data flow. Input records are split across map tasks; each map emits (key, value) pairs, which are locally sorted (quicksort), shuffled so that all values for a key reach the same reduce task, and reduced to produce the output records.]

 Extremely rigid data flow
  ▪ Other flows hacked in: stages, joins, splits
 Common operations must be coded by hand
  ▪ Join, filter, projection, aggregates, sorting, distinct
 Semantics hidden inside map and reduce functions
  ▪ Difficult to maintain, extend, and optimize
[Figure: jobs expressed as chains of M (map) and R (reduce) stages.]

Pig Latin: A Not-So-Foreign Language for Data Processing. Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, Andrew Tomkins (Yahoo! Research)

 Pigs Eat Anything
  ▪ Can operate on data without metadata: relational, nested, or unstructured.
 Pigs Live Anywhere
  ▪ Not tied to one particular parallel framework.
 Pigs Are Domestic Animals
  ▪ Designed to be easily controlled and modified by its users.
  ▪ UDFs: transformation functions, aggregates, grouping functions, and conditionals.
 Pigs Fly
  ▪ Processes data quickly (?)

 Dataflow language
  ▪ Procedural: different from SQL
 Quick Start and Interoperability
 Nested Data Model
 UDFs as First-Class Citizens
 Parallelism Required
 Debugging Environment

 Data Model
  ▪ Atom: 'cs'
  ▪ Tuple: ('cs', 'ece', 'ee')
  ▪ Bag: { ('cs', 'ece'), ('cs') }
  ▪ Map: [ 'courses' → {('523'), ('525'), ('599')} ]
 Expressions
  ▪ Fields by position: $0
  ▪ Fields by name: f1
  ▪ Map lookup: #
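A minimal Pig Latin sketch of these expressions (hypothetical relation and field names, modern Apache Pig syntax rather than the paper's exact notation):

-- hypothetical input: one record per student, with a map from course id to grade
students  = LOAD 'students' AS (name:chararray, dept:chararray, courses:map[]);
projected = FOREACH students GENERATE
              $0,              -- field by position (here, name)
              dept,            -- field by name
              courses#'523';   -- map lookup by key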

Find the top 10 most visited pages in each category.

Visits (User, URL, Time):
  Amy   cnn.com      8:00
  Amy   bbc.com      10:00
  Amy   flickr.com   10:05
  Fred  cnn.com      12:00

URL Info (URL, Category, PageRank):
  cnn.com      News     0.9
  bbc.com      News     0.8
  flickr.com   Photos   0.7
  espn.com     Sports   0.9

[Dataflow diagram:
  Load Visits → Group by url → Foreach url generate count →
  Join on url (with Load Url Info) → Group by category → Foreach category generate top10 urls]

visits = load '/data/visits' as (user, url, time);
gVisits = group visits by url;
visitCounts = foreach gVisits generate url, count(visits);

urlInfo = load '/data/urlInfo' as (url, category, pRank);
visitCounts = join visitCounts by url, urlInfo by url;

gCategories = group visitCounts by category;
topUrls = foreach gCategories generate top(visitCounts, 10);

store topUrls into '/data/topUrls';

The same script illustrates three properties:
 It operates directly over files.
 Schemas are optional and can be assigned dynamically.
 UDFs (such as top) can be used in every construct.

 LOAD: specifying input data
 FOREACH: per-tuple processing
 FLATTEN: eliminate nesting
 FILTER: discarding unwanted data
 COGROUP: getting related data together
 GROUP, JOIN
 STORE: asking for output
 Other: UNION, CROSS, ORDER, DISTINCT
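A small illustration of COGROUP, reusing the relations from the running example (the paper treats JOIN as a COGROUP followed by flattening the grouped bags):

-- COGROUP keeps the matching tuples nested: one output tuple per url, holding two bags
grouped = COGROUP visitCounts BY url, urlInfo BY url;
-- flattening both bags yields the cross product within each group, i.e. the JOIN result
joined  = FOREACH grouped GENERATE FLATTEN(visitCounts), FLATTEN(urlInfo);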

Every group or join operation forms a map-reduce boundary; other operations are pipelined into the map and reduce phases.
[Diagram: the query plan above, cut into three map-reduce jobs:
  Map 1 / Reduce 1: Load Visits → Group by url → Foreach url generate count
  Map 2 / Reduce 2: Load Url Info → Join on url
  Map 3 / Reduce 3: Group by category → Foreach category generate top10 urls]

 Write-run-debug cycle
 Sandbox dataset
 Objectives:
  ▪ Realism
  ▪ Conciseness
  ▪ Completeness
 Problems:
  ▪ UDFs
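In Apache Pig, a closely related facility is the ILLUSTRATE operator, which runs a script over a small, automatically generated example dataset; a hedged example, assuming the topUrls alias from the script above:

-- show a concise example dataset flowing through every step that feeds topUrls
ILLUSTRATE topUrls;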

 Optional “safe” query optimizer
  ▪ Performs only high-confidence rewrites
 User interface
  ▪ Boxes and arrows UI
  ▪ Promote collaboration, sharing code fragments and UDFs
 Tight integration with a scripting language
  ▪ Use loops, conditionals of host language

DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing. Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Úlfar Erlingsson, Pradeep Kumar Gunda, Jon Currey (Microsoft Research)

[Figure: Dryad system architecture. A job manager (JM) schedules the job using a name server (NS) and per-machine daemons (PD), placing vertices (V) on cluster machines (control plane); the data plane moves data between vertices over files, TCP, or in-memory FIFOs.]

Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key k);

var results = from c in collection
              where IsLegal(c.key)
              select new { hash = Hash(c.key), c.value };

[Figure: DryadLINQ data model. A collection of C# objects is split into partitions.]
 Partitioning: Hash, Range, RoundRobin
 Apply, Fork
 Hints

[Figure: the same query again, showing its compilation artifacts: the C# collection and query expression become a query plan (a Dryad job), the where/select clauses become C# vertex code that runs over the partitioned data, and the results come back as a C# collection.]

[Figure: DryadLINQ execution workflow. On the client machine, a C# query expression is handed to DryadLINQ, which produces a distributed query plan; the job manager (JM) runs the Dryad job in the data center over the input tables; output is written to output tables (DryadTable) and returned to the client as C# objects, e.g. via ToDryadTable or foreach.]

 LINQ expressions converted to an execution plan graph (EPG)
  ▪ similar to a database query plan
  ▪ a DAG annotated with metadata properties
 The EPG is the skeleton of the Dryad dataflow graph
  ▪ as long as native operations are used, properties can propagate, helping optimization

 Pipelining
  ▪ Multiple operations in a single process
 Removing redundancy
 Eager aggregation
  ▪ Move aggregations in front of partitionings
 I/O reduction
  ▪ Try to use TCP and in-memory FIFOs instead of going to disk

 As information from the job becomes available, mutate the execution graph
 Dataset-size-based decisions
  ▪ Intelligent partitioning of data

 Aggregation can be turned into a tree to improve I/O, based on locality
  ▪ Example: part of the computation is done locally, then aggregated before being sent across the network

 TeraSort - scalability
  ▪ 240-computer cluster of 2.6 GHz dual-core AMD Opterons
  ▪ Sort 10 billion 100-byte records on a 10-byte key
  ▪ Each computer stores 3.87 GB

 DryadLINQ vs Dryad - SkyServer
  ▪ Dryad is hand optimized
  ▪ No dynamic optimization overhead
  ▪ DryadLINQ is 10% native code

 High level and data type transparent
 Automatic optimization friendly
 Manual optimizations using Apply operator
 Leverage any system running LINQ framework
 Support for interacting with SQL databases
 Single computer debugging made easy
 Strong typing, narrow interface
 Deterministic replay execution

 Dynamic optimizations appear data-intensive: what kind of overhead?
 EPG analysis overhead → high latency
 No real comparison with other systems
 Progress tracking is difficult; no speculation
 Will solid-state drives diminish the advantages of MapReduce?
 Why not use parallel databases?
 MapReduce vs. Dryad
 How different from Sawzall and Pig?

Language             | Sawzall                  | Pig Latin                         | DryadLINQ
Built by             | Google                   | Yahoo                             | Microsoft
Programming          | Imperative               | Imperative & declarative hybrid   | Imperative & declarative hybrid
Resemblance to SQL   | Least                    | Moderate                          | Most
Execution engine     | Google MapReduce         | Hadoop                            | Dryad
Performance*         | Very efficient           | 5-10 times slower                 | 1.3-2 times slower
Implementation       | Internal, inside Google  | Open source, Apache license       | Internal, inside Microsoft
Model                | Operate per record       | Sequence of MR jobs               | DAGs
Usage                | Log analysis             | + Machine learning                | + Iterative computations

Improving MapReduce Performance in Heterogeneous Environments. Matei Zaharia, Andy Konwinski, Anthony Joseph, Randy Katz, Ion Stoica (University of California at Berkeley)

 Speculative tasks are executed only if no failed or waiting tasks are available
 Notion of progress
  ▪ 3 phases of execution: 1. copy phase, 2. sort phase, 3. reduce phase
  ▪ Each phase weighted by % of data processed
  ▪ Determines whether a task has failed or is a straggler and is available for speculation

1. Nodes can perform work at exactly the same rate
2. Tasks progress at a constant rate throughout time
3. There is no cost to launching a speculative task on an idle node
4. The three phases of execution take approximately the same time
5. Tasks with a low progress score are stragglers
6. Maps and Reduces require roughly the same amount of work

 Virtualization breaks down homogeneity
 Amazon EC2: multiple VMs on the same physical host
  ▪ Compete for memory/network bandwidth
  ▪ Ex: two map tasks can compete for disk bandwidth, causing one to be a straggler

 The progress threshold in Hadoop is fixed and assumes low progress = faulty node
 Too many speculative tasks executed
 Speculative execution can harm running tasks

 A task's phases are not equal
  ▪ The copy phase is typically the most expensive due to network communication cost
  ▪ Causes a rapid jump from 1/3 progress to 1 for many tasks, creating fake stragglers
  ▪ Real stragglers get usurped
  ▪ Unnecessary copying due to fake stragglers
 Because of how the progress score is used, anything with >80% progress is never speculatively executed
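A worked version of the scoring, assuming the three-phase reduce accounting described above (each phase contributing one third):

ProgressScore(reduce) = (completed phases + fraction of the current phase) / 3

A reduce task that has just finished copying therefore scores 1/3; because sort and reduce are usually fast, its score then races from 1/3 to 1, while a task still copying sits below 1/3 and looks slow by comparison.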

 Longest Approximate Time to End (LATE)
 Primary assumption: the best task to speculatively execute is the one expected to finish furthest into the future
 Secondary assumption: tasks make progress at an approximately constant rate
 ProgressRate = ProgressScore / T, where T = time the task has been running
 Estimated time to completion = (1 - ProgressScore) / ProgressRate
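Plugging in numbers (a made-up example, not from the slides): a task with ProgressScore = 0.6 after T = 120 s has ProgressRate = 0.6 / 120 = 0.005 per second, so its estimated time to completion is (1 - 0.6) / 0.005 = 80 s. LATE speculates on the task with the largest such estimate, subject to the caps and thresholds on the next slide.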

 Launch speculative tasks on fast nodes
  ▪ Best chance to overcome a straggler vs. using the first available node
 Cap on the total number of speculative tasks
 'Slowness' minimum threshold
 Does not take data locality into account

Sort  EC2 test cluster  Ghz Opteron/Xeon w/1.7 GB mem

Sort  Manually slowed down 8 VM’s with background processes

[Figures: Grep and WordCount benchmark results.]

1. Make decisions early
2. Use finishing times
3. Nodes are not equal
4. Resources are precious

 Is focusing the work on small VMs fair?
 Would it be better to pay for a large VM and implement a system with more customized control?
 Could this be used in other systems?
 Progress tracking is key
 Is this a fundamental contribution, or just an optimization?
 “Good” research?