
1 Presenters: Abhishek Verma, Nicolas Zea

2 • MapReduce • Clean abstraction • Extremely rigid 2-stage group-by aggregation • Code reuse and maintenance difficult • Google → MapReduce, Sawzall • Yahoo → Hadoop, Pig Latin • Microsoft → Dryad, DryadLINQ • Improving MapReduce in heterogeneous environments

3 [MapReduce dataflow figure: input records are split across map tasks; each map emits (key, value) pairs, which are sorted locally (local QSort); the shuffle then brings all pairs with the same key to one reduce task, and the reduce tasks emit the output records.]
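
To make the two-stage group-by aggregation concrete, here is a minimal single-process sketch in Python (word count is an illustrative choice; a real framework distributes the map and reduce calls and performs the shuffle over the network):

from collections import defaultdict

def map_fn(record):
    # Emit (key, value) pairs; word count is just an illustrative choice.
    for word in record.split():
        yield word, 1

def reduce_fn(key, values):
    # Aggregate all values that were grouped under the same key.
    return key, sum(values)

def mapreduce(input_records):
    # Map phase: every input record produces zero or more (k, v) pairs.
    intermediate = [pair for rec in input_records for pair in map_fn(rec)]
    # Shuffle phase: group all values by key (in a real system this involves
    # a local sort on each mapper and a network shuffle to the reducers).
    groups = defaultdict(list)
    for k, v in intermediate:
        groups[k].append(v)
    # Reduce phase: one call per key.
    return [reduce_fn(k, vs) for k, vs in groups.items()]

print(mapreduce(["the cat", "the dog"]))  # [('the', 2), ('cat', 1), ('dog', 1)]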

4 • Extremely rigid data flow • Other flows hacked in: stages, joins, splits • Common operations must be coded by hand • Join, filter, projection, aggregates, sorting, distinct • Semantics hidden inside map-reduce functions • Difficult to maintain, extend, and optimize

5 Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, Andrew Tomkins (Yahoo! Research)

6 • Pigs Eat Anything • Can operate on data without metadata: relational, nested, or unstructured. • Pigs Live Anywhere • Not tied to one particular parallel framework. • Pigs Are Domestic Animals • Designed to be easily controlled and modified by its users. • UDFs: transformation functions, aggregates, grouping functions, and conditionals. • Pigs Fly • Processes data quickly(?)

7 • Dataflow language • Procedural: different from SQL • Quick Start and Interoperability • Nested Data Model • UDFs as First-Class Citizens • Parallelism Required • Debugging Environment

8 • Data Model • Atom: 'cs' • Tuple: ('cs', 'ece', 'ee') • Bag: { ('cs', 'ece'), ('cs') } • Map: [ 'courses' → { ('523', '525', '599') } ] • Expressions • Fields by position: $0 • Fields by name: f1 • Map lookup: #
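
As a rough analogy (plain Python literals, not Pig syntax), the nested data model and the expressions above could be pictured like this:

# Plain-Python stand-ins for the Pig data model (illustrative only).
atom = 'cs'                                   # Atom: a simple value
tup = ('cs', 'ece', 'ee')                     # Tuple: ordered fields
bag = [('cs', 'ece'), ('cs',)]                # Bag: collection of tuples, duplicates allowed
mp = {'courses': [('523', '525', '599')]}     # Map: key -> value (here a bag of tuples)

print(tup[0])         # field by position, like $0
print(mp['courses'])  # map lookup, like #'courses'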

9 Find the top 10 most visited pages in each category.

URL Info (URL, Category, PageRank):
  cnn.com      News     0.9
  bbc.com      News     0.8
  flickr.com   Photos   0.7
  espn.com     Sports   0.9

Visits (User, URL, Time):
  Amy    cnn.com      8:00
  Amy    bbc.com      10:00
  Amy    flickr.com   10:05
  Fred   cnn.com      12:00

10 [Dataflow diagram: Load Visits → Group by url → Foreach url generate count; Load Url Info; Join on url (counts with Url Info) → Group by category → Foreach category generate top10 urls]

11 visits = load '/data/visits' as (user, url, time);
gVisits = group visits by url;
visitCounts = foreach gVisits generate url, count(visits);
urlInfo = load '/data/urlInfo' as (url, category, pRank);
visitCounts = join visitCounts by url, urlInfo by url;
gCategories = group visitCounts by category;
topUrls = foreach gCategories generate top(visitCounts, 10);
store topUrls into '/data/topUrls';

12 (Same script as slide 11.) Operates directly over files.

13 (Same script as slide 11.) Schemas are optional and can be assigned dynamically.

14 (Same script as slide 11.) UDFs can be used in every construct.

15 • LOAD: specifying input data • FOREACH: per-tuple processing • FLATTEN: eliminate nesting • FILTER: discarding unwanted data • COGROUP: getting related data together • GROUP, JOIN • STORE: asking for output • Other: UNION, CROSS, ORDER, DISTINCT

16

17 Every group or join operation forms a map-reduce boundary; other operations are pipelined into the map and reduce phases. [The plan from slide 10, split at its group and join operations into three map-reduce jobs: Map 1 / Reduce 1, Map 2 / Reduce 2, Map 3 / Reduce 3.]
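
A tiny sketch of that compilation rule (plain Python; the operator names and list representation are assumptions used only for illustration):

def count_mapreduce_jobs(plan_ops):
    # Each group/join/cogroup in the plan forces one shuffle, i.e. one
    # map-reduce job; all other operators are pipelined into those jobs.
    return sum(1 for op in plan_ops if op in ('group', 'join', 'cogroup'))

plan = ['load', 'group', 'foreach', 'load', 'join', 'group', 'foreach', 'store']
print(count_mapreduce_jobs(plan))  # 3, matching the three jobs on the slide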

18 • Write-run-debug cycle • Sandbox dataset • Objectives: realism, conciseness, completeness • Problems: UDFs

19 • Optional "safe" query optimizer • Performs only high-confidence rewrites • User interface • Boxes-and-arrows UI • Promote collaboration, sharing of code fragments and UDFs • Tight integration with a scripting language • Use loops and conditionals of the host language

20 Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Ulfar Erlingsson, Pradeep Kumar Gunda, Jon Currey

21 [Dryad system architecture figure: on the control plane, the job manager (JM) uses a name server (NS) and per-machine daemons (PD) to schedule the job onto the cluster; vertices (V) run on the cluster machines, and the data plane moves data between them over files, TCP, or FIFO channels across the network.]

22 Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key k);
var results = from c in collection
              where IsLegal(c.key)
              select new { hash = Hash(c.key), c.value };

23 [Figure: a DryadLINQ collection is a set of partitions, each holding C# objects.] • Partitioning: Hash, Range, RoundRobin • Apply, Fork • Hints
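
A rough sketch of what the three partitioning schemes do (plain Python; the function names and signatures are illustrative, not DryadLINQ's API):

def hash_partition(items, n, key=lambda x: x):
    # Assign each item to a partition by hashing its key.
    parts = [[] for _ in range(n)]
    for item in items:
        parts[hash(key(item)) % n].append(item)
    return parts

def range_partition(items, boundaries, key=lambda x: x):
    # boundaries is a sorted list of n-1 split points defining n ranges.
    parts = [[] for _ in range(len(boundaries) + 1)]
    for item in items:
        idx = sum(1 for b in boundaries if key(item) >= b)
        parts[idx].append(item)
    return parts

def round_robin_partition(items, n):
    # Deal items out to partitions in turn, ignoring their values.
    parts = [[] for _ in range(n)]
    for i, item in enumerate(items):
        parts[i % n].append(item)
    return parts

print(round_robin_partition([1, 2, 3, 4, 5], 2))  # [[1, 3, 5], [2, 4]]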

24 (Same query as slide 22, annotated.) [Figure labels: the C# collection and results are data; the query compiles into C# vertex code within a query plan, executed as a Dryad job.]

25 [DryadLINQ execution overview figure: on the client machine, a C# query expression is handed to DryadLINQ, which compiles it into a distributed query plan and invokes the job manager (JM); in the data center, Dryad executes the plan over the input tables and writes output tables; results return to the client as C# objects through a DryadTable, consumable with foreach.]

26 • LINQ expressions are converted to an execution plan graph (EPG) • Similar to a database query plan • A DAG annotated with metadata properties • The EPG is the skeleton of the Dryad dataflow graph • As long as native operations are used, properties can propagate, helping optimization
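
A minimal sketch of the idea of an annotated plan node with property propagation (purely illustrative; the class, field names, and propagation rule are assumptions, not DryadLINQ internals):

from dataclasses import dataclass, field

@dataclass
class EpgNode:
    # One vertex of a (hypothetical) execution plan graph.
    op: str                                        # e.g. 'scan', 'where', 'select', 'groupby'
    inputs: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)   # e.g. {'partitioned_by': 'key'}

def propagate(node):
    # Simplification: operations that do not disturb partitioning
    # inherit the partitioning property from their input.
    if node.inputs and node.op in ('where', 'select'):
        node.metadata.setdefault('partitioned_by',
                                 node.inputs[0].metadata.get('partitioned_by'))
    return node

src = EpgNode('scan', metadata={'partitioned_by': 'key'})
filt = propagate(EpgNode('where', inputs=[src]))
print(filt.metadata)  # {'partitioned_by': 'key'}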

27 • Pipelining • Multiple operations in a single process • Removing redundancy • Eager aggregation • Move aggregations in front of partitionings • I/O reduction • Try to use TCP and in-memory FIFOs instead of disk

28 • As information from the job becomes available, mutate the execution graph • Dataset-size-based decisions ▪ Intelligent partitioning of data

29 • Aggregation can be turned into a tree to improve I/O, based on locality • Example: part of the computation is done locally, then aggregated before being sent across the network
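
A minimal sketch of that idea in Python (the helper names are illustrative): each machine collapses its own records into partial sums, and only those small partial results are merged across the network, level by level if a deeper tree is wanted.

def local_aggregate(records):
    # Runs on each machine: collapse its local records into one partial sum per key.
    partial = {}
    for key, value in records:
        partial[key] = partial.get(key, 0) + value
    return partial

def merge(partials):
    # Runs higher up the aggregation tree: combine partial sums
    # (this step can itself be applied level by level to form a tree).
    total = {}
    for partial in partials:
        for key, value in partial.items():
            total[key] = total.get(key, 0) + value
    return total

machine_a = [("news", 1), ("sports", 1), ("news", 1)]
machine_b = [("news", 1), ("photos", 1)]
print(merge([local_aggregate(machine_a), local_aggregate(machine_b)]))
# {'news': 3, 'sports': 1, 'photos': 1}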

30 • TeraSort - scalability • 240-computer cluster of 2.6 GHz dual-core AMD Opterons • Sort 10 billion 100-byte records on a 10-byte key • Each computer stores 3.87 GB

31 • DryadLINQ vs. Dryad - SkyServer • Dryad version is hand-optimized • No dynamic optimization overhead • DryadLINQ is 10% native code

32 • High level and data-type transparent • Automatic optimization friendly • Manual optimizations using the Apply operator • Leverages any system running the LINQ framework • Support for interacting with SQL databases • Single-computer debugging made easy • Strong typing, narrow interface • Deterministic replay execution

33 • Dynamic optimizations appear data intensive • What kind of overhead? • EPG analysis overhead → high latency • No real comparison with other systems • Progress tracking is difficult • No speculation • Will solid-state drives diminish the advantages of MapReduce? • Why not use parallel databases? • MapReduce vs. Dryad • How different from Sawzall and Pig?

34 Comparison of the three languages (columns: Sawzall | Pig Latin | DryadLINQ):
  Built by:            Google | Yahoo | Microsoft
  Programming:         Imperative | Imperative & Declarative Hybrid | Imperative & Declarative Hybrid
  Resemblance to SQL:  Least | Moderate | Most
  Execution engine:    Google MapReduce | Hadoop | Dryad
  Performance*:        Very efficient | 5-10 times slower | 1.3-2 times slower
  Implementation:      Internal, inside Google | Open source (Apache license) | Internal, inside Microsoft
  Model:               Operate per record | Sequence of MR | DAGs
  Usage:               Log analysis | + Machine learning | + Iterative computations

35 Matei Zaharia, Andy Konwinski, Anthony Joseph, Randy Katz, Ion Stoica (University of California at Berkeley)

36 • Speculative tasks are executed only if there are no failed or waiting tasks available • Notion of progress • Three phases of execution: 1. Copy phase 2. Sort phase 3. Reduce phase • Each phase is weighted by the % of data processed • Determines whether a job failed, or is a straggler and available for speculation
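
As a rough sketch of this scoring for a reduce task (the equal 1/3 weighting follows the slide's description; the 20% gap and the function names below are illustrative assumptions, not Hadoop's exact code):

def reduce_progress(copy_frac, sort_frac, reduce_frac):
    # Each of the three phases contributes up to 1/3 of the score;
    # within a phase the contribution is the fraction of its data processed.
    return (copy_frac + sort_frac + reduce_frac) / 3.0

def is_speculation_candidate(task_progress, avg_progress, threshold=0.2):
    # Hadoop-style heuristic: a task becomes a candidate when it trails
    # the average progress of its category by more than a fixed threshold.
    return task_progress < avg_progress - threshold

# A reducer that finished copy and sort but is halfway through reducing:
print(reduce_progress(1.0, 1.0, 0.5))  # ~0.83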

37 Assumptions made by Hadoop's scheduler:
1. Nodes can perform work at exactly the same rate
2. Tasks progress at a constant rate throughout time
3. There is no cost to launching a speculative task on an idle node
4. The three phases of execution take approximately the same time
5. Tasks with a low progress score are stragglers
6. Maps and reduces require roughly the same amount of work

38 • Virtualization breaks down homogeneity • Amazon EC2: multiple VMs on the same physical host • Compete for memory and network bandwidth • Example: two map tasks can compete for disk bandwidth, causing one to become a straggler

39 • The progress threshold in Hadoop is fixed and assumes low progress = faulty node • Too many speculative tasks executed • Speculative execution can harm running tasks

40 • A task's phases are not equal • The copy phase is typically the most expensive due to network communication cost • Causes a rapid jump from 1/3 progress to 1 for many tasks, creating fake stragglers • Real stragglers get usurped • Unnecessary copying due to fake stragglers • The progress-score threshold means anything with >80% progress is never speculatively executed

41 • Longest Approximate Time to End • Primary assumption: the best task to speculatively execute is the one expected to finish furthest into the future • Secondary assumption: tasks make progress at an approximately constant rate • ProgressRate = ProgressScore / T, where T is the time the task has run for • Estimated time left = (1 - ProgressScore) / ProgressRate
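
In code, the estimate reads roughly as follows (a small sketch; the names mirror the quantities on the slide):

def progress_rate(progress_score, elapsed):
    # ProgressRate = ProgressScore / T, where T is how long the task has run.
    return progress_score / elapsed

def estimated_time_left(progress_score, elapsed):
    # Time left = (1 - ProgressScore) / ProgressRate,
    # i.e. assume the task keeps progressing at its observed rate.
    return (1.0 - progress_score) / progress_rate(progress_score, elapsed)

# A task 25% done after 60 s is estimated to need another 180 s.
print(estimated_time_left(0.25, 60))  # 180.0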

42 • Launch speculative tasks on fast nodes • Best chance to overcome the straggler, versus using the first available node • Cap on the total number of speculative tasks • 'Slowness' minimum threshold • Does not take data locality into account • (A sketch combining these heuristics appears below.)
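
A hedged sketch combining these heuristics (the cap, slowness threshold, and fast-node check follow the description above; the data structure and parameter names are assumptions for illustration):

def pick_speculative_task(tasks, num_speculating, speculative_cap,
                          slow_task_threshold, node_is_fast):
    # tasks: list of dicts with 'progress_score', 'elapsed', and
    # 'progress_rate_percentile' (the task's progress-rate percentile among
    # running tasks; a low percentile means a slow task).
    if not node_is_fast:
        return None  # only launch speculative copies on fast nodes
    if num_speculating >= speculative_cap:
        return None  # respect the cap on total speculative tasks
    # Only tasks slow enough to clear the 'slowness' threshold are candidates.
    candidates = [t for t in tasks
                  if t['progress_rate_percentile'] <= slow_task_threshold]
    if not candidates:
        return None
    # Re-execute the candidate estimated to finish furthest in the future.
    return max(candidates,
               key=lambda t: (1 - t['progress_score']) * t['elapsed']
                             / max(t['progress_score'], 1e-9))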

43 Sort • EC2 test cluster • 1.0-1.2 GHz Opteron/Xeon with 1.7 GB memory

44 Sort • Manually slowed down 8 VMs with background processes

45 Grep WordCount

46

47

48 1. Make decisions early 2. Use finishing times 3. Nodes are not equal 4. Resources are precious

49 • Is focusing the work on small VMs fair? • Would it be better to pay for a large VM and implement the system with more customized control? • Could this be used in other systems? • Progress tracking is key • Is this a fundamental contribution, or just an optimization? • "Good" research?

