
1 Presenters: Abhishek Verma, Nicolas Zea

2 • MapReduce • Clean abstraction • Extremely rigid 2-stage group-by aggregation • Code reuse and maintenance difficult • Google → MapReduce, Sawzall • Yahoo → Hadoop, Pig Latin • Microsoft → Dryad, DryadLINQ • Improving MapReduce in heterogeneous environments

3 [MapReduce dataflow figure: input records are split across map tasks; each map emits (key, value) pairs, which are sorted locally (local QSort); the shuffle then brings all pairs with the same key to one reduce task, and the reduce tasks emit the output records.]
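
To make the two-stage group-by aggregation concrete, here is a minimal single-process sketch in Python (word count is an illustrative choice; a real framework distributes the map and reduce calls and performs the shuffle over the network):

from collections import defaultdict

def map_fn(record):
    # Emit (key, value) pairs; word count is just an illustrative choice.
    for word in record.split():
        yield word, 1

def reduce_fn(key, values):
    # Aggregate all values that were grouped under the same key.
    return key, sum(values)

def mapreduce(input_records):
    # Map phase: every input record produces zero or more (k, v) pairs.
    intermediate = [pair for rec in input_records for pair in map_fn(rec)]
    # Shuffle phase: group all values by key (in a real system this involves
    # a local sort on each mapper and a network shuffle to the reducers).
    groups = defaultdict(list)
    for k, v in intermediate:
        groups[k].append(v)
    # Reduce phase: one call per key.
    return [reduce_fn(k, vs) for k, vs in groups.items()]

print(mapreduce(["the cat", "the dog"]))  # [('the', 2), ('cat', 1), ('dog', 1)]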

4 • Extremely rigid data flow • Other flows hacked in: stages, joins, splits • Common operations must be coded by hand • Join, filter, projection, aggregates, sorting, distinct • Semantics hidden inside map-reduce functions • Difficult to maintain, extend, and optimize

5 Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, Andrew Tomkins (Yahoo! Research)

6 • Pigs Eat Anything • Can operate on data without metadata: relational, nested, or unstructured. • Pigs Live Anywhere • Not tied to one particular parallel framework. • Pigs Are Domestic Animals • Designed to be easily controlled and modified by its users. • UDFs: transformation functions, aggregates, grouping functions, and conditionals. • Pigs Fly • Processes data quickly(?)

7 • Dataflow language • Procedural: different from SQL • Quick Start and Interoperability • Nested Data Model • UDFs as First-Class Citizens • Parallelism Required • Debugging Environment

8 • Data Model • Atom: 'cs' • Tuple: ('cs', 'ece', 'ee') • Bag: { ('cs', 'ece'), ('cs') } • Map: [ 'courses' → { ('523', '525', '599') } ] • Expressions • Fields by position: $0 • Fields by name: f1 • Map lookup: #
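
As a rough analogy (plain Python literals, not Pig syntax), the nested data model and the expressions above could be pictured like this:

# Plain-Python stand-ins for the Pig data model (illustrative only).
atom = 'cs'                                   # Atom: a simple value
tup = ('cs', 'ece', 'ee')                     # Tuple: ordered fields
bag = [('cs', 'ece'), ('cs',)]                # Bag: collection of tuples, duplicates allowed
mp = {'courses': [('523', '525', '599')]}     # Map: key -> value (here a bag of tuples)

print(tup[0])         # field by position, like $0
print(mp['courses'])  # map lookup, like #'courses'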

9 Find the top 10 most visited pages in each category.

URL Info (URL, Category, PageRank):
  cnn.com      News     0.9
  bbc.com      News     0.8
  flickr.com   Photos   0.7
  espn.com     Sports   0.9

Visits (User, URL, Time):
  Amy    cnn.com      8:00
  Amy    bbc.com      10:00
  Amy    flickr.com   10:05
  Fred   cnn.com      12:00

10 [Dataflow diagram: Load Visits → Group by url → Foreach url generate count; Load Url Info; Join on url (counts with Url Info) → Group by category → Foreach category generate top10 urls]

11 visits = load '/data/visits' as (user, url, time);
gVisits = group visits by url;
visitCounts = foreach gVisits generate url, count(visits);
urlInfo = load '/data/urlInfo' as (url, category, pRank);
visitCounts = join visitCounts by url, urlInfo by url;
gCategories = group visitCounts by category;
topUrls = foreach gCategories generate top(visitCounts, 10);
store topUrls into '/data/topUrls';

12 (Same script as slide 11.) Operates directly over files.

13 (Same script as slide 11.) Schemas are optional and can be assigned dynamically.

14 (Same script as slide 11.) UDFs can be used in every construct.

15 • LOAD: specifying input data • FOREACH: per-tuple processing • FLATTEN: eliminate nesting • FILTER: discarding unwanted data • COGROUP: getting related data together • GROUP, JOIN • STORE: asking for output • Other: UNION, CROSS, ORDER, DISTINCT

16

17 Every group or join operation forms a map-reduce boundary; other operations are pipelined into the map and reduce phases. [The plan from slide 10, split at its group and join operations into three map-reduce jobs: Map 1 / Reduce 1, Map 2 / Reduce 2, Map 3 / Reduce 3.]
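
A tiny sketch of that compilation rule (plain Python; the operator names and list representation are assumptions used only for illustration):

def count_mapreduce_jobs(plan_ops):
    # Each group/join/cogroup in the plan forces one shuffle, i.e. one
    # map-reduce job; all other operators are pipelined into those jobs.
    return sum(1 for op in plan_ops if op in ('group', 'join', 'cogroup'))

plan = ['load', 'group', 'foreach', 'load', 'join', 'group', 'foreach', 'store']
print(count_mapreduce_jobs(plan))  # 3, matching the three jobs on the slide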

18 • Write-run-debug cycle • Sandbox dataset • Objectives: realism, conciseness, completeness • Problems: UDFs

19 • Optional "safe" query optimizer • Performs only high-confidence rewrites • User interface • Boxes-and-arrows UI • Promote collaboration, sharing of code fragments and UDFs • Tight integration with a scripting language • Use loops and conditionals of the host language

20 Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Ulfar Erlingsson, Pradeep Kumar Gunda, Jon Currey

21 [Dryad system architecture figure: on the control plane, the job manager (JM) uses a name server (NS) and per-machine daemons (PD) to schedule the job onto the cluster; vertices (V) run on the cluster machines, and the data plane moves data between them over files, TCP, or FIFO channels across the network.]

22 Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key k);
var results = from c in collection
              where IsLegal(c.key)
              select new { hash = Hash(c.key), c.value };

23 [Figure: a DryadLINQ collection is a set of partitions, each holding C# objects.] • Partitioning: Hash, Range, RoundRobin • Apply, Fork • Hints
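
A rough sketch of what the three partitioning schemes do (plain Python; the function names and signatures are illustrative, not DryadLINQ's API):

def hash_partition(items, n, key=lambda x: x):
    # Assign each item to a partition by hashing its key.
    parts = [[] for _ in range(n)]
    for item in items:
        parts[hash(key(item)) % n].append(item)
    return parts

def range_partition(items, boundaries, key=lambda x: x):
    # boundaries is a sorted list of n-1 split points defining n ranges.
    parts = [[] for _ in range(len(boundaries) + 1)]
    for item in items:
        idx = sum(1 for b in boundaries if key(item) >= b)
        parts[idx].append(item)
    return parts

def round_robin_partition(items, n):
    # Deal items out to partitions in turn, ignoring their values.
    parts = [[] for _ in range(n)]
    for i, item in enumerate(items):
        parts[i % n].append(item)
    return parts

print(round_robin_partition([1, 2, 3, 4, 5], 2))  # [[1, 3, 5], [2, 4]]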

24 (Same query as slide 22, annotated.) [Figure labels: the C# collection and results are data; the query compiles into C# vertex code within a query plan, executed as a Dryad job.]

25 [DryadLINQ execution overview figure: on the client machine, a C# query expression is handed to DryadLINQ, which compiles it into a distributed query plan and invokes the job manager (JM); in the data center, Dryad executes the plan over the input tables and writes output tables; results return to the client as C# objects through a DryadTable, consumable with foreach.]

26 • LINQ expressions are converted to an execution plan graph (EPG) • Similar to a database query plan • A DAG annotated with metadata properties • The EPG is the skeleton of the Dryad dataflow graph • As long as native operations are used, properties can propagate, helping optimization
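
A minimal sketch of the idea of an annotated plan node with property propagation (purely illustrative; the class, field names, and propagation rule are assumptions, not DryadLINQ internals):

from dataclasses import dataclass, field

@dataclass
class EpgNode:
    # One vertex of a (hypothetical) execution plan graph.
    op: str                                        # e.g. 'scan', 'where', 'select', 'groupby'
    inputs: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)   # e.g. {'partitioned_by': 'key'}

def propagate(node):
    # Simplification: operations that do not disturb partitioning
    # inherit the partitioning property from their input.
    if node.inputs and node.op in ('where', 'select'):
        node.metadata.setdefault('partitioned_by',
                                 node.inputs[0].metadata.get('partitioned_by'))
    return node

src = EpgNode('scan', metadata={'partitioned_by': 'key'})
filt = propagate(EpgNode('where', inputs=[src]))
print(filt.metadata)  # {'partitioned_by': 'key'}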

27 • Pipelining • Multiple operations in a single process • Removing redundancy • Eager aggregation • Move aggregations in front of partitionings • I/O reduction • Try to use TCP and in-memory FIFOs instead of disk

28 • As information from the job becomes available, mutate the execution graph • Dataset-size-based decisions ▪ Intelligent partitioning of data

29 • Aggregation can be turned into a tree to improve I/O, based on locality • Example: part of the computation is done locally, then aggregated before being sent across the network
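
A minimal sketch of that idea in Python (the helper names are illustrative): each machine collapses its own records into partial sums, and only those small partial results are merged across the network, level by level if a deeper tree is wanted.

def local_aggregate(records):
    # Runs on each machine: collapse its local records into one partial sum per key.
    partial = {}
    for key, value in records:
        partial[key] = partial.get(key, 0) + value
    return partial

def merge(partials):
    # Runs higher up the aggregation tree: combine partial sums
    # (this step can itself be applied level by level to form a tree).
    total = {}
    for partial in partials:
        for key, value in partial.items():
            total[key] = total.get(key, 0) + value
    return total

machine_a = [("news", 1), ("sports", 1), ("news", 1)]
machine_b = [("news", 1), ("photos", 1)]
print(merge([local_aggregate(machine_a), local_aggregate(machine_b)]))
# {'news': 3, 'sports': 1, 'photos': 1}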

30 • TeraSort - scalability • 240-computer cluster of 2.6 GHz dual-core AMD Opterons • Sort 10 billion 100-byte records on a 10-byte key • Each computer stores 3.87 GB

31 • DryadLINQ vs. Dryad - SkyServer • Dryad version is hand-optimized • No dynamic optimization overhead • DryadLINQ is 10% native code

32 • High level and data-type transparent • Automatic optimization friendly • Manual optimizations using the Apply operator • Leverages any system running the LINQ framework • Support for interacting with SQL databases • Single-computer debugging made easy • Strong typing, narrow interface • Deterministic replay execution

33 • Dynamic optimizations appear data intensive • What kind of overhead? • EPG analysis overhead → high latency • No real comparison with other systems • Progress tracking is difficult • No speculation • Will solid-state drives diminish the advantages of MapReduce? • Why not use parallel databases? • MapReduce vs. Dryad • How different from Sawzall and Pig?

34 Comparison of the three languages (columns: Sawzall | Pig Latin | DryadLINQ):
  Built by:            Google | Yahoo | Microsoft
  Programming:         Imperative | Imperative & Declarative Hybrid | Imperative & Declarative Hybrid
  Resemblance to SQL:  Least | Moderate | Most
  Execution engine:    Google MapReduce | Hadoop | Dryad
  Performance*:        Very efficient | 5-10 times slower | 1.3-2 times slower
  Implementation:      Internal, inside Google | Open source (Apache license) | Internal, inside Microsoft
  Model:               Operate per record | Sequence of MR | DAGs
  Usage:               Log analysis | + Machine learning | + Iterative computations

35 Matei Zaharia, Andy Konwinski, Anthony Joseph, Randy Katz, Ion Stoica (University of California at Berkeley)

36 • Speculative tasks are executed only if there are no failed or waiting tasks available • Notion of progress • Three phases of execution: 1. Copy phase 2. Sort phase 3. Reduce phase • Each phase is weighted by the % of data processed • Determines whether a job failed, or is a straggler and available for speculation
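
As a rough sketch of this scoring for a reduce task (the equal 1/3 weighting follows the slide's description; the 20% gap and the function names below are illustrative assumptions, not Hadoop's exact code):

def reduce_progress(copy_frac, sort_frac, reduce_frac):
    # Each of the three phases contributes up to 1/3 of the score;
    # within a phase the contribution is the fraction of its data processed.
    return (copy_frac + sort_frac + reduce_frac) / 3.0

def is_speculation_candidate(task_progress, avg_progress, threshold=0.2):
    # Hadoop-style heuristic: a task becomes a candidate when it trails
    # the average progress of its category by more than a fixed threshold.
    return task_progress < avg_progress - threshold

# A reducer that finished copy and sort but is halfway through reducing:
print(reduce_progress(1.0, 1.0, 0.5))  # ~0.83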

37 Assumptions made by Hadoop's scheduler:
1. Nodes can perform work at exactly the same rate
2. Tasks progress at a constant rate throughout time
3. There is no cost to launching a speculative task on an idle node
4. The three phases of execution take approximately the same time
5. Tasks with a low progress score are stragglers
6. Maps and reduces require roughly the same amount of work

38 • Virtualization breaks down homogeneity • Amazon EC2: multiple VMs on the same physical host • Compete for memory and network bandwidth • Example: two map tasks can compete for disk bandwidth, causing one to become a straggler

39 • The progress threshold in Hadoop is fixed and assumes low progress = faulty node • Too many speculative tasks executed • Speculative execution can harm running tasks

40 • A task's phases are not equal • The copy phase is typically the most expensive due to network communication cost • Causes a rapid jump from 1/3 progress to 1 for many tasks, creating fake stragglers • Real stragglers get usurped • Unnecessary copying due to fake stragglers • The progress-score threshold means anything with >80% progress is never speculatively executed

41 • Longest Approximate Time to End • Primary assumption: the best task to speculatively execute is the one expected to finish furthest into the future • Secondary assumption: tasks make progress at an approximately constant rate • ProgressRate = ProgressScore / T, where T is the time the task has run for • Estimated time left = (1 - ProgressScore) / ProgressRate
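
In code, the estimate reads roughly as follows (a small sketch; the names mirror the quantities on the slide):

def progress_rate(progress_score, elapsed):
    # ProgressRate = ProgressScore / T, where T is how long the task has run.
    return progress_score / elapsed

def estimated_time_left(progress_score, elapsed):
    # Time left = (1 - ProgressScore) / ProgressRate,
    # i.e. assume the task keeps progressing at its observed rate.
    return (1.0 - progress_score) / progress_rate(progress_score, elapsed)

# A task 25% done after 60 s is estimated to need another 180 s.
print(estimated_time_left(0.25, 60))  # 180.0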

42 • Launch speculative tasks on fast nodes • Best chance to overcome the straggler, versus using the first available node • Cap on the total number of speculative tasks • 'Slowness' minimum threshold • Does not take data locality into account • (A sketch combining these heuristics appears below.)
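
A hedged sketch combining these heuristics (the cap, slowness threshold, and fast-node check follow the description above; the data structure and parameter names are assumptions for illustration):

def pick_speculative_task(tasks, num_speculating, speculative_cap,
                          slow_task_threshold, node_is_fast):
    # tasks: list of dicts with 'progress_score', 'elapsed', and
    # 'progress_rate_percentile' (the task's progress-rate percentile among
    # running tasks; a low percentile means a slow task).
    if not node_is_fast:
        return None  # only launch speculative copies on fast nodes
    if num_speculating >= speculative_cap:
        return None  # respect the cap on total speculative tasks
    # Only tasks slow enough to clear the 'slowness' threshold are candidates.
    candidates = [t for t in tasks
                  if t['progress_rate_percentile'] <= slow_task_threshold]
    if not candidates:
        return None
    # Re-execute the candidate estimated to finish furthest in the future.
    return max(candidates,
               key=lambda t: (1 - t['progress_score']) * t['elapsed']
                             / max(t['progress_score'], 1e-9))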

43 Sort • EC2 test cluster • 1.0-1.2 GHz Opteron/Xeon with 1.7 GB memory

44 Sort • Manually slowed down 8 VMs with background processes

45 Grep WordCount

46

47

48 1. Make decisions early 2. Use finishing times 3. Nodes are not equal 4. Resources are precious

49 • Is focusing the work on small VMs fair? • Would it be better to pay for a large VM and implement the system with more customized control? • Could this be used in other systems? • Progress tracking is key • Is this a fundamental contribution, or just an optimization? • "Good" research?

