Incremental Recomputations in MapReduce


Incremental Recomputations in MapReduce
Thomas Jörg, University of Kaiserslautern

Motivation
[Diagram labels: Base data, MapReduce Program, Result data, Bigtable / HBase]

Motivation
[Diagram labels: Base data, View Definition, Materialized view]

Motivation
[Diagram labels: Base data, MapReduce Program, incremental MapReduce Program, Result data, Bigtable / HBase]

Agenda
- Related work
- Case study
- Incremental view maintenance
- Summary delta algorithm
- Conclusion and future work

Related Work
- Caching intermediate results: DryadInc, Incoop
- Incremental programming models: Google Percolator, Continuous bulk processing (CBP)

References:
- L. Popa et al.: DryadInc: Reusing work in large-scale computations. HotCloud 2009
- P. Bhatotia et al.: Incoop: MapReduce for Incremental Computations. SoCC 2011
- D. Peng and F. Dabek: Large-scale Incremental Processing Using Distributed Transactions and Notifications. OSDI 2010
- D. Logothetis et al.: Stateful Bulk Processing for Incremental Analytics. SoCC 2010

Challenges
- Programming model: SQL / relational algebra vs. MapReduce
- Efficient access paths: no secondary indexes in HBase
- Support for transactions: only single-row transactions in HBase

Case Study
- Word histograms
- Reverse web-link graphs
- Term-vectors per host
- Count of URL access frequency
- Inverted indexes

Reference:
- J. Dean and S. Ghemawat: MapReduce: Simplified Data Processing on Large Clusters. OSDI 2004

Computing Reverse Web-Link Graphs
[Figure: a set of HTML pages, each shown as <html> ... </html>]

Sample Web-Link Graph
a.htm: <html> <a href="b.htm"> ...</a> </html>
b.htm: <html> <a href="a.htm"> ...</a> <a href="b.htm"> </html>

Computing Reverse Web-Link Graphs (Map, Shuffle, Reduce)
Input a.htm: <html> <a href="b.htm"> ...</a> </html>
  Map output: (b.htm, a.htm), (b.htm, a.htm)
Input b.htm: <html> <a href="a.htm"> ...</a> <a href="b.htm"> </html>
  Map output: (a.htm, b.htm), (b.htm, b.htm)
Reduce output after shuffle: (b.htm, {a.htm, b.htm}), (a.htm, {b.htm})
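The map, shuffle, and reduce phases on this slide can be imitated in a few lines of single-process Python. This is only a sketch to make the data flow concrete, not the author's implementation: the regex-based link extraction, the function names, and the link texts inside the sample pages are assumptions, and a.htm is assumed to link to b.htm twice, as the two identical map outputs suggest.

```python
import re
from collections import defaultdict

# Assumption: a simple regex stands in for a real HTML parser.
HREF = re.compile(r'<a\s+href="([^"]+)"')

def map_page(url, html):
    """Map: emit one (link target, link source) pair per hyperlink."""
    return [(target, url) for target in HREF.findall(html)]

def shuffle(pairs):
    """Shuffle: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_sources(target, sources):
    """Reduce: collapse all sources of one target into a set."""
    return target, set(sources)

# Hypothetical page contents modelled on the slide's sample graph.
pages = {
    "a.htm": '<html> <a href="b.htm">x</a> <a href="b.htm">y</a> </html>',
    "b.htm": '<html> <a href="a.htm">x</a> <a href="b.htm">y</a> </html>',
}

pairs = [p for url, html in pages.items() for p in map_page(url, html)]
reverse_graph = dict(reduce_sources(t, s) for t, s in shuffle(pairs).items())
print(reverse_graph)
```

Because the reducer builds a set, the duplicate link from a.htm to b.htm disappears from the result.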

Summary Delta Algorithm

View definition:

    CREATE VIEW Parts AS
    SELECT partID, SUM(qty*price) AS revenue, COUNT(*) AS tplcnt
    FROM Orders
    GROUP BY partID

Summary delta:

    SELECT partID, SUM(revenue) AS revenue, SUM(tplcnt) AS tplcnt
    FROM (
      (SELECT partID, SUM(qty*price) AS revenue, COUNT(*) AS tplcnt
       FROM Orders_Insertions
       GROUP BY partID)
      UNION ALL
      (SELECT partID, -SUM(qty*price) AS revenue, -COUNT(*) AS tplcnt
       FROM Orders_Deletions
       GROUP BY partID)
    )
    GROUP BY partID

References:
- I. S. Mumick et al.: Maintenance of Data Cubes and Summary Tables in a Warehouse. SIGMOD Conference 1997
- W. Labio et al.: Performance Issues in Incremental Warehouse Maintenance. VLDB 2000
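To see what the summary delta query computes, the same logic can be written over in-memory rows. The following is a minimal sketch under stated assumptions: rows are (partID, qty, price) tuples, the view is a plain dict, and the function names and sample data are hypothetical. It also assumes that a group whose tuple count drops to zero is removed from the view, which is one common use of the tplcnt column.

```python
from collections import defaultdict

def summarize(rows, sign=1):
    """Aggregate (partID, qty, price) rows into {partID: [revenue, tplcnt]}."""
    summary = defaultdict(lambda: [0, 0])
    for part_id, qty, price in rows:
        summary[part_id][0] += sign * qty * price
        summary[part_id][1] += sign
    return summary

def summary_delta(insertions, deletions):
    """Combine positive aggregates of insertions with negated aggregates of deletions."""
    delta = summarize(insertions, +1)
    for part_id, (revenue, tplcnt) in summarize(deletions, -1).items():
        delta[part_id][0] += revenue
        delta[part_id][1] += tplcnt
    return delta

def apply_delta(view, delta):
    """Install the delta; drop groups whose tuple count reaches zero (assumption)."""
    for part_id, (revenue, tplcnt) in delta.items():
        row = view.setdefault(part_id, [0, 0])
        row[0] += revenue
        row[1] += tplcnt
        if row[1] == 0:
            del view[part_id]
    return view

# Hypothetical sample data.
orders = [(1, 2, 10), (1, 1, 10), (2, 5, 3)]
insertions = [(1, 3, 10)]
deletions = [(2, 5, 3)]

view = dict(summarize(orders))
view = apply_delta(view, summary_delta(insertions, deletions))
print(view)
```

The maintained view matches a full recomputation over the updated Orders rows, but only the inserted and deleted rows were scanned.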

Achieving Self-Maintainability (Map, Shuffle, Reduce)
Input a.htm: <html> <a href="b.htm"> ...</a> </html>
  Map output: (b.htm, [a.htm, 1]), (b.htm, [a.htm, 1])
Input b.htm: <html> <a href="a.htm"> ...</a> <a href="b.htm"> </html>
  Map output: (a.htm, [b.htm, 1]), (b.htm, [b.htm, 1])
Reduce output after shuffle: (b.htm, {[a.htm, 2], [b.htm, 1]}), (a.htm, {[b.htm, 1]})
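The change on this slide is that every link source now carries a count, so the view records how many links support each entry. A rough Python sketch of that variant follows; the regex, the function names, and the link texts in the sample pages are assumptions, and a.htm is again assumed to contain two links to b.htm.

```python
import re
from collections import Counter, defaultdict

# Assumption: a simple regex stands in for a real HTML parser.
HREF = re.compile(r'<a\s+href="([^"]+)"')

def map_page(url, html):
    """Map: emit (link target, [link source, 1]) for every hyperlink."""
    return [(target, (url, 1)) for target in HREF.findall(html)]

def reduce_counts(pairs):
    """Shuffle and reduce: sum the counts per (target, source)."""
    view = defaultdict(Counter)
    for target, (source, count) in pairs:
        view[target][source] += count
    return view

# Hypothetical page contents modelled on the slide's sample graph.
pages = {
    "a.htm": '<html> <a href="b.htm">x</a> <a href="b.htm">y</a> </html>',
    "b.htm": '<html> <a href="a.htm">x</a> <a href="b.htm">y</a> </html>',
}

pairs = [p for url, html in pages.items() for p in map_page(url, html)]
view = reduce_counts(pairs)
for target, sources in view.items():
    print(target, dict(sources))
```

With counts in place, removing one of the two links from a.htm to b.htm only lowers the count to 1; the entry is dropped only when the count reaches zero, without re-reading any base page.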

Sample Web-Link Graph
a.htm (new version): <html> <a href="b.htm"> <a href="a.htm"> </html>
a.htm (old version): <html> <a href="b.htm"> ...</a> </html>
b.htm: <html> <a href="a.htm"> ...</a> <a href="b.htm"> </html>

Summary Delta Algorithm in MapReduce (Map, Shuffle, Reduce)
Input a.htm (deleted): <html> <a href="b.htm"> ...</a> </html>
  Map output: (b.htm, [a.htm, -1]), (b.htm, [a.htm, -1])
Input a.htm (inserted): <html> <a href="b.htm"> ...</a> <a href="a.htm"> </html>
  Map output: (b.htm, [a.htm, +1]), (a.htm, [a.htm, +1])
Reduce output after shuffle: (b.htm, {[a.htm, -1]}), (a.htm, {[a.htm, +1]})
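The delta computation on this slide can be traced with a small Python sketch: the old version of a changed page is mapped with sign -1, the new version with sign +1, and the reducer sums the signs. The regex, the function names, and the link texts are assumptions; the old a.htm is assumed to link to b.htm twice, matching the two -1 outputs.

```python
import re
from collections import defaultdict

# Assumption: a simple regex stands in for a real HTML parser.
HREF = re.compile(r'<a\s+href="([^"]+)"')

def map_page(url, html, sign):
    """Map: emit (link target, [link source, sign]); sign is -1 for deleted
    page versions and +1 for inserted ones."""
    return [(target, (url, sign)) for target in HREF.findall(html)]

def reduce_delta(pairs):
    """Shuffle and reduce: sum signs per (target, source), drop zero sums."""
    sums = defaultdict(int)
    for target, (source, sign) in pairs:
        sums[(target, source)] += sign
    delta = defaultdict(dict)
    for (target, source), count in sums.items():
        if count != 0:
            delta[target][source] = count
    return dict(delta)

# Hypothetical old and new contents of the changed page a.htm.
old_a = '<html> <a href="b.htm">x</a> <a href="b.htm">y</a> </html>'
new_a = '<html> <a href="b.htm">x</a> <a href="a.htm">y</a> </html>'

pairs = map_page("a.htm", old_a, -1) + map_page("a.htm", new_a, +1)
delta = reduce_delta(pairs)
print(delta)
```

Only the changed page is processed; b.htm is never read, and the resulting delta is what has to be installed into the materialized view.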

Delta Installation Approaches
[Diagram, Increment Installation; labels: Base deltas, MapReduce, Materialized view]
[Diagram, Overwrite Installation; labels: Materialized view, Base deltas, MapReduce, Materialized view]
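The slide gives only the diagram labels, so the following Python sketch is one plausible reading of the two approaches rather than a description of the actual system. It assumes that increment installation adds each delta to the stored view cell in place, while overwrite installation feeds the old view rows and the deltas through one aggregation and writes the result as the new view. A plain dict stands in for the HBase table, and all names and sample values are hypothetical.

```python
from collections import defaultdict

def increment_installation(view, delta):
    """Assumed reading: add each delta to the stored value in place."""
    for key, change in delta.items():
        view[key] = view.get(key, 0) + change
    return view

def overwrite_installation(view, delta):
    """Assumed reading: aggregate old view rows together with the deltas
    and write the outcome as the new view."""
    merged = defaultdict(int)
    for key, value in list(view.items()) + list(delta.items()):
        merged[key] += value
    return dict(merged)

# Hypothetical word-histogram view and delta.
view = {"map": 3, "reduce": 2}
delta = {"map": -1, "delta": 2}

incremented = increment_installation(dict(view), delta)
overwritten = overwrite_installation(view, delta)
print(incremented, overwritten)
```

Under these assumptions both approaches produce the same view; they differ in how much of the existing view has to be read and rewritten.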

Case Study – Lessons Learned
- Numerical aggregation: word histogram, URL access frequency
- Set aggregation: reverse web-link graph, inverted index
- Multiset aggregation: term-vector per host

General Solution
- Self-maintainable aggregates
- Computed in three steps: translation, grouping, aggregation
- Aggregation requires a commutative and associative binary function with inverse elements (Abelian group)
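The three steps can be sketched generically in Python: a translation function turns each record into key/value pairs, values are grouped by key, and an Abelian group combines them, with the inverse used for deletions. The class, the function names, and the sample pages are assumptions made for illustration. The example instantiates the group as the integers under addition, since deletions need negative values.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class AbelianGroup:
    combine: Callable[[Any, Any], Any]  # commutative, associative
    identity: Any
    inverse: Callable[[Any], Any]

def aggregate(records, translate, group):
    """Full computation: translate, group by key, aggregate."""
    view = defaultdict(lambda: group.identity)
    for record in records:
        for key, value in translate(record):
            view[key] = group.combine(view[key], value)
    return dict(view)

def maintain(view, inserted, deleted, translate, group):
    """Incremental maintenance: combine inserted values as they are and
    deleted values through their inverse elements."""
    result = dict(view)
    for records, invert in ((inserted, False), (deleted, True)):
        for record in records:
            for key, value in translate(record):
                if invert:
                    value = group.inverse(value)
                result[key] = group.combine(result.get(key, group.identity), value)
    return result

# Word histogram instance: translate pages into (word, 1), aggregate with +.
integers_add = AbelianGroup(lambda a, b: a + b, 0, lambda a: -a)

def words(page):
    return [(word, 1) for word in page.split()]

view = aggregate(["map reduce map", "delta map"], words, integers_add)
updated = maintain(view, inserted=["delta delta"], deleted=["delta map"],
                   translate=words, group=integers_add)
print(updated)
```

In this example the maintained view equals a full recomputation over the updated pages. Keys whose value returns to the identity would still be present in `updated`; removing them is left out of the sketch.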

Case Study – Lessons Learned
- Numerical aggregation: word histogram, URL access frequency
  Translation function: translate web pages into (word, 1)
  Aggregation function: Abelian group (Natural numbers, +)
- Set aggregation: reverse web-link graph, inverted index
  Translation function: translate web pages into (link target, link source)
  Aggregation function: Abelian group (Power-multiset of URLs, multiset union)
- Multiset aggregation: term-vector per host

Evaluation
[Figure: elapsed time [min] (y-axis) versus updates in base documents [%] (x-axis)]

Conclusion & Future Work
- View maintenance in MapReduce: case study, summary delta algorithm, self-maintainable aggregations
- Future work: broader class of MapReduce programs; high-level MapReduce languages, e.g. Jaql or Pig Latin