Design Patterns for Efficient Graph Algorithms in MapReduce Jimmy Lin and Michael Schatz (Slides by Tyler S. Randolph)


What is MapReduce?
Definition: a programming model and an associated implementation for processing and generating large datasets with a parallel, distributed algorithm on a cluster
2 main parts
- Mapper
- Reducer
2 sub parts
- Combiner
- Partitioner

What is MapReduce?
1) Mappers are applied to the input
2) Combiners perform local aggregation
3) Partitioners send data to reducers
4) Reducers aggregate results
Very parallelizable

Example
Step through a MapReduce job that returns the number of times each word length appears in the following sentence: “We should all take summer classes this year.” Write and label the outputs of the mapper, combiner, and reducer (no need for a partitioner with an example this small).

Example (continued) “We should all take summer classes this year.” Mapper- 2: We, 6: should, 3: all, 4: take, 6: summer, 7: classes, 4: this, 4: year

Example (continued) “We should all take summer classes this year.” Mapper (output sorted by key)- 2: We, 3: all, 4: take, 4: this, 4: year, 6: should, 6: summer, 7: classes

Example (continued) “We should all take summer classes this year.” Combiner- 2: [We], 3: [all], 4: [take, this, year], 6: [should, summer], 7: [classes]

Example (continued) “We should all take summer classes this year.” Reducer- 2: 1, 3: 1, 4: 3, 6: 2, 7: 1
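
The word-length example can be traced in a few lines of Python. This is only a single-process sketch of the mapper, combiner, and reducer roles, not Hadoop code; the function names are illustrative, and stripping punctuation before measuring each word is an assumption made here so that "year." counts as four letters.

```python
from collections import defaultdict

SENTENCE = "We should all take summer classes this year."

def mapper(sentence):
    """Emit one (word length, word) pair per word, ignoring punctuation."""
    for word in sentence.split():
        word = word.strip(".,;:!?")
        if word:
            yield len(word), word

def combiner(pairs):
    """Group the mapper's words by length (local aggregation on one node)."""
    groups = defaultdict(list)
    for length, word in pairs:
        groups[length].append(word)
    return dict(sorted(groups.items()))

def reducer(groups):
    """Count how many words were seen for each length."""
    return {length: len(words) for length, words in groups.items()}

mapped = list(mapper(SENTENCE))
combined = combiner(mapped)
counts = reducer(combined)
print(counts)
```

With a single input sentence there is only one mapper, so the combiner already sees every word and the reducer has nothing left to merge; on a real cluster each mapper would run its own combiner and the reducer would merge the partial groups.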

“Message Passing” Graphs
G = (V, E)
- Graph = (Vertices, Edges)
- Directed graphs
In-degree - how many vertices point to me
Out-degree - how many vertices I point to
Metadata
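
As a small illustration of these terms, the sketch below stores a directed graph as an adjacency list and derives each vertex's in-degree and out-degree. The four-vertex graph and the variable names are hypothetical, chosen only for this example.

```python
from collections import Counter

# Hypothetical directed graph as an adjacency list:
# each vertex maps to the list of vertices it points to.
graph = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["A"],
    "D": ["A", "B", "C"],
}

# Out-degree: how many vertices do I point to?
out_degree = {vertex: len(targets) for vertex, targets in graph.items()}

# In-degree: how many vertices point to me?
# Start every vertex at 0 so vertices with no incoming edges still appear.
in_degree = Counter({vertex: 0 for vertex in graph})
for targets in graph.values():
    in_degree.update(targets)

print(out_degree)
print(dict(in_degree))
```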

PageRank
Definition: an algorithm used by Google Search to rank web pages. It counts the number and quality of links to a page to produce a rough estimate of how important that page is.
Assumption: more important pages are likely to receive more links from other pages, so it is really one big popularity contest.
Graph topology: the “physical” layout of the graph, i.e. what points to what.

PageRank
At each iteration…
- Computations occur at every vertex as a function of the vertex’s internal state and the LOCAL graph structure
- Partial results, in the form of messages, are “passed” via DIRECTED edges to each vertex’s neighbors
- Computations occur at every vertex based on the incoming partial results, potentially altering the vertex’s internal state

PageRank

Basic PageRank Algorithm

Basic Example
Say A has a link to B; B has links to C and A; C has a link to A; and D has links to A, B, and C…

Basic Example (continued)
Each page has a starting rank of 0.25
PR(A) = (0.25 / L(B)) + (0.25 / L(C)) + (0.25 / L(D))
B has 2 links, C has 1 link, D has 3 links
PR(A) = (0.25 / 2) + (0.25 / 1) + (0.25 / 3)
PR(A) = 0.125 + 0.25 + 0.0833… ≈ 0.458
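
The same arithmetic can be checked with a short sketch of one iteration of the basic (undamped) update, in which every page splits its current rank evenly among the pages it links to. The function name `pagerank_step` is an illustrative choice, and the sketch assumes every page has at least one outgoing link, as in this example.

```python
# Link structure of the basic example: page -> pages it links to.
links = {
    "A": ["B"],
    "B": ["C", "A"],
    "C": ["A"],
    "D": ["A", "B", "C"],
}

# Every page starts with the same rank, 1 / 4 = 0.25.
rank = {page: 1 / len(links) for page in links}

def pagerank_step(links, rank):
    """One basic PageRank iteration with no damping factor.

    Assumes every page has at least one outgoing link.
    """
    new_rank = {page: 0.0 for page in links}
    for page, targets in links.items():
        share = rank[page] / len(targets)  # PR(page) / L(page)
        for target in targets:
            new_rank[target] += share
    return new_rank

new_rank = pagerank_step(links, rank)
print(round(new_rank["A"], 3))  # 0.458
```

Because no page here is a sink, the ranks still sum to 1 after the step; D receives nothing because no page links to it.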

Complications
Need a way to deal with…
- Random hops (the surfer jumps to an arbitrary page instead of following a link)
- Sinks (pages with no outgoing links)

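Damping Factor
The probability d that, at any step, the surfer will continue on as he has been, i.e. keep following links (typically d = 0.85)
Each of the N pages then receives a baseline rank of (1 – 0.85) / N from random hops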

Damping Factor
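
A minimal sketch of how the damping factor enters the update, assuming the commonly used form PR(p) = (1 - d) / N + d * (sum of PR(q) / L(q) over pages q that link to p). The function name, the fixed iteration count, and the choice to spread the rank of sink pages evenly over all pages are assumptions made for this illustration, not something the slides specify.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterative PageRank with damping factor d.

    links maps each page to the list of pages it links to.
    Sink handling (an assumption of this sketch): the rank held by
    pages with no outgoing links is shared equally among all pages.
    """
    n = len(links)
    rank = {page: 1 / n for page in links}
    for _ in range(iterations):
        sink_mass = sum(rank[p] for p, targets in links.items() if not targets)
        # Baseline from random hops, plus an equal share of the sink mass.
        new_rank = {page: (1 - d) / n + d * sink_mass / n for page in links}
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += d * rank[page] / len(targets)
        rank = new_rank
    return rank

# The four-page graph from the basic example.
links = {"A": ["B"], "B": ["C", "A"], "C": ["A"], "D": ["A", "B", "C"]}
ranks = pagerank(links)
for page, value in sorted(ranks.items()):
    print(page, round(value, 4))
```

Page D has no incoming links, so its rank settles at the baseline (1 - 0.85) / 4 = 0.0375, while A, which every other page links to, ends up with the highest rank.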

Tying It All Together
Why MapReduce
- Good for this type of calculation
- Exploit the shuffle and sort phase to aid info passing
Parallelization of PageRank
- Only care about local topology and the damping factor
- No need to worry about the entire picture
- Create an adjacency list representation of the graph, where the key is the id of a vertex and the value is the vertex’s structure and metadata
- Metadata will probably include out-degree and internal state
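
One PageRank iteration expressed in this style might look like the single-process sketch below. It only imitates the map, shuffle, and reduce phases in plain Python; the record layout (`"out"` and `"rank"` keys), the `"structure"` and `"mass"` message tags, and the function names are all assumptions made for illustration, and the rank held by sinks is simply dropped rather than redistributed.

```python
from collections import defaultdict

D = 0.85  # damping factor

def mapper(vertex_id, node):
    """Pass the vertex's own structure along, then send rank to its neighbors."""
    yield vertex_id, ("structure", node["out"])
    if node["out"]:  # sinks send no messages in this sketch
        share = node["rank"] / len(node["out"])
        for target in node["out"]:
            yield target, ("mass", share)

def shuffle(pairs):
    """Stand-in for shuffle and sort: group all values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reducer(vertex_id, values, n):
    """Rebuild the vertex record and fold the incoming messages into a new rank."""
    out, incoming = [], 0.0
    for kind, payload in values:
        if kind == "structure":
            out = payload
        else:
            incoming += payload
    return vertex_id, {"out": out, "rank": (1 - D) / n + D * incoming}

def pagerank_iteration(graph):
    pairs = [kv for vid, node in graph.items() for kv in mapper(vid, node)]
    grouped = shuffle(pairs)
    return dict(reducer(vid, vals, len(graph)) for vid, vals in grouped.items())

# Adjacency list keyed by vertex id; the value holds structure and metadata.
graph = {
    "A": {"out": ["B"], "rank": 0.25},
    "B": {"out": ["C", "A"], "rank": 0.25},
    "C": {"out": ["A"], "rank": 0.25},
    "D": {"out": ["A", "B", "C"], "rank": 0.25},
}
graph = pagerank_iteration(graph)
print({vid: round(node["rank"], 4) for vid, node in graph.items()})
```

Each mapper and reducer call touches only one vertex and its local edges, which is why the work can be spread across many machines; emitting the structure alongside the rank messages is what lets the output of one iteration feed directly into the next.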

Bibliography
"PageRank." Wikipedia. Wikimedia Foundation, 26 Apr. Web. 03 May 2015.
"MapReduce." Wikipedia. Wikimedia Foundation, 01 May. Web. 03 May.
Lin, Jimmy, and Michael Schatz. "Design Patterns for Efficient Graph Algorithms in MapReduce." Thesis. University of Maryland, College Park. Web. 1 May.

Questions?