Cloud Computing Lecture #4: Graph Algorithms with MapReduce. Jimmy Lin, The iSchool, University of Maryland. Wednesday, February 6, 2008.

Presentation transcript:

Cloud Computing Lecture #4: Graph Algorithms with MapReduce
Jimmy Lin, The iSchool, University of Maryland
Wednesday, February 6, 2008
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States license. See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details.
Material adapted from slides by Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed Computing Seminar, 2007 (licensed under Creative Commons Attribution 3.0 License).

Today’s Topics
- Introduction to graph algorithms and graph representations
- Single Source Shortest Path (SSSP) problem
  - Refresher: Dijkstra’s algorithm
  - Breadth-First Search with MapReduce
- PageRank

What’s a graph?
- G = (V, E), where
  - V represents the set of vertices (nodes)
  - E represents the set of edges (links)
  - Both vertices and edges may contain additional information
- Different types of graphs:
  - Directed vs. undirected edges
  - Presence or absence of cycles
- Graphs are everywhere:
  - Hyperlink structure of the Web
  - Physical structure of computers on the Internet
  - Interstate highway system
  - Social networks

Some Graph Problems
- Finding shortest paths: routing Internet traffic and UPS trucks
- Finding minimum spanning trees: telco laying down fiber
- Finding max flow: airline scheduling
- Identifying “special” nodes and communities: breaking up terrorist cells, spread of avian flu
- Bipartite matching: Monster.com, Match.com
- And of course... PageRank

Graphs and MapReduce
- Graph algorithms typically involve:
  - Performing computation at each node
  - Processing node-specific data, edge-specific data, and link structure
  - Traversing the graph in some manner
- Key questions:
  - How do you represent graph data in MapReduce?
  - How do you traverse a graph in MapReduce?

Representation
- G = (V, E): a poor representation for computational purposes
- Two common representations:
  - Adjacency matrix
  - Adjacency list

Adjacency Matrices
- Represent a graph as an n x n square matrix M
  - n = |V|
  - M_ij = 1 means a link from node i to node j

Adjacency Matrices: Critique
- Advantages:
  - Naturally encapsulates iteration over nodes
  - Rows correspond to outlinks and columns to inlinks
- Disadvantages:
  - Lots of zeros for sparse graphs
  - Lots of wasted space

Adjacency Lists
- Take adjacency matrices… and throw away all the zeros
- Represent only the outlinks from each node:
  1: 2, 4
  2: 1, 3, 4
  3: 1
  4: 1, 3
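As a rough illustration of "throwing away the zeros", the sketch below converts an adjacency matrix into an adjacency list. The 4 x 4 matrix is an assumption, reconstructed from the list on this slide with the first row taken to be node 1; `matrix_to_adjacency_list` is a hypothetical helper name, not something from the lecture.

```python
def matrix_to_adjacency_list(matrix):
    """Keep only the nonzero entries of each row (the outlinks), with 1-based node ids."""
    return {
        i + 1: [j + 1 for j, cell in enumerate(row) if cell]
        for i, row in enumerate(matrix)
    }

# Assumed example: M[i][j] == 1 means a link from node i+1 to node j+1.
M = [
    [0, 1, 0, 1],
    [1, 0, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 1, 0],
]

adjacency = matrix_to_adjacency_list(M)
for node, outlinks in adjacency.items():
    print(f"{node}: {', '.join(map(str, outlinks))}")
```

Storage drops from n * n cells to one entry per edge, which is where the savings for sparse graphs come from.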

Adjacency Lists: Critique
- Advantages:
  - Much more compact representation
  - Easy to compute over outlinks
  - Graph structure can be broken up and distributed
- Disadvantages:
  - Much more difficult to compute over inlinks
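To make the inlink disadvantage concrete: with an adjacency list, answering "who links to node 1?" requires a pass over every node's outlinks. A minimal sketch, assuming the same 4-node example graph as the adjacency list slide; `invert` is a hypothetical helper name.

```python
from collections import defaultdict

def invert(adjacency):
    """Build node -> inlinks from node -> outlinks; needs a pass over every edge."""
    inlinks = defaultdict(list)
    for source, targets in adjacency.items():
        inlinks[source]  # touch the key so nodes without inlinks still appear
        for target in targets:
            inlinks[target].append(source)
    return dict(inlinks)

outlinks = {1: [2, 4], 2: [1, 3, 4], 3: [1], 4: [1, 3]}
inlinks = invert(outlinks)
print(inlinks[1])  # every node that links to node 1
```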

Single Source Shortest Path
- Problem: find shortest path from a source node to one or more target nodes
- First, a refresher: Dijkstra’s Algorithm

Dijkstra’s Algorithm Example
[Figure sequence: six steps of Dijkstra’s algorithm on an example graph; initial distance labels are 0 for the source and ∞ for the other nodes]
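Since the worked figures are not available here, the following is a minimal sketch of Dijkstra's algorithm using a binary heap. The 5-node weighted graph and its node names are assumptions chosen for illustration and are not necessarily the graph from the slides.

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from source; graph maps node -> {neighbor: weight}."""
    dist = {node: float("inf") for node in graph}
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale entry: a shorter path to node was already settled
        for neighbor, weight in graph[node].items():
            candidate = d + weight
            if candidate < dist[neighbor]:
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist

# Hypothetical example graph with positive edge weights.
graph = {
    "s": {"a": 10, "b": 5},
    "a": {"b": 2, "c": 1},
    "b": {"a": 3, "c": 9, "d": 2},
    "c": {"d": 4},
    "d": {"s": 7, "c": 6},
}
distances = dijkstra(graph, "s")
print(distances)
```

At each step only the closest unsettled node is expanded, which is the property the later comparison with parallel BFS relies on.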

Single Source Shortest Path
- Problem: find shortest path from a source node to one or more target nodes
- Single processor machine: Dijkstra’s Algorithm
- MapReduce: parallel Breadth-First Search (BFS)

Finding the Shortest Path
- First, consider equal edge weights
- Solution to the problem can be defined inductively
- Here’s the intuition:
  - DistanceTo(startNode) = 0
  - For all nodes n directly reachable from startNode, DistanceTo(n) = 1
  - For all nodes n reachable from some other set of nodes S, DistanceTo(n) = 1 + min(DistanceTo(m), m ∈ S)
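The inductive definition can be read as a level-by-level procedure. Below is a single-machine sketch (not yet MapReduce) that assigns hop counts one frontier at a time; the function name and the 4-node example graph are assumptions for illustration.

```python
def bfs_distances(adjacency, start):
    """Hop counts from start, computed one frontier at a time."""
    distance = {start: 0}                      # DistanceTo(startNode) = 0
    frontier = [start]
    while frontier:
        next_frontier = []
        for m in frontier:
            for n in adjacency.get(m, []):
                if n not in distance:          # first visit gives the minimum hop count
                    distance[n] = distance[m] + 1
                    next_frontier.append(n)
        frontier = next_frontier
    return distance

adjacency = {1: [2, 4], 2: [1, 3, 4], 3: [1], 4: [1, 3]}
hops = bfs_distances(adjacency, 1)
print(hops)
```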

From Intuition to Algorithm
- A map task receives
  - Key: node n
  - Value: D (distance from start), points-to (list of nodes reachable from n)
  - ∀ p ∈ points-to: emit (p, D+1)
- The reduce task gathers possible distances to a given p and selects the minimum one
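A minimal sketch of one such pass, simulated in plain Python rather than on Hadoop. The names `bfs_map` and `bfs_reduce`, the use of infinity for "not yet reached", and the in-memory dictionary standing in for the shuffle phase are all assumptions for illustration.

```python
import math
from collections import defaultdict

def bfs_map(node, value):
    """value = (D, points_to); emit a candidate distance for every neighbor."""
    distance, points_to = value
    for p in points_to:
        yield p, distance + 1

def bfs_reduce(node, candidates):
    """Select the minimum of all candidate distances gathered for one node."""
    return node, min(candidates)

records = {
    1: (0, [2, 4]),              # start node
    2: (math.inf, [1, 3, 4]),
    3: (math.inf, [1]),
    4: (math.inf, [1, 3]),
}

grouped = defaultdict(list)      # stands in for the shuffle phase
for node, value in records.items():
    for key, candidate in bfs_map(node, value):
        grouped[key].append(candidate)

result = dict(bfs_reduce(key, values) for key, values in grouped.items())
print(result)
```

Note that this single pass drops both the points-to lists and node 1's own distance of 0; the slides that follow address both points.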

Multiple Iterations Needed
- This MapReduce task advances the “known frontier” by one hop
  - Subsequent iterations include more reachable nodes as frontier advances
  - Multiple iterations are needed to explore entire graph
  - Feed output back into the same MapReduce task
- Preserving graph structure:
  - Problem: Where did the points-to list go?
  - Solution: Mapper emits (n, points-to) as well
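One way to picture the feedback loop is sketched below: each pass emits the points-to list alongside the candidate distances, so its output has the same shape as its input. The tagging scheme ("graph" vs. "dist"), the function name, and the fixed iteration count are assumptions made for this example; the sketch also carries each node's current distance forward so the start node keeps its 0.

```python
import math
from collections import defaultdict

def bfs_iteration(records):
    """One map + shuffle + reduce pass over {node: (distance, points_to)}."""
    shuffled = defaultdict(list)
    for node, (distance, points_to) in records.items():   # map phase
        shuffled[node].append(("graph", points_to))        # pass structure along
        shuffled[node].append(("dist", distance))          # keep current distance
        for p in points_to:
            shuffled[p].append(("dist", distance + 1))
    output = {}
    for node, values in shuffled.items():                  # reduce phase
        points_to = next((v for tag, v in values if tag == "graph"), [])
        best = min(v for tag, v in values if tag == "dist")
        output[node] = (best, points_to)
    return output

records = {1: (0, [2, 4]), 2: (math.inf, [1, 3, 4]), 3: (math.inf, [1]), 4: (math.inf, [1, 3])}
for _ in range(len(records) - 1):   # assumed bound: no shortest path needs more hops
    records = bfs_iteration(records)
distances = {node: d for node, (d, _) in records.items()}
print(distances)
```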

Visualizing Parallel BFS
[Figure]

Termination
- Does the algorithm ever terminate?
  - Eventually, all nodes will be discovered, all edges will be considered (in a connected graph)
- When do we stop?

Weighted Edges
- Now add positive weights to the edges
- Simple change: points-to list in map task includes a weight w for each pointed-to node
  - emit (p, D + w_p) instead of (p, D + 1) for each node p
- Does this ever terminate?
  - Yes! Eventually, no better distances will be found. When no distance changes from one iteration to the next, we stop
  - Mapper should emit (n, D) to ensure that “current distance” is carried into the reducer
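A sketch of the weighted variant with the stopping rule described above, again simulated on one machine. The graph, the function names, and the shortcut of reading edge lists straight from `records` in the reduce step (instead of shuffling them) are assumptions for illustration.

```python
import math
from collections import defaultdict

def sssp_iteration(records):
    """One pass over {node: (distance, {neighbor: weight})}."""
    candidates = defaultdict(list)
    for node, (distance, edges) in records.items():
        candidates[node].append(distance)           # emit (n, D): carry current distance
        for p, w in edges.items():
            candidates[p].append(distance + w)      # emit (p, D + w_p)
    return {node: (min(candidates[node]), edges) for node, (_, edges) in records.items()}

def parallel_sssp(graph, source):
    """Iterate until no distance changes; returns (distances, iterations run)."""
    records = {n: (0 if n == source else math.inf, edges) for n, edges in graph.items()}
    iterations = 0
    while True:
        updated = sssp_iteration(records)
        iterations += 1
        if updated == records:                      # nothing improved: stop
            return {n: d for n, (d, _) in updated.items()}, iterations
        records = updated

# Hypothetical graph with positive weights; every node appears as a key.
graph = {
    "s": {"a": 10, "b": 5},
    "a": {"b": 2, "c": 1},
    "b": {"a": 3, "c": 9, "d": 2},
    "c": {"d": 4},
    "d": {"s": 7, "c": 6},
}
distances, iterations = parallel_sssp(graph, "s")
print(distances, iterations)
```

The final pass changes nothing and exists only to detect convergence, so the job runs one more iteration than the number that actually improved a distance.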

Comparison to Dijkstra
- Dijkstra’s algorithm is more efficient
  - At any step it only pursues edges from the minimum-cost path inside the frontier
- MapReduce explores all paths in parallel
  - Divide and conquer
  - Throw more hardware at the problem

General Approach
- MapReduce is adept at manipulating graphs
  - Store graphs as adjacency lists
- Graph algorithms with MapReduce:
  - Each map task receives a node and its outlinks
  - Map task computes some function of the link structure, emits value with target as the key
  - Reduce task collects keys (target nodes) and aggregates
- Iterate multiple MapReduce cycles until some termination condition
  - Remember to “pass” graph structure from one iteration to next

Random Walks Over the Web
- Model:
  - User starts at a random Web page
  - User randomly clicks on links, surfing from page to page
- What’s the amount of time that will be spent on any given page?
- This is PageRank

PageRank: Visually
[Figure]

PageRank: Defined
- Given page x with in-bound links t_1 … t_n, where
  - C(t) is the out-degree of t
  - α is the probability of a random jump
  - N is the total number of nodes in the graph
- We can define PageRank as:
  PR(x) = \alpha \left( \frac{1}{N} \right) + (1 - \alpha) \sum_{i=1}^{n} \frac{PR(t_i)}{C(t_i)}
[Figure: pages t_1, …, t_n each linking to page x]

Computing PageRank
- Properties of PageRank
  - Can be computed iteratively
  - Effects at each iteration are local
- Sketch of algorithm:
  - Start with seed PR_i values
  - Each page distributes PR_i “credit” to all pages it links to
  - Each target page adds up “credit” from multiple in-bound links to compute PR_{i+1}
  - Iterate until values converge
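The sketch of the algorithm can be tried on a single machine first. The code below assumes the update PR(x) = α/N + (1 − α) · Σ PR(t)/C(t) over in-bound links t, a random-jump probability of α = 0.15, a uniform seed, and an L1 convergence test; all of these are illustrative assumptions, and the example graph has no dangling nodes.

```python
def pagerank(outlinks, alpha=0.15, tolerance=1e-9, max_iterations=100):
    """Iterate the PageRank update until values settle or the iteration cap is hit."""
    n = len(outlinks)
    ranks = {page: 1.0 / n for page in outlinks}           # seed values
    for _ in range(max_iterations):
        credit = {page: 0.0 for page in outlinks}
        for page, targets in outlinks.items():
            for target in targets:                          # distribute credit evenly
                credit[target] += ranks[page] / len(targets)
        updated = {page: alpha / n + (1 - alpha) * credit[page] for page in outlinks}
        change = sum(abs(updated[p] - ranks[p]) for p in outlinks)
        ranks = updated
        if change < tolerance:                              # assumed convergence test
            break
    return ranks

outlinks = {1: [2, 4], 2: [1, 3, 4], 3: [1], 4: [1, 3]}
ranks = pagerank(outlinks)
for page, rank in sorted(ranks.items()):
    print(page, round(rank, 4))
```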

PageRank in MapReduce
- Map: distribute PageRank “credit” to link targets
- Reduce: gather up PageRank “credit” from multiple sources to compute new PageRank value
- Iterate until convergence
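The same computation can be phrased as map and reduce functions. This is a single-process simulation, not Hadoop code: the names `pr_map`, `pr_reduce`, and `pr_iteration`, the "graph"/"credit" tags, α = 0.15, and the fixed 50 iterations are assumptions for illustration, and dangling nodes are not handled.

```python
from collections import defaultdict

ALPHA = 0.15  # assumed random-jump probability

def pr_map(page, value):
    """value = (rank, outlinks); emit the structure plus one credit share per outlink."""
    rank, outlinks = value
    yield page, ("graph", outlinks)                  # pass structure to the next iteration
    for target in outlinks:
        yield target, ("credit", rank / len(outlinks))

def pr_reduce(page, values, n):
    """Sum incoming credit and apply the random-jump term."""
    outlinks, total = [], 0.0
    for tag, payload in values:
        if tag == "graph":
            outlinks = payload
        else:
            total += payload
    return page, (ALPHA / n + (1 - ALPHA) * total, outlinks)

def pr_iteration(records):
    shuffled = defaultdict(list)                     # stands in for the shuffle phase
    for page, value in records.items():
        for key, emitted in pr_map(page, value):
            shuffled[key].append(emitted)
    return dict(pr_reduce(page, values, len(records)) for page, values in shuffled.items())

links = {1: [2, 4], 2: [1, 3, 4], 3: [1], 4: [1, 3]}
records = {page: (1.0 / len(links), outlinks) for page, outlinks in links.items()}
for _ in range(50):                                  # assumed fixed iteration count
    records = pr_iteration(records)
print({page: round(rank, 4) for page, (rank, _) in records.items()})
```

Each iteration's output has the same (rank, outlinks) shape as its input, so it can be fed straight back into the next pass.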

PageRank: Issues
- Is PageRank guaranteed to converge? How quickly?
- What is the “correct” value of α, and how sensitive is the algorithm to it?
- What about dangling links?
- How do you know when to stop?

Assignment
- Implement PageRank!