Graph Algorithms Ch. 5 Lin and Dyer.

Slides:



Advertisements
Similar presentations
§3 Shortest Path Algorithms Given a digraph G = ( V, E ), and a cost function c( e ) for e  E( G ). The length of a path P from source to destination.
Advertisements

CSE 390B: Graph Algorithms Based on CSE 373 slides by Jessica Miller, Ruth Anderson 1.
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture10.
Thanks to Jimmy Lin slides
Graphs Chapter 12. Chapter Objectives  To become familiar with graph terminology and the different types of graphs  To study a Graph ADT and different.
Edited by Malak Abdullah Jordan University of Science and Technology Data Structures Using C++ 2E Chapter 12 Graphs.
Midwestern State University Department of Computer Science Dr. Ranette Halverson CMPS 2433 CHAPTER 4 - PART 2 GRAPHS 1.
Graph II MST, Shortest Path. Graph Terminology Node (vertex) Edge (arc) Directed graph, undirected graph Degree, in-degree, out-degree Subgraph Simple.
DATA MINING LECTURE 12 Link Analysis Ranking Random walks.
C++ Programming: Program Design Including Data Structures, Third Edition Chapter 21: Graphs.
Cloud Computing Lecture #5 Graph Algorithms with MapReduce Jimmy Lin The iSchool University of Maryland Wednesday, October 1, 2008 This work is licensed.
Graphs Chapter 12. Chapter 12: Graphs2 Chapter Objectives To become familiar with graph terminology and the different types of graphs To study a Graph.
Spring 2010CS 2251 Graphs Chapter 10. Spring 2010CS 2252 Chapter Objectives To become familiar with graph terminology and the different types of graphs.
Chapter 7 Network Flow Models.
Design Patterns for Efficient Graph Algorithms in MapReduce Jimmy Lin and Michael Schatz University of Maryland Tuesday, June 29, 2010 This work is licensed.
DAST 2005 Tirgul 12 (and more) sample questions. DAST 2005 Q.We’ve seen that solving the shortest paths problem requires O(VE) time using the Belman-Ford.
Fall 2007CS 2251 Graphs Chapter 12. Fall 2007CS 2252 Chapter Objectives To become familiar with graph terminology and the different types of graphs To.
CS 253: Algorithms Chapter 24 Shortest Paths Credit: Dr. George Bebis.
Cloud Computing Lecture #4 Graph Algorithms with MapReduce Jimmy Lin The iSchool University of Maryland Wednesday, February 6, 2008 This work is licensed.
More Algorithms for Trees and Graphs Eric Roberts CS 106B March 11, 2013.
Graph Algorithms Ch. 5 Lin and Dyer. Graphs Are everywhere Manifest in the flow of s Connections on social network Bus or flight routes Social graphs:
Graphs CS 400/600 – Data Structures. Graphs2 Graphs  Used to represent all kinds of problems Networks and routing State diagrams Flow and capacity.
Chapter 9 – Graphs A graph G=(V,E) – vertices and edges
Dijkstras Algorithm Named after its discoverer, Dutch computer scientist Edsger Dijkstra, is an algorithm that solves the single-source shortest path problem.
MapReduce and Graph Data Chapter 5 Based on slides from Jimmy Lin’s lecture slides ( (licensed.
Graph Algorithms. Definitions and Representation An undirected graph G is a pair (V,E), where V is a finite set of points called vertices and E is a finite.
DATA MINING LECTURE 13 Pagerank, Absorbing Random Walks Coverage Problems.
Common final examinations When: Wednesday, 12/11, 3:30-5:30 PM Where: Ritter Hall - Walk Auditorium 131 CIS 1166 final When: Wednesday, 12/11, 5:45-7:45.
Dijkstra’s Algorithm Supervisor: Dr.Franek Ritu Kamboj
Graph Algorithms. Graph Algorithms: Topics  Introduction to graph algorithms and graph represent ations  Single Source Shortest Path (SSSP) problem.
Most of contents are provided by the website Graph Essentials TJTSD66: Advanced Topics in Social Media.
Graphs A ‘Graph’ is a diagram that shows how things are connected together. It makes no attempt to draw actual paths or routes and scale is generally inconsequential.
1 1 COMP5331: Knowledge Discovery and Data Mining Acknowledgement: Slides modified based on the slides provided by Lawrence Page, Sergey Brin, Rajeev Motwani.
Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Ver Chapter 13: Graphs Data Abstraction & Problem Solving with C++
Graphs Part II Lecture 7. Lecture Objectives  Topological Sort  Spanning Tree  Minimum Spanning Tree  Shortest Path.
Graphs Chapter 12. Chapter 12: Graphs2 Chapter Objectives To become familiar with graph terminology and the different types of graphs To study a Graph.
Suppose G = (V, E) is a directed network. Each edge (i,j) in E has an associated ‘length’ c ij (cost, time, distance, …). Determine a path of shortest.
Data Structures and Algorithm Analysis Graph Algorithms Lecturer: Jing Liu Homepage:
大规模数据处理 / 云计算 05 – Graph Algorithm 闫宏飞 北京大学信息科学技术学院 7/22/2014 Jimmy Lin University of Maryland SEWMGroup This work.
Proof of correctness of Dijkstra’s algorithm: Basically, we need to prove two claims. (1)Let S be the set of vertices for which the shortest path from.
1 GRAPHS – Definitions A graph G = (V, E) consists of –a set of vertices, V, and –a set of edges, E, where each edge is a pair (v,w) s.t. v,w  V Vertices.
Lecture 20. Graphs and network models 1. Recap Binary search tree is a special binary tree which is designed to make the search of elements or keys in.
Topics In Social Computing (67810) Module 1 (Structure) Centrality Measures, Graph Clustering Random Walks on Graphs.
CSE 373: Data Structures and Algorithms Lecture 21: Graphs V 1.
Big Data Infrastructure
The PageRank Citation Ranking: Bringing Order to the Web
Graphs Lecture 19 CS2110 – Spring 2013.
Graphs Representation, BFS, DFS
Minimum Spanning Tree 8/7/2018 4:26 AM
Link-Based Ranking Seminar Social Media Mining University UC3M
Introduction to Graphs
Common final examinations
I206: Lecture 15: Graphs Marti Hearst Spring 2012.
Graphs Representation, BFS, DFS
Graphs Lecture 18 CS2110 – Fall 2009.
Cloud Computing Lecture #4 Graph Algorithms with MapReduce
Algorithms (2IL15) – Lecture 5 SINGLE-SOURCE SHORTEST PATHS
Minimum Spanning Tree Algorithms
Graph Algorithms Adapted from UMD Jimmy Lin’s slides, which is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States.
Algorithms (2IL15) – Lecture 7
Lecture 14 Shortest Path (cont’d) Minimum Spanning Tree
CS 584 Project Write up Poster session for final Due on day of final
CSE 373 Data Structures and Algorithms
Parallel Graph Algorithms
Spanning Trees Lecture 20 CS2110 – Spring 2015.
Chapter 14 Graphs © 2011 Pearson Addison-Wesley. All rights reserved.
Graphs: Shortest path and mst
Lecture 13 Shortest Path (cont’d) Minimum Spanning Tree
Graph Algorithms Ch. 5 Lin and Dyer.
More Graphs Lecture 19 CS2110 – Fall 2009.
Presentation transcript:

Graph Algorithms Ch. 5 Lin and Dyer

Graphs Are everywhere Manifest in the flow of emails Connections on social network Bus or flight routes Social graphs: twitter friends and followers Take a look at Jon Kleinger’s page and book on Networks, Crowds and Markets Reasoning about a highly connected world.

Graph algorithms Graph search and path planning:: shortest path to a node Graph clustering:: diving the graphs into smaller related clusters Minimum spanning tree:: graph that covers the nodes in an efficient way Bipartite graph match:: div graph into two mapping sets: job seekers and employers Maximum flow:: designate source and sink; determine max flow between the two: transportation Identifying special nodes: authoritative nodes: containment of spread of diseases; Broad street water pump in London, cholera and beginnings of epidemiology

Graph Representations How do you represent this visual diagram as data?

Simple, Baseline Data Structure 1 n1 n2 n3 n4 n5 n1 [n2, n4] n2 [n3, n5] n3 [n4] n4 [n5] n5 [n1,n2,n3] Adjacency matrix – this is good for linear algebra; But most web links and social Networks are sparse x/ 1000000000 Space req. is O(n2) (ii) Adjacency lists

Problem definition: intuition Input: graph adjacency list with edges and vertices, w edges distances, starting vertex Output(goal): label the nodes/vertices with the shortest distance value from the starting node

single source shortest path problem Sequential solution: Dijkstra’s algorithm 5.2 Dijkstra (G, w, s) // w edge distances list, s starting node, G graph d[s]  0 for all other vertices d[v] ∞ Q  {V} // Q is priority queue based on distances while Q # 0 u  min(Q) // node with min d value for all vertex v in u.adjacencyList if d[v] > d[u] + w[u,v] d[v]  d[u] + w[u,v] mark u and remove from Q At each iteration of while loop, the algorithm expands the node with the shortest distance and updates distances to all reachable nodes

Sample graph : lets apply the algorithm 5.2 n2 n4 ∞ 1 ∞ 10 9 3 2 6 4 7 n1 5 ∞ ∞ 2 n3 n5

Issues Sequential Need to keep global state: not possible with MR Lets see how we can handle this graph problem for parallel processing with MR

Parallel Breadth First Assume distance of 1 for all edges (simplifying assumption): later we will expand it to other distances

Issues in processing a graph in MR Goal: start from a given node and label all the nodes in the graph so that we can determine the shortest distance Representation of the graph (of course, generation of a synthetic graph) Determining the <key,value> pair Iterating through various stages of processing and intermediate data When to terminate the execution

Input data format for MR Node: nodeId, distanceLabel, adjancency list {nodeId, distance} This is one split Input as text and parse it to determine <key, value> From mapper to reducer two types of <key, value> pairs <nodeid n, Node N> <nodeid n, distance until now label> Need to keep the termination condition in the Node class Terminate MR iterations when none of the labels change, or when the graph has reached a steady state or all the nodes have been labeled with min distance or other conditions using the counters can be used. Now lets look at the algorithm given in the book

Mapper Class Mapper method map (nid n, Node N) d  N.distance emit(nid n, N) // type 1 for all m in N. Adjacencylist emit(nid m, d+1) // type 2

Reducer Class Reducer method Reduce(nid m, [d1, d2, d3..]) dmin = ∞; // or a large # Node M  null for all d in [d1,d2, ..] { if IsNode(d) then M  d else if d < dmin then dmin  d} M.distance  dmin // update the shortest distance in M emit (nid m, Node M)

Trace with sample Data 1 0 2:3: 2 10000 3:4: 3 10000 2:4:5 4 10000 5: 5 10000 1:4

Intermediate data 1 0 2:3: 2 1 3:4: 3 1 2:4:5: 4 10000 5: 5 10000 1:4:

Intermediate Data 1 0 2:3: 2 1 3:4: 3 1 2:4:5: 4 2 5: 5 2 1:4:

Final Data 1 0 2:3: 2 1 3:4: 3 1 2:4:5: 4 2 5: 5 2 1:4:

Sample Data 1 0 2:3: 2 10000 3:4: 3 10000 2:4:5 4 10000 5: 5 10000 1:4 2 1 3:4: 3 1 2:4:5 4 2 5: 5 2 1:4

PageRank Original algorithm (huge matrix and Eigen vector problem.) Larry Page and Sergei Brin (Standford Ph.D. students) Rajeev Motwani and Terry Winograd (Standford Profs)

General idea Consider the world wide web with all its links. Now imagine a random web surfer who visits a page and clicks a link on the page Repeats this to infinity Pagerank is a measure of how frequently will a page will be encountered. In other words it is a probability distribution over nodes in the graph representing the likelihood that a random walk over the linked structure will arrive at a particular node.

PageRank Formula P(n) = α 1 𝐺 +(1−𝛼) 𝑚∈𝐿(𝑛) 𝑃 𝑚 𝐶 𝑚 α randomness factor G is the total number of nodes in the graph L(n) is all the pages that link to n C(m) is the number of outgoing links of the page m Note that PageRank is recursively defined. It is implemented by iterative MRs.

Example Figure 5.7 Lets assume alpha as zero Lets look at the MR

Mapper for PageRank Class Mapper method map (nid n, Node N) p  N.Pagerank/|N.AdajacencyList| emit(nid n, N) for all m in N. AdjacencyList emit(nid m, p) “divider”

Reducer for Pagerank Class Reducer method Reduce(nid m, [p1, p2, p3..]) node M  null; s = 0; for all p in [p1,p2, ..] { if p is a Node then M  p else s  s+p } M.pagerank  s emit (nid m, node M) “aggregator”

Lets trace with sample data 1 2 4 3

Discussion How to account for dangling nodes: one that has many incoming links and no outgoing links Simply redistributes its pagerank to all One iteration requires pagerank computation + redistribution of “unused” pagerank Pagerank is iterated until convergence: when is convergence reached? Probability distribution over a large network means underflow of the value of pagerank.. Use log based computation MR: How do PRAM alg. translate to MR? how about other math algorithms?