
1 Applications of Relative Importance
• Why is relative importance interesting?
• Application areas: the Web, social networks, citation graphs, biological data
• Graphs become too complex for manual analysis

2 Existing Techniques
• Web: PageRank (Google)
• Social networks: 'centrality' measures
• All focus on global measures of node importance; we're interested in importance relative to a set of root nodes R

3 Use Existing Techniques?
• Could we run a global algorithm on the subgraph surrounding the root nodes?
• Problem: the root nodes get no preferential treatment; this just ranks the surrounding nodes

4 Organization: Relative Importance Algorithms
• Notation
• Problem Formulation
• General Framework
• Algorithms

5 Notation
• Digraph G = (V, E)
• Edges: ordered pairs of nodes (u, v)
• Graphs are directed, unweighted, and simple (no self-loops, no multiple edges)
• Walks from u to v; a path is a walk with no repeated nodes

6 Notation
• k-short paths
• P(u, v): set of paths between u and v
• The set of distinct outgoing edges from u; similarly, the set of distinct incoming edges to u
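A minimal Python sketch of this notation. It assumes nodes are hashable, sortable labels. The class `Digraph` and its attributes are hypothetical names:

- `out_edges[u]` holds u's distinct outgoing neighbours.
- `in_edges[u]` holds the incoming ones. This assumes the second, unnamed set on the slide means incoming edges.
- `paths(u, v)` enumerates P(u, v) by depth-first search, never revisiting a node.

```python
from collections import defaultdict


class Digraph:
    """Directed, unweighted, simple graph G = (V, E) (hypothetical helper)."""

    def __init__(self, edges):
        self.nodes = set()
        self.out_edges = defaultdict(set)  # u -> distinct v with (u, v) in E
        self.in_edges = defaultdict(set)   # v -> distinct u with (u, v) in E
        for u, v in edges:
            if u == v:
                raise ValueError("simple graphs have no self-loops")
            self.nodes.update((u, v))
            self.out_edges[u].add(v)       # a set silently drops duplicate edges
            self.in_edges[v].add(u)

    def paths(self, u, v):
        """Return P(u, v): all paths (walks without repeated nodes) from u to v."""
        found = []

        def extend(path):
            node = path[-1]
            if node == v:
                found.append(list(path))
                return
            for nxt in sorted(self.out_edges.get(node, ())):
                if nxt not in path:        # never revisit a node
                    path.append(nxt)
                    extend(path)
                    path.pop()

        extend([u])
        return found


g = Digraph([("R", "C"), ("C", "T"), ("R", "D"), ("D", "T"), ("C", "D")])
for p in g.paths("R", "T"):
    print(" -> ".join(p), "length", len(p) - 1)
```

The length of a path is its number of edges, `len(p) - 1`. Every path-based method on the following slides weights paths by this quantity.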

7 Problem Formulation
1. Given G and nodes r, t ∈ V, compute the "importance" of t with respect to root node r: $I(t \mid r)$.

8 Problem Formulation
2. Given G and a root node r ∈ V, rank all vertices in T(G), T ⊆ V, with respect to r.

9 Problem Formulation
3. Given G, a set of nodes T(G) to rank, and a set of root nodes R(G) where R ⊆ V, rank all vertices in T with respect to R. This is similar to the last case, except that we compute $I(t \mid R)$ rather than $I(t \mid r)$. Average importance:
$$I(t \mid R) = \frac{1}{|R|} \sum_{r \in R} I(t \mid r)$$

10 Problem Formulation (3 cont'd.)
• Rather than averaging each node's importance scores over R, we could define $I(t \mid R)$ with a stricter combination rule
• Such a rule requires 'important' nodes to have a high importance score with respect to all nodes in R
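Ranking T with respect to R needs a rule for combining the per-root scores I(t|r). This sketch implements the average from formulation 3. The stricter rule `min_importance` is only one hypothetical example; it is not necessarily the definition the slides intended. All function names and the per-root scores are assumptions for illustration.

```python
def average_importance(scores_by_root):
    """I(t|R) = (1/|R|) * sum over r in R of I(t|r).

    scores_by_root maps each root r to a dict {t: I(t|r)}; a missing
    entry is treated as importance 0.
    """
    roots = list(scores_by_root)
    targets = set().union(*scores_by_root.values())
    return {
        t: sum(scores_by_root[r].get(t, 0.0) for r in roots) / len(roots)
        for t in targets
    }


def min_importance(scores_by_root):
    """Hypothetical stricter rule: t scores only as high as its weakest root."""
    targets = set().union(*scores_by_root.values())
    return {
        t: min(scores.get(t, 0.0) for scores in scores_by_root.values())
        for t in targets
    }


def rank(scores):
    """Order targets by decreasing importance (ties broken by name)."""
    return sorted(scores, key=lambda t: (-scores[t], t))


# Hypothetical per-root scores for R = {A, F}.
per_root = {"A": {"x": 0.8, "y": 0.1}, "F": {"x": 0.0, "y": 0.5}}
print(rank(average_importance(per_root)))
print(rank(min_importance(per_root)))
```

Node x is strongly tied to root A alone, so the average ranks it first. The stricter rule prefers y, which has a nonzero score for both roots.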

11 Problem Formulation
4. Given G, rank all nodes, where R = T = V.

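12 General Framework: Weighted Paths
• Nodes are related according to the paths that connect them
• The longer the path, the less importance it confers:
$$I(t \mid r) = \sum_{p_i \in P(r,t)} \beta^{|p_i|}$$
where β is a scalar coefficient, P(r, t) is a set of paths from r to t, $p_i$ is the i-th path in P, and $|p_i|$ is its length. Importance decays exponentially with path length (for 0 < β < 1).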
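A minimal sketch of the weighted-paths score, taking P(r, t) to be every simple path from r to t. The function name, the adjacency-dict input and the example graph are assumptions. Enumerating all simple paths grows exponentially with graph size, which is why the next slides restrict the choice of P(r, t).

```python
def weighted_path_importance(out_edges, r, t, beta):
    """I(t|r) = sum of beta ** |p| over every simple path p from r to t.

    out_edges maps a node to a list of its successors. Enumerating all
    simple paths is exponential, so this is only feasible on tiny graphs.
    """
    total = 0.0

    def dfs(node, visited, length):
        nonlocal total
        if node == t:
            total += beta ** length      # longer paths contribute exponentially less
            return
        for nxt in out_edges.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                dfs(nxt, visited, length + 1)
                visited.remove(nxt)

    dfs(r, {r}, 0)
    return total


# Hypothetical graph: one direct edge and two 2-step routes from r to t.
G = {"r": ["t", "a", "b"], "a": ["t"], "b": ["t"]}
print(weighted_path_importance(G, "r", "t", beta=0.5))  # 0.5 + 2 * 0.25 = 1.0
```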

13 How to Choose P(r, t)?
• Path examples: graphs (a) and (b)
• Shortest paths from R to T: {R-C-T, R-D-T}, which fail to capture much of the connectivity from R to T

14 Shortest Path
• E.g., transporting cargo from r to t
• The shortest path doesn't always give a good approximation of importance, e.g., on the Web (graph b)
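A sketch of the shortest-path choice of P(r, t) for unweighted digraphs. It runs a breadth-first search that records every shortest-path predecessor, then backtracks from t. The function name and the example graph are hypothetical. The graph is built so that its shortest paths mirror the slide's {R-C-T, R-D-T}, while a longer route R-E-F-T is ignored entirely.

```python
from collections import deque


def all_shortest_paths(out_edges, r, t):
    """Every shortest path from r to t in an unweighted digraph ([] if none)."""
    dist = {r: 0}
    preds = {r: []}
    queue = deque([r])
    while queue:
        u = queue.popleft()
        for v in out_edges.get(u, []):
            if v not in dist:                 # first time v is reached
                dist[v] = dist[u] + 1
                preds[v] = [u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:      # another shortest way into v
                preds[v].append(u)
    if t not in dist:
        return []

    paths = []

    def backtrack(node, suffix):
        if node == r:
            paths.append([r] + suffix)
            return
        for p in preds[node]:
            backtrack(p, [node] + suffix)

    backtrack(t, [])
    return paths


G = {"R": ["C", "D", "E"], "C": ["T"], "D": ["T"], "E": ["F"], "F": ["T"]}
print(all_shortest_paths(G, "R", "T"))
```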

15 k-Short Paths
• Use the K shortest paths from r to t
• Idea: paths longer than the shortest ones can also be important and should be taken into account
• Fixes the problem of ignoring longer, important paths when using shortest paths only, e.g., graph (b) with 3-short paths
• Problem: ignores capacity constraints, e.g., in a network topology
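A sketch of k-short paths. It assumes "k-short paths" means the k shortest simple paths in an unweighted digraph. Partial paths are expanded in breadth-first order, so complete paths appear in non-decreasing length. The result then feeds the weighted-paths score with coefficient `beta`. Function names and the example graph are hypothetical.

```python
from collections import deque


def k_short_paths(out_edges, r, t, k):
    """Return up to k shortest simple paths from r to t in an unweighted digraph.

    Partial paths are expanded breadth-first, so complete paths come out in
    non-decreasing length. Exponential in the worst case: small graphs only.
    """
    found = []
    queue = deque([[r]])
    while queue and len(found) < k:
        path = queue.popleft()
        if path[-1] == t:
            found.append(path)
            continue
        for nxt in out_edges.get(path[-1], []):
            if nxt not in path:            # keep paths simple
                queue.append(path + [nxt])
    return found


def k_short_importance(out_edges, r, t, k, beta):
    """Weighted-paths importance restricted to the k shortest paths."""
    return sum(beta ** (len(p) - 1) for p in k_short_paths(out_edges, r, t, k))


G = {"R": ["C", "D", "E"], "C": ["T"], "D": ["T"], "E": ["F"], "F": ["T"]}
print(k_short_paths(G, "R", "T", 3))
print(k_short_importance(G, "R", "T", 3, beta=0.5))
```

With k = 3 the longer route R-E-F-T now contributes, which shortest paths alone miss.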

16 k-Short Node-Disjoint Paths
• No node (other than r and t), and hence no edge, is shared between the chosen paths
• Implicitly enforces capacity constraints
• Motivated by 'mass flow', where importance can 'flow' along paths, e.g., graph (b)
• Computed breadth-first with a heuristic, for a chosen K and path-weighting coefficient
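One simple greedy heuristic, assumed here for illustration and not necessarily the one the slides use:

1. Take a breadth-first shortest path.
2. Block its interior nodes, and the direct edge if the path is just r to t.
3. Repeat until k paths are found or t becomes unreachable.

Greedy choices can yield fewer disjoint paths than the true maximum, which would need a max-flow computation. Function names and graphs are hypothetical.

```python
from collections import deque


def bfs_path(out_edges, r, t, blocked_nodes, blocked_edges):
    """One shortest r->t path avoiding the given nodes and edges, or None."""
    parent = {r: None}
    queue = deque([r])
    while queue:
        u = queue.popleft()
        if u == t:
            break
        for v in out_edges.get(u, []):
            if v in parent or v in blocked_nodes or (u, v) in blocked_edges:
                continue
            parent[v] = u
            queue.append(v)
    if t not in parent:
        return None
    path, node = [], t
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def k_short_node_disjoint_paths(out_edges, r, t, k):
    """Greedy heuristic: take a shortest path, block its interior, repeat."""
    if r == t:
        raise ValueError("r and t must differ")
    paths, blocked_nodes, blocked_edges = [], set(), set()
    while len(paths) < k:
        path = bfs_path(out_edges, r, t, blocked_nodes, blocked_edges)
        if path is None:
            break
        paths.append(path)
        blocked_nodes.update(path[1:-1])   # interior nodes may not be reused
        if len(path) == 2:                 # a direct r->t edge has no interior
            blocked_edges.add((r, t))
    return paths


# Hypothetical bottleneck: both routes from R must pass through M.
bottleneck = {"R": ["A", "B"], "A": ["M"], "B": ["M"], "M": ["T"]}
print(k_short_node_disjoint_paths(bottleneck, "R", "T", 3))
```

In the bottleneck graph only one path survives, which is how node-disjointness models limited capacity.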

17 Markov Chains & Relative Importance
• The graph is viewed as a stochastic process
• Explanation of Markov chains: a token traversing the chain…
• A natural fit for modeling the Web

18 Markov Chains & Relative Importance
• Markov centrality
• Mean first passage time $m_{rt} = \sum_{n=1}^{\infty} n \, f_{rt}^{(n)}$: the expected number of steps until first arrival at node t, starting at node r
• $f_{rt}^{(n)}$: probability that the chain, started at r, first reaches state t in exactly n steps (for r = t, first returns to t)

19 Markov Chains & Relative Importance
• Bias toward 'central' nodes
• Computationally expensive
• Time: O(|V|^3) (inversion of the |V| × |V| transition matrix)
• Space: O(|V|^2)
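A sketch of mean first passage times for a random walk that leaves each node by a uniformly chosen out-edge. It deliberately swaps the O(|V|^3) matrix inversion for fixed-point iteration on the first-step equations, $m_{rt} = 1 + \sum_{j \ne t} P_{rj} m_{jt}$. The function names are hypothetical. The code assumes every node has an out-edge and that t is reachable from every node.

```python
def transition_matrix(nodes, out_edges):
    """Row-stochastic P for a random walk: from u, pick an out-neighbour uniformly."""
    index = {v: i for i, v in enumerate(nodes)}
    P = [[0.0] * len(nodes) for _ in nodes]
    for u in nodes:
        succ = out_edges.get(u, [])
        for v in succ:
            P[index[u]][index[v]] = 1.0 / len(succ)
    return P, index


def mean_first_passage_times(P, t, tol=1e-12, max_iter=100_000):
    """m[r] = expected number of steps to first reach state t from state r.

    Solves m_r = 1 + sum_{j != t} P[r][j] * m_j by fixed-point iteration
    instead of inverting a |V| x |V| matrix. m[t] is the mean return time.
    Assumes every node has an out-edge and t is reachable from every node.
    """
    n = len(P)
    m = [0.0] * n
    for _ in range(max_iter):
        new = [1.0 + sum(P[r][j] * m[j] for j in range(n) if j != t) for r in range(n)]
        if max(abs(a - b) for a, b in zip(new, m)) < tol:
            return new
        m = new
    return m


nodes = ["A", "B", "C"]
P, idx = transition_matrix(nodes, {"A": ["B"], "B": ["C"], "C": ["A"]})
m = mean_first_passage_times(P, idx["C"])
print({v: m[idx[v]] for v in nodes})  # on the cycle A->B->C->A: A 2, B 1, C 3
```

Each iteration still costs O(|V|^2), and a separate solve is needed per target t. The slide's scaling concern therefore remains for large graphs.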

20 Markov Chains & Relative Importance
• PageRank
• Uses backlinks to assign importance to web pages

21 Markov Chains & Relative Importance
• PageRank
• Less computationally expensive than Markov centrality
• Converges quickly: the number of iterations grows roughly with log n
• 322 million links processed in 52 iterations
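A minimal power-iteration sketch of PageRank. The choices below are assumptions, not the original paper's exact formulation:

- The damping factor 0.85 is a commonly used value.
- Dangling nodes spread their rank uniformly over all nodes.
- Iteration stops when the L1 change falls below `tol`.

The function also returns the number of iterations used, echoing the iteration counts on the slide.

```python
def pagerank(out_edges, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for PageRank; returns (scores, iterations used).

    damping=0.85 is a commonly used value, assumed here. Dangling nodes
    (no out-links) spread their rank uniformly over all nodes.
    """
    nodes = sorted(set(out_edges) | {v for succ in out_edges.values() for v in succ})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for iteration in range(1, max_iter + 1):
        new = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            succ = out_edges.get(u, [])
            targets = succ if succ else nodes       # dangling node: spread evenly
            share = damping * rank[u] / len(targets)
            for v in targets:                       # u passes rank along its links
                new[v] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)  # L1 change
        rank = new
        if delta < tol:
            break
    return rank, iteration


# Hypothetical web: A and C both link to B; B links back to A.
scores, iters = pagerank({"A": ["B"], "C": ["B"], "B": ["A"]})
print(sorted(scores, key=scores.get, reverse=True), iters)
```

B collects backlinks from both A and C, so it ranks first. C has no backlinks and keeps only the uniform teleport share.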

22 Markov Chains & Relative Importance
• Retrofit PageRank so that all nodes in R receive a uniform bias at the start
• The 'surfer' begins at a root node and traverses the graph, returning to the root set R with probability β at each time step
• I(t|R) = probability that the surfer visits t during a walk
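A sketch of this root-biased PageRank, computed by power iteration on the surfer's visit distribution. The assumptions:

- The back probability `beta` defaults to a hypothetical 0.15.
- Each jump back lands on a uniformly chosen root.
- A dangling node sends the surfer back to R.

The function name is hypothetical.

```python
def pagerank_with_priors(out_edges, roots, beta=0.15, tol=1e-12, max_iter=1000):
    """Random surfer that jumps back to a uniformly chosen root with prob. beta."""
    roots = set(roots)
    nodes = sorted(set(out_edges) | {v for succ in out_edges.values() for v in succ} | roots)
    prior = {v: (1.0 / len(roots) if v in roots else 0.0) for v in nodes}
    score = dict(prior)                              # the surfer starts in R
    for _ in range(max_iter):
        new = {v: beta * prior[v] for v in nodes}    # jump back to R with prob. beta
        for u in nodes:
            succ = out_edges.get(u, [])
            if succ:
                share = (1.0 - beta) * score[u] / len(succ)
                for v in succ:
                    new[v] += share
            else:
                for v in nodes:                      # dangling node: restart from R
                    new[v] += (1.0 - beta) * score[u] * prior[v]
        delta = sum(abs(new[v] - score[v]) for v in nodes)
        score = new
        if delta < tol:
            break
    return score


cycle = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["A"]}
s = pagerank_with_priors(cycle, roots=["A"])
print(sorted(s, key=s.get, reverse=True))  # importance fades with distance from A
```

On a plain cycle every node has the same global PageRank. Biasing the surfer toward R = {A} instead ranks nodes by how close they lie downstream of the root.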

23 Experiments (Simulated Data)

24 Experiments (Simulated Data)
• More complex graph: in- and out-degrees changed
• Shortest path lengths between nodes changed (e.g., A-B)
• In the analysis that follows, R = {A, F}

25 Experiments (Simulated Data)
• HITSPa: A .252, F .241, G .128, C .110, E .099, H .052, D .032, J .025, I .032, B .024
• HITSPh: F .225, A .186, D .162, B .119, E .090, I .067, H .061, J .050, G .028, C .008

26 Experiments (Simulated Data)
• MarkovC: J .180, C .133, G .130, H .129, E .111, I .101, F .069, D .051, A .047, B .044
• KSMarkov: H .146, G .142, E .142, J .140, C .120, I .098, F .087, D .061, A .034, B .024

27 Experiments (9/11 Terrorist Network)
• 63 nodes (terrorists)
• 308 edges (interactions)

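28 Experiments (9/11 Terrorist Network), cont'd.
Rank | PRankP       | HITSP        | WKPaths    | MarkovC   | KSMarkov
1    | Khemais      |              | Beghal     | Atta      | Khemais
2    | Beghal       |              | Khemais    | Al-Shehhi | Beghal
3    | Moussaoui    | Atta         | Moussaoui  | Al-Shibh  | Moussaoui
4    | Maaroufi     | Moussaoui    | Maaroufi   | Moussaoui | Maaroufi
5    | Qatada       | Maaroufi     | Bensakhria | Jarrah    | Qatada
6    | Daoudi       | Qatada       | Daoudi     | Hanjour   | Daoudi
7    | Courtaillier | Bensakhria   | Qatada     | Al-Omari  | Bensakhria
8    |              | Daoudi       | Walid      | Khemais   | Courtaillier
9    | Walid        | Courtaillier |            | Qatada    | Walid
10   | Khammoun     |              |            | Bahaji    | Khammoun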

29 Conclusion
• Provides a first step toward addressing 'relative importance'
• Scaling can be an issue for algorithms based on Markov chains, such as Markov centrality
• Using different algorithms and comparing their results can reveal interesting information
• …Paper Analysis…

30 References
• White, Smyth. Algorithms for Estimating Relative Importance in Networks. SIGKDD '03.
• Page, Brin, Motwani, Winograd. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report, Computer Science Department, Stanford University.
• Wikipedia: "Markov chain".

31 Weather Markov Chain Example

32 Markov Chain Steady State
• The further ahead the prediction, the less it depends on the current state: predictions converge on a steady state
• We'll skip the proof in the interest of time…
• Transition probabilities are derived from experimental data
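A small sketch of the steady-state behaviour using a two-state weather chain (sunny, rainy). The transition probabilities are hypothetical, not taken from the slide's figure. Repeatedly applying the one-step prediction drives the forecast to the same distribution whatever today's weather is. The function names are hypothetical.

```python
def step(dist, P):
    """One-step prediction: dist_{k+1}[j] = sum_i dist_k[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def steady_state(P, start, tol=1e-12, max_iter=10_000):
    """Iterate predictions until they stop changing (the steady state)."""
    dist = list(start)
    for _ in range(max_iter):
        new = step(dist, P)
        if max(abs(a - b) for a, b in zip(new, dist)) < tol:
            return new
        dist = new
    return dist


# Hypothetical transition probabilities (rows: today, columns: tomorrow).
P = [[0.9, 0.1],   # sunny -> sunny, sunny -> rainy
     [0.5, 0.5]]   # rainy -> sunny, rainy -> rainy
today = [1.0, 0.0]  # it is sunny today
print(step(today, P))              # tomorrow
print(step(step(today, P), P))     # the day after
print(steady_state(P, today))      # long-run forecast
```

For this matrix the steady state solves π = πP, giving π = (5/6, 1/6). Starting from a rainy day converges to the same vector.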