PRSim: Sublinear Time SimRank Computation on Large Power-Law Graphs

Presentation transcript:

PRSim: Sublinear Time SimRank Computation on Large Power-Law Graphs
Zhewei Wei, Xiaodong He, Xiaokui Xiao, Sibo Wang, Yu Liu, Xiaoyong Du, and Ji-Rong Wen
Contact: zhewei@ruc.edu.cn

Problems:

Graph data: Assume a directed graph $G=(V,E)$ with $n$ nodes and $m$ edges.

SimRank: Two objects are similar if they are referenced by similar objects, and an object is most similar to itself.

$\sqrt{c}$-walk: at each step, the walk terminates with probability $1-\sqrt{c}$ and moves to a random in-neighbor with probability $\sqrt{c}$.

SimRank as a meeting probability: $s(u,v) = \Pr[\text{two } \sqrt{c}\text{-walks from } u \text{ and } v \text{ meet at the same step}]$

Applications: Web Mining [Jin01]; Social Network Analysis [Liben-Nowell07]; Spam Detection [Spirin11].

Objectives:
Single-source query: given a source node $u$, return the SimRank $s(u,v)$ for every node $v$.
Top-k query: return the nodes $v_1, \ldots, v_k$ with the highest SimRank.
Allow a predetermined error $\varepsilon$.

Motivations:

Motivation 1: Linear Query Time
Sublinear query time is not possible on worst-case graphs.
Can we achieve sublinearity on real-world graphs?

Motivation 2: SimRank vs. Graph Structure
The performance of existing SimRank algorithms varies on graphs with similar numbers of nodes and edges.
How does graph structure affect SimRank algorithms?

Power-Law Graphs:
Fraction of nodes with degree $k$: $P(k) \sim k^{-\gamma}$, $\gamma > 1$.

Our Results:

Taxonomy
Sampling-based algorithm

PRSim Algorithm

High-level ideas:

$$s(u,v) = \frac{1}{(1-\sqrt{c})^{2}} \sum_{l=0}^{\infty} \sum_{w \in V} \pi_l(u,w)\, \pi_l(v,w)\, \eta(w)$$

Sort adjacency lists according to in-degrees.
Reversely sample backward walks.
Backward search on hub nodes (with large PageRanks) to build the index.

Experiments:

Datasets and methods:
Competitors: READS, SLING and TSF, the state-of-the-art index-based methods; ProbeSim and TopSim, the state-of-the-art index-free methods.

Results on Real-World Graphs:
Outperforms competitors by at least one order of magnitude.

Experiments on Synthetic Graphs:
All query costs depend inversely on the power-law exponent $\gamma$.

Codes: www.weizhewei.com
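
The walk-based definition of SimRank above can be illustrated with a plain Monte Carlo estimate. The sketch below is not PRSim itself (it uses no index, no backward search, and is not sublinear); it only simulates two walks in lockstep and counts how often they meet. It assumes each walk continues with probability sqrt(c), so both continue with probability c, and that a walk stops when it reaches a node with no in-neighbors. The function names, the dictionary-of-lists graph layout, and the default c = 0.6 are illustrative choices, not taken from the poster.

```python
import math
import random


def walks_meet(in_nbrs, u, v, c, rng):
    """Simulate two walks in lockstep; return True if they are ever
    at the same node at the same step (step 0 included)."""
    p = math.sqrt(c)  # per-step continuation probability of one walk
    while True:
        if u == v:
            return True
        # Each walk independently terminates with probability 1 - sqrt(c).
        if rng.random() >= p or rng.random() >= p:
            return False
        # Assumption: a walk at a node without in-neighbors stops.
        if not in_nbrs[u] or not in_nbrs[v]:
            return False
        u = rng.choice(in_nbrs[u])
        v = rng.choice(in_nbrs[v])


def estimate_simrank(in_nbrs, u, v, c=0.6, samples=20000, seed=0):
    """Estimate s(u, v) as the fraction of sampled walk pairs that meet."""
    rng = random.Random(seed)
    hits = sum(walks_meet(in_nbrs, u, v, c, rng) for _ in range(samples))
    return hits / samples


if __name__ == "__main__":
    # Toy graph: node 0 points to nodes 1 and 2, so 0 is their only in-neighbor.
    in_nbrs = {0: [], 1: [0], 2: [0]}
    print(estimate_simrank(in_nbrs, 1, 2))
```

On this toy graph the two walks from nodes 1 and 2 meet exactly when both take their first step, so the estimate should be close to c. Each sample costs time proportional to the walk length, and a single-source query done this way needs one batch of samples per candidate node, which is the linear cost the poster's first motivation refers to.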