Fast Algorithms for Top-k Personalized PageRank Queries Manish Gupta Amit Pathak Dr. Soumen Chakrabarti IIT Bombay.

Problem: PageRank for ER graph queries
- Example query: find the top-k experts from industry to review a submitted paper p under the category “Information Systems”.
- Low index size, low query time.
- 200–1600× faster than whole-graph PageRank (top-k ranking contributes 4×).
- 10–20% smaller index; accuracy comparable to ObjectRank.
- Extension to handle hard predicates.

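Notations
- Graph $G = (V, E)$ with edges $(u, v) \in E$.
- Conductance $C(v, u)$ such that $\sum_v C(v, u) = 1$.
- Teleport probability $1 - \alpha$ and teleport vector $r$, with $\sum_v r(v) = 1$.
- The Personalized PageRank [5] (PPR) vector for teleport vector $r$ is $\mathrm{PPV}_r = p_r = \alpha C p_r + (1 - \alpha) r = (1 - \alpha)(I - \alpha C)^{-1} r$.
- For a node $v$, setting $r(v) = 1$ gives its PPV, written $\mathrm{PPV}_v$.
- $H$ is the hubset; sloppyTopK varies in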
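The fixed-point form of the PPV definition can be checked numerically. The following is a minimal sketch, not the authors' implementation: it iterates $p \leftarrow \alpha C p + (1 - \alpha) r$ on a hypothetical 3-node graph. The function name, the graph, and the choice $\alpha = 0.8$ are assumptions made for illustration.

```python
def personalized_pagerank(C, r, alpha=0.8, iters=200):
    """Iterate p <- alpha * C * p + (1 - alpha) * r.

    C[v][u] is the conductance from u to v, so each column of C sums to 1.
    r is the teleport vector and sums to 1.
    """
    n = len(r)
    p = list(r)
    for _ in range(iters):
        p = [
            alpha * sum(C[v][u] * p[u] for u in range(n)) + (1 - alpha) * r[v]
            for v in range(n)
        ]
    return p


# Hypothetical graph (an assumption): edges 0->1, 0->2, 1->2, 2->0.
C = [
    [0.0, 0.0, 1.0],
    [0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0],
]
r = [1.0, 0.0, 0.0]  # r(0) = 1, so the result is PPV_0
ppv0 = personalized_pagerank(C, r)
```

Because every column of C sums to 1 and r sums to 1, each iterate also sums to 1, which makes a convenient sanity check.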

Previous work
- ObjectRank [1]
  - Graph proximity queries are modeled as authority flow originating from match nodes.
  - It requires pre-computation of all word PPVs.
- Asynchronous Weight-Pushing Algorithm (BCA) [2]
- HubRank [4]
  - Based on Personalized PageRank [5] and BCA [2].
  - Proposes a hubset selection model.
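For readers unfamiliar with weight pushing, the idea behind BCA [2] can be sketched as follows. This is a simplified illustration and not the HubRank code: it keeps a residual q and an estimate p_hat, and each push moves a $(1 - \alpha)$ share of a node's residual into its estimate and spreads the remaining $\alpha$ share over its out-neighbours. The function name, the stopping threshold `eps`, the largest-residual-first order, and the example graph are assumptions; hubset handling is omitted.

```python
def bca_push(out_edges, r, alpha=0.8, eps=1e-10):
    """Approximate the PPV for teleport vector r by repeated pushes.

    out_edges[u] is a list of (v, C(v, u)) pairs whose weights sum to 1.
    Returns (p_hat, q): the score estimates and the leftover residual.
    """
    p_hat = {u: 0.0 for u in out_edges}
    q = dict(r)  # residual mass not yet pushed
    while q:
        u = max(q, key=q.get)  # push the largest residual first
        if q[u] < eps:
            break
        mass = q.pop(u)
        p_hat[u] += (1 - alpha) * mass      # share retained at u
        for v, c in out_edges[u]:           # share forwarded to neighbours
            q[v] = q.get(v, 0.0) + alpha * mass * c
    return p_hat, q


# Hypothetical graph (an assumption): edges 0->1, 0->2, 1->2, 2->0.
out_edges = {0: [(1, 0.5), (2, 0.5)], 1: [(2, 1.0)], 2: [(0, 1.0)]}
p_hat, q = bca_push(out_edges, {0: 1.0})
```

Estimated plus residual mass always totals 1, so the leftover residual measures how far the estimates can still move.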

Basic top-k Framework
- For most applications, top-k answers are sufficient.
- Proposition 1: At any time, for all nodes u,

Basic top-k Framework
- If $u_1, u_2, \ldots$ are the nodes sorted in non-increasing order of their scores, then $u_1, u_2, \ldots, u_k$ are the best k answer nodes iff
- Sloppy top-k: half of the queries terminate via the top-K quit check and at k=K* near
- Proposition 2: At any time, for all nodes u,
  - Need to maintain lower and upper bounds separately.
- Proposition 3: At any time, for all nodes u,
  - Needs less book-keeping; 6% less query time; more queries quit earlier at lower K*.
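A quit check of this kind can be sketched as below. The transcript does not preserve the proposition formulas, so the bound used here is an assumption: the standard push-style bound $\hat{p}(u) \le p(u) \le \hat{p}(u) + \lVert q \rVert_1$, where $\hat{p}$ is the current estimate and $q$ the residual. Under that bound the top-k set is certain once the k-th largest estimate is at least the (k+1)-th largest estimate plus the residual mass. The function name and the example scores are hypothetical.

```python
def certified_top_k(p_hat, residual_mass, k):
    """Return the top-k nodes if their membership is already certain, else None.

    p_hat must hold an estimate for every node (0.0 if never reached).
    Assumed bounds: p_hat[u] <= p(u) <= p_hat[u] + residual_mass.
    """
    ranked = sorted(p_hat.items(), key=lambda item: item[1], reverse=True)
    if k >= len(ranked):
        return [u for u, _ in ranked]
    kth_lower = ranked[k - 1][1]                 # weakest score inside the top k
    next_upper = ranked[k][1] + residual_mass    # best case for any outsider
    if kth_lower >= next_upper:
        return [u for u, _ in ranked[:k]]
    return None


# Hypothetical estimates (an assumption), checked at two residual levels.
scores = {"a": 0.5, "b": 0.3, "c": 0.1}
early = certified_top_k(scores, residual_mass=0.30, k=2)   # too much residual
late = certified_top_k(scores, residual_mass=0.05, k=2)    # safe to quit
```

This certifies which nodes are in the top k, not their relative order, so a strict ranking would need a pairwise check between adjacent nodes as well.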

Hard Predicates
- Example query: find top-k papers related to XML published in 2008.
- Target nodes (nodes that strictly satisfy the hard predicates) are returned as answer nodes.
- Two approaches:
  - a. naiveTopk: a modified “basic top-k for soft predicate queries”, in which a node is considered for heap M only if it belongs to the target set.
  - b. Node-deletion algorithm: there is no need to rank non-target nodes, so delete non-target nodes while executing push.
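The filtering step of naiveTopk can be illustrated with a bounded min-heap. This is only a sketch of the target-set restriction on heap M, assuming scores are already available in a dictionary; the real algorithm interleaves this with pushes and quit checks. The function name and the example data are hypothetical.

```python
import heapq


def naive_top_k(p_hat, targets, k):
    """Keep the k best-scoring nodes, admitting only nodes in the target set."""
    heap = []  # min-heap M of (score, node), never larger than k
    for node, score in p_hat.items():
        if node not in targets:
            continue  # non-target nodes are never candidates
        if len(heap) < k:
            heapq.heappush(heap, (score, node))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, node))  # evict the current minimum
    return [node for score, node in sorted(heap, reverse=True)]


# Hypothetical scores; p1 scores highest but fails the hard predicate.
p_hat = {"p1": 0.4, "p2": 0.3, "p3": 0.2, "p4": 0.1}
targets = {"p2", "p3", "p4"}
answer = naive_top_k(p_hat, targets, k=2)
```

Non-target nodes still receive and forward score under this approach, which is the work the node-deletion alternative tries to avoid.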

Node Deletion Algorithm
- Add a special sink node $s$ with a self-loop of $C(s, s) = 1$.
- Delete a node $u$ from graph $G$ to create $G' = (V', E')$ such that, for any teleport vector $r'$ of size $|V'| \times 1$ over $G'$, $p'_{r'}(v) = p_r(v)$ for all nodes $v \in V' \setminus \{s\}$, where $p'_{r'}(v)$ is computed over $G'$, $r(v) = r'(v)$ for $v \in V'$, and $r(v) = 0$ for
- What fraction of $q(v)$ reaches $w$ on the path $v \to u \to w$?
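One rewiring that satisfies the stated invariant can be derived by substituting $p(u) = \alpha \sum_v C(u, v) p(v)$ into the PPR equation for the other nodes. The sketch below is such a derivation under explicit assumptions (the deleted node has no self-loop and zero teleport mass) and is not necessarily the slide's own formula, which the transcript does not preserve. Each path $v \to u \to w$ becomes a direct edge carrying $\alpha C(u, v) C(w, u)$, and the $(1 - \alpha) C(u, v)$ that $u$ would have retained is sent to the sink. All names and the example graph are hypothetical.

```python
def ppr(out, r, alpha, iters=300):
    """Power iteration; out[u] maps each successor v to C(v, u)."""
    p = {v: r.get(v, 0.0) for v in out}
    for _ in range(iters):
        nxt = {v: (1 - alpha) * r.get(v, 0.0) for v in out}
        for u, nbrs in out.items():
            for v, c in nbrs.items():
                nxt[v] += alpha * c * p[u]
        p = nxt
    return p


def delete_node(out, u, sink, alpha):
    """Remove u, rerouting each path v -> u -> w as a direct edge v -> w."""
    assert u not in out[u], "sketch assumes the deleted node has no self-loop"
    new_out = {}
    for v, nbrs in out.items():
        if v == u:
            continue
        row = {w: c for w, c in nbrs.items() if w != u}
        c_uv = nbrs.get(u, 0.0)
        if c_uv:
            for w, c_wu in out[u].items():
                # fraction of v's mass that reaches w through u
                row[w] = row.get(w, 0.0) + alpha * c_uv * c_wu
            # mass u would have kept for itself is parked at the sink
            row[sink] = row.get(sink, 0.0) + (1 - alpha) * c_uv
        new_out[v] = row
    return new_out


alpha = 0.8
# Hypothetical graph: a->u, a->b, u->b, b->a, plus the sink s with a self-loop.
G = {"a": {"u": 0.5, "b": 0.5}, "u": {"b": 1.0}, "b": {"a": 1.0}, "s": {"s": 1.0}}
r = {"a": 1.0}
G2 = delete_node(G, "u", "s", alpha)
p_old = ppr(G, r, alpha)
p_new = ppr(G2, r, alpha)
```

In this example the scores of a and b are the same before and after deletion, while the sink absorbs the score that u used to hold.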

Ranking only target nodes (Delete-Push)
- Deleting a non-target node avoids further pushes from it and so saves work, but it can bloat the number of edges.
- Victim selection
  - Block structure [6] in social network graphs.
  - Indegree and outdegree of nodes in the graph follow a power law [3].
  - Aggressive approach: delete all non-target nodes.
  - Simple non-aggressive approach: local search from node u, deleting non-target, non-hubset out-neighbours of u if doing so doesn't bloat the number of edges.

Experiments
- 1994 snapshot of CITESEER corpus has nodes and edges
- Lucene text indices - 55MB
- 1.9M CITESEER queries; = [20, 40]
- Naive one-shot Hubset [4] of size
- % time invested in quit checks results in a 4× speed boost

Experiments
- Target set size was varied by using different hard predicates on publication years.
- DeletePush works better when the target set sizes are not too large.

References
[1] A. Balmin, V. Hristidis, and Y. Papakonstantinou. ObjectRank: Authority-based keyword search in databases. In VLDB, pages 564–575,
[2] P. Berkhin. Bookmark-coloring approach to personalized PageRank computing. Internet Mathematics, 3(1):41–62, Jan
[3] A. Z. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. L. Wiener. Graph structure in the web. Computer Networks, 33(1-6):309–320,
[4] S. Chakrabarti. Dynamic personalized PageRank in entity-relation graphs. In WWW, Banff, May
[5] G. Jeh and J. Widom. Scaling personalized web search. In WWW Conference, pages 271–279,
[6] S. D. Kamvar, T. H. Haveliwala, C. D. Manning, and G. H. Golub. Exploiting the block structure of the web for computing, Mar

Questions? Thanks for your time and attention!