RoundTripRank Graph-based Proximity with Importance and Speciﬁcity Yuan FangUniv. of Illinois at Urbana-Champaign Kevin C.-C. ChangUniv. of Illinois at.

Slides:

Advertisements

Similar presentations

CSE594 Fall 2009 Jennifer Wong Oct. 14, 2009

Advertisements

Fast Algorithms for Top-k Personalized PageRank Queries Manish Gupta Amit Pathak Dr. Soumen Chakrabarti IIT Bombay.

Processing XML Keyword Search by Constructing Effective Structured Queries Jianxin Li, Chengfei Liu, Rui Zhou and Bo Ning Swinburne University of Technology,

Efficient IR-Style Keyword Search over Relational Databases Vagelis Hristidis University of California, San Diego Luis Gravano Columbia University Yannis.

Crawling, Ranking and Indexing. Organizing the Web The Web is big. Really big. –Over 3 billion pages, just in the indexable Web The Web is dynamic Problems:

1 The PageRank Citation Ranking: Bring Order to the web Lawrence Page, Sergey Brin, Rajeev Motwani and Terry Winograd Presented by Fei Li.

Case Study: BibFinder BibFinder: A popular CS bibliographic mediator –Integrating 8 online sources: DBLP, ACM DL, ACM Guide, IEEE Xplore, ScienceDirect,

Mining Top-K Large Structural Patterns in a Massive Network Feida Zhu 1, Qiang Qu 2, David Lo 1, Xifeng Yan 3, Jiawei Han 4, and Philip S. Yu 5 1 Singapore.

DATA MINING LECTURE 12 Link Analysis Ranking Random walks.

Evaluating Search Engine

ON LINK-BASED SIMILARITY JOIN A joint work with: Liwen Sun, Xiang Li, David Cheung (University of Hong Kong) Jiawei Han (University of Illinois Urbana.

Search Engines and Information Retrieval

Fast Query Execution for Retrieval Models based on Path Constrained Random Walks Ni Lao, William W. Cohen Carnegie Mellon University

CSE 522 – Algorithmic and Economic Aspects of the Internet Instructors: Nicole Immorlica Mohammad Mahdian.

1 CS 430 / INFO 430: Information Retrieval Lecture 16 Web Search 2.

Distributed PageRank Computation Based on Iterative Aggregation- Disaggregation Methods Yangbo Zhu, Shaozhi Ye and Xing Li Tsinghua University, Beijing,

Zdravko Markov and Daniel T. Larose, Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage, Wiley, Slides for Chapter 1:

Statistical Relational Learning for Link Prediction Alexandrin Popescul and Lyle H. Unger Presented by Ron Bjarnason 11 November 2003.

SCS CMU Proximity Tracking on Time- Evolving Bipartite Graphs Speaker: Hanghang Tong Joint Work with Spiros Papadimitriou, Philip S. Yu, Christos Faloutsos.

Scaling Personalized Web Search Glen Jeh, Jennfier Widom Stanford University Presented by Li-Tal Mashiach Search Engine Technology course (236620) Technion.

Keyword Proximity Search on XML Graphs Vagelis Hristidis Yannis Papakonstatinou Andrey Presenter: Feng Shao.

1 Extending Link-based Algorithms for Similar Web Pages with Neighborhood Structure Allen, Zhenjiang LIN CSE, CUHK 13 Dec 2006.

1 Fast Dynamic Reranking in Large Graphs Purnamrita Sarkar Andrew Moore.

1 Fast Incremental Proximity Search in Large Graphs Purnamrita Sarkar Andrew W. Moore Amit Prakash.

1 PageSim: A Link-based Similarity Measure for the World Wide Web Zhenjiang Lin, Irwin King, and Michael, R., Lyu Computer Science & Engineering, The Chinese.

Algorithms for Data Mining and Querying with Graphs Investigators: Padhraic Smyth, Sharad Mehrotra University of California, Irvine Students: Joshua O’

Chapter 5: Information Retrieval and Web Search

1 A Topic Modeling Approach and its Integration into the Random Walk Framework for Academic Search 1 Jie Tang, 2 Ruoming Jin, and 1 Jing Zhang 1 Knowledge.

Chapter 4 Research UP B Class.

Fast Algorithms for Top-k Personalized PageRank Queries Manish Gupta Amit Pathak Dr. Soumen Chakrabarti IIT Bombay.

Link Recommendation In P2P Social Networks Yusuf Aytaş, Hakan Ferhatosmanoğlu, Özgür Ulusoy Bilkent University, Ankara, Turkey.

Divide and Conquer: Challenges in Scaling Federated Search Presented by Abe Lederman, President and CTO Deep Web Technologies, LLC SearchEngine Meeting.

Search Engines and Information Retrieval Chapter 1.

PageRank for Product Image Search Kevin Jing (Googlc IncGVU, College of Computing, Georgia Institute of Technology) Shumeet Baluja (Google Inc.) WWW 2008.

XML as a Boxwood Data Structure Feng Zhou, John MacCormick, Lidong Zhou, Nick Murphy, Chandu Thekkath 8/20/04.

Using Hyperlink structure information for web search.

UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.

1 University of Qom Information Retrieval Course Web Search (Link Analysis) Based on:

1 Discovering Authorities in Question Answer Communities by Using Link Analysis Pawel Jurczyk, Eugene Agichtein (CIKM 2007)

Diversified Top-k Graph Pattern Matching 1 Yinghui Wu UC Santa Barbara Wenfei Fan University of Edinburgh Southwest Jiaotong University Xin Wang.

COM1721: Freshman Honors Seminar A Random Walk Through Computing Lecture 2: Structure of the Web October 1, 2002.

EntityRank :Searching Entities Directly and Holistically Tao Cheng, Xifeng Yan, Kevin Chen-Chuan Chang Computer Science Department, University of Illinois.

Chapter 6: Information Retrieval and Web Search

Efficient Instant-Fuzzy Search with Proximity Ranking Authors: Inci Centidil, Jamshid Esmaelnezhad, Taewoo Kim, and Chen Li IDCE Conference 2014 Presented.

Query Suggestion Naama Kraus Slides are based on the papers: Baeza-Yates, Hurtado, Mendoza, Improving search engines by query clustering Boldi, Bonchi,

Xiaowei Ying, Xintao Wu Univ. of North Carolina at Charlotte PAKDD-09 April 28, Bangkok, Thailand On Link Privacy in Randomizing Social Networks.

Ranking objects based on relationships Computing Top-K over Aggregation Sigmod 2006 Kaushik Chakrabarti et al.

VLDB2005 CMS-ToPSS: Efficient Dissemination of RSS Documents Milenko Petrovic Haifeng Liu Hans-Arno Jacobsen University of Toronto.

1 1 COMP5331: Knowledge Discovery and Data Mining Acknowledgement: Slides modified based on the slides provided by Lawrence Page, Sergey Brin, Rajeev Motwani.

Page 1 PathSim: Meta Path-Based Top-K Similarity Search in Heterogeneous Information Networks Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi.

1 Authors: Glen Jeh, Jennifer Widom (Stanford University) KDD, 2002 Presented by: Yuchen Bian SimRank: a measure of structural-context similarity.

1 Approximate XML Query Answers Presenter: Hongyu Guo Authors: N. polyzotis, M. Garofalakis, Y. Ioannidis.

Panther: Fast Top-k Similarity Search in Large Networks JING ZHANG, JIE TANG, CONG MA, HANGHANG TONG, YU JING, AND JUANZI LI Presented by Moumita Chanda.

“In the beginning -- before Google -- a darkness was upon the land.” Joel Achenbach Washington Post.

1 Entity Search Engine: Towards Agile Best-Effort Information Integration over the Web Tao Cheng, Kevin Chang University Of Illinois, Urbana-Champaign.

CSE 450 – Web Mining Seminar Professor Brian D. Davison Fall 2005 A PRESENTATION on What is this Page Known for? Computing Web Page Reputations D. Rafiei.

Speaker : Yu-Hui Chen Authors : Dinuka A. Soysa, Denis Guangyin Chen, Oscar C. Au, and Amine Bermak From : 2013 IEEE Symposium on Computational Intelligence.

Project funded by the Future and Emerging Technologies arm of the IST Programme Search in Unstructured Networks Niloy Ganguly, Andreas Deutsch Center for.

Making Holistic Schema Matching Robust: An Ensemble Approach Bin He Joint work with: Kevin Chen-Chuan Chang Univ. Illinois at Urbana-Champaign.

Meta-Path-Based Ranking with Pseudo Relevance Feedback on Heterogeneous Graph for Citation Recommendation By: Xiaozhong Liu, Yingying Yu, Chun Guo, Yizhou.

Distributed Caching and Adaptive Search in Multilayer P2P Networks Chen Wang, Li Xiao, Yunhao Liu, Pei Zheng The 24th International Conference on Distributed.

1 CS 430 / INFO 430: Information Retrieval Lecture 20 Web Search 2.

Recommendation in Scholarly Big Data

Evaluation Anisio Lacerda.

Link Prediction Seminar Social Media Mining University UC3M

Probably Approximately

Improved Algorithms for Topic Distillation in a Hyperlinked Environment (ACM SIGIR ‘98) Ruey-Lung, Hsiao Nov 23, 2000.

Jiawei Han Department of Computer Science

CMNS 110: Term paper research

Introduction to Information Retrieval

Presentation transcript:

RoundTripRank Graph-based Proximity with Importance and Speciﬁcity Yuan FangUniv. of Illinois at Urbana-Champaign Kevin C.-C. ChangUniv. of Illinois at Urbana-Champaign Hady W. LauwSingapore Management University ICDE Brisbane, Australia April 10, 2013 arise.adsc.com.sg

Graph-based proximity has many applications with different ranking needs 2 paper 1 paper 2 VLDB “data” “spatio” author 1 paper 3 “xml” ICDE author 2 Citation graph Social network Query log graph “pear” “apple” “apple inc” “temporal” ? ? ? ?

Although various applications involve different needs, ranking by existing graph proximity is limited 3 SIGMOD Intl Conference VLDB Intl Conference ICDE Intl Conference Looks reasonable? What’s missing? “spatio”, “temporal”, “data” favor very popular or important venues Query Matching venues by P-PageRank only categorically related as data topics “schema”, “matching”?

Other venues are needed for different purposes 4 Spatio-Temporal Databases Springer Book Spatio-Temporal Data Mining Intl Workshop Temporal Aspects in Information Systems Working Conference More specific venues? “spatio”, “temporal”, “data” Query VLDB Intl Conference Spatio-Temporal Databases Springer Book ACM SIGSPATIAL/GIS Intl Conference A balanced mixture of venues? important specific balanced quick background studyreport preliminary results

Specificity has been traditionally ignored 5 Closeness Common neighbor Jaccard coefﬁcient [Jaccard1901] AdamicAdar [Adamic2003] Hitting time Escape probability [Koren2006, Tong2007] SimRank [Jeh2002] Reachability Ad-hoc Katz [Katz1953] Semantics Methodology Specificity InvObjectRank Inverse global ObjectRank Inverse node degree [Hristidis2008] Importance P-PageRank [Page1999] ObjectRank [Balmin2004] PopRank [Nie2005]

Applications require varying degrees of trade- off between importance and specificity 6 Observation 1 Most Tasks Require Both Importance and Specificity. Observation 2 The Desirable Trade-off Varies from Task to Task. Finding a Reviewer Overly important: maybe too broad, unaware of details Overly specific: maybe a student, lack authoritativeness Choosing a Venue (to submit best work) important conferences ++ (to build background) specific book chapters ++ Purpose?

Addressing the two observations is challenging 7 Challenge 1: How do we unify importance & specificity into a single proximity measure? Challenge 2: How do we generalize our unified model to accommodate flexible trade-offs? Generalize random walk based importance to integrate specificity. more importance more specificity Challenge 3: How do we efficiently compute the proximity measure? Real-time search is indispensable.

Challenge 1 How do we unify importance & specificity into a single proximity measure?

Let’s first review reachability-based importance for generalization to specificity 9 paper 1 paper 2 cites published in paper 1 ICDE accepts “citations” or “endorsements”

Generalize importance to specificity based on the same citation analogy 10 paper 6 paper 2 ICDE STDB (book) paper 5 “cache” “spatio” “lock” paper 4 ? ? paper 1 paper 3

Unify forward and backward walks into a round trip for both importance & specificity 11 Random walk: Round trip: Target node: RoundTripRank: querytarget

Challenge 2 How do we generalize our unified model to accommodate flexible trade-offs? Based on the same principle of random walk in a round trip.

Further generalize RoundTripRank using hybrid random surfers of different goals 13 Goal: balance b/w importance and specificity …

Generalize the behaviors of hybrid random surfers for customizable trade-offs 14 … … … balance importance specificity Hybrid Surfers Adjusting Composition RoundTripRank+ SIGSPATIAL/GIS VLDB STDB mostly balanced mostly important mostly specific

Challenge 3 How do we efficiently compute the proximity measure?

Compute RoundTripRank by “divide & conquer” 16 “divide” “conquer”

Compute RoundTripRank by “divide & conquer” 17 “divide” “conquer”

Top-K ranking is more practical & scalable 18 Full ranking [over the entire graph] Top- K ranking [based on a neighborhood]

Branch-and-bound algorithm 19 Neighborhood expansion … Bounds

20 … …

Experiments

Experimental Setup 22 paper 2 author 1 author 2 venue term 1 term 2 paper 1 Bibliographic network (BibNet) phrase 1 phrase 2 phrase 3 URL 1 URL 2 Query log graph (QLog) Graphs Evaluation methodology Reserve nodes with known associations to query Remove those associations from the graph Can a proximity measure still rank those nodes highly? ? ? Hide-and-rediscover

Evaluation Tasks 23 more importance more specificity 1. Find authors of a paper 2. Find venues of a paper 3. Find URL of a phrase 4. Find equivalent phrase

Both importance & specificity are needed % ~ 10% Venues matching “spatio temporal data” importantspecificbalanced NDCG Quantitative evaluation (hide-and-rediscover) F-Rank/PPR dell dell com dell computers T-Rank dell c1295 battery for dell inspiron RoundTripRank dell battery battery for dell inspiron 8000 dell Phrases similar to “dell notebook” importantspecificbalanced F-Rank/PPR dell dell com dell computers T-Rank dell c1295 battery for dell inspiron RoundTripRank dell battery battery for dell inspiron 8000 dell

25 1. Find authors of a paper 4. Find equivalent phrase Find venues of a paper 3. Find URL of a phrase

26 Comparison to non-customizable dual-sensed proximity + 6% ~ 7% NDCG

Our top-K method is efficient & scalable 27 Efficiency Scalability 300 ms x 7.4 x 1.9

28 Importance as “Reachability”  Specificity as “Returnability” “Reachability” + “Returnability”  a Round Trip