Download presentation
Presentation is loading. Please wait.
Published byVictor Carr Modified over 9 years ago
1
RoundTripRank Graph-based Proximity with Importance and Specificity Yuan FangUniv. of Illinois at Urbana-Champaign Kevin C.-C. ChangUniv. of Illinois at Urbana-Champaign Hady W. LauwSingapore Management University ICDE 2013 @ Brisbane, Australia April 10, 2013 arise.adsc.com.sg
2
Graph-based proximity has many applications with different ranking needs 2 paper 1 paper 2 VLDB “data” “spatio” author 1 paper 3 “xml” ICDE author 2 Citation graph Social network Query log graph “pear” “apple” “apple inc” “temporal” ? ? ? ?
3
Although various applications involve different needs, ranking by existing graph proximity is limited 3 SIGMOD Intl Conference VLDB Intl Conference ICDE Intl Conference Looks reasonable? What’s missing? “spatio”, “temporal”, “data” favor very popular or important venues Query Matching venues by P-PageRank only categorically related as data topics “schema”, “matching”?
4
Other venues are needed for different purposes 4 Spatio-Temporal Databases Springer Book Spatio-Temporal Data Mining Intl Workshop Temporal Aspects in Information Systems Working Conference More specific venues? “spatio”, “temporal”, “data” Query VLDB Intl Conference Spatio-Temporal Databases Springer Book ACM SIGSPATIAL/GIS Intl Conference A balanced mixture of venues? important specific balanced quick background studyreport preliminary results
5
Specificity has been traditionally ignored 5 Closeness Common neighbor Jaccard coefficient [Jaccard1901] AdamicAdar [Adamic2003] Hitting time Escape probability [Koren2006, Tong2007] SimRank [Jeh2002] Reachability Ad-hoc Katz [Katz1953] Semantics Methodology Specificity InvObjectRank Inverse global ObjectRank Inverse node degree [Hristidis2008] Importance P-PageRank [Page1999] ObjectRank [Balmin2004] PopRank [Nie2005]
6
Applications require varying degrees of trade- off between importance and specificity 6 Observation 1 Most Tasks Require Both Importance and Specificity. Observation 2 The Desirable Trade-off Varies from Task to Task. Finding a Reviewer Overly important: maybe too broad, unaware of details Overly specific: maybe a student, lack authoritativeness Choosing a Venue (to submit best work) important conferences ++ (to build background) specific book chapters ++ Purpose?
7
Addressing the two observations is challenging 7 Challenge 1: How do we unify importance & specificity into a single proximity measure? Challenge 2: How do we generalize our unified model to accommodate flexible trade-offs? Generalize random walk based importance to integrate specificity. more importance more specificity Challenge 3: How do we efficiently compute the proximity measure? Real-time search is indispensable.
8
Challenge 1 How do we unify importance & specificity into a single proximity measure?
9
Let’s first review reachability-based importance for generalization to specificity 9 paper 1 paper 2 cites published in paper 1 ICDE accepts “citations” or “endorsements”
10
Generalize importance to specificity based on the same citation analogy 10 paper 6 paper 2 ICDE STDB (book) paper 5 “cache” “spatio” “lock” paper 4 ? ? paper 1 paper 3
11
Unify forward and backward walks into a round trip for both importance & specificity 11 Random walk: Round trip: Target node: RoundTripRank: querytarget
12
Challenge 2 How do we generalize our unified model to accommodate flexible trade-offs? Based on the same principle of random walk in a round trip.
13
Further generalize RoundTripRank using hybrid random surfers of different goals 13 Goal: balance b/w importance and specificity …
14
Generalize the behaviors of hybrid random surfers for customizable trade-offs 14 … … … balance importance specificity Hybrid Surfers Adjusting Composition RoundTripRank+ SIGSPATIAL/GIS VLDB STDB mostly balanced mostly important mostly specific
15
Challenge 3 How do we efficiently compute the proximity measure?
16
Compute RoundTripRank by “divide & conquer” 16 “divide” “conquer”
17
Compute RoundTripRank by “divide & conquer” 17 “divide” “conquer”
18
Top-K ranking is more practical & scalable 18 Full ranking [over the entire graph] Top- K ranking [based on a neighborhood]
19
Branch-and-bound algorithm 19 Neighborhood expansion … Bounds
20
20 … …
21
Experiments
22
Experimental Setup 22 paper 2 author 1 author 2 venue term 1 term 2 paper 1 Bibliographic network (BibNet) phrase 1 phrase 2 phrase 3 URL 1 URL 2 Query log graph (QLog) Graphs Evaluation methodology Reserve nodes with known associations to query Remove those associations from the graph Can a proximity measure still rank those nodes highly? ? ? Hide-and-rediscover
23
Evaluation Tasks 23 more importance more specificity 1. Find authors of a paper 2. Find venues of a paper 3. Find URL of a phrase 4. Find equivalent phrase
24
Both importance & specificity are needed 24 + 8% ~ 10% Venues matching “spatio temporal data” importantspecificbalanced NDCG Quantitative evaluation (hide-and-rediscover) F-Rank/PPR dell dell com dell computers T-Rank dell c1295 battery for dell inspiron 8000 312 0068 RoundTripRank dell battery battery for dell inspiron 8000 dell Phrases similar to “dell notebook” importantspecificbalanced F-Rank/PPR dell dell com dell computers T-Rank dell c1295 battery for dell inspiron 8000 312 0068 RoundTripRank dell battery battery for dell inspiron 8000 dell
25
25 1. Find authors of a paper 4. Find equivalent phrase 0 1 1 2. Find venues of a paper 3. Find URL of a phrase
26
26 Comparison to non-customizable dual-sensed proximity + 6% ~ 7% NDCG
27
Our top-K method is efficient & scalable 27 Efficiency Scalability 300 ms x 7.4 x 1.9
28
28 Importance as “Reachability” Specificity as “Returnability” “Reachability” + “Returnability” a Round Trip
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.