1 Exact Top-k Nearest Keyword Search in Large Networks Minhao Jiang†, Ada Wai-Chee Fu‡, Raymond Chi-Wing Wong† † The Hong Kong University of Science and.

Presentation transcript:

1 Exact Top-k Nearest Keyword Search in Large Networks Minhao Jiang†, Ada Wai-Chee Fu‡, Raymond Chi-Wing Wong† † The Hong Kong University of Science and Technology, ‡ The Chinese University of Hong Kong Prepared by Minhao Jiang Presented by Minhao Jiang

2 Motivation
Social network: In DBLP, which researchers study “database” and are closely related to my supervisor?
Road network: In Melbourne, where are the nearest cinemas to my hotel showing “3D” movies?

3 Problem
Given a weighted undirected graph G(V, E) in which each vertex contains a set of keywords, the k-Nearest Keyword search k-NK(q, w, k) asks: what are the k vertices nearest to vertex q that contain keyword w?
e.g. k-NK(v2, w0, 3) = {v2, v0, v6} (example graph: undirected, with unit-weight edges)

4 Outline 1. Existing Algorithms 2. Our Algorithm 3. Experiments 4. Conclusion

5 1.1 Naive Algorithm
Dijkstra-like search: exact, but too slow.
k-NK(v2, w0, 3) = {v2, v0, v6} (optimal solution)
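A minimal sketch of the Dijkstra-like baseline, assuming an adjacency-list graph and a per-vertex keyword set. The function name `knk_dijkstra` and the four-vertex path graph are hypothetical (the example graph from the slide is not reproduced here). The search settles vertices in order of distance and stops once k matching vertices are found.

```python
import heapq

def knk_dijkstra(adj, keywords, q, w, k):
    """Return up to k (distance, vertex) pairs nearest to q whose keywords contain w."""
    dist = {q: 0}
    heap = [(0, q)]
    settled = set()
    result = []
    while heap and len(result) < k:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale heap entry
        settled.add(u)
        if w in keywords.get(u, ()):
            result.append((d, u))  # vertices are settled in non-decreasing distance
        for v, weight in adj.get(u, ()):
            nd = d + weight
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return result

# Hypothetical toy graph: the path a - b - c - d with unit-weight edges.
adj = {
    "a": [("b", 1)],
    "b": [("a", 1), ("c", 1)],
    "c": [("b", 1), ("d", 1)],
    "d": [("c", 1)],
}
keywords = {"a": {"w0"}, "b": set(), "c": {"w0"}, "d": {"w0"}}
print(knk_dijkstra(adj, keywords, "b", "w0", 2))  # [(1, 'a'), (1, 'c')]
```

The cost grows with the number of vertices settled before k matches are found, which is why the slide calls this approach too slow.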

6 1.2 Existing Index-based Algorithms
All existing index-based algorithms are efficient, but do not guarantee the optimal solution.
1. The PMI algorithm (WWW’12) creates its own index (figure omitted). k-NK PMI(v2, w0, 3) = {v2, v1, v0}, which is not correct.
2. The pivot algorithm (VLDB’13) creates its own index (figure omitted). k-NK pivot(v2, w0, 3) = {v2, v6, v1}, which is not correct.
k-NK(v2, w0, 3) = {v2, v0, v6} (optimal solution)

7 2. Our Algorithm
Two-hop labeling index (state-of-the-art distance querying technique [SODA 02, VLDB 13,14, SIGMOD 12,13]) + keyword-aware index (proposed in this paper)
The first index-based exact algorithm that is also efficient

8 2.1 Background Knowledge: Two-hop Labeling Index
1. Each vertex v has a label set L(v) = {(v → x1, d1), (v → x2, d2), (v → x3, d3), …}
2. For any u and v, dist(u, v) = min (d1 + d2), where (v → x, d1) ∈ L(v) and (u → x, d2) ∈ L(u)
e.g. L(v1) = {(v1 → v0, 1), (v1 → v1, 0)}, L(v6) = {(v6 → v0, 2), (v6 → v2, 1), (v6 → v6, 0)}
dist(v1, v6) = 1 + 2 = 3 via the only common pivot v0 (by a linear scan on each of L(v1) and L(v6))
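The distance query can be sketched as a merge scan over two label lists, assuming each list is sorted by pivot name. The function name `two_hop_dist` and the list-of-pairs layout are assumptions made for illustration; the label values are the ones from the slide.

```python
def two_hop_dist(label_u, label_v):
    """Merge-scan two pivot-sorted label lists and return the min of d1 + d2 over common pivots."""
    i = j = 0
    best = float("inf")
    while i < len(label_u) and j < len(label_v):
        pivot_u, du = label_u[i]
        pivot_v, dv = label_v[j]
        if pivot_u == pivot_v:
            best = min(best, du + dv)
            i += 1
            j += 1
        elif pivot_u < pivot_v:
            i += 1
        else:
            j += 1
    return best

# Labels from the slide, stored as (pivot, distance) pairs sorted by pivot.
L = {
    "v1": [("v0", 1), ("v1", 0)],
    "v6": [("v0", 2), ("v2", 1), ("v6", 0)],
}
print(two_hop_dist(L["v1"], L["v6"]))  # 3
```

Each list is read once, so the query costs O(|L(u)| + |L(v)|).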

9 2.2 Forward Search (FS) Component
Step 1: For each vertex vi containing the query keyword w, compute dist(q, vi).
Step 2: Maintain the k vertices vi nearest to q.
Efficient when w is infrequent.
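A rough sketch of the two FS steps, assuming an inverted list from each keyword to the vertices that contain it. The names `dist_2hop`, `forward_search` and `inverted`, and the hand-built labeling of the path a - b - c - d, are hypothetical.

```python
import heapq

def dist_2hop(L, u, v):
    """2-hop distance: min of d1 + d2 over the pivots shared by L[u] and L[v]."""
    pivots_u = dict(L[u])
    return min((pivots_u[x] + d for x, d in L[v] if x in pivots_u),
               default=float("inf"))

def forward_search(L, inverted, q, w, k):
    """Step 1: compute dist(q, v) for every v containing w. Step 2: keep the k nearest."""
    candidates = ((dist_2hop(L, q, v), v) for v in inverted.get(w, ()))
    return heapq.nsmallest(k, candidates)

# Hypothetical 2-hop labels for the unit-weight path a - b - c - d.
L = {
    "a": [("a", 0), ("b", 1)],
    "b": [("b", 0)],
    "c": [("b", 1), ("c", 0)],
    "d": [("b", 2), ("c", 1), ("d", 0)],
}
inverted = {"w0": ["a", "c", "d"]}  # keyword -> vertices containing it
print(forward_search(L, inverted, "b", "w0", 2))  # [(1, 'a'), (1, 'c')]
```

One distance query is issued per vertex containing w, which matches the slide's remark that FS is efficient when w is infrequent.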

Forward Backward Search (FBS) Component
Backward labels: (q → xi, di) ∈ L(q) ⇔ (xi → q, di) ∈ LB(xi)
Step 1: scan each (q → xi, di) in L(q)
Step 2: for each xi
(a) scan (xi → yij, dij) in LB(xi)
(b) find the k shortest (xi → yij, dij) such that yij contains w (by KT index)
(c) maintain the best-known answers (priority queue)
Efficient when w is frequent.
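The FBS steps can be sketched as follows, assuming the backward labels LB are obtained by inverting L and kept sorted by distance. Step 2(b) is done here with a plain linear scan rather than the KT index, and a dictionary plus `heapq.nsmallest` stands in for the priority queue. All names and the toy labeling are hypothetical.

```python
import heapq
from collections import defaultdict

def build_backward(L):
    """Invert L: (y -> x, d) in L[y] becomes (d, y) in LB[x], sorted by ascending d."""
    LB = defaultdict(list)
    for y, entries in L.items():
        for x, d in entries:
            LB[x].append((d, y))
    for x in LB:
        LB[x].sort()
    return LB

def fbs(L, LB, keywords, q, w, k):
    best = {}  # vertex -> best-known distance from q
    for x, dq in L[q]:                    # Step 1: scan L(q)
        found = 0
        for d, y in LB[x]:                # Step 2(a): scan LB(x) in ascending d
            if w in keywords.get(y, ()):  # Step 2(b): k shortest entries containing w
                if dq + d < best.get(y, float("inf")):
                    best[y] = dq + d      # Step 2(c): keep the best-known answers
                found += 1
                if found == k:
                    break
    return heapq.nsmallest(k, ((d, y) for y, d in best.items()))

# Hypothetical 2-hop labels for the unit-weight path a - b - c - d.
L = {
    "a": [("a", 0), ("b", 1)],
    "b": [("b", 0)],
    "c": [("b", 1), ("c", 0)],
    "d": [("b", 2), ("c", 1), ("d", 0)],
}
keywords = {"a": {"w0"}, "b": set(), "c": {"w0"}, "d": {"w0"}}
LB = build_backward(L)
print(fbs(L, LB, keywords, "d", "w0", 2))  # [(0, 'd'), (1, 'c')]
```

Only |L(q)| backward lists are visited regardless of how many vertices contain w, so the work does not grow with the keyword's frequency the way FS does.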

KT index
Step 2(b): for each xi, find the k shortest (xi → yij, dij) in LB(xi) such that yij contains w.
Naive method: O(|LB(xi)|), a linear scan
KT index: O(k log(|LB(xi)|/k))
(1) sort the entries (xi → yij, dij) by dij, and build a binary tree forest
(2) index the keywords of all yij in the entries of LB(xi) by hash value (stored in each tree node)
e.g. the KT index for LB(xi) = {(xi → y0, d0), …, (xi → y12, d12)} (figure omitted)
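One way to picture the idea is a binary tree over the distance-sorted entries in which every node stores the bitwise OR of the keyword hash signatures beneath it, so whole subtrees without w can be skipped. The sketch below makes several assumptions that are not from the slides: a single segment tree instead of a tree forest, ascending sort order, a 64-bit signature built with `zlib.crc32`, and the class name `KTIndex`. Leaves are re-checked against the real keyword sets because hash signatures can collide.

```python
import zlib

BITS = 64  # signature width (assumption)

def sig(word):
    """Map a keyword to a one-bit signature."""
    return 1 << (zlib.crc32(word.encode()) % BITS)

class KTIndex:
    def __init__(self, entries, keywords):
        self.entries = sorted(entries)      # (d, y) pairs, ascending d
        self.keywords = keywords            # y -> set of keywords
        self.n = len(self.entries)
        self.tree = [0] * (4 * max(self.n, 1))
        if self.n:
            self._build(1, 0, self.n - 1)

    def _build(self, node, lo, hi):
        if lo == hi:
            s = 0
            for word in self.keywords.get(self.entries[lo][1], ()):
                s |= sig(word)
        else:
            mid = (lo + hi) // 2
            s = self._build(2 * node, lo, mid) | self._build(2 * node + 1, mid + 1, hi)
        self.tree[node] = s
        return s

    def k_shortest(self, w, k):
        out = []
        self._search(1, 0, self.n - 1, w, sig(w), k, out)
        return out

    def _search(self, node, lo, hi, w, s, k, out):
        if len(out) >= k or (self.tree[node] & s) == 0:
            return                          # enough answers, or w cannot be in this range
        if lo == hi:
            d, y = self.entries[lo]
            if w in self.keywords.get(y, ()):  # verify: signatures may collide
                out.append((d, y))
            return
        mid = (lo + hi) // 2
        self._search(2 * node, lo, mid, w, s, k, out)           # shorter distances first
        self._search(2 * node + 1, mid + 1, hi, w, s, k, out)

# Hypothetical LB(x) with 13 entries y0..y12 at distances 0..12.
entries = [(i, f"y{i}") for i in range(13)]
keywords = {f"y{i}": {"other"} for i in range(13)}
for i in (3, 8, 11):
    keywords[f"y{i}"] = {"w"}
index = KTIndex(entries, keywords)
print(index.k_shortest("w", 2))  # [(3, 'y3'), (8, 'y8')]
```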

FS-FBS Algorithm
Combine FS and FBS:
1. If the query keyword is frequent, use the FBS method.
2. If the query keyword is not frequent, use the FS method.

Extension
1. Disk-based setting
2. Multiple keyword query
3. Dynamic update

14 3. Experiments Datasets: millions of vertices

Querying Efficiency
PMI: WWW’12 index-based algorithm
pivot-gs: VLDB’13 index-based algorithm
FS-FBS: our exact algorithm
Dijkstra: naive exact algorithm
HR (hit rate): % of reported vertices that are in the optimal solution.
S-ρ (Spearman’s rho): correlation between the reported ranking and the optimal ranking.
A value of 1.00 means the output is the optimal solution.
Existing index-based algorithms are inaccurate.
Our exact algorithm is as efficient as existing index-based algorithms.
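As a small worked example of the hit rate, the sketch below applies the definition above to the two incorrect answers quoted earlier in the talk. The function name `hit_rate` is hypothetical, and Spearman's rho is left out because the transcript does not say how rankings over different vertex sets are compared.

```python
def hit_rate(reported, optimal):
    """Fraction of reported vertices that also appear in the optimal solution."""
    if not reported:
        return 0.0
    optimal_set = set(optimal)
    return sum(1 for v in reported if v in optimal_set) / len(reported)

optimal = ["v2", "v0", "v6"]          # k-NK(v2, w0, 3)
pmi_answer = ["v2", "v1", "v0"]       # answer attributed to PMI
pivot_answer = ["v2", "v6", "v1"]     # answer attributed to the pivot algorithm

print(round(hit_rate(pmi_answer, optimal), 2))    # 0.67
print(round(hit_rate(pivot_answer, optimal), 2))  # 0.67
print(hit_rate(optimal, optimal))                 # 1.0
```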

Indexing Cost
Index size: comparable with existing index-based algorithms
Indexing time: acceptable

17 4. Conclusion
1. Our method can handle k-NK queries in large networks.
2. We propose the first index-based algorithm returning the optimal solution.
3. Our method is as efficient as the best-known index-based algorithms (which return non-optimal answers).

18 END

Forward Backward Search (FBS) Algorithm
How to obtain the k shortest (xi → yij, dij) in LB(xi) such that yij contains w?
1. Sort (xi → yij, dij) by non-ascending dij in each LB(xi) → the k shortest (xi → yij, dij) with yij containing w are at the end of LB(xi)
2. Hierarchy: e.g. when LB(x) = {(x → y0, d0), …, (x → y12, d12)}
Project each keyword to a hash value, e.g. h(w).
h[8..11] = h(w1) bitwiseOR h(w2) bitwiseOR h(w3) …, where each wi is in y8, y9, y10 or y11.
If h[8..11] bitwiseAND h(w) = 0, w is not contained in y8, y9, y10 or y11, so we check h[0..7]; otherwise, we check h[10..11].

Forward Backward Search (FBS) Algorithm
How to obtain the k shortest (xi → yij, dij) in LB(xi) such that yij contains w?
1. Sort (xi → yij, dij) by non-ascending dij
2. Hierarchy
3. Store the hierarchy in an array: e.g. [8..11] is in a[19] → compact storage without loss of efficiency in searching
One FBS time complexity: (expression missing from the transcript), where |L| is the size of the 2-hop index and |doc(V)| is the total number of keywords in the graph

Adapt to Disk-based Setting
1. Keyword-related backward index for each keyword w: LB → Lw
2. Partition each Lw into a high index and a low index
e.g. when w is contained in v1, v3 and v4 (figure omitted)

Adapt to Multiple Keywords Query
1. Trivial in FS
2. Same hierarchy in FBS
3. Modify the recursive search:
Disjunctive/OR: if h[8..11] bitwiseAND (h(w1) bitwiseOR h(w2) …) = 0, none of the query keywords is contained in y8, y9, y10 or y11, so we check h[0..7]; otherwise, we check h[10..11]
Conjunctive/AND: if h[8..11] bitwiseAND (h(w1) bitwiseOR h(w2) …) < h[8..11], the query keywords are not all contained in y8, y9, y10 or y11, so we check h[0..7]; otherwise, we check h[10..11]
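The two pruning tests can be sketched on plain integer bitmasks. The disjunctive test follows the condition above directly. The conjunctive test below is the usual superset check on bitmasks (every query bit must be present in the node signature), which is an assumption and may not match the exact inequality on the slide. Function names are hypothetical.

```python
def query_signature(keyword_sigs):
    """OR together the signatures h(w1), h(w2), ... of the query keywords."""
    combined = 0
    for s in keyword_sigs:
        combined |= s
    return combined

def may_contain_any(node_sig, query_sig):
    """Disjunctive/OR: the range can be skipped when no query bit is present."""
    return (node_sig & query_sig) != 0

def may_contain_all(node_sig, query_sig):
    """Conjunctive/AND (assumed superset test): skip when any query bit is missing."""
    return (node_sig & query_sig) == query_sig

q = query_signature([0b0001, 0b0100])   # two hypothetical keyword signatures
print(may_contain_any(0b0110, q))  # True: one query bit is present
print(may_contain_all(0b0110, q))  # False: bit 0b0001 is missing
print(may_contain_all(0b0111, q))  # True: both query bits are present
```

Both tests only rule ranges out. A range that passes still has to be searched further, since the OR of several vertices' signatures does not show that a single vertex holds every keyword.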

Adapt to Dynamic Update
1. Keyword update: trivial in FS
2. Keyword update of the hierarchy in FBS: when keyword w is inserted into / removed from vertex v, each LB(u) that contains (u → v, d) should update its hierarchy by reconstructing the hash values from the root to v
3. Structure update:
3.1 Update the 2-hop index by existing algorithms
3.2 Update keyword-related information accordingly
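A minimal sketch of the keyword update on an array-stored hierarchy, assuming a standard iterative segment tree layout (leaves at positions n..2n-1, parent of node i at i // 2). This layout and the function names are assumptions, not the array scheme from the slides. Because node signatures are ORs, removing a keyword cannot simply clear a bit in the ancestors: each ancestor is recomputed from its two children along the leaf-to-root path.

```python
def build(leaf_sigs):
    """Store the hierarchy in one array: internal nodes 1..n-1, leaves n..2n-1."""
    n = len(leaf_sigs)
    tree = [0] * n + list(leaf_sigs)
    for i in range(n - 1, 0, -1):
        tree[i] = tree[2 * i] | tree[2 * i + 1]
    return tree

def update(tree, pos, new_sig):
    """Replace the signature of leaf `pos` and rebuild the ORs on its path to the root."""
    n = len(tree) // 2
    i = pos + n
    tree[i] = new_sig
    i //= 2
    while i >= 1:
        tree[i] = tree[2 * i] | tree[2 * i + 1]
        i //= 2

# Hypothetical signatures of four entries in one LB(u).
tree = build([0b0001, 0b0010, 0b0100, 0b0001])
print(bin(tree[1]))   # 0b111
update(tree, 2, 0)    # the keyword behind bit 0b0100 is removed from the third vertex
print(bin(tree[1]))   # 0b11
```

Each update touches one root-to-leaf path, so its cost is logarithmic in the size of LB(u).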