Reverse Spatial and Textual k Nearest Neighbor Search.

Reverse Spatial and Textual k Nearest Neighbor Search Jiaheng Lu Renmin University of China Sep Presentation in HP Labs China.
Presentation transcript:

Reverse Spatial and Textual k Nearest Neighbor Search

Outline
– Motivation & Problem Statement
– Related Work
– RSTkNN Search Strategy
– Experiments
– Conclusion

Motivation
If we add a new shop at Q, which shops will be influenced?
Influence factors:
– Spatial distance. Results: D, F
– Textual similarity (services/products...). Results: F, C
[Figure: map of shops labeled by category: food, clothes, sports]

Problems of Finding Influential Sets
– Traditional query: reverse k nearest neighbor query (RkNN)
– Our new query: reverse spatial and textual k nearest neighbor query (RSTkNN)

Problem Statement
Spatial-textual similarity describes the similarity between such objects based on both spatial proximity and textual similarity.
Spatial-Textual Similarity Function: (the formula is not preserved in this transcript)
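Since the slide's formula is missing, the following is only a sketch of one plausible shape for such a function: a weighted sum controlled by α (α appears later in the slides as a query parameter). The normalization by `max_dist`, the use of Jaccard similarity on keyword sets, and all function names are assumptions made for illustration, not the presentation's definition.

```python
import math

def spatial_sim(loc_a, loc_b, max_dist):
    """Spatial proximity in [0, 1]; max_dist is an assumed normalizer."""
    d = math.dist(loc_a, loc_b)
    return 1.0 - min(d, max_dist) / max_dist

def text_sim(words_a, words_b):
    """Jaccard similarity of two keyword sets (an assumed text measure)."""
    a, b = set(words_a), set(words_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def sim_st(obj_a, obj_b, alpha, max_dist):
    """Assumed combination: alpha weights space, (1 - alpha) weights text."""
    (loc_a, words_a), (loc_b, words_b) = obj_a, obj_b
    return (alpha * spatial_sim(loc_a, loc_b, max_dist)
            + (1.0 - alpha) * text_sim(words_a, words_b))

# Hypothetical shops: (location, keywords)
shop_q = ((0.0, 0.0), {"food"})
shop_d = ((3.0, 4.0), {"food", "sports"})
score = sim_st(shop_q, shop_d, alpha=0.6, max_dist=10.0)
```

With α close to 1 the score is dominated by distance, and with α close to 0 by shared keywords, which matches the two separate result sets in the motivation slide.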

Problem Statement (cont.)
RSTkNN query
– finds the objects that have the query object as one of their k most spatial-textual similar objects.

Related Work
– Pre-computing the kNN for each object (Korn et al., SIGMOD 2000; Yang et al., ICDE 2001)
– (Hyper) Voronoi cell/plane pruning strategy (Tao et al., VLDB 2004; Wu et al., PVLDB 2008; Kriegel et al., ICDE 2009)
– 60-degree pruning method (Stanoi et al., SIGMOD 2000)
– Branch and bound, based on Lp-norm metric space (Achtert et al., SIGMOD 2006; Achtert et al., EDBT 2009)
Challenging features:
– Loss of Euclidean geometric properties.
– High dimensionality of the text space.
– k and α differ from query to query.

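Baseline method
Precompute: for each object o in the database, its spatial NNs and its textual NNs.
Given query q, k and α, for each object o in the database:
– Run the Threshold Algorithm over the precomputed lists to obtain the spatial-textual kNN of o.
– If q is no more similar to o than this kNN, o is not a result.
– If q is more similar to o than this kNN, o is a result.
Inefficient, since it lacks a dedicated data structure.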

Intersection and Union R-tree (IUR-tree)
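The slide's figure is not preserved. Going by the name, a node presumably summarizes the text below it with an intersection and a union in addition to the usual R-tree bounding rectangle. The sketch below assumes keyword sets rather than weighted term vectors, and `Entry`, `leaf_entry` and `merge` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    mbr: tuple        # (xmin, ymin, xmax, ymax)
    inter: frozenset  # keywords shared by every object below this entry
    union: frozenset  # keywords appearing in any object below this entry

def leaf_entry(x, y, words):
    """A single object: degenerate rectangle, intersection == union."""
    w = frozenset(words)
    return Entry((x, y, x, y), w, w)

def merge(children):
    """Build a parent entry from a non-empty list of child entries."""
    mbr = (
        min(c.mbr[0] for c in children),
        min(c.mbr[1] for c in children),
        max(c.mbr[2] for c in children),
        max(c.mbr[3] for c in children),
    )
    inter = frozenset.intersection(*(c.inter for c in children))
    union = frozenset.union(*(c.union for c in children))
    return Entry(mbr, inter, union)

# Hypothetical objects p2 and p3 grouped under one node.
p2 = leaf_entry(1, 1, {"food", "sports"})
p3 = leaf_entry(3, 2, {"food", "clothes"})
node = merge([p2, p3])
```

Keeping both summaries lets a search bound the text of any object in the subtree from below (the intersection) and from above (the union).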

Main Idea of Search Strategy
– Prune an entry E in the IUR-tree when query q is no more similar than kNN_L(E).
– Report an entry E as a result when query q is more similar than kNN_U(E).
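The two rules can be written as a three-way decision. This is a sketch under the assumption that "q is no more similar" is tested with an upper bound on the similarity between q and the objects in E, and "q is more similar" with a lower bound; the function and parameter names are hypothetical.

```python
def classify_entry(max_sim_to_q, min_sim_to_q, knn_lower, knn_upper):
    """Decide what to do with an entry E.

    max_sim_to_q / min_sim_to_q: assumed upper / lower bounds on the
        similarity between q and any object in E.
    knn_lower / knn_upper: kNN_L(E) and kNN_U(E), bounds on the similarity
        between an object in E and its k-th most similar object.
    """
    if max_sim_to_q <= knn_lower:
        return "prune"      # q cannot beat the k-th neighbor of any object in E
    if min_sim_to_q > knn_upper:
        return "report"     # q beats the k-th neighbor of every object in E
    return "candidate"      # undecided: refine bounds or expand E

# Illustrative numbers only.
decision = classify_entry(0.30, 0.10, knn_lower=0.37, knn_upper=0.43)
```

Entries that fall between the two bounds stay undecided, which is why tighter bounds translate directly into fewer expansions.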

How to Compute the Bounds
Similarity approximations (the formulas are not preserved in this transcript):
– MinST(E, E′)
– TightMinST(E, E′)
– MaxST(E, E′)
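Because the slide's formulas are missing, the sketch below shows only one way lower and upper similarity bounds between two entries could be derived, under stated assumptions: each entry is a tuple `(mbr, inter, union)` of a bounding rectangle plus intersection and union keyword sets, text similarity is Jaccard, and spatial proximity is normalized by an assumed `dmax`. TightMinST is not covered, and these are not the presentation's definitions.

```python
import math

def min_dist(a, b):
    """Smallest distance between rectangles a, b = (xmin, ymin, xmax, ymax)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return math.hypot(dx, dy)

def max_dist(a, b):
    """Largest distance between any point of a and any point of b."""
    dx = max(a[2] - b[0], b[2] - a[0])
    dy = max(a[3] - b[1], b[3] - a[1])
    return math.hypot(dx, dy)

def max_st(e1, e2, alpha, dmax):
    """Upper bound: closest points, most optimistic Jaccard."""
    (m1, i1, u1), (m2, i2, u2) = e1, e2
    s = 1.0 - min(min_dist(m1, m2), dmax) / dmax
    denom = len(i1 | i2)
    t = 1.0 if denom == 0 else min(1.0, len(u1 & u2) / denom)
    return alpha * s + (1.0 - alpha) * t

def min_st(e1, e2, alpha, dmax):
    """Lower bound: farthest points, most pessimistic Jaccard."""
    (m1, i1, u1), (m2, i2, u2) = e1, e2
    s = 1.0 - min(max_dist(m1, m2), dmax) / dmax
    denom = len(u1 | u2)
    t = 1.0 if denom == 0 else len(i1 & i2) / denom
    return alpha * s + (1.0 - alpha) * t

# Hypothetical entries.
ea = ((0, 0, 1, 1), {"food"}, {"food", "sports"})
eb = ((4, 0, 5, 1), {"food"}, {"food", "clothes"})
upper = max_st(ea, eb, alpha=0.5, dmax=10.0)
lower = min_st(ea, eb, alpha=0.5, dmax=10.0)
```

The text bounds hold because any object's keyword set lies between its entry's intersection and union, so the overlap is at most the overlap of the unions and the combined set is at least the combination of the intersections (and symmetrically for the lower bound).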

Example for Computing Bounds
Current traveled entries: N1, N2, N3. Given k=2, compute kNN_L(N1) and kNN_U(N1).
Effect of N3 and N2 on N1:
– Compute kNN_L(N1) from TightMinST(N1, N3), MinST(N1, N3), TightMinST(N1, N2) and MinST(N1, N2), taken in decreasing order.
– Compute kNN_U(N1) from MaxST(N1, N3) and MaxST(N1, N2), taken in decreasing order.
[The numeric values shown on the slide are not preserved.]
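A simplified way to turn per-entry bounds into a bound on the k-th neighbor similarity is to let each entry contribute its bound once per object it contains and take the k-th largest value. The sketch below assumes that scheme, ignores TightMinST, and uses made-up numbers rather than the slide's values; `kth_largest_with_counts` is a hypothetical helper.

```python
def kth_largest_with_counts(bounds, k):
    """bounds: list of (similarity_bound, object_count) pairs, one per entry.

    Returns the k-th largest bound when each entry counts object_count times,
    or None if the entries cover fewer than k objects.
    """
    remaining = k
    for value, count in sorted(bounds, reverse=True):
        remaining -= count
        if remaining <= 0:
            return value
    return None

# Made-up lower bounds from one entry to two others holding 2 objects each.
lower_bounds = [(0.3, 2), (0.5, 2)]
knn_lower = kth_largest_with_counts(lower_bounds, k=2)
```

Fed with lower bounds this gives a valid lower bound on the k-th neighbor similarity. Fed with upper bounds it gives a valid upper bound only if every other object in the database is covered by some entry in the list.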

Overview of Search Algorithm
RSTkNN Algorithm:
– Traverse from the IUR-tree root.
– Progressively update the lower and upper bounds.
– Apply the search strategy: move unrelated entries to Pruned; report result entries to Ans; add candidate objects to Cnd.
– FinalVerification: for each object in Cnd, decide whether it is a result by tightening its bounds, expanding entries in Pruned.

Example: Execution of the RSTkNN Algorithm on IUR-tree, given k=2, alpha=0.6
IUR-tree: root N4 with children N1 (p1), N2 (p2, p3) and N3 (p4, p5).
EnQueue(U, N4); Initialize N4.CLs;
U: N4(0, 0)

Example: Execution of the RSTkNN Algorithm on IUR-tree, given k=2, alpha=0.6
U: N4(0, 0)
DeQueue(U, N4)
Mutual-effect: N1 with N2; N1 with N3; N2 with N3
EnQueue(U, N2); EnQueue(U, N3); Pruned.add(N1)
Pruned: N1(0.37, 0.432)
U: N3(0.323, ), N2(0.21, )

Example: Execution of the RSTkNN Algorithm on IUR-tree, given k=2, alpha=0.6
U: N3(0.323, ), N2(0.21, )
DeQueue(U, N3)
Mutual-effect: p4 with N2; p5 with p4, N2
Answer.add(p4); Candidate.add(p5)
Pruned: N1(0.37, 0.432)
Answer: p4(0.21, )
Candidate: p5(0.374, 0.374)

Example: Execution of the RSTkNN Algorithm on IUR-tree, given k=2, alpha=0.6
U: N2(0.21, )
DeQueue(U, N2)
Mutual-effect: p2 with p4, p5; p3 with p2, p4, p5
Answer.add(p2, p3); Pruned.add(p5)
Pruned: N1(0.37, 0.432), p5(0.374, 0.374)
Answer: p4, p2, p3
Since U = Cand = empty, the algorithm ends. Results: p2, p3, p4.

Cluster IUR-tree: CIUR-tree
– IUR-tree: the texts in an index node can be very different.
– CIUR-tree: an enhanced IUR-tree that incorporates textual clusters.

Optimizations
Motivation
– To give a tighter bound during CIUR-tree traversal
– To purify the textual description in the index node
Outlier Detection and Extraction (ODE-CIUR)
– Extract subtrees with outlier clusters
– Treat the outliers separately and calculate their bounds separately
Text-entropy based optimization (TE-CIUR)
– Define TextEntropy to depict the distribution of text clusters in an entry of the CIUR-tree
– Visit entries with higher TextEntropy first, i.e. entries whose texts are more diverse
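The slide does not give the TextEntropy formula. A natural reading, assumed here, is the Shannon entropy of how an entry's objects are spread over text clusters; the sketch below uses that assumption, and `text_entropy`, `visit_order` and the entry names are hypothetical.

```python
import math

def text_entropy(cluster_counts):
    """Shannon entropy (bits) of an entry's objects over text clusters."""
    total = sum(cluster_counts)
    if total == 0:
        return 0.0
    return -sum(
        (c / total) * math.log2(c / total) for c in cluster_counts if c > 0
    )

def visit_order(entries):
    """Order entry names so that more textually diverse entries come first.

    entries: dict mapping an entry name to its per-cluster object counts.
    """
    return sorted(entries, key=lambda name: text_entropy(entries[name]),
                  reverse=True)

# Hypothetical entries: A is pure, B is evenly mixed, C is skewed.
order = visit_order({"A": [4, 0], "B": [2, 2], "C": [3, 1]})
```

An entry whose objects all fall in one cluster has entropy 0 and is visited last, while an evenly mixed entry has the highest entropy and is expanded first.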

Experimental Study
Experimental Setup
– OS: Windows XP; CPU: 2.0GHz; Memory: 4GB
– Page size: 4KB; Language: C/C++
Compared Methods
– baseline, IUR-tree, ODE-CIUR, TE-CIUR, and ODE-TE
Datasets
– ShopBranches (Shop): extended from a small real dataset
– GeographicNames (GN): real data
– CaliforniaDBpedia (CD): generated by combining locations in California with documents from DBpedia
Metrics
– Total query time
– Number of page accesses
Dataset statistics (Shop / CD / GN)
– Total # of objects: 304,008 / 1,555,209 / 1,868,821
– Total unique words in dataset: 393321,578222,409 (the three values are run together in the source)
– Average # words per object: (values not preserved)

Scalability
[Figure: (1) log-scale version; (2) linear-scale version]

Effect of k
[Figure: (a) query time; (b) page access]

Conclusion
– Proposed a new query problem, RSTkNN.
– Presented a hybrid index, the IUR-tree.
– Presented an efficient search algorithm to answer the queries.
– Introduced the enhanced variant CIUR-tree and two optimizations, ODE-CIUR and TE-CIUR, to further improve search processing.
– Extensive experiments confirm the efficiency and scalability of the algorithms.

Reverse Spatial and Textual k Nearest Neighbor Search Thanks! Q & A

A straightforward method
1. Compute RSkNN and RTkNN separately.
2. Combine the results of RSkNN and RTkNN to get the RSTkNN results.
There is no sensible way to combine them (infeasible).