1
Hagenberg - Linz - Prague - Vienna iiWAS 2002, 10-12 September, Bandung, Indonesia, Page 1
ε-ISA: AN INCREMENTAL LOWER BOUND APPROACH FOR EFFICIENTLY FINDING APPROXIMATE NEAREST NEIGHBOR OF COMPLEX VAGUE QUERIES
DANG Tran Khanh, KÜNG Josef, WAGNER Roland
Institute for Applied Knowledge Processing (FAW), Johannes Kepler University of Linz, Austria
2
OUTLINE
Complex Vague Queries in the Vague Query System (VQS)
  Similarity search problem of the VQS in conventional DBMSs
Incremental hyper-Sphere Approach (ISA)
  Overcoming the shortcomings of the Incremental hyper-Cube Approach (ICA)
ε-ISA: Finding Approximate Nearest Neighbors of Complex Vague Queries
  The issue of the dimensionality curse
  The issue of an increasing number of query conditions
Experimental Results
Conclusions
3
COMPLEX VAGUE QUERIES IN THE VAGUE QUERY SYSTEM
The VQS: introduced by Kueng and Palkoska 1997. It supports similarity search capabilities in conventional DBMSs: it returns to users records that are semantically close to a given query.
One of the basic ideas of the VQS: NCR-Tables (Numeric-Coordinate-Representation-Tables), which keep numeric semantic information about non-numeric attributes.
4
COMPLEX VAGUE QUERIES IN THE VAGUE QUERY SYSTEM
NCR-Tables: an example (figure: an NCR-table whose NCR-key is the fuzzy field and whose NCR-columns hold its numeric coordinates)
SELECT FROM Car WHERE Col IS dark blue INTO myResultTable;
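A minimal Python sketch of how an NCR-table lets a vague condition such as `Col IS dark blue` be answered: each fuzzy value is mapped to numeric coordinates, and records are ranked by the distance of their value's coordinates to those of the target. The color coordinates, the `CARS` relation and the function `vague_select` are hypothetical illustrations, not the VQS implementation.

```python
import math

# Hypothetical NCR-table for the fuzzy field Col: NCR-key -> NCR-columns.
# The RGB-like coordinates are illustrative assumptions, not VQS data.
COLOR_NCR = {
    "dark blue": (0.0, 0.0, 0.55),
    "navy": (0.0, 0.0, 0.5),
    "blue": (0.0, 0.0, 1.0),
    "red": (1.0, 0.0, 0.0),
}

# Hypothetical Car relation: (model, Col); no car is exactly "dark blue".
CARS = [("Car A", "red"), ("Car B", "navy"), ("Car C", "blue")]


def vague_select(relation, ncr_table, target, k=1):
    """Return the k records whose Col value is numerically closest to target."""
    q = ncr_table[target]
    return sorted(relation, key=lambda rec: math.dist(q, ncr_table[rec[1]]))[:k]


print(vague_select(CARS, COLOR_NCR, "dark blue"))  # [('Car B', 'navy')]
```

No record matches "dark blue" exactly, yet the query still returns the semantically closest car instead of an empty result.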
5
COMPLEX VAGUE QUERIES IN THE VAGUE QUERY SYSTEM
Complex Vague Queries in VQS: a simplified view of the problem (figure: query relation with Attribute 1 … Attribute n and values Value_11 … Value_nk; NCR-Table 1 … NCR-Table n with Index 1 … Index n; vague query processing module)
6
COMPLEX VAGUE QUERIES IN THE VAGUE QUERY SYSTEM
The issue of the dimensionality curse [Weber et al 1998; Beyer et al 1999]
NCR-Tables with high-dimensional data: the probability of overlap between a query and the data regions is very high, so the performance of multidimensional access methods (MAMs) decreases significantly. A linear scan over the whole data set can then perform better than MAMs.
Approximate nearest neighbor problem: a point P' is a (1+ε)-approximate nearest neighbor of a query Q if
$\mathrm{dist}(Q, P') \le (1+\varepsilon)\,\mathrm{dist}(Q, P)$ (1)
where P is the exact nearest neighbor of Q and ε > 0.
Existing solutions mostly target single data sets, i.e., single-feature nearest neighbor (S-FNN) queries [Arya et al 1998, Kleinberg 1997, Amato et al 2000, Ciaccia and Patella 2000, etc.]
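Condition (1) can be checked directly, as in this small Python sketch that finds the exact nearest neighbor P by a linear scan and compares its distance with that of a returned point P'. The function name `is_approx_nn` and the use of Euclidean distance are assumptions for illustration.

```python
import math


def is_approx_nn(q, p_returned, points, eps):
    """Check condition (1): dist(Q, P') <= (1 + eps) * dist(Q, P),
    where P is the exact nearest neighbor of q, found here by a linear scan."""
    exact = min(math.dist(q, p) for p in points)
    return math.dist(q, p_returned) <= (1 + eps) * exact


points = [(1.0, 0.0), (0.0, 1.5), (3.0, 3.0)]
print(is_approx_nn((0.0, 0.0), (0.0, 1.5), points, eps=0.2))  # False: 1.5 > 1.2
print(is_approx_nn((0.0, 0.0), (0.0, 1.5), points, eps=0.5))  # True: 1.5 <= 1.5
```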
7
COMPLEX VAGUE QUERIES IN THE VAGUE QUERY SYSTEM
Solving complex vague queries in VQS: random access [Fagin 1996] is impossible (figure: a query relation with attributes Attr1 and Attr2 and records (x1, y1), (x1, y2), (x2, y1), …; the NCR-table of Attr1 holds the values x1, x2, … and that of Attr2 holds y1, y2, …)
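A small Python sketch of why random access fails here, using the records from the figure: a value returned from the NCR-table of Attr1 (e.g., x1) does not identify a single record of the query relation, so the other attribute values of "that" record cannot be looked up. The helper names `records_by_value` and `is_one_to_one` are hypothetical.

```python
from collections import defaultdict

# The query relation from the figure: records (Attr1, Attr2).
QUERY_RELATION = [("x1", "y1"), ("x1", "y2"), ("x2", "y1")]


def records_by_value(relation, attr):
    """Group records by the fuzzy-field value they hold in column attr."""
    groups = defaultdict(list)
    for rec in relation:
        groups[rec[attr]].append(rec)
    return dict(groups)


def is_one_to_one(relation, attr):
    """True if every value of column attr identifies exactly one record."""
    return all(len(recs) == 1 for recs in records_by_value(relation, attr).values())


print(records_by_value(QUERY_RELATION, 0)["x1"])  # [('x1', 'y1'), ('x1', 'y2')]
print(is_one_to_one(QUERY_RELATION, 0), is_one_to_one(QUERY_RELATION, 1))  # False False
```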
8
COMPLEX VAGUE QUERIES IN THE VAGUE QUERY SYSTEM
Incremental hyper-Cube Approach (ICA) [Kueng and Palkoska 1999]
Issues with the ICA (see [Dang et al 2002a, Dang et al 2002b] for details):
How to determine the initial hyper-cubes?
How to extend the hyper-cubes when necessary?
Unnecessary disk pages and objects are accessed.
Disk accesses are repeated.
Only the best-match record is returned (not the top-k records).
9
INCREMENTAL HYPER-SPHERE APPROACH (ISA)
Input: a query relation/view S; a complex vague query Q with n query conditions q_i (i = 1, 2, …, n). Each feature space (NCR-table) related to Q is assumed to be managed by a multidimensional index structure F_i.
Output: the best-match record/tuple T_min ∈ S for Q. Ties are broken arbitrarily.
Step 1: Search each F_i for the corresponding q_i using the adapted incremental algorithm for hyper-sphere range queries.
Step 2: Combine the search results of all q_i to find at least one appropriate record in S, i.e., a record that contains the returned NCR-values for every query condition. If no appropriate record is found, go back to step 1.
Step 3: Compute the total distances (scores) of the found records using formula 2 and find a record T_min with the minimum total distance TD_cur. Ties are broken arbitrarily.
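Steps 1 to 3 can be sketched in Python as follows. This is a simulation under stated assumptions, not the authors' implementation: each index F_i is replaced by a list of NCR-values sorted by Euclidean distance to q_i, all conditions share one radius that grows by a fixed hypothetical `step`, and, since formula 2 is not reproduced here, the total distance is assumed to be the plain sum of the per-condition distances.

```python
import math


def isa_first_match(S, ncr_tables, Q, step=1.0):
    """ISA steps 1-3 (sketch): grow the hyper-sphere radius around each q_i until
    some record of S has all of its NCR-values returned, then return the best
    such record as (T_min, TD_cur, radius)."""
    # Stand-in for index F_i: NCR-values of feature space i sorted by distance to q_i.
    lists = [sorted((math.dist(q, c), key) for key, c in t.items())
             for t, q in zip(ncr_tables, Q)]
    found = [{} for _ in lists]        # NCR-values returned so far, per F_i
    pos = [0] * len(lists)             # cursor of each incremental search
    limit = max((lst[-1][0] for lst in lists if lst), default=0.0)
    radius = 0.0
    while True:
        radius += step
        # Step 1: incremental range query, fetching only values in the larger sphere
        for i, lst in enumerate(lists):
            while pos[i] < len(lst) and lst[pos[i]][0] <= radius:
                d, key = lst[pos[i]]
                found[i][key] = d
                pos[i] += 1
        # Step 2: appropriate records = all of their values have been returned
        cands = [r for r in S if all(v in f for v, f in zip(r, found))]
        if cands:
            # Step 3 (hypothetical stand-in for formula 2): sum of distances
            td = {r: sum(f[v] for v, f in zip(r, found)) for r in cands}
            best = min(td, key=td.get)
            return best, td[best], radius
        if radius >= limit:            # every value returned, still no match
            return None, math.inf, radius


T1 = {"a": (0.0, 0.0), "b": (2.0, 0.0)}
T2 = {"x": (0.0, 2.0), "y": (0.0, 3.0)}
S = [("b", "x"), ("a", "y")]
print(isa_first_match(S, [T1, T2], [(0.0, 0.0), (0.0, 0.0)]))  # (('b', 'x'), 4.0, 2.0)
```

On this toy data the first appropriate record, ('b', 'x') with total distance 4.0, is not the best one: ('a', 'y') has total distance 3.0 but is completed only at radius 3. This is why ISA cannot stop after step 3.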
10
INCREMENTAL HYPER-SPHERE APPROACH (ISA)
11
INCREMENTAL HYPER-SPHERE APPROACH (ISA)
Step 4: Compute the maximum searching radius of each q_i with respect to TD_cur using formula 3, and continue the search as in steps 1, 2 and 3 until one of the following two conditions holds: (a) the current searching radius of every q_i is greater than or equal to its maximum searching radius; (b) a new appropriate record T_new with total distance TD_new < TD_cur is found.
Step 5: If condition (a) holds, return T_min as the best match for Q. Otherwise (condition (b) holds), replace T_min with T_new, so that TD_cur is replaced by the smaller value TD_new, and go back to step 4.
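A Python sketch of the complete ISA loop (steps 1 to 5), again as a simulation under hypothetical assumptions: sorted lists stand in for the indexes F_i, one shared radius grows by a fixed `step`, and the total distance is the sum of the per-condition distances. Under that sum, a record with a total distance below TD_cur must have every per-condition distance below TD_cur, so TD_cur is used as the maximum searching radius of every q_i; this is an assumed stand-in for formula 3, not the formula itself.

```python
import math


def isa(S, ncr_tables, Q, step=1.0):
    """ISA steps 1-5 (sketch): return (T_min, TD_cur) with the minimum total distance."""
    lists = [sorted((math.dist(q, c), key) for key, c in t.items())
             for t, q in zip(ncr_tables, Q)]
    found = [{} for _ in lists]
    pos = [0] * len(lists)
    seen = set()                        # records whose total distance is known
    best, td_cur = None, math.inf
    limit = max((lst[-1][0] for lst in lists if lst), default=0.0)
    radius = 0.0
    while True:
        radius += step
        # Step 1: incremental hyper-sphere range query on every F_i
        for i, lst in enumerate(lists):
            while pos[i] < len(lst) and lst[pos[i]][0] <= radius:
                d, key = lst[pos[i]]
                found[i][key] = d
                pos[i] += 1
        # Steps 2-3: score newly completed records (hypothetical formula 2: sum)
        for r in S:
            if r not in seen and all(v in f for v, f in zip(r, found)):
                seen.add(r)
                td = sum(f[v] for v, f in zip(r, found))
                if td < td_cur:         # condition (b): T_new replaces T_min
                    best, td_cur = r, td
        # Steps 4-5: with a sum, the maximum radius of every q_i is TD_cur
        # (stand-in for formula 3); reaching it is condition (a).
        if radius >= td_cur or radius >= limit:
            return best, td_cur


T1 = {"a": (0.0, 0.0), "b": (2.0, 0.0)}
T2 = {"x": (0.0, 2.0), "y": (0.0, 3.0)}
S = [("b", "x"), ("a", "y")]
print(isa(S, [T1, T2], [(0.0, 0.0), (0.0, 0.0)]))  # (('a', 'y'), 3.0)
```

Stopping once the radius reaches TD_cur is safe under these assumptions: every record not yet completed has some NCR-value farther away than the radius, so its total distance exceeds TD_cur.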
12
INCREMENTAL HYPER-SPHERE APPROACH (ISA)
Modifying ISA to retrieve the top-k records: see [Dang et al 2002b].
In high-dimensional feature spaces and/or as the number of query conditions increases, the performance of ISA decreases.
13
ε-ISA: FINDING APPROXIMATE NEAREST NEIGHBORS OF COMPLEX VAGUE QUERIES
A complex vague query (CVQ) is an M-FNN (multi-feature nearest neighbor) query.
ε-ISA uses a lower bound total distance (LBTD).
14
ε-ISA: FINDING APPROXIMATE NEAREST NEIGHBORS OF COMPLEX VAGUE QUERIES
Input: a query relation/view S; a complex vague query Q with n query conditions q_i (i = 1, 2, …, n). Each feature space (NCR-table) related to Q is assumed to be managed by a multidimensional index structure F_i. A real number ε > 0 used as the tolerant error.
Output: a (1+ε)-approximate NN record/tuple T_app ∈ S for Q. Ties are broken arbitrarily.
Step 1: Search each F_i for the corresponding q_i using the adapted incremental algorithm for hyper-sphere range queries.
Step 2: Combine the search results of all q_i to find at least one appropriate record in S, i.e., a record that contains the returned NCR-values for every query condition. If no appropriate record is found, go back to step 1.
Step 3: Compute the total distances (scores) of the found records using formula 2 and find a record T_app with the minimum total distance TD_cur. Ties are broken arbitrarily.
15
ε-ISA: FINDING APPROXIMATE NEAREST NEIGHBORS OF COMPLEX VAGUE QUERIES
Step 4: Let d_i be the distance from query condition q_i to the last NCR-value returned in the corresponding feature space, which is managed by F_i. Compute LBTD as follows:
$\mathrm{LBTD} = \min\{\mathrm{TD}_{cur}, d_1, d_2, \dots, d_n\}$ (5)
Step 5: If TD_cur ≤ (1+ε) · LBTD, return T_app as a (1+ε)-approximate NN record for Q. Otherwise, go to step 6.
Step 6: Compute the maximum searching radius of each q_i with respect to TD_cur using formula 3 and continue the search as in steps 1 to 5 until the algorithm stops at step 5. If the current searching radius of some q_i is greater than or equal to its maximum searching radius, the search on F_i is stopped.
See the next slide.
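A Python sketch of ε-ISA under the same hypothetical simulation assumptions (sorted lists as indexes, one shared radius, total distance as a sum, TD_cur as every maximum searching radius). Under a sum, formula (5) is a valid lower bound on the exact best total distance: a record not yet completed lacks some NCR-value from some F_i, and that value's distance is at least d_i, while every completed record has a total distance of at least TD_cur.

```python
import math


def eps_isa(S, ncr_tables, Q, eps, step=1.0):
    """ε-ISA steps 1-6 (sketch): return (T_app, TD_cur), where TD_cur is within
    a factor (1 + eps) of the exact minimum total distance."""
    lists = [sorted((math.dist(q, c), key) for key, c in t.items())
             for t, q in zip(ncr_tables, Q)]
    found = [{} for _ in lists]
    pos = [0] * len(lists)
    last = [0.0] * len(lists)           # d_i: distance of the last value returned by F_i
    seen = set()
    best, td_cur = None, math.inf
    limit = max((lst[-1][0] for lst in lists if lst), default=0.0)
    radius = 0.0
    while True:
        radius += step
        for i, lst in enumerate(lists):   # Step 1
            while pos[i] < len(lst) and lst[pos[i]][0] <= radius:
                d, key = lst[pos[i]]
                found[i][key] = d
                last[i] = d
                pos[i] += 1
        for r in S:                       # Steps 2-3 (hypothetical formula 2: sum)
            if r not in seen and all(v in f for v, f in zip(r, found)):
                seen.add(r)
                td = sum(f[v] for v, f in zip(r, found))
                if td < td_cur:
                    best, td_cur = r, td
        if best is not None:
            lbtd = min([td_cur] + last)   # Step 4: formula (5)
            if td_cur <= (1 + eps) * lbtd:  # Step 5
                return best, td_cur
        # Step 6, simplified to one shared radius: once it reaches TD_cur every
        # search stops, or every index is exhausted; T_app is then exact.
        if radius >= td_cur or radius >= limit:
            return best, td_cur


T1 = {"a": (0.0, 0.0), "b": (2.0, 0.0)}
T2 = {"x": (0.0, 2.0), "y": (0.0, 3.0)}
S = [("b", "x"), ("a", "y")]
Q = [(0.0, 0.0), (0.0, 0.0)]
print(eps_isa(S, [T1, T2], Q, eps=1.0))  # (('b', 'x'), 4.0)
print(eps_isa(S, [T1, T2], Q, eps=0.0))  # (('a', 'y'), 3.0)
```

On this toy data, ε = 1 stops at radius 2 with ('b', 'x') and TD_cur = 4.0, which is within a factor of 2 of the exact minimum 3.0, whereas ε = 0 keeps searching until it reaches the exact best match ('a', 'y').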
16
ε-ISA: FINDING APPROXIMATE NEAREST NEIGHBORS OF COMPLEX VAGUE QUERIES
Lower Bound Total Distance: an example (figure: query relation QR with attributes Attr1 and Attr2; NCR-values A, B, C, D and query conditions q1, q2 in the feature spaces)
17
ε-ISA: FINDING APPROXIMATE NEAREST NEIGHBORS OF COMPLEX VAGUE QUERIES
Approximate k-nearest neighbors: see our paper for more details.
18
EXPERIMENTAL RESULTS
Data sets: uniformly distributed data in 2, 4, and 8 dimensions (100K objects each); real data in 9 and 16 dimensions (more than 64K image feature vectors, URL: http://kdd.ics.uci.edu/).
The SH-tree [Dang et al 2001a] is used to manage the multidimensional data. Page size: 8KB.
100 query points were randomly selected from each corresponding data set...
19
EXPERIMENTAL RESULTS
2-condition NN queries (4-d and 8-d), different ε values
20
EXPERIMENTAL RESULTS
2-condition k-NN queries (4-d), ε = 0.2
21
EXPERIMENTAL RESULTS
3-condition NN queries (2-d), different ε values
2-condition NN queries (9-d and 16-d real data sets), ε = 1
ε = 1 means that a tolerant error of up to 100% is permitted.
For the 16-d data set, ε-ISA saved about 4.5% and 1% of the numbers of affected objects and disk accesses, respectively, while maintaining an accuracy of 71%.
One notable fact is that the effective epsilon, calculated as introduced in (Arya et al. 1998), is quite low: only 0.23. This is a very promising result.
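One way to compute the two quality measures reported here, sketched in Python: the effective epsilon of a query is taken as dist(Q, P')/dist(Q, P) - 1 for the returned P' and the exact P, averaged over all queries, and accuracy is taken as the fraction of queries answered exactly. Both definitions are assumptions about how the measures were computed, and the sample distances below are made up.

```python
import math


def effective_stats(pairs):
    """pairs: one (returned_distance, exact_distance) tuple per query.
    Returns (average effective epsilon, accuracy), where accuracy is the
    fraction of queries answered with an exact nearest neighbor."""
    errs = [(r / e - 1.0) if e > 0 else (0.0 if r == 0 else math.inf)
            for r, e in pairs]
    eff = sum(errs) / len(errs)
    acc = sum(1 for x in errs if x == 0.0) / len(errs)
    return eff, acc


# Made-up distances for four queries (not the paper's measurements).
print(effective_stats([(1.0, 1.0), (1.5, 1.0), (2.0, 2.0), (3.0, 2.0)]))  # (0.25, 0.5)
```

The effective epsilon can be far below the permitted ε: here ε could be 1, yet the average observed error is only 0.25.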
22
CONCLUSIONS
ε-ISA: an incremental lower bound approach for efficiently finding approximate nearest neighbors of multi-feature queries in VQS.
ε-ISA is one of the pioneering solutions to this problem.
ε-ISA is very useful for application domains in which the returned results need not be exact, only similar or approximately similar (within a certain tolerant error) to a given query; the experimental results support this. With a suitable ε value, ε-ISA can save a large percentage of both I/O and CPU costs while still keeping the accuracy of the returned results very high.
ε-ISA is applicable not only to numeric domains such as NCR-tables but also to any ranked input.
Application areas: TIS (tourist information systems), GIS, digital libraries, multimedia systems, etc.
23
More information
URL: http://www.faw.uni-linz.ac.at/
E-mail: {khanh, jkueng, rwagner}@faw.uni-linz.ac.at
24
Research related to dealing with complex vague queries
The A0 algorithm [Fagin 1996] (there are several improvements of Fagin's algorithm; see the paper for details): finds the top-k matches for a user query involving several multimedia attributes.
Problem: this algorithm assumes that random access is possible in the system. This assumption holds only if the following three conditions hold:
1. there is at least one key for each subsystem,
2. there is a mapping between the keys, and
3. the mapping is one-to-one.
In VQS, condition (1) is always satisfied (each fuzzy field is the key of the corresponding NCR-table), but there is no one-to-one mapping between the fuzzy fields. Hence A0 cannot be applied to our problem.
25
Research related to dealing with complex vague queries (cont.)
Other approaches for multimedia databases: [Ortega et al 1997, Chaudhuri et al 1996, Boehm K. et al 2001] (see our paper).
Chaudhuri et al. 1999 introduced a solution that translates a top-k multi-feature query into a range query that a conventional DBMS can process. This approach uses the information in the histograms kept by a relational system …
26
ISA and J* algorithm
Input. ISA: the input is ranked with the support of the incremental algorithm adapted for range queries. J*: assumes that ranked input is available and does not show how to obtain it.
Cost reduction. ISA: reduces the database access cost first; this cost and the number of processed states are reduced by using hyper-sphere range queries and computing the maximum searching radii. J*: reduces the number of processed states first; the database access cost is alleviated by the iterative deepening technique (S. Russell and P. Norvig: Artificial Intelligence: A Modern Approach. Prentice Hall, Inc., 1995).
Background. ISA: derived from the ICA, which was introduced earlier and had the same overall goals as the J* algorithm. J*: claimed to be the first algorithm that can process joins of ranked inputs and multi-level joins.