E.G.M. Petrakis: Searching Signals and Patterns


1
• Given a query Q and a collection of N objects O_1, O_2, …, O_N, search exactly or approximately
• The ideal method should be:
  • Fast: faster than sequential scanning
  • Correct: returns all qualifying objects
  • Dynamic: allows for insertions, deletions, updates

2 Similarity Queries
• Range queries: find all objects O within distance e from the query Q
  • D(Q,O) < e, where D and e are user defined
• Nearest Neighbor (NN): find the k most similar objects
• All-pairs ("spatial join") queries: find all pairs of objects O_i, O_j within distance e of each other, D(O_i,O_j) < e
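The three query types can be sketched with plain sequential scanning, which is the baseline that the indexing methods in the following slides try to beat. This is only an illustration: the Euclidean distance stands in for the user-defined D, and the function names are hypothetical.

```python
import heapq
import math
from itertools import combinations


def euclidean(a, b):
    """Stand-in for the user-defined distance D (an assumption of this sketch)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def range_query(query, objects, eps, dist=euclidean):
    """All objects O with D(Q, O) < eps."""
    return [o for o in objects if dist(query, o) < eps]


def knn_query(query, objects, k, dist=euclidean):
    """The k objects closest to the query."""
    return heapq.nsmallest(k, objects, key=lambda o: dist(query, o))


def all_pairs(objects, eps, dist=euclidean):
    """All pairs (O_i, O_j), i < j, with D(O_i, O_j) < eps."""
    return [(a, b) for a, b in combinations(objects, 2) if dist(a, b) < eps]


objects = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
query = (0.0, 0.0)
in_range = range_query(query, objects, eps=1.5)
nearest = knn_query(query, objects, k=2)
pairs = all_pairs(objects, eps=1.5)
```

Each function touches every object (or every pair), so the cost grows with N or N^2.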

3 Similarity Queries (cont'd)
• Whole matching: the whole query Q matches an object O_i
  • the image is 512x512, the query is 512x512
• Partial matching: the query specifies only a part of an object
  • find parts of objects that match the query
  • the images are 512x512, the query is 32x32

4 Object Types
• 1D signals:
  • time sequences
  • scientific data
  • digitized voice or music
• 2D signals:
  • digitized images (gray scale, color)
  • video clips
• General objects:
  • text, multimedia documents

5 Applications
• In many applications, searching for similar patterns helps in prediction, decision making, data mining, etc.
  • Financial
  • Marketing & production of 1D signals
  • Scientific databases
  • DNA/genome databases
  • Audio databases
  • Image and video databases

6 Queries
• Find companies whose stock prices move similarly or with a similar pattern of growth
• Find products with similar selling patterns
• Find if a musical score is similar to one of the copyrighted scores
• Find images that look like a sunset
• Find X-rays showing a lung tumor

7 Indexing [Agrawal et al. 93]
• To be faster than sequential scanning, the objects are indexed
• Extract f features from each object and apply a SAM (spatial access method) to index the object
• Search the SAM to retrieve promising objects
• Clean up the response
• The indexing method must be correct (i.e., have no "misses"), have small space overhead and be dynamic

8 Mapping Objects to Space
• Objects are mapped to points
• A query Q becomes a sphere with radius e

9 Mapping Objects to Points
• F( ): mapping function
• D_f: object distance in feature space
• D: object distance in actual space
• Selection of F( ) and D_f?
  • Ideally, D_f(Q_i,O_j) = D(Q_i,O_j)
  • The mapping preserves the distances
  • The mapping should guarantee no misses

10 GEMINI [Faloutsos 96]
• GEMINI: Generic Multimedia Indexing
  1. Define F( ): mapping of objects to f features (objects become vectors)
  2. Determine the distance function D_f in the f space
  3. Guarantee correctness: prove that D_f <= D
  4. Apply a SAM (e.g., R-tree) to index the f-dimensional vectors
  5. Apply the Search Algorithm to eliminate false drops

11 Search Algorithm
• Problem: retrieve all objects satisfying D(Q,O) < e
  • Retrieve the points satisfying D_f(Q_i,O_j) < e
  • Retrieve the actual objects S
  • Keep only those satisfying D(Q,S) < e (discard false alarms)
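The filter-and-refine structure of this search can be sketched as follows. Everything here is a toy stand-in: `features` simply keeps the first f coordinates (so that D_f <= D holds trivially), and a linear scan replaces the SAM search; neither is the feature extraction or the index used in the slides.

```python
import math


def dist(a, b):
    """Actual distance D: Euclidean over all coordinates."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def features(obj, f=2):
    """Hypothetical F( ): keep only the first f coordinates."""
    return obj[:f]


def feature_dist(fa, fb):
    """D_f: Euclidean over the kept coordinates, hence D_f <= D."""
    return dist(fa, fb)


def gemini_range_query(query, objects, eps, f=2):
    fq = features(query, f)
    # Filter step: a linear scan stands in for the SAM search.
    candidates = [o for o in objects if feature_dist(fq, features(o, f)) < eps]
    # Refine step: compare the actual objects and discard false alarms.
    answers = [o for o in candidates if dist(query, o) < eps]
    return candidates, answers


objects = [(0, 0, 0, 0), (1, 0, 0, 0), (0, 1, 3, 0), (4, 4, 0, 0)]
candidates, answers = gemini_range_query((0, 0, 0, 0), objects, eps=1.5)
```

Here `(0, 1, 3, 0)` passes the filter because its first two coordinates are close to the query, and is then discarded as a false alarm in the refine step.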

12 Lower Bounding
• Lemma: to guarantee no false dismissals, F( ) should satisfy
  • D_f(Q,O_i) <= D(Q,O_i) for all Q, O_i
• Proof: show that if an object qualifies for the query, it will be retrieved in the feature space
  • If O_i qualifies then D(Q,O_i) <= e; since D_f(Q,O_i) <= D(Q,O_i), we have D_f(Q,O_i) <= e, so O_i is retrieved

13 Indexing 1D Signals
• Find all signals S = (s_1, s_2, …, s_n) within distance e from Q = (q_1, q_2, …, q_n)
  • D(Q,S) < e
  • s_i, q_i: amplitudes at time i
• D is defined as the Euclidean distance
  $D(Q,S) = \left(\sum_{i=1}^{n} |q_i - s_i|^2\right)^{1/2}$
• Apply GEMINI
  • But how are F( ) and D_f( ) defined?

14 Definition of F, D_f
• The DFT maps a signal s = (s_1, s_2, …, s_n) to its frequency spectrum S = (S_1, S_2, …, S_n)
• F( ) takes the first f_c Fourier coefficients
  • f_c: "cut-off" frequency (e.g., f_c = 5)
• Signals become points in an f = 2 f_c dimensional space (because the coefficients are complex numbers)
• D_f is defined as the Euclidean distance over the first f_c coefficients
  $D_f(Q,S) = \left(\sum_{i=1}^{f_c} |Q_i - S_i|^2\right)^{1/2}$
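A minimal sketch of this mapping, assuming a unitary DFT (scaled by 1/sqrt(n)) computed naively with the standard library. The function names are illustrative and the O(n^2) transform is for clarity only; an FFT would be used in practice.

```python
import cmath
import math


def dft(signal):
    """Unitary DFT (1/sqrt(n) scaling), so Parseval holds without extra factors."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
        / math.sqrt(n)
        for k in range(n)
    ]


def features(signal, fc):
    """F( ): real and imaginary parts of the first fc coefficients (2 * fc numbers)."""
    point = []
    for coeff in dft(signal)[:fc]:
        point.extend((coeff.real, coeff.imag))
    return point


def euclidean(a, b):
    """Used both as D (on signals) and as D_f (on feature points)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


s = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0]
q = [1.0, 2.5, 3.0, 3.5, 3.0, 2.0, 0.5, 0.0]
fs, fq = features(s, fc=2), features(q, fc=2)
d_f = euclidean(fs, fq)   # distance in the 4-dimensional feature space
d = euclidean(s, q)       # distance between the original signals
```

With f_c = 2 each signal becomes a point in 4 dimensions, and `d_f` never exceeds `d`; keeping all n coefficients makes the two distances equal.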

15 D_f Lower Bounds D
• Let S, Q be the DFTs of s, q
• Parseval's theorem: the energy in the time and frequency domains is the same
  $\sum_{i=1}^{n} |s_i|^2 = \sum_{i=1}^{n} |S_i|^2$
• This implies that $\sum_{i=1}^{n} |s_i - q_i|^2 = \sum_{i=1}^{n} |S_i - Q_i|^2$ and hence D(q,s) = D(Q,S)
• D_f(Q,S) <= D(q,s) because D_f is computed using only f_c <= n terms
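The argument can be written as one chain, assuming the DFT is normalized so that it preserves energy (a unitary transform) and using its linearity, so that the DFT of q - s is Q - S:

```latex
\begin{align*}
D^2(q,s) &= \sum_{t=1}^{n} |q_t - s_t|^2
          = \sum_{i=1}^{n} |Q_i - S_i|^2
          && \text{(Parseval applied to } q - s\text{)} \\
         &\ge \sum_{i=1}^{f_c} |Q_i - S_i|^2
          = D_f^2(Q,S)
          && \text{(the dropped terms are non-negative)}
\end{align*}
```

Taking square roots gives the lower-bounding condition, so a range query in the feature space cannot miss a qualifying signal.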

16 Experiments
• Faster than sequential scanning for all set sizes
• Slower but more accurate for more coefficients
• The trade-off reaches an equilibrium for f = 3 or 4

17 Intuition
• For the majority of 1D signals there will be a few frequencies with high amplitudes
• If we index only the first few f_c coefficients (f_c < 5 or 10), we shall have only a few false drops
• R-trees can handle up to 20 dimensions for point data

18 NN Queries [Korn et al. 98]
• Find the k NNs of query Q:
  1. Search the SAM to find the k NNs [e.g., Rous95] using D_f
  2. Compute D for all these k objects
  3. Let E = max{D(q,s_i)}, 1 <= i <= k
  4. Issue a range query D_f(q,s) <= E on the SAM and retrieve a new set of objects
  5. Compute their actual distances D(q,s)
  6. Output the nearest k objects
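The six steps can be sketched as below. As before, the feature mapping (keep the first f coordinates) and the linear scans are hypothetical stand-ins for a real F( ) and for the SAM; only the control flow follows the slide.

```python
import heapq
import math


def dist(a, b):
    """Actual distance D (Euclidean over all coordinates)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def feat(obj, f=1):
    """Hypothetical F( ): keep the first f coordinates, so D_f <= D."""
    return obj[:f]


def knn(query, objects, k, f=1):
    fq = feat(query, f)

    def d_f(o):
        return dist(fq, feat(o, f))

    # Step 1: k nearest neighbours in feature space (a scan stands in for the SAM).
    candidates = heapq.nsmallest(k, objects, key=d_f)
    # Steps 2-3: actual distances of the candidates and their maximum E.
    e_max = max(dist(query, o) for o in candidates)
    # Step 4: range query with radius E in feature space.
    in_range = [o for o in objects if d_f(o) <= e_max]
    # Steps 5-6: actual distances of the retrieved objects, keep the k nearest.
    return heapq.nsmallest(k, in_range, key=lambda o: dist(query, o))


objects = [(0.1, 9.0), (0.2, 8.0), (1.0, 0.0), (1.5, 0.5), (7.0, 7.0)]
query = (0.0, 0.0)
result = knn(query, objects, k=2)
brute = heapq.nsmallest(2, objects, key=lambda o: dist(query, o))
```

In this example the two feature-space neighbours of step 1 are poor answers (their second coordinate is far from the query), but the range query of step 4 recovers the true neighbours, so `result` matches the brute-force answer.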

19 Correctness of NN Algorithm
• Lemma: the algorithm has no misses
• Proof: let s_k be the k-th NN of q and s_l its l-th NN (l < k), so that D(q,s_l) <= D(q,s_k); show that s_l is retrieved too
  • If the algorithm did not retrieve s_l, then the range query (step 4) has missed it: D_f(q,s_l) > E
  • From lower bounding: D(q,s_l) >= D_f(q,s_l) > E (*)
  • However, the k objects of step 2 all lie within distance E of q, so D(q,s_l) <= D(q,s_k) <= E, which contradicts (*)

20 Partial Matching [Faloutsos 94]
• Problem: given N data sequences S_1, S_2, …, S_N and a query Q, locate data subsequences that match a query subsequence
  • locate stock prices with similar monthly patterns of growth
  • extract f features, apply a SAM, etc.

21 Methodology
• Locate the matching window of length w on a signal (length(S) - w + 1 positions)
• Assume a minimum query length w
  • the method handles any query of length at least w
  • shorter queries are of no interest
• Longer queries are split into queries of length w
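A small sketch of the two operations mentioned here: enumerating all windows of length w on a signal, and splitting a long query into pieces of length w. The function names are illustrative, and dropping a trailing remainder shorter than w is an assumption of this sketch, since the slide does not say how it is handled.

```python
def sliding_windows(signal, w):
    """All len(signal) - w + 1 windows of length w, each with its start offset."""
    return [(i, signal[i:i + w]) for i in range(len(signal) - w + 1)]


def split_query(query, w):
    """Split a long query into consecutive, non-overlapping pieces of length w.

    Assumption: a trailing remainder shorter than w is dropped.
    """
    return [query[i:i + w] for i in range(0, len(query) - w + 1, w)]


signal = [3, 1, 4, 1, 5, 9, 2, 6]
windows = sliding_windows(signal, w=3)
pieces = split_query([1, 2, 3, 4, 5, 6, 7], w=3)
```

A signal of length 8 with w = 3 yields 8 - 3 + 1 = 6 windows; each window would then be mapped to a feature point.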

22 Splitting a Query
• Mapping sequences S = (s_1, s_2, s_3) and S' = (s'_1, s'_2) and query Q = (q_1, q_2)
[Figure: the points s_1, s_2, s_3, s'_1, s'_2 and the query points q_1, q_2, each with radius e, in the feature space with axes F_1 and F_2]

23 Indexing Subsequences
• I-naive method: index all w-trails
  • Inefficient in terms of space and speed
  • 1:f increase in storage; tall, slow R-tree
• ST-index: index the w-trails in groups
  • Subsequent trails are similar
  • Grouping in the f-dimensional feature space
  • Index rectangles containing similar trails

24 Grouping of Subsequences
• Organize the w-trails in the f space in rectangles so that disk accesses are minimized
• Fixed number of points per rectangle, but which is the optimal number?
• Smaller rectangles, fewer disk accesses
  • a rectangle with sides L = (l_1, l_2, …, l_n) causes Π(l_i + 0.5) accesses
  • a rectangle containing m points causes Π(l_i + 0.5)/m accesses per point
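The two cost estimates are simple enough to compute directly. In this sketch the side lengths are assumed to be expressed as fractions of a normalized data space, and the function names are illustrative.

```python
import math


def disk_accesses(sides):
    """Estimated accesses for a rectangle with side lengths l_1..l_n: prod(l_i + 0.5)."""
    return math.prod(side + 0.5 for side in sides)


def cost_per_point(sides, m):
    """Estimated accesses per point for a rectangle holding m points."""
    return disk_accesses(sides) / m


small = cost_per_point([0.5, 0.5], m=4)   # (1.0 * 1.0) / 4
large = cost_per_point([1.5, 1.5], m=4)   # (2.0 * 2.0) / 4
```

For the same number of points, the smaller rectangle is four times cheaper per point, which is the motivation for grouping points adaptively rather than by a fixed count.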

25 I-Adaptive Algorithm
• Map the points of the w-trails to rectangles in the f space
• Assign the first point of a w-trail to a rectangle
• For each successive point, if it increases the (per-point) cost of the rectangle, start a new rectangle; else include it in the same rectangle
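A greedy grouping along these lines might look like the sketch below. It assumes that "cost" means the per-point estimate Π(l_i + 0.5)/m of the bounding rectangle, and that points are considered strictly in trail order; the names are hypothetical and a non-empty trail is assumed.

```python
import math


def cost_per_point(points):
    """prod(l_i + 0.5) / m for the bounding rectangle of the given points."""
    sides = [max(coords) - min(coords) for coords in zip(*points)]
    return math.prod(side + 0.5 for side in sides) / len(points)


def group_trail(trail):
    """Greedily split a trail (list of feature points) into consecutive groups."""
    groups = [[trail[0]]]
    for point in trail[1:]:
        current = groups[-1]
        if cost_per_point(current + [point]) > cost_per_point(current):
            groups.append([point])   # the point would raise the per-point cost
        else:
            current.append(point)    # cheap to absorb: keep growing this rectangle
    return groups


trail = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (3.0, 3.0), (3.1, 3.0)]
groups = group_trail(trail)
sizes = [len(g) for g in groups]
```

The jump from (0.2, 0.1) to (3.0, 3.0) would inflate the first rectangle, so a new one is started there, giving groups of 3 and 2 points.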

26 Naïve Method
• Fixed number of points per rectangle

27 I-Adaptive Method
• Variable number of points per rectangle
• Smaller rectangles, fewer disk accesses

28 Range Queries [Petrakis 02]
• Input: query Q, distances D, D_f, tolerance e
• Output: signals S satisfying D(Q,S) <= e
  1. Decompose Q = (q_1, q_2, …, q_n)
  2. Apply the range query D_f(q_i,s_j) <= e for each q_i and store the results in A_i
  3. Combine the sets A_i into a candidate set A
  4. For each S in A compute D(Q,S)
  5. Output the sequences satisfying D(Q,S) <= e

29 NN Queries [Petrakis 02]
• Input: query Q, distances D, D_f, number k
• Output: the k sequences most similar to Q
  1. Decompose Q = (q_1, q_2, …, q_n)
  2. Apply a k-NN query for each q_i
     • Retrieve k distinct w-trails (incremental k-NN search) [Hjaltason 99]
     • Compute e_i, their maximum distance from Q
  3. Compute e = min{e_i}
  4. Apply a range query D(Q,S) <= e
  5. Output the k sequences closest to Q

30 References
• R. Agrawal, C. Faloutsos, A. Swami, "Efficient Similarity Search in Sequence Databases", Proc. of FODO Conf., Oct. 1993
• C. Faloutsos, M. Ranganathan, Y. Manolopoulos, "Fast Subsequence Matching in Time-Series Databases", Proc. of SIGMOD, May 1994
• F. Korn, N. Sidiropoulos, C. Faloutsos, E. Siegel, Z. Protopapas, "Fast and Effective Retrieval of Medical Tumor Shapes", IEEE TKDE, Vol. 11, 1998
• E.G.M. Petrakis, "Fast Retrieval by Spatial Structure in Image DataBases", Journal of Visual Languages and Computing, 2002 (to appear)
• N. Roussopoulos, S. Kelley, F. Vincent, "Nearest-Neighbor Queries", Proc. ACM SIGMOD, May 1995
• G. R. Hjaltason and H. Samet, "Distance Browsing in Spatial Databases", ACM Trans. on Database Systems, 24(2):265–318, 1999

