Published by Yvonne Wragg. Modified over 9 years ago.
1
Similarity Search on Bregman Divergence: Towards Non-Metric Indexing
Zhenjie Zhang, Beng Chin Ooi, Srinivasan Parthasarathy, Anthony K. H. Tung
2
Metric vs. Non-Metric
Euclidean distance dominates DB queries
Similarity in human perception
Metric distance is not enough!
2015-4-29 Similarity Search on Bregman Divergence: Towards Non-Metric Indexing 2
3
Outline
Bregman Divergence
Solution
  Basic solution
  Better pruning bounds
  Query distribution
Experiments
Conclusion
4
Bregman Divergence
[Figure: a convex function f(x) with the points (p, f(p)) and (q, f(q)) marked. Labels: Euclidean dist. between q and p, Bregman divergence D_f(p,q), h.]
5
Bregman Divergence: Mathematical Interpretation
The distance between p and q is defined as the difference between f(p) and the first-order Taylor expansion of f at q, evaluated at p:
$D_f(p,q) = f(p) - \left[ f(q) + \langle \nabla f(q),\, p - q \rangle \right]$
Here f(p) is the original f(x), and the bracketed term is the first-order Taylor expansion of f(x) at q.
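The definition on this slide, D_f(p,q) = f(p) - f(q) - <grad f(q), p - q>, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code; the names `bregman`, `sq`, and `grad_sq` are hypothetical, and the caller is assumed to supply the gradient of f by hand.

```python
def bregman(f, grad_f, p, q):
    """D_f(p, q) = f(p) - [f(q) + <grad f(q), p - q>]."""
    g = grad_f(q)
    # First-order Taylor expansion of f around q, evaluated at p.
    taylor_at_q = f(q) + sum(gi * (pi - qi) for gi, pi, qi in zip(g, p, q))
    return f(p) - taylor_at_q


# Example generator (an assumption for illustration): f(x) = sum of squares.
def sq(x):
    return sum(xi * xi for xi in x)


def grad_sq(x):
    return [2.0 * xi for xi in x]


d = bregman(sq, grad_sq, [1.0, 2.0], [3.0, 1.0])
```

With this choice of f the divergence reduces to the squared Euclidean distance between p and q.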
6
Bregman Divergence: General Properties
Uniqueness: a function f(x) uniquely determines D_f(p,q)
Non-negativity: D_f(p,q) ≥ 0 for any p, q
Identity: D_f(p,p) = 0 for any p
Symmetry and triangle inequality: do NOT hold in general
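A small numeric check of these properties, as a sketch. The scalar KL form below includes the extra terms -p + q that the definition produces for f(x) = x log x; they cancel when p and q are probability vectors. The sample values are arbitrary choices for illustration.

```python
import math


def kl(p, q):
    """Scalar generalized KL divergence generated by f(x) = x log x."""
    return p * math.log(p / q) - p + q


def sq_euclid(p, q):
    """Scalar squared Euclidean divergence generated by f(x) = x^2."""
    return (p - q) ** 2


samples = (0.1, 0.5, 2.0)

# Non-negativity and identity hold.
nonneg = all(kl(a, b) >= 0.0 for a in samples for b in samples)
identity = kl(0.5, 0.5) == 0.0

# Symmetry fails for KL: D(p, q) differs from D(q, p).
asymmetric = abs(kl(0.1, 0.9) - kl(0.9, 0.1)) > 1e-9

# Triangle inequality fails even for squared Euclidean:
# D(0, 2) = 4 exceeds D(0, 1) + D(1, 2) = 2.
triangle_violated = sq_euclid(0, 2) > sq_euclid(0, 1) + sq_euclid(1, 2)
```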
7
Examples
Distance               | f(x)            | D_f(p,q)                      | Usage
KL-Divergence          | x log x         | p log(p/q)                    | distribution, color histogram
Itakura-Saito Distance | -log x          | p/q - log(p/q) - 1            | signal, speech
Squared Euclidean      | x^2             | (p-q)^2                       | traditional queries
Von Neumann Entropy    | tr(X log X - X) | tr(X log X - X log Y - X + Y) | symmetric matrix
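The scalar rows of the table can be checked against the general definition f(p) - f(q) - f'(q)(p - q). The sketch below is illustrative only; the dictionary `GENERATORS` and the test values are assumptions. For KL, the definition yields the extra terms -p + q, which sum to zero over probability distributions and leave the p log(p/q) form shown in the table.

```python
import math


def bregman_1d(f, df, p, q):
    """Scalar Bregman divergence: f(p) - f(q) - f'(q) * (p - q)."""
    return f(p) - f(q) - df(q) * (p - q)


# (generator f, derivative f', closed form) for three rows of the table.
GENERATORS = {
    "kl": (lambda x: x * math.log(x),
           lambda x: math.log(x) + 1.0,
           lambda p, q: p * math.log(p / q) - p + q),
    "itakura_saito": (lambda x: -math.log(x),
                      lambda x: -1.0 / x,
                      lambda p, q: p / q - math.log(p / q) - 1.0),
    "squared_euclidean": (lambda x: x * x,
                          lambda x: 2.0 * x,
                          lambda p, q: (p - q) ** 2),
}

p, q = 0.3, 0.7
# Largest disagreement between the definition and the closed forms.
max_gap = max(abs(bregman_1d(f, df, p, q) - closed(p, q))
              for f, df, closed in GENERATORS.values())
```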
8
Why in a DB system?
Database applications
  Retrieval of similar images, speech signals, or time series
  Optimization on matrices in machine learning
  Efficiency is important!
Query types
  Nearest neighbor query
  Range query
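The two query types can be stated precisely with a full scan over the data, which is also the baseline an index has to beat. The sketch below assumes the KL divergence and a tiny hand-made data set; the function names are hypothetical.

```python
import heapq
import math


def kl_divergence(p, q):
    """Generalized KL divergence between two positive vectors."""
    return sum(pi * math.log(pi / qi) - pi + qi for pi, qi in zip(p, q))


def knn_query(points, q, k, div=kl_divergence):
    """Return the k points p with the smallest D_f(p, q), by full scan."""
    return heapq.nsmallest(k, points, key=lambda p: div(p, q))


def range_query(points, q, theta, div=kl_divergence):
    """Return every point p with D_f(p, q) <= theta, by full scan."""
    return [p for p in points if div(p, q) <= theta]


data = [(0.7, 0.3), (0.5, 0.5), (0.1, 0.9)]
query = (0.6, 0.4)
nearest = knn_query(data, query, k=1)
within = range_query(data, query, theta=0.05)
```

Both functions evaluate the divergence once per data point, so the cost grows linearly with the data set size.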
9
Euclidean Space: How to answer the queries
R-Tree
10
Euclidean Space: How to answer the queries
VA File
11
Our goal
Re-use the infrastructure of existing DB systems to support Bregman divergence
  Storage management
  Indexing structures
  Query processing algorithms
12
Outline
Bregman Divergence
Solution
  Basic solution
  Better pruning bounds
  Query distribution
Experiments
Conclusion
13
Basic Solution: Extended Space
Convex function f(x) = x^2
Original points:
point | A1  | A2
p     | 0   | 1
q     | 0.5 | 0.5
r     | 1   | 0.8
t     | 1.5 | 0.3
Extended points:
point | A1  | A2  | A3
p+    | 0   | 1   | 1
q+    | 0.5 | 0.5 | 0.5
r+    | 1   | 0.8 | 1.64
t+    | 1.5 | 0.3 | 3.15
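A sketch of the extension step, assuming the extra attribute A3 is the sum of f over the original attributes (A3 = A1^2 + A2^2 for f(x) = x^2). That rule is an inference from the rows for p and r on this slide, not something the slide states; the helper name `extend` is hypothetical.

```python
def extend(point, f=lambda x: x * x):
    """Append sum_i f(x_i) as one extra coordinate (assumed extension rule)."""
    return tuple(point) + (sum(f(x) for x in point),)


# Rows p and r from the slide.
points = {"p": (0.0, 1.0), "r": (1.0, 0.8)}
extended = {name + "+": extend(x) for name, x in points.items()}
```

The extended points have d + 1 coordinates and can be handed to any index built for ordinary vectors.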
14
Basic Solution: After the extension
Index extended points with R-Tree or VA File
Re-use existing algorithms with new lower and upper bound computation
15
How to improve?
Reformulation of Bregman divergence
Tighter bounds are derived
No change to index construction or query processing algorithms
16
A New Formulation
[Figure: labels q, p, D_f(p,q)+Δ, query vector v_q, D*_f(p,q), h, h'.]
17
Math. Interpretation: Reformulation of similarity search queries
k-NN query: query q, data set P, divergence D_f
  Find the point p in P that minimizes the divergence to q
Range query: query q, threshold θ, data set P
  Return every point p in P whose divergence to q is at most θ
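One way to see why a reformulation helps: expanding the definition gives D_f(p,q) = [f(p) - <grad f(q), p>] + [<grad f(q), q> - f(q)]. The first bracket is an inner product between the extended point p+ = (p, f(p)) and a vector that depends only on the query, and the second bracket is a query-only constant. The sketch below checks this identity numerically. The exact formulas on the slide are not recoverable from the transcript, so the form v_q = (-grad f(q), 1) and all function names here are assumptions.

```python
def f(x):
    """Generator f(x) = sum_i x_i^2 (squared Euclidean case)."""
    return sum(xi * xi for xi in x)


def grad_f(x):
    return [2.0 * xi for xi in x]


def divergence(p, q):
    g = grad_f(q)
    return f(p) - f(q) - sum(gi * (pi - qi) for gi, pi, qi in zip(g, p, q))


def extend(p):
    """p+ = (p, f(p))."""
    return list(p) + [f(p)]


def query_vector(q):
    """v_q = (-grad f(q), 1); an assumed form for illustration."""
    return [-gi for gi in grad_f(q)] + [1.0]


def query_offset(q):
    """Query-only constant: <grad f(q), q> - f(q)."""
    return sum(gi * qi for gi, qi in zip(grad_f(q), q)) - f(q)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


q = [0.5, 0.5]
data = [[0.0, 1.0], [1.0, 0.8], [1.5, 0.3]]
# Difference between the divergence and its linear form in the extended space.
gaps = [abs(divergence(p, q) - (dot(query_vector(q), extend(p)) + query_offset(q)))
        for p in data]
```

Because the offset is the same for every data point, ranking points by the inner product alone gives the same k-NN answer, and a range query becomes a threshold on that inner product.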
18
Naïve Bounds
Check the corners of the bounding rectangles
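If the quantity to bound is a linear function <v, x> of the extended point x (an assumption about the reformulated objective), its minimum and maximum over a bounding rectangle are attained at corners. The sketch below finds those corners coordinate by coordinate and checks the result against brute-force enumeration; the example box and vector are made up for illustration.

```python
from itertools import product


def box_bounds(v, lo, hi):
    """Min and max of <v, x> over the box lo <= x <= hi, in O(n) time."""
    lower = sum(min(vi * a, vi * b) for vi, a, b in zip(v, lo, hi))
    upper = sum(max(vi * a, vi * b) for vi, a, b in zip(v, lo, hi))
    return lower, upper


def corner_bounds(v, lo, hi):
    """Same bounds by enumerating all 2^n corners (for checking only)."""
    values = [sum(vi * ci for vi, ci in zip(v, corner))
              for corner in product(*zip(lo, hi))]
    return min(values), max(values)


# Assumed example: the box enclosing extended points (0, 1, 1) and
# (1, 0.8, 1.64), with v = (-1, -1, 1) for query q = (0.5, 0.5), f(x) = x^2.
v = [-1.0, -1.0, 1.0]
lo = [0.0, 0.8, 1.0]
hi = [1.0, 1.0, 1.64]
fast = box_bounds(v, lo, hi)
slow = corner_bounds(v, lo, hi)
```

In this example the query-only constant is 0.5, so the bounds translate to a divergence range of about [-0.5, 1.34]. The negative lower bound shows how loose corner checking can be: most corners of the extended box do not correspond to any real point, because the last coordinate is tied to the others through f.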
19
Tighter Bounds
Take the curve f(x) into consideration
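One simple way to use the shape of f, sketched here for divergences that decompose per dimension: each scalar term is convex in p_i and is zero at p_i = q_i, so over an interval its minimum sits at q_i clamped into the interval and its maximum at an endpoint. This is an illustrative technique under those assumptions and may differ from the bounds derived by the authors; the function names and the example rectangle are hypothetical.

```python
import math


def itakura_saito_1d(p, q):
    """Scalar Itakura-Saito divergence, generated by f(x) = -log x."""
    return p / q - math.log(p / q) - 1.0


def rectangle_bounds(q, lo, hi, div_1d=itakura_saito_1d):
    """Exact min and max of sum_i div_1d(p_i, q_i) over lo <= p <= hi.

    Relies on div_1d(., q_i) being convex with its minimum at p_i = q_i.
    """
    lower = upper = 0.0
    for qi, a, b in zip(q, lo, hi):
        nearest = min(max(qi, a), b)  # clamp q_i into [a, b]
        lower += div_1d(nearest, qi)
        upper += max(div_1d(a, qi), div_1d(b, qi))
    return lower, upper


lower, upper = rectangle_bounds(q=[1.0, 4.0], lo=[0.5, 1.0], hi=[2.0, 2.0])
```

Unlike bounds taken from corners of an extended box, the lower bound here can never drop below zero.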
20
Query distribution: Distortion of rectangles
The difference between the maximum and minimum distances from points inside the rectangle to the query
21
Can we improve it more?
When building an R-Tree in Euclidean space
  Minimize the volume/edge length of MBRs
Does it remain valid?
22
Query distribution: Distortion of bounding rectangles
Invariant in Euclidean space (triangle inequality)
Query-dependent for Bregman divergence
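A one-dimensional sketch of this contrast, using the Itakura-Saito divergence as the Bregman example. The interval and the two query positions are arbitrary choices for illustration, and the helper assumes the scalar distance is convex in p with its minimum at p = q.

```python
import math


def itakura_saito_1d(p, q):
    """Scalar Itakura-Saito divergence, generated by f(x) = -log x."""
    return p / q - math.log(p / q) - 1.0


def euclidean_1d(p, q):
    """Ordinary one-dimensional Euclidean distance."""
    return abs(p - q)


def distortion(q, lo, hi, dist):
    """Max minus min distance from points in [lo, hi] to the query q."""
    nearest = min(max(q, lo), hi)  # closest point of the interval to q
    d_min = dist(nearest, q)
    d_max = max(dist(lo, q), dist(hi, q))
    return d_max - d_min


# Same interval [1, 2], two queries on either side of it.
is_left = distortion(0.1, 1.0, 2.0, itakura_saito_1d)
is_right = distortion(10.0, 1.0, 2.0, itakura_saito_1d)
eu_left = distortion(0.1, 1.0, 2.0, euclidean_1d)
eu_right = distortion(10.0, 1.0, 2.0, euclidean_1d)
```

For queries outside the interval, the Euclidean distortion equals the interval length in both cases, while the Itakura-Saito distortion differs by more than an order of magnitude between the two queries.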
23
Utilize Query Distribution
Summarize the query distribution with O(d) real numbers
Estimate the expected distortion of any bounding rectangle in O(d) time
Allows a better index to be constructed for both R-Tree and VA File
24
Outline
Bregman Divergence
Solution
  Basic solution
  Better pruning bounds
  Query distribution
Experiments
Conclusion
25
Experiments: Data Sets
KDD'99 data
  Network data: the proportion of packages in 72 different TCP/IP connection types
DBLP data
  Uses the co-authorship graph to generate the probabilities of the authors being related to 8 different areas
26
Experiments: Data Sets
Uniform synthetic data
  Synthetic data generated from a uniform distribution
Clustered synthetic data
  Synthetic data generated from a Gaussian mixture model
27
Experiments: Methods to compare
            | Basic | Improved Bounds | Query Distribution
R-Tree      | R     | R-B             | R-BQ
VA File     | V     | V-B             | V-BQ
Linear Scan | LS    |                 |
BB-Tree     | BBT   |                 |
28
Experiments: Index Construction Time
29
Experiments: Varying dimensionality
30
Experiments: Varying dimensionality (cont.)
31
Experiments: Varying k for nearest neighbor query
32
Conclusion
A general technique for similarity search on Bregman divergence
All techniques are based on the existing infrastructure of commercial databases
Extensive experiments compare the performance of R-Tree and VA File under different optimizations
33
Acknowledgment
Zhenjie Zhang, Anthony K. H. Tung, and Beng Chin Ooi were supported by Singapore NRF grant R-252-000-376-279.
Srinivasan Parthasarathy was supported by NSF IIS-0347662 (CAREER) and NSF CCF-0702587.
34
Q & A