Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.


1 Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

2 A Database of Hand Images: 4,128 images are generated for each hand shape. Total: 107,328 images.

3 Efficiency of the Chamfer Distance: Computing chamfer distances is slow. For images with d edge pixels: O(d log d) time. Comparing the input to the entire database takes over 4 minutes: 107,328 distances must be measured. [figure: input and model edge images]
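As a concrete illustration, here is a hypothetical chamfer-distance sketch over sets of edge-pixel coordinates; the toy point sets are made up. This brute-force version runs in O(|A|·|B|) time, whereas the O(d log d) figure on the slide assumes a distance-transform based implementation.

```python
import math

def directed_chamfer(A, B):
    """Mean distance from each edge pixel in A to its nearest edge pixel in B."""
    return sum(min(math.dist(a, b) for b in B) for a in A) / len(A)

def chamfer(A, B):
    """Symmetric chamfer distance: sum of the two directed distances."""
    return directed_chamfer(A, B) + directed_chamfer(B, A)

# toy edge-pixel sets standing in for two edge images
print(chamfer([(0, 0), (1, 0)], [(0, 0), (0, 1)]))  # -> 1.0
```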

4 The Nearest Neighbor Problem [figure: a database of objects]

5 The Nearest Neighbor Problem: Goal: find the k nearest neighbors of query q in the database.

6 The Nearest Neighbor Problem: Goal: find the k nearest neighbors of query q. Brute-force time is linear in: n (the size of the database), and the time it takes to measure a single distance.

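The brute-force search just described can be sketched in a few lines; the numeric toy data is made up for illustration.

```python
def knn_brute_force(q, database, dist, k=1):
    """Brute force: one exact distance per database object, so O(n) distance computations."""
    return sorted(database, key=lambda x: dist(q, x))[:k]

# toy example: objects are numbers, the exact distance is |a - b|
print(knn_brute_force(5, [1, 4, 6, 10], lambda a, b: abs(a - b), k=2))  # -> [4, 6]
```

With an expensive distance measure, those n distance computations dominate the query time, which is the cost the rest of the talk attacks.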

8 Examples of Expensive Measures: DNA and protein sequences: Smith-Waterman. Dynamic gestures and time series: Dynamic Time Warping. Edge images: chamfer distance, shape context distance. These measures are non-Euclidean, and sometimes non-metric.

9 Embeddings [figure: a database of objects x1, x2, x3, …, xn]

10 Embeddings [figure: an embedding F maps x1, …, xn to vectors in R^d]

11 Embeddings [figure: a query q arrives in the original space]

12 Embeddings [figure: F also maps q to a vector F(q) in R^d]

13 Embeddings: Measure distances between vectors (typically much faster). Caveat: the embedding must preserve similarity structure.

14 Reference Object Embeddings [figure: the original space X]

15 Reference Object Embeddings: r is a reference object in the original space X.

16 Reference Object Embeddings: r: reference object. Embedding: F(x) = D(x, r), where D is the distance measure in X.

17 Reference Object Embeddings: F maps the original space X to the real line. F(x) = D(x, r).

18 Reference Object Embeddings: F(r) = D(r, r) = 0.

19 Reference Object Embeddings: If a and b are similar, their distances to r are also similar (usually).


21 F(x) = D(x, Lincoln)
F(Sacramento)    = 1543
F(Las Vegas)     = 1232
F(Oklahoma City) =  437
F(Washington DC) = 1207
F(Jacksonville)  = 1344

22 F(x) = (D(x, LA), D(x, Lincoln), D(x, Orlando))
F(Sacramento)    = ( 386, 1543, 2920)
F(Las Vegas)     = ( 262, 1232, 2405)
F(Oklahoma City) = (1345,  437, 1291)
F(Washington DC) = (2657, 1207,  853)
F(Jacksonville)  = (2422, 1344,  141)

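Using the city table from the slide, a short sketch can find a city's nearest neighbor purely in the embedded space, with L1 distance standing in for the vector comparison:

```python
# embeddings copied from the slide: F(x) = (D(x, LA), D(x, Lincoln), D(x, Orlando))
F = {
    "Sacramento":    ( 386, 1543, 2920),
    "Las Vegas":     ( 262, 1232, 2405),
    "Oklahoma City": (1345,  437, 1291),
    "Washington DC": (2657, 1207,  853),
    "Jacksonville":  (2422, 1344,  141),
}

def l1(u, v):
    """L1 distance between embedded vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

def nearest(city):
    """Nearest other city under L1 distance in the embedded space."""
    return min((c for c in F if c != city), key=lambda c: l1(F[c], F[city]))

print(nearest("Sacramento"))  # -> Las Vegas
```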

24 Embedding Hand Images: F(x) = (C(x, R1), C(x, R2), C(x, R3)), where x is a hand image, C is the chamfer distance, and R1, R2, R3 are reference hand images.

25 Basic Questions: F(x) = (C(x, R1), C(x, R2), C(x, R3)). How many prototypes? Which prototypes? What distance should we use to compare vectors?

26 Some Easy Answers: How many prototypes? Pick the number manually. Which prototypes? Chosen randomly. What distance should we use to compare vectors? L1, or Euclidean.

27 Filter-and-Refine Retrieval: Embedding step: compute distances from the query to the reference objects, giving F(q). Filter step: find the top p matches of F(q) in vector space. Refine step: measure the exact distance from q to each of the top p matches.
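A minimal sketch of the three steps, assuming L1 distance in the vector space; the one-reference-object embedding and the numeric data are made up for illustration.

```python
def filter_and_refine(q, db, db_vectors, embed, exact_dist, vec_dist, p):
    """Return the index of the best match found by filter-and-refine retrieval."""
    Fq = embed(q)                        # embedding step: a few exact distances
    order = sorted(range(len(db)),       # filter step: cheap vector distances
                   key=lambda i: vec_dist(Fq, db_vectors[i]))
    candidates = order[:p]
    return min(candidates,               # refine step: only p exact distances
               key=lambda i: exact_dist(q, db[i]))

# toy setup: objects are numbers, exact distance |a - b|,
# one reference object r = 0, so F(x) = (|x|,)
db = [-5, 2, 3, 9]
embed = lambda x: (abs(x),)
vectors = [embed(x) for x in db]         # in practice precomputed offline
l1 = lambda u, v: sum(abs(a - b) for a, b in zip(u, v))

best = filter_and_refine(2.4, db, vectors, embed, lambda a, b: abs(a - b), l1, p=2)
print(db[best])  # -> 2
```

Only p exact distances are computed per query instead of n, which is where the speed-up comes from.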

28 Evaluating Embedding Quality: Embedding step: compute distances from the query to the reference objects, giving F(q). Filter step: find the top p matches of F(q) in vector space. Refine step: measure the exact distance from q to each of the top p matches. How often do we find the true nearest neighbor?

30 Evaluating Embedding Quality: How often do we find the true nearest neighbor? How many exact distance computations do we need?

33 Results: Chamfer Distance on Hand Images. [figure: query, nearest neighbor, database of 107,328 images] Brute-force retrieval time: 260 seconds.

34 Results: Chamfer Distance on Hand Images. Query set: 710 real images of hands. Database: 80,640 synthetic images of hands.
                  Brute force   Embeddings   Embeddings
Accuracy          100%          95%          100%
# of distances    80,640        1,866        24,650
Sec. per query    112           2.6          34
Speed-up factor   1             43           3.27

35 Ideal Embedding Behavior: Notation: NN(q) is the nearest neighbor of q. For any q: if a = NN(q), we want F(a) = NN(F(q)).

36 A Quantitative Measure: If b is not the nearest neighbor of q, F(q) should be closer to F(NN(q)) than to F(b). For how many triples (q, NN(q), b) does F fail?

37 A Quantitative Measure: [figure: an example embedding that fails on five triples]

38 Embeddings Seen As Classifiers: Classification task: is q closer to a or to b?

39 Embeddings Seen As Classifiers: Classification task: is q closer to a or to b? Any embedding F defines a classifier F'(q, a, b). F' checks whether F(q) is closer to F(a) or to F(b).

40 Classifier Definition: Classification task: is q closer to a or to b? Given an embedding F: X → R^d, define F'(q, a, b) = ||F(q) − F(b)|| − ||F(q) − F(a)||. F'(q, a, b) > 0 means "q is closer to a." F'(q, a, b) < 0 means "q is closer to b."

41 Classifier Definition: F'(q, a, b) = ||F(q) − F(b)|| − ||F(q) − F(a)||. Goal: build an F such that F' has a low error rate on triples of type (q, NN(q), b).
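A sketch of such a proximity classifier. The toy space (non-negative numbers with one reference object at 0) is deliberately chosen so that the 1D embedding preserves distance order, hence the measured error rate is zero; in a realistic space the error would be positive but, for a useful embedding, below 50%.

```python
from itertools import permutations

def f_prime(F, vec_dist, q, a, b):
    """Positive when the embedding places q closer to a than to b."""
    return vec_dist(F(q), F(b)) - vec_dist(F(q), F(a))

# toy space: non-negative numbers, exact distance |x - y|,
# one reference object r = 0, so F(x) = (|x - 0|,) = (x,)
points = [1, 2, 4, 7]
F = lambda x: (x,)
l1 = lambda u, v: sum(abs(p - s) for p, s in zip(u, v))

errors = 0
for q, a, b in permutations(points, 3):
    if abs(q - a) < abs(q - b):           # ground truth: a is closer to q
        if f_prime(F, l1, q, a, b) <= 0:  # the classifier disagrees
            errors += 1
print(errors)  # -> 0 for this (deliberately easy) example
```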

44 1D Embeddings as Weak Classifiers: 1D embeddings define weak classifiers: better than a random classifier (50% error rate). We can define many different classifiers, since every object in the database can be a reference object. Question: how do we combine many such weak classifiers into a single strong classifier? Answer: use AdaBoost, a machine learning method designed for exactly this problem.

45 Using AdaBoost: AdaBoost chooses 1D embeddings F1, …, Fd (each mapping the original space X to the real line) and weights them. Output: H = w1 F'1 + w2 F'2 + … + wd F'd. Goal: achieve low classification error. AdaBoost trains on triples chosen from the database.
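A minimal sketch of this training loop, assuming standard discrete AdaBoost; the 2D point database, reference objects, and training triples below are all invented for illustration, and a real implementation would sample triples rather than enumerate them.

```python
import math
from itertools import permutations

# toy database: two 2D clusters; the exact distance is Euclidean
db = [(0, 0), (1, 0), (0, 1), (3, 3), (4, 3), (3, 4)]
dist = lambda p, s: math.hypot(p[0] - s[0], p[1] - s[1])

def make_F(r):
    """1D reference-object embedding F(x) = dist(x, r)."""
    return lambda x: dist(x, r)

# training triples (q, a, b) where a is strictly closer to q than b is
triples = [(q, a, b) for q, a, b in permutations(db, 3)
           if dist(q, a) < dist(q, b)]

def weak_sign(f, q, a, b):
    # +1 if the 1D embedding f places q closer to a, else -1
    return 1 if abs(f(q) - f(b)) - abs(f(q) - f(a)) > 0 else -1

def weighted_error(f, w):
    # every triple has label +1 ("q is closer to a") by construction
    return sum(wi for wi, (q, a, b) in zip(w, triples)
               if weak_sign(f, q, a, b) != 1)

def adaboost(rounds=3):
    fs = [make_F(r) for r in db]          # every object is a candidate reference
    w = [1.0 / len(triples)] * len(triples)
    chosen = []
    for _ in range(rounds):
        best = min(fs, key=lambda f: weighted_error(f, w))
        err = weighted_error(best, w)
        if err >= 0.5:
            break                          # no weak classifier beats chance anymore
        err = max(err, 1e-9)               # avoid log(0) for a perfect weak classifier
        alpha = 0.5 * math.log((1 - err) / err)
        chosen.append((alpha, best))
        # reweight: misclassified triples (weak_sign = -1) gain weight
        w = [wi * math.exp(-alpha * weak_sign(best, q, a, b))
             for wi, (q, a, b) in zip(w, triples)]
        s = sum(w)
        w = [wi / s for wi in w]
    return chosen

chosen = adaboost(rounds=3)
print(len(chosen), [round(a, 2) for a, _ in chosen])
```

The returned (alpha, F) pairs are exactly the weights and 1D embeddings that the next slides assemble into the BoostMap embedding.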

46 From Classifier to Embedding: AdaBoost output: H = w1 F'1 + w2 F'2 + … + wd F'd. What embedding should we use? What distance measure should we use?

49 From Classifier to Embedding: BoostMap embedding: F(x) = (F1(x), …, Fd(x)). Distance measure: D((u1, …, ud), (v1, …, vd)) = Σ_{i=1..d} wi |ui − vi|. Claim: let q be closer to a than to b. H misclassifies the triple (q, a, b) if and only if, under distance measure D, F maps q closer to b than to a.

50 Proof:
H(q, a, b) = Σ_{i=1..d} wi F'i(q, a, b)
           = Σ_{i=1..d} wi (|Fi(q) − Fi(b)| − |Fi(q) − Fi(a)|)
           = Σ_{i=1..d} (wi |Fi(q) − Fi(b)| − wi |Fi(q) − Fi(a)|)
           = D(F(q), F(b)) − D(F(q), F(a))
           = F'(q, a, b)
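The identity can be checked numerically; the weights and embedded vectors below are hypothetical.

```python
# hypothetical weights and embedded vectors for a 3D BoostMap embedding
w  = [0.5, 1.0, 2.0]
Fq = [1.0, 4.0, 2.0]
Fa = [1.5, 3.0, 2.5]
Fb = [4.0, 4.5, 0.0]

def D(u, v):
    """Weighted L1 distance from the BoostMap embedding."""
    return sum(wi * abs(ui - vi) for wi, ui, vi in zip(w, u, v))

# H(q, a, b): weighted sum of the 1D classifier outputs F'_i
H = sum(wi * (abs(qi - bi) - abs(qi - ai))
        for wi, qi, ai, bi in zip(w, Fq, Fa, Fb))

print(H, D(Fq, Fb) - D(Fq, Fa))  # the two quantities coincide
```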

56 Significance of Proof: AdaBoost optimizes a direct measure of embedding quality. We have converted a database indexing problem into a machine learning problem.

57 Results: Chamfer Distance on Hand Images. [figure: query, nearest neighbor, database of 80,640 images] Brute-force retrieval time: 112 seconds.

58 Results: Chamfer Distance on Hand Images. Query set: 710 real images of hands. Database: 80,640 synthetic images of hands.
                  Brute force   Random reference objects   BoostMap
Accuracy          100%          95%                        95%
# of distances    80,640        1,866                      450
Sec. per query    112           2.6                        0.63
Speed-up factor   1             43                         179

59 Results: Chamfer Distance on Hand Images. Query set: 710 real images of hands. Database: 80,640 synthetic images of hands.
                  Brute force   Random reference objects   BoostMap
Accuracy          100%          100%                       100%
# of distances    80,640        24,950                     5,995
Sec. per query    112           34                         13.5
Speed-up factor   1             3.23                       8.3

