1 Learning Embeddings for Similarity-Based Retrieval Vassilis Athitsos Computer Science Department Boston University.


2 Overview Background on similarity-based retrieval and embeddings. BoostMap. Embedding optimization using machine learning. Query-sensitive embeddings. Ability to preserve non-metric structure. Cascades of embeddings. Speeding up nearest neighbor classification.

3 Problem Definition. Database of n objects: x1, x2, x3, …, xn.

4 Problem Definition. Database of n objects: x1, x2, x3, …, xn, and a query q. Goal: find the k nearest neighbors of query q.

5 Problem Definition. Goal: find the k nearest neighbors of query q. Brute-force time is linear in: n (the size of the database), and the time it takes to measure a single distance.


7 Applications. Nearest neighbor classification. Similarity-based retrieval: image/video databases, biological databases, time series, web pages, browsing music or movie catalogs. Example domains: faces, letters/digits, handshapes.

8 Expensive Distance Measures. Comparing d-dimensional vectors (x1, x2, x3, x4, …, xd) and (y1, y2, y3, y4, …, yd) is efficient: O(d) time.

9 Expensive Distance Measures. Comparing d-dimensional vectors is efficient: O(d) time. Comparing strings of length d with the edit distance is more expensive: O(d^2) time. Reason: computing the optimal alignment (e.g., "immigration" vs. "imitation").

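The quadratic alignment cost can be made concrete with the standard dynamic program for edit distance. This is an illustrative sketch, not code from the talk:

```python
def edit_distance(s, t):
    """Classic O(len(s) * len(t)) dynamic program -- the O(d^2) alignment cost."""
    m, n = len(s), len(t)
    prev = list(range(n + 1))  # row 0: distances from the empty prefix of s
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(prev[j] + 1,                            # delete s[i-1]
                         cur[j - 1] + 1,                         # insert t[j-1]
                         prev[j - 1] + (s[i - 1] != t[j - 1]))   # match / substitute
        prev = cur
    return prev[n]

print(edit_distance("immigration", "imitation"))  # -> 3
```

Each cell depends only on the previous row, so two rows suffice; the full d-by-d table is what makes the distance quadratic rather than linear.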

11 Matching Handwritten Digits


14 Shape Context Distance Proposed by Belongie et al. (2001). Error rate: 0.63%, with database of 20,000 images. Uses bipartite matching (cubic complexity!). 22 minutes/object, heavily optimized. Result preview: 5.2 seconds, 0.61% error rate.

15 More Examples. DNA and protein sequences: Smith-Waterman. Time series: Dynamic Time Warping. Probability distributions: the Kullback-Leibler divergence. These measures are non-Euclidean, and sometimes non-metric.
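Dynamic Time Warping is itself a quadratic alignment computation, much like edit distance. A minimal sketch (not from the talk, and without the usual warping-window constraints):

```python
def dtw(a, b):
    """O(len(a) * len(b)) dynamic program over warping paths."""
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # A step extends the path diagonally, or repeats an element of a or b.
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]

print(dtw([1, 2, 3], [1, 2, 2, 3]))  # -> 0.0 (the repeated 2 is absorbed by warping)
```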

16 Indexing Problem. Vector indexing methods are NOT applicable: PCA; R-trees, X-trees, SS-trees; VA-files; Locality Sensitive Hashing.

17 Metric Methods Pruning-based methods. VP-trees, MVP-trees, M-trees, Slim-trees,… Use triangle inequality for tree-based search. Filtering methods. AESA, LAESA… Use the triangle inequality to compute upper/lower bounds of distances. Suffer from curse of dimensionality. Heuristic in non-metric spaces. In many datasets, bad empirical performance.

18 Embeddings. An embedding F maps each database object x1, x2, x3, …, xn to a vector F(x) in R^d.

19 Embeddings. A query q arrives, as an object of the original space.

20 Embeddings. The query is also mapped: F(q) in R^d.

21 Embeddings. Measure distances between vectors (typically much faster).

22 Embeddings. Measure distances between vectors (typically much faster). Caveat: the embedding must preserve similarity structure.

23 Reference Object Embeddings. Start with the database.

24 Reference Object Embeddings. Choose reference objects r1, r2, r3 from the database.

25 Reference Object Embeddings. For each object x: F(x) = (D(x, r1), D(x, r2), D(x, r3)).

26 F(x) = (D(x, LA), D(x, Lincoln), D(x, Orlando)) F(Sacramento)....= ( 386, 1543, 2920) F(Las Vegas).....= ( 262, 1232, 2405) F(Oklahoma City).= (1345, 437, 1291) F(Washington DC).= (2657, 1207, 853) F(Jacksonville)..= (2422, 1344, 141)
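The construction on the preceding slides can be sketched generically. The helper name and the toy one-dimensional demo below are illustrative, not from the talk:

```python
def reference_embedding(refs, dist):
    """Embed x as its vector of distances to the chosen reference objects."""
    return lambda x: tuple(dist(x, r) for r in refs)

# Toy demo: points on a line, absolute difference as the (stand-in) expensive distance.
F = reference_embedding([0.0, 10.0, 25.0], lambda x, y: abs(x - y))
print(F(3.0))  # -> (3.0, 7.0, 22.0)
```

In the city example, the "distance function" would be the table of road distances to LA, Lincoln, and Orlando.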

27 Existing Embedding Methods FastMap, MetricMap, SparseMap, Lipschitz embeddings. Use distances to reference objects (prototypes). Question: how do we directly optimize an embedding for nearest neighbor retrieval? FastMap & MetricMap assume Euclidean properties. SparseMap optimizes stress. Large stress may be inevitable when embedding non-metric spaces into a metric space. In practice often worse than random construction.

28 BoostMap. BoostMap: A Method for Efficient Approximate Similarity Rankings. Athitsos, Alon, Sclaroff, and Kollios, CVPR 2004. BoostMap: An Embedding Method for Efficient Nearest Neighbor Retrieval. Athitsos, Alon, Sclaroff, and Kollios, PAMI 2007 (to appear).

29 Key Features of BoostMap Maximizes amount of nearest neighbor structure preserved by the embedding. Based on machine learning, not on geometric assumptions. Principled optimization, even in non-metric spaces. Can capture non-metric structure. Query-sensitive version of BoostMap. Better results in practice, in all datasets we have tried.

30 Ideal Embedding Behavior. F maps the original space X into R^d. For any query q: we want F(NN(q)) = NN(F(q)).


33 Ideal Embedding Behavior. For any query q: we want F(NN(q)) = NN(F(q)). For any database object b other than NN(q), we want F(q) closer to F(NN(q)) than to F(b).

34 Embeddings Seen As Classifiers. Consider triples (q, a, b) such that: q is a query object, a = NN(q), and b is a database object. Classification task: is q closer to a or to b?

35 Embeddings Seen As Classifiers. For triples (q, a, b) as above (q a query object, a = NN(q), b a database object), any embedding F defines a classifier F'(q, a, b): F' checks whether F(q) is closer to F(a) or to F(b).

36 Classifier Definition. Given an embedding F: X -> R^d: F'(q, a, b) = ||F(q) - F(b)|| - ||F(q) - F(a)||. F'(q, a, b) > 0 means "q is closer to a." F'(q, a, b) < 0 means "q is closer to b."
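The classifier on this slide can be written directly. This sketch fixes the norm to L1, since that is the distance BoostMap later assigns to the embedding (the slide itself leaves the norm generic):

```python
def f_prime(F, q, a, b):
    """F'(q, a, b) = ||F(q) - F(b)|| - ||F(q) - F(a)|| under the L1 norm.
    Positive: the embedding ranks q closer to a; negative: closer to b."""
    l1 = lambda u, v: sum(abs(ui - vi) for ui, vi in zip(u, v))
    return l1(F(q), F(b)) - l1(F(q), F(a))

# Identity embedding on the line: q=0 is 1 away from a and 5 away from b.
print(f_prime(lambda x: (x,), 0.0, 1.0, 5.0))  # -> 4.0 (correctly ranks q closer to a)
```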

37 Key Observation. If F(q) is closer to F(b) than to F(NN(q)), then the triple (q, NN(q), b) is misclassified. If classifier F' is perfect, then for every q, F(NN(q)) = NN(F(q)).

38 Key Observation. Classification error on triples (q, NN(q), b) measures how well F preserves nearest neighbor structure.

39 Optimization Criterion. Goal: construct an embedding F optimized for k-nearest neighbor retrieval. Method: maximize accuracy of F' on triples (q, a, b) of the following type: q is any object; a is a k-nearest neighbor of q in the database; b is in the database, but NOT a k-nearest neighbor of q. If F' is perfect on those triples, then F perfectly preserves k-nearest neighbors.

40 1D Embeddings as Weak Classifiers 1D embeddings define weak classifiers. Better than a random classifier (50% error rate).

41 [Figure: cities (Lincoln, Chicago, Detroit, New York, LA, Cleveland) projected onto the real line by a 1D embedding.]

42 1D Embeddings as Weak Classifiers 1D embeddings define weak classifiers. Better than a random classifier (50% error rate). We can define lots of different classifiers. Every object in the database can be a reference object.

43 1D Embeddings as Weak Classifiers 1D embeddings define weak classifiers. Better than a random classifier (50% error rate). We can define lots of different classifiers. Every object in the database can be a reference object. Question: how do we combine many such classifiers into a single strong classifier?

44 1D Embeddings as Weak Classifiers 1D embeddings define weak classifiers. Better than a random classifier (50% error rate). We can define lots of different classifiers. Every object in the database can be a reference object. Question: how do we combine many such classifiers into a single strong classifier? Answer: use AdaBoost. AdaBoost is a machine learning method designed for exactly this problem.

45 Using AdaBoost. 1D embeddings F1, F2, …, Fn map the original space X to the real line. AdaBoost chooses 1D embeddings and weighs them. Goal: achieve low classification error. AdaBoost trains on triples chosen from the database. Output: H = w1 F'1 + w2 F'2 + … + wd F'd.
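A heavily simplified sketch of this training loop: each reference object defines a 1D embedding and hence a weak triple-classifier, and AdaBoost greedily picks and weighs them. Function names are hypothetical, and real BoostMap training includes details (line projections, triple sampling, stopping criteria) omitted here:

```python
import math

def boostmap_train(db, dist, triples, rounds=5):
    """Greedy AdaBoost over weak classifiers defined by reference objects r.
    Each triple (q, a, b) is labeled +1: q should be ranked closer to a than to b."""
    weights = [1.0 / len(triples)] * len(triples)
    model = []  # list of (reference object, alpha)
    for _ in range(rounds):
        best = None
        for r in db:
            # Weak prediction: sign of |F_r(q)-F_r(b)| - |F_r(q)-F_r(a)|, with F_r(x) = dist(x, r).
            preds = [1 if abs(dist(q, r) - dist(b, r)) > abs(dist(q, r) - dist(a, r)) else -1
                     for (q, a, b) in triples]
            err = sum(w for w, p in zip(weights, preds) if p != 1)
            if best is None or err < best[0]:
                best = (err, r, preds)
        err, r, preds = best
        err = min(max(err, 1e-9), 1.0 - 1e-9)       # avoid log(0)
        alpha = 0.5 * math.log((1.0 - err) / err)   # standard AdaBoost weight
        model.append((r, alpha))
        # Reweight so that still-misclassified triples get more attention.
        weights = [w * math.exp(-alpha * p) for w, p in zip(weights, preds)]
        z = sum(weights)
        weights = [w / z for w in weights]
    return model
```

The returned (reference, alpha) pairs play the role of the 1D embeddings and weights in H = w1 F'1 + w2 F'2 + … + wd F'd.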

46 From Classifier to Embedding. AdaBoost output: H = w1 F'1 + w2 F'2 + … + wd F'd. What embedding should we use? What distance measure should we use?

47 From Classifier to Embedding. AdaBoost output: H = w1 F'1 + w2 F'2 + … + wd F'd. BoostMap embedding: F(x) = (F1(x), …, Fd(x)).

48 From Classifier to Embedding. AdaBoost output: H = w1 F'1 + w2 F'2 + … + wd F'd. BoostMap embedding: F(x) = (F1(x), …, Fd(x)). Distance measure: D((u1, …, ud), (v1, …, vd)) = Σ_{i=1}^{d} wi |ui - vi|.

49 From Classifier to Embedding. AdaBoost output: H = w1 F'1 + w2 F'2 + … + wd F'd. BoostMap embedding: F(x) = (F1(x), …, Fd(x)). Distance measure: D((u1, …, ud), (v1, …, vd)) = Σ_{i=1}^{d} wi |ui - vi|. Claim: Let q be closer to a than to b. H misclassifies the triple (q, a, b) if and only if, under distance measure D, F maps q closer to b than to a.

50 Proof.
H(q, a, b)
= Σ_{i=1}^{d} wi F'i(q, a, b)
= Σ_{i=1}^{d} wi (|Fi(q) - Fi(b)| - |Fi(q) - Fi(a)|)
= Σ_{i=1}^{d} (wi |Fi(q) - Fi(b)| - wi |Fi(q) - Fi(a)|)
= D(F(q), F(b)) - D(F(q), F(a))
= F'(q, a, b).


56 Significance of Proof AdaBoost optimizes a direct measure of embedding quality. We optimize an indexing structure for similarity-based retrieval using machine learning. Take advantage of training data.

57 How Do We Use It? Filter-and-refine retrieval: Offline step: compute embedding F of entire database.

58 How Do We Use It? Filter-and-refine retrieval: Offline step: compute embedding F of entire database. Given a query object q: Embedding step: Compute distances from query to reference objects  F(q).

59 How Do We Use It? Filter-and-refine retrieval: Offline step: compute embedding F of entire database. Given a query object q: Embedding step: Compute distances from query to reference objects  F(q). Filter step: Find top p matches of F(q) in vector space.

60 How Do We Use It? Filter-and-refine retrieval: Offline step: compute embedding F of entire database. Given a query object q: Embedding step: Compute distances from query to reference objects  F(q). Filter step: Find top p matches of F(q) in vector space. Refine step: Measure exact distance from q to top p matches.
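The steps above can be sketched as one function. Names and the L1 filter distance are this sketch's choices, not the talk's code:

```python
def filter_and_refine(q, db, embedded_db, F, dist, p=10, k=1):
    """Return indices of the k best matches for q.
    Embedding step: compute F(q). Filter step: rank all of embedded_db by a
    cheap L1 distance to F(q) and keep the top p. Refine step: apply the
    exact (expensive) distance only to those p candidates."""
    Fq = F(q)
    l1 = lambda u, v: sum(abs(ui - vi) for ui, vi in zip(u, v))
    candidates = sorted(range(len(db)), key=lambda i: l1(Fq, embedded_db[i]))[:p]
    return sorted(candidates, key=lambda i: dist(q, db[i]))[:k]
```

Only the refine step touches the expensive distance, so the exact-distance budget per query is p, not n.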

61 Evaluating Embedding Quality Embedding step: Compute distances from query to reference objects  F(q). Filter step: Find top p matches of F(q) in vector space. Refine step: Measure exact distance from q to top p matches. How often do we find the true nearest neighbor?

62 Evaluating Embedding Quality Embedding step: Compute distances from query to reference objects  F(q). Filter step: Find top p matches of F(q) in vector space. Refine step: Measure exact distance from q to top p matches. How often do we find the true nearest neighbor?

63 Evaluating Embedding Quality Embedding step: Compute distances from query to reference objects  F(q). Filter step: Find top p matches of F(q) in vector space. Refine step: Measure exact distance from q to top p matches. How often do we find the true nearest neighbor? How many exact distance computations do we need?

64 Evaluating Embedding Quality Embedding step: Compute distances from query to reference objects  F(q). Filter step: Find top p matches of F(q) in vector space. Refine step: Measure exact distance from q to top p matches. How often do we find the true nearest neighbor? How many exact distance computations do we need?

65 Evaluating Embedding Quality Embedding step: Compute distances from query to reference objects  F(q). Filter step: Find top p matches of F(q) in vector space. Refine step: Measure exact distance from q to top p matches. How often do we find the true nearest neighbor? How many exact distance computations do we need?

66 Evaluating Embedding Quality Embedding step: Compute distances from query to reference objects  F(q). Filter step: Find top p matches of F(q) in vector space. Refine step: Measure exact distance from q to top p matches. What is the nearest neighbor classification error? How many exact distance computations do we need?

67 Results on Hand Dataset. Chamfer distance: 112 seconds per query. [Figure: query image; database (80,640 images); nearest neighbor.]

68 Results on Hand Dataset. Query set: 710 real images of hands. Database: 80,640 synthetic images of hands. Brute force: accuracy 100%, 80,640 distances, 112 seconds, speed-up 1.

69 Results on Hand Dataset. Query set: 710 real images of hands. Database: 80,640 synthetic images of hands. Columns: Brute Force, BM, RLP, FM, VP. Accuracy: 100% (brute force) vs. 95% (approximate methods). Rows: Distances, Seconds, Speed-up.

70 Results on MNIST Dataset. MNIST: 60,000 database objects, 10,000 queries. Shape context (Belongie 2001): 0.63% error, 20,000 distances, 22 minutes; 0.54% error, 60,000 distances, 66 minutes.

71 Results on MNIST Dataset. Columns: method, distances per query, seconds per query, error rate. Rows: Brute force (60,000 distances), VP-trees (21,… distances), Condensing (1,… distances), VP-trees, BoostMap, Zhang, BoostMap, BoostMap*.

72 Query-Sensitive Embeddings. Richer models. Capture non-metric structure. Better embedding quality. References: Athitsos, Hadjieleftheriou, Kollios, and Sclaroff, SIGMOD 2005; Athitsos, Hadjieleftheriou, Kollios, and Sclaroff, TODS, June 2007.

73 Capturing Non-Metric Structure. A human is not similar to a horse. A centaur is similar both to a human and to a horse. The triangle inequality is violated, both under human ratings of similarity (Tversky, 1982) and under the k-median Hausdorff distance.

74 Capturing Non-Metric Structure. Mapping to a metric space presents a dilemma: if D(F(centaur), F(human)) = D(F(centaur), F(horse)) = C, then D(F(human), F(horse)) <= 2C. Query-sensitive embeddings have the modeling power to preserve non-metric structure.
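The bound follows directly from the triangle inequality in the target metric space:

```latex
D(F(\text{human}), F(\text{horse}))
  \le D(F(\text{human}), F(\text{centaur})) + D(F(\text{centaur}), F(\text{horse}))
  = C + C = 2C .
```

So no metric embedding can keep the centaur close to both while keeping the human and the horse far apart.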

75 Local Importance of Coordinates. How important is each coordinate in comparing embeddings? Embedding F maps database objects x1, x2, …, xn to vectors (x_11, x_12, x_13, x_14, …, x_1d), (x_21, x_22, x_23, x_24, …, x_2d), …, (x_n1, x_n2, x_n3, x_n4, …, x_nd) in R^d, and the query q to (q1, q2, q3, q4, …, qd).

76 F(x) = (D(x, LA), D(x, Lincoln), D(x, Orlando)) F(Sacramento)....= ( 386, 1543, 2920) F(Las Vegas).....= ( 262, 1232, 2405) F(Oklahoma City).= (1345, 437, 1291) F(Washington DC).= (2657, 1207, 853) F(Jacksonville)..= (2422, 1344, 141)

77 General Intuition. Classifier: H = w1 F'1 + w2 F'2 + … + wj F'j. Observation: the accuracy of a weak classifier depends on the query. F'1 is perfect for triples (q, a, b) where q = reference object 1, and good for queries close to reference object 1. Question: how can we capture that?

78 Query-Sensitive Weak Classifiers. V: area of influence (an interval of real numbers). Q_F,V(q, a, b) = F'(q, a, b) if F(q) is in V, and "I don't know" if F(q) is not in V.

79 Query-Sensitive Weak Classifiers. V: area of influence (an interval of real numbers). Q_F,V(q, a, b) = F'(q, a, b) if F(q) is in V, and "I don't know" otherwise. If V includes all real numbers, Q_F,V = F'.

80 Applying AdaBoost. 1D embeddings F1, F2, …, Fd map the original space X to the real line. AdaBoost forms classifiers Q_Fi,Vi, where Fi is a 1D embedding and Vi is the area of influence for Fi. Output: H = w1 Q_F1,V1 + w2 Q_F2,V2 + … + wd Q_Fd,Vd.

81 Applying AdaBoost. Empirical observation: at late stages of training, query-sensitive weak classifiers are still useful, whereas query-insensitive classifiers are not.

82 From Classifier to Embedding. AdaBoost output: H(q, a, b) = Σ_{i=1}^{d} wi Q_Fi,Vi(q, a, b). What embedding should we use? What distance measure should we use?

83 From Classifier to Embedding. AdaBoost output: H(q, a, b) = Σ_{i=1}^{d} wi Q_Fi,Vi(q, a, b). BoostMap embedding: F(x) = (F1(x), …, Fd(x)). Distance measure: D(F(q), F(x)) = Σ_{i=1}^{d} wi S_Fi,Vi(q) |Fi(q) - Fi(x)|.

84 From Classifier to Embedding. AdaBoost output: H(q, a, b) = Σ_{i=1}^{d} wi Q_Fi,Vi(q, a, b). BoostMap embedding: F(x) = (F1(x), …, Fd(x)). Distance measure: D(F(q), F(x)) = Σ_{i=1}^{d} wi S_Fi,Vi(q) |Fi(q) - Fi(x)|. The distance measure is query-sensitive: a weighted L1 distance whose weights depend on q. S_F,V(q) = 1 if F(q) is in V, and 0 otherwise.
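A sketch of this query-sensitive weighted L1 distance, with each area of influence represented as an interval (the interval representation and names are this sketch's assumptions):

```python
def query_sensitive_distance(weights, influences, Fq, Fx):
    """D(F(q), F(x)) = sum_i w_i * S_i(q) * |F_i(q) - F_i(x)|,
    where S_i(q) = 1 iff coordinate F_i(q) falls in the interval V_i = (lo, hi)."""
    total = 0.0
    for w, (lo, hi), fq, fx in zip(weights, influences, Fq, Fx):
        if lo <= fq <= hi:          # this coordinate is informative for this query
            total += w * abs(fq - fx)
    return total

# Coordinate 2 is switched off because F_2(q) = 7 lies outside its influence (0, 5).
print(query_sensitive_distance([1.0, 2.0], [(0, 5), (0, 5)], (1.0, 7.0), (4.0, 0.0)))  # -> 3.0
```

Because the active coordinate set changes with q, this "distance" need not satisfy the triangle inequality, which is exactly the point.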

85 Centaurs Revisited. Reference objects: human, horse, centaur. For centaur queries, use weights (0, 0, 1); for human queries, use weights (1, 0, 0). Query-sensitive distances are non-metric: they combine the efficiency of the L1 distance with the ability to capture non-metric structure.

86 F(x) = (D(x, LA), D(x, Lincoln), D(x, Orlando)) F(Sacramento)....= ( 386, 1543, 2920) F(Las Vegas).....= ( 262, 1232, 2405) F(Oklahoma City).= (1345, 437, 1291) F(Washington DC).= (2657, 1207, 853) F(Jacksonville)..= (2422, 1344, 141)

87 Recap of Advantages Capturing non-metric structure. Finding most informative reference objects for each query. Richer model overall. Choosing a weak classifier now also involves choosing an area of influence.

88 Dynamic Time Warping on Time Series. Query set: 1000 time series. Database: … time series. Query-sensitive vs. query-insensitive: accuracy 95% (both), # of distances …, sec. per query 33 vs. 95, speed-up factor 16 vs. 5.6.

89 Dynamic Time Warping on Time Series. Query set: 50 time series. Database: … time series. Query-sensitive vs. Vlachos KDD 2003: accuracy 100% (both); # of distances 640 vs. over 6500; sec. per query 10.7 vs. over 110; speed-up factor 51.2 vs. under 5.

90 Cascades of Embeddings Speeding up nearest neighbor classification. Efficient Nearest Neighbor Classification Using a Cascade of Approximate Similarity Measures. Athitsos, Alon, and Sclaroff, CVPR 2005.

91 Speeding Up Classification For each test object: Measure distance to 100 prototypes. Find 700 nearest neighbors using the embedding. Find 3 nearest neighbors among the 700 candidates. Is this work always necessary?

92 Speeding Up Classification Suppose that, for some test object: We measure distance to 10 prototypes. Find 50 nearest neighbors using the embedding. All 50 objects are twos. It is a two!

93 Using a Cascade 10 dimensions, 50 nearest neighbors. 20 dimensions, 26 nearest neighbors. 30 dimensions, 43 nearest neighbors. 40 dimensions, 32 nearest neighbors. … Filter-and-refine, 1000 distances. Easy objects take less work to recognize. Thresholds can be learned.
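A sketch of the cascade idea: increasingly expensive stages, each stopping early when its approximate neighbors vote unanimously. Stage parameters and names are illustrative, not the talk's learned thresholds:

```python
def cascade_classify(q, F, embedded_db, labels, stages, exact_classify):
    """stages: list of (d, k) pairs -- use the first d embedding coordinates to
    retrieve k approximate neighbors; return early if their labels agree."""
    Fq = F(q)
    for d, k in stages:
        l1 = lambda u, v: sum(abs(a - b) for a, b in zip(u[:d], v[:d]))
        nbrs = sorted(range(len(labels)), key=lambda i: l1(Fq, embedded_db[i]))[:k]
        found = {labels[i] for i in nbrs}
        if len(found) == 1:          # unanimous: an easy object, stop here
            return found.pop()
    return exact_classify(q)         # a hard object: full filter-and-refine
```

Easy queries exit at a cheap stage; only ambiguous ones pay for the full pipeline, which is where the average-time savings come from.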

94 Cascade Results on MNIST. Columns: brute force, BoostMap, cascade. Rows: distances per query; average time (22 min / 67 sec / 6.2 sec); error rate (0.63% / 0.68% / 0.74%).

95 Cascade Results on MNIST. Columns: brute force, BoostMap, cascade, cascade (60000). Rows: distances per query; average time (22 min / 67 sec / 6.2 sec / 5.2 sec); error rate (0.63% / 0.68% / 0.74% / 0.61%).

96 Results on UNIPEN Dataset. Columns: method, distances per query, seconds per query, error rate. Rows: Brute force (10,… distances), VP-trees (1,… distances), VP-trees, Bahlmann, BoostMap, BoostMap, Cascade.

97 BoostMap Recap - Theory Machine-learning method for optimizing embeddings. Explicitly maximizes amount of nearest neighbor structure preserved by embedding. Optimization method is independent of underlying geometry. Query-sensitive version can capture non-metric structure. Additional savings can be gained using cascades.

98 END