Similarity-based Classifiers: Problems and Solutions
Classifying based on similarities: Van Gogh or Monet?
[Figure: example paintings to be attributed to Van Gogh or Monet by visual similarity.]
The Similarity-based Classification Problem
[Figure: classes are painters, samples are paintings; the classifier observes only pairwise similarities between samples and must label a new sample, marked "?".]
Examples of Similarity Functions

Computational biology:
- Smith-Waterman algorithm (Smith & Waterman, 1981)
- FASTA algorithm (Lipman & Pearson, 1985)
- BLAST algorithm (Altschul et al., 1990)

Computer vision:
- Tangent distance (Duda et al., 2001)
- Earth mover's distance (Rubner et al., 2000)
- Shape matching distance (Belongie et al., 2002)
- Pyramid match kernel (Grauman & Darrell, 2007)

Information retrieval:
- Levenshtein distance (Levenshtein, 1966)
- Cosine similarity between tf-idf vectors (Manning & Schütze, 1999)
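As a concrete instance of the last item, here is a minimal sketch (the toy documents are made up) of computing a cosine-similarity matrix between tf-idf vectors with scikit-learn:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus; any collection of strings works.
docs = ["the cat sat on the mat",
        "the dog sat on the log",
        "similarity-based classification of text"]

tfidf = TfidfVectorizer().fit_transform(docs)   # sparse n x vocab matrix
S = cosine_similarity(tfidf)                    # n x n similarity matrix
print(S.round(2))
```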
Approaches to Similarity-based Classification
[Overview diagram: similarities as kernels (SVM, MDS), similarities as features, weighted k-NN, generative models (SDA), and supporting theory.]
Can we treat similarities as kernels?
A kernel is an inner product in some Hilbert space, so its matrix on any sample set must be symmetric positive semidefinite (PSD). Many practical similarity functions are asymmetric or produce indefinite matrices.
Example: Amazon similarity
[Figure: eigenvalue spectrum, plotted against eigenvalue rank, of the similarity matrix for 96 books; the matrix has negative eigenvalues, so it is not a valid kernel matrix.]
Well, let's just make S be a kernel matrix
Modify the eigenvalue spectrum of S so it becomes PSD: zero out the negative eigenvalues (clip), take their absolute values (flip), or add a constant to the whole spectrum so the smallest eigenvalue is zero (shift).
Flip, clip, or shift? Best bet is clip.
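A minimal sketch of the three spectrum modifications, assuming S is a square (possibly asymmetric, indefinite) similarity matrix given as a NumPy array:

```python
import numpy as np

def make_psd(S, method="clip"):
    """Modify the eigenvalue spectrum of a similarity matrix so it is PSD."""
    S = (S + S.T) / 2                       # symmetrize first
    lam, U = np.linalg.eigh(S)
    if method == "clip":                    # zero out negative eigenvalues
        lam = np.maximum(lam, 0)
    elif method == "flip":                  # flip negative eigenvalues positive
        lam = np.abs(lam)
    elif method == "shift":                 # shift the whole spectrum up
        lam = lam - min(lam.min(), 0)
    return (U * lam) @ U.T                  # U diag(lam) U^T
```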
Well, let's just make S be a kernel matrix
Or learn the best kernel matrix for the SVM (Luss & d'Aspremont, NIPS 2007; Chen et al., ICML 2009).
Approaches to Similarity-based Classification
[Overview diagram, now highlighting: similarities as features.]
Similarities as Features
Let the similarities to the training samples be the feature vector (see the sketch after this list):
- SVM (Graepel et al., 1998; Liao & Noble, 2003)
- Linear programming (LP) machine (Graepel et al., 1999)
- Linear discriminant analysis (LDA) (Pekalska et al., 2001)
- Quadratic discriminant analysis (QDA) (Pekalska & Duin, 2002)
- Potential support vector machine (P-SVM) (Hochreiter & Obermayer, 2006; Knebel et al., 2008)
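A minimal sketch of the similarities-as-features idea, using an inner product as a stand-in similarity (X and y are toy data, not from the talk's benchmarks):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))
y = (X[:, 0] > 0).astype(int)

# Features = similarities to the *training* samples.
S_train = X[:80] @ X[:80].T      # train-to-train similarities
S_test = X[80:] @ X[:80].T       # test-to-train similarities

clf = SVC(kernel="linear").fit(S_train, y[:80])
print("test accuracy:", clf.score(S_test, y[80:]))
```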
Test error (%) on six benchmark similarity datasets:

                              Amazon-47  Aural Sonar  Caltech-101  Face Rec  Mirex  Voting (VDM)
# classes                            47            2          101       139     10             2
# samples                           204          100         8677       945   3090           435
SVM-kNN (clip) (Zhang et al.)     17.56        13.75        36.82      4.23  61.25          5.23
SVM (clip)                        81.24        13.00        33.49      4.18  57.83          4.89
SVM sim-as-feature (linear)       76.10        14.25        38.18      4.29  55.54          5.40
SVM sim-as-feature (RBF)          75.98        14.25        38.16      3.92  55.72          5.52
P-SVM                             70.12        14.25        34.23      4.05  63.81          5.34
Approaches to Similarity-based Classification
[Overview diagram, now highlighting: weighted k-NN.]
Weighted Nearest-Neighbors
Take a weighted vote of the k nearest neighbors:
  y_hat = argmax_g sum over the k nearest x_i of w_i * 1(y_i = g).
An algorithmic parallel of the exemplar model of human learning.
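A minimal sketch of the weighted vote, assuming s is a vector of similarities psi(x, x_i) from the test point to all training samples and the simplest choice of weight, w_i = s_i:

```python
import numpy as np

def weighted_knn(s, y_train, k=5):
    """Weighted k-NN vote: each of the k most-similar neighbors
    votes for its class with weight w_i (here w_i = s_i)."""
    nbrs = np.argsort(s)[-k:]        # indices of the k nearest neighbors
    votes = {}
    for i in nbrs:
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + s[i]
    return max(votes, key=votes.get)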
Design Goals for the Weights (Chen et al., JMLR 2009)
Design Goal 1 (Affinity): w_i should be an increasing function of psi(x, x_i) — neighbors more similar to the test point get more weight.
Design Goal 2 (Diversity): w_i should be a decreasing function of psi(x_i, x_j) for the other neighbors x_j — redundant neighbors share their weight.
Linear Interpolation Weights
Linear interpolation weights meet these goals: choose w to minimize || sum_i w_i x_i − x ||^2 subject to sum_i w_i = 1, w_i ≥ 0.
LIME Weights
Among linear interpolation weights, prefer the maximum-entropy solution: linear interpolation with maximum entropy (LIME) weights (Gupta et al., IEEE PAMI 2006). Schematically, trade off the interpolation error against the entropy H(w) of the weights, over the simplex.
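A rough sketch of LIME-style weights under stated assumptions: neighbors are given as feature vectors, and the entropy trade-off parameter lam and the combined objective are illustrative simplifications, not the paper's exact formulation:

```python
import numpy as np
from scipy.optimize import minimize

def lime_weights(X_nbrs, x, lam=0.1):
    """Fit x as a convex combination of its k neighbors (rows of X_nbrs),
    preferring high-entropy (spread-out) weights among near-ties."""
    k = X_nbrs.shape[0]

    def objective(w):
        resid = X_nbrs.T @ w - x                     # interpolation error
        entropy = -np.sum(w * np.log(w + 1e-12))     # entropy of the weights
        return resid @ resid - lam * entropy

    res = minimize(objective, np.full(k, 1.0 / k), method="SLSQP",
                   bounds=[(0.0, 1.0)] * k,
                   constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])
    return res.x
```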
Kernelize Linear Interpolation (Chen et al., JMLR 2009)
Replace the entropy term with a ridge penalty on the weights — this regularizes the variance of the weights.
Expanding the squared interpolation error leaves only inner products of the samples — so we can replace them with a kernel, or with similarities!
KRI Weights Satisfy the Design Goals
Kernel ridge interpolation (KRI) weights: minimize w'Kw − 2w'k + lambda w'w subject to sum_i w_i = 1, w_i ≥ 0, where K_ij = psi(x_i, x_j) over the k neighbors and k_i = psi(x, x_i).
- Affinity: the −2w'k term makes w_i increase with psi(x, x_i).
- Diversity: the w'Kw term makes w_i decrease as psi(x_i, x_j) grows.
Remove the constraints on the weights, and one can show the problem is equivalent to local ridge regression: KRR weights.
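A minimal sketch of the KRR weights, assuming K_nbrs is the k-by-k similarity matrix among the neighbors and k_vec holds their similarities to the test point:

```python
import numpy as np

def krr_weights(K_nbrs, k_vec, lam=1.0):
    """Unconstrained kernel ridge interpolation = local ridge regression:
    solve (K + lam * I) w = k."""
    return np.linalg.solve(K_nbrs + lam * np.eye(len(k_vec)), k_vec)
```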
Weighted k-NN: Examples 1-3
[Figures: three examples comparing the weights produced by KRI and KRR on the same neighborhoods.]
Test error (%) of local versus global methods:

                              Amazon-47  Aural Sonar  Caltech-101  Face Rec  Mirex  Voting
# samples                           204          100         8677       945   3090     435
# classes                            47            2          101       139     10       2

LOCAL
k-NN                              16.95        17.00        41.55      4.23  61.21    5.80
affinity k-NN                     15.00            —        39.20      4.23  61.15    5.86
KRI k-NN (clip)                   17.68        14.00        30.13      4.15  61.20    5.29
KRR k-NN (pinv)                   16.10        15.25        29.90      4.31  61.18    5.52
SVM-kNN (clip)                    17.56        13.75        36.82      4.23  61.25    5.23

GLOBAL
SVM sim-as-kernel (clip)          81.24        13.00        33.49      4.18  57.83    4.89
SVM sim-as-feature (linear)       76.10        14.25        38.18      4.29  55.54    5.40
SVM sim-as-feature (RBF)          75.98        14.25        38.16      3.92  55.72    5.52
P-SVM                             70.12        14.25        34.23      4.05  63.81    5.34

(The Aural Sonar entry for affinity k-NN is missing in the source.)
Approaches to Similarity-based Classification
[Overview diagram, now highlighting: generative models (SDA).]
Generative Classifiers
[Slides elided: model the class-conditional distributions and classify by maximum posterior probability.]
Similarity Discriminant Analysis (Cazzanti & Gupta, ICML 2007, 2008, 2009)
SDA models class-conditional distributions of similarity statistics.
Regularized local SDA performance: competitive.
Some Conclusions
- Performance depends heavily on the oddities of each dataset.
- Weighted k-NN with affinity-diversity weights works well.
- Preliminary: regularized local SDA works well; probabilities are useful.
- Local models are useful: they require less approximation (modeling the entire space, or an underlying manifold, is hard), and they are always feasible.
Lots of Open Questions
- Making S PSD
- Fast k-NN search for similarities
- Similarity-based regression
- Relationship with learning on graphs
- Trying it out on real data
- Fusion with Euclidean features (see our FUSION 2009 papers)
- Open theoretical questions (Chen et al., JMLR 2009; Balcan et al., ML 2008)
Code/Data/Papers: idl.ee.washington.edu/similaritylearning
"Similarity-based Classification" by Chen et al., JMLR 2009
Training and Test Consistency
For a test sample x, given …, shall we classify x as …?
No! If a training sample were used as a test sample, its class could change!
Data Sets
[Figures: eigenvalue spectra (eigenvalue vs. rank) of the similarity matrices for Amazon, Aural Sonar, Protein, Voting, Yeast-5-7, and Yeast-5-12.]
SVM Review
Empirical risk minimization (ERM) with regularization:
  min_f  sum_i l(f(x_i), y_i) + eta * ||f||^2
Hinge loss: l(f(x), y) = max(0, 1 − y f(x)), with labels y in {−1, +1}.
SVM primal:
  min_{w,b,xi}  (1/2)||w||^2 + C sum_i xi_i  subject to  y_i (w'x_i + b) ≥ 1 − xi_i,  xi_i ≥ 0.
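A minimal sketch of regularized ERM with the hinge loss: plain subgradient descent on the primal objective (no bias term; the step size and epoch count are arbitrary choices, not from the talk):

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=1e-3, epochs=500):
    """Subgradient descent on 0.5*||w||^2 + C * sum_i max(0, 1 - y_i w'x_i).
    Labels y must be in {-1, +1}."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        viol = y * (X @ w) < 1                         # margin violators
        grad = w - C * (y[viol, None] * X[viol]).sum(axis=0)
        w -= lr * grad
    return w
```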
Learning the Kernel Matrix
Find the K best for classification, regularized toward S: an SVM that learns the full kernel matrix jointly with the classifier, trading off the SVM objective against the distance of K from S (Chen et al., ICML 2009).
Related Work
Robust SVM (Luss & d'Aspremont, 2007). Starting from the SVM dual,
  max_alpha  alpha'1 − (1/2)(alpha ∘ y)' K (alpha ∘ y)  subject to  0 ≤ alpha ≤ C, alpha'y = 0:
"This can be interpreted as a worst-case robust classification problem with bounded uncertainty on the kernel matrix K."
Related Work
Rewrite the robust SVM as a min-max problem over the kernel matrix and the dual variables. Sion's minimax theorem then applies:
Theorem (Sion, 1958). Let M and N be convex spaces, one of which is compact, and f(mu, nu) a function on M × N that is quasi-concave in mu, quasi-convex in nu, upper semi-continuous in mu for each nu ∈ N, and lower semi-continuous in nu for each mu ∈ M. Then
  sup_{mu ∈ M} inf_{nu ∈ N} f(mu, nu) = inf_{nu ∈ N} sup_{mu ∈ M} f(mu, nu).
Related Work
By Sion's minimax theorem, the robust SVM is equivalent to the corresponding max-min problem: there is zero duality gap, so the kernel-learning problem and the robust classification problem coincide.
Learning the Kernel Matrix
It is not trivial to solve the kernel-learning problem directly. A key tool:
Lemma (Generalized Schur Complement). Let M = [[K, z], [z', c]] with K symmetric. Then M ⪰ 0 if and only if K ⪰ 0, z is in the range of K, and c − z'K†z ≥ 0, where K† is the pseudo-inverse of K.
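A numerical sanity check of the lemma (a sketch for illustration, not part of the original derivation):

```python
import numpy as np

def psd_via_schur(K, z, c, tol=1e-9):
    """Test whether [[K, z], [z', c]] >= 0 via the generalized Schur
    complement: K >= 0, z in range(K), and c - z' K^+ z >= 0."""
    if np.linalg.eigvalsh(K).min() < -tol:
        return False
    K_pinv = np.linalg.pinv(K)
    if not np.allclose(K @ K_pinv @ z, z, atol=1e-7):  # z must lie in range(K)
        return False
    return c - z @ K_pinv @ z >= -tol
```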
Learning the Kernel Matrix
It is not trivial to solve directly. However, it can be expressed as a convex conic program, from which we can recover the optimal K.
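The full program couples the SVM dual with the kernel variable; as a much simpler illustration of the conic ingredient, here is a CVXPY sketch that only projects an indefinite S onto the PSD cone in Frobenius norm (this projection is exactly the clip modification); the toy S is made up:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 6))
S = (A + A.T) / 2                    # symmetric but typically indefinite

K = cp.Variable(S.shape, symmetric=True)
prob = cp.Problem(cp.Minimize(cp.norm(K - S, "fro")), [K >> 0])
prob.solve()
print("min eigenvalue of K:", np.linalg.eigvalsh(K.value).min())
```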
Learning the Spectrum Modification
Concerns about learning the full kernel matrix:
- Though the problem is convex, the number of variables is O(n^2).
- The flexibility of the model may lead to overfitting.
Alternative: learn only a modification of the spectrum of S, rather than every entry of K.