Similarity-based Classifiers: Problems and Solutions
Classifying based on similarities: Van Gogh or Monet?
[Figure: example paintings to be attributed to Van Gogh or Monet by visual similarity.]
The Similarity-based Classification Problem
[Figure: classes are painters, samples are paintings; the classifier observes only pairwise similarities between samples and must label a new sample, marked "?".]
Examples of Similarity Functions

Computational biology:
- Smith-Waterman algorithm (Smith & Waterman, 1981)
- FASTA algorithm (Lipman & Pearson, 1985)
- BLAST algorithm (Altschul et al., 1990)

Computer vision:
- Tangent distance (Duda et al., 2001)
- Earth mover's distance (Rubner et al., 2000)
- Shape matching distance (Belongie et al., 2002)
- Pyramid match kernel (Grauman & Darrell, 2007)

Information retrieval:
- Levenshtein distance (Levenshtein, 1966)
- Cosine similarity between tf-idf vectors (Manning & Schütze, 1999)
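As a concrete instance of the last item, here is a minimal sketch (the toy documents are made up) of computing a cosine-similarity matrix between tf-idf vectors with scikit-learn:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus; any collection of strings works.
docs = ["the cat sat on the mat",
        "the dog sat on the log",
        "similarity-based classification of text"]

tfidf = TfidfVectorizer().fit_transform(docs)   # sparse n x vocab matrix
S = cosine_similarity(tfidf)                    # n x n similarity matrix
print(S.round(2))
```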
Approaches to Similarity-based Classification
[Overview diagram: similarities as kernels (SVM, MDS), similarities as features, weighted k-NN, generative models (SDA), and supporting theory.]
Can we treat similarities as kernels?
A kernel is an inner product in some Hilbert space, so its matrix on any sample set must be symmetric positive semidefinite (PSD). Many practical similarity functions are asymmetric or produce indefinite matrices.
Example: Amazon similarity
[Figure: eigenvalue spectrum, plotted against eigenvalue rank, of the similarity matrix for 96 books; the matrix has negative eigenvalues, so it is not a valid kernel matrix.]
Well, let's just make S be a kernel matrix
Modify the eigenvalue spectrum of S so it becomes PSD: zero out the negative eigenvalues (clip), take their absolute values (flip), or add a constant to the whole spectrum so the smallest eigenvalue is zero (shift).
Flip, clip, or shift? Best bet is clip.
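A minimal sketch of the three spectrum modifications, assuming S is a square (possibly asymmetric, indefinite) similarity matrix given as a NumPy array:

```python
import numpy as np

def make_psd(S, method="clip"):
    """Modify the eigenvalue spectrum of a similarity matrix so it is PSD."""
    S = (S + S.T) / 2                       # symmetrize first
    lam, U = np.linalg.eigh(S)
    if method == "clip":                    # zero out negative eigenvalues
        lam = np.maximum(lam, 0)
    elif method == "flip":                  # flip negative eigenvalues positive
        lam = np.abs(lam)
    elif method == "shift":                 # shift the whole spectrum up
        lam = lam - min(lam.min(), 0)
    return (U * lam) @ U.T                  # U diag(lam) U^T
```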
Well, let's just make S be a kernel matrix
Or learn the best kernel matrix for the SVM (Luss & d'Aspremont, NIPS 2007; Chen et al., ICML 2009).
Approaches to Similarity-based Classification
[Overview diagram, now highlighting: similarities as features.]
Similarities as Features
Let the similarities to the training samples be the feature vector (see the sketch after this list):
- SVM (Graepel et al., 1998; Liao & Noble, 2003)
- Linear programming (LP) machine (Graepel et al., 1999)
- Linear discriminant analysis (LDA) (Pekalska et al., 2001)
- Quadratic discriminant analysis (QDA) (Pekalska & Duin, 2002)
- Potential support vector machine (P-SVM) (Hochreiter & Obermayer, 2006; Knebel et al., 2008)
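A minimal sketch of the similarities-as-features idea, using an inner product as a stand-in similarity (X and y are toy data, not from the talk's benchmarks):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))
y = (X[:, 0] > 0).astype(int)

# Features = similarities to the *training* samples.
S_train = X[:80] @ X[:80].T      # train-to-train similarities
S_test = X[80:] @ X[:80].T       # test-to-train similarities

clf = SVC(kernel="linear").fit(S_train, y[:80])
print("test accuracy:", clf.score(S_test, y[80:]))
```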
Test error (%) on six benchmark similarity datasets:

                              Amazon-47  Aural Sonar  Caltech-101  Face Rec  Mirex  Voting (VDM)
# classes                            47            2          101       139     10             2
# samples                           204          100         8677       945   3090           435
SVM-kNN (clip) (Zhang et al.)     17.56        13.75        36.82      4.23  61.25          5.23
SVM (clip)                        81.24        13.00        33.49      4.18  57.83          4.89
SVM sim-as-feature (linear)       76.10        14.25        38.18      4.29  55.54          5.40
SVM sim-as-feature (RBF)          75.98        14.25        38.16      3.92  55.72          5.52
P-SVM                             70.12        14.25        34.23      4.05  63.81          5.34
Approaches to Similarity-based Classification
[Overview diagram, now highlighting: weighted k-NN.]
Weighted Nearest-Neighbors
Take a weighted vote of the k nearest neighbors:
  y_hat = argmax_g sum over the k nearest x_i of w_i * 1(y_i = g).
An algorithmic parallel of the exemplar model of human learning.
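A minimal sketch of the weighted vote, assuming s is a vector of similarities psi(x, x_i) from the test point to all training samples and the simplest choice of weight, w_i = s_i:

```python
import numpy as np

def weighted_knn(s, y_train, k=5):
    """Weighted k-NN vote: each of the k most-similar neighbors
    votes for its class with weight w_i (here w_i = s_i)."""
    nbrs = np.argsort(s)[-k:]        # indices of the k nearest neighbors
    votes = {}
    for i in nbrs:
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + s[i]
    return max(votes, key=votes.get)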
Design Goals for the Weights (Chen et al., JMLR 2009)
Design Goal 1 (Affinity): w_i should be an increasing function of psi(x, x_i) — neighbors more similar to the test point get more weight.
Design Goal 2 (Diversity): w_i should be a decreasing function of psi(x_i, x_j) for the other neighbors x_j — redundant neighbors share their weight.
Linear Interpolation Weights
Linear interpolation weights meet these goals: choose w to minimize || sum_i w_i x_i − x ||^2 subject to sum_i w_i = 1, w_i ≥ 0.
LIME Weights
Among linear interpolation weights, prefer the maximum-entropy solution: linear interpolation with maximum entropy (LIME) weights (Gupta et al., IEEE PAMI 2006). Schematically, trade off the interpolation error against the entropy H(w) of the weights, over the simplex.
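A rough sketch of LIME-style weights under stated assumptions: neighbors are given as feature vectors, and the entropy trade-off parameter lam and the combined objective are illustrative simplifications, not the paper's exact formulation:

```python
import numpy as np
from scipy.optimize import minimize

def lime_weights(X_nbrs, x, lam=0.1):
    """Fit x as a convex combination of its k neighbors (rows of X_nbrs),
    preferring high-entropy (spread-out) weights among near-ties."""
    k = X_nbrs.shape[0]

    def objective(w):
        resid = X_nbrs.T @ w - x                     # interpolation error
        entropy = -np.sum(w * np.log(w + 1e-12))     # entropy of the weights
        return resid @ resid - lam * entropy

    res = minimize(objective, np.full(k, 1.0 / k), method="SLSQP",
                   bounds=[(0.0, 1.0)] * k,
                   constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])
    return res.x
```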
Kernelize Linear Interpolation (Chen et al., JMLR 2009)
Replace the entropy term with a ridge penalty on the weights — this regularizes the variance of the weights.
Expanding the squared interpolation error leaves only inner products of the samples — so we can replace them with a kernel, or with similarities!
KRI Weights Satisfy the Design Goals
Kernel ridge interpolation (KRI) weights: minimize w'Kw − 2w'k + lambda w'w subject to sum_i w_i = 1, w_i ≥ 0, where K_ij = psi(x_i, x_j) over the k neighbors and k_i = psi(x, x_i).
- Affinity: the −2w'k term makes w_i increase with psi(x, x_i).
- Diversity: the w'Kw term makes w_i decrease as psi(x_i, x_j) grows.
Remove the constraints on the weights, and one can show the problem is equivalent to local ridge regression: KRR weights.
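A minimal sketch of the KRR weights, assuming K_nbrs is the k-by-k similarity matrix among the neighbors and k_vec holds their similarities to the test point:

```python
import numpy as np

def krr_weights(K_nbrs, k_vec, lam=1.0):
    """Unconstrained kernel ridge interpolation = local ridge regression:
    solve (K + lam * I) w = k."""
    return np.linalg.solve(K_nbrs + lam * np.eye(len(k_vec)), k_vec)
```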
Weighted k-NN: Examples 1-3
[Figures: three examples comparing the weights produced by KRI and KRR on the same neighborhoods.]
Test error (%) of local versus global methods:

                              Amazon-47  Aural Sonar  Caltech-101  Face Rec  Mirex  Voting
# samples                           204          100         8677       945   3090     435
# classes                            47            2          101       139     10       2

LOCAL
k-NN                              16.95        17.00        41.55      4.23  61.21    5.80
affinity k-NN                     15.00            —        39.20      4.23  61.15    5.86
KRI k-NN (clip)                   17.68        14.00        30.13      4.15  61.20    5.29
KRR k-NN (pinv)                   16.10        15.25        29.90      4.31  61.18    5.52
SVM-kNN (clip)                    17.56        13.75        36.82      4.23  61.25    5.23

GLOBAL
SVM sim-as-kernel (clip)          81.24        13.00        33.49      4.18  57.83    4.89
SVM sim-as-feature (linear)       76.10        14.25        38.18      4.29  55.54    5.40
SVM sim-as-feature (RBF)          75.98        14.25        38.16      3.92  55.72    5.52
P-SVM                             70.12        14.25        34.23      4.05  63.81    5.34

(The Aural Sonar entry for affinity k-NN is missing in the source.)
Approaches to Similarity-based Classification
[Overview diagram, now highlighting: generative models (SDA).]
Generative Classifiers
[Slides elided: model the class-conditional distributions and classify by maximum posterior probability.]
Similarity Discriminant Analysis (Cazzanti & Gupta, ICML 2007, 2008, 2009)
SDA models class-conditional distributions of similarity statistics.
Regularized local SDA performance: competitive.
Some Conclusions
- Performance depends heavily on the oddities of each dataset.
- Weighted k-NN with affinity-diversity weights works well.
- Preliminary: regularized local SDA works well; probabilities are useful.
- Local models are useful: they require less approximation (modeling the entire space, or an underlying manifold, is hard), and they are always feasible.
Lots of Open Questions
- Making S PSD
- Fast k-NN search for similarities
- Similarity-based regression
- Relationship with learning on graphs
- Trying it out on real data
- Fusion with Euclidean features (see our FUSION 2009 papers)
- Open theoretical questions (Chen et al., JMLR 2009; Balcan et al., ML 2008)
Code/Data/Papers: idl.ee.washington.edu/similaritylearning
"Similarity-based Classification" by Chen et al., JMLR 2009
Training and Test Consistency
For a test sample x, given …, shall we classify x as …?
No! If a training sample were used as a test sample, its class could change!
Data Sets
[Figures: eigenvalue spectra (eigenvalue vs. rank) of the similarity matrices for Amazon, Aural Sonar, Protein, Voting, Yeast-5-7, and Yeast-5-12.]
SVM Review
Empirical risk minimization (ERM) with regularization:
  min_f  sum_i l(f(x_i), y_i) + eta * ||f||^2
Hinge loss: l(f(x), y) = max(0, 1 − y f(x)), with labels y in {−1, +1}.
SVM primal:
  min_{w,b,xi}  (1/2)||w||^2 + C sum_i xi_i  subject to  y_i (w'x_i + b) ≥ 1 − xi_i,  xi_i ≥ 0.
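A minimal sketch of regularized ERM with the hinge loss: plain subgradient descent on the primal objective (no bias term; the step size and epoch count are arbitrary choices, not from the talk):

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=1e-3, epochs=500):
    """Subgradient descent on 0.5*||w||^2 + C * sum_i max(0, 1 - y_i w'x_i).
    Labels y must be in {-1, +1}."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        viol = y * (X @ w) < 1                         # margin violators
        grad = w - C * (y[viol, None] * X[viol]).sum(axis=0)
        w -= lr * grad
    return w
```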
Learning the Kernel Matrix
Find the K best for classification, regularized toward S: an SVM that learns the full kernel matrix jointly with the classifier, trading off the SVM objective against the distance of K from S (Chen et al., ICML 2009).
Related Work
Robust SVM (Luss & d'Aspremont, 2007). Starting from the SVM dual,
  max_alpha  alpha'1 − (1/2)(alpha ∘ y)' K (alpha ∘ y)  subject to  0 ≤ alpha ≤ C, alpha'y = 0:
"This can be interpreted as a worst-case robust classification problem with bounded uncertainty on the kernel matrix K."
Related Work
Rewrite the robust SVM as a min-max problem over the kernel matrix and the dual variables. Sion's minimax theorem then applies:
Theorem (Sion, 1958). Let M and N be convex spaces, one of which is compact, and f(mu, nu) a function on M × N that is quasi-concave in mu, quasi-convex in nu, upper semi-continuous in mu for each nu ∈ N, and lower semi-continuous in nu for each mu ∈ M. Then
  sup_{mu ∈ M} inf_{nu ∈ N} f(mu, nu) = inf_{nu ∈ N} sup_{mu ∈ M} f(mu, nu).
Related Work
By Sion's minimax theorem, the robust SVM is equivalent to the corresponding max-min problem: there is zero duality gap, so the kernel-learning problem and the robust classification problem coincide.
Learning the Kernel Matrix
It is not trivial to solve the kernel-learning problem directly. A key tool:
Lemma (Generalized Schur Complement). Let M = [[K, z], [z', c]] with K symmetric. Then M ⪰ 0 if and only if K ⪰ 0, z is in the range of K, and c − z'K†z ≥ 0, where K† is the pseudo-inverse of K.
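A numerical sanity check of the lemma (a sketch for illustration, not part of the original derivation):

```python
import numpy as np

def psd_via_schur(K, z, c, tol=1e-9):
    """Test whether [[K, z], [z', c]] >= 0 via the generalized Schur
    complement: K >= 0, z in range(K), and c - z' K^+ z >= 0."""
    if np.linalg.eigvalsh(K).min() < -tol:
        return False
    K_pinv = np.linalg.pinv(K)
    if not np.allclose(K @ K_pinv @ z, z, atol=1e-7):  # z must lie in range(K)
        return False
    return c - z @ K_pinv @ z >= -tol
```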
Learning the Kernel Matrix
It is not trivial to solve directly. However, it can be expressed as a convex conic program, from which we can recover the optimal K.
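The full program couples the SVM dual with the kernel variable; as a much simpler illustration of the conic ingredient, here is a CVXPY sketch that only projects an indefinite S onto the PSD cone in Frobenius norm (this projection is exactly the clip modification); the toy S is made up:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 6))
S = (A + A.T) / 2                    # symmetric but typically indefinite

K = cp.Variable(S.shape, symmetric=True)
prob = cp.Problem(cp.Minimize(cp.norm(K - S, "fro")), [K >> 0])
prob.solve()
print("min eigenvalue of K:", np.linalg.eigvalsh(K.value).min())
```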
Learning the Spectrum Modification
Concerns about learning the full kernel matrix:
- Though the problem is convex, the number of variables is O(n^2).
- The flexibility of the model may lead to overfitting.
Alternative: learn only a modification of the spectrum of S, rather than every entry of K.