Similarity-based Classifiers: Problems and Solutions.

1 Similarity-based Classifiers: Problems and Solutions

2 Classifying based on similarities: Van Gogh or Monet? [images labeled Van Gogh and Monet]

3 The Similarity-based Classification Problem [figure labels: (painter), (paintings)]

4 The Similarity-based Classification Problem
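One conventional way to state the problem formally, as a sketch: the symbols n, S, s, and 𝒢 are assumed notation, while ψ follows the design-goal slides later in the talk.

```latex
\begin{aligned}
&\text{Given: training samples } x_1,\dots,x_n \text{ with labels } y_1,\dots,y_n \in \mathcal{G},\\
&\text{a similarity function } \psi(\cdot,\cdot), \text{ and a test sample } x.\\
&S = \big[\psi(x_i, x_j)\big]_{i,j=1}^{n} \in \mathbb{R}^{n \times n},\qquad
 s = \big[\psi(x, x_1), \dots, \psi(x, x_n)\big]^{\top} \in \mathbb{R}^{n}.\\
&\text{Goal: predict } \hat{y} \in \mathcal{G} \text{ from } S,\ s,\ \text{and } y_1,\dots,y_n \text{ alone.}
\end{aligned}
```

Nothing in this setup forces S to be symmetric or positive semidefinite.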

5 ?

6 Examples of Similarity Functions
Computational Biology
– Smith-Waterman algorithm (Smith & Waterman, 1981)
– FASTA algorithm (Lipman & Pearson, 1985)
– BLAST algorithm (Altschul et al., 1990)
Computer Vision
– Tangent distance (Duda et al., 2001)
– Earth mover’s distance (Rubner et al., 2000)
– Shape matching distance (Belongie et al., 2002)
– Pyramid match kernel (Grauman & Darrell, 2007)
Information Retrieval
– Levenshtein distance (Levenshtein, 1966)
– Cosine similarity between tf-idf vectors (Manning & Schütze, 1999)
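Several of these are distances rather than similarities. As a concrete, self-contained example, here is the classic dynamic-programming Levenshtein distance; the conversion to a similarity in [0, 1] (one minus the distance divided by the longer length) is an assumed normalization for illustration, not one the slides prescribe.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    # prev[j] is the edit distance between a[:i-1] and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Hypothetical normalization: 1.0 for identical strings, 0.0 when the
    distance equals the length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))  # 3
```

Any such normalization yields a similarity, but not necessarily one whose matrix is a valid kernel.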

7 Approaches to Similarity-based Classification [diagram labels: MDS; Similarities as kernels; SVM; Similarities as features; theory; k-NN; weights; Generative Models; SDA]

9 Can we treat similarities as kernels?
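The question comes down to a test that is easy to run: an SVM kernel matrix must be symmetric and positive semidefinite. A minimal NumPy sketch (the function name and tolerance are our own) checks both conditions on a toy matrix in which A resembles B and B resembles C, but A does not resemble C:

```python
import numpy as np


def is_valid_kernel(S: np.ndarray, tol: float = 1e-10) -> bool:
    """True if S is symmetric and positive semidefinite (up to tol)."""
    if not np.allclose(S, S.T, atol=tol):
        return False  # kernel matrices are symmetric; many similarities are not
    eigvals = np.linalg.eigvalsh(S)  # real eigenvalues of a symmetric matrix
    return bool(eigvals.min() >= -tol)


S = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.9],
              [0.1, 0.9, 1.0]])
print(is_valid_kernel(S))  # False: the smallest eigenvalue is about -0.22
```

Each pairwise value looks reasonable on its own; it is the combination that makes S indefinite.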

12 Example: Amazon similarity (96 books)

14 Example: Amazon similarity (96 books) [plot; axis label: Rank]

15 Well, let’s just make S be a kernel matrix

18 Well, let’s just make S be a kernel matrix: Flip, Clip, or Shift? Best bet is Clip.
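A minimal NumPy sketch of the three spectrum modifications, as we understand them: eigendecompose S, then set negative eigenvalues to zero (clip), replace every eigenvalue by its absolute value (flip), or add |λ_min| to every eigenvalue (shift). Symmetrizing S first is our assumption; an eigendecomposition with orthonormal eigenvectors needs a symmetric matrix.

```python
import numpy as np


def modify_spectrum(S: np.ndarray, method: str = "clip") -> np.ndarray:
    """Turn a similarity matrix into a PSD kernel matrix by editing its eigenvalues."""
    S_sym = (S + S.T) / 2.0             # assumption: symmetrize first
    eigvals, U = np.linalg.eigh(S_sym)  # S_sym = U diag(eigvals) U^T
    if method == "clip":                # negative eigenvalues -> 0
        new = np.maximum(eigvals, 0.0)
    elif method == "flip":              # eigenvalues -> |eigenvalues|
        new = np.abs(eigvals)
    elif method == "shift":             # add |lambda_min| if lambda_min < 0
        new = eigvals - min(eigvals.min(), 0.0)
    else:
        raise ValueError(f"unknown method: {method}")
    return (U * new) @ U.T              # U diag(new) U^T


S = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.9],
              [0.1, 0.9, 1.0]])
for method in ("clip", "flip", "shift"):
    K = modify_spectrum(S, method)
    print(method, np.round(np.linalg.eigvalsh(K), 3))
```

One argument for clip: among all PSD matrices, the clipped matrix is the closest to the symmetrized S in Frobenius norm, whereas flip and shift also change the positive part of the spectrum.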

19 Well, let’s just make S be a kernel matrix: learn the best kernel matrix for the SVM (Luss & d’Aspremont, NIPS 2007; Chen et al. ICML 2009)

20 Approaches to Similarity-based Classification [outline diagram repeated]

21 Let the similarities to the training samples be features
– SVM (Graepel et al., 1998; Liao & Noble, 2003)
– Linear programming (LP) machine (Graepel et al., 1999)
– Linear discriminant analysis (LDA) (Pekalska et al., 2001)
– Quadratic discriminant analysis (QDA) (Pekalska & Duin, 2002)
– Potential support vector machine (P-SVM) (Hochreiter & Obermayer, 2006; Knebel et al., 2008)
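The idea can be sketched in a few lines: each sample becomes the vector of its similarities to the n training samples, and any vector-space classifier is trained on those vectors. To stay self-contained, this sketch swaps in one-vs-rest regularized least squares instead of the SVM, LP machine, LDA, QDA, or P-SVM listed above; the Gaussian similarity on 1-D points is likewise hypothetical.

```python
import numpy as np


def similarity_features(sim, queries, train):
    """Row q holds [sim(q, t) for every training sample t]."""
    return np.array([[sim(q, t) for t in train] for q in queries], dtype=float)


def fit_ridge_classifier(F, y, lam=1e-2):
    """One-vs-rest regularized least squares on similarity features."""
    classes = np.unique(y)
    Y = np.where(y[:, None] == classes[None, :], 1.0, -1.0)  # +/-1 targets per class
    Fb = np.hstack([F, np.ones((F.shape[0], 1))])             # append a bias column
    W = np.linalg.solve(Fb.T @ Fb + lam * np.eye(Fb.shape[1]), Fb.T @ Y)
    return classes, W


def predict(classes, W, F):
    Fb = np.hstack([F, np.ones((F.shape[0], 1))])
    return classes[np.argmax(Fb @ W, axis=1)]  # class with the highest score


def sim(a, b):
    """Hypothetical similarity on 1-D points."""
    return np.exp(-(a - b) ** 2)


train = np.array([0.0, 0.1, 5.0, 5.1])
y = np.array([0, 0, 1, 1])
classes, W = fit_ridge_classifier(similarity_features(sim, train, train), y)
print(predict(classes, W, similarity_features(sim, [0.05, 5.05], train)))  # [0 1]
```

Because the feature vectors have length n, the feature dimension grows with the training set, which is one reason regularization matters here.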

22 Results (test error rates, %)
                                      Amazon-47  Aural Sonar  Caltech-101  Face Rec  Mirex  Voting (VDM)
# classes                                    47            2          101       139     10             2
# samples                                   204          100         8677       945   3090           435
SVM (clip)                                81.24        13.00        33.49      4.18  57.83          4.89
SVM sim-as-feature (linear)               76.10        14.25        38.18      4.29  55.54          5.40
SVM sim-as-feature (RBF)                  75.98        14.25        38.16      3.92  55.72          5.52
P-SVM                                     70.12        14.25        34.23      4.05  63.81          5.34

23 The same table, with one row added:
SVM-kNN (clip) (Zhang et al. 2006)        17.56        13.75        36.82      4.23  61.25          5.23

24 Approaches to Similarity-based Classification [outline diagram repeated]

25 Weighted Nearest Neighbors: Take a weighted vote of the k nearest neighbors. This is the algorithmic parallel of the exemplar model of human learning.
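The weighted vote itself is simple; the design question is how to choose the weights. A minimal pure-Python sketch, assuming the test sample's similarities to all training samples are already computed; `weights_fn` is a hypothetical hook that maps the k neighbor similarities to k weights.

```python
from collections import defaultdict


def weighted_knn_vote(sims, labels, weights_fn, k):
    """Classify a test sample x from its similarities to the training samples.

    sims[i]    : psi(x, x_i)
    labels[i]  : class label y_i
    weights_fn : maps the k neighbor similarities to k weights (hypothetical hook)
    """
    # The k nearest neighbors are the k MOST similar training samples.
    nn = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    w = weights_fn([sims[i] for i in nn])
    votes = defaultdict(float)
    for i, wi in zip(nn, w):
        votes[labels[i]] += wi        # each neighbor votes with weight w_i
    return max(votes, key=votes.get)  # class with the largest total weight


sims = [0.95, 0.5, 0.4]
labels = ["a", "b", "b"]
print(weighted_knn_vote(sims, labels, lambda s: [1.0] * len(s), k=3))  # b (plain majority)
print(weighted_knn_vote(sims, labels, lambda s: s, k=3))               # a (weights = similarities)
```

Uniform weights recover ordinary k-NN; using the similarities themselves as weights is the simplest affinity-based choice.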

27 Design Goals for the Weights

28 Design Goals for the Weights: Design Goal 1 (Affinity): w_i should be an increasing function of ψ(x, x_i).

30 Design Goals for the Weights (Chen et al. JMLR 2009): Design Goal 2 (Diversity): w_i should be a decreasing function of ψ(x_i, x_j).

31 Linear Interpolation Weights: Linear interpolation weights will meet these goals:
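The defining equations are not in the transcript. A standard definition of linear interpolation weights over the k nearest neighbors x_1, …, x_k, assuming Euclidean feature vectors, is that they reconstruct x as a convex combination of its neighbors:

```latex
\sum_{i=1}^{k} w_i\, x_i = x, \qquad \sum_{i=1}^{k} w_i = 1, \qquad w_i \ge 0 \quad (i = 1,\dots,k).
```

Such weights may not exist when x lies outside the convex hull of its neighbors, and they need not be unique when k is large relative to the dimension; both gaps motivate regularized versions.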

33 LIME Weights: Linear interpolation weights will meet these goals. Linear interpolation with maximum entropy (LIME) weights (Gupta et al., IEEE PAMI 2006):
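As we understand the LIME formulation, it relaxes exact interpolation into a squared reconstruction error and adds an entropy term that spreads weight across neighbors. A sketch of the objective, with the trade-off parameter λ > 0 as assumed notation:

```latex
w^{\mathrm{LIME}} = \arg\min_{w}\ \Big\| \sum_{i=1}^{k} w_i x_i - x \Big\|_2^2 \;-\; \lambda H(w)
\quad \text{s.t.} \quad \sum_{i=1}^{k} w_i = 1,\ w_i \ge 0,
\qquad H(w) = -\sum_{i=1}^{k} w_i \log w_i .
```

Among weight vectors that reconstruct x equally well, the entropy term prefers the most uniform one.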

37 Kernelize Linear Interpolation (Chen et al. JMLR 2009)

38 Kernelize Linear Interpolation: regularizes the variance of the weights

39 Kernelize Linear Interpolation: only inner products are needed, so they can be replaced with a kernel or with similarities!
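The step these slides compress is the expansion of the reconstruction error, which involves the samples only through inner products. With K_ij = ⟨x_i, x_j⟩ and k_i = ⟨x_i, x⟩ over the k neighbors:

```latex
\Big\| \sum_{i=1}^{k} w_i x_i - x \Big\|_2^2
= \sum_{i=1}^{k}\sum_{j=1}^{k} w_i w_j \langle x_i, x_j\rangle - 2\sum_{i=1}^{k} w_i \langle x_i, x\rangle + \langle x, x\rangle
= w^{\top} K w - 2\, k^{\top} w + \langle x, x\rangle .
```

Since ⟨x, x⟩ does not depend on w, it can be dropped. Replacing K and k by the neighbor similarity matrix S and the test similarities s, and (as we read slide 38) using a ridge penalty on the weights, gives a kernel ridge interpolation problem of the form:

```latex
\min_{w}\ \tfrac{1}{2} w^{\top} S w - s^{\top} w + \tfrac{\lambda}{2} w^{\top} w
\quad \text{s.t.} \quad w \ge 0,\ \mathbf{1}^{\top} w = 1 .
```

With 1ᵀw = 1 fixed, wᵀw = k·Var(w) + 1/k, so penalizing wᵀw is the same as penalizing the variance of the weights.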

40 KRI Weights Satisfy the Design Goals: Kernel ridge interpolation (KRI) weights:

41 KRI Weights Satisfy the Design Goals: Kernel ridge interpolation (KRI) weights; affinity:

42 KRI Weights Satisfy the Design Goals: Kernel ridge interpolation (KRI) weights; diversity:
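A minimal NumPy sketch of KRI weights, assuming the objective is ½wᵀSw − sᵀw + (λ/2)wᵀw subject to w ≥ 0 and Σw = 1, where S is the k×k similarity matrix of the neighbors (assumed PSD, e.g. after clipping) and s holds the test sample's similarities to them. We solve it with projected gradient descent and a sort-based projection onto the probability simplex; this solver is our choice, not necessarily the one used in the paper. The two usage cases illustrate affinity and diversity.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * idx > css - 1.0)[0][-1]  # last index kept positive
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)


def kri_weights(S, s, lam=1e-3, iters=2000):
    """Projected gradient for min 0.5 w^T S w - s^T w + 0.5 lam w^T w on the simplex."""
    k = len(s)
    A = S + lam * np.eye(k)                   # Hessian of the objective
    step = 1.0 / np.linalg.eigvalsh(A).max()  # 1 / Lipschitz constant of the gradient
    w = np.full(k, 1.0 / k)                   # start from uniform weights
    for _ in range(iters):
        w = project_to_simplex(w - step * (A @ w - s))
    return w


# Affinity: unrelated neighbors get weights ordered like their similarities to x
# (roughly [0.7, 0.3, 0.0] here).
print(np.round(kri_weights(np.eye(3), np.array([0.9, 0.5, 0.1])), 3))

# Diversity: neighbors 0 and 1 are near-duplicates, so they split their share
# (roughly [0.25, 0.25, 0.5] here), even though all three are equally similar to x.
S_dup = np.array([[1.0, 0.99, 0.0],
                  [0.99, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
print(np.round(kri_weights(S_dup, np.array([0.5, 0.5, 0.5])), 3))
```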

44 KRI Weights Satisfy the Design Goals: Kernel ridge interpolation (KRI) weights. Remove the constraints on the weights: one can show the result is equivalent to local ridge regression, giving the KRR weights.
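Dropping the constraints from ½wᵀSw − sᵀw + (λ/2)wᵀw leaves an unconstrained quadratic. Setting its gradient Sw − s + λw to zero gives w = (S + λI)⁻¹s, and for symmetric S the estimate Σᵢ wᵢyᵢ = sᵀ(S + λI)⁻¹y is exactly the kernel ridge regression prediction fit on the k neighbors. A minimal NumPy sketch (the names and default λ are illustrative):

```python
import numpy as np


def krr_weights(S: np.ndarray, s: np.ndarray, lam: float = 0.01) -> np.ndarray:
    """Unconstrained weights w = (S + lam*I)^{-1} s over the k neighbors.

    S: k x k similarities among the neighbors; s: similarities of the test
    sample to those neighbors. Unlike KRI weights, these may be negative
    and need not sum to one.
    """
    k = len(s)
    return np.linalg.solve(S + lam * np.eye(k), s)


S = np.array([[1.0, 0.9],
              [0.9, 1.0]])
s = np.array([1.0, 0.5])
print(krr_weights(S, s))  # the second weight is negative
```

Solving the linear system is preferable to forming the inverse explicitly; the closed form is what makes these weights cheaper than the constrained KRI weights.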

45 Weighted k-NN: Example 1 [figure panels: KRI weights, KRR weights]

46 Weighted k-NN: Example 2 [figure panels: KRI weights, KRR weights]

47 Weighted k-NN: Example 3 [figure panels: KRI weights, KRR weights]

48 Results, local vs. global methods (test error rates, %)
                                Amazon-47  Aural Sonar  Caltech-101  Face Rec  Mirex  Voting
# samples                             204          100         8677       945   3090     435
# classes                              47            2          101       139     10       2
LOCAL
  k-NN                              16.95        17.00        41.55      4.23  61.21    5.80
  affinity k-NN                     15.00            ?        39.20      4.23  61.15    5.86
  KRI k-NN (clip)                   17.68        14.00        30.13      4.15  61.20    5.29
  KRR k-NN (pinv)                   16.10        15.25        29.90      4.31  61.18    5.52
  SVM-KNN (clip)                    17.56        13.75        36.82      4.23  61.25    5.23
GLOBAL
  SVM sim-as-kernel (clip)          81.24        13.00        33.49      4.18  57.83    4.89
  SVM sim-as-feature (linear)       76.10        14.25        38.18      4.29  55.54    5.40
  SVM sim-as-feature (RBF)          75.98        14.25        38.16      3.92  55.72    5.52
  P-SVM                             70.12        14.25        34.23      4.05  63.81    5.34
(The transcript gives only five values for affinity k-NN; placing 15.00 under Amazon-47 is an assumption.)

52 Approaches to Similarity-based Classification [outline diagram repeated]

53 Generative Classifiers

55 Similarity Discriminant Analysis (Cazzanti and Gupta, ICML 2007, 2008, 2009)

56 Similarity Discriminant Analysis (Cazzanti and Gupta, ICML 2007, 2008, 2009): Regularized local SDA performance: competitive.

57 Some Conclusions
– Performance depends heavily on the oddities of each dataset.
– Weighted k-NN with affinity-diversity weights works well.
– Preliminary: regularized local SDA works well.
– Probabilities are useful.
– Local models are useful:
  – less approximating
  – hard to model the entire space (underlying manifold?)
  – always feasible

62 Lots of Open Questions
– Making S PSD
– Fast k-NN search for similarities
– Similarity-based regression
– Relationship with learning on graphs
– Try it out on real data
– Fusion with Euclidean features (see our FUSION 2009 papers)
– Open theoretical questions (Chen et al. JMLR 2009; Balcan et al. ML 2008)

63 Code/Data/Papers: idl.ee.washington.edu/similaritylearning
Similarity-based Classification, by Chen et al., JMLR 2009

64 Training and Test Consistency: For a test sample x, given …, shall we classify x as …? No! If a training sample were used as a test sample, its class could change!

65 Data Sets: Amazon, Aural Sonar, Protein [eigenvalue spectrum plots; axes: Rank, Eigenvalue]

66 Data Sets: Voting, Yeast-5-7, Yeast-5-12 [eigenvalue spectrum plots; axes: Rank, Eigenvalue]

67 SVM Review: Empirical risk minimization (ERM) with regularization. Hinge loss: … SVM primal: …
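For reference, the standard forms this review covers are below; the feature map φ, the kernel matrix K with K_ij = ⟨φ(x_i), φ(x_j)⟩, labels y_i ∈ {−1, +1}, Y = diag(y), and the cost parameter C > 0 are assumed notation rather than symbols recovered from the slide.

```latex
\begin{aligned}
&\text{Hinge loss:} && \ell\big(y, f(x)\big) = \max\big(0,\ 1 - y f(x)\big)\\
&\text{SVM primal:} && \min_{w,\,b}\ \tfrac{1}{2}\|w\|_2^2 + C \sum_{i=1}^{n} \max\big(0,\ 1 - y_i (w^{\top}\phi(x_i) + b)\big)\\
&\text{SVM dual:} && \max_{\alpha}\ \mathbf{1}^{\top}\alpha - \tfrac{1}{2}\,\alpha^{\top} Y K Y \alpha
\quad \text{s.t.}\quad 0 \le \alpha_i \le C,\ \ y^{\top}\alpha = 0
\end{aligned}
```

The primal is regularized ERM with the hinge loss, with C trading off the loss against the norm penalty; the dual touches the data only through K, which is why an indefinite S cannot simply be substituted for K.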

68 Learning the Kernel Matrix: Find the best K for classification, regularized toward S. SVM that learns the full kernel matrix: …

69 Related Work: Robust SVM (Luss & d’Aspremont, 2007). SVM dual: … “This can be interpreted as a worst-case robust classification problem with bounded uncertainty on the kernel matrix K.”

70 Related Work: Let …; rewrite the robust SVM as …
Theorem (Sion, 1958): Let M and N be convex spaces, one of which is compact, and let f(μ, ν) be a function on M × N that is quasiconcave in M, quasiconvex in N, upper semicontinuous in μ for each ν ∈ N, and lower semicontinuous in ν for each μ ∈ M. Then $\sup_{\mu \in M} \inf_{\nu \in N} f(\mu, \nu) = \inf_{\nu \in N} \sup_{\mu \in M} f(\mu, \nu)$.

71 Related Work: Let …; rewrite the robust SVM as … By Sion’s minimax theorem, the robust SVM is equivalent to … Compare: zero duality gap.

72 Learning the Kernel Matrix: It is not trivial to solve … directly.
Lemma (Generalized Schur Complement): Let $K \in \mathbb{S}^n$ be symmetric, $z \in \mathbb{R}^n$, and $c \in \mathbb{R}$. Then $\begin{bmatrix} K & z \\ z^{\top} & c \end{bmatrix} \succeq 0$ if and only if $K \succeq 0$, z is in the range of K, and $c - z^{\top} K^{\dagger} z \ge 0$, where $K^{\dagger}$ is the pseudoinverse of K.
Let …, and notice that … since …
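The generalized Schur complement lemma is easy to sanity-check numerically. This NumPy sketch (the function names are ours) compares a direct eigenvalue test of the block matrix [[K, z], [zᵀ, c]] with the lemma's three conditions: K ⪰ 0, z in the range of K, and c − zᵀK⁺z ≥ 0, with K⁺ the Moore-Penrose pseudoinverse. The singular K in the example shows why the range condition is needed: K⁻¹ does not exist, and z outside range(K) breaks positive semidefiniteness no matter how large c is.

```python
import numpy as np


def block_is_psd(K, z, c, tol=1e-9):
    """Directly test whether M = [[K, z], [z^T, c]] is positive semidefinite."""
    M = np.block([[K, z[:, None]],
                  [z[None, :], np.array([[c]])]])
    return bool(np.linalg.eigvalsh(M).min() >= -tol)


def schur_conditions_hold(K, z, c, tol=1e-9):
    """The lemma's conditions: K PSD, z in range(K), and c - z^T K^+ z >= 0."""
    K_pinv = np.linalg.pinv(K)
    k_psd = np.linalg.eigvalsh(K).min() >= -tol
    z_in_range = np.allclose(K @ K_pinv @ z, z, atol=1e-7)  # K K^+ projects onto range(K)
    schur_ok = c - z @ K_pinv @ z >= -tol
    return bool(k_psd and z_in_range and schur_ok)


K = np.diag([1.0, 0.0])  # singular, so K^{-1} does not exist
cases = [(np.array([1.0, 0.0]), 1.0),  # z in range(K), Schur complement 0: True True
         (np.array([0.0, 1.0]), 5.0),  # z outside range(K): False False
         (np.array([1.0, 0.0]), 0.5)]  # negative Schur complement: False False
for z, c in cases:
    print(block_is_psd(K, z, c), schur_conditions_hold(K, z, c))
```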

73 Learning the Kernel Matrix: It is not trivial to solve … directly. However, it can be expressed as a convex conic program: … We can recover the optimal … by …

74 Learning the Spectrum Modification: Concerns about learning the full kernel matrix:
– Though the problem is convex, the number of variables is O(n²).
– The flexibility of the model may lead to overfitting.

