1 Support Cluster Machine
Paper from ICML 2007, read by Haiqin Yang, 2007-10-18
This paper, Support Cluster Machine, was written by Bin Li, Mingmin Chi, Jianping Fan, and Xiangyang Xue, and was published at ICML in 2007.
2 Outline
Background and Motivation
Support Cluster Machine (SCM)
Kernel in SCM
Experiments
An Interesting Application: Privacy-Preserving Data Mining
Discussions
3 Background and Motivation
Large-scale classification problem
Decomposition methods: Osuna et al., 1997; Joachims, 1999; Platt, 1999; Collobert & Bengio, 2001; Keerthi et al., 2001
Incremental algorithms: Cauwenberghs & Poggio, 2000; Fung & Mangasarian, 2002; Laskov et al., 2006
Parallel techniques: Collobert et al., 2001; Graf et al., 2004
Approximate formulations: Fung & Mangasarian, 2001; Lee & Mangasarian, 2001
Choosing representatives: Active learning (Schohn & Cohn, 2003); Cluster-Based SVM (Yu et al., 2003); Core Vector Machine (CVM) (Tsang et al., 2005); Clustering SVM (Boley & Cao, 2004)
4 Support Cluster Machine (SCM)
Given training samples:
Procedure
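A minimal sketch of the summarize-then-classify idea behind SCM, using scikit-learn's GaussianMixture as a stand-in for the paper's clustering step (the function name and the dictionary layout of a "unit" are mine, not the paper's):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def build_support_clusters(X_pos, X_neg, n_components=5, seed=0):
    """Summarize each class by a GMM so that the classifier can later be
    trained on a handful of Gaussian components ("support clusters")
    instead of every raw sample."""
    units = []
    for X, label in ((X_pos, +1), (X_neg, -1)):
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="full",
                              random_state=seed).fit(X)
        # Each mixture component becomes one weighted training unit.
        for w, mu, cov in zip(gmm.weights_, gmm.means_, gmm.covariances_):
            units.append({"weight": w, "mean": mu, "cov": cov, "label": label})
    return units
```

Each unit carries a prior weight, a mean, and a full covariance, which is the statistical information the SCM kernel operates on.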
5 SCM Solution
Dual representation
Decision function
6 Kernel
Probability product kernel
By the Gaussian assumption, i.e.,
Hence
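Under the Gaussian assumption, the probability product kernel with rho = 1 (the expected likelihood kernel) has a closed form: the integral of N(x; mu1, Sigma1) * N(x; mu2, Sigma2) over x equals the Gaussian density N(mu1; mu2, Sigma1 + Sigma2). A small sketch (the function name is mine):

```python
import numpy as np

def gaussian_product_kernel(mu1, cov1, mu2, cov2):
    """Closed-form probability product kernel (rho = 1) between
    N(mu1, cov1) and N(mu2, cov2): the Gaussian density
    N(mu1; mu2, cov1 + cov2)."""
    d = np.asarray(mu1, float) - np.asarray(mu2, float)
    S = np.asarray(cov1, float) + np.asarray(cov2, float)
    k = d.shape[0]
    norm = np.sqrt((2.0 * np.pi) ** k * np.linalg.det(S))
    return float(np.exp(-0.5 * d @ np.linalg.solve(S, d)) / norm)
```

Because the kernel depends only on means and covariances, the SVM never needs to touch the raw records, which is what makes the privacy-preserving application later in the talk possible.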
7 Kernel
Property I
That is
Decision function
Property II
8 Experiments
Datasets: Toydata; MNIST (handwritten digit '0'-'9' classification); Adult (privacy-preserving dataset)
Clustering algorithms: Threshold Order Dependent (TOD); EM algorithm
Classification methods: libSVM; SVMTorch; SVMlight; CVM (Core Vector Machine); SCM
Model selection
CPU: 3.0 GHz
9 Toydata
Samples: 2,500 samples per class, generated from a mixture of Gaussians
Clustering algorithm: TOD
Clustering results: 25 positive clusters, 25 negative clusters
10 MNIST
Data description
10 classes: handwritten digits '0'-'9'
Training samples: 60,000 (about 6,000 per class)
Testing samples: 10,000
Construct 45 binary classifiers
Results
25 clusters for the EM algorithm
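The count of 45 classifiers matches one-vs-one pairing of the 10 digit classes (10 choose 2 = 45); a quick check:

```python
from itertools import combinations

# One binary classification task per unordered pair of digits.
digit_pairs = list(combinations(range(10), 2))
print(len(digit_pairs))  # 45
```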
11 MNIST
Test results for the TOD algorithm
12 Privacy-Preserving Data Mining
Inter-enterprise data mining
Problem: two parties owning confidential databases wish to build a decision-tree classifier on the union of their databases, without revealing any unnecessary information.
Horizontally partitioned: records (users) split across companies. Example: a credit card fraud detection model.
Vertically partitioned: attributes split across companies. Example: associations across websites.
13 Privacy-Preserving Data Mining
Randomization approach (diagram): original records, e.g. (Age 50, Salary 40K) and (Age 30, Salary 70K), pass through a randomizer that perturbs them, e.g. to (Age 65, Salary 20K) and (Age 25, Salary 60K); the distributions of Age and Salary are then reconstructed from the randomized data and fed to the data mining algorithms to produce a model.
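A toy sketch of the randomization step (the noise range and the data are invented for illustration): each record is perturbed before it leaves its owner, yet zero-mean noise leaves aggregate statistics approximately intact, which is what lets the miner recover the attribute distributions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true ages, never disclosed directly.
ages = rng.integers(20, 70, size=100_000).astype(float)

# Randomizer: add independent zero-mean noise to every record.
randomized = ages + rng.uniform(-15.0, 15.0, size=ages.shape)

# Individual records are hidden, but the aggregate mean survives.
print(abs(ages.mean() - randomized.mean()) < 0.5)  # True
```

Reconstructing the full distribution (the "reconstruct distribution" boxes in the diagram) additionally requires knowing the noise distribution; the details of that reconstruction are outside this sketch.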
14 Classification Example
15 Privacy-Preserving Dataset: Adult
Data description
Training samples: 30,162
Testing samples: 15,060
Percentage of positive samples: 24.78%
Procedure
Horizontally partition the data into three subsets (parties)
Cluster each subset with the TOD algorithm
Obtain three positive and three negative GMMs
Combine them into one positive and one negative GMM with modified priors
Classify with SCM
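The GMM-combination step above can be sketched as follows (the (weights, means, covariances) triple is a hypothetical representation of each party's GMM, and the function name is mine): merging concatenates the components while rescaling each party's priors by its share of the total samples, which is one plausible reading of "modified priors".

```python
import numpy as np

def combine_gmms(party_gmms, party_sizes):
    """Merge per-party GMMs into one mixture. party_gmms is a list of
    (weights, means, covariances) triples; each party's component priors
    are rescaled by that party's fraction of the total data."""
    total = float(sum(party_sizes))
    weights, means, covs = [], [], []
    for (w, m, c), n in zip(party_gmms, party_sizes):
        weights.append(np.asarray(w) * (n / total))
        means.append(np.asarray(m))
        covs.append(np.asarray(c))
    # The merged priors again sum to one, so the result is a valid GMM.
    return np.concatenate(weights), np.vstack(means), np.concatenate(covs, axis=0)
```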
16 Privacy-Preserving Dataset: Adult
Partition results
Experimental results
17 Discussions
Solved problems
Large-scale problems: downsample by clustering, then classify
Privacy-preserving problems: hide individual information
Differences from other methods
Training units are generative models, while testing units are vectors
Training units contain complete statistical information
Only one parameter for model selection
Easy implementation
The generalization ability is not clear, whereas the RBF kernel in SVM has the property that a larger width leads to a lower VC dimension.
18 Discussions Advantages of using priors and covariances
19 Thank you!