
1 Support Vector Machines
H. Clara Pong, Julie Horrocks 1, Marianne Van den Heuvel 2, Francis Tekpetey 3, B. Anne Croy 4
1 Mathematics & Statistics, University of Guelph; 2 Biomedical Sciences, University of Guelph; 3 Obstetrics and Gynecology, University of Western Ontario; 4 Anatomy & Cell Biology, Queen’s University

2 Outline
- Background
- Separating Hyper-plane & Basis Expansion
- Support Vector Machines
- Simulations
- Remarks

3 Background
Motivation: the IVF (in-vitro fertilization) project
- 18 infertile women, each undergoing the IVF treatment
- Outcome (outputs, Y’s): binary (pregnancy)
- Predictors (inputs, X’s): longitudinal data (adhesion of CD56-bright cells)

4 Background: methods
Classification methods
- Relatively new method: Support Vector Machines
  - V. Vapnik: first proposed in 1979
  - Maps the input space into a high-dimensional feature space
  - Constructs a linear classifier in the new feature space
- Traditional method: Discriminant Analysis
  - R.A. Fisher: 1936
  - Classifies according to the values of the discriminant functions
  - Assumption: the predictors X in a given class have a multivariate normal distribution

5 Separating Hyper-plane
Suppose there are 2 classes (A, B); y = 1 for group A, y = -1 for group B.
Let a hyper-plane be defined as f(X) = β_0 + β^T X = 0; then f(X) = 0 is the decision boundary that separates the two groups:
f(X) = β_0 + β^T X > 0 for X ∈ A
f(X) = β_0 + β^T X < 0 for X ∈ B
Given X_0 ∈ A, it is misclassified when f(X_0) < 0; given X_0 ∈ B, it is misclassified when f(X_0) > 0.
[Figure: the hyper-plane f(X) = β_0 + β^T X = 0, with region A (f(X) > 0) on one side and region B (f(X) < 0) on the other]
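To make the rule concrete, here is a minimal R sketch; the coefficients β_0 and β are hypothetical values chosen only for illustration, not taken from the slides:

```r
# Classify a point by the sign of f(X) = beta0 + t(beta) %*% X.
# beta0 and beta are made-up values, for illustration only.
beta0 <- -1
beta  <- c(2, -1)
f <- function(x) beta0 + sum(beta * x)

x0 <- c(1.5, 0.5)
if (f(x0) > 0) "A" else "B"   # assign x0 to group A or B
```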

6 Separating Hyper-plane
The perceptron learning algorithm searches for a hyper-plane that minimizes the distance of misclassified points to the decision boundary. However, this does not provide a unique solution.
[Figure: two separable groups with a candidate hyper-plane f(X) = β_0 + β^T X = 0]
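As a sketch of the idea (not the deck’s own code), a bare-bones perceptron in R; it stops at whichever separating hyper-plane it happens to reach first, which is why the solution is not unique:

```r
# Minimal perceptron sketch: nudge (beta0, beta) toward each
# misclassified point; X is a numeric matrix, y has values in {-1, 1}.
perceptron <- function(X, y, rate = 1, epochs = 100) {
  beta <- rep(0, ncol(X)); beta0 <- 0
  for (e in seq_len(epochs)) {
    for (i in seq_len(nrow(X))) {
      if (y[i] * (beta0 + sum(beta * X[i, ])) <= 0) {  # misclassified
        beta  <- beta  + rate * y[i] * X[i, ]
        beta0 <- beta0 + rate * y[i]
      }
    }
  }
  list(beta0 = beta0, beta = beta)
}
```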

7 Optimal Separating Hyper-plane
Let C be the distance of the closest points from the two groups to the hyper-plane. The optimal separating hyper-plane is the unique separating hyper-plane f(X) = β_0* + β*^T X = 0, where (β_0*, β*) maximizes C.
[Figure: the optimal hyper-plane f(X) = β_0* + β*^T X = 0, with margin C on each side]

8 Optimal Separating Hyper-plane
Maximization problem: maximize the margin C over (β_0, β).
Dual Lagrange problem, subject to:
1. α_i [y_i (x_i^T β + β_0) - 1] = 0
2. α_i ≥ 0 for all i = 1…N
3. β = Σ_{i=1..N} α_i y_i x_i
4. Σ_{i=1..N} α_i y_i = 0
5. The Kuhn-Tucker conditions
f(X) depends only on the x_i’s where α_i ≠ 0 (the support vectors).
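Written out in full, this is the standard max-margin formulation (following Hastie et al., reference 2) from which the conditions above arise; a reconstruction, since the slide shows only the constraints:

```latex
% Primal: choose the hyper-plane maximizing the margin C
\max_{\beta_0,\ \beta,\ \|\beta\|=1} \; C
\qquad \text{s.t.}\quad y_i(\beta_0 + \beta^T x_i) \ge C,\ \ i = 1,\dots,N.

% Dual Lagrange problem (separable case): maximize over the alpha_i
L_D = \sum_{i=1}^{N} \alpha_i
    - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}
      \alpha_i \alpha_j \, y_i y_j \, x_i^T x_j
\qquad \text{s.t.}\quad \alpha_i \ge 0,\quad \sum_{i=1}^{N}\alpha_i y_i = 0.
```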

9 Optimal Separating Hyper-plane
[Figure: the optimal hyper-plane f(X) = β_0* + β*^T X = 0; the points lying on the margins, at distance C, are the support vectors]

10 Basis Expansion
Suppose there are p inputs, X = (x_1 … x_p).
Let h_k(X) be a transformation that maps X from R^p → R; h_k(X) is called a basis function.
H = {h_1(X), …, h_m(X)} is the basis of a new feature space (dim = m).
Example: X = (x_1, x_2), H = {h_1(X), h_2(X), h_3(X)} with
h_1(X) = h_1(x_1, x_2) = x_1,
h_2(X) = h_2(x_1, x_2) = x_2,
h_3(X) = h_3(x_1, x_2) = x_1 x_2.
X_new = H(X) = (x_1, x_2, x_1 x_2)
[Figure: the data plotted in the new coordinates x_1, x_2, x_1 x_2]
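In R, the slide’s example expansion is a one-liner; a minimal sketch:

```r
# Basis expansion from the slide: H(X) = (x1, x2, x1*x2)
H <- function(x) c(h1 = x[1], h2 = x[2], h3 = x[1] * x[2])
H(c(2, 3))   # maps (2, 3) in R^2 to (2, 3, 6) in the new feature space
```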

11 Support Vector Machines
The classifier built on the optimal hyper-plane {X | f(X) = β_0* + β*^T X = 0} is called the Support Vector Classifier.
Separable case: all points are outside of the margins.
The classification rule is the sign of the decision function f(X) = β_0* + β*^T X.
[Figure: the optimal hyper-plane with margins at distance C on each side]

12 Support Vector Machines
Hyper-plane: {X | f(X) = β_0 + β^T X = 0}
Non-separable case: the training data are non-separable.
X_i crosses the margin of its group when C - y_i f(X_i) > 0, so let S_i = C - y_i f(X_i) when X_i crosses the margin, and S_i = 0 when X_i is outside it.
Let ξ_i C = S_i; ξ_i is the proportion of C by which the prediction has crossed the margin. Misclassification occurs when S_i > C (ξ_i > 1).
[Figure: a point crossing the margin, with y_i f(X_i), S_i, and C marked]
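Restating the slide’s slack definitions in one place, using the deck’s own symbols:

```latex
% Slack S_i: how far x_i has crossed its margin (zero outside the margin)
S_i = \max\bigl(0,\ C - y_i f(x_i)\bigr), \qquad \xi_i = S_i / C
% misclassification means crossing the decision boundary itself:
S_i > C \;\Longleftrightarrow\; \xi_i > 1
```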

13 Support Vector Machines
The overall margin violation is Σ ξ_i, and is bounded by δ.
Maximization problem: maximize the margin C as before, now allowing slack ξ_i.
Dual Lagrange problem (non-separable case): s.t. 0 ≤ α_i ≤ ζ, Σ α_i y_i = 0, subject to:
1. α_i [y_i (x_i^T β + β_0) - (1 - ξ_i)] = 0
2. ν_i ≥ 0 for all i = 1…N
3. β = Σ α_i y_i x_i
4. The Kuhn-Tucker conditions
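The dual objective itself has the same standard form as in the separable case, with the deck’s cost parameter ζ capping each α_i (a reconstruction following Hastie et al., reference 2):

```latex
L_D = \sum_{i=1}^{N} \alpha_i
    - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}
      \alpha_i \alpha_j \, y_i y_j \, x_i^T x_j
\qquad \text{s.t.}\quad 0 \le \alpha_i \le \zeta,\quad
\sum_{i=1}^{N}\alpha_i y_i = 0.
```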

14 Support Vector Machines
SVM searches for an optimal hyper-plane in a new feature space where the data are more separable.
Suppose H = {h_1(X), …, h_m(X)} is the basis for the new feature space F; every element of the new feature space is a linear basis expansion of X. The linear classifier and the dual Lagrange problem are then written in terms of H(X).
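Written out, the standard result the slide points at is that the classifier depends on the feature space only through inner products:

```latex
% Linear classifier in the feature space spanned by H(X) = (h_1(X),...,h_m(X)):
f(X) = \beta_0 + \sum_{i=1}^{N} \alpha_i y_i \,
       \langle H(X), H(x_i) \rangle
     = \beta_0 + \sum_{i=1}^{N} \alpha_i y_i \, K(X, x_i)
% only inner products in the feature space -- the kernel K -- are needed.
```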

15 Support Vector Machines
For example, a kernel K(X, X') computes the inner product in the feature space, which in turn implies a basis transformation H: the kernel and the basis transformation define one another. (A standard example is reconstructed below.)
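A standard instance of this correspondence is the degree-2 polynomial kernel from Hastie et al. (reference 2); this may or may not be the slide’s own example, so treat it as an illustration:

```latex
% Degree-2 polynomial kernel on X = (x_1, x_2):
K(X, X') = \bigl(1 + \langle X, X' \rangle\bigr)^2
% equals \langle H(X), H(X') \rangle for the 6-dimensional basis
H(X) = \bigl(1,\ \sqrt{2}\,x_1,\ \sqrt{2}\,x_2,\ x_1^2,\ x_2^2,\ \sqrt{2}\,x_1 x_2\bigr)
```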

16 Support Vector Machines
The dual Lagrange function depends on the inputs only through inner products, which shows that the basis transformation in SVM does not need to be defined explicitly. The most common kernels:
1. d-th degree polynomial: K(X, X') = (1 + ⟨X, X'⟩)^d
2. Radial basis: K(X, X') = exp(-γ ||X - X'||^2)
3. Neural network: K(X, X') = tanh(κ_1 ⟨X, X'⟩ + κ_2)
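These kernels map directly onto options in the R package cited in the references (e1071, where the tanh kernel is called "sigmoid"); a sketch, with X and y assumed given and tuning values chosen only for illustration:

```r
library(e1071)
# X: numeric matrix of inputs; y: factor of group labels (assumed given).
# Kernel names follow e1071; parameter values are illustrative only.
m_poly <- svm(X, y, kernel = "polynomial", degree = 3)  # d-th degree polynomial
m_rbf  <- svm(X, y, kernel = "radial",     gamma = 0.5) # radial basis
m_nn   <- svm(X, y, kernel = "sigmoid")                 # tanh ("neural network")
```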

17 Simulations
- 3 cases, 100 simulations per case
- Each simulation consists of 200 points, 100 points from each group
- Input space: 2-dimensional, X = (x_1, x_2)
- Output: Y ∈ {0, 1} (2 groups)
- Half of the points are randomly selected as the training set.
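A sketch of one such simulation run in R, in the spirit of Case 1 below (two bivariate normals with a common covariance matrix); the group means, covariance, and seed are illustrative assumptions, not the deck’s actual settings:

```r
library(MASS)   # mvrnorm(), lda()
library(e1071)  # svm()
set.seed(1)

n <- 100                                   # 100 points per group
X <- rbind(mvrnorm(n, mu = c(0, 0), Sigma = diag(2)),
           mvrnorm(n, mu = c(2, 2), Sigma = diag(2)))
y <- factor(rep(0:1, each = n))
train <- sample(2 * n, n)                  # half as the training set

fit_svm <- svm(X[train, ], y[train], kernel = "radial")
fit_lda <- lda(X[train, ], grouping = y[train])

mean(predict(fit_svm, X[-train, ]) != y[-train])        # SVM test error
mean(predict(fit_lda, X[-train, ])$class != y[-train])  # LDA test error
```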

18 Simulations
Case 1 (normal with the same covariance matrix)
[Figure: scatter plot; black ~ group 0, red ~ group 1]

19 Simulations
Case 1 misclassifications (in 100 simulations):

        Training         Testing
        Mean     Sd      Mean     Sd
LDA     7.85%    2.65    8.07%    2.51
SVM     6.98%    2.33    8.48%    2.81

20 Simulations
Case 2 (normal with unequal covariance matrices)
[Figure: scatter plot; black ~ group 0, red ~ group 1]

21 Simulations
Case 2 misclassifications (in 100 simulations):

        Training         Testing
        Mean     Sd      Mean     Sd
QDA     15.5%    3.75    16.84%   3.48
SVM     13.6%    4.03    18.8%    4.01

22 Simulations
Case 3 (non-normal)
[Figure: scatter plot; black ~ group 0, red ~ group 1]

23 Simulations
Case 3 misclassifications (in 100 simulations):

        Training         Testing
        Mean     Sd      Mean     Sd
QDA     14%      3.79    16.8%    3.63
SVM     9.34%    3.46    14.8%    3.21

24 Simulations
Paired t-test for differences in misclassification rates
H0: mean difference = 0; Ha: mean difference ≠ 0
Case 1: mean difference (LDA - SVM) = -0.41, se = 0.3877; t = -1.057, p-value = 0.29 (not significant)
Case 2: mean difference (QDA - SVM) = -1.96, se = 0.4170; t = -4.70, p-value = 8.42e-06 (significant)
Case 3: mean difference (QDA - SVM) = 2, se = 0.4218; t = 4.74, p-value = 7.13e-06 (significant)
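The test itself is one call in R; a sketch, where err_lda and err_svm stand in for the hypothetical length-100 vectors of per-simulation test misclassification rates:

```r
# Paired t-test of H0: mean difference = 0, as on the slide.
# err_lda, err_svm: per-simulation test error rates (assumed given).
t.test(err_lda, err_svm, paired = TRUE)
```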

25 Remarks
Support Vector Machines
- Maps the original input space onto a feature space of higher dimension
- No assumption on the distributions of the X’s
Performance
- The performances of Discriminant Analysis and SVM are similar when (X|Y) has a normal distribution and the groups share the same Σ
- Discriminant Analysis performs better when the covariance matrices for the two groups are different
- SVM performs better when the input X violates the distributional assumption

26 References
1. N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. New York: Cambridge University Press, 2000.
2. J. Friedman, T. Hastie, and R. Tibshirani. The Elements of Statistical Learning. New York: Springer, 2001.
3. D. Meyer, C. Chang, and C. Lin. R Documentation: Support Vector Machines. http://www.maths.lth.se/help/R/.R/library/e1071/html/svm.html (last updated March 2006).
4. H. Planatscher and J. Dietzsch. SVM-Tutorial using R (e1071-package). http://www.potschi.de/svmtut/svmtut.htm
5. M. Van Den Heuvel, J. Horrocks, S. Bashar, S. Taylor, S. Burke, K. Hatta, E. Lewis, and A. Croy. Menstrual Cycle Hormones Induce Changes in Functional Interactions Between Lymphocytes and Endothelial Cells. Journal of Clinical Endocrinology and Metabolism, 2005.


