

Bioinformatics Challenge
• Learning in very high dimensions with very few samples
• Acute leukemia dataset: 7129 genes vs. 72 samples
• Colon cancer dataset: 2000 genes vs. 62 samples
• Feature selection will be needed

Feature Selection Approach
Filter model
• Weight score approach
Wrapper model
• 1-norm SVM
• IRSVM

Feature Selection – Filter Model Using Weight Score Approach
[Figure: Feature 1, Feature 2, Feature 3]

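Filter Model – Weight Score Approach
Weight score: $w_j = \dfrac{|\mu_j^{+} - \mu_j^{-}|}{\sigma_j^{+} + \sigma_j^{-}}$, where $\mu_j^{\pm}$ and $\sigma_j^{\pm}$ are the mean and standard deviation of feature $j$ over the training examples of the positive or negative class.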

Filter Model – Weight Score Approach
• The weight score is defined as the ratio of the difference between the mean expression levels of the two classes to the sum of their standard deviations.
• Select the genes with the largest weight scores as the top features.
• The weight score of a feature is calculated from information about that single feature only.
• Highly linearly correlated features might therefore be selected together by this approach.
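The weight score filter described above can be sketched in a few lines of Python. This is an illustrative sketch, not the presenters' code: the function names `weight_scores` and `top_features`, the use of the population standard deviation, the handling of zero spread, and the toy data are all assumptions.

```python
from statistics import mean, pstdev

def weight_scores(X, y):
    """Return one weight score per feature (column of X).

    X: list of samples, each a list of feature values.
    y: list of class labels, +1 or -1, one per sample.
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == -1]
    scores = []
    for j in range(len(X[0])):
        pos_j = [row[j] for row in pos]
        neg_j = [row[j] for row in neg]
        diff = abs(mean(pos_j) - mean(neg_j))
        # Assumption: population standard deviation in each class.
        spread = pstdev(pos_j) + pstdev(neg_j)
        if spread > 0:
            scores.append(diff / spread)
        else:
            # Assumption: zero spread gives 0 if the means coincide, else infinity.
            scores.append(float("inf") if diff > 0 else 0.0)
    return scores

def top_features(X, y, k):
    """Indices of the k features with the largest weight scores."""
    scores = weight_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]

# Toy data: feature 2 is almost a copy of feature 0, feature 1 is uninformative.
X = [[1.0, 5.0, 1.1],
     [2.0, 4.0, 2.1],
     [8.0, 5.0, 8.1],
     [9.0, 4.0, 9.1]]
y = [1, 1, -1, -1]

scores = weight_scores(X, y)
selected = top_features(X, y, 2)
```

On this toy data the two selected features are the two nearly identical columns, which illustrates the weakness noted above: each feature is scored in isolation, so redundant features can crowd out the rest.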

1-Norm SVM (Different Measure of Margin)
1-norm SVM: [...]
Equivalent to: [...]
Good for feature selection!
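For reference, a common textbook form of the 1-norm soft-margin SVM and its linear-programming equivalent is sketched here. The notation is an assumption and is not recovered from the slide: $(x_i, y_i)$, $i = 1, \dots, m$, are the training examples with $y_i \in \{+1, -1\}$, $w$ and $b$ define the hyperplane, $\xi_i$ are slack variables, $C > 0$ is the trade-off parameter, and $s$ is an auxiliary vector that bounds $|w_j|$ componentwise.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \|w\|_1 + C \sum_{i=1}^{m} \xi_i \\
\text{s.t.}\quad & y_i\,(w^\top x_i + b) + \xi_i \ge 1, \quad \xi_i \ge 0, \quad i = 1, \dots, m
\end{aligned}
\qquad\Longleftrightarrow\qquad
\begin{aligned}
\min_{w,\,b,\,\xi,\,s}\quad & \sum_{j=1}^{n} s_j + C \sum_{i=1}^{m} \xi_i \\
\text{s.t.}\quad & y_i\,(w^\top x_i + b) + \xi_i \ge 1, \quad \xi_i \ge 0, \quad i = 1, \dots, m \\
& -s_j \le w_j \le s_j, \quad j = 1, \dots, n
\end{aligned}
```

The second problem is a linear program. The 1-norm penalty tends to drive many components of $w$ exactly to zero, and the features with nonzero weights form the selected subset.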

Clustering Process: Feature Selection & Initial Cluster Centers
• 6 out of 31 features selected by a linear SVM ( )
• Mean of area, standard error of area, worst area, worst texture, worst perimeter and tumor size

Reduced Support Vector Machine
(i) Choose a random subset matrix of the entire data matrix.
(ii) Solve the following problem by Newton's method: min [...]
(iii) The nonlinear classifier is defined by the optimal solution in step (ii): [...]
Using [...] gives lousy results!

Reduced Set: Plays the Most Important Role in RSVM
• It is natural to raise two questions:
  • Is there a way to choose the reduced set, other than random selection, so that RSVM performs better?
  • Is there a mechanism to determine the size of the reduced set automatically or dynamically?

Reduced Set Selection According to the Data Scatter in Input Space
• Choose the reduced set randomly, but keep only the points that are more than a certain minimal distance apart.
• These points are expected to be a representative sample.

A Better Way: According to the Data Scatter in Feature Space
• An example is given as follows: training data analogous to the XOR problem.

Mapping to Feature Space
• Map the input data via a nonlinear mapping: [...]
• Equivalent to a polynomial kernel of degree 2: [...]

Data Points in the Feature Space

The Polynomial Kernel Matrix

Experiment Result

Mathematical Observations: Another Reason for IRSVM
• In SVMs, the nonlinear separating surface is: [...]
• In RSVMs, the nonlinear separating surface [...] is a linear combination of a set of kernel functions.
• If the kernel functions are very similar, the hypothesis space spanned by these kernel functions will be very limited.

Incremental Reduced SVMs: The Strength of Weak Ties
• Start with a very small reduced set, then add a new data point only when its kernel vector is dissimilar to the current function set.
• Such a point contributes the most extra information for generating the separating surface.
• Repeat until several successive points cannot be added.
• The strength of weak ties (….)

How to Measure the Dissimilarity? Solving Least Squares Problems
• The criterion for adding a point to the reduced set is that the distance from its kernel vector [...] to the column space of [...] is greater than a threshold.
• This distance can be determined by solving a least squares problem: [...]
• It has a unique solution [...], and the distance is [...]
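In generic notation, the least squares problem and the resulting distance can be written as follows. The symbols are assumptions and are not taken from the slide: $K$ is the current reduced kernel matrix, $k$ is the kernel vector of the candidate point, $\beta$ is the coefficient vector, and $\mathcal{C}(K)$ is the column space of $K$.

```latex
\beta^{*} \;=\; \arg\min_{\beta}\ \|K\beta - k\|_2^2 \;=\; (K^\top K)^{-1} K^\top k,
\qquad
\operatorname{dist}\bigl(k, \mathcal{C}(K)\bigr) \;=\; \|k - K\beta^{*}\|_2 .
```

The closed form and the uniqueness of $\beta^{*}$ hold when $K$ has full column rank. The distance is the norm of the least squares residual, that is, the part of $k$ orthogonal to the columns already in the reduced kernel matrix.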

IRSVM Algorithm pseudo-code (sequential version)
1   Randomly choose two data points from the training data as the initial reduced set
2   Compute the reduced kernel matrix
3   For each data point not in the reduced set
4       Compute its kernel vector
5       Compute the distance from the kernel vector
6       to the column space of the current reduced kernel matrix
7       If its distance exceeds a certain threshold
8           Add this point to the reduced set and form the new reduced kernel matrix
9   Until several successive failures happen in line 7
10  Solve the QP problem of nonlinear SVMs with the obtained reduced kernel
11  A new data point is classified by the separating surface
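Lines 1 to 9 of the pseudo-code can be sketched in plain Python as below. This is a sketch under stated assumptions rather than the authors' implementation: it assumes a Gaussian kernel, and instead of solving an explicit least squares problem it keeps an orthonormal basis of the column space (modified Gram-Schmidt), which yields the same residual distance. All names (`select_reduced_set`, `threshold`, `max_failures`, `gamma`) and the toy data are hypothetical. The SVM solve and classification of lines 10 and 11 are not included.

```python
import math
import random

def gaussian_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel; the kernel choice is an assumption."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def kernel_vector(data, point, gamma=1.0):
    """Kernel values between every training point and the candidate point."""
    return [gaussian_kernel(x, point, gamma) for x in data]

def residual(vec, basis):
    """Part of vec orthogonal to the span of an orthonormal basis."""
    r = list(vec)
    for q in basis:
        c = sum(a * b for a, b in zip(r, q))
        r = [a - c * b for a, b in zip(r, q)]
    return r

def select_reduced_set(data, threshold, max_failures=5, gamma=1.0, seed=0):
    """Return indices of the points chosen for the reduced set."""
    rng = random.Random(seed)
    order = list(range(len(data)))
    rng.shuffle(order)            # random visiting order (lines 1 and 3)
    reduced, basis, failures = [], [], 0
    for idx in order:
        k = kernel_vector(data, data[idx], gamma)          # line 4
        r = residual(k, basis)                             # lines 5-6
        dist = math.sqrt(sum(a * a for a in r))
        # The first two points are always accepted (line 1).
        if len(reduced) < 2 or dist > threshold:           # line 7
            reduced.append(idx)                            # line 8
            if dist > 1e-12:      # skip numerically dependent columns
                basis.append([a / dist for a in r])
            failures = 0
        else:
            failures += 1
            if failures >= max_failures:                   # line 9
                break
    return reduced

# Toy data: three locations, each repeated five times.
demo = [(0.0, 0.0)] * 5 + [(3.0, 0.0)] * 5 + [(0.0, 3.0)] * 5
reduced = select_reduced_set(demo, threshold=0.5, max_failures=len(demo))
covered = {demo[i] for i in reduced}
```

On the toy data every location ends up represented in the reduced set, while repeated points are rejected because their kernel vectors already lie in the current column space. Setting `max_failures` to the data size forces a full pass; a smaller value reproduces the early stopping of line 9.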

Wrapper Model – IRSVM
Find a linear classifier:
I. Randomly choose a very small feature subset from the input features as the initial feature reduced set.
II. Select a feature vector not in the current feature reduced set and compute the distance between this vector and the space spanned by the current feature reduced set.
III. If the distance is larger than a given gap, add this feature vector to the feature reduced set.
IV. Repeat steps II and III until no more features can be added to the current feature reduced set.
V. The features in the resulting feature reduced set are the final result of feature selection.