CSCE555 Bioinformatics Lecture 15: Classification for Microarray Data. Meeting: MW 4:00PM-5:15PM, SWGN2A21. Instructor: Dr. Jianjun Hu. Course page: University of South Carolina, Department of Computer Science and Engineering

Outline
Classification problem in microarray data
Classification concepts and algorithms
Evaluation of classification algorithms
Summary

Example: breast cancer prognosis (van't Veer et al. (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature, Jan.)
Objects: arrays; feature vectors: gene expression; predefined classes: clinical outcome (bad prognosis: recurrence < 5 yrs; good prognosis: recurrence > 5 yrs).
A learning set of labeled arrays is used to build a classification rule, which assigns a new array to a class (e.g., good prognosis, metastasis-free > 5 yrs).

Example: leukemia tumor type (Golub et al. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286(5439).)
Objects: arrays; feature vectors: gene expression; predefined classes: tumor type (B-ALL, T-ALL, AML).
A learning set of labeled arrays is used to build a classification rule, which assigns a new array to a class (e.g., T-ALL).

Classification/Discrimination
Each object (e.g., an array or column) is associated with a class label (or response) Y ∈ {1, 2, …, K} and a feature vector (vector of predictor variables) of G measurements: X = (X_1, …, X_G).
Aim: predict Y_new from X_new.

        sample1   sample2   sample3   sample4   sample5   ...   new sample
Y       Normal    Normal    Normal    Cancer    Cancer          unknown = Y_new
X       (gene expression measurements)                          X_new
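As a concrete illustration of this setup (not part of the original slides), a minimal sketch in Python/NumPy; the sample counts, gene counts, and labels are made up:

import numpy as np

rng = np.random.default_rng(0)
n_samples, n_genes = 6, 1000                      # hypothetical sizes
X = rng.normal(size=(n_samples, n_genes))         # rows = arrays, columns = G gene measurements

y = np.array(["Normal", "Normal", "Normal", "Cancer", "Cancer"])   # labels for the learning set
X_learn, X_new = X[:5], X[5:]                     # learning set and the new array to classify
print(X_learn.shape, y.shape, X_new.shape)        # (5, 1000) (5,) (1, 1000)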

Discrimination/Classification

Basic principles of discrimination
Each object is associated with a class label (or response) Y ∈ {1, 2, …, K} and a feature vector (vector of predictor variables) X.
Aim: predict Y from X.
Example: predefined classes {1, 2, …, K}; X = feature vector {colour, shape}. For an object with X = {red, square}, the classification rule predicts its class label, e.g. Y = 2.

KNN: Nearest neighbor classifier
Based on a measure of distance between observations (e.g., Euclidean distance or one minus correlation).
The k-nearest neighbor rule (Fix and Hodges, 1951) classifies an observation X as follows:
◦ find the k observations in the learning set closest to X
◦ predict the class of X by majority vote, i.e., choose the class that is most common among those k observations.
The number of neighbors k can be chosen by cross-validation (more on this later).
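A minimal sketch of this rule with scikit-learn (the data, sizes, and k = 3 are illustrative choices, not prescribed by the slide):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_learn = rng.normal(size=(60, 500))              # 60 arrays x 500 genes (made-up data)
y_learn = np.repeat(["Normal", "Cancer"], 30)     # class labels for the learning set
X_new = rng.normal(size=(1, 500))                 # a new array to classify

# k-NN: find the k closest observations in the learning set and take a majority vote
knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
knn.fit(X_learn, y_learn)
print(knn.predict(X_new))

A correlation-based distance (one minus correlation) could be supplied instead of Euclidean distance via a custom metric.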

[Figure] 3-Nearest Neighbors: for a query point q, the 3 nearest neighbors contain 2 points of class x and 1 of class o, so the majority vote assigns q to class x.

Limitation of KNN: how should k be chosen?

SVM: Support Vector Machines
SVMs are currently among the best performers for a number of classification tasks ranging from text to genomic data.
To discriminate between two classes, given a training dataset:
◦ Map the data to a higher-dimensional space (feature space)
◦ Separate the two classes using an optimal linear separator

Key Ideas of SVM: Margins of Linear Separators
Maximum margin linear classifier

[Figure] Optimal hyperplane: the separating hyperplane with the maximum margin ρ; the training points lying on the margin are the support vectors, and the support vectors uniquely characterize the optimal hyperplane.

Finding the Support Vectors
The constrained optimization is solved with the Lagrange multiplier method; the resulting problem involves the training data only through inner products of vectors.
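For reference (the derivation is not written out on the original slide), the standard dual problem obtained from the Lagrangian for training data {(x_i, y_i)} with y_i ∈ {−1, +1} is:

\max_{\alpha}\ \sum_{i} \alpha_i \;-\; \tfrac{1}{2}\sum_{i}\sum_{j} \alpha_i \alpha_j\, y_i y_j\, x_i^{\top} x_j
\quad \text{subject to}\quad \alpha_i \ge 0,\qquad \sum_{i} \alpha_i y_i = 0.

The training points with α_i > 0 are the support vectors, and the data enter only through the inner products x_i^T x_j, which is what makes the kernel trick on the next slide possible.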

Key Ideas of SVM: Feature Space Mapping
Map the original data to some higher-dimensional feature space where the training set is linearly separable: Φ: x → φ(x)
Example: (x1, x2) → (x1, x2, x1^2, x2^2, x1·x2, …)

The “Kernel Trick”
The linear classifier relies on the inner product between vectors: K(x_i, x_j) = x_i^T x_j.
If every datapoint is mapped into a high-dimensional space via some transformation Φ: x → φ(x), the inner product becomes K(x_i, x_j) = φ(x_i)^T φ(x_j).
A kernel function is a function that corresponds to an inner product in some expanded feature space.
Example: 2-dimensional vectors x = [x_1, x_2]; let K(x_i, x_j) = (1 + x_i^T x_j)^2.
Need to show that K(x_i, x_j) = φ(x_i)^T φ(x_j):
K(x_i, x_j) = (1 + x_i^T x_j)^2
            = 1 + x_i1^2 x_j1^2 + 2 x_i1 x_j1 x_i2 x_j2 + x_i2^2 x_j2^2 + 2 x_i1 x_j1 + 2 x_i2 x_j2
            = [1, x_i1^2, √2 x_i1 x_i2, x_i2^2, √2 x_i1, √2 x_i2]^T [1, x_j1^2, √2 x_j1 x_j2, x_j2^2, √2 x_j1, √2 x_j2]
            = φ(x_i)^T φ(x_j), where φ(x) = [1, x_1^2, √2 x_1 x_2, x_2^2, √2 x_1, √2 x_2]
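A quick numerical check of this identity (the vectors are arbitrary, chosen only for illustration):

import numpy as np

def phi(x):
    # explicit feature map for the kernel K(x, z) = (1 + x.z)^2 in 2 dimensions
    x1, x2 = x
    return np.array([1, x1**2, np.sqrt(2)*x1*x2, x2**2, np.sqrt(2)*x1, np.sqrt(2)*x2])

x = np.array([0.3, -1.2])
z = np.array([2.0, 0.5])
K = (1 + x @ z) ** 2                      # kernel evaluated in the original 2-D space
print(np.isclose(K, phi(x) @ phi(z)))     # True: equals the inner product in feature space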

Examples of Kernel Functions
Linear: K(x_i, x_j) = x_i^T x_j
Polynomial of power p: K(x_i, x_j) = (1 + x_i^T x_j)^p
Gaussian (radial-basis function network): K(x_i, x_j) = exp(−||x_i − x_j||^2 / (2σ^2))
Sigmoid: K(x_i, x_j) = tanh(β_0 x_i^T x_j + β_1)
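These kernels correspond directly to options in common SVM libraries; a minimal scikit-learn sketch (data and parameter values are made up for illustration):

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 200))            # 40 arrays x 200 genes (made-up data)
y = np.repeat([0, 1], 20)

# one classifier per kernel listed above
for clf in (SVC(kernel="linear"),
            SVC(kernel="poly", degree=3, coef0=1),
            SVC(kernel="rbf", gamma=0.01),
            SVC(kernel="sigmoid", coef0=1)):
    clf.fit(X, y)
    print(clf.kernel, clf.score(X, y))    # training accuracy only; see the evaluation slides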

SVM
Advantages:
◦ maximize the margin between two classes in the feature space characterized by a kernel function
◦ robust with respect to high input dimension
Disadvantages:
◦ difficult to incorporate background knowledge
◦ sensitive to outliers

Variable/Feature Selection with SVMs
Recursive Feature Elimination (RFE):
◦ Train a linear SVM
◦ Remove the variables with the lowest weights (those variables affect classification the least), e.g., remove the lowest 50% of variables
◦ Retrain the SVM with the remaining variables and repeat until classification performance is reduced
Very successful. Other formulations exist where minimizing the number of variables is folded into the optimization problem. Similar algorithms exist for non-linear SVMs. These are some of the best and most efficient variable selection methods.
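A minimal sketch of recursive feature elimination with a linear SVM in scikit-learn (gene counts, the number of genes kept, and the elimination step are illustrative choices, not prescribed by the slide):

import numpy as np
from sklearn.svm import SVC
from sklearn.feature_selection import RFE

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 1000))           # 40 arrays x 1000 genes (made-up data)
y = np.repeat([0, 1], 20)

# repeatedly drop the lowest-weighted genes until 50 remain
rfe = RFE(estimator=SVC(kernel="linear"), n_features_to_select=50, step=0.5)
rfe.fit(X, y)
selected_genes = np.flatnonzero(rfe.support_)   # indices of the retained genes
print(len(selected_genes))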

Software
A list of SVM implementations can be found at machines.org/software.html
Some implementations (such as LIBSVM) can handle multi-class classification
SVMLight and LibSVM are among the earliest implementations of SVM
Several Matlab toolboxes for SVM are also available

How to Use SVM to Classify Microarray Data
Prepare the data in LibSVM format: each line is <label> <index>:<value> <index>:<value> …, listing the indices and values of the non-zero features.
Usage: svm-train [options] training_set_file [model_file]
Examples of options: -s 0 -c 10 -t 1 -g 1 -r 1 -d 3
Usage: svm-predict [options] test_file model_file output_file
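A hypothetical fragment of such a training file (labels +1/−1, then gene_index:expression_value pairs; all values are made up):

1 1:0.72 2:-1.31 5:0.08
-1 1:-0.44 3:2.10 7:-0.95
1 2:0.15 4:1.62 6:0.33

Training and prediction then look like the following (file names are placeholders; following the slide's example, -s 0 selects C-SVC, -t 1 a polynomial kernel, -c the cost, and -g, -r, -d the kernel parameters):

svm-train -s 0 -c 10 -t 1 -g 1 -r 1 -d 3 training_set_file model_file
svm-predict test_file model_file output_file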

[Figure] Decision tree classifiers: internal nodes test individual gene expression values (e.g., is M_i1 for Gene 1 below a threshold? if yes, test M_i2 for Gene 2, and so on), and each leaf assigns a class (0 or 1).
Advantage: transparent rules, easy to interpret.
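A minimal sketch of such a tree on made-up expression data using scikit-learn (the thresholds are learned from the data rather than fixed as in the figure):

import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 5))              # 40 arrays x 5 genes (made-up data)
y = (X[:, 0] > 0.18).astype(int)          # class depends on Gene 1, mimicking the figure

tree = DecisionTreeClassifier(max_depth=2)
tree.fit(X, y)
# the learned rules are transparent and can be read off directly
print(export_text(tree, feature_names=[f"Gene{i+1}" for i in range(5)]))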

[Figure] Ensemble classifiers: from the training set X_1, X_2, …, X_100, draw many resamples (Resample 1, …, Resample 500), fit a classifier on each (Classifier 1, …, Classifier 500), and combine them into an aggregate classifier.
Examples: Bagging, Boosting, Random Forest.

[Figure] Aggregating classifiers: Bagging. From the training set of arrays X_1, X_2, …, X_100, draw bootstrap resamples X*_1, X*_2, …, X*_100 and grow one tree per resample (Tree 1, …, Tree 500). To classify a test sample, let the trees vote; e.g., 90% vote Class 1 and 10% vote Class 2, so the aggregate classifier predicts Class 1.
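A minimal sketch of bagged trees in scikit-learn (500 trees as in the figure; the data are made up; the default base estimator of BaggingClassifier is a decision tree):

import numpy as np
from sklearn.ensemble import BaggingClassifier

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 300))           # 100 arrays x 300 genes (made-up data)
y = np.repeat([1, 2], 50)

# 500 trees, each grown on a bootstrap resample of the 100 arrays; prediction is a majority vote
bagger = BaggingClassifier(n_estimators=500, bootstrap=True)
bagger.fit(X, y)
print(bagger.predict(X[:1]))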

Weka Data Mining Toolbox
The Weka package (Java) includes:
◦ All previous classifiers
◦ Neural networks
◦ Projection pursuit
◦ Bayesian belief networks
◦ And more

Feature Selection in Classification
What: select a subset of features
Why:
◦ Leads to better classification performance by removing variables that are noise with respect to the outcome
◦ May provide useful insights into the biology
◦ Can eventually lead to diagnostic tests (e.g., a “breast cancer chip”)

Classifier Performance Assessment
Any classification rule needs to be evaluated for its performance on future samples.
It is almost never the case in microarray studies that a large, independent, population-based collection of samples is available at the initial classifier-building phase.
One needs to estimate future performance based on what is available: often the same set that is used to build the classifier.
Performance of the classifier can be assessed by:
◦ Cross-validation
◦ A held-out test set
◦ Independent testing on a future dataset

[Figure] Diagram of performance assessment: a classifier is built on the training set; evaluating it on the same training set gives the resubstitution estimate, while evaluating it on an independent test set gives the test-set estimate.

[Figure] Diagram of performance assessment (with cross-validation): in addition to the resubstitution and test-set estimates, the training set can be repeatedly split into a (CV) learning set and a (CV) test set to obtain a cross-validation estimate of performance.

Performance assessment
V-fold cross-validation (CV) estimation: cases in the learning set are randomly divided into V subsets of (nearly) equal size. Build a classifier leaving one subset out, compute the test error rate on the left-out subset, and average over the V folds.
◦ Bias-variance tradeoff: smaller V can give larger bias but smaller variance
◦ Computationally intensive
Leave-one-out cross-validation (LOOCV): the special case V = n. Works well for stable classifiers (k-NN, LDA, SVM).
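A minimal sketch of V-fold CV and LOOCV for an SVM in scikit-learn (V = 5, the classifier, and the data are illustrative choices):

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score, KFold, LeaveOneOut

rng = np.random.default_rng(5)
X = rng.normal(size=(50, 200))            # 50 arrays x 200 genes (made-up data)
y = np.repeat([0, 1], 25)

clf = SVC(kernel="linear")
# V-fold CV: average accuracy over the V left-out subsets
vfold = cross_val_score(clf, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))
# LOOCV: the special case V = n
loocv = cross_val_score(clf, X, y, cv=LeaveOneOut())
print(vfold.mean(), loocv.mean())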

Which approach to use depends mostly on sample size
If the sample is large enough, split it into test and training groups.
If the sample is barely adequate for either testing or training, use leave-one-out.
In between, consider V-fold CV. This method can give more accurate estimates than leave-one-out, but reduces the size of the training set.

Summary
Microarray classification task
Classifiers: KNN, SVM, decision trees; software: Weka, LibSVM
Classifier evaluation, cross-validation

Acknowledgement
Terry Speed, Jean Yee Hwa Yang, Jane Fridlyand