Cancer classification by Regularized Least Square Classifiers

Annarita D'Addabbo a, Rosalia Maglietta a, Sabino Liuni b, Graziano Pesole b,c and Nicola Ancona a

a) Istituto di Studi sui Sistemi Intelligenti per l'Automazione, CNR, Via Amendola 122/D-I, Bari, Italy
b) Istituto di Tecnologie Biomediche, Sezione di Bari, CNR, Via Amendola 122/D, Bari, Italy
c) Dipartimento Scienze Biomolecolari e Biotecnologie, Università di Milano, Via Celoria 26, Milano, Italy

Abstract

Support Vector Machines (SVM) [1] are the state-of-the-art supervised learning technique for cancer classification. Other machine learning approaches, such as Regularized Least Squares (RLS) [2] classifiers, may be a highly suitable alternative because of their simplicity and reliability. We compared the performance of RLS classifiers with that of SVM on three benchmark data sets, also varying the number of selected genes and the gene selection strategy. We show that RLS classifiers perform comparably to SVM classifiers in terms of the leave-one-out (LOO) error. The main advantage of RLS machines is that they solve a classification problem through a linear system whose order equals the number of training examples. Moreover, RLS machines yield an exact measure of the LOO error with just one training.

Benchmark data set description

Leukemia data set [3]. 25 examples of Acute Myeloid Leukemia (AML) vs 47 examples of Acute Lymphoblastic Leukemia (ALL), divided into a training set and a test set. Each sample consists of 7129 human gene expression levels.

Colon data set [4]. 40 examples of tumor colon tissue vs 22 normal colon tissue samples. Each sample consists of 2000 human gene expression levels.

Multi-cancer data set [5]. 190 examples of cancer tissue, spanning 14 common tumor types, vs 90 normal tissue samples. Each example consists of gene expression levels.

                                          SVM   RLS
LOO error on Leukemia training set          2     2
Leukemia test error                         3     3
LOO error on Leukemia data set              1     2
LOO error on Colon data set                 8     9
LOO error on Multi-Cancer data set         88    90

RLS computes the LOO error in just one training, using all the training examples.

Gene selection strategies

Two techniques are used to rank the genes, and a nonparametric permutation test is used to determine how many genes are really important for classifying a given specimen: 999 genes in the Leukemia data set, 500 in the Colon one and 1400 in the Multi-Cancer one. The two ranking statistics are the S2N statistic and the NRFE statistic, each computed for j = 1, 2, ..., number of genes.

[Figure: Visualization of the S2N statistic (47 ALL examples, 25 AML examples). Observed T_S2N(j) distribution computed on the Leukemia data set compared to randomly permuted class distinctions.]

[Table: S2N statistic. Data sets: Leukemia, Colon, Multi-Cancer; columns for each: genes, SVM, RLS.]

[Table: NRFE statistic. Data sets: Leukemia, Colon, Multi-Cancer; columns for each: genes, SVM, RLS.]

Conclusions

RLS classifiers perform comparably to SVM classifiers on the problem of cancer classification from gene expression data, and they are a valuable alternative to SVM because they have several useful properties. RLS machines are fast and easy to implement and, more importantly, they allow the exact LOO error to be measured with a single training.

References

[1] Vapnik, V. Statistical Learning Theory, John Wiley & Sons, Inc., 1998.
[2] Tikhonov, A.N., Arsenin, V.Y. Solutions of Ill-Posed Problems, W.H. Winston, Washington D.C., 1977.
[3] Golub, T.R., Slonim, D.K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J.P., Coller, H., Loh, M.L., Downing, J.R., Caligiuri, M.A., Bloomfield, C.D., Lander, E.S. (1999) Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring, Science, 286.
[4] Alon, U., Barkai, N., Notterman, D.A., Gish, K., Ybarra, S., Mack, D., Levine, A.J. (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays, PNAS, 96.
[5] Ramaswamy, S., Tamayo, P., Rifkin, R., Mukherjee, S., Yeang, C.H., Angelo, M., Ladd, C., Reich, M., Latulippe, E., Mesirov, J.P., Poggio, T., Gerald, W., Loda, M., Lander, E.S., Golub, T.R. (2001) Multi-class cancer diagnosis using tumor gene expression signatures, PNAS, 98.
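
Illustrative sketch: exact LOO error from one RLS training

The claim that RLS gives the exact LOO error from a single training can be illustrated with a short sketch. It is not the authors' implementation. It assumes an RLS machine without a bias term, a fixed regularization parameter lam, and labels in {-1, +1}; the function names are hypothetical. Training solves the n x n system (K + lam*I) c = y, where K is the kernel matrix on the n training examples, and the prediction for example i when it is left out is y_i - c_i / [(K + lam*I)^-1]_ii.

```python
import numpy as np


def rls_fit_with_loo(K, y, lam):
    """Train an RLS classifier and return (c, loo_pred).

    K   : (n, n) kernel matrix on the training examples
    y   : (n,) labels in {-1, +1}
    lam : regularization parameter (assumed fixed, not rescaled by n)
    """
    n = K.shape[0]
    # One inversion of an n x n matrix: the order equals the number of examples.
    G_inv = np.linalg.inv(K + lam * np.eye(n))
    c = G_inv @ y
    # Closed-form leave-one-out predictions; no retraining is needed.
    loo_pred = y - c / np.diag(G_inv)
    return c, loo_pred


def loo_error_count(y, loo_pred):
    """Number of examples misclassified when each one is left out."""
    return int(np.sum(np.sign(loo_pred) != y))


if __name__ == "__main__":
    # Synthetic stand-in for expression data: few samples, many features.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 500))
    y = np.where(rng.normal(size=20) > 0, 1.0, -1.0)
    K = X @ X.T  # linear kernel (an assumption made for this sketch)
    c, loo_pred = rls_fit_with_loo(K, y, lam=1.0)
    print("LOO errors:", loo_error_count(y, loo_pred))
```

With a linear kernel the cost is governed by the number of samples rather than the number of genes, which is why this formulation suits data sets with thousands of genes and only tens of samples.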