Classification of Cancer Patients Naomi Altman Nov. 06

The Prostate Cancer Data
I downloaded 19 CEL files from GEO:
6 benign tissue samples
6 primary tumors
7 metastatic tumors
Normalized by RMA.
Used limma to compute F.p.value for differential expression.
Wrote the expression data and q-values to cancerSig.txt.
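
For reference, a minimal sketch of that preprocessing, assuming the affy and limma Bioconductor packages; the BH q-value computation and the 0.05 cut-off are my assumptions, not taken from the original analysis.
library(affy)
library(limma)
raw = ReadAffy()        # reads the 19 .cel files in the working directory
eset = rma(raw)         # RMA normalization
groups = factor(rep(c("benign","primary","meta"), c(6,6,7)))
design = model.matrix(~groups)
fit = eBayes(lmFit(exprs(eset), design))
# moderated F-test for any difference among the three groups:
tt = topTable(fit, coef=2:3, number=Inf, sort.by="none")
q.value = p.adjust(tt$P.Value, method="BH")   # assumed q-value method
keep = q.value < 0.05                         # assumed significance cut-off
sig.cancer = cbind(exprs(eset)[keep,], q.value=q.value[keep])
write.table(sig.cancer, "cancerSig.txt")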

Some preliminaries
We might want to start by clustering the samples or looking at plots in the SVD directions (for the arrays). Since svd and hclust both act on rows, we need to transpose the data matrix (and remember to eliminate the q-value column).
groups=as.factor(colnames(sig.cancer)[1:19])  # sample labels only; drop the q-value column
plot(hclust(dist(t(sig.cancer[,1:19]))))
svd.c=svd(t(sig.cancer[,1:19]))
plot(svd.c$u[,1],svd.c$u[,2], col=as.numeric(groups))
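
If you are starting from the text file written above, something like this recreates sig.cancer (the read.table call assumes cancerSig.txt was written with write.table defaults):
sig.cancer = as.matrix(read.table("cancerSig.txt"))
dim(sig.cancer)   # genes in rows; 19 sample columns plus the q-value column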

Linear and Quadratic Discriminant Analysis
The first two SVD directions do not entirely separate the samples, although the cluster analysis suggests that we ought to be able to. But cluster analysis does not give us a rule for classifying a new sample. So, let's try LDA and QDA, which can be found in the MASS library.
library(MASS)
lda.c=lda(t(sig.cancer[,1:19]),groups)

Linear and Quadratic Discriminant Analysis
#Since R runs out of memory, cut down the number of genes - e.g. 100.
gc()  #you should always do this if you run out of memory
lda.c=lda(t(sig.cancer[1:100,1:19]),groups)

Linear and Quadratic Discriminant Analysis
#If the number of genes is greater than the number of samples,
# the S matrix is singular, so LDA cannot be computed.
lda.c=lda(t(sig.cancer[1:12,1:19]),groups)
plot(lda.c,col=as.numeric(groups))
#What is special about the first 12 genes? Nothing.
dim(sig.cancer)
lda.c=lda(t(sig.cancer[sample(1:9299,12),1:19]),groups)
plot(lda.c,col=as.numeric(groups))
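
A quick way to check the fitted rule is a resubstitution confusion matrix. This is optimistic, since the same 19 samples are used for fitting and prediction:
pred=predict(lda.c)
table(truth=groups, predicted=pred$class)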

Linear and Quadratic Discriminant Analysis
#Any random set of genes in the file will separate the metastatic cancers from the others.
#Some sets of genes will also separate the benign tumors from the primary cancers.
#How can we choose a "good" set? And how many genes do we need in this set? Some ideas (the second is sketched below):
# 1. Use limma to compute the pairwise comparisons and take the 3 or 4 most significant genes from each comparison.
# 2. Look at lots of random sets and compute the misclassification rate for each (many are perfect).
# 3. Use a stepwise method like recursive partitioning.
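
A minimal sketch of the second idea; the helper name err.rate and the choice of 200 random sets of 12 genes are mine:
err.rate=function(rows) {
  fit=lda(t(sig.cancer[rows,1:19]),groups)
  mean(predict(fit)$class != groups)   # resubstitution error rate
}
sets=replicate(200, sample(1:9299,12), simplify=FALSE)
errs=sapply(sets, err.rate)
sum(errs==0)   # how many of the random sets classify perfectly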

Linear and Quadratic Discriminant Analysis
#While we are here, let's try quadratic.
qda.c=qda(t(sig.cancer[sample(1:9299,12),1:19]),groups)

Linear and Quadratic Discriminant Analysis
qda.c=qda(t(sig.cancer[1:12,1:19]),groups)
#QDA requires more data, because it needs to invert each within-group
# covariance matrix, which has rank one less than the group's sample size.
#We can try fewer predictors (genes): one less than the minimum group size.
qda.c=qda(t(sig.cancer[1:5,1:19]),groups)
predict(qda.c)
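
As with LDA, a resubstitution confusion matrix shows how the 5-gene quadratic rule does on the training samples:
table(truth=groups, predicted=predict(qda.c)$class)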

Recursive Partitioning
library(rpart)
rpart.c=rpart(groups~t(sig.cancer[,1:19]))
plot(rpart.c)
text(rpart.c)   # add split labels to the tree plot
#rpart will not split nodes that are already small;
# by default, fewer than 20 cases is considered small (minsplit=20).
rpart.c=rpart(groups~t(sig.cancer[,1:19]), minsplit=5)
plot(rpart.c)
text(rpart.c)
summary(rpart.c)

Recursive Partitioning
#Again the choice of genes is pretty arbitrary.
rpart.c=rpart(groups~t(sig.cancer[sample(1:9299,100),1:19]), minsplit=5)
plot(rpart.c)
text(rpart.c)
summary(rpart.c)
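
rpart runs 10-fold cross-validation internally; its built-in summaries help pick a tree size:
printcp(rpart.c)   # cross-validated error by tree complexity
plotcp(rpart.c)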

Prediction Analysis for Microarrays - PAM
I downloaded pamr from the same site as SAM: www-stat.stanford.edu/~tibs. The package can be installed from the Packages menu.
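
pamr is also distributed on CRAN, so in a current R session it can be installed from the command line instead:
install.packages("pamr")
library(pamr)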

PAM
#We need to format the data as pamr input.
pam.in=list(x=sig.cancer[,1:19], y=groups, genenames=rownames(sig.cancer), geneid=rownames(sig.cancer))
#We then "train" the classifier (the following slides refer to this fit as pamr.out).
pamr.out=pamr.train(pam.in)
#The centroids can be plotted at different cut-offs to see the "informative" genes.
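
Once trained, the fit can classify samples at a chosen shrinkage threshold. Here it is just applied back to the training arrays; for real use, the second argument would be a matrix of new samples with genes in the same order as pam.in$x:
pamr.predict(pamr.out, pam.in$x, threshold=8.0)   # resubstitution predictions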

PAM
pamr.plotcen(pamr.out, pam.in, threshold=10)
pamr.plotcen(pamr.out, pam.in, threshold=4)
#Cross-validation can be used to decide on a threshold.
cv.c=pamr.cv(pamr.out, pam.in)
pamr.plotcv(cv.c)
#List the genes at the appropriate threshold.
pamr.listgenes(pamr.out, pam.in, threshold=8.0)
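
pamr.confusion prints the cross-validated confusion matrix at a candidate threshold, which makes the error trade-off concrete:
pamr.confusion(cv.c, threshold=8.0)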

PAM
#Unlike their example, we have lots of genes. Let's see if we can reduce the number.
#The problem is the class that is easiest to predict - i.e. meta - so
# let's just keep a subset of the listed genes.
pam.gene=pamr.listgenes(pamr.out, pam.in, threshold=9.0)
pam.keep=pam.gene[c(1:20,40:49,75),1]
data.keep=which(pam.in$genenames %in% pam.keep)  # row indices of the kept genes
small.in=pam.in
small.in$x=pam.in$x[data.keep,]
small.in$y=pam.in$y            # y labels the 19 samples, not the genes
small.in$genenames=pam.in$genenames[data.keep]
small.in$geneid=pam.in$geneid[data.keep]
#Now retrain and cross-validate on the reduced gene set (next slide).

PAM
small.out=pamr.train(small.in)
small.cv=pamr.cv(small.out, small.in)
pamr.plotcv(small.cv)
small.cv
pamr.plotcen(small.out, small.in, threshold=5.5)
pamr.plotcen(small.out, small.in, threshold=2)
#2 errors with 25 genes
#1 error with 29 genes
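
To see which of the retained genes the reduced rule actually uses at the chosen threshold, pamr.listgenes can be applied to the small fit just as above:
pamr.listgenes(small.out, small.in, threshold=2)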