Microarray-based Disease Prognosis using Gene Annotation Signatures Michael Kovshilovsky Swapna Annavarapu SoCalBSI 2005.

Internship site: BioDiscovery, Inc. Mentor: Dr. Bruce Hoff Source of Funding: BioDiscovery, Inc.

Motivation Microarray gene-expression profiling studies aim to predict disease outcomes (e.g. cancer outcome). The goal is to improve treatment of patients based on knowledge of their gene-expression profile (molecular signature).

Lancet Paper “Prediction of cancer outcome with microarrays: a multiple random validation strategy” Findings of Stefan Michiels et al.: published gene expression microarray-based predictors of clinical outcome have been overly optimistic, and careful review shows that performance is poor and variable. –Analyzed data from the 7 largest published studies that have attempted to predict prognosis of cancer patients based on DNA microarray analysis. –Used a random sampling approach.

Goal Reproduce the Lancet paper. Compare classification based on expression levels of microarray probes with classification based on GSEA scores of biological pathways. Validate our hypothesis: –By abstracting away from the gene expression domain to that of biological properties, performance should stabilize and improve.

Phase I : Reproduce the Lancet Paper (Gene-Expression based classification)

Methodology –Data loading –Data preprocessing –Data selection –Correlating with clinical outcome –Determining the molecular signature –Classification of data

Data Loading Read Affymetrix chip expression data. Sample data:

Data Preprocessing Scaling –Identify the present, absent and marginal expression levels. –Scale the average of the fluorescent intensities of all genes to a constant target intensity of –Expression values above capped to and the ones below 100 to 1. Filtration –Eliminate the genes with low or no variance. Log transformation –log2(values)

Preprocessed Data: Before After

Data Selection Training-Validation Approach: –Training set for identifying the molecular signature. –Validation set for estimating the proportion of misclassifications. The dataset (N patients) is split by random selection into a training set (n) and a validation set (N-n), such that –Each set includes half the patients with and half without a favorable outcome.
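One random training-validation split of this kind can be sketched as follows. This is a minimal illustration, not the presenters' code: it assumes outcomes are coded 1 (favorable) and -1 (unfavorable) and that each outcome group is simply cut in half; the function name `split_by_outcome` is hypothetical.

```python
import random

def split_by_outcome(outcomes, rng):
    """Split patient indices into training and validation sets.

    Each outcome group (1 = favorable, -1 = unfavorable) is shuffled and
    cut in half, so both sets keep the same mix of outcomes (assumption).
    """
    train, validation = [], []
    for label in (1, -1):
        idx = [i for i, y in enumerate(outcomes) if y == label]
        rng.shuffle(idx)
        half = len(idx) // 2
        train.extend(idx[:half])
        validation.extend(idx[half:])
    return sorted(train), sorted(validation)

# Example: 6 favorable and 4 unfavorable patients.
outcomes = [1] * 6 + [-1] * 4
train, validation = split_by_outcome(outcomes, random.Random(0))
```

Repeating the call with different random seeds gives the many random training sets that a multiple random validation strategy relies on.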

Correlation Clinical outcome –Favorable = 1 (continuous complete remission) –Unfavorable = -1 (relapse) Correlate expression values of each gene with the clinical outcome –Pearson’s correlation coefficient Determine the molecular signature –Defined by the 50 genes most highly correlated with the outcome.
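The correlation step can be sketched in a few lines. This is an illustrative version under stated assumptions: expression data is held as a dictionary from gene name to a list of values (one per training patient), and genes are ranked by the absolute value of Pearson's coefficient, which the slide does not specify. The names `pearson` and `top_correlated_genes` are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no correlation information
    return sxy / math.sqrt(sxx * syy)

def top_correlated_genes(expr, outcomes, k=50):
    """Return the k genes whose expression correlates most strongly
    (by absolute value, an assumption) with the clinical outcome."""
    scores = {gene: pearson(values, outcomes) for gene, values in expr.items()}
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)[:k]

# Toy example with three genes and four patients.
expr = {"a": [1, 2, 3, 4], "b": [4, 3, 2, 1], "c": [1, 5, 1, 5]}
signature = top_correlated_genes(expr, [-1, -1, 1, 1], k=2)
```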

Data Classification (Nearest Centroid Prediction Rule) A new point is classified according to which centroid (favorable or unfavorable) is nearest. The data are 50-dimensional, so a PCA plot is used to display them. Principal component analysis (PCA) is a powerful tool for analysing data by identifying patterns in it. Plot labels: Unfavorable Centroid, Favorable Centroid.
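The nearest-centroid rule can be illustrated with a short sketch. It assumes Euclidean distance, which the slide does not state (a correlation-based distance would be an equally plausible choice), and the function names are hypothetical.

```python
import math

def centroid(rows):
    """Component-wise mean of a list of equal-length expression vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def nearest_centroid_predict(sample, fav_centroid, unfav_centroid):
    """Return 1 (favorable) or -1 (unfavorable), whichever centroid is
    closer in Euclidean distance (assumed metric); ties go to favorable."""
    d_fav = math.dist(sample, fav_centroid)
    d_unfav = math.dist(sample, unfav_centroid)
    return 1 if d_fav <= d_unfav else -1

# Toy 2-dimensional example; the real data would be 50-dimensional.
fav = centroid([[1, 1], [3, 3]])        # [2.0, 2.0]
unfav = centroid([[-1, -1], [-3, -3]])  # [-2.0, -2.0]
label = nearest_centroid_predict([1.5, 2.0], fav, unfav)
```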

Results (cont’d.) Each of the 500 training sets provided a different molecular signature Plot of genes that occurred most frequently in the molecular signature.

Analysis The frequency of the genes participating in defining the signature is quite low. This suggests that the molecular signature is selected almost randomly and is unstable.
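Signature stability of this kind can be measured by counting how often each gene appears across the signatures from the repeated training sets. The sketch below assumes each signature is available as a list of gene names; `signature_frequencies` is a hypothetical helper name.

```python
from collections import Counter

def signature_frequencies(signatures):
    """Count in how many signatures each gene appears.

    Each signature is de-duplicated first, so a gene is counted at most
    once per training set.
    """
    counts = Counter()
    for signature in signatures:
        counts.update(set(signature))
    return counts

# Toy example with three signatures instead of 500.
freq = signature_frequencies([["a", "b"], ["a", "c"], ["a", "b"]])
most_stable = freq.most_common(1)
```

A stable signature would show many genes with counts close to the number of training sets; low counts across the board point to the instability described above.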

Phase II Analysis of Microarray data using GSEA (Gene Set Enrichment Analysis)

Methodology –Data loading –Data preprocessing –Data selection –GSEA: determine enrichment scores –Correlating with clinical outcome –Classification of data

Preliminary steps Data loading, data preprocessing and data selection are the same as in Phase I.

GSEA Gene Set Enrichment Analysis –A microarray data analysis method that uses predefined gene sets and ranks of genes to identify significant biological changes in microarray data sets. –GSEA provides an enrichment score that measures the degree to which a gene set is enriched at the top of a rank-ordered gene list derived from the data set.

GSEA (cont’d) GSEA Inputs: –List of genes ranked according to the expression difference between two classes. –A priori defined gene sets (e.g. pathways), each consisting of members drawn from the list of genes. Ranking of genes is done using a distance metric, the signal-to-noise ratio (SNR).

Signal-to-noise ratio The signal-to-noise ratio method looks at the difference of the means in each of the classes scaled by the sum of the standard deviations: SNR = (α × √n) ÷ σ, where α (signal) is the difference in mean expression between the two classes and σ (noise) is the standard deviation.
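A per-gene signal-to-noise score can be sketched as below. Note the assumptions: this version follows the verbal description (difference of class means divided by the sum of the class standard deviations) rather than the √n form of the formula, and it uses the sample standard deviation. The function name `signal_to_noise` is hypothetical.

```python
import statistics

def signal_to_noise(class_a, class_b):
    """Difference of class means divided by the sum of the class
    standard deviations (sample standard deviation, an assumption)."""
    signal = statistics.mean(class_a) - statistics.mean(class_b)
    noise = statistics.stdev(class_a) + statistics.stdev(class_b)
    return signal / noise if noise > 0 else 0.0

# Expression of one gene in favorable vs. unfavorable patients.
snr = signal_to_noise([3.0, 5.0], [1.0, 3.0])
```

Sorting all genes by this score, from most positive to most negative, gives the ranked list that GSEA takes as input.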

Implementation Determine SNR for each microarray. Sort gene list based on SNR values. The degree of enrichment of the gene set is measured by comparing the SNR-ordered gene list with the gene set (pathway).

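Walking down the ranked gene list, a running sum is kept: if a gene is in the gene set, increment the running sum by Y; if a gene is not in the gene set, decrement the running sum by X, where X = √(G/(N-G)), Y = √((N-G)/G), G = number of genes in the set and N = size of data. Enrichment Score (ES): ES = greatest positive deviation of this running sum across all genes.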
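The running-sum calculation can be written out as a short sketch. It assumes N is the number of genes in the ranked list and G the number of gene-set members found in that list; with these step sizes the running sum returns to zero at the end of the list. The function name `enrichment_score` is hypothetical.

```python
import math

def enrichment_score(ranked_genes, gene_set):
    """Greatest positive deviation of the running sum over a ranked list.

    Step up by sqrt((N-G)/G) for a gene-set member and down by
    sqrt(G/(N-G)) otherwise.
    """
    n = len(ranked_genes)
    members = set(gene_set) & set(ranked_genes)
    g = len(members)
    if g == 0 or g == n:
        return 0.0  # step sizes are undefined; treat as no enrichment
    up = math.sqrt((n - g) / g)
    down = math.sqrt(g / (n - g))
    running, best = 0.0, 0.0
    for gene in ranked_genes:
        running += up if gene in members else -down
        best = max(best, running)
    return best

# Gene set concentrated at the top of the list gives a high score.
es_top = enrichment_score(["a", "b", "c", "d"], {"a", "b"})
# Gene set concentrated at the bottom never rises above zero.
es_bottom = enrichment_score(["a", "b", "c", "d"], {"c", "d"})
```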

Correlation & Classification Similar to Phase I –First, the top 50 pathways are selected to create favorable and unfavorable centroids. –Next, the training and validation sets are classified based on the nearest-centroid prediction rule.

Results(cont’d.) Each of the 500 training sets provided a different molecular signature Plot of pathways that occurred in over 150 of the molecular signatures.

Results Gene Expression | Gene Set Based: Average % = 97.88% | Average % = 93.77%

Results (cont’d) Gene Expression | Gene Set Based: Average % = 93.80% | Average % = 96.45%

Results (cont’d) Gene Expression | Gene Set Based: Average % = 52.91% | Average % = 75.17%

Results (cont’d) Gene Expression | Gene Set Based: Average % = 26.48% | Average % = 47.76%

Three significant pathways Iron ion homeostasis –Reduces tumor angiogenesis by protecting cells from oxidative stress Unfolded protein response, positive regulation of target gene transcription –A stress-signaling pathway in tumor cells Tryptophan catabolism –Has an antiproliferative effect on many tumor cells

Conclusion Our results have shown that: –The centroid classification based on gene expression performs poorly with the validation set. –The GSEA method does not perform any better than the gene expression method.

Future Work Analysis with a different classification approach. Using much larger data sets from different samples.

Acknowledgements Dr. Bruce Hoff Dr. Soheil Shams SoCalBSI

References 1. Stefan Michiels, Serge Koscielny, Catherine Hill. Prediction of cancer outcome with microarrays: a multiple random validation strategy. Lancet, Vol. 365, 488–92 (2005). 2. Mootha, V. K., et al. PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nature Genetics, Vol. 34, (2003). 3. sea_algorithm.doc. 4. port.pdf 5.