Eigengenes as biological signatures

Presentation transcript:

Eigengenes as biological signatures Dr. Habil Zare, PhD PI of Oncinfo Lab Assistant Professor, Department of Computer Science Texas State University 15 September 2017 Eigengenes as biological signature, Dr. Habil Zare, Oncinfo Lab Texas State University, 15 September 2017

Bioinformatics: Computational and statistical analysis of biological data (diagram labels: Biologists, Data, Genotypes / Phenotypes, Results)

A highly collaborative team (External collaborators)
- Dr. Aly Karsan, MD, Immunopathologist, British Columbia Cancer Agency
- Dr. Ron Walter, Geneticist, Texas State University
- Dr. Kavitha Venkatesan, PhD, Bioinformatician, Novartis

Outline
- Large-scale gene network analysis reveals the role of extracellular matrix pathway and homeobox genes in acute myeloid leukemia.
- Similar approaches are useful in identifying low-risk breast cancer cases.

AML (diagram by A. Rad, annotated "Cancer here") Acute myeloid leukemia (AML) is an aggressive type of blood cancer, which can cause death within months after diagnosis.

MDS (diagram by Cazzola) Myelodysplastic syndromes (MDS) are less aggressive than AML, but they can transform to AML with a probability of 30%.

Hypothesis Network analysis can reveal the biological differences between AML and MDS.

Overview of the methodology

Gene expression data

Machine learning view (figure label: Features)

Expression data
- Discovery dataset: Microarray gene expression data of 202 AML-NK and 164 MDS cases from the MILE study.
- Validation dataset: RNA-seq data of 52 AML-NK and 22 MDS cases from BCCA.

Identifying gene modules

Identifying gene modules
- We analyzed 9,166 genes that are differentially expressed in AML vs. MDS.
- We considered a module to be a set of genes that are highly correlated in AML, and identified 33 such modules.
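
The idea of a module as a set of highly correlated genes can be sketched as follows. This is a deliberately simplified illustration, not the method used in the talk: it links two genes when their Pearson correlation reaches a hypothetical cutoff of 0.9 and reports the connected groups. The gene names, expression values, and cutoff are all assumptions made for the example.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equally long expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_modules(expr, threshold=0.9):
    """Group genes whose pairwise correlation is at least `threshold`.

    expr maps a gene name to its expression values across samples.
    Genes are merged with a small union-find structure, so a module is a
    connected component of the thresholded correlation graph.
    """
    genes = list(expr)
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path compression
            g = parent[g]
        return g

    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if pearson(expr[a], expr[b]) >= threshold:
                parent[find(a)] = find(b)

    modules = {}
    for g in genes:
        modules.setdefault(find(g), []).append(g)
    return sorted(sorted(m) for m in modules.values())

# Toy data: g1/g2 rise together, g3/g4 follow a different pattern.
toy = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8],
    "g3": [4, 1, 3, 2],
    "g4": [8, 2, 6, 4],
}
print(correlation_modules(toy))
```

With thousands of genes, a dedicated package such as WGCNA (listed in the references) would be the practical choice; the sketch only shows what "highly correlated genes form a module" means operationally.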

Computing eigengenes

Principal component analysis Summarizes the information of a high dimensional dataset (say, d=100) into a few vectors (usually 2-3 principal components). http://austingwalters.com/pca-principal-component-analysis/

Computing eigengenes An eigengene summarizes a module. It is a weighted sum (linear combination) of expression of all genes in the corresponding module. We applied PCA on each module separately to compute its corresponding eigengene.
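
A minimal sketch of this step, assuming the eigengene is taken to be the first principal component of the module's centered expression matrix (the slides say only that PCA was applied per module). Power iteration is used here to keep the example dependency-free; a library PCA routine would normally replace it. The function name and toy data are hypothetical.

```python
import math
import random

def module_eigengene(rows, n_iter=200, seed=0):
    """Return (weights, means, eigengene) for one module.

    rows is a samples x genes matrix (list of lists). The weights are the
    loadings of the first principal component, found by power iteration on
    the centered data; the eigengene is the weighted sum of the centered
    expression of all genes, one value per sample.
    Assumes the module is not constant across samples.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]

    rng = random.Random(seed)
    w = [rng.gauss(0, 1) for _ in range(d)]
    for _ in range(n_iter):
        # One power-iteration step: w <- X^T (X w), then normalize.
        scores = [sum(x[j] * w[j] for j in range(d)) for x in centered]
        w = [sum(centered[i][j] * scores[i] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]

    # The sign of a principal component is arbitrary; fix it so that the
    # weights sum to a positive number.
    if sum(w) < 0:
        w = [-v for v in w]
    eigengene = [sum(x[j] * w[j] for j in range(d)) for x in centered]
    return w, means, eigengene

# Toy module: 4 samples, 3 genes that move together.
toy_module = [[1, 2, 3], [2, 4, 6], [3, 6, 9], [4, 8, 12]]
weights, means, eigengene = module_eigengene(toy_module)
print(weights)
print(eigengene)
```

The weights play the role of the per-gene coefficients in the linear combination, so genes that contribute more to the module's shared variation receive larger absolute weights.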

Computing eigengenes Eigengenes are differentially expressed in AML compared to MDS.

The Bayesian network

Bayesian network The Bayesian network shows the probabilistic dependencies between the modules and the disease type.

The decision tree

The decision tree ECM and HOXA&B eigengenes were automatically selected from the set of children of the Disease node to build a predictive model. (Figure labels: Average expression of 113 genes; Average expression of 42 genes)

Validation in an independent dataset

Qualitative validation (figure panels: MILE, BCCA) We inferred the expression of eigengenes in 52 AML and 22 MDS cases from the BCCA dataset.

Qualitative validation (figure panels: MILE, BCCA) Some of the eigengenes showed expression patterns similar to those in the MILE dataset.
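
One way to infer an eigengene in a new dataset is to reuse the per-gene weights learned on the discovery data. The slides do not spell out the procedure, so the sketch below is an assumption: it applies fixed weights to centered expression values, and the choice of which means to center with (discovery or validation) is left to the caller. All names and numbers are hypothetical.

```python
def infer_eigengene(samples, weights, means):
    """Project new samples onto an existing eigengene.

    samples: samples x genes matrix for the genes of one module, with the
             genes in the same order as `weights`.
    weights: per-gene weights learned on the discovery dataset.
    means:   per-gene values used for centering (an assumption: either the
             discovery means or the validation dataset's own means).
    """
    return [
        sum(w * (x - m) for x, w, m in zip(row, weights, means))
        for row in samples
    ]

# Hypothetical weights for a two-gene module and two validation samples.
weights = [0.6, 0.8]
means = [1.0, 1.0]
validation_samples = [[2.0, 1.0], [1.0, 2.0]]
print(infer_eigengene(validation_samples, weights, means))
```

Because the discovery and validation data come from different platforms, expression values would need to be put on a comparable scale before such a projection is meaningful.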

Quantitative validation With the same thresholds, the tree classifies cases from both datasets.
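
To make "the same thresholds" concrete, here is a sketch of a two-level threshold tree over the two selected eigengenes, applied unchanged to two datasets. The threshold values, the direction of each split, and the toy cases are hypothetical; the slides do not give them.

```python
# Hypothetical thresholds and split directions, for illustration only.
ECM_THRESHOLD = 0.0
HOX_THRESHOLD = 0.0

def classify(ecm, hox, ecm_threshold=ECM_THRESHOLD, hox_threshold=HOX_THRESHOLD):
    """Classify one case from its ECM and HOXA&B eigengene values."""
    if ecm < ecm_threshold:   # first split: ECM eigengene
        return "AML"
    if hox > hox_threshold:   # second split: HOXA&B eigengene
        return "AML"
    return "MDS"

def accuracy(cases):
    """cases: iterable of (ecm, hox, true_label) tuples."""
    cases = list(cases)
    correct = sum(classify(e, h) == label for e, h, label in cases)
    return correct / len(cases)

# Toy stand-ins for a discovery and a validation dataset.
discovery = [(-1.2, 0.3, "AML"), (0.8, -0.5, "MDS"), (0.4, 0.9, "AML")]
validation = [(-0.7, -0.1, "AML"), (1.1, -0.8, "MDS"),
              (0.2, -0.2, "MDS"), (0.5, -0.4, "AML")]
print(accuracy(discovery), accuracy(validation))
```

The point of the design is that nothing is refit on the second dataset: the tree learned once is evaluated as is.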

Quantitative validation We trained our model on the MILE microarray dataset and validated its performance on the BCCA RNA-seq dataset. Although the platforms differ, the performance is comparable, indicating the robustness of our approach.

Validation using epigenetics Among all genes in the ECM pathway, MMP9 has the highest weight in the eigengene.

Validation using epigenetics These 3 genes from the matrix metalloproteinase (MMP) family are methylated in AML, which can explain their relatively lower expression.

Validation at the protein level The expression of MMP9 protein is different in AML compared to MDS.

Robustness to noise Because an eigengene is based on the average expression of several genes, our approach is robust with respect to noise in expression profiles.

Robustness to noise Even when 30% of the entries of the expression profile are replaced with noise, the accuracy drops by only 2%.
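
A noise experiment of this kind can be sketched as below. The slides do not say how the noise was generated, so drawing replacements from a Gaussian distribution is an assumption, as are the function name and parameters.

```python
import random

def corrupt(matrix, fraction=0.3, sigma=1.0, seed=0):
    """Return a copy of `matrix` with a fraction of its entries replaced.

    A `fraction` of all cells is chosen uniformly at random without
    replacement, and each chosen cell is overwritten with a draw from a
    zero-mean Gaussian with standard deviation `sigma` (an assumed noise
    model). The input matrix is left unchanged.
    """
    rng = random.Random(seed)
    n, d = len(matrix), len(matrix[0])
    cells = [(i, j) for i in range(n) for j in range(d)]
    chosen = rng.sample(cells, round(fraction * len(cells)))
    noisy = [row[:] for row in matrix]
    for i, j in chosen:
        noisy[i][j] = rng.gauss(0.0, sigma)
    return noisy

# Toy expression matrix: 10 samples x 10 genes, all equal to 5.0.
clean = [[5.0] * 10 for _ in range(10)]
noisy = corrupt(clean, fraction=0.3)
changed = sum(a != b for r1, r2 in zip(clean, noisy) for a, b in zip(r1, r2))
print(changed, "of", 100, "entries replaced")
```

Re-running the classifier on such corrupted profiles and comparing accuracy with the clean run gives the kind of robustness figure quoted on the slide.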

Kaplan-Meier survival curve (figure source: PMCID PMC3059453) Shows the cumulative probability of survival at a given time.
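
For readers unfamiliar with the estimator, the sketch below shows how such a curve is computed: at each time with at least one observed death, the running survival probability is multiplied by (1 - deaths / number at risk). The follow-up times and event flags are toy values.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function.

    times:  follow-up time of each patient.
    events: 1 if death was observed at that time, 0 if the patient was
            censored (lost to follow-up or still alive).
    Returns a list of (time, survival probability) pairs, one per time at
    which at least one death occurred.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        same_time = [e for tt, e in data if tt == t]
        deaths = sum(same_time)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Everyone observed at time t (dead or censored) leaves the risk set.
        at_risk -= len(same_time)
        i += len(same_time)
    return curve

# Toy cohort: five patients, two of them censored.
for t, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
    print(t, round(s, 3))
```

Censored patients lower the number at risk without counting as deaths, which is why the curve drops only at observed events.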

Breast cancer risk factors
Datasets: METABRIC discovery dataset, METABRIC validation dataset, MILLER dataset.
Two modules were automatically selected:
- A cell cycle associated module with 319 genes.
- A mysterious module with 26 genes, 24 of them in 9q34.

Breast cancer risk assessment Using a similar approach, we could identify low-risk ER+ breast cancer cases with precision > 88% in 3 datasets.
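
Precision here is the fraction of cases predicted as low-risk that are truly low-risk, that is, TP / (TP + FP). A small sketch with hypothetical labels:

```python
def precision(predicted, actual, positive="low-risk"):
    """Fraction of cases predicted as `positive` that truly are `positive`."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == positive and a == positive for p, a in pairs)
    fp = sum(p == positive and a != positive for p, a in pairs)
    if tp + fp == 0:
        raise ValueError("no case was predicted as positive")
    return tp / (tp + fp)

# Toy predictions for four cases (labels are illustrative only).
predicted = ["low-risk", "low-risk", "high-risk", "low-risk"]
actual = ["low-risk", "high-risk", "high-risk", "low-risk"]
print(precision(predicted, actual))
```

High precision matters in this setting because a case wrongly labeled low-risk is the costly error.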

Acknowledgments
Oncinfo Lab members:
- Dr. Habil Zare, PhD, the PI, Computational Biologist
- Dr. Amir Foroushani, PhD, Postdoc
- Rupesh Agrahari, Grad student, Computer Science
In collaboration with British Columbia Cancer Agency:
- Dr. Aly Karsan, MD, Hematopathologist
- Rod Docking, Grad student, BCCA & UBC

Published in 2017

Alumni Oncinfo Lab Members
- Dr. Amir Foroushani, PhD, Postdoc, Computational Biologist. Now: Software Developer at Wells Fargo, California
- Rupesh Agrahari, Grad student, Computer Science. Now: Computer Scientist at Laboratory of Immunology, NIAID, NIH

Current members
- Dr. Habil Zare, PhD, the PI, Computational Biologist
- Gabriel Hurtado, Undergrad student, Computer Science
- Bryan Shaw, Grad student
- Hanie Samimi

Future work
We can apply a similar approach on fish RNA-seq data:
- Identify gene modules using all available expression data including normal samples.
- Compute the eigengenes for each module.
- Investigate which eigengenes are associated with experiment conditions like dosage or wavelength.
- Perform overrepresentation analysis on the corresponding modules to determine the most relevant biological processes.
(Figure labels: Dosage, Eigengene 5)
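
The overrepresentation analysis mentioned above is commonly done with a one-sided hypergeometric test. The sketch below assumes that formulation; the talk does not say which test would be used, and the gene counts are made up.

```python
from math import comb

def overrepresentation_pvalue(universe, annotated, module, overlap):
    """One-sided hypergeometric p-value for a module and a gene set.

    universe:  number of genes considered in total.
    annotated: number of those genes that belong to the biological process.
    module:    number of genes in the module.
    overlap:   number of module genes that belong to the process.
    Returns P(X >= overlap) when `module` genes are drawn at random
    without replacement from the universe.
    """
    upper = min(annotated, module)
    favorable = sum(
        comb(annotated, k) * comb(universe - annotated, module - k)
        for k in range(overlap, upper + 1)
    )
    return favorable / comb(universe, module)

# Toy numbers: 10 genes in total, 3 annotated, and a 3-gene module that
# contains all 3 annotated genes.
print(overrepresentation_pvalue(universe=10, annotated=3, module=3, overlap=3))
```

When many processes are tested for each module, the resulting p-values would also need a multiple-testing correction.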

References:
Cazzola, Mario. "IDH1 and IDH2 mutations in myeloid neoplasms–Novel paradigms and clinical implications." Haematologica 95.10 (2010): 1623-1627.
Haferlach, Torsten, et al. "Clinical utility of microarray-based gene expression profiling in the diagnosis and subclassification of leukemia: report from the International Microarray Innovations in Leukemia Study Group." Journal of Clinical Oncology 28.15 (2010): 2529-2537.
Langfelder, Peter, and Steve Horvath. "WGCNA: an R package for weighted correlation network analysis." BMC Bioinformatics 9.1 (2008): 1.
Curtis, Christina, et al. "The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups." Nature 486.7403 (2012): 346-352.
Miller, Lance D., et al. "An expression signature for p53 status in human breast cancer predicts mutation status, transcriptional effects, and patient survival." Proceedings of the National Academy of Sciences of the United States of America 102.38 (2005): 13550-13555.