Eigengenes as biological signatures

1 Eigengenes as biological signatures
Dr. Habil Zare, PhD, PI of Oncinfo Lab, Assistant Professor, Department of Computer Science, Texas State University. 15 September 2017. Eigengenes as biological signature, Dr. Habil Zare, Oncinfo Lab, Texas State University, 15 September 2017

2 Bioinformatics: Computational and statistical analysis of biological data
Diagram labels: Biologists; Data; Genotypes / Phenotypes; Results.

3 A highly collaborative team
(External collaborators) Dr. Aly Karsan, MD, Immunopathologist, British Columbia Cancer Agency; Dr. Ron Walter, Geneticist, Texas State University; Dr. Kavitha Venkatesan, PhD, Bioinformatician, Novartis.

4 Outline Large-scale gene network analysis reveals the role of extracellular matrix pathway and homeobox genes in acute myeloid leukemia. Similar approaches are useful in identifying low-risk breast cancer cases.

5 AML Acute myeloid leukemia (AML) is an aggressive type of blood cancer that can cause death within months of diagnosis. (Diagram by A. Rad; annotation on the diagram: "Cancer here".)

6 MDS Myelodysplastic syndromes (MDS) are less aggressive than AML, but they can transform into AML with a probability of 30%. (Diagram by Cazzola.)

7 Hypothesis Network analysis can reveal the biological differences between AML and MDS.

8 Overview of the methodology

9 Gene expression data

10 Gene expression data

11 Machine learning view Features

12 Expression data Discovery dataset: Microarray gene expression data of 202 AML-NK and 164 MDS cases from the MILE study. Validation dataset: RNA-seq data of 52 AML-NK and 22 MDS cases from BCCA.

13 Identifying gene modules

14 Identifying gene modules
- We analyzed 9,166 differentially expressed genes in AML vs. MDS.
- We considered a module as a set of highly correlated genes in AML, and identified 33 such modules.
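The idea of a module as "a set of highly correlated genes" can be illustrated with a deliberately simple sketch. It is not the procedure used in the talk (the reference list mentions WGCNA, which works differently); it only thresholds the Pearson correlation matrix and takes connected components. The function name `find_modules`, the cutoff `min_corr=0.8`, and the toy data are assumptions made for illustration.

```python
import numpy as np

def find_modules(expr, min_corr=0.8, min_size=2):
    """Group genes linked by pairwise Pearson correlation >= min_corr.

    expr: array of shape (n_samples, n_genes).
    Returns a sorted list of modules, each a sorted list of gene (column) indices.
    NOTE: illustrative only; min_corr is a hypothetical cutoff, not from the talk.
    """
    corr = np.corrcoef(expr, rowvar=False)   # genes x genes correlation matrix
    adjacent = corr >= min_corr
    unvisited = set(range(expr.shape[1]))
    modules = []
    while unvisited:
        seed = unvisited.pop()
        stack, component = [seed], [seed]
        while stack:                          # depth-first walk over linked genes
            g = stack.pop()
            for h in np.flatnonzero(adjacent[g]):
                h = int(h)
                if h in unvisited:
                    unvisited.discard(h)
                    stack.append(h)
                    component.append(h)
        if len(component) >= min_size:
            modules.append(sorted(component))
    return sorted(modules)

# Toy data: genes 0-2 share one hidden signal, genes 3-4 another, gene 5 is noise.
rng = np.random.default_rng(0)
n = 60
a, b = rng.normal(size=n), rng.normal(size=n)
expr = np.column_stack(
    [a + 0.1 * rng.normal(size=n) for _ in range(3)]
    + [b + 0.1 * rng.normal(size=n) for _ in range(2)]
    + [rng.normal(size=n)]
)
modules = find_modules(expr)
```

On this toy matrix the two planted groups come back as separate modules and the unrelated gene is left unassigned.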

15 Computing eigengenes

16 Principal component analysis
Summarizes the information of a high dimensional dataset (say, d=100) into a few vectors (usually 2-3 principal components).
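As a rough illustration of this summary step, the sketch below reduces a synthetic dataset with d=100 features to 3 principal components using a singular value decomposition. The function name `pca` and the synthetic data are assumptions for the example, not material from the talk.

```python
import numpy as np

def pca(data, n_components=3):
    """Return (scores, components, explained_ratio) for data of shape (n_samples, d)."""
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    scores = centered @ components.T          # coordinates of each sample
    explained_ratio = (s ** 2) / np.sum(s ** 2)
    return scores, components, explained_ratio[:n_components]

# Synthetic example: 200 samples, d=100 features driven by 3 hidden factors.
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 100))
data = latent @ mixing + 0.05 * rng.normal(size=(200, 100))

scores, components, ratio = pca(data, n_components=3)
```

Because the toy data were built from three factors, the three retained components carry almost all of the variance; real expression data will rarely be this clean.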

18 Computing eigengenes An eigengene summarizes a module. It is a weighted sum (linear combination) of expression of all genes in the corresponding module. We applied PCA on each module separately to compute its corresponding eigengene.
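A minimal sketch of this computation, assuming the eigengene is taken as the first principal component of the module's centered expression matrix. The function name `module_eigengene`, the sign convention, and the toy module are assumptions for illustration; the talk does not give implementation details.

```python
import numpy as np

def module_eigengene(module_expr):
    """First principal component of one module.

    module_expr: array of shape (n_samples, n_genes_in_module).
    Returns (eigengene, weights, center): one eigengene value per sample,
    one weight per gene, and the per-gene means used for centering.
    """
    center = module_expr.mean(axis=0)
    centered = module_expr - center
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    weights = vt[0]
    # PCA leaves the sign arbitrary. Orient it so the eigengene rises with the
    # module's average expression (a convention chosen for this sketch).
    if np.corrcoef(centered @ weights, centered.mean(axis=1))[0, 1] < 0:
        weights = -weights
    return centered @ weights, weights, center

# Toy module: 5 genes following one shared signal with different strengths.
rng = np.random.default_rng(2)
signal = rng.normal(size=50)
loadings = np.array([1.0, 0.8, 1.2, 0.9, 1.1])
module_expr = 5.0 + np.outer(signal, loadings) + 0.1 * rng.normal(size=(50, 5))

eigengene, weights, center = module_eigengene(module_expr)
```

The returned `weights` make the "weighted sum" explicit: each sample's eigengene value is the dot product of its centered expression with the weight vector.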

19 Computing eigengenes Eigengenes are differentially expressed in AML compared to MDS.

20 The Bayesian network

21 Bayesian network The Bayesian network shows the probabilistic dependencies between the modules and the disease type.

22 The decision tree

23 The decision tree ECM and HOXA&B eigengenes were automatically selected from the set of children of the Disease node to build a predictive model. (Figure labels: Average expression of 113 genes; Average expression of 42 genes.)

24 Validation in an independent dataset

25 Qualitative validation
We inferred the expression of eigengenes in 52 AML and 22 MDS cases from the BCCA dataset. (Figure panels: MILE; BCCA.)

26 Qualitative validation
Some of the eigengenes showed expression patterns similar to those in the MILE dataset. (Figure panels: MILE; BCCA.)

27 Quantitative validation
With the same thresholds, the tree classifies cases from both datasets.
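To make "the same thresholds" concrete, here is a hedged sketch of a two-level tree over two eigengene values, applied unchanged to two datasets. The talk names the ECM and HOXA&B eigengenes as the tree's inputs, but the threshold values, the branch directions, the class `TwoLevelTree`, and every data point below are hypothetical placeholders, not results from the study.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TwoLevelTree:
    """Hypothetical tree: thresholds and branch directions are placeholders."""
    ecm_threshold: float
    hox_threshold: float

    def predict(self, ecm, hox):
        # Assumed direction for illustration only: low ECM eigengene -> AML.
        if ecm < self.ecm_threshold:
            return "AML"
        return "AML" if hox >= self.hox_threshold else "MDS"

def accuracy(tree, cases):
    """cases: iterable of (ecm_eigengene, hox_eigengene, true_label)."""
    cases = list(cases)
    hits = sum(tree.predict(ecm, hox) == label for ecm, hox, label in cases)
    return hits / len(cases)

# The thresholds are fixed once and reused on both datasets.
tree = TwoLevelTree(ecm_threshold=0.0, hox_threshold=0.5)

# Made-up eigengene values standing in for a discovery and a validation set.
discovery = [(-1.2, 0.1, "AML"), (0.8, 0.9, "AML"), (0.7, -0.3, "MDS"), (1.1, 0.2, "MDS")]
validation = [(-0.4, 0.0, "AML"), (0.5, 0.1, "MDS"), (0.9, 0.7, "MDS")]

discovery_accuracy = accuracy(tree, discovery)
validation_accuracy = accuracy(tree, validation)
```

Keeping the thresholds in a frozen object makes it hard to refit them on the validation data by accident, which is the point of an independent validation.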

28 Quantitative validation
We trained our model on the MILE microarray dataset and validated its performance on the BCCA RNA-seq dataset. Although the platforms differ, performance is comparable, indicating the robustness of our approach.

29 Validation using epigenetics
Among all genes in the ECM pathway, MMP9 has the highest weight in the eigengene.

30 Validation using epigenetics
These 3 genes from the matrix metalloproteinase (MMP) family are methylated in AML, which can explain their relatively lower expression.

31 Validation at the protein level
The expression of MMP9 protein is different in AML compared to MDS.

32 Robustness to noise Because an eigengene is based on the average expression of several genes, our approach is robust with respect to noise in expression profiles.

33 Robustness to noise Even when 30% of the entries of the expression profile are replaced with noise, the accuracy drops by only 2%.
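The kind of perturbation described here can be sketched as follows. This toy experiment does not reproduce the 2% figure from the talk; it only shows, on synthetic data, that a per-sample average over many genes changes far less than a single gene when 30% of the entries are overwritten. The function name `replace_with_noise`, the Gaussian noise model, and the use of a plain mean instead of an eigengene are assumptions.

```python
import numpy as np

def replace_with_noise(expr, fraction, rng):
    """Return a copy of expr with `fraction` of its entries overwritten by
    Gaussian noise matched to the overall mean and standard deviation of expr."""
    noisy = expr.copy()
    n_replace = int(round(fraction * expr.size))
    idx = rng.choice(expr.size, size=n_replace, replace=False)
    noisy.flat[idx] = rng.normal(expr.mean(), expr.std(), size=n_replace)
    return noisy

# Synthetic module: 100 samples, 50 genes that follow one shared signal.
rng = np.random.default_rng(3)
signal = rng.normal(size=100)
expr = signal[:, None] + 0.5 * rng.normal(size=(100, 50))

noisy = replace_with_noise(expr, fraction=0.3, rng=rng)

# Agreement between clean and corrupted data, for the module average
# and for one individual gene.
summary_corr = np.corrcoef(expr.mean(axis=1), noisy.mean(axis=1))[0, 1]
single_gene_corr = np.corrcoef(expr[:, 0], noisy[:, 0])[0, 1]
```

Comparing `summary_corr` with `single_gene_corr` shows the averaging effect the slide refers to: errors in individual entries largely cancel in the summary.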

34 Kaplan-Meier survival curve
PMCID: PMC Shows the cumulative probability of survival at a given time.
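For readers unfamiliar with how such a curve is built, the sketch below implements the standard Kaplan-Meier product-limit estimator in plain Python. The function name `kaplan_meier` and the five toy patients are illustrative assumptions; the talk's own survival data are not reproduced here.

```python
def kaplan_meier(durations, observed):
    """Product-limit estimate of the survival function S(t).

    durations: follow-up time for each patient.
    observed: True if the event (death) was seen, False if the patient was censored.
    Returns a list of (time, survival_probability) pairs, one per event time.
    """
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after time t.
        at_risk -= leaving
    return curve

# Toy cohort: follow-up in months; False marks a censored patient.
durations = [2, 3, 3, 5, 8]
observed = [True, True, False, True, False]
curve = kaplan_meier(durations, observed)
```

For this toy cohort the estimate steps down to 0.8 at month 2, 0.6 at month 3, and 0.3 at month 5; the censored patients shrink the risk set without causing a step.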

35 Kaplan-Meier survival curve

36 Breast cancer risk factors
Two modules were automatically selected: a cell cycle associated module with 319 genes, and a mysterious module with 26 genes, 24 of them in 9q34. (Figure panels: METABRIC discovery dataset; METABRIC validation dataset; MILLER dataset.)

37 Breast cancer risk assessment
Using a similar approach, we could identify low-risk ER+ breast cancer cases with precision > 88% in 3 datasets.
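Precision here means the fraction of cases called low-risk that really are low-risk, TP / (TP + FP). A small sketch with made-up labels follows; the function name `precision` and the toy vectors are assumptions, and the talk does not state how "truly low-risk" was defined.

```python
def precision(predicted_positive, actual_positive):
    """Fraction of predicted positives that are truly positive: TP / (TP + FP)."""
    pairs = list(zip(predicted_positive, actual_positive))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    if tp + fp == 0:
        raise ValueError("no cases were predicted positive")
    return tp / (tp + fp)

# Toy labels (not from the talk): 9 cases called low-risk, 8 of them correctly.
predicted_low_risk = [True] * 9 + [False] * 3
actually_low_risk = [True] * 8 + [False] + [True, False, False]
low_risk_precision = precision(predicted_low_risk, actually_low_risk)
```

Note that precision ignores low-risk cases the model misses, so a high value says the low-risk calls are trustworthy, not that all low-risk patients were found.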

38 Acknowledgments
Oncinfo Lab Members: Dr. Habil Zare, PhD (The PI, Computational Biologist); Dr. Amir Forpushani, PhD (Postdoc); Rupesh Agrihari (Grad student, Computer Science).
In collaboration with British Columbia Cancer Agency: Dr. Aly Karsan, MD (Hematopathologist); Rod Docking (Grad student, BCCA & UBC).

39 Published in 2017

41 Alumni Oncinfo Lab Members: Dr. Amir Forpushani, PhD (Postdoc, Computational Biologist); Rupesh Agrihari (Grad student, Computer Science). Now: Software Developer at Wells Fargo, California. Now: Computer Scientist at Laboratory of Immunology, NIAID, NIH.

46 Current members: Dr. Habil Zare, PhD (The PI, Computational Biologist); Gabriel Hurtado (Undergrad student, Computer Science); Bryan Shaw (Grad student); Hanie Samimi.

47 Future work We can apply a similar approach to fish RNA-seq data.
- Identify gene modules using all available expression data, including normal samples.
- Compute the eigengenes for each module.
- Investigate which eigengenes are associated with experiment conditions like dosage or wavelength.
- Perform overrepresentation analysis on the corresponding modules to determine the most relevant biological processes.
(Plot axes: Dosage; Eigengene 5.)

48 References:
Cazzola, Mario. "IDH1 and IDH2 mutations in myeloid neoplasms–Novel paradigms and clinical implications." Haematologica (2010):
Haferlach, Torsten, et al. "Clinical utility of microarray-based gene expression profiling in the diagnosis and subclassification of leukemia: report from the International Microarray Innovations in Leukemia Study Group." Journal of Clinical Oncology (2010):
Langfelder, Peter, and Steve Horvath. "WGCNA: an R package for weighted correlation network analysis." BMC bioinformatics 9.1 (2008): 1.
Curtis, Christina, et al. "The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups." Nature (2012):
Miller, Lance D., et al. "An expression signature for p53 status in human breast cancer predicts mutation status, transcriptional effects, and patient survival." Proceedings of the National Academy of Sciences of the United States of America (2005):

