Consensus modules: modules present across multiple data sets
Peter Langfelder and Steve Horvath
Eigengene networks for studying the relationships between co-expression modules. BMC Systems Biology 2007, 1:54

Why consensus modules?
Given several independent data sets, find modules that are present in all (or a specified majority) of the data sets. Rationale: find co-expression patterns common to multiple studied conditions, and find common, robustly defined modules across several independent data sets (e.g., from GEO) that study the same conditions.

Finding consensus modules
Modules group together densely interconnected genes. Consensus modules group together genes that are densely connected in all conditions. Our solution: find the consensus gene-gene similarity and use it with clustering to find modules.

Finding consensus modules
[Diagram: Network 1, Network 2]
Calibrate the input networks to make them comparable.

Finding consensus modules
[Diagram: Consensus = pmin(Network 1, Network 2)]
Calibrate the input networks to make them comparable. For 2-3 sets: take the component-wise minimum, pmin().
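The component-wise minimum can be illustrated with a minimal sketch. It is written in plain Python rather than with the R function pmin() named on the slide, and the two toy similarity matrices are made-up values that are assumed to be calibrated already.

```python
def consensus_min(similarities):
    """Component-wise minimum across equally sized similarity matrices."""
    n = len(similarities[0])
    return [
        [min(sim[i][j] for sim in similarities) for j in range(n)]
        for i in range(n)
    ]


# Toy similarity matrices for three genes in two data sets
# (made-up numbers, assumed to be calibrated already).
network1 = [[1.0, 0.8, 0.1],
            [0.8, 1.0, 0.3],
            [0.1, 0.3, 1.0]]
network2 = [[1.0, 0.6, 0.4],
            [0.6, 1.0, 0.2],
            [0.4, 0.2, 1.0]]

# A gene pair is strongly connected in the consensus only if it is
# strongly connected in every input network.
consensus = consensus_min([network1, network2])
```

Taking the minimum is the strictest choice: a single data set with a weak connection is enough to make the consensus connection weak.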

Finding consensus modules
[Diagram: Consensus = pquantile(Network 1, Network 2, Network 3, Network 4, ...)]
Calibrate the input networks to make them comparable. For 4 sets or more: take a suitable component-wise quantile (for example, the quartile), pquantile().
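A component-wise quantile can be sketched in the same way. This is a plain-Python illustration, not the pquantile() function named on the slide. It assumes the lower quartile (probability 0.25) and linear interpolation between order statistics, neither of which the slide specifies. The toy matrices are made-up values.

```python
def quantile(values, prob):
    """Quantile with linear interpolation between order statistics
    (the interpolation rule is an assumption)."""
    ordered = sorted(values)
    position = prob * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def consensus_quantile(similarities, prob):
    """Component-wise quantile across equally sized similarity matrices."""
    n = len(similarities[0])
    return [
        [quantile([sim[i][j] for sim in similarities], prob) for j in range(n)]
        for i in range(n)
    ]


# Four toy two-gene networks (made-up, assumed calibrated); only the
# off-diagonal similarity differs between data sets.
networks = [
    [[1.0, 0.9], [0.9, 1.0]],
    [[1.0, 0.5], [0.5, 1.0]],
    [[1.0, 0.7], [0.7, 1.0]],
    [[1.0, 0.1], [0.1, 1.0]],
]

# Lower quartile: tolerates one weak data set better than the minimum does.
consensus = consensus_quantile(networks, 0.25)
```

With probability 0 the function reduces to the component-wise minimum, so the quantile can be read as a relaxed version of the rule used for 2-3 sets.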

Finding consensus modules
[Diagram: Consensus = min(Network 1, Network 2)]
Consensus modules are defined by clustering the consensus similarity.
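The clustering step might look roughly like the sketch below. It is a simplified stand-in, not the procedure used by the R implementation: it assumes average-linkage hierarchical clustering on the dissimilarity 1 - similarity and a fixed cut height, and the function name cluster_modules and the toy matrix are hypothetical.

```python
def cluster_modules(similarity, cut_height):
    """Average-linkage clustering on 1 - similarity, cut at a fixed height.

    Returns one module label (1, 2, ...) per gene.
    """
    n = len(similarity)
    clusters = [[i] for i in range(n)]

    def average_distance(a, b):
        total = sum(1.0 - similarity[i][j] for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        distance, x, y = best
        if distance > cut_height:
            break  # remaining clusters are too dissimilar to merge
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)

    labels = [0] * n
    ordered = sorted(sorted(c) for c in clusters)
    for label, members in enumerate(ordered, start=1):
        for gene in members:
            labels[gene] = label
    return labels


# Toy consensus similarity for five genes (made-up values):
# genes 0-2 are tightly connected, as are genes 3-4.
consensus = [
    [1.0, 0.8, 0.7, 0.1, 0.1],
    [0.8, 1.0, 0.75, 0.1, 0.1],
    [0.7, 0.75, 1.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.1, 0.9, 1.0],
]

labels = cluster_modules(consensus, cut_height=0.5)
```

Because the input is the consensus similarity, two genes end up in the same module only when they are close in every data set that contributed to it.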

R implementation: blockwiseConsensusModules
Input: expression data in "multi-set" format; options for splitting the data into smaller blocks if there are too many genes to be handled in one block ("blockwise"); network construction options for constructing the individual networks; network calibration options; consensus quantile.

R implementation: blockwiseConsensusModules
Output: consensus module labels; gene clustering tree (or trees, if the data were split into blocks); module eigengenes; other diagnostic output.
Introductory tutorial: Consensus analysis of female and male liver expression data (Tutorial II) at labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/Tutorials

Consensus modules vs. module preservation statistics
Consensus modules are by construction present (i.e., preserved) in all (or most) input data sets. If a module identified in a reference data set is strongly preserved in the test data set(s), it would also be a consensus module among the reference and test sets. Consensus module construction treats all data sets the same; module preservation statistics require a reference data set and one or more test data sets. Consensus modules are best suited to answer a different set of questions than module preservation statistics.

Application 1: Consensus modules across multiple lung cancer data sets

Eight publicly available lung cancer sets
4 independent data sets described in Shedden et al, Gene expression-based survival prediction in lung adenocarcinoma: a multi-site, blinded validation study. Nat Med Aug;14(8): Epub 2008 Jul 20. (Affy U133A)
Bild et al, Oncogenic pathway signatures in human cancers as a guide to targeted therapies. Nature Jan 19;439(7074):353-7 (Affy U133plus2)
Tomida et al, Relapse-related molecular signature in lung adenocarcinomas identifies patients with dismal prognosis. J Clin Oncol 2009 Jun 10;27(17): (Agilent )
Takeuchi et al, Expression profile-defined classification of lung adenocarcinoma shows close relationship with underlying major genetic changes and clinicopathologic behaviors. J Clin Oncol Apr 10;24(11): (Agilent 21.6K custom array)
Roepman et al, An immune response enriched 72-gene prognostic profile for early-stage non-small-cell lung cancer. Clin Cancer Res Jan 1;15(1): (Agilent )

Eight publicly available lung cancer sets
Concordance between the sets is poor, and it is difficult to find genes consistently related to survival time. For consistency, the analysis is restricted to adenocarcinoma. Since we have 8 sets, we use the quartile instead of the minimum in consensus network construction.

Consensus modules across 8 data sets
[Figure. Red: upregulated in patients with short survival time. Blue: downregulated in patients with short survival time.]

Association of consensus modules with survival deviance
The cell cycle module shows a weak but consistent association with survival time (meta-analysis p-value = 6e-7, highly significant). This reflects the known fact that fast-proliferating cancers indicate a poor prognosis for the patient.
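How several weak but consistent associations can combine into a highly significant result can be sketched as follows. The slide does not say which meta-analysis method was used; this sketch assumes Stouffer's equal-weight Z method on one-sided p-values, and the eight input p-values are made-up toy numbers, not results from the lung cancer data.

```python
from math import sqrt
from statistics import NormalDist


def stouffer_p_value(p_values):
    """Combine one-sided p-values with Stouffer's equal-weight Z method.

    Each p-value must lie strictly between 0 and 1.
    """
    normal = NormalDist()
    # Convert each p-value to a Z score (larger Z = stronger evidence).
    z_scores = [normal.inv_cdf(1.0 - p) for p in p_values]
    combined_z = sum(z_scores) / sqrt(len(z_scores))
    return 1.0 - normal.cdf(combined_z)


# Hypothetical input: eight data sets, none individually significant
# at the 0.05 level, but all pointing in the same direction.
weak_but_consistent = [0.08] * 8
combined = stouffer_p_value(weak_but_consistent)
```

In this toy case the combined p-value is far smaller than any single input, which is the pattern the slide describes: weak in each set, but consistent across sets.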

Using consensus modules to study genes related to a clinical trait
Challenge: how to identify functional categories that associate with a trait (survival time). One approach is to use meta-analysis to select genes related to survival time across all data sets, then study their enrichment. Alternatively, one can study the enrichment of the genes with the highest connectivity in the survival-associated module; this yields much higher enrichment. Consensus module analysis can lead to better biological insights when pooling several gene expression data sets.

Application 2: Consensus modules across 4 brain regions in Huntington's Disease patients and controls

Data
Huntington's disease primarily affects motor skills. The biggest changes are observed in the Caudate Nucleus (CN), with much smaller changes in Cortex and Cerebellum (CB). The disease causes death of neurons and an increase in astrocytes/oligodendrocytes (inflammatory response). Hodges et al (2006) measured expression in Caudate Nucleus, Motor Cortex, Association Cortex, and Cerebellum in HD patients and controls. Here: consensus analysis of the 4 data sets.
Hodges A et al, Regional and cellular gene expression changes in human Huntington's disease brain. Hum Mol Genet Mar 15;15(6):965-77

Consensus network and modules
Two large branches correspond to neurons and astro/oligodendrocytes. Red: underexpressed in HD; blue: overexpressed in HD.

Significance of modules for disease status
Message: the disease effect is strongest in Caudate Nucleus and Motor Cortex, consistent with HD manifestations.

Significance of modules for disease status
The modules relate to disease status similarly across all tissues.

Study meta-networks built from eigengenes of consensus modules
Recall: modules are represented by their eigengenes (singular vectors obtained from SVD). Each consensus module has one eigengene in each data set. In each data set, the correlation among module eigengenes gives a bird's-eye view of the entire gene network: a correlation matrix of 12k genes is reduced to a correlation matrix of 10 eigengenes. The correlation of eigengenes reflects how the underlying pathways, processes, cell types, etc. work together. It may be interesting to study how eigengene correlation changes between data sets.
Langfelder and Horvath, Eigengene networks for studying the relationships between co-expression modules. BMC Systems Biology 2007, 1:54
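A module eigengene can be approximated without a full SVD, as in the minimal sketch below. It is an illustration under stated assumptions, not the R implementation: each gene profile is centered and scaled first, the leading singular vector is found by power iteration, and the function names and toy expression values are hypothetical.

```python
from math import sqrt


def standardize(profile):
    """Center a gene profile and scale it to unit length.

    Assumes the profile is not constant across samples.
    """
    mean = sum(profile) / len(profile)
    centered = [v - mean for v in profile]
    norm = sqrt(sum(v * v for v in centered))
    return [v / norm for v in centered]


def module_eigengene(expression, iterations=200):
    """Leading singular vector (over samples) of a genes-by-samples module.

    Power iteration on X^T X, where X holds the standardized gene profiles.
    Starting from the average profile keeps the eigengene positively
    aligned with average expression (assumes that average is not all zero).
    """
    rows = [standardize(gene) for gene in expression]
    n_genes, n_samples = len(rows), len(rows[0])
    vector = [sum(row[s] for row in rows) / n_genes for s in range(n_samples)]
    for _ in range(iterations):
        scores = [sum(row[s] * vector[s] for s in range(n_samples)) for row in rows]
        vector = [sum(scores[g] * rows[g][s] for g in range(n_genes))
                  for s in range(n_samples)]
        norm = sqrt(sum(v * v for v in vector))
        vector = [v / norm for v in vector]
    return vector


# Toy module: three genes measured in five samples (made-up values).
# The genes are exact linear functions of one another, so the eigengene
# must equal their shared standardized profile.
module = [
    [1.0, 2.0, 3.0, 4.0, 6.0],
    [2.0, 4.0, 6.0, 8.0, 12.0],
    [1.5, 2.0, 2.5, 3.0, 4.0],
]
eigengene = module_eigengene(module)
```

The result is one value per sample, so a module of any size is summarized by a single profile that can be correlated with other modules' eigengenes.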

Preservation of eigengene networks between brain regions
Eigengene network: defined as a signed network with power β=1. Preservation network: measures how much eigengene correlation varies among data sets. Mean preservation: measures overall preservation of eigengene networks among data sets.
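The three quantities can be sketched as below. The formulas are not reproduced on the slide, so the sketch assumes the usual signed-network adjacency with β=1, (1 + cor)/2, a preservation value of 1 minus the absolute adjacency difference, and a mean taken over off-diagonal entries. The eigengene values are made-up toy numbers.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    covariance = sum(a * b for a, b in zip(dx, dy))
    return covariance / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def signed_adjacency(eigengenes):
    """Assumed signed eigengene network with power 1: (1 + cor) / 2."""
    k = len(eigengenes)
    return [[(1.0 + pearson(eigengenes[i], eigengenes[j])) / 2.0
             for j in range(k)] for i in range(k)]


def preservation(adjacency1, adjacency2):
    """Assumed preservation network: 1 - |a1 - a2| for each module pair."""
    k = len(adjacency1)
    return [[1.0 - abs(adjacency1[i][j] - adjacency2[i][j])
             for j in range(k)] for i in range(k)]


def mean_preservation(preserv):
    """Average preservation over all off-diagonal module pairs."""
    k = len(preserv)
    values = [preserv[i][j] for i in range(k) for j in range(k) if i != j]
    return sum(values) / len(values)


# Toy eigengenes of three consensus modules in two data sets (made-up).
region_a = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]]
region_b = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0], [4.0, 3.0, 2.0, 1.0]]

preserv = preservation(signed_adjacency(region_a), signed_adjacency(region_b))
d_value = mean_preservation(preserv)
```

Under these assumptions a mean preservation of 1 means the eigengene correlations are identical in the two data sets, and lower values indicate that module relationships differ between them.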

Preservation of eigengene networks between CN and Association CTX
Preservation is relatively high: D=0.88. Preservation of the networks among neuronal modules is lower than that of the networks among inflammation-related modules. We did not discover a cure for HD, but we certainly have "food for thought" for biologists.

What have we learned?
Cancer application: consensus analysis identifies a module consistently related to survival time. The module provides a cleaner biological interpretation than genes identified using standard meta-analysis.
Huntington's disease application: consensus module analysis shows that HD affects different brain regions in a broadly similar manner, but also shows differences in the way the regions are affected.
Consensus eigengene network analysis provides a way to study commonalities and differences in the network organization of gene expression or other genomic data.

References
Consensus modules and eigengene networks: Langfelder P, Horvath S (2007) Eigengene networks for studying the relationships between co-expression modules. BMC Systems Biology 1:54 labs.genetics.ucla.edu/horvath/htdocs/CoexpressionNetwork/EigengeneNetwork/
Cancer application of consensus modules: Langfelder P, Mischel PS, Horvath S (2013) When Is Hub Gene Selection Better than Standard Meta-Analysis? PLoS ONE 8(4): e labs.genetics.ucla.edu/horvath/CoexpressionNetwork/MetaAnalysis/