Signed weighted gene co-expression network analysis of transcriptional regulation in murine embryonic stem cells ES cell culture Self- renewing Ecto- derm.

Slides:



Advertisements
Similar presentations
Gene Correlation Networks
Advertisements

Using genetic markers to orient the edges in quantitative trait networks: the NEO software Steve Horvath dissertation work of Jason Aten Aten JE, Fuller.
Functional Organization of the Transcriptome in Human Brain Michael C. Oldham Laboratory of Daniel H. Geschwind, UCLA BIOCOMP ‘08, Las Vegas, NV July 15,
Andy Yip, Steve Horvath Depts Human Genetics and Biostatistics, University of California, Los Angeles The Generalized Topological.
Bi-correlation clustering algorithm for determining a set of co- regulated genes BIOINFORMATICS vol. 25 no Anindya Bhattacharya and Rajat K. De.
Extraction and comparison of gene expression patterns from 2D RNA in situ hybridization images BIOINFORMATICS Gene expression Vol. 26, no. 6, 2010, pages.
C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E Lecture 9 Clustering Algorithms Bioinformatics Data Analysis and Tools.
27803::Systems Biology1CBS, Department of Systems Biology Schedule for the Afternoon 13:00 – 13:30ChIP-chip lecture 13:30 – 14:30Exercise 14:30 – 14:45Break.
Weighted Gene Co-Expression Network Analysis of Multiple Independent Lung Cancer Data Sets Steve Horvath University of California, Los Angeles.
Schedule for the Afternoon 13:00 – 13:30ChIP-chip lecture 13:30 – 14:30Exercise 14:30 – 14:45Break 14:45 – 15:15Regulatory pathways lecture 15:15 – 15:45Exercise.
Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.
Steve Horvath University of California, Los Angeles
Steve Horvath University of California, Los Angeles
Steve Horvath, Andy Yip Depts Human Genetics and Biostatistics, University of California, Los Angeles The Generalized Topological.
27803::Systems Biology1CBS, Department of Systems Biology Schedule for the Afternoon 13:00 – 13:30ChIP-chip lecture 13:30 – 14:30Exercise 14:30 – 14:45Break.
Is Forkhead Box N1 (FOXN1) significant in both men and women diagnosed with Chronic Fatigue Syndrome? Charlyn Suarez.
Fuzzy K means.
Is my network module preserved and reproducible? PloS Comp Biol. 7(1): e Steve Horvath Peter Langfelder University of California, Los Angeles.
Consensus eigengene networks: Studying relationships between gene co-expression modules across networks Peter Langfelder Dept. of Human Genetics, UC Los.
Empirical evaluation of prediction- and correlation network methods applied to genomic data Steve Horvath University of California, Los Angeles.
Gene Set Enrichment Analysis Petri Törönen petri(DOT)toronen(AT)helsinki.fi.
Supplemental Figure 1. Relationship between peaksize and signal values in the NCOR1 data set. Peaks with a signal value lower than 2 (red line) were discarded.
Systematic Analysis of Interactome: A New Trend in Bioinformatics KOCSEA Technical Symposium 2010 Young-Rae Cho, Ph.D. Assistant Professor Department of.
Ai Li and Steve Horvath Depts Human Genetics and Biostatistics, University of California, Los Angeles Generalizations of.
An Overview of Weighted Gene Co-Expression Network Analysis
MATISSE - Modular Analysis for Topology of Interactions and Similarity SEts Igor Ulitsky and Ron Shamir Identification.
“An Extension of Weighted Gene Co-Expression Network Analysis to Include Signed Interactions” Michael Mason Department of Statistics, UCLA.
A Geometric Interpretation of Gene Co-Expression Network Analysis Steve Horvath, Jun Dong.
A systems biology approach to the identification and analysis of transcriptional regulatory networks in osteocytes Angela K. Dean, Stephen E. Harris, Jianhua.
Bioinformatics Dealing with expression data Kristel Van Steen, PhD, ScD Université de Liege - Institut Montefiore
DNA microarray technology allows an individual to rapidly and quantitatively measure the expression levels of thousands of genes in a biological sample.
Networks and Interactions Boo Virk v1.0.
Steve Horvath Co-authors: Zhang Y, Langfelder P, Kahn RS, Boks MPM, van Eijk K, van den Berg LH, Ophoff RA Aging effects on DNA methylation modules in.
Expression Modules Brian S. Yandell (with slides from Steve Horvath, UCLA, and Mark Keller, UW-Madison)
Reconstructing gene networks Analysing the properties of gene networks Gene Networks Using gene expression data to reconstruct gene networks.
Epigenetic Analysis BIOS Statistics for Systems Biology Spring 2008.
Unraveling condition specific gene transcriptional regulatory networks in Saccharomyces cerevisiae Speaker: Chunhui Cai.
Differential analysis of Eigengene Networks: Finding And Analyzing Shared Modules Across Multiple Microarray Datasets Peter Langfelder and Steve Horvath.
Understanding Network Concepts in Modules Dong J, Horvath S (2007) BMC Systems Biology 2007, 1:24.
While gene expression data is widely available describing mRNA levels in different cancer cells lines, the molecular regulatory mechanisms responsible.
Getting the story – biological model based on microarray data Once the differentially expressed genes are identified (sometimes hundreds of them), we need.
Case Study: Characterizing Diseased States from Expression/Regulation Data Tuck et al., BMC Bioinformatics, 2006.
Consensus modules: modules present across multiple data sets Peter Langfelder and Steve Horvath Eigengene networks for studying the relationships between.
Computational Biology Group. Class prediction of tumor samples Supervised Clustering Detection of Subgroups in a Class.
Eigengenes as biological signatures Dr. Habil Zare, PhD PI of Oncinfo Lab Assistant Professor, Department of Computer Science Texas State University 5.
Outline Molecular Cell Biology Assessment Review from last lecture Role of nucleoporins in transcription Activators and Repressors Epigenetic mechanisms.
NCode TM miRNA Analysis Platform Identifies Differentially Expressed Novel miRNAs in Adenocarcinoma Using Clinical Human Samples Provided By BioServe.
Quest for epigenetic determinants of local coexpression clusters Wieslawa Mentzen Labrador and Corces, 2002.
Canadian Bioinformatics Workshops
Eigengenes as biological signatures Dr. Habil Zare, PhD PI of Oncinfo Lab Assistant Professor, Department of Computer Science Texas State University 3.
Graph clustering to detect network modules
M. Fu, G. Huang, Z. Zhang, J. Liu, Z. Zhang, Z. Huang, B. Yu, F. Meng 
Chromatin Control of Developmental Dynamics and Plasticity
Schedule for the Afternoon
Revealing Global Regulatory Perturbations across Human Cancers
Translation of Genotype to Phenotype by a Hierarchy of Cell Subsystems
Volume 33, Issue 4, Pages (February 2009)
Volume 3, Issue 1, Pages (July 2016)
Mapping Global Histone Acetylation Patterns to Gene Expression
Volume 2, Issue 2, Pages (February 2008)
Volume 63, Issue 4, Pages (August 2016)
Revealing Global Regulatory Perturbations across Human Cancers
Cell-type Phylogenetics and the Origin of Endometrial Stromal Cells
Volume 46, Issue 1, Pages (April 2012)
Volume 136, Issue 2, Pages (January 2009)
Volume 37, Issue 6, Pages (December 2012)
Volume 132, Issue 6, Pages (March 2008)
Volume 122, Issue 6, Pages (September 2005)
Volume 7, Issue 2, Pages (August 2010)
Volume 17, Issue 3, Pages (October 2016)
Presentation transcript:

Signed weighted gene co-expression network analysis of transcriptional regulation in murine embryonic stem cells ES cell culture Self- renewing Ecto- derm Meso- Endoderm Steve Horvath University of California, Los Angeles Acknowledgement: Dissertation work of Mike J Mason Guoping Fan, Kathrin Plath, Qing Zhou

Weighted Gene Co-Expression Network Analysis Contents Weighted Gene Co-Expression Network Analysis Application to stem cell data

How to construct a weighted gene co-expression network How to construct a weighted gene co-expression network? Bin Zhang and Steve Horvath (2005) "A General Framework for Weighted Gene Co-Expression Network Analysis", Statistical Applications in Genetics and Molecular Biology: Vol. 4: No. 1

Undirected Network =Adjacency Matrix A network can be represented by an adjacency matrix, A=[aij], that encodes whether/how a pair of nodes is connected. A is a symmetric matrix with entries in [0,1] For unweighted network, entries are 1 or 0 depending on whether or not 2 nodes are adjacent (connected) For weighted networks, the adjacency matrix reports the connection strength between gene pairs

Steps for constructing a co-expression network Gene expression data Measure concordance of gene expression with a Pearson correlation C) The Pearson correlation matrix is either dichotomized to arrive at an unweighted adjacency matrix  unweighted network Or transformed continuously with the power adjacency function  weighted network

Power adjacency function for constructing unsigned and signed weighted gene co-expr. networks Default values: beta=6 for unsigned and beta=12 for signed networks. Alternatively, use the “scale free topology criterion” described in Zhang and Horvath 2005.

Comparing adjacency functions for transforming the correlation into a measure of connection strength Unsigned Network Signed Network

Why soft thresholding as opposed to hard thresholding? Preserves the continuous information of the co-expression information Results tend to be more robust with regard to different threshold choices But hard thresholding has its own advantages: In particular, graph theoretic algorithms from the computer science community can be applied to the resulting networks

Question: Are signed correlation networks superior to unsigned networks? Answer: Overall, recent applications have convinced me that signed networks are preferable. For example, signed networks were critical in a recent stem cell application Michael J Mason, Kathrin Plath, Qing Zhou, SH (2009) Signed Gene Co-expression Networks for Analyzing Transcriptional Regulation in Murine Embryonic Stem Cells. BMC Genomics 2009, 10:327

Re-analysis of published microarray data sets Ivanova N, Dobrin R, Lu R, Kotenko L, Levorse J, DeCoste C, Schafer X, Lun Y, Lemischka I: Discecting self-renewal in stem cells with RNA interference.Nature 2006, 442:533-538 Zhou Q, Chipperfield H, Melton DA, Wong WH: A gene regulatrory network in mouse embryonic stem cells. Proc Natl Acad Sci 2007, 104(42):16438-16443.

ES Cell Datasets Used Ivanova et al.: RNA knockdown of 8 TFs thought to play a role in pluripotency Zhou et al.: ES cell samples and differentiated cell samples sorted into Oct4 positive and negative groups ES / Oct4+ Oct4- ES / Oct4+ Oct4-

How to detect network modules?

As default, we define modules as branches of a cluster tree We use average linkage hierarchical clustering which inputs a measure of interconnectedness often the topological overlap measure Once a dendrogram is obtained from a hierarchical clustering method, we define modules as branches using a branch cutting method dynamicTreeCut R package (Peter Langfelder et al 2007)

How to cut branches off a tree? Module=branch of a cluster tree Module genes are assigned the same color Bioinformatics 2008 24(5):719-720

Signed WGCNA finds a pluripotency related module, which cannot be found in an unsigned network analysis Pluripotency module

Question: How does one summarize the expression profiles in a module Question: How does one summarize the expression profiles in a module? Math answer: module eigengene = first principal component Network answer: the most highly connected intramodular hub gene Both turn out to be equivalent

Module Eigengene= measure of over-expression=average redness Rows,=genes, Columns=microarray The brown module eigengenes across samples

Eigengene-based connectivity, also known as kME or module membership measure kME(i) is simply the correlation between the i-th gene expression profile and the module eigengene. Very useful measure for annotating genes with regard to modules. Module eigengene turns out to be the most highly connected gene

What is weighted gene co-expression network analysis?

Relate modules to external information Construct a network Rationale: make use of interaction patterns between genes Identify modules Rationale: module (pathway) based analysis Relate modules to external information Array Information: RNAi knock-out Gene Information: gene ontology, DNA binding data, epigenetic Rationale: find biologically interesting modules Study Module Preservation across different data Rationale: Same data: to check robustness of module definition Example Ivanova versus Zhou data Find the key drivers in interesting modules Tools: intramodular connectivity kME Rationale: experimental validation, novel genes

What is different from other analyses? Emphasis on modules (pathways) instead of individual genes Greatly alleviates the problem of multiple comparisons Less than 20 comparisons versus 20000 comparisons Use of intramodular connectivity kME to find key drivers Quantifies module membership (centrality) If the module is preserved, intramodular hub genes are preserved as well Module definition is based on gene expression data only No prior pathway information is used for module definition Two module (eigengenes) can be highly correlated Typically defined by cutting branches of a cluster tree Emphasis on a unified approach for relating variables Default: power of a correlation Technical Details: soft thresholding with the power adjacency function, topological overlap matrix to measure interconnectedness

How to relate modules to external data?

Oct4 RNAi knock out status gives rise to a gene significance measure Possible definitions We defined a measure of gene significance (GS) as the t-statistic from the paired Student's t-test of expression in control RNAi samples and ES cell samples with RNAi knock down of Oct4 (paired by day of treatment) GS could also be a fold change GS(i)=|T-test(i)| of differential expression GS(i)=-log(p-value)

A gene significance naturally gives rise to a module significance measure Define module significance as mean gene significance Often highly related to the correlation between module eigengene and trait

The Black Module Contains Genes Involved in Pluripotency The genes of this module are significantly more likely to be bound by key regulators of pluripotency and self-renewal

The blue module contains transcription factors involved in differentiation

Module Membership and Binding Information in the Signed Ivanova et al (2006) Network. This file contains module membership, kME, and binding data from Loh et al (2006), Boyer et al (2007), and Chen et al (2008) for each gene on the microarray.

Signed WGCNA finds Novel Pathways Involved in Pluripotency in Zhou dataset Nup133 is ranked 29th by connectivity and 777th by fold change 28

Epigenetic Regulation and Module Membership Recent studies suggest that chromatin structure and epigenetic modifications, like histone modification and DNA methylation, play a role in controlling gene expression during ES cell self-renewal and differentiation. For example, gene repression by the PcG protein complex via histone H3 lysine 27 trimethylation (H3K27me3) is required for ES cell self-renewal and pluripotency. To understand how epigenetic variables contribute to the regulation of ES cells we studied the relationship of the pluripotency and differentiation modules with ES cell H3K4 and H3K27 trimethylation, DNA methylation, and CpG promoter content from previously published data sets. Data from Guenther et al. Cell 2007

Relating Module Membership to Epigenetic Regulation. The y-axis reports the proportion of top 1000 genes that are known to belong to the group of genes defined on the x-axis. Histone H3K4me3 trimethylation status is abbreviated K4, H3K27me3 trimethylation status is abbreviated by K27. Note that genes with promoter CpG methylation are significantly (p = 2.0 × 10-14) under-enriched with respect to the top 1000 black module genes.

Analysis of variance module membership (kME) versus epigenetic variables Source of Variation in kME kMEblack, Total Prop Var Explained = 8.3% kMEblue, Total Prop Var Explained = 4.2% Source Prop. Of Total Var p-value Prop. Of Total Var Histone Trimethylation (K4, K27,K4&K27) 0.067 < 2.2E-16 0.034 cMyc Complex 0.015 0.002 2.6E-04 Oct4 Complex 0.003 8.0E-08 0.001 7.5E-03 CPG class (HCP, ICP, LCP) 4.9E-04 0.005 6.0E-10 PcG Bound 0.000 8.7E-02 1.9E-01 CpG Methylated 7.1E-01 2.2E-02

Comparison of gene screening based on kME versus screening based on differential expression Venn diagrams show the amount of gene overlap between the top 1000 black (pluripotency) module genes and the top 1000 genes most significantly down-regulated upon Oct4 RNAi (left) gene overlap between the top 1000 blue (differentiation) module genes and the 1000 genes most significantly up-regulated with Oct4 RNAi (right). Ivanova et al data set. Grey: Standard differential expression analysis Green: genes with highest module membership kME Green= Black module genes

Module genes (green) have more significant enrichment than those found by a standard differential expression analysis

Conclusion Signed WGCNA has more consistent gene rankings between data sets, is better able to identify functionally enriched groups of genes Focus on module eigengenes circumvents the multiple testing problems that plague standard gene-based expression analysis. kME =module membership is very useful kME based gene screening identifies several novel stem cell related genes that would not have been found using a standard differential expression analysis kME is valuable for annotating genes with regard to module membership and for identifying genes related to pluripotency and differentiation Can be used as input of analysis of variance to dissect which factors contribute to module membership

Software and Data Availability R software tutorials etc can be found online Google search weighted co-expression network “WGCNA” “co-expression network” http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork

Acknowledgement Dissertation work of Mike J Mason Collaborators: Guoping Fan, Kathrin Plath, Qing Zhou WGCNA R package: Peter Langfelder