Meta Analysis and Differential Network Analysis with Applications in Mouse Expression Data

Meta Analysis and Differential Network Analysis with Applications in Mouse Expression Data
Steve Horvath
Today you’ve heard quite a bit about weighted gene coexpression network analysis (WGCNA). I’ll be talking about differential network analysis, one application of WGCNA.

Outline
Standard differential expression analysis
Statistical power studies
Important network concepts
Single versus differential network analysis
Differential network construction
First I’ll discuss what differential network analysis is, how it differs from single network analysis, and why we would use this method. I’ll then move on to how such an analysis is implemented, give an example of its application along with the results achieved, and demonstrate the functional relevance of these results.

Standard (gene-based) differential expression analysis
Many software packages and R functions calculate t-tests, p-values, false discovery rates, fold changes, etc. WGCNA R functions:
For a binary trait (e.g. case-control status), use standardScreeningBinaryTrait.
For a numeric trait (e.g. body weight), use standardScreeningNumericTrait.
For a right-censored time variable, use standardScreeningCensoredTime.
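As a rough illustration of what a per-gene screen for a binary trait computes, the Python sketch below returns one two-sample t statistic per gene. It is not the WGCNA implementation: standardScreeningBinaryTrait reports further summaries (p-values, q-values, etc.), and the Welch (unequal-variance) form used here is an assumption. The toy expression values are made up.

```python
import math
from statistics import mean, variance


def welch_t(group1, group2):
    """Welch two-sample t statistic (assumption: unequal variances) for one gene."""
    se = math.sqrt(variance(group1) / len(group1) + variance(group2) / len(group2))
    return (mean(group1) - mean(group2)) / se


def screen_binary_trait(expr, trait):
    """Return one t statistic per gene.

    expr:  list of genes, each a list of expression values across samples.
    trait: list of 0/1 labels (0 = control, 1 = case), one per sample.
    """
    stats = []
    for gene in expr:
        cases = [x for x, y in zip(gene, trait) if y == 1]
        controls = [x for x, y in zip(gene, trait) if y == 0]
        stats.append(welch_t(cases, controls))
    return stats


# Hypothetical data: 2 genes x 6 samples.
expr = [[5.1, 4.9, 5.0, 7.2, 7.0, 7.4],   # gene A: higher in cases
        [3.0, 3.2, 2.9, 3.1, 2.8, 3.0]]   # gene B: no clear difference
trait = [0, 0, 0, 1, 1, 1]
print(screen_binary_trait(expr, trait))
```

Gene A gets a large positive t statistic, gene B one close to zero.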

metaAnalysis R function in the WGCNA R package

Help file: metaAnalysis

Stouffer Z statistics from metaAnalysis
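Stouffer's method combines per-study Z statistics for a gene into a single meta-analysis Z. Below is a minimal Python sketch, assuming independent studies. metaAnalysis reports Z statistics under more than one weighting scheme. The two weightings shown here (equal weights, square root of sample size) are illustrative assumptions and may not match the R function's options exactly. The study Z values and sample sizes are hypothetical.

```python
import math


def stouffer_z(z_scores, weights=None):
    """Stouffer meta-analysis Z for one gene.

    Z_meta = sum(w_i * Z_i) / sqrt(sum(w_i ** 2)); equal weights by default.
    Assumption: the studies are independent.
    """
    if weights is None:
        weights = [1.0] * len(z_scores)
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    return numerator / math.sqrt(sum(w * w for w in weights))


# One gene measured in three hypothetical studies.
z = [2.0, 1.5, 2.5]
print(stouffer_z(z))                                          # equal weights
print(stouffer_z(z, [math.sqrt(n) for n in (20, 50, 100)]))   # sqrt(sample size)
```

With equal weights, three moderately significant studies combine into a single Z of about 3.46.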

Ranking-based metaAnalysis statistics

Combine several gene rankings using the rankPvalue function
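Ranking-based meta-analysis asks whether a gene ranks consistently high (or low) across studies. The sketch below computes only the average percentile rank of each gene across studies. It is a simplified stand-in for illustration and does not reproduce how rankPvalue converts ranks into p-values. All input values are hypothetical.

```python
def percentile_ranks(values):
    """Rank values from smallest to largest, scaled to (0, 1]; ties broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r / len(values)
    return ranks


def mean_rank_across_lists(lists):
    """Average percentile rank of each gene across several studies.

    Assumption: every list scores the same genes in the same order.
    """
    per_study = [percentile_ranks(v) for v in lists]
    n_genes = len(lists[0])
    return [sum(study[g] for study in per_study) / len(per_study)
            for g in range(n_genes)]


# Gene-level statistics (e.g. Z scores) for 4 genes in 3 hypothetical studies.
study_stats = [[3.1, 0.2, -1.0, 1.5],
               [2.8, -0.5, 0.1, 1.9],
               [4.0, 0.3, -2.2, 0.9]]
print(mean_rank_across_lists(study_stats))
```

Gene 0 ranks highest in every study, so its average percentile rank is 1.0.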

Statistical Power Studies

Statistical power calculations
According to Google Scholar, this reference had been cited 11,708 times (as of July 2013).
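A statistical power calculation relates effect size, sample size and significance level. The sketch below uses the normal approximation for a two-sided comparison of two group means with standardized effect size d (Cohen's d). It is an illustrative assumption, not a calculation taken from the slide, and exact t-test calculations give slightly larger sample sizes.

```python
from statistics import NormalDist


def two_sample_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample comparison of means.

    Normal approximation (assumption): power ~ Phi(d * sqrt(n / 2) - z_{1-alpha/2}).
    The tiny chance of rejecting in the wrong direction is ignored.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(effect_size * (n_per_group / 2) ** 0.5 - z_crit)


def n_per_group_for_power(effect_size, power=0.8, alpha=0.05):
    """Smallest per-group sample size whose approximate power reaches the target."""
    n = 2
    while two_sample_power(effect_size, n, alpha) < power:
        n += 1
    return n


print(round(two_sample_power(0.5, 64), 2))  # 0.81
print(n_per_group_for_power(0.5))           # 63
```

For a medium effect (d = 0.5), about 63 to 64 samples per group are needed for 80% power at alpha = 0.05.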

Network concepts = network statistics

Network = Adjacency Matrix
A network can be represented by an adjacency matrix, A = [aij], that encodes whether and how a pair of nodes is connected. A is a symmetric matrix with entries in [0,1].
For an unweighted network, entries are 1 or 0 depending on whether or not two nodes are adjacent (connected).
For a weighted network, the adjacency matrix reports the connection strength between node pairs.
Our convention: the diagonal elements of A are all 1.
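In WGCNA, a weighted co-expression adjacency is commonly obtained by soft thresholding the absolute correlation, a_ij = |cor(x_i, x_j)|^β. A minimal pure-Python sketch follows. The power β = 6 is an illustrative choice (in practice β is chosen from the data), and the toy expression profiles are hypothetical. The result is symmetric, has entries in [0,1] and a unit diagonal, matching the conventions above.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned weighted adjacency a_ij = |cor(x_i, x_j)| ** beta.

    expr: list of genes, each a list of expression values across samples.
    Assumption: beta = 6 is only an illustrative default.
    The diagonal is set to 1, following the convention on the slide.
    """
    n = len(expr)
    adj = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a = abs(pearson(expr[i], expr[j])) ** beta
            adj[i][j] = adj[j][i] = a
    return adj


expr = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 11], [5, 3, 4, 1, 2]]
A = soft_threshold_adjacency(expr, beta=6)
for row in A:
    print([round(a, 3) for a in row])
```

Raising correlations to a power keeps strong correlations (0.996 becomes about 0.98) while shrinking weaker ones (0.8 becomes about 0.26).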

Motivational example I: Pairwise relationships between genes across different mouse tissues and genders
Challenge: develop simple descriptive measures that describe the patterns.
Solution: the following network concepts are useful: density, centralization, clustering coefficient, heterogeneity.

Motivational example (continued)
Challenge: find a simple measure for describing the relationship between gene significance and connectivity.
Solution: a network concept called hub gene significance.

Background
Network concepts are also known as network statistics or network indices. Examples: connectivity (degree), clustering coefficient, topological overlap, etc.
Network concepts underlie network language and systems biological modeling. Dozens of potentially useful network concepts are known from graph theory.

Review of some fundamental network concepts, which are defined for all networks (not just co-expression networks)
Horvath S (2011) Weighted Network Analysis. Springer. Hardcover ISBN: 978-1-4419-8818-8
Dong J, Horvath S (2007) Understanding network concepts in modules. BMC Syst Biol 1:24
Horvath S, Dong J (2008) Geometric interpretation of gene co-expression network analysis. PLoS Comput Biol

Connectivity
Node connectivity = row sum of the adjacency matrix, excluding the diagonal element.
For unweighted networks: the number of direct neighbors.
For weighted networks: the sum of connection strengths to other nodes.
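A minimal sketch of node connectivity, k_i = sum of a_ij over j ≠ i. Because the convention above sets a_ii = 1, the diagonal term is subtracted; including it would add 1 to every connectivity. The example matrices are hypothetical.

```python
def connectivity(adj):
    """Node connectivity k_i = sum of a_ij over j != i.

    The slide's convention sets a_ii = 1, so the diagonal term is
    subtracted; otherwise every connectivity would be inflated by 1.
    """
    return [sum(row) - row[i] for i, row in enumerate(adj)]


# Unweighted star: node 0 is the hub, nodes 1-3 are leaves.
star = [[1, 1, 1, 1],
        [1, 1, 0, 0],
        [1, 0, 1, 0],
        [1, 0, 0, 1]]
print(connectivity(star))  # [3, 1, 1, 1]

# Weighted network: connectivities are sums of connection strengths.
weighted = [[1.0, 0.8, 0.1],
            [0.8, 1.0, 0.3],
            [0.1, 0.3, 1.0]]
print([round(k, 2) for k in connectivity(weighted)])  # [0.9, 1.1, 0.4]
```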

Density
Density = mean off-diagonal adjacency. Highly related to mean connectivity.
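The link between density and mean connectivity can be made explicit. Summing the off-diagonal adjacencies gives the sum of all connectivities, so density = mean(k) / (n − 1). The sketch below (hypothetical star network) computes density both ways.

```python
def density(adj):
    """Mean off-diagonal adjacency: sum over i != j of a_ij / (n * (n - 1))."""
    n = len(adj)
    off_diagonal = sum(adj[i][j] for i in range(n) for j in range(n) if i != j)
    return off_diagonal / (n * (n - 1))


def mean_connectivity(adj):
    """Average of k_i = sum over j != i of a_ij."""
    n = len(adj)
    return sum(sum(row) - row[i] for i, row in enumerate(adj)) / n


star = [[1, 1, 1, 1],
        [1, 1, 0, 0],
        [1, 0, 1, 0],
        [1, 0, 0, 1]]
print(density(star))                              # 0.5
print(mean_connectivity(star) / (len(star) - 1))  # 0.5, the same value
```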

Centralization
Centralization = 1 if the network has a star topology; = 0 if all nodes have the same connectivity.
Examples (figure): a network in which all nodes have the same connectivity of 2 has centralization 0; a network with a star topology has centralization 1.
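One standard definition used in Horvath's network-concept papers is Centralization = n/(n − 2) · (max(k)/(n − 1) − Density). It takes exactly the two extreme values described above. The sketch checks this on a 4-node star and a 4-node ring (a cycle in which every node has connectivity 2). Both networks are hypothetical examples.

```python
def centralization(adj):
    """Centralization = n/(n-2) * (max(k)/(n-1) - density), for n > 2."""
    n = len(adj)
    k = [sum(row) - row[i] for i, row in enumerate(adj)]  # diagonal excluded
    dens = sum(k) / (n * (n - 1))
    return n / (n - 2) * (max(k) / (n - 1) - dens)


star = [[1, 1, 1, 1],
        [1, 1, 0, 0],
        [1, 0, 1, 0],
        [1, 0, 0, 1]]
ring = [[1, 1, 0, 1],   # 4-cycle: every node has connectivity 2
        [1, 1, 1, 0],
        [0, 1, 1, 1],
        [1, 0, 1, 1]]
print(centralization(star))  # 1.0
print(centralization(ring))  # 0.0
```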

Heterogeneity
Heterogeneity: coefficient of variation of the connectivity. Highly heterogeneous networks exhibit hubs.

Clustering Coefficient
Measures the cliquishness of a particular node: « a node is cliquish if its neighbors know each other ». This generalizes directly to weighted networks (Zhang and Horvath 2005).
Examples (figure): clustering coefficient of the black node = 0; clustering coefficient = 1.
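Below is a sketch of the weighted clustering coefficient in the form we understand from Zhang and Horvath (2005). For 0/1 adjacencies it reduces to the usual fraction of a node's neighbor pairs that are themselves connected. Returning 0 when the denominator vanishes (fewer than two neighbors) is a convention chosen only for this sketch.

```python
def clustering_coefficient(adj, i):
    """Weighted clustering coefficient of node i.

    numerator:   sum over j != i, l != i, l != j of a_ij * a_jl * a_li
    denominator: (sum over j != i of a_ij) ** 2 - sum over j != i of a_ij ** 2
    """
    others = [j for j in range(len(adj)) if j != i]
    num = sum(adj[i][j] * adj[j][l] * adj[l][i]
              for j in others for l in others if l != j)
    k = sum(adj[i][j] for j in others)
    denom = k ** 2 - sum(adj[i][j] ** 2 for j in others)
    # Assumption (this sketch only): report 0 when the denominator is 0.
    return num / denom if denom > 0 else 0.0


triangle = [[1, 1, 1],
            [1, 1, 1],
            [1, 1, 1]]
star = [[1, 1, 1, 1],
        [1, 1, 0, 0],
        [1, 0, 1, 0],
        [1, 0, 0, 1]]
print(clustering_coefficient(triangle, 0))  # 1.0: neighbors know each other
print(clustering_coefficient(star, 0))      # 0.0: leaves are not connected
```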

The topological overlap dissimilarity is used as input to hierarchical clustering. (Speaker note: mention that Ai Li worked on it.)
Generalized in Zhang and Horvath (2005) to weighted networks.
Generalized in Li and Horvath (2006) to multiple nodes.
Generalized in Yip and Horvath (2007) to higher-order interactions.
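A sketch of the pairwise topological overlap for weighted networks, TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 − a_ij), where l_ij = sum over u ≠ i, j of a_iu · a_uj. Its complement, 1 − TOM, is the dissimilarity passed to hierarchical clustering. Setting the diagonal to 1 is an assumption that mirrors the adjacency convention. The example network is hypothetical.

```python
def topological_overlap(adj):
    """Pairwise topological overlap matrix for a (weighted) adjacency matrix."""
    n = len(adj)
    k = [sum(row) - row[i] for i, row in enumerate(adj)]  # connectivities
    tom = [[1.0] * n for _ in range(n)]  # assumption: unit diagonal
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return tom


def tom_dissimilarity(adj):
    """1 - TOM, the input to hierarchical clustering."""
    return [[1.0 - t for t in row] for row in topological_overlap(adj)]


star = [[1, 1, 1, 1],
        [1, 1, 0, 0],
        [1, 0, 1, 0],
        [1, 0, 0, 1]]
print(topological_overlap(star)[1][2])  # 0.5: two leaves share the hub
```

Leaves 1 and 2 are not directly connected, yet they share a neighbor, so their topological overlap is 0.5 rather than 0.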

Network Significance
Defined as the average gene significance. We often refer to the network significance of a module network as module significance.
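The sketch below computes network significance (average gene significance, GS). It also computes the hub gene significance mentioned in the motivational example. For the latter we assume the definition we understand from Dong and Horvath (2007): the slope of a regression through the origin of GS on scaled connectivity K = k/max(k). The GS and connectivity values are hypothetical.

```python
from statistics import mean


def network_significance(gs):
    """Average gene significance over the nodes of the network (or module)."""
    return mean(gs)


def hub_gene_significance(gs, k):
    """Assumed definition: sum(GS_i * K_i) / sum(K_i ** 2), with K = k / max(k)."""
    top = max(k)
    K = [ki / top for ki in k]
    return sum(g * Ki for g, Ki in zip(gs, K)) / sum(Ki ** 2 for Ki in K)


gs = [0.8, 0.4, 0.2, 0.2]   # hypothetical gene significances
k = [3, 1, 1, 1]            # hypothetical connectivities (node 0 is the hub)
print(round(network_significance(gs), 3))   # 0.4
print(round(hub_gene_significance(gs, k), 3))  # 0.8
```

Hub gene significance exceeds network significance here because the most connected gene is also the most significant.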

Maximum adjacency ratio
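A sketch of the maximum adjacency ratio, using the definition we understand from Horvath's work: MAR(i) = sum over j ≠ i of a_ij² divided by sum over j ≠ i of a_ij. For an unweighted network MAR is 1 for every node with at least one neighbor, so it is informative mainly for weighted networks. It is close to 1 when a node's connectivity comes from a few strong connections and small when it comes from many weak ones. The sketch assumes the node has positive connectivity. The matrices are hypothetical.

```python
def max_adjacency_ratio(adj, i):
    """MAR(i) = sum_{j != i} a_ij ** 2 / sum_{j != i} a_ij (assumes k_i > 0)."""
    others = [j for j in range(len(adj)) if j != i]
    k = sum(adj[i][j] for j in others)
    return sum(adj[i][j] ** 2 for j in others) / k


star = [[1, 1, 1, 1],
        [1, 1, 0, 0],
        [1, 0, 1, 0],
        [1, 0, 0, 1]]
weighted = [[1.0, 0.8, 0.1],
            [0.8, 1.0, 0.3],
            [0.1, 0.3, 1.0]]
print(max_adjacency_ratio(star, 0))                # 1.0 (unweighted)
print(round(max_adjacency_ratio(weighted, 0), 3))  # 0.722: one strong link
print(round(max_adjacency_ratio(weighted, 2), 3))  # 0.25: only weak links
```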

Network concepts for comparing two networks

Differential network concepts
Node-specific statistics:
Diff.ClusterCoef(i) = CC1(i) - CC2(i)
Diff.MAR(i) = MAR1(i) - MAR2(i)
Global statistics:
Diff.MeanClusterCoef = Mean.CC1 - Mean.CC2
Diff.MeanConnectivity = Mean.k1 - Mean.k2
Diff.MeanMAR = Mean.MAR1 - Mean.MAR2
Diff.MeanKME = Mean.KME1 - Mean.KME2
Diff.Density = Density1 - Density2
These statistics can be calculated via the modulePreservation function.
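To make the pattern concrete, here is a self-contained Python sketch of a few of these differences (Diff.Density, Diff.MeanConnectivity, node-wise Diff.MAR) for two adjacency matrices over the same genes. It illustrates the definitions only and is not the modulePreservation implementation. It assumes every node has positive connectivity, and the matrices are hypothetical.

```python
def connectivity(adj):
    """k_i = sum of a_ij over j != i (diagonal excluded)."""
    return [sum(row) - row[i] for i, row in enumerate(adj)]


def density(adj):
    n = len(adj)
    return sum(connectivity(adj)) / (n * (n - 1))


def mar(adj):
    """Maximum adjacency ratio of every node (assumes k_i > 0)."""
    k = connectivity(adj)
    return [(sum(a * a for a in row) - row[i] ** 2) / k[i]
            for i, row in enumerate(adj)]


def differential_concepts(adj1, adj2):
    """Network 1 minus network 2 for a few global and node-specific concepts."""
    n = len(adj1)
    k1, k2 = connectivity(adj1), connectivity(adj2)
    return {
        "Diff.Density": density(adj1) - density(adj2),
        "Diff.MeanConnectivity": sum(k1) / n - sum(k2) / n,
        "Diff.MAR": [m1 - m2 for m1, m2 in zip(mar(adj1), mar(adj2))],
    }


A1 = [[1.0, 0.8, 0.1], [0.8, 1.0, 0.3], [0.1, 0.3, 1.0]]
A2 = [[1.0, 0.4, 0.1], [0.4, 1.0, 0.3], [0.1, 0.3, 1.0]]
print(differential_concepts(A1, A2))
```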

Measuring the similarity between two networks

R code for computing network concepts

R code, help file

Data analysis strategies: single network analysis versus differential network analysis

Goals of Single Network Analysis
Identifying genetic pathways (modules)
Finding key drivers (hub genes)
Modeling the relationships between: transcriptome; clinical traits / phenotypes; genetic marker data

Single Network WGCNA
One gene co-expression network is constructed; multiple data sets (e.g. validation set 1, validation set 2) may be used for validation.

Goals of Differential Network Analysis
Uncover differences in modules and connectivity in different data sets, e.g. human versus chimpanzee brains (Oldham et al. 2006).
Differing topology in multiple networks reveals genes/pathways that are wired differently in different sample populations.
Fuller TF, Ghazalpour A, Aten JE, Drake TA, Lusis AJ, … (2007) Weighted gene co-expression network analysis strategies applied to mouse weight. Mamm Genome 18(6):463-472
Oldham MC, … Geschwind DH (2006) Conservation and evolution of gene coexpression networks in human and chimpanzee brains. Proc Natl Acad Sci U S A 103:17973-17978

Differential Network WGCNA
Two or more gene co-expression networks. Identify genes and pathways that are: differentially expressed; differentially wired.

BxH Mouse Data from AJ Lusis
Single network analysis of female BxH mice revealed a weight-related module (Ghazalpour et al. 2006).
Samples: from 135 female mice, networks were constructed using mice at the extremes of the weight spectrum:
Network 1: 30 leanest mice
Network 2: 30 heaviest mice
Transcripts: the 3421 most connected and most varying transcripts were used.
Ghazalpour A, Doss S, Zhang B, Wang S, Plaisier C, Castellanos R, Brozell A, Schadt EE, Drake TA, Lusis AJ, Horvath S (2006) Integrating genetic and network analysis to characterize genes related to mouse weight. PLoS Genet 2:e130

Methods
Compute comparison metrics: difference in expression (t-test statistic); difference in connectivity (DiffK).
Identify significantly different genes/pathways: permutation test.
Functional analysis of significant genes/pathways: DAVID database; primary literature.

Computing Comparison Metrics
Differential expression: a t-test statistic is computed for each gene, t(i).
Differential connectivity: K1(i) = k1(i) / max(k1), K2(i) = k2(i) / max(k2).
DiffK(i), the difference in normalized connectivities for each gene: DiffK(i) = K1(i) - K2(i)
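The connectivity comparison can be written in a few lines. Each network's connectivities are scaled by their maximum so the two networks are on a common [0, 1] scale, and the scaled values are then subtracted gene by gene. The connectivity values below are hypothetical.

```python
def normalized_connectivity(k):
    """K(i) = k(i) / max(k)."""
    top = max(k)
    return [ki / top for ki in k]


def diff_k(k1, k2):
    """DiffK(i) = K1(i) - K2(i); positive means relatively more connected in network 1."""
    K1 = normalized_connectivity(k1)
    K2 = normalized_connectivity(k2)
    return [a - b for a, b in zip(K1, K2)]


k_lean = [12.0, 6.0, 3.0]   # hypothetical connectivities in network 1
k_obese = [4.0, 8.0, 2.0]   # hypothetical connectivities in network 2
print(diff_k(k_lean, k_obese))  # [0.5, -0.5, 0.0]
```

Because of the scaling, the third gene has DiffK = 0 even though its raw connectivity differs between the networks.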

Sector Plot
We visualize the comparison metrics via a sector plot: x-axis: DiffK; y-axis: t statistic.
We establish sector boundaries to identify differentially expressed and/or differentially connected genes: |t| = 1.96, corresponding to a two-sided p = 0.05; |DiffK| = 0.4.
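The boundaries split the (DiffK, t) plane into regions. The sketch below assigns a gene to a region using those cut-offs. It uses descriptive labels because the slide's sector numbering is not reproduced here. The sign convention (positive t meaning higher expression in the first group) is an assumption.

```python
def classify_gene(t, diffk, t_cut=1.96, diffk_cut=0.4):
    """Place one gene in a sector of the (DiffK, t) plane.

    Assumption: positive t means higher expression in group/network 1,
    positive DiffK means higher scaled connectivity in network 1.
    """
    if t > t_cut:
        expression = "up"
    elif t < -t_cut:
        expression = "down"
    else:
        expression = "not DE"
    if diffk > diffk_cut:
        wiring = "higher K in network 1"
    elif diffk < -diffk_cut:
        wiring = "higher K in network 2"
    else:
        wiring = "similar K"
    return expression, wiring


print(classify_gene(2.5, 0.6))    # ('up', 'higher K in network 1')
print(classify_gene(0.3, -0.1))   # ('not DE', 'similar K')
```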

Permutation test: Identifying significant sectors
Samples from Network 1 and Network 2 are permuted (figure); no.perms is the number of permutations.
For each sector j, we compare the number of genes in the unpermuted and permuted sectors (nobs and nperm).
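Below is a minimal sketch of the permutation idea. The sample labels (network 1 versus network 2) are shuffled, t and DiffK are recomputed, and the number of genes in one example sector (|t| > 1.96 and DiffK > 0.4) is counted for each permutation. Several choices are assumptions rather than details of the original analysis: β = 6 for the adjacency, the Welch t statistic, the single sector, and the empirical p-value (1 + #{nperm ≥ nobs}) / (1 + no.perms). The data are random.

```python
import math
import random
from statistics import mean, variance


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def scaled_connectivity(expr, beta=6):
    """K(i) = k(i) / max(k) in a soft-thresholded network (assumption: beta = 6)."""
    n = len(expr)
    k = [sum(abs(pearson(expr[i], expr[j])) ** beta for j in range(n) if j != i)
         for i in range(n)]
    top = max(k)
    return [ki / top for ki in k]


def welch_t(x, y):
    return (mean(x) - mean(y)) / math.sqrt(variance(x) / len(x) + variance(y) / len(y))


def sector_count(expr, labels, t_cut=1.96, diffk_cut=0.4):
    """Genes with |t| > t_cut and DiffK > diffk_cut (one example sector)."""
    g1 = [[v for v, lab in zip(gene, labels) if lab == 1] for gene in expr]
    g2 = [[v for v, lab in zip(gene, labels) if lab == 2] for gene in expr]
    K1, K2 = scaled_connectivity(g1), scaled_connectivity(g2)
    return sum(1 for i in range(len(expr))
               if abs(welch_t(g1[i], g2[i])) > t_cut and K1[i] - K2[i] > diffk_cut)


def permutation_test(expr, labels, no_perms=100, seed=0):
    """Return nobs and an empirical p-value from label permutations (assumed formula)."""
    rng = random.Random(seed)
    n_obs = sector_count(expr, labels)
    n_perm = []
    for _ in range(no_perms):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        n_perm.append(sector_count(expr, shuffled))
    p_value = (1 + sum(c >= n_obs for c in n_perm)) / (1 + no_perms)
    return n_obs, p_value


# Hypothetical data: 8 genes x 20 samples; labels 1 = network 1, 2 = network 2.
data_rng = random.Random(42)
expr = [[data_rng.gauss(0, 1) for _ in range(20)] for _ in range(8)]
labels = [1] * 10 + [2] * 10
print(permutation_test(expr, labels, no_perms=50))
```

With purely random data, as here, the observed count is typically unremarkable relative to the permuted counts.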

Sector Plot Results (figure; legend values: 0.01, 0.001)

Functional Analysis
Sector 3: high t statistic, high DiffK; yellow module in lean mice, grey in obese mice (63 genes).
Sector 5: low t statistic, high DiffK (28 genes).
Genes in these sectors have higher connectivity in lean than in obese mice: these are pathways potentially dysregulated in obesity.

Sector 3: Functional Analysis Results (DAVID Database)
“Extracellular” terms:
extracellular region (38% of genes, p = 1.8 x 10^-4)
extracellular space (34% of genes, p = 5.7 x 10^-4)
signaling (36% of genes, p = 5.4 x 10^-4)
cell adhesion (16% of genes, p = 7.7 x 10^-4)
glycoproteins (34% of genes, p = 1.6 x 10^-3)
12 terms for epidermal growth factor or its related proteins:
EGF-like 1 (8.2% of genes, p = 8.7 x 10^-4)
EGF-like 3 (6.6% of genes, p = 1.6 x 10^-3)
EGF-like 2 (6.6% of genes, p = 6.0 x 10^-3)
EGF (8.2% of genes, p = 0.013)
EGF_CA (6.6% of genes, p = 0.015)

Sector 3: Functional Analysis Results (Primary Literature)
Results are supported by a study on EGF levels in mice (Kurachi et al. 1993): EGF was found to be increased in obese mice. Obesity was reversed in these mice by: administration of anti-EGF; sialoadenectomy.
Kurachi H, Adachi H, Ohtsuka S, Morishige K, Amemiya K, Keno Y, Shimomura I, Tokunaga K, Miyake A, Matsuzawa Y, et al. (1993) Involvement of epidermal growth factor in inducing obesity in ovariectomized mice. Am J Physiol 265:E323-331

Sector 5: Functional Analysis Results (DAVID Database)
Enzyme inhibitor activity (p = 2.9 x 10^-3)*
Protease inhibitor activity (p = 6.0 x 10^-3)
Endopeptidase inhibitor activity (p = 6.0 x 10^-3)
Dephosphorylation (p = 0.012)
Protein amino acid dephosphorylation (p = 0.012)
Serine-type endopeptidase inhibitor activity (p = 0.042)
* The p-values shown are Bonferroni corrected.

Sector 5: Functional Analysis Results (Primary Literature)
Itih1 and Itih3: enriched for all categories shown previously; located near a QTL for hyperinsulinemia (Almind and Kahn 2004). Itih3 was identified as a candidate gene for obesity-related traits based on differential expression in the murine hypothalamus (Bischof and Wevrick 2005).
Serpina3n and Serpina10: enriched for enzyme inhibitor, protease inhibitor, and endopeptidase inhibitor activity. Serpina10, or protein Z-dependent protease inhibitor (ZPI), has been found to be associated with venous thrombosis (Van de Water et al. 2004).
Almind K, Kahn CR (2004) Genetic determinants of energy expenditure and insulin resistance in diet-induced obesity in mice. Diabetes 53:3274-3285
Bischof JM, Wevrick R (2005) Genome-wide analysis of gene transcription in the hypothalamus. Physiol Genomics 22:191-196
Van de Water N, Tan T, Ashton F, O'Grady A, Day T, Browett P, Ockelford P, Harper P (2004) Mutations within the protein Z-dependent protease inhibitor gene are associated with venous thromboembolic disease: a new form of thrombophilia. Br J Haematol 127:190-194

Discussion
If applicable, always report findings from a standard differential expression analysis as well.
A host of network concepts exists for describing network topology.
Relatively few people use differential network analysis, which may reflect the fact that large sample sizes are needed: comparing two correlation coefficients requires a large sample size.
To check whether a module is preserved in another network, use the modulePreservation function.
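Fisher's z transform shows why comparing correlations needs large samples. For independent samples, atanh(r) is approximately normal with variance 1/(n − 3). The test statistic for two correlations is therefore (atanh(r1) − atanh(r2)) / sqrt(1/(n1 − 3) + 1/(n2 − 3)). The sketch below implements this test and the implied group size for a target power. The correlations and sample sizes are hypothetical, and equal group sizes are an assumption of the sample-size function.

```python
import math
from statistics import NormalDist


def compare_correlations(r1, n1, r2, n2):
    """Two-sided p-value for H0: rho1 == rho2 (independent samples, Fisher z)."""
    z = (math.atanh(r1) - math.atanh(r2)) / math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return 2 * (1 - NormalDist().cdf(abs(z)))


def n_per_group(r1, r2, power=0.8, alpha=0.05):
    """Approximate equal group size needed to detect rho1 != rho2.

    Solves |atanh(r1) - atanh(r2)| / sqrt(2 / (n - 3)) = z_{1-alpha/2} + z_{power}.
    """
    dist = NormalDist()
    q = abs(math.atanh(r1) - math.atanh(r2))
    needed = ((dist.inv_cdf(1 - alpha / 2) + dist.inv_cdf(power)) / q) ** 2
    return math.ceil(2 * needed + 3)


print(round(compare_correlations(0.8, 30, 0.5, 30), 3))  # 0.044: barely significant
print(n_per_group(0.8, 0.5))                             # 56 per group
print(n_per_group(0.8, 0.7))                             # much larger
```

Even a large difference (0.8 versus 0.5) is only borderline significant with 30 samples per group, and the required group size grows rapidly as the two correlations get closer.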

Acknowledgements
Horvath Lab: dissertation work of Tova Fuller; Jun Dong; Peter Langfelder
Mouse data collaboration, Lusis Lab: Jake Lusis, Anatole Ghazalpour, Thomas Drake
An R tutorial may be found at: http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/DifferentialNetworkAnalysis