Expression Modules Brian S. Yandell (with slides from Steve Horvath, UCLA, and Mark Keller, UW-Madison)

Slides:



Advertisements
Similar presentations
Using genetic markers to orient the edges in quantitative trait networks: the NEO software Steve Horvath dissertation work of Jason Aten Aten JE, Fuller.
Advertisements

1 Harvard Medical School Mapping Transcription Mechanisms from Multimodal Genomic Data Hsun-Hsien Chang, Michael McGeachie, and Marco F. Ramoni Children.
Inferring Causal Phenotype Networks Elias Chaibub Neto & Brian S. Yandell UW-Madison June 2010 QTL 2: NetworksSeattle SISG: Yandell ©
Computational Infrastructure for Systems Genetics Analysis Brian Yandell, UW-Madison high-throughput analysis of systems data enable biologists & analysts.
Teresa Przytycka NIH / NLM / NCBI RECOMB 2010 Bridging the genotype and phenotype.
Regulatory Network (Part II) 11/05/07. Methods Linear –PCA (Raychaudhuri et al. 2000) –NIR (Gardner et al. 2003) Nonlinear –Bayesian network (Friedman.
Functional genomics and inferring regulatory pathways with gene expression data.
Graph, Search Algorithms Ka-Lok Ng Department of Bioinformatics Asia University.
Biological networks Construction and Analysis. Recap Gene regulatory networks –Transcription Factors: special proteins that function as “keys” to the.
Computational Infrastructure for Systems Genetics Analysis Brian Yandell, UW-Madison high-throughput analysis of systems data enable biologists & analysts.
Seattle Summer Institute : Systems Genetics for Experimental Crosses Brian S. Yandell, UW-Madison Elias Chaibub Neto, Sage Bionetworks
Introduction to molecular networks Sushmita Roy BMI/CS 576 Nov 6 th, 2014.
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Graph Regularized Dual Lasso for Robust eQTL Mapping Wei Cheng 1 Xiang Zhang 2 Zhishan Guo 1 Yu Shi 3 Wei.
Epistasis Analysis Using Microarrays Chris Workman.
Systematic Analysis of Interactome: A New Trend in Bioinformatics KOCSEA Technical Symposium 2010 Young-Rae Cho, Ph.D. Assistant Professor Department of.
Bayesian integration of biological prior knowledge into the reconstruction of gene regulatory networks Dirk Husmeier Adriano V. Werhli.
Statistical Bioinformatics QTL mapping Analysis of DNA sequence alignments Postgenomic data integration Systems biology.
Bayesian causal phenotype network incorporating genetic variation and biological knowledge Brian S Yandell, Jee Young Moon University of Wisconsin-Madison.
Characterizing the role of miRNAs within gene regulatory networks using integrative genomics techniques Min Wenwen
Network Analysis and Application Yao Fu
Cis-regulation Trans-regulation 5 Objective: pathway reconstruction.
“An Extension of Weighted Gene Co-Expression Network Analysis to Include Signed Interactions” Michael Mason Department of Statistics, UCLA.
Gene Set Enrichment Analysis (GSEA)
Genetic Regulatory Network Inference Russell Schwartz Department of Biological Sciences Carnegie Mellon University.
Quantile-based Permutation Thresholds for QTL Hotspots Brian S Yandell and Elias Chaibub Neto 17 March © YandellMSRC5.
Microarrays to Functional Genomics: Generation of Transcriptional Networks from Microarray experiments Joshua Stender December 3, 2002 Department of Biochemistry.
Monsanto: Yandell © Building Bridges from Breeding to Biometry and Biostatistics Brian S. Yandell Professor of Horticulture & Statistics Chair of.
Reverse engineering gene regulatory networks Dirk Husmeier Adriano Werhli Marco Grzegorczyk.
Learning regulatory networks from postgenomic data and prior knowledge Dirk Husmeier 1) Biomathematics & Statistics Scotland 2) Centre for Systems Biology.
Expression Modules Brian S. Yandell (with slides from Steve Horvath, UCLA, and Mark Keller, UW-Madison)
Breaking Up is Hard to Do: Migrating R Project to CHTC Brian S. Yandell UW-Madison 22 November 2013.
Weighted models for insulin Detected by scanone Detected by Ping’s multiQTL model tissue# transcripts Islet1984 Adipose605 Liver485 Gastroc404 # transcripts.
Using Bayesian Networks to Analyze Whole-Genome Expression Data Nir Friedman Iftach Nachman Dana Pe’er Institute of Computer Science, The Hebrew University.
Computational Infrastructure for Systems Genetics Analysis Brian Yandell, UW-Madison high-throughput analysis of systems data enable biologists & analysts.
Problem Limited number of experimental replications. Postgenomic data intrinsically noisy. Poor network reconstruction.
Differential analysis of Eigengene Networks: Finding And Analyzing Shared Modules Across Multiple Microarray Datasets Peter Langfelder and Steve Horvath.
Metabolic Network Inference from Multiple Types of Genomic Data Yoshihiro Yamanishi Centre de Bio-informatique, Ecole des Mines de Paris.
Understanding Network Concepts in Modules Dong J, Horvath S (2007) BMC Systems Biology 2007, 1:24.
Learning Bayesian networks from postgenomic data with an improved structure MCMC sampling scheme Dirk Husmeier Marco Grzegorczyk 1) Biomathematics & Statistics.
Expression Modules Brian S. Yandell (with slides from Steve Horvath, UCLA, and Mark Keller, UW-Madison) Modules/Pathways1SISG (c) 2012 Brian S Yandell.
Introduction to biological molecular networks
Yandell © June Bayesian analysis of microarray traits Arabidopsis Microarray Workshop Brian S. Yandell University of Wisconsin-Madison
Yandell © 2003JSM Dimension Reduction for Mapping mRNA Abundance as Quantitative Traits Brian S. Yandell University of Wisconsin-Madison
Reverse engineering of regulatory networks Dirk Husmeier & Adriano Werhli.
Discovering functional interaction patterns in Protein-Protein Interactions Networks   Authors: Mehmet E Turnalp Tolga Can Presented By: Sandeep Kumar.
Causal Network Models for Correlated Quantitative Traits Brian S. Yandell UW-Madison October Jax SysGen: Yandell.
Nonlinear differential equation model for quantification of transcriptional regulation applied to microarray data of Saccharomyces cerevisiae Vu, T. T.,
(c) M Gerstein '06, gerstein.info/talks 1 CS/CBB Data Mining Predicting Networks through Bayesian Integration #1 - Theory Mark Gerstein, Yale University.
Insulin’s Weighted Model 1984 transcripts’ weighted models overlap with insulin’s weighted model Ping Wang.
Computational methods for inferring cellular networks II Stat 877 Apr 17 th, 2014 Sushmita Roy.
13 October 2004Statistics: Yandell © Inferring Genetic Architecture of Complex Biological Processes Brian S. Yandell 12, Christina Kendziorski 13,
Gene Mapping for Correlated Traits Brian S. Yandell University of Wisconsin-Madison Correlated TraitsUCLA Networks (c)
Causal Network Models for Correlated Quantitative Traits Brian S. Yandell UW-Madison October Jax SysGen: Yandell.
Graph clustering to detect network modules
EQTLs.
Quantile-based Permutation Thresholds for QTL Hotspots
Quantile-based Permutation Thresholds for QTL Hotspots
Computational Infrastructure for Systems Genetics Analysis
Causal Network Models for Correlated Quantitative Traits
Inferring Genetic Architecture of Complex Biological Processes Brian S
4. modern high throughput biology
1 Department of Engineering, 2 Department of Mathematics,
Inferring Causal Phenotype Networks
Causal Network Models for Correlated Quantitative Traits
Causal Network Models for Correlated Quantitative Traits
1 Department of Engineering, 2 Department of Mathematics,
1 Department of Engineering, 2 Department of Mathematics,
Inferring Causal Phenotype Networks Driven by Expression Gene Mapping
Inferring Genetic Architecture of Complex Biological Processes Brian S
Network Inference Chris Holmes Oxford Centre for Gene Function, &,
Presentation transcript:

Expression Modules Brian S. Yandell (with slides from Steve Horvath, UCLA, and Mark Keller, UW-Madison)

Weighted models for insulin Detected by scanone Detected by Ping’s multiQTL model tissue# transcripts Islet1984 Adipose605 Liver485 Gastroc404 # transcripts that match weighted insulin model in each of 4 tissues:

insulin main effects Ping Wang How many islet transcripts show this same genetic dependence at these loci? Chr 9Chr 12Chr 14 Chr 16Chr 17Chr 19 Chr 2

Expression Networks Zhang & Horvath (2005) organize expression traits using correlation adjacency connectivity topological overlap

Using the topological overlap matrix (TOM) to cluster genes – modules correspond to branches of the dendrogram TOM plot Hierarchical clustering dendrogram TOM matrix Module: Correspond to branches Genes correspond to rows and columns

module traits highly correlated adjacency attenuates correlation can separate positive, negative correlation summarize module – eigengene – weighted average of traits relate module – to clinical traits – map eigengene

advantages of Horvath modules emphasize modules (pathways) instead of individual genes – Greatly alleviates the problem of multiple comparisons – ~20 module comparisons versus 1000s of gene comparisons intramodular connectivity k i finds key drivers (hub genes) – quantifies module membership (centrality) – highly connected genes have an increased chance of validation module definition is based on gene expression data – no prior pathway information is used for module definition – two modules (eigengenes) can be highly correlated unified approach for relating variables – compare data sets on same mathematical footing scale-free: zoom in and see similar structure

modules for 1984 transcripts with similar genetic architecture as insulin contains the insulin trait Ping Wang

Islet – modules Insulin trait chromosomes

Islet – enrichment for modules chromosomes ModulePvalueQvalueCountSizeTerm BLUE biosynthetic process cellular lipid metabolic process lipid biosynthetic process lipid metabolic process GREEN phosphate transport intermediate filament-based process ion transport PURPLE nucleobase, nucleoside, nucleotide and nucleic acid metabolic process BLACK sensory perception of sound MAGENTA2.54E cell cycle process microtubule-based process mitotic cell cycle M phase cell division cell cycle phase mitosis M phase of mitotic cell cycle YELLOW cell projection organization and biogenesis cell part morphogenesis cell projection morphogenesis RED steroid hormone receptor signaling pathway reproductive process BROWN response to pheromone TURQUOISE enzyme linked receptor protein signaling pathway morphogenesis of an epithelium morphogenesis of embryonic epithelium anatomical structure morphogenesis PINK vesicle organization and biogenesis regulation of apoptosis Insulin

ontologies – Cellular component (GOCC) – Biological process (GOBP) – Molecular function (GOMF) hierarchy of classification – general to specific – based on extensive literature search, predictions prone to errors, historical inaccuracies

Bayesian causal phenotype network incorporating genetic variation and biological knowledge Brian S Yandell, Jee Young Moon University of Wisconsin-Madison Elias Chaibub Neto, Sage Bionetworks Xinwei Deng, VA Tech Sysgen Biological12SISG (c) 2012 Brian S Yandell

13 glucoseinsulin (courtesy AD Attie)‏ BTBR mouse is insulin resistant B6 is not make both obese… Alan Attie Biochemistry Sysgen Biological

bigger picture how do DNA, RNA, proteins, metabolites regulate each other? regulatory networks from microarray expression data – time series measurements or transcriptional perturbations – segregating population: genotype as driving perturbations goal: discover causal regulatory relationships among phenotypes use knowledge of regulatory relationships from databases – how can this improve causal network reconstruction? Sysgen Biological14SISG (c) 2012 Brian S Yandell

BxH ApoE-/- chr 2: hotspot 15Sysgen BiologicalSISG (c) 2012 Brian S Yandell x% threshold on number of traits Data: Ghazalpour et al.(2006) PLoS Genetics DNA  local gene  distant genes

causal model selection choices in context of larger, unknown network focal trait target trait focal trait target trait focal trait target trait focal trait target trait causal reactive correlated uncorrelated 16Sysgen BiologicalSISG (c) 2012 Brian S Yandell

causal architecture references BIC: Schadt et al. (2005) Nature Genet CIT: Millstein et al. (2009) BMC Genet Aten et al. Horvath (2008) BMC Sys Bio CMST: Chaibub Neto et al. (2010) PhD thesis – Chaibub Neto et al. (2012) Genetics (in review) Sysgen BiologicalSISG (c) 2012 Brian S Yandell17

Sysgen BiologicalSISG (c) 2012 Brian S Yandell18 BxH ApoE-/- study Ghazalpour et al. (2008) PLoS Genetics

Sysgen BiologicalSISG (c) 2012 Brian S Yandell19

QTL-driven directed graphs given genetic architecture (QTLs), what causal network structure is supported by data? R/qdg available at references – Chaibub Neto, Ferrara, Attie, Yandell (2008) Inferring causal phenotype networks from segregating populations. Genetics 179: [doi:genetics ] – Ferrara et al. Attie (2008) Genetic networks of liver metabolism revealed by integration of metabolic and transcriptomic profiling. PLoS Genet 4: e [doi: /journal.pgen ] SISG (c) 2012 Brian S Yandell20Sysgen Biological

partial correlation (PC) skeleton Sysgen BiologicalSISG (c) 2012 Brian S Yandell21 1 st order partial correlations true graph correlations drop edge

partial correlation (PC) skeleton Sysgen BiologicalSISG (c) 2012 Brian S Yandell22 true graph 1 st order partial correlations 2 nd order partial correlations drop edge

edge direction: which is causal? Sysgen BiologicalSISG (c) 2012 Brian S Yandell23 due to QTL

Sysgen BiologicalSISG (c) 2012 Brian S Yandell24 test edge direction using LOD score

Sysgen BiologicalSISG (c) 2012 Brian S Yandell25 true graph reverse edges using QTLs

causal graphical models in systems genetics What if genetic architecture and causal network are unknown? jointly infer both using iteration Chaibub Neto, Keller, Attie, Yandell (2010) Causal Graphical Models in Systems Genetics: a unified framework for joint inference of causal network and genetic architecture for correlated phenotypes. Ann Appl Statist 4: [doi: /09-AOAS288] R/qtlnet available from Related references – Schadt et al. Lusis (2005 Nat Genet); Li et al. Churchill (2006 Genetics); Chen Emmert-Streib Storey(2007 Genome Bio); Liu de la Fuente Hoeschele (2008 Genetics); Winrow et al. Turek (2009 PLoS ONE); Hageman et al. Churchill (2011 Genetics) SISG (c) 2012 Brian S Yandell26Sysgen Biological

Basic idea of QTLnet iterate between finding QTL and network genetic architecture given causal network – trait y depends on parents pa(y) in network – QTL for y found conditional on pa(y) Parents pa(y) are interacting covariates for QTL scan causal network given genetic architecture – build (adjust) causal network given QTL – each direction change may alter neighbor edges SISG (c) 2012 Brian S Yandell27Sysgen Biological

missing data method: MCMC known phenotypes Y, genotypes Q unknown graph G want to study Pr(Y | G, Q) break down in terms of individual edges – Pr(Y|G,Q) = sum of Pr(Y i | pa(Y i ), Q) sample new values for individual edges – given current value of all other edges repeat many times and average results Sysgen BiologicalSISG (c) 2012 Brian S Yandell28

MCMC steps for QTLnet propose new causal network G – with simple changes to current network: – change edge direction – add or drop edge find any new genetic architectures Q – update phenotypes when parents pa(y) change in new G compute likelihood for new network and QTL – Pr(Y | G, Q) accept or reject new network and QTL – usual Metropolis-Hastings idea Sysgen BiologicalSISG (c) 2012 Brian S Yandell29

BxH ApoE-/- causal network for transcription factor Pscdbp causal trait 30 work of Elias Chaibub Neto Sysgen BiologicalSISG (c) 2012 Brian S Yandell Data: Ghazalpour et al.(2006) PLoS Genetics

scaling up to larger networks reduce complexity of graphs – use prior knowledge to constrain valid edges – restrict number of causal edges into each node make task parallel: run on many machines – pre-compute conditional probabilities – run multiple parallel Markov chains rethink approach – LASSO, sparse PLS, other optimization methods SISG (c) 2012 Brian S Yandell31Sysgen Biological

graph complexity with node parents SISG (c) 2012 Brian S Yandell32 pa2pa1 node of2 of3 of1 pa1 node of2of1 pa3 of3 Sysgen Biological

parallel phases for larger projects SISG (c) 2012 Brian S Yandell b2.1 … m 4.1 … 5 Phase 1: identify parents Phase 2: compute BICs BIC = LOD – penalty all possible parents to all nodes Phase 3: store BICs Phase 4: run Markov chains Phase 5: combine results Sysgen Biological

parallel implementation R/qtlnet available at Condor cluster: chtc.cs.wisc.edu – System Of Automated Runs (SOAR) ~2000 cores in pool shared by many scientists automated run of new jobs placed in project SISG (c) 2012 Brian S Yandell34 Phase 4Phase 2 Sysgen Biological

SISG (c) 2012 Brian S Yandell35 single edge updates 100,000 runs burnin Sysgen Biological

neighborhood edge reversal SISG (c) 2012 Brian S Yandell36 Grzegorczyk M. and Husmeier D. (2008) Machine Learning 71 (2-3), orphan nodes reverse edge find new parents select edge drop edge identify parents Sysgen Biological

SISG (c) 2012 Brian S Yandell37 neighborhood for reversals only 100,000 runs burnin Sysgen Biological

how to use functional information? functional grouping from prior studies – may or may not indicate direction – gene ontology (GO), KEGG – knockout (KO) panels – protein-protein interaction (PPI) database – transcription factor (TF) database methods using only this information priors for QTL-driven causal networks – more weight to local (cis) QTLs? SISG (c) 2012 Brian S Yandell38Sysgen Biological

modeling biological knowledge infer graph G Y from biological knowledge B – Pr(G Y | B, W) = exp( – W * |B–G Y |) / constant – B = prob of edge given TF, PPI, KO database derived using previous experiments, papers, etc. – G Y = 0-1 matrix for graph with directed edges W = inferred weight of biological knowledge – W=0: no influence; W large: assumed correct – P(W|B) =  exp(-  W) exponential Werhli and Husmeier (2007) J Bioinfo Comput Biol SISG (c) 2012 Brian S Yandell39Sysgen Biological

combining eQTL and bio knowledge probability for graph G and bio-weights W – given phenotypes Y, genotypes Q, bio info B Pr(G, W | Y, Q, B) = c Pr(Y|G,Q)Pr(G|B,W,Q)Pr(W|B) – Pr(Y|G,Q) is genetic architecture (QTLs) using parent nodes of each trait as covariates – Pr(G|B,W,Q) = Pr(G Y |B,W) Pr(G Q  Y |Q) Pr(G Y |B,W) relates graph to biological info Pr(G Q  Y |Q) relates genotype to phenotype Moon JY, Chaibub Neto E, Deng X, Yandell BS (2011) Growing graphical models to infer causal phenotype networks. In Probabilistic Graphical Models Dedicated to Applications in Genetics. Sinoquet C, Mourad R, eds. (in review) SISG (c) 2012 Brian S Yandell40Sysgen Biological

encoding biological knowledge B transcription factors, DNA binding (causation) p = p-value for TF binding of i  j truncated exponential ( ) when TF i  j uniform if no detection relationship Bernard, Hartemink (2005) Pac Symp Biocomp Sysgen BiologicalSISG (c) 2012 Brian S Yandell41

encoding biological knowledge B protein-protein interaction (association) post odds = prior odds * LR use positive and negative gold standards Jansen et al. (2003) Science Sysgen BiologicalSISG (c) 2012 Brian S Yandell42

encoding biological knowledge B gene ontology(association) GO = molecular function, processes of gene sim = maximum information content across common parents of pair of genes Lord et al. (2003) Bioinformatics Sysgen BiologicalSISG (c) 2012 Brian S Yandell43

MCMC with pathway information sample new network G from proposal R(G*|G) – add or drop edges; switch causal direction sample QTLs Q from proposal R(Q*|Q,Y) – e.g. Bayesian QTL mapping given pa(Y) accept new network (G*,Q*) with probability A = min(1, f(G,Q|G*,Q*)/ f(G*,Q*|G,Q)) – f(G,Q|G*,Q*) = Pr(Y|G*,Q*)Pr(G*|B,W,Q*)/R(G*|G)R(Q*|Q,Y) sample W from proposal R(W*|W) accept new weight W* with probability … Sysgen BiologicalSISG (c) 2012 Brian S Yandell44

ROC curve simulation open = QTLnet closed = phenotypes only Sysgen BiologicalSISG (c) 2012 Brian S Yandell45

integrated ROC curve 2x2: genetics pathways probability classifier ranks true > false edges Sysgen BiologicalSISG (c) 2012 Brian S Yandell46 = accuracy of B

weight on biological knowledge Sysgen BiologicalSISG (c) 2012 Brian S Yandell47 incorrectcorrectnon-informative

yeast data—partial success Sysgen BiologicalSISG (c) 2012 Brian S Yandell48 26 genes 36 inferred edges dashed: indirect (2) starred: direct (3) missed (39) reversed (0) Data: Brem, Kruglyak (2005) PNAS

phenotypic buffering of molecular QTL SysGen: OverviewSeattle SISG: Yandell © Fu et al. Jansen (2009 Nature Genetics)

limits of causal inference Computing costs already discussed Noisy data leads to false positive causal calls – Unfaithfulness assumption violated – Depends on sample size and omic technology – And on graph complexity (d = maximal path length i  j) – Profound limits Uhler C, Raskutti G, Buhlmann P, Yu B (2012 in prep) Geometry of faithfulness assumption in causal inference. Yang Li, Bruno M. Tesson, Gary A. Churchill, Ritsert C. Jansen (2010) Critical reasoning on causal inference in genome-wide linkage and association studies. Trends in Genetics 26: Sysgen BiologicalSISG (c) 2012 Brian S Yandell50

sizes for reliable causal inference genome wide linkage & association Sysgen BiologicalSISG (c) 2012 Brian S Yandell51 Li, Tesson, Churchill, Jansen (2010) Trends in Genetics

limits of causal inference unfaithful: false positive edges =min|cor(Y i,Y j )| = csqrt(dp/n) d=max degree p=# nodes n=sample size Sysgen BiologicalSISG (c) 2012 Brian S Yandell52 Uhler, Raskutti, Buhlmann, Yu (2012 in prep)

Thanks! Grant support – NIH/NIDDK 58037, – NIH/NIGMS 74244, – NCI/ICBP U54-CA – NIH/R01MH Collaborators on papers and ideas – Alan Attie & Mark Keller, Biochemistry – Karl Broman, Aimee Broman, Christina Kendziorski Sysgen BiologicalSISG (c) 2012 Brian S Yandell53