Bayesian causal phenotype network incorporating genetic variation and biological knowledge Brian S Yandell, Jee Young Moon University of Wisconsin-Madison.

Slides:



Advertisements
Similar presentations
SAMSI Program Beyond Bioinformatics: Statistical and Mathematical Challenges Topic: Data Integration Katerina Kechris, PhD Associate Professor.
Advertisements

CSE Fall. Summary Goal: infer models of transcriptional regulation with annotated molecular interaction graphs The attributes in the model.
Inferring Causal Phenotype Networks Elias Chaibub Neto & Brian S. Yandell UW-Madison June 2010 QTL 2: NetworksSeattle SISG: Yandell ©
Expression Modules Brian S. Yandell (with slides from Steve Horvath, UCLA, and Mark Keller, UW-Madison)
Computational Infrastructure for Systems Genetics Analysis Brian Yandell, UW-Madison high-throughput analysis of systems data enable biologists & analysts.
Teresa Przytycka NIH / NLM / NCBI RECOMB 2010 Bridging the genotype and phenotype.
Functional genomics and inferring regulatory pathways with gene expression data.
Graph, Search Algorithms Ka-Lok Ng Department of Bioinformatics Asia University.
Biological networks Construction and Analysis. Recap Gene regulatory networks –Transcription Factors: special proteins that function as “keys” to the.
6. Gene Regulatory Networks
Computational Infrastructure for Systems Genetics Analysis Brian Yandell, UW-Madison high-throughput analysis of systems data enable biologists & analysts.
Introduction to molecular networks Sushmita Roy BMI/CS 576 Nov 6 th, 2014.
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Graph Regularized Dual Lasso for Robust eQTL Mapping Wei Cheng 1 Xiang Zhang 2 Zhishan Guo 1 Yu Shi 3 Wei.
Epistasis Analysis Using Microarrays Chris Workman.
Inferring subnetworks from perturbed expression profiles Dana Pe’er, Aviv Regev, Gal Elidan and Nir Friedman Bioinformatics, Vol.17 Suppl
Systematic Analysis of Interactome: A New Trend in Bioinformatics KOCSEA Technical Symposium 2010 Young-Rae Cho, Ph.D. Assistant Professor Department of.
Bayesian integration of biological prior knowledge into the reconstruction of gene regulatory networks Dirk Husmeier Adriano V. Werhli.
Statistical Bioinformatics QTL mapping Analysis of DNA sequence alignments Postgenomic data integration Systems biology.
Characterizing the role of miRNAs within gene regulatory networks using integrative genomics techniques Min Wenwen
Cis-regulation Trans-regulation 5 Objective: pathway reconstruction.
Genetic Regulatory Network Inference Russell Schwartz Department of Biological Sciences Carnegie Mellon University.
Quantile-based Permutation Thresholds for QTL Hotspots Brian S Yandell and Elias Chaibub Neto 17 March © YandellMSRC5.
Professor Brian S. Yandell joint faculty appointment across colleges: –50% Horticulture (CALS) –50% Statistics (Letters & Sciences) Biometry Program –MS.
Microarrays to Functional Genomics: Generation of Transcriptional Networks from Microarray experiments Joshua Stender December 3, 2002 Department of Biochemistry.
Monsanto: Yandell © Building Bridges from Breeding to Biometry and Biostatistics Brian S. Yandell Professor of Horticulture & Statistics Chair of.
Reverse engineering gene regulatory networks Dirk Husmeier Adriano Werhli Marco Grzegorczyk.
Learning regulatory networks from postgenomic data and prior knowledge Dirk Husmeier 1) Biomathematics & Statistics Scotland 2) Centre for Systems Biology.
Breaking Up is Hard to Do: Migrating R Project to CHTC Brian S. Yandell UW-Madison 22 November 2013.
Using Bayesian Networks to Analyze Whole-Genome Expression Data Nir Friedman Iftach Nachman Dana Pe’er Institute of Computer Science, The Hebrew University.
Computational Infrastructure for Systems Genetics Analysis Brian Yandell, UW-Madison high-throughput analysis of systems data enable biologists & analysts.
Learning With Bayesian Networks Markus Kalisch ETH Zürich.
Problem Limited number of experimental replications. Postgenomic data intrinsically noisy. Poor network reconstruction.
Metabolic Network Inference from Multiple Types of Genomic Data Yoshihiro Yamanishi Centre de Bio-informatique, Ecole des Mines de Paris.
Learning Bayesian networks from postgenomic data with an improved structure MCMC sampling scheme Dirk Husmeier Marco Grzegorczyk 1) Biomathematics & Statistics.
While gene expression data is widely available describing mRNA levels in different cancer cells lines, the molecular regulatory mechanisms responsible.
Expression Modules Brian S. Yandell (with slides from Steve Horvath, UCLA, and Mark Keller, UW-Madison) Modules/Pathways1SISG (c) 2012 Brian S Yandell.
Introduction to biological molecular networks
BayesNCSU QTL II: Yandell © Bayesian Interval Mapping 1.what is Bayes? Bayes theorem? Bayesian QTL mapping Markov chain sampling18-25.
Yandell © June Bayesian analysis of microarray traits Arabidopsis Microarray Workshop Brian S. Yandell University of Wisconsin-Madison
Yandell © 2003JSM Dimension Reduction for Mapping mRNA Abundance as Quantitative Traits Brian S. Yandell University of Wisconsin-Madison
Reverse engineering of regulatory networks Dirk Husmeier & Adriano Werhli.
Causal Network Models for Correlated Quantitative Traits Brian S. Yandell UW-Madison October Jax SysGen: Yandell.
Nonlinear differential equation model for quantification of transcriptional regulation applied to microarray data of Saccharomyces cerevisiae Vu, T. T.,
1 Paper Outline Specific Aim Background & Significance Research Description Potential Pitfalls and Alternate Approaches Class Paper: 5-7 pages (with figures)
(c) M Gerstein '06, gerstein.info/talks 1 CS/CBB Data Mining Predicting Networks through Bayesian Integration #1 - Theory Mark Gerstein, Yale University.
Yandell © 2003Wisconsin Symposium III1 QTLs & Microarrays Overview Brian S. Yandell University of Wisconsin-Madison
Computational methods for inferring cellular networks II Stat 877 Apr 17 th, 2014 Sushmita Roy.
13 October 2004Statistics: Yandell © Inferring Genetic Architecture of Complex Biological Processes Brian S. Yandell 12, Christina Kendziorski 13,
Network applications Sushmita Roy BMI/CS 576 Dec 9 th, 2014.
Gene Mapping for Correlated Traits Brian S. Yandell University of Wisconsin-Madison Correlated TraitsUCLA Networks (c)
Causal Network Models for Correlated Quantitative Traits Brian S. Yandell UW-Madison October Jax SysGen: Yandell.
Inferring Regulatory Networks from Gene Expression Data BMI/CS 776 Mark Craven April 2002.
EQTLs.
Quantile-based Permutation Thresholds for QTL Hotspots
Quantile-based Permutation Thresholds for QTL Hotspots
Computational Infrastructure for Systems Genetics Analysis
Inferring Genetic Architecture of Complex Biological Processes BioPharmaceutical Technology Center Institute (BTCI) Brian S. Yandell University of Wisconsin-Madison.
Causal Network Models for Correlated Quantitative Traits
Inferring Genetic Architecture of Complex Biological Processes Brian S
4. modern high throughput biology
1 Department of Engineering, 2 Department of Mathematics,
Inferring Causal Phenotype Networks
Causal Network Models for Correlated Quantitative Traits
Causal Network Models for Correlated Quantitative Traits
1 Department of Engineering, 2 Department of Mathematics,
1 Department of Engineering, 2 Department of Mathematics,
Inferring Causal Phenotype Networks Driven by Expression Gene Mapping
Inferring Genetic Architecture of Complex Biological Processes Brian S
Network Inference Chris Holmes Oxford Centre for Gene Function, &,
Principle of Epistasis Analysis
Presentation transcript:

Bayesian causal phenotype network incorporating genetic variation and biological knowledge Brian S Yandell, Jee Young Moon University of Wisconsin-Madison Elias Chaibub Neto, Sage Bionetworks Xinwei Deng, VA Tech 8 May 20121Oslo, Norway (c) 2012 Brian S Yandell

Monsanto: Yandell © glucoseinsulin (courtesy AD Attie)‏ BTBR mouse is insulin resistant B6 is not make both obese… Alan Attie Biochemistry

bigger picture how do DNA, RNA, proteins, metabolites regulate each other? regulatory networks from microarray expression data – time series measurements or transcriptional perturbations – segregating population: genotype as driving perturbations goal: discover causal regulatory relationships among phenotypes use knowledge of regulatory relationships from databases – how can this improve causal network reconstruction? 8 May 20123Oslo, Norway (c) 2012 Brian S Yandell

BxH ApoE-/- chr 2: hotspot 48 May 2012Oslo, Norway (c) 2012 Brian S Yandell x% threshold on number of traits Data: Ghazalpour et al.(2006) PLoS Genetics DNA  local gene  distant genes

QTL-driven directed graphs given genetic architecture (QTLs), what causal network structure is supported by data? R/qdg available at references – Chaibub Neto, Ferrara, Attie, Yandell (2008) Inferring causal phenotype networks from segregating populations. Genetics 179: [doi:genetics ] – Ferrara et al. Attie (2008) Genetic networks of liver metabolism revealed by integration of metabolic and transcriptomic profiling. PLoS Genet 4: e [doi: /journal.pgen ] Oslo, Norway (c) 2012 Brian S Yandell58 May 2012

partial correlation (PC) skeleton 8 May 2012Oslo, Norway (c) 2012 Brian S Yandell6 1 st order partial correlations true graph correlations drop edge

partial correlation (PC) skeleton 8 May 2012Oslo, Norway (c) 2012 Brian S Yandell7 true graph 1 st order partial correlations 2 nd order partial correlations drop edge

edge direction: which is causal? 8 May 2012Oslo, Norway (c) 2012 Brian S Yandell8 due to QTL

8 May 2012Oslo, Norway (c) 2012 Brian S Yandell9 test edge direction using LOD score

8 May 2012Oslo, Norway (c) 2012 Brian S Yandell10 true graph reverse edges using QTLs

causal graphical models in systems genetics What if genetic architecture and causal network are unknown? jointly infer both using iteration Chaibub Neto, Keller, Attie, Yandell (2010) Causal Graphical Models in Systems Genetics: a unified framework for joint inference of causal network and genetic architecture for correlated phenotypes. Ann Appl Statist 4: [doi: /09-AOAS288] R/qtlnet available from Related references – Schadt et al. Lusis (2005 Nat Genet); Li et al. Churchill (2006 Genetics); Chen Emmert-Streib Storey(2007 Genome Bio); Liu de la Fuente Hoeschele (2008 Genetics); Winrow et al. Turek (2009 PLoS ONE); Hageman et al. Churchill (2011 Genetics) Oslo, Norway (c) 2012 Brian S Yandell118 May 2012

Basic idea of QTLnet iterate between finding QTL and network genetic architecture given causal network – trait y depends on parents pa(y) in network – QTL for y found conditional on pa(y) Parents pa(y) are interacting covariates for QTL scan causal network given genetic architecture – build (adjust) causal network given QTL – each direction change may alter neighbor edges Oslo, Norway (c) 2012 Brian S Yandell128 May 2012

missing data method: MCMC known phenotypes Y, genotypes Q unknown graph G want to study Pr(Y | G, Q) break down in terms of individual edges – Pr(Y|G,Q) = sum of Pr(Y i | pa(Y i ), Q) sample new values for individual edges – given current value of all other edges repeat many times and average results 8 May 2012Oslo, Norway (c) 2012 Brian S Yandell13

MCMC steps for QTLnet propose new causal network G – with simple changes to current network: – change edge direction – add or drop edge find any new genetic architectures Q – update phenotypes when parents pa(y) change in new G compute likelihood for new network and QTL – Pr(Y | G, Q) accept or reject new network and QTL – usual Metropolis-Hastings idea 8 May 2012Oslo, Norway (c) 2012 Brian S Yandell14

BxH ApoE-/- causal network for transcription factor Pscdbp causal trait 15 work of Elias Chaibub Neto 8 May 2012Oslo, Norway (c) 2012 Brian S Yandell Data: Ghazalpour et al.(2006) PLoS Genetics

scaling up to larger networks reduce complexity of graphs – use prior knowledge to constrain valid edges – restrict number of causal edges into each node make task parallel: run on many machines – pre-compute conditional probabilities – run multiple parallel Markov chains rethink approach – LASSO, sparse PLS, other optimization methods Oslo, Norway (c) 2012 Brian S Yandell168 May 2012

graph complexity with node parents Oslo, Norway (c) 2012 Brian S Yandell17 pa2pa1 node of2 of3 of1 pa1 node of2of1 pa3 of3 8 May 2012

parallel phases for larger projects Oslo, Norway (c) 2012 Brian S Yandell b2.1 … m 4.1 … 5 Phase 1: identify parents Phase 2: compute BICs BIC = LOD – penalty all possible parents to all nodes Phase 3: store BICs Phase 4: run Markov chains Phase 5: combine results 8 May 2012

parallel implementation R/qtlnet available at Condor cluster: chtc.cs.wisc.edu – System Of Automated Runs (SOAR) ~2000 cores in pool shared by many scientists automated run of new jobs placed in project Oslo, Norway (c) 2012 Brian S Yandell19 Phase 4Phase 2 8 May 2012

Oslo, Norway (c) 2012 Brian S Yandell20 single edge updates 100,000 runs burnin 8 May 2012

neighborhood edge reversal Oslo, Norway (c) 2012 Brian S Yandell21 Grzegorczyk M. and Husmeier D. (2008) Machine Learning 71 (2-3), orphan nodes reverse edge find new parents select edge drop edge identify parents 8 May 2012

Oslo, Norway (c) 2012 Brian S Yandell22 neighborhood for reversals only 100,000 runs burnin 8 May 2012

how to use functional information? functional grouping from prior studies – may or may not indicate direction – gene ontology (GO), KEGG – knockout (KO) panels – protein-protein interaction (PPI) database – transcription factor (TF) database methods using only this information priors for QTL-driven causal networks – more weight to local (cis) QTLs? Oslo, Norway (c) 2012 Brian S Yandell238 May 2012

modeling biological knowledge infer graph G Y from biological knowledge B – Pr(G Y | B, W) = exp( – W * |B–G Y |) / constant – B = prob of edge given TF, PPI, KO database derived using previous experiments, papers, etc. – G Y = 0-1 matrix for graph with directed edges W = inferred weight of biological knowledge – W=0: no influence; W large: assumed correct – P(W|B) =  exp(-  W) exponential Werhli and Husmeier (2007) J Bioinfo Comput Biol Oslo, Norway (c) 2012 Brian S Yandell248 May 2012

combining eQTL and bio knowledge probability for graph G and bio-weights W – given phenotypes Y, genotypes Q, bio info B Pr(G, W | Y, Q, B) = c Pr(Y|G,Q)Pr(G|B,W,Q)Pr(W|B) – Pr(Y|G,Q) is genetic architecture (QTLs) using parent nodes of each trait as covariates – Pr(G|B,W,Q) = Pr(G Y |B,W) Pr(G Q  Y |Q) Pr(G Y |B,W) relates graph to biological info Pr(G Q  Y |Q) relates genotype to phenotype Moon JY, Chaibub Neto E, Deng X, Yandell BS (2011) Growing graphical models to infer causal phenotype networks. In Probabilistic Graphical Models Dedicated to Applications in Genetics. Sinoquet C, Mourad R, eds. (in review) Oslo, Norway (c) 2012 Brian S Yandell258 May 2012

encoding biological knowledge B transcription factors, DNA binding (causation) p = p-value for TF binding of i  j truncated exponential ( ) when TF i  j uniform if no detection relationship Bernard, Hartemink (2005) Pac Symp Biocomp 8 May 2012Oslo, Norway (c) 2012 Brian S Yandell26

encoding biological knowledge B protein-protein interaction (association) post odds = prior odds * LR use positive and negative gold standards Jansen et al. (2003) Science 8 May 2012Oslo, Norway (c) 2012 Brian S Yandell27

encoding biological knowledge B gene ontology(association) GO = molecular function, processes of gene sim = maximum information content across common parents of pair of genes Lord et al. (2003) Bioinformatics 8 May 2012Oslo, Norway (c) 2012 Brian S Yandell28

MCMC with pathway information sample new network G from proposal R(G*|G) – add or drop edges; switch causal direction sample QTLs Q from proposal R(Q*|Q,Y) – e.g. Bayesian QTL mapping given pa(Y) accept new network (G*,Q*) with probability A = min(1, f(G,Q|G*,Q*)/ f(G*,Q*|G,Q)) – f(G,Q|G*,Q*) = Pr(Y|G*,Q*)Pr(G*|B,W,Q*)/R(G*|G)R(Q*|Q,Y) sample W from proposal R(W*|W) accept new weight W* with probability … 8 May 2012Oslo, Norway (c) 2012 Brian S Yandell29

ROC curve simulation open = QTLnet closed = phenotypes only 8 May 2012Oslo, Norway (c) 2012 Brian S Yandell30

integrated ROC curve 2x2: genetics pathways probability classifier ranks true > false edges 8 May 2012Oslo, Norway (c) 2012 Brian S Yandell31 = accuracy of B

weight on biological knowledge 8 May 2012Oslo, Norway (c) 2012 Brian S Yandell32 incorrectcorrectnon-informative

yeast data—partial success 8 May 2012Oslo, Norway (c) 2012 Brian S Yandell33 26 genes 36 inferred edges dashed: indirect (2) starred: direct (3) missed (39) reversed (0) Data: Brem, Kruglyak (2005) PNAS

limits of causal inference Computing costs already discussed Noisy data leads to false positive causal calls – Unfaithfulness assumption violated – Depends on sample size and omic technology – And on graph complexity (d = maximal path length i  j) – Profound limits Uhler C, Raskutti G, Buhlmann P, Yu B (2012 in prep) Geometry of faithfulness assumption in causal inference. Yang Li, Bruno M. Tesson, Gary A. Churchill, Ritsert C. Jansen (2010) Critical reasoning on causal inference in genome-wide linkage and association studies. Trends in Genetics 26: May 2012Oslo, Norway (c) 2012 Brian S Yandell34

sizes for reliable causal inference genome wide linkage & association 8 May 2012Oslo, Norway (c) 2012 Brian S Yandell35 Li, Tesson, Churchill, Jansen (2010) Trends in Genetics

limits of causal inference unfaithful: false positive edges =min|cor(Y i,Y j )| = csqrt(dp/n) d=max degree p=# nodes n=sample size 8 May 2012Oslo, Norway (c) 2012 Brian S Yandell36 Uhler, Raskutti, Buhlmann, Yu (2012 in prep)

Thanks! Grant support – NIH/NIDDK 58037, – NIH/NIGMS 74244, – NCI/ICBP U54-CA – NIH/R01MH Collaborators on papers and ideas – Alan Attie & Mark Keller, Biochemistry – Karl Broman, Aimee Broman, Christina Kendziorski 8 May 2012Oslo, Norway (c) 2012 Brian S Yandell37

BxH ApoE-/- chr 2: hotspot 388 May 2012Oslo, Norway (c) 2012 Brian S Yandell x% threshold on number of traits Data: Ghazalpour et al.(2006) PLoS Genetics DNA  local gene  distant genes

causal model selection choices in context of larger, unknown network focal trait target trait focal trait target trait focal trait target trait focal trait target trait causal reactive correlated uncorrelated 398 May 2012Oslo, Norway (c) 2012 Brian S Yandell

causal architecture references BIC: Schadt et al. (2005) Nature Genet CIT: Millstein et al. (2009) BMC Genet Aten et al. Horvath (2008) BMC Sys Bio CMST: Chaibub Neto et al. (2010) PhD thesis – Chaibub Neto et al. (2012) Genetics (in review) 8 May 2012Oslo, Norway (c) 2012 Brian S Yandell40

8 May 2012Oslo, Norway (c) 2012 Brian S Yandell41 BxH ApoE-/- study Ghazalpour et al. (2008) PLoS Genetics

8 May 2012Oslo, Norway (c) 2012 Brian S Yandell42