

Network Inference
Chris Holmes
Oxford Centre for Gene Function & Department of Statistics, University of Oxford

Overview
- Statistical inference
- Challenges of inferring network topology and the structure of local dependencies
- Use of “Integrative Genomics” to aid inference
- Conclusions

Inference
Inference is the process of “learning from data”. We have two objects to infer:
- Network structure (topology)
- Functional form of the dependencies within a given network structure

Probabilistic (Bayesian) Networks
- Graphical structure used to define interactions, which encode a set of conditional independencies
- A way of simplifying a joint distribution
- Have become extremely popular in genomics
  - R. Cowell et al., Springer (1999)
  - Friedman, http://www.cs.huji.ac.il/~nir/
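As a minimal sketch of how a network "simplifies a joint distribution", consider a hypothetical three-gene network in which regulator A controls targets B and C (A -> B, A -> C), with binary on/off states and made-up probabilities. The graph lets the joint distribution factorise as P(A) P(B | A) P(C | A); the code checks that this product sums to one, that B and C are conditionally independent given A, and counts free parameters (5 instead of 7 for an unrestricted table).

```python
from itertools import product

# Hypothetical toy network: regulator A controls targets B and C (A -> B, A -> C).
# All variables are binary (0 = off, 1 = on); the probabilities are made up.
p_a = {0: 0.7, 1: 0.3}                                    # P(A)
p_b_given_a = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.1, 1: 0.9}}  # P(B | A)
p_c_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # P(C | A)


def joint(a, b, c):
    """Joint probability via the DAG factorisation P(A) * P(B | A) * P(C | A)."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_a[a][c]


def check_b_indep_c_given(a):
    """Return (P(B=1, C=1 | A=a), P(B=1 | A=a) * P(C=1 | A=a)), both computed from the joint."""
    p_a_marg = sum(joint(a, b, c) for b in (0, 1) for c in (0, 1))
    p_bc = joint(a, 1, 1) / p_a_marg
    p_b = sum(joint(a, 1, c) for c in (0, 1)) / p_a_marg
    p_c = sum(joint(a, b, 1) for b in (0, 1)) / p_a_marg
    return p_bc, p_b * p_c


total = sum(joint(a, b, c) for a, b, c in product((0, 1), repeat=3))
n_params_full = 2 ** 3 - 1          # unrestricted joint table over 3 binary genes
n_params_factorised = 1 + 2 + 2     # P(A) + P(B | A) + P(C | A)

print(f"sum of joint over all 8 states: {total:.6f}")
for a in (0, 1):
    lhs, rhs = check_b_indep_c_given(a)
    print(f"A={a}: P(B=1,C=1|A)={lhs:.4f}, P(B=1|A)P(C=1|A)={rhs:.4f}")
print(f"free parameters: {n_params_full} (full) vs {n_params_factorised} (factorised)")
```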

Probabilistic Networks
Advantages:
- Coherent axiomatic framework
- Provides a calculus for integrating information from multiple sources that guards against logical inconsistencies
- Allows precise statements of uncertainty, both on the global network structure (topology) and on marginals
- Sequential experimental design: calculate the optimal follow-up experiments to learn the most about the network structure given the current state of knowledge

Probabilistic Networks
Disadvantages:
- Causal relationships are not explicitly handled
  - Dawid AP. Causal inference without counterfactuals (with discussion). J Am Statist Assoc (2000)
- Restrictions on valid structures
  - Hammersley-Clifford theorem; Rue & Held, Gaussian Markov Random Fields, Chapman & Hall (2005)

Network Inference
- A prior on network space leads to a posterior
- Computational frameworks for learning:
  - Markov chain Monte Carlo: Gilks et al., Markov Chain Monte Carlo in Practice, Chapman & Hall (1996)
  - Stochastic search
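The posterior over structures can be explored by Markov chain Monte Carlo. The sketch below is one simple structure sampler written under assumptions that do not come from the slides: linear-Gaussian dependencies, BIC as a stand-in for the log marginal likelihood, and a hypothetical edge-penalty prior log Pr(F) = -gamma * |E| + const. Each step picks a pair of genes and moves it to one of its other two states (no edge, i -> j, j -> i); this proposal is symmetric, so the Metropolis-Hastings acceptance ratio is just the posterior ratio, and cyclic proposals are rejected because they have zero prior mass. The data are simulated from a hypothetical chain X0 -> X1 -> X2, and all function names are illustrative.

```python
import numpy as np


def node_bic(X, j, parents):
    """BIC of column j under a linear-Gaussian regression on its parents."""
    n = X.shape[0]
    design = np.column_stack([np.ones(n)] + [X[:, k] for k in parents])
    coef, *_ = np.linalg.lstsq(design, X[:, j], rcond=None)
    sigma2 = np.sum((X[:, j] - design @ coef) ** 2) / n
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return loglik - 0.5 * (len(parents) + 2) * np.log(n)


def log_posterior(X, adj, gamma):
    """Unnormalised log Pr(F | y): BIC score plus a hypothetical edge-penalty prior -gamma * |E|."""
    score = sum(node_bic(X, j, np.flatnonzero(adj[:, j])) for j in range(adj.shape[0]))
    return score - gamma * adj.sum()


def is_acyclic(adj):
    """A directed graph is acyclic iff its adjacency matrix is nilpotent."""
    return not np.linalg.matrix_power(adj, adj.shape[0]).any()


def structure_mcmc(X, n_iter=5000, burn_in=500, gamma=1.0, seed=0):
    """Metropolis-Hastings over DAGs; returns the estimated Pr(i -> j | y) for every pair."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    adj = np.zeros((p, p), dtype=int)                  # adj[i, j] = 1 means i -> j; start empty
    current = log_posterior(X, adj, gamma)
    edge_freq = np.zeros((p, p))
    for it in range(n_iter):
        i, j = rng.choice(p, size=2, replace=False)
        state = adj[i, j] - adj[j, i]                  # 1: i -> j, -1: j -> i, 0: no edge
        new_state = rng.choice([s for s in (-1, 0, 1) if s != state])
        proposal = adj.copy()
        proposal[i, j] = int(new_state == 1)           # add, delete or reverse the edge
        proposal[j, i] = int(new_state == -1)
        if is_acyclic(proposal):                       # cyclic graphs get zero prior mass
            new = log_posterior(X, proposal, gamma)
            if np.log(rng.random()) < new - current:   # symmetric proposal: posterior ratio only
                adj, current = proposal, new
        if it >= burn_in:
            edge_freq += adj
    return edge_freq / (n_iter - burn_in)


# Hypothetical data simulated from a chain X0 -> X1 -> X2
rng = np.random.default_rng(1)
n = 200
x0 = rng.normal(size=n)
x1 = 0.8 * x0 + rng.normal(scale=0.5, size=n)
x2 = -0.7 * x1 + rng.normal(scale=0.5, size=n)
X = np.column_stack([x0, x1, x2])

edge_probs = structure_mcmc(X)
print(np.round(edge_probs, 2))   # entry [i, j] estimates Pr(i -> j | y)
```

Because this score gives Markov-equivalent graphs (such as X0 -> X1 -> X2 and X0 <- X1 <- X2) the same value, observational data alone do not fix edge directions here; edge_probs[i, j] + edge_probs[j, i] is the estimated posterior probability that genes i and j are directly connected.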

Hypothesis-Driven Networks
- Originally, networks were hypothesis driven:
  - well-defined small networks
  - experiments set up to test specific hypotheses
- Then came high-throughput genomic (disruptive) technologies:
  - network structure is treated as unknown
  - data mining (data dredging?)

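Bayesian Network Approach
The aim is to find the graph topology that maximises the likelihood of the data; in practice this is a penalised or marginal likelihood, because the raw maximised likelihood never decreases when an edge is added.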
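A minimal sketch of why a penalty is needed, assuming linear-Gaussian dependencies and using BIC (log-likelihood minus 0.5 * k * log n for k free parameters) as one common penalised score. It compares candidate parent sets for a single gene, with data simulated from a hypothetical chain X0 -> X1 -> X2; the function names are illustrative.

```python
import numpy as np


def node_loglik(X, j, parents):
    """Maximised Gaussian log-likelihood of column j regressed on its parent columns."""
    n = X.shape[0]
    design = np.column_stack([np.ones(n)] + [X[:, k] for k in parents])
    coef, *_ = np.linalg.lstsq(design, X[:, j], rcond=None)
    sigma2 = np.sum((X[:, j] - design @ coef) ** 2) / n   # ML estimate of the noise variance
    return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)


def node_bic(X, j, parents):
    """BIC: log-likelihood minus 0.5 * (number of free parameters) * log(n)."""
    n_free = len(parents) + 2                              # intercept, slopes, noise variance
    return node_loglik(X, j, parents) - 0.5 * n_free * np.log(X.shape[0])


# Hypothetical data simulated from a chain X0 -> X1 -> X2
rng = np.random.default_rng(0)
n = 200
x0 = rng.normal(size=n)
x1 = 0.8 * x0 + rng.normal(scale=0.5, size=n)
x2 = -0.7 * x1 + rng.normal(scale=0.5, size=n)
X = np.column_stack([x0, x1, x2])

for parents in ([], [1], [0, 1]):
    print(f"parents of X2 = {parents}: "
          f"loglik = {node_loglik(X, 2, parents):8.2f}, BIC = {node_bic(X, 2, parents):8.2f}")
```

The spurious extra parent X0 raises the log-likelihood slightly, while the log n penalty will usually make BIC prefer the true parent set [1].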

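Finding the Optimal Network Is a Hard Problem
The number of candidate graphs grows super-exponentially with the number of genes, so exhaustive search is infeasible and heuristics and greedy algorithms are needed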
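One way to see why exhaustive search fails is to count the candidates. The number of labelled DAGs on n nodes satisfies Robinson's recurrence, sketched below (the function name is illustrative). Enumeration is only practical for a handful of genes, which is why local search heuristics such as greedy hill-climbing (repeatedly applying the single edge addition, deletion or reversal that most improves the score) are used instead.

```python
from math import comb


def count_dags(n):
    """Number of labelled DAGs on n nodes, via Robinson's recurrence
    a(m) = sum_{k=1..m} (-1)^(k+1) * C(m, k) * 2^(k(m-k)) * a(m-k), with a(0) = 1."""
    a = [1]
    for m in range(1, n + 1):
        a.append(sum((-1) ** (k + 1) * comb(m, k) * 2 ** (k * (m - k)) * a[m - k]
                     for k in range(1, m + 1)))
    return a[n]


for genes in range(1, 11):
    print(f"{genes:2d} genes: {count_dags(genes):,} possible DAGs")
```

Already at six genes there are 3,781,503 candidate DAGs.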

Data Driven Networks
- Data are extremely sparse compared with the dimensionality of the network space
- Hence there is great uncertainty in any conclusions:
  - high numbers of false positives (false connections)
  - high numbers of false negatives (missing connections)
- This uncertainty is captured in a fully Bayesian model via the posterior distribution on network space, Pr(F | y)
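For a very small network, Pr(F | y) can be computed exactly by enumerating every DAG. The sketch below does this for three genes, assuming linear-Gaussian dependencies, BIC as an approximation to log Pr(y | F) and a uniform prior over DAGs, with a deliberately small hypothetical sample (n = 15) from the chain X0 -> X1 -> X2. Summing the posterior over graphs gives the marginal probability that each pair of genes is directly connected; function names are illustrative.

```python
from itertools import product

import numpy as np


def node_bic(X, j, parents):
    """BIC of column j under a linear-Gaussian regression on its parents."""
    n = X.shape[0]
    design = np.column_stack([np.ones(n)] + [X[:, k] for k in parents])
    coef, *_ = np.linalg.lstsq(design, X[:, j], rcond=None)
    sigma2 = np.sum((X[:, j] - design @ coef) ** 2) / n
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return loglik - 0.5 * (len(parents) + 2) * np.log(n)


def all_dags(p):
    """Enumerate every DAG on p nodes (adj[i, j] = 1 means i -> j)."""
    pairs = [(i, j) for i in range(p) for j in range(p) if i != j]
    dags = []
    for bits in product((0, 1), repeat=len(pairs)):
        adj = np.zeros((p, p), dtype=int)
        for (i, j), b in zip(pairs, bits):
            adj[i, j] = b
        if not np.linalg.matrix_power(adj, p).any():   # acyclic iff the matrix is nilpotent
            dags.append(adj)
    return dags


def posterior_over_dags(X):
    """Pr(F | y) over all DAGs, with BIC approximating log Pr(y | F) and a uniform prior."""
    p = X.shape[1]
    dags = all_dags(p)
    log_w = np.array([sum(node_bic(X, j, np.flatnonzero(adj[:, j])) for j in range(p))
                      for adj in dags])
    w = np.exp(log_w - log_w.max())                    # subtract the max for numerical stability
    return dags, w / w.sum()


# Hypothetical, deliberately small sample from a chain X0 -> X1 -> X2
rng = np.random.default_rng(2)
n = 15
x0 = rng.normal(size=n)
x1 = 0.6 * x0 + rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)
X = np.column_stack([x0, x1, x2])

dags, post = posterior_over_dags(X)
# Pr(genes i and j are directly connected, in either direction | data)
edge_marginals = sum(w * (adj + adj.T) for w, adj in zip(post, dags))
print(f"{len(dags)} DAGs; the top 3 carry {np.sort(post)[-3:].sum():.2f} of the posterior mass")
print(np.round(edge_marginals, 2))
```

With so few samples the posterior mass is usually spread over several graphs, so the edge marginals, rather than any single best graph, are the more honest summary; increasing n concentrates the mass.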

The Learned Network Structure

Data Driven Networks
A problem with data-mining approaches: often the “data goes in one end and the answer comes out the other end untouched by human thought” (adapted from Doug Altman)

Further complicating issues
- Dynamic networks: Imoto (2002); Beal et al., Bioinformatics (2005)
- Network dynamics: Luscombe et al., Nature (2004)
- Interventional analysis: Ideker et al., Science (2002)

Way Forward
- More refined prior structures
- Multiple information sources:
  - literature mining: Rajagopalan, Bioinformatics (2005)
  - comparative genomics: Amoutzias, EMBO (2004)
  - combining other genomic measurement platforms: Schadt et al., Nat. Genet. (2005); Zhu et al., Cytogenet Genome Res. (2004); Beer and Tavazoie, Cell (2004)

Improving Network Inference
Sources of information: perturbations, genetics, biological context, expression observations, regulatory signals, comparative genomics

Integrative Genomics
- Combine information from multiple sources to improve precision
- The underlying signal is shared across sources, while the noise (random variation) is independent across sources, so combining them averages the noise out
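A minimal illustration of why independent noise helps, assuming each source gives an unbiased measurement of the same quantity with independent Gaussian noise of known variance. The inverse-variance weighting used here is a standard textbook combination rule rather than a method from the slides, and the three sources and their noise levels are hypothetical.

```python
import numpy as np


def combine(estimates, variances):
    """Inverse-variance weighted average of independent, unbiased estimates.
    Works on one set of estimates (shape (k,)) or many replicates (shape (r, k)).
    Returns the combined estimate(s) and the combined variance 1 / sum(1 / variances)."""
    w = 1.0 / np.asarray(variances, dtype=float)
    return np.asarray(estimates, dtype=float) @ w / w.sum(), 1.0 / w.sum()


# Hypothetical set-up: three platforms measure the same underlying signal,
# each with independent Gaussian noise of known standard deviation.
true_signal = 2.0
noise_sd = np.array([1.0, 0.5, 2.0])
rng = np.random.default_rng(0)
draws = true_signal + rng.normal(size=(20000, 3)) * noise_sd   # one row per replicate

combined, combined_var = combine(draws, noise_sd ** 2)
print("per-source variance:", np.round(draws.var(axis=0), 3))
print(f"combined variance: {combined.var():.3f} (theory {combined_var:.3f})")
```

The combined variance (1/5.25, about 0.19) is smaller than that of even the most precise single source (0.25); if the noise were correlated across sources, this gain would shrink.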

[Figure: levels of biological information (germline DNA, somatic DNA, RNA, protein, physiology, environment) and the platforms that measure them (sequencing, SNPs, epigenetics & CGH, microarrays, proteomics, metabonomics)]

Schadt et al., Nat. Genet. (July 2005)

Transcription: cis and trans motifs
- Motif patterns built with AND, OR and NOT logic
- Combinatorial patterns help identify groups of transcripts predicted to show similar abundance profiles (Beer and Tavazoie, Cell 2004)
[Figure: solid lines, actual expression; dashed lines, predicted]
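The AND/OR/NOT patterns can be read as Boolean rules over motif presence in each promoter. The toy sketch below only evaluates hand-written rules on hypothetical motif sets (the motif and gene names are placeholders); it is not the procedure Beer and Tavazoie used to learn such rules from data.

```python
# Hypothetical motif content of five promoters (motif and gene names are placeholders)
promoters = {
    "geneA": {"M1", "M2"},
    "geneB": {"M1"},
    "geneC": {"M2", "M3"},
    "geneD": {"M1", "M2", "M3"},
    "geneE": set(),
}

# Hand-written combinatorial rules over motif presence
rules = {
    "M1 AND M2": lambda motifs: "M1" in motifs and "M2" in motifs,
    "M1 OR M3": lambda motifs: "M1" in motifs or "M3" in motifs,
    "M2 AND NOT M3": lambda motifs: "M2" in motifs and "M3" not in motifs,
}


def group_by_rule(promoters, rules):
    """Map each rule to the sorted list of genes whose promoters satisfy it."""
    return {name: sorted(gene for gene, motifs in promoters.items() if rule(motifs))
            for name, rule in rules.items()}


groups = group_by_rule(promoters, rules)
for name, genes in groups.items():
    print(f"{name:14s} -> {genes}")
```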

Conclusions
- There is a current move back towards more hypothesis-driven analysis of smaller networks
- Conditioning on well-characterised network structures and using multiple data sources to infer and explore local regions of the network topology

References Bayes nets: Friedman, http://www.cs.huji.ac.il/~nir/