Genetic Networks


Cellular Networks Most processes in the cell are controlled by “networks” of interacting molecules: Metabolic Networks Signal Transduction Networks Regulatory Networks

Unifying View The cell as a “state machine” Cell state S = (P1, P2, …, R1, R2, …, m1, m2, …), where P are proteins, R mRNA molecules, m metabolites. Each cell, at any given time, can be characterized by its state S. Dynamics: Input(t), S(t) => S(t+Δt)

What does it mean? Steady Cell State – cell type Neuron RBC muscle cell Tumor cell Dynamics – cellular process Differentiation Apoptosis Cell Cycle

Gene Regulation Networks Regulation of expression of genes is crucial Regulation occurs at many stages: pre-transcriptional (chromatin structure) transcription initiation RNA editing (splicing) and transport Translation initiation Post-translation modification RNA & Protein degradation Understanding regulatory processes is a central problem of biological research

Genetic Network Models: Goals Incorporate rule-based dependencies between genes: rule-based dependencies may constitute important biological information. Allow systematic study of global network dynamics, in particular individual gene effects on long-run network behavior. Must be able to cope with uncertainty: small sample size, noisy measurements, biological “noise”. Quantify the relative influence and sensitivity of genes in their interactions with other genes: this allows us to focus on individual genes or groups of genes. What model should we use?

Level of Biochemical Detail Detailed models require lots of data! Highly detailed biochemical models are only feasible for very small systems which are extensively studied Example: Arkin et al. (1998), Genetics 149(4):1633-48 lysis-lysogeny switch in Lambda phage: 5 genes, 67 parameters based on 50 years of research stochastic simulation required supercomputer!

Example: Lysis-Lysogeny Arkin et al. (1998), Genetics 149(4):1633-48

Level of Biochemical Detail In-depth biochemical simulation of e.g. a whole cell is infeasible (so far) Less detailed network models are useful when data is scarce and/or network structure is unknown Once network structure has been determined, we can refine the model

Boolean or Continuous? Boolean networks (Kauffman (1993), The Origins of Order) assume ON/OFF gene states. They allow analysis at the network level and provide useful insights into network dynamics. Algorithms exist for network inference from binary data. [Figure: genes A and B regulating gene C, with C = A AND B]

Boolean Formalism: Cons Boolean abstraction is a poor fit to real data. Cannot model important concepts: amplification of a signal; subtraction and addition of signals; compensating for a smoothly varying environmental parameter (e.g. temperature, nutrients); varying dynamical behavior (e.g. cell cycle period). Feedback control: negative feedback is used to stabilize expression → causes oscillation in the Boolean model
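The oscillation point can be illustrated with a toy example. The sketch below is not from the slides: it models a single self-repressing gene once as a Boolean rule and once as a continuous rate equation. The rate law 1/(1 + x) - x, the step size and the function names are assumptions chosen only for the demonstration.

```python
def boolean_negative_feedback(x0, steps):
    """Self-repressing gene as a Boolean rule: x(t+1) = NOT x(t)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(1 - xs[-1])
    return xs


def continuous_negative_feedback(x0, steps, dt=0.1):
    """Euler integration of dx/dt = 1/(1 + x) - x (hypothetical rate law)."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x + dt * (1.0 / (1.0 + x) - x))
    return xs


boolean_trace = boolean_negative_feedback(1, 6)
continuous_trace = continuous_negative_feedback(2.0, 200)

print(boolean_trace)                    # alternates between 1 and 0
print(round(continuous_trace[-1], 3))   # settles near a steady level
```

In this toy setting the Boolean gene flips forever, while the continuous version relaxes to a stable expression level, which is the stabilizing behavior the Boolean abstraction cannot represent.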

Boolean Formalism: Pros Studies give rise to qualitative phenomena, as observed by experimentalists. Some studied systems exhibit multiple steady states and “switchlike” transitions between them. It is experimentally shown that such systems are “robust” to exact values of kinetic parameters of individual reactions.

Concentrations or Molecules? Use of concentrations assumes individual molecules can be ignored Known examples (in prokaryotes) where stochastic fluctuations play an essential role (e.g. lysis-lysogeny in lambda) Requires stochastic simulation (Arkin et al. (1998), Genetics 149(4):1633-48), or modeling molecule counts (e.g. Petri nets, Goss and Peccoud (1998), PNAS 95(12):6750-5) Significantly increases model complexity

Concentrations or Molecules? Eukaryotes: larger cell volume, typically longer half-lives. Few known stochastic effects. Yeast: 80% of the transcriptome is expressed at 0.1-2 mRNA copies/cell Holstege, et al.(1998), Cell 95:717-728. Human: 95% of transcriptome is expressed at <5 copies/cell Velculescu et al.(1997), Cell 88:243-251

Spatial or Non-Spatial Spatiality introduces additional complexity: intercellular interactions spatial differentiation cell compartments cell types Spatial patterns also provide more data e.g. stripe formation in Drosophila: Mjolsness et al. (1991), J. Theor. Biol. 152: 429-454. Few (no?) large-scale spatial gene expression data sets available so far.

Example: Drosophila Segmentation [Figure: eve (even-skipped) expression along the anterior-posterior axis of the embryo; eve stripe 2 shown against the high-to-low expression levels of the transcription factors hb, gt, Kr and bcd]

Deterministic or Stochastic? Many sources of stochasticity: biological stochasticity, experimental noise. Stochastic models can account for those. Deterministic models are usually simpler to analyze (dynamics, steady states) and interpret

Modeling Approaches Boolean Networks Linear Models Bayesian Networks

Boolean Network

What is a Boolean Network? A Boolean network is a kind of graph G(V, F): V is a set of nodes (genes), F is a list of Boolean functions. Every node has only two values: ON (1) and OFF (0). Each function in F gives the next value of one node from the current values of its input nodes. Representation: standard, wiring, automaton
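A minimal sketch of this definition, assuming synchronous updates (all nodes updated at once from the same previous state). The three rules in `F` are hypothetical and only serve to make the example runnable; `step` and `trajectory` are illustrative names.

```python
from typing import Callable, List, Tuple

State = Tuple[int, ...]

# F: one Boolean function per node. Each maps the current state of the
# network to that node's next value. These rules are hypothetical.
F: List[Callable[[State], int]] = [
    lambda s: s[1] & s[2],   # x1(t+1) = x2(t) AND x3(t)
    lambda s: s[0] | s[2],   # x2(t+1) = x1(t) OR x3(t)
    lambda s: 1 - s[0],      # x3(t+1) = NOT x1(t)
]


def step(state: State) -> State:
    """Synchronous update: every node is computed from the same old state."""
    return tuple(f(state) for f in F)


def trajectory(start: State, n_steps: int) -> List[State]:
    """Follow the network for n_steps updates, returning all visited states."""
    states = [start]
    for _ in range(n_steps):
        states.append(step(states[-1]))
    return states


traj = trajectory((1, 1, 1), 6)
for t, s in enumerate(traj):
    print(t, s)
```

With these particular rules the trajectory started at 111 returns to 111 after five updates, so it lies on a limit cycle.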

What is a Boolean Network? Attractor : Certain states revisited infinitely often depending on the initial starting state. Basin of attraction Limit-cycle attractor

Boolean Network Example [Figure: wiring diagram G’(V’,F’) of the nodes (genes) x1, x2, x3 from time t to time t+1, with activating and inactivating edges; trajectory example table of X1, X2, X3 over iterations 1 to 6]

Boolean Network Example [Figure: state transition diagram over the eight states 000 to 111 of the nodes (genes) x1, x2, x3, with a start state and two trajectories marked; table of X1, X2, X3 over iterations 1 to 6]

Basic Structure of Boolean Networks Each node is a gene: 1 means active/expressed, 0 means inactive/unexpressed.
Boolean function (truth table):
A B X
0 0 1
0 1 1
1 0 0
1 1 1
In this example, two genes (A and B) regulate gene X; the table corresponds to X = (NOT A) OR B. In principle, any number of “input” genes is possible. Positive/negative feedback is also common (and necessary for homeostasis).

Dynamics of Boolean Networks [Figure: genes A to F with their binary values changing over time] At a given time point, all the genes form a genome-wide gene activity pattern (GAP) (binary string of length n). Consider the state space formed by all possible GAPs.

State Space of Boolean Networks Similar GAPs lie close together. There is an inherent directionality in the state space. Some states are attractors (or limit-cycle attractors). The system may alternate between several attractors. Other states are transient. Picture generated using the program DDLab.
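For a small network the whole state space can be enumerated, which makes attractors and their basins explicit. The sketch below assumes synchronous updates and uses a hypothetical three-gene network (two mutually repressing genes plus a gene that copies the first); it is an illustration, not the network shown on the slides.

```python
from itertools import product


def step(state):
    """Hypothetical rules: x1' = NOT x2, x2' = NOT x1, x3' = x1."""
    x1, x2, x3 = state
    return (1 - x2, 1 - x1, x1)


def find_attractor(start):
    """Iterate until a state repeats; return the cycle in canonical order."""
    seen = []
    state = start
    while state not in seen:
        seen.append(state)
        state = step(state)
    cycle = seen[seen.index(state):]
    k = cycle.index(min(cycle))          # rotate so the smallest state is first
    return tuple(cycle[k:] + cycle[:k])


# Map each attractor to the list of states that flow into it (its basin).
basins = {}
for s in product((0, 1), repeat=3):
    basins.setdefault(find_attractor(s), []).append(s)

for attractor, basin in basins.items():
    kind = "fixed point" if len(attractor) == 1 else "limit cycle"
    print(kind, attractor, "basin:", basin)
```

This toy network has two fixed-point attractors and one limit cycle of length two; every state outside an attractor is transient and belongs to exactly one basin.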

Reverse Engineering Problem Can we infer the structure and rules of a genetic network from gene expression measurements?

Reverse Engineering Problem Input: Gene expression data Output: Network structure and parameters (or regulation rules)

Gene Expression Time Series Data Problem: how can these data be used to infer how these three genes influence each other?

Modelling Gene Expression Data Assume that genes exist in two states: on and off. If the expression of gene i is above level $t_i$, consider it on; otherwise, consider it off.

Modelling Gene Expression Data [Figure: expression time series with the threshold t3 marked and each time point labelled on or off]
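The thresholding step can be sketched in a few lines. The expression values and the per-gene thresholds below are hypothetical, chosen only to show how continuous measurements become bit streams.

```python
def discretize(series, threshold):
    """Return 1 where expression exceeds the threshold, otherwise 0."""
    return [1 if value > threshold else 0 for value in series]


# Hypothetical expression levels and per-gene thresholds t_i.
expression = {
    "gene 1": [0.2, 0.4, 1.3, 1.8, 1.1, 0.3],
    "gene 2": [2.5, 2.1, 0.7, 0.4, 0.9, 2.2],
    "gene 3": [0.1, 0.1, 0.2, 0.9, 1.4, 1.5],
}
thresholds = {"gene 1": 1.0, "gene 2": 1.0, "gene 3": 0.5}

bit_streams = {
    gene: discretize(values, thresholds[gene])
    for gene, values in expression.items()
}

for gene, bits in bit_streams.items():
    print(gene, bits)
```

The result depends entirely on the chosen thresholds, so any downstream inference inherits that choice.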

Modelling Gene Expression Data We obtain the following discretized gene expression data: [Table: binary values of gene 1, gene 2 and gene 3 at times 5, 10, 15, …, 55] The gene expression data is now in the form of bit streams.

Information Theoretic Tools We define some necessary information theoretic tools: Shannon entropy of a data stream, $H(X) = -\sum_i p_i \log p_i$, where $p_i$ is the probability that a random element of data stream X is i (the base of the logarithm can be anything, but must be consistent throughout; usually we use base 2)

Information Theoretic Tools e.g. Shannon entropy of data streams X and Y
X = [0, 1, 1, 1, 1, 1, 1, 0, 0, 0]
Y = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1]
$H(X) = -\sum_i p_i \log_2 p_i = -(p_{X=0} \log_2 p_{X=0} + p_{X=1} \log_2 p_{X=1}) = -(0.4 \log_2 0.4 + 0.6 \log_2 0.6) = 0.971$
$H(Y) = -\sum_i p_i \log_2 p_i = -(0.5 \log_2 0.5 + 0.5 \log_2 0.5) = 1.0$

Information Theoretic Tools e.g. Shannon joint entropy of data streams X and Y
X = [0, 1, 1, 1, 1, 1, 1, 0, 0, 0]
Y = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1]
$H(X, Y) = -\sum_i p_i \log_2 p_i = -(p_{X=0,Y=0} \log_2 p_{X=0,Y=0} + p_{X=1,Y=0} \log_2 p_{X=1,Y=0} + p_{X=0,Y=1} \log_2 p_{X=0,Y=1} + p_{X=1,Y=1} \log_2 p_{X=1,Y=1}) = -(0.1 \log_2 0.1 + 0.4 \log_2 0.4 + 0.3 \log_2 0.3 + 0.2 \log_2 0.2) = 1.85$
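The worked values for the streams X and Y can be checked numerically. The sketch below estimates probabilities by counting symbols; the function name `entropy` and the choice to pass several streams for the joint case are illustrative assumptions.

```python
from collections import Counter
from math import log2


def entropy(*streams):
    """Shannon (joint) entropy in bits of one or more equal-length streams."""
    symbols = list(zip(*streams))        # one tuple per time point
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * log2(c / n) for c in counts.values())


X = [0, 1, 1, 1, 1, 1, 1, 0, 0, 0]
Y = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1]

print(round(entropy(X), 3))      # 0.971
print(round(entropy(Y), 3))      # 1.0
print(round(entropy(X, Y), 2))   # 1.85
```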

Information Theoretic Tools Define: Conditional Entropy $H(X|Y) = H(X, Y) - H(Y)$ and $H(Y|X) = H(X, Y) - H(X)$. Mutual Information $M(X, Y) = H(Y) - H(Y|X) = H(X) - H(X|Y) = H(X) + H(Y) - H(X, Y)$
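As a sanity check on these identities, the conditional entropy can also be computed directly as the average entropy of X within each group of equal Y, and compared with H(X, Y) - H(Y). This is a sketch on the example streams X and Y; the helper names are assumptions.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(*streams):
    """Shannon (joint) entropy in bits of one or more equal-length streams."""
    symbols = list(zip(*streams))
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


def conditional_entropy_direct(x, y):
    """H(X|Y) as the sum over y of p(y) * H(X | Y = y)."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[yi].append(xi)
    n = len(x)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def mutual_information(x, y):
    """M(X, Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


X = [0, 1, 1, 1, 1, 1, 1, 0, 0, 0]
Y = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1]

h_x_given_y = conditional_entropy_direct(X, Y)
print(round(h_x_given_y, 4), round(entropy(X, Y) - entropy(Y), 4))
print(round(mutual_information(X, Y), 4))
```

Both routes to H(X|Y) agree, and the mutual information is small here because X and Y are only weakly related.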

Information Theoretic Tools It is easy to show that: Let X be an input data stream and Y be an output data stream. If $M(Y, X) = H(Y)$ then X exactly determines Y. Look for pairs (X, Y) where $M(Y_{t+1}, X_t) = H(Y_{t+1})$

Identification of the Network Graph Back to the data: [Table: binary values of gene A, gene B and gene C at times 1 to 6] step 1: put data in “state transition table” form

Identification of the Network Graph State transition table: [Table: input stream values $A_{i-1}$, $B_{i-1}$, $C_{i-1}$ against output stream values $A_i$, $B_i$, $C_i$] step 1: put data in “state transition table” form

Identification of the Network Graph state transition table tells us how to get from state i – 1 to state i as a lookup table however, it is difficult to discern functional relationships, so… step 2: use information theoretic tools to discover which inputs determine the outputs

Identification of the Network Graph step 2a: calculate entropies. Note: $\lim_{x \to 0^+} x^x = 1$, therefore in the limit $x \to 0^+$ we take $0 \log 0 = 0$.
$H(A_i) = -(0.25 \log 0.25 + 0.75 \log 0.75) = 0.81$
$H(B_i) = -(0.75 \log 0.75 + 0.25 \log 0.25) = 0.81$
$H(C_i) = -(0.5 \log 0.5 + 0.5 \log 0.5) = 1$
$H(A_{i-1}) = H(B_{i-1}) = H(C_{i-1}) = -(0.5 \log 0.5 + 0.5 \log 0.5) = 1$
$H(A_{i-1}, C_{i-1}) = -(0.25 \log 0.25 + 0.25 \log 0.25 + 0.25 \log 0.25 + 0.25 \log 0.25) = 2$

Identification of the Network Graph step 2a: calculate entropies
$H(A_i, A_{i-1}, C_{i-1}) = -(0.25 \log 0.25 + 0.25 \log 0.25 + 0.25 \log 0.25 + 0.25 \log 0.25) = 2$
$H(B_i, A_{i-1}, C_{i-1}) = -(0.25 \log 0.25 + 0.25 \log 0.25 + 0.25 \log 0.25 + 0.25 \log 0.25) = 2$
$H(C_i, A_{i-1}) = -(0.5 \log 0.5 + 0.5 \log 0.5) = 1$

Identification of the Network Graph step 2b: calculate mutual information
$M(A_i, [A_{i-1}, C_{i-1}]) = H(A_i) + H(A_{i-1}, C_{i-1}) - H(A_i, A_{i-1}, C_{i-1}) = 0.81 + 2 - 2 = 0.81 = H(A_i)$, therefore $A_{i-1}$ and $C_{i-1}$ determine $A_i$
$M(B_i, [A_{i-1}, C_{i-1}]) = H(B_i) + H(A_{i-1}, C_{i-1}) - H(B_i, A_{i-1}, C_{i-1}) = 0.81 + 2 - 2 = 0.81 = H(B_i)$, therefore $A_{i-1}$ and $C_{i-1}$ determine $B_i$
$M(C_i, A_{i-1}) = H(C_i) + H(A_{i-1}) - H(C_i, A_{i-1}) = 1 + 1 - 1 = 1 = H(C_i)$, therefore $A_{i-1}$ determines $C_i$
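Steps 2a and 2b can be sketched as a search for the smallest set of inputs whose mutual information with an output equals that output's entropy. The state transition table below is an assumption: it lists each of the eight input states once and fills in the outputs with the rules A = A OR C, B = A AND C, C = NOT A that the slides arrive at, which reproduces the entropy values quoted above. The function names are illustrative.

```python
from collections import Counter
from itertools import combinations, product
from math import isclose, log2


def entropy(*streams):
    """Shannon (joint) entropy in bits of one or more equal-length streams."""
    symbols = list(zip(*streams))
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


# Assumed state transition table: all 8 input states, one row each.
inputs = list(product((0, 1), repeat=3))
prev = {name: [row[k] for row in inputs] for k, name in enumerate("ABC")}
nxt = {
    "A": [a | c for a, b, c in inputs],
    "B": [a & c for a, b, c in inputs],
    "C": [1 - a for a, b, c in inputs],
}


def mutual_information(out, ins):
    """M(out, ins) = H(out) + H(ins) - H(out, ins)."""
    return entropy(out) + entropy(*ins) - entropy(out, *ins)


def smallest_determining_set(out):
    """Try input sets of growing size until M(out, ins) equals H(out)."""
    for k in (1, 2, 3):
        for names in combinations("ABC", k):
            ins = [prev[n] for n in names]
            if isclose(mutual_information(out, ins), entropy(out), abs_tol=1e-9):
                return names
    return None


regulators = {gene: smallest_determining_set(nxt[gene]) for gene in "ABC"}
print(regulators)
```

Searching small input sets first keeps the inferred wiring sparse, but the number of candidate sets grows quickly with the number of genes and the allowed number of inputs.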

Identification of the Boolean Circuits step 3: determine functional relationship between variables (this is simply the truth table)
$A_{i-1}$ $C_{i-1}$ $A_i$
0 0 0
0 1 1
1 0 1
1 1 1
$A_i = A_{i-1}$ OR $C_{i-1}$

Identification of the Boolean Circuits step 3: determine functional relationship between variables
$A_{i-1}$ $C_{i-1}$ $B_i$
0 0 0
0 1 0
1 0 0
1 1 1
$B_i = A_{i-1}$ AND $C_{i-1}$

Identification of the Boolean Circuits step 3: determine functional relationship between variables
$A_{i-1}$ $C_i$
0 1
1 0
$C_i$ = NOT $A_{i-1}$
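Once the regulators of a gene are known, step 3 amounts to reading a lookup table off the observed transitions. The sketch below assumes a list of (inputs, output) observations and a small, hypothetical catalogue of named two-input functions to match against; the observations listed are consistent with the OR and AND rules above.

```python
def truth_table(observations):
    """Map each observed input tuple to its output; fail on contradictions."""
    table = {}
    for inputs, output in observations:
        if table.setdefault(inputs, output) != output:
            raise ValueError(f"inconsistent output for inputs {inputs}")
    return table


# Hypothetical catalogue of two-input Boolean functions.
NAMED_FUNCTIONS = {
    "AND": lambda a, c: a & c,
    "OR": lambda a, c: a | c,
    "XOR": lambda a, c: a ^ c,
}


def name_function(table):
    """Return the first catalogue entry that agrees with every table row."""
    for name, f in NAMED_FUNCTIONS.items():
        if all(f(*inputs) == output for inputs, output in table.items()):
            return name
    return None


# Observed ((A_prev, C_prev), output) pairs; repeated rows are allowed.
obs_a = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1), ((0, 1), 1)]
obs_b = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

rule_a = name_function(truth_table(obs_a))
rule_b = name_function(truth_table(obs_b))
print(rule_a, rule_b)
```

With noisy data the same input can map to different outputs, in which case this sketch raises an error rather than choosing a rule, which is one way the noise problem discussed next shows up in practice.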

Problems With This Approach No theory exists for determining the discretization level $t_i$. The assumption that genes can be modeled as either ‘on’ or ‘off’ may be sufficient for some genes, but will certainly not be sufficient for all genes. Ignores noise of all kinds (experimental, biological).

Boolean networks are inherently deterministic Conceptually, the regularity of genetic function and interaction is not due to “hard-wired” logical rules, but rather to the intrinsic self-organizing stability of the dynamical system. Additionally, we may want to model an open system with inputs (stimuli) that affect the dynamics of the network. From an empirical viewpoint, the assumption of only one logical rule per gene may lead to incorrect conclusions when inferring these rules from gene expression measurements, as the latter are typically noisy and the number of samples is small relative to the number of parameters to be inferred.

Linear Models Basic model: weighted sum of inputs. Simple network representation. Only a first-order approximation. Parameters of the model: weight matrix containing N x N interaction weights. “Fitting” the model: find the parameters $w_{ji}$, $b_i$ such that the model best fits the available data. [Figure: network of genes g1 to g5 with weighted edges such as w12 and w23 and the self-loop w55]
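A minimal sketch of one update of such a model, assuming the discrete-time form $x_i(t+1) = \sum_j w_{ji} x_j(t) + b_i$. This form is an assumption chosen to match the parameter names $w_{ji}$ and $b_i$; the weights, biases and expression levels are hypothetical.

```python
from math import isclose


def linear_step(x, W, b):
    """One update: x_i(t+1) = sum_j W[j][i] * x_j(t) + b[i].

    W[j][i] is the (assumed) influence of gene j on gene i.
    """
    n = len(x)
    return [sum(W[j][i] * x[j] for j in range(n)) + b[i] for i in range(n)]


# Hypothetical 3-gene network.
W = [
    [0.5, 0.2, 0.0],    # influence of gene 1 on genes 1, 2, 3
    [0.0, 0.5, -0.3],   # influence of gene 2 on genes 1, 2, 3
    [0.1, 0.0, 0.5],    # influence of gene 3 on genes 1, 2, 3
]
b = [0.1, 0.0, 0.2]
x_t = [1.0, 2.0, 0.0]

x_next = linear_step(x_t, W, b)
print(x_next)
```

A positive weight acts as activation and a negative weight as repression; fitting means choosing W and b so that predicted and measured expression levels agree.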

Underdetermined problem! Assumes fully connected network: need at least as many data points (arrays, conditions) as variables (genes)! Underdetermined (underconstrained, ill-posed) model: we have many more parameters than data values to fit No single solution, rather infinite number of parameter settings that will all fit the data equally well
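The ill-posedness can be made concrete with a toy case: one target gene, three candidate regulators, and only two measurements. All numbers below are hypothetical. Every weight vector in the family fits the data exactly, yet the members disagree on a new condition.

```python
def predict(w, x):
    """Weighted sum of regulator levels."""
    return sum(wi * xi for wi, xi in zip(w, x))


# Hypothetical data: 2 measurements of 3 regulators and one target gene.
X = [[1, 0, 1],
     [0, 1, 1]]
y = [1, 2]

w_particular = [1, 2, 0]       # one exact fit
null_direction = [1, 1, -1]    # predict(null_direction, row) == 0 for every row


def family(t):
    """All of these weight vectors fit the two measurements exactly."""
    return [w + t * n for w, n in zip(w_particular, null_direction)]


x_new = [1, 1, 0]              # an unmeasured condition
for t in (0, 1, -2):
    w = family(t)
    residuals = [predict(w, row) - target for row, target in zip(X, y)]
    print(w, residuals, predict(w, x_new))
```

Because the data cannot distinguish between members of the family, extra constraints such as fewer variables, additional measurements or sparsity are needed to single out one model.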

Solution 1: reduce N Rather than trying to model all genes, we can reduce the dimensionality of the problem: Network of clusters: construct a linear model based on the cluster centroids rat CNS data (4 clusters): Wahde and Hertz (2000), Biosystems 55, 1-3:129-136. yeast cell cycle (15-18 clusters): Mjolsness et al.(2000), NIPS 12; van Someren et al.(2000) ISMB2000, 355-366. Network of Principal Components: linear model between “characteristic modes” of the data Holter et al.(2001), PNAS 98(4):1693-1698.

Solution 2: Take advantage of additional information: replicates, accuracy of measurements, smoothness of time series, … Most likely, the network will still be poorly constrained. → Need a method to identify and extract those parts of the model that are well-determined and robust

Danger of Overfitting The linear model assumes every gene is regulated by all other genes (i.e. full connectivity). This is the richest model of its kind. Danger of overfitting the training data, which results in poor prediction on new data. Far from reality: only a few regulators for each gene.