VL Netzwerke, WS 2007/08 Edda Klipp 1 Max Planck Institute Molecular Genetics Humboldt University Berlin Theoretical Biophysics Networks in Metabolism.


Networks in Metabolism and Signaling
Edda Klipp, Humboldt University Berlin
Lecture 4 / WS 2007/08: Boolean Networks

One Network, Different Models
[Figure: genes a, b, c, d and their proteins A, B, C, D, connected by transcription, translation, activation (+) and repression (-).]
Directed graph: V = {a, b, c, d}, E = {(a,c,+), (b,c,+), (c,b,-), (c,d,-), (d,b,+)}
Boolean network: a(t+1) = a(t); b(t+1) = (not c(t)) and d(t); c(t+1) = a(t) and b(t); d(t+1) = not c(t)
Bayesian network: p(x_a), p(x_b), p(x_c | x_a, x_b), p(x_d | x_c)
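The Boolean version of this network translates directly into code. The following is a minimal Python sketch. It assumes that states are tuples (a, b, c, d) of 0/1 values and that all genes are updated synchronously, as the rules above imply. The function name `step` is illustrative only.

```python
def step(state):
    """One synchronous update of the four-gene Boolean network (illustrative sketch)."""
    a, b, c, d = state
    return (
        a,            # a(t+1) = a(t)
        (1 - c) & d,  # b(t+1) = (not c(t)) and d(t)
        a & b,        # c(t+1) = a(t) and b(t)
        1 - c,        # d(t+1) = not c(t)
    )


if __name__ == "__main__":
    print(step((0, 0, 0, 0)))  # (0, 0, 0, 1)
    print(step((1, 1, 1, 1)))  # (1, 0, 1, 0)
```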

Simplification of Gene Expression Regulation
[Figure: gene → mRNA → protein; the protein acts as a transcription factor for a second gene (gene → mRNA → protein); simplified network of genes A, B, C, D, E, F, G.]

Boolean Network
A Boolean network is a directed graph G(V, E) characterized by
- the number of nodes ("genes"), N, and
- the number of inputs per node (regulatory interactions), k, i.e. the in-degree of the node.
[Figure: network of genes A to G with N = 7, k_A = 0, k_B = 1, k_C = 2, …]
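The in-degree k of each node can be read off an edge list. The network with genes A to G appears only as a figure, so its edges are not reproduced here. This short sketch uses the signed edge set E of the four-gene example instead. The names `EDGES`, `NODES` and `in_degrees` are illustrative.

```python
from collections import Counter

# Signed edge set E of the four-gene example: (source, target, sign).
EDGES = {("a", "c", "+"), ("b", "c", "+"), ("c", "b", "-"), ("c", "d", "-"), ("d", "b", "+")}
NODES = ["a", "b", "c", "d"]


def in_degrees(nodes, edges):
    """Number of regulatory inputs k of each node, i.e. edges pointing into it."""
    counts = Counter(target for _, target, _ in edges)
    return {node: counts.get(node, 0) for node in nodes}


if __name__ == "__main__":
    print("N =", len(NODES), in_degrees(NODES, EDGES))  # N = 4 {'a': 0, 'b': 2, 'c': 2, 'd': 1}
```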

Boolean Logic (George Boole)
Each gene can assume one of two states: expressed ("1") or not expressed ("0").
Background:
- There is not enough information for a more detailed description.
- More specific models increase complexity and computational effort.
- Continuous functions (e.g. the Hill function) are replaced by step functions.
Boolean models are discrete (in state and time) and deterministic.
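The following sketch illustrates the step-function replacement. It compares an activating Hill function (threshold K, Hill coefficient n) with a step at the same threshold. The parameter values are arbitrary assumptions. For large n the Hill curve approaches the step, which is the regime where a Boolean description is easiest to justify.

```python
def hill(x, K=1.0, n=1.0):
    """Activating Hill function x^n / (K^n + x^n) for x >= 0."""
    return x ** n / (K ** n + x ** n)


def step_function(x, K=1.0):
    """Boolean replacement: 'on' (1) above the threshold K, 'off' (0) otherwise."""
    return 1 if x > K else 0


if __name__ == "__main__":
    for n in (1, 4, 50):
        print(n, [round(hill(x, n=n), 3) for x in (0.5, 0.9, 1.1, 2.0)])
    print("step", [step_function(x) for x in (0.5, 0.9, 1.1, 2.0)])
```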

Boolean Network
Boolean networks always have a finite number of possible states, 2^N, and therefore a finite number of state transitions.
[Figure: network of genes A to G, N = 7, i.e. 2^7 = 128 states, with the theoretically possible state transitions.]

Dynamics of Boolean Networks
The dynamics are described by rules of the form: "if the input value(s) at time t is/are …, then the output value at time t+1 is …".
[Figure: A → B, with a table of A(t) and B(t+1).]

Boolean Models: Truth Functions (k = 1)
[Table: for input p, the four possible output columns, rules 0 to 3; among them p itself and "not p".]
Example: A → B with B(t+1) = not A(t), i.e. rule 2.

Dynamics of Boolean Networks with k = 1: Linear Chain
A → B → C → D, with A fixed (no input). Rules 0 and 3 are not considered, since their output does not depend on the input.
A(t) → B(t+1), B(t+1) → C(t+2), C(t+2) → D(t+3)
The system reaches a steady state after at most N - 1 time steps.
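The N - 1 bound can be checked exhaustively for the chain A → B → C → D. The sketch below assumes each of B, C, D uses either rule 1 (copy) or rule 2 (negation). It counts the updates needed until the state stops changing, over all initial states and rule choices. The helper names are hypothetical.

```python
from itertools import product

COPY, NOT = 1, 2  # k = 1 rules used on the slides: rule 1 = identity, rule 2 = negation


def apply_rule(rule, x):
    return x if rule == COPY else 1 - x


def chain_step(state, rules):
    """state = (A, B, C, D); A has no input and stays fixed, every other node reads its predecessor."""
    new = [state[0]]
    for i in range(1, len(state)):
        new.append(apply_rule(rules[i - 1], state[i - 1]))
    return tuple(new)


def steps_to_steady_state(state, rules):
    """Number of updates until the state no longer changes."""
    t = 0
    nxt = chain_step(state, rules)
    while nxt != state:
        state, nxt = nxt, chain_step(nxt, rules)
        t += 1
    return t


if __name__ == "__main__":
    worst = max(
        steps_to_steady_state(s, r)
        for s in product((0, 1), repeat=4)
        for r in product((COPY, NOT), repeat=3)
    )
    print(worst)  # 3, i.e. N - 1 for N = 4
```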

Dynamics of Boolean Networks with k = 1: Ring
Two nodes A and B, each the input of the other. Again, rules 0 and 3 are not considered, since their output does not depend on the input.
- A(t+1) = B(t), B(t+1) = A(t) (both rule 1): a fixed point or a cycle of length 2, depending on the initial conditions.
- A(t+1) = not B(t) (rule 2), B(t+1) = A(t) (rule 1): a cycle of length 4, independent of the initial conditions.
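Both ring variants can be simulated directly. This sketch records the time at which each state is first seen and reports the length of the cycle that is finally entered. The function names are illustrative.

```python
from itertools import product


def cycle_length(update, state):
    """Length of the cycle finally reached from `state` under the map `update`."""
    first_seen = {}
    t = 0
    while state not in first_seen:
        first_seen[state] = t
        state = update(state)
        t += 1
    return t - first_seen[state]


def copy_ring(s):
    """A(t+1) = B(t), B(t+1) = A(t): rule 1 for both nodes."""
    a, b = s
    return (b, a)


def not_ring(s):
    """A(t+1) = not B(t) (rule 2), B(t+1) = A(t) (rule 1)."""
    a, b = s
    return (1 - b, a)


if __name__ == "__main__":
    for s in product((0, 1), repeat=2):
        print(s, cycle_length(copy_ring, s), cycle_length(not_ring, s))
```

For the copy ring, the states 00 and 11 are fixed points and 01 ⇄ 10 is a 2-cycle. The ring with one negation visits all four states from every start.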

Attractor
The trajectory connects the successive states for increasing time. An attractor is a region of a dynamical system's state space that the system can enter but not leave, and which contains no smaller such region (a special trajectory).
- Fixpoint: a cycle of length 1
- Cycles of length L
Basin of attraction: the surrounding region of state space such that all trajectories starting in that region end up in the attractor.
Bifurcation: appearance of a border separating two basins of attraction.

Boolean Models: Truth Functions (k = 2)
[Table: inputs p = A(t), q = B(t) and the corresponding outputs.]
Example: A → C ← B with C(t+1) = (not A(t)) and B(t), i.e. rule 4.
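The slides do not spell out how rules are numbered. One numbering that reproduces both examples (rule 2 = "not p" for k = 1, rule 4 = "(not p) and q" for k = 2) works as follows. List the input rows in binary order, then read the output column top-down as a binary number, with the first row as the most significant bit. The sketch below decodes rule numbers under that assumption.

```python
from itertools import product


def rule_output(rule, inputs):
    """Output of Boolean rule number `rule` for a tuple of k >= 1 input bits.

    Assumed numbering: input rows in binary order 0...0, 0...1, ..., 1...1;
    the output column read top-down is the binary rule number
    (first row = most significant bit).
    """
    k = len(inputs)
    row = int("".join(str(bit) for bit in inputs), 2)
    return (rule >> (2 ** k - 1 - row)) & 1


def truth_table(rule, k):
    return {bits: rule_output(rule, bits) for bits in product((0, 1), repeat=k)}


if __name__ == "__main__":
    print(truth_table(2, 1))  # {(0,): 1, (1,): 0}
    print(truth_table(4, 2))  # {(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 0}
```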

Example Network
Three genes X, Y and Z with the rules
X(t+1) = X(t) and Y(t)
Y(t+1) = X(t) or Y(t)
Z(t+1) = X(t) or ((not Y(t)) and Z(t))
Current state XYZ → next state:
000 → 000, 001 → 001, 010 → 010, 011 → 010, 100 → 011, 101 → 011, 110 → 111, 111 → 111

Example Network X, Y, Z
- The number of accessible states is finite.
- Cyclic trajectories are possible.
- Not every state is necessarily reachable from every other state.
- The successor state is unique; the predecessor state is not.
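The last two properties can be checked for the X, Y, Z network by inverting its state-transition map. This sketch collects the predecessors of every state. States that never appear as a successor cannot be reached from any other state. The helper names are illustrative.

```python
from collections import defaultdict
from itertools import product


def xyz_step(state):
    """X(t+1) = X and Y, Y(t+1) = X or Y, Z(t+1) = X or ((not Y) and Z)."""
    x, y, z = state
    return (x & y, x | y, x | ((1 - y) & z))


def predecessor_map(update, n):
    """Map every state to the list of states whose successor it is."""
    preds = defaultdict(list)
    for s in product((0, 1), repeat=n):
        preds[update(s)].append(s)
    return preds


if __name__ == "__main__":
    preds = predecessor_map(xyz_step, 3)
    print(preds[(0, 1, 1)])  # [(1, 0, 0), (1, 0, 1)]: two predecessors
    print([s for s in product((0, 1), repeat=3) if s not in preds])  # [(1, 0, 0), (1, 0, 1), (1, 1, 0)]
```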

Example Network as Boolean Model
[Figure: genes a, b, c, d and their proteins A, B, C, D, connected by transcription, translation, activation (+) and repression (-).]
Boolean network: a(t+1) = a(t); b(t+1) = (not c(t)) and d(t); c(t+1) = a(t) and b(t); d(t+1) = not c(t)

Example Network as Boolean Model
Boolean network: a(t+1) = a(t); b(t+1) = (not c(t)) and d(t); c(t+1) = a(t) and b(t); d(t+1) = not c(t)
States are written as abcd.
Steady state: 0000 → 0001 → 0101 → 0101
Cycle: 1000 → 1001 → 1101 → 1111 → 1010 → 1000
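The network has only 2^4 = 16 states, so its attractors and their basins can be confirmed by brute force. The sketch below follows each trajectory until a state repeats. It stores each cycle rotated to start at its smallest state, so every state on the same cycle maps to the same key. Function names are illustrative.

```python
from itertools import product


def step(state):
    """Four-gene Boolean network; states are tuples (a, b, c, d)."""
    a, b, c, d = state
    return (a, (1 - c) & d, a & b, 1 - c)


def attractor_of(state):
    """Follow the trajectory until a state repeats; return the cycle, rotated to start at its smallest state."""
    path, index = [], {}
    while state not in index:
        index[state] = len(path)
        path.append(state)
        state = step(state)
    cycle = path[index[state]:]
    i = cycle.index(min(cycle))
    return tuple(cycle[i:] + cycle[:i])


def basins(n=4):
    """Basin size (number of initial states) of every attractor, by enumerating all 2**n states."""
    sizes = {}
    for s in product((0, 1), repeat=n):
        att = attractor_of(s)
        sizes[att] = sizes.get(att, 0) + 1
    return sizes


if __name__ == "__main__":
    for att, size in basins().items():
        print(len(att), att, size)
```

Both attractors, the fixed point 0101 and the 5-cycle, attract 8 of the 16 states. Every state with a = 0 ends in 0101, and every state with a = 1 ends in the cycle.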

Naïve Reconstruction of Boolean Models
If the following are known:
- the number of vertices, N,
- the number of inputs per vertex, k,
- a sufficient set of successive states,
then the network can be reconstructed.
List:
- for each vertex, all possible input combinations;
- all respective outputs.
Experiments:
- after every "experiment", delete all "wrong" entries from the list.

Naïve Reconstruction of Boolean Models
Example: two nodes A and B, N = 2, k = 1.
[Tables: for each possible wiring, written as (input of A, input of B) = (A, A), (B, B), (A, B) and (B, A), the inputs, the outputs of A and B and the corresponding rules; after "experiments" 1 and 2, only the entries consistent with the observed states remain.]
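The elimination procedure can be sketched for N = 2, k = 1. For every node, the sketch keeps a list of candidate (input node, rule) pairs and deletes each pair that contradicts an observed transition. The "unknown" network that generates the data (A(t+1) = not B(t), B(t+1) = A(t)) is a hypothetical choice for illustration. Rules use the slides' k = 1 numbering (1 = copy, 2 = not).

```python
from itertools import product

NODES = ("A", "B")
# The four k = 1 rules: 0 = constant 0, 1 = copy, 2 = not, 3 = constant 1.
RULES = {0: lambda x: 0, 1: lambda x: x, 2: lambda x: 1 - x, 3: lambda x: 1}


def true_step(state):
    """Hypothetical 'unknown' network that generates the data: A(t+1) = not B(t), B(t+1) = A(t)."""
    a, b = state
    return (1 - b, a)


def reconstruct(observations):
    """Keep, for every node, the (input node, rule) pairs consistent with all observed transitions."""
    candidates = {
        target: [(src, r) for src in range(len(NODES)) for r in RULES]
        for target in range(len(NODES))
    }
    for state, nxt in observations:  # each observation is one "experiment"
        for target in candidates:
            candidates[target] = [
                (src, r) for src, r in candidates[target] if RULES[r](state[src]) == nxt[target]
            ]
    return {NODES[t]: [(NODES[s], r) for s, r in c] for t, c in candidates.items()}


if __name__ == "__main__":
    data = [(s, true_step(s)) for s in product((0, 1), repeat=2)]
    print(reconstruct(data[:1]))  # still ambiguous after one experiment
    print(reconstruct(data))      # {'A': [('B', 2)], 'B': [('A', 1)]}
```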

Random Boolean Networks
If the rules for updating states are unknown, select them randomly.
[Figure: N nodes and ½ p N(N - 1) edges; the nodes are assigned randomly chosen rules, e.g. rule 2, rule 0, rule 1, rule 2.]

Kauffman's NK Boolean Networks
An NK automaton is an autonomous random network of N Boolean logic elements. Each element has K inputs and one output. The signals at inputs and outputs take binary (0 or 1) values. The Boolean elements of the network and the connections between elements are chosen at random. There are no external inputs to the network. The number of elements N is assumed to be large.
References:
- S. A. Kauffman (1969). Metabolic stability and epigenesis in randomly constructed genetic nets. J. Theor. Biol.
- S. A. Kauffman. The Origins of Order: Self-Organization and Selection in Evolution. Oxford University Press, New York.
- S. A. Kauffman (2003). Random Boolean network models and the yeast transcriptional network. PNAS.

Kauffman's NK Boolean Networks
An automaton operates in discrete time. The set of output signals of the Boolean elements at a given moment characterizes the current state of the automaton. During operation, the sequence of states converges to a cyclic attractor. The states of an attractor can be considered as a "program" of the automaton. The number of attractors M and the typical attractor length L are important characteristics of NK automata.
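For small N, a random NK automaton can be generated and its attractors counted exhaustively. The sketch assumes that each element draws K distinct inputs uniformly from all N elements (self-inputs allowed) and a uniformly random truth table. Other sampling conventions exist, and the seed and sizes are arbitrary. The census maps each attractor (a set of states) to its basin size, from which M and the attractor lengths L can be read off.

```python
import random
from itertools import product


def random_nk_network(n, k, rng):
    """Assumed sampling: k distinct inputs per element from all n elements, random truth table of 2**k rows."""
    inputs = [rng.sample(range(n), k) for _ in range(n)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n)]
    return inputs, tables


def nk_step(state, inputs, tables):
    """Synchronous update: each element looks up its next value in its truth table."""
    new = []
    for ins, table in zip(inputs, tables):
        row = 0
        for i in ins:  # read the input values as a binary row index
            row = 2 * row + state[i]
        new.append(table[row])
    return tuple(new)


def attractor_census(n, inputs, tables):
    """Map each of the 2**n states to its attractor; return {attractor (frozenset of states): basin size}."""
    label = {}  # state -> attractor it leads to
    basin = {}
    for start in product((0, 1), repeat=n):
        path, pos, s = [], {}, start
        while s not in pos and s not in label:
            pos[s] = len(path)
            path.append(s)
            s = nk_step(s, inputs, tables)
        att = label[s] if s in label else frozenset(path[pos[s]:])
        for p in path:
            label[p] = att
        basin[att] = basin.get(att, 0) + 1
    return basin


if __name__ == "__main__":
    rng = random.Random(1)
    inputs, tables = random_nk_network(10, 2, rng)
    census = attractor_census(10, inputs, tables)
    print("M =", len(census))
    print("attractor lengths L =", sorted(len(a) for a in census))
```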

Kauffman's Boolean Network
Fundamental question: do metabolic stability and epigenesis require the genetic regulatory circuits to be precisely constructed? Has a fortunate evolutionary history selected only nets of highly ordered circuits, which alone ensure metabolic stability? Or are stability and epigenesis, even in nets of randomly interconnected regulatory circuits, to be expected as the probable consequence of as yet unknown mathematical laws?
Are living things more akin to precisely programmed automata selected by evolution, or to randomly assembled automata…?
Note: cells differentiate even though they carry identical sets of genes.

Kauffman's Boolean Network

Further Properties
- With K inputs, there are 2^(2^K) possible Boolean functions.
- Nets are free of external inputs.
- Once connections and rules are selected, they remain constant, and the time evolution is deterministic.
Earlier work by Walker and Ashby (1965) used the same Boolean function for all genes. The choice of Boolean function affects the length of cycles: "and" yields short cycles, while "exclusive or" yields cycles of immense length.
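The count 2^(2^K) can be reproduced by enumerating all output columns over the 2^K input rows. For K = 2, the sketch also separates the two constant functions (contradiction and tautology) from the remaining 14. It additionally counts the 10 functions that actually depend on both inputs. The function names are illustrative.

```python
from itertools import product


def boolean_functions(k):
    """All 2**(2**k) Boolean functions of k inputs, each given as its output column over the input rows."""
    rows = list(product((0, 1), repeat=k))
    return rows, list(product((0, 1), repeat=len(rows)))


def depends_on(outputs, rows, var):
    """True if flipping input number `var` changes the output for at least one input row."""
    table = dict(zip(rows, outputs))
    for row in rows:
        flipped = row[:var] + (1 - row[var],) + row[var + 1:]
        if table[row] != table[flipped]:
            return True
    return False


if __name__ == "__main__":
    rows, funcs = boolean_functions(2)
    constant = [f for f in funcs if len(set(f)) == 1]  # contradiction (all 0) and tautology (all 1)
    both = [f for f in funcs if depends_on(f, rows, 0) and depends_on(f, rows, 1)]
    print(len(funcs), len(constant), len(funcs) - len(constant), len(both))  # 16 2 14 10
```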

Further Properties: Cycles
State of the net: a row listing the present value (0 or 1) of all N elements.
The number of states is finite (2^N). As the system passes along a sequence of states from an arbitrary initial state, it must therefore eventually re-enter a state it has passed before. Because every state has a unique successor, the system then repeats the same sequence indefinitely: it is on a cycle.
Cycle length: the number of states on a re-entrant cycle of behavior. A cycle of length 1 is an equilibrium state.
Transient (or run-in) length: the number of states between the initial state and entering the cycle.
Confluent: the set of states leading to or being part of a cycle.
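Because every state has a unique successor, both quantities can be found by recording the time at which each state is first visited. This sketch applies that idea to the four-gene example network from earlier, with states written as (a, b, c, d). The choice of example network is only for illustration.

```python
def step(state):
    """Four-gene example network (a, b, c, d)."""
    a, b, c, d = state
    return (a, (1 - c) & d, a & b, 1 - c)


def transient_and_cycle(update, state):
    """Return (run-in length, cycle length) of the trajectory started at `state`."""
    first_seen = {}
    t = 0
    while state not in first_seen:
        first_seen[state] = t
        state = update(state)
        t += 1
    run_in = first_seen[state]  # time at which the cycle is first entered
    return run_in, t - run_in


if __name__ == "__main__":
    print(transient_and_cycle(step, (0, 0, 0, 0)))  # (2, 1): 0000 → 0001 → fixed point 0101
    print(transient_and_cycle(step, (1, 1, 0, 0)))  # (2, 5): 1100 → 1011 → 5-cycle
```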

Further Properties: Number of Cycles
Such a net must contain at least one cycle, and it may have more. Their number can be counted simply by releasing the net from different initial states.
No state can diverge onto two different states, so no state can lie on two different cycles.

Further Properties: Number of Cycles
(a) A net of three binary elements, each of which receives inputs from the other two. The Boolean function assigned to each element is shown beside the element.
(b) All possible states of the 3-element net are shown in the left 3 × 8 matrix below T. The subsequent state of the net at time T + 1, shown in the matrix on the right, is derived from the inputs and functions shown in (a).
(c) A kimatograph showing the sequence of state transitions leading into a state cycle of length 3. All states lie on one confluent. There are three run-ins to the single state cycle.

Example: Net with N = 10
[Figure: periodic attractor (yellow) and its basin of attraction (cyan).]

Example: Net with N = 10
[Figure: the entire state space of an RBN with 10 nodes.]
Note: self-connections are not drawn, so a period-1 attractor appears to have no outgoing edge, although each network state has exactly one successor.

Further Properties: Distance
Distance compares two states of the net. It can be defined as the number of genes with different values in the two states (the Hamming distance). For example, for N = 5, the states (00000) and (00111) differ in the values of three elements.
The distance is used as a measure of dissimilarity between
- subsequent states on a transient,
- subsequent states on a cycle,
- cycles.
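As a small sketch, the distance and its normalized form (the fraction of the N elements that differ, as used later in the Derrida plot) can be computed like this. The function name is illustrative.

```python
def distance(s1, s2, normalized=False):
    """Number of elements with different values in two states; optionally as a fraction of N."""
    if len(s1) != len(s2):
        raise ValueError("states must have the same number of elements")
    d = sum(x != y for x, y in zip(s1, s2))
    return d / len(s1) if normalized else d


if __name__ == "__main__":
    print(distance("00000", "00111"))                   # 3
    print(distance("00000", "00111", normalized=True))  # 0.6
```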

Totally Connected Nets, K = N
Such a net behaves like a random mapping of a finite set of numbers into itself. The expected cycle length is of the order of √(2^N) = 2^(N/2).
E.g. a net with N = 200 has 2^200 ≈ 10^60 states and an expected cycle length of the order of 2^100 ≈ 10^30 steps. Compare this with the Hubble age of the universe, roughly 4 × 10^17 s: even if every transition took only one second, a single pass around the cycle would take vastly longer than the age of the universe.
→ Such networks are biologically impossible.
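The K = N case can be explored numerically by treating the net as a uniformly random mapping of its m = 2^N states into themselves, as assumed above. For random mappings of m points, the mean length of the cycle a trajectory runs into is known to be about √(πm/8), which is of order 2^(N/2). The sketch estimates this mean by simulation. The choices of N, number of trials and seed are arbitrary.

```python
import math
import random


def random_mapping_cycle_length(m, rng):
    """Cycle length reached from one start state under a uniformly random mapping of m states into themselves.

    The mapping is generated lazily: a state's successor is drawn only when that state is
    visited for the first time, which is equivalent to drawing the whole mapping up front.
    """
    first_seen = {}
    state, t = 0, 0
    while state not in first_seen:
        first_seen[state] = t
        state = rng.randrange(m)
        t += 1
    return t - first_seen[state]


if __name__ == "__main__":
    rng = random.Random(0)
    n = 12        # number of genes
    m = 2 ** n    # number of states
    trials = [random_mapping_cycle_length(m, rng) for _ in range(2000)]
    print(sum(trials) / len(trials), math.sqrt(math.pi * m / 8))  # simulated mean vs. sqrt(pi*m/8)
```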

One-Connected Nets, K = 1
The net consists either of one loop of length N or of a number of disconnected loops. For the full system, state cycle lengths are lowest common multiples of the individual loop lengths, so the state cycle length easily becomes very large.
→ Again, biologically not feasible.
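The least-common-multiple effect can be reproduced with a K = 1 net made of independent loops in which every element copies its predecessor. This sketch assumes copy rules only and one active element per loop, so each loop repeats with a period equal to its length. The loop sizes are arbitrary. `math.lcm` with several arguments requires Python 3.9 or later.

```python
import math


def ring_step(state, ring_sizes):
    """Independent loops; inside each loop every element copies its predecessor (rule 1)."""
    new, start = [], 0
    for size in ring_sizes:
        ring = state[start:start + size]
        new.extend(ring[-1:] + ring[:-1])  # rotate the loop by one position
        start += size
    return tuple(new)


def period(state, ring_sizes):
    """Steps until the initial state recurs (the map is a permutation here, so it always does)."""
    current, t = ring_step(state, ring_sizes), 1
    while current != state:
        current, t = ring_step(current, ring_sizes), t + 1
    return t


if __name__ == "__main__":
    sizes = (3, 4, 5)
    start = tuple(1 if i in (0, 3, 7) else 0 for i in range(sum(sizes)))  # one active element per loop
    print(period(start, sizes), math.lcm(*sizes))  # 60 60
```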

Two-Connected Nets, K = 2
Kauffman studied networks with N = 15, 50, 64, …, 400, 1024, …, 8191 elements. Nets of 1000 elements possess 2^1000 ≈ 10^301 states. Using the 16 Boolean functions of two inputs, he studied the cycle lengths, which turned out to be surprisingly short.

Two-Connected Nets, K = 2: Cycle Length
(a) A histogram of the lengths of state cycles in nets of 400 binary elements which used all 16 Boolean functions of two variables equiprobably. The distribution is skewed toward short cycles.
(b) A histogram of the lengths of state cycles in nets of 400 binary elements which used neither tautology nor contradiction, but used the remaining 14 Boolean functions of two variables equiprobably. The distribution is skewed toward short cycles.

Two-Connected Nets, K = 2: Cycle Length
Log median cycle length as a function of log N, in nets using all 16 Boolean functions of two inputs and in nets disallowing two of them (tautology and contradiction). The asymptotic slopes are about 0.3 and 0.6.

K = 2: Transient Lengths
A scattergram of run-in length against cycle length in nets of 400 binary elements using neither tautology nor contradiction. Run-in length appears uncorrelated with cycle length. A log/log plot was used merely to accommodate the range of the data.

K = 2: Number of Cycles
A histogram of the number of cycles per net in nets of 400 elements using neither tautology nor contradiction, but the remaining 14 Boolean functions of two inputs equiprobably. The median is 10 cycles per net, and the distribution is skewed toward few cycles.
Expected number of cycles: of the order of √N.

K = 2: Activity
After release from an arbitrary initial state, the number of elements changing their state per state transition decreases.
Example: a net of 100 elements. In the first step, about 0.4 N elements change; this number decays exponentially to a minimum activity of 0 to 0.25 N.
On a cycle, 0 to 35 of the 100 elements change, i.e. most genes are constant during a cycle.
(Covered in the lecture up to here on 12 Nov 2007.)
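Activity is simply the distance between successive states. The slide's 100-element data are not reproduced here. Instead, this illustrative sketch counts the changing elements along the 5-cycle of the four-gene example network; dividing by N gives the fraction of active elements.

```python
def step(state):
    """Four-gene example network (a, b, c, d)."""
    a, b, c, d = state
    return (a, (1 - c) & d, a & b, 1 - c)


def trajectory(state, length):
    """The initial state followed by `length` successive states."""
    states = [state]
    for _ in range(length):
        states.append(step(states[-1]))
    return states


def activity(states):
    """Number of elements that change their value in each transition of a trajectory."""
    return [sum(x != y for x, y in zip(s, t)) for s, t in zip(states, states[1:])]


if __name__ == "__main__":
    print(activity(trajectory((1, 0, 0, 0), 5)))  # [1, 1, 1, 2, 1]: once around the 5-cycle
```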

Noise
One unit of noise may be introduced by arbitrarily changing the value of a single gene for one time moment. The system may then return to the perturbed cycle or run into a different cycle.
In a net of size N, there are exactly N states that differ from a given state in the value of just one gene.
Consider a net with several cycles. Perturbing every state on each cycle in all N possible ways (distance 1) yields a matrix listing, for each cycle, how often each cycle is reached. Dividing every cell by its row total gives transition probabilities; the resulting matrix defines a Markov chain.

Noise: for the Example
Boolean network: a(t+1) = a(t); b(t+1) = (not c(t)) and d(t); c(t+1) = a(t) and b(t); d(t+1) = not c(t)
C1, steady state: 0101
C2, cycle: 1000 → 1001 → 1101 → 1111 → 1010 → 1000
Transition matrix (row: perturbed attractor, column: attractor reached after one unit of noise):
      C1   C2
C1    ¾    ¼
C2    ¼    ¾
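This matrix can be recomputed from the procedure described for noise. Flip each gene in each attractor state, find the attractor that the perturbed state runs into, and normalize each row. The sketch below does this for the four-gene example. Attractors are ordered by their smallest state, so C1 (the fixed point) comes first. The function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def step(state):
    """Four-gene example network (a, b, c, d)."""
    a, b, c, d = state
    return (a, (1 - c) & d, a & b, 1 - c)


def attractor_of(state):
    """Set of states of the attractor reached from `state`."""
    path, index = [], {}
    while state not in index:
        index[state] = len(path)
        path.append(state)
        state = step(state)
    return frozenset(path[index[state]:])


def perturbation_matrix(n=4):
    """Row-normalized counts of attractors reached after flipping one gene in every state of every attractor."""
    attractors = sorted({attractor_of(s) for s in product((0, 1), repeat=n)}, key=min)
    counts = [[0] * len(attractors) for _ in attractors]
    for i, att in enumerate(attractors):
        for s in att:
            for g in range(n):
                flipped = s[:g] + (1 - s[g],) + s[g + 1:]
                counts[i][attractors.index(attractor_of(flipped))] += 1
    matrix = [[Fraction(c, sum(row)) for c in row] for row in counts]
    return attractors, matrix


if __name__ == "__main__":
    attractors, matrix = perturbation_matrix()
    for att, row in zip(attractors, matrix):
        print(sorted(att), [str(p) for p in row])
```

Flipping gene a always switches between the two attractors, while flipping b, c or d never does. This gives 1 of 4 perturbations for the fixed point and 5 of 20 for the cycle.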

Noise
(a) A matrix listing the 30 cycles of one net and the total number of times one unit of perturbation shifted the net from each cycle to each cycle. The system generally returns to the cycle perturbed. Dividing the value in each cell of the matrix by the total of its row yields the matrix of transition probabilities between modes of behavior, which constitutes a Markov chain. The transition probabilities between cycles may be asymmetric.
(b) Transitions between cycles in the net shown in (a). The solid arrows are the most probable transitions to a cycle other than the cycle perturbed; the dotted arrows are the second most probable. The remaining transitions are not shown. Cycles 2, 7, 5 and 15 form an ergodic set into which the remaining cycles flow. If all transitions between cycles are included, the ergodic set of cycles becomes 1, 2, 3, 5, 6, 12, 13, 15, 16. The remaining cycles are transient cycles leading into this single ergodic set.

Noise
The total number of cycles reached from each cycle after it was perturbed in all possible ways by one unit of noise, correlated with the number of cycles in the net being perturbed. The data are from nets using neither tautology nor contradiction, with N = 191 and N = 400.

Application to Cell Cycle
Logarithm of cell replication time (in minutes) against the logarithm of the estimated number of genes, for various single-cell organisms and cell types. Solid lines connect the median replication times of bacteria, protozoa, chicken, mouse, dog and man.

Application to Cellular Differentiation
The logarithm of the number of cell types is plotted against the logarithm of the estimated number of genes per cell, and the logarithm of the median number of state cycles is plotted against log N. The observed and theoretical slopes are about 0.5. Scale: 2 × 10^6 genes per cell = 6 × 10^-12 g DNA per cell.

Distance: the Derrida Plot
Recurrence relation showing the expected distance D(T+1) between two states at time T+1, after each is acted upon by the network at time T, as a function of the distance D(T) between the two states at time T. Distance is normalized to the fraction of elements with different activity values in the two states being compared. For K = 2, the recurrence curve lies below the 45° line, and hence the distance between arbitrary initial states decreases toward zero over iterations. For K > 2, states that are initially very close diverge to an asymptotic distance given by the crossing of the corresponding K curve with the 45° line. Thus K > 2 networks exhibit sensitivity to initial conditions and chaos, not order.
Example: N = 3. At time T, the two states (000) and (001) have distance 1 (1/3 normalized). After the transition to T + 1, (000) → (100) and (001) → (010), so the distance is 2 (2/3 normalized).
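The shape of such curves can be sketched with the annealed approximation of Derrida and Pomeau. This approximation assumes that connections and Boolean functions are redrawn at every step (unlike Kauffman's fixed nets) and that each output is 1 with probability 1/2. Under these assumptions, D(T+1) = [1 - (1 - D(T))^K] / 2. The slope at D = 0 is K/2. For K ≤ 2 the curve stays below the diagonal for D > 0, with K = 2 tangent to it at the origin. For K > 2 it crosses the diagonal at a positive distance.

```python
def derrida_map(d, k):
    """Annealed approximation: normalized distance at T+1 given distance d at T,
    for random Boolean functions with P(output = 1) = 1/2 and k inputs per element."""
    return 0.5 * (1.0 - (1.0 - d) ** k)


def iterate(d, k, steps=200):
    """Apply the recurrence `steps` times starting from distance d."""
    for _ in range(steps):
        d = derrida_map(d, k)
    return d


if __name__ == "__main__":
    for k in (1, 2, 4):
        print(k, [round(derrida_map(d, k), 3) for d in (0.05, 0.1, 0.3)], round(iterate(0.1, k), 4))
```

For K = 2 the distance shrinks only slowly (roughly like 2/T) because the curve touches the diagonal at zero. For K = 4 it settles at the crossing point near 0.46.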

Ordered and Chaotic Regimes
[Figure: series of states in the ordered regime, at the edge of chaos and in the chaotic regime; white: changing, black: not changing.]

Kauffman's NK Boolean Networks: Dependence of Behavior on Degree K
K large (K = N): the behavior is essentially stochastic. Successive states are random with respect to the preceding ones. The "programs" are very sensitive to minimal disturbances (a minimal disturbance is a change of the output of a particular element during automaton operation) and to mutations (changes in Boolean element types and in network connections). The attractor lengths L are very large: L ~ 2^(N/2). The number of attractors M is of the order of N. If the connection degree K is decreased, this stochastic type of behavior is still observed, until K ~ 2.

Kauffman's NK Boolean Networks: Dependence of Behavior on Degree K
At K ~ 2 the network behavior changes drastically. The sensitivity to minimal disturbances is small. Mutations typically create only slight variations in the automaton dynamics; only some rare mutations evoke radical, cascading changes in the automaton "programs". The attractor length L and the number of attractors M are of the order of N^(1/2). This is the behavior at the edge of chaos, the borderland between chaos and order.

Boolean Dynamics – Scale-Free Nets

Boolean Dynamics – Scale-Free Nets
Typical examples of directed networks of size N = 64: (a), (b) a random network with K = 2; (e), (f) a scale-free network with exponent 2. Each network is shown in two representations. In (a) and (e), the nodes are placed at equal distances on a circle; in (b) and (f), the nodes are randomly distributed in a square. Each node is drawn as a bold point with size proportional to its number of input links. The input (output) side of each link is drawn in deep (faint) color, so that the direction of a link is indicated by the color gradation.

Boolean Dynamics – Scale-Free Nets
[Figure: distribution of cycle lengths; panels: random network with K = 2 and scale-free network with exponent 2.]

Boolean Dynamics – Scale-Free Nets
[Figure: distribution of cycle lengths for N = 40; panels: random network with K = 2 and scale-free network with exponent 2.]

Boolean Dynamics – Scale-Free Nets
Distribution of cycle lengths: histograms h of the lengths L_c of state cycles in various types of directed networks of size N = 80. Each histogram is generated from 10^3 different sets of Boolean functions and five different network structures. The Boolean dynamics are iterated at most 10^5 times until convergence to a cycle is reached. (a) The RBN (random Boolean network) with K = 2; (b) the SFRBN (scale-free RBN) with exponent 2.

Boolean Dynamics – Scale-Free Nets
Derrida plots (distance between states) of the SFRBN with exponents 1, 2 and 4, overplotted with the analytical curves for the RBN with K = 1, K = 2 and K = 4. The line H(t+1) = H(t) is the dividing line between order and chaos; the K = 2 curve runs along this line near the origin. The system size is N = 1024, and 2000 initial states are used for averaging.

Example: Yeast Cell Cycle
[Figure: wiring diagram of the budding-yeast cell cycle. Phases: G1, Start, S, M (metaphase), M (anaphase), Finish, cell division. Components: Cln2, Clb5, Clb2, Sic1, Sic1P, the complexes Sic1/Clb5 and Sic1/Clb2, Cdc20, SBF/MBF, Hct1, APC, budding. Legend: progression through the cell cycle; production, degradation, complex formation; activation; inhibition; active protein or complex; inactive protein or complex.]

Cell Cycle Models
[Figure: minimal cascade scheme with cyclin synthesis ($v_i$) and degradation ($v_d$) and the interconversions M ⇄ M+ and X ⇄ X+ (rates $v_1$ to $v_4$).]
Minimal model taking into account a cyclin, a cyclin-dependent kinase (CDK = M) and a protease (X). M and X can each be in an active or an inactive state. The model shows oscillations (Goldbeter, 1991).
ODE models of increasing complexity (Tyson & Novak groups, ), including cyclins, CDKs, transcriptional activators and repressors, also show oscillations, with some tricks.
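A sketch of the minimal cascade model, written from the commonly quoted form of Goldbeter's (1991) equations: cyclin c is synthesized at rate $v_i$ and degraded both by the active protease (maximal rate $v_d$) and nonspecifically ($k_d$); cyclin activates M, and active M activates X, each through Michaelis-Menten-type conversions between inactive and active forms. The parameter values in `PARAMS` are assumptions of the order used in that paper, and the fixed-step RK4 integrator is a simple choice for this stiff system, not part of the original model.

```python
def goldbeter_rhs(c, m, x, p):
    """c: cyclin concentration; m: fraction of active CDK (M+); x: fraction of active protease (X+)."""
    dc = p["vi"] - p["vd"] * x * c / (p["Kd"] + c) - p["kd"] * c
    v1 = p["VM1"] * c / (p["Kc"] + c)  # CDK activation speeds up with cyclin
    dm = v1 * (1 - m) / (p["K1"] + 1 - m) - p["V2"] * m / (p["K2"] + m)
    v3 = p["VM3"] * m                  # protease activation driven by active CDK
    dx = v3 * (1 - x) / (p["K3"] + 1 - x) - p["V4"] * x / (p["K4"] + x)
    return dc, dm, dx

# Assumed parameter values (concentrations in uM, times in min).
PARAMS = dict(vi=0.025, vd=0.25, Kd=0.02, kd=0.01, Kc=0.5,
              VM1=3.0, V2=1.5, VM3=1.0, V4=0.5,
              K1=0.005, K2=0.005, K3=0.005, K4=0.005)

def simulate(t_end=100.0, dt=0.001, y0=(0.01, 0.01, 0.01), p=PARAMS, record_every=1000):
    """Classical fourth-order Runge-Kutta; the small K values make the system stiff,
    so dt is kept small. Returns (t, c, m, x) every record_every steps."""
    y = list(y0)
    out = [(0.0, *y)]
    steps = int(round(t_end / dt))
    for n in range(1, steps + 1):
        k1 = goldbeter_rhs(*y, p)
        k2 = goldbeter_rhs(*[yi + 0.5 * dt * ki for yi, ki in zip(y, k1)], p)
        k3 = goldbeter_rhs(*[yi + 0.5 * dt * ki for yi, ki in zip(y, k2)], p)
        k4 = goldbeter_rhs(*[yi + dt * ki for yi, ki in zip(y, k3)], p)
        y = [yi + dt / 6 * (a + 2 * b + 2 * cc + d)
             for yi, a, b, cc, d in zip(y, k1, k2, k3, k4)]
        if n % record_every == 0:
            out.append((n * dt, *y))
    return out
```

Plotting the recorded `m` values against time is the quickest way to inspect the oscillations; if other parameter values are used, the step size may need to be reduced further.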

Yeast Cell Cycle – Data

Yeast Cell Cycle – Model

Yeast Cell Cycle – Model
Regulatory interactions of 20 genes of S. cerevisiae. Solid arcs represent activating regulation, dashed arcs represent inhibitory regulation. When several genes regulate a common target gene, their effects are combined by an OR function.

Boolean Network Identification
Given: experimental data. Sought: network connectivity and Boolean rules.

Boolean Network Identification

Identification Problem
Let $(I_j, O_j)$ be a pair of expression patterns of $\{v_1, \ldots, v_n\}$, where $I_j$ corresponds to the INPUT and $O_j$ to the OUTPUT. We call the pair $(I_j, O_j)$ an example. The identification problem is defined formally below, together with the related consistency, counting and enumeration problems.
• A node $v_i$ in a Boolean network $G(V, F)$ is consistent with an example $(I_j, O_j)$ if $O_j(v_i) = f_i(I_j(v_{i_1}), \ldots, I_j(v_{i_k}))$ holds, where $v_{i_1}, \ldots, v_{i_k}$ are the input nodes of $v_i$.
• A Boolean network $G(V, F)$ is consistent with $(I_j, O_j)$ if all its nodes are consistent with $(I_j, O_j)$.
• For a set of examples $EX = \{(I_1, O_1), (I_2, O_2), \ldots, (I_m, O_m)\}$, the network $G(V, F)$ (resp. node $v_i$) is consistent with EX if it is consistent with every $(I_j, O_j)$ for $1 \le j \le m$.

Identification Problem
• CONSISTENCY: Given N (the number of nodes) and EX, decide whether or not there exists a Boolean network consistent with EX, and output one if it exists;
• COUNTING: Given N and EX, count the number of Boolean networks consistent with EX;
• ENUMERATION: Given N and EX, output all Boolean networks consistent with EX;
• IDENTIFICATION: Given N and EX, decide whether or not there exists a unique Boolean network consistent with EX, and output it if it exists.

Identification Problem

Identification: Algorithm
For the consistency problem, the following algorithm (for K = 2 inputs per node) is natural and conceptually very simple, since it just outputs Boolean functions consistent with the given examples:
(1) For each node $v_i \in V$, execute STEP (2).
(2) If there exists a triplet $(f_i, v_k, v_h)$ satisfying $O_j(v_i) = f_i(I_j(v_k), I_j(v_h))$ for all $j = 1, \ldots, m$, output $f_i$ as the Boolean function assigned to $v_i$ and $v_k, v_h$ as the input nodes of $v_i$.
If no such triplet exists for some node, no consistent network with K = 2 exists. To find a triplet $(f_i, v_k, v_h)$ we use a simple exhaustive search: for each pair of nodes $(v_k, v_h)$ with $k < h$ and for each Boolean function f of two variables, we check whether $O_j(v_i) = f(I_j(v_k), I_j(v_h))$ holds for all j.
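A minimal Python sketch of STEP (1) and STEP (2) for K = 2, assuming each example is a pair `(I, O)` of 0/1 tuples indexed by node and each two-input Boolean function is stored as its truth table `(f(0,0), f(0,1), f(1,0), f(1,1))`. The names `evaluate`, `find_triplet` and `consistent_network` are hypothetical.

```python
from itertools import combinations, product

# All 16 Boolean functions of two inputs, as truth tables (f(0,0), f(0,1), f(1,0), f(1,1)).
ALL_FUNCTIONS = list(product((0, 1), repeat=4))

def evaluate(table, x, y):
    """Look up f(x, y) in a 4-entry truth table."""
    return table[2 * x + y]

def find_triplet(i, examples, n):
    """STEP (2): exhaustive search for (f, k, h), k < h, with O[i] == f(I[k], I[h]) in every example."""
    for k, h in combinations(range(n), 2):
        for table in ALL_FUNCTIONS:
            if all(O[i] == evaluate(table, I[k], I[h]) for I, O in examples):
                return table, k, h
    return None

def consistent_network(examples, n):
    """STEP (1): search for every node; None if some node has no consistent triplet."""
    network = {}
    for i in range(n):
        triplet = find_triplet(i, examples, n)
        if triplet is None:
            return None
        network[i] = triplet
    return network
```

The returned dictionary maps each node index to `(truth_table, k, h)`; the first consistent triplet found is returned, which is all the consistency problem asks for.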

Identification: Algorithm
For the enumeration problem, replace STEP (2) with the following:
(2') Enumerate all triplets $(f_i, v_k, v_h)$ satisfying $O_j(v_i) = f_i(I_j(v_k), I_j(v_h))$ for all $j = 1, \ldots, m$.
Then any combination of triplets $((f_1, v_{k_1}, v_{h_1}), (f_2, v_{k_2}, v_{h_2}), \ldots, (f_n, v_{k_n}, v_{h_n}))$, one per node, represents a consistent Boolean network. The triplets must be enumerated carefully, since two or more triplets can represent the same Boolean function (such as $v_k \wedge v_h$ and $v_h \wedge v_k$).

Identification: Algorithm
For the counting problem, simply multiply the numbers of (distinct) triplets consistent with each node. For the identification problem, replace STEP (2) with the following:
(2'') If there exists exactly one triplet $(f_i, v_k, v_h)$ satisfying $O_j(v_i) = f_i(I_j(v_k), I_j(v_h))$ for all $j = 1, \ldots, m$, output $f_i$ as the Boolean function assigned to $v_i$ and $v_k, v_h$ as the input nodes of $v_i$.
The network is uniquely identified only if this holds for every node.
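A Python sketch of STEP (2'), counting and STEP (2'') for K = 2, under the same assumptions as a plain exhaustive search (examples as `(I, O)` pairs of 0/1 tuples, functions as 4-entry truth tables). To enumerate "carefully", each consistent triplet is reduced to a canonical form that keeps only the inputs the function really depends on, so that, for example, $f(x, y) = x$ is not counted once per partner node h. The helper names are hypothetical.

```python
from itertools import combinations, product
from math import prod

ALL_FUNCTIONS = list(product((0, 1), repeat=4))  # (f(0,0), f(0,1), f(1,0), f(1,1))

def canonical(table, k, h):
    """Reduce (f, v_k, v_h) to (relevant inputs, reduced truth table)."""
    dep_x = table[0] != table[2] or table[1] != table[3]  # f(0, y) != f(1, y) for some y
    dep_y = table[0] != table[1] or table[2] != table[3]  # f(x, 0) != f(x, 1) for some x
    if dep_x and dep_y:
        return (k, h), table
    if dep_x:
        return (k,), (table[0], table[2])
    if dep_y:
        return (h,), (table[0], table[1])
    return (), (table[0],)  # constant function

def consistent_functions(i, examples, n):
    """STEP (2'): all distinct functions for node i that agree with every example."""
    found = set()
    for k, h in combinations(range(n), 2):
        for t in ALL_FUNCTIONS:
            if all(O[i] == t[2 * I[k] + I[h]] for I, O in examples):
                found.add(canonical(t, k, h))
    return found

def enumerate_networks(examples, n):
    """Every combination of one consistent function per node is a consistent network."""
    per_node = [sorted(consistent_functions(i, examples, n)) for i in range(n)]
    return product(*per_node)

def count_networks(examples, n):
    """Counting problem: product of the per-node numbers of consistent functions."""
    return prod(len(consistent_functions(i, examples, n)) for i in range(n))

def identify(examples, n):
    """STEP (2''): the network if every node has exactly one consistent function, else None."""
    network = []
    for i in range(n):
        funcs = consistent_functions(i, examples, n)
        if len(funcs) != 1:
            return None
        network.append(funcs.pop())
    return tuple(network)
```

With a single example on three nodes, each node admits 19 distinct functions (24 triplets before canonicalization), which illustrates why duplicates must be removed before multiplying.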

Identification: Time Complexity
There are $2^{2^K}$ Boolean functions with K input variables. There are $\binom{N}{K}$ possible combinations of input nodes per node. For each node, $2^{2^K}\binom{N}{K}$ triplets are examined by the algorithm, and for each triplet m examples are examined. Therefore, $2^{2^K}\binom{N}{K}\, m N$ pairs of Boolean functions and examples are examined in total. Examining one pair takes O(K) time, so, since $\binom{N}{K} \le N^K$, the algorithm works in $O(2^{2^K} N^{K+1} m K)$ time. Thus, the algorithm works in polynomial time for fixed K. Similarly, we can show that the algorithms for the counting problem and the identification problem work in polynomial time for fixed K. For any Boolean network of fixed K, O(log N) randomly chosen INPUT/OUTPUT pairs are sufficient to identify it with high probability.
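As a worked illustration of the count $2^{2^K}\binom{N}{K}\, m N$ of (function, example) checks, take K = 2 and the hypothetical values N = 50 nodes and m = 100 examples:

```latex
\begin{aligned}
\#\text{functions} &= 2^{2^K} = 2^{4} = 16 \\
\#\text{input pairs per node} &= \binom{N}{K} = \binom{50}{2} = 1225 \\
\#\text{triplets per node} &= 16 \cdot 1225 = 19\,600 \\
\#\text{(triplet, example) checks} &= 19\,600 \cdot m \cdot N = 19\,600 \cdot 100 \cdot 50 = 98\,000\,000
\end{aligned}
```

Each check costs O(K), so the search is large but entirely feasible; the double exponent in $2^{2^K}$ is what makes larger K expensive.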

Boolean Network Identification: REVEAL
Results obtained with REVEAL suggested that only a small number of state transition pairs (100 pairs out of $2^{50}$) were sufficient for inferring Boolean networks with 50 nodes (genes) whose indegree (the number of input nodes to a node) was bounded by 3.

Boolean Network Identification: REVEAL
Information-theoretic principles of mutual information (M) analysis. Information theory provides a quantitative measure of information, the Shannon entropy H. It is defined in terms of the probability $p_i$ of observing a particular symbol or event i within a given sequence (Shannon & Weaver, 1963): $H = -\sum_i p_i \log p_i$.
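A minimal Python sketch of this definition, assuming the probabilities $p_i$ are estimated as relative frequencies of the symbols in the observed sequence and the logarithm is taken to base 2, so H is in bits. The function name is hypothetical.

```python
from collections import Counter
from math import log2

def shannon_entropy(sequence):
    """H = -sum_i p_i log2 p_i (bits), with p_i the relative frequency of symbol i."""
    counts = Counter(sequence)
    n = len(sequence)
    return -sum((c / n) * log2(c / n) for c in counts.values())
```

For example, `shannon_entropy("AACG")` is 1.5 bits: A has probability 1/2, C and G have 1/4 each.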

Shannon Entropy
In a binary system, an element X can be in either of s = 2 states, say on or off. Over a particular sequence of events, the probabilities of X being on, p(1), or off, p(0), must sum to unity, so p(1) = 1 - p(0) and $H(X) = -p(0)\log p(0) - [1-p(0)]\log[1-p(0)]$. H reaches its maximum when the on and off states are equiprobable, i.e. the system uses each information-carrying state to its fullest possible extent. As one state becomes more probable than the other, H decreases: the system becomes biased. In the limiting case, where one probability is unity (certainty) and the other(s) zero (impossibility), H is zero (no uncertainty, no freedom of choice, no information). The maximum entropy $H_{max}$ occurs when all states are equiprobable, i.e. p(0) = p(1) = 1/2. Accordingly, $H_{max} = \log 2$. Entropies are commonly measured in bits (binary digits) when the logarithm is taken to base 2; e.g. $H_{max} = 1$ for a two-state system.
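A few worked values of the binary entropy (base-2 logarithm, with the usual convention $0 \log 0 = 0$) show the maximum at equiprobability and the decrease as the system becomes biased:

```latex
\begin{aligned}
p(0) = \tfrac12:&\quad H = -\tfrac12\log_2\tfrac12 - \tfrac12\log_2\tfrac12 = 1 \text{ bit} \\
p(0) = 0.9:&\quad H = -0.9\log_2 0.9 - 0.1\log_2 0.1 \approx 0.137 + 0.332 \approx 0.469 \text{ bits} \\
p(0) = 1:&\quad H = -1\cdot\log_2 1 - 0 = 0
\end{aligned}
```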

Shannon Entropy

Shannon Entropy
Determination of H. (a) Single element: probabilities are calculated from the frequencies of on/off values of X and Y.

Shannon Entropy: Co-Occurrence
Determination of H. (b) Distribution of value pairs: H is calculated from the probabilities of co-occurrence, $H(X) = -\sum_i p_i \log p_i$, $H(Y) = -\sum_j p_j \log p_j$ and $H(X,Y) = -\sum_{i,j} p_{i,j} \log p_{i,j}$.
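A Python sketch of both determinations, assuming X and Y are given as aligned on/off sequences: H(X) and H(Y) come from single-element frequencies, H(X,Y) from the frequencies of value pairs. Function names are hypothetical.

```python
from collections import Counter
from math import log2

def entropy_of(items):
    """Entropy in bits of the empirical distribution of items."""
    counts = Counter(items)
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def entropies(xs, ys):
    """Return H(X), H(Y) and the joint entropy H(X, Y) from co-occurring value pairs."""
    return entropy_of(xs), entropy_of(ys), entropy_of(list(zip(xs, ys)))
```

If Y simply copies X, the pairs carry no extra uncertainty and H(X,Y) = H(X); if X and Y vary independently over all four combinations, H(X,Y) = H(X) + H(Y).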

Conditional Entropies
Two conditional entropies capture the relationship between the sequences of X and Y: H(X|Y) and H(Y|X). They are related as follows (Shannon & Weaver, 1963): $H(X,Y) = H(Y|X) + H(X) = H(X|Y) + H(Y)$. In words, the uncertainty of X and the remaining uncertainty of Y given knowledge of X, H(Y|X), i.e. the information contained in Y that is not shared with X, sum to the entropy of the combination of X and Y.

Mutual Information
The mutual information M(X,Y), also referred to as the rate of transmission between an input/output channel pair (Shannon & Weaver, 1963), is defined as $M(X,Y) = H(Y) - H(Y|X) = H(X) - H(X|Y)$. The information shared by X and Y is what remains of the information of X after removing the part of X that is not shared with Y. Using the equations above, mutual information can be expressed directly in terms of the original entropies; this formulation will be important for the considerations below: $M(X,Y) = H(X) + H(Y) - H(X,Y)$.
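A Python sketch of the entropy-based formulation, assuming aligned sequences for X and Y with probabilities estimated from frequencies. It also computes H(Y|X) from the chain rule H(X,Y) = H(Y|X) + H(X) of the conditional entropies, so the two expressions for M can be compared. Function names are hypothetical.

```python
from collections import Counter
from math import log2

def entropy_of(items):
    """Entropy in bits of the empirical distribution of items."""
    counts = Counter(items)
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def conditional_entropy(ys, xs):
    """H(Y|X) = H(X,Y) - H(X)."""
    return entropy_of(list(zip(xs, ys))) - entropy_of(xs)

def mutual_information(xs, ys):
    """M(X,Y) = H(X) + H(Y) - H(X,Y), in bits."""
    return entropy_of(xs) + entropy_of(ys) - entropy_of(list(zip(xs, ys)))
```

A perfect copy or a perfect inversion of X shares all of its information (M = H(X)), while a Y that varies independently of X gives M = 0.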

Mutual Information
Venn diagrams of information relationships. In each case, add the shaded portions of both squares to obtain one of the following: [H(X) + H(Y)], H(X,Y) and M(X,Y). The small corner rectangles represent the information that X and Y have in common. H(Y) is drawn smaller than H(X), with its corner rectangle on the left instead of the right, to indicate that X and Y are different although they share some mutual information.

REVEAL Algorithm

REVEAL Algorithm
1. Identification of perfect input-output state pairs of connectivity k = 1. Compute the mutual information of all input-output state vector pairs. The calculation reveals that $M(a_{t+1}; a_t) = H(a_{t+1})$, i.e. $a_t$ uniquely determines $a_{t+1}$. Likewise, $M(d_{t+1}; c_t) = H(d_{t+1})$, i.e. $c_t$ uniquely determines $d_{t+1}$. For all other genes there is no perfect match.
2. Determination of the rules for the pairs identified at k = 1. From the respective rule tables we retrieve a(t+1) = a(t) and d(t+1) = NOT c(t).
3. Identification of perfect input-output state pairs of connectivity k = 2. If not all rules can be retrieved with k = 1, we consider k = 2 by comparing the output state vectors of the remaining genes with all possible pairs of input state vectors. The calculation gives $M(b_{t+1}; c_t, d_t) = H(b_{t+1})$, i.e. the pair $(c_t, d_t)$ determines $b_{t+1}$. Likewise, $M(c_{t+1}; a_t, b_t) = H(c_{t+1})$, i.e. the pair $(a_t, b_t)$ determines $c_{t+1}$.
4. Determination of the rules for the pairs identified at k = 2. We retrieve the rules b(t+1) = (NOT c(t)) AND d(t) and c(t+1) = a(t) AND b(t).
5. Identification of perfect input-output state pairs of connectivity k = p.
6. Determination of the rules for the pairs identified at k = p. Stop if all genes have been assigned a rule; otherwise increment p and go to step 5.
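A compact Python sketch of this loop, assuming the data are given as one 0/1 sequence per gene at time t and one at time t+1 (one entry per observed transition), that "perfect match" is tested as $M(\text{output}; \text{inputs}) = H(\text{output})$ up to a small floating-point tolerance, and that the search stops at a hypothetical bound `max_k`. The rule table is read directly off the data. Names are illustrative, not those of the original REVEAL implementation.

```python
from collections import Counter
from itertools import combinations
from math import isclose, log2

def joint_entropy(columns):
    """Shannon entropy (bits) of the rows formed by one or more aligned sequences."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())

def mutual_information(out, input_columns):
    """M(out; inputs) = H(out) + H(inputs) - H(out, inputs)."""
    return (joint_entropy([out]) + joint_entropy(input_columns)
            - joint_entropy([out] + input_columns))

def reveal(at_t, at_t1, max_k=3):
    """at_t[g], at_t1[g]: states of gene g at t and t+1 for each observed transition.
    Returns {gene: (input genes, rule table)} for every gene explained with <= max_k inputs."""
    genes = list(at_t)
    rules = {}
    for k in range(1, max_k + 1):  # steps 1, 3, 5: try connectivity k
        for g in genes:
            if g in rules:
                continue
            out = at_t1[g]
            h_out = joint_entropy([out])
            for combo in combinations(genes, k):
                cols = [at_t[name] for name in combo]
                if isclose(mutual_information(out, cols), h_out, abs_tol=1e-9):
                    # steps 2, 4, 6: read the rule table off the data
                    rules[g] = (combo, dict(zip(zip(*cols), out)))
                    break
        if len(rules) == len(genes):
            break
    return rules
```

Run on all 16 transitions of the rules above, this recovers the inputs a for a, c for d, (c, d) for b and (a, b) for c. A gene whose output is constant in the data has zero entropy and would be "explained" by the first single input tried, so a practical implementation would treat that case separately.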

REVEAL Algorithm: Other example
