1 Pattern storage in gene-protein networks Ronald Westra Department of Mathematics Maastricht University
2 Items in this Presentation: 1. Problem formulation; 2. Modeling of gene-protein interactions; 3. Information Processing in Gene-Protein Networks; 4. Information Storage in Gene-Protein Networks; 5. Conclusions.
3 1. Problem formulation. How much genome does an organism require to survive in this world? Some observations...
4 Minimal genome sizes
- Mycoplasma genitalium: 500 nm, 580 Kbp, 477 genes, 74% coding DNA; obligate parasitic endosymbiont
- Nanoarchaeum equitans: 400 nm, 460 Kbp, 487 ORFs, 95% coding DNA; obligate parasitic endosymbiont
- SARS CoV: 100 nm, 30 Kbp, 5 ORFs, 98% coding DNA; RNA virus
5 Organisms like Mycoplasma genitalium, Nanoarchaeum equitans, and the SARS corona virus exhibit a large repertoire of complex, well-tuned behavioural patterns despite an extremely small genome. A pattern of behaviour here is the adequate conditional sequence of responses of the gene-protein interaction network to an external input: light, oxygen stress, pH, pheromones, and numerous organic and inorganic molecules.
6 Problem formulation. Questions: * How do gene-protein networks perform computations, and how do they process real-time information? * How is information stored in gene-protein networks? * How do processing speed, computational power, and storage capacity relate to network properties?
7 CENTRAL THOUGHT [1] What is the capacity of a gene-protein network to store input-output patterns, where the stimulus is the input and the behaviour is the output? How does the pattern storage capacity of an organism relate to the size of its genome n and the number of external stimuli m?
8 CENTRAL THOUGHT [2] Conjecture: the task of reverse engineering a gene regulatory network from a time series of m observations is identical to the task of storing m patterns in that network. In the first case an engineer designs a network that fits the observations; in the second case Nature selects those networks/organisms that best perform the input-output mapping.
9 Requirements. For studying the pattern storage capacity of a gene-protein interaction system we need: 1. a suitable parametrized formal model; 2. a method for fixing the model parameters from the given set of input-output patterns. We will visit these items in the following slides...
10 2. Modeling the Interactions between Genes and Proteins. A prerequisite for the successful reconstruction of gene-protein networks is an adequate model of the dynamics of their interactions.
11 Components in gene-protein networks. Genes: ON/OFF switches. RNA and proteins: vectors of information exchange between genes. External inputs: interact with higher-order proteins.
12 General state space dynamics. The evolution of the n-dimensional state vector x (gene expressions) depends on the p-dimensional inputs u, the parameters θ, and Gaussian white noise ξ.
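The slide's formula did not survive the transcript; a general state-space form consistent with these definitions would presumably read:

```latex
\dot{x}(t) = f\bigl(x(t),\,u(t);\,\theta\bigr) + \xi(t),
\qquad x(t)\in\mathbb{R}^{n},\quad u(t)\in\mathbb{R}^{p}.
```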
13 Example of a general dynamics network topology: external inputs couple to the genes/proteins through the input coupling; the genes/proteins influence each other through the interaction coupling.
14 The general case is too complex: it depends strongly on unknown microscopic details, and the relevant parameters are unidentified and thus unknown. Approximate interaction potentials and qualitative methods therefore seem appropriate.
15 1. Linear stochastic state-space models (following Yeung et al. 2003 and others). x: the vector (x_1, x_2, ..., x_n), where x_i is the relative gene expression of gene i. u: the vector (u_1, u_2, ..., u_p), where u_i is the value of external input i (e.g. a toxic agent). νξ(t): white Gaussian noise.
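The model equation itself is missing from this transcript; given these definitions, the linear stochastic state-space model presumably has the standard form

```latex
\dot{x}(t) = A\,x(t) + B\,u(t) + \nu\,\xi(t),
```

with A the n-by-n gene-gene interaction matrix and B the n-by-p input coupling.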
16 2. Piecewise linear models (following Mestl, Plahte, and Omholt 1995 and others). b_il: a sum of step functions s^+, s^-.
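The defining equations are not reproduced here; in the piecewise-linear formalism of Mestl et al., the dynamics of each gene are roughly of the form (my reconstruction, not the slide's exact notation)

```latex
\dot{x}_i = \sum_{l} b_{il}\,Z_l(x) \;-\; \gamma_i x_i ,
\qquad
Z_l(x) = \prod_j s^{\pm}(x_j,\theta_j),
```

where s^+(x_j, θ_j) is the unit step at threshold θ_j, s^- = 1 - s^+, and γ_i is a decay rate.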
17 3. More complex non-linear interaction models. Example: including quadratic terms.
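As an illustration of what "including quadratic terms" could look like (a sketch; the slide's own formula is not in the transcript):

```latex
\dot{x}_i = \sum_j a_{ij}\,x_j \;+\; \sum_{j,k} q_{ijk}\,x_j x_k \;+\; \sum_l b_{il}\,u_l \;+\; \xi_i .
```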
18 Our mathematical framework for non-linear gene-protein interactions
19 3. Information processing in sparse hierarchic gene-protein networks. Consider a network as described before with only a few connections (= sparse), in which a few genes/proteins control a considerable fraction of the others (= hierarchic).
20 Information processing in random sparse gene-protein interactions. [Figure: a random sparse network with n = 64, k = 2, and the largest cluster therein.]
21 Information processing in random sparse gene-protein interactions. Now consider the information processing time (= number of iterations) necessary to reach all nodes (proteins) as a function of the number of connections (= number of non-zero elements) in the network.
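A toy simulation makes this measurable (an illustrative sketch, not code from the talk; the function name and parameters are my own): start a signal at one node of a random directed network with on average k outgoing links per node, and count synchronous update steps until no new nodes are reached.

```python
import numpy as np

def processing_time(n, k, rng):
    """Iterations until a signal starting at node 0 stops spreading in a
    random directed network with on average k outgoing links per node.
    Returns (number of iterations, number of nodes reached)."""
    A = (rng.random((n, n)) < k / n).astype(int)   # A[i, j] = 1 means i -> j
    reached = np.zeros(n, dtype=bool)
    reached[0] = True
    steps = 0
    while True:
        # one parallel update: a node is active if any active node links to it
        new = reached | ((reached.astype(int) @ A) > 0)
        steps += 1
        if np.array_equal(new, reached):
            return steps, int(reached.sum())
        reached = new

# Sweep the mean connectivity: sparse networks reach few nodes slowly,
# dense ones reach all nodes in a few steps (the phase transition).
rng = np.random.default_rng(0)
for k in (1, 2, 4, 8):
    t, covered = processing_time(64, k, rng)
    print(f"k={k}: {t} iterations, {covered}/64 nodes reached")
```

Plotting iterations against k for larger n reproduces the slow-to-fast transition the next slide refers to.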
22 [Figure: the phase transition from slow to fast processing.]
24 4. Memory storage in gene-protein networks. Philosophy: information is stored in the network topology (weights, sparsity, hierarchy) and in the system dynamics. * Ben-Hur and Siegelmann: Computation in gene networks, Chaos 14(1), January 2004. * Skarda and Freeman: How brains make chaos in order to make sense of the world, Behavioral and Brain Sciences, Vol. 10, 1987.
25 Memory storage in gene-protein networks. We assume a hierarchic, non-symmetric, and sparse gene-protein network (with k out of n possible connections per node) with linear state-space dynamics. Suppose we want to store M patterns in this network.
26 Linearized form of a subsystem. The first-order linear approximation of the system separates the state vector x and the inputs u.
27 Input-output pattern: the organism has (evolutionarily) learned to react to an external input u (e.g. a toxic agent or a viral infection) with a gene-protein activity x(t). This combination (x, u) is the input-output PATTERN.
28 Memory storage = network reconstruction. Using these definitions, the problem of pattern storage can be mapped onto the *solved* problem of gene network reconstruction with sparse estimation.
29 Information Pattern: Now, suppose that we have M patterns we want to store in the network:
30 Pattern storage: method 1.0. The relation between the desired patterns (state derivatives, states, and inputs) defines constraints on the matrices A and B, which have to be computed.
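In matrix form, these storage constraints presumably read:

```latex
\dot{x}^{(\mu)} = A\,x^{(\mu)} + B\,u^{(\mu)},\quad \mu = 1,\dots,M,
\qquad\text{i.e.}\qquad
\dot{X} = A X + B U,
```

with X and U collecting the M pattern states and inputs as columns.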
31 Pattern storage: method 1.0. Computing the optimal A and B for storing the patterns: the matrices A and B are sparse (most elements are zero). Using techniques from robust/sparse optimization, this can be posed as a sparse estimation problem.
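The optimization problem itself is not reproduced in this transcript. A standard way to impose sparsity is L1-penalized least squares; the sketch below (an illustrative assumption, not the author's actual method; all names and parameter values are hypothetical) recovers a sparse interaction matrix A from noiseless pattern data via iterative shrinkage-thresholding (ISTA).

```python
import numpy as np

def soft_threshold(z, tau):
    """Elementwise soft-thresholding, the proximal operator of the L1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def ista(X, Y, lam=1e-3, iters=2000):
    """Minimize 0.5 * ||W @ X - Y||_F^2 + lam * ||W||_1 over W.
    X: pattern states (n x M), Y: pattern state derivatives (n x M)."""
    step = 1.0 / np.linalg.eigvalsh(X @ X.T).max()   # 1 / Lipschitz constant
    W = np.zeros((Y.shape[0], X.shape[0]))
    for _ in range(iters):
        grad = (W @ X - Y) @ X.T                     # gradient of the LS term
        W = soft_threshold(W - step * grad, step * lam)
    return W

# Hypothetical demo: plant a random sparse "true" A, generate M patterns,
# and recover A from the pattern constraints dX = A X.
rng = np.random.default_rng(1)
n, M, k = 10, 60, 2
A_true = np.zeros((n, n))
for i in range(n):
    cols = rng.choice(n, size=k, replace=False)      # k non-zeros per row
    A_true[i, cols] = rng.standard_normal(k)
X = rng.standard_normal((n, M))                      # stored pattern states
Y = A_true @ X                                       # matching derivatives
A_hat = ista(X, Y)
print(np.max(np.abs(A_hat - A_true)))                # small reconstruction error
```

With more patterns than genes the constraints are overdetermined and the L1 penalty only mildly biases the solution; the interesting regime on the following slides is the opposite one, where sparsity is what makes recovery possible at all.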
32 Number of retrieval errors as a function of the number of nonzero entries k, with M = 150 patterns and N = 50000 genes. There is a first-order phase transition away from error-free memory retrieval at a critical value k_C.
33 Number of retrieval errors versus M, with fixed N = 50000 and k = 10. There is a first-order phase transition to error-free memory retrieval.
34 Critical number of patterns M_crit versus the problem size N.
35 Pattern storage: method 2.0. A pattern corresponds to a converged state of the system, so the state derivative vanishes there. A sparse system Σ = {A, B} is therefore sought that maps the inputs to the patterns {U, X}.
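The convergence condition the slide alludes to is presumably that ẋ = 0 at each stored pattern, which gives:

```latex
A\,x^{(\mu)} + B\,u^{(\mu)} = 0 \quad (\mu = 1,\dots,M)
\qquad\Longleftrightarrow\qquad
A X + B U = 0 .
```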
36 Computing optimal sparse matrices. LP, subject to: 1. the condition for stationary equilibrium; 2. a condition to avoid A = B = 0; 3. avoidance of A = 0 by using degradation of proteins and auto-decay of genes: diag(A) < 0.
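A plausible LP formulation of these three conditions (my reconstruction; the exact objective and normalization on the slide are not in the transcript) is:

```latex
\min_{A,B}\; \sum_{i,j} |a_{ij}| + \sum_{i,l} |b_{il}|
\quad\text{s.t.}\quad
A X + B U = 0,\qquad \operatorname{diag}(A) \le -1 .
```

The diagonal bound encodes protein degradation and gene auto-decay and at the same time rules out the trivial solution A = 0; the absolute values are handled in the usual way by splitting each entry into positive and negative parts.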
37 The sparsity in A and B. The sparsity of the gene-protein interaction matrix A is k_A, the number of non-zero elements in A. Scaled to the size N of A this gives p_A = k_A / N; similarly, for the input coupling B, p_B = k_B / P.
38 Results. [Figure: the estimated matrices A (gene-gene coupling) and B (input-gene coupling).]
39 [Figure: the estimated matrices A (gene-gene coupling) and B (input-gene coupling), further example.]
40 Sparsity versus the number of stored patterns. There are three distinct regions with different 'learning' strategies, separated by order transitions.
41 Sparsity versus the number of stored patterns. Region I: all information is stored exclusively in B (the highest 'order'). Region II: information is preferentially stored in A. Region III: no clear preference for A or B (the highest 'disorder').
42 Sparsity versus the number of stored patterns: region I is 'impulsive', region II 'rational', and region III 'hybrid'.
43 Phase transitions and entropy. The entropy of the macroscopic system relates to the relative fractions of connections p_A and p_B. As A and B are indiscernible, the total entropy is the sum of both contributions.
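The entropy expression itself is missing from the transcript; for a connection that is present with probability p it is presumably the Shannon entropy

```latex
S(p) = -\,p\ln p \;-\; (1-p)\ln(1-p),
\qquad
S_{\text{tot}} = S(p_A) + S(p_B).
```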
44 Information entropy. The entropy of the microscopic system A relates to the degree distribution: the number of connections f_i of node i. Let P(v) be the probability that a given node has v outgoing connections.
45 Information entropy [2]. With P the Laplace distribution, for large networks the average entropy per node converges to a closed-form expression involving Euler's constant γ.
46 Information gain per node. This also allows computing the gain in information entropy when one connection is added. Applying this formalism to our network structure yields the following results.
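Under the macroscopic description above, the gain from one extra connection can be computed directly (an illustrative sketch, assuming the binary Shannon entropy per potential connection; the function names and the p = k/n² convention are my own):

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a single potential connection that is
    present with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

def gain_per_connection(k, n):
    """Entropy gain from adding one connection to an n-node network whose
    interaction matrix A currently has k non-zero entries (p = k / n**2)."""
    size = n * n                      # number of possible entries in A
    return entropy((k + 1) / size) - entropy(k / size)

# In the sparse regime (p well below 1/2) every added connection increases entropy.
print(gain_per_connection(10, 100))
```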
47 Information gain per node. Left: the entropy S for n = 100, p = 30, based on 1180 observations; right: the gain in entropy for the same data set. Again the three learning strategies {impulsive, rational, hybrid} are clearly visible.
48 Relation between sparsities. The relation between p_A = k_A/n and p_B = k_B/p, averaged over 10116 measurements.
49 5. Conclusions. Non-linear time-invariant state-space models of gene-protein networks exhibit a range of complex behaviours when storing input-output patterns in sparse representations. In this model, information processing (= computing) and pattern storage (= learning) exhibit multiple distinct first- and second-order phase transitions. Two second-order phase transitions divide network learning into three distinct regions: 'impulsive', 'rational', and 'hybrid'.
50 Other members of the trans-national University Limburg Bioinformatics Research Team. University of Hasselt (Belgium): Goele Hollanders (PhD student), Geert Jan Bex, Marc Gyssens. University of Maastricht (Netherlands): Stef Zeemering (PhD student), Karl Tuyls, Ralf Peeters.
51 Discussion … Ronald Westra Department of Mathematics Maastricht University