Discrete models of biological networks
Segunda Escuela Argentina de Matemática y Biología, Córdoba, Argentina, June 29, 2007
Reinhard Laubenbacher, Virginia Bioinformatics Institute and Mathematics Department, Virginia Tech

Topics
1. Boolean networks and cellular automata (including probabilistic and sequential BNs)
2. Polynomial dynamical systems over finite fields
3. Logical models
4. Dynamic Bayesian networks

Boolean networks
Definition. Let $f_1, \dots, f_n$ be Boolean functions in the variables $x_1, \dots, x_n$. A Boolean network is a time-discrete dynamical system
$f = (f_1, \dots, f_n) : \{0,1\}^n \to \{0,1\}^n.$
The state space of $f$ is the directed graph with the elements of $\{0,1\}^n$ as nodes and a directed edge $b \to c$ iff $f(b) = c$.

Boolean networks
$f_1 = \text{NOT } x_2$
$f_2 = x_4 \text{ OR } (x_1 \text{ AND } x_3)$
$f_3 = x_4 \text{ AND } x_2$
$f_4 = x_2 \text{ OR } x_3$
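As an illustration (a minimal Python sketch added here, not part of the original slides), this example network can be encoded directly and its full state space enumerated, one outgoing edge $b \to f(b)$ per state:

```python
from itertools import product

def f(x):
    """The example Boolean network on {0,1}^4."""
    x1, x2, x3, x4 = x
    return (
        1 - x2,            # f1 = NOT x2
        x4 | (x1 & x3),    # f2 = x4 OR (x1 AND x3)
        x4 & x2,           # f3 = x4 AND x2
        x2 | x3,           # f4 = x2 OR x3
    )

# The state space: each of the 2^4 states has exactly one outgoing edge.
for b in product((0, 1), repeat=4):
    print(b, "->", f(b))
```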

The phase plane
[Figure: trajectories in the phase plane, with axes “Compound x” and “Compound y,” of the continuous system $dx/dt = f(x, y)$, $dy/dt = g(x, y)$ through an initial point $(x_0, y_0)$.] Courtesy J. Tyson.

Cellular automata
Definition. A 1-dimensional (binary) cellular automaton (CA) is a Boolean network $f$ in which each $f_i$ depends only on some or all of $x_{i-1}, x_i, x_{i+1}$ (indices modulo $n$).
Example. $f_i = x_{i-1} \text{ XOR } x_{i+1}$.

Example
[Figure: evolution of this cellular automaton from a given initial state through time steps $t = 1, \dots, 9$.]

Rule 90 with 5 nodes
$f(x_1, x_2, \dots, x_5) = (x_5 \text{ XOR } x_2,\ x_1 \text{ XOR } x_3,\ \dots,\ x_4 \text{ XOR } x_1)$
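A small Python sketch (added for illustration) that iterates Rule 90 on a ring of $n = 5$ nodes, where cell $i$ updates to $x_{i-1} \text{ XOR } x_{i+1}$ with indices taken modulo $n$; the initial state is an arbitrary choice:

```python
def rule90_step(x):
    """One synchronous update of Rule 90 on a ring of len(x) cells."""
    n = len(x)
    return tuple(x[(i - 1) % n] ^ x[(i + 1) % n] for i in range(n))

state = (0, 0, 1, 0, 0)  # an arbitrary initial state
for t in range(6):
    print(t, state)
    state = rule90_step(state)
```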

Boolean network models in biology
Stuart A. Kauffman, Metabolic stability and epigenesis in randomly constructed genetic nets, J. Theor. Biol. 22 (1969).
Boolean networks as models for genetic regulatory networks:
Nodes = genes; functions = gene regulation.
Variable states: 1 = ON, 0 = OFF.

Polynomial dynamical systems
Note: $\{0, 1\} = k$ has a field structure ($1 + 1 = 0$).
Fact: Any Boolean function in $n$ variables can be expressed uniquely as a polynomial function in $k[x_1, \dots, x_n]/\langle x_i^2 - x_i \rangle$, and conversely.
Proof:
$x \text{ AND } y = xy$
$x \text{ OR } y = x + y + xy$
$\text{NOT } x = x + 1$
($x \text{ XOR } y = x + y$)
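A quick Python check (an illustration added here, not from the slides) that these translations agree with the Boolean operations when arithmetic is done mod 2:

```python
# Verify the Boolean-to-polynomial dictionary over GF(2).
for x in (0, 1):
    for y in (0, 1):
        assert (x and y) == (x * y) % 2          # AND -> xy
        assert (x or y) == (x + y + x * y) % 2   # OR  -> x + y + xy
        assert (not x) == (x + 1) % 2            # NOT -> x + 1
        assert (x ^ y) == (x + y) % 2            # XOR -> x + y
print("All four identities hold over GF(2).")
```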

Polynomial dynamical systems
Let $k$ be a finite field and $f_1, \dots, f_n \in k[x_1, \dots, x_n]$. Then
$f = (f_1, \dots, f_n) : k^n \to k^n$
is an $n$-dimensional polynomial dynamical system over $k$, a natural generalization of Boolean networks.
Fact: Every function $k^n \to k$ can be represented by a polynomial, so all finite dynamical systems $k^n \to k^n$ are polynomial dynamical systems.

Example
$k = \mathbb{F}_3 = \{0, 1, 2\}$, $n = 3$:
$f_1 = x_1 x_2^2 + x_3$, $f_2 = x_2 + x_3$, $f_3 = x_1^2 + x_2^2$.
[Figure: the dependency graph (wiring diagram) of this system.]
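A Python sketch (added for illustration) that iterates this system and prints all 27 transitions of its state space:

```python
from itertools import product

p = 3  # the field F_3

def f(x):
    """The example system over F_3."""
    x1, x2, x3 = x
    return ((x1 * x2**2 + x3) % p, (x2 + x3) % p, (x1**2 + x2**2) % p)

for b in product(range(p), repeat=3):
    print(b, "->", f(b))
```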

Sequential polynomial systems
$k = \mathbb{F}_3 = \{0, 1, 2\}$, $n = 3$:
$f_1 = x_1 x_2^2 + x_3$
$f_2 = x_2 + x_3$
$f_3 = x_1^2 + x_2^2$
Update schedule $\sigma = (2\ 3\ 1)$: first update $f_2$; then $f_3$, using the new value of $x_2$; then $f_1$, using the new values of $x_2$ and $x_3$.
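One pass of this sequential schedule, sketched in Python (an added illustration): updating the state in place means each local update sees the values already refreshed earlier in the pass.

```python
p = 3
local_update = {
    1: lambda x: (x[0] * x[1]**2 + x[2]) % p,  # f1
    2: lambda x: (x[1] + x[2]) % p,            # f2
    3: lambda x: (x[0]**2 + x[1]**2) % p,      # f3
}

def sequential_step(x, schedule=(2, 3, 1)):
    x = list(x)
    for i in schedule:
        x[i - 1] = local_update[i](x)  # overwrite: later updates see the new value
    return tuple(x)

print(sequential_step((1, 2, 0)))  # -> (0, 2, 2)
```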

Sequential systems as biological models
Different regulatory processes happen on different time scales.
Stochastic effects in the cell affect the “update order” of the variables representing different chemical compounds at any given time.
Therefore, sequential update adds a realistic feature to models of regulatory networks.

Stochastic models
Polynomial dynamical systems (PDSs) can be modified:
Choose a random update order at each update step (see Sontag et al. for the Boolean case).
Choose an update function at random from a collection at each update step (see Shmulevich et al. for the Boolean case).

Open mathematical problems
Determine the relationship between the structure of the $f_i$ and the dynamics of the system for special classes of models (see later lectures).
Determine the effect of the update schedule on dynamics.
Develop a categorical framework for (sequential/stochastic) PDSs.
Determine and study a good class of “biologically meaningful” polynomial functions.

Example
A. Jarrah, B. Raposa, and R. Laubenbacher, Nested canalyzing, unate cascade, and polynomial functions, Physica D, in press.

Logical models
E. Snoussi and R. Thomas, Logical identification of all steady states: the concept of feedback loop characteristic states, Bull. Math. Biol. 55 (1993).
Key model features:
Time delays of different lengths for different variables are important.
Positive and negative feedback loops are important.

Model description
Basic structure of logical models:
1. Sets of variables $x_1, \dots, x_n$ and $X_1, \dots, X_n$ ($X_i$ = genes, $x_i$ = gene products, e.g., proteins; a gene product $x$ regulates a gene $Y$ with a certain time delay). Each variable pair $x_i, X_i$ takes on a finite number of distinct states or thresholds (possibly different for different $i$), corresponding to different modes of action of the variables at different concentration levels.

Model description (cont.) 2. A directed weighted graph with the x i as nodes and threshold levels, indicating regulatory relationships and at what levels they occur. Each edge has a sign, indicating activation (+) or inhibition (-). 3. A collection of “logical parameters” which can be used to determine the state transition of a given node for a given configuration of inputs.

Features of logical models
Sophisticated models that include many features of real networks.
Ability to construct continuous models based on the logical model specification.
Models encode intuitive network properties.
Ability to relate structure (+ and − feedback loops) to dynamics (multistationarity, fixed points vs. periodic orbits).

An example
[Figure: a logical model with variables x, y, z.]

Features of logical models
Include many features of real biological networks.
Intuitive but complicated formalism and model description.
Difficult to study as a mathematical object.
Difficult to study the dynamics of larger models.

Dynamic Bayesian networks
Definition. A Bayesian network (BN) is a representation of a joint probability distribution over a set $X_1, \dots, X_n$ of random variables. It consists of:
an acyclic directed graph with the $X_i$ as vertices, in which a directed edge indicates a conditional dependence relation;
a family of conditional distributions for each variable, given its parents in the graph.

An example
[Figure: an example Bayesian network.]

Inference
Bayes’ rule: $P(R = r \mid e) = P(e \mid R = r)\, P(R = r) / P(e)$
Conditional probability: $P(A \mid B) = P(A \cap B) / P(B)$
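A toy numeric illustration of Bayes' rule (the numbers below are made up for this sketch, not from the slides):

```python
# Assumed toy numbers: prior P(R=r) and likelihoods P(e | R).
p_r = 0.3               # P(R = r)
p_e_given_r = 0.8       # P(e | R = r)
p_e_given_not_r = 0.1   # P(e | R != r)

# Total probability of the evidence, then Bayes' rule.
p_e = p_e_given_r * p_r + p_e_given_not_r * (1 - p_r)
posterior = p_e_given_r * p_r / p_e
print(posterior)  # 0.24 / 0.31, approximately 0.774
```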

BN models of gene regulatory networks
BNs can be used to model gene regulatory networks:
random variables $X_i$ ↔ genes;
directed edges ↔ regulatory relationships.
Problem: BNs cannot have directed cycles, and hence cannot model feedback loops.

Dynamic Bayesian networks
Definition. A dynamic Bayesian network (DBN) is a representation of the stochastic evolution of a set of random variables $\{X_i\}$ in discrete time. It has two components:
a directed graph $(V, E)$ encoding conditional dependence relations (as before);
a family of conditional probability distributions $P(X_i(t) \mid Pa_i(t-1))$, where $Pa_i = \{X_j \mid (X_j, X_i) \in E\}$.
(Dojer et al., BMC Bioinformatics 7 (2006))

Dynamic Bayesian networks
DBNs generalize hidden Markov models and linear dynamical systems. They have recently been used for the inference of gene regulatory networks from microarray time-course data.

Summary
Modeling frameworks:
Boolean networks
Polynomial dynamical systems
Logical models
Dynamic Bayesian networks
(Petri nets)

Model inference from data
Goal: Given a set of experimental observations, infer the most likely model of the network that generated the data.
Model framework: polynomial dynamical systems over a finite field.

Data discretization
Step 1: Discretize real-valued data into finitely many states. This is a difficult problem.
E. Dimitrova, P. Vera-Licona, J. McGee, and R. Laubenbacher, Comparison of data discretization methods for inference of biochemical networks.

Model inference from data
Variables $x_1, \dots, x_n$ with values in a finite field $k$.
State transition observations $(s_1, t_1), \dots, (s_r, t_r)$ with $s_j, t_j \in k^n$.
Goal: Identify a collection of “best” dynamical systems $f = (f_1, \dots, f_n) : k^n \to k^n$ such that $f(s_j) = t_j$ for all $j$.

Network inference
Problem: Given $D = \{(s_j, t_j) \in k^n \times k\}$, find the “most likely” model $f : k^n \to k$ such that $f(s_j) = t_j$.
Let $M = \{f : k^n \to k \mid f(s_j) = t_j\}$ be the subset of $k[x_1, \dots, x_n]$ of all possible models for a particular variable.

Network inference
Let $f, g \in M$. Then $f(s_j) = g(s_j)$ for all $j$, so $(f - g)(s_j) = 0$ for all $j$.
Let $I = \{h \in k[x_1, \dots, x_n] \mid h(s_j) = 0 \text{ for all } j\}$, and let $f_0$ be any element of $M$. Then $M = f_0 + I$.
Note that $I$ is an ideal, since it is closed under addition and under multiplication by arbitrary polynomials.
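For concreteness, here is a sketch (with small made-up data over $\mathbb{F}_3$) of one way to produce an element $f_0 \in M$: interpolation with indicator polynomials, where by Fermat's little theorem $\prod_i (1 - (x_i - s_i)^{p-1})$ equals $1$ at the point $s$ and $0$ elsewhere.

```python
from math import prod

p = 3  # work over k = F_3
data = [((0, 1), 2), ((2, 2), 1)]  # toy observations (s_j, t_j)

def f0(x):
    """An interpolating polynomial function with f0(s_j) = t_j for all j."""
    return sum(
        t * prod(1 - pow((xi - si) % p, p - 1, p) for si, xi in zip(s, x))
        for s, t in data
    ) % p

assert all(f0(s) == t for s, t in data)  # f0 is one element of M = f0 + I
```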

Model selection
In the absence of additional network information, choose a “minimal” model $f$ from $M$ ($f$ only reflects relationships among variables that are inherent in the data).
If $f = hg + f'$, with $g \in I$ and $f'$ not divisible by any $r \in I$, then $f'$ is preferable to $f$, because $hg$ vanishes on all $s_j$.

Model selection
Strategy:
1. Compute $f_0 \in M$ and the coset $f_0 + I$.
2. Compute $f \in f_0 + I$ with the property that $f$ is not divisible by any $g \in I$.
Other criteria could be used for model selection: $f$ must contain certain variables and cannot contain others; one could also require certain constraints on the dynamics.

Fundamental computational problem
Given $I$ and $f$, decide whether $f \in I$; if not, compute the remainder of $f$ under “division by $I$.”
This is known as the “ideal membership problem.” It can be solved using Gröbner basis theory.
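As a small illustration (not from the slides), SymPy can compute a Gröbner basis and perform the membership test and division; here over $\mathbb{F}_2$, with the field relations $x_i^2 = x_i$ included among the generators:

```python
from sympy import symbols, groebner

x1, x2 = symbols('x1 x2')
# Ideal generated by the field relations and x1*x2, over GF(2).
G = groebner([x1**2 + x1, x2**2 + x2, x1 * x2], x1, x2,
             modulus=2, order='lex')

f = x1**2 * x2 + x1
coeffs, remainder = G.reduce(f)  # division by the Groebner basis
print(remainder)                 # x1, nonzero: f is not in the ideal
print(G.contains(x1 * x2))       # True: ideal membership test
```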

Wiring diagrams
Goal: Compute all possible minimal wiring diagrams for a given data set.
Wiring diagram:
Vertices = variables.
Edges: $x_i \to x_j$ if $x_i$ is involved in the regulation of $x_j$, that is, if $x_i$ appears in $f_j$.

Wiring diagrams
Problem: Given data $(s_i, t_i)$, $i = 1, \dots, r$ (a collection of state transitions for one node in the network), find all minimal (with respect to inclusion) sets of variables $\{y_1, \dots, y_m\} \subseteq \{x_1, \dots, x_n\}$ such that
$(f_0 + I) \cap k[y_1, \dots, y_m] \neq \emptyset$.
Each such minimal set corresponds to a minimal wiring diagram for the variable under consideration.

The “minimal sets” algorithm
For $a \in k$, let $X_a = \{s_i \mid t_i = a\}$, and let $X = \{X_a \mid a \in k\}$. Then
$f_0 + I = M = \{f \in k[x_1, \dots, x_n] \mid f(p) = a \text{ for all } p \in X_a\}$.
We want to find $f \in M$ which involves a minimal number of variables, i.e., such that there is no $g \in M$ whose support is properly contained in $\mathrm{supp}(f)$.

Example
Let $n = 5$ and $k = \mathbb{F}_5$, with
$(s_1, t_1) = [(3, 0, 0, 0, 0);\ 3]$
$(s_2, t_2) = [(0, 1, 2, 1, 4);\ 1]$
$(s_3, t_3) = [(0, 1, 2, 1, 0);\ 0]$
$(s_4, t_4) = [(0, 1, 2, 1, 1);\ 0]$
$(s_5, t_5) = [(1, 1, 1, 1, 3);\ 4]$
Then $X_0 = \{s_3, s_4\}$, $X_1 = \{s_2\}$, $X_2 = \emptyset$, $X_3 = \{s_1\}$, $X_4 = \{s_5\}$.
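In code (an added sketch), the partition into the sets $X_a$ is immediate:

```python
# The example data over F_5: pairs (s_i, t_i).
data = [
    ((3, 0, 0, 0, 0), 3),
    ((0, 1, 2, 1, 4), 1),
    ((0, 1, 2, 1, 0), 0),
    ((0, 1, 2, 1, 1), 0),
    ((1, 1, 1, 1, 3), 4),
]

# X_a = {s_i | t_i = a}; values a with X_a empty are simply absent.
X = {}
for s, t in data:
    X.setdefault(t, []).append(s)
print(X)  # X[0] = [s3, s4], X[1] = [s2], X[3] = [s1], X[4] = [s5]
```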

The algorithm
Definitions. For $F \subseteq \{1, \dots, n\}$, let $R_F = k[x_i \mid i \in F]$, and let
$\Delta_X = \{F \mid M \cap R_F \neq \emptyset\}$.
For $p \in X_a$, $q \in X_b$, $a \neq b \in k$, let $m(p, q) = \prod_{p_i \neq q_i} x_i$. Let $M_X$ be the monomial ideal in $k[x_1, \dots, x_n]$ generated by the monomials $m(p, q)$ for all $a, b \in k$.
(Note that $\Delta_X$ is a simplicial complex, and $M_X$ is the face ideal of the Alexander dual of $\Delta_X$.)
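Continuing the sketch above, the generating monomials $m(p, q)$ for the example data can be computed directly, with each squarefree monomial represented by its set of variable indices:

```python
from itertools import combinations

# X as computed in the previous sketch (example data over F_5).
X = {
    0: [(0, 1, 2, 1, 0), (0, 1, 2, 1, 1)],
    1: [(0, 1, 2, 1, 4)],
    3: [(3, 0, 0, 0, 0)],
    4: [(1, 1, 1, 1, 3)],
}

def m(p, q):
    """Indices i with p_i != q_i, i.e., the monomial prod of x_i over those i."""
    return frozenset(i + 1 for i, (pi, qi) in enumerate(zip(p, q)) if pi != qi)

generators = {
    m(p, q)
    for (a, Xa), (b, Xb) in combinations(X.items(), 2)  # all pairs a != b
    for p in Xa for q in Xb
}
for g in sorted(generators, key=len):
    print("x" + " x".join(map(str, sorted(g))))
```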

The algorithm
Proposition. A subset $F$ of $\{1, \dots, n\}$ is in $\Delta_X$ if and only if the ideal $\langle x_i \mid i \in F \rangle$ contains the ideal $M_X$.
Proof. Let $F \in \Delta_X$; then $M \cap R_F \neq \emptyset$. Let $p \in X_a$ and $q \in X_b$, with $a \neq b$. There is $f \in k[x_i \mid i \in F]$ such that $f(p) = a$ and $f(q) = b$, so $p$ and $q$ differ in a coordinate $j \in F$. Hence $m(p, q)$ contains $x_j$ as a factor and so lies in $\langle x_i \mid i \in F \rangle$. Therefore, $M_X \subseteq \langle x_i \mid i \in F \rangle$.

The algorithm
Conversely, suppose $M_X \subseteq \langle x_i \mid i \in F \rangle$. Then every generator $m(p, q)$ involves some $x_i$ with $i \in F$; therefore, any $p \in X_a$ and $q \in X_b$ with $a \neq b$ differ in a coordinate $i \in F$. Define $f$ to be the function with $f(p) = a$ for $p \in X_a$, for all $a \in k$, and $f(p) = 0$ otherwise. Then $f \in M$ and depends only on the variables $x_i$, $i \in F$. Hence $f \in M \cap R_F$, so $F \in \Delta_X$. This completes the proof.

The algorithm
Corollary. To find all possible minimal wiring diagrams, we need to find all minimal subsets of variables $\{y_1, \dots, y_m\}$ such that $M_X$ is contained in $\langle y_1, \dots, y_m \rangle$.

Example
Let $M_X = \langle x_1 x_2,\ x_1 x_4,\ x_2 x_3 \rangle$. Then
$M_X = \langle x_1, x_2 x_3 \rangle \cap \langle x_2, x_4 \rangle = \langle x_1, x_2 \rangle \cap \langle x_1, x_3 \rangle \cap \langle x_2, x_4 \rangle$
(the primary decomposition of $M_X$).
Therefore, the collection of minimal wiring diagrams includes $\{x_1, x_2\}$, $\{x_1, x_3\}$, $\{x_2, x_4\}$ (the minimal primes in the primary decomposition).
This can be done algorithmically and is implemented in computer algebra systems.

Model selection
How do we choose a “best” one from this list? Here is an example of a scoring method (for alternative methods, see Jarrah, Laubenbacher, Stigler, and Stillman).
First assign a score to each variable $x_i$, $i = 1, \dots, n$. Then use these scores to assign a score to each minimal variable set, and choose the minimal set with the highest score.

Scoring method
Let $F = \{F_1, \dots, F_t\}$ be the output of the algorithm.
For $s = 1, \dots, n$, let $Z_s$ = the number of sets in $F$ with $s$ elements.
For $i = 1, \dots, n$, let $W_i(s)$ = the number of sets of size $s$ which contain $x_i$.
$S(x_i) := \sum_s W_i(s) / (s Z_s)$, where the sum extends over all $s$ such that $Z_s \neq 0$.
$T(F_j) := \prod_{x_i \in F_j} S(x_i)$.
Normalizing the $T(F_j)$ gives a probability distribution on the collection $F$ of minimal variable sets.
Note: this scoring method has a bias toward small sets.
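A direct Python transcription of this scoring method (added as an illustration; it reproduces the example that follows):

```python
from collections import Counter

def score(F, n):
    """S(x_i) and normalized T(F_j) for a list F of minimal index sets."""
    Z = Counter(len(Fj) for Fj in F)  # Z_s = number of sets of size s
    S = {
        i: sum(sum(1 for Fj in F if len(Fj) == s and i in Fj) / (s * Z[s])
               for s in Z)            # S(x_i) = sum_s W_i(s) / (s Z_s)
        for i in range(1, n + 1)
    }
    T = {}
    for Fj in F:
        t = 1.0
        for i in Fj:
            t *= S[i]                 # T(F_j) = product of S(x_i) over x_i in F_j
        T[frozenset(Fj)] = t
    total = sum(T.values())           # normalize to a probability distribution
    return S, {Fj: t / total for Fj, t in T.items()}

S, T = score([{1, 2}, {1, 3}, {2, 4}], n=4)
print(S)  # S(x1) = S(x2) = 1/3, S(x3) = S(x4) = 1/6
```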

Example F 1 = {x 1, x 2 }, F 2 = {x 1, x 3 }, F 3 = {x 2, x 4 } Z 1 =0, Z 2 =3, Z 3 =Z 4 =0; W 1 (2) = 2, W 2 (2) = 2, W 3 (2) = 1, W 4 (2) = 1. S(x 1 ) = 2/2·3 = 1/3 = S(x 2 ), S(x 3 ) = 1/2·3 = 1/6 = S(x 4 ). T(F 1 ) = 1/9, T(F 2 ) = 1/18 = T(F 3 ).

Example with data
Minimal sets returned by the minimal-sets algorithm for each variable:
x1: {{x1, x3}, {x1, x2, x4}, {x2, x3, x4}}
x2: {{x1}, {x2, x3}}
x3: {{x1, x3}, {x1, x2, x4}, {x2, x3, x4}}
x4: {{x1, x3}, {x2, x3}, {x1, x2, x4}}

Example with data
Consider the variable sets for variable x1: $F_1 = \{x_1, x_3\}$, $F_2 = \{x_1, x_2, x_4\}$, $F_3 = \{x_2, x_3, x_4\}$. Here $Z_2 = 1$ and $Z_3 = 2$, so
$S(x_1) = 1/(2 \cdot 1) + 1/(3 \cdot 2) = 2/3$
$S(x_2) = 2/(3 \cdot 2) = 1/3$
$S(x_3) = 1/(2 \cdot 1) + 1/(3 \cdot 2) = 2/3$
$S(x_4) = 2/(3 \cdot 2) = 1/3$
$T(F_1) = (2/3)(2/3) = 4/9$ (winner)
$T(F_2) = (2/3)(1/3)(1/3) = 2/27$
$T(F_3) = (1/3)(2/3)(1/3) = 2/27$
Highest scoring set(s) for each variable:
x1: {x1, x3}
x2: {x1}
x3: {x1, x3}
x4: {x2, x3} and {x1, x3} (tie)

Method validation: the segment polarity network in the fruit fly
Network in the cell: 21 genes and proteins.
Albert-Othmer model: 21 Boolean functions; 44 known interactions.
Time series data: generated wild-type and knockout series; fewer than 0.01% of the $2^{21}$ total states.
Minimal sets algorithm: recovered 89% of the interactions, with 0 false positives and 5 false negatives; the PDS method identified 19 of the 21 functions.
R. Albert and H. Othmer, The topology of the regulatory interactions predicts the expression pattern of the segment polarity genes in Drosophila melanogaster, J. Theor. Biol.

Method validation: simulated gene network
Pandapas network: 10 genes, 3 external biochemicals, 17 interactions.
Time series data: 9 time points; 8 time series generated for wild type and for knockouts of G1, G2, G5 (192 data points; G6 and G9 constant).
Data discretization: 5 states per node; 95 data points after a 49% reduction, fewer than 0.00001% of the $5^{13}$ total states.

Minimal sets algorithm: recovered 77% of the interactions and identified the targets of P2 and P3 (x12, x13), with 11 false positives and 4 false negatives.
[Figure: the Pandapas network (left) and the reverse-engineered network (right).]

Summary
An algorithmic method to find all possible minimal wiring diagrams for a given data set:
Finds all possible minimal sets of variables for which there exists a PDS consistent with the data.
Provides a statistical measure to select the most likely wiring diagram(s).
The algorithm can be used as a preprocessing step for the previous algorithm, which finds actual dynamical models; it improves performance by reducing the number of variables to be considered.

Optimization
Goal: For a given data set, select a model from $M$ which is optimal with respect to:
model complexity;
properties of the wiring diagram;
expected dynamic properties.

[Figure: two pairwise alignments of orthologous sequences from D. melanogaster (dm2.chr2L) and D. pseudoobscura (dp3.chr4_group3).]
Each alignment can be summarized by counting the number of matches (#M), mismatches (#X), gaps (#G), and spaces (#S):
Alignment 1: #M = 31, #X = 22, #G = 3, #S = 12
Alignment 2: #M = 27, #X = 18, #G = 3, #S = 28
Since 2(#M + #X) + #S = 112, the counts #X, #G, and #S suffice to specify a summary.
This notation follows Chapter 7 (Parametric Sequence Alignment) by Colin Dewey and Kevin Woods in the book Algebraic Statistics for Computational Biology. Courtesy Lior Pachter.

For the sequences
>mel CTGCGGGATTAGGGGTCATTAGAGTGCCGAAAAGCGAGTTTATTCTATGGAC
>pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCGTGTAC
there are eight alignments with the summary #X = 24, #S = 10, #G = 2.
[Figure: the alignment polytope of these two sequences.] Courtesy Lior Pachter.

Parametric sequence alignment
Choose parameters $a, b, c$ and minimize the linear functional $f(M, G, S) = aM + bG + cS$ over the convex polytope spanned by the summaries of all possible alignments of the two sequences.
Theorem (Pachter, Sturmfels). This polytope can be described as the Minkowski sum of the Newton polytopes of a collection of polynomials.

The dynamotope (joint work with A. Jarrah, B. Sturmfels, P. Vera-Licona)
Define the summary $(S_1, S_2, S_3, S_4)$ of a polynomial model $g \in f + I$:
$S_1 = w_1 \cdot (u_1, u_2, u_3, \dots)$, where $u_i$ is the number of limit cycles of length $i$ and $w_1$ is a suitably chosen weight vector;
$S_2 = w_2 \cdot (v_1, v_2, v_3, \dots)$, where $v_i$ is the number of trees of height $i$ and $w_2$ is a suitably chosen weight vector;
$S_3$ = the number of edges in the dependency graph of $g$;
$S_4$ = the “complexity” of $g$ (including the complexity of the polynomials $g_i$ and the “distance to being a normal form”).

[Figure: the state space of an example model.]
Let $w_1$ and $w_2$ be $(1, 1, \dots)$. Then
$S_1 = (1, 1, \dots) \cdot (1, 1, 0, 1) = 3$ (limit cycles)
$S_2 = (1, 1, \dots) \cdot (0, 0, 1, 1, 1) = 3$ (trees)

Optimization
Choose parameters $a, b, c, d$ and minimize the linear functional
$F = a S_1 + b S_2 + c S_3 + d S_4$
over the convex polytope (the dynamotope) spanned by all summaries $(S_1, S_2, S_3, S_4)$ of models in $f + I$.

Optimization
Problem: We do not know how to describe this polytope explicitly.
Solution: Combinatorial optimization using an evolutionary algorithm.

Evolutionary algorithm
For $f = (f_1, \dots, f_n)$:
Gene = $f_i$
Chromosome = $f$
Genotype = $\{f\}$

Evolution
Step 1: Choose an initial genotype.
Step 2: Use mutation and cross-over of the fittest models (with respect to the linear functional $F$) to compute the next-generation genotype.
Step 3: Iterate many times.
Step 4: Choose a local/global minimum (if found).
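Schematically, the loop looks like the following sketch (all function names here are placeholders: `fitness` would evaluate the functional $F$ on a model's summary, and `mutate`/`crossover` would act on chromosomes):

```python
import random

def evolve(population, fitness, mutate, crossover,
           generations=100, n_fittest=10):
    """Generic evolutionary loop: minimize `fitness` over the population."""
    for _ in range(generations):
        population.sort(key=fitness)          # fittest models first
        fittest = population[:n_fittest]
        offspring = [
            mutate(crossover(random.choice(fittest), random.choice(fittest)))
            for _ in range(len(population) - n_fittest)
        ]
        population = fittest + offspring      # next-generation genotype
    return min(population, key=fitness)       # best (local) minimum found
```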

Future work
Optimal parameter choices for different biological problems.
Further validation of the algorithm with real and simulated data sets.
Characterize the dynamotope computationally.
Study optimal experimental design for this type of network inference.