Programming Languages for Biology Bor-Yuh Evan Chang November 25, 2003 OSQ Group Meeting
2 11/25/2003 Biological Perspective
  [Figure from Matsudaira et al. Molecular Cell Biology 4.0. Freeman, 2000]
Traditional Biological Research
  Experiments must focus on a small, specific piece of a system
    – isolate the variable
    – feasibility
  Have led to an enormous wealth of (detailed) knowledge but in a fragmented form
  [Figure labels: Cell Receptor Expert, Virus Expert]
Systems Biology
  Emerging area of biology
    – study of the relationships and interactions between biological components
    – many thousands of molecules interact in a complex series of reactions to perform some function (called a pathway)
      e.g., lactose interacting with a receptor triggers a series of actions to create the enzyme capable of breaking it down into a usable form
    – “pathways” may overlap
Approaching Systems Biology
  Need a common language for describing/modeling all components of a system
    – must be modular, compositional, and provide varying levels of abstraction
  Abstraction is an absolute necessity
    – 1 ribosome (eukaryotic) ≈ 82 proteins + rRNA
      1 protein ≈ hundreds/thousands of amino acids
    – 1 membrane ≈ thousands of molecules (lipids, proteins, carbohydrates)
The Biologist’s View
  How do biologists think about or view biological entities (e.g., proteins)?
    – an entity can interact with certain other types of entities
    – an entity can be in a certain “state”
    – interaction causes some action or state change
  Analogous to a system of thousands of concurrent computational processes
    – Walter Fontana, a theoretical biologist, examined the λ-calculus and linear logic for describing biological systems (≈ 1995).
Example “Textbook” Description
Our Role
  Finding suitable abstractions for describing computation is our specialty!
  Discovering/proving/checking properties of such descriptions (i.e., programs) is also our specialty!
  Goal:
    – Find a mathematical abstraction convenient for describing, reasoning about, and simulating biological systems
      DNA → string over the alphabet {A,C,G,T}
        – enables the use of string comparison algorithms
      Cellular Pathways → ?
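As a small illustration of what the string abstraction buys, the sketch below treats DNA as a Python string over {A,C,G,T} and applies one simple string comparison, the Hamming distance. The function names `is_dna` and `hamming_distance` are illustrative choices, not something from the talk, and Hamming distance is only one of many comparison algorithms that could be used here.

```python
# Minimal sketch: DNA as a string over the alphabet {A, C, G, T}.
# The helper names below are hypothetical, chosen for illustration.

DNA_ALPHABET = frozenset("ACGT")


def is_dna(seq: str) -> bool:
    """Return True if every character of seq is in {A, C, G, T}."""
    return all(base in DNA_ALPHABET for base in seq)


def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    s1, s2 = "GATTACA", "GACTATA"
    print(is_dna(s1), is_dna(s2))
    print(hamming_distance(s1, s2))
```

Once the abstraction is fixed, any general-purpose string algorithm applies unchanged; the open question on the slide is what plays the same role for cellular pathways.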
Outline
  Why is PL at all related to Biology?
  Previous Abstractions in Biology
  Possible Directions of Work
  PML
  Conclusion
Previous Abstractions
  Chemical kinetic models
    – can derive differential equations
    – well-studied, with considerable theoretical basis
    – variables do not directly correspond with biological entities
    – may become difficult to see how multiple equations relate to each other
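To make "can derive differential equations" concrete, here is a worked example under stated assumptions: a single enzyme E binds a substrate S reversibly with rates k and k_{-1}, and the complex ES releases a product P with rate k_{cat} (the rate names match the labels on the enzyme diagram later in the talk; the reaction scheme itself is the standard one and is assumed here). Mass-action kinetics then gives:

```latex
% Assumed scheme:  E + S <-> ES (rates k, k_{-1}),   ES -> E + P (rate k_{cat})
\begin{aligned}
\frac{d[S]}{dt}  &= -k\,[E][S] + k_{-1}\,[ES] \\
\frac{d[E]}{dt}  &= -k\,[E][S] + (k_{-1} + k_{cat})\,[ES] \\
\frac{d[ES]}{dt} &= \phantom{-}k\,[E][S] - (k_{-1} + k_{cat})\,[ES] \\
\frac{d[P]}{dt}  &= \phantom{-}k_{cat}\,[ES]
\end{aligned}
```

Even this one reaction needs four coupled equations, and the variables are concentrations rather than the entities themselves, which is the drawback the last two bullets point at.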
Previous Abstractions
  Pathway Databases (e.g., EcoCyc, KEGG)
    – store information in a symbolic form and provide ways to query the database
    – behavior of biological entities not directly described
  Petri nets
    – directed bipartite multigraph (P,T,E) of places, transitions, and edges; places contain tokens
    – place = molecular species, token = molecule, transition = reaction
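The Petri net reading above (place = molecular species, token = molecule, transition = reaction) can be sketched in a few lines. This is a minimal, untimed sketch; the class name `PetriNet`, its methods, and the enzyme example are assumptions made for illustration, not part of the talk.

```python
from collections import Counter


class PetriNet:
    """Places hold token counts; a transition consumes and produces tokens."""

    def __init__(self, marking):
        self.marking = Counter(marking)  # place (species) -> tokens (molecules)
        self.transitions = {}            # name -> (input counts, output counts)

    def add_transition(self, name, inputs, outputs):
        self.transitions[name] = (Counter(inputs), Counter(outputs))

    def enabled(self, name):
        """A transition is enabled if every input place has enough tokens."""
        inputs, _ = self.transitions[name]
        return all(self.marking[place] >= n for place, n in inputs.items())

    def fire(self, name):
        """Fire one transition: remove input tokens, add output tokens."""
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        self.marking.subtract(inputs)
        self.marking.update(outputs)


if __name__ == "__main__":
    # Hypothetical example: one enzyme molecule E, two substrate molecules S.
    net = PetriNet({"E": 1, "S": 2})
    net.add_transition("bind", {"E": 1, "S": 1}, {"ES": 1})
    net.add_transition("catalyze", {"ES": 1}, {"E": 1, "P": 1})
    net.fire("bind")
    net.fire("catalyze")
    print(dict(net.marking))
```

Edge multiplicities (the "multigraph" part) are represented by the integer counts in `inputs` and `outputs`.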
Previous Abstractions
  Concurrent computational processes
    – each biological entity is a process that may carry some state and interacts with other processes
    – each process described by a “program”
    – prior proposals based on process algebras, such as the π-calculus [Regev et al. ’01]
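The "entity = process" view can be sketched with Python generators: each process is a small program that repeatedly offers to send or receive on a named channel, and a scheduler lets one sender and one receiver on the same channel synchronize. This is only a rough sketch of channel synchronization; it omits the name passing that gives the π-calculus its power, and the process definitions, the scheduler `run`, and the use of `bind_substrate` / `bind_product` as channel names are assumptions for illustration.

```python
def enzyme():
    """Enzyme process: bind a substrate, release a product, repeat."""
    while True:
        yield ("recv", "bind_substrate")
        yield ("send", "bind_product")


def protein():
    """Protein process: bind to an enzyme once, then be released."""
    yield ("send", "bind_substrate")
    yield ("recv", "bind_product")


def find_pair(pending):
    """Find a (sender, receiver, channel) whose offers match, if any."""
    for s, (kind_s, chan_s) in pending.items():
        if kind_s != "send":
            continue
        for r, (kind_r, chan_r) in pending.items():
            if r != s and kind_r == "recv" and chan_r == chan_s:
                return s, r, chan_s
    return None


def run(processes):
    """Synchronize matching offers until no communication is possible."""
    pending = {}
    for name, proc in processes.items():
        try:
            pending[name] = next(proc)
        except StopIteration:
            pass
    trace = []
    while True:
        pair = find_pair(pending)
        if pair is None:
            return trace
        sender, receiver, chan = pair
        trace.append((chan, sender, receiver))
        # Both participants advance to their next offer (or terminate).
        for name in (sender, receiver):
            try:
                pending[name] = next(processes[name])
            except StopIteration:
                del pending[name]


if __name__ == "__main__":
    for step in run({"Enzyme": enzyme(), "Protein": protein()}):
        print(step)
```

The scheduler here is deterministic (first match in insertion order); a stochastic choice among enabled pairs would be closer to how such models are usually simulated.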
Possible Directions of Work
  Biologically-motivated “process calculi”
    – finding a suitable machine model to serve as a common basis for describing biological systems
    – Cardelli, Danos, Laneve, …
  High-level languages
    – find suitable high-level languages to make descriptions closer to informal ones
    – [Chang and Sridharan ’03]
  Program analyses, simulation, and other tools
    – simulation will likely be insufficient
  Creating models for obtaining results in biology
Outline
  Why is PL at all related to Biology?
  Previous Abstractions in Biology
  Possible Directions of Work
  PML
  Conclusion
Modeling in the π-calculus
  The π-calculus is concise and compact, yet powerful [Milner ’90]
    – take this as the underlying machine model
    – not looking for another machine model
  However, it is far too low-level for direct modeling (ad-hoc structuring)
Informal Graphical Diagrams
  [Figure: reaction diagram with labels Protein, Enzyme, Protein-Enzyme complex and rates k, k_-1, k_cat; callouts: sites, domains, rules]
PML: Enzyme
  [Code figure: Enzyme with bind_substrate; annotations: parameterized, declared in outer scope, interactions within the complex]
PML: Protein
  [Code figure: Protein with bind_substrate and bind_product]
PML: A Simple System
Larger Models
  Modeled a general description of ER cotranslational-translocation
    – unclearly or incompletely specified aspects became apparent
      e.g., can the signal sequence and translocon bind without SRP? Yes [Herskovits and Bibi ’00]
  Extended to model targeting ER membrane with minor modifications
PML: Summary
  Domains
    – set of mutually dependent binding sites
    – defines at the lowest level the reactions a biological entity can undergo
  Groups
    – static structure for controlling namespace
    – may represent a large biological entity (large complex, a system, etc.)
  [Compartments]
    – special groups that define boundaries
  Semantics defined via a translation to the π-calculus
PML: Summary
  Benefits
    – easier to write and understand because of a more direct biological metaphor
    – block structure for controlling namespace and modularity
  Future Work
    – naming?
    – proximity of molecules
    – integrating quantitative information (reaction rates, etc.)
    – type-checking PML specifications
    – exceptional / higher-level specifications
    – graphical and simulation tools
Conclusion
  Systems biology needs a mathematical foundation
    – languages for describing concurrent computation seem like a step in the right direction
  Status: all very preliminary
    – biologically-motivated process calculi: BioSPI, BioAmbients, Brane Calculus, …
    – high-level languages: PML
    – analyses and tools (emerging)
    – creating models for results in biology (emerging)
Conclusion
  Abundance of new challenges for PL
    – language design: biologically-motivated operators
    – analysis and simulation: dealing with the scale
    – …
  How much biology does one need to learn to begin?
Bonus Slides
Compartments
Compartments
  Critical part of biological pathways
    – prevents interactions that would otherwise occur
  Description of the behavior of a molecule should not depend on the compartment
  Regev et al. use “private” channels in the π-calculus for both complexing and compartmentalization
PML: Simple Compartments Example
  [Figure labels: MolA, MolB, bind_a]
PML: Simple Compartments Example
  [Figure labels: MolA, MolB, ER, Cytosol, CytERBridge]
PML: Simple Compartments Example
  [Figure labels: MolB, ER, Cytosol, CytERBridge, MolA]
Semantics of PML
Semantics of PML
  Defined in terms of the π-calculus via two translations
    – from PML to CorePML
      “flattens” compartments, removes bridges
Semantics of PML
    – from CorePML to the π-calculus
Syntax of PML
Syntax of PML
Syntax of PML
Example: Cotranslational Translocation
Example: Cotranslational Translocation
  Ribosome translates mRNA, exposing a signal sequence
  Signal sequence attracts SRP, stopping translation
  SRP receptor (on ER membrane) attracts SRP
  Signal sequence interacts with translocon; SRP dissociates, resuming translation
  Signal peptidase cleaves the signal sequence in the ER lumen; Hsc70 chaperones aid in protein folding
Example: Cotranslational Translocation