Developing Standards: Case Studies Herbert M Sauro blog.analogmachine.org Dept. of Bioengineering University of Washington, Seattle, WA 1
Importance of Standards Imagine a world where: each company made its own incompatible nuts, bolts and screws; every town had its own way to measure time; every internet provider used different protocols for the TCP/IP stack, the web, and so on. Standards are vital for the normal functioning of society. 2
At least two ways to start a standard: 1. Top-down: institutionalized stick and carrot 2. Grass Roots 3
Two Examples SBML: Systems Biology Markup Language SBOL: Synthetic Biology Open Language 4
Simulation of Computational Models Simulation 5
Why? Study Perturbations Change the activity of a Protein, e.g. P53 by adding an inhibitor What effect does this have on Cell death and/or proliferation? Apoptosis There may be multiple paths or multiple effects 6
How it started: SCAMP and Gepasi (1980s/90s) 7
Exchange of Computational Models In 1999/2000 a project was started at Caltech with initial funding from Japan to devise an interchange language: SBML: Systems Biology Markup Language 8
SBML: Systems Biology Markup Language. Used to represent homogeneous multi-compartmental biochemical systems. 9
SBML in a Nutshell “Systems Biology Markup Language” A machine-readable format for representing computational models in systems biology Domain: systems of biochemical reactions Specified using XML Components in SBML reflect the natural conceptual constructs of the domain Now over 200 tools use SBML 10
SBML in a Nutshell “Systems Biology Markup Language”: Simple Compartments (well stirred reactor); Internal/External Species; Reaction Schemes; Global Parameters; Arbitrary Rate Laws; DAEs (ODE + Algebraic functions, Constraints); Physical Units/Model Notes; Annotation (extension capability); Events 11
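To make the list of constructs more concrete, here is a minimal sketch that assembles an SBML-like document with Python's standard library. The model, the ids (`toy_model`, `cell`, `S1`, `S2`, `k1`, `J1`) and the helper names are illustrative assumptions, the Level 3 Version 1 core namespace is assumed, and the output is not validated against the SBML schema (required attributes and the kinetic law are omitted).

```python
import xml.etree.ElementTree as ET

# SBML Level 3 Version 1 core namespace (assumed for this illustration).
SBML_NS = "http://www.sbml.org/sbml/level3/version1/core"


def q(name: str) -> str:
    """Return the namespace-qualified tag name used by ElementTree."""
    return "{%s}%s" % (SBML_NS, name)


def build_minimal_model() -> ET.Element:
    """Build a toy document: one compartment, two species, one parameter, one reaction."""
    ET.register_namespace("", SBML_NS)
    sbml = ET.Element(q("sbml"), {"level": "3", "version": "1"})
    model = ET.SubElement(sbml, q("model"), {"id": "toy_model"})

    compartments = ET.SubElement(model, q("listOfCompartments"))
    ET.SubElement(compartments, q("compartment"), {"id": "cell", "size": "1"})

    species = ET.SubElement(model, q("listOfSpecies"))
    for sid, amount in (("S1", "10"), ("S2", "0")):
        ET.SubElement(
            species,
            q("species"),
            {"id": sid, "compartment": "cell", "initialAmount": amount},
        )

    parameters = ET.SubElement(model, q("listOfParameters"))
    ET.SubElement(parameters, q("parameter"), {"id": "k1", "value": "0.1"})

    # Reaction scheme S1 -> S2; the rate law (MathML) is left out of this sketch.
    reactions = ET.SubElement(model, q("listOfReactions"))
    reaction = ET.SubElement(reactions, q("reaction"), {"id": "J1", "reversible": "false"})
    reactants = ET.SubElement(reaction, q("listOfReactants"))
    ET.SubElement(reactants, q("speciesReference"), {"species": "S1", "stoichiometry": "1"})
    products = ET.SubElement(reaction, q("listOfProducts"))
    ET.SubElement(products, q("speciesReference"), {"species": "S2", "stoichiometry": "1"})
    return sbml


def species_ids(root: ET.Element) -> list:
    """List species ids in document order."""
    return [s.get("id") for s in root.iter(q("species"))]
```

Calling `ET.tostring(build_minimal_model(), encoding="unicode")` gives the XML text; a real tool would normally rely on a dedicated SBML library rather than hand-built XML.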
SBML – Systems Biology Markup Language 12
Model Exchange Standards: SBML, CellML. SBML is primarily a way to describe the biology of cellular networks, from which the mathematical models can be automatically derived. CellML is a math-based description from which the underlying biology can be inferred. 13
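One way to picture "automatically derived": given reactions with stoichiometries and rate laws, the rate of change of each species is the sum of the reaction rates weighted by stoichiometry. The sketch below is a hypothetical illustration, not how any particular SBML tool works; the pathway, the rate constants and the names `derivatives` and `TOY_PATHWAY` are assumptions.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]
# A reaction: (reactant stoichiometry, product stoichiometry, rate law).
Reaction = Tuple[Dict[str, float], Dict[str, float], Callable[[State], float]]


def derivatives(reactions: List[Reaction], state: State) -> State:
    """Derive dX/dt for every species from the reaction network."""
    dxdt = {name: 0.0 for name in state}
    for reactants, products, rate_law in reactions:
        v = rate_law(state)  # reaction rate at the current state
        for name, n in reactants.items():
            dxdt[name] -= n * v  # reactants are consumed
        for name, n in products.items():
            dxdt[name] += n * v  # products are produced
    return dxdt


# Hypothetical pathway S1 -> S2 -> S3 with mass-action kinetics.
TOY_PATHWAY: List[Reaction] = [
    ({"S1": 1}, {"S2": 1}, lambda s: 0.5 * s["S1"]),
    ({"S2": 1}, {"S3": 1}, lambda s: 0.2 * s["S2"]),
]
```

The biology (who reacts with whom) is stated once, and the differential equations follow mechanically, which is the direction SBML takes; CellML starts from the equations instead.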
There are many modeling software tools that use SBML 14
SBML Ecosystem: Databases; Unambiguous Model Exchange; Semantic Annotations; Simulator Comparison and Compliance; Journals; Diagrams; SEDML: Simulation Experiment Description Markup Language; SBGN: Systems Biology Graphical Notation 15
Model repositories BioModels.net As of Sep 2011: 366 curated models 398 uncurated models. Nicolas Le Novere 16
MIRIAM: Minimum Information Requested in the Annotation of biochemical Models MIRIAM is not a file format but a minimum specification on how a model should be made available to the community: Reference correspondence – encoding a model in a recognized public standardized machine-readable format. Attribution annotation - A model has to provide the citation of the reference description, lists its creators, and be attached to some terms of distribution. External resource annotation - each component of a model must be annotated to allow its unambiguous identification. 17
Semantic Annotations 1. SBO: Systems Biology Ontology (Quantitative terms) 2. MIASE: The Minimum Information About a Simulation Experiment 3. TEDDY: The Terminology for the Description of Dynamics 4. KiSAO: Simulation Algorithm Ontology 5. Missing: An audit trail of a modeling process. 18
SBO: Systems Biology Ontology 1.[Term] id: SBO: name: quantitative parameter def: "A number representing a quantity that defines certain characteristics of systems or functions. A parameter may be part of a calculation, but its value is not determined by the form of the equation itself, and may be arbitrarily assigned." [] relationship: part of SBO: ! Systems Biology Ontology 2.[Term] id: SBO: name: mass action kinetics def: "The Law of Mass Action, first expressed by Waage and Guldberg in 1864 (Waage, P., Guldberg, C. M. Forhandlinger: Videnskabs-Selskabet i Christiana 1864, 35) states that…..." [] is a: SBO: ! rate law. Terms can be queried programmatically via a web service 19
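The `[Term]` stanzas above follow the OBO flat-file layout, which is simple enough to read with a few lines of code. This is a minimal sketch of a local parser, not the web service mentioned on the slide; the ids `SBO:EXAMPLE1` to `SBO:EXAMPLE3` are placeholders (the real identifiers are not shown here), and quoted text containing ` ! ` is not handled.

```python
from typing import Dict, List

# Placeholder stanzas; the SBO ids are hypothetical stand-ins.
EXAMPLE_OBO = """
[Term]
id: SBO:EXAMPLE1
name: quantitative parameter

[Term]
id: SBO:EXAMPLE2
name: mass action kinetics
is_a: SBO:EXAMPLE3 ! rate law
"""


def parse_obo_terms(text: str) -> List[Dict[str, List[str]]]:
    """Collect the tag/value pairs of each [Term] stanza."""
    terms: List[Dict[str, List[str]]] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "[Term]":
            current = {}
            terms.append(current)
        elif line.startswith("["):
            current = None  # ignore other stanza types
        elif current is not None and ": " in line:
            key, value = line.split(": ", 1)
            value = value.split(" ! ", 1)[0].strip()  # drop trailing comment
            current.setdefault(key, []).append(value)
    return terms
```

Each term comes back as a dictionary of lists, so repeated tags such as several `is_a` lines are preserved.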
Systems Biology Ontology in SBML continuous framework substrate product enzyme Michaelis constant catalytic rate constant Briggs-Haldane equation European Bioinformatics Institute 20
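The roles named on this slide (substrate, enzyme, Michaelis constant, catalytic rate constant) are exactly the quantities that appear in the irreversible Briggs-Haldane (Michaelis-Menten form) rate law, v = k_cat · E · S / (K_m + S). A minimal sketch, with the function and parameter names chosen here for illustration:

```python
def briggs_haldane_rate(substrate: float, enzyme: float, k_cat: float, k_m: float) -> float:
    """Irreversible rate v = k_cat * E * S / (K_m + S).

    substrate: substrate concentration S
    enzyme:    total enzyme concentration E
    k_cat:     catalytic rate constant
    k_m:       Michaelis constant
    """
    if k_m <= 0:
        raise ValueError("Michaelis constant must be positive")
    return k_cat * enzyme * substrate / (k_m + substrate)
```

Annotating each symbol with its ontology term lets software recognise the rate law and its parameters without guessing from variable names.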
Application: Simulator Compliance SBML Compliance 21
The Results 22
Other Proposed Standards Standardizing the diagrammatic notation 23
What we all learned 24
Fact: Developing a standard has both technical as well as sociological challenges. The sociological challenges may be greater. :( 25
Rule #1: There must be a problem (i.e., an actual need) that a particular community wants to solve. Clear scope. Covers what is needed. Doesn’t force you to deal with things that are not needed. 26
Rule #2: Building a community from day one is of the utmost importance. Build Trust Build Consensus Build Enthusiasm Build Ownership 27
Rule #3: For a standard to succeed, the central players must provide tools and documentation to help the community use the standard. Easy to implement Low ‘buy in’ cost 28
Rule #4: The process is long and drawn out, far beyond the normal patience of review panels and funding agencies. 29
Summary. Initial cost of SBML development: the initial version was funded by JST (roughly 250K direct per year for three years). Could probably get by with 150K direct. This funds a core team which is involved in: 1. Documentation 2. Organizing two workshops per year 3. Developing the initial source libraries 4. Developing a governance model 5. Following discussions on mailing lists/workshops to address the needs of the community 6. Maintaining civility during discussions! 30
Centralized development of supporting software libraries: 1) Prevented the standard from diverging. 2) As extensions or modifications were agreed to by the community, it was relatively easy for platform developers to incorporate the changes into their software. 3) Software was developed in C/C++ to make the library cross-language (Java came later). 31
Current work of my group: Model Reproducibility SBML SEDML Simulation Tool Biology Data SEDML: What you did with the model 32
Synthetic Biology 33
Synthetic biology “The design and construction of new biological entities such as enzymes, genetic circuits, cells, and organs or the redesign of existing biological systems.” Drew Endy (Stanford) 34
The Immediate Need: Take any current publication on a synthetic circuit and try to reproduce it. Let me know how you get on. 35
The long-term vision: Design, Build, Test. [Figure: workflow of Specification, Design, Build, Testing/Analysis; plot of GFP (RFU) against time.] 36
Synthetic Biology Open Language (SBOL). [Figure: Synthetic Biologist A describes a new device and sends it to Synthetic Biologist B, who engineers and fabricates it. Layers shown: SBOL Semantic (sequence annotation of DNA component B0015: annotation 1-80, Terminator B0010, BioBrick Scar, Terminator B0012) and SBOL visual (DNA components).] 37
Some History: The synthetic biology standardization effort was started with a grant from Microsoft in 2008 (100K). The first meeting was held in Seattle. The first draft proposal was called PoBoL but has since been renamed SBOL: Synthetic Biology Open Language. We have (somehow) managed to organize two meetings a year since 2008; the next one is in Jan 2012 in Seattle. 38
Overall Aim of the Standardization Effort: To support the synthetic biology workflow: 1. Laboratory parts management 2. Simulation/Analysis 3. Design 4. Codon optimization 5. Assembly 6. Repositories (preferably distributed) 39
Overall Aim of the Standardization Effort. Specifically: To allow researchers to electronically exchange designs with round-tripping. To send designs to bio-fabrication centers for assembly. To allow storage of designs in repositories and for publication purposes. 40
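Round-tripping means a design written out by one tool and read back by another comes back unchanged. The sketch below illustrates the idea only: it uses JSON and two hypothetical classes (`Annotation`, `DnaComponent`) as stand-ins, not the actual SBOL data model or serialization, and the example coordinates are illustrative.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Annotation:
    start: int
    end: int
    feature: str  # id of the annotated sub-component


@dataclass
class DnaComponent:
    display_id: str
    type: str
    annotations: List[Annotation] = field(default_factory=list)


def to_json(component: DnaComponent) -> str:
    """Serialize a design to a stable text form."""
    return json.dumps(asdict(component), sort_keys=True)


def from_json(text: str) -> DnaComponent:
    """Rebuild the design from its serialized form."""
    raw = json.loads(text)
    return DnaComponent(
        display_id=raw["display_id"],
        type=raw["type"],
        annotations=[Annotation(**a) for a in raw["annotations"]],
    )
```

A round-trip check is then simply `from_json(to_json(design)) == design`; exchange between tools needs this to hold for every tool pair, which is why a shared standard matters.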
Synthetic Biology: Synthetic Biology is Engineering, i.e., it is not biology* Design, Build, Test. * Beware of sending synthetic biology grant proposals to a biology panel 41
Synthetic Biology: Synthetic Biology is Engineering, i.e., it is not biology* Design, Build, Test, Debugging, Verification. * Beware of sending synthetic biology grant proposals to a biology panel 42
A Real Network (E. coli) Increased Repression Simulation Increased Repression Entus et al, Systems and Synthetic Biology, Host Context Experimental Data Design/Construction 44
Synthetic Networks Concentration Detector Generic Design: If we control the level of feed-forward Inhibition we can tune the circuit: 45
Synthetic Networks Input: IPTG Output: GFP Concentration Detector Generic Design: 46
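A concentration detector of this kind can be pictured as the input activating the output directly while also driving a feed-forward inhibitor, so the output is high only over a band of input levels. The sketch below is a generic steady-state model assumed for illustration (Hill-type activation and repression with made-up constants); it is not the model from the slides, and all names and parameter values are hypothetical.

```python
def activation(inducer: float, k_act: float = 1.0) -> float:
    """Direct activation of the output by the input (saturating)."""
    return inducer / (k_act + inducer)


def feed_forward_repression(inducer: float, strength: float, hill: float = 2.0) -> float:
    """Fraction of output left after input-driven feed-forward inhibition."""
    return 1.0 / (1.0 + (strength * inducer) ** hill)


def detector_output(inducer: float, strength: float) -> float:
    """Steady-state output (e.g. GFP) for a given input (e.g. IPTG) level."""
    return activation(inducer) * feed_forward_repression(inducer, strength)
```

In this toy model, raising `strength` (the level of feed-forward inhibition) pushes the responsive band toward lower input concentrations, which is the tuning idea described above.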
CAD Software- Engineering Cycle Simulation Design Fabrication Testing 47
Computational tools and information resources support each step (Specification, Design, Build, Analysis). Shown: TinkerCell CAD, ApE Sequence Editor, Laboratory Information, Clotho, BIOFAB, GDice, iBioSim, Public Data, GenoCAD 48
Registry of Standard Biological Parts (BioBricks) Endy D, Nature 438: Provides free access to an open commons of basic biological functions that can be used to program synthetic biological systems Anybody may contribute, draw upon, or improve the parts maintained within the Registry. 49
SBOL is extensible, which allows us to form community subgroups: Core SBOL, Experimental Measurements, Computational Models, Physical and Host Context, Assembly Methods, Visualization. [Figure: sequence annotation of DNA component B0015, with annotations (e.g. 1-80) linking to Terminator (B0010, B0012) and BioBrick Scar features; physical and host context example with labels Sample SS002, Cell, pUW4510, MG1655, UW002, strain, DNA Plasmid.] 50
TinkerCell: Project to explore the potential of computer aided design in synthetic biology First prototype called Athena developed by Bergmann and Chandran 51
Layered Architecture: Based on C++/Qt Octave, 52
Each component in the TinkerCell diagram is associated with one or more tables 53
A TinkerCell model can be composed of sub-models 54
Availability (Windows, Mac and Linux, released under BSD) Contact author for details 56
Challenges in building SBOL. Gaining consensus in a growing community: identifying and engaging stakeholders. Fast pace of change in the field: terminology evolution (“BioBricks”, “Parts”, “DNA components”); stability of use cases (“Standard” and “Research needs” seem contradictory); software for synthetic biology is new. Scarcity of data sources: quality “knowledge” about elements; heterogeneity of existing annotations. Funding. 57
Who is the we? Boston University: Douglas Densmore. University of Utah: Barry Moore, Nicholas Roehner, Chris J. Myers. BIOFAB: Cesar Rodriguez, Akshay Maheshwari (now UCSD), Drew Endy (Stanford). Imperial College London: Guy-Bart Stan. Virginia Bioinformatics Institute: Laura Adam, Matthew Lux, Mandy Wilson, Jean Peccoud. University of Washington: Deepak Chandran, John Gennari, Michal Galdzicki, Herbert Sauro. University of California, Berkeley: J. Christopher Anderson. University of Toronto: Raik Gruenberg. Joint BioEnergy Institute: Timothy Ham. Recent Commercial Interest: BBN, DNA 2.0, Agilent Life Technologies, AutoDesk iBioSim Newcastle University (UK) Aniel 58
Acknowledgements: The People and the Support Hamid Bolouri Andrew Finney Mike Hucka Herbert Sauro Funding in chronological order(2000 -> 2011): Frank Bergmann Deepak Chandran Vijay Chickarmane Michal Galdzicki Lucian Smith …… 59
Textbook Enzyme Kinetics for Systems Biology Available as e-book or paperback on & 318 pages, 94 illustrations and 75 exercises E-book - $9.95 Paperback - $39.95 Author: H M Sauro 60