Building virtual cell using BioUML platform Fedor Kolpakov Biosoft.Ru, Ltd. Institute of Systems Biology, Ltd. Novosibirsk, Russia Biosoft.Ru July 6th.

Slides:



Advertisements
Similar presentations
Kensington Oracle Edition: Open Discovery Workflow Meets Oracle 10g Professor Yike Guo.
Advertisements

Extreme Programming Alexander Kanavin Lappeenranta University of Technology.
Image Analysis Phases Image pre-processing –Noise suppression, linear and non-linear filters, deconvolution, etc. Image segmentation –Detection of objects.
BioUML SOFTWARE FRAMEWORK FOR SYSTEMS BIOLOGY Overview  ITC Software All rights reserved.
MATLAB Presented By: Nathalie Tacconi Presented By: Nathalie Tacconi Originally Prepared By: Sheridan Saint-Michel Originally Prepared By: Sheridan Saint-Michel.
BioUML integrated platform for building virtual cell and virtual physiological human Fedor Kolpakov Institute of Systems Biology Laboratory of Bioinformatics,
Systems Biology Existing and future genome sequencing projects and the follow-on structural and functional analysis of complete genomes will produce an.
The C++ Tracing Tutor: Visualizing Computer Program Behavior for Beginning Programming Courses Rika Yoshii Alastair Milne Computer Science Department California.
Modeling Functional Genomics Datasets CVM Lesson 1 13 June 2007Bindu Nanduri.
Genome database & information system for Daphnia Don Gilbert, October 2002 Talk doc at
NGS data analyses with BioUML Fedor Kolpakov Biosoft.Ru, Ltd. Institute of Systems Biology, Ltd. Novosibirsk, Russia.
BioUML – open source integrated platform for collaborative and reproducible research in systems biology Fedor Kolpakov, Institute of Systems.
INTRODUCTION GOAL: to provide novel types of interaction between classification systems and MIAME-compliant databases We present a prototype module aimed.
Cytoscape A powerful bioinformatic tool Mathieu Michaud
Free Mini Course: Applying SysML with MagicDraw
Demetris Kennes. Contents Aims Method(The Model) Genetic Component Cellular Component Evolution Test and results Conclusion Questions?
Input for the Bayesian Phylogenetic Workflow All Input values could be loaded as text file or typing directly. Only for the multifasta file is advised.
Lecture 3: Pathway Generation Tool I: CellDesigner: A modeling tool of biochemical networks Y.Z. Chen Department of Pharmacy National University of Singapore.
History ChartGizmo was created by Max Kuchin and Galinkskiy Dmitriy, two software developers from Sankt- Petersburg, Russia. The first version of ChartGizmo.
Genetic network inference: from co-expression clustering to reverse engineering Patrik D’haeseleer,Shoudan Liang and Roland Somogyi.
` Tangible Interaction with the R Software Environment Using the Meuse Dataset Rachel Bradford, Landon Rogge, Dr. Brygg Ullmer, Dr. Christopher White `
GTL Facilities Computing Infrastructure for 21 st Century Systems Biology Ed Uberbacher ORNL & Mike Colvin LLNL.
ITEC 3220M Using and Designing Database Systems
CONTENTS Arrival Characters Definition Merits Chararterstics Workflows Wfms Workflow engine Workflows levels & categories.
BioUML Fedor Kolpakov Institute of Systems Biology (spin-off of DevelopmentOnTheEdge.com) Laboratory of Bioinformatics, Design Technological Institute.
EMI INFSO-RI SA2 - Quality Assurance Alberto Aimar (CERN) SA2 Leader EMI First EC Review 22 June 2011, Brussels.
IPlant Collaborative Tools and Services Workshop iPlant Collaborative Tools and Services Workshop Collaborating with iPlant.
Encoding and exchanging graphical representation: architecture and formats Fedor Kolpakov Institute of Systems Biology Novosibirsk, Russia COMBINE-2010,
Chapter 10 Information Systems Analysis and Design
Comprehensive model for formalized description, visualization and simulation of biological systems Fedor A. Kolpakov Biosoft.Ru,
BioUML integrated platform for building virtual cell and virtual physiological human Fedor Kolpakov Institute of Systems Biology Laboratory of Bioinformatics,
Analysis of ribo-seq data for prediction translation efficiency and protein quantity from transcriptomics data Fedor Kolpakov Biosoft.Ru, Ltd. Institute.
The integrated model of apoptosis EO Kutumova, RN Sharipov, IN Lavrik, FA Kolpakov Design Technological Institute of Digital Techniques SB RAS, Institute.
BioUML ( Software framework for systems biology Overview Biosoft.Ru, Novosibirsk, Russia. Laboratory of Bioinformatics, Digital Design.
NGS data analysis CCM Seminar series Michael Liang:
1 Computer Programming (ECGD2102 ) Using MATLAB Instructor: Eng. Eman Al.Swaity Lecture (1): Introduction.

Building virtual cell using BioUML platform Ilya Kiselev (on behalf BioUML team) Institute of Systems Biology, Ltd. Novosibirsk, Russia Biosoft.Ru September.
The Optimization Plug-in for the BioUML Platform E. O. Kutumova 1,2,*, A. S. Ryabova 1,3, N. I. Tolstyh 1, F. A. Kolpakov 1,2 1 Institute of Systems Biology,
EADGENE and SABRE Post-Analyses Workshop 12-14th November 2008, Lelystad, Netherlands 1 François Moreews SIGENAE, INRA, Rennes Cytoscape.
Virtual Cell and CellML The Virtual Cell Group Center for Cell Analysis and Modeling University of Connecticut Health Center Farmington, CT – USA.
Taverna Workflows for Systems Biology Katy Wolstencroft School of Computer Science University of Manchester.
BIological NetwOrk Manager Cytoscape plugin Andrei Zinovyev Institut Curie/INSERM/Ecole de Mines, UMR 900 “Computational Systems Biology of Cancer”
Browsing the Genome Using Genome Browsers to Visualize and Mine Data.
Software Engineering Prof. Ing. Ivo Vondrak, CSc. Dept. of Computer Science Technical University of Ostrava
GRNmap and GRNsight June 24, Systems Biology Workflow DNA microarray data: wet lab-generated or published Generate gene regulatory network Modeling.
Modular Approach To Modeling Of The Apoptosis Machinery E. O. Kutumova 1,2,*, R. N. Sharipov 1,3,2, F. A. Kolpakov 1,2 1 Institute of Systems Biology,
New possibilities 1. EBI data pack – database modules for main databases supported by EBI: Ensembl, UniProt, ChEBI,Reactome, IntAct, GO, BioModels, SBO.
NCBI Genome Workbench Chuong Huynh NIH/NLM/NCBI Sao Paulo, Brasil July 15, 2004 Slides from Michael Dicuccio’s Genome Workbench.
The Software Development Process
Chapter 6 CASE Tools Software Engineering Chapter 6-- CASE TOOLS
A collaborative tool for sequence annotation. Contact:
August 2003 At A Glance The IRC is a platform independent, extensible, and adaptive framework that provides robust, interactive, and distributed control.
Topic 4 - Database Design Unit 1 – Database Analysis and Design Advanced Higher Information Systems St Kentigern’s Academy.
Nonlinear differential equation model for quantification of transcriptional regulation applied to microarray data of Saccharomyces cerevisiae Vu, T. T.,
Copyright OpenHelix. No use or reproduction without express written consent1 1.
PLANT BIOTECHNOLOGY & GENETIC ENGINEERING (3 CREDIT HOURS) LECTURE 13 ANALYSIS OF THE TRANSCRIPTOME.
1 The Software Development Process ► Systems analysis ► Systems design ► Implementation ► Testing ► Documentation ► Evaluation ► Maintenance.
BioUML – integrated platform for building virtual cell and virtual physiological human Fedor Kolpakov 1,2, Nikita Tolstykh 1,2, Elena Kutumova 1,2, Ilya.
Vertical Integration Across Biological Scales A New Framework for the Systematic Integration of Models in Systems Biology University College London CoMPLEX.
1 Survey of Biodata Analysis from a Data Mining Perspective Peter Bajcsy Jiawei Han Lei Liu Jiong Yang.
Practice:submit the ChIP_Streamline.pbs 1.Replace with your 2.Make sure the.fastq files are in your GMS6014 directory.
Introduction to Algorithm. What is Algorithm? an algorithm is any well-defined computational procedure that takes some value, or set of values, as input.
1 Department of Engineering, 2 Department of Mathematics,
1 Department of Engineering, 2 Department of Mathematics,
1 Department of Engineering, 2 Department of Mathematics,
Chapter 1 Introduction(1.1)
Analysis models and design models
Joana Pinto Vieira, Julien Racle, Vassily Hatzimanikatis 
Presentation transcript:

Building virtual cell using BioUML platform Fedor Kolpakov Biosoft.Ru, Ltd. Institute of Systems Biology, Ltd. Novosibirsk, Russia Biosoft.Ru July 6th - 11th 2013, St. Petersburg, RUSSIA

BioUML platform BioUML is an open source integrated platform for systems biology that spans the comprehensive range of capabilities including access to databases with experimental data, tools for formalized description, visual modeling and analyses of complex biological systems. Due to scripts (R, JavaScript) and workflow support it provides powerful possibilities for analyses of high- throughput data. Plug-in based architecture (Eclipse run time from IBM is used) allows to add new functionality using plug-ins. BioUML platform consists from 3 parts: BioUML server – provides access to biological databases; BioUML workbench – standalone application. BioUML web edition – web interface based on AJAX technology;

BioUML main features Supports access to main biological databases: –catalolgs: Ensembl, UniProt, ChEBI, GO… –pathways: KEGG, Reactome, EHMN, BioModels, SABIO-RK, TRANSPATH, EndoNet, BMOND… Supports main standards used in systems biology: SBML, SBGN, CellML, BioPAX, OBO, PSI-MI… database search: –full text search using Lucene engine –graph search graph layout engine visual modeling: –simulation engine supports (ODE, DAE, hybrid, stochastic, 1D PDE); –composite models; –agent based modeling, rule based modelling; –parameters fitting; genome browser (supports DAS protocol, tracks import/export); data analyses and workflows – specialized plug-ins for microarray analysis, integration with R/Bioconductor, JavaScript support, interactive script console.

Story 1: genome-scale model for prediction of synthesis rates of mRNAs and proteins Initial data: Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M. Global quantification of mammalian gene expression control. Nature, 2011, 473(7347): mouse fibroblasts, parallel metabolic pulse labelling - simultaneously measured absolute mRNA and protein abundance and turnover for genes - first genome-scale quantitative model for prediction of synthesis rates of mRNAs and proteins Bukharov Aleksandr, Kiselev Ilya

Experiment design

Schwanhäusser B., et al., Fig. 6:bComparison of synthesis rates of mRNA and proteins assuming the measured levels reflect averages over one cell cycle or steady-state values. For the synthesis rates of mRNA (light gray), the deviation between the two approaches is small, because mRNA half lives are mostly smaller than the cell cycle time. For protein synthesis (dark gray), the differences are substantial; they can differ for more than one order of magnitude.

Schwanhäusser B., et al., Fig. 6:bComparison of synthesis rates of mRNA and proteins assuming the measured levels reflect averages over one cell cycle or steady-state values. For the synthesis rates of mRNA (light gray), the deviation between the two approaches is small, because mRNA half lives are mostly smaller than the cell cycle time. For protein synthesis (dark gray), the differences are substantial; they can differ for more than one order of magnitude. P – protein, exp – population mean, ss – steady state;

“They do not take into account that gene expression in mammalian cells is non-continuous. In addition, the non-uniform age distribution of cells in culture as described in 19, 23 is neglected, since this effect is expected to be small compared to the deviation obtained by neglecting the cell cycle.“ Schwanhäusser B., et al., 20011, supplementary materials

Agent based model each cell is an agent 4247 blocks for protein synthesis

Numerical experiment. The initial size of population is 200 cells which divide within 108 hours. Average quantity of protein molecules were calculated. This experiment was repeated for 4247 proteins.

Correlation of experiment and numerical modeling is equal to R=0.99 Absolute values also were coordinated (so for 81,6% of proteins absolute values differ by less than 7% Main deviations from experimental values are observed for proteins with extremely low copy numbers, where experimental error can be significant.

Used features of BioUML platform tables support (import, calculated columns); JavaScript API for model generation (from tabular data) visual modeling – diagram visualization agent based modeling ODE solver

- tabIe with initial data (Schwannhauser B.et al., 2011); - calculated columns (k degradation)

Java script for diagram generation

Agent-based model

Story 2: the modular model of apoptosis Kutumova E.O. et. al., Advances in Experimental Medicine and Biology, 2012, 736(2): modules 279 species 372 reactions 459 parameters

Modules: clear specification of interfaces input/output contacts

Mitochondron module (BMOND ID: Int_Mitoch_module) Bagci EZ, et al, Biophysical J 2006 Albeck JG, et al, PLoS Biol 2008 Additions: Activation of CREB and deactivation of BAD by Akt- PP and ERK-PP Upregulation of Bcl-2 by CREB Bcl-2 suppression by p53

EGF module ( BMOND ID: Int_EGF_module) Schoeberl B, et al: Nature Biotechnology 2002 Borisov N, et al: Molecular Systems Biology 2009 Additions: Reactions of protein syntheses and degradations

top-down Modular model allows us to combine both up-down and bottom-up approaches bottom-up

Extreme programming (XP) methodology Planning –User stories –Make frequent small releases –The project is divided into iterations Coding –Code must be written to agreed standards –Unit test first –Integrate often –Collective ownership Testing –Unit/acceptance tests

XP adaptation to modeling Planning –User stories: requirements for the model based on the experimental data –Make frequent small releases –Iterations: after each iteration a new set of experimental data should be reproduced Coding –Standards: SBGN, SBML, modular extension –Unit test first –Integrate often: model is saved into public database –Collective ownership: collaborative editing and chat Testing –Unit/acceptance tests: BioUML provides the facilities for testing the models.

Execution phase of apoptosis

User stories: requirements for the model No external signals  concentration of the cleaved PARP is zero. CD95L signaling  dynamics of pro-8 and casp-8 according to Bentele et. al. TNF-a signaling  dynamics of pro-8 and casp-8 according to Janes et. al. Iteration 1 Iteration 2 Iteration 3

Types of the acceptance tests Steady-state Time course Control of the variable values

Iterations and acceptance tests Iteration 1 Iteration 2 Iteration 3

Modeling of the execution phase of apoptosis Bentele et. al., 2004

Iteration 1: execution phase Steady-state acceptance test

Iteration 2: CD95-signaling

Experiment time courses for pro-8 and casp-8 according to the data by Bentele, et. al., 2004

Used features of BioUML platform visual modeling (SBGN notation) modular modelling parameters fitting acceptance tests ODE solver

Parameters fitting – user interface

Parameters fitting - main features Experimental data – time courses or steady states expressed as exact or relative values of substance concentrations Different optimization methods for analysis Multi-experimentsfitting Constraint optimization Local/global parameters Parameters optimization using java script

Bentele M, 2004 Neumann L, 2010 CD95L module and results of fitting its dynamics to experimental data

Story 3 Predicting transcription level from ChIP-seq data Ivan Yevshin, Dmitry Levanov, Yuriy Kondrakhin

Initial row data, collected from literature, GEO, SRA and ENCODE databases were systematically collected and uniformly processed using specially developed workflow (pipeline) for BioUML platform: - sequenced reads were aligned to reference genome using Bowtie; - peaks were identified using MACS and SISSR algorithms - further refinement of obtained peaks - position weight matrices (PWM) were constructed by different methods (ChIPMunk, our own methods) - ROC curves were calculated to estimate and compare built PWM - site models (PWMs + thresholds) were constucted for recognition TF binding sites. TFClass database is used as a core for information about transcription factors, their classification and cross-linking with Ensembl. BioUML platform provides web interface for access to GTRD database: search information, browsing, different data views. Built-in genome browser provides powerful visualisation of ChIP-seq data. GTRD - Gene Transcription Regulation Database

Prediction of gene expression level by ChIP-seq data ChIP-seq peaks (MACS) for histones and transcriptio factor binding sites were extracted from GTRD database for 2 cell lines: GM12878 and K562. Machine learning - Random Forest algorithm. R – 0.77

Used features of BioUML platform integration with Galaxy to use 3-rd party command line tools (Bowtie, SISSR) integration with R for statistic analyses and plot visulisations build-in analyses methods NGS data control workflows for automated NGS data processing genome browser for ChIP-seq peaks visualisation web interface for GTRD –search possibilities –tree views –tables

Galaxy – analyses methods

Galaxy - workflow

An example: workflow for analyses of ChIP-Seq data

Genome browser Two BAM tracks are compared with each other (Example view on Human NCBI37 Chr.1) Profile is visible showing the coverage

Genome browser Upon zooming individual reads become visible. All information associated with selected read is displayed in the Info box

Genome browser In detailed scale phred qualities graph is displayed along with changed nucleotides between read and reference sequence

Story 4 Glicomics – rule based modeling Nikita Mandrik, Tagir Valeev Rule-based modeling involves the representation of molecules as structured objects and molecular interactions as rules for transforming the attributes of these objects. Data and approach: Bennun SV, Yarema KJ, Betenbaugh MJ, Krambeck FJ. Integration of the transcriptome and glycome for identification of glycan cell signatures. PLoS Comput Biol. 2013;9(1):e doi: /journal.pcbi BioNetGen – language and approach for rule based modeling

Bennun S.V. et al., 2013

Used features of BioUML platform visual modeling BioNetGen support (passed 8 available tests) synchrinisation between model code and diagram parameters fitting

Collaborative research goals Allow scientists to effectively work together even if they are separated far away from each other Help experimentalists and theorists to work together Engage third-party experts

Collaborative research 1.Share –Project data can be shared between several people –Changes are immediately visible by other participants –No need to ask administrator or somebody else to share the project 2.Communicate –Talk to each other in messenger –Conference chat 3.Collaborate –Edit together –Edit and chat –Review the changes of other participants 4.History –Monitor changes performed on the document –Rollback to older versions of the document –Compare different versions of the document –Project journal 5.Organize work (planned) –Create tasks (“works”) for other participants –Call for collaborators to solve particular problem –Review the results 6.Privacy –Your activity and data is invisible for people not belonging to your projects

BioStore BioStore is aimed to be the single place where researchers can register and obtain various products and services (either free or commercial). Having single BioStore account you may have an access to many BioUML servers, projects, downloads, etc. Servers Downloads Services Software

Biostore BioUML platform Developers - plug-ins: methods, visualization, etc. - databases Users - subscriptions - collaborative & reproducible research Experts -services for data analysis - on-line consultations BioUML ecosystem provide tools and databases use provide services

Challenges: – methodology - how we can build, modify, verify and apply complex mathematical models for practical purposes; – organization - how we can split main tasks into smaller, manageable parts and organize collaboration between researchers; – software - we need a software platform that will provide necessary infrastructure for collaborative work of many researchers and will support needed methodology. Building virtual cell

Answers: – methodology – modular modeling; – application of some practices from XP (eXtreme Programming): –user stories, –iterative development, –acceptance tests; – organization – Biostore - portal for collaborative research and crowd sourcing; – software - BioUML platform - integrated open source platform for collaborative research in systems biology. Building virtual cell

BioUML team Tagir ValeevIvan Yevshin Yuriy Kondrakhin Ilya KiselevDmitriy Levanov Elena Kutumova Alexandr BukharovRuslan Sharipov Nikita Mandrik