PIMS data management and harvesting. General Introduction Design a LIMS Protein Production Data Model What can PIMS do for you?


Information Management System
■An Information Management System (IMS) combines a database with information management
■A database management system (DBMS) is a system, usually automated and computerized, for managing any collection of compatible, and ideally normalized, data
■Information management is the handling of knowledge acquired from many disparate sources in a way that optimizes access for all who have a share in that knowledge

Scientific goals
■Recording laboratory information
  ■A lot of data keeping
  ■10,000s of experiments
  ■1,000,000s of samples
■Data interchange and interoperation
  ■Collaboration in protein production
  ■Share data between stages and sites
  ■Data transfer to beamline or NMR ops
■Data mining and reporting
  ■Analysis
  ■Negative results can be mined to improve methods
  ■Scientific publications
  ■Data deposition

PIMS
■Protein Information Management System
■Started in January 2005
■Five-year UK project, funded by the Biotechnology and Biological Sciences Research Council (BBSRC)
■Based on the Protein Production Data Model paper
  ■Proteins Feb 1;58(2): “Design of a data model for developing laboratory information management and analysis systems for protein production.”

Scope of PIMS
[Diagram: pipeline from Target selection through Target optimisation, Cloning, Expression, Purification & Concentration, Crystallisation, Microcrystals, Data collection, Phasing and Model building to Refinement, spanning Bioinformatics, Molecular Biology and Crystallography, with import and export]

Stakeholders
■BBSRC SPoRT funding
■Scottish Structural Proteomics Facility (SSPF)
  ■Universities of Dundee, St. Andrews, Glasgow and Warwick
■Membrane Protein Structure Initiative (MPSI)
  ■Universities of Glasgow, Leeds, Oxford, Sheffield, Imperial College, Birkbeck College, UMIST and CCLRC Daresbury
■Protein Information Management System (PIMS)
  ■CCP4, Diamond
  ■Oxford Protein Production Facility
  ■IBBMC, University Paris Sud
  ■European Bioinformatics Institute
  ■York Structural Biology Laboratory
  ■Daresbury Laboratory
■Other UK protein scientists
■Other protein scientists worldwide

Collaborations
■Seamless data transfer and a consistent UI...
  ■... from target to structure deposition
  ■... so far as possible
■Bioinformatics: SSPF pipeline, EBI workflow
■Crystallization: NKI, EMBL Hamburg & Grenoble (BIOXHIT)
■Data transfer: e-HTPX
■Data collection: DNA, X-track
■Structure solution: CCP4, CCPN
■Instruments: Kendro, CSols

General Introduction; Design a LIMS; Protein Production Data Model; What can PIMS do for you?

Design
■The data model
  ■focuses on what data should be stored
  ■is used to design the entities (classes or tables) that we are dealing with, their attributes, and their relationships
■The goal of the data model is to make sure that all the required data objects are completely and accurately represented

Reliability
■Loss of data is inexcusable
■Must be able to correct wrong data
■Must keep audit trails
■Must allow future changes
■All made feasible by
  ■Data model
  ■Database
  ■Software engineering standards
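
The first three requirements can be read together as an append-only record: a correction adds a new revision instead of overwriting the old one, so the history doubles as the audit trail. The sketch below is illustrative only and assumes nothing about how PIMS implements this; `Revision`, `AuditedValue` and their methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: none of these names come from PIMS.
final class Revision {
    final String value;
    final String author;
    final String reason;

    Revision(String value, String author, String reason) {
        this.value = value;
        this.author = author;
        this.reason = reason;
    }
}

// A value that can be corrected but never overwritten:
// every change is appended, so no earlier entry is lost.
final class AuditedValue {
    private final List<Revision> history = new ArrayList<Revision>();

    AuditedValue(String initial, String author) {
        history.add(new Revision(initial, author, "initial entry"));
    }

    // Correct wrong data by appending a new revision.
    void correct(String newValue, String author, String reason) {
        history.add(new Revision(newValue, author, reason));
    }

    // The current value is the most recent revision.
    String current() {
        return history.get(history.size() - 1).value;
    }

    // Read-only view of the audit trail, oldest first.
    List<Revision> auditTrail() {
        return Collections.unmodifiableList(history);
    }
}

class AuditDemo {
    public static void main(String[] args) {
        AuditedValue ph = new AuditedValue("7.5", "alice");
        ph.correct("7.0", "bob", "transcription error");
        System.out.println(ph.current() + " after " + ph.auditTrail().size() + " revisions");
    }
}
```

In a database-backed system the same idea would typically be expressed as a history table rather than an in-memory list.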

Ancestry
■HalX: an open-source LIMS (Laboratory Information Management System) for small- to large-scale laboratories.
  ■Acta Crystallogr D Biol Crystallogr Jun;61(Pt 6):
  ■Prilusky J, Oueillet E, Ulryck N, Pajon A, Bernauer J, Krimm I, Quevillon-Cheruel S, Leulliot N, Graille M, Liger D, Tresaugues L, Sussman JL, Janin J, van Tilbeurgh H, Poupon A.
■OPPF based on Nautilus
■MOLE: a data management application based on a protein production data model.
  ■Proteins Feb 1;58(2):
  ■Morris C, Wood P, Griffiths SL, Wilson KS, Ashton AW.

PIMS
■The aim is to provide a Laboratory Information Management System (LIMS)
  ■for laboratories that produce proteins from target genes
  ■that can be incorporated into commercial software in the area of biotech and protein production
■Improve the quality of the experimental data deposited into the PDB
  ■by providing software for lab scientists to harvest their daily experimental data, from protein production to structure
■My roles
  ■Data Model
  ■Database / Persistence layer / Java API
  ■Java Applet development

General Introduction; Design a LIMS; Protein Production Data Model; What can PIMS do for you?

Why is Data Modelling Important?
■A Data Model is a plan for building a database
  ■detailed enough to be used to create the physical structure
  ■simple enough to communicate the data structure to the end user
■The Unified Modelling Language (UML)

Data Model
■Related to protein production & crystallisation
■Suitable for large & small facilities
■Required to reproduce the samples & experiments involved
■Used for tracking samples, experiments & results
■Developed to help software developers to collect, store and exchange information through the provision of a common platform

Area covered
■Protein production work is generally the investigation of a particular protein, the Target
■The work often aims to produce a derivative of the Target, such as a single domain or a complex
[Diagram labels: target, protein production, crystallisation, X-ray phasing, structure, NMR tube, NMR]
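
To make "tracking samples, experiments & results" concrete, here is a minimal sketch of how a Target, its samples and the experiments linking them could be related. It is not the PIMS data model, which these slides do not spell out; `Target`, `Sample`, `Experiment` and `Provenance` are hypothetical classes chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical entity sketch; the real PIMS classes are not shown in these slides.
final class Target {
    final String name;
    Target(String name) { this.name = name; }
}

final class Sample {
    final String label;
    final Target target;     // the protein under investigation
    Experiment producedBy;   // null for starting material

    Sample(String label, Target target) {
        this.label = label;
        this.target = target;
    }
}

final class Experiment {
    final String type;
    final List<Sample> inputs = new ArrayList<Sample>();
    final List<Sample> outputs = new ArrayList<Sample>();

    Experiment(String type) { this.type = type; }

    // Register a sample used by this experiment.
    Experiment consume(Sample s) {
        inputs.add(s);
        return this;
    }

    // Create a new sample and record this experiment as its origin.
    Sample produce(String label, Target target) {
        Sample s = new Sample(label, target);
        s.producedBy = this;
        outputs.add(s);
        return s;
    }
}

final class Provenance {
    // Walk back from a sample through the experiments that produced it,
    // following the first input at each step; oldest experiment first.
    static List<String> history(Sample s) {
        List<String> steps = new ArrayList<String>();
        Sample current = s;
        while (current != null && current.producedBy != null) {
            Experiment e = current.producedBy;
            steps.add(0, e.type);
            current = e.inputs.isEmpty() ? null : e.inputs.get(0);
        }
        return steps;
    }
}

class DataModelDemo {
    public static void main(String[] args) {
        Target t = new Target("example target");
        Sample gene = new Sample("gene", t);
        Sample plasmid = new Experiment("Cloning").consume(gene).produce("plasmid", t);
        Sample protein = new Experiment("Expression").consume(plasmid).produce("protein", t);
        System.out.println(Provenance.history(protein));
    }
}
```

Linking each sample to the experiment that produced it is what lets a later stage, or another site, reconstruct how a sample was made.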

The Core Data Model

Change Control Board
■The data model is a work in progress
■The science is developing too
■Local protocols, which are novel and confidential
■Not easy work
■Thanks to…
  ■Geoff Barton (Dundee)
  ■Steve Prince (Manchester)
  ■Anne Poupon (IBBMC)
  ■Jon Diprose (OPPF)
  ■Alun Ashton (Diamond)
  ■Rasmus Fogh (CCPN)

Generation machinery
■Implemented in UML (Object Domain)
■Developed within a framework provided by the CCPN project
■Information stored in the UML Data Model is used to automatically generate
  ■SQL schema,
  ■Java Application Programming Interfaces (APIs) and
  ■Documentation
[Diagram: the UML Data Model and framework feed the Java API, Python API, Doc, SQL schema and XML schema]
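
The generation step can be illustrated with a toy generator that emits both a table definition and a class skeleton from one model description. This is a minimal sketch, not the CCPN framework: `EntityModel` and `Generators` are hypothetical names, the model is a plain Java object rather than UML, and the Java-to-SQL type mapping is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical, much-simplified stand-in for one class in a UML model.
final class EntityModel {
    final String name;
    // Attribute name -> Java type name, kept in declaration order.
    final Map<String, String> attributes = new LinkedHashMap<String, String>();

    EntityModel(String name) { this.name = name; }

    EntityModel attribute(String attrName, String javaType) {
        attributes.put(attrName, javaType);
        return this;
    }
}

final class Generators {
    // Assumed type mapping; a real generator would cover many more types.
    private static String sqlType(String javaType) {
        if (javaType.equals("String")) return "VARCHAR(255)";
        if (javaType.equals("int")) return "INTEGER";
        if (javaType.equals("double")) return "DOUBLE PRECISION";
        throw new IllegalArgumentException("unmapped type: " + javaType);
    }

    // Emit a CREATE TABLE statement with a surrogate integer key.
    static String toSql(EntityModel m) {
        StringBuilder sb = new StringBuilder();
        sb.append("CREATE TABLE ").append(m.name.toLowerCase(Locale.ROOT));
        sb.append(" (\n  id INTEGER PRIMARY KEY");
        for (Map.Entry<String, String> a : m.attributes.entrySet()) {
            sb.append(",\n  ").append(a.getKey()).append(' ').append(sqlType(a.getValue()));
        }
        return sb.append("\n);").toString();
    }

    // Emit a Java class skeleton with one private field per attribute.
    static String toJava(EntityModel m) {
        StringBuilder sb = new StringBuilder("public class " + m.name + " {\n");
        for (Map.Entry<String, String> a : m.attributes.entrySet()) {
            sb.append("    private ").append(a.getValue()).append(' ')
              .append(a.getKey()).append(";\n");
        }
        return sb.append("}").toString();
    }
}

class GenerationDemo {
    public static void main(String[] args) {
        EntityModel sample = new EntityModel("Sample")
                .attribute("name", "String")
                .attribute("volume", "double");
        System.out.println(Generators.toSql(sample));
        System.out.println(Generators.toJava(sample));
    }
}
```

Because both outputs come from the same description, a change to the model cannot leave the schema and the API out of step, which is the main attraction of generating them.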

Architecture
■The API provides methods to access the underlying DB to store and retrieve data
■This allows applications to manipulate data without a detailed knowledge of the way in which the data is stored
■Various different applications make use of the API
  ■LIMS
  ■Any High Throughput applications (non-GUI)
■They are able to exchange data easily
[Diagram labels: API tools (GUI, standalone applications, …), Java API, persistence layer, storage, DB, SQL schema]
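
The separation described above can be sketched as an interface that applications program against, with storage hidden behind it. The names below (`SampleStore`, `InMemorySampleStore`, `BatchLoader`) are hypothetical and the in-memory map merely stands in for the persistence layer and database; the actual PIMS Java API is not shown in these slides.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical API: applications depend on this interface only.
interface SampleStore {
    void save(String id, String description);

    // Returns the stored description, or null if the id is unknown.
    String find(String id);
}

// Stand-in backend. A database-backed class could implement the same
// interface without any change to the applications that use it.
final class InMemorySampleStore implements SampleStore {
    private final Map<String, String> rows = new HashMap<String, String>();

    public void save(String id, String description) {
        rows.put(id, description);
    }

    public String find(String id) {
        return rows.get(id);
    }
}

// A non-GUI client, as a high-throughput application might be.
final class BatchLoader {
    // Each row is {id, description}; returns the number of rows stored.
    static int load(SampleStore store, String[][] rows) {
        int count = 0;
        for (String[] row : rows) {
            store.save(row[0], row[1]);
            count++;
        }
        return count;
    }
}

class ArchitectureDemo {
    public static void main(String[] args) {
        SampleStore store = new InMemorySampleStore();
        BatchLoader.load(store, new String[][] {
            {"S1", "purified protein"},
            {"S2", "crystal"}
        });
        System.out.println(store.find("S2"));
    }
}
```

A GUI and a batch tool that both go through the same interface read and write the same records, which is what makes exchanging data between them straightforward.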

From data model to application
■Data Model
■Use cases
  ■Scientific logic into requirements
■Specifications
  ■security, performance, usability, etc.
■Java API
■Test data
■UI Design
■Application

Modular Construction
[Diagram of modules: System Administration, Setup & Configuration, Access Rights Management, Project Management, Reference Data, Instrument Management, Scheduling, Data Capture, Inventory Management, Sample Management, Bioinformatics, Mobile Data Collection, Reporting, Visualisation, Data Mining, Training & Support, Workflow]

Reference data
■Supplier details
■Protocols
  ■documenting set of editable default protocols
  ■user interface design with Ed Daniel
■Reagents
  ■protocol-related reference samples
  ■chemical hazard information, e.g. R and S-phrases
  ■documenting lab chemicals as ‘MolComponents’
  ■includes synonyms, formula, CAS-number and mass
  ■naming system under discussion with NKI
  ■~400 identified, ~180 based on crystallisation screens
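
A reference-data record of the kind listed above could look like the sketch below. The field list follows the slide (synonyms, formula, CAS number, mass), but the class itself, its method names and the choice of g/mol for mass are assumptions, not the PIMS definition of a MolComponent. The check-digit rule is the standard one for CAS Registry Numbers: weight the digits before the check digit 1, 2, 3, … from right to left, sum them, and take the result modulo 10.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class CasNumber {
    // True if the text has the form N..N-NN-C and the check digit C matches.
    static boolean isValid(String cas) {
        if (cas == null || !cas.matches("\\d{2,7}-\\d{2}-\\d")) {
            return false;
        }
        String digits = cas.replace("-", "");
        int check = digits.charAt(digits.length() - 1) - '0';
        int sum = 0;
        // Weight the digits before the check digit 1, 2, 3, ... from right to left.
        for (int i = digits.length() - 2, weight = 1; i >= 0; i--, weight++) {
            sum += (digits.charAt(i) - '0') * weight;
        }
        return sum % 10 == check;
    }
}

// Hypothetical record for one lab chemical; not the PIMS class.
final class MolComponent {
    final String name;
    final List<String> synonyms;
    final String formula;
    final String casNumber;
    final double mass;   // assumed unit: g/mol

    MolComponent(String name, List<String> synonyms, String formula,
                 String casNumber, double mass) {
        if (!CasNumber.isValid(casNumber)) {
            throw new IllegalArgumentException("invalid CAS number: " + casNumber);
        }
        this.name = name;
        this.synonyms = Collections.unmodifiableList(new ArrayList<String>(synonyms));
        this.formula = formula;
        this.casNumber = casNumber;
        this.mass = mass;
    }

    // Case-insensitive match against the name or any synonym.
    boolean isKnownAs(String query) {
        if (name.equalsIgnoreCase(query)) {
            return true;
        }
        for (String s : synonyms) {
            if (s.equalsIgnoreCase(query)) {
                return true;
            }
        }
        return false;
    }
}

class ReferenceDataDemo {
    public static void main(String[] args) {
        MolComponent water = new MolComponent(
                "water", Arrays.asList("H2O", "aqua"), "H2O", "7732-18-5", 18.015);
        System.out.println(water.isKnownAs("Aqua"));
    }
}
```

Validating the CAS number on entry catches transcription errors early, and matching on synonyms lets different labs keep their own names for the same chemical.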

Instrument management
■Analytical Data: A Tower of Babel
■Integration
■CSols
  ■produces a widely used Instrument Integration Package
  ■if the PIMS I/O is implemented in a reasonable timescale, CSols may develop a PIMS Driver
■Kendro/Thermo
[Diagram labels: MS, NMR, IR, LC]

General Introduction; Design a LIMS; Protein Production Data Model; What can PIMS do for you?

Not a lot right now
Whatever you want, eventually, as long as it's data management for protein production

Version 0.2
■October 2005
■Then incremental delivery
  ■… for one customer at a time and integrate with trunk
  ■… and repeat until project complete

Protocol Editor

Applet Protocol Editor
■Choose a step from a list
■Draw a temperature step
■List the protocol steps already defined and reload them from the bottom of the screen
■Save the protocol to the DB
■Display the list of protocols from the DB in the explorer and reload any one of them

Applet Workflow
■Select the experiment categories from the tabs
■Drag and drop the selected experiments
■Build a workflow or load an existing one
■Associate a protocol with an experiment
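
Behind the drag-and-drop interface, a workflow of this kind reduces to an ordered list of experiments, each optionally linked to a protocol. The following is a rough sketch under that assumption; `Protocol`, `WorkflowStep` and `Workflow` are hypothetical names and do not reflect the applet's real classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical classes; the applet's real model is not shown in the slides.
final class Protocol {
    final String name;
    Protocol(String name) { this.name = name; }
}

final class WorkflowStep {
    final String experimentType;
    Protocol protocol;   // null until a protocol is associated

    WorkflowStep(String experimentType) {
        this.experimentType = experimentType;
    }
}

final class Workflow {
    private final List<WorkflowStep> steps = new ArrayList<WorkflowStep>();

    // Append an experiment to the end of the workflow
    // (what a drop onto the canvas would do).
    WorkflowStep add(String experimentType) {
        WorkflowStep step = new WorkflowStep(experimentType);
        steps.add(step);
        return step;
    }

    int size() {
        return steps.size();
    }

    // Experiment types that still have no protocol associated, in order.
    List<String> stepsWithoutProtocol() {
        List<String> missing = new ArrayList<String>();
        for (WorkflowStep s : steps) {
            if (s.protocol == null) {
                missing.add(s.experimentType);
            }
        }
        return missing;
    }
}

class WorkflowDemo {
    public static void main(String[] args) {
        Workflow wf = new Workflow();
        wf.add("PCR");
        wf.add("Expression").protocol = new Protocol("example protocol");
        System.out.println("Steps still needing a protocol: " + wf.stepsWithoutProtocol());
    }
}
```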

A collaborative framework
■… to develop a family of LIMSes
■Developers have difficulty in justifying the time required to create the software needed
■The biologist doesn't want to wait
■The result is a rapidly written LIMS that is fragile and cannot scale as the project grows
■Need a generic LIMS
  ■which helps to solve these problems by giving developers a tool that can scale to meet the needs of a large project
  ■and which welcomes plug-ins for novel methods

Conclusion
■Each “Click” could be a lot of coding...
■What do molecular biologists really want?
■Expectations are high!
■Users make an indispensable contribution
  ■Tell us when it's not good enough...
  ■... we will respond

Acknowledgements
■PIMS developer group
  ■Chris Morris (CCP4)
  ■Anne Pajon (EBI)
  ■Ed Daniel (Daresbury)
  ■Peter Troshin (MPSI)
  ■Jo van Niekerk (SSPF)
  ■Susy Griffiths (YSBL)
  ■Jon Diprose (OPPF)
  ■Katherine Pilicheva (OPPF)
  ■Anne Poupon (IBBMC)
  ■Eric Oeuillet (IBBMC)
  ■Sabrina Haquin (IBBMC)
  ■Alun Ashton (Diamond)
■EBI-MSD
  ■Kim Henrick
  ■Wim Vranken
  ■John Ionides
■CCPN
  ■Wayne Boucher
  ■Rasmus Fogh
  ■Tim Stevens
  ■Dan