PIMS data management and harvesting. General Introduction Design a LIMS Protein Production Data Model What can PIMS do for you?


1 PIMS data management and harvesting

2 ■General Introduction ■Design a LIMS ■Protein Production Data Model ■What can PIMS do for you?

3 Information Management System ■Information Management System (IMS) is a joint database and information management system ■A database management system (DBMS) is a system, usually automated and computerized, for the management of any collection of compatible, and ideally normalized, data ■Information management is the handling of knowledge acquired by many disparate sources in a way that optimizes access by all who have a share in that knowledge

4 Scientific goals ■Recording laboratory information ■A lot of data keeping ■10,000s of experiments ■1,000,000s of samples ■Data interchange and interoperation ■Collaboration in protein production ■Share data between stages and sites ■Data transfer to beamline or NMR ops ■Data mining and reporting ■Analysis ■Negative results can be mined to improve methods ■Scientific publications ■Data deposition

5 PIMS ■Protein Information Management System ■Started in January 2005 ■Five-year UK project, funded by the Biotechnology and Biological Sciences Research Council (BBSRC) ■Based on the Protein Production Data Model paper ■Proteins. 2005 Feb 1;58(2):278-84. “Design of a data model for developing laboratory information management and analysis systems for protein production.”

6 Scope of PIMS ■Stages: Target selection, Target optimisation, Cloning, Expression, Purification & Concentration, Crystallisation, Microcrystals, Data collection, Phasing, Model building, Refinement ■Disciplines: Bioinformatics, Molecular Biology, Crystallography ■import, export

7 Stakeholders ■BBSRC SPoRT funding ■Scottish Structural Proteomics Facility (SSPF) ■Universities of Dundee, St. Andrews, Glasgow and Warwick. ■Membrane Protein Structure Initiative (MPSI) ■Universities of Glasgow, Leeds, Oxford, Sheffield, Imperial College, Birkbeck College, UMIST and CCLRC Daresbury. ■Protein Information Management System (PIMS) ■CCP4, Diamond ■Oxford Protein Production Facility ■IBBMC, University Paris Sud ■European Bioinformatics Institute ■York Structural Biology Laboratory ■Daresbury Laboratory ■Other UK protein scientists ■Other protein scientists worldwide ■Diagram labels: BBSRC funding, SSPF, MPSI, PIMS

8 Collaborations ■Seamless data transfer and a consistent UI... ■... from target to structure deposition ■... so far as possible ■Bioinformatics: SSPF pipeline, EBI workflow ■Crystallization: NKI, EMBL Hamburg & Grenoble (BIOXHIT) ■Data transfer: e-HTPX ■Data collection: DNA, X-track ■Structure solution: CCP4, CCPN ■Instruments: Kendro, CSols

9 ■General Introduction ■Design a LIMS ■Protein Production Data Model ■What can PIMS do for you?

10 Design ■The data model ■focuses on what data should be stored ■is used to design the entities (classes or tables) that we are dealing with, their various attributes, and their relationships ■The goal of the data model is to make sure that all required data objects are completely and accurately represented
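As a rough illustration of what "entities, attributes and relationships" can look like in code, here is a minimal Python sketch. The class names `Target`, `Sample` and `Experiment` and all of their fields are assumptions made for this example; they are not taken from the actual PIMS data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Target:
    """Hypothetical entity: the protein under investigation."""
    name: str
    sequence: str


@dataclass
class Sample:
    """Hypothetical entity: a physical sample derived from a target."""
    label: str
    target: Target  # relationship: each sample refers to one target


@dataclass
class Experiment:
    """Hypothetical entity: a lab step that consumes and produces samples."""
    kind: str
    inputs: List[Sample] = field(default_factory=list)
    outputs: List[Sample] = field(default_factory=list)


# Illustrative data only.
target = Target(name="T001", sequence="MKT")
plasmid = Sample(label="T001-plasmid", target=target)
protein = Sample(label="T001-protein", target=target)
expression = Experiment(kind="expression", inputs=[plasmid], outputs=[protein])
```

Following the relationships from an experiment's output back to its target is the kind of traversal a sample-tracking model would need to support.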

11 Reliability ■Loss of data is inexcusable ■Must be able to correct wrong data ■Must keep audit trails ■Must allow future changes ■All made feasible by ■Data model ■Database ■Software engineering standards
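The combination of "correct wrong data" and "keep audit trails" can be sketched as a record that logs every correction instead of silently overwriting the old value. This is only an illustration under assumed names (`AuditedRecord`, `AuditEntry`, `correct`); the slides do not describe how PIMS implements its audit trail.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass
class AuditEntry:
    """One logged correction (hypothetical structure)."""
    when: datetime
    who: str
    attribute: str
    old_value: Any
    new_value: Any


@dataclass
class AuditedRecord:
    """Hypothetical record whose corrections are always logged."""
    values: Dict[str, Any]
    history: List[AuditEntry] = field(default_factory=list)

    def correct(self, who: str, attribute: str, new_value: Any) -> None:
        # Remember the previous value before replacing it.
        old_value = self.values.get(attribute)
        self.history.append(
            AuditEntry(datetime.now(timezone.utc), who, attribute, old_value, new_value)
        )
        self.values[attribute] = new_value


# Illustrative data: a mistyped incubation temperature is corrected.
record = AuditedRecord(values={"temperature_c": 73})
record.correct(who="alice", attribute="temperature_c", new_value=37)
```

After the correction, the record holds the new value while `record.history` still shows who changed what and from which value.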

12 Ancestry ■HalX: an open-source LIMS (Laboratory Information Management System) for small- to large- scale laboratories. ■Acta Crystallogr D Biol Crystallogr. 2005 Jun;61(Pt 6):671-8. ■Prilusky J, Oueillet E, Ulryck N, Pajon A, Bernauer J, Krimm I, Quevillon-Cheruel S, Leulliot N, Graille M, Liger D, Tresaugues L, Sussman JL, Janin J, van Tilbeurgh H, Poupon A. ■OPPF based on Nautilus ■MOLE: a data management application based on a protein production data model. ■Proteins. 2005 Feb 1;58(2):285-9. ■Morris C, Wood P, Griffiths SL, Wilson KS, Ashton AW.

13 PIMS ■The aim is to provide a Laboratory Information Management System (LIMS) ■for laboratories that produce proteins from target genes ■that can be incorporated into commercial software in the area of biotech and protein production ■Improve the quality of the experimental data deposited into the PDB ■by providing software for lab scientists to harvest their daily experimental data from protein production to structure ■My roles ■Data Model ■Database / Persistence layer / Java API ■Java Applet development

14 ■General Introduction ■Design a LIMS ■Protein Production Data Model ■What can PIMS do for you?

15 Why is Data Modelling Important? ■A Data Model is a plan for building a database ■detailed enough to be used to create the physical structure ■simple enough to communicate to the end user the data structure ■The Unified Modelling Language (UML)

16 Data Model ■Related to protein production & crystallisation ■Suitable for large & small facilities ■Required to reproduce the samples & experiments involved ■Used for tracking samples, experiments & results ■Developed to help software developers to collect, store and exchange information through the provision of a common platform

17 Area covered ■Protein production work is generally the investigation of a particular protein, the Target ■The work often aims to produce a derivative of the Target, such as a single domain or a complex ■Diagram labels: target, protein production, crystallisation, X-Ray, phasing, structure, NMR tube, NMR

18 The Core Data Model

19 Change Control Board ■The data model is a work in progress ■The science is developing too ■Local protocols, which are novel and confidential ■Not easy work ■Thanks to… ■Geoff Barton (Dundee) ■Steve Prince (Manchester) ■Anne Poupon (IBBMC) ■Jon Diprose (OPPF) ■Alun Ashton (Diamond) ■Rasmus Fogh (CCPN)

20 Generation machinery ■Implemented in UML (Object Domain) ■Developed within a framework provided by the CCPN project (www.ccpn.ac.uk) ■Information stored in the UML Data Model is used to automatically generate ■SQL schema, ■Java Application Program Interfaces (APIs) and ■Documentation ■Diagram labels: UML Data Model, framework, Java API, Python API, Doc, SQL schema, XML schema
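The idea of generating a schema from a single model description can be shown with a toy example. The sketch below is not the CCPN framework: the `MODEL` dictionary, the `SQL_TYPES` mapping and the `generate_sql` function are all hypothetical stand-ins for a UML model and its code generator.

```python
from typing import Dict, List, Tuple

# Hypothetical, heavily simplified model: class name -> list of (attribute, type).
MODEL: Dict[str, List[Tuple[str, str]]] = {
    "Target": [("name", "str"), ("sequence", "str")],
    "Sample": [("label", "str"), ("amount", "float")],
}

# Assumed mapping from model types to SQL column types.
SQL_TYPES = {"str": "TEXT", "float": "REAL", "int": "INTEGER"}


def generate_sql(model: Dict[str, List[Tuple[str, str]]]) -> List[str]:
    """Emit one CREATE TABLE statement per class in the model."""
    statements = []
    for class_name, attributes in model.items():
        columns = ["id INTEGER PRIMARY KEY"]
        columns += [f"{name} {SQL_TYPES[kind]}" for name, kind in attributes]
        statements.append(f"CREATE TABLE {class_name} ({', '.join(columns)});")
    return statements


schema = generate_sql(MODEL)
```

Because the tables are derived from the model, adding an attribute in one place changes the generated schema, and the same description could in principle drive API and documentation generators as well.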

21 Architecture ■The API provides methods to access the underlying DB to store and retrieve data ■This allows applications to manipulate data without a detailed knowledge of the way in which the data is stored ■Various different applications make use of the API ■LIMS ■Any High Throughput applications (non-GUI) ■They are able to exchange data easily ■Diagram labels: Tools (GUI, standalone applications, …), API, Java API, Persistence layer, storage, DB, SQL schema
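A small sketch may help show how an API can hide the storage layer from applications. It uses Python's standard `sqlite3` module with an in-memory database; the `SampleStore` class, its methods and the `sample` table are assumptions for illustration and do not reflect the real PIMS Java API or schema.

```python
import sqlite3


class SampleStore:
    """Hypothetical API: callers never see SQL or the table layout."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._conn = connection
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS sample "
            "(id INTEGER PRIMARY KEY, label TEXT NOT NULL)"
        )

    def add_sample(self, label: str) -> int:
        """Store a sample and return its generated identifier."""
        cursor = self._conn.execute(
            "INSERT INTO sample (label) VALUES (?)", (label,)
        )
        self._conn.commit()
        return cursor.lastrowid

    def get_label(self, sample_id: int) -> str:
        """Retrieve a sample's label, raising KeyError if it does not exist."""
        row = self._conn.execute(
            "SELECT label FROM sample WHERE id = ?", (sample_id,)
        ).fetchone()
        if row is None:
            raise KeyError(sample_id)
        return row[0]


# Illustrative usage with an in-memory database.
store = SampleStore(sqlite3.connect(":memory:"))
sample_id = store.add_sample("T001-protein")
```

A GUI and a non-GUI high-throughput tool could both call `add_sample` and `get_label` without knowing which database sits underneath, which is the point the slide makes about exchanging data easily.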

22 From data model to application ■Data Model ■Use cases ■Scientific logic into requirements ■Specifications ■security, performance, usability, etc ■Java API ■Test data ■UI Design ■Application

23 Modular Construction ■http://www.pims-lims.org/project/use-case-suite.html ■Modules: System Administration, Setup & Configuration, Access Rights Management, Project Management, Reference Data, Instrument Management, Scheduling, Data Capture, Inventory Management, Sample Management, Bioinformatics, Mobile Data Collection, Reporting, Visualisation, Data Mining, Training & Support, Workflow

24 Reference data ■Supplier details ■Protocols ■documenting a set of editable default protocols ■user interface design with Ed Daniel ■Reagents ■protocol-related reference samples ■chemical hazard information ■e.g. R and S-phrases ■documenting lab chemicals as ‘MolComponents’ ■includes synonyms, formula, CAS-number and mass ■naming system under discussion with NKI ■~400 identified, ~180 based on crystallisation screens
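To make the ‘MolComponents’ idea concrete, the sketch below models a lab chemical with the attributes the slide lists (synonyms, formula, CAS number and mass) and a lookup that resolves any synonym to the same entry. The `MolComponent` and `ReferenceData` classes are hypothetical; only the attribute list comes from the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MolComponent:
    """Hypothetical reference entry for a lab chemical."""
    name: str
    formula: str
    cas_number: str
    mass: float  # molecular mass in g/mol
    synonyms: List[str] = field(default_factory=list)


class ReferenceData:
    """Hypothetical registry that resolves names and synonyms, ignoring case."""

    def __init__(self) -> None:
        self._by_name: Dict[str, MolComponent] = {}

    def add(self, component: MolComponent) -> None:
        # Index the component under its name and every synonym.
        for key in [component.name, *component.synonyms]:
            self._by_name[key.lower()] = component

    def find(self, name: str) -> Optional[MolComponent]:
        return self._by_name.get(name.lower())


reference = ReferenceData()
reference.add(
    MolComponent("sodium chloride", "NaCl", "7647-14-5", 58.44, ["NaCl", "salt"])
)
```

Looking up "Salt", "NaCl" or "sodium chloride" returns the same entry, which is one simple way to cope with inconsistent chemical naming across labs.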

25 Instrument management ■Analytical Data: A Tower of Babel ■Integration ■CSols ■produces a widely used Instrument Integration Package ■if the PIMS I/O is implemented in a reasonable timescale, CSols may develop a PIMS Driver ■Kendro/Thermo ■Diagram labels: MS, NMR, IR, LC

26 ■General Introduction ■Design a LIMS ■Protein Production Data Model ■What can PIMS do for you?

27 ■Not a lot right now ■Whatever you want, eventually... ■... as long as it's data management for protein production

28 Version 0.2 ■October 2005 ■Then incremental delivery ■… for one customer at a time and integrate with trunk ■… and repeat until project complete

29 Protocol Editor

30 Applet Protocol Editor ■Choose a step from a list ■Draw a temperature step ■List the protocol's completed steps and reload them from the bottom of the screen ■Record the protocol in the DB ■Display the list of protocols from the DB in the explorer and reload any one of them

31 Applet Workflow ■Select the experiment categories from the tabs ■Drag and drop the selected experiments ■Build a workflow or load an existing one ■Associate a protocol with an experiment

32 A collaborative framework ■… to develop a family of LIMSes ■Developers have difficulty in justifying the time required to create the software needed ■The biologist doesn't want to wait ■The result is a rapidly written LIMS that is fragile and cannot scale as the project grows ■Need a generic LIMS ■that helps to solve these problems by giving developers a tool that can scale to meet the needs of a large project ■and which welcomes plugins for novel methods

33 Conclusion ■Each “Click” could be a lot of coding... ■What do molecular biologists really want? ■Expectations are High! ■Users make an indispensable contribution ■Tell us when it's not good enough... ■... we will respond

34 Acknowledgements ■PIMS developer group ■Chris Morris (CCP4) ■Anne Pajon (EBI) ■Ed Daniel (Daresbury) ■Peter Troshin (MPSI) ■Jo van Niekerk (SSPF) ■Susy Griffiths (YSBL) ■Jon Diprose (OPPF) ■Katherine Pilicheva (OPPF) ■Anne Poupon (IBBMC) ■Eric Oeuillet (IBBMC) ■Sabrina Haquin (IBBMC) ■Alun Ashton (Diamond) ■EBI-MSD ■Kim Henrick ■Wim Vranken ■John Ionides ■CCPN ■Wayne Boucher ■Rasmus Fogh ■Tim Stevens ■Dan

