Database Requirements for CCP4 17th October 2005


Databases in e-HTPX

What is e-HTPX?
- “An e-Science resource for high throughput protein crystallography”
- Start at crystallisation, end at deposition
- Includes a lot of “project management”, since operations are being performed at a number of remote sites
- Should be able to talk to PIMS, beamline, CCP4 &c.

Relevant Areas in e-HTPX
- Data collection & processing – XIA-DPA
- Structure solution via MR – BMP
- Structure solution via (M/S)AD – XIA-HA
- Deposition – Autodep

Components with DB Needs
- Data Collection (e.g. DNA/ISPyB)
- Automated data processing: data exchange, internal data management
- BMP: external (EBI) databases, internal job management
- Experimental Phasing: finding input, storing results

Particular Examples 1
- During crystal characterisation we decide that the crystal is probably tetragonal
- Collect 75 degrees of data & process when you get home
- Suddenly discover that the crystal is orthorhombic, and anyway the solvent content would have been 11%
- Kick yourself, apply for more beam time

Particular Examples 2
- During crystal characterisation we decide that the crystal is probably tetragonal
- However, querying the database says that this would result in a solvent content of 11% – jolly unlikely
- Collect & process a little data
- Decide point group & store it away somewhere – then compute strategy & collect new set
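The solvent-content sanity check in this example can be sketched as below. This is a minimal illustration using the standard Matthews estimate (solvent fraction ≈ 1 − 1.23/Vm, with Vm in Å³/Da), not the code used in e-HTPX. The cell volume, molecular weight and plausibility limits are hypothetical values chosen so that the tetragonal hypothesis gives roughly the 11% quoted on the slide.

```python
# Sketch of a Matthews-style solvent content check (all inputs hypothetical).

def matthews_coefficient(cell_volume, n_sym_ops, n_mol_asu, mol_weight):
    """Vm in Å^3/Da: cell volume per dalton of protein in the unit cell."""
    return cell_volume / (n_sym_ops * n_mol_asu * mol_weight)

def solvent_fraction(vm):
    """Standard approximation for typical protein density: 1 - 1.23 / Vm."""
    return 1.0 - 1.23 / vm

def is_plausible(fraction, low=0.25, high=0.80):
    """Illustrative limits only; most protein crystals fall inside this range."""
    return low <= fraction <= high

CELL_VOLUME = 221000.0   # Å^3, hypothetical
MOL_WEIGHT = 20000.0     # Da, hypothetical, e.g. looked up from the sequence

# Point group 422 (tetragonal) has 8 symmetry operations, 222 (orthorhombic) has 4.
tetragonal = solvent_fraction(matthews_coefficient(CELL_VOLUME, 8, 1, MOL_WEIGHT))
orthorhombic = solvent_fraction(matthews_coefficient(CELL_VOLUME, 4, 1, MOL_WEIGHT))

print(f"tetragonal:   {tetragonal:.0%} plausible={is_plausible(tetragonal)}")
print(f"orthorhombic: {orthorhombic:.0%} plausible={is_plausible(orthorhombic)}")
```

The point of the sketch is that the check needs the molecular weight, which the software can only obtain if the molecule is described somewhere it can query.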

Particular Challenges
- People: they never fill things in!
- Software: needs to be able to find things out all by itself – so in the previous example program X needs to be able to find out about the molecule
- Consistency & robustness: we may find out later on that in fact we’ve only got about half the molecule – we need to be able to handle this

Another Example
- I have just collected a bunch of data sets and I wish to process them automatically
- Three wavelengths, with a high resolution remote sweep which overloaded the detector at lower resolutions
- Want to be able to combine the two remote sweeps into a single data set, then scale the other two wavelengths against this set

Data Processing Data
- Locations of files
- Derived “facts” for future reference
- Useful feedback to data collection
- Hooks to get downstream processes going
- Useful statistics for “Table 1” of your publication

MR Example Procedure
- Generate a large number of search models
- Starting with the best, try each and then stop
- Record results – both for user interaction and future reference (e.g. learning what makes a “good model” or likelihood of success)
- Could allow jobs to be tracked more easily and also rerun manually if desired
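The procedure above can be sketched as a loop that records every trial. This is an illustration only: the model names, the ranking values, the scores and the success threshold are all hypothetical, and `try_model` stands in for whatever actually runs molecular replacement.

```python
from dataclasses import dataclass

@dataclass
class Trial:
    model: str     # name of the search model that was tried
    score: float   # whatever figure of merit the MR job reports
    solved: bool   # did this trial meet the success criterion?

def run_mr(models, try_model, good_enough):
    """Try models best-first, record every trial, stop at the first success.

    models      -- list of (name, expected_quality) pairs
    try_model   -- callable running one MR job and returning a score
    good_enough -- callable deciding whether a score counts as solved
    """
    history = []
    for name, _quality in sorted(models, key=lambda m: m[1], reverse=True):
        score = try_model(name)
        solved = good_enough(score)
        history.append(Trial(name, score, solved))  # kept even when it fails
        if solved:
            break
    return history

# Hypothetical search models and canned scores standing in for real jobs.
models = [("model_a", 0.3), ("model_b", 0.9), ("model_c", 0.6)]
canned_scores = {"model_a": 5.0, "model_b": 2.0, "model_c": 9.0}

history = run_mr(models, canned_scores.get, lambda s: s >= 8.0)
for trial in history:
    print(trial)
```

Keeping the failed trials in the returned history is what makes the later uses possible: job tracking, manual reruns, and learning what makes a good model.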

MR Data
- Pointers to PDB files
- Sequences & identities
- Progress & job tracking

Yet Another Example
- I have a 3 wavelength data set, which is phasing badly in ${automated pipeline}
- The “system” says there may be radiation damage, so we need to be able to find out which set was collected last and try phasing from just that
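Finding "which set was collected last" is a simple query once collection times are recorded. The sketch below uses Python's built-in `sqlite3` module with a hypothetical `sweep` table; the table layout, data set name and timestamps are assumptions made for illustration, not the schema of any existing CCP4 or e-HTPX database.

```python
import sqlite3

def last_collected(conn, dataset):
    """Return the wavelength name of the most recently collected sweep."""
    row = conn.execute(
        "SELECT wavelength FROM sweep "
        "WHERE dataset = ? ORDER BY collected_at DESC LIMIT 1",
        (dataset,),
    ).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sweep (dataset TEXT, wavelength TEXT, collected_at TEXT)"
)
# ISO 8601 timestamps sort correctly as plain text. Values are hypothetical.
conn.executemany(
    "INSERT INTO sweep VALUES (?, ?, ?)",
    [
        ("brt1", "peak", "2005-10-01T09:15:00"),
        ("brt1", "inflection", "2005-10-01T10:40:00"),
        ("brt1", "remote", "2005-10-01T12:05:00"),
    ],
)

print(last_collected(conn, "brt1"))
```

This only works if the collection time is captured automatically at the beamline, which ties back to the earlier point that people never fill things in.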

Organization
- What makes a project?
- Solve BRT1?
- Collect 3 wavelength MAD set?
- Process peak?
- Figure out scaling parameters for peak?
- Probably all of the above…
- Project brt1.peak.scale.refine_parameters?

So What?
- So we need to be able to express and record the relationship between different data sets – once these are properly expressed we can proceed
- This may require some kind of “import” mechanism where ${user} has an opportunity to provide a description of the data and the f’, f’’, correct beam & so on
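One way to express the relationship between data sets is sketched below, using the three wavelength example with two remote sweeps. The class names, field names and sweep names are hypothetical; f’ and f’’ are left unset because they would come from the "import" step.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Sweep:
    name: str  # one continuous run of images

@dataclass
class Wavelength:
    name: str
    f_prime: Optional[float] = None          # supplied at import, if known
    f_double_prime: Optional[float] = None   # supplied at import, if known
    sweeps: List[Sweep] = field(default_factory=list)

def processing_plan(wavelengths, reference_name):
    """Derive processing steps from the recorded relationships.

    Sweeps of the reference wavelength are merged into one data set,
    then every other wavelength is scaled against that set.
    """
    reference = {w.name: w for w in wavelengths}[reference_name]
    steps = []
    if len(reference.sweeps) > 1:
        steps.append(("merge", reference.name, [s.name for s in reference.sweeps]))
    for w in wavelengths:
        if w.name != reference_name:
            steps.append(("scale", w.name, reference.name))
    return steps

mad = [
    Wavelength("peak", sweeps=[Sweep("peak_1")]),
    Wavelength("inflection", sweeps=[Sweep("inflection_1")]),
    Wavelength("remote", sweeps=[Sweep("remote_high"), Sweep("remote_low")]),
]

plan = processing_plan(mad, "remote")
for step in plan:
    print(step)
```

Once the structure is recorded, the plan follows from the data alone and no program has to guess which files belong together.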

What Else?
- Critical that things “discovered” at one stage are not lost thereafter
- e.g. data processing step asserts that the space group is probably P43212 or P41212, so don’t bother with P4122 at the phasing stage
- Critical also that later “discoveries” can be fed back to earlier stages
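A minimal sketch of carrying "discoveries" between stages follows. The `FactStore` class and the key and stage names are hypothetical, and a real system would persist the facts in a database and not in memory.

```python
class FactStore:
    """Keeps every assertion made about a project, with the stage that made it."""

    def __init__(self):
        self._history = {}

    def record(self, key, value, stage):
        # Append, never overwrite, so earlier assertions are not lost.
        self._history.setdefault(key, []).append((stage, value))

    def current(self, key, default=None):
        entries = self._history.get(key)
        return entries[-1][1] if entries else default

    def history(self, key):
        return list(self._history.get(key, []))

def spacegroups_to_try(store, all_candidates):
    """Restrict the phasing search to what earlier stages consider possible."""
    allowed = store.current("spacegroup_candidates")
    if allowed is None:
        return list(all_candidates)
    return [sg for sg in all_candidates if sg in allowed]

store = FactStore()
store.record("spacegroup_candidates", ["P43212", "P41212"], stage="data_processing")

to_try = spacegroups_to_try(store, ["P4122", "P41212", "P43212"])
print(to_try)

# A later discovery is fed back and becomes visible to every stage.
store.record("spacegroup_candidates", ["P41212"], stage="phasing")
print(store.current("spacegroup_candidates"), len(store.history("spacegroup_candidates")))
```

Appending instead of overwriting keeps the record of who asserted what, which matters when a later stage contradicts an earlier one.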

Can Databases Solve This?
- No!
- But they are probably a part of the solution …