Published by Erin King. Modified over 9 years ago.
1
BALBES (current working name)
A. Vagin, F. Long, J. Foadi, A. Lebedev, G. Murshudov
Chemistry Department, University of York
2
Outline
Introduction; Database; System manager; Scientific programs; Calibrating the system; An example; Release and development plan
3
Introduction
The number of entries in the Protein Data Bank (PDB) increases every year, which has many implications for macromolecular crystallography. One challenge is how to use these entries efficiently when developing structure solution software. Analysis of the PDB shows that this year around 67% of all deposited structures were reported as solved by molecular replacement (MR). With better algorithms and better organisation of the data bank, this proportion is expected to be substantially higher. Our system has three main components: (1) a reorganised database, (2) a manager written in Python that makes decisions, and (3) scientific programs such as MOLREP and REFMAC.
4
Database: Reorganisation of PDB
All entries in the PDB have been analysed according to their homology, and only a non-redundant set of structures was stored. The hierarchical database is organised according to sequence identity. Where domains are present, information about them is stored. Also stored or planned: multimers of a structure; fragments of various lengths (under way); intensity curves for various types of macromolecules (later).
5
Database (continued)
The resulting database is of portable size. It enables a fast search for similar structures (less than 10 seconds on a typical Mac G5 processor for most test cases so far); all actions are performed locally (independent of the internet); and it provides the required information about the similar structures (domains, tertiary structures).
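The reduction to a non-redundant set organised by sequence identity can be illustrated with a minimal Python sketch. It is not the BALBES implementation: the `identity` function is a crude position-by-position stand-in for a real alignment-based identity, the names `identity` and `build_hierarchy` are hypothetical, and the sequences and the 0.8 threshold are made up for illustration.

```python
def identity(a: str, b: str) -> float:
    """Crude stand-in for alignment-based sequence identity (assumption):
    fraction of matching positions, relative to the longer sequence."""
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def build_hierarchy(entries, threshold=0.8):
    """Greedy two-level grouping: an entry joins the first representative
    it matches at >= threshold identity, otherwise it becomes a new
    representative of the non-redundant set."""
    clusters = {}  # representative id -> ids of redundant members
    reps = []      # (id, sequence) of representatives, in insertion order
    for entry_id, seq in entries:
        for rep_id, rep_seq in reps:
            if identity(seq, rep_seq) >= threshold:
                clusters[rep_id].append(entry_id)
                break
        else:
            # No representative was similar enough: keep this entry.
            reps.append((entry_id, seq))
            clusters[entry_id] = []
    return clusters


# Made-up entries: a002 differs from a001 at one position out of ten.
entries = [
    ("a001", "MKTAYIAKQR"),
    ("a002", "MKTAYIAKQK"),
    ("b001", "GSHMLEDPVA"),
]
clusters = build_hierarchy(entries, threshold=0.8)
print(clusters)
```

Only the representatives (the dictionary keys) would need to be searched first; the members listed under each key are the redundant entries folded into it.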
7
System Manager
It is written in Python and relies on XML files for information exchange. It handles three areas:
1. Data: twinning; pseudotranslation; resolution for molecular replacement; completeness and other properties.
2. Sequence: finds template structures with their domain and multimeric organisations; finds the number of molecules in the asymmetric unit; "corrects" template molecules using sequence alignment.
3. Protocols: runs various protocols with molecular replacement and refinement and makes decisions accordingly.
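The slide does not show the XML layout, so the following is only a sketch of how a Python manager could pass data-analysis results between steps as XML, using the standard library `xml.etree.ElementTree`. The element names (`data_analysis`, `twinning`, `pseudotranslation`, `mr_resolution`, `completeness`), the function names, and the values are all hypothetical.

```python
import xml.etree.ElementTree as ET


def write_data_report(props: dict) -> str:
    """Serialise a flat dict of data properties to an XML string.
    The <data_analysis> root element is an assumed name."""
    root = ET.Element("data_analysis")
    for name, value in props.items():
        child = ET.SubElement(root, name)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


def read_data_report(xml_text: str) -> dict:
    """Parse the XML back into a dict; values come back as strings."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


# Illustrative values only, not results from any real data set.
report = write_data_report({
    "twinning": False,
    "pseudotranslation": True,
    "mr_resolution": 3.0,
    "completeness": 98.5,
})
parsed = read_data_report(report)
print(parsed)
```

Because every value is read back as text, a consumer of such a file has to convert types itself; a real exchange format would probably carry units and types explicitly.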
8
Scientific programs
MOLREP (molecular replacement): simple molecular replacement, phased rotation and translation functions, spherically averaged phased translation function, dyad search, search with one model fixed, etc.
REFMAC: maximum likelihood refinement, phased refinement, rigid body refinement, extensive dictionary, map coefficients, etc.
SFCHECK: twinning tests, pseudotranslation, optical resolution, optimal resolution for molecular replacement, analysis of coordinates against electron density, etc.
Auxiliary programs: alignment, search in the DB, analysis of sequence and data to suggest the number of expected monomers, removal of bits of structure from coordinates according to their fit to the electron density, semiautomatic domain definition, etc.
9
Calibrating the System
Step 1: Making the database. More than 30,000 structures had been deposited in the PDB up to the end of 2004, but only ~10,000 were non-redundant. These 10,000 were used to construct our database of known structures.
Step 2: Testing the system. About 1000 structures were deposited between January and May 2005. We tried to solve all of these with our automated approach. The success rate was ~75% with our current version. This is actually higher than the proportion reported as solved using MR!
10
Overall test results
Test case statistics (method as reported in the PDB)

Method   Case number   Success cases   Rate (%)
ALL      1027          777             75.6
MR       695           609             87.6
SAD      80            23              28.8
MAD      117           40              34.1
SIR      10            5               50
MIR      23            9               39.0
OTHER    102           89              87.9

Note that not all structures that were used as a search model are present in our DB.
11
Schematic view of the success rate of our system
All structures: 100%. Reported to be solved by MR: 67%. Solved automatically by our system: 75%.
12
Progress to date
We are analysing all failed cases and have already significantly enhanced the system as a result. We have developed several new techniques by carefully analysing these results. Success is great for funding! Failure is great for future developments!
13
Example: Addition of domains
Search with the whole molecule. Is it a solution? If yes, refine and exit. If no, are there domains? If there are none, go to other protocols. If there are, run MR for each domain and find the best. Refine and produce a map. Mask out the found domain(s). Use SPTF, PRF, PTF to find the missing domains. Is it a solution? If no, go to other protocols. If yes, is the solution complete? If yes, refine and exit. If no, the search for missing domains continues.
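The decision flow of this slide can be sketched as plain Python control flow. This is an illustration under stated assumptions, not the BALBES code: `add_domains`, `run_mr`, `phased_search` and `is_solution` are hypothetical names, the refine, map and masking steps are folded into the `phased_search` callback, the loop back after an incomplete solution is inferred from the flowchart, and the scores in the usage example are made up.

```python
def add_domains(whole, domains, run_mr, phased_search, is_solution):
    """Follow the flowchart: try the whole molecule first, then place
    the best domain by MR and look for the others one at a time.

    run_mr(model) -> score and phased_search(domain, placed) -> score
    are caller-supplied stand-ins for the real programs (assumption).
    Returns (next_action, placed_models).
    """
    if is_solution(run_mr(whole)):
        return "refine and exit", [whole]
    if not domains:
        return "other protocols", []

    # MR for each domain separately; keep the best-scoring one.
    trials = {d: run_mr(d) for d in domains}
    best = max(trials, key=trials.get)
    placed = [best]
    missing = [d for d in domains if d != best]

    # Loop until the solution is complete (no domains left to place).
    while missing:
        candidate = missing[0]
        # Stand-in for: refine, produce map, mask out found domains,
        # then search for the candidate with SPTF, PRF, PTF.
        if not is_solution(phased_search(candidate, placed)):
            return "other protocols", placed
        placed.append(candidate)
        missing.pop(0)
    return "refine and exit", placed


# Toy usage with made-up scores and a made-up 0.5 acceptance cutoff.
scores = {"whole": 0.2, "large": 0.7, "small": 0.3}
status, placed = add_domains(
    "whole",
    ["large", "small"],
    run_mr=scores.get,
    phased_search=lambda domain, placed_so_far: 0.6,
    is_solution=lambda score: score >= 0.5,
)
print(status, placed)
```

Passing the programs in as callables keeps the decision logic separate from how MOLREP or REFMAC would actually be invoked, which the slides do not describe.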
14
Example: Domain motions - 1tj3
Finding the whole molecule was problematic. Finding the large domain, refining, and then using SPTF/PT/TF with a masked map was straightforward.
15
Conclusions
1. The database is an essential ingredient of efficient automation.
2. With relatively simple protocols it will be possible to solve more than 80% of structures automatically.
3. The interplay of different protocols is very promising.
4. A huge number of tests helps to prioritise developments and generate ideas.
16
Development Plans
Currently under way and in the immediate future: update the database by adding entries based on PDB files deposited in 2005 (thanks to Eugene for PISA, which we use for multimer analysis); add multichain domain definitions; test the system against PDB files deposited in 2006; combine with some protocols from experimental phasing and automatic model building (Foadi, Cowtan).
Target release date: May-June 2006.
Future: combine with automatic model building; make decisions during refinement about twinning and other properties; pass information about search templates to refinement; combine with experimental phasing; regular updates.
17
Acknowledgements
All CCP4 and YSBL people. Wellcome Trust, BBSRC, EU BIOXHIT, NIH for support.