1 MrBUMP – Molecular Replacement with Bulk Model Preparation Ronan Keegan, Martyn Winn CCP4 group, Daresbury Laboratory Como May 23rd 2006.



2 The aim of MrBUMP
An automation framework for Molecular Replacement.
Particular emphasis on generating a variety of search models.
Can be used to generate models only.
Wraps Phaser and/or Molrep.
Also uses a variety of helper applications (e.g. Chainsaw) and bioinformatics tools (e.g. Fasta, Mafft).
Uses on-line databases (e.g. PDB, SCOP).
In favourable cases, gives a “one-button” solution.
In unfavourable cases, will suggest likely search models for manual investigation (lead generation).

3 Target MTZ & Sequence → Target Details
Currently:
– Number of residues and molecular weight
– Matthews coefficient
– Estimated number of molecules in the a.s.u.
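The target details listed on this slide can be illustrated with the standard Matthews relation, V_M = V_cell / (Z × n × MW), and the usual solvent estimate 1 − 1.23/V_M. The sketch below is not MrBUMP's own code: the function names, the target V_M of 2.5 Å³/Da and the numbers in the usage note are assumptions made for illustration only.

```python
def matthews_coefficient(cell_volume, n_symops, n_mol_asu, mol_weight):
    """Return V_M in A^3/Da: unit-cell volume per dalton of protein.

    cell_volume : unit-cell volume in cubic Angstroms
    n_symops    : number of symmetry operators of the space group (Z)
    n_mol_asu   : assumed number of molecules in the asymmetric unit
    mol_weight  : molecular weight of one molecule in Da
    """
    return cell_volume / (n_symops * n_mol_asu * mol_weight)


def solvent_fraction(vm):
    """Approximate solvent fraction from V_M (Matthews' 1.23 A^3/Da rule)."""
    return 1.0 - 1.23 / vm


def estimate_n_mol(cell_volume, n_symops, mol_weight, target_vm=2.5, max_n=20):
    """Pick the copy number whose V_M lies closest to target_vm.

    target_vm = 2.5 is an assumption for this sketch, not a MrBUMP setting.
    """
    return min(
        range(1, max_n + 1),
        key=lambda n: abs(
            matthews_coefficient(cell_volume, n_symops, n, mol_weight) - target_vm
        ),
    )
```

For example, with a made-up cell volume of 800000 Å³, 8 symmetry operators and a 22 kDa molecule, `estimate_n_mol(800000, 8, 22000)` picks two molecules per asymmetric unit, because V_M is then about 2.27 Å³/Da.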

4 Target MTZ & Sequence → Target Details → Model Search
Model Search: generate a list of structures that are possible templates for search models.

5 Search for homologous proteins
FASTA search of the PDB
– Sequence-based search using the sequence of the target structure.
– Can be run locally if the user has the fasta34 program installed, or remotely using the OCA web-based service hosted by the EBI.
– Local search is done against the complete list of PDB sequences derived from ATOM records in the PDB structure files.
– All of the resulting PDB id codes are added to a list.
– The alignment to the target is not of interest at this stage.

6 Search for similar structures
Secondary structure based search (optional)
– The top hit from the FASTA search is used as the template structure for a secondary structure based search.
– Uses the SSM web service provided by the EBI.
– Any structures found that are not already in the list of matches from the FASTA search are added to the list.
– Provides structural variation that is not based on direct sequence similarity to the target.
Manual addition
– Additional PDB id codes can be added to the list, e.g. from FFAS or PSI-BLAST searches.

7 Multiple Alignment
After the set of PDB ids has been collected in the FASTA and SSM searches, their coordinate-based sequences are collected and put through a multiple alignment with the target sequence.
Aims:
– Score template structures in a consistent manner, in order to prioritise them for subsequent steps.
– Extract the pairwise alignment between template and target for use in the Chainsaw step. The multiple alignment should give a better set of alignments than the original pairwise FASTA alignments.

8 Multiple Alignment
[Figure: multiple alignment of the target with the model templates, shown in Jalview (Barton group, Dundee), with a pairwise alignment marked.]
ClustalW or MAFFT are currently supported for multiple alignment.

9 Template Model Scoring
Sequence identity:
– Ungapped sequence identity, i.e. the sequence identity of the aligned target residues.
Alignment quality:
– Depends on the alignment length, the number of gaps created in the template alignment and the extent of each of these gaps.
– The penalties for the number and size of gaps are biased so that alignments that preserve domains of the structure score higher than alignments that spread the aligned residues out.
Alignment scoring: score = sequence identity × alignment quality
The top scoring models are then used for further processing.
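The scoring rule on this slide can be sketched as follows. The slide does not give the formula for alignment quality, so the sketch takes it as an input between 0 and 1 rather than inventing one. Counting identity over columns where both sequences have a residue is one reading of "ungapped sequence identity", and the function names and PDB ids are hypothetical.

```python
def ungapped_identity(target_aln, template_aln, gap="-"):
    """Fraction of identical residues over columns where neither row has a gap."""
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")
    aligned = [
        (a, b) for a, b in zip(target_aln, template_aln) if a != gap and b != gap
    ]
    if not aligned:
        return 0.0
    return sum(a == b for a, b in aligned) / len(aligned)


def template_score(identity, quality):
    """score = sequence identity x alignment quality, as stated on the slide."""
    return identity * quality


def rank_templates(templates):
    """Sort template ids by descending score.

    templates maps an id to an (identity, quality) pair; the alignment
    quality is supplied by the caller because its formula is not given.
    """
    return sorted(
        templates, key=lambda pdb_id: template_score(*templates[pdb_id]), reverse=True
    )
```

For instance, `rank_templates({"1abc": (0.5, 0.9), "2xyz": (0.75, 0.8)})` places the hypothetical entry `2xyz` first, since 0.75 × 0.8 is larger than 0.5 × 0.9.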

10 Domains
Suitable templates for target domains may exist in isolation in the PDB, or in combination with dissimilar domains.
Where there is relative domain motion, it may be preferable to solve the domains separately.

11 Domains
Domain search:
– Top scoring templates from the multiple alignment are tested to see if they contain any domains.
– Uses the SCOP database. This only lists domains that appear more than once in the PDB.
– The database is scanned to see if domains exist for each of the PDB entries in the list of templates.
– Domains are then extracted from the parent PDB structure file and added to the list of template models as additional search models for MR.

12 Multimers
Multimer search:
– Search for quaternary structures that may be used as search models.
– Better signal-to-noise ratio than a monomer, if the assembly is correct for the target.
– Multimeric structures based on the top templates are retrieved using the PQS service at the EBI, and added to the list of search models.
– PQS will soon be replaced by the PISA service at the EBI (Eugene Krissinel).
Example PQS entries:
1n5a SPLIT-ASU into 4 Oligomeric files of type TRIMERIC
1n5b SPLIT-ASU into 2 Oligomeric files of type DIMERIC
1n5c SYMMETRY-COMPLEX Oligomeric file of type DIMERIC
1n5d SYMMETRY-COMPLEX Oligomeric file of type DIMERIC

13 Target MTZ & Sequence → Target Details → Model Search → Model Preparation

14 Search Model Preparation
Search models are prepared in four ways:
1. PDBclip
– Original PDB with waters removed, hydrogens removed, most probable conformations for side chains selected and chain IDs added if missing.
2. Molrep
– Molrep contains a model preparation function which will align the template sequence with the target sequence and prune the non-conserved side chains accordingly.
3. Chainsaw
– Can be given any alignment between the target and template sequences.
– Non-conserved residues are pruned back to the gamma atom.
4. Polyalanine
– Created by excluding all of the side chain atoms beyond the CB atom using the Pdbset program.
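The polyalanine option can be illustrated with a few lines that filter PDB ATOM records. According to the slide, MrBUMP uses the Pdbset program for this step; the pure-Python function below is only a stand-in to show the idea, and its name is hypothetical. It relies on the PDB convention that the atom name occupies columns 13 to 16 of an ATOM record, and it does not rename residues to ALA.

```python
# Atoms kept for a polyalanine model: backbone plus the beta carbon.
BACKBONE_PLUS_CB = {"N", "CA", "C", "O", "CB"}


def to_polyalanine(pdb_lines):
    """Drop side-chain atoms beyond CB from ATOM records.

    Records other than ATOM (e.g. CRYST1, TER, END) are passed through
    unchanged. A terminal OXT atom is also dropped in this simple sketch.
    """
    kept = []
    for line in pdb_lines:
        if line.startswith("ATOM"):
            atom_name = line[12:16].strip()  # columns 13-16 of the record
            if atom_name not in BACKBONE_PLUS_CB:
                continue
        kept.append(line)
    return kept
```

Applied to a serine residue, this keeps N, CA, C, O and CB and removes OG.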

15 Search Model Preparation
Ensemble for Phaser:
Top scoring search models are “superposed” to create an ensemble model. This may provide a better search model than any of the individual models on their own.
Currently the default is to use the top 5 scoring search models, but the plan is to create the ensemble dynamically, based on the MW and RMSDs of the constituent search models.

16 Target MTZ & Sequence → Target Details → Model Search → Model Preparation → Molecular Replacement & Refinement

17 Molecular Replacement and Refinement
The search models can be processed with Molrep or Phaser or both.
The resulting models from molecular replacement are passed to Refmac for restrained refinement.
The change in the Rfree value during refinement is used to judge how good the resulting model is.
A solution is deemed to have been found if the final Rfree is less than 0.35, or if it is less than 0.5 and has fallen by more than 20% from the initial Rfree.
Models that give an Rfree below 0.5 that still appears to be falling are highlighted as “marginal solutions”, worth further investigation if no solution is found using the other search models.
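The acceptance rule on this slide translates into a short decision function. This is a sketch rather than MrBUMP's code: it assumes that "fallen by more than 20%" means a relative drop from the initial Rfree, it approximates "looks to be falling" by the final value being lower than the initial one, and the function name and labels are hypothetical.

```python
def classify_mr_result(rfree_initial, rfree_final):
    """Classify a refined MR model as 'solution', 'marginal' or 'failed'.

    Assumes rfree_initial > 0. The 20% drop is taken relative to the
    initial Rfree, which is an interpretation of the slide.
    """
    relative_drop = (rfree_initial - rfree_final) / rfree_initial
    if rfree_final < 0.35 or (rfree_final < 0.5 and relative_drop > 0.20):
        return "solution"
    # Marginal: below 0.5 and still decreasing, but not by enough.
    if rfree_final < 0.5 and rfree_final < rfree_initial:
        return "marginal"
    return "failed"
```

With these assumptions, a refinement going from 0.60 to 0.45 counts as a solution (a 25% drop), while one going from 0.52 to 0.48 is only marginal.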

18 Target MTZ & Sequence → Target Details → Model Search → Model Preparation → Molecular Replacement & Refinement
Serial mode: check scores and exit, or select the next model.

19 Target MTZ & Sequence → Target Details → Model Search → Model Preparation → several Molecular Replacement & Refinement jobs side by side
Parallel mode: start multiple MR jobs and exit when one finds a solution.
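The difference between the serial and parallel modes can be sketched with Python's standard `concurrent.futures` module. This is an illustration of the control flow only, not MrBUMP's implementation. `run_job` is a hypothetical callable that runs molecular replacement and refinement for one search model and returns a label such as "solution" or "failed".

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def serial_search(models, run_job):
    """Serial mode: try models in order and stop at the first solution."""
    for model in models:
        if run_job(model) == "solution":
            return model
    return None


def parallel_search(models, run_job, max_workers=4):
    """Parallel mode: start several jobs and return the first solved model.

    Jobs that have not started yet are cancelled once a solution is found.
    Jobs already running are allowed to finish when the pool shuts down.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(run_job, model): model for model in models}
        for future in as_completed(futures):
            if future.result() == "solution":
                for other in futures:
                    other.cancel()
                return futures[future]
    return None
```

On a real cluster the jobs would be submitted to a queuing system rather than run in local threads. The thread pool here only stands in for that dispatch step.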

20 MrBUMP on compute clusters
MrBUMP can take advantage of a compute cluster to farm out the Molecular Replacement jobs.
Currently, clusters running Sun Grid Engine are supported. Support for LSF, Condor and other queuing systems will be added if there is enough demand.

21 Pre-release version of MrBUMP
Pre-release made available in Jan 06.
Simple installation.
Currently runs on Linux and OSX. Windows version almost ready.
Comes with CCP4 GUI.
Can also be run from the command line with keyword input.
Good deal of interest and some successes.
Regular updates (currently version 0.3).

22 Example 1
1vlw: 3 chains of 205 aa. Data in C2221 to 2.3 Å. Using Molrep.

23 Example 2
Anon:
[Results table not recoverable. Legend: yes = ARP/wARP builds and docks the entire molecule; no = ARP/wARP fails, i.e. wrong MR solution. Remaining labels: MrBUMP marginal solution; solution used.]

24 A few observations...
In difficult cases, success in MrBUMP may depend on the particular template, chain and model preparation method.
Nevertheless, several putative solutions may be found.
Ease of subsequent model re-building and model completion may depend on the choice of solution.
First solution or check everything?
The expectation was that a quick solution is required. In fact, most users seem happy to let MrBUMP run for a long time (hours, days).
Worth checking “failed” solutions!

25 Future developments
Windows support (almost done)
Complexes (in progress)
– Processing of multiple target sequences
Improved alignment:
– Multiple alignment against larger sequence database
– Alignment from profile-based search
– User-supplied alignment
Incorporate PISA multimer determining service (in progress)
Model generation:
– Identification of flexible loops
– Normal mode generated conformations
Develop web-service version to allow CCP4i users to run jobs on CCP4 cluster

26 Acknowledgements
Ronan Keegan, Daresbury
Thanks to the authors of all underlying programs and services.
Other suggestions from:
– Dave Meredith, Graeme Winter, Daresbury Laboratory
– Eugene Krissinel, EBI, Cambridge
– Eleanor Dodson, YSBL, York University
– Geoff Barton, Charlie Bond, University of Dundee
– Randy Read, Airlie McCoy, Cambridge
Funding: BBSRC (e-HTPX, CCP4)