Automated phase improvement and model building with Parrot and Buccaneer Kevin Cowtan


X-ray structure solution pipeline: data collection, data processing, experimental phasing, model building, refinement, rebuilding, validation; also density modification and molecular replacement.

Kevin Cowtan, Oulu Density modification Density modification is a problem in combining information:

Density modification 1. Rudimentary calculation: |F|, φ (reciprocal space) → ρ(x) (real space) → modify ρ → ρ_mod(x) → |F_mod|, φ_mod → set φ = φ_mod → repeat. FFT and FFT⁻¹ convert between reciprocal space and real space.
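
The rudimentary cycle can be sketched on a one-dimensional toy map. This is only an illustration, not code from DM or Parrot: the function names, the naive DFT and the use of solvent flattening as the "modify ρ" step are all assumptions made for the example.

```python
import cmath
import math

def dft(values, sign):
    """Naive discrete Fourier transform (O(n^2)); sign=-1 or +1 picks the direction."""
    n = len(values)
    return [
        sum(v * cmath.exp(sign * 2j * math.pi * h * x / n) for x, v in enumerate(values))
        for h in range(n)
    ]

def density_from_structure_factors(amplitudes, phases):
    """Reciprocal space -> real space: rho(x) from |F| and phi."""
    n = len(amplitudes)
    f = [a * cmath.exp(1j * p) for a, p in zip(amplitudes, phases)]
    return [value.real / n for value in dft(f, +1)]

def flatten_solvent(rho, solvent_mask):
    """Hypothetical 'modify rho' step: set every solvent point to the mean solvent density."""
    solvent = [r for r, is_solvent in zip(rho, solvent_mask) if is_solvent]
    mean = sum(solvent) / len(solvent)
    return [mean if is_solvent else r for r, is_solvent in zip(rho, solvent_mask)]

def modification_cycle(amplitudes, phases, solvent_mask):
    """One cycle: transform, modify rho, back-transform, keep only the new phases."""
    rho = density_from_structure_factors(amplitudes, phases)
    rho_mod = flatten_solvent(rho, solvent_mask)
    f_mod = dft(rho_mod, -1)
    # phi = phi_mod; the observed |F| are reused unchanged on the next cycle
    return [cmath.phase(f) for f in f_mod]

# Toy map: a small "protein" region surrounded by flat solvent.
true_rho = [0.0] * 4 + [1.0, 3.0, 2.0, 1.0] + [0.0] * 8
solvent_mask = [r == 0.0 for r in true_rho]
f_true = dft(true_rho, -1)
amplitudes = [abs(f) for f in f_true]
phases = [cmath.phase(f) for f in f_true]
new_phases = modification_cycle(amplitudes, phases, solvent_mask)
```

With the true phases as input the cycle leaves the structure factors unchanged, because the true map already satisfies the solvent constraint. Starting from poorer phases, the same function would be called repeatedly.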

Density modification 3. Phase probability distributions: |F|, P(φ) → (centroid) → |F_best|, φ_best → ρ(x) → modify ρ → ρ_mod(x) → |F_mod|, φ_mod → (likelihood) → P_mod(φ); then P(φ) = P_exp(φ), P_mod(φ) and repeat. FFT and FFT⁻¹ convert between reciprocal space and real space.
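
The centroid step can be illustrated with Hendrickson-Lattman coefficients, the usual way of storing a phase probability distribution as P(φ) ∝ exp(A cos φ + B sin φ + C cos 2φ + D sin 2φ). The sketch below assumes that the experimental and modified distributions are combined by multiplication (so their coefficients add), which is one reading of the slide. The function names are hypothetical.

```python
import cmath
import math

def hl_log_prob(phi, hl):
    """Unnormalised log P(phi) for Hendrickson-Lattman coefficients (A, B, C, D)."""
    a, b, c, d = hl
    return a * math.cos(phi) + b * math.sin(phi) + c * math.cos(2 * phi) + d * math.sin(2 * phi)

def combine(hl_first, hl_second):
    """Multiplying two distributions adds their coefficients (assumed combination rule)."""
    return tuple(x + y for x, y in zip(hl_first, hl_second))

def centroid(hl, steps=360):
    """Return (figure of merit, best phase): the centroid of P(phi) on the unit circle."""
    phis = [2 * math.pi * k / steps for k in range(steps)]
    logs = [hl_log_prob(p, hl) for p in phis]
    top = max(logs)                      # subtract the maximum to avoid overflow
    weights = [math.exp(value - top) for value in logs]
    total = sum(weights)
    vector = sum(w * cmath.exp(1j * p) for w, p in zip(weights, phis)) / total
    return abs(vector), cmath.phase(vector)

experimental = (2.0, 0.0, 0.0, 0.0)      # peaks at phi = 0
modified = (0.0, 2.0, 0.0, 0.0)          # peaks at phi = 90 degrees
fom, phi_best = centroid(combine(experimental, modified))
```

The map coefficient is then |F_best| = fom × |F| with phase φ_best. In this example the combined distribution peaks midway between the two inputs, and its figure of merit is higher than that of either input alone.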

Density modification 4. Bias reduction (gamma-correction): |F|, P(φ) → (centroid) → |F_best|, φ_best → ρ(x) → modify ρ → ρ_mod(x) → (γ-correct) → ρ_γ(x) → |F_mod|, φ_mod → (likelihood) → P_mod(φ); then P(φ) = P_exp(φ), P_mod(φ) and repeat. J.P. Abrahams. Programs: DM, SOLOMON, (CNS).

Density modification 5. Maximum Likelihood H-L: |F|, P(φ) → (centroid) → |F_best|, φ_best → ρ(x) → modify ρ → ρ_mod(x) → (γ-correct) → ρ_γ(x) → |F_mod|, φ_mod → (MLHL) → repeat. Program: PARROT.

Density modification. Traditional density modification techniques: solvent flattening, histogram matching, non-crystallographic symmetry (NCS) averaging.

Solvent flattening

Histogram matching. A technique from image processing for modifying the protein region. Noise maps have a Gaussian histogram. Well-phased maps have a skewed distribution: sharper peaks and bigger gaps. Sharpen the protein density with a transform that matches its histogram to that of a well-phased map. Useful at better than 4 Å. (Plot: P(ρ) against ρ for noise and true maps.)
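
A minimal sketch of histogram matching by rank follows. It assumes the protein-region density values and a reference set of values from a well-phased map are already available as flat lists. The function name and the linear interpolation between reference values are choices made for this example, not the method used in DM or Parrot.

```python
def histogram_match(values, reference):
    """Replace each value by the reference value of the same rank (quantile)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ref_sorted = sorted(reference)
    n, m = len(values), len(ref_sorted)
    matched = [0.0] * n
    for rank, index in enumerate(order):
        # Map the rank in [0, n-1] onto a fractional position in the sorted reference.
        position = rank * (m - 1) / (n - 1) if n > 1 else 0.0
        low = int(position)
        high = min(low + 1, m - 1)
        fraction = position - low
        matched[index] = ref_sorted[low] * (1 - fraction) + ref_sorted[high] * fraction
    return matched

# A noisy-looking set of densities is given the skewed histogram of the reference.
noisy_protein = [0.3, -0.1, 0.2, 0.0]
well_phased = [0.0, 0.0, 1.0, 5.0]
sharpened = histogram_match(noisy_protein, well_phased)
```

The transform preserves the ordering of the density values: the highest point in the input stays the highest point in the output, but the distribution of values now follows the reference.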

Non-crystallographic symmetry. If the molecule has internal symmetry, we can average together related regions. In the averaged map, the signal-to-noise ratio is improved. If a full density modification calculation is performed, powerful phase relationships are formed. With 4-fold NCS, can phase from random!
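
The signal-to-noise gain from averaging can be seen in a toy example. The sketch assumes a one-dimensional map in which the NCS copies are related by a simple translation and occupy equal consecutive segments. Real NCS operators are rotations plus translations and need a mask. All names here are hypothetical.

```python
import math
import random

def ncs_average(rho, n_copies):
    """Average the n_copies related segments and write the mean back to every copy."""
    segment = len(rho) // n_copies
    averaged = [
        sum(rho[copy * segment + i] for copy in range(n_copies)) / n_copies
        for i in range(segment)
    ]
    return averaged * n_copies

def rms_error(rho, truth):
    """Root-mean-square difference between a map and the true density."""
    return math.sqrt(sum((r - t) ** 2 for r, t in zip(rho, truth)) / len(truth))

rng = random.Random(0)                       # fixed seed so the example is repeatable
molecule = [math.sin(0.3 * i) for i in range(50)]
truth = molecule * 4                         # four NCS copies of the same molecule
noisy = [t + rng.gauss(0.0, 1.0) for t in truth]
averaged = ncs_average(noisy, 4)
```

With independent noise in each copy, averaging four copies is expected to reduce the noise by a factor of about two (the square root of the number of copies), so `rms_error(averaged, truth)` should come out at roughly half of `rms_error(noisy, truth)`.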

Non-crystallographic symmetry. How do you know if you have NCS? Cell content analysis (how many monomers in the ASU?); self-rotation function; difference Pattersons (pseudo-translation only). How do you determine the NCS? From heavy atoms; from initial model building; from molecular replacement; from density MR (hard). The mask is determined automatically.

Density modification in Parrot. Builds on existing ideas. DM: solvent flattening; histogram matching; NCS averaging; perturbation gamma. Solomon: gamma correction; local variance solvent mask; weighted averaging mask.

Density modification in Parrot. New developments: MLHL phase combination (as used in refinement: refmac, cns); anisotropy correction; problem-specific density histograms (rather than a standard library); pairwise-weighted NCS averaging...

Estimating phase probabilities. Solution: MLHL-type likelihood target function. Perform the error estimation and phase combination in a single step, using a likelihood function which incorporates the experimental phase information as a prior. This is the same MLHL-type likelihood refinement target used in modern refinement software such as refmac or cns.

Recent developments. Pairwise-weighted NCS averaging: average each pair of NCS-related molecules separately with its own mask. Generalisation and automation of multi-domain averaging. (Figure: molecules A, B and C.)

Parrot

Parrot: Rice vs MLHL Map correlations Comparing old and new likelihood functions.

Parrot: simple vs NCS averaged Map correlations Comparing with and without NCS averaging.

DM vs PARROT vs PIRATE: % residues autobuilt and sequenced (50 JCSG structures, A resolution). DM: 74.2%; PARROT: 78.4%; PIRATE: 79.1%.

DM vs PARROT vs PIRATE: mean time taken (50 JCSG structures, A resolution). DM: 6s; PARROT: 10s; PIRATE: 887s.

Buccaneer. Statistical model building software based on the use of a reference structure to construct likelihood targets for protein features. Buccaneer-Refmac pipeline; NCS auto-completion; improved sequencing.

Buccaneer: Latest. Buccaneer 1.2: use of Se atoms and MR model in sequencing; improved numbering of output sequences (ins/del); favour more probable sidechain rotamers; prune clashing side chains; optionally fix the model in the ASU; performance improvements (1.5x), including 'Fast mode' (2-3x for good maps); multi-threading (not in CCP ). Buccaneer 1.3: molecular replacement rebuild mode; performance improvements, more cycles.

Buccaneer: Method. Compare simulated map and known model to obtain likelihood target, then search for this target in the unknown map. (Figure: reference structure, work structure, LLK.)

Buccaneer: Method. Compile statistics for the reference map in a 4 Å sphere about Cα => LLK target. A 4 Å sphere about Cα is also used by 'CAPRA', Ioerger et al. (but with a different target function). Use mean/variance.
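
The idea of a mean/variance target can be sketched as follows. This is not Buccaneer's actual target function. It assumes each Cα environment has already been sampled into a fixed-length list of density values, and it scores a candidate with an independent Gaussian model per sample point. The function names and the variance floor are hypothetical.

```python
def build_target(reference_samples):
    """Mean and variance of the density at each sample point over all reference C-alphas."""
    count = len(reference_samples)
    points = len(reference_samples[0])
    means = [sum(sample[i] for sample in reference_samples) / count for i in range(points)]
    variances = [
        sum((sample[i] - means[i]) ** 2 for sample in reference_samples) / count
        for i in range(points)
    ]
    return means, variances

def llk_score(sample, target, variance_floor=1e-6):
    """Negative log-likelihood (up to a constant) under the Gaussian model; lower is better."""
    means, variances = target
    score = 0.0
    for value, mean, variance in zip(sample, means, variances):
        variance = max(variance, variance_floor)   # guard against zero variance
        score += (value - mean) ** 2 / (2.0 * variance)
    return score

# Two reference environments, each sampled at two points.
target = build_target([[1.0, 0.0], [3.0, 0.0]])
good_fit = llk_score([2.0, 0.0], target)
poor_fit = llk_score([4.0, 0.0], target)
```

Sample points where the reference density is well conserved have a small variance and therefore dominate the score, which matches the slides' description of a target built from conserved density features.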

Buccaneer: 10 stages.
1. Find candidate C-alpha positions
2. Grow them into chain fragments
3. Join and merge the fragments, resolving branches
4. Link nearby N and C termini (if possible)
5. Sequence the chains (i.e. dock sequence)
6. Correct insertions/deletions
7. Filter based on poor density
8. NCS rebuild to complete NCS copies of chains
9. Prune any remaining clashing chains
10. Rebuild side chains

Buccaneer. Use a likelihood function based on conserved density features. The same likelihood function is used several times. This makes the program very simple (<3000 lines), and the whole calculation works over a range of resolutions. Finding, growing: look for the C-alpha environment. Sequencing: look for the C-beta environment (ALA, CYS, HIS, MET, THR, ... x20).
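
Sequence docking can be pictured as sliding the known sequence along a traced fragment and summing per-residue scores. The sketch below assumes each traced position already has a score for every residue type (lower is better), for example from a C-beta environment target. The data layout and function name are hypothetical and do not describe Buccaneer's internals.

```python
def dock_sequence(position_scores, sequence):
    """Return (offset, total score) of the best ungapped placement of the fragment."""
    best = None
    last_offset = len(sequence) - len(position_scores)
    for offset in range(last_offset + 1):
        total = sum(
            scores[sequence[offset + i]] for i, scores in enumerate(position_scores)
        )
        if best is None or total < best[1]:
            best = (offset, total)
    return best

# A two-residue fragment whose density looks like V followed by L.
position_scores = [
    {"G": 5, "A": 4, "V": 1, "L": 3},
    {"G": 5, "A": 4, "V": 3, "L": 1},
]
placement = dock_sequence(position_scores, "GAVLG")
```

Here the fragment docks at offset 2 of the sequence (the "VL" pair). An ungapped match like this does not handle insertions or deletions, which the slides list as a separate correction stage.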

Buccaneer Case Study: a difficult loop in a 2.9 Å map, calculated using real data from the JCSG.

Find candidate C-alpha positions

Grow into chain fragments

Join and merge chain fragments

Sequence the chains

Correct insertions/deletions

Prune any remaining clashing chains

Rebuild side chains

Comparison to the final model

Buccaneer: Results. Model completeness not very dependent on resolution:

Buccaneer: Results. Model completeness dependent on initial phases:

Buccaneer. Cycle BUCCANEER and REFMAC: for the most complete model. Single run of BUCCANEER only (more options): for quick assessment or advanced use.

Buccaneer

Buccaneer. What it does: trace protein chains (trans-peptides only); link across small gaps; sequence; apply NCS; build side chains (roughly); refine (if recycled). Work at low resolutions: 3.7 Å with good phases.

Buccaneer. What it does not do (yet): cis-peptides; waters; ligands; loop fitting; tidy up the resulting model. In other words, it is an ideal component for use in larger pipelines.

Buccaneer. What you need to do afterwards: tidy up with Coot, or ARP/wARP when resolution is good (Buccaneer/ARP/wARP is better and faster than ARP/wARP). Typical Coot steps: connect up any broken chains; use density fit and rotamer analysis to check rotamers; check Ramachandran, MolProbity, etc.; add waters and ligands, check un-modeled blobs; re-refine, examine difference maps.

Buccaneer: Summary. A simple, fast, easy to use (i.e. MTZ and sequence) method of model building which is robust against resolution. User reports for structures down to 3.7 Å when phasing is good. Results can be further improved by iterating with refinement in refmac (and, in future, density modification). Proven on real-world problems.

Acknowledgements. Help: JCSG data archive: Eleanor Dodson, Paul Emsley, Randy Read, Clemens Vonrhein, Raj Pannu. Funding: The Royal Society.