Direct Use of Phase Information in Refmac
P. Skubák, University of Leiden
Abingdon, 18.3.2008


REFINEMENT WITHOUT PRIOR PHASE INFORMATION
SAD experiment → phasing and density modification → refinement and model building
- data from the experiment: $|F^+|$, $|F^-|$
- data used in refinement: $|F|$ only, where $|F| = \tfrac{1}{2}\,(|F^+| + |F^-|)$

REFINEMENT WITH INDIRECT PRIOR PHASE INFORMATION
SAD experiment → phasing and density modification → refinement and model building
- data from the experiment: $|F^+|$, $|F^-|$
- data used in refinement: $|F|$, $\varphi$, $P_e(\varphi)$, where $|F| = \tfrac{1}{2}\,(|F^+| + |F^-|)$
- $P_e(\varphi) = e^{A\cos\varphi + B\sin\varphi + C\cos 2\varphi + D\sin 2\varphi}$
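The phase distribution on this slide has the Hendrickson-Lattman form with coefficients A, B, C, D. A minimal sketch of how such a distribution can be evaluated and summarised is given here; the function names, the midpoint-rule integration and the number of steps are assumptions made for illustration and are not taken from Refmac.

```python
import math

def hl_density(phi, a, b, c, d):
    """Unnormalised phase density exp(A cos(phi) + B sin(phi) + C cos(2 phi) + D sin(2 phi))."""
    return math.exp(a * math.cos(phi) + b * math.sin(phi)
                    + c * math.cos(2.0 * phi) + d * math.sin(2.0 * phi))

def hl_summary(a, b, c, d, n_steps=360):
    """Return (centroid phase in radians, figure of merit) by midpoint-rule integration.

    Hypothetical helper: the normalisation is done numerically over [0, 2*pi).
    """
    step = 2.0 * math.pi / n_steps
    norm = re = im = 0.0
    for k in range(n_steps):
        phi = (k + 0.5) * step
        w = hl_density(phi, a, b, c, d)
        norm += w
        re += w * math.cos(phi)
        im += w * math.sin(phi)
    return math.atan2(im, re), math.hypot(re, im) / norm

if __name__ == "__main__":
    centroid, fom = hl_summary(2.0, 0.0, 0.0, 0.0)
    print(f"centroid phase = {math.degrees(centroid):.1f} deg, figure of merit = {fom:.3f}")
```

With all four coefficients equal to zero the distribution is flat and the figure of merit drops to zero, which corresponds to the case of no prior phase information.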

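REFINEMENT WITH DIRECT PRIOR PHASE INFORMATION
SAD experiment → phasing and density modification → refinement and model building
- data from the experiment: $|F^+|$, $|F^-|$
- data used in refinement: $|F^+|$, $|F^-|$, $\varphi$, heavy atom model
- mean amplitude used by the other two schemes: $|F| = \tfrac{1}{2}\,(|F^+| + |F^-|)$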

Rice refinement target
$P(|F_o|, \varphi_o, |F_c|, \varphi_c)$
↓ integration over all $\varphi_o$
$P(|F_o|, |F_c|, \varphi_c)$
↓ division by $P(|F_c|, \varphi_c)$
$P(|F_o| ; |F_c|, \varphi_c)$
- conditional probability distribution $P(|F_o| ; |F_c|, \varphi_c)$
- maximum likelihood refinement target with no prior phase information
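The integration step on this slide can be checked numerically. The sketch below assumes the usual acentric model, in which the observed structure factor is complex-Gaussian distributed around D·Fc with variance parameter Σ; the names `d`, `sigma` and both function names are illustrative assumptions, and this is not Refmac's code. It compares the closed-form Rice expression with a direct numerical integration over the unknown observed phase.

```python
import math

def bessel_i0(x):
    """Modified Bessel function I0 from its power series (adequate for moderate x)."""
    term = total = 1.0
    k = 0
    while term > 1e-16 * total:
        k += 1
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total

def rice_acentric(f_obs, f_calc, d, sigma):
    """Closed-form P(|Fo| ; |Fc|) for an acentric reflection (assumed model)."""
    m = d * f_calc
    return ((2.0 * f_obs / sigma)
            * math.exp(-(f_obs ** 2 + m ** 2) / sigma)
            * bessel_i0(2.0 * f_obs * m / sigma))

def rice_by_phase_integration(f_obs, f_calc, d, sigma, n_steps=720):
    """Same quantity, obtained by integrating the joint density over the observed phase."""
    m = d * f_calc
    step = 2.0 * math.pi / n_steps
    total = 0.0
    for k in range(n_steps):
        dphi = (k + 0.5) * step  # phase difference between Fo and Fc
        dist2 = f_obs ** 2 + m ** 2 - 2.0 * f_obs * m * math.cos(dphi)
        total += math.exp(-dist2 / sigma) / (math.pi * sigma)
    return f_obs * total * step  # |Fo| is the Jacobian of the polar coordinates

if __name__ == "__main__":
    print(rice_acentric(3.0, 2.5, 0.9, 4.0))
    print(rice_by_phase_integration(3.0, 2.5, 0.9, 4.0))
```

Because the calculated phase only enters through the phase difference, the result depends on |Fc| but not on the calculated phase itself, which is why this target carries no phase information.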

MLHL refinement target
$P(|F_o|, \varphi_o, |F_c|, \varphi_c)$
↓ weighted integration over all $\varphi_o$
$P(|F_o|, |F_c|, \varphi_c)$
↓ division by $P(|F_c|, \varphi_c)$
$P(|F_o| ; |F_c|, \varphi_c)$
- conditional probability distribution $P(|F_o| ; |F_c|, \varphi_c)$
- maximum likelihood refinement target indirectly incorporating prior phase information
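One way to picture the weighted integration is to weight the joint density by a prior phase distribution before integrating over the observed phase. The sketch below assumes an acentric complex-Gaussian model with Luzzati-type factor `d_luz` and variance `sigma`, and a prior given by four Hendrickson-Lattman coefficients; the function names and the normalisation (chosen so that a flat prior gives back the unweighted result) are assumptions for illustration, not Refmac's implementation.

```python
import math

def hl_weight(phi, hl):
    """Unnormalised prior phase weight from Hendrickson-Lattman coefficients (A, B, C, D)."""
    a, b, c, d = hl
    return math.exp(a * math.cos(phi) + b * math.sin(phi)
                    + c * math.cos(2.0 * phi) + d * math.sin(2.0 * phi))

def phase_weighted_likelihood(f_obs, f_calc, phi_calc, d_luz, sigma, hl, n_steps=720):
    """Integrate the joint density over the observed phase, weighted by the prior.

    Normalised relative to a uniform prior, so hl = (0, 0, 0, 0) reproduces the
    plain (unweighted) phase integral. Constant factors do not change the optimum.
    """
    m = d_luz * f_calc
    step = 2.0 * math.pi / n_steps
    weighted = norm = 0.0
    for k in range(n_steps):
        phi = (k + 0.5) * step
        dist2 = f_obs ** 2 + m ** 2 - 2.0 * f_obs * m * math.cos(phi - phi_calc)
        joint = f_obs * math.exp(-dist2 / sigma) / (math.pi * sigma)
        w = hl_weight(phi, hl)
        weighted += w * joint
        norm += w
    return 2.0 * math.pi * weighted / norm

if __name__ == "__main__":
    phi_c = 0.3
    flat = phase_weighted_likelihood(3.0, 2.5, phi_c, 0.9, 4.0, (0.0, 0.0, 0.0, 0.0))
    agree = phase_weighted_likelihood(
        3.0, 2.5, phi_c, 0.9, 4.0, (2.0 * math.cos(phi_c), 2.0 * math.sin(phi_c), 0.0, 0.0))
    print(flat, agree)
```

In this sketch a prior that peaks at the calculated phase raises the likelihood and a prior that peaks at the opposite phase lowers it, so the target now rewards agreement between the model phases and the experimental phases.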

SAD refinement target
$P(|F_o^-|, \varphi_o^-, |F_o^+|, \varphi_o^+, |F_c^-|, \varphi_c^-, |F_c^+|, \varphi_c^+)$
↓ integration over all $\varphi_o^-$, $\varphi_o^+$
$P(|F_o^-|, |F_o^+|, |F_c^-|, \varphi_c^-, |F_c^+|, \varphi_c^+)$
↓ division by $P(|F_c^-|, \varphi_c^-, |F_c^+|, \varphi_c^+)$
$P(|F_o^-|, |F_o^+| ; |F_c^-|, \varphi_c^-, |F_c^+|, \varphi_c^+)$
- conditional probability distribution $P(|F_o^-|, |F_o^+| ; |F_c^-|, \varphi_c^-, |F_c^+|, \varphi_c^+)$
- maximum likelihood refinement target directly incorporating prior phase information

SAD distribution $P(|F_o^+|, |F_o^-| ; A_c, B_c, A_{Hc}, B_{Hc})$ (strong prior phase information)
Rice distribution $P(|F_o| ; A_c, B_c)$ (no prior phase information)

SAD distribution (weak prior phase information)

SAD REFINEMENT TARGET USE IN REFMAC
- iterated automated model building with SAD function refinement
- substructure refinement and scaling
- refinement of models in the final stages

MODEL BUILDING WITH SAD REFINEMENT
- automated model building programs do not support the SAD target (yet), so workarounds are needed for testing:
  - the heavy atom parameters file is supplied to the model building run separately, by a script that also calls Refmac with the extra keywords needed for SAD refinement
  - this workaround is used in CRANK for the ARP/wARP + Refmac_sad implementation
- better integration of ARP/wARP with Refmac SAD is on the way

Figure: fraction of ARP/wARP-built residues relative to the total number of residues. Panels: resolution lower than 2.4 Å; resolution higher than 2.4 Å.

SAD SUBSTRUCTURE REFINEMENT & SCALING IN REFMAC (VERY PRELIMINARY RESULTS)
- being tested on ~200 JCSG datasets using the CRANK package, with the pipeline: Refmac5_sad for scaling, Solomon for DM and Refmac5_sad for model building
- average phase error after Refmac phasing: 75.4 deg
- 70 runs finished, of which 22 with successful model building
- similar results (67 runs finished, of which 25 with successful model building) achieved with the same pipeline using BP3 instead of Refmac5_sad for phasing

SAD REFINEMENT – CLOSE TO FINAL MODEL

SIRAS EXPERIMENT: DIRECT USE OF PRIOR PHASES
SIRAS X-ray experiment → phasing and density modification → refinement and model building
- data used in refinement: substructure model, $|F_N|$, $|F_D^+|$, $|F_D^-|$
- refinement target: $P(|F_{oN}|, |F_{oD}^-|, |F_{oD}^+| ; |F_{cN}|, \varphi_{cN}, |F_{cD}^-|, \varphi_{cD}^-, |F_{cD}^+|, \varphi_{cD}^+)$

SIRAS IMPLEMENTATION REQUIREMENTS AND TODO
- numerical approximations to the 3-dimensional SIRAS integral: done for the evaluation of the function and its first derivatives
- second derivatives of the SIRAS function should also be calculated and used in minimisation
- modelling of non-isomorphism:
  - more models in Refmac, with restraints between them and their parts

SIRAS VERY PRELIMINARY RESULTS
- number of protein residues correctly built: [table not preserved]
- results from Refmac5D: non-isomorphism not modelled (the protein part is shared between the native and derivative models), heavy atom refinement done outside of Refmac, only first derivatives used, etc.

Plans for the coming months
- run and analyze massive JCSG tests on both Refmac SAD substructure refinement and scaling and protein model building with iterative Refmac SAD refinement
- analyze the SAD target improvements for close to final models
- better integration of SAD with model building programs
- anisotropic ATPs refinement for the SAD target
- simultaneous refinement of occupancies and ATPs for all targets
- more models in Refmac (input, output, refinement etc.)
- geometry restraints between more models
- SIRAS target implementation and testing for substructure refinement and scaling and protein model building
- target for joint refinement of protein and ligand: $P(|F_{oP}|, |F_{oPL}| ; |F_{cP}|, \varphi_{cP}, |F_{cPL}|, \varphi_{cPL})$

Refmac5 code organisation (Fortran and C/C++)
I. Original Refmac5 code files
II. Modified Refmac5 code files
III. Bridge code files: layer between Refmac5 and the SAD function itself
IV. SAD function code files

SAD/SIRAS function implementation
- standalone C++ template class with double or single precision
- general likelihood function for 1 or 2 observed structure factors and N model structure factors (includes, among others, the SAD, SIR and Rice functions for both centric and acentric cases)
- possibility to define arbitrary covariance matrices for different experiments/situations, with real or complex terms
- calculation of the function value and of the first and second derivatives with respect to the calculated structure factors and Luzzati D parameters
- Gaussian integration over the unknown observed phases
- use of tabulated Sin, Cos, Exp and modified Bessel $I_0$, $I_1$ functions to increase the evaluation speed
- use of the LAPACK package for the calculation of eigenvalues of covariance matrices
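The use of tabulated functions mentioned on this slide can be illustrated with a uniform lookup table and linear interpolation. This is a sketch only: the class name `Tabulated`, the grid sizes and the interpolation scheme are assumptions, and the actual implementation is a C++ template class whose tabulation details are not given in the slides.

```python
import math

def bessel_i0(x):
    """Reference modified Bessel function I0 from its power series."""
    term = total = 1.0
    k = 0
    while term > 1e-16 * total:
        k += 1
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total

class Tabulated:
    """Uniform lookup table with linear interpolation on [lo, hi].

    Callers are expected to stay inside the tabulated range; outside it the
    last interval is extrapolated linearly.
    """

    def __init__(self, func, lo, hi, n_points):
        self.lo = lo
        self.inv_step = (n_points - 1) / (hi - lo)
        self.values = [func(lo + i / self.inv_step) for i in range(n_points)]

    def __call__(self, x):
        t = (x - self.lo) * self.inv_step
        i = min(max(int(t), 0), len(self.values) - 2)
        frac = t - i
        return self.values[i] + frac * (self.values[i + 1] - self.values[i])

# Build the tables once, then reuse them for every reflection.
i0_table = Tabulated(bessel_i0, 0.0, 10.0, 10001)
sin_table = Tabulated(math.sin, 0.0, 2.0 * math.pi, 4097)

if __name__ == "__main__":
    x = 3.21987
    print(i0_table(x), bessel_i0(x))
    print(sin_table(1.2345), math.sin(1.2345))
```

The trade-off is memory and a small interpolation error against the cost of evaluating the series or the library function for every reflection and every integration point.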

Tasks performed by the bridge layer
- passing the calls and parameters between the Refmac5 part and the likelihood function part, in both directions
- place of instantiation and "life" of the likelihood class
- transformation of derivatives with respect to structure factor amplitudes and phases (polar coordinates) into derivatives with respect to the real and imaginary parts of the structure factors (as used by Refmac5)
- role in reading and writing substructure files
- checks of the plausibility and/or correctness of some input and output parameters of the likelihood function
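The derivative transformation listed among the bridge layer tasks is an application of the chain rule for F = A + iB = |F| e^{iφ}. A minimal sketch follows; the function name and argument order are illustrative assumptions and do not reproduce the bridge code.

```python
import math

def polar_to_cartesian_gradient(amp, phase, d_amp, d_phase):
    """Convert (df/d|F|, df/dphi) into (df/dA, df/dB) for F = A + iB.

    Uses d|F|/dA = cos(phi), d|F|/dB = sin(phi),
         dphi/dA = -sin(phi)/|F|, dphi/dB = cos(phi)/|F|.
    The amplitude must be non-zero.
    """
    cos_p, sin_p = math.cos(phase), math.sin(phase)
    d_a = cos_p * d_amp - sin_p / amp * d_phase
    d_b = sin_p * d_amp + cos_p / amp * d_phase
    return d_a, d_b

if __name__ == "__main__":
    # Toy function f = |F|^2 sin(phi), used only to exercise the conversion.
    a, b = 1.2, -0.7
    amp, phase = math.hypot(a, b), math.atan2(b, a)
    grad = polar_to_cartesian_gradient(
        amp, phase, 2.0 * amp * math.sin(phase), amp ** 2 * math.cos(phase))
    print(grad)
```

The division by |F| shows why reflections with a vanishing calculated amplitude need separate handling in any such conversion.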

Tasks performed by the modified Refmac5 files
- input, output and availability in code of the observed $|F^+|$, $|F^-|$ columns (via the standard CCP4 libraries for reading and writing mtz files)
- input, output and availability in code of the substructure parameters (standard pdb file format and a new, internally used refmac5 format, for both input and output)
- gathering and precomputation of all information required as input by the SAD function
- calling the SAD function and passing all required input information (via the bridge functions)
- replacement and/or modification of all original Refmac5 subroutines that require different treatment with the SAD function
- harvesting of input keywords specific to SAD refinement
- all original tasks performed by these files