Computational Crystallography Initiative, Physical Biosciences Division
Exploring Symmetry, Outlier Detection & Twinning update
Peter Zwart

Overview
Exploring metric symmetry
  –iotbx.explore_metric_symmetry
Outlier detection
  –mmtbx.remove_outliers
Twinning
  –mmtbx.twin_map_utils
  –Actually: cctbx.python $MMTBX_DIST/mmtbx/twinning/twin_map_utils.py

Exploring metric symmetry
Protein crystals grown under different conditions can sometimes exhibit drastic changes in symmetry and unit cell dimensions
Sometimes the crystal symmetries are related
  –The relation is not always obvious
  –Finding the relation between two unit cells is not always straightforward
Knowing the relations between the different crystal forms can be helpful during structure solution

Exploring metric symmetry
How to find relations between unit cells?
  –A sub-lattice formalism allows one to generate a family of related lattices from a given lattice
    The number of unique unit cells that are N times larger than the original unit cell is quite small
    Rutherford, Acta Cryst. (2006). A62,
  –Unit cells of approximately equal volume can be compared to each other by checking a large number of unimodular transforms
    Ralf's work

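The unimodular comparison of two cells of similar volume can be sketched as a brute-force search. This is a minimal illustration only, not the cctbx implementation: the function names, the restriction to matrix elements in {-1, 0, 1}, and the tolerance are all assumptions made for the example.

```python
import itertools
import math


def metric_tensor(a, b, c, alpha, beta, gamma):
    """Metric tensor G of a unit cell (lengths in Angstrom, angles in degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return (
        (a * a, a * b * cg, a * c * cb),
        (a * b * cg, b * b, b * c * ca),
        (a * c * cb, b * c * ca, c * c),
    )


def det3(m):
    """Determinant of a 3x3 matrix given as nested sequences."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def transform(g, m):
    """Return M G M^T, the metric tensor after the change of basis M."""
    mg = [[sum(m[i][k] * g[k][j] for k in range(3)) for j in range(3)]
          for i in range(3)]
    return [[sum(mg[i][k] * m[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)]


def find_unimodular_matches(cell_a, cell_b, tol=1e-3):
    """List integer matrices (elements -1, 0, 1; det = +1) mapping cell_a onto cell_b.

    tol is a relative tolerance on metric tensor elements (an assumption of
    this sketch; real data would need a more careful criterion).
    """
    ga, gb = metric_tensor(*cell_a), metric_tensor(*cell_b)
    scale = max(gb[i][i] for i in range(3))
    matches = []
    for flat in itertools.product((-1, 0, 1), repeat=9):
        m = (flat[0:3], flat[3:6], flat[6:9])
        if det3(m) != 1:  # keep only handedness-preserving unimodular matrices
            continue
        gt = transform(ga, m)
        if all(abs(gt[i][j] - gb[i][j]) <= tol * scale
               for i in range(3) for j in range(3)):
            matches.append(m)
    return matches


# Two hypothetical orthorhombic cells that differ only in axis order.
matches = find_unimodular_matches((10, 20, 30, 90, 90, 90),
                                  (20, 10, 30, 90, 90, 90))
```

Comparing metric tensors rather than cell parameters avoids having to recompute angles for every candidate transform. A production search would use a reduced cell as the starting point and a wider range of matrix elements.
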
Exploring metric symmetry
Sub-lattice?
  –Given all lattice points, ignore some of them while ensuring that the remaining lattice points form a regular lattice

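One common way to enumerate the sub-lattices of a given index N is to list all 3x3 integer matrices in Hermite normal form with determinant N; each such matrix, applied to the original basis vectors, gives a distinct sub-lattice. The sketch below assumes that approach (the source does not say how the enumeration is implemented), and the function name is hypothetical.

```python
from itertools import product


def sublattice_matrices(index):
    """Yield lower-triangular Hermite normal form matrices with determinant `index`.

    Rows are the new basis vectors expressed in the original basis. Each
    off-diagonal element is reduced modulo the diagonal element of its column,
    so every sub-lattice of the given index appears exactly once.
    """
    for a in range(1, index + 1):
        if index % a:
            continue
        rest = index // a
        for b in range(1, rest + 1):
            if rest % b:
                continue
            c = rest // b
            for d, e in product(range(a), repeat=2):
                for f in range(b):
                    yield ((a, 0, 0), (d, b, 0), (e, f, c))


# Number of sub-lattices for a few small indices.
counts = {n: sum(1 for _ in sublattice_matrices(n)) for n in (2, 3, 4)}
```

For N = 2, 3 and 4 this gives 7, 13 and 35 matrices. These counts ignore the symmetry of the parent lattice; sub-lattices that are equivalent under that symmetry would still need to be merged to obtain the unique unit cells.
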
Exploring metric symmetry
Examples
Native : P
SeMet1 : P
SeMet2 : C
Poulsen, et al, (2001). Acta Cryst. D57,

Exploring metric symmetry
Future
  –Provide reindexing methods between related unit cells
    Would make molecular replacement of related structures easier
    Useful for multi-crystal averaging
  –Obtain non-merohedral twin laws from this analysis

Outlier detection
Outliers can have a detrimental effect on the progress of structure solution and refinement
  –Read, Acta Cryst. (1999). D55,
The detection of outliers should be based on all available information
  –Use model info if you can
One would like the flexibility to correct mistakes made earlier
  –Those reflections with E-values larger than 5 could have been valid observations!

Outlier detection
What is an outlier?
  –A data point that does not fit a model because of an abnormal situation such as an erroneous measurement
How to spot them?
  –If Fobs cannot be reconciled with Fcalc, Fobs might be an outlier
Reconcilable?
  –Fobs should be explainable from Fcalc and the current quality of the model (σA)

Outlier detection
Model-based outlier detection is done in a similar way to the method described by Read (Acta Cryst. (1999). D55, )
  –Fobs and Fcalc are normalized to get Eobs & Ecalc
  –σA is estimated for each reflection
    Combining standard likelihood techniques with kernel methods to obtain smoothly varying estimates
  –Find :
  –Compute :

Outlier detection
Q is approximately χ² distributed
Acceptable values of Q are determined by the size of the dataset
  –If the dataset is large, large deviations are expected
A p-value is computed for each reflection
  –The p-value is the probability that, if this particular Q value were the largest in the dataset, a Q value equal to or larger than it would be observed by chance
Observations for which the p-value is smaller than 5% are considered outliers

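The dependence on dataset size can be illustrated with a standard extreme-value argument: if a single reflection has tail probability p of reaching a given Q by chance, the probability that the largest of N independent values reaches it is 1 - (1 - p)^N. The sketch below assumes this is the form of the p-value described above. The function names and the numbers in the example are hypothetical.

```python
import math


def extreme_value_p(p_single, n_obs):
    """Probability that the maximum of n_obs independent draws is at least as
    extreme as a value whose single-observation tail probability is p_single.

    Computes 1 - (1 - p_single)**n_obs via log1p/expm1, which stays accurate
    when p_single is very small.
    """
    if not 0.0 <= p_single < 1.0:
        raise ValueError("p_single must lie in [0, 1)")
    return -math.expm1(n_obs * math.log1p(-p_single))


def is_outlier(p_single, n_obs, level=0.05):
    """Flag an observation when its extreme-value p-value falls below `level`."""
    return extreme_value_p(p_single, n_obs) < level


# Hypothetical numbers: a one-in-a-million deviation in datasets of two sizes.
p_small_dataset = extreme_value_p(1e-6, 10_000)
p_large_dataset = extreme_value_p(1e-6, 1_000_000)
```

With these illustrative numbers the same deviation is flagged in a dataset of 10,000 reflections but not in one of 1,000,000, which is the sense in which large datasets are expected to contain large deviations.
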
Outlier detection
Example: 1ty3
Wilson statistics indicate 1 outlier: (25,6,-43)
  Eobs =
  centric = True
  p-wilson = 1.83E-07
  p-extreme = 9.0E-03
Model-based outlier detection indicates that (25,6,-43) is a valid observation

Outlier detection
The outlier detection algorithm is embedded in a class that caches the original observed data
This will allow one to perform outlier detection during different macro-cycles/rebuilding states and to update the outlier selection accordingly
Will be incorporated in phenix.refine at the appropriate juncture
  –Command line tool available

Twinning progress report
Routines available
  –Least squares target functions
    Both intensity and amplitude
    Target values and first derivatives
  –Detwinning
    Standard and à la Sheldrick
  –R-values
  –Map coefficients
    2mFo-DFc & gradient maps
  –Bulk solvent scaling
    Estimation of twin fraction, k_sol, B_sol, U* and overall scale on twinned data
  –Using global optimizer (differential evolution) for the moment

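For reference, "standard" detwinning is usually understood as the algebraic inversion of the two-domain twinning equations J1 = (1 - α) I1 + α I2 and J2 = α I1 + (1 - α) I2, where J are the observed (twinned) intensities of a twin-related pair and α is the twin fraction. The sketch below assumes that meaning. It is not the mmtbx routine, and the function names are hypothetical.

```python
def twin(i1, i2, alpha):
    """Apply hemihedral twinning with twin fraction alpha to a twin-related pair."""
    j1 = (1.0 - alpha) * i1 + alpha * i2
    j2 = alpha * i1 + (1.0 - alpha) * i2
    return j1, j2


def detwin(j1, j2, alpha):
    """Algebraically recover untwinned intensities from a twinned pair.

    The inversion divides by (1 - 2*alpha), so it is undefined at alpha = 0.5
    and amplifies measurement errors as alpha approaches 0.5.
    """
    if not 0.0 <= alpha < 0.5:
        raise ValueError("alpha must lie in [0, 0.5)")
    denom = 1.0 - 2.0 * alpha
    i1 = ((1.0 - alpha) * j1 - alpha * j2) / denom
    i2 = ((1.0 - alpha) * j2 - alpha * j1) / denom
    return i1, i2


# Round trip on a hypothetical pair of intensities.
j1, j2 = twin(100.0, 40.0, 0.3)
i1, i2 = detwin(j1, j2, 0.3)
```

The division by (1 - 2α) is the practical weakness of this approach: at a twin fraction of 0.47 the denominator is 0.06, so errors in the observed intensities are magnified roughly 17-fold.
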
Twinning progress report
Bulk solvent scaling and detwinned map generation are available as a command line tool
  mmtbx.twin_map_utils
Results are similar to CNS
mmtbx.twin_map_utils should be seen as the first step towards full integration of twin utilities in phenix.refine

Twinning progress report
[Figure: comparison of results from mmtbx.twin_map_utils and CNS]

Twinning progress report
1eyx: twin fraction = 0.47; difference maps at 2.5 sigma
Ligands and waters deleted (10% of total model)
[Figure panels: twinning not taken into account; twinning taken into account]

Twinning progress report
Difference in 2mFo-DFc density is less striking
[Figure panels: twinning not taken into account; twinning taken into account]

Twinning progress report
Future plans
  –Likelihood-based map coefficients in collaboration with Randy Read
  –Incorporation of least squares targets in phenix.refine
  –Likelihood-based targets in collaboration with Randy Read

Acknowledgements
Paul Adams, Ralf Grosse-Kunstleve, Pavel Afonine, Nigel Moriarty, Nick Sauter, Michael Hohn
Cambridge: Randy Read, Airlie McCoy
Los Alamos: Tom Terwilliger, Li Wei Hung
Texas A&M University: Jim Sacchettini, Tom Ioerger, Eric McKee
Duke University: Jane Richardson, David Richardson
PHENIX Industrial Consortium: Robert Nolte, Eric Vogan
Funding:
  –LBNL (DE-AC03-76SF00098)
  –NIH/NIGMS (P01GM063210)
  –PHENIX Industrial Consortium

Kernel methods
Discrete binning of X-ray data introduces discontinuous jumps in properties that vary continuously
  –Mean intensity (normalisation)
  –The estimation of σA
Possible remedies:
  –Spline functions
    Used extensively by K. Cowtan
  –Kernel methods

Kernel methods
Discrete binning assumes a constant value within each bin range

Kernel methods
With kernel methods, the estimate at each position is based on the full dataset
  –The amount that each datum contributes is determined by a weighting function (usually depending on the squared distance)

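The weighting idea can be sketched as a kernel-weighted mean. The example assumes a Gaussian weighting function of the squared distance and a fixed bandwidth; the slides do not specify the kernel actually used, and the function name and data are hypothetical.

```python
import math


def kernel_mean(x_query, xs, ys, bandwidth):
    """Kernel-weighted mean of ys at position x_query.

    Every datum contributes, with a Gaussian weight that decays with the
    squared distance between its position and x_query.
    """
    weights = [math.exp(-0.5 * ((x - x_query) / bandwidth) ** 2) for x in xs]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, ys)) / total


# Hypothetical data: a quantity sampled at a few resolution-like positions.
positions = [0.0, 1.0, 2.0]
values = [1.0, 2.0, 3.0]
estimate = kernel_mean(1.0, positions, values, bandwidth=0.5)
```

Because the weights change smoothly with the query position, the estimate varies continuously, in contrast to the step function produced by discrete bins. The bandwidth plays the role of the bin width and controls how much smoothing is applied.
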
Kernel methods
Kernel method available for normalisation
  –Used by xtriage in intensity statistics
Kernel method available for σA estimation
  –Used in the outlier detection

Kernel methods
Determining alpha from σA estimated with kernel methods gives values similar to those obtained with what is available in phenix.refine
Similar results for beta