Computational Crystallography Initiative, Physical Biosciences Division. Exploring Symmetry, Outlier Detection & Twinning update. Peter Zwart.

Presentation transcript:


Overview
Exploring metric symmetry
  – iotbx.explore_metric_symmetry
Outlier detection
  – mmtbx.remove_outliers
Twinning
  – mmtbx.twin_map_utils
  – Actually: cctbx.python $MMTBX_DIST/mmtbx/twinning/twin_map_utils.py

Exploring metric symmetry
Protein crystals grown under various conditions can sometimes exhibit drastic changes in symmetry and unit cell dimensions
Sometimes, the crystal symmetries are related
  – The relation is not always obvious
  – Finding the relation between two unit cells is not always straightforward
Knowing the relations between the different crystal forms can be helpful during structure solution

Exploring metric symmetry
How to find relations between unit cells?
  – A sub-lattice formalism allows one to generate a family of related lattices from a given lattice
      The number of unique unit cells that are N times larger than the original unit cell is quite small
      Rutherford, Acta Cryst. (2006). A62
  – Unit cells of approximately equal volume can be compared to each other by checking a large number of unimodular transforms
      Ralf's work
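The unimodular comparison mentioned above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the cctbx implementation: the function names, the restriction to matrix entries in {-1, 0, 1}, the choice of determinant +1 only (to keep handedness), and the crude absolute tolerance are all assumptions made for this example.

```python
import itertools
import math


def metric_tensor(a, b, c, alpha, beta, gamma):
    """Metric tensor G of a unit cell (lengths in Angstrom, angles in degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return [
        [a * a, a * b * cg, a * c * cb],
        [a * b * cg, b * b, b * c * ca],
        [a * c * cb, b * c * ca, c * c],
    ]


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def transform_metric(m, g):
    """Return M G M^T, the metric tensor of the basis a'_i = sum_j M[i][j] a_j."""
    mg = [[sum(m[i][k] * g[k][j] for k in range(3)) for j in range(3)]
          for i in range(3)]
    return [[sum(mg[i][k] * m[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)]


def find_unimodular_match(cell_1, cell_2, rel_tol=0.02):
    """Search small integer matrices with determinant +1 that map cell_1 onto cell_2.

    Returns the first matching matrix, or None if nothing matches.
    The tolerance is absolute, scaled by the largest diagonal element of the target.
    """
    g1 = metric_tensor(*cell_1)
    g2 = metric_tensor(*cell_2)
    tol = rel_tol * max(g2[i][i] for i in range(3))
    for entries in itertools.product((-1, 0, 1), repeat=9):
        m = [list(entries[0:3]), list(entries[3:6]), list(entries[6:9])]
        if det3(m) != 1:
            continue
        g = transform_metric(m, g1)
        if all(abs(g[i][j] - g2[i][j]) <= tol
               for i in range(3) for j in range(3)):
            return m
    return None
```

Calling `find_unimodular_match((10, 20, 30, 90, 90, 90), (20, 10, 30, 90, 90, 90))` returns a matrix that exchanges the first two axes with a sign change, while two cells with no such relation give `None`.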

Exploring metric symmetry
Sub-lattice?
  – Given all lattice points, ignore some of them while ensuring that the remaining lattice points form a regular lattice
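One common way to enumerate sub-lattices is through integer matrices in Hermite normal form; the sketch below assumes that approach and is not taken from the slides or from iotbx. Each matrix row expresses a new basis vector in terms of the original ones, and the determinant equals the volume multiplier N. The count shown is before any reduction by lattice symmetry, which would shrink it further.

```python
def sublattice_matrices(index):
    """Yield one lower-triangular Hermite-normal-form matrix per sub-lattice.

    Rows are the new basis vectors expressed in the original basis.
    Off-diagonal entries are reduced modulo the diagonal entry of their column,
    so every sub-lattice of the given index appears exactly once.
    """
    for a in range(1, index + 1):
        if index % a:
            continue
        rest = index // a
        for b in range(1, rest + 1):
            if rest % b:
                continue
            c = rest // b
            for d in range(a):          # entry below a, reduced modulo a
                for e in range(a):      # entry below a, reduced modulo a
                    for f in range(b):  # entry below b, reduced modulo b
                        yield ((a, 0, 0), (d, b, 0), (e, f, c))


def count_sublattices(index):
    """Number of distinct sub-lattices whose cell is `index` times larger."""
    return sum(1 for _ in sublattice_matrices(index))
```

For N = 2, 3 and 4 this gives 7, 13 and 35 candidate cells, which supports the point that the family of related lattices stays small for small N.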

Exploring metric symmetry
Examples
  Native : P
  SeMet1 : P
  SeMet2 : C
Poulsen, et al, (2001). Acta Cryst. D57

Exploring metric symmetry
Future
  – Provide reindexing methods between related unit cells
      Would make molecular replacement of related structures easier
      Useful for multi-crystal averaging
  – Obtain non-merohedral twin laws from this analysis

Outlier detection
Outliers can have a detrimental effect on the progress of structure solution and refinement
  – Read, Acta Cryst. (1999). D55
The detection of outliers should be performed on the basis of all information available
  – Use model info if you can
One would like to have the flexibility of correcting for mistakes made earlier
  – Those reflections with E-values larger than 5 could have been valid observations!

Outlier detection
What is an outlier?
  – A data point that does not fit a model because of an abnormal situation such as an erroneous measurement
How to spot them?
  – If Fobs is not reconcilable with Fcalc, Fobs might be an outlier
Reconcilable?
  – Fobs should be explainable from Fcalc and the current quality of the model (σA)

Outlier detection
Model-based outlier detection is done in a similar way to the method described by Read (Acta Cryst. (1999). D55)
  – Fobs and Fcalc are normalized to get Eobs & Ecalc
  – σA is estimated for each reflection
      Combining standard likelihood techniques with kernel methods to obtain smoothly varying estimates
  – Find:
  – Compute:
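The expressions that followed "Find" and "Compute" on this slide are not part of the transcript. For orientation only, the standard σA-based conditional distributions of the normalized observed amplitude given the calculated one, as commonly used in likelihood-based crystallography, are written out below. That these are the starting point of the quantities on the slide is an assumption; E_o and E_c stand for Eobs and Ecalc, and I_0 is the modified Bessel function of order zero.

```latex
% Acentric reflections
p(E_o \mid E_c) = \frac{2E_o}{1-\sigma_A^2}
  \exp\!\left(-\frac{E_o^2 + \sigma_A^2 E_c^2}{1-\sigma_A^2}\right)
  I_0\!\left(\frac{2\sigma_A E_o E_c}{1-\sigma_A^2}\right)

% Centric reflections
p(E_o \mid E_c) = \sqrt{\frac{2}{\pi\,(1-\sigma_A^2)}}
  \exp\!\left(-\frac{E_o^2 + \sigma_A^2 E_c^2}{2\,(1-\sigma_A^2)}\right)
  \cosh\!\left(\frac{\sigma_A E_o E_c}{1-\sigma_A^2}\right)
```

An observation is hard to reconcile with the model when it falls far into the tail of the relevant distribution for its own Ecalc and σA.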

Outlier detection
Q is approximately χ² distributed
Acceptable values of Q are determined by the size of the dataset
  – If the dataset is large, large deviations are expected
A p-value is computed for each reflection
  – The p-value is the probability that, if this particular Q-value were the largest in the dataset, a Q-value of equal or larger size would be observed by chance
Observations for which the p-value is smaller than 5% are considered outliers
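The dataset-size dependence described above can be sketched with an extreme-value argument: if the N values of Q are treated as independent draws from a χ² distribution, the chance that the largest of them reaches q is 1 - F(q)^N. The number of degrees of freedom is not stated on the slide, so the sketch below takes it as a parameter (1 or 2 only); the function names and the independence assumption are illustrative.

```python
import math


def chi2_tail(q, dof):
    """Upper tail probability P(X >= q) of a chi-square variable (dof 1 or 2 only)."""
    if q <= 0.0:
        return 1.0
    if dof == 1:
        return math.erfc(math.sqrt(q / 2.0))
    if dof == 2:
        return math.exp(-q / 2.0)
    raise ValueError("only 1 or 2 degrees of freedom are handled in this sketch")


def extreme_p_value(q, n_reflections, dof=2):
    """Probability that the largest of n independent chi-square values is >= q."""
    tail = chi2_tail(q, dof)
    if tail >= 1.0:
        return 1.0
    # 1 - (1 - tail)**n, evaluated in a way that stays accurate for tiny tails
    return -math.expm1(n_reflections * math.log1p(-tail))


def is_outlier(q, n_reflections, dof=2, threshold=0.05):
    """Flag a reflection when its extreme-value p-value drops below the threshold."""
    return extreme_p_value(q, n_reflections, dof) < threshold
```

With this form the same Q-value is flagged in a small dataset but accepted in a large one, which is the behaviour the slide describes.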

Outlier detection
Example: 1ty3
Wilson statistics indicate 1 outlier
  (25,6,-43) Eobs = centric = True p-wilson = 1.83E-07 p-extreme = 9.0E-03
Model-based outlier detection indicates that (25,6,-43) is a valid observation
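For comparison with the model-based test, the two Wilson-based numbers in this example can be understood from the tail of the Wilson distribution of normalized amplitudes (mean E² of 1): exp(-E²) for acentric and erfc(E/√2) for centric reflections. The sketch below assumes that p-wilson is this single-reflection tail and that p-extreme is its extreme-value counterpart over N reflections; the function names are hypothetical and the missing Eobs value is not reconstructed here.

```python
import math


def wilson_tail(e_obs, centric):
    """P(|E| >= e_obs) under Wilson statistics for normalized amplitudes."""
    if e_obs <= 0.0:
        return 1.0
    if centric:
        return math.erfc(e_obs / math.sqrt(2.0))
    return math.exp(-e_obs * e_obs)


def extreme_from_single(p_single, n_reflections):
    """Probability that at least one of n independent reflections is this extreme."""
    if p_single >= 1.0:
        return 1.0
    return -math.expm1(n_reflections * math.log1p(-p_single))
```

A large centric E-value is far less surprising than an acentric one of the same size, which is one reason the reflection class is reported alongside the p-values.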

Outlier detection
The outlier detection algorithm is embedded in a class that caches the original observed data
This will allow one to perform outlier detection during different macro-cycles/rebuilding states and to update the outlier selection each time
Will be incorporated in phenix.refine at the appropriate juncture
  – Command line tool available
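The design idea of caching the original observations can be illustrated with a small stand-alone class. Everything here is hypothetical (class name, method name, the dictionary layout keyed by Miller index) and none of it is the mmtbx interface; the point is only that each pass starts again from the untouched data, so a reflection rejected earlier can be reinstated once the model improves.

```python
class CachedOutlierFilter:
    """Hypothetical sketch: keep the original data and re-select on every pass."""

    def __init__(self, observations):
        # Private copy of the original data (Miller index -> value); never modified.
        self._original = dict(observations)

    def select(self, is_outlier):
        """Split the ORIGINAL data into (kept, rejected) using the current test.

        `is_outlier` is any callable taking (hkl, value) and returning a bool,
        for example a test based on the model of the current macro-cycle.
        """
        kept, rejected = {}, {}
        for hkl, value in self._original.items():
            target = rejected if is_outlier(hkl, value) else kept
            target[hkl] = value
        return kept, rejected
```

A first pass with a plain amplitude cutoff and a later pass with a model-based test can then give different selections from the same cached data.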

Twinning progress report
Routines available
  – Least squares target functions
      Both intensity and amplitude
      Target values and first derivatives
  – Detwinning
      Standard and à la Sheldrick
  – R-values
  – Map coefficients
      2mFo-DFc & gradient maps
  – Bulk solvent scaling
      Estimation of twin fraction, k_sol, B_sol, U* and overall scale on twinned data
  – Using global optimizer (differential evolution) for the moment
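As a reminder of what detwinning involves, the textbook algebraic inversion for a hemihedral twin with twin fraction α is sketched below. Whether this corresponds to the "standard" routine listed above is an assumption, and the function names are illustrative rather than the mmtbx ones.

```python
def twin_pair(i_1, i_2, twin_fraction):
    """Observed intensities of two twin-related reflections for a hemihedral twin."""
    a = twin_fraction
    return (1.0 - a) * i_1 + a * i_2, a * i_1 + (1.0 - a) * i_2


def detwin_pair(i_obs_1, i_obs_2, twin_fraction):
    """Invert the twinning equations algebraically.

    The division by (1 - 2 * twin_fraction) amplifies measurement errors and
    is undefined for a perfect twin (twin_fraction = 0.5).
    """
    a = twin_fraction
    if not 0.0 <= a < 0.5:
        raise ValueError("twin fraction must be in [0, 0.5)")
    denom = 1.0 - 2.0 * a
    i_1 = ((1.0 - a) * i_obs_1 - a * i_obs_2) / denom
    i_2 = ((1.0 - a) * i_obs_2 - a * i_obs_1) / denom
    return i_1, i_2
```

For the 1eyx example later in the talk (twin fraction 0.47) the denominator is only 0.06, which shows why purely algebraic detwinning becomes unreliable close to perfect twinning.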

Twinning progress report
Bulk solvent scaling and detwinned map generation available as a command line tool
  – mmtbx.twin_map_utils
Results similar to CNS
mmtbx.twin_map_utils should be seen as the first step to full integration of twin utilities in phenix.refine

Twinning progress report
[Figure panels: mmtbx.twin_map_utils | CNS]

Twinning progress report
1eyx: twin fraction = 0.47; difference maps at 2.5 sigma
Ligands and waters deleted (10% of total model)
[Figure panels: Twinning not taken into account | Twinning taken into account]

Twinning progress report
Difference in 2mFo-DFc density is less striking
[Figure panels: Twinning not taken into account | Twinning taken into account]

Twinning progress report
Future plans
  – Likelihood-based map coefficients in collaboration with Randy Read
  – Incorporation of least squares targets in phenix.refine
  – Likelihood-based targets in collaboration with Randy Read

Acknowledgements
Paul Adams, Ralf Grosse-Kunstleve, Pavel Afonine, Nigel Moriarty, Nick Sauter, Michael Hohn
Cambridge: Randy Read, Airlie McCoy
Los Alamos: Tom Terwilliger, Li Wei Hung
Texas A&M University: Jim Sacchettini, Tom Ioerger, Eric McKee
Duke University: Jane Richardson, David Richardson
PHENIX Industrial Consortium: Robert Nolte, Eric Vogan
Funding:
  – LBNL (DE-AC03-76SF00098)
  – NIH/NIGMS (P01GM063210)
  – PHENIX Industrial Consortium

Kernel methods
Discrete binning of X-ray data introduces discontinuous jumps in properties that vary continuously
  – Mean intensity (normalisation)
  – The estimation of σA
Possible remedies:
  – Spline functions
      Used extensively by K. Cowtan
  – Kernel methods

Kernel methods
Discrete binning assumes a constant value in a certain range

Kernel methods
With kernel methods, the estimate at each position is based on the full dataset
  – The amount that each datum contributes is determined by a weighting function (usually depending on the squared distance)
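The contrast between binning and kernel weighting can be shown with a small sketch. It assumes a Gaussian weight in the squared distance and a plain weighted mean (a Nadaraya-Watson type estimate); the kernel shape, the width parameter and the function names are choices made for this example, not a description of the xtriage code.

```python
import math


def binned_estimate(x0, xs, ys, bin_width):
    """Mean of the y-values that fall in the same fixed-width bin as x0."""
    target = math.floor(x0 / bin_width)
    members = [y for x, y in zip(xs, ys) if math.floor(x / bin_width) == target]
    if not members:
        raise ValueError("empty bin")
    return sum(members) / len(members)


def kernel_estimate(x0, xs, ys, width):
    """Weighted mean of all y-values; weights fall off with squared distance to x0."""
    weights = [math.exp(-0.5 * ((x - x0) / width) ** 2) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("x0 is too far from all data points for this width")
    return sum(w * y for w, y in zip(weights, ys)) / total
```

Evaluated just below and just above a bin boundary, the binned estimate jumps from one bin mean to the next, while the kernel estimate changes only slightly.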

Kernel methods
Kernel method available for normalisation
  – Used by xtriage in intensity statistics
Kernel method available for σA estimation
  – Used in the outlier detection

Kernel methods
Determining alpha from σA estimated with kernel methods gives values similar to those obtained with the method available in phenix.refine
Similar results for beta