Computational Crystallography Initiative, Physical Biosciences Division
First Aid & Pathology: Data quality assessment in PHENIX
Peter Zwart

Presentation transcript:


Introduction
PHENIX:
–Software for bio-molecular crystallography
  Molecular replacement (PHASER)
  Substructure solution (SOLVE, HYSS)
  Phasing (SOLVE; PHASER)
  Model building (RESOLVE)
  Refinement (phenix.refine)
  Ligand building (ELBOW and RESOLVE)

Introduction
[Figure: GUI snapshots]

Introduction
Structure solution can be enhanced by knowledge of the quality of the merged data
–Presence or absence of anomalous signal
–Completeness
–Twinning
–Anisotropy
–Pseudo centering
–…
Adapt the structure solution/refinement strategy, or even recollect data

Likelihood based Wilson Scaling
Both the Wilson B and the nominal resolution determine the 'looks' of the map.
Zwart & Lamzin (2003). Acta Cryst. D50
[Figure: two map panels. Left: B_wil = 50 Å², d_min = 2 Å. Right: B_wil = 9 Å², d_min = 2 Å]

Likelihood based Wilson Scaling
Data can be anisotropic
Traditional 'straight line fitting' is not reliable at low resolution
Solution: Likelihood based Wilson scaling
–Similar to maximum likelihood refinement, but without knowledge of the positional parameters
–Results in an estimate of the anisotropic overall B value.
Zwart, Grosse-Kunstleve & Adams, CCP4 Newsletter, 2005.

Likelihood based Wilson Scaling
Likelihood based scaling is not very sensitive to the resolution cut-off, whereas classic straight line fitting is.
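For reference, the classic straight-line Wilson fit that the slide contrasts with the likelihood approach can be sketched as below. This is not the likelihood-based method from the talk: it is the textbook fit of ln(<I_obs>/Σf²) against s² = 1/(4d²), whose slope is -2B. The function name, the shell values and the synthetic data are illustrative assumptions.

```python
import math

def wilson_line_fit(d_spacings, mean_ratio):
    """Classic Wilson plot: fit ln(<I_obs> / sum f_j^2) = ln k - 2 B s^2,
    with s^2 = (sin(theta)/lambda)^2 = 1 / (4 d^2).

    d_spacings : mid-point resolution of each shell, in Angstrom
    mean_ratio : <I_obs> / sum f_j^2 for each shell
    Returns (k, B): overall scale and isotropic Wilson B in Angstrom^2.
    """
    x = [1.0 / (4.0 * d * d) for d in d_spacings]
    y = [math.log(r) for r in mean_ratio]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope / 2.0

# Synthetic, noise-free shells generated from a known scale (2.0) and B (30 A^2).
shells = [4.0, 3.5, 3.0, 2.5, 2.0]
ratios = [2.0 * math.exp(-2.0 * 30.0 / (4.0 * d * d)) for d in shells]
k, b = wilson_line_fit(shells, ratios)
print(f"k = {k:.3f}, B = {b:.2f} A^2")
```

On real data the low-resolution shells deviate from the straight line, so the fitted B depends on which shells are included. That dependence is the sensitivity to the resolution cut-off mentioned on the slide.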

Likelihood based Wilson Scaling
Anisotropy is easily detected and can be 'corrected' for.
–Useful for molecular replacement and possibly for substructure solution
Anisotropy correction cleans up your N(Z) plots

Likelihood based Wilson Scaling
Useful by-products
–For the ML Wilson scaling an 'expected Wilson plot' is needed
  Using the correction term formalism, Zwart & Lamzin (2004) Acta Cryst. D60
–Obtained from over 2000 high quality experimental datasets
–'Expected intensity' and its standard deviation obtained

Likelihood based Wilson Scaling
Resolution dependent problems can be easily/automatically spotted
–Ice rings
Empirical Wilson plots are available for protein and DNA/RNA.
[Figure: Wilson plot; data is from a DNA structure]
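One way such resolution-dependent problems could be flagged is to compare the observed mean intensity in each shell with the expected value and its standard deviation. The sketch below is an assumption about how that comparison might look, not the procedure used in PHENIX. The function name, the cut-off of 4 and all numbers are made up for illustration.

```python
def flag_shells(d_mid, observed, expected, sigma, z_cut=4.0):
    """Return (resolution, z) for shells whose observed mean intensity
    deviates from the expected value by more than z_cut standard deviations.
    All inputs are per-shell lists of equal length."""
    flagged = []
    for d, obs, exp_val, sig in zip(d_mid, observed, expected, sigma):
        z = (obs - exp_val) / sig
        if abs(z) > z_cut:
            flagged.append((d, round(z, 2)))
    return flagged

# Toy data: normalised shell intensities with one inflated shell near 3.67 A.
d_mid = [4.2, 3.9, 3.67, 3.4, 3.1]
observed = [1.05, 0.95, 1.9, 1.1, 0.9]
expected = [1.0, 1.0, 1.0, 1.0, 1.0]
sigma = [0.1, 0.1, 0.1, 0.1, 0.1]
print(flag_shells(d_mid, observed, expected, sigma))
```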

Pseudo Translational Symmetry
Can cause problems in refinement and MR
–Incorrect likelihood function due to effects of extra translational symmetry on intensity
Can cause problems or be helpful during MR
–Effective ASU is smaller if T-NCS info is used.
The presence of pseudo centering can be detected from an analysis of the Patterson map.
–An F_obs Patterson with truncated resolution should reveal a significant off-origin peak.

Pseudo Translational Symmetry
A database analysis reveals that the heights of the largest off-origin peaks in truncated X-ray data sets are distributed according to:
[Plot: F(Q_max) against relative peak height Q_max]

Pseudo Translational Symmetry
1-F(Q_max): the probability that the largest off-origin peak in your Patterson map is not due to translational NCS
This is a so-called p-value
If a significance level of 0.01 is set, all off-origin Patterson vectors larger than 20% of the height of the origin are suspected T-NCS vectors.

PDB ID   Height (%)   P-value (%)
1sct     77           9*
ihr      45           1*
c8u      20           1
1ee2     10           5

Twinning
Merohedral twinning can occur when the lattice has a higher symmetry than the intensities.
When twinning does occur, the recorded intensities are the sum of two independent intensities.
–Normal Wilson statistics break down
Detect twinning using intensity statistics

Twinning
Cumulative intensity distribution can be used to identify twinning (acentric data)
[Plot: N(Z) against Z, with curves for pseudo centering, normal data and a perfect twin]
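The theoretical curves for acentric data can be sketched as follows: N(Z) = 1 - exp(-Z) for untwinned data and N(Z) = 1 - (1 + 2Z) exp(-2Z) for a perfect twin. The function names and the simulated intensities (exponentially distributed, as Wilson statistics predict for acentric reflections) are illustrative assumptions. The pseudo-centering curve is not modelled here.

```python
import math
import random

def n_z_untwinned(z):
    """Theoretical cumulative distribution of Z = I/<I>, acentric, untwinned."""
    return 1.0 - math.exp(-z)

def n_z_perfect_twin(z):
    """Theoretical cumulative distribution of Z = I/<I>, acentric, perfect twin."""
    return 1.0 - (1.0 + 2.0 * z) * math.exp(-2.0 * z)

def empirical_n_z(intensities, z):
    """Fraction of reflections with I/<I> <= z."""
    mean_i = sum(intensities) / len(intensities)
    below = sum(1 for i in intensities if i / mean_i <= z)
    return below / len(intensities)

# Simulated acentric intensities; a perfect twin averages two independent ones.
rng = random.Random(0)
single = [rng.expovariate(1.0) for _ in range(20000)]
twinned = [0.5 * (rng.expovariate(1.0) + rng.expovariate(1.0)) for _ in range(20000)]

for z in (0.2, 0.5, 1.0):
    print(z,
          round(empirical_n_z(single, z), 3), round(n_z_untwinned(z), 3),
          round(empirical_n_z(twinned, z), 3), round(n_z_perfect_twin(z), 3))
```

A twinned data set has fewer very weak reflections than an untwinned one, so its N(Z) curve starts off below the untwinned curve and takes on a sigmoidal shape.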

Twinning
Pseudo centering + twinning = N(Z) looks normal
Anisotropy in diffraction data produces a similar trend to pseudo centering
–Anisotropy can however be removed
How to detect twinning in the presence of T-NCS?
–Partition Miller indices on the basis of detected T-NCS vectors
  Intensities of subgroups follow normal Wilson statistics (approximately)
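The slide does not say how the partitioning is done. One plausible sketch, assuming two copies related by a fractional translation t, uses the fact that their combined intensity is modulated by 2 + 2cos(2π h·t). Reflections can then be split by the sign of the cosine into a systematically enhanced and a systematically weakened class. The function name and the example vector are hypothetical.

```python
import math

def partition_by_tncs(miller_indices, tncs_vector):
    """Split reflections into 'strong' and 'weak' classes according to the
    sign of cos(2*pi*h.t), where t is a fractional T-NCS translation.
    Assumes a single translation relating two copies."""
    tx, ty, tz = tncs_vector
    strong, weak = [], []
    for h, k, l in miller_indices:
        phase = 2.0 * math.pi * (h * tx + k * ty + l * tz)
        if math.cos(phase) >= 0.0:
            strong.append((h, k, l))
        else:
            weak.append((h, k, l))
    return strong, weak

# Example: a pseudo-translation of (1/2, 0, 0) separates h even from h odd.
indices = [(0, 1, 2), (1, 0, 0), (2, 3, 1), (3, 1, 1)]
strong, weak = partition_by_tncs(indices, (0.5, 0.0, 0.0))
print("strong:", strong)
print("weak:  ", weak)
```

Intensity statistics such as N(Z) can then be evaluated within each class separately.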

Twinning
[Equations not recoverable from the transcript]

Twinning
A database analysis of high-quality, untwinned datasets reveals that the values of the first and second moments of L follow a narrow distribution
This distribution can be used to determine a multivariate Z-score
–Large values indicate twinning
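A minimal sketch of this idea, assuming L is the local intensity difference L = (I1 - I2)/(I1 + I2) of Padilla & Yeates, for which theory gives <|L|> = 1/2, <L²> = 1/3 for untwinned acentric data and 3/8, 1/5 for a perfect twin. The multivariate Z-score is computed here as a Mahalanobis distance, which is an assumption about its form. The reference covariance is hypothetical, because the database values are not given in the slides.

```python
import math
import random

def l_moments(pairs):
    """Return (<|L|>, <L^2>) for pairs of intensities (I1, I2)."""
    values = [(a - b) / (a + b) for a, b in pairs if a + b > 0]
    n = len(values)
    return sum(abs(v) for v in values) / n, sum(v * v for v in values) / n

def multivariate_z(observed, ref_mean, ref_cov):
    """Mahalanobis distance of the observed moment pair from a 2D reference."""
    dx = observed[0] - ref_mean[0]
    dy = observed[1] - ref_mean[1]
    (a, b), (c, d) = ref_cov
    det = a * d - b * c
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(max(q, 0.0))

# Reference: theoretical untwinned means; the covariance is HYPOTHETICAL.
REF_MEAN = (0.5, 1.0 / 3.0)
REF_COV = ((4e-5, 3e-5), (3e-5, 4e-5))

rng = random.Random(0)
untwinned = [(rng.expovariate(1.0), rng.expovariate(1.0)) for _ in range(20000)]
twinned = [(rng.expovariate(1.0) + rng.expovariate(1.0),
            rng.expovariate(1.0) + rng.expovariate(1.0)) for _ in range(20000)]

for name, data in (("untwinned", untwinned), ("perfect twin", twinned)):
    moments = l_moments(data)
    print(name, [round(m, 3) for m in moments],
          round(multivariate_z(moments, REF_MEAN, REF_COV), 1))
```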

Twinning
Determination of twin laws
–From first principles
  No twin law will be overlooked
PDB analysis:
  36% of structures have at least 1 possible twin law
  –50.9% merohedral; 48.2% pseudo merohedral; 0.9% both
  27% of cases with twin laws are suspected to be twinned
  –10% of whole PDB(!)
Determination of twin fraction
–Fully automated Britton and H analyses, as well as an ML estimate of the twin fraction on the basis of the L statistic.
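As an illustration of the H analysis, the sketch below uses the standard relation for acentric twin-related pairs, <H> = 1/2 - α with H = |I1 - I2|/(I1 + I2), to estimate the twin fraction α. It is a textbook estimate on simulated data, not the automated PHENIX procedure. The function name and the simulation are assumptions.

```python
import random

def twin_fraction_from_h(pairs):
    """Estimate the twin fraction from twin-related acentric intensity pairs
    using <H> = 1/2 - alpha, with H = |I1 - I2| / (I1 + I2)."""
    hs = [abs(a - b) / (a + b) for a, b in pairs if a + b > 0]
    mean_h = sum(hs) / len(hs)
    return 0.5 - mean_h

# Simulate twin-related observations with a known twin fraction.
alpha = 0.2
rng = random.Random(0)
pairs = []
for _ in range(20000):
    j1 = rng.expovariate(1.0)  # true intensity of the first twin domain
    j2 = rng.expovariate(1.0)  # true intensity of the twin-related reflection
    pairs.append(((1 - alpha) * j1 + alpha * j2,
                  alpha * j1 + (1 - alpha) * j2))

print(round(twin_fraction_from_h(pairs), 3))
```

The estimate becomes unreliable as α approaches 0.5, because the twin-related intensities become nearly identical, and the same signature arises from strong NCS or from a space group that is too low.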

Twinning
Conflicting information
–Twin law is present: the lattice has higher symmetry than the assumed symmetry of the intensities
–Estimated twin fraction is close to 0.5: 'twin' related intensities are very similar
–The L test does not indicate twinning
  Very strong NCS
  Space group too low

Twinning
Maybe an example of too low a symmetry?

Anomalous data
Structure solution via experimental methods (especially SAD) is on the rise.
How to identify the presence of anomalous signal?
–[formula not recoverable]; VERY sensitive to noise
–[formula not recoverable]; 2?
–Measurability
  Fraction of Bijvoet differences for which |ΔI|/σ(ΔI) > 3 and both I(+)/σI(+) > 3 and I(-)/σI(-) > 3
  Easy to interpret
  –At 3 Angstrom 6% of Bijvoet differences are significantly larger than zero
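The measurability definition above translates directly into a short function. The sketch assumes that σ(ΔI) is obtained by adding the two sigmas in quadrature, which the slide does not state. The function name and the example values are made up.

```python
import math

def measurability(bijvoet_pairs, cut=3.0):
    """Fraction of Bijvoet pairs with |dI|/sigma(dI) > cut and with both
    I(+)/sigma(+) and I(-)/sigma(-) > cut.

    bijvoet_pairs: list of (I_plus, sigma_plus, I_minus, sigma_minus),
    with strictly positive sigmas.
    """
    n_measurable = 0
    for i_plus, s_plus, i_minus, s_minus in bijvoet_pairs:
        # ASSUMPTION: independent errors, so sigmas add in quadrature.
        sigma_delta = math.sqrt(s_plus * s_plus + s_minus * s_minus)
        if (abs(i_plus - i_minus) / sigma_delta > cut
                and i_plus / s_plus > cut
                and i_minus / s_minus > cut):
            n_measurable += 1
    return n_measurable / len(bijvoet_pairs)

pairs = [
    (100.0, 5.0, 60.0, 5.0),    # large difference, both mates well measured
    (100.0, 5.0, 95.0, 5.0),    # difference within noise
    (10.0, 5.0, 60.0, 5.0),     # one mate too weak
    (200.0, 10.0, 120.0, 10.0), # large difference, both mates well measured
]
print(measurability(pairs))
```

In practice the fraction would be evaluated per resolution shell, which gives the measurability as a function of resolution.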

Anomalous data
Measurability and [symbol not recoverable] are closely related, of course
Measurability translates more directly to the number of 'useful' Bijvoet differences in substructure solution/phasing

Anomalous data
The quality of the data determines the success of structure solution
[Plot: SnB success rate against redundancy. Weiss (2000). J. Appl. Cryst. 34]
[Plot: measurability, obtained via numerical methods]

Anomalous data
The quality of the data determines the success of structure solution
[Plot: measurability against 1/resolution²]
6 (partially occupied) iodines in thaumatin at λ = 1.5 Å.
[Figure: raw SAD phases, straight after PHASER; panels A and B]

Anomalous data
The quality of the data determines the success of structure solution
[Plot: measurability against 1/resolution²]
6 (partially occupied) iodines in thaumatin at λ = 1.5 Å.
[Figure: density modified phases; panels A and B]

Anomalous data
[Figure: LysOs PHASER maps]
[Figure: Ferredoxin PHASER maps]

Discussion & Conclusions
Software tools are available to point out specific problems
–mmtbx.xtriage [params]
The log file does not just list numbers; it also contains an extensive interpretation of the statistics
Knowing the idiosyncrasies of your X-ray data can help you avoid certain pitfalls.
–Undetected twinning, for instance

Discussion & Conclusions
mmtbx.xtriage at the beamline
If problems are detected while at the beamline, they can possibly be solved by recollecting data or adapting the data collection strategy.
[Image: The Surgeon and the Peasant, Lucas van Leyden]

Discussion & Conclusions
mmtbx.xtriage at home
[Image: The anatomical lesson of dr. Nicolaes Tulp, Rembrandt van Rijn]

Acknowledgements
Paul Adams, Ralf Grosse-Kunstleve, Pavel Afonine, Nigel Moriarty, Nick Sauter, Michael Hohn
Cambridge: Randy Read, Airlie McCoy, Laurent Storonoy
Los Alamos: Tom Terwilliger, Li Wei Hung, Thirumugan Rhadakanan
Texas A&M University: Jim Sachetini, Tom Ioerger, Eric McKee