First Aid & Pathology Data quality assessment in PHENIX

Presentation transcript:

First Aid & Pathology Data quality assessment in PHENIX Peter Zwart

Introduction: Structure solution can be enhanced by knowledge of the quality and idiosyncrasies of the merged data. Is there anomalous signal? Twinning? Pseudo centering? Data characterization should extend beyond standard quantities such as Rmerge and nominal resolution. A full characterization of a data set might provide expert systems, such as wizards, with useful information on how best to solve a structure.

Introduction: Xtriage is a program that aims to characterize a merged X-ray dataset. Its analyses cover: probabilistic unit cell content analysis; likelihood-based Wilson scaling; analysis of mean intensity; ice ring detection; outlier analysis; twinning and pseudo centering; anomalous signal.

Likelihood based Wilson Scaling: Both the Wilson B and the nominal resolution determine the ‘looks’ of the map. Zwart & Lamzin (2003). Acta Cryst. D59, 2104-2113. [Figure: two maps, one with Bwil = 9 Å², dmin = 2 Å and one with Bwil = 50 Å², dmin = 2 Å.]

Likelihood based Wilson Scaling: Data can be anisotropic. Traditional ‘straight-line fitting’ is not reliable at low resolution. Solution: likelihood-based Wilson scaling, which results in an estimate of the anisotropic overall B value. Zwart, Grosse-Kunstleve & Adams, CCP4 newsletter, 2005.

Likelihood based Wilson Scaling: Likelihood-based scaling is not very sensitive to the resolution cut-off, whereas classic straight-line fitting is.
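
For comparison, the classic straight-line Wilson fit that the slide contrasts with the likelihood approach can be sketched as follows. This is a minimal illustration and not Xtriage's method. It assumes the convention <I_obs> = k * sum(f^2) * exp(-2 B s^2) with s = sin(theta)/lambda = 1/(2d); the function name and the binned inputs are hypothetical.

```python
import math


def wilson_line_fit(d_spacings, mean_intensities, sum_f2):
    """Classic straight-line Wilson fit over resolution bins.

    Assumed model (a convention chosen for this sketch):
        <I_obs> = k * sum_f2 * exp(-2 * B * s^2),  s^2 = 1 / (4 d^2)
    so ln(<I_obs> / sum_f2) is linear in s^2 with slope -2B and
    intercept ln(k).  Returns the tuple (B, k).
    """
    xs = [1.0 / (4.0 * d * d) for d in d_spacings]
    ys = [math.log(i / f) for i, f in zip(mean_intensities, sum_f2)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least squares for a single predictor.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope / 2.0, math.exp(intercept)


def synthetic_bins(b_value, scale, d_spacings, sum_f2=1000.0):
    """Noise-free synthetic bin averages that follow the assumed model."""
    return [scale * sum_f2 * math.exp(-2.0 * b_value / (4.0 * d * d))
            for d in d_spacings]


if __name__ == "__main__":
    d_bins = [4.0, 3.5, 3.0, 2.5, 2.0, 1.8, 1.5]
    intensities = synthetic_bins(25.0, 3.0, d_bins)
    print(wilson_line_fit(d_bins, intensities, [1000.0] * len(d_bins)))
```

On real data the low-resolution bins deviate systematically from the straight line, so the fitted B depends on which bins are included; that is the sensitivity to the resolution cut-off mentioned above.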

Likelihood based Wilson Scaling: Anisotropy is easily detected and can be ‘corrected’ for. This is useful for molecular replacement and possibly for substructure solution. Anisotropy correction cleans up your N(Z) plots.

Likelihood based Wilson Scaling: For the ML Wilson scaling an ‘expected Wilson plot’ is needed. It was obtained from over 2000 high-quality experimental datasets. Both the ‘expected intensity’ and its standard deviation can be obtained from it.

Likelihood based Wilson Scaling: Resolution-dependent problems, such as ice rings, can be spotted easily and automatically. Empirical Wilson plots are available for protein and DNA/RNA. [Figure: the data shown are from a DNA structure.]

Outlier analyses: Assume the amplitudes are distributed according to the Wilson distribution. For a dataset of a given size, the cumulative distribution function of the largest |E| value in the dataset can be used to detect outliers.
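
The idea can be sketched with the standard Wilson distributions for normalized amplitudes: P(|E| < e) = 1 - exp(-e^2) for acentric and erf(e / sqrt(2)) for centric reflections. If the N reflections are treated as independent (an assumption of this sketch), the largest |E| has cumulative distribution P(|E| < e)^N. The function name below is hypothetical and this is not Xtriage's code.

```python
import math


def p_value_largest_e(e_max, n_reflections, centric=False):
    """Probability that the largest |E| among n independent,
    Wilson-distributed reflections is at least e_max.

    A small value suggests that the reflection is an outlier.
    """
    # Tail probability P(|E| >= e_max) for a single reflection.
    if centric:
        tail = math.erfc(e_max / math.sqrt(2.0))
    else:
        tail = math.exp(-e_max ** 2)
    if tail >= 1.0:
        return 1.0
    # 1 - (1 - tail)^n, evaluated stably when tail is tiny.
    return -math.expm1(n_reflections * math.log1p(-tail))


if __name__ == "__main__":
    for e in (3.0, 4.0, 5.0):
        print(e, p_value_largest_e(e, 10000))
```

In a dataset of 10000 acentric reflections an |E| of 3 is expected to occur by chance, whereas an |E| of 5 is very unlikely under the Wilson distribution.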

Pseudo Translational Symmetry: It can cause problems in refinement and MR, because the likelihood function is incorrect due to the effects of the extra translational symmetry on the intensities. It can also be helpful during MR: the effective ASU is smaller if T-NCS information is used. The presence of pseudo centering can be detected from an analysis of the Patterson map. An Fobs Patterson with truncated resolution should reveal a significant off-origin peak.

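Pseudo Translational Symmetry: A database analysis reveals that the heights of the largest off-origin peaks in truncated X-ray data sets follow a distribution F(Qmax), where Qmax is the relative peak height. [Equation and plot of F(Qmax) against relative peak height Qmax not preserved in the transcript.]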

Pseudo Translational Symmetry: 1-F(Qmax) is the probability that the largest off-origin peak in your Patterson map is not due to translational NCS; this is a so-called p value. If a significance level of 0.01 is set, all off-origin Patterson vectors larger than 20% of the height of the origin are suspected T-NCS vectors.

PDBID   Height (%)   P-value (%)
1sct    77           9*10^-6
1ihr    45           1*10^-3
1c8u    20           1
1ee2    10           5

Twinning: Merohedral twinning can occur when the lattice has a higher symmetry than the intensities. When twinning does occur, the recorded intensities are the sum of two independent intensities, and normal Wilson statistics break down. Twinning can therefore be detected using intensity statistics.
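
A minimal simulation shows how the intensity statistics change. For untwinned acentric data the normalized intensities follow an exponential distribution, so <I^2>/<I>^2 = 2.0; a twinned intensity with twin fraction alpha is modelled here as (1 - alpha) * J1 + alpha * J2 with J1, J2 independent, which gives 2 - 2 * alpha * (1 - alpha), i.e. 1.5 for a perfect twin. The function names are hypothetical and measurement errors are ignored.

```python
import random


def second_moment_ratio(intensities):
    """Return <I^2> / <I>^2 for a list of intensities."""
    n = len(intensities)
    mean_i = sum(intensities) / n
    mean_i2 = sum(i * i for i in intensities) / n
    return mean_i2 / (mean_i * mean_i)


def expected_ratio(twin_fraction):
    """Theoretical <I^2>/<I>^2 for acentric data with a given twin fraction."""
    return 2.0 - 2.0 * twin_fraction * (1.0 - twin_fraction)


def simulate_twinned(n, twin_fraction, seed=0):
    """Simulate n acentric intensities, each the weighted sum of two
    independent exponentially distributed (Wilson) intensities."""
    rng = random.Random(seed)
    a = twin_fraction
    return [(1.0 - a) * rng.expovariate(1.0) + a * rng.expovariate(1.0)
            for _ in range(n)]


if __name__ == "__main__":
    for alpha in (0.0, 0.25, 0.5):
        data = simulate_twinned(100000, alpha)
        print(alpha, second_moment_ratio(data), expected_ratio(alpha))
```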

Twinning: The cumulative intensity distribution N(Z) can be used to identify twinning (acentric data). [Figure: N(Z) plotted against Z for pseudo-centered, normal and perfectly twinned data.]
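
The theoretical curves for acentric data are standard results: N(Z) = 1 - exp(-Z) for untwinned data and N(Z) = 1 - (1 + 2Z) exp(-2Z) for a perfect twin. The sketch below evaluates both and computes an empirical N(Z) from a list of intensities; the function names are hypothetical, and no curve is given for pseudo centering because the transcript does not state one.

```python
import math


def nz_untwinned_acentric(z):
    """Theoretical N(Z) for untwinned acentric data."""
    return 1.0 - math.exp(-z)


def nz_perfect_twin_acentric(z):
    """Theoretical N(Z) for perfectly twinned acentric data."""
    return 1.0 - (1.0 + 2.0 * z) * math.exp(-2.0 * z)


def nz_empirical(intensities, z):
    """Fraction of reflections with I / <I> <= z."""
    mean_i = sum(intensities) / len(intensities)
    count = sum(1 for i in intensities if i / mean_i <= z)
    return count / len(intensities)


if __name__ == "__main__":
    for z in (0.1, 0.2, 0.4, 0.8):
        print(z, nz_untwinned_acentric(z), nz_perfect_twin_acentric(z))
```

At low Z the twinned curve lies below the untwinned one, because twinning averages out the weak reflections; this produces the sigmoidal shape of the twinned N(Z) plot.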

Twinning: Pseudo centering + twinning = N(Z) looks normal. Anisotropy in diffraction data produces a trend similar to pseudo centering; anisotropy can, however, be removed. How can twinning be detected in the presence of T-NCS? Partition the Miller indices on the basis of the detected T-NCS vectors; the intensities of the subgroups then follow normal Wilson statistics (approximately). Use the L-test for twin detection: it is not very sensitive to T-NCS if the partitioning of the Miller indices is done properly, and there is no need to know the twin laws. It is not sensitive to pseudo symmetry or certain data processing problems.

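Twinning: [Equation residue: definition of the L statistic from pairs of intensities and its average <L> over N pairs; the symbols were lost in the transcript.]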

Twinning: A database analysis of high-quality, untwinned datasets reveals that the values of the first and second moments of L follow a narrow distribution. This distribution can be used to determine a multivariate Z-score; large values indicate twinning.
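
The two moments can be illustrated with the standard definition of the L statistic (Padilla & Yeates), L = (I1 - I2) / (I1 + I2), for pairs of reflections that are close in reciprocal space and not related by a twin law. For acentric data theory gives <|L|> = 1/2 and <L^2> = 1/3 when untwinned, and 3/8 and 1/5 for a perfect twin. The sketch below is a simulation with hypothetical function names; the reference distribution needed for the multivariate Z-score is not given in the transcript and is not reproduced.

```python
import random


def l_moments(pairs):
    """Return (<|L|>, <L^2>) for a list of (I1, I2) intensity pairs."""
    values = [(i1 - i2) / (i1 + i2) for i1, i2 in pairs if i1 + i2 > 0]
    n = len(values)
    return sum(abs(v) for v in values) / n, sum(v * v for v in values) / n


def simulate_pairs(n, perfect_twin, seed=0):
    """Simulate n pairs of independent acentric intensities.

    For a perfect twin each observed intensity is the average of two
    independent exponentially distributed (Wilson) intensities.
    """
    rng = random.Random(seed)

    def intensity():
        if perfect_twin:
            return 0.5 * (rng.expovariate(1.0) + rng.expovariate(1.0))
        return rng.expovariate(1.0)

    return [(intensity(), intensity()) for _ in range(n)]


if __name__ == "__main__":
    print("untwinned   :", l_moments(simulate_pairs(50000, False)))
    print("perfect twin:", l_moments(simulate_pairs(50000, True)))
```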

Twinning: Determination of twin laws: from first principles, so no twin law will be overlooked. PDB analysis: 36% of structures have at least 1 possible twin law (50.9% merohedral; 48.2% pseudo-merohedral; 0.9% both). 27% of the cases with twin laws have intensity statistics that warrant further investigation into whether or not the data are twinned; this is 10% of the whole PDB(!). Determination of twin fraction: fully automated Britton and H analyses, as well as an ML estimate of the twin fraction on the basis of the L statistic.
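
The H analysis can be sketched as follows. For twin-related acentric reflections, H = |I1 - I2| / (I1 + I2) is uniformly distributed between 0 and 1 - 2 * alpha, so <H> = 1/2 - alpha and the twin fraction can be estimated as alpha = 1/2 - <H> (Yeates's H-test). This sketch covers only the <H> estimate, not the Britton plot or the ML estimate; the function names are hypothetical and measurement errors are ignored.

```python
import random


def twin_fraction_from_h(pairs):
    """Estimate the twin fraction from twin-related intensity pairs
    using alpha = 1/2 - <H>, with H = |I1 - I2| / (I1 + I2)."""
    hs = [abs(i1 - i2) / (i1 + i2) for i1, i2 in pairs if i1 + i2 > 0]
    mean_h = sum(hs) / len(hs)
    return 0.5 - mean_h


def simulate_twin_pairs(n, alpha, seed=0):
    """Simulate n twin-related pairs of observed acentric intensities.

    J1 and J2 are the true (untwinned) intensities; each observation
    mixes them with weights (1 - alpha) and alpha.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        j1 = rng.expovariate(1.0)
        j2 = rng.expovariate(1.0)
        pairs.append(((1.0 - alpha) * j1 + alpha * j2,
                      alpha * j1 + (1.0 - alpha) * j2))
    return pairs


if __name__ == "__main__":
    for alpha in (0.0, 0.1, 0.3, 0.45):
        print(alpha, twin_fraction_from_h(simulate_twin_pairs(50000, alpha)))
```

Pseudo symmetry or NCS parallel to the twin axis also makes twin-related intensities more similar, so a large estimate from this kind of analysis does not prove twinning on its own.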

Conflicting information
PDBID: 1???
Unit cell: 99.5 60.9 70.96 90 134.5 90
Space group: C 2
Twin laws and estimated twin fractions:
H,-K,-H-L : 0.44
H+2L,-K,-L : 0.01
-H-2L, K, H+L : 0.01
<I^2>/<I>^2 = 2.10 (theory for untwinned data: 2.0); data does not appear to be twinned
<L> = 0.49 (theory for untwinned data: 0.5); multivariate Z-score of L test: 0.963

Conflicting information: What is going on? The estimated twin fraction is large, but the data do not seem to be twinned. Either the twin law H,-K,-H-L is parallel to an existing NCS axis, or the twin law H,-K,-H-L is a symmetry axis and the space group is too low. In the latter case it should be: C2 + H,-K,-H-L = F222. Images are needed to make the decision. http://www.phenix-online.org/cctbx

Conflicting information: A DNA example. Space group: P65; 1 twin law. Resolution: 1.87 Å. Native Patterson analysis indicates several significant off-origin peaks. Intensity statistics indicate pseudo translational symmetry: <I^2>/<I>^2 = 4.243. The N(Z) plot is not very informative.

Conflicting information: However, the L test gives <L> = 0.46, so the data might be twinned. The partitioned data might not follow Wilson statistics, however. The Britton and H analyses estimate a twin fraction of about 40%. Wrong space group? A monomer would not fit in the ASU. Twinning, pseudo symmetry, or both? This is not clear from the experimental data alone, so use the deposited coordinates (Rwork = 28%; Rfree = 34%). Twin fractions via Britton plot: from Fcalc, 11% (due to pseudo symmetry only); from Fobs, 41% (pseudo symmetry + twinning). See Lebedev, Vagin, Murshudov (2006) Acta D62, 83-95. The data are likely to be twinned; this is difficult to spot due to TPS and RPS effects on the intensity statistics.

Anomalous data: Structure solution via experimental methods (especially SAD) is on the rise. The presence of anomalous signal is indicated by a quantity called measurability: the fraction of Bijvoet differences for which ΔI/σ(ΔI) > 3 and both I(+)/σI(+) > 3 and I(-)/σI(-) > 3. It is easy to interpret, for example: "At 3 Å, 6% of Bijvoet differences are significantly larger than zero."
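
The definition translates directly into a counting procedure. The sketch below assumes that the errors on I(+) and I(-) are independent, so that σ(ΔI) = sqrt(σI(+)^2 + σI(-)^2), and takes the absolute value of ΔI; both choices, the tuple layout and the function name are assumptions of this illustration rather than Xtriage's implementation.

```python
import math


def measurability(bijvoet_pairs, threshold=3.0):
    """Fraction of Bijvoet pairs with a significant anomalous difference.

    Each pair is a tuple (i_plus, sig_plus, i_minus, sig_minus).
    A pair counts when |dI| / sigma(dI) > threshold and both
    I(+)/sigma(+) and I(-)/sigma(-) exceed the threshold.
    """
    significant = 0
    for i_plus, sig_plus, i_minus, sig_minus in bijvoet_pairs:
        if sig_plus <= 0 or sig_minus <= 0:
            continue  # unusable sigmas: in the total, never significant
        # Assumption: independent errors on I(+) and I(-).
        sig_delta = math.sqrt(sig_plus ** 2 + sig_minus ** 2)
        if (abs(i_plus - i_minus) / sig_delta > threshold
                and i_plus / sig_plus > threshold
                and i_minus / sig_minus > threshold):
            significant += 1
    return significant / len(bijvoet_pairs)


if __name__ == "__main__":
    pairs = [
        (100.0, 2.0, 80.0, 2.0),    # strong pair, clear difference
        (100.0, 10.0, 95.0, 10.0),  # difference within the noise
        (10.0, 5.0, 100.0, 5.0),    # I(+) too weak
        (50.0, 1.0, 50.0, 1.0),     # no difference
    ]
    print(measurability(pairs))
```

In practice the fraction would be computed per resolution shell, so that it can be reported at a given resolution as in the slide.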

Anomalous data: Measurability and <ΔI/σ(ΔI)> are closely related. Measurability translates more directly to the number of ‘useful’ Bijvoet differences in substructure solution/phasing.

Anomalous data: 6 (partially occupied) iodines in thaumatin at λ = 1.5 Å. Raw SAD phases, straight after PHASER. [Figure: measurability plotted against 1/resolution², with map panels A and B.]

Anomalous data: 6 (partially occupied) iodines in thaumatin at λ = 1.5 Å. Density-modified phases. [Figure: measurability plotted against 1/resolution², with map panels A and B.]

Anomalous data: SAD phasing with PHASER gives very sensitive residual maps. The residual map indicates where a certain type of anomalous scatterer needs to be placed to improve the fit between observed and expected F(+) and F(-). Example: lysozyme soaked with a solution containing (NH4)2(OsCl6). Wilson B: 13.7 Å²; dmin = 1.7 Å. Data collected at the Os L-III edge (f'' > 10). Measurability at 3.0 Å is 67%: the anomalous signal is strong. The partial structure is large: Zheavy²/(Zheavy² + Zprotein²) = 35%. [Figure: PHASER residual map indicating the location of main chain atoms.]

Anomalous data: SAD phasing with PHASER gives very sensitive residual maps. The residual map indicates where a certain type of anomalous scatterer needs to be placed to improve the fit between observed and expected F(+) and F(-). Example: lysozyme soaked with a solution containing (NH4)2(OsCl6). Wilson B: 13.7 Å²; dmin = 1.7 Å. Data collected at the Os L-III edge (f'' > 10). Measurability at 3.0 Å is 67%: the anomalous signal is strong. The partial structure is large: Zheavy²/(Zheavy² + Zprotein²) = 35%. [Figure: raw PHASER SAD phases.]

Anomalous data: Another extreme: 2 Fe4S4 clusters in 60 residues. Wilson B: 6.5 Å²; dmin = 1.2 Å. Measurability at 3.0 Å: 6%, so the data are not terribly strong. ZFe²/(ZFe² + ZS² + Zprotein²) = 17%. Fe f'' = 1.25 e; S f'' = 0.35 e. The PHASER residual map from Fe SAD phases clearly shows the S positions. [Figure: SAD on Fe; residual maps indicate S positions (green balls).]

Anomalous data: Inclusion of the sulfurs improves phasing: (ZFe² + ZS²)/(ZFe² + ZS² + Zprotein²) = 32% and <FOM> = 0.67 (was 0.53). Residual maps show almost all non-hydrogen atoms. Inclusion of the non-hydrogen atoms results in <FOM> = 0.98. [Figure: SAD on Fe, S. Residual maps (purple) and FOM-weighted Fobs map (blue).]

Discussion & Conclusions: Software tools are available to point out specific problems: mmtbx.xtriage <input_reflection_file> [params]. The log file does not just contain numbers, but also an extensive interpretation of the statistics. Knowing the idiosyncrasies of your X-ray data might help you avoid certain pitfalls, undetected twinning for instance.

First Aid: Analyses at the beamline. If problems are detected while at the beamline, they could possibly be solved by recollecting data or adapting the data collection strategy. [Image: The Surgeon and the Peasant, 1524. Lucas van Leyden.]

Pathology/Autopsy: Analyses at home. [Image: The anatomical lesson of dr. Nicolaes Tulp, 1632. Rembrandt van Rijn.]

Acknowledgements
Paul Adams, Ralf Grosse-Kunstleve, Pavel Afonine, Nigel Moriarty, Nick Sauter, Michael Hohn
Cambridge: Randy Read, Airlie McCoy, Laurent Storoni
Los Alamos: Tom Terwilliger, Li Wei Hung, Thirumugan Rhadakanan
Texas A&M University: Jim Sacchettini, Tom Ioerger, Eric McKee
Funding: LBNL (DE-AC03-76SF00098); NIH/NIGMS (P01GM063210); PHENIX Industrial Consortium

WWW
Phenix: www.phenix-online.org
Xtriage tutorials: www.phenix-online.org/tutorials
CCTBX: cctbx.sf.net