Phil Evans MRC Laboratory of Molecular Biology Cambridge


Phil Evans, MRC Laboratory of Molecular Biology, Cambridge
• Developments in Scala: two approaches to correlation analysis
• New program Pointless

Correlation analysis
The problem: how to measure the correlation between two measurements, e.g. two observations of (I+ − I−) from random half datasets?
Scatter plots of ΔI1 v. ΔI2 give a clear indication of whether there is a correlation (i.e. a signal), but what is the best number to look at?
[Figure: two scatter plots of ΔI1 v. ΔI2, labelled "No signal" and "Signal"]

The traditional measure is the correlation coefficient, which is equivalent to fitting a straight line through the data points. But this is dominated by outliers, which have a large leverage, and can be misleading or confusing when the correlation is poor.
We know that if there is a signal, the two measurements should be the same, so we should fit a straight line with slope = 1.
Along this diagonal line, the distribution is the "signal"; perpendicular to the line is the "error".
"Unitary Correlation Ratio": UCR = RMS("signal")/RMS("error") = RMS(I+ - I-)/RMS(I+ + I-)
[Figure: the two scatter plots with their statistics. No signal: CC −0.08, UCR 0.91. Signal: CC +0.61, UCR 2.00]
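The two statistics can be compared on the same pairs of differences with a short sketch. It reads the geometric description literally: for a point (d1, d2), the coordinate along the slope-1 diagonal is (d1 + d2)/√2 and the coordinate perpendicular to it is (d1 − d2)/√2, so the ratio of the two RMS values is RMS(d1 + d2)/RMS(d1 − d2). The function names are hypothetical, and this is an illustration of the idea, not the Scala code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation coefficient of two equal-length samples.
// Returns NaN if either sample has zero variance.
double correlation_coefficient(const std::vector<double>& d1,
                               const std::vector<double>& d2) {
  assert(d1.size() == d2.size() && !d1.empty());
  const double n = static_cast<double>(d1.size());
  double s1 = 0, s2 = 0, s11 = 0, s22 = 0, s12 = 0;
  for (std::size_t i = 0; i < d1.size(); ++i) {
    s1 += d1[i];
    s2 += d2[i];
    s11 += d1[i] * d1[i];
    s22 += d2[i] * d2[i];
    s12 += d1[i] * d2[i];
  }
  const double cov = s12 / n - (s1 / n) * (s2 / n);
  const double var1 = s11 / n - (s1 / n) * (s1 / n);
  const double var2 = s22 / n - (s2 / n) * (s2 / n);
  return cov / std::sqrt(var1 * var2);
}

// Ratio of the RMS spread along the slope-1 diagonal ("signal") to the
// RMS spread perpendicular to it ("error"). The common factor 1/sqrt(2)
// in both projections cancels, so it is omitted.
// Returns infinity if the two samples are identical (no "error" spread).
double unitary_correlation_ratio(const std::vector<double>& d1,
                                 const std::vector<double>& d2) {
  assert(d1.size() == d2.size() && !d1.empty());
  double along = 0, across = 0;
  for (std::size_t i = 0; i < d1.size(); ++i) {
    along += (d1[i] + d2[i]) * (d1[i] + d2[i]);
    across += (d1[i] - d2[i]) * (d1[i] - d2[i]);
  }
  return std::sqrt(along / across);
}
```

For uncorrelated samples of equal variance the spread is the same in both directions, so the ratio sits near 1; a real signal stretches the cloud along the diagonal and pushes the ratio above 1.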

Now output from Scala
Another way of looking to see if there is an anomalous signal

Scoring for Laue group determination
Another approach to correlation scoring
We want a scoring system which is robust to limited data (e.g. from a few preliminary images 90° apart), when there may be very few reflections which sample a symmetry element. Accidentally good agreement between a few observations should not give a high score. We need to estimate the significance of the score.
One solution: score all pairs of observations related by a potential symmetry element, and compare these with pairs which cannot be related by any symmetry element (but are matched in resolution). This works even for small samples (and any score function).
Example: if we have 10 potentially related pairs of observations, we can take the (many) unrelated pairs in random groups of 10, generate the scores for each group of 10, and calculate the mean score and its standard deviation. Then the Z-score = number of standard deviations from the mean:
Z = (Score(related) − Mean(Score(unrelated))) / σ(Score(unrelated))
This gives a Z-score which is self-adjusting for unknown errors and for small samples.
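The grouping procedure in the example above can be sketched as follows. The names `ObsPair`, `ScoreFn` and `z_score` are hypothetical, the sample standard deviation (divisor n − 1) is an assumption, and leftover unrelated pairs that do not fill a whole group are simply dropped. This is a minimal illustration, not the Pointless implementation.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <functional>
#include <random>
#include <stdexcept>
#include <utility>
#include <vector>

// One pair of observations (e.g. two normalised intensities).
using ObsPair = std::pair<double, double>;
// Any score function over a set of pairs; higher means better agreement.
using ScoreFn = std::function<double(const std::vector<ObsPair>&)>;

// Z-score of the related pairs against random groups of unrelated pairs,
// each group having the same size as the related set.
double z_score(const std::vector<ObsPair>& related,
               std::vector<ObsPair> unrelated,  // by value: shuffled locally
               const ScoreFn& score, unsigned seed = 0) {
  const std::size_t n = related.size();
  const std::size_t ngroups = (n > 0) ? unrelated.size() / n : 0;
  if (ngroups < 2)
    throw std::invalid_argument("need at least two groups of unrelated pairs");

  std::mt19937 rng(seed);
  std::shuffle(unrelated.begin(), unrelated.end(), rng);

  // Score each random group of n unrelated pairs.
  std::vector<double> scores;
  for (std::size_t g = 0; g < ngroups; ++g) {
    const auto first = unrelated.begin() + static_cast<std::ptrdiff_t>(g * n);
    const auto last = first + static_cast<std::ptrdiff_t>(n);
    scores.push_back(score(std::vector<ObsPair>(first, last)));
  }

  // Mean and sample standard deviation of the unrelated-group scores.
  double mean = 0;
  for (double s : scores) mean += s;
  mean /= static_cast<double>(ngroups);
  double ss = 0;
  for (double s : scores) ss += (s - mean) * (s - mean);
  const double sd = std::sqrt(ss / static_cast<double>(ngroups - 1));

  // Not finite if all unrelated groups happen to score identically (sd == 0).
  return (score(related) - mean) / sd;
}
```

Because the unrelated groups have the same size as the related set, a score that is good only by accident in a small sample is matched by a correspondingly wide spread among the unrelated groups, which keeps Z small.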

Some possible score functions
1. Correlation coefficient
Relatively insensitive to unknown scale. Use normalised intensities (E²) to avoid artificial correlation due to the change in <I> with resolution.
2. Difference functions
Sensitive to unknown scale, often less discriminating than the correlation coefficient (in tests so far).
−Σ (I1 − I2)² / [σ²(I1) + σ²(I2)]
Related to log(probability) if σ²(I) reflects only random errors.
We do not know the scales, as we can only determine the scales when we know the Laue group!
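The difference function above translates directly into code. The struct `IntensityPair` and the function name are hypothetical; the sketch assumes the two intensities are already on a common scale, which, as noted, is exactly what is not known before the Laue group is determined.

```cpp
#include <cassert>
#include <vector>

// Two observations to be compared, each with its standard deviation.
struct IntensityPair {
  double I1, sig1;
  double I2, sig2;
};

// Difference score: -sum (I1 - I2)^2 / [sig^2(I1) + sig^2(I2)].
// It is zero for perfect agreement and becomes more negative as the
// pairs disagree, so a higher value means better agreement.
double difference_score(const std::vector<IntensityPair>& pairs) {
  double sum = 0.0;
  for (const IntensityPair& p : pairs) {
    const double diff = p.I1 - p.I2;
    const double var = p.sig1 * p.sig1 + p.sig2 * p.sig2;
    assert(var > 0.0);  // each pair needs a non-zero error estimate
    sum += diff * diff / var;
  }
  return -sum;
}
```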

Pointless: a program for determining Laue groups (work in progress!)
1. From the unit cell dimensions, find the highest compatible lattice symmetry.
2. Score each symmetry element belonging to the lattice symmetry, using all pairs of observations related by that element.
3. Score combinations of symmetry elements for all possible sub-groups (Laue groups) of the lattice symmetry group.
Net Z-score for each possible Laue group is
Net Z = Z(for) − Z(against) = Z+ − Z−
Z(for): score for all symmetry elements belonging to the subgroup
Z(against): score for all symmetry elements belonging to the lattice group but not the subgroup

Examples: Orthorhombic with a ≈ b
Cell: 44.67 46.10 117.89 90.00 90.00 90.00
Tested in lattice pointgroup P422 (P4/mmm), but the 4-fold is absent
Scores for images 1-5 only (5°)

  Nelmt    Z-cc     CC   Z-rms      N     rmsD  Rmerge   Symmetry & operator (in Lattice Cell)
>   1      1.51   0.48    1.24     22   -17.13   0.262   2-fold l ( 0 0 1) {-h,-k,+l}
>   2      2.85   0.73    1.63     33   -18.17   0.313   2-fold h ( 1 0 0) {+h,-k,-l}
    3     -1.02  -0.13   -1.07     45   -39.47   1.179   2-fold   ( 1 1 0) {+k,+h,-l}
>   4     -1.36  -0.76   -2.64      4   -80.58   0.828   2-fold k ( 0 1 0) {-h,+k,-l}
    5     -1.10  -0.15    1.94     37   -16.41   0.783   2-fold   ( 1-1 0) {-k,-h,-l}
    6     -0.68  -0.01   -1.74     72   -41.05   1.150   4-fold l ( 0 0 1) {-k,+h,+l}

Correct Laue group Pmmm found despite limited data for the dyad around k (pointgroup P222)

      Laue Group        NetZcc   Zcc+   Zcc-     CC  NetZrms  Zrms+  Zrms-  Rmerge  ReindexOperator
>  1  P m m m      *      3.79   3.01  -0.78   0.57     1.86   1.00  -0.85    0.39  [h,k,l]
   2  P 1 2/m 1    *      3.43   2.85  -0.58   0.73     2.46   1.63  -0.82    0.31  [l,h,k]
   3  P 1 2/m 1           1.63   1.51  -0.12   0.48     1.89   1.24  -0.65    0.26  [k,l,h]
   4  P 4/m               0.24   0.28   0.05   0.09    -1.32  -1.02   0.30    0.91  [h,k,l]
   5  P 4/m m m           0.04   0.04   0.00   0.05    -0.31  -0.31   0.00    0.81  [h,k,l]
   6  P -1               -0.04   0.00   0.04   0.00     0.31   0.00  -0.31    0.00  [h,k,l]
   7  C 1 2/m 1          -1.26  -1.10   0.16  -0.15     2.88   1.94  -0.94    0.78  [h+k,-h+k,l]
   8  P 1 2/m 1          -1.44  -1.36   0.08  -0.76    -2.71  -2.64   0.07    0.83  [h,k,l]
   9  C m m m            -1.75  -0.65   1.10  -0.05     2.22   0.90  -1.32    0.82  [h+k,-h+k,l]
  10  C 1 2/m 1          -1.88  -1.02   0.86  -0.13    -1.17  -1.07   0.10    1.18  [h-k,h+k,l]

Discrimination between Laue groups is clearer with a full dataset: the monoclinic possibilities (P2/m) have a greater score against them (Zcc-) than the correct Pmmm orthorhombic solution

  Nelmt    Z-cc     CC   Z-rms      N     rmsD  Rmerge        Symmetry & operator (in Lattice Cell)
    1     10.99   0.73    5.17  24337   -19.19   0.374  ***   2-fold l ( 0 0 1) {-h,-k,+l}
    2     11.38   0.75    6.04  33259   -15.91   0.322  ***   2-fold h ( 1 0 0) {+h,-k,-l}
    3      2.23   0.14    1.41  26701   -33.38   0.719        2-fold   ( 1 1 0) {+k,+h,-l}
    4     11.02   0.73    5.67  19199   -17.33   0.354  ***   2-fold k ( 0 1 0) {-h,+k,-l}
    5      0.93   0.05    1.43  28477   -33.29   0.794        2-fold   ( 1-1 0) {-k,-h,-l}
    6      3.72   0.24    1.07  60928   -34.64   0.784  *     4-fold l ( 0 0 1) {-k,+h,+l} {+k,-h,+l}

      Laue Group        NetZcc   Zcc+   Zcc-     CC  NetZrms  Zrms+  Zrms-  Rmerge  ReindexOperator
   1  P m m m      ***    8.15  11.12   2.97   0.73     4.44   5.65   1.22    0.35  [h,k,l]
   2  P 1 2/m 1    **     6.44  11.38   4.94   0.75     3.96   6.04   2.08    0.32  [l,h,k]
   3  P 1 2/m 1    **     6.03  10.99   4.96   0.73     2.89   5.17   2.28    0.37  [k,l,h]
   4  P 1 2/m 1    **     5.86  11.02   5.17   0.73     3.34   5.67   2.33    0.35  [h,k,l]
   5  P 4/m m m    **     5.70   5.70   0.00   0.37     2.58   2.58   0.00    0.59  [h,k,l]
   6  P 4/m              -1.02   5.28   6.30   0.34    -1.34   1.91   3.25    0.66  [h,k,l]
   7  C m m m            -1.16   4.97   6.12   0.32    -0.28   2.41   2.69    0.60  [h+k,-h+k,l]
   8  C 1 2/m 1          -3.85   2.23   6.08   0.14    -1.36   1.41   2.76    0.72  [h-k,h+k,l]
   9  C 1 2/m 1          -5.44   0.93   6.36   0.05    -1.34   1.43   2.77    0.79  [h+k,-h+k,l]
  10  P -1               -5.70   0.00   5.70   0.00    -2.58   0.00   2.58    0.00  [h,k,l]

>>> 1 P m m m [h,k,l]
Transformed cell: 44.7 46.1 117.9 90.0 90.0 90.0 deviation 0.00
List of possible spacegroups: <P 2 2 2> <P 2 2 21> <P 21 2 2> <P 2 21 2> <P 21 21 2> <P 2 21 21> <P 21 2 21> <P 21 21 21>

A confusing case in C222
Unit cell: 74.72 129.22 184.25 90 90 90
This has b ≈ √3 a, so it can also be indexed on a hexagonal lattice, lattice point group P622 (P6/mmm), with the reindex operator: h/2+k/2, h/2-k/2, -l
Conversely, a hexagonal lattice may be indexed as C222 in three distinct ways, so there is a 2 in 3 chance of the indexing program choosing the wrong one.
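Both statements can be checked numerically. The sketch below tests whether b/a is close to √3 and applies the reindex operator quoted above; the function names and the 1% tolerance are assumptions chosen for illustration. On a C-centred lattice the reflections present have h + k even, which is why the half-integer operator always yields integer indices.

```cpp
#include <array>
#include <cmath>
#include <stdexcept>

using Hkl = std::array<int, 3>;  // hypothetical alias for (h, k, l)

// True if b/a is within a relative tolerance of sqrt(3), i.e. the
// C-centred orthorhombic cell is metrically (pseudo-)hexagonal.
// The default tolerance of 1% is an arbitrary choice for this sketch.
bool is_pseudo_hexagonal(double a, double b, double rel_tol = 0.01) {
  const double root3 = std::sqrt(3.0);
  return std::fabs(b / a - root3) < rel_tol * root3;
}

// Apply the reindex operator h/2+k/2, h/2-k/2, -l.
// Throws if h+k is odd, which cannot occur for a C-centred lattice.
Hkl reindex_c_centred_to_hexagonal(const Hkl& hkl) {
  const int h = hkl[0], k = hkl[1], l = hkl[2];
  if ((h + k) % 2 != 0)
    throw std::invalid_argument("h+k must be even on a C-centred lattice");
  return {(h + k) / 2, (h - k) / 2, -l};
}
```

For the cell above, 129.22/74.72 ≈ 1.729 against √3 ≈ 1.732, a difference of about 0.15%, so the lattice alone cannot distinguish the two descriptions and the intensities have to decide.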

A hexagonal lattice may be indexed as C222 in three distinct ways, so there is a 2 in 3 chance of the indexing program choosing the wrong one.
[Figure: hexagonal axes (black) and the three alternative C-centred orthorhombic lattices (coloured)]

The scores show that it is indeed orthorhombic C222, not hexagonal, and pick the correct indexing scheme.
Two-fold axes along h (1,0,0), l (0,0,1) and (-1 2 0) in the hexagonal lattice correspond to the three orthogonal axes.

  Nelmt    Z-cc     CC   Z-rms      N     rmsD  Rmerge        Symmetry & operator (in Lattice Cell)
>   1     10.22   0.70    2.89  21018   -10.23   0.225  ***   2-fold l ( 0 0 1) {-h,-k,+l}
    2     -0.52  -0.03    1.05  26383   -19.52   0.570        2-fold   ( 1-1 0) {-k,-h,-l}
    3      0.11   0.02    0.78  13746   -20.89   0.632        2-fold   ( 2-1 0) {+h,-h-k,-l}
>   4     11.37   0.78    3.66   5303    -6.32   0.164  ***   2-fold h ( 1 0 0) {+h+k,-k,-l}
    5     -0.83  -0.05    0.44  24881   -22.65   0.619        2-fold   ( 1 1 0) {+k,+h,-l}
    6      0.22   0.02    0.54  20745   -22.10   0.646        2-fold k ( 0 1 0) {-h,+h+k,-l}
>   7     11.60   0.79    4.02  16554    -4.50   0.151  ***   2-fold   (-1 2 0) {-h-k,+k,-l}
    8     -0.03   0.01    0.18  23384   -23.96   0.626        3-fold l ( 0 0 1) {-h-k,+h,+l}
    9      0.70   0.06    0.30  35908   -23.33   0.644        6-fold l ( 0 0 1) {-k,+h+k,+l}

      Laue Group        NetZcc   Zcc+   Zcc-     CC  Rmerge  ReindexOperator
   1  C m m m      ***   10.94  10.97   0.03   0.75    0.19  [3/2*h+1/2*k,1/2*h-1/2*k,-l]
   6  C m m m             2.48   4.68   2.21   0.33    0.47  [1/2*h+1/2*k,3/2*h-1/2*k,-l]
   9  C m m m            -0.71   2.42   3.13   0.17    0.48  [h,k,l]
   5  P 6/m m m           2.86   2.86   0.00   0.20    0.51  [1/2*h+1/2*k,1/2*h-1/2*k,-l]

Pseudo-symmetry: monoclinic pseudo-tetragonal
The program will detect symmetry, crystallographic or approximately crystallographic
Cell: 120.57 54.97 120.57 90° 92.61° 90°
Tested as tetragonal symmetry P422 (P4/mmm)
Strong 4-fold BUT the angle 92.6° is too far from 90°

  Nelmt    Z-cc     CC   Z-rms      N     rmsD  Rmerge        Symmetry & operator (in Lattice Cell)
    1      9.52   0.78    2.99  20815    -3.64   0.386  ***   2-fold l ( 0 0 1) {-h,-k,+l}
    2      5.47   0.45    2.50  21677    -4.73   0.593  **    2-fold h ( 1 0 0) {+h,-k,-l}
    3      8.47   0.70    3.00  26150    -3.61   0.513  ***   2-fold   ( 1 1 0) {+k,+h,-l}
    4      5.82   0.48    2.38  24834    -4.98   0.579  **    2-fold k ( 0 1 0) {-h,+k,-l}
    5      8.77   0.72    3.03  20915    -3.56   0.474  ***   2-fold   ( 1-1 0) {-k,-h,-l}
    6      7.62   0.63    2.39  46893    -4.96   0.578  ***   4-fold l ( 0 0 1) {-k,+h,+l}

      Laue Group        NetZcc   Zcc+   Zcc-     CC  NetZrms  Zrms+  Zrms-  Rmerge  ReindexOperator
   1  P 4/m m m    ***    7.67   7.67   0.00   0.63     2.60   2.60   0.00    0.53  [l,h,k]
   2  P 1 2/m 1           2.10   9.52   7.42   0.78     0.44   2.99   2.56    0.39  [h,k,l]

This shows the usefulness of separate scores for each symmetry element.

Future developments
• Score deviations from cell dimension constraints (e.g. are angles really 90°?) and combine this score with the symmetry score (how?)
• Examine intensity statistics to find centric zones, to provide an additional score, and to distinguish pointgroups in non-chiral spacegroups (not for macromolecules)
• Examine systematic absences to score potential screw axes (& glide planes)
• Compare data with a reference dataset to choose between alternative valid but non-equivalent indexing schemes
• …
• Add scaling & merging to replace Scala (eventually)

Current state
• Working & available
• Linux executable on MRC LMB ftp site
• Source code from me – depends on cctbx & clipper
• Not in 6.0 release

Internals of Pointless
• All in C++
• Symmetry handled using cctbx libraries (Ralf Grosse-Kunstleve et al.). These are wrapped in classes to hide the cctbx data-types
• Reflection data handling (hkl_unmerge class) is inspired by Clipper, but has a different internal organisation from the Clipper reflection lists
• Some Clipper classes used (e.g. 3x3 matrix & 3-vector classes)
• All (or almost all) memory allocation done with STL classes (mostly std::vector)

hkl_unmerge class
Organisation reflects unmerged MTZ file structure
hkl_unmerge_list
  datasets
  batches
  vector of reflections
  vector of observation_parts (the raw data)
  vector of pointers to parts (for sorting): (almost) the only pointers used
reflection class
  vector of observations
observation class
  vector of pointers, ** observation_parts
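The layout above can be illustrated with a much reduced sketch: raw parts are stored by value, a separate vector of pointers is sorted, and reflections and observations only refer to the parts through those pointers. All type and member names below are hypothetical, symmetry reduction is ignored, and the rule that parts on consecutive batches form one observation is an assumption made for the example. It is not the Pointless source.

```cpp
#include <algorithm>
#include <tuple>
#include <vector>

// One recorded part of an observation (a small subset of the real items).
struct ObservationPart {
  int h, k, l;
  int batch;
  double I, sigI;
};

// An observation refers to its parts by pointer; it does not own them.
struct Observation {
  std::vector<const ObservationPart*> parts;
};

struct Reflection {
  int h, k, l;
  std::vector<Observation> observations;
};

class UnmergedList {
 public:
  UnmergedList() = default;
  // Copying would leave pointers aimed at the original storage.
  UnmergedList(const UnmergedList&) = delete;
  UnmergedList& operator=(const UnmergedList&) = delete;

  void add_part(const ObservationPart& part) { parts_.push_back(part); }

  // Call after all parts are added: sorts pointers by (h, k, l, batch),
  // then groups them into reflections and observations.
  void organise() {
    sorted_.clear();
    reflections_.clear();
    for (const ObservationPart& p : parts_) sorted_.push_back(&p);
    std::sort(sorted_.begin(), sorted_.end(),
              [](const ObservationPart* a, const ObservationPart* b) {
                return std::tie(a->h, a->k, a->l, a->batch) <
                       std::tie(b->h, b->k, b->l, b->batch);
              });
    const ObservationPart* prev = nullptr;
    for (const ObservationPart* p : sorted_) {
      const bool same_hkl =
          prev && prev->h == p->h && prev->k == p->k && prev->l == p->l;
      if (!same_hkl) reflections_.push_back(Reflection{p->h, p->k, p->l, {}});
      std::vector<Observation>& obs = reflections_.back().observations;
      // Assumption: a gap in batch number starts a new observation.
      if (!same_hkl || p->batch != prev->batch + 1) obs.push_back(Observation{});
      obs.back().parts.push_back(p);
      prev = p;
    }
  }

  const std::vector<Reflection>& reflections() const { return reflections_; }

 private:
  std::vector<ObservationPart> parts_;          // the raw data
  std::vector<const ObservationPart*> sorted_;  // pointers, for sorting
  std::vector<Reflection> reflections_;
};
```

Sorting pointers rather than the parts themselves leaves the raw data in file order, and building the pointers only in `organise()` avoids the invalidation that would follow if the parts vector grew after pointers had been taken.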

Not a general class: the items stored for each observation_part are fixed. I don’t know how to make it more general without being too complicated.
observation_part(const Hkl& hkl_in,
                 const int& isym_in, const int& batch_in,
                 const Rtype& I_in, const Rtype& sigI_in,
                 const Rtype& Xdet_in, const Rtype& Ydet_in,
                 const Rtype& phi_in, const Rtype& time_in,
                 const Rtype& fraction_calc_in, const Rtype& width_in,
                 const Rtype& LP_in, const Rtype& BgPkRatio_in,
                 const int& Npart_in, const int& Ipart_in,
                 const int& RefFlag_in);