Published by Scot Myron Golden. Modified over 6 years ago.
1
Phil Evans
MRC Laboratory of Molecular Biology, Cambridge
• Developments in Scala: two approaches to correlation analysis
• New program: Pointless
2
Correlation analysis
The problem: how should we measure the correlation between two measurements, e.g. two observations of (I+ − I−) from random half datasets?
Scatter plots of ΔI1 v. ΔI2 give a clear indication of whether there is a correlation (i.e. a signal), but what is the best number to look at?
[Figure: two scatter plots of ΔI1 v. ΔI2, one labelled "No signal", the other "Signal"]
3
The traditional measure is the correlation coefficient, which is equivalent to fitting a straight line through the data points. But this is dominated by outliers, which have a large leverage, and it can be misleading or confusing when the correlation is poor.
We know that if there is a signal, the two measurements should be the same, so we should fit a straight line with slope = 1.
Along this diagonal line, the distribution is the "signal".
Perpendicular to the line is "error".
"Unitary Correlation Ratio": UCR = RMS("signal")/RMS("error") = RMS(I+ − I−)/RMS(I+ + I−)
[Figure: scatter plot labelled "No signal": CC −0.08, UCR 0.91]
[Figure: scatter plot labelled "Signal": CC +0.61, UCR 2.00]
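The geometric definition above can be sketched in C++ as follows. This is a minimal illustration, not Scala's code: it assumes the two half-dataset differences arrive as paired vectors, and it takes the "signal" as the projection of each point onto the slope-1 diagonal and the "error" as the projection perpendicular to it. The names `rms` and `unitary_correlation_ratio` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Root-mean-square of a list of values (0 for an empty list).
double rms(const std::vector<double>& v) {
    if (v.empty()) return 0.0;
    double sum_sq = 0.0;
    for (double x : v) sum_sq += x * x;
    return std::sqrt(sum_sq / static_cast<double>(v.size()));
}

// Hypothetical helper: ratio of the RMS spread along the slope-1 diagonal
// ("signal") to the RMS spread perpendicular to it ("error").
// d1[i] and d2[i] are the two measurements of the same difference.
double unitary_correlation_ratio(const std::vector<double>& d1,
                                 const std::vector<double>& d2) {
    assert(d1.size() == d2.size());
    const double inv_sqrt2 = 1.0 / std::sqrt(2.0);
    std::vector<double> along, across;
    along.reserve(d1.size());
    across.reserve(d1.size());
    for (std::size_t i = 0; i < d1.size(); ++i) {
        along.push_back((d1[i] + d2[i]) * inv_sqrt2);   // component along the diagonal
        across.push_back((d1[i] - d2[i]) * inv_sqrt2);  // component perpendicular to it
    }
    const double err = rms(across);
    // Assumption: perfect agreement (zero error) is reported as infinity.
    if (err == 0.0) return std::numeric_limits<double>::infinity();
    return rms(along) / err;
}
```

With this reading, uncorrelated measurements give a ratio near 1 (the scatter is as wide across the diagonal as along it), and the ratio grows as the points stretch out along the diagonal.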
4
Now output from Scala
Another way of checking whether there is an anomalous signal
5
Scoring for Laue group determination
Another approach to correlation scoring
We want a scoring system which is robust to limited data (e.g. from a few preliminary images 90° apart), when there may be very few reflections which sample a symmetry element. Accidentally good agreement between a few observations should not give a high score. We need to estimate the significance of the score.
One solution: score all pairs of observations related by a potential symmetry element, and compare these with pairs which cannot be related by any symmetry element (but are matched in resolution). This works even for small samples (and any score function).
Example: if we have 10 potentially related pairs of observations, we can take the (many) unrelated pairs in random groups of 10, generate the score for each group of 10, and calculate the mean score and its standard deviation. Then the Z-score = number of standard deviations from the mean:
Z = (Score(related) − Mean(Score(unrelated))) / σ(Score(unrelated))
This gives a Z-score which is self-adjusting for unknown errors and for small samples.
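The grouping procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the Pointless implementation: `Pair`, `ScoreFn`, `group_scores` and `z_score` are hypothetical names, the resolution matching of unrelated pairs is assumed to have been done already, leftover pairs that do not fill a whole group are dropped, and the sample standard deviation (n − 1 denominator) is used.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <random>
#include <utility>
#include <vector>

using Pair = std::pair<double, double>;  // two observations being compared
using ScoreFn = std::function<double(const std::vector<Pair>&)>;

// Z = (Score(related) - Mean(Score(unrelated))) / sigma(Score(unrelated)).
// Needs at least two unrelated group scores with non-zero spread.
double z_score(double related_score, const std::vector<double>& unrelated_scores) {
    assert(unrelated_scores.size() >= 2);
    const double n = static_cast<double>(unrelated_scores.size());
    double mean = 0.0;
    for (double s : unrelated_scores) mean += s;
    mean /= n;
    double var = 0.0;
    for (double s : unrelated_scores) var += (s - mean) * (s - mean);
    var /= (n - 1.0);  // sample variance
    return (related_score - mean) / std::sqrt(var);
}

// Shuffle the unrelated pairs, cut them into groups of group_size,
// and score each complete group with the supplied score function.
std::vector<double> group_scores(std::vector<Pair> unrelated, std::size_t group_size,
                                 const ScoreFn& score, unsigned seed) {
    assert(group_size > 0);
    std::mt19937 rng(seed);
    std::shuffle(unrelated.begin(), unrelated.end(), rng);
    std::vector<double> scores;
    for (std::size_t start = 0; start + group_size <= unrelated.size(); start += group_size) {
        const auto first = unrelated.begin() + static_cast<std::ptrdiff_t>(start);
        const auto last = first + static_cast<std::ptrdiff_t>(group_size);
        scores.push_back(score(std::vector<Pair>(first, last)));
    }
    return scores;
}
```

A caller would score the related pairs once with the same `ScoreFn`, pass the unrelated pairs and the related group size to `group_scores`, and feed both results to `z_score`. Because the group size matches the number of related pairs, the spread of the unrelated scores already reflects the small-sample noise.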
6
Some possible score functions
1. Correlation coefficient
Relatively insensitive to unknown scale. Use normalised intensities (E²) to avoid artificial correlation due to the change in <I> with resolution.
2. Difference functions
Sensitive to unknown scale, often less discriminating than the correlation coefficient (in tests so far):
− Σ (I1 − I2)² / [σ²(I1) + σ²(I2)]
Related to log(probability) if σ²(I) reflects only random errors.
We do not know the scales, as we can only determine the scales when we know the Laue group!
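The difference function in item 2 can be written out directly. A minimal sketch, assuming each pair carries both intensities and their standard deviations; `ObsPair` and `difference_score` are hypothetical names, and skipping pairs with a non-positive combined variance is an added safeguard, not something stated in the slides.

```cpp
#include <cassert>
#include <vector>

// One pair of observations to be compared (hypothetical layout).
struct ObsPair {
    double I1, sig1;  // intensity and sigma of the first observation
    double I2, sig2;  // intensity and sigma of the second observation
};

// Score = - sum (I1 - I2)^2 / [sigma^2(I1) + sigma^2(I2)].
// Perfect agreement gives 0; worse agreement gives a more negative score.
double difference_score(const std::vector<ObsPair>& pairs) {
    double sum = 0.0;
    for (const ObsPair& p : pairs) {
        const double var = p.sig1 * p.sig1 + p.sig2 * p.sig2;
        if (var <= 0.0) continue;  // assumption: ignore pairs without usable sigmas
        const double diff = p.I1 - p.I2;
        sum += diff * diff / var;
    }
    return -sum;
}
```

Because the intensities enter as a raw difference, any unknown scale factor between the two observations changes the score, which is the sensitivity noted above.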
7
Pointless: a program for determining Laue groups
Work in progress!
• From the unit cell dimensions, find the highest compatible lattice symmetry
• Score each symmetry element belonging to the lattice symmetry, using all pairs of observations related by that element
• Score combinations of symmetry elements for all possible sub-groups (Laue groups) of the lattice symmetry group
Net Z-score for each possible Laue group:
Net Z = Z(for) − Z(against) = Z+ − Z−
Z(for): score for all symmetry elements belonging to the subgroup
Z(against): score for all symmetry elements belonging to the lattice group but not the subgroup
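The net score can be illustrated with a small sketch. The slides do not say how the per-element scores are combined into Z(for) and Z(against), so this example simply averages the per-element Z-scores on each side; that averaging, and the names `ElementScore`, `NetZ` and `net_z_score`, are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Score for one symmetry element of the lattice group (hypothetical layout).
struct ElementScore {
    std::string name;   // e.g. "2-fold l"
    double z;           // Z-score for this element
    bool in_subgroup;   // true if the element belongs to the candidate Laue group
};

struct NetZ {
    double z_for;      // Z+ : elements in the subgroup
    double z_against;  // Z- : elements in the lattice group but not the subgroup
    double net;        // Z+ - Z-
};

// Assumption: Z+ and Z- are taken as plain means of the per-element Z-scores.
// An empty side contributes 0.
NetZ net_z_score(const std::vector<ElementScore>& elements) {
    double sum_for = 0.0, sum_against = 0.0;
    std::size_t n_for = 0, n_against = 0;
    for (const ElementScore& e : elements) {
        if (e.in_subgroup) { sum_for += e.z; ++n_for; }
        else               { sum_against += e.z; ++n_against; }
    }
    const double z_for = n_for ? sum_for / static_cast<double>(n_for) : 0.0;
    const double z_against = n_against ? sum_against / static_cast<double>(n_against) : 0.0;
    return NetZ{z_for, z_against, z_for - z_against};
}
```

A candidate that includes every strongly supported element and excludes every unsupported one gets a high Z+ and a low Z−. A subgroup that omits a genuinely present element is penalised because that element's high score moves to the "against" side.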
8
Examples: orthorhombic with a ≈ b
Tested in lattice pointgroup P422 (P4/mmm), but the 4-fold is absent
Scores for images 1-5 only (5°)

Columns: Nelmt, Z-cc, CC, Z-rms, N, rmsD, Rmerge, Symmetry & operator (in Lattice Cell)
> 2-fold l ( 0 0 1) {-h,-k,+l}
> 2-fold h ( 1 0 0) {+h,-k,-l}
  2-fold   ( 1 1 0) {+k,+h,-l}
> 2-fold k ( 0 1 0) {-h,+k,-l}
  2-fold   ( 1-1 0) {-k,-h,-l}
  4-fold l ( 0 0 1) {-k,+h,+l}

Correct Laue group Pmmm found despite limited data for the dyad around k (pointgroup P222)

Columns: Laue Group, NetZcc, Zcc+, Zcc-, CC, NetZrms, Zrms+, Zrms-, Rmerge, ReindexOperator
> 1 P m m m * [h,k,l]
  2 P 1 2/m 1 * [l,h,k]
  3 P 1 2/m [k,l,h]
  4 P 4/m [h,k,l]
  5 P 4/m m m [h,k,l]
  6 P [h,k,l]
  7 C 1 2/m [h+k,-h+k,l]
  8 P 1 2/m [h,k,l]
  9 C m m m [h+k,-h+k,l]
  10 C 1 2/m [h-k,h+k,l]
9
Discrimination between Laue groups is clearer with a full dataset: the monoclinic possibilities (P2/m) have a greater score against them (Zcc-) than the correct Pmmm orthorhombic solution

Columns: Nelmt, Z-cc, CC, Z-rms, N, rmsD, Rmerge, Symmetry & operator (in Lattice Cell)
*** 2-fold l ( 0 0 1) {-h,-k,+l}
*** 2-fold h ( 1 0 0) {+h,-k,-l}
    2-fold   ( 1 1 0) {+k,+h,-l}
*** 2-fold k ( 0 1 0) {-h,+k,-l}
    2-fold   ( 1-1 0) {-k,-h,-l}
*   4-fold l ( 0 0 1) {-k,+h,+l} {+k,-h,+l}

Columns: Laue Group, NetZcc, Zcc+, Zcc-, CC, NetZrms, Zrms+, Zrms-, Rmerge, ReindexOperator
1 P m m m *** [h,k,l]
2 P 1 2/m 1 ** [l,h,k]
3 P 1 2/m 1 ** [k,l,h]
4 P 1 2/m 1 ** [h,k,l]
5 P 4/m m m ** [h,k,l]
6 P 4/m [h,k,l]
7 C m m m [h+k,-h+k,l]
8 C 1 2/m [h-k,h+k,l]
9 C 1 2/m [h+k,-h+k,l]
10 P [h,k,l]

>>> P m m m [h,k,l]
Transformed cell: deviation
List of possible spacegroups: <P 2 2 2> <P > <P > <P > <P > <P > <P > <P >
10
A confusing case in C222
The unit cell has b ≈ √3 a, so it can also be indexed on a hexagonal lattice, lattice point group P622 (P6/mmm), with the reindex operator: h/2+k/2, h/2-k/2, -l
Conversely, a hexagonal lattice may be indexed as C222 in three distinct ways, so there is a 2 in 3 chance of the indexing program choosing the wrong one
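As a small worked example, the reindex operator quoted above can be applied to individual indices. This sketch assumes the input indices obey the C-centring condition (h + k even), which is what makes the halved sums whole numbers; `Hkl` and `reindex_c222_to_hexagonal` are hypothetical names, not Pointless or cctbx types.

```cpp
#include <cassert>

// Miller indices of one reflection (hypothetical minimal type).
struct Hkl {
    int h, k, l;
};

// Apply the reindex operator  h/2+k/2, h/2-k/2, -l.
// Assumes a C-centred input index, i.e. h + k even, so that both
// (h + k)/2 and (h - k)/2 are integers.
Hkl reindex_c222_to_hexagonal(const Hkl& in) {
    assert((in.h + in.k) % 2 == 0);
    return Hkl{(in.h + in.k) / 2, (in.h - in.k) / 2, -in.l};
}
```

For instance, the C-centred index (3, 1, 2) maps to (2, 1, -2), and (1, -1, 0) maps to (0, 1, 0).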
11
[Figure: hexagonal axes (black) and the three alternative C-centred orthorhombic lattices (coloured)]
12
The scores show that it is indeed orthorhombic C222, not hexagonal, and the program picks the correct indexing scheme
Two-fold axes along h (1,0,0), l (0,0,1) and (-1 2 0) in the hexagonal lattice correspond to three orthogonal axes.

Columns: Nelmt, Z-cc, CC, Z-rms, N, rmsD, Rmerge, Symmetry & operator (in Lattice Cell)
> *** 2-fold l ( 0 0 1) {-h,-k,+l}
      2-fold   ( 1-1 0) {-k,-h,-l}
      2-fold   ( 2-1 0) {+h,-h-k,-l}
> *** 2-fold h ( 1 0 0) {+h+k,-k,-l}
      2-fold   ( 1 1 0) {+k,+h,-l}
      2-fold k ( 0 1 0) {-h,+h+k,-l}
> *** 2-fold   (-1 2 0) {-h-k,+k,-l}
      3-fold l ( 0 0 1) {-h-k,+h,+l}
      6-fold l ( 0 0 1) {-k,+h+k,+l}

Columns: Laue Group, NetZcc, Zcc+, Zcc-, CC, Rmerge, ReindexOperator
1 C m m m *** [3/2*h+1/2*k,1/2*h-1/2*k,-l]
6 C m m m [1/2*h+1/2*k,3/2*h-1/2*k,-l]
9 C m m m [h,k,l]
5 P 6/m m m [1/2*h+1/2*k,1/2*h-1/2*k,-l]
13
Pseudo-symmetry: monoclinic pseudo-tetragonal
The program will detect symmetry, whether crystallographic or only approximately crystallographic
Cell ° ° 90°
Tested as tetragonal symmetry P422 (P4/mmm)
Strong 4-fold BUT the angle 92.6° is too far from 90°

Columns: Nelmt, Z-cc, CC, Z-rms, N, rmsD, Rmerge, Symmetry & operator (in Lattice Cell)
*** 2-fold l ( 0 0 1) {-h,-k,+l}
**  2-fold h ( 1 0 0) {+h,-k,-l}
*** 2-fold   ( 1 1 0) {+k,+h,-l}
**  2-fold k ( 0 1 0) {-h,+k,-l}
*** 2-fold   ( 1-1 0) {-k,-h,-l}
*** 4-fold l ( 0 0 1) {-k,+h,+l}

Columns: Laue Group, NetZcc, Zcc+, Zcc-, CC, NetZrms, Zrms+, Zrms-, Rmerge, ReindexOperator
1 P 4/m m m *** [l,h,k]
2 P 1 2/m [h,k,l]

This shows the usefulness of separate scores for each symmetry element
14
Future developments
• Score deviations from cell dimension constraints (e.g. are angles really 90°?) and combine this score with the symmetry score (how?)
• Examine intensity statistics to find centric zones, to provide an additional score, and to distinguish pointgroups in non-chiral spacegroups (not for macromolecules)
• Examine systematic absences to score potential screw axes (& glide planes)
• Compare data with a reference dataset to choose between alternative valid but non-equivalent indexing schemes
• …
• Add scaling & merging to replace Scala (eventually)
15
Current state
• Working & available
• Linux executable on MRCLMB ftp site
• Source code from me (depends on cctbx & Clipper)
• Not in 6.0 release
16
Internals of Pointless
• All in C++
• Symmetry handled using cctbx libraries (Ralf Grosse-Kunstleve et al.). These are wrapped in classes to hide the cctbx data-types
• Reflection data handling (hkl_unmerge class) is inspired by Clipper, but has a different internal organisation from the Clipper reflection lists
• Some Clipper classes used (e.g. 3x3 matrix & 3-vector classes)
• All (or almost all) memory allocation done with STL classes (mostly std::vector)
17
hkl_unmerge class
Organisation reflects unmerged MTZ file structure
hkl_unmerge_list
    datasets
    batches
    vector of reflections
    vector of observation_parts (the raw data)
    vector of pointers to parts (for sorting): (almost) the only pointers used
reflection class
    vector of observations
observation class
    vector of pointers, ** observation_parts
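The layered organisation above can be sketched in C++. This is a deliberately simplified, hypothetical layout (the `...Sketch` names are not the Pointless classes): each part keeps only a few fields, datasets and batches are omitted, and each part is treated as its own observation rather than summing partials.

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <vector>

// Raw measurement of one (partial) observation: a cut-down stand-in for observation_part.
struct PartSketch {
    int h, k, l;
    int batch;
    double I, sigI;
};

// An observation refers to its parts by pointer, it does not copy them.
struct ObservationSketch {
    std::vector<const PartSketch*> parts;
};

struct ReflectionSketch {
    std::vector<ObservationSketch> observations;
};

struct UnmergeListSketch {
    std::vector<PartSketch> parts;           // the raw data, owned here
    std::vector<const PartSketch*> sorted;   // pointers to parts, used for sorting
    std::vector<ReflectionSketch> reflections;

    // Sort pointers by hkl then batch, then group equal hkl into one reflection.
    // The pointers stay valid only while `parts` is not resized afterwards.
    void build_reflections() {
        sorted.clear();
        for (const PartSketch& p : parts) sorted.push_back(&p);
        std::sort(sorted.begin(), sorted.end(),
                  [](const PartSketch* a, const PartSketch* b) {
                      return std::tie(a->h, a->k, a->l, a->batch) <
                             std::tie(b->h, b->k, b->l, b->batch);
                  });
        reflections.clear();
        const PartSketch* prev = nullptr;
        for (const PartSketch* p : sorted) {
            const bool same_hkl = prev != nullptr &&
                std::tie(p->h, p->k, p->l) == std::tie(prev->h, prev->k, prev->l);
            if (!same_hkl) reflections.emplace_back();
            reflections.back().observations.push_back(ObservationSketch{{p}});
            prev = p;
        }
    }
};
```

Sorting a vector of pointers rather than the parts themselves leaves the raw data in file order and keeps the sort cheap, which is one plausible reason for the pointer vector in the outline.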
18
Not a general class: the items stored for each observation_part are fixed. I don't know how to make it more general without being too complicated.
observation_part(const Hkl& hkl_in,
                 const int& isym_in, const int& batch_in,
                 const Rtype& I_in, const Rtype& sigI_in,
                 const Rtype& Xdet_in, const Rtype& Ydet_in,
                 const Rtype& phi_in, const Rtype& time_in,
                 const Rtype& fraction_calc_in, const Rtype& width_in,
                 const Rtype& LP_in, const Rtype& BgPkRatio_in,
                 const int& Npart_in, const int& Ipart_in,
                 const int& RefFlag_in);