Istituto di Cristallografia, CNR,

Slides:



Advertisements
Similar presentations
PLATON, New Options Ton Spek, National Single Crystal Structure Facility, Utrecht, The Netherlands. Delft, Sept. 18, 2006.
Advertisements

Phasing Goal is to calculate phases using isomorphous and anomalous differences from PCMBS and GdCl3 derivatives --MIRAS. How many phasing triangles will.
Erice 2011 Electron Crystallography – June Introduction to Sir2011 Giovanni Luca Cascarano CNR- Istituto di Cristallografia – Bari - Italy.
Chapter 16 Introduction to Nonparametric Statistics
The General Linear Model. The Simple Linear Model Linear Regression.
Resampling techniques Why resampling? Jacknife Cross-validation Bootstrap Examples of application of bootstrap.
Chapter 10 Simple Regression.
Hanging Drop Sitting Drop Microdialysis Crystallization Screening.
Chapter 6 The Normal Distribution and Other Continuous Distributions
Chapter 17 Analysis of Variance
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 6-1 Chapter 6 The Normal Distribution and Other Continuous Distributions.
Chapter 5 Continuous Random Variables and Probability Distributions
Chapter 4 Continuous Random Variables and Probability Distributions
 We cannot use a two-sample t-test for paired data because paired data come from samples that are not independently chosen. If we know the data are paired,
Curve Modeling Bézier Curves
Component Reliability Analysis
Part IV Significantly Different: Using Inferential Statistics
Crystal structure solution via PED data: the BEA algorithm Carmelo Giacovazzo Università di Bari Istituto di Cristallografia, IC-CNR
PROBABILITY AND STATISTICS FOR ENGINEERING Hossein Sameti Department of Computer Engineering Sharif University of Technology Two Functions of Two Random.
Chem Patterson Methods In 1935, Patterson showed that the unknown phase information in the equation for electron density:  (xyz) = 1/V ∑ h ∑ k.
ECE 8443 – Pattern Recognition LECTURE 07: MAXIMUM LIKELIHOOD AND BAYESIAN ESTIMATION Objectives: Class-Conditional Density The Multivariate Case General.
Multiple Regression Petter Mostad Review: Simple linear regression We define a model where are independent (normally distributed) with equal.
Multiple Random Variables Two Discrete Random Variables –Joint pmf –Marginal pmf Two Continuous Random Variables –Joint Distribution (PDF) –Joint Density.
1 Two Functions of Two Random Variables In the spirit of the previous lecture, let us look at an immediate generalization: Suppose X and Y are two random.
Phasing Today’s goal is to calculate phases (  p ) for proteinase K using PCMBS and EuCl 3 (MIRAS method). What experimental data do we need? 1) from.
Binomial Distribution Derivation of the Estimating Formula for u an d ESTIMATING u AND d.
ELEC 303 – Random Signals Lecture 18 – Classical Statistical Inference, Dr. Farinaz Koushanfar ECE Dept., Rice University Nov 4, 2010.
1. Diffraction intensity 2. Patterson map Lecture
Example: Bioassay experiment Problem statement –Observations: At each level of dose, 5 animals are tested, and number of death are observed.
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 11-1 Chapter 11 Chi-Square Tests and Nonparametric Tests Statistics for.
ECE 8443 – Pattern Recognition ECE 8527 – Introduction to Machine Learning and Pattern Recognition LECTURE 07: BAYESIAN ESTIMATION (Cont.) Objectives:
Methods in Chemistry III – Part 1 Modul M.Che.1101 WS 2010/11 – 8 Modern Methods of Inorganic Chemistry Mi 10:15-12:00, Hörsaal II George Sheldrick
Expectation. Let X denote a discrete random variable with probability function p(x) (probability density function f(x) if X is continuous) then the expected.
Basic Business Statistics
Non-Linear Models. Non-Linear Growth models many models cannot be transformed into a linear model The Mechanistic Growth Model Equation: or (ignoring.
Yr 7.  Pupils use mathematics as an integral part of classroom activities. They represent their work with objects or pictures and discuss it. They recognise.
Univariate Gaussian Case (Cont.)
Statistics for Managers Using Microsoft Excel, 5e © 2008 Pearson Prentice-Hall, Inc.Chap 6-1 Statistics for Managers Using Microsoft® Excel 5th Edition.
Today: compute the experimental electron density map of proteinase K Fourier synthesis  (xyz)=  |F hkl | cos2  (hx+ky+lz -  hkl ) hkl.
Conditional Expectation
Virtual University of Pakistan
And distribution of sample means
Univariate Gaussian Case (Cont.)
The simple linear regression model and parameter estimation
Chapter 4: Basic Estimation Techniques
Chapter 12 Chi-Square Tests and Nonparametric Tests
Chapter 4 Basic Estimation Techniques
Net Present Value and Other Investment Rules
Basic Estimation Techniques
Solving Crystal Structures
Chapter 3 Component Reliability Analysis of Structures.
Phasing Today’s goal is to calculate phases (ap) for proteinase K using MIRAS method (PCMBS and GdCl3). What experimental data do we need? 1) from native.
Introduction to Isomorphous Replacement and Anomalous Scattering Methods Measure native intensities Prepare isomorphous heavy atom derivatives Measure.
Basic Estimation Techniques
Chapter 9 Hypothesis Testing.
Chapter 11 Analysis of Variance
Solution of Equations by Iteration
Elementary Statistics
Chapter 25: Paired Samples and Blocks
Chapter 4, Regression Diagnostics Detection of Model Violation
LECTURE 09: BAYESIAN LEARNING
Parametric Methods Berlin Chen, 2005 References:
Lecture Slides Elementary Statistics Twelfth Edition
9. Two Functions of Two Random Variables
12. Principles of Parameter Estimation
Chapter 15 Analysis of Variance
The Normal Distribution
Introduction to Artificial Intelligence Lecture 22: Computer Vision II
Presentation transcript:

Istituto di Cristallografia, CNR, workshop Facets of Electron Crystallography Berlin 7-9 july 2010 Direct Methods Carmelo Giacovazzo Istituto di Cristallografia, CNR, Bari University, Italy carmelo.giacovazzo@ic.cnr.it

Let us answer the following questions: crystal structure  Let us answer the following questions: crystal structure  ?  crystal structure ? As a consequence :

A third question: structure Xo rj’ rj

A fourth basic question How can we derive the phases from the diffraction moduli ? This seems contradictory: indeed The phase values depend on the origin chosen by the user, the moduli are independent of the user . The moduli are structure invariants , the phases are not structure invariants. Evidently, from the moduli we can derive information only on those combinations of phases ( if they exist) which are structure invariants.

The simplest invariant : the triplet invariant Use the relation F’h = Fh exp ( -2ihX0) to check that the invariant Fh FkF-h-k does not depend on the origin. The sum (h +  k+ -h-k ) is called triplet phase invariant .

Structure invariants Any invariant satisfies the condition that the sum of the indices is zero: doublet invariant : Fh F-h = | Fh|2 triplet invariant : Fh Fk F-h-k quartet invariant :Fh Fk Fl F-h-k-l quintet invariant : Fh Fk Fl Fm F-h-k-l-m ……………….

The prior information we can use for deriving the phase estimates may be so summarised: 1) atomicity: the electron density is concentrated in atoms: 2) positivity of the electron density: ( r ) > 0  f > 0 3) uniform distribution of the atoms in the unit cell. r1 r2 a1 a2

The Wilson statistics Under the above conditions Wilson ( 1942,1949) derived the structure factor statistics. The main results where: (1) Eq.(1) is : a) resolution dependent (fj varies with θ ), b) temperature dependent: From eq.(1) the concept of normalized structure factor arises:

The Wilson Statistics |E|-distributions: and in both the cases. The statistics may be used to evaluate the average themel factor and the absolute scale factor.

A s2 Fh0 A y x

The Cochran formula h,k =h + k + -h-k = h + k - h+k P(hk) [2 I0]-1exp(G cos hk) where G = 2 | Eh Ek Eh+k |/N1/2 Accordingly: h + k - h+k  0  G = 2 | Eh Ek Eh+k |/N1/2 h - k - h-k  0  G = 2 | Eh Ek Eh-k |/N1/2 h  k - h-k  G = 2 | Eh Ek Eh-k |/N1/2

The tangent formula A reflection can enter into several triplets The tangent formula A reflection can enter into several triplets.Accordingly h  k1 + h-k1 = 1 with P1(h)  G1 = 2| Eh Ek1 Eh-k1 |/N1/2 h  k2 + h-k2 = 2 with P2(h)  G2 = 2| Eh Ek2 Eh-k2 |/N1/2 ………………………………………………………………………………………………………. h  kn + h-kn = n with Pn(h)  Gn = 2| Eh Ekn Eh-kn |/N1/2 Then P(h)  j Pj(h)  L-1 j exp [Gj cos (h - j )] = L-1 exp [ cos (h - h )] where

A geometric interpretation of 

The random starting approach To apply the tangent formula we need to know one or more pairs ( k + h-k ). Where to find such an information? The most simple approach is the random starting approach. Random phases are associated to a chosen set of reflections. The tangent formula should drive these phases to the correct values. The procedure is cyclic ( up to convergence). How to recognize the correct solution? Figures of merit can or cannot be applied

Tangent cycles φ1 φ’1 φ’’1 ……………. φc1 φ2 φ’2 φ’’2 ……………. φc2 …………………………………………………………………….. φn φ’n φ’’n………………. φcn

Ab initio phasing SIR2009 is able to solve -small size structures (up to 80 atoms in the a.u.); -medium-size structures ( up to 200); -large size (no upper limit) It uses Patterson deconvolution techniques ( multiple implication transformations) as well as Direct methods to obtain a starting set of phases. They are extended and refined via electron density modification techniques

Direct methods limits for proteins: 1) the large size ( proteins range from300 atoms in the asymmetric unit to several thousands). The G factor in the Cochran formula are very small. 2) data resolution

To overcome the limits one is obliged to : -increase the number of direct methods trials . The cost to pay concerns the computing time. - improve and extend the the poor phases available by DM by exploiting some specific features of the proteins ( e.g., the solvent , etc. ).

About the data resolution limit Atomic resolution at length was considered a necessary ( and not sufficient ) condition for ab initio phasing ( Sheldrick rule) , condition relaxed later on ( up to 1.2 Å). If it is not satisfied: - the atomicity condition is violated; - the number of reliable triplet invariants exploitable by the tangent procedure is small. - Patterson and EDM procedures are less effective; - the small ratio number of observations/ number of parameters make least squares unreliable.