Direct Methods and Many Site Se-Met MAD Problems using BnP
W. Furey

Classical Direct Methods
• Main method for “small molecule” structure determination
• Highly automated (almost totally “black box”)
• Solves structures containing up to a few hundred non-hydrogen atoms in the asymmetric unit

Direct Methods Assumptions and Requirements
• Non-negativity of electron density
• Atoms are “resolved”, i.e. “atomic resolution” data are available
• Unit cell, symmetry and contents are known

Important Concepts - 1
• Normalized structure factors $E_H$ are given by $E_H = F_H / \langle |F_H|^2 \rangle^{1/2}$, with the averaging done in resolution shells
• The phase $\phi_H$ of $E_H$ is the same as that of $F_H$
• $\langle |E_H|^2 \rangle = 1$, hence “normalized”
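
The shell normalization can be sketched in Python. This is a minimal sketch, assuming equal-width shells in $(\sin\theta/\lambda)^2$ and ignoring the ε multiplicity factors for special reflection classes; the function name and input layout are hypothetical.

```python
import math
from collections import defaultdict


def normalize_in_shells(refl, n_shells=10):
    """Return {hkl: |E|} with |E_H| = |F_H| / <|F|^2>^(1/2) averaged per shell.

    refl maps (h, k, l) -> (|F|, s2) with s2 = (sin(theta)/lambda)^2.
    Shells are equal-width bins in s2 (an assumption of this sketch).
    """
    s2_max = max(s2 for _, s2 in refl.values())

    def shell(s2):
        # Clamp so the highest-resolution reflection falls in the last shell.
        return min(int(n_shells * s2 / s2_max), n_shells - 1)

    sum_f2, count = defaultdict(float), defaultdict(int)
    for f, s2 in refl.values():
        sum_f2[shell(s2)] += f * f
        count[shell(s2)] += 1

    return {
        hkl: f / math.sqrt(sum_f2[shell(s2)] / count[shell(s2)])
        for hkl, (f, s2) in refl.items()
    }
```

After this step the mean of $|E|^2$ is 1 within every shell, which is what removes the fall-off of $|F|$ with resolution.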

Important Concepts - 2
• Structure invariant: a structural quantity whose value is independent of the choice of unit cell origin
• Probabilistic estimates can be made for the values of structure invariants, given the associated |E| magnitudes and the cell contents

Linear combinations of phases whose Miller indices sum to zero are structure invariants.
• Example: $\Phi_{HK} = \phi_H + \phi_K + \phi_{-H-K}$ is a structure invariant, e.g. $\Phi_{HK} = \phi_{1,2,1} + \phi_{2,-1,3} + \phi_{-3,-1,-4}$
• $\Phi_{HK}$ is referred to as a triple, triplet, three-phase invariant, invariant, tpr, sigma2 relationship, etc.
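
Why the triplet is origin-independent can be checked numerically: shifting the origin by $t$ changes each $\phi_H$ by $2\pi H \cdot t$, and these shifts cancel when the indices sum to zero. A minimal sketch, assuming equal point atoms with unit scattering factors; the helper names are hypothetical.

```python
import cmath
import math
import random


def structure_factor(hkl, atoms):
    """F_H = sum_j exp(2*pi*i*H.x_j) for equal point atoms (f_j = 1 assumed)."""
    return sum(
        cmath.exp(2j * math.pi * sum(h * x for h, x in zip(hkl, xyz)))
        for xyz in atoms
    )


def wrap(angle):
    """Map an angle to (-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def triplet_phase(H, K, atoms):
    """Phi_HK = phi_H + phi_K + phi_(-H-K); the three indices sum to zero."""
    L = tuple(-h - k for h, k in zip(H, K))
    return wrap(sum(cmath.phase(structure_factor(m, atoms)) for m in (H, K, L)))


random.seed(0)
atoms = [tuple(random.random() for _ in range(3)) for _ in range(5)]
shift = (0.13, 0.42, 0.77)                      # arbitrary origin shift
moved = [tuple(x - t for x, t in zip(xyz, shift)) for xyz in atoms]
H, K = (1, 2, 1), (2, -1, 3)                    # the slide's example; -H-K = (-3, -1, -4)

# The individual phase changes with the origin; the triplet sum does not (mod 2*pi).
print(wrap(cmath.phase(structure_factor(H, atoms)) - cmath.phase(structure_factor(H, moved))))
print(wrap(triplet_phase(H, K, atoms) - triplet_phase(H, K, moved)))
```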

Fundamental formulas involving individual triplets
$$P(\Phi_{HK}) = [2\pi I_0(A_{HK})]^{-1} \exp(A_{HK}\cos\Phi_{HK})$$
where $P(\Phi_{HK})$ is the probability of the structure invariant having the value $\Phi_{HK}$, $I_0$ is the modified Bessel function of order zero, and
$$A_{HK} = 2|E_H E_K E_{-H-K}| / N^{1/2}$$
where N is the number of atoms in the cell and the E's are normalized structure factors.

Note probability P(  HK ) increases as A HK increases, and that A HK is proportional to product of E’s and inversely proportional to N 1/2 Expected value of cos  HK is given by = I 1 (A HK ) / I 0 (A HK )

[Figure: Cochran distribution $P(\Phi_3)$ vs $\Phi_3$ for various K, where $\Phi_3 = \Phi_{HK}$ and $K = A_{HK}$]

Most probable value of  HK is always zero, so  HK =  H +  K +  -H-K becomes 0  =  H +  K +  -H-K and  H =  -  K -  -H-K i.e.  1,2,1 =  -  2,-1,3 -  -3,-1,-4 There are many more triplets than structure factors, so the phases are highly over determined (Lysozyme at 3.0Å, has 2186 reflections and 3,636,804 triplets, i.e. 1663:1)

Fundamental formula involving multiple triplets: the tangent formula
$$\tan\phi_H = -\frac{\sum_K |E_K E_{-H-K}| \sin(\phi_K + \phi_{-H-K})}{\sum_K |E_K E_{-H-K}| \cos(\phi_K + \phi_{-H-K})}$$
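
The tangent formula is a weighted vector sum of the single-triplet estimates. A minimal sketch, weighted only by the $|E_K E_{-H-K}|$ products that appear in the formula; the function name and argument layout are assumptions.

```python
import math


def tangent_phase(pairs, E, phi):
    """Tangent-formula estimate of phi_H.

    pairs: (K, L) index pairs with L = -H-K (e.g. from a triplet search)
    E:     dict hkl -> |E|
    phi:   dict hkl -> current phase in radians
    """
    s = sum(E[K] * E[L] * math.sin(phi[K] + phi[L]) for K, L in pairs)
    c = sum(E[K] * E[L] * math.cos(phi[K] + phi[L]) for K, L in pairs)
    # atan2 resolves the quadrant that a plain arctan(-s/c) would lose.
    return math.atan2(-s, c)
```

Using `atan2(-s, c)` rather than `atan(-s / c)` keeps the full $(-\pi, \pi]$ range and avoids division by zero when the cosine sum vanishes.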

Fundamental formula involving multiple triplets: the minimum function
$$R(\Phi) = \frac{\sum_{H,K} A_{HK}\left[\cos\Phi_{HK} - I_1(A_{HK})/I_0(A_{HK})\right]^2}{\sum_{H,K} A_{HK}}$$
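
The minimum function measures how far each triplet cosine is from its expected value $I_1/I_0$, weighted by $A_{HK}$. A direct transcription, assuming triplets are supplied as index tuples with precomputed $A_{HK}$; names are illustrative.

```python
import math


def bessel_i(n, x, tol=1e-15):
    """Modified Bessel function I_n(x), integer n >= 0, x >= 0, via its power series."""
    term = (x / 2) ** n / math.factorial(n)
    total, k = term, 0
    while term > tol * total:
        k += 1
        term *= (x / 2) ** 2 / (k * (k + n))
        total += term
    return total


def minimum_function(triplets, phi):
    """R(Phi) = sum A [cos Phi - I1(A)/I0(A)]^2 / sum A.

    triplets: list of (H, K, L, A) with L = -H-K and A = A_HK
    phi:      dict hkl -> trial phase in radians
    """
    num = den = 0.0
    for H, K, L, a in triplets:
        target = bessel_i(1, a) / bessel_i(0, a)   # expected cos(Phi_HK)
        num += a * (math.cos(phi[H] + phi[K] + phi[L]) - target) ** 2
        den += a
    return num / den
```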

Classical Direct Methods Applications for Proteins
• Used for phase extension to very high resolution
• Used with moderate success to locate heavy atom sites in isomorphous derivatives
• E values used in molecular replacement calculations

Current Direct Methods Applications for Proteins
• Shake n Bake (based on the minimum function) has been used to solve complete protein structures with over 1,000 atoms (rubredoxin, lysozyme, calmodulin, etc.), provided data to 1.1 Å or better are available
• Used to locate anomalous scatterer sites from MAD or SAS data

General Shake n Bake Concept
Use a multi-solution method starting with random phases (or randomly positioned atoms). For each trial phase set, use a “dual-space” procedure that iterates between reciprocal-space optimization and real-space constraints.

• Reciprocal-space optimization: shift phases to reduce the minimum function $R(\Phi)$
• Real-space optimization and constraints: compute new phases only from the largest peaks in the map calculated with the previous cycle's phases
• Each trial phase set is ranked by its value of $R(\Phi)$

SnB inner loop for each trial structure [flowchart]:
1. Generate random trial structure
2. Compute phases from structure
3. Shift phases to reduce $R(\Phi)$
4. Compute map from new phases
5. Select “structure” from largest peaks, then return to step 2
Stop after N iterations.

Application to the pyruvate dehydrogenase multi-enzyme complex E1 component
• MW 100 kDa (monomer)
• a = 81.69, b = 141.6, c = 82.46 Å, β = 102.4°
• Space group P2₁
• Asymmetric unit = dimer, 1774 residues, 42 methionines
• MAD data (3 λ) on the selenomethionine analog to 2.3 Å; 3.5 Å data used for Se determination

Choice of data for Se determination
• Use the $\big||F_H|^+ - |F_H|^-\big|$ (anomalous) difference at a single λ
• Use the $\big||F_H|_{\lambda_i} - |F_H|_{\lambda_j}\big|$ (dispersive) difference between two λ's
• Use $F_A$ values (derived from data at all λ's)
• Use $F_{HLE}$ values based on the maximum anomalous and maximum dispersive differences
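
The first two data choices, and picking the sets with the largest signal, can be sketched as follows. It assumes Bijvoet pairs stored per wavelength as `(|F|+, |F|-)` and uses the mean difference as a simple signal measure; the wavelength labels and numbers are toy values, and $F_A$ / $F_{HLE}$ are not covered.

```python
from itertools import combinations
from statistics import mean


def anomalous_diffs(data, wl):
    """||F_H|+ - |F_H|-| at a single wavelength."""
    return {hkl: abs(fp - fm) for hkl, (fp, fm) in data[wl].items()}


def _avg(pair):
    """Bijvoet-averaged amplitude (|F|+ + |F|-) / 2."""
    return (pair[0] + pair[1]) / 2


def dispersive_diffs(data, wl_i, wl_j):
    """||F_H|_i - |F_H|_j| between two wavelengths, from Bijvoet-averaged data."""
    common = data[wl_i].keys() & data[wl_j].keys()
    return {hkl: abs(_avg(data[wl_i][hkl]) - _avg(data[wl_j][hkl])) for hkl in common}


def strongest_sets(data):
    """Wavelength with the largest mean anomalous difference and wavelength pair
    with the largest mean dispersive difference."""
    best_ano = max(data, key=lambda wl: mean(anomalous_diffs(data, wl).values()))
    best_disp = max(combinations(data, 2),
                    key=lambda pair: mean(dispersive_diffs(data, *pair).values()))
    return best_ano, best_disp


# data[wavelength][hkl] = (|F|+, |F|-); toy numbers, not real measurements
data = {
    "inflection": {(1, 0, 0): (100.0, 98.0), (0, 1, 0): (50.0, 49.0)},
    "peak":       {(1, 0, 0): (103.0, 97.0), (0, 1, 0): (51.0, 48.0)},
    "remote":     {(1, 0, 0): (104.0, 101.0), (0, 1, 0): (53.0, 51.0)},
}
print(strongest_sets(data))   # ('peak', ('inflection', 'remote'))
```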

SelMet-Met Scattering Power  fo  f’  f” Se  Se-SCuK  = inflection point, 2= peak, 3= high energy remote

[Figure: projection of peaks down the noncrystallographic (NC) twofold axis]

Computing Phases
• Phases are computed by multiplying the individual SIR and/or SAS probability distributions, using the A, B, C, D representation based on intensities.
• “Standard” E values (the lack-of-closure error estimates) are updated by averaging the lack of closure over all reflections, with each reflection's contribution itself a probability-weighted average over all possible protein phases.
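
In the A, B, C, D (Hendrickson–Lattman) form, each phase distribution is $P(\phi) \propto \exp(A\cos\phi + B\sin\phi + C\cos 2\phi + D\sin 2\phi)$, so multiplying distributions amounts to adding coefficients. A minimal sketch with a numerically integrated centroid phase and figure of merit; the coefficient values are illustrative, and very large coefficients would need a log-sum-exp guard against overflow.

```python
import cmath
import math


def combine_hl(*coeff_sets):
    """Multiply distributions exp(A cos + B sin + C cos2 + D sin2) by adding coefficients."""
    return tuple(sum(c[i] for c in coeff_sets) for i in range(4))


def centroid_and_fom(hl, steps=360):
    """Best (centroid) phase and figure of merit m = |<exp(i phi)>| for one reflection."""
    a, b, c, d = hl
    num, norm = 0j, 0.0
    for i in range(steps):
        phi = 2 * math.pi * i / steps
        p = math.exp(a * math.cos(phi) + b * math.sin(phi)
                     + c * math.cos(2 * phi) + d * math.sin(2 * phi))
        num += p * cmath.exp(1j * phi)
        norm += p
    m = num / norm
    return cmath.phase(m), abs(m)


sir = (0.0, 0.0, 1.5, 0.0)   # bimodal: peaks at phi = 0 and pi (illustrative)
sas = (1.0, 0.0, 0.0, 0.0)   # unimodal, favours phi = 0 (illustrative)
print(centroid_and_fom(sir))                 # FOM ~ 0: the two peaks cancel
print(centroid_and_fom(combine_hl(sir, sas)))
```

The bimodal SIR distribution alone gives a figure of merit near zero; adding the SAS term breaks the ambiguity, which is why the two kinds of information are combined.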

MAD Phasing
For data collected at λ1, λ2, etc., choose one wavelength λn as the “native” data, and “reduce” that data set by averaging Bijvoet pairs. For each other (“derivative”) wavelength λd, reduce the data twice: by averaging Bijvoet pairs to form an “isomorphous” data set, and without averaging to form an “anomalous” data set.

MAD Phasing
• For the “isomorphous” and “derivative anomalous” data sets, scale the “derivative” to the “native” data and use scattering factors $f_0 = 0$, $f' = f'(\lambda_d) - f'(\lambda_n)$, $f'' = f''(\lambda_d)$
• For the “native anomalous” data, use the original native Bijvoet pairs and scattering factors $f_0 = 0$, $f' = 0$, $f'' = f''(\lambda_n)$
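
The bookkeeping for treating MAD data as pseudo-MIR data can be sketched as below. The `PhasingSet` record and the f′/f″ values are illustrative assumptions (round numbers, not measured Se values); the scattering-factor rules follow the slides, with λ3 as native.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhasingSet:
    kind: str          # "isomorphous", "derivative anomalous" or "native anomalous"
    wavelength: str
    f0: float
    f_prime: float
    f_double_prime: float


def mad_phasing_sets(f_prime, f_double_prime, native):
    """Pseudo-derivative scattering factors for MAD treated as MIR-like data.

    f_prime, f_double_prime: dicts wavelength -> f', f'' of the anomalous scatterer.
    native: the wavelength chosen as "native" (lambda_n).
    """
    sets = [PhasingSet("native anomalous", native, 0.0, 0.0, f_double_prime[native])]
    for wl in f_prime:
        if wl == native:
            continue
        delta_fp = f_prime[wl] - f_prime[native]          # f'(lambda_d) - f'(lambda_n)
        for kind in ("isomorphous", "derivative anomalous"):
            sets.append(PhasingSet(kind, wl, 0.0, delta_fp, f_double_prime[wl]))
    return sets


# Round illustrative values for Se, not measured ones; lambda3 (remote) as native.
fp = {"inflection": -10.0, "peak": -8.0, "remote": -2.0}
fpp = {"inflection": 4.0, "peak": 6.0, "remote": 3.5}
for s in mad_phasing_sets(fp, fpp, native="remote"):
    print(s)
```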

Phase Refinement
Minimizing
$$\sum_h W_h \sum_{\alpha_P} P(\alpha_P)\left(|F_{PH}^{obs}|_h - |F_{PH}^{calc}(\alpha_P)|_h\right)^2$$
where
$$|F_{PH}^{calc}(\alpha_P)|_h^2 = |F_P^{obs}|_h^2 + |F_H^{calc}|_h^2 + 2|F_P^{obs}|_h|F_H^{calc}|_h\cos(\alpha_P - \alpha_H)_h$$

Phase Refinement Options
• “Classical”: $\alpha_P$ = centroid phase, $W_h = 1/E^2$, $1/\sigma$ or unity, $P(\alpha_P) = 1$; use reflections with FOM above a cutoff
• “Maximum Likelihood”: $\alpha_P$ stepped over allowed phases, $P(\alpha_P)$ = corresponding probability, $W_h = 1/E^2$, $1/\sigma$ or unity; use reflections with FOM > 0.2
• $\alpha_P$ and $P(\alpha_P)$ can also come from an external source, e.g. solvent-flattened or NC-symmetry-averaged maps
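
The refinement target and its two modes can be written out directly. In this sketch the `Reflection` record is hypothetical, the FOM cutoff defaults to the 0.2 quoted for the maximum-likelihood option (the classical cutoff is not given on the slide), and only the target value is computed, not the parameter updates.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Reflection:
    fph_obs: float                 # |F_PH|obs
    fp_obs: float                  # |F_P|obs
    fh_calc: float                 # |F_H|calc from the current heavy-atom model
    alpha_h: float                 # heavy-atom phase (radians)
    centroid: float                # centroid protein phase (radians)
    fom: float
    phase_probs: list = field(default_factory=list)   # [(alpha_P, P(alpha_P)), ...]
    weight: float = 1.0            # W_h: 1/E^2, 1/sigma or 1


def fph_calc(fp, fh, alpha_p, alpha_h):
    """|F_PH|calc from the cosine rule; clamped at 0 against rounding."""
    return math.sqrt(max(0.0, fp**2 + fh**2 + 2 * fp * fh * math.cos(alpha_p - alpha_h)))


def refinement_target(refls, mode="ml", fom_cut=0.2):
    """sum_h W_h sum_alpha P(alpha) (|F_PH|obs - |F_PH|calc(alpha))^2."""
    total = 0.0
    for r in refls:
        if r.fom <= fom_cut:
            continue
        terms = [(r.centroid, 1.0)] if mode == "classical" else r.phase_probs
        for alpha_p, p in terms:
            diff = r.fph_obs - fph_calc(r.fp_obs, r.fh_calc, alpha_p, r.alpha_h)
            total += r.weight * p * diff ** 2
    return total


r = Reflection(fph_obs=7.0, fp_obs=3.0, fh_calc=4.0, alpha_h=0.0, centroid=0.0,
               fom=0.5, phase_probs=[(0.0, 0.5), (math.pi, 0.5)])
print(refinement_target([r], mode="classical"))   # 0.0
print(refinement_target([r], mode="ml"))          # 18.0
```

The example shows the practical difference: the classical target only sees the centroid phase, while the maximum-likelihood target also penalizes the model for the alternative phase that still carries probability.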

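[Flowchart: MAD phasing/averaging procedure]
• Input: MAD λ1, λ2, λ3 data (Scalepack files)
• Intermediate data: “iso” and “ano” scaled files; “extension” file with all “native” (λ3) data
• Programs: CMBISO, CMBANO, PHASIT, MISSNG, FSFOUR, BNDRY, MAPINV, EXTRMP, MAPAVG, BLDCEL
• Intermediate files: “phase” file, “submap” file, “averaging” mask file
• Output: final map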

MAD Phasing/Averaging Statistics

Peak anomalous (λ2) difference Patterson

SelMet-Met Scattering Power  fo  f’  f” Se  Se-SCuK  = inflection point, 2= peak, 3= high energy remote

With SnB it is possible to automatically locate the anomalous-scatterer substructure using data from any one of the dispersive combinations or anomalous pair sets. As expected, the sets with the maximum dispersive or anomalous signal typically yield a higher success rate.

Automated Applications of BnP: Methodology
W. Furey¹, L. Pasupulati¹, S. Potter², H. Xu², R. Miller³ & C. Weeks²
¹ University of Pittsburgh School of Medicine and VA Medical Center
² Hauptman-Woodward Medical Research Institute
³ Center for Computational Research, SUNY at Buffalo

SnB Strengths
1. Powerful, state-of-the-art direct methods for automatically locating heavy atom sites
2. Friendly graphical user interface
SnB Weaknesses
1. Stops after finding sites, i.e. no protein phasing
2. No software interface
PHASES Strengths
1. Proven protein phasing (MAD, MIRAS, etc.), solvent flattening, NCS averaging, external program interfacing
2. Interactive graphics
PHASES Weaknesses
1. Doesn't automatically find heavy atom sites
2. Script-based, i.e. no GUI
Goal: provide user-friendly software for automatic determination of protein crystal structures

Adopted Strategy
• Combine the SnB program with the “PHASES” package, putting everything under GUI control
• Establish default parameters and procedures allowing all aspects of the structure determination to be fully automated
• Also provide a manual mode allowing experienced users more control, and to facilitate development
• Provide graphical feedback when possible
• Facilitate coupling with popular external software

Main Developments Required for Automated Structure Determination
• Automatic substructure solution detection
• Automatic substructure validation
• Automatic hand determination (including space group changes, when needed)

Automatic Substructure Solution Detection
• Original method: based on a histogram (manual, time-consuming, requires user interaction)
• Current method: based on $R_{min}$ and $R_{cryst}$ statistics (automatic, fast, no user interaction)
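
The slide does not give the decision rule, so the following is only a hypothetical illustration of how $R_{min}$ statistics can flag solutions automatically: a trial is called a likely solution when its final $R_{min}$ lies far below the bulk of the trials, measured in median-absolute-deviation units. The threshold `k` and the function name are assumptions, and $R_{cryst}$ is not used.

```python
from statistics import median


def likely_solutions(r_min, k=3.0):
    """Indices of trials whose final R_min lies far below the bulk of trials.

    Hypothetical rule: distance below the median, in units of the median absolute
    deviation (MAD), must exceed k.
    """
    med = median(r_min)
    mad = median(abs(r - med) for r in r_min)
    if mad == 0:
        return []
    return [i for i, r in enumerate(r_min) if (med - r) / mad > k]


print(likely_solutions([0.50, 0.51, 0.49, 0.52, 0.50, 0.48, 0.40]))   # [6]
```

A robust spread (median and MAD) is used so that a few successful trials do not inflate the estimate of the non-solution background.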

Automatic Substructure Validation
• Original method: left to the user to decide which peaks correspond to true sites (manual)
• Current method (auto mode): based on occupancy refinement against Bijvoet differences (automatic, fast, requires no coordinate refinement, hand-insensitive)
• Current method (manual mode): as in auto mode, but peaks from different solutions can also be compared (manual)

Automatic Substructure Validation

Automatic Hand Determination
• Original method: visual inspection of map projections (manual, requires user interaction)
• Current method (MAD, SIRAS or MIRAS): based on variance differences between protein and solvent regions (automatic, and fast since it requires no refinement; also requires no user interaction)
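
One way to turn “variance differences in protein and solvent regions” into an automatic decision is sketched below: for each hand, compare the density variance inside the protein mask with that in the solvent, and pick the hand with the larger contrast. The exact statistic BnP uses is not stated on the slide, so the contrast measure, function names and toy maps are assumptions.

```python
from statistics import pvariance


def variance_contrast(rho, protein_mask):
    """Variance of density in the protein region minus variance in the solvent region."""
    protein = [r for r, m in zip(rho, protein_mask) if m]
    solvent = [r for r, m in zip(rho, protein_mask) if not m]
    return pvariance(protein) - pvariance(solvent)


def pick_hand(map_hand1, map_hand2):
    """Each argument is (rho, protein_mask) for one hand; the larger contrast wins."""
    c1 = variance_contrast(*map_hand1)
    c2 = variance_contrast(*map_hand2)
    return (1 if c1 >= c2 else 2), c1, c2


mask = [True] * 4 + [False] * 4
good = ([5.0, 0.0, 5.0, 0.0, 1.0, 1.0, 1.0, 1.0], mask)   # flat solvent
bad = ([3.0, 2.0, 3.0, 2.0, 0.0, 4.0, 0.0, 4.0], mask)    # noisy solvent
print(pick_hand(good, bad))   # (1, 6.25, -3.75)
```

The idea is that the correct hand gives a map with strong features in the protein region and a flat solvent, while the wrong hand spreads density variation more evenly.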

Automatic Hand Determination
• Current method (SAS data only): comparative analysis of R, FOM and CC after solvent flattening/phase combination (automatic, fast, requires no refinement)
• Current method (SIR, MIR data only): both hands are tried; map examination is needed (requires user interaction)

No man (or program) is an island
• Importing data files: Scalepack files, D*Trek files, MTZ files$, free-format files
• Exporting control files: O, RESOLVE 2.08, Arp/wARP
• Exporting data files: free-format files, CNS files, MTZ files$, O files, CHAIN files, PDB files
• Job submission from GUI: RESOLVE$ 2.08, Arp/wARP$
$ RESOLVE, Arp/wARP and/or CCP4 must be obtained from their respective authors/distributors for these options to work.

Results for 1JC4
• a = 43.6, b = 78.6, c = 89.4 Å, β = 91.95°, P2₁
• 4 molecules (592 residues) in the asu
• 2.1 Å data, 3λ MAD data
• Substructure: found 24 of 24 Se
• Phasing: mean phasing power (PP) 2.95; mean FOM (value missing from the transcript)
• Time to map: ~41 min on a G4 (1.5 GHz) PowerBook; ~13 min on a G5 (2.7 GHz) desktop
• Auto traceability: RESOLVE 87% main chain, 68% side chain; Arp/wARP 82% main chain, 73% side chain

Steps Included in BnP “Auto Runs” (step: time*)
• Normalization and other data preparation: 38 sec
• SnB substructure phasing (6 trials): 8 sec
• Occupancy refinement: 23 sec
• Enantiomorph (hand) determination: 15 sec
• Rigorous substructure positional and thermal parameter refinement: 12 min
• Solvent flattening: 23 sec
* For a 24-site substructure (PDB code: 1JC4) using 3-wavelength MAD data (~35K Bijvoet pairs each) on an Apple Power Mac G5.

SeMet ASU Size & Data Resolution
[Table: PDB code, no. of sites, no. of residues, NCS, d (Å) for a series of SeMet structures; entries not recoverable from the transcript]

Phasing Flexibility (Manual Mode)

Conclusion
BnP is a user-friendly, efficient package for the automated determination of protein structures from X-ray diffraction data. BnP downloads for Linux, Apple G5 & G4 and SGIs are available (academic & non-profit institutions) at