Determining protein structures
Andrew Torda, winter semester 2006 / 2007
X-ray: numerically most important
NMR: more detail
What is our goal? A set of x, y, z coordinates
Short detour to coordinate files …
Lecture plan: X-ray first, then NMR
Coordinate files and the PDB
PDB = Protein Data Bank
The only good repository of protein structures
Deposition is usually required for publication
Format inherited from old Fortran-based programs (fixed columns / punch cards)
http://www.rcsb.org/pdb/
Coordinate information
General headers and information:
HEADER PROTEINASE INHIBITOR (TRYPSIN) 18-FEB-95 1BPI 1BPI 2
COMPND BOVINE PANCREATIC TRYPSIN INHIBITOR (BPTI) (CRYSTAL FORM II) 1BPI 3
EXPDTA X-RAY DIFFRACTION 1BPI 5
…..
ATOM 1 N ARG 1 31.758 13.358 -13.673 1.00 18.79 1BPI 137
ATOM 2 CA ARG 1 31.718 13.292 -12.188 1.00 14.26 1BPI 138
ATOM 3 C ARG 1 33.154 13.224 -11.664 1.00 18.25 1BPI 139
ATOM 4 O ARG 1 33.996 12.441 -12.225 1.00 20.10 1BPI 140
ATOM 5 CB ARG 1 30.886 12.103 -11.724 1.00 16.74 1BPI 141
ATOM 6 CG ARG 1 29.594 11.968 -12.534 1.00 15.96 1BPI 142
ATOM 7 CD ARG 1 28.700 13.182 -12.299 1.00 15.45 1BPI 143
ATOM 8 NE ARG 1 27.267 12.895 -12.546 1.00 12.82 1BPI 144
ATOM 9 CZ ARG 1 26.661 13.087 -13.727 1.00 17.38 1BPI 145
ATOM 10 NH1 ARG 1 27.370 13.558 -14.735 1.00 18.38 1BPI 146
ATOM 11 NH2 ARG 1 25.367 12.797 -13.838 1.00 25.73 1BPI 147
ATOM 12 N PRO 2 33.800 13.936 -10.586 1.00 17.07 1BPI 148
ATOM 13 CA PRO 2 34.976 13.367 -9.840 1.00 14.99 1BPI 149
ATOM 14 C PRO 2 34.960 11.922 -9.660 1.00 13.11 1BPI 150
ATOM 15 O PRO 2 33.962 11.306 -9.391 1.00 10.57 1BPI 151
ATOM 16 CB PRO 2 34.922 14.145 -8.523 1.00 15.81 1BPI 152
ATOM 17 CG PRO 2 34.058 15.391 -8.737 1.00 18.91 1BPI 153
ATOM 18 CD PRO 2 33.371 15.273 -10.096 1.00 19.41 1BPI 154
ATOM 19 N ASP 3 36.192 11.317 -9.707 1.00 8.73 1BPI 155
…
Slide labels: x, y, z coordinates (the three numbers after the residue number); temperature factor (the last number before 1BPI; the 1.00 before it is the occupancy); the trailing 1BPI 137, 1BPI 138, … are card sequence numbers, in case you drop them on the floor
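Extraction has collapsed the spacing above, but in a real PDB file the ATOM records are fixed-column, a legacy of the punch cards. A minimal Python sketch, assuming the standard PDB column layout (x in columns 31-38, y 39-46, z 47-54, occupancy 55-60, temperature factor 61-66). The two example lines are the first two BPTI atoms re-typed with that spacing (chain ID left blank), and the `Atom` / `parse_atom` names are hypothetical. Slicing by column rather than splitting on whitespace matters because neighbouring fields can run together, for example with large negative coordinates.

```python
from dataclasses import dataclass

@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float  # temperature factor

def parse_atom(line: str) -> Atom:
    """Parse one ATOM record by fixed PDB columns (1-based columns in comments)."""
    return Atom(
        serial=int(line[6:11]),          # cols 7-11
        name=line[12:16].strip(),        # cols 13-16
        res_name=line[17:20].strip(),    # cols 18-20
        res_seq=int(line[22:26]),        # cols 23-26
        x=float(line[30:38]),            # cols 31-38
        y=float(line[38:46]),            # cols 39-46
        z=float(line[46:54]),            # cols 47-54
        occupancy=float(line[54:60]),    # cols 55-60
        b_factor=float(line[60:66]),     # cols 61-66
    )

# First two atoms of 1BPI, re-typed in the fixed-column layout
# (chain ID blank; columns 67-80 omitted).
lines = [
    "ATOM      1  N   ARG     1      31.758  13.358 -13.673  1.00 18.79",
    "ATOM      2  CA  ARG     1      31.718  13.292 -12.188  1.00 14.26",
]
atoms = [parse_atom(l) for l in lines]
for a in atoms:
    print(a.serial, a.name, a.res_name, a.res_seq, a.x, a.y, a.z, a.b_factor)
```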
X-ray sociology / geography
History:
1895: X-rays discovered by Wilhelm Conrad Röntgen
1913: the Braggs, first small-molecule crystal structures
1950s or early 1960s: first protein structures (myoglobin, Mb)
Biggest, meanest X-ray source? DESY (down the street)
Proteins and crystals
Proteins can form crystals, like table salt or sugar, just with much more difficulty
a, b, c define the unit cell
The axes may not be perpendicular
A unit cell may contain more than one molecule
[Figure: unit cell with axes a and b]
Proteins and X-rays
Visible light: wavelength 400 – 700 × 10⁻⁹ m (about 4000 bond lengths!)
X-rays have wavelengths near 1 Å (10⁻¹⁰ m)
There is no X-ray lens that can form an image of the atoms, but X-rays will diffract
Cute explanation:
X-ray frequency is about 2 × 10¹⁸ s⁻¹
Electrons move at about 2 × 10⁶ m s⁻¹
So, as far as the X-rays are concerned, the electrons are effectively standing still
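A back-of-the-envelope check of the "cute explanation", as a short Python sketch. The 1.5 Å wavelength is an assumption (close to the copper K-alpha line at about 1.54 Å used in many home sources); with exactly 1 Å the frequency would be nearer 3 × 10¹⁸ s⁻¹. The electron speed is the slide's order-of-magnitude value.

```python
C = 2.998e8            # speed of light, m/s
wavelength = 1.5e-10   # m; assumed value near 1 Å
v_electron = 2e6       # m/s, order of magnitude from the slide

frequency = C / wavelength                 # oscillations per second
period = 1 / frequency                     # seconds per oscillation
distance_per_period = v_electron * period  # metres an electron moves in one x-ray cycle

print(f"frequency ~ {frequency:.2e} s^-1")
print(f"electron moves ~ {distance_per_period / 1e-10:.3f} Å per cycle")
```

The electron moves only about a hundredth of an ångström during one x-ray oscillation, which is why it looks frozen.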
Proteins as a diffraction grating
Remember high-school diffraction: it depends on wavelength
X-rays bounce off (scatter from) electron clouds
So the data will eventually give information about the electron density, ρ(x, y, z)
Like light through a diffraction grating
Intuitively: shine light on a grating and try to work out the line separation from the pattern
Diffraction
[Figure: rays reflected from parallel planes a distance d apart, at angle θ to the planes (2θ between incident and diffracted beams); extra path A → B → C]
If the extra path length ABC is a full wavelength (or a whole number of wavelengths), the x-rays come out in phase and we see a spot
Bragg's law formalises this: $n\lambda = 2d\sin\theta$
We have lots of d's; the bigger the d, the closer spaced the diffraction spots
We know λ, but we get the information from all the d's at once. Can this be separated?
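A minimal Python sketch of Bragg's law, nλ = 2d sin θ, with a hypothetical default wavelength of 1.0 Å. It shows that large spacings diffract at small angles (spots close to the beam and close together), and that spacings smaller than λ/2 cannot diffract at all.

```python
import math

def bragg_theta(d, wavelength=1.0, n=1):
    """Bragg angle θ in degrees for plane spacing d (Å); None if no reflection is possible."""
    s = n * wavelength / (2 * d)
    if s > 1:
        return None          # d < nλ/2: no angle satisfies Bragg's law
    return math.degrees(math.asin(s))

def bragg_d(theta_deg, wavelength=1.0, n=1):
    """Inverse: plane spacing d (Å) from a measured Bragg angle θ (degrees)."""
    return n * wavelength / (2 * math.sin(math.radians(theta_deg)))

for d in (20.0, 10.0, 5.0, 2.0, 1.0):
    print(f"d = {d:5.1f} Å  ->  θ = {bragg_theta(d):6.2f}°")
```

For small angles θ is roughly λ/(2d), so doubling d roughly halves the angle.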
Collecting data
[Figure: x-ray source, rotating sample, detector]
Rotate the sample and record the spots on the detector
The spots look fuzzy; each is indexed by its position and the rotation angle
Diagram from www-structmed.cimr.cam.ac.uk/Course/Overview/Overview.html
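Electron density expression
The spots are intensities; more electrons scatter more
The intensity is the square of the structure factor: $I(hkl) \propto |F(hkl)|^2$
Example in one dimension:
$$\rho(x) = \frac{1}{a}\sum_{h} |F_h|\,\exp\left(-2\pi i\,\frac{h x}{a} + i\alpha_h\right)$$
where ρ(x) is the density at our one-dimensional coordinate x
α_h is a phase and h is a frequency (the reflection index)
|F_h| is the structure factor amplitude (its absolute value)
a comes from the unit cell (its length)
Everything except α is known from the experiment
The same can be done in three dimensions, with indices h, k, l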
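A minimal one-dimensional NumPy sketch of the density expression on a discrete grid of N points (so the 1/a prefactor becomes 1/N). The two Gaussian "atoms" are hypothetical. It calculates structure factors from a known density, then rebuilds the density from |F_h| with the true phases α_h and with every phase set to zero, to show that the amplitudes alone are not enough.

```python
import numpy as np

N = 64                      # grid points in a hypothetical 1-D unit cell
a = 1.0                     # cell length (arbitrary units)
x = np.arange(N) * a / N    # grid coordinates
h = np.arange(N)            # reflection indices for an exact discrete transform

def structure_factors(rho):
    """F_h = sum_x rho(x) exp(+2 pi i h x / a): the 'backwards' direction."""
    return np.exp(2j * np.pi * np.outer(h, x) / a) @ rho

def synthesis(amp, alpha):
    """rho(x) = (1/N) sum_h |F_h| exp(-2 pi i h x / a + i alpha_h)."""
    terms = amp * np.exp(1j * alpha)
    return (np.exp(-2j * np.pi * np.outer(x, h) / a) @ terms).real / N

# Toy density: two Gaussian "atoms" with different numbers of electrons
rho = np.exp(-((x - 0.3) / 0.03) ** 2) + 2 * np.exp(-((x - 0.7) / 0.03) ** 2)

F = structure_factors(rho)
amp, alpha = np.abs(F), np.angle(F)   # the experiment gives only amp (via |F|^2)

rho_true_phases = synthesis(amp, alpha)
rho_zero_phases = synthesis(amp, np.zeros(N))

print(np.allclose(rho_true_phases, rho))   # True: amplitudes + phases recover rho
print(np.argmax(rho_zero_phases))          # 0: without phases the peak sits at the origin
```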
Where does this come from?
The structure factors are periodic
They are a property of the spacing within the crystal and between atoms
So they should be the same on different days and for different copies of the crystal
[Figure: real-space images and their Fourier transforms (FT); colours represent phase, every dot has its own phase]
From Kevin Cowtan's http://www.ysbl.york.ac.uk/~cowtan/fourier/fourier.html
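A small NumPy sketch of why the pattern depends only on the spacings between atoms and not on where the contents sit: shifting the whole toy density (hypothetical Gaussian "atoms") changes each phase by an amount proportional to h, but leaves the amplitudes, and therefore the intensities, untouched. NumPy's `fft` uses the exp(-2πi…) sign convention; texts differ on this, which flips the sign of the phases but not the conclusion.

```python
import numpy as np

N = 64
x = np.arange(N) / N   # fractional coordinate in a hypothetical 1-D cell
rho = np.exp(-((x - 0.3) / 0.03) ** 2) + 2 * np.exp(-((x - 0.7) / 0.03) ** 2)

shift = 10                               # move everything by 10 grid points
F = np.fft.fft(rho)
F_shifted = np.fft.fft(np.roll(rho, shift))

# Amplitudes (what we measure, via |F|^2) are unchanged ...
same_amplitudes = np.allclose(np.abs(F), np.abs(F_shifted))

# ... while each structure factor picks up a phase factor that depends on h
h = np.arange(N)
expected = F * np.exp(-2j * np.pi * h * shift / N)
phases_follow_shift = np.allclose(F_shifted, expected)

print(same_amplitudes, phases_follow_shift)
```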
Phasing
Direct methods: not practical for many points (atoms)
Other methods:
replacing atoms: multiple isomorphous replacement (MIR)
guesses based on a model: molecular replacement (MR)
Multiple isomorphous replacement (MIR)
If we only have a few points (atoms), phases are easy
Proteins have many points
So make them act like a few points: bind some heavy atoms (lots of electrons)
The heavy atoms will then dominate the scattering
Modern method: engineer in sites for selenomethionine and phase directly
[Figure: protein with bound heavy atoms]
Molecular replacement (MR)
Make a model and use it to get rough phases
I have an unknown protein "A", similar to a protein "B" whose structure I do know
Phases from "B" should be similar to those from "A"
Idea: from "B", calculate the density, back transform, get phases, and apply them to the data from "A"
In pictures…
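A toy one-dimensional NumPy sketch of the MR idea: take the measured amplitudes of "A", borrow the phases calculated from a similar known structure "B", and calculate a map. All atom positions and electron counts are hypothetical. Real MR must first find the rotation and translation that place B in A's unit cell; here B is simply assumed to be already in place. The map made with B's phases is compared with a map that uses no phase information at all.

```python
import numpy as np

N = 128
x = np.arange(N) / N   # fractional coordinate in a hypothetical 1-D unit cell

def density(centres, electrons, width=0.02):
    """Toy 1-D density: periodic sum of Gaussian 'atoms'."""
    rho = np.zeros(N)
    for c, ne in zip(centres, electrons):
        d = (x - c + 0.5) % 1.0 - 0.5          # minimum-image distance in the cell
        rho += ne * np.exp(-(d / width) ** 2)
    return rho

def correlation(p, q):
    return np.corrcoef(p, q)[0, 1]

# Unknown protein "A": the experiment gives only the amplitudes |F|.
rho_A = density([0.15, 0.33, 0.47, 0.58, 0.71], [6, 7, 6, 8, 6])
amp_A = np.abs(np.fft.fft(rho_A))

# Known, similar protein "B": atoms slightly moved, one atom missing.
# Assumption: B is already correctly placed in A's cell.
rho_B = density([0.16, 0.33, 0.46, 0.71], [6, 7, 6, 6])
phase_B = np.angle(np.fft.fft(rho_B))      # from B: density -> transform -> phases

# Apply B's phases to A's amplitudes and calculate a map.
map_mr = np.fft.ifft(amp_A * np.exp(1j * phase_B)).real
map_no_phases = np.fft.ifft(amp_A).real   # comparison: every phase set to zero

print(f"MR map vs true A:    r = {correlation(map_mr, rho_A):.2f}")
print(f"zero-phase map vs A: r = {correlation(map_no_phases, rho_A):.2f}")
```

Note that the MR map is biased towards the model: features of B leak into the map of A, which is one reason the refinement step matters.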
Molecular replacement (MR)
Background: if I have a protein, I can calculate the spots
Start from the protein coordinates and put the protein on a grid
Calculate ρ(x) at each grid point
Use the density expression, but apply it backwards (back transform) to get F_h, with phases
So if I know the structure, I can calculate the expected α's
[Figure: density on a grid]
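The same "coordinates → grid → back transform" route in three dimensions, as a NumPy sketch. Assumptions: a hypothetical three-atom "protein" in a cubic cell, each atom drawn as an isotropic Gaussian scaled by its electron count (real programs use proper atomic scattering factors). The grid density is transformed with the exp(+2πi(hx + ky + lz)) convention to give F(hkl), from which both the intensities and the phases follow.

```python
import numpy as np

def density_on_grid(coords, electrons, cell=20.0, n=32, sigma=0.7):
    """Put atoms on an n x n x n grid in a cubic cell of edge `cell` (Å).
    Toy model: each atom is an isotropic Gaussian scaled by its electron count."""
    g = np.arange(n) * cell / n
    X, Y, Z = np.meshgrid(g, g, g, indexing="ij")
    rho = np.zeros((n, n, n))
    for (x0, y0, z0), ne in zip(coords, electrons):
        # minimum-image differences keep the density periodic, as in a crystal
        dx = (X - x0 + cell / 2) % cell - cell / 2
        dy = (Y - y0 + cell / 2) % cell - cell / 2
        dz = (Z - z0 + cell / 2) % cell - cell / 2
        rho += ne * np.exp(-(dx**2 + dy**2 + dz**2) / (2 * sigma**2))
    return rho

def structure_factors(rho):
    """F(hkl) = sum over grid of rho * exp(+2 pi i (hx + ky + lz)), x, y, z fractional.
    numpy's ifftn uses exp(+2 pi i ...) and divides by the number of points,
    so multiply that back."""
    return np.fft.ifftn(rho) * rho.size

# Hypothetical "protein": three atoms with C, N, O electron counts; coordinates in Å
coords = [(5.0, 5.0, 5.0), (6.4, 5.2, 5.0), (12.0, 9.0, 4.0)]
electrons = [6, 7, 8]

rho = density_on_grid(coords, electrons)
F = structure_factors(rho)
intensity = np.abs(F) ** 2     # what the detector would record
alpha = np.angle(F)            # what it would not record: the phases

print(intensity[1, 2, 3], alpha[1, 2, 3])
```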
Molecular replacement (MR)
Requires a starting model for the structure
Luck plays a big part, but it often works if the model is 30 to 40 % sequence-identical to the correct answer
Very important procedure: many proteins of interest are similar to known ones
Requires no chemistry (unlike MIR)
Overall procedure
Make crystals → collect data → phase → fit to initial map → refine
Making crystals
Do you normally see protein crystals?
Concentrated protein + salts + robotic trials
Lots of trial and error
Data collection
Can be done in the lab
More powerful X-rays come from a synchrotron, but they may damage the crystal
Often done in the cold (cryo-cooled crystals) to limit that damage
Takes from 10 minutes (synchrotron) to a few days
Fitting to a map
The Fourier transform gives you electron density
Not nuclei
Not protons (H atoms have only one electron)
The map may look like a protein
Atoms have to be placed within the skeleton
http://www-structmed.cimr.cam.ac.uk/Course/Basic_refinement/Refinement.html
Fitting atoms
Errors can be made:
chain traced backwards
wrong sequence
cannot tell O from N (8 vs 7 electrons)
http://www-structmed.cimr.cam.ac.uk/Course/Basic_refinement/Refinement.html
Refinement
Removing noise, fixing phases, adjusting coordinates
From a model, calculate structure factors
Move atoms so as to match predicted vs measured
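A deliberately tiny refinement sketch in NumPy, to show the "calculate, compare, move" loop. Assumptions: point-like atoms in a hypothetical 1-D cell, a made-up fall-off term in place of real atomic scattering factors, and a simple pattern search that nudges one atom at a time and keeps any move that lowers the crystallographic R-factor, R = Σ| |F_obs| − |F_calc| | / Σ|F_obs|. Real programs use least squares or maximum likelihood with stereochemical restraints; this is not their method.

```python
import numpy as np

H = np.arange(1, 16)   # reflections h = 1..15 of a hypothetical 1-D crystal
B_TOY = 0.002          # toy fall-off with h, standing in for atomic form factors

def f_calc(positions, electrons):
    """Calculated structure factors for point-like atoms at fractional positions."""
    phases = np.exp(2j * np.pi * np.outer(H, positions))   # shape (len(H), n_atoms)
    return (phases @ electrons) * np.exp(-B_TOY * H**2)

def r_factor(f_obs_amp, positions, electrons):
    """R = sum | |Fobs| - |Fcalc| | / sum |Fobs|."""
    diff = np.abs(f_obs_amp - np.abs(f_calc(positions, electrons)))
    return diff.sum() / f_obs_amp.sum()

def refine(f_obs_amp, positions, electrons, step=0.02, min_step=1e-4, max_rounds=1000):
    """Pattern search: move one atom by +/- step, keep moves that lower R."""
    pos = np.array(positions, dtype=float)
    best = r_factor(f_obs_amp, pos, electrons)
    for _ in range(max_rounds):
        if step < min_step:
            break
        improved = False
        for i in range(len(pos)):
            for delta in (step, -step):
                trial = pos.copy()
                trial[i] += delta
                r = r_factor(f_obs_amp, trial, electrons)
                if r < best:
                    pos, best, improved = trial, r, True
        if not improved:
            step /= 2          # no move helped: try smaller moves
    return pos, best

electrons = np.array([6.0, 7.0, 8.0])
true_pos = np.array([0.30, 0.55, 0.80])
f_obs = np.abs(f_calc(true_pos, electrons))   # pretend these amplitudes were measured

start = np.array([0.32, 0.54, 0.80])          # slightly wrong starting model
r_start = r_factor(f_obs, start, electrons)
refined, r_end = refine(f_obs, start, electrons)
print(f"R before: {r_start:.3f}  R after: {r_end:.3f}  positions: {refined}")
```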
Quality
Quality of data:
Some proteins diffract better than others
Some crystals are better than others
Completeness (what fraction of the possible spots were measured)
Resolution:
Physical meaning: the most scattered x-rays come from the smallest d
Best resolution: smallest d, i.e. largest θ
[Figure: Bragg construction, angle 2θ, path A → B → C]
Disorder / mobility
What if proteins do not pack perfectly? The data will be smeared / blurry
What if there are differences between molecules? Proteins are not perfectly static
What is the real resolution?
typical: 1.5 to 3.0 Å
best: < 0.8 Å (small, friendly proteins)
worst: > 5 Å (large, membrane-bound proteins)
Local disorder
A protein may crystallise even when parts of it are not well ordered
This is typical of loops and termini
Can we quantify this? Yes, with a model and some approximations
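Atoms: how well determined?
Model each atom as a Gaussian density:
$$\rho(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
where σ is the width and μ is the centre
Individual B-factors: $B = 8\pi^2 U$, where $U = \sigma^2$ is the mean-square displacement (Å²)
Is the model believable?
It does reflect refinement
It includes a contribution from overall disorder
Atoms are not really Gaussians
It certainly reflects relative mobility
Can we see this in the coordinates? Yes: one individual B-factor per atom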
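Converting between B-factors and displacements, as a short Python sketch using B = 8π²U. The three B values are taken from the BPTI ATOM records (ARG 1: N, CA, NH2). Here U is the mean-square displacement along one direction, so √U is the rms displacement along that direction; the function names are hypothetical.

```python
import math

def b_to_u(b):
    """Mean-square displacement U (Å²) from an isotropic B-factor (Å²): U = B / (8 π²)."""
    return b / (8 * math.pi ** 2)

def u_to_b(u):
    """Isotropic B-factor (Å²) from a mean-square displacement U (Å²)."""
    return 8 * math.pi ** 2 * u

# B-factors from the BPTI excerpt, residue ARG 1
for name, b in [("N", 18.79), ("CA", 14.26), ("NH2", 25.73)]:
    u = b_to_u(b)
    print(f"{name:>3}: B = {b:5.2f} Å², U = {u:.3f} Å², rms displacement = {math.sqrt(u):.2f} Å")
```

A B-factor near 20 Å² thus corresponds to an rms displacement of roughly half an ångström, and the longer side chain atom (NH2) moves more than the backbone CA.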
Coordinate information (again)
In the BPTI ATOM records shown earlier, the temperature factor column (for example 18.79 for the N of ARG 1) holds each atom's individual B-factor, next to its x, y, z coordinates
Lastly, what can we do?
Time to solve a structure: days to years (crystallisation, phasing)
Biggest structures: macromolecular complexes, ribosome, photosynthetic centre, …
Most difficult and important: membrane-bound proteins, which account for many drug targets
More applications: solve structures with ligands / complexes (inhibitors, DNA)