Published by Franklin Wilkins. Modified over 8 years ago.
1
SHOTGUN STRUCTURAL PROTEOMICS
Ram Samudrala, Associate Professor, University of Washington
Given a heterogeneous mixture of proteins, how can we determine all their structures in a high-throughput and high-resolution manner?
2
MOTIVATION FOR DETERMINING PROTEIN STRUCTURE
The functions necessary for life are undertaken by proteins, and protein function is mediated by protein three-dimensional structure. Knowing protein structure at high resolution will enable us to: determine and understand molecular function; understand substrate and ligand binding; devise intelligent mutagenesis and biochemical experiments to understand biological function; design therapeutics rationally; and design novel proteins. Knowing the structures of all proteins encoded by an organism's genome will enable us to understand complex pathways and systems, and ultimately organismal behaviour and evolution. Applications lie in medicine, nanotechnology, and biological computing.
3
HOW CAN WE DETERMINE STRUCTURE?
[Figure: accuracy of structure determination approaches plotted on a Cα RMSD axis (ticks at 0, 2, 4, 6). Approaches shown: experiment (X-ray, NMR); computation (de novo); computation (template-based); hybrid (iterative Bayesian interpretation of noisy NMR data with structure simulations), with one distance constraint for every six residues and with one distance constraint for every ten residues.]
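The Cα RMSD used as the accuracy measure here can be sketched in a few lines. This is a minimal illustration that assumes the two structures have already been superimposed and have matching residue order; the function name `ca_rmsd` and the toy coordinates are hypothetical, not part of the presentation.

```python
import math

def ca_rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of
    C-alpha (x, y, z) coordinates. Assumes the structures are already
    optimally superimposed; no fitting is done here."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))

# Toy example (invented coordinates): every atom is displaced by (3, 4, 0),
# so each atom is 5 units from its partner.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
model = [(3.0, 4.0, 0.0), (4.5, 4.0, 0.0)]
print(ca_rmsd(reference, model))
```

In practice an optimal superposition step would precede this calculation; it is omitted to keep the sketch dependency-free.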
4
DISTANCE INFORMATION USING MASS SPECTROMETRY
[Figure: workflow diagram. Step labels: add crosslinkers; MS; identify proteins with single crosslinks and fragment; MS; identify crosslinked fragments; MS; confirm sequence; repeat using different crosslinkers and isotope labelling. Example fragments shown: MKRS, LVKQ, VSKNT, KEVN.]
5
HOW AND WHY WILL THIS WORK?
Perform experiments to obtain a number of distance constraints (one for every six residues for medium- to high-resolution structures). Perform simulations based on high-confidence constraints, and use the distance distributions from the resulting structures to iteratively reinterpret the spectra (without repeating the experiment) until we obtain a high-resolution structure. The computational aspects are largely complete. Components of the approach have been implemented by others in a limited way, but are assembled here in a robust and unique manner. The method can handle: impure protein purification (e.g. structural genomics failures); environment-dependent structures (e.g. chaperones + effectors); partially disordered proteins; and several proteins simultaneously (large scale). There is no need for proteolytic digestion, which complicates things. The focus is on structures from noisy data, unlike X-ray diffraction and NMR.
6
PLAN OF ACTION
Begin computational studies using simulated data (with noise) and develop software to prioritise experiments (e.g. crosslinker choices). Initial studies using the UW Mass Spectrometry Center: start with fairly pure mixtures >> not-so-pure mixtures >> 2-3 proteins >> handful of proteins >> difficult proteins >> heterogeneous mixtures >> whole proteomes. Advice from Aebersold, Kelleher. Team of 10-20 personnel working on crosslinking technology, protein enrichment, mass spectrometry, structure calculation, and parameterisation. Dedicated instrumentation through Pioneer Award, startup, MRI. A Bayesian framework will be utilised to estimate accuracy/error: avoid repeating the past oversight with NMR; obtain an R-factor-like estimate as in X-ray diffraction; compare spectra generated from models with actual spectra; iteratively reinterpret experimental data.
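One way to picture an R-factor-like comparison of generated and actual spectra is the sketch below. It borrows the form of the crystallographic R-factor (sum of absolute differences over sum of observed values) and applies it to binned peak intensities; that transfer to mass spectra, the function name `spectrum_r_factor`, and the toy intensities are assumptions made for illustration, not the presenter's stated method.

```python
def spectrum_r_factor(observed, simulated):
    """R-factor-like disagreement between two spectra.

    Both arguments map a mass/charge bin to an intensity. Bins missing
    from one spectrum are treated as zero intensity. Returns
    sum(|obs - sim|) / sum(|obs|); 0.0 means perfect agreement."""
    bins = set(observed) | set(simulated)
    total_observed = sum(abs(v) for v in observed.values())
    if total_observed == 0:
        raise ValueError("observed spectrum has no intensity")
    difference = sum(abs(observed.get(b, 0.0) - simulated.get(b, 0.0)) for b in bins)
    return difference / total_observed

# Toy spectra (invented values): one of two peaks is under-predicted by half.
observed_spectrum = {500: 10.0, 770: 10.0}
model_spectrum = {500: 5.0, 770: 10.0}
print(spectrum_r_factor(observed_spectrum, model_spectrum))
```

A lower value indicates that spectra simulated from a model agree better with the measured spectra, which is the role such an estimate would play during iterative reinterpretation.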
7
RECENT SUCCESSES AND SUITABILITY
[Figure panels: PROTEIN STRUCTURE DETERMINATION; PROTEIN DESIGN/NANOTECHNOLOGY; PROTEIN INHIBITOR DISCOVERY. Example shown: PROTINFO structure for 1aye, 1.8 Å Cα RMSD for 70 residues. http://protinfo.compbio.washington.edu]
Track record of notable successes (5 years). Excellent environment at UW/Seattle. Ability to unify components cohesively. Young and highly energetic. Right combination of computational skills and experimental design strategy to carry out the work.
8
OUTCOME AND EXPECTATIONS
Structural genomics projects aim to obtain a representative structure of every protein family using X-ray diffraction and NMR methods, and employ computational methods to fill in the gaps. However, several families of proteins will not be accessible by these structure determination methodologies, and computational methods alone are far from capable of consistently producing high-resolution structures. Even in successful cases, the effect of the biological environment on protein structure is not accounted for. Our hybrid approach, which complements existing structural genomics efforts, will be used to rapidly obtain structures for entire proteomes in biologically relevant environments.
10
WHY ARE CURRENT METHODS NOT ADEQUATE?
The major bottleneck for both X-ray diffraction and NMR studies is producing sufficient quantities of the protein in a pure form to perform the experiments. Deviations from ideal behaviour in a protein sample result in slow and labour-intensive structure determination, if it is possible at all. These major structure determination techniques were developed at a time when our worldview of proteins was simple and did not account for environment-dependent structure formation, protein dynamics and conformational changes, and post-translational modifications. The vast majority of proteins will therefore be inaccessible to X-ray diffraction and NMR studies. Computational approaches do not have the resolution of experimental approaches and lack consistency.
11
CROSSLINKING POSSIBILITIES
Seven chemical groups can be crosslinked: amines (2), carboxyls (3), and thiols (2). Numerous distances can be probed for the ~42 (7 x 6) possible pairs of groups. For every 100 residues, there may be up to ten members of each group, but typically only one crosslink is possible at a particular distance out of the ~100 (10 x 10) possible pairs. For every 100 residues, the total number of groups is ~20-40, resulting in a potential yield of 400-1600 (20 x 20 to 40 x 40) distance constraints if all crosslink possibilities can occur.
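The counts on this slide follow from simple pair arithmetic, sketched below. The variable names are illustrative; the last two lines are an added observation (not from the slide) that counting each distinct pair only once would give a smaller number than the n x n upper bound quoted.

```python
# Seven crosslinkable group types: amines (2), carboxyls (3), thiols (2).
group_types = 2 + 3 + 2

# Ordered pairs of distinct group types: 7 x 6.
type_pairs = group_types * (group_types - 1)

# Up to ten members of each group per 100 residues gives 10 x 10
# member-to-member pairs between two group types.
members_per_group = 10
member_pairs = members_per_group * members_per_group

# ~20-40 reactive groups per 100 residues; the slide's 400-1600 range
# corresponds to n x n, an upper bound over all pairings.
low_groups, high_groups = 20, 40
upper_bound = (low_groups ** 2, high_groups ** 2)

# For comparison only: distinct unordered pairs, n(n-1)/2.
unordered = (low_groups * (low_groups - 1) // 2,
             high_groups * (high_groups - 1) // 2)

print(type_pairs, member_pairs, upper_bound, unordered)
```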
12
DISTANCE INFORMATION USING KNOWN STRUCTURES
Residue-specific all-atom probability discriminatory function (RAPDF).
[Figure: atom-atom contacts from known structures are tabulated by atom type (AO, AN, AC, ..., YOH; 167 x 167 atom-type pairs) and by distance bin to give scores s(d_ab) for contacts. The N x N atom-atom contacts of a candidate structure are tabulated using the same atom types.]
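The idea of turning contact counts from known structures into scores for a candidate structure can be sketched as a generic log-odds, distance-dependent scoring function. The formula used here (negative log of the probability of a distance bin given the atom-type pair, relative to the probability of that bin over all pairs) is an assumption about the general form of such knowledge-based functions, and the toy counts are invented; only the atom-type labels (AO, AN, AC, YOH) come from the slide.

```python
import math
from collections import defaultdict

def build_scores(observed_contacts):
    """observed_contacts: iterable of (atom_type_a, atom_type_b, distance_bin)
    tuples collected from known structures. Returns a dict mapping
    ((type_1, type_2), distance_bin) to a log-odds score (lower is better)."""
    pair_bin = defaultdict(int)    # count for an atom-type pair in one bin
    pair_total = defaultdict(int)  # count for an atom-type pair over all bins
    bin_total = defaultdict(int)   # count for one bin over all pairs
    total = 0
    for a, b, d in observed_contacts:
        pair = tuple(sorted((a, b)))
        pair_bin[(pair, d)] += 1
        pair_total[pair] += 1
        bin_total[d] += 1
        total += 1
    scores = {}
    for (pair, d), n in pair_bin.items():
        p_bin_given_pair = n / pair_total[pair]
        p_bin_overall = bin_total[d] / total
        scores[(pair, d)] = -math.log(p_bin_given_pair / p_bin_overall)
    return scores

def score_structure(contacts, scores, unseen=0.0):
    """Sum the scores of a candidate structure's contacts; contacts never
    seen in the known structures contribute the `unseen` value."""
    return sum(scores.get((tuple(sorted((a, b))), d), unseen) for a, b, d in contacts)

# Toy statistics (invented): AO-AN contacts favour bin 1, AC-YOH favour bin 2.
known = ([("AO", "AN", 1)] * 8 + [("AO", "AN", 2)] * 2 +
         [("AC", "YOH", 1)] * 2 + [("AC", "YOH", 2)] * 8)
table = build_scores(known)
native_like = score_structure([("AN", "AO", 1), ("AC", "YOH", 2)], table)
non_native = score_structure([("AN", "AO", 2), ("AC", "YOH", 1)], table)
print(native_like, non_native)
```

With these toy counts, the candidate whose contacts match the frequently observed bins receives the lower (more favourable) total.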
13
STRUCTURES FROM SIMULATIONS USING RAPDF
Good correlation between RAPDF score and accuracy of structure. RAPDF is one of the first all-atom knowledge-based functions and is a standard by which other scoring functions are compared. RAPDF has contributed to our success at CASP when combined with our simulation protocols to sample protein conformational space efficiently.
[Figure: PROTINFO AB CASP6 prediction for T0281, 4.3 Å Cα RMSD for all 70 residues (continuous RAPDF produces a 2.1 Å RMSD structure).]
[Figure: PROTINFO CM CASP6 prediction for T0271, 2.4 Å Cα RMSD for all 142 residues (46% ID).]
14
DISTANCE INFORMATION USING NMR
Nuclei of proteins emit RF radiation measured in the form of chemical shifts. The primary source of distance information between protons is the NOE. Steps: experiment (laborious), chemical shift assignment (automated), peak assignment (nontrivial), and structure determination (partially automated).

Peak coordinates:
                H       HN      N
Peak            1.235   9.738   130.97

Protons with consistent chemical shifts:
                H       HN      N
43 VAL HG1      1.256   -       -
8 ILE HN        -       9.748   130.95
59 LEU HB3      1.242   -       -

Bayesian estimation of contact probabilities:
                            Prior   Posterior   Distance
43 VAL HG1 - 8 ILE HN       0.038   0.75        4.6 Å
59 LEU HB3 - 8 ILE HN       0.002   0.05        8.0 Å
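The step from a prior to a posterior contact probability can be sketched with Bayes' rule for a two-outcome hypothesis (the proton pair is in contact, or it is not). The likelihood values in the example are invented purely for illustration and are not taken from the slides; the slides do not say how the likelihoods are computed, and `posterior_contact` is a hypothetical name.

```python
def posterior_contact(prior, likelihood_contact, likelihood_no_contact):
    """Posterior probability that a proton pair is in contact.

    prior: probability of contact before seeing the peak.
    likelihood_contact: probability of the observed peak if in contact.
    likelihood_no_contact: probability of the observed peak if not in contact."""
    evidence_for = prior * likelihood_contact
    evidence_against = (1.0 - prior) * likelihood_no_contact
    return evidence_for / (evidence_for + evidence_against)

# Illustrative only: a small prior combined with a peak that is far more
# likely under the contact hypothesis yields a much larger posterior.
example = posterior_contact(0.038, likelihood_contact=0.76, likelihood_no_contact=0.01)
print(round(example, 2))
```

If the peak is equally likely under both hypotheses, the function returns the prior unchanged, which is a quick sanity check on the update.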
15
STRUCTURES USING COMPUTATION AND EXPERIMENT
The Bayesian approach calculates the probability distribution of each NOE peak contributing to proton-proton distances in a protein. The approach is assignment-free, fast, fully automated, and tolerant of noise, incompleteness and ambiguity, and it enables iterative reinterpretation of source experimental data based on simulated structures (90% complete).
[Figure: PROTINFO NMR structure for 1aye, 1.8 Å Cα RMSD for 70 residues.]
[Figure: PROTINFO NMR structure for mjnop, 3.5 Å Cα RMSD for 50 residues (required manual interpretation for several months).]
16
DISTANCE INFORMATION USING MASS SPECTROMETRY
Add labelled and unlabelled crosslinkers to a heterogeneous mixture of proteins, then run MS. For each peak representing a protein with a single crosslinker: enrich (LC, biotin), fragment, and run MS. Identify peaks consistent with crosslinked fragments and obtain distance constraints. Repeat with different fragmentation resolution, crosslinker types, and isotope labelling.
[Figure: spectra plotted as relative abundance against mass/charge, before and after enrichment and fragmentation.]
17
INTERPRETING MASS SPECTRA
…AKRS…LKYVT…SKL…ARKT… (4 x 3 = 12 possibilities, one true contact)
Ambiguous peaks in spectra are disambiguated (either eliminated or prioritised) using different fragmentation resolution, database preferences, and iterative reinterpretation after structure simulations.
Spurious peaks in spectra are eliminated using isotope labelling (look for precise shifts).
[Figure: spectra plotted as relative abundance against mass/charge, with peaks labelled AKR-LK, ARK-KL, AKRS-LKY, and AKR-SK?.]
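Both ideas on this slide can be illustrated with a small sketch: two different crosslinked fragment pairs with the same residue composition (AKR-LK and ARK-KL) have the same mass and are therefore ambiguous, while peaks that come from a genuine crosslink appear as a doublet separated by the isotope label's mass shift. The residue masses are standard monoisotopic values; the linker mass, the label shift, the neutral-mass simplification (no charge states), and all function names are assumptions for illustration.

```python
# Standard monoisotopic residue masses (Da) for the residues used here.
RESIDUE_MASS = {"A": 71.03711, "K": 128.09496, "R": 156.10111,
                "L": 113.08406, "S": 87.03203, "Y": 163.06333}
WATER = 18.01056

def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in sequence) + WATER

def crosslinked_mass(peptide_1, peptide_2, linker_mass):
    """Mass of two peptides joined by a crosslinker; linker_mass is the
    net mass the linker adds (an illustrative value below)."""
    return peptide_mass(peptide_1) + peptide_mass(peptide_2) + linker_mass

def labelled_doublets(masses, shift, tolerance=0.01):
    """Return (light, heavy) pairs of peak masses separated by the isotope
    label shift; peaks without a partner are candidates for removal."""
    ordered = sorted(masses)
    return [(light, heavy)
            for i, light in enumerate(ordered)
            for heavy in ordered[i + 1:]
            if abs((heavy - light) - shift) <= tolerance]

LINKER_MASS = 138.068  # hypothetical net linker mass
LABEL_SHIFT = 4.025    # hypothetical heavy-minus-light label shift

mass_1 = crosslinked_mass("AKR", "LK", LINKER_MASS)
mass_2 = crosslinked_mass("ARK", "KL", LINKER_MASS)
ambiguous = abs(mass_1 - mass_2) < 1e-6  # same composition, same mass

peaks = [mass_1, mass_1 + LABEL_SHIFT, 500.0]  # 500.0 stands in for a spurious peak
doublets = labelled_doublets(peaks, LABEL_SHIFT)
print(ambiguous, doublets)
```

The mass alone cannot separate AKR-LK from ARK-KL, which is why the slide relies on further fragmentation, database preferences, and structure-based reinterpretation for disambiguation.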
18
DISTANCE INFORMATION USING FRET
Analogous to the MS approach, but instead of peaks representing mass/charge ratios that identify two crosslinked residues (indirect distance information), we can obtain direct distance information. Express the protein in an in vitro system to ensure a single fluorophore donor/acceptor pair for two residues in a protein. Use a confocal microscopy setup to measure energy transfer for many donor/acceptor pairs. The distance, whose measurable range depends on the donor/acceptor type, can be obtained for any pair of residues whose labelling does not cause loss of structure (determined by consistency across many pairs); a tangential benefit is the identification of structurally important residues. Ideal for measurement of long-range distances and for large proteins.
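Converting a measured energy-transfer efficiency into a distance is commonly done with the Förster relation, E = 1 / (1 + (r / R0)^6), where R0 is the distance at which transfer is 50% efficient for a given donor/acceptor pair. The sketch below inverts that relation; the slides do not state which relation or R0 values are used, so the function name and the R0 value of 50 Å are assumptions for illustration.

```python
def fret_distance(efficiency, forster_radius):
    """Donor-acceptor distance from FRET efficiency using the Förster
    relation E = 1 / (1 + (r / R0)**6), solved for r.

    efficiency: measured transfer efficiency, strictly between 0 and 1.
    forster_radius: R0 for the donor/acceptor pair, in the same units
    as the returned distance."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must be strictly between 0 and 1")
    return forster_radius * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)

R0 = 50.0  # hypothetical Förster radius in Å
print(fret_distance(0.5, R0))  # at 50% efficiency the distance equals R0
print(fret_distance(0.9, R0) < R0 < fret_distance(0.1, R0))
```

Because the efficiency falls off with the sixth power of distance, each donor/acceptor pair is most informative for distances near its own R0, which is one reading of why the choice of pair matters.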