Published by Marilyn Shields. Modified over 9 years ago.
1
~400,000 peptide mass spectra
2
A few diverse examples of proteins
[Figure labels: a muscle protein; aspirin; a virus protein shell (“capsid”). Watercolors by David Goodsell, Scripps]
3
Outline
Part I: What dictates the 3D shape (“fold”) of proteins?
1. Primary structure of proteins
 - amino acids & peptide bonds
2. Secondary structure of proteins
 - “local” folding topology & predicting 2° structure
3. Tertiary structure of proteins
 - “global” folding topology
 - X-ray crystallography & NMR
 - aligning structures computationally
 - protein folding
 - designing new structures
Part II: How do proteins interact with each other in the cell?
4
The levels of protein structure:
5
Different representations of a typical globular protein (myoglobin):
 - “ribbon” = Cα backbone
 - ribbon + stick-figure side chains
 - all atoms drawn at van der Waals radii
 - solvent accessible surface
6
Due to resonance forms of the peptide bond:
Peptide bonds (N-CO) are planar, so the only rotation allowed along the amino acid backbone is around the Cα-N and Cα-CO bonds ==> by convention these angles are called φ & ψ
Protein folding = the selection of φ/ψ angles & side chain angles leading to low energy packing of the atoms
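The backbone angles described above are ordinary torsion (dihedral) angles, so they can be computed from four atom positions. The following is a minimal sketch, not code from the lecture: the function name `dihedral` and the toy coordinates are hypothetical. For a residue i, φ would use the atoms C(i-1), N(i), Cα(i), C(i), and ψ would use N(i), Cα(i), C(i), N(i+1).

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle (degrees) about the p1-p2 bond; cis = 0, trans = 180."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)          # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)          # normal of the plane through p1, p2, p3
    b2_len = math.sqrt(_dot(b2, b2))
    return math.degrees(math.atan2(b2_len * _dot(b1, n2), _dot(n1, n2)))

# Toy coordinates (made up for illustration, not from a real protein)
cis = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
trans = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
quarter = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1))
```

Applying this to every residue of a structure and plotting φ against ψ gives the kind of plot discussed on the next slide.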
7
A Ramachandran plot shows only certain φ/ψ combinations are sampled,
dictated by steric hindrance of atoms neighboring the peptide bond
Favored regions correspond to secondary structures ==> allowable “local” structural conformations
8
3 of the most common secondary structures
α helix: 3.6 aa’s/turn
9
Amino acids vary in their intrinsic propensities to adopt
the different secondary structures
10
Given aa sequence, how to predict 2° structure? ==> PHD
Input = 13 aa sliding window
 - neural network, predicts 3 states: α helix, β strand, coil
 - also predicts relative level of solvent accessibility
==> 3 state prediction accuracy ~72%
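To make the 13-residue sliding window concrete, here is a small sketch of how such windows could be cut from a sequence. It only illustrates the windowing step; it is not the PHD implementation, and the padding character `-`, the function name and the example sequence are assumptions made for illustration.

```python
def sliding_windows(seq, width=13, pad="-"):
    """Return one window per residue, centered on that residue.

    The ends of the sequence are padded (an assumed convention) so that
    every residue gets a full-width window.
    """
    half = width // 2
    padded = pad * half + seq + pad * half
    return [padded[i:i + width] for i in range(len(seq))]

example_seq = "MEALKKHGVTVLTALG"      # made-up sequence for illustration
windows = sliding_windows(example_seq)
# windows[i] is the input describing residue i; a predictor would map it to
# one of three states (helix, strand, coil) for the central residue.
```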
11
Some proteins have unusual secondary structures that
span membranes => membrane proteins
How to identify transmembrane segments in a protein?
Current best approach, TMHMM, is based on hidden Markov models (HMMs).
A generic HMM: hidden states X and Y, linked by transition probabilities, each with its own emission probabilities
Emission probabilities as labeled in the figure: A 0.1, B 0.3, C 0.4 for one state; A 0.4, B 0.1, C 0.3 for the other
Hidden state seq: XXXXYYYYXXXY
Observable seq: CCBCCAAABCAC
Goal = recover hidden state sequence by analyzing emissions
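Recovering the most probable hidden state sequence from the emissions is usually done with the Viterbi algorithm. The sketch below is a generic two-state example in the spirit of the X/Y model on this slide, not TMHMM itself. All probabilities in it are made up for illustration (the emission values that survive in this transcript do not sum to 1, so they are not reused), and the function and variable names are hypothetical.

```python
import math

STATES = ("X", "Y")
START = {"X": 0.5, "Y": 0.5}                       # assumed start probabilities
TRANS = {"X": {"X": 0.8, "Y": 0.2},                # assumed transition probabilities
         "Y": {"X": 0.2, "Y": 0.8}}
EMIT = {"X": {"A": 0.1, "B": 0.3, "C": 0.6},       # assumed emission probabilities
        "Y": {"A": 0.6, "B": 0.3, "C": 0.1}}

def viterbi(obs, states=STATES, start_p=START, trans_p=TRANS, emit_p=EMIT):
    """Most probable hidden state path for an observed string (log space)."""
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = []
    for symbol in obs[1:]:
        scores, pointers = {}, {}
        for s in states:
            # best previous state from which to enter s
            prev = max(states, key=lambda p: best[-1][p] + math.log(trans_p[p][s]))
            scores[s] = (best[-1][prev] + math.log(trans_p[prev][s])
                         + math.log(emit_p[s][symbol]))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # trace back from the best final state
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))

decoded = viterbi("CCCCAAAA")
```

In a transmembrane predictor the hidden states would stand for regions such as helix core, cap and loop, and the observed symbols would be amino acids.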
12
TMHMM hidden Markov model: inside & outside loop models, helix cap,
HMM for 5-25 aa helix core
 - Correctly predicts >90% of the transmembrane helices
 - Discriminates between soluble and membrane proteins with false positive rate ~1%
Krogh et al., J Mol Biol. 305: (2001)
13
Packing of secondary structures leads to more complex
3D assemblies (“motifs”):
14
Tertiary structure = 3D packing of secondary structural elements
 - Hydrophobic residues (Phe, Ile, Leu, Trp) buried in the core
 - Core densely packed, with no room even for H2O; packing density comparable to a typical crystal
 - Core atoms so close that van der Waals interactions contribute significantly
 - Charged and polar R groups (e.g., Arg, Lys, Glu, Asp, His) on the outside and hydrated
15
Experimental approaches to protein structure I
X-ray crystallography
1. Start with a crystal of pure protein
2. Rotate crystal, collect amplitudes of diffracted X-rays as function of incident angle of X-rays
3. Find phases of diffracted X-rays (by experiment or computation)
4. With phases & amplitudes, Fourier transform to find distribution of electrons (“electron density”) in protein
5. Build atomic model into electron density, refine
Electrons in crystal diffract X-rays according to Bragg’s Law: nλ = 2d sin θ
 - λ = wavelength of X-rays
 - θ = angle of X-rays to plane of atoms
 - d = distance between atomic layers in crystal
From B. Rupp’s X-ray crystallography intro
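As a small worked example of Bragg’s Law (n × wavelength = 2 × d × sin of the angle), the sketch below solves for the layer spacing d. The function name is hypothetical, and the numbers are chosen for illustration: 1.5418 Å is the commonly quoted Cu Kα wavelength, and the 30° angle is arbitrary.

```python
import math

def bragg_spacing(wavelength, theta_deg, n=1):
    """Layer spacing d from Bragg's Law, n * wavelength = 2 * d * sin(theta).

    wavelength and the returned d share the same unit (e.g. angstroms);
    theta_deg is the angle between the X-rays and the plane of atoms.
    """
    return n * wavelength / (2.0 * math.sin(math.radians(theta_deg)))

d_first_order = bragg_spacing(1.5418, 30.0)   # sin(30 deg) = 0.5, so d equals the wavelength
d_small_angle = bragg_spacing(1.5418, 5.0)    # smaller angle, larger spacing
```

The second call shows the inverse relationship: reflections at small angles report on widely spaced layers, and fine detail requires data at high angles.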
16
Experimental approaches to protein structure II
Nuclear magnetic resonance (NMR)
Setup: protein in solution at the center of a very strong magnet, with coils to send/detect radio waves
Basic principle:
 - Atomic nuclei w/ odd mass #’s have spin ==> charged, spinning particles that produce a magnetic field
 - In an external magnetic field, this nuclear magnetic field precesses around an axis
 - Can observe this process by applying radio wave pulses at frequencies related to precession frequencies & measuring the resulting induced electric current
Procedure:
 - Vary radio wave pulses, measure field generated in response over time => function of chemical environment of each nucleus
 - Assign identities to nuclei, measure distances between amino acid atoms
 - Use distance geometry to solve for ensemble of 3D structures consistent with distance constraints
Flemming Poulsen, A Brief Introduction to NMR spectroscopy of proteins.
17
3 broadest classes of protein 3D structures
Fibrous (e.g., collagen), Membrane (e.g., K+ channel) & Globular ...
18
Examples of globular protein “folds”
all α, α/β, all β, α+β
19
>24,000 experimentally determined protein structures
stored in PDB database:
20
Atomic coordinates of a protein structure (PDB format)
 - first 3 aa’s = Met-Glu-Ala...
Columns: atom # & name, aa type & #, atomic coordinates (x, y, z), occupancy, B-factor, atom type
ATOM N MET A N
ATOM CA MET A C
ATOM C MET A C
ATOM O MET A O
ATOM CB MET A C
ATOM CG MET A C
ATOM SD MET A S
ATOM CE MET A C
ATOM N GLU A N
ATOM CA GLU A C
ATOM C GLU A C
ATOM O GLU A O
ATOM CB GLU A C
ATOM CG GLU A C
ATOM CD GLU A C
ATOM OE1 GLU A O
ATOM OE2 GLU A O
ATOM N ALA A N
ATOM CA ALA A C
ATOM C ALA A C
ATOM O ALA A O
ATOM CB ALA A C
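PDB ATOM records are fixed-width text, so each field is read from a fixed column range rather than by splitting on whitespace. The sketch below formats one record and parses it back. The helper names are hypothetical, and the coordinate, occupancy and B-factor values are made up for illustration because the numeric fields are missing from this transcript.

```python
def format_atom(serial, name, res_name, chain, res_seq, x, y, z, occ, bfac, element):
    """Build a fixed-width ATOM line (hypothetical helper, short atom names only)."""
    return (f"ATOM  {serial:>5}  {name:<3} {res_name:>3} {chain}{res_seq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{bfac:6.2f}          {element:>2}")

def parse_atom(line):
    """Read the main fields of an ATOM record by column position (0-based slices)."""
    return {
        "serial": int(line[6:11]),         # columns 7-11: atom number
        "name": line[12:16].strip(),       # columns 13-16: atom name
        "res_name": line[17:20].strip(),   # columns 18-20: amino acid type
        "chain": line[21],                 # column 22: chain identifier
        "res_seq": int(line[22:26]),       # columns 23-26: amino acid number
        "x": float(line[30:38]),           # columns 31-38
        "y": float(line[38:46]),           # columns 39-46
        "z": float(line[46:54]),           # columns 47-54
        "occupancy": float(line[54:60]),   # columns 55-60
        "b_factor": float(line[60:66]),    # columns 61-66
        "element": line[76:78].strip(),    # columns 77-78: atom type
    }

# Made-up values for the first Cα atom of the Met-Glu-Ala example
line = format_atom(2, "CA", "MET", "A", 1, 12.345, -6.789, 0.5, 1.0, 25.0, "C")
atom = parse_atom(line)
```

For real work, an established parser (for example the one in Biopython) handles the many special cases that this sketch ignores.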
21
Some of the major computational questions in structural biology
1. How to distinguish membrane proteins from soluble proteins?
2. How to align protein structures & start organizing them into families, etc.?
3. How to predict folded protein structure from the linear amino acid sequence?
4. How to identify the active/functional region of the protein from the structure?
5. How to predict the interactions of drugs or other proteins from the structure?
6. How to computationally predict the structural consequences of mutations?
7. How to predict protein function from structure?
8. How to design new or unnatural protein structures?
22
How to find the best superposition of 2 protein structures?
Note: superimposing 2 structures is easy if you know the equivalent amino acids -> the hard part is to find this mapping of atoms from 1 structure to the other
One now-classic approach: DALI
 - Start from the structure of protein #1, Cα coordinates only
 - Calculate matrix of all pairwise Cα-Cα distances (amino acid # vs. amino acid #)
 - Repeat for protein #2
 - Align sequence #1 to sequence #2 so as to maximize similarity in contact patterns
Holm & Sander, J Mol Biol. 233: (1993)
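The distance-matrix step can be sketched in a few lines. This is an illustration of the general idea rather than DALI’s code; the function name and the three toy coordinates are made up. One reason distance matrices are convenient is that they do not change when the whole structure is rotated or translated, so two structures can be compared without superimposing them first.

```python
import math

def distance_matrix(ca_coords):
    """All pairwise distances between a list of (x, y, z) alpha-carbon positions."""
    return [[math.dist(a, b) for b in ca_coords] for a in ca_coords]

# Toy "protein" of three residues (coordinates chosen for easy arithmetic)
coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
dmat = distance_matrix(coords)

# Shifting every atom by the same vector leaves the matrix unchanged
shifted = [(x + 10.0, y - 2.0, z + 1.0) for x, y, z in coords]
dmat_shifted = distance_matrix(shifted)
```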
23
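Best structural alignment corresponds to maximizing
$$S = \sum_{i} \sum_{j} \phi(i, j)$$
 - i, j = aligned pairs of matched residues: $i = (i_A, i_B)$, $j = (j_A, j_B)$
 - $\phi$ = similarity of the 2 Cα-Cα distance matrices, $d^{A}_{ij}$ and $d^{B}_{ij}$
In the simplest case,
$$\phi(i, j) = \theta^{R} - \left| d^{A}_{ij} - d^{B}_{ij} \right|$$
where $d^{A}_{ij}$ and $d^{B}_{ij}$ are the distances between equivalenced residues in proteins A and B, and $\theta^{R}$ = minimum level of similarity
Choose mapping of residues (e.g. $i_A$ to $i_B$) to minimize $\left| d^{A}_{ij} - d^{B}_{ij} \right|$
[Figure: distance $d^{A}_{ij}$ between residues $i_A$ and $j_A$ in Protein A, matched to distance $d^{B}_{ij}$ between $i_B$ and $j_B$ in Protein B]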
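A sketch of scoring one candidate residue mapping may help here. It assumes the simplest-case similarity takes the form of a threshold minus the absolute difference between matched distances, summed over all pairs of aligned residues (including each residue paired with itself). The function name, the threshold value of 1.5 and the toy matrices are illustrative assumptions, and the search over mappings, which is the hard part, is not shown.

```python
def rigid_similarity_score(dA, dB, mapping, theta=1.5):
    """Sum of theta - |dA[iA][jA] - dB[iB][jB]| over all pairs of aligned residues.

    dA, dB  : distance matrices of proteins A and B
    mapping : list of (index in A, index in B) pairs, i.e. the alignment
    theta   : assumed similarity threshold (illustrative value)
    """
    total = 0.0
    for iA, iB in mapping:
        for jA, jB in mapping:
            total += theta - abs(dA[iA][jA] - dB[iB][jB])
    return total

# Two identical toy distance matrices (made-up values)
dA = [[0.0, 5.0, 13.0], [5.0, 0.0, 12.0], [13.0, 12.0, 0.0]]
dB = [row[:] for row in dA]

correct = rigid_similarity_score(dA, dB, [(0, 0), (1, 1), (2, 2)])
swapped = rigid_similarity_score(dA, dB, [(0, 1), (1, 0), (2, 2)])
```

With identical structures the correct mapping earns the full threshold for every pair, while a mapping that mismatches residues is penalized wherever the matched distances disagree.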
24
The ability to compare structures has led to recognition
of a hierarchy of 3° structures (“folds”), as organized in the CATH or SCOP or FSSP databases:
 - Class
 - Architecture
 - Topology
 - Homologous Superfamily (H), e.g. flavodoxin homologues
Manual classification at architecture level, automated at topology level
25
Protein Folding
Classic experiment from 1960’s (Chris Anfinsen):
Purified the small protein RNase A, denatured it, and found that it refolded in a few minutes in solution ==> all information necessary for correct folding is captured in the linear amino acid sequence
Corollary: Proteins do not fold by randomly testing conformations.
Given a 100 amino acid protein & 10 possible conformations / amino acid = 10^100 possible conformations for the protein ==> not possible to randomly sample, clearly a constrained search
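The size of this search space is easy to check with a few lines of arithmetic. In the sketch below, the protein length and the number of conformations per residue come from the slide, while the sampling rate of 10^13 conformations per second is purely an assumed figure chosen to be generous; the function name is hypothetical.

```python
def exhaustive_search_years(n_residues=100, conformations_per_residue=10,
                            samples_per_second=1e13):
    """Number of conformations and years needed to try each one once."""
    total = conformations_per_residue ** n_residues
    seconds = total / samples_per_second        # assumed sampling rate
    years = seconds / (3600 * 24 * 365)
    return total, years

total_conformations, years_needed = exhaustive_search_years()
```

Even at the assumed rate the time needed exceeds the age of the universe by many orders of magnitude, while the real protein refolds in minutes.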
26
An energetic view of the folding process
[Figure, adapted from Branden & Tooze: free energy along the folding trajectory, from Unfolded (U) through Molten globule (M) and Transition state (T) to Folded (F)]
 - U: large # of conformationally different molecules
 - U -> M (fast): “hydrophobic collapse”; local secondary structures form first
 - M: collection of similar conformations interconverting
 - M -> F (slow): optimize packing
 - F: unique or small # of final conformations
27
One long-time goal of biologists/biophysicists:
Solve the Protein Folding Problem = computationally predict protein 3D fold from 1D amino acid sequence
Two general approaches:
 - 1st principles/ab initio: e.g., atomistic molecular dynamics simulations of proteins, modeling force fields w/ electrostatic, van der Waals forces, solvent, etc. over long time
 - Empirical: fold recognition/threading - reverses the process: given set of structures, learn empirical rules that predict folds
Empirical currently more successful at predicting final structure, but gives no information about folding trajectory
28
An example of a successful design of a new protein fold
by a combination of empirical & ab initio structural modeling
Designed 93 amino acid protein with topology not in PDB database
[Figure: designed model vs. solved structure]
Kuhlman et al., Science 302: (2003)
29
The Kuhlman et al. design strategy
Starting model:
 - Choose predefined 3D topology
 - Assemble 3D model from 3 and 9 amino acid fragments of known structure ==> generated 172 backbone-only starting models
Initialization:
 - Choose optimal sequence for each starting model using energy function that captures: 12-6 Lennard-Jones potential, orientation-dependent hydrogen bonding term, implicit solvation model
 - Choose amino acid side chain orientations (“rotamers”) by sampling from known structures
Iterate between:
 - Optimize choice of amino acid sequence for a fixed backbone conformation
 - Optimize amino acid backbone coordinates for a fixed sequence
Same energy function used at all stages
Only previous lowest energy sequence/structure optimized at each stage
Final designed sequence not similar to any known protein sequence
Kuhlman et al., Science 302: (2003)
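One term of the energy function named above, the 12-6 Lennard-Jones potential, has a simple closed form. The sketch below shows the textbook version only; it is not the parameterization used by Kuhlman et al., and the default `epsilon` and `sigma` values are placeholders in arbitrary units.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Textbook 12-6 Lennard-Jones energy for two atoms at distance r.

    epsilon : depth of the energy well (placeholder value)
    sigma   : distance at which the energy crosses zero (placeholder value)
    """
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

e_at_sigma = lennard_jones(1.0)              # zero crossing
e_at_minimum = lennard_jones(2 ** (1 / 6))   # well bottom, energy = -epsilon
e_clash = lennard_jones(0.8)                 # atoms too close: strongly repulsive
```

The steep repulsion at short distances is what penalizes atomic clashes, and the shallow attractive well rewards the dense packing found in protein cores.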
30
References
A good introduction to structural biology = Introduction to Protein Structure, by Carl Branden & John Tooze
Web resources:
 - Protein Data Bank = > 24,000 protein structures, atomic coordinates, & the “protein of the month”
 - CATH/SCOP protein structure hierarchies
Several of the illustrations in this tutorial were taken from Lehninger Principles of Biochemistry, by Nelson & Cox
31
Part II
32
Macrophage (“white blood cell”) Blood serum
Bacterium “Macrophage and Bacterium 2,000,000X” Watercolor by David S. Goodsell, 2002
33
Typical size ranges of known protein structures & assemblies
single protein domain dimeric protein aquaporin (membrane channel) Ribosome From a (recommended) review article==>Sali et al. Nature 422: (2003)
34
Outline
Part I: What dictates the 3D shape (“fold”) of proteins?
Part II: How do proteins interact with each other in the cell?
4. “Quaternary” structure of proteins & protein interactions
5. Experimental approaches to determine interactions
 - yeast 2 hybrid, mass spectrometry
6. Testing the accuracy of the interactions
7. Moving back to the atomic resolution world
 - electron microscopy & tomography
 - modeling structures of complexes
35
Why study interactions?
Proteins interact all the time (e.g., bump into each other non-specifically)
We’re interested in specific interactions ==> e.g., those w/ downstream consequences
For example, consequences might include:
 - Inducing a change in the structure of an interaction partner
 - Stabilizing or destabilizing an interaction partner
 - Modifying the activity of a protein (activate, inhibit, or otherwise regulate)
 - Causing an interaction partner to move to another location
 - Cutting an interaction partner
 - Chemically modifying an interaction partner (phosphorylate, dephosphorylate, glycosylate, deglycosylate, ubiquitinate, sumoylate, etc.) ==> more than 200 modifications to proteins known, many catalyzed by other proteins
So, defining interactions helps to define these processes & their functional consequences
36
Experimental/Computational methods for observing/inferring protein interactions
Sali et al. Nature 422: (2003)
37
[Figure: X-ray structure of ATP synthase; schematic version; network representation of subunits α, β, γ, δ, b2, ε, a, c12]
Total set = protein complex
Sum of direct + indirect interactions
38
Some methods measure direct interactions, some indirect
Xenarios & Eisenberg, Curr. Op. Biotech. 12:334-9 (2001)
39
Interactions between yeast proteins
40
Experimental approaches to protein interactions I
Yeast two-hybrid
 - “Bait” protein is fused to a DNA binding domain (DBD); “prey” protein is fused to a transcription activation domain (Act)
 - Bait-DBD binds the operator or upstream activating sequence of a reporter gene
 - If prey binds bait, Act is brought to the core transcription machinery ==> transcription of the reporter gene
Basic idea = screen library of “prey” proteins to test which ones interact with a given “bait” protein
Fields & Song, Nature 340:245-6 (1989)
41
Experimental approaches to protein interactions I
High-throughput yeast two-hybrid I
 - Haploid yeast cells expressing activation domain-prey fusion proteins
 - Diploid yeast probed with DNA-binding domain-Pcf11 bait fusion protein
Uetz et al. Nature 403 (2000)
42
Uetz et al. Nature 403 (2000)
43
A second group (Ito et al.), with a related yeast two-hybrid approach, also mapped a large number of interactions, then compared the interactions w/ the Uetz data:
A surprise at the time was the apparent inconsistency among the interaction sets ==> either # of potential interactions is large or false positive rate is high (or both)
Ito et al. PNAS 98: (2001)
44
Experimental approaches to protein interactions II
Mapping complexes by mass spectrometry I
 - Tagged “bait” protein is captured on an affinity column
 - Interaction partners (protein 1, protein 2, ...) are co-purified with the “bait”
 - Separate by SDS-PAGE
 - Trypsin digest, identify peptides by mass spectrometry
493 bait proteins ==> 3617 “interactions”
Ho et al. Nature 415 (2002)
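The trypsin digestion step can be imitated in software, which is how candidate peptides are generated for matching against measured spectra. The sketch below uses the common simplified rule that trypsin cuts after lysine (K) or arginine (R) unless the next residue is proline (P). It ignores missed cleavages, and the function name and example sequence are made up for illustration.

```python
def trypsin_digest(sequence):
    """Split a protein sequence into peptides using a simplified trypsin rule.

    Cleaves after K or R, except when the following residue is P.
    """
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])    # C-terminal peptide
    return peptides

peptides = trypsin_digest("MKAARPLKG")       # made-up sequence
```

Each predicted peptide has a calculable mass, so a set of observed peptide masses can be compared against the digests of all proteins in a genome to identify the co-purified partners.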
45
Experimental approaches to protein interactions II
A variant: Tandem affinity purification (TAP) + mass spectrometry
 - Bait carries two tags (Tag1, Tag2)
 - Purify on affinity column 1, release with protease, then purify on affinity column 2
 - Co-purified partners (protein 1, protein 2, ...) are separated by SDS-PAGE
 - Trypsin digest, identify peptides by mass spectrometry
Rigaut et al., Nature Biotech. 17: (1999)
46
Gavin et al. Nature 415 (2002)
47
How accurate are these high-throughput screens?
Can compare to known interactions, but these are incomplete
A different strategy is to identify properties that correlate with interactions & test versus those properties
Three tests:
1. Comparison of interactions to a reference interaction set
2. Comparison of mRNA co-expression of interacting partners
3. Comparison of functions of predicted interaction partners
48
Test #1
Estimate accuracy by comparing to a well-determined reference set of interactions
(tends to underestimate accuracy)
von Mering, Krause et al. Nature May 8, 2002
49
Test #2
Estimating interaction assay accuracy by assessing mRNA co-expression of putative interaction partners
 - Correlation coefficient between expression vectors derived from many DNA microarray experiments
 - Compare distributions for true interactions vs. random protein pairs
 - Estimate % false positives from observed vs. expected genes w/ correlated expression
Estimated false positive rates based on this test:
Mrowka et al. Genome Research 11: (2001)
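The co-expression measure can be illustrated with a short calculation. The slide does not say which correlation coefficient was used, so the sketch below assumes the Pearson coefficient; the function name and the expression values are made up for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Made-up expression levels of three genes across four microarray experiments
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]    # rises and falls with gene_a
gene_c = [8.0, 6.0, 4.0, 2.0]    # moves opposite to gene_a

r_ab = pearson(gene_a, gene_b)
r_ac = pearson(gene_a, gene_c)
```

Computing such a coefficient for every reported interacting pair, and again for random pairs, gives the two distributions whose difference is used to estimate the false positive rate.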
50
A related strategy: fit distribution of co-expression relationships as a mix of those from random & well-characterized interactions ==> mixture % indicates accuracy.
Deane, Salwinski et al. Mol. Cell. Proteomics (2002)
51
Estimated true positive rates based on this test
[Figure categories: genome-wide yeast two-hybrid; at least 1 small-scale expmt; >1 independent expmt; >2 independent expmt; paralogs also interact; increasing # of Interaction Sequence Tags]
Deane, Salwinski et al. Mol. Cell. Proteomics (2002)
52
Test #3
Estimate accuracy by measuring functional similarity of putative partners ==> in particular, measure tendency to be in same cellular system or process
From literature & pathway databases (KEGG/GO), the functions of many yeast proteins are known, e.g.:
 - Swi4: Cell cycle; MAPK signaling pathway
 - Cdc27: Cell cycle; Ubiquitin-mediated proteolysis
For proteins A and B with pathway sets pw1 (pathways of A) and pw2 (pathways of B):
Jaccard coefficient = # pathways in common / # total pathways
$$J = \frac{|pw_1 \cap pw_2|}{|pw_1 \cup pw_2|}$$
$$\langle \text{pathway similarity} \rangle = \frac{1}{n} \sum_{n\ \text{pairs}} \frac{|pw_1 \cap pw_2|}{|pw_1 \cup pw_2|}$$
Systematically test every pair of characterized proteins
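The Jaccard coefficient and its average over protein pairs are simple set operations. The sketch below uses the Swi4 and Cdc27 pathway annotations from the slide; the function names are hypothetical and the single-pair list is only for illustration.

```python
def jaccard(pathways_a, pathways_b):
    """# pathways in common / # total pathways for two annotation sets."""
    a, b = set(pathways_a), set(pathways_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def mean_pathway_similarity(pairs, annotations):
    """Average Jaccard coefficient over a list of (protein, protein) pairs."""
    scores = [jaccard(annotations[p], annotations[q]) for p, q in pairs]
    return sum(scores) / len(scores)

annotations = {
    "Swi4": {"Cell cycle", "MAPK signaling pathway"},
    "Cdc27": {"Cell cycle", "Ubiquitin-mediated proteolysis"},
}

similarity = jaccard(annotations["Swi4"], annotations["Cdc27"])
average = mean_pathway_similarity([("Swi4", "Cdc27")], annotations)
```

Swi4 and Cdc27 share one pathway out of three in total, so their coefficient is 1/3; averaging this quantity over all reported pairs in a data set gives the pathway similarity score for that data set.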
53
Quality of the observed protein-protein interactions
as measured by the pathway overlap test
[Figure labels: max agreement of interacting proteins’ pathways; small-scale experiments; large-scale yeast two-hybrid interaction experiments]
Date & Marcotte, Nature Biotech. 2003
54
The various accuracy tests agree to a first approximation
(at least as regards the ranking of accuracies)
Estimated True Positive Rate via: Co-expression (Mrowka, Deane), Test set (vonMering), Pathways (Date)
Authors, Method, # interactions, then estimates in the order Mrowka, Deane, vonMering, Date:
Ito et al. Y2H % 22% % ~18%
Ho et al. MS ~3617 ~10% 1-3%
Gavin et al. MS ~1440 ~85% ~10%
Uetz et al. Y2H % 50% ~57%
Tong et al. synthetic lethal ~20%
>1 independent experiment ~ % ~30-40% ~87%
>2 independent experiments % ~60-70% ~95%
55
The current highest throughput protein interaction screens:
Authors, Method, # interactions
Yeast
 - Ito et al., Y2H
 - Tong et al., SL, ~4000
 - Ho et al., MS, ~3617
 - Gavin et al., MS, ~1440
 - Uetz et al., Y2H
 - Fromont-Racine et al., Y2H
 - Tong et al., SL
 - Newman et al., Y2H
C. elegans
 - Li et al., Y2H, ~4000
 - Walhout et al., Y2H
 - Davy et al., Y2H
Fly
 - Giot et al., Y2H, 20,405
Human
 - Bouwmeester et al., MS
& several others, including Hepatitis C & H. pylori
Y2H = yeast two hybrid; MS = mass spectrometry; SL = synthetic lethal
56
How many meaningful physical protein-protein interactions are there?
At a rough estimate:
Yeast
 - ~5,800 genes
 - ~5,800 proteins x 2-10 interactions/protein ==> ~12,000-60,000 interactions
 - >10-20,000 known, perhaps ~1/2 correct ==> ~1/3 of the way to a complete map!
Human
 - ~40,000 genes
 - >>40,000 proteins x 2-10 interactions/protein ==> >>80,000-400,000 interactions
 - <5,000 known ==> approx. 1% of the complete map!
==> We’re a long way from the complete map of the human “interactome”
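The rough estimate is just a product of protein count and interactions per protein, which the sketch below reproduces. It follows the slide’s arithmetic literally (it does not halve the product to avoid counting each pair twice), uses the slide’s protein counts, and the function name is hypothetical.

```python
def interaction_range(n_proteins, per_protein=(2, 10)):
    """Low and high estimates: proteins x interactions per protein."""
    low, high = per_protein
    return n_proteins * low, n_proteins * high

yeast_low, yeast_high = interaction_range(5_800)     # ~5,800 yeast proteins
human_low, human_high = interaction_range(40_000)    # lower bound of ~40,000 human proteins
```

For yeast this gives roughly 12,000 to 58,000 interactions; for human the result is only a lower bound, since the slide takes the number of proteins to be much larger than the number of genes.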
57
Can we relate these interactions back to the protein structure?
==> A growing area of research is the combination of low resolution structure with atomic models to build structures of protein complexes
For example: low resolution electron density map from electron microscopy or electron tomography + experimental or computational protein models ==> rough estimate of atomic model of protein complex
58
Example 1 - Electron microscopy of a protein complex
Experimental electron microscopy data ==> reconstructed electron density map of protein complex ==> dock atomic models into electron density maps
Sali et al. Nature 422: (2003)
59
Example 2 - Electron tomography of a protein complex/assembly
Measure projections of molecules after illuminating with electron beam from different angles Reconstruct density distribution (“tomogram”) as sum of back-projected densities Sali et al. Nature 422: (2003)
60
Reconstructing cellular organization of molecular complexes by
fitting structures into electron tomograms “noisy” tomogram (3D density map) of single cell Fit known structures (“templates”) into density Sali et al. Nature 422: (2003)
61
Some Protein Interaction Resources on the Internet
Protein interaction databases
 - Biomolecular Interaction Network Database (BIND): currently 73,000 interactions
 - Database of Interacting Proteins (DIP): currently 44,000 interactions
 - Protein Quaternary Structure database (PQS): atomic structures of interacting proteins
Interactive visualization of networks
 - Cytoscape: interactive display of protein networks
 - LGL (Large Graph Layout): visualization of networks with up to millions of edges, 100,000’s of vertices