Download presentation
Presentation is loading. Please wait.
Published byPosy Davidson Modified over 9 years ago
1
Modeling the Structures of Macromolecular Assemblies
Frank Alber Maya Topf Andrej Šali Get the 3 Aims slide. That is the direction/spirit for the talk. Do I quote lots of our papers????!!!! Ask people. Improve "overall model accuracy" with better backbone plots; list error types on it. Get the latest Chothia & Lesk plot from Eashwar. Get the latest numbers for the TrEMBM run, pie chart, density plot. Latest ModBase slide - perhaps from the ModView paper. Something better on LigBase - more info; on the slide. Have 3 slides on the BRCA1 BRCT domain. Get the NPC slides from Mike & Frank! Write the bullet points for each slide. Think globally. Maybe have a text-based slide with steps in ModPipe, with references, indicating a lot of our methods work. Depts. of Biopharmaceutical Sciences and Pharmaceutical Chemistry California Institute for Quantitative Biomedical Research University of California at San Francisco U C S F
2
Contents Modeling of assemblies by optimization (5’, Andrej)
Representation for modeling, illustrated by NPC (15’, Frank) Comparative modeling (10’, Andrej) Combined comparative modeling and density fitting (10’, Maya)
3
Acknowledgments http://salilab.org Our group/assemblies Frank Alber
Maya Topf Fred Davis Dmitry Korkin M.S. Madhusudhan Damien Devos Marc A. Martí-Renom Mike Kim Bino John Narayanan Eswar Ursula Pieper Min-yi Shen EM Wah Chiu Matt Baker Wen Jiang Chandrajit Bajaj E. coli ribosome H.Gao J.Sengupta M.Valle A.Korostelev S.Stagg P.Van Roey R.Agrawal S.Harvey M.Chapman J.Frank Yeast NPC Tari Suprapto Julia Kipper Wenzhu Zhang Liesbeth Veenhoff Sveta Dokudovskaya Mike Rout Brian Chait NIH NSF Sinsheimer Foundation A. P. Sloan Foundation Burroughs-Wellcome Fund Merck Genome Res. Inst. Mathers Foundation I.T. Hirschl Foundation The Sandler Family Foundation SUN IBM Intel Structural Genomix Structural Genomics Stephen Burley (SGX) John Kuriyan (UCB) NY-SGRC
4
Protein and assembly structure by experiment and computation
Sali, Earnest, Glaeser, Baumeister. From words to literature in structural proteomics. Nature 422, , 2003.
5
Determining the structures of proteins and assemblies
Use structural information from any source: measurement, first principles, rules, resolution: low or high resolution to obtain the set of all models that are consistent with it.
6
Modeling proteins and macromolecular assemblies
by satisfaction of spatial restraints Representation of a system. Scoring function (spatial restraints). Optimization. There is nothing but points and restraints on them. Sali, Earnest, Glaeser, Baumeister. Nature 422, , 2003.
7
Assembly representation for modeling (visualization)
Multi-scale representation: Resolution of subunit structures can vary largely. Hierarchical representation: Spatial information about subunit-subunit interactions may be available at different resolutions. Flexibility of subunits: Subunit structures may be subject to conformational changes upon assembly formation. Points and restraints between them (distance, angle dihedral restraints).
8
Software implementation
Assembler, extension of MODELLER Provides a framework for assembly modelling: multi-scale hierarchical representations, integration of experimental input data, calculates models that are consistent with input data.
9
Resolution of restraints
300Å 1Å Cryo-EM/ tomography Site-directed mutagenesis Cross-linking FRET Binding site predictions Computational docking Correlated mutations Site-directed mutagenesis Cross-linking FRET Binding site predictions Computational docking Cryo-EM/ tomography Tandem affinity purification FRET Site-directed mutagenesis Binding site predictions Computational docking Cryo-EM/ tomography Cross-linking Immuno-EM Yeast-two-hybrid Tandem affinity purification FRET Site-directed mutagenesis Binding site predictions Computational docking Cryo-EM/ tomography Cross-linking Cross-linking FRET Site-directed mutagenesis Computational docking Binding site predictions Correlated mutations
10
Resolution of restraints
300Å 1Å Cross-linking Cross-linking Cross-linking Cross-linking Cross-linking Cryo-EM/ tomography FRET FRET FRET FRET FRET Site-directed mutagenesis Site-directed mutagenesis Site-directed mutagenesis Site-directed mutagenesis Site-directed mutagenesis Computational docking Cryo-EM/ tomography Cryo-EM/ tomography Computational docking Computational docking Binding site predictions Binding site predictions Binding site predictions Cryo-EM/ tomography Cryo-EM/ tomography Cryo-EM/ tomography Tandem affinity purification Tandem affinity purification Immuno-EM Yeast-two-hybrid
11
Properties of points Interaction centers:
Binding patches Define interaction centers at each structural level. Interaction centers transmit interactions between assembly subunits.
12
Properties of points Interaction centers: Subunit flexibility: q
Allow or restrict conformational degrees of freedom within assembly subunits. Subunit flexibility: q Define interaction centers at each structural level. Interaction centers transmit interactions between assembly subunits. Design alternative internal restraints sets for different conformational states.
13
Modeling the Yeast Nuclear Pore complex by satisfaction of spatial restraints
Andrej Sali Frank Alber, Damien Devos Mike Rout, T. Suprapto, J. Kipper, L. Veenhoff, S. Dokudovskaya Brian Chait, W. Zhang Depts. Of Biopharmaceutical Sciences and Pharmaceutical Chemistry California Institute for Quantitative Biomedical Research University of California at San Francisco The Rockefeller University, 1230 York Avenue, New York
14
Integrate spatial information
NUP Stoichiometry NUP Localization NUP- NUP Interactions Symmetry Global shape NUP Shape 2) 3) 1)
15
Successful optimization
16
Analysis of models Assessing the well scoring models:
How similar are the models to each other? Do the models make sense given other data? Search for conservation of : Protein-protein contacts Structural features Visualization of the results: Probability distribution of subunit positions (density maps)
17
Comparative modeling
18
Steps in Comparative Protein Structure Modeling
Template Search TEMPLATE START TARGET ASILPKRLFGNCEQTSDEGLKIERTPLVPHISAQNVCLKIDDVPERLIPERASFQWMNDK No Target – Template Alignment MSVIPKRLYGNCEQTSEEAIRIEDSPIV---TADLVCLKIDEIPERLVGE ASILPKRLFGNCEQTSDEGLKIERTPLVPHISAQNVCLKIDDVPERLIPE Model Building OK? Model Evaluation END Yes Main steps in comparative modeling, all approaches: 1) Threading can be used, others, in step 1. A. Šali, Curr. Opin. Biotech. 6, 437, 1995. R. Sánchez & A. Šali, Curr. Opin. Str. Biol. 7, 206, 1997. M. Marti et al. Ann. Rev. Biophys. Biomolec. Struct., 29, 291,
19
Comparative modeling by satisfaction of spatial restraints MODELLER
3D GKITFYERGFQGHCYESDC-NLQP… SEQ GKITFYERG---RCYESDCPNLQP… 1. Extract spatial restraints F(R) = P pi (fi /I) i 2. Satisfy spatial restraints Comparative Modeling by Satisfaction of Spatial Restraints. Two points: Similar to NMR structure refinement formally. We have invented a formalism that allows us to combine physical and evolutionary information about a given protein sequence in a flexible and relatively accurate manner. A. Šali & T. Blundell. J. Mol. Biol. 234, 779, 1993. J.P. Overington & A. Šali. Prot. Sci. 3, 1582, 1994. A. Fiser, R. Do & A. Šali, Prot. Sci., 9, 1753, 2000.
20
Comparative modeling of the TrEMBL database
Unique sequences processed: 1,182,126 Sequences with fold assignments or models: 659,495 (56%) 70% of models based on <30% sequence identity to template. On average, only a domain per protein is modeled (an “average” protein has 2.5 domains of 175 aa). 9/15/ ~4 weeks on 660 Pentium CPUs
21
Pieper et al., Nucl. Acids Res
22
Structural Genomics Sali. Nat. Struct. Biol. 5, 1029, 1998. Sali et al. Nat. Struct. Biol., 7, 986, 2000. Sali. Nat. Struct. Biol. 7, 484, 2001. Baker & Sali. Science 294, 93, 2001. Characterize most protein sequences based on related known structures. Characterize most protein sequences based on related known structures. The number of “families” is much smaller than the number of proteins. Any one of the members of a family is fine. CM is an important technique, partly because it is an intrinsic part of structural genomics. Organize the universe into groups, get members of the groups by experiment. Model the rest. And most proteins, even after structural genomics, will be models, not experimentally determined structures. There are ~16,000 30% seq id families (90%) (Vitkup et al. Nat. Struct. Biol. 8, 559, 2001).
23
Comparative modeling and EM density fitting
24
E. coli ribosome H.Gao, J.Sengupta, M.Valle, A. Korostelev, N.Eswar, S.Stagg, P.Van Roey, R.Agrawal, S.Harvey, A.Sali, M.Chapman, and J.Frank. Cell 113, , 2003. Cryo-EM density map of the 70S ribosome of E. coli was determined at ~12Å. Calculate comparative models of the proteins and rRNA based on T. thermophilus, H. marismortui, and D. radiodurans ribosome structures. For each protein, select the best model among several alternatives based on model assessment by statistical potentials, model compactness, target-template sequence identity, quality of template structure, and fit into the EM map. Obtain an assembly model by multi-rigid body real-space refinement of the fit into the EM map, by RSRef (M. Chapman). Repeat 3 and 4 iteratively to get an optimized model.
25
E. coli ribosome Fitting of comparative models into 12Å cryo-electron density map. All 29 proteins in 30S and 29 of 36 proteins in the 50S subunit could be modeled on 22-73% seq.id. to a known structure. The models generally cover whole folds, but tails. H.Gao, J.Sengupta, M.Valle, A. Korostelev, N.Eswar, S.Stagg, P.Van Roey, R.Agrawal, S.Harvey, A.Sali, M.Chapman, and J.Frank. Cell 113, , 2003.
26
Errors in comparative models vs. resolution
Incorrect template Rigid body movement Misalignment Region without a template Distortion/shifts in aligned regions Sidechain packing 20Å 10Å 2Å Application of comparative modeling to proteins of known structure identifies the following five types of errors in comparative models. Template selection (<25% sequence identity). Misalignments (<35% sequence identity). Loop modeling, shifts, sidechain modeling (whole range). Maps courtesy of Wah Chiu.
27
Moulding: iterative alignment, model building, model assessment
B. John, A. Sali. Nucl. Acids Res. 31, 3982, 2003. Alignments Models per alignment Comparative modeling Threading Moulding 1 104 1030 105 model building alignment model assessment Idea: Start with any alignments, and then build models based on them, assess models, improve alignments in the next iteration based on assessment, and so on. Based on assessement of 3D models as opposed to alignment score. Another perspective, comparison with threading and CM. Both of these methods limit the phase space explored in the search for the best model, but both are extremes. They existed for many years, but nobody tried to generalize and explore in-between, a better balance between exploring the conformational and alignment degrees of freedom in the search for the best model. Practical considerations suggest on the order of 10,000 alignments.
28
Moulding A more detailed scheme. GA for alignment evolution. MODELLER for model building, and a variety of criteria for model assessment. alignment model building model assessment
29
Application to a difficult modeling case 1BOV-1LTS (4
Application to a difficult modeling case 1BOV-1LTS (4.4% sequence identity) initial Ca RMSD 3.6Å final Here, illustrated visually. Starting model is useless. The panel on the right will show the evolution, resulting in a much better model. Ca RMSD 10.1Å 1lts structure 1lts model
30
Given a sequence-structure pair and an EM map: Exploring alignment (CM) and rigid body (EM) degrees of freedom 1. Generate a set of comparative models by a standard procedure. 2. Select the model that fits the EM map best. More generally: Use the quality of the best fit into the EM map as one of the model fitness criteria in MOULDER. With Wah Chiu et al.
31
helixhunter, sheehunter
Moulding protocol S fold assignment helixhunter, sheehunter PROF, PSIPRED Initial alignment MODELLER alignment MODELLER model building When modeling proteins that are related at less than 30% sequence identity, the alignment errors become frequent. In addition, it is often difficult to assign a fold to the target sequence using sequence-based search methods. Moulding: iterative alignment, model building, model fitting, model assessment. Idea: Start with any alignments, and then build models based on them, assess models, improve alignments in the next iteration based on assessment, and so on. Based on assessment of 3D models as opposed to alignment score. Fold assignment and initial alignments for MOULDER will be obtained by using the secondary structure assignments from cryo-EM maps, correlated with the secondary structure predictions based on sequence alone. foldhunter, EM docking algorithm model fitting model building alignment model assessment GA341 score PROSA score EM fitting score model assessment E
32
Application: Rice Dwarf Virus (6.8Å EM)
Fold assignment: HELIXHUNTER and DEJAVU - 2 templates: Upper domain (beta sheet) - bluetongue virus VP7 (1bvp) Lower domain (helical) - rotavirus VP6 (1qhd) Generate initial alignments : HELIXHUNTER and secondary structure prediction methods MOULDING: Iterative alignment - restrained by EM data Model building with MODELLER, loop optimization Model assessment (GA341 score and PROSA score) Model fitting (FOLDHUNTER) With Wah Chiu et al.
33
Upper domain - good fit loop optimization: EM-derived restraints + MM forcefield
34
MODELLER-derived restraints
Lower domain - to improve fit best fitted structure model fitting EM-derived restraints
35
END
36
Acknowledgments http://salilab.org Comparative Modeling Bino John
Narayanan Eswar Ursula Pieper Marc A. Martí-Renom Ribosomes H.Gao J.Sengupta M.Valle A.Korostelev S.Stagg P.Van Roey R.Agrawal S.Harvey M.Chapman J.Frank Structural Genomics Stephen Burley (SGX) John Kuriyan (UCB) NY-SGRC NIH NSF Sinsheimer Foundation A. P. Sloan Foundation Burroughs-Wellcome Fund Merck Genome Res. Inst. Mathers Foundation I.T. Hirschl Foundation The Sandler Family Foundation SUN IBM Intel Structural Genomix LSD Patsy Babbitt, Fred Cohen, Ken Dill, Tom Ferrin, John Irwin, Matt Jacobson, Tanja Kortemme, Tack Kuntz, Brian Shoichet, Chris Voigt
37
Analysis Assessing the well scoring models
How similar are the models to each other? Do the models make sense given other data? Using “toy” models as benchmarks. Score
38
Analysis Search conservation of : Protein-protein contacts
Structural features frequency NUP type
39
E. coli ribosome H.Gao, J.Sengupta, M.Valle, A. Korostelev, N.Eswar, S.Stagg, P.Van Roey, R.Agrawal, S.Harvey, A.Sali, M.Chapman, and J.Frank. Cell 113, , 2003. … Calculate comparative models of the proteins and rRNA based on T. thermophilus, H. marismortui, and D. radiodurans ribosome structures. For each protein, select the best model among several alternatives based on model assessment by statistical potentials, model compactness, target-template sequence identity, quality of template structure, and fit into the EM map. Obtain the final assembly model by rigid body real-space refinement of the fit into the EM map, by RSRef (M. Chapman). Repeat 3 and 4.
40
Combining Modeller and Docking of Atomic Models into EM Density Maps (ModEM?)
41
Improving resolution and accuracy of molecular models determined by cryo-EM and comparative protein structure prediction Map courtesy of Wah Chiu.
42
Typical errors in comparative models
Incorrect template MODEL X-RAY TEMPLATE Misalignment Region without a template Distortion/shifts in aligned regions Sidechain packing Application of comparative modeling to proteins of known structure identifies the following five types of errors in comparative models. Template selection (<25% sequence identity). Misalignments (<35% sequence identity). Loop modeling, shifts, sidechain modeling (whole range). Marti-Renom et al. Annu.Rev.Biophys.Biomol.Struct. 29, , 2000.
43
Properties of points Points: “Interaction centers”
Binding patches Define “interaction centers” at each structural level. Interaction centers transmit “interactions” between assembly subunits.
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.