Comparative Modeling for Beta Protein Structure Prediction Lenore J. Cowen Tufts University.


1 Comparative Modeling for Beta Protein Structure Prediction Lenore J. Cowen Tufts University

2 Amino Acids
A protein is composed of a central backbone and a collection of (typically) 50-2000 amino acids (a.k.a. residues). There are 20 different kinds of amino acids, each consisting of up to 18 atoms, e.g.,

Name            3-letter code   1-letter code
Leucine         Leu             L
Alanine         Ala             A
Serine          Ser             S
Glycine         Gly             G
Valine          Val             V
Glutamic acid   Glu             E
Threonine       Thr             T

3 Protein Structure
Protein sequence: DRVYIHPF (Asp Arg Val Tyr Ile His Pro Phe)
[Figure: chemical structure of the peptide, with each residue's side chain attached to a repeating backbone structure]
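The relationship between the three-letter and one-letter codes on slides 2 and 3 can be sketched as a simple lookup. This is a minimal illustration using the standard codes for the 20 amino acids; the function name `to_one_letter` is introduced here and is not from the talk.

```python
# Standard three-letter to one-letter amino acid codes (all 20 residues).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(residues):
    """Convert a list of three-letter codes into a one-letter sequence string."""
    return "".join(THREE_TO_ONE[r.capitalize()] for r in residues)


# The peptide from slide 3.
print(to_one_letter(["Asp", "Arg", "Val", "Tyr", "Ile", "His", "Pro", "Phe"]))
# prints DRVYIHPF
```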

4 Protein Folding Problem
Given an amino acid sequence, e.g., MDPNCSCAAAGDSCTCANSCTCLACKCTSCK, how will it fold in 3D? The fold is important because it determines the function of the protein.

5 Note: The pictures I’ve been giving are “cartoons” of the backbone

6 The Inverse Protein Folding Problem
Instead of taking a sequence and asking what its fold is, take a fold and ask for all the sequences that form that fold.
…VLWIXS…. …SSCILWG…

7 What do we mean by “that fold”?

8 SCOP (http://scop.mrc-lmb.cam.ac.uk/scop/)

9

10

11 Can we recognize and model all folds that form a beta-trefoil, etc.?
If they are evolutionarily close enough the answer is YES. Use BLAST to recognize homology (similar sequences have similar folds) and align conserved parts of the backbone.
…GVFIIIMGSHGK…
…GVD-LMG-HGR…

12 Comparative modeling
Once the backbone of the conserved core is fixed, pack in the sidechains. Add loops and unstructured regions.

13 Can we recognize and model all folds that form a beta-trefoil, etc.?
But STRUCTURE can be more CONSERVED than sequence; maybe the structures align but we can no longer use BLAST because the sequence similarity is too weak.
…GVFIIIMGSHGK…
…GR--CV-GCAGR…
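To make "sequence similarity is too weak" concrete, here is a small sketch that counts identical residues over the ungapped columns of a pairwise alignment. It is not how BLAST scores alignments; it is only a plain identity count. It assumes the second fragment on slide 13 is read with two consecutive gap characters after "GR", so that both rows have 12 columns. The function name `identity` is hypothetical.

```python
GAP = "-"


def identity(row_a, row_b):
    """Return (identical, compared) over columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same number of columns")
    compared = 0
    identical = 0
    for x, y in zip(row_a, row_b):
        if x == GAP or y == GAP:
            continue  # skip gapped columns
        compared += 1
        if x == y:
            identical += 1
    return identical, compared


# Slide 13 fragments, under the gap-reading assumption stated above.
same, total = identity("GVFIIIMGSHGK", "GR--CV-GCAGR")
print(f"{same} identical residues out of {total} compared columns")
```

Under that reading only three of the nine compared columns match, which is the kind of weak signal that sequence-only search is likely to miss.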

14 Comparative modeling
If you CAN find the correct alignment, you can proceed as before. Once the backbone of the conserved core is fixed, pack in the sidechains. Add loops and unstructured regions.

15 Approaches to Structural Motif Recognition
- Statistical template/profile methods (Altschul et al. 1990)
- Hidden Markov Models (Eddy, 1998)
- Threading Methods (Jones et al. 1992)
- Combinations of two or more of the above

16

17 Our Results Recognizing the Beta Helix and Beta Trefoil Folds

18 The Right-handed Parallel Beta-Helix
- A processive fold composed of repeated super-secondary units.
- Each rung consists of three beta-strands separated by turn regions.
- No sequence repeat.
[Figure: Pectate Lyase C (Yoder et al. 1993)]

19 Biological Importance of Beta Helices
- Surface proteins in human infectious disease: virulence factors, adhesins, toxins, allergens
- Proposed as a model for amyloid fibrils (e.g. Alzheimer’s and Creutzfeldt-Jakob)
- Virulence factors in plant pathogens

20 What was Known
Solved beta-helix structures: 12 structures in PDB in 7 different SCOP families
- Pectate Lyase: Pectate Lyase C, Pectate Lyase E, Pectate Lyase
- Galacturonase: Polygalacturonase, Polygalacturonase II, Rhamnogalacturonase A
- Pectin Lyase: Pectin Lyase A, Pectin Lyase B
- Chondroitinase B
- Pectin Methylesterase
- P.69 Pertactin
- P22 Tailspike

21 BetaWrap Program
[Bradley, Cowen, Menke, King, Berger, PNAS, 2001, 98:26, 14,819-14,824; Cowen, Bradley, Menke, King, Berger (2002), J Comp Biol, 9, 261-276]
Performance:
- On PDB: no false positives & no false negatives.
- Recognizes beta helices in PDB across SCOP families in cross-validation.
- Recognizes many new potential beta helices when run on larger sequence databases.
- Runs in linear time (~5 min. on SWISS-PROT).

22 Histogram of protein scores for:
- beta helices not in database (12 proteins)
- non-beta helices in PDB (1346 proteins)

23 Single Rung of a Beta Helix

24 3D Pairwise Correlations
- Stacking residues in adjacent beta-strands exhibit strong correlations
- Residues in the T2 turn have special correlations (Asparagine ladder, aliphatic stacking)
[Figure labels: B1, B2, B3, T2]

25 Question: how can we find these correlations which are a variable distance apart in sequence?

26 Finding Candidate Wraps
Assume we have the correct locations of a single T2 turn (fixed B2 & B3). Generate the 5 best-scoring candidates for the next rung.
[Figure labels: B2, B3, T2, Candidate Rung]

27 Scoring Candidate Wraps (rung-to-rung)
Rung-to-rung alignment score incorporates:
- Beta sheet pairwise alignment preferences taken from amphipathic beta structures in PDB (w/o beta helices).
- Additional stacking bonuses on internal pairs.
- Distribution on turn lengths.

28 Scoring Candidate Wraps (5 rungs)
Iterate out to 5 rungs generating candidate wraps.
Score each wrap:
- sum the rung-to-rung scores
- B1 correlations filter
- screen for alpha-helical content
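The search described on slides 26 to 28 can be sketched as a small beam search: start from one known rung, extend one rung at a time, keep the 5 best-scoring partial wraps, and stop at 5 rungs. This is an illustrative sketch, not the BetaWrap implementation. The scoring function, the hydrophobic residue class, the window width and the allowed rung spacing (18 to 30 residues) are all assumptions made for this example, and the B1 and alpha-helix filters are omitted.

```python
HYDROPHOBIC = set("AVILMFWY")  # toy residue class, an assumption for this sketch


def rung_score(seq, p, q, width=5):
    """Toy rung-to-rung score: stacked positions where both residues are hydrophobic."""
    return sum(
        1
        for k in range(width)
        if seq[p + k] in HYDROPHOBIC and seq[q + k] in HYDROPHOBIC
    )


def extend(seq, wrap, score, min_gap, max_gap, width):
    """Yield every one-rung extension of a partial wrap with its summed score."""
    last = wrap[-1]
    for gap in range(min_gap, max_gap + 1):
        nxt = last + gap
        if nxt + width <= len(seq):  # the new rung must fit inside the sequence
            yield wrap + [nxt], score + rung_score(seq, last, nxt, width)


def best_wraps(seq, start, rungs=5, keep=5, min_gap=18, max_gap=30, width=5):
    """Return up to `keep` wraps as (rung start positions, summed score) pairs."""
    beam = [([start], 0)]
    for _ in range(rungs - 1):
        candidates = [
            c for wrap, s in beam for c in extend(seq, wrap, s, min_gap, max_gap, width)
        ]
        if not candidates:
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = candidates[:keep]  # keep only the best-scoring partial wraps
    return beam


# Toy sequence with a hydrophobic stretch every 20 residues.
toy = ("VIVIV" + "G" * 15) * 6
for positions, total in best_wraps(toy, start=0):
    print(positions, total)
```

Summing the rung-to-rung scores follows slide 28; whether the original program prunes per rung or per wrap is not stated on the slides, so the pruning rule here is a design choice of the sketch.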

29 Predicted Beta Helices
Features of the 200 top-scoring proteins in the NCBI’s protein sequence database:
- Many proteins of similar function to the known beta-helices; some with similar sequences.
- A significant fraction are characterized as microbial outer membrane or cell-surface proteins.
- Mouse, human, worm and fly sequences significantly underrepresented – only two proteins!

30 Some Predicted Beta Helices in Human Pathogens

Organism                     Disease
Vibrio cholerae              Cholera
Helicobacter pylori          Ulcers
Plasmodium falciparum        Malaria
Chlamydia trachomatis        Venereal infection
Chlamydophila pneumoniae     Respiratory infection
Listeria monocytogenes       Listeriosis
Trypanosoma brucei           Sleeping sickness
Borrelia burgdorferi         Lyme disease
Leishmania donovani          Leishmaniasis
Bordetella bronchiseptica    Respiratory infection
Trypanosoma cruzi            Chagas disease
Bordetella parapertussis     Whooping cough
Bacillus anthracis           Anthrax
Rickettsia rickettsii        Rocky Mtn. spotted fever
Rickettsia japonica          Oriental spotted fever
Neisseria meningitidis       Meningitis
Legionella pneumophila       Legionnaire’s disease

31 The Beta-Trefoil
The beta-trefoil consists of three leaves around an axis of three-fold symmetry.
[Figure: a single leaf (strands B1, B2, B3, B4; turns T1, T2, T3), repeated x3 to form the entire trefoil (3 leaves) with its cap and barrel; structure 1BFF (Kitagawa et al. 1991)]

32 Templates
A leaf template consists of:
- a B1-strand, followed by
- a T1 turn of length 2 to 17, followed by
- a B2-strand, followed by
- a T2 turn of length 0 to 11, followed by
- a B3-strand, followed by
- a T3 turn of length 4 to 20, followed by
- a B4 strand.
In addition, it is between 26 and 64 residues long.
A trefoil template consists of three leaf templates separated by two T4 turns of length 0 to 16.
[Figure: cap template with labels B1, B2, B3, B4, T1, T2, T3]
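The template constraints on this slide translate directly into a validity check. The sketch below encodes the stated turn-length ranges and the leaf-length bound; the class and function names are hypothetical, and the strand lengths are left to the caller because the slide does not give them.

```python
from dataclasses import dataclass

# Inclusive length ranges as stated on the slide.
T1_RANGE = (2, 17)
T2_RANGE = (0, 11)
T3_RANGE = (4, 20)
T4_RANGE = (0, 16)
LEAF_RANGE = (26, 64)


def in_range(value, bounds):
    """True if value lies within the inclusive (low, high) bounds."""
    return bounds[0] <= value <= bounds[1]


@dataclass
class Leaf:
    """Segment lengths of one leaf, in residues, in sequence order."""
    b1: int
    t1: int
    b2: int
    t2: int
    b3: int
    t3: int
    b4: int

    def length(self):
        return self.b1 + self.t1 + self.b2 + self.t2 + self.b3 + self.t3 + self.b4


def leaf_is_valid(leaf):
    """Check the turn-length ranges and the overall leaf length."""
    return (
        in_range(leaf.t1, T1_RANGE)
        and in_range(leaf.t2, T2_RANGE)
        and in_range(leaf.t3, T3_RANGE)
        and in_range(leaf.length(), LEAF_RANGE)
    )


def trefoil_is_valid(leaves, t4_turns):
    """A trefoil is three valid leaves separated by two T4 turns."""
    return (
        len(leaves) == 3
        and len(t4_turns) == 2
        and all(leaf_is_valid(leaf) for leaf in leaves)
        and all(in_range(t, T4_RANGE) for t in t4_turns)
    )


leaf = Leaf(b1=5, t1=4, b2=5, t2=3, b3=5, t3=6, b4=5)
print(leaf.length(), leaf_is_valid(leaf), trefoil_is_valid([leaf] * 3, [2, 10]))
```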

33 What Pairs Do We Consider?
In both the barrel and the cap, we consider both directly aligned pairs of residues and pairs of residues one-off from each other. Different tables are used for pairwise preferences for buried, exposed, and one-off pairs of residues.
[Figure labels: B1, B2, B3, B4, T1, T2, T3]

34 Packing moves earlier in the modeling process
In order to produce more accurate sequence-structure alignments, we return several possible “wraps” and try to pack sidechains. Sidechain packing is thus used earlier in the comparative modeling process, where it also helps find the correct sequence-structure alignment.

35 The Packing Function
- Top wraps fed to packing function.
- SCWRL (Canutescu, 2003) is better at packing caps than barrels.
- Input to SCWRL:
  - Atomic coordinates of the backbone of cap strand pairs from a member of each trefoil superfamily in the training set.
  - Top 4 wraps of the target sequence onto the trefoil template.
- Return best-scoring wrap with a good packing, if one exists, else reject.
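The accept-or-reject rule on this slide can be sketched as follows. The callback `packs_well` is a hypothetical stand-in for running a sidechain packer on a wrap and checking the result for steric clashes; no real SCWRL interface is shown or implied.

```python
def select_wrap(scored_wraps, packs_well, top_n=4):
    """Return the best-scoring (wrap, score) among the top_n that packs well, else None.

    scored_wraps: list of (wrap, score) pairs, in any order.
    packs_well:   caller-supplied predicate, True when the wrap has a good packing.
    """
    ranked = sorted(scored_wraps, key=lambda ws: ws[1], reverse=True)[:top_n]
    for wrap, score in ranked:
        if packs_well(wrap):
            return wrap, score
    return None  # reject: none of the top wraps has a good packing


# Toy usage: wraps are just labels, and the packing check is a lookup.
candidates = [("w1", 3.0), ("w2", 9.0), ("w3", 7.0), ("w4", 1.0), ("w5", 0.5)]
good_packings = {"w3", "w5"}
print(select_wrap(candidates, lambda w: w in good_packings))
```

In the toy usage the highest-scoring wrap fails the packing check, so the second-ranked wrap is returned; a wrap outside the top 4 is never considered even if it would pack well.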

36 Example of the Packing Phase

Known cap (positions 1-5 and 6-10): LTSKD STILL
Partial PDB file from actual trefoil:

  ATOM 4340 N   LEU B 196 41.442 …
  ATOM 4341 CA  LEU B 196 40.705 …
  ATOM 4342 C   LEU B 196 40.704 …
  ATOM 4343 O   LEU B 196 41.787 …
  ATOM 4344 CB  LEU B 196 41.441 …
  ATOM 4345 CG  LEU B 196 41.503 …
  ATOM 4346 CD1 LEU B 196 41.902 …
  ATOM 4347 CD2 LEU B 196 40.155 …
  ATOM 4348 H   LEU B 196 42.299 …
  ATOM 4349 N   THR B 197 39.524 …
  ATOM 4350 CA  THR B 197 39.397 …
  ATOM 4351 C   THR B 197 38.506 …
  ATOM 4352 O   THR B 197 37.700 …
  ATOM 4353 CB  THR B 197 38.704 …
  ATOM 4354 OG1 THR B 197 39.307 …
  ATOM 4355 CG2 THR B 197 38.808 …
  ATOM 4356 H   THR B 197 38.752 …
  …

Cap from top wrap (positions 1-5 and 6-10): LRVYY RILHN
Predicted cap atomic positions (SCWRL):

  ATOM 1  N   LEU 1 41.442 …
  ATOM 2  CA  LEU 1 40.705 …
  ATOM 3  C   LEU 1 40.704 …
  ATOM 4  O   LEU 1 41.787 …
  ATOM 5  CB  LEU 1 41.412 …
  ATOM 6  CG  LEU 1 40.686 …
  ATOM 7  CD1 LEU 1 39.364 …
  ATOM 8  CD2 LEU 1 41.533 …
  ATOM 9  N   ARG 2 39.524 …
  ATOM 10 CA  ARG 2 39.397 …
  ATOM 11 C   ARG 2 38.506 …
  ATOM 12 O   ARG 2 37.700 …
  ATOM 13 CB  ARG 2 38.788 …
  ATOM 14 CG  ARG 2 39.658 …
  ATOM 15 CD  ARG 2 38.984 …
  ATOM 16 NE  ARG 2 39.799 …
  ATOM 17 CZ  ARG 2 39.404 …
  …

[Figure: cap strands B2 and B3 with residues numbered 1-10 and a steric clash marked; structure 1ABR (Tahirov et al. 1995)]

37 Toward Automation
For each SCOP beta-structural template:
* align all known examples of fold
* find pairs in conserved core
* thread onto template (additionally use profiles); find candidate alignments
Pack sidechains for each, determine best structure
Place loops and unstructured regions

38 Toward Automation
For each SCOP beta-structural template:
* align all known examples of fold
* find pairs in conserved core
* thread onto template (additionally use profiles); find candidate alignments
Pack sidechains for each, determine best structure
Place loops and unstructured regions

39 Multiple Structure Alignment for Remote Protein Homologs We spend the remainder of the talk discussing our new program for multiple structure alignment: MATT

40 The Multiple Structure Alignment Problem Input: atomic coordinates for the backbones of m protein structures Output: A sequence alignment of the protein structures, together with a superimposition of the structures in 3D space.

41 The Multiple Structure Alignment Problem
Def: the common core of a multiple structure alignment is the set of positions where every structure contributes a residue in the alignment.

42 The Multiple Structure Alignment Problem
Geometric criteria: Good multiple structure alignments MAXIMIZE common core size while MINIMIZING pairwise RMSDs between structures.
Note: even simplified versions are NP-hard (Goldman, Istrail and Papadimitriou, 1999)
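The two geometric quantities named on slides 41 and 42 can be computed from an alignment as in the sketch below. It assumes the structures have already been superimposed, and that each structure is given as one row of the alignment holding an (x, y, z) coordinate per column or `None` for a gap; this data layout and the function names are assumptions of the example.

```python
import math
from itertools import combinations


def common_core(rows):
    """Return the column indices where every structure contributes a residue."""
    n_cols = len(rows[0])
    return [c for c in range(n_cols) if all(row[c] is not None for row in rows)]


def rmsd(row_a, row_b, core):
    """RMSD between two already-superimposed structures over the core columns."""
    if not core:
        raise ValueError("common core is empty")
    total = 0.0
    for c in core:
        total += sum((a - b) ** 2 for a, b in zip(row_a[c], row_b[c]))
    return math.sqrt(total / len(core))


def average_pairwise_rmsd(rows, core):
    """Mean of the core RMSD over all pairs of structures."""
    pairs = list(combinations(rows, 2))
    return sum(rmsd(a, b, core) for a, b in pairs) / len(pairs)


# Three toy structures over four alignment columns.
alignment = [
    [(0, 0, 0), (1, 0, 0), None, (3, 0, 0)],
    [(0, 0, 1), (1, 0, 1), (2, 0, 1), (3, 0, 1)],
    [(0, 0, 0), None, (2, 0, 0), (3, 0, 0)],
]
core = common_core(alignment)
print(len(core), average_pairwise_rmsd(alignment, core))
```

The sketch only evaluates a given alignment; finding the alignment that trades off these two quantities is the hard part the slide refers to.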

43 The Multiple Structure Alignment Problem Discrimination criteria: Good multiple structure alignments align what is “supposed to be aligned” because it is part of the evolutionarily conserved core.

44 Approaches to Structure Alignment
- AFP chaining methods: align all short pieces and chain together using dynamic programming
- Contact map methods: look for similarities within distance matrices
- Geometric hashing, secondary structure elements, etc.
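The chaining step of an AFP (aligned fragment pair) method can be sketched with a simple quadratic dynamic program. This is a generic illustration, not the algorithm of any particular aligner: each AFP is taken to be a tuple `(i, j, length, score)` pairing a fragment starting at residue `i` of one structure with a fragment starting at residue `j` of the other, and a chain may only add an AFP that starts after the previous one ends in both structures. Real methods also score gaps and geometric consistency between fragments, which is omitted here.

```python
def chain_afps(afps):
    """Return (best total score, chain) for the highest-scoring ordered chain.

    afps: list of (i, j, length, score) tuples with length >= 1.
    """
    if not afps:
        return 0, []
    # Process AFPs by start position so every possible predecessor comes first.
    order = sorted(range(len(afps)), key=lambda k: (afps[k][0], afps[k][1]))
    best = {}  # best chain score ending at each AFP
    prev = {}  # predecessor of each AFP in that best chain
    for pos, k in enumerate(order):
        i, j, _, score = afps[k]
        best[k], prev[k] = score, None
        for m in order[:pos]:
            mi, mj, mlen, _ = afps[m]
            # m may precede k only if it ends before k starts in both structures.
            if mi + mlen <= i and mj + mlen <= j and best[m] + score > best[k]:
                best[k], prev[k] = best[m] + score, m
    end = max(best, key=best.get)
    chain = []
    k = end
    while k is not None:  # trace back from the best chain end
        chain.append(afps[k])
        k = prev[k]
    return best[end], chain[::-1]


fragments = [(0, 0, 5, 5.0), (5, 6, 5, 4.0), (3, 2, 5, 6.0), (10, 12, 5, 3.0)]
print(chain_afps(fragments))
```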

45 Some Popular Structure Aligners
- Dali (Holm 93)
- VAST (Bryant 96)
- LOCK (Singh 97)
- FlexProt (Shatsky et al. 02)
- FATCAT (Ye&Godzik 04)
- LOVOALIGN (Andreani et al. 06)
- CE/CE-MC (Shindyalov 2000)
- SSAP (Orengo&Taylor 96)
- MultiProt (Shatsky&Wolfson 04)
- POSA (Ye&Godzik 05)
- Mustang (Konagurthu et al. 06)
- CBA (Ebert 07)

46 The Benchmark Datasets
Globins
Homstrad
–1028 alignments
–Each alignment contains 2-41 structures
–399 sets with > 2 structures

47 The Benchmark Datasets
Sabmark
Superfamily set:
–3645 domains in 426 subsets
Twilight zone set:
–1740 domains in 209 subsets
Both sets contain:
–Between 3 and 25 structures
–Decoy structures (sequence matches that reside in different SCOP domains)

48 Matt: Multiple Alignment with Translation and Twists
Matt is an AFP chaining method that adds flexibility in the form of geometrically impossible bends and breaks.

49 Other work modeling flexibility
In structure alignment:
–Flexprot [Shatsky et al., 2002]
–Fatcat/POSA [Ye&Godzik, 2004, 2005]
For other reasons:
–Molecular docking [Echols et al., 03; Bonvin, 06]
–Ligand binding [Lemmen et al., 2006]
–Decoy construction [Singh&Berger, 2006]

50 Outline of the Matt Algorithm

51 Results on Sabmark (Superfamily)

Program Name   Avg. Core Size   Avg. RMSD
Multiprot      68.701           1.498
Mustang        104.162          4.146
Matt           104.692          2.639

52 Results on Sabmark (Twilight Zone)

Program Name   Avg. Core Size   Avg. RMSD
Multiprot      36.54            1.536
Mustang        66.833           5.035
Matt           66.967           2.916

53

54

55 Sabmark Decoy Set
For each SCOP superfamily, positive examples of the fold, and negative examples that are
–Random examples from a different superfamily
–Examples from a different superfamily that are nonetheless good BLAST hits

56

57 Toward Automation
For each SCOP beta-structural template:
* align all known examples of fold
* find pairs in conserved core
* thread onto template (additionally use profiles); find candidate alignments
Pack sidechains for each, determine best structure
Place loops and unstructured regions

58 On the Web
BetawrapPro for predicting beta-helices and beta-trefoils at: http://betawrappro.csail.mit.edu
Matt at: http://matt.csail.mit.edu OR http://matt.cs.tufts.edu

59 Acknowledgements
Matt Menke, Andrew McDonnell, Phil Bradley, Bonnie Berger, Jonathan King
National Science Foundation

