BCB 444/544 F07 ISU Dobbs #23 - Protein Tertiary Structure Prediction (10/15/07). BCB 444/544 Lecture 23: Protein Tertiary Structure Prediction (#23_Oct15)

1 10/15/07 BCB 444/544 F07 ISU Dobbs #23 - Protein Tertiary Structure Prediction. BCB 444/544 Lecture 23: Protein Tertiary Structure Prediction (#23_Oct15)

2 Required Reading (before lecture)
- Mon Oct 15 - Lecture 23: Protein Tertiary Structure Prediction (Chp 15, pp 214-230)
- Wed Oct 17 & Thurs Oct 18 - Lecture 24 & Lab 8 (Terribilini): RNA Structure/Function & RNA Structure Prediction (Chp 16, pp 231-242)
- Fri Oct 19 - Lecture 25: Gene Prediction (Chp 8, pp 97-112)

3 New Reading & Homework Assignment
ALL: HomeWork #4 (emailed & posted online Sat AM). Due: Mon Oct 22 by 5 PM (not Fri Oct 19).
Read: Ginalski et al. (2005) Practical Lessons from Protein Structure Prediction, Nucleic Acids Res. 33:1874-91. http://nar.oxfordjournals.org/cgi/content/full/33/6/1874 (PDF posted on website)
Although somewhat dated, this paper provides a nice overview of protein structure prediction methods and of how predicted structures are evaluated. Your assignment is to write a summary of this paper; for details see HW#4, posted online & sent by email on Sat Oct 13.

4 Seminars Last Week
Dr. Klaus Schulten (Univ of Illinois) - Baker Center Seminar: The Computational Microscope, 2:10 PM in E164 Lagomarcino
http://www.bioinformatics.iastate.edu/seminars/abstracts/2007_2008/Klaus_Schulten_Seminar.pdf
Check out links on Schulten's website (videos, etc): http://www.ks.uiuc.edu/~kschulte/
Great seminar: amazing simulations of dynamics in proteins and large macromolecular assemblies. Very computationally intensive, and a very impressive demonstration of the power of computation to produce insights not attainable using only experimental approaches.

5 Seminars this Week
BCB List of URLs for Seminars related to Bioinformatics: http://www.bcb.iastate.edu/seminars/index.html
- Oct 18 Thur - BBMB Seminar, 4:10 in 1414 MBB: Sachdev Sidhu (Genentech), Phage peptide and antibody libraries in protein engineering and ligand selection
- Oct 19 Fri - BCB Faculty Seminar, 2:10 in 102 ScI: Lyric Bartholomay (Ent, ISU), TBA

6 Protein Sequence & Structure: Analysis
- Diamond STING Millennium - many useful structure analysis tools, including Protein Dossier: http://trantor.bioc.columbia.edu/SMS/
- SwissProt (UniProt) - protein knowledgebase: http://us.expasy.org/sprot
- InterPro - sequence analysis tools: http://www.ebi.ac.uk/interpro

7 Chp 14 - Secondary Structure Prediction
SECTION V STRUCTURAL BIOINFORMATICS
Xiong: Chp 14 Protein Secondary Structure Prediction
- √ Secondary Structure Prediction for Globular Proteins
- √ Secondary Structure Prediction for Transmembrane Proteins
- √ Coiled-Coil Prediction

8 Where to Find "Actual" Secondary Structure? In the PDB

9 How Does Predicted Secondary Structure Compare with Actual? (An example)
Predicted - using 3 methods (from CMD server, Jernigan Group, ISU):
Query  MAATAAEAVASGSGEPREEAGALGPAWDESQLRSYSFPTRPIPRLSQSDPRAEELIENEE
GOR V  CCCCHHHHHHHHCCHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHCCCC
FDM    CCCCCCCCCCCCCCCCCEECCCCCCCCCHHHCCCCCCEECCCCCCCCCCHHHHHHHHCCC
CDM    CCCCHHHHHHCCCCCCCEECCCCCCCCCHHHCCCCCCEECCCCCCCCCCHHHHHHHHCCC
Actual - calculated from PDB coordinates by DSSP or assigned by author:
DSSP   [row not preserved]
Author [row not preserved]
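
One common way to quantify the comparison on this slide is per-residue three-state accuracy (often called Q3): the fraction of positions where the predicted state (H, E, or C) equals the actual state. The sketch below is a minimal illustration only; the function name and the two 10-residue strings are hypothetical and are not the slide's data.

```python
def q3_accuracy(predicted: str, actual: str) -> float:
    """Fraction of positions where the predicted state (H, E, C) equals the actual state."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual strings must have equal length")
    if not predicted:
        raise ValueError("strings must not be empty")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(predicted)


# Hypothetical 10-residue example (made up for illustration).
predicted = "CCHHHHCCEE"
actual = "CCHHHCCCEE"
print(f"Q3 = {q3_accuracy(predicted, actual):.2f}")
```

The same function could be applied to each predicted row (GOR V, FDM, CDM) against a DSSP row of the same length.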

10 Chp 15 - Tertiary Structure Prediction
SECTION V STRUCTURAL BIOINFORMATICS
Xiong: Chp 15 Protein Tertiary Structure Prediction
- Methods
- Homology Modeling
- Threading and Fold Recognition
- Ab Initio Protein Structural Prediction
- CASP

11 Structural Genomics - Status & Goal
- ~ 20,000 "traditional" genes in human genome (recall, this is fewer than the earlier estimate of 30,000)
- ~ 2,000 proteins in a typical cell
- > 4.9 million sequences in UniProt (Oct 2007)
- > 46,000 protein structures in the PDB (Oct 2007)
Experimental determination of protein structure lags far behind sequence determination!
Goal: Determine structures of "all" protein folds in nature, using a combination of experimental structure determination methods (X-ray crystallography, NMR, mass spectrometry) & structure prediction

12 Structural Genomics Project
TargetDB: Database of Structural Genomics Targets http://targetdb.pdb.org

13 Database of Theoretical Structures?
PMDB: Protein Model Database http://mi.caspur.it/PMDB/help.php
also, via NAR's Molecular Biology Database Collection: http://www.oxfordjournals.org/nar/database/summary/855
Theoretical structural models (predicted) are no longer accepted by the PDB (since 10/15/06), but it is possible to search for models deposited earlier: http://www.rcsb.org/pdb/search/searchModels.do

14 Protein Structure Prediction or Protein Folding Problem
"Major unsolved problem in molecular biology"
In cells, folding may be:
- spontaneous
- assisted by enzymes
- assisted by chaperones
In vitro: many proteins can fold to their "native" states spontaneously & without assistance, but many do not!

15 Deciphering the Protein Folding Code
Protein Structure Prediction or Protein Folding Problem: Given the amino acid sequence of a protein, predict its 3-dimensional structure (fold)
Inverse Folding Problem: Given a protein fold, identify every amino acid sequence that can adopt its 3-dimensional structure

16 Protein Structure Prediction
Structure is largely determined by sequence, BUT:
- Similar sequences can assume different structures
- Dissimilar sequences can assume similar structures
- Many proteins are multi-functional
2 Major Protein Folding Problems:
1- Determine folding pathway
2- Predict tertiary structure from sequence
Both are still largely unsolved problems

17 Steps in Protein Folding
1- "Collapse" - driving force is burial of hydrophobic aa's (fast - msecs)
2- Molten globule - helices & sheets form, but "loose" (slow - secs)
3- "Final" native folded state - compaction & rearrangement of secondary (2°) structures
Native state? - assumed to be the lowest free energy state - may be an ensemble of structures

18 Protein Dynamics
- Protein in native state is NOT static
- Function of many proteins requires conformational changes, sometimes large, sometimes small
- Globular proteins are inherently "unstable" (NOT evolved for maximum stability)
- Energy difference between native and denatured state is very small (5-15 kcal/mol) (this is equivalent to ~ 2 H-bonds!)
- Folding involves changes in both entropy & enthalpy

19 Difficulty of Tertiary Structure Prediction
- Folding or tertiary structure prediction problem can be formulated as a search for the minimum energy conformation
- Search space is defined by psi/phi angles of backbone and side-chain rotamers
- Search space is enormous even for small proteins!
- Number of local minima increases exponentially with number of residues
Computationally it is an exceedingly difficult problem!
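
To make "enormous even for small proteins" concrete, here is a rough back-of-the-envelope sketch in the spirit of Levinthal's argument. The values used (3 backbone states per residue and 1e13 conformations sampled per second) are illustrative assumptions, not figures from the lecture, and the function names are hypothetical.

```python
SECONDS_PER_YEAR = 3600 * 24 * 365


def conformation_count(n_residues: int, states_per_residue: int = 3) -> int:
    """Number of backbone conformations if each residue independently adopts one of k states."""
    return states_per_residue ** n_residues


def exhaustive_search_years(n_residues: int, states_per_residue: int = 3,
                            rate_per_second: float = 1e13) -> float:
    """Years needed to visit every conformation once at an assumed sampling rate."""
    seconds = conformation_count(n_residues, states_per_residue) / rate_per_second
    return seconds / SECONDS_PER_YEAR


for n in (10, 50, 100):
    print(f"{n:>3} residues: {conformation_count(n):.3e} conformations, "
          f"{exhaustive_search_years(n):.3e} years to enumerate")
```

The exponential growth is the point: adding residues multiplies, rather than adds to, the number of conformations, which is why exhaustive search is not an option.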

20 Tertiary Structure Prediction Methods
2 (or 3) Major Methods:
1. Comparative Modeling:
   - Homology Modeling (easiest!)
   - Threading and Fold Recognition (harder)
2. Ab Initio Protein Structural Prediction (really hard)

21 Comparative Modeling?
Comparative modeling - the term is sometimes used interchangeably with homology modeling, but is also sometimes used to cover both:
- homology modeling
- threading/fold recognition

22 Ab Initio Prediction
1. Develop energy function:
   - bond energy
   - bond angle energy
   - dihedral angle energy
   - van der Waals energy
   - electrostatic energy
2. Calculate structure by minimizing energy function, usually by Molecular Dynamics (MD) or Monte Carlo (MC)
Ab initio prediction - impractical for most real (long) proteins
- Computationally? Very expensive
- Accuracy? Usually poor for all except short peptides (but much improvement recently!)
Provides both folding pathway & folded structure
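
Step 2 (minimizing an energy function by Monte Carlo) can be illustrated with a toy Metropolis search. This is only a sketch under strong assumptions: the one-dimensional `toy_energy` function, the step size, and the temperature `kT` are all hypothetical and bear no relation to a real force field, which would sum the bond, angle, dihedral, van der Waals, and electrostatic terms listed above over all atoms.

```python
import math
import random


def toy_energy(phi: float) -> float:
    """Hypothetical 1-D energy with several local minima (not a real force field)."""
    return 1.0 + math.cos(3.0 * phi) + 0.5 * (phi - 1.0) ** 2


def metropolis_minimize(energy_fn, x0, steps=5000, step_size=0.3, kT=0.5, seed=0):
    """Metropolis Monte Carlo search; returns the lowest-energy point visited."""
    rng = random.Random(seed)
    x, e = x0, energy_fn(x0)
    best_x, best_e = x, e
    for _ in range(steps):
        x_new = x + rng.uniform(-step_size, step_size)
        e_new = energy_fn(x_new)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / kT):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


start = 3.0
best_x, best_e = metropolis_minimize(toy_energy, start)
print(f"start energy {toy_energy(start):.3f}; best energy found {best_e:.3f} at phi = {best_x:.3f}")
```

Accepting some uphill moves is what lets the search escape local minima, which matters because the number of local minima grows rapidly with protein length.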

23 Comparative Modeling
Provides folded structure only
Two types:
1) Homology modeling
2) Threading (fold recognition)
Both rely on availability of experimentally determined structures that are "homologous" or at least structurally very similar to target

24 Homology Modeling
1. Identify homologous protein sequences (PSI-BLAST)
2. Among available structures (in PDB), choose the one with closest sequence to target as template (can combine steps 1 & 2 by using PDB-BLAST)
3. Build model by placing target sequence residues in corresponding positions on homologous structure & refine by "tweaking" modeled structure (energy minimization)
Homology modeling - works "well"
- Computationally? "Relatively" inexpensive
- Accuracy? Higher sequence identity → better model
- Requires ~30% sequence identity with a sequence for which structure is known
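
Since template choice hinges on sequence identity, a small sketch of how percent identity might be computed from an existing pairwise alignment may help. Everything here is an assumption for illustration: the function names are hypothetical, the two aligned strings are made up, and identity is taken over aligned (non-gap) columns, which is only one of several conventions in use. The 30% default mirrors the rough threshold quoted on the slide.

```python
def percent_identity(aligned_target: str, aligned_template: str) -> float:
    """Percent identical residues over columns where neither sequence has a gap ('-')."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_target, aligned_template)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


def is_plausible_template(identity: float, threshold: float = 30.0) -> bool:
    """Apply the rough ~30% identity rule of thumb."""
    return identity >= threshold


# Hypothetical aligned pair (gap in the target at column 7).
target = "ALKKGF-HFD"
template = "ALRKGFTHYD"
identity = percent_identity(target, template)
print(f"{identity:.1f}% identity; usable template: {is_plausible_template(identity)}")
```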

25 Threading - Fold Recognition
Identify "best" fit between target sequence & template structure
1. Develop energy function
2. Develop template library
3. Align target sequence with each template in library & score
4. Identify top scoring template (1D to 3D alignment)
5. Refine structure as in homology modeling
Threading - works "sometimes"
- Computationally? Can be expensive or cheap, depending on energy function & whether "all atom" or "backbone only" threading is used
- Accuracy? In theory, should not depend on sequence identity (should depend on quality of template library & "luck")
- Usually, higher sequence identity to protein of known structure → better model

26 Threading: the Motivation
Basic premise: The number of unique structural folds in nature is fairly small (probably 2000-3000)
Statistics from Protein Data Bank (>46,000 structures): Prior to Structural Genomics Project, 90% of "new" structures submitted to PDB were similar to existing folds in PDB, suggesting that almost all folds in nature have been identified
Thus, chances for a protein to have a native-like structural fold in PDB are quite good
Note: Proteins with similar structural folds could be either homologs or analogs

27 Steps in Threading
Target sequence (ALKKGF…HFDTSE) vs structure templates:
1. Align target sequence with template structures in fold library (usually from the PDB)
2. Calculate energy score to evaluate "goodness of fit" between target sequence & template structure
3. Rank models based on energy scores
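
The three steps amount to a score-and-sort loop over the fold library. The sketch below shows only that outer loop; the names `rank_templates` and `toy_score` are hypothetical, the "templates" are short made-up strings standing in for structures, and the toy score is a placeholder for a real sequence-structure energy function.

```python
from typing import Callable, Dict, List, Tuple


def rank_templates(target: str, templates: Dict[str, str],
                   score_fn: Callable[[str, str], float]) -> List[Tuple[str, float]]:
    """Score the target against every template and sort by energy (lowest is best)."""
    scored = [(name, score_fn(target, template)) for name, template in templates.items()]
    return sorted(scored, key=lambda pair: pair[1])


def toy_score(target: str, template: str) -> float:
    """Hypothetical stand-in for an energy: minus the count of identical positions."""
    return -float(sum(a == b for a, b in zip(target, template)))


# Made-up library; real libraries hold structures, not strings.
templates = {"fold_A": "ALKRGF", "fold_B": "GGGGGG", "fold_C": "ALKKGF"}
ranking = rank_templates("ALKKGF", templates, toy_score)
for name, energy in ranking:
    print(name, energy)
```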

28 Threading Goal - & Issues
Goal: Find "correct" sequence-structure alignment of a target sequence with its native-like fold in template library (usually derived from PDB)
Issues:
- Structure database - must be "complete". Can't build a good model if there is no good template in library!
- Sequence-structure alignment algorithm: Bad alignment → bad score!
- Energy function or scoring scheme: Must distinguish correct sequence-fold alignment from incorrect sequence-fold alignments. Must distinguish "correct" fold from close decoys
- Prediction reliability assessment - How to determine whether predicted structure is correct? (or even close?)

29 Threading: Template database
- Build a database of structural templates, e.g., ASTRAL domain library derived from the PDB
- Sometimes, supplement with additional decoys, e.g., generated using an ab initio approach such as Rosetta (Baker)

30 Threading: Energy function
Two main methods (& combinations of these):
- Structural profile (environmental): based on physicochemical properties of amino acids
- Contact potential (statistical): based on contact statistics from PDB; a famous one: Miyazawa & Jernigan (ISU)

31 Protein Threading: Typical energy function
- How well does a specific residue fit its structural environment?
- What is the "probability" that two specific residues are in contact?
- Alignment gap penalty?
Total energy: $E_p + E_s + E_g$
Goal: Find a sequence-structure alignment that minimizes the energy function

32 A Local Example: Rapid Threading Approach for Protein Structure Prediction
Kai-Ming Ho, Physics: Haibo Cao, Yungok Ihm, Zhong Gao, James Morris, Cai-zhuang Wang
Drena Dobbs, GDCB: Jae-Hyung Lee, Michael Terribilini, Jeff Sander
Cao H, Ihm Y, Wang CZ, Morris JR, Su M, Dobbs D, Ho KM (2004) Three-dimensional threading approach to protein structure recognition. Polymer 45:687-697

33 Motivations for & Assumptions of Ho Threading Algorithm
Goal: Develop a threading algorithm that:
- Is simple & rapid enough to be used in high throughput applications
- Is relatively "insensitive" to sequence similarity between target protein sequence & sequence of template structure (to enhance detection of remote homologs & structures that are similar due to convergent evolution)
- Can be used to answer questions such as: What are predicted structures of all "unassigned" ORFs in Arabidopsis? Does Arabidopsis have a protein with structure similar to mammalian Tumor Necrosis Factor (TNF)?
Assumptions:
- Native state of a protein is lowest free energy state
- Hydrophobic interactions drive protein folding

34 Simplify: Template structure representation
Template structure → $N \times N$ contact matrix ($i, j = 1 \ldots N$):
$C_{ij} = 1$ if residues $i$ and $j$ are within the cutoff distance (in Å) (contact); $C_{ij} = 0$ otherwise, or if $i$ and $j$ are neighbors in sequence (non-contact)
(Yungok Ihm)
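
A contact matrix of this kind can be built directly from residue coordinates. The sketch below assumes one representative point per residue (for example a C-alpha position); the function name, the 6.5 Å default cutoff, and the toy coordinates are assumptions made for illustration, since the exact cutoff is not legible in the transcript.

```python
import math
from typing import List, Sequence


def contact_matrix(coords: Sequence[Sequence[float]], cutoff: float = 6.5,
                   min_separation: int = 2) -> List[List[int]]:
    """C[i][j] = 1 if residues i and j are closer than cutoff and not sequence neighbors."""
    n = len(coords)
    contacts = [[0] * n for _ in range(n)]
    for i in range(n):
        # Starting at i + min_separation skips self and adjacent residues (non-contacts).
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                contacts[i][j] = contacts[j][i] = 1
    return contacts


# Toy 4-residue "structure": corners of a 3.8 Å square (hypothetical coordinates).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
for row in contact_matrix(coords):
    print(row)
```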

35 Simplify: Target Sequence Representation
- Miyazawa-Jernigan (MJ) model: inter-residue contact energy M(i,j) is a quasi-chemical approximation based on pair-wise contact statistics extracted from known protein structures in the PDB: symmetric 20 x 20 matrix = 210 unique values ("letters")
- Li-Tang-Wingreen (LTW): factorize the MJ interaction matrix to reduce the number of parameters associated with amino acids from 210 to 20 q values
- Hydrophobic-Polar (HP): represent amino acids as either H (hydrophobic) or P (polar); Dill et al demonstrated the utility of this simple binary alphabet representation: 2 values
Compare results with 210 vs 20 vs 2 letter representations. How low can we go?
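
Two small checks may make these representations concrete: counting the unique entries of a symmetric 20 x 20 matrix, and reducing a sequence to the two-letter HP alphabet. The hydrophobic set used below is an assumption (HP classifications differ between authors), and the function names are hypothetical.

```python
from itertools import combinations_with_replacement

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumed hydrophobic set; published HP schemes classify some residues differently.
HYDROPHOBIC = set("ACFILMVWY")


def unique_pair_count(alphabet: str = AMINO_ACIDS) -> int:
    """Number of unordered residue pairs, i.e. unique entries in a symmetric matrix."""
    return len(list(combinations_with_replacement(alphabet, 2)))


def hp_encode(sequence: str) -> str:
    """Map each residue to H (hydrophobic) or P (polar)."""
    return "".join("H" if aa in HYDROPHOBIC else "P" for aa in sequence)


print(unique_pair_count())      # 20 * 21 / 2 unordered pairs
print(hp_encode("ALKKGF"))
```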

36 Simplify: Energy Function
At residue level, pair-wise hydrophobic interaction is dominant; an interaction "counts" only if two hydrophobic amino acid residues are in contact:
$E = \sum_{i,j} C_{ij} U_{ij}$
$C_{ij}$: contact matrix; $U_{ij} = U(\text{residue } i, \text{residue } j)$
- MJ: $U = U_{ij}$
- LTW: $U = Q_i Q_j$
- HP: $U \in \{1, 0\}$
(Yungok Ihm)
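
The sum over contacts can be written out in a few lines. The sketch below is an illustration under stated assumptions: each pair is counted once (i < j), the HP pair term is given a negative sign so that more hydrophobic contacts means lower energy, the hydrophobic set is assumed, and the contact matrix and sequence are toy data. A different `pair_energy` function (an MJ lookup or an LTW product of per-residue values) could be passed in the same way.

```python
from typing import Callable, List

# Assumed hydrophobic set for the HP model.
HYDROPHOBIC = set("ACFILMVWY")


def hp_pair_energy(a: str, b: str) -> float:
    """HP model: only hydrophobic-hydrophobic contacts contribute (sign is an assumption)."""
    return -1.0 if a in HYDROPHOBIC and b in HYDROPHOBIC else 0.0


def contact_energy(contacts: List[List[int]], sequence: str,
                   pair_energy: Callable[[str, str], float]) -> float:
    """E = sum over contacting pairs i < j of C[i][j] * U(residue i, residue j)."""
    n = len(sequence)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if contacts[i][j]:
                total += pair_energy(sequence[i], sequence[j])
    return total


# Toy contact matrix for a 4-residue template and a hypothetical threaded sequence.
contacts = [[0, 0, 1, 1], [0, 0, 0, 1], [1, 0, 0, 0], [1, 1, 0, 0]]
print(contact_energy(contacts, "LKAF", hp_pair_energy))
```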

37 Energy calculation: Contact energy
- Miyazawa-Jernigan (MJ) matrix: 210 parameters (statistical potential)
- Li-Tang-Wingreen (LTW): 20 parameters (~ solubility, ~ hydrophobicity)
[Figure: contact energy expression and an excerpt of the contact energy matrix for residues C, M, F, I, L, V, W; the equation and the layout of the numerical values are not recoverable from the transcript]
(Yungok Ihm)

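38 Summary of Ho Threading Procedure
- Template structure → contact matrix ($i, j = 1 \ldots N$): $C_{ij} = 1$ if $r_{ij}$ is within the cutoff distance (in Å); $C_{ij} = 0$ otherwise (or if $i$ and $j$ are neighbors in sequence)
- Sequence (AVFMRIHNDIVYNDIANTTQ) → sequence vector
- Contact matrix + sequence vector → contact energy (scoring function)
(Yungok Ihm)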

39 Can complexity be further reduced? Consider simplifying structure representation, too
- Sequence - Structure (1D - 3D problem)
- Sequence - Contact Matrix (1D - 2D problem)
- Sequence - 1D Profile (1D - 1D problem)
Sequence: ALKKGF…HFDTSE
(Haibo Cao)

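40 Examine eigenvectors of contact matrix
Hydrophobic contacts [equation not preserved in transcript]. Quantities defined on the slide (symbols not preserved):
- contact matrix
- protein sequence of the template structure
- i-th eigenvector of the contact matrix
- i-th eigenvalue of the contact matrix
- eigenvector with largest eigenvalue
- fraction of hydrophobic contacts from i-th eigenvector
(Haibo Cao)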

41 Represent contact matrix by its dominant eigenvector (1D profile)
- First eigenvector (with highest eigenvalue) dominates the overlap between sequence and structure
- Higher ranking (rank > 4) eigenvectors are "sequence blind"
(Haibo Cao)
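
For readers who want to see how a dominant eigenvector can be extracted from a contact matrix, here is a minimal power-iteration sketch in plain Python. It is an illustration only, not the authors' implementation: the function name is hypothetical, the diagonal shift is a standard trick added so the iteration converges even when the largest positive and negative eigenvalues have equal magnitude, and the 3-residue matrix is toy data.

```python
import math
from typing import List, Tuple


def dominant_eigenvector(matrix: List[List[float]],
                         iterations: int = 500) -> Tuple[float, List[float]]:
    """Power iteration on a symmetric matrix; returns (largest eigenvalue, unit eigenvector)."""
    n = len(matrix)
    # Shifting by the largest absolute row sum makes all eigenvalues of (A + shift*I) non-negative.
    shift = max(sum(abs(x) for x in row) for row in matrix)
    if shift == 0:
        raise ValueError("matrix has no contacts")
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient with the unshifted matrix gives the eigenvalue of A itself.
    av = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * av[i] for i in range(n))
    return eigenvalue, v


# Toy contact matrix: residue 1 contacts residues 0 and 2.
toy_contacts = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
value, profile = dominant_eigenvector(toy_contacts)
print(f"largest eigenvalue {value:.4f}; 1D profile {[round(x, 4) for x in profile]}")
```

The resulting vector assigns each template position one number, with larger entries for positions that participate in more (and better connected) contacts, which is what makes it usable as a 1D profile.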

42 Threading Alignment Step - now fast!
- Align target sequence vector (1D) with eigenvector profile of template structure (1D profile)
- Maximize the overlap between the sequence ($S$) and the profile ($P$), allowing gaps
- Calculate contact energy using the alignment: $E_c$
New profile: Cao et al, Polymer 45 (2004)
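
Aligning two 1D vectors while allowing gaps is a standard dynamic programming problem. The sketch below is a generic Needleman-Wunsch style recursion, offered as an assumption about how such a step could look rather than as the published algorithm: the per-column score is the product of the sequence value and the profile value, the gap penalty is a single uniform constant, and the numbers are toy data.

```python
from typing import List


def align_overlap(seq_vec: List[float], profile: List[float],
                  gap_penalty: float = 0.1) -> float:
    """Best global alignment score, where a matched pair (i, j) scores seq_vec[i] * profile[j]."""
    n, m = len(seq_vec), len(profile)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = -gap_penalty * i
    for j in range(1, m + 1):
        score[0][j] = -gap_penalty * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = score[i - 1][j - 1] + seq_vec[i - 1] * profile[j - 1]
            skip_seq = score[i - 1][j] - gap_penalty   # sequence residue left unaligned
            skip_prof = score[i][j - 1] - gap_penalty  # profile position left unaligned
            score[i][j] = max(match, skip_seq, skip_prof)
    return score[n][m]


# Toy data: hydrophobic residues (1.0) should land on high-valued profile positions.
print(align_overlap([1.0, 0.0, 1.0], [1.0, 1.0]))
```

A position-dependent penalty (higher inside helices or strands, lower in loops) could replace the constant `gap_penalty` without changing the recursion.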

43 Parameters for alignment?
- Gap penalty: Insertion/deletion in helices or strands is strongly penalized; smaller penalties for in/dels in loops. Gap penalties apply to alignment score only, not to energy calculation
- Size penalty: If a target residue and aligned template residue differ in radius by > 0.5 Å and if residue is involved in > 2 contacts, alignment is penalized. Size penalties apply to alignment score only, not to energy calculation
[Figure: loop and helix regions marked on sequence ALKKGFG…HFDTSE]
(Yungok Ihm)

44 How to incorporate secondary structure?
Predict secondary structure of target sequence (PSIPRED, PROF, JPRED, SAM, GOR V)
- $N_+$ = total number of matches between predicted secondary structure of target & actual secondary structure of template
- $N_-$ = total number of mismatches
- $N_s$ = total number of residues selected in alignment
"Global fitness": $f = 1 + (N_+ - N_-) / N_s$
$E_{mod} = f \cdot E_{threading}$
(Yungok Ihm)
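
The fitness factor is simple enough to compute directly. In the sketch below the function names and the two secondary-structure strings are hypothetical; the strings are assumed to cover only the aligned residues, so their length is taken as the number of residues selected in the alignment.

```python
def global_fitness(predicted_ss: str, template_ss: str) -> float:
    """f = 1 + (N_plus - N_minus) / N_s over the aligned residues."""
    if len(predicted_ss) != len(template_ss) or not predicted_ss:
        raise ValueError("need two non-empty strings of equal length")
    n_s = len(predicted_ss)
    n_plus = sum(p == t for p, t in zip(predicted_ss, template_ss))
    n_minus = n_s - n_plus
    return 1.0 + (n_plus - n_minus) / n_s


def modified_energy(e_threading: float, predicted_ss: str, template_ss: str) -> float:
    """E_mod = f * E_threading."""
    return global_fitness(predicted_ss, template_ss) * e_threading


# Hypothetical aligned states: 7 matches, 1 mismatch.
print(global_fitness("HHHHCCEE", "HHHCCCEE"))
print(modified_energy(-10.0, "HHHHCCEE", "HHHCCCEE"))
```

With this definition f ranges from 0 (no matches) to 2 (all matches), so a negative threading energy is made more favorable when predicted and template secondary structure agree.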

45 How much better is this "fit" than random?
$E_{relative} = E_{mod} - E_{shuffle}$
- $E_{mod}$: E score modified to reflect fit with predicted secondary (2°) structure
- $E_{shuffle}$: shuffled sequence vs structure; avg E score for the same sequence shuffled (randomized) many times
(Yungok Ihm)
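
The shuffled-sequence baseline can be sketched as follows. The function name, the number of shuffles, and the toy energy function are assumptions for illustration; in practice `energy_fn` would thread each shuffled sequence onto the same template and return its score.

```python
import random
from typing import Callable


def relative_energy(sequence: str, energy_fn: Callable[[str], float], e_mod: float,
                    n_shuffles: int = 100, seed: int = 0) -> float:
    """E_relative = E_mod minus the average energy of shuffled versions of the sequence."""
    rng = random.Random(seed)
    residues = list(sequence)
    total = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # same composition, randomized order
        total += energy_fn("".join(residues))
    e_shuffle = total / n_shuffles
    return e_mod - e_shuffle


def toy_energy(seq: str) -> float:
    """Hypothetical energy: rewards a hydrophobic residue at each of two 'buried' positions."""
    buried_positions = (0, 5)
    return -float(sum(seq[i] in "ACFILMVWY" for i in buried_positions))


native = "ALKKGF"
print(relative_energy(native, toy_energy, e_mod=toy_energy(native)))
```

A clearly negative relative energy suggests the native order fits the template better than random sequences of the same composition do.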

46 Performance Evaluation?
"Blind Test": CASP5 Competition (CASP7 is most recent)
(Critical Assessment of Protein Structure Prediction)
Given: Amino acid sequence
Goal: Predict 3-D structure (before experimental results published)

47 Typical Results (well, actually, our BEST results): HO = #1-Ranked CASP5 Prediction for this Target
Target 174, PDB ID = 1MG7
[Figure: actual structure vs predicted structure, panels T174_1 and T174_2]
(Cao, Ihm, Wang, Dobbs, Ho)

48 Overall Performance in CASP5 Contest: ~8th out of 180
FR Fold Recognition (targets manually assessed by Nick Grishin)
Rank  Z-Score  Ngood  Npred  NgNW  NpNW  Group-name
1     24.26    9.00   12.00  9     12    Ginalski
2     21.64    7.00   12.00  7     12    Skolnick Kolinski
3     19.55    8.00   12.50  9     14    Baker
4     16.88    6.00   10.00  6     10    BIOINFO.PL
5     15.25    7.00   7.00   7     7     Shortle
6     14.56    6.50   11.50  7     13    BAKER-ROBETTA
7     13.49    4.00   11.00  4     11    Brooks
8     11.34    3.00   6.00   3     6     Ho-Kai-Ming
9     10.45    3.00   5.50   3     6     Jones-NewFold
FR NgNW - number of good predictions without weighting for multiple models
FR NpNW - number of total predictions without weighting for multiple models
(M. Levitt, Stanford)

49 CASP - Check it out!
Critical Assessment of Protein Structure Prediction: http://predictioncenter.gc.ucdavis.edu/
CASP7 contest - 2006: http://www.predictioncenter.org/casp7/Casp7.html
Provides assessment of automated servers for protein structure prediction (LiveBench, CAFASP, EVA) & URLs for them
Related contests & resources:
- Protein Function Prediction (part of CASP)
- CAPRI = Critical Assessment of Predicted Interactions
- New: CASPM = CASP for M = Mutant proteins. Predict effects of small (point) mutations, e.g., SNPs

50 Another Convenient List of Links for Protein Prediction Servers
http://en.wikipedia.org/wiki/List_of_protein_structure_prediction_software

51 Chp 13 - Protein Structure Visualization, Comparison & Classification
SECTION V STRUCTURAL BIOINFORMATICS
Xiong: Chp 13 Protein Structure Visualization, Comparison & Classification
- Protein Structural Visualization
- Protein Structure Comparison
- Protein Structure Classification

52 Protein Structure Comparison Methods
3 Basic Approaches for Aligning Structures (see Xiong textbook for details):
1. Intermolecular
2. Intramolecular
3. Combined
But this is a very active research area, with many recent new methods
3 Popular Methods:
- DALI = Distance Matrix Alignment of Structures (Holm); FSSP Database
- SSAP = Sequential Structure Alignment Program (Orengo); CATH Database
- CE = Combinatorial Extension (Bourne)
- VAST at NCBI
URLs: http://en.wikipedia.org/wiki/Structural_alignment_software

53 Another local example: Combining Structure Prediction, Machine Learning & "Real" (wet-lab) Experiments to Investigate the Lentiviral Rev Protein: A Step Toward New HIV Therapies
- Susan Carpenter (Washington State Univ): Wendy Sparks, Yvonne Wannemuehler
- Drena Dobbs, GDCB: Jae-Hyung Lee, Michael Terribilini
- Kai-Ming Ho, Physics: Yungok Ihm, Haibo Cao, Cai-zhuang Wang
- Gloria Culver, BBMB: Laura Dutca

54 Macromolecular interactions mediated by Rev protein in lentiviruses (HIV & EIAV)
[Figure: provirus in the nucleus; pre-mRNA and spliceosome; early: regulatory proteins (Tat, Rev); late: structural proteins and progeny RNA; Rev activities labeled NUCLEAR IMPORT, MULTIMERIZATION (protein-protein), RNA BINDING (protein-RNA), NUCLEAR EXPORT between nucleus and cytoplasm]
(Susan Carpenter)

55 Rev is essential for lentiviral replication
- Rev is a small nucleocytoplasmic shuttling protein (HIV Rev 115 aa; EIAV Rev 165 aa)
- Recognizes a specific binding site on viral RNA: Rev Responsive Element (RRE)
- Interacts with CRM1 to export incompletely spliced viral RNAs from nucleus to the cytoplasm
- Specific domains of Rev mediate nuclear localization, RNA binding, and nuclear export
Critical role of Rev in lentiviral replication makes it an attractive target for antiviral (AIDS) therapy

56 Problem: no high resolution Rev structure!
- Not even for HIV Rev, despite intense effort ($$)
- Why?? Rev aggregates at concentrations needed for NMR or X-ray crystallography
What about insights from sequence comparisons?
- "Undetectable" sequence similarity among Revs from different lentiviruses (eg, EIAV vs HIV <10%)
But: We know that lentiviral Rev proteins are functionally "homologous", even in highly diverse lentiviruses

57 Hypothesis: Rev proteins from diverse lentiviruses share structural features critical for function
Approach:
- Computationally model structures of lentiviral Rev proteins, using structural threading algorithm (with Ho et al)
- Predict critical residues for RNA-binding, protein interaction, using machine learning algorithms (with Honavar et al)
- Test model and predictions, using genetic/biochemical approaches (with Carpenter & Culver) and biophysical approaches (with Andreotti & Yu groups)
Initially: focus on EIAV Rev & RRE

58 Functional domains: EIAV vs HIV Rev
[Figure: domain maps. HIV-1 Rev (residues 1-116): NES, NLS/RBM (RQARRNRRRRWR). EIAV Rev (residues 1-165; exon 1, exon 2; position 31 marked): NES, NLS, RBM, Folding?; motifs RRDRW, ERLE, KRRRK]
NES - Nuclear Export Signal; NLS - Nuclear Localization Signal; RBM - putative RNA Binding Motif

59 Predicted EIAV Rev Structure [Figure] (Yungok Ihm)

60 Comparison of Predicted Rev Structures
[Figure panels: EIAV, HIV, FIV, SIV Dimer, HIV Dimer]
(Yungok Ihm)

61 Predicted vs Experimental Structure of N-terminal region of HIV Rev
[Figure panels: A - Predicted Structure, HIV Rev N-terminus; B - NMR Structure, HIV Rev N-terminal Peptide (Battiste & Williamson); C - Overlay Alignment of Predicted & NMR Structures]
(Yungok Ihm)

62 Location of functional residues in EIAV Rev?
[Figure: predicted structure with putative RBM and NES labeled]
- Leu36, 45, 49: On surface, consistent with role in nuclear export
- Leu95 & Leu109: Buried in core, critical hydrophobic contacts for fold?
(Yungok Ihm)

63 Mutate hydrophobic residues predicted to be critical for helical packing in core
Residues: L65, L95, L109 (L65 vs L95 & L109)
- Single Ala mutation (L → A): negligible effect on Rev activity
- Single Asp mutation (L → D): inserts charged aa in hydrophobic core; dramatic change in Rev activity?
- Double Ala mutation (LL → AA): reduction in Rev activity?
Single mutants: Leu to Ala, Leu to Asp. Double mutants: Leu to Ala
(Yungok Ihm)

64 Functional Analysis of Rev Structural Mutants in vivo (CAT assay)
[Figure: Activity of Rev Structural Mutants; controls labeled Sham, RI, pcDNA3]
(Wendy Sparks)

65 Functional domains: EIAV vs HIV Rev
[Figure: domain maps with color key (Green, Red) for RNA interaction and protein interaction. HIV-1 Rev (residues 1-116): NES, NLS/RBM (RQARRNRRRRWR). EIAV Rev: NES, NLS, RBM, Folding?; motifs RRDRW, ERLE, KRRRK]
NES - Nuclear Export Signal; NLS - Nuclear Localization Signal; RBM - putative RNA Binding Motif

66 Putative RNA-binding Motifs & Predicted RNA-binding Residues Mapped onto Predicted EIAV Rev Structure
EIAV Rev sequence, residues 31-165 (putative motifs: RRDRW, ERLE, KRRRK):
 31 DPQGPLESDQ WCRVLRQSLP EEKISSQTCI
 61 ARRHLGPGPT QHTPSRRDRW IREQILQAEV LQERLEWRIR
101 GVQQVAKELG EVNRGIWREL
121 HFREDQRGDF SAWGDYQQAQ ERRWGEQSSP RVLRPGDSKR RRKHL
[Figure: predicted RNA-binding residues were marked "+" beneath the sequence and mapped onto the structure; the positions of the marks are not recoverable from the transcript]
(Michael Terribilini, Yungok Ihm)

67 Express & purify MBP-ERev deletion mutants
[Figure: gel with marker (60, 42, 30, 22) and lanes MBP, 1-165, 31-165, 31-145, 57-165, 57-145, 57-124, 125-165, 146-165; map of MBP-ERev constructs against the domain diagram (NES, RBM, NLS, Folding?) with positions 1, 31, 57, 125, 146, 165]
(Jae-Hyung Lee)

68 MBP-ERev binds specifically to RRE in vitro
[Figure: UV crosslinking and competition assays with 32P-RRE; lanes include sense and antisense RRE, undigested probe, no protein, BSA, MBP, MBP-ERev 1-165 and 31-165, with and without cold RRE]
(Jae-Hyung Lee)

69 EIAV Rev: Binding Predictions vs Experiments
PREDICTED: structure; protein binding residues; RNA binding residues. VALIDATED: protein binding residues; RNA binding residues.
[Figure: EIAV Rev sequence segments GPLESDQWCRVLRQSLPEEKISSQTCI, ARRHLGPGPTQHTPS RRDRW IREQILQAEVLQ ERLE WRI, and QRGDFSAWGDYQQAQERRWGEQSSPRVLRPGDS KRRRK HL, with predicted residues marked "+" (mark positions not recoverable); domain map (NES, FOLD?, NLS/RBM, RBM; positions 31, 57, 125, 145, 165) and MBP constructs WT, 31-165, 31-145, 57-165, 145-165]
Lee et al (2006) J Virol 80:3844; Terribilini et al (2006) PSB 11:415
(Jae-Hyung Lee)

70 Roles of Putative RNA Binding Motifs?
[Figure: EIAV Rev domain map (NES, RBD, NLS; positions 1, 31, 57, 124, 146, 165) with motifs RRDRW, ERLE, KRRRK and the substituted versions AADAA, AALA, ERDE, KAAAK]
(Jae-Hyung Lee)

71 Rev RNA Binding Motifs: Predicted vs Experiment
PREDICTED: structure; protein binding residues; RNA binding residues. VALIDATED: protein binding residues; RNA binding residues.
[Figure: same EIAV Rev sequence segments and "+" marks as the previous predictions slide (mark positions not recoverable); domain map (NES, FOLD?, NLS/RBM, RBM; positions 31, 57, 125, 145, 165) with constructs WT, AADAA, AALA, ERDE, KAAAK at motifs RRDRW, ERLE, KRRRK]
(Jae-Hyung Lee)

72 Summary: Predictions vs Experiments
[Figure: EIAV Rev sequence segments GPLESDQWCRVLRQSLPEEKISSQTCI, ARRHLGPGPTQHTPS RRDRW IREQILQAEVLQ ERLE WRI, and QRGDFSAWGDYQQAQERRWGEQSSPRVLRPGDS KRRRK HL, with predicted residues marked "+" (mark positions not recoverable); domain map (NES, FOLD, NLS/RBM, RBM; positions 31, 57, 125, 145, 165) with motifs RRDRW, ERLE, KRRRK]
Lee et al (2006) J Virol 80:3844; Terribilini et al (2006) PSB 11:415

73 Conclusions & Future Directions
Combination of computational & wet lab approaches revealed that:
- EIAV Rev has a bipartite RNA binding domain
- Two Arg-rich RBMs are critical: RRDRW in central region (but not ERLE), and KRRRK at C-terminus, overlapping the NLS
- Based on computational modeling, the RBMs are in close proximity within the 3-D structure of the protein
- Lentiviral Rev proteins & their cognate RRE binding sites may be more similar in structure than has been appreciated
Lee et al (2006) J Virol 80:3844; Terribilini et al (2006) PSB 11:415
Future:
- Computational: Use Rev-RRE model system to discover "predictive rules" for protein-RNA recognition
- Experimental?

74 Experimentally determine the structure of Rev-RRE complex!!!

75 Building "Designer" Zinc Finger DNA-binding Proteins
J Sander, P Zaback, F Fu, J Townsend, R Winfrey, D Wright, K Joung, L Miller, D Dobbs, D Voytas
Wright et al (2006) Nature Protocols; Sander et al (2007) Nucleic Acids Res

76 Chp 16 - RNA Structure Prediction
SECTION V STRUCTURAL BIOINFORMATICS
Xiong: Chp 16 RNA Structure Prediction (Terribilini)
- Introduction
- Types of RNA Structures
- RNA Secondary Structure Prediction Methods
- Ab Initio Approach
- Comparative Approach
- Performance Evaluation

