BCB 444/544 Lecture 23: Protein Tertiary Structure Prediction (#23_Oct15). 10/15/07, BCB 444/544 F07, ISU, Dobbs.

Presentation transcript:

Slide 1: BCB 444/544 Lecture 23 - Protein Tertiary Structure Prediction (#23_Oct15)

Slide 2: Required Reading (before lecture)
- Mon Oct 15 - Lecture 23: Protein Tertiary Structure Prediction (Chp 15)
- Wed Oct 17 & Thurs Oct 18 - Lecture 24 & Lab 8 (Terribilini): RNA Structure/Function & RNA Structure Prediction (Chp 16)
- Fri Oct 19 - Lecture 25: Gene Prediction (Chp 8)

Slide 3: New Reading & Homework Assignment
- ALL: HomeWork #4 (emailed & posted online Sat AM). Due: Mon Oct 22 by 5 PM (not Fri Oct 19)
- Read: Ginalski et al. (2005) Practical Lessons from Protein Structure Prediction, Nucleic Acids Res. 33 (PDF posted on website)
- Although somewhat dated, this paper provides a nice overview of protein structure prediction methods and of how predicted structures are evaluated.
- Your assignment is to write a summary of this paper. For details, see HW#4, posted online & sent by email on Sat Oct 13.

Slide 4: Seminars Last Week
- Dr. Klaus Schulten (Univ of Illinois) - Baker Center Seminar: The Computational Microscope, 2:10 PM in E164 Lagomarcino
- Check out links on Schulten's website (videos, etc.)
- Great seminar: amazing simulations of dynamics in proteins and large macromolecular assemblies
- Very computationally intensive; a very impressive demonstration of the power of computation to produce insights not attainable using only experimental approaches

Slide 5: Seminars this Week
- BCB List of URLs for Seminars related to Bioinformatics
- Oct 18 Thur - BBMB Seminar, 4:10 in 1414 MBB: Sachdev Sidhu (Genentech), Phage peptide and antibody libraries in protein engineering and ligand selection
- Oct 19 Fri - BCB Faculty Seminar, 2:10 in 102 ScI: Lyric Bartholomay (Ent, ISU), TBA

Slide 6: Protein Sequence & Structure: Analysis
- Diamond STING Millennium - many useful structure analysis tools, including Protein Dossier
- SwissProt (UniProt) - protein knowledgebase
- InterPro - sequence analysis tools

Slide 7: Chp 14 - Secondary Structure Prediction
SECTION V STRUCTURAL BIOINFORMATICS
Xiong: Chp 14 Protein Secondary Structure Prediction
√ Secondary Structure Prediction for Globular Proteins
√ Secondary Structure Prediction for Transmembrane Proteins
√ Coiled-Coil Prediction

Slide 8: Where to Find "Actual" Secondary Structure? In the PDB

Slide 9: How Does Predicted Secondary Structure Compare with Actual? (An example)
Predicted - using 3 methods (from CMD server, Jernigan Group, ISU):
Query MAATAAEAVASGSGEPREEAGALGPAWDESQLRSYSFPTRPIPRLSQSDPRAEELIENEE
GOR V CCCCHHHHHHHHCCHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHCCCC
FDM CCCCCCCCCCCCCCCCCEECCCCCCCCCHHHCCCCCCEECCCCCCCCCCHHHHHHHHCCC
CDM CCCCHHHHHHCCCCCCCEECCCCCCCCCHHHCCCCCCEECCCCCCCCCCHHHHHHHHCCC
Actual - calculated from PDB coordinates by DSSP, or assigned by the author:
DSSP
Author
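One simple way to put a number on "how does predicted compare with actual" is the fraction of residues whose predicted state (H, E, or C) matches the assigned state. The sketch below is only an illustration: the function name `agreement` and the two toy strings are hypothetical and are not the data from the slide.

```python
def agreement(predicted: str, actual: str) -> float:
    """Return the fraction of positions where the two state strings match."""
    if len(predicted) != len(actual):
        raise ValueError("state strings must have equal length")
    if not predicted:
        raise ValueError("state strings must be non-empty")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(predicted)


# Hypothetical toy strings (H = helix, E = strand, C = coil).
toy_predicted = "CCHHHHCCEE"
toy_actual = "CCHHHCCCEE"
print(agreement(toy_predicted, toy_actual))
```

A per-residue fraction like this treats all three states equally and ignores segment boundaries, so it is a coarse summary rather than a full evaluation.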

Slide 10: Chp 15 - Tertiary Structure Prediction
SECTION V STRUCTURAL BIOINFORMATICS
Xiong: Chp 15 Protein Tertiary Structure Prediction
- Methods
- Homology Modeling
- Threading and Fold Recognition
- Ab Initio Protein Structural Prediction
- CASP

Slide 11: Structural Genomics - Status & Goal
- ~ 20,000 "traditional" genes in human genome (recall, this is fewer than earlier estimate of 30,000)
- ~ 2,000 proteins in a typical cell
- > 4.9 million sequences in UniProt (Oct 2007)
- > 46,000 protein structures in the PDB (Oct 2007)
Experimental determination of protein structure lags far behind sequence determination!
Goal: Determine structures of "all" protein folds in nature, using a combination of experimental structure determination methods (X-ray crystallography, NMR, mass spectrometry) & structure prediction

Slide 12: Structural Genomics Project
TargetDB: Database of Structural Genomics Targets

Slide 13: Database of Theoretical Structures?
- Theoretical (predicted) structural models are no longer accepted by the PDB (since 10/15/06), but it is possible to search for models deposited earlier
- PMDB: Protein Model Database (also reachable via NAR's Molecular Biology Database Collection)

Slide 14: Protein Structure Prediction or Protein Folding Problem
"Major unsolved problem in molecular biology"
- In cells: folding may be spontaneous, assisted by enzymes, or assisted by chaperones
- In vitro: many proteins can fold to their "native" states spontaneously & without assistance, but many do not!

Slide 15: Deciphering the Protein Folding Code
- Protein Structure Prediction or Protein Folding Problem: Given the amino acid sequence of a protein, predict its 3-dimensional structure (fold)
- Inverse Folding Problem: Given a protein fold, identify every amino acid sequence that can adopt its 3-dimensional structure

Slide 16: Protein Structure Prediction
Structure is largely determined by sequence, BUT:
- Similar sequences can assume different structures
- Dissimilar sequences can assume similar structures
- Many proteins are multi-functional
2 Major Protein Folding Problems:
1- Determine folding pathway
2- Predict tertiary structure from sequence
Both are still largely unsolved problems

Slide 17: Steps in Protein Folding
1- "Collapse" - driving force is burial of hydrophobic aa's (fast - msecs)
2- Molten globule - helices & sheets form, but "loose" (slow - secs)
3- "Final" native folded state - compaction & rearrangement of secondary (2°) structures
Native state? - assumed to be lowest free energy - may be an ensemble of structures

Slide 18: Protein Dynamics
- Protein in native state is NOT static
- Function of many proteins requires conformational changes, sometimes large, sometimes small
- Globular proteins are inherently "unstable" (NOT evolved for maximum stability)
- Energy difference between native and denatured state is very small (5-15 kcal/mol) (this is equivalent to ~ 2 H-bonds!)
- Folding involves changes in both entropy & enthalpy

Slide 19: Difficulty of Tertiary Structure Prediction
- Folding or tertiary structure prediction problem can be formulated as a search for the minimum energy conformation
- Search space is defined by psi/phi angles of backbone and side-chain rotamers
- Search space is enormous even for small proteins!
- Number of local minima increases exponentially with number of residues
- Computationally it is an exceedingly difficult problem!
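A back-of-the-envelope count shows why the search space is called enormous. The sketch below assumes, purely for illustration, that each residue can adopt a small fixed number of backbone states and that conformations could be checked at a fixed rate; neither number comes from the lecture, and the function names are hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def conformations(n_residues: int, states_per_residue: int = 3) -> int:
    """Count conformations if every residue independently picks one of a few states."""
    return states_per_residue ** n_residues


def years_to_enumerate(n_conformations: int, rate_per_second: float = 1e13) -> float:
    """Years needed to visit every conformation at an assumed sampling rate."""
    return n_conformations / rate_per_second / SECONDS_PER_YEAR


# Assumed example: a 100-residue protein with 3 states per residue.
total = conformations(100)
print(f"{total:.2e} conformations")
print(f"{years_to_enumerate(total):.2e} years to enumerate")
```

Even with these deliberately generous assumptions, exhaustive enumeration is out of reach, which is why practical methods rely on templates, heuristics, or stochastic search.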

Slide 20: Tertiary Structure Prediction Methods
2 (or 3) Major Methods:
1. Comparative Modeling:
   - Homology Modeling (easiest!)
   - Threading and Fold Recognition (harder)
2. Ab Initio Protein Structural Prediction (really hard)

Slide 21: Comparative Modeling?
Comparative modeling - the term is sometimes used interchangeably with homology modeling, but is also sometimes used to mean both:
- homology modeling
- threading/fold recognition

Slide 22: Ab Initio Prediction
1. Develop energy function: bond energy, bond angle energy, dihedral angle energy, van der Waals energy, electrostatic energy
2. Calculate structure by minimizing energy function, usually by Molecular Dynamics (MD) or Monte Carlo (MC)
Ab initio prediction - impractical for most real (long) proteins
- Computationally? Very expensive
- Accuracy? Usually poor for all except short peptides (but much improvement recently!)
- Provides both folding pathway & folded structure
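The "minimize an energy function by Monte Carlo" step can be illustrated with a toy Metropolis search. This is a minimal sketch, not a real force field: the energy below is a made-up dihedral-style term over a handful of angles, and every name and parameter (`toy_energy`, `kT`, `step_size`, the step count) is an assumption chosen for illustration.

```python
import math
import random


def toy_energy(angles):
    """Toy dihedral-style energy; each term is minimal (zero) at 60, 180, 300 degrees."""
    return sum(1.0 + math.cos(3.0 * a) for a in angles)


def metropolis_minimize(n_angles=5, steps=5000, kT=0.2, step_size=0.3, seed=0):
    """Metropolis Monte Carlo search over angles; returns the best state seen."""
    rng = random.Random(seed)
    angles = [rng.uniform(-math.pi, math.pi) for _ in range(n_angles)]
    energy = toy_energy(angles)
    best_angles, best_energy = list(angles), energy
    for _ in range(steps):
        i = rng.randrange(n_angles)
        old = angles[i]
        angles[i] = old + rng.uniform(-step_size, step_size)  # trial move
        new_energy = toy_energy(angles)
        delta = new_energy - energy
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            energy = new_energy
            if energy < best_energy:
                best_angles, best_energy = list(angles), energy
        else:
            angles[i] = old  # reject: restore the previous angle
    return best_angles, best_energy


best_angles, best_energy = metropolis_minimize()
print(round(best_energy, 3))
```

A real ab initio method would replace `toy_energy` with the bonded and non-bonded terms listed above and would search a vastly larger space, which is where the computational cost comes from.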

Slide 23: Comparative Modeling
Provides folded structure only
Two types:
1) Homology modeling
2) Threading (fold recognition)
Both rely on the availability of experimentally determined structures that are "homologous" or at least structurally very similar to the target

Slide 24: Homology Modeling
1. Identify homologous protein sequences (PSI-BLAST)
2. Among available structures (in PDB), choose the one with the closest sequence to the target as template (can combine steps 1 & 2 by using PDB-BLAST)
3. Build model by placing target sequence residues in corresponding positions on the homologous structure & refine by "tweaking" the modeled structure (energy minimization)
Homology modeling - works "well"
- Computationally? "Relatively" inexpensive
- Accuracy? Higher sequence identity → better model
- Requires ~30% sequence identity with a sequence for which the structure is known
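The ~30% rule of thumb is applied to a pairwise alignment between target and template. The sketch below computes percent identity over aligned (non-gap) columns; note that identity can be defined with other denominators, so this particular definition, the function name, and the toy alignment are all assumptions for illustration.

```python
def percent_identity(aligned_target: str, aligned_template: str) -> float:
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = identical = 0
    for a, b in zip(aligned_target, aligned_template):
        if a == "-" or b == "-":
            continue  # skip gap columns
        aligned += 1
        identical += a == b
    return 100.0 * identical / aligned if aligned else 0.0


# Hypothetical toy alignment ('-' marks a gap).
target = "MKT-AYIAK"
template = "MKTLAYLSK"
identity = percent_identity(target, template)
print(identity, "usable template" if identity >= 30.0 else "below ~30% guideline")
```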

Slide 25: Threading - Fold Recognition
Identify "best" fit between target sequence & template structure
1. Develop energy function
2. Develop template library
3. Align target sequence with each template in library & score
4. Identify top scoring template (1D to 3D alignment)
5. Refine structure as in homology modeling
Threading - works "sometimes"
- Computationally? Can be expensive or cheap, depending on the energy function & whether "all atom" or "backbone only" threading is used
- Accuracy? In theory, should not depend on sequence identity (should depend on quality of template library & "luck")
- Usually, higher sequence identity to a protein of known structure → better model

Slide 26: Threading: the Motivation
Basic premise: The number of unique structural folds in nature is fairly small (probably )
Statistics from Protein Data Bank (>46,000 structures): Prior to the Structural Genomics Project, 90% of "new" structures submitted to PDB were similar to existing folds in PDB, suggesting that almost all folds in nature have been identified
Thus, chances for a protein to have a native-like structural fold in PDB are quite good
Note: Proteins with similar structural folds could be either homologs or analogs

Slide 27: Steps in Threading
1. Align target sequence with template structures in fold library (usually from the PDB)
2. Calculate energy score to evaluate "goodness of fit" between target sequence & template structure
3. Rank models based on energy scores
[Figure: target sequence (ALKKGF…HFDTSE) threaded onto structure templates]

Slide 28: Threading Goal - & Issues
Goal: Find "correct" sequence-structure alignment of a target sequence with its native-like fold in template library (usually derived from PDB)
Issues:
- Structure database - must be "complete". Can't build a good model if there is no good template in library!
- Sequence-structure alignment algorithm: Bad alignment → bad score!
- Energy function or scoring scheme: must distinguish correct sequence-fold alignment from incorrect sequence-fold alignments, and must distinguish "correct" fold from close decoys
- Prediction reliability assessment: How to determine whether predicted structure is correct (or even close)?

Slide 29: Threading: Template database
- Build a database of structural templates, e.g., ASTRAL domain library derived from the PDB
- Sometimes, supplement with additional decoys, e.g., generated using an ab initio approach such as Rosetta (Baker)

Slide 30: Threading: Energy function
Two main methods (& combinations of these):
- Structural profile (environmental): physicochemical properties of amino acids
- Contact potential (statistical): based on contact statistics from PDB; a famous one: Miyazawa & Jernigan (ISU)

Slide 31: Protein Threading: Typical energy function
- How well does a specific residue fit its structural environment?
- What is the "probability" that two specific residues are in contact?
- Alignment gap penalty?
Total energy: $E = E_p + E_s + E_g$
Goal: Find a sequence-structure alignment that minimizes the energy function
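Ranking templates by a summed energy can be sketched as follows. The reading of the subscripts (pairwise contact, single-residue environment, gap) is an assumption, since the slide does not spell them out, and the class, field names, and numbers are hypothetical.

```python
from typing import NamedTuple


class ThreadingScore(NamedTuple):
    """Energy terms for one sequence-template alignment (hypothetical container)."""
    template: str
    e_pair: float    # assumed meaning of E_p: pairwise contact term
    e_single: float  # assumed meaning of E_s: residue-environment fit term
    e_gap: float     # assumed meaning of E_g: alignment gap penalty

    @property
    def total(self) -> float:
        return self.e_pair + self.e_single + self.e_gap


def best_template(scores):
    """Return the alignment with the lowest total energy."""
    return min(scores, key=lambda s: s.total)


# Made-up numbers for two hypothetical templates.
scores = [
    ThreadingScore("templateA", e_pair=-10.0, e_single=-4.0, e_gap=3.0),
    ThreadingScore("templateB", e_pair=-12.0, e_single=-1.5, e_gap=5.0),
]
print(best_template(scores).template)
```

Here the gap penalty matters: templateB has the better contact term but loses overall because its alignment needs more gaps.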

Slide 32: A Local Example: Rapid Threading Approach for Protein Structure Prediction
- Kai-Ming Ho, Physics: Haibo Cao, Yungok Ihm, Zhong Gao, James Morris, Cai-zhuang Wang
- Drena Dobbs, GDCB: Jae-Hyung Lee, Michael Terribilini, Jeff Sander
Cao H, Ihm Y, Wang CZ, Morris JR, Su M, Dobbs D, Ho KM (2004) Three-dimensional threading approach to protein structure recognition. Polymer 45

Slide 33: Motivations for & Assumptions of Ho Threading Algorithm
Goal: Develop a threading algorithm that:
- Is simple & rapid enough to be used in high throughput applications
- Is relatively "insensitive" to sequence similarity between target protein sequence & sequence of template structure (to enhance detection of remote homologs & structures that are similar due to convergent evolution)
- Can be used to answer questions such as: What are the predicted structures of all "unassigned" ORFs in Arabidopsis? Does Arabidopsis have a protein with structure similar to mammalian Tumor Necrosis Factor (TNF)?
Assumptions:
- Native state of a protein is the lowest free energy state
- Hydrophobic interactions drive protein folding

Slide 34: Simplify: Template structure representation (Yungok Ihm)
The template structure (residues $i, j = 1, \dots, N$) is reduced to an $N \times N$ contact matrix $C$:
- $C_{ij} = 1$ if residues $i$ and $j$ lie within the cutoff distance, given in Å (contact)
- $C_{ij} = 0$ otherwise, or if $i$ and $j$ are neighbors in sequence (non-contact)
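Building such a contact matrix from coordinates might look like the sketch below. The cutoff value is not legible in this transcript, so it is left as a required parameter; the use of one representative point per residue, the `min_separation` rule for excluding sequence neighbors, and the toy coordinates are all assumptions.

```python
import math


def contact_matrix(coords, cutoff, min_separation=2):
    """Binary N x N contact matrix from one 3D point per residue.

    Pairs closer in sequence than `min_separation` are never counted as contacts.
    """
    n = len(coords)
    contacts = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                contacts[i][j] = contacts[j][i] = 1
    return contacts


# Hypothetical toy "structure": four residues on a 3.8 Å square.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
for row in contact_matrix(toy_coords, cutoff=6.0):  # cutoff chosen for the toy only
    print(row)
```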

Slide 35: Simplify: Target Sequence Representation
- Miyazawa-Jernigan (MJ) model: inter-residue contact energy M(i,j) is a quasi-chemical approximation based on pair-wise contact statistics extracted from known protein structures in the PDB: 20 x 20 matrix = 210 values ("letters")
- Li-Tang-Wingreen (LTW): factorize the MJ interaction matrix to reduce the number of parameters associated with amino acids from 210 to 20 q values
- Hydrophobic-Polar (HP): represent amino acids as either hydrophobic (H) or polar (P); Dill et al. demonstrated the utility of this simple binary alphabet representation: 2 values
Compare results with 210 vs 20 vs 2 letter representations. How low can we go?
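The figure of 210 rather than 400 follows because the contact energy of residue types $i$ and $j$ is the same as that of $j$ and $i$, so only the unordered pairs of the 20 amino acid types need a value:

```latex
\[
\underbrace{\binom{20}{2}}_{\text{unlike pairs}} + \underbrace{20}_{\text{like pairs}}
= 190 + 20 = \frac{20 \cdot 21}{2} = 210
\]
```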

Slide 36: Simplify: Energy Function (Yungok Ihm)
- Interaction "counts" only if two hydrophobic amino acid residues are in contact
- At residue level, pair-wise hydrophobic interaction is dominant:
$E = \sum_{i,j} C_{ij} U_{ij}$
where $C_{ij}$ is the contact matrix and $U_{ij} = U(\text{residue } i, \text{residue } j)$:
- MJ: $U = U_{ij}$
- LTW: $U = Q_i Q_j$
- HP: $U \in \{1, 0\}$
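For the HP case, the sum reduces to counting hydrophobic-hydrophobic contacts. The sketch below makes several assumptions that are not stated on the slide: which residues count as hydrophobic (the set used here is only a guess), summing each pair once (i < j), and reporting a raw contact count without a sign convention.

```python
# Assumed hydrophobic set, for illustration only.
HYDROPHOBIC = set("CMFILVW")


def hp_vector(sequence: str):
    """Encode a sequence as 1 (hydrophobic) or 0 (polar) per residue."""
    return [1 if aa in HYDROPHOBIC else 0 for aa in sequence]


def hp_contact_score(contacts, sequence: str) -> int:
    """Sum C_ij * U_ij over pairs i < j, with U_ij = 1 only for H-H pairs."""
    h = hp_vector(sequence)
    n = len(sequence)
    return sum(
        contacts[i][j] * h[i] * h[j] for i in range(n) for j in range(i + 1, n)
    )


# Hypothetical 4-residue contact matrix and sequence.
toy_contacts = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
]
print(hp_contact_score(toy_contacts, "LKVF"))
```

Swapping `h[i] * h[j]` for a lookup in a 20 x 20 table would give the MJ form, and for a product of per-residue q values the LTW form.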

Slide 37: Energy calculation: Contact energy (Yungok Ihm)
- Miyazawa-Jernigan (MJ) matrix: 210 parameters; statistical potential
- Li-Tang-Wingreen (LTW): 20 parameters; q ~ solubility ~ hydrophobicity
- Contact energy is computed from these parameters together with the contact matrix
[Figure: interaction matrix with residue-type axes beginning C, M, F, I, L, V, W]

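Slide 38: Summary of Ho Threading Procedure (Yungok Ihm)
- Template Structure → Contact Matrix ($i, j = 1, \dots, N$): $C_{ij} = 1$ if residues $i$ and $j$ lie within the cutoff distance (in Å); $C_{ij} = 0$ otherwise, or if they are neighbors in sequence
- Sequence (e.g., AVFMRIHNDIVYNDIANTTQ) → Sequence Vector
- Contact Matrix + Sequence Vector → Contact Energy → Scoring Function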

Slide 39: Can complexity be further reduced? (Haibo Cao)
Consider simplifying the structure representation, too (example sequence: ALKKGF…HFDTSE):
- Sequence - Structure (1D - 3D problem)
- Sequence - Contact Matrix (1D - 2D problem)
- Sequence - 1D Profile (1D - 1D problem)

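Slide 40: Examine eigenvectors of contact matrix (Haibo Cao)
Hydrophobic contacts are expressed in terms of the following quantities (symbols not preserved in this transcript):
- i-th eigenvector of the contact matrix
- eigenvector with the largest eigenvalue
- i-th eigenvalue of the contact matrix
- fraction of hydrophobic contacts from the i-th eigenvector
- protein sequence of the template structure
- contact matrix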

Slide 41: Represent contact matrix by its dominant eigenvector (1D profile) (Haibo Cao)
- First eigenvector (with highest eigenvalue) dominates the overlap between sequence and structure
- Higher ranking (rank > 4) eigenvectors are "sequence blind"
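Extracting a dominant eigenvector from a contact matrix can be sketched with power iteration. This is a generic numerical technique swapped in for illustration, not necessarily how the authors computed it; the shift by the identity matrix (to avoid oscillation when the contact graph is bipartite), the function name, and the toy matrix are assumptions.

```python
import math


def dominant_eigenvector(matrix, iterations=200):
    """Unit-length eigenvector for the largest eigenvalue of a symmetric,
    non-negative matrix, via power iteration on (matrix + identity)."""
    n = len(matrix)
    if n == 0:
        raise ValueError("matrix must be non-empty")
    v = [1.0] * n
    for _ in range(iterations):
        # Multiply by (matrix + I); the shift keeps the iteration from oscillating.
        w = [v[i] + sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical contact matrix: residue 0 contacts residues 1, 2, and 3.
star = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
print([round(x, 4) for x in dominant_eigenvector(star)])
```

The resulting 1D profile is largest for the most highly connected residue, which is the sense in which it can stand in for burial in the structure.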

Slide 42: Threading Alignment Step - now fast!
- Align target sequence vector (1D) with eigenvector profile of template structure (1D)
- Maximize the overlap between the sequence (S) and the profile (P), allowing gaps
- Calculate contact energy using the alignment: $E_c$
[Figure labels: 1D Profile; New profile] (Cao et al., Polymer 45 (2004))

Slide 43: Parameters for alignment? (Yungok Ihm)
- Gap penalty: Insertion/deletion in helices or strands is strongly penalized; smaller penalties for in/dels in loops. Gap penalties apply to alignment score only, not to energy calculation
- Size penalty: If a target residue and aligned template residue differ in radius by > 0.5 Å and if the residue is involved in > 2 contacts, alignment is penalized. Size penalties apply to alignment score only, not to energy calculation
[Figure: example sequence ALKKGFG…HFDTSE with Loop and Helix regions marked]

Slide 44: How to incorporate secondary structure? (Yungok Ihm)
Predict secondary structure of target sequence (PSIPRED, PROF, JPRED, SAM, GOR V)
- $N_+$ = total number of matches between predicted secondary structure of target & actual secondary structure of template
- $N_-$ = total number of mismatches
- $N_s$ = total number of residues selected in alignment
"Global fitness": $f = 1 + (N_+ - N_-) / N_s$
$E_{mod} = f \cdot E_{threading}$
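The global fitness factor and modified energy are simple enough to write out directly. In the sketch below the function names and example counts are hypothetical; the formulas are the two given on the slide.

```python
def global_fitness(n_match: int, n_mismatch: int, n_selected: int) -> float:
    """f = 1 + (N+ - N-) / Ns."""
    if n_selected <= 0:
        raise ValueError("n_selected must be positive")
    return 1.0 + (n_match - n_mismatch) / n_selected


def modified_energy(e_threading: float, n_match: int, n_mismatch: int,
                    n_selected: int) -> float:
    """E_mod = f * E_threading."""
    return global_fitness(n_match, n_mismatch, n_selected) * e_threading


# Made-up example: 75 matches and 25 mismatches over 100 aligned residues.
print(global_fitness(75, 25, 100))
print(modified_energy(-40.0, 75, 25, 100))
```

Since f ranges from 0 (all mismatches) to 2 (all matches), a favorable (negative) threading energy is amplified when predicted and template secondary structure agree and damped when they disagree.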

Slide 45: How much better is this "fit" than random? (Yungok Ihm)
- $E_{mod}$: E score modified to reflect fit with predicted secondary (2°) structure
- $E_{shuffled}$: average E score for the same sequence shuffled (randomized) many times, threaded against the same structure
$E_{relative} = E_{mod} - E_{shuffled}$
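The shuffled-sequence baseline can be sketched generically: score the real sequence, score many shuffled copies, and subtract the average. The scoring function here is a stand-in (a toy count of adjacent hydrophobic pairs), not the threading energy, and the names, shuffle count, and seed are assumptions.

```python
import random


def relative_energy(sequence: str, score, n_shuffles: int = 100, seed: int = 0) -> float:
    """Score of the sequence minus the mean score of its shuffled versions."""
    rng = random.Random(seed)
    residues = list(sequence)
    total = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # same composition, randomized order
        total += score("".join(residues))
    return score(sequence) - total / n_shuffles


def toy_score(sequence: str) -> float:
    """Stand-in energy: -1 for each adjacent pair of 'L' residues."""
    return -float(sum(a == "L" and b == "L" for a, b in zip(sequence, sequence[1:])))


print(relative_energy("LLLLKKKK", toy_score))
```

Because shuffling preserves amino acid composition, a negative relative energy indicates that the order of residues, not just their composition, fits the template.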

Slide 46: Performance Evaluation?
"Blind Test": CASP5 Competition (CASP7 is most recent)
CASP = Critical Assessment of Protein Structure Prediction
- Given: Amino acid sequence
- Goal: Predict 3-D structure (before experimental results are published)

Slide 47: Typical Results (well, actually, our BEST Results): HO = #1-Ranked CASP5 Prediction for this Target (Cao, Ihm, Wang, Dobbs, Ho)
Target 174, PDB ID = 1MG7
[Figure: Actual Structure vs Predicted Structure (models T174_1, T174_2)]

Slide 48: Overall Performance in CASP5 Contest: ~8th out of 180 (M. Levitt, Stanford)
FR Fold Recognition (targets manually assessed by Nick Grishin)
Table columns: Rank | Z-Score | Ngood | Npred | NgNW | NpNW | Group-name
Group names, in the order listed: Ginalski; Skolnick Kolinski; Baker; BIOINFO.PL; Shortle; BAKER-ROBETTA; Brooks; Ho-Kai-Ming; Jones-NewFold
FR NgNW - number of good predictions without weighting for multiple models
FR NpNW - number of total predictions without weighting for multiple models

Slide 49: CASP - Check it out!
Critical Assessment of Protein Structure Prediction
- CASP7 contest
- Provides assessment of automated servers for protein structure prediction (LiveBench, CAFASP, EVA) & URLs for them
Related contests & resources:
- Protein Function Prediction (part of CASP)
- CAPRI = Critical Assessment of Predicted Interactions
- New: CASPM = CASP for M = Mutant proteins. Predict effects of small (point) mutations, e.g., SNPs

Slide 50: Another Convenient List of Links for Protein Prediction Servers
diction_software

Slide 51: Chp 13 - Protein Structure Visualization, Comparison & Classification
SECTION V STRUCTURAL BIOINFORMATICS
Xiong: Chp 13 Protein Structure Visualization, Comparison & Classification
- Protein Structural Visualization
- Protein Structure Comparison
- Protein Structure Classification

Slide 52: Protein Structure Comparison Methods
3 Basic Approaches for Aligning Structures (see Xiong textbook for details):
1. Intermolecular
2. Intramolecular
3. Combined
But this is a very active research area, with many recent new methods
3 Popular Methods:
- DALI = Distance Matrix Alignment of Structures (Holm); FSSP Database
- SSAP = Sequential Structure Alignment Program (Orengo); CATH Database
- CE = Combinatorial Extension (Bourne)
Also: VAST at NCBI

Slide 53: Another local example: Combining Structure Prediction, Machine Learning & "Real" (wet-lab) Experiments to Investigate the Lentiviral Rev Protein: A Step Toward New HIV Therapies
- Susan Carpenter (Washington State Univ): Wendy Sparks, Yvonne Wannemuehler
- Drena Dobbs, GDCB: Jae-Hyung Lee, Michael Terribilini
- Kai-Ming Ho, Physics: Yungok Ihm, Haibo Cao, Cai-zhuang Wang
- Gloria Culver, BBMB: Laura Dutca

Slide 54: Macromolecular interactions mediated by Rev protein in lentiviruses (HIV & EIAV) (Susan Carpenter)
[Figure: Rev in the viral life cycle. Labels: Provirus; Nucleus; Cytoplasm; pre-mRNA; Spliceosome; Early: Regulatory Proteins (Tat, Rev); Late: Structural Proteins, Progeny RNA; NUCLEAR IMPORT; RNA BINDING (protein-RNA); MULTIMERIZATION; NUCLEAR EXPORT; (protein-protein)]

Slide 55: Rev is essential for lentiviral replication
- Rev is a small nucleocytoplasmic shuttling protein (HIV Rev 115 aa; EIAV Rev 165 aa)
- Recognizes a specific binding site on viral RNA: Rev Responsive Element (RRE)
- Interacts with CRM1 to export incompletely spliced viral RNAs from the nucleus to the cytoplasm
- Specific domains of Rev mediate nuclear localization, RNA binding, and nuclear export
- Critical role of Rev in lentiviral replication makes it an attractive target for antiviral (AIDS) therapy

Slide 56: Problem: no high resolution Rev structure!
- Not even for HIV Rev, despite intense effort ($$)
- Why?? Rev aggregates at concentrations needed for NMR or X-ray crystallography
- What about insights from sequence comparisons? "Undetectable" sequence similarity among Revs from different lentiviruses (e.g., EIAV vs HIV <10%)
- But: We know that lentiviral Rev proteins are functionally "homologous", even in highly diverse lentiviruses

Slide 57: Hypothesis: Rev proteins from diverse lentiviruses share structural features critical for function
Approach:
- Computationally model structures of lentiviral Rev proteins, using structural threading algorithm (with Ho et al.)
- Predict critical residues for RNA-binding and protein interaction, using machine learning algorithms (with Honavar et al.)
- Test model and predictions, using genetic/biochemical approaches (with Carpenter & Culver) and biophysical approaches (with Andreotti & Yu groups)
Initially: focus on EIAV Rev & RRE

Slide 58: Functional domains: EIAV vs HIV Rev
[Figure: domain maps. HIV-1 Rev labels: NES; NLS/RBM; RQARRNRRRRWR. EIAV Rev labels: exon 1; exon 2; NES; NLS; RBM; motifs RRDRW, ERLE, KRRRK; Folding ?]
Key: NES - Nuclear Export Signal; NLS - Nuclear Localization Signal; RBM - putative RNA Binding Motif

Slide 59: Predicted EIAV Rev Structure (Yungok Ihm)

Slide 60: Comparison of Predicted Rev Structures (Yungok Ihm)
[Figure panels: EIAV; HIV; FIV; SIV Dimer; HIV Dimer]

Slide 61: Predicted vs Experimental Structure of N-terminal region of HIV Rev (Yungok Ihm)
A - Predicted Structure: HIV Rev N-terminus
B - NMR Structure: HIV Rev N-terminal Peptide (Battiste & Williamson)
C - Overlay: Alignment of Predicted & NMR Structures

Slide 62: Location of functional residues in EIAV Rev? (Yungok Ihm)
[Figure labels: Putative RBM; NES]
- Leu36, 45, 49: On surface, consistent with role in nuclear export
- Leu95 & Leu109: Buried in core, critical hydrophobic contacts for fold?

Mutate hydrophobic residues predicted to be critical for helical packing in core: L65, L95, L109 (Yungok Ihm)
- Single Ala mutation (L → A): negligible effect on Rev activity
- Single Asp mutation (L → D): dramatic change in Rev activity? Inserts a charged aa in the hydrophobic core
- Double Ala mutation (L, L → A, A): reduction in Rev activity?
L65 vs L95 & L109. Single mutants: Leu to Ala, Leu to Asp. Double mutants: Leu to Ala.

Functional Analysis of Rev Structural Mutants in vivo (CAT assay) (Wendy Sparks)
[Figure: Activity of Rev Structural Mutants; labels include Sham, RI, pcDNA3]

Functional domains: EIAV vs HIV Rev
[Figure: domain maps. HIV-1 Rev: NES, NLS/RBM (RQARRNRRRRWR). EIAV Rev: NES, NLS, RBM, Folding?; motifs RRDRW, ERLE, KRRRK.]
Color key: RNA interaction, protein interaction (Green, Red)
NES - Nuclear Export Signal; NLS - Nuclear Localization Signal; RBM - putative RNA Binding Motif

Putative RNA-binding Motifs & Predicted RNA-binding Residues Mapped onto Predicted EIAV Rev Structure (Michael Terribilini, Yungok Ihm)
Sequence (putative motifs set off by spaces):
… DPQGPLESDQ WCRVLRQSLP EEKISSQTCI
ARRHLGPGPT QHTPS RRDRW IREQILQAEV LQ ERLE WRIR
GVQQVAKELG EVNRGIWREL HFREDQRGDF SAWGDYQQAQ ERRWGEQSSP RVLRPGDS KRRRK HL
Motifs: KRRRK, RRDRW, ERLE
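Locating the putative motifs in the sequence shown on this slide can be sketched in a few lines. This is an illustrative sketch only: the fragment is transcribed from the slide text with the extraction spacing removed, it starts at ARRHLG rather than at the Rev N-terminus, and the reported indices are therefore 0-based positions within the fragment, not Rev residue numbers. The names `find_motif`, `SLIDE_FRAGMENT`, and `MOTIFS` are hypothetical.

```python
def find_motif(sequence, motif):
    """Return every 0-based start index of motif in sequence (overlaps allowed)."""
    starts = []
    i = sequence.find(motif)
    while i != -1:
        starts.append(i)
        i = sequence.find(motif, i + 1)
    return starts


# Fragment of EIAV Rev as printed on the slide (assumption: spacing is only
# a display aid, so it is stripped before searching).
SLIDE_FRAGMENT = (
    "ARRHLGPGPT QHTPS RRDRW IREQILQAEV LQ ERLE WRIR "
    "GVQQVAKELG EVNRGIWREL HFREDQRGDF SAWGDYQQAQ "
    "ERRWGEQSSP RVLRPGDS KRRRK HL"
).replace(" ", "")

MOTIFS = ("RRDRW", "ERLE", "KRRRK")

hits = {motif: find_motif(SLIDE_FRAGMENT, motif) for motif in MOTIFS}
for motif, starts in hits.items():
    print(motif, starts)
```

Exact string matching is enough here because the motifs are short, fixed peptides; it says nothing about whether a motif actually binds RNA, which is what the experiments on the following slides address.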

Express & purify MBP-ERev deletion mutants (Jae-Hyung Lee)
[Figure labels: Marker, MBP, MBP-ERev; domain map: MBP, NES, NLS, RBM, Folding?]

MBP-ERev binds specifically to RRE in vitro (Jae-Hyung Lee)
[Figure panels: UV crosslinking; Competition. Labels: sense, antisense; BSA, MBP; Cold RRE; No protein; No cold RRE; Undigested 32P-RRE]

EIAV Rev: Binding Predictions vs Experiments (Jae-Hyung Lee)
PREDICTED: structure; protein binding residues; RNA binding residues
VALIDATED: protein binding residues; RNA binding residues
Sequence as displayed (spaces set off highlighted residues and motifs):
QRGDFSAWGDYQQAQERRWGEQSSPRVLRPGDS KRRRK HL
ARRHLGPGPTQHTPS RRDRW IREQILQAEVLQ ERLE WRI
GP L ESDQWCRV L RQS L PEEKISSQTCI
[Domain map labels: MBP, WT; NES, FOLD?, NLS/RBM, RBM; motifs RRDRW, ERLE, KRRRK]
Lee et al (2006) J Virol 80:3844; Terribilini et al (2006) PSB 11:

Roles of Putative RNA Binding Motifs? (Jae-Hyung Lee)
[Figure labels: NES, NLS, RBD; motifs RRDRW, ERLE, KRRRK; mutant motifs AADAA, AALA, KAAAK, ERDE]
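The mutant labels on this slide can be read as in-place substitutions of the putative motifs. The sketch below makes that reading explicit; the pairing of each mutant with a wild-type motif (AADAA with RRDRW, AALA and ERDE with ERLE, KAAAK with KRRRK) is inferred here from matching lengths and retained residues and is an assumption, as are the function names and the short sequence fragment, which is copied from the neighbouring slides.

```python
# Assumed pairing of mutant label -> wild-type motif (inferred, not stated on the slide).
MUTANT_TO_WILD_TYPE = {
    "AADAA": "RRDRW",
    "AALA": "ERLE",
    "ERDE": "ERLE",
    "KAAAK": "KRRRK",
}


def changed_positions(wild, mutant):
    """List (offset, wild_residue, mutant_residue) for every substituted position."""
    if len(wild) != len(mutant):
        raise ValueError("motif and mutant must have the same length")
    return [(i, w, m) for i, (w, m) in enumerate(zip(wild, mutant)) if w != m]


def substitute_motif(sequence, wild, mutant):
    """Replace a motif that occurs exactly once in sequence with its mutant form."""
    if len(wild) != len(mutant):
        raise ValueError("motif and mutant must have the same length")
    if sequence.count(wild) != 1:
        raise ValueError("motif must occur exactly once in the sequence")
    return sequence.replace(wild, mutant)


# Central fragment of EIAV Rev as shown on the neighbouring slides.
FRAGMENT = "ARRHLGPGPTQHTPSRRDRWIREQILQAEVLQERLEWRI"

for mutant, wild in MUTANT_TO_WILD_TYPE.items():
    if FRAGMENT.count(wild) == 1:  # KRRRK is not in this fragment, so it is skipped
        print(mutant, changed_positions(wild, mutant))
        print(substitute_motif(FRAGMENT, wild, mutant))
```

Under this reading, ERDE changes a single residue (Leu to Asp), while the other three mutants replace several residues with Ala.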

Rev RNA Binding Motifs: Predicted vs Experiment (Jae-Hyung Lee)
PREDICTED: structure; protein binding residues; RNA binding residues
VALIDATED: protein binding residues; RNA binding residues
Sequence as displayed (spaces set off highlighted residues and motifs):
QRGDFSAWGDYQQAQERRWGEQSSPRVLRPGDS KRRRK HL
ARRHLGPGPTQHTPS RRDRW IREQILQAEVLQ ERLE WRI
GP L ESDQWCRV L RQS L PEEKISSQTCI
[Domain map labels: WT; NES, NLS, RBM, FOLD?, NLS/RBM; motifs RRDRW, ERLE, KRRRK; mutant motifs KAAAK, AADAA, AALA, ERDE]

Summary: Predictions vs Experiments
Sequence as displayed (spaces set off highlighted residues and motifs):
QRGDFSAWGDYQQAQERRWGEQSSPRVLRPGDS KRRRK HL
ARRHLGPGPTQHTPS RRDRW IREQILQAEVLQ ERLE WRI
GP L ESDQWCRV L RQS L PEEKISSQTCI
[Domain map labels: NES, FOLD, NLS/RBM, RBM; motifs RRDRW, ERLE, KRRRK]
Lee et al (2006) J Virol 80:3844; Terribilini et al (2006) PSB 11:415

Conclusions & Future Directions
Combination of computational & wet lab approaches revealed that:
- EIAV Rev has a bipartite RNA binding domain
- Two Arg-rich RBMs are critical: RRDRW in central region (but not ERLE), and KRRRK at C-terminus, overlapping the NLS
- Based on computational modeling, the RBMs are in close proximity within the 3-D structure of the protein
- Lentiviral Rev proteins & their cognate RRE binding sites may be more similar in structure than has been appreciated
Lee et al (2006) J Virol 80:3844; Terribilini et al (2006) PSB 11:415
Future:
- Computational: use Rev-RRE model system to discover "predictive rules" for protein-RNA recognition
- Experimental?

Experimentally determine the structure of the Rev-RRE complex!!!

Building “Designer” Zinc Finger DNA-binding Proteins
J Sander, P Zaback, F Fu, J Townsend, R Winfrey, D Wright, K Joung, L Miller, D Dobbs, D Voytas
Wright et al (2006) Nature Protocols; Sander et al (2007) Nucleic Acids Res

Chp 16 - RNA Structure Prediction
SECTION V STRUCTURAL BIOINFORMATICS
Xiong: Chp 16 RNA Structure Prediction (Terribilini)
- Introduction
- Types of RNA Structures
- RNA Secondary Structure Prediction Methods
- Ab Initio Approach
- Comparative Approach
- Performance Evaluation