Chapter 9 Structure Prediction

Motivation
Given a protein, can you predict its molecular structure?
We want to avoid repeated X-ray crystallography, but we also want accuracy.
You could use sequence alignment, but what do you do with the gapped regions?
More complex methods are only justified if they can be shown to perform better than simpler methods.
Simpler methods are only justified if they can perform better than basic sequence alignment.

First Step
Some structure comparison methods use the secondary structure of the new sequence.
Predict the location of secondary structure elements along the protein's backbone and the degree of residue burial.
Supervised learning has been shown to perform well in this task.

Artificial Neural Network
[Figure: neural network diagram, annotated "Predicts structure at this point"]

Danger
You may train the network on your training set, but it may not generalize to other data.
Perhaps we should train several ANNs and then let them vote on the structure.
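
The voting idea can be sketched in a few lines. This is a minimal illustration, assuming each trained network has already produced a per-residue string over H (helix), E (strand) and C (coil); the function name `jury_vote` and the tie-breaking rule are assumptions made for the example, not part of the original slides.

```python
from collections import Counter


def jury_vote(predictions):
    """Combine per-residue predictions from several predictors by majority vote.

    predictions: list of equal-length strings over H (helix), E (strand), C (coil).
    Ties go to the label of the earliest predictor that reaches the top count
    (an arbitrary choice made for this sketch).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")

    consensus = []
    for column in zip(*predictions):
        counts = Counter(column)
        best = max(counts.values())
        # First label, in predictor order, whose count equals the maximum.
        winner = next(label for label in column if counts[label] == best)
        consensus.append(winner)
    return "".join(consensus)


# Hypothetical outputs of three separately trained networks.
votes = ["CCHHHHCC", "CHHHHHCC", "CCHHHECC"]
consensus = jury_vote(votes)
print(consensus)
```

A single network's mistake at one position is outvoted as long as the other networks agree there, which is the motivation for training several of them.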

Profile network from HeiDelberg (PHD)
The protein family (a multiple alignment) is used as input instead of just the new sequence.
On the first level, a window of length 13 around the residue is used.
The window slides down the sequence, making a prediction for each residue.
The input includes the frequency of amino acids occurring in each position in the multiple alignment (in the example, there are 5 sequences in the multiple alignment).
The second level takes these predictions from neural networks that are centered on neighboring residues.
The third level does a jury selection.
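
The first-level input can be sketched as follows: per-column amino acid frequencies from the multiple alignment, cut into sliding windows of 13 columns. This is only an illustration of the input encoding, not the PHD implementation. The toy alignment, the zero-padding at the sequence ends and the decision to ignore gap characters are all assumptions, and the network that would consume the feature vectors is not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(alignment):
    """Return one dict per alignment column mapping amino acid -> frequency.

    Gap characters ('-') are ignored when counting (an assumption).
    """
    profile = []
    for column in zip(*alignment):
        residues = [r for r in column if r in AMINO_ACIDS]
        total = len(residues)
        profile.append(
            {aa: (residues.count(aa) / total if total else 0.0) for aa in AMINO_ACIDS}
        )
    return profile


def windows(profile, width=13):
    """Yield (center_index, feature_vector) for every alignment position.

    Positions that fall off either end are padded with all-zero columns.
    """
    half = width // 2
    zero = [0.0] * len(AMINO_ACIDS)
    for center in range(len(profile)):
        features = []
        for pos in range(center - half, center + half + 1):
            if 0 <= pos < len(profile):
                features.extend(profile[pos][aa] for aa in AMINO_ACIDS)
            else:
                features.extend(zero)
        yield center, features


# Toy multiple alignment with 5 sequences (made up for illustration).
alignment = [
    "MKVLAAGI",
    "MKILAAGV",
    "MRVLSAGI",
    "MKVL-AGI",
    "MKVLAAGL",
]
profile = column_frequencies(alignment)
features = list(windows(profile))
print(len(features), len(features[0][1]))
```

Each residue gets a vector of 13 × 20 = 260 frequencies, so the prediction for a position depends on its neighbors in the alignment as well as on the position itself.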

PHD
[Figure: PHD example, with labels "Predicts 4", "Predicts 6" and "Predicts 5"]

Threading
Threading matches structure to sequence.
True threading considers 3D spatial interactions.

3D-1D Matching (Bowie et al.)
Convert the 3D structure into a string.
Include α-helix, β-sheet or neither.
Include buried or solvent accessible (6 levels).
Total of 3 × 6 = 18 distinct states.
The score is defined with P(a:j), the probability of finding amino acid a in environment j, and P(a), the probability of finding a anywhere.
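
A small sketch of how such a score table could be estimated from counts. It assumes the score takes the log-odds form ln(P(a:j) / P(a)), which is how 3D-1D scores are usually written; the slide text itself gives only the two probabilities. The environment names and the observations are invented toy data, and the `unseen` penalty is an arbitrary choice for pairs never observed.

```python
import math
from collections import Counter


def information_scores(observations):
    """Estimate ln(P(a:j) / P(a)) from (amino_acid, environment) observations.

    P(a:j) is taken as the fraction of residues in environment j that are a,
    and P(a) as the overall fraction of residues that are a.
    """
    observations = list(observations)
    pair_counts = Counter(observations)
    aa_counts = Counter(a for a, _ in observations)
    env_counts = Counter(j for _, j in observations)
    total = len(observations)

    scores = {}
    for (a, j), n in pair_counts.items():
        p_a_in_j = n / env_counts[j]
        p_a = aa_counts[a] / total
        scores[(a, j)] = math.log(p_a_in_j / p_a)
    return scores


def profile_score(sequence, environments, scores, unseen=-2.0):
    """Sum the scores of a sequence placed residue by residue on an environment string."""
    return sum(scores.get((a, j), unseen) for a, j in zip(sequence, environments))


# Toy observations: valine mostly buried, aspartate mostly exposed.
observations = (
    [("V", "buried_sheet")] * 6
    + [("D", "buried_sheet")] * 1
    + [("V", "exposed_helix")] * 2
    + [("D", "exposed_helix")] * 7
)
scores = information_scores(observations)
print(scores[("V", "buried_sheet")], scores[("D", "buried_sheet")])
```

A positive score means the amino acid is found in that environment more often than its overall frequency would suggest; a negative score means it is found there less often.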

3D-1D
The information value score was calculated on a training set of multiple alignments, and the score was used as a profile for each column.
When applied to the globin family, it clearly distinguished myoglobins from nonglobins, but not from other globins.

Methods using 3D interactions
Residues that have large separation in the sequence may end up next to each other when the protein is folded.
Define a measure of contact between residues (two atoms within 5 Å) and count the frequency of contact between all pairs in the PDB.
Use the measure in alignment to evaluate cost, or to select the best alignment.
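
The contact definition above can be sketched directly. This is a minimal illustration: residues are given as (amino acid, list of atom coordinates) tuples, which is a made-up input format rather than a PDB parser. The `min_separation` parameter, which skips residues adjacent in the sequence, is an assumption added for the example.

```python
import math
from collections import Counter
from itertools import combinations


def in_contact(atoms_a, atoms_b, cutoff=5.0):
    """True if any atom of one residue lies within `cutoff` angstroms of any atom of the other."""
    return any(math.dist(p, q) <= cutoff for p in atoms_a for q in atoms_b)


def count_contacts(residues, cutoff=5.0, min_separation=2):
    """Count contacts per unordered amino acid pair in one structure.

    residues: list of (amino_acid, [(x, y, z), ...]) in sequence order.
    Pairs closer than `min_separation` positions in the sequence are skipped.
    """
    counts = Counter()
    for (i, (aa_i, atoms_i)), (j, (aa_j, atoms_j)) in combinations(enumerate(residues), 2):
        if j - i < min_separation:
            continue
        if in_contact(atoms_i, atoms_j, cutoff):
            counts[tuple(sorted((aa_i, aa_j)))] += 1
    return counts


# Toy structure with one atom per residue (coordinates in angstroms, invented).
residues = [
    ("A", [(0.0, 0.0, 0.0)]),
    ("G", [(3.8, 0.0, 0.0)]),
    ("V", [(3.0, 3.0, 0.0)]),
    ("D", [(20.0, 0.0, 0.0)]),
]
contacts = count_contacts(residues)
print(contacts)
```

Summing such counters over many structures gives the pair contact frequencies that an alignment score can then draw on.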

3D interactions

Potentials of mean force (POMF)
Since the notion of contact is somewhat arbitrary, a more general formulation can be tried.
Derive an empirical function for the propensity of each of the 400 pairs of residues to be any given distance apart.
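
One way to picture such an empirical function is a distance histogram per residue pair, normalized by the histogram over all pairs. The sketch below follows that idea under stated assumptions: the 1 Å bin width, the 15 Å cap, the ratio normalization and the toy observations are all choices made for the example, not details from the slides.

```python
from collections import defaultdict


def distance_propensity(observations, bin_width=1.0, max_distance=15.0):
    """Return {pair: [ratio per distance bin]}.

    observations: iterable of (aa1, aa2, distance_in_angstroms).
    ratio = P(bin | this pair) / P(bin | any pair); values above 1 mean the
    pair is found at that distance more often than residue pairs in general.
    """
    n_bins = int(max_distance / bin_width)
    pair_hist = defaultdict(lambda: [0] * n_bins)
    all_hist = [0] * n_bins
    for a, b, d in observations:
        k = int(d / bin_width)
        if k >= n_bins:
            continue  # beyond the distance range considered
        pair_hist[tuple(sorted((a, b)))][k] += 1
        all_hist[k] += 1

    total_all = sum(all_hist)
    result = {}
    for pair, hist in pair_hist.items():
        total = sum(hist)
        result[pair] = [
            (h / total) / (all_hist[k] / total_all) if all_hist[k] else 0.0
            for k, h in enumerate(hist)
        ]
    return result


# Toy data: A-V pairs mostly near 5 angstroms, D-V pairs mostly near 10.
observations = (
    [("A", "V", 5.2)] * 4
    + [("V", "A", 10.5)] * 1
    + [("D", "V", 5.4)] * 1
    + [("D", "V", 10.1)] * 4
)
propensity = distance_propensity(observations)
print(propensity[("A", "V")][5], propensity[("D", "V")][5])
```

Taking the negative logarithm of each ratio would turn the propensity into an energy-like quantity, which is the usual sense of "potential" here.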

Multiple Sequence Threading
Multiple sequence alignment:
Align the most similar sequences to create a consensus sequence.
Align consensus sequences to create the overall alignment.
Use the same strategy with structures.
Assume that conserved hydrophobic positions should pack in the core.
This appears to be work in progress (1997).
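
The first step of that strategy (pick the most similar pair, collapse it into a consensus) can be sketched on sequences that are already aligned to a common length. This is a simplification: a real progressive method would align the pair first. The identity measure, the tie-breaking and the toy sequences are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def identity(a, b):
    """Fraction of aligned columns where both sequences carry the same residue."""
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches / len(a)


def most_similar_pair(aligned):
    """Return the indices (i, j) of the two most similar sequences."""
    return max(
        combinations(range(len(aligned)), 2),
        key=lambda ij: identity(aligned[ij[0]], aligned[ij[1]]),
    )


def consensus(aligned):
    """Most common residue per column; ties go to the residue seen first."""
    out = []
    for column in zip(*aligned):
        residues = [r for r in column if r != "-"]
        out.append(Counter(residues).most_common(1)[0][0] if residues else "-")
    return "".join(out)


aligned = ["MKVLA", "MKILA", "MRTFS"]
i, j = most_similar_pair(aligned)
merged = consensus([aligned[i], aligned[j]])
print((i, j), merged)
```

The consensus would then stand in for the pair when it is compared with the remaining sequences, and the same loop repeats until one overall alignment remains.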

Example
Alanine (A) and valine (V) are two small hydrophobic residues, both of which favor packing in the core of the protein, so the POMF would have a peak around 5 Å.
Aspartate (D) and valine do not often pack together, so the POMF will have a dip around 5 Å.
[Figure: POMF(A,V) and POMF(D,V) curves; axes: probability against distance, with 5 Å marked]

Sequence-Structure Alignment
For all known structures:
Align the unknown sequence to that structure.
Find the best alignment.
Return the structure with the best global alignment.
Unfortunately, we cannot use dynamic programming (the problem is NP-complete).
Heuristics must be used to explore the space.
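
The outer loop over known structures can be sketched with a deliberately crude heuristic: slide the sequence along each template without gaps and keep the best-scoring placement. Gapless placement is a stand-in chosen to keep the example short; it is not one of the heuristics named in the slides. The template library, the one-letter environment alphabet (`b` buried, `e` exposed) and the scoring function are invented for illustration.

```python
def best_gapless_placement(sequence, environments, score):
    """Slide the sequence along an environment string without gaps.

    Returns (best_score, offset); offset is None if the template is too short.
    """
    best = (float("-inf"), None)
    for offset in range(len(environments) - len(sequence) + 1):
        total = sum(score(a, environments[offset + i]) for i, a in enumerate(sequence))
        if total > best[0]:
            best = (total, offset)
    return best


def recognize_fold(sequence, library, score):
    """Return (template_name, score, offset) for the best-scoring template."""
    results = []
    for name, environments in library.items():
        total, offset = best_gapless_placement(sequence, environments, score)
        if offset is not None:
            results.append((total, name, offset))
    if not results:
        raise ValueError("no template is long enough for the sequence")
    total, name, offset = max(results)
    return name, total, offset


HYDROPHOBIC = set("AVILMF")


def toy_score(amino_acid, environment):
    """+1 when hydrophobic residues are buried or polar ones exposed, else -1."""
    buried = environment == "b"
    return 1 if (amino_acid in HYDROPHOBIC) == buried else -1


library = {"fold_1": "eebbbee", "fold_2": "bbeeebb"}
result = recognize_fold("DVVLK", library, toy_score)
print(result)
```

Because the score here depends only on each residue's own environment, an exhaustive scan is cheap; once the score includes interactions between pairs of residues, the placements are no longer independent and the search has to fall back on heuristics.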

Evaluating Methods
Is the complexity worth it? This is difficult to judge without a benchmark.
Few comparative studies have been performed.
When they have been performed, authors of competing methods have complained that the wrong parameters were used.
Critical Assessment of Structure Prediction (CASP, 1994) releases protein sequences before their structures are published.
All methods submit their predictions.
Predictions are analyzed based on fold recognition, modeling accuracy and alignment accuracy.
No one method or approach is obviously superior.