Comparative methods

Presentation transcript:

Comparative methods
Basic logic: the 3D structure of the protein is deduced from:
1. Similarities between the protein and other proteins
2. Statistical tendencies characteristic of its sequence
Physical aspects of the structure are not included in the prediction.
Major categories of comparative structure prediction:
1. Secondary structure prediction
2. Homology modeling
3. Fold recognition

1. Secondary structure prediction
Basic methodology:
Each amino acid has a statistical propensity to appear in certain secondary structures (e.g. helix, sheet, turn).
The individual amino acid propensities are additive, so the propensity of an entire protein segment can be calculated.
By using a ‘sliding window’, protein segments with strong secondary structure propensities can be identified.
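The sliding-window idea can be sketched in a few lines of Python. This is a minimal illustration, not a published predictor: the propensity values, the window length of 4 and the threshold of 1.2 are all assumptions chosen for the example.

```python
from typing import Dict, List, Tuple

# Illustrative helix propensities (placeholder values, not an authoritative table).
HELIX_PROPENSITY: Dict[str, float] = {
    "A": 1.4, "E": 1.5, "L": 1.2, "G": 0.6, "P": 0.6, "S": 0.8,
}


def window_scores(seq: str, propensity: Dict[str, float],
                  window: int = 4, default: float = 1.0) -> List[float]:
    """Mean propensity of every window of length `window` along `seq`."""
    values = [propensity.get(res, default) for res in seq]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def strong_segments(seq: str, propensity: Dict[str, float],
                    window: int = 4, threshold: float = 1.2) -> List[Tuple[int, int]]:
    """(start, end) indices, end exclusive, of windows scoring at or above threshold."""
    scores = window_scores(seq, propensity, window)
    return [(i, i + window) for i, s in enumerate(scores) if s >= threshold]


if __name__ == "__main__":
    print(strong_segments("GPGSAELAEL", HELIX_PROPENSITY))
```

Because the propensities are treated as additive, the score of a segment is simply the sum (here the mean) of its residues' values; overlapping windows above the threshold would be merged into one predicted element in a fuller implementation.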

Major steps in secondary structure prediction
1. Chou and Fasman (1974): residue propensities + a sliding window for prediction
P(H), P(E), P(turn): frequency parameters for appearing in an α-helix, β-sheet, and turn
F(i), F(i+1), F(i+2), F(i+3): frequencies of being in the 1st to 4th position of a β-turn
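In the Chou and Fasman scheme the four positional frequencies are multiplied to score a candidate β-turn starting at residue i. The sketch below assumes a hypothetical frequency table covering only a few residues, and a cutoff of 7.5e-5 that is often quoted for this method but should be treated here as an assumption.

```python
from typing import Dict, List, Tuple

# Hypothetical positional frequencies: residue -> (F(i), F(i+1), F(i+2), F(i+3)).
TURN_FREQ: Dict[str, Tuple[float, float, float, float]] = {
    "N": (0.16, 0.08, 0.19, 0.09),
    "P": (0.10, 0.30, 0.03, 0.07),
    "G": (0.10, 0.08, 0.19, 0.15),
    "D": (0.15, 0.11, 0.18, 0.08),
    "L": (0.06, 0.02, 0.04, 0.06),
}
DEFAULT_FREQ = (0.05, 0.05, 0.05, 0.05)  # assumed fallback for residues not listed


def turn_probability(tetrapeptide: str) -> float:
    """Product of the positional frequencies of four consecutive residues."""
    if len(tetrapeptide) != 4:
        raise ValueError("a beta-turn candidate must be exactly four residues")
    p = 1.0
    for position, residue in enumerate(tetrapeptide):
        p *= TURN_FREQ.get(residue, DEFAULT_FREQ)[position]
    return p


def scan_turns(seq: str, cutoff: float = 7.5e-5) -> List[Tuple[int, float]]:
    """Start index and score of every four-residue window scoring above cutoff."""
    hits = []
    for i in range(len(seq) - 3):
        p = turn_probability(seq[i:i + 4])
        if p > cutoff:
            hits.append((i, p))
    return hits


if __name__ == "__main__":
    print(scan_turns("LLNPGDLL"))
```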

Success rate: ~50%

2. Sternberg (1987): incorporating evolutionary information in the calculation, in the form of multiple sequence alignments (MSAs), since homologous proteins tend to have similar secondary structures
Success rate: 69%
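One simple way to bring an MSA into a propensity calculation is to average the propensities of all residues observed in each alignment column. The sketch below shows only that idea; it is not claimed to reproduce the 1987 procedure, and the propensity values and the gap character "-" are assumptions.

```python
from typing import Dict, List

# Illustrative helix propensities (placeholder values).
HELIX_PROPENSITY: Dict[str, float] = {"A": 1.4, "E": 1.5, "L": 1.2, "G": 0.6}


def column_propensities(alignment: List[str], propensity: Dict[str, float],
                        default: float = 1.0) -> List[float]:
    """Mean propensity per alignment column, ignoring gap characters ('-')."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        if residues:
            mean = sum(propensity.get(r, default) for r in residues) / len(residues)
        else:
            mean = default  # column made only of gaps
        result.append(mean)
    return result


if __name__ == "__main__":
    print(column_propensities(["AEL", "AGL", "A-L"], HELIX_PROPENSITY))
```

The per-column averages can then be fed to the same sliding-window step used for a single sequence.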

3. Rost and Sander (1994) (PHD-Sec): combines neural networks (i.e. machine learning) with multiple sequence alignments
Success rates: PHD-Sec 72%; PREDATOR 75%; PSIPRED 77%
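Success rates like these are usually reported as the percentage of residues assigned the correct state. Assuming that is the measure meant here (three states: H, E and C for everything else), it can be computed as in this sketch; the function name q3 and the example strings are illustrative.

```python
def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state (H, E or C) matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("prediction and observation must have the same length")
    if not observed:
        raise ValueError("empty input")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


if __name__ == "__main__":
    # One of ten residues differs between the two strings.
    print(q3("CCHHHHEECC", "CHHHHHEECC"))
```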

Common problems in secondary structure prediction
Prediction is problematic at the ends of secondary structure elements.
Success rate is always under 100%, possibly due to tertiary effects in proteins.

2. Homology modeling
Basic logic: homologous proteins (proteins with a common ancestor; high sequence identity) share similar structures.
Thus, the structure of a protein can be predicted from its sequence similarity to proteins of known structure (family).

Homology modeling includes the following steps:
1. Finding a ‘template’ protein with high enough sequence identity to the query protein (desirable: at least 30%) [PSI-BLAST]
2. Aligning the two sequences
3. Transferring the coordinates of identical amino acids from the template to the query protein (for non-identical residues, other prediction methods are used)

4. Performing energy optimization to get rid of clashes and distortions

5.

Problems:
1. The number of proteins of known structure that can serve as templates (i.e. > 30% sequence identity) is limited
2. Predicting loops: loops are rich in insertions and deletions, and are therefore difficult to predict
Partial solution: a combination of sequence-based methods and hydrophobicity profiles makes it possible to infer the structure of loops
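The 30% identity criterion for a usable template can be checked on a pairwise alignment as sketched below. Definitions of percent identity differ in what they divide by; this example assumes identity is counted over columns where neither sequence has a gap ("-"), and the function names are illustrative.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent of identical residues over alignment columns without gaps."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(q, t) for q, t in zip(aligned_query, aligned_template)
             if q != "-" and t != "-"]
    if not pairs:
        return 0.0
    identical = sum(q == t for q, t in pairs)
    return 100.0 * identical / len(pairs)


def usable_template(aligned_query: str, aligned_template: str,
                    min_identity: float = 30.0) -> bool:
    """True if the template reaches the desired identity to the query."""
    return percent_identity(aligned_query, aligned_template) >= min_identity


if __name__ == "__main__":
    print(percent_identity("MTYK-LIL", "MSYKALVL"))
    print(usable_template("AAAA", "CCCA"))
```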

3. Fold recognition (profile)
Basic logic: the sequence-based statistical tendencies (polarity, exposure, secondary structure) of the query protein are compared to those of other proteins with known structure.
The best match represents the protein whose fold is closest to that of the query protein.
Useful for:
1. Finding the fold of a query protein
2. Predicting whether a query protein has a novel fold

3. Fold recognition (profile): steps
1. Each of the 20 amino acids is classified according to 3 basic structure-related statistical tendencies: polarity, solvent exposure and secondary structure
2. Each position in the query protein is assigned a code, describing the specific tendencies of this position. This yields a structure-based sequence profile for the query protein
3. The profile is systematically compared to a library containing the profiles of all proteins of known structure
4. A match represents a protein with similar fold
5. If a match is not found, the query protein is assumed to have a novel fold
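The steps above can be sketched as follows. Everything specific in this example is an assumption made for illustration: the tendency table covers only four residues with made-up classes, profiles are compared position by position without gaps, the library profiles are built from short sequences for convenience, and the 0.7 cutoff is arbitrary.

```python
from typing import Dict, List, Optional, Tuple

Code = Tuple[str, str, str]  # (polarity, solvent exposure, secondary structure)

# Hypothetical tendencies for a handful of residues (illustration only).
TENDENCY: Dict[str, Code] = {
    "L": ("nonpolar", "buried", "helix"),
    "V": ("nonpolar", "buried", "sheet"),
    "K": ("polar", "exposed", "helix"),
    "G": ("nonpolar", "exposed", "turn"),
}
UNKNOWN: Code = ("?", "?", "?")


def profile(seq: str) -> List[Code]:
    """Step 2: assign a tendency code to every position of the sequence."""
    return [TENDENCY.get(res, UNKNOWN) for res in seq]


def similarity(a: List[Code], b: List[Code]) -> float:
    """Fraction of matching tendencies, compared position by position (no gaps)."""
    if not a or not b:
        return 0.0
    matches = sum(x == y and x != "?"
                  for code_a, code_b in zip(a, b)
                  for x, y in zip(code_a, code_b))
    return matches / (3 * max(len(a), len(b)))


def recognise_fold(query: str, library: Dict[str, List[Code]],
                   cutoff: float = 0.7) -> Optional[str]:
    """Steps 3 to 5: best-matching library entry, or None for a presumed novel fold."""
    query_profile = profile(query)
    best_name, best_score = None, 0.0
    for name, library_profile in library.items():
        score = similarity(query_profile, library_profile)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= cutoff else None


# Toy library; real library profiles would come from known structures.
LIBRARY = {"fold_A": profile("LKLK"), "fold_B": profile("VGVG")}

if __name__ == "__main__":
    print(recognise_fold("LKVK", LIBRARY))
    print(recognise_fold("GGGG", LIBRARY))
```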

4. Fold recognition (threading)
A combination of homology modeling and structural profiles.
Like homology modeling, it predicts the structure of the query protein based on sequence alignments with template proteins.
However, instead of one 3D model, many low-resolution models are constructed by using different alignments.
The different models are evaluated based on residue-residue preferences in known structures (converted to energy terms by the Boltzmann equation).
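The conversion of residue-residue preferences into energies is commonly written as the inverse Boltzmann relation E = -kT ln(observed / expected). The sketch below applies it to made-up contact counts and then ranks two alternative models by the sum of their contact energies. The counts, the model names and the choice kT = 1 are assumptions for illustration.

```python
import math
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def contact_energy(observed: float, expected: float, kT: float = 1.0) -> float:
    """Inverse Boltzmann relation: E = -kT * ln(observed / expected)."""
    if observed <= 0 or expected <= 0:
        raise ValueError("counts must be positive")
    return -kT * math.log(observed / expected)


def model_energy(contacts: List[Pair], energies: Dict[Pair, float]) -> float:
    """Sum of pair energies over all residue-residue contacts of one model."""
    return sum(energies[tuple(sorted(pair))] for pair in contacts)


def best_model(models: Dict[str, List[Pair]], energies: Dict[Pair, float]) -> str:
    """Name of the model with the lowest total energy."""
    return min(models, key=lambda name: model_energy(models[name], energies))


# Made-up counts: L-V contacts seen twice as often as expected, K-L half as often.
ENERGIES: Dict[Pair, float] = {
    ("L", "V"): contact_energy(200, 100),
    ("K", "L"): contact_energy(50, 100),
}
# Contacts implied by two alternative alignments of the query onto a template.
MODELS: Dict[str, List[Pair]] = {
    "alignment_1": [("V", "L"), ("L", "V")],
    "alignment_2": [("L", "K")],
}

if __name__ == "__main__":
    print(best_model(MODELS, ENERGIES))
```

Pairs observed more often than expected receive negative (favourable) energies, so the model whose contacts best resemble those in known structures gets the lowest score.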