IPM-POLYTECHNIQUE-WPI Workshop on Bioinformatics and Biomathematics April 11-21, 2005 IPM School of Mathematics Tehran.


Prediction of protein surface accessibility based on residue pair types and accessibility state using a dynamic programming algorithm. R. Zarei (1), M. Sadeghi (2), and S. Arab (3). (1, 2) NRCGEB, Tehran, Iran; (3) IBB, University of Tehran.

 Proteins & structure of proteins  Prediction of protein structure  Prediction of protein accessible surface area  Method  conclusion

Flow of information: DNA → RNA → protein sequence → protein structure → protein function → ...

Proteins are the machinery of life. Proteins have structural and functional roles in cells. No other type of biological macromolecule could possibly assume all of the functions that proteins have amassed over billions of years of evolution.

Protein structure leads to protein function. Precise placement of chemical groups allows proteins to have:  Catalytic function  Structural role  Transport function  Regulatory function. The determination of the 3-dimensional structure of proteins is therefore important.

4 levels of protein structures  The Primary structure of proteins (A string of 20 different Amino acids)  The secondary structure of proteins (Local 3-D structure)  The Tertiary structure of proteins (Global 3-D structure)  The Quaternary structure of proteins (Association of multiple polypeptide chains)

The Primary structure of proteins

The secondary structure of proteins  α-helices: α-helix, 3₁₀-helix, π-helix  β-sheets: parallel, antiparallel  Other secondary structures: loops (hairpin loops, Ω loops, extended loops) and coils (random coil)

The tertiary structure of proteins  There is a wide variety of ways in which the various helix, sheet and loop elements can combine to produce a complete structure.  At the level of tertiary structure, the side chains play a much more active role in creating the final structure.

Why predict protein structure?  Structural knowledge brings understanding of function and mechanism of action.  Protein structure is determined experimentally by X-ray crystallography and NMR.  The sequence-structure gap (known sequences versus known structures) is rapidly increasing.

What is protein structure prediction?  In its most general form A prediction of the (relative) spatial position of each atom in the tertiary structure generated from knowledge only of the primary structure (sequence)

Hypotheses of prediction  There is no general method for predicting 3D structure from sequence yet.  Sequence determines structure, and structure determines function: the 3D structure of a protein (the fold) is uniquely determined by the specificity of the sequence (Anfinsen, 1973).

Methods of structure prediction  Comparative (homology) modelling  Fold recognition/threading  Ab initio protein folding approaches

3D structure prediction of proteins. Existing folds: threading or building by homology, depending on similarity (%). New folds: ab initio prediction.

Levels of structure prediction  1D secondary structure, accessibility,……  2D contact map of residues  3D Tertiary structure

Prediction in 1D. Structure prediction in 1D projects the 3D structure onto strings of structural assignments.  Secondary structure prediction  Prediction of accessible surface area  Prediction of membrane helices

What is prediction in 1D?  Given a protein sequence (primary structure) HWIATGQLIREAYEDYSS GHWIATRGQLIREAYEDYRHFSSECPFIP  Assign the residues (C=coils H=Alpha Helix E=Beta Strands) EEEEEHHHHHHHHHHHHH CEEEEECHHHHHHHHHHHCCCHHCCCCCC

secondary structure prediction in 1D  less detailed results only predicts the H (helix), E (extended) or C (coil/loop) state of each residue, does not predict the full atomic structure  Accuracy of secondary structure prediction The best methods have an average accuracy of just about 73% (the percentage of residues predicted correctly)
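The accuracy quoted above is the percentage of residues whose predicted state matches the observed one. A minimal sketch of that per-residue measure follows; the function name `q3_accuracy` and the two example strings are illustrative assumptions, not taken from the slides.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state (H, E or C) equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("state strings must have the same length")
    if not observed:
        raise ValueError("state strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical 10-residue example: two positions disagree.
observed = "CCHHHHEEEC"
predicted = "CCHHHCEEEE"
print(f"{q3_accuracy(predicted, observed):.0%}")
```

The same function works for accessibility strings (for example B/I/E), since it only compares characters position by position.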

History of prediction of protein structure in 1D methods  First generation –How: single residue statistics –Accuracy: low  Second generation –How: segment statistics –Accuracy: ~60%  Third generation –How: long-range interaction, homology based –Accuracy: ~70%

Protein surface

Accessible surface area. Figure labels: solvent probe, accessible surface, van der Waals surface, reentrant surface. The accessible surface is traced out by the center of the probe sphere as it rolls over the protein. It is a kind of expanded van der Waals surface.

Accessibility  Accessibility = (Accessible Surface Area (ASA) in folded protein) / (Maximum ASA)  Two states = b (buried), e (exposed), e.g. b 16%  Three states = b (buried), i (intermediate), e (exposed), e.g. b i, 36%
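As a rough sketch of the definition above, relative accessibility can be computed as the ASA of a residue in the folded protein divided by its maximum ASA, and then mapped to two states with a percentage threshold. The function names, the example numbers, and the choice to count a value exactly at the threshold as exposed are assumptions made for illustration.

```python
def relative_accessibility(asa: float, max_asa: float) -> float:
    """Relative accessibility in percent: ASA in the folded protein / maximum ASA."""
    if max_asa <= 0:
        raise ValueError("max_asa must be positive")
    return 100.0 * asa / max_asa


def two_state(rel_acc: float, threshold: float = 16.0) -> str:
    """Return 'b' (buried) below the threshold and 'e' (exposed) otherwise.

    Treating a value equal to the threshold as exposed is an assumption.
    """
    return "b" if rel_acc < threshold else "e"


# Hypothetical numbers, for illustration only.
rel = relative_accessibility(asa=20.0, max_asa=200.0)
print(rel, two_state(rel))
```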

Use of solvent accessibility: studies of solvent accessibility in proteins have led to many insights into protein structure, such as:  Protein function  Sequence motifs  Domains  Formulating antigenic determinants & site-directed mutagenesis

Why predict solvent accessibility?  Helpful for: predicting the arrangement of secondary structure segments in the 3-D structure; estimating the number of protein-protein and protein-solvent contacts of residues; threading procedures to find putative remote homologues; improving prediction of glycosylation sites; predicting epitopes.

Problems of predicting solvent accessibility  Prediction of solvent accessibility is less accurate than that of secondary structure.  Residue accessibility is only approximated (projecting surface area onto 2 states leads to a loss of information).  There is the problem of how to define the threshold.

ASA Calculation  DSSP - Database of Secondary Structures for Proteins (swift.embl-heidelberg.de/dssp)  VADAR - Volume Area Dihedral Angle Reporter (  GetArea -

Other ASA sites  Connolly Molecular Surface Home Page  Naccess Home Page  ASA Parallelization  Protein Structure Database

Methods of accessibility prediction (scientists; year; accuracy; CC; method; reference number)
Salzberg; ~72%; 0.43; DT, decision tree [1]
Thompson, Goldstein; ~72%; 0.43; BS, Bayesian statistics [2]
Li, Pan; ~72%; 0.43; MLR, multiple linear regression [3]
Yuan et al.; % 2~4 %; SVM, support vector machine [4]
Rost, Sander; % 2~4%; neural network [5]
Sadeghi et al.; 2001; a method based on information theory [6]

PHD Prediction of rCD2

Accessibility Prediction  PredictProtein-PHDacc (58%)  PredAcc (70%?) QHTAW... QHTAWCLTSEQHTAAVIW BBPPBEEEEEPBPBPBPB

THEORY & METHOD

Data sets. A set of 230 nonredundant protein structures in the PDB with mutual sequence similarity <25% was selected from PDBSELECT to construct the training and testing sets. All structures were determined by X-ray crystallography at a resolution of 2.5 Å or better and have no chain breaks.

ASA calculation  Surface area and accessibility for the dataset proteins were calculated by software developed in our group.  Accessibility states were defined as two states and three states with different thresholds.  Two states, B and E (thresholds 5%, 9%, and 16%)  Three states, B, I, E (threshold pairs 4% and 9%, 9% and 16%, 4% and 16%)
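The three-state definition with a pair of thresholds can be sketched as follows. The threshold pairs come from the slide; the function names, the example profile, and the handling of values exactly at a threshold are assumptions, and this is not the group's own software.

```python
# Threshold pairs (lower %, upper %) listed on the slide.
THRESHOLD_PAIRS = [(4.0, 9.0), (9.0, 16.0), (4.0, 16.0)]


def three_state(rel_acc: float, lower: float, upper: float) -> str:
    """Map a relative accessibility (percent) to B, I or E.

    Assumption: B below `lower`, I from `lower` up to (not including) `upper`, E otherwise.
    """
    if lower >= upper:
        raise ValueError("lower threshold must be smaller than upper threshold")
    if rel_acc < lower:
        return "B"
    if rel_acc < upper:
        return "I"
    return "E"


def encode(profile, lower, upper):
    """Turn a list of relative accessibilities into a string of states."""
    return "".join(three_state(value, lower, upper) for value in profile)


# Hypothetical relative accessibilities for four residues.
profile = [2.0, 6.0, 12.0, 40.0]
for lower, upper in THRESHOLD_PAIRS:
    print((lower, upper), encode(profile, lower, upper))
```

The same residue can change state when the threshold pair changes, which is why the choice of threshold affects the reported accuracy.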

 Conformation(State) of a residue is affected by: Short range interactions( between near residues ) Long range interactions( between far residues ) Most efforts have been focused on the analysis of near residues(local effects).

 Our method is based on: residue type (R), and residue conformation (state of neighbor residues S & S’): different neighbor residue types cause a residue to adopt different states.

[Figure: tree of accessibility states (E, B, I) branching over residues n1, n2, n3, ...; there are 3^n branches, where n = length of protein; the prediction is the branch with maximum information.]

[Figure: single residue prediction; residues n1 to n10, each assigned its own state (s1, s2, ...).]

[Figure: double residue prediction; neighboring residues are assigned a pair of states S S.]

Where P(SS’ = XX’) is the probability of the occurrence of the event SS’ = XX’, and P(SS’ = XX’ | RiRj) is the conditional probability of SS’ = XX’ given that residues Ri and Rj have occurred. The complementary event of
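The equation itself did not survive on this slide. As a hedged illustration only, the sketch below estimates the two probabilities named above from counts and combines them in a log-odds form that uses the complementary event, in the style of GOR information measures. That form, the function name `pair_information`, and the toy observations are all assumptions, not the authors' formula.

```python
import math


def pair_information(observations, residues, states):
    """Log-odds information that a residue pair carries about a state pair.

    observations: list of ((Ri, Rj), (S, S')) tuples from a training set.
    Assumed form: ln[P(X|R) / P(not X|R)] - ln[P(X) / P(not X)].
    """
    obs = list(observations)
    if not obs:
        raise ValueError("no observations")
    with_residues = [s for r, s in obs if r == residues]
    if not with_residues:
        raise ValueError("residue pair never observed")
    p_state = sum(1 for _, s in obs if s == states) / len(obs)
    p_given = sum(1 for s in with_residues if s == states) / len(with_residues)
    if not (0 < p_state < 1 and 0 < p_given < 1):
        raise ValueError("probabilities of 0 or 1 need smoothing, which is not shown here")
    return math.log(p_given / (1 - p_given)) - math.log(p_state / (1 - p_state))


# Toy observations (hypothetical): pair (A, L) is mostly buried, (K, D) mostly exposed.
toy = (
    [(("A", "L"), ("B", "B"))] * 3
    + [(("A", "L"), ("E", "E"))]
    + [(("K", "D"), ("E", "E"))] * 3
    + [(("K", "D"), ("B", "B"))]
)
print(pair_information(toy, ("A", "L"), ("B", "B")))
```

A positive value means the residue pair favors the state pair, and a negative value means it disfavors it.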

Complexity & problems of the method  Considering pairwise residue types: 20*20 entries  Considering both the types of residue pairs & the states of residue pairs simultaneously: for two states, 20*20*2 entries; for three states, 20*20*3 entries. Note: because of the limited sample size we can’t analyze triplets or longer.

The problems that we encountered when considering pairwise residue types & states simultaneously were:  Each residue in a window of length L is predicted L times; for example, in a window of two residues, each residue is predicted 2 times, and so on.  If we consider the state of each residue in a window of length L, there are L predictions for each residue. Result: ambiguity in answering the question: which state stands for each residue? Solution: use of dynamic programming.
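The repeated-prediction problem can be made concrete by counting how many sliding windows cover each residue. This is a small illustrative sketch; `window_coverage` is a hypothetical helper name.

```python
def window_coverage(n_residues: int, window_length: int) -> list:
    """Count, for each residue position, how many sliding windows contain it."""
    if not 1 <= window_length <= n_residues:
        raise ValueError("window_length must be between 1 and n_residues")
    counts = [0] * n_residues
    for start in range(n_residues - window_length + 1):
        for position in range(start, start + window_length):
            counts[position] += 1
    return counts


print(window_coverage(6, 2))  # [1, 2, 2, 2, 2, 1]
print(window_coverage(6, 3))  # [1, 2, 3, 3, 2, 1]
```

Interior residues are covered by L windows and so receive L candidate states, while residues near the chain ends receive fewer.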

[Figure: double residue prediction; residues n1 to n10, each overlapping pair of neighbors assigned a pair of states S S.]

[Figure: double residue prediction for longer windows; residues n1 to n9.]

The information content I for a window of length L, with amino acid types Ri and Ri+m and accessibility states S and S’ ∈ (E, I, B), is calculated as follows:

Dynamic programming algorithm  Build an optimal solution from optimal solutions to sub problems  Decompose a large problem into number of small problems. Solve the small problems and use these to solve the large problem.

Three basic components  The development of a dynamic programming algorithm has three basic components: –The recurrence relation (for defining the value of an optimal solution); –The tabular computation (for computing the value of an optimal solution); –The trace back (for delivering an optimal solution).
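The three components can be sketched for the two-residue window case: choose one state per residue so that the summed scores of neighboring state pairs are maximal. This is a Viterbi-style illustration under stated assumptions, not the authors' implementation; `pair_score` stands in for their information values, and `toy_score` is a made-up scoring function.

```python
from itertools import product

STATES = "EBI"


def best_state_path(n_residues, pair_score):
    """Return (state string, total score) maximizing the sum of pair scores.

    pair_score(i, s, t) scores residue i in state s followed by residue i+1 in state t.
    Recurrence: best[i+1][t] = max over s of best[i][s] + pair_score(i, s, t).
    """
    if n_residues < 1:
        raise ValueError("need at least one residue")
    # Tabular computation: one row of best scores per residue.
    best = [{s: 0.0 for s in STATES}]
    back = []
    for i in range(n_residues - 1):
        row, pointers = {}, {}
        for t in STATES:
            prev = max(STATES, key=lambda s: best[i][s] + pair_score(i, s, t))
            row[t] = best[i][prev] + pair_score(i, prev, t)
            pointers[t] = prev
        best.append(row)
        back.append(pointers)
    # Trace back from the best final state to recover the state string.
    last = max(STATES, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return "".join(reversed(path)), best[-1][last]


def brute_force_score(n_residues, pair_score):
    """Best total score found by enumerating all 3^n state strings (small n only)."""
    return max(
        sum(pair_score(i, p[i], p[i + 1]) for i in range(n_residues - 1))
        for p in product(STATES, repeat=n_residues)
    )


def toy_score(i, s, t):
    """Hypothetical pair scores, for illustration only."""
    table = {("E", "E"): 1.0, ("E", "B"): 0.5, ("B", "B"): 0.8,
             ("B", "I"): 0.6, ("I", "E"): 0.9}
    bonus = 0.3 if (i % 2 == 0 and t == "B") else 0.0
    return table.get((s, t), 0.0) + bonus


path, score = best_state_path(5, toy_score)
print(path, score, brute_force_score(5, toy_score))
```

The table has 3 entries per residue and each entry looks at 3 predecessors, so the work grows linearly with protein length instead of with the 3^n branches of exhaustive enumeration.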

Dynamic programming algorithm

Three-state accessibility for a window of two residues

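[Figure: the residue sequence n1 to n6 is decomposed into overlapping two-residue windows (n1 n2), (n2 n3), (n3 n4), ...]

[Figure: dynamic programming over the overlapping windows (n1 n2), (n2 n3), (n3 n4); each window takes one of the nine state pairs EE, EB, EI, BB, BI, BE, II, IB, IE.]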

Results & discussion

[Table: two-state accuracy by window length and threshold (16%, 9%, 5%).]

[Table: three-state accuracy by window length and threshold pair (4, 16%; 9, 16%; 4, 9%).]

Three states accuracy

Suggestions: taking longer windows should increase prediction accuracy; analysis and scoring of amino acid pairs by other statistical methods such as Markov chains; using larger data sets and analysis of amino acid triplets (8000 * 27 states).

Thank You