Protein Tertiary Structure Prediction

Presentation transcript:

Protein Tertiary Structure Prediction. Structural Bioinformatics.

The different levels of protein structure. Primary: the linear amino acid sequence. Secondary: α-helices, β-sheets and loops. Tertiary: the 3D shape of the fully folded polypeptide chain.

The 3D structure of a protein is stored in a coordinate file. Each atom is represented by a coordinate in 3D (X, Y, Z).
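As a minimal sketch of what "a coordinate per atom" means in practice, the Python below reads atom records from text in the fixed-column PDB format. The function names and the two example atoms are made up for illustration; only the column positions follow the PDB format.

```python
def format_atom(serial, name, res_name, chain, res_num, x, y, z):
    """Build one fixed-width ATOM record (columns 1-54 only)."""
    return "ATOM  %5d  %-3s %3s %1s%4d    %8.3f%8.3f%8.3f" % (
        serial, name, res_name, chain, res_num, x, y, z)


def parse_atoms(pdb_text):
    """Return (atom name, residue name, chain, residue number, (x, y, z)) per atom."""
    atoms = []
    for line in pdb_text.splitlines():
        # Coordinate records start with ATOM or HETATM; skip everything else.
        if not line.startswith(("ATOM", "HETATM")):
            continue
        atom_name = line[12:16].strip()   # columns 13-16
        res_name = line[17:20].strip()    # columns 18-20
        chain = line[21]                  # column 22
        res_num = int(line[22:26])        # columns 23-26
        x = float(line[30:38])            # columns 31-38
        y = float(line[38:46])            # columns 39-46
        z = float(line[46:54])            # columns 47-54
        atoms.append((atom_name, res_name, chain, res_num, (x, y, z)))
    return atoms


# Made-up coordinates, used only to exercise the parser.
EXAMPLE = "\n".join([
    "REMARK made-up example, not a real structure",
    format_atom(1, "N", "GLU", "A", 1, 11.104, 13.207, 2.100),
    format_atom(2, "CA", "GLU", "A", 1, 12.560, 13.329, 2.280),
])

for atom in parse_atoms(EXAMPLE):
    print(atom)
```

Slicing by column rather than splitting on whitespace matters here, because adjacent fields in real coordinate files can run together without a separating space.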

The coordinate file can be viewed graphically. Example: RBP (a description is given in slides 35-36).

Predicting 3D Structure. An outstandingly difficult problem. Comparative modeling (homology): based on sequence homology. Fold recognition (threading): based on structural homology.

Comparative Modeling. Based on sequence homology: similar sequences suggest similar structure.

Sequence and structure alignments of two Retinol Binding Proteins

How do we evaluate structure similarity? Structure alignment.

Structure Alignments There are many different algorithms for structural Alignment. The outputs of a structural alignment are a superposition of the atomic coordinates and a minimal Root Mean Square Distance (RMSD) between the structures.

The RMSD of two aligned structures indicates their divergence from one another. Each atom N (x, y, z) in protein V is compared with the corresponding atom N (x, y, z) in protein W. Low values of RMSD mean similar structures.
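The RMSD is the square root of the mean squared distance between paired atoms. The sketch below assumes the two structures are already superimposed and that atom i in one list corresponds to atom i in the other; finding that superposition and pairing is the job of the structural alignment algorithm and is not shown. The function name and the coordinates are illustrative only.

```python
import math


def rmsd(coords_v, coords_w):
    """RMSD between two equally long lists of paired (x, y, z) coordinates."""
    if len(coords_v) != len(coords_w) or not coords_v:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(coords_v, coords_w):
        # Squared distance between the paired atoms.
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(coords_v))


# Toy example: protein W is protein V shifted by 1 unit along z.
protein_v = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
protein_w = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(protein_v, protein_w))  # every pair is 1 unit apart
```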

Comparative Modeling (based on sequence homology). Similar sequence suggests similar structure. Builds a protein structure model based on its sequence alignment to one or more related protein structures in the database.

Can we use comparative modeling for any given sequence?

Comparative Modeling (based on sequence homology). The accuracy of a comparative model is usually related to the sequence identity on which it is based: >50% sequence identity = high accuracy; 30%-50% sequence identity = 90% can be modeled; <30% sequence identity = low accuracy (many errors). However, other parameters (such as the identity length) can influence the results.
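The identity thresholds above can be applied to a pairwise alignment as in the following sketch. It assumes percent identity is counted over aligned (non-gap) columns, which is only one of several conventions in use; the function names and the two aligned sequences are invented for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned (non-gap) columns where both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    aligned = identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # columns with a gap are not counted (an assumption)
        aligned += 1
        if a == b:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


def expected_accuracy(identity):
    """Map percent identity to the rule-of-thumb bands from the slide."""
    if identity > 50:
        return "high accuracy"
    if identity >= 30:
        return "about 90% can be modeled"
    return "low accuracy (many errors)"


# Invented alignment: 6 aligned columns, 5 of them identical.
query = "MKT-AYL"
template = "MKTQAFL"
identity = percent_identity(query, template)
print(round(identity, 1), expected_accuracy(identity))
```

A short alignment can reach a high percent identity by chance, which is why the length over which identity is measured also matters.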

Comparative Modeling (based on sequence homology). Modeling of a sequence based on known structures consists of four major steps: 1. Finding a known structure(s) related to the sequence to be modeled (the template), using sequence comparison methods such as PSI-BLAST. 2. Aligning the sequence with the templates. 3. Building a model. 4. Assessing the model.

What is a good model?

Fold Recognition (based on structure homology)

Protein folds (based on secondary structure): the sequential and spatial arrangement of secondary structures. Examples: globin, TIM.

Similar folds usually mean similar function. Example: transcription factors with a homeodomain.

The same fold can have multiple functions. Rossmann fold: 12 different functions. TIM barrel: 31 different functions.

Fold Recognition (based on structure homology). Fold recognition attempts to detect similarities between protein 3D structures that have no significant sequence similarity. It searches for folds that are compatible with a particular sequence.

Basic steps in Fold Recognition (based on structure homology): compare the sequence against a library of all known protein folds (a finite number). Query sequence: MTYGFRIPLNCERWGHKLSTVILKRP... Goal: find the folding template that the sequence fits best. There are different ways to evaluate sequence-structure fit.

There are different ways to evaluate sequence-structure fit (based on secondary structure homology). The query sequence MAHFPGFGQSLLFGYPVYVFGD... is scored against each potential fold in the library (1, ..., 56, ..., n). Example scores shown: -10, -123, 20.5.
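One simple way to picture a sequence-structure fit score is sketched below. It is a toy scheme, not the scoring function of any real threading program: each hypothetical fold template is reduced to a string of buried (B) or exposed (E) positions, hydrophobic residues are rewarded for sitting in buried positions and polar residues for exposed ones, and lower scores are treated as better fits. The residue set, the templates, the query and the "lower is better" convention are all assumptions, and no gaps are allowed.

```python
# Assumed (simplified) set of hydrophobic residues, one-letter codes.
HYDROPHOBIC = set("AVLIMFWC")


def fit_score(sequence, environments):
    """Energy-like score of a sequence placed on a template; lower is better."""
    if len(sequence) != len(environments):
        raise ValueError("this toy scheme needs equal lengths (no gaps)")
    score = 0
    for residue, env in zip(sequence, environments):
        hydrophobic = residue in HYDROPHOBIC
        buried = env == "B"
        # -1 when the residue suits its environment, +1 when it does not.
        score += -1 if hydrophobic == buried else 1
    return score


def rank_folds(sequence, library):
    """Score the sequence against every template and sort best-first."""
    scores = {name: fit_score(sequence, envs) for name, envs in library.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical fold library and query.
library = {
    "fold_1": "BBEEBBEE",
    "fold_2": "EEBBEEBB",
    "fold_3": "BEBEBEBE",
}
ranking = rank_folds("LVKEAIDS", library)
print(ranking)
```

Real methods also align the sequence to the template with gaps and use much richer scoring terms (for example pairwise contact potentials and predicted secondary structure).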

Fold Recognition (based on structure homology). Fold recognition methods attempt to detect similarities between protein 3D structures that have no significant sequence similarity, by searching for folds that are compatible with a particular sequence. They "turn the protein folding problem on its head": rather than predicting how a sequence will fold, they predict how well a fold will fit a sequence.

Ab Initio Modeling. Compute molecular structure from the laws of physics and chemistry alone. Theoretically the ideal solution; practically nearly impossible. Why? The calculations are exceptionally complex, and our understanding of the biophysics is incomplete.

CASP: Critical Assessment of Structure Prediction. How do we know what is a good prediction? CASP is a competition among different groups to predict the 3D structures of proteins that are about to be solved experimentally. Current state: ab initio performs the worst, but has greatly improved in recent years. Comparative modeling performs very well when homologous sequences with known structures exist. Fold recognition performs well.

What can you do? FOLDIT: Solve Puzzles for Science. A computer game to fold proteins: http://fold.it/portal/puzzles

What's Next: Predicting function from structure

Protein structures give us insight into protein function and mechanism of action: fold, evolutionary relationships, shape and electrostatics, active sites, functional sites, protein-ligand complexes, protein complexes, biological processes, and the location of mutants and SNPs.

Classical approach for function prediction: new structure ? → similar function

Given a protein structure can we predict the function of a protein when we do not have a known homolog in the database ?

A different approach for predicting function from structure, which does not rely on homology: • Characterize the known protein structures belonging to a specific family • Find general structural features which are unique to the family • Use these features to predict new members of the family

Example: predicting new DNA-binding proteins. Many DNA-binding proteins (e.g. p53) are involved in cancer.

Many different folds, but all can bind DNA: helix-turn-helix, zinc finger, leucine zipper, β-ribbon.

While DNA-binding proteins have diverse folds, they all share a common property: all have positively charged surfaces, complementing the negative charge of the DNA. What proteins are these? (Figure: surfaces colored by charge, positive in blue, negative in red.)

DNA-binding proteins are characterized by positively charged surfaces, but so are proteins that do not bind nucleic acids. What proteins are these? (Figure: surfaces colored by charge, positive in blue, negative in red.)

Strategy for predicting new DNA-binding proteins: 1. Build a database of DNA-binding and non-DNA-binding proteins. 2. Extract the positive electrostatic patch in all proteins in the data set. 3. Find features that could be used to discriminate the DNA-binding proteins from other proteins. 4. Use the features as a vector to train a machine learning algorithm to identify novel DNA-binding proteins.

Machine learning algorithm for predicting protein function from structural features. An SVM (Support Vector Machine) is trained on a set of known proteins that have a common function such as DNA binding (red dots) and, in addition, a separate set of proteins that are known not to bind DNA (blue dots).

Using this training set of DNA-binding and non-DNA-binding proteins, an SVM learns to differentiate between members and non-members of the family. Having learned the features of the class (DNA-binding proteins), the SVM can classify a new protein as a member or a non-member of the class based on the combination of its structural features.
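The training and prediction steps can be illustrated with a deliberately small stand-in: a linear SVM fitted by subgradient descent on the regularized hinge loss, written in plain Python. This is not the method or the feature set used in the lecture; a real study would more likely use an established SVM library with a kernel. The two standardized features per protein, all data points, and the hyperparameters are invented for illustration.

```python
def train_linear_svm(samples, labels, learning_rate=0.1, reg=0.01, epochs=20):
    """Fit weights and bias by subgradient descent on the regularized hinge loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(w * xi for w, xi in zip(weights, x)) + bias)
            if margin < 1:
                # Point is inside the margin or misclassified: move towards it.
                weights = [w - learning_rate * (reg * w - y * xi)
                           for w, xi in zip(weights, x)]
                bias += learning_rate * y
            else:
                # Point is safely classified: only apply the regularization shrink.
                weights = [w - learning_rate * reg * w for w in weights]
    return weights, bias


def predict(model, x):
    """Return +1 (predicted DNA-binding) or -1 (predicted non-binding)."""
    weights, bias = model
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else -1


# Invented, standardized features per protein: (patch size, patch charge).
dna_binding = [(2.0, 2.0), (3.0, 3.0), (2.5, 3.5)]
non_binding = [(-2.0, -2.0), (-3.0, -1.0), (-1.0, -3.0)]
samples = dna_binding + non_binding
labels = [1] * len(dna_binding) + [-1] * len(non_binding)

model = train_linear_svm(samples, labels)
print(predict(model, (2.2, 2.8)), predict(model, (-2.5, -1.5)))
```

The two printed predictions are for proteins that were not in the training set, which mirrors the goal stated above: recognizing new members of the class from their structural features.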

Testing the algorithm for predicting DNA-binding proteins. (Bar chart: percentage of correct and incorrect predictions for DNA-binding and non-DNA-binding proteins.) True positives = 44, true negatives = 236, false positives = 10, false negatives = 14.
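The four counts above can be turned into the usual summary measures. The sketch below uses the standard definitions; the function name is illustrative.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard summary measures derived from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),   # fraction of true binders recovered
        "specificity": tn / (tn + fp),   # fraction of non-binders rejected
        "precision": tp / (tp + fp),     # fraction of predicted binders that are real
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


metrics = classification_metrics(tp=44, tn=236, fp=10, fn=14)
for name, value in metrics.items():
    print(name, round(value, 3))
```

For these counts the sensitivity is 44/58 (about 0.76), the specificity 236/246 (about 0.96), the precision 44/54 (about 0.81) and the overall accuracy 280/304 (about 0.92). The test set contains far more non-binders than binders, so accuracy alone would overstate how well binders are detected.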

PyMOL example: 1. Launch PyMOL. 2. Open the file "1aqb" (PDB coordinate file). 3. Display the sequence. 4. Hide everything. 5. Show main chain / hide main chain. 6. Show cartoon. 7. Color by ss. 8. Color red. 9. Color green, resi 1:40. Help: http://pymol.org
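Most of these steps can also be scripted through PyMOL's Python interface. The sketch below wraps them in a function that takes PyMOL's `cmd` module as an argument; the file name `1aqb.pdb`, the object name `rbp`, the function name and the helix and strand colors are assumptions. The main-chain show/hide step is omitted, and the residue range is written in the hyphenated form `resi 1-40`.

```python
def style_structure(cmd, pdb_path="1aqb.pdb", name="rbp"):
    """Replay the slide's display steps using PyMOL's cmd interface."""
    cmd.load(pdb_path, name)                     # open the coordinate file
    cmd.set("seq_view", 1)                       # display the sequence
    cmd.hide("everything", name)                 # hide everything
    cmd.show("cartoon", name)                    # show cartoon
    cmd.color("red", name + " and ss h")         # color by ss: helices
    cmd.color("yellow", name + " and ss s")      # color by ss: strands
    cmd.color("red", name)                       # color the whole protein red
    cmd.color("green", name + " and resi 1-40")  # then residues 1-40 green
```

Inside a PyMOL session (or any Python environment where the `pymol` package is installed), it would be called as `from pymol import cmd` followed by `style_structure(cmd)`, with `1aqb.pdb` present in the working directory.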