1
Protein Tertiary Structure Prediction
Structural Bioinformatics: Protein Tertiary Structure Prediction
2
The different levels of Protein Structure
Primary: the linear amino acid sequence. Secondary: α-helices, β-sheets and loops. Tertiary: the 3D shape of the fully folded polypeptide chain.
3
The 3D structure of a protein is stored in a coordinate file
Each atom is represented by a coordinate in 3D (X, Y, Z)
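To make the idea concrete, here is a minimal Python sketch that reads the X, Y, Z coordinates out of the ATOM/HETATM records of a coordinate file. It assumes the standard fixed-column PDB layout (x in columns 31-38, y in 39-46, z in 47-54). The three sample records and their coordinates are invented for illustration, and the names `Atom` and `parse_atoms` are hypothetical.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    name: str      # atom name, e.g. "CA"
    res_name: str  # residue name, e.g. "MET"
    chain: str     # chain identifier
    res_seq: int   # residue number
    x: float
    y: float
    z: float


def parse_atoms(pdb_text: str) -> List[Atom]:
    """Read ATOM/HETATM records using the fixed PDB column layout."""
    atoms: List[Atom] = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM  ", "HETATM")):
            continue  # skip HEADER, REMARK, TER, END, ...
        atoms.append(Atom(
            name=line[12:16].strip(),
            res_name=line[17:20].strip(),
            chain=line[21],
            res_seq=int(line[22:26]),
            x=float(line[30:38]),
            y=float(line[38:46]),
            z=float(line[46:54]),
        ))
    return atoms


# Illustrative records with invented coordinates, one line per atom.
SAMPLE = """\
ATOM      1  N   MET A   1      38.198  19.582  28.998  1.00 43.96           N
ATOM      2  CA  MET A   1      37.742  20.724  28.177  1.00 43.22           C
ATOM      3  C   MET A   1      36.255  20.477  27.961  1.00 41.60           C
END
"""

for atom in parse_atoms(SAMPLE):
    print(atom.name, atom.res_name, atom.res_seq, (atom.x, atom.y, atom.z))
```

Slicing by column rather than splitting on whitespace matters because in the PDB format neighbouring numeric fields can run together when values are large.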
4
The coordinate file can be viewed graphically
RBP (Retinol Binding Protein): a description is given in slides 35-36
5
Predicting 3D Structure
An outstanding, difficult problem.
Comparative modeling (homology): based on sequence homology
Fold recognition (threading): based on structural homology
6
Comparative Modeling Based on Sequence homology
Similar sequences suggest similar structure
7
Sequence and Structure alignments of two Retinol Binding Proteins
8
How do we evaluate structure similarity?
Structure Alignment
9
Structure Alignments: There are many different algorithms for structural alignment. The outputs of a structural alignment are a superposition of the atomic coordinates and the minimal Root Mean Square Distance (RMSD) between the structures.
10
The RMSD of two aligned structures indicates their divergence from one another.
[Figure: atom N (x, y, z) in protein V paired with atom N (x, y, z) in protein W]
Low values of RMSD mean similar structures.
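For N pairs of corresponding atoms v_i (protein V) and w_i (protein W), RMSD = sqrt( (1/N) · Σ_i |v_i − w_i|² ). The minimal Python sketch below evaluates that formula. It assumes the atom correspondence is known and that the superposition step (for example with the Kabsch algorithm) has already been done; the function name `rmsd` is illustrative.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(v: Sequence[Coord], w: Sequence[Coord]) -> float:
    """RMSD between corresponding atoms of two already-superposed structures."""
    if len(v) != len(w) or len(v) == 0:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared_sum = 0.0
    for (vx, vy, vz), (wx, wy, wz) in zip(v, w):
        squared_sum += (vx - wx) ** 2 + (vy - wy) ** 2 + (vz - wz) ** 2
    return math.sqrt(squared_sum / len(v))


# Every atom of W sits 1 Å above its partner in V along z.
protein_v = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
protein_w = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]
print(rmsd(protein_v, protein_w))  # 1.0
```

Here W is simply V translated by 1 Å, so after an optimal superposition its RMSD would be 0. This is why the superposition has to come before the RMSD is reported.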
11
Based on Sequence homology
Comparative Modeling: Similar sequence suggests similar structure. It builds a protein structure model based on the sequence's alignment to one or more related protein structures in the database.
12
Can we use comparative modeling for any given sequence?
13
Based on Sequence homology
Comparative Modeling
The accuracy of a comparative model is usually related to the sequence identity on which it is based:
>50% sequence identity: high accuracy
30%-50% sequence identity: 90% can be modeled
<30% sequence identity: low accuracy (many errors)
However, other parameters (such as the alignment length) can influence the results.
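A small Python sketch of how the identity thresholds above might be applied to a pairwise alignment. Percent identity has several conventions; this one (an assumption) divides identical positions by the aligned columns that contain no gap ('-'). The tier labels "high", "intermediate" and "low" and the boundary handling are choices made for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical positions / gap-free aligned positions, as a percentage."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


def expected_accuracy(identity: float) -> str:
    """Map percent identity onto the rough tiers listed on this slide."""
    if identity > 50.0:
        return "high"
    if identity >= 30.0:
        return "intermediate"
    return "low"


identity = percent_identity("ACDE-GHIK", "ACDFLGHVK")
print(identity, expected_accuracy(identity))  # 75.0 high
```

As the slide notes, identity alone is not the whole story: a short alignment at 60% identity can be a weaker basis for a model than a long one at 45%.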
14
Based on Sequence homology
Comparative Modeling
Modeling of a sequence based on known structures consists of four major steps:
1. Finding known structure(s) related to the sequence to be modeled (templates), using sequence comparison methods such as PSI-BLAST
2. Aligning the sequence with the templates
3. Building a model
4. Assessing the model
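Steps 2 and 3 can be illustrated with a deliberately simplified Python sketch: a Needleman-Wunsch global alignment (identity scoring with a linear gap penalty, rather than a substitution matrix such as BLOSUM62), followed by copying the template's Cα coordinates onto the query residues aligned to them. The function names, scores, toy sequences and coordinates are assumptions for illustration. Real comparative-modeling programs also build loops, side chains and unaligned regions, and step 4 still has to assess the result.

```python
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[str, str]:
    """Global alignment with identity scoring and a linear gap penalty."""
    def s(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + s(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + s(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def transfer_coordinates(query_aln: str, template_aln: str,
                         template_ca: List[Coord]) -> Dict[int, Coord]:
    """Give each query residue aligned to a template residue that residue's CA position.
    Keys are 0-based query indices; query residues opposite a template gap get nothing."""
    model: Dict[int, Coord] = {}
    qi = ti = 0  # current residue index in query / template
    for q, t in zip(query_aln, template_aln):
        if q != "-" and t != "-":
            model[qi] = template_ca[ti]
        if q != "-":
            qi += 1
        if t != "-":
            ti += 1
    return model


query, template = "ACE", "ACDE"
template_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
q_aln, t_aln = needleman_wunsch(query, template)
print(q_aln)  # AC-E
print(t_aln)  # ACDE
print(transfer_coordinates(q_aln, t_aln, template_ca))
```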
15
What is a good model?
18
Based on Structure homology
Fold Recognition
19
Based on Secondary Structure
Protein folds: the sequential and spatial arrangement of secondary structures. Examples: Globin, TIM
20
Similar folds usually mean similar function
Example: Homeodomain (transcription factors)
21
The same fold can have multiple functions
Rossmann fold: 12 different functions. TIM barrel: 31 different functions.
22
Based on Structure homology
Fold Recognition: Fold recognition attempts to detect similarities between protein 3D structures that have no significant sequence similarity, by searching for folds that are compatible with a particular sequence.
23
Based on Structure homology
Basic steps in Fold Recognition:
Compare the query sequence against a library of all known protein folds (a finite number).
Query sequence: MTYGFRIPLNCERWGHKLSTVILKRP...
Goal: find the fold template that the sequence fits best.
There are different ways to evaluate sequence-structure fit.
24
Based on Secondary Structure homology
There are different ways to evaluate sequence-structure fit.
[Figure: the query sequence MAHFPGFGQSLLFGYPVYVFGD... threaded onto potential folds 1) ... n)]
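One way to score sequence-structure fit can be sketched in a few lines of Python. Each fold in a hypothetical library is reduced to a string of burial states (B = buried, E = exposed), and a query segment earns +1 for each hydrophobic residue on a buried position or polar residue on an exposed one, and -1 otherwise. This is a strong simplification with several assumptions: no gaps, only two environment classes, and a hand-picked hydrophobic set. Real threading methods use gapped alignments and statistically derived potentials. The fold names and patterns are invented.

```python
from typing import Dict, Optional, Tuple

# Crude assumption: these residues count as hydrophobic, all others as polar.
HYDROPHOBIC = set("AVILMFWC")


def fit_score(segment: str, environments: str) -> int:
    """+1 when a residue suits its structural environment, -1 otherwise.
    'B' marks a buried position (favours hydrophobic residues),
    'E' an exposed position (favours polar residues)."""
    score = 0
    for residue, env in zip(segment, environments):
        is_hydrophobic = residue in HYDROPHOBIC
        score += 1 if is_hydrophobic == (env == "B") else -1
    return score


def best_fold(query: str, library: Dict[str, str]) -> Optional[Tuple[str, int, int]]:
    """Ungapped threading: slide every fold's environment string along the query
    and return (fold name, offset, score) of the best fit, or None if nothing fits."""
    best: Optional[Tuple[str, int, int]] = None
    for fold, envs in library.items():
        for offset in range(len(query) - len(envs) + 1):
            s = fit_score(query[offset:offset + len(envs)], envs)
            if best is None or s > best[2]:
                best = (fold, offset, s)
    return best


# Hypothetical fold library: one burial pattern per fold.
library = {"fold_1": "BBEE", "fold_2": "EEEE"}
print(best_fold("LIVKDE", library))  # ('fold_1', 1, 4)
```

The search runs in the direction the later slide describes: instead of folding the sequence, each candidate fold is asked how well the sequence fits it.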
25
Based on Structure homology
Fold Recognition: Fold recognition attempts to detect similarities between protein 3D structures that have no significant sequence similarity, by searching for folds that are compatible with a particular sequence. It “turns the protein folding problem on its head”: rather than predicting how a sequence will fold, it predicts how well a fold will fit a sequence.
26
Ab Initio Modeling: compute the molecular structure from the laws of physics and chemistry alone.
Theoretically the ideal solution, practically nearly impossible. Why?
Exceptionally complex calculations
Incomplete understanding of biophysics
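The core idea of ab initio modeling, searching for the lowest-energy conformation of a physical energy function, can be shown on a deliberately tiny system. Two atoms interact through a Lennard-Jones potential E(r) = 4ε[(σ/r)^12 − (σ/r)^6], and plain gradient descent on their separation r finds the minimum. The reduced units (ε = σ = 1), start point and step size are assumptions for this toy. A real protein has thousands of atoms, many more energy terms and a rugged landscape full of local minima, which is exactly why the slide calls the problem practically nearly impossible.

```python
def lj_energy(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Lennard-Jones pair energy at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """dE/dr of the Lennard-Jones pair energy."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize_separation(r0: float = 1.5, step: float = 0.01,
                        iterations: int = 2000) -> float:
    """Gradient descent: repeatedly move r downhill on the energy surface."""
    r = r0
    for _ in range(iterations):
        r -= step * lj_gradient(r)
    return r


r_min = minimize_separation()
print(f"r = {r_min:.4f}, E = {lj_energy(r_min):.4f}")  # r = 1.1225, E = -1.0000
```

The result matches the analytic minimum at r = 2^(1/6)σ with energy −ε, something that can only be checked this easily because the system is so small.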
27
CASP - Critical Assessment of Structure Prediction
How do we know what a good prediction is? CASP is a competition among different groups to predict the 3D structures of proteins that are about to be solved experimentally.
Current state:
• Ab initio: the worst, but greatly improved in recent years.
• Comparative modeling: performs very well when homologous sequences with known structures exist.
• Fold recognition: performs well.
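One score used in CASP to compare a predicted model with the experimental structure is GDT_TS. It averages, over distance cutoffs of 1, 2, 4 and 8 Å, the percentage of residues whose Cα atom lies within the cutoff of its experimental position. The Python sketch below is a simplification. It assumes the model is already superposed on the experimental structure and that per-residue Cα distances are given. The official calculation also searches over many superpositions to maximize each fraction.

```python
from typing import Sequence


def gdt_ts(distances: Sequence[float]) -> float:
    """GDT_TS (0-100) from per-residue CA-CA distances in Å after superposition."""
    if not distances:
        raise ValueError("need at least one residue")
    cutoffs = (1.0, 2.0, 4.0, 8.0)
    fractions = [sum(1 for d in distances if d <= c) / len(distances) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


print(gdt_ts([0.5, 1.5, 3.0, 10.0]))  # 56.25
```

Unlike a single RMSD, this score is not dominated by a few badly placed residues, such as the 10 Å outlier above.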
28
What can you do? Foldit: Solve Puzzles for Science
A computer game to fold proteins
29
Predicting function from structure
What’s next: predicting function from structure
30
Protein structures give us insight into
protein function and mechanism of action, protein complexes, biological processes, fold, evolutionary relationships, shape and electrostatics, active sites, protein-ligand complexes, functional sites, and the location of mutants and SNPs
31
Classical approach for function prediction
New structure → similar function
32
Given a protein structure can we predict the function of a protein when we do not have a known homolog in the database ?
33
A different approach for predicting function from structure, which does not rely on homology:
• Characterize the known protein structures belonging to a specific family
• Find general structural features that are unique to the family
• Use these features to predict new members of the family
34
Predicting new DNA-binding proteins
Example: predicting new DNA-binding proteins. Many DNA-binding proteins, such as p53, are involved in cancer.
35
Many different folds but all can bind DNA
Helix-turn-helix, zinc finger, leucine zipper, β-ribbon
36
While DNA-binding proteins have diverse folds
they all share a common property: positively charged surfaces that complement the negative charge of the DNA.
[Figure: electrostatic surfaces, positive (blue) and negative (red). What proteins are these?]
37
DNA-binding proteins are characterized by positively charged surfaces
[Figure: electrostatic surfaces, positive (blue) and negative (red). What proteins are these?]
But many proteins that don’t bind nucleic acids also have positively charged surfaces.
38
Strategy for predicting new DNA-binding proteins
1. Build a database of DNA-binding and non-DNA-binding proteins.
2. Extract the positive electrostatic patch from every protein in the data set.
3. Find features that could be used to discriminate the DNA-binding proteins from other proteins.
4. Use the features as a vector to train a machine learning algorithm to identify novel DNA-binding proteins.
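Steps 3 and 4 turn each protein into a feature vector with a label. The Python sketch below assumes each protein's positive patch has already been extracted (step 2) and is given as a string of one-letter residue codes. The three features (patch size, net charge counting K/R as +1 and D/E as −1, and fraction of positive residues) are hypothetical examples, not the features used in the original study, which would come from 3D electrostatic calculations.

```python
from typing import List, Sequence, Tuple

POSITIVE = set("KR")  # lysine, arginine
NEGATIVE = set("DE")  # aspartate, glutamate (histidine ignored for simplicity)


def patch_features(patch_residues: str) -> List[float]:
    """Hypothetical feature vector for one positive surface patch:
    [patch size, net charge, fraction of positive residues]."""
    size = len(patch_residues)
    n_pos = sum(1 for r in patch_residues if r in POSITIVE)
    n_neg = sum(1 for r in patch_residues if r in NEGATIVE)
    return [float(size), float(n_pos - n_neg), n_pos / size if size else 0.0]


def build_training_set(proteins: Sequence[Tuple[str, bool]]) -> Tuple[List[List[float]], List[int]]:
    """Turn (patch residues, binds DNA?) pairs into feature vectors and +1/-1 labels."""
    X = [patch_features(patch) for patch, _ in proteins]
    y = [1 if binds_dna else -1 for _, binds_dna in proteins]
    return X, y


X, y = build_training_set([("KKRDAL", True), ("EDKAGS", False)])
print(X)  # [[6.0, 2.0, 0.5], [6.0, -1.0, 0.16666666666666666]]
print(y)  # [1, -1]
```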
39
Machine learning algorithm for predicting protein function from structural features
An SVM (Support Vector Machine) is trained on a set of known proteins that share a common function, such as DNA binding (red dots), and in addition on a separate set of proteins that are known not to bind DNA (blue dots).
40
Using this training set of DNA-binding and non-DNA-binding proteins, an SVM learns to differentiate between members and non-members of the family. Having learned the features of the class (DNA-binding proteins), the SVM can recognize a new protein as a member or a non-member of the class based on the combination of its structural features.
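A minimal linear SVM can be written in pure Python with the Pegasos method (stochastic sub-gradient descent on the regularized hinge loss). This is one standard way to train a linear SVM, shown here in place of a library; in practice one would more likely use an existing implementation such as scikit-learn's `SVC`. The two-feature training points are invented toy values standing in for structural features, labelled +1 for DNA-binding and -1 for non-DNA-binding.

```python
import random
from typing import List, Sequence


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 200,
                     seed: int = 0) -> List[float]:
    """Pegasos-style stochastic sub-gradient descent on the regularized hinge loss.
    Labels are +1 (DNA-binding) / -1 (non-DNA-binding). A constant 1.0 feature
    is appended to every vector so the last weight acts as the bias."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink (regularization)
            if margin < 1.0:  # hinge loss is active for this example
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """+1 = predicted DNA-binding, -1 = predicted non-DNA-binding."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Toy training set: "red dots" (+1) and "blue dots" (-1).
X = [[3.0, 2.5], [2.5, 3.0], [3.5, 3.0], [-1.0, -0.5], [-0.5, -1.5], [0.0, -1.0]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print(predict(w, [4.0, 4.0]), predict(w, [-2.0, -2.0]))  # 1 -1
```

A linear kernel keeps the sketch short. Kernel SVMs can also learn curved boundaries between the two classes.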
41
Testing the algorithm for predicting DNA-binding proteins
[Figure: percentage of correct and incorrect predictions for DNA-binding and non-DNA-binding proteins]
True Positive = 44, True Negative = 236, False Positive = 10, False Negative = 14
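The four counts on this slide are enough to compute the usual performance measures. The Python sketch below uses the standard definitions: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), precision = TP/(TP+FP), accuracy = (TP+TN)/total. The function name is illustrative.

```python
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    return {
        "sensitivity": tp / (tp + fn),  # fraction of DNA-binding proteins found
        "specificity": tn / (tn + fp),  # fraction of non-binders correctly rejected
        "precision": tp / (tp + fp),    # fraction of positive predictions that are right
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


metrics = classification_metrics(tp=44, tn=236, fp=10, fn=14)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
# sensitivity: 0.759
# specificity: 0.959
# precision: 0.815
# accuracy: 0.921
```

Because the test set contains far more non-binders (246) than binders (58), overall accuracy looks better than sensitivity, which is why both are worth reporting.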
42
PyMOL example: launch PyMOL and open the file “1aqb” (a PDB coordinate file), then try:
• Display sequence
• Hide everything
• Show main chain / hide main chain
• Show cartoon
• Color by ss (secondary structure)
• Color red
• Color green, resi 1:40
• Help