Published by Buddy Newman. Modified over 9 years ago.
1
Protein Structure Prediction
2
Protein Structure
• Amino-acid chains can fold to form 3-dimensional structures
• Proteins are sequences that have a (more or less) stable 3-dimensional configuration
3
Why Is Structure Important?
The structure a protein takes is crucial for its function
• Forms “pockets” that can recognize an enzyme substrate
• Positions the side chains of specific groups together to form regions with desired chemical/electrical properties
• Creates firm structures such as collagen, keratins, fibroins
4
Determining Structure
• X-ray and NMR methods make it possible to determine the structure of proteins and protein complexes
• These methods are expensive and difficult
  - Processing a single protein can take several months of work
• A centralized database (PDB) contains all solved protein structures
  - XYZ coordinates of atoms within a specified precision
  - ~19,000 solved structures
5
Growth of the Protein Data Bank
6
Structure is Sequence Dependent
• Experiments show that for many proteins, the 3-dimensional structure is a function of the sequence
  - Force the protein to lose its structure by introducing agents that change the environment
  - After the protein is put back in water, the original conformation/activity is restored
• However, for complex proteins, there are cellular processes that “help” in folding
7
Amino Acids
8
What Forces Hold the Structure?
• Structure is supported by several types of chemical bonds/forces
  - Hydrogen bonds
9
What Forces Hold the Structure?
• Charge-charge interactions
  - Positively charged groups prefer to be situated opposite negatively charged groups
10
What Forces Hold the Structure?
• Disulfide bonds
  - S-S bonds between cysteine residues
  - These form during folding
11
What Forces Hold the Structure?
• Hydrophobic effect
12
Levels of structure
13
Secondary Structure
α-helix
β-strands
14
Hydrogen Bonds in α-Helices
15
β-Strands Form Sheets
Parallel
Anti-parallel
These sheets are held together by hydrogen bonds across strands
16
Angular Coordinates
• Secondary structures force specific angles between residues
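The angles in question are usually the backbone dihedral angles: φ is defined by the atoms C(i-1), N(i), CA(i), C(i) and ψ by N(i), CA(i), C(i), N(i+1). The sketch below shows one common way to compute a signed dihedral angle from four XYZ points. The function name `dihedral` and the tuple-based point format are illustrative assumptions, not something taken from the slides.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points (hypothetical helper).

    0 degrees is the cis arrangement, +/-180 degrees is trans.
    """
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    return math.degrees(math.atan2(y, x))

# Four points forming a quarter turn around the central bond.
angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

Applying this to every residue of a solved structure gives the (φ, ψ) pairs that are plotted against each other on a Ramachandran plot.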
17
Ramachandran Plot
• We can relate angles to types of structures
18
Labeling Secondary Structure
• Using both hydrogen bond patterns and angles, we can assign secondary structure labels from the XYZ coordinates of the amino acids
  - These do not lead to an absolute definition of secondary structure
19
Prediction of Secondary Structure
Input:
• Amino-acid sequence
Output:
• Annotation sequence of three classes:
  - alpha
  - beta
  - other (sometimes called coil/turn)
Measure of success:
• Percentage of residues that were correctly labeled
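The measure of success above is often called Q3 when there are three classes. A minimal sketch of the calculation follows, assuming the predicted and observed annotations are given as equal-length strings over the letters H (alpha), E (beta) and C (other). The function name `q3` and the letter codes are illustrative choices.

```python
def q3(predicted, observed):
    """Percentage of residues whose predicted class equals the observed class."""
    if len(predicted) != len(observed):
        raise ValueError("annotations must have the same length")
    if not observed:
        raise ValueError("annotations must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

# One of ten residues is labeled E instead of C.
score = q3("HHHEEECCCC", "HHHEECCCCC")
```

Because the score is per residue, it says nothing about whether whole helices or strands were found, only how many positions carry the right label.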
20
Protein Folds: sequential, spatial and topological arrangement of secondary structures
Example: the globin fold
21
Approaches for structure prediction
Homology modeling
  - (25-30% identity as a predictor)
Fold recognition
  - Remote homology
Ab initio prediction
  - Heavy computations
22
Newly Determined Structures: Fraction of New Folds
23
Fraction of new folds (PDB new entries in 1998) Koppensteiner et al., 2000, JMB 296:1139-1152.
24
A Finite Number of Protein Folds
Aim: recognize the fold that “matches” a given sequence
Approaches:
  - PSI-Blast, Profile HMMs, etc.
  - Threading
25
Threading: Essential components
• Structural template
• Neighbor definition
• Energy function

Energy function E_ab (pairwise residue energies):
      A    C    D    E  ...
A    -3   -1    0    0  ..
C    -1   -4    1    2  ..
D     0    1    5    6  ..
E     0    2    6    7  ..
...

Example: sequence ACCECADAAC threaded onto template positions 1-10
Position:  1 2 3 4 5 6 7 8 9 10
Residue:   A C C E C A D A A C
Energy: -3 -1 -4 -4 -1 -4 -3 -3 = -23
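The three components can be put together in a few lines. The sketch below uses the E_ab values from the slide, but the neighbor list `CONTACTS` is hypothetical: the slide does not say which template positions are neighbors, so the pairs here were chosen only so that the example sequence reproduces the eight terms and the total of -23.

```python
# Pairwise energies E_ab from the slide (symmetric, stored once per pair).
E = {
    ("A", "A"): -3, ("A", "C"): -1, ("A", "D"): 0, ("A", "E"): 0,
    ("C", "C"): -4, ("C", "D"): 1, ("C", "E"): 2,
    ("D", "D"): 5, ("D", "E"): 6,
    ("E", "E"): 7,
}

# Hypothetical neighbor definition: pairs of 1-based template positions
# that are assumed to be in contact. Not taken from the slide.
CONTACTS = [(1, 8), (1, 2), (2, 3), (3, 5), (5, 6), (2, 5), (6, 8), (8, 9)]

def pair_energy(a, b):
    """Look up E_ab regardless of the order of the two residues."""
    return E[(a, b)] if (a, b) in E else E[(b, a)]

def threading_energy(sequence, contacts):
    """Sum E_ab over all neighboring template positions."""
    return sum(pair_energy(sequence[i - 1], sequence[j - 1]) for i, j in contacts)

total = threading_energy("ACCECADAAC", CONTACTS)
```

In a real threading method the alignment of the sequence to the template is also unknown, so this sum is evaluated for many candidate alignments and the lowest-energy one is kept.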
26
Fold recognition (threading)
Find the best fold for a protein sequence:
Query sequence: MAHFPGFGQSLLFGYPVYVFGD...
Potential fold    Score
1) ...            -10
...
56) ...           -123
...
n) ...            20.5
27
GenTHREADER (Jones, 1999, JMB 287:797-815)
For each template, provide a multiple sequence alignment (MSA)
  - Align the query sequence with the MSA
  - Assess the alignment by sequence alignment score
  - Assess the alignment by pairwise potentials
  - Assess the alignment by solvation function
  - Record the lengths of the alignment, the query and the template
28
Essentials of GenTHREADER
29
Ab-initio Structure Recognition
Goal:
  - Predict structure from “first principles”
Benefits:
  - Works for novel folds
  - Shows that we understand the process
30
Approaches to Ab-initio Prediction
Molecular Dynamics
• Simulates the forces that govern the protein within water
• Since proteins fold naturally, this would lead to the solved structure
Problems:
• Thousands of atoms
• Huge number of time steps to reach the folded protein
Intractable problem
31
Approaches to Ab-initio Prediction
Minimal Energy
• Assumption: the folded form is the minimal energy conformation of the protein
Decomposition:
• Define an energy function
• Search for the 3-D conformation that minimizes the energy
32
Energy Function
• Account for the forces that act on the molecule
  - Van der Waals forces
  - Covalent bonds
  - Hydrogen bonds
  - Charges
  - Hydrophobic effects
Issues:
• Estimating parameters
• How do we compute it: O((# atoms)^2)
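The quadratic cost comes from summing a term over every pair of atoms. The sketch below illustrates this with a single Lennard-Jones 12-6 term, a standard simple model for van der Waals interactions. It is a toy under stated assumptions: one `epsilon` and one `sigma` for all atom types, no cutoff, and none of the other forces listed above.

```python
import math
from itertools import combinations

def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones 12-6 energy for two atoms at distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def total_pair_energy(coords, epsilon=1.0, sigma=1.0):
    """Sum the pair term over all n*(n-1)/2 atom pairs: O(n^2) work."""
    total = 0.0
    for a, b in combinations(coords, 2):
        total += lennard_jones(math.dist(a, b), epsilon, sigma)
    return total

# Two atoms at the distance where the pair energy is at its minimum (-epsilon).
r_min = 2.0 ** (1.0 / 6.0)
energy = total_pair_energy([(0.0, 0.0, 0.0), (r_min, 0.0, 0.0)])
```

With thousands of atoms this loop visits millions of pairs per evaluation, which is one reason simplified energy functions are attractive.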
33
Simplified Energy Functions
Different levels of granularity
• Residue-residue energy function (bead model)
• Partial model
  - Backbone as a bead
  - Side chain as a rigid body that can move with respect to the backbone
• Many other variants
34
Search Strategy
• High-dimensional search problem
How do we represent partial solutions?
• Position of each atom (too detailed!)
• Position of each residue (too coarse!)
• Intermediate solutions (e.g., backbone and side chain)
35
Search Strategy
Representation tradeoffs
• X,Y,Z coordinates
  - Easy to compute distances between residues
  - Might represent infeasible solutions
• Angles between successive residues
  - Easy to ensure a “legal” protein
  - Harder to compute distances
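The tradeoff can be seen in a toy 2D chain. In the sketch below the conformation is stored as turn angles, so every bond automatically has the right length (a "legal" chain), but a distance between two residues is only available after the coordinates have been rebuilt. The 2D simplification, the function name `chain_from_turns` and the unit bond length are assumptions for illustration, not real backbone geometry.

```python
import math

def chain_from_turns(turn_angles_deg, bond_length=1.0):
    """Rebuild 2D coordinates from turn angles; the first bond lies along +x."""
    coords = [(0.0, 0.0), (bond_length, 0.0)]
    heading = 0.0
    for turn in turn_angles_deg:
        heading += math.radians(turn)
        x, y = coords[-1]
        coords.append((x + bond_length * math.cos(heading),
                       y + bond_length * math.sin(heading)))
    return coords

# Two left turns of 90 degrees fold the chain back toward its start.
coords = chain_from_turns([90.0, 90.0])
end_to_end = math.dist(coords[0], coords[-1])
```

Changing a single angle moves every residue after it, which is why distance-based energy terms are more expensive to update in the angle representation.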
36
Search Strategy
Typical approach:
• Secondary structure prediction
• Attempt different conformations while keeping secondary structure fixed
• Finer moves that relax the secondary structure
Use
• Greedy search
• Simulated annealing
• …
37
Rosetta Method
Idea:
  - “Structural” signatures recur within protein structures
  - Use these as cues during structure search
38
Local structure motifs
I-sites Library = a catalog of local sequence-structure correlations
• Diverging type-2 turn
• Serine hairpin
• Type-I hairpin
• Frayed helix
• Proline helix C-cap
• Alpha-alpha corner
• Glycine helix N-cap
39
Example: Non-polar Alpha-helix
40
Example: Non-polar beta-strand
41
Example: Gly alpha-C-cap Type 1
42
Construction of I-sites library
• Construct profiles (PSI-BLAST like) for each solved structure
• Collect all possible segments of fixed length (len = 3, 9, 15)
• Perform k-means clustering of segments
• Check each cluster for a “coherent” structure (in terms of dihedral angles)
• Prune incoherent structures
• Iteratively refine remaining clusters by removing structurally different segments, redefining cluster membership, etc.
43
All proteins can be constructed from fragments
Recent experiment: For representative proteins, backbones were assembled from a library of 1000 different 5-residue fragments.
44
Rosetta: a folding simulation program
Fragment insertion Monte Carlo
Steps recovered from the diagram: choose a fragment (from the fragment library) → change backbone torsion angles → convert to 3D → evaluate energy function → accept or reject
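A minimal sketch of such a loop is given below. It is not Rosetta's implementation: the fragment library, the toy energy (squared distance of each (φ, ψ) pair from a helix-like target, ignoring angle periodicity) and the Metropolis acceptance rule are all assumptions for illustration, and the "convert to 3D" step is skipped because the toy energy works directly on angles.

```python
import math
import random

# Hypothetical 3-residue fragments given as (phi, psi) pairs in degrees.
HELIX_FRAGMENT = [(-60.0, -45.0)] * 3
STRAND_FRAGMENT = [(-120.0, 130.0)] * 3
FRAGMENTS = [HELIX_FRAGMENT, STRAND_FRAGMENT]

def toy_energy(angles, target=(-60.0, -45.0)):
    """Toy stand-in for an energy function: distance from a helix-like target."""
    return sum(((phi - target[0]) / 180.0) ** 2 + ((psi - target[1]) / 180.0) ** 2
               for phi, psi in angles)

def fragment_insertion_mc(n_residues, fragments, energy,
                          steps=2000, temperature=1.0, seed=0):
    """Fragment insertion Monte Carlo with a Metropolis accept/reject step."""
    rng = random.Random(seed)
    angles = [(180.0, 180.0)] * n_residues  # start from an extended chain
    current = energy(angles)
    best, best_energy = list(angles), current
    for _ in range(steps):
        fragment = rng.choice(fragments)                     # choose a fragment
        start = rng.randrange(n_residues - len(fragment) + 1)
        trial = angles[:start] + list(fragment) + angles[start + len(fragment):]
        trial_energy = energy(trial)                         # evaluate
        accept = (trial_energy <= current or
                  rng.random() < math.exp((current - trial_energy) / temperature))
        if accept:                                           # accept or reject
            angles, current = trial, trial_energy
            if current < best_energy:
                best, best_energy = list(angles), current
    return best, best_energy

best_angles, best_e = fragment_insertion_mc(9, FRAGMENTS, toy_energy)
```

Lowering `temperature` over the course of the run would turn this into simulated annealing, one of the search strategies listed earlier.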
45
Rosetta’s energy function: sequence-dependent features
Residue-residue contact energies are derived from the database
46
Rosetta’s energy function: sequence-independent features
The energy score for a contact between secondary structures is summed using database statistics.
Figure labels: current structure, vector representation, probabilities from the database
47
Rosetta prediction results
• 61% “topologically correct”
• 60% “locally correct”
• 73% secondary structure (Q3) correct
http://www.bioinfo.rpi.edu/~bystrc/hmmstr/server.php
48
Evaluation of partially correct predictions
Tertiary structure (RMSD): %correct is the fraction of the sequence that is in a 30-residue window with RMSD < 6.0Å
Local structure (MDA): mda = maximum deviation in backbone angles over an 8-residue window. %correct is the fraction of the sequence that has mda < 90°.
Figure labels: window sizes L=30, L=20, L=8; thresholds 6.0Å and 90°; horizontal axis: sequence
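The local-structure measure can be sketched as follows. Two points are assumptions, since the slide does not spell them out: each residue is described by one (φ, ψ) pair in degrees, and a residue counts as correct when it lies in at least one 8-residue window with mda below the cutoff (by analogy with the tertiary-structure definition). The function names are illustrative.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def local_percent_correct(predicted, true, window=8, cutoff=90.0):
    """Percentage of residues inside some window whose mda is below the cutoff."""
    if len(predicted) != len(true):
        raise ValueError("angle lists must have the same length")
    n = len(true)
    # Largest phi or psi deviation at each residue.
    deviation = [max(angle_diff(p[0], t[0]), angle_diff(p[1], t[1]))
                 for p, t in zip(predicted, true)]
    correct = [False] * n
    for start in range(n - window + 1):
        if max(deviation[start:start + window]) < cutoff:  # mda of this window
            for i in range(start, start + window):
                correct[i] = True
    return 100.0 * sum(correct) / n

# 16 residues, one of them (index 12) predicted with phi off by 120 degrees.
true_angles = [(-60.0, -45.0)] * 16
pred_angles = list(true_angles)
pred_angles[12] = (60.0, -45.0)
percent = local_percent_correct(pred_angles, true_angles)
```

A single badly predicted residue invalidates every window that contains it, so residues near the error can be marked incorrect even if their own angles are fine.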
49
T0116 262-322 (61 residues)
Figure panels: prediction, true structure
Topologically correct (rmsd=5.9Å) but helix is mispredicted as loop.
50
T0121 126-199 (66 residues)
Figure panels: prediction, true structure
Topologically correct (rmsd=5.9Å) but loop is mispredicted as helix.
51
T0122 57-153 (97 residues)
Figure panels: prediction, true structure
...contains a 53-residue stretch with max deviation = 96°
52
T0112 153-213
Figure panels: prediction, true structure
Low rmsd (5.6Å) and all angles correct (mda = 84°), but topologically wrong!! (this is rare)
53
Using I-sites library for structure prediction
Naïve approach:
• Given a sequence, build a profile
• Score each segment in the profile against I-sites
Iterate:
  - Choose the highest scoring I-site/segment match, assign dihedral angles based on the I-site
  - Remove all overlapping matches that are inconsistent with the assigned angles
Use