1 Protein Structure Prediction

2 Protein Structure
- Amino-acid chains can fold to form 3-dimensional structures
- Proteins are sequences that have a (more or less) stable 3-dimensional configuration

3 Why Is Structure Important?
The structure a protein takes is crucial for its function:
- Forms “pockets” that can recognize an enzyme substrate
- Positions the side chains of specific groups together to form areas with desired chemical/electrical properties
- Creates firm structures such as collagen, keratins, fibroins

4 Determining Structure
- X-ray and NMR methods make it possible to determine the structure of proteins and protein complexes
- These methods are expensive and difficult
  - Processing one protein could take several work-months
- A centralized database (PDB) contains all solved protein structures
  - XYZ coordinates of atoms within a specified precision
  - ~19,000 solved structures

5 Growth of the Protein Data Bank

6 Structure is Sequence Dependent
- Experiments show that for many proteins, the 3-dimensional structure is a function of the sequence
  - Force the protein to lose its structure by introducing agents that change the environment
  - After the protein is put back in water, the original conformation/activity is restored
- However, for complex proteins, there are cellular processes that “help” in folding

7 Amino Acids

8 What Forces Hold the Structure?
- Structure is supported by several types of chemical bonds/forces
  - Hydrogen bonds

9 What Forces Hold the Structure?
- Charge-charge interactions
  - Positively charged groups prefer to be situated against negatively charged groups

10 What Forces Hold the Structure?
- Disulfide bonds
  - S-S bonds between cysteine residues
  - These form during folding

11 What Forces Hold the Structure?
- Hydrophobic effect

12 Levels of structure

13 Secondary Structure
- α-helix
- β-strands

14 Hydrogen Bonds in α-Helices

15 β-Strands form Sheets
- Parallel
- Anti-parallel
These sheets are held together by hydrogen bonds across strands

16 Angular Coordinates
- Secondary structures force specific angles between residues

17 Ramachandran Plot
- We can relate angles to types of structures

18 Labeling Secondary Structure
- Using both hydrogen-bond patterns and angles, we can assign secondary structure labels from the XYZ coordinates of the amino acids
  - These do not lead to an absolute definition of secondary structure

19 Prediction of Secondary Structure
Input:
- Amino-acid sequence
Output:
- Annotation sequence of three classes:
  - alpha
  - beta
  - other (sometimes called coil/turn)
Measure of success:
- Percentage of residues that were correctly labeled
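The measure of success above can be sketched in a few lines of Python. This is a minimal illustration, assuming labels are given as strings over a hypothetical three-letter alphabet (H = alpha, E = beta, C = other); the example label strings are made up for illustration and are not taken from the slides.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted class matches the observed class."""
    if len(predicted) != len(observed):
        raise ValueError("label strings must have the same length")
    if not predicted:
        raise ValueError("label strings must be non-empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)


# Hypothetical labels (assumption): H = alpha, E = beta, C = other.
observed = "CHHHHHCCEEEEC"
predicted = "CHHHHCCCEEECC"
score = q3_accuracy(predicted, observed)  # 11 of 13 residues agree
```

Multiplying the returned fraction by 100 gives the percentage of correctly labeled residues.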

20 Protein Folds: the sequential, spatial and topological arrangement of secondary structures
Figure: the globin fold

21 Approaches for structure prediction
- Homology modeling
  - (25-30% identity as a predictor)
- Fold recognition
  - Remote homology
- Ab initio prediction
  - Heavy computations

22 Newly Determined Structures: Fraction of New Folds

23 Fraction of new folds (PDB new entries in 1998) Koppensteiner et al., 2000, JMB 296:1139-1152.

24 A Finite Number of Protein Folds
Aim: recognize the fold that “matches” a given sequence
Approaches:
- PSI-BLAST, profile HMMs, etc.
- Threading

25 Threading: Essential components
- Structural template
- Neighbor definition
- Energy function

Pairwise energies E_ab (excerpt):

        A    C    D    E   ...
   A   -3   -1    0    0   ...
   C   -1   -4    1    2   ...
   D    0    1    5    6   ...
   E    0    2    6    7   ...
   ...

Example: sequence ACCECADAAC placed on template positions 1-10:
E = -3 - 1 - 4 - 4 - 1 - 4 - 3 - 3 = -23
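The three components of this slide (template, neighbor definition, energy function) can be sketched as a short Python function. The E_ab values and the sequence ACCECADAAC come from the slide. The list of neighboring template positions is an assumption: the slide's figure did not survive extraction, so the pairs below are hypothetical ones chosen to reproduce the slide's terms and its total of -23.

```python
# Pairwise energies E_ab from the slide (symmetric; only A, C, D, E are given).
E = {
    ("A", "A"): -3, ("A", "C"): -1, ("A", "D"): 0, ("A", "E"): 0,
    ("C", "C"): -4, ("C", "D"): 1, ("C", "E"): 2,
    ("D", "D"): 5, ("D", "E"): 6,
    ("E", "E"): 7,
}


def pair_energy(a: str, b: str) -> int:
    """Look up E_ab regardless of the order of the two residues."""
    return E[(a, b)] if (a, b) in E else E[(b, a)]


def threading_energy(sequence: str, contacts) -> int:
    """Sum E_ab over neighboring template positions (1-based indices)."""
    return sum(pair_energy(sequence[i - 1], sequence[j - 1]) for i, j in contacts)


sequence = "ACCECADAAC"
# ASSUMPTION: hypothetical neighbor pairs; the real template contacts were in a lost figure.
contacts = [(1, 6), (1, 2), (2, 3), (3, 5), (5, 6), (5, 10), (6, 8), (8, 9)]
total = threading_energy(sequence, contacts)  # -3-1-4-4-1-4-3-3
```

A different template changes only the `contacts` list, which is why the same sequence can be scored against many folds with one energy table.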

26 Fold recognition (threading)
Find the best fold for a protein sequence:
Sequence: MAHFPGFGQSLLFGYPVYVFGD...

   Potential fold    Score
   1)                -10
   ...
   56)               -123
   ...
   n)                20.5

27 GenTHREADER (Jones, 1999, JMB 287:797-815)
For each template, provide an MSA:
- Align the query sequence with the MSA
- Assess the alignment by sequence alignment score
- Assess the alignment by pairwise potentials
- Assess the alignment by solvation function
- Record lengths of: alignment, query, template

28 Essentials of GenTHREADER

29 Ab-initio Structure Recognition
Goal:
- Predict structure from “first principles”
Benefits:
- Works for novel folds
- Shows that we understand the process

30 Approaches to Ab-initio Prediction
Molecular Dynamics
- Simulates the forces that govern the protein within water
- Since proteins fold naturally, this would lead to the solved structure
Problems:
- Thousands of atoms
- Huge number of time steps to reach the folded protein
=> Intractable problem

31 Approaches to Ab-initio Prediction
Minimal Energy
- Assumption: the folded form is the minimal-energy conformation of the protein
Decomposition:
- Define an energy function
- Search for the 3-D conformation that minimizes the energy

32 Energy Function
- Account for the forces that act on the molecule
  - Van der Waals forces
  - Covalent bonds
  - Hydrogen bonds
  - Charges
  - Hydrophobic effects
Issues:
- Estimating parameters
- Cost of computing it: O((# atoms)^2)
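The quadratic cost comes from visiting every pair of atoms. The sketch below illustrates this with a single term, assuming a Lennard-Jones potential as a stand-in for the van der Waals contribution. The slide does not name a functional form, and the parameters `epsilon` and `sigma` in reduced units are assumptions for illustration only.

```python
import itertools
import math


def lennard_jones(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Lennard-Jones pair energy (assumed stand-in for the van der Waals term)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def pairwise_energy(coords) -> float:
    """Sum the pair energy over all n*(n-1)/2 atom pairs: O(n^2) work."""
    total = 0.0
    for p, q in itertools.combinations(coords, 2):
        total += lennard_jones(math.dist(p, q))
    return total


def pair_count(n_atoms: int) -> int:
    """Number of distinct atom pairs that the double loop visits."""
    return n_atoms * (n_atoms - 1) // 2


# Two atoms at the Lennard-Jones minimum distance r = 2^(1/6) * sigma.
r_min = 2.0 ** (1.0 / 6.0)
two_atoms = [(0.0, 0.0, 0.0), (r_min, 0.0, 0.0)]
energy = pairwise_energy(two_atoms)  # equals -epsilon at the minimum
```

For a protein with a few thousand atoms, `pair_count` already reaches millions of pair evaluations per energy call, which is one motivation for the simplified energy functions on the next slide.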

33 Simplified Energy Functions
Different levels of granularity
- Residue-residue energy function (bead model)
- Partial model
  - Backbone as a bead
  - Side chain as a rigid body that can move with respect to the backbone
- Many other variants

34 Search Strategy
- High-dimensional search problem
How do we represent partial solutions?
- Position of each atom (too detailed!)
- Position of each residue (too coarse!)
- Intermediate solutions (e.g., backbone and side chain)

35 Search Strategy
Representation tradeoffs
- X,Y,Z coordinates
  - Easy to compute distances between residues
  - Might represent infeasible solutions
- Angles between successive residues
  - Easy to ensure a “legal” protein
  - Harder to compute distances
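The tradeoff can be illustrated with a deliberately simplified sketch: a 2D chain with a fixed bond length, described only by turning angles. The 2D setting and the function name `chain_from_angles` are assumptions for illustration. Real backbones are 3D and use dihedral angles, but the point carries over: bond lengths are correct by construction, while any distance requires converting angles to coordinates first.

```python
import math


def chain_from_angles(turn_angles, bond_length: float = 3.8):
    """Build 2D positions from turning angles (radians) between successive bonds.

    Every bond has the same length by construction, so the chain is always
    "legal" in that sense, whatever angles are supplied.
    """
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    # The first bond is laid along the x axis.
    x += bond_length
    points.append((x, y))
    for angle in turn_angles:
        heading += angle
        x += bond_length * math.cos(heading)
        y += bond_length * math.sin(heading)
        points.append((x, y))
    return points


# Two right-angle turns with unit bonds trace three sides of a square.
points = chain_from_angles([math.pi / 2, math.pi / 2], bond_length=1.0)
end_to_end = math.dist(points[0], points[-1])
```

With coordinates stored directly, `end_to_end` would be a single distance call, but nothing would stop a move from stretching a bond to an impossible length.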

36 Search Strategy
Typical approach:
- Secondary structure prediction
- Attempts at different conformations, keeping secondary structure fixed
- Finer moves, relaxing secondary structure
Use
- Greedy search
- Simulated annealing
- …

37 Rosetta Method
Idea:
- “Structural” signatures recur within protein structures
- Use these as cues during structure search

38 Local structure motifs
I-sites Library = a catalog of local sequence-structure correlations
- Diverging type-2 turn
- Serine hairpin
- Type-I hairpin
- Frayed helix
- Proline helix C-cap
- Alpha-alpha corner
- Glycine helix N-cap

39 Example: Non-polar Alpha-helix

40 Example: Non-polar beta-strand

41 Example: Gly alpha-C-cap Type 1

42 Construction of I-sites library
- Construct profiles (PSI-BLAST like) for each solved structure
- Collect all possible segments of fixed length (len = 3, 9, 15)
- Perform k-means clustering of segments
- Check each cluster for a “coherent” structure (in terms of dihedral angles)
- Prune incoherent structures
- Iteratively refine remaining clusters by removing structurally different segments, redefining cluster membership, etc.

43 All proteins can be constructed from fragments
Recent experiment: for representative proteins, backbones were assembled from a library of 1000 different 5-residue fragments.

44 Fragment insertion Monte Carlo
Rosetta: a folding simulation program
Cycle over the backbone torsion angles:
- Choose a fragment from the fragment library
- Change backbone angles
- Convert to 3D
- Evaluate the energy function
- Accept or reject
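The cycle on this slide can be sketched as a toy Monte Carlo loop in Python. This is not Rosetta's implementation. It assumes one torsion angle per residue, a hypothetical `toy_energy` that scores angles directly (so the "convert to 3D" step is skipped), and a Metropolis acceptance rule, which is a common choice the slide does not spell out. The fragment library and target angles are made up for illustration.

```python
import math
import random


def toy_energy(angles, target):
    """Hypothetical energy: squared deviation from a made-up target conformation."""
    return sum((a - t) ** 2 for a, t in zip(angles, target))


def fragment_insertion_mc(angles, fragments, energy, steps=2000,
                          temperature=1.0, seed=0):
    """Repeatedly insert a random fragment, then accept or reject (Metropolis)."""
    rng = random.Random(seed)
    current = list(angles)
    current_e = energy(current)
    best, best_e = list(current), current_e
    for _ in range(steps):
        # Choose a fragment and an insertion point where it fits.
        fragment = rng.choice(fragments)
        start = rng.randrange(len(current) - len(fragment) + 1)
        # Change backbone angles by overwriting a window with the fragment.
        trial = current[:start] + list(fragment) + current[start + len(fragment):]
        trial_e = energy(trial)
        delta = trial_e - current_e
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_e = trial, trial_e
            if current_e < best_e:
                best, best_e = list(current), current_e
    return best, best_e


# Hypothetical setup: 12 residues, helix-like then strand-like target angles.
target = [-60.0] * 6 + [-120.0] * 6
fragments = [[-60.0, -60.0, -60.0], [-120.0, -120.0, -120.0]]
start_angles = [180.0] * 12
start_e = toy_energy(start_angles, target)
best, best_e = fragment_insertion_mc(
    start_angles, fragments, lambda a: toy_energy(a, target)
)
```

Because every move copies angles from a library of observed fragments, the search only visits locally plausible conformations, which is the design idea behind fragment insertion.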

45 Rosetta’s energy function: sequence-dependent features
Residue-residue contact energies are derived from the database

46 Rosetta’s energy function: sequence-independent features
The energy score for a contact between secondary structures is summed using database statistics.
Figure labels: current structure, vector representation, probabilities from the database

47 Rosetta prediction results
- 61% “topologically correct”
- 60% “locally correct”
- 73% secondary structure (Q3) correct
http://www.bioinfo.rpi.edu/~bystrc/hmmstr/server.php

48 Evaluation of partially correct predictions
Tertiary structure (figure: RMSD along the sequence for window sizes L=30, L=20, L=8; cutoff 6.0Å)
- %correct is the fraction of the sequence that is in a 30-residue window with RMSD < 6.0Å
Local structure (figure: MDA along the sequence, L = window size; cutoff 90°)
- mda = maximum deviation in backbone angles over an 8-residue window
- %correct is the fraction of the sequence that has mda < 90°
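The local-structure measure can be sketched as follows. The slide does not say exactly how windows map to residues, so this sketch assumes, by analogy with the tertiary definition, that a residue counts as correct when it lies in at least one 8-residue window whose mda is below 90°. The (phi, psi) input format and the function names are also assumptions.

```python
def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def local_percent_correct(pred, true, window: int = 8, cutoff: float = 90.0) -> float:
    """Fraction of residues covered by a window with mda below the cutoff.

    pred and true are equal-length lists of (phi, psi) tuples in degrees.
    """
    n = len(pred)
    # Per-residue deviation: the worse of the phi and psi errors.
    dev = [max(angle_diff(p[0], t[0]), angle_diff(p[1], t[1]))
           for p, t in zip(pred, true)]
    covered = [False] * n
    for start in range(n - window + 1):
        # mda of this window = maximum deviation inside it.
        if max(dev[start:start + window]) < cutoff:
            for i in range(start, start + window):
                covered[i] = True
    return sum(covered) / n


# Hypothetical 16-residue example with one badly wrong psi angle at index 8.
true_angles = [(-60.0, -45.0)] * 16
pred_angles = list(true_angles)
pred_angles[8] = (-60.0, 75.0)  # psi is off by 120 degrees
fraction = local_percent_correct(pred_angles, true_angles)
```

In this example only the window covering residues 0 to 7 avoids the bad residue, so a single large angle error removes half of the sequence from the "locally correct" count.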

49 T0116 262-322 (61 residues)
Figure: prediction | true structure
Topologically correct (rmsd=5.9Å) but helix is mispredicted as loop.

50 T0121 126-199 (66 residues)
Figure: prediction | true structure
Topologically correct (rmsd=5.9Å) but loop is mispredicted as helix.

51 T0122 57-153 (97 residues)
...contains a 53 residue stretch with max deviation = 96°
Figure: prediction | true structure

52 T0112 153-213
Low rmsd (5.6Å) and all angles correct (mda = 84°), but topologically wrong!! (this is rare)
Figure: prediction | true structure

53 Using I-sites library for structure prediction
Naïve approach:
- Given a sequence, build a profile
- Score each segment in the profile against I-sites
Iterate:
- Choose the highest-scoring I-site/segment match, assign dihedral angles based on the I-site
- Remove all overlapping matches that are inconsistent with the assigned angles
Use
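The iterate step of the naïve approach can be sketched as a greedy loop. The match format (score, start position, one angle per residue), the 30° consistency tolerance, and the function name `greedy_assign` are all assumptions for illustration. Scoring segments against I-sites is taken as already done, and angle wraparound is ignored for brevity.

```python
def greedy_assign(matches, length: int, tolerance: float = 30.0):
    """Assign angles from the best-scoring matches, skipping inconsistent overlaps.

    matches: list of (score, start, angles) with 0-based start positions.
    Returns a list with one angle per residue, or None where nothing was assigned.
    """
    assigned = [None] * length
    # Visiting matches by descending score is equivalent to repeatedly picking
    # the highest-scoring remaining match, because assigned angles never change.
    for score, start, angles in sorted(matches, key=lambda m: -m[0]):
        positions = range(start, start + len(angles))
        consistent = all(
            assigned[i] is None or abs(assigned[i] - a) <= tolerance
            for i, a in zip(positions, angles)
        )
        if not consistent:
            continue  # overlapping match that disagrees with assigned angles
        for i, a in zip(positions, angles):
            if assigned[i] is None:
                assigned[i] = a
    return assigned


# Hypothetical matches for a 6-residue sequence.
matches = [
    (0.9, 0, [-60.0, -60.0, -60.0]),     # best match, helix-like angles
    (0.8, 2, [-120.0, -120.0, -120.0]),  # conflicts with residue 2 above
    (0.5, 3, [-120.0, -120.0]),          # no conflict, fills residues 3 and 4
]
result = greedy_assign(matches, length=6)
```

Residue 5 stays unassigned in this example, which shows that the naïve approach can leave gaps wherever no consistent match exists.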

