1
Protein Structure Modeling (1)
2
Protein Folding Problem A protein folds into a unique 3D structure under physiological conditions Lysozyme sequence: KVFGRCELAA AMKRHGLDNY RGYSLGNWVC AAKFESNFNT QATNRNTDGS TDYGILQINS RWWCNDGRTP GSRNLCNIPC SALLSSDITA SVNCAKKIVS DGNGMNAWVA WRNRCKGTDV QAWIRGCRL Anfinsen, 1960: denatured proteins can refold to active enzymes
3
Relevance of Protein Structure in the Post-Genome Era sequence structure function medicine
4
Structure-Function Relationship A certain level of function can be found without structure, but a structure is key to understanding the detailed mechanism. A predicted structure is a powerful tool for function inference. Trp repressor as a function switch
5
Structure–Function Relationship Do you know any folding diseases?
6
Structure-Based Drug Design HIV protease inhibitor Structure-based rational drug design is still a major method for drug discovery.
7
Protein Structure Prediction Structure: traditional experimental methods (X-ray or NMR) solve only a few structures per day worldwide and cannot keep pace with new protein sequences. Strong demand for structure prediction: more than 30,000 human genes; 10,000 genomes will be sequenced in the next 10 years. Still an unsolved problem after two decades of effort.
8
Protein folding To get from sequence to structure: What principles could one apply?
9
Methods Ab initio modeling; Chou-Fasman / GOR; PHD: neural network; homology modeling; threading
10
Prediction http://www.bmm.icnet.uk/people/rob/CCP11BBS/
11
Ab initio Structure Prediction An energy function to describe the protein: bond energy, bond angle energy, dihedral angle energy, van der Waals energy, electrostatic energy m. m. Minimize the function and obtain the structure. Not practical yet. Algorithm: simulated annealing. Computationally very expensive (IBM: Blue Gene). Accuracy is still poor. Usually used to refine models suggested by other algorithms.
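The idea of minimizing an energy function by simulated annealing can be sketched on a toy problem. This is only an illustration: `toy_energy` is a hypothetical one-variable function with several local minima, not a protein force field, and the cooling schedule, step size, and seed are assumptions chosen for the example.

```python
import math
import random


def toy_energy(phi: float) -> float:
    """Hypothetical one-angle energy with several local minima (not a real force field)."""
    return math.cos(3.0 * phi) + 0.5 * (phi - 1.0) ** 2


def simulated_annealing(energy, x0, steps=20000, t_start=2.0, t_end=0.01,
                        step_size=0.3, seed=0):
    """Minimize `energy` starting from x0; returns the best (x, energy) visited."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for i in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (i / (steps - 1))
        candidate = x + rng.uniform(-step_size, step_size)
        e_new = energy(candidate)
        # Metropolis criterion: always accept downhill moves,
        # accept uphill moves with probability exp(-dE / t).
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / t):
            x, e = candidate, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


best_phi, best_energy = simulated_annealing(toy_energy, x0=-3.0)
print(f"best phi = {best_phi:.3f}, energy = {best_energy:.3f}")
```

Accepting some uphill moves at high temperature is what lets the search leave local minima. A real protein has thousands of coupled variables instead of one, which is why the approach is so expensive.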
12
Part of the problem Free energy minimization: Correctly folded proteins have only marginally less free energy than misfolded proteins
13
Some interesting facts about protein modeling Based on primary sequence only; accuracy 64%–75%; higher accuracy for α-helices than for β-strands; accuracy is dependent on protein family; predictions of engineered proteins are less accurate
14
Principal assumptions The entire information for forming secondary structure is contained in the primary sequence; side groups of residues determine structure; examining windows of 13–17 residues is sufficient to predict structure. Basis for window size selection: α-helices are 5–40 residues long, β-strands are 5–10 residues long
15
Simplifications Identification of secondary structures focused on α-helices, β-strands and turns; others (loops, coils, other helices) are collectively called “coils”
16
A surprising result! Can secondary structure prediction algorithms predict structures of engineered proteins? Test case: the “chameleon” sequence. Algorithm: PHDsec with alignment (PHD 30) and without alignment (PHD no)
17
The “Chameleon” sequence TEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTEK TEAVDAWTVEKAFKTFANDNGVDGAWTVEKAFKTFTVTEK sequence 1 sequence 2 Replace both coloured sequences with engineered peptide (“chameleon”) Source: Minor and Kim 1996, Nature, 380, 730-734 α-helix β-strand
18
Prediction Methods I. Chou-Fasman / GOR method II. Neural network models
19
I a. Chou-Fasman Method Developed by Chou & Fasman in 1974 & 1978; based on frequencies of residues in α-helices (H), β-sheets (E) and turns. Accuracy ~50–60% Q3
20
Chou-Fasman Pij-values
21
Improved Chou-Fasman How it works: 1. Assign all of the residues the appropriate set of parameters. 2. Identify α-helix and β-sheet regions. Extend the regions in both directions. 3. If structures overlap, compare average values for P(H) and P(E) and assign secondary structure based on best scores. 4. Turns are modeled as tetrapeptides using 2 different probability values.
22
Assign Pij values 1. Assign all of the residues the appropriate set of parameters
23
Scan peptide for helix regions 2. Identify regions where 4 of 6 residues have P(H) > 100 (“alpha-helix nucleus”)
24
Extend α-helix nucleus 3. Extend the helix in both directions until a set of four residues has an average P(H) < 100. Repeat steps 1–3 for the entire peptide
25
Scan peptide for β-sheet regions 4. Identify regions where 3 of 5 residues have P(E) > 100 (“β-sheet nucleus”). 5. Extend the β-sheet until 4 contiguous residues have an average P(E) < 100. 6. If the region average is > 105 and the average P(E) > average P(H), then assign “β-sheet”
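The nucleus scan and the extension rule can be sketched as follows. This is one possible reading of the rules above, not the CHOFAS implementation: the functions take per-residue propensity scores directly, the numbers in `p_h` are hypothetical (they are not the published Pij table), and stopping the extension when fewer than four residues remain is an assumption.

```python
def find_nuclei(scores, window=6, min_hits=4, threshold=100):
    """Return start indices of windows in which at least `min_hits`
    residues have a score above `threshold`."""
    starts = []
    for i in range(len(scores) - window + 1):
        hits = sum(1 for s in scores[i:i + window] if s > threshold)
        if hits >= min_hits:
            starts.append(i)
    return starts


def extend_region(scores, start, end, probe=4, threshold=100):
    """Extend the half-open region [start, end) one residue at a time while
    the next `probe` residues outside it average at least `threshold`."""
    # Grow towards the C-terminus.
    while end + probe <= len(scores) and sum(scores[end:end + probe]) / probe >= threshold:
        end += 1
    # Grow towards the N-terminus by the same rule.
    while start - probe >= 0 and sum(scores[start - probe:start]) / probe >= threshold:
        start -= 1
    return start, end


# Hypothetical per-residue P(H) values, for illustration only.
p_h = [60, 70, 120, 130, 95, 125, 110, 105, 115, 120, 60, 55, 70, 65]

helix_nuclei = find_nuclei(p_h)                       # 4 of 6 with P(H) > 100
first = helix_nuclei[0]
helix_region = extend_region(p_h, first, first + 6)   # extend the first nucleus
print(helix_nuclei, helix_region)
```

The same two functions cover the β-sheet scan by passing P(E) scores with `window=5` and `min_hits=3`.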
26
Actual Results CHOFAS predicts protein secondary structure version 2.0u61 September 1998 Please cite: Chou and Fasman (1974) Biochem., 13:222-245 Chou-Fasman plot of @, 12 aa; SEQ1 sequence. TSPTAELMRSTG helix HH sheet EEEEEEE turns T Residue totals: H: 2 E: 7 T: 1 percent: H: 16.7 E: 58.3 T: 8.3 SOURCE: http://fasta.bioch.virginia.edu/o_fasta/cgi/garnier.cgi
27
I b. The GOR method Developed by Garnier, Osguthorpe & Robson; builds on Chou-Fasman Pij values; evaluates each residue plus the adjacent 8 N-terminal and 8 carboxyl-terminal residues (sliding window of 17); underpredicts β-strand regions. GOR III method accuracy ~64% Q3 (Chou-Fasman: 50–60%)
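The 17-residue sliding window can be illustrated with a small helper. It shows only the windowing step, not the GOR scoring tables. The function name `windows` and the use of "-" to pad the chain ends are assumptions made for this sketch.

```python
def windows(sequence, flank=8, pad="-"):
    """Yield (central_residue, window) pairs, where each window holds the
    central residue plus `flank` residues on either side (padded at the ends)."""
    padded = pad * flank + sequence + pad * flank
    for i, residue in enumerate(sequence):
        yield residue, padded[i:i + 2 * flank + 1]


# The 12-residue peptide from the CHOFAS example above.
for residue, window in windows("TSPTAELMRSTG"):
    print(residue, window)
```

Each window is 17 characters long with the residue being predicted at its centre, so a GOR-style predictor can score the central residue from its neighbours on both sides.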
28
II. Neural network models - machine learning approach - provide training sets of structures (e.g. α-helices, non-α-helices) - computers are trained to recognize patterns in known secondary structures - provide test set (proteins with known structures) - accuracy ~70–75%
29
Reasons for improved accuracy Align the sequence with other related proteins of the same protein family. Find members that have a known structure. If there are significant matches between structure and sequence, assign secondary structures to the corresponding residues
30
How PHD works Step 1. BLAST search with the input sequence. Step 2. Perform a multiple sequence alignment and calculate amino acid frequencies for each position
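Calculating the per-position amino acid frequencies of Step 2 can be sketched like this. It is a minimal illustration, assuming the alignment is already available as equal-length strings with "-" for gaps; the toy alignment is hypothetical, and ignoring gaps when counting is an assumption, not necessarily what PHD does.

```python
from collections import Counter


def column_frequencies(alignment):
    """Return one {amino_acid: frequency} dict per alignment column.
    Gap characters ('-') are ignored when counting."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("aligned sequences must have equal length")
    profile = []
    for col in range(length):
        residues = [row[col] for row in alignment if row[col] != "-"]
        counts = Counter(residues)
        total = len(residues)
        profile.append({aa: n / total for aa, n in counts.items()} if total else {})
    return profile


# Hypothetical toy alignment: five sequences, three columns.
toy_alignment = ["NAK", "SAK", "SAR", "AA-", "AGK"]
profile = column_frequencies(toy_alignment)
print(profile[0])
```

The first column of the toy alignment was chosen so that it gives N = 0.2, S = 0.4, A = 0.4, the kind of profile values that feed the first-level network.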
31
How PHD works Step 3. First Level: “Sequence to structure net”. Input: alignment profile. Output: units for H, E, L. Similar to GOR method (window size = 13). Calculate “occurrences” of any of the residues to be present in either an α-helix, β-strand, or loop. Figure values: H = 0.05, E = 0.18, L = 0.67; N = 0.2, S = 0.4, A = 0.4
32
How PHD works Step 3. Second Level: “Structure to structure net”. Input: First Level values. Output: units for H, E, L. Window size = 17. Figure values: H = 0.59, E = 0.09, L = 0.31; E = 0.18. Step 4. Decision level
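The slide names the decision level without describing it. A common choice, assumed here, is to assign the state whose output unit is largest. The sketch below applies that rule to the second-level values from the slide, reading the garbled E value as 0.09 (also an assumption); the function name `decide` is hypothetical.

```python
def decide(outputs):
    """Return the state label (H, E or L) with the highest output value."""
    return max(outputs, key=outputs.get)


# Second-level output values for one residue, as read from the slide.
second_level = {"H": 0.59, "E": 0.09, "L": 0.31}
state = decide(second_level)
print(state)
```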
33
II. Neural network models Are able to detect interactions between amino acids within a window of amino acids. Example: the central aa in a window is Leu: if …AsnLeu… => α-helix; if …XxxLeu… => β-strand
34
II. Prediction tools that use NNs MACMATCH (Presnell et al., 1993), for Macintosh; PHD (Rost & Sander, 1993), www.embl-heidelberg.de/predictprotein/predictprotein.html; NNPREDICT (Kneller et al. 1990), http://www.cmpharm.ucsf.edu/nomi/nnpredict.html
35
Prediction Accuracy Guinness Book of Records: PHDsec 79% accuracy for 31% of the data
36
Measuring prediction accuracy Traditional: Qindex and Q3. Correlation coefficient (Matthews 1975)
37
Qindex Qindex (Qhelix, Qstrand, Qcoil, Q3): percentage of residues correctly predicted as α-helix, β-strand, coil, or for all 3 conformations. Drawback: even a random assignment of structure can achieve a high score (Holley & Karplus 1991)
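A minimal sketch of the Q index calculation, assuming observed and predicted structures are given as equal-length strings over H, E and C. The per-state index is taken here as the fraction of residues observed in that state that are predicted correctly, which is one common definition; the example strings are hypothetical.

```python
def q_index(observed, predicted, state=None):
    """Percentage of correctly predicted residues.
    With state=None this is Q3 (all three conformations); with state set to
    'H', 'E' or 'C' only residues observed in that state are counted."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have equal length")
    pairs = [(o, p) for o, p in zip(observed, predicted)
             if state is None or o == state]
    if not pairs:
        return float("nan")
    correct = sum(1 for o, p in pairs if o == p)
    return 100.0 * correct / len(pairs)


# Hypothetical 12-residue example.
observed = "HHHHEEEECCCC"
predicted = "HHHCEEECCCCH"
q3 = q_index(observed, predicted)
q_helix = q_index(observed, predicted, state="H")
print(q3, q_helix)
```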
38
Correlation coefficient True positive: p. False positive (overpredicted): o. True negative: n. False negative (misses): u. C = 1 (= 100%): the closer C is to 1, the more successful the prediction. Example: calculation for α-helix
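The formula itself did not survive on this slide. The sketch below uses the standard Matthews form with the symbols defined above, C = (p·n − u·o) / sqrt((n + u)(n + o)(p + u)(p + o)), and assumes this is what the slide showed. The example strings, the function names, and returning 0 when the denominator is zero are illustrative choices.

```python
import math


def state_counts(observed, predicted, state="H"):
    """Count p (true positives), n (true negatives), o (overpredicted),
    u (missed) for one secondary structure state."""
    p = n = o = u = 0
    for obs, pred in zip(observed, predicted):
        if pred == state and obs == state:
            p += 1
        elif pred == state:
            o += 1
        elif obs == state:
            u += 1
        else:
            n += 1
    return p, n, o, u


def matthews(p, n, o, u):
    """Matthews correlation coefficient; 1 is a perfect prediction."""
    denom = math.sqrt((n + u) * (n + o) * (p + u) * (p + o))
    if denom == 0:
        return 0.0  # convention assumed here for the undefined case
    return (p * n - u * o) / denom


# Hypothetical example, calculated for the helix state.
observed = "HHHHEEEECCCC"
predicted = "HHHCEEECCCCH"
counts = state_counts(observed, predicted, state="H")
c_helix = matthews(*counts)
print(counts, c_helix)
```

Unlike the Q index, C penalizes both overprediction and misses, so a trivial assignment of one state to every residue scores 0 rather than a deceptively high percentage.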