Published by Orsola Pini. Modified over 5 years ago.
1
Protein Structure Prediction by a Data-level Parallel Algorithm
Proceedings of the 1989 ACM/IEEE Conference on Supercomputing
Speaker: Chuan-Cheng Lin
Advisor: Prof. R. C. T. Lee
CSIE, National Chi Nan University
2
Outline
Concepts
Introduction
Approach
Example
Conclusions
Reference
3
Concepts
Cell, Chromosome, DNA, Gene, Protein synthesis, polypeptides -> amino acids
[Figure labels: trillions, 23 pairs, 3 billion DNA words, 80,000]
4
Concepts Protein Synthesis Transcription Enzyme Messenger RNA
5
Concepts Protein Synthesis Translation
6
Concepts
8
Concepts Amino acid
9
Concepts Peptide bond
10
Concepts
11
Concepts
Protein: Primary structure, Secondary structure, Tertiary structure, Quaternary structure
12
Introduction
What is a protein?
Why do we predict protein structure?
X-ray, NMR
Known protein structures (22-Oct-2002)
How?
13
Introduction
Methods of protein structure prediction: AI, Neural Network, PHI-PSI, Potential Energy, Statistical method, …
14
Introduction
Determining the native folded state of a protein, given only its primary sequence of amino acids, is referred to as the protein folding problem.
15
Introduction
The protein folding problem is, given an amino acid sequence, to find its correctly folded 3D protein structure.
Protein Folding in the Hydrophobic-Hydrophilic (HP) Model is NP-Complete [BL98].
16
Given a test protein sequence, we want to compare every part of it against every part of every protein in the database, and then select the similar parts of the proteins in the database.
17
The Basic Algorithm Step 1:
Specify the initial parameters, such as the initial window size W, the window weight pattern P, and N, the number of best matches to keep.
18
Choosing the window size:
1. Large or small?
2. Five and seven are good choices for the initial window size.
3. A smaller window is used to find the best matches for the prediction with the next larger window.
19
[Figure: The Weight Pattern P over window positions 1 to 5]
20
The Basic Algorithm Step 2:
Move the window over the test protein sequence and, at each position, extract an amino acid segment S of length W, and do:
21
The Basic Algorithm
2-1. Set the window size in every processor to be of length W;
2-2. Send S to every processor;
2-3. Match S against all si, i = 1, 2, ..., m, in all the processors, and compute a score for each si using a scoring function;
2-4. Select the N segments from {s1, …, sm} which have the highest N scores.
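Steps 2-1 to 2-4 can be sketched sequentially as below. This is a minimal illustration, not the Connection Machine implementation: the function names and `identity_score` are hypothetical, the scoring function is a placeholder assumption (the deck's own formula is not reproduced here), and the database segments are borrowed from the deck's worked example.

```python
import heapq

def identity_score(s, seg):
    # Placeholder scoring function (assumption): number of positions
    # where the two segments carry the same amino acid.
    return sum(a == b for a, b in zip(s, seg))

def top_n_matches(test_seq, db_segments, w, n, score):
    """For every window S of length w, keep the n best-scoring database segments."""
    results = []
    for start in range(len(test_seq) - w + 1):
        s = test_seq[start:start + w]                 # Step 2: extract S
        # Steps 2-2 and 2-3: every processor scores its own segment against S.
        # Here the data-parallel step is emulated with an ordinary loop.
        scored = [(score(s, seg), idx) for idx, seg in enumerate(db_segments)]
        # Step 2-4: keep the N highest scores.
        best = heapq.nlargest(n, scored, key=lambda pair: pair[0])
        results.append((s, best))
    return results

segments = ["ALGGP", "LGGPE", "GGPEP", "GPEPY",
            "ALGGA", "LGGAS", "GGASE", "GASEW"]
matches = top_n_matches("ALGGPNAWTG", segments, w=5, n=2, score=identity_score)
first_window, best = matches[0]
print(first_window, best)
```

Passing the scoring function as a parameter keeps the window loop independent of whichever scoring rule is actually used.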
22
Compute a score:
23
Why do we bother to use the top N matches rather than just the one with the highest score?
If the majority of the top N matches have a similar structure, then the input will at least have a tendency to form that structure as well.
24
The Basic Algorithm Step 3:
If the recursive mode is chosen, adjust the parameters (e.g. the window size) and repeat Step 2 until the end conditions are met or PHI-PSI has gone through a pre-specified number of recursive levels.
25
Example: Step 1: Initial parameters
W = 5
N = 2
Recursive level = 1
Sr = 0
26
[Figure: The Weight Pattern P over window positions 1 to 5]
27
Step 2-1: The layout of the known protein structure data on the Connection Machine
KP1: A L G G P E P Y
Processor 1: A L G G P
PHI: -64.19 106.63 -66.44 -92.02 (incomplete)
PSI: -33.26 8.49 0.20 163.55 -6.88
…
Processor 4: G P E P Y
PHI: -66.44 -92.02 -70.98 -84.58 (incomplete)
PSI: 163.55 -6.88 140.07 141.20 120.99
28
KP2: A L G G A S E W
Processor 5: A L G G A
PHI: -61.48 -94.70 83.20 (incomplete)
PSI: -28.65 3.88 22.82 -8.01 142.77
…
Processor 8: G A S E W
PHI: -61.15 (incomplete)
PSI: -8.01 142.77 -37.26 171.98 120.48
29
P1: A L G G P
P2: L G G P E
P3: G G P E P
P4: G P E P Y
P5: A L G G A
P6: L G G A S
P7: G G A S E
P8: G A S E W
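The processor layout above follows from cutting each known protein into overlapping windows of length W. A small sketch, assuming one segment per processor and a hypothetical helper name `split_into_windows`:

```python
def split_into_windows(proteins, w):
    """Cut every known protein into overlapping segments of length w."""
    layout = []
    for name, seq in proteins:
        for start in range(len(seq) - w + 1):
            layout.append((name, start, seq[start:start + w]))
    return layout

# The two known proteins used in the example slides.
known = [("KP1", "ALGGPEPY"), ("KP2", "ALGGASEW")]

layout = split_into_windows(known, 5)
for proc, (name, start, seg) in enumerate(layout, start=1):
    # One segment per (emulated) processor.
    print(f"P{proc}: {seg} ({name}, offset {start})")
```

Calling the same helper with `w=7` yields the four segments used at the next window size.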
30
Step 2-2: Test protein sequence: A L G G P N A W T G
S: ALGGP
Send S to P1~P8
31
S: ALGGP is sent to every processor:
P1: A L G G P
P2: L G G P E
P3: G G P E P
P4: G P E P Y
P5: A L G G A
P6: L G G A S
P7: G G A S E
P8: G A S E W
32
Step 2-3 S: ALGGP P1: ALGGP
34
S: ALGGP P2: LGGPE
35
S: ALGGP P3: GGPEP
36
S: ALGGP P4: GPEPY
37
S: ALGGP P5: ALGGA
38
Step 2-4: Score 1=9 Score 2=3 Score 3=1.5 Score 4=0 Score 5=7.5
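The five scores listed for Step 2-4 can be reproduced with a position-weighted identity match. This is only a sketch under assumptions: the weight values `[1.5, 1.5, 3.0, 1.5, 1.5]` and the exact-match rule are guesses chosen because they give 9, 3, 1.5, 0 and 7.5 for P1 to P5. The deck's actual weight pattern and scoring formula are not available in this transcript and may differ.

```python
def weighted_score(s, seg, weights):
    """Sum the weights of the window positions where both residues are identical."""
    return sum(w for a, b, w in zip(s, seg, weights) if a == b)

# Assumed weight pattern for W = 5 (hypothetical values, centre position weighted most).
WEIGHTS = [1.5, 1.5, 3.0, 1.5, 1.5]

S = "ALGGP"
segments = ["ALGGP", "LGGPE", "GGPEP", "GPEPY",
            "ALGGA", "LGGAS", "GGASE", "GASEW"]

scores = [weighted_score(S, seg, WEIGHTS) for seg in segments]

# Step 2-4 with N = 2: indices of the two best-scoring processors (0-based).
ranked = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)[:2]

for i, sc in enumerate(scores, start=1):
    print(f"Score {i} = {sc}")
print("best two:", [f"P{i + 1}" for i in ranked])
```

Under these assumed weights the two best matches are P1 and P5, the same two segments whose angles are shown next.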
39
S: ALGGP
P1: A L G G P
PHI: -64.19 -100.49 106.63 -66.44 -92.02
PSI: -33.26 8.49 0.20 163.55 -6.88
P5: A L G G A
PHI: -61.48 -94.70 83.20 (incomplete)
PSI: -28.65 3.88 22.82 -8.01 142.77
40
S: A L G G P
PHI: -64.19 -100.49 106.63 -66.44 -92.02
PSI: -33.26 8.49 0.20 163.55 -6.88
Test protein: A L G G P N A W T G
PHI: -64.19 106.63 -66.44 -92.02 -66.18 -73.71 -85.96 (incomplete)
PSI: -33.26 8.49 0.20 163.55 -6.88 116.38 155.62 125.74 18.36 (incomplete)
41
Step 3:
if Sr <= recursive level then
    W = W + 2
    Sr++
    go to Step 2
else
    end
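Read literally, the Step 3 rule can be traced with a short loop. The sketch below only records which window sizes would be visited; `window_schedule` is a hypothetical helper, and the matching work of Step 2 is left out.

```python
def window_schedule(w, recursive_level, sr=0, step=2):
    """Return the window sizes visited when the Step 3 rule is applied literally."""
    sizes = [w]                    # Step 2 runs once with the initial window
    while sr <= recursive_level:   # if Sr <= recursive level
        w += step                  # W = W + 2
        sr += 1                    # Sr++
        sizes.append(w)            # go to Step 2 with the larger window
    return sizes

print(window_schedule(5, recursive_level=1))
```

With Sr = 0 and recursive level = 1 the test `Sr <= recursive level` succeeds twice, so a literal reading visits W = 5, 7 and 9, whereas the worked example walks through W = 5 and W = 7 only.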
42
[Figure: The Weight Pattern P over window positions 1 to 7]
43
Step 2-1: The layout of the known protein structure data on the Connection Machine
KP1: A L G G P E P Y
Processor 1: A L G G P E P
PHI: -64.19 106.63 -66.44 -92.02 -70.98 (incomplete)
PSI: -33.26 8.49 0.20 163.55 -6.88 140.07 141.20
Processor 2: L G G P E P Y
PHI: 106.63 -66.44 -92.02 -70.98 -84.58 (incomplete)
PSI: 8.49 0.20 163.55 -6.88 140.07 141.20 120.99
44
KP2: A L G G A S E W
Processor 3: A L G G A S E
PHI: -61.48 -94.70 83.20 -61.15 (incomplete)
PSI: -28.65 3.88 22.82 -8.01 142.77 -37.26 171.98
Processor 4: L G G A S E W
PHI: -94.70 83.20 -61.15 (incomplete)
PSI: 3.88 22.82 -8.01 142.77 -37.26 171.98 120.48
45
Step 2-2: Test protein sequence: A L G G P N A …
S: ALGGPNA
Send S to P1~P4
46
S: ALGGPNA is sent to every processor:
P1: A L G G P E P
P2: L G G P E P Y
P3: A L G G A S E
P4: L G G A S E W
47
Step 2-3 S: ALGGPNA P1: ALGGPEP
49
S: ALGGPNA P2: LGGPEPY
51
S: ALGGPNA P3: ALGGASE
53
S: ALGGPNA P4: LGGASEW
55
Step 2-4: Score 1=-74.9 Score 2=-970.63 Score 3=-1592.74
56
Processor 1: A L G G P E P
PHI: -64.19 -100.49 106.63 -66.44 -92.02 -70.98 (incomplete)
PSI: -33.26 8.49 0.20 163.55 -6.88 140.07 141.20
Processor 4: L G G A S E W
PHI: -94.70 83.20 -61.15 (incomplete)
PSI: 3.88 22.82 -8.01 142.77 -37.26 171.98 120.48
57
Test protein: A L G G P N A …
PHI: -64.19 -100.49 106.63 -66.44 -92.02 -70.98 (incomplete)
PSI: -33.26 8.49 0.20 163.55 -6.88 140.07 141.20
58
Prediction errors
The prediction errors are measured in terms of PHI and PSI angles. There are several ways to measure the errors, such as:
Residue errors
Overall errors
59
Residue errors: the difference between the real angle values computed from the 3D coordinates and the values predicted by the algorithm for a particular residue in a protein.
Overall errors: the average of the residue errors of all the proteins in the database.
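The two error measures can be sketched as follows. Two points are assumptions rather than something the slides state: angle differences are wrapped so that 170 and -170 degrees count as 20 degrees apart, and the overall error pools the residue errors of all proteins before averaging. The function names and the sample angles are hypothetical.

```python
def angle_diff(real, predicted):
    """Absolute difference between two angles in degrees, wrapped into [0, 180]."""
    d = abs(real - predicted) % 360.0
    return 360.0 - d if d > 180.0 else d

def residue_errors(real_angles, predicted_angles):
    """Per-residue error for one protein and one angle type (PHI or PSI)."""
    return [angle_diff(r, p) for r, p in zip(real_angles, predicted_angles)]

def overall_error(per_protein_errors):
    """Average of the residue errors pooled over all proteins (assumed definition)."""
    pooled = [e for errs in per_protein_errors for e in errs]
    return sum(pooled) / len(pooled)

# Hypothetical angles, for illustration only.
real_phi = [-60.0, 170.0, -90.0]
pred_phi = [-65.0, -170.0, -80.0]
errs = residue_errors(real_phi, pred_phi)
print(errs, overall_error([errs]))
```

If errors should instead be averaged per protein first, `overall_error` would take the mean of the per-protein means, which gives a different number when proteins differ in length.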
60
Conclusions Secondary Structure Prediction
62
Reference
[BL98] Berger, B. and Leighton, T., Protein Folding in the Hydrophobic-Hydrophilic (HP) Model is NP-Complete, Journal of Computational Biology, Vol. 5, No. 1, 1998, pp
Protein Structure Prediction
PDB (Protein Data Bank)
63
Thank you