Protein Structure Prediction by A Data-level Parallel Proceedings of the 1989 ACM/IEEE conference on Supercomputing Speaker : Chuan-Cheng Lin Advisor.


1 Protein Structure Prediction by a Data-level Parallel Algorithm. Proceedings of the 1989 ACM/IEEE Conference on Supercomputing. Speaker: Chuan-Cheng Lin. Advisor: Prof. R. C. T. Lee, CSIE, National Chi Nan University

2 Outline: Concepts, Introduction, Approach, Example, Conclusions, Reference

3 Concepts: Cell, Chromosome, DNA, Gene [figure labels: trillions; 23 pairs; 3 billion DNA words; 80,000; protein synthesis; polypeptides -> amino acids]

4 Concepts Protein Synthesis Transcription Enzyme Messenger RNA

5 Concepts Protein Synthesis Translation

6 Concepts

7

8 Concepts Amino acid

9 Concepts Peptide bond

10 Concepts

11 Concepts Protein: primary structure, secondary structure, tertiary structure, quaternary structure

12 Introduction What is a protein? Why do we predict protein structure? How? X-ray, NMR. Known protein structures (22-Oct-2002)

13 Introduction Methods of protein structure prediction: AI, Neural Network, PHI-PSI, Potential Energy, Statistical method

14 Introduction Determining the native folded state of a protein given only its primary sequence of amino acids is referred to as the protein folding problem.

15 Introduction The protein folding problem is, given an amino acid sequence, to find its correctly folded 3D protein structure. Protein Folding in the Hydrophobic-Hydrophilic (HP) Model is NP-Complete [BL98].

16 Given a test protein sequence, we want to compare every part of it against every part of every protein in the database, and then select the similar parts of the proteins in the database.

17 The Basic Algorithm Step 1:
Specify the initial parameters, such as the initial window size W, the window weight pattern P, and N, the number of best matches to keep.

18 The window size W:
1. Large or small?
2. Five and seven are good choices for the initial window size.
3. A smaller window is used in finding the best matches for the prediction of the next larger window.

19 The Weight Pattern P over window positions 1 2 3 4 5

20 The Basic Algorithm Step 2:
Move the window over the test protein sequence and, at each position, extract an amino acid segment S of length W, and do:

21 The Basic Algorithm
2-1. set the window size in every processor to be of length W;
2-2. send S to every processor;
2-3. match S against all si, i = 1, 2, ..., m, in all the processors, and compute a score for each si using a scoring function;
2-4. select the N segments from {s1, …, sm} which have the N highest scores.
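
Steps 2-1 to 2-4 can be sketched sequentially in Python. This is only an illustration under stated assumptions: the function names are made up here, and the identity-count score is a placeholder, since the scoring function itself is not given in the text. On the Connection Machine each stored segment would sit in its own processor and all segments would be scored at once.

```python
import heapq

def identity_score(s, segment):
    """Assumed placeholder score: number of positions where residues agree."""
    return sum(1 for a, b in zip(s, segment) if a == b)

def top_n_matches(s, segments, n, score_fn=identity_score):
    """Score s against every stored segment and keep the n best (Steps 2-3, 2-4).

    The list comprehension stands in for the data-parallel step: on the
    real machine every processor would score its own segment simultaneously.
    """
    scored = [(score_fn(s, seg), idx, seg) for idx, seg in enumerate(segments)]
    # nlargest returns the entries ordered from highest to lowest score
    return heapq.nlargest(n, scored, key=lambda item: item[0])

def scan(test_sequence, segments, w, n):
    """Slide a window of length w over the test sequence (Step 2)."""
    results = []
    for start in range(len(test_sequence) - w + 1):
        s = test_sequence[start:start + w]
        results.append((s, top_n_matches(s, segments, n)))
    return results
```

Passing the scoring function as a parameter keeps the selection logic separate from whatever score is actually used.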

22 Compute a score:

23 Why do we bother to use the top N matches rather than just the one with the highest score? If, among the top N matches, the majority have a similar structure, then the input will at least have the tendency to form that structure as well.

24 The Basic Algorithm Step 3:
If the recursive mode is chosen, adjust the parameters (e.g. the window size) and repeat Step 2 unless the end conditions are met or PHI-PSI has gone through a pre-specified number of recursive levels.

25 Example: Step 1: Initial parameters: W = 5, N = 2, recursive level = 1, Sr = 0

26 The Weight Pattern P over window positions 1 2 3 4 5

27 Step 2-1: The layout of the known protein structure data on the Connection Machine
KP1: A L G G P E P Y
Processor 1: A L G G P
PHI then PSI: -64.19 106.63 -66.44 -92.02 -33.26 8.49 0.20 163.55 -6.88
Processor 4: G P E P Y
PHI then PSI: -66.44 -92.02 -70.98 -84.58 163.55 -6.88 140.07 141.20 120.99

28 KP2: A L G G A S E W
Processor 5: A L G G A
PHI then PSI: -61.48 -94.70 83.20 -28.65 3.88 22.82 -8.01 142.77
Processor 8: G A S E W
PHI then PSI: -61.15 -8.01 142.77 -37.26 171.98 120.48

29 Segments in processors P1 to P8:
P1: A L G G P
P2: L G G P E
P3: G G P E P
P4: G P E P Y
P5: A L G G A
P6: L G G A S
P7: G G A S E
P8: G A S E W
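
The eight segments are simply every length-5 window of the two known proteins. A minimal Python sketch (the helper name `windows` is an illustrative choice, not something from the slides) reproduces the assignment of segments to processors:

```python
def windows(sequence, w):
    """Return every contiguous segment of length w, in order."""
    return [sequence[i:i + w] for i in range(len(sequence) - w + 1)]

known_proteins = ["ALGGPEPY", "ALGGASEW"]  # KP1 and KP2 from the example

# One segment per processor, numbered P1, P2, ... across all known proteins
segments = [seg for kp in known_proteins for seg in windows(kp, 5)]
for number, seg in enumerate(segments, start=1):
    print(f"P{number}: {seg}")
```

Calling `windows` with `w = 7` on the same two proteins gives four segments, two per protein, which is the layout used at the next recursive level.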

30 Step 2-2: Test protein sequence: A L G G P N A W T G. S: ALGGP. Send S to P1~P8.

31 S: ALGGP is sent to every processor:
P1: A L G G P
P2: L G G P E
P3: G G P E P
P4: G P E P Y
P5: A L G G A
P6: L G G A S
P7: G G A S E
P8: G A S E W

32 Step 2-3 S: ALGGP P1: ALGGP

33

34 S: ALGGP P2: LGGPE

35 S: ALGGP P3: GGPEP

36 S: ALGGP P4: GPEPY

37 S: ALGGP P5: ALGGA

38 Step 2-4: Score 1 = 9, Score 2 = 3, Score 3 = 1.5, Score 4 = 0, Score 5 = 7.5
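
These five scores are consistent with a weighted identity match, in which each position where S and the stored segment carry the same residue contributes that position's weight. The sketch below checks this in Python. The weights (1.5, 1.5, 3, 1.5, 1.5) are an assumption chosen only because they reproduce the numbers above; any weights with 3 at position 3, 1.5 at position 5, and a total of 4.5 over positions 1, 2 and 4 would fit equally well, and the actual weight pattern and scoring function are not given in the text.

```python
# Hypothetical weights, picked to match the five scores in the example
ASSUMED_WEIGHTS = (1.5, 1.5, 3.0, 1.5, 1.5)

def weighted_match(s, segment, weights=ASSUMED_WEIGHTS):
    """Sum the weights of the positions where the two segments agree."""
    return sum((w for a, b, w in zip(s, segment, weights) if a == b), 0.0)

S = "ALGGP"
stored = {"P1": "ALGGP", "P2": "LGGPE", "P3": "GGPEP", "P4": "GPEPY", "P5": "ALGGA"}

scores = {name: weighted_match(S, seg) for name, seg in stored.items()}
best_two = sorted(scores, key=scores.get, reverse=True)[:2]  # N = 2

print(scores)    # P1 = 9.0, P2 = 3.0, P3 = 1.5, P4 = 0.0, P5 = 7.5
print(best_two)  # P1 and P5
```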

39 S: ALGGP
P1: A L G G P
PHI then PSI: -64.19 -100.49 106.63 -66.44 -92.02 -33.26 8.49 0.20 163.55 -6.88
P5: A L G G A
PHI then PSI: -61.48 -94.70 83.20 -28.65 3.88 22.82 -8.01 142.77

40 S: A L G G P
PHI then PSI: -64.19 -100.49 106.63 -66.44 -92.02 -33.26 8.49 0.20 163.55 -6.88
Test protein: A L G G P N A W T G
PHI then PSI: -64.19 106.63 -66.44 -92.02 -66.18 -73.71 -85.96 -33.26 8.49 0.20 163.55 -6.88 116.38 155.62 125.74 18.36

41 Step 3:
if Sr <= recursive level then
    W = W + 2
    Sr++
    go to Step 2
else
    end
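
The control flow of Step 3 can be traced with a short Python sketch that only records which window sizes Step 2 is run with. The function name and the `strict` flag are illustrative assumptions, and the other end conditions mentioned for Step 3 are not modelled. With W = 5 and recursive level = 1, the test as written (Sr <= recursive level) visits window sizes 5, 7 and 9, whereas the worked example only shows 5 and 7, which is what a strict comparison (Sr < recursive level) would give.

```python
def window_schedule(initial_w, recursive_level, strict=False):
    """Return the window sizes for which Step 2 is executed.

    strict=False follows the test as written (Sr <= recursive level);
    strict=True uses Sr < recursive level instead (an assumed variant).
    """
    w, sr = initial_w, 0
    sizes = [w]  # Step 2 always runs once with the initial window
    while (sr < recursive_level) if strict else (sr <= recursive_level):
        w += 2           # W = W + 2
        sr += 1          # Sr++
        sizes.append(w)  # go to Step 2 with the larger window
    return sizes

print(window_schedule(5, 1))               # [5, 7, 9]
print(window_schedule(5, 1, strict=True))  # [5, 7]
```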

42 The Weight Pattern P over window positions 1 2 3 4 5 6 7

43 Step 2-1: The layout of the known protein structure data on the Connection Machine
KP1: A L G G P E P Y
Processor 1: A L G G P E P
PHI then PSI: -64.19 106.63 -66.44 -92.02 -70.98 -33.26 8.49 0.20 163.55 -6.88 140.07 141.20
Processor 2: L G G P E P Y
PHI then PSI: 106.63 -66.44 -92.02 -70.98 -84.58 8.49 0.20 163.55 -6.88 140.07 141.20 120.99

44 KP2: A L G G A S E W
Processor 3: A L G G A S E
PHI then PSI: -61.48 -94.70 83.20 -61.15 -28.65 3.88 22.82 -8.01 142.77 -37.26 171.98
Processor 4: L G G A S E W
PHI then PSI: -94.70 83.20 -61.15 3.88 22.82 -8.01 142.77 -37.26 171.98 120.48

45 Step 2-2: Test protein sequence: AALGGPNA… S: ALGGPNA. Send S to P1~P4.

46 S: ALGGPNA is sent to every processor:
P1: A L G G P E P
P2: L G G P E P Y
P3: A L G G A S E
P4: L G G A S E W

47 Step 2-3 S: ALGGPNA P1: ALGGPEP

48

49 S: ALGGPNA P2: LGGPEPY

50

51 S: ALGGPNA P3: ALGGASE

52

53 S: ALGGPNA P4: LGGASEW

54

55 Step 2-4: Score 1 = -74.9, Score 2 = -970.63, Score 3 = -1592.74

56 Processor 1: A L G G P E P
PHI then PSI: -64.19 -100.49 106.63 -66.44 -92.02 -70.98 -33.26 8.49 0.20 163.55 -6.88 140.07 141.20
Processor 4: L G G A S E W
PHI then PSI: -94.70 83.20 -61.15 3.88 22.82 -8.01 142.77 -37.26 171.98 120.48

57 Test protein: A L G G P N A
PHI then PSI: -64.19 -100.49 106.63 -66.44 -92.02 -70.98 -33.26 8.49 0.20 163.55 -6.88 140.07 141.20

58 Prediction errors The prediction errors are measured in terms of PHI and PSI angles. There are several ways to measure the errors, such as residue errors and overall errors.

59 Residue errors: the difference between the real angle values computed from the 3D coordinates and the values predicted by the algorithm for a particular residue in a protein.
Overall errors: the average of the residue errors of all the proteins in the database.
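
The two error measures can be sketched in Python as follows. The function names are illustrative, and treating the difference as the shortest angular distance (so that 170 and -170 degrees are 20 degrees apart, not 340) is an assumption: the text only says "the difference".

```python
def angle_difference(real, predicted):
    """Smallest absolute difference between two angles in degrees (0 to 180).

    Wrap-around handling is an assumption, not stated in the slides.
    """
    return abs((real - predicted + 180.0) % 360.0 - 180.0)

def residue_error(real_phi, real_psi, pred_phi, pred_psi):
    """PHI and PSI errors for one residue, returned as a pair."""
    return (angle_difference(real_phi, pred_phi),
            angle_difference(real_psi, pred_psi))

def overall_error(residue_errors):
    """Average of a collection of per-residue errors."""
    errors = list(residue_errors)
    return sum(errors) / len(errors)

print(angle_difference(170.0, -170.0))  # 20.0
```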

60 Conclusions Secondary Structure Prediction

61

62 Reference
[BL98] Protein Folding in the Hydrophobic-Hydrophilic (HP) Model is NP-Complete, Berger, B. and Leighton, T., Journal of Computational Biology, Vol. 5, No. 1, 1998, pp
Protein Structure Prediction
PDB (Protein Data Bank)

63 Thank you

