Protein Structure Prediction by a Data-level Parallel Algorithm
Proceedings of the 1989 ACM/IEEE Conference on Supercomputing
Speaker: Chuan-Cheng Lin
Advisor: Prof. R. C. T. Lee, CSIE, National Chi Nan University

Outline: Concepts, Introduction, Approach, Example, Conclusions, Reference

Concepts [Figure: Cell, Chromosome, DNA, Gene, Protein synthesis, Polypeptides -> amino acids; figure labels include trillions, 23 pairs, 3 billion, and 80,000]

Concepts: Protein synthesis, transcription [Figure: enzyme, messenger RNA]

Concepts: Protein synthesis, translation
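As an illustration of translation, the sketch below reads an mRNA string one codon (three bases) at a time and maps each codon to an amino acid with the standard genetic code, stopping at the first stop codon. It assumes the reading frame starts at the first base; `translate` and `CODON_TABLE` are hypothetical names.

```python
# Standard genetic code, with the three codon positions each ordered U, C, A, G.
# "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon, starting at the first base, until a stop codon."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGGCUUUAGGCUAA"))  # MALG (AUG GCU UUA GGC, then stop codon UAA)
```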

Concepts

Concepts: Amino acid

Concepts: Peptide bond

Concepts

Concepts: Protein structure: primary, secondary, tertiary, and quaternary structure

Introduction: What is a protein? Why do we predict protein structure? Experimental methods: X-ray crystallography, NMR. Known protein structures (as of 22-Oct-2002). How?

Introduction: Methods of protein structure prediction: AI, neural networks, PHI-PSI, potential energy, statistical methods, …

Introduction: Determining the native folded state of a protein, given only its primary sequence of amino acids, is referred to as the protein folding problem.

Introduction: The protein folding problem is, given an amino acid sequence, to find its correctly folded 3D structure. Protein folding in the hydrophobic-hydrophilic (HP) model is NP-complete [BL98].
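To make the HP model concrete: each residue is classed as hydrophobic (H) or hydrophilic/polar (P), a conformation is a self-avoiding walk on a lattice, and each pair of H residues that are lattice neighbours but not adjacent in the chain contributes -1 to the energy; folding means finding a minimum-energy conformation. The sketch below only evaluates the energy of one given conformation. For brevity it assumes a 2D square lattice and a hypothetical U/D/L/R move encoding; the function name `hp_energy` is also hypothetical.

```python
# Energy of one conformation in the HP (hydrophobic-hydrophilic) lattice model.
# Assumptions (illustrative only): 2D square lattice, moves encoded as U/D/L/R.
STEPS = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def hp_energy(sequence: str, moves: str) -> int:
    """Return -1 per H-H lattice contact between residues not adjacent in the chain."""
    if len(moves) != len(sequence) - 1:
        raise ValueError("need exactly one move between consecutive residues")
    coords = [(0, 0)]
    for move in moves:
        dx, dy = STEPS[move]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")

    index_at = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each lattice contact is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                energy -= 1
    return energy


print(hp_energy("HPPH", "RUL"))  # -1: residues 0 and 3 touch on the lattice
```

Finding the conformation that minimizes this energy is the hard part; evaluating one conformation, as here, is cheap.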

Given a test protein sequence, we want to compare every segment of it against every segment of every protein in the database, and then select the most similar segments from the database proteins.

The Basic Algorithm, Step 1: Specify the initial parameters, such as the initial window size W, the window weight pattern P, and N, the number of best matches to keep.

Window size W:
1. Large or small?
2. Five and seven are good choices for the initial window size.
3. A smaller window is used to find the best matches for predicting the next larger window.

The weight pattern P over window positions 1 to 5 [figure]

Step 2: Move the window over the test protein sequence. At each position, extract an amino acid segment S of length W, and do the following:

The Basic Algorithm
2-1. Set the window size in every processor to W;
2-2. Send S to every processor;
2-3. Match S against all s_i, i = 1, 2, …, m, in all processors, and compute a score for each s_i using a scoring function;
2-4. Select the N segments from {s_1, …, s_m} with the N highest scores.
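On the Connection Machine, steps 2-1 to 2-4 run in parallel with one stored segment per processor. The sequential sketch below only mimics that data-parallel pattern: it "sends" S to every stored segment, scores all of them, and keeps the N best. The scoring function is passed in because the slides do not specify it; `step2`, `score_fn`, and the placeholder `identity_count` score are hypothetical.

```python
import heapq
from typing import Callable, List, Sequence, Tuple


def step2(
    s: str,
    segments: Sequence[str],
    n: int,
    score_fn: Callable[[str, str], float],
) -> List[Tuple[float, int]]:
    """Return (score, processor number) for the n best-scoring stored segments."""
    # 2-1: every "processor" must hold a segment of the current window size W.
    if any(len(seg) != len(s) for seg in segments):
        raise ValueError("every stored segment must have length W = len(s)")
    # 2-2 and 2-3: "send" S to every processor and score it against that
    # processor's segment. Processors are numbered from 1, as P1, P2, ...
    scores = [(score_fn(s, seg), p) for p, seg in enumerate(segments, start=1)]
    # 2-4: keep the N highest scores (ties broken by the larger processor number).
    return heapq.nlargest(n, scores)


def identity_count(a: str, b: str) -> float:
    """Placeholder score: the number of positions with identical residues."""
    return float(sum(x == y for x, y in zip(a, b)))


print(step2("ALGGP", ["ALGGP", "LGGPE", "ALGGA"], 2, identity_count))
# [(5.0, 1), (4.0, 3)]
```

On real data-parallel hardware the list comprehension corresponds to a single parallel scoring step, and the top-N selection to a global reduction.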

Compute a score:

Why use the top N matches rather than just the one with the highest score? If the majority of the top N matches have a similar structure, then the input segment will at least have a tendency to form that structure as well.
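One way to act on this idea is a per-position majority vote over the top N matches, reporting how strongly they agree. The slides store PHI/PSI angles rather than secondary-structure labels, so the per-residue labels used here (H helix, E strand, C coil) are an assumption, as is the name `majority_structure`.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def majority_structure(labels: Sequence[str]) -> List[Tuple[str, float]]:
    """For each window position, return the most common label and its vote share."""
    if not labels:
        raise ValueError("need at least one match")
    result = []
    for pos in range(len(labels[0])):
        votes = Counter(lab[pos] for lab in labels)
        label, count = votes.most_common(1)[0]
        result.append((label, count / len(labels)))
    return result


top_n = ["HHHCC", "HHHHC", "EHHCC"]  # hypothetical labels of the top 3 matches
print(majority_structure(top_n))
```

A low vote share at a position signals that the top matches disagree there, so any prediction for that residue is less reliable.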

Step 3: If the recursive mode is chosen, adjust the parameters (e.g., the window size) and repeat Step 2, unless the end conditions are met or PHI-PSI has gone through a pre-specified number of recursive levels.

Example, Step 1: Initial parameters: W = 5, N = 2, recursive level = 1, Sr = 0 (the recursion counter).


Step 2-1: The layout of the known protein structure data on the Connection Machine. The known proteins KP1 (ALGGPEPY) and KP2 (ALGGASEW) are cut into overlapping windows of length 5, one window per processor, each stored with the PHI and PSI angles of its residues: Processors 1 to 4 hold the windows of KP1 and Processors 5 to 8 hold those of KP2. [Figure: processor layout with PHI/PSI values]

P1: ALGGP
P2: LGGPE
P3: GGPEP
P4: GPEPY
P5: ALGGA
P6: LGGAS
P7: GGASE
P8: GASEW
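The processor layout above is every window of length W taken from each known protein, assigned to processors in order. A small sketch that reproduces the P1 to P8 assignment; the function name `layout_windows` is hypothetical, and the PHI/PSI angles each processor also stores are omitted.

```python
from typing import List


def layout_windows(proteins: List[str], w: int) -> List[str]:
    """Split each known protein into overlapping windows of length w, in order."""
    windows = []
    for seq in proteins:
        for start in range(len(seq) - w + 1):
            windows.append(seq[start:start + w])
    return windows


for p, seg in enumerate(layout_windows(["ALGGPEPY", "ALGGASEW"], 5), start=1):
    print(f"P{p}: {seg}")
```

Calling `layout_windows(["ALGGPEPY", "ALGGASEW"], 7)` gives the four 7-residue windows (ALGGPEP, LGGPEPY, ALGGASE, LGGASEW) used in the recursive pass of this example.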

Step 2-2: Test protein sequence: ALGGPNAWTG. The first window gives S = ALGGP, which is sent to P1 to P8.

[Figure: S = ALGGP broadcast to the segments held by P1 to P8]

Step 2-3: S = ALGGP is matched against the segment in each processor:
S: ALGGP vs. P1: ALGGP
S: ALGGP vs. P2: LGGPE
S: ALGGP vs. P3: GGPEP
S: ALGGP vs. P4: GPEPY
S: ALGGP vs. P5: ALGGA

Step 2-4: Scores: P1 = 9, P2 = 3, P3 = 1.5, P4 = 0, P5 = 7.5
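The slides give neither the scoring function nor the values of the weight pattern P. As a hypothetical illustration only: with an identity score (a window position contributes its weight when the two residues are equal, 0 otherwise) and the assumed weights (1.5, 1.5, 3, 1.5, 1.5), the five scores above are reproduced exactly. Other weight choices fit these numbers too, and the negative scores in the W = 7 pass cannot come from this identity score with non-negative weights, so the actual scoring function is different.

```python
from typing import Sequence

# HYPOTHETICAL weight pattern P for W = 5; the slides do not give its values.
WEIGHTS_W5 = (1.5, 1.5, 3.0, 1.5, 1.5)


def weighted_identity_score(s: str, seg: str, weights: Sequence[float]) -> float:
    """Add up the weights of the window positions where S and the segment agree."""
    return sum((w for a, b, w in zip(s, seg, weights) if a == b), 0.0)


segments = ["ALGGP", "LGGPE", "GGPEP", "GPEPY", "ALGGA"]  # P1 to P5
for p, seg in enumerate(segments, start=1):
    print(f"Score {p} = {weighted_identity_score('ALGGP', seg, WEIGHTS_W5)}")
```

With N = 2, the two highest of these five scores belong to P1 (9.0) and P5 (7.5), the segments carried forward in the example.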

S: ALGGP. The matches shown are P1 (ALGGP) and P5 (ALGGA), each with its PHI and PSI angles. For P1 (residues A, L, G, G, P): PHI = -64.19, -100.49, 106.63, -66.44, -92.02; PSI = -33.26, 8.49, 0.20, 163.55, -6.88. [Figure: PHI and PSI angles of P5]

S: the test protein segment ALGGP, shown with the same PHI and PSI angles as P1: PHI = -64.19, -100.49, 106.63, -66.44, -92.02; PSI = -33.26, 8.49, 0.20, 163.55, -6.88. [Figure: predicted PHI and PSI angles for further residues of the test protein ALGGPNAWTG]

Step 3: If Sr < recursive level, then set W = W + 2, increment Sr, and go to Step 2; otherwise, end.
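Step 3 can be sketched as a loop around Step 2. The sketch uses a strict comparison (Sr < recursive level), which with recursive level = 1 gives exactly the two passes shown in this example (W = 5, then W = 7). `window_schedule` is a hypothetical helper that only lists the window sizes Step 2 would use.

```python
from typing import List


def window_schedule(w: int, recursive_level: int, step: int = 2) -> List[int]:
    """List the window sizes W used by Step 2, widening W by `step` per recursion."""
    sizes = [w]          # first pass of Step 2 with the initial W
    sr = 0               # Sr: number of recursive passes done so far
    while sr < recursive_level:
        w += step        # W = W + 2
        sr += 1          # Sr++
        sizes.append(w)  # repeat Step 2 with the wider window
    return sizes


print(window_schedule(5, 1))  # [5, 7]
```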

The weight pattern P over window positions 1 to 7 (W = 7) [figure]

Step 2-1 (W = 7): The layout of the known protein structure data on the Connection Machine. With windows of length 7, Processors 1 and 2 hold ALGGPEP and LGGPEPY from KP1 (ALGGPEPY), and Processors 3 and 4 hold ALGGASE and LGGASEW from KP2 (ALGGASEW), each with its PHI and PSI angles. [Figure: processor layout with PHI/PSI values]

Step 2-2: Test protein sequence: ALGGPNA… The first window gives S = ALGGPNA, which is sent to P1 to P4.

[Figure: S = ALGGPNA broadcast to P1 (ALGGPEP), P2 (LGGPEPY), P3 (ALGGASE), and P4 (LGGASEW)]

Step 2-3: S = ALGGPNA is matched against the segment in each processor:
S: ALGGPNA vs. P1: ALGGPEP
S: ALGGPNA vs. P2: LGGPEPY
S: ALGGPNA vs. P3: ALGGASE
S: ALGGPNA vs. P4: LGGASEW

Step 2-4: Scores: P1 = -74.9, P2 = -970.63, P3 = -1592.74

Matches shown: Processor 1 (ALGGPEP) and Processor 4 (LGGASEW), with their PHI and PSI angles [figure]

Test protein segment ALGGPNA, shown with the same PHI and PSI angles as Processor 1 (ALGGPEP) [figure]

Prediction errors: The prediction errors are measured in terms of PHI and PSI angles. There are several ways to measure them, such as residue errors and overall errors.

Residue error – the difference between the real angle values, computed from the 3D coordinates, and the values predicted by the algorithm for a particular residue in a protein.
Overall error – the average of the residue errors over all the proteins in the database.
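A minimal sketch of the two error measures. Because PHI and PSI are angles, it assumes the difference is taken on the circle (so 179° and -179° differ by 2°, not 358°) and uses its absolute value; the slides only say "difference", so both choices are assumptions, as are the function names. `overall_error` expects the angles of all residues of all proteins concatenated into one list, which averages over residues rather than over proteins (another assumption).

```python
from typing import Sequence


def residue_error(real: float, predicted: float) -> float:
    """Absolute angular difference in degrees, wrapped into [0, 180]."""
    diff = (predicted - real) % 360.0  # now in [0, 360)
    return min(diff, 360.0 - diff)


def overall_error(real: Sequence[float], predicted: Sequence[float]) -> float:
    """Average residue error over all residues of all proteins in the database."""
    if len(real) != len(predicted) or not real:
        raise ValueError("need equal, non-empty lists of angles")
    return sum(residue_error(r, p) for r, p in zip(real, predicted)) / len(real)


print(residue_error(179.0, -179.0))  # 2.0
```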

Conclusions: Secondary structure prediction

Reference
[BL98] B. Berger and T. Leighton, "Protein Folding in the Hydrophobic-Hydrophilic (HP) Model is NP-Complete," Journal of Computational Biology, Vol. 5, No. 1, 1998.
Protein Structure Prediction
PDB (Protein Data Bank)

Thank you
