
Peter Smooker, Heiko Schröder, Margaret Hamilton, Aditya, Mannan, Sundara, Saravanan, Rajalingam Aravinthan, Gad Abraham, Abdullah Al Amin, Nalinda,


1 A new approach to protein structure prediction
Peter Smooker, Heiko Schröder, Margaret Hamilton, Aditya, Mannan, Sundara, Saravanan, Rajalingam Aravinthan, Gad Abraham, Abdullah Al Amin, Nalinda, Prashant

2 What’s on today? Predicting protein structures Fast implementation Special purpose HPC Searching for structural similarity Visualisation of proteins Lots of speculation, some results!

3 Aim: Prediction of protein structures
Common methods:
–Homology modelling: > 30% match → similar fold
–Molecular modelling: only for small molecules
–Crystallography: very expensive, very slow and not always possible
Only a few structures are known and we are falling behind (<1%).
Major efforts are being made, e.g. Blue Gene (IBM, fastest supercomputer).
Linear time method?

4 Motivation
Genetic sequence databases are growing exponentially (maybe not?).
The growth rate will continue, since multiple concurrent genome projects have begun, with more to come.
[Chart labels: 120%, 45%, 15%]

5 Full Genome Comparison
Related organisms, but Tuberculosis causes a disease → find common and different parts.
3918 protein sequences (1,329,298 amino acids) against 4289 protein sequences (1,359,008 amino acids): 16 × 10^6 pair-wise sequence comparisons.
More clever ways? I guess!
Many genome-genome comparisons will be required in the near future.
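The figure of 16 × 10^6 is presumably obtained by comparing every protein sequence of one genome with every protein sequence of the other. Under that assumption the count works out as:

```latex
3918 \times 4289 = 16\,804\,302 \approx 16.8 \times 10^{6}
```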

6 Homology Modeling
Discovered sequences are analyzed by comparison with databases.
Complexity of sequence comparison is proportional to the product of query size and database size → analysis too slow on sequential computers.
Two possible approaches:
–Heuristics, e.g. BLAST, FastA, but the more efficient the heuristics, the worse the quality of the results
–Parallel Processing, get high-quality results in reasonable time

7 Protein Sequence Alignment: BLAST, FastA, Smith-Waterman
GGHSRLILSQLGEEG.RLLAIDRDPQAIAVAKT....IDDPRFSII
GGHAERFL.E.GLPGLRLIGLDRDPTALDVARSRLVRFAD.RLTLV
|||::::| : |::| ||:::||||:|:|||:: ::| |::::
[Diagram: BLAST, FastA and Smith-Waterman placed on two axes, search speed (faster to slower) and data quality (lower to higher).]
T=O(|S|)

8 Smith-Waterman Algorithm
Align S1=ATCTCGTATGATG and S2=GTCTATCAC, with α=1, β=1.
[Figure: dynamic-programming score matrix for S1 against S2, with the best local alignment traced through it.]
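The score matrix on this slide can be recomputed with a few lines of code. The following is a minimal sketch, not the authors' implementation: it assumes a linear gap penalty of 1 (taken from the two parameters set to 1 on the slide) and a substitution score of +2 for a match and -1 for a mismatch. The substitution values and the function name `smith_waterman` are assumptions made for illustration; the slide does not state them.

```python
def smith_waterman(s1, s2, match=2, mismatch=-1, gap=1):
    """Local alignment score with a linear gap penalty.

    match/mismatch are assumed substitution scores (not given on the slide);
    gap is the penalty subtracted for each inserted or deleted character.
    Returns (best_score, H) where H is the full score matrix.
    """
    rows, cols = len(s1) + 1, len(s2) + 1
    # Row 0 and column 0 stay at zero: a local alignment may start anywhere.
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            sbt = match if s1[i - 1] == s2[j - 1] else mismatch
            H[i][j] = max(
                0,                      # start a new local alignment
                H[i - 1][j] - gap,      # gap in s2
                H[i][j - 1] - gap,      # gap in s1
                H[i - 1][j - 1] + sbt,  # align s1[i-1] with s2[j-1]
            )
            best = max(best, H[i][j])
    return best, H


if __name__ == "__main__":
    score, _ = smith_waterman("ATCTCGTATGATG", "GTCTATCAC")
    print("best local alignment score:", score)
```

Every cell depends only on its left, upper and upper-left neighbours, so the work is proportional to the product of the two sequence lengths. Cells on the same anti-diagonal are independent of each other, which is what makes the algorithm a good fit for parallel hardware.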

9

10 Context sensitivity!

11 Protein folding
Our approach: a linear method. We do not compute electromagnetic fields; nature has done it for us!
Physical forces have short range (decreasing quadratically with the distance) → context sensitivity: find the same protein with the same context in the database and copy that structure.
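The "find the same context and copy the structure" idea can be pictured as a table lookup. The sketch below is only an illustration under stated assumptions: the context is a window of k residues centred on the residue of interest (k odd), the database is a list of (residues, angles) pairs, and the angle values in the usage example are made up. The names `build_context_table` and `candidate_angles` are hypothetical.

```python
from collections import defaultdict


def build_context_table(known_structures, k=3):
    """Map each length-k residue window to the (phi, psi) pairs seen at its centre.

    known_structures: iterable of (residues, angles), where residues is a list
    of residue names and angles is a list of (phi, psi) tuples of equal length.
    k is assumed to be odd.
    """
    half = k // 2
    table = defaultdict(list)
    for residues, angles in known_structures:
        for centre in range(half, len(residues) - half):
            window = tuple(residues[centre - half:centre + half + 1])
            table[window].append(angles[centre])
    return table


def candidate_angles(table, residues, k=3):
    """For every residue return the angle pairs observed for its context.

    Positions too close to either end to have a full window get None;
    contexts never seen in the table get an empty list.
    """
    half = k // 2
    result = []
    for centre in range(len(residues)):
        if centre < half or centre >= len(residues) - half:
            result.append(None)
            continue
        window = tuple(residues[centre - half:centre + half + 1])
        result.append(list(table.get(window, [])))
    return result


if __name__ == "__main__":
    # Toy "database" with invented angles, for illustration only.
    known = [(["VAL", "VAL", "ILE", "ALA"],
              [(-120.0, 130.0), (-115.0, 125.0), (-60.0, -45.0), (-65.0, -40.0)])]
    table = build_context_table(known)
    print(candidate_angles(table, ["GLY", "VAL", "VAL", "ILE"]))
```

Building the table is one pass over the database and each prediction is one lookup per residue, which is where the linear running time would come from.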

12 Dihedral Angles
The 6 atoms in each peptide unit lie in the same plane → φ and ψ are free to rotate.
The structure of a protein is almost totally determined if all angles φ and ψ are known.
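For reference, a dihedral angle is computed from four consecutive atom positions. For φ of residue i these are C(i-1), N(i), Cα(i), C(i), and for ψ they are N(i), Cα(i), C(i), N(i+1). The sketch below uses the standard geometric formula; the function name `dihedral_degrees` and the coordinates in the usage example are illustrative assumptions, not part of the talk.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle in degrees, in the range (-180, 180].

    The angle is measured around the p1 -> p2 axis between the plane
    containing p0, p1, p2 and the plane containing p1, p2, p3.
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the rotation axis
    # Remove the components of b0 and b2 that lie along the axis.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Four made-up points: the outer two lie on the same side of the axis.
    print(dihedral_degrees((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0)))
```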

13 φ ψ Abdullah Al Amin

14 φ

15

16 Ramachandran Plots
[Figure: Ramachandran plots for ALA, ARG, ASN, GLN, CYS, GLY, HIS, ASP, GLU, LYS, PRO, ILE, LEU, PHE, MET, SER, THR, TRP, VAL, TYR; "# choices" annotations: 3 4? 3 5 2 22 2]
Abdullah Al Amin

17 val-’val-ile val-’val-val val-’val-asn Σ val-’val-xxx Which φ ? Abdullah Al Amin

18

19 φ val-val-ala ψ φ → ψ same AA φ → ψ neighbour Abdullah Al Amin

20 GLU-CYS’-SER ψ GLU-’CYS-SER φ GLU-CYS’-ALA ψ GLU-’CYS-ALA φ confidence # peaks? Abdullah Al Amin

21 Complexity: Reducing the size of search space
Reducing the number of peaks.
Size of search space: 2^X, or 2^(X-Y) assuming we have predicted Y angles with high confidence.
Our aim: large Y (Y=X is not possible).
Method: increase the context.
Problem: the longer the context, the fewer the matches.
Example: there are 20^k different sequences of length k, so the expected number of matches is E_k = |PDB|/20^k. k=3: E_3 = 1000. k=5: E_5 = 3. k=9: E_9 = 1/50000.
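The example values can be reproduced approximately from E_k = |PDB|/20^k. The slide does not give |PDB|; the sketch below assumes a database of about 10^7 residues, a value chosen only because it is roughly consistent with the rounded figures quoted. The function name `expected_matches` is likewise hypothetical.

```python
def expected_matches(db_residues, k, alphabet=20):
    """Expected number of occurrences of one specific length-k sequence.

    Assumes every one of the alphabet**k sequences is equally likely,
    which real protein sequences only approximate.
    """
    return db_residues / alphabet ** k


# Assumed database size in residues (not stated in the talk).
PDB_RESIDUES = 10_000_000

if __name__ == "__main__":
    for k in (3, 5, 9):
        print(k, expected_matches(PDB_RESIDUES, k))
```

The number of expected matches drops by a factor of 20 with every added residue of context, so exact contexts much longer than 5 will almost never be found in a database of this size.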

22 Which context?? Reduce the size of the search space!
–Hydrophobic (O) / hydrophilic (I) context, e.g. I I O ALA LYS SER O O I (E=20) → reduce number of peaks
–Different lists for different groups of proteins (inside cells, outside cells)? Saravanan → reduce number of peaks
–Short and perfect → longer and less perfect? Rajalingam Aravinthan, Gad Abraham → reduce number of peaks
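One way to read the example "I I O ALA LYS SER O O I" is as a context of nine residues in which the middle three are matched exactly and the three on either side are reduced to a hydrophobic (O) or hydrophilic (I) flag. The sketch below builds such a key. It is an interpretation, not the authors' code: the `HYDROPHOBIC` set is one common textbook grouping chosen for illustration, and the names `reduced_context` and `number_of_contexts` are hypothetical.

```python
# Assumed hydrophobic residues; the talk does not say which grouping it uses.
HYDROPHOBIC = frozenset(
    {"ALA", "VAL", "LEU", "ILE", "MET", "PHE", "TRP", "PRO", "CYS", "GLY"}
)


def reduced_context(residues, centre, exact_half=1, flank=3):
    """Context key: exact residue names near the centre, O/I flags further out.

    exact_half residues on each side of the centre keep their names;
    the next `flank` residues on each side become "O" or "I".
    """
    lo = centre - exact_half - flank
    hi = centre + exact_half + flank + 1
    if lo < 0 or hi > len(residues):
        raise ValueError("window does not fit inside the sequence")
    key = []
    for pos in range(lo, hi):
        name = residues[pos]
        if abs(pos - centre) <= exact_half:
            key.append(name)
        else:
            key.append("O" if name in HYDROPHOBIC else "I")
    return tuple(key)


def number_of_contexts(exact_half=1, flank=3, alphabet=20):
    """How many distinct keys this encoding can produce."""
    return alphabet ** (2 * exact_half + 1) * 2 ** (2 * flank)


if __name__ == "__main__":
    seq = ["SER", "LYS", "VAL", "ALA", "LYS", "SER", "LEU", "ILE", "ASP"]
    print(reduced_context(seq, centre=4))
    print(number_of_contexts())
```

With these settings there are 20^3 × 2^6 = 512,000 possible keys, far fewer than the 20^9 exact contexts of the same length, so a nine-residue context still finds matches in the database.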

23 13 7 9 3 Rajalingam Aravinthan Gad Abraham

24 Prediction based on length 3

25 φ ψ α-Helix Abdullah Al Amin

26 Why 9?

27 Suffix trie and suffix tree: fast search!
Suffix trie for abcacbcabacb (all suffixes up to length 4).
Find all strings that are similar to aacb (tolerance 1). Breadth first search!
[Figure: the suffix trie, with the query a a c b matched level by level and each visited node labelled with its mismatch count (0 or 1).]
Prashant
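The search on this slide can be sketched as follows. The code assumes that "tolerance 1" means at most one mismatching character (Hamming distance) and builds a plain dictionary-based trie truncated at depth 4. Both choices, and the function names, are assumptions for illustration rather than the implementation used in the talk.

```python
from collections import deque


def build_suffix_trie(text, max_depth):
    """Trie of every suffix of text, truncated to max_depth characters."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:start + max_depth]:
            node = node.setdefault(ch, {})
    return root


def search_with_tolerance(root, query, tolerance):
    """Breadth-first search for all trie paths of len(query) characters
    that differ from query in at most `tolerance` positions.

    Returns a sorted list of (matched_string, mismatches).
    """
    hits = []
    queue = deque([(root, "", 0)])
    while queue:
        node, prefix, mismatches = queue.popleft()
        depth = len(prefix)
        if depth == len(query):
            hits.append((prefix, mismatches))
            continue
        for ch, child in node.items():
            cost = mismatches + (ch != query[depth])
            if cost <= tolerance:  # prune branches that already exceed it
                queue.append((child, prefix + ch, cost))
    return sorted(hits)


if __name__ == "__main__":
    trie = build_suffix_trie("abcacbcabacb", max_depth=4)
    print(search_with_tolerance(trie, "aacb", tolerance=1))
```

Branches are abandoned as soon as their mismatch count exceeds the tolerance, so only a small part of the trie is visited for a small tolerance.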

28 Parallel Architectures for Bioinformatics Embedded Massively Parallel Accelerators –Systola 1024: PC add-on board with 1024 processors (ISATEC, Germany) –Fuzion 150: 1536 processors on a single chip (Clearspeed Technology, UK) –FPGA ?

29 Parallel Architectures for Bioinformatics
High speed Myrinet switch
Systola 1024
–Supercomputer performance at low cost
Hybrid Computer
–combines SIMD and MIMD paradigm within a parallel architecture
→ Hybrid Computer

30 Speculation: Finding similar structures based on sequences of φs and ψs.
We could search for a structure that has a high degree of similarity with a predicted structure (instead of similarity of the sequence), particularly in hydrophobic parts.
Modify Smith-Waterman: What should be the penalty for gaps (do gaps make any sense?), and how do we treat confidence information?

31 Smith-Waterman Algorithm
Align S1=ATCTCGTATGATG and S2=GTCTATCAC, with α=1, β=1.
[Figure: dynamic-programming score matrix for S1 against S2, with the best local alignment traced through it.]
H function:
H(i,j) = \max\{\, 0,\; H(i-1,j) - 1,\; H(i,j-1) - 1,\; H(i-1,j-1) + Sbt(S1_i, S2_j) \,\}
???

32 Score = \frac{1500}{50 + (|a_i - a_j| \times 0.9)^2} - 10, with α = β = -10
[Plot: score against degrees difference]
Nalinda
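The score on this slide can be dropped into the Smith-Waterman recurrence in place of the substitution table. The sketch below reads the formula as 1500 / (50 + (0.9 · |a_i - a_j|)^2) - 10 and uses -10 as the score added for a gap. The squared term is an interpretation of the flattened slide text, and the sketch ignores the wrap-around of angles at ±180 degrees. The function names are hypothetical.

```python
def angle_score(a_i, a_j):
    """Similarity of two angles in degrees: 20 when equal, approaching -10
    as the difference grows (assumed reading of the slide's formula)."""
    diff = abs(a_i - a_j)
    return 1500.0 / (50.0 + (diff * 0.9) ** 2) - 10.0


def align_angles(seq1, seq2, gap=-10.0):
    """Best local alignment score of two angle sequences.

    Same recurrence as Smith-Waterman, with angle_score as the
    substitution score and `gap` added for each skipped angle.
    """
    rows, cols = len(seq1) + 1, len(seq2) + 1
    H = [[0.0] * cols for _ in range(rows)]
    best = 0.0
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(
                0.0,
                H[i - 1][j] + gap,
                H[i][j - 1] + gap,
                H[i - 1][j - 1] + angle_score(seq1[i - 1], seq2[j - 1]),
            )
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    # Made-up angle sequences, for illustration only.
    predicted = [-60.0, -62.0, -58.0, -120.0]
    candidate = [-61.0, -60.0, -57.0, 130.0]
    print(align_angles(predicted, candidate))
```

Because the score is continuous, nearly equal angles still contribute almost the full 20 points, whereas a character-based substitution table would treat them as a plain mismatch.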

33 Nalinda

34 Look ahead

35 Visualisation tool
Sequence of dihedral angles → structure of protein
–Visualise structure
–Indicate confidence
–Translate change of dihedral angle into change of 3D-structure
–Emphasise physical collisions
–Show positions for potential S-S bonds and hydrogen bonds
–Show fields?

36 Speculation: Simulation of the folding process
–Predict the structure of the following hydrophobic subsequence (needs to be tested whether hydrophobicity is highly correlated with being “inside a protein”)
–Mark all positions of cysteines
–Mark all positions of potential hydrogen bonds
–Simulate the bending process
–Look for similar structures “up to here similar”
–Compare structures of identical O/I sequences
–Compare surfaces (cut protein at a hydrophilic position and look at the set of exposed hydrophobic amino acids)
–Develop an algorithm to determine structural similarity, based either on dihedral angles or on Euclidean positions, using dynamic programming. With such an algorithm similar “surroundings” can be found.
–Do new parts deform old parts significantly?

37 ? ? ? ? ? ? ? ? ? ?

38 Thank you !

