Structural alignment Protein structure Every protein is defined by a unique sequence (primary structure) that folds into a unique.


1 Structural alignment
marian@xray.bmc.uu.se

2 Protein structure
Every protein is defined by a unique sequence (primary structure) that folds into a unique shape (tertiary or three-dimensional structure). Moreover, proteins with similar sequences adopt very similar structures.
[Figure: cyclophilin from B. malayi and cyclophilin A from H. sapiens]

3 Why structural alignment?
We have sequence alignment (Clustal…), which gives us an idea of the correspondence between the amino acids of two (or more) proteins:
KTHLCV
KSHA-V
This lets us infer information about the function and evolution of the proteins, but only if the sequences are similar enough!

4 What is the twilight zone?
Sequence alignment unambiguously distinguishes protein pairs of similar structure from pairs of dissimilar structure only when the pairwise sequence identity is high, which roughly means over 40%. The signal gets blurred in the twilight zone of 20-35% sequence identity.
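Below is a minimal Python sketch of how identity figures like these are usually computed, applied to the toy alignment from slide 3. Two choices here are our assumptions. Identity is counted only over columns where both sequences have a residue, and conventions differ on this. The `zone()` cut-offs just restate the rough numbers on this slide.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Assumption: identity is counted over columns where both sequences have
    a residue. Other conventions divide by the full alignment length or by
    the shorter sequence length and give different numbers.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


def zone(identity: float) -> str:
    """Bin an identity value using the rough cut-offs quoted on slide 4."""
    if identity > 40.0:
        return "high"
    if 20.0 <= identity <= 35.0:
        return "twilight"
    # Between 35% and 40%, or below 20%: not covered by the slide's ranges.
    return "other"


# The toy alignment from slide 3 (5 residue pairs, 3 identical).
pid = percent_identity("KTHLCV", "KSHA-V")
print(f"{pid:.1f}% identity, zone: {zone(pid)}")
```

An alignment of six residues is far too short for its identity value to mean much. The length dependence described on slide 5 applies.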

5 More on the twilight zone
More than 90% of sequence pairs with sequence identity below 25% have different structures. The significance of a sequence alignment is length dependent: the longer the alignment, the lower the identity required for it to be called significant. Nevertheless, the required identity converges to 25% for alignments longer than 80 amino acids. The 'more similar than identical' rule can reduce the number of false positives. Using intermediate sequences to find links between more distant families can also reduce the number of false positives.
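The HSSP threshold curve of Sander and Schneider (1991) makes this length dependence concrete. The curve levels off at 25% for alignments longer than 80 residues. The constants below (290.15 and -0.562) and the cut-off below 10 residues are given from memory and should be checked against the original paper. Treat this as a sketch, not a reference implementation.

```python
def hssp_threshold(length: int) -> float:
    """Identity (%) above which an alignment of `length` residues is
    considered significant.

    Assumed form of the HSSP curve: 100% below 10 residues,
    290.15 * L**-0.562 from 10 to 80 residues, and a flat 25% beyond 80.
    """
    if length < 10:
        return 100.0
    if length <= 80:
        return 290.15 * length ** -0.562
    return 25.0


# Shorter alignments need a higher identity to be called significant.
for n in (20, 40, 80, 150, 300):
    print(n, round(hssp_threshold(n), 1))
```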

6 How far can the sequence identity drop?
Average sequence identity of random alignments: 5.6%
Average sequence identity of remote homologues: 8.5%
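Where does a floor of roughly 5% come from? Take an ungapped alignment of two unrelated random sequences. Its expected identity is the sum of the squared residue frequencies. For a uniform 20-letter alphabet that is exactly 1/20 = 5%, and any skewed composition gives a higher value. The uniform composition below is an assumption. The 5.6% on the slide was measured on real alignments, so this sketch only illustrates the baseline.

```python
import random


def expected_random_identity(freqs):
    """Expected identity of an ungapped alignment of two independent random
    sequences drawn from the same residue composition: sum of p_i**2."""
    return sum(p * p for p in freqs)


def simulated_identity(length, trials, seed=0):
    """Monte Carlo check with a uniform 20-letter alphabet (an assumption;
    real proteins have a skewed composition, which raises the baseline)."""
    rng = random.Random(seed)
    alphabet = "ACDEFGHIKLMNPQRSTVWY"
    same = 0
    for _ in range(trials):
        a = rng.choices(alphabet, k=length)
        b = rng.choices(alphabet, k=length)
        same += sum(x == y for x, y in zip(a, b))
    return same / (length * trials)


uniform = [1 / 20] * 20
print(f"analytic:  {100 * expected_random_identity(uniform):.1f}%")
print(f"simulated: {100 * simulated_identity(300, 200):.1f}%")
```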

7 How does it work?
[Figure from http://www.biochem.unizh.ch/antibody/Introduction/Institutsseminar97/source/slide2.htm]

8 Numbers
Given an average protein length of 300 amino acids, there are 20^300 possible sequences of that length, more than the number of atoms in the universe. In reality, only a few hundred thousand sequences are known. The number of basic protein folds is believed to be between 1,500 and 5,000.
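Python's arbitrary-precision integers give a quick check of this arithmetic. The 10^80 atom count is the commonly quoted order-of-magnitude estimate, not a figure taken from the slide.

```python
# Number of possible sequences of length 300 over the 20 standard amino acids.
n_sequences = 20 ** 300

# Commonly quoted order-of-magnitude estimate for atoms in the observable universe.
atoms_in_universe = 10 ** 80

digits = len(str(n_sequences))
print(f"20**300 has {digits} digits (about 2e{digits - 1})")
print(n_sequences > atoms_in_universe)
```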

9 Structural alignment, because:
- structures are better conserved than sequences, so structural alignment can imply a functional similarity that is not detectable from a sequence alignment;
- it might help to improve sequence alignments when structures are available (phylogenetic studies, homology modeling);
- it will improve sequence alignment methods (substitution matrices and gap penalties derived from structural alignments);
- it will improve structure prediction methods.

10 Sequence versus structural alignment
Position          1   2   3   4   5   6   7   8   9   10  11  12  13  14
Protein A         PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS
B (sequence)      PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE CYS
B (structural)    PHE ASN VAL CYS ARG --- --- --- THR PRO GLU ALA ILE CYS
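Counting identical residues between the first sequence and each of the two alignments of the second sequence shows how the alignments differ. The row labels in this sketch are our reading of the slide. The first alignment keeps the PRO-PRO match and has more identities. The second shifts the THR-PRO pair, presumably to follow the structure.

```python
protein_a = "PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS".split()

# Labels are an interpretation: the identity-maximising row is taken to be
# the sequence alignment, the other one the structural alignment.
alignments = {
    "sequence": "PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE CYS".split(),
    "structural": "PHE ASN VAL CYS ARG --- --- --- THR PRO GLU ALA ILE CYS".split(),
}


def identical_positions(a, b):
    """1-based alignment columns where both rows carry the same residue."""
    return [i for i, (x, y) in enumerate(zip(a, b), start=1)
            if x == y and x != "---"]


for name, row in alignments.items():
    cols = identical_positions(protein_a, row)
    print(f"{name}: {len(cols)} identities at columns {cols}")
```

The alignment with the most identical residues is not necessarily the one that superposes the structures best.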

11 Material

12 Is it difficult to make a structural alignment?
Yes, it is. Structural alignment is an NP-hard problem (NP: nondeterministic polynomial time), and no algorithm is known that solves it exactly in polynomial time. Even if it could be solved exactly, the result would be correct from a technical point of view, but not necessarily from a biological one.

13 General solution
Use a heuristic approach:
1. Represent proteins A and B in some coordinate-independent space.
2. Compare A and B.
3. Optimize the alignment between A and B (e.g. minimize the RMSD, the root-mean-square deviation).
4. Measure the statistical significance of the alignment against a set of random structure comparisons.
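Step 3 often relies on superposition. Once a residue correspondence is fixed, the rotation that minimizes the RMSD has a closed-form solution, the Kabsch algorithm. Here is a minimal NumPy sketch of it, assuming the correspondence is already known. A heuristic aligner would call something like this repeatedly while it varies the correspondence. The demo coordinates are random stand-ins, not protein data.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal superposition.

    Assumes the correspondence is already fixed: row i of p is paired with
    row i of q (e.g. C-alpha atoms of aligned residues).
    """
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of equal shape")
    # Remove translation by centring both sets on their centroids.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # SVD of the 3x3 covariance matrix gives the optimal rotation (Kabsch).
    u, _, vt = np.linalg.svd(p_c.T @ q_c)
    # Flip the last axis if needed so we get a rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c @ rotation.T - q_c
    return float(np.sqrt((diff ** 2).sum() / len(p)))


# Demo: a rotated and translated copy of the same points superposes perfectly.
rng = np.random.default_rng(0)
a = rng.normal(size=(10, 3))
t = 0.7
rot_z = np.array([[np.cos(t), -np.sin(t), 0.0],
                  [np.sin(t), np.cos(t), 0.0],
                  [0.0, 0.0, 1.0]])
b = a @ rot_z.T + np.array([5.0, -2.0, 1.0])
print(f"rigid copy: {kabsch_rmsd(a, b):.2e}")
noisy = b + rng.normal(scale=0.5, size=b.shape)
print(f"noisy copy: {kabsch_rmsd(a, noisy):.2f}")
```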

14 "…in some coordinate-independent space…"
Make the problem easier by:
- comparing only distance matrices of atoms
- comparing secondary structure elements (SSEs)
- comparing cartoons
- comparing vectors of SSEs
- combinations of the above methods
- …
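Why do distance matrices count as a coordinate-independent representation? In this minimal NumPy sketch, a set of stand-in C-alpha positions is rotated and translated. Every coordinate changes, but the matrix of pairwise distances stays the same. The random points and the `dm_difference()` score are hypothetical illustrations, not the scoring of DALI or any other server.

```python
import numpy as np


def distance_matrix(coords: np.ndarray) -> np.ndarray:
    """All-against-all Euclidean distances between (N, 3) atom positions."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def dm_difference(a: np.ndarray, b: np.ndarray) -> float:
    """Mean absolute difference between two equal-sized distance matrices
    (a hypothetical toy score for illustration only)."""
    return float(np.abs(distance_matrix(a) - distance_matrix(b)).mean())


rng = np.random.default_rng(1)
ca = rng.normal(scale=5.0, size=(8, 3))  # stand-in for 8 C-alpha positions

# Random rotation (orthogonal Q from a QR decomposition, forced to det = +1).
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(q) < 0:
    q[:, 0] *= -1
moved = ca @ q.T + np.array([10.0, 0.0, -3.0])

# The coordinates changed, but the distance matrices did not.
print(np.abs(moved - ca).max() > 1.0, f"{dm_difference(ca, moved):.1e}")
```

A mirror reflection also leaves all distances unchanged. A distance matrix alone therefore cannot tell a structure from its mirror image.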

15 None of the methods is guaranteed to find the closest structure, and two methods can disagree at every amino-acid position. Nevertheless, they can still provide valuable insight into the history of a protein and give hints about its function.

16 Methods for fold comparison
Server   Method                                        Ref.  Location
CE       Extension of optimal path                     1     http://cl.sdsc.edu
DALI     Distance-matrix alignment                     2     http://www2.ebi.ac.uk/dali
DEJAVU   SSE alignment with Cα atom optimisation       3     http://portray.bmc.uu.se/cgi-bin/dennis/dejavu.pl
LOCK     Absolute orientation of corresponding points  4     http://gene.stanford.edu/LOCK/
MATRAS   Markov transition model of evolution          5     http://bongo.lab.nig.ac.jp/~takawaba/Matras.html
PRIDE    Cα-Cα atom distances                          6     http://hydra.icgeb.trieste.it/pride/
SSM      Graph matching algorithm                      -     http://www.ebi.ac.uk/msd-srv/ssm/ssmstart.html
TOP      SSE alignment                                 7     http://bioinfo1.mbfys.lu.se/TOP
TOPS     TOPS-diagram alignment                        8     http://tops.ebi.ac.uk/tops/compare1.html
TOPSCAN  Secondary topology-string alignment           9     http://www.rubic.rdg.ac.uk/~andrew/bioinf.org/topscan
VAST     Vector alignment                              10    http://www.ncbi.nlm.nih.gov/Structure/VAST/vastsearch.html

17 Protein structure classification
If you want to know which structures are similar to a known structure, these classification systems may help:
A) Manual: SCOP
B) Semi-automatic: CATH
C) Automatic: FSSP

18 CATH
C (class): secondary structure composition
A (architecture): overall shape; orientation of the secondary structure elements
T (topology): overall shape and orientation of the secondary structure elements, plus their connectivity
H (homologous superfamily): domains that satisfy any one of these criteria:
  - sequence identity >= 35%, with at least 60% of the larger structure equivalent to the smaller;
  - SSAP score >= 80.0 and sequence identity >= 20%, with at least 60% of the larger structure equivalent to the smaller;
  - SSAP score >= 80.0, with at least 60% of the larger structure equivalent to the smaller, and related functions
S (sequence families): clustering based on sequence identity level
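The H-level rule above can be read as three alternative criteria, any one of which is enough. This is a minimal sketch of that reading. The function and parameter names are hypothetical, and the code only restates the slide's thresholds. It is not CATH's own procedure.

```python
def same_homologous_superfamily(seq_identity: float, ssap: float,
                                overlap: float, related_function: bool) -> bool:
    """Check the three alternative H-level criteria listed on the CATH slide.

    seq_identity: percent sequence identity between the two domains
    ssap: SSAP structural similarity score
    overlap: fraction (0-1) of the larger structure equivalent to the smaller
    related_function: whether the domains have related functions
    The 'any one criterion suffices' reading is an assumption.
    """
    enough_overlap = overlap >= 0.60
    by_sequence = seq_identity >= 35.0 and enough_overlap
    by_structure_and_sequence = (ssap >= 80.0 and seq_identity >= 20.0
                                 and enough_overlap)
    by_structure_and_function = (ssap >= 80.0 and enough_overlap
                                 and related_function)
    return by_sequence or by_structure_and_sequence or by_structure_and_function


print(same_homologous_superfamily(12.0, 85.0, 0.7, related_function=True))   # True
print(same_homologous_superfamily(12.0, 85.0, 0.7, related_function=False))  # False
```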

19 Summary
Structural alignment can help with protein annotation even when sequence similarity is not significant. The sequence identity of two proteins with similar structures can be lower than 10%; the number of folds is limited. Recent progress in protein structure determination increases the usefulness of structural alignment. Structural alignment is a difficult problem that is solved by heuristic methods. These methods simplify the problem by moving from 3D space to 2D space, sacrificing the optimal result for speed.

20 Summary II
Different methods can produce completely different alignments. In our results, CE, DALI, MATRAS and VAST were the best servers for finding structural relatives. Several structural classification systems have been developed (CATH, FSSP, SCOP); they provide hierarchical classifications of protein structures and make it possible to infer functional and evolutionary relationships between proteins.

