1 Multiple sequence alignment Lesson 4. 2 VTISCTGSSSNIGAG-NHVKWYQQLPG VTISCTGTSSNIGS--ITVNWYQQLPG LRLSCSSSGFIFSS--YAMYWVRQAPG LSLTCTVSGTSFDD--YYSTWVRQPPG.

1 Multiple sequence alignment Lesson 4

2
  VTISCTGSSSNIGAG-NHVKWYQQLPG
  VTISCTGTSSNIGS--ITVNWYQQLPG
  LRLSCSSSGFIFSS--YAMYWVRQAPG
  LSLTCTVSGTSFDD--YYSTWVRQPPG
  PEVTCVVVDVSHEDPQVKFNWYVDG--
  ATLVCLISDFYPGA--VTVAWKADS--
  AALGCLVKDYFPEP--VTVSWNSG---
  VSLTCLVKGFYPSD--IAVEWWSNG--
  Like pairwise alignment, BUT compares n sequences instead of 2
  • Each row represents an individual sequence
  • Each column represents the ‘same’ position
  • There may be gaps in some sequences

3 MSA & Evolution
  MSA can give you a picture of the forces that shape evolution!
  • Important amino acids or nucleotides are not “allowed” to mutate
  • Less important positions change more easily

4 Conserved positions
  • Columns where all the sequences contain the same amino acid or nucleotide
  • Important for the function or structure
  VTISCTGSSSNIGAG-NHVKWYQQLPG
  VTISCTGSSSNIGS--ITVNWYQQLPG
  LRLSCTGSGFIFSS--YAMYWYQQAPG
  LSLTCTGSGTSFDD-QYYSTWYQQPPG
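To make the definition concrete, here is a minimal Python sketch that finds the conserved columns of the four-sequence alignment on this slide. The function name `conserved_columns` is an illustrative choice, and treating an all-gap column as not conserved is an assumption the slide does not state.

```python
def conserved_columns(rows):
    """Return 0-based indices of columns where every row holds the same residue."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    # zip(*rows) walks the alignment column by column.
    # Assumption: a column made only of gaps is not counted as conserved.
    return [i for i, col in enumerate(zip(*rows))
            if len(set(col)) == 1 and col[0] != "-"]


alignment = [
    "VTISCTGSSSNIGAG-NHVKWYQQLPG",
    "VTISCTGSSSNIGS--ITVNWYQQLPG",
    "LRLSCTGSGFIFSS--YAMYWYQQAPG",
    "LSLTCTGSGTSFDD-QYYSTWYQQPPG",
]
print(conserved_columns(alignment))
```

In this alignment the conserved columns fall in two runs, the CTGS block and the WYQQ block, plus the final PG.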

5 Consensus Sequence
  • A consensus sequence holds the most frequent character of the alignment at each column
  TGTTCTA
  TGTTCAA
  TCTTCAA
  TGTTCAA
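The consensus of the four sequences on this slide can be computed with a short Python sketch. The function name `consensus` is illustrative, and the tie-breaking rule (the character seen first in the column wins) is an assumption, since the slide does not say how ties are resolved.

```python
from collections import Counter


def consensus(rows):
    """Most frequent character in each alignment column, joined into one string."""
    # Assumption: on a tie, Counter.most_common keeps the character
    # that appears first in the column.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*rows))


alignment = ["TGTTCTA", "TGTTCAA", "TCTTCAA", "TGTTCAA"]
print(consensus(alignment))  # TGTTCAA
```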

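6 Profile
  TGTTCTA
  TGTTCAA
  TCTTCAA
  [Table: frequency of A, T, C and G at each alignment column; for example, A has frequency 0.67 at column 6 and 1 at column 7]
  Profile = PSSM – Position-Specific Scoring Matrix (here holding per-column probabilities)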
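As a sketch of how such a profile could be derived, the Python below counts each nucleotide per column of the three sequences on this slide and divides by the number of sequences. The function name `profile`, the rounding to two decimals, and the absence of pseudocounts are simplifying assumptions; real PSSMs usually store log-odds scores rather than raw frequencies.

```python
def profile(rows, alphabet="ATCG"):
    """Map each character to its list of per-column frequencies."""
    n = len(rows)
    width = len(rows[0])
    return {
        ch: [round(sum(r[i] == ch for r in rows) / n, 2) for i in range(width)]
        for ch in alphabet
    }


alignment = ["TGTTCTA", "TGTTCAA", "TCTTCAA"]
for ch, freqs in profile(alignment).items():
    print(ch, freqs)
```

The A row of the result ends in 0.67 and 1.0, which matches the values that survive from the slide's table.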

7 Alignment methods
  No practical method is guaranteed to find the optimal MSA; all methods in use are heuristics:
  • Progressive/hierarchical alignment (Clustal)
  • Iterative alignment (MAFFT, MUSCLE)

8 Progressive alignment. First step:
  Compute the pairwise alignments for all against all (n(n-1)/2 alignments, i.e. 10 for the five sequences A–E); the similarities are stored in a table.
  [Table: pairwise similarity scores for sequences A–E]
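The all-against-all step can be sketched in Python as follows. The five sequences are made-up placeholders for A–E, and the scoring scheme (match +1, mismatch -1, gap -1, plain global alignment) is an assumption for illustration; real tools use substitution matrices and more elaborate gap penalties. For five sequences the table holds 10 pairs.

```python
from itertools import combinations


def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Score of the best global (Needleman-Wunsch) alignment of a and b."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def similarity_table(seqs):
    """Score every unordered pair of named sequences."""
    return {(x, y): nw_score(seqs[x], seqs[y])
            for x, y in combinations(sorted(seqs), 2)}


# Hypothetical sequences standing in for A-E on the slide.
seqs = {"A": "TGTTCTA", "B": "TCTTCAA", "C": "TCTGCAA",
        "D": "TGTTCAA", "E": "AGGTCA"}
table = similarity_table(seqs)
print(len(table), "pairwise alignments")
for pair, score in table.items():
    print(pair, score)
```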

9 Second step:
  Cluster the sequences to create a tree (guide tree):
  • represents the order in which pairs of sequences are to be aligned
  • similar sequences are neighbors in the tree
  • distant sequences are distant from each other in the tree
  The guide tree is imprecise and is NOT the tree which truly describes the relationship between the sequences!
  [Figure: guide tree with leaves A, D, C, B, E, built from the pairwise similarity table]
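One way to picture the clustering step is the Python sketch below, which repeatedly merges the two closest clusters. The slides do not say which clustering method is used, so average linkage (UPGMA-style) is an assumption here, and the distances between A–E are invented for the example.

```python
from itertools import combinations


def guide_tree(names, dist):
    """Build a nested-tuple tree by average-linkage clustering.

    dist maps frozenset({x, y}) to the distance between leaves x and y.
    """
    clusters = [(name, [name]) for name in names]  # (subtree, its leaves)

    def avg(c1, c2):
        pairs = [(x, y) for x in c1[1] for y in c2[1]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Pick the two clusters with the smallest average leaf-to-leaf distance.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: avg(*pair))
        clusters = [c for c in clusters if c is not c1 and c is not c2]
        clusters.append(((c1[0], c2[0]), c1[1] + c2[1]))
    return clusters[0][0]


# Hypothetical distances: A/D and B/C are close pairs, E is far from everything.
raw = {("A", "D"): 1, ("B", "C"): 2,
       ("A", "B"): 5, ("A", "C"): 5, ("D", "B"): 5, ("D", "C"): 5,
       ("A", "E"): 9, ("B", "E"): 9, ("C", "E"): 9, ("D", "E"): 9}
dist = {frozenset(pair): d for pair, d in raw.items()}
tree = guide_tree(["A", "B", "C", "D", "E"], dist)
print(tree)
```

With these distances the close pairs A/D and B/C are joined first and E joins last, so similar sequences end up as neighbors in the tree.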

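10 Third step:
  1. Align the most similar (neighboring) pairs
  [Figure: guide tree with leaves A, D, C, B, E; label: sequence]

11 Third step:
  2. Align pairs of pairs
  [Figure: guide tree with leaves A, D, C, B, E; labels: sequence, profile]

12 Third step:
  3. Align the outgroup
  [Figure: guide tree with leaves A, D, C, B, E; labels: sequence, profile]
  Main disadvantages:
  1. Sub-optimal tree topology
  2. Misalignments made when globally aligning a pair of sequences cannot be corrected later and only cause further deterioration as more sequences are added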
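The order of the three sub-steps follows from walking the guide tree from the leaves up. The Python sketch below lists the alignments in that order; the tree shape ((A,D),(C,B)),E is an assumption read off the leaf order A D C B E in the figure, and `alignment_steps` is an illustrative name. It only reports which pieces would be aligned, it does not perform the alignments.

```python
def alignment_steps(tree):
    """List (left, right, kind) for each merge, in the order it would be done."""
    steps = []

    def visit(node):
        if isinstance(node, str):
            return node, "sequence"          # a leaf is a single sequence
        left, left_kind = visit(node[0])
        right, right_kind = visit(node[1])
        steps.append((left, right, f"{left_kind}-{right_kind}"))
        # Once aligned, the group is represented by a profile.
        return f"({left},{right})", "profile"

    visit(tree)
    return steps


# Assumed guide tree: nested tuples, E is the outgroup.
tree = ((("A", "D"), ("C", "B")), "E")
for step in alignment_steps(tree):
    print(step)
```

The output shows two sequence-sequence alignments first, then one profile-profile alignment, and finally the outgroup aligned against the profile of the other four.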

13 Iterative alignment
  [Figure: cycle of pairwise distance table, guide tree and MSA for sequences A–E]
  Iterate until the MSA doesn’t change (convergence)

14 Searching for remote homologs
  • Sometimes BLAST isn’t enough.
  • In a large protein family, BLAST gives only the close members. We want more distant members
  • PSI-BLAST
  • Profile HMMs

15 Profile HMM
  • Similar to PSI-BLAST: also uses a profile
  • Takes into account:
    • Dependence among sites (if site n is conserved, it is likely that site n+1 is conserved too → part of a domain)
    • The probability of a certain column in an alignment

16 PSI-BLAST vs. profile HMM
  Profile HMM: more exact, slower
  PSI-BLAST: less exact, faster

17 Case study: Using homology searching
  • The human kinome

18 Kinases and phosphatases

19 Multi-tasking enzymes
  • Signal transduction
  • Metabolism
  • Transcription
  • Cell-cycle
  • Differentiation
  • Function of nervous and immune system
  • …
  • And more

20 How many kinases in the human genome?
  • 1950s: discovery that reversible phosphorylation regulates the activity of glycogen phosphorylase
  • 1970s: the advent of cloning and sequencing led to speculation that the vertebrate genome encodes as many as 1001 kinases

21 How many kinases in the human genome?
  • 2001: the human genome sequence …
  • As well: the GenBank, Swiss-Prot and dbEST databases
  • How can we find out how many kinases are out there?

22 The human kinome
  • In 2002, Manning, Whyte, Martinez, Hunter and Sudarsanam set out to:
    1. Search and cross-reference all these databases for all kinases
    2. Characterize all the kinases found

23 ePKs and aPKs
  ePKs: eukaryotic protein kinases (the majority)
    • Sequence homology of the catalytic domain; additional regulatory domains are non-homologous
  aPKs: atypical protein kinases
    • No sequence homology to ePKs; some aPK subfamilies have structural similarity to ePKs

24 The search
  • Several profiles were built, based on the catalytic domain of:
    (a) 70 known ePKs from yeast, worm, fly, and human with >50% identity in the ePK domain
    (b) each subfamily of known aPKs
  • HMM-profile searches and PSI-BLAST searches were performed

25 The results…
  • 478 ePKs
  • 40 aPKs
  • Total of 518 kinases in the human genome (about half of the 1970s prediction)

26 Classifying the kinases 1. Classification based on the catalytic domain 2. Classification based on the regulatory domains 189 sub-families of kinases

27 Comparison to other species
  • 209 subfamilies of ePKs in human, worm, yeast and fly

28
  • The human genome has about twice as many kinases as fly or worm. Many are aPKs.
  • Most of them are receptor tyrosine kinases (RTKs)
  The human-expanded kinase families function predominantly in processes of the:
  • Nervous system
  • Immune system
  • Angiogenesis
  • Hemopoiesis

29 The discovery of new kinases: a new front for battling human diseases

30 Correlating with human diseases
  • 160 kinases mapped to amplicons seen in tumors
  • 80 kinases mapped to amplicons in other major illnesses
  • Kinases are usually over-expressed in cancer and other diseases

31 Correlating with human diseases
  • 6 kinase inhibitors have been approved to date for use against cancer
  • >70 other inhibitors are in clinical trials
