Eric C. Rouchka, University of Louisville SATCHMO: sequence alignment and tree construction using hidden Markov models Edgar, R.C. and Sjolander, K. Bioinformatics.


1 Eric C. Rouchka, University of Louisville SATCHMO: sequence alignment and tree construction using hidden Markov models Edgar, R.C. and Sjolander, K. Bioinformatics. 19(11):1404-1411. CECS 694-04 Bioinformatics Journal Club Eric Rouchka, D.Sc. September 10, 2003

2 What is Multiple Sequence Alignment (MSA)? Taking more than two sequences and aligning them based on similarity

3 Globin Example
>gamma_A
MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSSASAIMGNPKVKAHGKKVLTSLGDAIKHLDDLKGTF
AQLSELHCDKLHVDPENFKLLGNVLVTVLAIHFGKEFTPEVQASWQKMVTAVASALSSRYH
>alfa
VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSD
LHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
>beta
VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTF
ATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
>delta
VHLTPEEKTAVNALWGKVNVDAVGGEALGRLLVVYPWTQRFFESFGDLSSPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTF
SQLSELHCDKLHVDPENFRLLGNVLVCVLARNFGKEFTPQMQAAYQKVVAGVANALAHKYH
>epsilon
VHFTAEEKAAVTSLWSKMNVEEAGGEALGRLLVVYPWTQRFFDSFGNLSSPSAILGNPKVKAHGKKVLTSFGDAIKNMDNLKPAFA
KLSELHCDKLHVDPENFKLLGNVMVIILATHFGKEFTPEVQAAWQKLVSAVAIALAHKYH
>gamma_G
MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSSASAIMGNPKVKAHGKKVLTSLGDAIKHLDDLKGTF
AQLSELHCDKLHVDPENFKLLGNVLVTVLAIHFGKEFTPEVQASWQKMVTGVASALSSRYH
>myoglobin
MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASEDLKKHGATVLTALGGILKKKGHHEAEI
KPLAQSHATKHKIPVKYLEFISECIIQVLQSKHPGDFGADAQGAMNKALELFRKDMASNYKELGFQG
>teta1
ALSAEDRALVRALWKKLGSNVGVYTTEALERTFLAFPATKTYFSHLDLSPGSSQVRAHGQKVADALSLAVERLDDLPHALSALSHLH
ACQLRVDPASFQLLGHCLLVTLARHYPGDFSPALQASLDKFLSHVISALVSEYR
>zeta
SLTKTERTIIVSMWAKISTQADTIGTETLERLFLSHPQTKTYFPHFDLHPGSAQLRAHGSKVVAAVGDAVKSIDDIGGALSKLSELHAYI
LRVDPVNFKLLSHCLLVTLAARFPADFTAEAHAAWDKFLSVVSSVLTEKYR

4 Globin Multiple Alignment

5 Why do MSA? Homology Searching –Important regions conserved across (or within) species Genic Regions Regulatory Elements Phylogenetic Classification Subfamily classification Identification of critical residues

6 MSA Approaches All columns alignable across all sequences –MSA –ClustalW Columns alignable throughout all sequences singled out (Profile HMM) –HMMER –SAM

7 MSA N-dimensional dynamic programming Time consuming High memory usage Guaranteed to yield the optimal (maximum-scoring) alignment
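To make the cost concrete, here is a minimal Python sketch of the two-sequence case (Needleman-Wunsch global alignment) together with a helper that counts the cells an N-dimensional table would need. The scoring values and function names are illustrative assumptions, not parameters of the MSA program.

```python
def global_alignment_score(x, y, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of two sequences (Needleman-Wunsch)."""
    rows, cols = len(x) + 1, len(y) + 1
    # score[i][j] = best score for aligning x[:i] with y[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if x[i - 1] == y[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]


def dp_cells(length, n_sequences):
    """Cells in the N-dimensional table for N sequences of equal length."""
    return (length + 1) ** n_sequences
```

The table grows exponentially with the number of sequences, which is why exact N-dimensional dynamic programming is only practical for a handful of short sequences.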

8 ClustalW Progressive Alignment –Sequences aligned in pair-wise fashion –Alignment scores produce phylogenetic tree –Enhanced dynamic programming approach

9 Hidden Markov Models Match State, Insert State, Delete State

10 HMMs Models conserved regions Successful at detecting and aligning critical motifs and conserved core structure Difficulty in aligning sequence outside of these regions

11 SATCHMO Simultaneous Alignment and Tree Construction using Hidden Markov mOdels www.lib.jmu.edu/music/composers/armstrong.htm

12 SATCHMO Progressive Alignment –Built iteratively in pairs –Profile HMMs used Alignments of same sequences not same at each node Number of columns predicted smaller as structures diverge Output not represented by single matrix

13 Why HMMs? Homologs ranked through scoring Accurate profiles from small numbers of sequences Accurately combines two alignments having low sequence similarity

14 Bits saved relative to background
$k = 1, \dots, M$: HMM node number
$a$: amino acid type
$P_k(a)$: emission probability of $a$ in the $k$-th match state
$P_0(a)$: approximation of the background probability of $a$
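The slide's formula itself did not survive in this transcript. The Python sketch below assumes that "bits saved" is the relative entropy of the match-state emissions against the background, averaged over the M match states, that is, b = (1/M) Σ_k Σ_a P_k(a) log2(P_k(a) / P_0(a)). That reading, the function name, and the toy distributions in the usage note are assumptions.

```python
import math


def bits_saved(match_emissions, background):
    """Mean relative entropy (bits) of match-state emissions vs. background.

    match_emissions: list of dicts, one per match state k, mapping a -> P_k(a)
    background: dict mapping a -> P_0(a)
    """
    total = 0.0
    for p_k in match_emissions:
        for a, p in p_k.items():
            if p > 0.0:  # a zero-probability term contributes nothing
                total += p * math.log2(p / background[a])
    return total / len(match_emissions)
```

With a two-letter alphabet and a uniform background, a fully conserved column saves one bit and a column equal to the background saves none, so `bits_saved([{"A": 1.0, "B": 0.0}, {"A": 0.5, "B": 0.5}], {"A": 0.5, "B": 0.5})` averages the two.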

15 Sequence weights Sequences weighted such that b converges on a desired value Weights compensate for correlation in sequences

16 HMM Construction Profile HMM constructed from multiple alignment Some columns alignable; others not

17 HMM Construction Given an alignment a, a profile HMM is generated Each column in a is assigned to an emitter state – transition probabilities are calculated based on observed amino acids
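One part of this construction, turning each alignment column into a probability distribution over amino acids, can be sketched in Python as follows. The add-constant pseudocount used here is a simple stand-in chosen for illustration; the slides do not say which estimator SATCHMO uses, and the function name is hypothetical.

```python
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def column_emissions(alignment, pseudocount=1.0):
    """Per-column amino acid probabilities with add-constant smoothing.

    alignment: list of equal-length strings; '-' (or any non-residue) is a gap.
    Returns one dict per column mapping amino acid -> probability.
    """
    n_cols = len(alignment[0])
    emissions = []
    for col in range(n_cols):
        # Count only residues; gap characters are ignored in this sketch.
        counts = Counter(row[col] for row in alignment if row[col] in ALPHABET)
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        emissions.append({a: (counts[a] + pseudocount) / total for a in ALPHABET})
    return emissions
```

The pseudocount keeps every probability above zero, so an amino acid that was never observed in a column is still scored as possible rather than impossible.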

18 Transition Probabilities If we have a total of five match states, the probabilities can be stored in the following table:

19 HMM Terminology
$\pi$: path through an HMM to produce a sequence $s$
$P(A \mid \pi) = \prod_{s} P(s \mid \pi_s)$
$\pi^{+}$: maximum probability path through the HMM
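The maximum probability path is usually found with the Viterbi algorithm. The sketch below runs it on a generic HMM in log space; it is not the profile HMM topology (there are no silent delete states), and the function signature is an assumption made for illustration. All probabilities passed in must be strictly positive.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (log probability, state path) of the most probable path."""
    # best[t][s] = log probability of the best path ending in state s at time t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]  # back[t][s] = predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, best[t - 1][r] + math.log(trans_p[r][s])) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path
```

Working with log probabilities avoids numerical underflow on long sequences, since the product of many small probabilities becomes a sum.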

20 Aligning Two Alignments One alignment is converted to an HMM Second alignment is aligned to the HMM –Some columns remain alignable –Affinities (relative match scores) calculated New MSA results HMM Constructed from new MSA

21 Aligning Two Alignments

22 SATCHMO Algorithm
Step 1:
–Create a cluster for each input sequence and construct an HMM from the sequence
Step 2:
–Calculate the similarity of all pairs of clusters and identify a pair with highest similarity
–Align the target and template to produce a new node

23 SATCHMO Algorithm
Repeat step 2 until:
–All sequences assigned to a single cluster
–Highest similarity between clusters is below a threshold
–No alignable positions are predicted
Output: A set of binary trees
–Nodes are sequences
–Each node contains an HMM aligning the sequences in the subtree
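The overall loop can be sketched in Python as a greedy pairwise merge. This is only an outline under stated assumptions: `similarity` and `merge` are hypothetical callables standing in for the HMM-based scoring and alignment steps, the toy `identity` function is not SATCHMO's measure, and the "no alignable positions" stopping rule is not modeled.

```python
from itertools import combinations


def build_trees(sequences, similarity, merge, threshold):
    """Greedy pairwise merging; returns the roots of the resulting trees."""
    # Step 1: one cluster (leaf node) per input sequence.
    clusters = [{"members": [s], "model": s, "children": None} for s in sequences]
    while len(clusters) > 1:
        # Step 2: pick the most similar pair of clusters.
        a, b = max(combinations(clusters, 2),
                   key=lambda pair: similarity(pair[0]["model"], pair[1]["model"]))
        if similarity(a["model"], b["model"]) < threshold:
            break  # remaining clusters are too dissimilar to join
        node = {"members": a["members"] + b["members"],
                "model": merge(a["model"], b["model"]),
                "children": (a, b)}
        clusters = [c for c in clusters if c is not a and c is not b] + [node]
    return clusters


def identity(x, y):
    """Toy similarity: fraction of identical positions in equal-length strings."""
    return sum(p == q for p, q in zip(x, y)) / len(x)
```

For example, `build_trees(["AAAA", "AAAT", "GGGG"], identity, lambda x, y: x, 0.5)` joins the two similar sequences and leaves the third as its own tree, which mirrors the "set of binary trees" output described above.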

24 Graphical Interface for SATCHMO

25 Demonstration of SATCHMO

26 Validation Set BAliBASE benchmark alignment set used –Ref1: equidistant sequences –Ref2: distantly related sequences –Ref3: subgroups of sequences; < 25% similarity between groups –Ref4: alignments with long extensions on the ends –Ref5: alignments with long insertions

27 Comparison of Results SATCHMO compared to: –ClustalW (Progressive Pairwise Alignment) –SAM (HMM)


29 Discussion SATCHMO effective in identifying protein domains Comparison to T-Coffee and PRRP would be useful –Time and sensitivity Tree representation is unique, modeling structural similarity

