CSIE NCNU. Block Alignment: An Approach for Multiple Sequence Alignment Containing Clusters. Advisor: Professor R. C. T. Lee. Speaker: B. W. Xiao. 2004/06/04.


1 CSIE NCNU. Block Alignment: An Approach for Multiple Sequence Alignment Containing Clusters. Advisor: Professor R. C. T. Lee. Speaker: B. W. Xiao. 2004/06/04

2 Multiple Sequence Alignment Input: k sequences over the alphabet {a, g, c, t}. Output: an alignment A of these sequences (gaps allowed). Example: attgcc, ttacgg, aatgga, tatcgt, cgatag
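
The input/output definition above can be made concrete with a small check. This is only a sketch: the gap symbol "-" and the function name is_alignment are assumptions for illustration, not taken from the slides. It treats an alignment as a list of equal-length rows that give back the input sequences once the gaps are removed, with no column made up entirely of gaps.

```python
GAP = "-"  # assumed gap symbol; the slides do not name one


def is_alignment(sequences, rows):
    """Return True if `rows` is a valid gapped alignment of `sequences`."""
    if not rows or len(rows) != len(sequences):
        return False
    width = len(rows[0])
    # Every row must have the same length.
    if any(len(row) != width for row in rows):
        return False
    # Removing the gaps from a row must give back the original sequence.
    if any(row.replace(GAP, "") != seq for row, seq in zip(rows, sequences)):
        return False
    # A column consisting only of gaps carries no information.
    return all(any(row[j] != GAP for row in rows) for j in range(width))


sequences = ["attgcc", "ttacgg", "aatgga", "tatcgt", "cgatag"]
rows = ["-attgcc", "ttacgg-", "aatgga-", "tatcgt-", "cgatag-"]
print(is_alignment(sequences, rows))
```

The example rows are one arbitrary valid alignment of the slide's five sequences, not an optimal one.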

3 Progressive Methods Multiple sequence alignment is NP-hard under the sum-of-pairs score (Wang and Jiang 1994). 2-approximation by Gusfield (1991). Input: k sequences. Output: an alignment of the k sequences with performance ratio smaller than 2. Idea: perform several pairwise alignments and combine them into a multiple sequence alignment.
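
Gusfield's 2-approximation is commonly described as the center-star method: pick the sequence whose total pairwise distance to all the others is smallest, then align every other sequence to it. The sketch below shows only the center selection. It assumes unit-cost edit distance as the pairwise distance, and the names edit_distance and choose_center are illustrative.

```python
def edit_distance(a, b):
    """Unit-cost edit distance (insert, delete, substitute) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def choose_center(seqs):
    """Index of the sequence with the smallest summed distance to all others."""
    totals = [sum(edit_distance(s, t) for t in seqs) for s in seqs]
    return min(range(len(seqs)), key=totals.__getitem__)


print(choose_center(["acga", "acgt", "ccgt"]))
```

Because every sequence is aligned to a single center, two tight clusters that are far apart share one center, which is one way to read the slide's later remark about clustered input.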

4 Remarks Progressive methods always operate on individual sequences and build the multiple sequence alignment by inserting gaps. Gusfield's 2-approximation does not handle sequences containing clusters well. Can we align more than two sequences at once in a short time?

5 Data Structure of Block Alignment We use a matrix to represent a sequence or an alignment. Given [...], we can use [...] to represent the alignment.
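
The matrix on this slide is not preserved in the transcript. A minimal sketch of one natural reading, assuming each row holds one gapped sequence and each matrix column is one alignment column (the names to_block and columns are hypothetical):

```python
def to_block(rows):
    """Turn equal-length gapped strings into a matrix (a list of character rows)."""
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all rows of an alignment must have the same length")
    return [list(row) for row in rows]


def columns(block):
    """Return the column vectors of a block, left to right."""
    return [tuple(row[j] for row in block) for j in range(len(block[0]))]


single = to_block(["acg"])              # a single sequence is a 1-row matrix
pair = to_block(["at-g", "a-cg"])       # a 2-sequence alignment is a 2-row matrix
print(columns(pair))
```

Under this reading a sequence and an alignment have the same type, so one routine can align either.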

6 Aligning Matrices From now on, we consider a set of matrices that represent sequences or alignments. We align two matrices using the same idea as pairwise alignment. We define [...] and [...] as the two matrices, representing sequences or alignments, that are to be aligned.

7 Scoring Columns In pairwise alignment, we align two characters. In block alignment, we align column vectors. Let there be two column vectors P and Q, where [...] and [...].
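
The slide's scoring formula for P and Q is not preserved. One plausible form, shown purely as an assumption, sums a character cost over every cross pair (p, q) with p from P and q from Q. The costs used here (0 for equal symbols, 1 otherwise, with "-" as the gap) are illustrative and not the authors' score function.

```python
GAP = "-"  # assumed gap symbol


def char_cost(x, y):
    """Assumed cost: 0 for identical symbols (including gap vs gap), 1 otherwise."""
    return 0 if x == y else 1


def column_cost(P, Q):
    """Sum of char_cost over all cross pairs of two column vectors."""
    return sum(char_cost(p, q) for p in P for q in Q)


# P comes from a 2-row block, Q from a 3-row block.
print(column_cost(("a", "a"), ("a", "t", GAP)))
```

Pairs inside P or inside Q are not counted, because those costs were already fixed when each block was built.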

8 Aligning Columns

9 Recurrence Formula
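
The formula itself is missing from the transcript. Since the slides say matrices are aligned with the same idea as pairwise alignment, a recurrence of the following shape is a reasonable guess. All symbols are assumptions: A_i and B_j are the i-th and j-th columns of the two matrices, c is a column cost, Γ_A and Γ_B are all-gap columns as tall as A and B, and D(i, j) is the best cost of aligning the first i columns of A with the first j columns of B.

```latex
\begin{aligned}
D(0,0) &= 0,\\
D(i,0) &= D(i-1,0) + c(A_i, \Gamma_B),\\
D(0,j) &= D(0,j-1) + c(\Gamma_A, B_j),\\
D(i,j) &= \min
\begin{cases}
D(i-1,j-1) + c(A_i, B_j)\\
D(i-1,j) + c(A_i, \Gamma_B)\\
D(i,j-1) + c(\Gamma_A, B_j)
\end{cases}
\end{aligned}
```

When both matrices have a single row, this reduces to the usual pairwise alignment recurrence.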

10 The Algorithm Based on Block Alignment Input: k sequences. Output: an alignment. Step 1: Initialize every sequence as a block. Step 2: Merge the two nearest blocks. Step 3: Repeat Step 2 until only one block remains.
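
The three steps can be sketched as follows. Several parts are assumptions rather than the authors' method: the column cost (number of differing cross pairs), the use of the raw alignment cost as the "nearness" of two blocks, the gap symbol "-", and the names align_blocks and block_alignment.

```python
from itertools import combinations

GAP = "-"  # assumed gap symbol


def column_cost(P, Q):
    # Assumed cost: number of cross pairs (p, q) whose symbols differ.
    return sum(p != q for p in P for q in Q)


def align_blocks(A, B):
    """Align two blocks (lists of equal-length gapped strings).

    Returns (cost, merged block) with the rows of A first, then the rows of B."""
    ca, cb = list(zip(*A)), list(zip(*B))
    ga, gb = (GAP,) * len(A), (GAP,) * len(B)   # all-gap columns
    m, n = len(ca), len(cb)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = D[i - 1][0] + column_cost(ca[i - 1], gb)
    for j in range(1, n + 1):
        D[0][j] = D[0][j - 1] + column_cost(ga, cb[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = min(D[i - 1][j - 1] + column_cost(ca[i - 1], cb[j - 1]),
                          D[i - 1][j] + column_cost(ca[i - 1], gb),
                          D[i][j - 1] + column_cost(ga, cb[j - 1]))
    # Trace back from the bottom-right corner to recover the merged columns.
    merged, i, j = [], m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + column_cost(ca[i - 1], cb[j - 1]):
            merged.append(ca[i - 1] + cb[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + column_cost(ca[i - 1], gb):
            merged.append(ca[i - 1] + gb)
            i -= 1
        else:
            merged.append(ga + cb[j - 1])
            j -= 1
    merged.reverse()
    return D[m][n], ["".join(row) for row in zip(*merged)]


def block_alignment(seqs):
    # Step 1: every sequence starts as its own block (input indices, rows).
    blocks = [([i], [s]) for i, s in enumerate(seqs)]
    # Step 3: repeat until only one block remains.
    while len(blocks) > 1:
        # Step 2: find the two nearest blocks and merge them.
        best = None
        for x, y in combinations(range(len(blocks)), 2):
            cost, rows = align_blocks(blocks[x][1], blocks[y][1])
            if best is None or cost < best[0]:
                best = (cost, x, y, rows)
        _, x, y, rows = best
        ids = blocks[x][0] + blocks[y][0]
        blocks = [b for k, b in enumerate(blocks) if k not in (x, y)] + [(ids, rows)]
    ids, rows = blocks[0]
    # Return the rows in the original input order.
    return [row for _, row in sorted(zip(ids, rows))]


for row in block_alignment(["atttaagggc", "aattaagggc", "atttacgggc",
                            "cccttaacg", "cccataacg"]):
    print(row)
```

Using the raw cost as the distance favors merging small blocks first, because the cost grows with the number of cross pairs. A normalized distance is an equally plausible choice.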

11 Given S1=atttaagggc, S2=aattaagggc, S3=atttacgggc, S4=cccttaacg, S5=cccataacg. The following is the corresponding graph. [Figure: graph on S1 to S5; edge labels shown: 2, 2, 9, 2]
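
To see why these five sequences fall into two clusters, one can tabulate their pairwise distances. The sketch below assumes unit-cost edit distance, so its numbers need not match the edge labels in the slide's figure, which presumably come from the authors' own score function.

```python
from itertools import combinations


def edit_distance(a, b):
    """Unit-cost edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


seqs = {"S1": "atttaagggc", "S2": "aattaagggc", "S3": "atttacgggc",
        "S4": "cccttaacg", "S5": "cccataacg"}

# Distance for every unordered pair, listed from nearest to farthest.
dist = {(x, y): edit_distance(seqs[x], seqs[y]) for x, y in combinations(seqs, 2)}
for pair, d in sorted(dist.items(), key=lambda item: item[1]):
    print(pair, d)
```

Pairs within {S1, S2, S3} and within {S4, S5} are at distance at most 2, while every pair that crosses the two groups is farther apart.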

12 Experimental Results We generated ten sets of data. Each set has ten sequences that form two clusters, and all sequences have length about 500.

method     block alignment    2-approximation
score      17678              19630
           17179              18301
           13826              16468
           15782              17168
           14813              15435
           14911              17965
           14927              17609
           16252              17792
           15978              16712
           16159              18047
average    15750.5            17512.7

13 Experimental Results We generated four sets of data. Each set has nine sequences that form three clusters.

method     block alignment    2-approximation
score      10378              10944
           11142              12898
           11532              11790
           11386              12526
average    11109.5            12039.5

14 Experimental Results We tested block alignment and the 2-approximation on ten DNA sequences, 5 from hepatitis B viruses and 5 from hepatitis C viruses. We also tested seven sequences, 3 from dogs and 4 from wolves.

method                       block alignment    2-approximation
hepatitis B and C viruses    13023              16133
dogs and wolves              5176               5300

15 Discussions and Future Work We may use other score functions for evaluation. We can also try other strategies for merging blocks. We can extend our program to align protein sequences, using a PAM matrix in place of our score function.

16 Thank you

