1 Advisor: Professor R. C. T. Lee Speaker: Jui Peng Lu ( 盧瑞鵬 ) DNA Sequence Assembly
2 DNA Sequence Assembly Problem We are given a set of strings S = {s_1, s_2, …, s_n} that were cut from an original sequence by the shotgun method. Our job is to reconstruct the original sequence. [Figure: the original sequence, its first cutting, and its second cutting, divided into segments s_1, s_2, s_3, s_4, s_5, …]
3 Basic Ideas of Our Algorithm For each input string s_i, there is a string s_j that has a prefix equal to a suffix of s_i. [Figure: the original sequence with the segments s_1 to s_5 produced by the first and second cuttings]
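The suffix/prefix relation above can be made concrete with a small helper. This is only an illustrative sketch: the function name `overlap` is our own, and the two test pairs are fragments taken from the example slides.

```python
def overlap(a: str, b: str) -> int:
    """Return the length of the longest suffix of `a` that is also a prefix of `b`."""
    # Try the longest possible overlap first and shrink until one matches.
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


# Fragments taken from the example slides.
print(overlap("AGCCT", "CTGCC"))      # 2 (shared "CT")
print(overlap("CTGCC", "GCCTAGCCC"))  # 3 (shared "GCC")
```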
4 Example Suppose we are given the following sequence: AGCCTGCCTAGCCCTAATCTG. Assume the first shotgun cutting divides the sequence into the following segments: AGCCT, GCCTAGCCC, TAATCTG. The second cutting produces the following segments: AGC, CTGCC, TAGCCCTA, ATCTG.
5 Example Input strings S = {AGCCT, GCCTAGCCC, TAATCTG, AGC, CTGCC, TAGCCCTA, ATCTG}. [Figure: the seven input strings aligned at their positions under the reconstructed sequence AGCCTGCCTAGCCCTAATCTG]
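A minimal sketch of how this example could be reconstructed, assuming error-free fragments that come from exactly two cuttings of the same sequence. It is a plain backtracking search, not necessarily the algorithm presented in this talk, and the name `assemble` is our own. The search grows two chains of fragments and always extends the shorter chain with a fragment that agrees with the part of the longer chain sticking out beyond it.

```python
def assemble(fragments):
    """Search for two fragment chains that spell the same string; return it or None."""
    used = [False] * len(fragments)

    def extend(longer, shorter):
        # Invariant: `shorter` is a prefix of `longer`.
        if all(used):
            return longer if longer == shorter else None
        overhang = longer[len(shorter):]
        for i, frag in enumerate(fragments):
            if used[i]:
                continue
            # The new fragment must agree with the overhang wherever they overlap.
            if not (frag.startswith(overhang) or overhang.startswith(frag)):
                continue
            used[i] = True
            cand = shorter + frag
            if len(cand) > len(longer):
                result = extend(cand, longer)
            else:
                result = extend(longer, cand)
            used[i] = False  # undo the choice before trying the next fragment
            if result is not None:
                return result
        return None

    return extend("", "")


S = ["AGCCT", "GCCTAGCCC", "TAATCTG", "AGC", "CTGCC", "TAGCCCTA", "ATCTG"]
print(assemble(S))
```

Merging greedily by largest overlap would not be safe on this input: AGCCT and GCCTAGCCC share the 4-character overlap GCCT even though they are adjacent, non-overlapping pieces of the first cutting.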
6 Experimental Results [Table: columns are LOCUS in NCBI, length of the original DNA sequence (base pairs), number of input strings, and time (sec.); rows are for the loci NC_, NC_, NC_, BX, AP, AP, AP, BX]
7 2-Matching Double Digest Problem Given three sets of distances: A = {2, 9, 5}, B = {7, 3, 6}, C = {1, 4, 2, 7, 2}. Our job is to find the following solution: [Figure: the blocks of A, B, and C arranged so that their cut sites line up], with indices i_1 = 1, 2, …, p; i_2 = 1, 2, …, q; i_3 = 1, 2, …, r.
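To make the problem statement concrete, here is a brute-force sketch that tries every ordering of A and B and checks whether the merged cut sites produce the block lengths in C. It assumes the usual reading of the double digest problem (C is the multiset of blocks obtained when both sets of cuts are applied together). It is exponential and only meant for tiny instances, not the algorithm of this talk; the name `double_digest` is our own.

```python
from itertools import accumulate, permutations


def double_digest(A, B, C):
    """Return (order of A, order of B, resulting order of C), or None if no fit exists."""
    total = sum(A)
    if sum(B) != total or sum(C) != total:
        return None
    target = sorted(C)
    for pa in permutations(A):
        cuts_a = set(accumulate(pa))  # cut sites of A, including the right end
        for pb in permutations(B):
            cuts = sorted(cuts_a | set(accumulate(pb)))
            # Block lengths between consecutive merged cut sites, starting at 0.
            blocks = [hi - lo for lo, hi in zip([0] + cuts, cuts)]
            if sorted(blocks) == target:
                return list(pa), list(pb), blocks
    return None


print(double_digest([2, 9, 5], [7, 3, 6], [1, 4, 2, 7, 2]))
```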
8 Basic Ideas of Our Algorithm There are two blocks in A or B whose lengths equal the lengths of the starting and ending blocks in C. For every two adjacent blocks in C, there is a block in either A or B whose length equals the sum of the lengths of those two adjacent blocks. [Figure: aligned blocks of A, B, and C]
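The two properties above can be checked mechanically for a candidate ordering of C. The sketch below is a necessary-condition test only: it ignores how many times each length occurs, and the function name is our own. The instance is the one from the problem statement, with the ordering 2, 1, 4, 2, 7 of C.

```python
def satisfies_block_properties(A, B, c_order):
    """Check the end-block and adjacent-sum properties for one ordering of C."""
    lengths = set(A) | set(B)
    # The first and last blocks of C must each match some block of A or B.
    ends_ok = c_order[0] in lengths and c_order[-1] in lengths
    # Every pair of adjacent blocks of C must sum to some block of A or B.
    pairs_ok = all(x + y in lengths for x, y in zip(c_order, c_order[1:]))
    return ends_ok and pairs_ok


A, B = [2, 9, 5], [7, 3, 6]
print(satisfies_block_properties(A, B, [2, 1, 4, 2, 7]))  # True
print(satisfies_block_properties(A, B, [1, 4, 2, 7, 2]))  # False: no block of length 1
```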
9 Example Input: A = {3, 4, 7}, B = {6, 3, 5}, C = {1, 2, 3, 4, 4}. [Figure: aligned blocks of A, B, and C]
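One way to work through this example is sketched below. It assumes that the cut sites of A and B strictly alternate along the sequence, which is our reading of the basic ideas and not something the slides state explicitly. Under that assumption an ordering of C fixes the blocks of both A and B, so it is enough to try orderings of C. The function names are our own, and the search is brute force.

```python
from itertools import accumulate, permutations


def blocks_from_alternating_cuts(c_order):
    """Split the cut sites of C alternately between two digests; return their blocks."""
    total = sum(c_order)
    cuts = list(accumulate(c_order))[:-1]  # interior cut sites only

    def blocks(sites):
        bounds = [0] + sites + [total]
        return [hi - lo for lo, hi in zip(bounds, bounds[1:])]

    return blocks(cuts[0::2]), blocks(cuts[1::2])


def solve_alternating(A, B, C):
    """Return (order of A, order of B, order of C), or None if no ordering fits."""
    a, b = sorted(A), sorted(B)
    for c_order in permutations(C):
        x, y = blocks_from_alternating_cuts(list(c_order))
        if sorted(x) == a and sorted(y) == b:
            return x, y, list(c_order)
        if sorted(x) == b and sorted(y) == a:
            return y, x, list(c_order)
    return None


print(solve_alternating([3, 4, 7], [6, 3, 5], [1, 2, 3, 4, 4]))
```

For instance, the ordering C = (4, 1, 2, 4, 3) has cut sites 4, 5, 7, 11; giving 4 and 7 to A and 5 and 11 to B yields A = (4, 3, 7) and B = (5, 6, 3).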
10 Experimental Results We designed a visualization tool to display our experimental results.