
1 Suffix Tree of Alignment: An Efficient Index for Similar Data JOONG CHAE NA1, HEEJIN PARK2, MAXIME CROCHEMORE3, JAN HOLUB4, COSTAS S. ILIOPOULOS3, LAURENT MOUCHARD5, AND KUNSOO PARK6 Presented by Ramin Fallahzadeh

2 Problem definition
Indexing multiple data sets that are very similar:
◦ Modified versions of existing data (e.g., a new version of source code)
◦ Today's backup vs. yesterday's backup
◦ An individual's genome vs. the human reference genome (99% identical)

3 Storing vs. indexing data
Storing data:
◦ Using alignment to store only the differences
◦ Data compression schemes
Indexing data:
◦ Example: search engines
◦ Suffix tree: linear time and space complexity
◦ One solution: constructing a generalized suffix tree
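To make "indexing" concrete, here is a minimal sketch of substring search over one of the example strings used later in the deck. It deliberately swaps in a naively built suffix array (sorted suffix start positions) for the suffix tree, because that fits in a few lines; it is not linear-time and is not the structure the talk proposes. The names `build_index` and `find` are hypothetical.

```python
import bisect

def build_index(text):
    # Naive suffix array: start positions sorted by the suffix they begin.
    # This is O(n^2 log n) in the worst case, unlike a linear-time suffix tree.
    return sorted(range(len(text)), key=lambda i: text[i:])

def find(text, index, pattern):
    # Materialize the sorted suffixes for simplicity (illustration only).
    suffixes = [text[i:] for i in index]
    # All suffixes that start with `pattern` form one contiguous run.
    pos = bisect.bisect_left(suffixes, pattern)
    hits = []
    while pos < len(suffixes) and suffixes[pos].startswith(pattern):
        hits.append(index[pos])
        pos += 1
    return sorted(hits)

text = "aaabaaabbaaba#"
index = build_index(text)
print(find(text, index, "aab"))
```

Once the index is built, each query touches only the suffixes that match the pattern rather than scanning the whole text, which is the point of indexing as opposed to merely storing the data.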

4 Generalized suffix tree
GST(A,B):
◦ |A|+|B| leaves
◦ O(|A|+|B|) construction time
◦ Drawbacks:
◦ Some suffixes may be stored twice. For A = aaatcaaa and B = aaatgaaa, the common suffixes {aaa, aa, a} are stored twice in the GST.
◦ The two suffixes aaatcaaa and aaatgaaa are stored in distinct leaves even though they are very similar.
◦ Therefore, for similar data, most of the leaves are redundant!
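The redundancy can be checked on the slide's own strings with a small sketch. It builds an uncompressed suffix trie (a quadratic-size stand-in for a real generalized suffix tree, chosen only for brevity) and counts its leaves. The distinct terminators `#` and `$` are an assumption made here so that every suffix ends at its own leaf; the function names are hypothetical.

```python
def suffix_trie(strings):
    # Insert every suffix of every string into a nested-dict trie.
    root = {}
    for s in strings:
        for i in range(len(s)):
            node = root
            for ch in s[i:]:
                node = node.setdefault(ch, {})
    return root

def count_leaves(node):
    # A node with no children is a leaf.
    if not node:
        return 1
    return sum(count_leaves(child) for child in node.values())

def common_suffixes(a, b):
    # Suffixes shared by both strings, shortest first.
    shared = []
    k = 1
    while k <= min(len(a), len(b)) and a[-k:] == b[-k:]:
        shared.append(a[-k:])
        k += 1
    return shared

A, B = "aaatcaaa", "aaatgaaa"
trie = suffix_trie([A + "#", B + "$"])
print(count_leaves(trie))      # one leaf per suffix, terminators included
print(common_suffixes(A, B))   # suffixes that end up stored once per string
```

With the terminators counted as part of each string, the trie has one leaf per suffix of both strings, and the shared suffixes each occupy two of those leaves.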

5 Contribution Neither the suffix tree nor any variant of the suffix tree uses this similarity or alignment to index similar data efficiently!

6 Alignment

7 The given alignment is not required to be optimal. If the time to compute an alignment matters, a near-optimal alignment can be used instead of the optimal one. Since the given strings are assumed to be highly similar, a near-optimal alignment can be computed quickly using exact string matching instead of time-consuming dynamic programming.
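As a rough illustration of computing an alignment by exact matching rather than dynamic programming, the sketch below takes the longest common prefix and the longest non-overlapping common suffix of two strings and treats whatever remains in the middle as the differing parts. This is an assumed, simplified procedure for the single-difference case, not necessarily the authors' method; the function name `simple_alignment` and its return layout are hypothetical.

```python
def simple_alignment(a, b):
    """Split a and b as a = prefix + mid_a + suffix, b = prefix + mid_b + suffix."""
    n = min(len(a), len(b))

    # Longest common prefix by direct character comparison.
    p = 0
    while p < n and a[p] == b[p]:
        p += 1

    # Longest common suffix that does not overlap the common prefix.
    s = 0
    while s < n - p and a[len(a) - 1 - s] == b[len(b) - 1 - s]:
        s += 1

    prefix = a[:p]
    suffix = a[len(a) - s:]
    mid_a = a[p:len(a) - s]
    mid_b = b[p:len(b) - s]
    return prefix, mid_a, mid_b, suffix

A = "aaabaaabbaaba#"
B = "aaabaabaabbaba#"
prefix, mid_a, mid_b, suffix = simple_alignment(A, B)
print(f"{prefix}({mid_a}/{mid_b}){suffix}")
```

On the example strings used in the deck, this reproduces the alignment shown on the example slides. The same split also covers the "store only the differences" idea: B can be rebuilt from A plus the two cut positions and the differing middle part.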

8 Naïve approach: constructing the generalized suffix tree and then deleting the unnecessary leaves is not time- or space-efficient! The proposed algorithm is incremental, i.e., it constructs the suffix tree of A and then transforms it into the suffix tree of the alignment. Apart from the suffix tree itself, the algorithm uses only constant-size extra working space, so it is more space-efficient than the naïve method.

9 Simple alignment α

10 General Alignment

11 Definitions

12

13

14 Example Generalized suffix tree: A = aaabaaabbaaba# B = aaabaabaabbaba#

15 Example Suffix tree of alignment A = aaabaaabbaaba# B = aaabaabaabbaba# Alignment: aaabaa(abba/baabb)aba# [Figure legend: Type-1, Type-2, Type-3, Type-4]

16 Construction

17

18 Example ST(A) A = aaabaaabbaaba#

19 Example ST’(A) when step A is applied: A = aaabaaabbaaba# B = aaabaabaabbaba#

20 Example Suffix tree of alignment A = aaabaaabbaaba# B = aaabaabaabbaba# Alignment: aaabaa(abba/baabb)aba#

21 Suffix Tree of General Alignments

22 Construction

23 Space Complexity

24 Time complexity

25 Thank you for your attention. Any questions?

