
Suffix Tree of Alignment: An Efficient Index for Similar Data
JOONG CHAE NA1, HEEJIN PARK2, MAXIME CROCHEMORE3, JAN HOLUB4, COSTAS S. ILIOPOULOS3, LAURENT MOUCHARD5, AND KUNSOO PARK6
Presented by Ramin Fallahzadeh

Problem definition
Indexing multiple data sets that are very similar:
◦ Modified versions of existing data (e.g., a new version of source code)
◦ Today's backup vs. yesterday's backup
◦ An individual's genome vs. the human reference genome (99% identical)

Storing vs. indexing data
Storing data:
◦ Using alignment to store only the differences
◦ Data compression schemes
Indexing data:
◦ Example: search engines
◦ Suffix tree: linear time and space complexity
◦ One solution: constructing a generalized suffix tree

Generalized suffix tree
GST(A,B):
◦ |A|+|B| leaves
◦ O(|A|+|B|) construction time
◦ Drawbacks:
  ◦ Some suffixes may be stored twice
    A = aaatcaaa
    B = aaatgaaa
    {aaa, aa, a} are stored twice in the GST
  ◦ Two similar suffixes, aaatcaaa and aaatgaaa, are stored in distinct leaves even though they are very similar
  ◦ Therefore, for similar data, most of the leaves are redundant!
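The duplicated suffixes in this example can be checked with a few lines of Python. This is only a rough sketch that uses plain sets of strings rather than an actual suffix tree; the helper name `suffixes` is made up for illustration.

```python
def suffixes(s):
    """Return the set of all non-empty suffixes of s."""
    return {s[i:] for i in range(len(s))}

A = "aaatcaaa"
B = "aaatgaaa"

# Suffixes that occur in both strings would be stored once per string.
shared = suffixes(A) & suffixes(B)
total_leaves = len(A) + len(B)  # one leaf per suffix of A and of B

print(sorted(shared, key=len))
print(f"{len(shared)} of {total_leaves} suffixes are exact duplicates")
```

Exact duplicates are only part of the redundancy: the remaining suffixes of A and B differ in a single character, which is the similarity the slides argue an index should exploit.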

Contribution
Neither the suffix tree nor any variant of the suffix tree uses this similarity or alignment to index similar data efficiently!

Alignment

The given alignment is not required to be optimal. If the time to compute an alignment is important, a near-optimal alignment can be used instead of the optimal one. Since the given strings are assumed to be highly similar, a near-optimal alignment can be computed quickly from exact string matching instead of time-consuming dynamic programming.
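As a rough illustration of getting an alignment from exact matching alone, the sketch below trims the longest common prefix and the longest common suffix of two strings and treats whatever remains in the middle as the differing region. This is an assumed stand-in, not the authors' procedure: it handles only a single differing region, and the function name `simple_alignment` is hypothetical. The strings are the ones used in the example slides.

```python
def simple_alignment(a, b):
    """Split a and b as prefix + mid_a + suffix and prefix + mid_b + suffix."""
    # Longest common prefix, found by direct character comparison.
    p = 0
    while p < min(len(a), len(b)) and a[p] == b[p]:
        p += 1
    # Longest common suffix that does not overlap the prefix in either string.
    s = 0
    limit = min(len(a), len(b)) - p
    while s < limit and a[len(a) - 1 - s] == b[len(b) - 1 - s]:
        s += 1
    return a[:p], a[p:len(a) - s], b[p:len(b) - s], a[len(a) - s:]

A = "aaabaaabbaaba#"
B = "aaabaabaabbaba#"
prefix, mid_a, mid_b, suffix = simple_alignment(A, B)
print(f"{prefix}({mid_a}/{mid_b}){suffix}")  # prints aaabaa(abba/baabb)aba#
```

Both scans run in time linear in the string lengths, whereas a dynamic-programming alignment takes time proportional to the product of the lengths.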

Naïve approach
Constructing the generalized suffix tree and deleting the unnecessary leaves is not time- or space-efficient!
The proposed algorithm is incremental, i.e., it constructs the suffix tree of A and then transforms it into the suffix tree of the alignment.
The algorithm uses only constant-size extra working space beyond the suffix tree itself, so it is more space-efficient than the naïve method.

Simple alignment α

General Alignment

Definitions

Example: generalized suffix tree
A = aaabaaabbaaba#
B = aaabaabaabbaba#

Example: suffix tree of alignment
A = aaabaaabbaaba#
B = aaabaabaabbaba#
Alignment: aaabaa(abba/baabb)aba#
Figure legend: Type-1, Type-2, Type-3, Type-4

Construction

Example: ST(A)
A = aaabaaabbaaba#

Example: ST’(A) when step A is applied
A = aaabaaabbaaba#
B = aaabaabaabbaba#

Example: suffix tree of alignment
A = aaabaaabbaaba#
B = aaabaabaabbaba#
Alignment: aaabaa(abba/baabb)aba#

Suffix Tree of General Alignments

Construction

Space Complexity

Time complexity

Thank you for your attention. Any questions?
