Published by Brandon Lloyd. Modified over 9 years ago.
1
Suffix Tree of Alignment: An Efficient Index for Similar Data
JOONG CHAE NA¹, HEEJIN PARK², MAXIME CROCHEMORE³, JAN HOLUB⁴, COSTAS S. ILIOPOULOS³, LAURENT MOUCHARD⁵, AND KUNSOO PARK⁶
Presented by Ramin Fallahzadeh
2
Problem definition
Indexing multiple data sets that are very similar to one another, e.g.:
◦ Modified versions of existing data (e.g., a new version of some source code)
◦ Today’s backup vs. yesterday’s backup
◦ An individual’s genome vs. the human reference genome (99% identical)
3
Storing vs. indexing data
Storing data:
◦ Using an alignment to store only the differences
◦ Data compression schemes
Indexing data:
◦ Example: search engines
◦ Suffix tree: linear construction time and space
◦ One solution for multiple strings: construct a generalized suffix tree
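To make "indexing" concrete, here is a minimal Python sketch of a generalized index over several strings. It swaps in a naive generalized suffix array (every suffix of every string, sorted) for the suffix tree, so construction is roughly O(n² log n) rather than linear; the function names are hypothetical. Pattern search works because all suffixes that start with the pattern form one contiguous run in sorted order.

```python
from bisect import bisect_left


def build_generalized_suffix_index(strings):
    """Hypothetical helper: naive generalized suffix array.

    Returns (suffix, string_id, start) for every suffix of every string,
    sorted lexicographically by suffix. Copying and comparing whole
    suffixes makes this super-linear; a real suffix tree is linear.
    """
    entries = []
    for sid, s in enumerate(strings):
        for start in range(len(s)):
            entries.append((s[start:], sid, start))
    entries.sort()
    return entries


def search(index, pattern):
    """Return sorted (string_id, start) pairs where pattern occurs."""
    # (pattern,) sorts before any entry whose suffix is >= pattern,
    # so bisect_left lands on the first suffix that could start with it.
    lo = bisect_left(index, (pattern,))
    hits = []
    for suffix, sid, start in index[lo:]:
        if not suffix.startswith(pattern):
            break  # sorted order: no later suffix can match either
        hits.append((sid, start))
    return sorted(hits)


index = build_generalized_suffix_index(["aaatcaaa", "aaatgaaa"])
print(search(index, "aaa"))  # [(0, 0), (0, 5), (1, 0), (1, 5)]
```

Each occurrence is reported as (string id, start position), so one query answers for both strings at once, which is the service a generalized suffix tree provides.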
4
Generalized suffix tree GST(A, B):
◦ |A| + |B| leaves
◦ O(|A| + |B|) construction time
◦ Drawbacks:
◦ Common suffixes are stored twice. For A = aaatcaaa and B = aaatgaaa, the suffixes {aaa, aa, a} are stored twice in the GST.
◦ The similar suffixes aaatcaaa and aaatgaaa are stored in distinct leaves even though they differ in only one character.
◦ Therefore, for similar data, most of the leaves are redundant!
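The redundancy can be checked by listing suffixes directly. This Python sketch (hypothetical helper names; it enumerates suffixes rather than building a tree) counts the |A| + |B| leaves a GST would have, assuming each string gets its own end marker so every suffix ends at its own leaf, and lists the suffixes of A that are also suffixes of B.

```python
def suffixes(s):
    """All non-empty suffixes of s, longest first."""
    return [s[i:] for i in range(len(s))]


def gst_leaf_count(a, b):
    # With a distinct end marker per string, GST(A, B) has exactly one
    # leaf per suffix of each string: |A| + |B| leaves in total.
    return len(suffixes(a)) + len(suffixes(b))


def shared_suffixes(a, b):
    """Suffixes of a that are also suffixes of b (stored twice in a GST)."""
    suffixes_of_b = set(suffixes(b))
    return [x for x in suffixes(a) if x in suffixes_of_b]


A, B = "aaatcaaa", "aaatgaaa"
print(gst_leaf_count(A, B))   # 16
print(shared_suffixes(A, B))  # ['aaa', 'aa', 'a']
```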
5
Contribution
Neither the suffix tree nor any of its variants exploits this similarity (or an alignment) to index similar data efficiently!
6
Alignment
7
The given alignment is not required to be optimal: if the time to compute an alignment matters, a near-optimal alignment can be used instead of an optimal one. Since the given strings are assumed to be highly similar, a near-optimal alignment can be computed quickly from exact string matching instead of time-consuming dynamic programming.
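One cheap way to get an alignment from exact matching alone is sketched below in Python, assuming the alignment has the single-difference shape used in the examples later in the deck (common part, then a differing pair, then a common part): take the longest common prefix and the longest non-overlapping common suffix. It runs in O(|A| + |B|) time, but it is only a heuristic, not guaranteed near-optimal for every input, and not claimed to be the authors' procedure; the function names are hypothetical.

```python
def simple_alignment(a, b):
    """Split a = p + d1 + s and b = p + d2 + s using exact matching only.

    p is the longest common prefix; s is the longest common suffix that
    does not overlap p in either string. No dynamic programming is used.
    """
    n = min(len(a), len(b))
    p = 0
    while p < n and a[p] == b[p]:  # extend the common prefix
        p += 1
    s = 0
    # Extend the common suffix, but never into the prefix already claimed.
    while s < n - p and a[len(a) - 1 - s] == b[len(b) - 1 - s]:
        s += 1
    return a[:p], a[p:len(a) - s], b[p:len(b) - s], a[len(a) - s:]


def format_alignment(prefix, d1, d2, suffix):
    """Render the alignment in the slides' notation prefix(d1/d2)suffix."""
    return f"{prefix}({d1}/{d2}){suffix}"


A, B = "aaabaaabbaaba#", "aaabaabaabbaba#"
print(format_alignment(*simple_alignment(A, B)))  # aaabaa(abba/baabb)aba#
```

On the deck's example strings this reproduces the alignment shown on the example slides.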
8
Naïve approach
Construct the generalized suffix tree and then delete the unnecessary leaves. This is neither time- nor space-efficient!
The proposed algorithm is incremental, i.e., we construct the suffix tree of A and then transform it into the suffix tree of the alignment.
Apart from the suffix tree itself, the algorithm uses only constant-size extra working space, so it is more space-efficient than the naïve method.
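The naïve approach can be sketched in Python as "materialize every leaf, then delete". The criterion for an unnecessary leaf is an assumption here, since the slides do not define it: a leaf for a suffix of B is dropped only when that suffix is character-for-character identical to a suffix of A (for the running example, the suffixes of B inside the common tail aba#). This sketch does not model the four suffix types of the actual construction; it only shows the cost structure: all |A| + |B| leaves, plus a set of all of A's suffixes, exist before anything is removed. Function names are hypothetical.

```python
def gst_leaves(a, b):
    """Every leaf of GST(A, B), as (string_id, start): |A| + |B| of them."""
    return [(0, i) for i in range(len(a))] + [(1, j) for j in range(len(b))]


def naive_prune(a, b):
    """Naive approach: build all leaves first, then delete redundant ones.

    Simplifying assumption: a leaf of B is 'unnecessary' when its suffix is
    identical to some suffix of A, so A's leaf can represent both.
    """
    suffixes_of_a = {a[i:] for i in range(len(a))}
    return [(sid, pos) for sid, pos in gst_leaves(a, b)
            if sid == 0 or b[pos:] not in suffixes_of_a]


A, B = "aaabaaabbaaba#", "aaabaabaabbaba#"
print(len(gst_leaves(A, B)), len(naive_prune(A, B)))  # 29 25
```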
9
Simple alignment α
10
General Alignment
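As an illustration only, assume a general alignment alternates common chunks with pairs of differing chunks, generalizing the single-difference form common(δ1/δ2)common to several differences. Such an alignment can again be obtained from exact matching. This Python sketch uses the standard library's difflib.SequenceMatcher, whose matching blocks come from repeatedly finding longest common substrings; it is a swapped-in heuristic, not the authors' method, and general_alignment and format_general are hypothetical names.

```python
from difflib import SequenceMatcher


def general_alignment(a, b):
    """Split a and b into common chunks (str) and differing pairs (tuple).

    Joining the str parts with the first element of each pair gives a;
    using the second element of each pair instead gives b.
    """
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    parts = []
    i = j = 0  # end of the previous matching block in a and in b
    for ai, bj, size in matcher.get_matching_blocks():
        if ai > i or bj > j:
            parts.append((a[i:ai], b[j:bj]))  # differing region (one side may be empty)
        if size:
            parts.append(a[ai:ai + size])  # common region
        i, j = ai + size, bj + size
    return parts


def format_general(parts):
    """Render parts as common(x/y)common(x/y)..., like the slides' notation."""
    return "".join(p if isinstance(p, str) else f"({p[0]}/{p[1]})" for p in parts)


print(format_general(general_alignment("abcXdefYgh", "abcZdefWgh")))
```

The last matching block returned by get_matching_blocks is always a zero-length sentinel at (len(a), len(b)), which is what lets the loop emit a trailing differing pair without special-casing it.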
11
Definitions
14
Example: generalized suffix tree
A = aaabaaabbaaba#
B = aaabaabaabbaba#
15
Example: suffix tree of alignment
A = aaabaaabbaaba#
B = aaabaabaabbaba#
Alignment: aaabaa(abba/baabb)aba#
[Figure: suffix tree of the alignment, with suffixes labeled Type-1, Type-2, Type-3, and Type-4]
16
Construction
18
Example: ST(A)
A = aaabaaabbaaba#
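For experimenting with small examples such as ST(A), a compacted suffix tree can be built naïvely by inserting one suffix at a time and splitting an edge at the first mismatch. This Python sketch is O(n²) and is neither the linear-time construction a real implementation would use nor the paper's incremental algorithm; the class and function names are hypothetical, and the text is assumed to end with a unique terminator such as #.

```python
class Node:
    """Suffix tree node; edges are stored on the parent, keyed by first char."""

    def __init__(self, leaf_start=None):
        self.children = {}  # first char -> (edge label, child Node)
        self.leaf_start = leaf_start  # suffix start position for leaves


def build_suffix_tree(text):
    """Naive O(n^2) compacted suffix tree: insert suffixes one by one."""
    assert text and text[-1] not in text[:-1], "needs a unique terminator"
    root = Node()
    for start in range(len(text)):
        node, rest = root, text[start:]
        while True:
            if rest[0] not in node.children:
                node.children[rest[0]] = (rest, Node(start))  # new leaf edge
                break
            label, child = node.children[rest[0]]
            k = 0  # length of the common prefix of the edge label and rest
            while k < len(label) and k < len(rest) and label[k] == rest[k]:
                k += 1
            if k == len(label):
                node, rest = child, rest[k:]  # edge fully matched: descend
                continue
            # Mismatch inside the edge: split it with a new internal node.
            mid = Node()
            mid.children[label[k]] = (label[k:], child)
            mid.children[rest[k]] = (rest[k:], Node(start))
            node.children[rest[0]] = (label[:k], mid)
            break
    return root


def collect_leaves(node):
    """Start positions of all suffixes in the subtree rooted at node."""
    if node.leaf_start is not None:
        return [node.leaf_start]
    return [p for _, child in node.children.values() for p in collect_leaves(child)]


def occurrences(root, pattern):
    """Sorted start positions of pattern, found by walking down from the root."""
    node, rest = root, pattern
    while rest:
        if rest[0] not in node.children:
            return []
        label, child = node.children[rest[0]]
        k = min(len(label), len(rest))
        if label[:k] != rest[:k]:
            return []
        node, rest = child, rest[k:]
    return sorted(collect_leaves(node))


root = build_suffix_tree("aaabaaabbaaba#")
print(len(collect_leaves(root)))  # 14
print(occurrences(root, "aab"))   # [1, 5, 9]
```

Because # occurs only at the end, no suffix is a prefix of another, so every suffix ends at its own leaf and the 14 leaves correspond one-to-one to the 14 suffixes of A.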
19
Example: ST’(A) when step A is applied
A = aaabaaabbaaba#
B = aaabaabaabbaba#
20
Example: suffix tree of alignment
A = aaabaaabbaaba#
B = aaabaabaabbaba#
Alignment: aaabaa(abba/baabb)aba#
21
Suffix Tree of General Alignments
22
Construction
23
Space Complexity
24
Time complexity
25
Thank you for your attention Any questions?