Two equivalent problems

Presentation on theme: "Two equivalent problems"— Presentation transcript:

1 Two equivalent problems
Prof. Paolo Ferragina, Algoritmi per "Information Retrieval"
RMQ on integer arrays ⇔ RMQ on ±1 integer arrays, through these steps: LCA on a Cartesian Tree; Euler tour of the Cartesian tree; RMQ on the array of node-levels.
LCA on general trees ⇔ RMQ on ±1 integer arrays, through these steps: Euler tour of the tree; RMQ on the array of node-levels.

2 Search for k-mismatches
T = CCGTACGATCAGTACAGTACAGTACTTTTTTAAACCGGAGACTACA
P = CCGAACTATC
Problem: find the longest match between P[i,…] and T[j,…]. If each such query takes O(1) time, then checking one alignment of P against T takes O(k) time.
Data structure: concatenate P and T into a string X = T$P, and construct a data structure on X that quickly retrieves the longest match between any pair of suffixes of X (an LCA or LCP query).
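The O(k) bound per alignment can be illustrated with a minimal sketch. It assumes a helper `longest_match` (a hypothetical name) that is implemented here by plain character comparison; in the slides this role is played by an O(1) LCA or LCP query on X = T$P. Each alignment issues at most k+1 longest-match queries, jumping over one mismatch after each.

```python
def longest_match(p, t, i, j):
    """Length of the longest common prefix of p[i:] and t[j:].
    Naive stand-in (assumption) for the O(1) LCA/LCP query of the slides."""
    length = 0
    while i + length < len(p) and j + length < len(t) and p[i + length] == t[j + length]:
        length += 1
    return length


def k_mismatch_positions(t, p, k):
    """Return every start position in t where p occurs with at most k mismatches."""
    m = len(p)
    hits = []
    for start in range(len(t) - m + 1):
        i = 0            # current offset inside p
        mismatches = 0
        while i < m:
            # Jump over the maximal matching stretch with one query.
            i += longest_match(p, t, i, start + i)
            if i < m:    # stopped on a mismatch at offset i
                mismatches += 1
                if mismatches > k:
                    break
                i += 1   # skip the mismatching character
        if mismatches <= k:
            hits.append(start)
    return hits
```

With the T and P of this slide, position 0 is reported for k = 2, since T[0,9] = CCGTACGATC differs from P = CCGAACTATC in exactly two characters.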

3 Suffix Tree & LCA
T#P = mississippi#si$
[Figure: suffix tree of T#P, with leaves labelled by the starting positions of the suffixes. Longest match(3,13) is the string spelled by the path from the root to the LCA of leaves 3 and 13.]

4 Suffix Array & LCP ⇒ RMQ
T#P = mississippi#si
[Figure: suffix array SA and Lcp array of T#P. Longest match(3,13) is the minimum Lcp value between the SA positions of suffixes 3 and 13, which is an RMQ over the Lcp array.]
Surprisingly, also LCA ⇒ RMQ.
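The reduction from longest match to RMQ over the Lcp array can be sketched as follows. This is an illustration under simplifying assumptions: the suffix array is built by naive sorting, the range minimum is a linear scan, and the names `build_index` and `longest_match` are hypothetical. Positions are 0-based, so the pair (3,13) of the slide becomes (2,12).

```python
def build_index(s):
    """Naive suffix array, Lcp array and inverse suffix array (rank) of s."""
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    rank = [0] * len(s)
    for r, start in enumerate(sa):
        rank[start] = r
    lcp = [0] * len(s)   # lcp[r] = common prefix length of suffixes sa[r-1] and sa[r]
    for r in range(1, len(s)):
        a, b = sa[r - 1], sa[r]
        length = 0
        while a + length < len(s) and b + length < len(s) and s[a + length] == s[b + length]:
            length += 1
        lcp[r] = length
    return sa, lcp, rank


def longest_match(s, lcp, rank, a, b):
    """Longest common prefix of suffixes a and b, as a range minimum over lcp."""
    if a == b:
        return len(s) - a
    lo, hi = sorted((rank[a], rank[b]))
    return min(lcp[lo + 1:hi + 1])   # linear scan stands in for an O(1) RMQ


s = "mississippi#si$"
sa, lcp, rank = build_index(s)
print(longest_match(s, lcp, rank, 2, 12))   # suffixes "ssissippi#si$" and "si$" share "s": prints 1
```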

5 The RMQ problem
RMQ_A(i,j) returns the index of the smallest element in the subarray A[i..j].

A[0] A[1] A[2] A[3] A[4] A[5] A[6] A[7] A[8] A[9]
  10    2   12    1   12   13   21   15   14   10

Example: RMQ(2,7) = 3.
Trivial solution: precompute RMQ for every pair of indices. This takes Θ(n²) space and O(1) query time.
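The trivial solution can be sketched in a few lines. The function name `precompute_all_pairs` is an assumption made for illustration; ties are resolved toward the leftmost minimum, which the slides do not specify.

```python
def precompute_all_pairs(a):
    """table[i][j] = index of the smallest element of a[i..j], for all i <= j.
    Uses Theta(n^2) space; each entry extends the previous one in O(1) time."""
    n = len(a)
    table = [[0] * n for _ in range(n)]
    for i in range(n):
        table[i][i] = i
        for j in range(i + 1, n):
            prev = table[i][j - 1]
            table[i][j] = prev if a[prev] <= a[j] else j
    return table


A = [10, 2, 12, 1, 12, 13, 21, 15, 14, 10]
table = precompute_all_pairs(A)
print(table[2][7])   # RMQ(2,7): prints 3
```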

6 RMQ on a general array

A[0] A[1] A[2] A[3] A[4] A[5] A[6] A[7] A[8] A[9]
  10   25   22   34    7   19    9   12   26   16

[Figure: Cartesian tree of A, with nodes labelled by array index. The root is node 4, the position of the minimum value 7.]

7 RMQ(i,j) = LCA(i,j) on Cartesian trees

A[0] A[1] A[2] A[3] A[4] A[5] A[6] A[7] A[8] A[9]
  10   25   22   34    7   19    9   12   26   16

[Figure: Cartesian tree of A, with nodes labelled by array index.]
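The identity RMQ(i,j) = LCA(i,j) can be checked on the array of this slide with a small sketch. It assumes the usual stack-based linear-time construction of the Cartesian tree (the slides do not say how the tree is built) and computes the LCA naively by walking parent pointers; the function names are hypothetical.

```python
def cartesian_tree(a):
    """Return parent[], where parent[i] is the parent of node i (-1 for the root).
    Stack-based construction: the stack holds the rightmost path of the tree."""
    parent = [-1] * len(a)
    stack = []
    for i in range(len(a)):
        last = -1
        while stack and a[stack[-1]] > a[i]:
            last = stack.pop()
        if last != -1:
            parent[last] = i        # popped subtree becomes the left child of i
        if stack:
            parent[i] = stack[-1]   # i becomes the right child of the new stack top
        stack.append(i)
    return parent


def lca(parent, u, v):
    """Naive LCA: mark the ancestors of u, then climb from v."""
    ancestors = set()
    while u != -1:
        ancestors.add(u)
        u = parent[u]
    while v not in ancestors:
        v = parent[v]
    return v


A = [10, 25, 22, 34, 7, 19, 9, 12, 26, 16]
parent = cartesian_tree(A)
print(parent.index(-1))    # root of the tree: prints 4
print(lca(parent, 5, 9))   # RMQ(5,9): prints 6, since A[6] = 9 is the minimum of A[5..9]
```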

8 Generic RMQ ⇒ LCA ⇒ RMQ ±1
LCA(u,v) = shallowest node encountered between u and v during a depth-first search traversal (Euler tour) of T, that is, the node at the lowest level.
The array of node levels along the Euler tour is a ±1 array: consecutive entries differ by exactly 1.
[Figure: tree T, its Euler tour, and the corresponding array of node levels. Example: LCA_T(4,6) = 3.]
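The reduction from LCA to RMQ on a ±1 array can be sketched as follows. The tree used here is a small hypothetical example, not the one drawn on the slide, and the range minimum is a linear scan that stands in for the O(1) structure developed in the next slides.

```python
def euler_tour(children, root):
    """Return (euler, level, first): the nodes in Euler-tour order, their levels,
    and the first position of each node in the tour."""
    euler, level, first = [], [], {}

    def visit(u, depth):
        first.setdefault(u, len(euler))
        euler.append(u)
        level.append(depth)
        for c in children.get(u, []):
            visit(c, depth + 1)
            euler.append(u)       # come back to u after each child
            level.append(depth)

    visit(root, 0)
    return euler, level, first


def lca(euler, level, first, u, v):
    """LCA = node at the minimum level between the first occurrences of u and v."""
    lo, hi = sorted((first[u], first[v]))
    k = min(range(lo, hi + 1), key=level.__getitem__)   # linear-scan RMQ
    return euler[k]


# Hypothetical tree: 1 is the root, with children 2 and 3.
children = {1: [2, 3], 2: [4, 5], 3: [6]}
euler, level, first = euler_tour(children, 1)
print(level)                             # [0, 1, 2, 1, 2, 1, 0, 1, 2, 1, 0]
print(lca(euler, level, first, 4, 5))    # prints 2
print(lca(euler, level, first, 5, 6))    # prints 1
```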

9 We are left with “RMQ on ±1 array”
RMQ_A(i,j) returns the index of the smallest element in the subarray A[i..j].

A[0] A[1] A[2] A[3] A[4] A[5] A[6] A[7] A[8] A[9]
  10   11   12   11   12   13   14   15   14   13

Example: RMQ(2,7) = 3.
Recall the trivial solution: precompute RMQ for every pair of indices. This takes Θ(n²) space and O(1) query time.

10 Sparse Table
Preprocess subarrays of length 2^k, for every k = 0, 1, …, log n.
M(i,j) = index of the min value in A[i, i + 2^j − 1].

A[0] A[1] A[2] A[3] A[4] A[5] A[6] A[7] A[8] A[9]
  10   11   12   11   12   13   14   15   14   13

M(2,0) = 2, M(2,1) = 3, M(2,2) = 3, M(2,3) = 3
Total space is O(n log n). RMQ query?

11 Querying the Sparse Table
[Figure: the range A[i..j] is covered by two, possibly overlapping, windows of 2^k elements: one starting at i and one ending at j.]
Total space is O(n log n). RMQ query takes O(1) time.
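A minimal sketch of the Sparse Table and of its O(1) query follows. It assumes that k is the largest integer with 2^k ≤ j − i + 1, so that the two windows of 2^k elements cover A[i..j]; the table is stored as `table[j][i]` = M(i,j), and the function names are hypothetical.

```python
def build_sparse_table(a):
    """table[j][i] = M(i, j) = index of the min value in a[i .. i + 2^j - 1]."""
    n = len(a)
    table = [list(range(n))]            # windows of length 1
    j = 1
    while (1 << j) <= n:
        prev, half = table[j - 1], 1 << (j - 1)
        row = []
        for i in range(n - (1 << j) + 1):
            left, right = prev[i], prev[i + half]   # two halves of the window
            row.append(left if a[left] <= a[right] else right)
        table.append(row)
        j += 1
    return table


def rmq(a, table, i, j):
    """Index of the smallest element of a[i..j], using two overlapping windows."""
    k = (j - i + 1).bit_length() - 1    # largest k with 2^k <= j - i + 1
    left = table[k][i]                  # window starting at i
    right = table[k][j - (1 << k) + 1]  # window ending at j
    return left if a[left] <= a[right] else right


A = [10, 11, 12, 11, 12, 13, 14, 15, 14, 13]
table = build_sparse_table(A)
print(table[0][2], table[1][2], table[2][2], table[3][2])   # M(2,0..3): prints 2 3 3 3
print(rmq(A, table, 2, 7))                                  # prints 3
```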

12 Bucketing
A is partitioned into 2n/log n blocks.
A’[i] = min in the i-th block of A. B[i] is the position (index) of that min.
[Figure: array A split into blocks, with A’[0], …, A’[i], …, A’[2n/log n] and B[0], …, B[i], …, B[2n/log n] below it.]

13 Use the Bucketing
Preprocess A’ for RMQ using the Sparse Table. Space is (2n/log n) · log(2n/log n) = O(n). RMQ queries on A’ take O(1) time.
Preprocess every block of A for border RMQ (queries that start or end at a block boundary). Space is O(n), and a border RMQ takes O(1) time.
RMQ(i,j) takes O(1) time if i and j lie in distinct blocks.
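The combination of block minima and border RMQ can be sketched as below. This is an illustration under assumptions: the block size `b` is a free parameter, border RMQ is realized as prefix and suffix minima inside each block, and the sparse table over the blocks stores positions in A directly (playing the role of B). All names are hypothetical, and only queries with i and j in distinct blocks are handled.

```python
def preprocess(a, b):
    """Prefix/suffix minima per block (border RMQ) plus a sparse table over blocks."""
    n = len(a)
    pref = [0] * n   # pref[p] = index of the min of a[block start .. p]
    suff = [0] * n   # suff[p] = index of the min of a[p .. block end]
    for s in range(0, n, b):
        e = min(s + b, n)
        pref[s] = s
        for p in range(s + 1, e):
            pref[p] = pref[p - 1] if a[pref[p - 1]] <= a[p] else p
        suff[e - 1] = e - 1
        for p in range(e - 2, s - 1, -1):
            suff[p] = p if a[p] <= a[suff[p + 1]] else suff[p + 1]
    # Position of the minimum of every block (the array B of the slides).
    table = [[pref[min(s + b, n) - 1] for s in range(0, n, b)]]
    nblocks, j = len(table[0]), 1
    while (1 << j) <= nblocks:
        prev, half = table[-1], 1 << (j - 1)
        table.append([min(prev[i], prev[i + half], key=a.__getitem__)
                      for i in range(nblocks - (1 << j) + 1)])
        j += 1
    return pref, suff, table


def rmq_distinct_blocks(a, b, pref, suff, table, i, j):
    """Index of the min of a[i..j], assuming i and j lie in distinct blocks."""
    bi, bj = i // b, j // b
    assert bi < bj, "in-block queries need the table of the next slide"
    candidates = [suff[i]]                 # tail of the block containing i
    if bj - bi > 1:                        # whole blocks strictly in between
        lo, hi = bi + 1, bj - 1
        k = (hi - lo + 1).bit_length() - 1
        candidates.append(min(table[k][lo], table[k][hi - (1 << k) + 1],
                              key=a.__getitem__))
    candidates.append(pref[j])             # head of the block containing j
    return min(candidates, key=a.__getitem__)
```

For example, with A = [10, 11, 12, 11, 12, 13, 14, 15, 14, 13] and b = 3, the query (2,7) spans three blocks and returns index 3.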

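14 In-block RMQ over ±1 arrays
A block is determined, up to an additive offset, by its sequence of ±1 differences. Since A has 2n/log n blocks, each block has (log n)/2 elements, so there are at most 2^{(log n)/2 − 1} = O(√n) normalized blocks.
Set Table[BlockEnc, i, j] = RMQ(i,j). The table has O(√n · log² n) = o(n) entries.
[Figure: two blocks X and Y with the same sequence of +1/−1 differences, ΔX = ΔY, and hence the same RMQ answers.]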
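The idea of sharing one table among all blocks with the same ±1 pattern can be sketched as follows. The encoding of a block as a bitmask of its differences, and the names `block_encoding` and `build_in_block_table`, are assumptions made for illustration; the blocks X and Y below are hypothetical, not the ones of the figure.

```python
def block_encoding(block):
    """Encode a +-1 block as a bitmask: bit t is 1 iff block[t+1] - block[t] == +1."""
    enc = 0
    for t in range(len(block) - 1):
        if block[t + 1] - block[t] == 1:
            enc |= 1 << t
    return enc


def build_in_block_table(b):
    """table[enc, i, j] = RMQ(i, j) inside any block of b elements with encoding enc."""
    table = {}
    for enc in range(1 << (b - 1)):
        # Rebuild the normalized block: it starts at 0 and follows the encoded steps.
        vals = [0]
        for t in range(b - 1):
            vals.append(vals[-1] + (1 if (enc >> t) & 1 else -1))
        for i in range(b):
            best = i
            for j in range(i, b):
                if vals[j] < vals[best]:
                    best = j
                table[enc, i, j] = best
    return table


X = [3, 4, 5, 4]        # hypothetical blocks with the same differences +1, +1, -1
Y = [13, 14, 15, 14]
table = build_in_block_table(4)
print(block_encoding(X) == block_encoding(Y))   # prints True
print(table[block_encoding(X), 1, 3])           # RMQ(1,3) in both X and Y: prints 1
```

For b = 4 the table holds 2^3 encodings times 10 pairs (i, j) with i ≤ j, that is 80 entries, independently of how many blocks of A share each encoding.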

15 LZ-parsing (gzip)
T = mississippi#
Parsing: <m><i><s><si><ssip><pi>
[Figure: suffix tree of T, with leaves labelled by the starting positions of the suffixes.]

16 LZ-parsing (gzip)
T = mississippi#
Phrase <ssip>: take the longest repeated prefix of T[6,...] whose repeat is on the left of 6.
It is on the path to leaf 6. By maximality, check only nodes.
[Figure: suffix tree of T; the nodes on the path to leaf 6 have leftmost occurrence 3 < 6.]

17 LZ-parsing (gzip)
T = mississippi#
min-leaf ⇒ leftmost copy.
Precompute the min descending leaf at every node in O(n) time.
Parsing: scan T; visit ST and stop when min-leaf ≥ current pos.
Result: <m><i><s><si><ssip><pi>
[Figure: suffix tree of T, with every internal node annotated by its min-leaf.]
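The parsing rule can be reproduced with a short sketch. It deliberately replaces the suffix tree and its min-leaf annotation with a naive quadratic search for the longest copy that starts before the current position, so it illustrates the output of the parsing and not the O(n) method of the slides. The name `lz_parse` is hypothetical, and positions are 0-based.

```python
def lz_parse(t):
    """Split t into phrases: longest prefix of t[i:] that also starts at some
    earlier position j < i, followed by one extra character."""
    phrases = []
    i, n = 0, len(t)
    while i < n:
        best = 0
        for j in range(i):                      # candidate earlier copies
            length = 0
            while i + length < n and t[j + length] == t[i + length]:
                length += 1
            best = max(best, length)
        phrases.append(t[i:i + best + 1])       # copied part plus the next character
        i += best + 1
    return phrases


print(lz_parse("mississippi#"))
# prints ['m', 'i', 's', 'si', 'ssip', 'pi', '#']
```

The first six phrases match the parsing shown on the slide; the final phrase is the terminator #, which the slide leaves out.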

