Two equivalent problems

Two equivalent problems
Prof. Paolo Ferragina, Algoritmi per "Information Retrieval"
RMQ on integer arrays → LCA on a Cartesian tree → Euler tour of the Cartesian tree → RMQ on the array of node levels, which is a ±1 integer array.
LCA on general trees → Euler tour of the tree → RMQ on the array of node levels, which is a ±1 integer array.

Search for k-mismatches
T = CCGTACGATCAGTACAGTACAGTACTTTTTTAAACCGGAGACTACA
P = CCGAACTATC
Problem: find the longest match between P[i,…] and T[j,…]. If this takes O(1) time, then checking one alignment of P against T for at most k mismatches takes O(k) time.
Data structure: concatenate P and T into a string X = T$P. Construct a data structure on X that retrieves fast the longest match between any pair of suffixes of X. This is an LCA or LCP query.
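The reduction above can be sketched as follows. The function `lce` is a hypothetical stand-in that compares characters directly; the point of the slide is that an LCA or LCP structure built on X = T$P would answer the same query in O(1) time, so each alignment costs O(k). The names `lce` and `k_mismatch_positions` are assumptions made for this illustration.

```python
def lce(X, a, b):
    """Length of the longest common prefix of X[a:] and X[b:].
    Naive scan, used here as a stand-in for an O(1) LCA/LCP query."""
    n = 0
    while a + n < len(X) and b + n < len(X) and X[a + n] == X[b + n]:
        n += 1
    return n


def k_mismatch_positions(T, P, k):
    """0-based positions j where P matches T[j:j+len(P)] with at most k mismatches."""
    X = T + "$" + P              # assumes '$' occurs in neither T nor P
    p0 = len(T) + 1              # start of P inside X
    m = len(P)
    result = []
    for j in range(len(T) - m + 1):
        i = 0                    # characters of P consumed so far
        mismatches = 0
        while True:
            i += lce(X, j + i, p0 + i)   # jump over a run of matching characters
            if i >= m:
                break                    # reached the end of P
            mismatches += 1              # position i is a mismatch
            if mismatches > k:
                break
            i += 1                       # skip the mismatching character
        if mismatches <= k:
            result.append(j)
    return result


T = "CCGTACGATCAGTACAGTACAGTACTTTTTTAAACCGGAGACTACA"
P = "CCGAACTATC"
print(k_mismatch_positions(T, P, 2))
```

Each alignment issues at most k + 1 calls to `lce`, which is where the O(k) bound comes from once `lce` runs in constant time.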

Suffix Tree & LCA
T#P = mississippi#si$
[Figure: suffix tree of T#P, leaves labelled with the starting positions of the suffixes. Longest match(3,13) is read off at the LCA of leaves 3 and 13.]

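Suffix Array & LCP → RMQ
T#P = mississippi#si
SA = 12 11 14 8 5 2 1 10 9 13 7 4 6 3
[Figure: suffix array SA with the Lcp array beside it; the suffixes si, sippi#, sissippi#, ssippi#, ssissippi# are highlighted. Longest match(3,13) is the minimum Lcp value between the SA positions of suffixes 13 and 3, that is, an RMQ on the Lcp array.]
Surprisingly, also LCA → RMQ.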

The RMQ problem
RMQ_A(i,j) returns the index of the smallest element in the subarray A[i..j].
A[0..9] = 10 2 12 1 12 13 21 15 14 10
RMQ(2,7) = 3
Trivial solution: precompute RMQ for every pair of indices. This takes Θ(n^2) space and O(1) query time.

RMQ on a general array
A[0..9] = 10 25 22 34 7 19 9 12 26 16
[Figure: Cartesian tree of A, nodes labelled with array indices. The root is node 4, the position of the minimum value 7.]
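The slide only shows the resulting tree. One common way to build it in O(n) time is a left-to-right scan with a stack holding the rightmost path; the sketch below assumes that construction and uses -1 for "no node". The function name and the returned arrays are choices made for this example.

```python
def cartesian_tree(A):
    """Return (root, parent, left, right) of the min-rooted Cartesian tree of A."""
    n = len(A)
    parent = [-1] * n
    left = [-1] * n
    right = [-1] * n
    stack = []                      # rightmost path, values increasing from bottom to top
    for i in range(n):
        last = -1
        while stack and A[stack[-1]] > A[i]:
            last = stack.pop()      # nodes larger than A[i] end up in i's left subtree
        if last != -1:
            left[i] = last
            parent[last] = i
        if stack:
            right[stack[-1]] = i    # i becomes the right child of the new stack top
            parent[i] = stack[-1]
        stack.append(i)
    return stack[0], parent, left, right


A = [10, 25, 22, 34, 7, 19, 9, 12, 26, 16]
root, parent, left, right = cartesian_tree(A)
print(root)    # 4
print(parent)  # [4, 2, 0, 2, -1, 6, 4, 6, 9, 7]
```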

RMQ(i,j) = LCA(i,j) on Cartesian trees
A[0..9] = 10 25 22 34 7 19 9 12 26 16
[Figure: Cartesian tree of A, nodes labelled with array indices.]
The lowest common ancestor of nodes i and j in the Cartesian tree is the node that holds the minimum of A[i..j].
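The equivalence can be checked on the slide's array. This sketch builds the Cartesian tree straight from its recursive definition (the root of a range is the position of its minimum) and answers LCA by walking parent pointers; both choices favour brevity over speed and are assumptions of this example.

```python
def build(A, lo, hi, par, parent, depth, d=0):
    """Recursive definition: the root of A[lo..hi] is the position of its minimum."""
    if lo > hi:
        return
    m = min(range(lo, hi + 1), key=A.__getitem__)   # leftmost minimum
    parent[m], depth[m] = par, d
    build(A, lo, m - 1, m, parent, depth, d + 1)
    build(A, m + 1, hi, m, parent, depth, d + 1)


def lca(u, v, parent, depth):
    """Naive LCA: repeatedly move the deeper node up until the two meet."""
    while u != v:
        if depth[u] >= depth[v]:
            u = parent[u]
        else:
            v = parent[v]
    return u


A = [10, 25, 22, 34, 7, 19, 9, 12, 26, 16]
parent = [-1] * len(A)
depth = [0] * len(A)
build(A, 0, len(A) - 1, -1, parent, depth)

print(lca(1, 3, parent, depth))  # 2, the position of min(25, 22, 34)
print(lca(5, 9, parent, depth))  # 6, the position of min(19, 9, 12, 26, 16)
```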

Generic RMQ → LCA → RMQ ±1
LCA(u,v) = shallowest node met between u and v during a depth-first traversal (Euler tour) of T, that is, the node at the lowest level.
Consecutive entries of the level array differ by +1 or -1, so it is a ±1 array.
[Figure: a tree T with its Euler tour (node sequence) and the corresponding array of node levels. Example: LCA_T(4,6) = 3.]
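A small sketch of this reduction. The tree below is hypothetical (the slide's tree cannot be recovered from the transcript), and the range minimum over the level array is computed by a plain scan as a placeholder for a real RMQ structure.

```python
def euler_tour(children, root):
    """Return (tour, level, first): visited nodes, their levels, first visit of each node."""
    tour, level, first = [], [], {}

    def visit(u, d):
        first.setdefault(u, len(tour))
        tour.append(u)
        level.append(d)
        for c in children.get(u, []):
            visit(c, d + 1)
            tour.append(u)          # come back to u after each child
            level.append(d)

    visit(root, 0)
    return tour, level, first


def lca(u, v, tour, level, first):
    a, b = sorted((first[u], first[v]))
    k = min(range(a, b + 1), key=level.__getitem__)   # naive RMQ on the level array
    return tour[k]


children = {1: [2, 3], 3: [4, 5], 5: [6]}   # hypothetical tree, not the one on the slide
tour, level, first = euler_tour(children, 1)
print(tour)    # [1, 2, 1, 3, 4, 3, 5, 6, 5, 3, 1]
print(level)   # [0, 1, 0, 1, 2, 1, 2, 3, 2, 1, 0]
print(lca(4, 6, tour, level, first))  # 3
```

The tour has 2n - 1 entries for a tree of n nodes, so the RMQ instance stays linear in size.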

We are left with “RMQ on ±1 array”
RMQ_A(i,j) returns the index of the smallest element in the subarray A[i..j].
A[0..9] = 10 11 12 11 12 13 14 15 14 13
RMQ(2,7) = 3
Recall the trivial solution: precompute RMQ for every pair of indices. This takes Θ(n^2) space and O(1) query time.

Sparse Table
Preprocess subarrays of length 2^k, for every k = 0, 1, …, log n.
M(i,j) = index of the min value in A[i, i + 2^j - 1]
A[0..9] = 10 11 12 11 12 13 14 15 14 13
M(2,0) = 2, M(2,1) = 3, M(2,2) = 3, M(2,3) = 3
Total space is O(n log n). RMQ query?
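A possible way to fill the table, assuming the usual doubling recurrence: the window of length 2^j starting at i is the union of two windows of length 2^(j-1). Note that the sketch stores the slide's M(i,j) as `M[j][i]`, a layout chosen only for convenience.

```python
def build_sparse_table(A):
    """M[j][i] = index of the minimum of A[i .. i + 2**j - 1]."""
    n = len(A)
    M = [list(range(n))]                 # windows of length 1
    j = 1
    while (1 << j) <= n:
        prev, half = M[j - 1], 1 << (j - 1)
        row = []
        for i in range(n - (1 << j) + 1):
            a, b = prev[i], prev[i + half]   # minima of the two halves
            row.append(a if A[a] <= A[b] else b)
        M.append(row)
        j += 1
    return M


A = [10, 11, 12, 11, 12, 13, 14, 15, 14, 13]
M = build_sparse_table(A)
print([M[j][2] for j in range(4)])  # [2, 3, 3, 3], matching M(2,0) .. M(2,3)
```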

Querying the Sparse Table
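Let k = floor(log2(j - i + 1)). The range A[i..j] is covered by two, possibly overlapping, windows of 2^k elements: A[i, i + 2^k - 1] and A[j - 2^k + 1, j].
RMQ(i,j) is whichever of M(i,k) and M(j - 2^k + 1, k) points to the smaller value.
Total space is O(n log n). RMQ query takes O(1) time.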
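The query can be sketched as below. The helper `build` is a compact table construction included only to make the example runnable; `rmq` is the part that corresponds to this slide. Both names are assumptions, and `M[k][i]` plays the role of the slide's M(i,k).

```python
def build(A):
    """Compact Sparse Table: M[k][i] = index of the min of A[i .. i + 2**k - 1]."""
    M = [list(range(len(A)))]
    while (2 << (len(M) - 1)) <= len(A):
        h, prev = 1 << (len(M) - 1), M[-1]
        M.append([min(prev[i], prev[i + h], key=A.__getitem__)
                  for i in range(len(A) - 2 * h + 1)])
    return M


def rmq(A, M, i, j):
    """Index of the minimum of A[i..j] from two overlapping windows of length 2**k."""
    k = (j - i + 1).bit_length() - 1      # floor(log2(j - i + 1))
    a = M[k][i]                           # window starting at i
    b = M[k][j - (1 << k) + 1]            # window ending at j
    return a if A[a] <= A[b] else b


A = [10, 11, 12, 11, 12, 13, 14, 15, 14, 13]
M = build(A)
print(rmq(A, M, 2, 7))  # 3
```

Overlap between the two windows is harmless because taking a minimum twice over the same element does not change the answer.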

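Bucketing
Split A into blocks of (log n)/2 elements each, so there are 2n/log n blocks.
A’[i] = min in the i-th block of A.
B[i] = position (index) of that min in A.
[Figure: array A split into blocks, with the arrays A’[0 .. 2n/log n] and B[0 .. 2n/log n] below it.]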

Use the Bucketing
Preprocess A’ for RMQ using a Sparse Table. Space is (2n/log n) * log(2n/log n) = O(n). RMQ queries on A’ take O(1) time.
Preprocess every block of A for border RMQ, that is, for ranges that start or end at a block border. Space is O(n), and a border RMQ takes O(1) time.
RMQ(i,j) takes O(1) time if i and j lie in distinct blocks.
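A sketch of the distinct-blocks case, under a few stated assumptions: the block size `b` is passed as a parameter instead of being fixed to (log n)/2, border RMQs are stored as prefix and suffix minima of each block, and the RMQ over A’ is done by a plain scan where the slide uses a Sparse Table. All function and variable names are made up for this illustration.

```python
def preprocess(A, b):
    """Per-block minima plus prefix/suffix minima inside every block of size b."""
    n = len(A)
    B = []                 # B[t] = position in A of the minimum of block t
    pre = [0] * n          # pre[p] = position of the min of A[block start .. p]
    suf = [0] * n          # suf[p] = position of the min of A[p .. block end]
    for lo in range(0, n, b):
        hi = min(n, lo + b) - 1
        pre[lo] = lo
        for p in range(lo + 1, hi + 1):
            pre[p] = p if A[p] < A[pre[p - 1]] else pre[p - 1]
        suf[hi] = hi
        for p in range(hi - 1, lo - 1, -1):
            suf[p] = p if A[p] <= A[suf[p + 1]] else suf[p + 1]
        B.append(pre[hi])
    return b, B, pre, suf


def rmq_distinct_blocks(A, S, i, j):
    """RMQ(i, j) for i and j in different blocks."""
    b, B, pre, suf = S
    bi, bj = i // b, j // b
    assert bi < bj, "i and j must lie in distinct blocks"
    candidates = [suf[i]]                       # border RMQ: i to the end of its block
    if bi + 1 < bj:
        # stand-in for the O(1) Sparse Table query on A' over blocks bi+1 .. bj-1
        candidates.append(min(B[bi + 1:bj], key=A.__getitem__))
    candidates.append(pre[j])                   # border RMQ: start of j's block to j
    return min(candidates, key=A.__getitem__)


A = [10, 25, 22, 34, 7, 19, 9, 12, 26, 16]
S = preprocess(A, 3)
print(rmq_distinct_blocks(A, S, 1, 8))  # 4
```

The case where i and j fall in the same block is not covered here; it is the in-block RMQ treated separately in the deck.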

In-block RMQ over ±1 arrays
A block of (log n)/2 elements has (log n)/2 - 1 differences, each +1 or -1, so there are at most 2^((log n)/2 - 1) = O(sqrt(n)) normalized blocks.
Set Table[BlockEnc, i, j] = RMQ(i,j)
[Figure: two blocks X and Y with the same sequence of ±1 differences, D_X = D_Y; they share the same table entries.]
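One way to realise Table[BlockEnc, i, j] is sketched below. The encoding (bit t set when the t-th difference is +1) is an assumption of this example, as are the function names and the two sample blocks X and Y, which are not the ones from the slide's figure.

```python
def block_enc(block):
    """Encode a ±1 block by its differences: bit t is 1 iff block[t+1] - block[t] == +1."""
    enc = 0
    for t in range(len(block) - 1):
        d = block[t + 1] - block[t]
        assert d in (1, -1), "not a ±1 block"
        if d == 1:
            enc |= 1 << t
    return enc


def build_table(b):
    """table[enc][i][j] = RMQ(i, j) inside any block of length b with encoding enc."""
    table = {}
    for enc in range(1 << (b - 1)):
        vals = [0]                                   # normalized block starts at 0
        for t in range(b - 1):
            vals.append(vals[-1] + (1 if (enc >> t) & 1 else -1))
        table[enc] = [[min(range(i, j + 1), key=vals.__getitem__) if i <= j else None
                       for j in range(b)] for i in range(b)]
    return table


X = [3, 4, 5, 6, 5, 4]     # hypothetical blocks with identical differences
Y = [5, 6, 7, 8, 7, 6]
table = build_table(len(X))
print(block_enc(X) == block_enc(Y))   # True
print(table[block_enc(X)][2][5])      # 5
```

Because the answer depends only on the differences, every block of A looks up the row of its encoding instead of storing its own table.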

LZ-parsing (gzip)
T = mississippi#
Parsing: <m><i><s><si><ssip><pi>
[Figure: suffix tree of T, leaves labelled with the starting positions 1..12 of the suffixes.]

LZ-parsing (gzip)
T = mississippi#
The phrase at position 6 is <ssip>: take the longest repeated prefix of T[6,...] whose repeat is on the left of 6.
It is on the path to leaf 6. By maximality, check only nodes.
Leftmost occ = 3 < 6
[Figure: suffix tree of T with the path to leaf 6 highlighted.]

LZ-parsing (gzip)
min-leaf → leftmost copy
Precompute the min descending leaf at every node in O(n) time.
Parsing: scan T; visit ST and stop when min-leaf ≥ current pos.
T = mississippi#
Parsing: <m><i><s><si><ssip><pi>
[Figure: suffix tree of T, every internal node annotated with its min-leaf.]
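The parsing rule can be illustrated without a suffix tree. The sketch below replaces the suffix-tree visit with a quadratic search for the longest match that starts to the left of the current position, then appends one new character to form the phrase. It is a naive stand-in for the min-leaf technique, not the slide's algorithm, and `lz_parse` is a name invented here.

```python
def lz_parse(T):
    """Parse T into phrases: longest copy starting before the current position, plus one character."""
    phrases, i, n = [], 0, len(T)
    while i < n:
        best = 0
        for j in range(i):                       # candidate copy starts, all to the left of i
            l = 0
            while i + l < n and T[j + l] == T[i + l]:
                l += 1                           # the copy may run past position i
            best = max(best, l)
        end = min(n, i + best + 1)               # copied part plus one new character
        phrases.append(T[i:end])
        i = end
    return phrases


print(lz_parse("mississippi"))  # ['m', 'i', 's', 'si', 'ssip', 'pi']
```

With the terminator included, `lz_parse("mississippi#")` yields the same phrases followed by a final phrase `#`.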
