On the Range Maximum-Sum Segment Query Problem
Kuan-Yu Chen and Kun-Mao Chao
Department of Computer Science and Information Engineering, National Taiwan University, Taiwan
2019/2/21

An example – Locating GC-rich regions (1)
One reasonable scoring expression to measure the richness of a region is x - p×l, where x is the C+G count of the region, l is the length of the region, and p is a positive ratio constant.
The goal is to design an algorithm to report the region that maximizes the expression x - p×l.

An example – Locating GC-rich regions (2)
Let x be the C+G count of the region and y be the A+T count of the region, so that l = x + y.
Hence, we have x - p×l = x - p×(x+y) = (1-p)×x - p×y.
Therefore, to calculate the value of x - p×l, one can assign the weights w(G) = w(C) = 1-p and w(A) = w(T) = -p.
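As a minimal sketch of this weighting (the function names `gc_weights` and `region_score` and the sample region are hypothetical, not from the slides), the score x - p×l of a region can be computed by summing per-base weights:

```python
def gc_weights(p):
    """Per-base weights chosen so that a region's weight sum equals x - p*l."""
    return {"G": 1 - p, "C": 1 - p, "A": -p, "T": -p}


def region_score(region, p):
    """Sum of the weights of the bases in `region` (a string over A, C, G, T)."""
    weights = gc_weights(p)
    return sum(weights[base] for base in region)


# Illustrative check on a made-up region: x counts G and C, l is the length.
region, p = "GCGA", 0.5
x, l = sum(base in "GC" for base in region), len(region)
print(region_score(region, p), x - p * l)  # the two values should agree
```

With the weights in place, locating the GC-richest region becomes a maximum-sum segment problem on the weight sequence.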

The Maximum-Sum Segment
Also called the maximum-sum interval or the maximum-scoring region.
Given a sequence of numbers, the maximum-sum segment is simply the contiguous subsequence having the greatest total sum.
Example: <5, -5.1, 1, 3, -4, 2, 3, -4, 7>, with greatest total sum = 8 (the segment <1, 3, -4, 2, 3, -4, 7>).
Zero prefix-/suffix-sums are possible.
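The definition can be illustrated with a single linear scan (commonly known as Kadane's algorithm; the slides do not name a method, so this choice and the function name `max_sum_segment` are assumptions of the sketch). Indices are 1-based to match the a1, a2, … notation used later.

```python
def max_sum_segment(a):
    """Return (sum, start, end), 1-based and inclusive, of a maximum-sum segment."""
    cur_sum, cur_start = a[0], 1          # best segment ending at the current index
    best = (cur_sum, 1, 1)
    for j in range(2, len(a) + 1):
        if cur_sum < 0:                   # a negative running sum never helps: restart
            cur_sum, cur_start = a[j - 1], j
        else:
            cur_sum += a[j - 1]
        if cur_sum > best[0]:
            best = (cur_sum, cur_start, j)
    return best


print(max_sum_segment([5, -5.1, 1, 3, -4, 2, 3, -4, 7]))
```

Because the scan keeps extending a running sum of zero, it may report a segment that has a zero-sum prefix, which is one of the situations the slide alludes to.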

A Relevant Problem - RMQ
Range Minima (Maxima) Query Problem (also called Discrete Range Searching).
Given a sequence of numbers, by preprocessing the sequence we wish to retrieve the minimum (maximum) value within a given querying interval efficiently.
Example sequence: <5, -5.1, 1, 3, -4, 2, 3, -4, 7>
(Figure: the minimum and the maximum of a querying interval are marked on the sequence.)
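One well-known way to answer such queries is a sparse table. The sketch below is only an illustration: it needs O(n log n) preprocessing rather than the linear preprocessing cited later in the talk, and the names `build_sparse_table` and `range_min_index` are hypothetical. Positions are 0-based here.

```python
def build_sparse_table(values):
    """table[k][i] = position of the minimum of values[i .. i + 2**k - 1]."""
    n = len(values)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        prev, half = table[-1], 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table


def range_min_index(values, table, lo, hi):
    """Position of the (leftmost) minimum of values[lo..hi], inclusive, in O(1)."""
    k = (hi - lo + 1).bit_length() - 1    # largest power of two that fits the range
    left, right = table[k][lo], table[k][hi - (1 << k) + 1]
    return left if values[left] <= values[right] else right


seq = [5, -5.1, 1, 3, -4, 2, 3, -4, 7]
table = build_sparse_table(seq)
print(range_min_index(seq, table, 2, 8))
```

A range maxima query follows by flipping the comparisons, or by running the same code on the negated sequence.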

Range Maximum-Sum Segment Query Problem
Definition: The input is a sequence <a1, a2, …, an> of real numbers which is to be preprocessed.
A query comprises two intervals S and E. Our goal is to return the maximum-sum segment whose starting index lies in S and whose end index lies in E.
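To make the query semantics concrete, here is a brute-force reference that simply tries every admissible pair of indices. It is not the algorithm of the talk, only a baseline for checking faster solutions; the function name and the sample intervals are hypothetical. Indices are 1-based and intervals are inclusive.

```python
from itertools import accumulate


def rmsq_brute_force(a, S, E):
    """Return (sum, start, end) of the best segment with start in S and end in E."""
    (s_lo, s_hi), (e_lo, e_hi) = S, E
    c = [0] + list(accumulate(a))                 # c[k] = a1 + ... + ak
    best = None
    for i in range(s_lo, s_hi + 1):
        for j in range(max(i, e_lo), e_hi + 1):   # a segment needs start <= end
            candidate = (c[j] - c[i - 1], i, j)
            if best is None or candidate[0] > best[0]:
                best = candidate
    return best


seq = [9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3]
print(rmsq_brute_force(seq, (2, 6), (7, 11)))     # intervals chosen for illustration
```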

A Nonoverlapping Example
Input sequence: 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3
(Figure: a starting region and an end region that do not overlap are marked on the sequence; the reported segment has total sum = 6.)

An Overlapping Example
Input sequence: 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3
(Figure: a starting region and an end region that overlap are marked on the sequence; the reported segment has total sum = 8.)

Our Results
We propose an algorithm that runs in O(n) preprocessing time and O(1) query time under the unit-cost RAM model.
In fact, we show that RMSQ and RMQ are computationally linearly equivalent.
We show that the RMSQ techniques yield alternative O(n)-time algorithms for the following problems:
- The maximum-sum segment with length constraints
- All maximal-sum segments

Strategy
Reduce the RMSQ problem to the RMQ problem.
Theorem. If there is a <f(n), g(n)>-time solution for the RMQ problem (preprocessing time f(n), query time g(n)), then there is a <f(n)+O(n), g(n)+O(1)>-time solution for the RMSQ problem.
(Diagram: RMSQ reduces to RMQ with O(n) additional preprocessing time and O(1) additional query time.)

Computing sum(i,j) in O(1) time
prefix-sum(i) = a1 + a2 + … + ai
All n prefix sums are computable in O(n) time.
sum(i, j) = prefix-sum(j) - prefix-sum(i-1)
(Figure: the segment from i to j shown as the difference between prefix-sum(j) and prefix-sum(i-1).)
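A minimal sketch of the prefix-sum table, assuming 1-based segment indices and an extra entry prefix[0] = 0 so that the formula also works for i = 1 (the function names are hypothetical):

```python
from itertools import accumulate


def build_prefix_sums(a):
    """prefix[k] = a1 + ... + ak, with prefix[0] = 0; built in O(n) time."""
    return [0] + list(accumulate(a))


def segment_sum(prefix, i, j):
    """sum(i, j) = prefix-sum(j) - prefix-sum(i-1), in O(1) time (1-based, inclusive)."""
    return prefix[j] - prefix[i - 1]


seq = [9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3]
prefix = build_prefix_sums(seq)
print(prefix[1:])                 # the prefix-sum sequence of the running example
print(segment_sum(prefix, 3, 9))  # sum of a3..a9
```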

Case 1: Nonoverlapping
To maximize sum(i, j) = prefix-sum(j) - prefix-sum(i-1), maximize prefix-sum(j) and minimize prefix-sum(i-1).
Input sequence: 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3
Prefix-sum sequence: 9, -1, 3, 1, 5, 0, 4, 1, 7, -4, 4, 1, 5, 0, 3
(Figure: on the plot of the prefix-sum sequence, a range minima query finds the lowest point in the starting region and a range maxima query finds the highest point in the end region.)
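The nonoverlapping case can be sketched as two independent range queries on the prefix sums. In this illustration, plain `min`/`max` scans stand in for the constant-time RMQ structure, and the function name and sample intervals are hypothetical. Indices are 1-based.

```python
from itertools import accumulate


def rmsq_disjoint(a, s_lo, s_hi, e_lo, e_hi):
    """Best (sum, start, end) with start in [s_lo, s_hi] and end in [e_lo, e_hi].

    Assumes s_hi <= e_lo, so every start/end combination is a valid segment.
    """
    prefix = [0] + list(accumulate(a))
    # Lowest point: the start i minimizing prefix-sum(i-1) over the starting region.
    i = min(range(s_lo, s_hi + 1), key=lambda t: prefix[t - 1])
    # Highest point: the end j maximizing prefix-sum(j) over the end region.
    j = max(range(e_lo, e_hi + 1), key=lambda t: prefix[t])
    return prefix[j] - prefix[i - 1], i, j


seq = [9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3]
print(rmsq_disjoint(seq, 2, 6, 7, 11))   # intervals chosen for illustration
```

Replacing the two scans with O(1) range queries gives the constant query time claimed for this case.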
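
Case 2: Overlapping
Some problems may occur: when the two regions overlap, the lowest point found in the starting region may lie to the right of the highest point found in the end region, so the two points do not delimit a valid segment.
Input sequence: 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3
(Figure: on the plot of the prefix-sum sequence, the highest point and the lowest point are found inside the overlapping regions; the stretch between them is flagged "Negative Sum !!".)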

Case 2: Overlapping
Divide into 3 possible cases:
Input sequence: 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3
In two of the cases the starting index and the end index come from disjoint ranges, so finding the lowest point in one range and the highest point in the other is handled by range minima/maxima queries (preprocessing time = f(n), query time = g(n)).
In the remaining case both the lowest point and the highest point must be found inside the same range. What should we do?
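One way to read the three-way split is sketched below, under the assumption that the starting region is [s1, s2], the end region is [e1, e2], and s1 <= e1 <= s2 <= e2. The helper names are hypothetical; linear scans stand in for RMQ, and the single-range case is solved by brute force here only as a placeholder for the technique developed on the following slides. Indices are 1-based.

```python
from itertools import accumulate


def best_disjoint(prefix, i_lo, i_hi, j_lo, j_hi):
    """Lowest start point and highest end point over two disjoint ranges (or None)."""
    if i_lo > i_hi or j_lo > j_hi:
        return None
    i = min(range(i_lo, i_hi + 1), key=lambda t: prefix[t - 1])
    j = max(range(j_lo, j_hi + 1), key=lambda t: prefix[t])
    return prefix[j] - prefix[i - 1], i, j


def best_single_range(prefix, lo, hi):
    """Placeholder for the special case: start and end both lie in [lo, hi]."""
    return max((prefix[j] - prefix[i - 1], i, j)
               for i in range(lo, hi + 1) for j in range(i, hi + 1))


def rmsq_overlapping(a, s1, s2, e1, e2):
    """Best (sum, start, end) with start in [s1, s2], end in [e1, e2], s1<=e1<=s2<=e2."""
    prefix = [0] + list(accumulate(a))
    candidates = [
        best_disjoint(prefix, s1, e1 - 1, e1, e2),   # start before the overlap
        best_disjoint(prefix, e1, s2, s2 + 1, e2),   # start inside, end after the overlap
        best_single_range(prefix, e1, s2),           # both inside the overlap
    ]
    return max(c for c in candidates if c is not None)


seq = [9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3]
print(rmsq_overlapping(seq, 1, 9, 5, 15))            # intervals chosen for illustration
```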

Dealing with the Special Case: Single Range Query
Input sequence: 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3
(Figure: a single querying interval is marked on the sequence; the maximum-sum segment inside it has total sum = 6.)
Challenge: Can this special case be reduced to the RMQ problem?

Reduction Procedure
Step 1. Find a partner for each index.
Step 2. Record the sum of each pair in an array.
Step 3. Retrieve the maximum-sum pair by applying the RMQ techniques.

Our First Attempt (1)
Step 1: For each index i, we define the lowest point preceding i as its partner.
(Figure: on the prefix-sum sequence, the partner of i is searched for in the region preceding i; it is the lowest point there.)

Our First Attempt (2)
Step 2: Record sum(partner(i), i) in an array.
(Figure: index i, its lowest-point partner, and the recorded value sum(partner(i), i).)

Our First Attempt (3)
Step 3: Apply the RMQ techniques to the array.
(Figure: applying RMQ to the array of sum(partner(i), i) values over the querying interval retrieves the maximum-sum pair.)

Bump into Difficulties
What if the partner of an index goes beyond the querying interval?
We might have to update every pair!
(Figure: partner(i) lies outside the querying interval, so sum(partner(i), i) needs to be updated.)
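The first attempt and the difficulty it runs into can be reproduced in a few lines. This is a sketch under assumed conventions: indices are 1-based, the partner is stored as the start index of the pair, and the function name and the querying interval [4, 9] are hypothetical.

```python
from itertools import accumulate


def first_attempt_pairs(a):
    """partner[i] = start of the best segment ending at i; pair_sum[i] = its sum."""
    c = [0] + list(accumulate(a))        # prefix sums, c[0] = 0
    partner = [0] * len(c)
    pair_sum = [0] * len(c)
    low = 0                              # position of the lowest prefix sum before i
    for i in range(1, len(c)):
        if c[i - 1] <= c[low]:
            low = i - 1
        partner[i] = low + 1
        pair_sum[i] = c[i] - c[low]
    return partner, pair_sum


seq = [9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3]
partner, pair_sum = first_attempt_pairs(seq)

lo, hi = 4, 9                            # illustrative querying interval
i = max(range(lo, hi + 1), key=lambda t: pair_sum[t])
print(i, partner[i], pair_sum[i])        # the partner falls outside [lo, hi]
```

Every index in the interval whose partner precedes the interval would need a new partner, which is why this choice of partner does not support constant-time queries.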

A Better Partner
How?
Left_bound(i): find the nearest point to the left of i that is at least as large as prefix-sum(i).
New partner(i): find the lowest point between Left_bound(i) and i.
(Figure: the prefix-sum sequence with i, Left_bound(i), and the new partner(i) marked.)
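A direct, unoptimized sketch of this definition follows. The index conventions are assumptions of the sketch rather than statements from the slides: indices are 1-based, Left_bound(i) is 0 when no earlier prefix sum is at least prefix-sum(i), the partner is stored as the start index of the pair, and ties are broken toward the right. The quadratic scans are for clarity only.

```python
from itertools import accumulate


def better_partners(a):
    """Return (left_bound, partner) lists indexed 1..n (entry 0 is unused)."""
    c = [0] + list(accumulate(a))                  # prefix sums, c[0] = 0
    n = len(a)
    left_bound, partner = [0] * (n + 1), [0] * (n + 1)
    for i in range(1, n + 1):
        k = i - 1
        while k >= 1 and c[k] < c[i]:              # nearest k with c[k] >= c[i]
            k -= 1
        left_bound[i] = k
        # Lowest prefix sum among c[k..i-1]; the segment starts one position later.
        lowest = min(range(k, i), key=lambda t: (c[t], -t))
        partner[i] = lowest + 1
    return left_bound, partner


seq = [9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3]
left_bound, partner = better_partners(seq)
print(left_bound[1:])
print(partner[1:])
```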

Why Is It Better? (1)
It remains the best choice.
It saves lots of update steps. It turns out that zero or one point needs to be updated.

Why Is It Better? (2) -- Remains the Best
(Figure: Left_bound(i), the nearest point at least as large as prefix-sum(i); partner(i), the lowest point; and a region marked "Impossible region".)

Why Is It Better? (3) -- Minimal-Maximal Property
Height(partner(i)) < Height(j) < Height(i), for all partner(i) < j < i
(Figure: partner(i) is the minimal point and i is the maximal point of the stretch between them: no one is higher than i and no one is lower than partner(i). The next higher point is also marked.)
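
Why Is It Better? (4) -- Save Some Updates
(Figure: the prefix-sum sequence with a querying interval that starts between partner(i) and i. No one between partner(i) and i is higher than i, so the points of the querying interval that precede i cannot be the right end of the maximum-sum segment. The next higher point is also marked.)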

Why Is It Better? (5) -- Nesting Property
For two indices i < j, it cannot be the case that partner(i) < partner(j) ≤ i < j.
(Figure: two pairs, each running from its minimal point, the partner, to its maximal point, drawn in the forbidden crossing configuration.)

Why Is It Better? (6) -- An example
Nesting property: two pairs (partner(i), i) and (partner(j), j) are either disjoint or nested. No partial overlapping is allowed.
Input sequence: 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3
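The nesting property can be checked mechanically on the example sequence. The sketch assumes 1-based indices, partners stored as segment start indices, ties broken toward the right, and hypothetical function names; it uses quadratic scans for brevity.

```python
from itertools import accumulate


def partners(a):
    """Map each end index i (1-based) to the start index of its pair."""
    c = [0] + list(accumulate(a))
    result = {}
    for i in range(1, len(c)):
        k = i - 1
        while k >= 1 and c[k] < c[i]:      # k ends at Left_bound(i), or 0 if none
            k -= 1
        result[i] = 1 + min(range(k, i), key=lambda t: (c[t], -t))
    return result


def nesting_violations(p):
    """All index pairs i < j with partner(i) < partner(j) <= i, i.e. crossing pairs."""
    return [(i, j) for i in p for j in p if i < j and p[i] < p[j] <= i]


seq = [9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3]
pairs = partners(seq)
print(sorted((start, end) for end, start in pairs.items() if start < end))
print(nesting_violations(pairs))           # an empty list means no crossing pairs
```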

When a Query Comes -- Case 1: No Exceeding
Retrieve the maximum pair (partner(i), i) among the indices i of the querying interval.
The maximum pair (partner(i), i) lies in the querying interval.
We are done. Output (partner(i), i).

When a Query Comes -- Case 2: Exceeding
Retrieve the maximum pair (partner(i), i) among the indices i of the querying interval. The maximum pair goes beyond the querying interval: partner(i) lies before its start.
Update partner(i) to the best partner inside the querying interval. The points of the querying interval that precede i cannot be the right end of the maximum-sum segment.
Retrieve the maximum pair (partner(j), j) among the indices j to the right of i. By the nesting property, this pair lies within the querying interval.
Compare (new_partner(i), i) and (partner(j), j).
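The two query cases can be put together as follows. This is a sketch under stated assumptions, not the authors' implementation: a sparse table (O(n log n) preprocessing) stands in for the linear-preprocessing RMQ structure, indices are 1-based, partners are stored as segment start indices, ties are broken toward the right, and all class and attribute names are hypothetical.

```python
from itertools import accumulate


class SparseTable:
    """Index-returning range query on a fixed list: O(n log n) build, O(1) query."""

    def __init__(self, values, wins):
        # wins(u, v) is True when value u should be preferred over value v.
        self.values, self.wins = values, wins
        self.rows = [list(range(len(values)))]
        span = 1
        while 2 * span <= len(values):
            prev = self.rows[-1]
            self.rows.append([self._pick(prev[i], prev[i + span])
                              for i in range(len(values) - 2 * span + 1)])
            span *= 2

    def _pick(self, i, j):
        return i if self.wins(self.values[i], self.values[j]) else j

    def query(self, lo, hi):
        """Index of the preferred value among values[lo..hi] (inclusive)."""
        k = (hi - lo + 1).bit_length() - 1
        return self._pick(self.rows[k][lo], self.rows[k][hi - (1 << k) + 1])


class SingleRangeRMSQ:
    """Maximum-sum segment inside a querying interval [lo, hi]."""

    def __init__(self, a):
        n = len(a)
        self.c = c = [0] + list(accumulate(a))             # prefix sums, c[0] = 0
        self.lowest = SparseTable(c, lambda u, v: u < v)   # range argmin on c
        self.partner = [0] * (n + 1)                       # start index paired with end i
        self.pair_sum = [float("-inf")] * (n + 1)          # sum(partner(i), i)
        stack = []                                         # candidates for Left_bound
        for i in range(1, n + 1):
            while stack and c[stack[-1]] < c[i]:
                stack.pop()
            left_bound = stack[-1] if stack else 0
            stack.append(i)
            self.partner[i] = self.lowest.query(left_bound, i - 1) + 1
            self.pair_sum[i] = c[i] - c[self.partner[i] - 1]
        self.highest = SparseTable(self.pair_sum, lambda u, v: u > v)

    def query(self, lo, hi):
        """Return (sum, start, end) with lo <= start <= end <= hi."""
        c, partner, pair_sum = self.c, self.partner, self.pair_sum
        i = self.highest.query(lo, hi)                     # retrieve the maximum pair
        if partner[i] >= lo:                               # Case 1: no exceeding
            return pair_sum[i], partner[i], i
        start = self.lowest.query(lo - 1, i - 1) + 1       # Case 2: update partner(i)
        best = (c[i] - c[start - 1], start, i)
        if i < hi:
            j = self.highest.query(i + 1, hi)              # maximum pair right of i
            if pair_sum[j] > best[0]:
                best = (pair_sum[j], partner[j], j)
        return best


seq = [9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3]
print(SingleRangeRMSQ(seq).query(4, 9))   # partner(9) = 3 precedes the interval
```

Each query issues at most three range queries, so with a constant-time RMQ structure the query time stays constant.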

Time Complexity
RMSQ can be reduced to the RMQ problem in O(n) time.
Since, under the unit-cost RAM model, there is an <O(n), O(1)>-time solution for the RMQ problem, there is an <O(n), O(1)>-time solution for the RMSQ problem.
(Diagram: RMSQ reduces to RMQ; preprocessing: O(n), query: O(1).)

On the other hand, RMQ can be reduced to the RMSQ problem in O(n) time, too.
(Range Maxima Query: between each two adjacent elements, we insert a negative number whose absolute value is larger than both of them, or simply a negative number whose absolute value is larger than the maximum number of the sequence.)
(Figure: an RMQ instance and the two alternative RMSQ instances built from it.)
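A sketch of this reverse reduction for range maxima queries. To stay on the safe side, the filler used here is a single negative number whose absolute value exceeds every absolute value in the sequence; that specific choice, the function names, and the use of a linear scan as a stand-in for an RMSQ oracle are assumptions of the sketch. Indices are 1-based.

```python
def best_segment(a, lo, hi):
    """Stand-in RMSQ oracle: maximum-sum segment of a_lo..a_hi as (sum, start, end)."""
    cur_sum, cur_start = a[lo - 1], lo
    best = (cur_sum, lo, lo)
    for j in range(lo + 1, hi + 1):
        if cur_sum < 0:
            cur_sum, cur_start = a[j - 1], j
        else:
            cur_sum += a[j - 1]
        if cur_sum > best[0]:
            best = (cur_sum, cur_start, j)
    return best


def range_max_via_rmsq(b, lo, hi):
    """Position (1-based) of a maximum of b_lo..b_hi, found through one RMSQ query."""
    filler = -(max(abs(v) for v in b) + 1)    # dominates every |b_k|
    a = []
    for v in b:
        a.extend([v, filler])
    a.pop()                                   # b_k now sits at position 2k - 1 of a
    _, start, end = best_segment(a, 2 * lo - 1, 2 * hi - 1)
    assert start == end and start % 2 == 1    # the best segment is one original element
    return (start + 1) // 2


b = [5, -5.1, 1, 3, -4, 2, 3, -4, 7]
print(range_max_via_rmsq(b, 2, 6))
```

Any segment that spans two original elements also contains a filler, which costs more than the extra element can contribute, so the maximum-sum segment collapses to the single largest element of the range.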

Use RMSQ Techniques to Solve Two Relevant Problems
1. Finding the Maximum-Sum Segment with length constraints in O(n) time.
- Y.-L. Lin, T. Jiang, K.-M. Chao, 2002
- T.-H. Fan et al., 2003
2. Finding all maximal scoring subsequences in O(n) time.
- W. L. Ruzzo & M. Tompa, 1999

Problem 1: The Maximum-Sum Segment with Length Constraints
Lin, Jiang, and Chao [JCSS 2002] and Fan et al. [CIAA 2003] gave O(n)-time algorithms for this problem.
The segment must have length at least L and at most U.

Problem 1: Finding the Maximum-Sum Segment with Length Constraints
Length at least L, at most U.
For each index i, find the maximum-sum segment whose starting point lies in [i-U+1, i-L+1] and whose end point is i. Each such search is one RMSQ query.
Runs in O(n) time since each query costs O(1) time.
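The per-index search can be sketched as follows. Because the end region of every query is the single index i, each query only needs the lowest prefix sum in a sliding window; this sketch therefore swaps in a monotonic deque (a sliding-window minimum) for the RMSQ structure, which is a different technique from the one on the slide but keeps the total running time linear. The function name is hypothetical and indices are 1-based.

```python
from collections import deque
from itertools import accumulate


def max_sum_segment_with_length(a, L, U):
    """Best (sum, start, end) with L <= end - start + 1 <= U, or None if n < L."""
    n = len(a)
    c = [0] + list(accumulate(a))          # prefix sums, c[0] = 0
    window = deque()                       # positions k = start - 1, c[k] increasing
    best = None
    for i in range(L, n + 1):              # i is the end point
        newest = i - L                     # start - 1 for the shortest allowed segment
        while window and c[window[-1]] >= c[newest]:
            window.pop()
        window.append(newest)
        while window[0] < i - U:           # drop starts that make the segment too long
            window.popleft()
        k = window[0]                      # lowest prefix sum in [i - U, i - L]
        if best is None or c[i] - c[k] > best[0]:
            best = (c[i] - c[k], k + 1, i)
    return best


seq = [9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3]
print(max_sum_segment_with_length(seq, 2, 3))
```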
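
Problem 2: All Maximal-Sum Segments
Ruzzo and Tompa [ISMB 1999] gave an O(n)-time algorithm for this problem.
Recursive definition: the maximum-sum segment S is maximal, and so are the maximal-sum segments found recursively in the part to its left, L(S), and in the part to its right, R(S).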

Problem 2: Finding All Maximal Scoring Subsequences
Recursive calls: find S in the input sequence with an RMSQ query, then recurse on L(S) and R(S).
Runs in O(n) time since each query costs O(1) time.
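The recursion can be sketched directly. In this illustration a linear scan stands in for the O(1) RMSQ query, so the sketch is quadratic in the worst case rather than linear. It also assumes that the recursion stops when no positive-sum segment remains and it does not treat zero-sum ties specially; the function names are hypothetical and indices are 1-based.

```python
def best_segment(a, lo, hi):
    """Stand-in for an RMSQ query: maximum-sum segment of a_lo..a_hi as (sum, start, end)."""
    cur_sum, cur_start = a[lo - 1], lo
    best = (cur_sum, lo, lo)
    for j in range(lo + 1, hi + 1):
        if cur_sum < 0:
            cur_sum, cur_start = a[j - 1], j
        else:
            cur_sum += a[j - 1]
        if cur_sum > best[0]:
            best = (cur_sum, cur_start, j)
    return best


def all_maximal_segments(a, lo=1, hi=None):
    """Maximal-sum segments of a_lo..a_hi, left to right, as (sum, start, end) triples."""
    if hi is None:
        hi = len(a)
    if lo > hi:
        return []
    total, i, j = best_segment(a, lo, hi)   # the segment S
    if total <= 0:
        return []
    return (all_maximal_segments(a, lo, i - 1)      # recurse on L(S)
            + [(total, i, j)]
            + all_maximal_segments(a, j + 1, hi))   # recurse on R(S)


seq = [9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3]
print(all_maximal_segments(seq))
```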