On the Range Maximum-Sum Segment Query Problem

Slides:



Advertisements
Similar presentations
Minimum Clique Partition Problem with Constrained Weight for Interval Graphs Jianping Li Department of Mathematics Yunnan University Jointed by M.X. Chen.
Advertisements

Longest Common Subsequence
Counting the bits Analysis of Algorithms Will it run on a larger problem? When will it fail?
Fast Algorithms For Hierarchical Range Histogram Constructions
Lecture 3: Parallel Algorithm Design
Finding a Length-Constrained Maximum-Density Path in a Tree Rung-Ren Lin, Wen-Hsiung Kuo, and Kun-Mao Chao.
April 9, 2015Applied Discrete Mathematics Week 9: Relations 1 Solving Recurrence Relations Another Example: Give an explicit formula for the Fibonacci.
Heaviest Segments in a Number Sequence Kun-Mao Chao ( 趙坤茂 ) Department of Computer Science and Information Engineering National Taiwan University, Taiwan.
Dynamic Programming Reading Material: Chapter 7..
Lowest common ancestors. Write an Euler tour of the tree LCA(1,5) = 3 Shallowest node.
Copyright © Cengage Learning. All rights reserved.
CS 206 Introduction to Computer Science II 09 / 12 / 2008 Instructor: Michael Eckmann.
Dynamic Programming Reading Material: Chapter 7 Sections and 6.
Dynamic Programming Introduction to Algorithms Dynamic Programming CSE 680 Prof. Roger Crawfis.
Managerial Economics Managerial Economics = economic theory + mathematical eco + statistical analysis.
Time Complexity Dr. Jicheng Fu Department of Computer Science University of Central Oklahoma.
Dynamic Programming. Well known algorithm design techniques:. –Divide-and-conquer algorithms Another strategy for designing algorithms is dynamic programming.
Analysis of Algorithms
Dynamic Programming.
CSCI 256 Data Structures and Algorithm Analysis Lecture 14 Some slides by Kevin Wayne copyright 2005, Pearson Addison Wesley all rights reserved, and some.
ALG0183 Algorithms & Data Structures Lecture 6 The maximum contiguous subsequence sum problem. 8/25/20091 ALG0183 Algorithms & Data Structures by Dr Andy.
The LCA Problem Revisited
CSC401: Analysis of Algorithms CSC401 – Analysis of Algorithms Chapter Dynamic Programming Objectives: Present the Dynamic Programming paradigm.
1Computer Sciences Department. Book: Introduction to Algorithms, by: Thomas H. Cormen Charles E. Leiserson Ronald L. Rivest Clifford Stein Electronic:
BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio CS 466 Saurabh Sinha.
1 Pattern Recognition: Statistical and Neural Lonnie C. Ludeman Lecture 29 Nov 11, 2005 Nanjing University of Science & Technology.
Heaviest Segments in a Number Sequence Kun-Mao Chao ( 趙坤茂 ) Department of Computer Science and Information Engineering National Taiwan University, Taiwan.
A Hardware Accelerator for the Fast Retrieval of DIALIGN Biological Sequence Alignments in Linear Space Author: Azzedine Boukerche, Jan M. Correa, Alba.
CS 206 Introduction to Computer Science II 01 / 30 / 2009 Instructor: Michael Eckmann.
A Different Solution  alternatively we can use the following algorithm: 1. if n == 0 done, otherwise I. print the string once II. print the string (n.
CSCI-256 Data Structures & Algorithm Analysis Lecture Note: Some slides by Kevin Wayne. Copyright © 2005 Pearson-Addison Wesley. All rights reserved. 18.
The NP class. NP-completeness Lecture2. The NP-class The NP class is a class that contains all the problems that can be decided by a Non-Deterministic.
On the R ange M aximum-Sum S egment Q uery Problem Kuan-Yu Chen and Kun-Mao Chao Department of Computer Science and Information Engineering, National Taiwan.
1 Chapter 8 The Discrete Fourier Transform (cont.)
Introduction to Algorithms: Divide-n-Conquer Algorithms
The NP class. NP-completeness
All-pairs Shortest paths Transitive Closure
Applied Discrete Mathematics Week 2: Functions and Sequences
Modeling with Recurrence Relations
Analysis of Algorithms
Recursion notes Chapter 8.
Copyright © Cengage Learning. All rights reserved.
Sequence Alignment Kun-Mao Chao (趙坤茂)
Homology Search Tools Kun-Mao Chao (趙坤茂)
Ariel Rosenfeld Bar-Ilan Uni.
Enough Mathematical Appetizers!
Computation.
Analysis and design of algorithm
Homology Search Tools Kun-Mao Chao (趙坤茂)
Algorithm An algorithm is a finite set of steps required to solve a problem. An algorithm must have following properties: Input: An algorithm must have.
Data Structures Review Session
Heaviest Segments in a Number Sequence
Applied Discrete Mathematics Week 6: Computation
A Quick Note on Useful Algorithmic Strategies
Minimizing the Aggregate Movements for Interval Coverage
Divide and Conquer Algorithms Part I
A Note on Useful Algorithmic Strategies
Solving Recurrence Relations
A Note on Useful Algorithmic Strategies
A Note on Useful Algorithmic Strategies
A Note on Useful Algorithmic Strategies
Sequence Alignment Kun-Mao Chao (趙坤茂)
Approximation Algorithms for the Selection of Robust Tag SNPs
A Note on Useful Algorithmic Strategies
A Note on Useful Algorithmic Strategies
Homology Search Tools Kun-Mao Chao (趙坤茂)
Divide and Conquer Merge sort and quick sort Binary search
Algorithm Course Algorithms Lecture 3 Sorting Algorithm-1
Dynamic Programming Kun-Mao Chao (趙坤茂)
Algorithms Tutorial 27th Sept, 2019.
Presentation transcript:

On the Range Maximum-Sum Segment Query Problem Kuan-Yu Chen and Kun-Mao Chao Department of Computer Science and Information Engineering, National Taiwan University, Taiwan 2019/2/21 Chen and Chao

An example – Locating GC-rich regions (1) One reasonable scoring expression to measure the richness of a region is x-p×l , where x is the C+G count of the region, l is the length of the region, and p is a positive ratio constant. The goal is to design an algorithm to report the region that maximizes the expression x-p×l 2019/2/21 Chen and Chao

An example – Locating GC-rich regions (2) Let x be the C+G count of the region, and y be the A+T count of the region Hence, we have x-p×l = x-p×(x+y) = (1-p)×x - p×y Therefore, to calculate the value of x-p×l, one can assign w(G)= w(C)=1-p w(A)=w(T)=-p 2019/2/21 Chen and Chao

The Maximum-Sum Segment Also called the maximum-sum interval or the maximum-scoring region Given a sequence of numbers, the maximum-sum segment is simply the contiguous subsequence having the greatest total sum. <5, -5.1, 1, 3, -4, 2, 3, -4, 7> With greatest total sum = 8 Zero prefix-/suffix-sums are possible. 2019/2/21 Chen and Chao

A Relevant Problem - RMQ Range Minima (Maxima) Query Problem (also called Discrete Range Searching) Given a sequence of numbers, by preprocessing the sequence we wish to retrieve the minimum (maximum) value within a given querying interval efficiently <5, -5.1, 1, 3, -4, 2, 3, -4, 7> Minimum Maximum 2019/2/21 Chen and Chao

Range Maximum-Sum Segment Query Problem Definition: The input is a sequence <a1,a2, …… an> of real numbers which is to be preprocessed. A query is comprised of two intervals S and E. Our goal is to return the maximum-sum segment whose starting index lies in S and end index lies in E. 2019/2/21 Chen and Chao

A Nonoverlapping Example Input Sequence: 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3 Total sum = 6 Starting region End region 2019/2/21 Chen and Chao

An Overlapping Example Input Sequence: 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3 Total sum = 8 Starting region End region 2019/2/21 Chen and Chao

Our Results We propose an algorithm that runs in O(n) preprocessing time and O(1) query time under the unit-cost RAM model. In fact, we show that RMSQ and RMQ are computationally linearly equivalent. We show that the RMSQ techniques yield alternative O(n) time algorithms for the following problems: The maximum-sum segment with length constraints All maximal-sum segments 2019/2/21 Chen and Chao

Strategy Reduce the RMSQ to the RMQ problem Theorem. If there is a <f(n), g(n)>-time solution for the RMQ problem, then there is a <f(n)+O(n), g(n)+O(1)>-time solution for the RMSQ problem. O(n) RMSQ RMQ O(1) 2019/2/21 Chen and Chao

Computing sum(i,j) in O(1) time prefix-sum(i) = a1+a2+…+ai all n prefix sums are computable in O(n) time. sum(i, j) = prefix-sum(j) – prefix-sum(i-1) i j prefix-sum(j) prefix-sum(i-1) 2019/2/21 Chen and Chao

Find the highest point here Find the lowest point here Case 1: Nonoverlapping Maximize Maximize Minimize sum(i, j) = prefix-sum(j) – prefix-sum(i-1) Prefix-sum sequence: 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3 Range Minima Query Find the highest point here Find the lowest point here 2019/2/21 Chen and Chao

Find the highest point here Find the lowest point here Case 2: Overlapping Some problems may occur Prefix-sum sequence 9, -10, 4, -2, 5, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3 Negative Sum !! Find the highest point here Find the lowest point here 2019/2/21 Chen and Chao

Case 2: Overlapping Divide into 3 possible cases: Prefix-sum sequence: 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3 Range Minima Query Preprocessing time = f(n) Query time = g(n) Range Minima Query Preprocessing time = f(n) Query time = g(n) Find the highest point here Find the highest point here What should we do? Find the lowest point here Find the lowest point here 2019/2/21 Chen and Chao

Dealing with the Special Case: Single Range Query Input Sequence: 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3 Challenge: Can this special case be reduced to the RMQ problem? Total sum = 6 2019/2/21 Chen and Chao

Reduction Procedure Step 1. Find a partner for each index. Step 2. Record the sum of each pair in an array Step 3. Retrieve the maximum-sum pair by applying the RMQ techniques 2019/2/21 Chen and Chao

Find a partner within this region Our First Attempt (1) Step 1: For each index i, we define the lowest point preceding i as its partner Prefix-sum sequence: i Lowest point Find a partner within this region 2019/2/21 Chen and Chao

Our First Attempt (2) Step 2: Record sum(partner(i), i) in an array i Lowest point sum(partner(i), i) 2019/2/21 Chen and Chao

Applying RMQ to this sequence The maximum-sum pair can be retrieved Our First Attempt (3) Step 3: Apply the RMQ techniques to the array i Applying RMQ to this sequence Querying this interval The maximum-sum pair can be retrieved Lowest point sum(partner(i), i) 2019/2/21 Chen and Chao

Bump into Difficulties What if its partners go beyond the querying interval? i We might have to update every pair! Needs to be updated partner(i) sum(partner(i), i) 2019/2/21 Chen and Chao

Find the nearest point at least as large as prefix-sum(i) A Better Partner How? Prefix-sum sequence Find the nearest point at least as large as prefix-sum(i) i Left_bound(i) Find the lowest point New partner(i) 2019/2/21 Chen and Chao

Why Is It Better? (1) It remains the best choice. It saves lots of update steps. It turns out that zero or one point needs to be updated. 2019/2/21 Chen and Chao

Why Is It Better? (2) -- Remains the Best Find the nearest point at least as large as prefix-sum(i) i Left_bound(i) Find the lowest point partner(i) 2019/2/21 Impossible region Chen and Chao

Why Is It Better? (3) -- Minimal-Maximal Property Height(partner(i))< Height(j) < Height(i), for all partner(i)< j< i Next higher point Maximal point Minimal point i partner(i) No one higher than i No one lower than partner(i) 2019/2/21 Chen and Chao

Why Is It Better? (4) -- Save Some Updates Prefix-sum sequence Next higher point Can not be the right end of the maximum-sum segment Querying interval i partner(i) No one higher than i 2019/2/21 Chen and Chao

Why Is It Better? (5) -- Nesting Property For two indices i < j, it cannot be the case that partner(i)<partner(j) ≦i<j Maximal point i j Minimal point Minimal point Maximal point partner(j) partner(i) 2019/2/21 Chen and Chao

Why Is It Better? (6) -- An example No overlapping is allowed 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3 Nesting Property 2019/2/21 Chen and Chao

When a Query Comes -- Case 1: No Exceeding The maximum pair (partner(i), i) lies in the querying interval Retrieve the maximum pair Querying interval i partner(i) We are done. Output (partner(i), i). 2019/2/21 Chen and Chao

When a Query Comes -- Case 2: Exceeding The maximum pair (partner(i), i) goes beyond the querying interval Retrieve the maximum pair Retrieve the maximum pair Querying interval j i Maximal Minimal partner(i) Update partner(i) partner(j) (Partner(i), i) is the maximum pair. Compare (new_partner(i), i) and (partner(j), j) Can not be the right end of the maximum-sum segment. Nesting property 2019/2/21 Chen and Chao

Time Complexity RMSQ can be reduced to the RMQ problem in O(n) time Since under the unit-cost RAM model, there is a <O(n), O(1)>-time solution for the RMQ problem, there is a <O(n), O(1)>-time solution for the RMSQ problem. Preprocessing: O(n) RMQ RMSQ Query: O(1) 2019/2/21 Chen and Chao

RMQ  RMSQ On the other hand, RMQ can be reduced to the RMSQ problem in O(n) time, too. (Range Maxima Query: For each two adjacent elements, we augment a negative number whose absolute value is larger than them or simply a negative number larger than the maximum number of the sequence.) RMQ Instance: 6 8 12 7 18 5  RMSQ Instance: 6 -9 8 -13 12 -13 7 -19 18 -19 5 or RMSQ Instance: 6 -19 8 -19 12 -19 7 -19 18 -19 5 2019/2/21 Chen and Chao

Use RMSQ Techniques to Solve Two Relevant Problems 1. Finding the Maximum-Sum Segment with length constraints in O(n) time. - Y.-L. Lin, T. Jiang, K.-M. Chao, 2002 - T.-H Fan et al., 2003 2. Finding all maximal scoring subsequences in O(n) time. - W. L. Ruzzo & M. Tompa, 1999 2019/2/21 Chen and Chao

Problem 1:The Maximum-Sum Segment with Length Constraints Lin, Jiang, and Chao [JCSS 2002] and Fan et al. [CIAA 2003] gave O(n)-time algorithms for this problem. Length at least L, and at most U L U 2019/2/21 Chen and Chao

Problem 1: Finding the Maximum-Sum Segment with Length Constraints Length at least L, at most U For each index i, find the maximum-sum segment whose starting point lies in [i-U+1, i-L+1] and end point is i i RMSQ query L U Runs in O(n) time since each query costs O(1) time 2019/2/21 Chen and Chao

Problem 2: All Maximal-Sum Segments Ruzzo and Tompa [ISMB 1999] gave a O(n)-time algorithm for this problem. Recursive definition. L(S) R(S) S 2019/2/21 Chen and Chao

Problem 2: Finding All Maximal Scoring Subsequences Recursive calls. Input sequence: L(S) R(S) S RMSQ query Runs in O(n) time since each query costs O(1) time 2019/2/21 Chen and Chao