On the Range Maximum-Sum Segment Query Problem
Kuan-Yu Chen and Kun-Mao Chao
Department of Computer Science and Information Engineering, National Taiwan University, Taiwan


On the Range Maximum-Sum Segment Query Problem
Kuan-Yu Chen and Kun-Mao Chao
Department of Computer Science and Information Engineering, National Taiwan University, Taiwan
2004/12

Outline
- Motivation
  - Problems raised by bioinformatics applications
- Definition of our research problem (RMSQ)
- Our main idea
  - Finding a partner for each index
  - Reducing the problem to the Range Minima Query problem (RMQ)
- Conclusions and applications
  - Solving three relevant problems in O(n) time

Applications to biomolecular sequence analysis
- Locating conserved regions or GC-rich regions
  - Assign a real number (also called a score) to each residue
  - Look for the maximum-sum or maximum-average segments
  - With length constraints or an average lower bound

What is a Maximum-Sum Segment?
- Also called maximum-sum intervals or maximum scoring regions
- Given a sequence of numbers, the maximum-sum segment is simply the contiguous subsequence having the greatest total sum.
- A segment with a zero-sum prefix or suffix is not allowed (trimming it would give a shorter segment with the same sum).
- Example (figure): total sum = 8
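The following is a minimal Python sketch of the definition above. It is a Kadane-style scan over prefix sums, and it assumes 1-based inclusive indices as on these slides. Ties are broken so that the returned segment has no zero-sum prefix (the latest lowest point is kept) and no zero-sum suffix (the earliest end is kept). The function name is illustrative, not from the paper.

```python
from typing import List, Tuple

def max_sum_segment(a: List[float]) -> Tuple[float, int, int]:
    """Return (best_sum, start, end) with 1-based inclusive indices.

    Ties are broken so the segment has no zero-sum prefix or suffix.
    Returns (-inf, 0, 0) for an empty sequence.
    """
    best = (float("-inf"), 0, 0)
    prefix = 0              # prefix-sum(j)
    low, low_pos = 0, 0     # lowest prefix-sum(k) seen so far, k < j
    for j, x in enumerate(a, start=1):
        prefix += x
        if prefix - low > best[0]:      # strict: keep the earliest end
            best = (prefix - low, low_pos + 1, j)
        if prefix <= low:               # non-strict: keep the latest low point
            low, low_pos = prefix, j
    return best
```

On the example sequence 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3, both [1, 1] and [11, 13] sum to 9. This sketch returns the earlier one, (9, 1, 1).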

Finding the maximum-sum segment with length constraints
- Length at least L and at most U
- Lin, Jiang, and Chao [JCSS 2002] and Fan et al. [CIAA 2003] each gave an O(n)-time algorithm for this problem.
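One standard way to meet the O(n) bound is a sliding-window minimum over prefix sums. For an end j, admissible starts satisfy L <= j - k <= U, where k is the prefix index just before the start. A monotone deque keeps the lowest admissible prefix sum at its front. This Python sketch illustrates that technique; it is not necessarily the method of Lin, Jiang, and Chao or of Fan et al. The function name is hypothetical, and it assumes 1 <= L <= U.

```python
from collections import deque
from typing import List, Optional, Tuple

def max_sum_segment_len(a: List[float], L: int, U: int) -> Optional[Tuple[float, int, int]]:
    """Maximum-sum segment with L <= length <= U (assumes 1 <= L <= U).

    Returns (sum, start, end), 1-based inclusive, or None if len(a) < L.
    """
    n = len(a)
    p = [0] * (n + 1)
    for i, x in enumerate(a, start=1):
        p[i] = p[i - 1] + x
    best = None
    window = deque()  # candidate k (segment is k+1..j) with p[k] increasing
    for j in range(L, n + 1):
        k = j - L                        # newest admissible k
        while window and p[window[-1]] >= p[k]:
            window.pop()                 # dominated: later and not lower
        window.append(k)
        while window[0] < j - U:         # segment would be longer than U
            window.popleft()
        s = p[j] - p[window[0]]
        if best is None or s > best[0]:
            best = (s, window[0] + 1, j)
    return best
```

Each index enters and leaves the deque at most once, so the scan is linear. On the example sequence with L = 2 and U = 3, the sketch returns (9, 11, 13).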

Finding all maximal-sum segments
- Ruzzo and Tompa [ISMB 1999] gave an O(n)-time algorithm for this problem.
- Recursive calls: find the maximum-sum segment, then recurse on the parts to its left and right.
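The recursive procedure can be sketched in Python as follows. This is the straightforward recursion, not Ruzzo and Tompa's linear-time algorithm. It takes the maximum-sum segment of a range, reports it if its sum is positive, and repeats on the parts to its left and right. Each step rescans its range, so the worst case is O(n^2). Function names are illustrative, and indices are 1-based.

```python
from typing import List, Tuple

def _best_in_range(a: List[float], lo: int, hi: int) -> Tuple[float, int, int]:
    """Max-sum segment inside a[lo..hi] (1-based, inclusive) as (sum, start, end).

    Latest low point / earliest end on ties: no zero-sum prefix or suffix.
    """
    best = (float("-inf"), lo, lo)
    prefix, low, low_pos = 0, 0, lo - 1
    for j in range(lo, hi + 1):
        prefix += a[j - 1]
        if prefix - low > best[0]:
            best = (prefix - low, low_pos + 1, j)
        if prefix <= low:
            low, low_pos = prefix, j
    return best

def all_maximal_segments(a: List[float]) -> List[Tuple[int, int, float]]:
    """All maximal scoring segments as (start, end, sum), sorted by start."""
    found = []
    pending = [(1, len(a))]          # explicit stack instead of recursion
    while pending:
        lo, hi = pending.pop()
        if lo > hi:
            continue
        s, i, j = _best_in_range(a, lo, hi)
        if s <= 0:                   # only positive-sum segments are reported
            continue
        found.append((i, j, s))
        pending.append((lo, i - 1))  # part to the left
        pending.append((j + 1, hi))  # part to the right
    return sorted(found)
```

On the example sequence this yields (1, 1, 9), (3, 9, 8), (11, 13, 9), and (15, 15, 3).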

Finding the longest segment with average constraints
- Wang and Xu [Bioinformatics 2003] gave a linear-time algorithm for finding the longest segment whose average is at least a given lower bound.
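The average constraint can be removed by subtracting the lower bound A from every element. A segment has average at least A exactly when its shifted sum is non-negative. The task then becomes finding the widest pair of prefix indices k < j with q[k] <= q[j]. The Python sketch below does this with a two-pointer scan over prefix minima and suffix maxima in O(n) time. It is a standard technique offered for illustration, not necessarily Wang and Xu's algorithm. The function name is hypothetical, and indices are 1-based.

```python
from typing import List, Optional, Tuple

def longest_segment_avg_at_least(a: List[float], A: float) -> Optional[Tuple[int, int]]:
    """Longest segment with average >= A as (start, end), 1-based; None if none."""
    n = len(a)
    q = [0.0] * (n + 1)                # prefix sums of (a[i] - A)
    for i, x in enumerate(a, start=1):
        q[i] = q[i - 1] + (x - A)
    left_min = q[:]                     # left_min[i] = min(q[0..i])
    for i in range(1, n + 1):
        left_min[i] = min(left_min[i - 1], q[i])
    right_max = q[:]                    # right_max[j] = max(q[j..n])
    for j in range(n - 1, -1, -1):
        right_max[j] = max(right_max[j + 1], q[j])
    best, best_pair = 0, None
    i = j = 0
    while i <= n and j <= n:
        if right_max[j] >= left_min[i]:  # some k <= i, j' >= j with q[k] <= q[j']
            if j - i > best:
                best, best_pair = j - i, (i + 1, j)
            j += 1
        else:
            i += 1
    return best_pair
```

With A = 1 on the example sequence, the sketch returns (3, 9). The segment 4, -2, 4, -5, 4, -3, 6 has sum 8 over 7 elements. For fractional bounds, exact arithmetic (for instance `fractions.Fraction`) avoids rounding issues at the boundary.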

Our results
- We propose an algorithm that runs in O(n) preprocessing time and O(1) query time.
- We use the RMSQ techniques we developed to solve the three problems mentioned above in O(n) time.

Problem Definition
Range Maximum-Sum Segment Query problem (RMSQ)
- The input is a sequence of real numbers, which is to be preprocessed.
- A query consists of two intervals [i, j] and [k, l]. The goal is to return the maximum-sum segment whose starting index lies in [i, j] and whose ending index lies in [k, l].
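To pin down the query semantics, here is a brute-force reference in Python. It tries every admissible (start, end) pair, assuming 1-based inclusive intervals and start <= end. It costs O(n^2) per query and serves only as a baseline for checking faster methods. The function name is hypothetical.

```python
from typing import List, Optional, Tuple

def rmsq_brute_force(a: List[float], i: int, j: int, k: int, l: int) -> Optional[Tuple[float, int, int]]:
    """Answer one RMSQ query by exhaustive search.

    Returns (sum, start, end) maximizing sum(start, end) over i <= start <= j,
    k <= end <= l, start <= end (1-based, inclusive); None if no pair exists.
    """
    p = [0] * (len(a) + 1)
    for t, x in enumerate(a, start=1):
        p[t] = p[t - 1] + x
    best = None
    for s in range(i, j + 1):
        for e in range(max(k, s), l + 1):
            total = p[e] - p[s - 1]     # sum(s, e) = prefix-sum(e) - prefix-sum(s-1)
            if best is None or total > best[0]:
                best = (total, s, e)
    return best
```

For example, with starting interval [3, 7] and ending interval [7, 11] on the example sequence, the sketch returns (8, 3, 9).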

A Nonoverlapping Example
Input sequence: 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3
(Figure: a starting region and an end region that do not overlap; the answer has total sum = 6.)

An Overlapping Example
Input sequence: 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3
(Figure: a starting region and an end region that overlap; the answer has total sum = 8.)

Main Idea
- Reduce the RMSQ problem to the RMQ problem.
- Theorem. If there is an ⟨f(n), g(n)⟩-time solution for the RMQ problem (f(n) preprocessing time, g(n) query time), then there is an ⟨f(n) + O(n), g(n) + O(1)⟩-time solution for the RMSQ problem.
- With an RMQ solution using O(n) preprocessing and O(1) query time, RMSQ is solved in O(n) preprocessing time and O(1) query time.

A relevant problem: RMQ
Range Minima Query problem (also called Discrete Range Searching): preprocess an array A so that each query RMQ(i, j) returns the position of a minimum element in A[i..j].
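A sparse table is a simple RMQ structure with O(n log n) preprocessing and O(1) per query. It is used here only for illustration. It does not meet the O(n) preprocessing bound that the main result relies on, which needs a linear-time RMQ structure. Class and method names are hypothetical, and positions are 0-based.

```python
from typing import List, Sequence

class SparseTableRMQ:
    """Range-minimum queries via a sparse table.

    O(n log n) preprocessing, O(1) per query; returns the leftmost minimum.
    """

    def __init__(self, values: Sequence[float]):
        self.values = list(values)
        n = len(self.values)
        self.log = [0] * (n + 1)                 # log[m] = floor(log2(m))
        for m in range(2, n + 1):
            self.log[m] = self.log[m // 2] + 1
        self.table: List[List[int]] = [list(range(n))]  # level 0: each position
        k = 1
        while (1 << k) <= n:
            prev, half = self.table[-1], 1 << (k - 1)
            # level k, entry i: argmin over values[i .. i + 2^k - 1]
            self.table.append([self._pick(prev[i], prev[i + half])
                               for i in range(n - (1 << k) + 1)])
            k += 1

    def _pick(self, a: int, b: int) -> int:
        return a if self.values[a] <= self.values[b] else b

    def argmin(self, i: int, j: int) -> int:
        """Position of a minimum of values[i..j] (0-based, inclusive, i <= j)."""
        k = self.log[j - i + 1]
        # two overlapping power-of-two blocks cover [i, j]
        return self._pick(self.table[k][i], self.table[k][j - (1 << k) + 1])
```

A range-maximum query can reuse the same class on negated values.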

Cumulative sum
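Cumulative (prefix) sums turn every segment sum into a difference of two entries. This is what lets segment queries become lowest-point and highest-point queries. The minimal Python sketch below uses illustrative helper names and 1-based positions as on the slides.

```python
from itertools import accumulate
from typing import List

def prefix_sums(a: List[float]) -> List[float]:
    """p[0] = 0 and p[j] = a_1 + ... + a_j (1-based positions)."""
    return list(accumulate(a, initial=0))

def segment_sum(p: List[float], i: int, j: int) -> float:
    """sum(i, j) = prefix-sum(j) - prefix-sum(i-1), 1-based and inclusive."""
    return p[j] - p[i - 1]
```

For the example sequence, `prefix_sums` gives [0, 9, -1, 3, 1, 5, 0, 4, 1, 7, -4, 4, 1, 5, 0, 3], and `segment_sum(p, 3, 9)` is 7 - (-1) = 8.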

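Case 1: Nonoverlapping
sum(i, j) = prefix-sum(j) - prefix-sum(i-1)
Input sequence: 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3
- Maximize: find a highest point of the prefix sums in the end region.
- Minimize: find a lowest point of the prefix sums in the starting region (shifted one position left, because sum(i, j) uses prefix-sum(i-1)).
- Since every start precedes every end, the two choices are independent, so this case can be reduced to the RMQ problem.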
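Here is a minimal Python sketch of Case 1, assuming j <= k so that every start precedes every end. The lowest point is taken over prefix-sum indices i-1..j-1 and the highest over k..l. Linear scans stand in for the two O(1) RMQ lookups. The function name is hypothetical.

```python
from itertools import accumulate
from typing import List, Tuple

def rmsq_nonoverlapping(a: List[float], i: int, j: int, k: int, l: int) -> Tuple[float, int, int]:
    """RMSQ query with j <= k (1-based, inclusive); returns (sum, start, end)."""
    p = list(accumulate(a, initial=0))
    low = min(range(i - 1, j), key=lambda t: p[t])     # lowest point: minimize p[s-1]
    high = max(range(k, l + 1), key=lambda t: p[t])    # highest point: maximize p[e]
    return p[high] - p[low], low + 1, high
```

With starting region [1, 5] and end region [8, 12] on the example sequence, the sketch returns (8, 3, 9).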

Case 2: Overlapping
Some problems occur in the overlapping case:
Input sequence: 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3
- Find a lowest point in the starting region and a highest point in the end region, as before.
- Because the regions overlap, the lowest point may lie after the highest point. Their difference is then a negative sum that does not correspond to any valid segment!

Case 2: Overlapping
Divide into 3 possible cases (for a query with i <= k <= j <= l):
Input sequence: 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3
1. Start in [i, k-1], end in [k, l]: nonoverlapping; find a lowest point and a highest point.
2. Start in [k, j], end in [j+1, l]: nonoverlapping; find a lowest point and a highest point.
3. Start and end both in [k, j]: a single range query.
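The three-way split can be sketched in Python as follows, assuming i <= k <= j <= l. Cases 1 and 2 use the nonoverlapping computation (lowest point, then highest point). Case 3, the single range query, is handled here by a plain Kadane-style scan as a stand-in. The function name is hypothetical, and linear scans mark where RMQ lookups would go.

```python
from itertools import accumulate
from typing import List, Tuple

def rmsq_overlapping(a: List[float], i: int, j: int, k: int, l: int) -> Tuple[float, int, int]:
    """One RMSQ query when i <= k <= j <= l (1-based); returns (sum, start, end)."""
    p = list(accumulate(a, initial=0))

    def nonoverlapping(s_lo, s_hi, e_lo, e_hi):
        # every start precedes every end, so lowest and highest are independent
        low = min(range(s_lo - 1, s_hi), key=lambda t: p[t])
        high = max(range(e_lo, e_hi + 1), key=lambda t: p[t])
        return (p[high] - p[low], low + 1, high)

    def single_range(x, y):
        # best segment with start and end both in [x, y]
        best, low = None, x - 1
        for e in range(x, y + 1):
            if best is None or p[e] - p[low] > best[0]:
                best = (p[e] - p[low], low + 1, e)
            if p[e] <= p[low]:
                low = e
        return best

    candidates = [single_range(k, j)]                        # case 3
    if i < k:
        candidates.append(nonoverlapping(i, k - 1, k, l))    # case 1
    if j < l:
        candidates.append(nonoverlapping(k, j, j + 1, l))    # case 2
    return max(candidates, key=lambda c: c[0])
```

For example, with starts in [3, 9] and ends in [5, 12] on the example sequence, case 1 wins with (8, 3, 9).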

A special case of RMSQ: single range query (the starting and ending indices both lie in the same interval)
Input sequence: 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3
Challenge: Can this special case be reduced to the RMQ problem?
(Figure example: total sum = 6)

Idea
Step 1. Find a partner for each index.
Step 2. Record the sum of each pair in an array.
Step 3. Reduce to the RMQ problem: retrieve the maximum-sum pair within the querying interval.

Our First Attempt (1)
Step 1: For each index i, we define the lowest point preceding i as its partner.
(Figure: i and partner(i))

Our First Attempt (2)
Step 2: Record sum(i, partner(i)) in an array.
(Figure: i, partner(i), sum(i, partner(i)))

Our First Attempt (3)
Step 3: Apply the RMQ techniques to the array to retrieve the maximum-sum pair.
(Figure: i, partner(i), sum(i, partner(i)))
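Steps 1 and 2 of the first attempt can be sketched in Python. The partner of each end j is the lowest prefix-sum point strictly before j, and the recorded value is the pair sum. Names are hypothetical; ends are 1-based, and partners are prefix-sum indices.

```python
from itertools import accumulate
from typing import List, Tuple

def first_attempt_pairs(a: List[float]) -> Tuple[List[int], List[float]]:
    """partner(j) = argmin of p[0..j-1]; pair sum = p[j] - p[partner(j)]."""
    p = list(accumulate(a, initial=0))
    partners, sums = [], []
    low = 0                               # argmin of p[0..j-1]
    for j in range(1, len(p)):
        partners.append(low)
        sums.append(p[j] - p[low])
        if p[j] < p[low]:                 # j becomes the lowest point so far
            low = j
    return partners, sums
```

On the example sequence, the pair sums are 9, -1, 4, 2, 6, 1, 5, 2, 8, -3, 8, 5, 9, 4, 7. For the querying interval [12, 15], the largest of these in range is 9 at j = 13. However, its partner is prefix index 10, so that segment would start at 11, outside the interval.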

Faults
What if the partner goes beyond the querying interval?
(Figure: i, partner(i), sum(i, partner(i)))
- Such entries need to be updated, and in the worst case many entries are affected by a single query.

A Better Partner

Nesting Property
Input sequence: 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3
Update can be done in O(1) time.
Array of pair sums with the better partners: 9, -10, 4, -2, 6, -5, 4, -3, 8, -11, 8, -3, 9, -5, 3
Apply RMQ techniques.
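The sketch below is a reconstruction, not the authors' code. Its partner rule was inferred so that it reproduces the array of pair sums on this slide. For end j, let L_j be the nearest earlier prefix index whose prefix sum is at least prefix-sum(j); the partner is the lowest point in [L_j, j-1]. A stack computes all partners in O(n).

The query routine is one way to use the nesting property with a constant number of range lookups, and it may differ from the authors' update rule. It works in three steps:
1. Take the highest point `top` in the range and pair it with the lowest point before it. Ends before `top` never beat extending to `top`.
2. For ends after `top`, use the stored pair sums. Their partners lie at or after `top`, so they stay inside the range.
3. Return the better of the two.

Plain scans mark where O(1) RMQ lookups would go. All names are hypothetical.

```python
from itertools import accumulate
from typing import List, Tuple

def better_partners(a: List[float]):
    """For each end j (1-based): L_j = largest k < j with p[k] >= p[j] (0 if none),
    partner[j] = argmin of p over [L_j, j-1]. Returns (p, partner, pair_sum)."""
    p = list(accumulate(a, initial=0))
    n = len(a)
    partner, pair_sum = [0] * (n + 1), [0] * (n + 1)
    # Stack of (k, m): prefix index k (p non-increasing bottom to top) and
    # m = argmin of p over (index of the entry below, k].
    stack = [(0, 0)]
    for j in range(1, n + 1):
        m = None
        while stack and p[stack[-1][0]] < p[j]:   # pop entries lower than p[j]
            _, mk = stack.pop()
            if m is None or p[mk] < p[m]:
                m = mk
        partner[j] = j - 1 if m is None else m    # nothing popped: L_j = j - 1
        pair_sum[j] = p[j] - p[partner[j]]
        stack.append((j, j if m is None else m))
    return p, partner, pair_sum

def single_range_query(pre, x: int, y: int) -> Tuple[float, int, int]:
    """Best segment with start and end both in [x, y] (1-based)."""
    p, partner, pair_sum = pre
    top = max(range(x, y + 1), key=lambda t: p[t])      # highest point in [x, y]
    low = min(range(x - 1, top), key=lambda t: p[t])    # lowest point before it
    best = (p[top] - p[low], low + 1, top)
    if top < y:
        # ends after `top` have partners at or after `top`, inside the range
        e = max(range(top + 1, y + 1), key=lambda t: pair_sum[t])
        if pair_sum[e] > best[0]:
            best = (pair_sum[e], partner[e] + 1, e)
    return best
```

Preprocess once with `pre = better_partners(seq)` and then answer many queries. For the interval [12, 15], the answer is (4, 13, 13).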

Using RMSQ Techniques to Solve the Other Two Relevant Problems
1. Finding the maximum-sum segment with length constraints in O(n) time.
   - Y.-L. Lin, T. Jiang, K.-M. Chao; T.-H. Fan, S. Lee, H.-I. Lu, T.-S. Tsou
2. Finding all maximal scoring subsequences in O(n) time.
   - W. L. Ruzzo & M. Tompa, 1999

Maximum-Sum Segment with length constraints
- Length at least L and at most U
- Runs in O(n) time, since each query costs O(1) time.

All Maximal Scoring Subsequences
- Recursive calls
- Runs in O(n) time, since each query costs O(1) time.

The End
Thank You.