Heaviest Segments in a Number Sequence


Heaviest Segments in a Number Sequence Kun-Mao Chao (趙坤茂) Department of Computer Science and Information Engineering National Taiwan University, Taiwan WWW: http://www.csie.ntu.edu.tw/~kmchao

C+G rich regions
Locate a region with a high C+G ratio.
ATGACTCGAGCTCGTCA
00101011011011010
Each C or G is scored 1 and each A or T is scored 0, so the average of a segment is its C+G ratio.
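A minimal sketch of the 0/1 encoding shown on this slide. The function names `cg_indicator` and `cg_ratio` are illustrative and do not come from the slides; positions are 1-based and inclusive, as in the rest of the deck.

```python
def cg_indicator(dna):
    """Map each base to 1 if it is C or G, and to 0 otherwise."""
    return [1 if base in "CG" else 0 for base in dna.upper()]


def cg_ratio(dna, i, j):
    """Average C+G ratio of positions i..j (1-based, inclusive)."""
    bits = cg_indicator(dna)
    return sum(bits[i - 1:j]) / (j - i + 1)


bits = cg_indicator("ATGACTCGAGCTCGTCA")
print("".join(map(str, bits)))            # 00101011011011010
print(cg_ratio("ATGACTCGAGCTCGTCA", 5, 8))  # 0.75 (bases C T C G)
```

Once the sequence is encoded this way, finding a C+G rich region becomes a question about averages of segments in a number sequence.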

Defining scores for alignment columns
infocon [Stojanovic et al., 1999]: each column is assigned a score that measures its information content, based on the frequencies of the letters both within the column and within the alignment.
CGGATCAT-GGA
CTTAACATTGAA
GAGAACATAGTA

Maximum-sum segment
Given a sequence of real numbers a1 a2 … an, find a consecutive subsequence with the maximum sum.
9 -3 1 7 -15 2 3 -4 2 -7 6 -2 8 4 -9
For each position, we can compute the maximum-sum interval ending at that position in O(n) time. Therefore, a naive algorithm runs in O(n^2) time.

Maximum-sum segment (The recurrence relation)
Define S(i) to be the maximum sum of the segments ending at position i.
S(i) = ai + max{S(i-1), 0}
If S(i-1) < 0, concatenating ai with its previous segment gives a smaller sum than ai itself.
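The recurrence can be sketched in a few lines. This is an illustrative version, not code from the slides: the name `max_sum_ending_at` is made up here, and the list is 0-based, so `S[k]` corresponds to S(k+1).

```python
def max_sum_ending_at(a):
    """Return S, where S[k] is the maximum sum of a segment ending at index k."""
    S = []
    for x in a:
        prev = S[-1] if S else 0
        # Extend the previous best segment only if it does not lower the sum.
        S.append(x + max(prev, 0))
    return S


a = [9, -3, 1, 7, -15, 2, 3, -4, 2, -7, 6, -2, 8, 4, -9]
S = max_sum_ending_at(a)
print(S)       # [9, 6, 7, 14, -1, 2, 5, 1, 3, -4, 6, 4, 12, 16, 7]
print(max(S))  # 16
```

Each S(i) depends only on S(i-1), so the whole table is filled in O(n) time.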

Maximum-sum segment (Tabular computation)
ai      9  -3   1   7 -15   2   3  -4   2  -7   6  -2   8   4  -9
S(i)    9   6   7  14  -1   2   5   1   3  -4   6   4  12  16   7
The maximum sum is 16.

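Maximum-sum interval (Traceback)
ai      9  -3   1   7 -15   2   3  -4   2  -7   6  -2   8   4  -9
S(i)    9   6   7  14  -1   2   5   1   3  -4   6   4  12  16   7
Starting from the position of the maximum S(i), move left as long as the preceding S value is non-negative.
The maximum-sum segment: 6 -2 8 4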
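One possible way to code the traceback, assuming the table is filled with S(i) = ai + max{S(i-1), 0}. The function name `max_sum_segment` and the 0-based return indices are choices made for this sketch.

```python
def max_sum_segment(a):
    """Return (best_sum, start, end) with 0-based inclusive indices."""
    S = []
    for x in a:
        S.append(x + max(S[-1], 0) if S else x)
    # The segment ends where S is largest.
    end = max(range(len(a)), key=S.__getitem__)
    # Trace back: the segment extends left while the previous S is non-negative.
    start = end
    while start > 0 and S[start - 1] >= 0:
        start -= 1
    return S[end], start, end


a = [9, -3, 1, 7, -15, 2, 3, -4, 2, -7, 6, -2, 8, 4, -9]
best, start, end = max_sum_segment(a)
print(best, a[start:end + 1])  # 16 [6, -2, 8, 4]
```

The traceback visits each position at most once, so the total running time stays O(n).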

Computing segment sum in O(1) time?
Input: a sequence of real numbers a1 a2 … an
Query: the sum of ai ai+1 … aj

Computing segment sum in O(1) time
prefix-sum(i) = S[1] + S[2] + … + S[i], where S[k] here denotes the k-th number of the input sequence and prefix-sum(0) = 0. All n prefix sums are computable in O(n) time.
sum(i, j) = prefix-sum(j) - prefix-sum(i-1)

Computing segment average in O(1) time
prefix-sum(i) = S[1] + S[2] + … + S[i], where S[k] here denotes the k-th number of the input sequence and prefix-sum(0) = 0. All n prefix sums are computable in O(n) time.
sum(i, j) = prefix-sum(j) - prefix-sum(i-1)
density(i, j) = sum(i, j) / (j-i+1)
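A short sketch of the prefix-sum idea. The helper names `build_prefix_sums`, `segment_sum` and `density` are illustrative; indices i and j are 1-based and inclusive, and the table stores prefix-sum(0) = 0 in slot 0.

```python
from itertools import accumulate


def build_prefix_sums(a):
    """prefix[i] = a1 + ... + ai, with prefix[0] = 0. Takes O(n) time."""
    return [0] + list(accumulate(a))


def segment_sum(prefix, i, j):
    """Sum of positions i..j (1-based, inclusive) in O(1) time."""
    return prefix[j] - prefix[i - 1]


def density(prefix, i, j):
    """Average of positions i..j (1-based, inclusive) in O(1) time."""
    return segment_sum(prefix, i, j) / (j - i + 1)


a = [9, -3, 1, 7, -15, 2, 3, -4, 2, -7, 6, -2, 8, 4, -9]
prefix = build_prefix_sums(a)
print(segment_sum(prefix, 11, 14))  # 16, the segment 6 -2 8 4
print(density(prefix, 11, 14))      # 4.0
```

After the O(n) preprocessing, every sum or density query is a constant number of arithmetic operations.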

Maximum-average segment
3 2 14 6 6 2 10 2 6 6 14 2 1
With no length constraint, the maximum element is the answer, because the average of a segment never exceeds its largest element. It can be found in O(n) time.

Maximum average segments
Define A(i) to be the maximum average of the segments ending at position i.
How to compute A(i) efficiently?

Left-Skew Decomposition
Partition S into substrings S1, S2, …, Sk such that
each Si is a left-skew substring of S: the average of any suffix of Si is always less than or equal to the average of the remaining prefix;
density(S1) < density(S2) < … < density(Sk).
Compute A(i) in linear time.
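To make the definition concrete, here is a brute-force check written directly from it. The name `is_left_skew` is hypothetical, and this is a definition checker, not the linear-time decomposition algorithm. Averages are compared by cross-multiplication to avoid floating-point division.

```python
def is_left_skew(seg):
    """True if every proper suffix has average <= the average of the remaining prefix."""
    n = len(seg)
    total = sum(seg)
    suffix_sum = 0
    for k in range(1, n):  # k is the suffix length
        suffix_sum += seg[n - k]
        prefix_sum = total - suffix_sum
        # suffix_sum / k <= prefix_sum / (n - k), cross-multiplied
        if suffix_sum * (n - k) > prefix_sum * k:
            return False
    return True


print(is_left_skew([8, 2, 7, 3]))  # True
print(is_left_skew([2, 8]))        # False: suffix average 8 exceeds prefix average 2
```

A single number has no proper suffix, so it is trivially left-skew.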

Left-Skew Decomposition
Increasingly left-skew decomposition (O(n) time)
Sequence:  8 2 7 3 | 8 9 1 | 8 7 | 9
Density:      5    |   6   | 7.5 | 9

Right-Skew Decomposition
Partition S into substrings S1, S2, …, Sk such that
each Si is a right-skew substring of S: the average of any prefix of Si is always less than or equal to the average of the remaining suffix;
density(S1) > density(S2) > … > density(Sk).
[Lin, Jiang, Chao] The decomposition is unique and computable in linear time.
[Photos: the inventors of the right-skew decomposition]

Right-Skew Decomposition
Decreasingly right-skew decomposition (O(n) time)
Sequence:  9 | 7 8 | 1 9 8 | 3 7 2 8
Density:   9 | 7.5 |   6   |    5
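One way to obtain this decomposition is a left-to-right scan with a stack. It relies on the fact that two adjacent right-skew segments whose densities do not decrease can be merged into one right-skew segment. This is a sketch under that assumption, not necessarily the procedure used in the slides, and `right_skew_decomposition` is an illustrative name.

```python
def right_skew_decomposition(a):
    """Return the decreasingly right-skew partition of a as a list of segments."""
    stack = []  # one (total, length) pair per segment found so far
    for x in a:
        total, length = x, 1
        # Merge while density(previous segment) <= density(current segment).
        while stack and stack[-1][0] * length <= total * stack[-1][1]:
            prev_total, prev_length = stack.pop()
            total += prev_total
            length += prev_length
        stack.append((total, length))
    # Turn the segment lengths back into slices of the input.
    segments, start = [], 0
    for _, length in stack:
        segments.append(a[start:start + length])
        start += length
    return segments


print(right_skew_decomposition([9, 7, 8, 1, 9, 8, 3, 7, 2, 8]))
# [[9], [7, 8], [1, 9, 8], [3, 7, 2, 8]]
```

Every element is pushed once and each merge removes a stack entry, so the scan takes O(n) time. Running the same function on the reversed sequence and reversing the result gives the increasingly left-skew decomposition.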

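Right-Skew pointers p[ ]
The segment from position i to p[i] is the first segment of the decreasingly right-skew decomposition of the suffix ai … an.
i                  1    2    3    4    5    6    7    8    9   10
ai                 9    7    8    1    9    8    3    7    2    8
p[i]               1    3    3    6    5    6   10    8   10   10
density(i, p[i])   9  7.5    8    6    9    8    5    7    5    8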
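The pointers can be computed by a right-to-left scan that repeatedly absorbs the next segment while its density is at least that of the current one. The sketch below assumes that reading of p[ ]; the name `right_skew_pointers` is illustrative. It works 0-based internally and returns 1-based pointers to match the table.

```python
def right_skew_pointers(a):
    """Return p (1-based) where a[i..p[i]] is the first right-skew segment of a[i..n]."""
    n = len(a)
    p = list(range(n))  # 0-based right end of the segment starting at each index
    total = list(a)     # sum of a[i..p[i]]
    length = [1] * n    # number of elements in a[i..p[i]]
    for i in range(n - 2, -1, -1):
        while p[i] < n - 1:
            j = p[i] + 1  # start of the next segment
            # Stop once density(i, p[i]) > density(j, p[j]).
            if total[i] * length[j] > total[j] * length[i]:
                break
            total[i] += total[j]
            length[i] += length[j]
            p[i] = p[j]
    return [q + 1 for q in p]


print(right_skew_pointers([9, 7, 8, 1, 9, 8, 3, 7, 2, 8]))
# [1, 3, 3, 6, 5, 6, 10, 8, 10, 10]
```

Following the pointers from position 1 (1, then p[1]+1, and so on) lists the segments of the decomposition of the whole sequence.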