Heaviest Segments in a Number Sequence Kun-Mao Chao ( 趙坤茂 ) Department of Computer Science and Information Engineering National Taiwan University, Taiwan.

Presentation transcript:

Heaviest Segments in a Number Sequence Kun-Mao Chao ( 趙坤茂 ) Department of Computer Science and Information Engineering National Taiwan University, Taiwan WWW:

2 C+G rich regions: locate a region with a high average C+G ratio. Example sequence: ATGACTCGAGCTCGTCA

3 Defining scores for alignment columns. infocon [Stojanovic et al., 1999]: each column is assigned a score that measures its information content, based on the frequencies of the letters both within the column and within the alignment. Example alignment rows: CGGATCAT-GGA CTTAACATTGAA GAGAACATAGTA

4 Maximum-sum segment. Given a sequence of real numbers a_1, a_2, …, a_n, find a consecutive subsequence with the maximum sum. 9 –3 1 7 – –4 2 –7 6 – For each position, we can compute the maximum-sum interval ending at that position in O(n) time. Therefore, a naive algorithm runs in O(n^2) time.
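The naive approach described on this slide can be sketched as follows. This is a minimal illustration, not code from the slides; the function name `max_sum_segment_naive` and the sample inputs are assumptions made for the example.

```python
def max_sum_segment_naive(a):
    """Return the largest sum over all non-empty consecutive segments of a.

    For every right end j, scan leftwards to find the best segment ending
    at j. Each scan costs O(n), so the whole procedure costs O(n^2).
    """
    if not a:
        raise ValueError("sequence must be non-empty")
    best = a[0]
    for j in range(len(a)):            # j is the right end of the segment
        running = 0
        for i in range(j, -1, -1):     # move the left end from j down to 0
            running += a[i]
            best = max(best, running)
    return best


if __name__ == "__main__":
    print(max_sum_segment_naive([2, -5, 3, 4, -1]))
```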

5 Maximum-sum segment (The recurrence relation). Define S(i) to be the maximum sum of the segments ending at position i. If S(i-1) < 0, concatenating a_i with its previous segment gives a smaller sum than a_i itself, so S(i) = a_i + max{S(i-1), 0}.
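A short sketch of how S(i) can be filled in from left to right, using the standard form of the recurrence, S(i) = a_i + max(S(i-1), 0), which follows from the observation on this slide. The function name `max_sum_ending_at` and the sample input are illustrative assumptions; positions are 0-based in the code.

```python
def max_sum_ending_at(a):
    """Return the list S where S[i] is the maximum sum of a segment ending at i.

    A negative S[i-1] can only hurt, so in that case the best segment
    ending at i starts afresh at i.
    """
    s = []
    for x in a:
        prev = s[-1] if s else 0       # no previous segment before position 0
        s.append(x + max(prev, 0))
    return s


if __name__ == "__main__":
    table = max_sum_ending_at([2, -5, 3, 4, -1])
    print(table)        # the S(i) row
    print(max(table))   # the maximum sum over all segments
```

The whole table takes O(n) time, and its largest entry is the maximum segment sum.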

6 Maximum-sum segment (Tabular computation) [Table: the input sequence with the S(i) value computed under each position; the largest S(i) is the maximum sum.]

7 Maximum-sum interval (Traceback) [Table: the same sequence and S(i) row, traced back from the largest S(i) to mark the maximum-sum segment.]
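One way to carry out the traceback is sketched below: locate the largest S(i), then walk left for as long as the preceding S value is positive, since that is exactly when the segment was extended rather than restarted. The function name `max_sum_segment`, the 0-based indexing, and the tie-breaking (a zero-valued predecessor is not included) are assumptions of this sketch.

```python
def max_sum_segment(a):
    """Return (best_sum, start, end) with 0-based inclusive indices."""
    if not a:
        raise ValueError("sequence must be non-empty")
    # Forward pass: S[i] = a[i] + max(S[i-1], 0).
    s = []
    for x in a:
        prev = s[-1] if s else 0
        s.append(x + max(prev, 0))
    # The right end is a position where S is largest.
    end = max(range(len(a)), key=lambda i: s[i])
    # Traceback: the segment was extended whenever S[i-1] > 0.
    start = end
    while start > 0 and s[start - 1] > 0:
        start -= 1
    return s[end], start, end


if __name__ == "__main__":
    best, i, j = max_sum_segment([2, -5, 3, 4, -1])
    print(best, i, j)
```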

8 Computing segment sum in O(1) time? Input: a sequence of real numbers a_1, a_2, …, a_n. Query: the sum a_i + a_{i+1} + … + a_j.

9 Computing segment sum in O(1) time. prefix-sum(i) = S[1] + S[2] + … + S[i], where S[k] here denotes the k-th number of the input sequence; all n prefix sums are computable in O(n) time. sum(i, j) = prefix-sum(j) – prefix-sum(i-1).
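The prefix-sum idea can be sketched as follows. The helper names `build_prefix_sums` and `segment_sum` are hypothetical; positions are 1-based to match the slide, with prefix-sum(0) = 0 stored at index 0.

```python
def build_prefix_sums(a):
    """Return prefix where prefix[i] = a_1 + ... + a_i and prefix[0] = 0."""
    prefix = [0]
    for x in a:
        prefix.append(prefix[-1] + x)   # O(n) preprocessing in total
    return prefix


def segment_sum(prefix, i, j):
    """Sum of a_i..a_j (1-based, inclusive) in O(1) time."""
    return prefix[j] - prefix[i - 1]


if __name__ == "__main__":
    prefix = build_prefix_sums([2, -5, 3, 4, -1])
    print(segment_sum(prefix, 3, 4))
```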

10 Computing segment average in O(1) time. prefix-sum(i) = S[1] + S[2] + … + S[i], where S[k] here denotes the k-th number of the input sequence; all n prefix sums are computable in O(n) time. sum(i, j) = prefix-sum(j) – prefix-sum(i-1). density(i, j) = sum(i, j) / (j-i+1).
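A minimal sketch of the density query built on prefix sums. The name `make_density` is hypothetical, and the use of `Fraction` assumes integer inputs so that averages are compared exactly; with real-valued inputs, ordinary division would be used instead.

```python
from fractions import Fraction
from itertools import accumulate


def make_density(a):
    """Preprocess a in O(n) and return a function answering density(i, j) in O(1).

    Positions are 1-based and inclusive, matching the slide's notation.
    """
    prefix = [0] + list(accumulate(a))   # prefix[i] = a_1 + ... + a_i

    def density(i, j):
        return Fraction(prefix[j] - prefix[i - 1], j - i + 1)

    return density


if __name__ == "__main__":
    density = make_density([2, -5, 3, 4, -1])
    print(density(3, 4))
```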

11 Maximum-average segment. With no length constraint, the maximum element alone is the answer, so it can be found in O(n) time.

12 Maximum average segments Define A(i) to be the maximum average of the segments ending at position i. How to compute A(i) efficiently?

13 Left-Skew Decomposition. Partition S into substrings S_1, S_2, …, S_k such that each S_i is a left-skew substring of S (the average of any suffix is always less than or equal to the average of the remaining prefix) and density(S_1) < density(S_2) < … < density(S_k). This allows A(i) to be computed in linear time.

14 Left-Skew Decomposition Increasingly left-skew decomposition (O(n) time)
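One possible way to maintain the increasingly left-skew decomposition while scanning left to right is a stack of (sum, length) pairs, sketched below. This is an illustration under stated assumptions rather than the slides' own procedure: the name `max_average_ending_at` is hypothetical, inputs are assumed to be integers so that `Fraction` gives exact averages, and A(i) is read off as the density of the last segment on the stack.

```python
from fractions import Fraction


def max_average_ending_at(a):
    """Return the list A where A[i] is the maximum average of a segment ending at i.

    The stack holds the increasingly left-skew decomposition of a[0..i]:
    segment densities strictly increase from bottom to top.
    """
    stack = []     # each entry is (segment_sum, segment_length)
    result = []
    for x in a:
        seg_sum, seg_len = x, 1
        # Merge while the previous segment is at least as dense as the new one;
        # the comparison prev_sum/prev_len >= seg_sum/seg_len is cross-multiplied.
        while stack and stack[-1][0] * seg_len >= seg_sum * stack[-1][1]:
            prev_sum, prev_len = stack.pop()
            seg_sum += prev_sum
            seg_len += prev_len
        stack.append((seg_sum, seg_len))
        result.append(Fraction(seg_sum, seg_len))
    return result


if __name__ == "__main__":
    print(max_average_ending_at([1, 3, 2]))
```

Each element is pushed once and popped at most once, so the scan takes O(n) time overall.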

15 Right-Skew Decomposition. Partition S into substrings S_1, S_2, …, S_k such that each S_i is a right-skew substring of S (the average of any prefix is always less than or equal to the average of the remaining suffix) and density(S_1) > density(S_2) > … > density(S_k) [Lin, Jiang, Chao]. The decomposition is unique and computable in linear time. [Photos: The Inventors of the Right-Skew Decomposition]

16 Right-Skew Decomposition Decreasingly right-skew decomposition (O(n) time)

17 Right-Skew pointers p[ ]
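The slide's figure is not recoverable, so the following is only a sketch of one common way to compute such pointers. It assumes that p[i] is the right end of the first segment in the decreasingly right-skew decomposition of the suffix starting at i, computed by scanning right to left and merging with the following segment while its density is at least as large. The names `right_skew_pointers` and `right_skew_partition` are hypothetical, and indices are 0-based.

```python
def right_skew_pointers(a):
    """Return p where a[i..p[i]] is the first right-skew segment of the suffix a[i..]."""
    n = len(a)
    p = list(range(n))     # initially every segment is the single element a[i]
    seg_sum = list(a)      # seg_sum[i] = sum of a[i..p[i]]
    for i in range(n - 1, -1, -1):
        while p[i] < n - 1:
            j = p[i] + 1                   # start of the following segment
            len_i = p[i] - i + 1
            len_j = p[j] - j + 1
            # Merge when density(i, p[i]) <= density(j, p[j]), cross-multiplied.
            if seg_sum[i] * len_j <= seg_sum[j] * len_i:
                seg_sum[i] += seg_sum[j]
                p[i] = p[j]
            else:
                break
    return p


def right_skew_partition(a):
    """Return the decomposition of a as a list of (start, end) index pairs."""
    p = right_skew_pointers(a)
    segments = []
    i = 0
    while i < len(a):
        segments.append((i, p[i]))
        i = p[i] + 1
    return segments


if __name__ == "__main__":
    print(right_skew_pointers([3, 1, 2]))
    print(right_skew_partition([3, 1, 2]))
```

Following the pointers from position 0 yields the whole decomposition, and following them from any position i yields the decomposition of the suffix starting there.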


19 Any more interesting problems? Theorem. Biology easily has 500 years of exciting problems to work on. Proof. This was said by Donald Knuth in 1993. Corollary. Biology still has at least 485 years of exciting problems to work on. (Restated by Kun-Mao Chao in 2008.) Proof. 500 – (2008 – 1993) = 485.