Efficient Algorithms for Locating the Length-Constrained Heaviest Segments, with Applications to Biomolecular Sequence Analysis

Yaw-Ling Lin, Dept CSIM, Providence Univ, Taiwan
Tao Jiang, Dept CS and Engineering, UC Riverside, USA
Kun-Mao Chao, Dept Life Science, Nat. Yang-Ming Univ, Taiwan

Yaw-Ling Lin, Providence, Taiwan

Outline
– Introduction
– Applications to Biomolecular Sequence Analysis
– Maximum Sum Consecutive Subsequence
– Maximum Average Consecutive Subsequence
– Implementation and Preliminary Experiments
– Concluding Remarks

Introduction
Two fundamental algorithms for finding interesting regions in sequences:
– Given a sequence of n real numbers and an upper bound U, find a consecutive subsequence of length at most U with the maximum sum: an O(n)-time algorithm.
– Given a sequence of n real numbers and a lower bound L, find a consecutive subsequence of length at least L with the maximum average: an O(n log L)-time algorithm.
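
For reference, the two problems above can be written out formally. The symbols a_i and S(i, j) are notation assumed here for illustration; they do not appear in the slides.

```latex
\[
S(i,j) \;=\; \sum_{k=i}^{j} a_k, \qquad 1 \le i \le j \le n
\]
\[
\text{Max-sum, length at most } U:\qquad
\max_{\substack{1 \le i \le j \le n \\ j-i+1 \,\le\, U}} S(i,j)
\]
\[
\text{Max-average, length at least } L:\qquad
\max_{\substack{1 \le i \le j \le n \\ j-i+1 \,\ge\, L}} \frac{S(i,j)}{j-i+1}
\]
```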

Applications to Biomolecular Sequence Analysis (I)
Locating GC-Rich Regions
– Finding GC-rich regions: an important problem in gene recognition and comparative genomics.
– CpG islands (200 ~ 1400 bp).
– [Huang’94]: O(nL)-time algorithm.
Post-Processing Sequence Alignments
– Comparative analysis of human and mouse DNA: useful in gene prediction in the human genome.
– Mosaic effect: bad inner sequence.
– Normalized local alignment.
– Post-processing locally aligned subsequences.

Applications to Biomolecular Sequence Analysis (II)
Annotating Multiple Sequence Alignments
– [Stojanovic’99]: conserved regions in biomolecular sequences.
– Numerical scores for the columns of a multiple alignment; each column score is adjusted by subtracting an anchor value.
Ungapped Local Alignments with Length Constraints
– Computing, for each diagonal in the matrix, the length-constrained segment with the largest sum (or average) of scores.
– Applications in motif identification.

Maximum Sum Consecutive Subsequence
– ... is left-negative; ... is not.
– ... is minimal left-negative partitioned.

Minimal left-negative partition

MLN-partition: linear time
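
A minimal sketch of how a minimal left-negative (MLN) partition might be computed in linear time. The definitions are assumptions, since the slide text does not preserve them: a sequence is taken to be left-negative when every proper prefix has sum at most 0, and a partition is taken to be minimal left-negative when each segment is left-negative and every segment except possibly the last has a positive sum. The function name `mln_partition`, the stack-based formulation and the sample inputs are illustrative choices, not the slides' pseudocode.

```python
def mln_partition(a):
    """Sketch: partition `a` into left-negative segments (assumed definition).

    Returns a list of (start, end, total) triples with 0-based inclusive
    indices, ordered left to right.
    """
    # Stack of segments covering the suffix processed so far;
    # the top of the stack is the leftmost segment of that suffix.
    segments = []
    for i in range(len(a) - 1, -1, -1):
        end, total = i, a[i]
        # A segment whose sum is still <= 0 absorbs the segment to its right.
        while total <= 0 and segments:
            _, next_end, next_total = segments.pop()
            end = next_end
            total += next_total
        segments.append((i, end, total))
    segments.reverse()
    return segments


if __name__ == "__main__":
    print(mln_partition([5, -3, 4, -1, 2, -6]))
    # [(0, 0, 5), (1, 2, 1), (3, 4, 1), (5, 5, -6)]
    print(mln_partition([-4, 1, -2, 3]))
    # [(0, 3, -2)]
```

Each segment is pushed once and popped at most once, so the scan takes O(n) time overall, which is consistent with the linear-time claim of the slide title.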

Max-Sum with LC

Analysis of MSLC
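
The O(n) bound for the length-constrained max-sum problem can also be reached with a different, widely used technique: prefix sums combined with a monotonic deque. The sketch below uses that swapped-in technique and is not the MLN-based MSLC procedure of the slides. The function name `max_sum_at_most` and the return convention are assumptions made for illustration.

```python
from collections import deque


def max_sum_at_most(a, U):
    """Sketch: best non-empty segment of length <= U.

    Returns (best_sum, start, end) with 0-based inclusive indices.
    """
    n = len(a)
    if n == 0 or U < 1:
        raise ValueError("need a non-empty sequence and U >= 1")

    # prefix[i] is the sum of a[0:i]; a[j:i] has sum prefix[i] - prefix[j].
    prefix = [0] * (n + 1)
    for i, x in enumerate(a):
        prefix[i + 1] = prefix[i] + x

    best = None
    window = deque()  # candidate start indices j, with prefix[j] increasing
    for i in range(1, n + 1):
        j = i - 1
        # Drop candidates that can never beat the new start index j.
        while window and prefix[window[-1]] >= prefix[j]:
            window.pop()
        window.append(j)
        # Drop start indices that would make the segment longer than U.
        while window[0] < i - U:
            window.popleft()
        candidate = prefix[i] - prefix[window[0]]
        if best is None or candidate > best[0]:
            best = (candidate, window[0], i - 1)
    return best


if __name__ == "__main__":
    print(max_sum_at_most([5, -3, 4, -1, 2, -6], 2))  # (5, 0, 0)
    print(max_sum_at_most([5, -3, 4, -1, 2, -6], 3))  # (6, 0, 2)
```

Every index enters and leaves the deque at most once, so the loop runs in O(n) time regardless of U.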

Max Average Subsequence
– ... is right-skew; ... is not.
– ... is decreasing right-skew partitioned.

Decreasing right-skew partition

DRS-partition: linear time
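
A minimal sketch of a linear-time decreasing right-skew (DRS) partition. The definitions are assumptions, since the slide text does not preserve them: a sequence is taken to be right-skew when the average of every proper prefix is at most the average of the remaining suffix, and a partition is taken to be decreasing right-skew when each segment is right-skew and the segment averages strictly decrease from left to right. The function name `drs_partition` and the sample input are hypothetical.

```python
def drs_partition(a):
    """Sketch: decreasing right-skew partition of `a` (assumed definition).

    Returns a list of (start, end) pairs with 0-based inclusive indices.
    """
    # Stack of (start, end, total, length); the top is the leftmost
    # segment of the suffix processed so far.
    segments = []
    for i in range(len(a) - 1, -1, -1):
        end, total, length = i, a[i], 1
        # Merge while avg(current) <= avg(next). Cross-multiplying the
        # comparison total/length <= next_total/next_length avoids division.
        while segments and total * segments[-1][3] <= segments[-1][2] * length:
            _, next_end, next_total, next_length = segments.pop()
            end = next_end
            total += next_total
            length += next_length
        segments.append((i, end, total, length))
    segments.reverse()
    return [(start, end) for start, end, _, _ in segments]


if __name__ == "__main__":
    print(drs_partition([9, 7, 8, 1, 3, 2]))
    # [(0, 0), (1, 2), (3, 5)]  with averages 9, 7.5 and 2
```

As with the left-negative scan, each segment is pushed once and popped at most once, which gives the O(n) running time named in the slide title.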

Max-Avg-Seq with LC

Locate good-partner

Analysis of MaxAvgSeq
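
For comparison with the O(n log L) result, here is a simple baseline for the max-average problem. It is not the right-skew algorithm of the slides. It relies on one observation: a segment of length 2L or more can be split into two pieces of length at least L, and one of those pieces has an average at least as large as the whole, so only lengths from L to 2L - 1 need to be examined. That gives O(nL) time, the same order as the bound the slides attribute to [Huang’94]. The function name `max_avg_at_least` and the return convention are assumptions made for illustration.

```python
def max_avg_at_least(a, L):
    """Baseline sketch: best average over segments of length >= L.

    Returns (best_avg, start, end) with 0-based inclusive indices.
    """
    n = len(a)
    if not 1 <= L <= n:
        raise ValueError("need 1 <= L <= len(a)")

    # prefix[i] is the sum of a[0:i].
    prefix = [0] * (n + 1)
    for i, x in enumerate(a):
        prefix[i + 1] = prefix[i] + x

    best = None
    for start in range(n - L + 1):
        # Lengths beyond 2L - 1 cannot improve the best average.
        max_len = min(2 * L - 1, n - start)
        for length in range(L, max_len + 1):
            avg = (prefix[start + length] - prefix[start]) / length
            if best is None or avg > best[0]:
                best = (avg, start, start + length - 1)
    return best


if __name__ == "__main__":
    print(max_avg_at_least([1, 12, -5, -6, 50, 3], 4))  # (12.75, 1, 4)
```

A baseline like this is mainly useful as a correctness check against a faster implementation on small inputs.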

Implementation and Preliminary Experiments

Implementation and Preliminary Experiments (continued)

Conclusion
– Finding a max-sum subsequence of length at most U can be done in O(n) time.
– Finding a max-avg subsequence of length at least L can be done in O(n log L) time.
– Is there a linear-time algorithm to find a max-avg subsequence of length at least L?

Future Research
– Best k (nonintersecting) subsequences?
– Normalized local alignment?
– Measurement of goodness?