1
Efficient Algorithms for Locating the Length-Constrained Heaviest Segments, with Applications to Biomolecular Sequence Analysis
Yaw-Ling Lin, Dept CS & Info Mngmt, Providence Univ, Taiwan
Tao Jiang, Dept CS & Engineering, UC Riverside, USA
Kun-Mao Chao, Dept CS & Info Engnr, Nat. Taiwan Univ, Taiwan
2
Yaw-Ling Lin, Providence, Taiwan
Outline
Introduction
Applications to Biomolecular Sequence Analysis
Maximum Sum Consecutive Subsequence
Maximum Average Consecutive Subsequence
Implementation and Preliminary Experiments
Concluding Remarks
3
Motivation: GC-rich Region
4
Introduction
Two fundamental algorithms in searching for interesting regions in sequences:
– Given a sequence of real numbers of length n and an upper bound U, find a consecutive subsequence of length at most U with the maximum sum: an O(n)-time algorithm.
– Given a sequence of real numbers of length n and a lower bound L, find a consecutive subsequence of length at least L with the maximum average: an O(n log L)-time algorithm.
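The first problem can be illustrated with a short Python sketch. It is not the minimal left-negative partition method developed on the later slides. It swaps in a standard prefix-sum and monotonic-deque technique that also runs in O(n) time. The function name `max_sum_at_most` and the half-open index convention are assumptions made for illustration.

```python
from collections import deque


def max_sum_at_most(a, U):
    """Return (best_sum, start, end) for the non-empty slice a[start:end]
    with end - start <= U that has the largest sum.

    Sketch only: prefix sums plus a monotonic deque, not the
    MLN-partition algorithm from the slides.
    """
    if not a or U < 1:
        raise ValueError("need a non-empty sequence and U >= 1")

    # prefix[k] is the sum of a[0:k], so sum(a[i:j]) = prefix[j] - prefix[i].
    prefix = [0]
    for x in a:
        prefix.append(prefix[-1] + x)

    best = (float("-inf"), 0, 0)
    window = deque()  # candidate start indices, prefix values strictly increasing
    for j in range(1, len(a) + 1):
        # Start index j - 1 becomes available; drop starts it dominates.
        while window and prefix[window[-1]] >= prefix[j - 1]:
            window.pop()
        window.append(j - 1)
        # Discard starts that would make the slice longer than U.
        while window[0] < j - U:
            window.popleft()
        i = window[0]  # smallest prefix among the allowed starts
        s = prefix[j] - prefix[i]
        if s > best[0]:
            best = (s, i, j)
    return best


print(max_sum_at_most([2, -5, 3, 4, -1, 2, -6], 3))  # (7, 2, 4)
```

Each index enters and leaves the deque at most once, which is where the linear bound comes from.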
5
Applications to Biomolecular Sequence Analysis (I)
Locating GC-Rich Regions
– Finding GC-rich regions: an important problem in gene recognition and comparative genomics.
– CpG islands (200 ~ 1400 bp)
– [Huang’94]: O(nL)-time algorithm.
Post-Processing Sequence Alignments
– Comparative analysis of human and mouse DNA: useful in gene prediction in the human genome.
– Mosaic effect: bad inner sequence.
– Normalized local alignment.
– Post-processing locally aligned subsequences
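A hedged illustration of how the GC-rich region problem maps onto the max-average problem: the sketch below scores G and C as 1 and every other base as 0 (an assumed encoding), then searches for the segment of length at least L with the highest average score. It only tries lengths from L to 2L − 1, because a segment of length 2L or more splits into two pieces of length at least L, one of which has an average that is no smaller. That gives O(nL) work, but the sketch is not claimed to be the algorithm of [Huang’94]. The names `gc_scores` and `max_avg_at_least` are hypothetical.

```python
def gc_scores(dna):
    """Map each base to 1 if it is G or C (either case), otherwise 0."""
    return [1 if base in "GCgc" else 0 for base in dna]


def max_avg_at_least(scores, L):
    """Return (best_avg, start, end) for the slice scores[start:end]
    with end - start >= L that has the largest average.

    Sketch only: O(n * L) scan over lengths L .. 2L - 1.
    """
    n = len(scores)
    if L < 1 or L > n:
        raise ValueError("need 1 <= L <= len(scores)")

    # prefix[k] is the sum of scores[0:k].
    prefix = [0]
    for x in scores:
        prefix.append(prefix[-1] + x)

    best = (float("-inf"), 0, 0)
    for start in range(n - L + 1):
        last_end = min(n, start + 2 * L - 1)
        for end in range(start + L, last_end + 1):
            avg = (prefix[end] - prefix[start]) / (end - start)
            if avg > best[0]:
                best = (avg, start, end)
    return best


print(max_avg_at_least(gc_scores("ATGCGCAT"), 3))  # (1.0, 2, 5)
```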
6
Applications to Biomolecular Sequence Analysis (II)
Annotating Multiple Sequence Alignments
– [Stojanovic’99]: conserved regions in biomolecular sequences.
– Numerical scores for columns of a multiple alignment; each column score is adjusted by subtracting an anchor value.
Ungapped Local Alignments with Length Constraints
– Computing the length-constrained segment of each diagonal in the matrix with the largest sum (or average) of scores.
– Applications in motif identification.
7
Maximum Sum Consecutive Subsequence
[Slide examples not recovered: a sequence that is left-negative, a sequence that is not, and a sequence shown in its minimal left-negative partition.]
8
Minimal left-negative partition
9
MLN-partition: linear time
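The slide body did not survive extraction, so the following is only a sketch under stated assumptions, not the authors' pseudocode. It assumes that a segment is left-negative when every proper prefix has sum at most 0, and that a partition is minimal left-negative when every segment is left-negative and every segment except possibly the last has positive sum. The function name `mln_partition` and the half-open `(start, end)` output are hypothetical.

```python
def mln_partition(a):
    """Return the assumed minimal left-negative partition of a as a list
    of half-open (start, end) index pairs.

    Scans right to left. While the block starting at i has sum <= 0,
    it absorbs the block that starts right after it.
    """
    n = len(a)
    end = [0] * n    # end[i]: exclusive end of the block starting at i
    total = [0] * n  # total[i]: sum of a[i:end[i]]
    for i in range(n - 1, -1, -1):
        end[i] = i + 1
        total[i] = a[i]
        while end[i] < n and total[i] <= 0:
            nxt = end[i]
            total[i] += total[nxt]
            end[i] = end[nxt]

    # Walk the block pointers from the left to read off the partition.
    segments = []
    i = 0
    while i < n:
        segments.append((i, end[i]))
        i = end[i]
    return segments


print(mln_partition([5, -3, 4, -1, 2, -6]))
# [(0, 1), (1, 3), (3, 5), (5, 6)]
```

The inner loop behaves like a stack: every block is created once and absorbed at most once, so the total work is linear in n.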
10
Max-Sum with LC
11
Analysis of MSLC
12
Max Average Subsequence
[Slide examples not recovered: a sequence that is right-skew, a sequence that is not, and a sequence shown in its decreasing right-skew partition.]
13
Decreasing right-skew partition
14
DRS-partition: linear time
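As with the MLN slide, the slide body is missing, so this is a sketch under assumed definitions rather than the authors' pseudocode. It assumes that a segment is right-skew when the average of every proper prefix is at most the average of the remaining suffix, and that a partition is decreasing right-skew when every segment is right-skew and the segment averages strictly decrease from left to right. The name `drs_partition` and the half-open `(start, end)` output are hypothetical.

```python
def drs_partition(a):
    """Return the assumed decreasing right-skew partition of a as a list
    of half-open (start, end) index pairs.

    Scans right to left. The block starting at i absorbs the block
    after it while its own average is <= that block's average.
    """
    n = len(a)
    end = [0] * n    # end[i]: exclusive end of the block starting at i
    total = [0] * n  # total[i]: sum of a[i:end[i]]
    for i in range(n - 1, -1, -1):
        end[i] = i + 1
        total[i] = a[i]
        while end[i] < n:
            nxt = end[i]
            len_i = end[i] - i
            len_nxt = end[nxt] - nxt
            # avg(block i) > avg(next block), compared by
            # cross-multiplication to avoid division.
            if total[i] * len_nxt > total[nxt] * len_i:
                break
            total[i] += total[nxt]
            end[i] = end[nxt]

    # Walk the block pointers from the left to read off the partition.
    segments = []
    i = 0
    while i < n:
        segments.append((i, end[i]))
        i = end[i]
    return segments


print(drs_partition([5, 6, 1, 4, 2, 3, 0]))
# [(0, 2), (2, 6), (6, 7)]
```

In the example the block averages are 5.5, 2.5 and 0. The same stack-style argument as for the MLN sketch bounds the total number of merges by n.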
15
Max-Avg-Seq with LC
16
Locate good-partner
17
Analysis of MaxAvgSeq
18
Implementation and Preliminary Experiments
19
Implementation and Preliminary Experiments
20
Conclusion
Finding a max-sum subsequence of length at most U can be done in O(n) time.
Finding a max-avg subsequence of length at least L can be done in O(n log L) time.
21
Recent Progress
Lu (CMCT’2002): finding the max-avg subsequence of length at least L on binary (0,1) sequences. O(n)-time.
Goldwasser, Kao, Lu (WABI’2002): finding the max-avg subsequence of length at least L and at most U on real sequences. O(n)-time.
Tools: finding CpG islands using MAVG (joint work with Huang, X., Jiang, T. and Chao, K.-M.)
http://deepc2.zool.iastate.edu/aat/mavg/cgdoc.html
http://deepc2.zool.iastate.edu/aat/mavg/cg.html
22
The Linear-Time Algorithm of Goldwasser, Kao, Lu (WABI’2002)
23
A new important observation
i < j < g(j) < g(i) implies density(i, g(i)) is no more than density(j, g(j)).
[Figure: index labels i, g(i), j, g(j).]
24
[Figure: index labels i, g(i), j, g(j).]
25
Searching for all g(i) in linear time
26
Some thoughts
Attacking new problems with new ideas.
Collaboration is important for bioinformatics
– Communication
– Work on what you are good at
27
Future Research
Best k (nonintersecting) subsequences?
Normalized local alignment?
Measurement of goodness?