Finding a Length-Constrained Maximum-Density Path in a Tree
Rung-Ren Lin, Wen-Hsiung Kuo, and Kun-Mao Chao
Overview
Introduction of the problem. A brief discussion of the one-dimensional sequence problem. Two approaches to finding the maximum-density path in a tree.
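Input & Output
Input: a weighted tree with n edges and a lower bound L on the path length (number of edges). The lower bound is necessary: without it, the single heaviest edge would always be a maximum-density path, since the density of any path is at most its largest edge weight.
Output: a maximum-density path with length at least L.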
The Density of a Path
Given a path with k edges E = {e_1, e_2, …, e_k} and an edge weight function w(e), the density of the path is defined as
$$\frac{1}{k}\sum_{i=1}^{k} w(e_i),$$
that is, its total edge weight divided by its number of edges.
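Lower Bound Example
[Figure: a weighted tree with a lower bound L; its maximum-density path has 5 edges, and its density is (sum of the 5 edge weights) / 5 = 6.]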
The Efficiency
We propose two efficient algorithms that run in O(nL) time. One of them is further modified to solve some special cases, such as full m-ary trees, in linear time.
One-Dimensional Sequence
The maximum-average segment problem arises naturally in several areas of sequence analysis. For example, given a DNA sequence, we may ask which segment of length at least L has the highest GC ratio. The naïve algorithm for the one-dimensional problem, which checks every segment, takes O(n^2) time.
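A minimal sketch of the naïve O(n^2) scan, assuming integer scores (for example 1 for G or C and 0 for A or T, so the GC ratio of a segment is its average). Prefix sums make each segment's average an O(1) computation, and `Fraction` avoids floating-point ties. The function name `max_avg_segment_naive` is illustrative, not from the slides.

```python
from fractions import Fraction

def max_avg_segment_naive(seq, L):
    """Illustrative helper (hypothetical name): return (density, start, end) of a
    maximum-average segment seq[start:end] with at least L elements, or None."""
    prefix = [0]
    for x in seq:
        prefix.append(prefix[-1] + x)      # prefix[k] = sum(seq[:k])
    best = None
    for start in range(len(seq)):
        for end in range(start + L, len(seq) + 1):   # every segment of length >= L
            d = Fraction(prefix[end] - prefix[start], end - start)
            if best is None or d > best[0]:
                best = (d, start, end)
    return best

gc = [1 if b in "GC" else 0 for b in "AGCTGGCA"]
print(max_avg_segment_naive(gc, 3))
```

The first maximum found is kept, so ties resolve to the leftmost segment.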
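Lower Bound L
There exists an optimal segment of length at most 2L-1. Proof sketch: a segment of length at least 2L can be split into two adjacent parts A and B, each of length at least L. The density of AB is a weighted average of the densities of A and B, so one of A and B is at least as dense as AB; repeating the split yields an optimal segment shorter than 2L. Consequently, the naïve algorithm only needs to try lengths L through 2L-1 at each starting position, which takes O(nL) time.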
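Restricting the naïve scan to lengths L through 2L-1 gives the O(nL) version. A minimal sketch under the same assumptions (integer scores; `max_avg_segment_window` is an illustrative name); by the 2L-1 bound it finds the same best density as the full scan.

```python
from fractions import Fraction

def max_avg_segment_window(seq, L):
    """Illustrative helper (hypothetical name): like the full scan, but only tries
    lengths L..2L-1 at each start, for O(nL) time."""
    prefix = [0]
    for x in seq:
        prefix.append(prefix[-1] + x)      # prefix[k] = sum(seq[:k])
    best = None
    for start in range(len(seq)):
        for length in range(L, 2 * L):     # lengths L, L+1, ..., 2L-1
            end = start + length
            if end > len(seq):
                break
            d = Fraction(prefix[end] - prefix[start], length)
            if best is None or d > best[0]:
                best = (d, start, end)
    return best

print(max_avg_segment_window([5, -10, 3, 3, 3, 3], 2))
```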
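The Relation between Two Overlapping Segments
Let P[x] denote the end position of a maximum-density segment, among those of length at least L, that starts at position x. For j < i ≤ P[i] < P[j], the segment (i, P[i]) lies inside (j, P[j]). Write (j, P[j]) = XYZ with X = (j, i-1), Y = (i, P[i]), and Z = (P[i]+1, P[j]). Since XY has length at least L, the optimality of P[j] gives density(XYZ) ≥ density(XY), hence density(Z) ≥ density(XYZ); since YZ has length at least L, the optimality of P[i] gives density(Y) ≥ density(YZ), hence density(Y) ≥ density(Z). Therefore the density of (i, P[i]) is at least the density of (j, P[j]).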
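The nesting property can be checked mechanically on small inputs. The sketch below computes each P[x] by brute force (ties go to the earliest end position) and tests the inequality for every nested pair. It assumes integer inputs, and the helper names `density`, `partners`, and `nested_lemma_holds` are hypothetical.

```python
from fractions import Fraction

def density(seq, i, j):
    """Average of seq[i..j], both ends inclusive."""
    return Fraction(sum(seq[i:j + 1]), j - i + 1)

def partners(seq, L):
    """Hypothetical helper: P[x] = end of a maximum-density segment that starts at x
    and has at least L elements (earliest end on ties), or None if too few remain."""
    n = len(seq)
    P = []
    for x in range(n):
        ends = range(x + L - 1, n)
        P.append(max(ends, key=lambda e: density(seq, x, e)) if ends else None)
    return P

def nested_lemma_holds(seq, L):
    """Check that j < i <= P[i] < P[j] implies density(i, P[i]) >= density(j, P[j])."""
    P = partners(seq, L)
    for j in range(len(seq)):
        for i in range(j + 1, len(seq)):
            if P[i] is not None and P[i] < P[j]:
                if density(seq, i, P[i]) < density(seq, j, P[j]):
                    return False
    return True

print(partners([1, 3, 2], 1), nested_lemma_holds([4, -1, 2, 5, -3, 0, 1], 2))
```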
Right-Skew Segment
A sequence S = {s_1, s_2, …, s_k} is right-skew if and only if, for every 1 ≤ i < k, the average of the prefix {s_1, s_2, …, s_i} is never larger than the average of the remaining subsequence {s_{i+1}, …, s_k}. Building on these observations, Goldwasser et al. proposed a linear-time algorithm for the one-dimensional problem.
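A direct check of the definition, as a sketch assuming integer values; `is_right_skew` is an illustrative name. A running prefix sum makes the check linear in the length of the sequence.

```python
from fractions import Fraction

def is_right_skew(seq):
    """Illustrative helper (hypothetical name): True iff for every split 1 <= i < k,
    avg(seq[:i]) <= avg(seq[i:])."""
    total, k = sum(seq), len(seq)
    prefix = 0
    for i in range(1, k):
        prefix += seq[i - 1]               # prefix = sum(seq[:i])
        if Fraction(prefix, i) > Fraction(total - prefix, k - i):
            return False
    return True

print(is_right_skew([1, 2, 3]), is_right_skew([3, 1]))
```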
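Right-Skew Decomposition
Every sequence can be partitioned into consecutive right-skew segments whose averages are strictly decreasing, and this decreasing right-skew decomposition is unique.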
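One way to compute the decomposition, sketched under the assumption of integer input: scan from right to left, start a new segment at each element, and merge it with the segment to its right while its average is not larger. Each element is pushed and popped at most once, so the scan is linear. This is an illustration with a hypothetical function name, not code from the slides.

```python
from fractions import Fraction

def decreasing_right_skew_decomposition(seq):
    """Illustrative helper (hypothetical name): split seq into consecutive right-skew
    segments with strictly decreasing averages; returns half-open (start, end) pairs."""
    segs = []                              # stack of (start, end, total); top = leftmost
    for i in range(len(seq) - 1, -1, -1):
        start, end, total = i, i + 1, seq[i]
        while segs:
            s2, e2, t2 = segs[-1]          # the segment immediately to the right
            if Fraction(total, end - start) > Fraction(t2, e2 - s2):
                break                      # averages already strictly decrease
            segs.pop()                     # merge: not larger, so absorb it
            end, total = e2, total + t2
        segs.append((start, end, total))
    return [(s, e) for s, e, _ in reversed(segs)]

print(decreasing_right_skew_decomposition([5, 1, 4, 2]))
```

Merging on equal averages is what makes the averages strictly decreasing.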
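Finding a Maximum-Density Path in a Tree
A path in a tree plays the role of a segment in a one-dimensional sequence, and the same splitting argument applies: a path with at least 2L edges can be cut into two subpaths of at least L edges each, one of which is at least as dense. That is, there exists a maximum-density path of length at most 2L-1. We propose two approaches that run in O(nL) time.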
Downward & Upward Paths
We classify the paths that start from node K into two types. A downward path of K stretches only downward, into the subtree of one of K's children. An upward path of K includes at least K's parent; that is, its first edge goes from K to its parent.
[Figure: a general tree showing an upward path and a downward path of node K.]
Notation
For each length i, we record the maximum-density downward path of node K of length i and the second-best one; if there is a tie, an arbitrary path is chosen. Similarly, we record the maximum-density upward path of node K of length i.
Contributor
The child of node K through which a maximum-density downward path of K passes is called the contributor of that path; it is the child that determines the path. Each maximum-density downward path of a given length has its own contributor. Upward paths need no contributor, because each node has at most one parent, so every upward path of K leaves K through the same edge.
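[Figure: an example tree with nodes A, B, C, and D; the contributors of the pictured downward paths are labeled node D, node C, node D, and node B.]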
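Downward & Upward Tables
The downward table of node K consists of its maximum-density and second-best downward paths of length i, for 1 ≤ i ≤ 2L-1; longer lengths are unnecessary because there exists an optimal path of length at most 2L-1. Similarly, the upward table of node K consists of its maximum-density upward paths of length i, for 1 ≤ i ≤ 2L-1.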
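Constructing the Downward Table
Consider an internal node K with m children {K_1, K_2, …, K_m}, and let e_j denote the edge (K, K_j). The downward table of node K can be constructed from its children's tables by bottom-up dynamic programming: a downward path of K of length i consists of some edge e_j followed by a downward path of K_j of length i-1 (or by nothing, when i = 1). All contributors should be recorded.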
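The recurrence can be written as follows, in illustrative notation of our own rather than the slides' symbols: $D_K[i]$ is the largest total weight of a downward path of K with i edges, $c_K[i]$ is its contributor, and $D'_K[i]$ is the best weight through any other child. Because every path of length i is divided by the same i, the maximum-density downward path of length i is simply the heaviest one, with density $D_K[i]/i$.

```latex
\begin{aligned}
D_K[0] &= 0, \\
D_K[i] &= \max_{1 \le j \le m} \bigl( w(e_j) + D_{K_j}[i-1] \bigr), \qquad 1 \le i \le 2L-1, \\
c_K[i] &= K_{j^\ast}, \quad \text{where } j^\ast \text{ attains the maximum above}, \\
D'_K[i] &= \max_{j \,:\, K_j \ne c_K[i]} \bigl( w(e_j) + D_{K_j}[i-1] \bigr).
\end{aligned}
```

A maximum over an empty set is taken as $-\infty$, so a leaf has $D_K[i] = -\infty$ for $i \ge 1$.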
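Constructing the Upward Table
Suppose K' is the parent of node K, and let e' denote the edge (K', K). The upward table of node K can be constructed by top-down dynamic programming: an upward path of K of length i consists of e' followed by either an upward path of K' of length i-1 or a downward path of K' of length i-1 that avoids K. For the latter, the maximum-density downward path of K' is used unless K is its contributor, in which case the second-best one is used.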
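In the same spirit, with illustrative symbols of our own: let $D_{K'}[i]$ be the heaviest total weight of a downward path of K' with i edges, $c_{K'}[i]$ its contributor, $D'_{K'}[i]$ the heaviest through a child other than $c_{K'}[i]$, and $U_K[i]$ the heaviest total weight of an upward path of K with i edges. For a fixed i, maximizing weight is the same as maximizing density $U_K[i]/i$. The top-down step is then:

```latex
\begin{aligned}
U_K[0] &= 0, \\
U_K[i] &= w(e') + \max\bigl( U_{K'}[i-1],\ \widehat{D}_{K'}[i-1] \bigr), \qquad 1 \le i \le 2L-1, \\
\widehat{D}_{K'}[i-1] &=
  \begin{cases}
    D_{K'}[i-1]  & \text{if } c_{K'}[i-1] \ne K, \\
    D'_{K'}[i-1] & \text{otherwise.}
  \end{cases}
\end{aligned}
```

Here $D_{K'}[0] = 0$ stands for the empty path, so $U_K[1] = w(e')$, and the root has $U[i] = -\infty$ for $i \ge 1$.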
Time Complexity
Each entry of an upward table is computed in O(1) time from the parent's tables, and each entry of a downward table in O(1) time per child. Since every node except the root is the child of exactly one node, and each table has 2L-1 entries, constructing all downward and upward tables of the n nodes takes O(nL) time in total.
Approach I: Finding the Path from Its End Node
Every path starting at a node K is either a downward or an upward path of K. Hence, once the downward and upward tables of every node are constructed, the maximum-density path starting at a given node with length from L to 2L-1 can be found in O(L) time by scanning its tables. Since there are n nodes, checking all of them takes O(nL) time.
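A compact end-to-end sketch of the table construction and Approach I. The assumptions are ours: nodes are numbered 0..n-1 with node 0 as the root, the tree is given as (parent, child, weight) triples, and the tables store total weights rather than densities, which is equivalent for a fixed length i. `down2` keeps the best path through a child other than the contributor of `down1`, which is exactly what the upward step needs. The function name is hypothetical, and it returns only the optimal density.

```python
NEG = float("-inf")

def max_density_path_approach_one(n, edges, L):
    """Hypothetical helper: density of a maximum-density path with L..2L-1 edges
    (-inf if none). Node 0 is the root; edges are (parent, child, weight)."""
    H = 2 * L - 1
    children = [[] for _ in range(n)]
    parent, pw = [-1] * n, [0] * n      # pw[v] = weight of edge (parent[v], v)
    for p, c, w in edges:
        children[p].append(c)
        parent[c], pw[c] = p, w
    order = [0]
    for v in order:                     # BFS order: every parent precedes its children
        order.extend(children[v])

    # down1[v][i] = (weight, contributor) of the heaviest downward path of v with i edges;
    # down2[v][i] = the heaviest one whose contributor differs from down1[v][i]'s.
    down1 = [[(NEG, -1)] * (H + 1) for _ in range(n)]
    down2 = [[(NEG, -1)] * (H + 1) for _ in range(n)]
    for v in reversed(order):           # bottom-up
        down1[v][0] = (0, -1)           # the empty path, no contributor
        for c in children[v]:
            for i in range(1, H + 1):
                cand = (pw[c] + down1[c][i - 1][0], c)
                if cand[0] > down1[v][i][0]:
                    down1[v][i], down2[v][i] = cand, down1[v][i]
                elif cand[0] > down2[v][i][0]:
                    down2[v][i] = cand

    # up[v][i] = weight of the heaviest upward path of v with i edges.
    up = [[NEG] * (H + 1) for _ in range(n)]
    for v in order:                     # top-down
        up[v][0] = 0
        p = parent[v]
        if p < 0:
            continue                    # the root has no upward paths
        for i in range(1, H + 1):
            d = down1[p][i - 1]
            if d[1] == v:               # v contributes to p's best path: use second best
                d = down2[p][i - 1]
            up[v][i] = pw[v] + max(up[p][i - 1], d[0])

    # Approach I: each path is a downward or upward path of one of its end nodes.
    return max((max(down1[v][i][0], up[v][i]) / i
                for v in range(n) for i in range(L, H + 1)), default=NEG)

print(max_density_path_approach_one(4, [(0, 1, 10), (0, 2, 1), (1, 3, 0)], 2))
```

In the usage example, the best path with 2 or 3 edges is 1-0-2 with weight 11, so the density is 5.5; it is found as an upward path of node 1, which has to skip the root's best downward path because node 1 is its contributor.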
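Approach II: Finding the Path from Its LCA Node
LCA stands for least common ancestor (also called the lowest common ancestor). Every path has a unique highest node, the LCA of its two end nodes, so it is either a single downward path of its LCA or the combination of two downward paths of its LCA with different contributors. We now combine two downward paths together.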
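This section does not spell out how Approach II reaches O(nL), so the sketch below uses a plainly slower combination: for one node v, it tries every split a + b of a length between L and 2L-1 into two downward paths of v, which costs O(L^2) per node (a = 0 covers a path that ends at v). The input rows use a (weight, contributor) layout of our own, with index 0 as the empty path and distinct contributors between the best and second-best entries; `best_through_lca` is a hypothetical name. When the two best entries share a contributor, replacing either side by its second-best entry suffices, since an optimal pair must avoid that child on at least one side.

```python
NEG = float("-inf")

def best_through_lca(down1_v, down2_v, L):
    """Hypothetical helper: best density of a path with L..2L-1 edges whose highest
    node is v. down1_v[i] / down2_v[i] are (weight, contributor) pairs for v's best
    and second-best downward paths with i edges; index 0 is (0, -1)."""
    H = 2 * L - 1
    best = NEG
    for a in range(H + 1):
        for b in range(a, H + 1 - a):    # a <= b and a + b <= 2L-1
            if a + b < L:
                continue
            p, q = down1_v[a], down1_v[b]
            if p[1] != q[1] or p[1] == -1:
                w = p[0] + q[0]          # different children (or a missing entry)
            else:                        # same child: swap one side for its second best
                w = max(p[0] + down2_v[b][0], down2_v[a][0] + q[0])
            best = max(best, w / (a + b))
    return best

root_best = [(0, -1), (10, 1), (10, 1), (NEG, -1)]
root_second = [(NEG, -1), (1, 2), (NEG, -1), (NEG, -1)]
print(best_through_lca(root_best, root_second, 2))
```

Taking the maximum of `best_through_lca` over all nodes gives the overall optimum, because every path has exactly one highest node.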