Computing Phylogenetic Roots with Bounded Degrees and Errors is NP-complete
Tatsuie Tsukiji (speaker), Zhi-Zhong Chen
Tokyo Denki University
The phylogenetic kth root problem (PRk) (k ≥ 2 is a fixed constant)
Given: a graph G.
Output: a tree T (called a "phylogeny") such that
  Leaves of T = vertices of G. Each vertex represents an extant species.
  Two vertices are adjacent in G iff their distance in T is at most k. Each edge corresponds to similarity in evolutionary characteristics.
  Degree of each internal node of T is at least 3.
[Figure: a PR4 example. A graph G on vertices a to j and a phylogeny T with leaves a to j; two vertices are adjacent in G iff their distance in T is at most 4.]
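The three conditions in the definition can be checked mechanically for a candidate tree. The following is a minimal Python sketch, not an algorithm from the talk: it assumes the tree is given as an adjacency dictionary, and the names `leaf_power` and `is_phylogenetic_root` are hypothetical. It only verifies a given T; it does not search for one.

```python
from collections import deque


def leaf_power(tree, k):
    """Return (leaves, edges) of the k-th leaf power of `tree`.

    `tree` is an adjacency dict {node: set of neighbours} (an assumed
    input format). Leaves are the nodes of degree 1; two leaves are
    joined by an edge iff their distance in the tree is at most k.
    """
    leaves = {v for v, nbrs in tree.items() if len(nbrs) == 1}
    edges = set()
    for src in leaves:
        # Breadth-first search from src, cut off at depth k.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            if dist[u] == k:
                continue
            for w in tree[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        edges.update(frozenset((src, v)) for v in leaves
                     if v != src and v in dist)
    return leaves, edges


def is_phylogenetic_root(tree, k, vertices, graph_edges):
    """Check the three PRk conditions for a candidate tree."""
    leaves, power_edges = leaf_power(tree, k)
    # Every non-leaf node must have degree at least 3.
    internal_ok = all(len(nbrs) >= 3
                      for nbrs in tree.values() if len(nbrs) != 1)
    return (internal_ok
            and leaves == set(vertices)
            and power_edges == {frozenset(e) for e in graph_edges})


# Hypothetical example: leaves a, b hang off x; leaves c, d hang off y.
tree = {
    "a": {"x"}, "b": {"x"}, "c": {"y"}, "d": {"y"},
    "x": {"a", "b", "y"}, "y": {"c", "d", "x"},
}
print(is_phylogenetic_root(tree, 2, "abcd", [("a", "b"), ("c", "d")]))
```

In this toy tree the leaf distances are 2 within each pair {a, b} and {c, d} and 3 across the pairs, so the tree is a phylogenetic 2nd root of the graph with edges ab and cd, and a phylogenetic 3rd root of the complete graph on a, b, c, d.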
What is known about PRk?
PRk is solvable in polynomial time for k = 2, 3, 4.
The complexity of PRk for k > 4 is still unknown.
ΔPRk: a natural special case of PRk where the output phylogeny has maximum degree Δ.
ΔPRk can be solved in linear time.
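An optimization problem
The closest phylogenetic kth root problem (CPRk)
Given: a graph G = (V, E).
Output: a phylogeny T that minimizes the number of errors |T^k ⊕ E|, where T^k ⊕ E = (T^k − E) ∪ (E − T^k).
[Figure: a graph G = (V, E), a phylogeny T, and its 3rd power T^3, with the edge sets T^3 − E and E − T^3 marked; in this example |T^3 ⊕ E| = 4.]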
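The error count of a candidate phylogeny can be computed directly from the definition: the edges of the kth power that are missing from G, plus the edges of G that are missing from the kth power. The sketch below is illustrative only. It assumes the tree is an adjacency dictionary whose leaves are exactly the vertices of G, and the names `leaf_distances` and `cpr_errors` are hypothetical.

```python
from collections import deque


def leaf_distances(tree):
    """Map each unordered pair of leaves to its distance in `tree`.

    `tree` is an adjacency dict {node: set of neighbours} (an assumed
    input format); leaves are the nodes of degree 1.
    """
    leaves = [v for v, nbrs in tree.items() if len(nbrs) == 1]
    result = {}
    for src in leaves:
        # Plain breadth-first search from each leaf.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in tree[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        for v in leaves:
            if v != src and v in dist:
                result[frozenset((src, v))] = dist[v]
    return result


def cpr_errors(tree, k, graph_edges):
    """Number of errors: size of the symmetric difference of T^k and E."""
    power = {pair for pair, d in leaf_distances(tree).items() if d <= k}
    edges = {frozenset(e) for e in graph_edges}
    return len(power ^ edges)


# Hypothetical example: leaves a, b hang off x; leaves c, d hang off y.
tree = {
    "a": {"x"}, "b": {"x"}, "c": {"y"}, "d": {"y"},
    "x": {"a", "b", "y"}, "y": {"c", "d", "x"},
}
path = [("a", "b"), ("b", "c"), ("c", "d")]
print(cpr_errors(tree, 2, path), cpr_errors(tree, 3, path))
```

For the path a, b, c, d, the 2nd power of this tree misses only the edge bc (one error), while the 3rd power is the complete graph on four vertices and adds three edges that the path lacks (three errors).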
An optimization problem: motivation
G is derived from some similarity data, which are usually inexact in practice.
CPR2 has been studied extensively (see the correlation clustering papers in FOCS and STOC).
Results
Known results
  PRk is solvable in polynomial time for k = 2, 3, 4.
  ΔPRk can be solved in linear time.
  CPRk is NP-hard for any fixed k ≥ 2.
New result
  ΔCPRk is NP-complete for any fixed k ≥ 3 and Δ ≥ 3.
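NP-completeness: CPRk
1. CPR2 = Correlation Clustering
   Correlation Clustering: minimize #(inner non-edges) + #(outer edges) of G.
   [Figure: a graph on vertices a to h partitioned into clusters, labelled "clique of T^2" and "unbounded degree".]
2. CPR2 ≤ CPRk
   If |T^3 ⊕ E(gadget)| < the clique size, then d_T(a, b) = 3.
   [Figure: vertices a and b of G attached to a gadget clique, and the corresponding phylogeny T.]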
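The correlation clustering objective in item 1 can be written out as a short function. This is a minimal sketch under stated assumptions: the clusters partition the vertex set, G is a simple graph, and the name `clustering_cost` is hypothetical. It evaluates a given clustering and does not optimize over clusterings.

```python
def clustering_cost(clusters, edges):
    """Return #(inner non-edges) + #(outer edges) for a clustering.

    `clusters` is a list of disjoint vertex sets covering every vertex
    that appears in `edges` (an assumption of this sketch).
    """
    edge_set = {frozenset(e) for e in edges}
    cluster_of = {v: i for i, c in enumerate(clusters) for v in c}

    # Pairs of vertices that share a cluster, and edges inside clusters.
    inner_pairs = sum(len(c) * (len(c) - 1) // 2 for c in clusters)
    inner_edges = sum(1 for e in edge_set
                      if len({cluster_of[v] for v in e}) == 1)

    inner_non_edges = inner_pairs - inner_edges
    outer_edges = len(edge_set) - inner_edges
    return inner_non_edges + outer_edges


# Hypothetical example: a path a, b, c, d, e split into two clusters.
clusters = [{"a", "b", "c"}, {"d", "e"}]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
print(clustering_cost(clusters, edges))
```

Here the only inner non-edge is the pair a, c and the only outer edge is cd, so the cost is 2.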
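NP-completeness: 3CPR3
Reduction from Hamiltonian Path on graphs with maximum degree 3.
G has a Hamiltonian path ⇔ ∃T with |T^3 ⊕ E(G')| ≤ #(degree-3 vertices)/2.
Upper bound (T built from a Hamiltonian path):
  error = 1/2 at degree-3 vertices of G
  error = 0 at degree-2 vertices of G
Lower bound:
  error ≥ 1/2 at degree-3 vertices of G
[Figure: G, the constructed graph G' (labelled "3 × G"), and a phylogeny T, with per-vertex errors 1/2, 1, 1/2.]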
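The two quantities on the source side of this reduction are easy to compute for small instances. The sketch below is an illustration, not part of the talk: it decides Hamiltonian Path by brute-force backtracking (exponential time) and computes the error budget #(degree-3 vertices)/2. The construction of G' is not reproduced, and the names `has_hamiltonian_path` and `error_budget` are hypothetical.

```python
from fractions import Fraction


def has_hamiltonian_path(adj):
    """Brute-force test for a Hamiltonian path.

    `adj` is an adjacency dict {vertex: set of neighbours} (an assumed
    input format). Runs in exponential time; for small examples only.
    """
    n = len(adj)

    def extend(v, visited):
        # Try to extend the current path, which ends at v.
        if len(visited) == n:
            return True
        for w in adj[v]:
            if w not in visited:
                visited.add(w)
                if extend(w, visited):
                    return True
                visited.remove(w)
        return False

    return any(extend(s, {s}) for s in adj)


def error_budget(adj):
    """Return #(degree-3 vertices)/2 for a graph of maximum degree 3."""
    if any(len(nbrs) > 3 for nbrs in adj.values()):
        raise ValueError("maximum degree must be at most 3")
    return Fraction(sum(1 for nbrs in adj.values() if len(nbrs) == 3), 2)


# Hypothetical examples: a claw (no Hamiltonian path) and a 3-vertex path.
claw = {"c": {"x", "y", "z"}, "x": {"c"}, "y": {"c"}, "z": {"c"}}
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(has_hamiltonian_path(claw), error_budget(claw))
print(has_hamiltonian_path(path), error_budget(path))
```

`Fraction` is used so that half-integer budgets such as 1/2 are represented exactly.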
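NP-completeness: 3CPR3 ≤ 3CPR5
Pad distance 1 at every vertex of G.
If |T^5 ⊕ E(7-clique)| ≤ 2, then T is one of two phylogenies of the 7-clique (shown as figures on the slide); the second has ∃1 degree-2 internal node, the port.
  Distance( , ) = 1
  Distance( , ) ≥ 2
  (the arguments of Distance are figure symbols on the slide)
7-clique = (5,1,2)-core graph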
NP-completeness: 3CPR3 ≤ 3CPR5
Legend: (figure symbol) = i-port 7-clique
For G': ∃T with |T^5 ⊕ E(G')| ≤ #(degree-3 vertices)/2
For the lifted G: ∃T with |(T^3 at lifted G) ⊕ E(lifted G)| ≤ #(degree-3 vertices)/2
[Figure: G, the lifted G, G', and a phylogeny T.]
Core graph: 3CPR3 ≤ 3CPR7
Pad distance 1
If |T^7 ⊕ E(11-clique)| ≤ 2, then T is a phylogeny (shown as a figure on the slide) with ∃1 port and
  Distance( , ) = 1
  Distance( , ) ≥ 2
  (the arguments of Distance are figure symbols on the slide)
(7,1,2)-core graph
Core graph: 3CPR3 ≤ 3CPR7
Pad distance 2
[Figure: a phylogeny of the 5-clique combined with copies of the phylogeny of the 11-clique.]
If |T^7 ⊕ E((the obtained tree)^7)| ≤ 2, then T has ∃1 port and
  Distance( , ) = 2
  Distance( , ) ≥ 3
  (the arguments of Distance are figure symbols on the slide)
(7,2,2)-core graph
Summary and Open Problems
The complexity of PRk for k > 4?
ΔPRk ∈ P
CPRk is NP-hard.
ΔCPRk is NP-hard. (new)
TRk, ΔTRk ∈ P
CTRk is NP-hard.
Is ΔCTRk NP-hard? (open)
[Figure: two examples, labelled "Tree 3rd power" and "Phylogenetic 3rd power".]