Published by Bryan Woods. Modified over 9 years ago.
1
Kyushu University Mathematical Sciences Intensive Lecture
Comparison, Analysis, and Control of Biological Networks (7): Partial k-Trees, Color Coding, and Comparison of Graphs
Tatsuya Akutsu, Bioinformatics Center, Institute for Chemical Research, Kyoto University
2
Tree Decomposition and Partial k-Tree [Flum, Grohe: Parameterized Complexity Theory, Springer]
3
Tree Decomposition
  A tree decomposition of G(V,E) is a pair (T, (B_t)_{t ∊ V_T}) of a rooted tree T(V_T,E_T) and a family of sets of vertices B_t ⊆ V such that:
    For all v ∊ V, the set {t ∊ V_T : v ∊ B_t} is nonempty and connected in T
    For all {u,v} ∊ E, u, v ∊ B_t holds for some t ∊ V_T
  Width: max_t |B_t| - 1
  Treewidth: the minimum width over all possible tree decompositions of G
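The two conditions of the definition can be checked mechanically. The following is a minimal sketch, assuming bags are given as a dictionary from tree-node ids to vertex sets and that `tree_edges` really forms a tree; the function names `is_tree_decomposition` and `decomposition_width` are illustrative and do not come from the slides.

```python
from collections import deque


def is_tree_decomposition(vertices, edges, tree_edges, bags):
    """Check both conditions of a tree decomposition.

    vertices: iterable of graph vertices
    edges: iterable of pairs (u, v), the graph edges
    tree_edges: iterable of pairs of bag ids (assumed to form a tree)
    bags: dict mapping bag id t to the vertex set B_t
    """
    adj = {t: set() for t in bags}
    for a, b in tree_edges:
        adj[a].add(b)
        adj[b].add(a)

    # Condition 1: the bags containing v form a nonempty connected subtree.
    for v in vertices:
        holders = {t for t, bag in bags.items() if v in bag}
        if not holders:
            return False
        start = next(iter(holders))
        seen = {start}
        queue = deque([start])
        while queue:
            t = queue.popleft()
            for s in adj[t]:
                if s in holders and s not in seen:
                    seen.add(s)
                    queue.append(s)
        if seen != holders:
            return False

    # Condition 2: every graph edge lies inside some bag.
    for u, v in edges:
        if not any(u in bag and v in bag for bag in bags.values()):
            return False
    return True


def decomposition_width(bags):
    """Width = size of the largest bag minus one."""
    return max(len(bag) for bag in bags.values()) - 1


# A 4-cycle 0-1-2-3-0 decomposed into two bags of three vertices each.
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
cycle_bags = {0: {0, 1, 2}, 1: {0, 2, 3}}
print(is_tree_decomposition(range(4), cycle_edges, [(0, 1)], cycle_bags))
print(decomposition_width(cycle_bags))
```

The checker only validates a given decomposition; it does not search for one of minimum width, which is the NP-hard part.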
4
Examples (figures omitted)
  ⇒ The treewidth of a tree (with at least one edge) is 1
  ⇒ The treewidth of a cycle is 2
5
Several Properties
  Prop. Let s be the parent and t_1,…,t_h be the children of node t. For all j, [formula not preserved in the extracted slide]
  Prop. Let t_1,…,t_h be the children of node t in T(V_T,E_T). For all i≠j, [formula not preserved in the extracted slide]
  Thm. A graph has treewidth at most k if and only if it is a partial k-tree. (The definition of partial k-tree is omitted.)
  Thm. For fixed k, a tree decomposition of a partial k-tree can be computed in linear time.
  Thm. Determining the treewidth is NP-hard.
  ⇒ Many optimization problems can be solved in a bottom-up manner.
6
DP Algorithm for Partial k-Trees
  For fixed k, many NP-hard problems can be solved in polynomial time using dynamic programming.
  Ex. Vertex cover problem
  Ch(t): the set of children of node t in tree T
  Dynamic programming algorithm: [recurrence not preserved in the extracted slide], where W_t is a vertex cover of the subgraph induced by B_t, and r is the root of T.
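The recurrence on this slide did not survive extraction. One standard form that is consistent with the surrounding definitions (Ch(t), W_t, the root r, and the "min in Σ" mentioned in the complexity analysis) is sketched below. It is a reconstruction under those assumptions, not necessarily the exact formula shown in the lecture.

```latex
\[
  \mathrm{OPT}_t(W_t) \;=\; |W_t| \;+\; \sum_{s \in Ch(t)}
  \min_{\substack{W_s \\ W_s \cap B_t \,=\, W_t \cap B_s}}
  \bigl( \mathrm{OPT}_s(W_s) - |W_s \cap B_t| \bigr),
  \qquad
  \text{answer} \;=\; \min_{W_r} \mathrm{OPT}_r(W_r).
\]
```

Here each W_t ranges over the vertex covers of the subgraph induced by B_t. The subtracted term avoids counting twice the cover vertices that a child bag shares with its parent bag.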
7
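Explanation of DP Algorithm
  (Figure omitted: bag B_t with child bags B_s and B_s'.)
  OPT_t(W_t): the size of a minimum vertex cover of G(t) under the condition that W_t is the part of the cover lying in B_t
  T(t): the subtree of T induced by t and its descendants
  G(t): the subgraph of G induced by the union of B_s over all nodes s of T(t)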
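The bottom-up computation of OPT_t(W_t) can be sketched as follows. This is a minimal illustration, assuming the decomposition is given as `bags` (node id to vertex set) and `children` (node id to list of child ids), and assuming the usual rule that a child choice W_s must agree with W_t on the shared vertices; the name `min_vertex_cover_td` is hypothetical. It enumerates all subsets of each bag, so it is only practical for small width.

```python
from itertools import combinations


def min_vertex_cover_td(edges, bags, children, root):
    """Size of a minimum vertex cover, by DP over a rooted tree decomposition."""
    edge_set = {frozenset(e) for e in edges}

    def covers_of(bag):
        # All W inside the bag that cover every graph edge lying inside the bag.
        members = sorted(bag)
        inner = [e for e in edge_set if e <= set(members)]
        result = []
        for size in range(len(members) + 1):
            for chosen in combinations(members, size):
                w = frozenset(chosen)
                if all(e & w for e in inner):
                    result.append(w)
        return result

    def solve(t):
        # Returns a table {W_t: OPT_t(W_t)}.
        child_tables = [(s, solve(s)) for s in children.get(t, [])]
        table = {}
        for w_t in covers_of(bags[t]):
            total = len(w_t)
            feasible = True
            for s, tab in child_tables:
                shared = bags[t] & bags[s]
                # Best child choice that agrees with W_t on the shared vertices;
                # vertices already counted in W_t are subtracted.
                best = min(
                    (cost - len(w_s & bags[t])
                     for w_s, cost in tab.items()
                     if w_s & shared == w_t & shared),
                    default=None,
                )
                if best is None:
                    feasible = False
                    break
                total += best
            if feasible:
                table[w_t] = total
        return table

    return min(solve(root).values())


# 4-cycle 0-1-2-3-0 with bags {0,1,2} (root) and {0,2,3} (child).
print(min_vertex_cover_td(
    [(0, 1), (1, 2), (2, 3), (3, 0)],
    {0: {0, 1, 2}, 1: {0, 2, 3}},
    {0: [1]},
    0,
))
```

Each table has at most 2^(k+1) entries for width k, which matches the counting used in the complexity analysis.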
8
Analysis of Time Complexity
  Let k be a constant.
  A tree decomposition can be computed in linear time.
  For each t ∊ V_T, at most 2^(k+1) sets W_t are tested.
  To compute the min inside Σ, 2^(k+1) × 2^(k+1) = 4^(k+1) pairs are tested per edge of T.
  Thus, the total complexity is O(4^k poly(n)).
9
Applications to Bioinformatics
  Graphs representing structures of proteins and RNAs are considered to have small treewidth.
  Examples
    Protein threading
    Protein side-chain packing
    Protein structure alignment
    Comparison of RNA secondary structures
    Attractor detection in Boolean networks
10
Color Coding [Alon et al.: J. ACM 1995]
11
k-Path Problem
  Input: an undirected graph G(V,E) and an integer k
  Output: a simple path of G (no repeated vertices) with k vertices
  NP-hard ⇐ it is the Hamiltonian path problem if k=n (=|V|)
  Naïve algorithm: for each vertex v, examine neighbors, neighbors of neighbors, … ⇒ O(n^k) time
  Idea
    Partition V into k subsets (color the vertices using k colors).
    If lucky, all vertices of a k-path lie in different subsets (analysis of this probability ⇒ randomized algorithm).
12
DP Algorithm
  P(u,C): 1 if there exists a path from v to u using each color in C exactly once, otherwise 0 (C is a subset of {1,2,…,k})
  Initialization: P(v,{f(v)}) ← 1, all others 0 (f(v) is the color of v)
  Recursion (in the order of |C|=1 to |C|=k-1): P(u, C ∪ {f(u)}) ← 1 if P(w,C)=1, {u,w} ∊ E, and f(u) ∉ C
  For each v, examine whether there exists a k-path starting from v.
  The path can be reconstructed by traceback.
  Example from the figure (omitted): P(v,{R})=1, P(w,{R,Y,B})=1, P(u_1,{R,Y,B,G})=1
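The table P(u,C) can be filled layer by layer in |C|. The sketch below is one possible reading of the slide, assuming an adjacency-list dictionary `adj` and colors numbered from 0 to k-1 rather than 1 to k; the names `colorful_path_exists` and `find_k_path` are illustrative. It reports only whether a path exists and omits the traceback that would reconstruct it.

```python
import random


def colorful_path_exists(adj, color, k):
    """True if some path with k vertices uses k pairwise distinct colors."""
    for v in adj:
        # Each pair (u, C) in the layer stands for an entry P(u, C) = 1,
        # for paths that start at v.
        layer = {(v, frozenset([color[v]]))}
        for _ in range(k - 1):
            next_layer = set()
            for w, used in layer:
                for u in adj[w]:
                    if color[u] not in used:
                        next_layer.add((u, used | {color[u]}))
            layer = next_layer
            if not layer:
                break
        if layer:
            return True
    return False


def find_k_path(adj, k, trials, rng):
    """Repeat random colorings; a True answer is never wrong."""
    for _ in range(trials):
        color = {v: rng.randrange(k) for v in adj}
        if colorful_path_exists(adj, color, k):
            return True
    return False


# Path graph 0-1-2-3-4: it contains a path with 3 vertices but none with 6.
path_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(find_k_path(path_graph, 3, 200, random.Random(0)))
print(find_k_path(path_graph, 6, 200, random.Random(0)))
```

Because all colors on a found path are distinct, the path cannot repeat a vertex, which is why no explicit visited set is needed.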
13
Analysis of Time Complexity
  Lemma: The above algorithm works in O(2^k poly(n)) time.
  Proof: The number of color sets C is 2^k. Thus, it is enough to examine 2^k·n values P(u,C). This computation is done for every initial vertex v, which adds an O(n) factor.
14
Analysis of Success Probability
  Lemma: Let P be a k-path of G. Under a uniformly random coloring, the probability that the k vertices of P have different colors is ≧ e^(-k).
  Proof: The number of colorings of P is k^k, and the number of successful colorings is k!. Therefore, by Stirling's formula, k!/k^k > e^(-k).
  Theorem: By repeating the algorithm at least e^k times, a solution can be obtained (if any) with probability ≧ 1/2.
  Proof: The probability that all trials fail is bounded by (1 - e^(-k))^(e^k) ≦ e^(-1) < 1/2.
  The algorithm never outputs a wrong solution.
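Both bounds are easy to check numerically for small k. The snippet below is only a sanity check under the assumption that colors are chosen uniformly and independently; the function names are illustrative.

```python
import math


def success_probability(k):
    """Probability that k fixed vertices get k distinct colors: k!/k^k."""
    return math.factorial(k) / k**k


def failure_bound(k):
    """Upper bound on the probability that ceil(e^k) independent trials all fail."""
    per_trial = math.exp(-k)            # lower bound on the success probability
    trials = math.ceil(math.exp(k))
    return (1 - per_trial) ** trials


for k in range(1, 9):
    print(k, success_probability(k), math.exp(-k), failure_bound(k))
```

For every k in the loop, the second column should be at least the third, and the last column should stay below 1/2.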
15
Derandomization
  Idea: use families of hash functions.
  k-perfect hash functions: Let F be a family of hash functions from V={1,2,…,n} to {1,2,…,k}. F is called a family of k-perfect hash functions if, for every k-element subset S of V, there exists a function f ∊ F that is one-to-one on S.
  Theorem: For any n and k, a family of k-perfect hash functions consisting of 2^O(k)·log^2 n functions can be constructed in 2^O(k)·n·log^2 n time.
  ⇒ In place of random coloring, it is enough to examine all f given by this theorem.
  Corollary: The k-path problem can be solved in 2^O(k)·poly(n) time.
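The definition of a k-perfect family can be tested by brute force on tiny instances. The sketch below assumes vertices and colors are numbered from 0 instead of 1 and represents each hash function as a list of length n; `is_k_perfect` is an illustrative name. It checks the definition only and is not the construction referred to in the theorem.

```python
from itertools import combinations


def is_k_perfect(family, n, k):
    """True if every k-subset of range(n) is mapped one-to-one by some f in family.

    family: list of functions, each given as a list f with f[v] in range(k)
    """
    for subset in combinations(range(n), k):
        if not any(len({f[v] for v in subset}) == k for f in family):
            return False
    return True


# n = 4, k = 2: the two "bit" functions separate every pair of distinct vertices.
bit0 = [v & 1 for v in range(4)]          # [0, 1, 0, 1]
bit1 = [(v >> 1) & 1 for v in range(4)]   # [0, 0, 1, 1]
print(is_k_perfect([bit0, bit1], 4, 2))
print(is_k_perfect([bit0], 4, 2))
```

Running the color-coding DP once per function in such a family replaces the random trials with a deterministic enumeration.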
16
Applications of Color Coding
  The "path" in color coding can be extended to small trees and small subgraphs (network motifs) ⇒ applications to bioinformatics
  Network motif [Alon et al.: Bioinformatics, 2008]
  Signal pathway analysis [Huffner et al.: Bioinformatics 2007 & Algorithmica 2008]
  Network marker [Dao et al.: Bioinformatics 2011]
  Pathway search/alignment [Shlomi et al.: BMC Bioinformatics 2006]
17
Comparison of Chemical Graphs
18
Chemical Structures and Graphs
  Tree: graph without cycles
  Almost tree: tree + some edges (in each biconnected component)
  Outerplanar graph: no crossing edges, no internal vertex
  Partial k-tree: decomposed into a tree by identifying k+1 vertices as one node
19
Partial k-trees
  Partial k-tree (treewidth ≦ k): decomposed into a tree by identifying k+1 vertices as one node
  Outerplanar graphs are partial 2-trees
  Chemical compounds in NCI database [Horvath & Ramon, TCS 2010]
    treewidth    number of compounds
    1 (tree)     21,950
    2            221,675
    3            6,548
    ≧4           65
  If we can design efficient algorithms for partial 4-trees, we can cover almost all chemical compounds
20
Three Matching Problems
  Graph isomorphism: Are two graphs essentially the same?
  Subgraph isomorphism: Is one graph a part of the other graph?
  Maximum common subgraph: the largest (connected) common part between two given graphs
21
Complexity of Graph Comparison Problems
  Graph isomorphism
    Polynomial time for bounded degree graphs [Luks, JCSS, 1982]
    However, not practical because the algorithm is too complicated (based on group theory)
  Subgraph isomorphism
    Polynomial time for partial k-trees of bounded degree [Matousek & Thomas, Disc. Math., 1992]
    However, the algorithm is still too complicated
  Maximum common subgraph
    trees: polynomial time [Matula, Ann. Disc. Math, 1978]
    almost trees: polynomial time [Akutsu, IEICE Trans., 1993]
    outerplanar graphs: polynomial time [Akutsu & Tamura, Algorithms, 2013]
    partial k-trees: NP-hard for k=11 [Akutsu & Tamura, Proc. ISAAC 2013]
    partial k-trees with k=3: open problem (the NP-hardness result was recently improved to k=4)
22
Algorithm for Outerplanar Graphs: Key Idea
  Difficulty: need to find cut points ⇒ easily leads to combinatorial explosion
  Idea: introduction of the concept of a blade
  Lemma: The number of blades is O(n^2). ⇒ polynomial-time algorithm
23
Maximum Common Subgraph: Summary
  Trees: polynomial time [Matula, Ann. Disc. Math, 1978]
  Almost trees: polynomial time [Akutsu, IEICE Trans., 1993]
  Outerplanar graphs of bounded degree: polynomial time [Akutsu & Tamura, Algorithms, 2013]
  Partial k-trees of bounded degree: NP-hard [Akutsu & Tamura, Proc. ISAAC 2013]
    In contrast: polynomial time for subgraph isomorphism [Matousek & Thomas, Disc. Math., 1992]
24
Summary
  Tree Decomposition
    For fixed k, many NP-hard problems can be solved in polynomial time by DP algorithms
    Applications to analysis of protein/RNA structures
  Color Coding
    Useful for finding small paths/subgraphs in networks
    Applications to biological pathway analysis
  Comparison of Chemical Graphs
    The maximum common subgraph problem is NP-hard even for partial k-trees with k=4, but is solvable in polynomial time for outerplanar graphs