Graph Modeled Data Clustering: Fixed Parameter Algorithms for Clique Generation J. Gramm, J. Guo, F. Hüffner and R. Niedermeier Theory of Computing Systems (2005) Student: Vishal Kapoor
Presentation Outline Problem Introduction Past Research Results of the paper CLUSTER EDITING –Kernelization –Search Tree CLUSTER DELETION Questions
Problem Statement Make at most k changes to the edge set of an input graph to obtain a cluster graph, i.e. a vertex-disjoint union of cliques in which each connected component is a clique CLUSTER EDITING –Both edge additions and deletions are allowed CLUSTER DELETION –Only edge deletions are allowed Used in clustering of data – vertices are adjacent iff their similarity exceeds a threshold
Past Research
[2000] Study of both these problems started by Shamir et al., who proved that they are NP-complete and APX-hard
[1996] Cai studied the problem of edge additions and deletions and vertex deletions for certain graphs and showed it is FPT
[2001] Natanzon et al. gave a general c-approximation for deletion and editing problems on bounded-degree graphs, for target graphs with certain properties
[2002] Khot and Raman investigated the complexity of vertex deletion problems to find subgraphs with hereditary properties
Results of this paper CLUSTER EDITING – O(2.27^k + |V|^3) CLUSTER DELETION – O(1.77^k + |V|^3) By using certain reduction rules, the resulting kernel size = O(k^3) –Has at most 2k^2 + k vertices and 2k^3 + k^2 edges.
CLUSTER EDITING [Figure: two vertices u and v with their common neighbors and non-common neighbors]
Reduction Rules
Rule 1:
a. If u and v have more than k common neighbors, then {u,v} is set to ADDED and added to E if not already there
b. If u and v have more than k non-common neighbors, then {u,v} is set to DELETED and deleted from E if already there
c. If u and v have both more than k common neighbors and more than k non-common neighbors, then the instance has no solution
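Rule 1 can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function name `rule1_labels`, the edge-list input, and the reading of "non-common neighbor" as a vertex adjacent to exactly one of u and v (other than u and v themselves) are assumptions made here. It only computes the labels and does not modify E.

```python
from itertools import combinations


def rule1_labels(vertices, edges, k):
    """Label vertex pairs according to Rule 1 (hypothetical helper).

    Returns a dict mapping frozenset({u, v}) to "ADDED" or "DELETED",
    or None if some pair has both more than k common and more than k
    non-common neighbors (no solution with at most k edits).
    """
    adj = {x: set() for x in vertices}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    labels = {}
    for u, v in combinations(vertices, 2):
        # Common neighbors: adjacent to both u and v.
        common = (adj[u] & adj[v]) - {u, v}
        # Non-common neighbors (assumed meaning): adjacent to exactly one.
        non_common = (adj[u] ^ adj[v]) - {u, v}
        many_common = len(common) > k
        many_non_common = len(non_common) > k
        if many_common and many_non_common:
            return None  # Rule 1c
        if many_common:
            labels[frozenset((u, v))] = "ADDED"  # Rule 1a
        elif many_non_common:
            labels[frozenset((u, v))] = "DELETED"  # Rule 1b
    return labels
```

Each pair is compared against every other vertex through set operations, which matches the O(|V|^3) bound for checking all pairs.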
Reduction Rules
Rule 2: For every 3 vertices u, v and w:
a. If {u,v} = ADDED and {u,w} = ADDED, then {v,w} should be set to ADDED and added to E if not already there
b. If {u,v} = ADDED and {u,w} = DELETED, then {v,w} should be set to DELETED and deleted from E if already present
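Rule 2 is a propagation step, which a short Python sketch may make concrete. The name `propagate_rule2` and the label dictionary are assumptions for illustration; the sketch tracks only the ADDED/DELETED labels (not the edge set E) and, as an extra assumption not stated on the slide, raises an error when two derivations contradict each other.

```python
from itertools import permutations


def propagate_rule2(vertices, labels):
    """Apply Rule 2 until no new label can be derived (hypothetical helper).

    labels maps frozenset({a, b}) to "ADDED" or "DELETED".
    Returns an updated copy; the input dict is left untouched.
    """
    labels = dict(labels)
    changed = True
    while changed:
        changed = False
        for u, v, w in permutations(vertices, 3):
            uv = labels.get(frozenset((u, v)))
            uw = labels.get(frozenset((u, w)))
            if uv != "ADDED" or uw is None:
                continue
            # ADDED + ADDED implies ADDED; ADDED + DELETED implies DELETED.
            implied = uw
            vw = frozenset((v, w))
            if vw not in labels:
                labels[vw] = implied
                changed = True
            elif labels[vw] != implied:
                # Assumption: treat a contradiction as an unsolvable instance.
                raise ValueError("contradictory labels for a vertex pair")
    return labels
```

For example, starting from {0,1} = ADDED, {0,2} = ADDED and {0,3} = DELETED, the sketch derives {1,2} = ADDED and both {1,3} and {2,3} = DELETED.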
Running Time What is checked? –Every pair of vertices –Every vertex which is a neighbor of both of them Takes time O(|V|^3)
Kernel Size The kernel contains at most (2k+1)·k = 2k^2 + k vertices and at most (2k+1 choose 2)·k = 2k^3 + k^2 edges. (Proof skipped.)
Branch and Search Algorithm Identify a bad triple (of 3 vertices) in the kernel and repair it by adding/deleting edges to/from it, to transform the graph into disjoint cliques Overall at most k edge additions/deletions are allowed 2 branching strategies: –Basic = O(3^k) –Advanced = O(2.27^k)
Basic Branching
Lemma: A graph consists of disjoint cliques iff there are no three vertices u,v,w such that {u,v} and {u,w} are edges but {v,w} is not an edge, i.e. no three vertices may have exactly two edges among them (no edge, a single edge, or a triangle are all fine)
Thus if a graph is not a union of disjoint cliques, then a bad triple can be found and repaired
[Figure: a bad triple u, v, w with edges {u,v} and {u,w} but no edge {v,w}]
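The lemma translates directly into a test for cluster graphs: look for a bad triple and report the first one found. The sketch below assumes the graph is given as a dict mapping each vertex to the set of its neighbors; the name `find_bad_triple` is made up for illustration, and neighbors are sorted only to make the result deterministic (so vertices are assumed to be comparable).

```python
def find_bad_triple(adj):
    """Return (u, v, w) with {u,v}, {u,w} edges and {v,w} a non-edge, or None.

    adj maps each vertex to the set of its neighbors (assumed symmetric).
    A result of None means the graph is a disjoint union of cliques.
    """
    for u in adj:
        nbrs = sorted(adj[u])
        for i, v in enumerate(nbrs):
            for w in nbrs[i + 1:]:
                if w not in adj[v]:
                    return (u, v, w)
    return None
```

On the path 0 - 1 - 2 the middle vertex 1 plays the role of u; on a triangle plus a separate edge no bad triple exists.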
Basic Branch Algorithm
1. If G is a union of disjoint cliques, return SUCCESS
2. If k <= 0, return FAIL
3. Otherwise, find 3 vertices u,v,w such that edges {u,v}, {u,w} exist and {v,w} does not, and branch into 3 instances (G', k') as follows:
a. E' = E - {u,v}, k' = k-1, and set {u,v} = DELETED
b. E' = E - {u,w}, k' = k-1, and set {u,w} and {v,w} = DELETED, {u,v} = ADDED
c. E' = E + {v,w}, k' = k-1, and set {u,v}, {u,w} and {v,w} = ADDED
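The three-way branching can be sketched as a small recursive Python function. This is a simplified illustration under stated assumptions: the graph is a dict of neighbor sets, the names `cluster_editing`, `toggle` and `find_bad_triple` are hypothetical, and the ADDED/DELETED bookkeeping from the slide is left out. The search stays correct without the labels, since every solution must change one of the three pairs of a bad triple, but it may explore redundant branches.

```python
def find_bad_triple(adj):
    """Return (u, v, w) with {u,v}, {u,w} edges and {v,w} a non-edge, or None."""
    for u in adj:
        nbrs = sorted(adj[u])
        for i, v in enumerate(nbrs):
            for w in nbrs[i + 1:]:
                if w not in adj[v]:
                    return (u, v, w)
    return None


def toggle(adj, a, b):
    """Delete edge {a,b} if present, otherwise add it (in place)."""
    if b in adj[a]:
        adj[a].discard(b)
        adj[b].discard(a)
    else:
        adj[a].add(b)
        adj[b].add(a)


def cluster_editing(adj, k):
    """Return a list of at most k edits yielding disjoint cliques, or None."""
    triple = find_bad_triple(adj)
    if triple is None:
        return []          # step 1: already a union of disjoint cliques
    if k <= 0:
        return None        # step 2: budget exhausted
    u, v, w = triple
    # step 3: delete {u,v}, delete {u,w}, or add {v,w}
    for op, a, b in (("delete", u, v), ("delete", u, w), ("add", v, w)):
        toggle(adj, a, b)
        rest = cluster_editing(adj, k - 1)
        toggle(adj, a, b)  # undo, so the caller's graph is unchanged
        if rest is not None:
            return [(op, a, b)] + rest
    return None
```

For the 4-cycle 0 - 1 - 2 - 3 - 0 the sketch finds no solution with k = 1 and a two-edit solution with k = 2. Each call branches three ways and the depth is at most k, giving the 3^k search tree size.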
Branching Rules [Figure: a bad triple u, v, w and the three graphs produced by branches BR1, BR2 and BR3]
Running time The algorithm solves CLUSTER EDITING in time O(3^k · k^2 + |V|^3)
1. O(|V|^3) is the time required to find all bad triples
2. O(3^k) is the size of the search tree
3. The kernel (modified input G') has |V| = O(k^2) vertices. So a newly added/deleted edge can create/delete at most O(k^2) bad triples. [And the edge list can then be updated only for vertices affected by that edge in O(k^2) time.]
NOTE: The running time can be improved to O(3^k + |V|^3) by repeating the kernelization at search tree nodes whenever possible, which works because the problem kernel has polynomial size. Similarly, CLUSTER DELETION can be solved in time O(2^k + |V|^3)
Advanced Branch Algorithm Bad triples are considered, but their classification is refined further into three cases C1, C2 and C3 [Figure: the three cases C1, C2, C3 of a bad triple u, v, w]
Branching for each case For C1: BR3 cannot give a solution better than both BR1 and BR2 and can be omitted. If N(v) >= N(w) (N used here as a neighbor count), then the total number of edges changed to make 1 clique >= the total number of edges changed to make 2 cliques [Figure: case C1 with vertices u, v, w and neighbors u1, u2, v1, v2, w1, w2]
Edges added to make 1 clique =
–{v,w} added = +1
–{v,N(w)} added, minus existing {u,v} = N(w) - 1
–{w,N(v)} added, minus existing {u,w} = N(v) - 1
–joining all N(w) and N(v) = ([N(w)+N(v)] choose 2)
–joining each N(v) and N(w) with u = N(v) + N(w)
–Total = 2·[N(v) + N(w)] + ([N(w)+N(v)] choose 2) - 1 => (A)
Edges changed to make 2 cliques =
–N(w) deleted = N(w)
–{v,N(w)} added, minus existing {u,v} = N(w) - 1
–joining all N(w) and N(v) = ([N(w)+N(v)] choose 2)
–joining each N(v) and N(w) with u = N(v) + N(w)
–Total = N(v) + 3·N(w) + ([N(w)+N(v)] choose 2) - 1 => (B)
Conclusion: (A) - (B) = N(v) - N(w), and N(v) >= N(w), so (A) >= (B).
Thus only BR1 and BR2 can be used: the resulting graphs are G\{u,v} or G\{u,w}, and the branching vector is (1,1). The final recurrence relation is T(k) = 2·T(k-1), with root 2, so the final tree size for C1 is 2^k. [Figure: the bad triple u, v, w after BR1 and after BR2]
For C2: Branching Vector = (1,2,3,2,3)
For C3: Branching Vector = (1,2,3,2,3)
Overall Running Time Solve T(k) = T(k-1) + 2·[T(k-2) + T(k-3)], the recurrence for branching vector (1,2,3,2,3). Its branching number is about 2.27, so the worst-case search tree size is O(2.27^k). Thus CLUSTER EDITING can be solved in O(2.27^k + |V|^3)
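The base 2.27 can be checked numerically. For a branching vector (d1, ..., dr), the search tree size is O(x^k), where x > 1 solves x^(-d1) + ... + x^(-dr) = 1. The sketch below finds that root by bisection; the function name `branching_number` and the tolerance are choices made for this illustration.

```python
def branching_number(vector, tol=1e-9):
    """Root x > 1 of sum(x**-d for d in vector) == 1, found by bisection.

    Assumes vector holds at least two positive integers.
    """
    def f(x):
        # Decreasing in x; positive below the root, negative above it.
        return sum(x ** -d for d in vector) - 1.0

    lo, hi = 1.0, 2.0
    while f(hi) > 0:       # grow the bracket until it contains the root
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return hi
```

Calling `branching_number((1, 2, 3, 2, 3))` gives a value just below 2.27, and `branching_number((1, 1))` gives 2, matching the 2^k bound for case C1.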
Cases for CLUSTER DELETION: Branching Vector = (2,3,2,3) and running time = O(1.77^k + |V|^3)
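As a worked step, the branching vector (2,3,2,3) corresponds to the recurrence below. Substituting T(k) = x^k gives a cubic whose only real root is the base of the exponential term; the numerical value is rounded.

```latex
\begin{align*}
T(k) &= 2\,T(k-2) + 2\,T(k-3) \\
x^{k} &= 2x^{k-2} + 2x^{k-3}
  \;\Longrightarrow\; x^{3} = 2x + 2 \\
x &\approx 1.77
  \quad\text{(check: } 1.77^{3} \approx 5.545,\; 2 \cdot 1.77 + 2 = 5.54\text{)}
\end{align*}
```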
Questions? Thanks.