
New Algorithms for Enumerating All Maximal Cliques


1 New Algorithms for Enumerating All Maximal Cliques
Kazuhisa Makino (Osaka University, Japan) and Takeaki Uno (National Institute of Informatics, Japan)
9 July 2004, SWAT 2004

2 Background
・ Enumeration algorithms have recently attracted interest
・ There are still many nice unsolved problems (unlike ordinary discrete algorithms)
・ The recent increase in computing power makes many enumeration problems practically solvable ⇒ many applications have appeared, e.g., genome analysis, data mining, and clustering
・ Some (theoretical) algorithms use enumeration as a subroutine (e.g., recognition of perfect graphs)

3 Background (cont.)
・ My institute has 100 researchers in informatics
・ At least 5 of them (independently) use implementations of enumeration algorithms
・ Suppose there are 100,000 researchers in informatics worldwide ⇒ 5,000 researchers use enumeration algorithms?

4 Problems and Results
Problem 1: for a given graph G = (V, E), enumerate all maximal cliques in G
Problem 2: for a given bipartite graph G = (V1 ∪ V2, E), enumerate all maximal bipartite cliques in G (Problem 2 is a special case of Problem 1)
・ We propose algorithms for these problems that reduce the time complexity in both dense and sparse cases
・ Computational experiments on random graphs and real-world data

5 Difficulty
・ Consider a branch-and-bound type enumeration: divide the maximal cliques into two groups, those including v and those not including v
・ If a group contains no maximal clique ⇒ cut off the branch
⇒ Finding a maximal clique that includes none of the vertices of a given set S is NP-complete
⇒ We cannot cut off subproblems (branches) that contain no maximal clique
[Figure: binary branching tree splitting on v1 ∈ K / v1 ∉ K, then on v2 ∈ K / v2 ∉ K]
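
To make the difficulty concrete, the naive branching can be sketched in Python. This is our own illustration, not the authors' method; it assumes the graph is a dict mapping each vertex 1..n to the set of its neighbours, and the example graph is hypothetical. Only the clique condition prunes the search, so every clique becomes a leaf, and most leaves turn out not to be maximal.

```python
def naive_branching(adj, n):
    """Branch on v = 1..n: 'v in K' / 'v not in K'.

    Returns (maximal cliques found, number of dead-end leaves). Only the
    clique condition prunes; whether a branch still contains a maximal
    clique is never tested, so dead branches are explored to the end.
    """
    found, dead = [], 0

    def is_maximal(K):
        # K is maximal iff no vertex outside K is adjacent to every member.
        return not any(v not in K and K <= adj[v] for v in range(1, n + 1))

    def branch(v, K):
        nonlocal dead
        if v > n:                     # leaf: K is a clique
            if is_maximal(K):
                found.append(K)
            else:
                dead += 1
            return
        if K <= adj[v]:               # 'v in K' keeps K a clique
            branch(v + 1, K | {v})
        branch(v + 1, K)              # 'v not in K'

    branch(1, frozenset())
    return found, dead


# Hypothetical example graph: edges 1-2, 1-3, 2-3, 2-4, 3-4, 4-5
adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3, 5}, 5: {4}}
cliques, dead = naive_branching(adj, 5)
# 3 maximal cliques, but 11 of the 14 leaves are dead ends
```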

6 Existing Studies and Ours
・ O(|V||E|): Tsukiyama, Ide, Ariyoshi & Shirakawa
・ O(|V||E|), lexicographic order: Johnson, Yannakakis & Papadimitriou
・ O(a(G)|E|): Chiba & Nishizeki (a(G): arboricity of G, with m/(n−1) ≤ a(G) = O(m^{1/2}), where n = |V| and m = |E|)
・ Many heuristic algorithms in data mining, for the bipartite case
Ours:
・ O(|V|^2.376) (dense case)
・ O(Δ^4) (sparse case)
・ O((Δ*)^4 + θ^3) (θ vertices have degree > Δ*)
・ O(Δ^3) (bipartite case)
・ O(Δ^2) (bipartite case, using a large amount of memory)

7 Enumeration of Maximal Cliques
・ An improved version of the algorithm of Tsukiyama et al.
Idea: construct a route through all the maximal cliques, to be traversed
・ For a maximal clique K of G = (V, E):
  C(K): the lexicographically maximum maximal clique including K
  K≤i: the vertices of K with indices ≤ i
  i(K): the minimum index i such that C(K≤i) = K
  parent of a maximal clique K: C(K≤i(K)−1) (K0 = C(∅) is the root and has no parent)
・ The parent is lexicographically larger than K
  Lexicographically larger: {1,2,3} > {1,2,4}, {1,3,6} > {1,4,5}
[Figure: example graph on vertices 1–11 marking a maximal clique K and its index i(K)]
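
The definitions above can be made concrete with a small Python sketch. Assumptions (ours, not from the slides): vertices are numbered 1..n, the graph is a dict of neighbour sets, and C(·) is computed by a greedy scan in index order, which yields the lexicographically maximum maximal clique containing the given clique. The helper names are hypothetical.

```python
def closure(adj, n, X):
    """C(X): lexicographically maximum maximal clique containing the clique X.

    Scanning v = 1..n and adding every vertex adjacent to all current members
    favours small indices, which is what 'lexicographically larger' means on
    the slide ({1,2,3} > {1,2,4}).
    """
    K = set(X)
    for v in range(1, n + 1):
        if v not in K and K <= adj[v]:
            K.add(v)
    return frozenset(K)


def prefix(K, i):
    """K<=i: the vertices of K with index at most i (K<=0 is empty)."""
    return frozenset(v for v in K if v <= i)


def index_of(adj, n, K):
    """i(K): the minimum i in 0..n with C(K<=i) == K (exists since C(K) == K)."""
    return next(i for i in range(n + 1) if closure(adj, n, prefix(K, i)) == K)


def parent(adj, n, K):
    """C(K<=i(K)-1); None for the root K0 = C(empty set), where i(K) == 0."""
    i = index_of(adj, n, K)
    return None if i == 0 else closure(adj, n, prefix(K, i - 1))


# Hypothetical example graph: edges 1-2, 1-3, 2-3, 2-4, 3-4, 4-5
adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3, 5}, 5: {4}}
K0 = closure(adj, 5, set())                   # {1, 2, 3}
i_K = index_of(adj, 5, frozenset({2, 3, 4}))  # 4
P = parent(adj, 5, frozenset({4, 5}))         # {2, 3, 4}
```

On this graph the parent pointers form the chain {4,5} → {2,3,4} → {1,2,3}, and each parent is lexicographically larger than its child.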

8 Graph Representation of Relation
・ The parent-child relation is acyclic (each parent is lexicographically larger than its child) ⇒ its graph representation forms a tree (the enumeration tree)
⇒ Visit all maximal cliques by depth-first search
・ We need to find the children of a maximal clique
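
For a small graph the tree can be checked by brute force: list every maximal clique, attach it to its parent, and run an iterative DFS from the root. A Python sketch under the same assumptions as before (adjacency dict, vertices 1..n, hypothetical helper names); the brute-force listing is only a check, not part of the algorithm.

```python
from itertools import combinations


def closure(adj, n, X):
    """C(X): scan v = 1..n, adding each vertex adjacent to all current members."""
    K = set(X)
    for v in range(1, n + 1):
        if v not in K and K <= adj[v]:
            K.add(v)
    return frozenset(K)


def parent(adj, n, K):
    """C(K<=i(K)-1), or None for the root K0 = C(empty set)."""
    def pre(i):
        return frozenset(v for v in K if v <= i)
    i = next(i for i in range(n + 1) if closure(adj, n, pre(i)) == K)
    return None if i == 0 else closure(adj, n, pre(i - 1))


def maximal_cliques_brute_force(adj, n):
    """All maximal cliques by checking every vertex subset (tiny graphs only)."""
    V = range(1, n + 1)
    cliques = [frozenset(S) for r in range(n + 1) for S in combinations(V, r)
               if all(b in adj[a] for a, b in combinations(S, 2))]
    return [K for K in cliques if not any(K <= adj[v] for v in V if v not in K)]


def enumeration_tree(adj, n):
    """Return (root, children map) built from the parent function."""
    children = {K: [] for K in maximal_cliques_brute_force(adj, n)}
    root = None
    for K in children:
        P = parent(adj, n, K)
        if P is None:
            root = K
        else:
            # Parent is lexicographically larger; with sorted index tuples this
            # means P's tuple compares smaller, e.g. (1, 2, 3) < (1, 2, 4).
            assert tuple(sorted(P)) < tuple(sorted(K))
            children[P].append(K)
    return root, children


def dfs(root, children):
    """Depth-first traversal; no 'visited' set is needed because it is a tree."""
    order, stack = [], [root]
    while stack:
        K = stack.pop()
        order.append(K)
        stack.extend(children[K])
    return order


adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3, 5}, 5: {4}}
root, children = enumeration_tree(adj, 5)
visited = dfs(root, children)
```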

9 Child of Maximal Clique
Γ(vi): the set of vertices adjacent to vi
K[i] = C(K≤i ∩ Γ(vi) ∪ {vi})
・ H is a child of K only if H = K[i] for some i > i(K) (K[i] is a child of K iff the parent of K[i] is K)
・ i(K[i]) = i
・ K[i] can be constructed in O(|E|) time
・ The parent can be constructed in O(|E|) time (O(Δ^2) time)
・ Doing this for i = i(K)+1, …, |V| takes O(|V||E|) time
⇒ O(|V||E|) time per maximal clique
[Figure: example graph with a maximal clique K, i(K) = 6, and the candidate K[8]]
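
Slides 8 and 9 together give a small reverse-search enumerator. A minimal Python sketch under the same assumptions (adjacency dict, vertices 1..n, hypothetical names); it favours clarity over the O(|V||E|) bound and recomputes C(·) by a plain scan. The extra test i(K[i]) = i makes each child appear for exactly one index.

```python
def closure(adj, n, X):
    """C(X): scan v = 1..n, adding each vertex adjacent to all current members."""
    K = set(X)
    for v in range(1, n + 1):
        if v not in K and K <= adj[v]:
            K.add(v)
    return frozenset(K)


def prefix(K, i):
    return frozenset(v for v in K if v <= i)          # K<=i


def index_of(adj, n, K):
    return next(i for i in range(n + 1) if closure(adj, n, prefix(K, i)) == K)


def parent(adj, n, K):
    i = index_of(adj, n, K)
    return None if i == 0 else closure(adj, n, prefix(K, i - 1))


def children(adj, n, K):
    """Yield the children of K among K[i] = C(K<=i ∩ Γ(v_i) ∪ {v_i}), i > i(K)."""
    for i in range(index_of(adj, n, K) + 1, n + 1):
        if i in K:
            continue                  # then K[i] == K, never a child
        Ki = closure(adj, n, (prefix(K, i) & adj[i]) | {i})
        if index_of(adj, n, Ki) == i and parent(adj, n, Ki) == K:
            yield Ki


def enumerate_maximal_cliques(adj, n):
    """Depth-first traversal of the enumeration tree from K0 = C(empty set)."""
    stack = [closure(adj, n, set())]
    while stack:
        K = stack.pop()
        yield K
        stack.extend(children(adj, n, K))


adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3, 5}, 5: {4}}
result = list(enumerate_maximal_cliques(adj, 5))
```

No global "seen" set is needed: each maximal clique has a unique parent, so the traversal reaches it exactly once.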

10 Characterization of Child
The parent of K[i] is K ⇔
(1) no vj with j < i is adjacent to all vertices in K≤i ∩ Γ(vi) ∪ {vi}, and
(2) no vj with j < i is adjacent to all vertices in (K≤i ∩ Γ(vi)) ∪ K≤j
・ (1) is not satisfied ⇔ K[i] and the parent of K[i] include some vj ∉ K
・ (2) is not satisfied ⇔ the parent of K[i] includes some vj ∉ K
Example: K = {3,4,7,9}, K[10] = {3,7,10}, K≤5 = {3,4}, K≤7 ∩ Γ(v10) = {3,7}
[Figure: the example graph, highlighting K≤10 ∩ Γ(v10) ∪ {v10} and K≤5 ∪ (K≤10 ∩ Γ(v10))]
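
Conditions (1) and (2) can be tested without computing the parent at all. The sketch below (Python, same assumptions, hypothetical names) evaluates them literally and also provides the direct test through parent(K[i]) and i(K[i]) for comparison. Since a vertex is not adjacent to itself, members of a set never count as "adjacent to all vertices" of that set, so no extra membership test is needed.

```python
def closure(adj, n, X):
    K = set(X)
    for v in range(1, n + 1):
        if v not in K and K <= adj[v]:
            K.add(v)
    return frozenset(K)


def prefix(K, i):
    return frozenset(v for v in K if v <= i)


def index_of(adj, n, K):
    return next(i for i in range(n + 1) if closure(adj, n, prefix(K, i)) == K)


def parent(adj, n, K):
    i = index_of(adj, n, K)
    return None if i == 0 else closure(adj, n, prefix(K, i - 1))


def is_child_by_parent(adj, n, K, i):
    """Direct test: K[i] is a child of K, produced at index i."""
    Ki = closure(adj, n, (prefix(K, i) & adj[i]) | {i})
    return index_of(adj, n, Ki) == i and parent(adj, n, Ki) == K


def is_child_by_conditions(adj, n, K, i):
    """Conditions (1) and (2), for i > i(K) and v_i not in K."""
    T = prefix(K, i) & adj[i]          # K<=i ∩ Γ(v_i)
    S = T | {i}                        # K<=i ∩ Γ(v_i) ∪ {v_i}
    cond1 = not any(S <= adj[j] for j in range(1, i))
    cond2 = not any((T | prefix(K, j)) <= adj[j] for j in range(1, i))
    return cond1 and cond2


adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3, 5}, 5: {4}}
ok = is_child_by_conditions(adj, 5, frozenset({1, 2, 3}), 4)   # True
bad = is_child_by_conditions(adj, 5, frozenset({1, 2, 3}), 5)  # False: v4 violates (1)
```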

11 Use of Matrix Multiplication
・ Check conditions (1) and (2) by matrix multiplication; for example, for
(1) no vj with j < i is adjacent to all vertices in K≤i ∩ Γ(vi) ∪ {vi}:
  i-th row of the left matrix ⇒ K≤i ∩ Γ(vi) ∪ {vi}
  j-th column of the right matrix ⇒ Γ(vj)
  (i, j) cell of the product ⇒ |(K≤i ∩ Γ(vi) ∪ {vi}) ∩ Γ(vj)|; vj violates (1) iff this equals |K≤i ∩ Γ(vi) ∪ {vi}|
・ Condition (2) can be checked in the same way
・ Both are checked in O(|V|^2.376) time ⇒ the time complexity is O(|V|^2.376) for each maximal clique
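
The counting trick for condition (1) can be sketched with an ordinary matrix product. This Python version uses a plain cubic multiplication for clarity; the O(|V|^2.376) bound requires a fast matrix multiplication algorithm instead. The 0/1 list-of-lists representation and the function names are our assumptions.

```python
def matmul(X, Y):
    """Plain cubic product of 0/1 matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def condition1_all(adj, n, K):
    """Evaluate condition (1) for every i = 1..n with one matrix product.

    Row i of the left matrix is the indicator vector of
    S_i = K<=i ∩ Γ(v_i) ∪ {v_i}; column j of the adjacency matrix is the
    indicator vector of Γ(v_j). Only rows with i > i(K) and v_i not in K
    matter for the algorithm.
    """
    V = range(1, n + 1)
    rows = [[1 if (v in K and v <= i and v in adj[i]) or v == i else 0 for v in V]
            for i in V]
    A = [[1 if v in adj[u] else 0 for v in V] for u in V]
    P = matmul(rows, A)  # P[i-1][j-1] = |S_i ∩ Γ(v_j)|
    return {i: not any(P[i - 1][j - 1] == sum(rows[i - 1]) for j in range(1, i))
            for i in V}


adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3, 5}, 5: {4}}
cond = condition1_all(adj, 5, frozenset({1, 2, 3}))
```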

12 Sparse Cases
・ If vi is adjacent to no vertex in K:
  K[i] = C(K≤i ∩ Γ(vi) ∪ {vi}) = C({vi})
  If C({vi})≤i−1 = ∅, the parent of K[i] is K0 = C(∅)
  If C({vi})≤i−1 ≠ ∅, condition (1) is not satisfied
  ⇒ If K ≠ K0, K[i] is not a child of K
・ Since |K| ≤ Δ+1, at most Δ(Δ+1) vertices are adjacent to K (Δ: maximum degree)
・ For each such K[i], constructing the parent takes O(Δ^2) time
⇒ O(Δ^4) time per maximal clique
・ If the graph is only partially dense: O((Δ*)^4 + |Θ|^3), where Δ* is the maximum degree in V \ Θ
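
The sparse-case argument restricts which indices need to be examined. A Python sketch (hypothetical names, same adjacency-dict assumption): for K ≠ K0 only neighbours of K are returned, while for the root K0 all outside vertices are kept because C({vi}) may have K0 as its parent.

```python
def candidate_indices(adj, n, K, i_K, is_root):
    """Indices i > i(K) with v_i not in K worth testing as K[i].

    For K != K0 only vertices adjacent to some vertex of K can give a child,
    so at most Δ(Δ+1) indices remain.
    """
    if is_root:
        return [i for i in range(i_K + 1, n + 1) if i not in K]
    return sorted({u for v in K for u in adj[v] if u not in K and u > i_K})


adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3, 5}, 5: {4}}
delta = max(len(s) for s in adj.values())                        # 3
cands = candidate_indices(adj, 5, frozenset({2, 3, 4}), 4, False)
```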

13 Bipartite Clique
・ Enumerate the maximal bipartite cliques in G = (V1 ∪ V2, E) (= the maximal cliques in G' = (V1 ∪ V2, E ∪ V1×V1 ∪ V2×V2))
⇒ enumerated in O(|V|^2.376) time for each
・ But a sparse bipartite graph becomes dense under this transformation ⇒ some improvements are needed for sparse cases
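
The reduction can be sketched in Python: build G' by making each side a clique, then list the maximal cliques of G' (by brute force here, purely as a check on a tiny hypothetical example) and split each into its V1 and V2 parts. One-sided cliques such as V1 alone also appear when no vertex of V2 is adjacent to all of V1. The function names are ours.

```python
from itertools import combinations


def to_clique_graph(V1, V2, edges):
    """G' = (V1 ∪ V2, E ∪ V1×V1 ∪ V2×V2), as adjacency sets without self-loops."""
    adj = {v: set() for v in V1 | V2}
    pairs = list(combinations(sorted(V1), 2)) + list(combinations(sorted(V2), 2))
    for u, v in pairs + list(edges):
        adj[u].add(v)
        adj[v].add(u)
    return adj


def maximal_cliques_brute_force(adj):
    """All maximal cliques by checking every vertex subset (tiny graphs only)."""
    V = sorted(adj)
    cliques = [frozenset(S) for r in range(len(V) + 1) for S in combinations(V, r)
               if all(b in adj[a] for a, b in combinations(S, 2))]
    return [K for K in cliques if not any(K <= adj[v] for v in V if v not in K)]


V1, V2 = {1, 2}, {3, 4}
E = [(1, 3), (1, 4), (2, 3)]
for K in maximal_cliques_brute_force(to_clique_graph(V1, V2, E)):
    print(sorted(K & V1), sorted(K & V2))
# prints "[1, 2] [3]" and then "[1] [3, 4]"
```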

14 Fast Construction of K[i]
・ For any maximal bipartite clique K:
  K ∩ V2 = ∩_{v ∈ K∩V1} Γ(v)
  K ∩ V1 = ∩_{v ∈ K∩V2} Γ(v)
・ K[i] ∩ V1 for all i can be computed in O(Δ^2) time
・ K[i] for all i can be computed in O(Δ^3) time
[Figure: neighbourhoods Γ(1), …, Γ(4) between V1 and V2, and candidates K[v1], K[v6]]
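
The two identities say that each side of a maximal bipartite clique is the common neighbourhood of the other side. A Python sketch of this closure step, assuming bipartite adjacency sets and the convention that an empty intersection is the whole opposite side; the function names and the example are hypothetical.

```python
from functools import reduce


def common_neighbours(adj, S, other_side):
    """∩_{v in S} Γ(v); by convention the whole other side when S is empty."""
    return reduce(lambda acc, v: acc & adj[v], S, set(other_side))


def bipartite_closure(adj, V1, V2, A):
    """Extend A ⊆ V1 to (A', B): B = ∩_{v in A} Γ(v), A' = ∩_{u in B} Γ(u)."""
    B = common_neighbours(adj, A, V2)
    return common_neighbours(adj, B, V1), B


# Hypothetical bipartite example: V1 = {1, 2}, V2 = {3, 4}, edges 1-3, 1-4, 2-3
adj = {1: {3, 4}, 2: {3}, 3: {1, 2}, 4: {1}}
from_2 = bipartite_closure(adj, {1, 2}, {3, 4}, {2})   # ({1, 2}, {3})
from_1 = bipartite_closure(adj, {1, 2}, {3, 4}, {1})   # ({1}, {3, 4})
```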

15 Checking the Parent
・ Assign small indices to V1 and large indices to V2 ⇒ K[i] is a child of K ⇔ K[i]≤i = K≤i
⇒ checked in O(Δ) time
⇒ enumerated in O(Δ^3) time for each maximal bipartite clique (O(Δ^2) by using memory)
[Figure: V1 indexed 1, 2, 3, …, |V1|−1, |V1| and V2 indexed |V1|+1, |V1|+2, …, with vi and K[i]]

16 Computational Experiments
・ Randomly generated graphs: vertex vi is connected to each of the vertices vi−r, …, vi+r with probability 1/2
・ Faster than Tsukiyama's algorithm
・ Computation time is linear in the maximum degree
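
The random instances described above can be generated as follows. This is our reading of the slide, not the authors' generator: each pair vi, vj with 0 < |i − j| ≤ r becomes an edge independently with probability 1/2, so the maximum degree is at most 2r. The function name and the seed parameter are assumptions.

```python
import random


def banded_random_graph(n, r, seed=None):
    """Vertices 1..n; each pair with 0 < |i - j| <= r is an edge with probability 1/2."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(1, n + 1)}
    for i in range(1, n + 1):
        for j in range(i + 1, min(n, i + r) + 1):   # each pair considered once
            if rng.random() < 0.5:
                adj[i].add(j)
                adj[j].add(i)
    return adj


g = banded_random_graph(200, 5, seed=1)
```

Varying r controls the maximum degree while keeping the graph sparse, which is how the running time can be plotted against the maximum degree.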

17 Benchmark Problems
・ Finding frequent closed item sets in a database ⇒ equivalent to maximal bipartite clique enumeration
・ Used in the KDD Cup (a data mining algorithm competition):
  BMS-WebView1 (Web-log data): |V| = 60,000, average degree 2.5
  BMS-WebView2 (Web-log data): |V| = 80,000, average degree 5
  BMS-POS (POS data): |V| = 510,000, average degree 6
  IBM-Artificial (artificial data): |V| = 100,000, average degree 10

18 Results

19 Conclusion and Future Work
・ Proposed fast algorithms for enumerating
  maximal cliques: O(|V|^2.376), O(Δ^4), O((Δ*)^4 + θ^3)
  maximal bipartite cliques: O(|V|^2.376), O(Δ^3), O(Δ^2)
・ Examined benchmark problems from data mining and showed that our algorithm performs well
Future work:
・ Can we improve these bounds further? What is the difficulty?
・ Can we enumerate other maximal (minimal) graph objects?
・ Can we apply matrix multiplication to other enumeration problems?
・ What can be enumerated efficiently in practice?

20 Frequent Sets
Input graph: an item and a customer are connected iff the customer purchased the item
In a maximal bipartite clique:
  Customers: have similar preferences
  Items: are frequently purchased together
[Agrawal et al. 96, Zaki et al. 02, Pei 00, Han 00, …]
[Figure: bipartite graph between customer1–customer4 and the items beer, nappy, milk]
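
The correspondence can be sketched in Python: each closed itemset (a set of items equal to the items common to all customers who bought it) together with its buyers is a maximal bipartite clique of the customer-item graph. The purchase data below are hypothetical, and the brute force over customer groups is only for illustration; frequent closed itemsets would additionally require at least a minimum number of buyers.

```python
from itertools import combinations


def closed_itemsets(purchases):
    """Map each non-empty closed itemset to the customers who bought all of it.

    Each (buyers, items) pair is a maximal bipartite clique of the
    customer-item graph. Brute force over customer groups; tiny inputs only.
    """
    customers = sorted(purchases)
    result = {}
    for r in range(1, len(customers) + 1):
        for group in combinations(customers, r):
            items = frozenset(purchases[group[0]]).intersection(
                *(purchases[c] for c in group[1:]))
            if items:
                result[items] = frozenset(c for c in customers if items <= purchases[c])
    return result


# Hypothetical purchase data using the items from the slide
purchases = {
    "customer1": {"beer", "nappy"},
    "customer2": {"beer", "nappy", "milk"},
    "customer3": {"milk"},
    "customer4": {"beer", "nappy"},
}
for items, buyers in closed_itemsets(purchases).items():
    print(sorted(items), sorted(buyers))
```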

21 Few Large Degree Vertices
・ Very few vertices (denoted by Θ) have large degrees
・ Divide the maximal cliques into two groups: (a) cliques not included in Θ, (b) cliques included in Θ
・ Group (a) can be enumerated in O(Δ'^4) time for each (vertices outside Θ have degree < Δ')
・ A maximal clique K of the subgraph induced by Θ is a maximal clique of G ⇔ K is not included in any clique of group (a)
⇒ O(|Θ|^3) time for each
⇒ O(Δ'^4 + |Θ|^3) time per maximal clique
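
The split can be sketched in Python for a tiny graph. Assumptions (ours): Θ is the set of vertices with degree above a threshold d standing in for Δ', and maximal cliques of G[Θ] are found by brute force rather than by the method behind the O(|Θ|^3) bound. A maximal clique K of G[Θ] is kept exactly when no vertex outside Θ is adjacent to all of K, which is equivalent to K not lying inside a clique of group (a).

```python
from itertools import combinations


def high_degree_vertices(adj, d):
    """Θ: vertices whose degree exceeds d (d plays the role of Δ')."""
    return {v for v in adj if len(adj[v]) > d}


def maximal_cliques_brute_force(adj):
    """All maximal cliques by checking every vertex subset (tiny graphs only)."""
    V = sorted(adj)
    cliques = [frozenset(S) for r in range(len(V) + 1) for S in combinations(V, r)
               if all(b in adj[a] for a, b in combinations(S, 2))]
    return [K for K in cliques if not any(K <= adj[v] for v in V if v not in K)]


def group_b(adj, theta):
    """Maximal cliques of G that lie inside Θ."""
    induced = {v: adj[v] & theta for v in theta}
    return [K for K in maximal_cliques_brute_force(induced)
            if not any(K <= adj[v] for v in adj if v not in theta)]


adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3, 5}, 5: {4}}
theta = high_degree_vertices(adj, 2)  # {2, 3, 4}
print(group_b(adj, theta))            # [frozenset({2, 3, 4})]
```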

22 Avoid Duplications by Using Memory
・ We can avoid duplications by storing all maximal bipartite cliques found
・ Since K ∩ V1 = Γ(K ∩ V2), it suffices to store K ∩ V1 for every K
1. Get an unprocessed K from memory
2. Generate all K[i] ∩ V1
3. Store each K[i] ∩ V1 that is not already in memory
4. Go to 1 if an unprocessed maximal clique remains
⇒ Enumerated in O(Δ^2) time for each
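
The memory-based loop can be sketched in Python. One simplification is ours: instead of the K[i] construction, each stored set A = K ∩ V1 generates the successors A ∩ Γ(v) for v ∈ V2. Every set of the form K ∩ V1 is an intersection of such neighbourhoods (V1 itself for the empty intersection), so starting from V1 reaches all of them, and storing them in a set avoids duplicates. Names and the example are hypothetical, and one-sided cliques of G' (with an empty side) are included.

```python
def enumerate_clique_sides(adj, V1, V2):
    """All sets K ∩ V1 for maximal bipartite cliques K, stored in memory."""
    start = frozenset(V1)              # K ∩ V1 for the empty intersection
    memory, unprocessed = {start}, [start]
    while unprocessed:
        A = unprocessed.pop()          # 1. take an unprocessed set from memory
        for v in V2:                   # 2. generate successors A ∩ Γ(v)
            A2 = A & adj[v]
            if A2 not in memory:       # 3. store only sets not seen before
                memory.add(A2)
                unprocessed.append(A2)
    return memory                      # 4. stop when nothing is unprocessed


adj = {1: {3, 4}, 2: {3}, 3: {1, 2}, 4: {1}}
V1, V2 = {1, 2}, {3, 4}
for A in enumerate_clique_sides(adj, V1, V2):
    B = {u for u in V2 if A <= adj[u]}  # K ∩ V2 = common neighbours of A
    print(sorted(A), sorted(B))
```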

