New Algorithms for Enumerating All Maximal Cliques

Kazuhisa Makino (Osaka University, JAPAN)
Takeaki Uno (National Institute of Informatics, JAPAN)
9/Jul/2004, SWAT 2004

Background
・ Enumeration algorithms have recently become an interesting research topic
・ There are still many nice unsolved problems (unlike ordinary discrete algorithms)
・ The recent increase in computer power makes many enumeration problems practically solvable ⇒ many applications have been appearing, such as genome, data mining, clustering, and so on
・ Some (theoretical) algorithms use enumeration as a subroutine (e.g. recognition of perfect graphs)

Background (cont.)
・ My institute has 100 researchers of informatics
・ At least 5 of them (independently) use implementations of enumeration algorithms
・ Suppose that there are 100,000 researchers of informatics in the world
⇒ 5000 researchers use enumeration algorithms?

Problems and Results
Problem 1: for a given graph G = (V, E), enumerate all maximal cliques in G
Problem 2: for a given bipartite graph G = (V1 ∪ V2, E), enumerate all maximal bipartite cliques in G (Problem 2 is a special case of Problem 1)
・ We propose algorithms for these problems that reduce the time complexity in both dense and sparse cases
・ Computational experiments on random graphs and real-world data

Difficulty
・ Consider branch-and-bound type enumeration: divide the maximal cliques into two groups, those including v and those not including v
・ If a group includes no maximal clique ⇒ cut off the branch
・ Finding a maximal clique that includes none of the vertices of a given set S is NP-complete ⇒ we cannot cut off subproblems (branches) that include no maximal clique
[Figure: search tree branching on whether v1 and v2 belong to K]

Existing Studies and Ours
・ O(|V||E|): Tsukiyama, Ide, Ariyoshi & Shirakawa
・ O(|V||E|), lexicographic order: Johnson, Yannakakis & Papadimitriou
・ O(a(G)|E|): Chiba & Nishizeki (a(G): arboricity of G, with m/(n-1) ≦ a(G) ≦ m^(1/2))
・ Many heuristic algorithms in data mining, for the bipartite case
Ours:
・ O(|V|^2.376) (dense case)
・ O(Δ^4) (sparse case)
・ O((Δ*)^4 + θ^3) (θ vertices have degree > Δ*)
・ O(Δ^3) (bipartite case)
・ O(Δ^2) (bipartite case, using much memory)

Enumeration of Maximal Cliques
・ Improved version of the algorithm of Tsukiyama et al.
・ Idea: construct a route over all maximal cliques to be traversed
・ For a maximal clique K of G = (V, E):
  C(K): lexicographically maximum maximal clique including K
  K_{≦i}: vertices of K with indices ≦ i
  i(K): minimum index s.t. C(K_{≦i}) = C(K_{≦i+1})
  parent of a maximal clique K: C(K_{≦i(K)-1})
・ The parent is lexicographically larger than K
・ Lexicographically larger: {1,2,3} > {1,2,4}, {1,3,6} > {1,4,5}
[Figure: example graph with K and i(K) marked]
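
The definitions on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names and the example graph are hypothetical, vertices are assumed to be labelled by positive integers, C(K) is computed greedily, and i(K) is read as the smallest index i with C(K_{≦i}) = K.

```python
def complete(adj, K):
    """C(K): greedily extend clique K, trying vertices in increasing index order."""
    clique = set(K)
    for v in sorted(adj):
        if v not in clique and clique <= adj[v]:
            clique.add(v)
    return frozenset(clique)


def parent(adj, K):
    """Return (parent of K, i(K)), or None when K is the root C(empty set)."""
    K = frozenset(K)
    if complete(adj, ()) == K:
        return None
    for i in sorted(K):
        # Assumed reading of i(K): the smallest i with C(K_{<=i}) == K.
        if complete(adj, {v for v in K if v <= i}) == K:
            # Parent is C(K_{<=i(K)-1}).
            return complete(adj, {v for v in K if v < i}), i


# Hypothetical example graph: triangle 1-2-3 plus the path 3-4-5.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
root = complete(adj, ())
```

On this example graph the root is {1,2,3}, the parent of {3,4} is {1,2,3} with i(K) = 4, and the parent of {4,5} is {3,4} with i(K) = 5, so every parent is lexicographically larger than its child.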

Graph Representation of Relation
・ The parent-child relation is acyclic ⇒ its graph representation forms a tree (enumeration tree)
・ Visit all maximal cliques by depth-first search
・ Need to find the children of a maximal clique

Child of Maximal Clique
・ Γ(vi): vertices adjacent to vi
・ K[i] = C(K_{≦i} ∩ Γ(vi) ∪ {vi})
・ H is a child of K only if H = K[i] for some i > i(K) (K[i] is a child of K if the parent of K[i] is K)
・ i(K[i]) = i
・ Construct K[i] in O(|E|) time
・ Construct the parent in O(|E|) time (O(Δ^2) time)
・ For i = i(K)+1, …, |V|: O(|V||E|) time ⇒ enumeration in O(|V||E|) time per maximal clique
[Figure: example graph with K, i(K) = 6, and K[8]]
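
The traversal described above can be sketched as a depth-first search that generates each candidate K[i] and keeps it only when its parent is K. This is a minimal sketch under assumptions, not the authors' code: names are hypothetical, vertices are positive integers, the root gets index 0, and no attempt is made to reach the stated time bounds.

```python
def enumerate_maximal_cliques(adj):
    """List all maximal cliques by walking the parent-child tree from the root."""
    order = sorted(adj)

    def complete(K):
        # C(K): greedy extension in increasing index order.
        clique = set(K)
        for v in order:
            if v not in clique and clique <= adj[v]:
                clique.add(v)
        return frozenset(clique)

    root = complete(())

    def parent(K):
        # Returns (parent, i(K)); assumes i(K) = smallest i with C(K_{<=i}) == K.
        if K == root:
            return None
        for i in sorted(K):
            if complete(v for v in K if v <= i) == K:
                return complete(v for v in K if v < i), i

    found, stack = [], [(root, 0)]
    while stack:
        K, i_K = stack.pop()
        found.append(K)
        for i in order:
            if i <= i_K or i in K:
                continue
            # Candidate child K[i] = C(K_{<=i} ∩ Γ(v_i) ∪ {v_i}).
            H = complete({v for v in K if v <= i and v in adj[i]} | {i})
            if parent(H) == (K, i):
                stack.append((H, i))
    return found


# Hypothetical example graph: triangle 1-2-3 plus the path 3-4-5.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
cliques = enumerate_maximal_cliques(adj)
```

Because a candidate is accepted only when its parent is exactly K and its index is exactly i, each maximal clique is reported once, without storing previously found cliques.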

Characterization of Child
The parent of K[i] = K ⇔
(1) no vj, j<i, is adjacent to all vertices in K_{≦i} ∩ Γ(vi) ∪ {vi}
(2) no vj, j<i, is adjacent to all vertices in K_{≦i} ∩ Γ(vi) ∪ K_{≦j}
・ (1) is not satisfied ⇔ K[i] and the parent of K[i] include some vj ∉ K
・ (2) is not satisfied ⇔ the parent of K[i] includes some vj ∉ K
Example (from the figure): K = {3,4,7,9}, K[10] = {3,7,10}, K_{≦5} = {3,4}, K_{≦7} ∩ Γ(v10) = {3,7}

Use of Matrix Multiplication
・ Check conditions (1) and (2) by matrix multiplication
(1) no vj, j<i, is adjacent to all vertices in K_{≦i} ∩ Γ(vi) ∪ {vi}
・ i-th row of the left matrix ⇒ K_{≦i} ∩ Γ(vi) ∪ {vi}
・ j-th column of the right matrix ⇒ Γ(vj)
・ (i,j) cell of the product ⇒ |(K_{≦i} ∩ Γ(vi) ∪ {vi}) ∩ Γ(vj)|; condition (1) fails for i if this equals |K_{≦i} ∩ Γ(vi) ∪ {vi}| for some j<i
・ Condition (2) can be checked in the same way
・ Checked in O(|V|^2.376) time ⇒ time complexity is O(|V|^2.376) for each maximal clique
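
The check of condition (1) for all i at once can be illustrated with an explicit matrix product. This is a hedged sketch: the function name and example graph are hypothetical, and the product below is the naive cubic one, so it shows only the encoding of the sets as 0/1 matrices, not the O(|V|^2.376) bound, which relies on fast matrix multiplication.

```python
def condition1_by_matrix(adj, K):
    """For each v_i not in K, report whether condition (1) holds."""
    order = sorted(adj)
    pos = {v: p for p, v in enumerate(order)}
    n = len(order)
    rows = [i for i in order if i not in K]
    # S_i = K_{<=i} ∩ Γ(v_i) ∪ {v_i}
    S = {i: {v for v in K if v <= i and v in adj[i]} | {i} for i in rows}
    # Left matrix: the row for v_i is the indicator vector of S_i.
    left = [[1 if u in S[i] else 0 for u in order] for i in rows]
    # Right matrix: column j is the indicator vector of Γ(v_j).
    right = [[1 if u in adj[j] else 0 for j in order] for u in order]
    # Naive product; entry (i, j) equals |S_i ∩ Γ(v_j)|.
    prod = [[sum(a * right[k][c] for k, a in enumerate(row)) for c in range(n)]
            for row in left]
    result = {}
    for r, i in enumerate(rows):
        # Condition (1) fails if some v_j, j < i, is adjacent to all of S_i.
        violated = any(prod[r][pos[j]] == len(S[i]) for j in order if j < i)
        result[i] = not violated
    return result


# Hypothetical example graph: triangle 1-2-3 plus the path 3-4-5.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
check = condition1_by_matrix(adj, {1, 2, 3})
```

For K = {1,2,3} on this graph, condition (1) holds for i = 4 and fails for i = 5, because v4 is adjacent to every vertex of {v5}.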

Sparse Cases
・ If vi is adjacent to no vertex in K ⇒ K[i] = C(K_{≦i} ∩ Γ(vi) ∪ {vi}) = C({vi}) ⇒ parent of K[i] = C(C({vi})_{≦i-1})
  If C({vi})_{≦i-1} = ∅, the parent of K[i] is K0 (K0 = C(∅), the root of the enumeration tree)
  If C({vi})_{≦i-1} ≠ ∅, (1) is not satisfied
  ⇒ If K ≠ K0, K[i] is not a child of K
・ Since |K| ≦ Δ+1, at most Δ(Δ+1) vertices are adjacent to K
・ For each K[i], constructing the parent takes O(Δ^2) time
⇒ O(Δ^4) per maximal clique (Δ: max. degree)
⇒ O((Δ*)^4 + |Θ|^3) if partially dense (Δ*: max. degree in V \ Θ)

Bipartite Clique
・ Enumerate maximal bipartite cliques in G = (V1 ∪ V2, E) (= maximal cliques in G' = (V1 ∪ V2, E ∪ V1×V1 ∪ V2×V2)) ⇒ enumerated in O(|V|^2.376) time for each
・ But G' is dense even when the bipartite graph G is sparse ⇒ some improvements are needed for sparse cases
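
To see why the reduction hurts on sparse inputs, the construction of G' can be sketched as follows. The helper names and the example (a perfect matching on 50 + 50 vertices) are hypothetical and chosen only to make the edge counts easy to verify.

```python
from itertools import combinations


def to_clique_graph(V1, V2, E):
    """Build G' = (V1 ∪ V2, E ∪ V1×V1 ∪ V2×V2) as an adjacency dict."""
    adj = {v: set() for v in list(V1) + list(V2)}

    def link(u, v):
        adj[u].add(v)
        adj[v].add(u)

    for u, v in E:
        link(u, v)
    # Make each side of the bipartition a clique.
    for side in (V1, V2):
        for u, v in combinations(side, 2):
            link(u, v)
    return adj


def edge_count(adj):
    return sum(len(nbrs) for nbrs in adj.values()) // 2


# Hypothetical sparse input: a perfect matching between 50 + 50 vertices.
V1 = list(range(1, 51))
V2 = list(range(51, 101))
E = [(i, i + 50) for i in V1]
g_prime = to_clique_graph(V1, V2, E)
```

Here G has 50 edges while G' has 50 + 2·1225 = 2500, so degree-based bounds on G' say little about the sparse original.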

Fast Construction of K[i]
・ For any maximal bipartite clique K:
  K ∩ V2 = ∩_{v ∈ K∩V1} Γ(v)
  K ∩ V1 = ∩_{v ∈ K∩V2} Γ(v)
・ K[i] ∩ V1 for all i are computed in O(Δ^2) time
・ K[i] for all i are computed in O(Δ^3) time
[Figure: bipartite graph on V1 and V2 showing Γ(1), Γ(2), Γ(3), Γ(4) and the sets K[i]]
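
The two identities above mean that either side of a maximal bipartite clique determines the other by intersecting neighborhoods. A minimal sketch, with hypothetical function names and a hypothetical five-vertex example, that extends a vertex set X ⊆ V1 to the maximal bipartite clique containing it:

```python
def common_neighbors(adj, vertices, universe):
    """Intersection of Γ(v) over the given vertices (the whole universe if none)."""
    result = set(universe)
    for v in vertices:
        result &= adj[v]
    return result


def maximal_biclique_from(adj, V1, V2, X):
    """Extend X ⊆ V1 to a maximal bipartite clique, returned as (K∩V1, K∩V2)."""
    side2 = common_neighbors(adj, X, V2)      # K ∩ V2 = ∩ Γ(v) over v in X
    side1 = common_neighbors(adj, side2, V1)  # K ∩ V1 = ∩ Γ(v) over v in K ∩ V2
    return side1, side2


# Hypothetical example: V1 = {1, 2, 3}, V2 = {4, 5}, edges 1-4, 2-4, 2-5, 3-5.
V1, V2 = {1, 2, 3}, {4, 5}
adj = {1: {4}, 2: {4, 5}, 3: {5}, 4: {1, 2}, 5: {2, 3}}
from_1 = maximal_biclique_from(adj, V1, V2, {1})
from_2 = maximal_biclique_from(adj, V1, V2, {2})
```

Starting from {1} gives ({1,2}, {4}) and starting from {2} gives ({2}, {4,5}); in both results each side is exactly the common neighborhood of the other.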

Checking the Parent
・ Put small indices to V1 and large indices to V2: V1 = {1, 2, 3, …, |V1|-1, |V1|}, V2 = {|V1|+1, |V1|+2, …}
⇒ K[i] is a child of K ⇔ K[i]_{≦i} = K_{≦i} ⇒ checked in O(Δ) time
⇒ Enumerated in O(Δ^3) time for each; O(Δ^2) by using memory

Computational Experiments
・ Randomly generated graphs: vertex vi is connected to each of the vertices from i-r to i+r with probability 1/2
・ Faster than Tsukiyama's algorithm
・ Computation time is linear in the maximum degree
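
A generator for this kind of test instance might look like the sketch below. It is an assumption-laden illustration: the slides do not say how boundary vertices or random seeds were handled, so here indices outside 1..n are simply skipped and a fixed seed is used for reproducibility; the function name is hypothetical.

```python
import random


def banded_random_graph(n, r, seed=0):
    """Connect v_i and v_j with probability 1/2 whenever 0 < |i - j| <= r."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(1, n + 1)}
    for i in range(1, n + 1):
        # Each unordered pair (i, j) within distance r is decided once.
        for j in range(i + 1, min(i + r, n) + 1):
            if rng.random() < 0.5:
                adj[i].add(j)
                adj[j].add(i)
    return adj


graph = banded_random_graph(n=30, r=3)
```

The parameter r bounds the maximum degree by 2r, which makes it a convenient knob for studying running time against Δ.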

Benchmark Problems
・ Problem of finding frequent closed item sets from a database ⇒ equivalent to maximal bipartite clique enumeration
・ Used in the KDD Cup (data mining algorithm competition):
  BMS-WebView1 (from Web-log data): |V| = 60,000, ave. degree 2.5
  BMS-WebView2 (from Web-log data): |V| = 80,000, ave. degree 5
  BMS-POS (from POS data): |V| = 510,000, ave. degree 6
  IBM-Artificial (artificial data): |V| = 100,000, ave. degree 10

Results

Conclusion and Future Work
・ Proposed fast algorithms for enumerating
  maximal cliques: O(|V|^2.376), O(Δ^4), O((Δ*)^4 + θ^3)
  maximal bipartite cliques: O(|V|^2.376), O(Δ^3), O(Δ^2)
・ Examined benchmark problems of data mining and showed that our algorithm performs well
Future work:
・ Can we improve more? What is the difficulty?
・ Can we enumerate other maximal (minimal) graph objects?
・ Can we apply matrix multiplication to other enumeration problems?
・ What can be enumerated efficiently in practice?

Frequent Sets
・ Input graph: an item and a customer are connected iff the customer purchased the item
・ In a maximal bipartite clique:
  Customers: have similar favorites
  Items: frequently purchased together
[Agrawal et al. 96, Zaki et al. 02, Pei 00, Han 00, …]
[Figure: bipartite graph between customer1, customer2, customer3, customer4 and the items beer, nappy, milk]

Few Large Degree Vertices
・ Very few vertices (denoted by Θ) have large degrees; the others have small degree < Δ'
・ Divide the maximal cliques into two groups: (a) cliques not included in Θ, (b) cliques included in Θ
・ (a) can be enumerated in O(Δ'^4) time
・ A maximal clique K in the graph induced by Θ is a maximal clique of G ⇔ K is not included in any clique of (a) ⇒ O(|Θ|^3) time for each
⇒ O(Δ'^4 + |Θ|^3) per maximal clique

Avoid Duplications by Using Memory
・ We can avoid duplications by storing all maximal bipartite cliques
・ From K ∩ V1 = Γ(K ∩ V2), we store all K ∩ V1
1. Get an unprocessed K from memory
2. Generate all K[i] ∩ V1
3. Store each K[i] ∩ V1 if it is not in memory
4. Go to 1 if an unprocessed maximal clique remains
⇒ Enumerated in O(Δ^2) time for each
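
The four steps above can be sketched with a set of stored V1-sides and a work queue. This is a simplified variant under a stated assumption rather than the slides' exact rule: step 2 is realized by intersecting the current V1-side with Γ(v) for each v ∈ V2, which always yields the V1-side of some maximal bipartite clique. The function name and example graph are hypothetical, and sides may be empty under this convention.

```python
from collections import deque


def enumerate_v1_sides(adj, V1, V2):
    """Collect K ∩ V1 for every maximal bipartite clique K, using memory to avoid duplicates."""
    start = frozenset(V1)  # common neighborhood of the empty subset of V2
    stored = {start}
    queue = deque([start])
    while queue:                      # step 4: repeat while something is unprocessed
        X = queue.popleft()           # step 1: take an unprocessed stored set
        for v in V2:                  # step 2 (assumed form): candidates X ∩ Γ(v)
            Y = X & adj[v]
            if Y not in stored:       # step 3: store only new sets
                stored.add(Y)
                queue.append(Y)
    return stored


# Hypothetical example: V1 = {1, 2, 3}, V2 = {4, 5}, edges 1-4, 2-4, 2-5, 3-5.
V1, V2 = {1, 2, 3}, {4, 5}
adj = {1: {4}, 2: {4, 5}, 3: {5}, 4: {1, 2}, 5: {2, 3}}
sides = enumerate_v1_sides(adj, V1, V2)
```

On this example the stored sides are {1,2,3}, {1,2}, {2,3} and {2}; the matching V2-sides follow from K ∩ V2 = ∩ Γ(v) over v ∈ K ∩ V1. The price of the simpler duplicate check is memory proportional to the number of cliques.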