An Efficient Algorithm for Enumerating Pseudo Cliques Dec/18/2007 ISAAC, Sendai Takeaki Uno National Institute of Informatics & The Graduate University.

Presentation transcript:

An Efficient Algorithm for Enumerating Pseudo Cliques Dec/18/2007 ISAAC, Sendai Takeaki Uno National Institute of Informatics & The Graduate University for Advanced Studies

Introducing Pseudo Cliques

Analyzing Large Scale Database
With the rapid growth of database sizes, we have to analyze databases computationally.
Finding cliques in similarity/relation graphs is a popular way to classify the data or to characterize it: a clique is a group of similar or related objects.
Thanks to good properties such as monotonicity, (maximal) cliques can be enumerated very quickly (up to 1,000,000/sec).
Now we are motivated to find richer objects: dense structures such as pseudo cliques.

Finding Cliques in Graph
Clique: a complete subgraph (a complete bipartite subgraph is a bipartite clique).
A clique is a group of similar or related objects, and is often used for finding clusters or groups.
Graphs in practice are usually sparse but locally dense, scale free, and satisfy the small world property.
Simple backtracking (branch-and-bound) works well because of the monotone property (polynomial time for each clique).
Practically very fast even for maximal cliques (up to 1,000,000/sec).
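The monotone property (every subset of a clique is a clique) is what makes simple backtracking work: a clique only ever needs to be extended by vertices adjacent to all of its members. The following is a minimal sketch of that idea, not the speaker's implementation. The adjacency format (a dict mapping each vertex to a set of neighbours) and the name `enumerate_cliques` are assumptions made for illustration.

```python
def enumerate_cliques(adj):
    """Yield every non-empty clique of the graph as a tuple of increasing vertices.

    adj: dict mapping each vertex to the set of its neighbours
         (an assumed input format, not taken from the slides).
    """
    def extend(clique, candidates):
        # Invariant: every candidate is larger than all vertices of `clique`
        # and adjacent to all of them, so clique + (v,) is again a clique.
        for v in sorted(candidates):
            new_clique = clique + (v,)
            yield new_clique
            # Monotonicity: only common neighbours can extend the clique further.
            yield from extend(
                new_clique,
                {u for u in candidates if u > v and u in adj[v]},
            )

    yield from extend((), set(adj))
```

Because vertices are only added in increasing order, each clique is produced exactly once. This sketch lists all cliques, not only the maximal ones.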

Def. Pseudo Clique
For a vertex set K, the density of K is
density(K) = (#edges connecting vertices in K) / (|K|(|K|-1)/2),
where the denominator is the maximum possible number of edges in K. Equivalently, the density is the average ratio of vertices adjacent to a vertex.
- K is a clique ⇔ density is 1
- K is an independent set ⇔ density is 0
If the density is high, K is nearly a clique.
Pseudo clique: for a given threshold θ, K is a pseudo clique ⇔ (density of K) ≥ θ.
We want to solve the problem of enumerating all pseudo cliques of the given graph.
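The definition translates directly into code. The sketch below assumes the graph is a dict mapping each vertex to its set of neighbours; the names `density` and `is_pseudo_clique` are illustrative. The slide does not define the density of sets with fewer than two vertices, so treating them as density 1 is an assumption of this sketch.

```python
from itertools import combinations

def density(adj, K):
    """Density of vertex set K: #edges inside K divided by |K|(|K|-1)/2."""
    K = set(K)
    k = len(K)
    if k < 2:
        # Assumption: the slide leaves this case undefined; we use 1.0.
        return 1.0
    edges = sum(1 for u, v in combinations(K, 2) if v in adj[u])
    return edges / (k * (k - 1) / 2)

def is_pseudo_clique(adj, K, theta):
    """True if the density of K is at least the threshold theta."""
    return density(adj, K) >= theta
```

For a triangle the density is 1, and for two non-adjacent vertices it is 0, matching the two extreme cases in the definition.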

Existing Results
Easy to find one pseudo clique: two connected vertices always form a pseudo clique.
Finding a pseudo clique of size k is NP-complete: reduce from the k-clique problem by setting θ = 1.
Approximation algorithms for maximizing the density for size k:
- O(|V|^(1/3-ε)) approximation algorithm
- O((n/k)^ε) approximation if the optimal solution is dense [Tokuyama et al.]
- PTAS if there are Ω(n^2) edges [Arora et al.]
Many heuristic algorithms exist in data mining, data engineering, and the natural sciences.
However, there is no algorithm for "complete" enumeration.

Hardness for Branch-and-Bound
A straightforward approach is branch and bound.
In each iteration, divide the problem into two non-empty subproblems by the inclusion of a vertex (v1, v2, ...).
However, checking the existence of a pseudo clique in a subproblem is NP-complete.

Proof of the Hardness
Theorem 1: For a given graph G, threshold θ, and vertex set U, the problem of checking the existence of a pseudo clique including U is NP-complete.
Proof: by reduction from the problem of finding a clique of k vertices.
- Take the input graph G = (V, E).
- Add 2|V|^2 vertices as U (density = (|V|^2 - 1)/|V|^2).
- Set θ = (|V|^2 - 1)/|V|^2 + ε.
- Only (U + clique) is a pseudo clique, and the density increases as the pseudo clique size increases.
- Choose ε such that a clique of size at least k induces a pseudo clique.

Is This Really Hard?
We proved NP-hardness for "very dense graphs".
The hardness is unclear for graphs of middle density, so there is a possibility of polynomial time enumeration.
(Figure: the range of θ between 0 and 1, with regions marked "easy", "hard", and "?".)

Polynomial Time Enumeration

Reverse Search Approach
Introduce an acyclic parent-child relation on all pseudo cliques (the objects).
Enumerate by traversing the tree induced by the relation.
This needs an algorithm for listing all children of a given object.

Parent of Pseudo Clique
v*(K): the minimum degree vertex in G[K] (ties broken by minimum index).
The parent of a pseudo clique K is K \ v*(K).
Density of K = (average degree in G[K]) / (|K|-1).
The parent is obtained by removing the most "sparse" vertex from K. Removing a vertex whose degree is at most the average does not decrease the density, so the parent is a pseudo clique.
The parent is smaller than its child, so the relation is acyclic.
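The parent rule can be sketched in a few lines. This is an illustration under assumptions: the graph is a dict mapping each vertex to its set of neighbours, vertices are comparable (so "minimum index" means the smallest vertex), and the names `min_degree_vertex` and `parent` are hypothetical.

```python
def min_degree_vertex(adj, K):
    """v*(K): the vertex of minimum degree in G[K], ties broken by smallest index."""
    K = set(K)
    return min(K, key=lambda v: (len(adj[v] & K), v))

def parent(adj, K):
    """Parent of K in the enumeration tree: K with v*(K) removed."""
    K = set(K)
    return K - {min_degree_vertex(adj, K)}
```

Applying `parent` repeatedly shrinks any pseudo clique by one vertex per step, which is why the relation cannot contain a cycle.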

Ex. Enumeration Tree (figure: an example enumeration tree for a given threshold)

Finding Children
A child is obtained by adding a vertex to the parent.
deg_K(v): the number of vertices in K adjacent to v (can be maintained in O(Δ) time for each vertex addition).
K ∪ v is a child of K ⇔
① K ∪ v is a pseudo clique, which gives a lower bound for deg_K(v), and
② v*(K ∪ v) = v, which gives an upper bound for deg_K(v).
- deg_K(v) < min. deg. of K: ② always holds, so K ∪ v is a child whenever ① holds
- deg_K(v) > min. deg. of K + 1: ② never holds, so K ∪ v is never a child
- deg_K(v) = min. deg. of K, or min. deg. of K + 1: decided by the detailed condition on the next slide

Detailed Condition
S(K): the sequence of vertices in K sorted in the order of (degree, index). The top of S(K) is v*(K).
K ∪ v is a child of K ⇔ v is the top of S(K ∪ v).
K ∪ v is a child only if v is adjacent to all vertices preceding v in S(K).
For each vertex, find the first "non-adjacent vertex" in S(K). This can be done in O(Δ^2) time.
Computation time for one iteration is O(Δ^2 + log |V|) (O(Δk + log |V|) if the graph is k-degenerate).
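Putting the parent rule and the child conditions together gives a reverse search enumeration. The sketch below is deliberately naive: it tests every vertex outside K against both conditions directly, so it does not achieve the O(Δ^2 + log |V|) bound per iteration stated above and is not the speaker's implementation. Assumptions for illustration: the graph is a dict mapping each vertex to its set of neighbours, vertices are comparable, the empty set is the root of the tree, single vertices count as pseudo cliques, and θ > 0.

```python
def enumerate_pseudo_cliques(adj, theta):
    """Yield every non-empty pseudo clique (as a frozenset) exactly once."""
    def edges_in(K):
        # Each edge inside K is counted from both endpoints.
        return sum(len(adj[v] & K) for v in K) // 2

    def is_child(K, m, v):
        # K: parent set, m: #edges inside K, v: candidate vertex outside K.
        k = len(K) + 1
        deg_v = len(adj[v] & K)
        # Condition 1: K ∪ {v} is a pseudo clique, i.e. (m + deg_v) / (k(k-1)/2) >= theta.
        if k >= 2 and 2 * (m + deg_v) < theta * k * (k - 1):
            return False
        # Condition 2: v is the (min degree, min index) vertex of G[K ∪ {v}].
        for u in K:
            deg_u = len(adj[u] & K) + (1 if v in adj[u] else 0)
            if (deg_u, u) < (deg_v, v):
                return False
        return True

    def visit(K):
        m = edges_in(K)
        for v in adj:
            if v not in K and is_child(K, m, v):
                child = K | {v}
                yield child
                yield from visit(child)

    yield from visit(frozenset())
```

Since every pseudo clique has exactly one parent, the traversal needs no table of already reported sets. The recursion depth equals the size of the largest pseudo clique, so very large outputs may need an explicit stack in Python.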

Computational Experiments

Implementation
The code is a simple version:
- Update deg_K(v_i) at each addition: adding u to K takes O(deg(u)) time.
- To find children, look for vertices v_i satisfying θ|K|(|K|+1)/2 - (#edges in K) ≤ deg_K(v_i) ≤ d*(K)+1, where d*(K) is the minimum degree in G[K].
This takes O(C d*(K)) = O(|E|) time, plus O(1) time for each of the other candidates, where C := the number of vertices v_i with deg_K(v_i) = d*(K) or d*(K)+1.
C seems to be not large relative to the number of children.
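The degree window used to find children follows from the density definition: K ∪ v has #edges in K plus deg_K(v) edges among |K|+1 vertices, so it is a pseudo clique only if deg_K(v) ≥ θ|K|(|K|+1)/2 - (#edges in K). The sketch below computes that window and filters candidates with it. It is an illustration, not the C code used in the experiments. The adjacency format and the names `degree_bounds` and `candidates` are assumptions, K must be non-empty, and the floating-point `ceil` may be off by one when the bound is an exact integer.

```python
import math

def degree_bounds(adj, K, theta):
    """Return (lower, upper) bounds on deg_K(v) for K ∪ {v} to be a child of K."""
    K = set(K)
    k = len(K)
    degs = {u: len(adj[u] & K) for u in K}
    m = sum(degs.values()) // 2  # number of edges inside K
    # Lower bound: K ∪ {v} must reach density theta.
    lower = max(math.ceil(theta * k * (k + 1) / 2 - m), 0)
    # Upper bound: v must be able to be the minimum degree vertex of K ∪ {v}.
    upper = min(degs.values()) + 1
    return lower, upper

def candidates(adj, K, theta):
    """Vertices outside K whose deg_K lies within the window (necessary, not sufficient)."""
    K = set(K)
    lower, upper = degree_bounds(adj, K, theta)
    return sorted(
        v for v in adj
        if v not in K and lower <= len(adj[v] & K) <= upper
    )
```

Vertices at the top of the window (degree equal to the minimum degree of K, or one more) still need the tie-breaking check on (degree, index) before they can be reported as children.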

Problem Instances
Environment: Pentium M 1.1GHz, 256MB memory, Cygwin, C, gcc.
Test instances are:
- random graphs (make each edge with probability p),
- locally dense random graphs (vertex i is adjacent to each of the vertices from i-k to i+k with probability 1/2),
- graphs generated from real-world data (co-author graph).
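The two synthetic instance families can be generated along the following lines. This is a sketch in Python for illustration only (the experiments themselves used C), and the function names, the `seed` parameter, and the dict-of-sets output format are assumptions.

```python
import random

def random_graph(n, p, seed=None):
    """Random graph on vertices 0..n-1: each possible edge is made with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def locally_dense_graph(n, k, seed=None):
    """Vertex i is adjacent to each vertex from i-k to i+k with probability 1/2."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        # Only look forward; the pair (u, v) with v < u was decided earlier.
        for v in range(u + 1, min(u + k, n - 1) + 1):
            if rng.random() < 0.5:
                adj[u].add(v)
                adj[v].add(u)
    return adj
```

Passing a fixed `seed` makes a run reproducible, which helps when comparing timings across thresholds.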

Random Graphs
p = 0.1, #vertices = 200 to 2000, threshold 0.8 and 0.9.
Computation time increases linearly with the average degree.

Locally Dense Random Graph
Make an edge from a vertex to each of its neighbors with p = 0.5.
#vertices = 100 to 25600, threshold 0.8.
10 times slower than clique enumeration.
Computation time per clique does not change.

Randomly Generated Scale Free Graph
Add vertices of degree 10 iteratively to a clique of 10 vertices.
Vertices to be connected are chosen according to their current degrees.
Computation time increases quite slowly.
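A generator of this kind can be sketched as follows. The slide only says that targets are chosen "according to their current degrees"; choosing them with probability proportional to degree (preferential attachment) is an assumption of this sketch, as are the function name, the parameters, and the requirement d ≥ 2.

```python
import random

def scale_free_graph(n, d=10, seed=None):
    """Start from a clique of d vertices, then add vertices of degree d one at a time.

    Each new vertex connects to d distinct existing vertices, picked with
    probability proportional to their current degree (assumed rule). Requires d >= 2.
    """
    rng = random.Random(seed)
    # Initial clique on vertices 0..d-1.
    adj = {v: set(range(d)) - {v} for v in range(d)}
    # Each vertex appears in `endpoints` once per incident edge, so a uniform
    # choice from this list is a degree-proportional choice of vertex.
    endpoints = [v for v in range(d) for _ in range(d - 1)]
    for new in range(d, n):
        targets = set()
        while len(targets) < d:
            targets.add(rng.choice(endpoints))
        adj[new] = set(targets)
        for t in targets:
            adj[t].add(new)
            endpoints.extend((t, new))
    return adj
```

With d = 10 this follows the setup on the slide: a clique of 10 vertices to which vertices of degree 10 are added iteratively.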

Real-world Instance
Co-author graph of an academic paper database: #vertices = 30,000, #edges = 125,000, scale free.
Computation time for one pseudo clique does not depend on the threshold.

Bottom-wideness
Why is it good in practice?
The algorithm generates several recursive calls, so the recursion tree expands exponentially going down, and the computation time is dominated by the lowest levels.
On lower levels, small degree vertices are added, which is fast.
When pseudo cliques are sufficiently large (over 5?), the minimum degree is small on average, so the computation time is short on average at lower levels.
(Figure: recursion tree; long time per iteration at the upper levels, short time at the lower levels.)

Conclusion
- First polynomial delay, polynomial space algorithm for enumerating pseudo cliques
- Hardness result for straightforward branch-and-bound
- Evaluated practical efficiency by computational experiments
Future work:
- Explain the gap between theory and practice
- Introduce maximality and enumerate maximal pseudo cliques
- Apply the technique to other structures (pseudo bla bla bla): path, tree, bipartite clique, matching, ...
- What is crucial for the computation (enumeration) of structures with ambiguity?