The community-search problem and how to plan a successful cocktail party Mauro SozioAris Gionis Max Planck Institute, Germany Yahoo! Research, Barcelona.

Slides:



Advertisements
Similar presentations
Combinatorial Auction
Advertisements

Network Design with Degree Constraints Guy Kortsarz Joint work with Rohit Khandekar and Zeev Nutov.
Minimum Clique Partition Problem with Constrained Weight for Interval Graphs Jianping Li Department of Mathematics Yunnan University Jointed by M.X. Chen.
Charalampos (Babis) E. Tsourakakis KDD 2013 KDD'131.
Approximation, Chance and Networks Lecture Notes BISS 2005, Bertinoro March Alessandro Panconesi University La Sapienza of Rome.
Mauro Sozio and Aristides Gionis Presented By:
1 NP-Complete Problems. 2 We discuss some hard problems:  how hard? (computational complexity)  what makes them hard?  any solutions? Definitions 
Maximizing the Spread of Influence through a Social Network
Online Social Networks and Media. Graph partitioning The general problem – Input: a graph G=(V,E) edge (u,v) denotes similarity between u and v weighted.
Approximation Algorithms Chapter 5: k-center. Overview n Main issue: Parametric pruning –Technique for approximation algorithms n 2-approx. algorithm.
Combinatorial Algorithms
1 Efficient Subgraph Search over Large Uncertain Graphs Ye Yuan 1, Guoren Wang 1, Haixun Wang 2, Lei Chen 3 1. Northeastern University, China 2. Microsoft.
Frequent Subgraph Pattern Mining on Uncertain Graph Data
CSC5160 Topics in Algorithms Tutorial 2 Introduction to NP-Complete Problems Feb Jerry Le
Approximating Maximum Edge Coloring in Multigraphs
Computability and Complexity 23-1 Computability and Complexity Andrei Bulatov Search and Optimization.
Introduction to Approximation Algorithms Lecture 12: Mar 1.
Hardness Results for Problems P: Class of “easy to solve” problems Absolute hardness results Relative hardness results –Reduction technique.
CPSC 689: Discrete Algorithms for Mobile and Wireless Systems Spring 2009 Prof. Jennifer Welch.
Recent Development on Elimination Ordering Group 1.
A general approximation technique for constrained forest problems Michael X. Goemans & David P. Williamson Presented by: Yonatan Elhanani & Yuval Cohen.
An Approximation Algorithm for Requirement cut on graphs Viswanath Nagarajan Joint work with R. Ravi.
Near-Optimal Network Design with Selfish Agents By Elliot Anshelevich, Anirban Dasgupta, Eva Tardos, Tom Wexler STOC’03 Presented by Mustafa Suleyman CIFTCI.
2-Layer Crossing Minimisation Johan van Rooij. Overview Problem definitions NP-Hardness proof Heuristics & Performance Practical Computation One layer:
Steiner trees Algorithms and Networks. Steiner Trees2 Today Steiner trees: what and why? NP-completeness Approximation algorithms Preprocessing.
Maximal Independent Set Distributed Algorithms for Multi-Agent Networks Instructor: K. Sinan YILDIRIM.
Approximation Algorithms Motivation and Definitions TSP Vertex Cover Scheduling.
Mining Graphs with Constrains on Symmetry and Diameter Natalia Vanetik Deutsche Telecom Laboratories at Ben-Gurion University IWGD10 workshop July 14th,
Hardness Results for Problems
Approximating the MST Weight in Sublinear Time Bernard Chazelle (Princeton) Ronitt Rubinfeld (NEC) Luca Trevisan (U.C. Berkeley)
1 Introduction to Approximation Algorithms. 2 NP-completeness Do your best then.
CSV: Visualizing and Mining Cohesive Subgraphs Nan Wang Srinivasan Parthasarathy Kian-Lee Tan Anthony K. H. Tung School of Computing National University.
Fixed Parameter Complexity Algorithms and Networks.
Efficient Gathering of Correlated Data in Sensor Networks
Network Aware Resource Allocation in Distributed Clouds.
Mehdi Kargar Aijun An York University, Toronto, Canada Discovering Top-k Teams of Experts with/without a Leader in Social Networks.
Mehdi Kargar Aijun An York University, Toronto, Canada Keyword Search in Graphs: Finding r-cliques.
APPROXIMATION ALGORITHMS VERTEX COVER – MAX CUT PROBLEMS
Approximating Minimum Bounded Degree Spanning Tree (MBDST) Mohit Singh and Lap Chi Lau “Approximating Minimum Bounded DegreeApproximating Minimum Bounded.
1 Introduction to Approximation Algorithms. 2 NP-completeness Do your best then.
Graph limit theory: Algorithms László Lovász Eötvös Loránd University, Budapest May
Advanced Algorithm Design and Analysis (Lecture 13) SW5 fall 2004 Simonas Šaltenis E1-215b
Greedy Approximation Algorithms for finding Dense Components in a Graph Paper by Moses Charikar Presentation by Paul Horn.
Influence Maximization in Dynamic Social Networks Honglei Zhuang, Yihan Sun, Jie Tang, Jialin Zhang, Xiaoming Sun.
Approximation Algorithms
1 Steiner Tree Algorithms and Networks 2014/2015 Hans L. Bodlaender Johan M. M. van Rooij.
Week 10Complexity of Algorithms1 Hard Computational Problems Some computational problems are hard Despite a numerous attempts we do not know any efficient.
An Efficient Algorithm for Enumerating Pseudo Cliques Dec/18/2007 ISAAC, Sendai Takeaki Uno National Institute of Informatics & The Graduate University.
CSCI 3160 Design and Analysis of Algorithms Chengyu Lin.
Mehdi Kargar Aijun An York University, Toronto, Canada Keyword Search in Graphs: Finding r-cliques.
Randomized Composable Core-sets for Submodular Maximization Morteza Zadimoghaddam and Vahab Mirrokni Google Research New York.
Testing the independence number of hypergraphs
CSE 589 Part VI. Reading Skiena, Sections 5.5 and 6.8 CLR, chapter 37.
Fixed parameter algorithms for protein similarity search under mRNA structure constrains A joint work by: G. Blin, G. Fertin, D. Hermelin, and S. Vialette.
The full Steiner tree problem Theoretical Computer Science 306 (2003) C. L. Lu, C. Y. Tang, R. C. T. Lee Reporter: Cheng-Chung Li 2004/06/28.
Vasilis Syrgkanis Cornell University
Network Partition –Finding modules of the network. Graph Clustering –Partition graphs according to the connectivity. –Nodes within a cluster is highly.
Lecture. Today Problem set 9 out (due next Thursday) Topics: –Complexity Theory –Optimization versus Decision Problems –P and NP –Efficient Verification.
NOTE: To change the image on this slide, select the picture and delete it. Then click the Pictures icon in the placeholder to insert your own image. Fast.
Confidential & Proprietary – All Rights Reserved Internal Distribution, October Quality of Service in Multimedia Distribution G. Calinescu (Illinois.
Cohesive Subgraph Computation over Large Graphs
Finding Dense and Connected Subgraphs in Dual Networks
Approximating the MST Weight in Sublinear Time
CS4234 Optimiz(s)ation Algorithms
ICS 353: Design and Analysis of Algorithms
Bart M. P. Jansen June 3rd 2016, Algorithms for Optimization Problems
Conflict-Aware Event-Participant Arrangement
Coverage Approximation Algorithms
Introduction Wireless Ad-Hoc Network
CSE 589 Applied Algorithms Spring 1999
Presentation transcript:

The community-search problem and how to plan a successful cocktail party Mauro SozioAris Gionis Max Planck Institute, Germany Yahoo! Research, Barcelona

SIGKDD 2010 July 27th, Washington D.C. Planning a cocktail party C. Papadimitriou P. Kanellakis S. Abiteboul

SIGKDD 2010 July 27th, Washington D.C. Planning a cocktail party C. Papadimitriou P. Kanellakis S. Abiteboul J. Ullman

SIGKDD 2010 July 27th, Washington D.C. Planning a cocktail party C. Papadimitriou P. Kanellakis S. Abiteboul J. Ullman Recipe for a successful party: Participants should be “close” to the organizers (e.g. a friend of a friend). Everybody should know some of the participants. The graph should be connected. The number of participants should not be too small but… …not too large either!!! …. Not an easy task…

SIGKDD 2010 July 27th, Washington D.C. The community-search problem Our problem: find the community that a given set of users belongs to. Our approach: Given a graph and a set of nodes, find a densely connected subgraph containing the set of users given in input.

SIGKDD 2010 July 27th, Washington D.C. The community-search problem Our problem: find the community that a given set of users belongs to. Our approach: Given a graph and a set of nodes, find a densely connected subgraph containing the set of users given in input.

SIGKDD 2010 July 27th, Washington D.C. The community-search problem Our problem: find the community that a given set of users belongs to. Our approach: Given a graph and a set of nodes, find a densely connected subgraph containing the set of users given in input.

SIGKDD 2010 July 27th, Washington D.C. The community-search problem Our problem: find the community that a given set of users belongs to. Our approach: Given a graph and a set of nodes, find a densely connected subgraph containing the set of users given in input. Other applications: Tag suggestions, biological data.

SIGKDD 2010 July 27th, Washington D.C. Tag suggestion in Flickr Sugg.: Mountains Nature Landscape Tags : Dolomites Lake

SIGKDD 2010 July 27th, Washington D.C. Protein interactions

SIGKDD 2010 July 27th, Washington D.C. Protein interactions

SIGKDD 2010 July 27th, Washington D.C. Outline Introduction Related work Problem Definition Our Algorithms Greedy GreedyDist and GreedyFast Evaluation Generalization to Monotone Functions Conclusions and Future Work

SIGKDD 2010 July 27th, Washington D.C. Related Work Large body of work on finding communities in social networks: Agarwal and Kempe (European Physics Journal, 2008) S. White and P. Smyth. (SDM, 2005) Y. Dourisboure et al. (WWW, 2007) D. Gibson, R. Kumar, and A. Tomkins (VLDB, 2005) Our work: Query-dependent variant of the problem. Other related work: Y. Koren, S. C. North, and C. Volinsky (TKDD, 2007) H. Tong and C. Faloutsos (KDD, 2006) Lappas et al. (KDD, 2009) FOCS, ICALP, APPROX

SIGKDD 2010 July 27th, Washington D.C. Outline Introduction Related work Problem Definition Our Algorithms Greedy GreedyDist and GreedyFast Evaluation Generalization to Monotone Functions Conclusions and Future Work

SIGKDD 2010 July 27th, Washington D.C. Density? Good properties: small distance, density, connected subgraph Two definitions of density of a graph d(G)=# of edges in G / # of edges in a clique Formally, D(G)=# of edges in G / # of vertices in G Formally, Fact 1: Computing a subgraph H with maximum density d(H) is NP- Hard (reduction from Max Clique). Fact 2: Computing a subgraph H with maximum density D(H) can be done in polynomial time but the algorithm is slow. Another definition of density: minimum node degree. average degree of G / 2

SIGKDD 2010 July 27th, Washington D.C. Density? Good properties: small distance, density, connected subgraph Two definitions of density of a graph d(G)=# of edges in G / # of edges in a clique Formally, D(G)=# of edges in G / # of vertices in G Formally, Fact 1: Computing a subgraph H with maximum density d(H) is NP- Hard (reduction from Max Clique). Fact 2: Computing a subgraph H with maximum density D(H) can be done in polynomial time but the algorithm is slow. Another definition of density: minimum node degree. average degree of G / 2

SIGKDD 2010 July 27th, Washington D.C. Problem definition Problem definition: Given an undirected graph G= (V,E), a set of query nodes Q V, an integer d (distance constraint), we are to find an induced subgraph H = (V H,E H ) of G, s.t. (i) V H contains Q; (ii) H is connected; (iii) all nodes in H are at distance at most d from Q; (iv) the minimum degree of H is maximized.

SIGKDD 2010 July 27th, Washington D.C. Problem definition Problem definition: Given an undirected graph G= (V,E), a set of query nodes Q V, an integer d (distance constraint), we are to find an induced subgraph H = (V H,E H ) of G, s.t. (i) V H contains Q; (ii) H is connected; (iii) all nodes in H are at distance at most d from Q; (iv) the minimum degree of H is maximized. Good news: There is an optimum greedy algorithm!!!

SIGKDD 2010 July 27th, Washington D.C. Outline Introduction Related work Problem Definition Our Algorithms Greedy GreedyDist and GreedyFast Evaluation Generalization to Monotone Functions Conclusions and Future Work

SIGKDD 2010 July 27th, Washington D.C. Our greedy algorithm 1. Let G=G At each step t if there is a node v in G t-1 violating the distance constraint, then remove v and all its edges; 3. otherwise remove the node with minimum degree in G t Let G t the graph so obtained. 5. Among all the graphs G 0,G 1,….G T constructed during the execution of the algorithm return the graph G i containing the query nodes; satisfying the distance constraint; with maximum minimum degree.

SIGKDD 2010 July 27th, Washington D.C. Our greedy algorithm 1. Let G=G At each step t if there is a node v in G t-1 violating the distance constraint, then remove v and all its edges; 3. otherwise remove the node with minimum degree in G t Let G t the graph so obtained. 5. Among all the graphs G 0,G 1,….G T constructed during the execution of the algorithm return the graph G i containing the query nodes; satisfying the distance constraint; with maximum minimum degree. Theorem: Our greedy algorithm computes an optimum solution for the community-search problem.

SIGKDD 2010 July 27th, Washington D.C. Size Matters! The size of the community shouldn’t be too large: If we are to organize a party we might not have place for 1M people. Humans should be able to analyze the result. Bad news: Adding a cardinality constraint on the number of nodes makes the problem NP-Hard (red. from Steiner Tree) but... Theorem: Let H and H’ be two graphs obtained by executing our greedy algorithm with distance constraint d and d’, respectively (the other input parameters are the same). Then, d’ ≤ d implies |V(H’)| ≤ |V(H)|.

SIGKDD 2010 July 27th, Washington D.C. GreedyDist Intuition: Bound the size of the graph by making the distance constraint tighter. GreedyDist: Let k be an upperbound on the number of vertices and let d be a distance constraint. While the number of vertices of the computed graph is larger than k Execute Greedy with distance constraint d. Decrease d by one (d--);

SIGKDD 2010 July 27th, Washington D.C. GreedyFast Intuition: Nodes that are far away from the query nodes are most probably not related to them. GreedyFast: Let k be an upperbound on the number of vertices and let d be a distance constraint. Preprocessing: consider only the k closest nodes to the query nodes. Run Greedy with the subgraph induced by these query nodes, as input

SIGKDD 2010 July 27th, Washington D.C. Outline Introduction Related work Problem Definition Our Algorithms Greedy GreedyDist and GreedyFast Evaluation Generalization to Monotone Functions Conclusions and Future Work

SIGKDD 2010 July 27th, Washington D.C. Evaluation We evaluate our algorithms on three different datasets: DBLP (226k nodes and 1.4M edges); Flickr tag graph (38k nodes and 1.3M edges); Bio data (16K nodes and 491k nodes). Queries are generated randomly. We vary Number of query nodes; Distance between query nodes; Upper bound on the number of nodes. We measure Minimum degree and average degree; Size of the output graph; Running time.

SIGKDD 2010 July 27th, Washington D.C. Baseline We consider an approach where at each step we add one node (in contrast with all previous approaches). A pseudocode: 1. Connect the query nodes: by means of a Steiner Tree algo. (we use a 2-approximation algorithm for this problem); 2. Let G t be the graph at step t; 3. Add the node v with maximum degree in ; 4. Among all the graph G 0,…,G T constructed, return the one with maximum minimum degree.

SIGKDD 2010 July 27th, Washington D.C. Minimum degree vs Size (Flickr)

SIGKDD 2010 July 27th, Washington D.C. Average deg. vs. Size (Flickr)

SIGKDD 2010 July 27th, Washington D.C. Running time vs Size (Flickr)

SIGKDD 2010 July 27th, Washington D.C. Distance vs Size Theorem: Let H and H’ be two graphs obtained by executing our greedy algorithm with distance constraint d and d’, respectively (the other input parameters are the same). Then, d’ ≤ d implies |V(H’)| ≤ |V(H)|.

SIGKDD 2010 July 27th, Washington D.C. Outline Introduction Related work Problem Definition Our Algorithms Greedy GreedyDist and GreedyFast Evaluation Generalization to Monotone Functions Conclusions and Future Work

SIGKDD 2010 July 27th, Washington D.C. Generalized Community-Search Problem Input: An undirected graph G=(V,E); A set Q of query nodes; Integer parameters k,t; A set of skills T v associated to every node v; A required set of skills. Goal: Find an induced subgraph H of G s.t. G is connected and contains Q; The number of vertices of H is ≥ t; The set of skills of H contains ( ); Any node is at distance at most k from the query nodes; The minimum degree is maximized.

SIGKDD 2010 July 27th, Washington D.C. Generalized Community-Search Problem Input: An undirected graph G=(V,E); A set Q of query nodes; Integer parameters k,t; A set of skills T v associated to every node v; A required set of skills. Goal: Find an induced subgraph H of G s.t. G is connected and contains Q; The number of vertices of H is ≥ t; The set of skills of H contains ( ); Any node is at distance at most k from the query nodes; The minimum degree is maximized. Monotone functions

SIGKDD 2010 July 27th, Washington D.C. Generalized Greedy: Guarantees Monotone function: f(H) ≤ f(G), if H is a subgraph of G. Theorem: There is an optimum greedy algorithm for the problem when all constraint are monotone functions. Running time: Depends on the time to evaluate the function f 1,…,f k, formally where T i is the time to evaluate the monotone function f i

SIGKDD 2010 July 27th, Washington D.C. Outline Introduction Related work Problem Definition Our Algorithms Greedy GreedyDist and GreedyFast Evaluation Generalization to Monotone Functions Conclusions and Future Work

SIGKDD 2010 July 27th, Washington D.C. Conclusions and Future Work Contributions: We proposed a novel combinatorial approach for finding the community of a given set of users in input. Distance constraints proved to be effective in limiting the size of the output graph. We defined a class of functions that can be optimized efficiently. Future work: Are there other useful monotone functions? Can we find all communities of a given set of users? Community search via Map-Reduce?

SIGKDD 2010 July 27th, Washington D.C. What about the party? Community 1: Database Community2: Algorithms

Thanks