1
The Community-search Problem and How to Plan a Successful Cocktail Party
Mauro Sozio and Aristides Gionis
Presented by: Raghu Rangan, Jialiang Bao, Ge Wang
2
Introduction
Graphs are one of the most popular data representations and have a wide range of applications. Communities and social networks modeled as graphs have gained attention: people are represented as nodes, and connections between people are edges. This paper focuses on the query-dependent variant of the community-search problem.
3
Planning a Cocktail Party
Participants should be “close” to the organizers (e.g. a friend of a friend). Everybody should know some of the participants. The graph should be connected. The number of participants should be neither too small nor too large. Satisfying all of these requirements at once is difficult.
[Figure: example graph with nodes Bob, Alice, Charlie, and David]
4
Community Search Problem
The goal is to find the community to which a given set of users belongs. Given a graph and a set of query nodes, find a densely connected subgraph that contains all of the query nodes.
5
Related Work: Connectivity Subgraphs and Community Detection
Connectivity subgraphs: work has been done on finding a subgraph that connects a set of query nodes. This is not enough; we need to extract the best community that the query nodes define.
Community detection: finding communities in large graphs and social networks. The typical approach optimizes a modularity measure. The problem is that most methods consider the static community-detection problem.
6
Related Work: Team Formation
Lappas et al. studied this problem: given a network whose nodes are labeled with sets of skills, find a subgraph in which all skills are present and the communication cost is small. A variant of this problem arises in cocktail-party planning.
7
Problem definition
Problem 1: Given an undirected, connected graph G = (V, E), a set of query nodes Q, and a goodness function f, find the densest subgraph H = (V_H, E_H) of G such that:
V_H contains Q (all query nodes must be included);
H is connected;
f(H) is maximized among all feasible choices of H (the larger the better).
8
Query nodes and goodness function
What are the query nodes? They are the input nodes Q that the returned community must contain. What is the goodness function? It measures how dense a subgraph is. Two candidates are the average degree and the minimum degree.
9
Why not use the average degree as the goodness function?
It leads to unintuitive results: an unrelated but dense part of the graph is easily added to the answer, because it raises the average degree.
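The effect can be checked on a toy graph. The sketch below is illustrative only: the graph, the node names, and the helper functions `make_graph`, `average_degree`, and `minimum_degree` are assumptions made for this example and do not come from the paper. A 4-clique around the query node `q` is joined, through one low-degree bridge node, to an unrelated 6-clique.

```python
def make_graph(edges):
    """Build an undirected graph as {node: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def average_degree(adj):
    """Average degree, i.e. 2|E| / |V|."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj) if adj else 0.0

def minimum_degree(adj):
    """Smallest degree over all nodes."""
    return min(len(nbrs) for nbrs in adj.values()) if adj else 0

def clique(nodes):
    """All pairwise edges among the given nodes."""
    return [(u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:]]

community = clique(["q", "a", "b", "c"])                # the intended community
unrelated = clique(["c1", "c2", "c3", "c4", "c5", "c6"])  # dense but unrelated
bridge = [("c", "x"), ("x", "c1")]                       # weak link between them

small = make_graph(community)
large = make_graph(community + unrelated + bridge)

print(average_degree(small), minimum_degree(small))
print(average_degree(large), minimum_degree(large))
```

On this example the average degree rises when the unrelated clique is included, so an average-degree objective prefers the larger subgraph, while the minimum degree drops because of the bridge node, so a minimum-degree objective prefers the 4-clique.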
10
Problem definition
Problem 2 adds a distance constraint. Given an undirected, connected graph G = (V, E), a set of query nodes Q, a goodness function f, and a distance bound d, find the densest subgraph H = (V_H, E_H) of G such that:
V_H contains Q (all query nodes must be included);
H is connected;
D_Q(H) ≤ d (the distance constraint with respect to the query nodes);
f(H) is maximized among all feasible choices of H (the larger the better).
11
Maximizing the minimum degree
Greedy algorithm
Steps: (1) Set G_0 = G. (2) Delete a minimum-degree node and all its edges. (3) Go to step 2.
Termination condition: stop as soon as either at least one of the query nodes in Q has minimum degree, or the query nodes in Q are no longer connected.
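The peeling procedure above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the graph is assumed to be a dict mapping each node to a set of neighbors, and `query` is assumed to be non-empty. The slide does not say which intermediate graph is returned; the sketch assumes the answer is the connected component containing the query nodes with the largest minimum degree seen during peeling. It favors clarity over speed and is not linear-time.

```python
from collections import deque

def component_containing(adj, start):
    """Nodes reachable from `start` (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen

def greedy_min_degree(adj, query):
    """Peel minimum-degree nodes; return (best node set, its minimum degree)."""
    work = {u: set(nbrs) for u, nbrs in adj.items()}  # private copy
    query = set(query)
    best_nodes, best_score = None, -1
    while True:
        comp = component_containing(work, next(iter(query)))
        if not query <= comp:
            break  # termination: the query nodes are no longer connected
        score = min(len(work[u]) for u in comp)
        if score > best_score:  # assumption: keep the best candidate seen
            best_nodes, best_score = set(comp), score
        min_deg = min(len(nbrs) for nbrs in work.values())
        if any(len(work[q]) == min_deg for q in query):
            break  # termination: a query node has minimum degree
        victim = next(u for u in work if len(work[u]) == min_deg)
        for w in work.pop(victim):  # delete the node and all its edges
            work[w].discard(victim)
    return best_nodes, best_score
```

For a 4-clique `a, b, c, d` with a tail `d-e-f` and query `{"a"}`, the sketch peels `f` and then `e`, and stops at the clique, whose minimum degree is 3.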
12
Time complexity
Greedy can be implemented in linear time. Idea: keep a separate list of nodes for each degree d, for d = 1, …, n. When a node u is removed from G, each neighbor of u with degree d is moved from list d to list d - 1. Each move corresponds to an edge, so the total number of moves is O(m), where m is the number of edges. A minimum-degree node can be located in O(1) amortized time, so the running time is O(n + m).
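A possible shape for the degree-list idea is sketched below. It is an assumption-laden illustration rather than the paper's code: the name `peel_with_buckets` is hypothetical, the graph is assumed to be simple (no self-loops or parallel edges) and given as a dict of neighbor sets, bucket 0 is included for isolated nodes, and the connectivity test on the query nodes is deliberately left out to keep the bucket mechanics visible. The minimum degree can fall by at most one per removal, which is why the bucket pointer only needs to step back by one.

```python
def peel_with_buckets(adj, query):
    """Remove minimum-degree nodes using one bucket per degree.

    Stops when a query node has minimum degree. Returns the removal order
    as a list of (node, degree at removal time).
    """
    query = set(query)
    n = len(adj)
    degree = {u: len(nbrs) for u, nbrs in adj.items()}
    buckets = [set() for _ in range(n)]   # degrees lie in 0..n-1
    query_count = [0] * n                 # query nodes per bucket
    for u, d in degree.items():
        buckets[d].add(u)
        if u in query:
            query_count[d] += 1
    removed, order = set(), []
    d = 0                                 # lower bound on the minimum degree
    while len(removed) < n:
        while not buckets[d]:             # advance to the first non-empty bucket
            d += 1
        if query_count[d] > 0:
            break                         # a query node has minimum degree
        u = buckets[d].pop()
        removed.add(u)
        order.append((u, d))
        for w in adj[u]:                  # one bucket move per remaining edge
            if w in removed:
                continue
            dw = degree[w]
            buckets[dw].remove(w)
            buckets[dw - 1].add(w)
            degree[w] = dw - 1
            if w in query:
                query_count[dw] -= 1
                query_count[dw - 1] += 1
        d = max(d - 1, 0)                 # minimum degree drops by at most one
    return order
```

Each edge triggers at most one bucket move and the pointer `d` advances at most O(n) times in total, which matches the O(n + m) bound stated above.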
13
Generalization to monotone functions
The minimum-degree function is one member of the family of node-monotone functions. Sometimes, however, we want other functions from this family to define the density at a node.
14
Problem definition
Problem 3 replaces the goodness function with a node-monotone function. Given an undirected, connected graph G = (V, E), a set of query nodes Q, a node-monotone function f, and a distance bound d, find the densest subgraph H = (V_H, E_H) of G such that:
V_H contains Q (all query nodes must be included);
H is connected;
D_Q(H) ≤ d;
f(H) is maximized among all feasible choices of H (the larger the better).
15
GreedyGen
Steps: Set G_0 = G. Instead of deleting the minimum-degree node, delete the node v for which f(G, v) is minimum, together with all its edges. Repeat the deletion step.
Termination condition: stop as soon as either at least one of the query nodes in Q attains the minimum f(G, v), or the query nodes in Q are no longer connected.
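The generalized peeling can be sketched by passing the node function as a parameter. Everything here is a hedged illustration: `greedy_gen`, `degree`, and `triangles` are hypothetical names, the triangle count is an example chosen for this sketch (it cannot increase when nodes are deleted) and is not taken from the slides, the distance constraint of Problem 3 is omitted, and the rule for which candidate to return is an assumption. The graph is a dict of neighbor sets and `query` is assumed non-empty.

```python
from collections import deque

def reachable(adj, start):
    """Nodes reachable from `start` (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen

def greedy_gen(adj, query, f):
    """Peel the node minimizing f(graph, node); return (nodes, score)."""
    work = {u: set(nbrs) for u, nbrs in adj.items()}
    query = set(query)
    best_nodes, best_score = None, None
    while True:
        comp = reachable(work, next(iter(query)))
        if not query <= comp:
            break  # the query nodes are no longer connected
        scores = {u: f(work, u) for u in work}
        score = min(scores[u] for u in comp)
        if best_score is None or score > best_score:
            best_nodes, best_score = set(comp), score
        low = min(scores.values())
        if any(scores[q] == low for q in query):
            break  # a query node attains the minimum of f
        victim = min(work, key=scores.get)
        for w in work.pop(victim):
            work[w].discard(victim)
    return best_nodes, best_score

def degree(adj, v):
    """Recovers the minimum-degree objective."""
    return len(adj[v])

def triangles(adj, v):
    """Number of triangles through v (illustrative monotone function)."""
    nbrs = adj[v]
    return sum(len(adj[u] & nbrs) for u in nbrs) // 2
```

Calling `greedy_gen(adj, query, degree)` behaves like the minimum-degree Greedy, while `greedy_gen(adj, query, triangles)` swaps in a different notion of density without changing the loop.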
16
Communities with Size Restriction
Drawback of the previous algorithms: they may return very large subgraphs.
17
Complexity
Formal definition: maximize the minimum degree subject to an upper bound on the size. An integer k is given as the size constraint, and the subgraph H must have at most k nodes. This problem is NP-hard.
18
Algorithm
Two heuristics can be used to find communities of bounded size, both inspired by the Greedy algorithm for maximizing the minimum degree: GreedyDist and GreedyFast.
19
Algorithm: GreedyDist
The tighter the distance constraint, the smaller the communities.
20
Algorithm: GreedyDist
Invoke GreedyGen. If the query nodes are connected but the size constraint is not satisfied, re-execute GreedyGen with a tighter distance constraint. Repeat until the size constraint is satisfied or the query nodes are disconnected.
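The control loop of GreedyDist can be sketched independently of the inner solver. The sketch makes several assumptions that are not in the slides: "tighter" is taken to mean decreasing an integer bound d by one, and `solve_within` is only a stand-in for GreedyGen that keeps the nodes within d hops of every query node. All names are hypothetical.

```python
from collections import deque

def hops_from(adj, source):
    """Hop distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def solve_within(adj, query, d):
    """Stand-in for GreedyGen (NOT the paper's algorithm).

    Keeps nodes within d hops of every query node and returns the connected
    component holding the query nodes, or None if they are disconnected.
    """
    dists = [hops_from(adj, q) for q in query]
    keep = {u for u in adj if all(u in dist and dist[u] <= d for dist in dists)}
    if not set(query) <= keep:
        return None
    sub = {u: adj[u] & keep for u in keep}
    reach = set(hops_from(sub, next(iter(query))))
    return reach if set(query) <= reach else None

def greedy_dist(solve, adj, query, d, k):
    """Re-run `solve` with a tighter distance bound until at most k nodes remain."""
    while d >= 0:
        nodes = solve(adj, query, d)
        if nodes is None:
            return None        # the query nodes became disconnected
        if len(nodes) <= k:
            return nodes       # the size constraint is satisfied
        d -= 1                 # assumption: tighten the bound by one hop
    return None
```

Keeping the solver as a parameter means the same loop can wrap an actual GreedyGen implementation once one is available.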
21
Algorithm: GreedyFast
Preprocessing: the input graph is restricted to the k’ closest nodes to the query nodes. Greedy is then executed on the restricted graph.
Rationale: the closer a node is to the query nodes, the more related it is to them and the more likely it is to belong to their community.
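The preprocessing step can be sketched with a multi-source breadth-first search. This is an illustration under assumptions: closeness is measured here as hop distance to the nearest query node, which may differ from the distance measure used in the paper, ties are broken by visiting order, and `restrict_to_closest` and `k_prime` are hypothetical names. The graph is a dict of neighbor sets, and `k_prime` should be at least the number of query nodes so that none of them is dropped.

```python
from collections import deque

def restrict_to_closest(adj, query, k_prime):
    """Induced subgraph on the k_prime nodes nearest (in hops) to the query nodes."""
    sources = list(dict.fromkeys(query))   # drop duplicates, keep order
    dist = {q: 0 for q in sources}
    queue = deque(sources)
    kept = []
    # Multi-source BFS visits nodes in non-decreasing distance from the query set.
    while queue and len(kept) < k_prime:
        u = queue.popleft()
        kept.append(u)
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    keep = set(kept)
    # Keep only the edges with both endpoints among the selected nodes.
    return {u: adj[u] & keep for u in keep}
```

The returned dictionary has the same shape as the input graph, so it can be handed directly to a Greedy implementation that expects a dict of neighbor sets.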
22
Experimental Evaluation
DBLP: a coauthorship graph extracted from a recent snapshot of the DBLP database (226K nodes, 1.4M edges).
Tag: a tag graph extracted from the Flickr photo-sharing portal (38K nodes, 1.3M edges).
BIOMINE: a graph extracted from the database of the Biomine project (16K nodes, 491K edges).
23
Quantitative Results
BASELINE: a simple and natural baseline algorithm
|Q|: the number of query nodes
d: distance bound
k: size bound
l: inter-distance between query nodes
24
Quantitative Results
26
Conclusion
The aim is to find a compact, densely connected community that contains the given query nodes.
Measurement is based on constraints: minimum degree, distance, and size.
Heuristics: GreedyGen, GreedyDist, and GreedyFast.
27
Questions?