

The Community-search Problem and How to Plan a Successful Cocktail Party Mauro Sozio and Aristides Gionis Presented By: Raghu Rangan, Jialiang Bao, Ge Wang

Introduction
Graphs are one of the most popular data representations and have a wide range of applications.
Communities and social networks modeled as graphs have gained attention: people are represented as nodes, and connections between people are edges.
This paper focuses on the query-dependent variant of the community-search problem.

Planning a Cocktail Party
Participants should be “close” to the organizers (e.g. a friend of a friend).
Everybody should know some of the participants.
The graph should be connected.
The number of participants should be neither too small nor too large.
Satisfying all of these at once is difficult.
[Figure: graph with nodes labeled Bob, Alice, Charlie, and David]

Community Search Problem
Goal: find the community that a given set of users belongs to.
Given a graph and a set of query nodes, find a densely connected subgraph that contains all of the query nodes.

Related Work
Connectivity subgraphs: prior work finds a subgraph that connects a set of query nodes. This is not enough, because we need to extract the best community that the query nodes define.
Community detection: finds communities in large graphs and social networks. The typical approach optimizes a modularity measure. The problem is that most methods consider the static community-detection problem rather than the query-dependent one.

Related Work
Team formation: Lappas et al. studied this problem. Given a network whose nodes are labeled with sets of skills, find a subgraph in which all skills are present and the communication cost is small.
A variant of this problem arises in cocktail-party planning.

Problem definition
Problem 1: Given an undirected, connected graph G = (V, E), a set of query nodes Q, and a goodness function f, find the densest subgraph H = (V_H, E_H) of G such that:
V_H contains Q (all query nodes must be included);
H is connected;
f(H) is maximized among all feasible choices of H (the larger the better).

Query nodes and goodness function
These are the two inputs of Problem 1 that still need explaining.
What is a query node? Query nodes are the input nodes that the returned community must contain.
What is a goodness function? It measures how densely connected a subgraph is. Two candidates are the average degree and the minimum degree.

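Why not choose the average degree?
It leads to unintuitive results: a dense part of the graph that is unrelated to the query nodes can easily be added to the answer, because adding it raises the average.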
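The effect can be seen on a toy graph. The sketch below is only an illustration and is not taken from the slides: the graph (a 4-clique of friends joined through a bridge node `p` to an unrelated 6-clique) and all function names are assumptions made for this example.

```python
def build_graph(edges):
    """Build an undirected graph as {node: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def induced_subgraph(adj, nodes):
    """Keep only the given nodes and the edges between them."""
    nodes = set(nodes)
    return {u: adj[u] & nodes for u in nodes}


def average_degree(adj):
    """Average degree, i.e. 2|E| / |V|."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def minimum_degree(adj):
    """Smallest degree of any node in the graph."""
    return min(len(nbrs) for nbrs in adj.values())


def clique_edges(nodes):
    """All pairs of the given nodes."""
    return [(u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:]]


# Hypothetical example: "q" is the query node, its friends form a 4-clique,
# and an unrelated 6-clique hangs off the bridge node "p".
friends = ["q", "a", "b", "c"]
strangers = ["x0", "x1", "x2", "x3", "x4", "x5"]
whole = build_graph(
    clique_edges(friends) + [("c", "p"), ("p", "x0")] + clique_edges(strangers)
)
community = induced_subgraph(whole, friends)

print(average_degree(community), minimum_degree(community))      # 3.0 3
print(round(average_degree(whole), 2), minimum_degree(whole))    # 4.18 2
```

On this example the average degree prefers the whole graph (about 4.18 versus 3.0), although the 6-clique has nothing to do with `q`, while the minimum degree prefers the 4-clique of friends (3 versus 2).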

Problem definition
Problem 2 adds a distance constraint.
Given an undirected, connected graph G = (V, E), a set of query nodes Q, a goodness function f, and a distance bound d, find the densest subgraph H = (V_H, E_H) of G such that:
V_H contains Q (all query nodes must be included);
H is connected;
D_Q(H) <= d;
f(H) is maximized among all feasible choices of H (the larger the better).

Maximizing the minimum degree
Greedy algorithm, steps:
1. Set G_0 = G.
2. Delete the minimum-degree node and all its edges, then go to 2.
Termination condition, either of:
at least one of the query nodes in Q has minimum degree;
the query nodes in Q are no longer connected.
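The peeling loop can be sketched as follows. This is a minimal, unoptimized sketch and not the authors' code. The slide does not say which intermediate graph is returned; the sketch assumes it is the one with the largest minimum degree, restricted to the connected component that contains the query nodes. The function names and the example graph are hypothetical.

```python
from collections import deque


def query_component(adj, alive, query):
    """Nodes still alive that are reachable from one query node (BFS)."""
    start = next(iter(query))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in alive and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def greedy_min_degree(adj, query):
    """Peel minimum-degree nodes; return (best node set, its minimum degree).

    adj is {node: set of neighbours}; query must be non-empty.
    Assumption: the best intermediate graph is the one with the largest
    minimum degree, cut down to the component containing the query nodes.
    """
    query = set(query)
    alive = set(adj)
    deg = {u: len(adj[u]) for u in adj}
    best_nodes, best_value = None, -1
    while True:
        component = query_component(adj, alive, query)
        if not query <= component:
            break  # termination: query nodes are no longer connected
        u = min(alive, key=deg.get)  # O(n) scan, kept simple on purpose
        if deg[u] > best_value:
            best_nodes, best_value = component, deg[u]
        if any(deg[q] == deg[u] for q in query):
            break  # termination: a query node has minimum degree
        alive.remove(u)
        for v in adj[u]:
            if v in alive:
                deg[v] -= 1
    return best_nodes, best_value


def clique(nodes):
    return [(u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:]]


# Hypothetical input: a 4-clique around "q" bridged by "p" to a 6-clique.
edges = clique(["q", "a", "b", "c"]) + [("c", "p"), ("p", "x0")]
edges += clique(["x0", "x1", "x2", "x3", "x4", "x5"])
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

nodes, value = greedy_min_degree(graph, {"q"})
print(sorted(nodes), value)  # ['a', 'b', 'c', 'q'] 3
```

Re-running the connectivity check at every step keeps the sketch short but makes it slower than linear time.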

Time complexity
Greedy can be implemented in linear time.
Idea: keep a separate list of the nodes with degree d, for d = 1, ..., n.
When a node u is removed from G, each neighbor of u with degree d is moved from list d to list d - 1.
The total number of moves is therefore O(m), where m is the number of edges.
The minimum-degree node can be located in O(1) time, so the running time is O(n + m), where n is the number of nodes.
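The degree lists can be sketched with one bucket per degree. The code below is an illustrative assumption about how such a structure might look, not the authors' implementation; it covers only the peeling order and leaves out the termination checks on the query nodes. Buckets are Python sets, so moving a node between buckets takes constant expected time.

```python
def peel_by_min_degree(adj):
    """Return [(node, degree at removal)] for a full min-degree peeling.

    adj is {node: set of neighbours}. Runs in O(n + m) expected time.
    """
    deg = {u: len(adj[u]) for u in adj}
    max_deg = max(deg.values(), default=0)
    # buckets[d] holds the remaining nodes whose current degree is d
    buckets = [set() for _ in range(max_deg + 1)]
    for u, d in deg.items():
        buckets[d].add(u)

    removed = set()
    order = []
    d = 0
    for _ in range(len(adj)):
        # Removing a minimum-degree node lowers its neighbours' degrees by
        # one, so the new minimum is at least d - 1: step back once, then
        # scan forward to the first non-empty bucket.
        d = max(d - 1, 0)
        while not buckets[d]:
            d += 1
        u = buckets[d].pop()
        order.append((u, d))
        removed.add(u)
        for v in adj[u]:
            if v not in removed:
                # one bucket move per remaining edge: O(m) moves in total
                buckets[deg[v]].remove(v)
                deg[v] -= 1
                buckets[deg[v]].add(v)
    return order


# Hypothetical input: a triangle a-b-c with a pendant node d attached to c.
triangle_with_tail = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
order = peel_by_min_degree(triangle_with_tail)
print([degree for _, degree in order])  # [1, 2, 1, 0]
```

The bucket pointer moves back by at most one position per removal, so its total forward movement is bounded by O(n) plus the maximum degree; locating the minimum is O(1) in the amortized sense.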

Generalization to monotone functions
The minimum-degree function is one member of the family of node-monotone functions.
Sometimes we want a different function from this family to measure density.

Problem definition
Problem 3 replaces the goodness function with a node-monotone function.
Given an undirected, connected graph G = (V, E), a set of query nodes Q, a node-monotone function f, and a distance bound d, find the densest subgraph H = (V_H, E_H) of G such that:
V_H contains Q (all query nodes must be included);
H is connected;
D_Q(H) <= d;
f(H) is maximized among all feasible choices of H (the larger the better).

GreedyGen
Greedy algorithm, steps:
Set G_0 = G.
Instead of deleting the minimum-degree node, delete the node v for which f(G, v) is minimum, together with all its edges, then repeat this step.
Termination condition, either of:
at least one of the query nodes in Q has the minimum f(G, v);
the query nodes in Q are no longer connected.

Communities with Size Restriction
Drawback of the previous algorithms: they may return very large subgraphs.

Complexity
Formal definition: maximize the minimum degree subject to an upper bound on the size.
The input includes an integer k (the size constraint).
The subgraph H must have at most k nodes.
This problem is NP-hard.

Algorithm
Two heuristics can be used to find communities with bounded size: GreedyDist and GreedyFast.
Both are inspired by the Greedy algorithm for maximizing the minimum degree.

Algorithm GreedyDist
The tighter the distance constraint, the smaller the communities.

Algorithm GreedyDist
Invoke GreedyGen.
If the query nodes are connected but the size constraint is not satisfied, re-execute GreedyGen with a tighter distance constraint.
Repeat until the size constraint is satisfied or the query nodes are disconnected.

Algorithm GreedyFast
Preprocessing: restrict the input graph to the k’ nodes closest to the query nodes.
Then execute Greedy on the restricted graph.
Rationale: the closer a node is to the query nodes, the more related it is to them and the more likely it is to belong to their community.
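The preprocessing step can be sketched with a multi-source breadth-first search. The slide does not define "closest"; this sketch assumes it means hop distance to the nearest query node, which is only one possible reading. The names `closest_nodes`, `restrict`, and `k_prime` are hypothetical, and `k_prime` is assumed to be at least the number of query nodes.

```python
from collections import deque


def closest_nodes(adj, query, k_prime):
    """Return up to k_prime nodes nearest (in hops) to any query node.

    adj is {node: set of neighbours}. Assumes k_prime >= len(query) so
    that every query node is kept. Ties at the cut-off distance are broken
    arbitrarily by BFS visiting order.
    """
    dist = {q: 0 for q in query}   # all query nodes start at distance 0
    queue = deque(dist)
    kept = []
    while queue and len(kept) < k_prime:
        u = queue.popleft()        # BFS pops nodes in non-decreasing distance
        kept.append(u)
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(kept)


def restrict(adj, nodes):
    """Induced subgraph on the given node set."""
    return {u: adj[u] & nodes for u in nodes}


# Hypothetical input: the path a - b - c - d - e with query node "a".
path = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
kept = closest_nodes(path, ["a"], k_prime=3)
small = restrict(path, kept)
print(sorted(kept))  # ['a', 'b', 'c']
```

The restricted graph `small` is what the greedy peeling would then run on; since it has at most k’ nodes, each query costs time that depends on k’ rather than on the size of the full graph, apart from the search itself.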

Experimental Evaluation
DBLP: a coauthorship graph extracted from a recent snapshot of the DBLP database; 226K nodes, 1.4M edges.
Tag: a tag graph extracted from the Flickr photo-sharing portal; 38K nodes, 1.3M edges.
BIOMINE: a graph extracted from the database of the Biomine project; 16K nodes, 491K edges.

Quantitative Results
BASELINE: a simple and natural baseline algorithm.
|Q|: the number of query nodes.
d: the distance bound.
k: the size bound.
l: the inter-distance between query nodes.

Quantitative Results

Conclusion
The aim is to find a compact, densely connected community that contains the given query nodes.
Measurement is based on constraints: minimum degree, distance, and size.
Heuristics: GreedyGen, GreedyDist, and GreedyFast.

Questions?