Chayant Tantipathananandh with Tanya Berger-Wolf: Constant-Factor Approximation Algorithms for Identifying Dynamic Communities.

Presentation transcript:

Social Networks: these are snapshots, and networks change over time.

Dynamic Networks
– Aggregated network
– Interactions occur in the form of disjoint groups
– Groups are not communities
[Figure: groups observed at successive time steps t]

Communities
What is a community? “Cohesive subgroups are subsets of actors among whom there are relatively strong, direct, intense, frequent, or positive ties.” [Wasserman & Faust 1994]
Dynamic Community Identification
– GraphScope [Sun et al 2005]
– Metagroups [Berger-Wolf & Saia 2006]
– Dynamic Communities [TBK 2007]
– Clique Percolation [Palla et al 2007]
– FacetNet [Lin et al 2009]
– Bayesian approach [Yang et al 2009]

Ship of Theseus
Jeannot's knife “has had its blade changed fifteen times and its handle fifteen times, but is still the same knife.” [French story, from Wikipedia]
“The ship … was preserved by the Athenians …, for they took away the old planks as they decayed, putting in new and stronger timber in their place, insomuch that this ship became a standing example among the philosophers, for the logical question of things that grow; one side holding that the ship remained the same, and the other contending that it was not the same.” [Plutarch, Theseus]

Ship of Theseus: individual parts never change identities. Cost for changing identity.

Ship of Theseus: identity changes to match the group. Costs for visiting and being absent.

Approach

Community = Color Valid coloring: In each time step, different groups have different colors.
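The validity condition above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical layout in which each time step is a dict from a group identifier to its color; the function name and data layout are illustrative and do not come from the slides.

```python
def is_valid_coloring(group_colors_by_time):
    """Return True if, within every time step, no two groups share a color.

    group_colors_by_time: list with one dict per time step, mapping a
    group identifier to its assigned color (assumed layout).
    """
    for colors in group_colors_by_time:
        assigned = list(colors.values())
        # A repeated color within one time step violates validity.
        if len(assigned) != len(set(assigned)):
            return False
    return True
```

The same color may appear in different time steps; only groups within one time step must differ.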

Interpretation Group color: How does community c interact at time t?

Interpretation. Individual color: Who belongs to community c at time t?

Social Costs: Conservatism
Switching cost α; absence cost β₁; visiting cost β₂. [Figure: occurrences of the switching cost α marked]

Social Costs: Loyalty
Switching cost α; absence cost β₁; visiting cost β₂. [Figure: occurrences of the absence cost β₁ marked]

Social Costs: Loyalty
Switching cost α; absence cost β₁; visiting cost β₂. [Figure: occurrences of the visiting cost β₂ marked]
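To make the three costs concrete, here is a rough sketch of how a total cost could be tallied for a given coloring. The slides only name the costs, so the exact definitions used below are assumptions: a switch is a color change between consecutive time steps, a visit is being observed in a group of a different color, and an absence is not being in the group that carries one's own color when such a group exists. All names and the data layout are hypothetical.

```python
def total_cost(groups, group_color, ind_color, alpha, beta1, beta2):
    """Sum switching, visiting, and absence costs (assumed definitions).

    groups[t]:      dict group id -> set of individuals observed together
    group_color[t]: dict group id -> color
    ind_color[t]:   dict individual -> color
    """
    cost = 0
    for t, colors in enumerate(ind_color):
        # Group each individual was observed in at time t (if any).
        member_of = {i: g for g, members in groups[t].items() for i in members}
        # Group carrying each color at time t (valid coloring: at most one).
        home_group = {c: g for g, c in group_color[t].items()}
        for ind, c in colors.items():
            # Switching: color differs from the previous time step.
            if t > 0 and ind_color[t - 1].get(ind, c) != c:
                cost += alpha
            g = member_of.get(ind)
            # Visiting: observed in a group of another color.
            if g is not None and group_color[t][g] != c:
                cost += beta2
            # Absence: own community's group exists, but individual is not in it.
            home = home_group.get(c)
            if home is not None and home != g:
                cost += beta1
    return cost
```

For example, an individual who stays red while sitting in a blue group, when a red group also exists, pays β₁ + β₂ under these assumptions, whereas switching to blue pays α instead.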

Problem Complexity
– Minimizing total cost is hard: NP-complete and APX-hard [with Berger-Wolf and Kempe 2007]
– Constant-factor approximation [details in paper]
– Easy special case: if there are no missing individuals and 2α ≤ β₂, then the problem is simply weighted bipartite matching [details in paper]

– assume all individuals are observed at all time steps

Greedy Approximation
No visiting or absence; minimize switching. [Figure: groups over time]

Greedy Approximation ≈ maximizing path coverage
– No visiting or absence; minimize switching
– Improvement by dynamic programming
– The greedy algorithm guarantees an approximation factor of max{2, 2α/β₁, 4α/β₂}, which depends on α, β₁, β₂ but is independent of input size
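As a quick worked example of the stated bound, the sketch below evaluates max{2, 2α/β₁, 4α/β₂} for given cost parameters. The function name is illustrative; only the formula comes from the slide.

```python
def greedy_guarantee(alpha, beta1, beta2):
    """Evaluate the bound max{2, 2*alpha/beta1, 4*alpha/beta2} from the slide."""
    return max(2, 2 * alpha / beta1, 4 * alpha / beta2)
```

With all costs equal (α = β₁ = β₂), the bound evaluates to 4. It drops to 2 once β₁ ≥ α and β₂ ≥ 2α.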

Southern Women Data Set [DGG 1941]: 18 individuals, 14 time steps. Collected in Natchez, MS, 1935. [Figure: aggregated network]

Ethnography [DGG 1941]. Note: columns not ordered by time. [Figure: core marked]

Optimal Communities (all costs equal). White circles = unknown. [Figure: individuals over time, compared with the ethnography; core marked]

[Figure: approximate and optimal communities over time, compared with the ethnography; core marked]

Approximation Power. [Figure panels: 28 inds, 44 times; 29 inds, 82 times; 313 inds, 758 times]

Approximation Power. [Figure panels: 41 inds, 418 times; 264 inds, 425 times; 96 inds, 1577 times]

Conclusions
– Identity of objects that change over time (Ship of Theseus Paradox)
– Formulate an optimization problem
– Greedy approximation: fast, near-optimal
Future Work
– Algorithm with guarantee not depending on α, β₁, β₂
– Network snapshots instead of disjoint groups

Thank You
Arun Maiya, Saad Sheikh, Habiba, David Kempe, Jared Saia, Mayank Lahiri, Dan Rubenstein, Tanya Berger-Wolf, Rajmonda Sulo, Robert Grossman, Siva Sundaresan, Ilya Fischoff, Anushka Anand, Chayant
NSF grant, KDD student travel award

On the Bursty Evolution of Blogspace. Ravi Kumar, Jasmine Novak, Prabhakar Raghavan, Andrew Tomkins (IBM Almaden Research Center)

Blogspace: a collection of blogs with their links
Motivation
– Sociological: different from traditional web pages
– Technical: from static snapshots to dynamic graphs

Web communities (Ravi Kumar, 1999): groups of individuals who share a common interest, characterized by dense directed bipartite subgraphs.
Bursty communities of blogs
– Exhibit striking temporal characteristics
– Extract the community within a time interval

Time graph G = (V, E)
– Each v in V has an associated duration D(v)
– Each e in E is a triple (u, v, t), where t is a time in the interval D(u) ∩ D(v)
Prefix of G at time t: G_t = (V_t, E_t)
– V_t = {v in V | D(v) ∩ [0, t] ≠ Ø}
– E_t = {(u, v, t’) in E | t’ ≤ t}
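The prefix definition translates directly into code. The sketch below assumes each duration D(v) is a closed interval given as a (start, end) pair and that edges are (u, v, time) triples; the function name and layout are illustrative.

```python
def prefix(durations, edges, t):
    """Return (V_t, E_t), the prefix of a time graph at time t.

    durations: dict vertex -> (start, end), a closed interval D(v) (assumed)
    edges:     iterable of (u, v, time) triples
    """
    # D(v) intersects [0, t] exactly when start <= t and end >= 0.
    v_t = {v for v, (start, end) in durations.items() if start <= t and end >= 0}
    # Keep every edge whose timestamp is at most t.
    e_t = {(u, v, tp) for (u, v, tp) in edges if tp <= t}
    return v_t, e_t
```

Because every edge time lies in D(u) ∩ D(v), the endpoints of each edge in E_t are automatically in V_t.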

Two-step approach
– Community extraction: extract dense subgraphs (potential communities)
– Burst analysis: analyze each dense subgraph to identify and rank bursts in these communities

Finding the densest subgraph: NP-hard
Two steps:
– Pruning
  Remove vertices of degree no more than one
  Vertices of degree two are K 3 g
  Output and remove communities (that pass a threshold)
  Repeat the 3 steps above
– Expanding
  Determine the vertex containing the most links
  Add it to the community if its number of links is larger than t_k

Kleinberg’s method (SIGKDD 2002)
– Model the generation of events by an automaton that is in one of two states, “low” and “high”
– The high state is hypothesized as generating bursts of events
– A cost is associated with any state transition to discourage short bursts
– Find a low-cost state sequence that is likely to generate the stream
– Solves the problem of enumerating all the bursts by order of weight (dynamic programming)
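The dynamic program over a two-state automaton can be sketched as follows. This is a simplified illustration, not the exact method used in the talk: it assumes events are given as inter-arrival gaps, that gaps are exponentially distributed with a higher rate in the "high" state, and that only the low-to-high transition is charged (cost gamma · ln n). The parameters `s` and `gamma` and all names are assumptions.

```python
import math

def two_state_bursts(gaps, s=2.0, gamma=1.0):
    """Return a minimum-cost state sequence (0 = low, 1 = high), one per gap."""
    n = len(gaps)
    base_rate = n / sum(gaps)
    rates = (base_rate, s * base_rate)   # event rate in low / high state
    up_cost = gamma * math.log(n)        # charged when moving low -> high

    def emit(state, x):
        # Negative log-likelihood of gap x under an exponential density.
        return -math.log(rates[state]) + rates[state] * x

    best = [0.0, math.inf]  # the automaton starts in the low state
    back = []               # back[k][j]: best previous state if gap k is in state j
    for x in gaps:
        new_best, choice = [], []
        for j in (0, 1):
            cands = [best[i] + (up_cost if j > i else 0.0) for i in (0, 1)]
            i_min = 0 if cands[0] <= cands[1] else 1
            new_best.append(cands[i_min] + emit(j, x))
            choice.append(i_min)
        best = new_best
        back.append(choice)

    # Trace back from the cheaper final state.
    state = 0 if best[0] <= best[1] else 1
    states = []
    for choice in reversed(back):
        states.append(state)
        state = choice[state]
    states.reverse()
    return states
```

Maximal runs of state 1 in the returned sequence are the detected bursts. Raising `gamma` makes entering the high state more expensive, so short bursts are suppressed.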

Expansion in community extraction
– Edges must grow to triangles
– Communities of size up to six will only grow vertices that link to all but one vertex
– Communities of size up to nine will only grow vertices that link to all but two vertices
– Communities of size up to 20 will grow only vertices that link to 70% of the community
– Larger communities will grow only vertices that link to at least 60% of the community
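The size-dependent thresholds above can be written as a single function. This is a sketch under two assumptions the slide does not state: percentages are rounded up to a whole number of links, and "up to" ranges are inclusive. The function name is illustrative.

```python
def required_links(size):
    """Minimum number of links a vertex needs to join a community of this size."""
    if size < 2:
        raise ValueError("a community starts from an edge (size 2)")
    if size == 2:
        return 2                      # an edge must grow to a triangle
    if size <= 6:
        return size - 1               # all but one vertex
    if size <= 9:
        return size - 2               # all but two vertices
    if size <= 20:
        return (7 * size + 9) // 10   # ceil(70% of size), integer arithmetic
    return (6 * size + 9) // 10       # ceil(60% of size)
```

The integer form of the ceiling avoids floating-point rounding at exact multiples such as 70% of 20.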