1
A Framework for Community Identification in Dynamic Social Networks
Chayant Tantipathananandh, Tanya Berger-Wolf, David Kempe
Presented by Victor Lee
2
Outline of Presentation
– The Challenge: Dynamic Social Networks
– Framework and Problem Formulation
– Individual and Group Colorings
– Group Coloring Heuristics
– Experimental Results
– Future Directions
3
The Problem
Many well-known approaches identify communities in social networks:
– Graph partitioning
– Clustering
– Various measures of closeness or density
However, these approaches generally assume static networks, while most social networks are dynamic.
4
Dynamic Social Networks
Social networks change over time:
– Membership changes
– Interaction changes
Most community identification techniques either use a single snapshot or use time-averaged measurements, and so lose important information.
5
Importance of Dynamic Information
Networks 1 and 2 have the same average characteristics, but:
– Network 1 shows an oscillation
– Network 2 suggests that C joins the community AB
[Figure: groups observed over time]
Time:       T1   T2   T3   T4   T5   T6
Network 1:  AB   ABC  AB   ABC  AB   ABC
Network 2:  AB   AB   AB   ABC  ABC  ABC
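A small Python check of this claim, reading the slide's figure as one observed group per time step T1 to T6: averaged over time, C is grouped with A and B in half of the snapshots in both networks, yet the number of times C's membership flips differs sharply. The function names are illustrative, not from the presentation.

```python
# Observed group containing A and B at each time step T1..T6.
network_1 = ["AB", "ABC", "AB", "ABC", "AB", "ABC"]
network_2 = ["AB", "AB", "AB", "ABC", "ABC", "ABC"]


def fraction_with_c(timeline):
    """Time-averaged fraction of snapshots in which C is grouped with A and B."""
    return sum("C" in group for group in timeline) / len(timeline)


def switch_count(timeline):
    """How often C's membership status flips between consecutive snapshots."""
    flags = ["C" in group for group in timeline]
    return sum(a != b for a, b in zip(flags, flags[1:]))


print(fraction_with_c(network_1), fraction_with_c(network_2))  # 0.5 0.5
print(switch_count(network_1), switch_count(network_2))        # 5 1
```

The averaged statistic cannot tell the two networks apart; only the ordered sequence reveals oscillation versus a single join.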
6
Proposal
– A new framework for modeling social networks over time
– Algorithms and heuristics to identify dynamic communities
– Experiments to verify the concept and the computational performance
7
Problem Formulation
Given:
– A set of individuals
– A sequence of snapshot observations
Find:
– A best-fit set of time-varying communities C(t)
– A best-fit time-varying community membership for each individual
Approach:
– Combinatorial optimization
– Graph coloring
8
Model: Individuals and Groups
– Set of individuals $X = \{i_1, i_2, \dots, i_n\}$
– Sequence of observations in discrete time, each recording the interactions between individuals
– The set of individuals interacting at time t defines a group: if A interacts with B, and B interacts with C, then $\{A, B, C\}$ belongs to a single group.
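The transitivity rule above means a group is a connected component of one snapshot's interaction graph. A minimal sketch, assuming each snapshot is given as a list of observed individuals plus a list of interacting pairs; the function name `groups_at_time` and the use of union-find are illustrative choices, not taken from the presentation.

```python
from collections import defaultdict


def groups_at_time(individuals, interactions):
    """Partition the individuals observed at one time step into groups.

    A group is a connected component of the snapshot interaction graph:
    if A interacts with B and B with C, then A, B and C share a group.
    """
    parent = {i: i for i in individuals}

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in interactions:
        parent[find(a)] = find(b)  # merge the two components

    components = defaultdict(set)
    for i in individuals:
        components[find(i)].add(i)
    # Sort for a deterministic order (by smallest member).
    return sorted(components.values(), key=min)


groups = groups_at_time(["A", "B", "C", "D"], [("A", "B"), ("B", "C")])
print([sorted(g) for g in groups])  # [['A', 'B', 'C'], ['D']]
```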
9
Group vs. Community
Snapshot graph:
– Each individual is a vertex
– Each interaction is an edge
– A group is a connected component of the snapshot graph
– Assumption: interaction is limited enough that the graph is not connected, so the groups are disjoint
Group ≠ Community:
– Groups capture observed interaction at a point in time
– Communities extend over time
10
Graphing the Observations
– Each time slice is one observation
– Edges within a time slice show the interactions observed at time t
– Add edges joining all observations of the same individual
– No edges between groups from one time to another
[Figure legend: ○ = individual, □ = group]
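The construction above can be sketched as a plain edge list. This is an assumption-laden illustration: it creates one individual vertex per individual per time step and chains consecutive time steps, and the function name and vertex tuples are hypothetical.

```python
def build_observation_graph(snapshots, individuals):
    """snapshots[t] is the list of groups (sets of individuals) observed at time t.

    Returns (vertices, edges). Vertices are tuples:
      ("ind", name, t) -- one per individual per time step (an assumption)
      ("grp", t, k)    -- the k-th group observed at time t
    """
    vertices, edges = [], []
    T = len(snapshots)
    for t in range(T):
        for name in individuals:
            vertices.append(("ind", name, t))
        for k, group in enumerate(snapshots[t]):
            g = ("grp", t, k)
            vertices.append(g)
            # Within a time slice: connect each member to its group vertex.
            for name in sorted(group):
                edges.append((("ind", name, t), g))
    # Across time: join consecutive observations of the same individual.
    for name in individuals:
        for t in range(T - 1):
            edges.append((("ind", name, t), ("ind", name, t + 1)))
    # No group-to-group edges are ever added between time slices.
    return vertices, edges


snapshots = [[{"A", "B"}, {"C"}], [{"A", "B", "C"}]]
vertices, edges = build_observation_graph(snapshots, ["A", "B", "C"])
print(len(vertices), len(edges))  # 9 9
```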
11
Refine the Problem
A community appears as a sequence of groups, with at most one group per time slice.
Tasks:
– Assign each group to a community (color the group vertices)
– Assign each individual to a community at each time step (color the individual vertices)
More assumptions:
– Individuals belong to one community at a time
– Individuals don’t change community frequently
– Individuals frequently appear in their community
12
Cost Model
Quantify a “good” community identification by assigning costs to undesirable behavior:
– I-cost $\alpha$: when an individual changes color
– G-costs: $\beta_1$ when an individual is absent from its community; $\beta_2$ when an individual is present in a different community
– C-cost $\gamma$: for each color that an individual uses
Find a coloring with minimum total cost.
13
Coloring Choices and Costs
At time T3, C temporarily changes its interaction.
[Figure: Coloring 1 and Coloring 2 of the groups formed by A, B, C, D over times T1–T4]
– Coloring 1: C changes community and then changes back. Cost = $2\alpha$ (+ $\gamma$ if this color hasn’t been used before)
– Coloring 2: C stays in its original community and just visits. Cost = $\beta_1 + \beta_2$
The optimal coloring depends on whether $\beta_1 + \beta_2$ is smaller than $2\alpha + \gamma$ (new color) or $2\alpha$ (color already used).
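Plugging numbers into this comparison makes the trade-off concrete. Writing $\alpha$ for the I-cost, $\beta_1$ and $\beta_2$ for an absence and a visit, and $\gamma$ for a new color, and borrowing the two cost settings of Experiment 1A purely as an illustration (this arithmetic is ours, not a result from the slides):

```latex
\begin{aligned}
\text{Coloring 1 (switch, then switch back):}\quad & 2\alpha + \gamma \ \text{(new color)}, \qquad 2\alpha \ \text{(color already used)}\\
\text{Coloring 2 (visit):}\quad & \beta_1 + \beta_2\\
(\alpha, \beta_1, \beta_2, \gamma) = (1, 0, 1, 1):\quad & \beta_1 + \beta_2 = 1 < 2\alpha + \gamma = 3 \;\Rightarrow\; \text{visiting is cheaper}\\
(\alpha, \beta_1, \beta_2, \gamma) = (1, 0, 3, 1):\quad & \beta_1 + \beta_2 = 3 = 2\alpha + \gamma \;\Rightarrow\; \text{a tie; switching wins if the color was already used } (2\alpha = 2)
\end{aligned}
```

Raising the visiting cost $\beta_2$ makes switching at least as attractive as visiting, which is one way to read why the second setting allows more fluid membership.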
14
Finding Optimal Colorings
Finding the optimal solution is NP-hard. Partition the problem:
1. Find an optimal set of communities (group coloring)
2. Find an optimal assignment of individuals to communities (individual coloring)
If Phase 1 (group coloring) is completed first:
– Phase 2 is reduced from $O(2^N)$ to $O(2^G)$, where N = number of individuals and G = number of groups
– The cost incurred by one individual’s coloring is independent of the colors chosen by others
15
Independence of Individual Color Choice
Proof: the cost of an individual’s coloring is I-cost + G-cost + C-cost, and each term is assessed for that individual alone:
– I-cost = $\alpha \cdot$ (# of color changes)
– G-cost = $\beta_1 \cdot$ (# of absences from its group) + $\beta_2 \cdot$ (# of visits to other groups)
– C-cost = $\gamma \cdot$ (# of colors the individual uses)
So we can solve for each individual one at a time. Moreover, we can assess the cost incrementally, from time t to time t+1.
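The three cost terms translate directly into a scoring function for one individual. A minimal sketch, assuming the group coloring is given as one `{color: members}` dict per time step (at most one group per color per time step); the function name and the red/blue A, B, C, D scenario are hypothetical, since the slide's figure did not survive.

```python
def individual_cost(who, colors, groups, alpha, beta1, beta2, gamma):
    """Cost of one individual's color sequence under a FIXED group coloring.

    who       -- the individual being scored
    colors[t] -- the color assigned to `who` at time t
    groups[t] -- {color: set of members} for the groups observed at time t
    Only `who`'s own colors appear here, which is why individuals can be
    colored independently once the groups are colored.
    """
    # I-cost: alpha per color change between consecutive time steps.
    i_cost = alpha * sum(a != b for a, b in zip(colors, colors[1:]))
    g_cost = 0
    for t, x in enumerate(colors):
        for color, members in groups[t].items():
            if color == x and who not in members:
                g_cost += beta1  # absent from its own community's group
            elif color != x and who in members:
                g_cost += beta2  # present in another community's group
    # C-cost: gamma per distinct color the individual uses.
    c_cost = gamma * len(set(colors))
    return i_cost + g_cost + c_cost


# Hypothetical scenario: C is with D, except at the third step it joins A and B.
groups = [
    {"red": {"A", "B"}, "blue": {"C", "D"}},
    {"red": {"A", "B"}, "blue": {"C", "D"}},
    {"red": {"A", "B", "C"}, "blue": {"D"}},
    {"red": {"A", "B"}, "blue": {"C", "D"}},
]
costs = dict(alpha=1, beta1=0, beta2=1, gamma=1)
visit = individual_cost("C", ["blue"] * 4, groups, **costs)
switch = individual_cost("C", ["blue", "blue", "red", "blue"], groups, **costs)
print(visit, switch)  # 2 4
```

The difference `switch - visit` equals $2\alpha + \gamma - (\beta_1 + \beta_2)$, the same comparison made for Coloring 1 versus Coloring 2.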
16
Individual Coloring Algorithm
– $C$ = set of all colors observed to be used by an individual
– $\mathcal{S}(t) = \{S \subseteq C : 1 \le |S| \le t\}$: all possible color sets up to time t
– $G(t, x)$ = G-cost of using color x at time t (wrong group)
– $I(t, x, y)$ = I-cost of using color y at time t−1 and color x at time t (changing color)
– $C(x, R)$ = C-cost of using color x when color set R has already been used (new color)
Minimum cost $D(t, S, x)$ up to time t, ending with color x, having used color set S:
– At time 1: $D(1, \{x\}, x) = G(1, x)$
– At time t: $D(t, S, x) = G(t, x) + \min_{R \in \mathcal{S}(t-1),\ y \in R,\ R \cup \{x\} = S} \left[ D(t-1, R, y) + I(t, x, y) + C(x, R) \right]$
17
Optimal Individual Coloring
Given a group coloring, the minimum cost of coloring an individual is $\min_{S \in \mathcal{S}(T),\ x \in S} D(T, S, x)$.
– Time complexity is $O(nT|C|^2 2^{|C|})$
– Space requirement is $O(|C| 2^{|C|})$
If the number of colors $|C|$ is small, the computation is tractable.
18
Optimal Group Coloring
– Determine the best mapping of groups at time t to groups at time t+1
– Groups that are mapped across time are part of the same community and have the same color
– A coloring is good if most individuals can retain their color from step to step
[Figure: a possible coloring]
19
Bipartite Matching Heuristic
Matching graph:
– For each pair of groups g, g′ at times t and t′ = t+1, add a weighted edge from $v_{g,t}$ to $v_{g',t'}$
– Weight = $|g \cap g'|$ (similarity of g to g′)
Find the maximum-weight bipartite matching.
Evaluation:
– Weights the i-cost more heavily than the g-cost
– Performs well if membership is fairly stable
– Has no long-range perspective
– More efficient heuristics?
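A minimal sketch of this heuristic for small inputs. The slides do not say how colors propagate, so this assumes matched groups inherit their predecessor's color and unmatched groups get a fresh one. The matching itself is solved exhaustively with a bitmask DP (exponential, toy sizes only); for realistic sizes a Hungarian-algorithm solver such as SciPy's `scipy.optimize.linear_sum_assignment` with `maximize=True` would be the usual choice. Function names are hypothetical.

```python
from functools import lru_cache


def match_groups(groups_t, groups_next):
    """Maximum-weight bipartite matching between the groups at t and t+1.

    Edge weight = |g ∩ g'|. Solved exhaustively with a bitmask DP, which is
    exponential in len(groups_next): fine for toy examples only.
    Returns (total weight, tuple of (i, j) index pairs).
    """
    w = [[len(g & h) for h in groups_next] for g in groups_t]

    @lru_cache(maxsize=None)
    def best(i, used):
        # Best matching of groups_t[i:], given the bitmask of taken targets.
        if i == len(groups_t):
            return 0, ()
        score, pairs = best(i + 1, used)  # leave groups_t[i] unmatched
        for j in range(len(groups_next)):
            if not used & (1 << j) and w[i][j] > 0:
                s, p = best(i + 1, used | (1 << j))
                if s + w[i][j] > score:
                    score, pairs = s + w[i][j], ((i, j),) + p
        return score, pairs

    return best(0, 0)


def color_by_matching(snapshots):
    """Matched groups inherit their predecessor's color; the rest get new colors."""
    colors = [list(range(len(snapshots[0])))]
    next_color = len(snapshots[0])
    for t in range(len(snapshots) - 1):
        _, pairs = match_groups(snapshots[t], snapshots[t + 1])
        row = [None] * len(snapshots[t + 1])
        for i, j in pairs:
            row[j] = colors[t][i]
        for j, c in enumerate(row):
            if c is None:
                row[j] = next_color
                next_color += 1
        colors.append(row)
    return colors


snapshots = [
    [{"A", "B"}, {"C", "D"}],
    [{"A", "B", "C"}, {"D"}],
    [{"A", "B"}, {"C", "D"}],
]
print(color_by_matching(snapshots))  # [[0, 1], [0, 1], [0, 1]]
```

Because only consecutive time steps are compared, a community that skips a step loses its color, which is the missing long-range perspective noted above.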
20
Greedy Heuristics for Group Coloring
Approach: maximize pairwise similarity between groups, for all pairs of groups over all time steps.
– Jaccard’s index (overlap between g and g′, scaled to the sizes of g and g′): $\mathrm{Jac}(g, g') = \frac{|g \cap g'|}{|g \cup g'|}$
– Weighted for temporal proximity: $\mathrm{JacD}(g, g') = \frac{\mathrm{Jac}(g, g')}{|t - t'|}$, where g is observed at time t and g′ at time t′
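Both similarity measures are one-liners on Python sets. A minimal sketch; the function names are illustrative, and returning 0 for two empty groups is an assumption, since the slide does not cover that case.

```python
def jaccard(g, h):
    """Jac(g, g') = |g ∩ g'| / |g ∪ g'| (0.0 if both groups are empty)."""
    union = g | h
    return len(g & h) / len(union) if union else 0.0


def jaccard_time_weighted(g, t, h, t_prime):
    """JacD(g, g') = Jac(g, g') / |t - t'|, for groups at different time steps."""
    if t == t_prime:
        raise ValueError("JacD compares groups from different time steps")
    return jaccard(g, h) / abs(t - t_prime)


print(round(jaccard({"A", "B"}, {"A", "B", "C"}), 3))                   # 0.667
print(round(jaccard_time_weighted({"A", "B"}, 1, {"A", "B", "C"}, 4), 3))  # 0.222
```

Dividing by the time gap means an identical group seen three steps earlier scores the same as a group with a third of the overlap seen one step earlier.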
21
Greedy Heuristics for Group Coloring
Greedy Heuristic 1 (time is not a factor):
– Construct a square similarity matrix with one row and one column per group
– Cluster the groups using agglomerative clustering
Greedy Heuristic 2 (look backward in time):
For t = 1 to T:
– Match the most similar pairs g, g′, where g is at time t and g′ is at any time t′ < t
– If the similarity is 0 or all colors have been used, add a new color
Greedy Heuristic 3 (look back over the shortest interval):
– Like Heuristic 2, but use only the t′ closest to t such that similarity(g, g′) > 0 for some g′ at time t′
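One possible reading of Heuristics 2 and 3 as code, under assumptions the slides leave open: similarity is JacD, pairs are taken in descending similarity, a group inherits the color of its best still-available earlier match, and each color is used at most once per time step. Heuristic 1 is not sketched. Names are hypothetical.

```python
def jacd(g, t, h, t_prime):
    """JacD(g, g') = Jac(g, g') / |t - t'| for groups at different times."""
    union = g | h
    jac = len(g & h) / len(union) if union else 0.0
    return jac / abs(t - t_prime)


def greedy_coloring(snapshots, closest_only=False):
    """Greedy group coloring: one possible reading of Heuristics 2 and 3.

    snapshots[t]       -- list of groups (sets of individuals) at time t
    closest_only=False -- compare against groups at every earlier time (Heuristic 2)
    closest_only=True  -- per group, only the closest earlier time with a
                          positive similarity (Heuristic 3)
    Returns colors[t][k], the color of the k-th group at time t.
    """
    colors, next_color = [], 0
    for t, groups in enumerate(snapshots):
        candidates = []  # (similarity, group index k, color of the earlier group)
        for k, g in enumerate(groups):
            scored = [(jacd(g, t, h, tp), tp, colors[tp][j])
                      for tp in range(t) for j, h in enumerate(snapshots[tp])]
            scored = [s for s in scored if s[0] > 0]
            if closest_only and scored:
                nearest = max(tp for _, tp, _ in scored)
                scored = [s for s in scored if s[1] == nearest]
            candidates += [(sim, k, c) for sim, _, c in scored]
        row, used = [None] * len(groups), set()
        # Most similar pairs first; each color is used at most once per time step.
        for sim, k, c in sorted(candidates, key=lambda item: -item[0]):
            if row[k] is None and c not in used:
                row[k] = c
                used.add(c)
        for k in range(len(groups)):
            if row[k] is None:  # similarity 0, or every candidate color is taken
                row[k] = next_color
                next_color += 1
        colors.append(row)
    return colors


snapshots = [
    [{"A", "B"}, {"C", "D"}],
    [{"A", "B", "C"}, {"D"}],
    [{"A", "B"}, {"C", "D"}],
]
print(greedy_coloring(snapshots))  # [[0, 1], [0, 1], [0, 1]]
```

Unlike the bipartite matching heuristic, a group here can recover a color last seen several steps back.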
22
Experiment 1: Verify the Framework
Does the framework capture the intuitive concept of a dynamic community?
Procedure:
– Construct small, synthetic datasets
– Use exhaustive search to get a truly optimal coloring
23
Experiment 1A: “Assembly Line”
Cost settings: (A) $(\alpha, \beta_1, \beta_2, \gamma) = (1, 0, 1, 1)$; (B) $(\alpha, \beta_1, \beta_2, \gamma) = (1, 0, 3, 1)$
At each time step, one member leaves a group and one enters, resulting in a complete membership change in 3 steps.
Results change as costs change: (A) favors stable membership, while (B) allows more fluid membership.
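A dataset with this shape is easy to generate. A minimal sketch: with one swap per step and a complete change after 3 steps, the group must have three members, so the group slides along individuals 0, 1, 2, .... The generator name, the single group per time step, and the run length are hypothetical; the actual synthetic dataset may differ.

```python
def assembly_line(num_steps, group_size=3):
    """Hypothetical 'assembly line' data: one group per time step.

    At each step the longest-standing member leaves and a new individual
    joins, so snapshots[t] == [{t, t+1, ..., t+group_size-1}].
    """
    return [[set(range(t, t + group_size))] for t in range(num_steps)]


snapshots = assembly_line(6)
print(sorted(snapshots[0][0]), sorted(snapshots[3][0]))  # [0, 1, 2] [3, 4, 5]
```

Consecutive snapshots share two of three members, yet the first and fourth share none, which is exactly the tension between stable and fluid membership that the two cost settings resolve differently.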
24
Experiment 1B: “Dutiful Children”
Individuals 2, 3, and 4 are children; 0 and 1 are parents who visit a different child each time step.
Cost settings: (A) $(\alpha, \beta_1, \beta_2, \gamma) = (1, 0, 1, 1)$; (B) $(\alpha, \beta_1, \beta_2, \gamma) = (1, 0, 3, 1)$
Results: the framework succeeds at detecting the individual children as well as the visitation pattern.
25
Experiment 2: Quality of Heuristic Results
Do the heuristics obtain colorings similar to those of an exhaustive search?
Procedure:
– Re-test the synthetic datasets using the various heuristics
Results: at least one heuristic method obtains the same coloring and total cost as exhaustive search.
26
Experiment 3: Real World Datasets Do the framework and heuristics together obtain expected results using real-world datasets?
27
Experiment 3A: “Southern Women”
Eighteen women in 1933 in Natchez, Mississippi
– Tracks their attendance at 14 social events
28
Experiment 3A: Prior Results
Twenty-one analyses (1941 to 2001) all show similar results:
– Two clear communities
– The membership of individuals 8, 9, and 16 is less certain
29
Experiment 3A: Results
Cost setting: $(\alpha, \beta_1, \beta_2, \gamma) = (1, 1, 1, 1)$
– Detects 4 communities, which are subsets of the traditional 2 communities
– Individuals 6 and 10 change membership over time
– By adjusting the cost factors, the results of most of the 21 prior analyses can be reproduced
30
Experiment 3B: “Grevy’s Zebra”
A 28-member zebra herd observed 44 times over 3 months in 2002.
[Figure: aggregate interaction graph] The aggregate graph shows overall interaction, but temporal information is lost.
31
Experiment 3B: Results
Inferred communities agree with manual results obtained by biologists:
– 4 stable communities
– Some short-lived communities and some visiting
32
Conclusions
– We present a framework for identifying communities in dynamic social networks
– The framework produces meaningful results compared to traditional methods
– Heuristic methods produce near-optimal solutions
Future Directions
– Develop an approximation algorithm that guarantees the quality of the result
– Investigate scalability over network size and time
– Relax assumptions about interaction and dynamics