1
Pei Lee, ICDE 2014, Chicago, IL, USA
Incremental Cluster Evolution Tracking from Highly Dynamic Network Data
Pei Lee, Laks V.S. Lakshmanan, Computer Science Department, University of British Columbia, Vancouver, BC, Canada
Evangelos E. Milios, Computer Science Department, Dalhousie University, Halifax, NS, Canada
The problem, challenges, theory, experiments, conclusion, (related work, theory proofs)
2019/4/15
2
Outline
Motivation
  Evolving network meets social event
Incremental Computation Framework
  Divide-and-conquer vs. incremental computation
Post Network Construction
  Combat noise
Network and Cluster Evolution
  Evolution operations
Empirical Study
  Examples
4
Evolving Network
Network changes with time
Examples:
  Social network: add/remove friends or followers
  Co-authorship/citation network: new collaborations/citations added every year
  /Calling Graph: every edge has a time stamp
5
An illustration of an evolving co-authorship network
[Figure: evolving co-authorship network. Taken from: source not preserved]
6
Social Streams: Twitter, Facebook, etc
7
Social Event Evolution Tracking
8
Event Evolution Patterns
9
Model social stream as an evolving network
[Diagram: Evolving Network, Social Events]
11
Traditional Evolving Network Mining Approaches
Divide and conquer:
  Decompose a dynamic network into a series of snapshots, one for each moment.
  Apply graph mining algorithms to each snapshot to find useful patterns.
  Match patterns between consecutive moments to generate a dynamic pattern sequence.
Example: finding evolving clusters.
12
Illustrating Divide-and-Conquer
[Figure: network snapshots at Moment 1, Moment 2, Moment 3, Moment 4 and Moment 5. Taken from: source not preserved]
13
Divide-and-Conquer: Clustering in evolving networks
Ct: a cluster found in the snapshot at time t; Ct+1: a cluster found in the snapshot at time t+1.
How do we define “Ct evolves to Ct+1”?
Heuristic: if the overlap between Ct and Ct+1 is above a given threshold, we say they are matched.
Formally, based on Jaccard similarity: |Ct ∩ Ct+1| / |Ct ∪ Ct+1| > K, where K is the given threshold.
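The matching heuristic on this slide can be sketched as follows. This is a minimal illustration, assuming clusters are plain sets of node ids; the function names `jaccard` and `match_clusters` and the threshold value 0.5 are hypothetical and not taken from the talk.

```python
def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b| of two node sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_clusters(clusters_t, clusters_t1, k):
    """Return (i, j, score) for every cluster pair whose overlap is above k."""
    matches = []
    for i, c_t in enumerate(clusters_t):
        for j, c_t1 in enumerate(clusters_t1):
            score = jaccard(c_t, c_t1)
            if score > k:
                matches.append((i, j, score))
    return matches


# Hypothetical clusters at two consecutive snapshots.
clusters_t = [{1, 2, 3, 4}, {7, 8}]
clusters_t1 = [{2, 3, 4, 5}, {9, 10}]
matches = match_clusters(clusters_t, clusters_t1, k=0.5)
```

Every pair of clusters across the two snapshots is compared, so the cost of this step grows with the product of the two cluster counts.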
14
Drawbacks of Divide-and-conquer
Quality:
  It is difficult to decide the threshold K
  The matching between two consecutive snapshots will lose accuracy
Performance:
  Need to cluster each snapshot from scratch
  Lots of redundant computation
15
New Proposal: Incremental Computation for dense subgraph mining
Basic idea:
  Initial step: for the very first snapshot, mine the graph pattern set S0 from scratch. This step is never applied again.
  Steady state: let t start at 1.
    Obtain the graph update ΔG by comparing the network at moment t with the network at moment t-1.
    Derive St from St-1 based on ΔG.
    Let t increase to t+1.
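The loop described on this slide can be sketched as below. As a stand-in for the real dense-subgraph patterns, the sketch uses a toy pattern set (nodes whose degree reaches `min_degree`); that choice, the edge-set representation and all function names are assumptions made for illustration only.

```python
from collections import defaultdict


def graph_delta(prev_edges, curr_edges):
    """Return (added, removed) edge sets between two consecutive moments."""
    return curr_edges - prev_edges, prev_edges - curr_edges


def mine_from_scratch(edges):
    """Initial step: compute node degrees for the first snapshot."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


def apply_delta(degree, added, removed):
    """Steady state: touch only the endpoints of changed edges."""
    for u, v in added:
        degree[u] += 1
        degree[v] += 1
    for u, v in removed:
        degree[u] -= 1
        degree[v] -= 1
    return degree


def dense_nodes(degree, min_degree):
    """Toy pattern set S_t: nodes whose degree reaches min_degree."""
    return {n for n, d in degree.items() if d >= min_degree}


# Hypothetical snapshots; each edge is stored once as a sorted tuple.
snapshots = [
    {(1, 2), (1, 3), (2, 3)},
    {(1, 2), (1, 3), (2, 3), (3, 4)},
    {(1, 3), (2, 3), (3, 4)},
]

degree = mine_from_scratch(snapshots[0])          # S_0, computed once
patterns = [dense_nodes(degree, min_degree=2)]
for t in range(1, len(snapshots)):
    added, removed = graph_delta(snapshots[t - 1], snapshots[t])
    degree = apply_delta(degree, added, removed)  # S_t from S_{t-1} and the delta
    patterns.append(dense_nodes(degree, min_degree=2))
```

After the first snapshot, the work per moment depends on the size of ΔG rather than on the size of the whole network.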
16
Divide-and-Conquer vs. Incremental Computation
[Figure with numbered steps]
Divide-and-conquer: 1, 2, 3, 4
Incremental computation:
  Initial step: 1
  Steady state: 5
Advantages:
  Avoid redundant computation
  Capture the evolution patterns more accurately
17
Incremental Computation Framework
Adjust the clusters at each moment as the network is updated
19
Post Network Construction
A social stream is a FIFO queue of posts
Post similarity (formula not preserved) is based on:
  Content similarity
  Time distance
Post network:
  Each post is a node
  An edge is created between two posts if their similarity is higher than a given threshold
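A minimal sketch of the construction above. The slide's similarity formula is not preserved, so the one used here (Jaccard overlap of entity sets, damped exponentially by time distance) is an assumption, as are the post fields, the decay constant `tau` and the threshold value.

```python
import math
from itertools import combinations


def post_similarity(p, q, tau=3600.0):
    """Assumed similarity: entity-set Jaccard damped by time distance."""
    union = p["entities"] | q["entities"]
    if not union:
        return 0.0
    content = len(p["entities"] & q["entities"]) / len(union)
    decay = math.exp(-abs(p["time"] - q["time"]) / tau)
    return content * decay


def build_post_network(posts, threshold):
    """Each post is a node; link two posts if similarity exceeds threshold."""
    edges = {}
    for p, q in combinations(posts, 2):
        s = post_similarity(p, q)
        if s > threshold:
            edges[(p["id"], q["id"])] = s
    return edges


# Hypothetical posts; "time" is in seconds.
posts = [
    {"id": "a", "entities": {"apple", "iphone"}, "time": 0},
    {"id": "b", "entities": {"apple", "iphone", "launch"}, "time": 600},
    {"id": "c", "entities": {"good", "morning"}, "time": 900},
]
edges = build_post_network(posts, threshold=0.3)
```

Posts "a" and "b" share entities and are close in time, so they are linked; post "c" shares nothing with either and stays isolated.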
20
Evolving Post Network
We can build a post network for your daily timeline in Facebook/Twitter/LinkedIn
As posts stream in, the post network evolves very quickly
Challenges of evolving post network mining:
  The quick surge of post streams (speed)
  A large number of posts are noise (quality)
  The huge number of posts (scalability)
21
Observing Time Window
Notations:
  Len: time window length
  Δt: time window shifting size at each moment
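The two notations can be made concrete with a small sliding-window sketch. It assumes posts arrive in timestamp order and that the window covers the interval (now - Len, now]; the class name `TimeWindow` and the sample values are hypothetical.

```python
from collections import deque


class TimeWindow:
    """Keep only the posts whose timestamp falls in (now - length, now]."""

    def __init__(self, length):
        self.length = length
        self.posts = deque()  # (timestamp, post_id), oldest on the left

    def advance(self, now, arrivals):
        """Shift the window end to `now`; return (added, expired) post ids."""
        added = []
        for ts, pid in arrivals:
            self.posts.append((ts, pid))
            added.append(pid)
        expired = []
        while self.posts and self.posts[0][0] <= now - self.length:
            expired.append(self.posts.popleft()[1])
        return added, expired


# Len = 10 and a shift of 5 time units per moment (illustrative values).
window = TimeWindow(length=10)
step1 = window.advance(5, [(1, "p1"), (4, "p2")])
step2 = window.advance(10, [(7, "p3")])
step3 = window.advance(15, [(12, "p4")])
```

Each call returns the posts entering and leaving the window, which are the node additions and removals applied to the post network at that moment.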
22
How to filter out noise?
Noise is ubiquitous in social streams
  “Good morning”, “thank you ^.^”, etc.
  About 40% of tweets make very little sense
23
How to filter out noise?
Classify posts into three types: core, border and noise
wt(p): the priority of post p at moment t
By analogy with a social network:
  Core: a person with lots of friends
  Border: not core, but a friend of a core
  Noise: not core, and not a friend of any core
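A minimal sketch of the three-way classification. The definition of the priority wt(p) is not preserved on the slide, so this sketch assumes it is the sum of a post's edge similarities and that a post is core when that sum reaches a threshold `delta`; both are illustrative assumptions.

```python
def classify_posts(adjacency, delta):
    """adjacency: {post: {neighbor: similarity}}; returns {post: type}."""
    # Assumed priority: total similarity to all neighbours.
    weight = {p: sum(nbrs.values()) for p, nbrs in adjacency.items()}
    core = {p for p, w in weight.items() if w >= delta}
    labels = {}
    for p, nbrs in adjacency.items():
        if p in core:
            labels[p] = "core"
        elif any(n in core for n in nbrs):
            labels[p] = "border"  # not core, but linked to a core post
        else:
            labels[p] = "noise"   # not core, and not linked to any core post
    return labels


# Hypothetical post network with similarity-weighted edges.
adjacency = {
    "a": {"b": 0.6, "c": 0.5},
    "b": {"a": 0.6, "c": 0.5},
    "c": {"a": 0.5, "b": 0.5, "d": 0.3},
    "d": {"c": 0.3},
    "e": {},
    "f": {"g": 0.2},
    "g": {"f": 0.2},
}
labels = classify_posts(adjacency, delta=1.0)
```

Posts "f" and "g" are linked to each other but neither is core, so both count as noise, like the isolated post "e".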
25
Skeletal graph of a post network
A graph consisting of all core posts
A brief summary of the original post network
Clusters can be derived from skeletal graphs
Our algorithm monitors changes in the skeletal graph
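A sketch of how a skeletal graph and its clusters might be derived. It assumes the set of core posts is already known and takes clusters to be the connected components of the skeletal graph; the slide only says clusters "can be derived" from it, so that reading is an assumption.

```python
def skeletal_graph(adjacency, core):
    """Keep only core posts and the edges between them."""
    return {p: {n for n in adjacency[p] if n in core} for p in core}


def connected_components(graph):
    """Group the nodes of an undirected graph into components via DFS."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, comp = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            comp.add(node)
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        components.append(comp)
    return components


# Hypothetical post network; "x" is a border post attached to core post "c".
adjacency = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b", "x"}, "x": {"c"},
    "d": {"e"}, "e": {"d"},
}
core = {"a", "b", "c", "d", "e"}
skeleton = skeletal_graph(adjacency, core)
clusters = connected_components(skeleton)
```

The border post "x" does not appear in the skeletal graph, which is why the skeleton is smaller than the post network it summarizes.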
26
Network Evolution Operations
Add a post
Remove a post
27
Cluster Evolution Operations
We define 6 cluster evolution patterns: appear, disappear, grow, decay, merge and split
28
Summary: Cluster Evolution
Add a post:
  A new cluster may appear
  An existing cluster may grow
  Multiple clusters may merge into a single one
Delete a post:
  An existing cluster may disappear
  An existing cluster may decay
  An existing cluster may split into multiple clusters
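To make the six patterns concrete, the sketch below names the change for each cluster by comparing the cluster sets before and after an update. It is a plain set comparison written for illustration, not the incremental tracking procedure of the talk; the function name and the sample clusters are hypothetical.

```python
def evolution_events(old_clusters, new_clusters):
    """Compare two lists of post-id sets and name the change for each cluster."""
    events = []
    for new in new_clusters:
        parents = [old for old in old_clusters if old & new]
        if not parents:
            events.append(("appear", new))
        elif len(parents) > 1:
            events.append(("merge", new))
    for old in old_clusters:
        children = [new for new in new_clusters if old & new]
        if not children:
            events.append(("disappear", old))
        elif len(children) > 1:
            events.append(("split", old))
        elif len([o for o in old_clusters if o & children[0]]) == 1:
            # One-to-one correspondence: compare sizes.
            if len(children[0]) > len(old):
                events.append(("grow", children[0]))
            elif len(children[0]) < len(old):
                events.append(("decay", children[0]))
    return events


# Hypothetical clusters before and after an update.
old = [{1, 2, 3}, {4, 5}, {6, 7}, {8, 9, 10}, {11, 12}, {20, 21, 22, 23}]
new = [{1, 2, 3, 13}, {4, 5, 6, 7}, {8, 9}, {14, 15}, {20, 21}, {22, 23}]
events = evolution_events(old, new)
kinds = [kind for kind, _ in events]
```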
29
Network Evolution to Cluster Evolution
Cluster evolution of adding a post
30
Network Evolution to Cluster Evolution
Cluster evolution of deleting a post
31
Bulk Updating
Existing incremental computation on dynamic graphs usually treats the addition/deletion of nodes or edges one by one
Since social posts arrive at high speed, post-by-post incremental updating leads to very poor performance
Bulk updating: update subgraph by subgraph
  a bulk = a post cluster
More details in Section VII of the paper
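The idea of updating subgraph by subgraph can be sketched with a union-find index that inserts a whole connected bulk and reports the resulting cluster change once, instead of once per post. This covers additions only and is not the procedure from Section VII of the paper; the class `ClusterIndex` and its methods are hypothetical.

```python
class ClusterIndex:
    """Union-find over posts; each root identifies one cluster."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)  # unseen posts start as singletons
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

    def add_bulk(self, posts, edges):
        """Insert a connected bulk of posts; report one evolution event."""
        known = set(self.parent)
        # Existing clusters that the bulk attaches to, recorded before merging.
        touched = {self.find(n) for edge in edges for n in edge if n in known}
        for p in posts:
            self.find(p)
        for u, v in edges:
            self.union(u, v)
        if not touched:
            return "appear"
        return "grow" if len(touched) == 1 else "merge"


index = ClusterIndex()
r1 = index.add_bulk({"a", "b"}, [("a", "b")])
r2 = index.add_bulk({"c", "d"}, [("c", "d")])
r3 = index.add_bulk({"e"}, [("e", "a")])
r4 = index.add_bulk({"f", "g"}, [("f", "g"), ("f", "b"), ("g", "c")])
```

The last bulk touches two existing clusters, so a single merge event is reported for the whole bulk rather than one event per inserted post.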
32
Proposed Algorithms
ICM: Incremental Cluster Maintenance
eTrack: Cluster Evolution Tracking
34
Twitter Technology domain data sets
Time span: 1 month
Tech-Lite:
  Collects all the timelines of users listed in the Technology category of “Who to follow”, and of the users they retweeted
  Streaming rate is about [number missing] tweets/day
Tech-Full:
  Collects all the timelines followed by users who are in the Technology category
  Streaming rate is about 7216 tweets/hour
35
Ground Truth
Major events from news articles:
  Crawl news from major technology websites
  By treating the news article titles as posts, we apply our approach to extract events
Peaks in Google Trends
36
Precision and recall
HashtagPeaks: use common hashtags to compute post similarity
UnigramPeaks: use common unigrams to compute post similarity
Louvain: use common entities to compute post similarity and apply the Louvain community detection algorithm
eTrack: use common entities to compute post similarity and apply our approach
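For reference, precision and recall against a ground-truth event list can be computed as in this sketch. It assumes detected events have already been mapped to ground-truth event labels; how that mapping is done is not stated on the slide, and the labels below are made up for illustration.

```python
def precision_recall(detected, ground_truth):
    """Precision = hits / detected; recall = hits / ground truth."""
    detected, ground_truth = set(detected), set(ground_truth)
    hits = detected & ground_truth
    precision = len(hits) / len(detected) if detected else 0.0
    recall = len(hits) / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Hypothetical event labels.
detected = {"e1", "e2", "e3", "e4"}
ground_truth = {"e1", "e2", "e5"}
precision, recall = precision_recall(detected, ground_truth)
```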
37
Top 10 social events detected by different methods
38
Running time (a) Adjusting time window length
(b) Adjusting step length
39
Cluster Evolution Examples
42
Conclusion
Theoretical side: We propose an incremental computation framework for cluster evolution tracking in highly dynamic networks
Application side: We propose an efficient tracking system for event evolution patterns in social streams
Q & A
43
Post Network Mining
A snapshot of the post network is constructed from the posts in the same time window
As social posts stream in, events (dense clusters) are identified
44
Relationships between post network, skeletal graph and clusters
The skeletal graph is a sketch of the post network
Clusters can be generated from the skeletal graph