Pei Lee, ICDE 2014, Chicago, IL, USA


1 Pei Lee, ICDE 2014, Chicago, IL, USA
Incremental Cluster Evolution Tracking from Highly Dynamic Network Data
Pei Lee, Laks V.S. Lakshmanan, Computer Science Department, University of British Columbia, Vancouver, BC, Canada
Evangelos E. Milios, Computer Science Department, Dalhousie University, Halifax, NS, Canada
The problem, challenges, theory, experiments, conclusion (related work, theory proofs)
2019/4/15

2 Outline
Motivation
  Evolving network meets social event
Incremental Computation Framework
  Divide-and-conquer vs. incremental computation
Post Network Construction
  Combat noise
Network and Cluster Evolution
  Evolution operations
Empirical Study
  Examples

4 Evolving Network
Network changes with time
Examples:
  Social network: add/remove friends or followers
  Co-authorship/citation network: new collaborations/citations added every year
  /Calling Graph: every edge has a time stamp

5 An illustration of evolving co-authorship network
Figure taken from an external source (attribution missing).

6 Social Streams: Twitter, Facebook, etc

7 Social Event Evolution Tracking

8 Event Evolution Patterns

9 Evolving Network, Social Events
Model a social stream as an evolving network

11 Traditional Evolving Network Mining Approaches
Divide and conquer:
  Decompose a dynamic network into a series of snapshots, one for each moment
  Apply graph mining algorithms on each snapshot to find useful patterns
  Match patterns between consecutive moments to generate a dynamic pattern sequence
Example: finding evolving clusters

12 Illustrating Divide-and-Conquer
Figure: network snapshots at Moment 1 through Moment 5 (source attribution missing).

13 Divide-and-Conquer: Clustering in evolving networks
Ct: a cluster found in the snapshot at time t; Ct+1: a cluster found in the snapshot at time t+1
How do we define “Ct evolves to Ct+1”?
Heuristic: if the overlap between Ct and Ct+1 is above a given threshold, we say they are matched
Formally, the overlap is measured by Jaccard similarity, |Ct ∩ Ct+1| / |Ct ∪ Ct+1|
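The matching heuristic can be sketched in a few lines of Python. This is only an illustration of snapshot-to-snapshot matching under stated assumptions: clusters are plain sets of node ids, and the names `jaccard`, `match_clusters` and the threshold parameter `k` are hypothetical, not taken from the talk.

```python
def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b| of two node sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_clusters(clusters_t, clusters_t1, k):
    """Return (i, j, score) for every pair whose Jaccard similarity is above k."""
    matches = []
    for i, c_t in enumerate(clusters_t):
        for j, c_t1 in enumerate(clusters_t1):
            score = jaccard(c_t, c_t1)
            if score > k:
                matches.append((i, j, score))
    return matches


# Two toy snapshots: the first cluster gains a node, the second mostly changes.
clusters_t = [{"a", "b", "c"}, {"x", "y"}]
clusters_t1 = [{"a", "b", "c", "d"}, {"y", "z"}]
matches = match_clusters(clusters_t, clusters_t1, k=0.5)
```

With k = 0.5 only the first pair is matched; lowering k to 0.3 also matches the second pair, which shows how sensitive the result is to the threshold.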

14 Drawbacks of Divide-and-conquer
Quality:
  It is difficult to decide the threshold K
  The matching between two consecutive snapshots will lose accuracy
Performance:
  Need to cluster each snapshot from scratch
  Lots of redundant computation

15 New Proposal: Incremental Computation for dense subgraph mining
Basic idea:
  For the very first snapshot, mine the graph pattern set S0 from scratch. This step is never applied again.
  In the steady state, let t start at 1:
    Obtain the graph update ΔG by comparing the network at moment t with the network at moment t-1
    Derive St from St-1 based on ΔG
    Increase t to t+1
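The loop above can be sketched as follows. To keep the example small, a per-node degree table stands in for the pattern set S; this is an assumption made purely for illustration, since the talk maintains dense clusters. The function names and the edge-set representation of snapshots are likewise hypothetical.

```python
from collections import Counter


def mine_from_scratch(edges):
    """Stand-in for the initial mining step: count each node's degree."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


def graph_delta(prev_edges, curr_edges):
    """The update ΔG as (added edges, removed edges) between two snapshots."""
    return curr_edges - prev_edges, prev_edges - curr_edges


def apply_delta(state, delta):
    """Derive S_t from S_{t-1} by touching only the changed edges."""
    added, removed = delta
    new_state = Counter(state)
    for u, v in added:
        new_state[u] += 1
        new_state[v] += 1
    for u, v in removed:
        new_state[u] -= 1
        new_state[v] -= 1
    # Drop nodes whose degree fell to zero.
    return Counter({n: d for n, d in new_state.items() if d > 0})


snapshots = [
    {("a", "b"), ("b", "c")},
    {("a", "b"), ("b", "c"), ("c", "d")},
    {("b", "c"), ("c", "d")},
]
state = mine_from_scratch(snapshots[0])        # S_0, computed once
for t in range(1, len(snapshots)):             # steady state
    delta = graph_delta(snapshots[t - 1], snapshots[t])
    state = apply_delta(state, delta)          # S_t from S_{t-1} and ΔG
```

The incrementally maintained state ends up equal to what mining the last snapshot from scratch would give, while each step only processes the edges in ΔG.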

16 Divide-and-Conquer vs. Incremental Computation
Divide-and-conquer: steps 1, 2, 3, 4
Incremental computation: initial step 1; steady state 5
Advantages:
  Avoid redundant computation
  More accurately capture the evolution patterns

17 Incremental Computation Framework
Adjust the clusters at each moment as the network is updated

19 Post Network Construction
A social stream is a FIFO queue of posts
Post similarity: combines content similarity and time distance
Post network:
  Each post is a node
  An edge is constructed if the similarity of its end nodes is higher than a given threshold
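A minimal sketch of the construction, under loud assumptions: the talk's similarity formula is not preserved in this transcript, so `post_similarity` below is a hypothetical stand-in that multiplies the Jaccard overlap of two posts' entity sets by an exponential fade over their time distance. The dictionary layout of a post and the `time_scale` parameter are also assumptions.

```python
import math
from itertools import combinations


def post_similarity(p, q, time_scale=3600.0):
    """Hypothetical similarity: entity overlap, faded by time distance."""
    a, b = p["entities"], q["entities"]
    union = a | b
    content = len(a & b) / len(union) if union else 0.0
    fade = math.exp(-abs(p["time"] - q["time"]) / time_scale)
    return content * fade


def build_post_network(posts, threshold):
    """Each post is a node; link two posts whose similarity exceeds the threshold."""
    edges = {}
    for p, q in combinations(posts, 2):
        s = post_similarity(p, q)
        if s > threshold:
            edges[(p["id"], q["id"])] = s
    return edges


posts = [
    {"id": 1, "time": 0, "entities": {"apple", "iphone"}},
    {"id": 2, "time": 600, "entities": {"apple", "iphone", "launch"}},
    {"id": 3, "time": 900, "entities": {"good", "morning"}},
]
edges = build_post_network(posts, threshold=0.3)
```

Posts 1 and 2 share entities and are close in time, so they are linked; post 3 shares nothing with the others and stays isolated.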

20 Evolving Post Network
We can build a post network for your daily timeline in Facebook/Twitter/LinkedIn
As posts stream in, the post network evolves very quickly
Challenges of evolving post network mining:
  The quick surge of post streams (speed)
  A large number of posts are noise (quality)
  The huge number of posts (scalability)

21 Observing Time Window
Notations:
  Len: time window length
  Δt: time window shifting size at each moment
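One way to picture the window is the sketch below. It assumes the window at moment t covers timestamps in (t - Len, t] and that posts arrive in timestamp order; the class name `TimeWindow` and its methods are hypothetical.

```python
from collections import deque


class TimeWindow:
    """Posts with timestamps in (t - length, t], advanced by `step` each moment."""

    def __init__(self, length, step, start=0):
        self.length = length      # Len
        self.step = step          # Δt
        self.t = start            # current moment
        self.posts = deque()      # (timestamp, post_id), oldest first

    def advance(self, stream):
        """Shift the window by one step; return (entered, expired) post ids."""
        self.t += self.step
        entered, expired = [], []
        # Pull newly arrived posts from the front of the FIFO stream.
        while stream and stream[0][0] <= self.t:
            item = stream.popleft()
            self.posts.append(item)
            entered.append(item[1])
        # Evict posts that have fallen out of the window.
        while self.posts and self.posts[0][0] <= self.t - self.length:
            expired.append(self.posts.popleft()[1])
        return entered, expired


stream = deque([(1, "p1"), (2, "p2"), (4, "p3"), (7, "p4")])
window = TimeWindow(length=4, step=2)
first = window.advance(stream)    # window (-2, 2]
second = window.advance(stream)   # window (0, 4]
third = window.advance(stream)    # window (2, 6]
```

Each call to `advance` yields exactly the posts to add to and remove from the post network at that moment.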

22 How to filter out noise?
Noise is ubiquitous in social streams
  “Good morning”, “thank you ^.^”, etc.
  About 40% of tweets make very little sense

23 How to filter out noise?
Distinguish posts into three types (core, border, noise), based on wt(p), the priority of post p at moment t
By analogy with a social network:
  Core: a person with lots of friends
  Border: not core, but a friend of a core
  Noise: not core, and not a friend of a core
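The three-way split can be sketched as below. The exact definition of the priority wt(p) is not preserved in this transcript, so the sketch assumes it is the sum of a post's edge similarities; the threshold name `core_threshold` and the function name are hypothetical.

```python
from collections import defaultdict


def classify_posts(nodes, edges, core_threshold):
    """Label each post 'core', 'border' or 'noise' from a weighted edge map."""
    neighbors = defaultdict(set)
    priority = defaultdict(float)
    for (u, v), sim in edges.items():
        neighbors[u].add(v)
        neighbors[v].add(u)
        # Assumed priority: sum of similarities to neighbouring posts.
        priority[u] += sim
        priority[v] += sim
    core = {n for n in nodes if priority[n] >= core_threshold}
    labels = {}
    for n in nodes:
        if n in core:
            labels[n] = "core"
        elif neighbors[n] & core:
            labels[n] = "border"   # not core, but linked to a core post
        else:
            labels[n] = "noise"    # not core and not linked to any core post
    return labels


nodes = [1, 2, 3, 4, 5]
edges = {(1, 2): 0.8, (1, 3): 0.7, (2, 3): 0.6, (3, 4): 0.4}
labels = classify_posts(nodes, edges, core_threshold=1.2)
```

Posts 1 to 3 are strongly connected and become core, post 4 hangs off a core post and becomes border, and the isolated post 5 is noise.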

25 Skeletal graph of a post network
A graph consisting of all core posts
A brief summary of the original post network
Clusters can be derived from skeletal graphs
Our algorithm monitors changes in the skeletal graph
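As a sketch of how clusters might be derived, the code below keeps only edges between core posts and then takes connected components. Treating clusters as connected components of the skeletal graph is an assumption made for illustration (border posts could be attached afterwards); the function names are hypothetical.

```python
def skeletal_graph(edges, core):
    """Keep only the edges whose two endpoints are both core posts."""
    return [(u, v) for u, v in edges if u in core and v in core]


def connected_components(nodes, edges):
    """Group nodes into connected components with a small union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]   # path halving
            n = parent[n]
        return n

    for u, v in edges:
        parent[find(u)] = find(v)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(groups.values(), key=min)


core = {1, 2, 3, 6, 7}
edges = [(1, 2), (2, 3), (3, 4), (6, 7), (5, 6)]
skeleton = skeletal_graph(edges, core)
clusters = connected_components(core, skeleton)
```

Posts 4 and 5 are not core, so their edges drop out of the skeleton and two clusters of core posts remain.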

26 Network Evolution Operations
Add a post Remove a post

27 Cluster Evolution Operations
We define 6 cluster evolution patterns: appear, disappear, grow, decay, merge and split

28 Summary: Cluster Evolution
Add a post:
  A new cluster may appear
  An existing cluster may grow
  Multiple clusters may merge into a single one
Delete a post:
  An existing cluster may disappear
  An existing cluster may decay
  An existing cluster may split into multiple clusters
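The six operations can be illustrated by labelling what happened between the cluster sets before and after an update. This is a sketch for illustration only, not the incremental tracking algorithm from the talk: it compares two cluster lists directly, and the function names and the (operation, old indices, new indices) output format are hypothetical.

```python
def overlapping(cluster, others):
    """Indices of the clusters in `others` sharing at least one post with `cluster`."""
    return [k for k, other in enumerate(others) if cluster & other]


def classify_evolution(old_clusters, new_clusters):
    """Return (operation, old indices, new indices) triples for one update."""
    events = []
    for i, old in enumerate(old_clusters):
        heirs = overlapping(old, new_clusters)
        if not heirs:
            events.append(("disappear", [i], []))
        elif len(heirs) > 1:
            events.append(("split", [i], heirs))
    for j, new in enumerate(new_clusters):
        sources = overlapping(new, old_clusters)
        if not sources:
            events.append(("appear", [], [j]))
        elif len(sources) > 1:
            events.append(("merge", sources, [j]))
        else:
            old = old_clusters[sources[0]]
            # Only a one-to-one correspondence counts as grow or decay.
            if len(overlapping(old, new_clusters)) == 1 and len(new) != len(old):
                op = "grow" if len(new) > len(old) else "decay"
                events.append((op, sources, [j]))
    return events


old = [{1, 2, 3}, {4, 5}, {6, 7, 8, 9}, {10, 11}]
new = [{1, 2, 3, 4, 5}, {6, 7}, {8, 9}, {12, 13}]
events = classify_evolution(old, new)
grow = classify_evolution([{1, 2}], [{1, 2, 3}])
decay = classify_evolution([{1, 2, 3}], [{1, 2}])
```

In the toy data, the first two old clusters merge, the third splits in two, the fourth disappears, and a new cluster appears.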

29 Network Evolution to Cluster Evolution
Cluster evolution of adding a post

30 Network Evolution to Cluster Evolution
Cluster evolution of deleting a post

31 Bulk Updating
Existing incremental computation on dynamic graphs usually treats the addition/deletion of nodes or edges one by one
Since social posts arrive at a high speed, post-by-post incremental updating leads to very poor performance
Bulk updating: update subgraph-by-subgraph, where a bulk = a post cluster
More details in Section VII of the paper

32 Proposed Algorithms ICM: Incremental Cluster Maintenance
eTrack: Cluster Evolution Tracking

34 Twitter Technology domain data sets
Time span: 1 month
Tech-Lite: all the timelines of users listed in the Technology category of “Who to follow”, plus those of the users they retweeted; streaming rate is about tweets/day
Tech-Full: all the timelines followed by users who are in the Technology category; streaming rate is about 7216 tweets/hour

35 Ground Truth
Major events from news articles:
  Crawl news from major technology websites
  By treating the news article titles as posts, we apply our approach to extract events
Peaks in Google Trends

36 Precision and recall
HashtagPeaks: use common hashtags to compute post similarity
UnigramPeaks: use common unigrams to compute post similarity
Louvain: use common entities to compute post similarity and apply the Louvain community detection algorithm
eTrack: use common entities to compute post similarity and apply our approach
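For reference, precision and recall over detected events can be computed as in the sketch below. It assumes each detected event has already been mapped to a shared label that can be compared with the ground truth; how the talk performs that mapping is not stated in this transcript, and the event labels are made up for the example.

```python
def precision_recall(detected, ground_truth):
    """Precision and recall of detected event labels against ground-truth labels."""
    detected, ground_truth = set(detected), set(ground_truth)
    hits = detected & ground_truth
    precision = len(hits) / len(detected) if detected else 0.0
    recall = len(hits) / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Hypothetical labels: two of four detections are real, one real event is missed.
detected = {"event_a", "event_b", "event_c", "event_d"}
ground_truth = {"event_a", "event_b", "event_e"}
precision, recall = precision_recall(detected, ground_truth)
```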

37 Top 10 social events detected by different methods

38 Running time
(a) Adjusting time window length
(b) Adjusting step length

39 Cluster Evolution Examples

42 Conclusion
Theoretical side: We propose an incremental computation framework for cluster evolution tracking in highly dynamic networks
Application side: We propose an efficient tracking system for event evolution patterns in social streams
Q & A

43 Post Network Mining
A snapshot of the post network is constructed from the posts in the same time window
As social posts stream in, events (dense clusters) are identified

44 Relationships between post network, skeletal graph and clusters
The skeletal graph is a sketch of the post network
Clusters can be generated from the skeletal graph

