DM-Group Meeting, Liangzhe Chen, Oct. 21, 2015


1 DM-Group Meeting, Liangzhe Chen, Oct. 21, 2015

2 Papers to be presented
- RSC: Mining and Modeling Temporal Activity in Social Media
  - KDD’15
  - A. F. Costa, Y. Yamaguchi, A. J. M. Traina, C. Traina Jr., C. Faloutsos
- Modeling the Dynamics of Composite Social Network
  - KDD’13
  - E. Zhong, W. Fan, Y. Zhu, Q. Yang
- A Complex Network Analysis of the United States Air Transportation
  - ASONAM’12
  - D. P. Cheung, M. H. Gunes

3 1st Paper
- RSC: Mining and Modeling Temporal Activity in Social Media
  - KDD’15
  - A. F. Costa, Y. Yamaguchi, A. J. M. Traina, C. Traina Jr., C. Faloutsos

4 Problems
- Q1: What patterns of temporal activity does human communication produce in social media?
  - Example activity: posting tweets on Twitter.
- Q2: Is it possible to model these patterns?
- Q3: Can we use these patterns to tell whether a user is a human or a bot, based only on the timing of their posts?

5 Formal Problems
- Q1 (Pattern-Finding): Given time-stamp data from different social media services, analyze the IAT distribution and find patterns that are common to all services.
- Q2 (Time-Stamp Generation): Design a model that generates synthetic time-stamps whose IAT distribution fits the real data and matches all the patterns found in Q1.
- Q3 (Bot-Detection): Given time-stamp data from a set of users {U_1, U_2, U_3, ...}, where each user U_i has a sequence of posting time-stamps T_i = (t_1, t_2, t_3, ...) and the corresponding sequence of posting IATs ∆_i, decide whether user U_i is a human or a bot.
Inter-Arrival Time (IAT): the time difference between consecutive activities.
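As a minimal illustration of the IAT definition, the sketch below computes the inter-arrival times from a list of posting time-stamps given in seconds. The function name and the sample values are hypothetical and not taken from the paper.

```python
def inter_arrival_times(timestamps):
    """Return the differences between consecutive time-stamps (the IAT sequence)."""
    ordered = sorted(timestamps)
    # Pair each time-stamp with its successor and take the difference.
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


# Hypothetical posting times, in seconds since an arbitrary reference point.
posts = [0, 90, 200, 10200, 10290]
print(inter_arrival_times(posts))
```

A user with n postings yields n - 1 IAT values, so a single posting gives an empty sequence.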

6 Datasets Studied
- Twitter
  - 3,000 most recent tweets from 9,000 verified users.
  - Remove users with fewer than 800 tweets (6,790 users left).
  - Add data from 64 bot users.
- Reddit
  - 1,000 most recent comments from 200,000 users.
  - Remove users with fewer than 800 comments (21,198 users left).
  - Add data from 32 bot users.

7 Q1: Pattern Finding
- Positive correlation: the IAT ∆_i between two postings depends on the previous IAT ∆_{i−1}.
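One simple way to check for this kind of dependence is to correlate each IAT with the one before it. The sketch below uses the Pearson correlation of log-IAT pairs; the choice of Pearson correlation and of the log scale is an assumption made here for illustration, not necessarily the analysis used in the paper.

```python
import math


def lag1_correlation(iats):
    """Pearson correlation between log(IAT_{i-1}) and log(IAT_i)."""
    logs = [math.log(d) for d in iats if d > 0]
    x, y = logs[:-1], logs[1:]  # previous IAT vs. current IAT
    n = len(x)
    if n < 2:
        raise ValueError("need at least three positive IAT values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical bursty sequence: short gaps follow short gaps, long follow long.
bursty = [10, 12, 11, 5000, 6000, 5500, 9, 10, 12, 7000, 6500]
print(lag1_correlation(bursty) > 0)
```

A value near zero would suggest that consecutive gaps are roughly independent, while a clearly positive value is consistent with the pattern on this slide.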

8 Q1: Pattern Finding
- Periodic spikes: the IAT distribution has spikes at every 24 hours.

9 Q1: Pattern Finding
- Bimodal distribution: the IAT distribution has two “humps”, the first near 100 s and the second near 10,000 s.

10 Q1: Pattern Finding
- Heavy-tailed distribution

11 Q2: Rest-Sleep-and-Comment
- The RSC algorithm has 3 states:
  - Active: generate a posting event with probability p_post, or a null event with probability 1 - p_post, at every time interval δ_i^A.
  - Rest: generate null events at every time interval δ_i^R.
  - Sleep: generate a single null event at the next wake-up time t_wake.
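The three states can be sketched as a small simulation loop. This is a simplified illustration, not the RSC model itself: the slide does not give the transition rules between states or how the intervals δ are drawn, so the transition probabilities, the exponential intervals, the fixed wake-up hour, and all parameter values below are placeholder assumptions.

```python
import random


def simulate_rsc_like(n_posts, p_post=0.6, p_rest=0.3, p_sleep=0.05,
                      mean_active=60.0, mean_rest=3600.0,
                      wake_hour=8, seed=0):
    """Generate n_posts posting time-stamps (seconds) from a 3-state process."""
    rng = random.Random(seed)
    day = 24 * 3600
    t, state, posts = 0.0, "active", []
    while len(posts) < n_posts:
        if state == "active":
            # Assumed: active intervals are exponential with mean mean_active.
            t += rng.expovariate(1.0 / mean_active)
            if rng.random() < p_post:
                posts.append(t)  # posting event
            # otherwise this step is a null event
        elif state == "rest":
            # Assumed: rest intervals are exponential with mean mean_rest.
            t += rng.expovariate(1.0 / mean_rest)  # null event
        else:
            # Sleep: a single null event at the next wake-up time.
            next_wake = (t // day) * day + wake_hour * 3600
            if next_wake <= t:
                next_wake += day
            t = next_wake
        # Hypothetical transition rule (not from the slides).
        r = rng.random()
        if state == "sleep":
            state = "active"
        elif r < p_sleep:
            state = "sleep"
        elif r < p_sleep + p_rest:
            state = "rest"
        else:
            state = "active"
    return posts


stamps = simulate_rsc_like(5)
print(len(stamps))
```

Even this rough version produces short gaps while the process stays active, longer gaps after rest periods, and gaps tied to the daily cycle after sleep.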

12 Q2: Rest-Sleep-and-Comment

13 Q2: Parameter Estimation
- Compute the log-binned histogram of IAT for the real and the synthetic data.
- Minimize the squared distance between the synthetic and real bin counts.
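The objective can be sketched as follows: bin both IAT samples on a logarithmic scale, then sum the squared differences of the bin counts. The bin range, the number of bins, and the absence of any normalization are assumptions made for this sketch; the exact formula from the slide is not preserved in the transcript.

```python
import math


def log_binned_histogram(iats, n_bins=20, lo=1.0, hi=1e6):
    """Count IAT values in bins whose edges are evenly spaced on a log scale."""
    counts = [0] * n_bins
    width = (math.log(hi) - math.log(lo)) / n_bins
    for d in iats:
        if d < lo or d >= hi:
            continue  # ignore values outside the binned range
        idx = int((math.log(d) - math.log(lo)) / width)
        counts[min(idx, n_bins - 1)] += 1
    return counts


def squared_distance(real_counts, synth_counts):
    """Sum of squared differences between two histograms of equal length."""
    return sum((r - s) ** 2 for r, s in zip(real_counts, synth_counts))


# Hypothetical real and synthetic IAT samples, in seconds.
real = log_binned_histogram([2, 20, 200, 250], n_bins=3, lo=1, hi=1000)
synth = log_binned_histogram([3, 30, 40, 300], n_bins=3, lo=1, hi=1000)
print(real, synth, squared_distance(real, synth))
```

A parameter search would regenerate the synthetic sample for each candidate parameter setting and keep the setting with the smallest distance.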

14 Q2: RSC at work

15

16

17 Q3: Bot Detection
- Generate the log-binned histogram counts from both the estimated RSC model and the target user.
- Compute the dissimilarity between the user and the RSC model.
- Train a Naïve Bayes classifier to get the probability that the user is a bot.
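The last step can be illustrated with a one-feature Gaussian naive Bayes classifier that takes a user's dissimilarity score as input. This is a sketch under stated assumptions: the dissimilarity formula is not preserved in the transcript, so the scores below are made-up numbers, and the Gaussian likelihood is an assumption about the classifier variant.

```python
import math


def fit_gaussian_nb(values, labels):
    """Fit a prior, mean, and variance per class for a single numeric feature."""
    model = {}
    for cls in set(labels):
        xs = [v for v, lab in zip(values, labels) if lab == cls]
        mean = sum(xs) / len(xs)
        # Small constant keeps the variance strictly positive.
        var = sum((x - mean) ** 2 for x in xs) / len(xs) + 1e-9
        model[cls] = (len(xs) / len(values), mean, var)
    return model


def bot_probability(model, value):
    """Posterior probability of the class labeled 'bot' for one score."""
    scores = {}
    for cls, (prior, mean, var) in model.items():
        log_lik = (-0.5 * math.log(2 * math.pi * var)
                   - (value - mean) ** 2 / (2 * var))
        scores[cls] = math.log(prior) + log_lik
    top = max(scores.values())  # subtract the maximum for numerical stability
    weights = {cls: math.exp(s - top) for cls, s in scores.items()}
    return weights.get("bot", 0.0) / sum(weights.values())


# Hypothetical dissimilarity scores for labeled training users.
train_scores = [0.10, 0.20, 0.15, 0.12, 0.80, 0.90, 1.10, 0.95]
train_labels = ["human"] * 4 + ["bot"] * 4
model = fit_gaussian_nb(train_scores, train_labels)
print(bot_probability(model, 1.0) > 0.5)
```

The intuition is that users whose IAT histogram is far from the fitted RSC model receive a high bot probability.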

18 Q3: Bot Detection

19 2nd Paper
- Modeling the Dynamics of Composite Social Network
  - KDD’13
  - E. Zhong, W. Fan, Y. Zhu, Q. Yang

20 Introduction
- Users engage in multiple networks, which form a ‘composite social network’ when common users are treated as bridges.
- Users’ interactions in one network can influence their behavior in another.
  - Two users without common neighbors may follow each other on Twitter because they know each other on Facebook.
  - A user may interact less with her friends on Facebook after they graduate (while creating more links on LinkedIn).

21 Problem Definition
- Given the network sequence {G^t}_{t=1}^T, where G^t = {G_i^t = (U_i, E_i^t)}_{i=1}^l, construct the composite network at time T+1.

22 ITCom model
- Infinite Time-Evolving Composite Network Model
- Integrates infinite communities, knowledge transfer, and dynamic modeling into the MMSB model

23 MMSB
- Mixed Membership Stochastic Blockmodel
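For readers unfamiliar with MMSB, the sketch below samples a directed graph from the standard generative process: each node has a membership distribution over communities, each ordered pair draws a sender role and a receiver role, and the edge is a Bernoulli draw from the block matrix entry for those roles. The function name and the toy parameters are hypothetical, and this is the plain MMSB, not the ITCom extension.

```python
import random


def sample_mmsb(memberships, block_matrix, seed=0):
    """Sample directed edges from a mixed membership stochastic blockmodel."""
    rng = random.Random(seed)
    n = len(memberships)
    k = len(block_matrix)
    edges = []
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Each node picks a community for this particular interaction.
            z_send = rng.choices(range(k), weights=memberships[i])[0]
            z_recv = rng.choices(range(k), weights=memberships[j])[0]
            # The edge exists with the compatibility of the two communities.
            if rng.random() < block_matrix[z_send][z_recv]:
                edges.append((i, j))
    return edges


# Toy example: nodes 0 and 1 belong to community 0, node 2 to community 1.
memberships = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
block_matrix = [[1.0, 0.0], [0.0, 1.0]]
print(sorted(sample_mmsb(memberships, block_matrix)))
```

With these degenerate toy parameters the outcome is deterministic: only nodes 0 and 1 link to each other.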

24 Infinite Modeling
- Communities in networks can come and go, so it is hard to fix the number of communities in advance.
- Assume an infinite number of communities, and use a stick-breaking process to generate the probability of each community.
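A truncated stick-breaking construction can be sketched in a few lines. The version below breaks off a Beta(1, alpha) fraction of the remaining stick at each step, which is the common textbook form; whether ITCom uses exactly this parameterization is not stated on the slide, and the value of alpha and the truncation level are arbitrary.

```python
import random


def stick_breaking(alpha, n_sticks, seed=0):
    """Return the first n_sticks community weights and the leftover stick mass."""
    rng = random.Random(seed)
    remaining = 1.0
    weights = []
    for _ in range(n_sticks):
        frac = rng.betavariate(1.0, alpha)  # fraction broken off the remainder
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    return weights, remaining


weights, leftover = stick_breaking(alpha=2.0, n_sticks=10)
print(round(sum(weights) + leftover, 6))
```

The weights plus the leftover always sum to one, and the leftover shrinks as more sticks are broken, which is why a finite truncation can approximate the infinite set of communities.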

25 Knowledge Transfer across Networks
- Each user has a latent interest vector x_i (1 by D).
- Each network has a mapping w_d (D by K_d) from latent features to network-dependent communities.
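The dimensions can be made concrete with a small example: multiplying the 1-by-D interest vector by the D-by-K_d mapping gives one score per community in network d. The softmax used below to turn scores into a membership distribution is an assumption for illustration only; the slide does not say how the scores are normalized, and all numbers are made up.

```python
import math


def community_scores(x_i, w_d):
    """Multiply a 1-by-D interest vector by a D-by-K_d mapping matrix."""
    k_d = len(w_d[0])
    return [sum(x_i[f] * w_d[f][c] for f in range(len(x_i)))
            for c in range(k_d)]


def softmax(scores):
    """Normalize scores into a probability distribution (assumed step)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical user with D = 2 latent interests, network with K_d = 3 communities.
x_i = [1.0, 2.0]
w_d = [[1.0, 0.0, 1.0],
       [0.0, 1.0, 1.0]]
scores = community_scores(x_i, w_d)
print(scores, softmax(scores))
```

Because x_i is shared across networks while w_d is network-specific, the same user can have different community memberships in each network.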

26 Dynamic Modeling
- The community compatibility matrix at time t, B_t, evolves from B_{t-1} using a Beta distribution.
- The latent interests x_i and the mapping from interests to communities w_d evolve from their previous values using a Gaussian distribution.
- Finally, down-weight the probability of successful interaction.

27 Summary

28 Experiments
- Two tasks
  - Link prediction: predict who will interact with whom at a given time stamp.
  - Macro-evolution: predict changes in the networks’ statistics, e.g. clustering coefficients and degree distributions.

29 Datasets
- Relational networks, where user pairs are distinct (Tencent, Epinion)
- Interaction networks, where users can interact with each other several times (the other six datasets)

30 Link prediction
- Estimate the ITCom parameters and generate the probabilities of interactions among users.
- Measure the performance with Mean Average Precision.
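Mean Average Precision can be computed as sketched below: for each user, rank the candidate partners by predicted probability, average the precision at each rank where a true interaction appears, then average over users. Dividing by the number of true interactions is one common convention and is assumed here; the user and item names are hypothetical.

```python
def average_precision(ranked, relevant):
    """Average of precision@k over the ranks k at which a relevant item appears."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(rankings, relevants):
    """Mean of the per-user average precision values."""
    aps = [average_precision(r, rel) for r, rel in zip(rankings, relevants)]
    return sum(aps) / len(aps)


# Hypothetical predictions for two users: ranked candidates and true partners.
rankings = [["b", "a", "c"], ["a"]]
relevants = [{"a", "c"}, {"a"}]
print(mean_average_precision(rankings, relevants))
```

A perfect ranking places every true interaction ahead of every non-interaction and scores 1.0.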

31 Network Evolution
- Predict the degree distribution and the clustering coefficient.
- Compare with the Microscopic Evolution model.
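The two statistics named here can be computed from an edge list as sketched below, treating the graph as undirected and using the average local clustering coefficient. Both choices are assumptions for illustration, since the slide does not say which variants the paper evaluates.

```python
from collections import Counter
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map, ignoring self-loops."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_distribution(adj):
    """Map each degree value to the number of nodes with that degree."""
    return Counter(len(nbrs) for nbrs in adj.values())


def average_clustering(adj):
    """Average over nodes of the fraction of neighbor pairs that are linked."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbors contribute zero
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj) if adj else 0.0


# Toy graph: a triangle (1, 2, 3) with one pendant node attached to node 3.
adj = build_adjacency([(1, 2), (2, 3), (1, 3), (3, 4)])
print(degree_distribution(adj), average_clustering(adj))
```

Comparing these statistics between a predicted and an observed network gives a coarse check of whether a model reproduces the macro-level evolution.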

32 3rd Paper
- A Complex Network Analysis of the United States Air Transportation
  - ASONAM’12
  - D. P. Cheung, M. H. Gunes

33 Purposes
- Analyze the air transportation network to better understand its characteristics.
- Analyze its changes over the past two decades.

34 Dataset
- Generate networks from the public data on the Bureau of Transportation Statistics’ TranStats website.
- Each node is an airport.
- A direct edge represents an available route.
- The edge weight represents the number of passengers, freight, and mail transported between airports.
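A weighted network of this kind can be assembled from route records as sketched below. The record layout (origin, destination, amount), the airport codes, and the numbers are hypothetical and do not reflect the actual TranStats schema; treating each route as an ordered pair and summing a single amount per record are also assumptions, since the slide does not say how passengers, freight, and mail are combined.

```python
from collections import defaultdict


def build_route_network(records):
    """Aggregate (origin, destination, amount) records into edge weights."""
    weights = defaultdict(float)
    for origin, dest, amount in records:
        weights[(origin, dest)] += amount
    return dict(weights)


def out_strength(network):
    """Total weight leaving each airport."""
    totals = defaultdict(float)
    for (origin, _dest), weight in network.items():
        totals[origin] += weight
    return dict(totals)


# Hypothetical records with made-up airport codes and amounts.
records = [
    ("AAA", "BBB", 100),
    ("AAA", "BBB", 50),
    ("BBB", "AAA", 70),
    ("AAA", "CCC", 30),
]
network = build_route_network(records)
print(network[("AAA", "BBB")], out_strength(network)["AAA"])
```

Building one such network per year would allow the kind of comparison over time that the paper sets out to make.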

35 Results

