Published by Dortha Henry. Modified over 8 years ago.
1
DM-Group Meeting Liangzhe Chen, Oct. 21 2015
2
Papers to be presented
RSC: Mining and Modeling Temporal Activity in Social Media. KDD’15. A. F. Costa, Y. Yamaguchi, A. J. M. Traina, C. Traina Jr., C. Faloutsos
Modeling the Dynamics of Composite Social Network. KDD’13. E. Zhong, W. Fan, Y. Zhu, Q. Yang
A Complex Network Analysis of the United States Air Transportation. ASONAM’12. D. P. Cheung, M. H. Gunes
3
1st Paper
RSC: Mining and Modeling Temporal Activity in Social Media. KDD’15. A. F. Costa, Y. Yamaguchi, A. J. M. Traina, C. Traina Jr., C. Faloutsos
4
Problems
Q1: What patterns of temporal activity does human communication produce in social media (for example, tweet posting on Twitter)?
Q2: Is it possible to model these patterns?
Q3: Can we use these patterns to tell whether a user is a human or a bot based only on the timing of their posts?
5
Formal Problems
Inter-Arrival Time (IAT): the time difference between consecutive activities.
Q1 (Pattern-Finding): Given time-stamp data from different social media services, analyze the IAT distribution and find patterns that are common to all services.
Q2 (Time-Stamp Generation): Design a model that generates synthetic time-stamps whose IAT distribution fits the real data and matches all the patterns found in Q1.
Q3 (Bot-Detection): Given time-stamp data from a set of users $\{U_1, U_2, U_3, \dots\}$, where each user $U_i$ has a sequence of posting time-stamps $T_i = (t_1, t_2, t_3, \dots)$ and the corresponding sequence of posting IATs $\Delta_i$, decide whether user $U_i$ is a human or a bot.
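As a small illustration of the IAT definition on this slide, the sketch below turns a list of posting time-stamps into inter-arrival times. The function name `inter_arrival_times` and the sample time-stamps are made up for the example; they do not come from the paper.

```python
from typing import List, Sequence


def inter_arrival_times(timestamps: Sequence[float]) -> List[float]:
    """Return the differences between consecutive time-stamps (same units as the input)."""
    ordered = sorted(timestamps)  # tolerate unsorted input
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


# Hypothetical posting times, in seconds since an arbitrary reference point.
example_timestamps = [0, 60, 150, 86_550]
print(inter_arrival_times(example_timestamps))  # prints [60, 90, 86400]
```

A user with n postings yields n - 1 IATs, so a single posting produces an empty list.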
6
Datasets Studied
Twitter: the 3,000 most recent tweets from 9,000 verified users. Users with fewer than 800 tweets were removed (6,790 users left), and data from 64 bot users was added.
Reddit: the 1,000 most recent comments from 200,000 users. Users with fewer than 800 comments were removed (21,198 users left), and 32 bot users were added.
7
Q1: Pattern Finding
Positive correlation: the IAT $\Delta_i$ between two postings depends on the previous IAT $\Delta_{i-1}$.
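One simple way to probe the dependence between consecutive IATs is a lag-1 Pearson correlation on log-scaled IATs. This is only a sketch of such a check, not the analysis used in the paper; the function names and the choice of a base-10 logarithm are assumptions made for the example.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def lag1_log_correlation(iats: Sequence[float]) -> float:
    """Correlate log10(previous IAT) with log10(current IAT).

    Assumes every IAT is strictly positive and the sequence is not constant.
    """
    if any(d <= 0 for d in iats):
        raise ValueError("all IATs must be positive to take logarithms")
    logs = [math.log10(d) for d in iats]
    return pearson(logs[:-1], logs[1:])
```

A value clearly above zero on real data would be consistent with the positive correlation described on the slide; a value near zero would suggest independent IATs.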
8
Q1: Pattern Finding
Periodic spikes: the IAT distribution has spikes every 24 hours.
9
Q1: Pattern Finding
Bimodal distribution: the IAT distribution has two “humps”, the first near 100 s and the second near 10,000 s.
10
Q1: Pattern Finding
Heavy-tailed distribution
11
Q2: Rest-Sleep-and-Comment
The RSC algorithm has 3 states:
Active: generates posting events with probability $p_{post}$, or null events with probability $1 - p_{post}$, at every time interval $\delta_i^A$.
Rest: generates null events at every time interval $\delta_i^R$.
Sleep: generates a single null event at the next wake-up time $t_{wake}$.
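The three states can be sketched as a toy state machine. The slide gives neither the transition rule between states nor how the intervals are drawn, so everything below beyond the Active/Rest/Sleep behaviour is a hypothetical stand-in: constant intervals `delta_active` and `delta_rest`, the probabilities `p_switch` and `p_sleep`, a fixed `sleep_duration`, and waking up into the Active state. It is a rough illustration, not the RSC model from the paper.

```python
import random
from typing import List

ACTIVE, REST, SLEEP = "active", "rest", "sleep"


def simulate_rsc_like(
    n_steps: int,
    p_post: float = 0.3,                 # probability of a posting event while active
    delta_active: float = 60.0,          # assumed constant interval while active (s)
    delta_rest: float = 3600.0,          # assumed constant interval while resting (s)
    p_switch: float = 0.2,               # hypothetical active <-> rest toggle probability
    p_sleep: float = 0.05,               # hypothetical probability of falling asleep
    sleep_duration: float = 8 * 3600.0,  # hypothetical fixed time until wake-up (s)
    seed: int = 0,
) -> List[float]:
    """Return synthetic posting time-stamps from a toy three-state process."""
    rng = random.Random(seed)
    state, now = ACTIVE, 0.0
    posts: List[float] = []
    for _ in range(n_steps):
        if state == SLEEP:
            now += sleep_duration        # single null event at the wake-up time
            state = ACTIVE               # assumption: the user wakes up active
            continue
        if state == ACTIVE:
            now += delta_active
            if rng.random() < p_post:
                posts.append(now)        # posting event; otherwise a null event
        else:
            now += delta_rest            # rest state: null events only
        # Hypothetical transition rule, not taken from the slide.
        r = rng.random()
        if r < p_sleep:
            state = SLEEP
        elif r < p_sleep + p_switch:
            state = REST if state == ACTIVE else ACTIVE
    return posts
```

Only posting events are recorded; null events advance the clock without leaving a time-stamp, which is what lets long gaps appear between postings.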
12
Q2: Rest-Sleep-and-Comment
13
Q2: Parameter Estimation
Compute the log-binned histogram of IATs for the real and the synthetic data.
Minimize the squared distance between the synthetic and real bin counts.
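The objective described on this slide can be sketched as follows: bin the IATs on a logarithmic scale, then sum the squared differences between the real and synthetic bin counts. The bin range (1 s to 10^6 s), the number of bins, and the function names are assumptions for illustration; the slide does not specify them, and the optimizer that searches over model parameters is left out.

```python
import math
from typing import List, Sequence


def log_binned_counts(
    iats: Sequence[float],
    n_bins: int = 6,
    min_exp: float = 0.0,   # lower edge is 10**min_exp seconds (assumed)
    max_exp: float = 6.0,   # upper edge is 10**max_exp seconds (assumed)
) -> List[int]:
    """Count IATs in bins that are equally wide in log10 space."""
    counts = [0] * n_bins
    width = (max_exp - min_exp) / n_bins
    for d in iats:
        if d <= 0:
            continue                      # log of a non-positive value is undefined
        exponent = math.log10(d)
        if exponent < min_exp or exponent > max_exp:
            continue                      # outside the histogram range
        index = min(int((exponent - min_exp) / width), n_bins - 1)
        counts[index] += 1
    return counts


def squared_distance(real_counts: Sequence[float], synthetic_counts: Sequence[float]) -> float:
    """Sum of squared differences between two histograms with the same bins."""
    return sum((r - s) ** 2 for r, s in zip(real_counts, synthetic_counts))
```

When the real and synthetic samples differ in size, normalizing each histogram to relative frequencies before comparing is a common variation; whether the paper does so is not stated on the slide.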
14
Q2: RSC at work
17
Q3: Bot Detection
Generate the log-binned histogram counts for both the estimated RSC model and the target user.
Compute the dissimilarity between the user and the RSC model.
Train a Naïve Bayes classifier to obtain the probability that the user is a bot.
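The classification step can be illustrated with a one-feature Gaussian Naive Bayes, where the single feature is a user's dissimilarity to the RSC model. The dissimilarity formula is not reproduced in this transcript, so the sketch takes the dissimilarity values as given; the Gaussian likelihood, the function names, and the training numbers in the usage lines are all assumptions for illustration, not data or settings from the paper.

```python
import math
from statistics import mean, pvariance
from typing import Dict, Sequence


def fit_gaussian_nb(features_by_class: Dict[str, Sequence[float]]) -> Dict[str, Dict[str, float]]:
    """Estimate a prior, mean, and variance of the feature for each class."""
    total = sum(len(values) for values in features_by_class.values())
    model = {}
    for label, values in features_by_class.items():
        model[label] = {
            "prior": len(values) / total,
            "mean": mean(values),
            "var": max(pvariance(values), 1e-9),  # floor avoids division by zero
        }
    return model


def posterior(model: Dict[str, Dict[str, float]], x: float) -> Dict[str, float]:
    """Return P(class | x) for every class using Bayes' rule."""
    scores = {}
    for label, p in model.items():
        likelihood = math.exp(-((x - p["mean"]) ** 2) / (2 * p["var"]))
        likelihood /= math.sqrt(2 * math.pi * p["var"])
        scores[label] = p["prior"] * likelihood
    norm = sum(scores.values())
    return {label: score / norm for label, score in scores.items()}


# Hypothetical dissimilarity values for labeled users (made up for the example).
training = {"human": [0.10, 0.20, 0.15, 0.25], "bot": [0.80, 0.90, 1.00, 0.70]}
model = fit_gaussian_nb(training)
print(posterior(model, 0.85))
```

The intuition is that bots whose posting rhythm departs from the human-like RSC fit end up with a large dissimilarity, which the classifier turns into a high bot probability.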
18
Q3: Bot Detection
19
2nd Paper
Modeling the Dynamics of Composite Social Network. KDD’13. E. Zhong, W. Fan, Y. Zhu, Q. Yang
20
Introduction
Users engage in multiple networks and form a ‘composite social network’ in which common users act as bridges.
Users' interactions in one network can influence their behavior in another.
Two users without common neighbors may follow each other on Twitter because they know each other on Facebook.
A user may interact less with her friends on Facebook after they graduate (while creating more links on LinkedIn).
21
Problem Definition
Given the network sequence $\{G_t\}_{t=1}^{T}$, where $G_t = \{G_t^i = (U^i, E_t^i)\}_{i=1}^{l}$, construct the composite network at time $T+1$.
22
ITCom model
Infinite Time-Evolving Composite Network Model.
Integrates infinite communities, knowledge transfer, and dynamic modeling into the MMSB model.
23
MMSB Mixed Membership Stochastic Blockmodel
24
Infinite Modeling
Communities in networks can come and go, so it is hard to fix the number of communities.
Assume an infinite number of communities and use a stick-breaking process to generate the probability of each community.
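The standard stick-breaking construction can be sketched in a few lines: repeatedly break off a Beta-distributed fraction of whatever remains of a unit-length stick. The concentration parameter `alpha`, the Beta(1, alpha) choice, and the truncation at `n_sticks` are the textbook form and are assumptions here; the slide does not say how ITCom parameterizes the process.

```python
import random
from typing import List


def stick_breaking_weights(alpha: float, n_sticks: int, seed: int = 0) -> List[float]:
    """Return the first n_sticks community probabilities of a stick-breaking process."""
    rng = random.Random(seed)
    remaining = 1.0                       # length of stick not yet assigned
    weights: List[float] = []
    for _ in range(n_sticks):
        fraction = rng.betavariate(1.0, alpha)   # share of the remaining stick
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights


weights = stick_breaking_weights(alpha=2.0, n_sticks=10)
print(weights, sum(weights))
```

The truncated weights sum to slightly less than 1; the leftover mass belongs to the communities beyond the truncation point, which is how the process leaves room for new communities to appear.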
25
Knowledge Transfer across Networks
Each user has a latent interest vector $x_i$ ($1 \times D$).
Each network has a mapping $w_d$ ($D \times K_d$) from latent features to network-dependent communities.
26
Dynamic Modeling
The community compatibility matrix at time $t$, $B_t$, evolves from $B_{t-1}$ using a Beta distribution.
The latent interests $x_i$ and the mapping from interests to communities $w_d$ evolve from their previous values using a Gaussian distribution.
Finally, down-weight the probability of successful interaction.
27
Summary
28
Experiments
Two tasks:
Link prediction: predict who will interact with whom at a given time stamp.
Macro-evolution: predict changes in the networks’ statistics, e.g. clustering coefficients and degree distributions.
29
Datasets
Relational networks, where user pairs are distinct (Tencent, Epinion).
Interaction networks, where users can interact with each other several times (the other six datasets).
30
Link prediction
Estimate the ITCom parameters and generate the probabilities of interactions among users.
Measure the performance with Mean Average Precision.
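Mean Average Precision can be computed from a ranked candidate list per user, ordered by predicted interaction probability, and the set of users they actually interacted with. The sketch below uses the usual definition; the function names and the tiny example are invented for illustration and are not from the paper's experiments.

```python
from typing import Dict, Hashable, Sequence, Set


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Average of precision@k over the ranks k at which a relevant item appears."""
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    # Relevant items missing from the ranking count as misses.
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(
    rankings: Dict[Hashable, Sequence[Hashable]],
    truths: Dict[Hashable, Set[Hashable]],
) -> float:
    """Mean of the per-user average precision over all users with a ranking."""
    users = list(rankings)
    total = sum(average_precision(rankings[u], truths.get(u, set())) for u in users)
    return total / len(users)


# Hypothetical predictions for two users.
rankings = {"u1": ["b", "a", "c"], "u2": ["a", "b"]}
truths = {"u1": {"a", "c"}, "u2": {"a"}}
print(mean_average_precision(rankings, truths))
```

Because precision is accumulated at every hit, the metric rewards placing true interaction partners near the top of the list rather than merely including them.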
31
Network Evolution
Predict the degree distribution and the clustering coefficient.
Compare with the Microscopic Evolution model.
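The two statistics named on this slide can be computed for a small undirected graph as shown below. This is a generic sketch of the standard definitions (degree histogram and average local clustering coefficient); the helper names and the example edge list are assumptions, and the paper's exact evaluation procedure is not given on the slide.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Hashable, Iterable, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, Set[Hashable]]:
    """Undirected adjacency sets; self-loops are ignored."""
    adjacency: Dict[Hashable, Set[Hashable]] = {}
    for u, v in edges:
        if u == v:
            continue
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def degree_distribution(adjacency: Dict[Hashable, Set[Hashable]]) -> Dict[int, int]:
    """Map each degree value to the number of nodes having that degree."""
    return dict(Counter(len(neighbors) for neighbors in adjacency.values()))


def average_clustering(adjacency: Dict[Hashable, Set[Hashable]]) -> float:
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for neighbors in adjacency.values():
        k = len(neighbors)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(neighbors, 2) if b in adjacency[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adjacency) if adjacency else 0.0


# Hypothetical graph: a triangle a-b-c with a pendant node d attached to c.
graph = build_adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
print(degree_distribution(graph), average_clustering(graph))
```

Comparing these statistics between a predicted network and the observed one gives a coarse check of whether a model reproduces the network's macro-level structure.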
32
3rd Paper
A Complex Network Analysis of the United States Air Transportation. ASONAM’12. D. P. Cheung, M. H. Gunes
33
Purposes
Analyze the air transportation network to better understand its characteristics.
Analyze its changes over the past two decades.
34
Dataset
Networks are generated from public data on the Bureau of Transportation Statistics’ TranStats website.
Each node is an airport.
A direct edge represents an available route.
The edge weight represents the number of passengers, freight, and mail transported between airports.
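A network of this kind can be assembled from route records by treating airports as nodes and summing the transported load per route. The record layout `(origin, destination, load)`, the placeholder airport codes, and the decision to keep the two directions of a route as separate edges are assumptions for illustration; the slide does not describe the TranStats file format or whether edges are directed.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[str, str, float]  # (origin airport, destination airport, load)


def build_route_graph(records: Iterable[Record]) -> Dict[str, Dict[str, float]]:
    """Aggregate records into nested dicts: graph[origin][destination] = total load."""
    graph: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for origin, destination, load in records:
        graph[origin][destination] += load
    return {origin: dict(targets) for origin, targets in graph.items()}


# Hypothetical records with placeholder airport codes.
records = [
    ("AAA", "BBB", 100),
    ("AAA", "BBB", 50),
    ("BBB", "AAA", 70),
    ("AAA", "CCC", 10),
]
graph = build_route_graph(records)
print(graph)
print({airport: len(targets) for airport, targets in graph.items()})  # routes per origin
```

Building one such graph per year would make it possible to compare network statistics across time, in line with the stated goal of analyzing changes over two decades.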
35
Results