DM-Group Meeting. Liangzhe Chen, Oct. 21, 2015.



Papers to be presented
- RSC: Mining and Modeling Temporal Activity in Social Media. KDD’15. A. F. Costa, Y. Yamaguchi, A. J. M. Traina, C. Traina Jr., C. Faloutsos
- Modeling the Dynamics of Composite Social Network. KDD’13. E. Zhong, W. Fan, Y. Zhu, Q. Yang
- A Complex Network Analysis of the United States Air Transportation. ASONAM’12. D. P. Cheung, M. H. Gunes

1st Paper
- RSC: Mining and Modeling Temporal Activity in Social Media
- KDD’15
- A. F. Costa, Y. Yamaguchi, A. J. M. Traina, C. Traina Jr., C. Faloutsos

Problems
- Q1: What are the patterns of temporal activity caused by human communication in social media?
  - Activities such as posting tweets on Twitter.
- Q2: Is it possible to model these patterns?
- Q3: Can we use these patterns to tell whether a user is a human or a bot based only on the timing of their posts?

Formal Problems
- Q1 (Pattern-Finding): Given time-stamp data from different social media services, analyze the IAT distribution and find patterns that are common to all services.
- Q2 (Time-Stamp Generation): Design a model that generates synthetic time-stamps whose IAT distribution fits the real data and matches all the patterns found in Q1.
- Q3 (Bot-Detection): Given time-stamp data from a set of users $\{U_1, U_2, U_3, \dots\}$, where each user $U_i$ has a sequence of posting time-stamps $T_i = (t_1, t_2, t_3, \dots)$ and the corresponding sequence of posting IATs $\Delta_i$, decide whether user $U_i$ is a human or a bot.
- Inter-Arrival Time (IAT): the time difference between consecutive activities.
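The IAT definition above can be illustrated with a minimal Python sketch. The function name and the example time-stamps are hypothetical and are not taken from the paper or its datasets.

```python
from typing import List, Sequence


def inter_arrival_times(timestamps: Sequence[float]) -> List[float]:
    """Return the differences between consecutive time-stamps.

    The input is sorted first, so the result is always non-negative and is
    expressed in the same unit as the input (seconds here).
    """
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


# Hypothetical posting times in seconds since an arbitrary reference point.
example_timestamps = [0, 90, 150, 10_150, 96_550]
print(inter_arrival_times(example_timestamps))  # [90, 60, 10000, 86400]
```

A user with n postings yields n - 1 IAT values, so a single posting produces an empty sequence.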

Datasets Studied
- Twitter
  - 3,000 most recent tweets from 9,000 verified users.
  - Remove users with fewer than 800 tweets (6,790 users left).
  - Add data from 64 bot users.
- Reddit
  - 1,000 most recent comments from 200,000 users.
  - Remove users with fewer than 800 comments (21,198 users left).
  - Add data from 32 bot users.

Q1: Pattern Finding
- Positive correlation: the IAT $\Delta_i$ between two postings depends on the previous IAT $\Delta_{i-1}$.
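One simple way to check for this dependence is to correlate each IAT with the one before it. The sketch below uses the Pearson correlation of log-scaled consecutive IATs; this measure and the function name are assumptions made for illustration, and the slide does not say how the paper quantifies the pattern.

```python
import math
from typing import Sequence


def lag1_log_correlation(iats: Sequence[float]) -> float:
    """Pearson correlation between log10(previous IAT) and log10(current IAT).

    Pairs containing a non-positive IAT are skipped because their logarithm
    is undefined.
    """
    pairs = [
        (math.log10(prev), math.log10(curr))
        for prev, curr in zip(iats, iats[1:])
        if prev > 0 and curr > 0
    ]
    if len(pairs) < 2:
        raise ValueError("need at least two valid consecutive IAT pairs")
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant IATs")
    return cov / math.sqrt(var_x * var_y)
```

A value near +1 means long gaps tend to follow long gaps and short gaps follow short ones; a value near 0 would indicate no linear dependence on the log scale.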

Q1: Pattern Finding
- Periodic spikes: the IAT distribution has a spike every 24 hours.

Q1: Pattern Finding
- Bimodal distribution: the IAT distribution has two “humps”, the first near 100 s and the second near 10,000 s.

Q1: Pattern Finding
- Heavy-tailed distribution

Q2: Rest-Sleep-and-Comment
- The RSC algorithm has 3 states:
  - Active: generate a posting event with probability $p_{post}$, or a null event with probability $1 - p_{post}$, at every time interval $\delta_i^A$.
  - Rest: generate a null event at every time interval $\delta_i^R$.
  - Sleep: generate a single null event at the next wake-up time $t_{wake}$.
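The three states can be sketched as a small simulation. This is only an illustration of the state descriptions above, not the RSC generator from the paper: the slide gives neither the transition rules nor the distributions of the intervals, so the exponential interval draws, the switching probabilities, and the fixed bedtime and wake-up hours below are all hypothetical.

```python
import random
from typing import List

DAY = 24 * 3600.0  # seconds per day


def simulate_rsc_like(n_posts: int, p_post: float = 0.3,
                      mean_active: float = 60.0, mean_rest: float = 1800.0,
                      p_to_rest: float = 0.1, p_to_active: float = 0.2,
                      bedtime: float = 23 * 3600.0, wake: float = 7 * 3600.0,
                      seed: int = 0) -> List[float]:
    """Generate posting time-stamps from a toy Active/Rest/Sleep process."""
    rng = random.Random(seed)
    t = wake            # start at the first wake-up time
    state = "active"
    posts: List[float] = []
    while len(posts) < n_posts:
        time_of_day = t % DAY
        if time_of_day >= bedtime or time_of_day < wake:
            # Sleep: a single null event that jumps to the next wake-up time.
            days = t // DAY + (1 if time_of_day >= bedtime else 0)
            t = days * DAY + wake
            state = "active"
        elif state == "active":
            # Active: after an interval, post with probability p_post,
            # otherwise emit a null event (nothing is recorded).
            t += rng.expovariate(1.0 / mean_active)
            if rng.random() < p_post:
                posts.append(t)
            if rng.random() < p_to_rest:   # hypothetical switching rule
                state = "rest"
        else:
            # Rest: only null events, spaced by a (longer) rest interval.
            t += rng.expovariate(1.0 / mean_rest)
            if rng.random() < p_to_active:  # hypothetical switching rule
                state = "active"
    return posts
```

Even this toy version produces short gaps inside active bursts, longer gaps during rest, and a gap of several hours across each sleep period.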

Q2: Rest-Sleep-and-Comment

Q2: Parameter Estimation
- Parameter estimation
  - Compute the log-binned histogram of the IATs for the real and the synthetic data.
  - Minimize the squared distance between the synthetic and real data bin counts.
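The objective described above can be sketched as follows. The number of bins, the IAT range, and the function names are assumptions made for illustration; the slide does not state the binning used in the paper or the optimizer that searches over the model parameters.

```python
import math
from typing import List, Sequence


def log_binned_histogram(iats: Sequence[float], n_bins: int = 20,
                         min_iat: float = 1.0,
                         max_iat: float = 1e6) -> List[int]:
    """Count IATs in bins of equal width on a log10 scale.

    Values outside [min_iat, max_iat] are ignored.
    """
    counts = [0] * n_bins
    log_min = math.log10(min_iat)
    width = (math.log10(max_iat) - log_min) / n_bins
    for iat in iats:
        if iat < min_iat or iat > max_iat:
            continue
        index = int((math.log10(iat) - log_min) / width)
        counts[min(index, n_bins - 1)] += 1  # max_iat falls in the last bin
    return counts


def squared_distance(real_counts: Sequence[float],
                     synthetic_counts: Sequence[float]) -> float:
    """Sum of squared differences between two histograms with equal bins."""
    if len(real_counts) != len(synthetic_counts):
        raise ValueError("histograms must have the same number of bins")
    return sum((r - s) ** 2 for r, s in zip(real_counts, synthetic_counts))
```

A parameter search would generate synthetic time-stamps for each candidate setting, bin their IATs with the same bin edges as the real data, and keep the setting with the smallest distance.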

Q2: RSC at work

Q3: Bot Detection
- Generate the log-binned histogram counts from both the estimated RSC model and the target user.
- Compute the dissimilarity between the user and the RSC model.
- Train a Naïve Bayes classifier to get the probability that the user is a bot.
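The classification step can be sketched with a Gaussian Naïve Bayes model over a single feature, the user-to-model dissimilarity. The Gaussian likelihood, the use of one feature only, and the training values below are assumptions made for illustration; they are not results or settings from the paper.

```python
import math
from statistics import mean, pvariance
from typing import Dict, Sequence, Tuple

Model = Dict[str, Tuple[float, float, float]]  # label -> (prior, mean, variance)


def fit_gaussian_nb(values_by_class: Dict[str, Sequence[float]]) -> Model:
    """Estimate a class prior and a Gaussian per class from 1-D features."""
    total = sum(len(values) for values in values_by_class.values())
    return {
        label: (len(values) / total, mean(values),
                max(pvariance(values), 1e-9))  # floor avoids zero variance
        for label, values in values_by_class.items()
    }


def posterior(model: Model, x: float) -> Dict[str, float]:
    """Return P(class | x) for every class, computed in log space."""
    log_scores = {}
    for label, (prior, mu, var) in model.items():
        log_likelihood = (-0.5 * math.log(2 * math.pi * var)
                          - (x - mu) ** 2 / (2 * var))
        log_scores[label] = math.log(prior) + log_likelihood
    top = max(log_scores.values())
    unnormalized = {k: math.exp(v - top) for k, v in log_scores.items()}
    total = sum(unnormalized.values())
    return {k: v / total for k, v in unnormalized.items()}


# Made-up dissimilarity values, purely to exercise the code.
toy_training = {"human": [0.10, 0.20, 0.15, 0.12],
                "bot": [0.80, 0.90, 0.85, 0.95]}
toy_model = fit_gaussian_nb(toy_training)
print(posterior(toy_model, 0.9)["bot"])
```

The intuition is that human users should sit close to the fitted RSC model (small dissimilarity), while bots, whose timing follows other rules, should sit far from it.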

Q3: Bots Detection

2nd Paper
- Modeling the Dynamics of Composite Social Network
- KDD’13
- E. Zhong, W. Fan, Y. Zhu, Q. Yang

Introduction
- Users engage in multiple networks and form a ‘composite social network’ when common users are treated as bridges.
- A user's interactions in one network can influence their behavior in another.
  - Two users without common neighbors may follow each other on Twitter because they already know each other on Facebook.
  - A user may interact less with her friends on Facebook after they graduate (while creating more links on LinkedIn).

Problem Definition
- Given the network sequence $\{G_t\}_{t=1}^{T}$, where $G_t = \{G_t^i = (U^i, E_t^i)\}_{i=1}^{l}$, construct the composite network at time $T+1$.

ITCom model
- Infinite Time-Evolving Composite Network Model
- Integrates infinite communities, knowledge transfer, and dynamic modeling into the MMSB model.

MMSB
- Mixed Membership Stochastic Blockmodel

Infinite Modeling
- Communities in networks can come and go, so it is hard to fix the number of communities in advance.
- Assume an infinite number of communities and use a stick-breaking process to generate the probability of each community.
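A stick-breaking process can be sketched as below. The code uses the standard construction with Beta(1, alpha) break fractions, truncated to a finite number of sticks; that this exact variant, and any particular value of alpha, is what ITCom uses is an assumption, since the slide does not give those details.

```python
import random
from typing import List


def stick_breaking(alpha: float, n_sticks: int, seed: int = 0) -> List[float]:
    """Draw the first n_sticks community probabilities from a stick-breaking process.

    Each step breaks off a Beta(1, alpha) fraction of whatever is left of a
    unit-length stick. The weights sum to less than 1 because the process is
    truncated; the remainder belongs to communities not yet instantiated.
    """
    rng = random.Random(seed)
    remaining = 1.0
    weights: List[float] = []
    for _ in range(n_sticks):
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights


weights = stick_breaking(alpha=2.0, n_sticks=10)
print(weights, sum(weights))
```

A smaller alpha concentrates the mass on the first few communities, while a larger alpha spreads it over many, which is how the model avoids fixing the number of communities up front.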

Knowledge Transfer across Networks
- Each user has a latent interest vector $x_i$ (1 by $D$).
- Each network has a mapping $w_d$ ($D$ by $K_d$) from latent features to network-dependent communities.
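The shapes above imply that multiplying a user's interest vector by a network's mapping gives one score per community in that network. The sketch below shows that product; turning the scores into membership probabilities with a softmax is an assumption made for illustration and is not stated on the slide, and the numbers are hypothetical.

```python
import math
from typing import List, Sequence


def community_scores(x: Sequence[float],
                     w: Sequence[Sequence[float]]) -> List[float]:
    """Multiply a 1-by-D interest vector by a D-by-K mapping matrix."""
    if len(x) != len(w):
        raise ValueError("x must have one entry per row of w")
    n_communities = len(w[0])
    return [sum(x[j] * w[j][k] for j in range(len(x)))
            for k in range(n_communities)]


def softmax(scores: Sequence[float]) -> List[float]:
    """Normalize scores into probabilities (assumed link function)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical user with D = 2 interests and a network with K_d = 3 communities.
x_i = [1.0, 0.0]
w_d = [[2.0, 0.0, 1.0],
       [0.0, 3.0, 1.0]]
print(softmax(community_scores(x_i, w_d)))
```

Because the same $x_i$ is shared across networks while each network has its own $w_d$, what is learned about a user in one network carries over to the others.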

Dynamic Modeling
- The community compatibility matrix at time $t$, $B_t$, evolves from $B_{t-1}$ through a Beta distribution.
- The latent interests $x_i$ and the mapping from interests to communities $w_d$ evolve from their previous values through a Gaussian distribution.
- Finally, down-weight the probability of successful interaction.

Summary

Experiments
- Two tasks
  - Link prediction: predict who will interact with whom at a given time-stamp.
  - Macro-evolution: predict changes in the networks’ statistics, e.g. clustering coefficients and degree distributions.

Datasets
- Relational networks, where user pairs are distinct (Tencent, Epinion)
- Interaction networks, where users can interact with each other several times (the other six datasets)

Link prediction
- Estimate the ITCom parameters and generate the probabilities of interaction among users.
- Measure the performance with Mean Average Precision.
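Mean Average Precision can be computed as in the sketch below, assuming each user has a list of candidate partners ranked by predicted interaction probability and a set of partners they actually interacted with. The function names and the example data are hypothetical; the normalization divides by the number of relevant items, which is one common convention.

```python
from typing import Hashable, List, Sequence, Set


def average_precision(ranked: Sequence[Hashable],
                      relevant: Set[Hashable]) -> float:
    """Average of precision@k over the ranks k at which a relevant item appears."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(rankings: List[Sequence[Hashable]],
                           relevants: List[Set[Hashable]]) -> float:
    """Mean of the per-user average precision values."""
    if len(rankings) != len(relevants) or not rankings:
        raise ValueError("need one non-empty ranking per relevant set")
    scores = [average_precision(r, rel) for r, rel in zip(rankings, relevants)]
    return sum(scores) / len(scores)


# Two hypothetical users: predicted rankings and true interaction partners.
print(mean_average_precision([["b", "a", "c"], ["a", "b"]],
                             [{"a", "c"}, {"a"}]))
```

The score rewards models that place true interaction partners near the top of each user's ranking, not just somewhere in the list.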

Network Evolution
- Predict the degree distribution and the clustering coefficient.
- Compare with the Microscopic Evolution model.
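The two statistics named above can be computed for an undirected graph as in this sketch. The helper names and the toy edge list are hypothetical, and the average local clustering coefficient is used here; the slide does not say which variant of the clustering coefficient the paper reports.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Hashable, Iterable, Set, Tuple

Adjacency = Dict[Hashable, Set[Hashable]]


def build_adjacency(edges: Iterable[Tuple[Hashable, Hashable]]) -> Adjacency:
    """Build an undirected adjacency map, ignoring self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def degree_distribution(adj: Adjacency) -> Counter:
    """Map each degree value to the number of nodes that have it."""
    return Counter(len(neighbors) for neighbors in adj.values())


def average_clustering(adj: Adjacency) -> float:
    """Average over nodes of (links among neighbors) / (possible such links)."""
    if not adj:
        return 0.0
    total = 0.0
    for neighbors in adj.values():
        k = len(neighbors)
        if k < 2:
            continue  # nodes with fewer than two neighbors contribute 0
        links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


# Toy graph: a triangle (a, b, c) with one pendant node d attached to c.
toy = build_adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
print(degree_distribution(toy), average_clustering(toy))
```

Comparing these statistics between the predicted network and the observed one at the next time step is what the macro-evolution task measures.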

3rd Paper
- A Complex Network Analysis of the United States Air Transportation
- ASONAM’12
- D. P. Cheung, M. H. Gunes

Purposes
- Analyze the air transportation network to better understand its characteristics.
- Analyze how it has changed over the past two decades.

Dataset
- Networks are generated from public data on the Bureau of Transportation Statistics’ TranStats website.
- Each node is an airport.
- A directed edge represents an available route.
- The edge weight represents the number of passengers, freight, and mail transported between airports.
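The network construction described above can be sketched as a weighted directed graph built from route records. The record layout, the function names, and the volumes below are hypothetical and only illustrate the structure; they are not TranStats data, and a single weight per edge is used here although the slide mentions passengers, freight, and mail.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Graph = Dict[str, Dict[str, float]]  # origin -> destination -> total weight


def build_route_network(records: Iterable[Tuple[str, str, float]]) -> Graph:
    """Aggregate (origin, destination, volume) records into edge weights."""
    graph = defaultdict(lambda: defaultdict(float))
    for origin, destination, volume in records:
        graph[origin][destination] += volume
    return {origin: dict(dests) for origin, dests in graph.items()}


def out_strength(graph: Graph, airport: str) -> float:
    """Total weight carried on all routes leaving an airport."""
    return sum(graph.get(airport, {}).values())


# Made-up records: two reports for RNO -> LAS and one for LAS -> RNO.
toy_records = [("RNO", "LAS", 100.0), ("RNO", "LAS", 50.0), ("LAS", "RNO", 120.0)]
network = build_route_network(toy_records)
print(network, out_strength(network, "RNO"))
```

Keeping direction matters because traffic from one airport to another need not equal traffic in the opposite direction.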

Results