Retweeting Behavior and Spectral Graph Analysis in Social Media
Xintao Wu, Jan 18, 2013

Social Media Customer Analytics
Data sources: network topology, structured profiles, retweet sequences, unstructured text (e.g., blogs, tweets), customer profiles, customer transactions, inventory, product descriptions and reviews, …

Example structured profile:
name  sex  age  disease  salary
Ada   F    18   cancer   25k
Bob   M    25   heart    110k
…

Example customer record:
id  sex  age  address  income
5   F    Y    NC       25k
3   M    Y    SC       110k

Issues: entity resolution, patterns, temporal/spatial analysis, scalability, visualization, sentiment, privacy

Outline
- Examining retweeting behavior to understand information propagation
  - Multi-factor interaction analysis
  - Coverage prediction
  - Burst detection
- Spectral graph analysis
  - Community partition
  - Fraud detection

Multi-factor interaction analysis
For each following relationship in which user A follows user B, which factors affect A's decision on whether to forward B's messages to A's own followers? We examine users' retweet behavior using several features:
- Power ratio (A)
- Link structure (B)
- Location factor (C)
- Gender factor (D)
- …
We fit a log-linear model to capture and interpret the interaction patterns among features A-D and the retweet decision (E).
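
The slide does not give the model specification or any counts. As a minimal sketch, assuming a three-way table (for example gender × link structure × retweet) instead of the full A-E table, the code below fits the log-linear model with all two-way interactions but no three-way term by iterative proportional fitting, then computes the likelihood-ratio statistic G²; a large G² suggests the three-way interaction is needed. The function names and the counts are hypothetical, not the study's data.

```python
import math
from itertools import product


def ipf_no_three_way(obs, iters=500, tol=1e-12):
    """Fit the log-linear model [XY][XZ][YZ] (all two-way interactions, no
    three-way term) to a 3-way table obs[i][j][k] by iterative proportional
    fitting. Returns the table of fitted expected counts."""
    I, J, K = len(obs), len(obs[0]), len(obs[0][0])
    cells = list(product(range(I), range(J), range(K)))
    fit = [[[1.0] * K for _ in range(J)] for _ in range(I)]

    def rescale(group):
        # Scale the cells in `group` so their fitted total equals the observed total.
        o = sum(obs[i][j][k] for i, j, k in group)
        f = sum(fit[i][j][k] for i, j, k in group)
        for i, j, k in group:
            fit[i][j][k] = fit[i][j][k] * o / f if f > 0 else 0.0

    for _ in range(iters):
        before = [fit[i][j][k] for i, j, k in cells]
        for i, j in product(range(I), range(J)):   # match the X-Y margin
            rescale([(i, j, k) for k in range(K)])
        for i, k in product(range(I), range(K)):   # match the X-Z margin
            rescale([(i, j, k) for j in range(J)])
        for j, k in product(range(J), range(K)):   # match the Y-Z margin
            rescale([(i, j, k) for i in range(I)])
        change = max(abs(fit[i][j][k] - b) for (i, j, k), b in zip(cells, before))
        if change < tol:
            break
    return fit


def g_squared(obs, fit):
    """Likelihood-ratio statistic G^2 = 2 * sum(obs * ln(obs / fit))."""
    return 2.0 * sum(
        o * math.log(o / fit[i][j][k])
        for i, plane in enumerate(obs)
        for j, row in enumerate(plane)
        for k, o in enumerate(row)
        if o > 0
    )


# Hypothetical counts, indexed [gender][link][retweet]:
# gender 0 = female, 1 = male; link 0 = friend, 1 = non-friend;
# retweet 0 = yes, 1 = no.
obs = [[[30, 70], [10, 90]],
       [[25, 75], [20, 80]]]
fit = ipf_no_three_way(obs)
print(f"G^2 without the three-way term: {g_squared(obs, fit):.2f}")
```

In practice a GLM with a Poisson family fits the same hierarchy of models and also yields parameter estimates; IPF is shown here only because it needs nothing beyond the standard library.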

Interpreting interaction effect

Interpretation example
Neither gender nor location alone has a significant effect on retweeting. However, once link structure is taken into account:
- Females are more conservative: they are less likely to retweet messages from non-friend (especially female) users, but more likely to retweet messages from friends or superstars.
- Males are more open-minded: they are more likely to retweet messages from non-friend (especially female) users.
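
One common way to read an interaction like this is to compare conditional odds ratios across strata. The sketch below uses made-up counts (not the study's data) and hypothetical helper names to compute the female-versus-male odds ratio of retweeting separately for friend and non-friend links; ratios that differ across link types point to a gender × link-structure interaction on retweeting.

```python
def odds_ratio(table):
    """Odds ratio of a 2x2 table [[a, b], [c, d]]: (a * d) / (b * c).

    Rows: female, male. Columns: retweeted, not retweeted.
    Assumes no zero cells.
    """
    (a, b), (c, d) = table
    return (a * d) / (b * c)


def conditional_odds_ratios(strata):
    """Female-vs-male odds of retweeting, computed separately per link type."""
    return {link: odds_ratio(t) for link, t in strata.items()}


# Hypothetical counts, for illustration only.
strata = {
    "friend": [[30, 70], [25, 75]],
    "non-friend": [[10, 90], [20, 80]],
}
for link, ratio in conditional_odds_ratios(strata).items():
    print(f"{link}: OR = {ratio:.2f}")
```

With these invented counts the loop prints `friend: OR = 1.29` and `non-friend: OR = 0.44`: females are more likely than males to retweet among friends and less likely among non-friends.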

Coverage prediction

Retweet Sequence: Flow Through a Tree Structure
Information flows dynamically through a social network.
[Figure: users Alice, Bob, Cathy, David, Ellen and Fred; each retweet record lists a time, a message and a user, followed by the record it forwards, e.g. t1 m1 A; t2 m2 B / t1 m1 A; t3 m3 D / B / t1 m1 A; t4 m4 C / t1 m1 A; …]
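
Reading each record in the figure as a retweet that points to the post it forwards, the retweet tree can be rebuilt and each user's path length (hops from the original poster) computed by breadth-first search. This is a sketch under that reading; the record layout `(user, parent, time)` and the function names are assumptions, and each user is assumed to retweet a given message at most once.

```python
from collections import defaultdict


def build_retweet_tree(records):
    """records: (user, parent, time) tuples; parent is None for the original post.
    Returns the root user and a parent -> [children] mapping in time order."""
    children = defaultdict(list)
    root = None
    for user, parent, _time in sorted(records, key=lambda r: r[2]):
        if parent is None:
            root = user
        else:
            children[parent].append(user)
    return root, children


def path_lengths(root, children):
    """Breadth-first search: hop count from the original poster to each retweeter."""
    depth = {root: 0}
    frontier = [root]
    while frontier:
        nxt = []
        for u in frontier:
            for v in children.get(u, []):
                depth[v] = depth[u] + 1
                nxt.append(v)
        frontier = nxt
    return depth


# The figure's example read as: A posts at t1, B retweets A at t2,
# D retweets B's retweet at t3, C retweets A at t4.
records = [("A", None, 1), ("B", "A", 2), ("D", "B", 3), ("C", "A", 4)]
root, children = build_retweet_tree(records)
depths = path_lengths(root, children)
print(depths)  # {'A': 0, 'B': 1, 'C': 1, 'D': 2}
```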

WISE12 Challenge
Sina Weibo data:
- # of users: 5,636,858
- # of tweets: 46,584,914
- # of retweets: 190,920,…
- Test messages: each given with its first 100 retweets, composed by 27 users, from 6 events
For each test message, predict:
- M1: the number of retweets within 30 days
- M2: the number of possible views within 30 days

Idea
We treat the retweeting activity of each original message in the training data as a time series: each value is the number of times the original message is retweeted during time period t. For each message in the test data, the early part of its series is known from its first 100 retweets, and we use an ARMA model to predict the rest.
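
The slide names ARMA but gives no model orders or fitting details. The sketch below is a deliberately simplified stand-in: an AR(p) model (ARMA with no moving-average terms) fitted by ordinary least squares on rows pooled from the training series, then iterated forward from a test message's known prefix. `fit_ar`, `forecast` and `solve` are hypothetical names, and clipping forecasts at zero is an added assumption because counts cannot be negative.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ar(training_series, p):
    """Least-squares fit of y_t = c + phi_1*y_{t-1} + ... + phi_p*y_{t-p},
    pooling the rows of every training series. Returns [c, phi_1, ..., phi_p]."""
    rows, targets = [], []
    for s in training_series:
        for t in range(p, len(s)):
            rows.append([1.0] + [s[t - i] for i in range(1, p + 1)])
            targets.append(s[t])
    n = p + 1
    # Normal equations (X^T X) beta = X^T y.
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(n)] for a in range(n)]
    xty = [sum(r[a] * y for r, y in zip(rows, targets)) for a in range(n)]
    return solve(xtx, xty)


def forecast(known, coef, steps):
    """Iterate the fitted recursion forward; negative counts are clipped to 0."""
    history = list(known)
    p = len(coef) - 1
    out = []
    for _ in range(steps):
        y = coef[0] + sum(coef[i] * history[-i] for i in range(1, p + 1))
        y = max(y, 0.0)
        out.append(y)
        history.append(y)
    return out
```

For M1, one would add the forecast to the retweets already observed, e.g. `sum(known) + sum(forecast(known, fit_ar(train, 2), horizon))`, where `train`, `known` and `horizon` are placeholders for the training series, the test message's known per-period counts and the number of periods left in the 30-day window.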

Prediction Result
Runner-up award (2nd place) in the WISE 2012 Challenge, Mining Track.
[Figure: prediction results for example events, including the death of Steve Jobs, the Xiaomi release and the Yao Jiaxin murder case]

Burst detection

Bursts
[Figure: a burst, annotated with its peak time and duration time]
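
The slide labels a burst's peak time and duration without defining them. Assuming retweets are binned into equal periods after the original post, one plausible reading is: peak time is the period with the most retweets, and duration is the span of periods whose count stays at or above a fraction of the peak. The threshold `frac` and the function name are hypothetical.

```python
def burst_profile(counts, frac=0.1):
    """counts[t] = retweets in period t after the original post.

    Hypothetical definitions (not from the slide):
      peak time = first period with the maximum count;
      duration  = number of periods from the first to the last period whose
                  count is at least frac * peak (inclusive).
    Returns (peak_time, duration), or None if there are no retweets.
    """
    if not counts or max(counts) == 0:
        return None
    peak = max(counts)
    peak_time = counts.index(peak)
    active = [t for t, c in enumerate(counts) if c >= frac * peak]
    return peak_time, active[-1] - active[0] + 1


print(burst_profile([5, 40, 100, 60, 20, 8, 3, 0]))  # (2, 4)
```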

Topic

Retweet vs. Time

Burst Analysis: Users
The top 100 users tend to have shorter path lengths, shorter peak times and shorter duration times.

Burst Prediction
- Extract features:
  - user-related features, including profile and history information
  - tweet-related features, including the time series and the retweet tree
- Run classifiers: logistic regression, random forest, decision tree, naïve Bayes, SVM, KNN
- Achieve 83.2% accuracy
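
The slide does not list the exact features or say which classifier reached 83.2%. To illustrate the classification step only, the sketch below implements one of the listed classifiers, k-nearest neighbours, over two hypothetical features (log follower count of the poster and retweets in the first hour) with invented toy data; it says nothing about the reported accuracy.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, x, k=3):
    """Return the majority label among the k training points nearest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, k=3):
    """Fraction of test messages whose burst label is predicted correctly."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)


# Invented toy data. Features: (log10 follower count of the poster,
# retweets in the first hour); label 1 = burst, 0 = no burst.
# Real features on different scales should be standardized first.
train_x = [(2.0, 5), (2.5, 8), (3.0, 6), (6.0, 80), (6.5, 95), (7.0, 70)]
train_y = [0, 0, 0, 1, 1, 1]
test_x = [(2.2, 7), (6.8, 90)]
test_y = [0, 1]
print(accuracy(train_x, train_y, test_x, test_y))
```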

Spectral graph analysis
[Figure: spectral coordinates of the Polbook network]
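
Assuming, as in adjacency-eigenspace analyses, that a node's spectral coordinate is its row in the matrix of the top-k adjacency eigenvectors, (x1[u], ..., xk[u]), the sketch below computes those coordinates with standard-library power iteration plus deflation. The function names and the toy graph (two triangles joined by one edge) are hypothetical; a sparse eigensolver would be used on real networks.

```python
import math
import random


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def top_eigenpairs(adj, k, iters=1000, seed=0):
    """Largest k eigenpairs of a symmetric adjacency matrix via power iteration
    with deflation. Adding shift*I (shift = max row sum, which bounds every
    |eigenvalue|) makes all eigenvalues non-negative, so the iteration finds
    the algebraically largest ones rather than the largest in magnitude."""
    n = len(adj)
    shift = max(sum(row) for row in adj)
    b = [[adj[i][j] + (shift if i == j else 0.0) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        v = normalize([rng.random() - 0.5 for _ in range(n)])
        for _ in range(iters):
            v = normalize(matvec(b, v))
        mu = sum(x * y for x, y in zip(v, matvec(b, v)))  # Rayleigh quotient
        pairs.append((mu - shift, v))
        # Deflate: remove the eigen-direction just found.
        b = [[b[i][j] - mu * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def spectral_coordinates(adj, k):
    """Row u is node u's spectral coordinate (x1[u], ..., xk[u])."""
    vecs = [v for _, v in top_eigenpairs(adj, k)]
    return [[vec[u] for vec in vecs] for u in range(len(adj))]


# Toy graph: two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0.0] * 6 for _ in range(6)]
for u, v in edges:
    adj[u][v] = adj[v][u] = 1.0
for u, coord in enumerate(spectral_coordinates(adj, 2)):
    print(u, [round(c, 3) for c in coord])
```

For this graph the two largest adjacency eigenvalues are 1 + √2 and √3, and the second coordinate is positive on one triangle and negative on the other (eigenvector signs are arbitrary), which is the kind of separation community partition in spectral space relies on.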

Accuracy of AdjCluster
Compared methods:
- Lap [Miller and Teng, 1998]: Laplacian based
- Ncut [Shi and Malik, 2000]: normalized cut
- HE' [Wakita and Tsurumi, 2007]: modularity-based agglomerative clustering
- SpokEn [Prakash et al., 2010]: EigenSpokes
Accuracy: …, where … is the i-th community produced by each algorithm.
Refer to IJCAI 11 for details.

Evaluation on Web Spam Challenge data
- SPCTRA: spectrum-based fraud detection
- GREEDY: based on outer triangles [Shrivastava et al., ICDE 2008]
… times faster. Refer to ICDE 11 for details.
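
GREEDY is described only as being based on outer triangles. As a loosely related illustration (neither GREEDY nor SPCTRA), the sketch below computes each node's local clustering coefficient, the fraction of its neighbour pairs that are themselves linked, and flags well-connected nodes whose neighbours are almost never linked, the pattern expected when an attacker links to randomly chosen victims. The thresholds, function names and toy graph are hypothetical.

```python
from itertools import combinations


def from_edges(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj):
    """Fraction of each node's neighbour pairs that are themselves linked."""
    scores = {}
    for u, nbrs in adj.items():
        d = len(nbrs)
        if d < 2:
            scores[u] = 0.0
            continue
        linked = sum(1 for v, w in combinations(nbrs, 2) if w in adj[v])
        scores[u] = linked / (d * (d - 1) / 2)
    return scores


def flag_low_clustering(adj, min_degree=3, max_clustering=0.05):
    """Hypothetical rule: well-connected nodes whose neighbours barely know each other."""
    cc = local_clustering(adj)
    return sorted(u for u in adj if len(adj[u]) >= min_degree and cc[u] <= max_clustering)


# Toy graph: three triangles plus a node "x" linked to one node in each.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("g", "h"), ("h", "i"), ("g", "i"),
         ("x", "a"), ("x", "d"), ("x", "g")]
adj = from_edges(edges)
print(flag_low_clustering(adj))  # ['x']
```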

Acknowledgments
This work was supported in part by the U.S. National Science Foundation (CNS and CCF) and the UNC Charlotte Chancellor's Special Fund.
Thank You! Questions?