Jointly Modeling Topics, Events and User Interests on Twitter Qiming DiaoJing Jiang School of Information Systems Singapore Management University.

Slides:

Advertisements

Similar presentations

SEARCHING THE BLOGOSPHERE

Advertisements

Google News Personalization: Scalable Online Collaborative Filtering

Linking Entities in #Microposts ROMIL BANSAL, SANDEEP PANEM, PRIYA RADHAKRISHNAN, MANISH GUPTA, VASUDEVA VARMA INTERNATIONAL INSTITUTE OF INFORMATION TECHNOLOGY,

Active Learning for Streaming Networked Data Zhilin Yang, Jie Tang, Yutao Zhang Computer Science Department, Tsinghua University.

TI: An Efficient Indexing Mechanism for Real-Time Search on Tweets Chun Chen 1, Feng Li 2, Beng Chin Ooi 2, and Sai Wu 2 1 Zhejiang University, 2 National.

CLEar (Clairaudient Ear) A Realtime Online Observatory for Bursty and Viral Events A demonstration of CLEar System.

Supervised Learning Techniques over Twitter Data Kleisarchaki Sofia.

Unsupervised Modeling of Twitter Conversations

Search Engines Information Retrieval in Practice All slides ©Addison Wesley, 2008.

Comparing Twitter Summarization Algorithms for Multiple Post Summaries David Inouye and Jugal K. Kalita SocialCom May 10 Hyewon Lim.

Analysis of Twitter Data NIKHIL PURANIK CMSC 601 – Research Skills 25 th April 2011UNIVERSITY OF MARYLAND BALTIMORE COUNTY.

Active Learning and Collaborative Filtering

1 © Ramesh Jain Social Life Networks: Ontology-based Recognition Ramesh Jain Contact:

Chen Cheng1, Haiqin Yang1, Irwin King1,2 and Michael R. Lyu1

Extracting Interest Tags from Twitter User Biographies Ying Ding, Jing Jiang School of Information Systems Singapore Management University AIRS 2014, Kuching,

Twitter Volume Spikes: Analysis and Application in Stock Trading Yuexin Mao, Wei Wei and Bing Wang COMP4332/RMBI4310 CHAN Chun Ting ( )

EVENT IDENTIFICATION IN SOCIAL MEDIA Hila Becker, Luis Gravano Mor Naaman Columbia University Rutgers University.

EARLY DETECTION OF TWITTER TRENDS MILAN STANOJEVIC UNIVERSITY OF BELGRADE SCHOOL OF ELECTRICAL ENGINEERING.

Attention and Event Detection Identifying, attributing and describing spatial bursts Early online identification of attention items in social media Louis.

Cluster based fact finders Manish Gupta, Yizhou Sun, Jiawei Han Feb 10, 2011.

On Sparsity and Drift for Effective Real- time Filtering in Microblogs Date ： 2014/05/13 Source ： CIKM’13 Advisor ： Prof. Jia-Ling, Koh Speaker ： Yi-Hsuan.

How do I decide whom to follow on Twitter ? IARank: Ranking Users on Twitter in Near Real-time, Based on their Information Amplification Potential.

Tweet Analysis for Real-Time Event Detection and Earthquake Reporting System Development.

Pete Bohman Adam Kunk. What is real-time search? What do you think as a class?

Viral Video Style: A Closer Look at Viral Videos on YouTube Lu Jiang, Yajie Miao, Yi Yang, Zhenzhong Lan, Alexander G. Hauptmann School of Computer Science,

Pete Bohman Adam Kunk. Real-Time Search  Definition: A search mechanism capable of finding information in an online fashion as it is produced. Technology.

Fine-Grained Location Extraction from Tweets with Temporal Awareness Date:2015/03/19 Author:Chenliang Li, Aixin Sun Source:SIGIR '14 Advisor:Jia-ling Koh.

1 Linmei HU 1, Juanzi LI 1, Zhihui LI 2, Chao SHAO 1, and Zhixing LI 1 1 Knowledge Engineering Group, Dept. of Computer Science and Technology, Tsinghua.

Fast Mining and Forecasting of Complex Time-Stamped Events Yasuko Matsubara (Kyoto University), Yasushi Sakurai (NTT), Christos Faloutsos (CMU), Tomoharu.

Microblogs: Information and Social Network Huang Yuxin.

Microsoft Research1 Characterizing Alert and Browse Services for Mobile Clients Atul Adya, Victor Bahl, Lili Qiu Microsoft Research USENIX Annual Technical.

Wei Feng , Jiawei Han, Jianyong Wang , Charu Aggarwal , Jianbin Huang

Sampling Methods and Sampling Distributions

By Gianluca Stringhini, Christopher Kruegel and Giovanni Vigna Presented By Awrad Mohammed Ali 1.

Query Based Event Extraction along a Timeline H.L. Chieu and Y.K. Lee DSO National Laboratories, Singapore (SIGIR 2004)

Al Jomaih Social Media. Facebook Page 90 Likes Started with 7,772 Reached Likes.

Probabilistic Models for Discovering E-Communities Ding Zhou, Eren Manavoglu, Jia Li, C. Lee Giles, Hongyuan Zha The Pennsylvania State University WWW.

Enquiring Minds: Early Detection of Rumors in Social Media from Enquiry Posts Zhe Zhao Paul Resnick Qiaozhu Mei Presentation Group 2.

Storylines from Streaming Text The Infinite Topic Cluster Model Amr Ahmed, Jake Eisenstein, Qirong Ho Alex Smola, Choon Hui Teo, Eric Xing Carnegie Mellon.

Twitter: Who is using it?. What is Twitter? Social media site where users “tweet” messages in a maximum of 140 characters Users “follow” other users of.

1 Clarifying Sensor Anomalies using Social Network feeds * University of Illinois at Urbana Champaign + U.S. Army Research Lab ++ IBM Research, USA Prasanna.

Effective Automatic Image Annotation Via A Coherent Language Model and Active Learning Rong Jin, Joyce Y. Chai Michigan State University Luo Si Carnegie.

+ User-induced Links in Collaborative Tagging Systems Ching-man Au Yeung, Nicholas Gibbins, Nigel Shadbolt CIKM’09 Speaker: Nonhlanhla Shongwe 18 January.

Pairwise Preference Regression for Cold-start Recommendation Speaker: Yuanshuai Sun

Socialbots and its implication On ONLINE SOCIAL Networks Md Abdul Alim, Xiang Li and Tianyi Pan Group 18.

Click to Add Title A Systematic Framework for Sentiment Identification by Modeling User Social Effects Kunpeng Zhang Assistant Professor Department of.

From SEO to “SEE” Carmen Cano April, 2012 …does it make a sound?" "If a tree falls in a forest and no one is around to hear it…

A Latent Social Approach to YouTube Popularity Prediction Amandianeze Nwana Prof. Salman Avestimehr Prof. Tsuhan Chen.

Collaborative Filtering via Euclidean Embedding M. Khoshneshin and W. Street Proc. of ACM RecSys, pp , 2010.

Unsupervised Streaming Feature Selection in Social Media

TWinner : Understanding News Queries with Geo-content using Twitter Satyen Abrol,Latifur Khan University of Texas at Dallas,Department of Computer Science.

A Nonparametric Method for Early Detection of Trending Topics Zhang Advisor: Prof. Aravind Srinivasan.

Fabricio Benevenuto, Gabriel Magno, Tiago Rodrigues, and Virgilio Almeida Universidade Federal de Minas Gerais Belo Horizonte, Brazil ACSAC 2010 Fabricio.

Modeling and Visualizing Information Propagation in Microblogging Platforms Chien-Tung Ho, Cheng-Te Li, and Shou-De Lin National Taiwan University ASONAM.

2014 Lexicon-Based Sentiment Analysis Using the Most-Mentioned Word Tree Oct 10 th, 2014 Bo-Hyun Kim, Sr. Software Engineer With Lina Chen, Sr. Software.

Measuring User Influence in Twitter: The Million Follower Fallacy Meeyoung Cha Hamed Haddadi Fabricio Benevenuto Krishna P. Gummadi.

Antisocial Behavior in Online Discussion Communities Authors: Justin Cheng, Cristian Danescu-Niculescu-Mizily, Jure Leskovec Presented by: Ananya Subburathinam.

St. Luke’s Social Media Facebook  1,238 Likes; 14 Talking About This  Date Joined: October 26, 2010  Last Activity: March 26, 2011  Seldom posts status.

Predicting Subway Passenger Flow with Social Media under Event Occurrences Qing He joint work with Ming Ni and Jing Gao First Annual Symposium TransInfo.

Designing a framework For Recommender system Based on Interactive Evolutionary Computation Date : Mar 20 Sat, 2011 Project Number :

Negative Link Prediction and Its Applications in Online Political Networks Mehmet Yigit Yildirim Mert Ozer Hasan Davulcu.

Using Social Media to Enhance Emergency Situation Awareness

Yi-Chia Wang LTI 2nd year Master student

Representing Documents Through Their Readers

#VisualHashtags Visual Summarization of Social Media Events using Mid-Level Visual Elements Sonal Goel (IIIT-Delhi), Sarthak Ahuja (IBM Research, India),

Deep Learning Research & Application Center

Jinhong Jung, Woojung Jin, Lee Sael, U Kang, ICDM ‘16

A Network Science Approach to Fake News Detection on Social Media

Yingze Wang and Shi-Kuo Chang University of Pittsburgh

Presentation transcript:

Jointly Modeling Topics, Events and User Interests on Twitter Qiming DiaoJing Jiang School of Information Systems Singapore Management University

Some Facts about Twitter December million Tweets are sent per day 80% of Twitter active users are on mobile 77% of accounts are outside the U.S. 284 million monthly active users Statistics collected in December 2014

Events on Twitter The volume of tweets on an event shows its popularity December Tweets per minute 20 big moments on Twitter

Event Identification Can we identify the major events tweeted on Twitter within a certain period? – Identify event-related tweets – Cluster these tweets such that each cluster is a single event – Rank the clusters by volume December 20154

Event Analysis Can we characterize events by linking them to general topics? – E.g. football games and Olympic games are related to sports, whereas presidential debates are related to politics Can we link events to users’ personal preferences? – E.g. User A likes to tweet about sports events while User B likes to tweet about political events December 20155

Applications of Event Identification and Analysis December Event Identification and Analysis on Twitter Stock Market Prediction Event Recommendation Opinion Analysis

This Talk A unified model for topics, events and users on Twitter [Diao & Jiang, EMNLP’13] – Related work – Our model – Experiments – Conclusions December 20157

Related Work Event detection ([Sakaki et al. 2010] [Petrovic et al. 2010] [Weng & Lee, 2011] [Becker et al. 2011] [Li et al. 2012]) – Online, real-time, early detection Temporal topic modeling – Fixed number of topics ([Blei & Lafferty, 2006] [Wang & McCallum, 2006] [Wang et al. 2007]) – Non-parametric ([Ahmed & Xing, 2008] [Ahmed et al. 2011] [Tang & Yang, 2012]) Applied to news articles December 20158

Chinese Restaurant Process December Fix number of clusters: 2 … Items: Traditional Generative Clustering Model Chinese Restaurant Process

Recurrent Chinese Restaurant Process December … t-1 … t+1 … items: t

Recurrent Chinese Restaurant Process December Events on date t-1 Events on date t Super bowl Concert Traffic accident Concert Fashion show Traffic accident RCRP t … …

Limitations of Directly Applying RCRP Not every tweet is event-related – Our solution: separate tweets into personal topic- related tweets and event-related tweets RCRP models the “rich-get-richer” phenomenon but not the burstiness of events on social media – Social media items have two properties: imitation and recency [Leskovec et al. 2009] – Our solution: penalize event clusters that have long durations December

Base Model December Tweets on date t Sports 0.3 Food 0.2 Music 0.1 … H T Sports Food H H Topic T Events on date t-1 Event Concert Events on date t Super bowl Concert Traffic accident Concert Fashion show Traffic accident Personal Interests RCRP

Duration-based Regularization December Super bowl Concert Events on date t-1 Events on date t Super bowl Concert Traffic accident Concert Fashion show Traffic accident RCRP Traffic accident Date t

Relating Events to Topics In the base model, tweets are separated into two types: – Topic tweets: each tweet belongs to one of a fixed number of general topics – Event tweets: each tweet belongs to an event cluster modeled by RCRP How can we model and capture the correlations between events and topics? December

Event-topic Affinity Vector December Sports 0.6 Music 0.2 Fashion 0.1 … Sports 0.3 Music 0.2 Fashion 0.1 … Sports 0.1 Music 0.1 Fashion 0.7 … Super bowl Fashion show Events on date t-1 Events on date t Super bowl Concert Traffic accident Concert Fashion show Traffic accident RCRP Event-Topic Affinity Vector Inner Popularity + + dot product

The Model December DtDt U ∞ ∞ A T N1N1 NtNt NTNT … … BaseBase+RegBase+Reg+Aff Balasubramanyan and Cohen (SDM 2013) The idea: If timestamps of tweets in the event cluster deviate much from t, the probability of observing r becomes smaller.

Experiments Dataset: – 500 users randomly selected from ~150K Singapore Twitter users – Their tweets from with 1st April 2012 to 30th June 2012 – 655,881 tweets in total Methods for comparison – TimeUserLDA: Diao et al. (2012) “Finding bursty topics from Microblogs” – Base: Our method without time duration regularization and event-topic affinity. – Base+Reg: Our method without event-topic affinity. – Base+Reg+Aff December

Quality of Most Popular Events Ground truth generation: o For each method, rank identified events by its magnitude. o Merge top-30 events from each method, then randomly pick 100 tweets from each event. o For each event, provide the 100 tweets to two human judges, and ask them to score 1 (true )or 0 (false). Only when both judges score 1, we treat the event as true. (0.744 Cohen’s Kappa) Quality of top events: December Table 1: for the various methods

Quality of Most Popular Events December Top 5 events identified by Base+Reg+Aff: LabelTop WordsPeriodInner Popularity Debate Caused by Manda Swaggie singapore, bieber, europe, amanda, justin 17 June ~ 19 June Indonesia TsunamiTsunami, earthquake, indonesia, singapore, hit 10 April ~ 12 April SJ encore concert #ss4encore, cr, # ss4encoreday2, hyuk, May ~ 28 May Mother’s Day Day, happy, mother’s, mothers, love 11 May ~ 14 May April Fools’ Day April, fools, day, fool, joke 1 April ~ 3 April Table 2: The top 5 events identified by our model, in which story name is manually labeled.

Event Recommendation December Event recommendation: o Purpose: recommend an event to the users who have not posted on it. Topics & Events 500 Users April & May 2012 Events June 2012 We randomly pick half of the users to learn the events in June, and we pick 8 common ones shared by most methods. Recommend We randomly pick 100 users from the remaining 250 users, and read their tweets to justify whether they tweet on the 8 events.

Results of Event Recommendation December Event recommendation: Table 4: For the 8 events that happened in June 2012, we compute the Average Precision for each event. We also show the Mean Average Precision when applicable. EventTimeUserLDABaseBase+RegBase+Reg+AffInner Popularity E E E E4N/A E5N/A E6N/A E7N/A E MAPN/A With the event-topic affinity vector, we can do better recommendation. The event-topic affinity vectors are especially useful to recommend events that attract only certain people’s attention, such as those related to sports, music, etc. With the event-topic affinity vector, we can do better recommendation. The event-topic affinity vectors are especially useful to recommend events that attract only certain people’s attention, such as those related to sports, music, etc.

Example Events December Grouping events by topics: Table 4: Example topics and their corresponding highly related events.

Conclusions We proposed a unified model for events, topics and user interests on Twitter – The model can identify meaningful events – The model can identify users’ personal topical interests – The model can align events with general topics Future work – Event labeling/summarization – Modeling event evolution December

Acknowledgment Qiming Diao LARC December

Thank You! Questions? December

Our Work Finding bursty topics from microblogs [Diao et al., ACL’12] – We designed a TimeUserLDA model to find bursty topics (where the number of topics is fixed) and used a two-state machine to perform post-processing on the bursty topics to identify events Recurrent Chinese restaurant process with a duration-based discount for event identification from Twitter [Diao & Jiang, SDM’14] – We used non-parametric models to identify events (where the number of events is not fixed). The model is modified from Recurrent Chinese Restaurant Process (RCRP) by Ahmed & Xing [SDM’08]. December