Unsupervised Modeling of Twitter Conversations

Presentation transcript:

Unsupervised Modeling of Twitter Conversations Alan Ritter (UW) Colin Cherry (NRC) Bill Dolan (MSR)

Twitter  Most of Twitter looks like this: "I want a cuban sandwich extra bad!!" About 10-20% of posts are replies:
"I'm going to the beach this weekend! Woo! And I'll be there until Tuesday. Life is good."
"Enjoy the beach! Hope you have great weather!"
"thank you :)"
What do Twitter conversations look like? Most are not conversational; people are just broadcasting information about themselves. The 10-20% that are in reply to some other post tend to form short conversations.

Gathering Conversations  Twitter Public API: the public timeline returns 20 randomly selected posts per minute. We use it to get a random sample of Twitter users, query for all of their posts, and follow any posts that are replies in order to collect conversations. Because replies are explicitly linked, there is no need for disentanglement [Elsner & Charniak 2008].
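A minimal sketch of the crawl, assuming hypothetical wrappers `fetch_public_timeline()` and `fetch_user_posts(user)` around the (since-retired) public timeline and user timeline endpoints, and posts represented as dicts with `id`, `user`, and `in_reply_to` fields:

```python
def collect_conversations(fetch_public_timeline, fetch_user_posts):
    """Sample random users from the public timeline, fetch all of their
    posts, and chain replies together into conversations."""
    posts = {}
    for seed in fetch_public_timeline():          # ~20 random posts per call
        for p in fetch_user_posts(seed["user"]):
            posts[p["id"]] = p

    replied_to = {p["in_reply_to"] for p in posts.values() if p["in_reply_to"]}
    conversations = []
    for p in posts.values():
        # Start only from leaves (replies that nothing else replies to),
        # so each conversation is collected once, at its full length.
        if p["in_reply_to"] is None or p["id"] in replied_to:
            continue
        chain = [p]
        while chain[-1]["in_reply_to"] in posts:
            chain.append(posts[chain[-1]["in_reply_to"]])
        chain.reverse()                           # root status post first
        if len(chain) >= 2:
            conversations.append(chain)
    return conversations
```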

Conversation length (number of tweets) vs. frequency:

Length   Frequency
  2        952,603
  3        214,787
  4         97,533
  5         43,447
  6         26,349
  7         14,432
  8         10,277
  9          6,180
 10          4,614
 11          3,004
 12          2,450
 13          1,764
 14          1,377
 15          1,000
 16            888
 17            652
 18            541
  …              …

Modeling Latent Structure: Dialogue Acts
"I'm going to the beach this weekend! Woo! And I'll be there until Tuesday. Life is good." [Status]
"Enjoy the beach! Hope you have great weather!" [Comment]
"thank you :)" [Thanks]
Our goal is to model latent structure in conversations, traditionally referred to as "Dialogue Acts" or "Speech Acts": tag each utterance with a label describing its role in the conversation.

Dialogue Acts: Many Useful Applications  Conversational Agents [Wilks 2006], Dialogue Systems [Allen et al. 2007], Dialogue Summarization [Murray et al. 2006], Flirtation Detection [Ranganath et al. 2009], … Dialogue acts provide an initial level of semantic representation and have been shown useful in many applications: conversational agents and dialogue systems need to know whether the user is asking a question or giving a command; dialogue acts have also been shown useful in summarization; and flirtation detection is maybe an interesting application to Twitter.

Traditional Approaches  Gather a corpus of conversations, focusing on speech data: telephone conversations [Jurafsky et al. 1997], meetings [Dhillon et al. 2004] [Carletta et al. 2006]. Write annotation guidelines, then label manually: expensive. Traditionally, dialogue act taggers have been built using the "annotate, train, test" paradigm, which requires lots of very expensive manual labeling, and have focused on conversational speech: telephone conversations and recorded meetings.

Dialogue Acts for Internet Conversations  Lots of variety: email [Cohen et al. 2004], internet forums [Jeong et al. 2009], IRC, Facebook, Twitter, … More on the horizon? Tag sets from speech data are not always appropriate: they include Backchannel, Disruption, and Floor-grabber, but are missing Meeting Request, Status Post, etc. Given the much wider variety of conversations on the Internet, it is difficult to design a set of domain-independent dialogue acts (something is always missing for some domain).

Our Contributions  Unsupervised tagging of dialogue acts: first application to the open domain. Dialogue modeling on Twitter: potential for new language technology applications, with lots of data available. We release a dataset of Twitter conversations, collected over several months: http://research.microsoft.com/en-us/downloads/8f8d5323-0732-4ba0-8c6d-a5304967cc3f/default.aspx To address these challenges we have been investigating unsupervised dialogue act tagging, which automatically discovers the dialogue acts that best describe the domain. This is the first application to "open domain" conversations, where users can talk about any topic, and there is a strong tendency to cluster together posts on the same topic.

Modeling Latent Structure: Dialogue Acts  [same example conversation: Status → Comment → Thanks] The transitions between dialogue acts are predictable.

Discourse Constraints  [same example conversation] Predictable transitions: a Comment tends to follow a Status, and a Thanks tends to follow a Comment.

Words Indicate the Dialogue Act  [same example conversation, latent-structure labels animated in] The words of each utterance signal its dialogue act.

Conversation-Specific Topic Words  [same example conversation] Some words (e.g. "beach", "weather") are specific to the topic of this particular conversation rather than to any dialogue act.

Content Modeling [Barzilay & Lee 2004]  From summarization: model the order of events in news articles on very specific topics, e.g. earthquakes. Modeling is at the sentence level: HMM states emit whole sentences (example states: where/when, Richter scale, damage), and parameters are learned with EM. The content modeling work from summarization looks pretty close to what we want: modeling news events in news articles.

Adapt CM to Unsupervised DA Tagging  Adapt the content modeling approach to unsupervised dialogue act tagging by renaming the hidden states "acts". This is just a Hidden Markov Model where each hidden state generates a bag of words, as opposed to a single word as in an HMM POS tagger.
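A generative sketch of this bag-of-words HMM, under assumed parameter names (`start`, `trans`, and `emit` are illustrative, not the paper's notation): each hidden act emits a whole utterance rather than a single word.

```python
import random

def sample_from(dist):
    """Draw one outcome from a {outcome: probability} dictionary."""
    r, cumulative = random.random(), 0.0
    for outcome, p in dist.items():
        cumulative += p
        if r < cumulative:
            return outcome
    return outcome  # guard against floating-point rounding

def sample_conversation(start, trans, emit, utterance_lengths):
    """Generate one conversation from the content-model HMM."""
    act = sample_from(start)                    # initial dialogue act
    conversation = []
    for n in utterance_lengths:
        words = [sample_from(emit[act]) for _ in range(n)]  # bag of words
        conversation.append((act, words))
        act = sample_from(trans[act])           # Markov transition to next act
    return conversation
```

Training with EM then fits `start`, `trans`, and the per-act `emit` distributions to maximize the likelihood of observed conversations, with the acts left hidden.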

Content Models: Pros/Cons  Pros: models transitions between dialogue acts; unsupervised. Cons: topic clusters. Barzilay & Lee masked out named entities to focus on general (not specific) events; this is more difficult on Twitter. Content modeling is attractive because it is unsupervised and models discourse constraints; however, when directly applied it tends to discover clusters of posts that share a common topic rather than any particular dialogue act. Conversations on Twitter can occur on any topic, so there are strong clusters of topically related posts.

Problem: Strong Topic Clusters  We want states like Status, Comment, Thanks; not states like Technology, Food. That is, we want a set of dialogue acts that gives a shallow semantic representation of the conversation, not a set of topic clusters that self-transition with high probability and partition conversations by topic.

Goal: Separate Content and Dialogue Words  Separate dialogue-act words from conversation-specific words with an LDA-style topic model in which each word is generated from one of three sources: general English, conversation-topic-specific vocabulary, or dialogue-act-specific vocabulary. Similar to [Daumé III and Marcu 2006] [Haghighi & Vanderwende 2009].

Conversation Model Just a reminder, the content-modeling framework is a sentence-level HMM where the hidden states emit a bag of words.

Conversation+Topic Model  We propose to add a set of hidden variables, one per word, that determine the source from which each word is drawn: a multinomial over words representing the topic of the conversation, a general English multinomial, or a multinomial associated with the dialogue act.
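A sketch of the per-word generative story this adds, reusing `sample_from` from the previous sketch (the names `source_dist`, `act_words`, `conv_topic`, and `english` are illustrative, not the paper's notation):

```python
def generate_utterance(act, length, source_dist, act_words, conv_topic, english):
    """Each word first picks a source, then is drawn from that
    source's word multinomial."""
    words = []
    for _ in range(length):
        source = sample_from(source_dist)       # 'act' | 'topic' | 'english'
        if source == 'act':
            words.append(sample_from(act_words[act]))  # dialogue-act words
        elif source == 'topic':
            words.append(sample_from(conv_topic))      # this conversation's topic
        else:
            words.append(sample_from(english))         # general English
    return words
```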

Inference  Collapsed Gibbs sampling: sample each hidden variable conditioned on the assignments of all the others, integrating out the parameters. But there are lots of hyperparameters to set: the act transition multinomial, the act emission multinomials, the doc-specific multinomials, the English multinomial, and the source distribution multinomial. EM becomes problematic in this model because we cannot efficiently sum out the hidden variables using dynamic programming, so we moved to a sampling-based approach (collapsed Gibbs). And rather than set the many hyperparameters to arbitrary values, we slice sample them [Neal 2003], treating them like additional hidden variables.
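A simplified sketch of one collapsed Gibbs update, for a single word's source indicator (the count tables and the symmetric priors `alpha` and `beta` are illustrative; the full sampler also resamples each utterance's act, and all counts exclude the variable currently being resampled):

```python
def resample_source(word, act, conv, counts, alpha, beta, vocab_size):
    """P(source | everything else) is proportional to
    P(source) * P(word | source), with parameters integrated out."""
    weights = {}
    for s in ('act', 'topic', 'english'):
        prior = counts.source[s] + alpha
        if s == 'act':
            like = (counts.act_word[act][word] + beta) / \
                   (counts.act_total[act] + beta * vocab_size)
        elif s == 'topic':
            like = (counts.conv_word[conv][word] + beta) / \
                   (counts.conv_total[conv] + beta * vocab_size)
        else:
            like = (counts.eng_word[word] + beta) / \
                   (counts.eng_total + beta * vocab_size)
        weights[s] = prior * like
    total = sum(weights.values())
    return sample_from({s: w / total for s, w in weights.items()})
```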

Probability Estimation  Problem: we need to evaluate the probability of a conversation, integrating out the hidden variables (e.g. to use the model as a language model). Because we cannot do efficient dynamic programming, we fall back to an approximation to evaluate the probability of held-out test data: a Chib-style estimator [Wallach et al. 2009] [Murray & Salakhutdinov 2009]. This is a more accurate alternative to things like the harmonic mean method. We have some data and hidden variables; pick an arbitrary assignment to the hidden variables; we can easily compute the joint probability of the data and hidden variables; this reduces the problem to computing the conditional probability of the hidden variables given the data, which can be done using sampling.

Chib-Style Estimator [Wallach et al. 2009] [Murray & Salakhutdinov 2009]  We just need to estimate:
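The equation itself is an image on the original slide; reconstructed from the description above (for observed words $\mathbf{w}$ and an arbitrary assignment $\mathbf{z}^*$ of the hidden variables), the identity the estimator rests on is:

```latex
p(\mathbf{w}) = \frac{p(\mathbf{w}, \mathbf{z}^*)}{p(\mathbf{z}^* \mid \mathbf{w})}
\quad\Longleftrightarrow\quad
\log p(\mathbf{w}) = \log p(\mathbf{w}, \mathbf{z}^*) - \log p(\mathbf{z}^* \mid \mathbf{w})
```

The joint $p(\mathbf{w}, \mathbf{z}^*)$ is easy to compute exactly, so estimating $\log p(\mathbf{w})$ reduces to estimating $p(\mathbf{z}^* \mid \mathbf{w})$ by sampling.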

Qualitative Evaluation  Trained on 10,000 Twitter conversations of 3 to 6 tweets each.

Conversation+Topic Model: Dialogue Act Transitions  The Conversation+Topic model discovered the most interpretable set of dialogue acts, and didn't include any "topic" clusters. [Visualization of the transition matrix from the HMM; hidden states are labeled with my interpretation of the acts.]

Status  Word clouds represent the dialogue acts. Each word's size is proportional to its probability given the act.

Question

Question to Followers

Reference Broadcast

Reaction

Evaluation  How do we know which model works best? By how well it can predict sentence order: generate all permutations of a conversation, compute the probability of each, and measure how similar the highest-ranked permutation is to the original order. We measure permutation similarity with Kendall's Tau, which counts the number of swaps needed to reach the desired order. Evaluation is difficult: we don't have a set of gold-labeled conversations, and we want to remain agnostic about the right set of dialogue acts, allowing each model to discover its own. Tau is based on the number of exchanged pairs and ranges from -1 to +1: the exact reverse order scores -1, the correct order scores +1.
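A minimal sketch of this evaluation (brute force over permutations, which is feasible only because the conversations here are short; `log_prob` stands in for whichever model's conversation probability is being compared):

```python
from itertools import permutations

def kendall_tau(order):
    """Kendall's tau between a predicted order and the true order 0..n-1:
    +1 for the correct order, -1 for the exact reverse."""
    n = len(order)
    discordant = sum(1 for i in range(n) for j in range(i + 1, n)
                     if order[i] > order[j])
    return 1.0 - 4.0 * discordant / (n * (n - 1))

def ordering_score(sentences, log_prob):
    """Rank all permutations of a conversation by model probability and
    score how similar the best-scoring one is to the original order."""
    best = max(permutations(range(len(sentences))),
               key=lambda perm: log_prob([sentences[i] for i in perm]))
    return kendall_tau(best)
```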

Experiments – Conversation Ordering  The EM-trained Conversation Model does much better than a simple bigram LM baseline.

Experiments – Conversation Ordering  The Conversation+Topic model does better still, though these models use different inference methods.

Experiments – Conversation Ordering  We also implemented a sampler for the Conversation Model, and it performs a bit better, but its acts include the topic clusters we wanted to avoid. Content words help predict sentence order: adjacent sentences contain similar content words, which provides useful constraints for sentence ordering.

Conclusions  Presented a corpus of Twitter conversations: http://research.microsoft.com/en-us/downloads/8f8d5323-0732-4ba0-8c6d-a5304967cc3f/default.aspx Conversations on Twitter seem to have some common structure. Strong topic clusters are a problem for open-domain unsupervised DA tagging; we presented an approach to address this. Gibbs sampling / full Bayesian inference seems to outperform EM on conversation ordering.