On the Collective Classification of Email "Speech Acts"
Vitor R. Carvalho & William W. Cohen, Carnegie Mellon University
Learning TFC Meeting, SRI, March 2005

Classifying Email into Acts (from EMNLP-04, "Learning to Classify Email into Speech Acts", Cohen, Carvalho & Mitchell)
An act is described as a verb-noun pair (e.g., propose meeting, request information); not all pairs make sense. A single message may contain multiple acts.
The taxonomy tries to describe commonly observed email behaviors, rather than all possible speech acts in English, and also includes non-linguistic uses of email (e.g., delivery of files).
[Slide figure: the taxonomy of act Verbs and Nouns.]

Idea: Predicting Acts from Surrounding Acts
[Slide figure: an example thread sequence, with each message labeled by its acts (Delivery, Request, Commit, Proposal, ...).]
An act has little or no correlation with the other acts of the same message, but a strong correlation with the acts of the previous and next messages.

Related Work on the Sequential Nature of Negotiations
- Winograd and Flores, 1986: the "Conversation for Action" structure.
- Murakoshi et al., 1999: "Construction of Deliberation Structure in E-mail".

Data: the CSPACE Corpus
Few large, free, natural email corpora are available.
The CSPACE corpus (Kraut & Fussell):
o Emails associated with a semester-long project for Carnegie Mellon MBA students in 1997.
o 15,000 messages from 277 students, divided into 50 teams (4 to 6 students per team).
o Rich in task negotiation.
o More than 1,500 messages (from 4 teams) were labeled in terms of "Speech Acts".
o One of the teams was doubly labeled; inter-annotator agreement ranges from 72% to 83% (Kappa) for the most frequent acts.
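For reference, the Kappa statistic used throughout corrects raw agreement for agreement expected by chance. A minimal sketch for a single binary act label (an illustrative Python helper, not the authors' evaluation code):

    def cohen_kappa(labels_a, labels_b):
        # Cohen's kappa for two sets of 0/1 labels over the same messages.
        # Illustrative only; assumes binary per-act labels.
        n = len(labels_a)
        observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
        # Chance agreement from each labeler's marginal frequency of the act.
        p_a = sum(labels_a) / n
        p_b = sum(labels_b) / n
        expected = p_a * p_b + (1 - p_a) * (1 - p_b)
        return (observed - expected) / (1 - expected)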

Evidence of Sequential Correlation of Acts
[Slide figure: transition diagram for the most common verbs in the CSPACE corpus. It is NOT a probabilistic DFA.]
Act sequence patterns: (Request, Deliver+), (Propose, Commit+, Deliver+), (Propose, Deliver+); the most common act was Deliver.
Less regularity than expected (considering previously proposed deterministic negotiation state diagrams).
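A transition diagram of this kind can be tabulated directly from the labeled threads. A sketch in Python, where the messages layout (msg_id mapped to acts and a parent pointer) is an assumption for illustration only:

    from collections import Counter

    def act_transitions(messages):
        # Count parent-act -> child-act transitions in labeled email threads.
        # messages: dict msg_id -> {"acts": set of act labels,
        #                           "parent": parent msg_id or None} (assumed layout).
        counts = Counter()
        for msg in messages.values():
            parent = messages.get(msg["parent"])
            if parent is None:
                continue
            for prev_act in parent["acts"]:
                for next_act in msg["acts"]:
                    counts[(prev_act, next_act)] += 1
        return counts

Normalizing each parent act's counts gives the transition frequencies behind the diagram.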

Content versus Context
Content: bag-of-words features only. Context: parent and child features only (see table below).
8 MaxEnt classifiers, trained on the 3F2 team data and tested on the 1F3 team dataset.
Only the 1st child message was considered (the vast majority of cases: more than 95%).
[Slide figure: Kappa values on 1F3 using Relational (Context) features and Textual (Content) features; example of a parent message with known acts and a child message with unknown acts.]

Set of Context Features (Relational):
Parent boolean features: Parent_Request, Parent_Deliver, Parent_Commit, Parent_Propose, Parent_Directive, Parent_Commissive, Parent_Meeting, Parent_dData.
Child boolean features: Child_Request, Child_Deliver, Child_Commit, Child_Propose, Child_Directive, Child_Commissive, Child_Meeting, Child_dData.
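These boolean context features can be assembled from the acts currently assigned to a message's parent and first child. A minimal Python sketch (the helper name, act list and data layout are assumptions, not the authors' code):

    ACTS = ["Request", "Deliver", "Commit", "Propose",
            "Directive", "Commissive", "Meeting", "dData"]

    def context_features(parent_acts, child_acts):
        # Boolean relational features, named as in the table above.
        feats = {}
        for act in ACTS:
            feats["Parent_" + act] = act in (parent_acts or set())
            feats["Child_" + act] = act in (child_acts or set())
        return feats

    # Example: the parent message was labeled Propose + Meeting; no child yet.
    print(context_features({"Propose", "Meeting"}, None))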

Collective Classification using Dependency Networks
Dependency networks (DNs) are probabilistic graphical models in which the full joint distribution of the network is approximated with a set of conditional distributions that can be learned independently. The conditional probability distribution of each node is conditioned on its neighboring nodes (its Markov blanket).
No acyclicity constraint; simple parameter estimation; approximate inference (Gibbs sampling).
In this case, the Markov blanket = the parent message and the child message.
(Heckerman et al., JMLR 2000; Neville & Jensen, KDD-MRDM-2003.)

Collective Classification algorithm (based on Dependency Networks Model)
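In outline: bootstrap every message with the content-only classifier, then repeatedly re-classify each message using the current predictions for its Markov blanket (the parent and child messages). Below is a minimal sketch, reusing the hypothetical context_features helper from the earlier sketch and assuming trained classifiers with a .predict() interface; it illustrates the idea and is not the authors' implementation:

    def collective_classify(messages, content_clf, context_clf, iterations=20):
        # messages: dict msg_id -> {"bow": {feature: value}, "parent": id or None,
        #                           "child": id or None}  (assumed layout).
        # content_clf / context_clf: trained classifiers whose .predict(features)
        # returns a set of act labels (hypothetical interface).

        # 1. Bootstrap: content-only (bag-of-words) prediction for every message.
        acts = {mid: content_clf.predict(m["bow"]) for mid, m in messages.items()}

        # 2. Repeatedly re-estimate each message's acts from its Markov blanket
        #    (parent and child messages), as in the Dependency Network model.
        for _ in range(iterations):
            new_acts = {}
            for mid, m in messages.items():
                feats = context_features(acts.get(m["parent"]), acts.get(m["child"]))
                feats.update(m["bow"])  # combine context and content evidence
                new_acts[mid] = context_clf.predict(feats)
            acts = new_acts
        return acts

Replacing the deterministic .predict() call with a draw from the classifier's conditional distribution turns this loop into the Gibbs-sampling inference used with dependency networks.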

Agreement versus Iteration
Kappa versus iteration on the 1F3 team dataset, using classifiers trained on the 3F2 team data.

Leave-One-Team-Out Experiments
4 teams: 1F3 (170 msgs), 2F2 (137 msgs), 3F2 (249 msgs) and 4F4 (165 msgs).
[Slide figure: scatter plot of Kappa values; x-axis = bag-of-words only, y-axis = collective classification results.]
Different teams present different styles of negotiation and task delegation.

Leave-One-Team-Out Experiments
Consistent improvement for the Commissive, Commit and Meet acts.
[Slide figure: Kappa values per act.]

Leave-One-Team-Out Experiments
Deliver and dData performance usually decreases; these acts are associated with data distribution, FYI messages, file sharing, etc.
For the "non-delivery" acts, the improvement in average Kappa is statistically significant (p = 0.01 on a two-tailed t-test).
[Slide figure: Kappa values per act.]
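A two-tailed t-test of this kind can be run, for instance, as a paired test over matched Kappa values (bag-of-words baseline versus collective classification on the same test folds). A minimal SciPy sketch with placeholder numbers; the pairing and the values are assumptions, not the paper's setup or results:

    from scipy import stats

    # Placeholder Kappa values (NOT the reported numbers): baseline bag-of-words
    # vs. collective classification, paired over the same test folds.
    baseline   = [0.50, 0.62, 0.55, 0.58]
    collective = [0.54, 0.66, 0.57, 0.63]

    t_stat, p_value = stats.ttest_rel(collective, baseline)  # two-tailed by default
    print(t_stat, p_value)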

Act-by-Act Comparative Results
Kappa values with and without collective classification, averaged over the four test sets in the leave-one-team-out experiment.

Discussion and Conclusion
- Sequential patterns of acts were observed in the CSPACE corpus.
- These patterns, when studied in an artificial experiment, were shown to contain information valuable to the email-act classification problem.
- Different teams present different styles of negotiation and task delegation.
- We proposed a collective classification scheme for the speech acts of email messages, based on a Dependency Network model.

Conclusion
- Modest improvements over the baseline (bag of words) were observed on acts related to negotiation (Request, Commit, Propose, Meet, etc.). A performance deterioration was observed for Delivery/dData (acts less associated with negotiation).
- This agrees with the general intuition about the sequential nature of negotiation steps.
- The degree of linkage in our dataset is small, which makes the observed results encouraging.