Approaches for Automatically Tagging Affect
Nathanael Chambers, Joel Tetreault, James Allen
University of Rochester, Department of Computer Science


Affective Computing

Why use computers to detect affect?
–Make human-computer interaction more natural: computers can express emotion, detect the user's emotion, and tailor responses to the situation
–Use affect for text summarization
Understanding affect improves human-computer interaction systems.

From the Psychologist's Point of View

If computers can detect affect, they can also help humans understand affect. By observing the changes in emotion and attitude in people conversing, psychologists can determine appropriate treatments for patients.

Marriage Counseling

Emotion and communication are important to mental and physical health. Psychological theories suggest that how well a couple copes with a serious illness is related to how well they interact to deal with it. Poor interactions (e.g., disengagement during conversations) can at times exacerbate an illness. The hypothesis was tested by observing the engagement levels of conversations between married couples presented with a task.

Example Interactions

Good interaction sequence:
W: Well I guess we'd just have to develop a plan wouldn't we?
H: And we would be just more watchful or plan or maybe not, or be together more when the other one went to do something
W: In other words going together
H: Going together more
W: That's right. And working more closely together and like you say, doing things more closely together. And I think we certainly would want to share with the family openly what we felt was going on so we could kind of work out family plans

Poor interaction sequence:
W: So how would you deal with that?
H: I don't know. I'd probably try to help. And you know, go with you or do things like that if I, if I could. And you know, I don't know. I would try to do the best I could to help you

Testing the Theory

1. Record and transcribe conversations of married couples presented with a "what-if" scenario of one of them having Alzheimer's; participants are asked to discuss how they would deal with the illness.
2. Tag the sentences of the transcripts with affect-related codes; certain textual patterns evoke negative or positive connotations.
3. Use the distribution of tags to look for correlations between communication and marital satisfaction.
4. Use the tag distribution to decide on a treatment for the couple.

Problem

Tagging (step 2) is time-consuming, requires training time for new annotators, and is unreliable. Solution: use computers to do the tagging work so psychologists can spend more time with patients and less time coding.

Goals

–Develop algorithms to automatically tag transcripts of a Marriage Counseling Corpus (Shields, 1997)
–Develop a tool that human annotators can use to pre-tag a transcript with the best algorithm and then quickly correct it

Outline

Background
Marriage Counseling Corpus
N-gram based approaches
Information-Retrieval/Call-Routing approaches
Results
CATS Tool

Background

Affective computing, or detecting emotion in texts or from a user, is a young field.
–Earliest approaches used keyword matching
–Tagged dictionaries with grammatical features (Boucouvalas and Ze, 2002)
–Statistical methods: LSA (Webmind project), TSB (Wu et al., 2000) to tag a dialogue
–Liu et al. (2003) use common-sense rules to detect emotion in e-mails

New Methods for Tagging Affect

Our approaches differ from others in two ways:
–Use different statistical methods based on computing N-grams
–Tag individual sentences as opposed to discourse chunks
Our approaches are based on methods that have been successful in another domain: discourse act tagging.

Marriage Counseling Corpus

45 annotated transcripts of married couples working on a task about Alzheimer's, collected by psychologists at the Center for Future Health, Rochester, NY. Transcripts are broken into "thought units" – one or more sentences that represent how the speaker feels toward a topic (4,040 total). Tagging thought units takes into account positive and negative words, level of detail, sensitivity, and comments on health, family, travel, etc.

Code Tags

DTL – "Detail" (11.2%): speaker's verbal content is concise and distinct with regard to illness, emotions, or dealing with death:
–"It would be hard for me to see you so helpless"
GEN – "General" (41.6%): verbal content toward illness is vague or generic, or the speaker does not take ownership of emotions:
–"I think that it would be important"

Code Tags

SAT – "Statements About the Task" (7.2%): couple discusses what the task is and how to perform it:
–"I thought I would be the caregiver"
TNG – "Tangent" (2.9%): statements that are way off topic
ACK – "Acknowledgments" (22.8%) of the other speaker's comments:
–"Yeah", "right"

N-Gram Based Approaches

n-gram: a sequential list of n words, used to encode the likelihood that the phrase will appear in the future. Involves splitting a sentence into chunks of n consecutive words.

"I don't know what to say"
1-gram (unigram): I, don't, know, what, to, say
2-gram (bigram): I don't, don't know, know what, what to, to say
3-gram (trigram): I don't know, don't know what, know what to, …
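The splitting step above can be sketched in a few lines of Python (a minimal illustration; the `ngrams` helper is ours, not part of the system described):

```python
def ngrams(tokens, n):
    """Return the list of n-grams (tuples of n consecutive tokens)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

sent = "I don't know what to say".split()
print(ngrams(sent, 2))  # bigrams: ('I', "don't"), ("don't", 'know'), ...
```

A sentence of m tokens yields m - n + 1 n-grams, so the six-word example has six unigrams, five bigrams, and four trigrams.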

Frequency Table (Training)

Rows are n-grams ("I don't want to be", "Don't want to be", …, "I", "Yeah"); columns are tags (GEN, DTL, ACK, SAT). Each entry: the probability that the n-gram is labeled with that tag.

N-Gram Motivation

Advantages:
–Encode not just keywords but also word ordering, automatically
–Models are not biased by hand-coded lists of words, but are completely dependent on real data
–Learning features of each affect type is relatively fast and easy

Disadvantages:
–Long-range dependencies are not captured
–Dependent on having a corpus of data to train from; sparse data for low-frequency affect tags adversely affects the quality of the n-gram model

Naïve Approach

P(tag_i | utt) = max_{j,k} P(tag_i | ngram_jk)

where tag_i is one of {GEN, DTL, ACK, SAT, TNG} and ngram_jk is the j-th n-gram of length k. For all n-grams in a thought unit, find the one with the highest probability for a given tag, and select that tag.

Naïve Approach Example

"I don't want to be chained to a wall."

k  Tag  Top n-gram           Probability
1  GEN  don't
   GEN  to a
   GEN  I don't
   DTL  don't want to be
   DTL  I don't want to be   1.00
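A minimal sketch of this decision rule, assuming a table mapping n-grams to tag probabilities (the toy table and its probabilities below are hypothetical, not the trained model):

```python
def naive_tag(tokens, table, max_n=5):
    """Return (tag, prob) for the single n-gram with the highest P(tag | n-gram)."""
    best_tag, best_p = None, -1.0
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            for tag, p in table.get(gram, {}).items():
                if p > best_p:
                    best_tag, best_p = tag, p
    return best_tag, best_p

# toy frequency table (hypothetical probabilities)
table = {
    ("don't",): {"GEN": 0.6},
    ("I", "don't", "want", "to", "be"): {"DTL": 1.0},
}
print(naive_tag("I don't want to be chained to a wall .".split(), table))
# → ('DTL', 1.0)
```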

N-Gram Approaches

Weighted Approach
–Weight the longer n-grams higher in the stochastic model
Lengths Approach
–Include a length-of-utterance factor, capturing the differences in utterance length between affect tags
Weights with Lengths Approach
–Combine Weighted with Lengths
Repetition Approach
–Combine all of the above information with the overlap of words between thought units

Repetition Approach

Many acknowledgment (ACK) utterances were being mistagged as GEN by the previous approaches. Most of the errors came from grounding that involved word repetition:
A – so then you check that your tire is not flat.
B – check the tire
We created a model that takes into account word repetition in adjacent utterances in a dialogue. We also include a length probability to capture the Lengths Approach. Only unigrams are used to avoid sparseness in the training data.
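One simple way to realize the word-repetition cue is an overlap ratio between adjacent utterances (a sketch under our own assumptions; the system's exact repetition feature may be defined differently):

```python
def repetition_ratio(prev_tokens, cur_tokens):
    """Fraction of the current utterance's words that also occur in the previous one."""
    if not cur_tokens:
        return 0.0
    prev = set(prev_tokens)
    return sum(1 for w in cur_tokens if w in prev) / len(cur_tokens)

r = repetition_ratio("so then you check that your tire is not flat".split(),
                     "check the tire".split())
# "check" and "tire" repeat: 2 of the 3 words
```

A high ratio on a short utterance is exactly the grounding pattern that signals ACK.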

IR-Based Approaches

Work is based on the call-routing algorithm of Chu-Carroll and Carpenter (1999). Problem: route a user's call to a financial call center to the correct destination. This is done by converting a query from the user (speech converted to text) into a vector, which is compared with a list of possible destination vectors in a database.

Database Table (Training)

The database holds one vector per tag (GEN, DTL, ACK, SAT), built from n-grams such as "I don't want to be", "Don't want to be", "I", "yeah". A query (thought unit) such as "yeah, that's right" is compared against each tag vector in the database using a cosine comparison.

Database Creation

Construct the database in the same manner as the N-gram frequency table, then normalize it. Filter: Inverse Document Frequency (IDF) lowers the weight of terms that occur in many documents:

IDF(t) = log2(N / d(t))

where d(t) is the number of tags containing n-gram t, and N is the total number of tags.
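The IDF weighting above can be computed directly from the per-tag n-gram sets (the data below is a toy example of ours; `math.log2` supplies the base-2 logarithm in the formula):

```python
import math

def idf(ngram_to_tags, num_tags):
    """IDF(t) = log2(N / d(t)), where d(t) = number of tag models containing t."""
    return {t: math.log2(num_tags / len(tags)) for t, tags in ngram_to_tags.items()}

# toy data: 5 tags total; "yeah" appears under two tag models, "chained" under one
weights = idf({("yeah",): {"ACK", "GEN"}, ("chained",): {"DTL"}}, 5)
# the rare term "chained" receives a higher weight than the common "yeah"
```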

Method 1: Routing-Based Method

Modified the call-routing method with entropy (amount of disorder) to further reduce the contribution of terms that occur frequently. Also created two more terms (rows in the database):
–Sentence length: tags may be correlated with sentences of a certain length
–Repetition: acknowledgments tend to repeat the words stated in the previous thought unit

Method 1: Example

Cosine scores for each tag compared against the query vector for "I don't want to be chained to a wall":
ACK = 0.002
DTL =
GEN =
SAT =
TNG =

Method 2: Direct Comparison

Instead of comparing queries to a normalized database of exemplar documents, compare them to all test sentences. Advantage: no normalizing or construction of documents. The cosine test is used to get the top ten matches; the scores of matches with the same tag are added, and the tag with the highest sum is selected.
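A sketch of the direct-comparison step, assuming simple term-count vectors and cosine similarity (the helper names and the tiny tagged list are ours, for illustration only):

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (dicts)."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def direct_tag(query, tagged_sents, k=10):
    """Tag a query by summing the cosine scores of its top-k tagged sentences per tag."""
    q = Counter(query.split())
    scored = sorted(((cosine(q, Counter(s.split())), tag) for s, tag in tagged_sents),
                    reverse=True)[:k]
    votes = Counter()
    for score, tag in scored:
        votes[tag] += score
    return votes.most_common(1)[0][0]

tagged = [("that sounds good", "GEN"), ("yeah", "ACK"), ("yeah right", "ACK")]
print(direct_tag("yeah that's right", tagged))  # → ACK
```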

Method 2: Example

Cosine Score  Tag  Sentence
0.64          SAT  Are we supposed to get them?
0.60          GEN  That sounds good
0.60          TNG  That's due to my throat
0.56          DTL  But if I said to you I don't want…
0.55          DTL  If it were me, I'd want to be a guinea pig to try things

DTL is selected with a total score of 1.11

Evaluation

Performed six-fold cross-validation over the Marriage Counseling Corpus and the Switchboard Corpus. Averaged the scores from each of the six evaluations.
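The six-fold protocol can be sketched generically: split the data into six folds, hold each fold out once as the test set, and average the scores (the `train_and_eval` callback is a stand-in for training and scoring a tagger; the fold-splitting scheme is our assumption):

```python
def six_fold(items, train_and_eval, k=6):
    """Average score over k folds; each fold is held out once as the test set."""
    folds = [items[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        scores.append(train_and_eval(train, test))
    return sum(scores) / k
```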

Results

6-Fold Cross Validation for N-gram Methods:
Naive                 66.80%
Weighted              67.43%
Lengths               64.35%
Weights with Lengths  66.02%
Repetition            66.60%

6-Fold Cross Validation for IR Methods:
Original               61.37%
Entropy                66.16%
Repetition             66.39%
Length                 66.76%
Repetition and Length
Direct                 63.16%

Discussion

The N-gram approaches do slightly better than IR over the Marriage Counseling Corpus. Incorporating the additional features of sentence length and repetition improves both models. The entropy model is better than IDF in the call-routing system (a 4% boost). Psychologists are currently using the tool to tag their work, and note that the computer sometimes tags better than the human annotators.

CATS

CATS: An Automated Tagging System for affect and other similar information-retrieval tasks.
–Written in Java for cross-platform interoperability
–Implements the Naïve approach with unigrams and bigrams only
–Builds the stochastic models automatically from a tagged corpus, input by the user into the GUI display
–Automatically tags new data using the user's models
–Each tag also receives a confidence score, allowing the user to hand-check the dialogue quickly and with greater confidence

The CATS GUI provides a clear workspace for text and tags. Tagging new data and training on old data is done with a mouse click.

Customizable models are available. Create your own list of tags, provide a training corpus, and build a new model.

Tags are marked with confidence scores based on the probabilistic models.