The Role and Identification of Dialog Acts in Online Chat
AAAI-11 Workshop on Analyzing Microtext
August 8, 2011
Tamitha Carpenter, Emi Fujioka
Stottler Henke Associates Inc
NE 45th St., Suite 310, Seattle, WA
Overview
Problem: Analyze task-supporting chat to enable situation awareness processing
Domain: Software development
Corpus: 1111 messages, collected from an IRC chat room over a 6-week period
Approach
Chat-IE – Context-aware, event-driven collection of experts
–Includes tokenizer, POS tagger, dialog act type identifiers, and dialog pattern matcher
Software Development Team (diagram)
Input chat messages:
–I've finished one task (in review now) and one review
–what defect is it?
–meeting tomorrow at noon to discuss ideas on how to do this.
–so how do you know how to read the value if the file hasn't changes?
–changed
Processing components: Domain Term Recognition, Shallow Parsing, Historical Phrase Matching, Dialog Act Splitting/Merging, Fragment Tagger
Output messages (the correction has been merged into the message it corrects):
–so how do you know how to read the value if the file hasn't changed?
–I've finished one task (in review now) and one review
–what defect is it?
–meeting tomorrow at noon to discuss ideas on how to do this.
Dialog act labels shown: Directive, Action, Wh-question
Context: Source Code, Bug Tracking, Wiki Pages
Dialog Act Types, most common first
statement non opinion, statement opinion, action description, yes no question,
action directive, commit, agree accept, other,
wh question, thanking, affirmative answer, completion,
declarative y/n question, hmm, response acknowledge, apology,
appreciation, negative answer, offer, correction,
hedge, maybe accept part, open question, reject,
hold before agreement, other answer, summarize restate, rhetorical question,
conventional closing, quotation, downplayer, option or clause,
self talk, abandoned, ack backchannel (mm hmm), attention,
backchannel question, conventional opening, declarative wh question, repeat phrase,
signal non understanding, tag question
Callouts:
–Completion: most commonly self-completion. Example:
Speaker1: I’m working on defect 567
Speaker1: I meant 568
–Attention: for messages directed at a specific person
–Action description: describes ongoing and completed activities
Uses
Triage – Identify critical events mid-conversation
Threading – Use patterns of dialogs to detangle multiple conversations
Filtering – Direct topically relevant conversations to interested users
Extraction – Use sequences of dialog act types to structure IE rules
Dialog Act Identification (1)
Historical Phrase Matching
Identify dialog act types based on past messages
–Raw text
–Text tagged with parts of speech
Uses a variation of a String B-tree for fast matching over a large corpus
Obtained about 60% accuracy on common dialog act types
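The idea of labeling a new message from similar past messages can be sketched as below. This is a minimal illustration under stated assumptions, not the Chat-IE implementation: it swaps the String B-tree for a sorted list searched with `bisect`, matches only on the longest shared word prefix of the raw text, and the class name `HistoricalMatcher`, the `min_tokens` threshold, and the sample history are all hypothetical.

```python
from bisect import bisect_left


def common_prefix_len(a, b):
    """Number of leading tokens shared by two token tuples."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class HistoricalMatcher:
    """Assumed stand-in for the String B-tree: a sorted list of past messages."""

    def __init__(self, labeled_messages):
        # Sorting tokenized messages puts messages with a shared prefix next to each other.
        self.entries = sorted(
            (tuple(text.lower().split()), label) for text, label in labeled_messages
        )
        self.keys = [tokens for tokens, _ in self.entries]

    def predict(self, message, min_tokens=2):
        query = tuple(message.lower().split())
        pos = bisect_left(self.keys, query)
        best_label, best_len = None, 0
        # In sorted order, the longest shared prefix is with a neighbor of the insertion point.
        for i in (pos - 1, pos):
            if 0 <= i < len(self.entries):
                tokens, label = self.entries[i]
                n = common_prefix_len(query, tokens)
                if n > best_len:
                    best_label, best_len = label, n
        # Refuse to guess when too few leading words match (hypothetical threshold).
        return best_label if best_len >= min_tokens else None


history = [
    ("what defect is it", "wh-question"),
    ("i will fix it tomorrow", "commit"),
    ("i will look at the build", "commit"),
    ("thanks for the help", "thanking"),
]
matcher = HistoricalMatcher(history)
```

With this history, `matcher.predict("what defect is that")` is labeled from the stored wh-question, while a message that shares no leading words with any past message yields `None`. The same structure could be built over POS-tagged text instead of raw tokens.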
Dialog Act Identification (2)
Boosted performance to near 90% accuracy
Example rules:
–Wh-questions – Messages starting with wh-words (what, which, why, etc.).
–Statement-opinion – Messages containing one of: “might”, “maybe”, “should”, “seems”, “i think”, “looks like”, “look like”, “probably”, or “i'm sure”.
–Action-directive – Messages starting with infinitive verbs.
–Action-description – Messages starting with “i”, “i just”, “i have”, “i’m”, etc., followed by a past tense or “-ing” verb.
–Commit – Messages starting with “i will”, “i’ll”, “i’m going to”, or “i am going to”. Also, messages starting with “will” followed by an infinitive verb.
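The example rules above can be sketched as a small rule cascade. Several parts are assumptions rather than details from the slides: the order in which rules fire (commit is tested before action-description so that "i'm going to" is not read as an "-ing" description), the extra wh-words and the "i've" prefix filling in the "etc.", the tiny `BASE_VERBS` lexicon standing in for a POS tagger, and the crude "-ed"/"-ing" suffix test for verb form.

```python
import re

# "what, which, why" come from the rules; the rest are assumed expansions of "etc."
WH_WORDS = {"what", "which", "why", "who", "whose", "when", "where", "how"}
OPINION_CUES = ("might", "maybe", "should", "seems", "i think", "looks like",
                "look like", "probably", "i'm sure")
# Hypothetical stand-in for POS tagging: a tiny list of base-form verbs.
BASE_VERBS = {"check", "fix", "look", "review", "run", "send", "update", "use"}

COMMIT_RE = re.compile(r"^(?:i will|i'll|i'm going to|i am going to)\b")
# Crude verb-form test: the word after the prefix ends in "ed" or "ing".
ACTION_DESC_RE = re.compile(r"^(?:i just|i have|i've|i'm|i am|i)\s+\w+(?:ed|ing)\b")
OPINION_RE = re.compile(r"\b(?:" + "|".join(re.escape(c) for c in OPINION_CUES) + r")\b")


def classify(message):
    """Return a dialog act type for the message, or None if no rule fires."""
    # Normalize case and curly apostrophes so "I’ll" matches "i'll".
    text = message.lower().replace("\u2019", "'").strip()
    words = re.findall(r"[a-z']+", text)
    if not words:
        return None
    if words[0] in WH_WORDS:
        return "wh-question"
    if COMMIT_RE.match(text):
        return "commit"
    if words[0] == "will" and len(words) > 1 and words[1] in BASE_VERBS:
        return "commit"
    if ACTION_DESC_RE.match(text):
        return "action-description"
    if words[0] in BASE_VERBS:
        return "action-directive"
    if OPINION_RE.search(text):
        return "statement-opinion"
    return None
```

For instance, `classify("I've finished one task (in review now)")` falls through to the action-description rule, and `classify("check the build log")` to action-directive. The suffix test misfires on words such as "need" and misses irregular past forms, which is where real POS tags would be needed.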
Dialog Patterns
Status updates – An action-directive or wh-question, followed by any number of action-descriptions.
Directed request with acknowledge – An attention followed by any number of utterances, followed by a response-acknowledge by the person mentioned in the first utterance.
Confirmed expertise (1) – An action-description followed by a thanking or a response-acknowledge (preferably mentioning the initial speaker). (First speaker demonstrated expertise.)
Confirmed expertise (2) – A yes-no-question or wh-question followed by a describe-other. (Second speaker demonstrated expertise.)
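One way to match such patterns is to encode each dialog act type as a single character and search the encoded sequence with regular expressions. The sketch below covers only two of the patterns and ignores speakers and mentions. The one-letter codes, the pattern names, and the reading of "any number" as "one or more" are assumptions made for illustration, not the Chat-IE pattern matcher.

```python
import re

# Hypothetical one-letter codes so a tag sequence can be searched with a regex.
CODES = {
    "action-directive": "D",
    "wh-question": "W",
    "action-description": "A",
    "thanking": "T",
    "response-acknowledge": "R",
}

PATTERNS = {
    # "Any number" is read here as "one or more" so a lone question does not match.
    "status-update": re.compile(r"[DW]A+"),
    "confirmed-expertise-1": re.compile(r"A[TR]"),
}


def find_patterns(dialog_acts):
    """Return (pattern name, start, end) for each match; end is exclusive."""
    # Types without a code become "?" so they break up patterns without matching any.
    encoded = "".join(CODES.get(act, "?") for act in dialog_acts)
    hits = []
    for name, regex in PATTERNS.items():
        for m in regex.finditer(encoded):
            hits.append((name, m.start(), m.end()))
    return sorted(hits, key=lambda hit: hit[1])


acts = ["wh-question", "action-description", "action-description",
        "thanking", "statement-opinion"]
hits = find_patterns(acts)
```

Here the wh-question and the two action-descriptions form a status update, and the last action-description plus the thanking forms a confirmed-expertise match. Patterns that depend on who is speaking or who is mentioned would need the speaker carried alongside each tag.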
Lessons Learned
Users have very specific needs for chat analysis.
–Filter chat dialogs and messages/threads into topics or “bins”.
–Monitor chat rooms for triggering events.
Everything hinges on the tokenizer.
–Users combine characters in novel ways (e.g., ?!?!, :-), etc.)
–Domains may have special tokens (e.g., “/usr/bin/chatLogs”, “65.4N”).
Partial dialogs may need to be retired without being “finished”.
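The tokenizer lesson above can be illustrated with a regex tokenizer that tries special token shapes before ordinary words. This is a rough sketch, not the Chat-IE tokenizer: the token classes, their order, and the patterns themselves are assumptions chosen to cover the examples on this slide.

```python
import re

TOKEN_RE = re.compile(
    r"""
    (?P<path>(?<![\w/])/(?:[\w.-]+/)*[\w.-]+)   # Unix-style paths such as /usr/bin/chatLogs
  | (?P<coord>\d+(?:\.\d+)?[NSEW]\b)            # coordinates such as 65.4N
  | (?P<emoticon>[:;=][-^]?[)(DPp/\\|])         # simple emoticons such as :-)
  | (?P<punct>[?!.,]+)                          # punctuation runs such as ?!?!
  | (?P<word>\w+(?:'\w+)*)                      # words, including contractions
  | (?P<other>\S)                               # any other single visible character
    """,
    re.VERBOSE,
)


def tokenize(message):
    """Return (token class, token text) pairs; whitespace is skipped."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(message)]


tokens = tokenize("logs are in /usr/bin/chatLogs ?!?! :-)")
```

Because the alternatives are tried in order, the path and the coordinate survive as single tokens instead of being split at slashes and periods. A domain with other special tokens would add its own alternatives ahead of the generic word pattern.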