Dialogue Act Tagging Discourse and Dialogue CMSC November 4, 2004
Roadmap Maptask overview Coding –Transactions –Games –Moves Assessing agreement
Maptask Conducted by HCRC – Edinburgh/Glasgow Task structure: –2 participants: Giver, follower –2 slightly different maps Giver guides follower to destination on own map –Forces interaction, ambiguities, disagreements, etc –Conditions: Familiar/not; Visible/not
Dialogue Tagging Goal: Represent dialogue structure as generically as possible Three-level scheme: –Transactions Major subtasks in participants' overall task –Conversational Games Correspond to G&S discourse segments –Conversational Moves Initiation and response steps
Basic Dialogue Moves Initiations and responses Cover acts observed in dialogue – generalized Initiations: Instruct: tell to carry out some action; Explain: give unelicited information; Check: ask for confirmation; Align: check attention; Query-yn; Query-wh Responses: Acknowledge: signal understanding and acceptance; Reply-y; Reply-n; Reply-wh; Clarify Ready: inter-game move
Game Coding Initiation: –Identified by first move Purpose – carry through to completion –May embed other games – Mark level –Mark completion/abandonment
Interrater Agreement How good is tagging? A tagset? Criterion: How accurate/consistent is it? Stability: –Is the same rater self-consistent? Reproducibility: –Do multiple annotators agree with each other? Accuracy: –How well do coders agree with some “gold standard”?
Agreement Measure Krippendorff's Kappa (K) –Applies to classification into discrete categories –Corrects for chance agreement K < 0: agreement is lower than expected by chance –Quality intervals: K >= 0.8: Very good; 0.6 < K < 0.8: Good, etc Maptask: K = 0.92 on segmentation, K = 0.83 on move labels
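A minimal sketch of a chance-corrected agreement score for two coders, K = (P_A - P_E) / (1 - P_E). It assumes the two-coder form in which expected agreement P_E comes from each coder's own label distribution; the slides do not say which variant was used for Maptask, and the labels below are made-up illustrations.

```python
from collections import Counter

def kappa(labels_a, labels_b):
    """Chance-corrected agreement K = (P_A - P_E) / (1 - P_E) for two coders."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both coders.
    p_a = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement: chance that both coders pick the same category,
    # using each coder's own label distribution (an assumption of this sketch).
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_e = sum((counts_a[c] / n) * (counts_b[c] / n) for c in counts_a)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_a - p_e) / (1 - p_e)

# Hypothetical move labels from two coders on four utterances.
coder_a = ["instruct", "instruct", "check", "acknowledge"]
coder_b = ["instruct", "check", "check", "acknowledge"]
print(round(kappa(coder_a, coder_b), 3))
```

Here the coders agree on 3 of 4 items (P_A = 0.75), but chance alone would give P_E = 0.3125, so K is noticeably lower than raw agreement.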
Dialogue Act Tagging Other tagsets –DAMSL, SWBD-DAMSL, VERBMOBIL, etc Many common move types –Vary in granularity Number of moves, types Assignment of multiple moves
Dialogue Act Recognition Goal: Identify dialogue act tag(s) from surface form Challenge: Surface form can be ambiguous –“Can you X?” – yes/no question, or info-request “Flying on the 11th, at what time?” – check, statement Requires interpretation by hearer –Strategies: Plan inference, cue recognition
Plan-inference-based Classic AI (BDI) planning framework –Model Belief, Knowledge, Desire Formal definition with predicate calculus –Axiomatization of plans and actions as well –STRIPS-style: Preconditions, Effects, Body –Rules for plan inference Elegant, but.. –Labor-intensive rule, KB, heuristic development –Effectively AI-complete
Cue-based Interpretation Employs sets of features to identify –Words and collocations: Please -> request –Prosody: Rising pitch -> yes/no question –Conversational structure: prior act Example: Check: Syntax: tag question “, right?” Syntax + prosody: Fragment with rise N-gram: argmax_d P(d) P(W|d) –So you, sounds like, etc Details later …
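The N-gram cue can be sketched as picking the dialogue act d that maximizes P(d) P(W|d). The version below is only an illustration: it assumes a unigram word model with add-one smoothing, and the training utterances are invented, not taken from any corpus.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (dialogue_act, list_of_words) pairs."""
    act_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for act, words in examples:
        act_counts[act] += 1
        word_counts[act].update(words)
        vocab.update(words)
    return act_counts, word_counts, vocab

def classify(words, model):
    """Return argmax_d P(d) P(W|d), computed in log space."""
    act_counts, word_counts, vocab = model
    total = sum(act_counts.values())
    best_act, best_score = None, -math.inf
    for act in act_counts:
        score = math.log(act_counts[act] / total)  # log P(d)
        # Add-one smoothing; the extra +1 reserves mass for unseen words.
        denom = sum(word_counts[act].values()) + len(vocab) + 1
        for w in words:
            score += math.log((word_counts[act][w] + 1) / denom)  # log P(w|d)
        if score > best_score:
            best_act, best_score = act, score
    return best_act

# Toy, made-up training data for two move types.
examples = [
    ("check", "so you go left right".split()),
    ("check", "so you mean the bridge".split()),
    ("instruct", "go left at the bridge".split()),
    ("instruct", "please go down to the lake".split()),
]
model = train(examples)
print(classify("so you turn left".split(), model))
```

The cue words "so you" pull the test utterance toward check even though "left" also appears in instruct examples.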
Recognizing Maptask Acts Assume: –Word-level transcription –Segmentation into utterances –Ground truth DA tags Goal: Train classifier for DA tagging –Exploit: Lexical and prosodic cues Sequential dependencies between DAs –14810 utterances, 13 classes
Features for Classification Acoustic-Prosodic Features: –Pitch, Energy, Duration, Speaking rate Raw and normalized, whole utterance, last 300ms 50 real-valued features Text Features: –Counts of unigrams, bigrams, trigrams that appear multiple times; features are sparse Features z-score normalized
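Z-score normalization rescales each feature to zero mean and unit variance so that, for example, pitch in Hz and speaking rate contribute on comparable scales. A minimal sketch, assuming population standard deviation and hypothetical feature values; in practice the means and deviations would be estimated on training data only.

```python
import statistics

def zscore_columns(rows):
    """rows: list of equal-length feature vectors; returns z-scored copies."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stdevs = [statistics.pstdev(c) for c in cols]
    return [
        # A constant feature has zero deviation; map it to 0.0 instead of dividing by zero.
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stdevs)]
        for row in rows
    ]

# Hypothetical rows: [mean pitch in Hz, a constant feature].
rows = [[100.0, 0.5], [200.0, 0.5], [300.0, 0.5]]
z = zscore_columns(rows)
print(z)
```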
Classification with SVMs Support Vector Machines –Create n(n-1)/2 binary classifiers Weight classes by inverse frequency Learn weight vector and bias, classify by sign –Platt scaling to convert outputs to probabilities
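The pieces on this slide can be sketched without an SVM library. The code below is illustrative only: the class-weight formula is one common convention for inverse-frequency weighting (an assumption, not necessarily what was used here), and the Platt parameters a and b are placeholders that would normally be fit on held-out data.

```python
import math
from collections import Counter
from itertools import combinations

def pairwise_problems(classes):
    """One binary classifier per unordered class pair: n(n-1)/2 in total."""
    return list(combinations(sorted(classes), 2))

def inverse_frequency_weights(labels):
    """Weight each class by N / (K * count), so rare classes count more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}

def decision_value(w, b, x):
    """Linear SVM output f(x) = w . x + b; the sign gives the binary decision."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def platt_probability(f, a, b):
    """Platt scaling: map an SVM output f to a probability with a fitted sigmoid."""
    return 1.0 / (1.0 + math.exp(a * f + b))

print(len(pairwise_problems(range(13))))         # pair count for 13 classes
print(inverse_frequency_weights(["a", "a", "a", "b"]))
f = decision_value([1.0, -2.0], 0.5, [3.0, 1.0])  # hypothetical weights and input
print(f, platt_probability(f, a=-2.0, b=0.0))     # a, b are placeholder values
```

With 13 classes the pairwise scheme yields 78 binary classifiers, each of which votes between its two classes.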
Incorporating Sequential Constraints Some sequences of DA tags more likely: –E.g. P(affirmative after y-n-Q) = 0.5 –P(affirmative after other) = 0.05 Learn P(y_i | y_{i-1}) from corpus –Tag sequence probabilities –Platt-scaled SVM outputs are P(y|x) Viterbi decoding to find optimal sequence
Results Columns: SVM Only, SVM+Seq Rows: Text Only, Prosody Only, Text+Prosody
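A small Viterbi sketch showing how tag-transition probabilities can override a per-utterance decision. It scores a path as the product of transition probabilities and per-utterance posteriors, which is an assumption about how the two are combined. The 0.5 and 0.05 transition values echo the example above; every other number is invented for illustration.

```python
import math

def viterbi(posteriors, transitions, initial):
    """posteriors: list of {tag: P(tag | x_t)}, one dict per utterance;
    transitions: {prev_tag: {tag: P(tag | prev_tag)}};
    initial: {tag: P(tag at the first position)}."""
    tags = list(initial)

    def lg(p):
        return math.log(max(p, 1e-12))  # floor avoids log(0)

    best = {t: lg(initial[t]) + lg(posteriors[0][t]) for t in tags}
    backpointers = []
    for post in posteriors[1:]:
        new_best, back = {}, {}
        for t in tags:
            # Best predecessor for tag t at this position.
            prev = max(tags, key=lambda p: best[p] + lg(transitions[p][t]))
            new_best[t] = best[prev] + lg(transitions[prev][t]) + lg(post[t])
            back[t] = prev
        best = new_best
        backpointers.append(back)
    # Trace back from the best final tag.
    path = [max(tags, key=best.get)]
    for back in reversed(backpointers):
        path.append(back[path[-1]])
    return path[::-1]

initial = {"query-yn": 0.3, "reply-y": 0.1, "other": 0.6}
transitions = {
    "query-yn": {"query-yn": 0.1, "reply-y": 0.5, "other": 0.4},
    "reply-y": {"query-yn": 0.2, "reply-y": 0.05, "other": 0.75},
    "other": {"query-yn": 0.2, "reply-y": 0.05, "other": 0.75},
}
# Second utterance is ambiguous on its own: "other" narrowly beats "reply-y".
posteriors = [
    {"query-yn": 0.8, "reply-y": 0.05, "other": 0.15},
    {"query-yn": 0.05, "reply-y": 0.45, "other": 0.5},
]
print(viterbi(posteriors, transitions, initial))
```

Taken alone, the second utterance would be tagged "other", but because an affirmative reply is much more likely after a yes/no question, the decoded sequence labels it "reply-y".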
From Human to Computer Conversational agents –Systems that (try to) participate in dialogues –Examples: Directory assistance, travel info, weather, restaurant and navigation info Issues: –Limited understanding: ASR errors, interpretation –Computational costs: broader coverage -> slower, less accurate
Dialogue Manager Tradeoffs Flexibility vs Simplicity/Predictability –System vs User vs Mixed Initiative –Order of dialogue interaction –Conversational “naturalness” vs Accuracy –Cost of model construction, generalization, learning, etc Models: FST, Frame-based, HMM, BDI Evaluation frameworks