Automatic Cue-Based Dialogue Act Tagging Discourse & Dialogue CMSC 35900-1 November 3, 2006.

Automatic Cue-Based Dialogue Act Tagging Discourse & Dialogue CMSC 35900-1 November 3, 2006

Roadmap
– Task & Corpus
– Dialogue Act Tagset
– Automatic Tagging Models
  – Features
  – Integrating Features
– Evaluation
– Comparison & Summary

Task & Corpus
Goal: identify dialogue acts in conversational speech
Spoken corpus: Switchboard
– Telephone conversations between strangers
– Not task-oriented; topics were suggested
– Thousands of conversations recorded, transcribed, and segmented

Dialogue Act Tagset
Goal: cover general conversational dialogue acts
– No particular task/domain constraints
Original set: ~50 tags
– Augmented with flags for task and conversation management
– 220 tags appeared in labeling; some were rare
Final set: 42 mutually exclusive tags
– Inter-annotator agreement: kappa = 0.80 (high)
1,155 conversations labeled, split into train and test sets

Common Tags
– Statement & Opinion: declarative, with or without opinion
– Question: Yes/No & Declarative: distinguished by form and force
– Backchannel: continuers like uh-huh, yeah
– Turn Exit/Abandon: breaking off, with or without passing the turn
– Answer: Yes/No, following questions
– Agreement: Accept/Reject/Maybe

Probabilistic Dialogue Models
HMM dialogue models
– argmax_U P(U) P(E|U), where E = evidence, U = dialogue act sequence
– Assume the model decomposes by utterance
– Evidence from true words, ASR words, prosody
Structured as an offline decoding process over the dialogue
– States = DAs, observations = utterances, P(obs) = P(E_i|U_i), transitions = P(U)
P(U):
– Conditioning on speaker tags improves the model
– A bigram model is adequate and useful
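The decoding scheme above can be sketched as a small Viterbi search: states are dialogue acts, transitions come from a bigram DA model P(U_i | U_{i-1}), and each utterance contributes a per-DA likelihood P(E_i | U_i). The tags and all probabilities below are illustrative placeholders, not values from the paper.

```python
import math

DAS = ["Statement", "Question", "Backchannel"]

# Illustrative initial and bigram DA transition probabilities.
INIT = {"Statement": 0.6, "Question": 0.3, "Backchannel": 0.1}
TRANS = {
    "Statement":   {"Statement": 0.5, "Question": 0.3, "Backchannel": 0.2},
    "Question":    {"Statement": 0.6, "Question": 0.1, "Backchannel": 0.3},
    "Backchannel": {"Statement": 0.7, "Question": 0.2, "Backchannel": 0.1},
}

def viterbi(emissions):
    """emissions: one dict per utterance mapping DA -> P(E_i | U_i)."""
    # log-domain scores for the first utterance
    score = {u: math.log(INIT[u]) + math.log(emissions[0][u]) for u in DAS}
    backptrs = []
    for e in emissions[1:]:
        new_score, ptr = {}, {}
        for u in DAS:
            prev = max(DAS, key=lambda v: score[v] + math.log(TRANS[v][u]))
            new_score[u] = score[prev] + math.log(TRANS[prev][u]) + math.log(e[u])
            ptr[u] = prev
        score = new_score
        backptrs.append(ptr)
    # trace back the highest-scoring DA sequence
    best = max(DAS, key=lambda u: score[u])
    path = [best]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path
```

In practice the emission dicts would come from the word n-gram or prosodic decision-tree likelihoods discussed in the following slides.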

DA Classification: Words
– Combines the notions of discourse markers and collocations, e.g. uh-huh = Backchannel
– Contrast conditions: true words, ASR 1-best, ASR n-best
Results:
– Best: 71% with true words; 65% with ASR 1-best

DA Classification: Prosody
Features:
– Duration, pause, pitch, energy, rate, gender
– Pitch accent, tone
Results:
– Decision trees on the 5 common classes: 45.4% (baseline = 16.6%)
– In the HMM, with decision-tree likelihoods as P(E_i|U_i): 49.7% (vs. 35% baseline)

DA Classification: All Features
Combine word and prosodic information
– Consider the case with ASR words and acoustics
– P(A_i, W_i, F_i | U_i) ≈ P(A_i, W_i | U_i) P(F_i | U_i)
– Reweight the streams for their different accuracies
Result: slightly better than raw ASR words alone
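The combination step above can be sketched in the log domain: under the independence assumption, the two stream likelihoods are multiplied, with the less accurate prosodic stream downweighted by an exponent. The weight value here (0.35) is a placeholder, not the tuned value from the paper.

```python
import math

def classify_da(word_lik, prosody_lik, prosody_weight=0.35):
    """Pick the DA maximizing log P(A,W|U) + w * log P(F|U).

    word_lik, prosody_lik: dicts mapping DA -> stream likelihood.
    prosody_weight: exponent downweighting the prosodic stream
    (0.35 is an illustrative placeholder)."""
    scores = {
        u: math.log(word_lik[u]) + prosody_weight * math.log(prosody_lik[u])
        for u in word_lik
    }
    return max(scores, key=scores.get)
```

With a weak word preference and a strong prosodic one, the prosodic stream can flip the decision; setting the weight to zero recovers the words-only classifier.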

Integrated Classification
Focused analysis on prosodically disambiguated classes
– Statement vs. Question-Y/N, and Agreement vs. Backchannel
– Prosodic decision trees for agreement vs. backchannel: disambiguated by duration and loudness
Substantial improvement for prosody + words
– True words: S/Q: 85.9% → 87.6%; A/B: 81.0% → 84.7%
– ASR words: S/Q: 75.4% → 79.8%; A/B: 78.2% → 81.7%
Prosody is more useful when recognition is unreliable

Observations
DA classification can work in an open domain
– Exploits word models, DA context, and prosody
– Best results for prosody + words
– Words alone are quite effective, even from ASR
Open questions:
– Whole-utterance models? More fine-grained features?
– Longer-range structure, long-term features

Automatic Metadata Annotation
– What is structural metadata?
– Why annotate?

What is Structural Metadata?
Issue: speech is messy
– Sentence/utterance boundaries are not marked, yet they are the basic units for dialogue act tagging, etc.
– Speech contains disfluencies
Result: automatic transcripts are hard to read
Structural metadata annotation:
– Mark utterance boundaries
– Identify fillers and repairs

Metadata Details
Sentence-like units (SUs)
– Provide basic units for other processing
– Not necessarily grammatical sentences
– Distinguish full and incomplete SUs
Conversational fillers
– Discourse markers, disfluencies: um, uh, anyway
Edit disfluencies
– Repetitions, repairs, restarts
– Mark material that should be excluded from the fluent stream
– Interruption point (IP): where the correction starts

Annotation Architecture
Two-step process:
– For each word, mark the boundary type: IP, SU, ISU, or none
– For each region (boundary + words), identify conversational fillers / edit disfluencies
– Post-process to remove insertions
Boundary detection: decision trees
– Prosodic features: duration, pitch, amplitude, silence
– Lexical features: POS tags, word/POS-tag patterns, adjacent filler words

Boundary Detection: Language Models
Language-model-based boundaries
– "Hidden event language model": trigram model with boundary tags as hidden events
Combine with the decision tree
– Use the LM value as a feature in the DT
– Linear interpolation of DT and LM probabilities
– Jointly model with an HMM
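The interpolation option above can be sketched as follows: at each word gap, the decision-tree and hidden-event-LM boundary posteriors are linearly combined and thresholded. The interpolation weight and threshold would be tuned on held-out data; the defaults here are placeholders.

```python
def detect_boundaries(dt_probs, lm_probs, lam=0.5, threshold=0.5):
    """Interpolate per-gap boundary posteriors and threshold.

    dt_probs, lm_probs: boundary posteriors from the decision tree
    and the hidden-event LM, one per word gap.
    lam: interpolation weight on the DT stream (placeholder value).
    Returns a boolean boundary decision per gap."""
    assert len(dt_probs) == len(lm_probs)
    return [lam * d + (1.0 - lam) * l >= threshold
            for d, l in zip(dt_probs, lm_probs)]
```

Setting `lam` to 1.0 or 0.0 recovers the single-model decisions, which makes the weight easy to sweep on a development set.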

Edit and Filler Detection
Transformation-based learning
– Components: baseline predictor, rule templates, objective function
– Classify with the baseline
– Use the rule templates to generate rules that fix errors
– Add the best rule to the baseline; repeat
Training: supervised
– Features: word, POS, word usage, repetition, location
– Tags: filled pause, edit, marker, edit term
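The learning loop above can be sketched as a greedy search: tag everything with the baseline, then repeatedly add the rule (instantiated from a feature template) that fixes the most remaining errors. The rule form, the templates, and the tags in the test below are illustrative, not the paper's actual feature set.

```python
def tbl_train(tokens, gold, baseline_tag, templates, max_rules=10):
    """Minimal transformation-based learning sketch.

    A rule is (template, value, from_tag, to_tag): retag a token when
    its feature equals `value` and its current tag is `from_tag`.
    templates: list of feature functions over a token."""
    tags = [baseline_tag] * len(tokens)
    rules = []
    for _ in range(max_rules):
        # candidate rules: retag currently wrong tokens toward gold
        cands = {(f, f(t), tags[i], gold[i])
                 for f in templates
                 for i, t in enumerate(tokens) if tags[i] != gold[i]}

        def gain(rule):
            # net errors fixed minus errors introduced by applying the rule
            f, val, frm, to = rule
            return sum((to == gold[i]) - (frm == gold[i])
                       for i, t in enumerate(tokens)
                       if f(t) == val and tags[i] == frm)

        if not cands:
            break
        best = max(cands, key=gain)
        if gain(best) <= 0:
            break
        f, val, frm, to = best
        for i, t in enumerate(tokens):
            if f(t) == val and tags[i] == frm:
                tags[i] = to
        rules.append(best)
    return rules, tags
```

At test time the learned rules are replayed in order on baseline-tagged input; the greedy objective mirrors the "add best rule to baseline" step on the slide.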

Evaluation
SU detection: best to combine all feature types
– No single feature set is great alone
CF/ED detection: best features are lexical match and IP
Overall: SU detection is relatively good
– Better on reference transcripts than on ASR output
– Most filled-pause errors are due to ASR errors; discourse-marker errors are not
– The remaining tasks are problematic

SU Detection

Features                   | SU-R | SU-P | ISU-R | ISU-P | IP-R | IP-P
Prosody only               |      |      |       |       |      |
POS, Pattern, LM           |      |      |       |       |      |
Prosody, POS, Pattern, LM  |      |      |       |       |      |
All + fragments            |      |      |       |       |      |