Intonational Variation in Spoken Dialogue Systems

Presentation transcript:

Intonational Variation in Spoken Dialogue Systems Generation and Understanding Julia Hirschberg Charles University March 2001 11/12/2018

Talking to a Machine… and Getting an Answer
Today’s spoken dialogue systems make it possible to accomplish real tasks, over the phone, without talking to a person.
Real-time speech technology enables real-time interaction.
Speech recognition and understanding is ‘good enough’ for limited, goal-directed interactions.
Careful dialogue design can be tailored to the capabilities of component technologies: limited domain; judicious use of system initiative vs. mixed initiative.
Notes: 10 or 15 years ago we talked about talking to machines, we studied the problems involved, and we even built demo dialogue systems. Today when you talk to a machine, you have a chance of getting a useful answer about travel arrangements or your email or your finances. What’s made this possible?

Some Representative Spoken Dialogue Systems
[Timeline figure. Time axis: 1980+, 1990+, 1993+, 1995+, 1997+, 1999+. Labels: Deployed; System Initiative; Mixed Initiative; User.]
Systems shown: Banking (ANSER); ATIS (DARPA Travel); MIT Galaxy/Jupiter; Directory Assistant (BNR); Train Schedule (ARISE); Brokerage (Schwab-Nuance); E-Mail Access (myTalk); Air Travel (UA Info-SpeechWorks); Communications (Wildfire, Portico); Communicator (DARPA Travel); Customer Care (HMIHY, AT&T); Multimodal Maps (Trains, Quickset). Also Waxholm, August, PADIS (Bouwman & Hulstijn ’98).
Notes: Not only have spoken dialogue systems become useful for real tasks, but they’ve also become increasingly usable. While early systems relied upon the system controlling all stages of the interaction (e.g. Please say ‘yes’ or ‘no’.), more recent systems allow the user greater input into how the dialogue will proceed (e.g. ‘Hello. I want to go to Philadelphia on 11 May.’) But actual commercially deployed systems mostly still use system initiative: the system tells the user precisely what to say and can accept no deviations (i.e., no additional information or topic switching), although we are starting to see some mixed initiative, where the system is able to process additional information offered by the user. User initiative systems, with open-ended prompts (like How May I Help You), are mostly in trial. And all systems can fail utterly…. <Play damon/task3>
ANSER: Japanese banking information. Primarily single-word commands.
ATIS: DARPA air travel (explained earlier).
MIT Galaxy/Jupiter: from DARPA ATIS, mixed initiative for multiple applications, including weather info, directions to restaurants in the MIT area, and airline reservations.
Directory Assistant: Bell Northern Research telephone number retrieval by name.
ARISE: European research effort on dialogues for train timetable info and ticket purchase (some systems are system initiative, others mixed initiative).
Brokerage info: stock quotes, account information; mostly system initiative, limited mixed initiative (e.g., “transfer to checking from savings”).
E-mail access: mostly single commands.
Flight arrival information: system initiative for getting flight info for United Airlines.
Communicator: current DARPA project on modular dialogue system architecture, mixed-initiative dialogue.
How May I Help You: customer care routing, user initiative for the initial request, then mixed initiative.
Multimodal maps (research): TRAINS application [U. Rochester], planning train routes; Quickset (Oregon Grad. Institute), disaster management application.

But we have a long way to go… 11/12/2018

Course Overview Spoken Dialogue Systems today Evaluating their weaknesses Role of intonational variation Importance of corpora and conventions for annotating them Intonational ‘meanings’ Prosody in Speech Generation Prosody in Speech Recognition/ Understanding Julia 11/12/2018

Evaluating Dialogue Systems
PARADISE framework (Walker et al ’00)
“Performance” of a dialogue system is affected both by what gets accomplished by the user and the dialogue agent and by how it gets accomplished.
Maximize Task Success.
Minimize Costs: Efficiency Measures, Qualitative Measures.

Task Success
Task goals seen as an Attribute-Value Matrix (AVM).
ELVIS e-mail retrieval task (Walker et al ‘97): “Find the time and place of your meeting with Kim.”
Attribute: Value
Selection Criterion: Kim or Meeting
Time: 10:30 a.m.
Place: 2D516
Task success is defined by the match between the AVM values at the end of the dialogue and the “true” values for the AVM.
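The match between end-of-dialogue values and "true" values can be sketched as a comparison of two dictionaries. This is a minimal illustration, not the PARADISE implementation: the function name `avm_match` and the wrong `Place` value are made up for the example, and a plain match rate is used here with no correction for chance agreement.

```python
def avm_match(final_avm, true_avm):
    """Return (fraction of attributes that match, list of mismatched attributes)."""
    mismatched = [attr for attr, value in true_avm.items()
                  if final_avm.get(attr) != value]
    matched = len(true_avm) - len(mismatched)
    return matched / len(true_avm), mismatched


# "True" values taken from the ELVIS example on the slide.
true_avm = {
    "Selection Criterion": "Kim or Meeting",
    "Time": "10:30 a.m.",
    "Place": "2D516",
}

# Hypothetical values the user reports at the end of a dialogue
# (the wrong room number is invented for illustration).
final_avm = {
    "Selection Criterion": "Kim or Meeting",
    "Time": "10:30 a.m.",
    "Place": "2D510",
}

score, wrong = avm_match(final_avm, true_avm)
print(score, wrong)
```

Here two of the three attributes match, and `Place` is reported as the mismatch.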

Metrics
Efficiency of the Interaction: User Turns, System Turns, Elapsed Time
Quality of the Interaction: ASR rejections, Time Out Prompts, Help Requests, Barge-Ins, Mean Recognition Score (concept accuracy), Cancellation Requests
User Satisfaction
Task Success: perceived completion, information extracted

Experimental Procedures
Subjects given specified tasks.
Spoken dialogues recorded.
Cost factors, states, dialog acts automatically logged; ASR accuracy, barge-in hand-labeled.
Users specify task solution via web page.
Users complete User Satisfaction surveys.
Use multiple linear regression to model User Satisfaction as a function of Task Success and Costs; test for significant predictive factors.
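The regression step can be sketched in a few lines of plain Python. This is an illustrative least-squares fit under stated assumptions, not the authors' analysis: the function `fit_linear` is hypothetical, the five data points are invented (generated from satisfaction = 1.0 + 0.5*COMP - 0.2*ET so the expected result is known), and no significance testing is done.

```python
def fit_linear(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.

    rows: list of feature lists; an intercept column is added automatically.
    Returns [intercept, w1, w2, ...]. Assumes the features are not collinear.
    """
    X = [[1.0] + list(r) for r in rows]
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


# Invented data: (COMP, ET) per dialogue and a satisfaction score
# generated exactly from 1.0 + 0.5*COMP - 0.2*ET.
rows = [(1, 2), (0, 3), (1, 5), (0, 1), (1, 4)]
targets = [1.1, 0.4, 0.5, 0.8, 0.7]

weights = fit_linear(rows, targets)
print([round(w, 3) for w in weights])
```

Because the invented data are exactly linear, the fit recovers the intercept and the two weights used to generate them, up to floating-point error.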

User Satisfaction: Sum of Many Measures
Was Annie easy to understand in this conversation? (TTS Performance)
In this conversation, did Annie understand what you said? (ASR Performance)
In this conversation, was it easy to find the message you wanted? (Task Ease)
Was the pace of interaction with Annie appropriate in this conversation? (Interaction Pace)
In this conversation, did you know what you could say at each point of the dialog? (User Expertise)
How often was Annie sluggish and slow to reply to you in this conversation? (System Response)
Did Annie work the way you expected her to in this conversation? (Expected Behavior)
From your current experience with using Annie to get your email, do you think you'd use Annie regularly to access your mail when you are away from your desk? (Future Use)

Performance Functions from Three Systems
ELVIS: User Sat. = .21*COMP + .47*MRS - .15*ET
TOOT: User Sat. = .35*COMP + .45*MRS - .14*ET
ANNIE: User Sat. = .33*COMP + .25*MRS + .33*Help
COMP: User perception of task completion (task success)
MRS: Mean recognition accuracy (cost)
ET: Elapsed time (cost)
Help: Help requests (cost)
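A performance function of this kind is a weighted sum, which can be sketched as follows. The ELVIS and TOOT coefficients are the ones given on the slide; everything else is an assumption for illustration: the function name `predicted_satisfaction`, the sample dialogue values, and the premise that all measures have already been normalized to a common scale (the slide does not say how inputs are scaled).

```python
# Coefficients as given on the slide for two of the systems.
PERFORMANCE_FUNCTIONS = {
    "ELVIS": {"COMP": 0.21, "MRS": 0.47, "ET": -0.15},
    "TOOT": {"COMP": 0.35, "MRS": 0.45, "ET": -0.14},
}


def predicted_satisfaction(system, measures):
    """Weighted sum of the (assumed normalized) measures for one dialogue."""
    weights = PERFORMANCE_FUNCTIONS[system]
    return sum(w * measures[name] for name, w in weights.items())


# Hypothetical normalized measures for a single dialogue.
dialogue = {"COMP": 1.0, "MRS": 0.5, "ET": -0.2}

elvis = predicted_satisfaction("ELVIS", dialogue)
toot = predicted_satisfaction("TOOT", dialogue)
print(round(elvis, 3), round(toot, 3))
```

With the same inputs, the two functions give different predictions because each system weights task completion, recognition, and elapsed time differently.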

Performance Model
Perceived task completion and mean recognition score are consistently significant predictors of User Satisfaction.
Performance model useful for system development: making predictions about system modifications; distinguishing ‘good’ dialogues from ‘bad’ dialogues.
But can we also tell on-line when a dialogue is ‘going wrong’?

Course Overview Spoken Dialogue Systems today Evaluating their weaknesses Role of intonational variation Importance of corpora and conventions for annotating them Intonational ‘meanings’ Prosody in Speech Generation Prosody in Speech Recognition/ Understanding Julia 11/12/2018

How to Predict Problems ‘On-Line’?
Evidence of system misconceptions is reflected in user responses (Krahmer et al ‘99, ‘00).
Responses to incorrect verifications: contain more words (or are empty); show marked word order (especially after implicit verifications); contain more disconfirmations and more repeated/corrected info.
‘No’ after incorrect verifications vs. other ynq’s has: higher boundary tone; wider pitch range; longer duration; longer pauses before and after; more additional words after it.
Signalling whether information is grounded or not (Clark & Wilkes-Gibbs ‘86, Clark & Schaefer ‘89): presentation/acceptance.
Notes: 120 dialogues for Dutch train info; one version uses explicit verification and one implicit; 20 users given 3 tasks; analyzed 443 verification q/a pairs. Predicted that responses to correct verifications would be shorter, with unmarked word order, not repeating or correcting information but presenting new information (positive cues): the principle of least effort. Findings: where there are problems, subjects use more words (or say nothing), use marked word order (especially after implicit verifs), and give more disconfirmations (duh), with more repeated and corrected info. ML experiments (memory-based learning) show 97% correct prediction from these features (>8 words or marked word order or corrects info -> 92%). Krahmer et al ‘99b predicted additional prosodic cues for neg signals: high boundary tone, high pitch range, long duration of ‘nee’ and entire utterance, long pause after ‘nee’, long delay before ‘no’, from 109 negative answers to ynqs of 7 speakers; hyp
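The simple rule quoted in the notes (>8 words or marked word order or corrects info) can be written as a small detector. This is a sketch under stated assumptions, not the memory-based learner from the study: the `Response` class and `signals_problem` function are hypothetical, the word-order and correction flags are assumed to come from hand annotation, and the English utterances are invented (the original data were Dutch).

```python
from dataclasses import dataclass


@dataclass
class Response:
    text: str
    marked_word_order: bool  # assumed to be hand-annotated
    corrects_info: bool      # assumed to be hand-annotated


def signals_problem(resp, max_words=8):
    """Flag a response as a likely reaction to an incorrect verification."""
    too_long = len(resp.text.split()) > max_words
    return too_long or resp.marked_word_order or resp.corrects_info


# Invented example responses to a system verification question.
short_yes = Response("yes", marked_word_order=False, corrects_info=False)
long_no = Response("no I want to go to Amsterdam not to Rotterdam",
                   marked_word_order=False, corrects_info=True)
correction = Response("to Amsterdam", marked_word_order=False, corrects_info=True)

for r in (short_yes, long_no, correction):
    print(repr(r.text), signals_problem(r))
```

A detector like this only needs features available right after the user's turn, which is what makes it usable on-line.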

User information state reflected in response (Shimojima et al ’99, ‘01)
Echoic responses repeat prior information, as acknowledgment or request for confirmation.
S1: Then go to Keage station.
S2: Keage.
Experiment: identify ‘degree of integration’ and prosodic features (boundary tone, pitch range, tempo, initial pause); perception studies to elicit ‘integration’ effect.
Results: fast tempo, little pause and low pitch signal high integration.
Notes: Shimojima et al: a prior observational study produced a corpus of task-oriented conversations in Japanese (block configuration task, face-to-face, 2 party). Focussed on immediate repeats from the responder (not hello/hello, etc). Consensus-labeled ‘degree to which responder seemed to have integrated information’ (1-5). Chi-square and t-tests show close correlation of ratings and prosodic features. Modified tokens’ tempo, pause (ppau) and pitch to create prosody favoring very high and very low integration ratings, and presented these to subjects to rate, 1-5. Asked what grounding functions were being performed (request-repair or ack). Overall, fast tempo, little pause and low pitch signal integration.

Can Prosodic Information Help Identify Dialogue System Problems ‘On Line’? 11/12/2018

Motivation
Prosody conveys information about:
The state of the interaction: Is the user having trouble being understood? Is the user having trouble understanding the system?
What the speaker is trying to convey: Is this a statement or a question?
The structure of the dialogue: Is the user or the system trying to start a new topic?
The emotions of the speaker: Is the speaker getting angry, frustrated?

Past Research Issues and Applications
How prosodic variation influences ‘meaning’: focus or contrast; given/new
How prosodic variation is related to other linguistic components: syntax; semantics
How to model prosodic variation effectively
Applications: Text-to-Speech

Trends in Research on Prosody
Current trends:
New description schemes (e.g. ToBI)
Corpus-based research and machine learning
Emphasis on evaluation of algorithms and systems (NLE ‘00 special issue)
Investigation of spontaneous speech phenomena and variation in speaking style
Applications to CTS, ASR and SDS
Notes: Proposals of new description schemes (e.g. ToBI) and refinement of older ones --> compare our findings to others’, within and across languages. (Large) corpus-based research --> fascination with automatic learning techniques. Emphasis on evaluation of algorithms and systems --> new questions about how to evaluate ‘better’ prosody. Investigation of spontaneous speech phenomena and variation in speaking style. Refinement of old questions about how prosody contributes to phenomena like syntactic disambiguation, focus, given/new distinction, discourse structure.

Course Overview Spoken Dialogue Systems today Evaluating their weaknesses Role of intonational variation Importance of corpora and conventions for annotating them Intonational ‘meanings’ Prosody in Speech Generation Prosody in Speech Recognition/ Understanding Julia 11/12/2018

Corpora
Public and semi-public databases: ATIS, SwitchBoard, Call Home (NIST/DARPA/LDC); TRAINS/TRIPS (U. Rochester); FM Radio (BU)
Private collections: acquired for speech or dialogue research (e.g. August, Gustafson & Bell ’00); meeting, call center, focus group collections; accidentally collected
The Web: Mud/Moo dialogues

To(nes and) B(reak) I(ndices)
Developed by prosody researchers in four meetings over 1991-94.
Goals: devise a common labeling scheme for Standard American English that is robust and reliable; promote collection of large, prosodically labeled, shareable corpora.
ToBI standards also proposed for Japanese, German, Italian, Spanish, British and Australian English, ...

Minimal ToBI transcription: recording of speech; f0 contour; ToBI tiers
orthographic tier: words
break-index tier: degrees of junction (Price et al ‘89)
tonal tier: pitch accents, phrase accents, boundary tones (Pierrehumbert ‘80)
miscellaneous tier: disfluencies, non-speech sounds, etc.

Sample ToBI Labeling Julia 11/12/2018

Online training material, available at: http://www.ling.ohio-state.edu/phonetics/ToBI/
Evaluation
Good inter-labeler reliability for expert and naive labelers: 88% agreement on presence/absence of tonal category, 81% agreement on category label, 91% agreement on break indices to within 1 level (Silverman et al. ‘92, Pitrelli et al ‘94)
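Agreement figures of this kind are typically computed over pairs of labelers. The sketch below shows one plausible way to do that, counting agreeing labeler pairs per word; it is an assumption about the general method, not a reproduction of the cited evaluations. The function `pairwise_agreement` and the three label sequences are invented for illustration.

```python
from itertools import combinations


def pairwise_agreement(labelings, agree=lambda a, b: a == b):
    """Fraction of (labeler pair, item) comparisons on which the pair agrees.

    labelings: list of equal-length label sequences, one per labeler.
    agree: predicate deciding whether two labels count as agreeing.
    """
    agreed = total = 0
    for seq_a, seq_b in combinations(labelings, 2):
        for x, y in zip(seq_a, seq_b):
            total += 1
            agreed += bool(agree(x, y))
    return agreed / total


# Invented break indices from three labelers for the same four words.
labeler_a = [1, 3, 1, 4]
labeler_b = [1, 4, 1, 4]
labeler_c = [1, 3, 2, 4]
labelings = [labeler_a, labeler_b, labeler_c]

exact = pairwise_agreement(labelings)
within_one = pairwise_agreement(labelings, agree=lambda a, b: abs(a - b) <= 1)
print(round(exact, 3), within_one)
```

Passing a looser `agree` predicate gives the "to within 1 level" style of measure used for break indices, while the default gives exact-label agreement.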

Course Overview Spoken Dialogue Systems today Evaluating their weaknesses Role of intonational variation Importance of corpora and conventions for annotating them Intonational ‘meanings’ Prosody in Speech Generation Prosody in Speech Recognition/ Understanding Julia 11/12/2018

Pitch Accent/Prominence in ToBI
Which items are made intonationally prominent and how?
Accent type:
H*: simple high (declarative)
L*: simple low (ynq)
L*+H: scooped, late rise (uncertainty/incredulity)
L+H*: early rise to stress (contrastive focus)
H+!H*: fall onto stress (implied familiarity)

Downstepped accents: !H*, L+!H*, L*+!H
Degree of prominence:
within a phrase: HiF0
across phrases

Functions of Pitch Accent
Given/new information
S: Do you need a return ticket?
U: No, thanks, I don’t need a return.
Contrast (narrow focus)
U: No, thanks, I don’t need a RETURN…. (I need a time schedule, receipt,…)
Disambiguation of discourse markers
S: Now let me get you the train information.
U: Okay (thanks) vs. Okay…. (but I really want…)

Prosodic Phrasing in ToBI
‘Levels’ of phrasing:
intermediate phrase: one or more pitch accents plus a phrase accent (H- or L-)
intonational phrase: one or more intermediate phrases plus a boundary tone (H% or L%)
ToBI break-index tier:
0: no word boundary
1: word boundary
2: strong juncture with no tonal markings
3: intermediate phrase boundary
4: intonational phrase boundary
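The relation between break indices and the two levels of phrasing can be illustrated by grouping words according to the index that follows each word. This is a minimal sketch: the function `split_phrases` is hypothetical, and the break indices assigned to the example sentence are invented, not taken from a labeled corpus.

```python
# Break-index values as listed on the slide.
BREAK_INDEX = {
    0: "no word boundary",
    1: "word boundary",
    2: "strong juncture with no tonal markings",
    3: "intermediate phrase boundary",
    4: "intonational phrase boundary",
}


def split_phrases(words, breaks, level):
    """Group words into phrases, closing a phrase after any break index >= level."""
    phrases, current = [], []
    for word, brk in zip(words, breaks):
        current.append(word)
        if brk >= level:
            phrases.append(current)
            current = []
    if current:  # keep trailing words that lack a closing boundary
        phrases.append(current)
    return phrases


words = "You should buy the ticket with the discount coupon".split()
# Invented break indices, one after each word.
breaks = [1, 1, 1, 1, 3, 1, 1, 1, 4]

intermediate = split_phrases(words, breaks, level=3)
intonational = split_phrases(words, breaks, level=4)
print(intermediate)
print(intonational)
```

With these invented indices, the sentence forms one intonational phrase made of two intermediate phrases, with the boundary falling after "ticket".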

Functions of Phrasing
Disambiguates syntactic constructions, e.g. PP attachment:
S: You should buy the ticket with the discount coupon.
Disambiguates scope ambiguities, e.g. negation:
S: You aren’t booked through Rome because of the fare.
Or modifier scope:
S: This fare is restricted to retired politicians and civil servants.

Contours: Accent + Phrasing
What do intonational contours ‘mean’ (Ladd ‘80, Bolinger ‘89)?
Speech acts (statements, questions, requests)
S: That’ll be credit card? (L* H- H%)
Propositional attitude (uncertainty, incredulity)
S: You’d like an evening flight. (L*+H L- H%)
Speaker affect (anger, happiness, love)
U: I said four SEVEN one! (L+H* L- L%)
“Personality”
S: Welcome to the Sunshine Travel System.
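A contour string such as "L* H- H%" combines one or more pitch accents with a phrase accent and a boundary tone, and can be split into those parts by looking at the diacritics. The sketch below is illustrative only: the function `parse_contour` is hypothetical and assumes labels are separated by spaces, as in the three examples above.

```python
def parse_contour(contour):
    """Split a space-separated ToBI contour into its tonal components."""
    parts = {"pitch_accents": [], "phrase_accent": None, "boundary_tone": None}
    for label in contour.split():
        if label.endswith("%"):      # boundary tone, e.g. H% or L%
            parts["boundary_tone"] = label
        elif label.endswith("-"):    # phrase accent, e.g. H- or L-
            parts["phrase_accent"] = label
        elif "*" in label:           # pitch accent, e.g. H*, L*+H, L+H*
            parts["pitch_accents"].append(label)
        else:
            raise ValueError(f"unrecognized tonal label: {label!r}")
    return parts


# The three contours given as examples in the text.
question = parse_contour("L* H- H%")
uncertain = parse_contour("L*+H L- H%")
emphatic = parse_contour("L+H* L- L%")
print(question)
print(uncertain)
print(emphatic)
```

Keeping the components separate makes it easy to see that the yes-no question and the uncertainty contour share a high boundary tone but differ in pitch accent and phrase accent.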

Pitch Range and Timing
Level of speaker engagement
S: Welcome to InfoTravel. How may I help you?
Contour interpretation
S: You can take the L*+H bus from Malpensa to Rome L-H%.
U: Take the bus. vs. Take the bus!
Discourse/topic structure

Can systems make use of this information? Can they produce it? Can they recognize it?