Constructing accurate beliefs in spoken dialog systems

Presentation transcript:

constructing accurate beliefs in spoken dialog systems
Dan Bohus, Alexander I. Rudnicky
Computer Science Department, Carnegie Mellon University

Poster sections: 1 abstract; 2 problem; 3 dataset; 3 user response analysis; 4 models. results; 5 conclusion; current & future work: a k-hypotheses + other, b estimated impact on task success, c using information from n-best lists.

Abstract:
We propose a data-driven approach for constructing more accurate beliefs over concept values in spoken dialog systems by integrating information across multiple turns in the conversation. The approach bridges existing work in confidence annotation and correction detection and provides a unified framework for belief updating. It significantly outperforms heuristic rules currently used in most spoken dialog systems.

RoomLine:
phone-based, mixed initiative system for conference room reservations
access to live schedules for 13 rooms in 2 buildings
size, location, a/v equipment

Dataset:
user study with the RoomLine spoken dialog system
46 participants (1st time users)
10 scenario-based interactions each
449 dialogs, 8278 turns
corpus transcribed and annotated

User response analysis:

How do users respond to correct and incorrect confirmations?

Responses to explicit confirmations:
                   Yes          No           Other
Correct (1159)     94% [93%]    0% [0%]      5% [7%]
Incorrect (279)    1% [6%]      72% [57%]    27% [37%]

Responses to implicit confirmations:
                   Yes          No           Other
Correct (554)      30% [0%]     7% [0%]      63% [100%]
Incorrect (229)    6% [0%]      33% [15%]    61% [85%]

How often do users correct the system?

Explicit confirmation:
             User corrects    User does not correct
Correct      0                1159
Incorrect    250              29

Implicit confirmation:
             User corrects    User does not correct
Correct      2                552
Incorrect

Users interact strategically:
              ~correct later    correct later
~critical     55                2
critical      14                47
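The correction counts above can be turned into rates with simple arithmetic. The following is a minimal sketch that uses only the counts legible in the tables; the function name `correction_rate` and the dictionary layout are assumptions made for illustration, not part of the poster.

```python
# Counts taken from the tables above: (user corrects, user does not correct).
# Only rows whose counts are legible in the transcript are included.
COUNTS = {
    ("explicit", "correct"): (0, 1159),
    ("explicit", "incorrect"): (250, 29),
    ("implicit", "correct"): (2, 552),
}


def correction_rate(corrects: int, does_not_correct: int) -> float:
    """Return the fraction of confirmations that the user corrected."""
    total = corrects + does_not_correct
    if total == 0:
        raise ValueError("no confirmations to count")
    return corrects / total


for (confirmation, value), (yes, no) in COUNTS.items():
    rate = correction_rate(yes, no)
    print(f"{confirmation} confirmation, system value {value}: {rate:.1%} corrected")
```

The rate for incorrect explicit confirmations is 250 out of 279, which is why a missing correction after an explicit confirmation is fairly strong evidence that the confirmed value was right.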
[Figure: bar charts, one panel each for explicit confirmation, implicit confirmation, and unplanned implicit confirmation.]

Conclusion:
proposed a data-driven approach for constructing more accurate beliefs in task-oriented spoken dialog systems
bridges insights from detection of misunderstandings and corrections into a unified belief updating framework
model significantly outperforms heuristics currently used in most spoken dialog systems

Features:
initial: initial confidence score of top hypothesis, # of initial hypotheses, concept type (bool / non-bool), concept identity
system action: indicators describing other system actions in conjunction with current confirmation
user response:
  acoustic / prosodic: acoustic and language scores, duration, pitch (min, max, mean, range, std.dev, min and max slope, plus normalized versions), voiced-to-unvoiced ratio, speech rate, initial pause
  lexical: number of words, lexical terms highly correlated with corrections (MI)
  grammatical: number of slots (new, repeated), parse fragmentation, parse gaps
  dialog: dialog state, turn number, expectation match, new value for concept, timeout, barge-in
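Features like those listed above feed the belief-updating model, which the poster describes as a 1-level logistic model tree whose root splits on the answer type (YES / NO / other) and whose leaves hold logistic regression models. The following is a minimal sketch of that structure only: the two feature names are picked from the list above, and every weight is hypothetical rather than a fitted value from the poster.

```python
import math
from typing import Dict

# Hypothetical leaf models, one per answer type (the root split).
# All weights are made up for illustration; they are not fitted values.
LEAVES: Dict[str, Dict[str, float]] = {
    "YES": {"bias": 2.0, "initial_confidence": 1.5, "barge_in": -0.5},
    "NO": {"bias": -3.0, "initial_confidence": 1.0, "barge_in": -0.2},
    "other": {"bias": -0.5, "initial_confidence": 2.5, "barge_in": -0.8},
}


def sigmoid(z: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def updated_confidence(answer_type: str, features: Dict[str, float]) -> float:
    """Route to the leaf for the answer type, then apply its logistic model."""
    weights = LEAVES[answer_type]
    z = weights["bias"]
    for name, weight in weights.items():
        if name != "bias":
            # Features missing from the input are treated as 0.0.
            z += weight * features.get(name, 0.0)
    return sigmoid(z)


features = {"initial_confidence": 0.45, "barge_in": 0.0}
for answer in ("YES", "NO", "other"):
    print(answer, round(updated_confidence(answer, features), 3))
```

With these illustrative weights, a YES answer raises the confidence in the top hypothesis and a NO answer lowers it, which is the qualitative behavior one would expect from a fitted model of this kind.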
Evaluation:
initial: error rate in system beliefs before the update
heuristic: error rate in system beliefs after the update, using the heuristic update rules
proposed model: error rate of the proposed logistic model tree
oracle: oracle error rate

Model:
Conf_upd(th_C) ← M(Conf_init(th_C), SA(C), R)
logistic model tree [one for each system action]
1-level deep, root splits on answer-type (YES / NO / other)
leaves contain stepwise logistic regression models
sample efficient, feature selection
good probability outputs (minimize cross entropy between model predictions and reality)

[Figure: error-rate bar charts, including a panel for unexpected updates; bars: initial, heuristic, proposed model (basic feature set), proposed model (basic + priors), oracle.]

Problem:
As a prerequisite for increased robustness and making better decisions, dialog systems must be able to accurately assess the reliability of the information they use. Typically, recognition confidence scores provide an initial assessment for the reliability of the information obtained from the user. Ideally, a system should leverage information available in subsequent turns to update and improve the accuracy of its beliefs.

Belief updating problem: given an initial belief over a concept Belief_t(C), a system action SA(C) and a user response R, compute the updated belief Belief_t+1(C).

S: starting at what time do you need the room?
U: [STARTING AT TEN A M / 0.45] starting at ten a.m.
   start-time = {10:00 / 0.45}
S: did you say you wanted the room starting at ten a.m.?
U: [GUEST UNTIL ONE / 0.89] yes until noon
   start-time = {?}
(explicit confirmation, correct value)

S: for when do you need the room?
U: [NEXT THURSDAY / 0.75] next Thursday
   date = { / 0.75}
S: a room for Thursday, August 26th … starting at what time do you need the room?
U: [FIVE TO SEVEN P_M / 0.58] five to seven p.m.
   date = {?}
(implicit confirmation, correct value)

S: how may I help you?
U: [THREE TO RESERVE A ROOM / 0.65] I'd like to reserve a room
   start-time = {15:00 / 0.65}
S: starting at three p.m. … for which day do you need the conference room?
U: [CAN YOU DETAILS TIME / NONUNDER.(0.0)] I need a different time
   start-time = {?}
(implicit confirmation, incorrect value)

Current and future work, k hypotheses + other:
belief representation: k hypotheses + other
multinomial generalized linear model
system actions: all actions
  explicit confirmation
  implicit confirmation
  unplanned impl. confirmation
  request [system asks for the value for a concept]
  unexpected update [system received a value for a concept, without asking for it, e.g. as a result of a misrecognition or the user over-answering or attempting a topic shift]
features: added prior information on concepts; priors constructed manually

Belief representation:
most accurately: probability distribution over the set of possible values
but: the system is not likely to hear more than 3 or 4 conflicting values
in our data, the maximum number of hypotheses for a concept accumulated through interaction was 3; the system heard more than 1 hypothesis for a concept in only 6.9% of cases

Compressed belief representation: k hypotheses + other
for now, k = 1: top hypothesis + other [see current and future work for extensions]
for now, only updates after system confirmation actions

Compressed belief updating problem: given an initial confidence score Conf_init(th_C) for the top hypothesis h for a concept C, construct an updated confidence score Conf_upd(th_C) for the hypothesis h, in light of the system confirmation action SA(C) and the follow-up user response R.

How does the accuracy of the belief updating model affect task success?
This analysis relates the accuracy of the belief updates to overall task success through a logistic regression model. The accuracy of belief updates is measured as AVG-LIK of the correct hypothesis; word-error-rate acts as a confounding factor.

Model: P(Task Success = 1) ← α + β · WER + γ · AVG-LIK
The model was fitted using 443 data points (dialog sessions); β and γ capture the impact of WER and AVG-LIK on overall task success.

[Figure: probability of task success versus word-error-rate, one curve per avg.lik. value (0.5, 0.6, 0.7, 0.8, 0.9), with the current heuristic and the proposed model marked; shown at the average word-error rate for natives and for non-natives.]

Currently: using only the top hypothesis from the recognizer. Next: extract more information from n-best lists or lattices.
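The task-success model above has the usual logistic regression form, so its behavior can be sketched in a few lines. This is a minimal illustration under stated assumptions: the coefficient values are hypothetical (the fitted α, β, γ are not given in the transcript), WER is assumed to be expressed in percent, and the function name `task_success_probability` is invented for the example.

```python
import math


def task_success_probability(wer: float, avg_lik: float,
                             alpha: float, beta: float, gamma: float) -> float:
    """P(Task Success = 1) as the logistic function of alpha + beta*WER + gamma*AVG-LIK."""
    z = alpha + beta * wer + gamma * avg_lik
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, chosen only to show the shape of the model:
# a negative beta (more recognition errors hurt) and a positive gamma
# (more accurate beliefs help).
ALPHA, BETA, GAMMA = 0.5, -0.04, 2.0

# One value per avg.lik. level, at a fixed (assumed) word-error-rate of 25%.
for avg_lik in (0.5, 0.6, 0.7, 0.8, 0.9):
    p = task_success_probability(25.0, avg_lik, ALPHA, BETA, GAMMA)
    print(f"avg.lik. = {avg_lik:.1f} -> P(task success) = {p:.3f}")
```

Holding WER fixed and varying AVG-LIK in this way mirrors how the poster's figure compares the current heuristic with the proposed model: a belief updater with higher average likelihood moves the system onto a higher task-success curve at the same word-error-rate.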