Is This Conversation on Track?
Utterance-Level Confidence Annotation in the CMU Communicator Spoken Dialog System
Presented by: Dan Bohus
Work by: Paul Carpenter, Chun Jin, Daniel Wilson, Rong Zhang, Dan Bohus, Alex Rudnicky
Carnegie Mellon University – 2001
Is This Conversation on Track? Outline
- The Problem
- The Approach
- Training Data and Features
- Experiments and Results
- Conclusions and Future Work
Is This Conversation on Track? The Problem
Systems often misunderstand, take the misunderstanding as fact, and continue to act on invalid information.
Repair costs:
- Increased dialog length
- User frustration
Confidence annotation provides critical information for effective confirmation and clarification in dialog systems.
Is This Conversation on Track? The Approach
Treat the problem as a data-driven classification task. Objective: accurately label misunderstood utterances.
- Collect a training corpus
- Identify useful features
- Train classifiers and identify the best-performing one for this task
Is This Conversation on Track? Data
Communicator logs and transcripts:
- Collected over 2 months (Oct-Nov 1999)
- Eliminated conversations with < 5 turns
- Manually labeled OK (67%) / BAD (33%); BAD covers RecogBAD / ParseBAD / OOD / NonSpeech
- Discarded mixed-label utterances (6%)
Cleaned corpus: 4550 utterances / 311 dialogs
Is This Conversation on Track? Feature Extraction
12 features from various levels:
- Decoder features: Word Number, Unconfident Percentage
- Parsing features: Uncovered Percentage, Fragment Transitions, Gap Number, Slot Number, Slot Bigram
- Dialog features: Dialog State, State Duration, Turn Number, Expected Slots
- Garble: handcrafted heuristic currently used by the CMU Communicator
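The slide lists the 12 features but not how they combine into a classifier input. Below is an illustrative sketch, not taken from the original system, of what one utterance's feature vector might look like; the dictionary representation, the state name, and all values are invented for illustration.

```python
# Illustrative feature vector for a single utterance (all values are made up).
utterance_features = {
    # Decoder-level features
    "word_number": 7,
    "unconfident_percentage": 0.29,   # fraction of words with low decoder confidence
    # Parsing-level features
    "uncovered_percentage": 0.14,     # fraction of words not covered by the parse
    "fragment_transitions": 2,
    "gap_number": 1,
    "slot_number": 3,
    "slot_bigram": -4.2,              # slot-bigram score
    # Dialog-level features
    "dialog_state": "query_depart_date",  # hypothetical state name
    "state_duration": 2,
    "turn_number": 11,
    "expected_slots": 1,              # illustrative encoding of match with expected slots
    # Existing heuristic
    "garble": 0,
}
```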
Is This Conversation on Track? Experiments with 6 Different Classifiers
- Decision Tree
- Artificial Neural Network
- Naïve Bayes
- Bayesian Network (several network structures attempted)
- AdaBoost (individual feature-based binning estimators as weak learners, 750 boosting stages)
- Support Vector Machines (kernels: dot, polynomial, radial, neural, ANOVA)
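For concreteness, here is a minimal modern sketch of this kind of classifier comparison using scikit-learn. The original 2001 experiments used different tools and the real corpus; the data below is random placeholder data and the classifier settings are illustrative.

```python
# Sketch: train several classifiers and compare mean 10-fold CV error.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# X: utterance feature vectors, y: 0 = OK, 1 = BAD (placeholder data)
X = np.random.rand(200, 12)
y = np.random.randint(0, 2, size=200)

classifiers = {
    "Decision Tree": DecisionTreeClassifier(),
    "Neural Network": MLPClassifier(max_iter=500),
    "Naive Bayes": GaussianNB(),
    "AdaBoost": AdaBoostClassifier(n_estimators=750),
    "SVM (RBF kernel)": SVC(kernel="rbf"),
}

for name, clf in classifiers.items():
    acc = cross_val_score(clf, X, y, cv=10)   # 10-fold cross-validation accuracy
    print(f"{name}: mean error rate = {1 - acc.mean():.2%}")
```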
Is This Conversation on Track? Evaluating Performance
- Classification Error Rate = (FP + FN) / N
- CDR = 1 - Fallout = 1 - FP / N_BAD
The cost of misunderstanding in dialog systems depends on:
- Error type (FP vs. FN)
- Domain
- Dialog state
Ideally, build a cost function for each type of error, and optimize for that.
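A small sketch of the metrics above, plus a hypothetical cost-weighted objective of the kind the slide suggests; the cost weights and the counts in the example call are invented, not from the paper.

```python
# Error rate, CDR, and an illustrative cost-weighted objective from a confusion matrix.
def evaluate(tp, fp, fn, tn, cost_fp=2.0, cost_fn=1.0):
    n = tp + fp + fn + tn
    n_bad = fp + tn                    # utterances whose true label is BAD
    error_rate = (fp + fn) / n         # classification error rate
    fallout = fp / n_bad               # false acceptances among BAD utterances
    cdr = 1 - fallout                  # correct detection rate of misunderstandings
    expected_cost = (cost_fp * fp + cost_fn * fn) / n   # hypothetical cost function
    return error_rate, cdr, expected_cost

# Placeholder counts, purely for illustration
print(evaluate(tp=300, fp=40, fn=30, tn=130))
```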
Is This Conversation on Track? Results – Individual Features

Feature (top 8)          Mean Err. Rate
Uncovered Percentage     19.93%
Expected Slot            20.97%
Gap Number               23.01%
Bigram Score             23.14%
Garble                   25.32%
Slot Number              25.69%
Unconfident Percentage   27.34%
Dialog State             31.03%

Baseline error: 32.84% (when predicting the majority class)
All experiments involved 10-fold cross-validation.
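As an illustration of these per-feature experiments, the sketch below evaluates each feature in isolation with a simple classifier under 10-fold cross-validation; the feature names are from the table, but the classifier choice and the random data are assumptions, not the original setup.

```python
# Sketch: score each feature on its own via 10-fold cross-validation.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

feature_names = ["uncovered_percentage", "expected_slot", "gap_number", "bigram_score"]
X = np.random.rand(200, len(feature_names))   # placeholder feature values
y = np.random.randint(0, 2, size=200)         # 0 = OK, 1 = BAD

for j, name in enumerate(feature_names):
    acc = cross_val_score(DecisionTreeClassifier(max_depth=2), X[:, [j]], y, cv=10)
    print(f"{name}: mean error rate = {1 - acc.mean():.2%}")
```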
Is This Conversation on Track? Results – Classifiers

Classifier         Mean Err. Rate   F/P Rate   F/N Rate
AdaBoost           16.59%           11.43%     5.16%
Decision Tree      17.32%           11.82%     5.49%
Bayesian Network   17.82%           9.41%      8.42%
SVM                18.40%           15.01%     3.39%
Neural Network     18.90%           15.08%     3.82%
Naïve Bayes        21.65%           14.24%     7.41%

A t-test showed no statistically significant difference between the classifiers, except for Naïve Bayes. Explanation: its assumption of independence between features is violated.
Baseline error: 25.32% (GARBLE)
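One plausible form of the significance test mentioned above is a paired t-test over per-fold error rates from the 10-fold cross-validation; the exact test used in the original work is not detailed here, and the per-fold numbers below are placeholders.

```python
# Sketch: paired t-test on per-fold error rates of two classifiers.
from scipy.stats import ttest_rel

adaboost_err    = [0.17, 0.16, 0.18, 0.15, 0.17, 0.16, 0.17, 0.18, 0.16, 0.16]
naive_bayes_err = [0.22, 0.21, 0.23, 0.20, 0.22, 0.21, 0.22, 0.23, 0.21, 0.21]

t_stat, p_value = ttest_rel(adaboost_err, naive_bayes_err)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")   # small p => difference is significant
```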
Is This Conversation on Track? Future Work
- Improve the classifiers; add more features
- Develop a cost model for understanding errors in dialog systems
- Study and optimize the tradeoff between F/P and F/N
- Integrate value and confidence information to guide clarification in dialog systems
Is This Conversation on Track? Confusion Matrix

                    Actual OK   Actual BAD
System says OK      TP          FP
System says BAD     FN          TN

FP = false acceptance; FN = false detection/rejection
Fallout = FP / (FP + TN) = FP / N_BAD
CDR = 1 - Fallout = 1 - FP / N_BAD