Towards Natural Clarification Questions in Dialogue Systems Svetlana Stoyanchev, Alex Liu, and Julia Hirschberg AISB 2014 Convention at Goldsmiths, University of London.

Presentation transcript:

Towards Natural Clarification Questions in Dialogue Systems Svetlana Stoyanchev, Alex Liu, and Julia Hirschberg AISB 2014 Convention at Goldsmiths, University of London, April 3, 2014

Outline
Motivation
Previous work: a corpus of human clarification questions
An automatic method for generating targeted clarification questions
Evaluation of automatically generated questions with human subjects
Comparison of the two evaluation groups

Speech Translation
A speech-to-speech translation system. [Diagram: the L1 speaker's spoken question (L1) is translated into a question (L2) for the L2 speaker; the L2 speaker's answer (L2) is translated back into an answer (L1) for the L1 speaker.]

Speech Translation
Translation may be impaired by:
Speech recognition errors: the word error rate on the English side of Transtac is 9%; the word error rate in the Let's Go bus information system is 50%
Ambiguous language used by a speaker
Speech recognition errors caused by out-of-vocabulary words

Speech Translation
A speech-to-speech translation system with a clarification component. [Diagram: as above, but a dialogue manager now conducts clarification sub-dialogues with the L1 and L2 speakers before passing each translated question and answer through.]

Most Common Clarification Strategies in Dialogue Systems
"Please repeat"
"Please rephrase"
The system repeats the previous question

What Clarification Questions Do Human Speakers Ask?
Targeted reprise questions (M. Purver):
o Ask a targeted question about the part of an utterance that was misheard or misunderstood, incorporating the understood portions of the utterance
o Speaker: Do you have anything other than these XXX plans?
o Non-reprise: What did you say? / Please repeat.
o Reprise: What kind of plans?
88% of human clarification questions are reprise; 12% are non-reprise
Goal: introduce targeted questions into a spoken dialogue system

Advantages of Targeted Clarifications
More natural
The user does not have to repeat the whole utterance/command
Provides grounding and implicit confirmation
Useful in:
o Speech-to-speech translation
o Systems that handle natural language user responses/commands/queries and a wide range of topics and vocabulary
o Tutoring systems
o Virtual assistants (in car, in home): a user command may contain ASR errors due to noise, background speech, etc.

Corpus of Human Clarification Questions
Collect a corpus of targeted clarification questions
Understand users' reasons for choosing:
o Whether to ask a question
o Whether it is possible to ask a targeted question
o When users can infer missing information

Corpus of Human Clarification Questions
Gave a participant a sentence with a missing segment (from Transtac system output):
how many XXX doors does this garage have
Asked the participant to:
o Guess the word
o Guess the word type (POS)
o Would you ask a question if you heard this in a dialogue?
o What question would you ask? (targeted questions encouraged)

Corpus of Human Clarification Questions
Collected 794 targeted clarification questions, 72% of all clarification questions asked

Rules for Constructing Questions
Rules for question generation were constructed based on an analysis of the human-generated questions
The algorithm relies on detection of an error segment
Context around the error word is used to create a targeted clarification question
Rules:
o R_WH: generic (reprise)
o R_VB: syntactic (reprise)
o R_NMOD: syntactic
o R_START
o R_NE: named-entity-specific question

Rules for Constructing Questions
R_WH generic: <context before the error> + WHAT?
The doctor will most likely prescribe XXX.
R_WH: The doctor will most likely prescribe WHAT?
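A minimal Python sketch of the R_WH rule, assuming the detected error segment is marked as "XXX" in the input string; the function name and marker convention are illustrative, not from the paper:

```python
def r_wh(sentence: str, error_marker: str = "XXX") -> str:
    """Replace the error segment with WHAT? and drop everything after it."""
    words = sentence.rstrip(".?! ").split()
    idx = words.index(error_marker)      # position of the misrecognized span
    return " ".join(words[:idx] + ["WHAT?"])

print(r_wh("The doctor will most likely prescribe XXX."))
# -> The doctor will most likely prescribe WHAT?
print(r_wh("When was the XXX contacted?"))
# -> When was the WHAT?
```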

Rules for Constructing Questions
R_WH generic: <context before the error> + WHAT?
The doctor will most likely prescribe XXX.
R_WH: The doctor will most likely prescribe WHAT?
In some cases, using the context after the error word is desirable:
When was the XXX contacted?
R_WH* (the * marks a dispreferred question): When was the WHAT?
R_VB1: When was the WHAT contacted?

Rules for Constructing Questions
Context cannot be used indiscriminately:
As long as everyone stays XXX we will win.
R_VB1*: As long as everyone stays WHAT we will win?
R_WH: As long as everyone stays WHAT?
R_VB1 applies only when the verb and the error word share a syntactic parent

Rules for Constructing Questions
R_VB2: applies when an infinitival verb follows an error word
We need to have XXX to use this medication.
R_WH: We need to have WHAT?
R_VB2: We need to have WHAT to use this medication?
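A sketch of R_VB2 under a simplifying assumption: the infinitival clause after the error word is detected by the literal token "to" rather than by a syntactic parse, which the paper's rule would require. Names are illustrative:

```python
def r_vb2(sentence: str, error_marker: str = "XXX") -> str:
    """Keep the infinitival clause after the error; otherwise fall back to R_WH."""
    words = sentence.rstrip(".?! ").split()
    idx = words.index(error_marker)
    if idx + 1 < len(words) and words[idx + 1].lower() == "to":
        # keep the trailing infinitival clause as context
        return " ".join(words[:idx] + ["WHAT"] + words[idx + 1:]) + "?"
    return " ".join(words[:idx] + ["WHAT?"])  # generic R_WH fallback

print(r_vb2("We need to have XXX to use this medication."))
# -> We need to have WHAT to use this medication?
```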

Rules for Constructing Questions
R_NMOD: the error word is a noun modifier
Do you have anything other than these XXX plans
R_WH: Do you have anything other than these WHAT?
R_NMOD: Which plans?
[Diagram: the error word XXX attaches to its parent noun (NN/NNS) via an NMOD dependency link.]
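A sketch of R_NMOD in which a plain adjacency check stands in for the NMOD dependency link the paper relies on; it assumes the noun that the error word modifies immediately follows it:

```python
from typing import Optional

def r_nmod(sentence: str, error_marker: str = "XXX") -> Optional[str]:
    """Ask 'Which <noun>?' about the noun the error word modifies."""
    words = sentence.rstrip(".?! ").split()
    idx = words.index(error_marker)
    if idx + 1 < len(words):             # error word has a right neighbor
        return f"Which {words[idx + 1]}?"
    return None                          # rule does not apply

print(r_nmod("Do you have anything other than these XXX plans"))
# -> Which plans?
```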

Rules for Constructing Questions
If an error occurs at the beginning of a sentence (or there are no content words before the error), there is no left context to build on.
XXX arrives tomorrow.
R_START: What about "arrives tomorrow"?
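A sketch of R_START for sentence-initial errors; as in the sketches above, the marker and function name are illustrative:

```python
def r_start(sentence: str, error_marker: str = "XXX") -> str:
    """Quote the words after a sentence-initial error and ask about them."""
    words = sentence.rstrip(".?! ").split()
    idx = words.index(error_marker)
    rest = " ".join(words[idx + 1:])
    return f'What about "{rest}"?'

print(r_start("XXX arrives tomorrow."))
# -> What about "arrives tomorrow"?
```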

Rules for Constructing Questions
R_NE: if the error word is a name or location, use WHO or WHERE instead of WHAT
This case was not present in the data set

Evaluation Questionnaire
Two experimental conditions:
COMPUTER: questions generated automatically using the rules for a set of 84 sentences
HUMAN: humans (Mechanical Turk workers) asked to create clarification questions for the same sentences

Experiment
Two groups of participants:
o An MTurk experiment
o 6 participants recruited from the lab
Each participant scored 84 clarification questions (CQs)
Each CQ was scored by 3 participants from each group

Survey Results
[Results chart not recoverable from the transcript.]

Results
[Results chart not recoverable from the transcript.]

Discussion
R_WH and R_VB performance is comparable to human-generated questions
R_NMOD ("Which ...?") outperforms all other question types, including human-generated questions
The R_START rule did not work

Comparing MTurk and Recruited Subjects

Recruited Subjects
Disliked more human-generated questions than computer-generated questions
Examples of answers to the survey question "How would you ask this clarification question differently?":
The set up is obviously XXX by a professional
Human-generated: what type of set up is this?
Recruited subjects chose to change this to:
o The set up is WHAT by a professional?
o The set up is obviously WHAT by a professional?
o it's obviously WHAT?

MTurk Subjects
Disliked more computer-generated questions than human-generated questions
Examples of answers to the survey question "How would you ask this clarification question differently?":
Do your XXX have suspicious contacts
MTurk subjects chose to change this to:
o My WHAT?
o What was suspicious contacts?
o Who?

Discussion
Desirable properties of clarification questions:
o Conciseness
o Specificity
The goal of a generator is to maximize conciseness and specificity
Future work: identify properties of an optimal clarification question from the data
o Classify whether each syntactic constituent should be present in the question

Summary
Presented a set of simple transformation rules for creating targeted clarification questions
The simplicity of the rules makes the method more robust to incorrect error segment detection
Evaluation with human subjects shows that subjects score generated questions comparably to human-generated questions
User preference is subjective and may differ across subject groups

Related Work
A system's clarification question may not be appropriate because:
o An error segment may not be detected correctly
o An error type may not be identified correctly
Automatically detect user responses to "inappropriate" clarification questions

Thank you. Questions?

Requirement for a Targeted Question
Constructing an appropriate question requires correct error detection:
o Error segment boundaries
o Error type: Does the error contain a proper name? Does the error contain an out-of-vocabulary (OOV) word?