Speech-to-Speech Translation with Clarifications Julia Hirschberg, Svetlana Stoyanchev Columbia University September 18, 2013.



Outline
• Main Problem
• Key Ideas
• Solution Details
• Impact
• Issues, Gaps, and Future work

Speech Translation
Speech-to-Speech translation system
[Diagram labels: L1 Speaker, Translation System, L2 Speaker; Speech Question (L1), Translated Question (L2), Answer (L2), Translated Answer (L1)]

Speech Translation
Translation may be impaired by:
• Speech recognition errors
  o Word error rate in English side of Transtac is 9%
  o Word error rate in Let’s Go bus information is 50%
• A speaker may use ambiguous language
• A speech recognition error may be caused by use of out-of-vocabulary words
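The word error rates quoted above are conventionally computed as the word-level edit distance between the reference transcript and the recognizer output, divided by the reference length. The sketch below illustrates that standard definition only; the function name `word_error_rate` and the whitespace tokenization are assumptions, not part of the presented system.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of eight reference words.
wer = word_error_rate(
    "how often do you have problems with generators",
    "how often do you have problems with general",
)
```

Because insertions count as errors, the rate can exceed 100% when the hypothesis is much longer than the reference.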

Speech Translation
Speech-to-Speech translation system
Introduce a clarification component
[Diagram labels: L1 Speaker, Translation System with Dialogue Manager, L2 Speaker; Speech Question (L1), Translated Question (L2), Answer (L2), Translated Answer (L1); a clarification sub-dialogue on each speaker's side]

Key Ideas
• Use targeted clarifications
• Address challenges with targeted clarifications
• Data collection for system evaluation

Most Common Clarification Strategies in Dialogue Systems
• “Please repeat”
• “Please rephrase”
• System repeats the previous question

What Clarification Questions Do Human Speakers Ask?
• Targeted reprise questions (M. Purver)
  o Ask a targeted question about the part of an utterance that was misheard or misunderstood, including understood portions of the utterance
  o Speaker: Do you have anything other than these XXX plans?
  o Non-Reprise: What did you say?/Please repeat.
  o Reprise: What kind of plans?
• 88% of human clarification questions are reprise, 12% non-reprise
• Goal: Introduce targeted (reprise) questions into a spoken system

Advantages of Targeted Clarifications
• More natural
• User does not have to repeat the whole utterance/command
• Provides grounding and implicit confirmation
• Speech-to-speech translation
• Useful in systems that handle natural language user responses/commands/queries and a wide range of topics and vocabulary
• Tutoring system
• Virtual assistants (in car, in home): a user command may contain ASR error due to noise, background speech, etc.

Types of Clarification Questions in the TBOLT System
• Rephrase part
  o Used when an error is OOV and NOT a name (works on difficult non-OOV words as well)
  o Asks to rephrase the error segment
  o “I did not understand when you said: fiscal. Please give me another word or phrase for it.”
• Spelling
  o Used for names
  o “Please spell ‘Rockefeller’.”
• Disambiguation
  o Used to disambiguate between homophones
  o “Did you mean plain as in extensive tract of level open land, or, plane as in an aircraft?”

Types of Questions (cont.)
• Reprise (as found in human-human communication)
  o Repeats part of the utterance before the error segment
  o User: We will search some of the XXX to make sure everyone is safe.
  o System: We will search some of the what?
• Reprise/Rephrase-part
  o Combines a targeted question with a rephrase question
  o System: We will search some of the what? Please say another word or phrase for this: ‘vehicles’.
• Confirmation
  o A yes/no question to confirm an utterance
  o “Did you say ‘the breach is located here’?”
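The reprise and reprise/rephrase-part questions can be illustrated with a short sketch that repeats the words before the error segment and appends "what?". This is a minimal illustration assuming the utterance is already tokenized and the error segment boundaries are known; the function names and the token-index convention are hypothetical, not taken from the system.

```python
def reprise_question(tokens: list[str], error_start: int) -> str:
    """Repeat the words before the error segment, then ask 'what?'."""
    if not 0 < error_start < len(tokens):
        # With no preceding context a reprise is impossible; the caller
        # would have to fall back to another question type.
        raise ValueError("a reprise question needs context before the error")
    return " ".join(tokens[:error_start]) + " what?"


def reprise_rephrase_question(tokens: list[str], error_start: int, error_end: int) -> str:
    """Combine the reprise question with a request to rephrase the segment."""
    segment = " ".join(tokens[error_start:error_end])
    return (
        f"{reprise_question(tokens, error_start)} "
        f"Please say another word or phrase for this: '{segment}'."
    )


# The recognized words, with 'vehicles' (token 6) flagged as the error segment.
utterance = "We will search some of the vehicles to make sure everyone is safe".split()
question = reprise_question(utterance, 6)
combined = reprise_rephrase_question(utterance, 6, 7)
```

Here `question` reproduces the slide's example, "We will search some of the what?".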

Requirement for a Targeted Question
• Error Detection
  o Error segment boundaries
  o Error type
• Does the error contain a proper name?
• Does the error contain an out-of-vocabulary (OOV) word?
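The dependence of question type on error type can be sketched as a small selection function. This is an illustration only: the `DetectedError` record, its field names, and the priority order between the checks are assumptions; the slides state only that spelling is used for names, disambiguation for homophones, and rephrase-part for OOV (and difficult) non-name words.

```python
from dataclasses import dataclass
from enum import Enum


class QuestionType(Enum):
    REPHRASE_PART = "rephrase part"
    SPELLING = "spelling"
    DISAMBIGUATION = "disambiguation"


@dataclass
class DetectedError:
    """Hypothetical output of the error detection component."""
    start: int                          # first token of the error segment
    end: int                            # one past the last token
    is_name: bool                       # does the segment contain a proper name?
    is_oov: bool                        # does it contain an out-of-vocabulary word?
    homophones: tuple[str, ...] = ()    # candidate homophones, if any


def choose_question_type(error: DetectedError) -> QuestionType:
    # Assumed priority: names first, then homophones, then rephrase-part.
    if error.is_name:
        return QuestionType.SPELLING
    if len(error.homophones) >= 2:
        return QuestionType.DISAMBIGUATION
    # Covers OOV non-names and, per the slides, difficult non-OOV words.
    return QuestionType.REPHRASE_PART


name_error = DetectedError(start=2, end=3, is_name=True, is_oov=True)
homophone_error = DetectedError(3, 4, is_name=False, is_oov=False, homophones=("plain", "plane"))
oov_error = DetectedError(5, 6, is_name=False, is_oov=True)
```

A misclassified `is_name` flag leads directly to the inappropriate spelling questions discussed in the examples that follow.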

Challenges with Using Targeted Clarifications
1) A system's clarification question may not be appropriate because
  o An error segment may not be detected correctly
  o An error type is not identified correctly
2) How should a system construct a targeted/reprise clarification question?
3) How long should the system persist in its clarification?
Our goal is to address these challenges empirically

Examples of Inappropriate Clarification Questions
• Rephrase-part (incorrect error type detected)
  o User: Captain Bell wants to ask you some questions.
  o Error Detection: Captain to ask you some questions.
  o System: I did not understand when you said "Bell wants". Please give me another word or phrase for it.
• Reprise/Targeted Rephrase (error segment is identified incorrectly)
  o User: The insurgents are denouncing us.
  o ASR/Error detection: The insurgents are doing now and
  o System: The insurgents are doing now and what? Please say another word or phrase for "cing us".

Examples (cont.)
• Spelling (error type is incorrectly detected as a name)
  o User: It sounds like there is a kernel of truth to your story.
  o Error Detection: It sounds like there is a of truth to your story.
  o System: I think I heard a name when you said this: "kernel". Please spell that name.

Examples (cont.)
• Disambiguation (disambiguated word is misunderstood)
  o User: How often do you have problems with generators?
  o ASR/Error Detection: How often do you have problems with general?
  o System: Did you mean general as in broad or general as in a military officer?

Causes of Inappropriate Questions
• Rephrase part
  o Partial word is detected as an error
  o Detected segment contains a name
  o Detected segment is a function word (to, from, the …)
• Disambiguate
  o Neither choice for disambiguation is correct
  o Disambiguated word is misrecognized
• Spell
  o Non-name
  o Long segment
• Reprise
  o Contains undetected recognition error

Goal
Develop a method to automatically identify when an inappropriate question is asked
• Use user’s answers to detect if a question was inappropriate

Data Collection
• Simulated clarification system
• Users were asked to read a sentence and were then played a pre-recorded question
• Users were led to believe they were interacting with the actual system

Data Collection (cont.)
• Prepared 228 questions
  o 84 appropriate
  o 144 inappropriate
  o For each type of clarification question, created appropriate and inappropriate questions
  o Total of 19 categories of clarification questions
• Each subject was asked 144 questions
• Recorded their initial utterances and their answers to the questions

User Responses
• Subjects tended to be cooperative
• Answers varied from subject to subject
• Example: “I did not understand when you said: ‘Betirma’. Please give me another word or phrase for it.”
  o “No”
  o “Betirma”
  o “Betirma bravo echo tango india romeo mike alpha”

User Responses (cont.)
• Example 2:
  o User: “How often do you have problems with generators?”
  o System: “Did you mean general as in broad or general as in a military officer?”
• Answers:
  o “generator as in a machine for making electricity”
  o “no”
  o “generators”

Method
• Extract lexical and prosodic features from responses
  o Number of pauses, speech energy, speech tempo
  o Lexical and prosodic difference between initial response and an answer to clarification
  o Measure number of times subjects replay each question
  o Measure latency: length of pause before answer
• Determine whether questions are appropriate or inappropriate based on user responses
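A few of the features listed above can be computed from transcripts and timestamps alone, as in the sketch below. It is an illustration under stated assumptions: the `ResponseFeatures` record, the Jaccard word overlap as a stand-in for "lexical difference", and the timestamp arguments are all hypothetical, and the prosodic features (pauses, energy, tempo) are omitted because they require the audio signal.

```python
import re
from dataclasses import dataclass


@dataclass
class ResponseFeatures:
    answer_length: int    # number of words in the answer
    word_overlap: float   # Jaccard overlap between initial utterance and answer
    latency_sec: float    # pause between end of question and start of answer
    replay_count: int     # how many times the subject replayed the question


def _words(text: str) -> set[str]:
    # Lowercase and drop punctuation so "generators?" matches "generators".
    return set(re.findall(r"[a-z']+", text.lower()))


def extract_features(initial: str, answer: str, question_end_sec: float,
                     answer_start_sec: float, replay_count: int) -> ResponseFeatures:
    initial_words, answer_words = _words(initial), _words(answer)
    union = initial_words | answer_words
    overlap = len(initial_words & answer_words) / len(union) if union else 0.0
    return ResponseFeatures(
        answer_length=len(answer.split()),
        word_overlap=overlap,
        latency_sec=answer_start_sec - question_end_sec,
        replay_count=replay_count,
    )


# The timing and replay values here are made up for illustration.
features = extract_features(
    "How often do you have problems with generators?", "generators",
    question_end_sec=1.0, answer_start_sec=2.5, replay_count=0,
)
```

Feature records of this kind would then be the input to whatever classifier decides whether the preceding question was appropriate.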

Challenge 2: Constructing Targeted Clarification Questions
• Previous work: collected clarification questions using mturk (Stoyanchev et al. 2012, 2013)
• Using the human-generated questions, manually created a set of generation rules
• Evaluated generated questions with human subjects

Types of Questions
• R_GEN Generic: what?
  o Applies if no other rules apply
  o Sentence: The doctor will most likely prescribe XXX
  o Question: the doctor will most likely prescribe WHAT?
• R_SYN Syntactic: about what about ?
  o Applies when: there is VB after error; VB and error share a parent
  o Sentence: When was the XXX contacted?
  o Question: When was WHAT contacted?
• R_NMOD: which ?
  o Applies when: DEP TAG error = NMOD and parent POS = NN | NNS
  o Sentence: Do you have anything other than these XXX plans
  o Question: Which plans?
• R_START: what about
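The three fully specified rules can be sketched over a dependency-parsed sentence as follows. This is a rough illustration, not the authors' implementation: the `Token` record, the hand-annotated parses, the rule priority (R_NMOD before R_SYN), and the dropping of a determiner before the error in R_SYN are assumptions chosen to reproduce the slide's three examples. R_START is left out because its definition is truncated in the transcript.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    pos: str    # part-of-speech tag, e.g. "NN", "VBN"
    dep: str    # dependency label, e.g. "NMOD"
    head: int   # index of the parent token, -1 for the root


def generate_question(tokens: list[Token], err: int) -> tuple[str, str]:
    """Return (rule name, question) for the error token at index err."""
    error = tokens[err]
    parent = tokens[error.head] if error.head >= 0 else None
    # R_NMOD: the error modifies a noun, so ask "Which <noun>?"
    if error.dep == "NMOD" and parent is not None and parent.pos in ("NN", "NNS"):
        return "R_NMOD", f"Which {parent.text}?"
    # R_SYN: a verb follows the error and shares its parent; keep both contexts.
    if any(t.pos.startswith("VB") and t.head == error.head for t in tokens[err + 1:]):
        start = err - 1 if err > 0 and tokens[err - 1].pos == "DT" else err
        words = [t.text for t in tokens[:start]] + ["WHAT"] + [t.text for t in tokens[err + 1:]]
        return "R_SYN", " ".join(words) + "?"
    # R_GEN: fallback, repeat the words before the error and ask WHAT.
    return "R_GEN", " ".join([t.text for t in tokens[:err]] + ["WHAT"]) + "?"


def parse(rows: list[tuple[str, str, str, int]]) -> list[Token]:
    return [Token(*row) for row in rows]


# Hand-annotated (hypothetical) parses of the three slide examples.
NMOD_SENT = parse([("Do", "VBP", "ROOT", -1), ("you", "PRP", "SBJ", 0), ("have", "VB", "VC", 0),
                   ("anything", "NN", "OBJ", 2), ("other", "JJ", "NMOD", 3), ("than", "IN", "AMOD", 4),
                   ("these", "DT", "NMOD", 8), ("XXX", "NN", "NMOD", 8), ("plans", "NNS", "PMOD", 5)])
SYN_SENT = parse([("When", "WRB", "TMP", 1), ("was", "VBD", "ROOT", -1), ("the", "DT", "NMOD", 3),
                  ("XXX", "NN", "SBJ", 1), ("contacted", "VBN", "VC", 1)])
GEN_SENT = parse([("The", "DT", "NMOD", 1), ("doctor", "NN", "SBJ", 2), ("will", "MD", "ROOT", -1),
                  ("most", "RBS", "AMOD", 4), ("likely", "RB", "ADV", 5), ("prescribe", "VB", "VC", 2),
                  ("XXX", "NN", "OBJ", 5)])
```

In a real system the tags and heads would come from a tagger and dependency parser run on the recognizer output rather than from hand annotation.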

Evaluation Questionnaire
• Generated questions automatically using the rules for a set of 84 sentences
• Asked humans (mturk) to create clarification questions for the same sentences
• Questionnaire applied to both human- and computer-generated questions

Subjects
• Mturk
• Recruited 6 subjects from the lab
• Inter-annotator Agreement

Results

Discussion
• R_GEN and R_SYN performance is comparable to human-generated questions
• R_NMOD (which …?) outperforms all other question types including human-generated questions
• R_START rule did not work

Key Ideas
• Use Targeted Clarifications
• Address challenges with targeted clarifications
  o Experiment on automatic detection of inappropriate questions
  o Experiment on automatic detection of when to terminate clarification
• Data collection for system evaluation

Image Description and Questioning
• Speaker1:
  o A car is burning behind the girl
  o The girl looks startled
  o There was a massive explosion
• Speaker2:
  o A woman is standing in front of a burning car
  o Everything around her seems to have been destroyed
  o What caused this destruction?
Show user an image and ask to describe it and construct questions

Data Collection for System Evaluation
• Advantages:
  o Does not prime users with words in a verbally described scenario
  o Elicits natural speech compared to reading
  o Can be extended to a 2-way dialogue where the interviewee is given a narrative or video information for answering the interviewer's questions
• Disadvantages:
  o Uncontrolled vocabulary (cannot force users to mispronounce words)
  o No control across subject pairs

Impact
• Impact on Speech-to-Speech Translation
  o Detecting when a targeted clarification question was inappropriate is an important feature for determining next dialogue move in clarification
• Impact beyond Speech-to-Speech Translation
  o Targeted clarifications can be used in spoken dialogue systems
  o Especially useful for non-slot-filling (tutoring, virtual assistants)

Future Work
• Appropriate and inappropriate questions
  o Analyze the data collected in responses to appropriate and inappropriate clarification questions
  o Use machine learning to predict whether an utterance is an answer to an appropriate or an inappropriate clarification question
• Targeted (reprise) clarification questions
  o Which information from an initial sentence should a reprise clarification question contain?
  o Using human-constructed questions, determine which information is essential to be repeated in a targeted question
• Clarification length
  o How long should the system focus on a targeted clarification before backing off?
  o Collect data and use machine learning to predict on each system turn whether a clarification should continue or stop

Conclusions
• Used an error-simulation system to collect data
  o Data collection experiment for automatic detection of answers to 'inappropriate' system clarifications
  o Evaluation of automatically generated reprise clarification questions shows that they could be used in a system
  o Proposed an experiment for determining an optimal length of targeted clarification
• Collected audio data for system evaluation using an image description method

Thank you
Questions?

Challenge 3: Clarification Length
• How long should the system focus on a targeted clarification before backing off?
  o In Speech-to-Speech translation: back-off = translate
  o In spoken dialogue systems: back-off = ask a generic question to 'please rephrase'
• The answer depends on how patient and cooperative users are.

Evaluation of Clarification Length
• BOLT 2012 system behaviour: System asks targeted clarification at most 3 times before translating.
• Goal: Determine dynamically at each clarification turn whether the system should terminate clarification process.
• Use data to learn the dialogue strategy
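The fixed-limit behaviour described for the BOLT 2012 system can be sketched as a loop that gives up after three targeted clarifications. This is only an illustration of that baseline; the function name and the boolean outcome list are hypothetical stand-ins for a real dialogue manager.

```python
MAX_TARGETED_CLARIFICATIONS = 3  # limit stated for the BOLT 2012 system


def run_clarification(outcomes: list[bool],
                      max_attempts: int = MAX_TARGETED_CLARIFICATIONS) -> tuple[int, bool]:
    """Simulate the fixed-limit baseline.

    outcomes[i] is True if the i-th clarification would resolve the error.
    Returns (number of clarifications asked, whether the error was resolved).
    In both cases the next system move is to translate.
    """
    asked = 0
    for resolved in outcomes[:max_attempts]:
        asked += 1
        if resolved:
            return asked, True
    # Back off: translate the utterance as recognized.
    return asked, False


# The fourth attempt would have succeeded, but the baseline stops after three.
gave_up = run_clarification([False, False, False, True])
succeeded = run_clarification([False, True])
```

A learned strategy would replace the fixed `max_attempts` cut-off with a per-turn decision.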

Experiment Design
• Simulate sequence of unsuccessful clarification questions.
• Give user an option to hit “translate” button
• Distractor cases: Simulate successful clarification
  o User: This computer is not operational
  o System: Please rephrase “not operational”
  o User: not working
  o System: thank you (translate and show next question)
• Experimental case:
  o Loop asking 3 – 5 different targeted questions
  o Clarification dialogue continues until the user hits “translate”
• Use a combination of distractor and experimental cases

Method
Use data to determine when system should give up on a targeted clarification
• Apply machine learning
• Features:
  o Dialogue length (more likely to give up as dialogue continues to fail)
  o Question type
  o Appropriateness of a clarification question
  o Confidences of error detection and classification components
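One way to hand the listed features to a learner is to encode each clarification turn as a numeric vector, as sketched below. The `TurnState` record, the question-type inventory, and the one-hot encoding are assumptions for illustration; the slides do not specify the representation or the learning algorithm.

```python
from dataclasses import dataclass

# Assumed inventory, taken from the question types named earlier in the deck.
QUESTION_TYPES = ("rephrase_part", "spelling", "disambiguation", "reprise", "confirmation")


@dataclass
class TurnState:
    dialogue_length: int               # clarification turns so far
    question_type: str                 # one of QUESTION_TYPES
    question_appropriate: bool         # output of the appropriateness detector
    detection_confidence: float        # error detection confidence
    classification_confidence: float   # error type classification confidence


def to_feature_vector(state: TurnState) -> list[float]:
    if state.question_type not in QUESTION_TYPES:
        raise ValueError(f"unknown question type: {state.question_type}")
    one_hot = [1.0 if state.question_type == q else 0.0 for q in QUESTION_TYPES]
    return [
        float(state.dialogue_length),
        *one_hot,
        float(state.question_appropriate),
        state.detection_confidence,
        state.classification_confidence,
    ]


vector = to_feature_vector(TurnState(2, "spelling", False, 0.8, 0.6))
```

Each vector would be paired with a label, such as whether the user pressed "translate" on that turn, to train a continue-or-stop classifier.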