

Speech-to-Speech MT in NESPOLE!: Design and Engineering
Alon Lavie, Lori Levin
Work with: Chad Langley, Tanja Schultz, Dorcas Wallace, Donna Gates, Kay Peterson, Kornel Laskowski
MT Class, April 2, 2003

Speech-to-speech translation for e-commerce applications
- Partners: CMU, Univ. of Karlsruhe, ITC-irst, UJF-CLIPS, AETHRA, APT-Trentino
- Builds on successful collaboration within C-STAR
- Improved limited-domain speech translation
- Experiment with multimodality and with multi-engine MT (MEMT)
- Showcase-1: Travel and Tourism in Trentino, completed in Nov-2001, demonstrated at IST and HLT
- Showcase-2: expanded travel + medical service

NESPOLE! System Overview
- Human-to-human spoken language translation for e-commerce applications (e.g., travel & tourism) (Lavie et al., 2002)
- English, German, Italian, and French
- Translation via interlingua
- Translation servers for each language exchange interlingua to perform translation:
  - Speech recognition (Speech -> Text)
  - Analysis (Text -> Interlingua)
  - Generation (Interlingua -> Text)
  - Synthesis (Text -> Speech)
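
A toy sketch of how per-language servers could cooperate through the interlingua, assuming each server exposes only an analysis step (text -> IF) and a generation step (IF -> text). The class `LanguageServer`, the function `translate`, and the one-entry phrase tables are hypothetical illustrations; speech recognition and synthesis are left out, and real analysis and generation are far more than table lookups.

```python
from dataclasses import dataclass, field


@dataclass
class LanguageServer:
    """Hypothetical per-language server mapping text <-> IF via a toy table."""
    lang: str
    analysis: dict[str, str] = field(default_factory=dict)  # text -> IF

    def analyze(self, text: str) -> str:
        # Analysis step (Text -> Interlingua); a real server would parse.
        return self.analysis[text.lower()]

    def generate(self, interlingua: str) -> str:
        # Generation step (Interlingua -> Text): invert the toy table.
        for text, if_repr in self.analysis.items():
            if if_repr == interlingua:
                return text
        raise KeyError(interlingua)


SERVERS = {
    "en": LanguageServer("en", {"hello": "c:greeting (greeting=hello)"}),
    "it": LanguageServer("it", {"buongiorno": "c:greeting (greeting=hello)"}),
    "de": LanguageServer("de", {"guten tag": "c:greeting (greeting=hello)"}),
}


def translate(text: str, source: str, target: str) -> str:
    """The source server analyzes into IF; the target server generates from it."""
    interlingua = SERVERS[source].analyze(text)
    return SERVERS[target].generate(interlingua)


print(translate("Hello", "en", "it"))      # buongiorno
print(translate("Guten Tag", "de", "en"))  # hello
```

Because servers only exchange IF, any source server can be paired with any target server, and adding a language in this sketch means adding one server rather than one module per language pair.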

Speech-to-Speech in E-Commerce
- Augment current passive web e-commerce with live interaction capabilities
- Client starts via the web and can easily connect to an agent for specific, detailed information
- "Thin client": very little special hardware and software on the client PC (browser, MS NetMeeting, shared whiteboard)

NESPOLE! User Interfaces

NESPOLE! Translation Monitor

NESPOLE! Architecture

Distributed S2S Translation over the Internet

Language-specific HLT Servers

Our Parsing and Analysis Approach
- Goal: a portable and robust analyzer for task-oriented human-to-human speech that parses utterances into interlingua representations
- Our earlier systems used full semantic grammars to parse complete domain actions (DAs)
  - Useful for parsing spoken language in restricted domains
  - Difficult to port to new domains
- Current focus is on improving portability to new domains (and new languages)
- Approach: continue to use semantic grammars to parse domain-independent phrase-level arguments, and train classifiers to identify DAs

Interchange Format
- Interchange Format (IF) is a shallow semantic interlingua for task-oriented domains
- Utterances are represented as sequences of semantic dialog units (SDUs)
- An IF representation consists of four parts:
  - Speaker
  - Speech act
  - Concepts
  - Arguments
- Form: speaker : speech act +concept* +arguments*, where the speech act together with the concept sequence forms the Domain Action
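
To make the four-part structure concrete, here is a small sketch that splits an IF string into speaker, speech act, concepts, and top-level arguments. It assumes the surface form used in the example IFs of these slides (speaker, colon, speech act and concepts joined by "+", then one parenthesized, comma-separated argument list); `IF`, `parse_if`, and `split_top_level` are hypothetical names, and this is not the official IF specification checker.

```python
from typing import NamedTuple


class IF(NamedTuple):
    speaker: str
    speech_act: str
    concepts: list[str]
    arguments: list[str]  # top-level argument strings, e.g. "greeting=hello"

    @property
    def domain_action(self) -> str:
        # The speech act plus the concept sequence forms the Domain Action.
        return "+".join([self.speech_act, *self.concepts])


def split_top_level(text: str) -> list[str]:
    """Split on commas that are not nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    if "".join(current).strip():
        parts.append("".join(current).strip())
    return parts


def parse_if(text: str) -> IF:
    head, _, rest = text.strip().partition(" ")
    speaker, _, da = head.partition(":")
    speech_act, *concepts = da.split("+")
    rest = rest.strip()
    # Arguments, if present, sit inside one outer pair of parentheses.
    inner = rest[1:-1] if rest.startswith("(") and rest.endswith(")") else ""
    return IF(speaker, speech_act, concepts, split_top_level(inner))


example = parse_if("c:give-information+disposition+trip (disposition=(who=i, desire), "
                   "visit-spec=(identifiability=no, vacation), location=(place-name=val_di_fiemme))")
print(example.domain_action)  # give-information+disposition+trip
print(example.arguments)
```

Nested values such as `(who=i, desire)` stay attached to their argument because commas are only split at parenthesis depth zero.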

Hybrid Analysis Approach
Use a combination of grammar-based phrase-level parsing and machine learning to produce interlingua (IF) representations:
  Text -> Argument Parser -> text + arguments -> SDU Segmenter -> text + arguments + SDUs -> DA Classifier -> IF

Hybrid Analysis Approach: Example
Input: "Hello. I would like to take a vacation in Val di Fiemme."
Text: hello i would like to take a vacation in val di fiemme
Argument parsing:
  [greeting=] hello
  [disposition=] i would like to
  [visit-spec=] take a vacation
  [location=] in val di fiemme
Segmentation:
  SDU1: hello
  SDU2: i would like to take a vacation in val di fiemme
DA classification:
  SDU1: greeting
  SDU2: give-information+disposition+trip
Resulting IF:
  c:greeting (greeting=hello)
  c:give-information+disposition+trip (disposition=(who=i, desire), visit-spec=(identifiability=no, vacation), location=(place-name=val_di_fiemme))
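
The data flow of the three stages on this example can be sketched as follows. Everything learned or grammar-driven is replaced by hypothetical hard-coded tables (`PARSES`, `DA_TABLE`), so the sketch only shows what each stage consumes and produces. The segmenter applies the cross-domain boundary rule from the Segmentation slide and otherwise keeps arguments together, standing in for the statistical model.

```python
from typing import NamedTuple


class Arg(NamedTuple):
    label: str                  # argument label from the phrase-level parser
    words: str                  # text span covered by the argument
    cross_domain: bool = False  # True if the cross-domain grammar parsed it


# Stand-in for the argument parser: a hard-coded parse of the example.
PARSES = {
    "hello i would like to take a vacation in val di fiemme": [
        Arg("greeting=", "hello", cross_domain=True),
        Arg("disposition=", "i would like to"),
        Arg("visit-spec=", "take a vacation"),
        Arg("location=", "in val di fiemme"),
    ],
}

# Stand-in for the DA classifiers: set of argument labels -> domain action.
DA_TABLE = {
    frozenset({"greeting="}): "greeting",
    frozenset({"disposition=", "visit-spec=", "location="}):
        "give-information+disposition+trip",
}


def parse_arguments(text: str) -> list[Arg]:
    return PARSES[text]


def segment(args: list[Arg]) -> list[list[Arg]]:
    """Split next to any cross-domain parse tree; otherwise keep arguments
    in the same SDU (the statistical model is omitted here)."""
    sdus = [[args[0]]]
    for prev, cur in zip(args, args[1:]):
        if prev.cross_domain or cur.cross_domain:
            sdus.append([cur])
        else:
            sdus[-1].append(cur)
    return sdus


def classify(sdu: list[Arg]) -> str:
    return DA_TABLE[frozenset(a.label for a in sdu)]


text = "hello i would like to take a vacation in val di fiemme"
for sdu in segment(parse_arguments(text)):
    print(classify(sdu), [a.words for a in sdu])
```

Running it prints one line per SDU: `greeting` for "hello", and `give-information+disposition+trip` for the remaining three argument spans, matching the slide.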

Argument Parsing
- Parse utterances using phrase-level grammars
- SOUP parser (Gavaldà, 2000): a stochastic, chart-based, top-down robust parser designed for real-time analysis of spoken language
- Separate grammars, based on the type of phrases each grammar is intended to cover

Domain Action Classification
- Identify the DA for each SDU using trainable classifiers
- Two TiMBL (k-NN) classifiers:
  - Speech act
  - Concept sequence
- Binary features indicate the presence or absence of arguments and pseudo-arguments
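
A minimal k-nearest-neighbour sketch of the speech-act classifier, assuming one binary feature per argument or pseudo-argument label and a mismatch-count (overlap) distance. This is a from-scratch illustration, not TiMBL itself; the feature inventory `FEATURES`, the function names, and the toy training data are hypothetical.

```python
from collections import Counter

# Hypothetical feature inventory: one binary feature per argument or
# pseudo-argument label the parser can produce.
FEATURES = ["greeting=", "disposition=", "visit-spec=", "location=", "price=", "=arrival="]


def featurize(labels: set[str]) -> tuple[int, ...]:
    """1 if the label was found in the SDU, else 0."""
    return tuple(int(f in labels) for f in FEATURES)


def overlap_distance(a: tuple[int, ...], b: tuple[int, ...]) -> int:
    """Count of features whose values differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_classify(train: list[tuple[tuple[int, ...], str]], labels: set[str], k: int = 3) -> str:
    """Majority class among the k training examples nearest to the SDU."""
    x = featurize(labels)
    nearest = sorted(train, key=lambda ex: overlap_distance(ex[0], x))[:k]
    return Counter(cls for _, cls in nearest).most_common(1)[0][0]


# Toy speech-act training data (invented for illustration).
TRAIN = [
    (featurize({"greeting="}), "greeting"),
    (featurize({"disposition=", "visit-spec="}), "give-information"),
    (featurize({"disposition=", "location="}), "give-information"),
    (featurize({"price="}), "request-information"),
    (featurize({"price=", "location="}), "request-information"),
]

print(knn_classify(TRAIN, {"disposition=", "visit-spec=", "location="}))  # give-information
```

The concept-sequence classifier would be a second instance over the same features, trained with concept-sequence labels instead of speech acts.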

Using the IF Specification
- Use knowledge of the IF specification during DA classification
  - Ensure that only legal DAs are produced
  - Guarantee that the DA and arguments combine to form a valid IF representation
- Strategy: find the best DA that licenses the most arguments
  - Trust the parser to reliably label arguments
  - Retaining detailed argument information is important for translation

Evaluation: Classification Accuracy
- 20-fold cross-validation using the NESPOLE! travel domain database
- The database (English and German): SDUs, domain actions, speech acts (70), concept sequences, vocabulary
- Most-frequent-class baseline:

  Most frequent class   English   German
  Speech act            41.4%     40.7%
  Concept sequence      38.9%     40.3%
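
A sketch of a k-fold split for the 20-fold setup. The slides do not say how SDUs were assigned to folds; the round-robin assignment below is an assumption, and a real setup might instead keep all SDUs of one dialogue in the same fold. `k_fold_splits` is a hypothetical name.

```python
def k_fold_splits(n_items: int, k: int = 20):
    """Yield (train_indices, test_indices) for k-fold cross-validation.

    Items are assigned to folds round-robin, so fold sizes differ by at most 1,
    and every item is used for testing exactly once.
    """
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


print([len(test) for _, test in k_fold_splits(45, k=20)][:6])  # [3, 3, 3, 3, 3, 2]
```

Accuracy is then averaged over the k test folds, with each classifier trained only on the other k-1 folds.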

Evaluation: Classification Accuracy

  Classification accuracy   English   German
  Speech acts               81.25%    78.93%
  Concept sequences         69.59%    67.08%

Evaluation: End-to-End Translation
- English-to-English and English-to-Italian
- Training set: ~8000 SDUs from NESPOLE!
- Test set: 2 dialogs, client utterances only
- Uses the IF specification fallback strategy
- Three graders, bilingual English/Italian speakers
- Each SDU graded as perfect, ok, bad, or very bad
- Acceptable translation = perfect + ok
- Majority scores
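
One plausible reading of "acceptable = perfect + ok" combined with "majority scores" is sketched below: an SDU counts as acceptable when a strict majority of the three graders rated it perfect or ok. The functions and the toy grades are hypothetical; the slides do not spell out the exact aggregation.

```python
ACCEPTABLE = {"perfect", "ok"}


def is_acceptable(grades: list[str]) -> bool:
    """True if a strict majority of graders rated the SDU perfect or ok."""
    votes = sum(g in ACCEPTABLE for g in grades)
    return votes * 2 > len(grades)


def acceptable_rate(graded_sdus: list[list[str]]) -> float:
    """Percentage of SDUs whose majority judgment is acceptable."""
    return 100 * sum(map(is_acceptable, graded_sdus)) / len(graded_sdus)


# Toy grades from three graders for four SDUs (invented for illustration).
sdus = [
    ["perfect", "ok", "bad"],
    ["ok", "bad", "very bad"],
    ["perfect", "perfect", "perfect"],
    ["bad", "ok", "ok"],
]
print(acceptable_rate(sdus))  # 75.0
```

Collapsing to a binary acceptable/unacceptable judgment first means three graders always produce a majority, which a vote over four grade categories would not guarantee.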

Evaluation: End-to-End Translation
English source input; speech recognizer hypotheses: 66.7%, WAR: 56.4%

  Source input          Target language   Acceptable (OK + Perfect)
  Human transcription   English           68.1%
  Human transcription   Italian           69.7%
  SR hypothesis         English           50.4%
  SR hypothesis         Italian           50.2%
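
Assuming WAR is word accuracy rate in the usual sense (1 minus word error rate, where errors are the minimum number of word substitutions, deletions, and insertions against a reference transcription), it could be computed as in this sketch. The function names are hypothetical.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum substitutions + deletions + insertions (word-level Levenshtein)."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy in percent: 100 * (1 - errors / reference length)."""
    n = len(reference.split())
    return 100 * (1 - word_errors(reference, hypothesis) / n)


print(word_accuracy("hello i would like", "hello i like"))  # 75.0
```

Because insertions count as errors, word accuracy under this definition can drop below zero on very noisy hypotheses.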

Evaluation: Data Ablation Experiment

Domain Portability
- Experimented with porting to a medical assistance domain in NESPOLE!
- Initial medical domain system is up and running, with reasonable coverage of flu-like symptoms and chest pain
- Porting the interlingua, grammars, and modules for English, German, and Italian required about 6 person-months in total:
  - Interlingua development: ~180 hours
  - Interlingua annotation: ~200 hours
  - Analysis grammars, training: ~250 hours
  - Generation development: ~250 hours

New Development Tools

Questions?

Grammars
- Argument grammar
  - Identifies arguments defined in the IF
    s[arg:activity-spec=] -> (*[object-ref=any] *[modifier=good] [biking])
  - Covers "any good biking", "any biking", "good biking", "biking", plus synonyms for all 3 words
- Pseudo-argument grammar
  - Groups common phrases with similar meanings into classes
    s[=arrival=] -> (*is *usually arriving)
  - Covers "arriving", "is arriving", "usually arriving", "is usually arriving", plus synonyms
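
The coverage lists above follow from reading a "*" prefix as "optional": two optional elements before a required one give 2 x 2 = 4 surface strings. The sketch below enumerates them for a flat rule body; it is a simplification, not SOUP's grammar formalism, and it treats non-terminals such as [object-ref=any] as plain words without synonym expansion. `expand` is a hypothetical name.

```python
from itertools import product


def expand(rule: list[str]) -> list[str]:
    """Enumerate the surface strings a flat rule body covers.

    A token prefixed with '*' is optional; other tokens are required.
    Synonym expansion of non-terminals is not modelled.
    """
    choices = []
    for token in rule:
        if token.startswith("*"):
            choices.append(["", token[1:]])  # optional: absent or present
        else:
            choices.append([token])
    return [" ".join(w for w in combo if w) for combo in product(*choices)]


print(expand(["*is", "*usually", "arriving"]))
# ['arriving', 'usually arriving', 'is arriving', 'is usually arriving']
```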

Grammars
- Cross-domain grammar
  - Identifies simple domain-independent DAs
    s[greeting] -> ([greeting=first_meeting] *[greet:to-whom=])
  - Covers "nice to meet you", "nice to meet you donna", "nice to meet you sir", plus synonyms
- Shared grammar
  - Contains low-level rules accessible by all other grammars

Segmentation
- Identify SDU boundaries between argument parse trees
- Insert a boundary if either parse tree is from the cross-domain grammar
- Otherwise, use a simple statistical model
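
The slides do not specify the statistical model, only that segmentation counts are collected from parsed training data. As a stand-in, this sketch estimates the relative frequency of a boundary between each pair of adjacent argument labels and splits when it exceeds a threshold; `BoundaryModel`, `insert_boundary`, and the 0.5 threshold are hypothetical. The cross-domain rule is applied first, as on the slide.

```python
from collections import Counter


class BoundaryModel:
    """Hypothetical count-based segmenter over adjacent argument labels."""

    def __init__(self):
        self.boundary = Counter()  # (left, right) -> times a boundary occurred
        self.total = Counter()     # (left, right) -> times the pair occurred

    def observe(self, left: str, right: str, is_boundary: bool) -> None:
        self.total[left, right] += 1
        self.boundary[left, right] += is_boundary

    def prob(self, left: str, right: str) -> float:
        n = self.total[left, right]
        return self.boundary[left, right] / n if n else 0.0


def insert_boundary(model, left, right, left_cross, right_cross, threshold=0.5):
    # Rule from the slide: always split next to a cross-domain parse tree.
    if left_cross or right_cross:
        return True
    # Otherwise defer to the statistical model (here: relative frequency).
    return model.prob(left, right) > threshold


model = BoundaryModel()
for _ in range(3):
    model.observe("disposition=", "visit-spec=", False)
print(insert_boundary(model, "greeting=", "disposition=", True, False))    # True
print(insert_boundary(model, "disposition=", "visit-spec=", False, False))  # False
```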

Using the IF Specification
- Check whether the best speech act and concept sequence form a legal IF
- If not, test alternative combinations of speech acts and concept sequences from a ranked set of possibilities
- Select the best combination that licenses the most arguments
- Drop any arguments not licensed by the best DA
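
The fallback steps above can be sketched as a search over ranked (speech act, concept sequence) pairs. The specification table `SPEC` (legal DA -> licensed argument names) is a hypothetical stand-in for the IF specification, and breaking ties by the sum of the two classifier ranks is an assumption; the slides only say the best combination licensing the most arguments is chosen.

```python
def choose_da(speech_acts, concept_seqs, arguments, spec):
    """Return (da, kept_args, dropped_args), or None if nothing is legal.

    speech_acts / concept_seqs: classifier outputs ranked best-first.
    spec: hypothetical IF-spec table mapping each legal DA to the set of
    argument names it licenses.
    """
    candidates = []
    for i, sa in enumerate(speech_acts):
        for j, cs in enumerate(concept_seqs):
            da = f"{sa}+{cs}" if cs else sa
            if da not in spec:
                continue  # illegal DA under the specification
            kept = [a for a in arguments if a in spec[da]]
            # Most licensed arguments first; ties go to better-ranked pairs.
            candidates.append((-len(kept), i + j, da, kept))
    if not candidates:
        return None
    _, _, da, kept = min(candidates)
    dropped = [a for a in arguments if a not in spec[da]]
    return da, kept, dropped


SPEC = {
    "give-information+disposition+trip": {"disposition", "visit-spec", "location"},
    "give-information+trip": {"visit-spec", "location"},
    "request-information+trip": {"visit-spec", "location"},
}

print(choose_da(["give-information", "request-information"],
                ["disposition+accommodation", "disposition+trip", "trip"],
                ["disposition", "visit-spec", "location"], SPEC))
# ('give-information+disposition+trip', ['disposition', 'visit-spec', 'location'], [])
```

Here the top-ranked pair is illegal, so the search falls back to the second concept sequence, which is legal and licenses all three arguments; when no candidate licenses everything, the leftover arguments are reported as dropped.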

Grammar Development and Classifier Training
- Four steps:
  1. Write argument grammars
  2. Parse training data
  3. Obtain segmentation counts
  4. Train DA classifiers
- Steps 2-4 are automated to simplify testing new grammars
- Translation servers include a development mode for testing new grammars

Evaluation: IF Specification Fallback
- 182 SDUs required classification
- 4% had illegal DAs
- 29% had illegal IFs
- Mean arguments per SDU: 1.47
- Changed:
    Speech act         5%
    Concept sequence   26%
    Domain action      29%
- Arguments dropped per SDU:
    Without fallback   0.38
    With fallback      0.07

Evaluation: Data Ablation Experiment
- 16-fold cross-validation setup
- Test set size (# SDUs): 400
- Training set sizes (# SDUs): 500, 1000, 2000, 3000, 4000, 5000, 6009 (all data)
- Data from the previous C-STAR system
- No use of the IF specification

Future Work
- Alternative segmentation models, feature sets, and classification methods
- Multiple argument parses
- Evaluate portability and robustness
  - Collect dialogues in a new domain
  - Create argument and full DA grammars for a small development set of dialogues
  - Assess portability by comparing grammar development times and examining grammar reusability
  - Assess robustness by comparing performance on unseen data

References
- Cattoni, R., M. Federico, and A. Lavie. Robust Analysis of Spoken Input Combining Statistical and Knowledge-Based Information Sources. In Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, Trento, Italy.
- Daelemans, W., J. Zavrel, K. van der Sloot, and A. van den Bosch. TiMBL: Tilburg Memory Based Learner, version 3.0, Reference Guide. ILK Technical Report.
- Gavaldà, M. SOUP: A Parser for Real-World Spontaneous Speech. In Proceedings of IWPT-2000, Trento, Italy.
- Gotoh, Y. and S. Renals. Sentence Boundary Detection in Broadcast Speech Transcripts. In Proceedings of the International Speech Communication Association Workshop: Automatic Speech Recognition: Challenges for the New Millennium, Paris.
- Lavie, A., F. Metze, F. Pianesi, et al. Enhancing the Usability and Performance of NESPOLE! – a Real-World Speech-to-Speech Translation System. In Proceedings of HLT-2002, San Diego, CA.

- Lavie, A., C. Langley, A. Waibel, et al. Architecture and Design Considerations in NESPOLE!: a Speech Translation System for E-commerce Applications. In Proceedings of HLT-2001, San Diego, CA.
- Lavie, A., D. Gates, N. Coccaro, and L. Levin. Input Segmentation of Spontaneous Speech in JANUS: a Speech-to-speech Translation System. In Dialogue Processing in Spoken Language Systems: Revised Papers from ECAI-96 Workshop, E. Maier, M. Mast, and S. Luperfoy (eds.), LNCS series, Springer Verlag.
- Lavie, A. GLR*: A Robust Grammar-Focused Parser for Spontaneously Spoken Language. PhD dissertation, Technical Report CMU-CS, Carnegie Mellon University, Pittsburgh, PA.
- Munk, M. Shallow Statistical Parsing for Machine Translation. Diploma Thesis, Karlsruhe University.
- Stevenson, M. and R. Gaizauskas. Experiments on Sentence Boundary Detection. In Proceedings of ANLP and NAACL-2000, Seattle.
- Woszczyna, M., M. Broadhead, D. Gates, et al. A Modular Approach to Spoken Language Translation for Large Domains. In Proceedings of AMTA-98, Langhorne, PA.