CLINT-CS Dialogue II: Verbmobil (May 2006)

Presentation transcript:

Slide 1: CLINT-CS Dialogue II, Verbmobil

Slide 2: Verbmobil
– Verbmobil is a spoken dialogue system that provides phone users with simultaneous dialogue interpretation services for restricted topics.
– It recognises spoken input, translates it, and then utters the translation.
– Three languages: German, English and Japanese

Slide 3: Challenges for Speech and Language (S and L) Technology
Rows are ordered by increasing difficulty.
Input Conditions | Naturalness | Adaptability | Dialogue Capabilities
Close speaking, PTT | Isolated words | Speaker dependent | Monologue dictation
Telephone, pause based segmentation | Read continuous speech | Speaker independent | Information seeking dialogue
Open microphone, GSM quality | Spontaneous speech | Speaker adaptive | Multiparty negotiation

Slide 4: Grand Challenges
– Not a push-to-talk system: it has to decide for itself when user input is complete.
– Spontaneous speech, including disfluencies and repair phenomena
– Speaker adaptive
– Mixed initiative dialogue
– Three different domains of discourse

Slide 5: Domains
– Scenario 1, Appointment Scheduling. Questions: When? Focus on temporal expressions. Vocabulary 2.5-6K.
– Scenario 2, Travel Planning. Questions: When? Where? How? Focus on temporal and spatial expressions. Vocabulary 7-10K.
– Scenario 3, Remote PC Maintenance. Questions: What? When? Where? How? Focus on integration of special sublanguage lexica. Vocabulary 15-30K.

Slide 6: Data Collection
A significant programme of data collection was performed to extract statistical properties of different kinds of data:
– Transliterated speech data
– Segmented speech with prosodic labels
– Dialogues annotated with dialogue acts
– Treebanks and predicate argument structures
– Aligned bilingual corpora

Slide 7: Speech Data
– Multi-channel recording
  – close-speaking microphone
  – room microphone
  – various telephones
– Speech recognisers trained on data sets of different audio quality

Slide 8: Multi Level Data Annotation
– Speech Data
  – Transliteration
  – Orthography
  – Pronunciation
  – Phonological Segmentation
  – Word Segmentation
  – Prosodic Segmentation
– Non Speech
  – Dialogue Acts
  – Treebanks

Slide 9: Statistical Models
– Data used to train different statistical models using Machine Learning.
– Models include
  – Neural Networks
  – Probabilistic Automata (HMMs for speech)
  – Probabilistic CFGs (robust parsing)
  – Probabilistic Transfer Rules
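As an illustration of one of the model types listed above, the probability a probabilistic CFG assigns to a parse tree is the product of the probabilities of the rules used in it. The sketch below is a toy example only: the grammar, its probabilities and the function name `tree_probability` are invented for illustration and are not taken from Verbmobil.

```python
from math import prod

# Toy PCFG (hypothetical): each rule (lhs, rhs) maps to its probability.
# Probabilities of rules sharing a left-hand side sum to 1.
RULES = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("we",)): 0.4,
    ("NP", ("monday",)): 0.6,
    ("VP", ("V", "NP")): 1.0,
    ("V", ("prefer",)): 1.0,
}


def tree_probability(tree):
    """Probability of a tree given as (label, child, ...).

    A child is either a subtree (tuple) or a word (string).
    """
    label, *children = tree
    # The right-hand side is the sequence of child labels or words.
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    rule_p = RULES[(label, rhs)]
    # Multiply by the probabilities of all subtrees (words contribute 1).
    return rule_p * prod(
        tree_probability(c) for c in children if not isinstance(c, str)
    )


tree = ("S", ("NP", "we"), ("VP", ("V", "prefer"), ("NP", "monday")))
p = tree_probability(tree)  # 1.0 * 0.4 * 1.0 * 1.0 * 0.6
```

A robust parser can use such scores to rank competing analyses of the same word sequence.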


Slide 11: Architecture
– Different input devices (microphone, telephone, mobile, internet)
– Multilingual speech recognition (EN, DE, JP) including prosodic analysis
– Parsing
– Multi-level translation
– Multilingual generation

Slide 12: Multi Engine Parsing Architecture
– Three different parsing models are employed
  – Probabilistic LR Parser
  – Robust Chunk Parsing
  – HPSG Chart Parser
– All parsing models produce trees that are transformed into the same multistratal representation, called VIT (Verbmobil Interface Terms).
– This facilitates integration of partial results from the different parsing models.

Slide 13: Translation Models
– Substring Based
– Template Based
– Dialogue Act Based

Slide 14: Substring Based Translation
– Starts with the best sentence hypothesis of the speech recogniser
– Uses prosodic information to determine phrase boundaries and sentence mode
– Machine Learning methods applied to a sentence-aligned bilingual corpus
– The output of this module is a sequence of words in the target language together with a confidence measure that is used for selecting the best translation.

Slide 15: Template Based Translation
– Based on 30K translation templates learned from a sentence-aligned corpus
– Template form: $T_i = (T_i^s, T_i^t)\{x_1, \ldots, x_n\}$
– 3 phases:
  – Source-language (SL) template matching
  – Subphrase translation
  – Target-language (TL) utterance generation
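The three phases can be sketched in Python as follows. This is a minimal illustration, not the Verbmobil implementation: the two templates, the tiny sub-phrase lexicon and the names `template_to_regex` and `translate` are all hypothetical, and the real templates were learned from a corpus rather than written by hand.

```python
import re

# Hypothetical templates: (source side with variable slots, target side).
TEMPLATES = [
    ("wir treffen uns am {x1}", "we will meet on {x1}"),
    ("ich fahre nach {x1} am {x2}", "I am travelling to {x1} on {x2}"),
]

# Hypothetical sub-phrase lexicon used to translate slot fillers.
SUBPHRASES = {"montag": "Monday", "dienstag": "Tuesday", "hamburg": "Hamburg"}


def template_to_regex(source_template):
    """Turn 'foo {x1} bar' into a regex with one named group per slot."""
    parts = re.split(r"\{(\w+)\}", source_template)
    pattern = ""
    for i, part in enumerate(parts):
        # Even indices are literal text, odd indices are slot names.
        pattern += re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+?)"
    return re.compile(f"^{pattern}$", re.IGNORECASE)


def translate(utterance):
    for source, target in TEMPLATES:
        # Phase 1: source-language template matching.
        match = template_to_regex(source).match(utterance.strip())
        if match is None:
            continue
        # Phase 2: translate each slot filler (unknown fillers pass through).
        fillers = {
            name: SUBPHRASES.get(value.lower(), value)
            for name, value in match.groupdict().items()
        }
        # Phase 3: generate the target-language utterance.
        return target.format(**fillers)
    return None  # no template matched, so no translation is produced
```

For example, `translate("wir treffen uns am Montag")` fills the slot `x1` with "Monday" and returns "we will meet on Monday"; an utterance that matches no template yields `None`, which corresponds to the "no translation" outcome.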

Slide 16: Template Translation Results
Result | WL Best Hypothesis | All Word Lattice
Perfect Translation | 47% | 67%
Approx. Correct | 16% | 6%
Bad Translation | 15% | 5%
No Translation | 22% |

Slide 17: Multi Engine Translation
[Diagram] Input segments: Segment 1 "If you prefer another hotel", Segment 2 "please let me know".
Engines feeding the selection module: case based translation, substring based translation, statistical translation, dialogue based translation, semantic transfer.
Selected output: Segment 1 from semantic transfer (Semantic Xfer), Segment 2 from case based translation (CBT).
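A selection module of this kind can be sketched as picking, for each segment, the candidate with the highest confidence. The sketch assumes that every engine reports a confidence on a comparable scale; the `Candidate` class, the German outputs and the confidence values are made up for illustration and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    engine: str        # which translation engine produced this candidate
    segment: int       # index of the source segment it translates
    text: str          # target-language output (invented examples below)
    confidence: float  # engine-reported confidence, assumed comparable


def select_best(candidates):
    """Pick the highest-confidence candidate for each segment."""
    best = {}
    for cand in candidates:
        current = best.get(cand.segment)
        if current is None or cand.confidence > current.confidence:
            best[cand.segment] = cand
    # Return the winners in segment order.
    return [best[seg] for seg in sorted(best)]


# Hypothetical candidates for the two segments shown on the slide.
candidates = [
    Candidate("semantic transfer", 1, "Wenn Sie ein anderes Hotel vorziehen", 0.9),
    Candidate("case based", 1, "Falls Sie anderes Hotel", 0.6),
    Candidate("case based", 2, "lassen Sie es mich bitte wissen", 0.8),
    Candidate("statistical", 2, "bitte lassen mich wissen", 0.5),
]
chosen = select_best(candidates)
```

With these invented scores the result mirrors the slide: segment 1 is taken from semantic transfer and segment 2 from case based translation.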

Slide 18: Dialogue Act Based Translation
– Meaning based translation
– Statistical classification of 19 dialogue acts
– Extraction of propositional content using finite state transducers
– Content built from an ontology covering appointment scheduling and travel planning tasks
– Template based approach to generation of target language from content
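The pipeline above (classify the dialogue act, extract the content, generate from a template) can be sketched as below. This is a simplified stand-in under stated assumptions: keyword cues replace the statistical classifier, a regular expression replaces the finite state transducers, and the cue lists, templates and function names are hypothetical.

```python
import re

# Stand-in for the statistical classifier: keyword cues per dialogue act.
CUES = {
    "suggest": ["how about", "what about", "shall we"],
    "thank": ["thank"],
}

# Stand-in for a finite state transducer: a regex that finds a weekday.
DAY = re.compile(r"\b(monday|tuesday|wednesday|thursday|friday)\b", re.IGNORECASE)

GERMAN_DAYS = {
    "monday": "Montag", "tuesday": "Dienstag", "wednesday": "Mittwoch",
    "thursday": "Donnerstag", "friday": "Freitag",
}

# Hypothetical target-language generation templates, one per dialogue act.
TEMPLATES = {
    "suggest": "Wie wäre es am {day}?",
    "thank": "Vielen Dank.",
}


def classify(utterance):
    """Return the first dialogue act whose cue occurs in the utterance."""
    text = utterance.lower()
    for act, cues in CUES.items():
        if any(cue in text for cue in cues):
            return act
    return None


def extract_content(utterance):
    """Extract the propositional content (here only a weekday)."""
    match = DAY.search(utterance)
    return {"day": GERMAN_DAYS[match.group(1).lower()]} if match else {}


def translate_by_dialogue_act(utterance):
    template = TEMPLATES.get(classify(utterance))
    if template is None:
        return None  # unknown dialogue act
    try:
        return template.format(**extract_content(utterance))
    except KeyError:
        return None  # the template needs content that was not found
```

Because only the dialogue act and the extracted content are carried over, the output is a paraphrase of the meaning rather than a word-for-word rendering: "How about Tuesday?" becomes "Wie wäre es am Dienstag?".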

Slide 19: Part of Ontology for Propositional Content
[Tree diagram; node labels: top, object, situation, quality, agent, location, event, action, abstract, concrete, move-by-rail, move-by-plane, move by public transport, journey, move, stay, show, meeting]

Slide 20: Dialogue Act Hierarchy
[Tree diagram rooted at Dialogue Act; node labels: control dialogue, promote task, manage task, deliberate, thank, introduce, bye, greet, request suggest, request clarify, request comment, request commit, digress, exclude, clarify, justify, request, suggest, inform, feedback, commit, offer, init, defer, close]

Slide 21: Dialogue-Based Translation: Transfer Component
[Diagram labels: Source Language VIT (semantic representation), rules, Target Language VIT (semantic representation), dialogue and context evaluation, generation]

Slide 22: Prosody
– Input
  – Speech signal
  – Word Hypothesis Graph (WHG)
– Output
  – Annotated WHG including, per word, duration, pitch, energy and pause information
– Used to classify phrase and clause boundaries, accented words, and sentence mood.
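A per-word prosodic annotation and two very simple uses of it can be sketched as follows. The slides do not say how the classification was done, so the thresholds and rules here are stand-in heuristics, and the `WordHypothesis` class, its field names and both functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WordHypothesis:
    word: str
    duration: float     # seconds
    pitch: float        # mean pitch over the word, in Hz
    energy: float       # relative energy
    pause_after: float  # silence following the word, in seconds


def mark_boundaries(words, pause_threshold=0.3):
    """Indices of words followed by a pause of at least pause_threshold."""
    return [i for i, w in enumerate(words) if w.pause_after >= pause_threshold]


def sentence_mood(words, rise_ratio=1.15):
    """Guess 'question' if the final word's pitch clearly exceeds the
    mean pitch of the preceding words, otherwise 'statement'."""
    if len(words) < 2:
        return "statement"
    earlier = sum(w.pitch for w in words[:-1]) / (len(words) - 1)
    return "question" if words[-1].pitch >= rise_ratio * earlier else "statement"


# Invented measurements for "you are coming tomorrow" with a final rise.
rising = [
    WordHypothesis("you", 0.15, 120.0, 0.7, 0.05),
    WordHypothesis("are", 0.12, 118.0, 0.6, 0.02),
    WordHypothesis("coming", 0.30, 122.0, 0.8, 0.40),
    WordHypothesis("tomorrow", 0.45, 180.0, 0.9, 0.60),
]
# The same words with a low final pitch.
falling = rising[:-1] + [WordHypothesis("tomorrow", 0.45, 100.0, 0.9, 0.60)]
```

Here `sentence_mood(rising)` gives "question" and `sentence_mood(falling)` gives "statement", while `mark_boundaries(rising)` flags the words followed by long pauses as boundary candidates.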

Slide 23: Prosody: Sentence Mood
[Figure: pitch plotted against time for the question "You are coming tomorrow?" and the statement "You are coming tomorrow."]

Slide 24: Use of Prosodic Information
– Prosodic information is used systematically at all processing stages.
– Prosodic differences can lead to different translations, e.g. "wir haben noch" (we still have vs. we have another).

Slide 25: Multi Blackboard Architecture
– Final system comprises 69 highly interactive modules.
– No direct communication between modules.
– Communication is handled by 198 blackboards.
– Shared representation structures
– A module typically subscribes to several blackboards.
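The idea that modules never call each other directly but only read from and write to blackboards can be sketched with a small publish/subscribe class. This is a generic illustration of the pattern, not Verbmobil's actual middleware; `BlackboardSystem`, the blackboard names and the two toy modules are hypothetical.

```python
from collections import defaultdict


class BlackboardSystem:
    """Modules communicate only by publishing to named blackboards."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # blackboard name -> callbacks
        self.contents = defaultdict(list)      # blackboard name -> items

    def subscribe(self, blackboard, callback):
        self._subscribers[blackboard].append(callback)

    def publish(self, blackboard, item):
        # Store the item, then notify every module subscribed to this board.
        self.contents[blackboard].append(item)
        for callback in self._subscribers[blackboard]:
            callback(item)


system = BlackboardSystem()


def recogniser(audio):
    # Toy recogniser: "recognises" by splitting the text into words.
    system.publish("whg", audio.split())


def parser(whg):
    # Toy parser: wraps the word sequence in a placeholder structure.
    system.publish("vit", {"tokens": whg, "parsed": True})


system.subscribe("audio", recogniser)
system.subscribe("whg", parser)
system.publish("audio", "please let me know")
```

After the final `publish` call, the "vit" blackboard holds the parser's result even though the recogniser and the parser never reference each other, which is what lets modules be added or replaced independently.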

Slide 26: Blackboards & Modules
[Diagram] Modules: command recogniser, generation, robust dialogue semantics, semantic construction, spontaneous speech recogniser, speaker adaptation, prosodic analysis, chunk parser, HPSG parser, semantic transfer, statistical parser, dialogue act recognition.
Blackboards: Audio Data, WHG with prosodic labels, VIT discourse representation.

Slide 27: Multi Engine Approach
[Diagram labels: Augmented WHG; statistical parser, chunk parser, HPSG parser; chart containing partial VITs; robust dialogue semantics, knowledge-based reconstruction; complete and spanning VIT]

Slide 28: Achievements
– Three language pairs, three domains and a vocabulary size of over 100K word forms
– Average processing time of 4x the original signal duration
– Word recognition rate of 75% for spontaneous speech
– 80% approximately correct translations
– 90% success rate for dialogue tasks in end-to-end evaluation

Slide 29: Conclusion
– Speech-to-speech translation of spontaneous dialogues can only be cracked by combining deep and shallow processing.
– The final architecture maximises the necessary interaction between processing modules.
– Software engineering considerations must be taken seriously in such a project.