Experiments on Building Language Resources for Multi-Modal Dialogue Systems

Presentation transcript:

L. Romary*, A. Todirascu**, D. Langlois*
*LORIA, Nancy, France; **Université de Technologie de Troyes, France

Goals
- Identification of a methodology for adapting linguistic resources to human-machine dialogue systems without training corpora
- Multilinguality support and uniform linguistic coverage across the speech interpretation modules
- Case study: synchronising the parser's and the speech recognizer's (SR) interfaces for French in the MIAMM project

Speech recognition experiments
- System: ESPERE (small vocabularies)
- Protocol: 88 recorded sentences, 4 speakers (2 male, 2 female), no out-of-vocabulary words, all sentences within the modelled language
- Several methods for estimating P(w|v):
  1. Frequencies from the training corpus: WER = 3.8, SD =
  2. Uniform probabilities for the bigrams of the training corpus: WER = 4.0, SD = 0.4 (true frequencies are not useful)
  3. All bigrams possible (non-null probabilities, with back-off): WER >> 4.0 (constraining the bigrams with the grammar is necessary)
  4. Training corpus (A) used as adaptation data for a general newspaper corpus (B), combined by linear interpolation: performance is worse than when A is used alone

Parser's experiments
- Iterative updating process after interaction with the other modules
- Adding new lexical entries
- Local grammars generated by a meta-grammar, with a preference for substitution
- Domain-specific syntactic components (elliptical phrases, navigation verbs, noun groups)
- Mapping lexical entries to the domain ontology
- Transforming derivation trees into MMIL representations

The Multimedia Information Access using Multiple Modalities (MIAMM) project
- A prototype of a human-machine dialogue interface combining several interaction modalities: speech, haptics, graphics
- Case study: searching for music in an existing database using all the modalities
- Multilinguality supported: English, French, German
- Difficulties: no multi-modal training corpora available; information flow (several models of the same language, one for speech and one for parsing, which must cover the same language); changing specifications during the project

Adapting the parser's resources
- TAG: a widely accepted formalism for syntactic parsing
- XML standards for grammars (TAGML) and for semantic representations (MMIL)
- Existing resources for English and French, designed for free text

Speech recognizer's language model
- Statistical language model; 400-word vocabulary; huge number of possible sentences
- Training corpus generated with a context-free grammar
- Two-step model: a bigram model over word classes (e.g. P(DECADES|the)) and a uniform distribution of words within each class (e.g. P(90's|DECADES))

User scenarios
- Useful when real human-machine interactions are lacking
- Designed to obtain homogeneous linguistic coverage for all the languages: several styles or registers (familiar, elaborated); specific phrases (politeness phrases, time intervals such as "from the sixties"); various syntactic constructions (passives, relative clauses, questions, ellipses); dates and names
- Developers worked independently on building exhaustive user scenarios
- The context-free grammar covers the linguistic phenomena from the user scenarios, for every language
- The technical vocabulary covers the linguistic phenomena from the user scenarios, for every language

Developing the language resources
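The recognition results above are reported as word error rates (WER). As a reminder of how that figure is computed, here is a minimal Python sketch of the standard WER definition (word-level edit distance divided by reference length); the example sentences are invented and this is not code from the MIAMM project.

def word_error_rate(reference, hypothesis):
    """Standard WER in percent: edit distance over words / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words (substitutions, insertions, deletions).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return 100.0 * d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("play some jazz from the sixties",
                      "play some jazz from sixties"))  # one deleted word -> 100 * 1/6, about 16.7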
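Method 4 above combines the domain training corpus (A) with a general newspaper corpus (B) by linear interpolation of their bigram models, P(w|v) = lam * P_A(w|v) + (1 - lam) * P_B(w|v). Below is a minimal sketch of that combination; the probability tables and the weight lam are invented for illustration, since the slides only report that the interpolated model performed worse than the domain model alone.

def interpolate(p_a, p_b, lam=0.8):
    """Return a bigram probability function that linearly combines two models."""
    def p(prev_word, word):
        return lam * p_a.get((prev_word, word), 0.0) + (1 - lam) * p_b.get((prev_word, word), 0.0)
    return p

# Tiny illustrative tables: in-domain (A) and newspaper (B) bigram probabilities.
p_domain = {("play", "jazz"): 0.4, ("play", "music"): 0.6}
p_newspaper = {("play", "music"): 0.1, ("play", "football"): 0.2}

p = interpolate(p_domain, p_newspaper, lam=0.8)
print(p("play", "music"))  # 0.8 * 0.6 + 0.2 * 0.1, about 0.50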
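The two-step language model described under "Speech recognizer's language model" factors each bigram probability into a class bigram and a uniform within-class word probability, e.g. P(90's after "the") = P(DECADES|the) * P(90's|DECADES). A minimal sketch of that factoring follows; the word-to-class map and the counts are invented, and only the DECADES example comes from the slides.

# Word -> class mapping (illustrative).
word_class = {"90's": "DECADES", "60's": "DECADES", "rock": "GENRE", "jazz": "GENRE"}

# Counts of (previous word, class of next word), e.g. estimated from the
# corpus generated by the context-free grammar (invented numbers).
class_bigram_counts = {("the", "DECADES"): 12, ("the", "GENRE"): 8}

def class_bigram_prob(prev_word, cls):
    """P(class | previous word), estimated by relative frequency."""
    total = sum(c for (p, _), c in class_bigram_counts.items() if p == prev_word)
    return class_bigram_counts.get((prev_word, cls), 0) / total if total else 0.0

def word_given_class_prob(word, cls):
    """P(word | class): uniform over the members of the class."""
    members = [w for w, c in word_class.items() if c == cls]
    return 1.0 / len(members) if word in members else 0.0

def bigram_prob(prev_word, word):
    """P(word | previous word) = P(class | previous word) * P(word | class)."""
    cls = word_class.get(word)
    if cls is None:
        return 0.0
    return class_bigram_prob(prev_word, cls) * word_given_class_prob(word, cls)

print(bigram_prob("the", "90's"))  # P(DECADES|the) * P(90's|DECADES) = 0.6 * 0.5 = 0.3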
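The training corpus for that model was generated from a context-free grammar rather than collected from users. A minimal sketch of such generation by random expansion of a toy grammar is given below; the rules are invented for illustration and are not the MIAMM grammar, whose coverage was driven by the user scenarios.

import random

# Toy context-free grammar: nonterminals map to lists of productions.
grammar = {
    "S":       [["REQ", "NP"]],
    "REQ":     [["play"], ["find"]],
    "NP":      [["some", "GENRE"], ["songs", "from", "the", "DECADES"]],
    "GENRE":   [["jazz"], ["rock"]],
    "DECADES": [["sixties"], ["nineties"]],
}

def generate(symbol="S"):
    """Expand a symbol by recursively choosing one production at random."""
    if symbol not in grammar:  # terminal word
        return [symbol]
    production = random.choice(grammar[symbol])
    return [word for sym in production for word in generate(sym)]

# Generate a small training corpus of sentences covered by the grammar.
for _ in range(5):
    print(" ".join(generate()))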