Linguistic and Logical Tools for an Advanced Interactive Speech System in Spanish J. Álvarez, V. Arranz, N. Castell & M. Civit TALP Research Centre UPC, Barcelona

Contents
Introduction
Corpora Construction
System Architecture
Understanding Module:
  – Input Problems & Solutions Adopted
  – Language Processing: Morphology, Syntax, Semantic Extraction
Dialogue Manager
Conclusions

Introduction
Increasing need for more natural HMI
Development of a dialogue system:
  – spontaneous speech
  – restricted domain: railway information
  – rather user-friendly communication exchange
  – language of application: Spanish
Other related systems: ATIS, TRAINS, LIMSI ARISE, TRINDI, ...

Corpora Construction
Corpus construction was a project objective: none was available in Spanish
Two different corpora developed:
  – human-human
  – human-machine (Wizard of Oz technique): 150 different situations + an open scenario; total of 227 dialogues

System Architecture

Understanding Module

Input Problems (1)
Recognition errors:
  – Excess of information:
      U: “sábado treinta de octubre” (“Saturday, October 30”)
      R: “un tren que o sábado treinta de octubre” (“a train that or...”)
  – Erroneous recognition:
      U: “gracias” (“thank you”)
      R: “sí pero ellos” (“yes but they”)

Input Problems (2)
  – Grammar errors:
      Lack of prep+det contractions: “de el” → “del”
      Wrong use of indefinite determiner: “un de octubre” → “uno de octubre” (“1 October”)
      Wrong orthographical transcriptions: “qué/que, a/ha”, ... (“what/that, to/has, ...”)
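The grammar errors listed above are regular enough to be repaired before the text reaches the morphological analyser. The following is only a minimal sketch of such a repair step, not the preprocessing the authors used: the rewrite table `REWRITES` and the function `normalise` are hypothetical names, and the table holds just the two rewrites shown on the slide.

```python
import re

# Hypothetical rewrite table built only from the slide's two examples;
# a real system would need a larger, domain-tuned list.
REWRITES = [
    (re.compile(r"\bde el\b"), "del"),      # missing prep+det contraction
    (re.compile(r"\bun de\b"), "uno de"),   # "un de octubre" -> "uno de octubre"
]


def normalise(text: str) -> str:
    """Apply every rewrite in order and return the repaired text."""
    for pattern, replacement in REWRITES:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    print(normalise("el tren de el sábado"))
    print(normalise("un de octubre"))
```

The word-boundary anchors keep the first rule from firing inside longer words such as “donde el”. Orthographic confusions like “qué/que” cannot be fixed by string rewriting alone, since choosing the right form needs context.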

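Input Problems (3)
Problems caused by spontaneous speech:
  – Syntactic disfluencies:
      U: “a ver los horarios de los trenes que van de Teruel a Barcelona el este próximo viernes y que vayan de Barcelona a Teruel el próximo que vuelvan de Barcelona a Teruel el próximo domingo” (“let's see the timetables of the trains that go from Teruel to Barcelona the this next Friday and that go from Barcelona to Teruel the next that come back from Barcelona to Teruel next Sunday”)
  – Lexical disfluencies, pauses, noises, ...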

Solutions
Adapting the recogniser to the domain
Adapting the recogniser to spontaneous speech
Adapting the understanding module
Closing the entry channel

Tools
MACO+: Morphological Analyzer Corpus Oriented [Carmona et al., 98]
RELAX: Relaxation Labelling Based Tagger [Padró, 97]
TACAT: Tagged Corpus Analyzer Tool [Castellón et al., 98]
PRE+: Production Rule Environment [Turmo, 99]

Example User turn: “Me gustaría información sobre trenes de Guadalajara a Cáceres para la primera semana de agosto” (“I would like some information about trains from Guadalajara to Caceres for the first week of August”)

Language Processing
[Diagram: Transcription Text, Morphological Analysis, Syntactic Analysis, Analysed Text, Semantic Extraction, Frames, Understanding]

Morphology (1)
MACO+:
  – contains knowledge organised into classes and inflection paradigms
  – uses a task/domain lexicon: less ambiguity and better execution time
  – provides all possible labels per word
RELAX:
  – disambiguates obtained labels
  – is constraint based with relaxation labelling

Morphology (2)
Form          Lemma         Tag
me            yo            PP1CSO00
gustaría      gustar        VMCP1S0
información   información   NCFS000
sobre         sobre         SPS00
trenes        tren          NCMP000
de            de            SPS00
Guadalajara   guadalajara   NP000C0
a             a             SPS00
Cáceres       cáceres       NP000C0
para          para          SPS00
la            la            TDFS0
primera       primero       MOFS00
semana        semana        NCFS000
de            de            SPS00
agosto        agosto        NCMS000
.             .             Fp
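Output of this kind (one form, lemma and tag per word) is easy to load for later stages. The sketch below is an illustration only: `Token`, `parse_tagged` and `coarse` are hypothetical names, and the mapping from the first tag letter to a coarse category is inferred from the tags shown in the example, not taken from the tagset documentation.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    form: str
    lemma: str
    tag: str


# Coarse category by first tag letter; inferred from the example tags
# only (an assumption, not the official tagset definition).
CATEGORY = {
    "P": "pronoun",
    "V": "verb",
    "N": "noun",
    "S": "preposition",
    "T": "article",
    "M": "numeral",
    "F": "punctuation",
}


def parse_tagged(text: str) -> List[Token]:
    """Parse whitespace-separated 'form lemma tag' lines, skipping others."""
    tokens = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            tokens.append(Token(*parts))
    return tokens


def coarse(token: Token) -> str:
    """Return the coarse category for a token, or 'unknown'."""
    return CATEGORY.get(token.tag[0].upper(), "unknown")


SAMPLE = """\
trenes tren NCMP000
de de SPS00
Guadalajara guadalajara NP000C0
"""

if __name__ == "__main__":
    for tok in parse_tagged(SAMPLE):
        print(tok.form, tok.lemma, tok.tag, coarse(tok))
```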

Syntax (1)
TACAT:
  – shallow parser
  – context-free grammar adapted for the domain: rules re-written for dates, timetables and proper names
  – bottom-up strategy
  – this adaptation helps semantic searches

Syntax (2)
[ { pos=>S }
  [ { pos=>patons }
    [ { pos=>pp1cso00, forma=>"Me", lema=>"yo" } ] ]
  [ { pos=>grup-verb }
    [ { pos=>vmcp3s0, forma=>"gustaría", lema=>"gustar" } ] ]
  [ { pos=>sn }
    [ { pos=>ncfs000, forma=>"información", lema=>"información" } ] ]
  [ { pos=>grup-sp }
    [ { pos=>sps00, forma=>"sobre", lema=>"sobre" } ]
    [ { pos=>sn }
      [ { pos=>ncmp000, forma=>"trenes", lema=>"tren" } ] ] ]
  [ { pos=>grup-sp }
    [ { pos=>sps00, forma=>"de", lema=>"de" } ]
    [ { pos=>sn }
      [ { pos=>np000c0, forma=>"Guadalajara", lema=>"Guadalajara" } ] ] ]
  [ { pos=>grup-sp }
    [ { pos=>sps00, forma=>"a", lema=>"a" } ]
    [ { pos=>sn }
      [ { pos=>np000c0, forma=>"Cáceres", lema=>"Cáceres" } ] ] ]
  [ { pos=>grup-sp }

Semantic Extraction (1)
(HORA-SALIDA)
  CIUDAD-ORIGEN: Guadalajara
  CIUDAD-DESTINO: Cáceres
  INTERVALO-FECHA-SALIDA: /
Aim: generation of semantic frames

Semantic Extraction (2)
System implemented in PRE+
PRE+:
  – production rule environment
  – very flexible and robust:
      rule conditions contain syntactic patterns and lexical items to search for
      priority, score and control allow specifying rule application, location of the concept to extract, ...

Semantic Extraction (3)
(rule CiudadOrigen3
  ruleset CiudadOrigen
  priority 10
  score [0,_,1,0]
  control forever
  ending Postrule
  (InputSentence ^tree tree_matching(
    [{pos=>grup-sp}
      [{lema=> de|desde}]
      [{pos=> np000c0, forma=>?forma}] ]))
  ->
  (?_ := Print(CiudadOrigen,?forma))
  (?_ := REM(CiudadOrigen,X,+a)))
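The CiudadOrigen3 rule looks for a prepositional group in which “de” or “desde” is followed by a proper noun and reports that noun as the origin city. The Python sketch below imitates that idea on a small hand-built tree. It is not PRE+ and makes no claim about how `tree_matching` really works: the helpers `node`, `leaves`, `subtrees` and `origin_city` are hypothetical, and the sketch assumes the proper noun may sit anywhere below the `grup-sp` node.

```python
def node(pos, forma=None, lema=None, children=()):
    """Build a tree node as a plain dict (hypothetical representation)."""
    return {"pos": pos, "forma": forma, "lema": lema, "children": list(children)}


def leaves(tree):
    """Yield the leaf nodes below `tree`, left to right."""
    if not tree["children"]:
        yield tree
    else:
        for child in tree["children"]:
            yield from leaves(child)


def subtrees(tree):
    """Yield `tree` and every node below it, in pre-order."""
    yield tree
    for child in tree["children"]:
        yield from subtrees(child)


def origin_city(tree):
    """Return the proper noun that follows 'de'/'desde' in a grup-sp, or None."""
    for sub in subtrees(tree):
        if sub["pos"] != "grup-sp":
            continue
        words = list(leaves(sub))
        for prep, noun in zip(words, words[1:]):
            if prep["lema"] in ("de", "desde") and noun["pos"] == "np000c0":
                return noun["forma"]
    return None


# Part of the example parse, rebuilt by hand.
EXAMPLE = node("S", children=[
    node("grup-sp", children=[
        node("sps00", "sobre", "sobre"),
        node("sn", children=[node("ncmp000", "trenes", "tren")]),
    ]),
    node("grup-sp", children=[
        node("sps00", "de", "de"),
        node("sn", children=[node("np000c0", "Guadalajara", "Guadalajara")]),
    ]),
    node("grup-sp", children=[
        node("sps00", "a", "a"),
        node("sn", children=[node("np000c0", "Cáceres", "Cáceres")]),
    ]),
])

if __name__ == "__main__":
    print(origin_city(EXAMPLE))
```

A group such as “de agosto” is not matched, because “agosto” is tagged as a common noun rather than a proper noun.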

Understanding Module

Dialogue Manager (1)
Implemented using YAYA [Alvarez, 00]
Reasoning engine combines:
  – frames from the understanding module, with
  – facts from the dialogue history, and with
  – axioms
in order to generate:
  – reaction facts from the system
Output based on frames:
  – for the natural language generator (content)
  – for the recogniser (Speech Act prediction)

Dialogue Manager (2)
Sentence to generate: “De Guadalajara a Cáceres ¿qué día desea viajar?” (“From Guadalajara to Caceres, when do you wish to travel?”)
Output Frame:
(CONFIRMACIÓN)
  TIPO: implícita
  CIUDAD-ORIGEN: Guadalajara
  CIUDAD-DESTINO: Cáceres
(SOLICITUD)
  CONCEPTO: FECHA-SALIDA
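The step from an understanding frame to an output frame like this one can be approximated with plain slot filling. The sketch below is a deliberately simplified stand-in, not YAYA: the real dialogue manager reasons over facts and axioms, whereas this code only merges the new frame into the history, confirms new values implicitly, and asks for the first missing slot. The names `react`, `REQUIRED_SLOTS` and the `"ACT"` key are hypothetical, and the list of required slots is an assumption.

```python
# Slots the system is assumed to need before it can answer (assumption).
REQUIRED_SLOTS = ["CIUDAD-ORIGEN", "CIUDAD-DESTINO", "FECHA-SALIDA"]


def react(frame, history):
    """Merge `frame` into `history` and build the system's output frames.

    Returns the updated history and a list of output frames: an implicit
    confirmation of newly supplied values, then a request for the first
    required slot that is still unknown.
    """
    known = {**history, **frame}
    output = []

    # Values the user has just supplied or changed.
    new_values = {
        slot: value for slot, value in frame.items()
        if history.get(slot) != value
    }
    if new_values:
        output.append({"ACT": "CONFIRMACIÓN", "TIPO": "implícita", **new_values})

    missing = [slot for slot in REQUIRED_SLOTS if slot not in known]
    if missing:
        output.append({"ACT": "SOLICITUD", "CONCEPTO": missing[0]})
    return known, output


if __name__ == "__main__":
    user_frame = {"CIUDAD-ORIGEN": "Guadalajara", "CIUDAD-DESTINO": "Cáceres"}
    history, frames = react(user_frame, {})
    for out in frames:
        print(out)
```

With origin and destination known and no departure date in the history, the sketch produces an implicit confirmation followed by a request for FECHA-SALIDA, the same two acts as the output frame above.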

Conclusions
Corpus development: valuable resource
Adaptation of general NLP tools for:
  – domain
  – spontaneous speech dialogue
Development of new tools:
  – semantic extraction (use of PRE+): flexible & robust
  – dialogue manager (use of YAYA): fast to develop & easy to modify
Challenge: processing in real time
