Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik.


Lexicon, Lexical Semantics, Grammar, and Translation for Norwegian
A 4-year project ( ) involving groups at:
- The University of Oslo
- The University of Bergen
- NTNU (The University of Trondheim)
Cooperation with PARC (John Maxwell) and others

The LOGON system Schematic architecture

XLE: Xerox Linguistic Environment
A platform developed over more than 20 years at Xerox PARC (now PARC)
Developer: John Maxwell
- LFG grammar development
- Parsing
- Generation
- Transfer
- Stochastic parse selection
- Interaction with shallow methods

An LFG analysis: Det regnet 'It rained'

ParGram: The Parallel Grammar Project
A long-term project (1993-)
Develops parallel grammars on XLE: English, French, German, Norwegian, Japanese, Urdu, Welsh, Malagasy, Arabic, Hungarian, Chinese, Vietnamese
‘Parallel grammars’ means parallel f-structures:
- A common inventory of features
- Common principles of analysis

LOGON Analysis Modules
[Diagram; labels: Input string; Tokenization; Named ent.; Compounds; Morphology; LFG lexicons (NKL-derived, hand coded); Lexical templates; Syntactic rules; Rule templates; c-structures; f-structures; MRSs; Norsk ordbank lexicon; XLE Parser; NorGram; String of stems and tags; Output-input; Supporting knowledge base]

Scope of NorGram
Lexicon: about lemmas. In addition:
- Automatically analyzed compounds
- Automatically recognized proper names
- "Guessed" nouns
Syntax: 229 complex rules, giving rise to about arcs
Semantics: Minimal Recursion Semantics projections for all readings

Coverage
Performance on an unseen corpus of newspaper text: 17 randomly selected pieces of text, limited to coherent text, comprising 1000 sentences taken from 9 newspapers (Adresseavisen, Aftenposten, Aftenposten nett, Bergens Tidende, Dagbladet, Dagens Næringsliv, Dagsavisen, Fædrelandsvennen, Nordlys), from the editions of November 11th 2005.

The LOGON challenge: From a resource grammar based on independent linguistic principles, derive MRS structures harmonized with the MRS structures of the HPSG English Resource Grammar.

Semantics for translation: Two issues
The representational subset problem
- Desirable: normalization to flat structures with unordered elements. Complete and detailed semantic analyses may be unnecessary.
- Desirable: rich possibilities of underspecification

Basics of Minimal Recursion Semantics
Developers: A. Copestake, D. Flickinger, R. Malouf, S. Riehemann, I. Sag
A framework for the representation of semantic information
Developed in the context of HPSG and machine translation (Verbmobil)
Sources of inspiration:
- Quasi-Logical Form (H. Alshawi): underspecification, e.g. of quantifier scope
- Shake-and-bake translation (P. Whitelock): a bag of words as interface structure

An MRS representation is a bag of semantic entities (some corresponding to words, some not), each with a handle, plus a bag of handle constraints allowing the underspecification of scope, plus a handle and an index. Each semantic entity is referred to as an Elementary Predication (EP). Relations among EPs are captured by means of shared variables.
There are three elementary variable types:
- handles (or 'labels') (h)
- events (e)
- referential indices (x)
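The definition above can be made concrete with a small data structure. This is only a minimal sketch, not the LOGON or XLE representation: the class names, the field names and the predicate names (`def_q`, `_katt_n`, `_sove_v`) are assumptions chosen for illustration, using «Katten sover» 'The cat sleeps' as the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EP:
    """Elementary Predication: a predicate with a handle and variable arguments."""
    label: str   # handle of this EP, e.g. "h2"
    pred: str    # predicate name (illustrative, not taken from NorGram)
    args: tuple  # handles (h...), events (e...), referential indices (x...)


@dataclass
class MRS:
    top: str                                   # top handle
    index: str                                 # index variable
    rels: list = field(default_factory=list)   # bag of EPs
    hcons: list = field(default_factory=list)  # handle constraints as (hole, label) pairs


def shared_variables(mrs):
    """Return the variables that occur as arguments of more than one EP."""
    counts = {}
    for ep in mrs.rels:
        for var in set(ep.args):
            counts[var] = counts.get(var, 0) + 1
    return {var for var, n in counts.items() if n > 1}


# 'The cat sleeps': a quantifier, a noun and a verb linked through x1.
cat_sleeps = MRS(
    top="h0",
    index="e1",
    rels=[
        EP("h2", "def_q", ("x1", "h3", "h4")),  # variable, restriction hole, body hole
        EP("h5", "_katt_n", ("x1",)),
        EP("h6", "_sove_v", ("e1", "x1")),
    ],
    hcons=[("h3", "h5")],  # the restriction hole h3 is constrained to the noun's handle
)

print(shared_variables(cat_sleeps))
```

The only variable shared between EPs is the referential index `x1`, which is what ties the quantifier, the noun and the verb together in the otherwise unordered bag.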

From standard logical form to MRS
«Every ferry crosses some fjord»
Two readings:
Replace operators with generalized quantifiers:
- every(variable, restriction, body)
- some(variable, restriction, body)
The first reading (wide-scope every): [figure with the argument slots labelled var, restriction, body]
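The formulas from this slide are not preserved in the transcript. As a hedged reconstruction, the two readings can be written in the usual textbook way, first in standard predicate logic and then with generalized quantifiers; the predicate names ferry, fjord and cross are assumptions.

```latex
\begin{align*}
\text{wide-scope \emph{every}:}\quad
  & \forall x\,[\mathit{ferry}(x) \rightarrow \exists y\,[\mathit{fjord}(y) \wedge \mathit{cross}(x,y)]] \\
  & \mathit{every}(x,\ \mathit{ferry}(x),\ \mathit{some}(y,\ \mathit{fjord}(y),\ \mathit{cross}(x,y))) \\[4pt]
\text{wide-scope \emph{some}:}\quad
  & \exists y\,[\mathit{fjord}(y) \wedge \forall x\,[\mathit{ferry}(x) \rightarrow \mathit{cross}(x,y)]] \\
  & \mathit{some}(y,\ \mathit{fjord}(y),\ \mathit{every}(x,\ \mathit{ferry}(x),\ \mathit{cross}(x,y)))
\end{align*}
```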

Make the structure flat:
- give each EP a handle
- replace embedded EPs by their handles
- collect all EPs on the same level (understood as conjunction)

Underspecified scope by means of handle constraints:
[Figure panels: Wide scope: some; Wide scope: every]
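The flattening steps and the two scopings can be sketched in a few lines. This is an illustration under simplifying assumptions, not the MRS algorithm itself: terms are nested tuples, handles are numbered in the order they are assigned, and `scopings` simply chains the quantifier bodies in every possible order, which is adequate only for a sentence like this one where both quantified variables occur in the verb's EP.

```python
from itertools import count, permutations


def flatten(term, eps, fresh):
    """Give the term a handle, replace embedded terms by their handles, collect EPs."""
    handle = f"h{next(fresh)}"
    pred, *args = term
    flat_args = tuple(
        flatten(arg, eps, fresh) if isinstance(arg, tuple) else arg
        for arg in args
    )
    eps.append((handle, pred, flat_args))
    return handle


def scopings(quantifier_handles, verb_handle):
    """Yield one body assignment per ordering of the quantifiers (outermost first)."""
    for order in permutations(quantifier_handles):
        chain = list(order) + [verb_handle]
        yield dict(zip(chain, chain[1:]))


# Wide-scope 'every' reading of «Every ferry crosses some fjord» as a nested term.
wide_every = ("every", "x", ("ferry", "x"),
              ("some", "y", ("fjord", "y"), ("cross", "x", "y")))

eps = []
top = flatten(wide_every, eps, count(1))
for ep in sorted(eps):
    print(ep)

# Leaving the quantifier bodies open gives two ways to fill them:
# every > some and some > every.
for plugging in scopings(["h1", "h3"], "h5"):
    print(plugging)
```

After flattening, all five EPs sit on one level and refer to each other only through handles, so the scope ambiguity reduces to the choice of which handle fills each quantifier's body.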

MRS as feature structure (also adding event variables): Norwegian translation: «Hver ferge krysser en fjord»

Projecting MRS representations from f-structures «Katten sover» 'The cat sleeps'



Composition: Top-level MRS with unions of HCONS and RELS:

Post-processing this structure brings us back to the LOGON MRS format:

Examples

bil 'car' (as in "Han kjøpte bil" 'He bought [a] car') No SPEC

disse hans mange spørsmål 'these his many questions' Multiple SPECs

Han jaget barnet ut nakent 'He chased the child out naked'

The Transfer Component Developer of the formalism: Stephan Oepen

Example of transfer
Source sentence:
Henter han bilen sin?
fetches he car.DEF POSS.REFL.SG.MASC
'Does he fetch his car?'
Alternative reading: 'Does he fetch the one of the car?'

Parse output:

Choosing the first reading of Henter han bilen sin?

The variables have features. Interrogative is coded as [SF ques] on the event variable.

Two of four transfer outputs

Norwegian transfer input One of four English transfer outputs

Generator output from the chosen transfer output

Transfer formalism (Stephan Oepen)
The form of a transfer rule:
- C = context
- I = input
- F = filter
- O = output

Simple example: Lexical transfer rule, transferring bekk into creek. No context, no filter; only the predicate is replaced.

Example with a context restriction: gå en tur (lit. 'go a trip') is transferred into the light-verb construction take a trip. In the context of _tur_n as its second argument, _gå_v is transferred to _take_v.
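The two example rules can be mimicked with a toy rewriter over a bag of EPs. This is a hedged sketch and not the LOGON transfer formalism, which rewrites full MRSs: the `EP` and `LexicalRule` classes, the helper `second_arg_is`, and the predicate names `_bekk_n` and `_creek_n` (formed by analogy with `_tur_n`) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class EP:
    pred: str
    args: tuple  # (ARG0, ARG1, ARG2, ...)


@dataclass
class LexicalRule:
    input_pred: str                     # I: predicate to be replaced
    output_pred: str                    # O: replacement predicate
    context: Optional[Callable] = None  # C: must hold for the rule to fire
    filter_: Optional[Callable] = None  # F: blocks the rule when it holds


def second_arg_is(pred):
    """Context test: the EP's second argument is the ARG0 of an EP with `pred`."""
    def check(ep, bag):
        return len(ep.args) > 2 and any(
            other.pred == pred and other.args[0] == ep.args[2] for other in bag
        )
    return check


def apply_rule(rule, bag):
    """Replace the predicate of every EP that matches input, context and filter."""
    out = []
    for ep in bag:
        fires = (
            ep.pred == rule.input_pred
            and (rule.context is None or rule.context(ep, bag))
            and (rule.filter_ is None or not rule.filter_(ep, bag))
        )
        out.append(EP(rule.output_pred, ep.args) if fires else ep)
    return out


bekk = LexicalRule("_bekk_n", "_creek_n")  # no context, no filter
ga_tur = LexicalRule("_gå_v", "_take_v", context=second_arg_is("_tur_n"))

walk = [EP("_gå_v", ("e1", "x1", "x2")), EP("_tur_n", ("x2",))]
wade = [EP("_gå_v", ("e1", "x1", "x2")), EP("_bekk_n", ("x2",))]

print(apply_rule(ga_tur, walk))  # _gå_v becomes _take_v
print(apply_rule(ga_tur, wade))  # unchanged: the context is not met
print(apply_rule(bekk, wade))    # _bekk_n becomes _creek_n
```

In this sketch the context is consulted but left untouched, so `_tur_n` would still need its own lexical rule to become the English noun predicate.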

The SEM-I (Semantic Interface)
A documentation of the external semantic interface for a grammar, crucial for the writer of transfer rules. To enforce maintenance of the SEM-I, LOGON parsing returns fail if every parse contains at least one predicate not in the SEM-I.
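The failure condition can be stated compactly. The sketch below assumes, purely for illustration, that each parse is reduced to the list of its predicate names and that the SEM-I is a set of names; `semi_check`, `missing_predicates` and the predicate `_xyz_n` are hypothetical.

```python
def semi_check(parses, semi):
    """Return False (fail) when every parse has a predicate outside the SEM-I."""
    return any(set(preds) <= semi for preds in parses)


def missing_predicates(parses, semi):
    """Collect the predicates the grammar writer would need to add to the SEM-I."""
    return set().union(*(set(preds) - semi for preds in parses))


semi = {"_gå_v", "_tur_n"}

covered = [["_gå_v", "_tur_n"], ["_gå_v", "_xyz_n"]]
uncovered = [["_gå_v", "_xyz_n"]]

print(semi_check(covered, semi))            # one parse is fully documented
print(semi_check(uncovered, semi))          # no parse is fully documented
print(missing_predicates(uncovered, semi))  # what is missing from the SEM-I
```

Parsing succeeds as long as at least one parse uses only documented predicates, so an undocumented predicate causes a failure only when no alternative analysis avoids it.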

A small section of the verb part of the NorGram SEM-I Size of the Norwegian SEM-I: slightly less than 6000 entries

Parse Selection
Parsing, transfer and generation may each give many solutions, leading to a fan-out tree. The outputs at each of the three stages are statistically ranked.
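The fan-out can be pictured as three nested loops with pruning at each stage. The slides do not say how the three rankings are combined, so the sketch below makes two loud assumptions: each stage returns (score, item) pairs, and a leaf is ranked by the product of the scores along its path. The stage functions are toy stand-ins, not LOGON components.

```python
def top(candidates, n):
    """Keep the n highest-scoring (score, item) pairs."""
    return sorted(candidates, key=lambda c: c[0], reverse=True)[:n]


def fanout(source, parse, transfer, generate, n_best=2):
    """Expand parse -> transfer -> generation, pruning to n_best at each stage."""
    leaves = []
    for p, analysis in top(parse(source), n_best):
        for t, target in top(transfer(analysis), n_best):
            for g, sentence in top(generate(target), n_best):
                # Assumption: path score is the product of the stage scores.
                leaves.append((p * t * g, sentence))
    return top(leaves, len(leaves))


# Toy stages with made-up scores, standing in for the real components.
def toy_parse(sentence):
    return [(0.7, "mrs_a"), (0.3, "mrs_b")]


def toy_transfer(mrs):
    return [(0.6, mrs + ">en1"), (0.4, mrs + ">en2")]


def toy_generate(mrs):
    return [(0.9, mrs + ">s1"), (0.1, mrs + ">s2")]


ranked = fanout("Henter han bilen sin?", toy_parse, toy_transfer, toy_generate)
print(len(ranked), ranked[0][1])
```

With two candidates kept per stage the tree has eight leaves; tightening `n_best` trades coverage for speed.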

Example of a four-way ambiguity: Det regnet 'It rained'/'It calculated'/'That one calculated'/'That rain'

The Parsebanker
Efficient treebank building by discriminants
Developer: Paul Meurer, Bergen
Predecessors in discriminant analysis:
- David Carter (1997)
- Stephan Oepen, Dan Flickinger & al. (2003)


Packed representations and discriminants (Paul Meurer)

Clicking on one discriminant is in this case sufficient to select a unique solution:
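The idea behind discriminant-based selection can be sketched with parses modelled as sets of properties. This is an illustration only: the Parsebanker derives its discriminants from packed c- and f-structure representations, whereas the property names below are hypothetical labels for the four readings of Det regnet.

```python
def discriminants(parses):
    """Properties that hold for some but not all of the remaining parses."""
    property_sets = list(parses.values())
    return set.union(*property_sets) - set.intersection(*property_sets)


def select(parses, discriminant, accept=True):
    """Keep the parses that have (or, with accept=False, lack) the discriminant."""
    return {
        name: props
        for name, props in parses.items()
        if (discriminant in props) == accept
    }


det_regnet = {
    "It rained": {"det:expletive", "regnet:verb", "pred:rain"},
    "It calculated": {"det:pronoun", "regnet:verb", "pred:calculate"},
    "That one calculated": {"det:demonstrative", "regnet:verb", "pred:calculate"},
    "That rain": {"det:determiner", "regnet:noun", "pred:rain"},
}

# One click can be enough: only one reading has an expletive subject.
print(select(det_regnet, "det:expletive"))

# A less specific choice leaves two readings and a smaller set of discriminants.
remaining = select(det_regnet, "pred:rain")
print(sorted(remaining), sorted(discriminants(remaining)))
```

Each decision removes parses and with them the discriminants that no longer distinguish anything, which is why annotation usually needs far fewer clicks than there are parses.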

The Parsebanker

'After all, a human being must be something more than a machine?'

TigerSearch
The implementation is under development by Paul Meurer
Find selected prepositional phrases with sentential objects:

Find selected prepositional phrases with the preposition 'om' and nominal objects:

Find topicalized objects: