Rule Learning - Overview

Goal: learn syntactic transfer rules.
1) Flat Seed Generation: produce rules from word-aligned sentence pairs, abstracted only to the POS level, with no syntactic structure.
2) Add compositional structure to the seed rule by exploiting previously learned rules.
3) Seeded Version Space Learning: group seed rules by constituent sequences and alignments. The seed rules form the specific boundary (S-boundary) of the version space (VS). Generalize with validation.

Flat Seed Generation

Create a seed rule that is specific to the sentence pair but abstracted to the POS level. Use SL information (e.g. parses) and any available TL information. E.g.:

The highly qualified applicant visited the company.
Der äußerst qualifizierte Bewerber besuchte die Firma.
((1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7))

S::S [det adv adj n v det n] → [det adv adj n v det n]
(;;alignments:
 (x1::y1) (x2::y2) (x3::y3) (x4::y4) (x5::y5) (x6::y6) (x7::y7)
 ;;constraints:
 ((x1 def) = *+) ((x4 agr) = *3-sing) ((x5 tense) = *past) ….
 ((y1 def) = *+) ((y3 case) = *nom) ((y4 agr) = *3sg) …
)
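
The seed-generation step can be sketched in Python. This is a minimal illustration under stated assumptions, not the original system's implementation: the `TransferRule` class, the `flat_seed` function and its feature-dictionary inputs are hypothetical, and the POS tags, word alignment and features are assumed to have been produced already (e.g. by an SL parser and whatever TL analysis exists). Each word becomes one constituent, so the rule has no internal structure yet.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TransferRule:
    """Hypothetical flat rule: SL (x) and TL (y) constituents, alignments, constraints."""
    lhs: str                                  # e.g. "S::S"
    x_seq: list[str]                          # SL constituents (POS tags for a seed rule)
    y_seq: list[str]                          # TL constituents
    alignments: list[tuple[int, int]]         # 1-based (x index, y index) pairs
    constraints: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Print the rule roughly in the slide's notation."""
        align = " ".join(f"(x{i}::y{j})" for i, j in self.alignments)
        cons = " ".join(self.constraints)
        return (f"{self.lhs} [{' '.join(self.x_seq)}] -> [{' '.join(self.y_seq)}]\n"
                f"(;;alignments: {align}\n ;;constraints: {cons})")


def flat_seed(sl_pos, tl_pos, word_alignment, sl_feats, tl_feats) -> TransferRule:
    """Build a sentence-specific seed rule abstracted only to the POS level.

    sl_feats / tl_feats (assumed inputs) map a 1-based word index to a
    {feature: value} dict; each entry becomes a value constraint.
    """
    constraints = []
    for side, feats in (("x", sl_feats), ("y", tl_feats)):
        for idx in sorted(feats):
            for feat, val in sorted(feats[idx].items()):
                constraints.append(f"(({side}{idx} {feat}) = *{val})")
    # One constituent per word: no syntactic structure yet.
    return TransferRule("S::S", list(sl_pos), list(tl_pos), sorted(word_alignment), constraints)


# The slide's example pair, with a subset of its constraints.
pos = ["det", "adv", "adj", "n", "v", "det", "n"]
rule = flat_seed(
    pos, pos, [(i, i) for i in range(1, 8)],
    sl_feats={1: {"def": "+"}, 4: {"agr": "3-sing"}, 5: {"tense": "past"}},
    tl_feats={1: {"def": "+"}, 3: {"case": "nom"}, 4: {"agr": "3sg"}},
)
print(rule.render())
```

Storing constraints as preformatted strings keeps the sketch short; a real learner would more likely keep them structured so later steps can renumber and compare them.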

Compositionality

If a previously learned rule can account for part of the sentence, adjust the seed rule to reflect this compositional element. Adjust the constituent sequences, alignments, and constraints: add context constraints (from possible translations) and remove unnecessary ones.

Seed rule:
S::S [det adv adj n v det n] → [det adv adj n v det n]
(;;alignments:
 (x1::y1) (x2::y2) (x3::y3) (x4::y4) (x5::y5) (x6::y6) (x7::y7)
 ;;constraints:
 ((x1 def) = *+) ((x4 agr) = *3-sing) ((x5 tense) = *past) ….
 ((y1 def) = *+) ((y4 agr) = *3sg) …
)

Compositional rule:
S::S [NP v det n] → [NP v det n]
(;;alignments:
 (x1::y1) (x2::y2) (x3::y3) (x4::y4)
 ;;constraints:
 ((x2 tense) = *past) ….
 ((y1 def) = *+) ((y1 case) = *nom) ((y1 agr) = *3sg) …
)

Previously learned rule:
NP::NP [det adv adj n] → [det adv adj n]
((x1::y1) …
 ((y4 agr) = (x4 agr)) …
)
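
The adjustment can be sketched as collapsing the span covered by the learned rule into one constituent and renumbering everything after it. This is a hypothetical simplification chosen to reproduce the slide's example, not the original algorithm: `Rule`, `compose` and the tuple encoding of constraints are illustrative names, SL constraints inside the span are simply dropped, and TL constraints inside the span are lifted onto the new constituent as context constraints (assuming the sub-rule passes those features up). It also assumes no alignment link crosses the span boundary.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Rule:
    lhs: str
    x_seq: list[str]
    y_seq: list[str]
    alignments: list[tuple[int, int]]             # 1-based (x, y) pairs
    constraints: list[tuple[str, int, str, str]]  # (side, index, feature, value)


def _remap(i: int, start: int, end: int) -> int:
    """Index of position i after collapsing the inclusive span [start, end] into one slot."""
    if i < start:
        return i
    if i <= end:
        return start
    return i - (end - start)


def compose(seed: Rule, label: str, x_span: tuple[int, int], y_span: tuple[int, int]) -> Rule:
    """Replace the spans covered by a previously learned rule (e.g. NP::NP) with `label`."""
    (xs, xe), (ys, ye) = x_span, y_span
    x_seq = seed.x_seq[:xs - 1] + [label] + seed.x_seq[xe:]
    y_seq = seed.y_seq[:ys - 1] + [label] + seed.y_seq[ye:]
    # Links inside the block collapse into a single link; the rest are renumbered.
    alignments = sorted({(_remap(i, xs, xe), _remap(j, ys, ye)) for i, j in seed.alignments})

    constraints = []
    for side, idx, feat, val in seed.constraints:
        start, end = (xs, xe) if side == "x" else (ys, ye)
        if side == "x" and start <= idx <= end:
            continue  # SL detail now handled inside the sub-rule: drop it
        # TL constraints inside the block become context constraints on the new constituent.
        new = (side, _remap(idx, start, end), feat, val)
        if new not in constraints:
            constraints.append(new)
    return Rule(seed.lhs, x_seq, y_seq, alignments, constraints)


pos = ["det", "adv", "adj", "n", "v", "det", "n"]
seed = Rule("S::S", pos, pos, [(i, i) for i in range(1, 8)], [
    ("x", 1, "def", "+"), ("x", 4, "agr", "3-sing"), ("x", 5, "tense", "past"),
    ("y", 1, "def", "+"), ("y", 3, "case", "nom"), ("y", 4, "agr", "3sg"),
])
composed = compose(seed, "NP", (1, 4), (1, 4))
print(composed.x_seq, composed.alignments)
print(composed.constraints)
```

With the NP rule covering words 1 to 4 on both sides, the sketch yields `[NP v det n]` with four one-to-one alignments, moves the tense constraint from x5 to x2, and attaches the definiteness, case and agreement constraints to y1.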

Seeded Version Space Learning

(Diagram: seed rules grouped into version spaces by constituent sequence, e.g. NP v det n, NP VP, …)

1. Group seed rules into version spaces as above.
2. Make use of the partial order of rules in a version space. The partial order is defined via the f-structures satisfying the constraints: one rule is more general than another if every f-structure that satisfies the other rule's constraints also satisfies its own.
3. Generalize within the space by repeatedly merging rules:
   1. Deletion of a constraint
   2. Moving value constraints to agreement constraints, e.g. ((x1 num) = *pl), ((x3 num) = *pl) → ((x1 num) = (x3 num))
4. Check the translation power of the generalized rules against the sentence pairs.
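
The four steps can be sketched with constraint sets. This is one simple way to combine the two generalization operators, offered as an illustration rather than the original learner: the tuple encoding, `group_into_version_spaces`, `merge`, `more_general_or_equal`, `generalize` and the `is_valid` callback are all hypothetical. Equal value constraints first imply an agreement constraint; a merge then keeps only the constraints shared by both rules, so differing values are deleted while the agreement survives. The partial-order test is a sound syntactic approximation of the f-structure definition, and `is_valid` stands in for checking translation power.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical constraint encoding (tuples, so they fit in frozensets):
#   ((x1 num) = *pl)        -> ("val", "x1", "num", "pl")
#   ((x1 num) = (x3 num))   -> ("agr", "x1", "num", "x3")


def group_into_version_spaces(seeds):
    """Group (key, constraints) seed rules by key = (SL seq, TL seq, alignments)."""
    spaces = defaultdict(list)
    for key, constraints in seeds:
        spaces[key].append(frozenset(constraints))
    return dict(spaces)


def with_implied_agreements(cs):
    """Add agreement constraints entailed by equal value constraints (meaning unchanged)."""
    values = [c for c in cs if c[0] == "val"]
    implied = set()
    for (_, r1, f1, v1), (_, r2, f2, v2) in combinations(values, 2):
        if f1 == f2 and v1 == v2 and r1 != r2:
            a, b = sorted((r1, r2))
            implied.add(("agr", a, f1, b))
    return frozenset(cs) | implied


def more_general_or_equal(general, specific):
    """True if every constraint of `general` is entailed by `specific`, so any
    f-structure satisfying `specific` also satisfies `general`."""
    return general <= with_implied_agreements(specific)


def merge(c1, c2):
    """Keep the (closed) constraints shared by both rules: differing values are
    deleted (operator 1), a shared agreement survives in their place (operator 2)."""
    return with_implied_agreements(c1) & with_implied_agreements(c2)


def generalize(space, is_valid):
    """Greedily merge rules in one version space while merges pass validation."""
    rules = list(space)
    changed = True
    while changed and len(rules) > 1:
        changed = False
        for i, j in combinations(range(len(rules)), 2):
            candidate = merge(rules[i], rules[j])
            if is_valid(candidate):
                rules = [r for k, r in enumerate(rules) if k not in (i, j)] + [candidate]
                changed = True
                break
    return rules


key = (("det", "adj", "n"), ("det", "adj", "n"), ((1, 1), (2, 2), (3, 3)))
seeds = [
    (key, [("val", "x1", "num", "pl"), ("val", "x3", "num", "pl"), ("val", "x1", "def", "+")]),
    (key, [("val", "x1", "num", "sg"), ("val", "x3", "num", "sg"), ("val", "x1", "def", "+")]),
]
spaces = group_into_version_spaces(seeds)
general = generalize(spaces[key], is_valid=lambda cs: True)  # accept-all stub
print(sorted(general[0]))
```

Merging a plural and a singular seed leaves `((x1 def) = *+)` plus the agreement `((x1 num) = (x3 num))`. Replacing the stub with a real translation check would let validation reject overgeneral merges.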

Future Work
- Baseline evaluation
- Adjust the generalization step size
- Revisit the generalization operators
- Introduce specialization operators to retract from overgeneralizations (including seed rules)
- Learn from an unstructured bilingual corpus
- Evaluate merges to pick the best one at each step, based on cross-validation and the number of sentences the merged rule can translate