MT For Low-Density Languages
Ryan Georgi
Ling 575 – MT Seminar, Winter 2007

What is “Low Density”?

In NLP, languages are usually chosen for: economic value, ease of development, and funding (NSA, anyone?).

What is “Low Density”? As a result, NLP work until recently has focused on a rather small set of languages, e.g. English, German, French, Japanese, and Chinese.

What is “Low Density”? “Density” refers to the availability of resources (primarily digital) for a given language: parallel text, treebanks, dictionaries, and chunked, semantically tagged, or otherwise annotated data.

What is “Low Density”? “Density” is not necessarily linked to speaker population; our favorite example is Inuktitut.

So, why study LDLs?

Preserving endangered languages. Spreading the benefits of NLP to other populations (Tegic has T9 for Azerbaijani now). Benefits of wide typological coverage for cross-linguistic research (?).

The Problem with LDLs?

“The fundamental problem for annotation of lower-density languages is that they are lower density” – Maxwell & Hughes. The easiest (and often best) NLP development is done with statistical methods. Training requires lots of resources; resources require lots of money. A cost/benefit chicken-and-egg problem.

What are our options? Create corpora by hand: very time-consuming (= expensive) and requires trained native speakers. Digitize printed resources: also time-consuming, and may require trained native speakers, e.g. when the orthography has no Unicode encoding.

What are our options? The traditional requirements are going to be difficult to satisfy, no matter how we slice it. We therefore need to: maximize the information extracted from the resources we can get, and reduce the requirements for building a system.

Maximizing Information with IGT

Interlinear Glossed Text (IGT): the traditional form of transcription for linguistic field researchers and grammarians. Example:
Rhoddodd yr athro lyfr I’r bachgen ddoe
gave-3sg the teacher book to-the boy yesterday
“The teacher gave a book to the boy yesterday”
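To make the format concrete, here is a minimal sketch of how one such record could be represented in code; the IGTInstance class and its field names are my own, not a standard IGT schema.

```python
# A minimal sketch of storing one IGT record: three tiers, with the source
# and gloss tiers token-aligned by position.
from dataclasses import dataclass

@dataclass
class IGTInstance:
    source: list        # language-line tokens
    gloss: list         # gloss-line tokens, positionally aligned with source
    translation: str    # free translation (here, English)

welsh = IGTInstance(
    source=["Rhoddodd", "yr", "athro", "lyfr", "I'r", "bachgen", "ddoe"],
    gloss=["gave-3sg", "the", "teacher", "book", "to-the", "boy", "yesterday"],
    translation="The teacher gave a book to the boy yesterday",
)

# The positional pairing of the first two tiers is the source-gloss alignment:
for src, gls in zip(welsh.source, welsh.gloss):
    print(f"{src:10s} {gls}")
```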

Benefits of IGT: As IGT is frequently used in fieldwork, it is often available for low-density languages. IGT provides information about syntax and morphology. The translation line is usually in a high-density language that we can use as a pivot language.

Drawbacks of IGT: The data can be ‘abnormal’ in a number of ways. It is usually quite short, may be used by a grammarian to illustrate fringe usages, and often has a purposely limited vocabulary. Still, when working with LDLs it might be all we’ve got.

Utilizing IGT: First, a big nod to Fei (this is her paper!). As we saw in HW #2, word alignment is hard. IGT, however, often gets us halfway there!

Utilizing IGT: Take the previous example:
Rhoddodd yr athro lyfr I’r bachgen ddoe
gave-3sg the teacher book to-the boy yesterday
“The teacher gave a book to the boy yesterday”
The interlinear already aligns the source with the gloss, and often the gloss uses words found in the translation already.
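A small sketch of that heuristic, under the assumption that the three tiers have already been tokenized (the align_via_gloss helper below is hypothetical, not from the paper):

```python
# Source and gloss are paired 1:1 by position, so matching gloss pieces
# against the translation gives source-translation links almost for free.
# Tier lists are retyped from the Welsh example; punctuation and
# repeated-word handling are simplified.

source      = ["Rhoddodd", "yr", "athro", "lyfr", "I'r", "bachgen", "ddoe"]
gloss       = ["gave-3sg", "the", "teacher", "book", "to-the", "boy", "yesterday"]
translation = ["The", "teacher", "gave", "a", "book", "to", "the", "boy", "yesterday"]

def align_via_gloss(gloss, translation):
    """Return (source_index, translation_index) pairs via gloss-word matching."""
    trans = [w.lower() for w in translation]
    links = []
    for i, g in enumerate(gloss):
        # A gloss token like "gave-3sg" or "to-the" may pack several pieces.
        for piece in g.lower().split("-"):
            if piece in trans:
                links.append((i, trans.index(piece)))  # first occurrence only
    return links

print([(source[i], translation[j]) for i, j in align_via_gloss(gloss, translation)])
# [('Rhoddodd', 'gave'), ('yr', 'The'), ('athro', 'teacher'), ('lyfr', 'book'),
#  ("I'r", 'to'), ("I'r", 'The'), ('bachgen', 'boy'), ('ddoe', 'yesterday')]
# Note that "yr" and the "the" in "to-the" both grab the first English "the";
# repeated words are exactly where this heuristic needs help.
```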

Utilizing IGT: Alignment isn’t always this easy…
xaraju mina lgurfati wa nah.nu nadxulu
xaraj-u: mina ?al-gurfat-i wa nah.nu na-dxulu
exited-3MPL from DEF-room-GEN and we 1PL-enter
'They left the room as we were entering it'
(Source: Modern Arabic: Structures, Functions, and Varieties; Clive Holes)
We can get a little more by stemming… but we’re going to need more.
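As a rough illustration of what stemming buys us, the sketch below reruns the gloss-to-translation matching with NLTK's Porter stemmer on both sides (the match_with_stemming helper is hypothetical; any English stemmer would do, since gloss and translation are both English):

```python
# Stemming-assisted matching over the Arabic example; tier lists are retyped
# from the slide. Surface line: xaraju mina lgurfati wa nah.nu nadxulu
from nltk.stem import PorterStemmer

stem = PorterStemmer().stem

gloss       = ["exited-3MPL", "from", "DEF-room-GEN", "and", "we", "1PL-enter"]
translation = ["They", "left", "the", "room", "as", "we", "were", "entering", "it"]

def match_with_stemming(gloss, translation):
    """Match gloss morphemes to translation words after stemming both sides."""
    trans_stems = [stem(w.lower()) for w in translation]
    links = []
    for i, g in enumerate(gloss):
        for piece in g.lower().split("-"):
            if stem(piece) in trans_stems:
                links.append((i, trans_stems.index(stem(piece))))
    return links

print(match_with_stemming(gloss, translation))
# [(2, 3), (4, 5), (5, 7)] -- stemming links "1PL-enter" to "entering",
# but "exited" vs. "left" still needs something beyond string matching.
```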

Utilizing IGT: Thankfully, with an English translation, we already have tools to get phrase and dependency structures that we can project. (Source: Will & Fei’s NAACL 2007 paper!)
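The sketch below shows the general idea of projection in miniature; it is far simpler than the algorithm in Will & Fei's paper, and the English dependencies and word alignment are written by hand rather than produced by a parser or aligner:

```python
# Copy English dependency edges onto the source words through the word
# alignment; this is a toy version of structure projection.

# English: 0 The 1 teacher 2 gave 3 a 4 book 5 to 6 the 7 boy 8 yesterday
en_deps = [(2, 1), (1, 0), (2, 4), (4, 3), (2, 7), (7, 5), (7, 6), (2, 8)]

# English index -> source index
# (Welsh: 0 Rhoddodd 1 yr 2 athro 3 lyfr 4 I'r 5 bachgen 6 ddoe)
alignment = {0: 1, 1: 2, 2: 0, 4: 3, 5: 4, 6: 4, 7: 5, 8: 6}  # "a" is unaligned

def project_dependencies(en_deps, alignment):
    """Map each (head, dependent) pair through the alignment; drop edges whose
    endpoints have no aligned source word. Many-to-one alignments (e.g. "to"
    and "the" both mapping to "I'r") simply collapse onto one source edge."""
    projected = set()
    for head, dep in en_deps:
        if head in alignment and dep in alignment:
            projected.add((alignment[head], alignment[dep]))
    return sorted(projected)

print(project_dependencies(en_deps, alignment))
# [(0, 2), (0, 3), (0, 5), (0, 6), (2, 1), (5, 4)]
```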

Utilizing IGT: What can we get from this? Automatically generated CFGs; from these CFGs we can infer word order and possible constituents. …suggestions? From a small amount of data, this is a lot of information, but what about…
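For example, once a projected tree exists, reading production rules off it is a simple traversal. The tree below is a hand-built guess at what projection might yield for the Welsh sentence, not actual system output:

```python
# Read context-free productions off a projected tree by simple traversal.
from collections import Counter

def read_productions(tree, counts):
    """tree is (label, child, child, ...); a child is either a tree or a word."""
    label, *children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    counts[(label, rhs)] += 1
    for c in children:
        if isinstance(c, tuple):
            read_productions(c, counts)

tree = ("S",
        ("V", "Rhoddodd"),
        ("NP", ("D", "yr"), ("N", "athro")),
        ("NP", ("N", "lyfr")),
        ("PP", ("P", "I'r"), ("NP", ("N", "bachgen"))),
        ("ADV", "ddoe"))

counts = Counter()
read_productions(tree, counts)
for (lhs, rhs), n in counts.items():
    print(f"{lhs} -> {' '.join(rhs)}  ({n})")
# The top rule, S -> V NP NP PP ADV, is already evidence of verb-initial
# (VSO) word order for Welsh.
```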

Reducing Data Requirements with Prototyping

Grammar Induction: So, we have a way to get production rules from a small amount of data. Is this enough? Probably not; CFGs aren’t known for their robustness. How about using what we have as a bootstrap?

Grammar Induction: Given unannotated text, we can derive PCFGs. Without annotation, though, we just have unlabelled trees:
[Figure: an unlabelled parse of “the dog fell asleep”, with anonymous nonterminals ROOT, C2, X0, X1, Y2, Z3, N4 and rule probabilities (p=0.003, p=0.02, p=5.3e-2, p=0.09, p=0.45e-4)]
Such an unlabelled parse doesn’t give us S -> NP VP, though.
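Estimating the rule probabilities themselves is plain maximum-likelihood counting; the sketch below uses invented counts over anonymous labels like those in the figure, just to make the labelling gap explicit:

```python
# Maximum-likelihood PCFG estimation from production counts:
# P(A -> b) = count(A -> b) / count(A).
from collections import Counter

def mle_pcfg(rule_counts):
    lhs_totals = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_totals[lhs] += n
    return {(lhs, rhs): n / lhs_totals[lhs] for (lhs, rhs), n in rule_counts.items()}

rule_counts = Counter({
    ("ROOT", ("C2",)):            10,
    ("C2",   ("X0", "X1", "Y2")):  6,
    ("C2",   ("X0", "Y2")):        4,
    ("Y2",   ("Z3", "N4")):       10,
})

for (lhs, rhs), p in mle_pcfg(rule_counts).items():
    print(f"{lhs} -> {' '.join(rhs)}  p={p:.2f}")
# Nothing here tells us that C2 "is" an S or that X0 X1 "is" an NP --
# that labelling gap is exactly the problem the next slides address.
```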

Grammar Induction: Can we get labeled trees without annotated text? Haghighi & Klein (2006) propose a way in which production rules can be passed to a PCFG induction algorithm as “prototypical” constituents. Think of these prototypes as a rubric that could be given to a human annotator, e.g. for English, NP -> DT NN.

Grammar Induction: Let’s take the possible constituent DT NN. We could tell our PCFG algorithm to apply this as a constituent everywhere it occurs. But what about DT NN NN (“the train station”)? We would like to catch this as well.
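A toy sketch of what naive exact-sequence matching would do (this is only an illustration of the limitation, not Haghighi & Klein's actual model, which treats prototypes as soft constraints during induction):

```python
# Label any span whose POS sequence exactly equals a prototype sequence.

prototypes = {("DT", "NN"): "NP", ("IN", "DT", "NN"): "PP", ("VBD", "DT", "NN"): "VP"}

def prototype_spans(tags, prototypes):
    """Return (start, end, label) for every span whose POS sequence exactly
    matches a prototype."""
    spans = []
    for i in range(len(tags)):
        for j in range(i + 1, len(tags) + 1):
            label = prototypes.get(tuple(tags[i:j]))
            if label:
                spans.append((i, j, label))
    return spans

tags = ["DT", "NN", "NN", "VBD", "RB"]      # "the train station closed early"
print(prototype_spans(tags, prototypes))    # [(0, 2, 'NP')]
# Exact matching labels only "the train"; it never covers the full NP
# "the train station" -- hence the need for a softer similarity measure.
```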

Grammar Induction: H&K’s solution? Distributional clustering: “a similarity measure between two items on the basis of their immediate left and right contexts.” …to be honest, I lose them in the math here. Importantly, however, weighting the probability of a constituent with the right measure improves the F-measure from the base unsupervised level of 35.3 to 62.2.
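To give a flavor of the idea without the math, the sketch below builds a context signature for a word sequence from the words immediately to its left and right and compares two sequences by cosine similarity; the toy corpus and the exact weighting are stand-ins, not Haghighi & Klein's:

```python
# Compare two word sequences by the distribution of their immediate
# left/right contexts.
from collections import Counter
from math import sqrt

def context_signature(seq, corpus):
    """Counts of (left-word, right-word) pairs around every occurrence of seq."""
    sig, n = Counter(), len(seq)
    for sent in corpus:
        padded = ["<s>"] + sent + ["</s>"]
        for i in range(1, len(padded) - n):
            if padded[i:i + n] == seq:
                sig[(padded[i - 1], padded[i + n])] += 1
    return sig

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

corpus = [
    ["we", "saw", "the", "dog", "yesterday"],
    ["we", "saw", "the", "train", "station", "yesterday"],
    ["the", "dog", "slept"],
]
a = context_signature(["the", "dog"], corpus)
b = context_signature(["the", "train", "station"], corpus)
print(round(cosine(a, b), 2))   # 0.71 -- the two spans occur in similar contexts
```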

So… what now?

Next Steps: By extracting production rules from a very small amount of data using IGT, and then using Haghighi & Klein’s unsupervised methods, it may be possible to bootstrap an effective language model from very little data!

Next Steps: Possible applications: Automatic generation of language resources (while a system with the same goals would only compound error, automatically annotated data could be easier for a human to correct than to hand-generate). Assisting linguists in the field (better model performance could imply better grammar coverage). …you tell me!