Morphological Analysis for Phrase-Based Statistical Machine Translation. Luong Minh Thang. WING group meeting – 15 Aug, 2008. HYP update - part 1. 4/30/2015

Presentation transcript:


Agenda
– Introduction: what does my project title mean?
– Language pair
– English-Finnish challenges
– Related work
– Project direction

Introduction I: phrase-based SMT
– Statistical: derive statistical information from large amounts of data
– Phrase-based: capture local constraints
[Figure: word alignment between "Maria no daba una bofetada a la bruja verde" and "NULL Mary did not slap the green witch", with the two sides labelled Source and Target]

Introduction II: Morphology
Morpheme: the minimal meaning-bearing unit
– machines = machine + s
– translation = translate + ion
– goalkeeper = goal + keeper
English is a weakly inflected language with a simple morphological structure
-> Highly inflected languages are much more complicated!

Introduction III – highly inflected languages
A chain of morphemes is concatenated to form a single word.
– Finnish: oppositio + kansa + n + edusta + ja (opposition + people + of + represent + -ative) = opposition member of parliament
– Turkish: uygarlaştıramadıklarımızdanmışsınızcasına (uygar + laş + tır + ama + dık + lar + ımız + dan + mış + sınız + casına) = (behaving) as if you are among those whom we could not cause to become civilized
(Callout on the Turkish example: this is a single word!)

Introduction IV – Why morphology-aware SMT?
– Tackle the data sparseness problem (statistics are estimated from sentence pairs)
– Capture the relations among words, e.g. English machine / machines and Spanish máquina / máquinas
[Table: type count and token count for English and Finnish; the values are not preserved in this transcript]
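The sparseness argument above can be made concrete with a small type/token count. The following is a minimal sketch only: the two toy corpora and the "+s" morpheme notation are invented for illustration and are not data from the talk. It shows that splitting words into morphemes lowers the number of distinct types while raising the number of tokens, so each type is observed more often.

```python
from collections import Counter


def type_token_counts(sentences):
    """Return (type_count, token_count) for whitespace-tokenised sentences."""
    counts = Counter(tok for s in sentences for tok in s.lower().split())
    return len(counts), sum(counts.values())


# Toy data, invented for illustration (not from the slides).
surface = ["the machine stops", "the machines stop"]
# The same sentences after a hypothetical morpheme segmentation.
segmented = ["the machine stop +s", "the machine +s stop"]

print(type_token_counts(surface))    # (5, 6)
print(type_token_counts(segmented))  # (4, 8)
```

On real Finnish text the gap between surface types and morpheme types would be expected to be much larger, which is the motivation given on the slide.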

Language pair I – our choice?
We chose English-Finnish as our main translation task.
[Figure: scale of languages from low-inflected to highly inflected (Dyer, 2007); the only language label preserved in this transcript is Vietnamese]

Language pair II – why Finnish?
Honestly, I don't know Finnish … But we chose it because:
– Corpora are available
– Finnish is an agglutinative, morphologically complex language, which suits our project scope
– We want to investigate translation from weakly to highly inflected languages -> an area to explore, yet a hard one!

English-Finnish challenges I – many-to-one word relationship
Finnish uses suffixes to express grammatical relations and also to derive new words.

Case         Suffix  English prep.    Sample word form  Translation of the sample
nominatiivi  -                        talo              house
genetiivi    -n      of               talon             of (a) house
essiivi      -na     as               talona            as a house
inessiivi    -ssa    in               talossa           in (a) house
elatiivi     -sta    from (inside)    talosta           from (a) house
komitatiivi  -ne-    together (with)  taloineni         with my house(s)
(about cases for nouns)

Many-to-one English-Finnish word relationship -> need word-morpheme correspondence
(Callout: not merely concatenation)
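The word-morpheme correspondence in the case table can be sketched as a simple lookup. This is an illustrative sketch only: the function name `analyse` and the fixed stem "talo" are assumptions made for the example, and the suffix entries are copied from the table on the slide. It is not a real morphological analyser.

```python
# Suffix -> (case name, English preposition), copied from the slide's table.
CASE_SUFFIXES = {
    "": ("nominatiivi", ""),
    "n": ("genetiivi", "of"),
    "na": ("essiivi", "as"),
    "ssa": ("inessiivi", "in"),
    "sta": ("elatiivi", "from (inside)"),
}


def analyse(word, stem="talo"):
    """Split `word` into stem + case suffix; return None if it does not fit.

    Hypothetical helper: it only handles plain stem + suffix concatenation.
    """
    if not word.startswith(stem):
        return None
    suffix = word[len(stem):]
    entry = CASE_SUFFIXES.get(suffix)
    if entry is None:
        return None
    case, preposition = entry
    return stem, suffix, case, preposition


print(analyse("talossa"))    # ('talo', 'ssa', 'inessiivi', 'in')
print(analyse("taloineni"))  # None
```

The comitative form taloineni is not recovered by stripping a single table suffix from the stem, which is one way to read the slide's remark that word formation is not mere concatenation.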

English-Finnish challenges II – word order
Word order is "free" in Finnish:
– Pete rakastaa Annaa = Pete loves Anna (normal)
– Annaa Pete rakastaa: emphasizes Annaa
– Rakastaa Pete Annaa: emphasizes rakastaa = Pete does love Anna
– Pete Annaa rakastaa: stress on Pete
– Rakastaa Annaa Pete: does not sound like a normal sentence, but is quite understandable

English-Finnish challenges III – surface form generation
After translating from English words -> Finnish morphemes, a surface generation step is needed:
oppositio + kansa + n + edusta + ja -> oppositiokansanedustaja
What if morphemes are missing or the morpheme order changes?
-> Need a more error-tolerant surface recovery algorithm
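One simple way to picture an error-tolerant surface recovery step is to concatenate the morphemes and, if the result is not a known word, fall back to the closest known surface form. The sketch below is an assumption made for illustration, not the algorithm proposed in the talk: the function name `recover_surface`, the toy vocabulary, and the similarity cutoff are all hypothetical, and the fuzzy match uses Python's standard `difflib`.

```python
import difflib


def recover_surface(morphemes, vocabulary, cutoff=0.6):
    """Join morphemes; if the result is unknown, return the closest known word.

    Falls back to the plain concatenation when nothing is similar enough.
    """
    candidate = "".join(morphemes)
    if candidate in vocabulary:
        return candidate
    matches = difflib.get_close_matches(
        candidate, sorted(vocabulary), n=1, cutoff=cutoff
    )
    return matches[0] if matches else candidate


# Toy vocabulary of known surface forms (illustrative only).
vocab = {"oppositiokansanedustaja", "talossa", "talosta"}

# All morphemes present: plain concatenation is enough.
print(recover_surface(["oppositio", "kansa", "n", "edusta", "ja"], vocab))
# Morpheme "n" missing: the closest known form is returned instead.
print(recover_surface(["oppositio", "kansa", "edusta", "ja"], vocab))
```

Both calls print oppositiokansanedustaja. A vocabulary lookup like this can only return forms it has already seen, so it says nothing about generating unseen words.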

Related work I – low-to-high inflected languages
Many works translate from highly to weakly inflected languages, but very few address the opposite direction, which is considered hard in (Koehn, 2005).
– (Yang & Kirchhoff, 2006): Finnish-English, backoff
– (Oflazer & Durgar El-Kahlout, 2006, 2007): English-Turkish, word-morpheme translation, then simply concatenating morphemes
All use language-dependent tools and syntactic knowledge: TreeTagger, Snowball stemmer …
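The "backoff" idea mentioned above can be pictured as follows: look a word up in its full surface form first, and only if it is unseen fall back to a reduced form such as the stem. This is a loose sketch of that general idea under stated assumptions, not the model of Yang & Kirchhoff: the function name, the toy translation table, and the naive suffix stripping (using the case suffixes from the table earlier in the talk) are all hypothetical.

```python
# Case suffixes from the talk's table, longest first so "ssa" wins over "n".
SUFFIXES = ("ssa", "sta", "na", "n")


def translate_with_backoff(word, table):
    """Return (translation, level); level records which lookup succeeded."""
    if word in table:
        return table[word], "surface"
    # Back off: strip one known suffix and retry with the remaining stem.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and word[: -len(suffix)] in table:
            return table[word[: -len(suffix)]], "stem"
    return word, "untranslated"


# Toy Finnish -> English table (illustrative only).
table = {"talo": "house", "talon": "of (a) house"}

print(translate_with_backoff("talon", table))    # ('of (a) house', 'surface')
print(translate_with_backoff("talossa", table))  # ('house', 'stem')
print(translate_with_backoff("kissa", table))    # ('kissa', 'untranslated')
```

Backing off to the stem loses the case information ("in"), which is the kind of trade-off a morphology-aware system has to manage.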

Related work II – surface form recovery
– (Toutanova et al., 2007, 2008): English-Russian, English-Arabic; translate stem-to-stem, then predict inflection from stems using many different features (lexical, morphological, and syntactic)
– (Avramidis & Koehn, 2008): English-Greek; use syntax to get the "missing" morphology, depending on the syntactic position; noun case agreement and verb person conjugation
-> These rely mostly on manually annotated data

Project direction
– Use a language-independent tool (Morfessor) and rely on unannotated data only (i.e. no feature data or syntactic information)
– Work on a general surface-form recovery method
– We would like a unified view of the translation process, separating the low-low, low-high, high-low, and high-high cases
(Callout: we are here)

Reference I
– Chris Dyer
– Jurafsky, D., & Martin, J. H. (2007). Speech and language processing (book)
– The Finnish language
– Yang & Kirchhoff, 2006: Phrase-based backoff models for machine translation of highly inflected languages
– Oflazer & Durgar El-Kahlout, 2006: Initial explorations in English to Turkish statistical machine translation

Reference II
– Oflazer & Durgar El-Kahlout, 2007: Exploring different representational units in English-to-Turkish statistical machine translation
– Toutanova et al., 2007: Generating complex morphology for machine translation
– Toutanova et al., 2008: Applying morphology generation models to machine translation
– Avramidis & Koehn, 2008: Enriching morphologically poor languages for statistical machine translation

Q & A?

To be continued … Thank you!