Unsupervised Turkish Morphological Segmentation for Statistical Machine Translation Coskun Mermer and Murat Saraclar Workshop on Machine Translation and.

Slides:



Advertisements
Similar presentations
Rationale for a multilingual corpus for machine translation evaluation Debbie Elliott Anthony Hartley Eric Atwell Corpus Linguistics 2003, Lancaster, England.
Advertisements

The Application of Machine Translation in CADAL Huang Chen, Chen Haiying Zhejiang University Libraries, Hangzhou, China
Topic models Source: Topic models, David Blei, MLSS 09.
Statistical Machine Translation Part II: Word Alignments and EM Alexander Fraser Institute for Natural Language Processing University of Stuttgart
Statistical Machine Translation Part II: Word Alignments and EM Alexander Fraser ICL, U. Heidelberg CIS, LMU München Statistical Machine Translation.
Statistical Machine Translation Part II – Word Alignments and EM Alex Fraser Institute for Natural Language Processing University of Stuttgart
Randomized Sensing in Adversarial Environments Andreas Krause Joint work with Daniel Golovin and Alex Roper International Joint Conference on Artificial.
Part of Speech Tagging The DT students NN went VB to P class NN Plays VB NN well ADV NN with P others NN DT Fruit NN flies NN VB NN VB like VB P VB a DT.
Measures of Coincidence Vasileios Hatzivassiloglou University of Texas at Dallas.
Statistical Topic Modeling part 1
A Maximum Coherence Model for Dictionary-based Cross-language Information Retrieval Yi Liu, Rong Jin, Joyce Y. Chai Dept. of Computer Science and Engineering.
Hidden Markov Models Bonnie Dorr Christof Monz CMSC 723: Introduction to Computational Linguistics Lecture 5 October 6, 2004.
Unsupervised Morpheme Analysis – Overview of Morpho Challenge 2007 in CLEF Mikko Kurimo, Mathias Creutz, Matti Varjokallio, Ville Turunen Helsinki University.
1 Duluth Word Alignment System Bridget Thomson McInnes Ted Pedersen University of Minnesota Duluth Computer Science Department 31 May 2003.
Flow Network Models for Sub-Sentential Alignment Ying Zhang (Joy) Advisor: Ralf Brown Dec 18 th, 2001.
09:10 Mikko Kurimo: "Unsupervised Morpheme Analysis -- Morpho Challenge Workshop 2007" 09:30 Mikko Kurimo: "Evaluation by a Comparison to a Linguistic.
Grammar induction by Bayesian model averaging Guy Lebanon LARG meeting May 2001 Based on Andreas Stolcke’s thesis UC Berkeley 1994.
Expectation Maximization Algorithm
Machine Learning in Natural Language Processing Noriko Tomuro November 16, 2006.
1 Unsupervised Discovery of Morphemes Presented by: Miri Vilkhov & Daniel Feinstein linja-autonautonkuljettajallakaan linja-auton auto kuljettajallakaan.
Online Learning for Web Query Generation: Finding Documents Matching a Minority Concept on the Web Rayid Ghani Accenture Technology Labs, USA Rosie Jones.
Computer vision: models, learning and inference Chapter 10 Graphical Models.
Natural Language Processing Expectation Maximization.
Active Learning for Probabilistic Models Lee Wee Sun Department of Computer Science National University of Singapore LARC-IMS Workshop.
An Automatic Segmentation Method Combined with Length Descending and String Frequency Statistics for Chinese Shaohua Jiang, Yanzhong Dang Institute of.
Kuang Ru; Jinan Xu; Yujie Zhang; Peihao Wu Beijing Jiaotong University
Text Analysis Everything Data CompSci Spring 2014.
Fast Webpage classification using URL features Authors: Min-Yen Kan Hoang and Oanh Nguyen Thi Conference: ICIKM 2005 Reporter: Yi-Ren Yeh.
Unambiguity Regularization for Unsupervised Learning of Probabilistic Grammars Kewei TuVasant Honavar Departments of Statistics and Computer Science University.
Machine Learning1 Machine Learning: Summary Greg Grudic CSCI-4830.
Modeling Documents by Combining Semantic Concepts with Unsupervised Statistical Learning Author: Chaitanya Chemudugunta America Holloway Padhraic Smyth.
AP Statistics Chapter 9 Notes.
Advanced Signal Processing 05/06 Reinisch Bernhard Statistical Machine Translation Phrase Based Model.
Comparative study of various Machine Learning methods For Telugu Part of Speech tagging -By Avinesh.PVS, Sudheer, Karthik IIIT - Hyderabad.
Machine Translation Course 5 Diana Trandab ă ț Academic year:
Morpho Challenge competition Evaluations and results Authors Mikko Kurimo Sami Virpioja Ville Turunen Krista Lagus.
Mining the Web to Create Minority Language Corpora Rayid Ghani Accenture Technology Labs - Research Rosie Jones Carnegie Mellon University Dunja Mladenic.
12/08/1999 JHU CS /Jan Hajic 1 Introduction to Natural Language Processing ( ) Statistical Translation: Alignment and Parameter Estimation.
Machine Translation  Machine translation is of one of the earliest uses of AI  Two approaches:  Traditional approach using grammars, rewrite rules,
Sampling Approaches to Pattern Extraction
1 Generative and Discriminative Models Jie Tang Department of Computer Science & Technology Tsinghua University 2012.
Integrating Topics and Syntax -Thomas L
Maximum Entropy (ME) Maximum Entropy Markov Model (MEMM) Conditional Random Field (CRF)
CSC 2535 Lecture 8 Products of Experts Geoffrey Hinton.
Bayesian Word Alignment for Statistical Machine Translation Authors: Coskun Mermer, Murat Saraclar Present by Jun Lang I2R SMT-Reading Group.
Chinese Word Segmentation Adaptation for Statistical Machine Translation Hailong Cao, Masao Utiyama and Eiichiro Sumita Language Translation Group NICT&ATR.
NRC Report Conclusion Tu Zhaopeng NIST06  The Portage System  For Chinese large-track entry, used simple, but carefully- tuned, phrase-based.
The famous “sprinkler” example (J. Pearl, Probabilistic Reasoning in Intelligent Systems, 1988)
LREC 2008 Marrakech 29 May Caroline Lavecchia, Kamel Smaïli and David Langlois LORIA / Groupe Parole, Vandoeuvre-Lès-Nancy, France Phrase-Based Machine.
Active learning Haidong Shi, Nanyi Zeng Nov,12,2008.
Mutual bilingual terminology extraction Le An Ha*, Gabriela Fernandez**, Ruslan Mitkov*, Gloria Corpas*** * University of Wolverhampton ** Universidad.
Lecture 2: Statistical learning primer for biologists
School of Computer Science 1 Information Extraction with HMM Structures Learned by Stochastic Optimization Dayne Freitag and Andrew McCallum Presented.
De novo discovery of mutated driver pathways in cancer Discussion leader: Matthew Bernstein Scribe: Kun-Chieh Wang Computational Network Biology BMI 826/Computer.
Presented by: Fang-Hui Chu Discriminative Models for Speech Recognition M.J.F. Gales Cambridge University Engineering Department 2007.
CSE 517 Natural Language Processing Winter 2015
1 Minimum Error Rate Training in Statistical Machine Translation Franz Josef Och Information Sciences Institute University of Southern California ACL 2003.
Dimensionality Reduction in Unsupervised Learning of Conditional Gaussian Networks Authors: Pegna, J.M., Lozano, J.A., Larragnaga, P., and Inza, I. In.
Statistical Machine Translation Part II: Word Alignments and EM Alex Fraser Institute for Natural Language Processing University of Stuttgart
1 Chapter 8: Model Inference and Averaging Presented by Hui Fang.
CS Statistical Machine learning Lecture 25 Yuan (Alan) Qi Purdue CS Nov
Review: Review: Translating without in-domain corpus: Machine translation post-editing with online learning techniques Antonio L. Lagarda, Daniel Ortiz-Martínez,
A Study on Speaker Adaptation of Continuous Density HMM Parameters By Chin-Hui Lee, Chih-Heng Lin, and Biing-Hwang Juang Presented by: 陳亮宇 1990 ICASSP/IEEE.
Statistical Machine Translation Part II: Word Alignments and EM
Learning Deep Generative Models by Ruslan Salakhutdinov
Data Mining Lecture 11.
Akio Utsugi National Institute of Bioscience and Human-technology,
Stochastic Optimization Maximization for Latent Variable Models
Sadov M. A. , NRU HSE, Moscow, Russia Kutuzov A. B
Statistical Machine Translation Papers from COLING 2004
Presentation transcript:

Unsupervised Turkish Morphological Segmentation for Statistical Machine Translation Coskun Mermer and Murat Saraclar Workshop on Machine Translation and Morphologically-rich Languages Haifa, 27 January 2011

Why Unsupervised? No human involvement Language independence Automatic optimization to task

Using a Morphological Analyzer Linguistic morphological analysis intuitive, but  language-dependent  ambiguous  not always optimal manually engineered segmentation schemes can outperform a straightforward linguistic morphological segmentation naive linguistic segmentation may result in even worse performance than a word-based system

Heuristic Segmentation/Merging Rules Widely varying heuristics:  Minimal segmentation Only segment predominant & sure-to-help affixation  Start with linguistic segmentation and take back some segmentations Requires careful study of both linguistics, experimental results  Trial-and-error  Not portable to other language pairs

Adopted Approach Unsupervised learning form a corpus Maximize an objective function (posterior probability)

Morfessor M. Creutz and K. Lagus, “Unsupervised models for morpheme segmentation and morphology learning,” ACM Transactions on Speech and Language Processing, 2007.

Probabilistic Segmentation Model : Observed corpus : Hidden segmentation model for the corpus (≈ “morph” vocabulary)

MAP Segmentation

Probabilistic Model Components : Uniform probability for all possible morph vocabularies of size M for a given morph token count of N (i.e., frequencies do not matter) : For each morph, product of its character probabilities (including end-of-morph marker) : Product of probabilities for each morph token

Original Search Algorithm Greedy Scan the current word/morph vocabulary Accept the best segmentation location (or non-segmentation) and update the model

Parallel Search Less greedy Wait until all the vocabulary is scanned before applying the updates

Sequential Search

Sequential Search (different vocabulary scan orders)

Sequential Search vs. Parallel Search

Random Search Even less greedy Do not automatically accept the maximum probability segmentation, instead make a random draw proportional to the posteriors  cf. Gibbs sampling

Deterministic vs. Random Search

So far… We can obtain lower model costs by being less greedy in search Does it translate to BLEU scores?

Turkish-to-English

English-to-Turkish

Turkish-to-English (1 reference)

English-to-Turkish (1 reference)

On a Large Test Set (1512 sentences) Turkish-to-English, No MERT

Optimizing Segmentation for Statistical Translation The best-performing segmentation is highly task-dependent  Could change when paired with a different language  Depends on size of parallel corpora For a given parallel corpus, what is the optimal segmentation in terms of translation performance?

Adding Bilingual Information : Using IBM Model-1 probability  Estimated via EM

Results

Evolution of the Gibbs Chain

Evolution of the Gibbs Chain (BLEU)

Conclusions Probabilistic model for unsupervised learning of segmentation Improvements to the search algorithm  Parallel search  Random search via Gibbs sampling Incorporated (an approximate) translation probability to the model So far, model scores do not correlate well with BLEU scores