Morpho Challenge competition: Evaluations and results. Authors: Mikko Kurimo, Sami Virpioja, Ville Turunen, Krista Lagus
Introduction Started in Open to all. The organizers selected the evaluation tasks, data and metrics, and performed all the evaluations. Both unsupervised and semi-supervised approaches were covered; the semi-supervised approach was introduced in Morpho Challenge 2010.
Aim: To develop language-independent algorithms that discover morphemes from text material. Morpheme: the smallest grammatical unit in a language. To promote research in machine learning and NLP.
Evaluation tasks & languages # From: Mikko Kurimo, Sami Virpioja, Ville Turunen, Krista Lagus, Morpho Challenge: Evaluations and Results.
Word Segmentation: In 2005: segment the text into morphemes. In 2007: locate the surface forms (word segmentation), and identify which surface forms are allomorphs of the same underlying morpheme.
Principles for segmentation: 1. The evaluation is based on a subset of the word forms given as training data. 2. The frequency of a word form plays no role in the evaluation. 3. The evaluation score is the balanced F-measure, the harmonic mean of precision and recall. 4. If the linguistic gold standard has several alternative analyses for one word, it is enough for full precision that one of the alternatives is equivalent to the proposed analysis.
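The scoring principles above can be illustrated with a minimal sketch. It assumes a boundary-based comparison in which each word form counts once, and it assumes that a word with several gold analyses is scored against the alternative that agrees best with the proposal. The function names and the toy data are hypothetical and are not taken from the official evaluation scripts.

```python
def boundaries(morphs):
    """Return the set of internal boundary positions of a segmented word."""
    positions, offset = set(), 0
    for morph in morphs[:-1]:
        offset += len(morph)
        positions.add(offset)
    return positions


def f_measure(proposed, gold):
    """proposed: word -> morph list; gold: word -> list of alternative morph lists."""
    hits = n_proposed = n_gold = 0
    for word, alternatives in gold.items():  # every word form counts once
        pred = boundaries(proposed[word])
        # Assumption: use the gold alternative with the most matching boundaries,
        # breaking ties in favour of the alternative with fewer boundaries.
        best = max((boundaries(alt) for alt in alternatives),
                   key=lambda ref: (len(pred & ref), -len(ref)))
        hits += len(pred & best)
        n_proposed += len(pred)
        n_gold += len(best)
    precision = hits / n_proposed if n_proposed else 0.0
    recall = hits / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Balanced F-measure: harmonic mean of precision and recall.
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): "lefthanded" has two acceptable gold analyses.
gold = {
    "hands": [["hand", "s"]],
    "handful": [["hand", "ful"]],
    "lefthanded": [["left", "hand", "ed"], ["left", "handed"]],
}
proposed = {
    "hands": ["hand", "s"],
    "handful": ["han", "dful"],
    "lefthanded": ["left", "handed"],
}
precision, recall, f_score = f_measure(proposed, gold)
```

In this toy run two of the three proposed boundaries are correct, so precision, recall and F-measure all come out at two thirds.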
Information retrieval: The algorithms were tested by using their morpheme segmentations for text retrieval. Conventionally, a stemming algorithm is used to reduce inflected words to base words. Problem: stemming is language-specific. Challenges: choosing the correct weighting method; the number of queries was limited.
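As a rough illustration of retrieval over morphemes instead of whole words, the sketch below indexes documents by morphs and ranks them by plain term overlap. The `SEGMENTATION` table is a hypothetical stand-in for the output of a morpheme analysis algorithm, and the unweighted overlap score is an assumption made for brevity; it is not the retrieval setup used in the competition.

```python
from collections import Counter

# Hypothetical segmentations, standing in for the output of an analysis algorithm.
SEGMENTATION = {
    "hands": ["hand", "s"],
    "handful": ["hand", "ful"],
    "lefthanded": ["left", "hand", "ed"],
}


def to_terms(text):
    """Replace every word by its morphs; unknown words are kept whole."""
    terms = []
    for word in text.lower().split():
        terms.extend(SEGMENTATION.get(word, [word]))
    return terms


def rank(query, documents):
    """Rank document ids by the number of index terms shared with the query."""
    query_terms = Counter(to_terms(query))
    scored = []
    for doc_id, text in documents.items():
        doc_terms = Counter(to_terms(text))
        score = sum(min(query_terms[t], doc_terms[t]) for t in query_terms)
        if score > 0:
            scored.append((-score, doc_id))
    return [doc_id for _, doc_id in sorted(scored)]


documents = {
    "d1": "a handful of coins",
    "d2": "lefthanded players",
    "d3": "no match here",
}
ranking = rank("hands", documents)  # ["d1", "d2"]
```

With whole-word terms the query "hands" would match none of these documents; with morphs it reaches both documents that contain "hand". The same example hints at the weighting challenge: a short, frequent morph such as "s" would match many unrelated documents unless it is down-weighted.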
Machine translation: Two stages: alignment of parallel sentences in both languages, and training a language model. In Morpho Challenge 2009 the focus was on the alignment problem.
Some Algorithms: Bernhard (Bernhard, 2006): Best in the linguistic evaluation for Finnish, English and German. First, a list of prefixes and suffixes is extracted. Candidate segmentations are generated using this list. The best segmentation is selected on the basis of a cost function.
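The three steps (affix list, candidate segmentations, cost-based selection) can be sketched as follows. This is only an illustration of the overall pipeline: the affix lists are hard-coded rather than extracted, and the cost function (prefer a stem that is already known, then prefer fewer pieces) is a hypothetical placeholder, not the cost function of Bernhard (2006).

```python
PREFIXES = ["un", "re"]        # hypothetical stand-in for the extracted prefix list
SUFFIXES = ["s", "ed", "ful"]  # hypothetical stand-in for the extracted suffix list


def candidates(word):
    """Generate (prefix, stem, suffix) splits with at most one affix per side."""
    out = []
    for prefix in [""] + [p for p in PREFIXES if word.startswith(p)]:
        rest = word[len(prefix):]
        for suffix in [""] + [s for s in SUFFIXES if rest.endswith(s)]:
            stem = rest[:len(rest) - len(suffix)]
            if stem:  # never strip the whole word away
                out.append((prefix, stem, suffix))
    return out


def cost(candidate, known_stems):
    """Placeholder cost: an unknown stem is penalised first, extra pieces second."""
    prefix, stem, suffix = candidate
    pieces = sum(1 for part in candidate if part)
    return (0 if stem in known_stems else 1, pieces)


def segment(word, known_stems):
    """Select the candidate segmentation with the lowest cost."""
    best = min(candidates(word), key=lambda c: cost(c, known_stems))
    return [part for part in best if part]


known = {"hand"}
print(segment("hands", known))     # ['hand', 's']
print(segment("unhanded", known))  # ['un', 'hand', 'ed']
print(segment("bus", known))       # ['bus']
```

The last call shows why a cost function is needed: "bus" ends in a listed suffix, but stripping it yields no known stem, so the unsegmented form is cheaper.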
Some Algorithms: Morfessor algorithm: Aims to discover the most basic and compact description of the data. Substrings that occur frequently in the training set are considered as morphemes. Ex. hand, hand+s, hand+ful, left+hand+ed. Gives better results than other algorithms for Finnish and Turkish. # From: Morfessor in the Morpho Challenge (2006) by Mathias Creutz, Krista Lagus
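The idea of a compact description can be made concrete with a simplified two-part cost: the cost of spelling out each distinct morph once, plus the cost of writing the corpus as a sequence of morph tokens. The sketch below is an assumption-laden illustration, not Morfessor's actual model; the function name and the flat `bits_per_letter` value are hypothetical.

```python
import math
from collections import Counter


def description_length(segmented_words, bits_per_letter=5.0):
    """Simplified two-part cost (in bits) of a segmented word list."""
    tokens = [morph for morphs in segmented_words for morph in morphs]
    counts = Counter(tokens)
    total = len(tokens)
    # Lexicon cost: spell out every distinct morph once, plus an end marker.
    lexicon_cost = sum((len(morph) + 1) * bits_per_letter for morph in counts)
    # Corpus cost: each token costs -log2 of its morph's relative frequency.
    corpus_cost = -sum(c * math.log2(c / total) for c in counts.values())
    return lexicon_cost + corpus_cost


# The words from the example above, stored whole and stored as morphs.
whole = [["hand"], ["hands"], ["handful"], ["lefthanded"]]
split = [["hand"], ["hand", "s"], ["hand", "ful"], ["left", "hand", "ed"]]

cost_whole = description_length(whole)
cost_split = description_length(split)
```

Under these assumptions the segmented version is cheaper, because the frequent substring "hand" is stored once in the lexicon and reused in all four words.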
Result: Morpho Challenge 2010. S = semi-supervised algorithm; P = unsupervised algorithm with supervised parameter tuning. # From
Open Challenges: What is the best analysis algorithm? What is the meaning of the morphemes? How to evaluate the alternative analyses? How to improve the analysis using context? How to effectively apply semi-supervised learning?
References: Mikko Kurimo, Sami Virpioja, Ville Turunen, Krista Lagus. Morpho Challenge: Evaluations and Results. Proceedings of the 11th Meeting of the ACL Special Interest Group on Computational Morphology and Phonology. Mathias Creutz and Krista Lagus. Morfessor in the Morpho Challenge. Proceedings of the PASCAL Challenge Workshop on Unsupervised Segmentation of Words into Morphemes. Official site of Morpho Challenge. Wikipedia.
Thank You