Morpho Challenge competition 2005-2010: Evaluations and results. Authors: Mikko Kurimo, Sami Virpioja, Ville Turunen, Krista Lagus.



Introduction
Started in 2005.
Open to all.
The organizers selected the evaluation tasks, data and metrics, and performed all the evaluations.
Both unsupervised and semi-supervised approaches were covered; the semi-supervised approach was introduced in Morpho Challenge 2010.

Aim
To develop language-independent algorithms that discover morphemes from text material.
Morpheme: the smallest grammatical unit in a language.
To promote research in machine learning and NLP.

Evaluation tasks & languages
Source: Mikko Kurimo, Sami Virpioja, Ville Turunen, Krista Lagus. Morpho Challenge competition 2005-2010: Evaluations and results.

Word Segmentation
In 2005: segment the text into morphemes.
In 2007: locate the surface forms (word segmentation), and identify which surface forms are allomorphs of the same underlying morpheme.

Principles for segmentation
1. The evaluation is based on a subset of the word forms given as training data.
2. The frequency of a word form plays no role in the evaluation.
3. The evaluation score is the balanced F-measure, the harmonic mean of precision and recall.
4. If the linguistic gold standard has several alternative analyses for one word, it is enough for full precision that one of the alternatives is equivalent to the proposed analysis.
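The principles above can be sketched in a few lines of Python. This is a minimal illustration, not the organizers' evaluation script: it assumes segmentations are written with "+" between morphemes, scores morpheme boundaries, counts every word form once regardless of frequency, and, when the gold standard offers alternatives, scores against the one that agrees best with the proposal. The function names and the convention of returning 1.0 for an empty denominator are assumptions.

```python
def boundaries(segmentation):
    """Return the character offsets at which a morpheme boundary falls."""
    cuts, pos = set(), 0
    for morph in segmentation.split("+")[:-1]:
        pos += len(morph)
        cuts.add(pos)
    return cuts


def f_measure(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(proposed, gold):
    """proposed: word -> segmentation; gold: word -> list of alternative segmentations."""
    hits = n_proposed = n_gold = 0
    for word, alternatives in gold.items():
        # A word missing from the proposal is treated as unsegmented.
        pred = boundaries(proposed.get(word, word))
        # Use the gold alternative that matches the proposal best
        # (most shared boundaries, then fewest gold boundaries).
        best = max(
            (boundaries(a) for a in alternatives),
            key=lambda ref: (len(pred & ref), -len(ref)),
        )
        hits += len(pred & best)
        n_proposed += len(pred)
        n_gold += len(best)
    # Assumed convention: an empty denominator counts as a perfect score.
    precision = hits / n_proposed if n_proposed else 1.0
    recall = hits / n_gold if n_gold else 1.0
    return precision, recall, f_measure(precision, recall)


proposed = {"hands": "hand+s", "handful": "hand+ful", "lefthanded": "left+handed"}
gold = {
    "hands": ["hand+s"],
    "handful": ["hand+ful"],
    "lefthanded": ["left+hand+ed"],
}
precision, recall, f_score = evaluate(proposed, gold)
```

In this toy example every proposed boundary is correct but one gold boundary is missed, so precision is higher than recall and the F-measure falls between the two.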

Information retrieval
The algorithms were tested by using their morpheme segmentations for text retrieval.
A stemming algorithm is normally used to reduce inflected words to base words.
Problem: stemming algorithms are language specific.
Challenges:
Choosing the correct weighting method.
The number of queries was limited.
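As a rough illustration of using morpheme segmentations for retrieval, the sketch below indexes documents and queries by morphemes instead of whole words and ranks documents with a plain TF-IDF weight. The weighting scheme, the `rank` function and the toy segmentation table are all assumptions made for the example; they are not the setup used in the competition.

```python
import math
from collections import Counter

# Hypothetical segmentation table standing in for an algorithm's output.
TOY_SEGMENTS = {"hands": ["hand", "s"], "handful": ["hand", "ful"]}


def segment_into_morphs(word):
    """Look the word up in the toy table; unknown words stay whole."""
    return TOY_SEGMENTS.get(word, [word])


def keep_whole_words(word):
    """Baseline: no morphological analysis at all."""
    return [word]


def to_terms(text, segment):
    """Lower-case the text, split on whitespace, and expand words into index terms."""
    return [term for word in text.lower().split() for term in segment(word)]


def rank(query, documents, segment):
    """Return (document index, score) pairs with a positive TF-IDF score, best first."""
    doc_terms = [Counter(to_terms(doc, segment)) for doc in documents]
    n_docs = len(documents)
    doc_freq = Counter(term for terms in doc_terms for term in terms)
    scored = []
    for index, terms in enumerate(doc_terms):
        score = sum(
            terms[t] * math.log(n_docs / doc_freq[t])
            for t in to_terms(query, segment)
            if t in terms
        )
        if score > 0:
            scored.append((index, score))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


documents = ["a handful of sand", "two hands", "the sea"]
with_morphs = rank("hand", documents, segment_into_morphs)
with_words = rank("hand", documents, keep_whole_words)
```

With whole words as index terms the query "hand" matches nothing, while the morpheme index retrieves both documents that contain an inflected or derived form of it.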

Machine translation
Two stages:
Alignment of the parallel sentences in both languages.
Training a language model.
In Morpho Challenge 2009 the focus was on the alignment problem.

Some Algorithms
Bernhard (Bernhard, 2006): best in the linguistic evaluation for Finnish, English and German.
First, a list of prefixes and suffixes is extracted.
Candidate segmentations are generated using this list.
The best segmentation is selected on the basis of a cost function.
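The generate-then-select idea can be sketched as follows. This is not Bernhard's algorithm: the affix lists, the morph counts and the cost function (negative log relative frequency, with a per-character penalty for unseen morphs) are hypothetical values chosen only to show how a cost function picks one candidate among several.

```python
import math

# Hypothetical affix lists and morph counts, for illustration only.
PREFIXES = ("un", "re")
SUFFIXES = ("s", "ed", "ful")
MORPH_COUNTS = {"hand": 50, "s": 200, "ed": 120, "ful": 30, "un": 40, "re": 40}
TOTAL = sum(MORPH_COUNTS.values())
UNSEEN_BITS_PER_CHAR = 5.0  # assumed penalty for morphs missing from the counts


def candidates(word):
    """Yield every split of word into listed prefixes, a stem, and listed suffixes."""

    def strip_prefixes(rest):
        yield [], rest
        for prefix in PREFIXES:
            if rest.startswith(prefix) and len(rest) > len(prefix):
                for more, stem in strip_prefixes(rest[len(prefix):]):
                    yield [prefix] + more, stem

    def strip_suffixes(rest):
        yield rest, []
        for suffix in SUFFIXES:
            if rest.endswith(suffix) and len(rest) > len(suffix):
                for stem, more in strip_suffixes(rest[:-len(suffix)]):
                    yield stem, more + [suffix]

    for prefixes, rest in strip_prefixes(word):
        for stem, suffixes in strip_suffixes(rest):
            yield prefixes + [stem] + suffixes


def cost(segmentation):
    """Sum an assumed per-morph cost in bits; frequent morphs are cheap."""
    bits = 0.0
    for morph in segmentation:
        if morph in MORPH_COUNTS:
            bits += -math.log2(MORPH_COUNTS[morph] / TOTAL)
        else:
            bits += UNSEEN_BITS_PER_CHAR * len(morph)
    return bits


def best_segmentation(word):
    """Return the candidate segmentation with the lowest cost."""
    return min(candidates(word), key=cost)
```

For example, `best_segmentation("unhanded")` compares candidates such as `["unhanded"]`, `["unhand", "ed"]` and `["un", "hand", "ed"]`, and the last one wins because all of its parts are frequent in the toy counts.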

Some Algorithms
Morfessor algorithm: aims to discover the most basic and compact description of the data.
Substrings that occur frequently in the training set are also considered as morphemes.
Example: hand, hand+s, hand+ful, left+hand+ed.
Gives better results than the other algorithms for Finnish and Turkish.
Source: Mathias Creutz and Krista Lagus, Morfessor in the Morpho Challenge (2006).
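The notion of a compact description can be made concrete with a simplified two-part cost: the bits needed to spell out the morph lexicon plus the bits needed to encode the corpus as a sequence of morphs. The sketch below is a toy version under assumed constants, not the actual Morfessor cost function; `BITS_PER_LETTER` and `description_length` are names introduced here for illustration.

```python
import math
from collections import Counter

BITS_PER_LETTER = 5.0  # assumed flat cost of spelling one letter in the lexicon


def description_length(segmented_corpus):
    """Return lexicon cost plus corpus cost, in bits, for a list of segmented words."""
    tokens = [morph for word in segmented_corpus for morph in word]
    counts = Counter(tokens)
    total = len(tokens)
    # Lexicon: every distinct morph is spelled out once, plus one end marker.
    lexicon_bits = sum(BITS_PER_LETTER * (len(morph) + 1) for morph in counts)
    # Corpus: every morph token is encoded with -log2 of its relative frequency.
    corpus_bits = -sum(c * math.log2(c / total) for c in counts.values())
    return lexicon_bits + corpus_bits


unsegmented = [["hand"], ["hands"], ["handful"], ["lefthanded"]]
segmented = [["hand"], ["hand", "s"], ["hand", "ful"], ["left", "hand", "ed"]]

cost_unsegmented = description_length(unsegmented)
cost_segmented = description_length(segmented)
```

On the slide's example words the segmented analysis is cheaper, because the frequent substring "hand" is stored once in the lexicon and reused, which is the intuition behind treating frequent substrings as morphemes.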

Results: Morpho Challenge 2010
S = semi-supervised algorithm
P = unsupervised algorithm with supervised parameter tuning

Open Challenges
What is the best analysis algorithm?
What is the meaning of the morphemes?
How to evaluate the alternative analyses?
How to improve the analysis using context?
How to effectively apply semi-supervised learning?

References
Mikko Kurimo, Sami Virpioja, Ville Turunen, Krista Lagus. Morpho Challenge competition 2005-2010: Evaluations and results. Proceedings of the 11th Meeting of the ACL Special Interest Group on Computational Morphology and Phonology.
Mathias Creutz and Krista Lagus. Morfessor in the Morpho Challenge. Proceedings of the PASCAL Challenge Workshop on Unsupervised Segmentation of Words into Morphemes.
Official site of Morpho Challenge.
Wikipedia.

Thank You