ParaMor: Finding Paradigms Across Morphology
Christian Monson, Carnegie Mellon

Turkish Morphology – Beads on a String
götürülmüyorsun = götür +ül +m +üyor +sun
take +passive +negative +present progressive +2nd person singular
"You are not being taken"

Applications of Computational Morphology
Machine Translation
– Turkish-English (Oflazer, 2007)
– Czech-English (Goldwater and McClosky, 2005)
Speech Recognition
– Finnish (Creutz, 2006)
Information Retrieval

Challenges of Computational Morphology
Time consuming for a new language
– Kemal Oflazer estimates 3-4 months to build a basic Turkish analyzer, plus lexicon development and maintenance
Expertise needed
– Greenlandic: official language of Greenland; an agglutinative Inuit language with 50,000 speakers
– Per Langaard

The Solution
Raw text → unsupervised morphology induction

ParaMor – Paradigm Morphology
Outline: ParaMor (Identify: Search, Cluster, Filter; Segment), Evaluation, Results
ParaMor
– Unsupervised morphology induction system
Paradigm
– The natural structure of morphology

Paradigms – The Structure of Morphology
Stem | Voice | Polarity | Tense & Mood | Person & Number
götür | ül | m | üyor / yecek | um / sun / Ø / uz
take | passive | negative | present progressive / future | 1st person singular / 2nd person singular / 3rd person singular / 1st person plural
Each slot is filled from a paradigm.
Paradigm
– Set of mutually replaceable strings (e.g. um, sun, Ø, uz in the Person & Number slot)
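To make the slot structure concrete, here is a minimal Python sketch that treats each slot of the negative passive present progressive of götür as a set of mutually replaceable strings and concatenates one choice per slot. It assumes plain concatenation with Ø written as the empty string, and it leaves out the future suffix yecek because the negative marker does not surface as a bare m before it; real Turkish also needs vowel harmony rules, which this sketch ignores.

```python
from itertools import product

# Slots of the verb template; each slot is a paradigm, i.e. a set of
# mutually replaceable strings. "" stands for Ø.
SLOTS = [
    ("Stem", ["götür"]),                           # take
    ("Voice", ["ül"]),                             # passive
    ("Polarity", ["m"]),                           # negative
    ("Tense & Mood", ["üyor"]),                    # present progressive
    ("Person & Number", ["um", "sun", "", "uz"]),  # 1sg, 2sg, 3sg (Ø), 1pl
]

def generate_forms(slots):
    """Concatenate one filler per slot, beads-on-a-string style."""
    fillers = [options for _, options in slots]
    return ["".join(choice) for choice in product(*fillers)]

forms = generate_forms(SLOTS)
for form in forms:
    print(form)
# götürülmüyorum, götürülmüyorsun, götürülmüyor, götürülmüyoruz
```

The four forms differ only in the Person & Number slot, which is the sense in which um, sun, Ø and uz are mutually replaceable.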

The ParaMor Algorithm
Identify suffix paradigms in 3 steps
1. Search for candidate paradigms
2. Cluster candidates modeling the same paradigm
3. Filter
Segment words
– Using the discovered paradigms

Search for Candidate Paradigms
All character boundaries are candidate morpheme boundaries
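Treating every character boundary as a possible morpheme boundary means a word of length n yields n candidate stem/suffix splits (keeping the stem non-empty and allowing an empty suffix, Ø). A minimal sketch, assuming the input is a plain list of word types; `candidate_splits` and `index_suffixes` are hypothetical names:

```python
from collections import defaultdict

def candidate_splits(word):
    """Every character boundary is a candidate morpheme boundary.
    Yields (stem, suffix) pairs with a non-empty stem; suffix "" is Ø."""
    for i in range(1, len(word) + 1):
        yield word[:i], word[i:]

def index_suffixes(words):
    """Map each candidate suffix to the set of stems it attaches to."""
    stems_by_suffix = defaultdict(set)
    for word in set(words):
        for stem, suffix in candidate_splits(word):
            stems_by_suffix[suffix].add(stem)
    return stems_by_suffix

words = ["costa", "costas", "valla", "vallas"]
index = index_suffixes(words)
print(sorted(index["s"]))   # stems seen with word-final s
print(sorted(index[""]))    # stems seen with Ø
```

Stems that appear under both "" and "s" (costa, valla) are the kind of evidence the search step counts.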

Search for Candidate Paradigms (Spanish)
Begin search with the most frequent word-final string, here s (autorizaciones, buscabamos, costas, importadoras, vallas, …)
Identify the most frequent mutually replaceable string
– Stems that occur with one suffix in a paradigm will likely occur with other suffixes in that paradigm
– s → Ø s (5501 stems)
Stop adding suffixes
– When the most frequent mutually replaceable string severely decreases the stem count: Ø s (5501) → Ø r s (287)
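The search loop can be sketched as a greedy climb: start from one word-final string, repeatedly add the suffix that keeps the most stems, and stop once the stem count drops too sharply. This is a simplified illustration, not ParaMor's exact procedure. The stopping rule `MIN_STEM_RATIO = 0.25` is an assumed value, and the input is assumed to be a plain dict mapping each suffix to the set of stems it attaches to.

```python
MIN_STEM_RATIO = 0.25  # hypothetical stopping threshold

def search_from(start, stems_by_suffix, min_ratio=MIN_STEM_RATIO):
    """Greedily grow a candidate paradigm starting from suffix `start`.

    Returns the accepted steps as (suffix set, stem set) pairs.
    """
    suffixes = {start}
    stems = set(stems_by_suffix.get(start, set()))
    path = [(frozenset(suffixes), frozenset(stems))]
    while True:
        # Find the single added suffix that keeps the most stems.
        best_suffix, best_stems = None, set()
        for suffix, attached in stems_by_suffix.items():
            if suffix in suffixes:
                continue
            shared = stems & attached
            if len(shared) > len(best_stems):
                best_suffix, best_stems = suffix, shared
        # Stop when nothing is shared or the stem count drops too sharply.
        if best_suffix is None or len(best_stems) < min_ratio * len(stems):
            return path
        suffixes.add(best_suffix)
        stems = best_stems
        path.append((frozenset(suffixes), frozenset(stems)))

# Toy index: "" stands for Ø.
index = {
    "s": {"costa", "valla", "casa", "perro", "gato"},
    "": {"costa", "valla", "casa", "perro", "gato", "mar"},
    "r": {"costa"},
}
path = search_from("s", index)
for suffixes, stems in path:
    print(sorted(suffixes), len(stems))
```

Under this assumed threshold, the Spanish counts on the slide would also stop the climb at Ø s: adding r keeps 287 of 5501 stems, roughly 5%.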

Search for Candidate Paradigms (Spanish)
Move on to the next most frequent word-final string. Search paths, with stem counts:
s → Ø s (5501) → Ø r s (287)
a (8981) → a o (2304) → a o os (1410) → a as o os (892)
n (6051) → Ø n (1874) → Ø n r (509) → Ø do n r (354) → … → Ø da das do dos n ndo r ron (118)
es (2751) → Ø es (874)
an (1786) → a an (1049) → a an ar (413) → a an ar ó (353) → … → a ada adas ado ados an ar aron ó (149)
rado (167) → rada rado (89) → rada rado rados (67) → rada radas rado rados (53) → … → ra rada radas rado rados ran rar raron ró (23)
strado (15) → strada strado (12) → strada strado stró (9) → strada strado strar stró (8) → strada stradas strado strar stró (7)

Cluster Candidates per Paradigm
Candidate 1 – 15 suffixes: a aba ada adas ado ados an ando ar aron arse ará arán aría ó; 22 stems: anunci, aplic, apoy, celebr, concentr, …; 330 covered types
Candidate 2 – 15 suffixes: a aba ada adas ado ados an ando ar ara aron arse ará arán ó; 23 stems: anunci, apoy, confirm, consider, declar, …; 345 covered types
Candidate 3 – 15 suffixes: a aba aban ada adas ado ados an ando ar aron arse ará arán ó; 25 stems: anunci, aplic, apoy, celebr, consider, …; 375 covered types
Merge by cosine similarity of covered types
– Candidates 1 + 2 → 16 suffixes: a aba ada adas ado ados an ando ar ara aron arse ará arán aría ó
– Adding candidate 3 → 17 suffixes: a aba aban ada adas ado ados an ando ar ara aron arse ará arán aría ó
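Clustering compares candidates by the word types they cover. The slide counts are consistent with covered types being every stem+suffix concatenation (15 × 22 = 330). A minimal sketch, assuming that definition, cosine similarity over sets viewed as binary vectors, and a merge that simply unions suffixes and stems; the full stem lists are elided on the slides, so only the stems shown are used, and no merge threshold is implied.

```python
from math import sqrt

def covered_types(suffixes, stems):
    """Word types a candidate accounts for: every stem + suffix."""
    return {stem + suffix for stem in stems for suffix in suffixes}

def cosine(a, b):
    """Cosine similarity of two sets viewed as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

def merge(c1, c2):
    """Merge two candidates by taking the union of suffixes and stems."""
    return (c1[0] | c2[0], c1[1] | c2[1])

# Candidates 1 and 2 from the slide, restricted to the stems listed there.
A = (set("a aba ada adas ado ados an ando ar aron arse ará arán aría ó".split()),
     {"anunci", "aplic", "apoy", "celebr", "concentr"})
B = (set("a aba ada adas ado ados an ando ar ara aron arse ará arán ó".split()),
     {"anunci", "apoy", "confirm", "consider", "declar"})

sim = cosine(covered_types(*A), covered_types(*B))
merged = merge(A, B)
print(round(sim, 3), len(merged[0]))  # 0.373 16
```

The merged suffix set has the same 16 suffixes as the first merge on the slide.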

Filter Candidate Paradigms
2 types of filtering
1. Remove small unclustered candidate paradigms
2. Remove candidates modeling unlikely morpheme boundaries (Harris, 1955)
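The second filter cites Harris (1955), whose letter successor variety counts how many distinct characters can follow a given prefix in the vocabulary; morpheme boundaries tend to fall where that count is high. The sketch below computes only that statistic, assuming the end of a word counts as one successor. How ParaMor turns it into a filtering decision is not shown on the slide, so no threshold is implied.

```python
END = "#"  # marker for "the word ends here"

def successor_variety(prefix, words):
    """Number of distinct characters (or word end) that follow `prefix`."""
    followers = set()
    for word in set(words):
        if word.startswith(prefix):
            followers.add(word[len(prefix)] if len(word) > len(prefix) else END)
    return len(followers)

words = ["canta", "cantas", "cantan", "cantar", "canto", "cama"]
for prefix in ["ca", "can", "cant", "canta"]:
    print(prefix, successor_variety(prefix, words))
```

Variety rises after canta (word end, s, n, r), the kind of position the Harris criterion treats as a likely boundary.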

Segment Words Using Paradigms
administradas
– Paradigm a ada adas ado ados an ar aron ó …, supported by administrada → administr +adas
– Paradigm a as o os, supported by administrada → administrad +as
– Paradigm Ø s, supported by administrada → administrada +s
Old way: separate alternative analyses
– administr +adas, administrad +as, administrada +s
New way: augment the current segmentation
– administr +ad +a +s
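The "new way" can be sketched as collecting every boundary that some paradigm licenses and splitting the word at all of them at once, which reproduces administr +ad +a +s. The licensing rule used here, that the stem must also occur in the vocabulary with at least one other suffix of the same paradigm, is an assumption for illustration; `segment` and its arguments are hypothetical names.

```python
def segment(word, paradigms, vocabulary):
    """Split `word` at every boundary licensed by a paradigm.

    A paradigm licenses a boundary when the word ends in one of its
    suffixes and the remaining stem also appears in the vocabulary with
    another suffix of that paradigm. "" stands for Ø.
    """
    boundaries = set()
    for suffixes in paradigms:
        for suffix in suffixes:
            if not word.endswith(suffix) or len(suffix) >= len(word):
                continue
            stem = word[: len(word) - len(suffix)]
            if stem == word:
                continue  # a Ø suffix would put the boundary at the word end
            if any(stem + other in vocabulary for other in suffixes if other != suffix):
                boundaries.add(len(stem))
    cuts = [0] + sorted(boundaries) + [len(word)]
    return " +".join(word[i:j] for i, j in zip(cuts, cuts[1:]))

paradigms = [
    {"a", "ada", "adas", "ado", "ados", "an", "ar", "aron", "ó"},
    {"a", "as", "o", "os"},
    {"", "s"},
]
vocabulary = {"administradas", "administrada"}
print(segment("administradas", paradigms, vocabulary))  # administr +ad +a +s
print(segment("administrada", paradigms, vocabulary))   # administr +ad +a
```

Each paradigm contributes one boundary, and combining them yields a single segmentation instead of three competing analyses.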

Morpho Challenge 2007
Peer-operated competition
– For unsupervised morphology induction algorithms
4 languages
– English
– German
– Finnish
– Turkish

ParaMor in Morpho Challenge 2007
Developed on Spanish
– ParaMor's free parameters were frozen (not retuned for the challenge languages)

2 Methods of Evaluation
1. Linguistic
– Segmentations compared to a morphologically analyzed lexicon
Word | Analysis | Answer
administradas | administr +ad +a +s | administrar +Adj +Fem +Pl
administrada | administr +ad +a | administrar +Adj +Fem
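Because the answer key uses labels (administrar, +Adj, +Fem) rather than the strings a segmenter produces, the comparison can only ask which words share morphemes, not whether the morphemes match. A simplified sketch in the style of a word-pair measure, assuming precision and recall are computed over all word pairs rather than a sample; the answer for costas is made up for illustration only.

```python
from itertools import combinations

def shares_morpheme(a, b):
    """True if two analyses have at least one morpheme in common."""
    return bool(set(a) & set(b))

def pair_scores(proposed, gold):
    """Precision, recall and F1 over all word pairs (simplified)."""
    pairs = list(combinations(sorted(proposed), 2))
    prop_pairs = [(a, b) for a, b in pairs if shares_morpheme(proposed[a], proposed[b])]
    gold_pairs = [(a, b) for a, b in pairs if shares_morpheme(gold[a], gold[b])]
    # Precision: proposed sharing pairs that the answer key also links.
    precision = (sum(shares_morpheme(gold[a], gold[b]) for a, b in prop_pairs)
                 / len(prop_pairs)) if prop_pairs else 0.0
    # Recall: answer-key sharing pairs that the proposed analyses also link.
    recall = (sum(shares_morpheme(proposed[a], proposed[b]) for a, b in gold_pairs)
              / len(gold_pairs)) if gold_pairs else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

proposed = {
    "administradas": ["administr", "ad", "a", "s"],
    "administrada": ["administr", "ad", "a"],
    "costas": ["cost", "a", "s"],
}
gold = {
    "administradas": ["administrar", "+Adj", "+Fem", "+Pl"],
    "administrada": ["administrar", "+Adj", "+Fem"],
    "costas": ["costa", "+Pl"],  # hypothetical answer, for illustration only
}
p, r, f = pair_scores(proposed, gold)
print(round(p, 3), r, round(f, 3))
```

Here the proposed analyses let administrada and costas share the string a, which the answer key does not support, so precision falls to 2/3 while recall stays at 1.0.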

2 Methods of Evaluation
2. Task-based: information retrieval
– Short two-sentence queries
– About international news topics
– Binary relevance assessments
– About 50 queries and 20K relevance judgements for each language

Linguistic Evaluation (F1)
[Bar charts: F1 of Bernhard 2, Morfessor, ParaMor, and ParaMor & Morfessor in each Morpho Challenge 2007 language; legible values include 47.2, 50.6, 50.7 and 60.8.]

IR Evaluation (TF/IDF, Average Precision)
[Bar charts: average precision for McNamee, Morfessor Baseline, Morfessor, ParaMor, and ParaMor & Morfessor, against a reference line for no morphological analysis; legible values include 27.0 and 32.0.]
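Average precision, the IR measure reported here, averages the precision observed at the rank of each relevant document retrieved, dividing by the number of relevant documents for the query. A minimal sketch with binary relevance, matching the assessments described above; the ranked list is invented for illustration, and the TF/IDF indexing of morphemes that would produce real rankings is not shown.

```python
def average_precision(ranked_relevance, total_relevant):
    """Uninterpolated average precision for one query.

    ranked_relevance: booleans, True where the document at that rank is
    relevant. total_relevant: number of relevant documents for the query,
    whether retrieved or not.
    """
    if total_relevant == 0:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_relevant

def mean_average_precision(runs):
    """Mean of per-query average precision over (ranking, total) pairs."""
    return sum(average_precision(r, n) for r, n in runs) / len(runs)

ap = average_precision([True, False, True, False], total_relevant=2)
print(round(ap, 4))  # (1/1 + 2/3) / 2
```

Relevant documents that are never retrieved still count in the denominator, so a system cannot raise its score by returning fewer results.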

ParaMor: State-of-the-Art Unsupervised Morphology Induction System
Combined system (ParaMor & Morfessor) among the best in Morpho Challenge 2007
Consistent across languages
Better than no morphology
– On a task-based (IR) measure

Many Future Directions
Improve performance
– An F1 of 50-60% is state of the art!
– Inflection classes
– Morphophonology
Beyond beads-on-a-string

Thank You!