Published by Dinah Robinson. Modified over 9 years ago.
1
ParaMor & Morpho Challenge 2008
Christian Monson
Jaime Carbonell, Alon Lavie, Lori Levin
2
2 Turkish Morphology – Beads on a String
One Turkish word: götür + ül + m + üyor + sun
take + passive + negative + present progressive + 2nd person singular
'You are not being taken'
3
3 Computational Morphology Improves:
Machine Translation: Turkish-English (Oflazer, 2007); Czech-English (Goldwater and McClosky, 2005)
Information Retrieval: English, German, Finnish (Kurimo et al., 2008)
Speech Recognition: Finnish (Creutz, 2006)
Grapheme-to-Phoneme Conversion: German (Demberg, 2007)
4
4 Morphology is Complex – Operations: Prefixation, Suffixation
5
5 Morphology is Complex – Operations: Prefixation, Reduplication, Suffixation
6
6 Morphology is Complex – Operations: Prefixation, Reduplication, Infixation, Suffixation
9
9 Morphology is Complex – Morphophonology
götür + ül + m + üyor + sun
take + passive + negative + present progressive + 2nd person singular
'You are not being taken'
10
10 Morphology is Complex – Morphophonology
götür + ül + m + yecek + sun
take + passive + negative + future + 2nd person singular
'You will not be taken'
12
12 Morphology is Complex – Morphophonology
götür + ül + me + yecek + sun
take + passive + negative + future + 2nd person singular
'You will not be taken'
13
13 Morphology is Complex – Morphophonology
götür + ül + me + yecek + sin
take + passive + negative + future + 2nd person singular
'You will not be taken'
15
15 Morphology is Complex – Ambiguity
Hungarian: mentek
men + tek: go + Present.2nd.Plural, 'yinz go'
16
16 Morphology is Complex – Ambiguity
Hungarian: mentek
men + tek: go + Present.2nd.Plural, 'yinz go'
men + t + ek: go + PastParticiple + Plural, 'those who have gone'
17
17 In Morphology Systems for New Languages Complexity Time + Expertise
18
18 In Morphology Systems for New Languages Complexity Time + Expertise
Expertise: Kemal Oflazer, expert on Turkish computational morphology
Time: 3-4 months to manually build a basic Turkish analyzer, plus lexicon development and maintenance
19
19 The Solution Raw Text Unsupervised Morphology Induction
20
20 The Solution Raw Text ?
21
21 The Solution Raw Text Language Structure
22
22 Techniques for Unsupervised Morphology Induction
Transition Likelihood: Harris (1955) – Finite State Automata; Bernhard (2007)
23
23 Techniques for Unsupervised Morphology Induction
Transition Likelihood: Harris (1955) – Finite State Automata; Bernhard (2007)
Minimum Description Length: Goldsmith (2001, 2006); Creutz's Morfessor (2006)
24
24 Techniques for Unsupervised Morphology Induction
Contextual Similarity: Wicentowski (2002); Schone (2002)
25
25 Techniques for Unsupervised Morphology Induction
Contextual Similarity: Wicentowski (2002); Schone (2002)
The Paradigm: Snover (2002); ParaMor (2007)
26
26 What is a Paradigm?
götür + ül + m + üyor + sun
take + passive + negative + present progressive + 2nd person singular
27
27 Paradigms Structure Inflectional Morphology
götür + ül + m + üyor + sun
take + passive + negative + present progressive + 2nd person singular
Person & Number: sun
28
28 Paradigms Structure Inflectional Morphology
götür + ül + m + üyor (take + passive + negative + present progressive)
Person & Number: um (1st person singular)
29
29 Paradigms Structure Inflectional Morphology
götür + ül + m + üyor (take + passive + negative + present progressive)
Person & Number: um, Ø (3rd person singular)
30
30 Paradigms Structure Inflectional Morphology
götür + ül + m + üyor (take + passive + negative + present progressive)
Person & Number: um, Ø, uz
31
31 Paradigms Structure Inflectional Morphology
götür + ül + m + üyor (take + passive + negative + present progressive)
Person & Number: um, Ø, uz
Paradigm: mutually substitutable morphological operations
32
32 Paradigms Structure Inflectional Morphology
Voice: ül
Polarity: m
Tense & Aspect: üyor, yecek
Person & Number: um, Ø, uz
33
33 Paradigms Structure Inflectional Morphology
Paradigms: ül | m | üyor, yecek | um, Ø, uz
Paradigm: mutually substitutable morphological operations
34
34 The ParaMor Algorithm
Paradigms: ül | m | üyor, yecek | um, Ø, uz
Paradigm: mutually substitutable strings
35
35 The ParaMor Algorithm
Paradigms: ül | m | üyor, yecek | um, Ø, uz
Candidate stems; 1 morpheme boundary
36
36 The ParaMor Algorithm
Simplifying Assumptions
Suffixes only: 70% of the world's languages are suffixing (Dryer, 2005)
Strict concatenation
37
37 The ParaMor Algorithm
Simplifying Assumptions
Suffixes only: 70% of the world's languages are suffixing (Dryer, 2005)
Strict concatenation
Only a high-level overview
38
38 The ParaMor Algorithm
Identify Paradigms in 3 Steps
39
39 The ParaMor Algorithm
Identify Paradigms in 3 Steps
1. Search for candidate paradigms
40
40 The ParaMor Algorithm
Identify Paradigms in 3 Steps
1. Search for candidate paradigms
2. Cluster candidates modeling the same paradigm
41
41 The ParaMor Algorithm
Identify Paradigms in 3 Steps
1. Search for candidate paradigms
2. Cluster candidates modeling the same paradigm
3. Filter least likely candidates
42
42 The ParaMor Algorithm
Identify Paradigms in 3 Steps
1. Search for candidate paradigms
2. Cluster candidates modeling the same paradigm
3. Filter least likely candidates
Segment Words
Using the discovered paradigms
Outline bar: ParaMor: Identify (Search, Cluster, Filter), Segment, Evaluation, Results
43
43 The ParaMor Algorithm (steps as on slide 42)
Label: Today
44
44 The ParaMor Algorithm (steps as on slide 42)
45
45 Search for Candidate Paradigms (Spanish Example)
Word list: 50,000 types
Propose a morpheme boundary at every character boundary in every word
Consolidate identical candidate suffixes into paradigm seeds
Paradigm seed: s (10697 stems), e.g. autorizaciones, buscabamos, costas, importadoras, vallas, …
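The seed-building step on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ParaMor's actual code: the function name `paradigm_seeds`, the toy word list, and the use of "Ø" for the empty suffix are all choices made here.

```python
from collections import defaultdict

def paradigm_seeds(words):
    """Map each candidate suffix to the set of candidate stems it follows.

    Hypothetical helper: every character boundary in every word type is
    treated as a possible morpheme boundary, as the slide describes.
    """
    stems_by_suffix = defaultdict(set)
    for word in set(words):  # work on types, not tokens
        for i in range(1, len(word) + 1):
            stem = word[:i]
            suffix = word[i:] or "Ø"  # the empty suffix is written Ø
            stems_by_suffix[suffix].add(stem)
    return stems_by_suffix

# Toy word list (assumed for illustration; the slide uses 50,000 types).
words = ["costa", "costas", "costar", "valla", "vallas",
         "importadora", "importadoras"]
seeds = paradigm_seeds(words)

# Rank seeds by how many stems they attach to, most frequent first.
ranked = sorted(seeds, key=lambda s: (-len(seeds[s]), s))
```

Each key of `seeds` is one candidate suffix, and the size of its stem set plays the role of the stem counts shown next to each seed on the slide.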
46
46 Search for Candidate Paradigms (Spanish Example)
Identify the most frequent mutually replaceable candidate suffix
Stems that occur with one suffix in a paradigm will likely occur with other suffixes in that paradigm
s (10697) → Ø s (5513)
Words: autorizaciones, buscabamos, costaØ, costas, importadoraØ, importadoras, vallaØ, vallas, …
47
47 Search for Candidate Paradigms
A parameter halts the introduction of suffixes when the most frequent mutually replaceable candidate suffix severely decreases the stem count
s (10697) → Ø s (5513) → Ø r s (281)
Words: autorizaciones, buscabamos, costar, costaØ, costas, importadoraØ, importadoras, vallaØ, vallas, …
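The greedy growth of one seed can be sketched as follows. The slide only says that a parameter halts the search when the stem count drops severely; the exact criterion is not given, so the `min_keep` ratio below, the function name `grow_paradigm`, and the toy data are assumptions made for illustration.

```python
def grow_paradigm(seed, stems_by_suffix, min_keep=0.25):
    """Greedily add the suffix that shares the most stems with the paradigm.

    Stops when the best remaining suffix would keep fewer than
    `min_keep` of the current stems (an assumed halting rule).
    """
    suffixes = [seed]
    stems = set(stems_by_suffix[seed])
    while True:
        best, best_stems = None, set()
        for cand in sorted(stems_by_suffix):  # sorted for determinism
            if cand in suffixes:
                continue
            shared = stems & stems_by_suffix[cand]
            if len(shared) > len(best_stems):
                best, best_stems = cand, shared
        if best is None or len(best_stems) < min_keep * len(stems):
            return sorted(suffixes), stems
        suffixes.append(best)
        stems = best_stems

# Toy suffix-to-stems map (assumed data, not the slide's counts).
toy = {
    "s": {"costa", "valla", "importadora", "autorizacione"},
    "Ø": {"costa", "valla", "importadora", "mesa"},
    "r": {"costa"},
}
suffixes, stems = grow_paradigm("s", toy, min_keep=0.5)
```

With this toy data the search accepts Ø, which keeps 3 of 4 stems, and then stops, because adding r would leave only 1 of 3 stems. That mirrors the drop from 5513 to 281 stems on the slide, although where the real parameter draws the line is not stated.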
48
48 Search for Candidate Paradigms
s (10697) → Ø s (5513) → Ø r s (281)
Parameters set to produce high-recall Spanish paradigms, and then frozen
49
49 Search for Candidate Paradigms
Move on to the next most frequent paradigm seed: a (9020)
Already searched: s (10697) → Ø s (5513) → Ø r s (281)
50
50 Search for Candidate Paradigms
a (9020) → a o (2325) → a o os (1418) → a as o os (899)
s (10697) → Ø s (5513) → Ø r s (281)
51
51 Search for Candidate Paradigms
n (6039) → Ø n (1863) → Ø n r (512) → Ø do n r (357) → Ø da das do dos n ndo r ron (115)
(chains from slide 50 remain on screen)
52
52 Search for Candidate Paradigms
es (2750) → Ø es (845)
(chains from slides 50-51 remain on screen)
53
53 Search for Candidate Paradigms
an (1784) → a an (1045) → a an ar (417) → a an ar ó (355) → a ada adas ado ados an ar aron ó (148)
(chains from slides 50-52 remain on screen)
54
54 Search for Candidate Paradigms
rado (167) → rada rado (89) → rada rado rados (67) → rada radas rado rados (53) → ra rada radas rado rados ran rar raron ró (23)
strado (15) → strada strado (12) → strada strado stró (9) → strada strado strar stró (8) → strada stradas strado strar stró (7) ...
(chains from slides 50-53 remain on screen)
55
55 Search for Candidate Paradigms
Size of search space
Huge: 2^|candidate suffixes|
Most candidate suffixes have no common stems
Still exponential
Greedily searched space: O(|candidate suffixes|)
This example is just 0.1% of the searched space
Chains shown: strado (15) → strada strado (12); rado (167) → rada rado (89); an (1784) → a an (1045); es (2750) → Ø es (845); n (6039) → Ø n (1863); a (9020) → a o (2325); s (10697) → Ø s (5513) → Ø r s (281)
56
56 Step 2: Clustering
Identify Paradigms in 3 Steps
1. Search for candidate paradigms
2. Cluster candidates modeling the same paradigm: bottom-up agglomerative clustering
3. Filter
Segment Words
Using the discovered paradigms
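Bottom-up agglomerative clustering of candidate paradigms can be sketched as below. The slide does not say which similarity measure or stopping rule ParaMor uses, so the Jaccard overlap of suffix sets and the `threshold` value here are stand-in assumptions, as are the function names.

```python
def jaccard(a, b):
    """Overlap of two non-empty suffix sets (assumed similarity measure)."""
    return len(a & b) / len(a | b)

def cluster_paradigms(candidates, threshold=0.5):
    """Repeatedly merge the two most similar clusters until none is similar enough."""
    clusters = [set(c) for c in candidates]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = jaccard(clusters[i], clusters[j])
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        sim, i, j = best
        if sim < threshold:
            break  # remaining clusters are too dissimilar to merge
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters

# Candidate paradigms as suffix sets, loosely based on the Spanish example.
candidates = [{"a", "an", "ar"}, {"a", "an", "ar", "ó"}, {"e", "en", "er"}]
clusters = cluster_paradigms(candidates)
```

Here the two overlapping candidates merge into one cluster while the third stays separate, which is the intended effect of grouping candidates that model the same paradigm.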
57
57 Step 3: Filtering
Identify Paradigms in 3 Steps
1. Search for candidate paradigms
2. Cluster candidates modeling the same paradigm
3. Filter least likely candidates: adapted from Harris (1955) and Goldsmith (2006); improved over 2007 Challenge
Segment Words
Using the discovered paradigms
58
58 A Few of the 42 Final Paradigms
4 suffixes: Ø menente mente s
11 suffixes: a amente as illa illas o or ora oras ores os
41 suffixes: a aba aban acion aciones ación ada adas ado ador adora adoras adores ados amos an ando ante antes ar ara aran aremos arla arlas arlo arlos arme aron arse ará arán aré aría arían ase e en ándose é ó
29 suffixes: e edor edora edoras edores en er erlo erlos erse erá erán ería erían ida idas ido idos iendo iera ieran ieron imiento imientos iéndose ió í ía ían
20 suffixes: ida idas ido idor idores idos imos ir iremos irle irlo irlos irse irá irán iré iría irían ía ían
29 suffixes: ce cedores cemos cen cer cerlo cerlos cerse cerá cerán cería cida cidas cido cidos ciendo ciera cieran cieron cimiento cimientos cimos ció cí cía cían zca zcan zco
6 suffixes: Ø es idad idades mente ísima
59
59 A Few of the 42 Final Paradigms (paradigm table as on slide 58)
Label: Number on Nouns
60
60 A Few of the 42 Final Paradigms (paradigm table as on slide 58)
Label: Number & Gender on Adjectives
61
61 A Few of the 42 Final Paradigms (paradigm table as on slide 58)
Label: Verbal Suffixes
62
62 The ParaMor Algorithm (steps as on slide 42)
Improved over 2007 Challenge
63
63 Segment Words Using the Paradigms (paradigm table as on slide 58)
administradas: 'Feminine gender nouns under administration'
64
64 Segment Words Using the Paradigms (paradigm table as on slide 58)
administr + ad + a + s: Past Participle + Feminine + Plural
65
65 Segment Words Using the Paradigms (paradigm table as on slide 58)
administradas
66
66 Segment Words Using the Paradigms (paradigm table as on slide 58)
administradas, administrada (also in corpus)
67
67 Segment Words Using the Paradigms (paradigm table as on slide 58)
administradas, administrada: Morpheme Boundary
69
69 Segment Words Using the Paradigms (paradigm table as on slide 58)
administradas, administradaØ: Morpheme Boundary
70
70 Segment Words Using the Paradigms (paradigm table as on slide 58)
administr + ad + a + s
Recovers multiple morpheme boundaries from candidate paradigms which each propose single morpheme boundaries
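The segmentation idea on slides 63-70 can be sketched as a suffix-substitution test: a boundary is proposed before a suffix when the same stem also occurs in the corpus with another suffix from the same paradigm. The function `segment`, the three toy paradigms, and the four-word vocabulary are assumptions for illustration; ParaMor's real procedure may differ in detail.

```python
def segment(word, paradigms, vocabulary):
    """Mark a morpheme boundary wherever a paradigm licenses a suffix swap."""
    boundaries = set()
    for paradigm in paradigms:
        for suffix in paradigm:
            ending = "" if suffix == "Ø" else suffix
            if not ending or not word.endswith(ending):
                continue
            stem = word[:len(word) - len(ending)]
            if not stem:
                continue
            # Does the same stem occur with another suffix of this paradigm?
            for other in paradigm:
                if other == suffix:
                    continue
                alternate = stem + ("" if other == "Ø" else other)
                if alternate in vocabulary:
                    boundaries.add(len(stem))
                    break
    pieces, prev = [], 0
    for b in sorted(boundaries):
        pieces.append(word[prev:b])
        prev = b
    pieces.append(word[prev:])
    return " + ".join(pieces)

# Toy paradigms (subsets of the discovered ones) and a toy corpus vocabulary.
paradigms = [
    {"Ø", "s"},
    {"a", "as", "o", "os"},
    {"ada", "adas", "ado", "ados", "ar"},
]
vocabulary = {"administradas", "administrada", "administrado", "administrar"}
result = segment("administradas", paradigms, vocabulary)
```

Each paradigm contributes at most one boundary per matching suffix, and the union of those single boundaries yields the multi-morpheme segmentation shown on the slide.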
71
71 Linguistic Evaluation
[Bar chart, built up over slides 71-88. Y axis: F1 (ticks 10, 30, 50). X axis: English, German, Finnish, Turkish, Arabic. Series: Bernhard, Morfessor, ParaMor, ParaMor & Morfessor; Zeman joins the legend on slide 84 together with the Arabic bars.]
72
72 Linguistic Evaluation
Morfessor: baseline system for the Challenge; freely available; Minimum Description Length
74
74 Linguistic Evaluation
Join ParaMor and Morfessor: for each word, submit 2 analyses, a ParaMor analysis and a Morfessor analysis
The effect: oracle recall, averaged precision
85
85 Linguistic Evaluation
F1 values, read from the bar labels in the order the slides reveal them:
System                English  German  Finnish  Turkish  Arabic
Bernhard / Zeman      60.8     52.9    48.2     24.7     21.9
Morfessor             47.2     47.8    40.6     37.1     34.0
ParaMor               52.8     44.5    39.5     46.5     15.4
ParaMor & Morfessor   56.3     54.1    48.5     52.0     40.9
86
86 Linguistic Evaluation: Sometimes Morfessor wins
87
87 Linguistic Evaluation: Sometimes ParaMor wins
88
88 Linguistic Evaluation: ParaMor and Morfessor are Complementary
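The system combination described on slide 74 amounts to listing both systems' analyses for every word. The sketch below assumes each system's output is a dictionary from word to analysis string; the function names and the sample analyses are hypothetical. The F1 helper is the standard harmonic mean of precision and recall, the metric plotted on these slides.

```python
def join_analyses(paramor, morfessor):
    """For each word, collect the analyses from both systems, in a fixed order."""
    joined = {}
    for word in sorted(set(paramor) | set(morfessor)):
        joined[word] = [system[word]
                        for system in (paramor, morfessor)
                        if word in system]
    return joined

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical analyses for one word from each system.
paramor = {"administradas": "administr +ad +a +s"}
morfessor = {"administradas": "administrada +s"}
joined = join_analyses(paramor, morfessor)
```

Submitting both analyses means a morpheme found by either system can count toward recall, which matches the slide's "oracle recall", while precision is averaged over the two analyses.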
89
89 IR Evaluation
[Bar chart, built up over slides 89-92. Y axis label as extracted: F1 (ticks 25, 35, 45). X axis: English, German, Finnish, Turkish, Arabic; values appear only for English, German, and Finnish. Series: Bernhard, Morfessor, ParaMor, ParaMor & Morfessor.]
System                English  German  Finnish
Bernhard              39.4     47.3    49.2
Morfessor             36.4     46.7    46.8
ParaMor               39.3     36.3    39.7
ParaMor & Morfessor   39.9     47.3    46.7
93
93 ParaMor: State-of-the-Art Unsupervised Morphology Induction System
ParaMor identifies paradigms: the organizing structure of inflectional morphology
Segments words as discovered paradigms suggest
Combined with Morfessor: among the best in Morpho Challenge; consistent across languages
94
94 The Next Steps for ParaMor
Beyond suffixes
Prefixes: straightforward extension to ParaMor
More challenging: reduplication, infixation, etc.
Morphophonology: incorporate contextual information when clustering
Improve system combination: true merging of analyses; combine more systems
95
95 Thank You!