Building and Ordering a SenDiS Lexicon Network
Oana Adriana Şoica

Outline
o SenDiS operates on a specific lexicon network (LexNet): “sense tagged glosses” relations
o lexicon networks obtained from other semantic / lexical relations
o obtaining a SenDiS LexNet:
  - build a “sense tagged glosses” LexNet (manually annotate the lexicon with a specific tool)
  - import a “sense tagged glosses” LexNet (WordNet tagged glosses, as of 2008)
o preprocessing (ordering) the SenDiS LexNet before WSD (word sense disambiguation):
  - truncation of the LexNet
  - leveling of the LexNet

Semantic/Lexical Relations
o hypernyms
o hyponyms
o similar to
o has part
o synonyms
o antonyms
o holonyms
o meronyms
o coordinate terms
o troponyms
o entailment

Semantic/Lexical relations: WordNet
[Figure: an excerpt of the WordNet semantic network*]
* Navigli, R. Word sense disambiguation: A survey. ACM Comput. Surv. 41, 2, Article 10 (2009).

Semantic/Lexical relations: GRAALAN
Tail of relation | Head of relation | Relation type
{synonym} | {synonym} | Bidirectional, symmetric
{antonym} | {antonym} | Bidirectional, symmetric
{paronym} | {paronym} | Bidirectional, symmetric
{hypernym} | {hyponym} | Bidirectional, asymmetric
{connotation} | - | Unidirectional
{holonym} | {meronym} | Bidirectional, asymmetric
{homonym} | {homonym} | Bidirectional, symmetric
{heteronym} | {heteronym} | Bidirectional, symmetric
{homophone} | {homophone} | Bidirectional, symmetric
{diminutive of} | {diminutive by} | Bidirectional, asymmetric
{augmentative of} | {augmentative by} | Bidirectional, asymmetric
{extension from} | {extension into} | Bidirectional, asymmetric
{reduction from} | {reduction into} | Bidirectional, asymmetric
{generalization from} | {generalization into} | Bidirectional, asymmetric
{specialization from} | {specialization into} | Bidirectional, asymmetric
{figurative of} | {literal for} | Bidirectional, asymmetric
{reference to} | - | Unidirectional
{derived from} | {derived into} | Bidirectional, asymmetric
{back formatted form} | {back formats} | Bidirectional, asymmetric
{abstract for} | {concretized from} | Bidirectional, asymmetric
{with variant} | {variant for} | Bidirectional, asymmetric
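
The GRAALAN table pairs each tail relation with its head (inverse) relation. The minimal Python sketch below shows how such a table could keep a lexicon net consistent. Adding a bidirectional relation also records its inverse, while the unidirectional ones ({connotation}, {reference to}) add a single arc. `RelationStore` and the `word#n` meaning identifiers are hypothetical illustrations and are not part of GRAALAN or SenDiS.

```python
from collections import defaultdict

# Tail relation -> head (inverse) relation, following the GRAALAN table.
# Symmetric relations map to themselves; unidirectional ones map to None.
INVERSE = {
    "synonym": "synonym", "antonym": "antonym", "paronym": "paronym",
    "homonym": "homonym", "heteronym": "heteronym", "homophone": "homophone",
    "hypernym": "hyponym", "holonym": "meronym",
    "diminutive of": "diminutive by", "augmentative of": "augmentative by",
    "extension from": "extension into", "reduction from": "reduction into",
    "generalization from": "generalization into",
    "specialization from": "specialization into",
    "figurative of": "literal for", "derived from": "derived into",
    "back formatted form": "back formats", "abstract for": "concretized from",
    "with variant": "variant for",
    "connotation": None, "reference to": None,
}
# Allow lookups from either end of an asymmetric pair (e.g. "hyponym").
for tail, head in list(INVERSE.items()):
    if head is not None and head not in INVERSE:
        INVERSE[head] = tail


class RelationStore:
    """Hypothetical store that keeps bidirectional relations consistent."""

    def __init__(self):
        # edges[source][relation] -> set of target meanings
        self.edges = defaultdict(lambda: defaultdict(set))

    def add(self, source, relation, target):
        self.edges[source][relation].add(target)
        inverse = INVERSE[relation]
        if inverse is not None:  # unidirectional relations get no back-link
            self.edges[target][inverse].add(source)

    def related(self, source, relation):
        return sorted(self.edges[source][relation])
```

For example, after `store.add("dog#1", "hypernym", "canine#1")`, the call `store.related("canine#1", "hyponym")` returns `["dog#1"]` without a second, manual annotation.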

Obtaining a SenDiS LexNet
o manually annotating the glosses of a lexicon (using a specific tool that can ease the process)
o importing an existing “gloss tagged” lexicon net (also obtained manually or semi-automatically); this usually translates into a dependency on a specific list of meanings/glosses

Creating the SenDiS LexNet
o required a significant effort, usually measured in months, involving several trained linguists
o used a specialized collaborative tool, BuildLNTool (Build Lexicon Network Tool)
o enriched the “gloss tagged” relation with three relative degrees of importance (in the gloss context):
  - weak
  - medium
  - strong
  or ignored the gloss word altogether
o SenDiS objective: two LexNets
  - a “gloss tagged” LexNet for the Romanian language
  - a “gloss tagged” LexNet for the English language

BuildLNTool
o BuildLNTool (Build Lexicon Network Tool) provides:
  - a visual and effective mechanism to manually annotate the lexicon glosses
  - a synchronized overview of the already created relations
  - a browsing mechanism for inspecting the already tagged glosses and relations

BuildLNTool - Sections
[Figure: the BuildLNTool window, with the sections “Lemmas & MWEs”, “Lemma \ MWE Info”, “Competence & Definition Trees”, “Root & Leaf Meanings”, and messages and progress]

BuildLNTool – Sections II
o “Lemmas & MWEs”: list of lexicon entries (lemmas and multiword expressions)
o “Root & Leaf Meanings”: list of the roots and leaves of the lexicon network
o “Lemma/MWE Info”: the lexicon entry currently being analyzed
o “Competence & Definition Trees”: spanning trees for a given meaning over the current lexicon net
o a section for messages and progress

BuildLNTool – Lemmas & MWEs
[Figure: the “Lemmas & MWEs” section, annotated with: selection of the lexicon entry type; filter for unfinished lexicon entries; selection of the viewing interval; text filter; lexicon entry text; lexicon entry status]

BuildLNTool – Selection of a current lexicon entry
[Figure: a lexicon entry is made current by double-clicking it]

BuildLNTool – Browsing the meanings of the current lexicon entry
[Figure: annotated with: lexicon entry text; morphologic interpretation; list of meanings; filters; meaning/gloss status (fully tagged, partially tagged, not tagged)]
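
The three meaning/gloss states can be derived from the constituents of a gloss. The minimal sketch below rests on one assumption: a gloss counts as fully tagged when every non-ignored constituent has a sense, partially tagged when only some do, and not tagged when none do. `Constituent`, `gloss_status` and the sense identifiers are hypothetical names and do not come from BuildLNTool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Constituent:
    text: str
    sense: Optional[str] = None  # hypothetical sense identifier, e.g. "animal#1"
    ignored: bool = False        # constituent marked as ignored (X)


def gloss_status(constituents: List[Constituent]) -> str:
    """Classify a gloss as 'fully tagged', 'partially tagged' or 'not tagged'.

    Assumption: ignored constituents never need a sense.
    """
    if not constituents:
        return "not tagged"
    relevant = [c for c in constituents if not c.ignored]
    tagged = sum(1 for c in relevant if c.sense is not None)
    if tagged == len(relevant):
        return "fully tagged"
    if tagged > 0:
        return "partially tagged"
    return "not tagged"
```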

BuildLNTool – Selection of a current meaning for tagging
[Figure: a meaning is selected for tagging by double-clicking it]

BuildLNTool – Gloss constituent without interpretations
[Figure: an unrecognized gloss constituent; key: ‘Enter’]

BuildLNTool – Degrees of relevance (in gloss context)
Default setting: Medium

BuildLNTool – Degrees of relevance II
[Figure: gloss tokens marked as ‘Strong’, ‘Medium’, ‘Weak’ and Ignored (X)]
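
The four relevance settings map naturally onto an enumeration, with Medium as the default. The sketch below uses hypothetical names (`Relevance`, `GlossToken`, `relevance_profile`). The slides do not describe how BuildLNTool stores these degrees internally.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Relevance(Enum):
    IGNORED = "X"
    WEAK = "weak"
    MEDIUM = "medium"
    STRONG = "strong"


@dataclass
class GlossToken:
    text: str
    relevance: Relevance = Relevance.MEDIUM  # default setting: Medium


def relevance_profile(tokens):
    """Count gloss tokens per relevance degree (missing degrees count as 0)."""
    return Counter(t.relevance for t in tokens)
```

For a gloss such as "a domestic animal", marking "a" as ignored and "animal" as strong leaves "domestic" at the default Medium degree.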

BuildLNTool – Gloss tagging
[Figure: unsaved annotations vs. saved annotations]

BuildLNTool – Gloss tagging protocol
o view the meaning tagging tree
o select a constituent / group of gloss constituents
o set / modify the relevance degree
o edit the text of a gloss constituent
o select / modify the sense of the gloss constituent
o further annotate the meaning / save the annotations
o choose the next meaning
o further save the annotations
o current gloss constituent without sense interpretations

Built LexNets for Romanian and English
LexNets | AllTokens | OperatedTokens | OpTokensValid | OpTokensRelated | OpTokens V & R
LL_Romanian - 99% | 1,528,819 | 1,191,942 | 691,010 | 720,420 | 686,210
LL_English - 2% | 36,828 | 30,350 | 18,523 | 17,641 | 17,505

LexNets | Glosses | Tagged Glosses | Targeted Glosses | Tags | Density
LL_Romanian - 99% | 130,087 | 118,536 | 58, | |
LL_English - 2% | 259,651 | 3,496 | 7, | |
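
The token columns can be related to one another with simple ratios. The short sketch below assumes that OpTokens V & R counts operated tokens that are both valid and related. The slide does not define the column names, so this reading is an interpretation.

```python
# Token counts as reported on the "Built LexNets" slide.
STATS = {
    "LL_Romanian": {"all": 1_528_819, "operated": 1_191_942, "valid": 691_010,
                    "related": 720_420, "valid_and_related": 686_210},
    "LL_English": {"all": 36_828, "operated": 30_350, "valid": 18_523,
                   "related": 17_641, "valid_and_related": 17_505},
}


def token_ratios(counts):
    """Share of tokens that were operated, and share of operated tokens
    that are valid and related (assumed meaning of 'OpTokens V & R')."""
    return {
        "operated/all": counts["operated"] / counts["all"],
        "v&r/operated": counts["valid_and_related"] / counts["operated"],
    }


for name, counts in STATS.items():
    r = token_ratios(counts)
    print(f"{name}: {r['operated/all']:.1%} operated, "
          f"{r['v&r/operated']:.1%} of operated tokens valid & related")
```

With these counts, about 57.6% (Romanian) and 57.7% (English) of the operated tokens are valid and related.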

Imported WordNet tagged glosses
o WordNet (3.0) is organized in synsets:
  - 117,659 synsets
  - 155,287 words (lexicon entries)
  - 206,941 word-sense pairs (gloss + usage examples)
o the synsets were split and transformed into a classical lexicon format
o the imported lexicon networks:

LexNets | Glosses | Tagged Glosses | Targeted Glosses | Tags | Density
WordNet | 206,941 | 206,938 | 59, | |
WordNet_extendedGlosses | 206,941 | 83, | | |

LexNets | AllTokens | OperatedTokens | OpTokensValid | OpTokensRelated | OpTokens V & R
WordNet | 2,394,190 | 2,394,189 | 834,803 | |
WordNet_extendedGlosses | 3,114,968 | 3,114,967 | 936,397 | |
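
Splitting synsets into a classical lexicon turns every (word, synset) pair into one numbered sense of that word. This is why 117,659 synsets yield 206,941 word-sense pairs. The minimal sketch below works on a hypothetical `(synset_id, lemmas, gloss)` input. In practice such tuples could be produced with NLTK's WordNet reader (`wn.all_synsets()`, `Synset.name()`, `Synset.lemma_names()`, `Synset.definition()`). The glosses in the test are placeholders, not WordNet text. Senses are numbered here in input order, which is an assumption.

```python
from collections import defaultdict


def synsets_to_lexicon(synsets):
    """Split WordNet-style synsets into a classical lexicon.

    synsets: iterable of (synset_id, lemmas, gloss) tuples (hypothetical format).
    Returns {lemma: [(sense_number, synset_id, gloss), ...]}, where each
    (lemma, synset) pair becomes one numbered sense of that lemma.
    """
    lexicon = defaultdict(list)
    for synset_id, lemmas, gloss in synsets:
        for lemma in lemmas:
            senses = lexicon[lemma]
            # Sense numbers follow input order (assumption, not WordNet's ranking).
            senses.append((len(senses) + 1, synset_id, gloss))
    return dict(lexicon)
```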

Ordering a SenDiS LexNet
o “gloss tagged” lexicon nets are large and dense graphs, with between 100,000 and … vertices and over 1,000,000 edges/arcs
o to ease operations on such graphs, “gloss tagged” lexicon nets can be preprocessed and optimized by:
  - truncation of a lexicon net
  - leveling of a lexicon net
o aims when optimizing a lexicon net:
  - elimination of loops or strongly connected components
  - a minimum number of removed edges
  - leveling on a minimum number of levels
  - minimization/maximization of the root/leaf vertices
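
Eliminating loops starts with finding the strongly connected components. Every cycle lies inside one component that has more than one vertex, or is a self-loop. The sketch below uses Kosaraju's algorithm and is written iteratively, because recursion over lexicon nets with 100,000+ vertices would exceed Python's default recursion limit. `strongly_connected_components` is a hypothetical helper, not a SenDiS function.

```python
from collections import defaultdict


def strongly_connected_components(vertices, edges):
    """Kosaraju's algorithm without recursion.

    vertices: iterable of hashable vertex ids; edges: iterable of (u, v) arcs.
    Returns a list of components, each a list of vertices.
    """
    graph, reverse = defaultdict(list), defaultdict(list)
    for u, v in edges:
        graph[u].append(v)
        reverse[v].append(u)

    # Pass 1: record vertices in order of DFS completion.
    visited, order = set(), []
    for start in vertices:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:  # all successors explored: node is finished
                stack.pop()
                order.append(node)

    # Pass 2: DFS on the reversed graph, in reverse completion order.
    assigned, components = set(), []
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component, stack = [], [start]
        while stack:
            node = stack.pop()
            component.append(node)
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        components.append(component)
    return components
```

Components with more than one vertex are the parts of the lexicon net that need edges removed before it can be leveled.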

Unordered LexNet
[Figure: a minimal lexicon net in its original form, with vertices e1 to e9]

Ordered (leveled) LexNet
[Figure: the same minimal lexicon net, leveled]
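
Once cycles are broken, a directed acyclic lexicon net can be leveled. One standard technique is longest-path layering, though SenDiS does not necessarily use it. Roots get level 0 and every other vertex sits one level below its deepest predecessor. For a given DAG this yields the fewest possible levels. The minimal sketch below uses a hypothetical `level_dag` helper.

```python
from collections import defaultdict, deque


def level_dag(vertices, edges):
    """Longest-path leveling of a DAG via Kahn's topological sort.

    Every arc (u, v) goes from a lower level to a higher one, and the number
    of levels equals the longest path length + 1 (the minimum for this DAG).
    Raises ValueError if the graph still contains a cycle.
    """
    succ = defaultdict(list)
    indegree = {v: 0 for v in vertices}
    for u, v in edges:
        succ[u].append(v)
        indegree[v] += 1

    level = {v: 0 for v, d in indegree.items() if d == 0}  # roots
    queue = deque(level)
    remaining = dict(indegree)
    processed = 0
    while queue:
        u = queue.popleft()
        processed += 1
        for v in succ[u]:
            level[v] = max(level.get(v, 0), level[u] + 1)
            remaining[v] -= 1
            if remaining[v] == 0:  # all predecessors placed
                queue.append(v)

    if processed != len(indegree):
        raise ValueError("graph has a cycle; remove feedback arcs first")
    return level
```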

Results on leveling experimental LexNets
LNs | Vertices | Edges In | OLN Algorithm | Edges Out | Edges Removed | Levels | Time (s)
wn | 202,361 | 834,803 | Patentv1 | 821,048 | 13, | |
wn_ex | 205,188 | 936,397 | Patentv1 | 936,397 | 74, | |
ro_48% | 72,067 | 318,741 | Patentv1 | 308,592 | 10, | |
ro_78% | 100,175 | 523,192 | Patentv1 | 504,210 | 18, | |
ro_99% | 120,472 | 686,784 | Patentv1 | 659,030 | 27, | |
ro_48% | 130,407 | 318,741 | NT_eades | 308,334 | 10, | |
ro_99% | 130,099 | 686,784 | NT_eades | 654,025 | 32, | |
wn_ex | 206,941 | 936,397 | NT_eades | 904,992 | 31,405 | 461,315 |
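
The results list an NT_eades algorithm, which is presumably related to the greedy feedback arc set heuristic of Eades, Lin and Smyth. The Patentv1 algorithm is not described here. The sketch below is the textbook greedy heuristic, not the SenDiS implementation. It orders the vertices, and the arcs that point backwards in that order are the ones to remove before leveling. `eades_feedback_arcs` is a hypothetical name.

```python
from collections import defaultdict


def eades_feedback_arcs(vertices, edges):
    """Greedy feedback arc set heuristic (Eades, Lin and Smyth).

    Sinks are peeled off to the back of the sequence, sources to the front;
    when neither exists, the vertex with the largest out-degree minus
    in-degree goes to the front. Arcs pointing backwards in the final
    sequence form the feedback arc set; removing them leaves a DAG.
    This simple version is O(V * (V + E)); the published one is linear.
    """
    edges = list(edges)
    succ, pred = defaultdict(set), defaultdict(set)
    for u, v in edges:
        if u != v:  # self-loops always end up as feedback arcs
            succ[u].add(v)
            pred[v].add(u)
    alive = set(vertices)
    front, back = [], []

    def drop(x):
        alive.discard(x)
        for y in succ[x]:
            pred[y].discard(x)
        for y in pred[x]:
            succ[y].discard(x)

    while alive:
        sink = next((x for x in alive if not succ[x]), None)
        if sink is not None:
            back.append(sink)
            drop(sink)
            continue
        source = next((x for x in alive if not pred[x]), None)
        if source is not None:
            front.append(source)
            drop(source)
            continue
        best = max(alive, key=lambda x: len(succ[x]) - len(pred[x]))
        front.append(best)
        drop(best)

    order = front + back[::-1]  # sinks removed last come first in the tail
    position = {v: i for i, v in enumerate(order)}
    feedback = [(u, v) for u, v in edges if position[u] >= position[v]]
    return order, feedback
```

In terms of the table columns, `len(feedback)` would play the role of Edges Removed and `len(edges) - len(feedback)` that of Edges Out.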