Published by Louise Norton. Modified over 9 years ago.
1
Building and Ordering a SenDiS Lexicon Network
Oana Adriana Şoica
2
Outline
o SenDiS operates on a specific lexicon network (LexNet)
  - “sense tagged glosses” relations
  - lexicon networks obtained from other semantic / lexical relations
o obtaining a SenDiS LexNet:
  - build a “sense tagged glosses” LexNet (manually annotate the lexicon with a specific tool)
  - import a “sense tagged glosses” LexNet (WordNet tagged glosses, as of 2008)
o preprocessing (ordering) the SenDiS LexNet (before WSD)
  - truncation of the LexNet
  - leveling the LexNet
3
Semantic/Lexical Relations
o hypernyms
o hyponyms
o similar to
o has part
o synonyms
o antonyms
o holonyms
o meronyms
o coordinate terms
o troponyms
o entailment
4
Semantic/Lexical relations: WordNet
[Figure: an excerpt of the WordNet semantic network *]
* Navigli, R. 2009. Word sense disambiguation: A survey. ACM Comput. Surv. 41, 2, Article 10 (2009)
5
Semantic/Lexical relations: GRAALAN

Tail of relation      | Head of relation      | Relation type
{synonym}             |                       | Bidirectional, symmetric
{antonym}             |                       | Bidirectional, symmetric
{paronym}             |                       | Bidirectional, symmetric
{hypernym}            | {hyponym}             | Bidirectional, asymmetric
{connotation}         | -                     | Unidirectional
{holonym}             | {meronym}             | Bidirectional, asymmetric
{homonym}             |                       | Bidirectional, symmetric
{heteronym}           |                       | Bidirectional, symmetric
{homophone}           |                       | Bidirectional, symmetric
{diminutive of}       | {diminutive by}       | Bidirectional, asymmetric
{augmentative of}     | {augmentative by}     | Bidirectional, asymmetric
{extension from}      | {extension into}      | Bidirectional, asymmetric
{reduction from}      | {reduction into}      | Bidirectional, asymmetric
{generalization from} | {generalization into} | Bidirectional, asymmetric
{specialization from} | {specialization into} | Bidirectional, asymmetric
{figurative of}       | {literal for}         | Bidirectional, asymmetric
{reference to}        | -                     | Unidirectional
{derived from}        | {derived into}        | Bidirectional, asymmetric
{back formatted form} | {back formats}        | Bidirectional, asymmetric
{abstract for}        | {concretized from}    | Bidirectional, asymmetric
{with variant}        | {variant for}         | Bidirectional, asymmetric
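The tail/head pairing in the GRAALAN relation table can be sketched as a small lookup that returns the label of the reverse relation. This is only an illustration: the names `Relation`, `RELATIONS` and `reverse_label` are hypothetical, only a few rows are included, and it assumes that a symmetric relation carries the same label in both directions (the slide leaves that head cell empty).

```python
from typing import NamedTuple, Optional


class Relation(NamedTuple):
    tail: str            # label at the tail of the relation
    head: Optional[str]  # label at the head; None for unidirectional relations
    kind: str            # relation type as given on the slide


# A few rows from the slide. For symmetric relations the head label is
# ASSUMED to equal the tail label; the slide does not state this.
RELATIONS = [
    Relation("synonym", "synonym", "bidirectional, symmetric"),
    Relation("hypernym", "hyponym", "bidirectional, asymmetric"),
    Relation("holonym", "meronym", "bidirectional, asymmetric"),
    Relation("connotation", None, "unidirectional"),
]


def reverse_label(label: str) -> Optional[str]:
    """Return the label seen from the other end of the relation, if any."""
    for rel in RELATIONS:
        if rel.tail == label:
            return rel.head
        if rel.head == label:
            return rel.tail
    raise KeyError(label)


print(reverse_label("hypernym"))     # hyponym
print(reverse_label("meronym"))      # holonym
print(reverse_label("connotation"))  # None
```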
6
Obtaining a SenDiS LexNet
o manually annotating the glosses from a lexicon (using a specific tool that can ease the process)
o importing an existing “gloss tagged” lexicon net (also obtained manually or semi-automatically); this usually translates into a dependency on a specific list of meanings/glosses
7
Creating the SenDiS LexNet
o implied a significant effort, usually measured in months, involving several trained linguists
o using a specialized collaborative tool (BuildLNTool, the Build Lexicon Network Tool)
o enriching the “gloss tagged” relation with three relative degrees of importance (in the gloss context), or ignoring the gloss word
  - weak
  - medium
  - strong
o SenDiS objective, two LexNets:
  - “gloss tagged” LexNet for the Romanian language
  - “gloss tagged” LexNet for the English language
8
BuildLNTool
o BuildLNTool (Build Lexicon Network Tool) provides:
  - a visual and effective mechanism to manually annotate the lexicon glosses
  - a synchronized overview of the already created relations
  - a browsing mechanism for inspecting the already tagged glosses and relations
9
BuildLNTool - Sections
[Figure: the BuildLNTool window, with these sections labeled]
o “Lemmas & MWEs”
o “Lemma \ MWE Info”
o “Competence & Definition Trees”
o “Root & Leaf Meanings”
o Messages and progress
10
BuildLNTool – Sections II
o “Lemmas & MWEs”: list of lexicon entries
o “Root & Leaf Meanings”: list of roots and leaves of the lexicon network
o “Lemma/MWE Info”: current lexicon entry being analyzed
o “Competence & Definition Trees”: spanning trees for a given meaning over the current lexicon net
o section for messages and progress
11
BuildLNTool – Lemmas & MWEs
[Figure: the “Lemmas & MWEs” section, with these callouts]
o selection of lexicon entry type
o selection of unfinished lexicon entries filter
o selection of viewing interval
o text filter
o lexicon entry text
o lexicon entry status
12
BuildLNTool – Selection of a current lexicon entry
[Figure: a lexicon entry is selected with a double click]
13
BuildLNTool – Browsing the meanings of the current lexicon entry
[Figure: the current lexicon entry, with these callouts]
o lexicon entry text
o morphologic interpretation
o list of meanings
o filters
o meaning/gloss fully tagged
o meaning/gloss partially tagged
o meaning/gloss not tagged
14
BuildLNTool – Selection of a current meaning for tagging
[Figure: a meaning is selected for tagging with a double click]
15
BuildLNTool – Gloss constituent without interpretations
[Figure callouts]
o unrecognized gloss constituent
o ‘Enter’
16
BuildLNTool – Degrees of relevance (in gloss context)
o Default setting: Medium
17
BuildLNTool – Degrees of relevance II
[Figure callouts]
o ‘Strong’ tokens
o ‘Medium’ tokens
o ‘Weak’ tokens
o Ignored (X) tokens
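The relevance degrees can be pictured as an attribute on each tagged gloss token. The sketch below is an assumption about how such annotations might be modeled, not BuildLNTool's actual data model: `Relevance`, `GlossTag`, `outgoing_edges` and the sense identifiers are hypothetical names. It uses Medium as the default degree, as the slides state, and drops ignored or untagged tokens when producing LexNet edges.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Relevance(Enum):
    IGNORED = 0
    WEAK = 1
    MEDIUM = 2
    STRONG = 3


@dataclass
class GlossTag:
    token: str                               # gloss constituent as written
    sense: Optional[str]                     # chosen meaning, None if untagged
    relevance: Relevance = Relevance.MEDIUM  # default degree on the slides


def outgoing_edges(meaning: str, tags: List[GlossTag]) -> List[Tuple[str, str, Relevance]]:
    """Turn the tags of one gloss into (source, target, relevance) edges.

    Ignored tokens and tokens without a chosen sense produce no edge.
    """
    return [
        (meaning, tag.sense, tag.relevance)
        for tag in tags
        if tag.sense is not None and tag.relevance is not Relevance.IGNORED
    ]


# Hypothetical gloss "a domesticated animal" for a hypothetical meaning "dog#1".
tags = [
    GlossTag("a", None, Relevance.IGNORED),
    GlossTag("domesticated", "domesticated#1", Relevance.STRONG),
    GlossTag("animal", "animal#1"),
]
for edge in outgoing_edges("dog#1", tags):
    print(edge)
```

Keeping the degree on the edge rather than on the token alone means the same target meaning can be weighted differently depending on the gloss in which it appears.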
18
BuildLNTool – Gloss tagging
[Figure callouts]
o Unsaved annotations
o Saved annotations
19
BuildLNTool – Gloss tagging protocol
[Figure callouts]
o view of meaning tagging tree
o selection of constituent / group of gloss constituents
o set / modify relevance degree
o edit text of gloss constituent
o select / modify the sense for the gloss constituent
o further annotate meaning / save annotations
o choose the next meaning further on
o save annotations
o current gloss constituent without sense interpretations
20
Built LexNets for Romanian and English

LexNets           | AllTokens | OperatedTokens | OpTokensValid | OpTokensRelated | OpTokens V & R
LL_Romanian - 99% | 1,528,819 | 1,191,942      | 691,010       | 720,420         | 686,210
LL_English - 2%   | 36,828    | 30,350         | 18,523        | 17,641          | 17,505

LexNets           | Glosses | Tagged Glosses | Targeted Glosses | Tags Density
LL_Romanian - 99% | 130,087 | 118,536        | 58,976           | 0.5757
LL_English - 2%   | 259,651 | 3,496          | 7,551            | 0.5767
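The slides do not define Tags Density. The reported values are consistent with dividing the valid-and-related operated tokens (OpTokens V & R) by the operated tokens and truncating to four decimals; this is an inference from the numbers, not a definition given by the author. The helper `truncated_ratio` below is a hypothetical name used only to reproduce that arithmetic.

```python
def truncated_ratio(numerator: int, denominator: int, digits: int = 4) -> float:
    """Divide and truncate (not round) to the given number of decimals."""
    scale = 10 ** digits
    return (numerator * scale // denominator) / scale


# (OperatedTokens, OpTokens V & R, reported Tags Density) from the slide.
rows = {
    "LL_Romanian - 99%": (1_191_942, 686_210, 0.5757),
    "LL_English - 2%": (30_350, 17_505, 0.5767),
}

for name, (operated, valid_and_related, reported) in rows.items():
    computed = truncated_ratio(valid_and_related, operated)
    print(name, computed, reported)
```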
21
Imported WordNet tagged glosses
o WordNet (3.0) is organized in synsets
  - 117,659 synsets
  - 155,287 words (lexicon entries)
  - 206,941 word-sense pairs (gloss + usage examples)
o the synsets were split and transformed into a classical lexicon format
o the lexicon network imported:

LexNets                 | Glosses | Tagged Glosses | Targeted Glosses | Tags Density
WordNet                 | 206,941 | 206,938        | 59,251           | 0.3486
WordNet_extendedGlosses | 206,941 |                | 83,174           | 0.3006

LexNets                 | AllTokens | OperatedTokens | OpTokensValid | OpTokensRelated | OpTokens V & R
WordNet                 | 2,394,190 | 2,394,189 | 834,803
WordNet_extendedGlosses | 3,114,968 | 3,114,967 | 936,397
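Splitting synsets into a classical lexicon format can be pictured as regrouping by word: each synset contributes one word-sense pair per lemma it contains. The sketch below only illustrates that regrouping; `split_synsets`, the synset identifiers and the sample glosses are hypothetical, and the actual import procedure is not described in the slides.

```python
from typing import Dict, List, Tuple

# synset id -> (lemmas in the synset, shared gloss)
Synsets = Dict[str, Tuple[List[str], str]]
# word -> list of (synset id, gloss), one item per word-sense pair
Lexicon = Dict[str, List[Tuple[str, str]]]


def split_synsets(synsets: Synsets) -> Lexicon:
    """Regroup synsets by lemma, yielding one word-sense pair per (lemma, synset)."""
    lexicon: Lexicon = {}
    for synset_id, (lemmas, gloss) in synsets.items():
        for lemma in lemmas:
            lexicon.setdefault(lemma, []).append((synset_id, gloss))
    return lexicon


# Hypothetical sample data, for illustration only.
synsets = {
    "s1": (["car", "auto"], "a motor vehicle with four wheels"),
    "s2": (["car"], "a wheeled vehicle adapted to the rails of a railroad"),
}
lexicon = split_synsets(synsets)
pair_count = sum(len(senses) for senses in lexicon.values())
print(len(synsets), "synsets ->", len(lexicon), "words,", pair_count, "word-sense pairs")
```

This is why the number of word-sense pairs exceeds both the number of synsets and the number of words: a synset with several lemmas is counted once per lemma.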
22
Ordering a SenDiS LexNet
o “gloss tagged” lexicon nets are large and dense graphs
  - between 100,000 and 200,000 vertices
  - over 1,000,000 edges / arcs
o to ease the operation with such graphs, “gloss tagged” lexicon nets can be preprocessed and optimized
  - truncation of a lexicon net
  - leveling of a lexicon net
o aims when optimizing a lexicon net
  - elimination of loops or strongly connected components
  - a minimum number of removed edges
  - leveling on a minimum number of levels
  - minimization/maximization of root/leaf vertices
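The two ordering steps can be illustrated on a toy graph: first break every cycle by removing edges, then assign each vertex a level. The sketch below is not the Patentv1 or NT_eades algorithm named in the results; it swaps in a plain depth-first search that drops back edges (which does not minimize the number of removed edges) followed by longest-path leveling. The function names `remove_back_edges` and `assign_levels` are hypothetical.

```python
from collections import defaultdict, deque


def remove_back_edges(vertices, edges):
    """Iterative DFS; an edge pointing to a vertex on the current DFS path is removed."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {v: WHITE for v in vertices}
    kept, removed = [], []
    for start in vertices:
        if color[start] != WHITE:
            continue
        color[start] = GREY
        stack = [(start, iter(adj[start]))]
        while stack:
            u, successors = stack[-1]
            for v in successors:
                if color[v] == GREY:      # back edge: it would close a cycle
                    removed.append((u, v))
                    continue
                kept.append((u, v))
                if color[v] == WHITE:     # descend into the unvisited vertex
                    color[v] = GREY
                    stack.append((v, iter(adj[v])))
                    break
            else:                         # all successors handled
                color[u] = BLACK
                stack.pop()
    return kept, removed


def assign_levels(vertices, dag_edges):
    """Level of a vertex = length of the longest path reaching it from a root."""
    adj = defaultdict(list)
    indegree = {v: 0 for v in vertices}
    for u, v in dag_edges:
        adj[u].append(v)
        indegree[v] += 1
    level = {v: 0 for v in vertices}
    queue = deque(v for v in vertices if indegree[v] == 0)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            level[v] = max(level[v], level[u] + 1)
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return level


vertices = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
kept, removed = remove_back_edges(vertices, edges)
print(removed)                        # [('c', 'a')]
print(assign_levels(vertices, kept))  # {'a': 0, 'b': 1, 'c': 2, 'd': 3}
```

Which edges are removed depends on the order in which vertices are visited, so the stated aims (few removed edges, few levels) call for a more careful heuristic than this one.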
23
Unordered LexNet
[Figure: a minimal lexicon net in the original form, with entries e1 to e9]
24
Ordered (leveled) LexNet
[Figure: the same minimal lexicon net, leveled]
25
Results on leveling experimental LexNets

LNs    | Vertices | Edges In | OLN Algorithm | Edges Out | Edges Removed | Levels | Time (s)
wn     | 202,361  | 834,803  | Patentv1      | 821,048   | 13,755        | 192    | 4.5
wn_ex  | 205,188  | 936,397  | Patentv1      | 936,397   | 74,526        | 382    | 5.7
ro_48% | 72,067   | 318,741  | Patentv1      | 308,592   | 10,149        | 195    | 1.6
ro_78% | 100,175  | 523,192  | Patentv1      | 504,210   | 18,982        | 244    | 2.3
ro_99% | 120,472  | 686,784  | Patentv1      | 659,030   | 27,754        | 291    | 2.8
ro_48% | 130,407  | 318,741  | NT_eades      | 308,334   | 10,407        | 58     | 60
ro_99% | 130,099  | 686,784  | NT_eades      | 654,025   | 32,759        | 70     | 330
wn_ex  | 206,941  | 936,397  | NT_eades      | 904,992   | 31,405        | 46     | 1,315