Experiences of (Lexicographers and) Computer Scientists in Validating Estonian Wordnet with Test Patterns Ahti Lohk | Kadri Vare | Heili Orav | Leo Võhandu.


1 Experiences of (Lexicographers and) Computer Scientists in Validating Estonian Wordnet with Test Patterns
Ahti Lohk | Kadri Vare | Heili Orav | Leo Võhandu
The 8th Meeting of the Global Wordnet Conference, Bucharest, January 27-30, 2016

2 Motivation: why validate?
Every expandable and developing human-machine system needs a feedback mechanism.
The quality of a wordnet has a strong impact on the quality of the NLP tasks that use it.
Multiple inheritance cases in the semantic hierarchies of a wordnet are prone to various semantic errors.

3 Main aim
To prove that the semantic hierarchies of wordnet-type dictionaries contain as yet undiscovered substructures that correspond to certain descriptions (test patterns), and that using these patterns to validate the semantic hierarchies may improve wordnet structure significantly.

4 Previous work
Cycles (Smrž, 2004; Kubis, 2012)
Shortcuts (Fischer, 1997)
Rings (Liu et al., 2004; Richens, 2008)
Dangling uplinks (Koeva et al., 2004; Smrž, 2004)
Orphan nodes (null graphs) (Čapek, 2012)

5 An artificial hierarchy

6 An artificial hierarchy and specific substructures
Specific substructures = test patterns:
1 Short cut
2 Heart-shaped substructure
3 Ring
4 Closed subset
5 Dense component
6 Connected roots
+ 4 substructures
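
Of the test patterns named on this slide, the short cut is the simplest to illustrate in code. The sketch below is not the authors' program: it assumes that a short cut is a direct hypernymy link that duplicates a longer hypernymy path, and that the hierarchy is stored as a plain dictionary from each synset to its direct hypernyms. The function names and the toy synsets are hypothetical.

```python
def ancestors(hypernyms, start):
    """All synsets reachable from `start` by following hypernym links upward."""
    seen = set()
    stack = list(hypernyms.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:  # the `seen` set also guards against cycles
            seen.add(node)
            stack.extend(hypernyms.get(node, ()))
    return seen


def find_short_cuts(hypernyms):
    """Return (synset, hypernym) links that are also reachable by a longer path."""
    short_cuts = []
    for synset, parents in hypernyms.items():
        for parent in parents:
            # The direct link synset -> parent is redundant if another
            # direct hypernym of `synset` already leads up to `parent`.
            others = [p for p in parents if p != parent]
            if any(parent in ancestors(hypernyms, other) for other in others):
                short_cuts.append((synset, parent))
    return sorted(short_cuts)


# Hypothetical toy hierarchy: "animal" is reachable from "poodle" both
# directly and through "dog".
toy_hierarchy = {
    "poodle": ["dog", "animal"],
    "dog": ["animal"],
    "animal": [],
}

print(find_short_cuts(toy_hierarchy))  # [('poodle', 'animal')]
```

Each reported pair is only a candidate for review; whether the direct link should actually be removed remains a lexicographic decision.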

7 Example 1: synset with many roots
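
A minimal sketch of how instances of this pattern could be listed, assuming that a "synset with many roots" is a synset from which two or more root synsets (synsets without hypernyms) can be reached by following hypernymy links upward. The dictionary representation, the function names, the `min_roots` threshold, and the toy synsets are all assumptions made for illustration.

```python
def roots_above(hypernyms, start):
    """Root synsets (those without hypernyms) reachable upward from `start`."""
    roots, seen, stack = set(), set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        parents = hypernyms.get(node, ())
        if not parents:
            roots.add(node)  # nothing above this synset, so it is a root
        stack.extend(parents)
    return roots


def synsets_with_many_roots(hypernyms, min_roots=2):
    """Map each synset that reaches at least `min_roots` roots to those roots."""
    result = {}
    for synset in hypernyms:
        roots = roots_above(hypernyms, synset)
        if len(roots) >= min_roots:
            result[synset] = sorted(roots)
    return result


# Hypothetical toy hierarchy with two separate roots.
toy_many_roots = {
    "amphibious_car": ["car", "boat"],
    "car": ["vehicle"],
    "boat": ["vessel"],
    "vehicle": [],
    "vessel": [],
}

print(synsets_with_many_roots(toy_many_roots))
```

In the toy data only `amphibious_car` reaches both `vehicle` and `vessel`, so it is the only synset reported.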

8 Example 2: dense component

9 Example 3: "Compound" pattern

10 Example 4: connected roots (side view and top view)

11 Estonian Wordnet iterative evolution
Columns: Version | Noun roots | Verb roots | Multiple inheritance cases | Short cuts | Rings | Synset with many roots | Heart-shaped substructure | Dense component | "Compound" pattern | The largest closed subset
60142241,2962353,4451,1231,8251043013,057×457
……………………………
65248141,7171942,1717914511324593,875×263
6614441,6771191,7966132591216712,907×218
……………………………
69121410218291351823350×7
701184517217037123×4

12 Statistics of the correction operations
Over ten versions of EstWN (during 4 years):
21,911: hypernymy and hyponymy relations removed
5,344: lexical units in synsets changed
4,122: hypernymy and hyponymy relations replaced by another semantic relation, mainly near synonymy and fuzzynymy

13 Wordnets in comparison
Columns: Wordnet | Noun roots | Verb roots | Multiple inheritance cases | Short cuts | Rings | Synset with many roots | Heart-shaped substructure | Dense component | "Compound" pattern | The largest closed subsets
Princeton WordNet Version 3.0: 123341,453402,99118155115358 1,333×167
Finnish Wordnet Version 2.0: 123341,453402,99118155115394 1,334×167
Cornetto Version 2.0: 222,4383515,309621,226217549 11,032×589
Polish Wordnet Version 2.0: 6374210,94255357,887 205,254 5,037778541 30,794×4,683
Estonian Wordnet Version 70: 11845172170037123×4

14 Summary
In this presentation we studied how to validate the semantic hierarchies of a wordnet, and we proposed using test patterns, which are descriptions of substructures with a specific nature.
To demonstrate the efficiency of test patterns, we partially applied them over 10 versions of EstWN.
Instances of the different test patterns were extracted by our programs and validated by lexicographers.
We found that the number of multiple inheritance cases decreased by about 97 percent over the last five versions.
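
The decrease in multiple inheritance cases reported in the summary can be tracked between versions with a small script. The sketch below is only an illustration under stated assumptions: it counts a multiple inheritance case as a synset with more than one distinct direct hypernym, and it uses two invented toy versions rather than EstWN data. The function names and the sample dictionaries are hypothetical.

```python
def count_multiple_inheritance(hypernyms):
    """Number of synsets that have more than one distinct direct hypernym."""
    return sum(1 for parents in hypernyms.values() if len(set(parents)) > 1)


def percent_decrease(before, after):
    """Relative decrease from `before` to `after`, in percent."""
    if before == 0:
        raise ValueError("cannot compute a decrease from zero cases")
    return 100.0 * (before - after) / before


# Two invented versions of the same tiny hierarchy (not EstWN data).
older_version = {"a": ["x", "y"], "b": ["x", "y"], "x": [], "y": []}
newer_version = {"a": ["x"], "b": ["x", "y"], "x": [], "y": []}

before = count_multiple_inheritance(older_version)
after = count_multiple_inheritance(newer_version)
print(before, after, percent_decrease(before, after))  # 2 1 50.0
```

Applied to two real versions, the same comparison would show how far a round of lexicographic corrections reduced the number of cases.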

15 Future work
Applying test patterns to:
other semantic relations
other wordnets

