Published by Sydney Goodman. Modified over 9 years ago.
Page 1 SenDiS
Sectoral Operational Programme "Increase of Economic Competitiveness"
"Investments for your future"
Project co-financed by the European Regional Development Fund
General Word Sense Disambiguation System applied to Romanian and English Languages - SenDiS
Andrei Mincă - aminca@softwin.ro
SenDiS – WSD model, components, algorithms, methods & results
Page 2 SenDiS WSD model
Page 3 SenDiS System components
Page 4 SenDiS WSD phases
- Order Lexicon Network (OLN)
- Build Meaning Semantic Signatures (BMSS)
- Compare Meaning Semantic Signatures (CMSS)
- Compute WSD Variants (CwsdV)
Page 5 SenDiS OLN Algorithms
Input: unordered lexicon network
Lexicon network optimizations considering:
- number of edges
- loops or strongly connected components
- number of roots and leaves
- number of levels (in the case of leveling the LN)
Output: ordered lexicon network
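The slides do not show a concrete OLN algorithm. As an illustration only, leveling a loop-free lexicon network can be sketched with a standard topological pass. The function name `level_network`, the edge direction, and the choice to reject loops instead of collapsing them are assumptions, not part of SenDiS.

```python
from collections import deque


def level_network(edges):
    """Assign each node a level: the longest distance from any root.

    `edges` is an iterable of (source, target) pairs. The direction of
    the edges is an assumption made for this sketch.
    """
    successors, indegree = {}, {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)
        successors.setdefault(dst, [])
        indegree.setdefault(src, 0)
        indegree[dst] = indegree.get(dst, 0) + 1

    roots = sorted(n for n, d in indegree.items() if d == 0)
    leaves = sorted(n for n, out in successors.items() if not out)

    level = {n: 0 for n in roots}
    queue = deque(roots)
    processed = 0
    while queue:
        node = queue.popleft()
        processed += 1
        for nxt in successors[node]:
            # Keep the deepest level seen so far for the successor.
            level[nxt] = max(level.get(nxt, 0), level[node] + 1)
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    # Nodes on a loop never reach in-degree zero, so they are never processed.
    if processed != len(successors):
        raise ValueError("network contains a loop; collapse it before leveling")
    return level, roots, leaves


edges = [("entity", "animal"), ("animal", "dog"), ("entity", "dog")]
level, roots, leaves = level_network(edges)
number_of_levels = max(level.values()) + 1
```

The returned roots, leaves, and level count correspond to the quantities the slide lists as optimization criteria.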
Page 6 SenDiS BMSS Algorithms
Input:
- a lexicon network (not necessarily ordered)
- a meaning (ID)
Builds a semantic interpretation for the specified meaning over the lexicon network:
- spanning trees
- sets of nodes
- sequences of edges
- or combinations of the above
Output: a semantic interpretation (signature) for the meaning
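One of the signature types listed above is a set of nodes. A minimal sketch of that idea, assuming the lexicon network is stored as an adjacency dictionary and that the signature is the set of nodes reachable within a fixed depth, could look like this. The name `build_signature`, the `max_depth` parameter, and the sense identifiers are hypothetical.

```python
from collections import deque


def build_signature(network, meaning_id, max_depth=2):
    """Collect the nodes reachable from `meaning_id` within `max_depth` edges.

    Returns a dict mapping each reached node to its distance from the
    starting meaning. This is an illustrative breadth-first traversal,
    not the SenDiS implementation.
    """
    signature = {meaning_id: 0}
    queue = deque([meaning_id])
    while queue:
        node = queue.popleft()
        depth = signature[node]
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for nxt in network.get(node, ()):
            if nxt not in signature:
                signature[nxt] = depth + 1
                queue.append(nxt)
    return signature


network = {
    "bank#1": ["money#1", "institution#1"],
    "money#1": ["currency#1"],
    "currency#1": ["value#1"],
}
signature = build_signature(network, "bank#1", max_depth=2)
```

Because the traversal records the first edge that reaches each node, the same loop also yields a spanning tree if the parent of each node is stored instead of its depth.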
Page 7 SenDiS CMSS Algorithms
Input: two or more semantic signatures
- comparison depends on the nature of the semantic signatures
Output: degrees of similarity
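The slides leave the comparison function open. For signatures that are sets of nodes, one common choice is the Jaccard coefficient, shown here purely as an example of a degree of similarity; it is not stated to be the measure SenDiS uses.

```python
def jaccard_similarity(signature_a, signature_b):
    """Return |A intersect B| / |A union B| for two node-set signatures."""
    a, b = set(signature_a), set(signature_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty signatures
    return len(a & b) / len(a | b)


similarity = jaccard_similarity({"money", "loan", "deposit"},
                                {"loan", "deposit", "river"})
```

Signatures made of trees or edge sequences would need a different measure, which is the dependency the slide points out.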
Page 8 SenDiS CwsdV Algorithms
Input: a matrix with degrees of similarity between the senses of the context words
Output: one or several WSD variants with the highest cost
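As a rough illustration of this phase, the sketch below enumerates every combination of one sense per context word and ranks the combinations by the sum of pairwise similarities. Treating "cost" as that sum, storing the matrix as a dictionary keyed by sense pairs, and the name `best_variants` are all assumptions.

```python
from itertools import combinations, product


def best_variants(senses_per_word, similarity, top_n=1):
    """Rank sense assignments by the total pairwise similarity.

    `senses_per_word` maps each context word to its candidate senses.
    `similarity` maps frozenset({sense_a, sense_b}) to a score; missing
    pairs count as 0.0. Exhaustive search, so only suitable for small
    context windows.
    """
    words = list(senses_per_word)
    scored = []
    for choice in product(*(senses_per_word[w] for w in words)):
        cost = sum(similarity.get(frozenset(pair), 0.0)
                   for pair in combinations(choice, 2))
        scored.append((cost, dict(zip(words, choice))))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_n]


senses = {"bank": ["bank#1", "bank#2"], "money": ["money#1"]}
matrix = {
    frozenset(("bank#1", "money#1")): 0.8,
    frozenset(("bank#2", "money#1")): 0.1,
}
variants = best_variants(senses, matrix, top_n=2)
```

The number of combinations grows exponentially with the window size, so a practical algorithm would need pruning or a heuristic search.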
Page 9 SenDiS WSD methods
Input:
- text
- list of meanings
- lexicon network
Computing:
- tokenization of text
- annotation of text tokens with meaning interpretations
- selecting a window-text for WSD
- other context filters or topologies
- build meaning semantic signatures for each word-sense
- compare meaning semantic signatures and fill the matrix
- compute best WSD variants
Output: one or more WSD variants with one or more meaning interpretations for each text token
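The window-selection step can be pictured as taking a fixed number of tokens on each side of a target token. The slides do not say how SenDiS chooses its window, so the symmetric window and the names `select_window` and `radius` below are assumptions.

```python
def select_window(tokens, target_index, radius=2):
    """Return the tokens within `radius` positions of the target token."""
    start = max(0, target_index - radius)
    end = min(len(tokens), target_index + radius + 1)
    return tokens[start:end]


tokens = ["the", "bank", "approved", "the", "loan", "yesterday"]
window = select_window(tokens, target_index=2, radius=1)
```

Only the senses of the tokens inside the window would then be compared, which keeps the similarity matrix small.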
Page 10 SenDiS General WSD requirements
- tokenization
- part-of-speech tagging
- lemmatization
- sense interpretations
- chunking
- parsing
Page 11 SenDiS Testing WSD
Performance indicators:
- P (precision): P = noCorrectlyDisambiguated_TargetWords / noDisambiguated_TargetWords
- R (recall): R = noCorrectlyDisambiguated_TargetWords / noTargetWords
- F-measure: F = 2 * P * R / (P + R)
State-of-the-art results (F-measure):
- lexical sample task: coarse-grained ~90%, fine-grained ~73%
- all-words task: coarse-grained ~83%, fine-grained ~65%
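The three indicators follow directly from the counts named in the formulas. A small helper, with the hypothetical name `wsd_scores` and made-up example counts, shows the arithmetic:

```python
def wsd_scores(n_correct, n_disambiguated, n_target):
    """Compute precision, recall and F-measure from raw counts.

    n_correct:        target words disambiguated correctly
    n_disambiguated:  target words the system attempted
    n_target:         all target words in the test corpus
    """
    precision = n_correct / n_disambiguated if n_disambiguated else 0.0
    recall = n_correct / n_target if n_target else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Illustrative counts only: 60 correct out of 80 attempted, 100 target words.
precision, recall, f_measure = wsd_scores(60, 80, 100)
```

When the system attempts every target word, the two denominators coincide and P, R and F are all equal.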
Page 12 SenDiS Testing SenDiS
A test configuration for SenDiS consists of:
- a meaning inventory
- a lexicon network
- an OLN algorithm
- a BMSS algorithm
- a CMSS algorithm
- a CwsdV algorithm
- a WSD method
- a corpus test
Number of possible test configurations: nMIs x nLNs x nOLNs x nBMSSs x nCMSSs x nCwsdVs x nWSDMs x nCorpusTests
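The product on this slide is the size of the Cartesian product of the component choices. The sketch below enumerates it with `itertools.product`. The lexicon network and corpus names are taken from the result slides; every other option name is a hypothetical placeholder.

```python
from itertools import product
from math import prod

# Placeholder options per component; only the lexicon networks and
# corpora are names that appear in the presentation.
components = {
    "meaning_inventory": ["MI_1"],
    "lexicon_network": ["WN_ex", "LLR_99%", "LLE_2%"],
    "oln_algorithm": ["OLN_1", "OLN_2"],
    "bmss_algorithm": ["BMSS_1"],
    "cmss_algorithm": ["CMSS_1", "CMSS_2"],
    "cwsdv_algorithm": ["CwsdV_1"],
    "wsd_method": ["WSDM_1"],
    "corpus_test": ["Senseval 2", "Senseval 3", "Semcor"],
}


def count_configurations(components):
    """Multiply the number of options of every component."""
    return prod(len(options) for options in components.values())


def enumerate_configurations(components):
    """Yield one dict per test configuration."""
    names = list(components)
    for combo in product(*components.values()):
        yield dict(zip(names, combo))


n_configurations = count_configurations(components)
configurations = list(enumerate_configurations(components))
```

Even with a handful of options per component the count grows quickly, which is why each reported result is tied to one specific configuration.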
Page 13 SenDiS Results

Senseval 2
No. Texts  LexNet   P         R         F-measure    Time (h)  Observations (no POS tagging)
224        WN_ex    0.2891    0.2176    0.24597644   0.4       meaning interpretations only for recognized lemmas
225        WN_ex    0.3119    0.2902    0.29973205   0.4       20% coverage for GRAALAN Inflection Form Entries
225        WN_ex    0.3913              0.39127589   0.36      20% IFEs + corpus target words lemmas tags

Senseval 3
No. Texts  LexNet   P         R         F-measure    Time (h)  Observations (no POS tagging)
254        WN_ex    0.2313    0.1595    0.18507712   0.1       no IFEs
265        WN_ex    0.2185    0.2088    0.21305191   0.4       20% IFEs
256        WN_ex    0.2845              0.28447832   0.33      20% IFEs + corpus target words lemmas tags

Semcor
No. Texts  LexNet   P         R         F-measure    Time (h)  Observations (no POS tagging)
33,855     WN_ex    0.1961    0.1838    0.18888804   50        20% IFEs
33,866     WN_ex    0.2515              0.251471546            20% IFEs + corpus target words lemmas tags
Page 14 SenDiS Tagged glosses as a Test Corpus

WN_ex
No. Texts  LexNet   P         R         F-measure    Time (h)  Observations (no POS tagging)
206,941    WN_ex    0.712066  0.712057  0.71206139             only corpus target words lemmas tags
158,378    WN_ex    0.3387    0.3332    0.33548206   90        20% IFEs
158,667    WN_ex    0.4577    0.4198    0.43412967   90        20% IFEs + corpus target words lemmas tags

LLR_99%
No. Texts  LexNet   P         R         F-measure    Time (h)  Observations (no POS tagging)
106,899    LLR_99%  0.4848    0.2892    0.34476582   89        no IFEs
110,596    LLR_99%  0.562     0.5608    0.56132905   262       100% IFEs
110,635    LLR_99%  0.6641    0.6505    0.65627624   246       100% IFEs + corpus target words lemmas tags

LLE_2%
No. Texts  LexNet   P         R         F-measure    Time (h)  Observations (no POS tagging)
2,927      LLE_2%   0.6466    0.5835    0.6080107    1.4       no IFEs
3,125      LLE_2%   0.7633    0.7625    0.76283814             53% IFEs
3,071      LLE_2%   0.8594              0.85937579   1.5       53% IFEs + corpus target words lemmas tags
Page 15 SenDiS