WORD SENSE DISAMBIGUATION STUDY ON WORD NET ONTOLOGY Akilan Velmurugan Computer Networks – CS 790G.


1 WORD SENSE DISAMBIGUATION STUDY ON WORD NET ONTOLOGY Akilan Velmurugan Computer Networks – CS 790G

2 Overview  What is WSD?  How WordNet is analyzed as a complex network  What are the results  Project Methodology  Area of study  Key Findings/Results  New approaches  Improvement techniques  Conclusion

3 Project Description  Objective  Study on WSD Effects of WSD in Word Sense Ontology Characteristics of WordNet  Results  How to match words with other words Parameters taken for the study of word sense Improve them by making necessary changes Study network characteristics

4 WordNet - overview  Machine-readable semantic dictionary interlinked by semantic relations  Developed at Princeton University as a large lexical database for the English language  Most widely used linguistic resource  Free for public use under the WordNet license  Forms a scale-free network with a small average shortest path length, having words as nodes and concepts as links  Easily navigable

5 WordNet (Structure)  Covers four parts of speech: noun, verb, adjective, adverb  Shows the relations in the form of Synonym, Hypernym (is a kind of …), Hyponym (… is a kind of), Troponym (particular ways to …), Meronym (parts of …), about 25 relations in total  Also available for online navigation

6 WordNet online - by Princeton University

7 WordNet Browser

8 WordNet (working)  WSD:  Corpus based approaches  Set of samples that enables the system  Knowledge based approaches  Machine readable dictionary with relations  WordNet Research  Open source Ranking of synsets derived from word frequencies in the British National Corpus Top 1000  Content manipulation of text Dataset I – controlled and calibrated study Dataset II – collected using Mechanical Turk using pairs

9 Word Sense Disambiguation (WSD)  Task of determining the meaning of an ambiguous word in the given context  Example: “bank” can mean the edge of a river or a financial institution that accepts money  Refers to the resolution of lexical semantic ambiguity; its goal is to attribute the correct senses to words (an AI-complete problem)
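
The “bank” example can be made concrete with a minimal sketch of a simplified Lesk-style disambiguator, one common knowledge-based approach. This is an illustration only, not the method used in the presentation: the two glosses, the stopword list, and the names `SENSES`, `tokens`, and `simplified_lesk` are all hypothetical, and the glosses are not copied from WordNet.

```python
# Toy sense inventory (hypothetical glosses, not taken from WordNet).
SENSES = {
    "bank": {
        "river_bank": "sloping land at the edge of a river or lake",
        "financial_bank": "financial institution that accepts money deposits and makes loans",
    }
}

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "the", "of", "or", "and", "at", "that",
             "to", "in", "on", "by", "he", "she"}


def tokens(text):
    """Lowercase the text, strip punctuation, and drop stopwords."""
    words = {w.strip(".,;:!?\"'").lower() for w in text.split()}
    return words - STOPWORDS - {""}


def simplified_lesk(word, context):
    """Pick the sense whose gloss shares the most words with the context.

    Ties are resolved in favour of the sense listed first.
    """
    context_words = tokens(context)
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context_words & tokens(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk("bank", "She sat on the bank of the river"))
print(simplified_lesk("bank", "He deposited money at the bank"))
```

A real system would take its glosses and relations from the dictionary itself; the overlap count here only shows the idea of matching context against sense definitions.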

10 WSD: Area of Research  Assigning the correct sense to words, using an electronic dictionary as the source of word definitions  Open research field in Natural Language Processing (NLP)  A hard problem and a popular area for research  Used in speech synthesis to identify the correct sense of a word

11 JavaScript Visual WordNet

12 Visual Thesaurus

13 WordNet – Theoretical aspects  Wordnet – word sense ontology  Symbols are words  Synset: list of words and semantic relations between them  Word sense disambiguation Wordnet structure using latent semantics Variable lexical notation for a concept Citibase – Thesaurus Semantic relatedness And few others…

14 WSD: using latent semantics  Measures the semantic distance of concepts  Relatedness and between-ness are calculated  Matrix form of wordnet data structure is used  Can be used to integrate with other applications  Uses Singular Value Decomposition (SVD) algorithm  Example: Multiple synsets are  {car, gondola}  {car, railway car}  {car, automobile} {Motor vehicle}, {Coupe}, {Sedan}, {Taxi}
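
For reference, the standard truncated Singular Value Decomposition that latent-semantic methods rely on can be written as below. The matrix $A$ (for instance, a matrix form of the WordNet data structure) and the rank $k$ are assumptions here, since the slide does not say how either is chosen.

```latex
A = U \Sigma V^{\top}, \qquad
A_k = U_k \Sigma_k V_k^{\top} \quad (k < \operatorname{rank} A)
```

Here $U_k$ and $V_k$ keep the first $k$ singular vectors and $\Sigma_k$ the $k$ largest singular values, so $A_k$ is the best rank-$k$ approximation of $A$ in the least-squares sense.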

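15 MDS-example  Geodesic Distance Matrix (13 nodes):

      1  2  3  4  5  6  7  8  9 10 11 12 13
  1   0  1  1  1  2  2  3  1  1  2  4  2  2
  2   1  0  2  2  1  2  3  2  2  3  4  3  3
  3   1  2  0  2  3  3  4  2  2  3  5  3  3
  4   1  2  2  0  3  2  3  2  2  1  4  1  3
  5   2  1  3  3  0  1  2  2  2  2  3  3  3
  6   2  2  3  2  1  0  1  1  1  1  2  2  2
  7   3  3  4  3  2  1  0  2  2  2  1  3  3
  8   1  2  2  2  2  1  2  0  2  2  3  3  1
  9   1  2  2  2  2  1  2  2  0  2  3  3  1
 10   2  3  3  1  2  1  2  2  2  0  3  1  3
 11   4  4  5  4  3  2  1  3  3  3  0  4  4
 12   2  3  3  1  3  2  3  3  3  1  4  0  4
 13   2  3  3  3  3  2  3  1  1  3  4  4  0

 Pipeline shown on the slide: Geodesic Distance Matrix, then MDS, then k-means  Resulting groups: {1, 2, 3, 4, 10, 12} and {5, 6, 7, 8, 9, 11, 13}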
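
A geodesic distance matrix like the one on this slide can be computed with a breadth-first search from every node. The sketch below is a hedged illustration: the edge list `EDGES` is an inference (the pairs of nodes whose distance in the slide's matrix is 1), not something the slide states, and `geodesic_distances` is a hypothetical helper name. The MDS and k-means steps are not reproduced here.

```python
from collections import deque

# Assumed edge list: node pairs at distance 1 in the slide's matrix.
EDGES = [
    (1, 2), (1, 3), (1, 4), (1, 8), (1, 9), (2, 5),
    (4, 10), (4, 12), (5, 6), (6, 7), (6, 8), (6, 9),
    (6, 10), (7, 11), (8, 13), (9, 13), (10, 12),
]


def geodesic_distances(edges):
    """Return dist[a][b] = number of edges on a shortest path from a to b."""
    # Build an undirected adjacency map.
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    dist = {}
    for start in adjacency:
        # Standard BFS: the first time a node is reached gives its distance.
        seen = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen[neighbour] = seen[node] + 1
                    queue.append(neighbour)
        dist[start] = seen
    return dist


dist = geodesic_distances(EDGES)
for row in sorted(dist):
    print(row, [dist[row][col] for col in sorted(dist)])
```

Each printed row can be compared against the corresponding row of the matrix; the largest entry, between nodes 3 and 11, is the diameter of this small graph.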

16 WSD: using latent semantics

17 WSD: variable lexical notations for a concept  Generic concept notation:
$D = I \cup J \cup K$
$\therefore J = D - (I \cup K) = (D - I) \cap (D - K) = D \cap \overline{(I \cup K)}$
$J = D \cap (\bar{I} \cap \bar{K})$
since $B = D \cup E \cup F$,
$D = B - (E \cup F) = (B - E) \cap (B - F) = B \cap \overline{(E \cup F)}$
$D = B \cap (\bar{E} \cap \bar{F})$
Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications

18 WSD: variable lexical notations for a concept
$J = D \cap (\bar{I} \cap \bar{K}) = \left( B \cap (\bar{E} \cap \bar{F}) \right) \cap (\bar{I} \cap \bar{K})$
$J = B \cap \left( (\bar{E} \cap \bar{F}) \cap (\bar{I} \cap \bar{K}) \right)$
when J = fly, D = fish lure, I = spinner, K = troll
And introducing boolean operators: AND for $\cap$, OR for $\cup$, NOT for the complement (overbar)
Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications

19 WSD: variable lexical notations for a concept  “fly” becomes: (“fisherman's lure” OR “fish lure”) AND ((NOT “spinner”) AND (NOT “troll”)) then, with B = lure, E = ground bait, F = stool pigeon  “fly” becomes: (“bait” OR “decoy” OR “lure”) AND ((NOT “ground bait”) AND (NOT “stool pigeon”)) AND ((NOT “spinner”) AND (NOT “troll”)) Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
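
The construction on these slides (name the parent concept, then exclude its other children) can be sketched as a small query builder. This is a minimal illustration under assumptions: the function names `or_group`, `not_group`, and `concept_query` are hypothetical, and the output format simply mirrors the boolean expressions shown on the slide.

```python
def or_group(terms):
    """Join alternative names of the parent concept with OR."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def not_group(terms):
    """Exclude every sibling concept: (NOT a) AND (NOT b) ..."""
    return "(" + " AND ".join(f'(NOT "{t}")' for t in terms) + ")"


def concept_query(parent_terms, sibling_levels):
    """Build: parent names AND one exclusion group per level of siblings."""
    parts = [or_group(parent_terms)]
    parts += [not_group(level) for level in sibling_levels]
    return " AND ".join(parts)


# One level up: "fly" is a fish lure that is neither a spinner nor a troll.
print(concept_query(["fisherman's lure", "fish lure"],
                    [["spinner", "troll"]]))

# Two levels up: also exclude the siblings of "fish lure" under "lure".
print(concept_query(["bait", "decoy", "lure"],
                    [["ground bait", "stool pigeon"], ["spinner", "troll"]]))
```

Each extra level climbs one step up the hierarchy and adds one more group of excluded siblings, which is why the second expression has two NOT groups.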

20 Thesaurus as a complex network  As a Directed Graph  Sinks: the 73,046 terms with k_out = 0  Sources: the 30,260 terms with at least one outgoing link (k_out > 0), the root words  Absolute source: no incoming links (k_in = 0)  Normal source: k_out > 0 and k_in > 0  Bridge source: no outgoing links to root words (k_out(source) = 0)  Figure legend: 1 – Normal source, 2 – Bridge source, 3 – Absolute source, 4 – Sink Source: arXiv:cond-mat/0312586 v1 2003
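
The node categories above can be sketched as a classification over a directed edge list. This is a hedged illustration: the toy edges are invented, `classify_nodes` is a hypothetical name, and the order in which overlapping categories are tested (sink, then absolute source, then bridge source, then normal source) is an assumption, since the slide does not say how a node matching several definitions is labelled.

```python
def classify_nodes(edges):
    """Label each node of a directed graph with one of the slide's categories."""
    out_links, in_degree = {}, {}
    for src, dst in edges:
        out_links.setdefault(src, set()).add(dst)
        out_links.setdefault(dst, set())
        in_degree[dst] = in_degree.get(dst, 0) + 1
        in_degree.setdefault(src, 0)

    # Sources (root words) are the nodes with at least one outgoing link.
    sources = {n for n, targets in out_links.items() if targets}

    roles = {}
    for node, targets in out_links.items():
        if not targets:
            roles[node] = "sink"                 # k_out = 0
        elif in_degree[node] == 0:
            roles[node] = "absolute source"      # k_in = 0
        elif not (targets & sources):
            roles[node] = "bridge source"        # links only to sinks
        else:
            roles[node] = "normal source"        # k_out > 0 and k_in > 0
    return roles


# Invented toy thesaurus: an edge a -> b means "b is listed under a".
toy_edges = [
    ("happy", "glad"), ("glad", "happy"), ("glad", "joyful"),
    ("elated", "happy"), ("happy", "merry"), ("merry", "jolly"),
]
for word, role in sorted(classify_nodes(toy_edges).items()):
    print(word, "->", role)
```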

21 WSD: Semantic relatedness and word sense disambiguation Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications  Concepts that occur more frequently and closer to each other are “more related” to each other than concepts that appear less frequently and farther apart

22 WordNet Relationship  Semantic relatedness  Involves relationships among words car-wheel (meronym) hot-cold (antonym) pencil-paper (functional) penguin-antarctica (association) Bank-trust company (synonym)  Probability and Distance calculation  Frequency of synsets or words  Performance in NLP applications

23 WordNet Relationship Browser

24 WordNet Connect  Program to find all possible connections between two words in WordNet  Used in computing Semantic Opposition among word sense ontology  The WordNet lexical database is used to read the semantic relations  Capabilities such as the number of paths, the shortest path, and the overall network structure are studied
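
The idea of finding every connection between two words can be sketched as a depth-first enumeration of simple paths in a relation graph. This is not the WordNet Connect program itself: the four toy relations, the helper names `build_graph` and `all_simple_paths`, and the `max_edges` limit are all assumptions made for illustration.

```python
def build_graph(relations):
    """Turn (word, word) relation pairs into an undirected adjacency map."""
    graph = {}
    for a, b in relations:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def all_simple_paths(graph, start, goal, max_edges=6):
    """Enumerate every path from start to goal that repeats no word."""
    paths = []

    def dfs(node, path):
        if node == goal:
            paths.append(list(path))
            return
        if len(path) - 1 >= max_edges:   # path already uses max_edges links
            return
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in path:
                path.append(neighbour)
                dfs(neighbour, path)
                path.pop()

    dfs(start, [start])
    return paths


# Invented toy relations (for example, car-wheel as a meronym link).
toy_relations = [("car", "wheel"), ("car", "vehicle"),
                 ("vehicle", "truck"), ("truck", "wheel")]
graph = build_graph(toy_relations)

paths = all_simple_paths(graph, "car", "truck")
print("number of paths:", len(paths))
print("shortest path:", min(paths, key=len))
```

The length limit matters on a real lexical network, where the number of simple paths between two words grows very quickly with path length.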

25 WordNet Connect

28 Future work  WordNet structure in terms of complex network  Key assumptions  WordNet lexical dictionary analyzed under the scope of source node, target node with an additional reference node  Achieve a cost effective path which is conditionally related to mean reference node  Control the path traversal with a relation of focus  Include Common File Number to make it more efficient

29 Conclusion  A single visualization cannot reveal the entire structure of WordNet  There are different ways of analyzing the effectiveness of the overall system  A new method to evaluate the usefulness of the WordNet network structure

30 Questions and Comments

