Published by Hope Parker. Modified over 9 years ago.
1
WORD SENSE DISAMBIGUATION STUDY ON WORD NET ONTOLOGY Akilan Velmurugan Computer Networks – CS 790G
2
Overview: What is WSD?; How WordNet is analyzed as a complex network; What are the results; Project methodology; Area of study; Key findings/results; New approaches; Improvement techniques; Conclusion
3
Project Description. Objective: study of WSD; effects of WSD in word sense ontology; characteristics of WordNet. Results: how to match words with other words; parameters taken for the study of word sense; improve them by making necessary changes; study network characteristics.
4
WordNet - overview: Machine-readable semantic dictionary interlinked by semantic relations. Developed at Princeton University as a large lexical database for the English language. Most widely used linguistic resource. Free for public use (under Princeton's WordNet license). Forms a scale-free network with a small average shortest path, having words as nodes and concepts as links. Easily navigable.
5
WordNet (Structure): Shows relations among nouns, verbs, adjectives, and adverbs: synonym; hypernym (is a kind of …); hyponym (… is a kind of); troponym (particular ways to …); meronym (parts of …); about 25 relations in all. Also available for online navigation.
6
WordNet online - by Princeton University
7
WordNet Browser
8
WordNet (working). WSD approaches: corpus-based (a set of samples that enables the system); knowledge-based (a machine-readable dictionary with relations, such as WordNet). Research: open source; ranking of synsets derived from word frequencies in the British National Corpus (top 1000); content manipulation of text; Dataset I – controlled and calibrated study; Dataset II – collected using Mechanical Turk, using pairs.
9
Word Sense Disambiguation (WSD): The task of determining the meaning of an ambiguous word in a given context. Example: "bank" can mean the edge of a river or a financial institution that accepts money. WSD refers to the resolution of lexical semantic ambiguity; its goal is to attribute the correct senses to words (an AI-complete problem).
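The "bank" example can be made concrete with a small sketch. It uses a simplified Lesk-style gloss overlap, a technique the slides do not name, and a hand-written mini-dictionary; the glosses, the `SENSES` table, and the function names are assumptions for illustration, not WordNet data or the author's method.

```python
# Simplified Lesk-style overlap: pick the sense whose gloss shares the most
# words with the context. Glosses are hand-written for illustration and are
# NOT taken from WordNet.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "on", "and", "or",
             "that", "at", "he", "by", "from"}

SENSES = {  # hypothetical mini-dictionary
    "bank": {
        "river_edge": "sloping land at the edge of a river or lake",
        "financial_institution":
            "institution that accepts money deposits and makes loans",
    }
}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    words = (w.strip(".,;:!?\"'").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def disambiguate(word, context):
    """Return the sense label with the largest gloss/context word overlap.

    Ties are resolved in favour of the sense listed first.
    """
    context_words = tokenize(context)
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(tokenize(gloss) & context_words)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense
```

With these glosses, a context mentioning "money" selects the financial sense and a context mentioning "river" selects the river sense. A real system would draw glosses and relations from WordNet rather than from a hand-written table.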
10
WSD: Area of Research. Assigning the correct sense to words, using an electronic dictionary as the source of word definitions. An open research field in Natural Language Processing (NLP). A hard problem and a popular area for research. Used in speech synthesis to identify the correct sense of a word.
11
JavaScript Visual WordNet
12
Visual Thesaurus
13
WordNet – Theoretical aspects: WordNet as a word sense ontology; symbols are words; synset: a list of words and the semantic relations between them; word sense disambiguation; WordNet structure using latent semantics; variable lexical notation for a concept; Citibase – Thesaurus; semantic relatedness; and a few others…
14
WSD: using latent semantics. Measures the semantic distance between concepts. Relatedness and betweenness are calculated. A matrix form of the WordNet data structure is used. Can be integrated with other applications. Uses the Singular Value Decomposition (SVD) algorithm. Example: multiple synsets are {car, gondola}, {car, railway car}, {car, automobile}, {Motor vehicle}, {Coupe}, {Sedan}, {Taxi}.
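A minimal sketch of the SVD step, assuming a synset-by-feature matrix. The matrix entries, the feature columns, and the helper names `latent_vectors` and `cosine` are made up for illustration; the slides do not give the actual matrix or how it was built.

```python
import numpy as np

# Hypothetical synset-by-feature matrix: rows are synsets, columns are
# made-up relation/context features (1 = feature applies). Not WordNet data.
synsets = ["car/automobile", "coupe", "sedan", "taxi",
           "car/railway car", "car/gondola"]
X = np.array([
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
], dtype=float)


def latent_vectors(matrix, k):
    """Project each row onto the top-k singular directions (truncated SVD)."""
    U, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :k] * s[:k]


def cosine(a, b):
    """Cosine similarity between two latent vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


Z = latent_vectors(X, k=2)
sim_road = cosine(Z[1], Z[2])   # coupe vs sedan
sim_mixed = cosine(Z[1], Z[5])  # coupe vs gondola
```

Rows that are identical in the input (here coupe and sedan) map to the same latent vector, so their cosine similarity is 1; other pairs are compared in the reduced space rather than on raw feature overlap.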
15
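MDS example: geodesic distance matrix (13 nodes)
      1  2  3  4  5  6  7  8  9 10 11 12 13
 1    0  1  1  1  2  2  3  1  1  2  4  2  2
 2    1  0  2  2  1  2  3  2  2  3  4  3  3
 3    1  2  0  2  3  3  4  2  2  3  5  3  3
 4    1  2  2  0  3  2  3  2  2  1  4  1  3
 5    2  1  3  3  0  1  2  2  2  2  3  3  3
 6    2  2  3  2  1  0  1  1  1  1  2  2  2
 7    3  3  4  3  2  1  0  2  2  2  1  3  3
 8    1  2  2  2  2  1  2  0  2  2  3  3  1
 9    1  2  2  2  2  1  2  2  0  2  3  3  1
10    2  3  3  1  2  1  2  2  2  0  3  1  3
11    4  4  5  4  3  2  1  3  3  3  0  4  4
12    2  3  3  1  3  2  3  3  3  1  4  0  4
13    2  3  3  3  3  2  3  1  1  3  4  4  0
Pipeline: geodesic distance matrix → MDS → k-means
Groups shown: {1, 2, 3, 4, 10, 12} and {5, 6, 7, 8, 9, 11, 13}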
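The pipeline named on this slide (geodesic distance matrix, then MDS, then k-means) might be sketched as follows. The matrix is the one read from the slide; the choice of classical (Torgerson) MDS, two dimensions, two clusters, and the farthest-pair seeding are assumptions, so the resulting labels are not guaranteed to match the groups on the slide.

```python
import numpy as np

# Geodesic distance matrix as read from the slide (13 nodes, symmetric).
D = np.array([
    [0, 1, 1, 1, 2, 2, 3, 1, 1, 2, 4, 2, 2],
    [1, 0, 2, 2, 1, 2, 3, 2, 2, 3, 4, 3, 3],
    [1, 2, 0, 2, 3, 3, 4, 2, 2, 3, 5, 3, 3],
    [1, 2, 2, 0, 3, 2, 3, 2, 2, 1, 4, 1, 3],
    [2, 1, 3, 3, 0, 1, 2, 2, 2, 2, 3, 3, 3],
    [2, 2, 3, 2, 1, 0, 1, 1, 1, 1, 2, 2, 2],
    [3, 3, 4, 3, 2, 1, 0, 2, 2, 2, 1, 3, 3],
    [1, 2, 2, 2, 2, 1, 2, 0, 2, 2, 3, 3, 1],
    [1, 2, 2, 2, 2, 1, 2, 2, 0, 2, 3, 3, 1],
    [2, 3, 3, 1, 2, 1, 2, 2, 2, 0, 3, 1, 3],
    [4, 4, 5, 4, 3, 2, 1, 3, 3, 3, 0, 4, 4],
    [2, 3, 3, 1, 3, 2, 3, 3, 3, 1, 4, 0, 4],
    [2, 3, 3, 3, 3, 2, 3, 1, 1, 3, 4, 4, 0],
], dtype=float)


def classical_mds(dist, dims=2):
    """Classical MDS: double-centre squared distances, keep top eigenvectors."""
    n = dist.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    B = -0.5 * J @ (dist ** 2) @ J
    vals, vecs = np.linalg.eigh(B)               # ascending eigenvalues
    order = np.argsort(vals)[::-1][:dims]        # indices of the largest ones
    top_vals = np.maximum(vals[order], 0.0)      # graph distances may be non-Euclidean
    return vecs[:, order] * np.sqrt(top_vals)


def kmeans_two(points, i0, j0, iters=50):
    """Lloyd's algorithm with two clusters, seeded at points i0 and j0."""
    centers = points[[i0, j0]].copy()
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for c in range(2):
            if np.any(labels == c):              # keep old centre if cluster is empty
                centers[c] = points[labels == c].mean(axis=0)
    return labels


coords = classical_mds(D)
i0, j0 = np.unravel_index(D.argmax(), D.shape)   # seed with the two farthest nodes
labels = kmeans_two(coords, i0, j0)
```

Printing `[(node + 1, int(lab)) for node, lab in enumerate(labels)]` shows the assignment per slide node number, which can then be compared with the two groups listed on the slide.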
16
WSD: using latent semantics
17
WSD: variable lexical notations for a concept. Generic concept notation: $D = I \cup J \cup K$ $\therefore J = D - (I \cup K) = (D - I) \cap (D - K) = D \cap \overline{(I \cup K)}$, so $J = D \cap (\overline{I} \cap \overline{K})$. Since $B = D \cup E \cup F$, $D = B - (E \cup F) = (B - E) \cap (B - F) = B \cap \overline{(E \cup F)}$, so $D = B \cap (\overline{E} \cap \overline{F})$. Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
18
WSD: variable lexical notations for a concept. $J = D \cap (\overline{I} \cap \overline{K}) = (B \cap (\overline{E} \cap \overline{F})) \cap (\overline{I} \cap \overline{K})$, so $J = B \cap ((\overline{E} \cap \overline{F}) \cap (\overline{I} \cap \overline{K}))$. When J = fly, D = fish lure, I = spinner, K = troll, and introducing Boolean operators: AND for $\cap$, OR for $\cup$, NOT for the complement (overbar). Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
19
WSD: variable lexical notations for a concept. (“fly”) becomes: (“fisherman's lure” OR “fish lure”) AND ((NOT “spinner”) AND (NOT “troll”)). Then, with B = lure, E = ground bait, F = stool pigeon, (“fly”) becomes: (“bait” OR “decoy” OR “lure”) AND (((NOT “ground bait”) AND (NOT “stool pigeon”)) AND ((NOT “spinner”) AND (NOT “troll”))). Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
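The Boolean expansion of "fly" follows a mechanical pattern: OR the names of the parent concept, then AND the negation of each sibling at every level. A minimal sketch of that pattern is below; the function names `quote`, `exclusion`, and `expand` are hypothetical, and the term lists are typed in by hand rather than read from WordNet.

```python
def quote(term):
    """Wrap a term in double quotes for use in a Boolean query."""
    return f'"{term}"'


def exclusion(siblings):
    """AND together the negation of each sibling term."""
    return "(" + " AND ".join(f"(NOT {quote(s)})" for s in siblings) + ")"


def expand(parent_names, sibling_groups):
    """Build (name1 OR name2 ...) AND (exclusions for each sibling group)."""
    head = "(" + " OR ".join(quote(n) for n in parent_names) + ")"
    parts = [exclusion(group) for group in sibling_groups]
    tail = parts[0] if len(parts) == 1 else "(" + " AND ".join(parts) + ")"
    return f"{head} AND {tail}"


# One level up: D = fish lure, siblings I = spinner, K = troll.
one_level = expand(["fisherman's lure", "fish lure"], [["spinner", "troll"]])

# Two levels up: B = lure, siblings E, F at that level plus I, K below.
two_levels = expand(
    ["bait", "decoy", "lure"],
    [["ground bait", "stool pigeon"], ["spinner", "troll"]],
)
```

Each extra level of the hierarchy adds one more sibling group to the AND clause, mirroring how the set expression for J grows when D is replaced by its own expansion in terms of B.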
20
Thesaurus as a complex network. As a directed graph: sinks are the 73,046 terms with kout = 0; sources are the 30,260 terms with at least one outgoing link (kout > 0), the root words. Absolute source: without incoming links (kin = 0). Normal source: kout > 0 and kin > 0. Bridge source: without outgoing links to root words (kout(source) = 0). Figure labels: 1 – normal source, 2 – bridge source, 3 – absolute source, 4 – sink. Source: arXiv:cond-mat/0312586 v1 2003
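The four node types can be assigned from in- and out-degree alone. The sketch below assumes the thesaurus is given as a list of directed (term, linked term) pairs; the function name, the toy edges, and the precedence used when a node fits more than one definition (absolute is checked before bridge) are assumptions, not taken from the cited paper.

```python
from collections import defaultdict


def classify_terms(edges):
    """Label each term as sink, absolute source, bridge source, or normal source.

    edges: iterable of (source_term, target_term) directed links.
    """
    out_links = defaultdict(set)
    in_degree = defaultdict(int)
    nodes = set()
    for src, dst in edges:
        nodes.update((src, dst))
        if dst not in out_links[src]:        # ignore duplicate links
            out_links[src].add(dst)
            in_degree[dst] += 1

    labels = {}
    for node in nodes:
        targets = out_links.get(node, set())
        if not targets:                       # kout = 0
            labels[node] = "sink"
        elif in_degree.get(node, 0) == 0:     # kout > 0, kin = 0
            labels[node] = "absolute source"
        elif all(not out_links.get(t) for t in targets):
            labels[node] = "bridge source"    # links only to sinks
        else:
            labels[node] = "normal source"    # kout > 0, kin > 0
    return labels


# Toy edges for illustration only.
toy_edges = [("vast", "large"), ("large", "big"), ("big", "large"),
             ("large", "wide"), ("wide", "huge")]
toy_labels = classify_terms(toy_edges)
```

In the toy graph, "huge" has no outgoing links and is a sink, "vast" has no incoming links and is an absolute source, "wide" links only to a sink and is a bridge source, and "large" and "big" are normal sources.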
21
WSD: Semantic relatedness and word sense disambiguation. Concepts that occur more frequently and closer to each other are “more related” than concepts that appear less frequently and farther apart. Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
22
WordNet Relationship. Semantic relatedness involves relationships among words: car-wheel (meronym), hot-cold (antonym), pencil-paper (functional), penguin-Antarctica (association), bank-trust company (synonym). Probability and distance calculation. Frequency of synsets or words. Performance in NLP applications.
23
WordNet Relationship Browser
24
WordNet Connect: A program to find all possible connections between two words in WordNet. Used in computing semantic opposition in the word sense ontology. The WordNet lexical database is used to read the semantic relations. Capabilities such as the number of paths, the shortest path, and the overall network structure are studied.
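Finding the shortest connection and counting all connections between two words are standard graph searches. The sketch below is not the WordNet Connect program; it runs breadth-first and depth-first search over a small hand-made relation graph, and the `GRAPH` contents and function names are assumptions for illustration.

```python
from collections import deque

# Hypothetical undirected relation graph; edges are illustrative, not WordNet data.
GRAPH = {
    "car": ["wheel", "motor vehicle"],
    "wheel": ["car", "bicycle"],
    "motor vehicle": ["car", "vehicle"],
    "bicycle": ["wheel", "vehicle"],
    "vehicle": ["motor vehicle", "bicycle"],
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path or None if unreachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def all_simple_paths(graph, start, goal, path=None):
    """Depth-first enumeration of every path that visits no word twice."""
    path = (path or []) + [start]
    if start == goal:
        return [path]
    found = []
    for nxt in graph.get(start, []):
        if nxt not in path:
            found.extend(all_simple_paths(graph, nxt, goal, path))
    return found
```

For example, `shortest_path(GRAPH, "car", "vehicle")` returns the two-hop route through "motor vehicle", while `all_simple_paths(GRAPH, "car", "vehicle")` also finds the longer route through "wheel" and "bicycle". Enumerating all simple paths grows quickly with graph size, so on the full WordNet graph it would need a length limit.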
25
WordNet Connect
28
Future work: WordNet structure in terms of a complex network. Key assumptions: the WordNet lexical dictionary is analyzed under the scope of a source node and a target node, with an additional reference node; achieve a cost-effective path that is conditionally related to the mean reference node; control the path traversal with a relation of focus; include Common File Number to make it more efficient.
29
Conclusion: A single visualization cannot reveal the entire structure of WordNet. There are different ways of analyzing the effectiveness of the overall system. A new method to evaluate the usefulness of the WordNet network structure.
30
Questions and Comments