WORD SENSE DISAMBIGUATION STUDY ON WORDNET ONTOLOGY Akilan Velmurugan Computer Networks – CS 790G
Overview: What is WSD?; How WordNet is analyzed as a complex network; What are the results; Project methodology; Area of study; Key findings/results; New approaches; Improvement techniques; Conclusion
Project Description: Objective: study of WSD; effects of WSD in a word sense ontology; characteristics of WordNet; results. How to match words with other words; parameters taken for the study of word sense; improve them by making the necessary changes; study network characteristics.
WordNet - overview: Machine-readable semantic dictionary interlinked by semantic relations; developed at Princeton University as a large lexical database for the English language; the most widely used linguistic resource; freely available to the public (under Princeton's own permissive WordNet license); forms a scale-free network with a small average shortest path, with words as nodes and the concept relations between them as links; easily navigable.
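WordNet (Structure): Organizes words by part of speech (noun, verb, adjective, adverb) and links them through relations: synonym; hypernym (is a kind of …: "vehicle" is a hypernym of "car"); hyponym (… is a kind of: "car" is a hyponym of "vehicle"); troponym (particular ways to …: "stroll" is a troponym of "walk"); meronym (parts of …: "wheel" is a meronym of "car"); about 25 relations in total. Also available for online navigation.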
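The relation structure above can be pictured as synsets carrying typed pointers to other synsets. The sketch below is a minimal illustration only: the `Synset` class, the five-entry `LEXICON`, and the helper names are hypothetical and are not WordNet's actual file format or any library's API. It follows hypernym links upward and derives hyponyms as the inverse relation.

```python
from dataclasses import dataclass, field

@dataclass
class Synset:
    name: str
    lemmas: list[str]
    # relation name -> names of target synsets (e.g. "hypernym", "meronym")
    relations: dict[str, list[str]] = field(default_factory=dict)

# Hypothetical miniature lexicon; names and links are illustrative only.
LEXICON = {
    "entity": Synset("entity", ["entity"]),
    "vehicle": Synset("vehicle", ["vehicle"], {"hypernym": ["entity"]}),
    "motor_vehicle": Synset("motor_vehicle", ["motor vehicle"], {"hypernym": ["vehicle"]}),
    "car": Synset("car", ["car", "automobile"],
                  {"hypernym": ["motor_vehicle"], "meronym": ["wheel"]}),
    "wheel": Synset("wheel", ["wheel"], {"hypernym": ["entity"]}),
}

def hypernym_chain(name: str) -> list[str]:
    """Follow the first hypernym link upward until a synset with no hypernym is reached."""
    chain = [name]
    while LEXICON[chain[-1]].relations.get("hypernym"):
        chain.append(LEXICON[chain[-1]].relations["hypernym"][0])
    return chain

def hyponyms(name: str) -> list[str]:
    """Inverse of hypernym: synsets whose hypernym list contains `name`."""
    return [s.name for s in LEXICON.values() if name in s.relations.get("hypernym", [])]

print(hypernym_chain("car"))  # ['car', 'motor_vehicle', 'vehicle', 'entity']
```

Storing only one direction (hypernym) and computing the inverse on demand keeps the toy data consistent; a real lexicon would typically store both directions for speed.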
WordNet online - by Princeton University
WordNet Browser
WordNet (working): WSD approaches. Corpus-based approaches: a set of samples from which the system learns. Knowledge-based approaches: a machine-readable dictionary with relations, such as WordNet. Research: open source; ranking of synsets derived from word frequencies in the British National Corpus; top 1000 synsets; content manipulation of text; Dataset I – controlled and calibrated study; Dataset II – collected with Amazon Mechanical Turk, using word pairs.
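The synset ranking mentioned above can be sketched as follows. This is an assumption-laden illustration: the counts in `word_freq` are invented stand-ins for British National Corpus frequencies (not real data), the synset names are hypothetical, and scoring a synset by the *sum* of its lemma frequencies is one plausible choice, not necessarily the one used in the study.

```python
from collections import Counter

# Hypothetical word counts standing in for BNC frequencies (not real data).
word_freq = Counter({"car": 120, "automobile": 15, "bank": 90,
                     "lure": 4, "decoy": 3, "fly": 40})

# Hypothetical synsets, each a tuple of lemmas.
synsets = {
    "car.n.01": ("car", "automobile"),
    "bank.n.01": ("bank",),
    "lure.n.01": ("lure", "decoy"),
    "fly.n.01": ("fly",),
}

def rank_synsets(synsets, word_freq, top_k):
    """Score each synset by the summed frequency of its lemmas; return the top_k names."""
    # Counter returns 0 for unseen words, so missing lemmas simply add nothing.
    scores = {name: sum(word_freq[w] for w in lemmas) for name, lemmas in synsets.items()}
    return [name for name, _ in Counter(scores).most_common(top_k)]

print(rank_synsets(synsets, word_freq, top_k=2))  # ['car.n.01', 'bank.n.01']
```

With a real frequency list, `top_k=1000` would give a "top 1000" cut of the kind the slide mentions.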
Word Sense Disambiguation (WSD): The task of determining the meaning of an ambiguous word in a given context. Example: "bank" can mean the edge of a river or a financial institution that accepts money. WSD refers to the resolution of lexical semantic ambiguity; its goal is to attribute the correct sense to each word (an AI-complete problem).
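The "bank" example can be made concrete with a knowledge-based method. The sketch below uses the simplified Lesk algorithm, a standard dictionary-overlap technique swapped in here for illustration; it is not the method of this study. The glosses are short paraphrases written for this example (not WordNet's text), and the stopword list is a hypothetical minimal one.

```python
import re

STOPWORDS = {"a", "an", "the", "of", "at", "to", "i", "that", "or", "and", "on", "in", "my"}

# Hypothetical short glosses for two senses of "bank" (paraphrased, not WordNet text).
SENSES = {
    "bank/river": "sloping land beside a body of water such as a river",
    "bank/finance": "financial institution that accepts deposits of money and lends money",
}

def tokens(text: str) -> set[str]:
    """Lowercase, split on non-letters, drop stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def simplified_lesk(context: str, senses: dict[str, str]) -> str:
    """Pick the sense whose gloss shares the most words with the context."""
    ctx = tokens(context)
    return max(senses, key=lambda s: len(tokens(senses[s]) & ctx))

print(simplified_lesk("I deposited money at the bank", SENSES))          # bank/finance
print(simplified_lesk("We fished from the bank of the river", SENSES))   # bank/river
```

Exact word overlap is brittle ("deposited" does not match "deposits"); stemming or WordNet relation expansion are common refinements.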
WSD: Area of Research: Assigning the correct sense to words, using an electronic dictionary as the source of word definitions. An open research field in Natural Language Processing (NLP); a hard problem and therefore a popular area of research. Used in speech synthesis, where identifying the correct sense of a word (e.g. "bass" the fish vs. "bass" the instrument) determines its pronunciation.
JavaScript Visual WordNet
Visual Thesaurus
WordNet – Theoretical aspects: WordNet as a word sense ontology; symbols are words; synset: a set of synonymous words, linked to other synsets by semantic relations; word sense disambiguation; WordNet structure using latent semantics; variable lexical notation for a concept; Citibase – thesaurus; semantic relatedness; and a few others…
WSD: using latent semantics: Measures the semantic distance between concepts; relatedness and betweenness are calculated; a matrix form of the WordNet data structure is used; can be integrated with other applications; uses the Singular Value Decomposition (SVD) algorithm. Example: "car" belongs to multiple synsets, {car, gondola}, {car, railway car}, {car, automobile}, with related synsets {motor vehicle}, {coupe}, {sedan}, {taxi}.
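A minimal sketch of the SVD step, assuming NumPy is available. The term-by-context matrix is a hypothetical construction from the slide's "car" synsets: each column is one context, and the third column lumps {car, automobile} together with its related synsets. Terms are projected into a k-dimensional latent space (rows of U_k S_k) and compared by cosine similarity. This illustrates generic latent semantic analysis, not the exact matrix used in the cited work.

```python
import numpy as np

# Rows: terms; columns: hypothetical contexts built from the slide's car synsets.
terms = ["car", "gondola", "railway car", "automobile",
         "motor vehicle", "coupe", "sedan", "taxi"]
A = np.array([
    [1, 1, 1],  # car appears in all three contexts
    [1, 0, 0],  # gondola: {car, gondola}
    [0, 1, 0],  # railway car: {car, railway car}
    [0, 0, 1],  # automobile and its related synsets: third context
    [0, 0, 1],
    [0, 0, 1],
    [0, 0, 1],
    [0, 0, 1],
], dtype=float)

def latent_term_vectors(A: np.ndarray, k: int) -> np.ndarray:
    """Truncated SVD: keep the k largest singular values; term coordinates = U_k * S_k."""
    U, S, _ = np.linalg.svd(A, full_matrices=False)  # S is sorted in descending order
    return U[:, :k] * S[:k]

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

X = latent_term_vectors(A, k=2)
idx = {t: i for i, t in enumerate(terms)}
print(cosine(X[idx["automobile"]], X[idx["sedan"]]))
print(cosine(X[idx["automobile"]], X[idx["gondola"]]))
```

Choosing k trades detail for generalization: with k = 2 here, the dimension that distinguishes "gondola" from "railway car" is discarded.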
MDS example (figure): geodesic distance matrix, then MDS, then k-means; resulting node groups {…, 2, 3, 4, 10, 12} and {5, 6, 7, 8, 9, 11, 13}
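The pipeline in the figure (geodesic distance matrix, then MDS, then k-means) can be sketched end to end, assuming NumPy is available. The six-node graph (two triangles joined by one edge) is hypothetical; classical (Torgerson) MDS and Lloyd's k-means are standard techniques chosen here for illustration. The sketch assumes a connected graph and that no cluster becomes empty.

```python
from collections import deque
import numpy as np

# Hypothetical graph: two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
N = 6

def geodesic_matrix(n, edges):
    """All-pairs shortest-path lengths (hop counts) via BFS from every node."""
    adj = {i: [] for i in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    D = np.zeros((n, n))
    for s in range(n):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        for t, d in dist.items():
            D[s, t] = d
    return D

def classical_mds(D, k):
    """Double-center the squared distances and keep the top-k eigenvectors."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    vals, vecs = np.linalg.eigh(B)           # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]       # indices of the k largest
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))

def kmeans(X, k, iters=20):
    """Lloyd's k-means with a deterministic farthest-point initialisation."""
    centers = [X[0]]
    while len(centers) < k:
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels

labels = kmeans(classical_mds(geodesic_matrix(N, EDGES), k=2), k=2)
print(labels)
```

Geodesic (hop-count) distances are used because graph nodes have no coordinates of their own; MDS supplies coordinates so that an ordinary clustering algorithm can run.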
WSD: using latent semantics
WSD: variable lexical notations for a concept: Generic concept notation: $D = I \cup J \cup K$, therefore $J = D - (I \cup K) = (D - I) \cap (D - K) = D \cap \overline{(I \cup K)}$, i.e. $J = D \cap (\bar{I} \cap \bar{K})$. Since $B = D \cup E \cup F$: $D = B - (E \cup F) = (B - E) \cap (B - F) = B \cap \overline{(E \cup F)}$, i.e. $D = B \cap (\bar{E} \cap \bar{F})$. Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
WSD: variable lexical notations for a concept: $J = D \cap (\bar{I} \cap \bar{K}) = (B \cap (\bar{E} \cap \bar{F})) \cap (\bar{I} \cap \bar{K})$, so $J = B \cap ((\bar{E} \cap \bar{F}) \cap (\bar{I} \cap \bar{K}))$, where J = fly, D = fish lure, I = spinner, K = troll. Introducing Boolean operators: AND for ∩, OR for ∪, NOT for the complement (overbar). Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
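The set derivation can be checked mechanically on a toy example. In this sketch every concept is a hypothetical set of item labels, the sibling concepts are disjoint (which the derivation relies on), and complements are taken relative to an assumed universe `U`.

```python
# Hypothetical item sets for each concept (illustrative labels, not real data).
I = {"spinner_1", "spinner_2"}          # spinner
J = {"fly_1", "fly_2"}                  # fly
K = {"troll_1"}                         # troll
D = I | J | K                           # fish lure = I ∪ J ∪ K
E = {"ground_bait_1"}                   # ground bait
F = {"stool_pigeon_1"}                  # stool pigeon
B = D | E | F                           # lure = D ∪ E ∪ F
U = B | {"unrelated_item"}              # universe used for complements

def comp(S):
    """Complement relative to the universe U (the overbar)."""
    return U - S

# J = D ∩ (Ī ∩ K̄)
assert J == D & (comp(I) & comp(K))
# D = B ∩ (Ē ∩ F̄)
assert D == B & (comp(E) & comp(F))
# J = B ∩ ((Ē ∩ F̄) ∩ (Ī ∩ K̄))
assert J == B & ((comp(E) & comp(F)) & (comp(I) & comp(K)))
print("all three identities hold for these disjoint sets")
```

If the sibling sets overlapped (an item that is both a spinner and a fly), the first identity would fail, which is why disjointness matters.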
WSD: variable lexical notations for a concept: (“fly”) becomes: (“fisherman's lure” OR “fish lure”) AND ((NOT “spinner”) AND (NOT “troll”)). Then, with B = lure, E = ground bait, F = stool pigeon, (“fly”) becomes: (“bait” OR “decoy” OR “lure”) AND ((NOT “ground bait”) AND (NOT “stool pigeon”)) AND ((NOT “spinner”) AND (NOT “troll”)). Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
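The translation from set notation to a Boolean query is mechanical: the synonyms of the broader concept are OR-ed together, and each group of excluded sibling concepts becomes a conjunction of NOTs. The helper names below (`or_group`, `not_group`, `build_query`) are hypothetical, and the sketch uses straight quotes in the generated query.

```python
def or_group(terms):
    """Join quoted terms with OR inside parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

def not_group(terms):
    """Conjunction of negated terms: ((NOT "a") AND (NOT "b"))."""
    return "(" + " AND ".join(f'(NOT "{t}")' for t in terms) + ")"

def build_query(synonyms, *excluded_groups):
    """Synonyms of the broader concept AND each group of excluded sibling concepts."""
    parts = [or_group(synonyms)] + [not_group(g) for g in excluded_groups]
    return " AND ".join(parts)

q1 = build_query(["fisherman's lure", "fish lure"], ["spinner", "troll"])
q2 = build_query(["bait", "decoy", "lure"], ["ground bait", "stool pigeon"], ["spinner", "troll"])
print(q1)
print(q2)
```

Generating the query from lists also guarantees balanced parentheses, which is easy to get wrong by hand.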
Thesaurus as a complex network: modeled as a directed graph. Sink: the 73,046 terms with k_out = 0. Source: the 30,260 terms (root words) with at least one outgoing link (k_out > 0). Absolute source: no incoming links (k_in = 0); normal source: k_out > 0 and k_in > 0; bridge source: no outgoing links to root words (k_out(source) = 0). Figure legend: 1 – normal source, 2 – bridge source, 3 – absolute source, 4 – sink. Source: arXiv:cond-mat/ v1 2003
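A sketch of the node classification above on a hypothetical six-entry thesaurus graph. The order in which the tests are applied (sink, then absolute source, then bridge source, then normal source) is an assumption made here so that every node gets exactly one label; the cited work may resolve overlaps differently.

```python
# Hypothetical tiny thesaurus: root word -> words listed in its entry (directed edges).
GRAPH = {
    "angler": ["lure"],
    "lure": ["bait", "decoy", "fly"],
    "bait": ["lure", "worm"],
    "decoy": ["stool pigeon"],
    "fly": ["insect"],
}

def classify(graph):
    """Label each node as sink / absolute / bridge / normal source from k_in and k_out."""
    nodes = set(graph) | {w for targets in graph.values() for w in targets}
    k_out = {n: len(graph.get(n, [])) for n in nodes}
    k_in = {n: 0 for n in nodes}
    for targets in graph.values():
        for w in targets:
            k_in[w] += 1
    sources = {n for n in nodes if k_out[n] > 0}    # root words
    kinds = {}
    for n in nodes:
        if k_out[n] == 0:
            kinds[n] = "sink"
        elif k_in[n] == 0:
            kinds[n] = "absolute source"
        elif not any(w in sources for w in graph[n]):
            kinds[n] = "bridge source"              # links only to sinks
        else:
            kinds[n] = "normal source"
    return kinds

for word, kind in sorted(classify(GRAPH).items()):
    print(word, "->", kind)
```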
WSD: Semantic relatedness and word sense disambiguation. Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications. Concepts that occur more frequently and closer to each other are “more related” than concepts that appear less frequently and farther apart.
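One simple way to turn "more frequently and closer" into a number is to sum an inverse-distance weight over every co-occurrence of two words within a window. This scoring rule and the `window` parameter are assumptions made for illustration, not the formula from the cited proceedings.

```python
def relatedness(tokens, a, b, window=2):
    """Sum 1/distance over every occurrence pair of a and b that lies within the window."""
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [j for j, t in enumerate(tokens) if t == b]
    score = 0.0
    for i in pos_a:
        for j in pos_b:
            d = abs(i - j)
            if 0 < d <= window:
                score += 1.0 / d   # closer pairs contribute more
    return score

tokens = ["river", "bank", "water", "river", "bank", "money"]
print(relatedness(tokens, "river", "bank"))   # 2.5
print(relatedness(tokens, "river", "money"))  # 0.5
```

Both factors from the slide appear: more co-occurrences add more terms (frequency), and each term shrinks with distance (closeness).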
WordNet Relationship: Semantic relatedness involves relationships among words: car–wheel (meronym), hot–cold (antonym), pencil–paper (functional), penguin–Antarctica (association), bank–trust company (synonym). Approaches: probability and distance calculation; frequency of synsets or words; performance in NLP applications.
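The "probability and frequency" approach can be illustrated with Resnik similarity, a standard information-content measure named here as a swapped-in example: p(c) counts a concept together with everything below it, IC(c) = -log p(c), and two concepts score the IC of their most informative common ancestor. The taxonomy and counts are hypothetical.

```python
import math

# Hypothetical taxonomy (child -> parent) and raw concept counts (not real corpus data).
PARENT = {"object": None, "vehicle": "object", "car": "vehicle", "bicycle": "vehicle",
          "animal": "object", "penguin": "animal"}
COUNT = {"object": 0, "vehicle": 5, "car": 30, "bicycle": 15, "animal": 10, "penguin": 40}

def ancestors(c):
    """c itself plus every concept above it."""
    out = []
    while c is not None:
        out.append(c)
        c = PARENT[c]
    return out

def cumulative_counts():
    """A concept's frequency includes the counts of all concepts below it."""
    total = {c: 0 for c in PARENT}
    for c, n in COUNT.items():
        for a in ancestors(c):
            total[a] += n
    return total

TOTAL = cumulative_counts()
ROOT_TOTAL = TOTAL["object"]

def ic(c):
    """Information content -log p(c), written as log(1 / p(c))."""
    return math.log(ROOT_TOTAL / TOTAL[c])

def resnik(a, b):
    """IC of the most informative common ancestor (lowest common subsumer)."""
    common = set(ancestors(a)) & set(ancestors(b))
    return max(ic(c) for c in common)

print(resnik("car", "bicycle"))  # log 2, about 0.693
print(resnik("car", "penguin"))  # 0.0
```

Concepts that only share the root get zero similarity, because the root has probability 1 and carries no information.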
WordNet Relationship Browser
WordNet Connect: a program that finds all possible connections between two words in WordNet. Used to compute semantic opposition within the word sense ontology; the WordNet lexical database is used to read the semantic relations. Capabilities such as the number of paths, the shortest path, and the overall network structure are studied.
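The "number of paths" and "shortest path" capabilities can be sketched with a breadth-first search that also counts how many shortest paths reach each node. The graph below is a hypothetical undirected relation graph; this is a generic BFS sketch, not WordNet Connect's actual implementation.

```python
from collections import deque

# Hypothetical undirected relation graph between word senses.
GRAPH = {
    "car": ["automobile", "vehicle"],
    "automobile": ["car", "motor vehicle"],
    "vehicle": ["car", "motor vehicle"],
    "motor vehicle": ["automobile", "vehicle", "wheel"],
    "wheel": ["motor vehicle"],
}

def shortest_paths(graph, src, dst):
    """BFS returning (distance, number of shortest paths, one shortest path)."""
    dist, count, parent = {src: 0}, {src: 1}, {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:                  # first time v is reached
                dist[v] = dist[u] + 1
                count[v] = count[u]
                parent[v] = u
                queue.append(v)
            elif dist[v] == dist[u] + 1:       # another shortest route into v
                count[v] += count[u]
    if dst not in dist:
        return None, 0, []
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = parent[node]
    return dist[dst], count[dst], path[::-1]

print(shortest_paths(GRAPH, "car", "wheel"))
# (3, 2, ['car', 'automobile', 'motor vehicle', 'wheel'])
```

Counting shortest paths during BFS works because every node at distance d is dequeued before any node at distance d + 1, so each count is final before it is propagated.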
WordNet Connect
Future work: WordNet structure in terms of a complex network. Key assumptions: the WordNet lexical dictionary is analyzed in terms of a source node and a target node, with an additional reference node; achieve a cost-effective path that is conditionally related to the mean reference node; control the path traversal with a relation of focus; include Common File Number to make it more efficient.
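One possible reading of the planned traversal, sketched under explicit assumptions: edges carry a relation label, traversal follows only the relations in a chosen "focus" set, and a path forced through a reference node costs d(source, ref) + d(ref, target). The edge list, function names, and cost rule are all hypothetical and only illustrate the idea; they are not the project's design.

```python
from collections import deque

# Hypothetical labelled edges: (word, relation, word). Illustrative only.
EDGES = [
    ("car", "hypernym", "motor vehicle"),
    ("motor vehicle", "hypernym", "vehicle"),
    ("bicycle", "hypernym", "vehicle"),
    ("car", "meronym", "wheel"),
    ("bicycle", "meronym", "wheel"),
]

def neighbours(node, focus):
    """Undirected neighbours of node, following only edges whose relation is in focus."""
    for a, rel, b in EDGES:
        if rel in focus:
            if a == node:
                yield b
            elif b == node:
                yield a

def hops(src, dst, focus):
    """BFS hop count from src to dst, or None if unreachable under the focus relations."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return dist[u]
        for v in neighbours(u, focus):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None

def cost_via_reference(src, dst, ref, focus):
    """Path cost forced through a reference node: d(src, ref) + d(ref, dst)."""
    a, b = hops(src, ref, focus), hops(ref, dst, focus)
    return None if a is None or b is None else a + b

print(hops("car", "bicycle", {"hypernym"}))                           # 3
print(hops("car", "bicycle", {"meronym"}))                            # 2
print(cost_via_reference("car", "bicycle", "vehicle", {"hypernym"}))  # 3
```

Changing the focus set changes which connections count, so the same word pair can be near under one relation (shared part "wheel") and farther under another (shared ancestor "vehicle").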
Conclusion: A single visualization cannot reveal the entire structure of WordNet. There are different ways of analyzing the effectiveness of the overall system. A new method to evaluate the usefulness of the WordNet network structure.
Questions and Comments