Francisco Viveros-Jiménez, Alexander Gelbukh, Grigori Sidorov

- What is CICWSD?
- Quick Start
- Excel file
  - Experimental setup sheet
  - Performance sheet
  - Decisions summary sheet
  - Problem summary sheet
  - Miscellaneous sheet
  - Detail sheet
- Contact information and citation

CICWSD is a Java API and command-line tool for word sense disambiguation. Its main features are:
- It includes several state-of-the-art dictionary-based WSD algorithms ready to use.
- Easy configuration of many parameters, such as window size, number of senses retrieved from the dictionary, back-off method, tie-solving method, and the conditions for selecting window words.
- All configuration is done in a single XML file.
- Output is generated as a simple XLS file using JExcelApi.
The API is licensed under the GNU General Public License (v2 or later); source code is included. The Senseval-2 and Senseval-3 English All-Words tasks are bundled with CICWSD.
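The slides do not include code for the bundled algorithms, but the citation at the end names the Simplified Lesk algorithm (see also Lesk 1986 and Vasilescu et al. 2004 in the references). As a rough illustration of how such a dictionary-based method works, here is a minimal generic Java sketch, not CICWSD's actual implementation: each sense of a target word is scored by the overlap between its definition bag of words and the context window, and the highest-scoring sense wins.

```java
import java.util.*;

/** Minimal generic Simplified Lesk sketch (not CICWSD's implementation). */
public class SimplifiedLesk {

    /**
     * Picks the sense whose definition bag of words overlaps most with the
     * context window. Returns null when every sense scores zero, the
     * situation CICWSD handles with its configured back-off method.
     */
    public static String disambiguate(Map<String, Set<String>> senseBags,
                                      List<String> contextWindow) {
        String best = null;
        int bestScore = 0;
        for (Map.Entry<String, Set<String>> sense : senseBags.entrySet()) {
            int score = 0;
            for (String word : contextWindow) {
                if (sense.getValue().contains(word.toLowerCase())) {
                    score++;  // one point per window word found in the sense's bag
                }
            }
            if (score > bestScore) {
                bestScore = score;
                best = sense.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy example: two senses of "bank" with tiny definition bags.
        Map<String, Set<String>> senses = new LinkedHashMap<>();
        senses.put("bank#1", new HashSet<>(Arrays.asList("money", "deposit", "institution")));
        senses.put("bank#2", new HashSet<>(Arrays.asList("river", "slope", "land")));
        List<String> window = Arrays.asList("deposit", "money", "account", "branch");
        System.out.println(disambiguate(senses, window));  // prints bank#1
    }
}
```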

1. Download CICWSD from http://fviveros.gelbukh.com/downloads/CICWSD-1.0.zip
2. Unzip the files.
3. Open a command line.
4. Change the current directory to the CICWSD directory.
5. Edit the configuration file, config.xml (a hypothetical sketch is shown below).
6. Execute java -jar cicwsd.jar.
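The slides do not reproduce the contents of config.xml, so the following sketch is purely illustrative: every element and attribute name here is invented, not CICWSD's real schema (see the separate CICWSD configuration guide for the actual format). It only shows how the parameters described in these slides, knowledge source, WSD method, back-off, tie solving, window size, and windowing conditions, could fit together in one XML file.

```xml
<!-- Hypothetical sketch only: element names are invented, not CICWSD's real schema. -->
<wsd-experiment>
  <!-- Where the senses' bags of words come from, and how many senses to retrieve. -->
  <knowledge-source retrievedSenses="all">WNGlosses;WNSamples</knowledge-source>
  <test name="Test 1">
    <wsd-method>SimplifiedLesk</wsd-method>
    <back-off-method>MostFrequentSense</back-off-method>
    <tie-solving-method>Random</tie-solving-method>
    <window size="4"/>
    <!-- Conditions for filtering which context words enter the window. -->
    <windowing-conditions>
      <condition>content-words-only</condition>
    </windowing-conditions>
  </test>
</wsd-experiment>
```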

The Excel file contains all the results generated by the experiments. These results are presented in the following sheets:
- Experimental setup sheet: contains the description of each tested algorithm and its configuration.
- Performance sheet: contains the performance measures of each algorithm per document on the test set.
- Decisions summary sheet: contains the detailed performance of each tested algorithm.
- Problem summary sheet: contains the frequency and IDF of the words inside the target documents.
- Miscellaneous sheet: contains some interesting disambiguation facts.
- Detail sheet: explains how each algorithm's decisions were made.
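Because the output is a plain XLS file written with JExcelApi, the results can also be post-processed programmatically. Below is a minimal sketch using the JExcelApi (jxl) library; the file name "results.xls" and the sheet name "Performance" are assumptions for illustration, since the actual names depend on your configuration.

```java
import java.io.File;
import jxl.Cell;
import jxl.Sheet;
import jxl.Workbook;

/** Minimal sketch: dumping one sheet of a CICWSD results file with JExcelApi. */
public class DumpResults {
    public static void main(String[] args) throws Exception {
        // "results.xls" and "Performance" are assumed names for illustration.
        Workbook workbook = Workbook.getWorkbook(new File("results.xls"));
        Sheet sheet = workbook.getSheet("Performance");
        // Print the sheet contents as tab-separated values, row by row.
        for (int row = 0; row < sheet.getRows(); row++) {
            for (int col = 0; col < sheet.getColumns(); col++) {
                Cell cell = sheet.getCell(col, row);  // jxl uses (column, row)
                System.out.print(cell.getContents() + "\t");
            }
            System.out.println();
        }
        workbook.close();
    }
}
```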

This sheet contains the description of each tested algorithm and its configuration. The data shown in the sheet is the following:
- Knowledge source: the information source for the senses' bags of words, along with how many senses were retrieved. For example, "WNGlosses;WNSamples. Retrieved Senses: All" means that the bags of words were extracted from WordNet definitions and usage samples for all senses of the word.
- Tests: the tested algorithms are described in the following form:
  - Test N: the name summarizing an algorithm and its configuration.
  - WSD method: the selected WSD algorithm.
  - Back-off method: the selected back-off strategy.
  - Tie-solving method: the WSD algorithm used to break ties.
  - Window size: the number of context words used as the information window.
  - Windowing conditions: a list containing the conditions for filtering the context words.

This sheet contains the performance measures of each algorithm per document on the test set. The three measures reported per word class are described below.

Performance data is presented individually for each tested algorithm. The format of the result tables is the following:
- Rows: each row shows the measures registered for one test-set document. The final row contains the overall results.
- Columns: the columns contain the three performance measures for each word class (nouns, verbs, adjectives, and adverbs). The final three columns correspond to the global results.
A "no attempt" is shown when cells have no data or contain error data; it means that the algorithm did not attempt any word of that word class.
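The slides do not spell out the three measures, but Senseval-style all-words evaluation conventionally reports precision, recall, and coverage. Assuming that convention, for an algorithm that attempts a of the n words in a document and answers c of them correctly:

```latex
\text{precision} = \frac{c}{a}, \qquad
\text{recall} = \frac{c}{n}, \qquad
\text{coverage} = \frac{a}{n}
```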

This sheet contains the detailed performance of each WSD algorithm. The data is presented for each tested algorithm as follows:
- Rows: the rows contain the results obtained for each attempted lemma.
- First N columns: the first N columns show the number of attempts made in each target document, where N is the number of target documents.
- Overall attempts: the number of disambiguation attempts made by the algorithm for a specific lemma across all target documents.
- Overall correct answers: the number of times the algorithm correctly disambiguated a specific word.
- IDF: the IDF calculated for the target word from the loaded samples and/or definitions.
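The slides do not state which IDF variant is computed; assuming the standard formulation, where N is the number of loaded documents (samples and/or definitions) and df(w) is the number of those documents that contain the word w:

```latex
\mathrm{IDF}(w) = \log \frac{N}{\mathrm{df}(w)}
```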

This sheet contains the frequency and IDF of the words inside the target documents. The information is presented as follows:
- Rows: each row contains the information for one word inside the target documents.
- First N columns: the first N columns contain the frequency of the word inside each target document, where N is the number of target documents.
- Overall appearances: the frequency of the word across all target documents.
- IDF: the IDF calculated for the target word from the loaded samples and/or definitions, as defined above.

This sheet contains some interesting data regarding disambiguation. The data is presented as follows:
- Rows: each row shows the measures registered for one test-set document. The final row contains the overall results.
- Average words used: the number of window words that allowed the algorithm to give an answer. For example, with a window size of 4, a value of 2 in this column means that only 2 of the 4 window words were useful for disambiguation.
- Average senses addressed: the number of senses that scored more than 0 (meaning that they are possible answers).
- Probability of addressing the correct sense: the probability that the correct sense is among the possible answers.
- Average polysemy: the average number of senses of the attempted words.
- Average score: the average score of the algorithm's selected answers.

This sheet explains how each word was disambiguated. It is recommended not to generate this sheet for test sets with multiple documents and/or multiple WSD algorithms, since creating it requires substantial computational resources. Data is presented for each attempted word, showing the following information: the target word, the window words, the score obtained for each sense, the words that produced the score increments, and the selected answer.

For any question regarding the CICWSD API, please contact Francisco Viveros-Jiménez by e-mail or Skype. Please cite the following paper in your work: Viveros-Jiménez, F., Gelbukh, A., Sidorov, G.: Improving Simplified Lesk Algorithm by using simple window selection practices. Submitted.

- Lesk M (1986) Automatic sense disambiguation using machine readable dictionaries: How to tell a pine cone from an ice cream cone. In Proc. of SIGDOC-86: 5th International Conference on Systems Documentation, Toronto, Canada.
- Rada R, Mili H, Bicknell E, Blettner M (1989) Development and application of a metric on semantic nets. IEEE Transactions on Systems, Man and Cybernetics, vol. 19, no. 1.
- Miller G (1995) WordNet: A Lexical Database for English. Communications of the ACM, vol. 38, no. 11.
- Agirre E, Rigau G (1996) Word Sense Disambiguation using Conceptual Density. In Proceedings of COLING-96, Copenhagen, Denmark.
- Kilgarriff A (1997) I don't believe in word senses. Computers and the Humanities, 31(2), pp. 91–113.

- Edmonds P (2000) Designing a task for SENSEVAL-2. Technical note, University of Brighton, Brighton, U.K.
- Kilgarriff A, Rosenzweig J (2000) Framework and Results for English SENSEVAL. Computers and the Humanities, 34(1–2), Special Issue on SENSEVAL.
- Toutanova K, Manning C D (2000) Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/VLC-2000).
- Cotton S, Edmonds P, Kilgarriff A, Palmer M (2001) SENSEVAL-2: Second International Workshop on Evaluating Word Sense Disambiguation Systems. SIGLEX Workshop, ACL 2001, Toulouse, France.
- Mihalcea R, Edmonds P (2004) Senseval-3: Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text. Association for Computational Linguistics, ACL 2004, Barcelona, Spain.

- Vasilescu F, Langlais P, Lapalme G (2004) Evaluating Variants of the Lesk Approach for Disambiguating Words. In Proceedings of LREC 2004, Portugal.
- Mihalcea R (2006) Knowledge-Based Methods for Word Sense Disambiguation. Book chapter in Word Sense Disambiguation: Algorithms, Applications, and Trends, Phil Edmonds and Eneko Agirre (eds.), Kluwer.
- Navigli R, Litkowski K, Hargraves O (2007) SemEval-2007 Task 07: Coarse-Grained English All-Words Task. In Proceedings of the SemEval-2007 Workshop at the 45th Annual Meeting of the Association for Computational Linguistics (ACL 2007), Prague, Czech Republic.
- Sinha R, Mihalcea R (2007) Unsupervised Graph-based Word Sense Disambiguation Using Measures of Word Semantic Similarity. In Proceedings of the IEEE International Conference on Semantic Computing (ICSC 2007), Irvine, CA.
- Navigli R (2009) Word Sense Disambiguation: A Survey. ACM Computing Surveys, 41(2), ACM Press.