CICWSD: programming guide


Francisco Viveros-Jiménez, Alexander Gelbukh, Grigori Sidorov

Contents
- What is CICWSD?
- Quick Start
- Adding CICWSD to your code
- TestSet class
- Test class
- Input class
- AmbiguousWord class
- WSDAlgorithm class
- Decision class
- Condition class
- Pruning class
- Contact information and citation

What is CICWSD?
CICWSD is a Java API and command-line tool for word sense disambiguation (WSD). Its main features are:
- It includes several state-of-the-art dictionary-based WSD algorithms.
- Many parameters are configurable: window size, number of senses retrieved from the dictionary, back-off method, tie-solving method, and conditions for retrieving window words.
- All configuration lives in a single XML file.
- Output is written to a simple XLS file using JExcelApi.
- The API is licensed under the GNU General Public License (v2 or later). Source is included.
- The Senseval-2 and Senseval-3 English All-Words test sets are bundled with CICWSD.

Quick Start
1. Download CICWSD from http://fviveros.gelbukh.com/downloads/CICWSD-1.0.zip
2. Unzip the files.
3. Open a command line.
4. Change the current directory to the CICWSD directory.
5. Edit the configuration file, config.xml.
6. Execute java -jar cicwsd.jar.
You should see something like this: [screenshot of the program output]

Adding CICWSD to your code
To use CICWSD in your code, add the following two jar libraries to your classpath:
- cicwsd.jar: contains the disambiguation library.
- CICWN/cicwn.jar: contains the WordNet connector.
The API documentation is in the doc folder that accompanies each jar library. Refer to it for more detail.

TestSet class
The TestSet class is the main entry point of CICWSD. It loads an XML configuration file, instantiates all the objects needed to conduct the experiments, and saves the test results in Excel files. Here is a sample Java code snippet:
    import cic.wsd.testing.TestSet;
    …
    TestSet.runTests("config.xml");
After runTests returns, the Excel files will have been generated.

Test class (1)
If you want more control over the results, the Test class is what you are looking for. It runs an algorithm over a document set and retrieves all of the algorithm's answers. Here is a sample code snippet:
    import cic.wsd.testing.Test;
    import cic.wsd.testing.Decision;
    import cic.wordnet.WordNet;
    import java.util.ArrayList;
    …
    // Initialize the WordNet connector
    WordNet.setPath("CICWN/");
    WordNet.loadDataBase(KNSources);
    // Initialize the Test object
    Test t = new Test(docList, WSD, 4, backoff, testsetName, tie, conditionList, retrievedSenses, KNSources);
    // Generate the answers for each word in each target document
    ArrayList<ArrayList<Decision>> decisions = t.run();

Test class (2)
Let us explain the code a little.
    WordNet.setPath("CICWN/");
    WordNet.loadDataBase(KNSources);
These two lines load the WordNet 3.1 dictionary. setPath(path) tells the connector where the WordNet lexicography files are located. loadDataBase(KNSources) creates the bag of words for every sense from the specified knowledge sources. Valid KNSources values are:
- WNGlosses: definitions extracted from WordNet 3.1
- WNSamples: samples extracted from WordNet 3.1
- SemCor: the SemCor corpus
You can combine sources by separating them with semicolons, e.g.:
    "WNGlosses;WNSamples"
    "SemCor"
    "WNGlosses;SemCor"

Test class (3)
The Test constructor has the following arguments:
    Test t = new Test(docList, WSD, 4, backoff, testsetName, tie, conditionList, retrievedSenses, KNSources);
- docList: an ArrayList<Input> containing the loaded test set.
- WSD: an instance of any WSD algorithm found in the disambiguation package.
- Window size (4 in this example): the number of words to retrieve from the context.
- backoff: an instance of any WSD algorithm found in the disambiguation package, used as the back-off strategy. It accepts null.
- testsetName: the name of the test set you are solving. If you do not want to set a name, you can simply use its path.
- tie: an instance of any WSD algorithm found in the disambiguation package, used as the tie-solving strategy (i.e., when the algorithm returns more than one answer). It accepts null.

Test class (4)
- conditionList: an ArrayList<Condition> containing the filters for retrieving context words. To use no filters, pass an empty list: new ArrayList<Condition>().
- retrievedSenses: a String specifying which senses are retrieved from the dictionary. The valid values are:
    "All": read all senses.
    "+N": read the first N senses.
    "*N": read only the Nth sense.
    "-N": exclude the Nth sense.
- KNSources: the same String passed to WordNet.loadDataBase(KNSources).
For example, the word newspaper has seven senses in WordNet. The following table shows which senses are loaded:
    retrievedSenses   Loaded sense set
    All               (1,2,3,4,5,6,7)
    +2                (1,2)
    *2                (2)
    -2                (1,3,4,5,6,7)
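The meaning of the four retrievedSenses values can be made concrete with a small sketch. RetrievedSensesSketch and its select method are hypothetical names invented for this illustration. They are not part of the CICWSD API, and the way CICWSD parses the string internally may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of CICWSD: it only mirrors the documented
// meaning of the retrievedSenses values "All", "+N", "*N" and "-N".
class RetrievedSensesSketch {

    // Returns the 1-based sense numbers kept for a word with senseCount senses.
    static List<Integer> select(String spec, int senseCount) {
        List<Integer> kept = new ArrayList<>();
        if (spec.equalsIgnoreCase("All")) {
            for (int s = 1; s <= senseCount; s++) kept.add(s);
            return kept;
        }
        char mode = spec.charAt(0);
        int n = Integer.parseInt(spec.substring(1));
        for (int s = 1; s <= senseCount; s++) {
            boolean keep;
            switch (mode) {
                case '+': keep = s <= n; break; // first N senses
                case '*': keep = s == n; break; // only the Nth sense
                case '-': keep = s != n; break; // every sense but the Nth
                default: throw new IllegalArgumentException("Unknown spec: " + spec);
            }
            if (keep) kept.add(s);
        }
        return kept;
    }

    public static void main(String[] args) {
        // Reproduces the table for a word with seven senses.
        for (String spec : new String[] {"All", "+2", "*2", "-2"}) {
            System.out.println(spec + " -> " + select(spec, 7));
        }
    }
}
```

Running main prints the same four sense sets as the table above.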

Test class (5)
The Test.run() method returns an ArrayList<ArrayList<Decision>> containing every decision the algorithm made for each word of each target document. For example, after running:
    ArrayList<ArrayList<Decision>> decisions = t.run();
- decisions.get(0) returns the ArrayList<Decision> for the first loaded document.
- decisions.get(1) returns the ArrayList<Decision> for the second loaded document, and so on.
- decisions.get(0).get(0) returns the Decision made for the first word of the first loaded document.

Input class (1)
The Input class loads an XML document in SemCor format. The following snippet shows how to use it:
    import cic.wsd.semcor.Input;
    import java.io.File;
    …
    Input input = new Input(new File(file), pruningList);
To load a folder that contains only SemCor-formatted XML files, use the following snippet:
    import cic.wordnet.WordNet;
    import cic.wsd.semcor.Input;
    import java.io.File;
    import java.util.ArrayList;
    …
    ArrayList<Input> testset = new ArrayList<Input>();
    // Wrap each file in the folder in an Input and add it to the test set
    for (File f : WordNet.getAllFiles(new File("folderPath")))
        testset.add(new Input(f, pruningList));

Input class (2)
The pruningList is an ArrayList<Pruning> containing filters that remove senses from words. If you do not want to filter any senses, pass an empty list: new ArrayList<Pruning>(). A method you may find useful is Input.getAmbiguousWords(), which returns an ArrayList<AmbiguousWord> containing all the open-class words of the document.

AmbiguousWord class
The AmbiguousWord class holds an open-class word, its possible senses, and its correct sense(s). Its attributes are:
- correctSenses: the senses marked as answers in the wnsn attribute of the SemCor file.
- idf: the inverse document frequency of this word, calculated using the dictionary as the corpus.
- index: the position of this word in the current document.
- lemma: a valid WordNet lemma corresponding to this word.
- pos: the part-of-speech tag of this lemma.
- senses: the senses for the lemma retrieved from WordNet.
- tf: the term frequency of this lemma.
You can access each attribute through its corresponding get method.
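For readers unfamiliar with the tf and idf attributes, the sketch below uses the common textbook definitions: tf is a raw occurrence count, and idf is log(N / df). This is an assumption. The guide does not state which exact formulas CICWSD applies, and TfIdfSketch is a hypothetical class written only for this illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, NOT CICWSD code. Documents are lists of lemmas;
// a "corpus" could be, for instance, one bag of words per dictionary sense.
class TfIdfSketch {

    // Term frequency: how many times the lemma occurs in one document.
    static int tf(String lemma, List<String> document) {
        int count = 0;
        for (String word : document) {
            if (word.equals(lemma)) count++;
        }
        return count;
    }

    // Inverse document frequency: log(N / df), where N is the corpus size and
    // df the number of documents containing the lemma. Returns 0 when df is 0.
    static double idf(String lemma, List<List<String>> corpus) {
        int df = 0;
        for (List<String> document : corpus) {
            if (document.contains(lemma)) df++;
        }
        return df == 0 ? 0.0 : Math.log((double) corpus.size() / df);
    }

    public static void main(String[] args) {
        List<List<String>> corpus = Arrays.asList(
                Arrays.asList("bank", "money", "loan"),
                Arrays.asList("bank", "river", "water"),
                Arrays.asList("bank", "money", "bank"));
        System.out.println("tf(bank, doc 3) = " + tf("bank", corpus.get(2)));
        System.out.println("idf(bank)  = " + idf("bank", corpus));
        System.out.println("idf(river) = " + idf("river", corpus));
    }
}
```

A lemma that appears in every document gets an idf of zero, which is why very frequent words contribute little when idf is used as a weight.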

WSDAlgorithm class (1)
The WSDAlgorithm class is a generic template for creating your own WSD algorithm. CICWSD currently contains five algorithms. To create your own:
- Your constructor should take no arguments, call super(), and set this.name to the name of your algorithm.
- You must implement the disambiguate(AmbiguousWord target, ArrayList<AmbiguousWord> window) method. It returns a Decision and uses the target word and some words extracted from the context.
- The window is built by extracting an equal number of words from the left and right of the target word. If you set window filters (Condition objects) in the Test object, some context words will be excluded.
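The window construction described above can be sketched as follows. The sketch assumes that the window size is the total number of context words, split evenly between the two sides, and that filtered-out words are simply dropped rather than replaced by words farther away. The guide does not confirm either detail. WindowSketch is a hypothetical class that uses plain Strings in place of AmbiguousWord objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical illustration, NOT CICWSD code.
class WindowSketch {

    // Takes windowSize / 2 words on each side of the target (fewer near the
    // document edges) and keeps those accepted by condition(target, candidate).
    static List<String> window(List<String> words, int target, int windowSize,
                               BiPredicate<String, String> condition) {
        int perSide = windowSize / 2;
        int from = Math.max(0, target - perSide);
        int to = Math.min(words.size() - 1, target + perSide);
        List<String> window = new ArrayList<>();
        for (int i = from; i <= to; i++) {
            if (i == target) continue; // the target itself is not context
            String candidate = words.get(i);
            if (condition.test(words.get(target), candidate)) {
                window.add(candidate);
            }
        }
        return window;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList(
                "the", "bank", "approved", "my", "loan", "request", "today");
        // Window of size 4 around "approved", no filter.
        System.out.println(window(words, 2, 4, (t, c) -> true));
        // Same window with a toy filter that rejects words of two letters or fewer.
        System.out.println(window(words, 2, 4, (t, c) -> c.length() > 2));
    }
}
```

The first call prints [the, bank, my, loan]. The second drops "my" and prints [the, bank, loan].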

WSDAlgorithm class (2)
Parameters are loaded automatically through the setParams(String) method. If your algorithm uses parameters, you can read their values from the ArrayList<KeyString> param. Specify the parameters in the config file like this:
    <algorithm disambiguation="yourWSD;P1:value1,..,PN:valueN" …
where P1..PN are the parameter names and value1..valueN are their values. KeyString is an object that contains a parameter name and its String value. If you want finer control over how your algorithm works, you can override the solve method. Use the source code of the original solve implementation as a reference.
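To make the "name;P1:value1,..,PN:valueN" format concrete, here is a minimal parsing sketch. CICWSD does this for you through setParams(String), so you do not need such code in practice. AlgorithmSpecSketch, the algorithm name "myLesk", and the parameter names "depth" and "stopwords" are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration, NOT CICWSD code: splits a specification string
// of the form "name;P1:value1,...,PN:valueN" into its parts.
class AlgorithmSpecSketch {

    // Everything before the first ';' is the algorithm name.
    static String name(String spec) {
        int semi = spec.indexOf(';');
        return semi < 0 ? spec : spec.substring(0, semi);
    }

    // Everything after the first ';' is a comma-separated list of name:value pairs.
    static Map<String, String> params(String spec) {
        Map<String, String> params = new LinkedHashMap<>();
        int semi = spec.indexOf(';');
        if (semi < 0 || semi == spec.length() - 1) return params; // no parameters
        for (String pair : spec.substring(semi + 1).split(",")) {
            int colon = pair.indexOf(':');
            if (colon < 0) continue; // skip malformed entries
            params.put(pair.substring(0, colon).trim(), pair.substring(colon + 1).trim());
        }
        return params;
    }

    public static void main(String[] args) {
        String spec = "myLesk;depth:2,stopwords:true";
        System.out.println(name(spec));   // myLesk
        System.out.println(params(spec)); // {depth=2, stopwords=true}
    }
}
```

All values stay Strings, matching the guide's description of KeyString as a name paired with a String value. Converting them to numbers or booleans is left to the algorithm.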

Decision class (1)
A Decision stores an answer of a WSDAlgorithm. The following snippet shows how to use the Decision class in your own disambiguate method:
    import cic.wsd.testing.Decision;
    …
    // First, instantiate a decision for the current word with the current window.
    Decision d = new Decision(targetWord, window);
    // Set the weight of every sense using whatever logic you want.
    // decisionWords is an ArrayList<String> containing the words your algorithm
    // used to increment the weight of this sense.
    d.setSense(senseNumber, weight, decisionWords);
    // Calculate the answer before returning it
    d.calculateAnswer();
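One possible "logic that you want" for the weights is a Lesk-style overlap count: the weight of a sense is the number of window words found in that sense's bag of words, and those shared words play the role of decisionWords. The sketch below illustrates this idea only. OverlapWeightSketch is hypothetical, it does not use the real Decision class, and it does not claim to reproduce any of the five bundled algorithms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, NOT CICWSD code.
class OverlapWeightSketch {

    // Distinct window words that also occur in the bag of words of one sense.
    static List<String> overlap(List<String> senseBag, List<String> window) {
        List<String> shared = new ArrayList<>();
        for (String word : window) {
            if (senseBag.contains(word) && !shared.contains(word)) shared.add(word);
        }
        return shared;
    }

    // 0-based indexes of the senses with the highest non-zero overlap.
    // Several indexes are returned on a tie; an empty list means "no answer".
    static List<Integer> bestSenses(List<List<String>> senseBags, List<String> window) {
        int best = 0;
        List<Integer> answers = new ArrayList<>();
        for (int s = 0; s < senseBags.size(); s++) {
            int weight = overlap(senseBags.get(s), window).size();
            if (weight > best) {
                best = weight;
                answers.clear();
                answers.add(s);
            } else if (weight == best && weight > 0) {
                answers.add(s);
            }
        }
        return answers;
    }

    public static void main(String[] args) {
        List<List<String>> bank = Arrays.asList(
                Arrays.asList("financial", "institution", "money", "loan"),
                Arrays.asList("sloping", "land", "river", "water"));
        System.out.println(bestSenses(bank, Arrays.asList("loan", "money", "approved"))); // [0]
        System.out.println(bestSenses(bank, Arrays.asList("money", "river")));            // [0, 1]
        System.out.println(bestSenses(bank, Arrays.asList("today")));                     // []
    }
}
```

The tie and the empty answer in the last two calls are the situations that the tie-solving and back-off strategies of the Test object are meant to resolve.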

Decision class (2)
You do not need to worry about the tie and back-off methods; the solve method handles them for you. If you want to use the answers, three functions give you access to them:
- int getAnswers(): returns the int indexes of the senses selected by the algorithm.
- String getAnswerStrings(): returns "( [answer1 [,answerN]*]* )".
- double getScore(): returns this decision's score according to the Senseval scoring system.
You can always retrieve the target word with the getTarget() method.
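As a rough illustration of Senseval-style scoring, the sketch below assumes that credit is split evenly among the returned answers, so the score is the fraction of returned senses that are correct. This matches the usual description of Senseval fine-grained scoring with uniform weights, but whether getScore() computes exactly this is an assumption. SensevalScoreSketch is a hypothetical class.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration, NOT CICWSD code.
class SensevalScoreSketch {

    // Fraction of the returned answers that are among the correct senses.
    // An unanswered word scores 0. Answers are assumed to contain no duplicates.
    static double score(List<Integer> answers, List<Integer> correctSenses) {
        if (answers.isEmpty()) return 0.0;
        int hits = 0;
        for (Integer answer : answers) {
            if (correctSenses.contains(answer)) hits++;
        }
        return (double) hits / answers.size();
    }

    public static void main(String[] args) {
        List<Integer> correct = Arrays.asList(1);
        System.out.println(score(Arrays.asList(1), correct));           // 1.0
        System.out.println(score(Arrays.asList(1, 2), correct));        // 0.5
        System.out.println(score(Arrays.asList(3), correct));           // 0.0
        System.out.println(score(Collections.emptyList(), correct));    // 0.0
    }
}
```

Under this scheme an unresolved tie is penalized: returning two senses when only one is correct earns half the credit.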

Condition class
The Condition class is a template for creating filters for retrieving context words. CICWSD currently defines five filters. To create your own filter:
- Your constructor should take no arguments, call super(), and set this.name to the name of your filter.
- Parameter values are stored in ArrayList<String> parameters. Specify them in the config file like this:
    <condition type="yourCondition:P1,..,PN"/>
  where P1..PN are the parameter values you want to set.
- You must implement boolean satisfiesCondition(AmbiguousWord target, AmbiguousWord possibleWord, window), where target is the target word, possibleWord is the word you want to add to the window, and window holds the context words selected so far. Return true if possibleWord should be included in the window.
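The three steps above follow a familiar template pattern, sketched here with stand-in types so that it compiles without cicwsd.jar. ConditionSketch imitates the shape of the Condition template as described in this guide, with Strings replacing AmbiguousWord objects. NoDuplicatesSketch is a hypothetical filter and is not one of the five filters shipped with CICWSD.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ConditionDemo {
    public static void main(String[] args) {
        ConditionSketch filter = new NoDuplicatesSketch();
        List<String> window = new ArrayList<>(Arrays.asList("money"));
        System.out.println(filter.satisfiesCondition("bank", "river", window)); // true
        System.out.println(filter.satisfiesCondition("bank", "money", window)); // false
        System.out.println(filter.satisfiesCondition("bank", "bank", window));  // false
    }
}

// Stand-in for the Condition template, NOT the real class.
abstract class ConditionSketch {
    protected String name;
    protected List<String> parameters = new ArrayList<>();

    // Returns true if possibleWord should be added to the window.
    abstract boolean satisfiesCondition(String target, String possibleWord, List<String> window);
}

// Hypothetical filter: rejects the target itself and words already in the window.
class NoDuplicatesSketch extends ConditionSketch {
    NoDuplicatesSketch() {
        super();
        this.name = "NoDuplicatesSketch";
    }

    @Override
    boolean satisfiesCondition(String target, String possibleWord, List<String> window) {
        return !possibleWord.equals(target) && !window.contains(possibleWord);
    }
}
```

A real filter would extend the Condition class from cicwsd.jar and work with AmbiguousWord objects, for example by comparing lemmas or part-of-speech tags.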

Pruning class
The Pruning class is a generic template for pruning methods, which remove senses from words. They are used to test the effect of disambiguating with fewer senses. Note that pruning is not the same as clustering. To create your own pruning method:
- Your constructor should take no arguments, call super(), and set this.name to the name of your method.
- Parameter values are stored in ArrayList<String> parameters. They are handled in the same way as in the Condition class.
- Override the prune(target) method. It must remove senses from the target word.
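A pruning method differs from a filter in that it modifies its argument in place. The sketch below shows that behaviour with stand-in types: PruningSketch imitates the template described above, prune receives a plain list of sense labels instead of an AmbiguousWord, and KeepFirstSketch is a hypothetical method that reads its single parameter as the number of senses to keep. None of these names come from CICWSD.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PruningDemo {
    public static void main(String[] args) {
        List<String> senses = new ArrayList<>(
                Arrays.asList("sense1", "sense2", "sense3", "sense4"));
        PruningSketch pruning = new KeepFirstSketch();
        pruning.parameters.add("2"); // in CICWSD this would come from the config file
        pruning.prune(senses);
        System.out.println(senses); // [sense1, sense2]
    }
}

// Stand-in for the Pruning template, NOT the real class.
abstract class PruningSketch {
    protected String name;
    protected List<String> parameters = new ArrayList<>();

    // Must remove senses from the given (mutable) sense list.
    abstract void prune(List<String> senses);
}

// Hypothetical pruning method: keeps only the first N senses.
class KeepFirstSketch extends PruningSketch {
    KeepFirstSketch() {
        super();
        this.name = "KeepFirstSketch";
    }

    @Override
    void prune(List<String> senses) {
        int keep = Integer.parseInt(parameters.get(0));
        if (senses.size() > keep) {
            senses.subList(keep, senses.size()).clear();
        }
    }
}
```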

Contact information
For any questions about the CICWSD API, please contact Francisco Viveros-Jiménez by email (pacovj@hotmail.com) or Skype (pacovj). Please cite the following paper in your work: Viveros-Jiménez, F., Gelbukh, A., Sidorov, G.: Improving Simplified Lesk Algorithm by using simple window selection practices. Submitted.
