Published by Benedict Osborne. Modified over 9 years ago.
1
Related terms search based on WordNet / Wiktionary and its application in ontology matching
RCDL'2009
St. Petersburg Institute for Informatics and Automation of RAS
Feiyu Lin, A. Krizhanovsky (andrew.krizhanovsky at gmail.com)
Jönköping University, Sweden
2
Contents
Wiki and Wiktionary intro
MRD, parser and Wiktionaries comparison
Correlation of relatedness measures
Experiment scheme
Result and comparison
Results, applications and future
3
Goal
Is it possible to find related terms using the current version of Wiktionary as successfully as with WordNet (for ontology matching, for application in text search systems, etc.)?
What are the advantages?
4
Wiki-resources
Distributed users and authors (edit pages)
Centralized storage (e.g. MySQL, Apache, PHP)
Set of hyperlinked articles
Each article has one or more categories (tree)
* Example: http://en.wikipedia.org
5
Wiktionary is a free-content multilingual dictionary
6
Wiktionary data: +, -, simplicity & complexity
−Different Wiktionaries have different levels of standardization.
−Fast-growing data, but created by a huge community (so the parser being developed must be very robust)
+Rich data
+thesaurus (synonyms, antonyms)
+phrase books
+etymologies
+pronunciations
+sample quotations
+translations
+Fast-growing data
+Interwiki (additional data)
+GNU FDL
7
Wiktionary machine-readable dictionary database scheme
8
Size of Wiktionaries
WordNet (2006): 150,000 words, 115,000 synsets
10
A shortest path in Russian Wiktionary
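A path between two words can be found by treating dictionary entries as nodes and semantic relations as edges. The following is a minimal Python sketch using breadth-first search, not the project's own (Java) implementation. The graph `RELATIONS` and the function name `shortest_path` are hypothetical, and the toy data is invented for illustration rather than taken from Russian Wiktionary.

```python
from collections import deque

# Hypothetical toy graph: each word maps to the words it is linked to by
# semantic relations (synonyms, hypernyms, ...). Not real Wiktionary data.
RELATIONS = {
    "car": ["automobile", "vehicle"],
    "automobile": ["car", "vehicle"],
    "vehicle": ["car", "automobile", "bicycle"],
    "bicycle": ["vehicle", "bike"],
    "bike": ["bicycle"],
}


def shortest_path(graph, start, goal):
    """Return one shortest path from start to goal as a list of words, or None."""
    if start == goal:
        return [start]
    parent = {start: None}  # also serves as the set of visited words
    queue = deque([start])
    while queue:
        word = queue.popleft()
        for neighbour in graph.get(word, []):
            if neighbour in parent:
                continue
            parent[neighbour] = word
            if neighbour == goal:
                # Walk back through the parents to rebuild the path.
                path = [goal]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


print(shortest_path(RELATIONS, "car", "bike"))
```

Under this sketch, a shorter path between two words would be read as a sign that they are more closely related.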
11
Correlation of relatedness measures
Correlation between human judgments (353-TC) and relatedness measures based on WordNet, English Wikipedia, and Russian Wiktionary
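The slide does not say which correlation coefficient was used. As an illustration only, the sketch below assumes Spearman rank correlation, a common choice when comparing automatic relatedness scores with human ratings. The function names and the sample numbers are invented for this example.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up scores for five word pairs (illustration only).
human = [7.5, 9.0, 1.2, 5.1, 3.3]          # human relatedness judgments
measure = [0.60, 0.95, 0.10, 0.30, 0.40]   # scores from some automatic measure

print(round(spearman(human, measure), 3))
```

A value close to 1 would mean the automatic measure orders the word pairs almost the same way as the human judges do.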
12
Eight largest Wiktionary editions (March 2008)
13
Application of machine-readable dictionary (MRD)
Thesaurus data:
Related terms search
Search request extension (by synonyms) / request reformulation (in search systems)
Request recognition in question-answering systems
Word sense disambiguation
Media data (audio + pictures)
Language learning
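Search request extension by synonyms can be sketched as follows. This is a minimal illustration, not the authors' system: the `SYNONYMS` table stands in for thesaurus data from an MRD, and the function names and the boolean query syntax are assumptions.

```python
# Hypothetical synonym table standing in for thesaurus data from an MRD.
SYNONYMS = {
    "car": ["automobile"],
    "fast": ["quick", "rapid"],
}


def expand_query(query, synonyms):
    """Split the query into terms and attach the known synonyms to each term."""
    groups = []
    for term in query.lower().split():
        group = [term]
        for synonym in synonyms.get(term, []):
            if synonym not in group:
                group.append(synonym)
        groups.append(group)
    return groups


def to_boolean_query(groups):
    """Render the groups as an AND of ORs, a syntax many search engines accept."""
    return " AND ".join(
        "(" + " OR ".join(group) + ")" if len(group) > 1 else group[0]
        for group in groups
    )


print(to_boolean_query(expand_query("Fast car rental", SYNONYMS)))
```

Terms without an entry in the table pass through unchanged, so the expanded request never matches fewer documents than the original one.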
14
Work plan: done and todo
Russian Wiktionary
Extraction (by RE)
–Definition
–Relations (synonyms…)
–Translation
–Audio
–Graphics
Database API
Visualization (MRD browser)
Quiz & tests (test application)
Russian Wiktionary
Database scheme
–Definition
–Relations (synonyms…)
–Translation
–Audio
–Graphics
Database API
English Wiktionary
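Extraction by regular expressions (RE) might look like the sketch below. It is written in Python for brevity and is not the project's parser. The article text is a simplified, made-up example, and real Wiktionary markup (the Russian edition in particular) is more varied and template-heavy, so both patterns are assumptions.

```python
import re

# Simplified, made-up article text. Real Wiktionary markup differs between
# editions and entries, so the patterns below are illustrative assumptions.
WIKITEXT = """\
== car ==
=== Synonyms ===
* [[automobile]]
* [[motorcar|motor car]]
=== Translations ===
* French: [[voiture]]
"""

# A level-3 heading on a line of its own, e.g. "=== Synonyms ===".
SECTION_RE = re.compile(r"^===\s*(?P<title>[^=]+?)\s*===\s*$", re.MULTILINE)
# A wiki link "[[target]]" or "[[target|label]]"; only the target is captured.
LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def extract_section_links(wikitext, section_title):
    """Return the link targets found in the named level-3 section."""
    matches = list(SECTION_RE.finditer(wikitext))
    for index, match in enumerate(matches):
        if match.group("title") != section_title:
            continue
        start = match.end()
        # The section runs until the next level-3 heading or the end of the text.
        if index + 1 < len(matches):
            end = matches[index + 1].start()
        else:
            end = len(wikitext)
        return LINK_RE.findall(wikitext[start:end])
    return []


print(extract_section_links(WIKITEXT, "Synonyms"))
```

The same function covers relations and translations, since both are read as link lists under a named heading in this simplified format.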
15
Implementation
Software based on Synarcher code
Java
MySQL or SQLite database
JUnit test framework
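To give a feel for how thesaurus data could be stored in SQLite and queried, here is a small sketch using Python's standard `sqlite3` module. It does not reproduce the project's database scheme or its Java API: the tables `page` and `relation`, their columns, and both helper functions are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical two-table layout:
# one row per dictionary entry, one row per semantic relation.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL UNIQUE
);
CREATE TABLE relation (
    page_id       INTEGER NOT NULL REFERENCES page(id),
    relation_type TEXT NOT NULL,   -- e.g. 'synonym', 'antonym'
    target        TEXT NOT NULL
);
""")


def add_relation(conn, title, relation_type, target):
    """Store one relation, creating the entry row if it does not exist yet."""
    conn.execute("INSERT OR IGNORE INTO page (title) VALUES (?)", (title,))
    page_id = conn.execute(
        "SELECT id FROM page WHERE title = ?", (title,)
    ).fetchone()[0]
    conn.execute(
        "INSERT INTO relation (page_id, relation_type, target) VALUES (?, ?, ?)",
        (page_id, relation_type, target),
    )


def related_terms(conn, title, relation_type):
    """Return the targets of the given relation type for an entry, sorted."""
    rows = conn.execute(
        "SELECT r.target FROM relation r JOIN page p ON p.id = r.page_id "
        "WHERE p.title = ? AND r.relation_type = ? ORDER BY r.target",
        (title, relation_type),
    )
    return [row[0] for row in rows]


add_relation(conn, "car", "synonym", "motorcar")
add_relation(conn, "car", "synonym", "automobile")
add_relation(conn, "car", "hypernym", "vehicle")
print(related_terms(conn, "car", "synonym"))
```

Keeping the relation type as a column means that adding a new kind of relation needs no change to the table layout.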
16
Results
The scheme of the experiment for calculating the semantic relatedness measure based on Russian Wiktionary data
The parser of Russian Wiktionary
Database scheme designed
Database API implemented in Java
Compared the results of related terms search based on Wiktionary and WordNet
Project site (Wiki tool kit): http://code.google.com/p/wikokit/
17
Future work
Finish creating the MRD database and software
Russian Wiktionary and English Wiktionary
Visualization (JavaFX)
MRD browser
Quiz & tests (learning application)
Online application (Java Web Start)
18
Thank you!