Advanced Information Systems Laboratory Department of Computer Science and Systems Engineering GI-DAYS MÜNSTER A software tool.


1 A software tool for thesauri management, browsing and supporting advanced searches
  J. Nogueras-Iso, J.A. Bañares, J. Lacasta, J. Zarazaga-Soria
  Advanced Information Systems Laboratory (http://iaaa.cps.unizar.es), Department of Computer Science and Systems Engineering
  GI-DAYS MÜNSTER, Münster, 26-27 June 2003

2 Contents
  - Introduction
  - Architecture of ThManager application
  - Basic capabilities
  - Enhanced capabilities
  - Conclusions

3 Introduction to thesauri
  - "A thesaurus is a set of terms that describe the vocabulary of a controlled indexing language, formally organized so that the a priori relationships between concepts (for example synonymous terms, broader terms, narrower terms and related terms) are made explicit" [ISO 2788]
  - Used to improve the precision and recall of information retrieval in digital libraries
    - provide a uniform and consistent vocabulary for indexing metadata ("description of the data holdings")
    - supply users with a suitable vocabulary for retrieval
    - expand user queries by automatically adding new terms to the query
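The query expansion mentioned above can be illustrated with a minimal sketch. It is not ThManager code: the `Term` class, the `expand_query` function and the toy data are hypothetical, and the choice to expand with synonyms and narrower terms only is an assumption, since the slides do not say which relationships are followed.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Term:
    """One thesaurus entry holding ISO 2788-style relationships as labels."""
    label: str
    broader: set[str] = field(default_factory=set)    # BT
    narrower: set[str] = field(default_factory=set)   # NT
    related: set[str] = field(default_factory=set)    # RT
    synonyms: set[str] = field(default_factory=set)   # SYN / USE


def expand_query(query_terms, thesaurus):
    """Return the query terms plus their synonyms and narrower terms.

    Assumption: only SYN and NT are used for expansion. Terms missing
    from the thesaurus are passed through unchanged.
    """
    expanded, seen = [], set()
    for label in query_terms:
        candidates = [label]
        entry = thesaurus.get(label)
        if entry is not None:
            candidates += sorted(entry.synonyms) + sorted(entry.narrower)
        for candidate in candidates:
            if candidate not in seen:
                seen.add(candidate)
                expanded.append(candidate)
    return expanded


# Toy thesaurus, illustrative data only.
THESAURUS = {
    "accident": Term(
        "accident",
        narrower={"traffic accident", "nuclear accident"},
        synonyms={"mishap"},
    ),
    "traffic accident": Term("traffic accident", broader={"accident"}),
}

print(expand_query(["accident", "river"], THESAURUS))
```

Expanding with narrower terms tends to raise recall; adding broader or related terms as well would widen the query further at some cost in precision.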

4 Introduction to thesauri
  - A thesaurus management tool becomes a vital component in the development of any kind of digital library
  - One of the main objectives of Spatial Data Infrastructures (SDIs) is to provide discovery, evaluation and access to spatial data for a community of users.
    - An SDI can be considered a digital library specialised in geographic information resources.
  - A thesaurus management tool will therefore also be a vital component for the development of SDIs.

5 Architecture of ThManager application
  [Architecture diagram; the layout was lost in extraction. Recoverable labels:]
  - Layers: Level 3. Application; Level 2. GUI; Level 1. Model; Level 0. Database
  - Application: ThManager (basic, enhanced)
  - GUI: Thesaurus.gui (generic GUI components for thesauri visualization)
  - Model: Thesaurus.model, ThesaurusMngmt, Lexicon, WordNetPolisemy
  - Functions: Thesaurus management, Import/export, Keywords expansion, Polisemy extraction, Branch disambiguation
  - Data: Thesaurus (100% SQL (basic), Oracle IntermediaText (enhanced)), WordNet files, Metadata records, Keywords

6 Basic capabilities
  - Editing of thesauri according to ISO norms
    - Broader terms (BT), narrower terms (NT)
    - Related terms (RT), preferred terms (PT)
    - Scope notes (SN), synonyms (SYN, USE)
    - Language translations (TR)
  - Visualization of thesauri
    - Hierarchical, alphabetical
    - Search of terms
  - Multilingual access support
    - Browsing according to the language selected by users
  - Import/Export
    - Proprietary text file formats

7 Browsing / Editing

8 Import/export formats
  - Formats
    - Dot-based notation
      - succession of narrower terms + additional relationships (SYN, TR, ...)
    - Hierarchical numbering of terms
  - It should use more standardized formats:
    - RDFS/XML, ...
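The slides do not show the actual syntax of the dot-based notation, so the following is only a sketch under a stated assumption: each line holds one term, and the number of leading dots gives its depth below the top term. The function name `parse_dot_notation` and the sample hierarchy are hypothetical, and additional relationships (SYN, TR, ...) are left out.

```python
def parse_dot_notation(text):
    """Parse lines such as '..nuclear accident' into (term, broader_term) pairs.

    Assumed format: leading dots encode depth; a term's broader term is the
    closest preceding term that is exactly one level shallower.
    """
    pairs = []
    stack = []  # stack[d] is the most recent term seen at depth d
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        term = line.lstrip(".")
        depth = len(line) - len(term)
        term = term.strip()
        if depth > len(stack):
            raise ValueError(f"term {term!r} skips a hierarchy level")
        del stack[depth:]  # drop terms that are deeper than or equal to this one
        parent = stack[-1] if stack else None
        pairs.append((term, parent))
        stack.append(term)
    return pairs


# Illustrative input only; the nesting is not taken from a real thesaurus.
SAMPLE = """\
accident
.environmental accident
.major accident
..nuclear accident
.traffic accident
"""

for term, broader in parse_dot_notation(SAMPLE):
    print(f"{term!r} BT {broader!r}")
```

Each pair corresponds to one BT/NT relationship, which is the kind of information a more standardized export such as RDFS/XML would also need to carry.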

9 Enhanced capabilities
  - Thesauri are intended for the homogeneous classification of resources
    - They are used to fill metadata keywords
  - However, there is still heterogeneity in metadata keywords
    - Metadata creators use different thesauri in different application domains
    - If metadata catalogs provide access to the general public
      - Queries may not contain the same terms as the keywords in metadata records
  - A possible solution to bridge the semantic gap
    - Disambiguation of thesauri (and queries) in relation to the concepts of an upper-level ontology

10 Enhanced capabilities
  - Additional tools around semantic disambiguation
    - Browsing WordNet as another thesaurus
    - Searching polysemic senses in WordNet
    - Thesauri disambiguation
    - Automatic expansion of keywords
  [Figure labels: Thesaurus 1, Thesaurus 2, Thesaurus N; Controlled list 1, Controlled list 2, Controlled list N; Other knowledge representation models; WordNet]

11 Browsing WordNet
  - WordNet is structured as a hierarchy of synsets
  - Synsets are defined as sets of synonyms representing a particular concept (sense)
  - WordNet libraries and files are accessed through JNI

12 Searching polysemic senses in WordNet
  - Functionality provided by the Polisemy package
  - Compound terms are partitioned if no synset is found
  - If adjectives are found, associated nouns are also searched to reduce the number of not-found words
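The lookup strategy on this slide can be sketched as follows. This is not the Polisemy package: `find_senses`, the toy `LEXICON` (standing in for WordNet noun senses) and the `ADJECTIVE_NOUNS` table (standing in for an adjective-to-noun link) are all hypothetical and exist only to show the order of the fallbacks.

```python
# Toy stand-in for WordNet: word -> list of sense identifiers (assumed data).
LEXICON = {
    "accident": ["accident#1", "accident#2"],
    "core": ["core#1", "core#2", "core#3"],
    "meltdown": ["meltdown#1"],
    "nucleus": ["nucleus#1"],
}

# Toy adjective -> associated nouns table (assumed data).
ADJECTIVE_NOUNS = {"nuclear": ["nucleus"]}


def find_senses(term, lexicon=LEXICON, adjective_nouns=ADJECTIVE_NOUNS):
    """Return (senses_by_word, not_found_words) for a thesaurus term.

    1. Look up the whole term.
    2. If nothing is found, partition the compound term into words.
    3. For words without senses, try the nouns associated with an adjective.
    """
    term = term.lower().strip()
    if term in lexicon:
        return {term: list(lexicon[term])}, []

    senses, not_found = {}, []
    for word in term.split():
        candidates = [word] if word in lexicon else adjective_nouns.get(word, [])
        hits = [c for c in candidates if c in lexicon]
        if not hits:
            not_found.append(word)
        for hit in hits:
            senses[hit] = list(lexicon[hit])
    return senses, not_found


print(find_senses("nuclear accident"))
print(find_senses("oil accident"))
```

The number of senses returned per word is the polysemy figure that a later disambiguation step can use as a discounting factor.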

13 Thesauri disambiguation
  - Unsupervised disambiguation method
    - The senses of every thesaurus term are searched in WordNet.
    - The hierarchical structure of the thesaurus is used as the word context for a voting algorithm to find the closest sense
    - Thesauri are partitioned into branches (trees formed by BT/NT terms whose root has no BT)
  [Figure: example thesaurus branches; hierarchy and term boundaries were lost in extraction. Text shown: accident source environmental accident major accident traffic accident work accident technological accident shipping accident nuclear accident core meltdown oil slick accident explosion leakage administration...]
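The partition into branches can be sketched as below. The function name and the input shape are hypothetical, and the sketch assumes each term has at most one broader term, which the slides do not state.

```python
from collections import defaultdict


def partition_into_branches(broader):
    """Group terms into branches keyed by their root (a term with no BT).

    `broader` maps each term to its broader term, or to None for a top term.
    Assumption: every term has at most one BT.
    """
    children = defaultdict(list)
    for term, parent in broader.items():
        if parent is not None:
            children[parent].append(term)

    branches = {}
    roots = [term for term, parent in broader.items() if parent is None]
    for root in roots:
        members, pending = [], [root]
        while pending:  # walk the NT links downwards from the root
            term = pending.pop()
            members.append(term)
            pending.extend(children[term])
        branches[root] = sorted(members)
    return branches


# Illustrative data only; the nesting is not taken from a real thesaurus.
BROADER = {
    "accident": None,
    "major accident": "accident",
    "nuclear accident": "major accident",
    "traffic accident": "accident",
    "administration": None,
    "local administration": "administration",
}

print(partition_into_branches(BROADER))
```

Each branch then serves as the context within which the senses of its terms vote for one another.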

14 Thesauri disambiguation II
  - Voting algorithm to obtain the disambiguated synset of a term "a"
    - Every synset "s" associated with the rest of the terms in the branch votes (proximity weight) for the synsets of term "a"
    - Main weight: number of subsumers in the WordNet hierarchy
      - Matches in the WordNet hierarchy of ancestors
    - Discounting factors:
      - Synset depth
      - Branch distance
      - Polysemy of the term associated with synset "s"
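The slide names the ingredients of the vote but not the exact formula, so the following sketch assumes one: each vote is the number of shared subsumers divided by the product of the candidate's depth, the branch distance and the voter term's polysemy. The hypernym table, sense identifiers and function names are hypothetical toy data, not WordNet content.

```python
# Toy hypernym hierarchy: synset -> parent synset, None at the top (assumed data).
HYPERNYM = {
    "entity": None,
    "event": "entity",
    "misfortune": "event",
    "sound": "entity",
    "accident#1": "misfortune",  # mishap sense
    "accident#2": "event",       # chance-event sense
    "crash#1": "accident#1",     # collision sense
    "crash#2": "sound",          # loud-noise sense
}


def subsumers(synset, hypernym):
    """Return the synset together with all of its ancestors."""
    chain = set()
    while synset is not None:
        chain.add(synset)
        synset = hypernym[synset]
    return chain


def disambiguate(candidates, context, hypernym):
    """Pick the candidate synset that receives the highest total vote.

    candidates: synsets of the term being disambiguated.
    context: list of (synsets_of_another_branch_term, branch_distance) pairs,
             with branch_distance >= 1.
    Assumed weight: shared subsumers / (candidate depth * distance * polysemy).
    """
    scores = {}
    for candidate in candidates:
        candidate_subs = subsumers(candidate, hypernym)
        depth = len(candidate_subs)
        total = 0.0
        for voter_synsets, distance in context:
            polysemy = len(voter_synsets)
            for voter in voter_synsets:
                shared = len(candidate_subs & subsumers(voter, hypernym))
                total += shared / (depth * distance * polysemy)
        scores[candidate] = total
    winner = max(scores, key=scores.get)
    return winner, scores


# Disambiguate "crash" using "accident" (one step away in the branch) as context.
winner, scores = disambiguate(
    ["crash#1", "crash#2"],
    [(["accident#1", "accident#2"], 1)],
    HYPERNYM,
)
print(winner, scores)
```

In this toy case the collision sense wins because it shares more ancestors with both senses of the context term, even after its greater depth is discounted.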

15 Thesauri disambiguation III
  - Annotation of disambiguated synsets

16 Automatic expansion of keywords with new disambiguated thesauri
  - Comparison between the initial collection of synsets and the synsets of a new term
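A minimal sketch of this comparison, assuming the simplest possible criterion: a term from a newly disambiguated thesaurus is added as a keyword when its synset is already in the record's collection of synsets. The function name, the data shapes and the sense identifiers are hypothetical; the tool may well use a looser notion of match than exact equality.

```python
def expand_keywords(keyword_synsets, new_thesaurus):
    """Return the terms of `new_thesaurus` whose synset matches a keyword synset.

    keyword_synsets: set of synset ids already attached to a metadata record.
    new_thesaurus: dict mapping each term to its disambiguated synset id.
    Assumption: only exact synset matches count.
    """
    return sorted(
        term for term, synset in new_thesaurus.items() if synset in keyword_synsets
    )


# Illustrative data only.
RECORD_SYNSETS = {"accident#1", "river#1"}
NEW_THESAURUS = {"mishap": "accident#1", "stream": "river#1", "bank": "bank#2"}

print(expand_keywords(RECORD_SYNSETS, NEW_THESAURUS))
```

Because the match is made on synsets rather than on strings, terms from different thesauri can be linked even when they share no words.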

17 Expansion of keywords II

18 Conclusions & future lines
  - ThManager is a flexible tool to manage thesauri
    - It provides enhanced functionality for the improvement of classifications.
  - This tool can be easily integrated into other tools
    - It is used by a metadata editing tool (also presented here) to select the appropriate term for the distinct metadata fields.
  - Future lines:
    - Creation of a thesaurus Web Service providing some of the functionality offered by this tool.
      - thesaurus browsing, WordNet polysemy extraction, keywords expansion, ...
    - Concept-based retrieval
      - Exploit the semantic disambiguation of thesauri to test different information retrieval strategies for geographic data catalogs.
      - It is possible to index metadata records according to a unified system: the disambiguated WordNet synsets

19 Advanced Information Systems Laboratory http://iaaa.cps.unizar.es

