Published by Georgiana Griffith. Modified over 9 years ago.
1
EcoTerm IV
NBII/EIONET Demo of Federated KOS Search
Mike Frame
Vienna, Austria, April 2007
2
Discussion Topics
- Project background
- NBII Thesaurus
- GEMET Thesaurus
- Prototype client
- Sample query results using no thesaurus, one thesaurus, or both
- Overall findings
3
Biocomplexity Thesaurus http://thesaurus.nbii.gov
4
EIONET GEMET Thesaurus http://www.eionet.europa.eu/gemet/webservices?langcode=en
5
NBII/EIONET Thesaurus Web Service 1
- Background: collaboration through the Ecoinformatics TWG
- Primary goal: access distributed, multilingual thesauri
- Results: a SKOS web service and client
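The slides do not show the service's message format. As a hedged illustration of what a SKOS-based service exchanges, the sketch below writes one thesaurus concept as SKOS in Turtle syntax, with its label and broader/related links. The concept URIs are hypothetical placeholders, `concept_to_turtle` is an illustrative helper, and the real service may well use RDF/XML or another encoding.

```python
from collections.abc import Sequence

SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."


def _literal(text: str, lang: str) -> str:
    """Quote a plain string as a Turtle language-tagged literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def concept_to_turtle(uri: str, label: str, lang: str,
                      broader: Sequence[str] = (),
                      related: Sequence[str] = ()) -> str:
    """Serialize one skos:Concept with its label and BT/RT links as Turtle."""
    lines = [SKOS_PREFIX, "",
             f"<{uri}> a skos:Concept ;",
             f"    skos:prefLabel {_literal(label, lang)}"]
    for prop, targets in (("skos:broader", broader), ("skos:related", related)):
        for target in targets:
            lines[-1] += " ;"  # every predicate except the last ends with ';'
            lines.append(f"    {prop} <{target}>")
    lines[-1] += " ."  # the last predicate closes the statement
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical URIs; neither thesaurus is assumed to use this scheme.
    print(concept_to_turtle("http://example.org/concept/1", "endangered species", "en",
                            broader=["http://example.org/concept/2"]))
```

A language tag on every label is what makes a single-language view cheap to offer: a client can keep only the literals tagged with the requested language.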
6
Latest Client and Service Capabilities
- Access to both NBII and GEMET
- Single-language capability
- Results are provided by source
- All documentation is complete
- http://thesaurus.nbii.gov
7
Demo Client
8
Initial Challenges Identified
- Thesaurus scope, intent, purpose, and coverage differ
- NBII covers a sub-discipline of the environment
  - “Endangered species”, broader terms: Species, Special status species, Taxa
- EIONET covers the environment broadly
  - “Endangered species”, broader term: environmental protection
9
Current State
- Users: most aren’t aware of the underlying vocabulary
- Vocabularies are often unique to an organization and serve “categorization” more than retrieval
- Goal: include all vocabularies and let the search engine handle the results
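The “include all vocabularies” goal, together with the client returning results by source, can be sketched as a federated lookup. This is a minimal sketch under stated assumptions, not the NBII/EIONET service API: `federated_lookup`, the `Source` adapter signature, and the three stub services are hypothetical, and the stubs stand in for remote web-service calls. The broader terms for “endangered species” are the ones quoted on the Initial Challenges slide.

```python
from typing import Callable

# Hypothetical adapter signature: (term, language code) -> terms from one thesaurus.
Source = Callable[[str, str], list[str]]


def federated_lookup(term: str, lang: str, sources: dict[str, Source]) -> dict[str, list[str]]:
    """Query every vocabulary and keep the answers grouped by source.

    A source that cannot be reached contributes an empty list instead of
    failing the whole lookup, so the search engine still gets the others.
    """
    results: dict[str, list[str]] = {}
    for name, fetch in sources.items():
        try:
            results[name] = fetch(term, lang)
        except OSError:  # e.g. ConnectionError from a remote web service
            results[name] = []
    return results


# Hypothetical stand-ins for the remote services (English only).
def nbii_stub(term: str, lang: str) -> list[str]:
    data = {"endangered species": ["Species", "Special status species", "Taxa"]}
    return data.get(term.lower(), []) if lang == "en" else []


def gemet_stub(term: str, lang: str) -> list[str]:
    data = {"endangered species": ["environmental protection"]}
    return data.get(term.lower(), []) if lang == "en" else []


def offline_stub(term: str, lang: str) -> list[str]:
    raise ConnectionError("service unavailable")


if __name__ == "__main__":
    hits = federated_lookup("Endangered species", "en",
                            {"NBII": nbii_stub, "GEMET": gemet_stub, "OTHER": offline_stub})
    for source, terms in hits.items():
        print(source, terms)
```

Keeping results keyed by source leaves merging and ranking to the search engine, and an unreachable service yields an empty list rather than aborting the whole query.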
10
Demonstration Search Retrieval
Created demonstration datasets:
- NBII Cataloged Resources
  - ~30,000 web sites, publications, images, maps, etc.
  - XML-structured data with a controlled subject vocabulary
- NBII FGDC Metadata
  - ~22,000 resources on research studies
  - 150-200 elements
  - Semi-structured, with no controlled vocabulary
11
NBII Catalog Records
- Based on Dublin Core + 18 elements, 10 of which are mandatory
- In place since 2002
- Used by distributed content managers
12
NBII Metadata Clearinghouse (CH)
13
Process
- Added thesaurus capabilities to the development search engine for:
  - NBII Thesaurus
  - EIONET GEMET Thesaurus
- Used BT, RT, and NT (broader, related, and narrower term) relationships and weighting
- Performed sample queries within the test repositories for:
  - No thesaurus
  - GEMET-only aided searching
  - NBII-only aided searching
  - GEMET + NBII aided searching (X)
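The slides state that BT, RT, and NT relationships were used with weighting, but not how the development search engine combined them. A minimal sketch of one common approach, weighted query expansion, follows; the per-relationship weights, the substring matching, and the sample documents are all hypothetical.

```python
# Hypothetical weights per relationship type; the engine's real values are not given.
REL_WEIGHTS = {"NT": 0.8, "RT": 0.5, "BT": 0.2}


def expand(term: str, relations: list[tuple[str, str]]) -> dict[str, float]:
    """Return {term: weight}; the query term itself keeps weight 1.0.

    `relations` holds (relationship type, related term) pairs from a thesaurus.
    If a term is reachable by several relationships, the highest weight wins.
    """
    weighted = {term.lower(): 1.0}
    for rel, other in relations:
        w = REL_WEIGHTS.get(rel.upper(), 0.0)
        key = other.lower()
        if w > weighted.get(key, 0.0):
            weighted[key] = w
    return weighted


def score(doc: str, weighted: dict[str, float]) -> float:
    """Sum the weights of every expanded term that appears in the document text."""
    text = doc.lower()
    return sum(w for t, w in weighted.items() if t in text)


def search(docs: list[str], weighted: dict[str, float]) -> list[str]:
    """Return matching documents, best score first (ties keep input order)."""
    scored = [(score(d, weighted), d) for d in docs]
    return [d for s, d in sorted(scored, key=lambda p: -p[0]) if s > 0]


if __name__ == "__main__":
    docs = ["Recovery plan for endangered species", "Survey of endangered animal species",
            "Special status species list", "Soil chemistry data"]
    print(search(docs, {"endangered species": 1.0}))
    print(search(docs, expand("endangered species",
                              [("NT", "endangered animal species"),
                               ("BT", "Special status species")])))
```

With the bare query term, one hypothetical document matches; with the expanded terms, three do, ranked by weight. This is the same direction of effect the sample queries measure, shown here in miniature.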
14
Test Repository 1 NBII Resource Catalog (Dublin Core)
15
No Thesauri – “invasive species”
16
NBII Thesaurus – “invasive species”
17
GEMET Thesaurus – “invasive species”
18
No Thesauri – “Endangered Species”
19
NBII Thesaurus – “endangered species”
20
GEMET Only – “endangered species”
21
No Thesaurus – “rare species”
22
NBII Thesaurus – “rare species”
23
GEMET Thesaurus – “rare species”
24
GEMET Thesaurus – “rare species” (expanded degrees of relevance)
25
No Thesauri – “protected species”
26
NBII Thesaurus – “protected species”
27
GEMET Thesaurus – “protected species”
28
Results – NBII Catalog Resources (number of results)
Term | None | NBII | GEMET
“invasive species” | 2487 | 10802 | 2487
“endangered species” | 1612 | 3532 | 1619
“rare species” | 249 | 7186 | 290
“rare species” (expanded) | – | – | 5847
“protected species” | 203 | 2345 | 1664
29
Results – NBII Resource Catalog
30
Test Repository 2 NBII FGDC Metadata
31
Sample Queries – No vocabularies Metadata CH “invasive species”
32
Sample Queries – NBII only Metadata CH “invasive species”
33
Sample Queries – GEMET only Metadata CH “invasive species”
34
Sample Queries – No vocabularies Metadata CH “endangered species”
35
Sample Queries – NBII only Metadata CH “endangered species”
36
Sample Queries – GEMET only Metadata CH “endangered species”
37
No Thesauri – Metadata CH “rare species”
38
NBII Thesaurus – Metadata CH “rare species”
39
GEMET Thesaurus – Metadata CH “rare species”
40
Sample Queries – No vocabularies Metadata CH “protected species”
41
Sample Queries – NBII only Metadata CH “protected species”
42
Sample Queries – GEMET only Metadata CH “protected species”
43
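Results – FGDC Metadata (number of results)
Term | None | NBII | GEMET
“invasive species” | 302 | 7884 | 302
“endangered species” | 1008 | 2690 | 1019
“rare species” | 59425964 (None/NBII/GEMET values run together)
“protected species” | 1121521011 (None/NBII/GEMET values run together)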
44
Results – NBII Resource Catalog
45
Overall Results
General Findings
- The assumption that a thesaurus improves the “number” of results is valid
- The degree varies by term and by mapping
- Since users search from a range of perspectives, backgrounds, and levels of expertise, multiple thesauri do improve the number of results
46
Overall Results
Using Only GEMET Terminology
- GEMET terms not included in the NBII thesaurus improved search results
- GEMET’s strength, its broad coverage, aided searches
- In general, for the Metadata repository, results varied somewhat, but the top 10 results were often the same
47
Overall Results
General Findings
- With no thesaurus, the tests produced poorer #1 results
- Thesaurus use changed the ordering of the results list more for the structured set than for the unstructured set (Metadata)
48
Issues
“Integrating” thesauri of different scope and purpose presents challenges:
- Can’t turn the effort into a thesaurus project
- Degrees of relevance of terms are an issue
- Concept matching, or differing intent
- Differing classification (RT vs. NT) across thesauri
- Differing “weighting” algorithms
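The RT-vs-NT issue can at least be detected mechanically before thesauri are combined. A minimal sketch, assuming each thesaurus is exported as a mapping from (term, related term) to a relationship type; `relation_conflicts` and the sample pairs are hypothetical and are not taken from either thesaurus.

```python
Pair = tuple[str, str]


def _normalize(rels: dict[Pair, str]) -> dict[Pair, str]:
    """Lower-case labels and upper-case relationship codes so case differences are not reported."""
    return {(a.lower(), b.lower()): rel.upper() for (a, b), rel in rels.items()}


def relation_conflicts(first: dict[Pair, str],
                       second: dict[Pair, str]) -> dict[Pair, tuple[str, str]]:
    """Return term pairs that both thesauri relate, but with different relationship types."""
    a, b = _normalize(first), _normalize(second)
    return {pair: (a[pair], b[pair])
            for pair in sorted(a.keys() & b.keys())
            if a[pair] != b[pair]}


# Hypothetical sample data.
thesaurus_a = {("endangered species", "threatened species"): "RT",
               ("endangered species", "taxa"): "BT"}
thesaurus_b = {("Endangered species", "threatened species"): "NT"}

if __name__ == "__main__":
    print(relation_conflicts(thesaurus_a, thesaurus_b))
```

Detection does not decide which classification is right; that is still an editorial judgment of the kind that risks turning the effort into a thesaurus project.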
49
Further Study Options
1. Take multiple thesauri “as is”
2. Attempt some concept matching, e.g. “endangered animal species” – “endangered animal”
3. If no match is present, add the term and relationship as is
4. Obtain terms from XMDR
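Options 2 and 3 could be combined into a single merge step. The sketch below swaps in token-set (Jaccard) similarity as a stand-in for “attempted” concept matching; the similarity measure, the 0.6 threshold, `merge_thesauri`, and the sample entries are assumptions, not the project’s method.

```python
from typing import Optional


def tokens(label: str) -> set[str]:
    return set(label.lower().split())


def jaccard(a: str, b: str) -> float:
    """Token-set overlap |A ∩ B| / |A ∪ B|; 0.0 for two empty labels."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def match_concept(label: str, candidates: list[str], threshold: float = 0.6) -> Optional[str]:
    """Return the most similar candidate if it clears the threshold, else None."""
    best = max(candidates, key=lambda c: jaccard(label, c), default=None)
    if best is not None and jaccard(label, best) >= threshold:
        return best
    return None


def merge_thesauri(base: dict[str, list[str]], incoming: dict[str, list[str]],
                   threshold: float = 0.6) -> dict[str, list[str]]:
    """Fold `incoming` into a copy of `base` (both map label -> related labels).

    Option 2: a concept matching an existing label contributes its relations to it.
    Option 3: a concept with no match is added as is.
    """
    merged = {label: list(related) for label, related in base.items()}
    for label, related in incoming.items():
        target = match_concept(label, list(merged), threshold) or label
        bucket = merged.setdefault(target, [])
        for r in related:
            if r not in bucket:
                bucket.append(r)
    return merged


# Hypothetical sample entries.
base = {"endangered animal": ["wildlife"]}
incoming = {"endangered animal species": ["endangered species"], "vanished species": []}

if __name__ == "__main__":
    print(merge_thesauri(base, incoming))
```

The weakness the slides call “degrees of relevance” shows up quickly: at this threshold “endangered species” would also be folded into “endangered animal species” (overlap 2/3), so automated matches probably need human review.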
50
Further Study Options (cont.)
- Follow up with additional repositories
- Repeat with other query terms
- Re-examine weighting algorithms
- Run queries with a subset of terms
- Repeat with a completely integrated thesaurus, compared with repeating the queries using machine integration
- Complete by June
51
Questions, Comments?
52
GEMET Control File
endangered species,category of endangered species[.2],endangered animal species[0.8],endangered plant species[0.8]
protected species,category of endangered species[0.2],endangered species [0.2]
rare species,category of endangered species[0.2],extinct species[0.2],vanished species[0.2]
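The control file appears to list, for each query term, the expansion terms with a weight in square brackets. A minimal parser follows. It assumes one entry per line, comma-separated fields, and no commas inside terms, none of which the slide states explicitly. It accepts both “[.2]” and “[0.2]” and a space before the bracket, since the sample uses all of these.

```python
import re

# Assumed field format: "label[weight]", with optional whitespace before "[".
_WEIGHTED = re.compile(r"^(.*?)\s*\[\s*(\d*\.?\d+)\s*\]$")


def parse_control_line(line: str) -> tuple[str, dict[str, float]]:
    """Split 'term,expansion[w],...' into (term, {expansion: weight})."""
    head, *rest = (part.strip() for part in line.split(","))
    expansions: dict[str, float] = {}
    for part in rest:
        m = _WEIGHTED.match(part)
        if m is None:
            raise ValueError(f"missing [weight] in {part!r}")
        expansions[m.group(1)] = float(m.group(2))
    return head, expansions


def parse_control_file(text: str) -> dict[str, dict[str, float]]:
    """Parse every non-blank line of a control file."""
    return dict(parse_control_line(ln) for ln in text.splitlines() if ln.strip())


# The three entries shown on the slide.
SAMPLE = """\
endangered species,category of endangered species[.2],endangered animal species[0.8],endangered plant species[0.8]
protected species,category of endangered species[0.2],endangered species [0.2]
rare species,category of endangered species[0.2],extinct species[0.2],vanished species[0.2]
"""

if __name__ == "__main__":
    for term, expansions in parse_control_file(SAMPLE).items():
        print(term, expansions)
```

Raising on a field without a weight, rather than silently defaulting, makes malformed control-file entries visible before they affect query expansion.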