1 EnviroInfo 2006, 05/09/06 Graz Automatic Concept Space Generation in Support of Resource Discovery in Spatial Data Infrastructures Paul Smits, Anders.

1 EnviroInfo 2006, 05/09/06 Graz
Automatic Concept Space Generation in Support of Resource Discovery in Spatial Data Infrastructures
Paul Smits, Anders Friis-Christensen
European Commission, DG Joint Research Centre, Institute for Environment and Sustainability, Spatial Data Infrastructures Unit, TP 262, Ispra (VA), Italy

2 JRC's Mission
The mission of the JRC is to provide customer-driven scientific and technical support for the conception, development, implementation and monitoring of EU policies. As a service of the European Commission, the JRC functions as a reference centre of science and technology for the Union. Close to the policy-making process, it serves the common interest of the Member States, while being independent of special interests, whether private or national.

3 Outline: Introduction, Objectives of the study, Approach, Results, Conclusions

4 Introduction – components of a European SDI
– GI policy
– GI standards
– Spatial information services
– Fundamental GI data sets

5 Introduction
– Metadata and discovery services are key components of an SDI
– Multilingualism is important

6 Introduction: INSPIRE requirements
– metadata*
– spatial data sets and spatial data services*
– network services* (EU geo-portal)
– access and rights of use for Community institutions and bodies**
– monitoring and reporting mechanisms**
– process and procedures
* technical: under JRC responsibility
** legal/procedural: under Eurostat responsibility

7 Introduction
– European interoperability framework for pan-European eGovernment services
– Recommendations related to multilingualism, e.g.: For the Pan-European services provided via portals, the top-level EU portal interface should be fully multilingual, the second-level pages (introductory texts and the descriptions of links) should be offered in the official languages and the external links and related pages on the national websites should be available in at least one other language (for example English) in addition to the national language(s).

8 EcoInformatics meeting, 17/01/06 Ispra
Introduction: issues on multilingualism identified by the INSPIRE DT on Network Services
– Multilingualism is only mentioned in the context of the interoperability of spatial data sets and services, for key attributes and corresponding multilingual thesauri
– Granularity: should the list of available languages be a service feature, or be defined at the data set or even the feature attribute level?
– Metadata/data: should only metadata be multilingual, or data sets as well?
– Attribute label versus attribute value: should only attribute labels be multilingual, or the attribute values as well?

9 Introduction

10 Outline: Introduction, Objectives of the study, Approach, Results, Conclusions

11 Objective of the study
– Focus on discovery of resources
– Answer the question: is a common ontology or thesaurus, from a technical point of view, desirable and feasible for multilingual resource discovery in a European Spatial Data Infrastructure?

12 Outline: Introduction, Objectives of the study, Approach, Results, Conclusions

13 Approach
– Implement and extend the work of H. Chen et al., "A Parallel Computing Approach to Creating Engineering Concept Spaces for Semantic Retrieval: The Illinois Digital Library Initiative Project," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, pp. 771-782, 1996.
– Integrate thesauri, vocabularies and gazetteers in resource discovery
– Experiments
P. Smits, A. Friis-Christensen, "Resource Discovery in a European Spatial Data Infrastructure," IEEE Transactions on Knowledge and Data Engineering (accepted for publication).

14 Approach: what is a concept space?
Simply put:
– An index of all concepts existing in a metadata repository
– With numerical relationships defined between any two concepts
– To be queried by associative retrieval
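
The definition above can be pictured as a small data structure. The following is a minimal sketch, not the authors' implementation: the class name `ConceptSpace`, its methods, and the example concepts are assumptions made for illustration. It stores the set of concepts and a directed weight in [0, 1] between pairs of concepts, with unrelated pairs defaulting to 0.

```python
from collections import defaultdict


class ConceptSpace:
    """Hypothetical container: concepts plus weighted links between them."""

    def __init__(self):
        self.concepts = set()
        # weights[source][target] is the association strength from source to target
        self.weights = defaultdict(dict)

    def add_concept(self, concept):
        self.concepts.add(concept)

    def relate(self, source, target, weight):
        """Register a directed relationship with a weight in [0, 1]."""
        if not 0.0 <= weight <= 1.0:
            raise ValueError("weight must lie in [0, 1]")
        self.concepts.update((source, target))
        self.weights[source][target] = weight

    def weight(self, source, target):
        """Return the stored weight, or 0.0 for pairs with no relationship."""
        return self.weights.get(source, {}).get(target, 0.0)


space = ConceptSpace()
space.relate("soil", "sub-surface information", 0.7)
space.add_concept("infrastructure")
print(space.weight("soil", "sub-surface information"))
print(space.weight("soil", "infrastructure"))
```

Storing weights per ordered pair keeps the sketch compatible with asymmetric relationships; whether the weights in the study are symmetric is not stated on the slide.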

15 Approach: two-step approach
– Creation of a multilingual concept space
– Associative retrieval based on a neural network
Reference: H. Chen, B. Schatz, T. Ng, J. Martinez, A. Kirchhoff, C. Lin, A parallel computing approach to creating engineering concept spaces for semantic retrieval: the Illinois digital library initiative project. IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 18, No. 8, August 1996, pp. 771-782.

16 Approach: creation of the multilingual concept space
– Collection of resource descriptors
– Object filtering and indexing, in order to:
  – identify those concepts and terms that already exist in the human-created ontology, which includes any thesauri and vocabularies
  – filter out irrelevant terms, such as stop words, to improve performance
  – store any remaining terms in the concept space
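
The filtering and indexing step can be sketched roughly as follows. This is an illustration under stated assumptions, not the procedure used in the study: the tiny `THESAURUS` and `STOP_WORDS` sets, the function name `filter_and_index`, and the sample descriptors are all made up, and the computation of relationship weights between concepts is not covered here.

```python
import re

# Hypothetical stand-ins for a real thesaurus and a real stop-word list
THESAURUS = {"soil", "bodem", "sub-surface information"}
STOP_WORDS = {"the", "of", "a", "and", "in", "de", "het", "van", "en"}


def filter_and_index(descriptors):
    """Index {resource_id: text} into thesaurus hits and remaining free terms."""
    thesaurus_hits = {}  # known term -> set of resource ids
    free_terms = {}      # remaining term -> set of resource ids
    for resource_id, text in descriptors.items():
        lowered = text.lower()
        # 1. Identify terms already present in the human-created vocabulary
        for term in THESAURUS:
            if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                thesaurus_hits.setdefault(term, set()).add(resource_id)
        # 2. Drop stop words; 3. store whatever remains
        for token in re.findall(r"[^\W\d_]+", lowered):
            if token in STOP_WORDS or token in THESAURUS:
                continue
            free_terms.setdefault(token, set()).add(resource_id)
    return thesaurus_hits, free_terms


descriptors = {
    "r1": "Soil map of the Netherlands",
    "r2": "Bodem en infrastructuur",
}
thesaurus_hits, free_terms = filter_and_index(descriptors)
print(thesaurus_hits)
print(free_terms)
```

Both dictionaries double as an inverted index from term to resources, which is what the retrieval step needs later to map concepts back to resource descriptors.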

17 Approach: associative query
Initialize the associative retrieval
– The neural network is initialized at query time by assigning initial membership values to its units, which are the concepts in the concept space:
  – terms in the concept space that exactly match a query term: 1
  – partial matches: a membership value < 1
  – terms that do not match the query: 0
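
A minimal sketch of this initialization is given below. The slide does not say how partial matches are scored, so the use of `difflib.SequenceMatcher` and the `partial_cutoff` of 0.6 are assumptions chosen only to make the example concrete; the concept labels are likewise illustrative.

```python
from difflib import SequenceMatcher


def initial_membership(query_term, label, partial_cutoff=0.6):
    """Return 1 for an exact match, a value below 1 for a partial match, else 0."""
    q, c = query_term.lower(), label.lower()
    if q == c:
        return 1.0
    # Assumed similarity measure; it is below 1.0 for any two different strings
    similarity = SequenceMatcher(None, q, c).ratio()
    return similarity if similarity >= partial_cutoff else 0.0


def initialize(query_term, concept_labels):
    """Assign each concept the best membership over its (multilingual) labels."""
    return {
        concept: max(initial_membership(query_term, label) for label in labels)
        for concept, labels in concept_labels.items()
    }


concept_labels = {
    "soil": ["soil", "bodem"],
    "sub-surface information": ["sub-surface information"],
    "infrastructure": ["infrastructure", "infrastructuur"],
}
units = initialize("soil", concept_labels)
print(units)
```

Taking the maximum over all labels of a concept lets a query in one language activate a concept that also carries labels in other languages.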

18 Approach: associative query
Initialize the associative retrieval. Query: soil
[Diagram, situation at t=0: unit "Soil, bodem" has membership 1; unit "Sub-surface information" has membership 0; a further value of 0 is shown. Link weights shown: Wij = 0 and Wij = 0.7.]

19 Approach: associative query
Iterate through the neural network
[Diagram, situation at t=0: "Soil, bodem" = 1, "Sub-surface information" = 0, and a further value of 0; link weights Wij = 0 and Wij = 0.7.]
[Diagram, situation at t=1: "Soil, bodem" = 1, "Sub-surface information" = 0.7, and the further value remains 0; link weights unchanged at Wij = 0 and Wij = 0.7.]
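
One iteration of the network can be sketched as below. The slide does not state the update rule, so this is an assumption: each unit takes the weighted sum of the other units' memberships, clipped to 1, and never drops below its current value. Chen et al. describe a Hopfield-style network with a transfer function, which is not reproduced here. The name "other concept" is a placeholder for the unit whose label did not survive in the transcript.

```python
def iterate(memberships, weights):
    """Perform one synchronous update; weights[(i, j)] is W_ij from unit i to unit j."""
    updated = {}
    for j, current in memberships.items():
        incoming = sum(
            weights.get((i, j), 0.0) * mu_i
            for i, mu_i in memberships.items()
            if i != j
        )
        # Assumed rule: clip to 1 and keep activation that is already present
        updated[j] = max(current, min(1.0, incoming))
    return updated


state_t0 = {
    "soil, bodem": 1.0,
    "sub-surface information": 0.0,
    "other concept": 0.0,  # placeholder label
}
weights = {
    ("soil, bodem", "sub-surface information"): 0.7,
    ("soil, bodem", "other concept"): 0.0,
}
state_t1 = iterate(state_t0, weights)
print(state_t1)
```

With these inputs the sketch reproduces the values shown for t=1: the unit linked with weight 0.7 rises to 0.7, and the unit linked with weight 0 stays at 0.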

20 Approach: associative query
Link membership values of concepts to resource descriptors
[Diagram, situation at t=1: "Soil, bodem" = 1, "Sub-surface information" = 0.7, and a further value of 0; link weights Wij = 0 and Wij = 0.7.]
– Membership > threshold?
– Use the index to find resources that contain the concept
– Order the found resources by relevance, based on membership values
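
The three steps above can be sketched as a short ranking function. This is a hedged illustration: the threshold of 0.5, the choice to score a resource by the highest membership among its matching concepts, the function name `rank_resources`, and the sample index are all assumptions, since the slide specifies none of them.

```python
def rank_resources(memberships, concept_index, threshold=0.5):
    """Return (resource_id, score) pairs, ordered by decreasing relevance."""
    scores = {}
    for concept, mu in memberships.items():
        if mu <= threshold:
            continue  # concept not activated strongly enough
        for resource_id in concept_index.get(concept, ()):
            # Assumed scoring: best membership of any concept found in the resource
            scores[resource_id] = max(scores.get(resource_id, 0.0), mu)
    # Sort by score (descending), then by id for a stable order
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


memberships_t1 = {
    "soil, bodem": 1.0,
    "sub-surface information": 0.7,
    "other concept": 0.0,  # placeholder label
}
concept_index = {
    "soil, bodem": {"r1", "r2"},
    "sub-surface information": {"r2", "r3"},
    "other concept": {"r4"},
}
ranking = rank_resources(memberships_t1, concept_index)
print(ranking)
```

Here resource r3 is returned even though it never mentions the query term, because it contains a concept that was activated through the concept space; r4 is excluded because its only concept stays below the threshold.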

21 Outline: Introduction, Objectives of the study, Approach, Results, Conclusions

23 Results: creating the metadata repository

24 Results

25 Results

26 Results: the query is computationally expensive
Query | Remark | Time required for four iterations of the neural network (600 MHz, 512 MB RAM)
soil (eng) | Query term found in the concept space (GEMET 2001.1 concept no. 7843) | 16.1 s
infrastructuur (nld) | Query term not literally defined in the concept space or ontology | 27.8 s

27 Outline: Introduction, Objectives of the study, Approach, Results, Conclusions

28 Conclusions from the study
– It will be impractical to rely on only one common ontology for resource discovery in a European SDI.
– Using human-created ontologies in combination with automatic concept space generation and associative retrieval is a powerful means of discovering geospatial resources.
– The proposed approach is useful and merits further investigation and development.
– The importance of structured information, using metadata standards, is underlined by our study and is also a basic assumption of our work.

