Metadata Schema for CERIF. Andrei Lopatenko, Vienna University of Technology
What we have now: An SGML DTD describing CERIF data (old version of CERIF). SGML is used for data exchange between national institutions and ERGO. The SGML DTD covers only the old version of CERIF (projects). Strictly defined structure and semantics of elements.
What we need: A metadata format to describe CERIF-2000 data (with its new entities and attributes). Because data descriptions differ between countries and institutions, it should be possible to extend the schema and to express the meaning of the new elements.
Possible solution: The Semantic Web. RDF (Resource Description Framework) to encode data; DAML + OIL (DARPA Agent Markup Language + Ontology Inference Layer) to express the semantics of classes and attributes.
Advantages: A direct path to a Knowledge Management solution. A possible way to solve the problems of differing vocabularies and classifications. Ready to work in a heterogeneous distributed environment. Easy to implement compared with KIF/KQML and Description Logic solutions.
Advantages: XML experience can be reused when developing SW solutions. XML compatibility keeps the solution close to industry solutions. The semantic richness of SW makes it possible to develop advanced information retrieval over SW-encoded data. Already developed tools can be applied.
Disadvantages: XML experience is not enough; developers must be trained in SW. Not as powerful as complete Description Logic solutions. Not as efficient on huge volumes of data as traditional database technologies (replication).
DAML + OIL: Allows describing hierarchical relations between classes of data. Allows specifying classes of data (creating a vocabulary!) using slot restrictions. Example: a “Workshop” is an “Event”; an “EU project” is a “Project” whose “funding organization” attribute has as its value an object of class “European Funding Organization”.
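The two ideas on this slide, subclass relations and a class defined by a slot restriction, can be sketched in plain Python. This is only a toy model and not a DAML + OIL reasoner; the identifiers `SUBCLASS_OF`, `INSTANCES`, `is_subclass`, `is_eu_project` and the sample data are assumptions made for illustration.

```python
# Toy model of subclass relations and a slot restriction.
# Plain Python for illustration; not a DAML + OIL reasoner.

# Direct subclass relations: child -> parent (illustrative names).
SUBCLASS_OF = {
    "Workshop": "Event",
    "EUProject": "Project",
    "EuropeanFundingOrganization": "FundingOrganization",
}

def is_subclass(cls, ancestor):
    """True if cls equals ancestor or reaches it through SUBCLASS_OF."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False

# Instances: identifier -> asserted class and slot values (hypothetical data).
INSTANCES = {
    "funder_eu": {"type": "EuropeanFundingOrganization"},
    "funder_national": {"type": "FundingOrganization"},
    "p1": {"type": "Project", "funding_organization": "funder_eu"},
    "p2": {"type": "Project", "funding_organization": "funder_national"},
}

def is_eu_project(instance_id):
    """Apply the restriction: a Project whose funding organization
    is an instance of EuropeanFundingOrganization."""
    inst = INSTANCES[instance_id]
    if not is_subclass(inst["type"], "Project"):
        return False
    funder = INSTANCES.get(inst.get("funding_organization"))
    return funder is not None and is_subclass(
        funder["type"], "EuropeanFundingOrganization"
    )
```

Here `p1` is recognized as an EU project although it was only asserted to be a `Project`; the membership follows from the value of its funding slot.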
DAML + OIL: Distributed ontologies: my (AURIS-MM) project is a subClassOf CERIF:Project. Tools for ontology checking (Description Logics, CLOS-based theory for DAML). Tools for ontology development. Tools for ontology visualization.
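A minimal sketch of the distributed-ontology idea, assuming a shared CERIF fragment and a local extension are each given as simple child-to-parent maps. The prefixes `cerif:` and `auris:`, the instance names, and all function names are hypothetical.

```python
# A shared ontology fragment and a local extension, each kept separately
# (for example at different institutions) as child -> parent maps.
SHARED = {"cerif:Workshop": "cerif:Event"}
LOCAL = {"auris:Project": "cerif:Project"}

def merge(*ontologies):
    """Combine several subclass maps into one."""
    merged = {}
    for ontology in ontologies:
        merged.update(ontology)
    return merged

def ancestors(cls, subclass_of):
    """List all superclasses of cls, nearest first."""
    result = []
    seen = {cls}
    parent = subclass_of.get(cls)
    while parent is not None and parent not in seen:
        result.append(parent)
        seen.add(parent)
        parent = subclass_of.get(parent)
    return result

def instances_of(target, typed_instances, subclass_of):
    """Return instances whose class is target or a subclass of it."""
    return sorted(
        name
        for name, cls in typed_instances.items()
        if cls == target or target in ancestors(cls, subclass_of)
    )

# Hypothetical instance data typed with the local class.
TYPED = {"local_project_1": "auris:Project", "ws_1": "cerif:Workshop"}
ONTOLOGY = merge(SHARED, LOCAL)
```

With the merged map, a query for `cerif:Project` also returns `local_project_1`, so a partner who knows only the shared vocabulary can still retrieve locally defined projects.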
DAML + OIL: Advanced information retrieval solutions. Implemented and tested. Projects: EU projects (On-To-Knowledge, KA3:IAF), DARPA project CAKE, WebScript, DAML Services, Knowledge Creation tools for DAML, ASCS, etc. See derpi.tuwien.ac.at/~andrei/DAML.htm, www.cordis.lu, www.darpa.mil
DAML + OIL: Developed the first version of the ontology, dc-mn.daml. Mapping (as subclass relations and axioms) to other well-known schemas (Dublin Core and MathNet). Tested for simple information retrieval operations (but including semantic information).
DAML + OIL example of schema: CERIF.Workshop
DAML + OIL: Easy creation of custom vocabularies based on shared vocabularies. Easy specification of which classes a given object instantiates (multiple classes are possible).
DAML + OIL: Example: publications database. Classes for researchers: Dissertation, Conference article, Journal article, Journal with evaluations, Patent. Classes for university administration: Class A (score 2): International Patent; Class B (score 1): Journal Article in an International journal that is a Journal with Evaluation.
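The publications example can be sketched as one record classified under two vocabularies at once. The field names (`kind`, `international`, `journal_international`, `journal_evaluated`) and the function names are assumptions for illustration; only the class names and scores come from the slide.

```python
# One publication record, two vocabularies: researchers classify by kind,
# the administration derives scored classes from the same attributes.

SCORES = {"Class A": 2, "Class B": 1}  # scores as given on the slide

def researcher_classes(pub):
    """Researcher vocabulary: the publication kind itself."""
    return {pub["kind"]}

def admin_classes(pub):
    """Administration vocabulary, defined by restrictions on attributes."""
    classes = set()
    if pub["kind"] == "Patent" and pub.get("international"):
        classes.add("Class A")
    if (
        pub["kind"] == "Journal article"
        and pub.get("journal_international")
        and pub.get("journal_evaluated")
    ):
        classes.add("Class B")
    return classes

def score(pub):
    """Total administrative score of a publication."""
    return sum(SCORES[c] for c in admin_classes(pub))

# Hypothetical records.
patent = {"kind": "Patent", "international": True}
article = {
    "kind": "Journal article",
    "journal_international": True,
    "journal_evaluated": True,
}
thesis = {"kind": "Dissertation"}
```

Each record keeps a single description; the second vocabulary is computed from it, so researchers and administrators do not need separate data entry.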
DAML + OIL: Created a hierarchy of slots, which makes information retrieval clearer and easier to implement. Example: full-text search operations are based on the “full-text description” slot (attribute); project_abstract, project_title and project_description are subslots of “full-text description”. If a new slot “project_last_year_summary” is added, specifying it as a subslot of “full-text description” is enough to include it in full-text search.
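A minimal sketch of search driven by a slot hierarchy, assuming the hierarchy is available as a subslot-to-parent map. The slot names follow the slide (written here as `full_text_description` and `project_description`); the functions and sample records are hypothetical.

```python
# Full-text search that consults the slot hierarchy instead of a
# hard-coded list of fields.

SUBSLOT_OF = {
    "project_abstract": "full_text_description",
    "project_title": "full_text_description",
    "project_description": "full_text_description",
}

def subslots(parent, subslot_of):
    """Return all slots that are direct or indirect subslots of parent."""
    found = set()
    for slot in subslot_of:
        current = slot
        seen = set()
        while current in subslot_of and current not in seen:
            seen.add(current)
            current = subslot_of[current]
            if current == parent:
                found.add(slot)
                break
    return found

def full_text_search(records, term, subslot_of):
    """Return ids of records containing term in any full-text subslot."""
    slots = subslots("full_text_description", subslot_of)
    term = term.lower()
    return [
        r["id"]
        for r in records
        if any(term in str(r.get(s, "")).lower() for s in slots)
    ]

# Hypothetical records.
RECORDS = [
    {
        "id": "p1",
        "project_title": "Metadata for research information",
        "project_last_year_summary": "Ontology mapping finished",
    },
    {"id": "p2", "project_abstract": "Ontology tools for CERIF"},
]

# Declaring the new slot as a subslot is the only change needed.
EXTENDED = dict(SUBSLOT_OF, project_last_year_summary="full_text_description")
```

Searching for "ontology" with `SUBSLOT_OF` finds only `p2`; with `EXTENDED` it also finds `p1`, and `full_text_search` itself is unchanged.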
DAML + OIL Example of class hierarchy: from extended CERIF
RDF: DAML + OIL specifies the schema. It is also possible to encode data (“instances”) in DAML. For EuroCRIS we propose to use RDF as the encoding format. The RDF description should be consistent with the DAML + OIL schema.
RDF: Developed a toolset to export/import data between a CERIF database and CERIF RDF. Toolset to query CERIF RDF data (for now very simple information retrieval operations, but distributed and using semantics). A toolset to load data from CERIF RDF into a Prolog knowledge base is being developed.
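The export direction can be sketched as follows. This is not the toolset described on the slide: the table layout, the namespace `http://example.org/cerif#`, and the function names are assumptions, and N-Triples is used only because it is the simplest RDF serialization to write by hand.

```python
import sqlite3

CERIF_NS = "http://example.org/cerif#"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def escape_literal(value):
    """Escape a string for use as an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )

def project_rows_to_ntriples(conn):
    """Export rows of a (hypothetical) project table as N-Triples lines."""
    lines = []
    cursor = conn.execute("SELECT id, title, abstract FROM project ORDER BY id")
    for project_id, title, abstract in cursor:
        subject = f"<{CERIF_NS}project/{project_id}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{CERIF_NS}Project> .")
        lines.append(
            f'{subject} <{CERIF_NS}project_title> "{escape_literal(title)}" .'
        )
        lines.append(
            f'{subject} <{CERIF_NS}project_abstract> "{escape_literal(abstract)}" .'
        )
    return lines

def demo():
    """Build a one-row in-memory database and export it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE project (id INTEGER PRIMARY KEY, title TEXT, abstract TEXT)"
    )
    conn.execute(
        "INSERT INTO project VALUES (?, ?, ?)",
        (1, "AURIS-MM", 'A "sample" abstract'),
    )
    return project_rows_to_ntriples(conn)
```

Calling `demo()` yields three triples for the single row: one type statement and one statement per text column. An import tool would parse the same lines back into rows.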
Current work: RDF version of CERIF. Knowledge Management solution for research, with RDF as the data store. New advanced information retrieval possibilities for CERIF.
Proposal: For testing, try using DAML + OIL and RDF for data sharing and distributed retrieval operations between different EuroCRIS organizations. Create and deploy an advanced IR solution based on CERIF RDF and compatible with any CERIF database. Make it free and a part of the CERIF implementation.