Mapping Cultural Heritage Information to CIDOC-CRM*

Maria Theodoridou, Foundation for Research and Technology – Hellas, Institute of Computer Science
*Somewhat adjusted by C.-E. Ore

Overview
- X3ML
- An interface for sustainable management of the data mapping process
- Use case: mapping the dFMRÖ coin database to CIDOC-CRM
http://139.91.183.3/3M/Login

Cultural Diversity and Data Standards
Cultural information is more than one domain:
- Collection description (art, archaeology, natural history, …)
- Archives and literature (records, treaties, letters, artistic works, …)
- Administration, preservation and conservation of the material heritage
- Science and scholarship: investigation, interpretation
- Presentation: exhibition making, teaching, publication
But how can one build a documentation standard? Each aspect needs its own methods, forms and means of communication. The data overlap, but they do not fit into one schema.

“One model to rule them all”: The CIDOC CRM
The CIDOC Conceptual Reference Model:
- A collaboration with the International Council of Museums
- An ontology of 86 classes and 137 properties for culture and more
- With the capacity to explain hundreds of (meta)data formats
- Accepted by ISO TC46 in September 2000; international standard since 2006 (ISO 21127:2006)
Serving as:
- An intellectual guide to create schemata, formats and profiles
- A language for the analysis of existing sources for integration/mediation (“identify elements with common meaning”)
- A transportation format for data integration, migration and the Internet

What Does Mapping One Schema to Another Mean?
A sufficient specification for transforming each instance of a source schema into an instance of a target schema while preserving as much of its initial ‘meaning’ as possible.
CIDOC-CRM approach (target schema = CIDOC-CRM): interpret the source schema as a semantic model (nodes and links), then map each element of that model to an equivalent CIDOC-CRM path, such that each instance of an element of the source semantic model can be converted into a valid instance of the CIDOC-CRM with the same meaning.

Interpreting a Schema as a Semantic Model
1. Interpreting tables and columns as entities
2. Interpreting records as entity instances
3. Interpreting field names as relationships and entities
4. Interpreting field contents as entity instances
Each field is interpreted as entity-relationship-entity (e-r-e). The whole schema is decomposed into e-r-e's, and each e-r-e is mapped individually to the CIDOC-CRM.
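A minimal Python sketch of steps 2 to 4 (standard library only), assuming a flat XML export like the dFMRÖ COIN record shown later: the record element becomes the domain entity, each field name is read as the relationship, and each field content as the range entity instance. The label "COIN/627" for the domain entity is a hypothetical naming choice, not part of X3ML.

```python
import xml.etree.ElementTree as ET

# Shortened COIN record; the field values are taken from the dFMRÖ example record.
RECORD = """<COIN>
  <ID>627</ID>
  <WEIGHT>3.46</WEIGHT>
  <DATE_CA>0</DATE_CA>
  <DAT_TXT>116 - 115 v. Chr.</DAT_TXT>
</COIN>"""


def decompose(xml_text):
    """Decompose one flat record into entity-relationship-entity (e-r-e) tuples.

    The record itself is the domain entity (hypothetically labelled TAG/ID),
    each field name is read as a relationship, and each field's text is read
    as an instance of the range entity.
    """
    root = ET.fromstring(xml_text)
    domain = f"{root.tag}/{root.findtext('ID')}"
    return [(domain, field.tag, (field.text or "").strip()) for field in root]


for ere in decompose(RECORD):
    print(ere)
```

Each resulting tuple is a candidate for an individual mapping to a CIDOC-CRM path.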

X3ML: A Mapping Language
X3ML is a declarative, XML-based language that describes schema mappings in such a way that experts can create and discuss them collaboratively. In the past, mappings have been done in many custom ways. In practice, mappings are produced manually by domain and IT experts, which is labor-intensive, error-prone and time-consuming. The emphasis is on establishing a standardized mapping description that lends itself to collaboration and to building a mapping memory that accumulates knowledge and experience.

X3ML Toolkit
The X3ML Toolkit is a set of small, open-source microservices that follow the SYNERGY Reference Model. They are designed with open interfaces and can easily be customized and adapted to complex environments. The key components of the toolkit are:
- Mapping Memory Manager (3M)
- 3M Editor
- X3ML Engine
FORTH’s open-access service is found at: https://www.ics.forth.gr/isl/3M

3M: Mapping Memory Manager
3M is a tool for managing mapping definition files. It is based on the FIMS management system for the administration of the files, and on the 3M Editor for editing and viewing them. It provides a number of administrative actions that help experts manage their mapping definition files. The source code is open source and available on GitHub: http://github.com/isl/Mapping-Memory-Manager

3M Editor: A Mapping Editor
The 3M Editor is software that allows domain experts to build and discuss mappings with little recourse to any particular software skills. It is the interface tool envisioned to let domain experts build mappings. It provides:
- A source- and target-agnostic mapping facility
- Guided mapping according to the deployed ontology’s logic
- A comment and justification facility
- Mapping storage
- A separate instance generation practice for IT professionals
The source code is open source and available on GitHub: https://github.com/isl/3MEditor

X3ML Engine: A Transformation Tool
The X3ML Engine performs the transformation of the source records into the target format. The engine takes as input the source data (currently in the form of an XML document), the description of the mappings in the X3ML mapping definition file, and the URI generation policy file. It transforms the source document into a valid RDF document that corresponds to the input XML file with respect to the given mappings and policy. The source code is open source and available on GitHub: https://github.com/isl/x3ml

X3ML Workflow [figure: domain experts and IT experts; CIDOC-CRM; schema matching; URI generation specification; terminology mapping; schema matching definition file; source databases DB1 and DB2]

dFMRÖ: digitale FundMünzen der Römischen Zeit in Österreich (digital coin finds of the Roman period in Austria)
Austrian Academy of Sciences, Numismatic Commission
Klaus Vondrovec, klaus.vondrovec@khm.at
Access DB since 1999; MySQL DB online since 2007
http://www.oeaw.ac.at/numismatik/projekte/dfmroe/dfmroe.html
Elmer: VO, 27 June 1928, Kubitschek’s 70th birthday

Tables

Interpreting a Schema as a Semantic Model: Example
The field name stands for a relationship and for the kind of contents; the field contents stand for an entity instance. The whole record corresponds to one entity (data example from dFMRÖ).
Proposition: Object 627 has ID: Identifier 627

    <COIN>
      <ID>627</ID>
      <COUNTRY_ID>1</COUNTRY_ID>
      <FIND_SPOT_ID>242</FIND_SPOT_ID>
      <FIND_MANNER_ID>2</FIND_MANNER_ID>
      <FIND_DATE>1980er Jahre</FIND_DATE>
      <AUTHORITY_ID>566</AUTHORITY_ID>
      <ISSUER_ID>536</ISSUER_ID>
      <DENOMINATION>30</DENOMINATION>
      <MINT_ID>244</MINT_ID>
      <OFFICINA>99</OFFICINA>
      <DATE_CA>0</DATE_CA>
      <DATE_FROM>-116</DATE_FROM>
      <DATE_TO>-115</DATE_TO>
      <DAT_VAL>1088408850</DAT_VAL>
      <WEIGHT>3.46</WEIGHT>
      <DIE_AXE>4</DIE_AXE>
      <STATUS_ID>1</STATUS_ID>
      <RV_LEG>CN.DOMI</RV_LEG>
      <RV_PIC>Iuppiter in Quadriga n. r. ..</RV_PIC>
      <ARCH_INFO>-</ARCH_INFO>
      <PH_NAME>000627</PH_NAME>
      <DAT_TXT>116 - 115 v. Chr.</DAT_TXT>
    </COIN>

Mapping the First Element: Creating an Equivalent Proposition
The source schema interpretation maps to the CIDOC-CRM schema:
- Source Domain: COIN → Target Domain: E22 Man-Made Object
- Source Path: “has ID” → Target Path: P1 is identified by
- Source Range: ID → Target Range: E41 Appellation

Mapping the First Element: Instance Valid for Both Schemata
Source schema interpretation → CIDOC-CRM schema:
- Source Domain: COIN → Target Domain: E22 Man-Made Object (instance http://coin/627)
- Source Path: “has ID” → Target Path: P1 is identified by
- Source Range: ID → Target Range: E41 Appellation (instance http://id/627)
RDF encoding:

    <crm:E22_Man-Made_Object rdf:about="http://coin/627">
      <crm:P1_is_identified_by>
        <crm:E41_Appellation rdf:about="http://id/627"/>
      </crm:P1_is_identified_by>
    </crm:E22_Man-Made_Object>

XML export:

    <COIN>
      <ID>627</ID>
    </COIN>

Mapping the First Element: X3ML Specification

    <mappings>
      <mapping>
        <domain>
          <source_node>COIN</source_node>
          <target_node>
            <entity>
              <type>crm:E22_Man-Made_Object</type>
              <instance_generator name="UUID"/>
            </entity>
          </target_node>
        </domain>
        <link>
          ………………….
        </link>
      </mapping>
    </mappings>

Mapping the First Element: X3ML Specification (link)

    <link>
      <path>
        <source_relation><relation>ID</relation></source_relation>
        <target_relation>
          <relationship>crm:P1_is_identified_by</relationship>
        </target_relation>
      </path>
      <range>
        <source_node>ID</source_node>
        <target_node>
          <entity>
            <type>crm:E41_Appellation</type>
            <instance_generator name="UUID"/>
          </entity>
        </target_node>
      </range>
    </link>
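To illustrate what a transformation engine does with this specification, here is a toy Python interpreter for just the domain/link/path/range subset shown above; it is a sketch, not the X3ML Engine or its API. The slides use `instance_generator name="UUID"`; to reproduce the URIs of the RDF example (http://coin/627, http://id/627), the sketch ignores that element and substitutes a hypothetical URI policy keyed by target class.

```python
import xml.etree.ElementTree as ET

# X3ML fragment combining the domain and link specifications from the slides.
X3ML = """<mappings>
  <mapping>
    <domain>
      <source_node>COIN</source_node>
      <target_node><entity>
        <type>crm:E22_Man-Made_Object</type>
        <instance_generator name="UUID"/>
      </entity></target_node>
    </domain>
    <link>
      <path>
        <source_relation><relation>ID</relation></source_relation>
        <target_relation><relationship>crm:P1_is_identified_by</relationship></target_relation>
      </path>
      <range>
        <source_node>ID</source_node>
        <target_node><entity>
          <type>crm:E41_Appellation</type>
          <instance_generator name="UUID"/>
        </entity></target_node>
      </range>
    </link>
  </mapping>
</mappings>"""

SOURCE = "<COIN><ID>627</ID><WEIGHT>3.46</WEIGHT></COIN>"

# Hypothetical URI policy used instead of the UUID generators (matches the RDF example).
URI_POLICY = {
    "crm:E22_Man-Made_Object": lambda value: f"http://coin/{value}",
    "crm:E41_Appellation": lambda value: f"http://id/{value}",
}


def run_mappings(x3ml_text, source_text, uri_policy):
    """Apply simple domain/link mappings to one source record; return (s, p, o) triples."""
    mappings = ET.fromstring(x3ml_text)
    record = ET.fromstring(source_text)
    triples = []
    for mapping in mappings.iter("mapping"):
        domain = mapping.find("domain")
        if domain.findtext("source_node") != record.tag:
            continue  # this mapping applies to a different source element
        domain_type = domain.findtext("target_node/entity/type")
        subject = uri_policy[domain_type](record.findtext("ID"))  # assumes records are keyed by ID
        triples.append((subject, "rdf:type", domain_type))
        for link in mapping.findall("link"):
            relation = link.findtext("path/source_relation/relation")
            relationship = link.findtext("path/target_relation/relationship")
            range_type = link.findtext("range/target_node/entity/type")
            for field in record.findall(relation):
                obj = uri_policy[range_type]((field.text or "").strip())
                triples.append((subject, relationship, obj))
                triples.append((obj, "rdf:type", range_type))
    return triples


for triple in run_mappings(X3ML, SOURCE, URI_POLICY):
    print(triple)
```

The WEIGHT field produces nothing here because no link in this fragment covers it; each additional link would extend the output.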

Interpreting a Schema as a Semantic Model: Example
The field name stands for a relationship and for the kind of contents; the field contents stand for an entity instance. The whole record corresponds to one entity (data example from dFMRÖ, the same COIN record as above).
Proposition: Object 627 weighs 3.46
Relevant field: <WEIGHT>3.46</WEIGHT>
Implicit information: the weight is measured in grams.

Mapping to Paths: Introducing an Intermediate Node
- Source Domain: COIN → Target Domain: E22 Man-Made Object
- Source Path: “weighs” → Target Path: P43 has dimension → Intermediate Node: E54 Dimension → P90 has value
- Source Range: WEIGHT → Target Range: Literal

Mapping to Paths: Introducing an Intermediate Node (Instances)
- Instance of source: http://coin/627 (Source Domain: COIN) “weighs” 3.46 (Source Range: WEIGHT)
- Instance of target: http://coin/627 (Target Domain: E22 Man-Made Object) P43 has dimension http://dim/d1 (Intermediate Node: E54 Dimension), which P90 has value 3.46 (Target Range: Literal)

Mapping to Paths: Introducing an Additional Node
- Source Domain: COIN → Target Domain: E22 Man-Made Object
- Source Path: “weighs” → Target Path: P43 has dimension → Intermediate Node: E54 Dimension → P90 has value
- Source Range: WEIGHT → Target Range: Literal
- Additional constant nodes on the E54 Dimension: P91 has unit → Constant Node: E58 Measurement Unit “gr”; P2 has type → Constant Node: E55 Type “weight”

Mapping to Paths: Introducing Intermediate & Additional Nodes
Instance of source and instance of target: http://coin/627
RDF encoding:

    <crm:E22_Man-Made_Object rdf:about="http://coin/627">
      <crm:P43_has_dimension>
        <crm:E54_Dimension rdf:about="http://dim/d1">
          <crm:P90_has_value>3.46</crm:P90_has_value>
          <crm:P91_has_unit rdf:resource="http://www.oeaw.ac.at/MU/gr"/>
          <crm:P2_has_type rdf:resource="http://www.oeaw.ac.at/DIM/weight"/>
        </crm:E54_Dimension>
      </crm:P43_has_dimension>
    </crm:E22_Man-Made_Object>

XML export:

    <COIN>
      <WEIGHT>3.46</WEIGHT>
    </COIN>
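The same path can be sketched in Python as a function that emits (subject, predicate, object) tuples. The unit and type URIs are the constant nodes from the RDF encoding above; the intermediate node's URI is passed in as a parameter because the slides do not say how http://dim/d1 is generated. Accepting a decimal comma in the source value is an added assumption.

```python
UNIT_GRAM = "http://www.oeaw.ac.at/MU/gr"         # constant node: E58 Measurement Unit "gr"
TYPE_WEIGHT = "http://www.oeaw.ac.at/DIM/weight"  # constant node: E55 Type "weight"


def map_weight(coin_uri, dimension_uri, weight_text):
    """Map COIN.WEIGHT to E22 -P43-> E54 Dimension -P90-> literal, plus unit and type."""
    value = float(weight_text.replace(",", "."))  # accept "3.46" as well as "3,46"
    return [
        (coin_uri, "rdf:type", "crm:E22_Man-Made_Object"),
        (coin_uri, "crm:P43_has_dimension", dimension_uri),
        (dimension_uri, "rdf:type", "crm:E54_Dimension"),
        (dimension_uri, "crm:P90_has_value", value),
        (dimension_uri, "crm:P91_has_unit", UNIT_GRAM),
        (dimension_uri, "crm:P2_has_type", TYPE_WEIGHT),
    ]


for triple in map_weight("http://coin/627", "http://dim/d1", "3.46"):
    print(triple)
```

The two constant nodes make the implicit unit (grams) explicit, which the source field alone does not state.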

Mapping to Paths: Introducing an Additional Node in the Domain
- Source Domain: //COIN → Target Domain: E22 Man-Made Object
- Additional node: P2 has type → Constant Node: E55 Type “coin”

Mapping under Condition
- Source Domain: COIN → Target Domain: E22 Man-Made Object
- Source Path: DATE_CA → Target Path: P108i was produced by → Intermediate Node: E12 Production → P4 has time-span → Intermediate Node: E52 Time-Span (only if DATE_CA = 1)
- Source Range: DATE_CA → Target Range: E55 Type “circa”
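A minimal Python sketch of the condition, under two assumptions not stated on the slide: the link from the E52 Time-Span to the E55 Type “circa” is taken to be P2 has type, and the production, time-span and “circa” URIs follow hypothetical patterns. When DATE_CA is not 1, as in the example record (DATE_CA = 0), nothing is generated.

```python
CIRCA = "http://example.org/type/circa"  # hypothetical URI for E55 Type "circa"


def map_date_ca(coin_uri, record):
    """Type the production time-span as 'circa', but only if DATE_CA = 1."""
    if record.get("DATE_CA", "").strip() != "1":
        return []  # condition not met: the field contributes no triples
    production = f"{coin_uri}/production"  # hypothetical URI pattern
    time_span = f"{production}/time-span"  # hypothetical URI pattern
    return [
        (coin_uri, "crm:P108i_was_produced_by", production),
        (production, "rdf:type", "crm:E12_Production"),
        (production, "crm:P4_has_time-span", time_span),
        (time_span, "rdf:type", "crm:E52_Time-Span"),
        (time_span, "crm:P2_has_type", CIRCA),  # assumed property, see lead-in
    ]


print(map_date_ca("http://coin/627", {"DATE_CA": "0"}))       # the example record: []
print(len(map_date_ca("http://coin/627", {"DATE_CA": "1"})))  # 5
```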

Guidelines
Domain: there are two approaches for defining the Target Domain:
- Introduce a specialization of a CIDOC-CRM class, e.g. Exx Coin as a subclass of E22 Man-Made Object.
- Define the type of the CIDOC-CRM class: E22 Man-Made Object, P2 has type: E55 Type = “Coin”.
To choose, we need to answer the question: does the new class Coin have new properties that are not available in E22?
Identifiers: we map local identifiers in relational database tables explicitly only if these identifiers are visible in the user interface and are also used in other documents. Otherwise, we use the local database identifiers only for generating URIs for the record instance (here, the coin instance) and do NOT map COIN.ID at all.
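The identifier guideline can be sketched in Python as follows, assuming COIN.ID is not shown in the user interface: the local ID only feeds the URI of the coin instance, no E41 Appellation is generated, and the domain is typed with the second approach (P2 has type). The URI pattern follows the examples above; the URI of the E55 Type “coin” is hypothetical.

```python
COIN_TYPE = "http://example.org/type/coin"  # hypothetical URI for E55 Type "coin"


def coin_uri(local_id):
    """Mint the coin URI from the local database ID (pattern as in http://coin/627)."""
    return f"http://coin/{int(local_id)}"  # int() rejects IDs that are not numeric


def map_coin_domain(record):
    """Create the domain node; COIN.ID is used for the URI only and is not mapped itself."""
    uri = coin_uri(record["ID"])
    return [
        (uri, "rdf:type", "crm:E22_Man-Made_Object"),
        (uri, "crm:P2_has_type", COIN_TYPE),
    ]


print(map_coin_domain({"ID": "627"}))
```

Under the first approach, the P2 has type triple would instead become a class assertion for the (hypothetical) Exx Coin subclass.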

Mapping Joins (COUNTRY_ID == COUNTRY_ID)
- Source Domain: //COIN → Target Domain: E22 Man-Made Object
- Source Path: COUNTRY_ID == COUNTRY_ID → Target Path: P108i was produced by → Intermediate Node: E12 Production → P10 falls within
- Source Range: //COUNTRY → Target Range: E4 Period
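A Python sketch of the join, assuming both tables are available as lists of row dictionaries. The COUNTRY row below is hypothetical (the slides do not show its contents), as are the production and period URI patterns; a coin whose COUNTRY_ID has no matching row yields no triples.

```python
COINS = [{"ID": "627", "COUNTRY_ID": "1"}]
COUNTRY = [{"COUNTRY_ID": "1", "NAME": "hypothetical country label"}]  # hypothetical row


def map_country(coins, countries):
    """Join COIN.COUNTRY_ID == COUNTRY.COUNTRY_ID and map the match to an E4 Period."""
    by_id = {row["COUNTRY_ID"]: row for row in countries}  # index the joined table
    triples = []
    for coin in coins:
        country = by_id.get(coin["COUNTRY_ID"])
        if country is None:
            continue  # dangling foreign key: assert nothing
        coin_uri = f"http://coin/{coin['ID']}"
        production = f"{coin_uri}/production"              # hypothetical URI pattern
        period = f"http://period/{country['COUNTRY_ID']}"  # hypothetical URI pattern
        triples += [
            (coin_uri, "crm:P108i_was_produced_by", production),
            (production, "rdf:type", "crm:E12_Production"),
            (production, "crm:P10_falls_within", period),
            (period, "rdf:type", "crm:E4_Period"),
            (period, "rdfs:label", country["NAME"]),
        ]
    return triples


for triple in map_country(COINS, COUNTRY):
    print(triple)
```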

Mixing Categorical and Factual Information
Categorical and factual data need to be separated; otherwise the information becomes inconsistent:
- Find spot: applies to a specific coin
- Historical facts: apply to a category of coins

Categorical Production
The model needs to be extended to support categorical production (similar to FRBR R26 produced things of type). The type can take values such as "AU from Rome, mint ...", which characterize the "edition" of the mint that can be recognized as the outcome of the same minting process. Typically we would assume that a unique stamp was used.
- E22 Man-Made Object MyCoin P137 exemplifies E55 Type “AU from Rome”
- E22 Man-Made Object MyCoin P108i was produced by E12 Production p1
- E12 Production p1 PC1 produced things of type E55 Type “AU from Rome”

Mixing Categorical and Factual Information
- E12 Production p1 P108 has produced E22 Man-Made Object MyCoin
- E12 Production p1 P17 was motivated by E7 Activity ia1; ia1 P2 has type E55 Type “Issuing” (P17 needs a specialization: “gave order”)
- E12 Production p1 PC1 produced things of type E55 Type “AU from Rome, mint …”, which P2 has type E55 Type “AU” (from DENOMINATION)
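A Python sketch of the categorical pattern from these slides. PC1 is the proposed extension property, not part of the CIDOC CRM, so it is written with a hypothetical `ext:` prefix; all URIs are hypothetical. Factual statements (such as the find spot) would attach to the individual coin, while categorical ones attach to the coin type.

```python
def map_categorical(coin_uri, coin_type_uri, denomination_uri):
    """Link one coin to the category ('edition') it exemplifies via its production."""
    production = f"{coin_uri}/production"  # hypothetical URI pattern for p1
    return [
        (coin_uri, "crm:P108i_was_produced_by", production),
        (production, "rdf:type", "crm:E12_Production"),
        # proposed extension, analogous to FRBR R26 (hypothetical prefix "ext:")
        (production, "ext:PC1_produced_things_of_type", coin_type_uri),
        (coin_uri, "crm:P137_exemplifies", coin_type_uri),
        (coin_type_uri, "rdf:type", "crm:E55_Type"),
        (coin_type_uri, "crm:P2_has_type", denomination_uri),  # e.g. "AU" from DENOMINATION
    ]


triples = map_categorical(
    "http://example.org/coin/MyCoin",
    "http://example.org/type/au-from-rome",
    "http://example.org/denomination/AU",
)
for triple in triples:
    print(triple)
```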

Issuer
“Issuer” is an accidental role: it does not characterize an actor independently of particular contexts of activity. Therefore the actor does not get the type “Issuer”; only the activity gets the type “Issuing”.
- Source Domain: //COIN → Target Domain: E22 Man-Made Object → P108i was produced by → E12 Production p1 → P17 was motivated by → E7 Activity ia1 (E55 Type “Issuing”)
- Source Path: ISSUER_ID == PR_ID → Target Path: P14 carried out by
- Source Range: //ISSUER → Target Range: E39 Actor
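A Python sketch of the issuer pattern, assuming the ISSUER table is available as a dictionary of rows keyed by PR_ID. The actor receives no “Issuer” type; only the E7 Activity is typed “Issuing” (here via P2 has type, as in the categorical-production diagram). The URI patterns and the “Issuing” type URI are hypothetical; the IDs are taken from the example COIN record.

```python
ISSUING = "http://example.org/type/issuing"  # hypothetical URI for E55 Type "Issuing"


def map_issuer(coin, issuers_by_pr_id):
    """Join COIN.ISSUER_ID == ISSUER.PR_ID and model the issuer through an issuing activity."""
    issuer = issuers_by_pr_id.get(coin["ISSUER_ID"])
    if issuer is None:
        return []
    coin_uri = f"http://coin/{coin['ID']}"
    production = f"{coin_uri}/production"      # p1 (hypothetical URI pattern)
    issuing = f"{coin_uri}/issuing"            # ia1 (hypothetical URI pattern)
    actor = f"http://actor/{issuer['PR_ID']}"  # hypothetical URI pattern
    return [
        (coin_uri, "crm:P108i_was_produced_by", production),
        (production, "rdf:type", "crm:E12_Production"),
        (production, "crm:P17_was_motivated_by", issuing),
        (issuing, "rdf:type", "crm:E7_Activity"),
        (issuing, "crm:P2_has_type", ISSUING),
        (issuing, "crm:P14_carried_out_by", actor),
        (actor, "rdf:type", "crm:E39_Actor"),  # no P2 has type "Issuer" on the actor
    ]


triples = map_issuer({"ID": "627", "ISSUER_ID": "536"}, {"536": {"PR_ID": "536"}})
for triple in triples:
    print(triple)
```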

dFMRÖ coin db

CIDOC CRM Mapping Repository
Published schema matching definitions are available at: http://www.ics.forth.gr/isl/3M-PublishedMappings/
The schema matching definition format (version 1.0) is available at: http://www.ics.forth.gr/isl/mapping_technology/xsd/x3ml/x3ml_v1.0.xsd
The Mapping Memory Manager (3M) is available at: http://www.ics.forth.gr/isl/3M/
Domain experts are able to easily understand and edit X3ML mapping files.
You are kindly invited to send us your schema matching definitions.

Mapping to the CRM: Conclusions
- Mapping to the CRM can serve simply as a guide for good-practice data structures.
- It can be used to create a Semantic Web of cultural knowledge.
- It can be used to preserve data in a neutral form.
- Even though mapping can become weird, good data structures transform easily, and there are commercial tools.
- No tool can guess all of the experts’ intentions in a data structure: domain experts must assist the mapping.

Lessons from Mapping Experiences
- Semantic interoperability can be defined by the capability of mapping.
- Mapping for epistemic networks is relatively simple: specialist/primary information databases frequently employ a flat schema, reducing complex relationships to simple fields.
- Source fields frequently map to composite paths under the CRM, making semantics explicit using a small set of primitives.
- Intermediate nodes are postulated or deduced (e.g., “production” from “coin”, “birth” from “person”). They are the hooks for integration with complementary sources.
- Cardinality constraints must not be enforced, because knowledge may be alternative or incomplete.
- Domain experts easily learn schema mapping; IT experts may not understand the meaning, may underestimate it, or may be bored by it!
- Intuitive tools for domain experts are needed: separate identifier matching from schema mapping, and separate terminology mediation from schema mapping.
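The separation of terminology mediation from schema mapping can be sketched in Python: the schema mapping emits unresolved (vocabulary, local code) references, and a separately maintained terminology table resolves them afterwards, so terms can be curated without touching the mapping. The table entry, the term URI, and the use of P2 has type directly on the coin for DENOMINATION are hypothetical; the slides do not say what DENOMINATION code 30 means.

```python
def schema_map_denomination(coin_uri, record):
    """Schema mapping only: emit a term reference, without deciding what the code means."""
    return [(coin_uri, "crm:P2_has_type", ("DENOMINATION", record["DENOMINATION"]))]


# Terminology mediation, maintained separately (hypothetical entry and URI).
TERMINOLOGY = {("DENOMINATION", "30"): "http://example.org/denomination/30"}


def mediate(triples, terminology):
    """Replace term references by vocabulary URIs; report references missing from the table."""
    resolved, unresolved = [], []
    for s, p, o in triples:
        if isinstance(o, tuple):  # a term reference produced by the schema mapping
            if o in terminology:
                resolved.append((s, p, terminology[o]))
            else:
                unresolved.append(o)
        else:
            resolved.append((s, p, o))
    return resolved, unresolved


resolved, unresolved = mediate(
    schema_map_denomination("http://coin/627", {"DENOMINATION": "30"}), TERMINOLOGY
)
print(resolved)    # [('http://coin/627', 'crm:P2_has_type', 'http://example.org/denomination/30')]
print(unresolved)  # []
```

Unresolved references are reported rather than guessed, which leaves the terminology decision to the domain experts.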

Thank you!