Mapping Cultural Heritage Information to CIDOC-CRM*
Maria Theodoridou
Foundation for Research and Technology – Hellas, Institute of Computer Science
*Somewhat adjusted by C.-E. Ore
Overview
- X3ML: an interface for sustainable management of the data mapping process
- Use case: mapping the dFMRÖ coin database to CIDOC-CRM
http://139.91.183.3/3M/Login
Cultural Diversity and Data Standards
Cultural information is more than a single domain. It covers:
- collection description (art, archaeology, natural history, …)
- archives and literature (records, treaties, letters, artistic works, …)
- administration, preservation and conservation of the material heritage
- science and scholarship: investigation and interpretation
- presentation: exhibition making, teaching, publication
But how can one build a documentation standard for all of this? Each aspect needs its own methods, forms and means of communication. The data overlap, but they do not fit into one schema.
“One Model to Rule Them All”: The CIDOC CRM
The CIDOC Conceptual Reference Model is:
- a collaboration with the International Council of Museums (ICOM)
- an ontology of 86 classes and 137 properties for culture and more
- capable of explaining hundreds of (meta)data formats
- accepted by ISO TC46 in September 2000, and an international standard since 2006 (ISO 21127:2006)
It serves as:
- an intellectual guide for creating schemata, formats and profiles
- a language for analysing existing sources for integration/mediation (“identify elements with common meaning”)
- a transport format for data integration, migration and exchange over the Internet
What Does Mapping One Schema to Another Mean?
A mapping is a sufficient specification for transforming each instance of a source schema into an instance of a target schema while preserving as much as possible of its original ‘meaning’.
CIDOC-CRM approach (target schema = CIDOC-CRM):
- interpret the source schema as a semantic model (nodes and links);
- map each element of that model to an equivalent CIDOC-CRM path,
so that each instance of an element of the source semantic model can be converted into a valid instance of the CIDOC-CRM with the same meaning.
Interpreting a Schema as a Semantic Model
1. Interpret tables and columns as entities.
2. Interpret records as entity instances.
3. Interpret field names as relationships and entities.
4. Interpret field contents as entity instances.
Each field is interpreted as an entity-relationship-entity (e-r-e). The whole schema is decomposed into e-r-e's, and each e-r-e is mapped individually to the CIDOC-CRM.
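The four interpretation steps can be sketched in Python for a single flat XML record. This is a minimal illustration, not part of the X3ML toolkit: the function name `decompose`, the `COIN/627` naming of the record instance and the `has <FIELD>` relationship labels are hypothetical choices made for this example.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the dFMRÖ COIN record used in the data example.
RECORD = """<COIN>
  <ID>627</ID>
  <FIND_DATE>1980er Jahre</FIND_DATE>
  <WEIGHT>3.46</WEIGHT>
</COIN>"""


def decompose(record_xml: str) -> list[tuple[str, str, str]]:
    """Split one flat record into (entity, relationship, entity) triples."""
    root = ET.fromstring(record_xml)
    # Step 2: the record is one entity instance, named here (hypothetically)
    # after its table and primary key.
    domain = f"{root.tag}/{root.findtext('ID')}"
    triples = []
    for field in root:
        # Step 3: the field name becomes the relationship;
        # step 4: the field content becomes the range entity instance.
        triples.append((domain, f"has {field.tag}", (field.text or "").strip()))
    return triples


for triple in decompose(RECORD):
    print(triple)
```

Each resulting e-r-e, for example `('COIN/627', 'has WEIGHT', '3.46')`, is then the unit that gets mapped to a CIDOC-CRM path.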
X3ML: A Mapping Language
X3ML is a declarative, XML-based language that describes schema mappings in such a way that they can be collaboratively created and discussed by experts.
In the past, mappings have been done in many custom ways. In practice, mappings are produced manually by domain and IT experts, which is labor-intensive, error-prone and time-consuming.
The emphasis is on establishing a standardized mapping description that lends itself to collaboration and to building a mapping memory that accumulates knowledge and experience.
X3ML Toolkit
The X3ML Toolkit is a set of small, open-source microservices that follow the SYNERGY Reference Model. They are designed with open interfaces and can easily be customized and adapted to complex environments. The key components of the toolkit are:
- Mapping Memory Manager (3M)
- 3M Editor
- X3ML Engine
FORTH's open-access service is available at: https://www.ics.forth.gr/isl/3M
3M: Mapping Memory Manager
3M is a tool for managing mapping definition files. It is based on the FIMS management system for administering the files and on the 3M Editor for editing and viewing them. It provides a number of administrative actions that help experts manage their mapping definition files.
The source code is available as open source on GitHub: http://github.com/isl/Mapping-Memory-Manager
3M Editor: A Mapping Editor
The 3M Editor is software that allows domain experts to build and discuss mappings with little recourse to any particular software skills. It is the interface tool envisioned to let domain experts build mappings. It provides:
- a source- and target-agnostic mapping facility
- guided mapping according to the logic of the deployed ontology
- a comment and justification facility
- mapping storage
- a separate instance generation practice for IT professionals
The source code is available as open source on GitHub: https://github.com/isl/3MEditor
X3ML Engine: A Transformation Tool
The X3ML Engine transforms the source records into the target format. It takes as input the source data (currently an XML document), the mapping description in the X3ML mapping definition file, and the URI generation policy file. It then transforms the source document into a valid RDF document that corresponds to the input XML file with respect to the given mappings and policy.
The source code is available as open source on GitHub: https://github.com/isl/x3ml
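The role of the URI generation policy can be illustrated with a small registry of named generators. This is a hypothetical sketch: the generator name `UUID` appears in the X3ML examples of this presentation, but the `Template` generator and the functions `generate_uri`, `uuid_generator` and `template_generator` are assumptions for illustration and do not reproduce the X3ML Engine's policy file format or API.

```python
import uuid
from typing import Callable


# Hypothetical generators: the real X3ML Engine reads its generators from the
# URI generation policy file; these only illustrate the idea of named generators.
def uuid_generator(args: dict[str, str]) -> str:
    # Ignores the source values and mints a fresh, globally unique URN.
    return uuid.uuid4().urn


def template_generator(args: dict[str, str]) -> str:
    # Fills a URI template such as "http://coin/{id}" with values from the record.
    values = {k: v for k, v in args.items() if k != "template"}
    return args["template"].format(**values)


POLICY: dict[str, Callable[[dict[str, str]], str]] = {
    "UUID": uuid_generator,
    "Template": template_generator,
}


def generate_uri(generator_name: str, args: dict[str, str]) -> str:
    """Look up a generator by the name used in <instance_generator name="..."/>."""
    return POLICY[generator_name](args)


print(generate_uri("Template", {"template": "http://coin/{id}", "id": "627"}))  # http://coin/627
print(generate_uri("UUID", {}))  # a different urn:uuid:... on every call
```

Keeping URI generation in a separate policy, rather than inside each mapping, lets the same mapping be reused with different identifier schemes.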
X3ML Workflow (figure). Participants: domain experts and IT experts. Components: CIDOC-CRM, schema matching, URI generation specification, terminology mapping, schema matching definition file, and the databases DB1 and DB2.
dFMRÖ
digitale FundMünzen der Römischen Zeit in Österreich (digital coin finds of the Roman period in Austria)
Austrian Academy of Sciences, Numismatic Commission
Klaus Vondrovec, klaus.vondrovec@khm.at
Access database since 1999; MySQL database online since 2007
http://www.oeaw.ac.at/numismatik/projekte/dfmroe/dfmroe.html
Image caption: Elmer, VO, 27 June 1928, Kubitschek's 70th birthday
Tables
Interpreting a Schema as a Semantic Model: Example
The field name stands for a relationship and for the kind of contents; the field contents stand for an entity instance. The whole record corresponds to one entity. For example, the ID field reads as: Object 627 has ID: Identifier 627.
(data example from dFMRÖ)
<COIN>
  <ID>627</ID>
  <COUNTRY_ID>1</COUNTRY_ID>
  <FIND_SPOT_ID>242</FIND_SPOT_ID>
  <FIND_MANNER_ID>2</FIND_MANNER_ID>
  <FIND_DATE>1980er Jahre</FIND_DATE>
  <AUTHORITY_ID>566</AUTHORITY_ID>
  <ISSUER_ID>536</ISSUER_ID>
  <DENOMINATION>30</DENOMINATION>
  <MINT_ID>244</MINT_ID>
  <OFFICINA>99</OFFICINA>
  <DATE_CA>0</DATE_CA>
  <DATE_FROM>-116</DATE_FROM>
  <DATE_TO>-115</DATE_TO>
  <DAT_VAL>1088408850</DAT_VAL>
  <WEIGHT>3.46</WEIGHT>
  <DIE_AXE>4</DIE_AXE>
  <STATUS_ID>1</STATUS_ID>
  <RV_LEG>CN.DOMI</RV_LEG>
  <RV_PIC>Iuppiter in Quadriga n. r. ..</RV_PIC>
  <ARCH_INFO>-</ARCH_INFO>
  <PH_NAME>000627</PH_NAME>
  <DAT_TXT>116 - 115 v. Chr.</DAT_TXT>
</COIN>
Mapping the First Element: Creating an Equivalent Proposition
The source schema interpretation maps to the CIDOC-CRM schema:
- Source Domain: COIN → Target Domain: E22 Man-Made Object
- Source Path: “has ID” → Target Path: P1 is identified by
- Source Range: ID → Target Range: E41 Appellation
Mapping the First Element: Instance Valid for Both Schemata
Source schema interpretation → CIDOC-CRM schema:
- Source Domain: COIN → Target Domain: E22 Man-Made Object (instance http://coin/627)
- Source Path: “has ID” → Target Path: P1 is identified by
- Source Range: ID → Target Range: E41 Appellation (instance http://id/627)
XML export:
<COIN>
  <ID>627</ID>
</COIN>
RDF encoding:
<crm:E22_Man-Made_Object rdf:about="http://coin/627">
  <crm:P1_is_identified_by>
    <crm:E41_Appellation rdf:about="http://id/627"/>
  </crm:P1_is_identified_by>
</crm:E22_Man-Made_Object>
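The nested RDF/XML encoding of this instance can be produced with Python's standard `xml.etree.ElementTree`. A minimal sketch, assuming `http://www.cidoc-crm.org/cidoc-crm/` as the namespace URI behind the `crm:` prefix (the slide shows only the prefix). Like the slide, it emits a fragment without the enclosing `rdf:RDF` element; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumed namespace URI for the crm: prefix.
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("crm", CRM)


def encode_identified_object(object_uri: str, appellation_uri: str) -> str:
    """Serialize 'E22 -P1 is identified by-> E41' as nested RDF/XML."""
    about = f"{{{RDF}}}about"
    obj = ET.Element(f"{{{CRM}}}E22_Man-Made_Object", {about: object_uri})
    prop = ET.SubElement(obj, f"{{{CRM}}}P1_is_identified_by")
    ET.SubElement(prop, f"{{{CRM}}}E41_Appellation", {about: appellation_uri})
    # The namespace declarations are added to this fragment's root element.
    return ET.tostring(obj, encoding="unicode")


print(encode_identified_object("http://coin/627", "http://id/627"))
```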
Mapping the First Element: X3ML Specification
<mappings>
  <mapping>
    <domain>
      <source_node>COIN</source_node>
      <target_node>
        <entity>
          <type>crm:E22_Man-Made_Object</type>
          <instance_generator name="UUID"/>
        </entity>
      </target_node>
    </domain>
    <link>
      …
    </link>
  </mapping>
</mappings>
Mapping the First Element: X3ML Specification (link)
<link>
  <path>
    <source_relation><relation>ID</relation></source_relation>
    <target_relation>
      <relationship>crm:P1_is_identified_by</relationship>
    </target_relation>
  </path>
  <range>
    <source_node>ID</source_node>
    <target_node>
      <entity>
        <type>crm:E41_Appellation</type>
        <instance_generator name="UUID"/>
      </entity>
    </target_node>
  </range>
</link>
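To show how the domain and the link of this X3ML specification fit together, here is a toy interpreter in Python. It is a simplified sketch and not the X3ML Engine: it handles only one domain with simple links, treats every `instance_generator` as a UUID generator, ignores `source_relation` (identical to the range node here), and, as an assumption, keeps the source value as an `rdfs:label` so that the identifier string is not lost.

```python
import uuid
import xml.etree.ElementTree as ET

# The domain and link from the two X3ML slides, combined into one mapping.
MAPPING = """<mappings><mapping>
  <domain>
    <source_node>COIN</source_node>
    <target_node><entity><type>crm:E22_Man-Made_Object</type>
      <instance_generator name="UUID"/></entity></target_node>
  </domain>
  <link>
    <path>
      <source_relation><relation>ID</relation></source_relation>
      <target_relation><relationship>crm:P1_is_identified_by</relationship></target_relation>
    </path>
    <range>
      <source_node>ID</source_node>
      <target_node><entity><type>crm:E41_Appellation</type>
        <instance_generator name="UUID"/></entity></target_node>
    </range>
  </link>
</mapping></mappings>"""

SOURCE = "<root><COIN><ID>627</ID><WEIGHT>3.46</WEIGHT></COIN></root>"


def apply_mapping(mapping_xml: str, source_xml: str) -> list[tuple[str, str, str]]:
    """Toy interpreter for one domain plus simple links (not the X3ML Engine)."""
    source = ET.fromstring(source_xml)
    triples = []
    for mapping in ET.fromstring(mapping_xml).iter("mapping"):
        domain = mapping.find("domain")
        domain_type = domain.findtext("target_node/entity/type")
        # Every element matching the domain source node becomes one target instance.
        for record in source.iter(domain.findtext("source_node")):
            subject = uuid.uuid4().urn  # stand-in for the "UUID" generator
            triples.append((subject, "rdf:type", domain_type))
            for link in mapping.findall("link"):
                predicate = link.findtext("path/target_relation/relationship")
                range_type = link.findtext("range/target_node/entity/type")
                for node in record.findall(link.findtext("range/source_node")):
                    obj = uuid.uuid4().urn
                    triples.append((subject, predicate, obj))
                    triples.append((obj, "rdf:type", range_type))
                    # Assumption: keep the source value as a label so it is not lost.
                    triples.append((obj, "rdfs:label", node.text))
    return triples


for triple in apply_mapping(MAPPING, SOURCE):
    print(triple)
```

Note that the `WEIGHT` field produces nothing, because the mapping contains no link for it.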
Interpreting a Schema as a Semantic Model: Example
The field name stands for a relationship and for the kind of contents; the field contents stand for an entity instance. The whole record corresponds to one entity. For example, the WEIGHT field reads as: Object 627 weighs: 3.46.
Implicit information: the weight is measured in grams.
(data example from dFMRÖ)
<COIN>
  <ID>627</ID>
  <COUNTRY_ID>1</COUNTRY_ID>
  <FIND_SPOT_ID>242</FIND_SPOT_ID>
  <FIND_MANNER_ID>2</FIND_MANNER_ID>
  <FIND_DATE>1980er Jahre</FIND_DATE>
  <AUTHORITY_ID>566</AUTHORITY_ID>
  <ISSUER_ID>536</ISSUER_ID>
  <DENOMINATION>30</DENOMINATION>
  <MINT_ID>244</MINT_ID>
  <OFFICINA>99</OFFICINA>
  <DATE_CA>0</DATE_CA>
  <DATE_FROM>-116</DATE_FROM>
  <DATE_TO>-115</DATE_TO>
  <DAT_VAL>1088408850</DAT_VAL>
  <WEIGHT>3.46</WEIGHT>
  <DIE_AXE>4</DIE_AXE>
  <STATUS_ID>1</STATUS_ID>
  <RV_LEG>CN.DOMI</RV_LEG>
  <RV_PIC>Iuppiter in Quadriga n. r. ..</RV_PIC>
  <ARCH_INFO>-</ARCH_INFO>
  <PH_NAME>000627</PH_NAME>
  <DAT_TXT>116 - 115 v. Chr.</DAT_TXT>
</COIN>
Mapping to Paths: Introducing an Intermediate Node
- Source Domain: COIN → Target Domain: E22 Man-Made Object
- Source Path: “weighs” → Target Path: P43 has dimension → E54 Dimension (intermediate node) → P90 has value
- Source Range: WEIGHT → Target Range: Literal
Mapping to Paths: Introducing an Intermediate Node (instances)
- Source Domain: COIN, instance http://coin/627 → Target Domain: E22 Man-Made Object, instance http://coin/627
- Source Path: “weighs” → P43 has dimension → intermediate node E54 Dimension, instance http://dim/d1 → P90 has value
- Source Range: WEIGHT → Target Range: Literal, value 3.46
Mapping to Paths: Introducing an Additional Node
- Source Domain: COIN → Target Domain: E22 Man-Made Object
- Source Path: “weighs” → Target Path: P43 has dimension → E54 Dimension (intermediate node) → P90 has value
- Source Range: WEIGHT → Target Range: Literal
Additional constant nodes attached to the intermediate E54 Dimension:
- P91 has unit → E58 Measurement Unit “gr”
- P2 has type → E55 Type “weight”
Mapping to Paths: Introducing Intermediate and Additional Nodes
Instance of source and instance of target: http://coin/627
XML export:
<COIN>
  <WEIGHT>3.46</WEIGHT>
</COIN>
RDF encoding:
<crm:E22_Man-Made_Object rdf:about="http://coin/627">
  <crm:P43_has_dimension>
    <crm:E54_Dimension rdf:about="http://dim/d1">
      <crm:P90_has_value>3.46</crm:P90_has_value>
      <crm:P91_has_unit rdf:resource="http://www.oeaw.ac.at/MU/gr"/>
      <crm:P2_has_type rdf:resource="http://www.oeaw.ac.at/DIM/weight"/>
    </crm:E54_Dimension>
  </crm:P43_has_dimension>
</crm:E22_Man-Made_Object>
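The intermediate and constant nodes can be sketched as a function that expands the single `WEIGHT` field into five triples. The unit and type URIs are those of the RDF encoding above; the function name `map_weight` and passing `http://dim/d1` in explicitly (instead of generating it) are assumptions for the example.

```python
import xml.etree.ElementTree as ET

# Constant nodes: every weight has the type "weight" and the unit "gr".
UNIT_GRAM = "http://www.oeaw.ac.at/MU/gr"
TYPE_WEIGHT = "http://www.oeaw.ac.at/DIM/weight"


def map_weight(record: ET.Element, coin_uri: str,
               dimension_uri: str) -> list[tuple[str, str, str]]:
    """Expand WEIGHT into E22 -P43-> E54 Dimension -P90-> literal, and attach
    the constant unit (P91) and type (P2) to the intermediate E54 node."""
    weight = record.findtext("WEIGHT")
    if weight is None:
        return []  # no field, so no intermediate node is created either
    return [
        (coin_uri, "crm:P43_has_dimension", dimension_uri),
        (dimension_uri, "rdf:type", "crm:E54_Dimension"),
        (dimension_uri, "crm:P90_has_value", weight),
        (dimension_uri, "crm:P91_has_unit", UNIT_GRAM),
        (dimension_uri, "crm:P2_has_type", TYPE_WEIGHT),
    ]


coin = ET.fromstring("<COIN><ID>627</ID><WEIGHT>3.46</WEIGHT></COIN>")
for triple in map_weight(coin, "http://coin/627", "http://dim/d1"):
    print(triple)
```

One source field thus becomes a composite CRM path, while the implicit knowledge (grams, "weight") is made explicit through constants.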
Mapping to Paths: Introducing an Additional Node in the Domain
- Source Domain: //COIN → Target Domain: E22 Man-Made Object
- Additional constant node: P2 has type → E55 Type “coin”
Mapping under a Condition
- Source Domain: COIN → Target Domain: E22 Man-Made Object
- Condition: if DATE_CA = 1
- Source Path: DATE_CA → Target Path: P108i was produced by → E12 Production (intermediate node) → P4 has time-span → E52 Time-Span (intermediate node)
- Source Range: DATE_CA → Target Range: E55 Type “circa”
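A sketch of the conditional mapping, assuming `P2 has type` as the property that links the E52 Time-Span to the constant E55 Type "circa" (the slide names only the target range). The function name and all URIs are hypothetical; record 627 has `DATE_CA` = 0, so nothing is generated for it.

```python
import xml.etree.ElementTree as ET


def map_circa(record: ET.Element, coin_uri: str, production_uri: str,
              timespan_uri: str) -> list[tuple[str, str, str]]:
    """Generate the production/time-span path typed "circa" only if DATE_CA = 1."""
    if record.findtext("DATE_CA") != "1":
        return []  # condition is false: nothing is generated for this field
    return [
        (coin_uri, "crm:P108i_was_produced_by", production_uri),
        (production_uri, "crm:P4_has_time-span", timespan_uri),
        # Assumption: the time-span carries the constant type via P2 has type.
        (timespan_uri, "crm:P2_has_type", "circa"),
    ]


coin_627 = ET.fromstring("<COIN><ID>627</ID><DATE_CA>0</DATE_CA></COIN>")
print(map_circa(coin_627, "http://coin/627", "http://coin/627/p", "http://coin/627/ts"))  # []
```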
Guidelines
Domain: there are two approaches for defining the Target Domain:
- introduce a specialization of a CIDOC-CRM class, e.g. Exx Coin as a subclass of E22 Man-Made Object; or
- define the type of the CIDOC-CRM class instance: E22 Man-Made Object, P2 has type: E55 Type = “Coin”.
To choose, we need to answer the question: does the new class Coin have properties that are not available in E22?
Identifiers: we map local identifiers in relational database tables explicitly only if these identifiers are visible in the user interface and are also used in other documents. Otherwise, we use the local database identifiers only to generate URIs for the record instance (here, the coin instance) and do NOT map COIN.ID at all.
Mapping Joins
- Source Domain: //COIN → Target Domain: E22 Man-Made Object
- Source Path: COUNTRY_ID == COUNTRY_ID (join of COIN.COUNTRY_ID with COUNTRY.COUNTRY_ID) → Target Path: P108i was produced by → E12 Production (intermediate node) → P10 falls within
- Source Range: //COUNTRY → Target Range: E4 Period
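The join can be sketched as a hash join over a hypothetical XML export that contains both tables. The URI patterns (`http://coin/{ID}`, `http://country/{COUNTRY_ID}`) and the function name are illustrative assumptions standing in for a real URI generation policy.

```python
import xml.etree.ElementTree as ET

# Hypothetical export with one COIN row and one COUNTRY row sharing COUNTRY_ID.
SOURCE = """<root>
  <COIN><ID>627</ID><COUNTRY_ID>1</COUNTRY_ID></COIN>
  <COUNTRY><COUNTRY_ID>1</COUNTRY_ID></COUNTRY>
</root>"""


def map_country(root: ET.Element) -> list[tuple[str, str, str]]:
    """Follow COIN.COUNTRY_ID == COUNTRY.COUNTRY_ID and emit
    E22 -P108i-> E12 Production -P10 falls within-> E4 Period."""
    # Index the joined table by its key, as a relational hash join would.
    countries = {c.findtext("COUNTRY_ID"): c for c in root.iter("COUNTRY")}
    triples = []
    for coin in root.iter("COIN"):
        country = countries.get(coin.findtext("COUNTRY_ID"))
        if country is None:
            continue  # no matching COUNTRY row, so this path is not generated
        coin_uri = f"http://coin/{coin.findtext('ID')}"
        production_uri = f"{coin_uri}/production"
        period_uri = f"http://country/{country.findtext('COUNTRY_ID')}"
        triples += [
            (coin_uri, "crm:P108i_was_produced_by", production_uri),
            (production_uri, "crm:P10_falls_within", period_uri),
            (period_uri, "rdf:type", "crm:E4_Period"),
        ]
    return triples


for triple in map_country(ET.fromstring(SOURCE)):
    print(triple)
```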
Mixing Categorical and Factual Information
Categorical and factual data need to be separated; mixing them produces inconsistent information:
- find spot: a fact about a specific coin
- historical facts: facts about a category of coins
Categorical Production
The model needs to be extended to support categorical production (similar to FRBR R26 produced things of type). The type can take values such as "AU from Rome, mint ...", which characterize the "edition" of the mint that can be recognized as the outcome of the same minting process. Typically, we would assume that a unique stamp (die) was used.
Diagram: E22 Man-Made Object MyCoin P137 exemplifies E55 Type “AU from Rome”; MyCoin P108i was produced by E12 Production p1; p1 PC1 produced things of type E55 Type “AU from Rome”.
Mixing Categorical and Factual Information (diagram)
- E12 Production p1 P108 has produced E22 Man-Made Object MyCoin
- p1 PC1 produced things of type E55 Type “AU from Rome, mint …”
- E55 Type “AU” (DENOMINATION), linked via P2 has type
- p1 P17 was motivated by E7 Activity ia1 (needs a specialization: “gave order”)
- ia1 P2 has type E55 Type “Issuing”
Issuer
"Issuer" is an accidental role: it does not characterize an actor independently of particular contexts of activity. Therefore the actor does not get the type "Issuer"; only the activity gets the type "Issuing".
- Source Domain: //COIN → Target Domain: E22 Man-Made Object
- Source Path: ISSUER_ID == PR_ID → Target Path: P108i was produced by → E12 Production p1 → P17 was motivated by → E7 Activity ia1 (P2 has type E55 Type “Issuing”) → P14 carried out by
- Source Range: //ISSUER → Target Range: E39 Actor
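A sketch of the issuer path in Python, showing that the type "Issuing" is attached to the E7 Activity while the E39 Actor is never typed "Issuer". The ISSUER table is represented only by a hypothetical set of its `PR_ID` values; the URI patterns and the function name are also assumptions.

```python
def map_issuer(coin_uri: str, issuer_id: str,
               issuer_pr_ids: set[str]) -> list[tuple[str, str, str]]:
    """Map ISSUER_ID == PR_ID as E22 -P108i-> E12 -P17-> E7 ("Issuing") -P14-> E39.

    The role lives on the activity: the actor itself never gets the type "Issuer".
    """
    if issuer_id not in issuer_pr_ids:
        return []  # the join ISSUER_ID == PR_ID finds no ISSUER row
    production = f"{coin_uri}/production"  # hypothetical URI patterns
    issuing = f"{coin_uri}/issuing"
    actor = f"http://actor/{issuer_id}"
    return [
        (coin_uri, "crm:P108i_was_produced_by", production),
        (production, "crm:P17_was_motivated_by", issuing),
        (issuing, "rdf:type", "crm:E7_Activity"),
        (issuing, "crm:P2_has_type", "Issuing"),
        (issuing, "crm:P14_carried_out_by", actor),
        (actor, "rdf:type", "crm:E39_Actor"),
    ]


# Coin 627 has ISSUER_ID 536; assume 536 is a PR_ID in the ISSUER table.
for triple in map_issuer("http://coin/627", "536", {"536"}):
    print(triple)
```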
dFMRÖ coin db
CIDOC CRM Mapping Repository
Published schema matching definitions are available at: http://www.ics.forth.gr/isl/3M-PublishedMappings/
The XML Schema of the schema matching definition format (version 1.0) is available at: http://www.ics.forth.gr/isl/mapping_technology/xsd/x3ml/x3ml_v1.0.xsd
The Mapping Memory Manager (3M) is available at: http://www.ics.forth.gr/isl/3M/
Domain experts can easily understand and edit X3ML mapping files.
You are kindly invited to send us your schema matching definitions.
Mapping to the CRM: Conclusions
- Mapping to the CRM can serve simply as a guide to good-practice data structures.
- It can be used to create a Semantic Web of cultural knowledge.
- It can be used to preserve data in a neutral form.
- Even though mapping can become awkward, good data structures transform easily, and commercial tools exist.
- No tool can guess all of the experts' intentions behind a data structure: domain experts must assist the mapping.
Lessons from Mapping Experiences
- Semantic interoperability can be defined by the capability of mapping.
- Mapping for epistemic networks is relatively simple:
  - specialist/primary information databases frequently employ a flat schema, reducing complex relationships to simple fields;
  - source fields frequently map to composite paths under the CRM, making semantics explicit with a small set of primitives;
  - intermediate nodes are postulated or deduced (e.g., “production” from “coin”, “birth” from “person”); they are the hooks for integration with complementary sources;
  - cardinality constraints must not be enforced, because knowledge may be alternative or incomplete.
- Domain experts easily learn schema mapping. IT experts may not understand the meaning, may underestimate it, or may be bored by it!
- Intuitive tools for domain experts are needed:
  - separate identifier matching from schema mapping;
  - separate terminology mediation from schema mapping.
Thank you!