Mapping Cultural Heritage Information to CIDOC-CRM*


1 Mapping Cultural Heritage Information to CIDOC-CRM*
Maria Theodoridou
Foundation for Research and Technology – Hellas, Institute of Computer Science
*Somewhat adjusted by C.-E. Ore

2 Overview
- X3ML: an interface for sustainable management of the data mapping process
- Use case: mapping the dFMRÖ coin database to CIDOC-CRM

3 Cultural Diversity and Data Standards
Cultural information is more than a domain:
- Collection description (art, archaeology, natural history, ...)
- Archives and literature (records, treaties, letters, artful works, ...)
- Administration, preservation, conservation of material heritage
- Science and scholarship – investigation, interpretation
- Presentation – exhibition making, teaching, publication
But how can one make a documentation standard?
- Each aspect needs its own methods, forms, and means of communication
- Data overlap, but do not fit in one schema

4 “One model to rule them all” The CIDOC CRM
The CIDOC Conceptual Reference Model
- A collaboration with the International Council of Museums
- An ontology of 86 classes and 137 properties for culture and more
- With the capacity to explain hundreds of (meta)data formats
- Accepted by ISO TC46 in September 2000
- International standard since 2006 (ISO 21127:2006)
Serving as:
- An intellectual guide for creating schemata, formats, and profiles
- A language for analysing existing sources for integration/mediation: “Identify elements with common meaning”
- A transport format for data integration / migration / Internet

5 What Does It Mean to Map One Schema to Another?
A mapping is a sufficient specification for transforming each instance of a source schema into an instance of a target schema while preserving as much as possible of its initial ‘meaning’.
CIDOC-CRM approach (target schema = CIDOC-CRM):
- Interpret the source schema as a semantic model (nodes and links).
- Map each element of that model to an equivalent CIDOC-CRM path, such that each instance of an element of the source semantic model can be converted into a valid instance of the CIDOC-CRM with the same meaning.

6 Interpreting a Schema as Semantic Model
1. Interpreting tables, columns as entities
2. Interpreting records as entity instances
3. Interpreting field names as relationships and entities
4. Interpreting field contents as entity instances
Each field is interpreted as entity-relationship-entity (e-r-e).
The whole schema is decomposed into e-r-e’s.
Each e-r-e is mapped individually to the CIDOC-CRM.
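The decomposition described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name decompose and the "has FIELD" naming of relationships are hypothetical, and the record is a shortened version of the dFMRÖ example used later in the deck.

```python
import xml.etree.ElementTree as ET

# Shortened record; the field names come from the dFMRÖ example.
RECORD = """<COIN>
  <ID>627</ID>
  <WEIGHT>3.46</WEIGHT>
  <DAT_VAL> </DAT_VAL>
  <MINT_ID>244</MINT_ID>
</COIN>"""


def decompose(record_xml):
    """Return one (entity, relationship, entity) triple per non-empty field.

    The record element is read as the entity instance, each field name as a
    relationship, and each field content as the related entity instance.
    """
    root = ET.fromstring(record_xml)
    subject = f"{root.tag} {root.findtext('ID')}"
    triples = []
    for field in root:
        value = (field.text or "").strip()
        if value:  # empty fields carry no proposition
            triples.append((subject, f"has {field.tag}", value))
    return triples


ERE = decompose(RECORD)
for triple in ERE:
    print(triple)
```

Each resulting triple is then a candidate for an individual mapping to a CIDOC-CRM path.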

8 X3ML: A Mapping Language
X3ML is a declarative, XML-based language that describes schema mappings in such a way that they can be collaboratively created and discussed by experts.
In the past, mappings have been done in many custom ways. In practice, mappings are produced manually by domain and IT experts, which is:
- labor-intensive
- error-prone
- time-consuming
The emphasis is on establishing a standardized mapping description that lends itself to collaboration and to building a mapping memory that accumulates knowledge and experience.

9 X3ML toolkit
The X3ML Toolkit is a set of small, open-source microservices that follow the SYNERGY Reference Model. They are designed with open interfaces and can be easily customized and adapted to complex environments.
The key components of the toolkit are:
- Mapping Memory Manager
- 3M Editor
- X3ML Engine
FORTH’s open access service is found at:

10 3M : Mapping Memory Manager
3M is a tool for managing mapping definition files. It is based on the FIMS management system for the administration of the files and on the 3M Editor for editing and viewing them. It provides a number of administrative actions that help experts manage their mapping definition files. The source code is open source and available on GitHub.

11 3M Editor: A Mapping Editor
The 3M Editor is software that allows domain experts to build and discuss mappings with little recourse to any particular software skills. It is the interface tool envisioned to allow domain experts to build mappings. It provides:
- Source- and target-agnostic mapping facility
- Guided mapping according to the deployed ontology’s logic
- Comment and justification facility
- Mapping storage
- Separated instance generation practice for IT professionals
The source code is open source and available on GitHub.

12 X3ML engine: A Transformation Tool
The X3ML Engine realizes the transformation of the source records to the target format. The engine takes three inputs: the source data (currently in the form of an XML document), the description of the mappings in the X3ML mapping definition file, and the URI generation policy file. It transforms the source document into a valid RDF document that corresponds to the input XML file with respect to the given mappings and policy. The source code is open source and available on GitHub.

13 X3ML Workflow
[Diagram: X3ML workflow. Labels: Domain Experts, IT Experts, CIDOC-CRM, Schema Matching, URI generation specification, Terminology Mapping, Schema Matching Definition file, DB1, DB2]

15 dFMRÖ digitale FundMünzen der Römischen Zeit in Österreich
Austrian Academy of Sciences, Numismatic Commission
Klaus Vondrovec
- Access DB since 1999
- MySQL DB online since 2007
[Photo caption: Elmer: VO, 27 June 1928, Kubitschek's 70th birthday]

16

17 Tables

18 Interpreting a Schema as Semantic Model, Example
The field name stands for a relationship and the kind of contents.
The field contents stand for an entity instance.
The whole record corresponds to one entity.
Example: Object 627 has ID: Identifier 627

<COIN>
  <ID>627</ID>
  <COUNTRY_ID>1</COUNTRY_ID>
  <FIND_SPOT_ID>242</FIND_SPOT_ID>
  <FIND_MANNER_ID>2</FIND_MANNER_ID>
  <FIND_DATE>1980er Jahre</FIND_DATE>
  <AUTHORITY_ID>566</AUTHORITY_ID>
  <ISSUER_ID>536</ISSUER_ID>
  <DENOMINATION>30</DENOMINATION>
  <MINT_ID>244</MINT_ID>
  <OFFICINA>99</OFFICINA>
  <DATE_CA>0</DATE_CA>
  <DATE_FROM>-116</DATE_FROM>
  <DATE_TO>-115</DATE_TO>
  <DAT_VAL> </DAT_VAL>
  <WEIGHT>3.46</WEIGHT>
  <DIE_AXE>4</DIE_AXE>
  <STATUS_ID>1</STATUS_ID>
  <RV_LEG>CN.DOMI</RV_LEG>
  <RV_PIC>Iuppiter in Quadriga n. r. ..</RV_PIC>
  <ARCH_INFO>-</ARCH_INFO>
  <PH_NAME>000627</PH_NAME>
  <DAT_TXT> v. Chr.</DAT_TXT>
</COIN>

(data example from dFMRÖ)

19 Mapping the First Element: Creating an Equivalent Proposition
The source schema interpretation maps to the CIDOC-CRM schema:
Source Domain: COIN -> Target Domain: E22 Man-Made Object
Source Path: “has ID” -> Target Path: P1 is identified by
Source Range: ID -> Target Range: E41 Appellation

20 Mapping the First Element: Instance valid for both schemata
Source schema interpretation and CIDOC-CRM schema:
Source Domain: COIN -> Target Domain: E22 Man-Made Object
Source Path: “has ID” -> Target Path: P1 is identified by
Source Range: ID -> Target Range: E41 Appellation

RDF encoding:
<crm:E22_Man-Made_Object rdf:about="…">
  <crm:P1_is_identified_by>
    <crm:E41_Appellation rdf:about="…"/>
  </crm:P1_is_identified_by>
</crm:E22_Man-Made_Object>

XML export:
<COIN>
  <ID>626</ID>
</COIN>

21 Mapping the First Element: X3ML specification
<mappings>
  <mapping>
    <domain>
      <source_node>COIN</source_node>
      <target_node>
        <entity>
          <type>crm:E22_Man-Made_Object</type>
          <instance_generator name="UUID"/>
        </entity>
      </target_node>
    </domain>
    <link> …………………. </link>
  </mapping>
</mappings>

22 Mapping the First Element: X3ML specification
<link>
  <path>
    <source_relation><relation>ID</relation></source_relation>
    <target_relation>
      <relationship>crm:P1_is_identified_by</relationship>
    </target_relation>
  </path>
  <range>
    <source_node>ID</source_node>
    <target_node>
      <entity>
        <type>crm:E41_Appellation</type>
        <instance_generator name="UUID"/>
      </entity>
    </target_node>
  </range>
</link>
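To make the effect of the domain and link definitions concrete, here is a minimal Python sketch that applies the same mapping to a COIN record. It is not the X3ML Engine and does not read X3ML files: the MAPPING dictionary, the transform function, the urn:uuid identifiers, and the use of rdfs:label to hold the field content are all assumptions of this illustration.

```python
import uuid
import xml.etree.ElementTree as ET

# Hypothetical in-memory form of the domain/link mapping shown on the slides.
MAPPING = {
    "domain": {"source_node": "COIN", "type": "crm:E22_Man-Made_Object"},
    "links": [
        {
            "source_relation": "ID",
            "relationship": "crm:P1_is_identified_by",
            "range_type": "crm:E41_Appellation",
        },
    ],
}


def new_uri():
    """Stand-in for the UUID instance generator (assumed URN form)."""
    return f"urn:uuid:{uuid.uuid4()}"


def transform(source_xml, mapping):
    """Emit (subject, predicate, object) triples for every domain match."""
    root = ET.fromstring(source_xml)
    triples = []
    for record in root.iter(mapping["domain"]["source_node"]):
        subject = new_uri()
        triples.append((subject, "rdf:type", mapping["domain"]["type"]))
        for link in mapping["links"]:
            for field in record.findall(link["source_relation"]):
                target = new_uri()
                triples.append((subject, link["relationship"], target))
                triples.append((target, "rdf:type", link["range_type"]))
                # Assumption: keep the field content as a label on the range node.
                triples.append((target, "rdfs:label", field.text))
    return triples


TRIPLES = transform("<COINS><COIN><ID>627</ID></COIN></COINS>", MAPPING)
for triple in TRIPLES:
    print(triple)
```

Keeping the mapping as data rather than code mirrors the declarative intent of X3ML: the same transform routine can serve any number of domain/link definitions.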

23 Interpreting a Schema as Semantic Model, Example
The field name stands for a relationship and the kind of contents.
The field contents stand for an entity instance.
The whole record corresponds to one entity.
Example: Object 627 weighs: 3.46
Implicit information: weight is measured in grams.

<COIN>
  <ID>627</ID>
  <COUNTRY_ID>1</COUNTRY_ID>
  <FIND_SPOT_ID>242</FIND_SPOT_ID>
  <FIND_MANNER_ID>2</FIND_MANNER_ID>
  <FIND_DATE>1980er Jahre</FIND_DATE>
  <AUTHORITY_ID>566</AUTHORITY_ID>
  <ISSUER_ID>536</ISSUER_ID>
  <DENOMINATION>30</DENOMINATION>
  <MINT_ID>244</MINT_ID>
  <OFFICINA>99</OFFICINA>
  <DATE_CA>0</DATE_CA>
  <DATE_FROM>-116</DATE_FROM>
  <DATE_TO>-115</DATE_TO>
  <DAT_VAL> </DAT_VAL>
  <WEIGHT>3.46</WEIGHT>
  <DIE_AXE>4</DIE_AXE>
  <STATUS_ID>1</STATUS_ID>
  <RV_LEG>CN.DOMI</RV_LEG>
  <RV_PIC>Iuppiter in Quadriga n. r. ..</RV_PIC>
  <ARCH_INFO>-</ARCH_INFO>
  <PH_NAME>000627</PH_NAME>
  <DAT_TXT> v. Chr.</DAT_TXT>
</COIN>

(data example from dFMRÖ)

24 Mapping to Paths: Introducing an intermediate node
Source Domain: Coin -> Target Domain: E22 Man-Made Object
Source Path: weighs -> Target Path: P43 has dimension -> E54 Dimension (intermediate node) -> P90 has value
Source Range: WEIGHT -> Target Range: Literal

25 Mapping to Paths: Introducing an intermediate node
Source schema interpretation (instance of source) and CIDOC-CRM schema (instance of target):
Source Domain: COIN -> Target Domain: E22 Man-Made Object
Source Path: weighs -> Target Path: P43 has dimension -> E54 Dimension (intermediate node) -> P90 has value
Source Range: WEIGHT -> Target Range: Literal

26 Mapping to Paths: Introducing an additional node
Source Domain: Coin -> Target Domain: E22 Man-Made Object
Source Path: weighs -> Target Path: P43 has dimension -> E54 Dimension (intermediate node) -> P90 has value
Source Range: WEIGHT -> Target Range: Literal
Additional constant nodes attached to the E54 Dimension:
- P91 has unit -> E58 Measurement Unit “gr”
- P2 has type -> E55 Type “weight”

27 Mapping to Paths: Introducing intermediate & additional nodes
[Diagram: the instance of the source and the instance of the target both show the value 3,46]

RDF encoding:
<crm:E22_Man-Made_Object rdf:about="…">
  <crm:P43_has_dimension>
    <crm:E54_Dimension rdf:about="…">
      <crm:P90_has_value>3.46</crm:P90_has_value>
      <crm:P91_has_unit rdf:resource="…"/>
      <crm:P2_has_type rdf:resource="…"/>
    </crm:E54_Dimension>
  </crm:P43_has_dimension>
</crm:E22_Man-Made_Object>

XML export:
<COIN>
  <WEIGHT>3.46</WEIGHT>
</COIN>
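The weight path, with its intermediate E54 Dimension node and the two constant nodes for unit and type, can be sketched as a small Python function. The function name map_weight, the example.org base URI, and the URI patterns are hypothetical; only the CRM class and property names come from the slides.

```python
def map_weight(coin_uri, weight_text, base="http://example.org/"):
    """Expand one WEIGHT field into a CRM path with an intermediate node.

    The source holds a bare number; the unit (grams) and the kind of
    dimension (weight) are implicit and are added as constant nodes.
    """
    dimension = f"{coin_uri}/dimension/weight"  # intermediate node (assumed URI pattern)
    return [
        (coin_uri, "crm:P43_has_dimension", dimension),
        (dimension, "rdf:type", "crm:E54_Dimension"),
        (dimension, "crm:P90_has_value", float(weight_text)),
        (dimension, "crm:P91_has_unit", base + "unit/gr"),     # constant E58 Measurement Unit
        (dimension, "crm:P2_has_type", base + "type/weight"),  # constant E55 Type
    ]


WEIGHT_TRIPLES = map_weight("http://example.org/coin/627", "3.46")
for triple in WEIGHT_TRIPLES:
    print(triple)
```

A single flat source field thus yields five statements, which is the pattern the slides call mapping to a composite path.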

28 Mapping to Paths: Introducing an additional node in the Domain
Source Domain: //COIN -> Target Domain: E22 Man-Made Object
Additional constant node on the domain: P2 has type -> E55 Type “coin”

29 Mapping under condition
Condition: if DATE_CA = 1
Source Domain: Coin -> Target Domain: E22 Man-Made Object
Source Path: DATE_CA -> Target Path: P108i was produced by -> E12 Production (intermediate node) -> P4 has time-span -> E52 Time-Span (intermediate node)
Source Range: DATE_CA -> Target Range: E55 Type “circa”
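A conditional mapping of this kind can be sketched as follows. The function name map_date_ca and the URI patterns are hypothetical, and linking the E52 Time-Span to the “circa” type through P2 has type is an assumption, since the transcript does not name that property.

```python
import xml.etree.ElementTree as ET


def map_date_ca(coin_uri, record_xml, base="http://example.org/"):
    """Emit the 'circa' path only when the record has DATE_CA = 1."""
    record = ET.fromstring(record_xml)
    if (record.findtext("DATE_CA") or "").strip() != "1":
        return []  # condition not met: the mapping produces nothing
    production = coin_uri + "/production"    # intermediate E12 Production
    time_span = production + "/time-span"    # intermediate E52 Time-Span
    return [
        (coin_uri, "crm:P108i_was_produced_by", production),
        (production, "rdf:type", "crm:E12_Production"),
        (production, "crm:P4_has_time-span", time_span),
        (time_span, "rdf:type", "crm:E52_Time-Span"),
        # Assumed property: the slide only names the target range E55 Type.
        (time_span, "crm:P2_has_type", base + "type/circa"),
    ]


EXACT = map_date_ca("http://example.org/coin/627", "<COIN><DATE_CA>0</DATE_CA></COIN>")
CIRCA = map_date_ca("http://example.org/coin/628", "<COIN><DATE_CA>1</DATE_CA></COIN>")
print(len(EXACT), len(CIRCA))
```

Record 627 from the example has DATE_CA = 0, so under this reading it would receive no “circa” qualification.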

30 Guidelines Domain: Two approaches for defining the Target Domain
Domain: there are two approaches for defining the Target Domain.
- Introduce a specialization of a CIDOC-CRM class, e.g. Exx Coin as a subclass of E22 Man-Made Object.
- Define the Type of the CIDOC-CRM class: E22 Man-Made Object. P2 has type: E55 Type = “Coin”.
To choose, we need to answer the question: does the new class Coin have new properties that are not available in E22?
Identifiers: we map local identifiers in relational database tables explicitly only if these identifiers are visible in the user interface and are also used in other documents. Alternatively, we use the local database identifiers only for generating URIs for the record instance (here, the coin instance) and do NOT map COIN.ID at all.
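The second identifier option, using the local database identifier only to build the URI of the record instance, might look like the following sketch. The function name record_uri and the example.org base are hypothetical; the actual URI generation policy of a project is defined separately by IT experts.

```python
from urllib.parse import quote


def record_uri(table, local_id, base="http://example.org/dfmroe/"):
    """Build a stable URI for a record from its table name and local ID.

    The same (table, local_id) pair always yields the same URI, so repeated
    transformations of the same record refer to the same instance.
    """
    return f"{base}{quote(table.lower())}/{quote(str(local_id))}"


COIN_URI = record_uri("COIN", 627)
print(COIN_URI)
```

Unlike a random UUID, a URI derived this way lets two independent exports of the same database agree on the identity of each coin without COIN.ID appearing as a mapped statement.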

31 Mapping joins
Source Domain: //Coin -> Target Domain: E22 Man-Made Object
Source Path: COUNTRY_ID == COUNTRY_ID -> Target Path: P108i was produced by -> E12 Production (intermediate node) -> P10 falls within
Source Range: //COUNTRY -> Target Range: E4 Period
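A join condition such as COUNTRY_ID == COUNTRY_ID pairs each coin with the country record that carries the same key. The sketch below shows one way to resolve it in Python; the wrapping DB element, the function name map_country_join, and the URI patterns are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical export holding both tables; only the key fields are shown.
SOURCE = """<DB>
  <COIN><ID>627</ID><COUNTRY_ID>1</COUNTRY_ID></COIN>
  <COIN><ID>628</ID><COUNTRY_ID>9</COUNTRY_ID></COIN>
  <COUNTRY><COUNTRY_ID>1</COUNTRY_ID></COUNTRY>
  <COUNTRY><COUNTRY_ID>2</COUNTRY_ID></COUNTRY>
</DB>"""


def map_country_join(source_xml, base="http://example.org/"):
    """Link each coin's production to the period of its joined country."""
    root = ET.fromstring(source_xml)
    # Index the range records by the join key.
    countries = {c.findtext("COUNTRY_ID"): c for c in root.iter("COUNTRY")}
    triples = []
    for coin in root.iter("COIN"):
        key = coin.findtext("COUNTRY_ID")
        if key not in countries:
            continue  # no matching COUNTRY record: the join yields nothing
        coin_uri = f"{base}coin/{coin.findtext('ID')}"
        production = coin_uri + "/production"  # intermediate E12 Production
        period = f"{base}country/{key}"
        triples += [
            (coin_uri, "crm:P108i_was_produced_by", production),
            (production, "rdf:type", "crm:E12_Production"),
            (production, "crm:P10_falls_within", period),
            (period, "rdf:type", "crm:E4_Period"),
        ]
    return triples


JOINED = map_country_join(SOURCE)
for triple in JOINED:
    print(triple)
```

Coin 628 points at a country key with no matching record, so it contributes no statements.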

32 Mixing categorical and factual info
Need to separate categorical and factual data.
Inconsistent information:
- Find spot -> for a specific coin
- Historical facts -> for a category of coins

33 Categorical production
Need to extend the model in order to support categorical production (similar to FRBR R26 produced things of type). The Type can take values such as "AU from Rome, mint ...", which characterize the "edition" of the mint that can be recognized as the outcome of the same minting process. Typically we would assume that a unique stamp is used.
[Diagram:
- E22 Man-Made Object (MyCoin) -> P137 exemplifies -> E55 Type (AU from Rome)
- E22 Man-Made Object (MyCoin) -> P108i was produced by -> E12 Production (p1)
- E12 Production (p1) -> PC1 produced things of type -> E55 Type (AU from Rome)]

34 Mixing categorical and factual info
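[Diagram:
- E7 Activity (ia1) -> P2 has type -> E55 Type (Issuing)
- E12 Production (p1) -> P17 was motivated by -> E7 Activity (ia1); needs specialization: “gave order”
- E12 Production (p1) -> P108 has produced -> E22 Man-Made Object (MyCoin)
- E12 Production (p1) -> PC1 produced things of type -> E55 Type (AU from Rome, mint …)
- E55 Type (AU from Rome, mint …) -> P2 has type -> E55 Type “AU” (DENOMINATION)]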

35 Issuer
"Issuer" is an accidental role: it does not characterize an actor independently of particular contexts of activity. Therefore the Actor does not have the type "Issuer"; only the activity has the type "Issuing".
Source Domain: //COIN -> Target Domain: E22 Man-Made Object
Source Path: ISSUER_ID == PR_ID -> Target Path: P108i was produced by -> E12 Production (p1) -> P17 was motivated by -> E7 Activity (ia1, with E55 Type Issuing) -> P14 carried out by
Source Range: //ISSUER -> Target Range: E39 Actor

36 dFMRÖ coin db

39 CIDOC CRM Mapping Repository
- Published schema matching definitions are available at:
- The schema matching definition format (Version 1.0) is available:
- The Mapping Memory Manager (3M) is available:
- Domain experts are able to easily understand and edit X3ML mapping files
You are kindly invited to send us your schema matching definition.

40 Mapping to the CRM: Conclusions
- Mapping to the CRM can serve simply as a guide to good-practice data structures.
- It can be used to create a Semantic Web of cultural knowledge.
- It can be used to preserve data in a neutral form.
- Even though a mapping can become awkward, good data structures transform easily, and there are commercial tools.
- No tool can guess all of the experts' intentions in a data structure: domain experts must assist the mapping.

41 Lessons from mapping experiences
Semantic interoperability can be defined by the capability of mapping.
Mapping for epistemic networks is relatively simple:
- Specialist/primary information databases frequently employ a flat schema, reducing complex relationships to simple fields.
- Source fields frequently map to composite paths under the CRM, making semantics explicit using a small set of primitives.
- Intermediate nodes are postulated or deduced (e.g., “production” from “coin”, “birth” from “person”). They are the hooks for integration with complementary sources.
- Cardinality constraints must not be enforced, so that alternative or incomplete knowledge can be represented.
Domain experts easily learn schema mapping. IT experts may not understand the meaning, may underestimate it, or may be bored by it!
Intuitive tools for domain experts are needed:
- Separate identifier matching from schema mapping
- Separate terminology mediation from schema mapping

42 Thank you!

