TEI, CIDOC-CRM and a Possible Interface between the Two? Øyvind Eide & Christian-Emil Ore Unit for Digital Documentation, University of Oslo, Norway.




Similar presentations
Resource description and access for the digital world Gordon Dunsire Centre for Digital Library Research University of Strathclyde Scotland.

METS Awareness Training An Introduction to METS Digital libraries – where are we now? Digitisation technology now well established and well-understood.
Delivering textual resources. Overview Getting the text ready – decisions & costs Structures for delivery Full text Marked-up Image and text Indexed How.
IUFRO International Union of Forest Research Organizations Eero Mikkola Description of WP2 – NEFIS Metadata and Controlled Vocabularies Standards - work.
Music Encoding Initiative (MEI) DTD and the OCVE
ISO (CIDOC CRM) - a very concise introduction What it is What it’s for How it’s being used Questions Nick Crofts – Convenor ISO TC46/SC4/WG9.
1 CIDOC CRM + FRBR ER = FRBR OO … an equation for a harmonised view of museum information and bibliographic information Martin Doerr First CASPAR Seminar.
TEI, CIDOC-CRM and a Possible Interface between the Two Øyvind Eide & Christian-Emil Ore* Unit for Digital Documentation, University of Oslo, Norway (*ICOM.
FRBR: Functional Requirements for Bibliographic Records it is the Final Report of the IFLA Study Group on the Functional Requirements for Bibliographic.
SLIDE 1IS 257 – Fall 2007 Codes and Rules for Description: History 2 University of California, Berkeley School of Information IS 245: Organization.
COMP 6703 eScience Project Semantic Web for Museums Student : Lei Junran Client/Technical Supervisor : Tom Worthington Academic Supervisor : Peter Strazdins.
The RDF meta model: a closer look Basic ideas of the RDF Resource instance descriptions in the RDF format Application-specific RDF schemas Limitations.
Presented by Karen W. Gwynn LS – Metadata University of Alabama Prof. Steven MacCall Spring 2011.
Martin Doerr, Gerald Hiebel, Institute of Computer Science
Digital Encoding What’s behind E-text Resources?.
Z39.50, XML & RDF Applications ZIG Tutorial January 2000 Poul Henrik Jørgensen, Danish Bibliographic Centre,
ICS – FORTH, August 31, 2000 Why do we need an “Object Oriented Model” ? Martin Doerr Atlanta, August 31, 2000 Foundation for Research and Technology -
Idea-garden.org SOCIAL SEMANTIC INFORMATION SPACE An Interactive Learning Environment Fostering Creativity Grant agreement no: nd CIDOC CRM-SIG.
Harmonising without Harm: towards an object-oriented formulation of FRBR aligned on the CIDOC CRM ontology Maja Žumer (University of Ljubljana) & Patrick.
8/28/97Organization of Information in Collections Introduction to Description: Dublin Core and History University of California, Berkeley School of Information.
Using an ontology-driven system to integrate museum information and library information Paper presented on the occasion of the Symposium on Digital Semantic.
Moving Cataloguing into the 21 st Century Presentation given at the CLA pre-conference Shaping Tomorrow’s Metadata with RDA June 2, 2010 by Tom Delsey.
1 XML as a preservation strategy Experiences with the DiVA document format Eva Müller, Uwe Klosa Electronic Publishing Centre Uppsala University Library,
The Metadata Object Description Schema (MODS) NISO Metadata Workshop May 20, 2004 Rebecca Guenther Network Development and MARC Standards Office Library.
Metadata, the CARARE Aggregation service and 3D ICONS Kate Fernie, MDR Partners, UK.
TEXT ENCODING INITIATIVE (TEI) Inf 384C Block II, Module C.
Conceptual models: museums & libraries towards an object-oriented formulation of FRBR aligned on the CIDOC CRM ontology The title of the present ELAG.
Descriptive metadata in the Finnish National digital library and the role of CIDOC CRM in the standards portfolio of NDL Juha Hakala The National Library.
Jenn Riley Metadata Librarian IU Digital Library Program New Developments in Cataloging.
Standards for digital encoding Tomaž Erjavec Institut für Informationsverarbeitung Geisteswissenschaftliche Fakultät Karl-Franzens-Universität Graz Tomaž.
Moving from a locally-developed data model to a standard conceptual model Jenn Riley Metadata Librarian Indiana University Digital Library Program.
Metadata for Music: Understanding the Landscape Jenn Riley Indiana University Digital Library Program.
The CIDOC Conceptual Reference Model A core-ontology for information integration Karl H. Lampe, Zoologisches Forschungsmuseum Alexander Koenig (ZFMK) Bonn/Germany.
1 Exploring time and space in the annotation of museum catalogues: The Sloane virtual exhibition experience Stephen Stead Vienna November 2014 University.
Metadata and Documentation Iain Wallace Performing Arts Data Service.
XML for Text Markup An introduction to XML markup.
Evidence from Metadata INST 734 Doug Oard Module 8.
RDA Compared with AACR2 Presentation given at the ALA conference program session Look Before You Leap: taking RDA for a test-drive July 11, 2009 by Tom.
RDA DAY 1 – part 2 web version 1. 2 When you catalog a “book” in hand: You are working with a FRBR Group 1 Item The bibliographic record you create will.
Clarino WP4 – Electronic Editions Platform Christian-Emil Ore, UiO Clarino Solstrand-møte 12. september 2013.
Intellectual Works and their Manifestations Representation of Information Objects IR Systems & Information objects Spring January, 2006 Bharat.
Description of Bibliographic Items. Review Encoding = Markup. The library cataloging “markup” language is MARC. Unlike HTML, MARC tags have meaning (i.e.,
The RDF meta model Basic ideas of the RDF Resource instance descriptions in the RDF format Application-specific RDF schemas Limitations of XML compared.
Metadata Metadata Mark-up and Management © Adolf Knoll, National Library of the Czech Republic.
Functional Requirements for Bibliographic Records The Changing Face of Cataloging William E. Moen Texas Center for Digital Knowledge School of Library.
Metadata “Data about data” Describes various aspects of a digital file or group of files Identifies the parts of a digital object and documents their content,
Standards for digital encoding Tomaž Erjavec Karl-Franzens-Universität Graz Tomaž Erjavec Lecture 2: TEI.
WP 3: Standardisation of shared metadata Mode of operation –All partners are involved –Building on practice outside the project Achievements of Year 1.
From FRBR to FRBR OO through CIDOC CRM… A Common Ontology for Cultural Heritage Information Patrick Le Bœuf, National Library of France International Symposium.
Digital Humanities Dr. Øyvind Eide TEI and neighbouring standards: CIDOC-CRM Øyvind Eide Universität Passau (with thanks to Christian-Emil.
COMMON COMMUNICATION FORMAT (CCF). Dr.S. Surdarshan Rao Professor Dept. of Library & Information Science Osmania University Hyderbad
DANIELA KOLAROVA INSTITUTE OF INFORMATION TECHNOLOGIES, BAS Multimedia Semantics and the Semantic Web.
Helsinki, November FRBR: the bright new future? Part 1 Maja Žumer University of Ljubljana Slovenia.
Tiziana // Alessandra Lenzi - MG Breaking down the walls Project Museo Galileo and the Linked Open Data A joint project between.
Working with XML. Markup Languages Text-based languages based on SGML Text-based languages based on SGML SGML = Standard Generalized Markup Language SGML.
Sally McCallum Library of Congress
Differences and distinctions: metadata types and their uses Stephen Winch Information Architecture Officer, SLIC.
TEI presentation for IS 590 Robert Patrick Waltz July 10 th, 2012.
A centre of expertise in digital information management UKOLN is supported by: Metadata – what, why and how Ann Chapman.
Some basic concepts Week 1 Lecture notes INF 384C: Organizing Information Spring 2016 Karen Wickett UT School of Information.
RSC Strategy Gordon Dunsire, Chair, RDA Steering Committee
From FRBR to FRBROO through CIDOC CRM…
Markup of Educational Content
M Using CIDOCs CRM in creating a common database for museum objects – some “real life” experiences The Museum Project University of Oslo, Norway Jon.
Introducing IFLA-LRM Gordon Dunsire, Chair, RSC
The new RDA: resource description in libraries and beyond
Proposal of a Geographic Metadata Profile for WISE
FRBR and FRAD as Implemented in RDA
From XML to objects and events in a CRM compatible database
Presentation transcript:


The CIDOC Conceptual Reference Model (cidoc.ics.forth.gr)
What is the CIDOC CRM?
–An object-oriented ontology developed by ICOM-CIDOC
–Accepted as an ISO standard in June 2005
–About 80 classes and 130 properties for cultural and natural history
–CRM instances can be encoded in many forms: RDBMS, ooDBMS, XML, RDF(S), OWL
What is the CIDOC CRM for?
–An intellectual guide to creating schemata, formats and profiles
–A best-practice guide
–A language for analysing existing sources and models for data integration (mapping)
–A transport format for data integration, migration and exchange over the Internet
Ongoing activities
–CRM-Core
–Harmonisation with an object-oriented version of FRBR (Functional Requirements for Bibliographic Records, IFLA); the first version will be published in fall 2006
–Extension of the CRM with a categorical level, e.g. recurring events

The CIDOC CRM: top-level classes relevant for integration (diagram). Classes: E39 Actors (persons, institutions), E55 Types, E28 Conceptual Objects, E18 Physical Things, E2 Temporal Entities (Events), E41 Appellations, E53 Places, E52 Time-Spans. Relations shown: participate in; refer to / refine; refer to / identify; have location; within; at; affect or refer to.

CIDOC CRM: Class hierarchy
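To make the role of the class hierarchy concrete, here is a minimal Python sketch (not part of the original talk) that checks whether a statement respects a property's domain and range. It assumes a simplified, single-parent fragment of the hierarchy covering only the classes used in the Dr. Diggey example; the real CRM uses multiple inheritance, so class codes and property scopes should be checked against the official CRM definition. P12 is written in its forward direction (an event occurred in the presence of a thing).

```python
from typing import Optional

# Simplified, single-parent fragment of the CIDOC CRM class hierarchy.
# The real CRM uses multiple inheritance and many more classes; only the
# classes needed for the Dr. Diggey example are listed here.
PARENT = {
    "E2": "E1",    # Temporal Entity -> CRM Entity
    "E4": "E2",    # Period -> Temporal Entity
    "E5": "E4",    # Event -> Period
    "E7": "E5",    # Activity -> Event
    "E11": "E7",   # Modification -> Activity
    "E77": "E1",   # Persistent Item -> CRM Entity
    "E39": "E77",  # Actor -> Persistent Item
    "E21": "E39",  # Person -> Actor
    "E70": "E77",  # Thing -> Persistent Item
    "E72": "E70",  # Legal Object -> Thing
    "E18": "E72",  # Physical Thing -> Legal Object
    "E19": "E18",  # Physical Object -> Physical Thing
    "E22": "E19",  # Man-Made Object -> Physical Object
    "E52": "E1",   # Time-Span -> CRM Entity
    "E53": "E1",   # Place -> CRM Entity
}

# Domain and range of a few properties, in their forward direction.
DOMAIN_RANGE = {
    "P4": ("E2", "E52"),   # has time-span
    "P7": ("E4", "E53"),   # took place at
    "P12": ("E5", "E77"),  # occurred in the presence of
    "P14": ("E7", "E39"),  # carried out by
}


def is_a(cls: str, ancestor: str) -> bool:
    """Return True if cls is ancestor or one of its (transitive) subclasses."""
    current: Optional[str] = cls
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT.get(current)
    return False


def is_valid(subject_cls: str, prop: str, object_cls: str) -> bool:
    """Check a statement against the property's domain and range."""
    domain, rng = DOMAIN_RANGE[prop]
    return is_a(subject_cls, domain) and is_a(object_cls, rng)


print(is_valid("E11", "P14", "E21"))  # True: a Modification is an Activity
print(is_valid("E22", "P14", "E21"))  # False: a Man-Made Object is not an Activity
```

The subclass walk is what lets a statement about an E11 Modification reuse P14, which is declared on the more general E7 Activity.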

CIDOC CRM: Events

CIDOC CRM: Things and Conceptual Objects

Motivation: Grey literature in museums (workflow diagram)
Step 1, registration: the original text (text witness) is registered in a bibliographical record
Step 2, reproduction: a facsimile is made
Step 3, transcription: a text with XML mark-up is produced (1. structural mark-up; 2. lemmatization etc.)
Step 4, content mark-up: information elements are identified and marked up according to a simple information model (DTD)
Museum database: artefacts, excavations, referential information; event/object-oriented model (CIDOC-CRM compatible)

Catalogue entry 8. Malayan dagger, taken from pirates of the Indian Oceans. Beautiful handle, graven as a human figure above waistline. Snake winded blade. VII, IX, p, 2. Daa,O., 99. Donated April from Captain Teiste. Motivation: Grey literature in Museums

Catalogue entry with mark-up 8. Malayan dagger, taken from pirates of the Indian Oceans. Beautiful handle, graven as a human figure above waistline. Snake winded blade. VII, IX, p, 2. Daa,O., 99. Donated April from Captain Teiste. Motivation: Grey literature in Museums

The excavation in Wasteland in 2005 was performed by Dr. Diggey. He had the misfortune of breaking the beautiful sword (C50435) into 30 pieces. Motivation: Grey literature in Museums

The content of the text expressed in CIDOC-CRM (graph). Nodes: E31 Document with P2 has type E55 Type “Archaeological report”; E7 Activity with P2 has type E55 Type “Archaeological excavation”; E21 Person (actor), identified by E82 Actor Appellation “Dr. Diggey”; E11 Modification “Breaking of the sword”; E22 Man-Made Object “Sword”, identified by E42 Object Identifier “C50435”; E52 Time-Span, identified by E50 Date “2005”; E53 Place, identified by E44 Place Appellation “Wasteland”. Properties shown: P70 documents, P14 carried out by, P4 has time-span, P7 took place at, P9 forms part of, P12 was present at, and the identification properties P1, P78 and P87 is identified by.
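A graph like this can be held as a plain list of (subject, property, object) statements. The Python sketch below is one reading of the figure, assuming that P14 links the excavation to Dr. Diggey (as in the text) and keeping the slide's property labels, including the inverse-direction labels for P9 and P12. The instance identifiers such as `excavation` and `ts2005` are hypothetical. The query walks from the sword to the person responsible for the excavation during which it was broken.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "predicate", "obj"])

# Hypothetical instance identifiers with the CRM classes named on the slide.
CLASSES = {
    "report": "E31 Document",
    "excavation": "E7 Activity",
    "diggey": "E21 Person",
    "breaking": "E11 Modification",
    "sword": "E22 Man-Made Object",
    "ts2005": "E52 Time-Span",
    "wasteland": "E53 Place",
}

# One reading of the graph; literal values are kept as quoted strings.
TRIPLES = [
    Triple("report", "P2 has type", '"Archaeological report"'),
    Triple("report", "P70 documents", "excavation"),
    Triple("excavation", "P2 has type", '"Archaeological excavation"'),
    Triple("excavation", "P14 carried out by", "diggey"),
    Triple("excavation", "P4 has time-span", "ts2005"),
    Triple("excavation", "P7 took place at", "wasteland"),
    Triple("breaking", "P9 forms part of", "excavation"),
    Triple("sword", "P12 was present at", "breaking"),
    Triple("diggey", "P1 is identified by", '"Dr. Diggey"'),
    Triple("sword", "P1 is identified by", '"C50435"'),
    Triple("ts2005", "P78 is identified by", '"2005"'),
    Triple("wasteland", "P87 is identified by", '"Wasteland"'),
]


def objects(subject, predicate):
    """All objects of statements with the given subject and predicate."""
    return [t.obj for t in TRIPLES if t.subject == subject and t.predicate == predicate]


def responsible_for_damage(obj_id):
    """Names of whoever carried out the larger activity that an event involving obj_id was part of."""
    names = []
    for event in objects(obj_id, "P12 was present at"):
        for whole in objects(event, "P9 forms part of"):
            for actor in objects(whole, "P14 carried out by"):
                names += [n.strip('"') for n in objects(actor, "P1 is identified by")]
    return names


print(responsible_for_damage("sword"))  # ['Dr. Diggey']
```

Nothing in the sentence states this link directly; it only becomes answerable once the event, its parts and its participants are explicit nodes.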

TEI - where did it come from? (Acc. to L. Burnard)
Originally, a research project within the humanities
–Founded in
–Sponsored by three professional associations
–Funded by US NEH, EU LE Programme et al.
Major influences
–digital libraries and text collections
–language corpora
–scholarly datasets
International consortium established June 1999

Goals of the TEI (Acc. to L. Burnard)
–better interchange and integration of scholarly data
–support for all texts, in all languages, from all periods
–guidance for the perplexed: what to encode; hence, a user-driven codification of existing best practice
–assistance for the specialist: how to encode; hence, a loose framework into which unpredictable extensions can be fitted
These apparently incompatible goals result in a highly flexible, modular environment.

TEI Deliverables (Acc. to L. Burnard)
–A set of recommendations for text encoding, covering both generic text structures and some highly specific areas, based on (but not limited by) existing practice
–A very large collection of element definitions (400+) with associated declarations for various schema languages
–A modular system for creating personalized schemas or DTDs from the foregoing
For the full picture see c.org/TEI/Guidelines/

Legacy of the TEI (Acc. to L. Burnard)
–a way of looking at what ‘text’ really is
–a codification of current scholarly practice
–(crucially) a set of shared assumptions about the digital agenda:
 –focus on content and function (rather than presentation)
 –identify generic solutions (rather than application-specific ones)

TEI - the header
Elements for detailed bibliographic description:
–File description: title statement, edition statement, extent statement, publication statement, series statement, notes, source description (bibliographic elements; manuscript description)
–Encoding description
–Profile description
–Revision description
Mapping to other metadata standards:
–MARC: discussed
–Dublin Core: unfinished
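Since the Dublin Core mapping is listed as unfinished, the following is only an illustrative sketch of how header fields might be crosswalked to Dublin Core with Python's standard `xml.etree.ElementTree`. The header content, the crosswalk paths and the choice of DC terms are assumptions made for this example, not an agreed mapping.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Minimal, hypothetical TEI header for the sample report.
HEADER = """<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
  <fileDesc>
    <titleStmt>
      <title>Wasteland excavation 2005 report</title>
      <author>Dr. Diggey</author>
    </titleStmt>
    <publicationStmt><p>Unpublished report (grey literature).</p></publicationStmt>
    <sourceDesc><p>Transcribed from the printed report.</p></sourceDesc>
  </fileDesc>
</teiHeader>"""

# Illustrative crosswalk from Dublin Core terms to header paths (an assumption).
CROSSWALK = {
    "dc:title": "fileDesc/titleStmt/title",
    "dc:creator": "fileDesc/titleStmt/author",
    "dc:publisher": "fileDesc/publicationStmt/publisher",
    "dc:date": "fileDesc/publicationStmt/date",
}


def header_to_dc(xml_text):
    """Return {dc term: [values]} for every crosswalk path found in the header."""
    root = ET.fromstring(xml_text)
    record = {}
    for term, path in CROSSWALK.items():
        # Qualify every path step with the TEI namespace.
        qualified = "/".join(TEI_NS + step for step in path.split("/"))
        values = [el.text.strip() for el in root.findall(qualified) if el.text]
        if values:  # terms without a source element are simply skipped
            record[term] = values
    return record


print(header_to_dc(HEADER))
```

Terms whose source element is absent (here publisher and date) are left out rather than emitted empty, which keeps the output honest about what the header actually records.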

TEI additional element sets
Base tag sets: verse; performance texts; transcriptions of speech; print dictionaries
Additional sets: manuscript description; linking, alignment and analysis; feature structures; certainty; physical transcription; textual criticism; names and dates; graphs, networks and trees; graphics, figures and tables; language corpora; representation of non-standard characters and glyphs; feature system declaration

Some “ontological” elements in TEI: Events
–history: groups elements describing the full history of a manuscript or manuscript part
–origin: contains any descriptive or other information concerning the origin of a manuscript or manuscript part
–custEvent: describes a single event during the custodial history of a manuscript
–provenance: contains any descriptive or other information concerning a single identifiable episode during the history of a manuscript or manuscript part, after its creation but before its acquisition
–acquisition: contains any descriptive or other information concerning the process by which a manuscript or manuscript part entered the holding institution

Some “ontological” elements in TEI: Events, time appellations
–event: any phenomenon or occurrence, not necessarily vocalized or communicative, for example incidental noises or other events affecting communication, e.g. “ceiling collapses” during a recorded interview
–persEvent: contains a description of a particular event of significance in the life of a person
–birth, death: contain information about a person's birth or death, such as its date and place
–date: contains a date in any format
–occasion: a temporal expression (either a date or a time) given in terms of a named occasion such as a holiday, a named time of day, or some notable event

Some “ontological” elements in TEI: Actors and appellations
–person: provides information about an identifiable individual, for example a participant in a language interaction, or a person referred to in a historical source
–hand: used in the header to define each distinct scribe or handwriting style
–author: in a bibliographic reference, contains the name of the author(s), personal or corporate, of a work; the primary statement of responsibility for any bibliographic item
–name (name, proper noun): contains a proper noun or noun phrase

Some “ontological” elements in TEI: Person example (from P5 guidelines)
Ovid; Publius Ovidius Naso; birth: 20 March 43 BC, Sulmona, Italy; death: 17 or 18 AD, Tomis (Constanta), Romania
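The person example hints at how TEI life events could be mapped onto CRM events. The Python sketch below is a hypothetical mapping, assuming TEI birth and death correspond to CRM E67 Birth and E69 Death (linked to the person by P98 brought into life and P100 was death of). The markup is a simplified approximation of the P5 example without the TEI namespace, and generated identifiers such as `ovid_birth` are invented for illustration.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Simplified markup in the spirit of the P5 example (TEI namespace omitted;
# not a verbatim copy of the Guidelines). The machine-readable when/notBefore/
# notAfter attributes are ignored in this sketch.
PERSON = """<person xml:id="ovid">
  <persName>Publius Ovidius Naso</persName>
  <birth when="-0042-03-20">20 March 43 BC
    <placeName>Sulmona, Italy</placeName>
  </birth>
  <death notBefore="0017" notAfter="0018">17 or 18 AD
    <placeName>Tomis (Constanta), Romania</placeName>
  </death>
</person>"""

# Assumed mapping of TEI life events to CRM event classes and linking properties.
EVENT_MAP = {
    "birth": ("E67 Birth", "P98 brought into life"),
    "death": ("E69 Death", "P100 was death of"),
}


def person_to_crm(xml_text):
    """Turn a TEI <person> into (subject, property, object) statements."""
    person = ET.fromstring(xml_text)
    pid = person.get(XML_ID)
    triples = [(pid, "is instance of", "E21 Person"),
               (pid, "P1 is identified by", person.findtext("persName").strip())]
    for tag, (crm_class, link) in EVENT_MAP.items():
        event = person.find(tag)
        if event is None:
            continue
        eid = f"{pid}_{tag}"
        triples += [
            (eid, "is instance of", crm_class),
            (eid, link, pid),
            (eid, "P4 has time-span", f"{eid}_time"),
            # The date text is the element's leading text, before <placeName>.
            (f"{eid}_time", "P78 is identified by", (event.text or "").strip()),
            (eid, "P7 took place at", f"{eid}_place"),
            (f"{eid}_place", "P87 is identified by", event.findtext("placeName", "").strip()),
        ]
    return triples


for statement in person_to_crm(PERSON):
    print(statement)
```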

A simple extension of the TEI DTD
The root CIDOC-CRM element
The class element
<!ATTLIST crmClass id ID #REQUIRED className CDATA #REQUIRED>
The property element
<!ELEMENT crmProperty EMPTY>
<!ATTLIST crmProperty id ID #REQUIRED propName CDATA #REQUIRED from IDREF #REQUIRED to IDREF #REQUIRED>
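A minimal Python sketch of how markup following the crmClass/crmProperty declarations could be generated. The root element name `cidocCRM`, the function `build_crm_block`, and the assumption that crmClass wraps the text it classifies are all hypothetical, since the root and class element declarations are not shown on the slide. The check on `from`/`to` is a stricter version of the DTD's IDREF constraint, which only requires a matching ID somewhere in the document.

```python
import xml.etree.ElementTree as ET


def build_crm_block(classes, properties, root_name="cidocCRM"):
    """Build markup following the crmClass/crmProperty declarations.

    classes:    list of (id, className, text) tuples
    properties: list of (id, propName, from_id, to_id) tuples
    root_name:  hypothetical name for the root CIDOC-CRM element
    """
    root = ET.Element(root_name)
    declared = set()
    for cid, name, text in classes:
        el = ET.SubElement(root, "crmClass", {"id": cid, "className": name})
        el.text = text
        declared.add(cid)
    for pid, name, src, dst in properties:
        # Stricter than IDREF: both ends must point at declared crmClass ids.
        for ref in (src, dst):
            if ref not in declared:
                raise ValueError(f"{pid}: IDREF {ref!r} does not match any crmClass id")
        # 'from' is a Python keyword, so attributes are passed as a dict.
        ET.SubElement(root, "crmProperty",
                      {"id": pid, "propName": name, "from": src, "to": dst})
    return root


block = build_crm_block(
    [("c1", "E7 Activity", "The excavation"), ("c2", "E21 Person", "Dr. Diggey")],
    [("p1", "P14 carried out by", "c1", "c2")],
)
print(ET.tostring(block, encoding="unicode"))
```

Keeping properties as separate empty elements that point at classes by ID is what lets one class instance take part in several statements without repeating it.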

The excavation in Wasteland in 2005 was performed by Dr. Diggey. He had the misfortune of breaking the beautiful sword (C50435) into 30 pieces. The sample text revisited

The text expressed with TEI mark-up: The excavation in Wasteland in 2005 was performed by Dr. Diggey. He had the misfortune of breaking the beautiful sword (C50435) into 30 pieces.

archaeological excavation Dr. Diggey 2005 … … … Encoding the information in an RDF-triplet fashion
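Reading such an encoding back into RDF-style triples amounts to resolving each property's `from` and `to` references to the classes they point at. The sample instance in this Python sketch is hypothetical, built from the values on the slide and the crmClass/crmProperty attributes declared in the DTD extension; the root element name `cidocCRM` is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance of the extension for the sample sentence.
SAMPLE = """<cidocCRM>
  <crmClass id="c1" className="E7 Activity">archaeological excavation</crmClass>
  <crmClass id="c2" className="E21 Person">Dr. Diggey</crmClass>
  <crmClass id="c3" className="E52 Time-Span">2005</crmClass>
  <crmProperty id="p1" propName="P14 carried out by" from="c1" to="c2"/>
  <crmProperty id="p2" propName="P4 has time-span" from="c1" to="c3"/>
</cidocCRM>"""


def to_triples(xml_text):
    """Resolve each crmProperty's from/to IDREFs into (subject, property, object)."""
    root = ET.fromstring(xml_text)
    # Map every class id to its (className, text) pair.
    nodes = {el.get("id"): (el.get("className"), (el.text or "").strip())
             for el in root.iter("crmClass")}
    return [(nodes[p.get("from")], p.get("propName"), nodes[p.get("to")])
            for p in root.iter("crmProperty")]


for subject, prop, obj in to_triples(SAMPLE):
    print(subject, prop, obj)
```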

CRM-Core – a DTD for encoding information [suggested by CRM-SIG]

Encoding the information in CRM Core (factoids): E31 Document Archaeological report Wasteland excavation 2005 report P70_documents Wasteland_2005_excavation E7_Activity Dr. Diggey excavator C50435 sword 2005 Wasteland P70_documents damage_to_artifact_C50435 E11_Modification Dr. Diggey excavator C50435 sword P9_forms_part_of Wasteland_2005_excavation

Encoding the information in CRM Core (factoids): E21 Person archaeologist Dr. Diggey P14 carried out by damage_to_artifact_C50435 E11 Modification excavator C50435 sword E82 Actor appellation formal name mention of name Wasteland_excavation_2005_report#n2
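A factoid bundles an event, its participants and its source into one flat record. The Python sketch below shows one way such a record could be expanded into CRM-style statements; the `Factoid` and `ActorRole` classes and their field names are hypothetical and do not reproduce the actual CRM-Core DTD. Property labels follow the slides (P9, P12, P14), plus P70i is documented in for the link to the report.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ActorRole:
    name: str
    role: str  # e.g. "excavator"


@dataclass
class Factoid:
    """Hypothetical CRM-Core-style event record; field names are illustrative."""
    id: str
    crm_class: str
    actors: List[ActorRole] = field(default_factory=list)
    objects: List[str] = field(default_factory=list)
    part_of: Optional[str] = None
    documented_in: Optional[str] = None
    source_ref: Optional[str] = None  # pointer back into the TEI text

    def to_triples(self) -> List[Tuple[str, str, str]]:
        """Expand the flat record into CRM-style statements."""
        out = [(self.id, "is instance of", self.crm_class)]
        # Roles stay on the record; full CRM could use P14.1 in the role of.
        out += [(self.id, "P14 carried out by", a.name) for a in self.actors]
        out += [(self.id, "P12 occurred in the presence of", o) for o in self.objects]
        if self.part_of:
            out.append((self.id, "P9 forms part of", self.part_of))
        if self.documented_in:
            out.append((self.id, "P70i is documented in", self.documented_in))
        return out


damage = Factoid(
    id="damage_to_artifact_C50435",
    crm_class="E11 Modification",
    actors=[ActorRole("Dr. Diggey", "excavator")],
    objects=["C50435"],
    part_of="Wasteland_2005_excavation",
    documented_in="Wasteland excavation 2005 report",
    source_ref="Wasteland_excavation_2005_report#n2",
)
for statement in damage.to_triples():
    print(statement)
```

The `source_ref` field is what keeps the record tied to the exact passage in the TEI text that supports it.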

Conclusions and further work
Possible now:
–TEI extended with an RDF-like CIDOC-CRM encoding
–TEI extended with CRM-Core records
Future:
–Make a mapping from TEI elements to CRM
–Make a mapping from the TEI header into ooFRBR
–Create an extension of the TEI definition
–Write guidelines for CIDOC-CRM encoding of information in TEI documents
–Convince the TEI users