Metadata Mark-up and Management
© Adolf Knoll, National Library of the Czech Republic
Metadata
Metadata is value added to digital files, for which it forms a container:
- to identify them
- to enable easier access and navigation
- to control the entire compound document
- to enable archival storage
- to enable research work and publication, even of critical editions, etc.
Compound Document
A document consisting of interconnected metadata and data files:
- the metadata are added descriptions (mostly pieces of text)
- the data are any external files produced by digitizing pieces of original documents (images, texts, sound files, even video files)
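The compound-document idea can be sketched as a small data structure. This is only an illustration in Python, not the project's actual format; the class names (`DataFile`, `MetadataFile`, `CompoundDocument`) and all file names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataFile:
    """An external file produced by digitization (image, text, sound, video)."""
    path: str
    media_type: str


@dataclass
class MetadataFile:
    """An added description that points at the data files it describes."""
    path: str
    description: str
    data_files: List[DataFile] = field(default_factory=list)


@dataclass
class CompoundDocument:
    """Interconnected metadata and data files treated as one document."""
    title: str
    metadata: List[MetadataFile] = field(default_factory=list)

    def all_data_files(self) -> List[DataFile]:
        # Collect every data file reachable through the metadata.
        return [d for m in self.metadata for d in m.data_files]


# Hypothetical example: one page described by one metadata file.
page = MetadataFile(
    path="page001.xml",
    description="Folio 1 recto",
    data_files=[
        DataFile("page001_preview.jpg", "image/jpeg"),
        DataFile("page001_excellent.jpg", "image/jpeg"),
    ],
)
doc = CompoundDocument(title="Sample manuscript", metadata=[page])
print(len(doc.all_data_files()))
```

The point of the sketch is that the data files are reached through the metadata, so the metadata acts as the container that controls the whole document.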
What is described?
OBJECTS
- of which the document consists and which build the document
- which have their unchanging substance
- whose representations can vary in their different occurrences
- which can have some important additional characteristics
Object
OISEAU: BIRD, PTÁK, VOGEL
- Cock, Kohout, Hahn
- Eagle, Orel, Adler
- Penguin, Tučňák, Pinguin
- Falcon, Sokol, Falke
- Duck, Kachna, Ente
Objects
- They are defined by the creator or interpreter of the document
- They can be built from any sequence or amount of bits in metadata or data areas
It should be established:
- which types of objects must be distinguished
- how they should be marked
Object OISEAU
- We have decided to have such an object (an animal with wings and feathers that lays eggs)
- We have decided to mark anything having these characteristics as OISEAU
- We know that this object has different names in different languages (bird, pták, Vogel, птица, pasăre, …)
- We know that in reality only concrete birds appear (duck, cock, falcon, penguin, eagle, …)
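The difference between the object, its names, and its concrete occurrences can be sketched as follows. The vocabulary is taken from the slides; the sets and the function `mark` are illustrative assumptions, not part of any real tool.

```python
from typing import Optional

# The mark we decided to use for this type of object.
OBJECT_MARK = "OISEAU"

# Names of the object itself in different languages.
OBJECT_NAMES = {"bird", "pták", "vogel", "птица", "pasăre"}

# Concrete occurrences that actually appear in documents.
CONCRETE_BIRDS = {"duck", "cock", "falcon", "penguin", "eagle"}


def mark(word: str) -> Optional[str]:
    """Return the object mark for a word, or None if it is not a bird."""
    w = word.lower()
    if w in OBJECT_NAMES or w in CONCRETE_BIRDS:
        return OBJECT_MARK
    return None


for word in ["Vogel", "penguin", "house"]:
    print(word, "->", mark(word))
```

Whatever the language or the concrete bird, the mark stays the same; only the representation varies.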
Objects and contents
Semantically poor content
- formal object (paragraph, heading, note, …)
- used for formatting
- languages built on these objects are used for output (HTML, MS WORD, …)
- PRESCRIPTIVE MARK-UP
Semantically rich content
- content-oriented object (author, flower, house, …)
- used for understanding
- languages built on these objects are used for description (MARC, TEI, EAD, DOBM, …)
- DESCRIPTIVE MARK-UP
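The contrast between the two kinds of mark-up can be seen by marking the same text both ways. In this sketch the formatting tags are ordinary HTML-style tags, while the descriptive tag names (`record`, `publisher`, `place`) are assumptions made for illustration and are not taken from MARC, TEI, EAD or DOBM.

```python
import xml.etree.ElementTree as ET

# Formatting-oriented mark-up: says how the text looks, not what it is.
formatting = "<p><b>Karel Šefelín</b>: <i>Hronov</i></p>"

# Content-oriented mark-up: says what each piece of text is.
descriptive = (
    "<record><publisher>Karel Šefelín</publisher>"
    "<place>Hronov</place></record>"
)

formatted_root = ET.fromstring(formatting)
described_root = ET.fromstring(descriptive)

# With formatting mark-up we can only ask "what is bold?".
bold_text = formatted_root.findtext("b")

# With descriptive mark-up we can ask "who is the publisher?".
publisher = described_root.findtext("publisher")

print(bold_text, "|", publisher)
```

Both queries return the same string, but only the second one knows that the string is a publisher.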
SGML
Standard Generalized Markup Language
- a general language to mark objects
- to be applied, it needs to be made more concrete (this is done via a DTD)
- thus, second-level applications can be written
- these applications are used directly or they require additional definitions (DTDs)
- SGML applications: HTML, XML, TEI, …
DESIGNING OUR PROJECT
What do we need?
- Open communication
- Internal precision and cohesion of markup
- Multiple output, reuse of marked data, liberty to add new marked data
- Complex document control and management
- Open and flexible content-oriented description principle
What do we work with?
For a manuscript having 300 pages, we work with:
- more than 1500 digital data files produced through digitization (Gallery, Preview, Internet, User, Excellent quality levels: 300x5 + images for covers, end-sheets, ...)
- more than 300 description metadata files (each digitized piece of the original + files for bibliographic and technical descriptions + technological files)
This means that the above-mentioned requirements must be applied to a complex document consisting of hundreds of computer files, which play various roles.
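The file counts above follow from simple arithmetic, sketched here. Only the page count and the five quality levels come from the slide; the variable names are illustrative.

```python
PAGES = 300
QUALITY_LEVELS = ["Gallery", "Preview", "Internet", "User", "Excellent"]

# One image per page at every quality level: 300 x 5.
page_images = PAGES * len(QUALITY_LEVELS)

# One description metadata file per digitized page.
page_metadata_files = PAGES

# Covers, end-sheets, and the bibliographic, technical and technological
# files come on top of these; their exact number is not given in the slides.
print(page_images, page_metadata_files, page_images + page_metadata_files)
```

This is why both figures are stated as "more than": the page-level files alone give 1500 and 300.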
Independency
Metadata should be independent of display: pure values
We must know:
- which features of objects to describe: we need DESCRIPTION RULES
- how to mark up these objects: we need RULES for MARK-UP
- how to formalize which objects will be described and how: we need a DTD
- how to display the compound document: we need rules for display (transformation rules)
If the platform is SGML or XML, we write DTD and XSL tools.
type of document; place; place of publishing; publisher; date; addressee
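Independence of display can be illustrated by keeping one record of pure values and applying two different display rules to it. This is a sketch in Python rather than XSL; the field names and the functions `as_text` and `as_html` are assumptions made for illustration.

```python
from html import escape
from typing import Dict

# Pure values only: no fonts, labels, or layout stored with the data.
record = {"place": "Hronov", "publisher": "Karel Šefelín", "date": "1914"}


def as_text(rec: Dict[str, str]) -> str:
    """One display rule: a plain-text line."""
    return f"{rec['place']}: {rec['publisher']}, {rec['date']}"


def as_html(rec: Dict[str, str]) -> str:
    """Another display rule: an HTML list built from the same values."""
    items = "".join(
        f"<li>{escape(k)}: {escape(v)}</li>" for k, v in rec.items()
    )
    return f"<ul>{items}</ul>"


print(as_text(record))
print(as_html(record))
```

Changing how the record looks means changing a display rule, never the stored values.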
description elements
- author
- type of document: postcard
- place: Hronov
- place of publishing: Hronov
- publisher: Karel Šefelín
- date: 1914
- addressee: František Bittnar
- annotation: Streets of Hronov in 1914; postcard written by my great-grandmother to her husband during his military service
However, there may be better rules, e.g. AACR2, defining how to describe a postcard; we should adopt them, or another approach more widely applied than this proposal of ours.
how to mark the elements?
In DTD:
In Metadata File: Hronov
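The relation between a DTD and a metadata file can be sketched as below. Both the DTD fragment and the XML are hypothetical reconstructions: the element names are assumptions, not the declarations shown on the original slide. The check is also deliberately crude; it only compares element names and is not real DTD validation, which the Python standard library does not provide.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical DTD fragment; the element names are assumptions.
DTD = """
<!ELEMENT postcard (place, publisher, date)>
<!ELEMENT place (#PCDATA)>
<!ELEMENT publisher (#PCDATA)>
<!ELEMENT date (#PCDATA)>
"""

# Hypothetical metadata file marked up with those elements.
METADATA = """
<postcard>
  <place>Hronov</place>
  <publisher>Karel Šefelín</publisher>
  <date>1914</date>
</postcard>
"""

# Names declared in the DTD.
declared = set(re.findall(r"<!ELEMENT\s+([\w.-]+)", DTD))

# Names actually used in the metadata file.
used = {el.tag for el in ET.fromstring(METADATA.strip()).iter()}

# Anything used but not declared breaks the mark-up rules.
undeclared = used - declared
print(sorted(undeclared))
```

The DTD says which elements may exist; the metadata file carries the values inside them.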
write
postcard
Hronov
Karel Šefelín
1914
František Bittnar
Streets of Hronov in 1914; postcard written by my great-grandmother to her husband during his military service
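Writing such a record can be sketched with Python's standard `xml.etree.ElementTree` module. The values are those listed above (the annotation is shortened); the tag names, including the root element `document`, are hypothetical.

```python
import xml.etree.ElementTree as ET

# Values from the slide; the tag names are hypothetical.
fields = [
    ("type", "postcard"),
    ("place", "Hronov"),
    ("publisher", "Karel Šefelín"),
    ("date", "1914"),
    ("addressee", "František Bittnar"),
    ("annotation", "Streets of Hronov in 1914"),  # shortened here
]

# Build one element per description field under a common root.
record = ET.Element("document")
for tag, value in fields:
    ET.SubElement(record, tag).text = value

# Serialize to a string that could be saved as the metadata file.
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```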
publish
- XSL transformation of the XML files … in order to display them
- Index by a database tool and provide even better access
- Link metadata with image data
This is work for professionals
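Indexing the metadata and linking it with image data can be sketched with an in-memory SQLite database. SQLite merely stands in for "a database tool" here; the slides do not name one. The table layout, the image file names and the second place and date are invented for the example.

```python
import sqlite3

# Hypothetical records: metadata values plus the image file they describe.
records = [
    ("Hronov", "1914", "postcard_front.jpg"),
    ("Hronov", "1914", "postcard_back.jpg"),
    ("Praha", "1920", "letter_page1.jpg"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata (place TEXT, date TEXT, image TEXT)")
conn.executemany("INSERT INTO metadata VALUES (?, ?, ?)", records)

# An index on the searched column is what gives faster access.
conn.execute("CREATE INDEX idx_place ON metadata (place)")

# The image column is the link from the description to the image data.
images = [
    row[0]
    for row in conn.execute(
        "SELECT image FROM metadata WHERE place = ? ORDER BY image",
        ("Hronov",),
    )
]
print(images)
conn.close()
```

A search by a description value returns the image files directly, which is the access that plain browsing of files cannot give.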
tools
- Simple browsing
- Internet access tools