Metadata Encoding and Transmission Standard (METS): overview and case study
Markus Enders, SUB Göttingen, enders@sub.uni-goettingen.de

METS overview
- METS was derived from the "Making of America" format, which was generalized so that it can be used for other media types
- funded by the Digital Library Federation (DLF)
- an Editorial Board steers the development and holds "METS Opening Days"

METS overview: structMap
- <structMap> is the central object and is mandatory
- nested <div> elements store the structure
- multiple structures are possible; the TYPE attribute distinguishes them ("logical", "physical", etc.)
Example:
    <mets:div TYPE="Monograph" LABEL="From Hamburg to San Francisco" ORDER="1" ID="DMD1">

METS overview: structLink
- <structLink> stores links between two <div> elements
- the two <div> elements can come from different <structMap> sections
Example:
    <mets:structLink>
      <mets:smLink xlink:from="div1" xlink:to="div2"/>
    </mets:structLink>

METS overview: fileSec
- <fileSec> contains file groups (<fileGrp>); file groups can be nested, and files (<file>) are always contained in file groups
- files carry basic technical metadata as attributes (checksum, size, MIME type); further technical metadata can be stored in the metadata section
- <fptr> (file pointer) links a <div> to one or more files
- <par> or <seq>: several file pointers are possible for each <div>; the files can be parallel or sequential
- a pointer can link into a file:
  -- images: HTML coordinates
  -- byte offsets
  -- XML IDs
  -- time codes (streaming media)
- links to streams are possible as well

METS overview: descriptive metadata vs. administrative metadata
- metadata can be embedded or referenced, and can be XML or binary
- extension schemas in use: MODS, DC, PREMIS, etc.
- m:n relationship between metadata sections and <div> or <file> elements
- administrative metadata has separate sections: technical metadata (techMD), digital provenance metadata (digiprovMD), rights metadata (rightsMD), source metadata (sourceMD)
- METS does not come with its own metadata schema, but lets you plug in different extension schemas

METS overview: linking metadata sections
- several metadata sections are possible for each <div> or <file>
- a single metadata section can be used by several <div> or <file> objects

METS overview: METS header
- the METS header contains information about the METS object (the METS file) itself, NOT about the content

METS overview: how the linking works (in XML)
- XML IDs are used: each target must have a unique ID, e.g. <mets:dmdSec ID="DMD1">
- an ID needs to be unique only locally (within the same file)
- metadata: DMDID and ADMID are of type IDREFS (space-separated pointers), e.g. <mets:div DMDID="DMD1 DMD2">
- IDREFS may point anywhere in the file, even from a DMDID to a <file>, and the file will still validate
- this is not a problem of the METS data model but of its XML representation
- file pointer: <mets:fptr FILEID="FN10081"/>

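Because schema validation does not catch a DMDID that points at the wrong kind of element, a profile-level check can be useful. The following is a minimal sketch of such a check using Python's standard library; the sample document and the function name `misdirected_dmdids` are hypothetical, and the sample is only well-formed, not a complete schema-valid METS file.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"

# Hypothetical sample: "DMD2" is the ID of a <file>, not of a <dmdSec>.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:dmdSec ID="DMD1"/>
  <mets:fileSec>
    <mets:fileGrp>
      <mets:file ID="DMD2"/>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap>
    <mets:div DMDID="DMD1 DMD2"/>
  </mets:structMap>
</mets:mets>"""


def misdirected_dmdids(root):
    """Return (idref, target_tag) pairs for DMDID pointers that do not hit a dmdSec.

    target_tag is None when the pointer does not resolve at all.
    """
    # Index every element that carries an ID attribute.
    by_id = {el.get("ID"): el for el in root.iter() if el.get("ID")}
    problems = []
    for div in root.iter(f"{{{METS_NS}}}div"):
        # DMDID is of type IDREFS: a space-separated list of pointers.
        for ref in div.get("DMDID", "").split():
            target = by_id.get(ref)
            tag = None if target is None else target.tag.split("}")[1]
            if tag != "dmdSec":
                problems.append((ref, tag))
    return problems


problems = misdirected_dmdids(ET.fromstring(SAMPLE))
print(problems)
```

The same pattern could be applied to ADMID and FILEID by changing the attribute name and the expected target element.
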
METS example (1): Digitization Centre
Simple document model (single structure):
- several content files per document (a single TIFF image per page)
- bibliographic metadata
- logical structure for the document (table of contents)
- direct relationships between logical structure entities and content files
This model was developed in the mid-1990s and was stored in XML with a proprietary metadata set.

METS example (1): Digitization Centre, simple logical document model
[Diagram: the logical structure (<structMap>: one Monograph with several Chapters) is linked to the content files (<fileSec>: 00000001.tif to 00000008.tif).]
- at most one file per page; the naming convention determines the order

METS example (1): Digitization Centre, simple logical document model
[Diagram: the same logical structure and content files, now with metadata attached to entities of the logical structure.]
- a file can belong to several document structure entities

METS example (1): Digitization Centre, simple logical document model
Logical structure:
    <METS:structMap TYPE="LOGICAL">
      <METS:div TYPE="Monograph" DMDID="dmdlog0001">
        <METS:div TYPE="TitlePage" ID="log0002">
          <METS:fptr FILEID="bitonal0001"/>
        </METS:div>
        <METS:div TYPE="Dedication" ID="log0003">
          <METS:fptr FILEID="bitonal0002"/>
        </METS:div>
        ......
      </METS:div>
    </METS:structMap>
- a file can belong to several document structure entities

METS example (1): Digitization Centre, simple logical document model
Metadata:
    <METS:dmdSec ID="dmdlog0001">
      <METS:mdWrap MDTYPE="MODS">
        <METS:xmlData>
          <MODS:mods> ...... </MODS:mods>
        </METS:xmlData>
      </METS:mdWrap>
    </METS:dmdSec>
- the MODS metadata is embedded in the METS file

METS example (1): Digitization Centre, simple logical document model
Content files:
    <METS:fileSec>
      <METS:fileGrp>
        <METS:file ID="bitonal0001" MIMETYPE="image/tiff">
          <METS:FLocat LOCTYPE="URL" xlink:href="file://./00000001.tif"/>
        </METS:file>
      </METS:fileGrp>
    </METS:fileSec>
- files are only referenced
- there is no metadata section for files; basic technical metadata is included as attributes: size, MIME type, checksum, ...

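To make the relationship between the logical structure and the content files concrete, the sketch below resolves each <fptr> of example (1) to the location given in the matching <FLocat>. It uses only Python's standard library; the sample is a shortened combination of the snippets from the slides, and the function name `files_per_div` is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"METS": "http://www.loc.gov/METS/"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Shortened sample assembled from the structMap and fileSec snippets.
SAMPLE = """<METS:mets xmlns:METS="http://www.loc.gov/METS/"
                       xmlns:xlink="http://www.w3.org/1999/xlink">
  <METS:fileSec>
    <METS:fileGrp>
      <METS:file ID="bitonal0001" MIMETYPE="image/tiff">
        <METS:FLocat LOCTYPE="URL" xlink:href="file://./00000001.tif"/>
      </METS:file>
    </METS:fileGrp>
  </METS:fileSec>
  <METS:structMap TYPE="LOGICAL">
    <METS:div TYPE="Monograph">
      <METS:div TYPE="TitlePage" ID="log0002">
        <METS:fptr FILEID="bitonal0001"/>
      </METS:div>
    </METS:div>
  </METS:structMap>
</METS:mets>"""


def files_per_div(root):
    """Map the ID of each <div> that has file pointers to its file locations."""
    # First pass: file ID -> location taken from the FLocat child.
    locations = {}
    for file_el in root.iter("{http://www.loc.gov/METS/}file"):
        flocat = file_el.find("METS:FLocat", NS)
        if flocat is not None:
            locations[file_el.get("ID")] = flocat.get(XLINK_HREF)
    # Second pass: follow the direct <fptr> children of every <div>.
    result = {}
    for div in root.iter("{http://www.loc.gov/METS/}div"):
        refs = [fptr.get("FILEID") for fptr in div.findall("METS:fptr", NS)]
        if refs:
            result[div.get("ID")] = [locations.get(ref) for ref in refs]
    return result


mapping = files_per_div(ET.fromstring(SAMPLE))
print(mapping)
```

An unresolved FILEID shows up as `None` in the list, which makes dangling pointers easy to spot.
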
METS example (2): Digitization Centre
Document model with two structures:
- logical structure (table of contents)
- physical structure (bound book, page)
- relationships between the two structures
- every structure entity has its own metadata section
- content files are linked to physical structure entities

METS example (2): Digitization Centre, document model with two structures
[Diagram: logical structure (Monograph, Chapters), physical structure (Bound Book, Pages, page areas) and content files (00000001.tif to 00000008.tif, HiRes01.jpg, Fulltext.xml).]
- a page area can be, for example, a column

METS example (2): Digitization Centre, document model with two structures
Map two structures:
    <METS:structMap TYPE="LOGICAL">
      <METS:div TYPE="Monograph" ID="log0001" DMDID="dmdlog0001"/>
    </METS:structMap>
    <METS:structMap TYPE="PHYSICAL">
      <METS:div TYPE="BoundBook" ID="phys0001" DMDID="dmdphys0001">
        <METS:div TYPE="page" ID="phys0002" DMDID="dmdphys0002"/>
      </METS:div>
    </METS:structMap>

METS example (2): Digitization Centre, document model with two structures
Map two structures:
    <METS:structLink TYPE="xxx">
      <!-- Monograph -->
      <METS:smLink xlink:from="log0001" xlink:to="phys0001"/>
      <!-- title page -->
      <METS:smLink xlink:from="log0002" xlink:to="phys0002"/>
    </METS:structLink>
- the links lead from the logical structure to the physical structure (pages)

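The mapping between the two structures can be followed programmatically. The sketch below, using Python's standard library, looks up which physical <div> elements a logical <div> is linked to. The sample combines the snippets above in shortened form (the TYPE attribute on structLink is left out), and the function name `linked_targets` is hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Shortened sample assembled from the two structMaps and the structLink.
SAMPLE = """<METS:mets xmlns:METS="http://www.loc.gov/METS/"
                       xmlns:xlink="http://www.w3.org/1999/xlink">
  <METS:structMap TYPE="LOGICAL">
    <METS:div TYPE="Monograph" ID="log0001">
      <METS:div TYPE="TitlePage" ID="log0002"/>
    </METS:div>
  </METS:structMap>
  <METS:structMap TYPE="PHYSICAL">
    <METS:div TYPE="BoundBook" ID="phys0001">
      <METS:div TYPE="page" ID="phys0002"/>
    </METS:div>
  </METS:structMap>
  <METS:structLink>
    <METS:smLink xlink:from="log0001" xlink:to="phys0001"/>
    <METS:smLink xlink:from="log0002" xlink:to="phys0002"/>
  </METS:structLink>
</METS:mets>"""


def linked_targets(root, source_id):
    """Return the TYPE and ID of every <div> that source_id links to via smLink."""
    divs = {d.get("ID"): d for d in root.iter(f"{{{METS_NS}}}div")}
    targets = []
    for link in root.iter(f"{{{METS_NS}}}smLink"):
        if link.get(f"{{{XLINK_NS}}}from") == source_id:
            target = divs.get(link.get(f"{{{XLINK_NS}}}to"))
            if target is not None:
                targets.append((target.get("TYPE"), target.get("ID")))
    return targets


root = ET.fromstring(SAMPLE)
title_page_targets = linked_targets(root, "log0002")
print(title_page_targets)
```

A logical entity that spans several pages would simply have several smLink elements with the same `xlink:from` value, and the function would return one tuple per page.
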
METS example (2): Digitization Centre, document model with two structures
Link to several files:
    <METS:div TYPE="page" ID="phys0002" DMDID="dmdphys0002">
      <METS:fptr FILEID="bitonal0001"/>
      <METS:fptr FILEID="hires0001"/>
    </METS:div>
- the files are neither sequential nor parallel, but alternative versions
Link to a page area:
    <METS:div TYPE="column" ID="phys0003" DMDID="dmdphys0002">
      <METS:fptr>
        <METS:area FILEID="bitonal00000001" COORDS="40x40x150x250"/>
      </METS:fptr>
    </METS:div>
- the COORDS attribute records where the column is located

METS example (2): Digitization Centre, document model with two structures
[Diagram: logical structure (Monograph, Chapters), physical structure (Bound Book, Pages, page areas) and content files (00000001.tif to 00000008.tif, HiRes01.jpg, Fulltext.xml).]
- link to full text: a single fulltext file (TEI) for the whole monograph

METS example (2): Digitization Centre, document model with two structures
Link to fulltext (TEI):
    <METS:div TYPE="page">
      <METS:fptr>
        <METS:area FILEID="teixml01" BEGIN="xx02" END="xx02" BETYPE="IDREF"/>
      </METS:fptr>
    </METS:div>
    <METS:div TYPE="page">
      <METS:fptr>
        <METS:area FILEID="teixml01" BEGIN="xx02" END="xx02" BETYPE="IDREF"/>
      </METS:fptr>
    </METS:div>
Referenced TEI file:
    <TEI:p>
      <TEI:q id="xx01">....</TEI:q>
      <TEI:q id="xx02">....</TEI:q>
      <TEI:pb n="13"/>
      <TEI:q id="xx03">...</TEI:q>
    </TEI:p>

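How BEGIN and END select part of the fulltext is a matter of profile documentation. One possible reading, sketched below with Python's standard library, is "all elements in document order from the element with the BEGIN id up to and including the element with the END id". This interpretation, the sample text content, the TEI namespace declaration and the function name `elements_between` are assumptions made for illustration, not something the slides prescribe.

```python
import xml.etree.ElementTree as ET

# The <area> element from the slide, with a namespace declaration added.
AREA = ('<METS:area xmlns:METS="http://www.loc.gov/METS/" FILEID="teixml01" '
        'BEGIN="xx02" END="xx02" BETYPE="IDREF"/>')

# The TEI fragment from the slide; the text content is invented.
TEI = """<TEI:p xmlns:TEI="http://www.tei-c.org/ns/1.0">
  <TEI:q id="xx01">first</TEI:q>
  <TEI:q id="xx02">second</TEI:q>
  <TEI:pb n="13"/>
  <TEI:q id="xx03">third</TEI:q>
</TEI:p>"""


def elements_between(tei_root, begin, end):
    """Collect elements in document order from id=begin to id=end, inclusive."""
    selected = []
    inside = False
    for el in tei_root.iter():
        if el.get("id") == begin:
            inside = True
        if inside:
            selected.append(el)
            if el.get("id") == end:
                break
    return selected


area = ET.fromstring(AREA)
tei_root = ET.fromstring(TEI)
selected = elements_between(tei_root, area.get("BEGIN"), area.get("END"))
texts = [el.text for el in selected]
print(texts)
```

With nested elements this simple document-order walk would also pick up descendants of the BEGIN element, so a real profile would need to state precisely which nodes count as part of the range.
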
METS example (2): Digitization Centre, document model with two structures
- the fulltext is referenced, not embedded in the METS file, because of the file sizes: the METS file is about 2 to 3 MB, the fulltext about 20 MB
- MODS is used for the descriptive metadata of logical structure entities
- a custom descriptive metadata schema is used for physical structure entities; it stores page numbers

METS example (2): Digitization Centre
Why did the GDZ choose METS?
- easily extendable: a project may start with image digitization and add fulltext later
- a complex structure needs to be stored
- the fulltext format alone is not flexible enough:
  (1) TEI knows only one kind of structure (logical); it does not know pages, only page breaks
  (2) TEI has no extensive metadata model
- therefore fulltext files need to be linked to a METS file

METS creation:
- by hand in an XML editor (structMap is the only required object)
- special tools for certain purposes, e.g.:
  - conversion tools for web archiving
  - ...
- at GDZ: the GOOBI workflow management tool
- to do: a general METS API that implements the data model

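Since structMap is the only required object, a very small METS document can also be generated with a few lines of code. The sketch below uses Python's standard library; the function name `minimal_mets` and the chosen attribute values are illustrative assumptions, and the result is not validated against the METS schema here.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
# Serialize with the conventional "mets" prefix instead of ns0.
ET.register_namespace("mets", METS_NS)


def minimal_mets(div_type, label):
    """Build a METS tree that contains only a structMap with one root <div>."""
    root = ET.Element(f"{{{METS_NS}}}mets")
    struct_map = ET.SubElement(root, f"{{{METS_NS}}}structMap", {"TYPE": "LOGICAL"})
    ET.SubElement(struct_map, f"{{{METS_NS}}}div", {"TYPE": div_type, "LABEL": label})
    return root


doc = minimal_mets("Monograph", "From Hamburg to San Francisco")
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
```

Metadata sections, a fileSec and further nested <div> elements can be appended to the same tree later, which matches the idea of starting simple and enriching the document over time.
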
METS presentation:
Depends on your METS file:
- simple XSLT transformations
- repository systems (ContentDM, Fedora, etc.)
- some page turners are available (for digitized content)

METS-Profile Documentation
Documentation is necessary. Describe the objects and relationships in your document model:
- What objects are available?
- What metadata is attached to those objects?
- How are objects related to each other (trees)?
- How is an unambiguous order stored?
- Are there non-hierarchical relationships between objects?
- Which content files are available?
- What is the access granularity?

METS-Profile Documentation
Documentation should not describe a format in general, but the precise usage of a packaging format.
Example: how are relationships inherited between two structure trees?
[Diagram: a Chapter linked to Pages; one Page contains a Column.]
- Does the column need to be linked to the chapter directly, or is an indirect link sufficient?

METS-Profile Documentation
Documentation should not describe a format in general, but the precise usage of a packaging format.
Examples:
- How to link into fulltext files? Usage of the BEGIN and END attributes
- How to store the order of <div> elements?
- What kinds of <div> elements are available?
Developing and sharing documentation encourages the usage of "complex document formats" even for simple documents. Documents can then be enriched with additional information later on.

METS-Profile Documentation
A METS profile describes the usage of METS for a specific scenario:
- which extension schemas are used?
- which authority files?
- usage of attributes and elements
A METS profile schema is available; a profile is an XML file, but it is not machine-readable.
A "registry" of profiles is available on the METS website:
http://www.loc.gov/mets