A METS Application Profile for Historical Newspapers


1 A METS Application Profile for Historical Newspapers
Morgan Cundiff Network Development and MARC Standards Office Library of Congress

2 Outline XML and Standards Definition of METS
Definition of METS Profiles Use of MODS relatedItem element Draft METS Profile for Historical Newspapers Parting Thoughts

3 XML “XML has become the de-facto standard for representing metadata descriptions of resources on the Internet.” Jane Hunter Working towards MetaUtopia - A Survey of Current Metadata Research

4 The Importance of Standards
“In moving from dispersed digital collections to interoperable digital libraries, the most important activity we need to focus on is standards… most important is the wide variety of metadata standards [including] descriptive metadata… administrative metadata…, structural metadata, and terms and conditions metadata…” Howard Besser The Next Stage: Moving from Isolated Digital Collections to Interoperable Digital Libraries

5 What is METS? METS is an XML Schema designed for the purpose of creating XML document instances that express the hierarchical structure of digital library objects, the names and locations of the files that comprise those objects, and the associated metadata. METS can, therefore, be used as a tool for modeling real world objects, such as particular document types.

6 What are the 7 Sections of a METS Document?
<mets> <metsHdr/> <dmdSec/> <amdSec/> <fileSec/> <structMap/> <structLink/> <behaviorSec/> </mets>
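As a minimal sketch of the seven-section layout, the following Python checks which of those sections occur as direct children of a METS root element. The function name `sections_present` and the sample document are illustrative assumptions, not part of the presentation; only the section names and the METS namespace URI come from the standard.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
SECTIONS = ["metsHdr", "dmdSec", "amdSec", "fileSec",
            "structMap", "structLink", "behaviorSec"]

def sections_present(mets_xml: str) -> dict:
    """Map each of the seven section names to True if it is a child of the root."""
    root = ET.fromstring(mets_xml)
    found = {child.tag for child in root}
    return {name: f"{{{METS_NS}}}{name}" in found for name in SECTIONS}

# Hypothetical, heavily abbreviated METS document used only for illustration.
sample = """<mets xmlns="http://www.loc.gov/METS/">
  <dmdSec ID="dmd1"/>
  <fileSec/>
  <structMap><div/></structMap>
</mets>"""

report = sections_present(sample)
for name, present in report.items():
    print(f"{name:12s} {'present' if present else 'absent'}")
```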

7 The Descriptive Metadata Section with mdWrap
<mets> <dmdSec> <mdWrap> <xmlData> <!-- insert data from different namespace here --> </xmlData> </mdWrap> </dmdSec> <fileSec></fileSec> <structMap></structMap> </mets>

8 The Descriptive Metadata Section with MODS as extension schema
<mets:mets> <mets:dmdSec> <mets:mdWrap> <mets:xmlData> <mods:mods></mods:mods> </mets:xmlData> </mets:mdWrap> </mets:dmdSec> <mets:fileSec></mets:fileSec> <mets:structMap></mets:structMap> </mets:mets>

9 The Descriptive Metadata Section with MODS and relatedItem elements
<mets:mets> <mets:dmdSec> <mets:mdWrap> <mets:xmlData> <mods:mods> <mods:relatedItem type="constituent"> <mods:relatedItem type="constituent"></mods:relatedItem> </mods:relatedItem> </mods:mods> </mets:xmlData> </mets:mdWrap> </mets:dmdSec> <mets:fileSec></mets:fileSec> <mets:structMap></mets:structMap> </mets:mets>
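A reader of such a document has to step through dmdSec, mdWrap and xmlData before reaching the MODS record. The sketch below shows one way this might be done with Python's standard library. The helper name `wrapped_mods`, the `ID` value and the sample document are assumptions made for illustration; the namespace URIs are the published METS and MODS ones.

```python
import xml.etree.ElementTree as ET

NS = {
    "mets": "http://www.loc.gov/METS/",
    "mods": "http://www.loc.gov/mods/v3",
}

def wrapped_mods(mets_xml: str) -> list:
    """Return every mods:mods element embedded via dmdSec/mdWrap/xmlData."""
    root = ET.fromstring(mets_xml)
    return root.findall("mets:dmdSec/mets:mdWrap/mets:xmlData/mods:mods", NS)

# Hypothetical sample mirroring the skeleton on the slide.
sample = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
                      xmlns:mods="http://www.loc.gov/mods/v3">
  <mets:dmdSec ID="dmd1">
    <mets:mdWrap MDTYPE="MODS">
      <mets:xmlData>
        <mods:mods>
          <mods:relatedItem type="constituent">
            <mods:relatedItem type="constituent"/>
          </mods:relatedItem>
        </mods:mods>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  <mets:fileSec/>
  <mets:structMap><mets:div/></mets:structMap>
</mets:mets>"""

records = wrapped_mods(sample)
# Direct children only: the nested relatedItem is reached through its parent.
top_level = records[0].findall("mods:relatedItem[@type='constituent']", NS)
nested = top_level[0].findall("mods:relatedItem[@type='constituent']", NS)
```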

10 MODS relatedItem element
The MODS relatedItem element has the same content model as the top-level mods element (titleInfo, name, subject, physicalDescription, note, etc.). The relatedItem element makes it possible to create very rich analytic descriptions for works contained within a MODS record. The relatedItem element is repeatable and can be nested recursively, which makes it possible to build a hierarchical tree structure. relatedItem elements also make it possible to associate descriptive data with any structural element.

11 Use of MODS relatedItem element to express logical structure
<mods:mods> <mods:titleInfo> <mods:title>Baltimore Sun</mods:title> </mods:titleInfo> <mods:relatedItem type="constituent"> <mods:title>Sports</mods:title> <mods:relatedItem type="constituent"> <mods:title>O’s Split Beantown Twi-niter</mods:title> </mods:relatedItem> <mods:relatedItem type="constituent"> <mods:title>Chisox Nip Tribe</mods:title> </mods:relatedItem> </mods:relatedItem> </mods:mods>
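Because relatedItem nests recursively, the logical structure can be recovered with a short recursive walk. The following is a sketch under stated assumptions: the function name `outline` is hypothetical, and the sample wraps every title in titleInfo (as a schema-valid MODS record would) rather than using the abbreviated form shown on the slide.

```python
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}

def outline(node, depth=0):
    """Return indented titles for a MODS record and its nested constituents."""
    title = node.findtext("mods:titleInfo/mods:title",
                          default="(untitled)", namespaces=NS)
    lines = ["  " * depth + title]
    for child in node.findall("mods:relatedItem[@type='constituent']", NS):
        lines.extend(outline(child, depth + 1))
    return lines

# Hypothetical sample: one issue, one section, two articles.
sample = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:titleInfo><mods:title>Baltimore Sun</mods:title></mods:titleInfo>
  <mods:relatedItem type="constituent">
    <mods:titleInfo><mods:title>Sports</mods:title></mods:titleInfo>
    <mods:relatedItem type="constituent">
      <mods:titleInfo><mods:title>O's Split Beantown Twi-niter</mods:title></mods:titleInfo>
    </mods:relatedItem>
    <mods:relatedItem type="constituent">
      <mods:titleInfo><mods:title>Chisox Nip Tribe</mods:title></mods:titleInfo>
    </mods:relatedItem>
  </mods:relatedItem>
</mods:mods>"""

tree_lines = outline(ET.fromstring(sample))
print("\n".join(tree_lines))
```

The indentation depth in the output corresponds directly to the nesting depth of relatedItem, which is the property the profile relies on to express issue, section and article levels.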

12 METS document with two hierarchies (logical and physical)
<mets:mets> <mets:dmdSec> <mets:mdWrap> <mets:xmlData> <mods:mods> <mods:relatedItem> <mods:relatedItem></mods:relatedItem> </mods:relatedItem> </mods:mods> </mets:xmlData> </mets:mdWrap> </mets:dmdSec> <mets:fileSec></mets:fileSec> <mets:structMap> <mets:div> <mets:div></mets:div> </mets:div> </mets:structMap> </mets:mets>

13 Linking in METS Documents (XML ID/IDREF links)
[Diagram labels: DescMD (mods, relatedItem); AdminMD (techMD, sourceMD, digiprovMD, rightsMD); fileGrp (file); StructMap (div, fptr)]

15 Linking in METS Documents (XML ID/IDREF links)
[Diagram labels: DescMD (mods, relatedItem); AdminMD (techMD (mix), sourceMD, digiprovMD, rightsMD); fileGrp (file); StructMap (div, fptr)]
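The ID/IDREF links named in the slide title can be checked mechanically: collect every `ID` in the document, then confirm that each reference attribute points at one of them. The sketch below covers three METS reference attributes (`DMDID`, `ADMID`, `FILEID`). The helper names and the sample document are assumptions; in particular, pointing a div's `DMDID` at a relatedItem `ID` is assumed here from the profile's description, not quoted from it.

```python
import xml.etree.ElementTree as ET

REF_ATTRS = ("DMDID", "ADMID", "FILEID")

def local(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]

def resolve_links(root):
    """Return (element, attribute, target ID, resolved?) for every IDREF found."""
    ids = {el.get("ID") for el in root.iter() if el.get("ID")}
    links = []
    for el in root.iter():
        for attr in REF_ATTRS:
            # DMDID and ADMID are IDREFS, i.e. whitespace-separated lists.
            for ref in (el.get(attr) or "").split():
                links.append((local(el.tag), attr, ref, ref in ids))
    return links

# Hypothetical sample document; all ID values are invented.
sample = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
                      xmlns:mods="http://www.loc.gov/mods/v3">
  <mets:dmdSec ID="dmd1"><mets:mdWrap MDTYPE="MODS"><mets:xmlData>
    <mods:mods><mods:relatedItem ID="art1" type="constituent"/></mods:mods>
  </mets:xmlData></mets:mdWrap></mets:dmdSec>
  <mets:amdSec><mets:techMD ID="tech1"/></mets:amdSec>
  <mets:fileSec><mets:fileGrp>
    <mets:file ID="img1" ADMID="tech1"/>
  </mets:fileGrp></mets:fileSec>
  <mets:structMap>
    <mets:div TYPE="news:page" DMDID="art1"><mets:fptr FILEID="img1"/></mets:div>
  </mets:structMap>
</mets:mets>"""

links = resolve_links(ET.fromstring(sample))
dangling = [link for link in links if not link[3]]
```

A non-empty `dangling` list would indicate a reference whose target ID is missing from the document.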

19 What is a METS Application Profile?
“METS Profiles are intended to describe a class of METS documents in sufficient detail to provide both document authors and programmers the guidance they require to create and process METS documents conforming with a particular profile.” A profile is expressed as an XML document. There is a schema for this purpose. The profile expresses the requirements that a METS document must satisfy. A sufficiently explicit METS Profile may be considered a “data standard”. Note: A METS Profile is a human-readable prose document and is not intended to be “machine actionable”.

20 METS Profile for Historical Newspapers [draft]
The METS Profile for Historical Newspapers specifies how METS documents representing digitized historical newspapers should be encoded. Note that the profile is to be used to represent a single issue of a newspaper. The profile uses MODS to express the logical structure of a newspaper issue, and uses the METS structMap to express the physical structure of the newspaper issue. [draft abstract] URL to find Profile and related documents:

21 METS Profile (features)
Represents one issue of a newspaper.

22 METS Profile (features)
The Profile presumes the use of alto files (or some equivalent) where the zones on the corresponding digital image (expressed as coordinates) are correlated to the corresponding logical entity (e.g. article or paragraph) and also to the corresponding OCR text.

23 METS Profile (features)
The Profile maintains a strict separation between logical entities and physical entities.

24 METS Profile (features)
The primary logical entities are issue, issue section, article, article section, illustration, and advertisement. The top-level MODS record describes the issue. The other primary logical entities (issue section, article, article section, illustration, and advertisement) are described in a hierarchy of MODS relatedItem elements.

25 METS Profile (features)
Logical structure is represented using MODS in the METS dmdSec. It is necessary to use the latest version (version 3.2) of MODS.

26 Hierarchy of Logical Entities
[Tree diagram; node labels in reading order: issue; issue section; article (or article-like entity); paragraph; illustration (photograph, drawing, map, table); article section; illustration; advertisements; article]

27 METS Profile (features)
The primary logical entities are expressed as values of the MODS genre element.

28 Use of MODS relatedItem element to express logical structure
<mods:mods> <mods:titleInfo> <mods:title>Baltimore Sun</mods:title> </mods:titleInfo> <mods:genre>newspaper</mods:genre> <mods:relatedItem type="constituent"> <mods:title>Sports</mods:title> <mods:genre>section</mods:genre> <mods:relatedItem type="constituent"> <mods:title>O’s Split Beantown Twi-niter</mods:title> <mods:genre>article</mods:genre> <mods:relatedItem type="constituent"> <mods:title>Aparicio puts tag on Jensen to end 7th</mods:title> <mods:genre>photograph</mods:genre> </mods:relatedItem> </mods:relatedItem> </mods:relatedItem> </mods:mods>

29 METS Profile (features)
The allowable genre values (for Profile compliance) are listed in Newspaper Genre Terms [draft].
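A compliance check against the genre list could look roughly like the sketch below. The set `ALLOWED_GENRES` is a hypothetical stand-in built from the four values that appear in the slides; it is not the actual content of Newspaper Genre Terms. The function name `genre_report` and the sample record are likewise assumptions.

```python
import xml.etree.ElementTree as ET

MODS = "http://www.loc.gov/mods/v3"
NS = {"mods": MODS}

# Hypothetical subset; the real list lives in "Newspaper Genre Terms [draft]".
ALLOWED_GENRES = {"newspaper", "section", "article", "photograph"}

def genre_report(mods_root):
    """Return (genre, allowed?) for the record and each nested relatedItem."""
    entities = [mods_root] + list(mods_root.iter(f"{{{MODS}}}relatedItem"))
    report = []
    for entity in entities:
        # Look only at the entity's own genre child, not its descendants'.
        genre = entity.findtext("mods:genre", default=None, namespaces=NS)
        report.append((genre, genre in ALLOWED_GENRES))
    return report

# Hypothetical sample; "editorial cartoon" is deliberately outside the subset.
sample = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:genre>newspaper</mods:genre>
  <mods:relatedItem type="constituent">
    <mods:genre>section</mods:genre>
    <mods:relatedItem type="constituent">
      <mods:genre>article</mods:genre>
      <mods:relatedItem type="constituent">
        <mods:genre>editorial cartoon</mods:genre>
      </mods:relatedItem>
    </mods:relatedItem>
  </mods:relatedItem>
</mods:mods>"""

report = genre_report(ET.fromstring(sample))
rejected = [genre for genre, ok in report if not ok]
```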

30 METS Profile (features)
It is also possible to tag subparts of the primary logical entities. The typical example of this is tagging the paragraph. This is accomplished using the MODS part element.

31 METS Profile (features)
There are only three physical entities. They are: issue, page, and pageRegion.

32 METS Profile (features)
The physical entities are represented in the structMap section of the METS document as div types (div type="news:page"). There is only one structMap.

33 METS Profile (features)
Page regions are correlated to the corresponding logical entity by means of an IDREF link. Note that one or more page regions may correspond to a single logical entity. This makes it possible to make the necessary associations when the logical entity is split into more than one physical entity, e.g. when a paragraph is continued on the next column or an article is continued on a different page.
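The one-to-many relationship between a logical entity and its page regions can be illustrated by grouping regions under the ID they reference. This is a sketch only: the `TYPE` values `news:issue` and `news:pageRegion` are assumed by analogy with `news:page`, the use of `DMDID` as the linking attribute is an assumption, and all IDs and the function name `regions_by_entity` are invented for the example.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"

def regions_by_entity(struct_map):
    """Group (page ORDER, region ID) pairs by the logical entity they point to."""
    grouped = defaultdict(list)
    for page in struct_map.iter(f"{{{METS}}}div"):
        if page.get("TYPE") != "news:page":
            continue
        for region in page.findall(f"{{{METS}}}div"):
            if region.get("TYPE") != "news:pageRegion":
                continue
            for ref in (region.get("DMDID") or "").split():
                grouped[ref].append((page.get("ORDER"), region.get("ID")))
    return dict(grouped)

# Hypothetical structMap: article art1 starts on page 1 and continues on page 2.
sample = """<mets:structMap xmlns:mets="http://www.loc.gov/METS/">
  <mets:div TYPE="news:issue">
    <mets:div TYPE="news:page" ORDER="1">
      <mets:div TYPE="news:pageRegion" ID="r1" DMDID="art1"/>
      <mets:div TYPE="news:pageRegion" ID="r2" DMDID="art2"/>
    </mets:div>
    <mets:div TYPE="news:page" ORDER="2">
      <mets:div TYPE="news:pageRegion" ID="r3" DMDID="art1"/>
    </mets:div>
  </mets:div>
</mets:structMap>"""

grouped = regions_by_entity(ET.fromstring(sample))
```

Here `grouped["art1"]` holds two regions on different pages, which is the "continued on a different page" case the slide describes.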

34 METS Profile (features)
Example document

35 Parting Thoughts Agreement on a data standard (such as a METS profile) will facilitate interoperability. Interoperability can be between any two agents (digital library applications, preservation repositories, search and retrieval systems, etc.). The newspaper community has a “quality vs. quantity” dilemma. The large volume of material to be digitized necessitates automatic processing. Automatic processing produces dirty data and less satisfying results. High-quality processing (requiring more human intervention) is more expensive but produces far better results and pays dividends far into the future (the data will be used over and over without additional cost).
