Lecture 12 Why metadata? CS 502 Computing Methods for Digital Libraries Cornell University – Computer Science Herbert Van de Sompel herbertv@cs.cornell.edu.


1 Lecture 12 Why metadata? CS 502 Computing Methods for Digital Libraries Cornell University – Computer Science Herbert Van de Sompel

2 Notes
• Carl Lagoze on Wednesday
• No lab on Friday, but Paul Ginsparg on 04/03
• XML Schema & XSLT: later

3 Content – Data - Metadata
• Content refers to digital library materials as information that is of interest to a user.
• Data emphasizes the bits and bytes to be processed by a computer.
• Metadata: data about data.

4 Metadata – focus on description/discovery
• data about data
• origins in library cataloguing and A&I (abstracting and indexing) databases
• now: an amplification of traditional bibliographic cataloguing practices in an electronic environment
• now: any data used to aid the identification, description and location of networked electronic resources
• actually, it is more

5 Metadata - broader
• descriptive: facilitating resource discovery and identification (e.g., a record in an OPAC system)
• administrative: facilitating resource management within a collection (e.g., a loan record in an OPAC system)
• structural: binding together the components of complex information objects (e.g., a series title in a record in an OPAC system)
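As a rough illustration of the three categories, the sketch below groups the fields of one hypothetical catalogue record by role. The class name, field names, values, and the `fields_for` helper are assumptions made for this example; they are not taken from any OPAC system.

```python
from dataclasses import dataclass, field

PURPOSES = ("descriptive", "administrative", "structural")


@dataclass
class ObjectMetadata:
    """Metadata about one object, grouped by the role it plays (hypothetical)."""
    descriptive: dict = field(default_factory=dict)     # discovery and identification
    administrative: dict = field(default_factory=dict)  # management within a collection
    structural: dict = field(default_factory=dict)      # binding components together

    def fields_for(self, purpose: str) -> dict:
        """Return only the fields that serve the given purpose."""
        if purpose not in PURPOSES:
            raise ValueError(f"unknown purpose: {purpose!r}")
        return getattr(self, purpose)


# Illustrative values only; the loan status and series title are made up.
record = ObjectMetadata(
    descriptive={
        "title": "Campus strategies for libraries and electronic information",
        "publisher": "Digital Press",
    },
    administrative={"loan_status": "on loan"},
    structural={"series_title": "Example series"},
)

print(record.fields_for("descriptive")["title"])
```

A discovery service would read only the descriptive group, while a circulation service would read only the administrative group.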

6 Metadata - evolution
[Diagram: from descriptive/discovery metadata about library objects to descriptive/discovery, administrative, and structural metadata about library objects and networked resources]

7 Metadata
• Traditionally stored separately from the objects that it describes.
• For digital objects, it is sometimes embedded in the objects (cf. KWF).
• Usually the metadata is a set of text fields.
• Textual metadata can be used to describe non-textual objects, e.g., software, images, music, …
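One familiar case of metadata embedded in a digital object is the `<meta>` element of an HTML page. The sketch below, which uses only Python's standard `html.parser` module, pulls such name/content pairs out of a page. The sample page, the `MetaTagExtractor` class, and the `extract_embedded_metadata` function are assumptions made for this illustration, not something the lecture prescribes.

```python
from html.parser import HTMLParser


class MetaTagExtractor(HTMLParser):
    """Collects <meta name="..." content="..."> pairs found in an HTML page."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            # A name may occur more than once, so keep a list of values.
            self.metadata.setdefault(name, []).append(content)


# Hypothetical page: the metadata travels inside the object it describes.
PAGE = """<html><head>
<title>Why metadata?</title>
<meta name="DC.title" content="Why metadata?">
<meta name="DC.creator" content="Herbert Van de Sompel">
<meta name="keywords" content="metadata, digital libraries">
</head><body><p>Lecture notes.</p></body></html>"""


def extract_embedded_metadata(html: str) -> dict:
    parser = MetaTagExtractor()
    parser.feed(html)
    parser.close()
    return parser.metadata


for name, values in extract_embedded_metadata(PAGE).items():
    print(name, values)
```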

8 Metadata – why?
• Some methods of information discovery search descriptive metadata about the objects.
• Generally, it enables digital library services, explicitly (discovery metadata) or implicitly (terms and conditions).
• It helps to impose order on chaos.
• It enables automated discovery and manipulation of objects.

9 Metadata – generation (traditional)
[Diagram labels: cataloguing rules, object, metadata record, reference data]

10 Metadata – generation (traditional)
Advantage:
• Human expertise leads to high-quality catalogs and indexes
Disadvantages:
• Expensive ($50+ per record)
• Time consuming
• Requires cumbersome cataloguing rules
• Slow to adapt to new formats and types of digital objects
Human cataloguing and indexing are too expensive to apply to more than a small proportion of digital objects => automatic generation of metadata

11 Metadata – roots (Library cataloguing)
Anglo American Cataloguing Rules (AACR2)
• rules for what goes into each field of a catalog record
MARC format
• an exchange format for catalog records
"MARC Catalog"
• catalog in MARC format, where the content of each field follows AACR2

12 Citation: a monograph -- book!
Caroline R. Arms, editor, Campus strategies for libraries and electronic information. Bedford, MA: Digital Press, 1990.

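13 [Figure labels on an example MARC record: MARC tags, MARC indicator, MARC field, MARC subfield code, MARC subfield]

14 [Figure labels on the fields of the example record: ISBN; title statement; imprint (location, publisher, year); collation; series title]

15 [Figure labels on the raw record: leader, directory, 001 field, field terminator]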
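The parts named on the three MARC slides above (leader, directory, tags, indicators, subfield codes, field terminator) can be made concrete with a small sketch. It assumes the standard MARC exchange layout: a 24-character leader, 12-character directory entries (3-character tag, 4-digit field length, 5-digit start position), and the control characters 0x1E (field terminator), 0x1D (record terminator), and 0x1F (subfield delimiter). The helper names, the control number `demo0001`, and the indicator and subfield values are illustrative assumptions; this is not a full MARC implementation.

```python
FT = b"\x1e"  # field terminator
RT = b"\x1d"  # record terminator
SF = b"\x1f"  # subfield delimiter


def data_field(indicators, subfields):
    """Two indicator characters followed by (code, value) subfields."""
    body = indicators.encode("ascii")
    for code, value in subfields:
        body += SF + code.encode("ascii") + value.encode("utf-8")
    return body


def build_record(fields):
    """Assemble leader + directory + fields from (tag, body_bytes) pairs."""
    directory, data = b"", b""
    for tag, body in fields:
        body += FT
        # Directory entry: tag, field length (with terminator), start offset.
        directory += f"{tag}{len(body):04d}{len(data):05d}".encode("ascii")
        data += body
    directory += FT
    base = 24 + len(directory)       # base address of data
    total = base + len(data) + 1     # +1 for the record terminator
    leader = f"{total:05d}nam  22{base:05d} a 4500".encode("ascii")
    return leader + directory + data + RT


def parse_record(raw):
    """Walk the directory and return the fields it points to."""
    base = int(raw[12:17])
    directory = raw[24:base - 1]     # drop the terminator that ends the directory
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        body = raw[base + start:base + start + length - 1]
        if tag < "010":              # control fields: no indicators or subfields
            fields.append((tag, body.decode("utf-8")))
        else:
            subfields = [(chunk[:1].decode("ascii"), chunk[1:].decode("utf-8"))
                         for chunk in body[2:].split(SF)[1:]]
            fields.append((tag, body[:2].decode("ascii"), subfields))
    return fields


# Illustrative record based on the citation on slide 12.
SAMPLE_FIELDS = [
    ("001", b"demo0001"),
    ("245", data_field("10", [
        ("a", "Campus strategies for libraries and electronic information /"),
        ("c", "Caroline R. Arms, editor.")])),
    ("260", data_field("  ", [
        ("a", "Bedford, MA :"), ("b", "Digital Press,"), ("c", "1990.")])),
]

RAW = build_record(SAMPLE_FIELDS)
for parsed in parse_record(RAW):
    print(parsed)
```

Reading a field always goes through the directory: the parser never scans for tags in the data, it jumps to the offset recorded for each tag.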

16 MARC: the good news
A great achievement:
• Developed in the 1960s
• Magnetic tape exchange format for printing catalog records
• At the dawn of computing: mixed upper and lower case; variable-length fields; repeated fields; non-Roman scripts
• 100(?) million records with standard content and format
• Thousands of trained librarians (millions?)

17 MARC: the bad news
A great problem:
• Not designed for computer algorithms
• One record per item (poor links between records)
• Tied to traditional materials and traditional practices
• Not Unicode
• 100 million records at $50+/record
A classic legacy system!
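To put the last figure in perspective, the two numbers quoted on the slides can simply be multiplied. This is only a lower-bound estimate under the slides' own rough figures (the record count is marked as uncertain there, and $50 is a minimum); the constant and function names are made up for this sketch.

```python
RECORDS = 100_000_000        # "100(?) million records" from the slides
COST_PER_RECORD_USD = 50     # "$50+ per record": a lower bound


def minimum_catalogue_cost(records: int, cost_per_record: int) -> int:
    """Lower bound on the total spent creating the records by hand."""
    return records * cost_per_record


total = minimum_catalogue_cost(RECORDS, COST_PER_RECORD_USD)
print(f"at least ${total:,}")  # at least $5,000,000,000
```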

18 Metadata – simplicity/complexity
Variety of metadata formats for description/discovery:
• basic, proprietary records used in global internet search services;
• simple attribute/value records such as the ROADS templates used in eLib subject services;
• unqualified Dublin Core (15 elements only);
• the more structured TEI and MARC formats;
• qualified Dublin Core;
• detailed formats such as CIMI and EAD, typically applied to archival material.
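To show how little structure the simple end of this spectrum needs, here is a sketch of an unqualified Dublin Core record as plain attribute/value pairs. The fifteen element names are those of the Dublin Core Metadata Element Set; the sample record, the `unknown_elements` and `to_meta_tags` helpers, and the `DC.` prefix convention for HTML `<meta>` names are choices made for this illustration.

```python
from html import escape

# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


def unknown_elements(record: dict) -> list:
    """Return the keys of a record that are not Dublin Core elements."""
    return sorted(name for name in record if name not in DC_ELEMENTS)


def to_meta_tags(record: dict) -> list:
    """Render a record as HTML <meta> tags; every element may repeat."""
    tags = []
    for name, values in record.items():
        for value in values:
            tags.append(f'<meta name="DC.{name}" content="{escape(value)}">')
    return tags


# Illustrative record for the book cited on slide 12.
book = {
    "title": ["Campus strategies for libraries and electronic information"],
    "contributor": ["Arms, Caroline R."],
    "publisher": ["Digital Press"],
    "date": ["1990"],
}

print(unknown_elements(book))   # []
print("\n".join(to_meta_tags(book)))
```

Compare this with the MARC record: there are no indicators, subfields, or directory, which is what makes such records cheap to create and easy to exchange, at the cost of detail.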

19 Metadata – one-size-fits-all/application-profiles
There is an evolution from a “one size fits all” concept for metadata towards:
• the use of a specific format depending on the purpose;
• the co-existence of formats in relation to an object;
• combining metadata elements from various formats.
Choice of format can depend on:
• the functional purpose of the metadata: [description/discovery/location]; [administration]; [structuring]
• level of detail required to fulfill the purpose
• discipline/domain/audience of the objects that are described
• legacy issues
• interoperability requirements
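Combining metadata elements from various formats can be sketched as a small "application profile": a list of elements, each named with a prefix that says which format it is borrowed from, plus a flag saying whether it is required. Everything below is hypothetical: the `lib:` elements, the profile contents, and the `validate` function are invented for this illustration and do not reproduce any published profile.

```python
# Hypothetical profile: element name -> is the element required?
PROFILE = {
    "dc:title": True,              # borrowed from Dublin Core
    "dc:creator": False,
    "dc:date": False,
    "lib:shelfmark": True,         # made-up local administrative element
    "lib:loan_period_days": False, # made-up local administrative element
}


def validate(record: dict, profile: dict) -> list:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for name, required in profile.items():
        if required and name not in record:
            problems.append(f"missing required element: {name}")
    for name in record:
        if name not in profile:
            problems.append(f"element not in profile: {name}")
    return problems


good = {
    "dc:title": "Campus strategies for libraries and electronic information",
    "dc:date": "1990",
    "lib:shelfmark": "EXAMPLE 123",
}
bad = {"dc:title": "Untitled", "marc:245": "Untitled"}

print(validate(good, PROFILE))
print(validate(bad, PROFILE))
```

The prefix keeps each borrowed element tied to the format that defines it, so descriptive and administrative elements can sit side by side in one record without ambiguity.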

20 Metadata – interoperability
[Diagram: the Internet Commons and the metadata communities that share it: Commerce, Home Pages, Geo, Library, Museums, Scientific Data, Whatever...]

21 Metadata – descriptive/other
There is an evolution towards the creation of standards for non-discovery related metadata formats:
• Preservation metadata [NedLib, CEDARS, …] (see OCLC overview document)
• “Data Dictionary for Technical Metadata for Digital Still Images”
• book e-commerce [ONIX]
• resource administration: Circulation Interchange Protocol (NCIP) Standard
• Electronic resources (cf. Adam Chandler)

