Metadata standards Guidelines, data structures, and file formats to facilitate reliability and quality of description INF 384 C, Spring 2009
Outline Why create and follow metadata standards? What kinds of standards are there? How does this all work? How do standards evolve?
The world of standards A standard is any agreed-upon means of doing something. Standards can be formally created and adopted or merely customary. With standards, products and processes have a certain level of consistency and predictability that can make production and use more efficient.
Goals of metadata standards Metadata standards enable more reliable description. For example, when everyone agrees to record the first names and last names of resource creators in separate fields, search results listed by author can be alphabetized correctly and read easily, whether the display puts the first name or the last name first. Reliable description enables the sharing of data across different systems.
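The point about separate name fields can be illustrated with a minimal Python sketch. The `Creator` class, its field names, and the sample authors are assumptions made up for this illustration; they do not come from any particular metadata standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Creator:
    # Hypothetical record layout: one field per name part.
    first_name: str
    last_name: str


def sort_key(creator):
    # Alphabetize by last name, then first name, ignoring case.
    return (creator.last_name.casefold(), creator.first_name.casefold())


def display(creator, last_first=True):
    # The same stored data supports either display order.
    if last_first:
        return f"{creator.last_name}, {creator.first_name}"
    return f"{creator.first_name} {creator.last_name}"


creators = [
    Creator("Virginia", "Woolf"),
    Creator("Jorge Luis", "Borges"),
    Creator("Toni", "Morrison"),
]

ordered = sorted(creators, key=sort_key)
inverted = [display(c) for c in ordered]
direct = [display(c, last_first=False) for c in ordered]
```

Because the sort uses the fields rather than the displayed string, both `inverted` and `direct` list the creators in the same last-name order. If names were stored as a single free-text string, the sort order would depend on how each cataloger happened to type the name.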
Types of standards Elings and Waibel describe four types of metadata standards: Data structure (fields): MARC and EAD. Data content (values): AACR2 (RDA) and DACS. Data format: XML. Data exchange: Z39.50 and OAI. These are useful categories, but sometimes standards may straddle them. You could say, for example, that MARC reflects AACR2 and not the other way around. MARC defines data fields in a technical sense, but AACR2 defines the content with which the fields are populated and to some degree conceptually determines the MARC fields. In practice the two become functionally intertwined.
Multiple standards at work A cataloger uses AACR2 to determine: That a book’s title should be part of its description. The wording, spelling, capitalization, and punctuation of the title. The cataloger uses MARC to record the title information in a consistent form that computers can process.
Multiple standards at work Two computer networks can use Z39.50 to determine how to exchange their MARC catalog records. The result? A user at Library A can search Library B’s catalog and not discern a difference in the way that information is structured and presented. It just works.
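The division of labor between a content standard and a structure standard can be sketched in Python. This is a simplified in-memory model, not the MARC 21 exchange format and not the API of any MARC library. The `Field` class and the helper functions are hypothetical. The sketch relies only on the MARC convention that tag 245 holds the title statement, with the title proper in subfield `a`.

```python
from dataclasses import dataclass


@dataclass
class Field:
    # Simplified stand-in for a MARC variable field (an assumption for
    # illustration; real records also carry a leader and a directory).
    tag: str
    indicators: str
    subfields: list  # list of (code, value) pairs

    def render(self):
        # Human-readable view only, not a transmission format.
        parts = "".join(f"${code}{value}" for code, value in self.subfields)
        return f"{self.tag} {self.indicators} {parts}"


def title_field(title_proper):
    # The wording and capitalization of title_proper were already decided
    # by the cataloger under the content standard. The structure standard
    # only says where that value goes: tag 245, subfield a.
    return Field("245", "00", [("a", title_proper)])


def get_title(record):
    # Any system that knows the structure can find the title.
    for fld in record:
        if fld.tag == "245":
            for code, value in fld.subfields:
                if code == "a":
                    return value
    return None


record = [title_field("Hamlet")]
```

Here `get_title(record)` returns the title without knowing anything about how the cataloger chose its wording, which is the sense in which the two standards cooperate.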
Developing and adopting standards Organizations agree to adopt standards because the benefits of creating products or services that work together can be great. However, developing standards and forging that agreement can be a difficult process. Metadata content standards in particular can be complicated to use, and they leave plenty of room for interpretive flexibility.
Content standards: considerations Why are content standards so complicated? Because documents are various! Most content standards will try to implement a few basic guidelines supplemented by rules and options for special cases. Ideally, the basic guidelines will be based on clearly articulated goals and principles.
Example: RDA goals RDA has articulated a concrete set of descriptive goals and principles. A few goals: Enable description of any resource (not just printed materials). Align with the FRBR conceptual model (works, expressions, manifestations, items) and its objectives (finding, selecting, understanding, and so on). Create content descriptions that can be used in multiple encodings and displays. Retain backward compatibility with existing records.
Example: RDA Principles One principle is that descriptions should reflect “the resource’s representation of itself.” This is a longstanding principle in library cataloging: where possible, description = transcription. This can be linked to the objective of finding known items: the catalog description should match how the item is known to others, which is most likely from the item itself.
Example: RDA guidelines This principle of transcription underlies the basic guideline for RDA titles, which is that the “title proper” or primary title should come from the preferred source of information, which for books is the title page. While the wording comes from the title page, though, the capitalization and punctuation are standardized for all titles.
Example: RDA special cases What if... Some introductory words on the title page seem like they’re not really part of the title (e.g., Walt Disney Presents Sleeping Beauty)? The title is given in two languages (e.g., Canadian Literature/Litterature Canadienne)? There is a spelling mistake in the title? The document is a manifestation of a commonly known work but has a slightly different title than most manifestations (e.g., William Shakespeare’s Hamlet)? A subtitle appears under what seems to be the main title (e.g., Museum Informatics an introductory textbook)? The title is over one paragraph long?
Keeping standards relevant Standards are immediately out of date, of course. RDA has been in development since 2004, as part of a cooperative effort by U.S., U.K., Canadian, and Australian library associations. These are tremendous efforts! Particular institutions, such as the Library of Congress, will issue their own rules for interpreting the standards, which smaller organizations (such as the University of Texas) may or may not choose to adopt.
Your mission Complete your subject classification for next week: introduction, classified structure, alphabetical structure, and reflective essay. A few notes on assignments, based on the individual conferences, follow...
A few assignment notes Brevity is nice for concept labels, but it’s more important to specify the precise extent of the concept clearly. If you mean “taking pictures with a digital camera,” don’t use the label “digital camera.”
Equivalence If you’ve identified several synonymous terms for a concept, select one term for the label. You can mention the others in a usage note in the alphabetical structure. Example: Cockroaches. Usage note: “Water bugs” is a synonym for this term. Class documents that refer to water bugs here.
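One way to picture the equivalence relationship is as a lookup from every variant term to a single preferred label. The sketch below is only an illustration: the `entries` layout, the `use_for` key, and the function names are assumptions, not a required format for the assignment.

```python
# Hypothetical alphabetical-structure entries: each preferred label
# lists the synonyms that should lead users to it.
entries = {
    "Cockroaches": {
        "use_for": ["Water bugs"],
        "note": "Class documents that refer to water bugs here.",
    },
}


def build_lookup(entries):
    # Map every preferred label and every synonym (case-insensitively)
    # to the one preferred label.
    lookup = {}
    for label, entry in entries.items():
        lookup[label.casefold()] = label
        for variant in entry["use_for"]:
            lookup[variant.casefold()] = label
    return lookup


def preferred_label(term, lookup):
    # Returns None when the term is not in the classification at all.
    return lookup.get(term.casefold())


lookup = build_lookup(entries)
```

With this lookup, `preferred_label("water bugs", lookup)` and `preferred_label("Cockroaches", lookup)` both resolve to the label Cockroaches, so documents using either term are classed in one place.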
Non-subject concepts Don’t include document attributes that aren’t subjects, such as forms or genres (blogs, articles, books, diaries...). You are creating a representation of a subject that can be used to organize documents; you are not describing the types of documents in which users might be interested. Include in your classification: terms for concepts that relate to gardening, such as types of plants (grasses, cacti, shrubs). Do not include in your classification: Document types that list such plants (plant databases, seed catalogs). However, you might use your classification to categorize a cactus database with the Cacti concept...