Published by Ralf Malone. Modified over 9 years ago.
1
An introduction to metadata in digital projects. Jenn Riley, Metadata Librarian. L566, Fall 2006
2
10/17/06, L566 Fall 2006. Topics we’ll cover: Choosing descriptive metadata standards; Choosing controlled vocabularies; Using controlled vocabularies to enhance searching and browsing; Wrapping it all up
3
Choosing descriptive metadata standards
4
Descriptive metadata: Enables users to find relevant materials; Used by many different knowledge domains; Many potential representations; Controlled by: data structure standards, data content standards, syntax encoding schemes, vocabulary encoding schemes
5
Some data structure standards: Dublin Core (DC), Unqualified (simple) and Qualified; MAchine Readable Cataloging (MARC); MARC in XML (MARCXML); Metadata Object Description Schema (MODS)
6
How do I pick one? (1) Institution: Nature of holding institution; Resources available for metadata creation; What others in the community are doing; Formats supported by your delivery software. The standard: Purpose; Structure; Context; History
7
How do I pick one? (2) Materials: Genre; Format; Likely audiences; What metadata already exists for these materials. Project goals: Robustness needed for the given materials and users; Describing multiple versions; Mechanisms for providing relationships between records; Plan for interoperability, including repeatability of elements. More information on handout
8
Dublin Core (DC): 15-element set; National and international standard (2001: released as ANSI/NISO Z39.85; 2003: released as ISO 15836); Maintained by the Dublin Core Metadata Initiative (DCMI); Other players: DCMI Working Groups, DC Usage Board
9
DCMI mission: The Dublin Core Metadata Initiative provides simple standards to facilitate the finding, sharing and management of information. DCMI does this by: Developing and maintaining international standards for describing resources; Supporting a worldwide community of users and developers; Promoting widespread use of Dublin Core solutions
10
DC Principles: “Core” across all knowledge domains; No element required; All elements repeatable; 1:1 principle
11
DCMI Abstract Model: Released in 2005; “A reference model against which particular DC encoding guidelines can be compared”; Heavily influenced by RDF thinking; New XML and RDF encodings under development to conform to the abstract model; Two schools of thought on its development: it clarifies the model underlying the metadata standard, or it overly complicates a standard intended to be simple
12
DC encodings: HTML; XML; RDF; [Spreadsheets]; [Databases]
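As a rough illustration of the XML encoding listed above, the sketch below builds a few simple DC elements with Python's standard library. The `record` wrapper element, the `build_simple_dc` helper, and the sample values are assumptions made for this example, not part of the presentation; only the `http://purl.org/dc/elements/1.1/` namespace comes from the DC element set itself.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_simple_dc(fields):
    """Build an XML fragment from (element_name, value) pairs.

    A list of pairs is used instead of a dict because every
    DC element is optional and repeatable.
    """
    root = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Invented sample values, for illustration only.
record = build_simple_dc([
    ("title", "Street scene"),
    ("creator", "Example, Photographer"),
    ("subject", "Streets"),
    ("subject", "Automobiles"),
    ("date", "1941-06-14"),
])
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

The repeated `subject` element reflects the "all elements repeatable" principle; nothing in the encoding forces any element to be present.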
13
Content/value standards for DC: None required; Some elements recommend a content or value standard as a best practice: Relation, Source, Subject, Type, Coverage, Date, Format, Language, Identifier
14
Some limitations of simple DC: Can’t indicate a main title vs. other subordinate titles; No method for specifying creator roles; W3CDTF format can’t indicate date ranges or uncertainty; Can’t by itself provide robust record relationships
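The W3CDTF limitation can be seen with a small syntax check. The sketch below is an approximation written for this example: it tests only the shape of a value (year, year-month, full date, or date plus time with a time zone designator) and does not check that months or days are in range. The function name `is_w3cdtf` is an assumption.

```python
import re

# Shape of a W3CDTF value: YYYY, YYYY-MM, YYYY-MM-DD, or a full date
# followed by Thh:mm[:ss[.s]] and a required time zone designator.
W3CDTF = re.compile(
    r"^\d{4}"
    r"(-\d{2}"
    r"(-\d{2}"
    r"(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?"
    r")?)?$"
)


def is_w3cdtf(value):
    """Return True if the value has a W3CDTF shape (syntax only)."""
    return W3CDTF.match(value) is not None


for candidate in ["1941", "1941-06-14", "1938-1969", "1938/1969", "ca. 1941", "1941?"]:
    print(candidate, is_w3cdtf(candidate))
```

Single points in time pass, while a range such as `1938-1969` or an uncertain date such as `1941?` has no valid form and fails the check.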
15
Good times to use DC: Cross-collection searching; Cross-domain discovery; Metadata sharing; Describing some types of simple resources; Metadata creation by novices
16
Comparison of standards (filled in so far: DC)
Sample links: DC [record]; QDC [record] [collection]; MARC [record] [collection]; MARCXML [record]; MODS [record] [collection]
Record format: DC: XML, RDF, (X)HTML
Field labels: DC: Text
Reliance on AACR: DC: None
Common method of creation: DC: By novices, by specialists, and by derivation
17
Qualified Dublin Core (QDC): Adds some increased specificity to Unqualified Dublin Core; Same governance structure as DC; Same encodings as DC; Same content/value standards as DC; Listed in DCMI Terms; Additional principles: Extensibility, Dumb-down principle
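The dumb-down principle can be sketched as a mapping from element refinements back to the simple DC elements they refine. The refinement names below (for example `alternative` refining `title`, or `created` refining `date`) are taken from DCMI Metadata Terms; the `dumb_down` function and the choice to drop anything that is not one of the 15 simple elements are assumptions made for this example.

```python
# A few QDC element refinements and the simple DC element each refines.
REFINES = {
    "alternative": "title",
    "abstract": "description",
    "tableOfContents": "description",
    "created": "date",
    "issued": "date",
    "modified": "date",
    "extent": "format",
    "medium": "format",
    "isPartOf": "relation",
    "spatial": "coverage",
    "temporal": "coverage",
}

SIMPLE_DC = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dumb_down(qdc_fields):
    """Reduce (name, value) pairs from QDC to simple DC.

    Refinements fall back to the element they refine. Names that are
    not simple DC elements are dropped (an assumption of this sketch).
    """
    simple = []
    for name, value in qdc_fields:
        element = REFINES.get(name, name)
        if element in SIMPLE_DC:
            simple.append((element, value))
    return simple


qdc = [("title", "Main title"), ("alternative", "Other title"),
       ("created", "1941"), ("audience", "students")]
print(dumb_down(qdc))
```

The result is still a usable simple DC description, but the distinction between the main and alternative title is lost, which is the trade-off the principle accepts.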
18
Types of DC qualifiers: Additional elements; Element refinements; Encoding schemes (vocabulary encoding schemes, syntax encoding schemes)
19
DC qualifier status: Recommended; Conforming; Obsolete; Registered
20
Limitations of QDC: Widely misunderstood; No method for specifying creator roles; W3CDTF format can’t indicate date ranges or uncertainty; Split across 3 XML schemas
21
Best times to use QDC: More specificity needed than simple DC, but not a fundamentally different approach to description; Want to share DC with others, but need a few extensions for your local environment; Describing some types of simple resources; Metadata creation by novices
22
Comparison of standards (filled in so far: DC, QDC)
Sample links: DC [record]; QDC [record] [collection]; MARC [record] [collection]; MARCXML [record]; MODS [record] [collection]
Record format: DC: XML, RDF, (X)HTML; QDC: XML, RDF, (X)HTML
Field labels: DC and QDC: Text
Reliance on AACR: DC and QDC: None
Common method of creation: DC and QDC: By novices, by specialists, and by derivation
23
MAchine Readable Cataloging (MARC): Format for the records in IUCAT, WorldCat and other library catalogs; Used for library metadata since the 1960s; Adopted as national standard in 1971; Adopted as international standard in 1973; Maintained by: Network Development and MARC Standards Office at the Library of Congress; Standards and the Support Office at the National Library of Canada
24
More about MARC: Actually a family of MARC standards throughout the world; U.S. & Canada use MARC21; MARC Bibliographic is for descriptive metadata; Structured as a binary interchange format (ANSI/NISO Z39.2, ISO 2709); Field names: numeric fields, alphabetic subfields
25
Content/value standards for MARC: None required by the format itself; But US record creation practice relies heavily on: AACR2r, ISBD, LCNAF, LCSH
26
Limitations of MARC: Use of all its potential is time-consuming; OPACs don’t make full use of all possible data; OPACs virtually the only systems to use MARC data; Requires highly-trained staff to create; Local practice differs greatly
27
Good times to use MARC: Integration with other records in OPAC; Resources are like those traditionally found in library catalogs; Maximum compatibility with other libraries is needed; Have expert catalogers for metadata creation
28
Comparison of standards (filled in so far: DC, QDC, MARC)
Sample links: DC [record]; QDC [record] [collection]; MARC [record] [collection]; MARCXML [record]; MODS [record] [collection]
Record format: DC: XML, RDF, (X)HTML; QDC: XML, RDF, (X)HTML; MARC: ISO 2709 [ANSI Z39.2]
Field labels: DC and QDC: Text; MARC: Numeric
Reliance on AACR: DC and QDC: None; MARC: Strong
Common method of creation: DC and QDC: By novices, by specialists, and by derivation; MARC: By specialists
29
MARC in XML (MARCXML): Copies the exact structure of MARC21 in an XML syntax (numeric fields, alphabetic subfields); Implicit assumption that content/value standards are the same as in MARC
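To make the "numeric fields, alphabetic subfields" structure concrete, the sketch below builds one MARCXML data field with Python's standard library. The element and attribute names (`record`, `datafield`, `subfield`, `tag`, `ind1`, `ind2`, `code`) and the namespace follow the MARCXML schema; the `datafield` helper and the sample title are invented for this example.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", MARC_NS)


def datafield(parent, tag, ind1, ind2, subfields):
    """Append a MARCXML datafield with its subfields to the parent."""
    field = ET.SubElement(
        parent, f"{{{MARC_NS}}}datafield",
        {"tag": tag, "ind1": ind1, "ind2": ind2},
    )
    for code, value in subfields:
        sub = ET.SubElement(field, f"{{{MARC_NS}}}subfield", {"code": code})
        sub.text = value
    return field


record = ET.Element(f"{{{MARC_NS}}}record")
# Field 245 is the title statement; the values are invented samples.
title_field = datafield(record, "245", "1", "0", [
    ("a", "Example title :"),
    ("b", "an example subtitle /"),
    ("c", "by an example author."),
])
marcxml = ET.tostring(record, encoding="unicode")
print(marcxml)
```

Even this single field shows why the syntax is considered verbose and unsuited to direct data entry: the meaning sits in numeric tags and one-letter codes rather than in readable element names.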
30
Limitations of MARCXML: Not appropriate for direct data entry; Extremely verbose syntax; Full content validation requires tools external to XML Schema conformance
31
Good times to use MARCXML: As a transition format between a MARC record and another XML-encoded metadata format; Materials lend themselves to library-type description; Need more robustness than DC offers; Want XML representation to store within larger digital object but need lossless conversion to MARC
32
Comparison of standards (filled in so far: DC, QDC, MARC, MARCXML)
Sample links: DC [record]; QDC [record] [collection]; MARC [record] [collection]; MARCXML [record]; MODS [record] [collection]
Record format: DC: XML, RDF, (X)HTML; QDC: XML, RDF, (X)HTML; MARC: ISO 2709 [ANSI Z39.2]; MARCXML: XML
Field labels: DC and QDC: Text; MARC and MARCXML: Numeric
Reliance on AACR: DC and QDC: None; MARC and MARCXML: Strong
Common method of creation: DC and QDC: By novices, by specialists, and by derivation; MARC: By specialists; MARCXML: By derivation
33
Metadata Object Description Schema (MODS): Developed and managed by the Library of Congress Network Development and MARC Standards Office; First released for trial use June 2002; MODS 3.2 released June 2006; “Schema for a bibliographic element set that may be used for a variety of purposes, and particularly for library applications.”
34
Differences between MODS and MARC: MODS is “MARC-like” but intended to be simpler; Textual tag names; Encoded in XML; Some specific changes: some regrouping of elements, removes some elements, adds some elements
35
Content/value standards for MODS: Some elements indicate a given content/value standard should be used; Generally follows MARC/AACR2/ISBD conventions; But not all enforced by the MODS XML schema; Authority attribute available on some elements
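The authority attribute mentioned above can be illustrated with a short MODS fragment. In the sketch below, the `subject`, `topic`, and `titleInfo` elements, the `authority` attribute, and the namespace are from MODS; the sample record and the `subjects_by_authority` helper are assumptions made for this example. The second subject deliberately carries no authority, since a record can be schema-valid without one.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"

# Invented sample record: one controlled subject, one uncontrolled.
SAMPLE = f"""<mods xmlns="{MODS_NS}">
  <titleInfo><title>Example title</title></titleInfo>
  <subject authority="lcsh"><topic>Streets</topic></subject>
  <subject><topic>old cars</topic></subject>
</mods>"""


def subjects_by_authority(mods_xml):
    """Return (topic, authority) pairs; authority is None when absent."""
    root = ET.fromstring(mods_xml)
    pairs = []
    for subject in root.findall(f"{{{MODS_NS}}}subject"):
        topic = subject.findtext(f"{{{MODS_NS}}}topic")
        pairs.append((topic, subject.get("authority")))
    return pairs


print(subjects_by_authority(SAMPLE))
```

A check like this is one way a project might find which values claim a controlled vocabulary; whether the value really appears in that vocabulary still has to be verified outside the XML schema.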
36
Limitations of MODS: No lossless round-trip conversion from and to MARC; Still largely implemented by library community only; Some semantics of MARC lost; Format still growing to meet the needs of the digital library community
37
Good times to use MODS: Materials lend themselves to library-type description; Want to reach both library and non-library audiences; Need more robustness than DC offers; Want XML representation to store within larger digital object
38
Comparison of standards (complete)
Sample links: DC [record]; QDC [record] [collection]; MARC [record] [collection]; MARCXML [record]; MODS [record] [collection]
Record format: DC: XML, RDF, (X)HTML; QDC: XML, RDF, (X)HTML; MARC: ISO 2709 [ANSI Z39.2]; MARCXML and MODS: XML
Field labels: DC and QDC: Text; MARC and MARCXML: Numeric; MODS: Text
Reliance on AACR: DC and QDC: None; MARC and MARCXML: Strong; MODS: Implied
Common method of creation: DC and QDC: By novices, by specialists, and by derivation; MARC: By specialists; MARCXML: By derivation; MODS: By specialists and by derivation
39
Picking a format: Consider all options; Match format to the types of discovery you want to support; Your choice has to fit in your larger technological infrastructure; Realize the constraints you’re operating under (or, expand infrastructure!); Don’t have to choose just one, can use several for different purposes
40
Mapping between metadata formats: Also called “crosswalking”; To create “views” of metadata for specific purposes; Mapping from robust format to more general format is common; Mapping from general format to more robust format is ineffective
41
Types of mapping logic: Mapping the complete contents of one field to another; Splitting multiple values in a single local field into multiple fields in the target schema; Translating anomalous local practices into a more generally useful value; Splitting data in one field into two or more fields; Transforming data values; Boilerplate values to include in output schema
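The six kinds of mapping logic listed above can be sketched as a small crosswalk from a local record to simple DC. Everything about the local record is hypothetical: the field names (`caption`, `subjects`, `kind`, `place_and_date`), the ` / ` separator, the date layout, and the boilerplate publisher are assumptions chosen to exercise each rule. `StillImage` is a term from the DCMI Type Vocabulary.

```python
from datetime import datetime

# Hypothetical local shorthand translated to a DCMI Type term.
LOCAL_TYPE_TO_DCMI = {"slide": "StillImage", "col. slide": "StillImage"}

# Hypothetical boilerplate added to every output record.
BOILERPLATE = [("publisher", "Example University Library")]


def crosswalk(local):
    """Map a hypothetical local record (a dict) to simple DC pairs."""
    dc = []
    # 1. Complete contents of one field mapped to another.
    dc.append(("title", local["caption"]))
    # 2. Multiple values in one local field split into repeated fields.
    for term in local["subjects"].split(";"):
        if term.strip():
            dc.append(("subject", term.strip()))
    # 3. Anomalous local practice translated into a general value.
    dc.append(("type", LOCAL_TYPE_TO_DCMI.get(local["kind"], local["kind"])))
    # 4. Data in one field split into two target fields.
    place, _, taken = local["place_and_date"].partition(" / ")
    dc.append(("coverage", place))
    # 5. Data value transformed (MM/DD/YYYY to YYYY-MM-DD).
    iso_date = datetime.strptime(taken, "%m/%d/%Y").strftime("%Y-%m-%d")
    dc.append(("date", iso_date))
    # 6. Boilerplate values included in the output.
    dc.extend(BOILERPLATE)
    return dc


local_record = {
    "caption": "Street scene",
    "subjects": "Streets; Automobiles",
    "kind": "col. slide",
    "place_and_date": "Chicago / 06/14/1941",
}
mapped = crosswalk(local_record)
for element, value in mapped:
    print(element, "=", value)
```

The mapping runs from the richer local structure to the more general one, the direction the previous slide describes as common; going the other way would require information the simple DC record no longer holds.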
42
Common mapping pitfalls: Cramming in too much information; Leaving in trailing punctuation; Missing context of records; Meaningless placeholder data; ALWAYS remember the purpose of the metadata you are creating!
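Two of these pitfalls, trailing punctuation and placeholder data, lend themselves to a simple cleanup step during mapping. The sketch below is one possible approach, not a rule from the presentation: the placeholder list and the set of stripped characters are assumptions, and trailing periods are left alone because they may belong to an abbreviation.

```python
# Hypothetical list of values that carry no meaning once exported.
PLACEHOLDERS = {"", "unknown", "n/a", "none", "[s.n.]", "tbd"}


def clean_value(value):
    """Strip trailing ISBD-style punctuation; drop placeholder values.

    Returns the cleaned string, or None when the value is a placeholder.
    """
    cleaned = value.strip().rstrip(" /:;,")
    if cleaned.lower() in PLACEHOLDERS:
        return None
    return cleaned


for raw in ["Example title /", "Streets ;", " unknown ", "Acme Inc."]:
    print(repr(raw), "->", repr(clean_value(raw)))
```

Values that come back as `None` would simply be omitted from the output record, which fits simple DC since no element is required.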
43
No, really, which one do I pick? It depends. Sorry. Be as robust as you can afford; Plan for future uses of the metadata you create; Leverage existing expertise as much as possible; Focus on content and value standards as much as possible
44
More information: Dublin Core (DC Element Set version 1.1; DCMI Metadata Terms); MODS; MARC; MARCXML
45
Break time!
46
Choosing controlled vocabularies
47
Some characteristics of CVs: Also known as “vocabulary encoding schemes”; Enumerated lists of all possible choices for a field value; Often organized into a syndetic structure; Usually intended to be human-readable
48
CVs in libraries: Many library CVs grow constantly with catalogers contributing new terms; Many library CVs use content standards to dictate the form of headings; Fields that use CVs are said to be under “authority control”
49
Traditional uses of CVs in library catalog records: Collocation; Disambiguation; Interoperability; BROWSING! (Although this isn’t used much in libraries…)
50
Other considerations: Human cataloging using CVs is expensive; Developing and maintaining CVs is expensive; Current library systems usually rely on the same string being present in all records rather than true relational structures linking records to CV terms
51
When a controlled vocabulary is useful: User browsing of a small number of categories each with a large number of members; When many different things have the same label; When recall is a priority for a given access point
52
Some common fields using CVs: Names; Places; “Subjects”
53
Names: Seeking works by or about a certain individual is frequent; Individuals are often known by many different names; Many different individuals have the same name; Name authority lists often create uniqueness by adding qualifiers; Some example vocabularies: Library of Congress Name Authority File (LCNAF), Getty Union List of Artists’ Names (ULAN)
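Collocation and disambiguation of names can be sketched with a tiny, invented authority list. The headings, variants, and identifiers below are made up for this example and are not LCNAF or ULAN data. Each record is linked to a term identifier rather than to a display string, which is one way to approximate the relational structure that string-matching systems lack.

```python
# Invented authority entries; the date qualifiers make headings unique.
AUTHORITY = {
    "n1": {"heading": "Smith, John, 1900-1975",
           "variants": ["Smith, J.", "John Smith"]},
    "n2": {"heading": "Smith, John, 1950-",
           "variants": ["Smith, John (Painter)"]},
}


def build_lookup(authority):
    """Map every known form of a name (case-insensitive) to its term id."""
    lookup = {}
    for term_id, entry in authority.items():
        for form in [entry["heading"], *entry["variants"]]:
            lookup[form.casefold()] = term_id
    return lookup


def collocate(records, lookup):
    """Group record ids by authority term id; None collects unmatched names."""
    groups = {}
    for record_id, name in records:
        term_id = lookup.get(name.casefold())
        groups.setdefault(term_id, []).append(record_id)
    return groups


records = [("r1", "Smith, J."), ("r2", "John Smith"),
           ("r3", "Smith, John (Painter)"), ("r4", "Smyth, Jon")]
groups = collocate(records, build_lookup(AUTHORITY))
print(groups)
```

Records r1 and r2 are brought together despite different name forms, r3 is kept apart from them despite the shared name, and r4 is left for a cataloger to resolve.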
54
Places: Common in libraries to control place names in subjects, but not publication places; Many different places with the same name; Often organized hierarchically; Commonly used vocabularies: Library of Congress Subject Headings (LCSH), Getty Thesaurus of Geographic Names (TGN), GEONet Names Server
55
“Subjects”: Libraries traditionally group topic, location, genre, form, time period and other related concepts all under “subject”; Often organized into a rich syndetic structure; General rule is to apply the most specific heading applicable; Involves subjective judgment on the part of the individual assigning the heading
56
Deciding which fields to place under authority control: Consider your budgetary constraints; Learn about the functionalities possible in your system; Identify appropriate vocabularies that meet defined needs; Develop a clear plan for how the fields with controlled values will be used
57
Using controlled vocabularies to enhance searching and browsing
58
Case Study: Cushman Collection. Funded with an Institute of Museum & Library Services (IMLS) grant; ~15,000 color slides taken between 1938 and 1969; Cushman provided a significant amount of description; Additional metadata created to enhance genre, subject and geographic access
59
Metadata for the Cushman Collection: Cushman’s description; Dates; Location; Names; TGM I - LC Thesaurus for Graphic Materials: Subject Terms; TGM II - LC Thesaurus for Graphic Materials: Genre & Physical Characteristics; TGN - Getty Thesaurus of Geographic Names; We wanted to use this high-quality metadata to improve on past search systems
60
TGM I: Subject Terms, Strengths and Weaknesses. Strengths include: Pre-defined relationships between concepts; Some lead-in vocabulary. Weaknesses include: Syndetic relationship lacking for new terms; Language not user-friendly; Not enough lead-in vocabulary; Form and number of top-level categories not useful for a browse structure
61
User studies performed. Two types: Group walkthroughs of prototypes; Task scenario study. Some functionality suggested by the studies: Refinement while searching; Search suggestions; Faceted browsing; Browsing on subject terms at all levels; CV interaction
62
Browsing Image Collections. Research shows: Browsing is exploratory (Bawden); Guided, flexible browsing in context works (Flamenco and SI Art Image Browser projects). Our usability studies show: Structure is important; Contents should be easily exposed; Flexible and combinatorial browsing is desired; Browsing cultivates searching
63
Searching Image Collections. Research shows: Using thesaurus structure helps searching (Greenberg); Automatic expansion of synonyms and narrower terms; User-initiated expansion of broader and related terms. Our usability studies show: Referencing an A-Z list with no lead-in terms for searching is NOT helpful at all; Concerns about word choice; Iterative reformulation of queries in context is desired
64
Cushman Specifications: Browsing. Date; Genre; Subjects (hierarchical), with retrieval of all records with narrower terms; Location (hierarchical); Combination of categories
65
Cushman Specifications: Searching. Integrated search against BOTH “free-text” descriptions and thesaurus; Mapping from lead-in vocabulary; Retrieval of all records with narrower terms; User-initiated broadening and narrowing
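The search specifications above can be sketched in a few lines. This is not the Cushman system's implementation: the thesaurus fragment, lead-in terms, records, and function names are all invented for illustration. The sketch maps a lead-in term to its preferred term, expands that term to all of its narrower terms, and searches both the controlled subjects and the free-text description.

```python
# Invented thesaurus fragment: preferred term -> narrower terms.
NARROWER = {
    "Vehicles": ["Automobiles", "Trucks"],
    "Automobiles": ["Convertible automobiles"],
}

# Invented lead-in vocabulary: user wording -> preferred term.
LEAD_IN = {"cars": "Automobiles", "autos": "Automobiles"}


def expand(term):
    """Return the term plus all narrower terms, at every level."""
    seen, stack = [], [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(NARROWER.get(current, []))
    return seen


def search(query, records):
    """Match the query against thesaurus terms and free-text descriptions."""
    preferred = LEAD_IN.get(query.lower(), query)
    terms = set(expand(preferred))
    hits = []
    for record in records:
        in_subjects = bool(terms & set(record["subjects"]))
        in_text = query.lower() in record["description"].lower()
        if in_subjects or in_text:
            hits.append(record["id"])
    return hits


RECORDS = [
    {"id": 1, "description": "Red convertible parked downtown",
     "subjects": ["Convertible automobiles"]},
    {"id": 2, "description": "Delivery truck", "subjects": ["Trucks"]},
    {"id": 3, "description": "Old cars on Main Street", "subjects": []},
    {"id": 4, "description": "Barn", "subjects": ["Barns"]},
]

print(search("cars", RECORDS))
print(search("Vehicles", RECORDS))
```

A query for "cars" finds record 1 only through the thesaurus (lead-in term, then narrower term) and record 3 only through the free text, which is the case for searching both together.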
66
Wrapping it all up
67
What next? After choosing metadata standards and controlled vocabularies: Figure out where metadata creation fits in the overall workflow; Write metadata creation guidelines; Design and implement a metadata creation process
68
And there’s more. Other types of metadata: Content markup; Technical metadata; Rights metadata; Preservation metadata; Structural metadata. Specialized metadata standards. When to create a local metadata format
69
In a grant proposal (1): Give specific information on all the decisions you’ve made: Metadata standards; Controlled vocabularies; Metadata creation workflow; Discovery functionality the metadata will support. Describe what metadata already exists for these materials
70
In a grant proposal (2): Indicate who will do the metadata creation work; Give reasonable cost estimates; The more planning you do, the more likely you are to: Receive funding; Complete the project on schedule; Complete the project within your budget
71
That’s all for today! jenlrile@indiana.edu These presentation slides: Handout: