
1 An introduction to metadata in digital projects
  Jenn Riley, Metadata Librarian
  L566, Fall 2006

2 Topics we’ll cover (slide footer: 10/17/06, L566 Fall 2006)
  - Choosing descriptive metadata standards
  - Choosing controlled vocabularies
  - Using controlled vocabularies to enhance searching and browsing
  - Wrapping it all up

3 Choosing descriptive metadata standards

4 Descriptive metadata
  - Enables users to find relevant materials
  - Used by many different knowledge domains
  - Many potential representations
  - Controlled by
    - Data structure standards
    - Data content standards
    - Syntax encoding schemes
    - Vocabulary encoding schemes

5 Some data structure standards
  - Dublin Core (DC)
    - Unqualified (simple)
    - Qualified
  - MAchine Readable Cataloging (MARC)
  - MARC in XML (MARCXML)
  - Metadata Object Description Schema (MODS)

6 How do I pick one? (1)
  - Institution
    - Nature of holding institution
    - Resources available for metadata creation
    - What others in the community are doing
    - Formats supported by your delivery software
  - The standard
    - Purpose
    - Structure
    - Context
    - History

7 How do I pick one? (2)
  - Materials
    - Genre
    - Format
    - Likely audiences
    - What metadata already exists for these materials
  - Project goals
    - Robustness needed for the given materials and users
    - Describing multiple versions
    - Mechanisms for providing relationships between records
    - Plan for interoperability, including repeatability of elements
  - More information on handout

8 Dublin Core (DC)
  - 15-element set
  - National and international standard
    - 2001: Released as ANSI/NISO Z39.85
    - 2003: Released as ISO 15836
  - Maintained by the Dublin Core Metadata Initiative (DCMI)
  - Other players
    - DCMI Working Groups
    - DC Usage Board

9 DCMI mission
  - The Dublin Core Metadata Initiative provides simple standards to facilitate the finding, sharing and management of information.
  - DCMI does this by:
    - Developing and maintaining international standards for describing resources
    - Supporting a worldwide community of users and developers
    - Promoting widespread use of Dublin Core solutions

10 DC Principles
  - “Core” across all knowledge domains
  - No element required
  - All elements repeatable
  - 1:1 principle

11 DCMI Abstract Model
  - Released in 2005
  - “A reference model against which particular DC encoding guidelines can be compared”
  - Heavily influenced by RDF thinking
  - New XML and RDF encodings under development to conform to the abstract model
  - Two schools of thought on its development
    - Clarifies model underlying the metadata standard
    - Overly complicates a standard intended to be simple

12 DC encodings
  - HTML
  - XML
  - RDF
  - [Spreadsheets]
  - [Databases]
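
As a rough illustration of two of the encodings listed on this slide, the sketch below serializes one set of simple DC statements as XML and as HTML `<meta>` tags. The `record` wrapper element, the function names, and the sample values are assumptions made for this example and are not taken from the slides; only the DC element namespace and the `DC.` prefix convention for HTML are standard.

```python
import xml.etree.ElementTree as ET
from html import escape

DC_NS = "http://purl.org/dc/elements/1.1/"  # namespace of the 15-element set
ET.register_namespace("dc", DC_NS)


def simple_dc_record(fields):
    """Build an XML element holding simple DC statements.

    `fields` is a list of (element_name, value) pairs. A name may appear
    more than once (all elements are repeatable) or not at all (no
    element is required).
    """
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


def html_meta_tags(fields):
    """Render the same statements as HTML <meta> tags."""
    return [
        f'<meta name="DC.{name}" content="{escape(value, quote=True)}">'
        for name, value in fields
    ]


# Hypothetical sample description.
fields = [
    ("title", "Example photograph"),
    ("creator", "Doe, Jane"),
    ("subject", "Bridges"),
    ("subject", "Rivers"),
]
xml_text = ET.tostring(simple_dc_record(fields), encoding="unicode")
meta_tags = html_meta_tags(fields)
```

Both outputs carry the same statements; only the syntax differs, which is the point of separating the element set from its encodings.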

13 Content/value standards for DC
  - None required
  - Some elements recommend a content or value standard as a best practice
    - Relation
    - Source
    - Subject
    - Type
    - Coverage
    - Date
    - Format
    - Language
    - Identifier

14 Some limitations of simple DC
  - Can’t indicate a main title vs. other subordinate titles
  - No method for specifying creator roles
  - W3CDTF format can’t indicate date ranges or uncertainty
  - Can’t by itself provide robust record relationships
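
The W3CDTF limitation can be seen with a small shape check. This is a minimal sketch, assuming the profile's usual forms (year, year-month, full date, and date with time and time zone designator); the function name is hypothetical, and the pattern checks only the shape of a value, not whether the month or day is a real calendar value.

```python
import re

# Shape of a W3CDTF value: YYYY, YYYY-MM, YYYY-MM-DD, or a full date
# followed by a time (hh:mm, optional seconds and fraction) and a
# time zone designator (Z or +hh:mm / -hh:mm).
W3CDTF = re.compile(
    r"^\d{4}"
    r"(-\d{2}"
    r"(-\d{2}"
    r"(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?"
    r")?)?$"
)


def is_w3cdtf(value):
    """Return True when `value` has the shape of a W3CDTF date."""
    return W3CDTF.match(value) is not None
```

Single dates such as `1938`, `1938-09`, or `2006-10-17` pass, while a range such as `1938-1969` or an uncertain date such as `ca. 1950` has no valid form, so that information has to be dropped or carried outside the format.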

15 Good times to use DC
  - Cross-collection searching
  - Cross-domain discovery
  - Metadata sharing
  - Describing some types of simple resources
  - Metadata creation by novices

16 Formats compared: DC [record]; QDC [record] [collection]; MARC [record] [collection]; MARCXML [record]; MODS [record] [collection]
  - Record format: DC: XML, RDF, (X)HTML
  - Field labels: DC: Text
  - Reliance on AACR: DC: None
  - Common method of creation: DC: By novices, by specialists, and by derivation

17 Qualified Dublin Core (QDC)
  - Adds some increased specificity to Unqualified Dublin Core
  - Same governance structure as DC
  - Same encodings as DC
  - Same content/value standards as DC
  - Listed in DCMI Terms
  - Additional principles
    - Extensibility
    - Dumb-down principle

18 Types of DC qualifiers
  - Additional elements
  - Element refinements
  - Encoding schemes
    - Vocabulary encoding schemes
    - Syntax encoding schemes
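
Element refinements are what make the dumb-down principle workable: a consumer that does not understand a refinement can fall back to the element it refines and still get a valid, if less specific, statement. The sketch below illustrates the idea under stated assumptions: the `REFINES` table lists only a handful of refinements from DCMI Metadata Terms, and the statement layout (dicts with `element`, `value`, and an optional `scheme`) and the function name are hypothetical.

```python
# A partial table of element refinements and the element each refines.
REFINES = {
    "alternative": "title",
    "abstract": "description",
    "tableOfContents": "description",
    "created": "date",
    "issued": "date",
    "modified": "date",
    "extent": "format",
    "medium": "format",
    "spatial": "coverage",
    "temporal": "coverage",
    "isPartOf": "relation",
}


def dumb_down(statements):
    """Reduce qualified statements to simple DC (element, value) pairs.

    Refinements fall back to the element they refine, and any encoding
    scheme label is ignored, leaving only the plain value.
    """
    simple = []
    for statement in statements:
        element = REFINES.get(statement["element"], statement["element"])
        simple.append((element, statement["value"]))
    return simple


# Hypothetical qualified description.
qualified = [
    {"element": "title", "value": "Example photograph"},
    {"element": "alternative", "value": "Bridge at dusk"},
    {"element": "created", "value": "1940-09-03", "scheme": "W3CDTF"},
    {"element": "spatial", "value": "Indiana", "scheme": "TGN"},
]
simple = dumb_down(qualified)
```

After dumbing down, the main title and the alternative title are indistinguishable, which is the same limitation noted earlier for simple DC.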

19 DC qualifier status
  - Recommended
  - Conforming
  - Obsolete
  - Registered

20 Limitations of QDC
  - Widely misunderstood
  - No method for specifying creator roles
  - W3CDTF format can’t indicate date ranges or uncertainty
  - Split across 3 XML schemas

21 Best times to use QDC
  - More specificity needed than simple DC, but not a fundamentally different approach to description
  - Want to share DC with others, but need a few extensions for your local environment
  - Describing some types of simple resources
  - Metadata creation by novices

22 Formats compared: DC [record]; QDC [record] [collection]; MARC [record] [collection]; MARCXML [record]; MODS [record] [collection]
  - Record format: DC: XML, RDF, (X)HTML; QDC: XML, RDF, (X)HTML
  - Field labels: DC and QDC: Text
  - Reliance on AACR: DC and QDC: None
  - Common method of creation: DC and QDC: By novices, by specialists, and by derivation

23 MAchine Readable Cataloging (MARC)
  - Format for the records in IUCAT, WorldCat and other library catalogs
  - Used for library metadata since 1960s
    - Adopted as national standard in 1971
    - Adopted as international standard in 1973
  - Maintained by:
    - Network Development and MARC Standards Office at the Library of Congress
    - Standards and the Support Office at the National Library of Canada

24 More about MARC
  - Actually a family of MARC standards throughout the world
    - U.S. & Canada use MARC21
    - MARC Bibliographic is for descriptive metadata
  - Structured as a binary interchange format
    - ANSI/NISO Z39.2
    - ISO 2709
  - Field names
    - Numeric fields
    - Alphabetic subfields

25 Content/value standards for MARC
  - None required by the format itself
  - But US record creation practice relies heavily on:
    - AACR2r
    - ISBD
    - LCNAF
    - LCSH

26 Limitations of MARC
  - Use of all its potential is time-consuming
  - OPACs don’t make full use of all possible data
  - OPACs virtually the only systems to use MARC data
  - Requires highly-trained staff to create
  - Local practice differs greatly

27 Good times to use MARC
  - Integration with other records in OPAC
  - Resources are like those traditionally found in library catalogs
  - Maximum compatibility with other libraries is needed
  - Have expert catalogers for metadata creation

28 Formats compared: DC [record]; QDC [record] [collection]; MARC [record] [collection]; MARCXML [record]; MODS [record] [collection]
  - Record format: DC: XML, RDF, (X)HTML; QDC: XML, RDF, (X)HTML; MARC: ISO 2709 [ANSI Z39.2]
  - Field labels: DC and QDC: Text; MARC: Numeric
  - Reliance on AACR: DC and QDC: None; MARC: Strong
  - Common method of creation: DC and QDC: By novices, by specialists, and by derivation; MARC: By specialists

29 MARC in XML (MARCXML)
  - Copies the exact structure of MARC21 in an XML syntax
    - Numeric fields
    - Alphabetic subfields
  - Implicit assumption that content/value standards are the same as in MARC
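
To make the "numeric fields, alphabetic subfields" structure concrete, here is a minimal sketch that reads a small MARCXML fragment with Python's standard library. The record content is invented for illustration and the `subfields` helper is hypothetical; the namespace and the `datafield`/`subfield` element names follow the MARCXML schema.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"

# Invented sample record: a title field (245) and a subject field (650).
MARCXML = f"""<record xmlns="{MARC_NS}">
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Example title :</subfield>
    <subfield code="b">an example subtitle /</subfield>
    <subfield code="c">Jane Doe.</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Bridges</subfield>
    <subfield code="z">Indiana.</subfield>
  </datafield>
</record>"""


def subfields(record, tag):
    """Return one list of (code, value) pairs per datafield with `tag`."""
    ns = {"marc": MARC_NS}
    result = []
    for field in record.findall(f"marc:datafield[@tag='{tag}']", ns):
        pairs = [(sf.get("code"), sf.text) for sf in field.findall("marc:subfield", ns)]
        result.append(pairs)
    return result


record = ET.fromstring(MARCXML)
title_fields = subfields(record, "245")
```

The tags and codes carry no meaning on their own, so a reader still needs the MARC21 documentation to know that 245 is the title statement, and the amount of markup per value shows why the syntax is considered verbose.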

30 Limitations of MARCXML
  - Not appropriate for direct data entry
  - Extremely verbose syntax
  - Full content validation requires tools external to XML Schema conformance

31 Good times to use MARCXML
  - As a transition format between a MARC record and another XML-encoded metadata format
  - Materials lend themselves to library-type description
  - Need more robustness than DC offers
  - Want XML representation to store within larger digital object but need lossless conversion to MARC

32 Formats compared: DC [record]; QDC [record] [collection]; MARC [record] [collection]; MARCXML [record]; MODS [record] [collection]
  - Record format: DC: XML, RDF, (X)HTML; QDC: XML, RDF, (X)HTML; MARC: ISO 2709 [ANSI Z39.2]; MARCXML: XML
  - Field labels: DC and QDC: Text; MARC and MARCXML: Numeric
  - Reliance on AACR: DC and QDC: None; MARC and MARCXML: Strong
  - Common method of creation: DC and QDC: By novices, by specialists, and by derivation; MARC: By specialists; MARCXML: By derivation

33 Metadata Object Description Schema (MODS)
  - Developed and managed by the Library of Congress Network Development and MARC Standards Office
  - First released for trial use June 2002
  - MODS 3.2 released June 2006
  - “Schema for a bibliographic element set that may be used for a variety of purposes, and particularly for library applications.”

34 Differences between MODS and MARC
  - MODS is “MARC-like” but intended to be simpler
  - Textual tag names
  - Encoded in XML
  - Some specific changes
    - Some regrouping of elements
    - Removes some elements
    - Adds some elements
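
The difference between numeric and textual tag names can be sketched by rewriting a MARC title field as MODS. This is an illustration only, assuming the usual correspondence of 245 $a to `title` and 245 $b to `subTitle`; the input layout (a dict of subfield codes), the function name, and the sample values are hypothetical, and a real conversion would cover far more fields.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def mods_title(marc_245):
    """Build a MODS fragment from the subfields of a MARC 245 field.

    `marc_245` is a dict mapping subfield codes to values.
    """
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    for code, name in (("a", "title"), ("b", "subTitle")):
        if code in marc_245:
            element = ET.SubElement(title_info, f"{{{MODS_NS}}}{name}")
            # Drop the trailing ISBD punctuation that MARC carries in the data.
            element.text = marc_245[code].rstrip(" :/")
    return mods


mods = mods_title({"a": "Example title :", "b": "an example subtitle /"})
mods_text = ET.tostring(mods, encoding="unicode")
```

The resulting element names (`titleInfo`, `title`, `subTitle`) can be read without a lookup table, which is the practical meaning of "textual tag names".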

35 Content/value standards for MODS
  - Some elements indicate a given content/value standard should be used
    - Generally follows MARC/AACR2/ISBD conventions
    - But not all enforced by the MODS XML schema
  - Authority attribute available on some elements

36 Limitations of MODS
  - No lossless round-trip conversion from and to MARC
  - Still largely implemented by library community only
  - Some semantics of MARC lost
  - Format still growing to meet the needs of the digital library community

37 Good times to use MODS
  - Materials lend themselves to library-type description
  - Want to reach both library and non-library audiences
  - Need more robustness than DC offers
  - Want XML representation to store within larger digital object

38 Formats compared: DC [record]; QDC [record] [collection]; MARC [record] [collection]; MARCXML [record]; MODS [record] [collection]
  - Record format: DC: XML, RDF, (X)HTML; QDC: XML, RDF, (X)HTML; MARC: ISO 2709 [ANSI Z39.2]; MARCXML and MODS: XML
  - Field labels: DC and QDC: Text; MARC and MARCXML: Numeric; MODS: Text
  - Reliance on AACR: DC and QDC: None; MARC and MARCXML: Strong; MODS: Implied
  - Common method of creation: DC and QDC: By novices, by specialists, and by derivation; MARC: By specialists; MARCXML: By derivation; MODS: By specialists and by derivation

39 Picking a format
  - Consider all options
  - Match format to the types of discovery you want to support
  - Your choice has to fit in your larger technological infrastructure
    - Realize the constraints you’re operating under
    - Or, expand infrastructure!
  - Don’t have to choose just one, can use several for different purposes

40 Mapping between metadata formats
  - Also called “crosswalking”
  - To create “views” of metadata for specific purposes
  - Mapping from robust format to more general format is common
  - Mapping from general format to more robust format is ineffective

41 Types of mapping logic
  - Mapping the complete contents of one field to another
  - Splitting multiple values in a single local field into multiple fields in the target schema
  - Translating anomalous local practices into a more generally useful value
  - Splitting data in one field into two or more fields
  - Transforming data values
  - Boilerplate values to include in output schema
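
Several of these kinds of mapping logic can be shown in one small crosswalk. The sketch below maps a made-up local record to simple DC pairs; every local field name (`caption`, `keywords`, `type_code`, `date_taken`), the local type codes, and the publisher string are assumptions invented for the example. The target values `StillImage` and `Sound` come from the DCMI Type Vocabulary.

```python
def crosswalk_to_dc(local):
    """Map a hypothetical local record (a dict) to simple DC pairs."""
    dc = []

    # Map the complete contents of one field to another.
    if local.get("caption"):
        dc.append(("title", local["caption"]))

    # Split multiple values held in one local field into repeated fields.
    for term in local.get("keywords", "").split(";"):
        if term.strip():
            dc.append(("subject", term.strip()))

    # Translate an anomalous local practice into a generally useful value.
    local_types = {"SL": "StillImage", "AU": "Sound"}
    if local.get("type_code") in local_types:
        dc.append(("type", local_types[local["type_code"]]))

    # Transform a data value: local M/D/YYYY becomes YYYY-MM-DD.
    if local.get("date_taken"):
        month, day, year = local["date_taken"].split("/")
        dc.append(("date", f"{year}-{month.zfill(2)}-{day.zfill(2)}"))

    # Add a boilerplate value that applies to every output record.
    dc.append(("publisher", "Example University Libraries"))
    return dc


dc_pairs = crosswalk_to_dc({
    "caption": "Bridge at dusk",
    "keywords": "Bridges; Rivers",
    "type_code": "SL",
    "date_taken": "9/3/1940",
})
```

Because the mapping runs from a more specific local format to a more general one, it loses detail but never has to guess, which is why this direction of crosswalking is the common one.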

42 Common mapping pitfalls
  - Cramming in too much information
  - Leaving in trailing punctuation
  - Missing context of records
  - Meaningless placeholder data
  - ALWAYS remember the purpose of the metadata you are creating!
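
Two of these pitfalls, trailing punctuation and placeholder data, lend themselves to a simple cleanup step during mapping. The following is a sketch under stated assumptions: the list of placeholder strings and the set of punctuation characters are illustrative choices, not rules from the slides, and trailing periods are deliberately left alone because they may belong to an abbreviation.

```python
# Values that carry no information and should not be mapped (illustrative list).
PLACEHOLDERS = {"", "unknown", "n/a", "none", "[none]", "-"}


def clean_value(value):
    """Return a cleaned value, or None when it is only a placeholder."""
    # Remove surrounding whitespace, then trailing separators such as
    # the " :" and " /" left over from ISBD-style punctuation.
    cleaned = value.strip().rstrip(" ,;:/").strip()
    if cleaned.lower() in PLACEHOLDERS:
        return None
    return cleaned


def clean_pairs(pairs):
    """Clean (element, value) pairs and drop the ones with no real value."""
    result = []
    for element, value in pairs:
        cleaned = clean_value(value)
        if cleaned is not None:
            result.append((element, cleaned))
    return result
```

A call such as `clean_pairs([("title", "Example title /"), ("creator", "unknown")])` keeps only the title, without its trailing slash.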

43 No, really, which one do I pick?
  - It depends. Sorry.
  - Be as robust as you can afford
  - Plan for future uses of the metadata you create
  - Leverage existing expertise as much as possible
  - Focus on content and value standards as much as possible

44 More information
  - Dublin Core
    - DC Element Set version 1.1
    - DCMI Metadata Terms
  - MODS
  - MARC
  - MARCXML

45 Break time!

46 Choosing controlled vocabularies

47 Some characteristics of CVs
  - Also known as “vocabulary encoding schemes”
  - Enumerated lists of all possible choices for a field value
  - Often organized into a syndetic structure
  - Usually intended to be human-readable

48 CVs in libraries
  - Many library CVs grow constantly with catalogers contributing new terms
  - Many library CVs use content standards to dictate the form of headings
  - Fields that use CVs are said to be under “authority control”

49 Traditional uses of CVs in library catalog records
  - Collocation
  - Disambiguation
  - Interoperability
  - BROWSING! (Although this isn’t used much in libraries…)

50 Other considerations
  - Human cataloging using CVs is expensive
  - Developing and maintaining CVs is expensive
  - Current library systems usually rely on the same string being present in all records rather than true relational structures linking records to CV terms
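
The contrast between string matching and a relational link to a vocabulary term can be sketched with two tiny data sets. Everything here is hypothetical: the record layout, the term identifier `n001`, the name, and the helper functions are invented for illustration and do not describe any particular library system.

```python
# String-based practice: each record repeats the heading text itself.
string_records = [
    {"id": 1, "creator": "Doe, Jane, 1900-1980"},
    {"id": 2, "creator": "Doe, Jane, 1900-1980"},
    {"id": 3, "creator": "Doe, Jane"},  # variant form of the same name
]

# Relational practice: records point at a term; the heading is stored once.
terms = {
    "n001": {"heading": "Doe, Jane, 1900-1980", "variants": ["Doe, Jane", "Doe, J."]},
}
linked_records = [
    {"id": 1, "creator_id": "n001"},
    {"id": 2, "creator_id": "n001"},
    {"id": 3, "creator_id": "n001"},
]


def find_by_string(records, heading):
    """Collocate records by exact string match on the heading."""
    return [r["id"] for r in records if r["creator"] == heading]


def lookup_term(terms, name):
    """Find the term whose heading or variant forms include `name`."""
    for term_id, term in terms.items():
        if name == term["heading"] or name in term["variants"]:
            return term_id
    return None


def find_by_term(records, term_id):
    """Collocate records by their link to a vocabulary term."""
    return [r["id"] for r in records if r["creator_id"] == term_id]
```

With string matching, record 3 is missed because its heading differs by a few characters; with linked terms, any known form of the name leads to the same three records, and a heading change needs to be made in only one place.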

51 When a controlled vocabulary is useful
  - User browsing of a small number of categories each with a large number of members
  - When many different things have the same label
  - When recall is a priority for a given access point

52 Some common fields using CVs
  - Names
  - Places
  - “Subjects”

53 Names
  - Seeking works by or about a certain individual is frequent
  - Individuals are often known by many different names
  - Many different individuals have the same name
  - Name authority lists often create uniqueness by adding qualifiers
  - Some example vocabularies:
    - Library of Congress Name Authority File (LCNAF)
    - Getty Union List of Artists’ Names (ULAN)

54 Places
  - Common in libraries to control place names in subjects, but not publication places
  - Many different places with the same name
  - Often organized hierarchically
  - Commonly used vocabularies:
    - Library of Congress Subject Headings (LCSH)
    - Getty Thesaurus of Geographic Names (TGN)
    - GEONet Names Server

55 “Subjects”
  - Libraries traditionally group topic, location, genre, form, time period and other related concepts all under “subject”
  - Often organized into a rich syndetic structure
  - General rule is to apply the most specific heading applicable
  - Involves subjective judgment on the part of the individual assigning the heading

56 Deciding which fields to place under authority control
  - Consider your budgetary restraints
  - Learn about the functionalities possible in your system
  - Identify appropriate vocabularies that meet defined needs
  - Develop a clear plan for how the fields with controlled values will be used

57 Using controlled vocabularies to enhance searching and browsing

58 Case Study: Cushman Collection
  - Funded with an Institute of Museum & Library Services (IMLS) grant
  - ~15,000 color slides taken between 1938-1969
  - Cushman provided a significant amount of description
  - Additional metadata created to enhance genre, subject and geographic access

59 Metadata for the Cushman Collection
  - Cushman’s description
    - Dates
    - Location
    - Names
  - TGM I – LC Thesaurus for Graphic Materials: Subject Terms
  - TGM II – LC Thesaurus for Graphic Materials: Genre & Physical Characteristics
  - TGN – Getty Thesaurus of Geographic Names
  - We wanted to use this high-quality metadata to improve on past search systems

60 TGM I: Subject Terms Strengths and Weaknesses
  - Strengths include:
    - Pre-defined relationships between concepts
    - Some lead-in vocabulary
  - Weaknesses include:
    - Syndetic relationship lacking for new terms
    - Language not user-friendly
    - Not enough lead-in vocabulary
    - Form and number of top-level categories not useful for a browse structure

61 User studies performed
  - Two types
    - Group walkthroughs of prototypes
    - Task scenario study
  - Some functionality suggested by the studies
    - Refinement while searching
    - Search suggestions
    - Faceted browsing
    - Browsing on subject terms at all levels
    - CV interaction

62 Browsing Image Collections
  - Research shows:
    - Browsing is exploratory (Bawden)
    - Guided, flexible browsing in context works (Flamenco and SI Art Image Browser projects)
  - Our usability studies show:
    - Structure is important
    - Contents should be easily exposed
    - Flexible and combinatorial browsing is desired
    - Browsing cultivates searching

63 Searching Image Collections
  - Research shows:
    - Using thesaurus structure helps searching (Greenberg)
      - Automatic expansion of synonyms and narrower terms
      - User-initiated expansion of broader and related terms
  - Our usability studies show:
    - Referencing an A-Z list with no lead-in terms for searching is NOT helpful at all
    - Concerns about word choice
    - Iterative reformulation of queries in context is desired

64 Cushman Specifications: Browsing
  - Date
  - Genre
  - Subjects (hierarchical)
    - Retrieval of all records with narrower terms
  - Location (hierarchical)
  - Combination of categories
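
Hierarchical browsing with retrieval of narrower terms, combined with a second category, might look roughly like the sketch below. It is not the Cushman system's implementation: the small hierarchy, the records, and the function names are invented for illustration, and the terms are not claimed to match the actual TGM hierarchy.

```python
# Hypothetical broader-to-narrower relationships.
NARROWER = {
    "Transportation": ["Bridges", "Vehicles"],
    "Vehicles": ["Automobiles", "Streetcars"],
}

# Hypothetical image records.
RECORDS = [
    {"id": 1, "subjects": ["Automobiles"], "genre": "Slides"},
    {"id": 2, "subjects": ["Bridges"], "genre": "Slides"},
    {"id": 3, "subjects": ["Streetcars"], "genre": "Portraits"},
    {"id": 4, "subjects": ["Gardens"], "genre": "Slides"},
]


def with_narrower(term, narrower=NARROWER):
    """Return `term` plus all of its narrower terms, at every level."""
    found, stack = [], [term]
    while stack:
        current = stack.pop()
        if current not in found:
            found.append(current)
            stack.extend(narrower.get(current, []))
    return found


def browse(records, subject=None, genre=None):
    """Return ids of records matching every category that was chosen.

    A subject matches when the record carries that term or any term
    narrower than it.
    """
    wanted = set(with_narrower(subject)) if subject else None
    hits = []
    for record in records:
        if wanted is not None and not wanted & set(record["subjects"]):
            continue
        if genre is not None and record["genre"] != genre:
            continue
        hits.append(record["id"])
    return hits
```

Browsing on `Vehicles` finds records indexed only under `Automobiles` or `Streetcars`, and adding a genre narrows the same result set rather than starting a new search.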

65 Cushman Specifications: Searching
  - Integrated search against BOTH “free-text” descriptions and thesaurus
  - Mapping from lead-in vocabulary
  - Retrieval of all records with narrower terms
  - User-initiated broadening and narrowing
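
The searching specifications can be sketched in the same spirit: a query is mapped from lead-in vocabulary to a preferred term, expanded with narrower terms, and then matched against both the thesaurus terms and the free-text description. This is an illustrative sketch only; the lead-in table, hierarchy, records, and function names are all hypothetical.

```python
# Hypothetical lead-in (non-preferred) terms and their preferred terms.
LEAD_IN = {"cars": "Automobiles", "autos": "Automobiles", "trolleys": "Streetcars"}

# Hypothetical broader-to-narrower relationships.
NARROWER = {"Vehicles": ["Automobiles", "Streetcars"]}

# Hypothetical records with a free-text description and thesaurus terms.
RECORDS = [
    {"id": 1, "description": "Red car parked on Main Street", "subjects": ["Automobiles"]},
    {"id": 2, "description": "Trolley at the depot", "subjects": ["Streetcars"]},
    {"id": 3, "description": "View of cars from the bridge", "subjects": ["Bridges"]},
    {"id": 4, "description": "Rose garden", "subjects": ["Gardens"]},
]


def expand_query(query):
    """Map a lead-in term to its preferred term, then add narrower terms."""
    terms = [LEAD_IN.get(query.lower(), query)]
    index = 0
    while index < len(terms):
        for narrower_term in NARROWER.get(terms[index], []):
            if narrower_term not in terms:
                terms.append(narrower_term)
        index += 1
    return terms


def search(records, query):
    """Search thesaurus terms and free-text descriptions together."""
    wanted = {term.lower() for term in expand_query(query)}
    text = query.lower()
    hits = []
    for record in records:
        in_thesaurus = any(s.lower() in wanted for s in record["subjects"])
        in_free_text = text in record["description"].lower()
        if in_thesaurus or in_free_text:
            hits.append(record["id"])
    return hits
```

A search for `cars` reaches record 1 through the thesaurus (via the lead-in mapping to `Automobiles`) and record 3 through its description, which neither source would have found alone.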

66 Wrapping it all up

67 What next?
  - After choosing metadata standards and controlled vocabularies
    - Figure out where metadata creation fits in the overall workflow
    - Write metadata creation guidelines
    - Design and implement a metadata creation process

68 And there’s more
  - Other types of metadata
    - Content markup
    - Technical metadata
    - Rights metadata
    - Preservation metadata
    - Structural metadata
  - Specialized metadata standards
  - When to create a local metadata format

69 In a grant proposal (1)
  - Give specific information on all the decisions you’ve made
    - Metadata standards
    - Controlled vocabularies
    - Metadata creation workflow
    - Discovery functionality the metadata will support
  - Describe what metadata already exists for these materials

70 In a grant proposal (2)
  - Indicate who will do the metadata creation work
  - Give reasonable cost estimates
  - The more planning you do, the more likely you are to
    - Receive funding
    - Complete the project on schedule
    - Complete the project within your budget

71 That’s all for today!
  - jenlrile@indiana.edu
  - These presentation slides:
  - Handout:

