An introduction to metadata in digital projects Jenn Riley Metadata Librarian L566 Fall 2006.

10/17/06, L566 Fall 2006

Topics we’ll cover
- Choosing descriptive metadata standards
- Choosing controlled vocabularies
- Using controlled vocabularies to enhance searching and browsing
- Wrapping it all up

Choosing descriptive metadata standards

Descriptive metadata
- Enables users to find relevant materials
- Used by many different knowledge domains
- Many potential representations
- Controlled by
  - Data structure standards
  - Data content standards
  - Syntax encoding schemes
  - Vocabulary encoding schemes

Some data structure standards
- Dublin Core (DC)
  - Unqualified (simple)
  - Qualified
- MAchine Readable Cataloging (MARC)
- MARC in XML (MARCXML)
- Metadata Object Description Schema (MODS)

How do I pick one? (1)
- Institution
  - Nature of holding institution
  - Resources available for metadata creation
  - What others in the community are doing
  - Formats supported by your delivery software
- The standard
  - Purpose
  - Structure
  - Context
  - History

How do I pick one? (2)
- Materials
  - Genre
  - Format
  - Likely audiences
  - What metadata already exists for these materials
- Project goals
  - Robustness needed for the given materials and users
  - Describing multiple versions
  - Mechanisms for providing relationships between records
  - Plan for interoperability, including repeatability of elements
- More information on handout

Dublin Core (DC)
- 15-element set
- National and international standard
  - 2001: Released as ANSI/NISO Z39.85
  - 2003: Released as ISO 15836
- Maintained by the Dublin Core Metadata Initiative (DCMI)
- Other players
  - DCMI Working Groups
  - DC Usage Board

DCMI mission
- The Dublin Core Metadata Initiative provides simple standards to facilitate the finding, sharing and management of information.
- DCMI does this by:
  - Developing and maintaining international standards for describing resources
  - Supporting a worldwide community of users and developers
  - Promoting widespread use of Dublin Core solutions

DC Principles
- “Core” across all knowledge domains
- No element required
- All elements repeatable
- 1:1 principle

DCMI Abstract Model
- Released in 2005
- “A reference model against which particular DC encoding guidelines can be compared”
- Heavily influenced by RDF thinking
- New XML and RDF encodings under development to conform to the abstract model
- Two schools of thought on its development
  - Clarifies model underlying the metadata standard
  - Overly complicates a standard intended to be simple

DC encodings
- HTML
- XML
- RDF
- [Spreadsheets]
- [Databases]
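As a rough illustration of the XML encoding, the sketch below builds a simple (unqualified) DC record with Python's standard library. The sample values and the `record` wrapper element are made up for this example; only the `http://purl.org/dc/elements/1.1/` namespace and the element names come from the DC Element Set. Fields are passed as a list of pairs rather than a dictionary because every DC element is repeatable.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_simple_dc(fields):
    """Build a simple DC record from (element, value) pairs."""
    root = ET.Element("record")  # hypothetical wrapper element, not part of DC
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Made-up sample values; note the repeated subject element.
record = build_simple_dc([
    ("title", "Street scene"),
    ("creator", "Example, Photographer"),
    ("subject", "Streets"),
    ("subject", "Automobiles"),
    ("date", "1952-06-14"),
])
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

No element is required, so an empty list of fields would still be a legal (if useless) record under the DC principles above.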

Content/value standards for DC
- None required
- Some elements recommend a content or value standard as a best practice
  - Relation
  - Source
  - Subject
  - Type
  - Coverage
  - Date
  - Format
  - Language
  - Identifier

Some limitations of simple DC
- Can’t indicate a main title vs. other subordinate titles
- No method for specifying creator roles
- W3CDTF format can’t indicate date ranges or uncertainty
- Can’t by itself provide robust record relationships
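The W3CDTF limitation is easy to see with a small syntax check. The sketch below is a hand-written regular expression covering the W3CDTF granularities (year, year-month, full date, and date with time and time zone); it checks syntax only and does not reject impossible dates such as February 30. The function name `is_w3cdtf` is hypothetical.

```python
import re

# Syntax-only check for W3CDTF values (an illustrative pattern, not an official one).
W3CDTF = re.compile(
    r"^\d{4}"                                  # YYYY
    r"(-(0[1-9]|1[0-2])"                       # -MM
    r"(-(0[1-9]|[12]\d|3[01])"                 # -DD
    r"(T([01]\d|2[0-3]):[0-5]\d"               # Thh:mm
    r"(:[0-5]\d(\.\d+)?)?"                     # optional :ss and fraction
    r"(Z|[+-]([01]\d|2[0-3]):[0-5]\d))?"       # time zone designator
    r")?)?$"
)


def is_w3cdtf(value):
    """Return True if value is syntactically a W3CDTF date."""
    return W3CDTF.match(value) is not None


# Single points in time pass; ranges and uncertain dates do not.
for value in ["1952", "1952-06-14", "1938/1969", "ca. 1952", "1952?"]:
    print(value, is_w3cdtf(value))
```

A range such as "1938/1969" or an approximate date such as "ca. 1952" has no W3CDTF form, so it must either be simplified or stored as free text.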

Good times to use DC
- Cross-collection searching
- Cross-domain discovery
- Metadata sharing
- Describing some types of simple resources
- Metadata creation by novices

Standard | Record format     | Field labels | Reliance on AACR | Common method of creation
DC       | XML, RDF, (X)HTML | Text         | None             | By novices, by specialists, and by derivation

Qualified Dublin Core (QDC)
- Adds some increased specificity to Unqualified Dublin Core
- Same governance structure as DC
- Same encodings as DC
- Same content/value standards as DC
- Listed in DCMI Terms
- Additional principles
  - Extensibility
  - Dumb-down principle

Types of DC qualifiers
- Additional elements
- Element refinements
- Encoding schemes
  - Vocabulary encoding schemes
  - Syntax encoding schemes
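The dumb-down principle says a consumer that does not understand a qualifier can ignore it and still get a usable simple DC value. A minimal sketch of that idea follows, assuming qualified statements are held as dictionaries with `element`, `value`, and an optional `scheme`. That record shape and the function name `dumb_down` are hypothetical; the refinement-to-element pairs listed are a small subset of those defined in DCMI Metadata Terms.

```python
# Element refinements mapped to the simple DC element they refine (partial list).
REFINES = {
    "alternative": "title",
    "abstract": "description",
    "tableOfContents": "description",
    "created": "date",
    "issued": "date",
    "extent": "format",
    "medium": "format",
    "isPartOf": "relation",
    "spatial": "coverage",
    "temporal": "coverage",
}


def dumb_down(statements):
    """Reduce qualified DC statements to simple DC (element, value) pairs."""
    simple = []
    for stmt in statements:
        # Replace a refinement with its parent element; keep other elements as-is.
        element = REFINES.get(stmt["element"], stmt["element"])
        # The encoding scheme is dropped; the value string is kept unchanged.
        simple.append((element, stmt["value"]))
    return simple


qualified = [
    {"element": "title", "value": "Street scene"},
    {"element": "alternative", "value": "Main Street, looking north"},
    {"element": "created", "value": "1952-06-14", "scheme": "W3CDTF"},
    {"element": "spatial", "value": "Chicago", "scheme": "TGN"},
]
simple = dumb_down(qualified)
print(simple)
```

The result is less precise (the alternative title is now just another title) but still correct, which is the point of the principle.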

DC qualifier status
- Recommended
- Conforming
- Obsolete
- Registered

Limitations of QDC
- Widely misunderstood
- No method for specifying creator roles
- W3CDTF format can’t indicate date ranges or uncertainty
- Split across 3 XML schemas

Best times to use QDC
- More specificity needed than simple DC, but not a fundamentally different approach to description
- Want to share DC with others, but need a few extensions for your local environment
- Describing some types of simple resources
- Metadata creation by novices

Standard | Record format     | Field labels | Reliance on AACR | Common method of creation
DC       | XML, RDF, (X)HTML | Text         | None             | By novices, by specialists, and by derivation
QDC      | XML, RDF, (X)HTML | Text         | None             | By novices, by specialists, and by derivation

MAchine Readable Cataloging (MARC)
- Format for the records in IUCAT, WorldCat and other library catalogs
- Used for library metadata since 1960s
  - Adopted as national standard in 1971
  - Adopted as international standard in 1973
- Maintained by:
  - Network Development and MARC Standards Office at the Library of Congress
  - Standards and Support Office at the National Library of Canada

More about MARC
- Actually a family of MARC standards throughout the world
  - U.S. & Canada use MARC21
  - MARC Bibliographic is for descriptive metadata
- Structured as a binary interchange format
  - ANSI/NISO Z39.2
  - ISO 2709
- Field names
  - Numeric fields
  - Alphabetic subfields
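To make "numeric fields, alphabetic subfields" concrete, here is a simplified in-memory model of a MARC data field. It is only a sketch: it ignores the leader, control fields, and the ISO 2709 serialization, and the `DataField` class is hypothetical rather than taken from any MARC library. The tags shown (100 for a personal-name main entry, 245 for the title statement) are standard MARC 21 Bibliographic tags; the record content is invented.

```python
from dataclasses import dataclass, field


@dataclass
class DataField:
    """One MARC data field: a numeric tag, two indicators, coded subfields."""
    tag: str
    ind1: str = " "
    ind2: str = " "
    subfields: list = field(default_factory=list)  # list of (code, value) pairs

    def get(self, code):
        """Return every value stored under the given subfield code."""
        return [value for c, value in self.subfields if c == code]


# Invented sample record; note the ISBD punctuation carried inside the values.
record = [
    DataField("100", "1", " ", [("a", "Example, Jane.")]),
    DataField("245", "1", "0", [("a", "Metadata basics /"), ("c", "Jane Example.")]),
]

title_field = next(f for f in record if f.tag == "245")
print(title_field.get("a"))
```

Nothing in the structure itself says that 245 means "title"; that knowledge lives in the MARC documentation, which is part of why creating MARC requires trained staff.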

Content/value standards for MARC
- None required by the format itself
- But US record creation practice relies heavily on:
  - AACR2r
  - ISBD
  - LCNAF
  - LCSH

Limitations of MARC
- Use of all its potential is time-consuming
- OPACs don’t make full use of all possible data
- OPACs virtually the only systems to use MARC data
- Requires highly-trained staff to create
- Local practice differs greatly

Good times to use MARC
- Integration with other records in OPAC
- Resources are like those traditionally found in library catalogs
- Maximum compatibility with other libraries is needed
- Have expert catalogers for metadata creation

Standard | Record format         | Field labels | Reliance on AACR | Common method of creation
DC       | XML, RDF, (X)HTML     | Text         | None             | By novices, by specialists, and by derivation
QDC      | XML, RDF, (X)HTML     | Text         | None             | By novices, by specialists, and by derivation
MARC     | ISO 2709 [ANSI Z39.2] | Numeric      | Strong           | By specialists

MARC in XML (MARCXML)
- Copies the exact structure of MARC21 in an XML syntax
  - Numeric fields
  - Alphabetic subfields
- Implicit assumption that content/value standards are the same as in MARC
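The following sketch shows what "the exact structure of MARC21 in an XML syntax" looks like in practice, using Python's standard library. The `http://www.loc.gov/MARC21/slim` namespace and the `record`, `datafield`, and `subfield` names come from the MARCXML schema; the helper name `to_marcxml`, the tuple layout of the input, and the sample values are assumptions for illustration. The leader and control fields are left out to keep the example short, so the output is not a complete record.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", MARC_NS)


def to_marcxml(fields):
    """Serialize (tag, ind1, ind2, subfields) tuples as MARCXML data fields."""
    record = ET.Element(f"{{{MARC_NS}}}record")
    for tag, ind1, ind2, subfields in fields:
        datafield = ET.SubElement(
            record, f"{{{MARC_NS}}}datafield",
            {"tag": tag, "ind1": ind1, "ind2": ind2},
        )
        for code, value in subfields:
            subfield = ET.SubElement(datafield, f"{{{MARC_NS}}}subfield", {"code": code})
            subfield.text = value
    return record


# Invented sample data using standard MARC 21 tags.
fields = [
    ("100", "1", " ", [("a", "Example, Jane.")]),
    ("245", "1", "0", [("a", "Metadata basics /"), ("c", "Jane Example.")]),
]
xml_text = ET.tostring(to_marcxml(fields), encoding="unicode")
print(xml_text)
```

The tags are still numeric and the subfields still lettered, so the XML is no easier for a human to read than the MARC it mirrors.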

Limitations of MARCXML
- Not appropriate for direct data entry
- Extremely verbose syntax
- Full content validation requires tools external to XML Schema conformance

Good times to use MARCXML
- As a transition format between a MARC record and another XML-encoded metadata format
- Materials lend themselves to library-type description
- Need more robustness than DC offers
- Want XML representation to store within larger digital object but need lossless conversion to MARC

Standard | Record format         | Field labels | Reliance on AACR | Common method of creation
DC       | XML, RDF, (X)HTML     | Text         | None             | By novices, by specialists, and by derivation
QDC      | XML, RDF, (X)HTML     | Text         | None             | By novices, by specialists, and by derivation
MARC     | ISO 2709 [ANSI Z39.2] | Numeric      | Strong           | By specialists
MARCXML  | XML                   | Numeric      | Strong           | By derivation

Metadata Object Description Schema (MODS)
- Developed and managed by the Library of Congress Network Development and MARC Standards Office
- First released for trial use June 2002
- MODS 3.2 released June 2006
- “Schema for a bibliographic element set that may be used for a variety of purposes, and particularly for library applications.”

Differences between MODS and MARC
- MODS is “MARC-like” but intended to be simpler
- Textual tag names
- Encoded in XML
- Some specific changes
  - Some regrouping of elements
  - Removes some elements
  - Adds some elements

Content/value standards for MODS
- Some elements indicate a given content/value standard should be used
  - Generally follows MARC/AACR2/ISBD conventions
  - But not all enforced by the MODS XML schema
- Authority attribute available on some elements
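For comparison with MARC's numeric tags, the sketch below assembles a small MODS description with Python's standard library. The element and attribute names (`titleInfo`, `name`, `namePart`, `role`, `roleTerm`, `originInfo`, `dateCreated`, `subject`, `topic`, `authority`, `encoding`) are from the MODS schema; the helper `el` and all sample values are made up, and the result is an illustrative fragment rather than a record validated against the schema.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def el(parent, name, text=None, **attrs):
    """Hypothetical helper: append a namespaced MODS child element."""
    child = ET.SubElement(parent, f"{{{MODS_NS}}}{name}", attrs)
    if text is not None:
        child.text = text
    return child


mods = ET.Element(f"{{{MODS_NS}}}mods")

title_info = el(mods, "titleInfo")
el(title_info, "title", "Street scene")

# Unlike simple DC, MODS can record the creator's role.
creator = el(mods, "name", type="personal")
el(creator, "namePart", "Example, Photographer")
role = el(creator, "role")
el(role, "roleTerm", "photographer", type="text")

origin = el(mods, "originInfo")
el(origin, "dateCreated", "1952-06-14", encoding="w3cdtf")

# The authority attribute names the controlled vocabulary the term comes from.
subject = el(mods, "subject", authority="lcsh")
el(subject, "topic", "Streets")

xml_text = ET.tostring(mods, encoding="unicode")
print(xml_text)
```

The schema accepts the `authority` attribute but does not check that "Streets" really is a term in that vocabulary, which is one instance of conventions not being enforced by the MODS XML schema.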

Limitations of MODS
- No lossless round-trip conversion from and to MARC
- Still largely implemented by library community only
- Some semantics of MARC lost
- Format still growing to meet the needs of the digital library community

Good times to use MODS
- Materials lend themselves to library-type description
- Want to reach both library and non-library audiences
- Need more robustness than DC offers
- Want XML representation to store within larger digital object

Standard | Record format         | Field labels | Reliance on AACR | Common method of creation
DC       | XML, RDF, (X)HTML     | Text         | None             | By novices, by specialists, and by derivation
QDC      | XML, RDF, (X)HTML     | Text         | None             | By novices, by specialists, and by derivation
MARC     | ISO 2709 [ANSI Z39.2] | Numeric      | Strong           | By specialists
MARCXML  | XML                   | Numeric      | Strong           | By derivation
MODS     | XML                   | Text         | Implied          | By specialists and by derivation

Picking a format
- Consider all options
- Match format to the types of discovery you want to support
- Your choice has to fit in your larger technological infrastructure
  - Realize the constraints you’re operating under
  - Or, expand infrastructure!
- Don’t have to choose just one, can use several for different purposes

Mapping between metadata formats
- Also called “crosswalking”
- To create “views” of metadata for specific purposes
- Mapping from robust format to more general format is common
- Mapping from general format to more robust format is ineffective

Types of mapping logic
- Mapping the complete contents of one field to another
- Splitting multiple values in a single local field into multiple fields in the target schema
- Translating anomalous local practices into a more generally useful value
- Splitting data in one field into two or more fields
- Transforming data values
- Boilerplate values to include in output schema
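Several of these mapping types can be sketched in one small crosswalk from a local record to simple DC. Everything about the local record here is hypothetical: the field names `caption`, `keywords`, and `date_taken`, the semicolon delimiter, and the MM/DD/YYYY date style are assumptions chosen for illustration. `StillImage` is a term from the DCMI Type Vocabulary.

```python
def crosswalk_to_dc(local):
    """Map one hypothetical local record (a dict) to simple DC (element, value) pairs."""
    dc = []

    # Complete contents of one field mapped to another.
    dc.append(("title", local["caption"]))

    # Multiple values in a single local field split into repeated target fields.
    for term in local["keywords"].split(";"):
        term = term.strip()
        if term:
            dc.append(("subject", term))

    # Data value transformed: local MM/DD/YYYY becomes W3CDTF YYYY-MM-DD.
    month, day, year = local["date_taken"].split("/")
    dc.append(("date", f"{year}-{int(month):02d}-{int(day):02d}"))

    # Boilerplate value that applies to every record in this collection.
    dc.append(("type", "StillImage"))
    return dc


local_record = {
    "caption": "Street scene, Chicago",
    "keywords": "Streets; Automobiles",
    "date_taken": "6/14/1952",
}
dc_record = crosswalk_to_dc(local_record)
print(dc_record)
```

A real crosswalk would also need to decide what to do with missing or malformed fields; this sketch simply assumes all three local fields are present and well formed.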

Common mapping pitfalls
- Cramming in too much information
- Leaving in trailing punctuation
- Missing context of records
- Meaningless placeholder data
- ALWAYS remember the purpose of the metadata you are creating!
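Two of these pitfalls, trailing punctuation and placeholder data, lend themselves to a small clean-up step applied to each value before it goes into the target record. The sketch below is one possible approach under stated assumptions: the list of placeholder strings and the set of punctuation marks removed are illustrative choices, and the function name `clean_value` is hypothetical. Trailing periods are deliberately left alone because they may belong to an abbreviation.

```python
import re

# Illustrative placeholder strings; a real project would build its own list.
PLACEHOLDERS = {"", "unknown", "n/a", "none", "[s.n.]", "[s.l.]"}

# ISBD-style separators that often trail a value copied out of a MARC subfield.
TRAILING_PUNCTUATION = re.compile(r"\s*[/:;,=]\s*$")


def clean_value(value):
    """Strip trailing separator punctuation; return None for placeholder data."""
    cleaned = TRAILING_PUNCTUATION.sub("", value.strip())
    if cleaned.lower() in PLACEHOLDERS:
        return None
    return cleaned


for raw in ["Metadata basics /", "Bloomington, Ind. :", "unknown", "[s.n.]"]:
    print(repr(raw), "->", repr(clean_value(raw)))
```

Returning `None` lets the caller omit the element entirely, which fits DC's rule that no element is required.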

No, really, which one do I pick?
- It depends. Sorry.
- Be as robust as you can afford
- Plan for future uses of the metadata you create
- Leverage existing expertise as much as possible
- Focus on content and value standards as much as possible

More information
- Dublin Core
  - DC Element Set version 1.1
  - DCMI Metadata Terms
- MODS
- MARC
- MARCXML

Break time!

Choosing controlled vocabularies

Some characteristics of CVs
- Also known as “vocabulary encoding schemes”
- Enumerated lists of all possible choices for a field value
- Often organized into a syndetic structure
- Usually intended to be human-readable

CVs in libraries
- Many library CVs grow constantly with catalogers contributing new terms
- Many library CVs use content standards to dictate the form of headings
- Fields that use CVs are said to be under “authority control”

Traditional uses of CVs in library catalog records
- Collocation
- Disambiguation
- Interoperability
- BROWSING! (Although this isn’t used much in libraries…)

Other considerations
- Human cataloging using CVs is expensive
- Developing and maintaining CVs is expensive
- Current library systems usually rely on the same string being present in all records rather than true relational structures linking records to CV terms
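The difference between string matching and a true relational link can be shown with a toy example. In the sketch below, one set of records stores the heading as text and another stores an identifier pointing into an authority list. The names, identifiers, and record shapes are all invented for illustration.

```python
# A tiny invented authority list: one term with a preferred label and a variant.
authority = {
    "n1": {
        "label": "Example, Jane, 1900-1980",
        "variants": ["Example, J. (Jane)"],
    },
}

# String-based records: collocation depends on every string being identical.
string_records = [
    {"title": "Book A", "creator": "Example, Jane, 1900-1980"},
    {"title": "Book B", "creator": "Example, Jane"},  # inconsistent form of the heading
]

# Linked records: each record points at the authority term by identifier.
linked_records = [
    {"title": "Book A", "creator_id": "n1"},
    {"title": "Book B", "creator_id": "n1"},
]


def find_by_string(heading):
    """Return titles whose creator string exactly equals the heading."""
    return [r["title"] for r in string_records if r["creator"] == heading]


def find_by_term(term_id):
    """Return titles linked to the given authority term."""
    return [r["title"] for r in linked_records if r["creator_id"] == term_id]


print(find_by_string("Example, Jane, 1900-1980"))
print(find_by_term("n1"))
```

With links, changing the preferred label means editing one authority entry; with strings, every record carrying the old heading has to be found and updated.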

When a controlled vocabulary is useful
- User browsing of a small number of categories each with a large number of members
- When many different things have the same label
- When recall is a priority for a given access point

Some common fields using CVs
- Names
- Places
- “Subjects”

Names
- Seeking works by or about a certain individual is frequent
- Individuals are often known by many different names
- Many different individuals have the same name
- Name authority lists often create uniqueness by adding qualifiers
- Some example vocabularies:
  - Library of Congress Name Authority File (LCNAF)
  - Getty Union List of Artists’ Names (ULAN)

Places
- Common in libraries to control place names in subjects, but not publication places
- Many different places with the same name
- Often organized hierarchically
- Commonly used vocabularies:
  - Library of Congress Subject Headings (LCSH)
  - Getty Thesaurus of Geographic Names (TGN)
  - GEOnet Names Server

“Subjects”
- Libraries traditionally group topic, location, genre, form, time period and other related concepts all under “subject”
- Often organized into a rich syndetic structure
- General rule is to apply the most specific heading applicable
- Involves subjective judgment on the part of the individual assigning the heading

Deciding which fields to place under authority control
- Consider your budgetary constraints
- Learn about the functionalities possible in your system
- Identify appropriate vocabularies that meet defined needs
- Develop a clear plan for how the fields with controlled values will be used

Using controlled vocabularies to enhance searching and browsing

Case Study: Cushman Collection
- Funded with an Institute of Museum & Library Services (IMLS) grant
- ~15,000 color slides taken between […]
- Cushman provided a significant amount of description
- Additional metadata created to enhance genre, subject and geographic access

Metadata for the Cushman Collection
- Cushman’s description
  - Dates
  - Location
  - Names
- TGM I – LC Thesaurus for Graphic Materials: Subject Terms
- TGM II – LC Thesaurus for Graphic Materials: Genre & Physical Characteristics
- TGN – Getty Thesaurus of Geographic Names
- We wanted to use this high-quality metadata to improve on past search systems

TGM I: Subject Terms Strengths and Weaknesses
- Strengths include:
  - Pre-defined relationships between concepts
  - Some lead-in vocabulary
- Weaknesses include:
  - Syndetic relationship lacking for new terms
  - Language not user-friendly
  - Not enough lead-in vocabulary
  - Form and number of top-level categories not useful for a browse structure

User studies performed
- Two types
  - Group walkthroughs of prototypes
  - Task scenario study
- Some functionality suggested by the studies
  - Refinement while searching
  - Search suggestions
  - Faceted browsing
  - Browsing on subject terms at all levels
  - CV interaction

Browsing Image Collections
- Research shows:
  - Browsing is exploratory (Bawden)
  - Guided, flexible browsing in context works (Flamenco and SI Art Image Browser projects)
- Our usability studies show:
  - Structure is important
  - Contents should be easily exposed
  - Flexible and combinatorial browsing is desired
  - Browsing cultivates searching

Searching Image Collections
- Research shows:
  - Using thesaurus structure helps searching (Greenberg)
    - Automatic expansion of synonyms and narrower terms
    - User-initiated expansion of broader and related terms
- Our usability studies show:
  - Referencing an A-Z list with no lead-in terms for searching is NOT helpful at all
  - Concerns about word choice
  - Iterative reformulation of queries in context is desired

Cushman Specifications: Browsing
- Date
- Genre
- Subjects (hierarchical)
  - Retrieval of all records with narrower terms
- Location (hierarchical)
- Combination of categories
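The browsing behavior specified above (hierarchical categories, retrieval of records indexed with narrower terms, and combination of categories) can be sketched in a few lines. This is not the Cushman implementation; the hierarchy, records, and function names below are invented, and the term relationships are not claimed to match TGM I or TGN.

```python
# Invented broader-to-narrower relationships, one hierarchy per facet.
HIERARCHY = {
    "subject": {"Vehicles": ["Automobiles", "Railroad cars"]},
    "location": {
        "United States": ["Illinois", "Indiana"],
        "Illinois": ["Chicago"],
        "Indiana": ["Bloomington"],
    },
}

RECORDS = [
    {"id": 1, "genre": ["Slides"], "subject": ["Automobiles"], "location": ["Chicago"]},
    {"id": 2, "genre": ["Slides"], "subject": ["Railroad cars"], "location": ["Bloomington"]},
    {"id": 3, "genre": ["Slides"], "subject": ["Dwellings"], "location": ["Chicago"]},
]


def with_narrower(term, narrower):
    """Return the term together with all of its descendants."""
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(narrower.get(current, []))
    return found


def browse(records, **selected):
    """Filter records by one selected term per facet, combining facets with AND."""
    results = records
    for facet, term in selected.items():
        # Flat facets such as genre have no hierarchy, so only the term itself matches.
        accepted = with_narrower(term, HIERARCHY.get(facet, {}))
        results = [r for r in results if accepted.intersection(r.get(facet, []))]
    return results


print([r["id"] for r in browse(RECORDS, subject="Vehicles")])
print([r["id"] for r in browse(RECORDS, subject="Vehicles", location="Illinois")])
```

Selecting the broad term "Vehicles" retrieves records indexed only with its narrower terms, which is what lets the catalogers keep applying the most specific heading while users browse from the top.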

Cushman Specifications: Searching
- Integrated search against BOTH “free-text” descriptions and thesaurus
- Mapping from lead-in vocabulary
- Retrieval of all records with narrower terms
- User-initiated broadening and narrowing
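A minimal sketch of the first three search requirements follows: the query is mapped from lead-in vocabulary to a preferred term, expanded to narrower terms, and then matched against both the thesaurus terms and the free-text description of each record. The vocabulary, records, and function names are invented for illustration and do not reproduce the Cushman system or actual TGM I entries. User-initiated broadening and narrowing is not covered.

```python
# Invented thesaurus data: lead-in (non-preferred) terms and narrower-term links.
LEAD_IN = {"cars": "Automobiles", "autos": "Automobiles", "houses": "Dwellings"}
NARROWER = {"Vehicles": ["Automobiles", "Railroad locomotives"]}
PREFERRED = {"Vehicles", "Automobiles", "Railroad locomotives", "Dwellings"}

RECORDS = [
    {"id": 1, "description": "Red coupe parked on Main Street", "subjects": ["Automobiles"]},
    {"id": 2, "description": "Steam engine at the depot", "subjects": ["Railroad locomotives"]},
    {"id": 3, "description": "Cars lined up for a parade", "subjects": []},
    {"id": 4, "description": "Farmhouse at dusk", "subjects": ["Dwellings"]},
]


def expand(query):
    """Map a query to a preferred term, then add all of its narrower terms."""
    q = query.strip().lower()
    start = LEAD_IN.get(q) or next((t for t in PREFERRED if t.lower() == q), None)
    if start is None:
        return set()  # the query is not in the thesaurus at all
    found, stack = set(), [start]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(NARROWER.get(current, []))
    return found


def search(query, records):
    """Match the expanded thesaurus terms OR the raw query in the free text."""
    terms = expand(query)
    q = query.strip().lower()
    return [
        r for r in records
        if terms.intersection(r["subjects"]) or q in r["description"].lower()
    ]


print([r["id"] for r in search("cars", RECORDS)])
print([r["id"] for r in search("Vehicles", RECORDS)])
```

Searching "cars" finds record 1 through the lead-in mapping even though its description never uses the word, and record 3 through the free text even though it has no subject terms, which is the case for integrating both sources.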

Wrapping it all up

What next?
- After choosing metadata standards and controlled vocabularies
  - Figure out where metadata creation fits in the overall workflow
  - Write metadata creation guidelines
  - Design and implement a metadata creation process

And there’s more
- Other types of metadata
  - Content markup
  - Technical metadata
  - Rights metadata
  - Preservation metadata
  - Structural metadata
- Specialized metadata standards
- When to create a local metadata format

In a grant proposal (1)
- Give specific information on all the decisions you’ve made
  - Metadata standards
  - Controlled vocabularies
  - Metadata creation workflow
  - Discovery functionality the metadata will support
- Describe what metadata already exists for these materials

In a grant proposal (2)
- Indicate who will do the metadata creation work
- Give reasonable cost estimates
- The more planning you do, the more likely you are to
  - Receive funding
  - Complete the project on schedule
  - Complete the project within your budget

That’s all for today!
- These presentation slides:
- Handout: