An Inquiry and Analysis of Metadata Utilization: A Case Study of MARC
2005 ASIS&T Annual Meeting, November 1, 2005, Charlotte, North Carolina
William E. Moen
School of Library and Information Sciences
Texas Center for Digital Knowledge
University of North Texas
Denton, TX 76203

Two quality criteria
  Fullness/completeness
  Usefulness

Context for the initial analysis
  Z39.50 Interoperability Testbed project
    An Institute of Museum and Library Services National Leadership Grant
    Goal: Improve Z39.50 semantic interoperability among libraries for information access and resource sharing
  Interoperability across library online catalogs
    Indexing of MARC records to support searching
    Richness of MARC content designation available
    Inform indexing guidelines and policies

Indexing & MARC
  Indexing Guidelines to Support Z39.50 Profile Searches (available on Z-Interop website)
  Identified all MARC 21 fields/subfields that can contain author, title, or subject data
    Author-related fields/subfields: 119
    AuthorTitle-related fields/subfields: 21
    Title-related fields/subfields: 253
    Subject-related fields/subfields: 144

Z-Interop test dataset
  Approximately 1% sample of MARC records from OCLC’s WorldCat database
  Weighted sampling based on number of libraries “holding” the object represented by the record
  419,657 total MARC records
  89% of records “full level” cataloging
  Formats represented in test dataset
    Books: 91%
    Sound recordings: 4%
    Serials: 3%
    Visual Materials: 1%
    Cartographic Materials: < 1%
    Electronic resources: < 1%
    Archival/Mixed Materials: < 1%

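The slide says only that sampling was weighted by the number of holding libraries; it does not describe the procedure. As one possible illustration, the sketch below draws a sample without replacement in which records with more holdings are more likely to be chosen, using the Efraimidis-Spirakis key method. This technique is an assumption, not the project's documented method, and the function name `weighted_sample` and the toy records are hypothetical.

```python
import heapq
import random


def weighted_sample(records, k, rng):
    """Pick up to k distinct record ids, favouring larger holdings counts.

    records: iterable of (record_id, holdings) pairs.
    Each record gets the key u ** (1 / holdings) with u uniform in [0, 1);
    the k largest keys form the sample (Efraimidis-Spirakis).
    """
    keyed = []
    for record_id, holdings in records:
        if holdings <= 0:
            continue  # a record held by no library can never be drawn
        key = rng.random() ** (1.0 / holdings)
        keyed.append((key, record_id))
    return [record_id for _, record_id in heapq.nlargest(k, keyed)]


# Hypothetical toy data: (record id, number of holding libraries).
toy_records = [("r1", 500), ("r2", 3), ("r3", 40), ("r4", 0), ("r5", 1200), ("r6", 7)]
sample = weighted_sample(toy_records, 3, random.Random(0))
```

Passing a seeded `random.Random` instance keeps the draw reproducible, which matters when a test dataset is meant to be rebuilt or audited later.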
MARC 21 content designation
  MARC 21 Field Groups | Currently Defined | Obsolete | Total | MARC 1972 (Books Format Only)
  00x                  | 6                 | 1        | 7     | 3
  [Values for the 0xx and later field groups and the TOTAL row are not recoverable.]

Content designation in dataset
  MARC 21 Field Groups | Currently Defined | Obsolete | Unlikely Used | Total
  00x                  | 6                 | 0        | 0             | 6
  [Values for the 0xx and later field groups and the TOTAL row are not recoverable.]

Summary frequency results
  Frequency          | # of Fields/Subfields | % of All Occurrences
  > 600,000          | …                     | …
  500,000 to 599,999 | 0                     | 0%
  400,000 to 499,999 | …                     | …
  300,000 to 399,999 | …                     | …
  200,000 to 299,999 | …                     | …
  100,000 to 199,999 | …                     | …
  TOTAL              | 36                    | 79.5%
  Total number of fields/subfields occurring in dataset = 13,849,499
  Only 4% of all fields/subfields account for 80% of all occurrences
  or
  96% of all fields/subfields account for 20% of all occurrences

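The concentration finding above (a small share of fields/subfields accounting for most occurrences) can be checked with a cumulative count over a ranked frequency list. The following is a minimal sketch under stated assumptions: the function name `fields_needed_for_coverage` is hypothetical, and the counts are toy values, not the Z-Interop figures.

```python
from collections import Counter


def fields_needed_for_coverage(counts, target=0.80):
    """Return the top-ranked keys needed to reach `target` share of all occurrences."""
    total = sum(counts.values())
    if total == 0:
        return []
    selected = []
    running = 0
    for key, n in counts.most_common():  # highest frequency first
        if running / total >= target:
            break
        selected.append(key)
        running += n
    return selected


# Hypothetical toy counts keyed by field/subfield.
toy_counts = Counter({
    "650$a": 50, "040$d": 20, "260$a": 10, "260$b": 8,
    "245$a": 6, "700$a": 3, "130$a": 2, "246$a": 1,
})
top = fields_needed_for_coverage(toy_counts, 0.80)
share_of_fields = len(top) / len(toy_counts)
```

With these toy values, three of the eight fields/subfields already cover 80% of occurrences; the same loop applied to a real frequency table would give the kind of "4% accounts for 80%" statement reported on the slide.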
Characteristics of top 36
  Most frequently occurring: 650 $a [Subject data]
  2nd most frequently occurring: 040 $d [Cataloging source]
  3rd & 4th most frequently occurring: 260 $a & $b [Publication information]
  5th most frequently occurring: 245 $a [Title]
  Top 36 fields/subfields
    Contain data useful to end users: 28
    Contain control numbers, etc.: 5
    Contain data useful to catalogers: 3

Implications for indexing
  537 fields/subfields contain author, title, subject data
  381 of these actually occur in Z-Interop dataset
  Total occurrences of the 381 = 4,397,…
  19 of the 381 (5%) account for 80% of all occurrences
    9 of 19 are subject-related
    5 of 19 are author-related
    5 of 19 are title-related
  Preliminary testing using only 19 indexed fields: 95% - 100% of correct records retrieved!

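The slide does not describe how the preliminary retrieval test was run. One way to picture such a comparison is to build a word index from all candidate fields and another from a reduced field list, then measure what share of the full-index results the reduced index still returns. The sketch below is an illustration under those assumptions; `build_index`, `recall`, the field choices, and the records are all hypothetical.

```python
def build_index(records, indexed_fields):
    """Map each lowercase word to the ids of records containing it in an indexed field."""
    index = {}
    for record_id, fields in records.items():
        for field, values in fields.items():
            if field not in indexed_fields:
                continue  # field is not part of this index configuration
            for value in values:
                for word in value.lower().split():
                    index.setdefault(word, set()).add(record_id)
    return index


def recall(reduced_index, full_index, term):
    """Share of records found via the full index that the reduced index also finds."""
    expected = full_index.get(term, set())
    if not expected:
        return None  # the term matches nothing, so recall is undefined
    found = reduced_index.get(term, set())
    return len(found & expected) / len(expected)


# Hypothetical toy records: record id -> {field/subfield: [values]}.
records = {
    1: {"650$a": ["Metadata"], "245$a": ["Cataloging practice"]},
    2: {"650$a": ["Cataloging"], "653$a": ["Metadata"]},
    3: {"245$a": ["Metadata quality"]},
}
full = build_index(records, {"650$a", "245$a", "653$a"})
reduced = build_index(records, {"650$a", "245$a"})

recall_metadata = recall(reduced, full, "metadata")
recall_cataloging = recall(reduced, full, "cataloging")
```

In this toy case the reduced index misses record 2 for "metadata" because the term appears only in the unindexed 653 $a, while "cataloging" is fully recovered.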
The MCDU Project
  The MARC Content Designation Utilization Project
  What is the extent of catalogers’ use of content designation available in MARC 21?
  Develop and implement systematic methods, procedures, and software tools to produce reliable and valid analysis of MARC 21 content designation use
  MARC record as artifact of cataloging enterprise
  FOR MORE INFORMATION, VISIT THE PROJECT WEBSITE…

The MCDU dataset & analysis
  56 million MARC records – all WorldCat bib records
  Parsed and stored in MySQL
  20 databases
    LC and Non-LC created records
    10 databases each based on type of record/format
  Frequency counts of all fields/subfields
  Non-LC Book Format field occurrence results

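Once records are parsed into relational tables, field/subfield frequency counts reduce to a grouped query. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a server; the table name `occurrences`, its columns, and the rows are assumptions for illustration, not the project's actual schema.

```python
import sqlite3

# In-memory stand-in for one of the parsed-record databases.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE occurrences (record_id INTEGER, tag TEXT, subfield TEXT)")

# Hypothetical parsed data: one row per field/subfield occurrence in a record.
rows = [
    (1, "245", "a"), (1, "650", "a"), (1, "650", "a"),
    (2, "245", "a"), (2, "260", "b"),
    (3, "245", "a"), (3, "650", "a"),
]
conn.executemany("INSERT INTO occurrences VALUES (?, ?, ?)", rows)

# Total occurrences and number of distinct records per field/subfield.
query = """
    SELECT tag, subfield,
           COUNT(*) AS total_occurrences,
           COUNT(DISTINCT record_id) AS records_with_field
    FROM occurrences
    GROUP BY tag, subfield
    ORDER BY total_occurrences DESC, tag, subfield
"""
frequency_table = conn.execute(query).fetchall()
conn.close()
```

Counting distinct records alongside raw occurrences separates repeatable fields such as 650 $a, which can appear several times in one record, from fields that occur at most once per record.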
Making sense of the numbers
  The numbers don’t stand on their own – contextualizing, qualifying, exploring, understanding
  Metadata quality – Fullness/completeness
    Identify core elements of bibliographic records based on the analysis of format-specific samples and compare with existing recommendations for core records
  Metadata quality – Usefulness
    Comparing the FRBR conceptual framework’s user tasks, MARC content designation supporting those tasks, and utilization of that content designation in the records

References
  MARC Content Designation Utilization Project
  Z39.50 Interoperability Testbed