MARC Content Designation Use
Implications for indexing & interoperability
William E. Moen
School of Library and Information Sciences, Texas Center for Digital Knowledge
University of North Texas, Denton, TX
South Central Unicorn Users Group Annual Conference, October 17, 2003, Austin, Texas

Moen South Central Unicorn Users Group Annual Conference -- Austin, Texas -- October 17,
Overview
  Context for the analysis -- interoperability
  Findings from the analysis
  Indexing and MARC
  Discussion

Context for the analysis
  Interoperability across library online catalogs
  Indexing of MARC records to support searching
  Richness of MARC content designation available
  Indexing guidelines prepared for the Z39.50 Interoperability Testbed (Z-Interop)
  Implications for indexing guidelines and policies

Interoperability
  Systems and organizations will interoperate!
  "One should actively be engaged in the ongoing process of ensuring that the systems, procedures and culture of an organisation are managed in such a way as to maximise opportunities for exchange and re-use of information, whether internally or externally."
  Paul Miller, 2000

Factors affecting interoperability
  Multiple and disparate systems
    Operating systems, information retrieval systems, etc.
  Multiple protocols
    Z39.50, HTTP, SOAP, etc.
  Multiple data formats, syntax, metadata schemes
    MARC 21, UNIMARC, XML, ISBD/AACR2-based, Dublin Core
  Multiple vocabularies, ontologies, disciplines
    LCSH, MeSH, AAT
  Multiple languages and character sets
  Indexing, word normalization, and word extraction policies

Information communities
  Community agreements exist (e.g., standards, rules, etc.)
  Interoperability factors reduced
  Interoperability more easily achieved
  Do we need additional agreements regarding indexing policies to improve interoperability?
Libraries as Focal Community
  Relative homogeneity of data and systems
  Standards-based MARC records
  Content and structure prescribed by AACR
  Commonly understood access points
  Use of controlled vocabularies

Interoperability testbed project
  Realizing the Vision of Networked Access to Library Resources: An Applied Research and Demonstration Project to Establish and Operate a Z39.50 Interoperability Testbed
  An Institute of Museum and Library Services National Leadership Grant
  Goal: Improve Z39.50 semantic interoperability among libraries for information access and resource sharing
  FOR MORE INFORMATION, VISIT THE PROJECT WEBSITE…

Threats to Z39.50 interoperability
  Differences in implementation of the standard
  Differences in local information retrieval systems
    Search functionality
    Indexing policies
  These threats can be addressed by
    Z39.50 specifications and configuration (i.e., profiles)
    Enhancing local information retrieval systems
    Recommendations for local indexing decisions

Components of the testbed
  Test dataset
    400,000+ MARC 21 records from OCLC’s WorldCat
  Z39.50 reference implementations
    Z-client (Bookwhere), Z-server & information retrieval system (Sirsi Unicorn)
  Test scenarios & searches
    Searches with known result records from dataset
  Benchmarks
    Results of test searches using reference implementations

MARC
  Record structure for encoding data for machine processing
  Standard structure (ANSI/NISO Z39.2/ISO 2709)
    Leader
    Directory map
    3-digit tag to identify a field
    2 indicator values to provide additional processing information
    1 or more delimiters/codes to identify subfields
  Content designation: Semantics
    MARC $a [title] $h [format] : $b [subtitle]
  Rules
    Anglo-American Cataloguing Rules and others

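The field structure listed above (tag, two indicators, coded subfields) can be sketched as a small data structure. This is a minimal illustration working on the `$`-delimited display form used in these slides, not on the binary ISO 2709 encoding with its leader and directory. The names `MarcField` and `parse_display_field` are hypothetical, and the tag `245` with indicators `10` for the sample title is an assumption, since the slide text does not show them.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class MarcField:
    """One variable field: a 3-digit tag, two indicators, coded subfields."""
    tag: str
    ind1: str = " "
    ind2: str = " "
    subfields: List[Tuple[str, str]] = field(default_factory=list)


def parse_display_field(tag: str, indicators: str, text: str) -> MarcField:
    """Parse a '$'-delimited display string into a MarcField (hypothetical helper)."""
    ind = indicators.ljust(2)[:2]          # pad so both indicators always exist
    subfields = []
    for chunk in text.split("$")[1:]:      # anything before the first '$' is ignored
        if chunk:
            # First character is the subfield code, the rest is its data.
            subfields.append((chunk[0], chunk[1:].strip()))
    return MarcField(tag, ind[0], ind[1], subfields)


# Tag and indicators below are assumed for illustration.
title = parse_display_field(
    "245", "10", "$aIllegitimacy and adoption in Maine : $breport of a study"
)
print(title)
```

Keeping subfields as an ordered list of (code, data) pairs, rather than a dictionary, preserves repeated subfield codes and their order, both of which matter when positions are recorded later.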
MARC 21 content designation
  MARC 21 Field Groups: Currently Defined / Obsolete / Total / MARC 1972 (Books Format Only)
  00x: 6 / 1 / 7 / 3
  0xx xx xx xx xx xx xx xx xx
  TOTAL

Z-Interop test dataset
  Approximately 1% sample of MARC records from OCLC’s WorldCat database
  Weighted sampling based on number of libraries “holding” the object represented by the record
  419,657 total MARC records
  89% of records “full level” cataloging
  Formats represented in test dataset
    Books: 91%
    Cartographic Materials: < 1%
    Electronic resources: < 1%
    Archival/Mixed Materials: < 1%
    Sound recordings: 4%
    Visual Materials: 1%
    Serials: 3%

MARC record
  LDR01019cam ^
  001 ocm ^
  003 OCoLC^
  ^
  s1963 nyu b eng ^
  010 $a ^
  040 $aDLC $cDLC ^
  $aHV700.5 $b.N37 ^
  $a362.7/3 ^
  $aNational Study Service. ^
  $aIllegitimacy and adoption in Maine : $breport of a study made for the Maine Committee on Children and Youth. ^
  260 $a[New York], $c1963. ^
  300 $a24 p. ; $c28 cm. ^
  500 $aCover title. ^
  504 $aBibliographical footnotes. ^
  $aIllegitimacy $zMaine. ^
  $aAdoption $zMaine. ^
  $aMaine. $bCommittee on Children and Youth. ^

Decomposing MARC Records
  Columns: OCLC #, Tag, 1st Ind, 2nd Ind, SubFld, Fld Pos, SubFld Pos, Word Pos, Word
  Ocm OCoLC
  31102a1111 National
  31102a1112 Study
  31102a1113 Service
  a1211 Illegitimacy
  a1212 and
  a1213 Adoption
  b1221 Report
  36500a1711 Illegitimacy
  36500z1721 Maine
  400,000 MARC21 records = 33 million decomposed records

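The decomposition described above produces one row per word, carrying the tag, indicators, subfield code, and three positions (field, subfield, word). The sketch below is one way this could be done and is not the project's actual code. The function `decompose`, the `WordRow` type, the record identifier `rec-1`, and the rule of splitting on runs of word characters are all assumptions; the tags `110` and `650` are likewise assumed, because the sample record's tags are partly lost in the slide text.

```python
import re
from typing import Iterator, List, NamedTuple, Tuple

# (tag, ind1, ind2, [(subfield code, data), ...]) -- a simplified field shape
Field = Tuple[str, str, str, List[Tuple[str, str]]]


class WordRow(NamedTuple):
    record_id: str
    tag: str
    ind1: str
    ind2: str
    subfield: str
    field_pos: int
    subfield_pos: int
    word_pos: int
    word: str


def decompose(record_id: str, fields: List[Field]) -> Iterator[WordRow]:
    """Yield one row per word, with 1-based field, subfield, and word positions."""
    for field_pos, (tag, ind1, ind2, subfields) in enumerate(fields, start=1):
        for subfield_pos, (code, value) in enumerate(subfields, start=1):
            # Assumed word-extraction rule: runs of letters/digits, punctuation dropped.
            for word_pos, word in enumerate(re.findall(r"\w+", value), start=1):
                yield WordRow(record_id, tag, ind1, ind2, code,
                              field_pos, subfield_pos, word_pos, word)


# Two fields from the sample record; positions here are relative to this short list.
fields = [
    ("110", "2", " ", [("a", "National Study Service.")]),
    ("650", " ", "0", [("a", "Illegitimacy"), ("z", "Maine.")]),
]
rows = list(decompose("rec-1", fields))
for row in rows:
    print(row)
```

Rows in this shape load directly into a relational table, where counting occurrences per tag and subfield becomes a simple grouping query.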
Content designation in dataset
  MARC 21 Field Groups: Currently Defined / Obsolete / Unlikely Used / Total
  00x: 6 / 0 / 0 / 6
  0xx xx xx xx xx xx xx xx xx
  TOTAL

Summary frequency results
  Frequency: # of Fields/Subfields / % of All Occurrences
  > 600,000:
  500,000 to 599,999: 0 / 0%
  400,000 to 499,999:
  300,000 to 399,999:
  200,000 to 299,999:
  100,000 to 199,999:
  TOTAL: 36 / 79.5%
  Total number of fields/subfields occurring in dataset = 13,849,499
  Only 4% of all fields/subfields account for 80% of all occurrences
  or
  96% of all fields/subfields account for 20% of all occurrences

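The concentration reported above (a small share of fields/subfields accounting for most occurrences) can be checked by sorting occurrence counts and accumulating them until a target share is reached. The sketch below illustrates that arithmetic only. The function name `smallest_covering_set` is hypothetical and the counts are made-up placeholder values, not figures from the Z-Interop dataset.

```python
from collections import Counter
from typing import List, Tuple


def smallest_covering_set(counts: Counter, target_share: float) -> List[Tuple[str, int]]:
    """Return the most frequent keys whose combined count first reaches target_share."""
    total = sum(counts.values())
    chosen: List[Tuple[str, int]] = []
    running = 0
    for key, n in counts.most_common():    # descending by occurrence count
        if running / total >= target_share:
            break
        chosen.append((key, n))
        running += n
    return chosen


# Placeholder counts for illustration only (not real dataset figures).
counts = Counter({"650$a": 50, "245$a": 35, "260$a": 10, "500$a": 3, "020$a": 2})
top = smallest_covering_set(counts, 0.80)
covered = sum(n for _, n in top) / sum(counts.values())
print(top, covered)
```

With counts taken from decomposed records, the same function would return the short list of fields/subfields behind a figure such as "4% account for 80%".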
Characteristics of top 36
  Most frequently occurring: 650 $a [Subject data]
  2nd most frequently occurring: 040 $d [Cataloging source]
  3rd & 4th most frequently occurring: 260 $a & $b [Publication information]
  5th most frequently occurring: 245 $a [Title]
  Contain data useful to end users: 28
  Contain control numbers, etc.: 5
  Contain data useful to catalogers: 3

Indexing & MARC
  Indexing Guidelines to Support Z39.50 Profile Searches
  Identified all MARC 21 fields/subfields that may contain author, title, or subject data
    Author-related fields/subfields: 119
    AuthorTitle-related fields/subfields: 21
    Title-related fields/subfields: 253
    Subject-related fields/subfields:
  fields/subfields contain author, title, subject data
  Usefulness of indexing all possible fields?

Occurrences in test dataset
  Author, title, or subject fields/subfields in Z-Interop dataset
    381 occur one or more times in Z-Interop dataset
    Author-related fields/subfields: 86
    AuthorTitle-related fields/subfields: 16
    Title-related fields/subfields: 178
    Subject-related fields/subfields:
  The 19 fields/subfields
    19 of the 381 (5%) account for 80% of all occurrences
    9 of 19 are subject-related
    5 of 19 are author-related
    5 of 19 are title-related

Implications for indexing
  What difference do indexing decisions make?
  Preliminary testing using the 19 fields/subfields: 95% - 100% of correct records retrieved!
  How much time would be saved in setting up indexing policies?
  Is there a systematic method to identify the “best” fields/subfields to index?
    Per format of materials?
    Per user (librarians and end users) needs?
    Good enough search results?

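The preliminary test mentioned above compares what a reduced index retrieves against a set of known correct records. A minimal sketch of that comparison follows. It is not the testbed's method: `build_index` and `recall` are hypothetical helpers, the two records and their identifiers are invented for illustration, and the tokenization (lowercasing, splitting on whitespace, trimming a few punctuation marks) is an assumed policy.

```python
from typing import Dict, Iterable, List, Set

Record = Dict[str, List[str]]   # maps "tag$code" to that subfield's values


def build_index(records: Dict[str, Record], indexed_keys: Iterable[str]) -> Dict[str, Set[str]]:
    """Build a word -> record-id index from only the listed fields/subfields."""
    keys = set(indexed_keys)
    index: Dict[str, Set[str]] = {}
    for rec_id, record in records.items():
        for key, values in record.items():
            if key not in keys:
                continue                      # this field/subfield is not indexed
            for value in values:
                for raw in value.lower().split():
                    token = raw.strip(".,:;/")
                    if token:
                        index.setdefault(token, set()).add(rec_id)
    return index


def recall(index: Dict[str, Set[str]], term: str, expected: Set[str]) -> float:
    """Share of the known correct records that a single-term search retrieves."""
    if not expected:
        raise ValueError("expected must contain at least one record id")
    return len(index.get(term.lower(), set()) & expected) / len(expected)


# Invented records for illustration only.
records = {
    "r1": {"245$a": ["Illegitimacy and adoption in Maine :"], "650$a": ["Adoption"]},
    "r2": {"500$a": ["Notes on adoption law."]},
}
full = build_index(records, ["245$a", "650$a", "500$a"])
reduced = build_index(records, ["245$a", "650$a"])
print(recall(full, "adoption", {"r1", "r2"}), recall(reduced, "adoption", {"r1", "r2"}))
```

Running the same searches against indexes built from different field lists gives a direct, repeatable way to ask whether a shorter list is "good enough".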
References
  Z39.50 Interoperability Testbed
  Indexing Guidelines to Support Z39.50 Profile Searches
    delines1Feb2002.pdf