MARC Content Designation and Utilization
Future of MARC: Challenges and Opportunities of 21st Century Cataloging
William E. Moen
School of Library and Information Sciences
Texas Center for Digital Knowledge
University of North Texas
Research funded by a National Leadership Grant from the Institute of Museum and Library Services. Additional support provided by the University of North Texas School of Library and Information Sciences and the Texas Center for Digital Knowledge.
Inquiry and Analysis
MCDU Project: Massachusetts Library Association Conference -- May Sturbridge, MA
To start…
  Discussion of the future of MARC is only partially about MARC
  The broader digital information landscape
  Technologies
  Cataloging practices
  The possible diminishing market share of:
    – Libraries in the information marketplace
    – Library catalogs as a resource discovery tool
Calhoun’s report
  Today, a large and growing number of students and scholars routinely bypass library catalogs in favor of other discovery tools, and the catalog represents a shrinking proportion of the universe of scholarly information. The catalog is in decline, its processes and structures are unsustainable, and change needs to be swift.
  Today’s research library catalogs, even those that include records for thousands of scholarly e-journals and databases, reflect only a small portion of the expanding universe of scholarly information.
  Library catalogs manage description and access for mostly published resources: tangible materials such as books, serials, and audiovisual media, plus licensed materials such as abstracting and indexing services, full text databases, and electronic journals and books… In contrast, the stuff of cultural heritage collections, digital assets, pre-print services and the open Web, research labs, and learning management systems remain for the most part outside the scope of the catalog.
When we say MARC?
  Record format
    – Defined by ISO 2709/ANSI Z39.2
    – Structural elements of the format
  Metadata scheme
    – Defined by MARC 21
    – Fields, subfields, indicators and their semantics
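The distinction between record format and metadata scheme can be illustrated with a minimal sketch. It assumes the ISO 2709 layout as used by MARC 21: a 24-character leader, a directory of 12-character entries (3-character tag, 4-digit field length, 5-digit starting position), and data fields closed by terminator bytes. The function names and the toy record are hypothetical and exist only for illustration; this is not a complete MARC parser.

```python
# Minimal sketch of the ISO 2709 structure as used by MARC 21.
# Function names and the sample record are illustrative assumptions.
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"
SUBFIELD_DELIMITER = b"\x1f"


def build_record(fields):
    """Build a toy record from (tag, body) pairs, where body is bytes."""
    directory = b""
    data = b""
    for tag, body in fields:
        field = body + FIELD_TERMINATOR
        # Directory entry: tag (3), field length (4), starting position (5).
        directory += f"{tag}{len(field):04d}{len(data):05d}".encode("ascii")
        data += field
    base = 24 + len(directory) + 1          # base address of data
    total = base + len(data) + 1            # full record length
    leader = f"{total:05d}nam a22{base:05d} a 4500".encode("ascii")
    return leader + directory + FIELD_TERMINATOR + data + RECORD_TERMINATOR


def parse_record(raw):
    """Return (leader, [(tag, body), ...]) using only structural rules."""
    leader = raw[:24].decode("ascii")
    base = int(leader[12:17])
    directory = raw[24:base - 1]            # skip the directory terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        body = raw[base + start:base + start + length - 1]  # drop terminator
        fields.append((tag, body))
    return leader, fields


def split_subfields(body):
    """Return (indicators, [(code, value), ...]) for a data field body."""
    parts = body.split(SUBFIELD_DELIMITER)
    indicators = parts[0].decode("utf-8")
    subfields = [(p[:1].decode("utf-8"), p[1:].decode("utf-8")) for p in parts[1:]]
    return indicators, subfields


record = build_record([
    ("001", b"12345"),
    ("245", b"10\x1faFuture of MARC\x1fcMoen"),
])
leader, fields = parse_record(record)
print(fields[0])
print(split_subfields(fields[1][1]))
```

Nothing in the parsing step knows what tag 245 or subfield a means. That knowledge belongs to the metadata scheme (MARC 21), while the leader, directory, and terminators belong to the record format.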
Approaching MARC’s future
  Requirements for a record format / metadata scheme
  Responding to recent developments
  Looking at empirical data
Thinking about requirements
  Goldsmith & Knudson’s Requirements: LANL’s DL Repository
    – Granularity: lossless data mapping without losing the finer shades of meaning intrinsic to the original data
    – Transparency: necessary for seamless data interchange, requiring a standard widely known throughout the digital library community
    – Extensibility: in order to permit changes to the general structure without breaking the whole or requiring reprocessing of already ingested materials
  Tennant’s Requirements for Bibliographic Infrastructure
    – XML-based format
    – Modularity
    – Hierarchy support
    – Community-supported tool sets
    – And others…
Thinking about requirements
  McCallum’s 10 format attributes for MARC Forward
    – XML
    – Granularity
    – Versatility
    – Extensibility
    – Modularity
    – Hierarchy support
    – Crosswalks
    – Tools
    – Cooperative management
    – Pervasive
Recent developments
  Functional Requirements for Bibliographic Records
  IFLA Study Group on Functional Requirements for Bibliographic Records
    – “A conceptual model for the bibliographic universe” (B. Tillett, 2003).
  The aim of the study was to produce a framework that would provide a clear, precisely stated, and commonly shared understanding of what it is that the bibliographic record aims to provide information about, and what it is that we expect the record to achieve in terms of answering user needs.
The FRBR model
  Remember: what it is that the bibliographic record aims to provide information about
  Based on Entity-Relationship modeling
    – Entity: something that can be described
    – Attributes: the features of the entity that characterize it
    – Relationships between entities
  Three groups of entities in model
    – Group 1: Products of intellectual or artistic endeavor
    – Group 2: Entities responsible for the intellectual or artistic content, the physical production, etc.
    – Group 3: Entities that serve as the subjects of intellectual or artistic endeavor
FRBR – Group 1 Entities
FRBR – Group 2 Entities
FRBR – Group 3 Entities
FRBR user tasks
  Remember: what it is that we expect the record to achieve in terms of answering user needs
  Four user tasks:
    – Find: Discover whether something exists by searching one or more attributes
    – Identify: Examine retrieved records to determine the items that meet the user’s search request
    – Select: Examine retrieved records for those that meet other user needs/requirements
    – Obtain: Use data in retrieved records to gain physical access to the described object
Impact on cataloging and catalogs
  Introduces new terminology and a conceptual model incorporated in:
    – RDA
    – Statement on cataloging principles
  Assists in better understanding the range of relationships in the bibliographic universe
    – Collocation function of the catalog
    – Improved linking mechanisms
  Implementation in catalogs to improve user experience
Recent developments
  Revision of the Anglo-American Cataloguing Rules
    – No AACR3
    – Resource Description and Access (RDA)
    – Focus on guidelines for content creation
    – Separation from syntax or record format
  Designing the future -- Library Systems and Data Formats (wiki)
    – Grassroots effort to address next generation library catalog and data format
Metadata
  Essential in library applications
  Variety of metadata schemes
  Variety of functions and services supported
  Increasing use of machine-generated metadata
  Role of handcrafted metadata needs continuing review and assessment
  Research on use of metadata schemes can provide empirical data for decisions
Metadata record as artifact
  Metadata creation as process
  Resulting metadata records as artifacts of the process
  Artifact reflects decisions, policies…
  Artifact can be investigated to understand metadata utilization decisions
  Decisions to use or not use available metadata elements
Metadata – rules & practice
  Library catalogers create metadata: bibliographic records
  Follow cataloging rules and other standards to create the bibliographic data
  Encode the bibliographic data into MARC records
  MARC: communications format and metadata scheme
  Approximately 2,000 structures for encoding data
Richness of MARC
  MARC 21 Field Groups Currently Defined (in MARC 21 or OCLC MARC Bibliographic Format)
  [Table: number of fields defined in each field group, 0xx through 9xx, with a total; cell values not recoverable]
What do catalogers use?
  Given the cataloging rules…
  Given the detailed structuring of bibliographic data in MARC records…
  Given training of the catalogers…
  Given local policies and practices…
  What can we learn by examining a large set of MARC bibliographic records?
Why study MARC utilization?
  Standard record structure for exchange of descriptive and other types of metadata
  Evolved since late 1960s as key mechanism for sharing metadata among libraries
  Metadata record with approximately 2,000 elements available
    – Approximately 200 fields
    – Approximately 1,800 subfields or other structures
  To what extent is the richness/complexity exploited and to what purpose?
  See Goldsmith and Knudson regarding Los Alamos Research Library choice of a metadata scheme:
    Although often disparaged or dismissed in the library community, the MARC standard, notably the MARCXML standard, provides surprising flexibility and robustness for mapping disparate metadata to a vendor-neutral format for storage, exchange, and downstream use.
Occurrence summary
  Frequency             # of Fields/Subfields   % of All Occurrences
  > 600,000             …                       …
  500,000 to 599,999    0                       0%
  400,000 to 499,999    …                       …
  300,000 to 399,999    …                       …
  200,000 to 299,999    …                       …
  100,000 to 199,999    …                       …
  TOTAL                 36                      79.5%
  Only 4% of all fields/subfields account for 80% of all occurrences
  96% of all fields/subfields account for only 20% of all occurrences
The MCDU Project
  MARC Content Designation Utilization
  Provide empirical evidence of catalogers’ use of MARC content designation
  Identify commonly used elements of bibliographic records
  Contribute to community discussion about core elements in MARC bibliographic records
  Explore the evolution of MARC content designation
  Develop research approach to understand the factors influencing levels of MARC content designation use
Project deliverables
  Reports containing results of analysis of utilization
  Reports addressing commonly used elements
    – Across formats
    – In context of national recommendations (e.g., BIBCO)
    – In context of FRBR user tasks
  HistoriMARC
    – Database of MARC historical information about evolution of fields/subfields, etc.
    – Enable analysis of patterns of adoption and utilization
  A methodology to understand factors influencing catalogers’ use of MARC
  Software tools and methods for others to use
Dataset and preparation
  56,177,383 MARC 21 Bibliographic Records from OCLC WorldCat
  Decomposed the records to store in MySQL
    – Parsing Tool
    – 82 hours to process and load records
    – 295 GB final database size (with indexing)
  Structuring of decomposed records aligns with analytical questions
Additional data preparation
  Analysis required determining frequency counts by format of material (ten formats)
  Concern about significant differences in patterns of utilization between Library of Congress and OCLC member cataloging
  Partitioned decomposed data into 20 databases
    – Based on source of cataloging
    – Based on format of material
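The decomposition and partitioning described above might look roughly like the following sketch. It makes several assumptions that are not from the project: SQLite stands in for MySQL so the example runs without a server, the table and column names are hypothetical, and source and format are stored as columns in one table instead of being split into 20 separate databases. The sample records are invented.

```python
import sqlite3

# Illustrative sketch only: schema, names, and sample data are assumptions.


def decompose(conn, records):
    """Store one row per subfield occurrence of every record."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS occurrence ("
        "record_id TEXT, source TEXT, material TEXT, "
        "tag TEXT, code TEXT, value TEXT)"
    )
    rows = [
        (rec["id"], rec["source"], rec["material"], tag, code, value)
        for rec in records
        for tag, subfields in rec["fields"]
        for code, value in subfields
    ]
    conn.executemany("INSERT INTO occurrence VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()


def occurrences_by_partition(conn, source, material):
    """Count occurrences of each tag/code pair in one source/format partition."""
    cur = conn.execute(
        "SELECT tag, code, COUNT(*) FROM occurrence "
        "WHERE source = ? AND material = ? "
        "GROUP BY tag, code ORDER BY tag, code",
        (source, material),
    )
    return cur.fetchall()


sample = [
    {"id": "r1", "source": "LC", "material": "Books",
     "fields": [("245", [("a", "Title one")]),
                ("650", [("a", "Cataloging"), ("a", "Metadata")])]},
    {"id": "r2", "source": "non-LC", "material": "Music",
     "fields": [("245", [("a", "Title two")])]},
]
conn = sqlite3.connect(":memory:")
decompose(conn, sample)
print(occurrences_by_partition(conn, "LC", "Books"))
```

Storing one row per occurrence keeps the frequency questions simple to express as grouped counts, at the cost of a much larger database than the original records.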
MCDU Project Dataset: 56,177,383 records in total
                                        LC-Created Records    Non-LC-Created Records    Total
                                        Number       %        Number        %
  MCDU Project Dataset by LC/nonLC      8,713,…      …        …,463,…       …           56,177,383
  Books                                 7,595,887    …        …,546,…       …           …,142,087
  Cartographic Materials                242,…        …        …             …           …,774
  Electronic Resources                  39,…         …        …             …           …,760
  Continuing Resources                  388,…        …        …,193,…       …           …,581,341
  Manuscripts                           11,…         …        …,390,…       …           …,402,441
  Music                                 109,…        …        …,167,…       …           …,276,903
  Sound Recordings                      241,…        …        …,702,…       …           …,944,282
  Projected Media                       22,…         …        …,415,…       …           …,437,694
  Graphic Materials                     62,…         …        …             …           …,026
  Three-Dimensional Objects and Realia  …            …        …             …           …,075
Categories of questions
  General profile of the dataset (e.g.):
    – What is the distribution of records by Type of Record?
    – What is the distribution of records by Encoding Level?
  Occurrences of content designation structures:
    – What is the number of total occurrences of all control and data fields and how many unique field tags are used?
    – In how many and in what percentage of records is each unique field/subfield combination used at least once?
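The two occurrence questions can be sketched as a pair of counters. This is a minimal illustration and not the project's software: it assumes each record has already been reduced to a list of (tag, subfield code) pairs, and the function name and sample data are hypothetical.

```python
from collections import Counter


def utilization(records):
    """Summarize use of each (tag, code) pair across a set of records.

    Returns {pair: (records_using, percent_of_records, total_occurrences)}.
    """
    total_occurrences = Counter()
    records_using = Counter()
    record_count = 0
    for pairs in records:
        record_count += 1
        total_occurrences.update(pairs)     # every occurrence counts
        records_using.update(set(pairs))    # at most once per record
    return {
        pair: (records_using[pair],
               100.0 * records_using[pair] / record_count,
               total_occurrences[pair])
        for pair in total_occurrences
    }


# Invented sample: two records, one with a repeated 650 $a.
sample_records = [
    [("245", "a"), ("650", "a"), ("650", "a")],
    [("245", "a")],
]
for pair, stats in sorted(utilization(sample_records).items()):
    print(pair, stats)
```

Keeping the two counters separate matters for repeatable fields: a subject heading field can have far more total occurrences than records in which it appears.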
Example results
  7,595,887 LC-created records in dataset
  Type of Record: Book, Pamphlets, and Printed Sheets
  Total number of unique fields occurring: 167
  Number of fields accounting for 80% of occurrences: 14 fields (8.3%)
  Number of fields accounting for 90% of occurrences: 21 fields (12.6%)
  Approximately 110 fields (66%) occur in less than 1% of all records
  [Note: Fields are cataloger-supplied, not system-supplied]
  Field Tag   Records Where Field Is Used at Least Once   Total Occurrences of Field   Cumulative % of Field Occurrences
  650         5,387,282                                   11,778,…                    …%
  008         7,595,…                                     …                           …%
  245         7,595,…                                     …                           …%
  010         7,595,…                                     …                           …%
  300         7,586,264                                   7,586,…                     …%
  260         7,585,926                                   7,585,…                     …%
  050         7,027,027                                   7,095,…                     …%
  100         5,626,011                                   5,626,…                     …%
  500         3,264,297                                   4,582,…                     …%
  020         3,845,934                                   4,235,…                     …%
  082         4,034,888                                   4,036,…                     …%
  043         3,665,624                                   3,665,…                     …%
  504         3,373,297                                   3,403,…                     …%
  700         2,312,712                                   3,240,…                     …%
  …           …,563                                       2,327,…                     …%
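The cumulative percentage column, and statements such as "14 fields account for 80% of occurrences", follow from ranking fields by total occurrences and accumulating. The sketch below shows that arithmetic under stated assumptions: the function names are hypothetical and the counts are invented for illustration, not taken from the MCDU dataset.

```python
def cumulative_percentages(occurrences):
    """Rank fields by total occurrences and return (tag, cumulative %) pairs."""
    total = sum(occurrences.values())
    running = 0
    ranked = sorted(occurrences.items(), key=lambda item: item[1], reverse=True)
    result = []
    for tag, count in ranked:
        running += count
        result.append((tag, 100.0 * running / total))
    return result


def fields_needed(occurrences, percent):
    """Smallest number of top-ranked fields that reach `percent` of occurrences."""
    total = sum(occurrences.values())
    running = 0
    for rank, count in enumerate(sorted(occurrences.values(), reverse=True), start=1):
        running += count
        if running * 100 >= percent * total:   # integer comparison, no rounding
            return rank
    return 0


# Invented counts for four fields.
counts = {"650": 50, "245": 30, "300": 15, "041": 5}
print(cumulative_percentages(counts))
print(fields_needed(counts, 80), fields_needed(counts, 90))
```

Dividing the result of `fields_needed` by the number of unique fields gives the share of fields reported on the slides; for example, 14 of 167 fields is about 8.4%.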
Making sense of numbers
  Frequency counts provide raw but informative data
  Threshold: concept to delineate a change in trend in utilization
  Determining commonly occurring elements
  Comparing to recommended core records
  Comparing to recommendations for national level records
  Comparing to the FRBR user tasks data
Element use and FRBR tasks
  FRBR describes four user tasks
    – Find
    – Identify
    – Select
    – Obtain
  Are library catalogers providing data to support FRBR tasks?
  Delsey mapped these tasks to MARC CDS for FRBR entities
FRBR user task: Find (search)
  MARC 21 fields/subfields that can contain author, title, or subject data
    – Author-related fields/subfields: 119
    – AuthorTitle-related fields/subfields: 21
    – Title-related fields/subfields: 253
    – Subject-related fields/subfields: 144
  In FRBR context, Delsey identified:
    – Approximately 460 fields/subfields can support this task for the FRBR entities
  In MCDU dataset, only 59 (13%) of these occur at or above the threshold of use in OCLC book records
Questions for consideration?
  What is needed in a bibliographic record?
    – Support for the four user tasks?
    – In context of FRBR, what does it mean to support a user task?
    – Management of information resources?
  How do your systems use the infrequently used data?
  What about the 62% of all fields used in less than 1% of the records?
Questions for consideration?
  Can you argue persuasively for the cost/benefit of your existing practice?
  Should the focus be on high-value, high-impact, high-quality data in a few fields/subfields?
  Can you identify these few fields/subfields?
  What would it mean for costs of cataloging?
  What would this mean for training?
  Can MCDU results inform your local practices?
New cataloging practices?
  Select the appropriate metadata scheme.
    – Use level of description and schema (DC, LOM, VRA Core, etc.) appropriate to the bibliographic resource. Don’t apply MARC, AACR2, and LCSH to everything.
    – Consider …abandoning the use of controlled vocabularies [LCSH, MeSH, etc.] for topical subjects in bibliographic records.
  Manually enrich metadata in important areas
    – Enhance name, main title, series titles, and uniform titles for prolific authors in music, literature, and special collections.
  Automate Metadata Creation
    – Encourage the creation of metadata by vendors, and its ingestion into our catalog as early as possible in the process.
    – Import enhanced metadata whenever, wherever it is available from vendors and other sources.
  Rethinking How We Provide Bibliographic Services for the University of California (December 2005)
Confluence for change
  Within library community…
    – Influence of FRBR concepts and model for metadata
    – Resource Description and Access (RDA)
    – Re-examination of library catalog and its position within the landscape of resource discovery tools
    – Development of a bibliographic metadata element set
    – Next generation “MARC”
References
  MARC Content Designation Utilization Project
  Moen and Benardino. (2003). Assessing Metadata Utilization: An Analysis of MARC Content Designation Use.
  Goldsmith and Knudson. Repository Librarian and the Next Crusade: The Search for a Common Standard for Digital Repository Metadata.
  Roy Tennant. (2004). A Bibliographic Metadata Infrastructure for the Twenty-first Century.
  Sally H. McCallum. (2006). MARC Forward.
  Designing the future -- Library Systems and Data Formats.
  Barbara Tillett. (2003). What is FRBR? A Conceptual Model for the Bibliographic Universe.
  Karen Calhoun. (2006). The Changing Nature of the Catalog and its Integration with Other Discovery Tools.
  Bibliographic Services Task Force. (2005). Rethinking How We Provide Bibliographic Services for the University of California.