Society of American Archivists Research Forum 18 August 2015 A Deep Dive into the Archival MARC Records in WorldCat (and ArchiveGrid) Jackie Dooley, Program Officer, OCLC Research.

Similar presentations
MARC 101 for Non-Catalogers Colorado Horizon Users Group Meeting Philip S. Miller Library Castle Rock, CO May 29, 2007.

RDA and DACS: Using a MARC-EAD Crosswalk to Improve Access to Special Collections Resources, a Project at UWG GUGM May 15, 2014 Presenters: Blynne Olivieri.
Lis512 lecture 4 the MARC format structure, leader, directory.
Shared Print Management Metadata Guidelines Pilot Project OCLC project to develop and test recommendations for how libraries could use Worldcat.
Providing Online Access to the HKUST University Archives: EAD to INNOPAC Sintra Tsang and K.T. Lam The Hong Kong University of Science and Technology 7th.
Metadata for Digital Content Jane Mandelbaum, Ann Della Porta, Rebecca Guenther.
IS 257 – Fall 2007 Codes and Rules for Description: History 2 University of California, Berkeley School of Information IS 245: Organization.
5th September 2003 Diane Tough Content Creation at the NHM or The evolving catalogue!
Introduction to Library Research Gabriela Scherrer Reference Librarian for English Languages and Literatures, University Library of Bern.
CS 502: Computing Methods for Digital Libraries Lecture 17 Descriptive Metadata: Dublin Core.
The Library Cataloging Tradition
LSTA Digital Imaging Grants Presentation Projects Workshop September 13, 2002 Wendy Sistrunk Music Catalog Librarian University of Missouri—Kansas City.
Introduction to MARC Cataloguing Part 2 Presenters: Irma Sauvola: Part 1 Dan Smith: Part 2.
Opening up the bibliography for the future The Danish Scenario: taking Danish National Bibliography reuse to the next level Carsten H. Andersen Director.
EMu and Archives NA EMu Users Conference – Oct EMu and Archives Experiences from the Canada Science and Technology Museum Corporation.
October 23, Expanding the Serials Family Continuing resources in the library catalogue.
Sage Library Consortium Cataloging-in-Publication MARC record conversion.
CATALOGING NON-TRADITIONAL (MOSTLY ONLINE) MATERIALS The Whys and Hows.
OCLC Local Holdings Records (LHRs) for the UCs CAMCIG Training October 20, 2009 Presenter: Sara Shatford Layne.
Mark Sullivan University of Florida Libraries Digital Library of the Caribbean.
The Metadata Object Description Schema (MODS) NISO Metadata Workshop May 20, 2004 Rebecca Guenther Network Development and MARC Standards Office Library.
BEYOND THE OPAC: FUTURE DIRECTIONS FOR WEB-BASED CATALOGUES Martha M. Yee September 11, 2006 draft.
© WRLC November 2005 Research Commons Supporting Scholarship in the 21st Century.
Estonian Web and Bibliographic Control Janne Andresoo.
The Library Cataloging Tradition Marty Kurth CS 431 February 9, 2005 [slides stolen from Diane Hillmann]
ARCHIVISTS’ TOOLKIT WORKSHOP March 13, 2008 Christine de Catanzaro Jody Thompson.
IS 257 – Fall 2007 Introduction to Description and AACR II University of California, Berkeley School of Information IS 245: Organization.
DACS Describing Archives: A Content Standard. The Background: Archives, Personal Papers & Manuscripts, 1980s –New Technologies with Web, XML, EAD –Revision.
389F/Description: ARCHIVAL DESCRIPTION. INTRODUCTION. Finding Aid: Any descriptive medium that establishes physical, administrative and/or.
Evolving MARC 21 for the future Rebecca Guenther CCS Forum, ALA Annual July 10, 2009.
AACR2 Pt. 1, Monographic Description LIS Session 2.
RDA Compared with AACR2 Presentation given at the ALA conference program session Look Before You Leap: taking RDA for a test-drive July 11, 2009 by Tom.
RDA and Special Libraries Chris Todd, Janess Stewart & Jenny McDonald.
RDA DAY 1 – part 2 web version. When you catalog a “book” in hand: You are working with a FRBR Group 1 Item The bibliographic record you create will.
The physical parts of a computer are called hardware.
Using ArchiveGrid to Promote Archival Collections The Future of Collections: Creating and Managing Digital Content – Gainesville, FL – February 20, 2013.
Description of Bibliographic Items. Review Encoding = Markup. The library cataloging “markup” language is MARC. Unlike HTML, MARC tags have meaning (i.e.,
Functional Requirements for Bibliographic Records The Changing Face of Cataloging William E. Moen Texas Center for Digital Knowledge School of Library.
MARC Content Designation and Utilization Learning from Artifacts: Metadata Utilization Analysis William E. Moen School of Library and Information Sciences.
AACR 2 –Rules for Descriptive Cataloguing
Metadata “Data about data” Describes various aspects of a digital file or group of files Identifies the parts of a digital object and documents their content,
OCLC Research Library Partnership Work-In-Progress webinar 3 December 2015 A Close Look at the Four Million Archival MARC Records in WorldCat Jackie Dooley.
Mr. P’s Class Term Paper All the Steps on the Path to an “A” Term Paper in World History.
Not Just Suite Talk: Revising Graphic Materials into Descriptive Cataloging of Rare Materials (Graphics) 15 August 2009 DACS & Companion Standards.
An Inquiry and Analysis of Metadata Utilization A Case Study of MARC 2005 ASIS&T Annual Meeting, November 1, 2005, Charlotte, North Carolina William E.
Sally McCallum Library of Congress
EAD 101: An Introduction to Encoded Archival Description XML and the Encoded Archival Description: Providing Access to Collections Oregon Library Association.
Presented by: Amy Carson, Trisha Hansen and Jonathan Sears.
MARC Content Designation Use Implications for indexing & interoperability William E. Moen School of Library and Information Sciences Texas Center for.
TAG YOU’RE IT: ENHANCING ACCESS TO GRAPHIC NOVELS WENDY WEST
A centre of expertise in digital information management UKOLN is supported by: Metadata – what, why and how Ann Chapman.
MARC Tags to BIBFRAME Vocabulary: a new view of metadata Sally McCallum Library of Congress ALA - January 2014.
A Complex Standard and Its Use Results from an empirical analysis of MARC 2004 Texas Library Association Annual Conference, March 18, 2004, San Antonio,
SILO File Upload & Feedback System By Marie Harms State Library of Iowa August 18 & 19, 2010.
AN ARCHETYPE FOR INFORMATION ORGANIZATION AND CLASSIFICATION OCLC WorldCat.
7th Annual Hong Kong Innovative Users Group Meeting
How to create a virtual TYP field via tab_type_config.lng
WHAT DOES THE FUTURE HOLD? Ann Ellis Dec. 18, 2000
Increasing discoverability and access for the Massachusetts Archives
Moving Beyond APPM to a Global Standard:
Using OCLC Work IDs for Discovery
Cataloging Tips and Tricks
What It Is, and Why It Helps Christine de Catanzaro August 23, 2006
MARC: Beyond the Basics 11/24/2018 (C) 2006, Tom Kaun.
IL Step 3: Using Bibliographic Databases
IDEALS at the University Of Illinois: A Case Study of Integration Between an IR and Library Discovery Systems Sarah L. Shreeves University of Illinois.
‘Splitting’ the MUSIC format
Presentation transcript:

Society of American Archivists Research Forum, 18 August 2015
A Deep Dive into the Archival MARC Records in WorldCat (and ArchiveGrid)
Jackie Dooley, Program Officer, OCLC Research

OVERVIEW
– Research objective
– Research questions
– The data set
– High-level findings
– Next steps

RESEARCH OBJECTIVE

Research Objective
Establish a detailed profile of MARC data element occurrences in archival catalog records, providing a view of 30+ years of practice, in order to:
– Reveal variations in descriptive practice
– Debunk inaccurate assumptions
– Characterize practice before MARC usage diminishes
– Suggest improvements in descriptive practice
– Enable analysis of implications for discovery

SAMPLE RESEARCH QUESTIONS

Sample research questions
– Are descriptions and index terms rich enough to enable effective discovery of archival materials?
– In what significant ways does archival description differ from one type of material to another?
– To what extent does use of the archival control byte successfully capture the universe of archival descriptions?
– Is it true that archivists usually describe materials at the collection level?
– How often is DACS used as the content standard? And APPM as its predecessor?
– To what extent are the DACS minimum requirements met?

THE DATA SET

Archival records in WorldCat
OCLC’s WorldCat database of 300+ million records, filtered to extract “archival” records (currently 4 million, or about 1% of the total).
Brief version of the filter specs:
– “Unpublished” materials in any format (e.g., text, visual, moving image, sound recording)
– Coded for “archival control” (Leader byte 08)
– Held by a single institution (i.e., only one attached holding)
– Excludes published materials in any format, as well as theses and dissertations
Spoiler alert: It’s not perfect.

The full filter specs (same records as in ArchiveGrid):
– Only one library holding symbol is attached (to eliminate non-unique items or collections).
– The MARC Leader has one or more of the following:
  – Leader byte 06 (record type) has the value d (manuscript music), f (manuscript cartographic), g (projected graphics), i (nonmusic recording), j (music recording), k (visual), p (mixed), r (realia), or t (textual manuscript). [does this include all the new ones?]
  – Leader byte 06 has the value "a" (language material) and Leader byte 07 (bibliographic level) has the value "c" (collection).
  – Leader byte 08 has the value "a" (archival control).
– Field 260 subfields "a" and "b" are not present (to filter out published works).
– "Bibliography" does not occur at the beginning of any MARC subject heading subfield "a" or "v" (to filter out published works).
– Field 502 is not present (to filter out theses and dissertations).
– Records with material type "book" or "serial" must have no value in the field 008 or 006 “Nature of Contents” bytes (to eliminate theses, reference works, and other non-archival materials).
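
The full filter can be read as one boolean predicate over a record. The Python below is a minimal sketch, not OCLC's implementation: the `Record` class is a hypothetical, simplified stand-in for a parsed MARC record, the wording on field 260 is interpreted as "exclude the record if any 260 carries $a or $b", and treating a blank or the fill character `|` as "no value" in the Nature of Contents bytes (008/24-27, or bytes 07-10 of a language-material or serial 006) is likewise an assumption.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified record model (not pymarc and not an OCLC API).
@dataclass
class Record:
    leader: str                                  # 24-character MARC Leader
    control: dict = field(default_factory=dict)  # tag -> list of control-field strings ("006", "008")
    data: dict = field(default_factory=dict)     # tag -> list of occurrences, each [(code, value), ...]
    holdings: int = 1                            # number of attached holding symbols

ARCHIVAL_TYPES = set("dfgijkprt")  # Leader/06 values named in the filter
NO_VALUE = set(" |")               # assumption: blank or fill character counts as "no value"

def leader_qualifies(ldr: str) -> bool:
    """Leader/06 archival type, OR language-material collection (06=a, 07=c), OR Leader/08=a."""
    return ldr[6] in ARCHIVAL_TYPES or (ldr[6] == "a" and ldr[7] == "c") or ldr[8] == "a"

def has_260_ab(rec: Record) -> bool:
    # Assumption: any 260 carrying $a (place) or $b (publisher) marks a published work.
    return any(code in ("a", "b") for occ in rec.data.get("260", []) for code, _ in occ)

def has_bibliography_subject(rec: Record) -> bool:
    # True if any 6xx subject heading has a $a or $v beginning with "Bibliography".
    return any(
        code in ("a", "v") and value.startswith("Bibliography")
        for tag, occs in rec.data.items() if tag.startswith("6")
        for occ in occs
        for code, value in occ
    )

def nature_of_contents_coded(rec: Record) -> bool:
    """True if a book/serial record has a code in 008/24-27 or in an 006's bytes 07-10."""
    if rec.leader[6] not in ("a", "t"):  # only book/serial (language material) records are checked
        return False
    spans = [f[24:28] for f in rec.control.get("008", [])]
    spans += [f[7:11] for f in rec.control.get("006", []) if f[:1] in ("a", "t", "s")]
    return any(ch not in NO_VALUE for span in spans for ch in span)

def is_archival(rec: Record) -> bool:
    return (
        rec.holdings == 1
        and leader_qualifies(rec.leader)
        and not has_260_ab(rec)
        and not has_bibliography_subject(rec)
        and not rec.data.get("502")          # theses and dissertations
        and not nature_of_contents_coded(rec)
    )
```

Because each clause maps to one bullet of the spec, trying out a tweak to the filter amounts to editing a single function.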

Briefest version of the filter specs:
– “Unpublished” materials in any format
– Under “archival control”
– Held by a single institution
– Excludes all published materials
Spoiler reminder: It’s not perfect. So what do you think of our scoping of archival data elements?

HIGH-LEVEL FINDINGS
A. Full data
B. Mixed materials
C. Text
D. Visual materials
E. Music scores
F. Maps
G. Audio recordings

Percent of records by type of material

A. Full data
– “Archival control”: 28% of records
– Dates: nearly half have a date span
– Bibliographic level:
  – 53% describe collections
  – 40% describe single items
  – “Component” levels rarely used
– 95% are mixed materials, text, or visual materials
– 85% have ≥1 indexed creator name
– 75% have ≥1 indexed subject term
– 30% have an 856 field (link to external content)
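
Bibliographic level is carried in Leader byte 07 (c = collection, m = monograph/item; a, b, and d are component parts or subunits), and archival control in Leader byte 08. A minimal Python sketch of how shares like those above could be tallied from 24-character Leader strings; folding a, b, and d together as “component” is an assumption about how the slide's categories were defined.

```python
from collections import Counter

# MARC 21 Leader/07 values; folding a, b, d into "component" is an assumption.
LEVELS = {"c": "collection", "m": "item", "a": "component", "b": "component",
          "d": "component", "i": "integrating resource", "s": "serial"}

def level_shares(leaders):
    """Percent of records at each bibliographic level, read from Leader/07."""
    if not leaders:
        return {}
    tally = Counter(LEVELS.get(ldr[7], "other") for ldr in leaders)
    return {level: round(100 * n / len(leaders), 1) for level, n in tally.items()}

def archival_control_share(leaders):
    """Percent of records with Leader/08 = 'a' (archival control)."""
    if not leaders:
        return 0.0
    return round(100 * sum(ldr[8] == "a" for ldr in leaders) / len(leaders), 1)
```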

Bibliographic level by type of material

Inclusion of 6xx (subject) index terms

A. Full data, cont.
– Cataloging level:
  – 29% full cataloging
  – 25% minimal
  – 44% unknown
– Cataloging rules:
  – Specified in 30% of records
  – appm in 18% of records, dacs in 7%, gihc in 5%
– Form of material: used most heavily for non-textual materials
– Language:
  – Two thirds in English
  – Not specified in ≥ 25% of records
– Place of publication vs. location of repository
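
The lowercase codes appm, dacs, and gihc match the description-convention source codes that MARC records carry in field 040 subfield $e, so shares like these can presumably be tallied from that subfield (an assumption; the slides do not say which field was counted). A minimal Python sketch, taking each record's 040 as a hypothetical list of (subfield code, value) pairs; a record citing two conventions counts once toward each.

```python
from collections import Counter

def convention_shares(fields_040):
    """Return (percent of records with any 040 $e, {code: percent of records citing it}).

    fields_040: one list of (subfield_code, value) pairs per record (hypothetical input shape).
    """
    n = len(fields_040)
    if n == 0:
        return 0.0, {}
    tally, specified = Counter(), 0
    for subfields in fields_040:
        codes = {value.strip().lower() for code, value in subfields if code == "e"}
        specified += bool(codes)   # counts records that name at least one convention
        tally.update(codes)        # a set, so each code counts at most once per record
    shares = {code: round(100 * k / n, 1) for code, k in tally.items()}
    return round(100 * specified / n, 1), shares
```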

B. Mixed Materials
– 44% of all records
– 50% are under archival control
– 94% are collection records, 5% are components
– 1xx in 70% of records
– Title: 11% have no 245 $a
– Notes:
  – 520 in 74% of records
  – 545 field in 31% of records
  – 500 field in 39% of records
  – No other 5xx used in ≥ 25% of records

B. Mixed Materials, cont.
– 600 in 40% of records; mean of 1.5 per record
– 650 in 52% of records; mean of 3.0 per record
– 651 in 45% of records; mean of 1.3 per record
– 655 in 63% of records; mean of 1.3 per record
– 7xx in 28% of records
– 856 in 29% of records
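
Each line above pairs two statistics: the share of records containing at least one occurrence of a tag, and the mean number of occurrences per record. Because some means elsewhere in the findings fall below 1 (e.g., 651 in textual records: 31% of records, mean 0.8), the mean appears to be taken over all records in the category, not only those containing the field. A minimal Python sketch under that assumption, with each record reduced to a hypothetical list of its field tags (one entry per occurrence) and an x wildcard so that patterns such as 7xx also work.

```python
def tag_matches(tag: str, pattern: str) -> bool:
    """'7xx' matches any 7XX tag; '650' matches only 650."""
    return len(tag) == len(pattern) and all(p in ("x", t) for p, t in zip(pattern, tag))

def field_profile(records, pattern):
    """Return (percent of records with >=1 matching field, mean matches per record).

    The mean is divided by *all* records, including those without the field.
    """
    if not records:
        return 0.0, 0.0
    counts = [sum(tag_matches(tag, pattern) for tag in rec) for rec in records]
    present = sum(1 for c in counts if c > 0)
    return 100 * present / len(records), sum(counts) / len(records)
```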

C. Text
– 25% of all records:
  – 4% are book and pamphlet collections
  – 21% are textual manuscripts
– 25% of textual manuscript records are under archival control
– 30% are collection records, 70% are items
– 1xx in 77% of records
– Title: 11% have no 245 $a
– Notes:
  – 43% have a 520 field
  – 54% have a 500 field

C. Text, cont.
– 600 in 31% of records; mean of 0.9 per record
– 650 in 42% of records; mean of 1.7 per record
– 651 in 31% of records; mean of 0.8 per record
– 655 in 36% of records; mean of 0.7 per record
– 7xx in 50% of records

D. Visual Materials
– 26% of all records
– ≤ 10% are under archival control
– 57% have 007 (technical data values)
– 15% are collection records, 76% are items
– 1xx in 51% of records
– Notes:
  – 500 in 77% of records
  – 520 in 68% of records
  – 540 in 57% of records

D. Visual Materials, cont.
– 600 in 32% of records; mean of 1.1 per record
– 650 in 68% of records; mean of 4.2 per record
– 651 in 38% of records; mean of 1.5 per record
– 655 in 81% of records; mean of 1.5 per record
– 7xx in 31% of records
– 856 in 48% of records

E. Music Scores
– 4% of all records
– 1xx in 90% of records
– 240 in 41% of records
– 500 in 96% of records; negligible use of other 5xx’s
– 650 in 96% of records; mean of 2.4 per record
– 655 in 34% of records; genre/form terms often in 650 instead
– 856 in 25% of records

F. Maps
– Less than 1% of all records
– 65% have 007 (technical data values)
– Field 043 (hierarchical geographic area code) in 80% of records
– 052 in 66% of records (geographic classification)
– 1xx in 53% of records
– 255 in 92% of records (cartographic mathematical data)

F. Maps, cont.
– 500 in 93% of records; use of other 5xx’s negligible
– 650 in 68% of records; mean of 2.8 per record
– 651 in 83% of records; mean of 2.7 per record
– 655 in 84% of records; mean of 1.8 per record
– 7xx in 50% of records

G. Audio Recordings
– Less than 1% of all records
– 60% have 007 (technical data values)
– 1xx in 83% of records
– Notes:
  – 500 in 77%
  – 520 in 68%
  – 530 in 27%
  – 540 in 57%

G. Audio Recordings, cont.
– 650 in 68%; mean of 5.2 per record
– 651 in 47%; mean of 0.9 per record
– 655 in 67% of records; mean of 1.2 per record
– 7xx in 100% of records
– 856 in 22% of records

NEXT STEPS

– Draw conclusions (a few for starters):
  – Mixed and textual materials are cataloged as collections; other formats not so much
  – The “archival control” byte is far from universally used, so it has little value
  – Few of the note fields added for the archival or visual materials communities are widely used (does it matter?)
  – As many as 25% of titles for mixed and textual collections make for lousy browsing (e.g., “Papers” or “Records”)
– Ponder implications for next-gen cataloging (linked data, BIBFRAME, schema.org)
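
Flagging titles like these is a simple normalize-and-look-up task. A minimal Python sketch, assuming the title proper is taken from 245 $a; the term list holds only the two examples from the slide and is meant to be extended, and stripping trailing punctuation (such as the comma in “Papers,”) is an assumption about how such titles are transcribed.

```python
import re

GENERIC_TITLES = {"papers", "records"}  # the slide's examples; extend as needed

def is_generic_title(title_a: str) -> bool:
    """True if a 245 $a value is nothing but a generic term, e.g. 'Papers,'."""
    core = re.sub(r"[\s.,:;/=]+$", "", title_a).strip().lower()
    return core in GENERIC_TITLES

def generic_share(titles):
    """Percent of titles that are generic (0.0 for an empty list)."""
    if not titles:
        return 0.0
    return 100 * sum(is_generic_title(t) for t in titles) / len(titles)
```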

Please send feedback
– Do the data debunk any assumptions?
– Are you dubious about any of the data?
– Would you tweak the specs of our filter?
– Are changes in practice called for?
– What other questions should I be asking?
– Is this a useful project or just an “interesting” one?

Publications & future research
– Publish this data
– Second paper: Implications for discovery
– Future research?
  – Data content
  – Potential for data remediation:
    – Generic titles (e.g., Papers, Records)
    – Missing language codes
    – Other?
  – Descriptive practice for web archiving
– If you need an OCLC data set for research...
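
A remediation check for missing language codes can be sketched the same way. The language of the material is coded in 008 bytes 35-37, and field 041 $a can also record language codes. A minimal Python sketch; treating blanks and the fill character `|` as “not specified” is an assumption, while und (undetermined) and zxx (no linguistic content) are treated as deliberate codes.

```python
def language_missing(f008: str, codes_041a=()) -> bool:
    """True if 008/35-37 holds no language code and no 041 $a code is present.

    Blanks and '|' count as "not specified" (assumption); 'und' and 'zxx' count as coded.
    A short or absent 008 (pass "") also counts as missing.
    """
    lang_008 = f008[35:38]
    return lang_008.strip(" |") == "" and not any(c.strip() for c in codes_041a)

def missing_share(records):
    """Percent of (f008, codes_041a) pairs with no language code (0.0 if empty)."""
    if not records:
        return 0.0
    return 100 * sum(language_missing(f, c) for f, c in records) / len(records)
```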

Thanks!
Jackie Dooley, Program Officer, OCLC
SAA Research Forum