Published by Robert Wilkinson Modified over 9 years ago
1
Society of American Archivists Research Forum 18 August 2015 A Deep Dive into the Archival MARC Records in WorldCat (and ArchiveGrid) Jackie Dooley Program Officer OCLC Research
2
OVERVIEW Research objective Research questions The data set High-level findings Next steps
3
RESEARCH OBJECTIVE
4
Research Objective Establish a detailed profile of MARC data element occurrences in archival catalog records, providing a view of 30+ years of practice. Reveal variations in descriptive practice Debunk inaccurate assumptions Characterize before MARC usage diminishes Suggest improvements in descriptive practice Enable analysis of implications for discovery
5
SAMPLE RESEARCH QUESTIONS
6
Sample research questions Are descriptions and index terms rich enough to enable effective discovery of archival materials? In what significant ways does archival description differ from one type of material to another? To what extent does use of the archival control byte successfully capture the universe of archival descriptions? Is it true that archivists usually describe materials at the collection level? How often is DACS used as the content standard? And APPM as its predecessor? To what extent are the DACS minimum requirements met?
7
THE DATA SET
8
Archival records in WorldCat OCLC’s WorldCat database of 300+ million records, filtered to extract “archival” records (currently 4 million, or about 1% of the total) Brief version of the filter specs: “Unpublished” materials in any format (e.g., text, visual, moving image, sound recording) Coded for “archival control” (Leader byte 08) Held by a single institution (i.e., only one attached holding) Excludes published materials in any format, as well as theses and dissertations Spoiler alert: It’s not perfect.
9
The full filter specs: Same records as in ArchiveGrid
Only one library holding symbol is attached (to eliminate non-unique items or collections)
The MARC Leader has one or more of the following:
–Leader byte 06 (record type) has the value d (manuscript music), f (manuscript cartographic), g (projected graphics), i (nonmusic recording), j (music recording), k (visual), p (mixed), r (realia), or t (textual manuscript). [does this include all the new ones?]
–Leader byte 06 has the value "a" (language material) and Leader byte 07 (bibliographic level) has the value "c" (collection).
–Leader byte 08 has the value "a" (archival control).
Field 260 subfields "a" and "b" are not present (to filter out published works)
"Bibliography" does not occur at the beginning of any MARC subject heading subfield "a" or "v" (to filter out published works).
Field 502 is not present (to filter out theses and dissertations).
Records with material type "book" or "serial" that have no value in fields 008 or 006 “Nature of Contents” bytes (to eliminate theses, reference works, and other non-archival materials).
http://beta.worldcat.org/archivegrid/about/
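The filter criteria above can be read as a single yes/no test per record. The following is a minimal Python sketch of that reading, not OCLC's actual implementation. The `Record` class, the helper names, and the function `looks_archival` are hypothetical and exist only for illustration. Two assumptions are made: the 260 test is read as "neither subfield a nor subfield b is present", and subject headings are taken to be any 6xx field. The last criterion (material type "book" or "serial" with empty Nature of Contents bytes) is left out because the slide does not state it precisely enough to code.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified record model for illustration only
# (this is not the API of any real MARC library).
@dataclass
class Record:
    leader: str                                  # MARC Leader string
    holdings: int                                # number of attached holding symbols
    fields: list = field(default_factory=list)   # [(tag, [(subfield_code, value), ...]), ...]

# Leader byte 06 values that the slide lists as unpublished material types.
UNPUBLISHED_TYPES = set("dfgijkprt")

def subfield_values(rec, tag_prefix, codes):
    """Yield values of the given subfield codes in fields whose tag starts with tag_prefix."""
    for tag, subs in rec.fields:
        if tag.startswith(tag_prefix):
            for code, value in subs:
                if code in codes:
                    yield value

def has_tag(rec, tag):
    """True if the record contains at least one field with this exact tag."""
    return any(t == tag for t, _ in rec.fields)

def looks_archival(rec):
    """Apply the slide's criteria, except the Nature of Contents test."""
    ldr = rec.leader
    # At least one of the three Leader conditions must hold.
    leader_ok = (
        ldr[6] in UNPUBLISHED_TYPES
        or (ldr[6] == "a" and ldr[7] == "c")
        or ldr[8] == "a"
    )
    # Assumption: a record is treated as published if 260 $a or 260 $b exists.
    has_imprint = any(True for _ in subfield_values(rec, "260", "ab"))
    # Assumption: "subject heading" means any 6xx field.
    is_bibliography = any(
        value.startswith("Bibliography")
        for value in subfield_values(rec, "6", "av")
    )
    return (
        rec.holdings == 1
        and leader_ok
        and not has_imprint
        and not is_bibliography
        and not has_tag(rec, "502")
    )

# Example: a mixed-materials collection record under archival control.
example_leader = "00000n" + "p" + "c" + "a" + "a2200000 a 4500"
example = Record(example_leader, 1, [("245", [("a", "Papers")])])
print(looks_archival(example))
```

Keeping each criterion as a separately named condition makes it easy to switch one off and see how many records it adds or removes, which is one way to respond to the question of whether the filter specs should be tweaked.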
10
So what do you think of our scoping of archival data elements? Briefest version of the filter specs: “Unpublished” materials in any format Under “archival control” Held by a single institution Excludes all published materials Spoiler reminder: It’s not perfect.
11
HIGH-LEVEL FINDINGS A. Full data B. Mixed materials C. Text D. Visual materials E. Music scores F. Maps G. Audio recordings
12
Percent of records by type of material
13
A. Full data “Archival control”: 28% of records Dates: Nearly half have a date span Bibliographic level –53% describe collections –40% describe single items –“Component” levels rarely used 95% are mixed materials, text, or visual materials 85% have ≥1 indexed creator name 75% have ≥1 indexed subject term 30% have an 856 field (link to external content)
14
Bibliographic level by type of material
15
Inclusion of 6xx (subject) index terms
16
A. Full data, cont. Cataloging level –29% full cataloging –25% minimal –44% unknown Cataloging rules –Specified in 30% of records –appm in 18% of records, dacs in 7%, gihc in 5% Form of material: Used most heavily for non-textual materials Language –Two thirds in English –Not specified in ≥ 25% of records Place of publication vs. location of repository
17
B. Mixed Materials 44% of all records 50% are under archival control 94% are collection records, 5% are components 1xx in 70% of records Title: 11% have no 245 $a Notes 520 in 74% of records 545 field in 31% of records 500 field in 39% of records No other 5xx used in ≥ 25% of records
18
B. Mixed Materials, cont. 600 in 40% of records; mean of 1.5 per record 650 in 52% of records; mean of 3.0 per record 651 in 45% of records; mean of 1.3 per record 655 in 63% of records; mean of 1.3 per record 7xx in 28% of records 856 in 29% of records
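Figures of the form "600 in 40% of records; mean of 1.5 per record" can be derived by counting tag occurrences per record. The sketch below is illustrative Python, not the method used in the study. The function name `tag_stats` and the input shape (one list of tags per record) are assumptions. The slides do not say whether the mean is taken over all records or only over records that contain the field, so the sketch reports both.

```python
from collections import Counter

def tag_stats(records, tag):
    """Summarize how often `tag` occurs.

    records: a list with one entry per record, each entry being the list
    of MARC tags present in that record (repeated tags appear repeatedly).
    """
    counts = [Counter(tags)[tag] for tags in records]
    n = len(counts)
    with_tag = sum(1 for c in counts if c > 0)
    total = sum(counts)
    return {
        # Share of records containing the tag at least once.
        "pct_records": 100 * with_tag / n if n else 0.0,
        # Mean occurrences across every record, including those without the tag.
        "mean_all": total / n if n else 0.0,
        # Mean occurrences across only the records that contain the tag.
        "mean_present": total / with_tag if with_tag else 0.0,
    }

# Toy data: four records, two of which carry 650 fields.
sample = [["245", "650", "650"], ["245"], ["650"], ["245", "520"]]
print(tag_stats(sample, "650"))
```

Note that a mean taken only over records containing the field can never fall below 1, so any reported mean below 1 must use all records as the denominator.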
19
C. Text 25% of all records –4% are book and pamphlet collections –21% are textual manuscripts 25% of textual manuscript records are under archival control 30% are collection records, 70% are items 1xx in 77% of records Title: 11% have no 245 $a Notes –43% have 520 field –54% have 500 field
20
C. Text, cont. 600 in 31% of records; mean of 0.9 per record 650 in 42% of records; mean of 1.7 per record 651 in 31% of records; mean of 0.8 per record 655 in 36% of records; mean of 0.7 per record 7xx in 50% of records
21
D. Visual Materials 26% of all records ≤ 10% are under archival control 57% have 007 (technical data values) 15% are collection records, 76% are items 1xx in 51% of records Notes –500 in 77% of records –520 in 68% of records –540 in 57% of records
22
D. Visual Materials, cont. 600 in 32% of records; mean of 1.1 per record 650 in 68% of records; mean of 4.2 per record 651 in 38% of records; mean of 1.5 per record 655 in 81% of records; mean of 1.5 per record 7xx in 31% of records 856 in 48% of records
23
E. Music Scores 4% of all records 1xx in 90% of records 240 in 41% of records 500 in 96% of records; negligible use of other 5xx’s 650 in 96% of records; mean of 2.4 per record 655 in 34% of records; genre/form terms often in 650 instead 856 in 25% of records
24
F. Maps Less than 1% of all records 65% have 007 (technical data values) Field 043 (hierarchical geographic area code) in 80% of records 052 in 66% of records (geographic classification) 1xx in 53% of records 255 in 92% of records (cartographic mathematical data)
25
F. Maps, cont. 500 in 93% of records; use of other 5xx’s negligible 650 in 68% of records; mean of 2.8 per record 651 in 83% of records; mean of 2.7 per record 655 in 84% of records; mean of 1.8 per record 7xx in 50% of records
26
G. Audio Recordings Less than 1% of all records 60% have 007 (technical data values) 1xx in 83% of records Notes –500 in 77% –520 in 68% –530 in 27% –540 in 57%
27
G. Audio Recordings, cont. 650 in 68%; mean of 5.2 per record 651 in 47%; mean of 0.9 per record 655 in 67% of records; mean of 1.2 per record 7xx in 100% of records 856 in 22% of records
28
NEXT STEPS
29
Draw conclusions (a few for starters) Mixed and textual materials cataloged as collections; other formats not so much “Archival control” byte is far from universally used, so has little value Few of the note fields added for archival or visual materials communities are widely used (does it matter?) As many as 25% of titles for mixed and textual collections make for lousy browsing (e.g., “Papers” or “Records”) Ponder implications for next-gen cataloging (linked data, BIBFRAME, schema.org)
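Titles that make for poor browsing, such as "Papers" or "Records", could be flagged with a simple normalized comparison. This is a minimal Python sketch under stated assumptions, not the analysis behind the 25% figure. The function name `is_generic_title` is hypothetical, the term list contains only the two examples given on the slide, and the input is assumed to be the text of 245 subfield a, which may carry trailing punctuation.

```python
import string

# Only the two examples named on the slide; a real list would likely be longer.
GENERIC_TITLES = {"papers", "records"}

def is_generic_title(title_a):
    """Return True if the 245 $a text is nothing more than a generic term."""
    if title_a is None:
        # A missing 245 $a is a separate problem and is not counted here.
        return False
    # Drop surrounding whitespace and punctuation, then compare case-insensitively.
    normalized = title_a.strip(string.punctuation + string.whitespace).lower()
    return normalized in GENERIC_TITLES

for title in ["Papers,", "Records.", "John Smith papers", None]:
    print(repr(title), is_generic_title(title))
```

The comparison matches only titles that consist entirely of a generic term, so a title that names the creator alongside "papers" is not flagged.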
30
Please send feedback Do the data debunk any assumptions? Are you dubious about any of the data? Would you tweak the specs of our filter? Are changes in practice called for? What other questions should I be asking? Is this a useful project or just an “interesting” one?
31
Publications & future research Publish this data Second paper: Implications for discovery Future research? –Data content –Potential for data remediation Generic titles (e.g., Papers, Records) Missing language codes Other? –Descriptive practice for web archiving If you need an OCLC data set for research...
32
Thanks! Jackie Dooley Program Officer, OCLC Research dooleyj@oclc.org @minniedw SAA Research Forum