Anatomy of Aggregate Collections: The Example of Google Print for Libraries
Brian Lavoie, Senior Research Scientist, OCLC Research
OCLC Members Council Meeting, October 2005
Aggregate collections
The boundaries between local and external collections are increasingly blurred through resource sharing (digital/network technologies) and cooperative collection management (resource allocation).
The focus is shifting to the resources of the system (or subsets of the system) rather than to individual collections.
Data are needed to support and illuminate this system-wide perspective by characterizing and analyzing aggregate collections.
WorldCat is the largest aggregate collection: the aggregate holdings of more than 20,000 libraries, and a bridge from the local to the system-wide perspective.
The system-wide print book collection as represented in WorldCat (January 2005)
[Figure: ~55 million; ~41 million; ~35 million; ~32 million print books]
More information:
Google Print for Libraries
An aggregate collection of print books: the combined print book holdings of five major research libraries (Harvard, Michigan, Oxford, NYPL, and Stanford).
Discussion so far has focused on copyright issues, with very little attention to Google Print for Libraries as an aggregate collection.
What are the characteristics of this aggregate collection? How does it relate to the system-wide collection?
WorldCat is a useful data source for this analysis.
Lavoie, Connaway, Dempsey: "Anatomy of Aggregate Collections: The Example of Google Print for Libraries," D-Lib Magazine (September 2005)
G5 coverage of the system-wide print book collection
10.5 million unique books, about 33 percent of the system-wide print book collection
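Coverage here is the number of distinct books held by at least one participant, divided by the size of the system-wide collection. The sketch below is a minimal illustration of that ratio. It assumes each library's holdings are available as a set of book identifiers (for example, WorldCat record numbers). The function name, library names, and identifiers are hypothetical toy data, not figures from the study.

```python
def coverage(holdings: dict[str, set[str]], system_wide_size: int) -> float:
    """Share of the system-wide collection held by at least one library."""
    # The union removes duplicates: a book held by several libraries counts once.
    unique_books = set().union(*holdings.values())
    return len(unique_books) / system_wide_size


# Hypothetical toy holdings keyed by library; identifiers stand in for record numbers.
toy = {
    "library_a": {"b1", "b2", "b3"},
    "library_b": {"b2", "b3", "b4"},
    "library_c": {"b5"},
}
print(f"{coverage(toy, system_wide_size=15):.0%}")  # 5 unique of 15 -> 33%
```

With the deck's figures, 10.5 million unique books out of roughly 32 million print books gives about 33 percent.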
Holdings overlap
Potential redundancy rate of 40 percent
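One reading of the redundancy rate is the share of all holdings that duplicate a book already held by another participant, that is, 1 minus unique books divided by total holdings. This reading is consistent with the "Beyond the Google 5" comparison, where 1 − 10.5 / 18 ≈ 42 percent. The sketch below computes that rate. It also computes an overlap profile, which counts how many books are held by exactly k libraries. Both the definition and the toy data are assumptions for illustration.

```python
from collections import Counter


def overlap_profile(holdings: dict[str, set[str]]) -> Counter:
    """Map k -> number of books held by exactly k libraries."""
    # First count, for each book, how many libraries hold it.
    holders_per_book = Counter(book for books in holdings.values() for book in books)
    # Then count how many books share each holder count.
    return Counter(holders_per_book.values())


def redundancy_rate(holdings: dict[str, set[str]]) -> float:
    """Assumed definition: share of total holdings that duplicate a book held elsewhere."""
    total_holdings = sum(len(books) for books in holdings.values())
    unique_books = len(set().union(*holdings.values()))
    return 1 - unique_books / total_holdings


# Hypothetical toy holdings.
toy = {
    "library_a": {"b1", "b2", "b3"},
    "library_b": {"b2", "b3", "b4"},
    "library_c": {"b5"},
}
print(overlap_profile(toy))           # Counter({1: 3, 2: 2})
print(f"{redundancy_rate(toy):.0%}")  # 1 - 5/7 -> 29%
```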
Language distribution
[Table: language distribution, Google 5 vs. system-wide, for English, German, French, Spanish, Chinese, Russian, Italian, Japanese, Hebrew, Arabic, Portuguese, Polish, Dutch, Latin, Korean, Swedish (0.01 vs. < 0.01), and all others]
More than 430 languages are represented in the Google 5 collection.
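A language distribution of this kind is a tally of unique books by language, converted to shares. The sketch below assumes that each unique book carries a single three-letter language code, as in positions 35-37 of the MARC 21 008 field. The records are hypothetical.

```python
from collections import Counter


def language_shares(languages: list[str]) -> dict[str, float]:
    """Share of books per language code, largest share first."""
    counts = Counter(languages)
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.most_common()}


# Hypothetical language codes for five unique books.
books = ["eng", "eng", "ger", "fre", "eng"]
shares = language_shares(books)
print(shares)                    # {'eng': 0.6, 'ger': 0.2, 'fre': 0.2}
print(len(shares), "languages")  # 3 languages
```

Counting the distinct keys is one way a figure such as "more than 430 languages" could be derived from the same tally.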
Cumulative age distribution of G5 holdings
More than 80 percent of the Google 5 collection is still in copyright.
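A cumulative age distribution sorts holdings by publication year and reports, for each year, the running share of holdings published in or before that year. The sketch below also reports the share published in or after a cutoff year, as a rough proxy for "still in copyright." The 1923 cutoff is an assumption: as of 2005, it was the US public-domain threshold for published works. Actual copyright status depends on more than publication date, and the sample years are hypothetical.

```python
from collections import Counter
from itertools import accumulate

ASSUMED_CUTOFF = 1923  # assumption: US public-domain threshold for published works in 2005


def cumulative_age_distribution(years: list[int]) -> list[tuple[int, float]]:
    """(year, share of holdings published in or before that year), oldest first."""
    counts = Counter(years)
    ordered = sorted(counts)
    running = accumulate(counts[y] for y in ordered)
    return [(y, total / len(years)) for y, total in zip(ordered, running)]


def share_published_since(years: list[int], cutoff: int = ASSUMED_CUTOFF) -> float:
    """Rough proxy for 'still in copyright': share published in or after cutoff."""
    return sum(1 for y in years if y >= cutoff) / len(years)


# Hypothetical publication years for ten holdings.
years = [1850, 1901, 1950, 1960, 1975, 1980, 1990, 1995, 2000, 2003]
print(cumulative_age_distribution(years)[:3])  # [(1850, 0.1), (1901, 0.2), (1950, 0.3)]
print(f"{share_published_since(years):.0%}")   # 80%
```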
Works
Counted as works rather than as individual books (manifestations), coverage is slightly higher (35 percent) and holdings overlap is slightly greater (56 percent held uniquely, that is, by only one of the five libraries).
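Moving from books (manifestations) to works means collapsing records that represent the same work before counting. The sketch below assumes a precomputed mapping from record identifiers to work identifiers, such as the output of a FRBR work-set algorithm. The mapping, identifiers, and function names are hypothetical. The example shows why overlap can rise at the work level: two different editions of one work stop looking unique once they are grouped.

```python
from collections import Counter


def to_works(holdings: dict[str, set[str]], work_of: dict[str, str]) -> dict[str, set[str]]:
    """Replace each library's record IDs with the work IDs they belong to."""
    # Records missing from the mapping are treated as works of their own.
    return {lib: {work_of.get(rec, rec) for rec in recs} for lib, recs in holdings.items()}


def share_held_uniquely(holdings: dict[str, set[str]]) -> float:
    """Share of distinct items held by exactly one library."""
    counts = Counter(item for items in holdings.values() for item in items)
    return sum(1 for n in counts.values() if n == 1) / len(counts)


# Hypothetical data: r1 and r3 are different editions of the same work w1.
toy = {"library_a": {"r1", "r2"}, "library_b": {"r3", "r4"}}
work_of = {"r1": "w1", "r3": "w1", "r2": "w2", "r4": "w3"}

works = to_works(toy, work_of)
print(share_held_uniquely(toy))             # 1.0 (every record held once)
print(f"{share_held_uniquely(works):.0%}")  # 67% (w1 now held by both libraries)
```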
Some speculation
What results would have been obtained if a different group of libraries had been selected?
What incremental extension of coverage could be obtained by adding further library collections to the original Google 5?
Five new libraries were chosen: a small US liberal arts college, a large US public university, a large US private university, a large US metropolitan library, and a large Canadian university.
Beyond the Google 5

                     New Google 5    Original Google 5
Total holdings       ~8 million      ~18 million
Total unique books   5.9 million     10.5 million
% of system-wide     18 percent      33 percent
Redundant holdings   26 percent      42 percent

Impact by library type (percentage of holdings unique relative to the original G5 collection):
Large US metropolitan library: 39 percent (most unlike G5)
Large US private university: 25 percent
Large Canadian university: 23 percent
Large US public university: 21 percent
Small US liberal arts college: 13 percent (most like G5)
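The "impact by library type" figures measure, for each new library, the share of its holdings not found anywhere in the original G5 collection. The sketch below is a minimal version of that measure. It assumes holdings are available as sets of book identifiers. The library names and data are hypothetical.

```python
def unique_relative_to(base: dict[str, set[str]], candidate: set[str]) -> float:
    """Share of a candidate library's holdings not held by any base library."""
    base_union = set().union(*base.values())
    return len(candidate - base_union) / len(candidate)


# Hypothetical base group and candidate libraries.
base = {"g1": {"b1", "b2", "b3"}, "g2": {"b3", "b4"}}
candidates = {
    "metro_library": {"b1", "b5", "b6", "b7"},
    "liberal_arts_college": {"b1", "b2", "b3", "b8"},
}

# Rank candidates from most unlike the base group to most like it.
ranked = sorted(candidates, key=lambda name: unique_relative_to(base, candidates[name]), reverse=True)
for name in ranked:
    print(f"{name}: {unique_relative_to(base, candidates[name]):.0%}")
# metro_library: 75%
# liberal_arts_college: 25%
```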
The Google 10
Original Google 5: 10.5 million books
Google 10 collection: 12.3 million books, an increase of 17 percent over the original Google 5
Diminishing returns?
Original G5: ~18 million holdings, 58 percent unique
New G5: ~8 million holdings, 22 percent unique (relative to the original G5)
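The diminishing-returns comparison can be checked against the collection sizes and holdings totals on these slides. The values below are rounded, as they are on the slides.

```latex
\begin{aligned}
\Delta &= 12.3 - 10.5 = 1.8\ \text{million books}, \qquad \Delta / 10.5 \approx 0.17 \\
\text{Original G5:}\quad & 10.5 / 18 \approx 0.58 \ \text{(unique books per holding)} \\
\text{New G5:}\quad & \Delta / 8 \approx 0.22 \ \text{(books new to the collection per holding)}
\end{aligned}
```

In the new group, a holding adds a book not already in the collection far less often than in the original group. That is the sense of "diminishing returns" here.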
Anatomy of aggregate collections
Mass digitization programs and other aggregate collections are increasingly common features of the library landscape.
Effective decision-making and planning would be aided by converging on a set of standard questions that help map out the anatomy of aggregate collections.
Example: mass digitization programs
What are the characteristics of the overarching population of materials that is the target of the digitization effort?
How much of this population will the digitization effort cover?
What is the potential degree of redundancy?
What bibliographic unit is the focus of digitization (e.g., manifestations, expressions, works)?
What number of participants, and what combination of institution types, is optimal for obtaining maximum benefit at minimum cost?
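One way to explore the question of how many participants to include, and which ones, is a greedy marginal-gain heuristic. At each step, add the library that contributes the most books not yet covered, then record how the gains shrink. This is a standard approximation for the maximum-coverage problem. It is offered as one possible approach, not as the method used in the study. It ignores cost, and the libraries and holdings are hypothetical.

```python
def greedy_order(holdings: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Order libraries by marginal contribution; return (library, new books added)."""
    remaining = dict(holdings)
    covered: set[str] = set()
    order: list[tuple[str, int]] = []
    while remaining:
        # Pick the library adding the most books not yet covered (ties: first in dict order).
        best = max(remaining, key=lambda lib: len(remaining[lib] - covered))
        gain = len(remaining[best] - covered)
        covered |= remaining.pop(best)
        order.append((best, gain))
    return order


# Hypothetical holdings for four candidate participants.
toy = {
    "lib_a": {"b1", "b2", "b3", "b4"},
    "lib_b": {"b3", "b4", "b5"},
    "lib_c": {"b1", "b2"},
    "lib_d": {"b6"},
}
for lib, gain in greedy_order(toy):
    print(lib, gain)
# lib_a 4
# lib_b 1
# lib_d 1
# lib_c 0
```

The falling gains in the output are the toy-scale analogue of diminishing returns. A cost per participant could be folded in by ranking on gain per unit cost instead.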
Aggregate collections and WorldCat
WorldCat is more than a tool for cataloging and reference; it is also a strategic resource for managing aggregate collections:
OCLC Group Services
OCLC WorldCat Collection Analysis Service
OCLC Research data-mining activities
Web site: