Anatomy of Aggregate Collections Exploring Mass Digitization and the “Collective Collection” Brian Lavoie Research Scientist OCLC Research NELINET September.


1 Anatomy of Aggregate Collections Exploring Mass Digitization and the “Collective Collection” Brian Lavoie Research Scientist OCLC Research NELINET September 21, 2006

2 Road map
• Aggregate collections
• Aggregate collections as a tool for understanding mass digitization projects
  - “Anatomy of aggregate collections: the example of Google Print for Libraries” (D-Lib Magazine, September 2005)
• Digital preservation and mass digitization
• Conclusion

3 The shrinking “width of the border”
[Figure: Collection A and Collection B separated by a border. Distance metrics: economic, technical, physical.]

4 Aggregate collections
• Definition: combined holdings of multiple institutions, viewed as a single collection
  - 2 institutions, consortium, all libraries everywhere …
  - WorldCat: aggregate collection of more than 70 million items, held by more than 25,000 institutions worldwide
• Libraries embedded more deeply in networks of collaboration and coordination
  - Decisions increasingly taken in context of inter-institutional environments, rather than local collection in isolation
  - Shift in focus to resources of the “system”, rather than individual collections
• As library networks develop and expand, opportunities arise to create value through collective action, or by aligning local collections with aspects of the system-wide environment

5 Anatomy of aggregate collections
• Analysis of aggregate collections supports …
  - Collaborative decision-making: direct collaboration by libraries (for example, collaborative storage strategies)
  - “Decision-making in context”: local decision-making made in a larger context (for example, selecting print materials for digitization, given what has already been digitized elsewhere)
• Better understanding of the anatomy of aggregate collections is critical for a wide range of library decision-making contexts:
  - Collection management (cooperative collection development, shared off-site storage, collaborative preservation)
  - Deeper resource sharing (meta-search, reducing frictions in resource sharing networks)
  - Mass digitization
• OCLC Research activities aimed at mobilizing library data (WorldCat) to understand and manage aggregate collections

6 Mass digitization and aggregate collections
• Google Book Search (aka Google Print for Libraries)
• Aggregate collection of digitized print books (combined holdings of Harvard, Michigan, Oxford, NYPL, and Stanford)
• Focus on copyright issues; very little discussion of Google Book Search as aggregate collection
• http://www.dlib.org/dlib/september05/lavoie/09lavoie.html

7 The system-wide print book collection as represented in WorldCat (January 2005)
[Figure: successively narrower subsets of WorldCat: ~55 million, ~41 million, ~35 million, ~32 million print books]
More information: Schonfeld & Lavoie, “Books without Boundaries: A Brief Tour of the System-wide Print Book Collection”, Journal of Electronic Publishing, Vol. 9, No. 2, Summer 2006
http://www.hti.umich.edu/cgi/t/text/text-idx?c=jep;cc=jep;view=text;rgn=main;idno=3336451.0009.208

8 G5 coverage of system-wide print book collection
10.5 million unique books

9 Holdings overlap
Potential redundancy rate of 40 percent
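
The overlap figure can be reproduced from holdings data with simple set operations. The following is a minimal sketch, not the method used in the study: the library names, book identifiers, and the `aggregate_stats` helper are hypothetical, and the redundancy rate is assumed to be the share of total holdings that duplicate a book already counted, which matches the relationship between total holdings and unique books reported on slide 14.

```python
from typing import Dict, Set


def aggregate_stats(holdings: Dict[str, Set[str]], system_wide: Set[str]) -> dict:
    """Summarize an aggregate collection built from per-library holdings.

    holdings    maps a library name to the set of book IDs it holds (hypothetical IDs).
    system_wide is the set of all book IDs in the system-wide collection.
    """
    total_holdings = sum(len(books) for books in holdings.values())
    unique_books = set().union(*holdings.values())  # the aggregate collection
    if total_holdings == 0 or not system_wide:
        raise ValueError("holdings and system_wide must be non-empty")
    return {
        "total_holdings": total_holdings,
        "unique_books": len(unique_books),
        # Assumed definition: fraction of holdings that duplicate another holding.
        "redundancy_rate": 1 - len(unique_books) / total_holdings,
        # Fraction of the system-wide collection covered by the aggregate.
        "coverage": len(unique_books & system_wide) / len(system_wide),
    }


# Toy data for illustration only; not drawn from WorldCat.
toy_holdings = {
    "Library A": {"b1", "b2", "b3"},
    "Library B": {"b2", "b3", "b4"},
    "Library C": {"b3", "b5"},
}
toy_system = {f"b{i}" for i in range(1, 11)}

stats = aggregate_stats(toy_holdings, toy_system)
print(stats)
```

In the toy data, three libraries report eight holdings but only five distinct books, so three of the eight holdings are redundant.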

10 Language distribution

Language      Google 5   System-wide
English       0.49       0.52
German        0.10       0.08
French        0.08       0.08
Spanish       0.05       0.06
Chinese       0.04       0.04
Russian       0.04       0.03
Italian       0.03       0.03
Japanese      0.02       0.04
Hebrew        0.02       0.01
Arabic        0.01       0.01
Portuguese    0.01       0.01
Polish        0.01       0.01
Dutch         0.01       0.01
Latin         0.01       0.01
Korean        0.01       0.01
Swedish       0.01       < 0.01
All others    0.07       0.08

More than 430 languages in Google 5 collection

11 Cumulative age distribution of G5 holdings
More than 80 percent of Google 5 collection still in copyright

12 Works
• Coverage slightly higher (35%)
• Holdings overlap slightly greater (56% held uniquely)

13 Some speculation …
• What results would have been obtained if a different group of libraries had been selected?
• What incremental extensions to coverage can be obtained by adding further library collections to the original Google 5?
• Chose 5 new libraries:
  - Small US liberal arts college
  - Large US public university
  - Large US private university
  - Large US metropolitan library
  - Large Canadian university

14 Beyond the Google 5 …

                       “New” Google 5   “Original” Google 5
Total holdings:        ~8 million       ~18 million
Total unique books:    5.9 million      10.5 million
% of system-wide:      18 percent       33 percent
Redundant holdings:    26 percent       42 percent

Impact by library type (% of holdings unique relative to original G5 collection):
Large US metropolitan library:   39 percent (most unlike G5)
Large US private university:     25 percent
Large Canadian university:       23 percent
Large US public university:      21 percent
Small US liberal arts college:   13 percent (most like G5)

15 “The Google 10”
• Original Google 5: 10.5 million books
• Google 10 collection: 12.3 million books, i.e. + 1.8 million (17%)
• Diminishing returns?
  - Original G5: ~18 million holdings, 58% unique
  - New G5: ~8 million holdings, 22% unique
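
The arithmetic behind this slide can be checked directly from its own figures. This is a small sketch with variable names chosen for illustration; all quantities are the rounded values shown on slides 14 and 15, in millions, and it assumes that the "22% unique" figure for the new group means the share of its holdings that add a book not already in the original Google 5.

```python
# Rounded figures from the slides, in millions.
original_holdings = 18.0   # total holdings of the original Google 5
original_unique = 10.5     # unique books in the original Google 5
new_holdings = 8.0         # total holdings of the five additional libraries
added_unique = 1.8         # books the new group adds beyond the original G5
system_wide = 32.0         # system-wide print book collection

google10_unique = original_unique + added_unique

# Relative growth of the digitized collection from adding five more libraries.
growth = added_unique / original_unique

# "Yield": unique books gained per holding processed, for each group.
original_yield = original_unique / original_holdings
new_yield = added_unique / new_holdings  # assumed reading of "22% unique"

# Coverage of the system-wide collection before and after the extension.
coverage_before = original_unique / system_wide
coverage_after = google10_unique / system_wide

print(f"Google 10 unique books: {google10_unique:.1f} million")
print(f"Growth over original G5: {growth:.1%}")
print(f"Yield, original G5: {original_yield:.1%}; new G5: {new_yield:.1%}")
print(f"Coverage: {coverage_before:.1%} -> {coverage_after:.1%}")
```

The drop in yield between the two groups is what the slide's "diminishing returns" question refers to: each additional holding processed contributes fewer new books once a large collection has already been digitized.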

16 The challenge of digital preservation
[Figure: “The Preservation Pyramid”, adapted from Priscilla Caplan (FCLA). Labels: Capture/Selection, Description, Secure Storage, Media Management, Render, Authenticity/Understandability; also Economics and Rights.]

17 But …
• Chris Rusbridge’s “digital preservation fallacies”:
  - Digital preservation is very expensive
  - File formats become obsolete quickly
  - Interventions must occur frequently
  - Digital preservation repositories should have very long time-scale aspirations
  - The preserved object must be easily and instantly accessible in contemporary formats
  - The preserved object must be faithful in all respects to original
  Source: Rusbridge, C. “Excuse me … Some Digital Preservation Fallacies?” Ariadne, February 2006; http://www.ariadne.ac.uk/issue46/rusbridge/
• Bottom line: significant progress has been made, but:
  - Still lack well-understood, standardized practices for preserving digital materials
  - No consensus on what “successful digital preservation” means

18 Mass digitization and digital preservation
• Roles and responsibilities: Google? Libraries? Elsevier? JSTOR?
• Digitized books as artifacts to be preserved, or disposable surrogates?
• Implications for redundancy in system?
• What uses can digitized output be put to?
  - Discovery/linking (e.g., mbooks)
  - Text-mining
• Infrastructure to support large-scale digital content management
• Efficient, automated workflows for preservation metadata
• “Last copy”

19 Summing up …
• Distance between collections shrinking; mass digitization programs and other aggregate collections increasingly common features of library landscape
• To mobilize aggregate collections, need to understand anatomy of aggregate collections, i.e., data and analysis to support planning and collaboration
  - Characterize and promote the “collective collection”: the collective library resource
  - Chart a course through mass digitization (e.g., G5 study)
• Mass digitization raises important questions about long-term preservation (summarized by “preservation pyramid”); need strategies to secure long-term future of digitization investments

