Anatomy of Aggregate Collections Exploring Mass Digitization and the “Collective Collection” Brian Lavoie Research Scientist OCLC Research NELINET September.


Presentation transcript:

Anatomy of Aggregate Collections: Exploring Mass Digitization and the “Collective Collection”
Brian Lavoie, Research Scientist, OCLC Research
NELINET, September 21, 2006

Road map
• Aggregate collections
• Aggregate collections as a tool for understanding mass digitization projects
  - “Anatomy of aggregate collections: the example of Google Print for Libraries” (D-Lib, September 2005)
• Digital preservation and mass digitization
• Conclusion

The shrinking “width of the border”
[Figure: Collection A and Collection B separated by a distance. Distance metrics: economic, technical, physical]

Aggregate collections
• Definition: combined holdings of multiple institutions, viewed as a single collection
  - 2 institutions, consortium, all libraries everywhere …
  - WorldCat: aggregate collection of more than 70 million items, held by more than 25,000 institutions worldwide
• Libraries embedded more deeply in networks of collaboration and coordination
  - Decisions increasingly taken in context of inter-institutional environments, rather than local collection in isolation
  - Shift in focus to resources of the “system”, rather than individual collections
• As library networks develop and expand, opportunities arise to create value through collective action, or by aligning local collections with aspects of the system-wide environment
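The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the OCLC study: the library names, book identifiers, and the functions `aggregate` and `summarize` are hypothetical, and "redundancy rate" is assumed here to mean the share of holdings that duplicate a book already counted (1 minus unique books divided by total holdings).

```python
from collections import Counter


def aggregate(holdings_by_library):
    """Combine per-library holdings into one aggregate collection.

    Returns a Counter mapping each book identifier to the number of
    libraries that hold it.
    """
    counts = Counter()
    for books in holdings_by_library.values():
        counts.update(set(books))  # count each book once per library
    return counts


def summarize(holdings_by_library):
    """Basic 'anatomy' figures for an aggregate collection."""
    counts = aggregate(holdings_by_library)
    total_holdings = sum(counts.values())   # every library-book pair
    unique_books = len(counts)              # distinct books in the aggregate
    held_by_one = sum(1 for n in counts.values() if n == 1)
    return {
        "total_holdings": total_holdings,
        "unique_books": unique_books,
        # Assumed definition: share of holdings that repeat a counted book.
        "redundancy_rate": 1 - unique_books / total_holdings if total_holdings else 0.0,
        "share_held_by_one_library": held_by_one / unique_books if unique_books else 0.0,
    }


# Toy data, invented for illustration only.
toy = {
    "library_a": {"b1", "b2", "b3"},
    "library_b": {"b2", "b3", "b4"},
    "library_c": {"b3", "b5"},
}
summary = summarize(toy)
print(summary)
```

In a real analysis the sets would come from holdings data such as WorldCat, and matching "the same book" across libraries would need far more care than comparing identifiers.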

Anatomy of aggregate collections
• Analysis of aggregate collections supports …
  - Collaborative decision-making: direct collaboration by libraries (for example, collaborative storage strategies)
  - “Decision-making in context”: local decision-making made in a larger context (for example, selecting print materials for digitization, given what has already been digitized elsewhere)
• Better understanding of the anatomy of aggregate collections critical for wide range of library decision-making contexts:
  - Collection management (cooperative collection development, shared off-site storage, collaborative preservation)
  - Deeper resource sharing (meta-search, reducing frictions in resource sharing networks)
  - Mass digitization
• OCLC Research activities aimed at mobilizing library data (WorldCat) to understand and manage aggregate collections

Mass digitization and aggregate collections
• Google Book Search (aka Google Print for Libraries)
• Aggregate collection of digitized print books (combined holdings of Harvard, Michigan, Oxford, NYPL, and Stanford)
• Focus on copyright issues; very little discussion of Google Book Search as aggregate collection

The system-wide print book collection as represented in WorldCat (January 2005)
[Figure values: ~55 million, ~41 million, ~35 million, ~32 million print books]
More information: Schonfeld & Lavoie, “Books without Boundaries: A Brief Tour of the System-wide Print Book Collection”, Journal of Electronic Publishing, Vol. 9, No. 2, Summer

G5 coverage of system-wide print book collection
• 10.5 million unique books

Holdings overlap
• Potential redundancy rate of 40 percent

Language distribution
[Table columns: Language | Google 5 | System-wide]
[Table rows: English, German, French, Spanish, Chinese, Russian, Italian, Japanese, Hebrew, Arabic, Portuguese, Polish, Dutch, Latin, Korean, Swedish (0.01 | < 0.01), All others]
• More than 430 languages in Google 5 collection

Cumulative age distribution of G5 holdings
• > 80 percent of Google 5 collection still in copyright

Works
• Coverage slightly higher (35 %)
• Holdings overlap slightly greater (56 % held uniquely)

Some speculation …
• What results would have been obtained if a different group of libraries had been selected?
• What incremental extensions to coverage can be obtained by adding additional library collections to original Google 5?
• Chose 5 new libraries:
  - Small US liberal arts college
  - Large US public university
  - Large US private university
  - Large US metropolitan library
  - Large Canadian university

Beyond the Google 5 …

                      “New” Google 5    “Original” Google 5
Total holdings:       ~8 million        ~18 million
Total unique books:   5.9 million       10.5 million
% of system-wide:     18 percent        33 percent
Redundant holdings:   26 percent        42 percent

Impact by library type (% of holdings unique relative to original G5 collection):
• Large US metropolitan library: 39 percent (most unlike G5)
• Large US private university: 25 percent
• Large Canadian university: 23 percent
• Large US public university: 21 percent
• Small US liberal arts college: 13 percent (most like G5)
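The "impact by library type" comparison can be read as a set difference: for each candidate library, what share of its holdings is absent from the existing aggregate? The sketch below assumes that reading. The function `incremental_coverage`, the library labels, and the book identifiers are hypothetical and only illustrate the calculation.

```python
def incremental_coverage(base, candidates):
    """Share of each candidate library's holdings not already in `base`.

    `base` is the set of books in the existing aggregate collection;
    `candidates` maps a library label to its set of books.
    """
    result = {}
    for name, books in candidates.items():
        books = set(books)
        new_books = books - base  # books the aggregate does not yet contain
        result[name] = len(new_books) / len(books) if books else 0.0
    return result


# Toy data, invented for illustration only.
base = {"b1", "b2", "b3", "b4"}
candidates = {
    "public_library": {"b1", "b5", "b6", "b7"},
    "college": {"b1", "b2", "b3", "b8"},
}
shares = incremental_coverage(base, candidates)

# Rank libraries from most unlike the base collection to most like it.
ranked = sorted(shares, key=shares.get, reverse=True)
print(shares, ranked)
```

A library with a high share extends the aggregate the most, which corresponds to the "most unlike G5" label on the slide.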

“The Google 10”
• Original Google 5: 10.5 million books
• Google 10 collection: 12.3 million books (+ … million, 17 %)
• Diminishing returns?
  - Original G5: ~18 million holdings, 58% unique
  - New G5: ~8 million holdings, 22% unique
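The diminishing-returns argument can be checked with simple arithmetic on the rounded figures shown on the slide. The derived values below are computed from those rounded inputs, so they are approximate and are not quoted from the study; the variable names are illustrative.

```python
# Figures from the slide, in millions (rounded as presented).
original_unique = 10.5    # unique books in the original Google 5
original_holdings = 18.0  # total holdings of the original Google 5
new_holdings = 8.0        # total holdings of the five additional libraries
combined_unique = 12.3    # unique books in the combined "Google 10"

# Derived values (assumption: computed here, not stated on the slide).
added = combined_unique - original_unique          # books gained by adding 5 libraries
growth = added / original_unique                   # growth of the collection
original_yield = original_unique / original_holdings  # unique books per holding, original G5
new_yield = added / new_holdings                   # new unique books per holding, new G5

print(f"added: {added:.1f} million")
print(f"growth: {growth:.1%}")
print(f"original yield: {original_yield:.1%}")
print(f"new yield: {new_yield:.1%}")
```

On these inputs, each holding in the original group contributed a unique book a little under 60 percent of the time, while each holding in the second group did so less than a quarter of the time, which is the pattern the "Diminishing returns?" question points to.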

The challenge of digital preservation
[Figure: “The Preservation Pyramid”, adapted from Priscilla Caplan (FCLA). Layers: Capture/Selection; Description; Secure Storage; Media Management; Render; Authenticity/Understandability. Also labeled: Economics; Rights]

But …
• Chris Rusbridge’s “digital preservation fallacies”:
  - Digital preservation is very expensive
  - File formats become obsolete quickly
  - Interventions must occur frequently
  - Digital preservation repositories should have very long time-scale aspirations
  - The preserved object must be easily and instantly accessible in contemporary formats
  - The preserved object must be faithful in all respects to original
  Source: Rusbridge, C. “Excuse me … Some Digital Preservation Fallacies?” Ariadne, February 2006
• Bottom line: significant progress has been made, but:
  - Still lack well-understood, standardized practices for preserving digital materials
  - No consensus on what “successful digital preservation” means

Mass digitization and digital preservation
• Roles and responsibilities: Google? Libraries? Elsevier? JSTOR?
• Digitized books as artifacts to be preserved, or disposable surrogates?
• Implications for redundancy in system?
• What uses can digitized output be put to?
  - Discovery/linking (e.g., mbooks)
  - Text-mining
• Infrastructure to support large-scale digital content management
• Efficient, automated workflows for preservation metadata
• “Last copy”

Summing up …
• Distance between collections shrinking; mass digitization programs and other aggregate collections increasingly common features of library landscape
• To mobilize aggregate collections, need to understand anatomy of aggregate collections, i.e., data and analysis to support planning and collaboration
  - Characterize and promote the “collective collection”: the collective library resource
  - Chart a course through mass digitization (e.g., G5 study)
• Mass digitization raises important questions about long-term preservation (summarized by “preservation pyramid”); need strategies to secure long-term future of digitization investments