Mining for Digital Resources: Identifying and Characterizing Digital Materials in WorldCat. Brian Lavoie, Lynn Silipigni Connaway, Ed O'Neill. ACRL 12th National Conference.

Presentation transcript:

Mining for Digital Resources: Identifying and Characterizing Digital Materials in WorldCat
Brian Lavoie, Lynn Silipigni Connaway, Ed O'Neill
ACRL 12th National Conference, Minneapolis, MN, April 9, 2005

More information about the OCLC Research Data Mining activity is available online:

Rising Digital Tide
Equivalent of 5 exabytes of new information created in 2002; 92 percent stored on magnetic or optical media (Lyman and Varian)
Rush to digitize:
  Cultural artifacts (images, audio, video, text)
  Published content (books, journals, databases)
  Communication (listservs, blogs, chat rooms)
  Government information (reports, data, forms, records)
Survey of Academic Libraries:
  Average expenditure in 2003 on digital resources: $250,000 (8 percent increase)
  40 percent of respondents intend to reduce spending on print resources in order to increase spending on digital resources

Purpose of Study
Focused questions …
  Identify digital resources in WorldCat
    Bibliographic criteria for algorithmic identification
  Characterize digital materials:
    Cataloging activity; material types; holdings patterns
… But also broader questions …
  Explore ways to use information in bibliographic records to generate new views of the catalog
  Large scale experiments with existing catalog records to see what can be done with legacy data
    Roy Tennant, Library Journal

Data Sources
WorldCat: world's largest and most comprehensive bibliographic database
  > 50,000 libraries worldwide use and contribute to WorldCat
Copy of WorldCat from July 2004: ~53 million records
Copy of WorldCat holdings file from July 2004: ~950 million holdings
Caveats:
  No presumption that all (or even most) digital materials are cataloged in WorldCat
  Focus on cataloging practice and experimentation with bibliographic data

Identifying Digital Materials
Standard MARC21 criteria:
  Type of Record: computer file [LDR/6 = m]
  Form of Item: electronic [008/23 or 29 = s]
  General Materials Designation: electronic resource [245 $h]
Other criteria:
  Physical Description: electronic resource [007/0 = c]
  Electronic Location and Access [856 2nd ind. = 0, no $3]
  Additional Materials/Form of Material: computer file/electronic resource [006/0 = m]
  Reproduction Note: electronic reproduction [533 $a]
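The three standard criteria above can be sketched as a single check. This is only an illustration, not the algorithm the authors ran: `MarcRecord` is a hypothetical, simplified stand-in for a parsed MARC21 record, the sketch assumes that meeting any one standard criterion is enough, and it reads "008/23 or 29" as "position 29 for maps and visual materials, position 23 otherwise".

```python
from dataclasses import dataclass


@dataclass
class MarcRecord:
    """Hypothetical, simplified stand-in for a parsed MARC21 record."""
    leader: str            # 24-character record leader (LDR)
    field_008: str = ""    # 40-character fixed-length data field
    gmd: str = ""          # text of 245 $h, empty if the subfield is absent


# Leader/06 codes whose 008 carries Form of Item at position 29
# (maps: e, f; visual materials: g, k, o, r). Other types use position 23.
FORM_AT_29 = {"e", "f", "g", "k", "o", "r"}


def meets_standard_criteria(rec: MarcRecord) -> bool:
    """Return True if the record meets at least one standard criterion."""
    type_of_record = rec.leader[6:7]

    # Criterion 1: Type of Record is "computer file" (LDR/6 = m).
    if type_of_record == "m":
        return True

    # Criterion 2: Form of Item is "electronic" (008/23 or 008/29 = s).
    form_pos = 29 if type_of_record in FORM_AT_29 else 23
    if rec.field_008[form_pos:form_pos + 1] == "s":
        return True

    # Criterion 3: General Materials Designation says "electronic resource".
    return "electronic resource" in rec.gmd.lower()


# Illustrative records (invented for this sketch, not taken from WorldCat).
computer_file = MarcRecord(leader="00000nmm a2200000 a 4500")
ebook = MarcRecord(leader="00000nam a2200000 a 4500",
                   field_008=" " * 23 + "s" + " " * 16)
print_book = MarcRecord(leader="00000nam a2200000 a 4500",
                        field_008=" " * 40)
```

Slicing (`leader[6:7]`) rather than indexing keeps the check from raising on short or missing fields, which matters when scanning legacy records of uneven quality.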

Analysis of Other Criteria
Analyzed records that did NOT meet any of the standard criteria, but DID meet at least one of the other criteria:

  Criterion            Recall       Precision
  007/0 = c            Very High    Low
  856 2nd ind. = 0     High         Low
  006/0 = m            Medium       Low
  533 $a               Low          High

Cataloging issues:
  Accompanying materials
  Separate record vs. combined record
  Mis-codings
Opted for conservative strategy of using only standard criteria
Wrote algorithm for automatic scanning of WorldCat
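The recall and precision ratings above are qualitative. As a reminder of what the two measures mean for a single criterion, here is a minimal sketch using the standard definitions. The record IDs are invented toy data, and the function name is an assumption for illustration, not part of the study.

```python
def precision_recall(flagged: set[int], truly_digital: set[int]) -> tuple[float, float]:
    """Compute (precision, recall) for one identification criterion.

    flagged:        IDs of records the criterion marks as digital
    truly_digital:  IDs of records a reviewer judged to be digital
    """
    true_positives = len(flagged & truly_digital)
    # Precision: share of flagged records that really are digital.
    precision = true_positives / len(flagged) if flagged else 0.0
    # Recall: share of digital records that the criterion catches.
    recall = true_positives / len(truly_digital) if truly_digital else 0.0
    return precision, recall


# Toy example: a broad criterion that catches most digital records
# but also flags many non-digital ones (high recall, low precision).
flagged_ids = {1, 2, 3, 4, 5, 6, 7, 8}
digital_ids = {1, 2, 9}
precision, recall = precision_recall(flagged_ids, digital_ids)
```

A criterion with high recall but low precision, like the toy case here, would pull many non-digital records into the set, which is consistent with the conservative choice to rely on the standard criteria only.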

The WorldCat Digital Bucket
WorldCat: ~53 million records
Digital: ~750,000 records (~1.5 percent)

Dynamics
Earliest Digital Record (lowest OCLC #):
  # : entered on September 11, 1975
  American Antiquarian Society
  Data file on tape reel
Latest Digital Record (highest OCLC #):
  # : entered on July 1, 2004
  Mississippi State University
  Master's thesis in PDF format
Rate of Growth: January 2004 – July 2004
  Net increase of 1.8 million WorldCat records
  Net increase of 61,000 records describing digital materials: ~3 percent of total increase
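The growth share can be checked with a quick calculation from the rounded figures on the slides. The helper name `percent_share` is an assumption made for this sketch, and because the inputs are themselves approximations, the results are only indicative.

```python
def percent_share(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Share of net new WorldCat records (Jan-Jul 2004) describing digital materials.
growth_share = percent_share(61_000, 1_800_000)      # about 3.4

# Share of all WorldCat records (July 2004) describing digital materials.
stock_share = percent_share(750_000, 53_000_000)     # about 1.4
```

On these rounded inputs, digital materials account for a noticeably larger share of new records than of the database as a whole.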

WorldCat Cataloging Activity for Digital Materials: Number of Digital Records Entered, by Year (1975-2004)
Contributed: 98% (WorldCat: 88%)

Distribution of Digital Material Types in WorldCat (July 2004)

Digital Material Types in WorldCat: 1985 and 2004 (Percent of Total)

  Material type       1985    2004
  Books               -       43
  Computer Files
  Government Docs     1       14
  Serials             -       6
  Theses              -       3
  Pamphlets           1       3
  Other

Digital (e-)Books: Additional Characteristics

                                      Digital books    All books in WC
  Median Holdings                     1                3
  Uniquely Held                       65 percent       32 percent
  Total Holdings                      ~13 million      ~700 million
  Percent of Total Holdings Set By:
    ARLs                              6                23
    Non-ARL academics                 71               44
    Publics                           13               24

Digital books with at least one print equivalent cataloged in WorldCat: ~88,000
Percent of digital books available online: 70 percent

Looking Ahead … Murky Buckets?
Early view: format most important feature of digital materials
  Implies one digital bucket
But as number and variety of digital materials expand …
  Need for increasingly fine distinctions between buckets
  Online e-book requires 3 filters to surface in search results
    Format (digital), Means of access (network), Material type (book)
Murky Bucket Syndrome:
  "We cannot entirely, unambiguously slice and dice [large bibliographic databases] because of historic data entry and cataloging practices … that were not oriented toward our new needs"
  Lorcan Dempsey, quoted by Roy Tennant in Library Journal
Particularly troublesome for digital materials:
  Cataloging practices in flux; new types of digital resources
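The three-filter point can be illustrated with a small sketch. `CatalogEntry` and its fields are hypothetical: they assume each facet has already been derived cleanly from the bibliographic record, which is exactly what the "murky bucket" problem makes hard in practice.

```python
from typing import Iterable, List, NamedTuple


class CatalogEntry(NamedTuple):
    """Hypothetical catalog entry with pre-derived facets."""
    title: str
    is_digital: bool      # format facet
    access: str           # means of access, e.g. "network" or "physical carrier"
    material_type: str    # e.g. "book", "serial", "thesis"


def online_ebooks(entries: Iterable[CatalogEntry]) -> List[CatalogEntry]:
    """Keep only entries that pass all three filters."""
    return [
        e for e in entries
        if e.is_digital                  # filter 1: format (digital)
        and e.access == "network"        # filter 2: means of access (network)
        and e.material_type == "book"    # filter 3: material type (book)
    ]


# Invented sample entries, for illustration only.
sample = [
    CatalogEntry("Print monograph", False, "physical carrier", "book"),
    CatalogEntry("Book on CD-ROM", True, "physical carrier", "book"),
    CatalogEntry("Online journal", True, "network", "serial"),
    CatalogEntry("Online e-book", True, "network", "book"),
]
matches = online_ebooks(sample)
```

If any one facet is coded inconsistently in the underlying records, the entry silently drops out of the result, which is why consistent cataloging practice matters for this kind of view.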

Conclusions
Identification and categorization of digital materials:
  For now … need more work to identify consistent cataloging patterns in existing bibliographic records
  And for the future … need clear, stable practices for cataloging digital materials
Benefits:
  End users (resource discovery based on new views of the catalog)
  Librarians (digitization priorities, collection analysis …)
Processable catalogs: Make bibliographic data work harder!

More information …
Paper forthcoming
Contacts:
Presentation to be posted on OCLC Research Web site: