Capturing Untapped Descriptive Data: Creating Value for Librarians and Users
Lynn Silipigni Connaway, OCLC Research
ASIST 2006 Conference, November 9, 2006

WorldCat: July 2006
  Total holdings: 1,071,507,045
  Manifestations (records): 67,282,165
  Works: 53,472,668
  Digital items: 1,571,803
  Institutions: 26,236
  Physical items*: ~1.6 billion
  *Estimated

Origin of materials represented in WorldCat
  US: 34%
  UK: 9%
  Canada: 3%
  Rest of World: 40%
  Unknown: 14%

Some aspects of Global WorldCat …
  Content languages: % of WorldCat non-English
    Top 5 non-English:
      German: 4.5 million
      French: 4.2 million
      Spanish: 2.9 million
      Dutch: 2.1 million
      Chinese: 1.6 million
  Non-English metadata language: 9.3 million (20 languages)
    Top 5:
      Dutch: 4.1 million
      French: 1.4 million
      German: 1.0 million
      Japanese: 0.7 million
      Finnish: 0.7 million
  Materials with non-US origins: 35.3 million (52%)
    Top 5:
      UK: 6.1 million
      Germany: 4.0 million
      France: 2.9 million
      Netherlands: 2.2 million
      Canada: 2.1 million

OCLC WorldCat™: Decision-making Resource
  Collection management
    Cooperative collection development
    Comparative collection analysis
    Collection assessment
    Mass digitization
    Off-site storage
    Preservation
  Services
    Virtual reference
    Recommender services
  Systems
    Precision

OCLC WorldCat™: Data Mining Research Projects
  Audience Level
  Publisher Name Server
  WorldMap

Audience Level: Rationale and Objectives
  Holdings represent selection decisions by librarians …
    implies there are about 1 billion individual selection decisions in the WorldCat holdings file
  Selections are made to serve the interests of a library's target community …
    Associate target community (audience level) to particular library profiles, e.g., ARL, non-ARL academic, public, K-12 school … ?
  Implies: we can infer a material's audience level from holdings patterns, which in turn can support:
    Collection management
    Readers' advisory services
    Reference services
    Information retrieval
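The inference on this slide can be read as a holdings-weighted average. The following is a minimal Python sketch under stated assumptions: the slide names the library profiles but gives no weights or formula, so the 0-to-1 scores in `PROFILE_SCORES`, the function name `audience_level`, and the holdings counts are all hypothetical.

```python
# Hypothetical audience scores per library profile (0 = juvenile, 1 = research).
# These values are assumptions for illustration, not figures from the slides.
PROFILE_SCORES = {
    "k12_school": 0.1,
    "public": 0.3,
    "non_arl_academic": 0.6,
    "arl": 0.9,
}


def audience_level(holdings_by_profile):
    """Return the holdings-weighted mean profile score for one title.

    holdings_by_profile maps a library profile name to the number of
    libraries of that profile holding the title.
    """
    total = sum(holdings_by_profile.values())
    if total == 0:
        raise ValueError("title has no holdings")
    weighted = sum(
        PROFILE_SCORES[profile] * count
        for profile, count in holdings_by_profile.items()
    )
    return weighted / total


# Made-up holdings counts for two titles.
nursery_rhymes = {"k12_school": 600, "public": 350, "arl": 50}
research_monograph = {"non_arl_academic": 40, "arl": 160}

print(round(audience_level(nursery_rhymes), 2))
print(round(audience_level(research_monograph), 2))
```

A title held mostly by school and public libraries lands near the low end of the scale, while one held mostly by ARL libraries lands near the high end.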

Example: Mother Goose

Publisher Name Server: Research Objectives
  Resolve, for data mining and quality of WorldCat:
    ISBN prefixes to publisher name
    Variant publisher names to a preferred form
  Complement Collection Analysis Service
    Librarians
    Publishers
  Capture and make available various attributes of individual publishers
    Location of publisher
    Language(s) of materials published
    Genre(s)/format(s) of materials published
    Dominant subject domain(s) of the publisher's output
    Parent company and subsidiaries
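The two resolution objectives above amount to two lookups. The sketch below is illustrative only: the table contents, the ISBN prefixes, and the function names are hypothetical placeholders, not data or interfaces from the Publisher Name Server. It assumes a hyphenated ISBN-10, where the first two groups (group identifier and publisher identifier) form the prefix.

```python
import re

# Hypothetical lookup tables. The prefixes and names are placeholders,
# not real ISBN assignments or Publisher Name Server records.
PREFIX_TO_PUBLISHER = {
    "1-23": "Example House",
    "1-4567": "Sample Press",
}
VARIANT_TO_PREFERRED = {
    "example house": "Example House",
    "example house inc": "Example House",
    "sample pr": "Sample Press",
}


def normalize(name):
    """Lower-case a name and strip punctuation so variants compare equal."""
    cleaned = re.sub(r"[^\w\s]", "", name.lower())
    return " ".join(cleaned.split())


def preferred_name(variant):
    """Map a variant publisher name to its preferred form, or None if unknown."""
    return VARIANT_TO_PREFERRED.get(normalize(variant))


def publisher_for_isbn(hyphenated_isbn):
    """Resolve a hyphenated ISBN-10 to a publisher name, or None if unknown."""
    parts = hyphenated_isbn.split("-")
    # Group identifier plus publisher identifier form the prefix.
    prefix = "-".join(parts[:2])
    return PREFIX_TO_PUBLISHER.get(prefix)


print(preferred_name("Example House, Inc."))
print(publisher_for_isbn("1-4567-8901-X"))
```

Returning `None` for unknown names or prefixes leaves unresolved cases visible for manual review.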

Publisher Name Server: Methodology
  Programmatically cluster publishers using ISBN prefixes
    Data clustering (The Free Dictionary)
      "The science of extracting useful information from large data sets or databases"
      Classification of similar objects into different groups
      Partitioning of a data set into subsets (clusters)
      Data in each subset (ideally) share some common trait
  Hand parse the entities and resolve ISBN prefixes
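Clustering by ISBN prefix can be sketched as a simple grouping step that feeds the hand parsing. This is a minimal Python illustration, not the project's actual procedure: the records, prefixes, and function names are made up, and choosing the most frequent name string as a candidate preferred form is an assumption.

```python
from collections import Counter, defaultdict


def cluster_by_prefix(records):
    """Group publisher name strings by ISBN prefix.

    records is an iterable of (isbn_prefix, publisher_name) pairs.
    Returns {prefix: Counter of name strings seen with that prefix}.
    """
    clusters = defaultdict(Counter)
    for prefix, name in records:
        clusters[prefix][name.strip()] += 1
    return clusters


def candidate_preferred_forms(clusters):
    """Propose the most frequent name in each cluster for manual review."""
    return {
        prefix: names.most_common(1)[0][0]
        for prefix, names in clusters.items()
    }


# Made-up records: (hypothetical prefix, name as transcribed in a record).
records = [
    ("1-23", "Example House"),
    ("1-23", "Example House, Inc."),
    ("1-23", "Example House"),
    ("1-4567", "Sample Pr."),
    ("1-4567", "Sample Press"),
    ("1-4567", "Sample Press"),
]

clusters = cluster_by_prefix(records)
print(candidate_preferred_forms(clusters))
```

Each cluster keeps every variant and its count, so a reviewer can see the alternatives before confirming a preferred form.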

Publisher Name Server: Database
  To date: >800 records
  Relational database, preserving hierarchical relationships
  Begins with high-occurrence entities to identify:
    Top 10 lists (USA, UK, Canada, Australia, Germany, France, Japan, Italy)
    Top university presses
    Mergers and acquisitions
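One common way to preserve hierarchical relationships in a relational database is a self-referencing parent column. The sketch below uses Python's built-in `sqlite3` module. The table layout, column names, and publisher names are hypothetical; the slides do not describe the actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE publisher (
        id INTEGER PRIMARY KEY,
        preferred_name TEXT NOT NULL,
        parent_id INTEGER REFERENCES publisher(id)  -- NULL for a top-level entity
    )
    """
)
# Made-up three-level hierarchy: imprint -> division -> parent company.
conn.executemany(
    "INSERT INTO publisher (id, preferred_name, parent_id) VALUES (?, ?, ?)",
    [
        (1, "Parent Group", None),
        (2, "Education Division", 1),
        (3, "Textbook Imprint", 2),
    ],
)


def ownership_chain(conn, publisher_id):
    """Return names from the given publisher up to its top-level parent."""
    rows = conn.execute(
        """
        WITH RECURSIVE chain(id, preferred_name, parent_id, depth) AS (
            SELECT id, preferred_name, parent_id, 0
            FROM publisher WHERE id = ?
            UNION ALL
            SELECT p.id, p.preferred_name, p.parent_id, chain.depth + 1
            FROM publisher AS p JOIN chain ON p.id = chain.parent_id
        )
        SELECT preferred_name FROM chain ORDER BY depth
        """,
        (publisher_id,),
    ).fetchall()
    return [name for (name,) in rows]


print(ownership_chain(conn, 3))
```

With this layout, recording a merger or acquisition means updating one `parent_id` value; the recursive query then reflects the new chain of ownership.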

Top U.S. Publishing Entities in WorldCat (22,680,201 total U.S. records)

Publisher Name Server: Database
  Database Fields:
    Publisher Name, Preferred Form
    Source of Preferred Form
    Former Names
    Variant Forms
    ISBN Prefixes
    HQ City
    HQ Country
    Other Cities
    URL
    Languages
    Formats
    DDC Subjects
    LCC Subjects
  Data Sources:
    U.S. Library of Congress, National Authority File, 110 (Corporate Name) field
    Books In Print Online (R.R. Bowker)
    The International ISBN Registry (K.G. Saur)
    Publishers Weekly Online
    Hoover's Handbook Online
    Standard and Poor's Corporate Descriptions
    The Directory of Corporate Affiliations (DIALOG)
    Company websites
  DATA MINING

Entity-Parsing in a World of Mergers and Acquisitions
  Prentice-Hall, Inc.
  Pearson Education, Inc.
  Addison-Wesley Publishing Company
  Allyn and Bacon
  Dominie Press
  Benjamin/Cummings Publishing Company
  Scott, Foresman and Company
  HarperCollins Educational Publishers
  Longmans, Green, and Co.
  Pearson PLC
  Pearson Canada
  Pearson Technology Group
  Copp Clark
  Adobe Press
  Cisco Press
  Penguin Books
  Allen Lane
  Ladybird Books
  Riverhead Books
  Puffin Books
  Putnam Books
  Berkley Publishing Group
  Avery

OCLC WorldMap™: Objectives
  Geographically represent library data from UNESCO, ARL, and NCES
    Number of libraries
    Amount of library expenditures
    Number of volumes and titles
    Number of librarians
    Number of users

OCLC WorldMap™: Objectives
  Research prototype
  Test geographical representation of WorldCat
    Titles and holdings by country of publication
  Support data mining research area
    Visually display mined data to ease review and analysis
  Internal use
    Sales and marketing
  External use
    Library collection assessment and comparison
    Complement the AAU/ARL Global Resources Network project
    Project of the Council on Library and Information Resources (CLIR)
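Before titles and holdings can be drawn on a map, they have to be totalled per country of publication. A minimal Python sketch of that aggregation follows; the record layout, country codes, and counts are made up for illustration, and the slides do not describe how the prototype prepares its data.

```python
from collections import defaultdict


def summarize_by_country(records):
    """Aggregate title and holdings counts per country of publication.

    records is an iterable of (country_code, holdings_count) pairs,
    one pair per bibliographic record.
    """
    summary = defaultdict(lambda: {"titles": 0, "holdings": 0})
    for country, holdings in records:
        summary[country]["titles"] += 1
        summary[country]["holdings"] += holdings
    return dict(summary)


# Made-up records: (country of publication, number of holding libraries).
records = [("US", 120), ("GB", 40), ("US", 15), ("DE", 7)]

for country, totals in sorted(summarize_by_country(records).items()):
    print(country, totals["titles"], totals["holdings"])
```

The resulting per-country totals are the kind of values a map layer could shade or label.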

OCLC WorldMap™: Technology
  First implemented SVG
    Open standard maintained by W3C
    Simple XML file
    Young technology
      Browser support limited
      Requires plug-in
  Converted to Flash
    Browser compatibility
    Plug-in compatibility (if a plug-in was installed!)
  For a detailed comparison of SVG and Flash, see:
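To illustrate the "simple XML file" point, the sketch below writes a small SVG document with Python's standard `xml.etree.ElementTree` module. It draws a bar chart rather than a map and is not the WorldMap prototype's rendering; the function name, layout, and colour are assumptions, while the sample percentages are the origin shares quoted earlier in the deck.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def build_svg(values, width=300, bar_height=20):
    """Build an SVG bar chart; values maps a label to a positive number."""
    ET.register_namespace("", SVG_NS)
    largest = max(values.values())
    svg = ET.Element(
        f"{{{SVG_NS}}}svg",
        {"width": str(width), "height": str(bar_height * len(values))},
    )
    for row, (label, value) in enumerate(values.items()):
        bar = ET.SubElement(
            svg,
            f"{{{SVG_NS}}}rect",
            {
                "x": "0",
                "y": str(row * bar_height),
                "width": str(round(width * value / largest)),
                "height": str(bar_height - 2),
                "fill": "steelblue",
            },
        )
        # A <title> child carries the underlying figure as descriptive text.
        ET.SubElement(bar, f"{{{SVG_NS}}}title").text = f"{label}: {value}"
    return ET.tostring(svg, encoding="unicode")


# Origin shares (percent) for three of the categories on the origin slide.
svg_text = build_svg({"US": 34, "UK": 9, "Canada": 3})
print(svg_text)
```

Because the output is plain XML text, it can be saved to a `.svg` file and opened directly, which is the property that made SVG attractive as a first implementation.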

OCLC WorldMap™

Potential Future Projects
  Audience Level
    Integrate into WorldCat.org and OPACs to limit searches and retrieved sources
  Publisher Name Server
    Integrate into OCLC Collection Analysis Service for publisher business intelligence
  WorldMap
    Subject information (aboutness)
    Language of item
      Content language
      Metadata language
    Holdings by country of library

Presentation will be available at
Prototypes available at
Project Web Site:

Questions and Discussion Contact Information: