OCoLR # OCLCR Making data work harder Lorcan Dempsey OCLC OVGTSL 2005 Conference Newark, May 11-13
Overview
- Some context
- Looking at data in action
  - Open WorldCat
  - FRBR
  - Data mining
Context: value
- Amazoogle: what should we be doing that fits into a world they occupy? Where do we provide unique value?
- ROI: libraries invest in data but do not extract as much value from it as they might. Unless we release more value, the argument for this investment becomes weaker.
- User: how do we co-create value with users? What opportunities are there for mixing catalog data and user-contributed data?
- Management intelligence: how do we use data better to inform management decisions?
Context: consequences
- The role of the catalog?
- The role of structured data?
- The role of the library?
Data
- Open WorldCat
- FRBR
- WorldCat Wiki
- Management intelligence
FRBR
- ‘Interim FRBR’ in OWC
- FRBR in research projects
  - FictionFinder
  - Curioser
  - xISBN
  - Algorithm
  - Top 1000
- FRBR in FirstSearch – late this year
Top Sets for Fiction (Records)

Records  Keys
1,296    defoe, daniel\ /robinson crusoe
1,267    carroll, lewis\ /alices adventures in wonderland
971      cervantes saavedra, miguel de\ /don quixote
828      stevenson, robert louis\ /treasure island
689      twain, mark\ /adventures of huckleberry finn
624      twain, mark\ /adventures of tom sawyer
618      swift, jonathan\ /gullivers travels
Top Sets for Fiction (Holdings)

Holdings  Keys
29,043    twain, mark\ /adventures of huckleberry finn
26,088    carroll, lewis\ /alices adventures in wonderland
20,843    twain, mark\ /adventures of tom sawyer
19,410    defoe, daniel\ /robinson crusoe
18,566    cervantes saavedra, miguel de\ /don quixote
18,492    stevenson, robert louis\ /treasure island
18,123    dickens, charles\ /christmas carol
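The two rankings above differ because one counts records per work set and the other sums holdings. A minimal sketch of that distinction follows. It assumes each record carries an author, a title, and a holdings count, and it uses a hypothetical normalization (`work_key`) that only imitates the look of the keys on the slide. It is not the OCLC work-set algorithm, and the sample records in the usage note are invented.

```python
import re
from collections import defaultdict


def _norm(text, keep=""):
    """Lowercase, keep only a-z, 0-9, spaces and `keep`, collapse whitespace."""
    text = " ".join(text.lower().split())
    text = re.sub(rf"[^a-z0-9 {re.escape(keep)}]", "", text)
    return " ".join(text.split())


def work_key(author, title):
    """Hypothetical author/title key, shaped like 'twain, mark\\ /adventures of ...'."""
    return _norm(author, keep=",") + "\\ /" + _norm(title)


def rank_sets(records):
    """Group records by key; return (ranking by record count, ranking by holdings)."""
    record_counts = defaultdict(int)
    holding_counts = defaultdict(int)
    for rec in records:
        key = work_key(rec["author"], rec["title"])
        record_counts[key] += 1                  # one more record in this set
        holding_counts[key] += rec["holdings"]   # holdings attached to that record
    by_records = sorted(record_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    by_holdings = sorted(holding_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return by_records, by_holdings
```

Called with a list of dictionaries such as `{"author": "Twain, Mark", "title": "Adventures of Huckleberry Finn", "holdings": 40}`, `rank_sets` returns both orderings, so a set with many sparsely held records can lead the first list while a set with fewer, widely held records leads the second.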
Taking FRBR onto the open web
- Curio(u)ser
MetaWiki
- WIKI – web pages
- metaWIKI – data
- Capture user input in structured ways
Extending Wiki’s utility
Wiki:
- supported markup: wikitext
- page editing: a single text block
- searches: full text searching
- collections managed: one per wiki
MetaWiki:
- supported markup: wikitext; structured data (e.g., MARC, METS, DC…)
- page editing: a single text block, or field level
- searches: full text searching; fielded searching
- collections managed: one/multiple per OaiWiki
Lorcan: note that this is a work in progress
Management intelligence
- So we have all this data – what can it tell us?
- Several projects underway: only some discussed here
Making Data Work Harder
- Activities “shed” data:
  - Cataloging: bibliographic information
  - Web site traffic: transaction logs
  - Reference queries: search term lists
- Need to mine this data for intelligence that creates value for libraries and users
- OCLC Research is undertaking a number of data-mining projects aimed at:
  - Knowing more about the characteristics of library collections
  - Creating interesting and useful data displays
  - Generating intelligence to support library decision-making
Data mining
- OCLC has a new collection analysis service
- Some research projects looking at systemic questions are described here
Looking at Library Print Book Collections … Systematically
- 32 million print books, representing 26 million distinct works
- Half of print books published after 1977; more than 80% still “in copyright”
- Rareness is common! Only a third of print books have more than five holdings; half have two or fewer
- OCLC/Ithaka collaboration: use WorldCat to characterize the “system-wide” print book collection, i.e., aggregate print book holdings in WorldCat
- Intelligence of this kind can help establish digitization priorities and inform preservation planning
- More information:
- Only about 120,000 works had both print book and e-book manifestations
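The "rareness" figures above are shares of records falling into holdings-count bands. A minimal sketch of that calculation, assuming the input is simply a list with one holdings count per print book record (the function name and the input shape are illustrative, not taken from the study):

```python
def holdings_profile(holdings_per_book):
    """Return the share of records with more than five holdings and with two or fewer."""
    total = len(holdings_per_book)
    if total == 0:
        raise ValueError("need at least one record")
    more_than_five = sum(1 for h in holdings_per_book if h > 5)
    two_or_fewer = sum(1 for h in holdings_per_book if h <= 2)
    return {
        "more_than_five": more_than_five / total,
        "two_or_fewer": two_or_fewer / total,
    }
```

The two bands do not cover records with three to five holdings, so the shares need not sum to one.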
The Implications of GooglePrint …
- Potentially covers about one third of print books in WorldCat
- ~60 percent of “GooglePrint” books held by only one of the Google 5
- Less than 5 percent held by all of the Google 5
- ~20 percent of “GooglePrint” books out of copyright
- Paper forthcoming …
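The "held by only one" and "held by all" figures are an overlap profile across a fixed group of libraries. A minimal sketch under assumed inputs: `holdings` maps a title identifier to the set of library symbols holding it, and `libraries` lists the group of interest. Both the data shape and the symbols used in any example are hypothetical.

```python
from collections import Counter


def overlap_profile(holdings, libraries):
    """Share of group-held titles held by exactly n group members, for n = 1..len(group)."""
    group = set(libraries)
    counts = Counter()
    for holders in holdings.values():
        n = len(set(holders) & group)  # how many group members hold this title
        if n:
            counts[n] += 1             # ignore titles held by no group member
    total = sum(counts.values())
    if total == 0:
        return {}
    return {n: counts[n] / total for n in range(1, len(group) + 1)}
```

For a group of five, `profile[1]` is the share held by only one member and `profile[5]` the share held by all of them.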
Know Your Audience!
- Holdings represent selection decisions by librarians … implies there are about 1 billion individual selection decisions in the WorldCat holdings file
- Selections are made to serve the interests of a library’s target community …
- Associate target community (audience level) with particular library profiles, e.g., ARL, non-ARL academic, public, K-12 school …
- Implies: we can infer materials’ audience level from holdings patterns, which in turn can support:
  - Collection management
  - Readers’ advisory services
  - Reference services
  - Information retrieval
- Paper forthcoming!
“Last Copy”: Identifying At-Risk Materials
- ~23 million WorldCat records have only a single holding attached
- Libraries need to know what portions of their collections are:
  - Rare …
  - Rare and valuable …
  - “Last copy” (artifact and/or content)
- Identification of rare materials is essential intelligence in support of storage, digitization, and preservation decision-making
- Data-mining study of Vanderbilt holdings in WorldCat:
  - Identified 23,000 items held uniquely by Vanderbilt
  - ~60% are print books
  - ~60% produced prior to 1950; ~25% produced after 1970
- Paper forthcoming!
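Finding items "held uniquely" by one library amounts to filtering for records whose only holder is that library, then profiling the result. A minimal sketch, assuming `holdings` maps a record identifier to the set of holding-library symbols; the symbol `"XYZ"` in the usage note is a placeholder, and the helper names are illustrative rather than taken from the Vanderbilt study.

```python
def uniquely_held(holdings, library):
    """Return sorted record ids whose single holding belongs to `library`."""
    return sorted(
        record_id
        for record_id, holders in holdings.items()
        if set(holders) == {library}
    )


def share(items, predicate):
    """Fraction of `items` for which `predicate` is true (0.0 for an empty input)."""
    items = list(items)
    if not items:
        return 0.0
    return sum(1 for item in items if predicate(item)) / len(items)
```

Given a second lookup from record id to publication year, `share(uniquely_held(holdings, "XYZ"), lambda rid: years[rid] < 1950)` would give the proportion of uniquely held items produced before 1950.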
Thank you!
OCLC Research:
Lorcan: