The Future of the Online Catalog
Andrew K. Pace, NCSU Libraries
July 28, 2006
Library Automation: Yesterday’s Technology, Tomorrow
What I will cover:
– Online catalog: the problem
– Brief environmental scan
– Endeca: team, timeline, technology
– Usability, statistical results, relevance study
– Dis-integrated systems / future catalogs
What ILS Catalogs Do Well… (liberally stolen from Roy Tennant)
– Inventory control: what and where
– Known-item searching
What ILS Catalogs Don’t Do Well… (liberally stolen from Roy Tennant, and augmented by me)
– Any search other than known-item
– Most anything other than books (serials, e-resources, articles, digital objects)
– Logical groupings of results (e.g., FRBR)
– Faceted browsing
– Relevance ranking
– Sideways searching (suggestions, expansion of searches and search targets)
“OPAC Complainers”
“There is certainly no dearth of OPAC complainers. You have Andrew Pace (OPACs suck), and Roy Tennant (You Can’t Put Lipstick on a Pig) writing and presenting about the need for change (more simplicity) in the OPAC world. I can appreciate their arguments for a simpler OPAC (not to mention the rest of the system) but other then [sic] present their arguments, neither has much in the way of suggestions nor have they sparked a movement among librarians or the automation vendors to do anything about the situation.”
- ACRL Blog entry, Oct
NextGen Library Search Tools
– RedLightGreen (RLG)
– OCLC FictionFinder
– Vivisimo clustered search (Ex Libris, Serials Solutions)
– Grokker (EBSCO)
– AquaBrowser visual context
– Endeca Information Access Platform
– OCLC Custom WorldCat and Open WorldCat
– Innovative Interfaces OPAC Pro & Encore
– Ex Libris Primo
– Polaris AJAX-enabled OPAC
– SirsiDynix Enterprise Portal System, FAST
– Talis, et al.
– Web Services
– Georgia Pines and the Library 2.0 bandwagon
Endeca purchase decision
– Lots of topical searches and poor subject access
  – Keyword searching gives too many or too few results, which leads to general distrust
  – Misunderstanding of authority headings
– No relevancy ranking of results
– Needed more responsiveness (speed)
Implementation Team
– 7 representative team members
  – Andrew Pace, IT, Chair
  – Emily Lynema, IT, ex officio (tech lead)
  – Cindy Levine, Research and Information Services
  – Erik Moore, IT, ex officio (ILS librarian)
  – Charley Pennell, Metadata and Cataloging
  – Shirley Rodgers, IT
  – Tito Sierra, Digital Library Initiatives
Timeline
– License / negotiation: Spring 2005
– Acquire: Summer 2005
– Implementation: August 2005 to January 12, 2006
Technical Overview
– Endeca ProFind co-exists with the SirsiDynix Unicorn ILS and the Web2 online catalog.
– Endeca indexes MARC records exported from Unicorn.
– The index is refreshed nightly with records added or updated during the previous day.
Endeca ProFind Overview
[Diagram] Offline (nightly): raw MARC data → NCSU exports and reformats → flat text files → Data Foundry parses text files → indices
Always online: Navigation Engine ↔ (HTTP) ↔ NCSU Web Application ↔ (HTTP) ↔ client browser
Integrating Endeca
– Endeca doesn’t understand MARC data or MARC-8 character encoding, so records are translated to UTF-8 text files.
– Each night a script updates the data indexed by Endeca:
  – Exports new or updated MARC records from Unicorn.
  – Reformats and merges these records with those already indexed.
  – Starts an Endeca re-index, completely rebuilding the index for the catalog.
– The process requires about 4 hours.
– Retain the Web2 OPAC for some functionality:
  – Authority searching: known items and cross-references
  – Detailed record pages (how to make the Endeca → Web2 link?)
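The "reformat and merge" step can be sketched in a few lines. This is a minimal illustration, not NCSU's actual script: it assumes each reformatted record is a dictionary of field names to UTF-8 strings keyed by its control number, and the names `merge_nightly` and `to_flat_lines` and the tab-separated output layout are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical flat record: field name -> UTF-8 text value.
Record = Dict[str, str]


def merge_nightly(
    existing: Dict[str, Record],
    updates: Iterable[Tuple[str, Record]],
) -> Dict[str, Record]:
    """Return a new record set in which new or updated records
    (keyed by control number) replace any previously indexed copy."""
    merged = dict(existing)  # leave yesterday's record set untouched
    for control_number, record in updates:
        merged[control_number] = record
    return merged


def to_flat_lines(records: Dict[str, Record]) -> Iterator[str]:
    """Serialize records as one tab-separated line per field, sorted by
    control number (a hypothetical layout, not Endeca's input format)."""
    for control_number in sorted(records):
        for field, value in records[control_number].items():
            yield f"{control_number}\t{field}\t{value}"


if __name__ == "__main__":
    indexed = {"001": {"title": "Old title"}, "002": {"title": "Unchanged"}}
    todays_export = [("001", {"title": "Corrected title"}), ("003", {"title": "New book"})]
    for line in to_flat_lines(merge_nightly(indexed, todays_export)):
        print(line)
```

Because the merged file is then handed to a complete re-index, the sketch rewrites the whole record set rather than patching an existing index in place.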
Quick Demo
Some User Reaction
“This is absolutely the coolest thing I've seen all century.”
- Will Owen, Head of Systems (UNC Libraries)
“Also, I'm really digging the new NCSU library catalog. Very nice.”
- Educause staff (non-librarian)
“The new Endeca system is incredible. It would be difficult to exaggerate how much better it is than our old online card catalog (and therefore that of most other universities). I've found myself searching the catalog just for fun, whereas before it was a chore to find what I needed.”
- NCSU undergrad, Statistics
Basic statistics (March – May 2006)
Navigation statistics (March – May 2006)
Sorting statistics (March – May 2006)
Other interesting tidbits… (March 2006)
– Authority searching decreased 45%
– Keyword searching increased 230%
  – Caveat: the default catalog search changed from title authority to keyword
– ~5% of keyword searches offered a spelling correction or suggestion
  – 3.1%: automatic spell correction
  – 2.3%: “Did you mean…” suggestion
Usability Testing Trends
– 10 undergraduate students
  – 5 with the Endeca catalog
  – 5 with the old Web2 OPAC
– Endeca performed as well as the OPAC for known-item searching
  – 89% of Endeca tasks completed ‘easily’ (8/9)
  – 71% of OPAC tasks completed ‘easily’ (15/21)
– Endeca performed better than the OPAC for topical searching
  – 61% of Endeca tasks completed ‘easily’ (19/31)
  – 3% of Endeca tasks completed but rated ‘hard’ (1/31)
  – 33% of OPAC tasks completed ‘easily’ (13/39)
  – 26% of OPAC tasks completed but rated ‘hard’ (10/39)
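The percentages on this slide follow directly from the task counts. A quick check, assuming the slide rounds to the nearest whole percent:

```python
# Task counts from the usability slide: (label, tasks in category, total tasks).
COUNTS = [
    ("Endeca known-item, easy", 8, 9),
    ("OPAC known-item, easy", 15, 21),
    ("Endeca topical, easy", 19, 31),
    ("Endeca topical, hard", 1, 31),
    ("OPAC topical, easy", 13, 39),
    ("OPAC topical, hard", 10, 39),
]


def percent(part: int, whole: int) -> int:
    """Share of tasks as a whole-number percentage."""
    return round(100 * part / whole)


for label, part, whole in COUNTS:
    print(f"{label}: {part}/{whole} = {percent(part, whole)}%")
```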
A study in relevance
– Are search results in Endeca more likely to be relevant to a user’s query than search results in the Web2 OPAC?
– 100 topical user searches from 1 month in fall 2005
– How many of the top 5 results were relevant?
  – 40% relevant in Web2 OPAC
  – 68% relevant in Endeca catalog
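The measure on this slide can be read as precision at 5: the share of the top five results judged relevant, averaged over the sample of searches. A minimal sketch, assuming each search is recorded as a list of relevance judgments in rank order and that a search with fewer than five results still divides by five (the study's handling of short result lists is not stated). The judgments in the example are toy data, not the study's.

```python
from typing import Sequence


def precision_at_k(judgments: Sequence[bool], k: int = 5) -> float:
    """Fraction of the top k results judged relevant (missing results count as not relevant)."""
    return sum(1 for relevant in judgments[:k] if relevant) / k


def mean_precision_at_k(searches: Sequence[Sequence[bool]], k: int = 5) -> float:
    """Average precision@k over a sample of searches."""
    return sum(precision_at_k(s, k) for s in searches) / len(searches)


if __name__ == "__main__":
    # Toy judgments for two searches, in rank order (True = relevant).
    sample = [
        [True, True, False, False, False],  # 2 of 5 relevant
        [True, True, True, True, False],    # 4 of 5 relevant
    ]
    print(f"{mean_precision_at_k(sample):.0%}")  # 60%
```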
Relevance defined
– Relevance ranking in Endeca: select from a variety of modules and order them based on importance.
– Relevance matters most in Keyword Anywhere, which searches all fields.
– At NCSU…
  1. Original query term(s) (no thesaurus, stemming, or spell correction)
  2. Exact phrase match
  3. Field ranking (Title higher than Author higher than Table of Contents)
  4. Number of fields that contain the term(s)
  …
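One way to read "order them based on importance" is a tiered sort: each lower-priority module only breaks ties left by the ones above it, which a lexicographic sort key reproduces. The scoring functions below are hypothetical stand-ins for the four NCSU modules listed above, using simple lowercase whitespace tokenization and made-up field names and weights; they are not Endeca's implementation or API.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical field weights: title ranks above author above table of contents.
FIELD_WEIGHTS = {"title": 3, "author": 2, "toc": 1}


def has_phrase(words: Sequence[str], terms: Sequence[str]) -> bool:
    """True if the query terms appear as a contiguous run of words."""
    n = len(terms)
    return n > 0 and any(list(words[i:i + n]) == list(terms) for i in range(len(words) - n + 1))


def score(record: Dict[str, str], query: str) -> Tuple[int, int, int, int]:
    """Build a tiered sort key; earlier elements dominate later ones."""
    terms = query.lower().split()
    fields = {name: text.lower().split() for name, text in record.items()}
    hits = [name for name, words in fields.items() if any(t in words for t in terms)]
    all_words = {w for words in fields.values() for w in words}
    # 1. Original query terms matched verbatim (no thesaurus, stemming, or spell correction).
    original_terms = sum(1 for t in terms if t in all_words)
    # 2. Exact phrase match in any field.
    phrase = int(any(has_phrase(words, terms) for words in fields.values()))
    # 3. Field ranking: weight of the best field containing a query term.
    field_rank = max((FIELD_WEIGHTS.get(name, 0) for name in hits), default=0)
    # 4. Number of fields containing a query term.
    return (original_terms, phrase, field_rank, len(hits))


def rank(records: List[Dict[str, str]], query: str) -> List[Dict[str, str]]:
    """Sort records best-first; Python compares the key tuples element by element."""
    return sorted(records, key=lambda r: score(r, query), reverse=True)


if __name__ == "__main__":
    records = [
        {"author": "Mining, Data", "toc": "statistics"},
        {"title": "Introduction to statistics", "toc": "data mining"},
        {"title": "Data mining concepts", "author": "Han"},
    ]
    for r in rank(records, "data mining"):
        print(r)
```

In the example, the first record matches in a better field (author) than the second (table of contents) but still ranks last: it fails the first tier, because without stemming or punctuation handling "mining," does not count as "mining".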
Future Plans
– Ongoing tweaks:
  – Continued usability testing
  – Relevance ranking algorithms & spell correction thresholds
  – Additional browsing options
– Endeca 2.0 ideas:
  – FRBR-ized display
  – Discussions with OCLC regarding FAST (Faceted Application of Subject Terminology) and FRBR
  – Patron-generated refinements (folksonomies?)
  – Enrich records with supplemental Web Services content: more usable TOCs, book reviews, etc.
  – The death of authority searching (?)
  – More integration with QuickSearch, other data repositories, and third-party discovery tools
Stuff to read…
– Rethinking how we provide bibliographic services for the University of California, by the Bibliographic Services Task Force. http://libraries.universityofcalifornia.edu/sopag/BSTF/Final.pdf
– The Changing Nature of the Catalog and Its Integration with Other Discovery Tools, by Karen Calhoun. http://
– The Changing Nature of the Catalog and Its Integration with Other Discovery Tools. Final Report. March 17. Prepared for the Library of Congress by Karen Calhoun: A Critical Review, by Thomas Mann. http://
– A “Next Generation” Catalog, Eric Morgan
– Metadata Research Center, SILS
– University of Rochester eXtensible Catalog
– Toward a 21st Century Catalog, ITAL, Sept. 2006, by Antelman, Lynema, and Pace
From the Calhoun Report
"If one accepts the premise that library collections have value, then library leaders must move swiftly to establish the catalog within the framework of online information discovery systems of all kinds. Because it is catalog data that has made collections accessible over time, to fail to define a strategic future for library catalogs places in jeopardy the legacy of the world's library collections themselves. For this reason, the option of rejecting library catalogs is not considered in this report."
The library system pile
“Seams serve as perceptible boundaries that provide points of reference; without such boundaries readers get ‘lost at sea’ and don’t know where they are in relation to anything else; they can’t perceive either the extent of what they have or what they don’t have.”
- Thomas Mann
Wither or Whither the Catalog?
Reversal of fortune
[Diagram: old search model vs. new search model]
The library system puzzle
[Diagram pieces: Catalog, Serials, A&I / FT DBs, Web, Digital Repositories, ERM Systems, Guided Navigation, Legacy ILS, Metasearch, IR, GS]
Thank you. Andrew Pace, Head, IT