Kristin Antelman, NCSU Libraries, June 24, 2006
Overview
The problem
Quick demo
Technical overview
Implementation process
Use data
Assessment data
Next steps

Why did we do this?
Existing catalogs are hard to use:
– known item searching works pretty well, but …
– users often do keyword searching on topics and get large result sets returned in system sort order
– catalogs are unforgiving of spelling errors and word variants (no stemming)
NO RELEVANCY!

Catalog value is buried
Subject headings are not leveraged in searching
– they should be browsed or linked from, not searched
Data from the item record is not leveraged
– should be able to filter by item type, location, circulation status, popularity

What does the Endeca software do?
Provides search software for ecommerce companies
Faceted browse of structured metadata; goal is to expose the ontology

Endeca technical overview
Raw MARC data → (NCSU exports and reformats) → Flat text files → Data Foundry (parses text files) → Indices → MDEX Engine → (HTTP) → NCSU Web Application → (HTTP) → Client browser
Endeca Information Access Platform components: Data Foundry, Indices, MDEX Engine

Integrating Endeca - Enhancements
MarcAdapter plugin for raw MARC data
– Eliminate need for external MARC 21 translation and file merging
Partial Updates
– Update circulation data multiple times throughout the day

Implementation process
Timeline
– License / negotiation: Spring 2005
– Acquire: Summer 2005
– Implementation: August 2005 – January 12
7 representative team members
– functional requirements, metadata, interface issues (total of hours)
– project manager: approximately 10 hours per week for 20 weeks
Java-trained librarian (30-40 hrs/wk for 14 weeks)
It doesn’t have to be perfect!

Key decision points
Search interface

Main search page: Endeca and Web2
Advanced search
A few major issues
Search interface
Selecting dimensions and their order

Dimensions
1. Subject: Topic
2. Subject: Genre
3. Format
4. Library
5. Subject: Region
6. Subject: Era
7. Language
8. Author
9. Availability
10. Library of Congress Classification

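As a rough illustration of how faceted browse over dimensions like these works, the sketch below counts how many records fall under each dimension value and narrows a result set when a value is selected. It is not Endeca's API: the records, the field names, and the functions `facet_counts` and `refine` are all hypothetical.

```python
from collections import Counter

# Toy records; titles, field names, and values are illustrative assumptions.
RECORDS = [
    {"title": "Deep Sea Biology", "Format": "Book", "Language": "English", "Availability": "Available"},
    {"title": "Marine Ecology", "Format": "eBook", "Language": "English", "Availability": "Available"},
    {"title": "Biologie marine", "Format": "Book", "Language": "French", "Availability": "Checked out"},
]

# A subset of the dimensions listed on the slide, in display order.
DIMENSIONS = ["Format", "Language", "Availability"]


def refine(records, selections):
    """Keep only the records that match every selected dimension value."""
    return [r for r in records if all(r.get(d) == v for d, v in selections.items())]


def facet_counts(records, dimensions=DIMENSIONS):
    """Count how many records fall under each value of each dimension."""
    return {d: Counter(r[d] for r in records if d in r) for d in dimensions}


if __name__ == "__main__":
    books = refine(RECORDS, {"Format": "Book"})
    for dimension, counts in facet_counts(books).items():
        print(dimension, dict(counts))
```

Recomputing the counts after each refinement is what lets the interface show only the values that still lead to results.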
A few major issues
Search interface
Selecting dimensions and their order
Defining the relevance algorithm

Relevance defined
Relevance ranking in Endeca – select from a variety of modules and order them based on importance
At NCSU…
1. Original query term(s) (no thesaurus, stemming, spell correction)
2. Exact phrase match
3. Field ranking (Title higher than Author higher than Table of Contents, etc.)
4. Number of fields that contain term(s)
…

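One way to picture "modules ordered by importance" is a sort key in which each later module only breaks ties left by the earlier ones. The sketch below is an assumption-laden illustration, not Endeca's implementation: the field names, `FIELD_RANK`, `relevance_key`, and `rank` are hypothetical, and the first module is reduced to requiring whole-word matches on the original query terms because the sketch has no thesaurus or stemming to rank below them.

```python
# Assumed field ranking for illustration: lower number = more important field.
FIELD_RANK = {"title": 0, "author": 1, "toc": 2}


def normalize(text):
    """Lowercase and collapse whitespace; punctuation is not handled in this sketch."""
    return " ".join(text.lower().split())


def relevance_key(record, query):
    """Return a sort key for a matching record, or None if no field matches."""
    q = normalize(query)
    terms = q.split()
    fields = {f: normalize(record.get(f, "")) for f in FIELD_RANK}

    # Fields that contain every original query term as a whole word.
    hits = [f for f, text in fields.items() if all(t in text.split() for t in terms)]
    if not hits:
        return None

    # Exact phrase: the query words appear next to each other, in order.
    exact_phrase = any(f" {q} " in f" {fields[f]} " for f in hits)
    best_field = min(FIELD_RANK[f] for f in hits)

    # Tuples compare left to right, so each entry only breaks ties in the ones before it.
    return (not exact_phrase, best_field, -len(hits))


def rank(records, query):
    """Return the matching records, most relevant first."""
    keyed = [(relevance_key(r, query), r) for r in records]
    matched = [(k, r) for k, r in keyed if k is not None]
    return [r for k, r in sorted(matched, key=lambda pair: pair[0])]
```

With this ordering, a record with an exact phrase match in the table of contents still outranks a record whose title merely contains the same words scattered, because the phrase module sits above the field module.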
Use data
Some search statistics (March - May 2006)
Sorting statistics (March – May 2006)
Some navigation statistics (March - May 2006)
Assessment
Some user reaction
“The new Endeca system is incredible. It would be difficult to exaggerate how much better it is than our old online card catalog (and therefore that of most other universities). I've found myself searching the catalog just for fun, whereas before it was a chore to find what I needed.” - NCSU Undergrad, Statistics
“The new library catalog search features are a big improvement over the old system. Not only is the search extremely fast, but seemingly it's much more intelligent as well.” - NCSU faculty, Psychology

Topical searching tasks
Average topical task duration
Testing relevance
Are search results in Endeca more likely to be relevant to a user’s query than search results in Web2 OPAC?
100 topical user searches from 1 month in fall 2005
How many of top 5 results relevant?
– 40% relevant in Web2 OPAC
– 68% relevant in Endeca catalog

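A measure of this kind can be computed as the share of judged top-5 results that were marked relevant, pooled across all searches. The sketch below shows one plausible reading of that calculation; the function name `share_relevant_in_top` and the toy judgments are hypothetical and are not the study's data or method.

```python
def share_relevant_in_top(judgments, k=5):
    """Fraction of top-k results judged relevant, pooled over all searches.

    judgments: one list of booleans per search, in ranked order (True = relevant).
    """
    judged = [search[:k] for search in judgments]
    total = sum(len(search) for search in judged)
    relevant = sum(sum(search) for search in judged)
    return relevant / total if total else 0.0


if __name__ == "__main__":
    # Made-up judgments for two searches, for illustration only.
    toy = [
        [True, True, False, False, True, True],  # sixth result is ignored for k=5
        [False, True, False, True, False],
    ]
    print(share_relevant_in_top(toy))
```

Pooling treats every judged result equally; averaging per-search fractions instead would give the same figure only when each search returns at least five results.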
Future plans
FRBR-ized displays
FAST (Faceted Access to Subject Terms) instead of LCSH
Enrich records with supplemental content
More integration with website search
Use Endeca to index local collections

Thank you project page: