A New Kind of Catalog Charley Pennell Principal Cataloger for Metadata North Carolina State University North Carolina Library Association 2007.


1 A New Kind of Catalog Charley Pennell Principal Cataloger for Metadata North Carolina State University North Carolina Library Association 2007

2 Where is this talk headed? Local motivation National trends What is Endeca? Features Does Endeca work? Where are we going from here? Where is everybody else going?

3 Why a new catalog? What was wrong with the old one?

4 A little TRLN catalog primer
TRLN libraries (Duke, NCCU, NCSU, UNC-CH) jointly develop and maintain BIS, 1985-1992
DRA implemented for catalog (UNC & Duke continue Acq/Serials modules), 1991-1993
No integrated keyword/browse capability, 1993-1999
Web2 catalog implemented, 1999-
Sirsi & DRA “merge” in 2002; Taos DOA

5 A little TRLN catalog primer 2
NCSU & NCCU to Unicorn; Duke to Aleph; UNC-CH to Millennium, 2003-2004
Sirsi/Dynix merger, 2004: vendor focus shifts (even more) toward school/public market
While agreeing to continue to support Web2, S/D increasingly looking to merge all product catalogs into single interface

6 What was the catalog lacking?
Simplicity: a simple, hopefully uncluttered interface
Interactivity: ways to interact with results to get better results
Forgiveness: just fix my typos and case errors, don’t make me feel stupid!
Response time: always
Real-time sorting: the limit is how many?!!
Relevance ranking: as if!
Web services: use the Web to repurpose data, enable mash-ups, add-ons & improvements

7 Which interface is ready for immediate use?

8

9 So, why DOES everyone think that the catalog stinks?
"Most integrated library systems, as they are currently configured and used, should be removed from public view." - Roy Tennant, OCLC

10 The old model

11 The integrated library system Historically, the ILS developed as an inventory control system for use by library staff only First library automation systems (Plessey, CLSI, Geac, Innovative) were designed around circulation or acquisitions functions Interaction time was calibrated to the slow pace of backroom work where the audience was basically captive Staff focus on known-item searching, not resource discovery

12 The catalog as part of the ILS
The first integrated OPACs were veneers on top of existing inventory management systems; patrons & staff competed for system resources! They still do!
First OPACs allowed for browse only; early keyword searching restricted to certain fields (A/T/S) only
Libraries with no IT support were stuck with what their vendor provided and the enhancement process for improvements
Libraries with IT support created their own systems: BIS, NOTIS, Claremont Colleges, Georgetown, PALS, DOBIS/LIBIS

13 The state of the ILS in 2007 Customer demands for increasing functionality in a marketplace with little $$ to spend has reduced the ILS vendor pool through mergers and buyouts New functionality (multi-search, ERMS, E-Ref, ILL, etc.) increasingly being met by stand-alone and third party applications Increasing competition from open source (Koha, Evergreen, Scriblio, LibraryThing) and e-commerce Q: Is our dogged adherence to MARC the only thing keeping the remaining ILS vendors afloat?

14 The state of the catalog 2007 Library users’ search expectations have been conditioned by interactions with commercial Websites and Google, with which Libraries can barely afford to compete, but must Libraries are becoming increasingly virtual as users interact with us online (e-resources, Second Life) User expectations for online experiences are more interactive, instantaneous, and inviting

15 Perhaps most importantly… The information resources represented in the catalog represent a shrinking percentage of what end users need or want Calhoun’s Aristotelian vs. Copernican views of the catalog

16 What do users want from the OPAC?
Make subject searching in online catalogs easier using post-Boolean probabilistic searching with automatic spelling correction, term weighting, intelligent stemming, relevance feedback, and output ranking
Streamline users' book selection decisions at the catalog by adding tables of contents and back-of-the-book indexes to cataloging (i.e., metadata) records
Reduce the many failed subject searches by expanding the online catalog with full texts: journal and newspaper articles, encyclopedias, dissertations, government documents, etc.
Increase finding strategies in online catalogs through the library classification
-- Markey, Karen (2007). “The online library catalog: Paradise lost and paradise regained”, D-Lib Magazine, 13(1/2).

17 “Many researchers express surprise at the brevity (from one to three words) of the queries people submit to online systems. Belkin tells why so few words make up their queries, "Precisely because of the inquirer's lack of knowledge about a problem area, it is impossible to specify what would resolve it." For Belkin, the saving grace is the inquirer's ability to recognize what he or she wants or does not want during the course of the search. Therein lies an important solution to the problem— information systems that report results for easy eyeballing and instantaneous recognition of relevant possibilities.” – Karen Markey

18 What is an Endeca?

19

20 A software company based in Cambridge, MA A search and information access technology provider for a number of major e-commerce websites Developers of the Endeca Information Access Platform

21 Endeca features
Commercial-strength search/sort speeds
Site customizable relevance ranking
Faceted browse
True browsing (LC classification)
Spell-checking “Did you mean?”
Automatic word stemming

22 Endeca at NCSU Libraries Went live in January 2006 Works with a text version of a daily snapshot of Libraries’ MARC & other metadata Used to improve the discovery portion of the library catalog Interoperates with ILS for holdings, current availability status Web2 interface still present for known item & authority searching

23 Implementation timeline
License / negotiation: Spring 2005
Acquire: Summer 2005
Implementation:
August 2005: vendor training
September 2005: finalize requirements
October 2005 – January 2006: design and development
January 12, 2006: go-live date
Widen to TRLN partners: Winter 2008

24 Implementation Team Implementation Team brought together from IT, DLI, Cataloging, Collections, Reference, Circulation Worked on indexing, UI, usability testing, etc. Areas of contention Number of initial search boxes (1 or 2) Order, grouping of facets Placement of classification hierarchies, breadcrumbs Use of “search” and “browse” on tabs Visualization aided by Tito’s wireframes

25

26 8th (and Final) Revision: Aggregate holdings information by library. Reduces complexity of continuing and online resources. Brief view vs. Full view gives user choice about displaying holdings.

27 NCSU Endeca features Facets Results Call # browse Breadcrumbs

28 Features we started with Faceted browse Availability facet Breadcrumbs Spell check / Did you mean Hierarchical subject browse based on LCC Fuzzy link to live Web2 data New book browse for titles added in last week only

29 Features that we’ve added New book browse based on relative date (last week, last month, last three months) RSS feeds based on user results “Search within” results Send search to TRLN partners Static unique link to live Web2 data

30 Relevance ranking
Based on locally customizable algorithm:
Most relevant: query exactly as entered
For multi-term searches: phrase match
Field match: title match more relevant than notes match
Other factors: number of fields matched, weighted frequency, static ordering (publication date, circulation stats)
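The ranking tiers on this slide can be sketched as a sort key. This is only an illustration in Python, not Endeca's algorithm: the Record class, the choice of title and notes as the only fields, and the exact order of tie-breakers are all assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical, simplified catalog record for illustration only.
    title: str
    notes: str = ""
    pub_year: int = 0


def tokens(text):
    """Lowercase word tokens with punctuation dropped."""
    return re.findall(r"\w+", text.lower())


def contains_phrase(field, query):
    """True if the query's words appear consecutively in the field."""
    f, q = tokens(field), tokens(query)
    return bool(q) and any(f[i:i + len(q)] == q for i in range(len(f) - len(q) + 1))


def rank_key(record, query):
    """Build a tuple that sorts higher for more relevant records."""
    q = tokens(query)
    title, notes = tokens(record.title), tokens(record.notes)
    exact = title == q                              # query exactly as entered
    phrase_title = contains_phrase(record.title, query)
    phrase_notes = contains_phrase(record.notes, query)
    title_hits = sum(1 for t in q if t in title)    # title outranks notes
    notes_hits = sum(1 for t in q if t in notes)
    fields_matched = (title_hits > 0) + (notes_hits > 0)
    # pub_year stands in for the "static ordering" factor (assumption).
    return (exact, phrase_title, phrase_notes, title_hits,
            notes_hits, fields_matched, record.pub_year)


def rank(records, query):
    """Return records ordered from most to least relevant."""
    return sorted(records, key=lambda r: rank_key(r, query), reverse=True)
```

Because the key is a tuple, a later factor only matters when every earlier factor ties, which is one simple way to express "title match more relevant than notes match" without tuning numeric weights.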

31 Faceting at the NCSU Libraries Follows on what we have learned from the commercial Web search model Mines metadata already available via MARC record, local class number, ILS item categories, circ status, and date stamping Required massive clean-up of 6xx subdivisions Allows both pre- and post-coordinate limits Uses table mapping to enable drilling down through call number results

32 Facet refinements Availability Author Library Format Language New(ness) LC Classification Subject: Topic Subject: Genre Subject: Region Subject: Era

33 A single facet need not represent data from a single field Single Unicorn item types (Book, Kit, Manuscript, Map, Data set) Multiple Unicorn item types (Audio, Microform, Thesis/Dissertation, Software & Multimedia, Videos) Leader byte 07 (Bib lvl): Journal, Magazine Library (Online)
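As a rough sketch of how one facet can draw on several fields, the Python below derives Format facet values from an ILS item type, the MARC leader byte 07, and the owning library. The item type codes, the library code ONLINE, and the dictionary record shape are hypothetical; only the idea of many-to-one mapping comes from the slide.

```python
from collections import Counter

# Hypothetical ILS item type codes mapped to display values.
# Several codes may collapse into one facet value (e.g. Audio).
ITEM_TYPE_TO_FORMAT = {
    "BOOK": "Book",
    "MAP": "Map",
    "CD": "Audio",
    "CASSETTE": "Audio",
    "MICROFILM": "Microform",
    "MICROFICHE": "Microform",
}


def format_facet(record):
    """Collect Format facet values for one record from three sources."""
    values = set()
    fmt = ITEM_TYPE_TO_FORMAT.get(record.get("item_type"))
    if fmt:
        values.add(fmt)
    # MARC leader byte 07 (bibliographic level): "s" means serial.
    if record.get("bib_level") == "s":
        values.add("Journal, Magazine")
    # Assumed library code for online holdings.
    if record.get("library") == "ONLINE":
        values.add("Online")
    return sorted(values)


def facet_counts(records):
    """Count postings per facet value across a result set."""
    counts = Counter()
    for record in records:
        counts.update(format_facet(record))
    return counts
```

Keeping the mapping in a table rather than in code means catalogers could regroup item types without touching the indexing logic.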

34 Ranking facet results by number of postings makes sense in a short list, but not in a long list

35 The author facet is less useful in some types of searches …

36 … than others!

37 Technical overview
Diagram labels: Raw MARC data; NCSU exports and reformats; Flat text files; Data Foundry; Parse text files; Indices; MDEX Engine; NCSU Web Application; HTTP; Information Access Platform

38 MARC ingest MARC → flat text file(s) for ingest by Endeca. Transformation accomplished with MARC4J. Opportunity to manipulate data on the back-end.
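The flattening step might look something like the following. NCSU's transformation was done with MARC4J (Java); this Python sketch only illustrates the idea, and the pipe-delimited line layout, the in-memory record shape, and the punctuation clean-up are assumptions, since the slides do not show the actual file format.

```python
def flatten(record_id, fields):
    """Turn one parsed record into delimited text lines, one per subfield.

    `fields` is a list of (tag, [(subfield_code, text), ...]) pairs,
    a hypothetical in-memory shape for an already-parsed MARC record.
    """
    lines = []
    for tag, subfields in fields:
        for code, text in subfields:
            # Example of back-end manipulation: drop trailing ISBD
            # punctuation such as " /" or " :" before indexing.
            value = text.strip().rstrip(" /:;,")
            lines.append(f"{record_id}|{tag}${code}|{value}")
    return lines
```

Writing one line per subfield keeps the subfield codes available downstream, which matters when subject subdivisions are later routed to different facets.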

39 Transformed data

40 The end result… Video

41 Other Endeca library catalogs
Phoenix Public Library: http://www.phoenixpubliclibrary.org/
McMaster University: http://libcat.mcmaster.ca
Florida Center for Library Automation: http://catalog.fcla.edu/
Individual Florida universities: http://fs.catalog.fcla.edu/, etc.

42 Does Endeca work?

43 Problems: authority control Endeca is a keyword search engine; “browse” can only be effected using sort options There is no authority control within Endeca itself, rather it relies on AC within ILS To make use of available metadata, subjects were split along subdivisions. Authors were not Talks were held with the vendor to explain the potential for drawing on authority x-refs to collocate searches

44 Problems: subject context
Problems with wrong delimiter values (esp. $v)
Problems maintaining context in atomized LCSH
One-way relationships: English language$vDictionaries$xSpanish
Chronological headings devoid of geographic context: Cuba$xHistory$yRevolution, 1959
Phrase headings expressed in multiple subdivisions: Prisoners$xAbuse of
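A small sketch of what "atomizing" a heading means, and why context gets lost. The routing of subfields to the four subject facets ($a and $x to Topic, $v to Genre, $z to Region, $y to Era, with 651 $a treated as Region) is an assumption based on the facet names in this talk, not a documented NCSU mapping.

```python
# Assumed routing of subject subdivisions to facets.
SUBFIELD_TO_FACET = {
    "x": "Subject: Topic",
    "v": "Subject: Genre",
    "z": "Subject: Region",
    "y": "Subject: Era",
}


def atomize(tag, subfields):
    """Split one 6xx heading into independent facet values."""
    facets = {}
    for code, text in subfields:
        if code == "a":
            # 651 is a geographic heading, so its $a is a place name.
            facet = "Subject: Region" if tag == "651" else "Subject: Topic"
        else:
            facet = SUBFIELD_TO_FACET.get(code)
        if facet:
            facets.setdefault(facet, []).append(text)
    return facets
```

Running this on Cuba$xHistory$yRevolution, 1959 yields an Era value of "Revolution, 1959" that no longer says which country's revolution is meant, and Prisoners$xAbuse of yields a Topic value "Abuse of" that is meaningless on its own.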

45 Problems: subject hierarchies
Chronological hierarchy not built into $y
“19th century” does not subsume 1800-1809, 1801-1861, 1809-1817, 1815-1861, 1817-1825, Civil War, 1861-1865, etc.
Geological periods exist as text only (Ordovician, Pleistocene, etc.)
Some chronological headings are expressed as text in 650$a: Middle Ages; Nineteen sixties
Geographic hierarchy not consistent between 651 and 650: $zNorth Carolina$zRaleigh vs. $aRaleigh (N.C.)
BT/NT/RT relationships from authority file lacking
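One way a chronological hierarchy could be derived, sketched in Python: pull four-digit years out of a $y string and map them to the centuries they span. This is an illustration rather than anything NCSU reports having built, and it assumes the popular convention in which 1800-1899 counts as the 19th century, matching the slide's example.

```python
import re


def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 19 -> '19th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def centuries(heading):
    """List the centuries covered by the years found in a $y heading.

    Headings with no four-digit year (e.g. geological periods or
    'Middle Ages') return an empty list and would need a lookup table.
    """
    years = [int(y) for y in re.findall(r"\b\d{4}\b", heading)]
    if not years:
        return []
    first = min(years) // 100 + 1   # assumed convention: 1800 -> 19th
    last = max(years) // 100 + 1
    return [f"{ordinal(c)} century" for c in range(first, last + 1)]
```

Text-only eras are exactly the cases this approach cannot reach, which is why the slide lists them as a separate problem.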

46 Some potential solutions Search behavior education FAST (Faceted Application of Subject Terminology) Web2 x-refs to redirect searches to Endeca Combining $z hierarchies Hierarchy lists

47 What do our users think?

48 “The new Endeca system is incredible. It would be difficult to exaggerate how much better it is than our old online card catalog (and therefore that of most other universities). I've found myself searching the catalog just for fun, whereas before it was a chore to find what I needed.” - NCSU Undergrad, Statistics “The new library catalog search features are a big improvement over the old system. Not only is the search extremely fast, but seemingly it's much more intelligent as well.” - NCSU faculty, Psychology

49 Usability testing

50

51 Usage statistics

52 Newness wearing off? March ‘06 - May ‘06 July ‘06-January ‘07

53 July 06 – Jan 07

54

55

56 Where are we going from here?

57 Future directions Additional hierarchies (geographic names, dates) Make use of NAF, SAF, particularly cross-reference structure Massage underlying metadata Addition of Date Cataloged – Done! Addition of LC Class numbers to e-resources – Done! FRBR work numbers/records? – Tested! FAST headings? Accommodation of true browse for all indexes

58 Future opportunities
Expanding the scope of the implementation to the 10M records in TRLN (Duke, NCCU, NCSU, UNC-Chapel Hill)
Enrich catalog through external web services: book jackets, reviews, TOC, etc. – Amazon, OCLC, LibraryThing, Bowker Syndetics
Build use-case based cross-application shopping cart functionality
Integrate catalog w/other tools through web services: “Free the Data”

59 Web services…

60

61 Mobile device searching

62

63 Where is everybody else going? Catalogs detaching themselves from ILS Detached data lends itself to experimentation Don’t have to throw out baby with bathwater when better interfaces come out Data itself safe and secure in ILS MARC becoming superfluous; MARC’s granularity NOT! Social interaction: reviews, folksonomic tags, ratings

64 Phoenix Public Library on Endeca

65 III’s new faceted catalog, Encore

66 ExLibris Primo at Vanderbilt

67 Athens County, OH—Koha Zoom open source

68 Georgia PINES—Evergreen open source

69 Casey Bisson’s Scriblio

70 Danbury Public powered by LibraryThing

71 OCLC WorldCat Local at UW

72 Thanks for listening! Charley Pennell Principal Cataloger for Metadata NCSU Libraries North Carolina State University Raleigh, NC 27695-7111 cpennell@ncsu.edu More info at: http://www.lib.ncsu.edu/endeca/

