Library of Congress Working Group on the Future of Bibliographic Control ~ March 8, 2007 Users and Uses of Bibliographic Data: The Promise and Paradox.

1 Library of Congress Working Group on the Future of Bibliographic Control ~ March 8, 2007
Users and Uses of Bibliographic Data: The Promise and Paradox of Bibliographic Control
NCSU Case Study: Faceted Navigation
Andrew K. Pace, Head, Information Technology, NCSU Libraries

2 Library of Congress Working Group on the Future of Bibliographic Control ~ March 8, 2007

3 Agenda
- NCSU’s Endeca-powered catalog
- Data Reality Check
- Relevance ranking in online catalogs
- Statistics: what are patrons doing?
- The Metadata Paradox (in 3 parts)
- A brief wish-list for the future of bibliographic control

4 NCSU’s Endeca-powered catalog

5 Rumsfeld’s Law of Bibliographic Control
You search the data you have, not the data you wish you had.

6 Existing catalogs are hard to use
- Known item searching works pretty well (sometimes), but …
- Lots of topical searches and poor subject access
  - keyword gives too many or too few results, which leads to general distrust among users
  - authority searching is under-utilized and misunderstood
- Relevance = system sort order
- Impossible to browse the collection
- Unforgiving on spelling errors, stemming
- Response time doesn’t meet expectations of web-savvy users

7 Valuable metadata is buried
- Subject headings are not leveraged in keyword searching
  - they should be browsed or linked from, not searched
- Data from the item record is not leveraged
  - users should be able to filter easily as their requirements change, using item type, location, circulation status, popularity

8 In a nutshell…
"Most integrated library systems, as they are currently configured and used, should be removed from public view." - Roy Tennant, CDL

9 What’s the big picture?
- Improve the quality of the library catalog user experience
- Exploit our existing authority infrastructure (aka make MARC data work harder)
- Build a more flexible catalog tool that can be integrated with discovery tools of the future

10 “This-Gen” search tools
- Proving that it’s possible to improve the search experience beyond the functionality that traditional online catalogs have supported.

11 What is Endeca?
- Software company based in Cambridge, MA
- Search and information access technology provider for a number of major e-commerce websites
- Developers of the Endeca Information Access Platform

12 Why Endeca?
- Customized relevance ranking of results
- Better subject access by leveraging available metadata (including item level data!) through facets
- Improved response time
- Enhanced natural language searching through spell correction, etc.
- True browse
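The facet idea on this slide can be sketched in a few lines of Python. This is a toy illustration, not Endeca's implementation: the sample records, the field names (`format`, `library`, `available`) and the helpers `facet_counts` and `refine` are all hypothetical.

```python
from collections import Counter

# Hypothetical flattened catalog records; field names are illustrative only.
RECORDS = [
    {"title": "Organic Chemistry", "format": "Book", "library": "D.H. Hill", "available": True},
    {"title": "Textile Science", "format": "eBook", "library": "Textiles", "available": True},
    {"title": "Chemistry of Dyes", "format": "Book", "library": "Textiles", "available": False},
]

def facet_counts(records, field):
    """Count how many records carry each value of one facet field."""
    return Counter(r[field] for r in records)

def refine(records, **selected):
    """Keep only the records that match every selected facet value."""
    return [r for r in records if all(r.get(k) == v for k, v in selected.items())]

# Show the facet counts, narrow by one facet, then narrow again by item-level data.
print(facet_counts(RECORDS, "format"))
books = refine(RECORDS, format="Book")
print(facet_counts(books, "library"))
print([r["title"] for r in refine(books, available=True)])
```

Each refinement recomputes the counts on the narrowed set, which is what lets a patron see how many items remain behind each choice before clicking it.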

13 Demo

14 Data Reality Check

15 Data Reality Check: Part I
- Our Integrated Library System has ~80 MARC fields and subfields indexed in its keyword index
- 33 additional indexed fields (29%!) are not publicly displayed
- Displayed fields use 37 different labels

16 Data Reality Check: Part II
- A lot of those same fields are indexed in Endeca (just much more quickly and efficiently)
- ~50 MARC fields are indexed
- New catalog has ~37 Properties and 11 Dimensions derived from ~160 MARC fields and subfields

17 Data Reality Check: Part III
- Simple data is the best
  - MARC4J to convert MARC into flat files
  - Lots of stripping of punctuation…ugh
  - Perl to update files
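The punctuation stripping mentioned on this slide might look roughly like the Python sketch below. It is not the NCSU conversion code (the slide names MARC4J and Perl for that work); the set of trailing characters and the helper name `strip_trailing_punctuation` are assumptions made for illustration.

```python
import re

# Trailing ISBD-style separators often left on MARC subfield values.
# Which characters to strip is an assumption made for this sketch.
_TRAILING = re.compile(r"[\s/:;,=]+$")

def strip_trailing_punctuation(value: str) -> str:
    """Remove trailing separators, and a final period unless it ends an initial or an ellipsis."""
    value = _TRAILING.sub("", value)
    # Keep the period after a single capital letter ("Pace, Andrew K.") or in "...".
    if value.endswith(".") and not re.search(r"(\b[A-Z]\.|\.\.\.)$", value):
        value = value[:-1]
    return value.rstrip()

# One tab-delimited flat-file row built from cleaned subfield values.
row = "\t".join(strip_trailing_punctuation(v) for v in ["The catalog /", "Raleigh :", "2006."])
print(row)
```

A rule this simple still removes the period from abbreviations such as "Inc.", which hints at why the slide calls the job "ugh".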

18 Relevance Ranking

19 Relevance Ranking
- TF/IDF alone is inadequate for determining relevance order of bibliographic metadata
- The Endeca MDEX Engine offers NCSU alternatives
  - Matching techniques: matchall, matchany, matchboolean, matchallpartial
- A suite of relevance ranking options is applied to Boolean-type searches
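As a rough illustration of how the match modes named on this slide can differ, here is a small Python sketch. It does not use or reproduce the MDEX Engine: the functions are hypothetical, tokenization is naive whitespace splitting, and the fallback behaviour modeled for matchallpartial (all terms first, then at least `min_terms` of them) is an assumption about that mode.

```python
def terms(text):
    """Naive tokenizer: lowercase, split on whitespace."""
    return set(text.lower().split())

def match_all(query, record):
    """True when every query term occurs in the record."""
    return terms(query) <= terms(record)

def match_any(query, record):
    """True when at least one query term occurs in the record."""
    return not terms(query).isdisjoint(terms(record))

def match_all_partial(query, records, min_terms=1):
    """Prefer records matching all terms; if there are none, accept partial matches."""
    full = [r for r in records if match_all(query, r)]
    if full:
        return full
    wanted = terms(query)
    return [r for r in records if len(wanted & terms(r)) >= min_terms]

TITLES = [
    "history of the american revolution",
    "revolutionary war diaries",
    "war and peace",
]

print([t for t in TITLES if match_all("revolutionary war", t)])
print([t for t in TITLES if match_any("revolutionary war", t)])
print(match_all_partial("revolutionary war history", TITLES, min_terms=2))
```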

20 Relevance Ranking (cont.)
- Individual RR modules are combined and prioritized according to our specifications to form an overall RR strategy, or algorithm
- Current strategy includes seven modules
  - Five rank results dynamically on things like: phrase, rank of the field in which term appears, weighted frequency
  - Two final rules provide static ordering based on publication date and aggregate circulation totals
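One way to picture a prioritized strategy like this is as a sort key in which earlier criteria dominate and later ones only break ties. The Python sketch below is an assumption-laden simplification, not the NCSU configuration: it models only two of the dynamic criteria plus the two static rules, and the `Hit` fields and `ranking_key` function are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    title: str
    phrase_match: bool  # query terms occur together as a phrase
    field_rank: int     # lower means a more important field (0 = title, for example)
    pub_year: int
    circulation: int    # aggregate circulation total

def ranking_key(hit):
    """Tuple compared left to right: each element only matters when earlier ones tie."""
    return (
        not hit.phrase_match,  # dynamic: phrase matches sort first
        hit.field_rank,        # dynamic: better field sorts first
        -hit.pub_year,         # static: newer first
        -hit.circulation,      # static: more heavily circulated first
    )

hits = [
    Hit("A", False, 0, 2005, 10),
    Hit("B", True, 1, 1990, 3),
    Hit("C", True, 1, 2004, 1),
    Hit("D", True, 1, 2004, 7),
]
ranked = sorted(hits, key=ranking_key)
print([h.title for h in ranked])  # → ['D', 'C', 'B', 'A']
```

Here C and D tie on both dynamic criteria and on publication year, so circulation decides between them, while A loses to all three despite matching in a better field because phrase matching has higher priority.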

21 Relevance Ranking: Challenge and Promise
Challenge
- Uncharted territory required a best-guess approach
- More experimentation required with “matchallpartial” to provide an interface intuitive enough for users to know what is happening
Promise
- Having technology nimble enough to support experimentation with indexing and relevance strategies

22 Statistics: What patrons are doing

23-29 [Charts of catalog usage statistics, July 06 – Jan 07; the chart images are not included in this transcript. The only surviving label, on slide 25, reads: 19.4% Subj./Class]

30 Most Popular Dimension Values
Out of 765,170 Navigation Requests, July 06 – Jan 07
Physical
- New: New (92,037)
- Format: Book (40,183)
- Availability: Available (33,125)
- Library: D.H. Hill (33,091)
- Language: English (22,668)
- Format: eBook (21,177)
Topical
- LC Class: Q-Science (25,277)
- Subject|Region: US (20,954)
- Subject|Topic: History (20,861)
- LC Class: T-Technology (16,951)
- LC Class: H-Soc. Sci. (16,345)
- Subject|Topic: Bioethics (12,933)
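To put these counts in proportion, each can be expressed as a share of the 765,170 navigation requests. The short Python sketch below uses only the figures on this slide; the `share` helper and the table layout are illustrative choices.

```python
TOTAL_REQUESTS = 765_170  # navigation requests, July 06 to Jan 07 (from the slide)

COUNTS = {
    "New: New": 92_037,
    "Format: Book": 40_183,
    "Availability: Available": 33_125,
    "Library: D.H. Hill": 33_091,
    "Language: English": 22_668,
    "Format: eBook": 21_177,
    "LC Class: Q-Science": 25_277,
    "Subject|Region: US": 20_954,
    "Subject|Topic: History": 20_861,
    "LC Class: T-Technology": 16_951,
    "LC Class: H-Soc. Sci.": 16_345,
    "Subject|Topic: Bioethics": 12_933,
}

def share(count, total=TOTAL_REQUESTS):
    """Percentage of all navigation requests that used this dimension value."""
    return 100 * count / total

# Print the values from most to least used, with their share of all requests.
for label, count in sorted(COUNTS.items(), key=lambda item: item[1], reverse=True):
    print(f"{label:<28}{count:>8,}{share(count):7.1f}%")
```

On these numbers the most-used value, New, accounts for roughly 12% of navigation requests, and the twelve listed values together account for a little under half.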

31 The Metadata Paradox

32 Plugging Holes in the System
- Natural language problem
  - LCSH = United States--History--Revolution, 1775-1783
  - keyword = revolutionary war (834 hits)
  - Subject keyword = “United States--History--Revolution, 1775-1783” (3081 hits)
- Facets taken out of the free-floating and hierarchical context of LCSH can be misleading
- I’ve followed many a tag cloud, but assuming that browsing is still popular, how does one browse keywords?
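Both problems on this slide can be seen by taking the example heading apart. In the Python sketch below the heading is written with the conventional double-hyphen subdivision separator; `split_heading` is a hypothetical helper, and real facet extraction would work from MARC subfields instead of a display string.

```python
def split_heading(heading, sep="--"):
    """Split a subject heading string into its subdivisions."""
    return [part.strip() for part in heading.split(sep)]

heading = "United States--History--Revolution, 1775-1783"
parts = split_heading(heading)
print(parts)  # → ['United States', 'History', 'Revolution, 1775-1783']

# Out of context, the middle facet no longer says whose history it is.
print(parts[1])

# The natural-language query shares no complete term with the heading.
query = "revolutionary war"
print(all(word in heading.lower() for word in query.split()))  # → False
```

The isolated value "History" would be pooled with the history of every other place and topic, and a patron who types "revolutionary war" never touches the controlled heading unless something maps one vocabulary to the other.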

33 Paradox #1
We finally have interesting discovery tools that make use of bibliographic data in ways that show us that the data are not completely adequate for use with the new discovery tools.

34 “Subject Keywords”
- [From Recommendations:] “… Abandon the attempt to do comprehensive subject analysis manually with LCSH in favor of subject keywords; urge LC to dismantle LCSH”
  -- Karen Calhoun, The Changing Nature of the Catalog and its Integration with Other Discovery Tools, report prepared for the Library of Congress, March 2006

35 Paradox #2
“Subject keywords” should replace the controlled vocabulary from which the keywords themselves are most easily derived.
Let’s build bridges between the mountains of bibliographic description so that we can tear down the mountains.

36 If not LCSH, then what?
- Dissertation Abstracts Model
  - Constrained list of subject terms
  - Dissertations are at the edge of scholarship, so granular pre-existing thesaurus is inadequate
  - Vocabulary cannot be “thwarted” by authors
- Social Tagging Model
  - Human intervention times n users
  - New difficulty of matching tags between creators
  - Contrary to the “literary warrant” that requires a hierarchical thesaurus to differentiate thousands of titles from one another
- Full Text Model
  - Good for currency: hypothetically uses the most contemporary language
  - Harder for research: some sort of citation analysis required; but arguably could be solved computationally
The question of scale
(with some help from colleague Charley Pennell, Principal Cataloger for Metadata, NCSU)

37 Paradox #3
Computational (i.e., non-human-mediated) creation of subject-based facets will work perfectly once all the full text of every work is available in electronic format.
What does a search and retrieval system for 50 million books and 50 million articles look like?
Goooooooooooooooooooooooooooooooooooooooogle

38 Bibliographic Control Wish List

39 Bibliographic Control Wish List
- A classification or subject thesaurus system that enables faceted navigation
- A work identifier for books and serials
- Something other than LC Name Authority for “organizations”
- Physical descriptions that help libraries send books to off-site shelving and to patrons’ mailboxes
- Something other than MARC in which to encode all of the above
- Systems that can actually use the encoding

40 Paradox #4: The Ultimate Paradox
“You’re damned if you do and you’re damned if you don’t.” - Bart Simpson

41 Thanks
- NCSU project site (includes these slides): http://www.lib.ncsu.edu/endeca
- Andrew K. Pace, Head, Information Technology, NCSU Libraries, andrew_pace@ncsu.edu

