Library of Congress Working Group on the Future of Bibliographic Control ~ March 8, 2007 Users and Uses of Bibliographic Data: The Promise and Paradox.

Presentation transcript:

Library of Congress Working Group on the Future of Bibliographic Control ~ March 8, 2007
Users and Uses of Bibliographic Data: The Promise and Paradox of Bibliographic Control
NCSU Case study: Faceted Navigation
Andrew K. Pace, Head, Information Technology, NCSU Libraries


Agenda  NCSU’s Endeca-powered catalog  Data Reality Check  Relevance ranking in online catalogs  Statistics: what are patrons doing?  The Metadata Paradox (in 3 parts)  A brief wish-list for the future of bibliographic control

NCSU’s Endeca-powered catalog

Rumsfeld’s Law of Bibliographic Control
You search the data you have, not the data you wish you had.

Existing catalogs are hard to use
- Known item searching works pretty well (sometimes), but …
- Lots of topical searches and poor subject access
  - keyword gives too many or too few results, which leads to general distrust among users
  - authority searching is under-utilized and misunderstood
- Relevance = system sort order
- Impossible to browse the collection
- Unforgiving on spelling errors, stemming
- Response time doesn’t meet expectations of web-savvy users

Valuable metadata is buried
- Subject headings are not leveraged in keyword searching
  - they should be browsed or linked from, not searched
- Data from the item record is not leveraged
  - should be able to easily filter based on user’s changing requirements using item type, location, circulation status, popularity

In a nutshell…
"Most integrated library systems, as they are currently configured and used, should be removed from public view." - Roy Tennant, CDL

What’s the big picture?
- Improve the quality of the library catalog user experience
- Exploit our existing authority infrastructure (aka make MARC data work harder)
- Build a more flexible catalog tool that can be integrated with discovery tools of the future.

“This-Gen” search tools
- Proving that it’s possible to improve the search experience beyond the functionality that traditional online catalogs have supported.

What is Endeca?
- Software company based in Cambridge, MA
- Search and information access technology provider for a number of major e-commerce websites
- Developers of the Endeca Information Access Platform

Why Endeca?
- Customized relevance ranking of results
- Better subject access by leveraging available metadata (including item level data!) through facets
- Improved response time
- Enhanced natural language searching through spell correction, etc.
- True browse
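As a rough illustration of what "leveraging available metadata through facets" can mean, here is a minimal sketch in Python. It is not Endeca's API: the records, the field names (`format`, `library`, `available`) and the helper functions are all hypothetical, chosen only to show facet counts being recomputed as the user narrows a result set.

```python
from collections import Counter

# Hypothetical flattened catalog records; field names are illustrative only.
RECORDS = [
    {"title": "Organic Chemistry", "format": "Book", "library": "D.H. Hill", "available": True},
    {"title": "Textile Science", "format": "eBook", "library": "Online", "available": True},
    {"title": "Polymer Physics", "format": "Book", "library": "D.H. Hill", "available": False},
    {"title": "Design Basics", "format": "Book", "library": "Design", "available": True},
]


def facet_counts(records, field):
    """Count how many records carry each value of one facet field."""
    return Counter(r[field] for r in records)


def refine(records, **selected):
    """Keep only the records that match every selected facet value."""
    return [r for r in records if all(r.get(k) == v for k, v in selected.items())]


# Narrow to books, then look at the facet counts that remain.
books = refine(RECORDS, format="Book")
print(facet_counts(RECORDS, "format"))
print(facet_counts(books, "library"))
print([r["title"] for r in refine(books, available=True)])
```

The point of the sketch is that item-level data (location, availability) becomes a filter the user can apply and remove at will, rather than something only visible on a full record display.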

Demo

Data Reality Check

Data Reality Check: Part I
- Our Integrated Library System has ~80 MARC fields and subfields indexed in its keyword index
- 33 additional indexed fields (29%!) are not publicly displayed
- Displayed fields use 37 different labels

Data Reality Check: Part II
- A lot of those same fields are indexed in Endeca (just much more quickly and efficiently)
- ~50 MARC fields are indexed
- New catalog has ~37 Properties and 11 Dimensions derived from ~160 MARC fields and subfields

Data Reality Check: Part III
- Simple data is the best
  - MARC4J to convert MARC into flat files
  - Lots of stripping of punctuation…ugh
  - Perl to update files
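The slide names MARC4J and Perl as the actual tools. The following Python sketch only illustrates the kind of punctuation stripping involved, under stated assumptions: the record is assumed to be already parsed into a plain dictionary of tag to subfield strings, the `tag|value` flat-file layout is invented for the example, and the sample record is made up.

```python
import re

# Trailing ISBD-style punctuation (slash, colon, semicolon, comma, period,
# plus whitespace) that cataloging conventions leave at the end of subfields.
TRAILING_PUNCT = re.compile(r"[\s/:;,.]+$")


def clean(value):
    """Strip trailing punctuation and surrounding whitespace from one subfield."""
    return TRAILING_PUNCT.sub("", value).strip()


def flatten(record):
    """Turn {tag: [subfield strings]} into one 'tag|value' line per field."""
    lines = []
    for tag in sorted(record):
        value = " ".join(clean(s) for s in record[tag])
        lines.append(f"{tag}|{value}")
    return lines


# Hypothetical record, assumed to be parsed out of MARC by some other tool.
record = {
    "245": ["The future of the catalog /", "edited by A. Example."],
    "260": ["Raleigh :", "Example Press,", "2007."],
}
for line in flatten(record):
    print(line)
```

Note that a naive rule like this also trims the final period of an abbreviation such as "N.C.", so real cleanup would likely need per-field exceptions.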

Relevance Ranking

Relevance Ranking
- TF/IDF alone is inadequate for determining relevance order of bibliographic metadata
- The Endeca MDEX Engine offers NCSU alternatives
  - Matching techniques: matchall, matchany, matchboolean, matchallpartial
- A suite of relevance ranking options are applied to Boolean-type searches
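To make the matching techniques concrete, here is a small Python sketch of three of them. The function names echo the slide, but the semantics are an assumption based on the names (all terms, any term, all terms with a fallback to partial matches), not a description of how the MDEX Engine implements them. The titles are made up.

```python
def tokens(text):
    """Lowercased set of whitespace-separated words."""
    return set(text.lower().split())


def match_all(query, doc):
    """True when every query term appears in the document."""
    return tokens(query) <= tokens(doc)


def match_any(query, doc):
    """True when at least one query term appears in the document."""
    return bool(tokens(query) & tokens(doc))


def match_all_partial(query, docs):
    """Return docs matching every term; if none do, fall back to any-term matches."""
    full = [d for d in docs if match_all(query, d)]
    return full if full else [d for d in docs if match_any(query, d)]


# Hypothetical titles for illustration.
TITLES = [
    "history of the revolutionary war",
    "textile history",
    "war and peace",
]

print(match_all_partial("revolutionary war", TITLES))
print(match_all_partial("textile war", TITLES))
```

The fallback case shows why such a mode needs an interface cue: the second query silently returns records that match only one of the two terms.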

Relevance Ranking (cont.)
- Individual RR modules are combined and prioritized according to our specifications to form an overall RR strategy, or algorithm
- Current strategy includes seven modules
  - 5 of which rank results dynamically on things like: phrase, rank of the field in which term appears, weighted frequency
  - 2 final rules provide static ordering based on publication date and aggregate circulation totals
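One way to picture "modules combined and prioritized into a strategy" is a sort on a tuple of module scores, where later modules only break ties left by earlier ones. The Python sketch below assumes that reading. It uses two dynamic modules and two static ones rather than the seven on the slide, and every name, scoring rule and record in it is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    title: str
    subject: str
    year: int
    checkouts: int


def terms(text):
    return text.lower().split()


def phrase(hit, query):
    """Dynamic: 1 if the whole query occurs as a phrase in title or subject."""
    q = query.lower()
    return int(q in hit.title.lower() or q in hit.subject.lower())


def field_rank(hit, query):
    """Dynamic: a term in the title (2) outranks a term only in the subject (1)."""
    title_words, subject_words = set(terms(hit.title)), set(terms(hit.subject))
    if any(t in title_words for t in terms(query)):
        return 2
    if any(t in subject_words for t in terms(query)):
        return 1
    return 0


def pub_date(hit, query):
    """Static: newer publications first."""
    return hit.year


def circulation(hit, query):
    """Static: more aggregate checkouts first."""
    return hit.checkouts


# Order matters: each module only breaks ties left by the ones before it.
STRATEGY = [phrase, field_rank, pub_date, circulation]


def rank(hits, query):
    return sorted(hits, key=lambda h: tuple(m(h, query) for m in STRATEGY), reverse=True)


hits = [
    Hit("Civil war letters", "united states history", 1995, 40),
    Hit("A history of the civil war", "united states history", 2001, 12),
    Hit("Battlefields", "civil war campaigns", 2005, 90),
    Hit("The civil war", "united states history", 2001, 55),
]
for h in rank(hits, "civil war"):
    print(h.title)
```

In this toy data the most-circulated, most recent record ("Battlefields") still ranks last, because the static rules only apply after the dynamic modules have had their say.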

Relevance Ranking: Challenge and Promise
Challenge
- Uncharted territory required a best-guess approach
- More experimentation required with “matchallpartial” to provide an interface intuitive enough for users to know what is happening
Promise
- Having technology nimble enough to support experimentation with indexing and relevance strategies

Statistics: What patrons are doing

[Usage statistics charts, July 06 – Jan 07; chart contents not recoverable from the transcript. One chart is labeled "% Subj./Class".]

Most Popular Dimension Values (out of 765,170 Navigation Requests, July 06 – Jan 07)
Physical
- New: New (92,037)
- Format: Book (40,183)
- Availability: Available (33,125)
- Library: D.H. Hill (33,091)
- Language: English (22,668)
- Format: eBook (21,177)
Topical
- LC Class: Q-Science (25,277)
- Subject|Region: US (20,954)
- Subject|Topic: History (20,861)
- LC Class: T-Technology (16,951)
- LC Class: H-Soc. Sci. (16,345)
- Subject|Topic: Bioethics (12,933)
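The raw counts are easier to compare as shares of the 765,170 navigation requests. The short Python sketch below does that arithmetic for a few of the values from the slide; it assumes each count is a subset of that same total, which the slide implies but does not spell out.

```python
TOTAL_REQUESTS = 765_170  # navigation requests, July 06 to Jan 07 (from the slide)

# A few dimension values and their request counts, copied from the slide.
COUNTS = {
    "New: New": 92_037,
    "Format: Book": 40_183,
    "Availability: Available": 33_125,
    "LC Class: Q-Science": 25_277,
    "Subject|Topic: History": 20_861,
}


def share(count, total=TOTAL_REQUESTS):
    """Percentage of all navigation requests that used this dimension value."""
    return 100 * count / total


for label, count in sorted(COUNTS.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label:26s}{count:>8,d}{share(count):7.1f}%")
```

On these figures the "New" filter alone accounts for roughly 12% of navigation requests, more than twice the share of the next value.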

The Metadata Paradox

Plugging Holes in the System
- Natural language problem
  - LCSH=United States--History--Revolution, keyword=revolutionary war (834 hits)
  - Subject keyword=“United States--History--Revolution, ” (3081 hits)
- Facets taken out of the free-floating and hierarchical context of LCSH can be misleading
- I’ve followed many a tag cloud, but assuming that browsing is still popular, how does one browse keywords?
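The point about facets losing their LCSH context can be shown with a few lines of Python. This is a hypothetical sketch: the headings are simplified strings using "--" as the subdivision separator, and treating every subdivision as a facet value is an assumption for illustration, not a description of how NCSU built its dimensions.

```python
from collections import Counter


def split_heading(heading):
    """Split a subject string on '--' into its main heading and subdivisions."""
    return [part.strip() for part in heading.split("--") if part.strip()]


def subdivision_facet(headings):
    """Count subdivisions (everything after the main heading) as facet values."""
    counts = Counter()
    for heading in headings:
        for part in split_heading(heading)[1:]:
            counts[part] += 1
    return counts


# Simplified, hypothetical headings.
HEADINGS = [
    "United States--History--Revolution",
    "France--History--Revolution",
    "Science--History",
]

print(subdivision_facet(HEADINGS))
```

Once the strings are split, "Revolution" becomes a single facet value covering two unrelated revolutions, and "History" spans all three headings; the hierarchy that told them apart is gone.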

Paradox #1
We finally have interesting discovery tools that make use of bibliographic data in ways that show us that the data are not completely adequate for use with the new discovery tools.

“Subject Keywords”
- “[from Recommendations:] … Abandon the attempt to do comprehensive subject analysis manually with LCSH in favor of subject keywords; urge LC to dismantle LCSH”
  -- Karen Calhoun, The Changing Nature of the Catalog and its Integration with Other Discovery Tools, report prepared for the Library of Congress, March 2006

Paradox #2
“Subject keywords” should replace the controlled vocabulary from which the keywords themselves are most easily derived.
Let’s build bridges between the mountains of bibliographic description so that we can tear down the mountains.

If not LCSH, then what?
- Dissertation Abstracts Model
  - Constrained list of subject terms
  - Dissertations are at the edge of scholarship, so granular pre-existing thesaurus is inadequate
  - Vocabulary cannot be “thwarted” by authors
- Social Tagging Model
  - Human intervention times n users
  - New difficulty of matching tags between creators
  - Contrary to the “literary warrant” that requires a hierarchical thesaurus to differentiate thousands of titles from one another
- Full Text Model
  - Good for currency: hypothetically uses the most contemporary language
  - Harder for research: some sort of citation analysis required; but arguably could be solved computationally
  - The question of scale
(with some help from colleague Charley Pennell, Principal Cataloger for Metadata, NCSU)

Paradox #3
Computational (i.e., non-human-mediated) creation of subject-based facets will work perfectly once all the full text of every work is available in electronic format.
What does a search and retrieval system for 50 million books and 50 million articles look like?
Goooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooogle

Bibliographic Control Wish List

Bibliographic Control Wish List
- A classification or subject thesaurus system that enables faceted navigation
- A work identifier for books and serials
- Something other than LC Name Authority for “organizations”
- Physical descriptions that help libraries send books to off-site shelving and to patrons’ mailboxes
- Something other than MARC in which to encode all of the above
- Systems that can actually use the encoding

Paradox #4: The Ultimate Paradox
“You’re damned if you do and you’re damned if you don’t.” - Bart Simpson

Thanks
- NCSU project site (includes these slides):
- Andrew K. Pace, Head, Information Technology, NCSU Libraries