Published by Alison Janis Matthews Modified over 9 years ago
1
New approaches to the catalog T. Hickey http://errol.oclc.org/laf/n82-54463.html Svensk Biblioteksförening 2005 October 28
2
OCLC Founded 1967 Nonprofit membership organization > 53,000 libraries 96 countries ~1,000 employees Cataloging Interlibrary Loan Preservation Dewey Decimal Classification netLibrary FirstSearch
3
OCLC Research Research for both OCLC services Membership Metadata management Knowledge organization Content management Interoperability Systems & interaction design ~30 employees
4
What do users want? The right information – with minimum effort
5
How to give them what they want Catch them where they are Increase our data Improve our data Make the data work harder Interconnect with other systems Do all this efficiently
6
What has changed Computers and telecommunications User expectations Digital materials Remoteness of our users Huge amounts of bandwidth, storage
7
The competition Online booksellers Reviews Tables of contents Excerpts Inside-the-book searching Web search engines Speed Full-text searching Global coverage (of web resources) Good enough Ourselves Electronic journals
8
Current projects (my group) Live search Registries, PURLs Dewey browser Harvesting, electronic theses VIAF, LAF SRU/W, OpenURLs, OAI FRBR, xISBN Beowulf cluster Map-reduce Text searching Batch loading Open WorldCat WorldCat Wiki Publisher Names MXG
9
Other Research Projects FictionFinder, Curiouser Schema Transformation Terminology Services Digital Preservation Collection Analysis Dublin Core FAST User Studies Data mining Also: http://www.oclc.org/research/researchworks/
10
Catch them where they are Google, Yahoo, etc. Open WorldCat OpenURL OAI-PMH Creation too WCat Wiki Tags?
11
Open WorldCat
12
Editions
13
OpenURL OpenURL registry Supports version 1.0 Also registry of OpenURL servers Used for WikiD
14
WorldCat ‘Wiki’ Opening up WorldCat to user annotations Reviews Notes Tables of contents Cover art? Book lists? Based on WikiD software Full Wiki Many features off for WorldCat Uses OpenURL 1.0 protocol internally Allows collections of pages of arbitrary XML schemas Tools for the creation of simple collections Doesn’t look like a Wiki
15
Reviews
16
Tags? Folksonomies? User-generated key words We’ve been here before Is it different? Is there another direction?
17
Opening Dewey
19
More data Harvesting OAI-PMH ETDs Batch load 60 million records 3 million new manifestations Other Cover art Reviews WC
20
Better data and organization VIAF FRBR Authority files in general LAF Publisher names Genre FAST Registries PURLs Generalized solution? Get them nearer to creation
21
FRBR Work-set algorithm Keys based on author/title Authority files Auxiliary authority files xISBN Used for xISBN Open WorldCat FirstSearch (coming) Collection analysis (coming) Research
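The idea of work-set keys built from author and title can be illustrated with a minimal sketch. This is not the OCLC work-set algorithm itself: the normalization rules, the record layout, and the function names (`normalize`, `work_key`, `group_into_work_sets`) are assumptions made for illustration, and the authority-file lookups mentioned on the slide are left out.

```python
import re
import unicodedata
from collections import defaultdict

def normalize(text):
    """Strip accents, lowercase, drop punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_accents.lower())
    return " ".join(cleaned.split())

def work_key(author, title):
    """Hypothetical key format: normalized author, a slash, normalized title."""
    return normalize(author) + "/" + normalize(title)

def group_into_work_sets(records):
    """Group record ids that share the same author/title key."""
    work_sets = defaultdict(list)
    for rec in records:
        work_sets[work_key(rec["author"], rec["title"])].append(rec["id"])
    return dict(work_sets)

# Illustrative records only; the ids and field names are made up.
records = [
    {"id": "r1", "author": "Twain, Mark", "title": "Adventures of Huckleberry Finn"},
    {"id": "r2", "author": "TWAIN, MARK.", "title": "Adventures of Huckleberry Finn."},
    {"id": "r3", "author": "Lindgren, Astrid", "title": "Pippi Långstrump"},
]
work_sets = group_into_work_sets(records)
for key, ids in work_sets.items():
    print(key, ids)
```

Records r1 and r2 differ only in case and punctuation, so they land in the same set. A production version would presumably map name variants through authority files before building the key.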
23
Authority Files LAF http://errol.oclc.org/laf/n82-54463.html Publisher names Not normally controlled Looking for variations with ISBN prefixes Also worked with dissertations
24
VIAF Merge national-level files Library of Congress (NACO) and Die Deutsche Bibliothek Bibliographic records analyzed 15% would be erroneous based just on names Basic matching now completed 435,000 matching names < 1% mismatched Working on Public interface OAI harvesting Persistent identifiers
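The point that matching on names alone would be wrong about 15% of the time, and that bibliographic records are used as extra evidence, can be sketched as follows. This is an assumed simplification, not the VIAF matching procedure: the record layout, the sample data, and the rule "same name plus at least one shared title" are all hypothetical.

```python
def match_authorities(file_a, file_b):
    """Pair records whose names match AND that share at least one title."""
    by_name = {}
    for rec in file_b:
        by_name.setdefault(rec["name"].lower(), []).append(rec)

    matches = []
    for rec_a in file_a:
        for rec_b in by_name.get(rec_a["name"].lower(), []):
            # Titles drawn from bibliographic records act as corroborating evidence.
            if set(rec_a["titles"]) & set(rec_b["titles"]):
                matches.append((rec_a["id"], rec_b["id"]))
    return matches

# Made-up sample data: two different people share one name in file_b.
file_a = [{"id": "a1", "name": "Smith, John", "titles": {"Rivers of the North"}}]
file_b = [
    {"id": "b1", "name": "Smith, John", "titles": {"Rivers of the North", "Lakes"}},
    {"id": "b2", "name": "Smith, John", "titles": {"Tax Law Handbook"}},
]
matches = match_authorities(file_a, file_b)
print(matches)
```

A name-only comparison would have paired a1 with both b1 and b2; requiring a shared title keeps only the first pair.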
26
Maj
27
Registries Show relationships between metadata Often associated with an identifier General solution? Examples Authority files WorldCat PURLs
28
Persistent URLs Map one URL to another http://purl.org/hickey/outgoing -> http://outgoing.typepad.com/ 500,000+ PURLs 111 million resolutions Port to WikiD platform? http://www.oclc.org/research/projects/wikid/ String of PURL servers? Use OAI-PMH for synchronization Spread responsibility Generalized solution?
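"Map one URL to another" amounts to a lookup followed by an HTTP redirect. The sketch below shows only that idea, using the mapping given on the slide. The table layout, the `resolve` function, and the choice of a 302 status are assumptions, not a description of the PURL server software.

```python
from urllib.parse import urlsplit

# Resolution table keyed by PURL path; the single entry is the slide's example.
PURL_TABLE = {
    "/hickey/outgoing": "http://outgoing.typepad.com/",
}

def resolve(purl):
    """Return (status, target): a redirect for known PURLs, 404 otherwise."""
    path = urlsplit(purl).path
    target = PURL_TABLE.get(path)
    if target is None:
        return 404, None
    return 302, target  # assumed redirect status for this sketch

status, target = resolve("http://purl.org/hickey/outgoing")
print(status, target)
```

Because clients only ever see the PURL, the target can change by updating one table entry, which is also why synchronizing such tables between servers is a natural fit for a harvesting protocol.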
29
More connectivity OpenURL RSS feeds OpenSearch, SRU/W OAI-PMH
30
OpenURL Developed to address the ‘appropriate copy’ problem Transitioning to OpenURL 1.0 OpenURL resolver Accepts requests specifying Resource Services Generalized syntax Specifying a resource Services to be performed Metadata elements specified in registry http://purl.org/openurl/
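A request that names a resource and, optionally, a service can be sketched as an OpenURL 1.0 URL in key/encoded-value form. The keys below (`url_ver`, `ctx_ver`, `rft_val_fmt`, `rft.isbn`, `rft.btitle`, `svc_id`) come from the Z39.88-2004 KEV format; the resolver address, the ISBN, the title, and the helper name `book_openurl` are placeholders assumed for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical resolver address; substitute a real OpenURL resolver.
RESOLVER_BASE = "http://resolver.example.org/openurl"

def book_openurl(isbn, title, service_id=None):
    """Build an OpenURL 1.0 request in key/encoded-value (KEV) form."""
    params = [
        ("url_ver", "Z39.88-2004"),                    # OpenURL 1.0 version tag
        ("ctx_ver", "Z39.88-2004"),                    # ContextObject version
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:book"),  # registered book metadata format
        ("rft.isbn", isbn),                            # the resource being referred to
        ("rft.btitle", title),
    ]
    if service_id is not None:
        params.append(("svc_id", service_id))          # requested service, by identifier
    return RESOLVER_BASE + "?" + urlencode(params)

# Placeholder ISBN and title, not a real book.
url = book_openurl("0000000000", "Example Book")
query = parse_qs(urlsplit(url).query)
print(url)
```

The metadata keys after the `rft.` prefix are the ones defined by the registered format named in `rft_val_fmt`, which is what ties the request syntax back to the registry.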
32
SRU Simplified version of Z39.50 Web based SRW – SOAP SRU – URL Even simpler? OpenSearch No search syntax Looking for common ground MXG Metasearch XML Gateway Simplifies metasearchers’ lives
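Because SRU carries the whole search in a URL, a request can be assembled with nothing more than query-string encoding. The sketch below uses the SRU 1.1 `searchRetrieve` parameters; the server address and the helper name `sru_search_url` are assumptions.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical SRU endpoint used only for illustration.
SRU_BASE = "http://sru.example.org/catalog"

def sru_search_url(cql_query, start_record=1, maximum_records=10):
    """Build an SRU 1.1 searchRetrieve URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,               # CQL, the query language SRU uses
        "startRecord": start_record,      # 1-based position of the first record
        "maximumRecords": maximum_records,
    }
    return SRU_BASE + "?" + urlencode(params)

url = sru_search_url('dc.title = "beowulf"', maximum_records=5)
params = parse_qs(urlsplit(url).query)
print(url)
```

SRW wraps the same request and response in SOAP envelopes, while OpenSearch drops the query language entirely, which is the "even simpler" trade-off the slide points to.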
33
OAI-PMH Method of harvesting metadata More generally, a way of synchronizing databases No real restriction to metadata Becomes a repository protocol Identifiers Timestamps Layered implementation OAI SRU Pears
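The role of timestamps in synchronization can be sketched with the `ListRecords` verb: a harvester asks for everything changed since its last run, then follows resumption tokens until the list is complete. The verb and argument names are from OAI-PMH 2.0; the repository address and the helper name `list_records_url` are assumptions.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical repository address used only for illustration.
REPOSITORY = "http://repository.example.org/oai"

def list_records_url(metadata_prefix="oai_dc", from_timestamp=None, resumption_token=None):
    """Build a ListRecords request, either a fresh one or a continuation."""
    if resumption_token is not None:
        # A resumption token is an exclusive argument: it replaces all others.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_timestamp is not None:
            params["from"] = from_timestamp  # only records changed since this time
    return REPOSITORY + "?" + urlencode(params)

first = list_records_url(from_timestamp="2005-10-01T00:00:00Z")
next_page = list_records_url(resumption_token="token-123")  # made-up token value
print(first)
print(next_page)
```

Nothing in the request shape requires the payload to be metadata, which is why the same pattern works for keeping any two databases in step.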
34
Efficient processing Beowulf cluster Map reduce Text searching
35
Beowulf Cluster 24 nodes 2 processors, 4 gigabytes of RAM, 120 gigabytes disk Gigabit network Use it for FRBR processing Text indexing Text searching ~ 30-fold speed up on many tasks 1 year ⇒ 2 weeks 1 week ⇒ 1 day 1 day ⇒ 1 hour 1 hour ⇒ 2 minutes Extremely cheap processing
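The speed-up figures can be checked with simple arithmetic. The sketch below divides each original duration by an assumed factor of exactly 30; the slide says "~30-fold", so its figures are rounded, and the "1 week ⇒ 1 day" entry reads as a more conservative estimate than a strict division would give.

```python
from datetime import timedelta

SPEEDUP = 30  # assumed exact factor; the slide gives it as approximate

def sped_up(duration, factor=SPEEDUP):
    """Return the run time after dividing the work by the speed-up factor."""
    return duration / factor

tasks = {
    "1 year": timedelta(days=365),
    "1 week": timedelta(weeks=1),
    "1 day": timedelta(days=1),
    "1 hour": timedelta(hours=1),
}

for label, duration in tasks.items():
    print(label, "->", sped_up(duration))
```

A strict 30-fold division turns a year into roughly 12 days, a day into 48 minutes, and an hour into 2 minutes, in line with the slide's rounded "2 weeks", "1 hour", and "2 minutes".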
36
Map reduce Pioneered by Google Petabytes of data on thousands of nodes Adapted to our cluster Tens of gigabytes of data on dozens of nodes Simple functional programming paradigm Allows batch processing across cluster
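The "simple functional programming paradigm" can be shown in a few lines: a mapper emits key/value pairs for each record, the pairs are grouped by key, and a reducer folds each group into one result. This single-process sketch only illustrates the paradigm; it is not the cluster implementation, and the function names and sample records are made up.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Run mapper over every record, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

def title_words(record):
    """Mapper: emit (word, 1) for every word in a record's title."""
    for word in record["title"].lower().split():
        yield word, 1

def count(key, values):
    """Reducer: sum the counts collected for one word."""
    return sum(values)

records = [
    {"title": "Open WorldCat"},
    {"title": "WorldCat Wiki"},
    {"title": "Open URL"},
]
counts = map_reduce(records, title_words, count)
print(counts)
```

Since the mapper sees one record at a time and the reducer sees one key at a time, both phases can be spread across nodes without the functions themselves changing.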
37
Text Searching Spread database across cluster Two levels of aggregation 3 servers/node 24-way aggregation Aggregators run across cluster SRU used HTTP based SRW (SOAP) slowed it down Open source software
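Two levels of aggregation can be sketched as a merge of ranked lists applied twice: once across the servers on a node, then across the nodes. The reading of the slide (a per-node aggregator feeding a cluster-wide one), the hit format `(score, record_id)`, and the function names are assumptions, and the SRU transport between the levels is omitted.

```python
import heapq
from itertools import islice

def aggregate(result_lists, limit):
    """Merge lists already sorted by descending score and keep the top hits."""
    merged = heapq.merge(*result_lists, key=lambda hit: -hit[0])
    return list(islice(merged, limit))

def search_cluster(nodes, limit):
    """First merge each node's server results, then merge across nodes."""
    node_results = [aggregate(server_lists, limit) for server_lists in nodes]
    return aggregate(node_results, limit)

# Made-up results: 2 nodes with 3 servers each, hits as (score, record_id).
nodes = [
    [[(0.9, "a"), (0.4, "b")], [(0.8, "c")], [(0.1, "d")]],
    [[(0.7, "e")], [(0.95, "f"), (0.2, "g")], []],
]
top = search_cluster(nodes, limit=3)
print(top)
```

Each level only needs to pass up `limit` hits, so the top-level aggregator handles a small, bounded amount of data however many servers sit below it.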
38
Better interfaces More interactive Live search Dewey Browser Better connected
44
Post-coordination of Services Systems that expose low level services Higher level coordination of those services Loosely coupled services Examples from OCLC Validation service RSS feeds SRU OpenURL, OAI-PMH xISBN DDC Browser built this way Very different interfaces have been built
45
DDC Browser XML swe
46
Do We Need It? Just have Google harvest everything Our experience with Google Fielded searching Reliable searching Possibility of user-supplied metadata Cost of good metadata Cost of non-existent metadata
47
Conclusions Shift to remote users Online availability – trend towards centralization More flexibility in implementations Patrons are better served Less emphasis on physical collections
48
Thank you T. Hickey http://errol.oclc.org/laf/n82-54463.html Swedish Library Association 2005 October 28