Published by Amos Cross. Modified over 9 years ago.
1
The University of Cambridge Universal Catalogue: a work in progress Patricia Killiard Head of IT Services Cambridge University Library
2
Libraries in the University of Cambridge UC University Library Dependent libraries Medical Library Scientific Periodicals Squire Law Library Betty & Gordon Moore Library College libraries Departmental & Faculty libraries Affiliated Institutions Other libraries associated with the University
3
The Union Catalogue: Beginnings and growth
1982: Began with the Union List of Serials – non-MARC records based on a printed list
1985: 5 libraries began contributing short records for books to a Union Catalogue
1987: UC first made available to the public with 53,000 records
2002: 90+ contributing libraries
New contributors are still joining
Software was written in-house and continued to be used until 2002
4
Standards... Early records were subject to no bibliographic standards to encourage contributions Brief records due to cost of disk space in 1980s No Authority control, even today Independence of colleges, faculties and departments means no overall control of standards... consequences for the UC Serials records were non-MARC until 2002
5
Pre-2002 Union Catalogue Model Consortial model with duplicate bibliographic records No authority control Completely separate from the authority-controlled file for the University Library Separate Union List of Serials which was de-duplicated Can still be seen at http://linux01.lib.cam.ac.uk/Catalogues/OPAC/xunion.shtml
6
Pre-2002 Union Catalogue
7
Search Results in pre-2002 Union Catalogue
8
Cambridge Union List of Serials
9
Advantages and disadvantages of the old UC model Advantages Ability to request 3 preferred libraries first Some patron functionality, e.g. patrons able to view books on loan Each library’s holdings could be distinguished immediately Disadvantages Lack of de-duplication in the main Union Catalogue Large numbers of search results Exclusion of the University Library holdings from the UC Separation of serials catalogue from monographs
10
Voyager vision for Cambridge Single de-duplicated Universal Catalogue incorporating all public databases, bringing University Library and other databases together Based on authority-controlled records All patron functionality possible through the UC Libraries able to retain local rights over records and patron functionality Local subject headings retained
11
From Consortial Catalogue to Universal Catalogue Department/Faculty and College databases in Voyager have multiple owning libraries - no record sharing Could move to a Union Catalogue module by allowing record sharing within databases but... –Requires political will –Is very slow since records would merge on an individual basis –Interim stage of merging confusing for patrons
12
Cambridge System Hardware Universal Catalogue Feeder databases Web Server
13
Hardware specifications Sun Fire 4800 4 x T3 arrays configured in 2 partner groups 2 x 4 x 750MHz CPUs 16GB memory (8GB for each domain) Disk space is: 2 x 18GB (used for Solaris) and 2 x 9 x 36GB (in one T3 partner pair) for each domain Domain A (Hookea) holds all production databases Domain C (Hookec) holds UC Web server = Sun 280R 2 x 750MHz UltraSPARC III processors 4GB memory 72GB disk Test server = Sun 220R
14
Cambridge Voyager Databases
15
De-duplication Indexes used: –010, 020, 022, 0350, 0359 Large proportion of records do not have ISBNs or LCCNs De-duplication is very loose Resulted in very low levels of de-duplication (3-15%) De-duplication may actually reduce as the file accumulates due to addition of older records without control numbers
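A minimal sketch of why control-number matching leaves short records untouched. It is not Voyager's duplicate detection profile: the record model (a dict of MARC tag to values), the function names, and the sample records are all hypothetical, and the 0350/0359 indexes are treated here simply as the 035 field.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical record model: MARC tag -> list of field values.
Record = Dict[str, List[str]]

# Control-number fields used as match points (assumption: 0350/0359 -> 035).
MATCH_TAGS = ("010", "020", "022", "035")


def match_keys(record: Record) -> Set[Tuple[str, str]]:
    """Collect (tag, normalised value) pairs usable as match points."""
    keys = set()
    for tag in MATCH_TAGS:
        for value in record.get(tag, []):
            cleaned = value.strip().lower()
            if cleaned:
                keys.add((tag, cleaned))
    return keys


def find_duplicate(incoming: Record, catalogue: List[Record]) -> Optional[int]:
    """Return the position of the first record sharing a control number."""
    incoming_keys = match_keys(incoming)
    if not incoming_keys:
        # A short record with no control numbers can never match.
        return None
    for position, existing in enumerate(catalogue):
        if incoming_keys & match_keys(existing):
            return position
    return None


# Illustrative data only.
catalogue = [
    {"020": ["0335203884"], "245": ["An example title"]},
    {"245": ["Short record with no control numbers"]},
]
full_record = {"020": ["0335203884"], "245": ["An example title"]}
short_record = {"245": ["Short record with no control numbers"]}

print(find_duplicate(full_record, catalogue))
print(find_duplicate(short_record, catalogue))
```

The second lookup finds nothing even though an identical short record is already in the file, which is the situation the low de-duplication levels reflect.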
16
Replace vs Merge in de-duplication Bi-directional merge profile should have been available in 2001.2 but not yet working Essential in order to preserve British Education Index and local subject headings in 650._4 and 650._7 Might be used in future to preserve other fields, e.g. 856 fields
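The difference between a plain replace and a merge can be sketched as follows. This is an illustration under stated assumptions, not the Voyager bi-directional merge profile: fields are modelled as hypothetical (tag, indicator 1, indicator 2, content) tuples, and only 650 fields with second indicator 4 or 7 are protected.

```python
from typing import List, Tuple

# Hypothetical field model: (tag, indicator 1, indicator 2, content).
Field = Tuple[str, str, str, str]


def is_protected(field: Field) -> bool:
    """True for 650 fields with second indicator 4 or 7 (local headings)."""
    tag, _ind1, ind2, _content = field
    return tag == "650" and ind2 in ("4", "7")


def replace_with_merge(existing: List[Field], incoming: List[Field]) -> List[Field]:
    """Take the incoming record, but carry over protected fields
    from the record it replaces."""
    merged = list(incoming)
    for field in existing:
        if is_protected(field) and field not in merged:
            merged.append(field)
    return merged


# Illustrative data only.
existing = [
    ("245", "1", "0", "Example title"),
    ("650", " ", "4", "Local subject heading"),
]
incoming = [
    ("245", "1", "0", "Example title : fuller record"),
    ("650", " ", "0", "Education"),
]

merged = replace_with_merge(existing, incoming)
for field in merged:
    print(field)
```

A plain replace would return `incoming` unchanged and lose the local heading; extending `is_protected` to other tags (for example 856) would correspond to the future use mentioned above.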
17
Quality Hierarchy
Leader/06 Leader/17 040$a 040$d
* * DLC *
as ** depfacaedb
ab ** depfacaedb
as ** depfacfmdb
ab ** depfacfmdb
as ** depfacozdb
ab ** depfacozdb
as ** collandb
ab ** collandb
as ** collpwdb
ab ** collpwdb
as ** otherdb
ab ** otherdb
* ** cambrdgedb
18
Trial UC build no. 1: Aug 2001 First UC build with 2000.1.3 – built before remainder of system went live Contributing files were all test loads of data for all libraries - very slow to configure and build UC Phase 2 – should have had link back to holdings records but bug in 2000.1.3 prevented it from working Upgrade to 2000.2.1 needed to make it work (Oct 2001) No UB functionality Very generic build using only 010, 020, 022 and 035 to de-duplicate
19
Trial build no. 2: Nov 2002 2 databases: cambrdgedb and depfacaedb with 2001.2 Beta Bugs in Sysadmin affected –Duplicate detection profiles –Quality hierarchy –Bi-directional merge –Saving values in Sysadmin generally Build failed several times at pre-bulk stage
20
Trial no. 3: March 2003 Began March 2003, again with 2 databases Early problems with matching location codes and Oracle database names Further pre-bulk problems Delayed while databases were clustered in March and upgraded to 2001.2.1 in early April Build completed but –quality hierarchy failed to work –bi-directional merge did not work –unable to test patron functionality
21
Production build
21 July: Initial load began with 2 databases: cambrdgedb and depfacaedb. Indexed and reviewed at this stage
22 August: Load of remaining databases began
28 August: Load and indexing complete
Currently under review
–Authorities not loaded
–UB not yet enabled
–Bi-directional merge not yet functioning
22
De-duplication in production build
Database     Processed    Added        Discarded   Rejected   Replaced   Replacement level
Cambrdgedb   1,546,138    1,493,243    203         2,911      49,779     3.2%
Depfacaedb   412,727      339,408      397         59,397     13,523     3.3%
Collandb     481,002      260,311      9,593       136,146    74,951     15.6%
Depfacfmdb   352,619      284,674      1,419       47,660     18,866     5.3%
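The replacement levels appear to be the number of replaced records as a percentage of records processed. The sketch below recomputes them under that assumption, taking the Collandb "Replaced" count as 74,951 (an assumption; it is the reading consistent with the stated 15.6%). The variable and function names are illustrative.

```python
# (processed, replaced) per database, taken from the slide.
# Assumption: the Collandb "Replaced" count is 74,951.
builds = {
    "cambrdgedb": (1_546_138, 49_779),
    "depfacaedb": (412_727, 13_523),
    "collandb": (481_002, 74_951),
    "depfacfmdb": (352_619, 18_866),
}


def replacement_level(processed: int, replaced: int) -> float:
    """Replaced records as a percentage of records processed."""
    return 100.0 * replaced / processed


levels = {
    name: replacement_level(processed, replaced)
    for name, (processed, replaced) in builds.items()
}

for name, level in levels.items():
    print(f"{name}: {level:.2f}%")
```

Each computed value falls within 0.1 percentage point of the figure given on the slide.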
23
Newton OPAC
24
UC Search Results
25
Full Record View
26
Major issues to tackle De-duplication of short records with no match points at present Authority control in a non-authority controlled environment Presentation of results to users: –Display doesn’t support multiple libraries in database: shows database name as location rather than holding library –Public names in OPAC need to be revised to reflect multiple libraries - 60 characters is not always sufficient
27
Short record with no de-duplication:
28
Short record de-duplication Option 1: Additional indexes Creation of index solely for de-duplication purposes Manual matching by cataloguers Addition of local control number in matching records Accurate but extremely slow However, additional left-anchored indexes for de-duping, like 015 (BNB numbers) would help.
29
Short record de-duplication Option 2: Combining indexes is probably the best way to tackle the very large numbers of short records Algorithm to combine author, title, and publication date would be ideal Option 3: Upgrading all short records through retrocon projects - expensive and not justified if only purpose is de-duplication
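One way Option 2 could look is a combined match key built from author, title, and publication date. This is only a sketch of the idea, not an existing Voyager feature; the normalisation rules, the function names, and the sample values are assumptions chosen for illustration.

```python
import re
import unicodedata
from typing import Tuple


def normalise(text: str) -> str:
    """Lower-case, strip diacritics and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", " ", ascii_only.lower()).strip()


def match_key(author: str, title: str, date: str) -> Tuple[str, str, str]:
    """Combine author, title and the first 4-digit year into one key."""
    year = re.search(r"\d{4}", date)
    return (normalise(author), normalise(title), year.group(0) if year else "")


# Illustrative values only: two differently punctuated short records.
key_a = match_key("Dickens, Charles.", "Bleak House /", "1853.")
key_b = match_key("DICKENS, Charles", "Bleak house", "c1853")
key_c = match_key("Dickens, Charles", "Bleak house", "1996")

print(key_a == key_b)
print(key_a == key_c)
```

Including the date keeps different editions apart, at the cost of missing duplicates whose dates were transcribed differently; how loose the key should be is a policy choice the slides leave open.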
30
Serials: a special problem Two types of serials records: –Short Union List of Serials records: identical for all libraries but multiple copies in each database –Upgraded serials records in all department/faculty and college databases Need to ensure that –Higher quality records from departments etc. take precedence –Former Union List of Serials records do not diverge as they are upgraded, which means controlling standards
31
Authority control in the UC Authority records from the University Library database will be loaded into UC Local authorities discarded from Voyager build No authorities in 7 out of 8 contributing databases Options? –Load authorities into all databases? Too much space –Introduce authority control into other 7 databases through Web authorities or copying authority records from cambrdgedb - problem of cleaning up existing records
32
Presentation of search results Patrons are interested in library holdings not database holdings Location Limits appear to be possible only by database not library May be able to work with access control groups and holdings sort groups Random order of MFHDs very confusing
33
Patron issues: UB environment... but not entirely Full patron functionality in the UC OPAC was part of the Cambridge contract but recalls, holds and call slip requests not yet working Patron records from all contributing libraries display in OPAC Books on loan, requests, blocks, fines and fees from all libraries display in OPAC Circulation clustered environment UB installed but no reciprocal borrowing
34
Top Enhancements Additional tools for de-duplication, preferably allowing combinations of indexes Fix for the multiple MFHDs being delivered in random order - incomprehensible to the user ISBN matching not ignoring text after first 10 digits (problem nos. 13283, 58877, etc.) –020 __ |a 0335203884 and –020 __ |a 0335203884(pbk) Link from the UC record to the record in the contributing database would be very useful for Cambridge
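The ISBN problem can be illustrated with a small normalisation step that keeps only the 10-character ISBN at the start of 020 $a. This is a sketch of the requested behaviour, not Voyager code; the function name is hypothetical, and it assumes 10-character ISBNs as on the slide (a 13-digit number would be truncated).

```python
import re
from typing import Optional


def isbn_match_key(subfield_a: str) -> Optional[str]:
    """Return the leading 10-character ISBN, ignoring any qualifier
    such as "(pbk)" that follows it."""
    compact = subfield_a.replace("-", "").replace(" ", "")
    # Nine digits followed by a check character (digit or X).
    found = re.match(r"\d{9}[\dXx]", compact)
    return found.group(0).upper() if found else None


plain = isbn_match_key("0335203884")
qualified = isbn_match_key("0335203884(pbk)")

print(plain == qualified)
```

With the raw strings the two 020 fields differ and the records fail to match; with the normalised key they compare equal.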
35
Can be seen at: http://hookec.lib.cam.ac.uk University of Cambridge Universal Catalogue