Periodical Holdings Audit: Correcting Discrepancies and Improbabilities in Catalogs and Periodical A-Z Lists
Ken Irwin, Wittenberg University, kirwin@wittenberg.edu

The Problem(s)
Two systems contain periodical holdings information: the library catalog (III) and the A-Z list (EBSCO Holdings Management/Full Text Finder). The two systems don’t always agree. They were historically maintained by two different departments. Bad data has been copied from one system to the other. Data was corrupted by EBSCO on original ingest.

The Problem(s)
Kinds of bad information: disagreement between systems; disagreement between fields within a system; impossible data (e.g. Holdings: 2010-1940); improbable data (e.g. 1869-Present), especially improbable for niche publications; and data that is just plain incorrect (hard to detect).

The Solution(s)
Identify known problems: record problems found in real life, model each problem in a way that is detectable by algorithm (“algorithm”, sadly, does not imply the lack of grunt work), and fix ’em all. Imagine problems you don’t know you have: you probably have those problems too, so fix them too.
Ken, you might be projecting… (Photo: Max Halberstadt, Public Domain)

What does “fix” mean?
RESEARCH! Where data imported poorly to EBSCO, sometimes the catalog alone is enough to clarify the correct holdings statement. Often we have to check the print holdings in person.

Methods
A lot of Excel: filter, copy the filter results to a new table, filter again.
PHP & MySQL: if you are a programmer or have access to one, almost any scripting language would do (Perl, PHP, Python, etc.).

Catalog vs. EBSCO data structure
Catalog: Lib. Has (text): v.1(1960)-v.4:1(1971). No “real” date fields.
EBSCO: CustomCoverageBegin (date): 01/01/1960; CustomCoverageEnd (date): 12/31/1971; plus CoverageStatement (text, optional): v.1(1960)-v.4:1(1971).
EBSCO’s format does not allow for volume/issue information in a structured way, only in the free-text, optional CoverageStatement field.

Example 1: Records exist in FTF, not in catalog
Scenario: a record was deleted from the catalog after export to FTF but was not deleted from FTF. Process improvement: when records are removed or suppressed from the catalog, change them in FTF too. Category: Disagreement.

Example 1: Records exist in FTF, not in catalog
Detection: Export “serlist” records from the catalog. Export records from FTF. Trim the URLs and bib record numbers down to the 7-digit record number (e.g. b1262517). Compare the two lists using “Compare two lists” from MIT Bioinformatics & Research Computing (http://jura.wi.mit.edu/bioc/tools/compare.php). Remove FTF-only titles from FTF.
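
The trim-and-compare steps for Example 1 could also be scripted rather than pasted into a web tool. The following is a minimal Python sketch, not the method used in the audit. It assumes record numbers look like "b" plus 7 digits, and that the catalog export may carry one extra trailing character (as in b10242582 vs. b1024258 elsewhere in these slides). The function names and the third sample URL are hypothetical.

```python
import re

# Assumption: a bib record key is "b" followed by 7 digits; anything after
# those 7 digits (an extra trailing character, "~S0", etc.) is ignored.
BIB_RE = re.compile(r"b\d{7}")

def bib_key(text):
    """Return the 7-digit bib record key found in text, or None if absent."""
    match = BIB_RE.search(text)
    return match.group(0) if match else None

def ftf_only(catalog_records, ftf_urls):
    """Return sorted bib keys that appear in the FTF export but not in the catalog export."""
    catalog_keys = {bib_key(record) for record in catalog_records} - {None}
    ftf_keys = {bib_key(url) for url in ftf_urls} - {None}
    return sorted(ftf_keys - catalog_keys)

# Hypothetical sample data standing in for the two exports.
catalog_records = ["b12625173", "b10242582"]
ftf_urls = [
    "http://ezra.wittenberg.edu/record=b1262517~S0",
    "http://ezra.wittenberg.edu/record=b1024258~S0",
    "http://ezra.wittenberg.edu/record=b1999999~S0",  # no longer in the catalog
]

print(ftf_only(catalog_records, ftf_urls))
```

The keys returned are the candidates for removal from FTF; each one still deserves a manual look before deletion.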

Example 1: Records exist in FTF, not in catalog http://jura.wi.mit.edu/bioc/tools/compare.php

Example 2: Coverage “to Present” & End Date
EBSCO’s FTF metadata includes three columns related to holdings dates: CustomCoverageBegin (date only), CustomCoverageEnd (date only), and CoverageStatement (free text; supports volume number, date, etc.). Using multiple fields to cover the same information creates the potential for discrepancies. Category: Disagreement.

Example 2: Coverage “to Present” & End Date
Detection: Excel filter: CustomCoverageEnd = ‘Present’. Excel find: CoverageStatement contains ‘-v’ or ‘- v’.
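
The same two-step check can be expressed as a short script. This is a sketch under stated assumptions: the export has been loaded as a list of dictionaries keyed by the FTF column names, and a hyphen followed by "v" is read as a sign that the statement names an ending volume. The "Title" key, the sample rows, and the function name are hypothetical.

```python
def present_with_closed_statement(rows):
    """Return rows marked 'Present' whose CoverageStatement appears to name an end volume."""
    flagged = []
    for row in rows:
        end = row.get("CustomCoverageEnd", "").strip()
        statement = row.get("CoverageStatement", "")
        # '-v' or '- v': a hyphen followed by another volume designation,
        # which suggests a closed range rather than an open-ended one.
        if end == "Present" and ("-v" in statement or "- v" in statement):
            flagged.append(row)
    return flagged

# Hypothetical sample rows.
rows = [
    {"Title": "Journal A", "CustomCoverageEnd": "Present",
     "CoverageStatement": "v.1(1960)-v.4:1(1971)"},
    {"Title": "Journal B", "CustomCoverageEnd": "Present",
     "CoverageStatement": "v.49(2012)-"},
    {"Title": "Journal C", "CustomCoverageEnd": "12/31/1971",
     "CoverageStatement": "v.1(1960)-v.4:1(1971)"},
]

for row in present_with_closed_statement(rows):
    print(row["Title"], row["CoverageStatement"])
```

Only Journal A is flagged here: its end date says "Present" while its statement stops at a specific volume.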

Example 3: Complex ≠ Simple holdings
Scenario: FTF shows complex holdings and a simple coverage statement, or vice versa. Category: Disagreement.

Example 3: Complex ≠ Simple holdings
Detection: Excel filter: ‘|’ (pipe) in CustomCoverage. Excel filter: CoverageStatement does not contain ‘,’ (comma). And then: decide what to do about it…
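
A scripted version of this check can test both directions at once. The sketch below assumes that "CustomCoverage" refers to the CustomCoverageBegin and CustomCoverageEnd columns, and that multiple ranges are pipe-separated there and comma-separated in the CoverageStatement. Both are readings of the slide, not confirmed details of the EBSCO export. The sample rows are hypothetical.

```python
def holdings_shape_mismatch(row):
    """True when the date columns and the free-text statement disagree about complexity."""
    # Assumption: a pipe in either date column means more than one date range.
    complex_dates = ("|" in row.get("CustomCoverageBegin", "")
                     or "|" in row.get("CustomCoverageEnd", ""))
    # Assumption: a comma in the statement means more than one holdings range.
    complex_statement = "," in row.get("CoverageStatement", "")
    return complex_dates != complex_statement

# Hypothetical sample rows.
rows = [
    {"Title": "Journal D",  # complex dates, simple statement
     "CustomCoverageBegin": "01/01/1960|01/01/1980",
     "CustomCoverageEnd": "12/31/1971|12/31/1990",
     "CoverageStatement": "v.1(1960)-v.12(1971)"},
    {"Title": "Journal E",  # simple dates, complex statement
     "CustomCoverageBegin": "01/01/1960",
     "CustomCoverageEnd": "12/31/1990",
     "CoverageStatement": "v.1(1960)-v.12(1971),v.21(1980)-v.31(1990)"},
    {"Title": "Journal F",  # simple on both sides
     "CustomCoverageBegin": "01/01/1960",
     "CustomCoverageEnd": "12/31/1971",
     "CoverageStatement": "v.1(1960)-v.12(1971)"},
]

print([row["Title"] for row in rows if holdings_shape_mismatch(row)])
```

A blank CoverageStatement paired with piped dates is also flagged, which matches the Excel filter as described.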

Example 4: Holdings “to present” but not listed as Retains Current
In our library, most current subscriptions are held in “Current Periodicals”, e.g.: “v.49(2012)-;Retains current volume in Current Periodicals.” Places where that statement is missing are suspect. Some are legit, but many absences for current subscriptions indicate errors. Category: Improbable.

Example 4: Holdings “to present” but not listed as Retains Current
Detection: Excel filter: CustomCoverageEnd = ‘Present’. Excel filter: CoverageStatement does not contain ‘Retain’. Results: some were correct, some had been withdrawn, and some should have had a Current Periodicals statement.

Example 4b: Vice Versa
“Retains current” but end date does not contain ‘Present’. Found four erroneous records in total: errors in the catalog and errors in EBSCO ingest.
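
Examples 4 and 4b are two halves of one consistency rule, so a script can report both in a single pass. This is a minimal sketch assuming rows keyed by the FTF column names. The case-insensitive match on "retain", the label strings, and the sample rows are choices made here for illustration.

```python
def classify_retains_current(row):
    """Return a label for rows where 'Present' and the Retains note disagree, else None."""
    end_is_present = "Present" in row.get("CustomCoverageEnd", "")
    says_retains = "retain" in row.get("CoverageStatement", "").lower()
    if end_is_present and not says_retains:
        return "open-ended, no Retains note"    # Example 4
    if says_retains and not end_is_present:
        return "Retains note, no open end"      # Example 4b
    return None

# Hypothetical sample rows.
rows = [
    {"Title": "Journal H", "CustomCoverageEnd": "Present",
     "CoverageStatement": "v.49(2012)-;Retains current volume in Current Periodicals."},
    {"Title": "Journal I", "CustomCoverageEnd": "Present",
     "CoverageStatement": "v.12(1975)-"},
    {"Title": "Journal J", "CustomCoverageEnd": "12/31/2005",
     "CoverageStatement": "v.1(1990)-v.16(2005);Retains current volume in Current Periodicals."},
]

for row in rows:
    label = classify_retains_current(row)
    if label:
        print(row["Title"], "->", label)
```

As the slides note, a flagged row is only a suspect: some open-ended titles legitimately lack the note, so each hit still needs review.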

Example 5: Special Collections to ‘Present’
We have very few titles in Storage or Special Collections with current subscriptions. There were 55 questionable titles. Most: the catalog record was out of date. Some: sloppy data ingest (e.g. a single volume or issue was recorded as the beginning of a series, so that n.10(1938) became n.10(1938)-). Category: Improbable.

Example 5: Special Collections to ‘Present’
Detection: Limit holdings to PackageName = “THOMAS RARE”, or one of several other special collections locations. Limit to CustomCoverageEnd contains “Present”.
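
If the FTF export is saved as CSV, the two limits can be applied while reading the file. The sketch below uses Python's standard csv module on an inline sample so that it runs as-is. "THOMAS RARE" comes from the slide; "STORAGE" and "MAIN PERIODICALS", the titles, and the column order are hypothetical.

```python
import csv
import io

# "THOMAS RARE" is from the slide; "STORAGE" is a hypothetical second package name.
SPECIAL_PACKAGES = {"THOMAS RARE", "STORAGE"}

# Hypothetical export contents; a real script would open the exported file instead.
EXPORT = """Title,PackageName,CustomCoverageEnd,CoverageStatement
Rare Bulletin,THOMAS RARE,Present,n.10(1938)-
Rare Annual,THOMAS RARE,12/31/1938,n.10(1938)
Daily Paper,MAIN PERIODICALS,Present,v.49(2012)-
"""

def special_to_present(csv_text):
    """Return rows in a special-collections package whose end date contains 'Present'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader
            if row["PackageName"] in SPECIAL_PACKAGES
            and "Present" in row["CustomCoverageEnd"]]

for row in special_to_present(EXPORT):
    print(row["Title"], row["CoverageStatement"])
```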

Example 6: Impossible Date Ranges
Items with non-sequential holdings:
Lib. Has: n.16(1985),n.22(1987),n.24(1988),n.56(1964)-;
Lib. Has: v.21(1896)-v.22(1987),v.28(1900)-v.86(1929)
Detection: Did not find a good way to do this! Fixed them as we found them. Category: Impossible.
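
One possible heuristic for this case, offered as a sketch and not something the audit used: pull the parenthesized four-digit years out of each holdings statement in order and flag any statement where a year is earlier than the one before it. It assumes years always appear as four digits right after an opening parenthesis, as in the two statements above. The function names are hypothetical.

```python
import re

# Assumption: years are written as four digits immediately after "(".
YEAR_RE = re.compile(r"\((\d{4})")

def years_in(statement):
    """Return the parenthesized years in a holdings statement, in order of appearance."""
    return [int(year) for year in YEAR_RE.findall(statement)]

def is_non_sequential(statement):
    """True if any year in the statement is earlier than the year preceding it."""
    years = years_in(statement)
    return any(later < earlier for earlier, later in zip(years, years[1:]))

statements = [
    "n.16(1985),n.22(1987),n.24(1988),n.56(1964)-;",
    "v.21(1896)-v.22(1987),v.28(1900)-v.86(1929)",
    "v.1(1960)-v.4:1(1971)",
]

for statement in statements:
    print(is_non_sequential(statement), statement)
```

This catches both problem statements from the slide and passes the well-formed one. It would miss an error whose years still run in order, and it ignores dates that are not in parentheses, so it narrows the search rather than replacing manual checking.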

Example 7: Old News
We don’t have a science library anymore, but some records still point to it. Solution: Create List of Bib Records WHERE CHECKIN has ‘sci’. Category: Impossible.

Example 8: LibHas vs. CoverageStmt
What if we just look for basic textual disagreement? The LibHas statement (catalog) is textually different from the Coverage Statement in EBSCO. Category: Disagreement.

Example 8: LibHas vs. CoverageStmt
For each record, compare the catalog record with EBSCO’s URL for the item. Catalog: b10242582. URL: http://ezra.wittenberg.edu/record=b1024258~S0. Create List in the catalog and export Record #, Title, and LibHas. Export titles from EBSCO, including the URL.

Example 8: LibHas vs. CoverageStmt
Match based on record # / URL and compare holdings statements. I did this with a PHP script and a MySQL database, comparing strings. You could try Excel, with something like =INDEX('ebsco'!$V:$V,MATCH(B2,'ebsco'!$E:$E)), but I had trouble getting this to work. (MATCH without a third argument does an approximate match that expects sorted data; a third argument of 0 requests an exact match.) In the standard EBSCO export format, $E:$E is the URL column and $V:$V is the Coverage Statement. In this example, B2 contains a catalog URL to match on.

Example 8: LibHas vs. CoverageStmt
Results of this approach: All records in the main periodical collection: n = 2029. LibHas != CoverageStatement: 498. After eliminating blank CoverageStatement: 276. After controlling for varied spacing and quotation marks: 219, of which 156 were newly introduced by a weeding project and 63 were other errors. Limitation: only works where CoverageStatement was defined.
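
The matching and comparison in Example 8 was done with PHP and MySQL; the sketch below shows the same idea in Python and is not a reconstruction of that script. It assumes both exports are already loaded as lists of dictionaries, matches on the 7-digit bib key, skips blank CoverageStatement values, and ignores differences in whitespace and quotation marks. The dictionary keys, the function names, and all sample data except the first record number and URL are hypothetical.

```python
import re

BIB_RE = re.compile(r"b\d{7}")  # assumption: "b" + 7 digits identifies a record
QUOTES = str.maketrans({"“": '"', "”": '"', "‘": "'", "’": "'"})

def bib_key(text):
    """Return the 7-digit bib record key found in text, or None if absent."""
    match = BIB_RE.search(text)
    return match.group(0) if match else None

def normalize(statement):
    """Drop whitespace and wrapping quotation marks so cosmetic differences are ignored."""
    text = statement.translate(QUOTES)   # curly quotes -> straight quotes
    text = re.sub(r"\s+", "", text)      # remove all whitespace
    return text.strip("\"'")             # remove quotes wrapping the whole value

def statement_mismatches(catalog_rows, ebsco_rows):
    """Return (key, LibHas, CoverageStatement) for matched records that really differ."""
    lib_has = {bib_key(row["record"]): row["LibHas"] for row in catalog_rows}
    mismatches = []
    for row in ebsco_rows:
        key = bib_key(row["URL"])
        statement = row["CoverageStatement"]
        if key not in lib_has or not statement.strip():
            continue  # unmatched record, or blank (optional) CoverageStatement
        if normalize(lib_has[key]) != normalize(statement):
            mismatches.append((key, lib_has[key], statement))
    return mismatches

# Hypothetical sample data standing in for the two exports.
catalog_rows = [
    {"record": "b10242582", "LibHas": "v.1(1960)-v.4:1(1971)"},
    {"record": "b12625173", "LibHas": "v.49(2012)-"},
    {"record": "b11111118", "LibHas": "v.1(1990)-v.5(1994)"},
]
ebsco_rows = [
    {"URL": "http://ezra.wittenberg.edu/record=b1024258~S0",
     "CoverageStatement": '"v.1 (1960) - v.4:1 (1971)"'},
    {"URL": "http://ezra.wittenberg.edu/record=b1262517~S0",
     "CoverageStatement": "v.49(2012)-v.50(2013)"},
    {"URL": "http://ezra.wittenberg.edu/record=b1111111~S0",
     "CoverageStatement": ""},
]

for key, catalog_text, ebsco_text in statement_mismatches(catalog_rows, ebsco_rows):
    print(key, "| catalog:", catalog_text, "| EBSCO:", ebsco_text)
```

Normalizing before comparing mirrors the "control for varied spacing and quotation marks" step, so only substantive disagreements are left for review.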

Future directions
Improve staff workflows. Run periodic checks for data consistency. Explore further mechanisms for comparisons and tests.