Data Bricolage: Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS. Kristina Spurgin, E-Resources Cataloger, UNC-Chapel Hill.

Photo by dannybirchall
BRICOLAGE: “construction (as of a sculpture or a structure of ideas) achieved by using whatever comes to hand; also : something constructed in this way” – m-w.com

A map
- A bit of context: my institution and my role in it
- My favorite load table: gathering bib records
- Extended example of “data bricolage” for cleaning/enhancing bib records
- Script to verify full-text access to ebooks
- Script/program to summarize data exported from Millennium

Photo of Davis UNC by benuski
- University of North Carolina at Chapel Hill
- Large institution, ARL member
- 6,048,337 catalog results (not exactly what’s in our III backend, but gives an idea of scale)
- 3 administrative units
- +/- 30 branches and specialized collection locations
- >1060 item locations
- Part of Triangle Research Libraries Network, sharing:
  - Endeca OPAC
  - Physical storage space
  - Some MARC records
  - Some acquisitions
- 1 staff member with load table training
“It’s complicated…”

My official job – E-resources cataloger
- Managing & loading batches of MARC records for ebooks
- Individual cataloging of Web sites, online databases, and some ebooks
- Oversee maintenance of URLs in catalog records (new!)
- Extraction of our catalog data from Millennium for use in our Endeca OPAC

My official job – Tools of the data bricoleur

My unofficial job – “Fixer” (Image from QuotesPics.com)
- So, HathiTrust requires very specific info in their metadata for an ingest…
- Oops, a lot of titles in that big ebook package we just cancelled were on EReserve. How can we identify them?
- This branch library has an old Access database of items they want to put in the catalog…
- We need a way to easily work with payment data outside Millennium for a serials review!

MY FAVORITE LOAD TABLE gathering, cleaning, & enhancing records

BACKGROUND: a pre-existing workflow Spreadsheet from Internet Archive Scribe manager: Spreadsheet >> MarcEdit Delimited Text Translator:

BACKGROUND: A pre-existing workflow
Compiled to .mrc and loaded with locally-created load table that:
- Matches on bnum (907) for overlay
- Protects ALL fields in existing record (LDR, Cat Date, etc… everything)
- Inserts any fields from the new stub record (will create dupe fields)
- Creates new item

Why am I telling you about this old thing?
“How can I get these back into a review file?” [list of bnums]
“You can’t, really.” (me)

What if I loaded stub records containing nothing but the bnum? (me)
- On load, check “Use Review Files” box
- It works!
- We toggle item creation in the load table as needed (trivial tweak)
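As a rough illustration of what a "bnum-only" stub file could look like, here is a minimal Ruby sketch that writes records in MarcEdit's mnemonic (.mrk) text format, which MarcEdit can then compile to .mrc. The talk builds its stubs with the MarcEdit Delimited Text Translator, not with a script; the function names, the placeholder leader, and the `.b12345678` form of the bnum are all assumptions made for the example.

```ruby
# Build one stub record in MarcEdit mnemonic (.mrk) format.
# The record holds only a placeholder leader and a 907 carrying the bnum.
# ASSUMPTION: the load table matches on 907 $a; pass the bnum in whatever
# form your own load table expects.
def stub_record(bnum)
  [
    '=LDR  00000nam  2200000   4500',   # placeholder leader (24 characters)
    "=907  \\\\$a#{bnum}"                # \\ = two blank indicators in .mrk
  ].join("\n")
end

# Join stub records into one .mrk file body, one blank line between records.
def stub_file(bnums)
  bnums.map { |b| stub_record(b) }.join("\n\n") + "\n"
end

puts stub_file(['.b12345678', '.b23456789'])
```

Loading a file like this with the "Use Review Files" box checked is what puts the matched records back into a review file.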

THE SAVINE SAGA cleaning & maintaining catalog records

Savine Digital Library home:

Local Millennium Record OCLC Master Record

+3600 local records OCLC records became

list of bnums not associated with new URLs new URLs manually identified for each bnum initial list of catalog bnums for Savine records (but for print only… oops) new URL for each bnum

Local Record Strategy
- Create review file of all bib records with 856 matching old db URL
- Export data from Millennium/open in Excel… (table name = mill)
- New worksheet w/new DB info (table name = contdm)
Hmm… these bnums won’t match…

Local Record Strategy
- Add 8-character bnum to mill table
- Copy entire bnum8 column
- “Paste special > Values” back in the same place

Local Record Strategy
- VLOOKUP formula to grab new URLs from contdm table
[Screenshots: contdm table; mill table (some columns hidden)]
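For readers more comfortable in a script than in Excel, the bnum8 and VLOOKUP steps amount to a keyed lookup. The sketch below is not the workflow from the talk (which is done in Excel); it assumes the mismatch is extra trailing characters on the Millennium bnum, which the slides do not spell out, and all names and sample values are hypothetical.

```ruby
# Trim an exported bnum to its first 8 characters so it can be matched
# against the shorter bnums in the other table.
# ASSUMPTION: the extra characters are at the end of the Millennium bnum.
def bnum8(bnum)
  bnum.strip[0, 8]
end

# Equivalent of the VLOOKUP step: for each row of the "mill" table, look up
# the new URL in the "contdm" table (a Hash keyed on 8-character bnum).
# Rows with no match get nil, where Excel would show #N/A.
def add_new_urls(mill_rows, contdm)
  mill_rows.map do |row|
    key = bnum8(row[:bnum])
    row.merge(bnum8: key, new_url: contdm[key])
  end
end

contdm = { 'b1234567' => 'http://example.org/new/1' }   # hypothetical data
mill   = [{ bnum: 'b12345678' }, { bnum: 'b7654321x' }]

add_new_urls(mill, contdm).each do |row|
  puts [row[:bnum], row[:bnum8], row[:new_url] || 'NO MATCH'].join("\t")
end
```

The rows that come back without a match are the ones the next slides handle by finding a pattern and building a second lookup table.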

Local Record Strategy
- Identify pattern in missing new URLs
- Create new table (name = urlmatch)

Local Record Strategy - In mill table, clear out NEW URL column

Local Record Strategy - In mill table, repopulate NEW URL with VLOOKUP from urlmatch

Local Record Strategy - Use MarcEdit Delimited Text Translator to create “stub records”

Local Record Strategy
- Global update on review file of Savine records
  - Delete all old 856s containing |uhttp://rbr.lib.unc.edu
- Load stub records with my favorite load table
  - New URLs added

OCLC Record Strategy
- Batch search OCLC#s into local OCLC save file
- Validate/correct as necessary
- Use MarcEdit/OCLC plugin to open local save file in MarcEdit
- Copy all to new MarcEdit file
- Delete old URLs, save
- Merge in new URLs from “stub” record file created w/OCLC# and new URLs
- Copy merged records back into file created by plugin
- Save records from plugin MarcEdit file back to local OCLC save file
- Batch replace records in OCLC Connexion

Other bricolage projects using my favorite load table
- SpringerLink ebook records
  - 950s (subject module) were deleted from many records
  - In SpringerLink title list: DOI URL, subject module
  - In Millennium: bnum, DOI URL
  - Stub records with bnum (907) and new …
- Alexander Street Press (ASP) records released without OCLC nums
  - From ASP: ASP record ID, OCLC num
  - From Mill: bnum, ASP record ID
  - Stub records with bnum (907) and new 035

BEYOND THE URL CHECKER A script to verify full-text access to ebooks

Access checker: The problem addressed
Ideally, vendors would provide us with:
- MARC records for ALL items to which we have full access
- NO MARC record for items to which we have restricted access
Reality is not ideal.
Example: SpringerLink e-books new MARC records a month

URL CHECKER != ACCESS CHECKER

Access checker: Script use: input
Data sources:
- Extract from MARC file pre-load using MarcEdit
- Export from Millennium Create Lists post-load
URL must be final column
- One URL per row
- Any number of columns can be included before the URL
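The input rule (URL last, anything before it) is simple to honor in code. The following Ruby sketch shows one way to read such a file; it is not taken from the actual access checker, and the tab delimiter and function names are assumptions.

```ruby
# Split one delimited input line into [leading columns, url].
# The URL is always taken from the final column, as the slide requires.
# ASSUMPTION: columns are tab-delimited; pass another delimiter if not.
def split_input_row(line, delimiter = "\t")
  cells = line.chomp.split(delimiter, -1)   # -1 keeps empty trailing cells
  url = cells.pop.to_s.strip
  [cells, url]
end

# Read a whole input text, skipping blank lines.
def read_input(text, delimiter = "\t")
  text.each_line
      .reject { |line| line.strip.empty? }
      .map { |line| split_input_row(line, delimiter) }
end

sample = "b1234567\tTitle A\thttp://example.org/a\n" \
         "b7654321\tTitle B\thttp://example.org/b\n"
read_input(sample).each { |columns, url| puts "#{url}  <- #{columns.inspect}" }
```

Keeping the leading columns untouched means whatever identifiers you exported (bnum, title, 001) travel along with each URL to the output.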

Access checker: Script use: running the script
In Windows PowerShell:

Access checker: Script use: output

Access checker: Other info
- Looks at the “landing page” for each URL – does not download or harvest any full text content
- Written in JRuby
- Open source – Code available from GitHub
- Instructions for use also at GitHub – I tried to write them for people not familiar with using scripts
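To make the "landing page" idea concrete, here is a minimal Ruby sketch of the general technique: fetch the page a catalog URL leads to and look for text that signals full or restricted access. This is not the code from the GitHub project. The text patterns below are invented for illustration, real platforms each need their own rules, and `Net::HTTP.get` does not follow redirects, so a real checker would need more than this.

```ruby
require 'net/http'
require 'uri'

# HYPOTHETICAL patterns: these strings are made up for the example and
# would have to be replaced by wording observed on each vendor platform.
RESTRICTED_PATTERNS  = [/purchase this (book|chapter)/i, /access denied/i].freeze
FULL_ACCESS_PATTERNS = [/download pdf/i, /read online/i].freeze

# Classify landing-page HTML as :restricted, :full_access, or :unknown.
# Restricted markers are checked first so that a page showing both kinds
# of text is flagged for human review rather than passed as fine.
def classify_landing_page(html)
  return :restricted  if RESTRICTED_PATTERNS.any?  { |re| re.match?(html) }
  return :full_access if FULL_ACCESS_PATTERNS.any? { |re| re.match?(html) }
  :unknown
end

# Fetch only the landing page body; no full-text content is requested.
def fetch_landing_page(url)
  Net::HTTP.get(URI(url))
end

puts classify_landing_page('<a href="#">Download PDF</a>')
puts classify_landing_page('<p>Purchase this chapter for $29.95</p>')
```

This is also why a URL checker is not an access checker: both sample pages above would return a successful HTTP response, but only one of them gives the reader the book.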

DEALING WITH PAYMENT DATA A script to summarize PAID data from order records

Payment data processor: The problem addressed
Millennium will export payment data from Create Lists of order records, BUT the format of the exported data makes it virtually unusable.
- 9 payment field columns, repeated
- One row in the output below had data all the way to column ST!

Payment data processor: The solution
Script outputs either:
- One payment per line
- Payments summarized by fiscal year
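Both outputs come down to cutting each wide row into groups of nine cells. The Ruby sketch below illustrates that reshaping under several assumptions that are not in the slides: the nine field names, the MM-DD-YYYY date format, the "$1,234.56" amount format, and a fiscal year that starts on July 1 are all hypothetical, and this is not the code from the GitHub project.

```ruby
require 'date'

# HYPOTHETICAL names for the nine repeated payment columns.
PAYMENT_FIELDS = %i[paid_date invoice_date invoice_num amount voucher
                    copies sub_from sub_to note].freeze

# Split one exported row into [leading columns, array of payment hashes].
# leading_count = number of non-payment columns at the start of the row.
def payments_from_row(cells, leading_count)
  leading = cells.first(leading_count)
  payments = cells.drop(leading_count)
                  .each_slice(PAYMENT_FIELDS.size)
                  .map { |group| PAYMENT_FIELDS.zip(group).to_h }
  # Drop groups that are entirely empty (padding at the end of short rows).
  payments.reject! { |p| p.values.all? { |v| v.nil? || v.strip.empty? } }
  [leading, payments]
end

# ASSUMPTION: fiscal year runs July 1 to June 30, named for its ending year.
def fiscal_year(date)
  date.month >= 7 ? date.year + 1 : date.year
end

# Sum payments per fiscal year, in cents, to avoid floating-point drift.
def totals_by_fiscal_year(payments)
  payments.each_with_object(Hash.new(0)) do |p, totals|
    date  = Date.strptime(p[:paid_date], '%m-%d-%Y')
    cents = (p[:amount].delete('$,').to_r * 100).to_i
    totals[fiscal_year(date)] += cents
  end
end

row = ['o1234567', 'Journal of Examples',
       '07-15-2011', '07-01-2011', 'INV1', '$100.00', 'V1', '1', '', '', '',
       '06-30-2012', '06-01-2012', 'INV2', '$19.99', 'V2', '1', '', '', '']
leading, payments = payments_from_row(row, 2)
payments.each { |p| puts (leading + p.values).join("\t") }   # one payment per line
totals_by_fiscal_year(payments).each { |fy, c| puts format('FY%d  %.2f', fy, c / 100.0) }
```

A row that runs out to column ST is just the same nine-cell pattern repeated dozens of times, so the slicing approach handles it without special cases.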

Payment data processor: Script use: input
Exported .txt file from Millennium Create Lists

Payment data processor: Script use: running the script
You can run the Ruby (.rb) script from the command line, BUT everyone using this at UNC just double-clicks on the .exe

Payment data processor: Script use: output

Payment data processor: Other info
- Written in Ruby
- Open source – Code available from GitHub
- Instructions for use also at GitHub – I tried to write them for people not familiar with using scripts

Questions?
Photo by theunquietlibrarian on Flickr