Fearless Transformation: Applying OpenRefine to Digital Collections

Presentation transcript:

Fearless Transformation: Applying OpenRefine to Digital Collections
Kara Long, Catalog and Metadata Librarian, Baylor University Libraries
@thekaralong

Agenda:
The Spencer project
Re-examining the workflow
Implementing OpenRefine
Other uses for OpenRefine in digital collections

the Beginning
Project started in 1999
TexTreasures Grant to digitize 1,000 pieces
Descriptive metadata loaded into ILS
Static HTML was programmatically generated and placed on server

Adding Partners
That completed the phase
2003: Moved to CDM
Started back up in CDM: 30/month; an obvious metadata bottleneck (77 years to complete the project)
2008: Flourish outsourced music cataloging

the Workflow
To set up the outsourced workflow:
Catalog: send you an XML
300-line Perl script
TSV
This many parts: BUGGY
2013: Personnel change (Kara)
Do we continue...XSLT...Perl?
OpenRefine at a TCDL post-conference workshop
OpenRefine has taken us from this to...

the Workflow Revised
Direct to TSV, CDM ready
Transformations are carried out by the metadata (MD) group, not programmers.
Efficient, fast.
Hand it over to Kara to discuss the OpenRefine process in more detail.

MARC

of Data Loss!

OpenRefine
Interactive Data Transformation tool (IDT)
Open source
Runs locally
Interactive like a spreadsheet, but more powerful
Programmable like a database, but more exploratory

OpenRefine: Creating a new project
Rename and reorder MARC fields
Join values
Split values
Re-format dates
Remove unnecessary punctuation, delimiters, etc.
Add new fields for the digital collection
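Two of the steps listed above, re-formatting dates and removing unnecessary punctuation, can be sketched outside OpenRefine as well. The Python below is only an illustration, not the transformations used in the project: the function names are hypothetical, the sample values are made up, and it assumes MARC-style dates such as "c1905." and ISBD-style trailing punctuation.

```python
import re

def strip_trailing_punctuation(value):
    """Remove trailing whitespace, slashes, colons, semicolons and commas.

    Periods are left alone because they may end an abbreviation.
    """
    return re.sub(r"[\s/:;,]+$", "", value)

def extract_year(value):
    """Return the first standalone four-digit number in the value, or ''."""
    match = re.search(r"(?<!\d)(\d{4})(?!\d)", value)
    return match.group(1) if match else ""

# Made-up sample values, for illustration only.
clean_title = strip_trailing_punctuation("Example waltz /")
clean_year = extract_year("c1905.")
```

Leaving periods in place is a deliberate choice here; whether that suits a given collection depends on the data.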

OpenRefine: Renaming columns
Columns are the primary units of interaction.
Column names must exactly match our CDM field names in order to upload the metadata.
MARC 100 → Composer
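For readers who want to see the renaming step as code, here is a minimal Python sketch applied to a tab-separated file. Only the 100 → Composer mapping comes from the slide; the other entries in FIELD_MAP, the function name and the sample row are assumptions made for illustration.

```python
import csv
import io

# Hypothetical mapping from MARC tags to CONTENTdm field names.
# Only "100" -> "Composer" appears in the slide; the rest are assumed.
FIELD_MAP = {"100": "Composer", "245": "Title", "260": "Publisher"}

def rename_columns(tsv_text, field_map):
    """Return TSV text whose header names are replaced according to field_map.

    Columns that are not in the mapping keep their original names.
    """
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header = [field_map.get(name, name) for name in rows[0]]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows[1:])
    return out.getvalue()

# Made-up sample record, for illustration only.
sample = "100\t245\nDoe, Jane\tExample waltz\n"
renamed = rename_columns(sample, FIELD_MAP)
```

Because the upload depends on exact name matches, keeping the mapping in one place makes it easier to check against the collection's field list.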

OpenRefine: Re-ordering columns
Once all the fields have been renamed, they can be re-ordered under the All columns menu.

OpenRefine: Joining values
Transform data with Google Refine Expression Language (GREL)
Joining the 245$a and 245$b to create the Title field
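In OpenRefine the join is written as a GREL expression on the column. The same logic is sketched below in Python so it can be read without OpenRefine open; the function name and sample values are hypothetical, and it assumes the two subfields arrive as separate strings that may be empty.

```python
def join_title(subfield_a, subfield_b):
    """Join 245$a and 245$b into one Title string, skipping empty parts."""
    parts = [p.strip() for p in (subfield_a, subfield_b) if p and p.strip()]
    return " ".join(parts)

# Made-up sample values, for illustration only.
title = join_title("Example waltz :", "for piano")
```

Skipping empty parts avoids a stray trailing space when a record has no 245$b.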

OpenRefine: Splitting values
The 246 must be split into two or three fields:
Alternative Title
First line of verse
First line of chorus
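How the 246 is split depends entirely on how it was cataloged, which the slides do not show. The Python sketch below assumes a purely hypothetical convention: values in one cell are separated by "|", verse and chorus lines carry the labels "First line of verse:" and "First line of chorus:", and anything unlabeled is an alternative title. Real data will probably need different rules.

```python
# Hypothetical labels; the actual cataloging convention is not given in the slides.
LABELS = {
    "first line of verse:": "First line of verse",
    "first line of chorus:": "First line of chorus",
}

def split_246(cell, separator="|"):
    """Split one 246 cell into the three target fields."""
    result = {
        "Alternative Title": "",
        "First line of verse": "",
        "First line of chorus": "",
    }
    for piece in cell.split(separator):
        piece = piece.strip()
        if not piece:
            continue
        for prefix, field in LABELS.items():
            # Compare the start of the piece case-insensitively.
            if piece[:len(prefix)].lower() == prefix:
                result[field] = piece[len(prefix):].strip()
                break
        else:
            # No label matched, so treat the piece as an alternative title.
            result["Alternative Title"] = piece
    return result

# Made-up sample value, for illustration only.
fields = split_246("Other title|First line of verse: Oh the moon|First line of chorus: Shine on")
```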

OpenRefine: Splitting values
Know your data!

OpenRefine: Extract and save operation history
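An extracted operation history is saved as JSON, so it can be inspected or documented with a few lines of code. The Python below is a sketch under assumptions: the sample entry follows the shape OpenRefine uses for a column rename as best I recall it, so compare it with a real export before relying on the key names, and the function name is hypothetical.

```python
import json

# Sample shaped like an extracted history; key names are assumed, not verified.
SAMPLE_HISTORY = """
[
  {
    "op": "core/column-rename",
    "description": "Rename column 100 to Composer",
    "oldColumnName": "100",
    "newColumnName": "Composer"
  }
]
"""

def summarize_history(history_json):
    """Return (op, description) pairs for each step in a saved history."""
    steps = json.loads(history_json)
    return [(step.get("op", ""), step.get("description", "")) for step in steps]

summary = summarize_history(SAMPLE_HISTORY)
```

A summary like this can be kept alongside the saved JSON as a readable record of what was done to each batch.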

OpenRefine: Identifying clean-up in existing CONTENTdm collections
Text faceting
Custom text facets
Identifying duplicates
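A text facet is essentially a count of each distinct value in a column, which is what makes inconsistent entries and duplicates stand out. The Python sketch below imitates that idea; it is not OpenRefine's implementation. The function names and sample values are hypothetical, and the duplicate check assumes that values differing only in case or surrounding whitespace should count as the same.

```python
from collections import Counter

def text_facet(values):
    """Count each distinct value, most frequent first, like a text facet."""
    return Counter(values).most_common()

def find_duplicates(values):
    """Return normalized values that occur more than once.

    Normalization here is trimming whitespace and lowercasing.
    """
    counts = Counter(v.strip().lower() for v in values)
    return sorted(v for v, n in counts.items() if n > 1)

# Made-up sample column, for illustration only.
composers = ["Doe, Jane", "Doe, Jane", "doe, jane ", "Roe, Richard"]
facet = text_facet(composers)
duplicates = find_duplicates(composers)
```

In this sample the exact-match facet shows "doe, jane " as a separate value, which is the kind of inconsistency a facet is meant to surface.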

Invaluable Resources
http://openrefine.org/
http://freeyourmetadata.org/
https://github.com/OpenRefine/OpenRefine/wiki/GREL-Functions
Verborgh, Ruben, and Max De Wilde. Using OpenRefine. Birmingham: Packt Publishing, 2013.
Van Hooland, Seth, and Ruben Verborgh. Linked Data for Libraries, Archives, and Museums: How to Clean, Link, and Publish Your Metadata. Chicago: Neal-Schuman, 2014.