Terry Reese reeset@gmail.com Build your toolbox: In depth data manipulation with MarcEdit to prepare your data for the ANBD Terry Reese reeset@gmail.com.

Slides:



Advertisements
Similar presentations
E-books at CUNY LACUNY Cataloguing Roundtable November 5, 2009.
Advertisements

OCLC Online Computer Library Center OCLC Cataloging Update Connexion client 1.50 & more OCLC CJK Users Group Annual Meeting San Francisco, CA April 8,
The creation of "Yaolan.com" A Site for Pre-natal and Parenting Education in Chinese by James Caldwell DAE Interactive Marketing a Web Connection Company.
MarcEdit: Doing more, but faster
Getting Started with MarcEdit
With Microsoft Access 2010© 2011 Pearson Education, Inc. Publishing as Prentice Hall1 PowerPoint Presentation to Accompany GO! with Microsoft ® Access.
Cataloging: Millennium Silver and Beyond Claudia Conrad Product Manager, Cataloging ALA Annual 2004.
Catalog: Batch delete old Patron Records How to conduct global/batch updates to records – patron Adding Faculty and Patron/Student Records Manually Standardizing.
BIBFRAME Projects at the
T ERRY R EESE ’ S M ARC E DIT : P RACTICAL U SES Jenn Nolte Middlesex Community College 25 April 2008.
Batch-conversion of Non-standard Multiscript Records by XSLT Lucas Mak Metadata and Catalog Librarian Michigan State University Catalog Management Interest.
MS Access: Database Concepts Instructor: Vicki Weidler.
Sage Library Consortium Cataloging-in-Publication MARC record conversion.
IPUMS to IHSN: Leveraging structured metadata for discovering multi-national census and survey data Wendy L. Thomas 4 th Conference of the European Survey.
1 ADVANCED MICROSOFT WORD Lesson 15 – Creating Forms and Working with Web Documents Microsoft Office 2003: Advanced.
Microsoft Office Word 2013 Expert Microsoft Office Word 2013 Expert Courseware # 3251 Lesson 5: Setting Up Global Accessibility.
Advanced File Processing
Global Update with Confidence Mary M. Strouse Innovative Users Group May 19, 2009.
MarcEdit Basics and Beyond By Mary Aycock Head, Catalog Department Missouri University of Science and Technology MOBIUS 2012 Conference.
XP New Perspectives on Microsoft Access 2002 Tutorial 51 Microsoft Access 2002 Tutorial 5 – Enhancing a Table’s Design, and Creating Advanced Queries and.
JUNE 13-15, 2011  LANCASTER, PENNSYLVANIA Cataloging with MarcEdit Doreen Herold Lehigh University Symphony Sharon Scott Cumberland County Library System.
Pharmacy Set up Bedside Medication Verification. Pharmacy Toolbox Parameters.
Updated :02 Hong Kong University of Science & Technology Library XML Name Access Control Repository at the Hong Kong University of Science.
1/14 ITApplications XML Module Session 2: Using and Creating XML Documents.
1 The following presentation is from the Oracle Webcast “What’s New in P6 EPPM Release 8.1.” As a partner, you may not use the Oracle Power Point template,
Lucas Mak and Dao Rong Gong Michigan State University Millennium and XML: Repurposing and Customizing Metadata May , 2009.
OCLC Online Computer Library Center Kathy Kie December 2007 OCLC Cataloging & Metadata Services an introduction.
INSERT BOOK COVER 1Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall. Exploring Microsoft Office Excel 2010 by Robert Grauer, Keith.
Filing and Word Breaking Procedures. 2 Session Agenda Pre-14.x tab_word_breaking table Structure Procedures Special remarks tab_filing table Structure.
Advanced File Processing. 2 Objectives Use the pipe operator to redirect the output of one command to another command Use the grep command to search for.
Mind Your Metadata Geri Miller. Metadata in ArcGIS ArcGIS metadata goals Editing metadata Setting your metadata style Leveraging metadata in ArcGIS Importing.
Demystifying Batchload Analysis Yael Mandelstam Fordham Law Library AALL 2009 Annual Meeting.
Chapter Five Advanced File Processing. 2 Lesson A Selecting, Manipulating, and Formatting Information.
New Millennium Enhancements SEE HANDOUT. Release 2002 Improved record editor Easier to navigate to NEXT and PREVIOUS records (Ctrl [ and Ctrl ]) More.
Perl Day 4. Fuzzy Matches We know about eq and ne, but they only match things exactly We know about eq and ne, but they only match things exactly –Sometimes.
Reading Flash. Training target: Read the following reading materials and use the reading skills mentioned in the passages above. You may also choose some.
Case study : creating a usable MARC file from a spreadsheet Thomas Meehan Head of Current Cataloguing UCL Library Services CILIP CIG Metadata.
Chapter 3 Automating Your Work. It is frustrating when you have to type the same passage of text repeatedly. For example your name and address. Word includes.
Test Automation For Web-Based Applications Portnov Computer School Presenter: Ellie Skobel.
The Catalog of the Future: Integrating Electronic Resources By Dana M. Caudle Cataloging Librarian Auburn University Libraries
Strengthening Hybrid RDA/AACR2 Bibliographic Records La Donna Riddle Weber – November 2015.
Copyright © 2013 Pearson Education, Inc. Publishing as Prentice Hall. 1 Skills for Success with Office 2010 Vol. 1, 2e PowerPoint Lecture to Accompany.
Automating Data Normalization and Clean-up.
SILO File Upload & Feedback System By Marie Harms State Library of Iowa August 18 & 19, 2010.
© 2015 Ex Libris | Confidential & Proprietary Yoel Kortick Senior Librarian Cataloging introductory flow.
Product Training Program
Lesson 5-Exploring Utilities
Introduction to MarcEdit
MARCEdit TNUIG 2016.
MARC extensions Yoel Kortick | Senior Librarian
Microsoft Word 2010.
Bulk Editing Catalogue Records
Metadata Editor Introduction
Core LIMS Training: Advanced Administration
Connection changes in Scribe Starting with version 7.9.0
Be Your Own data Mechanic
Cataloging introductory flow
Workshop on XML-Based Library Applications 5
ALEPH Version 22 Beginning Cataloging
MODULE 7 Microsoft Access 2010
Assistant lecturer Nisreen A. Jabr
Case Study: Fixing MARC data with MarcEdit and OpenRefine
Guide To UNIX Using Linux Third Edition
Session 2. Automating Legacy Data Cleanup Projects June 8, 2016
Vendor Records What to do?
E-Resources in Prospector
Doug Williams, Campbell County Public Library, September 22, 2017
Some Options for Non-MARC Descriptive Metadata
Designing and Using Normalization Rules
Skills for Success with Microsoft® Office 2010
Presentation transcript:

Terry Reese reeset@gmail.com Build your toolbox: In depth data manipulation with MarcEdit to prepare your data for the ANBD Terry Reese reeset@gmail.com

Data Files PowerPoint: http://marcedit.reeset.net/workshops/aussie/session2/aussie_2.pptx Data: http://marcedit.reeset.net/workshops/aussie/session2/data.zip

Session Themes Working with MarcEdit’s Regular Expression Language Regular Expression Samples Isolating and Manipulating Data How do I find data missing a particular field or subfield? Removing invalid control characters? Looking at MarcEdit’s Global Editing Functions Edit Shortcuts Working with XML Data How do they work? How can I change them?

MarcEdit Regular Expression Support Functions that presently support regular expressions Delete Field Edit Field Copy Field Swap Field Build New Field Validation Extract/Delete Selected Records

Microsoft’s Regular Expression language Concepts: Character escapes Anchors Character classes Grouping Qualifiers Substitutions Let’s open Regular Expression Language - Quick Reference.html or https://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx

How we use Regular Expressions in MarcEdit Your most important parts of the regular expression language are: Character escapes: \d\r\n\$\x## Character Classes [] & [^] Grouping Elements () Anchors: ^$ Quantifiers: *?+{#} Substitutions: $#

Examples Looking at regex_example.mrk using the replace function: Add a period to the 500 if it is missing Update the 300 to reflect electronic information Split the 856 into two fields, breaking on the $u.

Examples 1 Explanation: Add a period to the 500 if it is missing Find What: (=500 ..)(.*[^.]$) Replace With: $1$2. Explanation: (=500 ..) Searches for the 500 field. We leave two blanks because there are always 2 blank characters as part of the mnemonic format. The two periods which stand for any character. If we want to search for exact indicators, you’d place those values rather than the periods. (.*[^.]$) Take any characters, and match on a field where the last character in the field isn’t a period.

Examples 2 Add online resource information to the 300 field Example: Change: 300  \\$a 32 p. To: 300  \\$a1 online resource (32 p.) Explanation: (=500 ..) Searches for the 500 field. We leave two blanks because there are always 2 blank characters as part of the mnemonic format. The two periods which stand for any character. If we want to search for exact indicators, you’d place those values rather than the periods. (?<one>\$a)([^$]*) Capture the $a and then all data in the subfield until you get to the next subfield (if there is one)

Example 3 Split the 856 into two fields, breaking on the $u. Find What: (=856.{4})(\$u.*[^$])(\$u.*) (=856.{4}) Matches the 856 field (\$u.*[^$]) Match $u, but stop at the end of the subfield (\$u.*) Match reminder of field Replace With: $1$2\n=856 41$3

lcase/ucase MarcEdit’s regular expression engine includes to extension functions for dealing with case switching of characters. lcase & ucase Usage: (=450.{4})(\$a.)(.*) $1$2lcase($3) Example: Find the 500 with all upper case characters and convert the case of all values but the first letter in the sentence to lower case.

Example (lcase) Find the 500 with all upper case characters and convert the case of all values but the first letter in the sentence to lower case. Find What: (=500.{4})(\$a.)([A-Z .]*) Replace With: $1$2lcase($3)

Multi-Field Replacements By default, MarcEdit handles one field at a time when doing regular expressions. However, when you need to do evaluations against multiple fields, you can by adding /m to the end of your replacement in the Replace Function in the MarcEditor This is a special function added to the MarcEdit regular expression engine Lcase and ucase

Example Using regex_example.mrk Changing video disc to blue-ray in the 300 if the 538 is marked as blue-ray

Multi-Line Example

Isolating and Manipulating Data Question: Missing Fields 260 or 264 Publication Details - V1.mrk is designed to show how to remove records without either one of the Libraries Australia required data element fields 260 or 264. This is a tricky one because some records have only a field 260, only a 264, both, or none. Only records without both need to be removed. Two Answers: Extract Select Records with the field search. Check retain options and invert selections at the end Use the RDA Helper, target only the 260, and convert all 260s. Then you just have to find those missing a 264 (much easier)

Isolating and Manipulating Data Question: Move Field 001 Control Number to Field 035 System Control Number - V1.mrk is designed to show how to transfer the member’s local system number from field 001 (reserved for Libraries Australia numbers) to field 035. A second field 035 can also contain an OCLC record number. Answer: Lots of ways to do this – easiest – use the copy field data tool if you don’t need to make any edits Build New Field Tool if you need to make edits to the data before creating the new field and then Delete Field to remove the 001 if necessary.

Isolating and Manipulating Data Question: Invalid Control Code - Escape Code 07 Bell - V1.mrc and the associated log file Invalid Control Code - Escape Code 07 Bell - V1 - Warning Log File.txt is designed to show how to identify device control codes in MARC records. The text file should be viewed with a fixed-width font for best results. Hex code 07 Bell is in field 520. The log file does not show a character, because there is none. This is slightly harder because MarcEdit doesn’t know that these are not characters that you want. It also assumes you are working in MARC8, because these would be almost impossible to find in the Unicode data. Use the Find all to determine if any exist Use Extract Selected; using a regular expression and searching all fields

Isolating and Manipulating Data Question: How do I deal with line breaks in fields copied from other data (example, data in the 520) When the Data is in MARC When the Data is copied into MarcEdit’s mnemonic format

MARC Conversions This is really the heart of MarcEdit All utilities and functions interact with the MARCEngine in some fashion.

MarcEdit: crosswalking design MarcEdit model: So long as a schema has been mapped to MARCXML, any metadata combination could be utilized. This means that no more than two transformations will ever take place. Example: MODS  MARCXML  EAD

MarcEdit: crosswalking design MarcEdit Crosswalk model Pro Crosswalks need not be directly related to each other Requires crosswalker to know specific knowledge of only one schema Con each known crosswalk must be mapped to MARCXML.

MarcEdit Crosswalking model MARC21XML EAD FGDC MODS MARC Dublin Core

MarcEdit: Crosswalks for everyone

MarcEdit: Crosswalks for everyone What’s MarcEdit doing? Facilitates the crosswalk by: Performing character translations (MARC8-UTF8) Facilitates interaction between binary and XML formats.

MarcEdit Plugins

Working With UNIMARC Data Building off the XML Platform, Unimarc processing was added leveraging an XSLT and some coding to enable processing of data of any file size

MarcEdit and SQL Data MarcEdit includes and SQL Explorer Can be integrated into the MarcEditor Support SQLite and MYSQL Available in All version of MarcEdit Designed for large file sets (I’ve worked with up to 1 TB)

Questions