Batch-Load Points Counter (MARCEdit project) Amelia C. VanGundy The University of Virginia’s College at Wise Virginia SirsiDynix Library Users Group Meeting Nov. 14, 2012
John Cook Wyllie Library Ebook titles in OPAC & Ebook packages on web in finding aids Rate of e-book acquisition increased netLibrary – 3k titles per year EBSCOhost Ebook Academic Collection – 65k titles initial load – 5-10k titles additional every quarter 2
Batch Loading Problems Existing procedures were difficult to follow Procedures were inconsistent – especially for different vendors Didn't take advantage of MARCEdit Tools 949 holdings field now includes $a class# – previously, files loaded with AUTO “call#” 3
Solution? Wish list? Determine quality of MARC records – OCLC files vs. other vendor files Determine editing priorities – required (001/949), recommended, optional Learn to construct Regular Expression Strings – Batch Editing Tools & Find/Replace Streamlined format – needed both an outline & more detailed info Make available on-line/web-page 4
MARCEdit proficiency Beginner Advanced Beginner – Uses MARCEditor Tools window (Add/Delete field, Edit Subfield Data, Sort by... ) – Can apply Regular Expression Strings Intermediate – Uses MARC Tools wizard (Extract Selected Records, MARCSplit) – Can construct Regular Expressions Expert 5
Batch-Load Points Counter (BLPC) people.uvawise.edu/acv6d/ 6
Batch-Load Points Counter (BLPC) Webpage & Project link people.uvawise.edu/acv6d/ 1. Introduction – project concept & desired outcomes 2. Checklist # – outlines the batch-load procedures & steps – points counter: “what to do” & “when to stop” 3. Processing Guidelines # – procedures & how-tos & copy/paste info processing 7
BLPC Introduction & Outcomes Validation – determine integrity of the file Processing – determine quality of the records Statistics – track vendor pkgs, record counts, 001 prefixes Points – max. points = 150 (2.5 hours) STOP & contact vendor (request corrected file) 8
BLPC CheckList w/Time estimates Step 1 & 2: Preparation & validation – number of records in file – integrity of file – valid URL links Step 3-4: Review & processing – quality of records – lists all processing/edits possible Step 5: 949 holdings Print on one page (2 p. per sheet / front&back) 9
BLPC Processing Guidelines (Procedures) Gives details for CheckList – Steps 1-2, Steps 3-4, Step 5 Gives the regular expression strings (copy/paste) – Finding/Replacing/Deleting – MARCEditor Tools & MARCEdit Tools Always use along with Checklist – includes information to process every field, BUT – not every field needs processing Do not print out 10
BLPC Step 1: Preparation & Reports MARC Validator – Identify Invalid Records – Validate Record (copy/paste into text file) Material Type Report Field Count – verify vendor count against MARCEditor count (LDR/000) – count early / count often Deduplicate (See Addt’l Instruct.) 11
Reports/ MARC Validator: Identify Invalid Records 12
Reports/ MARC Validator: Validate Records 13
Reports / Material Type 14
BLPC Step 2: Verify Field Counts Reports/ FieldCount for error checking – first field listed is 000 (corresponds to =LDR) – last field listed is “numeric” – 245 count Reports/ MARCValidator errors – open text file created in Step 1 – look for specific errors in error file Check URL links to make sure they work 15
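The field-count check in Step 2 can be sketched outside MARCEdit as well. The following is a minimal sketch, assuming the file has been saved in MARCEdit's mnemonic text format, where every field line starts with "=" and a three-character tag and each record starts with an =LDR line. The function names and the sample records are hypothetical and only illustrate the idea of comparing the 000 and 245 counts against the vendor count and spotting a non-numeric ("bad") field tag.

```python
from collections import Counter


def field_counts(mrk_text):
    """Tally field tags in mnemonic-format text; =LDR is reported as 000."""
    counts = Counter()
    for line in mrk_text.splitlines():
        if line.startswith("="):
            tag = line[1:4]
            counts["000" if tag == "LDR" else tag] += 1
    return counts


def bad_tags(counts):
    """Return tags that are not three digits (a sign of a damaged file)."""
    return sorted(t for t in counts if not (len(t) == 3 and t.isdigit()))


def counts_agree(counts, vendor_count):
    """True when the 000 count and the 245 count both equal the vendor count."""
    return counts["000"] == vendor_count and counts["245"] == vendor_count


# Hypothetical two-record sample in mnemonic format.
SAMPLE = "\n".join([
    "=LDR  00000nam a2200000 a 4500",
    "=001  ocn000000001",
    "=245  10$aFirst title$h[electronic resource]",
    "=856  40$uhttp://example.org/1",
    "",
    "=LDR  00000nam a2200000 a 4500",
    "=001  ocn000000002",
    "=245  10$aSecond title",
    "=856  40$uhttp://example.org/2",
])

counts = field_counts(SAMPLE)
print(counts["000"], counts["245"], bad_tags(counts))
```

If the 000 count and the vendor count disagree, the checklist's advice still applies: review the Validate Records text file before doing any editing.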
Reports/ Field Count (vendor count = 8556) 16
Field Count Error & "bad field tag" (vendor count =694) 17
Reports/ Field Count: Detail (highlight field & right-click) 18
Review Validate Records report (saved as text file in Step 1.B) 19
BLPC: Review for processing Checklist Step 3 workflow Check field counts Mark-up notes on the Checklist – Track/count fields that need processing Track points for fields that need processing Track points for fields that need manual editing Each record to fix means extra points Rule of thumb: for more than 12 manual edits, treat as a separate post-load maintenance project 20
BLPC Checklist Step 3: Review Fields Examples of required processing Examine first record & check field count Title control# – 001 (prefer OCLC#) If lacking: use info. from 035 or create local 001 Check field counts / subfield counts Title/GMD – 245 $h URL – 856 $3 $y $u Check Validate Record text file for errors “Invalid field format” / “Subfield cannot repeat” Check field counts / indicator counts Subject – 650 Ind2 = 4/7 or 5/6/8 21
BLPC Checklist Step 4: Review fields Examples of optional processing Check field count & delete if present 029 / 583 / 584 / 938 Check field data and delete Other vendor pkg names (netLibrary/ebrary/myiLibrary/24x7/Ebsco) Check field data & ignore/defer 300 lacks phrase: (1 electronic resource) 22
BLPC Checklist with mark-ups 23
BLPC Processing workflow Step 3 - Step 4 Review Field Count Review Field data – Use Find/Sort window and review first/last field Add/Delete/Edit field Review Field data – look at field in first record or Find/Sort window – Mistake? Typo? – use the Edit/SpecialUndo Review FieldCount Save edited file / SaveAs new filename 24
MARCEditor Tools window adding/editing/deleting fields adding/editing deleting subfields MARCEditor Edit/Find window editing/replacing field data displays sortable list MARCEdit Tools wizard for select & extract records extract tab-delimited records for Excel MARCEditor / MARCEdit Tools BLPC Checklist identifies fields to process 25
BLPC Processing: Add std. Phrase 506 => Step 3.S Check Field Count for presence of 506 Delete existing 506 field (if present) Consult Step 3.S in BLPC Procedures – Determine that AddField Tool is needed for processing – Copy Std.phrase from Step 3.S notes – Paste into AddField Tool window and submit Review 506 data in first record Check field count Save file 26
MARCEditor Tools: Add std. Phrase 506 => Step 3.S 27
BLPC Processing: Delete specific fields 650 Ind2= 5/6/8 (non-LC) => Step 3.V Check Field Count for Presence of 650 Ind2=5/6/8 Consult Step 3.V in BLPC Procedures – Optional Review – FindAll(RegEx) instructions – Determine that Tools/DeleteField tool is needed – Copy RegEx pattern from Step 3.V – Paste into Tools/DeleteField window – Use Regular Expressions radio button option – Submit using Delete button Check Field Count & Indicator count Save file 28
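What the DeleteField tool does with a regular expression can be approximated in a few lines of Python. This is a sketch under stated assumptions, not MARCEdit's implementation: it assumes mnemonic-format text in which the tag is followed by two spaces and a blank indicator is shown as a backslash, and it uses the four-chunk pattern for 650 fields with Ind2 = 5/6/8. The function name and sample fields are hypothetical.

```python
import re

# Four-chunk pattern for non-LC subjects (650 with Ind2 = 5, 6, or 8).
NON_LC_650 = re.compile(r"(=650  )(.[568])(\$a)(.+)")


def delete_matching_fields(mrk_text, pattern=NON_LC_650):
    """Drop every field line that the pattern matches from its first character."""
    kept = [line for line in mrk_text.splitlines() if not pattern.match(line)]
    return "\n".join(kept)


# Hypothetical subject fields from one record.
SUBJECTS = "\n".join([
    r"=650  \0$aLibraries$xAutomation.",
    r"=650  \6$aBibliotheques$xAutomatisation.",
    r"=650  \7$aCOMPUTERS / General.$2bisacsh",
    r"=650  \8$aLibrary science.",
])

print(delete_matching_fields(SUBJECTS))
```

Only the Ind2 = 0 and Ind2 = 7 fields survive, which is why the checklist asks for a field count and indicator count after the delete.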
MARCEditor: Delete specific fields 650 Ind2= 5/6/8 (non-LC) => Step 3.V 29
Regular expressions (RegEx) Finding/Editing patterns in strings (letters/numbers) – Like learning another language Parentheses are used to group data – Forces the computer to "store" data in "chunks" – Data “chunks” are numbered for recall/retrieval/use – Helps the programmer "read" the pattern Optional functionality, and not necessary Some punctuation is "reserved" (has a special meaning) BLPC uses consistent format for RegEx patterns 30
Reading RegEx Patterns 650 Ind2= 5/6/8 (non-LC) Pattern: (=650  )(.[568])(\$a)(.+) (=650  ) look for 650 fields with two blank spaces (.[568]) look for any Ind1 & listed Ind2 numbers (\$a) look for subfield $a (used as "anchor chunk") (.+) any characters to the end of the field Use Edit/FindAll(RegEx) to verify pattern 31
Interpreting RegEx punctuation Pattern: (=650  )(.[568])(\$a)(.+) – ( ) Parentheses for data “chunks” – . Period for any single character – [ ] Square brackets for a list using “OR” – \ Backslash before “reserved” punctuation, esp.: $ \ ( ) [ ] – + Plus sign for one or more of the preceding character “Chunks” are stored as: $1$2$3$4 32
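To see how the four "chunks" are stored, the same pattern can be run against a single field in Python. This is an illustrative sketch: the pattern syntax is the same, but Python numbers the stored chunks as group 1 to group 4 rather than $1 to $4. The sample field is hypothetical, and the pattern is written with the two spaces that follow the tag in MARCEdit's mnemonic format.

```python
import re

PATTERN = re.compile(r"(=650  )(.[568])(\$a)(.+)")


def read_chunks(field_line):
    """Return the four stored chunks, or None when the field does not match."""
    match = PATTERN.match(field_line)
    return match.groups() if match else None


# Hypothetical field: blank Ind1 (shown as a backslash) and Ind2 = 6.
chunks = read_chunks(r"=650  \6$aLibraries$xHistory.")
for number, chunk in enumerate(chunks, start=1):
    print(number, repr(chunk))
```

A field with Ind2 = 0 returns None, because 0 is not in the bracketed list.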
Creating RegEx patterns Start with known pattern: For non-LC Subjects: (=650  )(.[568])(\$a)(.+) FindAll(RegEx) for “local” Subjects (Ind2 = 4/7) (=650  )(.[47])(\$a)(.+) FindAll(RegEx) for “local” Genres (Ind2 = 4/7) (=655  )(.[47])(\$a)(.+) 33
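Because each new pattern changes only the tag and the Ind2 list, the "start with a known pattern" approach can be sketched as a small template. The helper names below are hypothetical, and the sample fields are invented for illustration; the pattern assumes the two spaces that follow the tag in mnemonic format.

```python
import re


def field_pattern(tag, ind2_values):
    """Build a four-chunk pattern for one tag and a list of Ind2 values."""
    return r"(=%s  )(.[%s])(\$a)(.+)" % (tag, ind2_values)


def find_all(mrk_text, pattern):
    """Return every field that the pattern matches, like FindAll(RegEx)."""
    return [m.group(0) for m in re.finditer(pattern, mrk_text)]


# Hypothetical fields from one record.
FIELDS = "\n".join([
    r"=650  \0$aHistory.",
    r"=650  \4$aLocal history.",
    r"=655  \7$aElectronic books.$2local",
])

print(field_pattern("650", "568"))
print(find_all(FIELDS, field_pattern("650", "47")))
print(find_all(FIELDS, field_pattern("655", "47")))
```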
Editing with RegEx string pattern 650 BISAC subjects => 690 Start with known pattern: (=650  )(.[568])(\$a)(.+) Use Edit/Replace(RegEx): Change 650 to 690 Identify “BISAC” subjects: Ind2=7 & $2 = bisacsh Determine which “chunks” change/stay the same Find(RegEx): (=650  )(.[7])(\$a)(.+)(\$2bisacsh) Replace(RegEx): (=690  )$2$3$4$5 34
Reading RegEx Patterns 650 BISAC subjects => 690 Pattern: (=650  )(.[7])(\$a)(.+)(\$2bisacsh) (=650  ) look for 650 fields with two blank spaces (.[7]) look for any Ind1 & Ind2 = 7 (\$a) look for subfield $a (optional “anchor” text) (.+) any characters up to the next “chunk” (\$2bisacsh) look for subfield & data at end of field Can be shortened (which makes the pattern look complicated): Find(RegEx): (=650)(.+\$2bisacsh) Replace(RegEx): (=690)$2 35
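The 650 to 690 edit can be tried in Python before running it in MARCEdit. This is a sketch under stated assumptions rather than a copy of the MARCEdit dialog: Python writes the stored chunks in the replacement as \g<2>, \g<3> and so on instead of $2, $3, and parentheses in a Python replacement string are literal text, so the new tag is written without them. The sample fields are hypothetical and use the two spaces that follow the tag in mnemonic format.

```python
import re

# Long form: five chunks, only the first one changes.
FIND = re.compile(r"(=650  )(.[7])(\$a)(.+)(\$2bisacsh)")
REPLACE = r"=690  \g<2>\g<3>\g<4>\g<5>"

# Shortened form: everything after the tag is one chunk.
FIND_SHORT = re.compile(r"(=650)(.+\$2bisacsh)")
REPLACE_SHORT = r"=690\g<2>"


def retag(mrk_text):
    """Change BISAC 650 fields to 690 using the five-chunk pattern."""
    return FIND.sub(REPLACE, mrk_text)


def retag_short(mrk_text):
    """Same edit using the shortened two-chunk pattern."""
    return FIND_SHORT.sub(REPLACE_SHORT, mrk_text)


# Hypothetical subject fields from one record.
SUBJECTS = "\n".join([
    r"=650  \0$aComputers.",
    r"=650  \7$aCOMPUTERS / General.$2bisacsh",
    r"=650  \7$aComputers.$2fast",
])

print(retag(SUBJECTS))
```

Only the field carrying $2bisacsh is retagged; the LC heading and the other Ind2 = 7 heading are left alone. Note that the shortened pattern no longer tests Ind2, so it relies on $2bisacsh alone to identify the field.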
MARCEditor: FindAll(RegEx) Testing the pattern: 650 BISAC subjects 36
MARCEditor: Replace(RegEx) 650 BISAC subjects => 690 37
BLPC Step 5: 949 processing Required processing Policy: Include Class# in Unicorn Item record 949 $a -- Pull the call# from the 050$a -- Insert the standard phrase: ' INTERNET' $v -- Pull the 001/OCLC# as a unique no. $w $h $t $x $z -- Add standard holdings data See Addt'l Instruct. 38
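The $a and $v portions of the 949 can be sketched as follows. This is a hypothetical illustration, not the project's actual procedure: it assumes mnemonic-format records, assumes the standard phrase ' INTERNET' is appended directly after the 050 $a class number, and assumes blank indicators on the 949. The remaining subfields ($w $h $t $x $z) are left out because their standard values are not given here. The function names and sample record are invented.

```python
import re


def subfield(field_line, code):
    """Return the data of the first subfield with this code, or None."""
    match = re.search(r"\$" + re.escape(code) + r"([^$]*)", field_line)
    return match.group(1).strip() if match else None


def build_949(record_lines):
    """Build a 949 with $a (050 $a + ' INTERNET') and $v (the 001 value).

    Returns None when the record lacks a 050 $a or a 001, so that the
    record can be set aside for cleanup instead of loading without a call#.
    """
    call_number = None
    control_number = None
    for line in record_lines:
        if line.startswith("=050") and call_number is None:
            call_number = subfield(line, "a")
        elif line.startswith("=001"):
            control_number = line[6:].strip()
    if not call_number or not control_number:
        return None
    # Two backslashes stand for two blank indicators in mnemonic format.
    return f"=949  \\\\$a{call_number} INTERNET$v{control_number}"


# Hypothetical record.
RECORD = [
    "=LDR  00000nam a2200000 a 4500",
    "=001  ocn123456789",
    r"=050  \4$aZ699.35.M28$bV36 2012",
    "=245  10$aSample title$h[electronic resource]",
]

print(build_949(RECORD))
```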
Batch-loading MARCEdit with files no larger than 10k records – MARCEdit/Tool MARCSplit MARCEditor/File: Compile File into MARC Unicorn batch load rpt uses 001 match point – 'o' for OCLC# & 'g' for local vendor key Unicorn batch load rpt settings – create new bibliographic records only Date cataloged -- back dated to prev. month – prevents interference w/scheduled Authority reports – max. load two files a day 39
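The 10k-record limit amounts to cutting the file into fixed-size batches. MARCSplit does this on the MARC file itself; the sketch below only illustrates the idea on mnemonic-format text, assuming records are separated by a blank line. The function names are hypothetical.

```python
def split_records(mrk_text):
    """Split mnemonic-format text into records (blank line between records)."""
    return [rec for rec in mrk_text.strip().split("\n\n") if rec.strip()]


def batches(records, size=10000):
    """Group records into consecutive batches of at most `size` records."""
    return [records[i:i + size] for i in range(0, len(records), size)]


# Hypothetical five-record file, split into batches of two for illustration.
SAMPLE = "\n\n".join(f"=LDR  rec{i}\n=001  id{i}" for i in range(5))
groups = batches(split_records(SAMPLE), size=2)
print([len(group) for group in groups])
```

With the default size of 10000, a file the size of the initial 65k-title load would yield seven batches, which at two files a day means several days of loading.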
Identifying records for Cleanup Checklist finds problems to correct post-load Item maintenance projects – 949 lacks call# Bibliographic record maintenance projects – 245 lacks $h (if more than 5-12 records) – URLs lacking Record reload/overlay project – Record already in OPAC (P-N duplicates) 40
MARCEdit Tools: Select/Extract selected records Step 3.F: 245 lacks $h 41
MARCEdit Tools: Export Tab Delimited records 42
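The two cleanup steps above (select the records where the 245 lacks $h, then export them as tab-delimited text for Excel) can be sketched together. This is an illustrative sketch, not the MARCEdit wizard: it assumes mnemonic-format text with a blank line between records, and the function names, column headings, and sample records are hypothetical.

```python
import csv
import io
import re


def records_lacking_gmd(mrk_text):
    """Return (001, 245 $a) pairs for records whose 245 has no $h."""
    rows = []
    for record in mrk_text.strip().split("\n\n"):
        control_number, title, has_h = "", None, False
        for line in record.splitlines():
            if line.startswith("=001"):
                control_number = line[6:].strip()
            elif line.startswith("=245"):
                has_h = "$h" in line
                match = re.search(r"\$a([^$]*)", line)
                title = match.group(1).strip() if match else ""
        if title is not None and not has_h:
            rows.append((control_number, title))
    return rows


def to_tab_delimited(rows):
    """Write the rows as tab-delimited text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(("001", "245$a"))
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical two-record sample; only the second record lacks $h.
SAMPLE = "\n".join([
    "=LDR  00000nam a2200000 a 4500",
    "=001  ocn000000001",
    "=245  10$aFirst title$h[electronic resource]",
    "",
    "=LDR  00000nam a2200000 a 4500",
    "=001  ocn000000002",
    "=245  10$aSecond title",
])

print(to_tab_delimited(records_lacking_gmd(SAMPLE)))
```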
Help! MarcEdit Help – Click thru the Contents menu: Contents / Using MARCEdit / Using the MARCEditor / Editing Functions / Using Regular Expressions. RegularExpressions.info MARCEDIT-L list BATCH list 43
BLPC Project Presentation revisions Originally presented Nov. 14, 2012 Additional Slides: – BLPC Project web-page – MARCEditor: FindAll(RegEx) – MARCEdit Tools: Export Tab Delimited records – BLPC Project: Presentation revisions 45