Published by Lawrence Norris. Modified over 8 years ago.
1
Using Publishing Profiles to dump data out of Alma needed for resource sharing systems such as HathiTrust
Margaret Briand Wolfe, Systems Librarian, Boston College
Revised June 2016
2
When the call for data comes
HathiTrust, RapidILL, BrowZine, your data extraction headache here
3
Frustrations dumping data out of Alma and Analytics
Analytics: 65,000-row Excel export limit.
Alma bibliographic export processes (MARC21 Binary, MARC XML): the entire MARC record is too much data to sift through.
Alma APIs: too slow for millions of records, and there is a daily limit on the number of API calls.
4
Solution: Alma Publishing Profiles
Publishing is set based.
A set can be published in full once; subsequent publishing contains only the delta.
Re-publishing the full set is now available.
You need a place for the published files to land, such as an S/FTP server.
5
HathiTrust Files Requirement
Print holdings in 3 separate files:
Single Print Monographs
Multi-Part Monographs
Serials
6
BC’s Managed Sets for HathiTrust
Sets were built for 9 separate libraries, for both books and serials, using the Advanced Repository Search:
Physical titles where library = libraryname and material type = Books
Physical titles where library = libraryname and material type = Issue or Bound Issue
We kept the sets separate for each library to keep track of the counts of records exported. You don’t have to do it this way, but we initially felt our record export counts were too low and wanted to closely monitor the monograph and serials counts for each library.
7
Normalization Rules
Publishing Profiles can use normalization rules to determine what data is output.
See Alma Help and browse "normalization rules" if unsure how to add or edit a rule.
Briefly:
Resource Management -> Cataloging -> Metadata Editor -> File -> New -> Normalization Rule
OR
Resource Management -> Cataloging -> Metadata Editor -> Rules -> Normalization Rules
8
Normalization Rules
We use a rule that removes all of the MARC fields except:
001 – contains system number (MMS ID) *
035 – contains OCLC number *
022 – contains ISSN. Used when the set is for serials
074 – contains government document number
901 – for books only; the publishing profile puts item information here (more on this soon)
* Required by HathiTrust
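To illustrate what a downstream script can read from a record once the rule has trimmed it to these fields, here is a minimal Python sketch. It is not the Perl script used at BC. The sample record, the `(OCoLC)` prefix on the 035 value, the use of subfield a for 035 and 022, and the function names are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an XML namespace, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def subfield_values(record, tag, code):
    """Collect the text of every subfield `code` in every datafield `tag`."""
    values = []
    for field in record:
        if local(field.tag) != "datafield" or field.get("tag") != tag:
            continue
        for sf in field:
            if local(sf.tag) == "subfield" and sf.get("code") == code and sf.text:
                values.append(sf.text.strip())
    return values


def summarize(record):
    """Pull out the MMS ID, OCLC number, ISSNs and 074 presence from one record."""
    mms_id = ""
    for field in record:
        if local(field.tag) == "controlfield" and field.get("tag") == "001":
            mms_id = (field.text or "").strip()
            break

    oclc = ""
    for value in subfield_values(record, "035", "a"):
        # Assumption: OCLC numbers look like "(OCoLC)12345678" or "(OCoLC)ocm12345678".
        match = re.match(r"\(OCoLC\)\D*(\d+)", value)
        if match:
            oclc = match.group(1)
            break

    has_074 = any(
        local(f.tag) == "datafield" and f.get("tag") == "074" for f in record
    )
    return {
        "mms_id": mms_id,
        "oclc": oclc,
        "issns": subfield_values(record, "022", "a"),
        "has_074": has_074,
    }


# Invented sample record; the identifiers are placeholders, not real data.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="001">99123456780001021</controlfield>
  <datafield tag="022" ind1=" " ind2=" "><subfield code="a">1234-5678</subfield></datafield>
  <datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)ocm12345678</subfield></datafield>
</record>"""

summary = summarize(ET.fromstring(SAMPLE))
```

A record whose `mms_id` or `oclc` comes back empty cannot go to HathiTrust, so a script would normally skip it.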
9
Publishing Profiles – Profile Details
Resource Configuration -> Configuration Menu -> Publishing Profiles -> Add Profile -> General Profile
BC ended up with 18 publishing profiles, 2 for each of our 9 libraries: one for monographs and one for serials.
Under Content -> Publish On: Bibliographic Level
Under Publishing Protocol you can choose FTP or OAI. BC uses FTP.
MARC output format = MARC21 XML or MARC21 Binary. BC uses MARC21 XML, 10,000 records per file.
We added a filename prefix to distinguish the sets for each of our 9 libraries, for example: hathi_law_books, hathi_law_serials
10
Publishing Profiles – Profile Details
11
Publishing Profiles – Data Enrichment
Under Bibliographic Normalization: select the normalization rule you created so that only the MARC data you want is exported.
Under Physical Inventory Enrichment: check ‘Add Items Information’ only if the profile is for books. Set repeatable field = 901, barcode in subfield x, description in subfield y, process type in subfield z.
We use the barcode to count the number of items for monographs and the description to identify multi-part monographs.
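With the enrichment configured this way, each item should arrive as its own 901 field. The sketch below, in Python rather than the Perl used at BC, shows one way to read those fields back out of a published record. The sample record, the barcodes and the function names are invented for illustration.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an XML namespace, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def items_from_record(record):
    """Return one dict per 901 field: barcode ($x), description ($y), process type ($z)."""
    items = []
    for field in record:
        if local(field.tag) != "datafield" or field.get("tag") != "901":
            continue
        subs = {
            sf.get("code"): (sf.text or "").strip()
            for sf in field
            if local(sf.tag) == "subfield"
        }
        items.append(
            {
                "barcode": subs.get("x", ""),
                "description": subs.get("y", ""),
                "process_type": subs.get("z", ""),
            }
        )
    return items


# Invented sample: a two-volume monograph whose second volume is missing.
SAMPLE = """<record>
  <controlfield tag="001">99123456780001021</controlfield>
  <datafield tag="901" ind1=" " ind2=" ">
    <subfield code="x">39031000000001</subfield>
    <subfield code="y">v.1</subfield>
  </datafield>
  <datafield tag="901" ind1=" " ind2=" ">
    <subfield code="x">39031000000002</subfield>
    <subfield code="y">v.2</subfield>
    <subfield code="z">MISSING</subfield>
  </datafield>
</record>"""

items = items_from_record(ET.fromstring(SAMPLE))
```

Counting the dicts with a non-empty `barcode` gives the item count for a monograph, and a non-empty `description` marks the record as multi-part.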
12
Publishing Profiles – Data Enrichment
13
Publishing Profiles - Actions
14
What to do with all those files
Unzip them – I wrote a Perl script to move all of the zip files FTP’d by Alma from our staging server to another server, where they were unzipped into separate monograph and serials directories for each library.
Process them – I wrote another Perl script to read each XML file and process each record in the file. To go to HathiTrust, each record needed an MMS ID and an OCLC number. I run this Perl script twice, once for monographs and once for serials. The script opens its output file in append mode.
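The unzip step could be sketched as follows. This is a Python illustration under stated assumptions, not the Perl script described above: it assumes the zip files are already on the local machine, that each file name starts with the profile's prefix in the form hathi_<library>_<books|serials>, and that a directory layout of <library>/monographs and <library>/serials is wanted. The function names are hypothetical.

```python
import re
import zipfile
from pathlib import Path

# Assumed naming scheme, modelled on the prefixes hathi_law_books / hathi_law_serials.
PREFIX = re.compile(r"^hathi_(?P<library>[a-z0-9]+)_(?P<kind>books|serials)")


def target_dir(zip_name, dest_root):
    """Map a zip file name to <dest_root>/<library>/<monographs|serials>, or None."""
    match = PREFIX.match(zip_name)
    if match is None:
        return None
    kind = "monographs" if match.group("kind") == "books" else "serials"
    return Path(dest_root) / match.group("library") / kind


def unzip_all(staging_dir, dest_root):
    """Extract every recognised zip file in staging_dir into its library directory."""
    extracted = []
    for zip_path in sorted(Path(staging_dir).glob("*.zip")):
        out_dir = target_dir(zip_path.name, dest_root)
        if out_dir is None:
            continue  # not one of our publishing profiles; leave it alone
        out_dir.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(out_dir)
        extracted.append(zip_path.name)
    return extracted
```

Keeping monographs and serials in separate directories is what makes it possible to run the processing script once per type, as described above.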
15
Process the XML Files
When processing monograph files, I wrote an entry for every barcode.
Multi-part monographs were determined by the presence of a description field. An entry was written for every description found.
If the process type was set to LOST or MISSING, the holding status was set to ‘LM’; otherwise it was set to ‘CH’.
If the 074 tag was present, the government document indicator was set to 1; otherwise it was set to 0.
I added the ISSN to the serials entries if present.
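These rules can be sketched in Python as below. This is one reading of the slide, not the original Perl: it treats an item with a description as a multi-part entry and an item with only a barcode as a single-part entry, and it returns plain dicts rather than lines in HathiTrust's file layout, which is not given here. The function and key names are hypothetical.

```python
def holding_status(process_type):
    """'LM' for lost or missing items, 'CH' (currently held) for everything else."""
    return "LM" if process_type.upper() in {"LOST", "MISSING"} else "CH"


def monograph_entries(mms_id, oclc, items, has_074):
    """Build one entry per item; records without an MMS ID or OCLC number yield none."""
    if not mms_id or not oclc:
        return []
    gov_doc = 1 if has_074 else 0
    entries = []
    for item in items:
        description = item.get("description", "")
        if not description and not item.get("barcode", ""):
            continue  # nothing to count for this item
        entries.append(
            {
                "oclc": oclc,
                "local_id": mms_id,
                "status": holding_status(item.get("process_type", "")),
                "description": description,
                "multi_part": bool(description),
                "gov_doc": gov_doc,
            }
        )
    return entries


def serial_entry(mms_id, oclc, issn, has_074):
    """Build the single entry for a serial, with the ISSN if there is one."""
    if not mms_id or not oclc:
        return None
    return {
        "oclc": oclc,
        "local_id": mms_id,
        "issn": issn or "",
        "gov_doc": 1 if has_074 else 0,
    }
```

For example, `monograph_entries("991", "12345678", [{"barcode": "b1", "description": "v.1", "process_type": "LOST"}], True)` yields a single multi-part entry with status ‘LM’ and a government document indicator of 1.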
16
HathiTrust elements I ignored
Holding Status WD – Withdrawn
Condition BRT – Brittle, damaged and/or deteriorating
17
Why I ignored them
Alma does not currently distinguish between items that are deleted and items that have been withdrawn. Alma removes withdrawn titles and items. They can still be retrieved from Analytics, but we would still need to deal with the export limit. We are working with Ex Libris to come up with a way to send withdrawn item information to HathiTrust.
BC does not have a standardized way of indicating item condition.
18
Your Turn
What have you done?
How can we do this better?
What should we ask Ex Libris for to make this process easier?
19
Contact Me
Margaret Briand Wolfe
briandwo@bc.edu