1 Wrangling DigiTool Data For LOCKSS
Brian Meuse - Digital Collections Systems Analyst
University Libraries, Boston College
MetaArchive Cooperative Annual Meeting, October 23, 2009

2 eTD@BC
Electronic Theses and Dissertations
–Undergraduate Honors Theses
–Graduate Level Theses and Dissertations
Archive and distribute
Provide global Open Access to content
–Embargoes when needed
–No mandate to publish

10 What happens next?
ProQuest processes the student's submission
ProQuest FTPs back
–Thesis (PDF)
–Any additional files (3rd party permissions)
–Descriptive metadata
Once a student uploads to ProQuest, we get the files back within a day.

11 LOCKSS
MetaArchive Cooperative
–LOCKSS-based dark archive
–Long-term digital preservation

12 DigiTool
Oracle backend
–Maintains object relationships
–Stores all associated metadata (XML)
–Original filenames
File storage
–Simple directories on filesystem
–Files renamed to unique identifier (PID)

16 DigiTool To LOCKSS
Export ETD files from DigiTool
–Export function
–Duplicates data
–Current ETD collection is ~1GB
–Bobbie Hanvey collection: ~30,000 photo-negatives, ~600GB

17 DigiTool To LOCKSS
Direct URL links
–Metadata
–Objects (viewers for different formats)
Direct links are not persistent
–Redirected to a URL with a session ID
–Every node gets a different URL
–Not good for polling

18 DigiTool To LOCKSS
DigiTool API
–SOAP web service
–Can query database
–Retrieve XML metadata
Links to objects

19 DigiTool To LOCKSS
Wrangling the data
–Perl
–Web Services
–XSLT

20 DigiTool To LOCKSS
createMetaArchiveAU.pl
#!/usr/bin/perl -w
use strict;
use SOAP::Lite;
use FileHandle;
use Getopt::Long;
use LWP::Simple;
use Time::localtime;
use XML::LibXSLT;
use XML::LibXML;
….

21 DigiTool To LOCKSS
Query DigiTool
… contains type electronic thesis dissertation after createDate FROM …
XML response is a list of PIDs

22 DigiTool To LOCKSS
Retrieve the digital entity for each PID
XML contains
–All metadata for the object
–PIDs of related objects
–Filename and path of the file on the server

23 DigiTool To LOCKSS
Metadata
descriptive
etd-ms
The Impact of Pension Policy on Older Adults' Life Satisfaction: an Analysis of Longitudinal Mulitlevel Data
Calvo, Esteban
aging individualization life satisfaction pension policy redistribution subjective well-being
Boston College
Williamson, John B.
2009
Electronic Thesis or Dissertation
text
application/pdf
http://hdl.handle.net/2345/752
English
I hereby allow Boston College to include and preserve my dissertation/thesis in electronic form in the Boston College Institutional Repository, which shall include the right to publicly post my dissertation/thesis on the World Wide Web. I will retain copyright ownership, but I grant to Boston College the non-exclusive right to copy, distribute, and publicly display my dissertation/thesis in any form as may be necessary or convenient in the future as file formats, storage media, and distribution mechanisms evolve.
PhD
Doctoral
Sociology
Boston College. Graduate School of Arts & Sciences.

24 DigiTool To LOCKSS
Related objects
manifestation 106483
manifestation 108561
manifestation 108562

25 DigiTool To LOCKSS
Filename and path
Calvo-Esteban.pdf
pdf
application/pdf
/exlibris1/bcd03storage/2009/08/27/file_1/106484
1 1005 -1 349524

26 DigiTool To LOCKSS
Retrieve each related item to get the filename and path for those items
manifestation 106483
manifestation 108561
manifestation 108562
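
The retrieval loop described on slides 22 to 26 can be sketched roughly as follows. This is a Python illustration, not the Perl script from the talk. The slides show only the values in the DigiTool XML, not its element names or the SOAP call, so the element names (`digital_entity`, `relation`, `stream_ref`, `file_name`, `directory_path`), the `fetch_entity` stub, and the placeholder records for the related PIDs are all assumptions.

```python
import xml.etree.ElementTree as ET


def entity_xml(pid, file_name, path, related=()):
    """Build a stand-in digital entity record (hypothetical element names)."""
    relations = "".join(
        f"<relation><type>manifestation</type><pid>{r}</pid></relation>"
        for r in related
    )
    return (
        f"<digital_entity><pid>{pid}</pid>"
        f"<relations>{relations}</relations>"
        f"<stream_ref><file_name>{file_name}</file_name>"
        f"<directory_path>{path}</directory_path></stream_ref>"
        f"</digital_entity>"
    )


# Stand-in for the web service. Only the 106484 record uses values from the
# slides; the related records carry placeholder filenames and paths.
SAMPLE_ENTITIES = {
    "106484": entity_xml(
        "106484",
        "Calvo-Esteban.pdf",
        "/exlibris1/bcd03storage/2009/08/27/file_1/106484",
        related=("106483", "108561", "108562"),
    ),
    "106483": entity_xml("106483", "placeholder-106483", "/placeholder/106483"),
    "108561": entity_xml("108561", "placeholder-108561", "/placeholder/108561"),
    "108562": entity_xml("108562", "placeholder-108562", "/placeholder/108562"),
}


def fetch_entity(pid):
    """Hypothetical stub for the API call that returns one entity's XML."""
    return SAMPLE_ENTITIES[pid]


def parse_entity(xml_text):
    """Pull the PID, file location, and related PIDs out of one record."""
    root = ET.fromstring(xml_text)
    return {
        "pid": root.findtext("pid"),
        "file_name": root.findtext("stream_ref/file_name"),
        "path": root.findtext("stream_ref/directory_path"),
        "related": [rel.findtext("pid") for rel in root.iter("relation")],
    }


def collect_files(pid):
    """Return the record for a PID followed by one record per related PID."""
    main = parse_entity(fetch_entity(pid))
    records = [main]
    for related_pid in main["related"]:
        records.append(parse_entity(fetch_entity(related_pid)))
    return records


for record in collect_files("106484"):
    print(record["pid"], record["path"], record["file_name"])
```

The sketch follows relations one level deep, which matches what the slides describe; whether the real script recurses further is not stated.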

27 DigiTool To LOCKSS
Generate a script that creates the links
–Symbolic link for AU
–From manifest web directory to object
ln -s /exlibris1/bcd03storage/2009/08/27/file_1/106484 18640905-20090930/106484/Calvo-Esteban.pdf
When the file is harvested, it will be given the original filename.
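
A minimal sketch of the link-script generation, again in Python rather than the talk's Perl. The function names are hypothetical, and the `mkdir -p` line is an assumption added so the generated script can run on its own; the slides show only the `ln -s` command.

```python
import posixpath
import shlex


def link_command(storage_path, au_dir, pid, file_name):
    """Return one ln -s command: PID-named stored file -> original filename."""
    link_path = posixpath.join(au_dir, pid, file_name)
    return f"ln -s {shlex.quote(storage_path)} {shlex.quote(link_path)}"


def build_link_script(au_dir, pid, files):
    """Build a shell script that links every (storage_path, file_name) pair."""
    lines = [
        "#!/bin/sh",
        # Assumption: the per-object directory may not exist yet.
        f"mkdir -p {shlex.quote(posixpath.join(au_dir, pid))}",
    ]
    for storage_path, file_name in files:
        lines.append(link_command(storage_path, au_dir, pid, file_name))
    return "\n".join(lines) + "\n"


script = build_link_script(
    "18640905-20090930",
    "106484",
    [("/exlibris1/bcd03storage/2009/08/27/file_1/106484", "Calvo-Esteban.pdf")],
)
print(script)
```

Quoting with `shlex.quote` leaves the example paths unchanged but protects the script if an original filename contains spaces or shell metacharacters.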

28 DigiTool To LOCKSS
Manifest Pages
–Transform XML to HTML
–XSLT

29 DigiTool To LOCKSS
manifestation 106483
manifestation 108561
manifestation 108562

30 DigiTool To LOCKSS
Manifest for Calvo, Esteban 2009
Electronic Theses and Dissertations at Boston College
Manifest for Calvo, Esteban 2009
Metadata and Relationships
http://dcollections.bc.edu/webclient/DeliveryManager?metadata_request=true&GET_XML=1&pid=106484
ETD PDF
Calvo-Esteban.pdf
Permissions/Suppressed file
Calvo-Esteban-permission.txt
Thumbnail
_106484_pdf_thumbnail.jpg
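
The talk builds manifest pages with XSLT. Python's standard library has no XSLT processor, so the sketch below swaps in a different technique and assembles a comparable page directly with `xml.etree.ElementTree`. The function name, the HTML layout, and the use of relative links (which assumes the symbolic links sit in the same web directory as the manifest page) are assumptions; only the labels, filenames, and metadata URL come from the slide.

```python
import xml.etree.ElementTree as ET


def build_manifest(author, year, metadata_url, files):
    """Return an HTML manifest page linking the metadata and each file."""
    title_text = f"Manifest for {author} {year}"

    html = ET.Element("html")
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = title_text

    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = title_text
    items = ET.SubElement(body, "ul")

    # Link to the XML metadata record for the object.
    item = ET.SubElement(items, "li")
    item.text = "Metadata and Relationships: "
    ET.SubElement(item, "a", href=metadata_url).text = metadata_url

    # One relative link per file, named with its original filename.
    for label, file_name in files:
        item = ET.SubElement(items, "li")
        item.text = f"{label}: "
        ET.SubElement(item, "a", href=file_name).text = file_name

    return ET.tostring(html, encoding="unicode", method="html")


page = build_manifest(
    "Calvo, Esteban",
    "2009",
    "http://dcollections.bc.edu/webclient/DeliveryManager"
    "?metadata_request=true&GET_XML=1&pid=106484",
    [
        ("ETD PDF", "Calvo-Esteban.pdf"),
        ("Permissions/Suppressed file", "Calvo-Esteban-permission.txt"),
        ("Thumbnail", "_106484_pdf_thumbnail.jpg"),
    ],
)
print(page)
```

Building the page through an element tree rather than string concatenation means the ampersands in the metadata URL are escaped automatically in the generated HTML.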

31 Thank you!