Published by Alexander Norton. Modified over 9 years ago.
1
Wrangling DigiTool Data For LOCKSS
Brian Meuse - Digital Collections Systems Analyst
University Libraries
Boston College
MetaArchive Cooperative Annual Meeting
October 23, 2009
2
eTD@BC
Electronic Theses and Dissertations
–Undergraduate Honors Theses
–Graduate Level Theses and Dissertations
Archive and distribute
Provide global Open Access to content
–Embargoes when needed
–No mandate to publish
10
What happens next?
ProQuest processes the student's submission
ProQuest FTPs back
–Thesis (PDF)
–Any additional files (3rd party permissions)
–Descriptive metadata
Once a student uploads to ProQuest, we get the files back within a day.
11
LOCKSS
MetaArchive Cooperative
–LOCKSS based dark archive
–Long Term Digital Preservation
12
DigiTool
Oracle backend
–Maintains object relationships
–Stores all associated MetaData (XML)
–Original filenames
File storage
–Simple directories on filesystem
–Renamed to Unique Identifier (PID)
16
DigiTool To LOCKSS
Export ETD files from DigiTool
–Export function
–Duplicate data
–Current ETD collection is ~1GB
–Bobbie Hanvey, ~30,000 photo-negatives, ~600GB
17
DigiTool To LOCKSS
Direct URL links
–MetaData
–Objects (Viewers for different formats)
Direct links not persistent
–Redirected to URL with session id
–Every node is different
–Not good for polling
18
DigiTool To LOCKSS
DigiTool API
–SOAP web service
–Can query database
–Retrieve XML MetaData
–Links to objects
19
DigiTool To LOCKSS
Wrangling the data
–Perl
–Web Services
–XSLT
20
DigiTool To LOCKSS
createMetaArchiveAU.pl
#!/usr/bin/perl -w
use strict;
use SOAP::Lite;
use FileHandle;
use Getopt::Long;
use LWP::Simple;
use Time::localtime;
use XML::LibXSLT;
use XML::LibXML;
….
21
DigiTool To LOCKSS
Query DigiTool
… contains type electronic thesis dissertation after createDate FROM …
XML response is a list of PIDs
22
DigiTool To LOCKSS
Retrieve digital entity for each PID
XML contains
–All metadata for the object
–PIDs of related objects
–Filename and path of file on server
23
DigiTool To LOCKSS
Metadata
descriptive
etd-ms
The Impact of Pension Policy on Older Adults' Life Satisfaction: an Analysis of Longitudinal Multilevel Data
Calvo, Esteban
aging individualization life satisfaction pension policy redistribution subjective well-being
Boston College
Williamson, John B.
2009
Electronic Thesis or Dissertation
text
application/pdf
http://hdl.handle.net/2345/752
English
I hereby allow Boston College to include and preserve my dissertation/thesis in electronic form in the Boston College Institutional Repository, which shall include the right to publicly post my dissertation/thesis on the World Wide Web. I will retain copyright ownership, but I grant to Boston College the non-exclusive right to copy, distribute, and publicly display my dissertation/thesis in any form as may be necessary or convenient in the future as file formats, storage media, and distribution mechanisms evolve.
PhD
Doctoral
Sociology
Boston College. Graduate School of Arts & Sciences.
24
DigiTool To LOCKSS
Related objects
manifestation 106483
manifestation 108561
manifestation 108562
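Pulling the related PIDs out of a digital entity could be sketched as follows. This is only an illustration: the element names (digital_entity, relations, relation, type, pid) and the related_pids helper are assumptions, since the slide shows the values but not the XML tags, and the real DigiTool response may use different names or namespaces.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Return the PIDs of all related objects of one relation type.
# ASSUMPTION: each relation is a <relation> element with <type> and
# <pid> children; these names are hypothetical, not taken from the slides.
sub related_pids {
    my ($xml, $type) = @_;
    my $doc = XML::LibXML->load_xml(string => $xml);
    my @pids;
    for my $rel ($doc->findnodes('//relation')) {
        next unless $rel->findvalue('./type') eq $type;
        push @pids, $rel->findvalue('./pid');
    }
    return @pids;
}

# Sample document built from the three relations shown on the slide.
my $sample = <<'XML';
<digital_entity>
  <relations>
    <relation><type>manifestation</type><pid>106483</pid></relation>
    <relation><type>manifestation</type><pid>108561</pid></relation>
    <relation><type>manifestation</type><pid>108562</pid></relation>
  </relations>
</digital_entity>
XML

print "$_\n" for related_pids($sample, 'manifestation');
```

Each PID returned this way would then be fetched in turn to read its own filename and storage path.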
25
DigiTool To LOCKSS
Filename and path
Calvo-Esteban.pdf
pdf
application/pdf
/exlibris1/bcd03storage/2009/08/27/file_1/106484
1 1005 -1 349524
26
DigiTool To LOCKSS
Retrieve each related item to get filename and path for those items
manifestation 106483
manifestation 108561
manifestation 108562
27
DigiTool To LOCKSS
Generate script to generate links
–Symbolic link for AU
–From manifest web directory to object
ln -s /exlibris1/bcd03storage/2009/08/27/file_1/106484 18640905-20090930/106484/Calvo-Esteban.pdf
When file is harvested, it will be given the original filename.
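A minimal sketch of how such a link line might be generated, using the values from the slide. The helper names (shell_quote, link_command) are hypothetical, and treating 18640905-20090930 as the AU directory is an assumption; the actual createMetaArchiveAU.pl may build the command differently.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Quote a string so the shell treats it as a single word.
sub shell_quote {
    my ($s) = @_;
    $s =~ s/'/'\\''/g;    # close quote, escaped quote, reopen quote
    return "'$s'";
}

# Build one "ln -s" line that links the PID-named stored file to
# <AU directory>/<PID>/<original filename>.
sub link_command {
    my ($au_dir, $pid, $stored_path, $orig_name) = @_;
    my $link = "$au_dir/$pid/$orig_name";
    return sprintf 'ln -s %s %s', shell_quote($stored_path), shell_quote($link);
}

print link_command(
    '18640905-20090930',                                  # assumed AU directory
    '106484',                                             # PID
    '/exlibris1/bcd03storage/2009/08/27/file_1/106484',   # path on server
    'Calvo-Esteban.pdf',                                  # original filename
), "\n";
```

The quoting is an addition to guard against spaces or apostrophes in original filenames. The directory that will hold the link has to exist before the generated script runs.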
28
DigiTool To LOCKSS
Manifest Pages
–Transform XML to HTML
–XSLT
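The XML-to-HTML step might look roughly like this with XML::LibXSLT, one of the modules the script loads. The input element names (entity, author, file), the stylesheet, and the manifest_html helper are all made up for illustration; the real stylesheet and DigiTool XML are not shown in the slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

# Apply an XSLT stylesheet to an XML string and return the HTML text.
sub manifest_html {
    my ($xml, $xsl) = @_;
    my $source     = XML::LibXML->load_xml(string => $xml);
    my $style_doc  = XML::LibXML->load_xml(string => $xsl);
    my $stylesheet = XML::LibXSLT->new->parse_stylesheet($style_doc);
    my $result     = $stylesheet->transform($source);
    return $stylesheet->output_as_chars($result);
}

# ASSUMPTION: a simplified, hypothetical input shape.
my $xml = <<'XML';
<entity>
  <author>Calvo, Esteban 2009</author>
  <file pid="106484">Calvo-Esteban.pdf</file>
</entity>
XML

# Hypothetical stylesheet: a heading plus one link per file,
# pointing at <PID>/<original filename> relative to the manifest.
my $xsl = <<'XSL';
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/entity">
    <html><body>
      <h1>Manifest for <xsl:value-of select="author"/></h1>
      <ul>
        <xsl:for-each select="file">
          <li><a href="{@pid}/{.}"><xsl:value-of select="."/></a></li>
        </xsl:for-each>
      </ul>
    </body></html>
  </xsl:template>
</xsl:stylesheet>
XSL

print manifest_html($xml, $xsl);
```

Keeping the layout in a stylesheet means the manifest page can change without touching the Perl script.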
29
DigiTool To LOCKSS
manifestation 106483
manifestation 108561
manifestation 108562
30
DigiTool To LOCKSS
Manifest for Calvo, Esteban 2009
Electronic Theses and Dissertations at Boston College
Manifest for Calvo, Esteban 2009
Metadata and Relationships
http://dcollections.bc.edu/webclient/DeliveryManager?metadata_request=true&GET_XML=1&pid=106484
ETD PDF
Calvo-Esteban.pdf
Permissions/Suppressed file
Calvo-Esteban-permission.txt
Thumbnail
_106484_pdf_thumbnail.jpg
31
Thank you!