Published by Phillip Stafford. Modified over 9 years ago.
1
Migrating Repository Metadata & Users: The Harvard DRS 2 Project
Andrea Goethals, Harvard Library
IS&T Archiving 2014, May 15, 2014
2
Library Digital Initiative Funds (1998-)
Build technical infrastructure - Digital Repository Service (DRS)
Hire specialists
Build digital collections via 49 internal grants to be preserved in the DRS
3
DRS Users Grew to 55 Organizational Units at Harvard
4
DRS is Central to User Workflows
[Diagram: the DRS at the center of three workflows]
Ingest (deposit tools): reformatting labs; automated system deposits; library, archives and museum staff
Manage (cataloging & management tools): reformatting labs; library, archives and museum staff; repository managers
Access (discovery, search, delivery platforms): researchers, teachers, learners
5
Why a New DRS?
Upgrade to best-in-breed technologies
Adopt digital preservation best practices and standards
Preserve metadata better
Improve collection management
Support preservation planning & activities
Improve access to content & metadata
Support more formats & genres
6
Evolution of the DRS
[Timeline, 2000-2015: DRS in production; DRS enhancements; New DRS infrastructure development; New DRS metadata migration & user adoption; New DRS in production]
7
New DRS - Completed
[Timeline, 2009-2015, in two phases: Infrastructure Development; Metadata Migration & User Adoption]
Milestones shown: convened DRS Advisory Group; software in production; users trained, phase 1; hardware in production; migrated content to new hardware; Fedora assessment; DuraCloud pilot test; early release; beta 1; beta 2; beta 3; first object deposited to the new DRS
8
New DRS – In Progress
[Timeline, 2009-2015, in two phases: Infrastructure Development; Metadata Migration & User Adoption]
Milestones shown: metadata migration tools created & tested; migrating metadata; moving users
9
Why “Metadata” Migration? Why not “content” migration?
10
Pre-migration
[Diagram: DRS Content; DRS Database]
11
Post-migration
[Diagram: DRS Content; DRS Database; New DRS Database; New DRS Index; New DRS Object Descriptors]
12
New DRS Data Model
Not a simple metadata conversion
A new DRS object is a logical intellectual entity that unifies multiple DRS files, for example:
  – Still image objects - archival and production masters, and deliverables including thumbnails
  – Audio objects - archival and production masters and deliverables
  – PDS objects - page image and text files
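As a rough illustration of the data model above, the sketch below groups individual file records into one logical object per intellectual entity. The `FileRecord` fields, the `entity_id` grouping key, and the role names are assumptions made for illustration only; the slides do not say how the DRS decides which files belong together.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FileRecord:
    file_id: str
    entity_id: str  # hypothetical key linking a file to its intellectual entity
    role: str       # hypothetical role, e.g. "archival_master" or "deliverable"


@dataclass
class DrsObject:
    entity_id: str
    # Maps a role name to the list of file ids that play that role.
    files_by_role: dict = field(default_factory=lambda: defaultdict(list))


def group_files_into_objects(files):
    """Return one DrsObject per entity_id, with its files bucketed by role."""
    objects = {}
    for f in files:
        obj = objects.setdefault(f.entity_id, DrsObject(f.entity_id))
        obj.files_by_role[f.role].append(f.file_id)
    return objects
```

With this shape, a still image object would carry its archival master, production master, and deliverables under a single identifier instead of as unrelated files.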
13
Object Descriptors
METS files generated for each object
  – Standards-based schemas (PREMIS, MODS, MIX, etc.)
Metadata gathered from multiple sources
  – Current DRS database
  – Every content file parsed using FITS
  – In some cases catalog records, finding aids, legacy METS files
14
Technical Challenges
Many formats
Unique migration rules per format
Preserving all identifiers
Uninterrupted access for end users
Large (>5000 file) page-turned documents
46+ million DRS files - At 1 sec/file would take 530+ days!
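The 530+ day figure can be checked with simple arithmetic. The sketch below assumes exactly 46 million files processed strictly one after another; the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def serial_migration_days(file_count, seconds_per_file):
    """Days needed to process every file one at a time, with no parallelism."""
    return file_count * seconds_per_file / SECONDS_PER_DAY


days = serial_migration_days(46_000_000, 1.0)
print(f"{days:.1f} days")  # 532.4 days
```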
15
Formulating a Migration Plan
Technical analysis
  – DRS content
  – Possible metadata sources
User analysis
  – Management activity via system logs
  – Preparation via training and testing registration lists
  – Perceived preparation & concerns via survey of highest volume, active users
16
Migration Plan
Combines needs of users with technical requirements
  – Respects all technical requirements
  – Minimizes the time users need to work in two systems at the same time
17
Migrating Content in 5 Stages
Migrate 1st: Tier 1 content
Migrate 2nd: Tier 2 content
Migrate 3rd: Tier 3 content
Migrate 4th: Tier 4 content
Migrate 5th: Tier 5 content
18
Migrating Content in 5 Stages
Migrate 1st: Tier 1 content
Migrate 2nd: Tier 2 content
Migrate 3rd: Tier 3 content
Migrate 4th: Tier 4 content
Migrate 5th: Tier 5 content
[Arrow: simpler objects to more complex objects]
19
Migrating Content in 5 Stages
Migrate 1st: Tier 1 content
Migrate 2nd: Tier 2 content
Migrate 3rd: Tier 3 content
Migrate 4th: Tier 4 content
Migrate 5th: Tier 5 content
[Annotations: dependencies between tiers; dependencies within tiers]
20
Migrating Content in 5 Stages
Tier  Content
1     Text (Methodology, ESRI World File), Document, Color Profile, Target Image
2     PDS Document, Still Image
3     Audio, Text (SMIL)
4     Web Harvest, Opaque Container
5     Biomedical Image; Google Document Container 1, 2, 3
22
Migrating Content in 5 Stages
Tier  Content
1     Text (Methodology, ESRI World File), Document, Color Profile, Target Image
2     PDS Document, Still Image
3     Audio, Text (SMIL)
4     Web Harvest, Opaque Container
5     Biomedical Image; Google Document Container 1, 2, 3
Tiers 1, 3, 4, 5: Migrate across all DRS owner codes at one time
Tier 2: Migrate one DRS owner code at a time
23
Migrating Content in 5 Stages
Tier  Content
1     Text (Methodology, ESRI World File), Document, Color Profile, Target Image
2     PDS Document, Still Image
3     Audio, Text (SMIL)
4     Web Harvest, Opaque Container
5     Biomedical Image; Google Document Container 1, 2, 3
Tiers 1, 3, 4, 5: Migrate across all DRS owner codes at one time
Tier 2: Migrate one DRS owner code at a time *
* Minimizes the amount of time that the content users manage the most is split across 2 different systems
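The batching rule on this slide can be sketched as a small planner that emits migration sets in tier order. The tier-to-strategy mapping comes from the slide; the function name, the tuple layout, and the owner codes in the usage note are hypothetical.

```python
ALL_OWNERS_AT_ONCE = {1, 3, 4, 5}  # tiers migrated across all owner codes together
ONE_OWNER_AT_A_TIME = {2}          # tier migrated one owner code at a time


def plan_migration_sets(owner_codes):
    """Return migration sets as (tier, owner_codes) tuples, in tier order."""
    sets = []
    for tier in range(1, 6):
        if tier in ONE_OWNER_AT_A_TIME:
            # One migration set per owner, so each owner's switch-over is short.
            sets.extend((tier, [owner]) for owner in owner_codes)
        else:
            sets.append((tier, list(owner_codes)))
    return sets
```

For two made-up owner codes, `plan_migration_sets(["A", "B"])` yields one set each for Tiers 1, 3, 4 and 5, and two separate Tier 2 sets, one per owner.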
24
Technical Strategies
Modular, parallelizable migration design
Delivery services made migration-aware
Test, test, test
Design for migration failures – make do-overs possible
25
Technical Strategy – Modular, Parallelizable
START
1) Group files into objects
[Objects queue]
2) Run FITS, combine with metadata to generate object descriptors
[Descriptors ready queue]
3) Ingest into new DRS
END
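A minimal sketch of the three stages connected by two queues, using Python threads. This is not the DRS migration code: the record layout, the `object_id` grouping key, and the stage bodies are stand-ins, and the FITS run and the ingest call are replaced by trivial placeholders.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of a queue


def group_files(files, objects_q):
    """Stage 1: group files into objects (by a hypothetical object_id key)."""
    grouped = {}
    for f in files:
        grouped.setdefault(f["object_id"], []).append(f["name"])
    for object_id, names in grouped.items():
        objects_q.put((object_id, names))
    objects_q.put(_DONE)


def build_descriptors(objects_q, ready_q):
    """Stage 2: placeholder for running FITS and generating a descriptor."""
    while True:
        item = objects_q.get()
        if item is _DONE:
            ready_q.put(_DONE)
            return
        object_id, names = item
        ready_q.put({"object_id": object_id, "files": sorted(names)})


def ingest(ready_q, ingested):
    """Stage 3: placeholder for ingest into the new repository."""
    while True:
        item = ready_q.get()
        if item is _DONE:
            return
        ingested.append(item)


def run_pipeline(files):
    objects_q, ready_q, ingested = queue.Queue(), queue.Queue(), []
    stages = [
        threading.Thread(target=group_files, args=(files, objects_q)),
        threading.Thread(target=build_descriptors, args=(objects_q, ready_q)),
        threading.Thread(target=ingest, args=(ready_q, ingested)),
    ]
    for t in stages:
        t.start()
    for t in stages:
        t.join()
    return ingested
```

Because the stages only communicate through queues, each one can be scaled or rerun independently. This sketch runs one worker per stage; running several workers on a stage would need one sentinel per worker.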
26
Tuning Experiments
Single powerful computer
  – Dell R720 Server using Intel(R) Xeon(R) CPU E5-2643 0 @ 3.30GHz CPUs with 16 cores, 64 GB of memory and 1 TB of internal disk
  – Various thread counts
  – 4-35 files processed per second
Next:
  – RAM disk
  – Multiple computers
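One way to run this kind of thread-count experiment is to time the same batch of work at several pool sizes. The sketch below is only an illustration: `process_file` is a stand-in that sleeps to imitate I/O-bound work, so its numbers say nothing about the real DRS measurements.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def process_file(_name):
    """Stand-in for per-file work; sleeps to imitate I/O-bound processing."""
    time.sleep(0.01)


def files_per_second(file_names, thread_count):
    """Process all files with the given number of threads; return throughput."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        list(pool.map(process_file, file_names))
    elapsed = time.perf_counter() - start
    return len(file_names) / elapsed


def tune(file_names, thread_counts):
    """Return a mapping of thread count to measured files per second."""
    return {n: files_per_second(file_names, n) for n in thread_counts}
```

Calling `tune(names, [1, 4, 16])` on a representative sample gives a quick view of where adding threads stops helping on a given machine.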
27
User Strategies
Advisors - DRS Advisory Group
Minimize disruption
  – Tier 2 migration - one owner at a time
  – Close partners - Imaging Services
Tapping help of experts
  – “pioneer” depositors, beta testers, trainers
Regular communications monthly via HL Update
28
Migration State Diagram
29
Migration Set Checklist
Description of the affected content
List of steps needing human intervention, who will do them, date of completion
  – includes communication, migration kickoff and post-migration verification tasks
Final step – manager signs off on completion
Checklist is preserved
30
Learned So Far
Can migrate in sub-second/file time
User-contributed metadata varies in quality
  – Should automate more and/or put more validation checks in place
  – Useful exercise to analyze metadata values and elements periodically
      errors in metadata values
      value vs. effort of metadata elements
31
Preservation Capability Before and After the DRS2 Project
[Table: columns are Level One, Level Two, Level Three, Level Four; rows are Storage & Geographic Location, File Fixity and Data Integrity, Information Security, Metadata, File Formats; legend: already compliant / will be compliant after the DRS2 project]
Based on the NDSA Levels of Digital Preservation
32
Q & A
Thanks!
DRS Advisory Group, DRS beta testers, DCSWG, Bobbi Fox, Franziska Frey, Andrea Goethals, Wendy Gogel, Chip Goines, HUIT Security, Jonathan Kennedy, LTS Operations, Spencer McEwen, Grainne Reilly, Tracey Robinson, Randy Stern, Janet Taylor, Chris Vicary, Robin Wendler, Julie Wetherill, Vitaly Zakuta