
Migrating Repository Metadata & Users: The Harvard DRS 2 Project. Andrea Goethals, Harvard Library. IS&T Archiving 2014, May 15, 2014.




1 Migrating Repository Metadata & Users: The Harvard DRS 2 Project
Andrea Goethals, Harvard Library
IS&T Archiving 2014, May 15, 2014

2 Library Digital Initiative Funds (1998-)
Build technical infrastructure - Digital Repository Service (DRS)
Hire specialists
Build digital collections via 49 internal grants to be preserved in the DRS

3 DRS Users Grew to 55 Organizational Units at Harvard

4 DRS is Central to User Workflows
[Diagram: the DRS at the center of three workflows]
Ingest (deposit tools): reformatting labs; automated system deposits; library, archives and museum staff
Manage (cataloging & management tools): reformatting labs; library, archives and museum staff; repository managers
Access (discovery, search, delivery platforms): researchers, teachers, learners

5 Why a New DRS?
Upgrade to best-in-breed technologies
Adopt digital preservation best practices and standards
Preserve metadata better
Improve collection management
Support preservation planning & activities
Improve access to content & metadata
Support more formats & genres

6 Evolution of the DRS
[Timeline, 2000-2015. Labels: DRS in production; DRS enhancements; New DRS infrastructure development; New DRS in production; New DRS metadata migration & user adoption]

7 New DRS - Completed
[Timeline, 2009-2015. Phases: Infrastructure Development; Metadata Migration & User Adoption]
Milestones shown: convened DRS Advisory Group; Fedora assessment; DuraCloud pilot test; early release; beta 1; beta 2; beta 3; software in production; hardware in production; migrated content to new hardware; first object deposited to the new DRS; users trained, phase 1

8 New DRS – In Progress
[Timeline, 2009-2015. Phases: Infrastructure Development; Metadata Migration & User Adoption]
In progress: metadata migration tools created & tested; migrating metadata; moving users

9 Why “Metadata” Migration? Why not “content” migration?

10 Pre-migration DRS Content DRS Database DRS Database

11 Post-migration
[Diagram labels: DRS Content; DRS Database; New DRS Database; New DRS Index; New DRS Object Descriptors]

12 New DRS Data Model
Not a simple metadata conversion
A new DRS object is a logical intellectual entity that unifies multiple DRS files, for example:
– Still image objects - archival and production masters, and deliverables including thumbnails
– Audio objects - archival and production masters and deliverables
– PDS objects - page image and text files
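To make the file-to-object relationship concrete, here is a minimal sketch of grouping flat file records into objects. The class names, the `object_key` field, and the sample records are all hypothetical; the slides do not describe how the DRS actually keys files to objects.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DRSFile:
    file_id: str
    object_key: str  # hypothetical key saying which object a file belongs to
    role: str        # e.g. "archival master", "deliverable"


@dataclass
class DRSObject:
    object_key: str
    files: list = field(default_factory=list)


def group_files_into_objects(files):
    """Collect files that share an object_key into one DRSObject each."""
    grouped = defaultdict(list)
    for f in files:
        grouped[f.object_key].append(f)
    return [DRSObject(key, members) for key, members in grouped.items()]


# Hypothetical sample: one still image object with three files, one audio object.
sample_files = [
    DRSFile("f1", "image-001", "archival master"),
    DRSFile("f2", "image-001", "production master"),
    DRSFile("f3", "image-001", "deliverable"),
    DRSFile("f4", "audio-001", "archival master"),
]
objects = group_files_into_objects(sample_files)
```

The point of the sketch is only that migration has to build a new entity from several existing file records, which is why the slide calls it more than a metadata conversion.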

13 Object Descriptors
METS files generated for each object
– Standards-based schemas (PREMIS, MODS, MIX, etc.)
Metadata gathered from multiple sources
– Current DRS database
– Every content file parsed using FITS
– In some cases catalog records, finding aids, legacy METS files
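As an illustration of what generating a descriptor involves, the sketch below builds a bare METS skeleton with Python's standard library. It is not the DRS descriptor profile: the function name and identifiers are hypothetical, and a real descriptor would also embed the PREMIS, MODS, and MIX sections mentioned above.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def build_descriptor(object_id, file_ids):
    """Return a minimal METS document listing an object's files (sketch only)."""
    mets = ET.Element(f"{{{METS_NS}}}mets", {"OBJID": object_id})

    # fileSec inventories the content files that make up the object.
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    file_grp = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp")
    for fid in file_ids:
        ET.SubElement(file_grp, f"{{{METS_NS}}}file", {"ID": fid})

    # structMap points back at those files from a single top-level division.
    struct_map = ET.SubElement(mets, f"{{{METS_NS}}}structMap")
    div = ET.SubElement(struct_map, f"{{{METS_NS}}}div")
    for fid in file_ids:
        ET.SubElement(div, f"{{{METS_NS}}}fptr", {"FILEID": fid})

    return ET.tostring(mets, encoding="unicode")


# Hypothetical identifiers, for illustration only.
descriptor_xml = build_descriptor("OBJ_1", ["FILE_1", "FILE_2"])
```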

14 Technical Challenges
Many formats
Unique migration rules per format
Preserving all identifiers
Uninterrupted access for end users
Large (>5000 file) page-turned documents
46+ million DRS files
– At 1 sec/file would take 530+ days!
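The 530+ day figure follows from straightforward arithmetic, sketched here. The function name is hypothetical, and 46,000,000 is used as a round lower bound for "46+ million".

```python
SECONDS_PER_DAY = 24 * 60 * 60


def migration_days(file_count, seconds_per_file):
    """Days needed to process file_count files one after another."""
    return file_count * seconds_per_file / SECONDS_PER_DAY


days = migration_days(46_000_000, 1.0)
print(f"{days:.1f} days")  # prints "532.4 days"
```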

15 Formulating a Migration Plan
Technical analysis
– DRS content
– Possible metadata sources
User analysis
– Management activity via system logs
– Preparation via training and testing registration lists
– Perceived preparation & concerns via survey of highest volume, active users

16 Migration Plan
Combines needs of users with technical requirements
– Respects all technical requirements
– Minimizes the time users need to work in two systems at the same time

17 Migrating Content in 5 Stages
Migrate 1st: Tier 1 content
Migrate 2nd: Tier 2 content
Migrate 3rd: Tier 3 content
Migrate 4th: Tier 4 content
Migrate 5th: Tier 5 content

18 Migrating Content in 5 Stages
Migrate 1st: Tier 1 content
Migrate 2nd: Tier 2 content
Migrate 3rd: Tier 3 content
Migrate 4th: Tier 4 content
Migrate 5th: Tier 5 content
[Annotation: ordered from simpler objects to more complex objects]

19 Migrating Content in 5 Stages
Migrate 1st: Tier 1 content
Migrate 2nd: Tier 2 content
Migrate 3rd: Tier 3 content
Migrate 4th: Tier 4 content
Migrate 5th: Tier 5 content
[Annotations: dependencies between tiers; dependencies within tiers]

20 Migrating Content in 5 Stages
Tier  Content
1     Text (Methodology, ESRI World File), Document, Color Profile, Target Image
2     PDS Document, Still Image
3     Audio, Text (SMIL)
4     Web Harvest, Opaque Container
5     Biomedical Image; Google Document Container 1, 2, 3

21 Migrating Content in 5 Stages
Tier  Content
1     Text (Methodology, ESRI World File), Document, Color Profile, Target Image
2     PDS Document, Still Image
3     Audio, Text (SMIL)
4     Web Harvest, Opaque Container
5     Biomedical Image; Google Document Container 1, 2, 3

22 Migrating Content in 5 Stages
Tier  Content
1     Text (Methodology, ESRI World File), Document, Color Profile, Target Image
2     PDS Document, Still Image
3     Audio, Text (SMIL)
4     Web Harvest, Opaque Container
5     Biomedical Image; Google Document Container 1, 2, 3
Tiers 1, 3, 4, 5: Migrate across all DRS owner codes at one time
Tier 2: Migrate one DRS owner code at a time

23 Migrating Content in 5 Stages
Tier  Content
1     Text (Methodology, ESRI World File), Document, Color Profile, Target Image
2     PDS Document, Still Image
3     Audio, Text (SMIL)
4     Web Harvest, Opaque Container
5     Biomedical Image; Google Document Container 1, 2, 3
Tiers 1, 3, 4, 5: Migrate across all DRS owner codes at one time
Tier 2: Migrate one DRS owner code at a time *
* Minimizes the amount of time that the content users manage the most is in two different systems
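The ordering rule on this slide can be sketched as a small generator that yields migration sets in sequence. The function name and the owner codes are hypothetical; only the rule itself (Tier 2 per owner code, all other tiers across all owner codes) comes from the slide.

```python
def migration_sets(tiers, owner_codes, per_owner_tiers=(2,)):
    """Yield (tier, owner_code) pairs in migration order.

    owner_code is None when a tier is migrated across all owner codes at once.
    """
    for tier in sorted(tiers):
        if tier in per_owner_tiers:
            # Tier 2 content is migrated one owner code at a time.
            for code in owner_codes:
                yield (tier, code)
        else:
            yield (tier, None)


# Hypothetical owner codes, for illustration only.
plan = list(migration_sets([1, 2, 3, 4, 5], ["OWNER_A", "OWNER_B"]))
```

With two owner codes this produces six migration sets: one each for Tiers 1, 3, 4, and 5, and two for Tier 2.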

24 Technical Strategies
Modular, parallelizable migration design
Delivery services made migration-aware
Test, test, test
Design for migration failures – make do-overs possible

25 Technical Strategy – Modular, Parallelizable
START
1) Group files into objects
[Objects queue]
2) Run FITS, combine with metadata to generate object descriptors
[Descriptors ready queue]
3) Ingest into new DRS
END
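A minimal sketch of this three-stage design, assuming the two queues sit between stages 1 and 2 and between stages 2 and 3. Everything here is a stand-in: the stage functions, the worker count, and the fake descriptor string are hypothetical, and no real FITS run or DRS ingest takes place.

```python
import queue
import threading

SENTINEL = None        # marks the end of a stream of work items
DESCRIBE_WORKERS = 3   # hypothetical number of parallel stage-2 workers


def group_stage(grouped_objects, objects_q):
    """Stage 1: put each grouped object on the objects queue."""
    for obj in grouped_objects:
        objects_q.put(obj)
    for _ in range(DESCRIBE_WORKERS):
        objects_q.put(SENTINEL)  # one stop signal per stage-2 worker


def describe_stage(objects_q, descriptors_q):
    """Stage 2: stand-in for running FITS and building a descriptor."""
    while True:
        obj = objects_q.get()
        if obj is SENTINEL:
            descriptors_q.put(SENTINEL)
            break
        descriptors_q.put({"id": obj["id"], "descriptor": f"descriptor for {obj['id']}"})


def ingest_stage(descriptors_q, ingested):
    """Stage 3: stand-in for ingest; records which objects arrived."""
    finished_workers = 0
    while finished_workers < DESCRIBE_WORKERS:
        item = descriptors_q.get()
        if item is SENTINEL:
            finished_workers += 1
        else:
            ingested.append(item["id"])


def run_pipeline(grouped_objects):
    objects_q, descriptors_q = queue.Queue(), queue.Queue()
    ingested = []
    threads = [threading.Thread(target=group_stage, args=(grouped_objects, objects_q))]
    threads += [
        threading.Thread(target=describe_stage, args=(objects_q, descriptors_q))
        for _ in range(DESCRIBE_WORKERS)
    ]
    threads.append(threading.Thread(target=ingest_stage, args=(descriptors_q, ingested)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ingested


migrated = run_pipeline([{"id": f"obj-{i}"} for i in range(10)])
```

Because the stages communicate only through queues, the number of stage-2 workers can be changed without touching the other stages, which is the property the slide title refers to.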

26 Tuning Experiments
Single powerful computer
– Dell R720 server using Intel(R) Xeon(R) CPU E5-2643 0 @ 3.30GHz CPUs with 16 cores, 64 GB of memory and 1 TB of internal disk
– Various thread counts
– 4-35 files processed per second
Next:
– RAM disk
– Multiple computers
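An experiment of this kind can be sketched as a loop that measures throughput at several thread counts. The workload below is a hypothetical stand-in (a short sleep per file), so the numbers it produces say nothing about the DRS results; only the shape of the experiment is illustrated.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def process_file(name):
    """Stand-in for per-file work such as characterization; just waits briefly."""
    time.sleep(0.01)
    return name


def measure_throughput(file_names, thread_count):
    """Return files processed per second with the given number of threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        processed = list(pool.map(process_file, file_names))
    elapsed = time.perf_counter() - start
    return len(processed) / elapsed


file_names = [f"file_{i}" for i in range(40)]
results = {n: measure_throughput(file_names, n) for n in (1, 4, 8)}
for thread_count, rate in results.items():
    print(f"{thread_count} threads: {rate:.1f} files/sec")
```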

27 User Strategies
Advisors - DRS Advisory Group
Minimize disruption
– Tier 2 migration - one owner at a time
– Close partners - Imaging Services
Tapping help of experts
– “pioneer” depositors, beta testers, trainers
Regular communications monthly via HL Update

28 Migration State Diagram

29 Migration Set Checklist
Description of the affected content
List of steps needing human intervention, who will do them, date of completion
– includes communication, migration kickoff and post-migration verification tasks
Final step – manager signs off on completion
Checklist is preserved

30 Learned So Far
Can migrate in sub-second/file time
User-contributed metadata varies in quality
– Should automate more and/or put more validation checks in place
– Useful exercise to analyze metadata values and elements periodically
  – errors in metadata values
  – value vs. effort of metadata elements
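One simple form such a periodic analysis could take is grouping the values of a metadata element that differ only by case or stray whitespace. This is a sketch under assumptions: the element name, the sample records, and the function are hypothetical and not taken from the DRS.

```python
from collections import defaultdict


def find_variant_values(records, element):
    """Group raw values of one element that differ only by case or outer whitespace."""
    variants = defaultdict(set)
    for record in records:
        raw = record.get(element)
        if raw is not None:
            variants[raw.strip().lower()].add(raw)
    # Keep only normalized values that appear in more than one raw form.
    return {norm: raws for norm, raws in variants.items() if len(raws) > 1}


# Hypothetical user-contributed records.
sample_records = [
    {"producer": "Imaging Services"},
    {"producer": "imaging services "},
    {"producer": "Conservation Lab"},
    {"title": "No producer recorded"},
]
suspect_values = find_variant_values(sample_records, "producer")
```

A report like this flags likely entry errors for review; deciding which spelling is correct remains a human task.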

31 Preservation Capability Before and After the DRS2 Project
[Table. Columns: Level One; Level Two; Level Three; Level Four. Rows: Storage & Geographic Location; File Fixity and Data Integrity; Information Security; Metadata; File Formats. Cell shading distinguishes "already compliant" from "will be compliant after the DRS2 project"]
Based on the NDSA Levels of Digital Preservation

32 Q & A
Thanks! DRS Advisory Group, DRS beta testers, DCSWG, Bobbi Fox, Franziska Frey, Andrea Goethals, Wendy Gogel, Chip Goines, HUIT Security, Jonathan Kennedy, LTS Operations, Spencer McEwen, Grainne Reilly, Tracey Robinson, Randy Stern, Janet Taylor, Chris Vicary, Robin Wendler, Julie Wetherill, Vitaly Zakuta

