Migrating Repository Metadata & Users: The Harvard DRS 2 Project Andrea Goethals, Harvard Library IS&T Archiving 2014, May 15 2014.

Library Digital Initiative Funds (1998-)
Build technical infrastructure
  – Digital Repository Service (DRS)
Hire specialists
Build digital collections via 49 internal grants to be preserved in the DRS

DRS Users Grew to 55 Organizational Units at Harvard

DRS is Central to User Workflows
Ingest (deposit tools): reformatting labs; automated system deposits; library, archives and museum staff
Manage (cataloging & management tools): reformatting labs; library, archives and museum staff; repository managers
Access (discovery, search, delivery platforms): researchers, teachers, learners

Why a New DRS?
Upgrade to best-in-breed technologies
Adopt digital preservation best practices and standards
Preserve metadata better
Improve collection management
Support preservation planning & activities
Improve access to content & metadata
Support more formats & genres

Evolution of the DRS (timeline)
DRS in production
New DRS in production
DRS enhancements
New DRS infrastructure development
New DRS metadata migration & user adoption

New DRS – Completed
Tracks: Infrastructure Development; Metadata Migration & User Adoption
convened DRS Advisory Group
software in production
users trained, phase 1
hardware in production
migrated content to new hardware
Fedora assessment
DuraCloud pilot test
early release, beta 1, beta 2, beta 3
first object deposited to the new DRS

New DRS – In Progress
Tracks: Infrastructure Development; Metadata Migration & User Adoption
metadata migration tools created & tested
migrating metadata
moving users

Why “Metadata” Migration? Why not “content” migration?

Pre-migration: DRS Content; DRS Database

Post-migration: DRS Content; DRS Database; New DRS Database; New DRS Index; New DRS Object Descriptors

New DRS Data Model
Not a simple metadata conversion
A new DRS object is a logical intellectual entity that unifies multiple DRS files, for example:
  – Still image objects: archival and production masters, and deliverables including thumbnails
  – Audio objects: archival and production masters and deliverables
  – PDS objects: page image and text files
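
As a rough illustration of the data-model change described above, the sketch below groups flat file records into logical objects. The field names (`object_key`, `role`), the sample records, and the `group_into_objects` helper are hypothetical and are not taken from the DRS schema.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DrsFile:
    file_id: int
    object_key: str  # hypothetical key that links related files
    role: str        # e.g. "archival master", "deliverable"


@dataclass
class DrsObject:
    object_key: str
    files: list = field(default_factory=list)


def group_into_objects(files):
    """Collect files that share an object_key into one logical object."""
    grouped = defaultdict(list)
    for f in files:
        grouped[f.object_key].append(f)
    return [DrsObject(key, members) for key, members in sorted(grouped.items())]


# Made-up sample data: one still image object and one audio object.
sample_files = [
    DrsFile(1, "image-001", "archival master"),
    DrsFile(2, "image-001", "production master"),
    DrsFile(3, "image-001", "deliverable"),
    DrsFile(4, "audio-001", "archival master"),
]
objects = group_into_objects(sample_files)
for obj in objects:
    print(obj.object_key, [f.role for f in obj.files])
```

Four file records become two objects here, which is the sense in which the migration is more than a field-by-field metadata conversion.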

Object Descriptors
METS files generated for each object
  – Standards-based schemas (PREMIS, MODS, MIX, etc.)
Metadata gathered from multiple sources
  – Current DRS database
  – Every content file parsed using FITS
  – In some cases catalog records, finding aids, legacy METS files

Technical Challenges
Many formats
Unique migration rules per format
Preserving all identifiers
Uninterrupted access for end users
Large (>5000 file) page-turned documents
46+ million DRS files
  – At 1 sec/file would take 530+ days!
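
The "530+ days" figure can be checked with a few lines of arithmetic. This sketch assumes exactly 46 million files processed one at a time with no parallelism; the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def serial_days(file_count, seconds_per_file):
    """Days needed to process every file one after another."""
    return file_count * seconds_per_file / SECONDS_PER_DAY


days = serial_days(46_000_000, 1.0)
print(f"{days:.1f} days")
```

The result is a little over 532 days, consistent with the slide's rounded "530+".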

Formulating a Migration Plan
Technical analysis
  – DRS content
  – Possible metadata sources
User analysis
  – Management activity via system logs
  – Preparation via training and testing registration lists
  – Perceived preparation & concerns via survey of highest volume, active users

Migration Plan
Combines needs of users with technical requirements
  – Respects all technical requirements
  – Minimizes the time users need to work in two systems at the same time

Migrating Content in 5 Stages
Migrate 1st: Tier 1 content
Migrate 2nd: Tier 2 content
Migrate 3rd: Tier 3 content
Migrate 4th: Tier 4 content
Migrate 5th: Tier 5 content
Tiers run from simpler objects (Tier 1) to more complex objects (Tier 5)
There are dependencies between tiers and dependencies within tiers

Migrating Content in 5 Stages
Tier  Content
1     Text (Methodology, ESRI World File), Document, Color Profile, Target Image
2     PDS Document, Still Image
3     Audio, Text (SMIL)
4     Web Harvest, Opaque Container
5     Biomedical Image; Google Document Container 1, 2, 3
Tiers 1, 3, 4, 5: Migrate across all DRS owner codes at one time
Tier 2: Migrate one DRS owner code at a time *
* Minimizes the amount of time that the content users manage the most is in 2 different systems
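
The batching rule on this slide (Tier 2 one owner code at a time, all other tiers across every owner code at once) could be sketched roughly as follows. The owner codes and the `plan_migration_sets` helper are made up for illustration and do not come from the project's tooling.

```python
PER_OWNER_TIERS = {2}  # tiers migrated one DRS owner code at a time


def plan_migration_sets(tiers, owner_codes):
    """Return an ordered list of (tier, owner_codes_in_this_batch)."""
    plan = []
    for tier in sorted(tiers):
        if tier in PER_OWNER_TIERS:
            # One batch per owner, so each owner's content moves together.
            plan.extend((tier, [owner]) for owner in owner_codes)
        else:
            # A single batch covering every owner code.
            plan.append((tier, list(owner_codes)))
    return plan


# Hypothetical owner codes.
plan = plan_migration_sets([1, 2, 3, 4, 5], ["OWNER_A", "OWNER_B", "OWNER_C"])
for tier, owners in plan:
    print(f"Tier {tier}: {', '.join(owners)}")
```

With three owner codes the plan holds seven batches: one each for Tiers 1, 3, 4 and 5, and three for Tier 2.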

Technical Strategies
Modular, parallelizable migration design
Delivery services made migration-aware
Test, test, test
Design for migration failures – make do-overs possible
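
One common way to make do-overs possible is to record a status per object and skip anything already finished on a rerun. The sketch below shows that general pattern, not the project's actual mechanism; `migrate_all`, the status strings, and the simulated failure are all assumptions.

```python
def migrate_all(object_ids, migrate_one, status):
    """Migrate each object once; a rerun retries only the failures."""
    for oid in object_ids:
        if status.get(oid) == "done":
            continue  # already migrated on an earlier pass
        try:
            migrate_one(oid)
            status[oid] = "done"
        except Exception as exc:
            # Record the failure and keep going with the other objects.
            status[oid] = f"failed: {exc}"
    return status


attempts = {}


def flaky_migrate(oid):
    """Stand-in migration step that fails once for obj-2 (simulated)."""
    attempts[oid] = attempts.get(oid, 0) + 1
    if oid == "obj-2" and attempts[oid] == 1:
        raise RuntimeError("simulated failure")


ids = ["obj-1", "obj-2", "obj-3"]
status = {}
migrate_all(ids, flaky_migrate, status)
first_pass = dict(status)
migrate_all(ids, flaky_migrate, status)  # the do-over
print(first_pass)
print(status)
```

The second pass touches only the object that failed, so a rerun costs time in proportion to the failures rather than to the whole collection.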

Technical Strategy – Modular, Parallelizable
START
1) Group files into objects
Objects queue
2) Run FITS, combine with metadata to generate object descriptors
Descriptors ready queue
3) Ingest into new DRS
END
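
The three stages and two queues above map naturally onto a producer/consumer pipeline. Below is a minimal sketch using Python's standard `queue` and `threading` modules. The file names, the grouping rule (text before the first underscore), and the placeholder descriptor are assumptions, and no real FITS call is made.

```python
import queue
import threading

SENTINEL = None              # marks the end of a stream
NUM_DESCRIPTOR_WORKERS = 3   # stage 2 is the part worth parallelizing


def group_files(file_names, objects_queue):
    """Stage 1: group files into objects and enqueue them."""
    groups = {}
    for name in file_names:
        groups.setdefault(name.split("_")[0], []).append(name)
    for key, members in groups.items():
        objects_queue.put((key, members))
    for _ in range(NUM_DESCRIPTOR_WORKERS):
        objects_queue.put(SENTINEL)  # one stop signal per worker


def build_descriptors(objects_queue, descriptors_queue):
    """Stage 2: placeholder for running FITS and writing a descriptor."""
    while True:
        item = objects_queue.get()
        if item is SENTINEL:
            descriptors_queue.put(SENTINEL)
            return
        key, members = item
        descriptors_queue.put({"object": key, "file_count": len(members)})


def ingest(descriptors_queue, ingested):
    """Stage 3: placeholder ingest that collects finished descriptors."""
    finished_workers = 0
    while finished_workers < NUM_DESCRIPTOR_WORKERS:
        item = descriptors_queue.get()
        if item is SENTINEL:
            finished_workers += 1
        else:
            ingested.append(item)


def run_pipeline(file_names):
    objects_queue, descriptors_queue, ingested = queue.Queue(), queue.Queue(), []
    threads = [threading.Thread(target=group_files, args=(file_names, objects_queue))]
    threads += [
        threading.Thread(target=build_descriptors, args=(objects_queue, descriptors_queue))
        for _ in range(NUM_DESCRIPTOR_WORKERS)
    ]
    threads.append(threading.Thread(target=ingest, args=(descriptors_queue, ingested)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(ingested, key=lambda d: d["object"])


result = run_pipeline(["a_master.tif", "a_thumb.jpg", "b_master.wav"])
print(result)
```

Because the stages communicate only through queues, the number of stage 2 workers can be changed without touching the other stages.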

Tuning Experiments
Single powerful computer
  – Dell R720 Server using Intel(R) Xeon(R) CPU E GHz CPUs with 16 Cores, 64 GB of Memory and 1 TB of internal disk
  – Various thread counts
  – 4-35 files processed per second
Next:
  – RAM disk
  – Multiple computers
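
To put the measured 4-35 files per second in context, the sketch below converts a sustained rate into total run time for the 46+ million files cited under Technical Challenges. It assumes exactly 46 million files and a constant rate across the whole run, which the slides do not claim.

```python
SECONDS_PER_DAY = 24 * 60 * 60
TOTAL_FILES = 46_000_000  # "46+ million DRS files", rounded down


def days_at_rate(total_files, files_per_second):
    """Days to process all files at a constant throughput."""
    return total_files / files_per_second / SECONDS_PER_DAY


slow_days = days_at_rate(TOTAL_FILES, 4)
fast_days = days_at_rate(TOTAL_FILES, 35)
print(f"4 files/s:  {slow_days:.1f} days")
print(f"35 files/s: {fast_days:.1f} days")
```

Under these assumptions the run takes roughly 133 days at the slow end and roughly 15 days at the fast end.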

User Strategies
Advisors
  – DRS Advisory Group
Minimize disruption
  – Tier 2 migration: one owner at a time
  – Close partners: Imaging Services
Tapping help of experts
  – “pioneer” depositors, beta testers, trainers
Regular communications
  – monthly via HL Update

Migration State Diagram

Migration Set Checklist
Description of the affected content
List of steps needing human intervention, who will do them, date of completion
  – includes communication, migration kickoff and post-migration verification tasks
Final step – manager signs off on completion
Checklist is preserved
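
A checklist like this could be modeled as a small record type. The sketch below is illustrative only. The class and field names are hypothetical, and the rule that sign-off is refused while steps remain open is an assumption suggested by "final step", not something the slides state.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Step:
    description: str
    assignee: str
    completed_on: Optional[date] = None


@dataclass
class MigrationSetChecklist:
    content_description: str
    steps: list = field(default_factory=list)
    signed_off_by: Optional[str] = None

    def sign_off(self, manager):
        """Assumed rule: every step must be complete before sign-off."""
        pending = [s.description for s in self.steps if s.completed_on is None]
        if pending:
            raise ValueError(f"steps still open: {pending}")
        self.signed_off_by = manager


# Made-up example content, assignees, and dates.
checklist = MigrationSetChecklist(
    "Tier 2 content for one owner code",
    [
        Step("Notify depositors", "liaison", date(2014, 5, 1)),
        Step("Post-migration verification", "repository manager"),
    ],
)

try:
    checklist.sign_off("manager")
    blocked = False
except ValueError:
    blocked = True  # verification is still open

checklist.steps[1].completed_on = date(2014, 5, 8)
checklist.sign_off("manager")
print(blocked, checklist.signed_off_by)
```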

Learned So Far
Can migrate in sub-second/file time
User-contributed metadata varies in quality
  – Should automate more and/or put more validation checks in place
  – Useful exercise to analyze metadata values and elements periodically: errors in metadata values; value vs. effort of metadata elements

Preservation Capability Before and After the DRS2 Project
Columns: Level One; Level Two; Level Three; Level Four
Rows: Storage & Geographic Location; File Fixity and Data Integrity; Information Security; Metadata; File Formats
Legend: already compliant; will be compliant after the DRS2 project
Based on the NDSA Levels of Digital Preservation

Q & A
Thanks!
DRS Advisory Group, DRS beta testers, DCSWG, Bobbi Fox, Franziska Frey, Andrea Goethals, Wendy Gogel, Chip Goines, HUIT Security, Jonathan Kennedy, LTS Operations, Spencer McEwen, Grainne Reilly, Tracey Robinson, Randy Stern, Janet Taylor, Chris Vicary, Robin Wendler, Julie Wetherill, Vitaly Zakuta