Building a Digital Archives for the City of Vancouver
Glenn Dingwall
14 September 2011
Project Context
– VanRIMS Classification Project
– VanDOCS ERDMS Project
– Olympic Legacy Project
Project Phases
I – Proof of Concept ( )
  – Public records
  – Controlled creation environment
II – Prototype ( )
  – Private records
  – Uncontrolled creation environment
Initial Assumptions
– Use OAIS (Open Archival Information System Reference Model) as a starting point
– Progressively add to requirements, drawing from:
  – General preservation standards: InterPARES, RLG/OCLC Trusted Digital Repositories (TDR)
  – Task-specific standards, e.g., PREMIS metadata
  – Institution-specific requirements
CoV Digital Archives: Producers and Consumers
Digital Preservation: The Business Case
– Technology obsolescence
– Technology incompatibility
– Long-term access and usability
Alternatives – What’s out there already?
Many free/open source tools are already available:
– Repository: DSpace, FEDORA, Greenstone
– Ingest tools: JHOVE, DROID, XENA
– Access: Archivists’ Toolkit, ICA-AtoM
Each covers only a small part of the preservation chain; there is no single start-to-finish solution.
So, what can we do with the existing tools?
Can we piece all of the various components together to come up with a complete digital preservation system?
Constraints:
– Use open source tools wherever possible
– Lightweight system architecture
– Architecturally independent components
What is OAIS?
– OAIS = Open Archival Information System (ISO 14721:2003)
– A high-level reference model
– De facto standard for discussing digital preservation concepts at this level
– Important concepts include:
  – Information Model
  – Functional Entities
  – Mandatory Responsibilities
OAIS Information Model
Information Packages contain:
– Content (records)
– PDI = Preservation Description Information (metadata)
– Packaging Information
Three types of Information Packages:
– SIP = Submission Information Package (what we get)
– AIP = Archival Information Package (what we preserve)
– DIP = Dissemination Information Package (what we provide)
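The three package types can be pictured as one data structure that changes role as it moves through the archive. The following is a minimal sketch only: the class, field, and function names (`InformationPackage`, `to_aip`, `to_dip`) are illustrative assumptions, not terms from the OAIS standard or from any particular implementation.

```python
from dataclasses import dataclass, field, replace
from enum import Enum


class PackageType(Enum):
    SIP = "Submission Information Package"      # what we get
    AIP = "Archival Information Package"        # what we preserve
    DIP = "Dissemination Information Package"   # what we provide


@dataclass(frozen=True)
class InformationPackage:
    """Hypothetical model of an OAIS information package."""
    package_type: PackageType
    content: tuple                                   # the records (here: file names)
    pdi: dict = field(default_factory=dict)          # Preservation Description Information
    packaging: dict = field(default_factory=dict)    # how the package is bundled


def to_aip(sip, extra_pdi):
    """Turn a SIP into an AIP, enriching its PDI during ingest."""
    if sip.package_type is not PackageType.SIP:
        raise ValueError("expected a SIP")
    return replace(sip, package_type=PackageType.AIP,
                   pdi={**sip.pdi, **extra_pdi})


def to_dip(aip, access_copies):
    """Derive a DIP from an AIP, swapping in access copies of the content."""
    if aip.package_type is not PackageType.AIP:
        raise ValueError("expected an AIP")
    return replace(aip, package_type=PackageType.DIP,
                   content=tuple(access_copies))
```

In this sketch the original SIP is never mutated; each step returns a new package, which mirrors the idea that the archive keeps what it preserves separate from what it hands out.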
Information Package Model
OAIS Responsibilities
– Accept submissions from Producers
– Establish control over material
– Implement long-term preservation policies
– Determine who the users are (“Designated Community”)
– Ensure preserved information is understandable to users
– Provide access
OAIS Functional Entities
– Establishes the main functional components of the system
– Defines the relationships of the components to each other in terms of the information that passes between them
OAIS Functional Entities
City of Vancouver Archives Implementation
Archivematica
Archivematica Pipeline
Ingest Workflow Summary
Micro-services
– Create SIP backup
– Verify SIP compliance
– Assign file UUIDs and checksums
– Verify metadata directory checksums
– Remove thumbs.db files
– Create Dublin Core template
– Set file permissions
– Appraise SIP for submission
– Scan for removed files post appraise SIP for submission
– Place in quarantine
– Remove from quarantine
– Extract packages
– Sanitize file and directory names
– Scan for viruses
– Characterize and extract metadata
– Set file permissions
– Appraise SIP for preservation
– Scan for removed files post appraise SIP for preservation
– Create DIP directory
– Normalize
– Approve normalization
– Check for submission documentation
– Move submission documentation into objects directory
– Assign file UUIDs and checksums to submission documentation
– Extract packages in submission documentation
– Sanitize file and directory names in submission documentation
– Scan for viruses in submission documentation
– Characterize and extract metadata in submission documentation
– Normalize submission documentation
– Remove files without PREMIS
– Verify PREMIS checksums
– Compile METS
– Add Dublin Core to METS
– Copy METS to DIP directory
– Generate DIP
– Set file permissions
– Prepare AIP
– Upload DIP
– Store AIP
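To make one of these micro-services concrete, here is a minimal sketch of what an "assign file UUIDs and checksums" step, followed by a later checksum verification, might look like. This is an assumption-laden illustration and not Archivematica's actual code: the function names, the choice of SHA-256, and the in-memory manifest format are all hypothetical.

```python
import hashlib
import uuid
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute a SHA-256 checksum, reading the file in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def assign_uuids_and_checksums(objects_dir):
    """Give every file under objects_dir a random UUID and a checksum.

    Returns a manifest keyed by path relative to objects_dir
    (a hypothetical format chosen for this sketch).
    """
    root = Path(objects_dir)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            manifest[path.relative_to(root).as_posix()] = {
                "uuid": str(uuid.uuid4()),
                "sha256": sha256_of(path),
            }
    return manifest


def verify_checksums(objects_dir, manifest):
    """Return the names of files that are missing or whose checksum changed."""
    root = Path(objects_dir)
    return [
        name
        for name, entry in manifest.items()
        if not (root / name).is_file()
        or sha256_of(root / name) != entry["sha256"]
    ]
```

Keeping each step as a small, independent function that takes a directory and returns a result fits the constraint of architecturally independent components: any one step could be replaced without touching the others.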
Media Type Preservation Plans
Media type | File formats | Preservation format(s) | Access format(s) | Normalization tool
Audio | AC3, AIFF, MP3, WAV, WMA | WAVE (LPCM) | MP3 | FFmpeg
 | PST | MBOX | | readpst
Office Open XML | DOCX, PPTX, XLSX | Original format | PDF for PPTX | OpenOffice
Plain text | TXT | Original format | | None
Portable Document Format | PDF | PDF/A | PDF | Ghostscript
Presentation files | PPT | ODF | PDF | OpenOffice
Raster images | BMP, GIF, JPG, JP2*, PCT, PNG*, PSD, TIFF, TGA | Uncompressed TIFF | JPEG | ImageMagick
Raw camera files/Digital Negative format** | 3FR, ARW, CR2, CRW, DCR, DNG, ERF, KDC, MRW, NEF, ORF, PEF, RAF, RAW, X3F | Original format | JPEG | ImageMagick/UFRaw
Spreadsheets | XLS | ODF | Original format | OpenOffice
Vector images | AI, EPS, SVG | SVG | PDF | Inkscape
Video | AVI, FLV, MOV, MPEG-1, MPEG-2, MPEG-4, SWF, WMV | MPEG-2 | MPG | FFmpeg
Word processing files | DOC, WPD, RTF | ODF | PDF | OpenOffice
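A preservation plan table like this is, in effect, a lookup from file format to a normalization rule. The sketch below encodes a few rows of the table as data. It is a simplified illustration under stated assumptions: the `Plan` type and `plan_for` function are hypothetical names, and matching on the file extension is a shortcut chosen for brevity (the format identification tools named earlier in the deck, such as DROID and JHOVE, exist because extensions alone are not reliable).

```python
from pathlib import Path
from typing import NamedTuple, Optional


class Plan(NamedTuple):
    media_type: str
    preservation_format: str
    access_format: str
    tool: str


# A subset of the rows from the table, keyed by the extensions they cover.
_PLANS = {
    ("ac3", "aiff", "mp3", "wav", "wma"):
        Plan("Audio", "WAVE (LPCM)", "MP3", "FFmpeg"),
    ("pdf",):
        Plan("Portable Document Format", "PDF/A", "PDF", "Ghostscript"),
    ("bmp", "gif", "jpg", "pct", "psd", "tiff", "tga"):
        Plan("Raster images", "Uncompressed TIFF", "JPEG", "ImageMagick"),
    ("ai", "eps", "svg"):
        Plan("Vector images", "SVG", "PDF", "Inkscape"),
    ("doc", "wpd", "rtf"):
        Plan("Word processing files", "ODF", "PDF", "OpenOffice"),
}

# Flatten to one entry per extension for direct lookup.
PLAN_BY_EXTENSION = {ext: plan for exts, plan in _PLANS.items() for ext in exts}


def plan_for(filename) -> Optional[Plan]:
    """Look up the plan for a file by its extension (case-insensitive).

    Returns None when no plan covers the extension.
    """
    ext = Path(filename).suffix.lower().lstrip(".")
    return PLAN_BY_EXTENSION.get(ext)
```

For example, `plan_for("minutes.DOC")` returns the word processing plan (ODF for preservation, PDF for access, OpenOffice as the tool), while an unrecognized extension returns `None` so the caller can decide how to handle it.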
GIS Preservation Questions
– Appropriate formats
– Acceptable losses during migration/normalization
– Availability of normalization software
– Availability of viewing software
– Necessary metadata
Archivematica Collaborators
– Artefactual Systems Inc.
– City of Vancouver Archives
– International Monetary Fund
– University of British Columbia Library
– Rockefeller Archive Center
Documentation Wikis
– Vancouver Digital Archives Project: http://artefactual.com/wiki/index.php?title=Vancouver_Digital_Archives
– Archivematica
– Qubit (ICA-AtoM)