Published by Emma Knight. Modified over 8 years ago.
1
Thinking Long Term - Archive Strategies for Alfresco
Nathan McMinn, Remote Service Engineer, Alfresco
Chetan Lalye, Senior Software Architect, Agilent Technologies
2
Why do I need a long-term strategy?
3
4
Defining an Archive – What it is
5
Defining an Archive – What it isn’t
6
What should I archive?
What is driving your archive requirement?
- Regulatory requirements? Sarbanes-Oxley / HIPAA / FDA CFR
- Business continuity requirements?
- Cultural preservation?
7
What should I archive?
- Document content: Do you have multiple content streams?
- Document metadata: Do you need it all, or will a subset suffice? What metadata is required to locate a document?
- Related documents: By association? By related search? How do you identify what is related?
- Version history
- Audit trail: How will you present the audit trail in a self-contained manner?
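For the audit-trail question, one self-contained option is to flatten the audit entries into a plain CSV file stored next to the document, so the history stays readable without Alfresco. A minimal sketch, assuming the entries have already been exported as dictionaries; the column names (`time`, `user`, `action`, `details`) and the function name are hypothetical and would need to be mapped from whatever your audit configuration actually records.

```python
import csv
import io
from typing import Iterable, Mapping

# Hypothetical columns; map them from whatever your audit export contains.
AUDIT_COLUMNS = ["time", "user", "action", "details"]


def audit_trail_to_csv(entries: Iterable[Mapping[str, str]]) -> str:
    """Render audit entries as CSV text with a header row.

    Entries are sorted by their ISO 8601 timestamp string so the file reads
    chronologically (assumes all timestamps share one format and time zone).
    Missing fields become empty cells.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=AUDIT_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for entry in sorted(entries, key=lambda e: e.get("time", "")):
        writer.writerow({col: entry.get(col, "") for col in AUDIT_COLUMNS})
    return buffer.getvalue()
```

CSV is chosen here because it is human-readable text with broad tool support, in line with the format advice that follows; JSON would serve equally well.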
8
How do I prepare it for the long term?
What good is the data if you can’t view it?
Say yes to:
- Well-defined formats
- Open specifications
- Broad vendor support
- Stable, strong governing bodies (ISO, etc.)
- Human-readable text
Say no to:
- Single-vendor-specific formats
- Closed specifications
- Binary-only data
- Patent- or IP-encumbered formats
9
XML Data
- Verbosity is your friend: usually considered a weakness of XML, but for long-term viability a verbose, descriptive format is desirable.
- Don’t forget the DTD / Schema: external DTD/Schema references may disappear, so grab a local copy when packaging for archive.
- Prefer multi-vendor standards: AniML (Analytical Markup Language), and many others (domain dependent).
- Avoid embedded binary data.
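Before external DTD/Schema references can be copied locally, they have to be found. A minimal sketch using Python's standard library, assuming the document parses without resolving external entities: it collects the DOCTYPE system identifier plus any `xsi:schemaLocation` / `xsi:noNamespaceSchemaLocation` values so they can be downloaded and stored with the archive package. The function name is hypothetical, single-quoted DOCTYPE identifiers are not handled, and rewriting the references to point at the local copies is not shown.

```python
import re
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# System identifier of <!DOCTYPE x SYSTEM "uri"> or <!DOCTYPE x PUBLIC "id" "uri">
# (double-quoted identifiers only).
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+\S+\s+(?:SYSTEM\s+|PUBLIC\s+"[^"]*"\s+)"([^"]+)"')


def external_schema_refs(xml_text: str) -> list[str]:
    """Return DTD and XML Schema locations referenced by an XML document."""
    refs = []
    doctype = DOCTYPE_RE.search(xml_text)
    if doctype:
        refs.append(doctype.group(1))
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # xsi:schemaLocation holds whitespace-separated (namespace, location) pairs;
        # every second token is a location.
        pairs = elem.get(f"{{{XSI_NS}}}schemaLocation", "").split()
        refs.extend(pairs[1::2])
        no_ns = elem.get(f"{{{XSI_NS}}}noNamespaceSchemaLocation")
        if no_ns:
            refs.append(no_ns)
    return list(dict.fromkeys(refs))  # de-duplicate, keeping first-seen order
```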
10
PDF and PDF/A
- PDF is an excellent choice for long-term archiving of documents (with a few caveats).
- PDF/A adds restrictions designed to make long-term preservation more reliable:
  - No linked fonts; embedded fonts only
  - No audio / video content (archive it independently if required)
  - Device-independent color space definitions
  - No encryption
  - Many more, depending on conformance level
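A quick way to triage PDFs before archiving is to check whether each file at least claims PDF/A conformance. The sketch below (hypothetical function names) looks for the `pdfaid:part` and `pdfaid:conformance` identification in the file's XMP metadata and for an `/Encrypt` entry. It is a byte-level heuristic that assumes the metadata stream is stored uncompressed; it reads the claim only and does not verify conformance, which is the job of a dedicated validator such as veraPDF.

```python
import re
from typing import Optional

# XMP records PDF/A identification either as attributes (pdfaid:part="1")
# or as elements (<pdfaid:part>1</pdfaid:part>); match both forms.
_PART_RE = re.compile(rb"pdfaid:part\s*(?:=\s*[\"']|>)\s*(\d)")
_CONF_RE = re.compile(rb"pdfaid:conformance\s*(?:=\s*[\"']|>)\s*([A-Za-z])")


def claimed_pdfa_level(pdf_bytes: bytes) -> Optional[str]:
    """Return a label such as '1B' if the file claims PDF/A, else None."""
    part = _PART_RE.search(pdf_bytes)
    if not part:
        return None
    level = part.group(1).decode("ascii")
    conf = _CONF_RE.search(pdf_bytes)
    if conf:  # some PDF/A parts have no conformance letter
        level += conf.group(1).decode("ascii").upper()
    return level


def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Rough check for an /Encrypt entry; PDF/A does not allow encryption."""
    return b"/Encrypt" in pdf_bytes
```

Files that return `None`, or that look encrypted, are candidates for conversion before they go into the archive.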
11
PDF/A Demo
12
Images, Audio and Video
- Most image formats are open specifications with broad support; choose the one that makes the most sense for your type of content.
- Use unencumbered video formats wherever possible, trading off size against quality as required.
- Use unencumbered, open audio formats where possible.
13
Exporting
Most archive mechanisms will require getting the files and related artifacts OUT of Alfresco. Options include:
- ACP (Alfresco Content Package)
- Bulk Export
- Download as Zip
- Custom Action
- ArchiveService
14
Packaging
Not all archives will require packaging. Packaging stores related artifacts together:
- Document metadata file
- Audit trail file
- Related documents
The packaging type depends on the archive target:
- Simple folder
- Zip file
- Cloud container
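One possible zip layout for a single document is sketched below, assuming the content, the selected metadata and the audit CSV are already in memory. The file names (`metadata.json`, `audit.csv`, `related/`, `manifest-sha256.txt`) and the function name are hypothetical choices, not an Alfresco format. The SHA-256 manifest is an optional addition, written in the same layout `sha256sum` produces, so the package can be checked for corruption after retrieval.

```python
import hashlib
import io
import json
import zipfile
from typing import Mapping, Optional


def build_archive_package(
    doc_name: str,
    content: bytes,
    metadata: Mapping[str, object],
    audit_csv: str,
    related: Optional[Mapping[str, bytes]] = None,
) -> bytes:
    """Bundle a document and its related artifacts into an in-memory zip."""
    files = {
        doc_name: content,
        # Human-readable, sorted JSON keeps the metadata easy to inspect later.
        "metadata.json": json.dumps(metadata, indent=2, sort_keys=True, ensure_ascii=False).encode("utf-8"),
        "audit.csv": audit_csv.encode("utf-8"),
    }
    for name, data in (related or {}).items():
        files[f"related/{name}"] = data
    # One "<sha256>  <path>" line per file, sorted by path.
    manifest = "".join(
        f"{hashlib.sha256(data).hexdigest()}  {path}\n" for path, data in sorted(files.items())
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path, data in files.items():
            zf.writestr(path, data)
        zf.writestr("manifest-sha256.txt", manifest)
    return buffer.getvalue()
```

For a "simple folder" target, the same files would be written to a directory instead of a zip; for a cloud container, the returned bytes can be uploaded as one object.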
15
Now where do I put it?
16
Archive In Place
Pros:
- Simplest to implement
- Good support in Alfresco out of the box (OOTB)
- Easy to move to inexpensive storage (ContentStoreSelector)
- May be as simple as a marker aspect, property or path
- No export required
Cons:
- Remains in Alfresco, contributing to DB size and potentially affecting performance
- Indexing in a separate store requires SOLR changes
- Backups of the live repository grow with the archive
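With archive in place, the archiving step can reduce to deciding which nodes qualify and tagging them. A minimal sketch of the selection logic, assuming node records have already been fetched. `arch:archived` is a hypothetical marker aspect from a custom model, the retention period is an arbitrary parameter, and the returned dictionaries are a hypothetical update format to be applied through whatever mechanism you use (script, custom action, API). The `cm:storeSelector` aspect and `cm:storeName` property are how the Enterprise Content Store Selector is typically driven, but verify the names for your Alfresco version; `archiveStore` is a placeholder store name.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Any, Iterable

# Hypothetical marker aspect, defined in your own custom content model.
ARCHIVE_ASPECT = "arch:archived"
# Enterprise Content Store Selector aspect/property (verify for your version).
STORE_SELECTOR_ASPECT = "cm:storeSelector"
STORE_NAME_PROP = "cm:storeName"
# Placeholder: must match a store name configured in your repository.
ARCHIVE_STORE = "archiveStore"


@dataclass
class Node:
    """Minimal view of a repository node (hypothetical shape)."""

    node_id: str
    modified: datetime
    aspects: set[str] = field(default_factory=set)


def plan_archive_in_place(
    nodes: Iterable[Node], now: datetime, retention_days: int
) -> list[dict[str, Any]]:
    """Return update instructions for nodes not modified within the retention window."""
    cutoff = now - timedelta(days=retention_days)
    updates = []
    for node in nodes:
        # Skip nodes that are already archived or were modified recently.
        if ARCHIVE_ASPECT in node.aspects or node.modified >= cutoff:
            continue
        updates.append(
            {
                "nodeId": node.node_id,
                "addAspects": [ARCHIVE_ASPECT, STORE_SELECTOR_ASPECT],
                "properties": {STORE_NAME_PROP: ARCHIVE_STORE},
            }
        )
    return updates
```

Keeping the selection separate from the update makes it easy to review a dry-run list before anything is moved to cheaper storage.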
17
Archive Repository
Pros:
- Active repository is leaner and faster
- Potentially no need to export to file
- Alfresco-to-Alfresco transfers supported OOTB
Cons:
- Cost (separate repository, separate server)
- Complexity
- May need to develop a connector for the remote repository, if it is not Alfresco
18
Archive To Media
Pros:
- Repurpose existing backup equipment
- Media management and rotation schedules are well understood
- Easy to move very large volumes of data off site
Cons:
- Media degrades over time
- Requires preserving both software AND hardware
- Bulky
- Labor intensive
- Expensive
- Requires exporting to a file
19
Archive To Cloud
Pros:
- Cost
- Simplicity
- Transfer of risk to a third party
Cons:
- Loss of control
- Long retrieval times for some services (around 4 hours for AWS Glacier)
20
Glacier Direct
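When uploading to Glacier directly (rather than via S3), each archive upload carries a SHA-256 tree hash, and recording that value in your own catalog makes it possible to check the data after a retrieval. A sketch of the tree-hash computation as AWS documents it (1 MiB chunks, hashed pairwise up to a single root); the function name is hypothetical, empty payloads are deliberately rejected, and some SDKs compute this value for you on upload.

```python
import hashlib

MIB = 1024 * 1024  # Glacier tree hashes use 1 MiB leaf chunks


def glacier_tree_hash(data: bytes) -> str:
    """Return the hex SHA-256 tree hash of ``data``.

    Leaves are SHA-256 digests of consecutive 1 MiB chunks (the last may be
    shorter). Adjacent digests are concatenated and hashed pairwise, level by
    level; an unpaired digest at the end of a level is promoted unchanged.
    """
    if not data:
        raise ValueError("this sketch does not handle empty payloads")
    level = [hashlib.sha256(data[i:i + MIB]).digest() for i in range(0, len(data), MIB)]
    while len(level) > 1:
        paired = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                paired.append(hashlib.sha256(level[i] + level[i + 1]).digest())
            else:
                paired.append(level[i])  # odd one out moves up a level
        level = paired
    return level[0].hex()
```

For payloads of 1 MiB or less the tree hash is simply the SHA-256 of the data, which is a convenient sanity check.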
21
AWS Archive Demo
22
Glacier via S3 (Cloud Deployment)
23
Glacier via S3 (Hybrid Model)
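For the S3-based models, one common approach is to write archive packages to an S3 bucket and let a lifecycle rule transition them to the GLACIER storage class; reading an object back then requires a restore request first. A sketch using the boto3 S3 client API, where the bucket, prefix, rule ID and retention values are placeholders and the helper names are hypothetical. The client is passed in so the functions can be exercised without AWS credentials.

```python
from typing import Any, Dict


def glacier_lifecycle_config(prefix: str, days: int, rule_id: str = "archive-to-glacier") -> Dict[str, Any]:
    """Build an S3 lifecycle configuration moving objects under `prefix` to GLACIER."""
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


def apply_lifecycle(s3_client: Any, bucket: str, prefix: str, days: int) -> None:
    """Install the rule; this REPLACES any existing lifecycle configuration on the bucket."""
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=glacier_lifecycle_config(prefix, days),
    )


def request_restore(s3_client: Any, bucket: str, key: str, days: int = 7, tier: str = "Standard") -> None:
    """Ask S3 to stage a temporary readable copy of a GLACIER object for `days` days."""
    s3_client.restore_object(
        Bucket=bucket,
        Key=key,
        RestoreRequest={"Days": days, "GlacierJobParameters": {"Tier": tier}},
    )
```

In a real deployment the client would come from `boto3.client("s3")`. A Standard-tier restore typically takes hours rather than minutes, which is the retrieval-time trade-off listed for Archive To Cloud.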
24
References and Additional Reading
- US Library of Congress, Sustainability of Digital Formats: http://www.digitalpreservation.gov/formats/
- Blog of "The Long Now Foundation": http://blog.longnow.org/category/digital-dark-age/