Thinking Long Term - Archive Strategies for Alfresco
Nathan McMinn, Remote Service Engineer, Alfresco
Chetan Lalye, Senior Software Architect, Agilent Technologies

Why do I need a long term strategy?

Defining an Archive – What it is

Defining an Archive – What it isn’t

What should I archive?
What is driving your archive requirement?
- Regulatory requirements? (Sarbanes-Oxley / HIPAA / FDA CFR)
- Business continuity requirements?
- Cultural preservation?

What should I archive?
- Document content
  - Do you have multiple content streams?
- Document metadata
  - Do you need it all, or will a subset suffice?
  - What metadata is required to locate a document?
- Related documents
  - By association? By related search?
  - How do you identify what is related?
- Version history
  - Audit trail? How will you present the audit trail in a self-contained manner?

How do I prepare it for the long term?
What good is the data if you can’t view it?
Say yes to:
- Well-defined formats
- Open specifications
- Broad vendor support
- Stable, strong governing bodies (ISO, etc.)
- Human-readable text
Say no to:
- Single-vendor formats
- Closed specifications
- Binary-only data
- Patent- or IP-encumbered formats

XML Data
- Verbosity is your friend: it is usually considered a weakness of XML, but for long-term viability a verbose, descriptive format is desirable.
- Don’t forget the DTD / Schema: external DTD/Schema references may disappear, so grab a local copy when packaging for archive.
- Prefer multi-vendor standards: AnIML (Analytical Information Markup Language) and many others (domain dependent).
- Avoid embedded binary data.
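
As a rough illustration of the "grab a local copy" advice, the sketch below lists the external DTD and XML Schema locations that a document points at, so they can be fetched and stored in the package. The function name, the sample document and the example.org URLs are made up for this example; it only handles the common DOCTYPE and `xsi:` attribute forms.

```python
import re
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

def external_references(xml_text):
    """Return the DTD and XML Schema locations referenced by an XML document."""
    refs = []
    # DOCTYPE system identifier: <!DOCTYPE root SYSTEM "..."> or PUBLIC "..." "..."
    doctype = re.search(
        r'<!DOCTYPE\s+\S+\s+(?:SYSTEM|PUBLIC\s+"[^"]*")\s+"([^"]+)"', xml_text
    )
    if doctype:
        refs.append(doctype.group(1))
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # schemaLocation holds whitespace-separated (namespace, location) pairs
        pairs = elem.get(f"{{{XSI_NS}}}schemaLocation", "").split()
        refs.extend(pairs[1::2])
        no_ns = elem.get(f"{{{XSI_NS}}}noNamespaceSchemaLocation")
        if no_ns:
            refs.append(no_ns)
    return refs

# Hypothetical sample document, used only to exercise the function.
SAMPLE = """<!DOCTYPE experiment SYSTEM "http://example.org/dtd/experiment.dtd">
<experiment xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:noNamespaceSchemaLocation="http://example.org/xsd/experiment.xsd">
  <result units="mg">12.5</result>
</experiment>"""

for location in external_references(SAMPLE):
    print("archive a local copy of:", location)
```

Each reported location would then be downloaded and stored next to the XML file in the archive package.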

PDF and PDF/A
PDF is an excellent choice for long-term archiving of documents (with a few caveats).
PDF/A adds restrictions designed to make long-term preservation more reliable:
- No linked fonts, embedded only
- No audio / video content (archive it independently if required)
- Device-independent color space definition
- No encryption
- Many more, depending on conformance level

PDF/A Demo

Images, Audio and Video
- Most image formats are open specifications with broad support; choose the one that makes the most sense for your type of content.
- Use unencumbered video formats wherever possible, trading off size against quality as required.
- Use unencumbered, open audio formats where possible.

Exporting
Most archive mechanisms will require getting the files and related artifacts OUT of Alfresco:
- ACP
- Bulk Export
- Download as Zip
- Custom Action
- ArchiveService

Packaging
Not all archives will require packaging.
Storing of related artifacts together:
- Document
- Metadata file
- Audit trail file
- Related documents
Packaging type depends on archive target:
- Simple folder
- Zip file
- Cloud container
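
A minimal sketch of the zip-file variant, assuming the content files have already been exported from Alfresco and the metadata and audit entries are available as plain Python values. The layout (`content/`, `metadata.json`, `audit.txt`) and the function name are illustrative choices, not anything Alfresco prescribes.

```python
import json
import zipfile
from pathlib import Path

def build_archive_package(package_path, content_files, metadata, audit_entries):
    """Write one self-contained zip holding content, metadata and audit trail."""
    with zipfile.ZipFile(package_path, "w", compression=zipfile.ZIP_DEFLATED) as package:
        for source in content_files:
            source = Path(source)
            # Keep exported documents together under a single folder in the zip
            package.write(source, arcname=f"content/{source.name}")
        # Metadata and audit trail are stored as human-readable text files
        package.writestr("metadata.json", json.dumps(metadata, indent=2, sort_keys=True))
        package.writestr("audit.txt", "\n".join(audit_entries) + "\n")
    return package_path
```

Calling `build_archive_package("report-archive.zip", ["report.pdf"], {"cm:title": "Report"}, ["2014-01-01 created"])` would produce a zip with the document, its metadata and its audit trail side by side; the same layout works unchanged for the simple-folder target.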

Now where do I put it?

Archive In Place
Pros:
- Simplest to implement
- Good support in Alfresco OOTB
- Easy to move to inexpensive storage (ContentStoreSelector)
- May be as simple as a marker aspect, property or path
- No export required
Cons:
- Remains in Alfresco, contributing to DB size and potentially affecting performance
- Indexing in a separate store requires SOLR changes
- Backups of live repository grow with archive
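
To make the marker-aspect and ContentStoreSelector points concrete, here is a sketch that builds the JSON body for a node update through the Alfresco public REST API (`PUT .../nodes/{nodeId}`, available in later Alfresco releases than this talk may have targeted). Assumptions: `acme:archived` is a hypothetical marker aspect from a custom model, `archiveStore` is a hypothetical store name that would have to match a store configured in the content store selector, and the node's existing aspects have been fetched beforehand because the update replaces the whole aspect list.

```python
import json

def archive_in_place_update(existing_aspects, store_name, marker_aspect="acme:archived"):
    """Build the update body that marks a node as archived and selects a store."""
    aspects = list(existing_aspects)
    # aspectNames replaces the full list, so keep what is already on the node
    for aspect in (marker_aspect, "cm:storeSelector"):
        if aspect not in aspects:
            aspects.append(aspect)
    return {
        "aspectNames": aspects,
        # cm:storeName tells the content store selector which store to use
        "properties": {"cm:storeName": store_name},
    }

body = archive_in_place_update(["cm:auditable", "cm:versionable"], "archiveStore")
print(json.dumps(body, indent=2))
```

The printed body is what would be sent as the request payload; authentication and the HTTP call itself are left out of the sketch.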

Archive Repository
Pros:
- Active repository is leaner, faster
- Potentially no need to export to file
- Alfresco -> Alfresco transfers supported OOTB
Cons:
- Cost (separate repository, separate server)
- Complexity
- May need to develop a connector for the remote repository, if it is not Alfresco

Archive To Media
Pros:
- Repurpose existing backup equipment
- Media management and rotation schedules are well understood
- Easy to move very large volumes of data off site
Cons:
- Media degrades over time
- Requires preserving both software AND hardware
- Bulky
- Labor intensive
- Expensive
- Requires exporting to a file

Archive To Cloud
Pros:
- Cost
- Simplicity
- Transfer of risk to third party
Cons:
- Loss of control
- Long retrieval times for some services (< 4 hrs for AWS Glacier)

Glacier Direct

AWS Archive Demo

Glacier via S3 (Cloud Deployment)
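
The diagram for this slide is not part of the transcript. One common way to route content to Glacier through S3 is a bucket lifecycle rule, sketched below under that assumption. The bucket name, the `archive/` prefix and the 30-day delay are placeholders; `apply_lifecycle` needs the third-party `boto3` package and AWS credentials, so only the configuration builder runs on its own.

```python
def glacier_lifecycle(prefix, days_before_transition):
    """Build an S3 lifecycle configuration that moves objects to Glacier."""
    return {
        "Rules": [
            {
                "ID": f"{prefix.strip('/')}-to-glacier",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days_before_transition, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }

def apply_lifecycle(bucket, configuration):
    """Send the configuration to S3 (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without the SDK
    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=configuration
    )

config = glacier_lifecycle("archive/", 30)
# apply_lifecycle("my-alfresco-content-bucket", config)  # placeholder bucket name
```

With a rule like this, anything written under the chosen prefix stays in regular S3 storage for the stated number of days and is then transitioned to Glacier by AWS, without further action from the repository.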

Glacier via S3 (Hybrid Model)

References and Additional Reading
- US Library of Congress: Sustainability of Digital Formats
- Blog of “The Long Now Foundation”