Stewart Rogers Marketing Communications Director Slide 1 © 2010 PDF/A Competence Center, PDF/A for long-term archival What you need to know about PDF/A… DLM Forum 2010 Stewart Rogers Marketing Communications Director Crawford Technologies
Crawford Technologies
- Toronto-based global software company
- Privately held; founded 1995
- Locations: offices in Canada, the UK, the USA and Brazil; VAR and OEM partners across the world
- Product lines: document manipulation and re-engineering, data mining of print streams, print stream conversions, archiving and retrieval, workflow processing
Agenda
- Why are long-term archives necessary?
- What are the requirements for a long-term archive format?
- Why choose PDF?
- What is PDF/A?
- PDF/A migration and archive strategy
Archive Technologies
Archive Questions
- Media lifetime?
  - Paper: hundreds of years
  - Microfiche: dozens of years
  - Magnetic: perhaps a decade?
  - Optical: unknown?
- Reader lifetime?
  - Paper: as long as the language exists
  - Microfiche: decades
  - Magnetic: decades
  - Optical: perhaps 2-3 OS generations
- Key issues in electronic archive and retrieval
  - Obsolete formats
  - Obsolete reader software, with no OS left to run it on
Legal/Regulatory Retention Periods
Business Issues
How do we meet:
- legal and regulatory requirements to hold electronic documents
  - for the mandated length of time?
  - in a cost-effective manner?
  - with a defensible plan to manage them?
- requests, when required, to reproduce the 'original' well enough to satisfy a court of law?
Components of a Long-Term Archive System
- Storage format
- Archival system (HW and SW)
- Retrieval/display software
- Processes and procedures
Ideal Storage Format Requirements
- Accessible: no encryption, no proprietary formats
- Platform, OS and device independent: can be read, understood and displayed on many HW/SW platforms
- Published specification: an open, accepted specification controlled by a standards organisation
- Self-contained: no external resources needed, including fonts
- Transparent: can easily be read and parsed with non-proprietary tools
- Widely distributed: accepted by both industry and governments
Candidates?
- Raster/TIFF
  - Broad acceptance, but obsolete
  - Loss of information: no text, structure or individual graphics
  - Creation from current systems involves throwing away information
- Vendor formats
  - Proprietary formats
  - Future unsure
  - Not designed to be self-contained
- XML
  - Exactly duplicating the look-and-feel is difficult
  - XSL-FO still not widely accepted
  - Too many DTDs and schemas
- PDF
  - Suitable if tightly constrained
  - Already widely accepted
PDF/A
- PDF archive format specification
- PDF/A: a standard, stabilised archive format
- Retains exactly the same look-and-feel: "a file format based on PDF which provides a mechanism for representing electronic documents in a manner that preserves their visual appearance over time, independent of the tools and systems used for creating, storing or rendering the files."
- The first step in a long-term archival system
- Specifies only the format, not the archive system, HW, SW or process
PDF/A Specification
- Effort started in 2002 by:
  - AIIM (Association for Information and Image Management)
  - NPES (National Printing Equipment Association)
  - Administrative Office of the U.S. Courts
- ISO standard, 2005: "Document management – Electronic document file format for long-term preservation – Part 1: Use of PDF 1.4 (PDF/A-1)"
- Today AIIM leads the work on the ISO standard
- The PDF/A Competence Center is the industry association
PDF/A Levels (3 currently)
PDF/A-1a (Level A conformance)
- Full compliance with the currently approved PDF/A standard (ISO, Part 1)
- PDF/A-1a ensures the preservation of a document's logical structure and of its content text stream in natural reading order.
- Text extraction is especially important when the document is displayed:
  - on a mobile device (for example, a PDA), where the text must be reorganised to fit the limited screen size (re-flow)
  - on a device that provides access for the disabled, where the text must be read appropriately (e.g. table headers or multiple columns)
- This feature is also known as "Tagged PDF".
PDF/A Levels (3 currently)
PDF/A-1b (Level B conformance)
- Minimal compliance, ensuring that the rendered visual appearance of the file is reproducible over the long term.
- PDF/A-1b ensures that the text (and other content) can be displayed correctly (e.g. on a computer monitor), but does not guarantee that extracted text will be legible or comprehensible.
- It therefore does not guarantee usability on mobile devices or compliance with disability access regulations.
PDF/A Levels (3 currently)
PDF/A Part 2
- Based on selected functionality from PDF 1.5, 1.6 and 1.7
- Backwards compatible but not forward compatible
- Runs in parallel with the Part 1 formats
- PDF/A-2 is based on ISO PDF and adds:
  - JPEG2000: better compression and quality (scans)
  - Embedded PDF/A files via Collections
  - Transparency
  - PDF "Optional Content" (layers)
  - OpenType fonts
  - A new conformance level, 2u ("u" for Unicode)
  - Object-level XMP metadata
  - Further innovations for developers
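The conformance levels above are claimed inside the file itself: a PDF/A file states its part and level in its XMP metadata through the pdfaid:part and pdfaid:conformance properties. The Python sketch below reads that claim from the raw file bytes. It assumes the XMP packet is stored uncompressed (to our understanding PDF/A-1 forbids a filter on the document metadata stream), and the function name declared_pdfa_level is hypothetical. It only reports what the file says about itself; it does not verify compliance.

```python
import re
from typing import Optional

# Assumption: the XMP packet is stored uncompressed in the file. Both the
# attribute form (pdfaid:part="1") and the element form
# (<pdfaid:part>1</pdfaid:part>) of the properties are accepted.
_PART = re.compile(rb"pdfaid:part\s*(?:=\s*[\x22\x27]|>)\s*(\d+)")
_CONFORMANCE = re.compile(rb"pdfaid:conformance\s*(?:=\s*[\x22\x27]|>)\s*([A-Za-z])")


def declared_pdfa_level(pdf_bytes: bytes) -> Optional[str]:
    """Return the claimed level, e.g. "1b" or "2u", or None if none is declared."""
    part = _PART.search(pdf_bytes)
    if part is None:
        return None
    level = part.group(1).decode("ascii")
    conformance = _CONFORMANCE.search(pdf_bytes)
    if conformance is not None:
        # Conformance letters are written A, B or U; normalise to lower case.
        level += conformance.group(1).decode("ascii").lower()
    return level


if __name__ == "__main__":
    xmp = b'<rdf:Description pdfaid:part="1" pdfaid:conformance="B"/>'
    print(declared_pdfa_level(xmp))  # 1b
```

A declared level is only a claim: a file can announce 1b and still fail verification, which is why compliance checking remains a separate step.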
PDF/A Internals
- Must be totally self-contained
  - No external resources, pointers or links to external content
  - Fonts must be embedded, even the Acrobat Base 14 fonts
- Some functionality is forbidden
  - Audio and video media inclusions
  - Encryption and LZW compression
  - Transparency (in PDF/A-1)
- But some critical functions are retained
  - Digital signatures
  - Metadata
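A very rough first pass over the forbidden features listed above can be made by searching the raw bytes for the PDF names that usually introduce them: /Encrypt in the trailer, the /LZWDecode filter, /Sound and /Movie for audio and video, and a /Transparency group. The Python sketch below does only that; the feature table and the function name forbidden_features are illustrative assumptions. It is no substitute for a real PDF/A verification tool, because names can be written with #xx escapes or stored inside compressed object streams, and it says nothing about fonts or external references.

```python
import re

# Hypothetical table: PDF names that usually signal features the slide lists
# as forbidden in PDF/A-1 (transparency becomes legal again in PDF/A-2).
FORBIDDEN_NAMES = {
    "encryption": b"/Encrypt",
    "LZW compression": b"/LZWDecode",
    "audio": b"/Sound",
    "video": b"/Movie",
    "transparency group": b"/Transparency",
}

# A PDF name ends at whitespace or at a delimiter character, so "/Encrypt"
# must not match the longer name "/EncryptMetadata".
_END_OF_NAME = rb"(?![^\s/<>\[\]()%{}])"


def forbidden_features(pdf_bytes: bytes) -> list[str]:
    """Return the features whose marker names occur in the raw PDF bytes.

    A crude byte-level heuristic: names written with #xx escapes or stored
    inside compressed object streams are not detected.
    """
    found = []
    for feature, name in FORBIDDEN_NAMES.items():
        if re.search(re.escape(name) + _END_OF_NAME, pdf_bytes):
            found.append(feature)
    return found


if __name__ == "__main__":
    sample = b"trailer << /Size 5 /Encrypt 4 0 R >> << /Filter /LZWDecode >>"
    print(forbidden_features(sample))  # ['encryption', 'LZW compression']
```

An empty result means only that none of these markers was found in plain form; it does not mean the file is PDF/A compliant.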
PDF/A Products
Types of PDF/A products:
- PDF/A compliance verification
  - Acrobat 9.1 added significantly more detailed checking and fixed some issues
- PDF/A creation: creating PDF/A files from scratch
- PDF/A conversion: converting existing files into PDF/A
Target processes:
- Workstation (low volume)
- Enterprise (high volume, production)
Most solutions produce PDF/A-1b; producing PDF/A-1a requires a much more sophisticated production process.
Migration to a PDF/A Archive
- For existing archives
  - Requires conversion
  - What is the migration process?
- For new archives
  - Requires new processes and products
- Consider parallel processes for enterprises
  - PDF for short-term archives: customer viewing, advanced marketing
  - PDF/A for the long-term archive of record
- Downside: PDF/A files are larger
  - Not as relevant for small archives
  - Needs to be considered for enterprise archives
Location of Change Affects Cost
- Changing at the application level
  - Most expensive
  - Highest level of maintenance
  - Highest level of disruption
- Changing using middleware
  - Less expensive than application changes
  - Lower level of maintenance
  - Higher level of documentation/skills required
  - Lower level of disruption
- The closer to the print stream the change is made, the more cost-effective the move to PDF/A becomes.
Archive Strategy
- PDF/A is one component
- Also requires:
  - Archive system design and implementation
  - Corporate processes and procedures
  - Detailed knowledge of what is to be archived: current production processes, future production processes, legacy data and documents
Summary
- The PDF/A format meets the needs of a long-term archive, both functional and legal
- PDF/A is a format only
- A long-term archive also requires planning and implementing an overall archive strategy
For More Information
Stewart Rogers, Marketing Communications Director, Crawford Technologies
UK Chapter Seminar
Join us on November 19th, 2010 in London for our first UK Chapter Seminar: pdfauk.eventbrite.com