PDF and Long Term Preservation May 17, 2005 Susan J. Sullivan, CRM


Introduction
Today’s presentation will discuss NARA’s work to address the long term preservation of electronic documents in Portable Document Format (PDF)
– Explain why long term preservation of electronic documents in PDF is an issue
– Describe the draft PDF/A ISO Standard in the context of NARA’s PDF Transfer Guidance for permanent records in PDF, including NARA’s expectations for PDF/A
– Explain the roles of both PDF/A and the PDF Transfer Guidance in Federal recordkeeping
– Provide an overview of PDF/A and its status in the ISO Process
– Quiz at the end (group participation)

Background – Wide Use of PDF
PDF is a digital format that electronically reproduces the visual appearance of documents, whether they are:
– Converted from other electronic formats, or
– Digitized from paper or microform
Businesses, governments, libraries, archives, and other institutions and individuals around the world use PDF to:
– Collect and disseminate information over the Internet,
– Store electronic records, and/or
– Make scanned images searchable by embedding OCR’d text.
As a result, large bodies of important information are maintained in PDF.

Background – PDF Not a Suitable Archival Format
PDF itself is not suitable as an archival format.
– Adobe is under no obligation to continue publishing the specification for future versions
– Can include features incompatible with current archival requirements
   – Encryption
   – Embedded files
– PDF documents are not necessarily self-contained
   – Can depend on system fonts and other content drawn from outside the file
– Multiple PDF development tools on the market
   – Inconsistency in the file format (not all PDFs are created equal)
Long-term solution needed to ensure that digital PDF documents remain accessible for long periods of time
– Permanent archival records, in some cases
– Administrative Office of U.S. Courts initiated idea for PDF/A

Background – Example Business Case for Long Term Preservation of PDF
Administrative Office of the U.S. Courts (AOUSC)
– Uses PDF as the electronic format for Electronic Case Filing System
– System accepts filings and provides access to filed PDF documents over the Internet
– Many AOUSC files must be maintained for long periods of time (e.g., 40 years)
   – Some will be transferred to the National Archives for permanent retention
– Future use of and access to the AOUSC’s PDF documents depends on maintaining the ability to reproduce their visual appearance and other properties over the long term (i.e., across multiple generations of technology)

Background - How NARA is Addressing PDF
Issued PDF Transfer Guidance
NARA partnered with Federal agencies and issued guidance allowing transfer of permanent records in PDF to NARA (March 2003)
– Part of Electronic Records Management E-Gov Initiative
– Agency partners identified PDF as one of six priority records transfer formats
Participating in PDF/A ISO Standard Development
NARA is participating in PDF/A development:
– To influence the process so that PDF/A compliant records can be preserved by NARA over the long term, and
– To provide information used in developing/maintaining NARA guidance for transferring permanent records in PDF

Background - Transfer Format versus File Format
Goal: To ensure that valuable electronic information in PDF is not lost
Purpose:
Transfer Format - NARA’s PDF Transfer Guidance
– Specifies requirements for transferring permanent records in PDF to NARA
– Applies to existing and future records in PDF so that NARA can accept and process these records
File Format - The PDF/A ISO Standard (PDF/A)
– Specifies a file format, based on PDF, that is more suitable than PDF for long term preservation
– Will allow PDF records to be maintained longer as PDF (e.g., within agencies)

Scope and Usage - NARA’s PDF Transfer Guidance
Scope
– Applies to records scheduled as permanent
– Supplements existing Federal regulations
– Covers existing and future electronic records meeting transfer requirements, but...
   – Unique circumstances: NARA will work with agencies through their Appraisal Archivist to ensure that valuable electronic records are not lost
Usage
– Agencies will use NARA’s PDF Transfer Guidance to
   – Transfer existing permanent PDF records to NARA

Scope and Usage - PDF/A ISO Standard
Scope
– Defines a file format, based on PDF, that preserves the visual appearance of electronic documents over time
   – Provides a framework for recording and embedding metadata within PDF files
   – Defines a framework for representing the logical structure and other semantic information of electronic documents within PDF/A files
Usage
– Vendors will use the PDF/A Standard to develop applications that read, write, and otherwise process conforming PDF/A files
– Agencies will use PDF/A applications to create and process PDF/A conformant files
   – As part of their strategy for long term preservation of electronic records
   – In conjunction with PDF transfer guidance for transferring permanent records to NARA, as applicable

Scope and Usage - Summary
NARA’s PDF Transfer Guidance
– Applies to records in PDF scheduled as permanent
– Incorporates file format(s) (e.g., PDF)
– Incorporates quality criteria, laws and regulations, transfer documentation, NARA contact information
PDF/A ISO Standard
– Addresses one aspect of the long term preservation of electronic records in PDF (i.e., file format)
– Should be used as one component of an organization’s electronic archival environment
– Implementation depends on:
   – Records management policies and procedures
   – Additional requirements and conditions necessary to ensure the persistence of electronic documents over time (e.g., PDF Transfer Guidance)
   – Quality assurance processes necessary to verify conformance with requirements

Requirements - PDF/A and NARA’s PDF Transfer Guidance
Embedded fonts
PDF/A and NARA’s PDF Transfer Guidance both require that all referenced fonts be embedded
– For documents created before 4/1/04, NARA accepts PDFs that do not have all fonts embedded (i.e., base 14 - resident in operating system)
Encryption
PDF/A and NARA’s PDF Transfer Guidance both prohibit encryption
– For documents created before 4/1/04, NARA accepts PDFs with encryption that does not prevent opening, viewing, printing
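The font and encryption rules on this slide, including the exceptions for older documents, can be read as a small decision procedure. The Python below is a minimal sketch of that reading, not NARA's actual acceptance logic. The `PdfRecord` class, its field names, and the `transfer_issues` function are hypothetical, "4/1/04" is assumed to mean April 1, 2004, and the font exception is assumed to apply only when the unembedded fonts are all base 14 fonts.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

# Assumption: "4/1/04" on the slide is a US-style date, April 1, 2004.
CUTOFF = date(2004, 4, 1)


@dataclass
class PdfRecord:
    """Hypothetical summary of the properties the slide talks about."""
    created: date
    all_fonts_embedded: bool
    encrypted: bool
    # True when every font left unembedded is one of the base 14 fonts.
    only_base14_unembedded: bool = False
    # True when encryption prevents opening, viewing, or printing.
    encryption_blocks_access: bool = False


def transfer_issues(rec: PdfRecord) -> List[str]:
    """Return the slide's font/encryption objections for one record."""
    issues = []
    legacy = rec.created < CUTOFF  # older documents get the exceptions

    if not rec.all_fonts_embedded:
        # Exception as read here: legacy documents whose only missing
        # fonts are base 14 fonts are still accepted.
        if not (legacy and rec.only_base14_unembedded):
            issues.append("fonts not embedded")

    if rec.encrypted:
        # Exception: legacy documents whose encryption does not prevent
        # opening, viewing, or printing are still accepted.
        if not legacy or rec.encryption_blocks_access:
            issues.append("encryption not allowed")

    return issues
```

A record created in 2003 with only base 14 fonts unembedded and non-blocking encryption yields an empty list under this sketch, while the same record created in 2005 yields both objections.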

Requirements - PDF/A and NARA’s PDF Transfer Guidance
Special Features
PDF/A restricts special features
– Embedded files, external links, JavaScript
– PDF/A promotes tagged PDF as a higher level of conformance
NARA’s PDF Transfer Guidance evaluates special features on a case-by-case basis at the time of scheduling
– To evaluate recordkeeping implications and to ensure valuable records are not lost
Metadata/Documentation
PDF/A requires that embedded metadata be in Adobe Extensible Metadata Platform (XMP)
NARA’s PDF Transfer Guidance requires transfer documentation (e.g., SF-258), and would evaluate embedded metadata during the scheduling process

Requirements - PDF/A and NARA’s PDF Transfer Guidance
Quality Requirements
PDF/A as a file format does not address quality/creation requirements
– Includes recommended guidelines for exact replication of source material in Informative Annex B
– Agencies must implement the guidelines of Informative Annex B to comply with NARA’s PDF transfer guidance
NARA’s PDF Transfer Guidance requires minimum scanning quality, and prohibits lossy compression and substitution of bitmapped characters with OCR’d text

NARA’s Expectations for PDF/A
– PDF/A should address some existing archival issues with PDF and enable records in PDF to be maintained for longer periods of time in that format
   – Standard maintained by external international organization, not just vendors
   – Increased degree of format reliability/decrease in “bells & whistles”
– Agencies will need to implement PDF/A in conjunction with records management policies and procedures and any additional requirements and conditions necessary to ensure the persistence of electronic documents over time
Examples
– NARA’s PDF Transfer Guidance
– AOUSC’s document management program

PDF/A ISO Process – International Joint Working Group
ISO Joint Working Group (JWG) - PDF/A
– TC/46 Information & Documentation
   – TC/46 SC11 Archives/Records Mgmt
– TC/130 Graphics Technology
– TC/42 Photography
– TC/171* Document Imaging Applications
   – TC/171 SC 2 Application Issues
      – TC/171 SC 2 WG-5 PDF/A
* JWG formed under the auspices of TC/171

PDF/A ISO Process – Progress and Next Steps
– Early 2002: PDF/A development initiated
– September 2003: Approval of ISO New Work Item (NWI)
– October 2003: TC-171 Meeting - JWG prepared Committee Draft (CD)
– November 2003/February: CD ballot circulated to National Bodies (NBs)
– March: JWG reviewed NB comments on CD
– June 2004/September: Second CD ballot circulated to NBs
– October: JWG Meeting - JWG prepared Draft International Standard (DIS)
– Winter/Spring: DIS balloted to National Bodies
   – Unanimous affirmative votes: goes to publication
   – Up to 25% negative votes: goes to FDIS, then 1 month ballot
– Summer: TC-171 Meeting - JWG meeting to deal with DIS comments and discuss new work
– Summer: International Standard/FDIS?
– Software developers create PDF/A compliant applications
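The two DIS ballot outcomes named on this slide can be written as a small rule. The sketch below is illustrative only: the function name `dis_ballot_outcome` is hypothetical, the 25% threshold is assumed to be measured against affirmative plus negative votes cast, and abstentions are ignored. The slide does not say what happens above 25% negative votes, so the function reports that case as not covered rather than guessing.

```python
def dis_ballot_outcome(affirmative: int, negative: int) -> str:
    """Map a DIS ballot tally to the next step named on the slide."""
    total = affirmative + negative
    if total == 0:
        raise ValueError("no votes cast")

    if negative == 0:
        # Unanimous affirmative votes
        return "publication"
    if negative / total <= 0.25:
        # Up to 25% negative votes (assumed: share of votes cast)
        return "FDIS, then 1 month ballot"
    # The slide gives no outcome for this case.
    return "not covered by the slide"


print(dis_ballot_outcome(20, 0))
print(dis_ballot_outcome(15, 5))
```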

PDF/A - Approach
PDF/A specifies:
– The subset of PDF components, from the specification published by Adobe for Version 1.4 (i.e., PDF 1.4 Reference), that are either required, restricted, or prohibited, and
– How these components may be used by software to render the file
[Diagram: PDF/A shown within the PDF 1.4 Reference; PDF/A specifies required features, restricted features, and prohibited features]

PDF/A - Requirements
Prohibit or restrict features that could complicate long term preservation, and maximize the following PDF attributes:
– Device independence
   – The degree to which a PDF/A file is independent of the platform on which it is interpreted and rendered
   – The degree to which a PDF/A file is amenable to direct analysis with basic tools, including human readability
– Self-containment
   – The degree to which a PDF/A file contains all resources necessary for its reliable and predictable interpretation and rendering
– Self-documentation
   – The degree to which a PDF/A file documents itself in terms of descriptive, administrative, structural, and technical metadata

PDF/A - Table of Contents
1 Scope
2 Normative References
3 Terms and Definitions
4 Notation
5 Conformance Levels
6 Technical Requirements
– 6.1 File Structure
– 6.2 Graphics
– 6.3 Fonts
– 6.4 Transparency
– 6.5 Annotations
– 6.6 Actions
– 6.7 Metadata
– 6.8 Logical Structure
– 6.9 Interactive Forms
Informative annexes
– Annex A - PDF/A-1 Conformance Summary
– Annex B - Best Practices for PDF/A
Bibliography

Annexes of the Draft PDF/A Standard – Informative Annexes
Informative Annexes will provide supplemental information including:
– PDF/A-1 Conformance Summary
   – Summary tables of PDF objects and keys required, restricted and prohibited in PDF/A
– Best Practices for PDF/A
   – Guidelines for capturing or converting electronic documents to PDF/A
      – For documents created according to specific institutional rules
      – Replicates the exact quality and content of source documents within the PDF/A file
   – Required for compliance with NARA’s PDF Transfer Guidance

PDF/A - Overview of Requirements
Two levels of conformance
– Level A (e.g., Tagged PDF, Unicode mapping)
– Level B (e.g., no Tagged PDF)
Uniform file format (header, trailer, no encryption)
Device-independent rendering of graphics
Embedded fonts, character encoding
Transparency prohibited
Annotations restricted, content should be displayed by readers
External actions restricted, no dependence on external content
Readers not required to act on hyperlinks, but may
XMP metadata “Adobe XML Metadata Framework”
Forms based on appearance, not data
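Several items in this overview leave visible traces in a PDF 1.4 file's bytes, so a rough first-pass screen is possible with nothing but the standard library. The sketch below is a heuristic illustration only and is not a PDF/A validator: it does not check fonts, transparency, annotations, or conformance level, and a byte search can produce false positives or miss content inside compressed streams. The function name `quick_pdfa_screen` and the choice of patterns are assumptions made for this example.

```python
import re
from typing import List

# PDF name objects associated with features the slides list as
# prohibited or restricted. Illustrative, not exhaustive.
FLAGGED_NAMES = {
    b"/Encrypt": "encryption dictionary present",
    b"/JavaScript": "JavaScript action present",
    b"/EmbeddedFile": "embedded file present",
}


def quick_pdfa_screen(data: bytes) -> List[str]:
    """Return heuristic findings for raw PDF bytes; empty means none found."""
    findings = []

    # Uniform file format: header at the start, end-of-file marker near the end.
    if not re.match(rb"%PDF-1\.[0-9]", data):
        findings.append("missing or unexpected %PDF-1.x header")
    if b"%%EOF" not in data[-1024:]:
        findings.append("no %%EOF marker near end of file")

    # Prohibited or restricted features, detected by name only.
    for name, message in FLAGGED_NAMES.items():
        if name in data:
            findings.append(message)

    # XMP metadata is carried in a packet that starts with this marker.
    if b"<?xpacket begin=" not in data:
        findings.append("no XMP packet found")

    return findings


if __name__ == "__main__":
    sample = (
        b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
        b"trailer\n<< /Encrypt 5 0 R >>\n%%EOF\n"
    )
    for finding in quick_pdfa_screen(sample):
        print(finding)
```

A screen like this is best treated as a way to route files for closer inspection; an empty result does not show that a file conforms to PDF/A.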

Quiz - True or False?
The draft PDF/A ISO Standard…
– Provides quality standards for converting electronic documents to PDF: False
– Should enable electronic documents in PDF to be maintained longer as PDF: True
– Is intended for use as one component of an organization's electronic archival environment for long-term retention of documents: True

Quiz - True or False?
For permanent records in PDF, agencies need to understand that:
– Records in PDF/A are guaranteed to be readable forever: False
– PDF/A, by itself, does not guarantee exact replication of source material: True
– Agencies must implement PDF/A in conjunction with additional requirements to meet NARA standards for transferring permanent records to NARA (i.e., NARA’s PDF Transfer Guidance): True
Everyone is now excited to learn more about PDF...
– True!

More Information is Available More information on NARA’s PDF Transfer Guidance on NARA’s Web Site – ds.html More information on PDF/A on AIIM Web Site – Contact Susan Sullivan at

Questions/Discussion