Long Term Preservation of Digital Data Raymond A. Lorie JCDL ‘01 June 24-28, 2001.


2 Overview
18 March 2004, ODU Spring 2004 CS-891 Digital Data Preservation
A proposal (IBM to the Koninklijke Bibliotheek):
– Save the original “executable” object
– Save a specification of how to extract the data from the object
– Encapsulate enough information to allow an extraction program to be created in the future
The proposal provides a starting point.
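A minimal sketch, in Python, of what one encapsulated package might hold: the original object, the extraction program, a description of the extracted data, and the UVC version it targets. The class name, field names, JSON layout, and the SHA-256 integrity check are all assumptions for illustration, not the format from the IBM/KB proposal.

```python
import base64
import hashlib
import json
from dataclasses import dataclass


@dataclass
class PreservationPackage:
    """Hypothetical archive unit: the original object plus what a future reader needs."""
    original: bytes       # the saved object, bit for bit
    decoder_program: str  # UVC program text that extracts the data (assumed format)
    logical_schema: str   # description of the extracted data, e.g. XML (assumed)
    uvc_version: str      # version of the UVC specification the program targets

    def to_json(self) -> str:
        # Store the bytes as base64 and record a SHA-256 digest for later checking.
        return json.dumps({
            "original": base64.b64encode(self.original).decode("ascii"),
            "sha256": hashlib.sha256(self.original).hexdigest(),
            "decoder_program": self.decoder_program,
            "logical_schema": self.logical_schema,
            "uvc_version": self.uvc_version,
        }, indent=2)

    @staticmethod
    def from_json(text: str) -> "PreservationPackage":
        fields = json.loads(text)
        original = base64.b64decode(fields["original"])
        # Refuse a package whose object no longer matches its recorded digest.
        if hashlib.sha256(original).hexdigest() != fields["sha256"]:
            raise ValueError("archived object failed its integrity check")
        return PreservationPackage(original, fields["decoder_program"],
                                   fields["logical_schema"], fields["uvc_version"])


pkg = PreservationPackage(b"\x03UVC", "(decoder program text)", "<document/>", "1.0")
restored = PreservationPackage.from_json(pkg.to_json())
```

The point of the design is that nothing in the package depends on today's software: the object is kept unchanged, and everything needed to interpret it travels with it.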

3 Size of the Problem to Address
Multiple levels of document complexity:
– Simple linear data with a single data type
– Moderately complex data with multiple data types and some arbitrary structure
– Complex data relationships that require preserving the original environment
The moderately complex level was proposed for the demonstration.

4 Graphic of the Proposed Solution

5 What Happens Now?
– Metadata are created that describe all of the data in the file (based on an XML model)
– Methods are added that, when given the file as input, produce the original output
– The methods are based on a “Universal Virtual Computer” (UVC)
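To make the UVC idea concrete, here is a toy register machine in Python. The opcodes (SET, READ, ADD, EMIT, JZ, JMP), the run_uvc function, and the length-prefixed sample file are hypothetical; Lorie's actual UVC instruction set is different. The sketch only shows the principle: the archived method is written for a small, fully specified machine rather than for today's hardware.

```python
from typing import Any, List, Tuple

Instr = Tuple[Any, ...]


def run_uvc(program: List[Instr], data: bytes) -> List[int]:
    """Run a toy UVC program over an archived bitstream and return what it emits.

    Hypothetical instruction set (not Lorie's):
      SET r k   r <- k          READ r    r <- next byte of data
      ADD r s   r <- r + s      EMIT r    append r to the output
      JZ r t    jump to t if r == 0       JMP t     jump to t
    Execution stops when the program counter runs off the end.
    """
    regs: dict = {}
    out: List[int] = []
    pos = 0  # read cursor into the archived data
    pc = 0   # program counter
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "SET":
            regs[args[0]] = int(args[1])
        elif op == "READ":
            regs[args[0]] = data[pos]
            pos += 1
        elif op == "ADD":
            regs[args[0]] += regs[args[1]]
        elif op == "EMIT":
            out.append(regs[args[0]])
        elif op == "JZ":
            if regs[args[0]] == 0:
                pc = int(args[1])
        elif op == "JMP":
            pc = int(args[0])
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return out


# Archived "method": decode a length-prefixed string (one length byte, then the bytes).
DECODE_STRING = [
    ("READ", "n"),            # 0: n <- length byte
    ("SET", "minus1", -1),    # 1: constant used to count down
    ("JZ", "n", 7),           # 2: nothing left -> jump past the end and stop
    ("READ", "c"),            # 3: c <- next character code
    ("EMIT", "c"),            # 4: output it
    ("ADD", "n", "minus1"),   # 5: n <- n - 1
    ("JMP", 2),               # 6: loop
]

print(bytes(run_uvc(DECODE_STRING, b"\x03UVC")).decode("ascii"))  # prints UVC
```

A future archivist only has to implement these few opcodes, from their specification, to run the archived decoder unchanged.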

6 What Happens in the Future?
– The specifications for the UVC are “well known”
– A UVC is implemented in accordance with some version level of the specification
– The UVC “reads” the file and creates the original output
– Future users can make queries against the document
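A sketch of the query step, assuming the UVC run emits XML tagged according to the archived metadata. The document/page/text element names, the sample content, and the pages_containing helper are hypothetical; Python's standard xml.etree.ElementTree stands in for whatever viewer a future user would have.

```python
import xml.etree.ElementTree as ET

# Hypothetical UVC output, tagged with element names from the archived metadata.
UVC_OUTPUT = """
<document>
  <page number="1"><text>Long Term Preservation of Digital Data</text></page>
  <page number="2"><text>The Universal Virtual Computer</text></page>
</document>
"""


def pages_containing(xml_text: str, term: str) -> list:
    """Return the numbers of the pages whose text contains `term` (case-insensitive)."""
    root = ET.fromstring(xml_text.strip())
    hits = []
    for page in root.iter("page"):
        page_text = "".join(page.itertext())
        if term.lower() in page_text.lower():
            hits.append(int(page.get("number")))
    return hits


print(pages_containing(UVC_OUTPUT, "virtual"))  # prints [2]
```

Because the output is tagged data rather than pixels, the query works on meaning (which page says what), not on the original rendering.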

7 So What Happened Next?
The original reading was a proposal; the follow-up reading was a test case:
“The UVC: A Method for Preserving Digital Documents, Proof of Concept”
– IBM/KB Long Term Preservation Study
– December 2002
– Raymond Lorie
– ISBN: 90-6259-157-4
– http://www.kb.nl/kb/hrd/dd/dd_onderzoek/reports/4-uvc.pdf

8 PDF Document Type Was Selected
Selected “… because of its importance in the publishing community. …”
Textual information is difficult to extract from the encoded file:
– The letter “A” is not stored as an ASCII A
– Instead, parameters are stored that allow an “A” to be drawn
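Why the “A” problem matters: in a PDF, the bytes shown on a page are codes into a font, and the font's encoding decides which glyph each code draws. The sketch below assumes a made-up encoding table in the spirit of a PDF /Differences array, and a hypothetical decode_shown_text helper; real files may also carry a ToUnicode CMap, which this sketch ignores.

```python
# Hypothetical font encoding, in the spirit of a PDF /Differences array:
# the byte codes in the page content are indices into this font, not ASCII.
FONT_ENCODING = {1: "U", 2: "V", 3: "C", 4: "space", 5: "A"}

# Glyph names that are not simply the character they draw (tiny assumed subset).
GLYPH_NAMES = {"space": " "}


def decode_shown_text(codes: bytes, encoding: dict) -> str:
    """Map each byte code to its glyph name, then the glyph name to a character."""
    chars = []
    for code in codes:  # iterating over bytes yields ints
        glyph = encoding.get(code)
        if glyph is None:
            chars.append("\ufffd")  # code not in the font: replacement character
        elif len(glyph) == 1:
            chars.append(glyph)     # glyph "A" draws the letter A
        else:
            chars.append(GLYPH_NAMES.get(glyph, "\ufffd"))
    return "".join(chars)


print(decode_shown_text(bytes([1, 2, 3, 4, 5]), FONT_ENCODING))  # prints UVC A
print(decode_shown_text(b"A", FONT_ENCODING))  # ASCII 65 is not a code in this font
```

Without the font's encoding, the stored code 5 says nothing about the letter “A”; with it, the text can be recovered.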

9 A Clever Solution for Text Extraction

10 How It Works:
– GSview is a graphical interface for Ghostscript, an interpreter for the PostScript page description language used by laser printers
– The PDF is printed to a PostScript file for GSview to read
– goBCL converts PDF files to HTML
– An application merges the GSview page images with HTML-like tags
– This allows a text query to display the related page
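The last step (a text query displaying the related page) can be sketched as an inverted index from words to page numbers, with each page number pointing at its rendered image. The page texts, image file names, and the build_index and query helpers are hypothetical stand-ins for the goBCL HTML output and the GSview page images; the sketch does not call either tool.

```python
import re
from collections import defaultdict

# Hypothetical stand-ins: text per page (as from the HTML conversion) and the
# rendered image for each page (as from GSview/Ghostscript).
PAGE_TEXT = {
    1: "Long Term Preservation of Digital Data",
    2: "The Universal Virtual Computer",
    3: "Preservation study: proof of concept",
}
PAGE_IMAGES = {1: "page-001.png", 2: "page-002.png", 3: "page-003.png"}


def build_index(page_text: dict) -> dict:
    """Inverted index: lower-case word -> set of page numbers containing it."""
    index = defaultdict(set)
    for page, text in page_text.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(page)
    return dict(index)


def query(index: dict, page_images: dict, word: str) -> list:
    """Return the images of every page containing `word`, in page order."""
    return [page_images[p] for p in sorted(index.get(word.lower(), set()))]


index = build_index(PAGE_TEXT)
print(query(index, PAGE_IMAGES, "Preservation"))  # prints ['page-001.png', 'page-003.png']
```

Keeping the text (for searching) separate from the images (for display) is what lets a query return the page as it originally looked.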

11 How Well Did It Work?
– The report did not state how many files were converted
– It identified a few bugs in goBCL
– It alluded to problems decoding JPEG files
– Queries were executed, and success was claimed

12 Miscellanea
– An appendix with a notational UVC architecture
– An appendix with macros to support UVC software development
– An appendix containing a logical view of a PDF document

13 Additional Links
– Lorie appears to have published a fair amount on relational database systems. A list of Lorie’s publications: http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/l/Lorie:Raymond_A=.html
– Yet another UVC paper (15 June 2001): http://www.rlg.org/preserv/diginews/diginews5-3.html
– A page with many kinds of preservation links: http://sunsite.berkeley.edu/Longevity/

