Long Term Preservation of Digital Data, Raymond A. Lorie, JCDL ’01, June 24-28, 2001.

Similar presentations
Don’t Type it! OCR it! How to use an online OCR..

1 Virtualization Manager Marco Fulcoli, ACS Rome, 16 sept 2009.
Delivering textual resources. Overview Getting the text ready – decisions & costs Structures for delivery Full text Marked-up Image and text Indexed How.
DDA and metadata handling Questions Variables Study description Adresses Administrative data related to studies.
Word Processing and Desktop Publishing Software
1 Actuate Corporation © 2010 THE BIRT COMPANY THE BIRT COMPANY THE BIRT COMPANY THE BIRT COMPANY THE BIRT COMPANY THE BIRT COMPANY THE BIRT COMPANY THE.
With Microsoft Access 2010© 2011 Pearson Education, Inc. Publishing as Prentice Hall1 PowerPoint Presentation to Accompany GO! with Microsoft ® Access.
The KB on its way to Web 2.0 Lower the barrier for users to remix the output of services. Theo van Veen, ELAG 2006, April 26.
Information Retrieval in Practice
Converting Microsoft Office Documents Bill Weber E-Learning Systems Administrator E-Learning Operations.
Technical Tips and Tricks for User Support Mike Gardner
Publishing Workflow for InDesign Import/Export of XML
1 CS1001 Lecture Overview Java Programming Java Programming Midterm Review Midterm Review.
WMES3103 : INFORMATION RETRIEVAL
Automatic Evaluation of Migration Quality in Distributed Networks of Converters Miguel Ferreira Supervisors Ana Alice Baptista.
Strategic Thinking and Significant Characteristics Hamish James.
Outline Chapter 1 Hardware, Software, Programming, Web surfing, … Chapter Goals –Describe the layers of a computer system –Describe the concept.
1 CS 502: Computing Methods for Digital Libraries Lecture 27 Preservation.
Tutorial 8 Sharing, Integrating and Analyzing Data
WWW and Internet The Internet Creation of the Web Languages for document description Active web pages.
Mapping Physical Formats to Logical Models to Extract Data and Metadata Tara Talbott IPAW ‘06.
Overview of Search Engines
Different approaches to digital preservation Hilde van Wijngaarden Digital Preservation Officer Koninklijke Bibliotheek/ National Library of the Netherlands.
Vector A software technology that uses mathematical points based on “vectors" (information giving both magnitude and direction). Because the computer.
CPSC 203 Introduction to Computers Lab 39, 40 By Jie (Jeff) Gao.
Digital Imaging and Remote Sensing Laboratory R.I.TR.I.TR.I.TR.I.T R.I.TR.I.TR.I.TR.I.T Writing Large Documents with LaTeX and WinEdt Emmett Ientilucci.
TERMS TO KNOW. Programming Language A vocabulary and set of grammatical rules for instructing a computer to perform specific tasks. Each language has.
Statistical graphics for publication – easy ways to meet requirements for high resolution Jim Flewelling Growth Model Users Group February 11, 2008.
Basic tasks of generic software Chapter 3. Contents This presentation covers the following: – The basic tasks of standard/generic software including:
Web Application Architecture and Communication. Displaying a Web page in a Browser
Classroom User Training June 29, 2005 Presented by:
Chapter 33 CGI Technology for Dynamic Web Documents There are two alternative forms of retrieving web documents. Instead of retrieving static HTML documents,
8/28/97Organization of Information in Collections Introduction to Description: Dublin Core and History University of California, Berkeley School of Information.
OCLC Online Computer Library Center CONTENTdm ® Digital Collection Management Software Ron Gardner, OCLC Digital Services Consultant ICOLC Meeting April.
CS240 Computer Science II Introduction to Unix Based on “UNIX for Programmers and Users” by G.Class and K. Ables.
HTML and Style. Session overview Leveling-off on the basic concepts of HTML and Styles Discuss Web authoring options.
OracleAS Reports Services. Problem Statement To simplify the process of managing, creating and execution of Oracle Reports.
2 pt 3 pt 4 pt 5pt 1 pt 2 pt 3 pt 4 pt 5 pt 1 pt 2pt 3 pt 4pt 5 pt 1pt 2pt 3 pt 4 pt 5 pt 1 pt 2 pt 3 pt 4pt 5 pt 1pt Terms 2 Terms 3 Terms 4 Terms 5 Terms.
Public Domain/Open Source Software Evaluation Photo Organizer.
Fundamentals of Web Design Copyright ©2004  Department of Computer & Information Science Introducing XHTML: Module A: Web Design Basics.
Digital Filing A Simple Way to Digitally Centralize and Distribute Documents.
PLoS ONE Application Journal Publishing System (JPS) First application built on Topaz application framework Web 2.0 –Uses a template engine to display.
© 2001 Business & Information Systems 2/e1 Chapter 8 Personal Productivity and Problem Solving.
Lead Black Slide Powered by DeSiaMore1. 2 Chapter 8 Personal Productivity and Problem Solving.
1 Reference Linking in Project Euclid …with some thoughts on the preservation of digital collections. A presentation at the Workshop on Linking and searching.
Project Overview Graduate Selection Process Project Goal Automate the Selection Process.
 It depends...  If your document will contain predominantly text, Word is the better choice.  If your document will contain many graphics, photos,
2007. Software Engineering Laboratory, School of Computer Science S E Web-Harvest Web-Harvest: Open Source Web Data Extraction tool 이재정 Software Engineering.
Database Management Systems.  Database management system (DBMS)  Store large collections of data  Organize the data  Becomes a data storage system.
Introducing the World Wide Web Internet- a structure made up of millions of interconnected computers whose users communicate with each other and share.
Convert PDF files to PowerPoint slides Extract specific PDF pages to PowerPoint - Support to convert encrypted PDF files - Convert PDF to PowerPoint 2003/2007/2010.
1 WWW. 2 World Wide Web Major application protocol used on the Internet Simple interface Two concepts –Point –Click.
Implementation and partnership with industry: a case Koninklijke Bibliotheek & IBM Netherlands Hans Jansen Head Research & Development Koninklijke Bibliotheek.
ITGS Application Software. ITGS Application software (productivity software) –Allows the user to perform tasks to solve problems, such as creating documents,
Lesson 13 Databases Unit 2—Using the Computer. Computer Concepts BASICS - 22 Objectives Define the purpose and function of database software. Identify.
HTML HyperText Markup Language. Text Files An array of bytes stored on disk Each element of the array is a text character A text editor is a user program.
Feb 24-27, 2004ICDL 2004, New Dehli Improving Federated Service for Non-cooperating Digital Libraries R. Shi, K. Maly, M. Zubair Department of Computer.
Seybold 2002 Mark Stephens (Managing Director) Ready made solutions. Bespoke development, configuration and consultancy.
Introduction to HTML Simple facts yet crucial to beginning of study in fundamentals of web page design!
Digital Data Preservation: a schema-driven model Student: Stacy Kowalczyk Co-Authors: Clare McInerney and Phil Mitchell Digital Data Preservation – the.
A Beginner’s Guide to Preserving Digital Resources in Historic Environment Records Catherine Hardman and Kieron Niven Archaeology Data Service.
Copyright 2007, Paradigm Publishing Inc. BACKNEXTEND 8-1 LINKS TO OBJECTIVES Import data from another Access table Import data from another Access table.
A Semi-Automated Digital Preservation System based on Semantic Web Services Jane Hunter Sharmin Choudhury DSTC PTY LTD, Brisbane, Australia Slides by Ananta.
Information Retrieval in Practice
InterLaser Basic requirements.
Concepts Ch 1 Review.
Tutorial 8 Objectives Continue presenting methods to import data into Access, export data from Access, link applications with data stored in Access, and.
Zachary Cleaver Semantic Web.
Word Processing and Desktop Publishing Software
Presentation transcript:


18 March 2004, ODU Spring 2004, CS-891 Digital Data Preservation

Overview
A proposal (IBM to the Koninklijke Bibliotheek):
– Save the original “executable” object
– Save a specification of how to extract the data from the object
– Encapsulate enough information to allow an extraction program to be created in the future
This provides a starting point.
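The three items in the overview can be pictured as one package that travels together: the untouched original, the extraction specification, and a description of the data it yields. The sketch below is hypothetical; the class and field names are not from the IBM/KB proposal, and the SHA-256 fixity value is an added assumption so a future reader can check that the original is intact.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class PreservationPackage:
    """Hypothetical bundle of what the proposal says to keep together."""
    original_bytes: bytes      # the original object, saved untouched
    decoder_program: str       # specification of how to extract the data
    logical_schema: str        # description of the extracted (logical) data
    metadata: dict = field(default_factory=dict)

    def checksum(self) -> str:
        # Fixity value (an assumption, not part of the proposal).
        return hashlib.sha256(self.original_bytes).hexdigest()

    def manifest(self) -> str:
        # Plain-text manifest stored alongside the three payloads.
        return json.dumps(
            {
                "sha256": self.checksum(),
                "size": len(self.original_bytes),
                "decoder_length": len(self.decoder_program),
                "schema": self.logical_schema,
                "metadata": self.metadata,
            },
            sort_keys=True,
        )
```

Keeping the manifest as plain JSON text is a design choice: it stays readable even if every other tool in the package is lost.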

Size of the Problem to Address
Multiple levels of document complexity:
– Simple linear data with a single data type
– Moderately complex data with multiple data types and some arbitrary structure
– Complex data relationships that require preserving the original environment
The moderately complex level was proposed for the demonstration.

Graphic of Proposed Solution

What Happens Now?
Metadata describing all the data in the file are created (based on an XML model).
Methods are added that, given the file as input, produce the original output.
The methods are based on a “Universal Virtual Computer” (UVC).
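The core idea is that the decoding method is stored as a program for a very simple machine, so only that machine has to be reimplemented in the future. The toy interpreter below illustrates the idea only; its six-instruction set and the made-up "XOR 0x20" file format are assumptions for illustration and are not the instruction set Lorie defines for the UVC.

```python
def run(program, data: bytes, max_steps: int = 100_000) -> str:
    """Interpret a tiny hypothetical register machine over archived bytes."""
    regs = [0] * 4
    pos = 0   # read cursor into the archived bytes
    pc = 0    # program counter
    out = []
    for _ in range(max_steps):
        op = program[pc]
        kind = op[0]
        pc += 1
        if kind == "read":            # r <- next byte, or -1 at end of data
            if pos < len(data):
                regs[op[1]] = data[pos]
                pos += 1
            else:
                regs[op[1]] = -1
        elif kind == "jneg":          # jump if register is negative
            if regs[op[1]] < 0:
                pc = op[2]
        elif kind == "xor":           # r ^= constant
            regs[op[1]] ^= op[2]
        elif kind == "emit":          # append register as a character
            out.append(chr(regs[op[1]]))
        elif kind == "jmp":
            pc = op[1]
        elif kind == "halt":
            return "".join(out)
        else:
            raise ValueError(f"unknown instruction {kind!r}")
    raise RuntimeError("step limit exceeded")


# Decoder for the hypothetical format: every stored byte is XORed with 0x20.
DECODER = [
    ("read", 0),        # 0: r0 <- next byte
    ("jneg", 0, 5),     # 1: end of input -> halt
    ("xor", 0, 0x20),   # 2: undo the format's encoding
    ("emit", 0),        # 3: output one character
    ("jmp", 0),         # 4: loop
    ("halt",),          # 5
]
```

For example, `run(DECODER, b"uvc")` returns `"UVC"`, because XOR with 0x20 flips the case of ASCII letters. The decoder is plain data, so it can be archived next to the file it decodes.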

What Happens in the Future?
The UVC specification is “well known”.
A UVC is built in accordance with (IAW) some version level of that specification.
The UVC “reads” the file and recreates the original output.
Future users can then make queries against the document.
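Once a future UVC has produced the document's data in the XML-described logical form, ordinary tools can query it. The sketch below assumes a hypothetical logical view with `document`, `page`, and `text` elements; the real schema would come from the metadata archived with the file.

```python
import xml.etree.ElementTree as ET

# Hypothetical output of a future UVC run: the document's logical view.
LOGICAL_VIEW = """
<document title="Example">
  <page number="1"><text>Long term preservation of digital data</text></page>
  <page number="2"><text>The universal virtual computer</text></page>
</document>
"""


def pages_containing(xml_text: str, term: str) -> list[int]:
    """Return the numbers of pages whose text contains term (case-insensitive)."""
    root = ET.fromstring(xml_text.strip())
    hits = []
    for page in root.iter("page"):
        text = " ".join(t.text or "" for t in page.iter("text"))
        if term.lower() in text.lower():
            hits.append(int(page.get("number")))
    return hits
```

Here `pages_containing(LOGICAL_VIEW, "virtual")` returns `[2]`.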

So What Happened Next?
The original reading was a proposal.
The follow-up reading was a test case: “The UVC: A Method for Preserving Digital Documents, Proof of Concept”
– IBM/KB Long-Term Preservation Study
– December 2002
– Raymond Lorie
– ISBN:
– uvc.pdf

PDF Document Type Was Selected
“… because of its importance in the publishing community. …”
Extracting textual information from the encoded file is difficult:
– The letter “A” is not stored as an ASCII “A”
– Instead, parameters are stored that allow an “A” to be drawn
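A small illustration of why the text is hard to recover: in a PDF content stream, the bytes shown by the `Tj` operator are codes into the current font, which need not be ASCII (subset fonts often renumber glyphs). Recovering text needs a code-to-Unicode table such as a font's ToUnicode CMap. This sketch is deliberately simplified and assumes single-byte codes, hex strings only, and a hypothetical mapping table; real streams also use literal strings, `TJ` arrays, and multi-byte codes.

```python
import re

# Simplified content-stream fragment: <...> holds glyph codes, not ASCII.
CONTENT = b"BT /F1 12 Tf 72 700 Td <0102030304> Tj ET"

# Hypothetical code-to-Unicode table, like one a ToUnicode CMap provides.
TO_UNICODE = {0x01: "H", 0x02: "e", 0x03: "l", 0x04: "o"}

HEX_TJ = re.compile(rb"<([0-9A-Fa-f]+)>\s*Tj")


def extract_text(content: bytes, cmap: dict[int, str]) -> str:
    """Map the codes in each hex-string Tj operand back to characters."""
    out = []
    for match in HEX_TJ.finditer(content):
        codes = bytes.fromhex(match.group(1).decode("ascii"))
        # One byte per code; unknown codes become U+FFFD.
        out.append("".join(cmap.get(c, "\ufffd") for c in codes))
    return "".join(out)
```

Decoding the same bytes as ASCII would yield control characters; with the table, `extract_text(CONTENT, TO_UNICODE)` returns `"Hello"`.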

A Clever Solution for Text Extraction

How It Works
GSview is a graphical interface for Ghostscript.
Ghostscript is an interpreter for the PostScript page description language used by laser printers.
The PDF is printed to a PostScript file for GSview to read.
goBCL converts PDF files to HTML.
An application merges the page images from GSview with HTML-like tags.
This allows a text query to display the related page.
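The pipeline above pairs an image of each page with that page's text so that a text query can bring up the page image. The sketch below is an assumption-laden analogue, not the report's application: it calls Ghostscript's `gs` command directly instead of going through GSview, the HTML layout is hypothetical, goBCL's actual output is not reproduced, and the per-page text is assumed to be already extracted.

```python
import html
import re
from collections import defaultdict


def ghostscript_command(pdf_path: str, dpi: int = 100) -> list[str]:
    """Command line that renders every PDF page to page-001.png, page-002.png, ..."""
    return [
        "gs", "-dSAFER", "-dBATCH", "-dNOPAUSE",
        "-sDEVICE=png16m", f"-r{dpi}",
        "-sOutputFile=page-%03d.png", pdf_path,
    ]


def page_html(page_no: int, text: str) -> str:
    """Hypothetical merge of one page image with its extracted text."""
    return (
        f'<html><body><img src="page-{page_no:03d}.png" alt="page {page_no}">'
        f'<div class="text" hidden>{html.escape(text)}</div></body></html>'
    )


def build_index(pages: dict[int, str]) -> dict[str, set[int]]:
    """Inverted index: lowercase word -> set of page numbers."""
    index = defaultdict(set)
    for page_no, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(page_no)
    return index


def pages_for(index, query: str) -> list[int]:
    """Pages containing every word of the query, in page order."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    return sorted(set.intersection(*(index.get(w, set()) for w in words)))
```

The command list can be passed to `subprocess.run` where Ghostscript is installed; the page numbers returned by `pages_for` then name the PNG files to display.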

How Well Did It Work?
The report did not state how many files were converted.
It identified a few bugs in goBCL.
It alluded to problems decoding JPEG files.
Queries were executed.
It claimed success.

Miscellanea
An appendix with a notational UVC architecture
An appendix with macros to support UVC software development
An appendix containing a logical view of a PDF document

Additional Links
Lorie appears to have published a fair amount about relational database systems.
A list of Lorie’s publications –
Yet another UVC paper (15 June 2001) –
A page with all sorts of preservation links: –