1 Archival Storage for Digital Libraries Arturo Crespo Hector Garcia-Molina Stanford University.

Presentation transcript:

1 Archival Storage for Digital Libraries Arturo Crespo Hector Garcia-Molina Stanford University

2 Motivation
Digital information already lost:
– Early NASA records
– U.S. Census information
– Toxic waste records
Decay time for common media:
– Magnetic tapes: 10-20 years
– CD-ROM: 5-50 years
– Hard drive: 3-5 years
Obsolescence of digital media is even faster

3 Preservation of Digital Objects
Data Preservation
Meaning Preservation
Our work only addresses Data Preservation

4 A Case Study: Stanford/MIT CSTR
[Figure labels: Stanford, MIT, CSTR]
Scenario:
– Need for on-line access to documents
– But also for long-term archival of documents

5 Is This a Solved Database Problem?
Database systems can reliably store objects
However:
– Need same or compatible system
– Migration is problematic
Our architecture coordinates database systems; it does not replace them.

6 Contribution
An architecture and algorithms for:
– Long-term archival storage of digital objects
– Allowing on-line access to digital objects
– Preserving data as technology and organizations evolve

7 Key Concepts
– Signatures as Object Handles
– Deletions are not allowed
– Reliability is achieved through Replication
– Layered Architecture
– Awareness Everywhere
– Disposable Auxiliary Structures

8 Signatures as Object Handles
Object handles identify objects:
– Internal to the Digital Library Repository
– Users may need high-level naming facilities
– Traditional approaches
Signatures:
– Checksum or CRC of the object
[Figure: signature = f(object)]

9 Properties of Signatures as Object Handles
– Each site can generate handles independently
– Handles can be reconstructed from the object
– Copies automatically have the same handle
– Objects with different content have a high probability of having different handles
– Cannot modify objects
[Figure: four objects with signatures s1, s2 ≠ s1, s3 ≠ s2, s4 = s1]
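The handle properties on this slide can be illustrated with a minimal Python sketch. It assumes SHA-256 from the standard library as the signature function; the slides only ask for a checksum or CRC, so the hash choice and the name `handle_for` are illustrative and not the authors' implementation.

```python
import hashlib


def handle_for(content: bytes) -> str:
    """Derive a handle from the object's bytes alone (SHA-256 is an assumption)."""
    return hashlib.sha256(content).hexdigest()


# Two sites hold identical copies; neither consults the other.
stanford_copy = b"CSTR report, first edition"
mit_copy = bytes(stanford_copy)

# Any change to the content yields a different object, not a modified one.
edited = b"CSTR report, second edition"

h_stanford = handle_for(stanford_copy)
h_mit = handle_for(mit_copy)
h_edited = handle_for(edited)

print(h_stanford == h_mit)     # copies share a handle
print(h_stanford == h_edited)  # different content, different handle
```

Because the handle depends only on the bytes, a site that loses its handle table can recompute every handle from the stored objects.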

10 Signature Collisions
A very rare event if signatures are 128 bits or more.
Assumes a uniform distribution of handles and objects bigger than signatures.
[Table: probability of having collisions, by collection size and signature size in bits; values missing from the transcript]
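The table values did not survive in the transcript, but the rarity claim can be checked with the standard birthday approximation, p ≈ 1 − exp(−n(n−1) / 2^(b+1)), for n objects and b-bit signatures under the uniformity assumption stated on the slide. The following is a sketch under that assumption; the function name and the sample collection sizes are illustrative and are not the slide's figures.

```python
import math


def collision_probability(n_objects: int, signature_bits: int) -> float:
    """Birthday approximation for at least one collision among n_objects."""
    exponent = n_objects * (n_objects - 1) / 2 ** (signature_bits + 1)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-exponent)


for bits in (32, 64, 128, 256):
    for n in (10**6, 10**9):
        p = collision_probability(n, bits)
        print(f"{bits:>3} bits, {n:.0e} objects: p = {p:.3e}")
```

Using `math.expm1` avoids the result rounding to exactly zero for large signature sizes, where the probability is far below floating-point epsilon.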

11 No Deletions
Objects are never (voluntarily) deleted
This simplifies many things:
– Distinguishes between deleted and corrupted objects (improving reliability)
– Easier recovery from failures
However, it complicates others:
– “Wasted” space
– Version management

12 No Deletions
The no-deletion rule is natural in Digital Libraries
Wasted space is not critical, as:
– Storage cost is low
– Only library objects are archived, not all possible data
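One way to see why the no-deletion rule helps reliability is a store that has no delete operation at all: any handle that was once accepted and is now missing or fails verification must be damaged, never intentionally removed. The sketch below is hypothetical; `AppendOnlyStore`, its methods, and the use of SHA-256 are assumptions made for illustration.

```python
import hashlib


def signature(content: bytes) -> str:
    """Signature function for handles (SHA-256 is an assumption)."""
    return hashlib.sha256(content).hexdigest()


class AppendOnlyStore:
    """Hypothetical object store that deliberately offers no delete()."""

    def __init__(self) -> None:
        self._objects = {}   # handle -> content
        self._known = set()  # every handle ever accepted

    def put(self, content: bytes) -> str:
        handle = signature(content)
        self._objects.setdefault(handle, content)
        self._known.add(handle)
        return handle

    def status(self, handle: str) -> str:
        if handle not in self._known:
            return "never stored"
        content = self._objects.get(handle)
        if content is None or signature(content) != handle:
            # Deletion is impossible, so loss or mismatch means damage.
            return "corrupted"
        return "ok"


store = AppendOnlyStore()
h = store.put(b"technical report")
status_before = store.status(h)

store._objects[h] = b"technical rep0rt"  # simulate media decay
status_after = store.status(h)

print(status_before, status_after)
```

In a replicated setting the set of known handles could equally come from a peer site rather than a local set; that detail is left out here.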

13 Version Management
“Natural” structure for versions
[Figure: Object O1, Object O2, Version Object]

14 Versions
How can we find the latest version?
[Figure: Version Object tuple with entries Version1 (Object O1) and Version2, latest (Object O2)]
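One possible reading of the version-object diagram is a tuple of object handles kept in order, so that the latest version is simply the last entry. The sketch below follows that reading; the class `VersionObject`, its methods, and the ordering rule are assumptions, since the slide poses the question without spelling out the answer.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VersionObject:
    """Hypothetical version object: handles of each version, oldest first."""

    versions: List[str] = field(default_factory=list)

    def add_version(self, content: bytes) -> str:
        handle = hashlib.sha256(content).hexdigest()  # assumed signature function
        if handle not in self.versions:
            self.versions.append(handle)
        return handle

    def latest(self) -> Optional[str]:
        return self.versions[-1] if self.versions else None


report = VersionObject()
v1 = report.add_version(b"draft of the report")    # Object O1
v2 = report.add_version(b"final text of report")   # Object O2

print(report.latest() == v2)
```

Older versions stay reachable through the same tuple, which fits the rule that objects are never deleted.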

15 Sets
[Figure: Set Object tuple with entries Member 1 (Object O1) and Member 2 (Object O2)]

16 Reliability Service
Long-term persistence is achieved by replication
Sites establish Replication Agreements to maintain copies of objects in a given Replication Group
[Figure: Stanford and MIT sites]

17 Reliability Service
Initial state:
[Figure: Stanford and MIT; one Version Object V containing V1]

18 Reliability Service
[Figure: Stanford and MIT; Version Object V containing V1, and a second Version Object with no versions shown]

19 Reliability Service
[Figure: Stanford and MIT each hold Version Object V containing V1]

20 Reliability Service
[Figure: Stanford and MIT; one copy of V contains V1 and V2, the other contains V1 and V3]

21 Reliability Service
[Figure: Stanford and MIT each hold Version Object V containing V1, V2 and V3]
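The final two Reliability Service slides show two replicas of a version object that diverge (V1, V2 versus V1, V3) and then converge on V1, V2, V3. Since nothing is ever deleted, a plain set union is enough to reconcile them. The sketch below assumes that reading; the function `reconcile` and the assignment of V2 to Stanford and V3 to MIT are illustrative guesses, not details confirmed by the slides.

```python
def reconcile(site_a: set, site_b: set) -> None:
    """Bring two replicas of a version object to the same contents.

    With no deletions, an entry missing at one site can only mean that
    the site has not received it yet, so the union is always correct.
    """
    merged = site_a | site_b
    site_a.update(merged)
    site_b.update(merged)


# Divergent state: each site added a version locally.
stanford_v = {"V1", "V2"}
mit_v = {"V1", "V3"}

reconcile(stanford_v, mit_v)

print(sorted(stanford_v))
print(sorted(mit_v))
```

Union is commutative and idempotent, so sites in a replication group can reconcile in any order and as often as they like without losing entries.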

22 Layered Architecture
[Figure: layer stack] User Access, Security and Accounting, Import, Metadata and Indexing, Reliability, Complex Object, Identity, Object Store, Data Store

23 Awareness Everywhere
Awareness services: standing orders, subscriptions, alerts, etc.
Critical for Digital Libraries
Should be part of the interface of every layer.
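Making awareness part of every layer's interface could look like a small subscribe/announce pair that each layer exposes. This is a hypothetical sketch only: the class `AwareLayer`, its method names, and the callback signature are assumptions, not the interface from the presentation.

```python
from typing import Callable, List, Tuple


class AwareLayer:
    """Hypothetical base for a layer that supports subscriptions."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._subscribers: List[Callable[[str, str], None]] = []

    def subscribe(self, callback: Callable[[str, str], None]) -> None:
        """Register a standing order: callback(layer_name, handle)."""
        self._subscribers.append(callback)

    def announce(self, handle: str) -> None:
        """Alert every subscriber that a new object arrived at this layer."""
        for callback in self._subscribers:
            callback(self.name, handle)


alerts: List[Tuple[str, str]] = []
object_store = AwareLayer("object-store")
object_store.subscribe(lambda layer, handle: alerts.append((layer, handle)))
object_store.announce("abc123")

print(alerts)
```

A higher layer can subscribe to the layer beneath it and re-announce events to its own subscribers, so alerts propagate upward through the stack.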

24 Disposable Auxiliary Structures
Auxiliary structures can be reconstructed from the Digital Objects:
– Avoids potential inconsistencies
– Makes objects easier to migrate
Example: index of disk-ids to object handles
[Figure: disks D1 and D2; Identity Layer index with an entry mapping a handle to D1]
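The index example can be sketched as a structure that is rebuilt by scanning the disks and recomputing each object's handle, so losing the index costs time but no information. The function `rebuild_index`, the in-memory representation of disks, and SHA-256 as the signature function are assumptions made for this illustration.

```python
import hashlib


def rebuild_index(disks: dict) -> dict:
    """Scan every disk and map each object's handle to the disk holding it."""
    index = {}
    for disk_id, objects in disks.items():
        for content in objects:
            handle = hashlib.sha256(content).hexdigest()  # assumed signature
            index[handle] = disk_id
    return index


# Hypothetical disks, each holding raw object bytes.
disks = {
    "D1": [b"object one"],
    "D2": [b"object two", b"object three"],
}

index = rebuild_index(disks)
index.clear()                 # the auxiliary structure is lost...
index = rebuild_index(disks)  # ...and reconstructed from the objects alone

print(len(index))
```

Because handles are derived from content, the rebuilt index is identical to the lost one, and migrating the objects to a new system never requires migrating the index with them.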

25 Related Work
– Other Digital Library architectures
– Report of the Task Force on Preserving Digital Information
– Petal and Frangipani projects
– COLD systems

26 Conclusions
Architecture for long-term archiving of digital objects
Allows efficient on-line access
Simple, yet powerful repository built by:
– using signatures as handles
– not allowing deletions
– having awareness services everywhere
– using only disposable auxiliary structures