Published by Mariah Cunningham. Modified over 6 years ago.
1
Preservation DataStores - Storage Assist for Preservation Environments
Presenter: Simona Cohen, Haifa Research Lab
Team: Simona Cohen, Michael Factor, Kalman Meth, Dalit Naor, Leeat Ramati, Petra Reshef, Julian Satran, Yaron Wolfsthal
2
What is Preservation?
- Challenge: preserve large amounts of heterogeneous data for long periods of time (tens if not hundreds of years)
- We want to preserve the information, not only the bits
  - Preservation of information implies continuing understandability and usability
- Preservation of information is hard and requires vigilance
  - Changes in technologies
  - Changes in users (Designated Community)
- The amount of digital preservation data can grow fast and become very large
  - NARA projected that in 2010 it would have 10,000 TB of data to be preserved forever, in [...] [...],000 TB, and in [...] [...],000 TB
  - Along with that comes a large amount of metadata that must be added to the raw data in order to interpret it
3
Is Preservation Needed?
Healthcare
- OSHA requires employers to keep both medical and other records of employees who are exposed to toxic substances and harmful agents. Employers must maintain these records for 30 years
- The retention requirement for the [medical] records of minors varied from 20 to 43 years of age
- X-rays are often stored for periods of 75 years
- Medical records should be preserved for the life of the individual and beyond
Finance
- SEC Rule 17a-4 requires broker-dealers to retain account record information for six years. The six-year period begins either when the account is closed or when the information is replaced or updated
- Life insurance policies have to be kept for the life of the policy plus 6-10 years
Aerospace
- Aircraft design records have to be retained for the lifetime of each aircraft (30+ years)
Pharma
- Pharma needs off-line electronic data storage for 50 to 100 years or longer
Petroleum
- Oil-field data is used over the life of the field (50+ years)
Scientific and Cultural
- Satellite data is kept forever
- We would like to keep library and art data forever
4
Preservation Approaches
- Museum approach
  - Content and rendering devices are preserved in their original state and kept operational
  - Does not allow re-interpretation of the data and requires maintaining a lot of software and hardware
  - Best example: ability to print documents
- Emulation approach
  - Keep the content in its original form
  - Adapt the rendering device by emulating it on up-to-date software and computers
  - UVC (Universal Virtual Computer) approach, pioneered by Raymond Lorie from IBM Almaden
  - Reduces the problem to that of preserving the UVC platform
- Migration approach
  - Migrate the key characteristics of the content
  - Preserve the characteristics that ensure its identity and integrity
  - May introduce noise
- Descriptive approach
  - Preserve a description of the content that enables its reproduction (e.g. artistic data, scores)
  - Do not preserve the content or its rendering device
5
What is Preservation DataStores?
- Storage assist for preservation environments
- Supports the Open Archival Information System (OAIS) Reference Model, the ISO archiving standard (ISO:14721:2002)
- The storage component of CASPAR
- Generic: agnostic to the type of application, the type of stored data, and the physical layer (disk, tape, …)
- Scalable
- Offloads functionality to the storage layer
  - Decrease the probability of data loss
  - Simplify the applications
  - Provide improved performance and robustness
- Based on Object Storage
- Supports the various preservation approaches
- Originated and developed in IBM Haifa Research Lab, Israel
6
Preservation DataStores
[Figure: The CASPAR Framework, with Preservation DataStores labeled in the diagram]
7
OAIS Functional Model
[Figure: OAIS functional model, with Preservation DataStores labeled in the diagram]
8
Bit Preservation vs. Logical Preservation
- Bit preservation: the ability to restore the bits in the presence of storage media degradation, storage media obsolescence, and environmental catastrophes such as fire or flooding
  - Products exist and are well tested: copy services, refreshment, error-correcting code modules
- Logical preservation: preserving the understandability and usability of the data in the future, when current computer hardware and software technologies may no longer exist and the users of the data may not have been born yet
  - The technology is still in the research phase
- Preservation DataStores concentrates on supporting logical preservation
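The bit-preservation side can be illustrated with a small refreshment routine. This is only a sketch under stated assumptions: the slide names copy services, refreshment, and error-correcting codes but gives no mechanism, so the use of SHA-256 digests, in-memory replicas, and the function names `sha256_hex` and `refresh_replicas` are all hypothetical.

```python
import hashlib
from typing import List, Optional


def sha256_hex(data: bytes) -> str:
    """Digest recorded at ingest time and used later to detect bit rot."""
    return hashlib.sha256(data).hexdigest()


def refresh_replicas(replicas: List[bytes], expected_digest: str) -> Optional[List[bytes]]:
    """Rewrite every replica from the first copy that still matches the digest.

    Returns None when no intact copy survives, i.e. the bits are lost.
    """
    good = next((r for r in replicas if sha256_hex(r) == expected_digest), None)
    if good is None:
        return None
    return [good for _ in replicas]


# Hypothetical example: three copies, one of which has degraded.
original = b"preserved bits"
digest = sha256_hex(original)
copies = [original, b"preserved bitz", original]
repaired = refresh_replicas(copies, digest)
```

Note that a routine like this restores the bits only; it says nothing about whether a future reader can still interpret them, which is the logical-preservation problem.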
9
Preservation on Tapes vs. Preservation on Disks
- Individual disk drives:
  - Provide random access
  - Give sub-second performance for 50 megabytes
  - Are not reliable and tend to deteriorate after approximately 3 years
- Tapes:
  - Provide serial access
  - Have a transfer time 10 times slower than that of disks
  - Are more reliable, with an expected lifetime 3-10 times higher than that of disks
  - Consume 25 times less power than disks
  - Have a lower cooling cost
- Tapes are much more cost-effective than disks
- Preservation DataStores supports both disks and tapes: the disks are used as a cache and the tapes are the ultimate home of the data
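The disk-as-cache, tape-as-home arrangement can be sketched as a two-tier store. This is an illustration only, not the Preservation DataStores design: the class name `TieredStore`, the least-recently-used eviction policy, and the in-memory dictionaries standing in for disk and tape are all assumptions.

```python
from collections import OrderedDict


class TieredStore:
    """Hypothetical two-tier store: a small disk cache in front of tape."""

    def __init__(self, cache_capacity: int) -> None:
        self.tape = {}                    # authoritative copy of every object
        self.disk_cache = OrderedDict()   # least recently used entry first
        self.cache_capacity = cache_capacity
        self.tape_reads = 0               # counts slow serial reads

    def put(self, object_id: str, data: bytes) -> None:
        self.tape[object_id] = data       # tape is the ultimate home
        self._cache(object_id, data)

    def get(self, object_id: str) -> bytes:
        if object_id in self.disk_cache:  # fast path: random-access disk
            self.disk_cache.move_to_end(object_id)
            return self.disk_cache[object_id]
        self.tape_reads += 1              # slow path: fetch from tape
        data = self.tape[object_id]
        self._cache(object_id, data)
        return data

    def _cache(self, object_id: str, data: bytes) -> None:
        self.disk_cache[object_id] = data
        self.disk_cache.move_to_end(object_id)
        while len(self.disk_cache) > self.cache_capacity:
            self.disk_cache.popitem(last=False)  # evict least recently used


store = TieredStore(cache_capacity=2)
for name in ("a", "b", "c"):
    store.put(name, name.encode())
first = store.get("a")    # evicted earlier, so this read goes to tape
second = store.get("a")   # now served from the disk cache
```

Because every object is written to tape on `put`, losing the cache in this sketch never loses data; it only makes the next read slower.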
10
Preservation DataStores Functionality
- Support migration
  - Load and execute transformations
  - Self-describing export format
- Strong encapsulation of metadata with the data
  - Complex interrelated objects, context information, provenance information, formats, representation information
  - The association of raw data with its metadata is integral; otherwise, the association itself needs to be preserved as well (a recursive problem)
- Graceful loss of data
  - Minimize the effect of media loss or corruption
  - Self-describing, self-contained media format
- Enable the following functions in the presence of long-lived data and multiple migrations
  - Provenance, chain of custody
  - Fixity: authenticity and integrity
  - Future new interpretations and applications for the data
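Several of the functions listed above (encapsulating metadata with the data, fixity, provenance across migrations, and a self-describing export) can be sketched together. This is a minimal illustration under stated assumptions, not the Preservation DataStores format or API: the class `PreservationPackage`, the function `migrate`, the JSON export layout, and the choice of SHA-256 are all hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PreservationPackage:
    """Hypothetical package that keeps metadata together with the raw data."""
    content: bytes
    data_format: str
    representation_info: str              # how to interpret the content
    provenance: List[str] = field(default_factory=list)
    fixity: str = ""                      # SHA-256 of content, set at creation

    def __post_init__(self) -> None:
        if not self.fixity:
            self.fixity = hashlib.sha256(self.content).hexdigest()

    def verify(self) -> bool:
        """Fixity check: does the content still match its recorded digest?"""
        return hashlib.sha256(self.content).hexdigest() == self.fixity

    def export(self) -> str:
        """Self-describing export: data and metadata in one document."""
        return json.dumps({
            "format": self.data_format,
            "representation_info": self.representation_info,
            "provenance": self.provenance,
            "fixity_sha256": self.fixity,
            "content_hex": self.content.hex(),
        }, sort_keys=True)


def migrate(pkg: PreservationPackage,
            transform: Callable[[bytes], bytes],
            new_format: str,
            new_representation_info: str) -> PreservationPackage:
    """Run a transformation and extend the chain of custody."""
    if not pkg.verify():
        raise ValueError("fixity check failed; refusing to migrate")
    event = f"migrated {pkg.data_format} -> {new_format} (source sha256 {pkg.fixity})"
    return PreservationPackage(
        content=transform(pkg.content),
        data_format=new_format,
        representation_info=new_representation_info,
        provenance=pkg.provenance + [event],
    )


# Hypothetical example: migrate Latin-1 text to UTF-8.
original = PreservationPackage("café".encode("latin-1"), "text/latin-1",
                               "ISO 8859-1 encoded text", ["ingested"])
migrated = migrate(original, lambda b: b.decode("latin-1").encode("utf-8"),
                   "text/utf-8", "UTF-8 encoded text")
```

Recording the source digest in each provenance event is one way to keep authenticity checkable across multiple migrations, since each new package carries a pointer back to the exact bits it was derived from.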
11
Preservation DataStores Architecture
[Architecture diagram. Labels: Preservation Web Services; OAIS-based Preservation Package; Preservation WSDL; Preservation application; XAM API (*); XAM Package; XAM to OSD Package; Data Consumer; WAS CE web service; Higher Level API + Object Store; Security; Admin; backend; Preservation DataStore]
(*) The API includes application hints to denote OAIS “hot” attributes, such as the attribute that links to the representation information object
12
100 Years Archive Task Force
- A task force in the Storage Networking Industry Association (SNIA)
- Aims to define best practices and storage standards for long-term digital information retention
- It is now conducting a survey to collect business and IT requirements for long-term data retention
  - 63 questions
  - Over 200 responses to date
  - Join at
13
Survey Partial Results
- 36.8% want to preserve data for over 100 years
- Compliance requirements are the main external factors driving the requirements for long-term digital archives
- The applications that generate data that needs to be preserved are databases, [...], custom business apps
- 61% do nothing to assure logical preservation
- 63% do nothing to deal with legal discovery
- 83% would like to have interoperable long-term storage systems
14
Summary
- There is a need for a new storage system that is preservation-aware and based on OAIS. It should offload functionality to the storage layer in order to:
  - Decrease the probability of data loss
  - Simplify the applications
  - Provide improved performance and robustness
- Preservation DataStores is such an OAIS-based, preservation-aware storage system
- Preservation DataStores will be developed and experimented with within the CASPAR EU project
- Preservation DataStores originated and is developed in IBM Haifa Research Lab, Israel
15
Thank You!