Published by Damian Day. Modified over 9 years ago.
1
Centro Ricerche e Innovazione Tecnologica
TAPE workshop on the curation and preservation of audiovisual collections
University of Glasgow, Scotland, UK, Monday 12th – Friday 16th May 2008
Giorgio Dimino, RAI Research Centre, g.dimino@rai.it
Storage and repositories
2
Reference Model for an Open Archival Information System (OAIS)
Consultative Committee for Space Data Systems (CCSDS)
"This document is a technical Recommendation for use in developing a broader consensus on what is required for an archive to provide permanent, or indefinite long-term, preservation of digital information. This Recommendation establishes a common framework of terms and concepts which comprise an Open Archival Information System (OAIS). It allows existing and future archives to be more meaningfully compared and contrasted. It provides a basis for further standardization within an archival context and it should promote greater vendor awareness of, and support of, archival requirements."
3
OAIS environment model
- Producer: provides content to the archive
- Consumer: uses the archive content
- Management: decides the archive's strategic objectives
- OAIS archive: sits between Producer, Consumer and Management
4
Data vs. Information (OAIS definition)
- Data object (e.g. 10010111): what we store
- Representation information: knowledge about data interpretation
- Information object: what we want
A data object, interpreted using its representation information, yields an information object.
5
Video data formats
- Uncompressed raster formats
  - YUV and RGB
  - Standard definition 4:2:2 video, 270 Mb/s, requires about 120 GB per hour
- Lossless compression (e.g. JPEG2000)
  - Variable efficiency, on average ½ of the uncompressed size
- Compressed formats (e.g. MPEG2, MPEG4, VC1, DV)
  - Compression depends on the final quality expected; typical bit rates from 3 Mb/s to 50 Mb/s, up to 100 times reduction
The "Representation Information" needed to interpret compressed formats is generally extremely complex. Rendering is done using specific software or hardware. The written specification must be seen only as a last-resort disaster recovery option.
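The storage figures above follow directly from the bit rate. A minimal sketch of the arithmetic, assuming decimal units (1 GB = 10^9 bytes) and a constant bit rate; the function name and the 135 Mb/s figure for "about half of uncompressed" are illustrative assumptions, not values from the slides:

```python
def gigabytes_per_hour(bit_rate_mbps: float) -> float:
    """Storage for one hour of video at a constant bit rate, in GB (10^9 bytes)."""
    bits_per_hour = bit_rate_mbps * 1_000_000 * 3600
    return bits_per_hour / 8 / 1_000_000_000


if __name__ == "__main__":
    examples = [
        ("uncompressed SD 4:2:2", 270.0),
        ("lossless, assumed half of uncompressed", 135.0),
        ("compressed, upper typical rate", 50.0),
        ("compressed, lower typical rate", 3.0),
    ]
    for label, rate in examples:
        print(f"{label}: {gigabytes_per_hour(rate):.2f} GB per hour")
```

At 270 Mb/s this gives 121.5 GB per hour, which the slide rounds to 120 GB. The ratio between 270 Mb/s and 3 Mb/s is 90, consistent with the "up to 100 times" reduction quoted above.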
6
Video quality, some considerations
- Digital master: the result of digitising analogue tapes. It becomes the new master, replacing the corresponding analogue tape. It should be stored at maximum quality.
- Publication master: if keeping all the digital masters on line is too expensive, a surrogate master can in some cases be generated at lower quality; all subsequent publication copies are then derived from it by transcoding.
- Publication version: the version that is delivered to the user of a particular service (an archive can offer several services based on the same content).
- Viewing version: a reduced-quality version used for content selection.
7
OAIS Information Package
- Content Information
  - Data object
  - Representation information
- Preservation Description Information
  - Provenance
  - Context
  - Reference
  - Fixity
- Packaging Information
- Descriptive Information
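The structure of the information package can be sketched as a set of nested records. This is only an illustration: the class and field names are hypothetical, the field types are simplified to strings, and using a SHA-256 digest as the fixity value is an assumption (OAIS does not prescribe a particular algorithm).

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ContentInformation:
    data_object: bytes               # the bits that are actually stored
    representation_information: str  # how to interpret those bits


@dataclass
class PreservationDescriptionInformation:
    provenance: str  # history of the content
    context: str     # relation to other content
    reference: str   # identifier of the content
    fixity: str      # assumption: hex SHA-256 digest of the data object


@dataclass
class InformationPackage:
    content: ContentInformation
    pdi: PreservationDescriptionInformation
    packaging_information: str
    descriptive_information: Dict[str, str] = field(default_factory=dict)


def build_package(data: bytes, representation: str, reference: str) -> InformationPackage:
    """Assemble a package and compute its fixity value from the data object."""
    pdi = PreservationDescriptionInformation(
        provenance="unknown",
        context="unknown",
        reference=reference,
        fixity=hashlib.sha256(data).hexdigest(),
    )
    return InformationPackage(
        content=ContentInformation(data, representation),
        pdi=pdi,
        packaging_information="single file",
    )
```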
8
Video packaging (wrappers)
- SMPTE MXF
- MPEG2 TS
- Microsoft ASF, AVI
- Apple Quicktime
- Adobe Flash FLV, SWF
For reference see http://www.digitalpreservation.gov/formats/fdd/descriptions.shtml
9
OAIS collaboration diagram
10
OAIS functional entities
11
Storage technologies
- Data tapes
  - IBM LTO Ultrium 4: 800 GB
  - Quantum DLT-S4: 800 GB
  - Sony SAIT: 800 GB
  - Storagetek T10000: 500 GB
- Hard disk
  - Up to 1 TB per disk, 3.5”
  - Several RAID configurations possible
- Solid State Disks
  - Still expensive but becoming interesting
  - Capacity still lower than hard disks: 128 GB (announced products), 2.5”
- Optical Disks
  - DVD RW: 9 GB
  - Blu-Ray: 50 GB
12
Some remarks
- The choice of storage technologies depends on many factors, including:
  - Total amount of data
  - Expected increase rate
  - Desired throughput
  - Access performance
  - Data security
- No storage medium lasts forever
- No technology can be considered 100% reliable
  - Never keep single copies!
- Obsolescence occurs very rapidly
  - Data migration must be considered part of the management process, not an emergency operation
13
Digital vs Analogue Archive
[Figure: bookshelf metres required for 1000 hours of audio data; labels: 800 GB today, 1 TB today]
14
Flat storage
[Diagram labels: File server, User, Front end, Selection, Content, Data base, NAS]
15
Storage hierarchy
[Diagram labels: Near-Line, On line, Fast, Hard Disk/RAID, Tape (robot), Solid State Disk, RAM, RAID]
16
Hierarchical Storage Management (HSM)
[Diagram labels: HD cache, Tape robotic storage, File server, User, Front end, Selection, Content, Data base]
17
Federated storage (GRID)
- Based on GRID concepts of distributed computing and a file system distributed over a WAN
- Multiple self-contained, interconnected storage nodes
- Each storage node contains its own storage medium, microprocessor, indexing capability, and management layer, and is generally based on a commodity PC
- Advantages
  - Fault tolerance
  - Scalability
  - Throughput
- Examples: Google File System, Apache Hadoop
18
Basic functionalities
- Virtualization: the user sees a single file system
- Data replication: the system automatically manages the desired redundancy
- Direct access to data: data move from storage node to client without intermediation
- Dynamic reconfiguration: nodes can be switched on and off while the system is in operation
- Automatic load balancing: exploits data replication and direct node access
19
Data blocking and replication
- A data file is divided into fixed-length blocks
- Each block is replicated n times on different nodes
[Diagram: a file split into Block 1 to Block 4, with copies of the blocks spread over Node 1 to Node 5]
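The two rules above can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the round-robin placement is an assumed policy chosen for simplicity, not the policy of any particular file system.

```python
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Divide a file into fixed-length blocks (the last block may be shorter)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: List[str], replicas: int) -> Dict[int, List[str]]:
    """Assign each block to `replicas` different nodes, round-robin (assumed policy)."""
    if replicas > len(nodes):
        raise ValueError("cannot place more replicas than there are nodes")
    placement = {}
    for block in range(num_blocks):
        # Consecutive offsets modulo len(nodes) never repeat a node
        # as long as replicas <= len(nodes).
        placement[block] = [nodes[(block + r) % len(nodes)] for r in range(replicas)]
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(b"data " * 18, block_size=24)
    nodes = ["Node 1", "Node 2", "Node 3", "Node 4", "Node 5"]
    for block, holders in place_replicas(len(blocks), nodes, replicas=3).items():
        print(f"Block {block + 1}: {', '.join(holders)}")
```

With 5 nodes and 3 replicas, any two nodes can fail and every block still has at least one surviving copy.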
20
Architecture
[Diagram labels: user, Name Node, DataNodes, Node, Filename, Nodes list, Data chunks, Cluster 1, Cluster 2]
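One plausible reading of the diagram labels is that the user sends a filename to the name node, receives a list of nodes, and then fetches the data chunks directly from the data nodes. A minimal in-memory sketch of that interaction follows; the classes and methods are hypothetical and are not the API of Hadoop or the Google File System.

```python
from typing import Dict, List, Tuple


class DataNode:
    """Holds data chunks, keyed by (filename, chunk number)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.chunks: Dict[Tuple[str, int], bytes] = {}


class NameNode:
    """Maps each filename to the node names holding each of its chunks."""

    def __init__(self) -> None:
        self.index: Dict[str, List[List[str]]] = {}

    def locate(self, filename: str) -> List[List[str]]:
        return self.index[filename]


def read_file(filename: str, name_node: NameNode, online: Dict[str, DataNode]) -> bytes:
    """Ask the name node for the nodes list, then read chunks from the data nodes."""
    parts = []
    for chunk_no, holders in enumerate(name_node.locate(filename)):
        for node_name in holders:
            node = online.get(node_name)  # None if the node is switched off
            if node is not None and (filename, chunk_no) in node.chunks:
                parts.append(node.chunks[(filename, chunk_no)])
                break
        else:
            raise IOError(f"no online replica for chunk {chunk_no} of {filename}")
    return b"".join(parts)
```

Because the client tries each replica holder in turn, a read still succeeds when some nodes are offline, as long as one copy of every chunk remains reachable.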
21
Digital Asset Management (1)
- A software system that implements all the archive management policies
- Provides the archive administrator with the necessary tools to
  - Monitor the preservation state of the media
  - Restore backup copies when primary media are damaged
  - Monitor the use of the storage
  - Monitor software/hardware failures
  - Define ingestion and access policies
- Should provide support for technology/system migration
22
Digital Asset Management (2)
- Provides the necessary functionalities to implement the ingestion workflow
  - Receive the SIP (or a batch of SIPs)
  - Analyse the SIP, verify that all the vital metadata are valid
  - Assign UMIDs
  - Transcode SIP into AIP
  - Generate proxies (low resolution video, key frames)
  - Provide content documentation
- Provides the functionalities to implement the access workflow
  - Verify that the user has access rights
  - Provide content selection functionalities (search, retrieval and browsing)
  - Verify the rights associated with the content
  - Transcode AIP into DIP (this can depend on the user request)
  - Deliver the DIP
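The first steps of the ingestion workflow can be sketched as a small pipeline. Everything here is a simplified assumption: the SIP and AIP are plain dictionaries, the list of vital metadata fields is hypothetical, the identifier is a random UUID used as a stand-in (it is not a real SMPTE UMID), and transcoding and proxy generation are left as placeholders.

```python
import uuid
from typing import Dict, List

# Hypothetical set of metadata fields considered vital for ingestion.
REQUIRED_METADATA = ("title", "duration_seconds", "source_format")


def missing_metadata(sip: Dict) -> List[str]:
    """Return the vital metadata fields that are absent or empty in the SIP."""
    metadata = sip.get("metadata", {})
    return [key for key in REQUIRED_METADATA
            if key not in metadata or metadata[key] in (None, "")]


def ingest(sip: Dict, archive_format: str = "archive-master") -> Dict:
    """Validate a SIP and turn it into an AIP record (transcoding is a stub)."""
    missing = missing_metadata(sip)
    if missing:
        raise ValueError(f"SIP rejected, missing metadata: {', '.join(missing)}")
    return {
        "id": uuid.uuid4().hex,            # stand-in identifier, not a real UMID
        "metadata": dict(sip["metadata"]),
        "essence": sip["essence"],         # a real system would transcode here
        "format": archive_format,
        "proxies": [],                     # low-resolution video and key frames go here
    }
```

Rejecting an incomplete SIP before any transcoding keeps expensive processing away from content that could not be documented or retrieved later.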
23
OAIS Functions of Archival Storage
24
Business rights management
- A BRM is a system that manages the usage rights associated with content
- Without an automated BRM system, the reuse of content can be slowed down by manual rights clearing operations
- Depending on the type of archive, it can be convenient to have the BRM closely coupled with the DAM
25
Digital archive design (1)
- Analyse and state clearly your business requirements
  - What is your archive's primary goal?
  - Who are your users?
    - Producers
    - Consumers
  - … and what are their needs?
- Assess your content
  - Amount of items
  - Conservation status
  - Increase rate
  - Usage
26
Digital archive design (2)
- Select archive video formats and quality
  - Target archived quality depends on foreseen usage and preservation issues
- Define the AIP (Archival Information Package)
  - Video coding
  - File formats
  - Associated metadata
- Estimate storage requirements
  - Amount of data
  - Level of security of data
  - Increase rate
  - Input/output performance
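A first estimate of storage requirements can combine the amount of data, the increase rate and the number of copies kept for security. The sketch below assumes a constant bit rate, linear growth in hours per year and decimal units; the function and parameter names are illustrative only.

```python
def projected_storage_tb(hours_now: float,
                         hours_added_per_year: float,
                         bit_rate_mbps: float,
                         copies: int,
                         years: float) -> float:
    """Projected storage in TB (10^12 bytes) after `years` of linear growth."""
    gb_per_hour = bit_rate_mbps * 3600 / 8 / 1000  # Mb/s -> GB per hour
    total_hours = hours_now + hours_added_per_year * years
    return total_hours * gb_per_hour * copies / 1000


if __name__ == "__main__":
    # Example figures are assumptions chosen for illustration.
    need = projected_storage_tb(hours_now=10_000, hours_added_per_year=1_000,
                                bit_rate_mbps=50, copies=2, years=5)
    print(f"Storage needed after 5 years: {need:.0f} TB")
```

For 10,000 hours growing by 1,000 hours per year, stored at 50 Mb/s in two copies, this gives 675 TB after five years.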
27
Digital archive design (3)
- Define ingestion workflow and SIP
  - Ingestion procedures are particularly critical if your content needs digitization and restoration
- Define access workflow and DIP
  - Access is heavily dependent on proper documentation and retrieval tools
- Properly dimension throughput
  - Affected by video bitrate and transcoding from AIP to DIP
- Define archive maintenance procedures
  - Consistency check
  - Media replacement
  - Disaster recovery
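A consistency check is often implemented by recomputing checksums and comparing them with values recorded at ingestion. The following is a minimal sketch under that assumption; the manifest format (a mapping from file path to expected SHA-256 digest) and the function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def consistency_check(manifest: Dict[str, str]) -> List[str]:
    """Return the paths that are missing or whose digest no longer matches."""
    damaged = []
    for name, expected in manifest.items():
        path = Path(name)
        if not path.is_file() or sha256_of(path) != expected:
            damaged.append(name)
    return damaged
```

Files reported by `consistency_check` are candidates for restoration from a second copy, which is one reason never to keep single copies.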
28
Digital archive design (4)
- Consider migration
  - Storage technology
    - Media capacity follows Moore’s law
    - … but sometimes there is a technology leap (e.g. from tape library to hard disk arrays)
  - Coding formats
    - Compression schemes become more efficient, allowing greater bit savings at a given quality
      - Older formats become obsolete
      - Transcoding generally implies possible loss of quality
  - Software/hardware
    - Proprietary formats often pose upgrade constraints
29
Digital archive design (5)
- Consider the need to interface with other systems
  - Federated libraries
  - Account systems
  - Production
  - Digital rights management
- … and finally design or commission a system