Download presentation
Presentation is loading. Please wait.
Published byAlaina Shepherd Modified over 9 years ago
1
IIPC GA, Stanford, US - WARCApril 28 th 2015Slide 1 WARC as Package Format for all Preserved Digital Material by Eld Zierau The Royal Library of Denmark Except from WARC update record these are slides from - iPres 2012 and - PiF 2014
2
IIPC GA, Stanford, US - WARCApril 28 th 2015Slide 2 Motivation Bit preservation on considerable part of digital material: Digitally born materials Substitution digitalization Web archive There are many types of digital materials We need packaging: Preserve link between identifier and object (incl. files) Avoid many different package formats – Preferable one Explained later
3
IIPC GA, Stanford, US - WARCApril 28 th 2015Slide 3 How to evaluate 1. Find requirements ◦ Package and storage related ◦ Preservation related requirements (from formats) ◦ Identification related requirements 2. List of possible formats ◦ AFF ◦ ARC ◦ BagIt ◦ METS ◦ RAR ◦ TAR ◦ WARC ◦ ZIP ◦…◦… 3. Evaluate which format fits the requirements best Looking for one format
4
IIPC GA, Stanford, US - WARCApril 28 th 2015Slide 4 List of Preservation related requirements Req.1: Must be Independent of storage platform Req.1: Must be Independent of storage platform Req. 2: Must allow flexible packaging Req. 2: Must allow flexible packaging Req. 3: Must allow update records Req. 3: Must allow update records Req. 4: Must be standardized format Req. 4: Must be standardized format Req. 5: Must be open Req. 5: Must be open Req. 6: Must be easy to understand Req. 6: Must be easy to understand Req. 7: Must be widely used in bit repositories Req. 7: Must be widely used in bit repositories Req. 8: Must be supported by existing tools Req. 8: Must be supported by existing tools Req. 9: Must be able to include digital files unchanged Req. 9: Must be able to include digital files unchanged Req. 10: Must facilitate identifiers for a digital object Req. 10: Must facilitate identifiers for a digital object Not an exhaustive list
5
IIPC GA, Stanford, US - WARCApril 28 th 2015Slide 5 Requirement 10: Must facilitate identifiers for a digital object Requirement 10: Must facilitate identifiers for a digital object Identification related requirements Especially a challenge when the object is ’a file’ 15AE9513 15AE9513 Service Provider Producer Con- sumer Object Object id. Object id. & Service Object 15AE9513 ?
6
IIPC GA, Stanford, US - WARCApril 28 th 2015Slide 6 Reasons for id. requirements 1. Leave it 100% to the bit preservation solution ◦ Risk since it is crucial information in preservation – outsourcing of responsibility ◦ Eliminate possible optimisation of packaging more files or files and metadata in the same package 2. Naming files with the identifier ◦ file name is not part of the file itself ◦ restrictions to how files are named ◦ may not make same sense in the future 3. Put identifier into files as inherited metadata ◦ knowledge of how to extract identifiers from file formats ◦ would need to change original bits 4. Wrap files and identifier in a package format ◦ requirements for the abilities of the package format put the id. with the file 15AE9513 15ae9513.abc 15 9513 ? Year 2052 15AE9513 FileId: 15AE9513… Year 2052 ? 15AE9513.ABC 15AE9513
7
IIPC GA, Stanford, US - WARCApril 28 th 2015Slide 7 Position based like ARC and WARC WARC is an ISO standardised enhancement of ARC WARC/1.0 WARC-Type: warcinfo WARC-Date: 2012-08-27T15:50:16Z WARC-Record-ID: Content-Type: application/warc-fields Content-Length: 46 application: id.kb.dk/gatekeeper/releasetest17 WARC/1.0 WARC-Type: resource WARC-Target-URI: urn:uuid:15AE9513 WARC-Date: 2012-08-27T15:50:14Z WARC-Record-ID: Content-Type: image/tiff Content-Length: 139803706 II*1214i eeciRGB v2 P`p¡²ÃÔå,>PcuÁÕèü$8Ma … 15AE9513 WARC package ID
8
IIPC GA, Stanford, US - WARCApril 28 th 2015Slide 8 Packaging the preservation metadata WARC/1.0 WARC-Type: warcinfo WARC-Date: 2013-01-18T19:27:59Z WARC-Record-ID: Content-Type: application/warc-fields Content-Length: 79 description: http://id.kb.dk/authorities/agents/kbDkDBIngest.html revision: v4 WARC/1.0 WARC-Type: resource WARC-Target-URI: urn:uuid:15AE9513 WARC-Date: 2013-01-18T19:27:59Z WARC-Block-Digest: md5:3f349a40b0c47bb070ea6bdd2759a731 WARC-Record-ID: Content-Type: image/tiff Content-Length: 139803706 II*1214i eeciRGB v2 P`p¡²ÃÔå,>PcuÁÕèü$8Ma … 15AE9513 WARC package ID
9
IIPC GA, Stanford, US - WARCApril 28 th 2015Slide 9 Metadata Standards and use in Preservation WARC/1.0 WARC-Type: metadata WARC-Target-URI: urn:uuid:c9db2170-619c-11e2-911b-005056887b67 WARC-Date: 2013-01-18T19:27:59Z WARC-Refers-To: WARC-Block-Digest: sha1:62cc454ef47c7d54b77f871ab1ffd3f580307414 WARC-Record-ID: Content-Type: text/xml Content-Length: 13926 … UUID UUID 41d153d1-0099-11e2-9397-005056887b67 41d153d1-0099-11e2-9397-005056887b67 …</mets> IE ID for ‘landing page’ of different representations
10
IIPC GA, Stanford, US - WARCApril 28 th 2015Slide 10 Update via Preservation Metadata At this stage only metadata (to be implemented) Two ways: 1.In preservation metadata 2.Using a ”new” uodate WARC-record 1. Using PREMIS – the event is an update referring back 2. WARC allows for other record types than source and metadata In both cases use ’concurrentTo’ as shortcut
11
IIPC GA, Stanford, US - WARCApril 28 th 2015Slide 11 Package Formats - fulfilment of requirem. Formats Requirements AFFARCBagItMETSRARTarWARCZIP 1. Platform independent 2. Flexible packaging 3. Supports update pack. 4. Standardised 5. Open 6. Easily understandable 7. Widely used in BRs 8. Tools available 9. Include files unchanged 10. Identifiers for files FulfilledNearly thereMiddleTo some extentNot at all
12
IIPC GA, Stanford, US - WARCApril 28 th 2015Slide 12 Conclusion Recommends WARC as the best suited format for long term preservation of varied digital materials. Strong on: ◦ applying identifiers to files ◦ easily understandable ◦ one of few formally standardised formats ◦ extendible with record definition for updates Weak on: ◦ Not well supported by tools ◦ Only widely used in web archiving WARC 1. Platform independent 2. Flexible packaging 3. Supports update pack. 4. Standardised 5. Open 6. Easily understandable 7. Widely used in BRs 8. Tools available 9. Include files unchanged 10. Identifiers for files But may change With given requirements
13
IIPC GA, Stanford, US - WARCApril 28 th 2015Slide 13 Questions Images of this style from digitalbevaring.dk
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.