Published by Leon Oliver. Modified over 9 years ago.
1
XCL-Tools in relation to Significant characteristics in Planets. Manfred Thaller, Universität zu* Köln (*University at, not of, Cologne)
2
What are “significant characteristics”? Those properties of a digital file which have to be known to enable the processing of the file within a specific setup.
3
Why extract them by software? To create technical metadata as required by organizational models for long term preservation. (NLNZ)
4
Within Planets … … served by solutions to identify formats: formats registry / PRONOM / DROID. … and a solution for extracting and processing such characteristics: XCL.
5
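A Vision [Diagram: a Migrator converts a tiff into a png. An Extractor, using the tiff XCEL and the png XCEL, produces a tiff XCDL and a png XCDL. A Comparator compares the two XCDLs and reports 93%.]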
6
A Vision [Diagram: Extractor, appropriate XCELs, Comparator, C-Set.]
7
1 million objects: use one second for each. == 16666.7 minutes == 277.8 hours == 11.57 working days of a computer == 34.7 8-hour days for a Human == 7 working weeks Why automate?
8
1 million objects: use five minutes for each. == 83 333.3 hours == 10 416.7 8-hour days for a Human Why automate?
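The estimates on these two slides can be recomputed with a few lines of arithmetic. The sketch below is an illustration only; the function name `workload` and the conventions of a 24-hour computer day, an 8-hour human day and a 5-day working week are assumptions made here, not something the slides define beyond the units they mention.

```python
def workload(objects: int, seconds_each: float) -> dict:
    """Convert a per-object processing time into total effort figures."""
    total_seconds = objects * seconds_each
    hours = total_seconds / 3600
    return {
        "minutes": total_seconds / 60,
        "hours": hours,
        "computer_days": hours / 24,   # assumption: a computer works 24 hours a day
        "human_days": hours / 8,       # assumption: 8-hour working day
        "working_weeks": hours / 8 / 5,  # assumption: 5 working days per week
    }


one_second = workload(1_000_000, 1)
five_minutes = workload(1_000_000, 5 * 60)

for label, result in (("1 s each", one_second), ("5 min each", five_minutes)):
    print(label, {name: round(value, 1) for name, value in result.items()})
```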
9
Assumption: Preservation is only feasible, if the content of two digital objects can be compared without human intervention, giving a numerical estimate of their degree of similarity. Why automate?
10
Abstract solution I
(1) Language to represent the complete content of a digital object. XCDL
(2) Language to describe any machine readable format in a formal language. XCEL
(3) Software to extract the content of a file based upon a description as under (2) and express it in the language as specified under (1). “extractor”
(4) Software to compare two such content descriptions. “comparator”
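The four components can be sketched on a toy scale. Everything below is hypothetical: the two byte layouts, the dictionary standing in for a format description, the dictionary standing in for a content description, and the similarity measure (share of properties with equal values) are assumptions chosen for illustration, not the actual XCEL, XCDL, extractor or comparator.

```python
import struct

# (2) Hypothetical format descriptions: property name -> (byte offset, struct format).
TOY_FORMAT_A = {"width": (0, ">I"), "height": (4, ">I"), "compression": (8, ">B")}
TOY_FORMAT_B = {"height": (0, "<H"), "width": (2, "<H"), "compression": (4, "<B")}


def extract(data: bytes, description: dict) -> dict:
    """(3) Read every described property from the file bytes.

    The returned dict plays the role of the content description (1).
    """
    return {
        name: struct.unpack_from(fmt, data, offset)[0]
        for name, (offset, fmt) in description.items()
    }


def compare(a: dict, b: dict) -> float:
    """(4) Return the share of properties that both descriptions agree on."""
    names = set(a) | set(b)
    if not names:
        return 1.0
    matching = sum(1 for n in names if n in a and n in b and a[n] == b[n])
    return matching / len(names)


# Two toy files holding the same 4x4 image with different compression codes.
file_a = struct.pack(">IIB", 4, 4, 0)
file_b = struct.pack("<HHB", 4, 4, 1)

content_a = extract(file_a, TOY_FORMAT_A)
content_b = extract(file_b, TOY_FORMAT_B)
similarity = compare(content_a, content_b)
print(f"{similarity:.0%}")
```

The point of the design is that `extract` contains no knowledge of either format; all format knowledge sits in the description passed to it.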
11
[Example listing whose markup was lost in extraction. Surviving terms: height, greyscale, imageType, zlibDeflateInflate, compression, uint32.]
12
32
14
Are the following two items equal: VIII 8
15
VIII 8 eight
16
VIII 8 eight otto
17
VIII 8 eight otto acht
18
VIII 8 eight otto acht 8.0
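One way to read this sequence of slides: the items are equal only after each has been mapped to a shared abstract value. The sketch below illustrates that idea under stated assumptions; the function names and the tiny word list (covering only the words shown on the slide) are hypothetical.

```python
ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}
# Assumption: only the number words that appear on the slide are covered.
WORDS = {"eight": 8, "otto": 8, "acht": 8}


def roman_to_int(text: str) -> int:
    """Convert a Roman numeral (upper case) to an integer."""
    total = 0
    for i, ch in enumerate(text):
        value = ROMAN[ch]
        # A smaller numeral before a larger one is subtracted (e.g. IV = 4).
        if i + 1 < len(text) and ROMAN[text[i + 1]] > value:
            total -= value
        else:
            total += value
    return total


def normalise(token: str) -> float:
    """Map different notations of a number to one abstract value."""
    t = token.strip()
    if t.lower() in WORDS:
        return float(WORDS[t.lower()])
    if t and all(ch in ROMAN for ch in t):
        return float(roman_to_int(t))
    return float(t)


items = ["VIII", "8", "eight", "otto", "acht", "8.0"]
values = {normalise(item) for item in items}
print(len(values) == 1)  # all notations collapse to the same value
```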
19
VIII 8 eight otto acht Information model: “an image”
20
VIII 8 Information model: “an image” Format ontology: “what terms are used in formats to describe image properties”
21
Extraction language: “how to get the terms describing an image out of a file” Information model: “what is an image” Format ontology: “what terms are used in formats to describe image properties”
22
Abstract solution II
(1) A theoretical model of information (not: data) types: “image”, “text”, “audio”...
(2) Ontologies, which map existing file format terminologies onto this model.
(3) A language, XCDL, which allows the content of files in different formats to be expressed using the vocabulary of the ontologies and the “grammar” of the information model.
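Point (2) can be pictured as a lookup table per format. In the sketch below, `ImageWidth` and `ImageLength` are the TIFF tag names and `Width` and `Height` are the PNG IHDR field names; the shared vocabulary (`width`, `height`), the table layout and the function name are assumptions made for illustration and are not taken from the XCL ontologies.

```python
# Hypothetical ontology: format-specific term -> term of the shared image model.
FORMAT_ONTOLOGY = {
    "tiff": {"ImageWidth": "width", "ImageLength": "height"},
    "png": {"Width": "width", "Height": "height"},
}


def to_model_vocabulary(file_format: str, raw_terms: dict) -> dict:
    """Rename format-specific terms to the shared vocabulary.

    Terms the ontology does not know are dropped in this sketch.
    """
    mapping = FORMAT_ONTOLOGY[file_format]
    return {
        mapping[term]: value
        for term, value in raw_terms.items()
        if term in mapping
    }


from_tiff = to_model_vocabulary("tiff", {"ImageWidth": 640, "ImageLength": 480})
from_png = to_model_vocabulary("png", {"Width": 640, "Height": 480})
print(from_tiff == from_png)  # both are now expressed in the same terms
```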
23
eXtensible Characterisation Definition Language Purpose: Describe the contents of a file in terms of an abstract model. XCDL
24
XCDL: text model (1) A text (= ) is composed of data (= ) plus interpretations of data according to the underlying format specification (= ).
25
XCDL: text model (2) Or, one level of abstraction higher, a text is composed of content-carrying tokens, accompanied by rendering info plus deployment info plus historical info.
26
This is a text 54 68 69 73 20 69 73 20 61 20 74 65 78 74 … fontsize 48 unsignedInt8
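The hexadecimal sequence on this slide is the ASCII encoding of the sentence next to it, which the following sketch checks. The dictionary layout and its key names are hypothetical stand-ins for the markup lost from the slide; only the values (the hex bytes, `fontsize`, `48`, `unsignedInt8`) come from the slide.

```python
hex_dump = "54 68 69 73 20 69 73 20 61 20 74 65 78 74"

# bytes.fromhex ignores the spaces between the byte pairs.
decoded = bytes.fromhex(hex_dump).decode("ascii")
print(decoded)  # This is a text

# Hypothetical structure: the data plus one interpretation of it.
text_object = {
    "data": hex_dump,
    "property": {"name": "fontsize", "value": 48, "type": "unsignedInt8"},
}

# An unsigned 8-bit integer must lie in the range 0..255.
value = text_object["property"]["value"]
fits_type = 0 <= value <= 255
```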
28
Thank you! Questions? Manfred.thaller@uni-koeln.de
29
XC(E/D)L - & related issues (originally from Sebastian Beyl)
30
Already known: [Diagram: the Extractor reads the ORIGINAL FILE and an XCEL (machine-readable format description) and writes an XCDL (normdata and properties from the original file).]
31
Problem: propertySets and relation to normdata [Diagram: normdata of the original file, with property 1 and property 2 attached.]
32
Problem: propertySets and relation to normdata [Diagram: in the XCDL, stretches of normdata are related to propertySet 1, propertySet 2 and propertySet 3; propertySet 1 and propertySet 2 recur (“again!”) further along the normdata. Property 1 and property 2 are shown alongside.]
33
Problem: propertySets and relation to normdata [Diagram: in the XCDL, stretches of normdata are related to propertySet 1, propertySet 2 and propertySet 3; propertySet 1 and propertySet 2 recur (“again!”) further along the normdata. Property 1 and property 2 are shown alongside.]
Rules:
- Relation to normdata ONLY with propertySet
- No overlapping relations
- Every propertySet definition (in one object) only once
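The three rules lend themselves to an automatic check. The sketch below is one possible reading, not the XCDL schema: it assumes that a propertySet relates to a contiguous range of normdata positions, that "overlapping" means two such ranges share a position, and that sets are identified by an id. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertySet:
    set_id: str
    start: int          # first normdata position covered (inclusive)
    end: int            # end of the covered range (exclusive)
    properties: tuple   # properties reach normdata only through this set (rule 1)


def violations(property_sets: list) -> list:
    """Return a list of rule violations for the propertySets of one object."""
    problems = []

    # Rule 3: every propertySet definition only once per object.
    seen = set()
    for ps in property_sets:
        if ps.set_id in seen:
            problems.append(f"propertySet {ps.set_id} defined more than once")
        seen.add(ps.set_id)

    # Rule 2: no overlapping relations. Checking neighbours in start order
    # is enough to detect whether any overlap exists.
    ordered = sorted(property_sets, key=lambda ps: (ps.start, ps.end))
    for left, right in zip(ordered, ordered[1:]):
        if right.start < left.end:
            problems.append(
                f"propertySets {left.set_id} and {right.set_id} overlap"
            )
    return problems


valid = [
    PropertySet("1", 0, 5, ("property 1",)),
    PropertySet("2", 5, 9, ("property 2",)),
]
invalid = [
    PropertySet("1", 0, 5, ("property 1",)),
    PropertySet("1", 4, 9, ("property 2",)),
]
print(violations(valid))
print(violations(invalid))
```

Rule 1 is enforced by construction here: a property has no range of its own and can only be placed inside a `PropertySet`.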
34
Problem: recursive structures. Footnote example from koffice.org
35
Problem: recursive structures. Footnote example from koffice.org [Labelled in the screenshot: normdata]
36
Problem: recursive structures. Footnote example from koffice.org [Labelled in the screenshot: normdata, property fontSize]
37
Problem: recursive structures. Footnote example from koffice.org [Labelled in the screenshot: normdata, property fontSize, property footnote]
38
Problem: recursive structures. Footnote example from koffice.org [Labelled in the screenshot: normdata, property fontSize, property footnote] Normdata of property?
39
Problem: recursive structures. Footnote example from koffice.org [Labelled in the screenshot: normdata, property fontSize, property footnote] Property of normdata of property? How to bring it into XCDL?
40
Problem: recursive structures. Footnote example from koffice.org [Diagram: Object A (normdata, property fontSize) holds the property “Object B” as footnote; Object B (normdata, property fontSize) is a separate object.]
Rules:
- Properties and propertySets belong to ONE object only
- The upper object always points to the lower object, so the lower object can exist on its own
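These rules turn the recursive footnote into two flat objects connected by a one-way reference. A minimal sketch, assuming a simple in-memory representation: the class `XcdlObject`, its field names, the sample texts and the font sizes are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class XcdlObject:
    object_id: str
    normdata: str
    # Properties belong to this object only.
    properties: dict = field(default_factory=dict)
    # Property name -> id of a lower object; references only point downwards.
    references: dict = field(default_factory=dict)


# Lower object: the footnote, complete without any knowledge of Object A.
object_b = XcdlObject("B", "Text of the footnote", {"fontSize": 8})
# Upper object: the body text, which points to Object B as its footnote.
object_a = XcdlObject("A", "Body text", {"fontSize": 12}, {"footnote": "B"})

store = {obj.object_id: obj for obj in (object_a, object_b)}


def resolve(obj: XcdlObject, name: str, objects: dict) -> XcdlObject:
    """Follow a reference from an upper object to its lower object."""
    return objects[obj.references[name]]


footnote = resolve(object_a, "footnote", store)
print(footnote.normdata, footnote.properties)
```

Because Object B carries no reference back to Object A, it can be read or compared without Object A being present.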
41
Problem: embedded objects Example from wikipedia.de
42
Problem: embedded objects. Example from wikipedia.de [Diagram: original (container) file holding text data, with picture data as an embedded file.]
43
Problem: embedded objects. Example from wikipedia.de [Diagram: extraction turns the original (container) file, with its text data and embedded picture data, into XCDL-Object A (text data) and XCDL-Object B (image data).] Object A handles object B as an “image property”.
44
Problem: embedded objects. Example from wikipedia.de [Diagram: XCDL-Object A (text data) and XCDL-Object B (image data); Object A handles object B as an “image property”; standalone image XCDL.] Rule: If the upper object (A) is not readable or cannot be used for comparison, the embedded object can be handled as a “standalone” XCDL.
45
Problem: embedded objects. Example from wikipedia.de [Diagram: XCDL-Object A (text data); XCDL-Object B in an UNKNOWN IMAGE FORMAT; second parsing once the image format is known.] Rule: If the lower object (B) cannot be parsed, the raw data can be stored for later parsing, without data loss or comparison problems for the upper object (A).
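The last rule can be sketched as a parse with a fallback. The code below is an assumption-laden illustration: the class, the function names and the two toy parsers (which only recognise the PNG and GIF file signatures) are hypothetical and stand in for real format-specific extraction.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EmbeddedObject:
    raw: bytes                           # always kept, so nothing is lost
    description: Optional[dict] = None   # None while the format is unknown


def toy_png_parser(raw: bytes) -> Optional[dict]:
    """Recognise the 8-byte PNG signature; return None for anything else."""
    return {"format": "png"} if raw.startswith(b"\x89PNG\r\n\x1a\n") else None


def toy_gif_parser(raw: bytes) -> Optional[dict]:
    """Recognise the GIF87a/GIF89a signature; return None for anything else."""
    return {"format": "gif"} if raw[:6] in (b"GIF87a", b"GIF89a") else None


def parse_embedded(raw: bytes, parsers: list) -> EmbeddedObject:
    """Try each parser; if none succeeds, store the raw data for later."""
    for parser in parsers:
        result = parser(raw)
        if result is not None:
            return EmbeddedObject(raw, result)
    return EmbeddedObject(raw, None)


picture = b"GIF89a" + b"\x00" * 8

# First pass: only PNG is known, so Object B stays unparsed but intact.
first_pass = parse_embedded(picture, [toy_png_parser])
# Second parsing, once the image format has become known.
second_pass = parse_embedded(first_pass.raw, [toy_png_parser, toy_gif_parser])
print(first_pass.description, second_pass.description)
```

Since the embedded object is its own object, the parsed result of the second pass can also be used on its own, in the spirit of the standalone case above.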