XCL-Tools in relation to Significant characteristics in Planets. Manfred Thaller, Universität zu* Köln (*University at, not of, Cologne)


1 XCL-Tools in relation to Significant characteristics in Planets. Manfred Thaller, Universität zu* Köln (*University at, not of, Cologne)

2 What are “significant characteristics”? Those properties of a digital file which have to be known to enable the processing of the file within a specific setup.

3 Why extract them by software? To create technical metadata as required by organizational models for long term preservation. (NLNZ)

4 Within Planets … … served by solutions to identify formats: formats registry / PRONOM / DROID. … and a solution for extracting and processing such characteristics: XCL.

5 A Vision. [Diagram labels: Migrator (tiff, png); Extractor (tiff XCEL, png XCEL); Comparator (png XCDL, tiff XCDL); result: 93%]

6 A Vision. [Diagram labels: Extractor; Appropriate XCELs; Comparator; C-Set]

7 Why automate? 1 million objects, one second for each: == 16,666.7 minutes == 277.8 hours == 11.57 working days of a computer == 34.7 8-hour days for a human == 7 working weeks.

8 Why automate? 1 million objects, five minutes for each: == 5,000,000 minutes == 83,333.3 hours == 10,416.7 8-hour days for a human.
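The conversions on the two "Why automate?" slides can be recomputed from first principles. The sketch below is only an illustration; the function name `workload` and the dictionary keys are made up for this example.

```python
def workload(objects: int, seconds_per_object: float) -> dict:
    """Convert a per-object processing time into total effort in several units."""
    total_seconds = objects * seconds_per_object
    hours = total_seconds / 3600
    return {
        "minutes": total_seconds / 60,
        "hours": hours,
        "days_24h": hours / 24,      # a computer working around the clock
        "days_8h": hours / 8,        # a human working 8-hour days
        "weeks_5x8h": hours / 8 / 5, # five 8-hour days per week
    }

one_second = workload(1_000_000, 1)
five_minutes = workload(1_000_000, 5 * 60)

print(round(one_second["minutes"], 1))   # 16666.7
print(round(one_second["hours"], 1))     # 277.8
print(round(one_second["days_24h"], 2))  # 11.57
print(round(one_second["days_8h"], 1))   # 34.7
print(round(five_minutes["hours"], 1))
print(round(five_minutes["days_8h"], 1))
```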

9 Why automate? Assumption: Preservation is only feasible if the content of two digital objects can be compared without human intervention, giving a numerical estimate of their degree of similarity.
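To make "a numerical estimate of their degree of similarity" concrete, here is a deliberately naive sketch. It is not the Planets comparator: the scoring rule (share of property names on which both objects agree) and all property names and values are assumptions chosen for illustration.

```python
def similarity(a: dict, b: dict) -> float:
    """Fraction of property names (union of both objects) whose values agree."""
    names = set(a) | set(b)
    if not names:
        return 1.0  # two empty descriptions are trivially identical
    matching = sum(1 for n in names if n in a and n in b and a[n] == b[n])
    return matching / len(names)

# Hypothetical extracted properties of an original and its migrated copy.
tiff_props = {"width": 640, "height": 480, "bitsPerSample": 8,
              "compression": "none"}
png_props = {"width": 640, "height": 480, "bitsPerSample": 8,
             "compression": "zlibDeflateInflate"}

score = similarity(tiff_props, png_props)
print(f"{score:.0%}")  # 75%
```

A real comparator would presumably weight properties and compare the content data itself, not only header-like values.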

10 Abstract solution I
(1) Language to represent the complete content of a digital object: XCDL.
(2) Language to describe any machine-readable format in a formal language: XCEL.
(3) Software to extract the content of a file based upon a description as under (2) and express it in the language as specified under (1): the “extractor”.
(4) Software to compare two such content descriptions: the “comparator”.
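The core idea of item (3), an extractor driven by a declarative format description and not by format-specific code, can be sketched in a few lines. This is not XCEL syntax and not the Planets extractor: the description format (name, byte offset, `struct` code) and the helper names are assumptions. The offsets follow the fixed layout of the PNG signature and IHDR chunk.

```python
import struct
import zlib

# Hypothetical stand-in for a format description: (name, offset, struct code).
PNG_HEADER_DESCRIPTION = [
    ("width", 16, ">I"),       # uint32, big-endian
    ("height", 20, ">I"),
    ("bitDepth", 24, ">B"),    # uint8
    ("imageType", 25, ">B"),   # PNG colour type, 0 = greyscale
    ("compression", 26, ">B"), # 0 = deflate/inflate
]

def extract(data: bytes, description) -> dict:
    """Generic extractor: knows nothing about PNG except what the description says."""
    return {name: struct.unpack_from(code, data, offset)[0]
            for name, offset, code in description}

def make_png_header(width: int, height: int) -> bytes:
    """Build the PNG signature plus an IHDR chunk for an 8-bit greyscale image."""
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    chunk = b"IHDR" + ihdr
    crc = struct.pack(">I", zlib.crc32(chunk))
    return b"\x89PNG\r\n\x1a\n" + struct.pack(">I", len(ihdr)) + chunk + crc

properties = extract(make_png_header(4, 1), PNG_HEADER_DESCRIPTION)
print(properties)
```

Supporting another format would then mean writing another description, while `extract` stays unchanged.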

11 [XML example; markup lost in extraction. Surviving names and values: height; imageType (greyscale); compression (zlibDeflateInflate); uint32]


13 Abstract solution I
(1) Language to represent the complete content of a digital object: XCDL.
(2) Language to describe any machine-readable format in a formal language: XCEL.
(3) Software to extract the content of a file based upon a description as under (2) and express it in the language as specified under (1): the “extractor”.
(4) Software to compare two such content descriptions: the “comparator”.

14 Are the following two items equal: VIII  8

15 VIII  8 eight

16 VIII  8 eight otto

17 VIII  8 eight otto acht

18 VIII  8 eight otto acht 8.0
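Whether these items are "equal" depends on normalising them to a common representation first. The sketch below is an illustration only: the word table covers just the three words on the slide, and the function names are made up.

```python
ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}
WORDS = {"eight": 8, "otto": 8, "acht": 8}  # English, Italian, German

def roman_to_int(text: str) -> int:
    """Standard subtractive rule: a smaller numeral before a larger one is subtracted."""
    total = 0
    for ch, nxt in zip(text, text[1:] + " "):
        value = ROMAN[ch]
        total += -value if ROMAN.get(nxt, 0) > value else value
    return total

def normalise(token: str) -> float:
    """Map different surface forms onto one canonical numeric value."""
    t = token.strip()
    if t.lower() in WORDS:
        return float(WORDS[t.lower()])
    if t and all(ch in ROMAN for ch in t):
        return float(roman_to_int(t))
    return float(t)

items = ["VIII", "8", "eight", "otto", "acht", "8.0"]
print({item: normalise(item) for item in items})
print(len({normalise(item) for item in items}) == 1)  # True
```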

19 VIII  8 eight otto acht Information model: „an image“

20 VIII  8 information model: „an image“ format ontology: „what terms are used in formats to describe image properties“

21 Extraction language: “how to get the terms describing an image out of a file” Information model: „what is an image“ Format ontology: „what terms are used in formats to describe image properties“

22 Abstract solution II
(1) A theoretical model of information (not: data) types: “image”, “text”, “audio”, ...
(2) Ontologies which map existing file format terminologies onto this model.
(3) A language, XCDL, which allows the content of files in different formats to be expressed using the vocabulary of the ontologies and the “grammar” of the information model.
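Item (2) can be pictured as a lookup from format-specific terms to shared model terms. In this sketch the TIFF tag names and PNG IHDR field names are the ones used in those formats' specifications; the model-side names (`imageWidth` and so on) and the function are hypothetical and not taken from the XCL ontology.

```python
# Format-specific term -> term of the (hypothetical) common image model.
FORMAT_ONTOLOGY = {
    "tiff": {"ImageWidth": "imageWidth",
             "ImageLength": "imageHeight",
             "BitsPerSample": "bitsPerSample"},
    "png": {"Width": "imageWidth",
            "Height": "imageHeight",
            "Bit depth": "bitsPerSample"},
}

def to_model_terms(fmt: str, raw: dict) -> dict:
    """Rename the properties of one format into the shared vocabulary."""
    mapping = FORMAT_ONTOLOGY[fmt]
    return {mapping[name]: value for name, value in raw.items() if name in mapping}

tiff = to_model_terms("tiff", {"ImageWidth": 640, "ImageLength": 480, "BitsPerSample": 8})
png = to_model_terms("png", {"Width": 640, "Height": 480, "Bit depth": 8})
print(tiff == png)  # True
```

Once both files are described in the same vocabulary, a comparison no longer needs to know which format each value came from.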

23 XCDL: eXtensible Characterisation Definition Language. Purpose: Describe the contents of a file in terms of an abstract model.

24 XCDL: text model (1) A text (= ) is composed of  data (= ) plus  interpretations of data according to the underlying format specification (= ).

25 XCDL: text model (2) Or, one level of abstraction higher, a text is composed of content-carrying tokens, accompanied by rendering info, deployment info and historical info.

26 [Example] Text: This is a text. Data (hex): 54 68 69 73 20 69 73 20 61 20 74 65 78 74 … Property: fontsize = 48 (unsignedInt8)

27 [Example] Text: This is a text. Data (hex): 54 68 69 73 20 69 73 20 61 20 74 65 78 74 … Property: fontsize = 48 (unsignedInt8)
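The hexadecimal run on the slide is simply the ASCII encoding of the text. The snippet below reproduces it and pairs it with the font-size property; the dictionary layout is an assumption for illustration and not the XCDL schema.

```python
text = "This is a text"
data = text.encode("ascii")
hex_view = data.hex(" ")  # space-separated hex, Python 3.8+
print(hex_view)  # 54 68 69 73 20 69 73 20 61 20 74 65 78 74

# Hypothetical pairing of the data with an interpretation of it.
record = {
    "normData": hex_view,
    "property": {"name": "fontsize", "value": 48, "type": "unsignedInt8"},
}

# The content survives the round trip independently of the property.
print(bytes.fromhex(record["normData"]).decode("ascii"))  # This is a text
```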

28 Thank you! Questions? Manfred.thaller@uni-koeln.de

29 XC(E/D)L - & related issues (originally from Sebastian Beyl)

30 Already known. [Diagram labels: ORIGINAL FILE; XCEL (machine-readable format description); Extractor; XCDL (normdata and properties from the original file)]

31 Problem: propertySets and relation to normdata. [Diagram labels: original file; normdata; property 1; property 2]

32 Problem: propertySets and relation to normdata. [Diagram labels: XCDL; normdata; property 1; property 2; propertySet 1; propertySet 1 again!; propertySet 2; propertySet 2 again!; pSet. 3]

33 Problem: propertySets and relation to normdata. [Diagram labels: XCDL; normdata; property 1; property 2; propertySet 1; propertySet 1 again!; propertySet 2; propertySet 2 again!; pSet. 3]
Rules:
- Relation to normdata ONLY with propertySet
- No overlapping relations
- Every propertySet definition (in one object) only once
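Two of these rules lend themselves to a mechanical check. The sketch below assumes one possible reading of the slide: each propertySet is defined once, carries an id, and refers to one or more half-open character ranges of the normdata. That data layout and the function name are assumptions, not the XCDL schema.

```python
def check_property_sets(property_sets: list) -> list:
    """Return a list of rule violations (an empty list means the object is valid)."""
    errors = []

    # Rule: every propertySet definition only once per object.
    seen = set()
    for ps in property_sets:
        if ps["id"] in seen:
            errors.append(f"propertySet {ps['id']} is defined more than once")
        seen.add(ps["id"])

    # Rule: no overlapping relations to the normdata.
    spans = sorted((start, end, ps["id"])
                   for ps in property_sets for start, end in ps["ranges"])
    for (s1, e1, id1), (s2, e2, id2) in zip(spans, spans[1:]):
        if s2 < e1:  # next range starts before the previous one ends
            errors.append(f"ranges of {id1} and {id2} overlap")
    return errors

valid = [{"id": "ps1", "ranges": [(0, 4), (10, 14)]},  # "propertySet 1 again!"
         {"id": "ps2", "ranges": [(4, 10)]}]
invalid = [{"id": "ps1", "ranges": [(0, 6)]},
           {"id": "ps2", "ranges": [(4, 10)]}]

print(check_property_sets(valid))    # []
print(check_property_sets(invalid))  # ['ranges of ps1 and ps2 overlap']
```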

34 Problem: recursive structures. Footnote example from koffice.org.

35 Problem: recursive structures. Footnote example from koffice.org. [Diagram labels: normdata]

36 Problem: recursive structures. Footnote example from koffice.org. [Diagram labels: normdata; property fontSize]

37 Problem: recursive structures. Footnote example from koffice.org. [Diagram labels: normdata; property fontSize; property footnote]

38 Problem: recursive structures. Footnote example from koffice.org. [Diagram labels: normdata; property fontSize; property footnote] Normdata of a property?

39 Problem: recursive structures. Footnote example from koffice.org. [Diagram labels: normdata; property fontSize; property footnote] A property of the normdata of a property? How can this be expressed in XCDL?

40 Problem: recursive structures. Footnote example from koffice.org. [Diagram labels: Object A (normdata, property fontSize); Object B (normdata, property fontSize); property „Object B“ as footnote]
Rules:
- Properties and propertySets belong to ONE object only.
- The upper object always points to the lower object, so the lower object can exist on its own.
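The resolution on this slide, splitting the footnote into its own object that the upper object merely points to, can be sketched as follows. The class, field names and sample values are hypothetical and only illustrate the direction of the reference.

```python
from dataclasses import dataclass, field

@dataclass
class XcdlObject:
    object_id: str
    norm_data: str
    properties: dict = field(default_factory=dict)

# Lower object (B): complete on its own, knows nothing about A.
footnote = XcdlObject("B", "Footnote text", {"fontSize": 8})

# Upper object (A): carries its own properties plus a pointer to B.
body = XcdlObject("A", "Body text", {"fontSize": 12, "footnote": {"ref": "B"}})

objects = {obj.object_id: obj for obj in (body, footnote)}

def resolve(obj: XcdlObject, name: str, objects: dict):
    """Follow a reference property to the object it points to."""
    value = obj.properties[name]
    if isinstance(value, dict) and "ref" in value:
        return objects[value["ref"]]
    return value

print(resolve(body, "footnote", objects).norm_data)  # Footnote text
print(resolve(body, "fontSize", objects))            # 12
```

Because the pointer only goes downward, object B can be read or compared without object A being present.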

41 Problem: embedded objects. Example from wikipedia.de.

42 Problem: embedded objects. Example from wikipedia.de. [Diagram labels: original (container) file; text data; picture data as embedded file]

43 Problem: embedded objects. Example from wikipedia.de. [Diagram labels: original (container) file; text data; picture data as embedded file; extraction; XCDL-Object A (text data); XCDL-Object B (image data)] Object A handles object B as an „image property“.

44 Problem: embedded objects. Example from wikipedia.de. [Diagram labels: XCDL-Object A (text data); XCDL-Object B (image data); standalone image XCDL] Object A handles object B as an „image property“.
Rule: If the upper object (A) is not readable or cannot be used for comparison, the embedded object can be handled as a „standalone“ XCDL.

45 Problem: embedded objects. Example from wikipedia.de. [Diagram labels: XCDL-Object A (text data); XCDL-Object B (UNKNOWN IMAGE FORMAT); second parsing once the image format is known]
Rule: If the lower object (B) cannot be parsed, the raw data can be stored for later parsing, without data loss or comparison problems for the upper object (A).
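The fallback rule can be illustrated with a small sketch: an embedded object that no known parser recognises is kept as raw data and parsed in a second pass once a parser exists. Everything here is hypothetical, including the parser registry, the made-up `NEWF` format and the record layout.

```python
def parse_embedded(raw: bytes, parsers: dict) -> dict:
    """Try each known parser; keep the raw bytes if none recognises the data."""
    for name, (magic, parse) in parsers.items():
        if raw.startswith(magic):
            return {"format": name, "properties": parse(raw)}
    return {"format": "unknown", "rawData": raw.hex()}

def reparse(record: dict, parsers: dict) -> dict:
    """Second parsing: retry a stored raw object with an extended parser set."""
    if record["format"] != "unknown":
        return record
    return parse_embedded(bytes.fromhex(record["rawData"]), parsers)

# First pass: only PNG is known, the embedded object uses a made-up format.
parsers = {"png": (b"\x89PNG\r\n\x1a\n", lambda raw: {"size": len(raw)})}
embedded = b"NEWF" + bytes([4, 1])
object_b = parse_embedded(embedded, parsers)
print(object_b["format"])  # unknown

# Later: a parser for the format becomes available, nothing was lost.
parsers["newf"] = (b"NEWF", lambda raw: {"width": raw[4], "height": raw[5]})
print(reparse(object_b, parsers))
```

The upper object is unaffected in either case, since it only refers to object B and does not depend on B having been parsed.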

