eXtensible Characterisation Languages (XCL) Manfred Thaller (University at Cologne) DPP meeting, Glasgow, Nov. 23rd 2006

Vision:

Questions …
1. Is all information contained within oldFormat also contained within newFormat?
2. Is all information which is relevant for the usage of the information within oldFormat also contained within newFormat?
3. Is the conversion process a(oldFormat, newFormat) better than b(oldFormat, newFormat), i.e. does it preserve more of the information contained within oldFormat?
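Question 3 can be illustrated with a minimal sketch. It assumes, purely for illustration, that the information in a file has already been reduced to a flat dictionary of named properties; the property names and values below are hypothetical and the function `retained_fraction` is not part of XCL.

```python
def retained_fraction(old: dict, new: dict) -> float:
    """Fraction of the properties of `old` that reappear unchanged in `new`."""
    if not old:
        return 1.0  # nothing to lose
    kept = sum(1 for key, value in old.items() if key in new and new[key] == value)
    return kept / len(old)


# Hypothetical properties of one source file and of two conversion results.
old_format = {"width": 640, "height": 480, "colorDepth": 24, "creator": "scanner-01"}
via_a = {"width": 640, "height": 480, "colorDepth": 24}  # loses the creator
via_b = {"width": 640, "height": 480, "colorDepth": 8}   # loses creator, alters depth

score_a = retained_fraction(old_format, via_a)
score_b = retained_fraction(old_format, via_b)
print(score_a, score_b)  # 0.75 0.5
print("a preserves more" if score_a > score_b else "b preserves at least as much")
```

Under this simplification, process a would be judged better than process b because it retains three of the four properties rather than two.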

Building Block I: XCEL
A language which allows a program to read "any file specification", based on an ==> "eXtensible Characterisation Extraction Language"
Formulate the human-readable specifications of TIFF, RTF, WAV … in a language which a general-purpose program can read.
General enough that any existing format specification can be expressed in it (LaTeX, MAX, VRML …).

XCEL – Structuring Elements
[Diagram labels: range, item, subitem, item, symbol, property]

XCEL – Structuring Elements
Byte offsets: 1000, 1248
Truly binary files: most sound and image formats
Binary addressable files: PDF, Max
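A minimal sketch of addressing a structuring element by byte offsets. It assumes that the two offsets 1000 and 1248 delimit a half-open range; the slide does not say whether the end offset is inclusive. The helper `read_range` and the sample data are hypothetical.

```python
def read_range(data: bytes, start: int, end: int) -> bytes:
    """Return the bytes in the half-open range [start, end)."""
    if not 0 <= start <= end <= len(data):
        raise ValueError("range lies outside the data")
    return data[start:end]


# Hypothetical 2000-byte file whose bytes 1000..1247 hold a known payload.
data = bytes(1000) + b"\x2a" * 248 + bytes(752)
payload = read_range(data, 1000, 1248)
print(len(payload), payload[0])  # 248 42
```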

XCEL – Structuring Elements
Procedures: p(begin, trigger), q(trigger, filter, implication)
Encoded / markup files: RTF, TeX, SVG, VRML …

XCEL – Structuring Elements
Procedures: p(current_Position, ”). q(“ ”,pair(“ ”,” ”), implyBy(“ ”))
Encoded / markup files: RTF, TeX, SVG, VRML …
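The slides do not define the semantics of p and q. The following sketch only illustrates one plausible reading of p(begin, trigger), namely "read from a start position until a trigger string occurs", which is how markup files without fixed byte offsets are typically scanned. The function `read_until` and the sample string are assumptions, not XCEL syntax.

```python
def read_until(text: str, begin: int, trigger: str) -> tuple[str, int]:
    """Read from `begin` up to, but not including, the next `trigger`.

    Returns the content read and the position just past the trigger.
    """
    end = text.find(trigger, begin)
    if end == -1:
        raise ValueError("trigger not found")
    return text[begin:end], end + len(trigger)


# Hypothetical RTF fragment: bold is switched on by \b and off by \b0.
rtf = r"\b Ashes\b0 to"
content, position = read_until(rtf, 3, r"\b0")
print(content, position)  # Ashes 11
```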

Building Block II: XCDL
A language which allows a program to describe "any file content", using an ==> "eXtensible Characterisation Definition Language"
Formulate the content of any file in an abstract language which captures the complete information contained in it.
General enough that any existing content can be expressed in it.

XCDL: Basic Architecture
1. Sequences of bytes
2. With properties applicable to subsequences

XCDL: Basic Architecture
Ashes to Ashes once more
{\rtf1\ansi\ansicpg1252\deff0\deflang1031{\fonttbl{\f0\fswiss\fcharset0 Arial;}}\viewkind4\uc1\pard\f0\fs20 \b Ashes\b0 to \b Ashes\b0 once \b more\b0.\par}

XCDL: Basic Architecture
Ashes to Ashes once more.
boldFace: Ashes, Ashes, more
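The architecture of a byte sequence plus properties applying to subsequences can be sketched as a small data structure. This is an illustration only, not the XCDL schema: the class names are hypothetical, and the three boldFace spans are derived from the \b … \b0 pairs in the RTF example.

```python
from dataclasses import dataclass, field


@dataclass
class PropertySpan:
    name: str   # e.g. "boldFace"
    start: int  # first byte covered
    end: int    # one past the last byte covered (half-open range)


@dataclass
class Characterisation:
    data: bytes
    spans: list[PropertySpan] = field(default_factory=list)

    def covered(self, name: str) -> list[bytes]:
        """Return every subsequence to which the named property applies."""
        return [self.data[s.start:s.end] for s in self.spans if s.name == name]


doc = Characterisation(
    b"Ashes to Ashes once more.",
    [
        PropertySpan("boldFace", 0, 5),
        PropertySpan("boldFace", 9, 14),
        PropertySpan("boldFace", 20, 24),
    ],
)
print(doc.covered("boldFace"))  # [b'Ashes', b'Ashes', b'more']
```

Keeping the content and the properties separate means that the same description could be produced from an RTF file or from any other format carrying the same text and emphasis.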

XCDL: Basic Architecture
Assumption 1: A file format is a set of rules which formalize all knowledge needed to process the binary information contained within a distinct and complete block of binary information, traditionally called a file.

XCDL: Basic Architecture
Assumption 2: The extensible characterisation extraction language (XCEL) is designed to be able to express all such rules within a given file format. The extensible characterisation definition language (XCDL) is designed to be able to describe all the information contained within a file whose format is described by a valid XCEL description.

XCDL: Basic Architecture
Assumption 3: A specific XCEL description is not required to express all the rules within a specific file format. An XCDL derived from such a partial XCEL will, therefore, potentially also contain only part of the information of a file encoded in that format.
Even when the XCEL describes a format completely, an extractor is not required to extract all characteristics of a file. Some characteristics are only important for processing: the compression method, for example, is no longer important once decompression has succeeded.

Building Block III: Metrics
Starting in month 13. However...

Metrics: Basic Assumptions
Currently a bottom-up approach: observe characteristics occurring within files … and build name libraries from them.
{"color depth", "# of planes"} => colorDepth

Metrics: Basic Assumptions
Later a parallel top-down approach: create a file characteristics ontology … and link it to the name libraries.
"width" in image file != "width" in text file.

Metrics: Example I
Percentage of bytes in a binary stream which are preserved within a range of +/- 5 of the original. (Images: would scarcely be observable on screen.)
E.g. relevant when a colorspace appropriate for printing is transformed into a colorspace optimized for screen.
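This metric can be sketched in a few lines, assuming that both byte streams have the same length and are compared position by position. The function name and the sample values are hypothetical; the result is returned as a fraction, so multiply by 100 for a percentage.

```python
def preserved_fraction(original: bytes, migrated: bytes, tolerance: int = 5) -> float:
    """Fraction of bytes whose value stays within +/- `tolerance` of the original."""
    if len(original) != len(migrated):
        raise ValueError("streams must have the same length")
    if not original:
        return 1.0  # an empty stream is trivially preserved
    kept = sum(1 for a, b in zip(original, migrated) if abs(a - b) <= tolerance)
    return kept / len(original)


original = bytes([10, 20, 30, 200])
migrated = bytes([12, 26, 30, 195])  # differences: 2, 6, 0, 5
print(preserved_fraction(original, migrated))  # 0.75
```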

Metrics: Example II
Degree to which the font applied recreates the original typesetting characteristics. (Texts: derived metric from comparison of font metrics.)

Metrics: Problem
The problem is not so much the individual metrics but the summation rules.
An image migration step preserves 98 % of the image bytes within +/- 1 %. It also preserves 5 of 20 (= 25 %) boolean properties (creator, scanning equipment …).
Quality of the migration: $(0.98 + 0.25) / 2 = 0.615$?

Metrics: Problem
Possible solution: weight engineering metrics by "arbitrary" weights derived from PP.
An image migration step preserves 98 % of the image bytes within +/- 1 %. It also preserves 5 of 20 (= 25 %) boolean properties (creator, scanning equipment …).
Quality of the migration: $(0.98 \cdot w_1 + 0.25 \cdot w_2) / 2$
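The weighted summation can be sketched as follows. The division by the number of metrics follows the formula on the slide; the function name is hypothetical and the weight values are placeholders, since the slide leaves them open.

```python
def migration_quality(scores: list[float], weights: list[float]) -> float:
    """Sum of score * weight, divided by the number of metrics (as on the slide)."""
    if not scores or len(scores) != len(weights):
        raise ValueError("need one weight per score and at least one score")
    return sum(s * w for s, w in zip(scores, weights)) / len(scores)


scores = [0.98, 0.25]  # image bytes preserved, boolean properties preserved

# With both weights set to 1 this reduces to the unweighted mean.
unweighted = migration_quality(scores, [1.0, 1.0])
print(round(unweighted, 3))  # 0.615

# Placeholder weights that value the image bytes more than the metadata.
weighted = migration_quality(scores, [1.5, 0.5])
print(round(weighted, 4))  # 0.7975
```

Whether to divide by the number of metrics or by the sum of the weights is itself a summation rule that would have to be settled.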

Thank you!