XCL-Tools in relation to Significant characteristics in Planets Manfred Thaller Universität zu* Köln *University at not of Cologne.

What are “significant characteristics”? Those properties of a digital file which have to be known to enable the processing of the file within a specific setup.

Why extract them by software? To create technical metadata as required by organizational models for long-term preservation (NLNZ).

Within Planets, this is served by solutions to identify formats (formats registry / PRONOM / DROID) and by a solution for extracting and processing such characteristics: XCL.
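Format identification comes before characterisation: the extractor must know which format description to apply. As a rough illustration only, signature-based identification can be sketched as prefix matching against known "magic numbers". The table below is hand-written (real PNG, TIFF and PDF signatures) and is not PRONOM data; `SIGNATURES` and `identify` are hypothetical names, and DROID's actual matching engine is far more elaborate.

```python
# Toy signature-based format identification in the spirit of DROID/PRONOM.
# Hypothetical: a hand-written table of well-known magic numbers,
# not PRONOM data and not DROID's actual matching engine.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"II*\x00": "TIFF (little-endian)",
    b"MM\x00*": "TIFF (big-endian)",
    b"%PDF-": "PDF",
}


def identify(data: bytes) -> str:
    """Return the name of the first format whose signature prefixes the data."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return "unknown"


print(identify(b"II*\x00\x08\x00\x00\x00"))  # TIFF (little-endian)
print(identify(b"plain text"))               # unknown
```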

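A Vision (diagram): a Migrator converts a TIFF file into a PNG file. The Extractor, using the TIFF XCEL and the PNG XCEL, produces an XCDL description of each file; the Comparator compares the TIFF XCDL with the PNG XCDL and reports a similarity of 93%.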

A Vision (diagram, continued): the Extractor is driven by the appropriate XCELs; the Comparator by a C-Set.

Why automate? 1 million objects at one second each ≈ 16,667 minutes ≈ 278 hours ≈ 11.6 working days of a computer (running around the clock) ≈ 35 eight-hour days for a human ≈ 7 working weeks.

Why automate? 1 million objects at five minutes each ≈ 83,333 hours ≈ 10,417 eight-hour days for a human.
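The arithmetic behind these two estimates can be checked with a few lines. The sketch assumes, as the figures above do, that a computer runs 24 hours a day and that a human works 8-hour days and 5-day weeks; `workload` is a hypothetical helper name.

```python
# Back-of-envelope workload for a collection. Assumptions (not from the
# slides): a computer runs 24 h/day; a human works 8-hour days, 5 days a week.
def workload(objects: int, seconds_per_object: float) -> dict:
    hours = objects * seconds_per_object / 3600
    return {
        "minutes": hours * 60,
        "hours": hours,
        "computer_days": hours / 24,  # machine running around the clock
        "human_days": hours / 8,      # 8-hour working days
        "human_weeks": hours / 40,    # 5 working days per week
    }


for secs in (1, 300):  # one second vs. five minutes per object
    w = workload(1_000_000, secs)
    print(secs, {k: round(v, 1) for k, v in w.items()})
```

At five minutes per object, the human effort runs to roughly 2,083 working weeks, which is the point of the slide.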

Why automate? Assumption: preservation is only feasible if the content of two digital objects can be compared without human intervention, giving a numerical estimate of their degree of similarity.
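What "a numerical estimate of their degree of similarity" could mean in the simplest case: treat each content description as a set of named property values and report the share of properties that agree. This is a hypothetical toy measure (`similarity` is an illustrative name), not the Planets comparator, and it does not reproduce the 93% figure from the vision slide.

```python
from typing import Any, Mapping


def similarity(a: Mapping[str, Any], b: Mapping[str, Any]) -> float:
    """Share of properties (over the union of names) with equal values in both."""
    names = set(a) | set(b)
    if not names:
        return 1.0  # two empty descriptions are trivially identical
    matches = sum(1 for n in names if n in a and n in b and a[n] == b[n])
    return matches / len(names)


tiff = {"width": 640, "height": 429, "greyscale": 0, "compression": "none"}
png = {"width": 640, "height": 429, "greyscale": 0, "compression": "zlibDeflateInflate"}
print(f"{similarity(tiff, png):.0%}")  # 75%
```

A change of compression is expected after a TIFF to PNG migration, so a realistic comparator would need to know which properties count as significant and weight them accordingly; the unweighted ratio here only shows the shape of the output.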

Abstract solution I
(1) A language to represent the complete content of a digital object: XCDL.
(2) A language to describe any machine-readable format formally: XCEL.
(3) Software to extract the content of a file based upon a description as under (2) and express it in the language specified under (1): the “extractor”.
(4) Software to compare two such content descriptions: the “comparator”.

(Example slide: XCEL/XCDL excerpt naming image properties such as height, greyscale, imageType and compression (zlibDeflateInflate), with a height value of 429 stored as uint32.)


Are the following two items equal: VIII and 8?

And these: VIII, 8, eight, otto, acht, 8.0?

VIII, 8, eight, otto, acht, viewed through three layers:
- Information model: “what is an image”
- Format ontology: “what terms are used in formats to describe image properties”
- Extraction language: “how to get the terms describing an image out of a file”
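The equality question can be made concrete. Under a "number" information model, an ontology that maps each representation onto one abstract value makes all six tokens equal; compared as character strings (closer to an "image" view of what is on the page), they are six different things. A minimal sketch: the word table covers only the tokens on the slide, and `as_number`/`roman_to_int` are hypothetical names.

```python
# Hypothetical "ontology": maps language-specific terms onto one abstract
# value. Only the terms from the slide are covered.
NUMBER_WORDS = {"eight": 8, "otto": 8, "acht": 8}
ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(s: str) -> int:
    """Subtractive Roman numerals: a smaller digit before a larger one is negative."""
    total = 0
    for ch, nxt in zip(s, s[1:] + " "):
        v = ROMAN[ch]
        total += -v if nxt in ROMAN and ROMAN[nxt] > v else v
    return total


def as_number(token: str):
    """Interpret a token under a 'number' information model."""
    t = token.strip()
    if t.lower() in NUMBER_WORDS:
        return NUMBER_WORDS[t.lower()]
    if t and all(c in ROMAN for c in t):
        return roman_to_int(t)
    return float(t)  # "8" and "8.0" both become 8.0


tokens = ["VIII", "8", "eight", "otto", "acht", "8.0"]
print(len({as_number(t) for t in tokens}))  # 1: one value under 'number'
print(len(set(tokens)))                     # 6: six distinct strings
```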

Abstract solution II
(1) A theoretical model of information (not: data) types: “image”, “text”, “audio”...
(2) Ontologies, which map existing file format terminologies onto this model.
(3) A language, XCDL, which allows the content of files in different formats to be expressed using the vocabulary of the ontologies and the “grammar” of the information model.

XCDL: eXtensible Characterisation Definition Language. Purpose: describe the contents of a file in terms of an abstract model.

XCDL: text model (1). A text (= …) is composed of data (= …) plus interpretations of that data according to the underlying format specification (= …).

XCDL: text model (2). Or, one level of abstraction higher: a text is composed of content-carrying tokens, accompanied by rendering information, deployment information and historical information.

Example: “This is a text …” with the property fontsize = 48, of type unsignedInt8.

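The split between data and interpretations of that data can be sketched as two parts of one object: a token sequence, and properties that apply to ranges of it. This is an illustrative Python model only, not the XCDL schema; `TextObject`, `Property` and `properties_at` are hypothetical names, and the range fields are an assumption about how an interpretation is tied to its data.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Property:
    name: str    # e.g. "fontsize"
    value: int
    type: str    # e.g. "unsignedInt8"
    start: int   # first token index the interpretation applies to
    end: int     # one past the last token index


@dataclass
class TextObject:
    normdata: List[str]          # content-carrying tokens
    properties: List[Property]   # interpretations of those tokens

    def properties_at(self, index: int) -> List[Property]:
        return [p for p in self.properties if p.start <= index < p.end]


text = TextObject(
    normdata="This is a text".split(),
    properties=[Property("fontsize", 48, "unsignedInt8", 0, 4)],
)
print(text.properties_at(3))
```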

Thank you! Questions?

XC(E/D)L and related issues (originally from Sebastian Beyl)

Already known (diagram): the Extractor reads the ORIGINAL FILE using its XCEL (machine-readable format description) and writes an XCDL (normdata and properties from the original file).

Problem: propertySets and their relation to normdata
(Diagram: the XCDL holds the normdata of the original file, interpreted by property 1 and property 2. The properties are grouped into propertySet 1, propertySet 2 and propertySet 3; propertySet 1 and propertySet 2 each occur “again!” further on.)
Rules:
- Relation to normdata ONLY through a propertySet
- No overlapping relations
- Every propertySet definition (in one object) only once
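One possible reading of the three rules can be checked mechanically: a propertySet is defined once and then referenced wherever it applies; properties reach the normdata only through such references; and referenced ranges must not overlap. The data model below (`PropertySet`, `Reference`, `check`) is hypothetical and only illustrates that reading, not the XCDL schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertySet:
    id: str
    properties: frozenset  # e.g. frozenset({("fontSize", 12)})


@dataclass(frozen=True)
class Reference:
    set_id: str  # the only way to relate properties to normdata
    start: int   # normdata range [start, end)
    end: int


def check(sets: list, refs: list) -> list:
    """Return rule violations for one object (empty list = valid)."""
    errors = []
    first = {}  # definition -> id of its first occurrence
    for s in sets:
        if s.properties in first:
            errors.append(f"{s.id} duplicates {first[s.properties]}")
        else:
            first[s.properties] = s.id
    known = {s.id for s in sets}
    errors += [f"unknown set {r.set_id}" for r in refs if r.set_id not in known]
    ordered = sorted(refs, key=lambda r: r.start)
    for a, b in zip(ordered, ordered[1:]):
        if b.start < a.end:
            errors.append(f"{a.set_id}@{a.start} overlaps {b.set_id}@{b.start}")
    return errors


sets = [PropertySet("ps1", frozenset({("fontSize", 12)})),
        PropertySet("ps2", frozenset({("fontSize", 48)}))]
# ps1 is defined once but referenced "again!" later in the normdata.
refs = [Reference("ps1", 0, 10), Reference("ps2", 10, 14), Reference("ps1", 14, 30)]
print(check(sets, refs))  # []

bad_sets = sets + [PropertySet("ps3", frozenset({("fontSize", 12)}))]
bad_refs = refs + [Reference("ps2", 25, 40)]
print(check(bad_sets, bad_refs))
```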

Problem: recursive structures (footnote example from koffice.org)
(Diagram: the normdata of a text carries a property fontSize and a property footnote. The footnote has normdata of its own, which again carries a fontSize property: a property of the normdata of a property.) How do we bring this into XCDL?

Problem: recursive structures (footnote example from koffice.org), continued: the footnote is represented as property “Object B”. Object A (normdata, property fontSize) holds Object B as its footnote; Object B has its own normdata and its own property fontSize.
Rules:
- Properties and propertySets only for ONE object
- The upper object always points to the lower object, so the lower object can exist on its own
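The two rules can be illustrated with a small object tree: each object owns its own properties, and links run only from the upper object to the lower one, so the footnote object can be rendered without its parent. A minimal sketch; `XObject` and `walk` are hypothetical names, not part of the XCL tools.

```python
from dataclasses import dataclass, field


@dataclass
class XObject:
    name: str
    normdata: str
    # Each object owns its properties; a value may be a lower XObject.
    properties: dict = field(default_factory=dict)


def walk(obj: XObject, depth: int = 0) -> list:
    """Render an object tree; links only point from upper to lower objects."""
    pad = "  " * depth
    lines = [f"{pad}{obj.name}: {obj.normdata!r}"]
    for key, value in obj.properties.items():
        if isinstance(value, XObject):
            lines.append(f"{pad}  {key} -> {value.name}")
            lines += walk(value, depth + 2)
        else:
            lines.append(f"{pad}  {key} = {value}")
    return lines


footnote = XObject("Object B", "A footnote.", {"fontSize": 8})
body = XObject("Object A", "Main text", {"fontSize": 12, "footnote": footnote})
print("\n".join(walk(body)))
print("\n".join(walk(footnote)))  # B needs nothing from A
```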

Problem: embedded objects (example from wikipedia.de)
(Diagram: an original (container) file holds text data plus picture data as an embedded file. Extraction yields XCDL object A (text data) and XCDL object B (image data); object A handles object B as an “image property”.)
Rule: if the upper object (A) is not readable or cannot be used for comparison, the embedded object can be handled as a “standalone” image XCDL.

Problem: embedded objects (example from wikipedia.de), continued
(Diagram: XCDL object A (text data) embeds XCDL object B in an UNKNOWN IMAGE FORMAT; a second parsing pass follows once the image format is known.)
Rule: if the lower object (B) cannot be parsed, its raw data can be stored for later parsing, without data loss or comparison problems for the upper object (A).
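Both rules for embedded objects can be sketched together: the embedded item is a self-contained record (so it can stand alone if the container is unusable), and when no parser recognises its format, the raw bytes are kept so that a second pass can parse them later. A minimal sketch; the parser registry `PARSERS`, `Embedded`, `parse_embedded` and `reparse` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical registry: signature prefix -> parser returning properties.
PARSERS: Dict[bytes, Callable[[bytes], dict]] = {}


@dataclass
class Embedded:
    raw: bytes                          # always kept, so nothing is lost
    properties: Optional[dict] = None   # None = format not (yet) known


def parse_embedded(raw: bytes) -> Embedded:
    for magic, parser in PARSERS.items():
        if raw.startswith(magic):
            return Embedded(raw, parser(raw))
    return Embedded(raw)  # unknown format: store raw data for later


def reparse(item: Embedded) -> Embedded:
    """Second parsing pass once a parser for the format has been registered."""
    return item if item.properties is not None else parse_embedded(item.raw)


image = parse_embedded(b"\x89PNG\r\n\x1a\n" + bytes(25))
print(image.properties)  # None

PARSERS[b"\x89PNG\r\n\x1a\n"] = lambda raw: {"format": "PNG", "bytes": len(raw)}
print(reparse(image).properties)  # {'format': 'PNG', 'bytes': 33}
```

Because the upper object only holds a reference to this record, an unparsed embedded image does not block extraction or comparison of the surrounding text.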