Analysing the Impact of File Formats on Data Integrity
Volker Heydegger, University of Cologne
Archiving 2008, Bern, 23rd – 27th June 2008


Overview
Volker Heydegger | Archiving 2008 | 25th June 2008 | Bern, Switzerland
- Introduction
- File format data and information loss
  - What happens if data is corrupted in files?
  - Categories of file format data
- Measuring information loss
  - Robustness indicators
  - Study results for different file formats

Background
- EU-funded project “Planets”
  - characterisation of file format content
- University of Cologne, Computer Science for the Humanities (Historisch-Kulturwissenschaftliche Informationsverarbeitung, HKI)
  - Planets partner

Context
Long-term preservation of digital information: which file format to choose?
Criteria, e.g.:
- Open standard
- Spread of usage
- Hardware/software dependencies
- Authenticity
- …
- Robustness

Robustness ::= error resilience of file formats against bit-stream corruption

Issues / Research topics
- Is there any correlation between file format and data integrity?
- If so, are there any differences among file formats concerning the degree of robustness?
- Which file-format-based factors are responsible for varying degrees of robustness?
- How can we improve the robustness of file formats?

Benefits
- Digital preservation: decision support for choosing a file format for long-term preservation
- Contribution to file format research
- Improvement of existing file formats
- Design of future file formats

File Format Data and Information Loss
What is “file format” in our context?
- Set of rules constituting the logical organisation of data
- Set of rules indicating how to interpret data
- Set of rules -> file format specification
File format data ::= binary data, formatted according to the rules of a file format

What happens if data is corrupted in files?
Test image: TIF, greyscale, 32x32 pixels, 8 bits per pixel

First 224 bytes of the test file [figure: hex dump; callout: FF]

Plain information loss: 1 byte of data == 1 pixel
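The plain case can be reproduced with a small sketch. It assumes an uncompressed 8-bit greyscale payload in which each byte encodes exactly one pixel; the pixel values, the offset and the helper names `corrupt_byte` and `changed_pixels` are hypothetical and not taken from the slides.

```python
# Hypothetical stand-in for the 32x32, 8-bit greyscale test image:
# an uncompressed payload in which each byte encodes exactly one pixel.
WIDTH, HEIGHT = 32, 32
original = bytes((x * 8) % 256 for _ in range(HEIGHT) for x in range(WIDTH))


def corrupt_byte(data: bytes, offset: int, new_value: int) -> bytes:
    """Return a copy of data with the byte at offset overwritten."""
    damaged = bytearray(data)
    damaged[offset] = new_value
    return bytes(damaged)


def changed_pixels(before: bytes, after: bytes) -> int:
    """Count pixel positions whose value differs (one byte per pixel)."""
    return sum(a != b for a, b in zip(before, after))


# Overwrite one payload byte with 0xFF; the offset is an arbitrary choice.
damaged = corrupt_byte(original, offset=100, new_value=0xFF)
print(changed_pixels(original, damaged))  # one damaged byte, one damaged pixel
```

Because the payload is stored without compression, the damage stays local: no other pixel depends on the overwritten byte.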

Part of the TIF Image File Directory, tag: Photometric Interpretation [figure: hex dump; callout: 00]

Conditional information loss: 1 bit changed == 100% of the information changed
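A minimal sketch of the conditional case follows. It relies on the TIFF convention that PhotometricInterpretation 0 means WhiteIsZero and 1 means BlackIsZero; the payload, the `render` helper and the choice of which value is the uncorrupted one are assumptions, since the slide does not state them.

```python
WHITE_IS_ZERO, BLACK_IS_ZERO = 0, 1  # TIFF PhotometricInterpretation values


def render(samples: bytes, photometric: int) -> list[int]:
    """Map stored samples to displayed brightness (0 = black, 255 = white)."""
    if photometric == BLACK_IS_ZERO:
        return list(samples)
    return [255 - s for s in samples]  # WhiteIsZero inverts the grey scale


# Hypothetical payload of 1024 samples; the payload itself is never touched.
samples = bytes(range(256)) * 4

before = render(samples, BLACK_IS_ZERO)
after = render(samples, BLACK_IS_ZERO ^ 1)  # one flipped bit in the tag value

changed = sum(a != b for a, b in zip(before, after))
print(f"{changed} of {len(samples)} pixels changed")
```

Every pixel changes because 255 - s never equals s for an integer sample, even though not a single payload byte was damaged. This is why technical data and payload data are worth distinguishing when measuring loss.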

Categories of File Format Data
Technical data (data for processing):
- Image width: 277
- Image length: 339
- Compression: uncompressed

“Payload” data (basic data of usage):
- Pixel data, starting from byte #0x008

Robustness Indicators
(1) $R_B = \Delta(b_0, b_1) / m$
where
i. $b_0$ is the basic data of usage before being corrupted,
ii. $b_1$ is the basic data of usage after being corrupted,
iii. $m$ is the number of corruption procedures.
$R_B$ indicates an average information loss: $\Delta(b_0, b_1)$ is the amount of changed basic data of usage, totalled over all $m$ corruption procedures.

Example
A file X has 2000 bytes of payload data. Suppose the file is corrupted 3 times and the number of bytes changed in each corruption procedure is:
1. $\Delta(b_0, b_1) = 200$ bytes
2. $\Delta(b_0, b_1) = 150$ bytes
3. $\Delta(b_0, b_1) = 250$ bytes
The average information loss for file X based on 3 corruption procedures is then $R_B = 600 / 3 = 200$ (bytes).
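The arithmetic of this example can be checked with a few lines of Python. The function name `robustness_rb` is a hypothetical label for indicator (1), read here as the per-procedure changes summed and divided by the number of procedures.

```python
def robustness_rb(changed_per_procedure: list[int]) -> float:
    """Indicator (1): changed payload bytes, averaged over all corruption procedures."""
    if not changed_per_procedure:
        raise ValueError("at least one corruption procedure is required")
    return sum(changed_per_procedure) / len(changed_per_procedure)


# The three corruption procedures applied to file X.
rb = robustness_rb([200, 150, 250])
print(rb)  # 600 / 3
```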

$R_B$ related to the total amount of payload data:
(2) $R_{Bt} = R_B / n$
where $n$ is the total amount of basic data of usage (payload data).
(3) $R_{Bt} = R_B / n \cdot 100$ ($R_{Bt}$ expressed as a percentage)
Interpretation: $R_{Bt} = 0\,\%$: max. robustness (min. information loss)

Example (continued)
(2) $R_{Bt} = 200 / 2000 = 0.1$
(3) $R_{Bt} = 200 / 2000 \cdot 100 = 10$ (%)
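As a sketch, the normalisation in (2) and (3) is a single division. The function name `robustness_rbt` and its `as_percent` switch are hypothetical conveniences, not notation from the slides.

```python
def robustness_rbt(rb: float, payload_size: int, as_percent: bool = False) -> float:
    """Indicators (2) and (3): average loss relative to the total payload size."""
    if payload_size <= 0:
        raise ValueError("payload size must be positive")
    if as_percent:
        return rb * 100 / payload_size  # (3), multiplied first to limit rounding
    return rb / payload_size            # (2)


# File X: average loss of 200 bytes out of 2000 bytes of payload data.
print(robustness_rbt(200, 2000))                   # fraction of payload lost
print(robustness_rbt(200, 2000, as_percent=True))  # the same value in percent
```

A value of 0 % means the payload survived every corruption procedure unchanged; 100 % means nothing usable was left.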

Study on Robustness for various file formats: Example Results
- TIF
  - uncompressed
  - LZW
  - JPEG (2 different compression levels)
  - ZIP
- PNG (filtered, unfiltered)
- JPEG2000 (lossless, lossy)
- BMP (uncompressed)

Study on Robustness for various file formats: Example Results
Method
- simulation of file corruption: every file is corrupted up to 3000 times (3000 corruption procedures)
- applying 3-5 different corruption ratios:
  - less than 0.01%
  - 0.01%
  - 0.1%
  - 1.0%
  - more than 1.0%
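One way such a corruption procedure could be simulated is sketched below. The slides do not say how positions are chosen or whether damage is applied per bit or per byte, so the byte-level granularity, the uniform random positions and the `corrupt` helper are all assumptions.

```python
import random


def corrupt(data: bytes, ratio: float, rng: random.Random) -> bytes:
    """Damage roughly ratio * len(data) randomly chosen bytes (at least one)."""
    count = max(1, round(len(data) * ratio))
    damaged = bytearray(data)
    for offset in rng.sample(range(len(data)), count):
        # XOR with a non-zero value guarantees that the byte really changes.
        damaged[offset] ^= rng.randrange(1, 256)
    return bytes(damaged)


# Corruption ratios from the slide, written as fractions: 0.01%, 0.1%, 1.0%.
RATIOS = [0.0001, 0.001, 0.01]

file_bytes = bytes(10_000)  # hypothetical 10,000-byte file
rng = random.Random(0)      # fixed seed so that runs are repeatable
for ratio in RATIOS:
    damaged = corrupt(file_bytes, ratio, rng)
    changed = sum(a != b for a, b in zip(file_bytes, damaged))
    print(f"ratio {ratio:.2%}: {changed} bytes changed")
```

Repeating the call with fresh random positions, up to 3000 times per file, would give the series of corruption procedures that the indicators average over.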

Method
- compressed payload data is decompressed
- the original payload data and the corrupted payload data are compared
- the Robustness Indicator values are computed
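The three steps can be sketched end to end. This is not the study's tooling: `zlib` stands in for a compressed format, one byte is damaged per procedure, and a stream the decoder rejects is counted as total loss. All of these, like the helper names, are assumptions made for illustration.

```python
import random
import zlib
from typing import Callable


def count_changed(before: bytes, after: bytes) -> int:
    """Differing positions plus any bytes missing from (or added to) the result."""
    differing = sum(a != b for a, b in zip(before, after))
    return differing + abs(len(before) - len(after))


def average_loss_percent(
    payload: bytes,
    encode: Callable[[bytes], bytes],
    decode: Callable[[bytes], bytes],
    runs: int = 50,
    seed: int = 0,
) -> float:
    """Corrupt the stored form `runs` times and return the average loss in percent."""
    rng = random.Random(seed)
    stored = encode(payload)
    total_changed = 0
    for _ in range(runs):
        damaged = bytearray(stored)
        damaged[rng.randrange(len(damaged))] ^= rng.randrange(1, 256)
        try:
            recovered = decode(bytes(damaged))  # step 1: decompress
        except zlib.error:
            recovered = b""  # assumption: an unreadable stream is a total loss
        # Step 2: compare with the original payload, capped at 100 % per run.
        total_changed += min(count_changed(payload, recovered), len(payload))
    # Step 3: average over all runs, relative to the payload size.
    return total_changed / runs / len(payload) * 100


payload = bytes(range(256)) * 4  # hypothetical 1024-byte payload
raw_loss = average_loss_percent(payload, bytes, bytes)
zip_loss = average_loss_percent(payload, zlib.compress, zlib.decompress)
print(f"uncompressed: {raw_loss:.2f}%  zlib: {zip_loss:.2f}%")
```

With a strict decoder like this one, a single damaged byte in the compressed stream usually makes the whole payload unreadable, whereas the uncompressed payload loses only the damaged byte. A more forgiving decoder that salvages partial data would report lower values.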

Example: JP2-formatted image, corruption of 1 byte, “bad case”

Example: JP2-formatted image, corruption of 1 byte, “good case”

Example: JP2-formatted image, corruption of 1 byte, “good case” with visualized differences in pixel data

Thank you very much!
Volker Heydegger, University of Cologne
Archiving 2008, Bern, 23rd – 27th June 2008