Mastering File Compression Part #1

Presentation transcript:

Mastering File Compression Part #1: Theory and Context (www.teachingcomputing.com)
Topics: what file compression is; lossy and lossless compression; algorithms for data compression.
*Please note: If you are delivering the OCR GCSE, this guide is for teachers only. Please do not share pseudocode or solutions with your students.

First, see if you can solve this puzzle! What does this say? 5,2,4,3,1,6
Index of words (refer to this list): 1. Such 2. Computer 3. Is 4. Science 5. Teaching 6. Fun
Answer: Teaching Computer Science is such fun
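The puzzle can be sketched in a few lines of Python. This is only an illustration of the idea on the slide; the names `dictionary`, `codes` and `decode` are made up for this example.

```python
# Word list from the slide, numbered 1 to 6 in the order shown.
dictionary = ["Such", "Computer", "Is", "Science", "Teaching", "Fun"]

# The sequence of numbers from the puzzle.
codes = [5, 2, 4, 3, 1, 6]

def decode(codes, dictionary):
    """Replace each 1-based code with the word stored at that position."""
    return " ".join(dictionary[code - 1] for code in codes)

message = decode(codes, dictionary)
print(message)
```

The codes are 1-based because the slide numbers its words from 1, so the sketch subtracts 1 before indexing the Python list.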

Compare the space taken up by both: 5,2,4,3,1,6 VS "Teaching Computer Science is such fun" (with the word list Such, Computer, Is, Science, Teaching, Fun stored once). At first glance you may not see the benefit of this sort of ‘compression’ (storing words as a sequence of numbers), but imagine an entire book with the same sentence repeated, and you’ll appreciate where we’re going with this!

Space is precious! Data compression is all about finding a way to save space. There are 3 main principles:
1. Find repeating patterns in a file.
2. Create a dictionary of repeating patterns.
3. Replace these patterns with a reference to a dictionary entry.
The reference 5,2,4,3,1,6 plus the dictionary (the list that we refer to for compression and extraction) VS the full text: the slide repeats the sentence "Teaching Computer Science is such fun." many times over to show how much space the uncompressed version takes up.
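A minimal sketch of the three principles, assuming that a "pattern" is simply a whole word and that the sentence is repeated 40 times (the repeat count and all function names here are assumptions for illustration, not taken from the slides). Unlike the puzzle, this sketch numbers words in order of first appearance.

```python
def compress(text):
    """Return (dictionary, codes): each distinct word is stored only once."""
    dictionary = []   # distinct words, in order of first appearance
    positions = {}    # word -> 1-based position in the dictionary
    codes = []        # the text rewritten as dictionary references
    for word in text.split():
        if word not in positions:
            dictionary.append(word)
            positions[word] = len(dictionary)
        codes.append(positions[word])
    return dictionary, codes

def decompress(dictionary, codes):
    """Rebuild the text by looking each reference up in the dictionary."""
    return " ".join(dictionary[code - 1] for code in codes)

sentence = "Teaching Computer Science is such fun."
text = " ".join([sentence] * 40)   # assumed repeat count

dictionary, codes = compress(text)
print(len(text), "characters in the original text")
print(len(dictionary), "dictionary entries and", len(codes), "references")
```

Each reference is a small number rather than a whole word, which is where the saving comes from once the same words appear again and again.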

So what is File Compression? It’s highly likely that you’ve come across file compression in some form or other. Have you ever tried to ‘compress’ a JPEG image? (Compression, in this instance, simply means reducing the file size, which has obvious advantages.) Note the picture on the right: the level of detail decreases, but the file size is also reduced.

Lossy vs Lossless: LOSSY
These are two very peculiar words, I know. They have very simple meanings, however, and it is useful to know them before moving on with compression. In lossy compression the exact sequence of data is not retained after compression, so when the file is recreated it is not identical to the original. A standard such as JPEG is called "lossy" because a picture can be saved into smaller and smaller files, but each time the image is degraded: the structure is still visible while the details are lost.
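To see what "not identical to the original" means, here is a toy sketch that rounds each value to a coarser scale. It is not how JPEG works; it only illustrates that lossy compression keeps the overall structure while discarding detail. The sample values, the step size of 16 and the function names are all assumptions for this example.

```python
def lossy_compress(samples, step=16):
    """Keep only which 'bucket' of width `step` each value falls into."""
    return [value // step for value in samples]

def lossy_restore(codes, step=16):
    """Rebuild an approximation using the middle of each bucket."""
    return [code * step + step // 2 for code in codes]

original = [12, 13, 15, 200, 203, 198, 90, 91]   # e.g. pixel brightness values
compressed = lossy_compress(original)
restored = lossy_restore(compressed)

print("original:  ", original)
print("compressed:", compressed)
print("restored:  ", restored)
```

The compressed list uses smaller numbers with more repetition, but the restored values are only close to the originals, never guaranteed to match them.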

Lossy vs Lossless: LOSSLESS
In lossless data compression the algorithms allow the original data to be perfectly reconstructed from the compressed data. With lossless compression, every bit of data that was originally in the file remains after the file is uncompressed. All information is restored. This is generally the technique of choice for text or spreadsheet files, where losing words or financial data could pose a problem. The Graphics Interchange Format (GIF) is an image format used on the Web that provides lossless compression.

Quick recap: fill in the blanks! (LOSSY vs LOSSLESS)
Lossless and lossy compression are terms that describe whether or not, in the compression of a file, all original data can be recovered when the file is uncompressed. With lossless compression, every single bit of data that was originally in the file remains after the file is uncompressed. All of the information is completely restored. On the other hand, lossy compression reduces a file by permanently eliminating certain information, especially redundant information. When the file is uncompressed, only a part of the original information is still there (although the user may not notice it).

LOSSY vs LOSSLESS: animation to illustrate the difference (original, compressed, restored).

Lossy compression: Uses
Lossy compression is generally used for video and sound, where a certain amount of information loss will not be detected by most users. The JPEG image file, commonly used for photographs and other complex still images on the Web, uses lossy compression. With JPEG compression, the creator can decide how much loss to introduce and make a trade-off between file size and image quality.

Lossless compression: Uses
Lossless data compression is used in many applications. For example, it is used in the ZIP file format and in the GNU tool gzip. Lossless compression is used in cases where it is important that the original and the decompressed data be identical. Typical examples are executable programs, text documents, and source code. Some image file formats, like PNG or GIF, use only lossless compression, while others like TIFF and MNG may use either lossless or lossy methods. Lossless audio formats are most often used for archiving or production purposes, while smaller lossy audio files are typically used on portable players and in other cases where storage space is limited or exact replication of the audio is unnecessary. Source of image: www.stoimen.com
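A short sketch of a lossless round trip, using Python's standard `gzip` module (which reads and writes the same format as the GNU gzip tool). The sample text and the repeat count of 40 are assumptions chosen for illustration.

```python
import gzip

# Highly repetitive text compresses very well.
text = ("Teaching Computer Science is such fun. " * 40).encode("utf-8")

compressed = gzip.compress(text)
restored = gzip.decompress(compressed)

print("original size:  ", len(text), "bytes")
print("compressed size:", len(compressed), "bytes")
print("identical after decompression:", restored == text)
```

The key point is the last line: with lossless compression the restored data matches the original byte for byte.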

If you’re interested: The Hutter Prize
If you get really into this you may want to have a look at the Hutter Prize: https://en.wikipedia.org/wiki/Hutter_Prize
The goal of the Hutter Prize is to encourage research in artificial intelligence (AI). The organizers further believe that compressing natural language text is a hard AI problem, equivalent to passing the Turing test. Thus, progress toward one goal represents progress toward the other. They argue that predicting which characters are most likely to occur next in a text sequence requires vast real-world knowledge. A text compressor must solve the same problem in order to assign the shortest codes to the most likely text sequences.
The Hutter Prize is a cash prize funded by Marcus Hutter which rewards data compression improvements on a specific 100 MB English text file. Specifically, the prize awards 500 euros for each one percent improvement (with 50,000 euros total funding) in the compressed size of the file enwik8, which is the smaller of two files used in the Large Text Compression Benchmark; enwik8 is the first 100,000,000 characters of a specific version of English Wikipedia. The ongoing competition is organized by Hutter, Matt Mahoney, and Jim Bowery.

So, we’ve established that compression is a pretty important thing to be able to do in Computer Science. There are prizes out there for effective algorithms and for being able to code compression programs. BUT HOW DO WE GO ABOUT CODING A COMPRESSION PROGRAM?!

Coding a compression program
The JPEG standards are mathematically rather complex, but all compression uses certain underlying basic principles. You can understand these basic principles by looking at how a text file would appear if techniques logically similar to those of lossy file compression are applied to it. Let’s explore what this would look like and how it works …
Image source: http://www.ntchosting.com/web_hosting_images/jpeg-example.jpg

A compression program
Analyse the text on the right carefully. It is from a very famous wartime speech by Winston Churchill. You’ll notice he likes to repeat things! Can you spot some words that are repeated? (We, shall, the, fight.) If we needed to create a DICTIONARY, it would be important to use only the words that are repeated often enough to make it worthwhile.

Creating a Dictionary
To make a long story short, upon analysis, we would be able to add just three phrases to the dictionary: "We shall", "Fight" and "On the".

Note: we are looking for repetitions in the text. Applying these rules would result in considerable savings, as there are many repetitions of certain phrases.
Table columns (rows not captured in this transcript): Code | Repeated Phrase | Count | Size | Notes
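A table like this can be built by counting how often each candidate phrase occurs. The sketch below assumes a short excerpt from Churchill's speech; it is not the exact text used on the slide, so the counts will differ from the slide's figures. The numeric codes and the function name `phrase_table` are also made up for illustration.

```python
# Assumed excerpt, not the slide's exact text.
excerpt = (
    "we shall fight on the beaches, we shall fight on the landing grounds, "
    "we shall fight in the fields and in the streets, "
    "we shall fight in the hills; we shall never surrender"
)

phrases = ["we shall", "fight", "on the"]

def phrase_table(text, phrases):
    """Return one row per phrase: (code, phrase, count, size in characters)."""
    rows = []
    for code, phrase in enumerate(phrases, start=1):
        count = text.lower().count(phrase.lower())
        rows.append((code, phrase, count, len(phrase)))
    return rows

table = phrase_table(excerpt, phrases)
for code, phrase, count, size in table:
    print(f"{code}  {phrase:<10} count={count}  size={size}")
```

A phrase is only worth adding to the dictionary if its count is high enough that the characters saved outweigh the cost of storing the phrase once plus its code.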


Compression and restoration
The file that is output after compression consists of the reduced file with the tags inserted, plus the dictionary of phrases and the codes that now represent them. The figures shown in brackets are the equivalents for the Complex Lossless compression. In the example we just looked at, we could have compressed it as follows. The original file consisted of 391 characters, of which 259 have been taken out, leaving 132 characters. Then we created the dictionary, which had 12 entries (and hence 12 code tags) and 114 characters for the phrases, giving a total for the dictionary of 126 characters. The slide gives the whole file as 285 characters long; this is 27 more than 132 + 126 = 258, a difference that presumably covers the code tags inserted into the reduced text, although the slide does not itemise them. Compared with the original file of 391 characters, 285 characters is 73% of the original size.
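The arithmetic can be checked with a few lines of Python. The figures are the ones stated on the slide; the variable names are made up for this sketch.

```python
original_size = 391    # characters in the original file
removed = 259          # characters taken out during compression
remaining = original_size - removed

dictionary_entries = 12        # one code tag per entry
phrase_characters = 114        # characters used by the phrases themselves
dictionary_size = dictionary_entries + phrase_characters

compressed_size = 285          # whole-file size as stated on the slide
percent_of_original = round(compressed_size / original_size * 100)

print("remaining text:", remaining, "characters")
print("dictionary:", dictionary_size, "characters")
print("compressed file is", percent_of_original, "% of the original size")
```

The same division works for any pair of sizes, so it is a handy way to compare how well different files compress.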