CSCI 3 Chapter 1.8 Data Compression

Chapter 1.8 Data Compression  For the purpose of storing or transferring data, it is often helpful to reduce the size of the data involved.  Techniques for accomplishing this are called data compression.

Generic Data Compression Techniques  Run-length encoding  Relative encoding  Frequency-dependent encoding  Adaptive dictionary encoding

Run-length encoding  When the data being compressed consist of long sequences of the same value, the run- length encoding produces its best results.  Run Length Encoding (RLE) is a simple and popular data compression algorithm. It is based on the idea to replace a long sequence of the same symbol by a shorter sequence and is a good introduction into the data compression field for newcomers.

Run-length encoding  It replaces the consist of long sequences of the same value with a code indicating the value that is repeated and the number of times it occurs in the sequence.

Run-length encoding  Example: abcdeeeeeeeeeefghi  And noticing that the letter “e” is repeated 10 times in a row. RLE compression would look at this and say "there are 4 non-repeating bytes (abcd) followed by 10 'e' characters which are then followed by 4 non-repeating bytes (fghi)".

Run-length encoding Example of run-length encoding. Each run of zeros is replaced by two characters in the compressed file: a zero to indicate that compression is occurring, followed by the number of zeros in the run
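
The idea above can be sketched in a few lines of Python. This is a character-level illustration (not the zero-only scheme of the figure), and the function names are our own:

```python
# A minimal character-level RLE sketch: each run of a repeated symbol
# becomes a (symbol, count) pair.

def rle_encode(data):
    """Replace each run of a repeated symbol with a (symbol, count) pair."""
    runs = []
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1  # extend j to the end of the current run
        runs.append((data[i], j - i))
        i = j
    return runs

def rle_decode(runs):
    """Rebuild the original string from the (symbol, count) pairs."""
    return "".join(symbol * count for symbol, count in runs)

encoded = rle_encode("abcdeeeeeeeeeefghi")
print(encoded)  # the run of ten 'e's collapses to ('e', 10)
print(rle_decode(encoded) == "abcdeeeeeeeeeefghi")  # True
```

Note that runs of length 1 actually get *longer* under this scheme, which is why RLE pays off only on data with long repeated runs.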

Relative encoding  Its approach is to record the differences between consecutive data blocks rather than entire blocks.  Each block is encoded in terms of its relationship to the previous block.
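
A minimal sketch of relative (delta) encoding in Python, using a hypothetical list of sensor readings: store the first value, then only each value's difference from its predecessor:

```python
# Relative (delta) encoding sketch: successive values that change slowly
# produce small differences, which can then be stored in fewer bits.

def delta_encode(values):
    """Keep the first value; replace the rest with successive differences."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas):
    """Rebuild the original values by accumulating the differences."""
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out

samples = [100, 101, 103, 103, 102]   # e.g. successive sensor readings
print(delta_encode(samples))          # [100, 1, 2, 0, -1]
print(delta_decode(delta_encode(samples)) == samples)  # True
```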

Frequency-dependent encoding  In English language the letters e, t, a, and I are used more frequently than the letters z, q, and x.  So, when constructing a code for text in the English language, space can be saved by using short bit patterns to represent the former letters and longer bit patters to represent the latter ones.

Frequency-dependent encoding  The result would be a code the English text would have shorter representations than would be obtained with uniform-length codes.  Example: Huffman code.  This method is named after D.A. Huffman, who developed the procedure in the 1950s.

Frequency-dependent encoding  Huffman codes.  It is the most frequency-dependent codes in use today are Huffman codes.

Huffman codes  The following fig. shows a histogram of the byte values from a large ASCII file. More than 96% of this file consists of only 31 characters: the lower case letters, the space, the comma, the period, and the carriage return. This observation can be used to make an appropriate compression scheme for this file.

Huffman codes

 Assign few bits (one or two) to the characters that occur most often.  Infrequently used characters, such as #, $, and %, may require a dozen or more bits.  In mathematical terms, the optimal situation is reached when the number of bits used for each character is proportional to the logarithm of the inverse of the character's probability of occurrence.

Huffman codes  Huffman encoding. The encoding table assigns each of the seven letters used in this example, based on its probability of occurrence.  The original data stream composed of these 7 characters is translated by this table into the Huffman encoded data.  Since each of the Huffman codes is a different length, the binary data need to be regrouped into standard 8 bit bytes for storage and transmission.

Huffman codes

Compression  IMAGE  GIF (“Jiff”)  JPEG (“JAY-peg”)  Audio & Video  MPEG  MP3

1.9 Communication Errors  Parity Bits  A simple method is based on giving each legal bit pattern either an odd or an even number of 1s.  An encoding system in which each pattern contains an odd number of 1s uses odd parity; one in which each pattern contains an even number of 1s uses even parity.

1.9 Communication Errors  Parity Bits  Odd parity