Chapter 3 Data Representation

2 Compressing Files

3 Data Compression It is important to find ways to store and transmit data efficiently, which leads computer scientists to look for ways to compress it. Data compression is a reduction in the amount of space needed to store a piece of data. The compression ratio is the size of the compressed data divided by the size of the original data.
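
For instance, the ratio can be computed directly from the two sizes. This is a small illustrative sketch in Python, not part of the original slides; the sizes used are the ones from the keyword-encoding example later in the chapter.

    # Compression ratio = size of compressed data / size of original data.
    # A value below 1.0 means the compressed version is smaller than the original.
    def compression_ratio(original_size, compressed_size):
        return compressed_size / original_size

    print(round(compression_ratio(349, 314), 2))   # 0.9, the keyword-encoding example below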

4 Data Compression A data compression technique can be:
- lossless, which means the data can be retrieved without any loss of the original information, or
- lossy, which means some information may be lost in the process of compaction.
As examples, consider these 3 techniques:
- keyword encoding
- run-length encoding
- Huffman encoding

5 Keyword Encoding Frequently used words are replaced with a single character. For example, in the paragraph that follows, ^ stands for as, ~ for the, + for and, & for must, % for well, and # for those. Note that the characters used to encode cannot be part of the original text.

6 Keyword Encoding Consider the following paragraph, The human body is composed of many independent systems, such as the circulatory system, the respiratory system, and the reproductive system. Not only must all systems work independently, they must interact and cooperate as well. Overall health is a function of the well-being of separate systems, as well as how these separate systems work in concert.

7 Keyword Encoding This version highlights the words that can be replaced. The human body is composed of many independent systems, such as the circulatory system, the respiratory system, and the reproductive system. Not only must each system work independently, they must interact and cooperate as well. Overall health is a function of the well-being of separate systems, as well as how those separate systems work in concert.

8 Keyword Encoding This is the encoded paragraph: The human body is composed of many independent systems, such ^ ~ circulatory system, ~ respiratory system, + ~ reproductive system. Not only & each system work independently, they & interact + cooperate ^ %. Overall health is a function of ~ %-being of separate systems, ^ % ^ how # separate systems work in concert.

9 Keyword Encoding There are a total of 349 characters in the original paragraph including spaces and punctuation. The encoded paragraph contains 314 characters, resulting in a savings of 35 characters. The compression ratio for this example is 314/349 or approximately 0.9.
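
The substitution itself is easy to sketch in code. The following Python sketch mirrors the symbols visible in the encoded paragraph (as → ^, the → ~, and → +, must → &, well → %, those → #); it is only an illustration, and a real implementation would also have to handle punctuation and capitalization.

    # Keyword encoding: replace frequently used words with single characters
    # that do not otherwise appear in the text.
    substitutions = {"as": "^", "the": "~", "and": "+",
                     "must": "&", "well": "%", "those": "#"}

    def keyword_encode(text):
        return " ".join(substitutions.get(word, word) for word in text.split(" "))

    def keyword_decode(encoded):
        reverse = {symbol: word for word, symbol in substitutions.items()}
        return " ".join(reverse.get(token, token) for token in encoded.split(" "))

    phrase = "they must interact and cooperate as well"
    print(keyword_encode(phrase))                              # they & interact + cooperate ^ %
    print(keyword_decode(keyword_encode(phrase)) == phrase)    # True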

10 Keyword Encoding A compression ratio of 0.9 (90%) is NOT very good. The compressed file is 90% of the size of the original. However, there are several ways this can be improved. Can you think of some?

11 Run-Length Encoding A single character may be repeated over and over again in a long sequence. This type of repetition doesn’t generally take place in English text, but often occurs in large data streams. In run-length encoding, a sequence of repeated characters is replaced by:
- a flag character,
- followed by the repeated character,
- followed by a single digit that indicates how many times the character is repeated.

12 Run-Length Encoding Some examples: AAAAAAA would be encoded as *A7. The encoded string
*n5*x9ccc*h6 some other text *k8eee
can be decoded into the following original text:
nnnnnxxxxxxxxxccchhhhhh some other text kkkkkkkkeee
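
A Python sketch of this scheme follows. It assumes * as the flag character and leaves runs of three characters or fewer alone, since replacing them with a three-character code would not make them any shorter.

    FLAG = "*"

    def rle_encode(text):
        # Replace each run of 4 to 9 identical characters with
        # flag + character + single-digit count.
        pieces, i = [], 0
        while i < len(text):
            run = 1
            while i + run < len(text) and text[i + run] == text[i] and run < 9:
                run += 1
            pieces.append(FLAG + text[i] + str(run) if run > 3 else text[i] * run)
            i += run
        return "".join(pieces)

    def rle_decode(encoded):
        pieces, i = [], 0
        while i < len(encoded):
            if encoded[i] == FLAG:
                pieces.append(encoded[i + 1] * int(encoded[i + 2]))
                i += 3
            else:
                pieces.append(encoded[i])
                i += 1
        return "".join(pieces)

    print(rle_encode("AAAAAAA"))                              # *A7
    print(rle_decode("*n5*x9ccc*h6 some other text *k8eee"))
    # nnnnnxxxxxxxxxccchhhhhh some other text kkkkkkkkeee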

13 Run-Length Encoding In the second example, the original text contains 51 characters, and the encoded string contains 35 characters, giving us a compression ratio of 35/51, or approximately 0.69. Since we are using one character for the repetition count, it seems that we can’t encode repetition lengths greater than nine. However, instead of interpreting the count character as an ASCII digit, we could interpret it as a binary number.
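
For example, here is a small sketch under the assumption that the compressed output is stored as raw bytes rather than printable text; a single count byte can then describe a run of up to 255 characters.

    # Store the count as one raw byte (0-255) instead of an ASCII digit.
    run_char, run_length = "n", 200
    encoded = b"*" + run_char.encode("ascii") + bytes([run_length])
    print(encoded)                               # b'*n\xc8'
    print(encoded[2])                            # 200
    print(encoded[2] * run_char == "n" * 200)    # True: the original run is recovered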

14 Huffman Encoding Why should the blank, which is used very frequently, take up the same number of bits as the character “X”, which is seldom used in text? Huffman codes use variable-length bit strings to represent each character. A few characters may be represented by five bits, and another few by six bits, and yet another few by seven bits, and so forth.

15 Huffman Encoding If we use only a few bits to represent characters that appear often and reserve longer bit strings for characters that don’t appear often, the overall size of the document being represented will be smaller.

16 Huffman Encoding An example of a Huffman alphabet

17 Huffman Encoding Using the alphabet above, DOORBELL would be encoded in binary with variable-length codes. If we used a fixed-size bit string to represent each character (say, 8 bits), then the binary form of the original string would be 64 bits. The Huffman encoding for that string is 25 bits long, giving a compression ratio of 25/64, or approximately 0.39. An important characteristic of any Huffman encoding is that no bit string used to represent a character is the prefix of any other bit string used to represent a character.
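
A Python sketch of the construction follows. It builds a code from the characters of the input string itself, so the exact codes and the resulting length (20 bits for DOORBELL) differ from the 25-bit figure above, which came from a code table covering a larger alphabet; the prefix property holds in both cases.

    import heapq
    from collections import Counter

    def huffman_codes(text):
        # Build the Huffman tree bottom-up by repeatedly merging the two
        # least-frequent nodes. Heap entries are (frequency, tie-breaker, node);
        # a node is either a single character or a (left, right) pair.
        heap = [(freq, i, ch) for i, (ch, freq) in enumerate(Counter(text).items())]
        heapq.heapify(heap)
        next_id = len(heap)
        while len(heap) > 1:
            f1, _, left = heapq.heappop(heap)
            f2, _, right = heapq.heappop(heap)
            heapq.heappush(heap, (f1 + f2, next_id, (left, right)))
            next_id += 1
        codes = {}
        def assign(node, prefix):
            if isinstance(node, tuple):      # internal node: go left (0) and right (1)
                assign(node[0], prefix + "0")
                assign(node[1], prefix + "1")
            else:                            # leaf: a single character
                codes[node] = prefix or "0"
        assign(heap[0][2], "")
        return codes

    text = "DOORBELL"
    codes = huffman_codes(text)
    encoded = "".join(codes[ch] for ch in text)
    print(codes)        # the frequent letters O and L get the shortest codes
    print(len(encoded), "bits, versus", 8 * len(text), "bits at 8 bits per character")
    # Because no code is a prefix of another, the bit string can be decoded
    # unambiguously by scanning it from left to right.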