Penn ESE534 Spring2012 -- DeHon 1 ESE534: Computer Organization Day 21: April 4, 2012 Lossless Data Compression.

Slides:



Advertisements
Similar presentations
Data Compression CS 147 Minh Nguyen.
Advertisements

Introduction to Computer Science 2 Lecture 7: Extended binary trees
Michael Alves, Patrick Dugan, Robert Daniels, Carlos Vicuna
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 14: March 19, 2014 Compute 2: Cascades, ALUs, PLAs.
Huffman code and ID3 Prof. Sin-Min Lee Department of Computer Science.
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture3.
SIMS-201 Compressing Information. 2  Overview Chapter 7: Compression Introduction Entropy Huffman coding Universal coding.
Huffman Encoding Dr. Bernard Chen Ph.D. University of Central Arkansas.
Greedy Algorithms (Huffman Coding)
Data Compressor---Huffman Encoding and Decoding. Huffman Encoding Compression Typically, in files and messages, Each character requires 1 byte or 8 bits.
Data Compression Michael J. Watts
Compression & Huffman Codes
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day10: October 25, 2000 Computing Elements 2: Cascades, ALUs,
Text Operations: Coding / Compression Methods. Text Compression Motivation –finding ways to represent the text in fewer bits –reducing costs associated.
A Data Compression Algorithm: Huffman Compression
CS336: Intelligent Information Retrieval
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day8: October 18, 2000 Computing Elements 1: LUTs.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 9: February 7, 2007 Instruction Space Modeling.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 11: February 14, 2007 Compute 1: LUTs.
Fundamentals of Multimedia Chapter 7 Lossless Compression Algorithms Ze-Nian Li and Mark S. Drew 건국대학교 인터넷미디어공학부 임 창 훈.
Penn ESE Spring DeHon 1 FUTURE Timing seemed good However, only student to give feedback marked confusing (2 of 5 on clarity) and too fast.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 5: January 24, 2007 ALUs, Virtualization…
Data Compression Basics & Huffman Coding
Data Structures Arrays both single and multiple dimensions Stacks Queues Trees Linked Lists.
Image Compression (Chapter 8) CSC 446 Lecturer: Nada ALZaben.
Multimedia Data Introduction to Lossless Data Compression Dr Sandra I. Woolley Electronic, Electrical.
Compression.  Compression ratio: how much is the size reduced?  Symmetric/asymmetric: time difference to compress, decompress?  Lossless; lossy: any.
ICS 220 – Data Structures and Algorithms Lecture 11 Dr. Ken Cosh.
ALGORITHMS FOR ISNE DR. KENNETH COSH WEEK 13.
CMSC 100 Storing Data: Huffman Codes and Image Representation Professor Marie desJardins Tuesday, September 18, 2012 Tue 9/18/12 1CMSC Data Compression.
The LZ family LZ77 LZ78 LZR LZSS LZB LZH – used by zip and unzip
Huffman Coding. Huffman codes can be used to compress information –Like WinZip – although WinZip doesn’t use the Huffman algorithm –JPEGs do use Huffman.
Lossless Compression CIS 465 Multimedia. Compression Compression: the process of coding that will effectively reduce the total number of bits needed to.
Introduction to Algorithms Chapter 16: Greedy Algorithms.
Huffman coding Content 1 Encoding and decoding messages Fixed-length coding Variable-length coding 2 Huffman coding.
Huffman Code and Data Decomposition Pranav Shah CS157B.
Huffman Codes Juan A. Rodriguez CS 326 5/13/2003.
1 Algorithms CSCI 235, Fall 2015 Lecture 30 More Greedy Algorithms.
Lossless Decomposition and Huffman Codes Sophia Soohoo CS 157B.
Characters CS240.
1 Data Compression Hae-sun Jung CS146 Dr. Sin-Min Lee Spring 2004.
Chapter 7 Lossless Compression Algorithms 7.1 Introduction 7.2 Basics of Information Theory 7.3 Run-Length Coding 7.4 Variable-Length Coding (VLC) 7.5.
Lecture 12 Huffman Algorithm. In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly.
CS Spring 2012 CS 414 – Multimedia Systems Design Lecture 7 – Basics of Compression (Part 2) Klara Nahrstedt Spring 2012.
Page 1KUT Graduate Course Data Compression Jun-Ki Min.
Prof. Paolo Ferragina, Algoritmi per "Information Retrieval" Basics
Huffman code and Lossless Decomposition Prof. Sin-Min Lee Department of Computer Science.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 11: January 31, 2005 Compute 1: LUTs.
Lampel ZIV (LZ) code The Lempel-Ziv algorithm is a variable-to-fixed length code Basically, there are two versions of the algorithm LZ77 and LZ78 are the.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 15: March 13, 2013 High Level Synthesis II Dataflow Graph Sharing.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 11: February 20, 2012 Instruction Space Modeling.
Computer Sciences Department1. 2 Data Compression and techniques.
Compression and Huffman Coding. Compression Reducing the memory required to store some information. Lossless compression vs lossy compression Lossless.
Submitted To-: Submitted By-: Mrs.Sushma Rani (HOD) Aashish Kr. Goyal (IT-7th) Deepak Soni (IT-8 th )
Data Compression Michael J. Watts
Design & Analysis of Algorithm Huffman Coding
COMP261 Lecture 22 Data Compression 2.
Data Coding Run Length Coding
Data Compression.
Algorithms for iSNE Dr. Kenneth Cosh Week 13.
Huffman Coding Based on slides by Ethan Apter & Marty Stepp
Data Compression.
Data Compression CS 147 Minh Nguyen.
ESE534: Computer Organization
Data Structure and Algorithms
ESE534: Computer Organization
Algorithms CSCI 235, Spring 2019 Lecture 30 More Greedy Algorithms
CSE 589 Applied Algorithms Spring 1999
CPS 296.3:Algorithms in the Real World
Presentation transcript:

Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 21: April 4, 2012 Lossless Data Compression

Today Basic Idea Example Systolic Tree Reclaiming space in tree CAMs Penn ESE534 Spring DeHon 2

Lossless vs. Lossy Lossless – can reconstruct source perfectly (bit- identical) Examples –Huffman –Run Length Coding –Lempel-Ziv –Unix compress/gzip Lossy – capture important elements of original, but maybe not all bits Examples –MP3 –JPEG –MPEG Penn ESE534 Spring DeHon 3 uncompress(compress(x))=x

Dictionary Idea Send id for long string rather than all the characters Penn ESE534 Spring DeHon 4

Dictionary Example “the instruction controls the behavior of the ALU, data memory, and interconnect on each cycle.” Characters? –Bits at 8b/character Encoding with dictionary? Bits? CodeWord 000ALU 001memory 010interconnect 011instruction 100data 101cycle 110control 111the Penn ESE534 Spring DeHon 5

Dictionary Usability When can we do this? What might prevent us from pulling this trick? Penn ESE534 Spring DeHon 6

Big Idea Use data already sent as the dictionary –Don’t need to pre-arrange dictionary –Adapt to common phrases/idioms in a particular document Penn ESE534 Spring DeHon 7

Example First line of Dr. Suess’s Green Eggs and Ham –I AM SAM SAM I AM Recurring substrings? Penn ESE534 Spring DeHon 8

Example An encoding: I AM S Decode. Characters in original? –Bits based on 8b characters? Penn ESE534 Spring DeHon 9

Example An encoding: I AM S Encode: –Add 1 bit to identify character vs 9b characters – : 1b says this + 4b for x, 4b for y Also 9b How many bits? Penn ESE534 Spring DeHon 10

Technical Issue How many bits assign to x and y? Issues? What if the document is huge? –What problems might that pose? Penn ESE534 Spring DeHon 11

Windows Pragmatic solution –Only keep the last D characters –D is window size –Need log 2 (D) bits to specify a position –Parameterize encoder based on D –Typically larger D  Greater compression Penn ESE534 Spring DeHon 12

Encoding Greedy simplification –Encode by successively selecting the longest match between the head of the remaining string to send and the current window Penn ESE534 Spring DeHon 13

Algorithm Concept While data to send –Find largest match in window –If length=1 Send character –Else Send = –Shift data encoded into window Penn ESE534 Spring DeHon 14

Run Algorithm Use D=8 I AM SAM SAM I AM How many bits? Penn ESE534 Spring DeHon 15

What’s challenging to implement? While data to send –Find largest match in window –If length=1 Send character –Else Send = –Shift data encoded into window Penn ESE534 Spring DeHon 16

Systolic Algorithm Give character in window a PE Broadcast characters to PE While (some PE has match-out=true) match-out=new-search*match OR cont-search*match*match-in Len=len+1 Broadcast next character Send Shift last set of characters into window Penn ESE534 Spring DeHon 17

Systolic Hardware While (some PE has match-out=true) match-out=new-search*match OR cont-search*match*match-in Len=len+1 Broadcast next character Penn ESE534 Spring DeHon 18

Simulate Systolic Each student is a PE in the window –Identify left and right neighbors –Raise right hand for match-out –Note left neighbor’s hand at end of previous cycle to know match-in I AM SAM SAM I AM Penn ESE534 Spring DeHon 19

Contemplate Solution How complicated is each PE? How fast PE? How fast does encoding operate? How much area do we need? How much energy? Penn ESE534 Spring DeHon 20

Contemplate Solution What’s inefficient or unsatisfying about this solution? Penn ESE534 Spring DeHon 21

Tree Based Penn ESE534 Spring DeHon 22

Idea Avoid need to track multiple substrings Compress storage BY Storing common prefixes together in a tree Penn ESE534 Spring DeHon 23

Tree Example THEN AND THERE, THEY STOOD… Penn ESE534 Spring DeHon 24 T H E N RY E

Idea Avoid need to track multiple substrings Compress storage BY Storing common prefixes together in a tree Penn ESE534 Spring DeHon 25

Tree Algorithm Root for each character Follow tree according to input until no more match Send Extend tree with new character Start over with this character Penn ESE534 Spring DeHon 26

Run Algorithm I AM SAM SAM I AM Penn ESE534 Spring DeHon 27

Encoding Encoding bits assuming D=512 –So, 9b to encode tree node Penn ESE534 Spring DeHon 28

Finite Window How can we maintain a finite window in this case? Penn ESE534 Spring DeHon 29

Finite Window Clear and start over LRU on tree nodes Maintain two areas –Encode from one (perhaps both) –Add to new –When new fills, New->old, clear old Pick old leaf node to replace Penn ESE534 Spring DeHon 30

Complexity How much work per character to encode? Penn ESE534 Spring DeHon 31

Tree Node Representation Penn ESE534 Spring DeHon 32 T H E N RY E

Tree Node Representation Encode in memory Penn ESE534 Spring DeHon 33 T H E N RY E

Content Addressable Memory What’s a CAM? Penn ESE534 Spring DeHon 34

Penn ESE534 Spring DeHon 35 PLA

CAM Memory with Programmable Addresses –Capacity < 2 (matchbits) PLA with both planes writeable Penn ESE534 Spring DeHon 36

Penn ESE534 Spring DeHon 37 PLA and Memory

Contemplate What value do Bunton and Borriello get from using a CAM? Penn ESE534 Spring DeHon 38

Admin Reading for Monday on Web FM1 for Monday –Implement tree version on processor and estimate energy Penn ESE534 Spring DeHon 39

Penn ESE534 Spring DeHon 40 Big Ideas [MSB Ideas] Can often compress data without loss of information Exploit structure in data to encode Build dictionary based on data already sent Code repeating substrings compactly in terms of data already seen in recent past