Shannon Entropy
Claude Shannon (1916–2001) worked at Bell Labs (part of AT&T). A major question for telephone communication: how can signals be transmitted most efficiently and effectively across telephone wires? Shannon adapted Boltzmann's ideas from statistical mechanics to the field of communication.

Shannon's Formulation of Communication
A message source sends a message (e.g., a word) to a receiver.
Message source: the set of all possible messages this source can send, each with its own probability of being sent next.
Message: e.g., a symbol, a number, or a word.
Information content H of the message source: a function of the number of possible messages and their probabilities. Informally, it is the amount of "surprise" the receiver has upon receipt of each message.

Message source: a one-year-old.
Messages: "Da", with probability 1.
No surprise; no information content.
InformationContent(one-year-old) = 0 bits.

Message source: a three-year-old.
Messages: 500 words (w1, w2, ..., w500).
Probabilities: p1, p2, ..., p500.
More surprise; more information content.
InformationContent(three-year-old) > 0 bits.

Shannon information (H): if all M messages have the same probability, then

    H = log2(M)    [bits per message]

Example: random bits (1, 0): H = log2(2) = 1 bit per message.
Example: random DNA bases (A, C, G, T): H = log2(4) = 2 bits per message.
Example: random notes in an octave (C, D, E, F, G, A, B, C'): H = log2(8) = 3 bits per message.
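
As a quick check (this Python snippet is an added illustration, not part of the slides):

    import math

    # H = log2(M) when all M messages are equally likely
    for name, M in [("random bit", 2), ("DNA base", 4), ("note in an octave", 8)]:
        print(name, "->", math.log2(M), "bits per message")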

General formula for Shannon Information Content
Let M be the number of possible messages, and pi be the probability of message i. Then

    H = -( p1 log2(p1) + p2 log2(p2) + ... + pM log2(pM) )    [bits per message]

When all M messages are equally likely (each pi = 1/M), this reduces to H = log2(M).
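
A short Python sketch of this formula (the function name and example distribution are assumptions, not from the slides):

    import math

    def shannon_entropy(probabilities):
        """Shannon information content H, in bits per message."""
        return -sum(p * math.log2(p) for p in probabilities if p > 0)

    # Equal probabilities reduce to log2(M): four DNA bases give 2 bits per message
    print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0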

Example: a biased coin
Example: text
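
The worked numbers for these examples are not reproduced here; as an illustration, the same formula applies to a biased coin (the bias of 0.9 is an assumed value, not from the slides):

    import math

    # Biased coin: P(heads) = 0.9, P(tails) = 0.1 (illustrative values)
    p = [0.9, 0.1]
    H = -sum(pi * math.log2(pi) for pi in p)
    print(H)  # about 0.47 bits per message; a fair coin would give 1 bit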

Relation to Coding Theory: the information content H is the average number of bits it takes to encode a message from a given message source under an "optimal coding"; no lossless code can average fewer bits per message than H. This tells us how compressible a text is.
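
For text, this bound can be estimated from empirical character frequencies; the sketch and sample string below are illustrative, not from the slides:

    import math
    from collections import Counter

    def char_entropy(text):
        """Entropy, in bits per character, of the empirical character distribution."""
        counts = Counter(text)
        n = len(text)
        return -sum((c / n) * math.log2(c / n) for c in counts.values())

    print(char_entropy("hello world"))  # roughly 2.85 bits per character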

Huffman Coding
An optimal (minimum average length) and unambiguous (prefix-free) coding, based on information theory.
The algorithm was devised by David Huffman in 1952.
Online calculator: http://planetcalc.com/2481/
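
A rough Python sketch of the algorithm (names and tie-breaking rules are assumptions, so the codewords may differ from the calculator's, although the average code length will not):

    import heapq
    from collections import Counter

    def huffman_code(text):
        """Build a Huffman code as a dict mapping each symbol to a bit string."""
        freq = Counter(text)
        # Heap entries: (subtree frequency, tie-breaker, {symbol: code-so-far})
        heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
        heapq.heapify(heap)
        if len(heap) == 1:  # degenerate case: only one distinct symbol
            return {sym: "0" for sym in heap[0][2]}
        tie = len(heap)
        while len(heap) > 1:
            f1, _, c1 = heapq.heappop(heap)  # two least frequent subtrees
            f2, _, c2 = heapq.heappop(heap)
            # Prepend 0 to codes in the first subtree and 1 to codes in the second
            merged = {sym: "0" + code for sym, code in c1.items()}
            merged.update({sym: "1" + code for sym, code in c2.items()})
            heapq.heappush(heap, (f1 + f2, tie, merged))
            tie += 1
        return heap[0][2]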

Huffman Coding Example (worksheet)
Name: _____________________________
Frequency: 5  4  3  2  1
Phrase: to be or not to be
Huffman code of phrase (remember to include a code for sp, the space character):
Average bits per character in code:
Shannon entropy of phrase:
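
As a check on the worksheet, using the huffman_code sketch above (individual codewords depend on tie-breaking, so they may differ from yours):

    import math
    from collections import Counter

    phrase = "to be or not to be"
    codes = huffman_code(phrase)  # huffman_code is the sketch defined earlier
    freq = Counter(phrase)
    n = len(phrase)

    avg_bits = sum(freq[s] * len(codes[s]) for s in freq) / n
    entropy = -sum((f / n) * math.log2(f / n) for f in freq.values())
    print(f"average bits per character: {avg_bits:.2f}")  # about 2.61
    print(f"Shannon entropy of phrase:  {entropy:.2f}")    # about 2.59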

Clustering
(Figure: a clustering C containing clusters c1, c2, and c3.)
What is the entropy of each cluster? What is the entropy of the clustering?
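
The slide does not say what the clusters contain; under one common convention (an assumption here, not stated on the slide), a cluster's entropy is computed over the class labels of its points, and the clustering's entropy is the size-weighted average of the cluster entropies:

    import math
    from collections import Counter

    def cluster_entropy(labels):
        """Entropy, in bits, of the class-label distribution within one cluster."""
        counts = Counter(labels)
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in counts.values())

    # Illustrative clusters and labels (made up, not from the slide)
    clusters = {"c1": ["x", "x", "x", "y"], "c2": ["y", "y"], "c3": ["x", "z", "z"]}

    total = sum(len(points) for points in clusters.values())
    for name, points in clusters.items():
        print(name, round(cluster_entropy(points), 3))
    weighted = sum(len(points) / total * cluster_entropy(points) for points in clusters.values())
    print("clustering entropy:", round(weighted, 3))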