Richard W. Hamming, Learning to Learn: The Art of Doing Science and Engineering, Session 13: Information Theory

Presentation transcript:


Information
Shannon identifies information with surprise. For example: telling someone it is smoggy in Los Angeles is not much of a surprise, and therefore not much new information.

Surprise defined
p is the probability of the event; I(p) is the information gained from that event. Information learned from independent events is additive.
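The formula itself does not appear in the transcript. The standard measure consistent with these requirements, taking base 2 so that the unit is the bit, is

\[ I(p) = \log_2 \frac{1}{p}, \qquad I(p_1 p_2) = \log_2 \frac{1}{p_1 p_2} = I(p_1) + I(p_2), \]

so the information from two independent events adds, and a rarer event (smaller p) carries more surprise.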

Definition Confounding
Information Theory has not "defined" information; it actually measures "surprise." Shannon's definition may suffice for machines, but it does not represent what we normally think of as information. It should have been called "Communication Theory" and not "Information Theory."

Definition Confounding
Realize how much the definition distorts the common view of information. This illustrates a point worth examining whenever new definitions are presented: how far does the proposed definition agree with the original concepts you had, and how far does it differ?

Information Entropy: H(P)
The average amount of information in the system. Not the same as physical entropy, even though the mathematical form is similar.
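Again the formula is not reproduced in the transcript; with p_i the probability of the i-th symbol, the entropy meant here is

\[ H(P) = \sum_i p_i \log_2 \frac{1}{p_i}, \]

the expected value of the surprise I(p_i) over the whole source.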

Gibbs Inequality (mathematical interlude)
For two probability distributions p and q,

\[ \sum_i p_i \log_2 \frac{q_i}{p_i} \le 0, \]

with equality only when q_i = p_i for all i; note the form is nearly identical to H(P). The inequality follows from the bound log_e x <= x - 1 shown in Fig. 13.1. Choosing the uniform distribution q_i = 1/q over an alphabet of q symbols gives H(P) <= log_2 q.

Kraft Inequality: K (mathematical interlude continues)
Given a uniquely decodable code in which code word i has length l_i, the Kraft inequality holds:

\[ K = \sum_i 2^{-l_i} \le 1. \]

Define the pseudoprobabilities

\[ Q_i = \frac{2^{-l_i}}{K}, \qquad \sum_i Q_i = 1. \]

Substituting Q_i for q_i in the Gibbs inequality, it follows that

\[ H(P) \le \sum_i p_i l_i + \log_2 K \le \sum_i p_i l_i = L, \]

and finally the "Noiseless Coding Theorem of Shannon": the average code length L of any uniquely decodable code is at least the entropy H(P) of the source.
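A short Python sketch, assuming a binary code alphabet, checks these statements on a concrete source; the helper names are my own, not from the lecture.

```python
import math

def entropy(p):
    """H(P) in bits: sum of p_i * log2(1/p_i) over the nonzero probabilities."""
    return sum(pi * math.log2(1.0 / pi) for pi in p if pi > 0)

def kraft_sum(lengths):
    """K = sum of 2^(-l_i); at most 1 for any uniquely decodable binary code."""
    return sum(2.0 ** -l for l in lengths)

def average_length(p, lengths):
    """L = expected code-word length, sum of p_i * l_i."""
    return sum(pi * l for pi, l in zip(p, lengths))

# Source probabilities 1/2, 1/4, 1/8, 1/8 with the code {0, 10, 110, 111}.
p = [0.5, 0.25, 0.125, 0.125]
lengths = [1, 2, 3, 3]

print(kraft_sum(lengths))          # 1.0   (Kraft inequality holds with equality)
print(average_length(p, lengths))  # 1.75 bits per symbol
print(entropy(p))                  # 1.75 bits per symbol, so L = H(P) here
```

For this source the bound L >= H(P) is met exactly; in general the best code's average length lies between H(P) and H(P) + 1.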

Channel Capacity defined
The maximum amount of information that can be sent through the channel reliably. With n bits sent and a per-bit error probability Q, expect about nQ errors. Given a large enough n, you can force the probability of falling outside the nQ boundary to be as small as you please.
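The capacity expression itself is lost with the slide graphics. For the binary symmetric channel with per-bit error probability Q that these slides describe, the standard formula is

\[ C = 1 - H(Q) = 1 + Q \log_2 Q + (1 - Q) \log_2 (1 - Q) \]

bits of information per bit sent, so a noisier channel (Q closer to 1/2) has less capacity.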

Channel Capacity
Based upon random encoding with no error correction. With M messages of length n there are 2^{nM} possible code books, which leaves the possibility of destructive overlap between code words. Shannon proved the probability of overlap is very small by averaging the error over all 2^{nM} code books. Using sufficiently large n reduces the probability of error while simultaneously maximizing the flow of information through the channel. Thus, if the average error is suitably small, then at least one code book must be suitable: "Shannon's noisy coding theorem."
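A toy simulation, not from the lecture, makes the random-coding argument concrete: each trial draws a fresh random code book, sends one code word through a binary symmetric channel with bit-error probability Q, and decodes to the nearest code word. All parameter values are illustrative choices of my own.

```python
import random

def random_coding_error_rate(n, rate, Q, trials=1000, seed=1):
    """Estimate the decoding-error probability of random block coding on a BSC."""
    rng = random.Random(seed)
    M = 2 ** int(rate * n)                     # number of messages at this rate
    errors = 0
    for _ in range(trials):
        # A fresh random code book each trial: this samples the "average over
        # all code books" used in Shannon's argument.
        book = [[rng.randint(0, 1) for _ in range(n)] for _ in range(M)]
        sent = rng.randrange(M)
        received = [bit ^ (rng.random() < Q) for bit in book[sent]]
        # Decode to the code word at minimum Hamming distance from the output.
        decoded = min(range(M),
                      key=lambda m: sum(a != b for a, b in zip(book[m], received)))
        errors += (decoded != sent)
    return errors / trials

# Capacity of a BSC with Q = 0.05 is about 0.71 bits per use; at the much lower
# rate of 0.25 the estimated error probability should fall as n grows.
for n in (8, 16, 24):
    print(n, random_coding_error_rate(n, rate=0.25, Q=0.05))
```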

What does it mean?
The "sufficiently large n" necessary for the information flow to approach channel capacity may be so large as to be too slow. Error-correcting codes avoid the slowness at the cost of some channel capacity: they use computable functions rather than lengthy random code books, and when many errors are corrected, their performance compared to channel capacity is quite good.

In Practice
If your system provides error correction, use it! Solar-system exploration satellites: extreme total-power limitations (about 5 W for the whole system) restricted transmission power, and distance and background noise induced errors. Aggressive error-correcting codes enabled more effective use of the available bandwidth because the errors were self-correcting. Hamming codes may not operate near the optimal channel capacity, but they do guarantee error correction to a specified level of reliability. Shannon coding only promises a "probably low error" given a long enough series of bits, but it will push those bits near channel capacity.
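As a concrete instance of the computable codes contrasted here with Shannon's random code books, the following is a minimal sketch of the classic Hamming(7,4) code, which corrects any single flipped bit in a 7-bit block; the function names are my own.

```python
# Hamming(7,4): 4 data bits -> 7-bit code word that survives any single bit flip.
# Positions 1, 2 and 4 (1-indexed) hold parity bits; each parity bit covers the
# positions whose binary index contains that bit.

def encode(d):                       # d: list of 4 data bits
    c = [0] * 8                      # index 0 unused; code word at positions 1..7
    c[3], c[5], c[6], c[7] = d
    c[1] = c[3] ^ c[5] ^ c[7]        # parity over positions with bit 1 set
    c[2] = c[3] ^ c[6] ^ c[7]        # parity over positions with bit 2 set
    c[4] = c[5] ^ c[6] ^ c[7]        # parity over positions with bit 4 set
    return c[1:]

def decode(word):                    # word: list of 7 bits, at most 1 corrupted
    c = [0] + list(word)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos          # XOR of indices of 1 bits; 0 for a valid word
    if syndrome:                     # nonzero syndrome is the corrupted position
        c[syndrome] ^= 1
    return [c[3], c[5], c[6], c[7]]

data = [1, 0, 1, 1]
sent = encode(data)
received = sent.copy()
received[4] ^= 1                     # corrupt one bit in transit
assert decode(received) == data      # the single error is corrected
```

The rate is 4/7: guaranteed single-error correction is bought with some of the channel capacity, which is exactly the trade described above.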

In Practice
Information theory does not tell you how to design, but it points the way toward efficient designs. Remember, information theory applies to data communications and is not necessarily relevant to human communication.

Final Points
Established terms reused as definitions in a new area should fit our previous beliefs, but they often do not: they carry some degree of distortion and non-applicability to the way we thought things were. Definitions don't actually define things; they just suggest how those things should be handled.

Final Points
All definitions should be inspected, not only when they are proposed but also later, when they are applied to the conclusions drawn. Were they framed to get the desired result? Are they applicable under differing conditions? Beware: initial definitions often determine what you find or see rather than describe what is actually there. Are they creating results that are circular tautologies rather than actual results?