Information Theory Kenneth D. Harris 18/3/2015

Information theory is…
1. Information theory is a branch of applied mathematics, electrical engineering, and computer science involving the quantification of information. (Wikipedia)
2. Information theory is probability theory where you take logs to base 2.

Morse code
- Code words are shortest for the most common letters.
- This means that messages are, on average, sent more quickly.

What is the optimal code?
- X is a random variable.
- Alice wants to tell Bob the value of X (repeatedly).
- What is the best binary code to use?
- How many bits does it take (on average) to transmit the value of X?

Optimal code lengths

Value of X | Probability | Code word
A          | ½           | 0
B          | ¼           | 10
C          | ¼           | 11
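The table pairs shorter code words with more probable symbols. As a minimal sketch (not from the original slides), the expected code length for this example can be checked against the entropy in a few lines of Python; the variable names are illustrative only.

```python
import math

# Probabilities and code words from the slide's example
probs = {"A": 0.5, "B": 0.25, "C": 0.25}
codes = {"A": "0", "B": "10", "C": "11"}

# Expected code length: sum over symbols of p(x) * len(code(x))
expected_length = sum(p * len(codes[x]) for x, p in probs.items())

# Shannon entropy in bits: H(X) = -sum p(x) * log2 p(x)
entropy = -sum(p * math.log2(p) for p in probs.values())

print(expected_length, entropy)  # both equal 1.5 bits, so this code is optimal
```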

Entropy
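The equation for this slide appears to have been an image; as a sketch, the standard definition used throughout the lecture is

$$ H(X) = -\sum_x P(x) \log_2 P(x), $$

measured in bits when logs are taken to base 2, and equal to the minimum average number of bits needed to transmit the value of X.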

Connection to physics

Conditional entropy
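Again the formula is missing from the transcript; the standard definition is

$$ H(Y|X) = -\sum_{x,y} P(x,y) \log_2 P(y|x), $$

the average number of bits needed to transmit Y when both sender and receiver already know X.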

Mutual information
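The standard definition (a sketch, since the slide's equation is not in the transcript):

$$ I(X;Y) = H(Y) - H(Y|X) = H(X) - H(X|Y) = \sum_{x,y} P(x,y) \log_2 \frac{P(x,y)}{P(x)P(y)}, $$

the average number of bits saved in transmitting Y when X is already known (and vice versa).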

Properties of mutual information
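The bullet points for this slide are not in the transcript; the properties usually listed are, as a sketch: $I(X;Y) \ge 0$; $I(X;Y) = I(Y;X)$; $I(X;Y) = 0$ if and only if X and Y are independent; and $I(X;X) = H(X)$.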

Data processing inequality
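As a sketch of the standard statement: if $X \to Y \to Z$ form a Markov chain (Z depends on X only through Y), then

$$ I(X;Z) \le I(X;Y), $$

so no processing of Y, deterministic or random, can increase the information it carries about X.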

Kullback-Leibler divergence
Compares the length of the code word actually used with the length of the optimal code word.
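The formula itself does not appear in the transcript; the standard definition, consistent with the coding interpretation above, is

$$ D_{KL}(P\|Q) = \sum_x P(x) \log_2 \frac{P(x)}{Q(x)} = \sum_x P(x)\Big[ \underbrace{-\log_2 Q(x)}_{\text{length of codeword}} - \underbrace{\big(-\log_2 P(x)\big)}_{\text{length of optimal codeword}} \Big], $$

i.e. the expected number of extra bits paid by using a code optimized for Q when X is really distributed as P.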

Continuous variables
Rounding a continuous variable to more and more decimal places gives ever larger entropy; in the limit of infinite precision the entropy is infinite.

K-L divergence for continuous variables
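The integral on this slide is not in the transcript; the standard continuous analogue is

$$ D_{KL}(P\|Q) = \int p(x) \log_2 \frac{p(x)}{q(x)} \, dx, $$

which, unlike differential entropy, stays well defined and non-negative.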

Mutual information of continuous variables
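Similarly, the standard expression (a sketch, as the slide's formula is missing) is

$$ I(X;Y) = \int p(x,y) \log_2 \frac{p(x,y)}{p(x)\,p(y)} \, dx \, dy, $$

the K-L divergence between the joint density and the product of the marginals; it is also the limit of the discrete mutual information as the discretization of X and Y becomes finer.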

Differential entropy
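The standard definition (a sketch, since the slide's formula is not in the transcript) is

$$ h(X) = -\int p(x) \log_2 p(x) \, dx, $$

which, unlike discrete entropy, can be negative and changes when X is rescaled.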

Maximum entropy distributions
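The derivation on this slide is not in the transcript; as a sketch of the standard result, the distribution maximizing entropy subject to constraints on expected statistics $E[T_i(X)] = \mu_i$ has exponential-family form $p(x) \propto \exp\big(\sum_i \lambda_i T_i(x)\big)$, which is where the examples on the next slide come from.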

Examples of maximum entropy distributions

Data type                     | Statistic                          | Distribution
Continuous                    | Mean and variance                  | Gaussian
Non-negative continuous       | Mean                               | Exponential
Continuous                    | Mean                               | Undefined
Angular                       | Circular mean and vector strength  | Von Mises
Non-negative integer          | Mean                               | Geometric
Continuous stationary process | Autocovariance function            | Gaussian process
Point process                 | Firing rate                        | Poisson process

In neuroscience…
- We often want to compute the mutual information between a neural activity pattern and a sensory variable.
- If I want to tell you the sensory variable, and we both know the activity pattern, how many bits can we save?
- If I want to tell you the activity pattern, and we both know the sensory variable, how many bits can we save?
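Both questions have the same answer, which is one way to see the symmetry of mutual information. Writing S for the sensory variable and R for the activity pattern (symbols introduced here, not taken from the slides):

$$ I(S;R) = H(S) - H(S|R) = H(R) - H(R|S). $$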

Estimating mutual information

Naïve estimate

    | X=0 | X=1
Y=0 |  0  |  1
Y=1 |  1  |  0
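As a sketch (not from the original slides), a naïve, or plug-in, estimate of mutual information simply substitutes the observed joint frequencies into the mutual information formula; treating the table values above as counts is an assumption.

```python
import numpy as np

def naive_mutual_information(counts):
    """Plug-in estimate of I(X;Y) in bits from a contingency table of counts."""
    joint = counts / counts.sum()             # joint probabilities P(x, y)
    p_y = joint.sum(axis=1, keepdims=True)    # marginal over rows (Y)
    p_x = joint.sum(axis=0, keepdims=True)    # marginal over columns (X)
    nonzero = joint > 0                       # zero cells contribute nothing, avoid log(0)
    return np.sum(joint[nonzero] * np.log2(joint[nonzero] / (p_y * p_x)[nonzero]))

# Contingency table from the slide: rows = Y, columns = X
table = np.array([[0.0, 1.0],
                  [1.0, 0.0]])
print(naive_mutual_information(table))  # 1.0 bit: X determines Y exactly in this example
```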