
Information and Entropy

Shannon information entropy on discrete variables

Consider W discrete events with probabilities p_i such that ∑_{i=1}^{W} p_i = 1. Shannon's (1) measure of the amount of choice among the p_i is H = -k ∑_{i=1}^{W} p_i log p_i, where k is a positive constant.

If p_i = 1/W and k is Boltzmann's constant, then H = -k ∑_{i=1}^{W} (1/W) log(1/W) = k log W, which is the entropy of a system with W equally likely microscopic configurations.

Hence (taking k = 1), H = -∑_{i=1}^{W} p_i log p_i is Shannon's information entropy for discrete variables.

Example (from the slide figure): the distribution p_i = (1/8, 1/2, 1/8, 1/4) has H = 1.21, whereas the uniform distribution p_i = 1/4 has H = log 4 = 1.39; the more evenly the probability is spread, the larger the entropy.

Schneider (2) notes that H is a measure of entropy/disorder/uncertainty. It is a measure of information in Shannon's sense only if it is interpreted as the information gained by completely removing that uncertainty (i.e., transmission over a noiseless channel).

Second law of thermodynamics: the entropy of a system increases until it reaches equilibrium within the constraints imposed on it.

(1) C. E. Shannon, "A Mathematical Theory of Communication," Bell Sys. Tech. J., 1948.
(2) T. D. Schneider, "Information Theory Primer," last updated Jan 6, 2003.
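As a quick check of the discrete-variable example above, here is a minimal sketch in Python (not part of the original slides), assuming natural logarithms (entropy in nats) and k = 1; it reproduces the two values quoted on the slide:

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy H = -sum_i p_i log p_i (natural log, k = 1)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # convention: 0 * log 0 = 0
    return -np.sum(p * np.log(p))

uneven  = [1/8, 1/2, 1/8, 1/4]        # distribution from the slide figure
uniform = [1/4, 1/4, 1/4, 1/4]        # p_i = 1/W with W = 4

print(round(shannon_entropy(uneven), 2))   # 1.21
print(round(shannon_entropy(uniform), 2))  # 1.39  (= log 4, maximal for W = 4)
```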

Information entropy on continuous variables

Information about a random variable x_map taking continuous values arises from the exclusion of its possible alternatives (realizations). Hence a measure of information for a continuous-valued x_map is Info(x_map) = -log f(x_map). The expected information is then H(x_map) = -∫ dχ_map f(χ_map) log f(χ_map).

By noting the similarity with H = -∑_{i=1}^{W} p_i log p_i for discrete variables, we see that H(x_map) = -∫ dχ_map f(χ_map) log f(χ_map) is Shannon's information entropy associated with the PDF f(χ_map) of the continuous variable x_map.
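To make the continuous-variable definition concrete, the sketch below (an illustration, not part of the original slides) evaluates H = -∫ f(χ) log f(χ) dχ by numerical integration for a standard Gaussian PDF and compares it with the closed form ½ log(2πe σ²):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def differential_entropy(pdf, lo, hi):
    """H = -integral of f(x) log f(x) dx over [lo, hi], with 0 log 0 taken as 0."""
    integrand = lambda x: -pdf(x) * np.log(pdf(x)) if pdf(x) > 0 else 0.0
    H, _ = quad(integrand, lo, hi)
    return H

H_num = differential_entropy(norm(loc=0, scale=1).pdf, -10, 10)
H_exact = 0.5 * np.log(2 * np.pi * np.e)        # closed form for a Gaussian with variance 1
print(round(H_num, 2), round(H_exact, 2))       # 1.42 1.42
```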

Maximizing entropy given knowledge constraints

Example 1: Given the knowledge that "two blue toys are in the corner of a room", consider two possible arrangements, (a) and (b), shown on the slide. Out of these two arrangements, arrangement (a) maximizes entropy given the knowledge constraint; hence, given our knowledge, it is the most likely toy arrangement (would kids produce (b)?).

Example 2: Given the knowledge that "the PDF has mean μ = 0 and variance σ² = 1", consider a uniform and a Gaussian PDF, both with σ² = 1 (slide figure: Uniform, H = 1.24; Gaussian, H = 1.42). Out of these two PDFs, the Gaussian maximizes the information entropy given the knowledge constraint that μ = 0 and σ² = 1.

Hence, the prior stage of BME aims at informativeness by using all, but no more, general knowledge than is available; i.e., we seek to maximize the information entropy subject to constraints expressing the general knowledge.
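The two entropy values quoted in Example 2 follow from the standard closed-form expressions; a small sketch (my illustration, assuming natural logarithms and a uniform PDF on [-√3, √3] so that its variance is 1):

```python
import numpy as np

sigma2 = 1.0

# Uniform on [a, b] has variance (b - a)^2 / 12, so variance 1 means width b - a = 2*sqrt(3)
H_uniform = np.log(2 * np.sqrt(3) * np.sqrt(sigma2))   # H = log(b - a)

# Gaussian with variance sigma2
H_gauss = 0.5 * np.log(2 * np.pi * np.e * sigma2)

print(round(H_uniform, 2), round(H_gauss, 2))   # 1.24 1.42, so the Gaussian has higher entropy
```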