
Information theory (part II) From the form g(R) + g(S) = g(RS) one can expect the function g( ) to be a logarithmic function. In general it can be written as g(x) = A ln(x) + C, where A and C are constants. From the earlier transformation g(x) = x f(1/x), the uncertainty H of n equally likely events, with p = 1/n, is H = g(n) = (1/p)f(p) = A ln(p) + C (the sign change from ln(1/p) = -ln(p) is absorbed into the constant A). Therefore, f(p) = A*p*ln(p) + C*p. Since the uncertainty must be zero when the probability is 1, f(1) = 0 and the constant C must be ZERO. Thus, f(p) = A*p*ln(p). Because p is smaller than 1, ln(p) is negative, while the uncertainty itself must be non-negative; the constant A is therefore inherently negative.
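Written out compactly in LaTeX (a restatement of the argument above, introducing no new assumptions):

    g(R) + g(S) = g(RS) \;\Rightarrow\; g(x) = A \ln x + C ;
    g(x) = x\, f(1/x),\; x = 1/p \;\Rightarrow\; \tfrac{1}{p} f(p) = A \ln p + C
      \;\Rightarrow\; f(p) = A\, p \ln p + C p ;
    f(1) = 0 \;\Rightarrow\; C = 0 , \qquad
    f(p) = A\, p \ln p \ge 0,\; \ln p \le 0 \;\Rightarrow\; A < 0,\; A \equiv -K .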

Following conventional notation, we write f(p) = -K*p*ln(p), where K is a positive coefficient. The uncertainty is then H(p_1, p_2, …, p_n) = Σ f(p_i). Thus, H(p_1, p_2, …, p_n) = Σ -K*p_i*ln(p_i) = -K Σ p_i*ln(p_i). Example: H(1/2, 1/3, 1/6) = -K*[1/2*ln(1/2) + 1/3*ln(1/3) + 1/6*ln(1/6)] = -K*(-0.347 - 0.366 - 0.299) = 1.01K. From the decomposed procedure, H(1/2, 1/2) + 1/2*H(2/3, 1/3) = -K*[1/2*ln(1/2) + 1/2*ln(1/2)] - 1/2*K*[2/3*ln(2/3) + 1/3*ln(1/3)] = -K*(-0.347 - 0.347) - K/2*(-0.270 - 0.366) = 1.01K, the same value. For equally probable events, p_i = 1/n and H = K*ln(n).
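As a quick numerical check of this example (illustrative Python of my own, not part of the lecture; natural logarithms, K = 1):

    import math

    def entropy(probs, K=1.0):
        """H = -K * sum(p * ln p); terms with p = 0 contribute nothing."""
        return -K * sum(p * math.log(p) for p in probs if p > 0)

    H_direct = entropy([1/2, 1/3, 1/6])                          # direct evaluation, ~1.0114
    H_grouped = entropy([1/2, 1/2]) + 0.5 * entropy([2/3, 1/3])  # two-step decomposition

    print(H_direct, H_grouped)                                   # both ~1.01 (in units of K = 1)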

In the binary case there are two possible outcomes with probabilities p_1 and p_2, where p_1 + p_2 = 1, so H = -K*[p_1*ln(p_1) + p_2*ln(p_2)]. To determine H when p_1 is 0 or 1, one needs L'Hopital's rule: lim[u(x)/v(x)] = lim[u'(x)/v'(x)]. Writing p_1*ln(p_1) = ln(p_1)/(1/p_1), as p_1 approaches 0, lim[p_1*ln(p_1)] = lim[(1/p_1)/(-1/p_1^2)] = lim(-p_1) = 0. The uncertainty is therefore 0 when either p_1 or p_2 is zero! At what value of p_1 does H reach its maximum? Differentiate H = -K*[p_1*ln(p_1) + (1-p_1)*ln(1-p_1)] with respect to p_1 and set the derivative equal to 0: dH/dp_1 = -K*[ln(p_1) + p_1/p_1 - ln(1-p_1) - (1-p_1)/(1-p_1)] = -K*[ln(p_1) - ln(1-p_1)] = 0, which leads to p_1 = 1/2.
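The limiting behaviour and the location of the maximum can be checked numerically; the sketch below (my own illustrative code, K = 1, natural logarithms) evaluates H(p) on a fine grid:

    import math

    def binary_entropy(p, K=1.0):
        """H(p) = -K*[p ln p + (1-p) ln(1-p)]; terms with probability 0 contribute nothing."""
        return -K * sum(q * math.log(q) for q in (p, 1.0 - p) if q > 0)

    grid = [i / 1000 for i in range(1001)]          # p = 0.000, 0.001, ..., 1.000
    values = [binary_entropy(p) for p in grid]

    print(values[0], values[-1])                    # ~0 at p = 0 and p = 1
    best = max(range(len(grid)), key=lambda i: values[i])
    print(grid[best], values[best])                 # maximum at p = 0.5, H = ln 2 ~ 0.693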

Unit of Information Choosing 2 as the base of the logarithm and taking K = 1, a binary event with two equally likely outcomes gives H = log2(2) = 1. We call this unit of information a bit. For a decimal digit, H = log2(10) = 3.32, so a decimal digit contains about 3 and 1/3 bits of information.
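A tiny illustrative sketch (function name is my own) of the same base-2 bookkeeping:

    import math

    def equiprobable_bits(n):
        """Uncertainty of n equally likely outcomes, expressed in bits (base-2 logarithm)."""
        return math.log2(n)

    print(equiprobable_bits(2))    # 1.0  -> one binary digit carries exactly 1 bit
    print(equiprobable_bits(10))   # 3.32 -> one decimal digit carries about 3 1/3 bits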

Linguistics A more refined analysis works in terms of component syllables. One can test what is significant in a syllable in speech by swapping syllables and seeing if meaning or tense is changed or lost. The table gives some examples of the application of this statistical approach to some works of literature.

Linguistics The types of interesting results that arise from such studies include: (a) English has the lowest entropy of any major language, and (b) Shakespeare's work has the lowest entropy of any author studied. These ideas are now progressing beyond the scientific level and are impinging on new ideas of criticism. Here, as in biology, the thermodynamic notions can be helpful, though they must be applied with caution, because concepts such as 'quality' cannot be measured, as they are purely subjective.

Maximum entropy Topics: the amount of uncertainty; examples of the connection between entropy and uncertainty (gases in a partitioned container); and the determination of the probability distribution that has maximum entropy.

Suppose one knows the mean value of some particular variable x, ⟨x⟩ = Σ_i p_i*x_i, where the unknown probabilities satisfy the normalization condition Σ_i p_i = 1. In general there will be a large number of probability distributions consistent with the above information. We will determine the one distribution that yields the largest uncertainty (i.e., the largest missing information). We need the method of Lagrange multipliers to carry out the analysis.

Introducing Lagrange multipliers λ and μ for the normalization and mean-value constraints, maximizing the uncertainty, and then solving for ln(p_i) gives ln(p_i) = -(1 + λ) - μ*x_i, i.e., p_i = e^-(1+λ) * e^(-μ*x_i).
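The intermediate steps, reconstructed in LaTeX (the slide's own equations are not reproduced in this transcript; this is the standard working for the stated constraints, with the factor K dropped since it does not affect the maximization):

    \text{maximize } -\sum_i p_i \ln p_i
      \quad\text{subject to}\quad \sum_i p_i = 1 ,\;\; \sum_i p_i x_i = \langle x \rangle ;
    \frac{\partial}{\partial p_i}\Big[ -\sum_j p_j \ln p_j
      - \lambda \Big(\sum_j p_j - 1\Big)
      - \mu \Big(\sum_j p_j x_j - \langle x \rangle\Big) \Big]
      = -\ln p_i - 1 - \lambda - \mu x_i = 0 ;
    \ln p_i = -(1+\lambda) - \mu x_i
      \quad\Longrightarrow\quad
      p_i = e^{-(1+\lambda)}\, e^{-\mu x_i} .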

Determine the Lagrange multipliers λ and μ. The normalization condition Σ_i p_i = 1 requires e^(1+λ) = Σ_i e^(-μ*x_i). We define the partition function Z(μ) = Σ_i e^(-μ*x_i). Then p_i = e^(-μ*x_i)/Z, and λ is fixed once μ is known.

Determine the multiplier μ. We have ⟨x⟩ = Σ_i x_i*p_i = Σ_i x_i*e^(-μ*x_i)/Z = -∂ln(Z)/∂μ. Therefore, μ is determined implicitly by requiring that this expression reproduce the known mean value of x.
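As a concrete illustration of the whole procedure (a sketch of my own, not from the slides, using a hypothetical six-valued variable x_i = 1, …, 6 with prescribed mean ⟨x⟩ = 4.5): μ can be found numerically by bisection on ⟨x⟩ = -∂ln(Z)/∂μ, after which p_i = e^(-μ*x_i)/Z is the maximum-entropy distribution.

    import math

    x = [1, 2, 3, 4, 5, 6]          # possible values of the variable (hypothetical)
    target_mean = 4.5               # prescribed <x> (hypothetical)

    def mean_of(mu):
        """Mean <x> implied by the maximum-entropy distribution p_i = exp(-mu*x_i)/Z."""
        weights = [math.exp(-mu * xi) for xi in x]
        Z = sum(weights)
        return sum(xi * w for xi, w in zip(x, weights)) / Z

    # <x> decreases monotonically as mu increases, so bisect for the prescribed mean.
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if mean_of(mid) > target_mean:
            lo = mid
        else:
            hi = mid
    mu = 0.5 * (lo + hi)

    Z = sum(math.exp(-mu * xi) for xi in x)
    p = [math.exp(-mu * xi) / Z for xi in x]
    print(mu)                       # negative here, since the target mean exceeds the uniform mean 3.5
    print(p, sum(p), sum(xi * pi for xi, pi in zip(x, p)))   # probabilities sum to 1, mean is 4.5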

The connection to statistical thermodynamics The entropy is defined as S = -k_B Σ_i p_i*ln(p_i), i.e., the uncertainty H with the constant K chosen to be Boltzmann's constant k_B. Then, if the variable x_i is taken to be the energy E_i of state i, the maximum-entropy distribution p_i = e^(-μ*E_i)/Z is the canonical distribution of statistical thermodynamics, with the multiplier μ playing the role of 1/(k_B*T).

A disordered system may be found in any of a large number of different quantum states. If N_j = 1 for N different states and N_j = 0 for all other available states, then p_j = N_j/N = 1/N for the occupied states and the entropy is S = k_B*ln(N). This function is positive and increases with increasing N. Associating N_j/N with the probability p_j recovers S = -k_B Σ_j p_j*ln(p_j). The expected amount of information we would gain on learning the actual state is a measure of our lack of knowledge of the state of the system; information in this sense is the negative of entropy (negentropy).

The Boltzmann distribution for non-degenerate energy states: p_i = e^(-E_i/(k_B*T))/Z, where Z = Σ_i e^(-E_i/(k_B*T)) is the partition function.
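A minimal numerical sketch (illustrative energy values of my own, not from the slides) evaluating the Boltzmann distribution for a few non-degenerate levels:

    import math

    k_B = 1.380649e-23            # Boltzmann constant, J/K
    T = 300.0                     # temperature, K
    E = [0.0, 1.0e-21, 2.0e-21]   # hypothetical non-degenerate energy levels, J

    boltzmann_factors = [math.exp(-Ei / (k_B * T)) for Ei in E]
    Z = sum(boltzmann_factors)                     # partition function
    p = [w / Z for w in boltzmann_factors]         # occupation probabilities

    print(p, sum(p))              # probabilities decrease with energy and sum to 1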

Summary Information theory is an extension of thermodynamics and probability theory. Much of the subject is associated with the names of Brillouin and Shannon. It was originally concerned with passing messages on telecommunication systems and with assessing the efficiency of codes. Today it is applied to a wide range of problems, ranging from the analysis of language to the design of computers. In this theory the word 'information' is used in a special sense. Suppose that we are initially faced with a problem about which we have no 'information' and that there are P possible answers. When we are given some 'information', this has the effect of reducing the number of possible answers, and if we are given enough 'information' we may get to a unique answer. The effect of increased information is thus to reduce the uncertainty about a situation. In a sense, therefore, information is the antithesis of entropy, since entropy is a measure of the randomness or disorder of a system. This contrast led to the coining of the word negentropy to describe information. The basic unit of information theory is the bit, a shortened form of 'binary digit'.

For example, if one is given a playing card face down without any information, it could be any one of 52; if one is then told that it is an ace, it could be any one of 4; if told that it is also a spade, one knows for certain which card one has. As we are given more information, the situation becomes more certain. In general, to determine which of the P equally likely possible outcomes is realized, the required information is defined as I = K ln P.
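Taking K = 1/ln(2) so that I comes out in bits, the playing-card example works out as in the short check below (my own illustrative code):

    import math

    def info_bits(P):
        """Information needed to single out one of P equally likely possibilities, in bits."""
        return math.log2(P)

    total = info_bits(52)                 # ~5.70 bits to identify the card outright
    step1 = info_bits(52) - info_bits(4)  # ~3.70 bits gained by learning "it is an ace"
    step2 = info_bits(4) - info_bits(1)   # 2 bits gained by learning "it is a spade"

    print(total, step1 + step2)           # the two stages together supply the full ~5.70 bits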