Lecture 3 Appendix 1 Computation of the conditional entropy.


Example

Let (X, Y) have the following joint distribution:

            X = 1    X = 2
   Y = 1      0       3/4
   Y = 2     1/8      1/8

Then H(X) = H(1/8, 7/8) = 0.544 bits, H(X|Y=1) = 0 bits and H(X|Y=2) = 1 bit. We calculate H(X|Y) = 3/4 H(X|Y=1) + 1/4 H(X|Y=2) = 0.25 bits. Thus the uncertainty in X is increased if Y=2 is observed and decreased if Y=1 is observed, but on average the uncertainty decreases.
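The numbers above can be checked numerically. The following is a minimal Python sketch (not part of the original slides; the helper name H is introduced here only for illustration) that evaluates the marginal entropy and the conditional entropies, and then their weighted average H(X|Y) = Σ_b p(Y=b) H(X|Y=b).

```python
# Minimal check of the example's numbers (assumed helper H, not from the slides).
from math import log2

def H(probs):
    """Entropy in bits of a discrete distribution given as a list of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Marginal of X: p(X=1) = 0 + 1/8 = 1/8, p(X=2) = 3/4 + 1/8 = 7/8
print(H([1/8, 7/8]))          # H(X)     ~ 0.544 bits

# Conditional distributions of X given each value of Y
print(H([0, 1]))              # H(X|Y=1) = 0 bits (X is certainly 2)
print(H([1/2, 1/2]))          # H(X|Y=2) = 1 bit  (X is 1 or 2 with equal probability)

# Weighted average: H(X|Y) = p(Y=1)*H(X|Y=1) + p(Y=2)*H(X|Y=2)
print(3/4 * 0 + 1/4 * 1)      # H(X|Y)   = 0.25 bits
```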

Computation of Conditional Entropy

To compute the conditional entropy in this example, we recall its definition:

H(X|Y) = -\sum_{a,b} p(x=a, y=b) \log_2 p(x=a \mid y=b),

where p(x=a|y=b) is the conditional probability, defined as

p(x=a \mid y=b) = \frac{p(x=a, y=b)}{p(y=b)},

where p(x=a, y=b) is the joint probability, represented in a matrix such as the one in the previous example, and where p(y=b) is the marginal probability:

p(y=b) = \sum_a p(x=a, y=b).

That is, the probability of Y taking a particular value y=b is the sum of the joint probabilities of that outcome for Y over all possible outcomes for X (summing the elements of the corresponding row of the matrix in the example).
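As an illustration of these definitions, here is a small Python sketch (again not from the original slides; the function name conditional_entropy is hypothetical) that computes H(X|Y) directly from a joint probability matrix, forming the marginal p(y=b) as a row sum and the conditional p(x=a|y=b) as the ratio of the joint to the marginal. It assumes the same orientation as the table above: rows index values of Y, columns index values of X.

```python
# Sketch of H(X|Y) = -sum_{a,b} p(x=a, y=b) * log2 p(x=a | y=b),
# with p(x=a | y=b) = p(x=a, y=b) / p(y=b) and p(y=b) = sum_a p(x=a, y=b).
from math import log2

def conditional_entropy(joint):
    """H(X|Y) in bits from a joint matrix with joint[b][a] = p(X=a, Y=b)."""
    h = 0.0
    for row in joint:                     # one row per value b of Y
        p_y = sum(row)                    # marginal p(y=b): sum of the row
        for p_xy in row:                  # joint probability p(x=a, y=b)
            if p_xy > 0:
                p_x_given_y = p_xy / p_y  # conditional probability p(x=a | y=b)
                h -= p_xy * log2(p_x_given_y)
    return h

joint = [[0, 3/4],    # Y = 1
         [1/8, 1/8]]  # Y = 2
print(conditional_entropy(joint))  # ~ 0.25 bits, matching the example
```

Applied to the joint matrix of the example, this returns 0.25 bits, in agreement with the weighted-average computation on the first slide.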

Conditional Entropy (Example)

Referring to the example in the first slide, we have:

p(y=1) = 0 + 3/4 = 3/4,  p(y=2) = 1/8 + 1/8 = 1/4,

p(x=1|y=1) = 0 and p(x=2|y=1) = 1, so H(X|Y=1) = 0 bits,
p(x=1|y=2) = p(x=2|y=2) = (1/8)/(1/4) = 1/2, so H(X|Y=2) = 1 bit,

and therefore

H(X|Y) = p(y=1) H(X|Y=1) + p(y=2) H(X|Y=2) = (3/4)(0) + (1/4)(1) = 0.25 bits.