1
Bayesian Reasoning: Maximum Entropy
A/Prof Geraint F. Lewis
Rm 560: gfl@physics.usyd.edu.au
2
Common Sense
We have spent quite a bit of time exploring the posterior probability distribution, but to calculate it we need the likelihood function and our prior knowledge. How that prior knowledge is encoded is the biggest source of argument about Bayesian statistics, with critics complaining that subjective choices influence the outcome (but shouldn't this be the case?). Realistically, we could consider a wealth of prior probability distributions that agree with the available constraints (e.g. a specified mean), so which do we choose? Answer: we pick the one which is maximally non-committal about the missing information.
3
Shannon's theorem
In 1948, Shannon developed a measure of the uncertainty of a probability distribution, which he labelled entropy. He showed that the uncertainty of a discrete probability distribution {p_i} is

$$S = -\sum_i p_i \ln p_i.$$

Jaynes argued that the maximally non-committal probability distribution is the one with the maximum entropy; hence, of all possible probability distributions consistent with the constraints, we should choose the one that maximizes S. The other distributions imply some sort of correlation (we'll see this in a moment).
4
Example
You are told that an experiment has two possible outcomes; what is the maximally non-committal distribution you should assign to them? Clearly, if we assign p_1 = x, then p_2 = (1 - x) and the entropy is

$$S(x) = -x\ln x - (1-x)\ln(1-x).$$

The maximum value of the entropy occurs at p_1 = p_2 = 1/2. But isn't this what you would have guessed? If we have any further information (e.g. the existence of a correlation between outcomes 1 and 2), we can build it into the measure above and re-maximize.
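As a quick check of the maximization step (a worked line of calculus, not on the original slide):

$$\frac{dS}{dx} = -\ln x - 1 + \ln(1-x) + 1 = \ln\!\frac{1-x}{x} = 0 \quad\Rightarrow\quad x = \tfrac{1}{2}, \qquad \frac{d^2S}{dx^2} = -\frac{1}{x(1-x)} < 0,$$

so the stationary point is indeed a maximum, with S_max = ln 2.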
5
The Kangaroo Justification
Suppose you are given some basic information about the population of Australian kangaroos:
1) 1/3 of kangaroos have blue eyes
2) 1/3 of kangaroos are left-handed
How many kangaroos are both blue-eyed and left-handed? Writing the joint proportions as a 2x2 table, we know that

                      Left-handed: True    Left-handed: False
  Blue eyes: True            p1                   p2
  Blue eyes: False           p3                   p4

with p1 + p2 = 1/3 (blue-eyed), p1 + p3 = 1/3 (left-handed) and p1 + p2 + p3 + p4 = 1.
6
The Kangaroo Justification
What are the options?
1) Independent case (no correlation): p1 = 1/3 x 1/3 = 1/9
2) Maximal positive correlation: p1 = 1/3 (every blue-eyed kangaroo is left-handed)
3) Maximal negative correlation: p1 = 0 (no blue-eyed kangaroo is left-handed)
7
The Kangaroo Justification
So there is a range of potential p1 values (each of which sets all the other values), but which do we choose? Again, we wish to be non-committal and not assume any prior correlation (unless we have evidence to support a particular prior). What constraint can we put on {p_i} to select this non-committal case?

  Variational function    Optimal z (= p1)    Implied correlation
  -Σ p_i ln(p_i)           1/9                 uncorrelated
  -Σ p_i^2                 1/12                negative
  Σ ln(p_i)                0.1303              positive
  Σ p_i^(1/2)              0.1218              positive

So the variational function that selects the non-committal (uncorrelated) case is the entropy. As we will see, this is very important for image reconstruction.
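A minimal numerical sketch (illustrative code, not from the lecture) that reproduces the optimal values in the table: with p1 = z, the constraints force p2 = p3 = 1/3 - z and p4 = 1/3 + z, so each variational function can be maximized over z in [0, 1/3].

```python
import numpy as np
from scipy.optimize import minimize_scalar

def probs(z):
    """Joint probabilities (p1, p2, p3, p4) implied by the kangaroo constraints."""
    return np.array([z, 1/3 - z, 1/3 - z, 1/3 + z])

# The four candidate variational functions from the table above.
objectives = {
    "-sum p ln p ": lambda p: -np.sum(p * np.log(p)),
    "-sum p^2    ": lambda p: -np.sum(p**2),
    "sum ln p    ": lambda p: np.sum(np.log(p)),
    "sum sqrt(p) ": lambda p: np.sum(np.sqrt(p)),
}

for name, f in objectives.items():
    # Maximize f by minimizing -f; stay strictly inside (0, 1/3) to avoid log(0).
    res = minimize_scalar(lambda z: -f(probs(z)),
                          bounds=(1e-9, 1/3 - 1e-9), method="bounded")
    print(name, "optimal p1 =", round(res.x, 4))
# Expected output: ~0.1111 (=1/9), ~0.0833 (=1/12), ~0.1303, ~0.1218
```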
8
Incorporating a prior
Section 8.4 of the textbook discusses a justification of the MaxEnt approach, considering the rolling of a weighted die and examining the "multiplicity" of the outcomes (i.e. some potential outcomes are more likely than others). Suppose you have some prior information you want to incorporate into the entropy measure, so that we have prior estimates {m_i} of our probabilities {p_i}. Following the same arguments, we see that the quantity we want to maximize is the Shannon-Jaynes entropy

$$S = -\sum_i p_i \ln\!\left(\frac{p_i}{m_i}\right).$$

If the m_i are all equal, this has no influence on the maximization; we will see that this is important when considering image reconstruction.
9
Incorporating a prior
When considering a continuous probability distribution, the entropy becomes

$$S = -\int p(y)\,\ln\!\left(\frac{p(y)}{m(y)}\right) dy,$$

where m(y) is known as the Lebesgue measure. This quantity (which still encodes our prior) ensures that the entropy is insensitive to a change of coordinates, as m(y) and p(y) transform in the same way.
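To see the claimed coordinate invariance explicitly (a short worked step, not on the original slide): under a change of variable y → z, both densities pick up the same Jacobian factor,

$$p'(z) = p(y)\left|\frac{dy}{dz}\right|, \qquad m'(z) = m(y)\left|\frac{dy}{dz}\right| \quad\Rightarrow\quad \frac{p'(z)}{m'(z)} = \frac{p(y)}{m(y)},$$

so the logarithm is unchanged and $-\int p'\ln(p'/m')\,dz = -\int p\ln(p/m)\,dy$.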
10
Some examples
Suppose you are told some experiment has n possible outcomes. Without further information, what prior distribution would you assign to the outcomes? Your prior estimates (without additional information) would be {m_i} = 1/n; what does MaxEnt then say the values of {p_i} should be? The quantity we maximize is our entropy with a Lagrange multiplier to account for the normalization constraint on the probabilities:

$$S_c = -\sum_i p_i \ln\!\left(\frac{p_i}{m_i}\right) + \lambda\left(1 - \sum_i p_i\right).$$
11
Some examples
Taking the (partial) derivatives of S_c with respect to the p_i and the multiplier, we can show that

$$\frac{\partial S_c}{\partial p_i} = -\ln\!\left(\frac{p_i}{m_i}\right) - 1 - \lambda = 0,$$

and so

$$p_i = m_i\, e^{-(1+\lambda)}.$$

All that is left is to evaluate λ, which we get from the normalization constraint:

$$\sum_i p_i = e^{-(1+\lambda)} \sum_i m_i = 1.$$

Given the constraint that the {m_i} sum to one, λ = -1 and {p_i} = {m_i}.
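A symbolic sketch of this stationarity condition (an assumed check, not the lecturer's code), treating a single term of the constrained entropy:

```python
import sympy as sp

p, m, lam = sp.symbols("p m lam", positive=True)

# One p_i term of S_c = -sum p_i ln(p_i/m_i) + lam*(1 - sum p_i);
# the constraint contributes -lam per component.
Sc_term = -p * sp.log(p / m) + lam * (1 - p)

stationary = sp.solve(sp.diff(Sc_term, p), p)
print(stationary)   # [m*exp(-lam - 1)]  ->  p_i = m_i e^{-(1+lam)}, so lam = -1 gives p_i = m_i
```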
12
Nicer examples
What if you have additional constraints, such as knowing the mean of the outcome? Then your constrained entropy takes the form

$$S_c = -\sum_i p_i \ln\!\left(\frac{p_i}{m_i}\right) + \lambda_0\left(1 - \sum_i p_i\right) + \lambda_1\left(\mu - \sum_i p_i x_i\right),$$

where we now have two Lagrange multipliers, one for each of the constraints. Through the same procedure, we can look for the maximum and find

$$p_i = m_i\, e^{-(1+\lambda_0)}\, e^{-\lambda_1 x_i}.$$

Generally, solving for the multipliers is difficult analytically, but it is straightforward numerically.
13
Nicer examples
Suppose that you are told that a die has a mean score of μ dots per roll; what is the probability weighting of each face? If the weightings are equal, the die is unweighted and fair. If, however, the probabilities are different, we should suppose that the die is biased. If μ = 3.5, it is easy to show from the constraints that λ0 = -1 and λ1 = 0 (write out the two constraint equations using the previous expression and divide one by the other to eliminate λ0). If we have no prior reason to think otherwise, each face is weighted equally in the prior, and so the final result is that {p_i} = {m_i}. The result is as we expect: for an (unweighted) average of 3.5, the most probable distribution is that all faces have equal weight.
14
Nicer examples
Suppose, however, you were told that the mean was μ = 4.5; what is the most probable distribution for {p_i}? We can follow the same procedure as in the previous example, but must now solve the constraint equations numerically, finding λ0 = -0.37 and λ1 = 0.49; with this, the distribution in {p_i} for faces 1 through 6 is approximately (0.05, 0.08, 0.11, 0.17, 0.24, 0.35). As we would expect, the distribution is now skewed towards the higher die faces (increasing the mean of a sequence of rolls).
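A minimal numerical sketch of this calculation (illustrative code, not from the lecture; the single multiplier lam below is the mean-constraint multiplier and may differ from the slide's (λ0, λ1) convention by a sign and a normalization offset):

```python
import numpy as np
from scipy.optimize import brentq

faces = np.arange(1, 7)
m = np.full(6, 1/6)                    # uniform prior estimates {m_i}

def maxent_probs(lam):
    """MaxEnt distribution p_i proportional to m_i * exp(lam * x_i), normalized."""
    w = m * np.exp(lam * faces)
    return w / w.sum()

def mean_error(lam, target=4.5):
    return maxent_probs(lam) @ faces - target

lam = brentq(mean_error, -5.0, 5.0)    # root-find the multiplier that gives a mean of 4.5
p = maxent_probs(lam)
print("multiplier ~", round(lam, 3))   # ~ 0.371
print("p_i ~", np.round(p, 3))         # ~ [0.054 0.079 0.114 0.165 0.240 0.347], skewed high
```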
15
Additional constraints
Additional information will provide additional constraints on the probability distribution. If we know a mean and a variance, then

$$\sum_i p_i = 1, \qquad \sum_i p_i x_i = \mu, \qquad \sum_i p_i (x_i - \mu)^2 = \sigma^2.$$

Given what we have seen previously, we should expect the solution (taking the continuum limit) to be of the form

$$p(x) \propto \exp\!\left(-\lambda_1 x - \lambda_2 x^2\right),$$

which, when appropriately normalized, is the (expected) Gaussian distribution.
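A short worked step (not on the original slide) showing why this normalizes to a Gaussian: completing the square in the exponent,

$$-\lambda_1 x - \lambda_2 x^2 = -\lambda_2\left(x + \frac{\lambda_1}{2\lambda_2}\right)^2 + \frac{\lambda_1^2}{4\lambda_2},$$

so, identifying $\lambda_2 = 1/(2\sigma^2)$ and $\mu = -\lambda_1/(2\lambda_2)$, the normalized distribution is

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left[-\frac{(x-\mu)^2}{2\sigma^2}\right].$$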
16
Image Reconstruction
In science, we are interested in gleaning underlying physical properties from data sets, but in general the data contain signals that have been blurred (through optics or other physical effects), with noise added (such as photon arrival statistics or detector noise). So, how do we extract our image from the blurry, noisy data?
17
Image Reconstruction
Naively, you might assume that you can simply "invert" the process and recover the original image. However, the problem is ill-posed, and a direct "deconvolution" will amplify the noise in a (usually) catastrophic way. We could attempt to suppress the noise (e.g. Wiener filtering), but isn't there another way?
18
Image Reconstruction
Our image consists of a series of pixels, each with a photon count I_i. We can treat this as a probability distribution, such that

$$p_i = \frac{I_i}{\sum_j I_j}.$$

The value in each pixel, therefore, is the probability that the next photon will arrive in that pixel. Note that for an image p_i ≥ 0, and so we are dealing with a "positive, additive distribution" (this is important, as some techniques like to add negative flux in regions to improve a reconstruction).
19
Image Reconstruction
We can apply Bayes' theorem to calculate the posterior probability of a proposed "true" image, Im_i, given the data. Following the argument given in the text, the entropic prior, proportional to exp(αS), combines with the likelihood, proportional to exp(-χ²/2), so that

$$p(\{Im_i\}\,|\,D) \propto \exp\!\left(\alpha S - \frac{\chi^2}{2}\right),$$

where α sets the relative weight of the entropy term.
20
Image Reconstruction
So we aim to maximize

$$\alpha S - \frac{\chi^2}{2}.$$

The method, therefore, requires us to have a way of generating proposal images (i.e. throwing down blobs of light), convolving them with our blurring function (to give the predicted counts I_ji), and comparing to the data through χ². The requirement p_i ≥ 0 ensures that the proposal image is everywhere positive (which is good!). What does the entropy term do? It provides a "regularization" which drives the solution towards our prior distribution (m_i), while the χ² term drives the fit to the data. Note, however, that we sometimes need to add further regularization terms to enforce smoothness on the solution.
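A toy one-dimensional sketch of this scheme (illustrative only: the optimizer, the flat prior image m, and the fixed weight alpha are assumptions, not the method used in the lecture). It blurs a proposal image with a known psf and maximizes alpha*S - chi^2/2 against noisy data, with positivity enforced by parametrizing the image through its logarithm.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# "True" image: two point sources on a 32-pixel strip.
true_img = np.zeros(32)
true_img[[10, 20]] = [5.0, 3.0]

psf = np.exp(-0.5 * (np.arange(-4, 5) / 1.5) ** 2)   # Gaussian blurring kernel
psf /= psf.sum()

def blur(img):
    return np.convolve(img, psf, mode="same")

sigma = 0.05                                          # per-pixel noise level
data = blur(true_img) + rng.normal(0.0, sigma, true_img.size)

m = np.full(true_img.size, true_img.sum() / true_img.size)   # flat prior image {m_i}

def neg_objective(log_img, alpha=1.0):
    img = np.exp(log_img)                             # everywhere positive by construction
    S = -np.sum(img * np.log(img / m))                # Shannon-Jaynes entropy relative to m
    chi2 = np.sum((blur(img) - data) ** 2) / sigma**2
    return -(alpha * S - 0.5 * chi2)                  # minimize the negative of alpha*S - chi^2/2

res = minimize(neg_objective, x0=np.log(m), method="L-BFGS-B")
recon = np.exp(res.x)
print("chi^2 of reconstruction:", round(np.sum((blur(recon) - data) ** 2) / sigma**2, 1))
```

In practice the weight alpha is not fixed by hand but chosen so that the fit is statistically reasonable (e.g. χ² comparable to the number of pixels); it is held at 1 here purely for illustration.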
21
Image Reconstruction
Here is an example of MaxEnt reconstruction with differing point-spread functions (psf) and added noise. Exactly what you get back depends on the quality of your data, but in each case you can read the recovered message.
22
Image Reconstruction
Reconstruction of the radio galaxy M87 (Bryan & Skilling 1980) using MaxEnt. Note the reduction in the noise and the higher level of detail visible in the radio jet.
23
Image Reconstruction
Not always a good thing!! (MaxEnt)