Updating Probabilities Ariel Caticha and Adom Giffin Department of Physics University at Albany - SUNY MaxEnt 2006.

1 Updating Probabilities Ariel Caticha and Adom Giffin Department of Physics University at Albany - SUNY MaxEnt 2006

2 Overview: Maximum Entropy (ME): the logic behind the ME method (axioms, etc.). Why entropy? Which entropy? Candidates: relative entropy, Rényi, Tsallis. Bayes’ rule: Bayes’ theorem vs. Bayes’ rule. Compatibility with ME? Bayes’ rule is a special case of ME updating.

3 Entropy and heat, multiplicities, ... (Clausius, Maxwell, Boltzmann, Gibbs, ...). Entropy as a measure of information, MaxEnt (Shannon, Jaynes, Kullback, ...). Entropy as a tool for updating, ME (Shore & Johnson, Skilling, Csiszár, ...). Bayes’ rule as a special case of ME (Williams, Diaconis & Zabell, ...).

4 An important distinction: MaxEnt is a method to assign probabilities (relative to an underlying measure). ME is a method to update probabilities (starting from a prior).

5 The logic behind the ME method. The goal: to update from old beliefs, the prior q(x), to new beliefs, the posterior p(x), when new information becomes available in the form of constraints. Information is what induces a change in beliefs. Information is what constrains beliefs.

6 Question: How do we select a distribution from among all those that satisfy the constraints? Skilling: rank the distributions according to preference. Transitivity: if $P_1$ is better than $P_2$, and $P_2$ is better than $P_3$, then $P_1$ is better than $P_3$. To each $P$ assign a real number $S[P]$ such that $S[P_1] > S[P_2]$ whenever $P_1$ is preferred to $P_2$.

7 This answers the question “Why entropy?” Remarks: entropies are real-valued and are maximized by design. Next question: how do we select the functional S[P]? Answer: use induction. We want to generalize from special cases where we know the best distribution to all other cases.

8 Skilling’s method of induction: if a general theory exists, it must apply to special cases. If a special case is known, it can be used to constrain the general theory. The known special cases are called the axioms. If enough special cases are known, the general theory is constrained completely.* (*But if too many are imposed, the general theory might not exist.)

9 How do we choose the axioms? Basic principle: minimal updating. Prior information is valuable; do not waste it. Only update those features for which there is hard evidence. (Shore & Johnson, Skilling, Karbelkar, Uffink, A.C., ...)

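10 Axiom 1: Locality. Local information has local effects: if the information does not refer to a domain D, then p(x|D) is not updated. Consequence: $S[p] = \int dx\, F(p(x), x)$. Axiom 2: Coordinate invariance. Coordinates carry no information. Consequence: $S[p] = \int dx\, m(x)\, \Phi\!\left(\frac{p(x)}{m(x)}\right)$, where $m(x)\,dx$ and $p(x)/m(x)$ are invariants.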
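A quick numerical check of the Axiom 2 consequence, offered as a minimal sketch: under a change of variables both p(x) and m(x) pick up the same Jacobian, so p/m and m dx are invariant and the functional takes the same value in either coordinate. The densities p(x) = 2x and m(x) = 1 on (0, 1), the map y = (e^x − 1)/(e − 1), and the choice Φ(u) = −u log u are illustrative assumptions; at this stage Φ is still undetermined, and any Φ would pass the same check.

```python
import math

def phi(u):
    # Illustrative choice Phi(u) = -u log u (with Phi(0) = 0); Axiom 2 leaves Phi open.
    return -u * math.log(u) if u > 0 else 0.0

def functional(p, m, a, b, n=20000):
    # Midpoint-rule estimate of S[p] = ∫ dx m(x) Phi(p(x)/m(x)) over (a, b).
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        total += m(x) * phi(p(x) / m(x))
    return total * h

E1 = math.e - 1.0

def p_x(x):  # density in the original coordinate x
    return 2.0 * x

def m_x(x):  # measure density in x
    return 1.0

def x_of_y(y):  # inverse of the map y = (e^x - 1)/(e - 1)
    return math.log1p(E1 * y)

def dxdy(y):  # Jacobian |dx/dy|
    return E1 / (1.0 + E1 * y)

def p_y(y):  # densities transform with the Jacobian ...
    return p_x(x_of_y(y)) * dxdy(y)

def m_y(y):  # ... so the ratio p/m is unchanged
    return m_x(x_of_y(y)) * dxdy(y)

s_x = functional(p_x, m_x, 0.0, 1.0)
s_y = functional(p_y, m_y, 0.0, 1.0)
print(s_x, s_y)  # both close to 0.5 - ln 2 ≈ -0.1931
```

Replacing `phi` with any other function leaves the two estimates equal to each other, which is the point of the axiom: it fixes the form of S but not Φ.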

11 To determine m(x), use Axiom 1 (Locality) again: if there is no new information, there is no update. Consequence: m(x) = q(x), the prior. To determine Φ we need a new axiom. Axiom 3: Consistency for independent systems. When systems are independent, it should not matter whether they are treated jointly or separately. Consequence: caution!!

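12 Implementing Axiom 3. Single system 1: maximize $S[p_1, q_1]$ subject to $\int dx_1\, p_1(x_1) f_1(x_1) = F_1$. The selected posterior is $p_1(x_1)$. Single system 2: maximize $S[p_2, q_2]$ subject to $\int dx_2\, p_2(x_2) f_2(x_2) = F_2$, to select $p_2(x_2)$.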

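13 Combined system 1+2: maximize $S[p, q_1 q_2]$ subject to the same constraints, $\int dx_1\,dx_2\, p(x_1,x_2) f_1(x_1) = F_1$ and $\int dx_1\,dx_2\, p(x_1,x_2) f_2(x_2) = F_2$. Alternative 1 (Shore & Johnson, A.C.): require that the posterior be $p_1(x_1)\,p_2(x_2)$. Consequence: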
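To make Alternative 1 concrete, the sketch below assumes the relative entropy −Σ p log(p/q) (the η = 0 case) and discrete expectation constraints; the priors, constraint functions, and target values are made-up numbers. It selects each posterior separately, then maximizes over the joint space with both constraints, and checks that the joint posterior is the product p1(x1)p2(x2). Lagrange multipliers are found by bisection, which works because the tilted mean ⟨f⟩ decreases monotonically in λ.

```python
import math

def tilt(q, f, lam):
    # p_i ∝ q_i exp(-lam * f_i): the maximizer of -sum p log(p/q)
    # subject to normalization and one expectation constraint on f.
    w = [qi * math.exp(-lam * fi) for qi, fi in zip(q, f)]
    z = sum(w)
    return [wi / z for wi in w]

def mean(p, f):
    return sum(pi * fi for pi, fi in zip(p, f))

def solve_lambda(expectation, target, lo=-50.0, hi=50.0, iters=200):
    # expectation(lam) decreases in lam (its derivative is -Var f), so bisect.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if expectation(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical priors, constraint functions, and constraint values.
q1, f1, F1 = [0.2, 0.3, 0.5], [0.0, 1.0, 2.0], 0.8
q2, f2, F2 = [0.6, 0.4], [1.0, 3.0], 2.5

# Single systems, updated separately.
lam1 = solve_lambda(lambda t: mean(tilt(q1, f1, t), f1), F1)
lam2 = solve_lambda(lambda t: mean(tilt(q2, f2, t), f2), F2)
p1, p2 = tilt(q1, f1, lam1), tilt(q2, f2, lam2)

# Combined system on the product space: prior q1*q2, same two constraints.
pairs = [(i, j) for i in range(len(q1)) for j in range(len(q2))]
q12 = [q1[i] * q2[j] for i, j in pairs]
g1 = [f1[i] for i, _ in pairs]
g2 = [f2[j] for _, j in pairs]

def joint(a, b):
    w = [qk * math.exp(-a * u - b * v) for qk, u, v in zip(q12, g1, g2)]
    z = sum(w)
    return [wk / z for wk in w]

a = b = 0.0
for _ in range(20):  # alternate between the two multipliers
    a = solve_lambda(lambda t: mean(joint(t, b), g1), F1)
    b = solve_lambda(lambda t: mean(joint(a, t), g2), F2)
p12 = joint(a, b)

gap = max(abs(p12[k] - p1[i] * p2[j]) for k, (i, j) in enumerate(pairs))
print(gap)  # essentially zero: the joint posterior factorizes
```

The joint solver never assumes a product form; the factorization emerges because the exponent is a sum of a term in x1 and a term in x2.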

14 Alternative 2 (Karbelkar, Uffink): impose the additional constraint that the systems are independent, $p(x_1,x_2) = p_1(x_1)\,p_2(x_2)$, and require that the posterior be $p_1(x_1)\,p_2(x_2)$. Consequence:

15 It appears there is a continuum of η-entropies. How can we live with η-dependent updating? Is this an insurmountable problem? NO!! The solution: we are doing induction. We just need more known special cases to single out a unique S[P,q].
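To see why a continuum of η-entropies is a problem, here is a small sketch. It assumes the η-entropies take the form S_η[p,q] = (1 − Σ p^(η+1) q^(−η)) / (η(η+1)) (our reading of the η-family in Caticha’s related work, written for discrete distributions), and compares two hypothetical candidate posteriors that satisfy the same constraint ⟨f⟩ = 1.5 under a uniform prior. The preferred candidate flips between η = 1 and η = −0.5, so different values of η rank the same admissible distributions differently.

```python
def eta_entropy(p, q, eta):
    # Assumed discrete form of the eta-entropy, valid for eta not in {0, -1}:
    # S_eta[p, q] = (1 - sum_i p_i^(eta+1) q_i^(-eta)) / (eta * (eta + 1)).
    if eta in (0, -1):
        raise ValueError("eta = 0 and eta = -1 are defined as limits")
    total = sum(pi ** (eta + 1) * qi ** (-eta) for pi, qi in zip(p, q))
    return (1.0 - total) / (eta * (eta + 1))

def mean(p, f):
    return sum(pi * fi for pi, fi in zip(p, f))

q = [1 / 3] * 3          # uniform prior
f = [0.0, 3.0, 7.0]      # hypothetical constraint function
pa = [0.5, 0.5, 0.0]     # two hypothetical candidates with <f> = 1.5
pb = [0.7, 0.15, 0.15]
print(round(mean(pa, f), 12), round(mean(pb, f), 12))  # 1.5 1.5

for eta in (1.0, -0.5):
    sa, sb = eta_entropy(pa, q, eta), eta_entropy(pb, q, eta)
    print(eta, "prefers", "pa" if sa > sb else "pb")
# eta = 1.0 prefers pa; eta = -0.5 prefers pb
```

This compares only two candidates rather than carrying out a full maximization, but it is enough to show that the ranking, and hence the update, depends on η.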

16 What could this “inference index” η be? Is it a property of the system or of the subject? Suppose η is a property of the system. But... the derivation implicitly assumed that the independent systems had the same η!! Different systems could have different ηs.

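17 Independent systems with different ηs. Single system 1: use $S_{\eta_1}[p_1, q_1]$. Single system 2: use $S_{\eta_2}[p_2, q_2]$. Combined system 1+2: use $S_\eta[p, q_1 q_2]$ with some undetermined η. But this is equivalent to using $S_\eta[p_1, q_1]$ for system 1 and $S_\eta[p_2, q_2]$ for system 2. Therefore $\eta_1 = \eta$ and $\eta_2 = \eta$.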

18 Consistency requires that η be a universal constant. What is the value of η? Measure it! For any thermodynamical system, η = 0. Conclusion: the only consistent ranking criterion for updating probabilities is the relative entropy $S[p,q] = -\int dx\, p(x) \log\frac{p(x)}{q(x)}$.
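A minimal numerical check of how the relative entropy sits inside the η-family, assuming the same discrete form S_η[p,q] = (1 − Σ p^(η+1) q^(−η)) / (η(η+1)) as a working hypothesis: as η → 0 it approaches −Σ p log(p/q), and as η → −1 it approaches the reversed form −Σ q log(q/p). The distributions are arbitrary strictly positive examples, and η is taken close to, not equal to, the limiting values because the formula is singular there.

```python
import math

def eta_entropy(p, q, eta):
    # Assumed discrete eta-entropy, defined for eta not in {0, -1}.
    total = sum(pi ** (eta + 1) * qi ** (-eta) for pi, qi in zip(p, q))
    return (1.0 - total) / (eta * (eta + 1))

def relative_entropy(p, q):
    # S[p, q] = -sum p log(p/q), the eta = 0 member.
    return -sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

q = [0.25, 0.25, 0.25, 0.25]   # prior
p = [0.4, 0.3, 0.2, 0.1]       # candidate posterior (strictly positive)

for eta in (1e-2, 1e-4, 1e-6):
    print(eta, eta_entropy(p, q, eta), relative_entropy(p, q))
# The gap shrinks roughly in proportion to eta.

# The eta -> -1 limit gives the reversed relative entropy -sum q log(q/p).
print(eta_entropy(p, q, -1 + 1e-6), relative_entropy(q, p))
```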

19 Bayesian updating. Bayes’ theorem: $q(\theta|x) = \frac{q(\theta)\, q(x|\theta)}{q(x)}$. This is just consistency; there has been no updating. The actual updating occurs when we use the observed data X.

20 [Diagram: prior and posterior, linked by Bayes’ theorem (consistency) and by Bayes’ rule (update using the observed data)]

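21 Bayes’ rule from ME. Maximize the appropriate entropy, $S[p,q] = -\int dx\, d\theta\, p(x,\theta) \log\frac{p(x,\theta)}{q(x,\theta)}$, subject to the right constraints, $\int d\theta\, p(x,\theta) = \delta(x - X)$, plus normalization. This is an infinite number of constraints: one for each x.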

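22 The joint posterior is $p(x,\theta) = \delta(x - X)\, q(\theta|x)$, so that the new marginal for θ is $p(\theta) = \int dx\, p(x,\theta) = q(\theta|X) = \frac{q(\theta)\, q(X|\theta)}{q(X)}$, which is Bayes’ rule!!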
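The result can be checked on a small discrete model, as a sketch with made-up numbers: two parameter values `"a"` and `"b"`, three possible data values, and observed X = 2 are all hypothetical. The ME candidate p(x,θ) = δ(x,X) q(θ|x) satisfies the data constraint, its θ-marginal equals the Bayes’ rule posterior q(θ)q(X|θ)/q(X), and randomly drawn alternatives satisfying the same constraint all have lower relative entropy.

```python
import math
import random

# Hypothetical model: prior q(theta) and likelihood q(x | theta), x in {0, 1, 2}.
prior = {"a": 0.3, "b": 0.7}
likelihood = {"a": [0.6, 0.3, 0.1], "b": [0.2, 0.3, 0.5]}
X = 2  # observed data value

# Joint prior q(x, theta) = q(theta) q(x | theta).
q_joint = {(x, t): prior[t] * likelihood[t][x] for t in prior for x in range(3)}

def rel_entropy(p, q):
    # S[p, q] = -sum p log(p/q), skipping states where p = 0.
    return -sum(pk * math.log(pk / q[k]) for k, pk in p.items() if pk > 0)

# Bayes' rule: q(theta | X) = q(theta) q(X | theta) / q(X).
q_X = sum(q_joint[(X, t)] for t in prior)
bayes = {t: q_joint[(X, t)] / q_X for t in prior}

# ME posterior: all mass at x = X, spread over theta as q(theta | X).
p_me = {(x, t): (bayes[t] if x == X else 0.0) for (x, t) in q_joint}
marginal = {t: sum(p_me[(x, t)] for x in range(3)) for t in prior}
print(marginal)  # equals bayes: about {'a': 0.0789, 'b': 0.9211}

# Any other admissible joint puts mass u, 1 - u on (X, "a"), (X, "b").
rng = random.Random(0)
s_me = rel_entropy(p_me, q_joint)
all_lower = True
for _ in range(1000):
    u = rng.random()
    other = {k: 0.0 for k in q_joint}
    other[(X, "a")], other[(X, "b")] = u, 1.0 - u
    all_lower = all_lower and rel_entropy(other, q_joint) <= s_me + 1e-12
print(all_lower)  # True
```

Restricting the alternatives to x = X is not an extra assumption: the constraint forces every admissible joint to vanish elsewhere, so only the split over θ remains free.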

23 Conclusions and remarks: Entropy is the unique tool for updating probabilities. Basic principle: minimal updating. Entropy needs no interpretation. Bayes’ rule is a special case of ME. Information is what constrains beliefs.

