Updating Probabilities
Ariel Caticha and Adom Giffin
Department of Physics, University at Albany - SUNY
MaxEnt 2006

2 Overview:
- Why entropy? The logic behind the ME method: axioms, etc.
- Which entropy? Candidates: relative entropy, Renyi, Tsallis.
- Bayes' rule and Maximum Entropy (ME): Bayes' theorem vs. Bayes' rule; compatibility with ME? Bayes' rule is a special case of ME updating.

3 Entropy and heat, multiplicities,...: Clausius, Maxwell, Boltzmann, Gibbs,...
Entropy as a measure of information (MaxEnt): Shannon, Jaynes, Kullback,...
Entropy as a tool for updating (ME): Shore & Johnson, Skilling, Csiszar,...
Bayes' rule as a special case of ME: Williams, Diaconis & Zabell,...

4 An important distinction: MaxEnt is a method to assign probabilities (on the basis of a measure). ME is a method to update probabilities (on the basis of a prior).

5 The logic behind the ME method. The goal: to update from old beliefs to new beliefs when new information becomes available. The prior q(x) is updated to the posterior p(x); the new information enters in the form of constraints. Information is what induces a change in beliefs: information is what constrains beliefs.

6 Question: How do we select a distribution from among all those that satisfy the constraints? Skilling: rank the distributions according to preference. Transitivity: if P1 is better than P2, and P2 is better than P3, then P1 is better than P3. To each P assign a real number S[P] such that S[P1] > S[P2] if and only if P1 is better than P2.

7 This answers the question "Why entropy?" Remarks: entropies are real, and they are maximized by design. Next question: How do we select the functional S[P]? Answer: use induction. We want to generalize from special cases where we know the best distribution to all other cases.

8 Skilling's method of induction: If a general theory exists, it must apply to special cases. If a special case is known, it can be used to constrain the general theory. If enough special cases are known, the general theory is constrained completely.* The known special cases are called the axioms. (*But if too many are imposed, the general theory might not exist.)

9 How do we choose the axioms? Basic principle: minimal updating. Prior information is valuable; do not waste it. Only update those features for which there is hard evidence. (Shore & Johnson, Skilling, Karbelkar, Uffink, A.C.,...)

10 Axiom 1: Locality. Local information has local effects: if the information does not refer to a domain D, then p(x|D) is not updated. Consequence: S[p] = ∫dx F(p(x), x).
Axiom 2: Coordinate invariance. Coordinates carry no information. Consequence: S[p] = ∫dx m(x) Φ(p(x)/m(x)), a function of coordinate invariants, where m(x) is some density yet to be determined.

11 To determine m(x), use Axiom 1 (Locality) again: if there is no new information there is no update. Consequence: m(x) ∝ q(x), the prior. To determine Φ we need a new axiom. Axiom 3: Consistency for independent systems. When systems are independent it should not matter whether they are treated jointly or separately. Consequence: caution!!

12 Implementing Axiom 3. Single system 1: maximize S[p1, q1] subject to the constraint ⟨f1(x1)⟩ = F1; the selected posterior is P1(x1). Single system 2: maximize S[p2, q2] subject to ⟨f2(x2)⟩ = F2, to select P2(x2).

13 Combined system 1+2: maximize S[p, q1 q2] over the joint distribution p(x1, x2), subject to the same two constraints ⟨f1⟩ = F1 and ⟨f2⟩ = F2. Alternative 1 (Shore & Johnson, A.C.): require that the selected joint posterior be the product P1(x1) P2(x2).

14 Imposing this as an additional constraint restricts the acceptable functionals S. Alternative 2 (Karbelkar, Uffink): a weaker reading of the same requirement, demanding only that the joint update be consistent with the separately selected P1(x1) and P2(x2).
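Before turning to the η-entropies, here is a small numerical sketch of what Axiom 3 asks (an illustration, not part of the talk): for the logarithmic relative entropy, updating two independent systems jointly or separately gives the same answer. The priors, constraint functions, numbers, and the helper me_update below are made-up assumptions; the maximization uses SciPy's generic constrained optimizer.

```python
# Illustrative check (not from the talk): for S[p,q] = -sum_i p_i log(p_i/q_i),
# updating two independent systems jointly or separately gives the same answer.
# All priors, constraint functions and numbers below are made-up assumptions.
import numpy as np
from scipy.optimize import minimize

def me_update(q, constraints):
    """Maximize the relative entropy S[p,q] subject to normalization and
    linear expectation constraints sum_i p_i f_i = F."""
    cons = [{"type": "eq", "fun": lambda p: p.sum() - 1.0}]
    for f, F in constraints:
        cons.append({"type": "eq", "fun": lambda p, f=f, F=F: p @ f - F})
    neg_S = lambda p: np.sum(np.clip(p, 1e-12, None) *
                             np.log(np.clip(p, 1e-12, None) / q))
    res = minimize(neg_S, q.copy(), constraints=cons,
                   bounds=[(1e-12, 1.0)] * q.size, method="SLSQP")
    return res.x

# Separate treatment: one expectation constraint per system.
q1, f1, F1 = np.array([0.1, 0.2, 0.3, 0.4]), np.array([0., 1., 2., 3.]), 1.5
q2, f2, F2 = np.array([0.25, 0.25, 0.5]),    np.array([1., 0., 2.]),     1.2
P1 = me_update(q1, [(f1, F1)])
P2 = me_update(q2, [(f2, F2)])

# Joint treatment: product prior, the same two constraints lifted to the joint space.
Q   = np.outer(q1, q2).ravel()
f1j = np.repeat(f1, q2.size)          # f1 depends only on x1
f2j = np.tile(f2, q1.size)            # f2 depends only on x2
P12 = me_update(Q, [(f1j, F1), (f2j, F2)])

print(np.max(np.abs(P12 - np.outer(P1, P2).ravel())))   # ~0: joint update = product of separate updates
```

This factorization is exactly the behaviour Axiom 3 demands when the constraints refer to the two systems separately.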

15 It appears there is a continuum of η-entropies. How can we live with η-dependent updating? Is this an insurmountable problem? NO!! The solution: we are doing induction; we just need more known special cases to single out a unique S[P,q].

16 What could this "inference index" η be? Is it a property of the system or of the subject? Suppose η is a property of the system; then different systems could have different ηs. But... the derivation implicitly assumed that the independent systems had the same η!!

17 Independent systems with different ηs. Single system 1: use S_η1[p1, q1]. Single system 2: use S_η2[p2, q2]. Combined system 1+2: use S_η[p, q1 q2] with some undetermined η. But this is equivalent to using S_η[p1, q1] and S_η[p2, q2]. Therefore η = η1 and η = η2.

18 Consistency requires that η be a universal constant. What is the value of η? Measure it! For any thermodynamical system the measured value is η = 0. Conclusion: the only consistent ranking criterion for updating probabilities is the logarithmic relative entropy, S[p, q] = -∫dx p(x) log( p(x)/q(x) ).
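To make the conclusion concrete, here is a minimal sketch (illustrative, not from the talk) of an ME update with this entropy: for a prior q and a single expectation constraint ⟨f⟩ = F, the maximizer has the exponential form p(x) ∝ q(x) exp(λ f(x)), with the multiplier λ fixed by the constraint. The die example and its numbers are assumptions chosen for illustration.

```python
# Minimal sketch (illustrative, not from the talk): ME updating with
# S[p,q] = -sum_i p_i log(p_i/q_i) and one constraint <f> = F.
# The maximizer is p_i ∝ q_i exp(lambda * f_i); lambda is fixed by the constraint.
import numpy as np
from scipy.optimize import brentq

def me_posterior(q, f, F):
    def constraint_gap(lam):
        w = q * np.exp(lam * (f - f.mean()))     # shift for numerical stability
        p = w / w.sum()
        return p @ f - F
    lam = brentq(constraint_gap, -50.0, 50.0)    # assumes F lies strictly inside the range of f
    w = q * np.exp(lam * (f - f.mean()))
    return w / w.sum()

# Uniform prior over a die, updated to have mean 4.5 (a Jaynes-style example).
q = np.full(6, 1.0 / 6.0)
f = np.arange(1.0, 7.0)
p = me_posterior(q, f, F=4.5)
print(np.round(p, 4), p @ f)    # probabilities tilt toward high faces; the mean comes out 4.5
```

If F equals the prior mean 3.5, the root is λ = 0 and p = q: no new information, no update, in line with the locality axiom.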

19 Bayesian updating. Bayes' theorem: q(θ|x) = q(θ) q(x|θ) / q(x). This is just consistency (a rewriting of the product rule); there has been no updating. The actual updating occurs when we use the observed data X.

20 [Diagram] The prior q(θ) is updated to the posterior p(θ) = q(θ|X), where X is the observed data: Bayes' theorem supplies the conditional probabilities (consistency), while Bayes' rule performs the update.

21 Bayes' rule from ME. Maximize the appropriate entropy, S[p, q] = -∫dθ dx p(θ, x) log( p(θ, x)/q(θ, x) ), subject to the right constraints: the observed data X fixes the marginal over x, ∫dθ p(θ, x) = δ(x - X), plus normalization. This is an ∞ number of constraints: one for each x.

22 The joint posterior is p(θ, x) = q(θ|x) δ(x - X), so that the new marginal for θ is p(θ) = q(θ|X) ∝ q(θ) q(X|θ), which is Bayes' rule!!
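A small numerical sketch of the last two slides (illustrative, not from the talk): update a discrete joint prior q(θ, x) by maximizing the relative entropy subject to constraints that put the x-marginal entirely on the observed value X, then compare the resulting θ-marginal with the Bayesian posterior. The prior and likelihood values are made-up numbers.

```python
# Sketch (illustrative, not from the talk): ME updating of the joint prior
# q(theta, x), constrained so the marginal over x sits entirely on the observed
# value X, reproduces the Bayesian posterior q(theta | X).
import numpy as np
from scipy.optimize import minimize

q_theta = np.array([0.3, 0.7])            # made-up prior over two hypotheses
like    = np.array([[0.8, 0.2],           # made-up likelihood q(x|theta), rows = theta
                    [0.4, 0.6]])
Q = q_theta[:, None] * like               # joint prior q(theta, x)
X = 1                                     # the observed datum
n_theta, n_x = Q.shape

def neg_S(p_flat):
    p = np.clip(p_flat, 1e-12, None)
    return np.sum(p * np.log(p / Q.ravel()))

# One constraint for each x (the "infinity" of constraints on the slide, here just two);
# together they also enforce normalization.
cons = [{"type": "eq",
         "fun": lambda p, x=x: p.reshape(n_theta, n_x)[:, x].sum() - (1.0 if x == X else 0.0)}
        for x in range(n_x)]

res = minimize(neg_S, np.full(Q.size, 1.0 / Q.size), constraints=cons,
               bounds=[(0.0, 1.0)] * Q.size, method="SLSQP")

p_theta_me    = res.x.reshape(n_theta, n_x).sum(axis=1)
p_theta_bayes = q_theta * like[:, X] / (q_theta @ like[:, X])
print(np.round(p_theta_me, 4), np.round(p_theta_bayes, 4))   # the two should agree
```

Maximizing over the joint space and then marginalizing is exactly the route of the two slides above; the agreement with q(θ|X) is the sense in which Bayes' rule is a special case of ME.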

23 Conclusions and remarks:
- Entropy is the unique tool for updating probabilities.
- Basic principle: minimal updating.
- Entropy needs no interpretation.
- Bayes' rule is a special case of ME.
- Information is what constrains beliefs.