Ariel Caticha on Information and Entropy July 8, 2007 (16)

Presentation transcript:


E. T. Jaynes “Information Theory and Statistical Mechanics” Physical Review, 1957

Information and Entropy Ariel Caticha Department of Physics University at Albany - SUNY MaxEnt 2007

4 Preliminaries. Goal: reasoning with incomplete information, using degrees of rational belief. Problem 1: the description of a state of knowledge, a consistent web of beliefs, is given by probabilities.

5 Problem 2: Updating probabilities when new information becomes available. What is information? Why entropy? Which entropy? Are Bayesian and entropy methods compatible? Updating methods: entropy and Bayes’ rule.

6 Entropy and heat, multiplicities, disorder: Clausius, Maxwell, Boltzmann, Gibbs, ... Entropy as a measure of information (MaxEnt): Shannon, Jaynes, Kullback, Renyi, ... Entropy as a tool for updating (M.E.): Shore & Johnson, Skilling, Csiszar, ...

7 An important distinction: MaxEnt is a method to assign probabilities (relative to an underlying measure). Bayes’ rule is a method to update probabilities (from a prior).

8 Our goal: MaxEnt allows arbitrary constraints but no priors; Bayes allows arbitrary priors but no constraints. M.E. is an updating method that allows both.

9 The logic behind the M.E. method. The goal: to update from old beliefs to new beliefs when new information becomes available. Information is what induces a change in beliefs: it arrives in the form of constraints that carry us from the prior q(x) to the posterior p(x). Information is what constrains rational beliefs.
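Stated as an optimization problem (a reconstruction in standard notation; only the symbols q, p, and the idea of a ranking appear in the transcript): among all distributions p that satisfy the constraints, the selected posterior is the one that ranks highest,

    p^{*} = \arg\max_{p \,\in\, \text{constraints}} S[p, q],

where the ranking functional S[p, q] is exactly what the following slides set out to determine.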

10 An analogy from mechanics: force is whatever induces a change of motion, carrying a system from an initial state of motion to a final state of motion. Likewise, information is what induces a change of rational beliefs.

11 Question: How do we select a distribution from among all those that satisfy the constraints? Skilling: rank the distributions according to preference. Transitivity: if p1 is better than p2, and p2 is better than p3, then p1 is better than p3. Such a transitive ranking can be represented by assigning to each p a real number S[p,q] such that p1 is better than p2 whenever S[p1,q] > S[p2,q].

12 This answers the question “Why an entropy?” Remarks: Entropies are real numbers designed to be maximized. Next question: How do we select the functional S[p,q] ? Answer: Use induction. We want to generalize from special cases where the best distribution is known to all other cases.

13 Skilling’s method of induction: If a general theory exists, it must apply to special cases. If a special case is known, it can be used to constrain the general theory. If enough special cases are known, the general theory is constrained completely.* The known special cases are called the axioms. *But if there are too many, the general theory might not exist.

14 How do we choose the axioms? Basic principle: Minimal Updating. Prior information is valuable; beliefs should be revised only to the extent required by new evidence. (Shore & Johnson, Skilling, Karbelkar, Uffink, A.C., ...)

15 Axiom 1: Locality. Local information has local effects: if the information does not refer to a domain D, then p(x|D) is not updated. Axiom 2: Coordinate invariance. Coordinates carry no information; the ranking can depend only on coordinate invariants. (The consequences each axiom imposes on the form of S[p,q] are sketched below.)
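The "Consequence" formulas on this slide are images in the original and are missing from the transcript. Based on how such derivations are usually presented, they presumably take roughly the following form (an assumption, not a quotation from the slides): locality restricts the ranking to a sum of local contributions,

    S[p, q] = \int dx \; F\big(p(x), q(x), x\big),

and coordinate invariance further restricts the integrand to depend on its arguments only through coordinate invariants, via some density m(x),

    S[p, q] = \int dx \; m(x) \, \Phi\!\left(\frac{p(x)}{m(x)}, \frac{q(x)}{m(x)}\right),

with the functions m and Φ still to be determined.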

16 To determine m(x), use Axiom 1 (Locality) again: if there is no new information, there is no update. Consequence: m(x) is the prior q(x). To determine the remaining unknown function we need more special cases.
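In the notation of the sketch above (again an assumption about the missing formula rather than a quotation), setting m(x) = q(x) reduces the ranking to a single unknown function of one variable:

    S[p, q] = \int dx \; q(x) \, \Phi\!\left(\frac{p(x)}{q(x)}\right).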

17 Axiom 3: Consistency for independent systems. When systems are known to be independent, it should not matter whether they are treated jointly or separately. Consequence: the ranking is narrowed down to a one-parameter family of η-entropies (sketched below).
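The family itself is not in the transcript. One common normalization of such a one-parameter family (an assumption, chosen so that the η → 0 limit reproduces the logarithmic form the talk eventually selects) is

    S_\eta[p, q] = \frac{1}{\eta(\eta+1)} \left[ 1 - \int dx \; p(x) \left( \frac{p(x)}{q(x)} \right)^{\eta} \right],

which tends to - \int dx \, p \log (p/q) as η → 0.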

18 We have a one-dimensional continuum of η-entropies; we just need more known special cases to single out a unique S[p,q]. Can we live with η-dependent updating? NO!! Suppose different systems could have different ηs.

19 Independent systems with different ηs. For single system 1 alone we would use S_η1[p1, q1]; for single system 2 alone, S_η2[p2, q2]; for the combined system 1+2, S_η[p1 p2, q1 q2] with some undetermined η. But by Axiom 3 this is equivalent to using S_η[p1, q1] and S_η[p2, q2]. Therefore η = η1 and η = η2.

20 Conclusion: η must be a universal constant. What is the value of η? We need more special cases! Hint: For large N we do not need entropy. We can use the law of large numbers.

21 Axiom 4: Consistency with large numbers. The special case is the multinomial distribution and its behavior for large N (in probability); the relevant formulas are sketched below.
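The formulas are missing from the transcript. A reconstruction consistent with the surrounding text (an assumption about what the slide shows): the multinomial distribution for counts n = (n_1, ..., n_m) in N independent trials with probabilities q = (q_1, ..., q_m) is

    P(n \mid N, q) = \frac{N!}{n_1! \cdots n_m!} \prod_{i=1}^{m} q_i^{\,n_i},

and for large N, writing f_i = n_i / N for the observed frequencies, Stirling's approximation gives

    \frac{1}{N} \log P(n \mid N, q) \;\approx\; -\sum_{i=1}^{m} f_i \log \frac{f_i}{q_i},

while the law of large numbers says the f_i converge to the q_i in probability. The large-N counting therefore singles out the logarithmic (η = 0) member of the family.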

22 Conclusion: the only consistent ranking criterion for updating probabilities is the logarithmic relative entropy (written out below). Other entropies may be useful for other purposes, but for updating it is the only candidate of general applicability.
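The formula itself is an image in the original slide; the logarithmic relative entropy of a posterior p relative to a prior q is

    S[p, q] = - \int dx \; p(x) \log \frac{p(x)}{q(x)},

and the M.E. posterior is the distribution that maximizes it subject to the constraints.

As a concrete, hedged illustration of such an update (not from the talk; the function name me_update and the dice example are purely illustrative), the following Python sketch maximizes this entropy for a discrete distribution subject to a single expectation constraint. The maximizer has the exponential form p_i ∝ q_i exp(-λ f_i), so the code only has to solve numerically for the Lagrange multiplier λ.

import numpy as np
from scipy.optimize import brentq

# Sketch: update a discrete prior q to a posterior p by maximizing the
# logarithmic relative entropy  S[p, q] = -sum_i p_i * log(p_i / q_i)
# subject to a single expectation constraint  sum_i p_i * f_i = F.
# The maximizer is p_i ∝ q_i * exp(-lam * f_i); we solve for lam.

def me_update(q, f, F):
    """Return the M.E. posterior for prior q, statistic f, target mean F."""
    q = np.asarray(q, dtype=float)
    f = np.asarray(f, dtype=float)

    def constraint_gap(lam):
        w = q * np.exp(-lam * f)
        p = w / w.sum()
        return p @ f - F                       # zero when the constraint holds

    lam = brentq(constraint_gap, -50.0, 50.0)  # assumes F is attainable
    w = q * np.exp(-lam * f)
    return w / w.sum()

# Illustrative example (Jaynes' dice problem): uniform prior over a
# six-sided die, constrain the mean face value to 4.5.
q = np.ones(6) / 6
f = np.arange(1, 7)
p = me_update(q, f, F=4.5)
print(np.round(p, 4), "mean =", round(float(p @ f), 3))

With a uniform prior this reduces to ordinary MaxEnt; with a non-uniform prior it is the full M.E. update that uses both a prior and a constraint, as advocated in the talk.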

23 Bayes’ rule from ME. Maximize the appropriate entropy subject to the constraints imposed by the data. This is an ∞ number of constraints: one for each data point (sketched below).
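The formulas are missing from the transcript; a reconstruction of the standard setup this slide refers to (an assumption, written in the notation of the earlier slides): work on the joint space of parameters θ and data x, with joint prior q(θ, x) = q(θ) q(x|θ). Observing the data x' imposes the constraint that the data marginal be certain,

    p(x) = \int d\theta \; p(\theta, x) = \delta(x - x'),

which is one constraint for every value of x, hence the ∞ number of constraints. The joint posterior p(θ, x) is the distribution that maximizes S[p, q] subject to this family of constraints.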

24 The joint posterior, and with it the new marginal for θ (both sketched below), turn out to be exactly Bayes’ rule !!
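Reconstructing the missing formulas (an assumption; this is the standard result of the maximization set up on the previous slide): the constraint fixes only the x-marginal, and the entropy maximization leaves the conditional untouched, p(θ|x) = q(θ|x), so the joint posterior is

    p(\theta, x) = \delta(x - x') \, q(\theta \mid x),

and the new marginal for θ is

    p(\theta) = \int dx \; p(\theta, x) = q(\theta \mid x') = \frac{q(\theta)\, q(x' \mid \theta)}{q(x')},

which is Bayes' rule.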

25 A hybrid example: maximize the appropriate entropy to obtain the posterior.
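The content of this example is not in the transcript. If, as the word "hybrid" suggests, it combines the data constraint of the previous slides with an additional expectation-value constraint ⟨f(θ)⟩ = F (this is an assumption, not something stated in the transcript), the same maximization would yield a posterior of the form

    p(\theta) \;\propto\; q(\theta)\, q(x' \mid \theta)\, e^{-\lambda f(\theta)},

with λ fixed by the constraint: a Bayes-like likelihood factor multiplied by a MaxEnt-like exponential factor.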

26 Conclusions and remarks: The tool for updating is (relative) entropy. Minimal updating: prior information is valuable. Entropy needs no interpretation. Bayes is a special case of M.E. Information is what constrains rational beliefs.

27 Bayes’ rule for repeatable experiments: the data constraints and the resulting posterior (sketched below).
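The formulas are again missing from the transcript; a plausible reconstruction (an assumption) for N independent repetitions of the experiment with observed data x'_1, ..., x'_N: the data constraints fix p(x_1, ..., x_N) = ∏_n δ(x_n − x'_n), and the same argument as before gives

    p(\theta) \;\propto\; q(\theta) \prod_{n=1}^{N} q(x'_n \mid \theta),

the usual Bayesian posterior for independent, identically distributed data.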

28 Constraints do not commute in general: processing constraint C1 and then constraint C2 need not give the same result as processing them in the opposite order, or as processing both simultaneously.