Perfect recall: Every decision node observes all earlier decision nodes and their parents (along a “temporal” order). Sum-max-sum rule (dynamic programming):

Belief Propagation for Structured Decision Making
Qiang Liu, Alexander Ihler
Department of Computer Science, University of California, Irvine

Abstract
Variational inference methods such as loopy BP have revolutionized inference abilities on graphical models. Influence diagrams (or decision networks) are extensions of graphical models for representing structured decision making problems.
Our contribution:
- A general variational framework for solving influence diagrams
- A junction graph belief propagation for IDs with an intuitive interpretation and strong theoretical guarantees
- A convergent double-loop algorithm
- Significant empirical improvement over the baseline algorithm

Graphical Models and Variational Methods
- Graphical models: factors and exponential family form
- Graphical representations: Bayes nets, Markov random fields …
- Inference: answering queries about graphical models, e.g., calculating the (log) partition function
- Variational methods: log-partition function duality
- Junction graph BP: Bethe-Kikuchi approximation over the locally consistent polytope
[Figure: a loopy junction graph over variables a, b, c, d, e]

Influence Diagram
- Chance nodes (C): conditional probability
- Decision nodes (D): decision rule
- Utility nodes (U): local utility function
- Global utility function: additive or multiplicative
[Figure: toy example with nodes Weather, Forecast, Activity, Happiness]
- Augmented distribution
- Maximum expected utility (MEU)
- Perfect recall: every decision node observes all earlier decision nodes and their parents (along a “temporal” order). Solved by the sum-max-sum rule (dynamic programming).
- Perfect recall is unrealistic: memory limits, decentralized systems.
- Imperfect recall: no closed-form solution. The dominant algorithm is single policy updating (SPU), which guarantees only policy-by-policy optimality.
[Figure: decisions d1, d2 and utility u under perfect recall and under imperfect recall]

Variational Framework for structured decision
- Main result: a variational (duality) form of the MEU whose maximizer gives the optimal strategy [equations lost in extraction].
- Intuition: the last term of the objective encourages policies to be deterministic.
- Perfect recall → convex optimization (easier)
- Imperfect recall → non-convex optimization (harder)
- Significance: enables converting arbitrary variational methods to MEU algorithms, and “integrates” the policy evaluation and policy improvement steps (avoiding expensive inner loops).

Our Algorithms
Junction graph belief propagation for MEU:
- Construct a junction graph over the augmented distribution.
- For each decision node, identify a unique cluster (called a decision cluster) that includes [symbols lost in extraction].
[Figure: an influence diagram over c1, c2, c3, c4, d1, d2, d3, u; its augmented distribution as a factor graph; and a junction graph with clusters c1c4d1, c1c2d2, c3d1, c2c3d3, c4d2d3, marking the decision cluster of d1 and a normal cluster]
Message passing algorithm:
- Sum-messages (from normal clusters)
- MEU-messages (from decision clusters)
- Optimal policies
Strong local optimality: provably better than SPU.
Convergent algorithm by proximal point method: iteratively optimize a smoothed objective.

Experiments
- Diagnostic network (UAI08 inference challenge)
- Decentralized sensor network
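The gap between the MEU and what single policy updating (SPU) can guarantee under imperfect recall can be illustrated on a very small problem. The following is a minimal sketch, not the poster's algorithm or experiment: the diagram (one chance node c, a decision d1 that observes c, a decision d2 that observes nothing), the prior `P_C`, the `utility` function, and all function names are hypothetical and chosen only for illustration. It compares exhaustive enumeration of deterministic policies with a plain SPU loop.

```python
from itertools import product

# Hypothetical toy influence diagram (not taken from the poster):
#   chance node c in {0, 1}; decision d1 observes c;
#   decision d2 observes neither c nor d1 (imperfect recall).
P_C = [0.5, 0.5]  # assumed prior p(c)


def utility(c, d1, d2):
    """Assumed utility: coordination payoff plus a bonus when d1 matches c."""
    if d1 == d2 == 1:
        coordination = 2.0
    elif d1 == d2 == 0:
        coordination = 1.0
    else:
        coordination = 0.0
    return coordination + (0.5 if d1 == c else 0.0)


# Deterministic policies: delta1 maps c -> d1, delta2 is a constant choice.
POLICIES_1 = list(product((0, 1), repeat=2))
POLICIES_2 = [0, 1]


def expected_utility(delta1, delta2):
    """Sum over the chance node with both decision rules fixed."""
    return sum(P_C[c] * utility(c, delta1[c], delta2) for c in (0, 1))


def brute_force_meu():
    """Enumerate every joint deterministic policy and keep the best one."""
    return max(
        (expected_utility(p1, p2), p1, p2)
        for p1 in POLICIES_1
        for p2 in POLICIES_2
    )


def single_policy_updating(delta1, delta2):
    """Improve one decision rule at a time, holding the other fixed.

    Stops when no single-rule change strictly improves expected utility,
    i.e. at a policy-by-policy (local) optimum.
    """
    current = expected_utility(delta1, delta2)
    improved = True
    while improved:
        improved = False
        for p1 in POLICIES_1:  # update the rule for d1
            value = expected_utility(p1, delta2)
            if value > current:
                delta1, current, improved = p1, value, True
        for p2 in POLICIES_2:  # update the rule for d2
            value = expected_utility(delta1, p2)
            if value > current:
                delta2, current, improved = p2, value, True
    return current, delta1, delta2


if __name__ == "__main__":
    print(brute_force_meu())                  # (2.25, (1, 1), 1)
    print(single_policy_updating((0, 0), 0))  # (1.25, (0, 0), 0)
```

Started from the rules d1 = 0 (regardless of c) and d2 = 0, the SPU loop cannot improve either rule on its own and stops at expected utility 1.25, while enumeration finds 2.25. Changing both rules at once would be needed to escape, which is the sense in which policy-by-policy optimality is only a local guarantee.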
