Approximation Techniques: bounded inference (COMPSCI 276, Fall 2007)


SP22 Approximate Inference: metrics of evaluation
Absolute error: given ε > 0 and a query p = P(x|e), an estimate r has absolute error ε iff |p - r| < ε.
Relative error: the ratio r/p lies in [1 - ε, 1 + ε].
Dagum and Luby 1993: approximation up to a fixed relative error is NP-hard. Absolute error is also NP-hard if the error is less than 0.5.
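
As a quick illustration (the numbers are hypothetical, not from the lecture), the two error notions can be checked directly; note how a small probability can have tiny absolute error yet large relative error:

```python
def within_absolute(p, r, eps):
    """True if estimate r approximates the query p within absolute error eps."""
    return abs(p - r) < eps

def within_relative(p, r, eps):
    """True if the ratio r/p lies in [1 - eps, 1 + eps]."""
    return 1 - eps <= r / p <= 1 + eps

# A small estimate can have tiny absolute error but large relative error:
p, r = 0.001, 0.002
print(within_absolute(p, r, 0.01))   # True: |p - r| = 0.001
print(within_relative(p, r, 0.5))    # False: r/p = 2.0
```

This is why the relative-error guarantee is the stronger (and harder) one for small posterior probabilities.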

SP23 Mini-buckets: “local inference”
Computation in a bucket is time and space exponential in the number of variables involved.
Therefore, partition the functions in a bucket into “mini-buckets” over smaller numbers of variables.
(The idea is similar to i-consistency: bound the size of recorded dependencies; Dechter 2003.)

SP24 Mini-bucket approximation: MPE task Split a bucket into mini-buckets =>bound complexity

SP25 Mini-Bucket Elimination. [figure: a bucket-elimination trace over variables A, B, C, D, E with input functions F(a,b), F(a,c), F(a,d), F(b,c), F(b,d), F(b,e), F(c,e); bucket B is split into mini-buckets, generating messages h_B(a,c) and h_B(d,e); downstream messages h_C(e,a), h_D(e,a), h_E(a) are produced; the min/Σ variant yields L, a lower bound]

SP26 Semantics of Mini-Bucket: splitting a node
Before splitting: network N (with node U). After splitting: network N' (U duplicated into U and Û).
Variables appearing in different mini-buckets are renamed and duplicated (Kask et al., 2001; Geffner et al., 2007; Choi, Chavira, Darwiche, 2007).

SP27 Approx-mpe(i)
Input: i, the maximum number of variables allowed in a mini-bucket.
Output: [lower bound (probability of a sub-optimal solution), upper bound].
Example: approx-mpe(3) versus elim-mpe.

SP28 MBE(i,m) (also MBE(i), approx-mpe)
Input: belief network (P1,…,Pn). Output: upper and lower bounds.
Initialize: put functions in buckets.
Process each bucket from p = n down to 1: create (i,m)-mini-buckets, then process each mini-bucket.
(For MPE): assign values along ordering d.
Return: MPE tuple, upper and lower bounds.
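
A minimal Python sketch of the mini-bucket bound for the MPE task (the factor representation, the greedy partitioning rule, and the toy numbers are illustrative assumptions, not the course's code):

```python
import itertools
import math

class Factor:
    def __init__(self, scope, table):
        self.scope = tuple(scope)   # ordered variable names
        self.table = table          # dict: value-tuple (in scope order) -> float

    def value(self, asg):           # asg: dict var -> value
        return self.table[tuple(asg[v] for v in self.scope)]

def max_out(factors, var, domains):
    """Multiply the given factors and maximize out `var`."""
    scope = sorted({v for f in factors for v in f.scope} - {var})
    table = {}
    for vals in itertools.product(*(domains[v] for v in scope)):
        asg = dict(zip(scope, vals))
        table[vals] = max(
            math.prod(f.value({**asg, var: x}) for f in factors)
            for x in domains[var]
        )
    return Factor(scope, table)

def mbe_mpe_upper(factors, order, domains, i):
    """Mini-Bucket Elimination: an upper bound on max_x prod_f f(x).
    Each bucket is greedily partitioned into mini-buckets spanning <= i variables."""
    pool = list(factors)
    for var in order:
        bucket = [f for f in pool if var in f.scope]
        pool = [f for f in pool if var not in f.scope]
        minis = []
        for f in bucket:
            for mb in minis:
                joint = {v for g in mb for v in g.scope} | set(f.scope)
                if len(joint) <= i:   # fits: add to this mini-bucket
                    mb.append(f)
                    break
            else:
                minis.append([f])
        pool.extend(max_out(mb, var, domains) for mb in minis)
    return math.prod(f.table[()] for f in pool)   # only constants remain

# Toy example: two factors sharing variable B.
domains = {'A': (0, 1), 'B': (0, 1), 'C': (0, 1)}
f1 = Factor(('A', 'B'), {(0,0): .9, (0,1): .1, (1,0): .2, (1,1): .8})
f2 = Factor(('B', 'C'), {(0,0): .3, (0,1): .6, (1,0): .9, (1,1): .1})
exact = max(f1.value({'A': a, 'B': b}) * f2.value({'B': b, 'C': c})
            for a in (0, 1) for b in (0, 1) for c in (0, 1))    # 0.72
loose = mbe_mpe_upper([f1, f2], ['B', 'A', 'C'], domains, i=2)  # 0.81: bucket B split
tight = mbe_mpe_upper([f1, f2], ['B', 'A', 'C'], domains, i=3)  # 0.72: no split, exact
```

With i at least the size of each bucket's joint scope, no bucket is split and the bound coincides with exact bucket elimination.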

SP29 Properties of approx-mpe(i)
Complexity: O(exp(2i)) time and O(exp(i)) space.
Accuracy: determined by the upper/lower (U/L) bound ratio. As i increases, both accuracy and complexity increase.
Possible uses of mini-bucket approximations:
as anytime algorithms (Dechter and Rish, 1997)
as heuristics in best-first search (Kask and Dechter, 1999)
Other tasks: similar mini-bucket approximations exist for belief updating, MAP and MEU (Dechter and Rish, 1997).

SP210 Anytime Approximation

SP211 Bounded inference for belief updating
The mini-bucket idea is the same: we can apply a sum in each mini-bucket, or better, a sum in one mini-bucket and max in the rest (or min, for a lower bound).
Approx-bel-max(i,m) generates upper and lower bounds on beliefs; it approximates elim-bel.
Approx-map(i,m): max buckets are maximized, sum buckets are processed with sum/max; it approximates elim-map.
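
The bound behind approx-bel-max can be seen in one line: replacing the sum by a sum in one mini-bucket and a max (or min) in the other relaxes (or understates) the true elimination. A tiny numeric check with hypothetical tables:

```python
# Two functions of the eliminated variable b (values chosen for illustration):
f = {0: 0.7, 1: 0.4}
g = {0: 0.2, 1: 0.9}

exact = sum(f[b] * g[b] for b in (0, 1))      # 0.14 + 0.36 = 0.50
upper = sum(f.values()) * max(g.values())     # 1.1 * 0.9  = 0.99
lower = sum(f.values()) * min(g.values())     # 1.1 * 0.2  = 0.22
assert lower <= exact <= upper
```

In a real mini-bucket the same move is applied table-wise, but the inequality per entry is exactly this one.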

SP212 Empirical Evaluation (Dechter and Rish, 1997; Rish thesis, 1999)
Randomly generated networks: uniform random probabilities, random noisy-OR
CPCS networks
Probabilistic decoding
Comparing approx-mpe and anytime-mpe versus elim-mpe

SP213 Causal Independence
Event X has two possible causes, A and B. It is hard to elicit P(X|A,B), but it is easy to determine P(X|A) and P(X|B).
Example: several diseases cause a symptom.
The effect of A on X is independent of the effect of B on X => causal independence, using canonical models: Noisy-OR, Noisy-AND, Noisy-MAX.

SP214 Binary OR (X = A OR B):
A B | P(X=0|A,B) P(X=1|A,B)
0 0 |     1          0
0 1 |     0          1
1 0 |     0          1
1 1 |     0          1

SP215 Noisy-OR
A “noise” is associated with each edge, described by a noise parameter λ_i ∈ [0,1]; equivalently q_i = 1 - λ_i = P(X=0 | A_i=1, all other causes off).
Let q_a = 0.1, q_b = 0.2. Then:
P(x=0 | a=1, b=1) = (1 - λ_a)(1 - λ_b) = q_a q_b
P(x=1 | a=1, b=1) = 1 - q_a q_b

SP216 Closed form Bel(X), part 1
Given a noisy-OR CPT P(x|u) with noise parameters λ_i, let T_u = {i : U_i = 1} and define q_i = 1 - λ_i. Then:
P(x=0|u) = Π_{i ∈ T_u} q_i,   P(x=1|u) = 1 - Π_{i ∈ T_u} q_i
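
The closed form is straightforward to compute; a sketch (the function name is mine, and q follows the slide's definition q_i = P(X=0 | U_i=1, all other causes off)):

```python
import math

def noisy_or(q, u):
    """P(X=1 | u) under a noisy-OR CPT.
    q[i] = P(X=0 | U_i=1, all other parents 0); u is a tuple of parent states."""
    p_x0 = math.prod(q[i] for i, on in enumerate(u) if on)  # empty product = 1
    return 1.0 - p_x0

# The slide's numbers: q_a = 0.1, q_b = 0.2
print(noisy_or([0.1, 0.2], (1, 1)))   # 1 - 0.1*0.2 = 0.98
print(noisy_or([0.1, 0.2], (0, 0)))   # no active cause -> 0.0
```

Note the CPT needs only k parameters for k parents, versus 2^k entries for a full table.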

SP217 Closed form Bel(X), part 2
Using iterative belief propagation: set π_{ix} = π_x(u_k = 1); then a closed form for the propagated beliefs can be derived. [equation garbled in the source]

SP218 Methodology for Empirical Evaluation (for MPE)
Accuracy measured by U/L: the better of U/MPE and MPE/L
Benchmarks: random networks. Given n, e, v, generate a random DAG; for each x_i and its parents, generate a table from uniform [0,1], or noisy-OR
Create k instances; for each, generate random evidence and likely evidence
Measure averages

SP219 Random networks
Uniform random: 60 nodes, 90 edges (200 instances).
In 80% of cases, a speed-up was obtained while U/L < 2.
Noisy-OR: even better results.
Exact elim-mpe was infeasible; approx-mpe took 0.1 to 80 sec.

SP220 CPCS networks, medical diagnosis (noisy-OR model)
Test case: no evidence. Compared anytime-mpe (two ε settings) and elim-mpe on cpcs360 and cpcs422. [table of run times (sec) omitted]

SP221 The effect of evidence More likely evidence=>higher MPE => higher accuracy (why?) Likely evidence versus random (unlikely) evidence

SP222 Probabilistic decoding Error-correcting linear block code State-of-the-art: approximate algorithm – iterative belief propagation (IBP) (Pearl’s poly-tree algorithm applied to loopy networks)

SP223 Iterative Belief Propagation
Belief propagation is exact for poly-trees.
IBP: applying BP iteratively to cyclic networks.
No guarantees of convergence.
Works well for many coding networks.

SP224 approx-mpe vs. IBP: bit error rate (BER) as a function of the channel noise (sigma). [plot omitted]

SP225 Mini-buckets: summary
Mini-buckets: a local-inference approximation. Idea: bound the size of recorded functions.
Approx-mpe(i): the mini-bucket algorithm for MPE.
Better results for noisy-OR than for random problems.
Accuracy increases with decreasing noise in coding.
Accuracy increases for likely evidence.
Sparser graphs -> higher accuracy.
Coding networks: approx-mpe outperforms IBP on low induced-width codes.

SP226 Cluster Tree Elimination - properties
Correctness and completeness: algorithm CTE is correct, i.e. it computes the exact joint probability of a single variable and the evidence.
Time complexity: O(deg · (n+N) · d^(w*+1))
Space complexity: O(N · d^sep)
where deg = the maximum degree of a node, n = number of variables (= number of CPTs), N = number of nodes in the tree decomposition, d = the maximum domain size of a variable, w* = the induced width, sep = the maximum separator size.

SP227 Cluster Tree Elimination - the messages. [figure: a tree decomposition with clusters 1: ABC (p(a), p(b|a), p(c|a,b)); 2: BCDF (p(d|b), p(f|c,d)); 3: BEF (p(e|b,f)); 4: EFG (p(g|e,f)); separators BC, BF, EF; cluster 1 sends h_(1,2)(b,c); sep(2,3) = {B,F}, elim(2,3) = {C,D}]

SP228 Mini-Clustering for belief updating
Motivation: the time and space complexity of Cluster Tree Elimination depends on the induced width w* of the problem; when w* is large, CTE becomes infeasible.
The basic idea: try to reduce the cluster size (the exponent) by partitioning each cluster into mini-clusters with fewer variables.
Accuracy parameter: i = maximum number of variables in a mini-cluster.
The idea was first explored for variable elimination (Mini-Buckets).

SP229 Idea of Mini-Clustering Split a cluster into mini-clusters => bound complexity

SP230 Mini-Clustering - MC. [figure: left, Cluster Tree Elimination on the tree with clusters ABC, BCDF, BEF, EFG; right, Mini-Clustering with i=3, where cluster 2 (BCDF: p(d|b), h_(1,2)(b,c), p(f|c,d)) is partitioned into mini-clusters {B,C,D: p(d|b), h_(1,2)(b,c)} and {C,D,F: p(f|c,d)}; sep(2,3) = {B,F}, elim(2,3) = {C,D}]

SP231 Mini-Clustering - the messages, i=3. [figure: as above, but cluster 2 now sends two smaller messages to cluster 3, h1_(2,3)(b) and h2_(2,3)(f), instead of one message over sep(2,3) = {B,F}; cluster 3 (BEF) holds p(e|b,f), h1_(2,3)(b), h2_(2,3)(f); elim(2,3) = {C,D}]

SP232 Mini-Clustering - example. [figure: the join tree with clusters ABC, BCDF, BEF, EFG and separators BC, BF, EF]

SP233 Cluster Tree Elimination vs. Mini-Clustering. [figure: the same join tree (ABC, BCDF, BEF, EFG; separators BC, BF, EF) shown with CTE messages on the left and MC messages on the right]

SP234 Mini-Clustering
Correctness and completeness: algorithm MC(i) computes a bound (or an approximation) on the joint probability P(X_i, e) of each variable and each of its values.
Time and space complexity: O(n · hw* · d^i), where hw* = max_u |{f : scope(f) ∩ χ(u) ≠ ∅}| (the maximum number of functions intersecting any cluster).

SP235 Lower bounds and mean approximations
We can replace the max operator by min => a lower bound on the joint,
or by mean => an approximation of the joint.

SP236 Normalization
MC can compute an (upper) bound on the joint P(X_i, e).
Deriving a bound on the conditional P(X_i|e) is not easy when the exact P(e) is not available.
If a lower bound L(e) were available, we could use U(X_i, e) / L(e) as an upper bound on the posterior.
In our experiments we normalized the results and regarded them as approximations of the posterior P(X_i|e).
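
In code, the two options on this slide look like the following (the function names are mine; U denotes an upper bound on the joint and L a lower bound on P(e)):

```python
def posterior_upper_bound(upper_joint, lower_evidence):
    """P(x|e) = P(x,e)/P(e) <= U(x,e)/L(e) when U >= P(x,e) and 0 < L <= P(e)."""
    return upper_joint / lower_evidence

def normalize(values):
    """Treat per-value bounds as unnormalized beliefs (as done in the experiments)."""
    z = sum(values)
    return [v / z for v in values]

print(posterior_upper_bound(0.3, 0.5))   # 0.6
print(normalize([3, 1]))                 # [0.75, 0.25]
```

Normalization gives an approximation rather than a guaranteed bound, which is why the slide distinguishes the two.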

SP237 Experimental results
Algorithms: Exact, IBP, Gibbs sampling (GS), MC with normalization (approximate)
Networks (all variables are binary): coding networks; CPCS 54, 360, 422; grid networks (MxM); random noisy-OR networks; random networks
Measures: Normalized Hamming Distance (NHD), BER (bit error rate), absolute error, relative error, time

SP238 Random networks - absolute error (evidence = 0 vs. evidence = 10). [plots omitted]

SP239 Noisy-OR networks - absolute error (evidence = 10 vs. evidence = 20). [plots omitted]

SP240 Grid networks - error under varying evidence. [plots omitted]

SP241 CPCS422 - absolute error (evidence = 0 vs. evidence = 10). [plots omitted]

SP242 Coding networks - bit error rate (sigma = 0.22 vs. sigma = 0.51). [plots omitted]

SP243 Mini-Clustering summary MC extends the partition based approximation from mini-buckets to general tree decompositions for the problem of belief updating Empirical evaluation demonstrates its effectiveness and superiority (for certain types of problems, with respect to the measures considered) relative to other existing algorithms

SP244 What is IJGP? IJGP is an approximate algorithm for belief updating in Bayesian networks IJGP is a version of join-tree clustering which is both anytime and iterative IJGP applies message passing along a join-graph, rather than a join-tree Empirical evaluation shows that IJGP is almost always superior to other approximate schemes (IBP, MC)

SP245 Iterative Belief Propagation - IBP
Belief propagation is exact for poly-trees.
IBP: applying BP iteratively to cyclic networks.
No guarantees of convergence.
Works well for many coding networks.
One step: update BEL(U1). [figure: a small loopy network with parents U1, U2, U3 and children X1, X2]
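
A compact sketch of the message-update loop (a pairwise-model variant with binary variables and synchronous updates; this illustrates the iterative scheme, not Pearl's exact Bayesian-network message equations):

```python
import math

def loopy_bp(unaries, pairwise, iters=20):
    """Synchronous loopy belief propagation on a pairwise model.
    unaries[v] = [phi_v(0), phi_v(1)]; pairwise[(u, v)][xu][xv] = psi(xu, xv),
    with exactly one entry per undirected edge."""
    nbrs = {v: [] for v in unaries}
    pot = {}
    for (u, v), t in pairwise.items():
        nbrs[u].append(v)
        nbrs[v].append(u)
        pot[(u, v)] = t
        pot[(v, u)] = [[t[a][b] for a in (0, 1)] for b in (0, 1)]  # transposed
    msg = {(u, v): [0.5, 0.5] for v in unaries for u in nbrs[v]}
    for _ in range(iters):
        new = {}
        for (u, v) in msg:
            vals = []
            for xv in (0, 1):
                total = 0.0
                for xu in (0, 1):
                    m = unaries[u][xu] * pot[(u, v)][xu][xv]
                    for w in nbrs[u]:
                        if w != v:          # product of incoming messages, except v's
                            m *= msg[(w, u)][xu]
                    total += m
                vals.append(total)
            z = sum(vals)
            new[(u, v)] = [x / z for x in vals]
        msg = new       # no convergence guarantee on loopy graphs
    beliefs = {}
    for v in unaries:
        b = [unaries[v][x] * math.prod(msg[(u, v)][x] for u in nbrs[v])
             for x in (0, 1)]
        z = sum(b)
        beliefs[v] = [x / z for x in b]
    return beliefs

# On a tree (here a 2-node chain) BP is exact:
bel = loopy_bp({'A': [0.6, 0.4], 'B': [0.5, 0.5]},
               {('A', 'B'): [[0.9, 0.1], [0.1, 0.9]]})
```

On the chain above the beliefs match the brute-force marginals ([0.6, 0.4] for A, [0.58, 0.42] for B); on a cyclic graph the same loop may oscillate or converge to approximate beliefs.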

SP246 IJGP - Motivation
IBP is applied to a loopy network iteratively; it is not an anytime algorithm; when it converges, it converges very fast.
MC applies bounded inference along a tree decomposition; it is an anytime algorithm controlled by the i-bound; it converges in two passes, up and down the tree.
IJGP combines the iterative feature of IBP with the anytime feature of MC.

SP247 IJGP - The basic idea Apply Cluster Tree Elimination to any join-graph We commit to graphs that are minimal I-maps Avoid cycles as long as I-mapness is not violated Result: use minimal arc-labeled join-graphs

SP248 IJGP - Example. [figures: a) a belief network over variables A, B, C, D, E, F, G, H, I, J; b) the graph IBP works on, with clusters such as ABDE, FGI, ABC, BCE, GHIJ, CDEF, FGH]

SP249 Arc-minimal join-graph. [figure: the join-graph of the previous slide before and after removing redundant arcs]

SP250 Minimal arc-labeled join-graph. [figure: arc labels reduced so that each variable's labels induce a connected subgraph]

SP251 Join-graph decompositions. [figures: a) minimal arc-labeled join graph; b) join-graph obtained by collapsing nodes of graph a); c) minimal arc-labeled version of b)]

SP252 Tree decomposition. [figures: a) minimal arc-labeled join graph; b) a tree decomposition with clusters ABCDE, CDEF, FGHI, GHIJ]

SP253 Join-graphs. [figure: a spectrum of join-graphs for the same problem, from the graph IBP works on to a tree decomposition; toward the tree decomposition, more accuracy; toward the IBP graph, less complexity]

SP254 Message propagation. [figure: clusters 1 (ABCDE: p(a), p(c), p(b|ac), p(d|abe), p(e|b,c)) and 2 (CDEF) exchanging messages such as h_(3,1)(bc) and h_(1,2)]
Minimal arc-labeled: sep(1,2) = {D,E}, elim(1,2) = {A,B,C}
Non-minimal arc-labeled: sep(1,2) = {C,D,E}, elim(1,2) = {A,B}

SP255 Bounded decompositions
We want arc-labeled decompositions such that:
the cluster size (internal width) is bounded by i (the accuracy parameter)
the width of the decomposition as a graph (external width) is as small as possible
Possible approaches to building decompositions: partition-based algorithms (inspired by the mini-bucket decomposition) and grouping-based algorithms.

SP256 Partition-based algorithms. [figure: a) schematic mini-bucket(i), i=3, with buckets G: (GFE); E: (EBF)(EF); F: (FCD)(BF); D: (DB)(CD); C: (CAB)(CB); B: (BA)(AB)(B); A: (A); b) the corresponding arc-labeled join-graph decomposition with clusters GFE (P(G|F,E)), EBF (P(E|B,F)), FCD (P(F|C,D)), CDB (P(D|B)), CAB (P(C|A,B)), BA (P(B|A)), A (P(A))]

SP257 IJGP properties
IJGP(i) applies BP to a minimal arc-labeled join-graph whose cluster size is bounded by i.
On join-trees, IJGP finds exact beliefs.
IJGP is a Generalized Belief Propagation algorithm (Yedidia, Freeman, Weiss 2001).
Complexity of one iteration: time O(deg · (n+N) · d^(i+1)); space O(N · d^sep), with sep the maximum arc-label (separator) size.

SP258 Empirical evaluation
Algorithms: Exact, IBP, MC, IJGP
Measures: absolute error, relative error, Kullback-Leibler (KL) distance, bit error rate, time
Networks (all variables are binary): random networks, grid networks (MxM), CPCS 54, 360, 422, coding networks

SP259 Random networks - KL distance at convergence (evidence = 0 vs. evidence = 5). [plots omitted]

SP260 Random networks - KL distance vs. iterations (evidence = 0 vs. evidence = 5). [plots omitted]

SP261 Random networks - time. [plot omitted]

SP262 Coding networks - BER (sigma = 0.22, 0.32, 0.51, 0.65). [plots omitted]

SP263 Coding networks - time. [plot omitted]

SP264 IJGP summary
IJGP borrows the iterative feature of IBP and the anytime virtues of bounded inference from MC.
Empirical evaluation showed the potential of IJGP, which improves with iteration and, most of the time, with the i-bound, and scales up to large networks.
IJGP is almost always superior, often by a wide margin, to IBP and MC.
Based on all our experiments, we think IJGP provides a practical breakthrough for the task of belief updating.

SP265 Heuristic search
Mini-buckets record upper-bound heuristic functions.
The evaluation function combines the cost of the partial assignment with the heuristic estimate of its best extension.
Best-first: expand the node with the maximal evaluation function.
Branch and Bound: prune a node whose evaluation function shows it cannot improve the best solution found so far.
Properties: an exact algorithm; better heuristics lead to more pruning.

SP266 Heuristic Function
Given a cost function P(a,b,c,d,e) = P(a) P(b|a) P(c|a) P(e|b,c) P(d|b,a), define an evaluation function over a partial assignment as the probability of its best extension:
f*(a,e,d) = max_{b,c} P(a,b,c,d,e)
= P(a) max_{b,c} P(b|a) P(c|a) P(e|b,c) P(d|a,b)
= g(a,e,d) H*(a,e,d)

SP267 Heuristic Function
H*(a,e,d) = max_{b,c} P(b|a) P(c|a) P(e|b,c) P(d|a,b)
= max_c P(c|a) max_b P(e|b,c) P(b|a) P(d|a,b)
≤ max_c P(c|a) [max_b P(e|b,c)] [max_b P(b|a) P(d|a,b)] = H(a,e,d)
f(a,e,d) = g(a,e,d) H(a,e,d) ≥ f*(a,e,d)
The heuristic function H is compiled during the preprocessing stage of the Mini-Bucket algorithm.
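
The inequality on this slide, splitting one max into a product of maxes, is exactly the mini-bucket relaxation; a one-line numeric check with hypothetical tables:

```python
# Two functions of the eliminated variable b, with the other variables fixed:
f = {0: 0.7, 1: 0.4}    # e.g. P(e|b,c) as a function of b
g = {0: 0.2, 1: 0.9}    # e.g. P(b|a) * P(d|a,b) as a function of b

exact = max(f[b] * g[b] for b in (0, 1))    # 0.4 * 0.9 = 0.36
bound = max(f.values()) * max(g.values())   # 0.7 * 0.9 = 0.63
assert bound >= exact   # H >= H*, so f = g*H never underestimates: admissible
```

The two maxes may pick inconsistent values of b (here b=0 for f and b=1 for g), which is exactly where the bound loosens.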

SP268 Heuristic Function
[figure: mini-bucket trace: bucket B: P(e|b,c), P(d|a,b), P(b|a), max_B; bucket C: P(c|a), h_B(e,c), max_C; bucket D: h_B(d,a), max_D; bucket E: h_C(e,a), max_E; bucket A: P(a), h_E(a), h_D(a), max_A]
The evaluation function f(x^p) can be computed using the functions recorded by the Mini-Bucket scheme, and can be used to estimate the probability of the best extension of a partial assignment x^p = {x_1, …, x_p}: f(x^p) = g(x^p) · H(x^p).
For example, H(a,e,d) = h_B(d,a) · h_C(e,a) and g(a,e,d) = P(a).

SP269 Properties
The heuristic is monotone.
The heuristic is admissible.
The heuristic is computed in linear time.
IMPORTANT: mini-buckets generate heuristics of varying strength using the control parameter, the i-bound.
Higher bound -> more preprocessing -> stronger heuristics -> less search.
This allows a controlled trade-off between preprocessing and search.
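
A depth-first Branch-and-Bound sketch that works with any admissible (upper-bound) evaluation function; the problem encoding and the crude static heuristic below are illustrative assumptions, not the course's implementation:

```python
import math

def bnb_mpe(order, domains, factors, heuristic):
    """Maximize the product of factor values over full assignments.
    factors: list of (scope, table), table keyed by value-tuples in scope order.
    heuristic(partial) must upper-bound the product of not-yet-assigned factors."""
    best = 0.0

    def dfs(depth, partial, g):
        nonlocal best
        if depth == len(order):
            best = max(best, g)
            return
        var = order[depth]
        for val in domains[var]:
            ext = {**partial, var: val}
            g2 = g
            for scope, table in factors:
                # multiply in each factor exactly once: when its last variable is set
                if var in scope and all(v in ext for v in scope):
                    g2 *= table[tuple(ext[v] for v in scope)]
            if g2 * heuristic(ext) > best:     # admissible bound -> safe pruning
                dfs(depth + 1, ext, g2)

    dfs(0, {}, 1.0)
    return best

factors = [
    (('A', 'B'), {(0,0): .9, (0,1): .1, (1,0): .2, (1,1): .8}),
    (('B', 'C'), {(0,0): .3, (0,1): .6, (1,0): .9, (1,1): .1}),
]
domains = {'A': (0, 1), 'B': (0, 1), 'C': (0, 1)}

def h_static(partial):
    """Crude admissible heuristic: max entry of every factor not yet fully assigned."""
    return math.prod(max(t.values())
                     for s, t in factors
                     if not all(v in partial for v in s))

print(bnb_mpe(['A', 'B', 'C'], domains, factors, h_static))
```

On this toy problem the search returns the exact MPE value (0.72, up to floating point); a tighter heuristic, such as one compiled by mini-buckets, would prune more of the tree.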

SP270 Empirical Evaluation of mini-bucket heuristics