1
Chapter 13: Constraint Optimization, Counting, and Enumeration (ICS 275)
2
From satisfaction to counting and enumeration and to general graphical models
ICS-275 Spring 2007
3
Outline
- Introduction
- Optimization tasks for graphical models
- Solving optimization problems with inference and search
  - Inference: bucket elimination, dynamic programming; mini-bucket elimination
  - Search: branch-and-bound and best-first; lower-bounding heuristics; AND/OR search spaces
- Hybrids of search and inference: cutset decomposition; super-bucket scheme
4
Constraint Satisfaction
Example: map coloring. Variables: countries (A, B, C, etc.). Values: colors (e.g., red, green, yellow). Constraints: adjacent countries must take different colors; e.g., the allowed pairs for (A,B) are (red,green), (red,yellow), (green,red), (green,yellow), (yellow,green), (yellow,red). [Figure: the constraint graph over countries A through G.] Tasks: is the problem consistent? Find a solution, find all solutions, count the solutions.
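As a concrete illustration (not part of the original slides), here is a minimal Python sketch of a map-coloring CSP solved by brute-force enumeration; the adjacency list is an assumption for the example, not the exact map in the figure.

```python
from itertools import product

variables = ["A", "B", "C", "D", "E", "F", "G"]
colors = ["red", "green", "yellow"]
# Illustrative adjacency; the real map's borders may differ.
adjacent = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"),
            ("C", "E"), ("D", "E"), ("E", "F"), ("F", "G")]

def consistent(assignment):
    # Every pair of adjacent countries must get different colors.
    return all(assignment[x] != assignment[y] for x, y in adjacent)

solutions = [dict(zip(variables, vals))
             for vals in product(colors, repeat=len(variables))
             if consistent(dict(zip(variables, vals)))]

print("consistent?", bool(solutions))  # satisfaction
print("a solution:", solutions[0])     # finding one solution
print("count:", len(solutions))        # counting all solutions
```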
5
Propositional Satisfiability
φ = { (¬C), (A ∨ B ∨ C), (¬A ∨ B ∨ E), (¬B ∨ C ∨ D) }
6
Constraint Optimization Problems for Graphical Models
A finite COP is defined by a set of discrete variables X with their domains of values D, and a set of cost functions defined on subsets of the variables; for example, f(A,B,D) has scope {A,B,D}. A global cost function defined over all the variables must be optimized, that is, minimized or maximized. From here on we assume the global cost is the sum of the individual cost functions and the objective is to minimize it. [Figure: a cost table for f(A,B,D) and the graphical representation of a COP instance, where nodes are variables and an edge connects any two variables that appear in the same function.]
7
Constraint Optimization Problems for Graphical Models
Primal graph: variables become nodes, and functions or constraints become arcs; any two variables appearing in the same scope are connected. Example: f1(A,B,D), f2(D,F,G), f3(B,C,F) over variables A, B, C, D, F, G, with global cost F(a,b,c,d,f,g) = f1(a,b,d) + f2(d,f,g) + f3(b,c,f).
8
Constrained Optimization
Example: power plant scheduling
9
Probabilistic Networks
Variables: Smoking (S), Bronchitis (B), Cancer (C), X-Ray (X), Dyspnoea (D), with CPTs P(S), P(B|S), P(C|S), P(X|C,S), P(D|C,B). [Figure: the DAG and the CPT table for P(D|C,B).] The joint distribution factors as P(S,C,B,X,D) = P(S)·P(C|S)·P(B|S)·P(X|C,S)·P(D|C,B).
10
Graphical Models
A graphical model (X,D,C): X = {X1,…,Xn} variables; D = {D1,…,Dn} domains; C = {F1,…,Ft} functions (constraints, CPTs, CNF clauses). [Figure: the primal graph over A, B, C, D, E, F.]
11
Graphical Models: Tasks
Reasoning tasks are defined by a combination operator (sum, product, join) and a marginalization operator (min, max) over consistent solutions:
MPE: max_X ∏_j P_j
CSP: ⋈_j C_j (is the joint set of solutions non-empty?)
Max-CSP: min_X Σ_j F_j
Optimization: min_X Σ_j F_j
12
Outline
- Introduction
- Optimization tasks for graphical models
- Solving by inference and search
  - Inference: bucket elimination, dynamic programming, tree-clustering; mini-bucket elimination, belief propagation
  - Search: branch-and-bound and best-first; lower-bounding heuristics; AND/OR search spaces
- Hybrids of search and inference: cutset decomposition; super-bucket scheme
13
Computing MPE
MPE = max_{a,b,c,d,e} P(a)·P(b|a)·P(c|a)·P(d|a,b)·P(e|b,c)
    = max_{a,c,d,e} P(a)·P(c|a) · max_b [ P(b|a)·P(d|a,b)·P(e|b,c) ]
Eliminating variables one at a time in this way is Variable Elimination. [Figure: the "moral" graph over A, B, C, D, E.]
14
Finding the MPE
Algorithm elim-mpe (Dechter, 1996), an instance of non-serial dynamic programming (Bertelè and Brioschi, 1972). The elimination operator is maximization. For the ordering A, E, D, C, B, the functions are partitioned into buckets:
bucket B: P(b|a), P(d|a,b), P(e|b,c)
bucket C: P(c|a)
bucket D:
bucket E: e=0
bucket A: P(a)
Buckets are processed in reverse order of the ordering (B first): each bucket's functions are combined, its variable is maximized out, and the resulting function is placed in the bucket of its latest remaining variable. The MPE value is produced in bucket A.
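A compact sketch of the elimination pass, under simplifying assumptions (binary variables; each function stored as a scope plus a table keyed by value tuples). Each step collects the bucket of the variable being eliminated, combines its functions, and maximizes the variable out.

```python
from itertools import product

def max_out(functions, var):
    """Combine all functions mentioning `var` and maximize `var` out."""
    scope = sorted(set(v for f in functions for v in f["scope"]) - {var})
    table = {}
    for vals in product([0, 1], repeat=len(scope)):
        env = dict(zip(scope, vals))
        best = 0.0
        for x in (0, 1):
            env[var] = x
            p = 1.0
            for f in functions:
                p *= f["table"][tuple(env[v] for v in f["scope"])]
            best = max(best, p)
        table[vals] = best
    return {"scope": tuple(scope), "table": table}

def elim_mpe(functions, ordering):
    """Process buckets from the last variable in `ordering` to the first."""
    for var in reversed(ordering):
        bucket = [f for f in functions if var in f["scope"]]
        rest = [f for f in functions if var not in f["scope"]]
        functions = rest + [max_out(bucket, var)]
    # Only constant (empty-scope) functions remain; their product is the MPE.
    result = 1.0
    for f in functions:
        result *= f["table"][()]
    return result

# Tiny example: a single function P(A) with table {(0,): 0.4, (1,): 0.6}
f = {"scope": ("A",), "table": {(0,): 0.4, (1,): 0.6}}
print(elim_mpe([f], ["A"]))  # -> 0.6
```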
15
Generating the MPE-tuple
After the buckets are processed, the MPE tuple is generated by assigning the variables in the order A, E, D, C, B: each variable receives the value that maximizes the product of the functions in its bucket, given the previous assignments. [Figure: the buckets with their original and generated functions, with e=0 as evidence.]
16
Complexity
The same bucket schematic, annotated with complexity: the computation is dominated by the largest bucket, here exp(w* = 4), where w* is the induced width of the ordered moral graph (the maximum clique size of the induced graph).
17
Complexity of bucket elimination
Bucket elimination is time O(r·exp(w*(d))) and space O(exp(w*(d))), where r is the number of functions and w*(d) is the induced width of the primal graph along ordering d. The ordering matters: the same constraint graph can have very different induced widths under different orderings. [Figure: one constraint graph ordered two ways.] Finding an ordering of smallest induced width is NP-hard.
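The induced width of a given ordering is easy to compute by simulating elimination on the primal graph; a small sketch (variables as integers; the example graph is illustrative):

```python
def induced_width(n, edges, ordering):
    # Connect the earlier neighbors of each eliminated node (fill edges)
    # and track the largest parent set encountered.
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v); adj[v].add(u)
    pos = {v: i for i, v in enumerate(ordering)}
    width = 0
    for v in reversed(ordering):           # eliminate from last to first
        parents = {u for u in adj[v] if pos[u] < pos[v]}
        width = max(width, len(parents))
        for u in parents:                  # add fill edges among parents
            adj[u] |= parents - {u}
    return width

# Example: the chain 0-1-2-3 has induced width 1 under the natural ordering.
print(induced_width(4, [(0, 1), (1, 2), (2, 3)], [0, 1, 2, 3]))  # -> 1
```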
18
Directional i-consistency
[Figure: the effect of enforcing adaptive consistency, directional path-consistency (d-path), and directional arc-consistency (d-arc) on the same graph.]
19
Mini-bucket approximation: MPE task
Splitting a bucket into mini-buckets bounds the complexity: for MPE, max_X of the product over the whole bucket is upper-bounded by the product of max_X over each mini-bucket, i.e. max_X ∏_all f ≤ (max_X ∏_mini1 f) · (max_X ∏_mini2 f).
20
Mini-Bucket Elimination
Bucket B {P(B|A), P(D|A,B), P(E|B,C)} is split into two mini-buckets, each processed by max_B ∏: one generates h^B(A,D), the other h^B(C,E). Processing continues: bucket C {P(C|A), h^B(C,E)} generates h^C(A,E); bucket D {h^B(A,D)} generates h^D(A); bucket E {E=0, h^C(A,E)} generates h^E(A); bucket A holds {P(A), h^D(A), h^E(A)}. The result MPE* is an upper bound on MPE (U); generating a solution along the way yields a lower bound (L).
21
MBE-MPE(i): Algorithm approx-mpe (Dechter & Rish, 1997)
Input: i, the maximum number of variables allowed in a mini-bucket. Output: [a lower bound (the probability of a suboptimal solution), an upper bound]. Example: approx-mpe(3) versus elim-mpe.
22
Properties of MBE(i) Complexity: O(r exp(i)) time and O(exp(i)) space.
Yields an upper bound and a lower bound; accuracy is measured by the upper/lower (U/L) ratio. As i increases, both accuracy and complexity increase. Mini-bucket approximations can be used as anytime algorithms and as heuristics in search. Similar mini-bucket approximations exist for other tasks: belief updating, MAP and MEU (Dechter and Rish, 1997).
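The heart of MBE(i) is the partitioning step. Below is a sketch of a greedy partition of one bucket into mini-buckets whose joint scopes respect the i-bound; functions are represented only by their scopes, and the greedy first-fit rule is one reasonable choice among several.

```python
def partition_bucket(scopes, i_bound):
    minibuckets = []  # each entry: [joint_scope_set, list_of_member_scopes]
    for scope in sorted(scopes, key=len, reverse=True):
        for mb in minibuckets:
            # Place the function in the first mini-bucket that stays small.
            if len(mb[0] | set(scope)) <= i_bound:
                mb[0] |= set(scope)
                mb[1].append(scope)
                break
        else:
            minibuckets.append([set(scope), [scope]])
    return minibuckets

# Bucket of B from the slide, with i-bound 3:
for joint, members in partition_bucket(
        [("A", "B"), ("A", "B", "D"), ("B", "C", "E")], 3):
    print(sorted(joint), members)
# Two mini-buckets: {A,B,D} holding P(B|A), P(D|A,B), and {B,C,E}
# holding P(E|B,C), matching the split shown two slides earlier.
```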
23
Outline
- Introduction
- Optimization tasks for graphical models
- Solving by inference and search
  - Inference: bucket elimination, dynamic programming; mini-bucket elimination
  - Search: branch-and-bound and best-first; lower-bounding heuristics; AND/OR search spaces
- Hybrids of search and inference: cutset decomposition; super-bucket scheme
24
The Search Space
Objective function: F = f1(A,B) + f2(A,C) + f3(A,E) + f4(A,F) + f5(B,C) + f6(B,D) + f7(B,E) + f8(C,D) + f9(E,F), to be minimized over binary variables. [Figure: the nine binary cost tables, the primal graph over A, B, C, D, E, F, and the OR search tree along a fixed variable ordering.]
25
The Search Space: Arc Costs
Each arc is labeled with a cost calculated from the relevant cost components: the sum of all functions whose scope becomes fully instantiated by the assignment the arc extends. [Figure: the search tree with arc costs.]
26
The Value Function
The value of a node is the minimal cost of a solution in the subtree below it, computed bottom-up from the leaves; the value of the root is the optimal cost. [Figure: the search tree annotated with node values.]
27
An Optimal Solution
Following, from the root, arcs whose costs achieve the node values traces an optimal solution. [Figure: an optimal solution path highlighted in the search tree.]
28
Basic Heuristic Search Schemes
The heuristic function f(xp) computes a lower bound on the best extension of the partial assignment xp, and can be used to guide a heuristic search algorithm. We focus on:
1. Branch and Bound: uses f(xp) to prune the depth-first search tree; linear space.
2. Best-First Search: always expands the node with the best heuristic value f(xp); needs lots of memory.
29
Classic Branch-and-Bound
At a node n of the OR search tree, g(n) is the cost of the path from the root and h(n) is a heuristic lower bound on the cost below n, so LB(n) = g(n) + h(n). Prune n whenever LB(n) ≥ UB, the cost of the best solution found so far.
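A generic sketch of this scheme (assumptions, not from the slide: binary domains, additive arc costs given by a step_cost callable, and an admissible heuristic h; nothing here is tied to a particular graphical model):

```python
import math

def branch_and_bound(n_vars, step_cost, h):
    best = {"ub": math.inf, "sol": None}

    def dfs(partial, g):
        if g + h(partial) >= best["ub"]:      # LB(n) = g(n) + h(n); prune
            return
        if len(partial) == n_vars:            # full assignment: new incumbent
            best["ub"], best["sol"] = g, list(partial)
            return
        var = len(partial)
        for value in (0, 1):
            dfs(partial + [value], g + step_cost(var, value, partial))

    dfs([], 0.0)
    return best["ub"], best["sol"]

# Toy run: cost 1 for assigning value 1, 0 otherwise; h = 0 is trivially admissible.
print(branch_and_bound(3, lambda v, x, p: float(x), lambda p: 0.0))
# -> (0.0, [0, 0, 0])
```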
30
How to Generate Heuristics
The principle of relaxed models Linear optimization for integer programs Mini-bucket elimination Bounded directional consistency ideas
31
Generating Heuristics for Graphical Models (Kask and Dechter, 1999)
Given a cost function C(a,b,c,d,e) = f(a)·f(b,a)·f(c,a)·f(e,b,c)·f(d,b,a), define an evaluation function over a partial assignment as the cost of its best extension:
f*(a,e,d) = min_{b,c} C(a,b,c,d,e) = f(a) · min_{b,c} [ f(b,a)·f(c,a)·f(e,b,c)·f(d,a,b) ] = g(a,e,d) · H*(a,e,d)
32
Generating Heuristics (cont.)
H*(a,e,d) = min_{b,c} f(b,a)·f(c,a)·f(e,b,c)·f(d,a,b)
  = min_c [ f(c,a) · min_b [ f(e,b,c)·f(b,a)·f(d,a,b) ] ]
  ≥ min_b [ f(b,a)·f(d,a,b) ] · min_c [ f(c,a) · min_b f(e,b,c) ]
  = h^B(d,a) · h^C(e,a) = H(a,e,d)
Therefore f(a,e,d) = g(a,e,d)·H(a,e,d) ≤ f*(a,e,d). The heuristic function H is exactly what is compiled during the preprocessing stage of the Mini-Bucket algorithm.
34
Static MBE Heuristics
Given a partial assignment xp, estimate the cost of the best extension to a full solution: the evaluation function f(xp) is computed using the functions recorded by the Mini-Bucket scheme. Example (belief network over A, B, C, D, E): bucket B holds P(E|B,C), P(D|A,B), P(B|A) and generates h^B(E,C) and h^B(D,A); bucket C holds P(C|A), h^B(E,C) and generates h^C(E,A); buckets D and E generate h^D(A) and h^E(A); bucket A holds P(A), h^D(A), h^E(A). Then f(a,e,D) = g(a,e) · H(a,e,D) = P(a) · h^B(D,a) · h^C(e,a), and h is admissible.
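In code, evaluating a partial assignment multiplies the exact cost g of the fully instantiated input functions by the product H of the pre-compiled mini-bucket functions whose scopes are fully assigned. A simplified sketch, not the slides' exact bookkeeping: it assumes the caller passes exactly the h-functions generated in buckets of unassigned variables, using the same scope/table representation as the earlier sketches.

```python
def evaluate_partial(x_p, originals, h_functions):
    """f(x_p) = g(x_p) * H(x_p) for a partial assignment x_p (a dict)."""
    def value(f):
        return f["table"][tuple(x_p[v] for v in f["scope"])]

    g = 1.0
    for f in originals:                    # exact cost of the assigned part
        if all(v in x_p for v in f["scope"]):
            g *= value(f)
    H = 1.0
    for h in h_functions:                  # mini-bucket estimate of the rest
        if all(v in x_p for v in h["scope"]):
            H *= value(h)
    # For MPE (max-product) this upper-bounds the best extension of x_p.
    return g * H
```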
35
Heuristics Properties
The MB heuristic is monotone and admissible, and is retrieved in linear time. IMPORTANT: heuristic strength varies with the i-bound of MB(i): a higher i-bound means more pre-processing, a stronger heuristic, and less search. This allows a controlled trade-off between preprocessing and search.
36
Experimental Methodology
Algorithms BBMB(i) – Branch and Bound with MB(i) BBFB(i) - Best-First with MB(i) MBE(i) Test networks: Random Coding (Bayesian) CPCS (Bayesian) Random (CSP) Measures of performance Compare accuracy given a fixed amount of time - how close is the cost found to the optimal solution Compare trade-off performance as a function of time
37
Empirical Evaluation of mini-bucket heuristics, Bayesian networks, coding
[Figures: Random Coding, K=100, noise=0.32 and Random Coding, K=100, noise=0.28.]
38
Max-CSP experiments (Kask and Dechter, 2000)
39
Dynamic MB Heuristics Rather than pre-compiling, the mini-bucket heuristics can be generated during search Dynamic mini-bucket heuristics use the Mini-Bucket algorithm to produce a bound for any node in the search space (a partial assignment, along the given variable ordering)
40
Dynamic MB and MBTE Heuristics (Kask, Marinescu and Dechter, 2003)
Rather than precompiling, compute the heuristics during search. Dynamic MB: dynamic mini-bucket heuristics use the Mini-Bucket algorithm to produce a bound for any node during search. Dynamic MBTE: heuristics can be computed simultaneously for all uninstantiated variables using mini-bucket-tree elimination. MBTE is an approximation scheme defined over cluster trees; it outputs multiple bounds, for each variable and value extension, at once.
41
Cluster Tree Elimination - example
[Figure: a cluster tree with clusters 1:ABC, 2:BCDF, 3:BEF, 4:EFG and separators BC, BF, EF, for a problem over A through G.]
42
Mini-Clustering
Motivation: the time and space complexity of Cluster Tree Elimination depends on the induced width w* of the problem; when w* is large, the CTE algorithm becomes infeasible.
The basic idea: reduce the size of each cluster (the exponent) by partitioning it into mini-clusters with fewer variables. An accuracy parameter i bounds the maximum number of variables in a mini-cluster. The same idea was explored for variable elimination (Mini-Buckets).
43
Idea of Mini-Clustering
Split a cluster into mini-clusters => bound complexity
44
Mini-Clustering - example
[Figure: mini-clustering on the same cluster tree, clusters 1:ABC, 2:BCDF, 3:BEF, 4:EFG with separators BC, BF, EF, with the large clusters split into mini-clusters.]
45
Mini Bucket Tree Elimination
[Figure: the cluster tree of CTE (left) and the corresponding mini-bucket tree of MBTE (right), with the same clusters and separators.]
46
Mini-Clustering correctness and completeness: algorithm MC(i) computes a bound (or an approximation) for each variable and each of its values. MBTE is the special case in which the clusters are the buckets of BTE.
47
Branch and Bound w/ Mini-Buckets
BB with static Mini-Bucket Heuristics (s-BBMB) Heuristic information is pre-compiled before search. Static variable ordering, prunes current variable BB with dynamic Mini-Bucket Heuristics (d-BBMB) Heuristic information is assembled during search. Static variable ordering, prunes current variable BB with dynamic Mini-Bucket-Tree Heuristics (BBBT) Heuristic information is assembled during search. Dynamic variable ordering, prunes all future variables
48
Empirical Evaluation
Algorithms: complete: BBBT, BBMB; incomplete: DLM, GLS, SLS, IJGP, IBP (coding).
Measures: time, accuracy (% exact), #backtracks, bit error rate (coding).
Benchmarks: coding networks, Bayesian Network Repository, grid networks (N-by-N), random noisy-OR networks, random networks.
49
Real-World Benchmarks
Average accuracy and time; 30 samples, 10 observations, 30 seconds. [Table omitted.]
50
Empirical Results: Max-CSP
Random Binary Problems: <N, K, C, T> N: number of variables K: domain size C: number of constraints T: Tightness Task: Max-CSP
51
BBBT(i) vs. BBMB(i), N=100. [Figures for i = 2, 3, 4, 5, 6, 7.]
52
Searching the Graph; caching goods
[Figure: the OR search graph with caching for the running example, with context(A)=[A], context(B)=[A,B], context(C)=[A,B,C], context(D)=[A,B,D], context(E)=[A,E], context(F)=[F].]
53
Searching the Graph; caching goods
[Figure: the same search graph annotated with node values.]
54
Outline
- Introduction
- Optimization tasks for graphical models
- Solving by inference and search
  - Inference: bucket elimination, dynamic programming; mini-bucket elimination, belief propagation
  - Search: branch-and-bound and best-first; lower-bounding heuristics; AND/OR search spaces
- Hybrids of search and inference: cutset decomposition; super-bucket scheme
55
Classic OR Search Space
Ordering: A, B, E, C, D, F. Given a graphical model, the classical way of searching (called OR search from here on) is to instantiate variables in turn, following a static or dynamic ordering. This process yields a search tree (an OR tree) whose nodes represent states in the space of partial assignments. The OR tree does not capture the independencies in the graphical model; introducing AND states decomposes the problem into independent subproblems and yields an AND/OR search space, shown next. [Figure: the full OR search tree.]
56
AND/OR Search Space
[Figure: the primal graph and a DFS (pseudo) tree over A, B, C, D, E, F, and the resulting AND/OR search tree alternating OR nodes (variables) and AND nodes (values).] The backbone pseudo-tree determines the structure of the AND/OR space, just as the ordering does for OR search, and it captures the independencies in the model.
57
AND/OR vs. OR
[Figure: the AND/OR search tree and the OR search tree for the same problem.] Here the AND/OR tree has size exp(4) while the OR tree has size exp(6): the AND/OR tree is exponential in the depth of the DFS (pseudo) tree rather than in the number of variables, and whole subtrees that would be repeated in the OR space are skipped.
58
OR space vs. AND/OR space
[Table: OR vs. AND/OR space on random graphs with 20 nodes, 20 edges and 2 values per node; for example, at width 5 and height 10, OR search takes 3.154 sec (2,097,150 nodes, 1,048,575 backtracks) while AND/OR search takes 0.03 sec (10,494 AND nodes, 5,247 OR nodes). The remaining rows, at smaller widths and heights, show similar orders-of-magnitude gaps.]
59
AND/OR vs. OR (pruning)
[Figure: the AND/OR and OR search spaces when the tuples (A=1,B=1) and (B=0,C=0) are inconsistent; the corresponding subtrees are pruned.]
60
AND/OR vs. OR: Complexity
AND/OR search tree: linear space; time O(exp(m)), where m is the pseudo-tree depth, which is O(exp(w*·log n)). OR search tree: linear space; time O(exp(n)).
61
#CSP – AND/OR Search Tree
Relations R_ABC, R_BCD, R_ABE, R_AEF over binary variables; the tree is defined by a variable ordering (fixed, but may be dynamic). [Figure: the primal graph, the pseudo-tree, and the AND/OR search tree.]
63
#CSP – AND/OR Tree DFS
A depth-first traversal computes the solution count: AND nodes apply the combination operator (product), OR nodes apply the marginalization operator (summation). [Figure: the AND/OR tree annotated with partial counts; the root value, 14, is the number of solutions.]
64
AND/OR Tree Search for COP
[Figure: a COP with 6 variables and 9 binary cost functions, and its AND/OR search tree.] The AND/OR search tree can be traversed depth-first to solve the COP; the goal is to minimize the sum of the individual functions. A DFS algorithm expands alternating levels of OR and AND nodes, starting from the root of the pseudo-tree. Each node maintains a value: the minimal-cost solution to the subproblem at that node, computed recursively from the values of its successors. OR nodes minimize the values of their children; AND nodes sum the values of their children. The arc between an OR node and an AND child carries a label: the sum of all cost functions that mention the OR variable and whose scope is fully assigned on the current path from the root. For example, the label 3 on the arc from C to (C,0) under A=0, B=0 is f2(A,C) + f5(B,C). Arc labels are added to the AND-node values, and the value of the root node is the optimal solution. OR node = marginalization operator (minimization); AND node = combination operator (summation). A code sketch of this recursion follows.
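The recursion translates directly into code; a sketch for min-sum over binary domains, where children encodes the pseudo-tree and label(v, x, env) is assumed to return the arc label defined above:

```python
def or_value(v, assignment, children, label):
    # OR node: minimize over the AND children (the values of v).
    return min(and_value(v, x, assignment, children, label) for x in (0, 1))

def and_value(v, x, assignment, children, label):
    env = {**assignment, v: x}
    # AND node: arc label plus the sum of independent child subproblems.
    return label(v, x, env) + sum(or_value(c, env, children, label)
                                  for c in children[v])
```

Calling or_value(root, {}, children, label) returns the optimal cost; memoizing or_value on the context of v would turn this tree search into the graph search with caching discussed below.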
65
Summary of AND/OR Search Trees
Based on a backbone pseudo-tree. A solution is a subtree. Each node has a value: the cost of the optimal solution to its subproblem, computed recursively from the values of its descendants. Solving a task = finding the value of the root node. AND/OR search trees and algorithms appear in [Freuder & Quinn, 1985], [Collin, Dechter & Katz, 1991], [Bayardo & Miranker, 1995]. Space: O(n). Time: O(exp(m)), where m is the depth of the pseudo-tree; since a pseudo-tree of depth O(w*·log n) always exists, time is O(exp(w*·log n)). BFS is time and space O(exp(w*·log n)).
66
An AND/OR Graph: Caching Goods
[Figure: a primal graph over A, B, C, D, E, F, G, H, J, K and its context-minimal AND/OR search graph.] Merging is driven by contexts, which can be read off the induced pseudo-graph: a context is a separator set bounded by w*, which yields the complexity theorem below. Merging need not wait until the full tree is generated; the minimal graph can be generated while searching, in time linear in its size (linear in the output).
67
Context-based Caching
Caching is possible when contexts coincide. The context of a variable = its parent-separator set in the induced pseudo-graph = the variable itself plus its ancestors connected to the subtree below it. Example (variables A, D, B, C, E, F, G, H, J, K): context(B) = {A,B}, context(C) = {A,B,C}, context(D) = {D}, context(F) = {F}. See the sketch below.
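Contexts can be computed from the pseudo-tree and the primal graph; a sketch that, for simplicity, uses primal-graph edges only (the proper definition is over the induced graph, so this version can underestimate contexts on some instances):

```python
def contexts(parent, edges):
    # parent: maps each variable to its pseudo-tree parent (root -> None)
    adj = {}
    for u, w in edges:
        adj.setdefault(u, set()).add(w)
        adj.setdefault(w, set()).add(u)

    def subtree(v):
        kids = [u for u, p in parent.items() if p == v]
        return {v}.union(*(subtree(k) for k in kids)) if kids else {v}

    def ancestors(v):
        out = []
        while parent[v] is not None:
            v = parent[v]
            out.append(v)
        return out

    # context(v) = v plus ancestors with an edge into v's subtree
    return {v: {v} | {a for a in ancestors(v)
                      if any(w in adj.get(a, set()) for w in subtree(v))}
            for v in parent}

print(contexts({"A": None, "B": "A", "C": "B"},
               [("A", "B"), ("B", "C"), ("A", "C")]))
# -> {'A': {'A'}, 'B': {'A', 'B'}, 'C': {'A', 'B', 'C'}}
```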
68
Complexity of AND/OR Graph
Theorem: Traversing the AND/OR search graph is time and space exponential in the induced width (treewidth). Applied to the OR graph, the complexity is time and space exponential in the pathwidth.
71
#CSP – AND/OR Search Graph (Caching Goods)
[Figure: merging nodes with identical contexts turns the AND/OR search tree for R_ABC, R_BCD, R_ABE, R_AEF into the context-minimal AND/OR search graph. Time and space: O(exp(w*)).]
73
All Four Search Spaces
Full OR search tree: 126 nodes. Context-minimal OR search graph: 28 nodes. Full AND/OR search tree: 54 AND nodes. Context-minimal AND/OR search graph: 18 AND nodes. [Figure: the four search spaces for the running example.]
74
AND/OR vs. OR DFS algorithms
(k = domain size, m = pseudo-tree depth, n = number of variables, w* = induced width, pw* = pathwidth)
OR tree:      space O(n),        time O(k^n)
OR graph:     space O(n·k^pw*),  time O(n·k^pw*)
AND/OR tree:  space O(n),        time O(n·k^m) = O(n·k^(w*·log n))  (Freuder 85; Bayardo 95; Darwiche 01)
AND/OR graph: space O(n·k^w*),   time O(n·k^w*)
75
Searching AND/OR Graphs
AO(i): depth-first search that caches i-contexts, where i bounds the size of a cache table (the number of variables in a context).
i = 0: space O(n), time O(exp(w*·log n)).
i = w*: space O(exp(w*)), time O(exp(w*)).
Question: what is the time complexity of AO(i) for intermediate i?
76
End of class. The remaining slides were not covered.
77
AND/OR Branch-and-Bound (AOBB)
Associate each node n with a static heuristic estimate h(n) of its value v(n); h(n) is a lower bound on v(n). For every node n in the search tree, maintain ub(n), the cost of the current best solution rooted at n, and lb(n), a lower bound on the minimal cost at n. During search the algorithm computes both bounds for each node and prunes the search space whenever the lower bound meets or exceeds the upper bound.
78
Lower/Upper Bounds
UB(X) = the best cost found below X so far (here, v(X,0)). LB(X) is computed recursively along the current path from the root, using exact values v(·) for subtrees already solved and heuristic estimates h(·) for frontier nodes: LB(X) = LB(X,1), where LB(X,1) = l(X,1) + v(A) + h(C) + LB(B) and LB(B) = LB(B,0) = h(B,0). [Figure: a partially explored AND/OR tree in which the subtree below (X,0) and node A are solved, C is on the frontier, and search is currently at (B,0).] Prune below the AND node (B,0) if LB(X) ≥ UB(X).
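A minimal sketch of the pruning idea for min-sum AOBB. It implements only local pruning at each OR node (an AND child is pruned when its lower bound reaches the best cost already found at that node); the full algorithm also propagates bounds up the current path, as described above.

```python
import math

def aobb(v, assignment, children, label, h):
    # Assumptions: binary domains; h(c, env) lower-bounds the value of the
    # child subproblem c under the partial assignment env.
    ub = math.inf                               # best solution below v so far
    for x in (0, 1):
        env = {**assignment, v: x}
        l = label(v, x, env)
        lb = l + sum(h(c, env) for c in children[v])
        if lb >= ub:                            # prune below AND node (v, x)
            continue
        val = l + sum(aobb(c, env, children, label, h) for c in children[v])
        ub = min(ub, val)
    return ub
```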
79
Shallow/Deep Cutoffs
A cutoff can occur at a shallow level, when the heuristic estimate h of an AND node is no better than the current upper bound of its parent (shallow cutoff), or much deeper, when the updated lower bound of some ancestor is no better than that ancestor's upper bound (deep cutoff). These cuts are reminiscent of minimax shallow/deep cutoffs in game trees. [Figure: both cutoff types on an AND/OR tree over X, A, B, C, D, E.]
80
Summary of AOBB: it traverses the AND/OR search tree in a depth-first manner; lower bounds are computed from heuristic estimates of nodes at the search frontier together with the values of nodes already explored; and it prunes the search space as soon as a lower bound meets or exceeds an upper bound.
81
Heuristics for AND/OR: in the AND/OR search space h(n) can be computed using any heuristic. We used: static mini-bucket heuristics; dynamic mini-bucket heuristics; and maintaining FDAC, full directional soft arc-consistency [Larrosa & Schiex, 2003].
82
Empirical Evaluation
Tasks: solving WCSPs; finding the MPE in belief networks.
Benchmarks: random binary WCSPs; RLFAP networks (CELAR6); the Bayesian Networks Repository.
Algorithms: s-AOMB(i), d-AOMB(i), AOMFDAC versus s-BBMB(i), d-BBMB(i), BBMFDAC, each AND/OR branch-and-bound paired against its OR-space counterpart; static variable ordering (DFS traversal of the pseudo-tree).
83
Random Binary WCSPs (Marinescu and Dechter, 2005)
[Figures: s-AOMB(i) vs. s-BBMB(i) and d-AOMB(i) vs. d-BBMB(i); the x-axis is the mini-bucket i-bound, the y-axis CPU time in seconds.] Random networks with n=20 variables, domain size d=5, c=100 constraints and t=70% tightness (the ratio of forbidden tuples); time limit 180 seconds. With static mini-buckets the AND/OR algorithm is much better at every reported i-bound except i=2, where neither algorithm finishes in time; with dynamic mini-buckets the AND/OR algorithm wins at all reported i-bounds. AO search is superior to OR search.
84
Random Binary WCSPs (contd.)
[Figures: heuristic quality on dense (n=20, d=5, c=100, t=70%) and sparse (n=50, d=5, c=80, t=70%) random binary WCSPs.] In both cases static mini-buckets with large i-bounds are the best choice. If large i-bounds are not affordable (i.e., space is restricted), dynamic mini-buckets with smaller i-bounds are preferable on sparse networks, while maintaining full directional arc-consistency is best on denser ones. AOMB with large i is competitive with AOMFDAC.
85
Resource Allocation: Radio Link Frequency Assignment Problem (RLFAP)
The goal is to assign frequencies to a set of radio links so that all links can operate together without noticeable interference; this is easily formulated as a Weighted CSP. Results on five sub-instances of the larger CELAR6 network (time in seconds / nodes expanded):

Instance      BBMFDAC time   BBMFDAC nodes   AOMFDAC time   AOMFDAC nodes
CELAR6-SUB0   2.78           1,871           1.98           435
CELAR6-SUB1   2,420.93       364,986         981.98         180,784
CELAR6-SUB2   8,801.12       19,544,182      1,138.87       175,377
CELAR6-SUB3   38,889.20      91,168,896      4,028.59       846,986
CELAR6-SUB4   84,478.40      6,955,039       47,115.40      4,643,229

AOMFDAC is superior to BBMFDAC on all test instances; on SUB3 it is close to an order of magnitude faster, with a far smaller search space.
86
Bayesian Networks Repository
[Table: CPU time and nodes for s-AOMB(i), s-BBMB(i), d-AOMB(i), d-BBMB(i) at i = 2 through 5 on Barley (n=48, d=67, w*=7, h=17), Munin1 (189, 21, 11, 24) and Munin3 (1044, 21, 7, 25); a dash means the 600-second time limit was exceeded.] The AND/OR algorithms again dominate the OR ones, in both CPU time and the size of the explored search space. On Munin3, the largest network with more than 1000 variables, static AOBB with mini-bucket i-bound 5 solves the problem in about half a second, while static OR branch-and-bound exceeds the 600-second limit. Static AO is best with an accurate heuristic (large i).
87
Outline
- Introduction
- Optimization tasks for graphical models
- Solving by inference and search
  - Inference: bucket elimination, dynamic programming; mini-bucket elimination, belief propagation
  - Search: branch-and-bound and best-first; lower-bounding heuristics; AND/OR search spaces (searching trees, searching graphs)
- Hybrids of search and inference: cutset decomposition; super-bucket scheme
88
From Searching Trees to Searching Graphs
Any two nodes that root identical subtrees/subgraphs can be merged. The minimal AND/OR search graph is the closure of the AND/OR search tree under this merge operation. Inconsistent subtrees can be pruned too, and some portions can be collapsed or reduced.
89
AND/OR Search Graph
Contexts: context(A)={A}, context(B)={B,A}, context(C)={C,B}, context(D)={D}, context(E)={E,A}, context(F)={F}. [Figure: the primal graph, the pseudo-tree, and the AND/OR search tree.]
90
AND/OR Search Graph (merged)
[Figure: the same example after merging nodes with identical contexts: the context-minimal AND/OR search graph.]
91
Context-based caching
Contexts: context(A)={A}, context(B)={B,A}, context(C)={C,B}, context(D)={D}, context(E)={E,A}, context(F)={F}. The cache table for C is indexed by its context {B,C}, so it needs space O(exp(2)). [Figure: the pseudo-tree and the cache table for C.]
92
Searching AND/OR Graphs
AO(j): depth-first search that caches j-contexts, where j bounds the size of a cache table (the number of variables in a context).
j = 0: space O(n), time O(exp(w*·log n)).
j = w*: space O(exp(w*)), time O(exp(w*)).
Question: what is the time complexity of AO(j) for intermediate j?
93
Graph AND/OR Branch-and-Bound - AOBB(j)
[Marinescu & Dechter, CP 2005] Like AOBB on the tree except that, during search, nodes are merged based on context (caching); cache tables have size O(exp(j)), where j bounds the context size. Heuristics: static mini-bucket; dynamic mini-bucket; soft directional arc-consistency.
94
Pseudo-trees (I): the performance of AND/OR graph/tree search algorithms is influenced by the quality of the pseudo-tree. Finding a minimal-context or minimal-depth pseudo-tree is a hard problem. Heuristics: min-fill (minimizes context) and hypergraph separation (minimizes depth).
95
Pseudo-trees (II)
MIN-FILL [Kjærulff, 1990], [Bayardo & Miranker, 1995]: a depth-first traversal of the induced graph constructed along some elimination order, where the elimination order prefers variables with the smallest fill set.
HYPERGRAPH [Darwiche, 2001]: constraints are vertices of the hypergraph and variables are hyperedges; the hypergraph is decomposed recursively while minimizing the separator size (the number of variables) at each step, using the state-of-the-art software package hMeTiS.
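A sketch of the min-fill ordering heuristic (the graph is given as an edge list; ties are broken arbitrarily). A pseudo-tree can then be obtained by a depth-first traversal of the induced graph along the resulting order.

```python
def min_fill_order(nodes, edges):
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v); adj[v].add(u)

    def fill_count(v):
        # Number of fill edges eliminating v would add among its neighbors.
        nbrs = list(adj[v])
        return sum(1 for i, a in enumerate(nbrs) for b in nbrs[i + 1:]
                   if b not in adj[a])

    order = []
    remaining = set(nodes)
    while remaining:
        v = min(remaining, key=fill_count)     # smallest fill set first
        nbrs = list(adj[v])
        for i, a in enumerate(nbrs):           # add the fill edges
            for b in nbrs[i + 1:]:
                adj[a].add(b); adj[b].add(a)
        for u in adj[v]:                       # remove v from the graph
            adj[u].discard(v)
        del adj[v]
        remaining.remove(v)
        order.append(v)
    return order
```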
96
Quality of pseudo-trees
[Tables: induced width and depth of hypergraph vs. min-fill pseudo-trees on Bayesian Networks Repository instances (barley, diabetes, link, mildew, munin1-4, water, pigs) and on SPOT5 benchmarks (spot5 through spot507). Consistent with the previous slide, hypergraph pseudo-trees tend to be shallower while min-fill pseudo-trees tend to have smaller width.]
97
Empirical Evaluation
Tasks: solving binary WCSPs; finding the MPE in belief networks.
Benchmarks: random networks; resource allocation (SPOT5); genetic linkage analysis.
Algorithms: s-AOMB(i,j), d-AOMB(i,j), AOMFDAC(j), VE+C, where j is the cache bound; static variable ordering.
98
Random Networks
[Figures: static vs. dynamic heuristics.] Caching helps static heuristics with small i-bounds. Random networks with n=100 variables, domain size d=3, c=90 CPTs and p=2 parents per CPT; time limit 180 seconds.
99
Resource Allocation – SPOT5
[Table: SPOT5 instances (29b, 42b, 54b, 404b, 503b, 505b) solved with AOMFDAC and s-AOMB(i) under hypergraph and min-fill pseudo-trees, with and without caching; time and nodes reported.] Time limit 1800 seconds; caching uses j = i, and the best-performing i is shown. The MB(i) heuristic is strong enough to prune, so caching is less relevant here.
100
Genetic Linkage Analysis: 6 people, 3 markers
[Figure: the pedigree network, with locus variables L_ij (maternal and paternal), phenotype variables X_ij, and selector variables S_ij across three generations.] Linkage instances are interesting as graphical models because they involve both probabilistic and deterministic relationships and can yield highly complex structures; family trees grow, as does the number of loci analyzed at once. Remarkably, the best linkage analysis tool of the day, Superlink, uses general-purpose graphical-model algorithms.
101
Empirical evaluation (averages over 5 runs). [Table omitted.]
102
Pedigree 23 (averages over 5 runs). [Table omitted.]
103
Pedigree 30 (averages over 5 runs). [Table omitted.]
104
Outline
- Introduction
- Optimization tasks for graphical models
- Solving by inference and search
  - Inference: bucket elimination, dynamic programming; mini-bucket elimination, belief propagation
  - Search: branch-and-bound and best-first; lower-bounding heuristics; AND/OR search spaces
- Hybrids of search and inference: cutset decomposition; super-bucket scheme