
1 Book: Constraint Processing. Author: Rina Dechter. Chapter 10: Hybrids of Search and Inference (Time-Space Tradeoffs). Presented by Zheying Jane Yang. Instructor: Dr. Berthe Choueiry

2 Outline
- Combining search and inference: the hybrid idea
  - The cycle-cutset scheme
- Hybrids: conditioning first
  - A hybrid algorithm for propositional theories
- Hybrids: inference first
  - Super cluster tree elimination
  - Decomposition into non-separable components
  - Hybrids of hybrids
- A case study of combinational circuits
  - Parameters of primary join trees
  - Parameters controlling hybrids
- Summary

3 Part I: Hybrids of Search and Inference

4 Two primary constraint processing schemes:
1. Conditioning, or search: based on depth-first backtracking (BT) search; the time requirement may be exponential, but very little memory is needed.
2. Inference, or derivation: variable elimination; both time and space are exponential, but only in the induced width. Effective when the problem is sparse (w* is low).

5 The hybrid idea
Two ways to combine them:
- Use inference as preprocessing for search: inference yields a restricted search space (CSP -> CSP'), then search finds the solutions.
- Alternate between both methods: apply search to a subset of the variables, then perform inference on the rest, combining the subsolutions into full solutions.

6 Comparison of BT and VE

                   Backtracking (search)   Elimination (inference)
  Worst-case time  O(exp(n))               O(n·exp(w*)), w* <= n
  Space            O(n)                    O(n·exp(w*)), w* <= n
  Task             Find one solution       Knowledge compilation

When the induced width is small, say w* <= b, variable elimination runs in polynomial time and is thus far more efficient than backtracking search.

7 10.1 The Cycle-Cutset Scheme
Definition 5.3.5 (cycle-cutset): Given an undirected graph, a subset of its nodes is a cycle-cutset if removing the subset leaves a graph with no cycles.
Example: Figure 10.3, where an instantiated variable cuts its own cycle (instantiating x2 breaks the cycles passing through x2).
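The definition can be checked mechanically: a set is a cycle-cutset iff the graph minus that set is acyclic. A minimal sketch, using union-find to detect cycles; the example graph is hypothetical, loosely patterned on Figure 10.3:

```python
# Illustrative check: a node set is a cycle-cutset iff its removal leaves the
# graph acyclic. Union-find detects a cycle when an edge joins two nodes that
# are already connected.

def is_cycle_cutset(nodes, edges, cutset):
    remaining = set(nodes) - set(cutset)
    parent = {v: v for v in remaining}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for u, v in edges:
        if u in remaining and v in remaining:
            ru, rv = find(u), find(v)
            if ru == rv:
                return False                # this edge closes a cycle
            parent[ru] = rv
    return True

nodes = ["x1", "x2", "x3", "x4", "x5"]
edges = [("x1", "x2"), ("x2", "x3"), ("x3", "x4"),
         ("x4", "x2"), ("x2", "x5"), ("x5", "x1")]
print(is_cycle_cutset(nodes, edges, {"x2"}))  # True: cutting x2 breaks every cycle
print(is_cycle_cutset(nodes, edges, set()))   # False: the graph has cycles
```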

8 The Cycle-Cutset Scheme (cont'd)
Once the cycle-cutset variables are instantiated, the resulting network is cycle-free and can be solved by an inference-based tree-solving algorithm (a complicated problem becomes an easy one).
If the tree algorithm finds a consistent extension of the cutset instantiation, a solution to the entire problem is obtained. Otherwise, another instantiation of the cycle-cutset variables is considered, until a solution is found or all instantiations are exhausted.

9 Tradeoff Between Finding a Small Cycle-Cutset and Using Search + Tree-Solving
A small cycle-cutset is desirable; however, finding a minimal-size cycle-cutset is NP-hard. A compromise alternates BT search with the tree-solving algorithm:
Step 1: Use BT and keep track of the connectivity status of the constraint graph.
Step 2: As soon as the set of instantiated variables constitutes a cycle-cutset, switch to the tree-solving algorithm.
Step 3: If a consistent extension of the remaining variables is found, a solution is returned.
Step 4: Otherwise, no such extension exists; BT resumes and tries another instantiation.

10 Example
(Figure: a constraint graph over A, B, C, D, E and the constraint tree generated by the cutset {C, D}.)

11 Example, Figure 10.4
(* I think Figure 10.4(c) in the textbook is wrong!)
(Figure: (a) the constraint graph over A, B, C, D, E, F; (b) its ordered graph; (c) the constraint graph broken into a tree-variables part and a cycle-cutset-variables part.)

12 Two extreme cases:
- When the original problem has a tree constraint graph, the cycle-cutset scheme coincides with the tree algorithm.
- When the constraint graph is complete, the algorithm reverts to regular backtracking. (Why?)

13 Time and space complexity of the cycle-cutset algorithm
In the worst case, all possible assignments to the cutset variables must be tested, giving time complexity O(n·k^(c+2)), where n is the number of variables, c the cycle-cutset size, and k the domain size. There are k^c cutset assignments, i.e. k^c tree-structured subproblems, and each one requires O((n-c)·k^2) steps of the tree algorithm. This gives O((n-c)·k^(c+2)), hence O(n·k^(c+2)).
The space complexity is linear.

14 10.1.1 Structure-Based Recursive Search
An algorithm that performs search only, but consults a tree-decomposition to reduce its complexity. Consider a binary constraint network whose constraint graph is a tree. Given such a network:
- Remove a node x1.
- This generates two subtrees of size roughly n/2 each.
- Let T_n be the time needed to solve this binary tree starting at x1. If x1 has at most k values, T_n obeys the recurrence
  T_n = k · 2T_(n/2),  T_1 = k,
  which solves to T_n = n · k^(log n + 1).
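The recurrence and its closed form can be sanity-checked numerically for n a power of two (a quick check of my own, not from the textbook):

```python
# Numeric check of the recurrence T_n = k * 2*T_{n/2}, T_1 = k
# against the closed form T_n = n * k^(log2(n) + 1), for n a power of two.

def T(n, k):
    return k if n == 1 else k * 2 * T(n // 2, k)

k = 3
for n in (1, 2, 4, 8, 16):
    log2n = n.bit_length() - 1              # exact log2 for powers of two
    assert T(n, k) == n * k ** (log2n + 1)
print(T(8, 3))  # 648
```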

15 Theorem 10.1.3
A constraint problem with n variables and k values, having a tree-decomposition of tree-width w*, can be solved by recursive search in linear space and in O(n·k^(2w*(log n + 1))) time.

16 10.2 Hybrids: Conditioning First
Conditioning on a cycle-cutset suggests a framework of hybrid algorithms parameterized by a bound b on the induced width of the subproblems solved by inference: conditioning on a b-cutset.

17 Conditioning Set, or b-cutset
The algorithm removes a set of cutset variables, leaving a constraint graph with induced width bounded by b. We call such a set a conditioning set, or b-cutset.
Definition 10.2.1 (b-cutset): Given a graph G, a subset of nodes is a b-cutset iff, when the subset is removed, the resulting graph has induced width less than or equal to b. A minimal b-cutset has the smallest size among all b-cutsets of the graph. A cycle-cutset is a 1-cutset of the graph.

18 How to Find a b-cutset
Step 1: Given an ordering d = {x1, ..., xn} of G, a b-cutset relative to d is obtained by processing the nodes from last to first.
Step 2: When node x is processed, if its induced width is greater than b, it is added to the b-cutset.
Step 3: Else, its earlier neighbors are connected.
Step 4: The adjusted induced width relative to such a b-cutset is b.
Step 5: A minimal b-cutset is a smallest one among all b-cutsets.
Definition 10.2.3 (finding a b-cutset): Given a graph G = (V, E) and a constant b, find a minimal b-cutset; namely, find a smallest subset of nodes U such that G' = (V-U, E'), where E' includes all edges in E not incident to nodes in U, has induced width less than or equal to b.
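The steps above can be sketched as follows. A node's induced width during processing is its number of earlier neighbors in the current (induced) graph; the function name and tie-breaking are my own choices, not the textbook's:

```python
# Illustrative b-cutset construction relative to an ordering d: process nodes
# last to first; a node whose induced width exceeds b joins the cutset,
# otherwise its earlier neighbors are connected (the induced-graph step).

from itertools import combinations

def b_cutset(order, edges, b):
    adj = {v: set() for v in order}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    pos = {v: i for i, v in enumerate(order)}
    cutset = set()
    for v in reversed(order):
        earlier = {u for u in adj[v] if pos[u] < pos[v]}
        if len(earlier) > b:
            cutset.add(v)                   # condition on v: drop it from the graph
        else:
            for u, w in combinations(earlier, 2):
                adj[u].add(w)               # connect v's earlier neighbors
                adj[w].add(u)
    return cutset

# A 4-cycle has induced width 2; for b = 1 one node must be conditioned on.
cycle = [("x1", "x2"), ("x2", "x3"), ("x3", "x4"), ("x4", "x1")]
print(b_cutset(["x1", "x2", "x3", "x4"], cycle, 1))  # {'x4'}
print(b_cutset(["x1", "x2", "x3", "x4"], cycle, 2))  # set()
```

Note that cutset members are excluded from later induced-width counts automatically: they are processed first (last in the ordering), so they never appear as earlier neighbors.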

19 The Purpose of Algorithm elim-cond(b)
The original problem is divided into two parts:
- the cutset variables, and
- the remaining variables (the subproblem).
We run BT search on the cutset variables and bucket elimination on the remaining variables; this yields the elim-cond(b) algorithm. The constant b controls the balance between search and variable elimination, and thus the tradeoff between time and space.

20 Algorithm elim-cond(b) (textbook page 280, Figure 10.5)
Input: A constraint network R = (X, D, C); Y ⊆ X, a b-cutset; an ordering d in which the adjusted induced width relative to Y is bounded by b; Z = X - Y.
Output: A consistent assignment, if one exists.
1. While ȳ = next partial solution of Y found by BT, do
   (a) z̄ ← adaptive-consistency(R_{Y=ȳ}).
   (b) If z̄ is not false, return the solution (ȳ, z̄).
2. Endwhile.
3. Return: the problem has no solution.

21 The Complexity of elim-cond(b)
Theorem 10.2.2: Given R = (X, D, C), if elim-cond(b) is applied along ordering d, where Y is a b-cutset of size c_b, then the space complexity of elim-cond(b) is bounded by O(n·exp(b)) and its time complexity by O(n·exp(c_b + b)).
Proof: With the b-cutset assignment fixed, the time and space complexity of the inference portion (variable elimination) is bounded by O(n·k^b). The BT portion enumerates all possible values of the b-cutset, which takes O(k^(c_b)) time and linear space. Thus the total time complexity is O(n·k^b · k^(c_b)) = O(n·k^(b+c_b)).

22 Part II: Trade-off Between Search and Inference

23 Parameter b controls the trade-off between search and inference
- If b >= w*_d, where d is the ordering used by elim-cond(b), the algorithm coincides with adaptive consistency.
- As b decreases, the algorithm moves more nodes into the cutset, so c_b increases: the algorithm requires less space and more time. (Why?)

24 Trade-off between search and inference
Let c_1 be the size of the smallest 1-cutset and w* the smallest induced width. Then c_1 >= w* - 1, i.e. 1 + c_1 >= w*. The sum b + c_b is the exponent that determines the time complexity of elim-cond(b), while w* determines the complexity of bucket elimination. Each time b is increased by 1, the cutset size c_b decreases, giving the chain
1 + c_1 >= 2 + c_2 >= ... >= b + c_b >= ... >= w* + c_{w*} = w*.
When c_{w*} = 0, the subproblem itself has induced width w* and no vertices remain in the cutset. Search and variable elimination can also be interleaved dynamically. Thus we get a hybrid scheme whose time complexity decreases as its space increases, until it reaches the induced width.

25 Algorithm DP-DR(b)
- A variant of elim-cond(b) for processing propositional CNF theories: a hybrid of DPLL and DR.
  - For backtracking, the DPLL algorithm applies look-ahead using unit propagation at each node.
  - For bucket elimination, the Directional Resolution (DR) algorithm is applied.
- It is a special version of elim-cond(b) that incorporates dynamic variable ordering, using resolution as its variable-elimination operator (see Chapter 8, page 232).
Figure 10.9 (page 284): Algorithm DP-DR(b).

26 DP-DR(φ, b)
Input: A CNF theory φ over variables X; a bound b.
Output: A decision of whether φ is satisfiable; if it is, an assignment ȳ to its conditioning variables and the conditional directional extension E_d(φ_ȳ).
1. If unit-propagate(φ) = false, return false;
   else X ← X - {variables in unit clauses}.
2. If there are no more variables to process, return true.
3. Else, while there exists Q ∈ X such that degree(Q) <= b in the current conditioned graph:
     resolve over Q;
     if no empty clause is generated, add all resolvents to the theory, else return false;
     X ← X - {Q}.
   Endwhile.
4. Select a variable Q ∈ X; X ← X - {Q}; Y ← Y ∪ {Q}.
5. Return DP-DR(φ ∧ Q, b) ∨ DP-DR(φ ∧ ¬Q, b).
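A minimal sketch of the unit-propagate step that DP-DR relies on, assuming clauses are represented as sets of (variable, sign) literals, e.g. ("A", True) for A and ("A", False) for ¬A; the representation is my own, not the textbook's:

```python
# Unit propagation: repeatedly pick a unit clause, assign its forced value,
# drop satisfied clauses, and shrink clauses containing the falsified literal.

def unit_propagate(clauses):
    """Returns (simplified clauses, forced assignment), or (None, None) on conflict."""
    clauses = [set(c) for c in clauses]
    assignment = {}
    while True:
        units = [c for c in clauses if len(c) == 1]
        if not units:
            return clauses, assignment
        (var, value), = units[0]
        assignment[var] = value
        reduced_clauses = []
        for c in clauses:
            if (var, value) in c:
                continue                     # satisfied clause: drop it
            c = c - {(var, not value)}       # falsified literal: remove it
            if not c:
                return None, None            # empty clause: conflict
            reduced_clauses.append(c)
        clauses = reduced_clauses

# phi = {A, ¬A∨B, ¬B∨C∨D}: propagation forces A, then B.
phi = [{("A", True)},
       {("A", False), ("B", True)},
       {("B", False), ("C", True), ("D", True)}]
simplified, forced = unit_propagate(phi)
print(forced)  # {'A': True, 'B': True}
```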

27 Representing a Propositional Theory as an Interaction Graph
The interaction graph of a propositional theory φ is denoted G(φ):
- Each propositional variable is a node in the graph.
- Each pair of variables appearing in the same clause is connected by an edge, so each clause yields a complete subgraph over its variables.
The theory φ conditioned on the assignment Y = ȳ is called the conditional theory of φ relative to ȳ, denoted φ_ȳ. The conditional graph of φ_ȳ, denoted G(φ_ȳ), is obtained by deleting the nodes in Y and all their incident edges from G(φ). The conditional induced width of a theory, denoted w*_ȳ, is the induced width of G(φ_ȳ).
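A short sketch of building the interaction graph, with clauses again as sets of (variable, sign) literals (my own representation). The example theory is the φ1 of the following slide:

```python
# Build G(phi): every variable becomes a node, and every clause connects all
# pairs of its variables, so each clause contributes a clique.

from itertools import combinations

def interaction_graph(clauses):
    nodes, edges = set(), set()
    for clause in clauses:
        vs = sorted({var for var, _sign in clause})
        nodes.update(vs)
        edges.update(frozenset(p) for p in combinations(vs, 2))
    return nodes, edges

# phi1 = {(¬C), (A∨B∨C), (¬A∨B∨E), (¬B∨C∨D)}
phi1 = [{("C", False)},
        {("A", True), ("B", True), ("C", True)},
        {("A", False), ("B", True), ("E", True)},
        {("B", False), ("C", True), ("D", True)}]
nodes, edges = interaction_graph(phi1)
print(len(nodes), len(edges))  # 5 7
```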

28 Example
φ1 = {(¬C), (A ∨ B ∨ C), (¬A ∨ B ∨ E), (¬B ∨ C ∨ D)}
Applying resolution over variable A yields the new clause (B ∨ C ∨ E), so the resulting interaction graph gains an edge between nodes E and C. (Figure 10.6 in the textbook.)
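Resolution over a variable pairs each clause containing Q with each clause containing ¬Q and unions the remaining literals, discarding tautologies. A sketch in the same (variable, sign) literal representation assumed above, checked against the slide's example:

```python
# Resolve over q: combine every (q-positive, q-negative) clause pair, remove
# the two q literals, and drop any resolvent containing X and ¬X together.

def resolve_over(clauses, q):
    pos = [c for c in clauses if (q, True) in c]
    neg = [c for c in clauses if (q, False) in c]
    resolvents = set()
    for cp in pos:
        for cn in neg:
            r = (cp | cn) - {(q, True), (q, False)}
            tautology = any((v, not s) in r for v, s in r)
            if not tautology:
                resolvents.add(frozenset(r))
    return resolvents

# Resolving (A∨B∨C) with (¬A∨B∨E) over A yields (B∨C∨E).
c1 = frozenset({("A", True), ("B", True), ("C", True)})
c2 = frozenset({("A", False), ("B", True), ("E", True)})
print(resolve_over([c1, c2], "A") ==
      {frozenset({("B", True), ("C", True), ("E", True)})})  # True
```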

29 Example
Given the theory φ = {(¬C ∨ E), (A ∨ B ∨ C ∨ D), (¬A ∨ B ∨ E ∨ D), (¬B ∨ C ∨ D)}, condition on A:
- When A = 0: {(¬B ∨ C ∨ D), (B ∨ C ∨ D), (¬C ∨ E)}.
- When A = 1: {(B ∨ E ∨ D), (¬B ∨ C ∨ D), (¬C ∨ E)}.
Conditioning deletes A and its incident edges from the interaction graph. When A = 0, the clause (¬A ∨ B ∨ E ∨ D) is always satisfied, so the edges it contributed among B, E, and D can also be removed.
(Figure: the original graph has w* = 4; conditioned on A = 0 the graph has w* = 2; conditioned on A = 1 it has w* = 3.)

30 Example: DCDR(b = 2)
(Figure: a bucket-elimination trace over buckets A through E, with input clauses (A ∨ B ∨ C ∨ D), (¬A ∨ B ∨ E ∨ D), (¬B ∨ C ∨ D), (¬C ∨ E). Where the conditioned graph has w* > 2, the algorithm conditions, branching on A = 0 and A = 1; where w* <= 2, it eliminates by resolution.)

31 Complexity of DP-DR(b)
Theorem 10.2.5: The time complexity of algorithm DP-DR(b) is O(n·exp(c_b + b)), where c_b is the size of the largest cutset conditioned upon. The space complexity is O(n·exp(b)).

32 32 Empirical evaluation of DP-DR (b)  Evaluation of DP-DR(b) on Conditioning-first hybrids  Empirial results from experiments with different structured CNFs. Use random uniform 3-CNFs with 100 variables and 400 clauses (2, 5) -trees with 40 cliques and 15 clauses per clique (4, 8) -trees with 50 cliques and 20 clauses per clique In general, (k, m) -trees are trees of cliques each having m+k nodes and separators size of k. The randomly generated 3-CNFs were designed to have an interaction graph that corresponds to (k, m)-trees.  The performance of DP-DR(b) depends on the induced width of the theories.  When b=5, the overall performance is best. See Figure 10.10, on page 285.

33 Part III: Non-separable Components and Tree-Decomposition

34 10.3 Hybrids: Inference First
- The other approach to combining conditioning and inference exploits the structure of the constraint network, using tree-decomposition.
- The algorithm CTE (cluster-tree elimination; Chapter 9, p. 261) performs:
  - variable elimination on the separators first (which are small), and
  - search within the tree clusters (which are relatively large).
- Thus we can trade even more space for time by allowing larger cliques with smaller separators.
- This is achieved by combining adjacent nodes of a tree-decomposition that are connected by "fat" separators,
- keeping apart only those nodes that are linked by separators of bounded size.

35 Tree Decomposition (definitions from Chapter 9, page 260)
Definition 9.2.5 (tree-decomposition): Let R = (X, D, C) be a CSP. A tree-decomposition of R is a triple <T, χ, ψ>, where T = (V, E) is a tree and χ, ψ are labeling functions associating each vertex v ∈ V with two sets χ(v) ⊆ X and ψ(v) ⊆ C that satisfy:
1. For each constraint R_i ∈ C, there is at least one vertex v ∈ V such that R_i ∈ ψ(v) and scope(R_i) ⊆ χ(v).
2. For each variable x ∈ X, the set {v ∈ V | x ∈ χ(v)} induces a connected subtree of T. (This is the connectedness property.)
Definition 9.2.6 (tree-width, hyper-width, separator): The tree-width of a tree-decomposition is tw = max_{v ∈ V} |χ(v)|, and its hyper-width is hw = max_{v ∈ V} |ψ(v)|. Given two adjacent vertices u and v of a tree-decomposition, the separator of u and v is defined as sep(u, v) = χ(u) ∩ χ(v).
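The two conditions of Definition 9.2.5 can be verified mechanically for a candidate decomposition. A sketch under my own data layout (chi maps tree vertices to variable sets, scopes lists each constraint's scope):

```python
# Check (1) constraint coverage and (2) the connectedness property of a
# candidate tree-decomposition. Example decompositions are hypothetical.

def is_tree_decomposition(tree_edges, chi, scopes):
    # Condition 1: every constraint scope fits inside some vertex label.
    if not all(any(set(s) <= chi[v] for v in chi) for s in scopes):
        return False
    # Condition 2: for each variable x, {v | x in chi(v)} induces a subtree.
    for x in set().union(*chi.values()):
        vs = {v for v in chi if x in chi[v]}
        start = next(iter(vs))
        seen, stack = {start}, [start]
        while stack:                       # DFS restricted to vertices in vs
            v = stack.pop()
            for a, b in tree_edges:
                for u, w in ((a, b), (b, a)):
                    if u == v and w in vs and w not in seen:
                        seen.add(w)
                        stack.append(w)
        if seen != vs:
            return False
    return True

chain = [(1, 2), (2, 3)]
good = {1: {"A", "B"}, 2: {"B", "C"}, 3: {"C", "D"}}
bad = {1: {"A", "B"}, 2: {"C"}, 3: {"A", "D"}}   # "A" occurs in 1 and 3 only
print(is_tree_decomposition(chain, good, [("A", "B"), ("C", "D")]))  # True
print(is_tree_decomposition(chain, bad, [("A", "B")]))               # False
```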

36 Tree-Decomposition
- Assume a CSP has a tree-decomposition with tree-width r and separator size s.
- Assume the space restrictions do not allow memory up to O(exp(s)).
- One way to overcome this is to collapse those nodes of the tree that are connected by large separators,
- letting the combined node include the variables and constraints of both previous nodes.
- The resulting tree-decomposition has larger subproblems but smaller separators.
- As s decreases, both r and hw increase.

37 Theorem 10.3.1
Given a tree-decomposition T over n variables with separator sizes s_0, s_1, ..., s_t, and secondary tree-decompositions having corresponding maximal cluster sizes r_0, r_1, ..., r_t, the complexity of CTE applied to a secondary tree-decomposition T_i is O(n·exp(r_i)) time and O(n·exp(s_i)) space (i ranging over all secondary tree-decompositions). The secondary tree-decomposition T_i is generated by combining adjacent nodes whose separator sizes are strictly greater than s_i.

38 Example I
(Figure: a primal constraint graph over A through H and three tree-decompositions obtained by merging adjacent nodes across fat separators: (a) clusters {A,B}, {B,C,D}, {B,D,G}, {G,D,E,F}, {G,E,F,H}, with the fat separator {G,E,F} of size 3; (b) clusters {A,B}, {B,C,D}, {B,D,G}, {G,D,E,F,H}, separators B, BD, GD of size at most 2; (c) clusters {A,B}, {B,C,D,G,E,F,H}, with the single separator {B} of size 1.)

39 Example II
(Figure: (a) a primal constraint graph over A through I; (b) its induced, triangulated graph; (c) the tree of super clusters {A,B,C,D}, {B,C,D,F}, {B,E,F}, {D,F,G}, {F,G,I}, {B,F,H}, with separators such as {B,C,D}, {B,F}, {F}, {F,G}.)

40 10.3.1 Super Cluster Tree Elimination(b)
- Each clique is processed by search; each solution created by BT search is projected onto the separator, and the projected solutions are accumulated.
- The resulting algorithm is called super cluster tree elimination(b), or SCTE(b). It takes a primary tree-decomposition and generates a tree-decomposition whose separator sizes are bounded by b, which is subsequently processed by CTE.

41 Superbuckets
The bucket-elimination algorithm can be extended to bucket trees (Section 9.3). A bucket tree is a tree-decomposition; merging adjacent buckets generates a super-bucket tree (SBT), in the same way super clusters are generated. In the top-down phase of bucket elimination, several variables are then eliminated at once.
(Figure 10.13: a bucket tree, a join tree, and a super-bucket tree over clusters such as {G,F}, {F,B,C}, {A,B,C}, {D,B,A}; in the super-bucket tree, {A,B,C,D,F} is merged into one cluster with separator {F} to {G,F}.)

42 Definition 10.3.3 (non-separable components)
A connected graph G = (V, E) is said to have a separation node v if there exist nodes a and b such that all paths connecting a and b pass through v. A graph that has a separation node is called separable, and one that has none is called non-separable. A subgraph with no separation nodes is called a non-separable component (or a biconnected component).

43 10.3.2 Decomposition into Non-Separable Components
- In general we cannot find the best decomposition with bounded separator size in polynomial time.
- A tree-decomposition by non-separable components has all separators being singleton variables, so it requires only linear space (O(n·exp(sep)) with |sep| = 1):
  - each node corresponds to a component;
  - the variables of a node are those appearing in its component;
  - each constraint is placed in a component that contains its scope.
- Applying CTE to such a tree requires linear space (CTE's space complexity is O(n·exp(sep)); Chapter 9, page 263),
- but its time is exponential in the component sizes.
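Separation nodes (Definition 10.3.3) are exactly the articulation points of the graph, which can be found in linear time with the classic DFS low-link computation. A sketch, assuming a connected undirected graph given as an adjacency dict (the example graph is hypothetical):

```python
# Find all separation (articulation) nodes: a non-root DFS vertex v separates
# the graph if some child subtree has no back edge above v; the root does so
# iff it has more than one DFS subtree.

def articulation_points(adj):
    disc, low, points, timer = {}, {}, set(), [0]

    def dfs(v, parent):
        disc[v] = low[v] = timer[0]
        timer[0] += 1
        children = 0
        for w in adj[v]:
            if w == parent:
                continue
            if w in disc:                       # back edge
                low[v] = min(low[v], disc[w])
            else:
                children += 1
                dfs(w, v)
                low[v] = min(low[v], low[w])
                if parent is not None and low[w] >= disc[v]:
                    points.add(v)               # v separates w's subtree
        if parent is None and children > 1:
            points.add(v)                       # root with several subtrees

    dfs(next(iter(adj)), None)
    return points

# Two triangles sharing C: removing C disconnects {A, B} from {D, E}.
adj = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D", "E"],
       "D": ["C", "E"], "E": ["C", "D"]}
print(articulation_points(adj))  # {'C'}
```

The non-separable components themselves are the maximal subgraphs between articulation points; splitting on the points found above yields the component tree used by the decomposition.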

44 Example: Decomposition into non-separable components (textbook, page 289)
(Figure: a constraint graph over A, B, C, D, E, F, G, H, I, J and its non-separable components C1 = {A,B,C,D}, C2 = {C,F,E}, C3 = {G,F,J}, C4 = {E,H,I}.)

45 Executing Message Passing in a Super-Bucket Tree
- In a super-bucket tree, each node C_i dictates a super-cluster: in Figure 10.14 on page 289, C_1 includes variables {A, B, C, D}, C_2 includes variables {F, C, E}, and so on.
- When C_1 sends a message to C_2, it places the message inside the receiving super-bucket C_2.
- The message computed by bucket C_1 (a new constraint over the shared variables) is placed inside bucket C_2.
- See the example of message passing along a super-bucket tree on page 290.

46 Part IV: Hybrids of Hybrids

47 10.3.3 Hybrids of Hybrids
Advantage: the space complexity of this algorithm is linear, but its time complexity can be much better than that of the cycle-cutset scheme or the non-separable components scheme alone.
Example: case studies for circuits c432, c499, ...; see Figure 10.19.

48 Algorithm HYBRID(b1, b2)
Combines the two approaches, conditioning and inference. Given a space parameter b_1:
- First, find a tree-decomposition with separators bounded by b_1 using the super-clustering approach.
- Then, instead of pure search in each cluster, apply elim-cond(b_2), with b_2 <= b_1; the time complexity is significantly reduced.
- Let C*_{b2} be the size of the maximum b_2-cutset over the cliques of the b_1-tree-decomposition.
- The resulting algorithm is space-exponential in b_1 (the separators are restricted by b_1),
- but time-exponential in C*_{b2} (the cutset is relative to the bound b_2).

49 Two special cases
1. Apply the cycle-cutset scheme in each clique (hybrid(b_1, 1)). Experiments with real circuits, for circuit diagnosis tasks, show that the reduction in complexity bounds for complex circuits is tremendous.
2. When b_1 = b_2 and b = 1, hybrid(1, 1) corresponds to using the non-separable components as the tree-decomposition and applying the cycle-cutset scheme in each component.

50 10.4 A Case Study of Combinatorial Circuits (textbook pages 291-294)
Method: use a triangulation approach to decompose each circuit graph:
- select an ordering of the nodes;
- triangulate the graph;
- generate the induced graph;
- identify its maximal cliques (by maximal cardinality).
Using the min-degree ordering heuristic yields the smallest clique and separator sizes. See Figure 10.18.
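The min-degree heuristic mentioned above can be sketched as follows: repeatedly eliminate a minimum-degree node, connecting its neighbors first; the largest neighborhood met along the way is the width of the ordering, which bounds the clique sizes of the triangulated graph. Function name and tie-breaking are my own choices:

```python
# Min-degree elimination ordering: pick the node of smallest current degree,
# connect its neighbors (fill-in edges), remove it, and record the ordering
# and the maximum degree encountered (the induced width along this ordering).

from itertools import combinations

def min_degree_ordering(nodes, edges):
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    order, width = [], 0
    while adj:
        v = min(adj, key=lambda x: (len(adj[x]), x))   # tie-break alphabetically
        width = max(width, len(adj[v]))
        for a, b in combinations(adj[v], 2):           # connect v's neighbors
            adj[a].add(b)
            adj[b].add(a)
        for u in adj[v]:
            adj[u].discard(v)
        del adj[v]
        order.append(v)
    return order, width

# A 4-cycle: eliminating any node first connects its two neighbors; width 2.
order, width = min_degree_ordering(
    ["A", "B", "C", "D"], [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")])
print(order, width)  # ['A', 'B', 'C', 'D'] 2
```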

51 Parameters of Primary Join Trees
For each primary join tree generated, three parameters are computed:
(1) the sizes of the cliques;
(2) the sizes of the cycle-cutsets in each of the subgraphs defined by the cliques;
(3) the sizes of the separator sets.
The nodes of the join tree are labeled by the clique (cluster) sizes. Figure 10.18 shows that some cliques and separators require memory exponential in 28 for circuit c432 and exponential in 89 for circuit c3540; this is unacceptable when using hybrid(b, 1). We can select the parameter b to control the balance of time and space (Figure 10.19).

52 Parameters Controlling Hybrids
For each separator size s_i in the primary join tree T_0, listed from largest to smallest, a tree-decomposition T_i is generated by combining adjacent clusters whose separator sizes are strictly larger than s_i. C_i denotes the largest cycle-cutset size in any cluster of T_i. The secondary join trees are indexed by the separator sizes of the primary tree, which range from 1 to 23 (see Figure 10.20). As the separator size decreases, the maximum clique size increases, and both the size and the depth of the tree decrease. The clique size of the root node is much larger than for all other nodes, and it increases as the separator sizes decrease.

53 Summary
- This chapter presents a structure-based hybrid scheme that has inference and search as its two extremes.
- A single design parameter b lets the user control the storage-time tradeoff according to the problem domain and the available resources.
- Two architectures for hybrids are proposed:
  - Conditioning first: find a small variable set Y and assign it values, so that the resulting subproblems have low induced width; solve them by variable elimination.
  - Inference first: limit space by using tree-decompositions whose separator sizes are bounded by b.
- Further, a hybrid of the two hybrids achieves even better time complexity for any fixed space restriction.

54 References
1. R. Dechter. Constraint Processing.
2. Jeff Bilmes. Introduction to Graphical Models. Spring 2002, scribes.
3. R. Dechter. Enhancement schemes for constraint processing: Backjumping, learning, and cutset decomposition. Artificial Intelligence, 41:273-312, 1990.
4. R. Dechter and J. Pearl. The cycle-cutset method for improving search performance in AI applications. In Proceedings of the 3rd IEEE Conference on AI Applications, pages 224-230, Orlando, Florida, 1987.
5. R. Dechter and J. Pearl. Tree clustering for constraint networks. Artificial Intelligence, pages 353-366, 1989.

