A Constraint Satisfaction Problem (CSP) is a combinatorial decision problem defined by a set of variables, a set of domain values for these variables, and a set of constraints restricting the allowable combinations of values for variables, where the task is to find a solution (i.e., an assignment of a value to each variable satisfying all constraints), or to find all such solutions. Dynamic Bundling for Non-binary CSPs and Databases Anagh Lal and Berthe Y. Choueiry Constraint Systems Laboratory Computer Science & Engineering University of Nebraska-Lincoln alal, S cd, e, f d V1V1 V2V2 Dynamic bundling [1] : Interchangeability sets are updated during search Static bundling [3] : Interchangeability sets are computed before search ce, fd d V1V1 V2V2 S Search without bundling cefd d V1V1 V2V2 S 1.Interchangeability: An algorithm for computing interchangeability in non-binary CSPs. 2.Dynamic bundling: Integration of the above with backtrack search for solving non-binary CSPs. 3.Modeling: A new representation of a database join-query as a CSP. 4. A new sorting-based bundling algorithm. 5. A new sort-merge join algorithm for producing bundled tuples. Contributions C4C4 {1, 2, 3, 4, 5, 6} {1, 2, 3} C2C2 C1C1 C3C3 Constraint Variable Non-binary CSP (nb-CSP) SELECT R1.A,R1.B,R1.C FROM R1,R2 WHERE R1.A=R2.A AND R1.B=R2.B AND R1.C=R2.C Result: 10 tuples in 3 nested tuples Use new join algorithm when materializing join queries Exploit bundled results in data-analysis/data-mining packages Assist in query size estimation Improve accuracy of sampling operators Interchangeability is about finding values that are equivalent in all solutions of a CSP [2]. Full Interchangeability (FI): d, e, and f can be swapped for V 2 in any solution. Neighborhood Interchangeability (NI): finds e and f but misses d. It efficiently approximates FI. Dynamic bundling is guaranteed never less efficient than non-bundling and static bundling [1]. Interchangeability & Bundling Modeling the join query as a CSP Attributes variables Attribute values domains Relations relational constraints Join conditions join-condition constraints Dynamic Bundling for Databases Dynamic bundling = Search + dynamic interchangeability 1.Dynamic bundling was thought to be an overkill and not practical. We show it is worthwhile: Finding all solutions: theoretically best Finding first solutions: empirical evidence –Bundles solutions –Bundles no-goods (i.e., bundles of inconsistent partial solutions) 2.We exploit nb-DTs in Forward Checking (FC) 3.Search with dynamic bundling yields families of robust and compact solutions. Experiments : Compare the performance of search in FC and DynBndl using dynamic least domain as a variable ordering strategy. For low tightness values: FBS is large: 1200 solutions in 1 bundle However, the cost of DynBndl (CPU time) exceeds the gains due to savings in nodes visited. Around phase-transition (most critical region): DynBundl shows considerable improvement in #NV and CPU time FBS remains significant in most data-sets. For the data-set shown, DynBndl produces 26 robust solutions in less time than FC needs to produce 1 solution. Effect of domain size: Gains of DynBndl increase with domain size. Idea: Explore the use of DynBndl in databases where large domains are typical. Relational constraint Join-condition constraint R1.AR1.BR1.C R2.AR2.BR2.C R1 R2 Sorting-based bundling algorithm Sort-merge join algorithm based on dynamic bundling Computes the join by finding all solutions of the CSP using DynBndl. Reduces number of tuples compared in the main memory. Is memory efficient and produces compacted results, saving –I/O for the next operator and –disk space (and network bandwidth in distributed databases). Experiments Real-world problem: 3 relations; 4 attributes; largest table: 300 tuples –Compaction rate achieved: 2.26 (69 tuples in 32 nested tuples) Random data-set: Relations with same schema as example; each relation: 10’000 tuples; memory size: 4’000 tuples; page size 200 tuples. –Compaction rate achieved: 1.48 Task: Join query V3V3 {d} {a, b, d}{a, b, c} {c, d, e, f} V4V4 V2V2 V1V1 Binary CSP Partition Unequal partitions Symmetric partitions Bundle for R1.A: {1, 5} R1 A B C Choueiry, B.Y., Davis, A.M.: Dynamic bundling: Less Effort for More Solutions. SARA Freuder, E.C.: Eliminating Interchangeable Values in Constraint Satisfaction Problems. AAAI Haselböck, A: Exploiting Interchangeabilities in Constraint Satisfaction Problems. IJCAI Lal, A., Choueiry, B.Y., Freuder, E.C.: Neighborhood Interchangeability and Dynamic Bundling for Non-Binary Finite CSPs. CSCLP Lal, A., Choueiry, B.Y.: Constraint Processing Techniques for Improving Join Computation. CDB 04. {1, 2} {3} {1, 2} {1, 3} {3} No-good bundle V D C A B Solution bundle Constraint Satisfaction Problems {c, d, e, f } V3V3 {d} {a, b, d}{a, b, c} V4V4 V2V2 V1V1 Bundling is about exploiting interchangeability relations in backtrack search to produce robust, multiple solutions. This research is supported by a Maude Hammond Fling Faculty Research Fellowship and NSF CAREER Award # Experiments were conducted utilizing the Research Computing Facility of the University of Nebraska-Lincoln # NV in DynBndl # NV in FC First Bundle Size Tightness n = 20, a = 10 p 2 = 0.25 c 3 = 3 c 4 = DynBndl FC n = 20, a = 10 p 2 = 0.25 c 3 = 3 c 4 = 2 Tightness Overhead due to large FBS (upto 1200) at low tightness Maximum gains occur around phase- transition area CPU time [ms] Dynamic Bundling for Non-Binary CSPs Interchangeability in non-binary CSPs We show how to compute NI for non-binary constraints by [4]: 1.Building a data-structure, called the non-binary discrimination tree (nb- DT), determines the NI sets of a variable given a constraint that applies to the variable. 2.Intersecting the NI sets from the nb-DTs of a set of constraints yields the domain partition of the variable given the constraints. Tests carried out on random CSPs Parameters: n (number of variables), a (domain size), p 2 (proportion of binary constraints), c 3 / c 4 (number of ternary / quaternary constraints). Criteria: FBS (First Bundle Size), CPU time, number of nodes visited (NV), and number of constraint checks. partitions domains in a memory efficient manner. fits into the iterator model of databases and produces one bundle at a time. Future Research Directions References