Constraint Satisfaction Problems & Its Application in Databases

Presentation transcript:

Symmetry Detection in Constraint Satisfaction Problems & Its Application in Databases
Berthe Y. Choueiry
Constraint Systems Laboratory, Department of Computer Science & Engineering, University of Nebraska-Lincoln
Joint work with Amy Beckwith-Davis, Anagh Lal, and Eugene C. Freuder
Supported by NSF CAREER award #0133568

Outline
Definitions
  Interchangeability
  Bundling
Bundling in CSPs
Bundling for join query computation
Conclusions
December 9, 2005

Constraint Satisfaction Problem (CSP)
Given P = (V, D, C)
  V: set of variables
  D: set of their domains
  C: set of constraints (relations) restricting the acceptable combination of values for variables
Solution is a consistent assignment of values to variables
Query: find 1 solution, all solutions, etc.
Examples: SAT, scheduling, product configuration
NP-Complete in general
Graphical representation, binary CSP
[figure: constraint graph over V1 {d}, V2 {c, d, e, f}, V3 {a, b, d}, V4 {a, b, c}]
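A minimal Python sketch of the definition P = (V, D, C), using the domains from the slide. The constraint graph itself is not recoverable from this transcript, so the four "not equal" edges below are an assumption, chosen only because they agree with the solutions and interchangeable values shown on the later slides. The names `edges`, `is_solution` and `all_solutions` are illustrative.

```python
from itertools import product

# Domains as listed on the slide.
domains = {
    "V1": ["d"],
    "V2": ["c", "d", "e", "f"],
    "V3": ["a", "b", "d"],
    "V4": ["a", "b", "c"],
}

# ASSUMPTION: hypothetical binary "not equal" constraints; the real
# constraint graph is lost in the transcript.
edges = [("V1", "V3"), ("V2", "V3"), ("V2", "V4"), ("V3", "V4")]


def is_solution(assignment):
    """A solution is a complete assignment that violates no constraint."""
    return all(assignment[x] != assignment[y] for x, y in edges)


def all_solutions():
    """Enumerate the Cartesian product of the domains, keep consistent tuples."""
    names = list(domains)
    for values in product(*(domains[v] for v in names)):
        assignment = dict(zip(names, values))
        if is_solution(assignment):
            yield assignment


solutions = list(all_solutions())
print(len(solutions), "solutions under the assumed constraints")
```

Brute-force enumeration is exponential in the number of variables, which is why the following slides turn to backtrack search.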

Backtrack search
DFS + backtracking (linear space)
  Variable being instantiated: current variable
  Un-instantiated variables: future variables
  Instantiated variables: past variables
+ Constraint propagation
  Backtrack search with forward checking (FC)
Solution: V1 ← d, V2 ← e, V3 ← a, V4 ← c
[figure: search tree rooted at S over V1 {d}, V2 {c, d, e, f}, V3 {a, b, d}, V4 {a, b, c}]
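The search described above can be sketched in Python as depth-first search with forward checking: after each assignment, the domains of the future neighbours are filtered, and an empty domain triggers backtracking. The "not equal" constraints are an assumption (the constraint graph is missing from the transcript), and `forward_check` and `fc_search` are illustrative names.

```python
domains = {
    "V1": ["d"],
    "V2": ["c", "d", "e", "f"],
    "V3": ["a", "b", "d"],
    "V4": ["a", "b", "c"],
}
# ASSUMPTION: hypothetical "not equal" constraints on these edges.
edges = [("V1", "V3"), ("V2", "V3"), ("V2", "V4"), ("V3", "V4")]
neighbors = {v: set() for v in domains}
for x, y in edges:
    neighbors[x].add(y)
    neighbors[y].add(x)


def forward_check(var, value, domains, future):
    """Return filtered copies of the future domains, or None on a wipe-out."""
    pruned = dict(domains)
    for other in future:
        if other in neighbors[var]:
            pruned[other] = [w for w in domains[other] if w != value]
            if not pruned[other]:
                return None
    return pruned


def fc_search(order, domains, assignment=None, depth=0):
    """Depth-first search; order[depth] is the current variable."""
    assignment = {} if assignment is None else assignment
    if depth == len(order):
        return dict(assignment)
    current, future = order[depth], order[depth + 1:]
    for value in domains[current]:
        pruned = forward_check(current, value, domains, future)
        if pruned is None:
            continue  # dead end, try the next value
        assignment[current] = value
        result = fc_search(order, pruned, assignment, depth + 1)
        if result is not None:
            return result
        del assignment[current]  # backtrack
    return None


first = fc_search(["V1", "V2", "V3", "V4"], domains)
print(first)
```

Because forward checking keeps only values consistent with the past variables, the search never needs to check a new value against earlier assignments.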

Interchangeability [Freuder, 91]
Captures the idea of symmetry between solutions
Functional interchangeability
  Any mapping between two solutions
  Including permutation of values across variables, equivalent to graph isomorphism
Full interchangeability (FI)
  Restricted to values of a single variable
  Also, likely intractable
Examples from the figure
  V2 ← {d, e, f} in every solution
  V1 ← d, V2 ← c, V3 ← a, V4 ← b
  V1 ← d, V2 ← c, V3 ← b, V4 ← a

Value interchangeability [Freuder, 91]
Full Interchangeability (FI): d, e, f interchangeable for V2 in any solution
Neighborhood Interchangeability (NI)
  Considers only the neighborhood of the variable
  Finds e, f but misses d
  Efficiently approximates FI
Discrimination tree DT(V2)
[figure: constraint graph over V1 {d}, V2 {c, d, e, f}, V3 {a, b, d}, V4 {a, b, c}]
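A hedged Python sketch of neighborhood interchangeability for V2: two values are NI when they are consistent with exactly the same (variable, value) pairs in the neighborhood. A dictionary keyed by that support set stands in for the discrimination tree here; it is a simplification, not the tree construction from the talk. The "not equal" constraints are an assumption, since the constraint graph is not in the transcript.

```python
from collections import defaultdict

domains = {
    "V1": ["d"],
    "V2": ["c", "d", "e", "f"],
    "V3": ["a", "b", "d"],
    "V4": ["a", "b", "c"],
}
# ASSUMPTION: hypothetical "not equal" constraints on these edges.
edges = [("V1", "V3"), ("V2", "V3"), ("V2", "V4"), ("V3", "V4")]
neighbors = {v: set() for v in domains}
for x, y in edges:
    neighbors[x].add(y)
    neighbors[y].add(x)


def ni_sets(var):
    """Group the values of var by the neighbor values they are consistent with."""
    groups = defaultdict(list)
    for value in domains[var]:
        support = frozenset(
            (other, w)
            for other in neighbors[var]
            for w in domains[other]
            if w != value  # consistent under the assumed constraint
        )
        groups[support].append(value)
    return [set(group) for group in groups.values()]


print(ni_sets("V2"))
```

Under these assumptions d ends up alone because it conflicts with d in the domain of V3, even though that value never appears in a solution. This is the sense in which NI "misses d".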

Outline
Definitions
Bundling in CSPs
  Static bundling
  Dynamic bundling
  Dynamic bundling for non-binary CSPs
Bundling for join query computation
Conclusions

Bundling: using NI in search
Static bundling [Haselböck, 93]
  Before search: compute & store NI sets
  During search:
    Future variables: remove bundle of equivalent values
    Current variable: assign a bundle of equivalent values
Advantages
  Reduces search space
  Creates bundled solutions
Example: V1 ← d, V2 ← {e, f}, V3 ← a, V4 ← {b, c}
[figure: search tree under static bundling, where V2 branches into c; e, f; d]

Dynamic bundling (DynBndl) [2001]
Dynamically identifies NI
Using discrimination tree for forward checking: is never less efficient than BT & static bundling
[figure: search trees compared. Static bundling: V2 branches into c; e, f; d. Dynamic bundling: V2 branches into c; d, e, f. The discrimination tree for V2 ends in the annotations <V2, {d, e, f}> and <V2, {c}>]
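A sketch of dynamic bundling in Python, under the same assumed "not equal" constraints as the example (the real constraint graph is not in the transcript). At each node, the current domain of the variable is partitioned by support in the filtered domains of its future neighbors, and one representative per bundle drives forward checking. Grouping by a support set replaces the discrimination tree, and `bundles_of` and `dyn_bundle_search` are illustrative names.

```python
from collections import defaultdict
from math import prod

domains = {
    "V1": ["d"],
    "V2": ["c", "d", "e", "f"],
    "V3": ["a", "b", "d"],
    "V4": ["a", "b", "c"],
}
# ASSUMPTION: hypothetical "not equal" constraints on these edges.
edges = [("V1", "V3"), ("V2", "V3"), ("V2", "V4"), ("V3", "V4")]
neighbors = {v: set() for v in domains}
for x, y in edges:
    neighbors[x].add(y)
    neighbors[y].add(x)


def bundles_of(var, domains, future):
    """Partition the current domain of var by support in future neighbors."""
    groups = defaultdict(list)
    for value in domains[var]:
        support = frozenset(
            (other, w)
            for other in future
            if other in neighbors[var]
            for w in domains[other]
            if w != value
        )
        groups[support].append(value)
    return list(groups.values())


def dyn_bundle_search(order, domains, depth=0, partial=()):
    """Yield solution bundles as dicts mapping variable -> frozenset of values."""
    if depth == len(order):
        yield dict(partial)
        return
    current, future = order[depth], order[depth + 1:]
    for bundle in bundles_of(current, domains, future):
        representative = bundle[0]  # all values in a bundle prune identically
        pruned = dict(domains)
        for other in future:
            if other in neighbors[current]:
                pruned[other] = [w for w in domains[other] if w != representative]
        if all(pruned[other] for other in future):
            step = partial + ((current, frozenset(bundle)),)
            yield from dyn_bundle_search(order, pruned, depth + 1, step)


found = list(dyn_bundle_search(["V1", "V2", "V3", "V4"], domains))
covered = sum(prod(len(vals) for vals in b.values()) for b in found)
print(len(found), "bundles covering", covered, "solutions")
```

Once V1 ← d has removed d from the domain of V3, the value d of V2 no longer differs from e and f, so the bundle grows from {e, f} to {d, e, f}. That is the difference between the two trees in the figure.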

Non-binary CSPs
Scope(Cx): the set of variables involved in Cx
Arity(Cx): size of scope
Computing NI for non-binary CSPs is not a trivial extension from binary CSPs
  DT compares neighborhoods that have the same size
  It is difficult to compare neighborhoods that have different sizes
[figure: constraint network with variables V, V1, V2, V3, V4 and constraints C1, C2, C3, C4, drawn both as a hypergraph and as a constraint/variable bipartite graph; the domains shown are {1, 2, 3, 4, 5, 6} and {1, 2, 3}]

NI for non-binary CSPs [2003, 2005]
Building an nb-DT for each constraint
  Determines the NI sets of the variable given the constraint
Intersecting partitions from nb-DTs
  Yields NI sets of V (partition of DV)
Processing paths in nb-DTs
  Gives, for free, updates necessary for forward checking
[figure: nb-DT(V, C1) and nb-DT(V, C2), each hanging from a Root; one tree carries the annotations {1, 2}, {3, 4}, {5, 6}, the other {1, 2}, {3, 4}, {5}, {6}; their intersection is {1, 2}, {3, 4}, {5}, {6}]
Speaker notes: For each of the constraints, we build a discrimination tree that is appropriate for non-binary constraints. For example, V is involved in 2 constraints, C1 and C2, so we build one nb-DT for C1 and another one for C2. In the next slide, I will explain in detail how we build this tree. What is important to notice here is that each tree has boxes that we call annotations, and the domain of V is partitioned across these annotations. Each partition is a set of equivalent values for V given the constraint.
Now, intersecting the partitions from the constraints that apply to the variable gives us the sets of equivalent values for V given the constraints. Of course, these equivalence sets partition the domain of V.
Furthermore, we collect the paths from every root to every annotation in every nb-DT. The information in these paths is important during search, when V is being instantiated: it allows us to determine the effect of forward checking on the future variables for free, without running a checking mechanism.
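The intersection step can be illustrated with a short Python sketch. The two partitions are the annotation sets visible in the slide residue; which tree belongs to C1 and which to C2 cannot be recovered from the transcript, so the names `part_one` and `part_two` are placeholders.

```python
def intersect_partitions(p1, p2):
    """Common refinement: values stay together only if both partitions agree."""
    blocks = []
    for block_a in p1:
        for block_b in p2:
            common = block_a & block_b
            if common:
                blocks.append(common)
    return blocks


# Annotations read off the two nb-DTs for variable V (assignment to C1/C2 unknown).
part_one = [{1, 2}, {3, 4}, {5, 6}]
part_two = [{1, 2}, {3, 4}, {5}, {6}]

ni_of_v = intersect_partitions(part_one, part_two)
print(ni_of_v)
```

With more than two constraints, the same function can be folded over all the partitions, for example with `functools.reduce`.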

Robust solutions
Solution bundle
  Cartesian product of domain bundles
  Compact representation
  Robust solutions
Dynamic bundling finds larger bundles
  Single solution: V1 ← d, V2 ← e, V3 ← a, V4 ← c
  Static bundling: V1 ← d, V2 ← {e, f}, V3 ← a, V4 ← {b, c}
  Dynamic bundling: V1 ← d, V2 ← {d, e, f}, V3 ← a, V4 ← {b, c}
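To make "Cartesian product of domain bundles" concrete, the three results on this slide can be expanded and counted in Python. The helper names `bundle_size` and `expand` are illustrative.

```python
from itertools import product
from math import prod

single = {"V1": {"d"}, "V2": {"e"}, "V3": {"a"}, "V4": {"c"}}
static = {"V1": {"d"}, "V2": {"e", "f"}, "V3": {"a"}, "V4": {"b", "c"}}
dynamic = {"V1": {"d"}, "V2": {"d", "e", "f"}, "V3": {"a"}, "V4": {"b", "c"}}


def bundle_size(bundle):
    """Number of solutions represented: product of the domain-bundle sizes."""
    return prod(len(values) for values in bundle.values())


def expand(bundle):
    """List every individual solution contained in a solution bundle."""
    names = sorted(bundle)
    choices = (sorted(bundle[v]) for v in names)
    return [dict(zip(names, combo)) for combo in product(*choices)]


for label, bundle in (("single", single), ("static", static), ("dynamic", dynamic)):
    print(label, bundle_size(bundle))
```

The bundle is robust in the sense that if one value becomes unavailable, another solution from the same bundle can replace it without new search.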

DynBndl: worth the effort?
Finds larger bundles
Enables forward checking at no extra cost
Does not cost more than BT or static bundling
  Cost model: # nodes visited by search, # constraint checks made
  Theoretical guarantee holds for finding all solutions under same variable ordering
Finding first solution?
  Experiments uncover an unexpected benefit

Bundling of no-goods…
… is particularly effective
[figure: search tree over V, V1, V2, V3, V4 for the non-binary example with constraints C1 to C4, showing one no-good bundle and one solution bundle]

Experimental set-up
CSP parameters
  n: number of variables {20, 30}
  a: domain size {10, 15}
  t: constraint tightness [25%, 75%]
  CR: constraint ratio (arity: 2, 3, 4)
1,000 instances per tightness value
Performance measures
  Nodes visited (NV)
  Constraint checks (CC)
  CPU time
  First Bundle Size (FBS)
Phase transition
[figure: cost of solving versus order parameter, peaking at a critical value that separates mostly solvable instances from unsolvable instances]

Empirical evaluations
DynBndl versus FC (BT + forward checking)
Randomly generated problems, Model B
Experiments
  Effect of varying tightness
  In the phase-transition region
    Effect of varying domain size
    Effect of varying constraint ratio (CR)
ANOVA to statistically compare performance of DynBndl and FC with varying t
t-distribution for confidence intervals

Analysis: Varying tightness
Low tightness
  Large FBS: 33 at t=0.35; 2254 (Dataset #13, t=0.35)
  Small additional cost
Phase transition
  Multiple solutions present
  Maximum no-good bundling causes max savings in CPU time, NV, & CC
High tightness
  Problems mostly unsolvable
  Overhead of bundling minimal
[figure: CPU time [sec] and #NV (hundreds) versus tightness, FC against DynBndl; n=20, a=15, CR=CR3]
t       FBS
0.350   33.44
0.400   10.91
0.425   7.13
0.437   6.38
0.450   5.62
0.462   2.37
0.475   0.66
0.500   0.03
0.550   0.00

Analysis: Varying domain size
Increasing a in phase-transition
  FBS increases: More chances for symmetry
  CPU time decreases: more bundling of no-goods
Increasing a (n=30)
CR     Improv (CPU) %     FBS
       a=10    a=15       (a=10, a=15)
CR1    33.3    34.3       5.5, 11.9
CR2    28.6    33.0       5.0
CR3    29.8    31.7       3.6
CR4    28.4    31.6       1.2, 1.4
Because the benefits of DynBndl increase with increasing domain size, DynBndl is particularly interesting for database applications where large domains are typical

Outline
Definitions
Bundling in CSPs
Bundling for join query computation
  Idea
  A CSP model for the query join
  Sorting-based bundling algorithm
  Dynamic-bundling-based join algorithm
Conclusions

The join query
Join query
  SELECT R2.A, R2.B, R2.C
  FROM R1, R2
  WHERE R1.A=R2.A AND R1.B=R2.B AND R1.C=R2.C
[figure: tables R1 and R2]
Result (compacted): 10 tuples in 3 nested tuples
A        B              C
{1, 5}   {12, 13, 14}   {23}
{2, 4}   {10}           {25}
{6}      {13, 14}       {27}

Databases & CSPs
Same computational problems, different cost models
  Databases: minimize # I/O operations
  CSP community: # CPU operations
Challenges for using CSP techniques in DB
  Use of lighter data structures to minimize memory usage
  Fit in the iterator model of database engines
DB terminology               CSP terminology
Table, relation              Constraint (relational constraint)
Join condition               Constraint (join-condition constraint)
Attribute                    CSP variable
Tuple in a table             Tuple in a constraint or allowed by one
Computing a join sequence    Finding all solutions to a CSP

Modeling join query as a CSP
Attributes of relations → CSP variables
Attribute values → variable domains
Relations → relational constraints
Join conditions → join-condition constraints
SELECT R1.A, R1.B, R1.C FROM R1, R2 WHERE R1.A=R2.A AND R1.B=R2.B AND R1.C=R2.C
[figure: relations R1 and R2 drawn as constraints over the variables R1.A, R1.B, R1.C, R2.A, R2.B, R2.C]
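The mapping above can be tried on a toy instance in Python. The tuples in `R1` and `R2` are hypothetical (the slide's tables are not in the transcript), and solving by brute-force enumeration is only meant to show that the solutions of the CSP are exactly the rows of the join.

```python
from itertools import product

# HYPOTHETICAL data: stand-ins for the slide's R1 and R2, both with schema (A, B, C).
R1 = {(1, 12, 23), (2, 10, 25), (6, 13, 27)}
R2 = {(1, 12, 23), (2, 10, 25), (7, 11, 20)}

# Attributes of relations -> CSP variables; attribute values -> variable domains.
variables = ["R1.A", "R1.B", "R1.C", "R2.A", "R2.B", "R2.C"]
domains = {}
for rel_name, rel in (("R1", R1), ("R2", R2)):
    for i, attr in enumerate("ABC"):
        domains[f"{rel_name}.{attr}"] = sorted({t[i] for t in rel})


def relational_ok(asg):
    """Relational constraints: each relation allows only its own tuples."""
    t1 = (asg["R1.A"], asg["R1.B"], asg["R1.C"])
    t2 = (asg["R2.A"], asg["R2.B"], asg["R2.C"])
    return t1 in R1 and t2 in R2


def join_ok(asg):
    """Join-condition constraints: R1.X = R2.X for X in A, B, C."""
    return all(asg[f"R1.{x}"] == asg[f"R2.{x}"] for x in "ABC")


join_result = []
for values in product(*(domains[v] for v in variables)):
    asg = dict(zip(variables, values))
    if relational_ok(asg) and join_ok(asg):
        join_result.append((asg["R1.A"], asg["R1.B"], asg["R1.C"]))

print(sorted(join_result))
```

Since the query equates all three attributes, the result coincides with the set intersection of the two relations, which gives an easy cross-check.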

Join operator
R1 ⋈(x θ y) R2
  Most expensive operator in terms of I/O
  θ is “=” → Equi-Join
  x is same as y → Natural Join
Join algorithms
  Nested Loop
  Sorting-based
    Sort-Merge, Progressive Merge-Join (PMJ)
    Partitions relations by sorting, minimizes # scans of relations
  Hashing-based

Join query: CSP model
R1 ⋈(x θ y) R2
  Most expensive operator in terms of I/O
  θ is “=” → Equi-Join
  x is same as y → Natural Join
CSP model
  Attributes of relations → CSP variables
  Attribute values → variable domains
  Relations → relational constraints
  Join conditions → join-condition constraints
SELECT R1.A, R1.B, R1.C FROM R1, R2 WHERE R1.A=R2.A AND R1.B=R2.B AND R1.C=R2.C
[figure: relations R1 and R2 drawn as constraints over the variables R1.A, R1.B, R1.C, R2.A, R2.B, R2.C]

Progressive Merge Join
PMJ: a sort-merge algorithm [Dittrich et al. 03]
Two phases: sorting (sorts sub-sets of relations) & merging (merges sorted sub-sets)
PMJ produces early results
We use the framework of the PMJ

New join algorithm
Sorting & merging phases
  Load sub-sets of relations in memory
  Compute in-memory join using dynamic bundling
    Uses sorting-based bundling (shown next)
    Computes join of in-memory relations using dynamically computed bundles

Sorting-based bundling
Heuristic for variable ordering
  Place variables linked by join conditions as close to each other as possible
  [figure: ordering R1.A, R2.A, R1.B, R2.B, R1.C, R2.C over relations R1 and R2]
Sort relations using above ordering
Next: compute bundles of the variable ahead in the variable ordering (R1.A)

Computing a bundle of R1.A
Partition of a constraint
  Tuples of the relation having the same value of R1.A
Compare projected tuples of first partition with those of another partition
Compare with every other partition to get complete bundle
R1
A    B    C
1    12   23    (partition for A=1)
1    13   23
1    14   23
2    10   25    (unequal partitions: A=1 versus A=2)
5    12   23    (symmetric partitions: A=1 versus A=5)
5    13   23
5    14   23
Bundle {1, 5}
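The partition comparison on this slide can be sketched in Python with the seven R1 tuples shown. Note one deliberate substitution: the talk compares sorted partitions pairwise, whereas this sketch groups partitions by hashing their projected tuples, which gives the same bundles on in-memory data. The function name is illustrative.

```python
from collections import defaultdict

# R1 tuples (A, B, C) as listed on the slide.
R1 = [
    (1, 12, 23), (1, 13, 23), (1, 14, 23),
    (2, 10, 25),
    (5, 12, 23), (5, 13, 23), (5, 14, 23),
]


def bundles_on_first_attribute(relation):
    """Bundle values of the first attribute whose partitions project identically."""
    # Partition of the constraint: tuples sharing the same first value,
    # projected onto the remaining attributes.
    partitions = defaultdict(set)
    for first, *rest in relation:
        partitions[first].add(tuple(rest))
    # Values whose projected partitions are equal are symmetric.
    by_projection = defaultdict(list)
    for value, projected in partitions.items():
        by_projection[frozenset(projected)].append(value)
    return [sorted(values) for values in by_projection.values()]


print(bundles_on_first_attribute(R1))
```

Values 1 and 5 share the projection {(12, 23), (13, 23), (14, 23)}, while 2 projects to {(10, 25)} and stays alone.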

Finding the valid bundle
Compute a bundle for the attribute
Check bundle validity with future constraints
If no common value ‘backtrack’
→ Assign variable with the surviving values in the bundle
[figure: bundles {1, 5}, {1, 5, x}, {1, 5, y, z}; common: {1, 5}]
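One plausible reading of the figure, sketched in Python: each constraint proposes its own bundle for the attribute, the valid bundle is their intersection, and an empty intersection means backtracking. This interpretation and the name `valid_bundle` are assumptions; "x", "y" and "z" are the placeholder values from the slide.

```python
def valid_bundle(candidate_bundles):
    """Intersect the candidate bundles (at least one); None signals a backtrack."""
    common = set.intersection(*map(set, candidate_bundles))
    return common or None


candidates = [{1, 5}, {1, 5, "x"}, {1, 5, "y", "z"}]
surviving = valid_bundle(candidates)
print(surviving)
```

The variable is then assigned the whole surviving set rather than a single value, which is what keeps the join result compact.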

Experiments
XXL library for implementation & evaluation
Data sets
  Random: 2 relations R1, R2 with same schema as example
    Each relation: 10,000 tuples
    Memory size: 4,000 tuples
    Page size: 200 tuples
  Real-world problem: 3 relations, 4 attributes
Compaction rate achieved
  Random problem: 1.48
    Savings even with (very) preliminary implementation
  Real-world problem: 2.26 (69 tuples in 32 nested tuples)

Outline
Definitions
Bundling in CSPs
Bundling for join query computation
Conclusions
  Summary
  Future research

Summary
Dynamic bundling in finite CSPs
  Binary and non-binary constraints
  Produces multiple robust solutions
  Significantly reduces cost of search at phase transition
Application to join-query computation
Constraint Processing inspires innovative solutions to fundamental difficult problems in Databases

Future research
CSPs
  Only scratched the surface: interchangeability + decomposition [ECAI 1996], partial interchangeability [AAAI 1998], tractable structures
Databases
  Investigate benefit of bundling
  Sampling operator
  Main-memory databases
  Automatic categorization of query results
Constraint databases
  Design bundling mechanisms for gap & linear constraints over intervals (spatial databases)
Query-size estimation: