Constraint Systems Laboratory, April 21, 2005. Lal: M.S. thesis defense. Neighborhood Interchangeability (NI) for Non-Binary CSPs & Application to Databases.

Presentation transcript:

1 Neighborhood Interchangeability (NI) for Non-Binary CSPs & Application to Databases
Anagh Lal
Constraint Systems Laboratory, Computer Science & Engineering, University of Nebraska-Lincoln
M.S. thesis defense, April 21, 2005
Research supported by NSF CAREER award #0133568 and by the Maude Hammond Fling Faculty Research Fellowship.

2 Main contributions
CSPs
1. Interchangeability: an algorithm for neighborhood interchangeability (NI) in non-binary CSPs
2. Dynamic bundling: integrating NI + backtrack search for solving non-binary CSPs
3. Exploratory: towards detecting substitutability
Databases
1. A new model of the join query as a CSP
2. A new sorting-based bundling algorithm
3. A new sort-merge join algorithm that produces bundled tuples
4. Exploratory: application to materialized views

3 Outline
Background
Neighborhood Interchangeability (NI) for non-binary CSPs
Empirical evaluations
Database algorithms based on dynamic bundling
Conclusions & future work

4 Constraint Satisfaction Problem
Given P = (V, D, C)
– V: set of variables
– D: set of their domains
– C: set of constraints restricting the acceptable combinations of values for variables
– Solution is a consistent assignment of values to variables
Query: find 1 solution, all solutions, etc.
Examples: SAT, scheduling, product configuration
NP-complete in general
[Figure: example constraint graph with domains V1 = {d}, V2 = {c, d, e, f}, V3 = {a, b, d}, V4 = {a, b, c}]

5 Systematic search
Basic mechanism
– DFS & backtracking (BT)
– Variable being instantiated: current variable
– Uninstantiated variables: future variables
– Instantiated variables: past variables
Constraint propagation
– Remove values inconsistent with constraints
– Forward checking filters domains of future variables given the instantiation of current variable
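The mechanism above can be sketched in Python as follows. This is a minimal illustration, not the thesis implementation. It uses the four-variable example from the CSP slide; the transcript does not preserve the constraint edges, so the not-equal constraints in EDGES are an assumption, and the names neighbors, forward_check, and solve are hypothetical.

```python
# Domains of the example CSP (from the slides).
DOMAINS = {
    "V1": {"d"},
    "V2": {"c", "d", "e", "f"},
    "V3": {"a", "b", "d"},
    "V4": {"a", "b", "c"},
}
# ASSUMPTION: not-equal constraints on these edges (not recoverable from the slides).
EDGES = [("V1", "V3"), ("V2", "V3"), ("V2", "V4"), ("V3", "V4")]


def neighbors(var):
    """Variables sharing a constraint with var."""
    return [b if a == var else a for a, b in EDGES if var in (a, b)]


def forward_check(var, value, domains, future):
    """Filter the domains of future neighbors; return None on a domain wipe-out."""
    filtered = dict(domains)
    for other in neighbors(var):
        if other in future:
            filtered[other] = domains[other] - {value}
            if not filtered[other]:
                return None
    return filtered


def solve(order, domains, assignment=None):
    """Depth-first backtrack search with forward checking; yields every solution."""
    assignment = {} if assignment is None else assignment
    if len(assignment) == len(order):
        yield dict(assignment)
        return
    current = order[len(assignment)]             # current variable
    future = set(order[len(assignment) + 1:])    # future variables
    for value in sorted(domains[current]):
        filtered = forward_check(current, value, domains, future)
        if filtered is None:
            continue                             # dead end: try the next value
        assignment[current] = value
        yield from solve(order, filtered, assignment)
        del assignment[current]                  # backtrack


solutions = list(solve(["V1", "V2", "V3", "V4"], DOMAINS))
print(len(solutions))  # 14 under the assumed constraints
```

Under the assumed constraints, instantiating V1 = d immediately removes d from the domain of V3, so no later branch ever tries V3 = d.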

6 Value interchangeability [Freuder, ‘91]
Equivalent values in the domain of a variable
Full Interchangeability (FI):
– d, e, f interchangeable for V2 in any solution
Neighborhood Interchangeability (NI):
– Efficiently approximates FI
– Finds e, f but misses d
– Discrimination tree DT(Vx)
[Figure: example constraint graph with domains V1 = {d}, V2 = {c, d, e, f}, V3 = {a, b, d}, V4 = {a, b, c}]

7 Bundling: using NI in search
Static bundling [Haselböck, ‘93]
Dynamic bundling [Our group, ‘01]
– Dynamically identifies NI
– Finds fatter solutions than BT & static bundling
– Never less efficient than BT & static bundling
[Figure: search trees on the example CSP after V1 = d; BT branches on c, d, e, f for V2, static bundling on c, d, {e, f}, and dynamic bundling on c, {d, e, f}]
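A minimal sketch of how NI sets can be computed for the example: values are grouped by the set of neighbor values they are consistent with, which is the information the leaves of a discrimination tree record. The not-equal constraints in EDGES are an assumption (the edges are not recoverable from the transcript); they were chosen because they reproduce the results stated on the slides. The names neighbors and ni_sets are hypothetical.

```python
from collections import defaultdict

DOMAINS = {
    "V1": {"d"},
    "V2": {"c", "d", "e", "f"},
    "V3": {"a", "b", "d"},
    "V4": {"a", "b", "c"},
}
# ASSUMPTION: not-equal constraints on these edges.
EDGES = [("V1", "V3"), ("V2", "V3"), ("V2", "V4"), ("V3", "V4")]


def neighbors(var):
    """Variables sharing a constraint with var."""
    return [b if a == var else a for a, b in EDGES if var in (a, b)]


def ni_sets(var, domains):
    """Partition the domain of var into neighborhood-interchangeable sets."""
    groups = defaultdict(list)
    for value in sorted(domains[var]):
        # Signature: every (neighbor, value) pair consistent with var = value.
        signature = frozenset(
            (nbr, other)
            for nbr in neighbors(var)
            for other in domains[nbr]
            if other != value  # not-equal constraint
        )
        groups[signature].append(value)
    return sorted(groups.values())


# Static view: NI computed on the original domains.
static_sets = ni_sets("V2", DOMAINS)
print(static_sets)   # [['c'], ['d'], ['e', 'f']]

# Dynamic view: after V1 = d, forward checking removes d from V3.
after_v1 = dict(DOMAINS, V3=DOMAINS["V3"] - {"d"})
dynamic_sets = ni_sets("V2", after_v1)
print(dynamic_sets)  # [['c'], ['d', 'e', 'f']]
```

Recomputing NI on the filtered domains is what lets the dynamic version recover the bundle {d, e, f} that the static computation misses.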

8 Robust solutions
Single solution: V1 ← d, V2 ← e, V3 ← a, V4 ← c
Robust solution: V1 ← {d}, V2 ← {d, e, f}, V3 ← {a}, V4 ← {b, c}
Solution bundle: Cartesian product of bundles of variables
Solution-bundle size = 1 × 3 × 1 × 2 = 6
[Figure: example constraint graph with domains V1 = {d}, V2 = {c, d, e, f}, V3 = {a, b, d}, V4 = {a, b, c}]

9 Phase transition [Cheeseman et al. ‘91]
Significant increase of cost around the critical value of the order parameter
In CSPs, the order parameters are constraint tightness & constraint ratio
Algorithms are compared around the phase transition
[Figure: cost of solving vs. order parameter; the cost peaks at the critical value, which separates mostly solvable problems from mostly unsolvable problems]

10 Non-binary CSPs
Scope(Cx): the set of variables involved in Cx
Arity(Cx): size of scope
Computing NI for non-binary CSPs is not a trivial extension from binary CSPs
[Figure: constraint network over the variables V, V1, V2, V3, V4 with the domains {1, 2, 3, 4, 5, 6} and {1, 2, 3} and the constraints C1, C2, C3, C4, each shown with its table of allowed tuples; C1 has scope (V, V1, V2)]

11 CSP parameters
n: number of variables
a: domain size
t: constraint tightness, the ratio of the number of disallowed tuples over all possible tuples
deg: degree of a variable
c_k: number of constraints of arity k
p_k = c_k / C(n, k): constraint ratio, where C(n, k) is the binomial coefficient "n choose k"
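As a small worked illustration of the two ratios, the sketch below computes p_k and t in Python. The function names and the numbers in the usage lines are made up for the example and do not come from the thesis.

```python
from math import comb


def constraint_ratio(c_k, n, k):
    """p_k: constraints of arity k divided by the number of possible scopes of size k."""
    return c_k / comb(n, k)


def tightness(num_disallowed, a, k):
    """t: disallowed tuples divided by all a**k tuples of a constraint of arity k."""
    return num_disallowed / a ** k


# Hypothetical numbers: 57 ternary constraints over 20 variables,
# each forbidding 250 of the 10**3 possible tuples.
p3 = constraint_ratio(57, 20, 3)   # 57 / 1140
t = tightness(250, 10, 3)          # 250 / 1000
print(p3, t)
```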

12 Outline
Background
Neighborhood Interchangeability (NI) for non-binary CSPs
– Non-binary discrimination tree (nb-DT)
Empirical evaluations
Database algorithms based on dynamic bundling
Conclusions & future work

13 NI for non-binary CSPs
1. Building an nb-DT for each constraint
– Determines the NI sets of a variable given the constraint
2. Intersecting partitions from nb-DTs
– Yields NI sets of V (partition of D_V)
3. Processing paths in nb-DTs
– Gives, for free, updates necessary for forward checking
[Figure: constraint network for V with constraints C1 to C4 and D_V = {1, 2, 3, 4, 5, 6}; nb-DT(V, C1) yields {1, 2}, {3, 4}, {5, 6}; nb-DT(V, C2) yields {1, 2}, {3, 4}, {5}, {6}]
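Step 2 amounts to intersecting the partitions of the domain of V obtained from the individual constraints. A minimal sketch, using the two partitions shown for nb-DT(V, C1) and nb-DT(V, C2); the name intersect_partitions is hypothetical.

```python
def intersect_partitions(p, q):
    """Coarsest partition that refines both p and q (non-empty pairwise intersections)."""
    blocks = []
    for block_p in p:
        for block_q in q:
            common = set(block_p) & set(block_q)
            if common:
                blocks.append(sorted(common))
    return sorted(blocks)


from_c1 = [{1, 2}, {3, 4}, {5, 6}]     # NI sets of V given C1
from_c2 = [{1, 2}, {3, 4}, {5}, {6}]   # NI sets of V given C2
ni_of_v = intersect_partitions(from_c1, from_c2)
print(ni_of_v)  # [[1, 2], [3, 4], [5], [6]]
```

Two values stay in the same NI set only if no constraint separates them, so 5 and 6 end up apart because C2 distinguishes them.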

14 Building an nb-DT: nb-DT(V, C1)
Tuples of C1:
V  V1  V2
1  1   3
1  3   3
2  1   3
2  3   3
3  1   1
3  2   2
4  1   1
4  2   2
5  3   2
6  3   2
Domain of V: {1, 2, 3, 4, 5, 6}
[Figure: nb-DT(V, C1); each path from the root lists the (V1, V2) pairs consistent with a value of V, and the annotations at the ends of the paths are {1, 2}, {3, 4}, and {5, 6}]
Complexity: O(deg · a^(k+1) · (1 - t))
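The construction can be sketched in Python as a trie keyed by the projected tuples of each value. This is an illustrative reading of the slide, not the thesis code: the node layout and the names build_nb_dt and annotations are assumptions.

```python
# Allowed tuples of C1 over (V, V1, V2), as listed on the slide.
C1 = [
    (1, 1, 3), (1, 3, 3), (2, 1, 3), (2, 3, 3), (3, 1, 1),
    (3, 2, 2), (4, 1, 1), (4, 2, 2), (5, 3, 2), (6, 3, 2),
]


def build_nb_dt(domain, constraint, position=0):
    """Build a discrimination tree for the variable at `position` of the constraint scope."""
    root = {"children": {}, "values": []}
    for value in sorted(domain):
        # Path: sorted tuples of the other variables that support this value.
        path = sorted(
            tup[:position] + tup[position + 1:]
            for tup in constraint
            if tup[position] == value
        )
        node = root
        for step in path:
            node = node["children"].setdefault(step, {"children": {}, "values": []})
        node["values"].append(value)  # annotation at the end of the path
    return root


def annotations(node):
    """Collect the annotations of the tree; each one is an NI set."""
    found = [node["values"]] if node["values"] else []
    for child in node["children"].values():
        found.extend(annotations(child))
    return sorted(found)


tree = build_nb_dt({1, 2, 3, 4, 5, 6}, C1)
print(annotations(tree))  # [[1, 2], [3, 4], [5, 6]]
```

Values 1 and 2 follow the same path, (1, 3) then (3, 3), so they share an annotation; the same holds for 3 and 4, and for 5 and 6.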

15 Bundling = Search + NI
Benefits of bundling
1. Bundles solutions
2. Bundles no-goods
Dynamic bundling (DynBndl)
– Re-computes NI during search
– Yields larger bundles, boosts effects of bundling
Skeptics’ objection to DynBndl
– Costly & not worthwhile
We show that the opposite holds
[Figure: search tree over V, V1, V2, V3, V4 with bundled values at each level, showing a solution bundle and a no-good bundle]

16 Advantages of DynBndl
We exploit nb-DTs for forward checking
DynBndl versus FC (BT + forward checking)
– Finding all solutions: theoretically best
– Finding first solution: empirical evidence
DynBndl yields multiple, robust solutions for less cost

17 Outline
Background
Neighborhood Interchangeability (NI) for non-binary CSPs
Empirical evaluations
Database algorithms based on dynamic bundling
Conclusions & future work

18 Empirical evaluations
DynBndl versus FC (BT + forward checking)
Experiments
– Effect of varying tightness
– In the phase-transition region: effect of varying domain size, effect of varying constraint ratio (CR)
Randomly generated problems, Model B
ANOVA to statistically compare performance of DynBndl and FC with varying t
t-distribution for confidence intervals

19 Experimental set-up
Generated 16 data sets
– n = {20, 30} × a = {10, 15} × {CR1, CR2, CR3, CR4}
– 9 to 12 values for t ∈ [25%, 75%]
– 1,000 instances per tightness value
Performance measurements
– FBS, size of the first solution bundle
– NV, number of nodes visited in the search tree
– CC, number of constraints checked
– CPU time

20 Analysis: Varying tightness
Low tightness
– Large FBS: 33 at t=0.35; 2254 (Dataset #13, t=0.35)
– Small additional cost
Phase transition
– Multiple solutions present
– Maximum no-good bundling causes max savings in CPU time, NV, & CC
High tightness
– Problems mostly unsolvable
– Overhead of bundling minimal
n=20, a=15, CR=CR3:
t      FBS
0.350  33.44
0.400  10.91
0.425  7.13
0.437  6.38
0.450  5.62
0.462  2.37
0.475  0.66
0.500  0.03
0.550  0.00
[Figure: CPU time [sec] and #NV (hundreds) vs. tightness for DynBndl and FC; n=20, a=15, CR=CR3]

21 Analysis: Varying domain size
Increasing a in the phase-transition region
– FBS increases: more chances for symmetry
– CPU time decreases: more bundling of no-goods
Increasing a (n=30):
CR    Improv (CPU) %      FBS
      a=10    a=15        a=10    a=15
CR1   33.3    34.3        5.5     11.9
CR2   28.6    33.0        5.0     5.5
CR3   29.8    31.7        3.6     5.0
CR4   28.4    31.6        1.2     1.4
Because the benefits of DynBndl increase with increasing domain size, DynBndl is particularly interesting for database applications, where large domains are typical

22 Outline
Background
Neighborhood Interchangeability (NI) for non-binary CSPs
Empirical evaluations
Database algorithms based on dynamic bundling
– Sorting-based bundling algorithm
– Dynamic-bundling-based join algorithm
Conclusions & future work

23 Databases & CSPs
DB terminology                 CSP terminology
Table, relation                Constraint (relational constraint)
Join condition                 Constraint (join-condition constraint)
Attribute                      CSP variable
Tuple in a table               Tuple in a constraint or allowed by one
A sequence of natural joins    All solutions to a CSP
Same computational problems, different cost models
– Databases: minimize # I/O operations
– CSP community: minimize # CPU operations
Challenges for using CSP techniques in DB
– Use of lighter data structures to minimize memory usage
– Fit in the iterator model of database engines

24 Join operator
R1 ⋈ (x θ y) R2
– Most expensive operator in terms of I/O
– θ is “=” → equi-join; x is the same as y → natural join
Join algorithms
– Nested loop
– Sorting-based: Sort-Merge, Progressive Merge-Join (PMJ). Partitions relations by sorting, minimizes # scans of relations
– Hashing-based
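For readers unfamiliar with the sorting-based family, here is a minimal in-memory sort-merge equi-join in Python. It is a textbook sketch, not PMJ and not the algorithm of the thesis; the function name, the key-function parameters, and the sample relations are illustrative assumptions.

```python
def sort_merge_join(r1, r2, key1, key2):
    """Equi-join two lists of tuples by sorting both and merging runs of equal keys."""
    left = sorted(r1, key=key1)
    right = sorted(r2, key=key2)
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        k_left, k_right = key1(left[i]), key2(right[j])
        if k_left < k_right:
            i += 1
        elif k_left > k_right:
            j += 1
        else:
            # Find the run of equal keys on each side.
            i_end = i
            while i_end < len(left) and key1(left[i_end]) == k_left:
                i_end += 1
            j_end = j
            while j_end < len(right) and key2(right[j_end]) == k_left:
                j_end += 1
            # Emit the cross product of the two runs.
            for x in left[i:i_end]:
                for y in right[j:j_end]:
                    result.append((x, y))
            i, j = i_end, j_end
    return result


# Hypothetical relations joined on their first attribute.
r1 = [(1, "a"), (2, "b"), (2, "c")]
r2 = [(2, "x"), (3, "y"), (2, "z")]
joined = sort_merge_join(r1, r2, key1=lambda t: t[0], key2=lambda t: t[0])
print(len(joined))  # 4
```

After sorting, each relation is scanned once, which is the property the slide refers to when it says sorting minimizes the number of scans.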

25 The join query
Join query:
SELECT R2.A, R2.B, R2.C FROM R1, R2 WHERE R1.A=R2.A AND R1.B=R2.B AND R1.C=R2.C
Result: 10 tuples in 3 nested tuples
R1 ⋈ R2 (compacted):
A        B               C
{1, 5}   {12, 13, 14}    {23}
{2, 4}   {10}            {25}
{6}      {13, 14}        {27}
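The count of 10 can be checked by expanding each nested tuple into the Cartesian product of its bundles. A short Python sketch; the name expand is hypothetical.

```python
from itertools import product

# The three nested tuples of the compacted result, as (A, B, C) bundles.
nested = [
    ({1, 5}, {12, 13, 14}, {23}),
    ({2, 4}, {10}, {25}),
    ({6}, {13, 14}, {27}),
]


def expand(nested_tuples):
    """Expand nested tuples into flat tuples (Cartesian product of each tuple's bundles)."""
    rows = []
    for bundles in nested_tuples:
        rows.extend(product(*(sorted(bundle) for bundle in bundles)))
    return sorted(rows)


flat = expand(nested)
print(len(flat))  # 10 = 2*3*1 + 2*1*1 + 1*2*1
```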

26 Modeling join query as a CSP
Attributes of relations → CSP variables
Attribute values → variable domains
Relations → relational constraints
Join conditions → join-condition constraints
SELECT R1.A, R1.B, R1.C FROM R1, R2 WHERE R1.A=R2.A AND R1.B=R2.B AND R1.C=R2.C
[Figure: constraint network with variables R1.A, R1.B, R1.C, R2.A, R2.B, R2.C and the relational constraints R1 and R2]

27 Progressive Merge Join
PMJ: a sort-merge algorithm by [Dittrich et al. ‘03]
Two phases
1. Sorting phase: sorts sub-sets of relations & produces early results
2. Merging phase: merges sorted sub-sets
We use the framework of the PMJ for our external join

28 New join algorithm
Sorting & merging phases
– Load sub-sets of relations in memory
– Compute in-memory join using dynamic bundling
In-memory join
– Uses sorting-based bundling (shown next)
– Computes join of in-memory relations using dynamically computed bundles
Cool animation upon request

29 Computing a bundle of R1.A
R1:
A  B   C
1  12  23
1  13  23
1  14  23
2  10  25
5  12  23
5  13  23
5  14  23
Partition of a constraint
– Tuples of the relation having the same value of R1.A
Compare projected tuples of first partition with those of another partition
– Unequal partitions: the value is not added to the bundle
– Symmetric partitions: the value is added to the bundle
Compare with every other partition to get complete bundle
Bundle: {1, 5}
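The comparison of partitions can be sketched in Python on the relation shown on the slide. This is a simplified, single-relation illustration: it ignores the processed-values bookkeeping and the second relation used by the actual join, and the name bundle_of_first_attribute is hypothetical.

```python
from itertools import groupby

# Relation R1 with attributes (A, B, C), as shown on the slide.
R1 = [
    (1, 12, 23), (1, 13, 23), (1, 14, 23),
    (2, 10, 25),
    (5, 12, 23), (5, 13, 23), (5, 14, 23),
]


def bundle_of_first_attribute(relation):
    """Bundle the smallest value of the first attribute with all values whose partitions match."""
    rows = sorted(relation)  # sorting makes partitions contiguous and ordered
    if not rows:
        return []
    # One partition per value of A, projected on the remaining attributes.
    partitions = [
        (value, [row[1:] for row in group])
        for value, group in groupby(rows, key=lambda row: row[0])
    ]
    first_value, first_projection = partitions[0]
    bundle = [first_value]
    for value, projection in partitions[1:]:
        if projection == first_projection:  # symmetric partitions
            bundle.append(value)
    return bundle


bundle = bundle_of_first_attribute(R1)
print(bundle)  # [1, 5]
```

Because the relation is sorted first, two partitions are symmetric exactly when their projected tuple lists are equal, so each comparison is a linear scan.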

30 Experiments
XXL library for implementation & evaluation
Data sets
– Random: 2 relations R1, R2 with same schema as example
  Each relation: 10,000 tuples; memory size: 4,000 tuples; page size: 200 tuples
– Real-world problem: 3 relations, 4 attributes
Compaction rate achieved
– Random problem: 1.48
– Savings even with (very) preliminary implementation
– Real-world problem: 2.26 (69 tuples in 32 nested tuples)

31 Outline
Background
Neighborhood Interchangeability (NI) for non-binary CSPs
Empirical evaluations
Database algorithms based on dynamic bundling
Conclusions & future work

32 Conclusions
Algorithm for computing NI sets in non-binary CSPs
DynBndl
– produces multiple robust solutions
– significantly reduces cost of search at phase transition
New dynamic-bundling-based join algorithm
Constraint Processing inspires innovative solutions to fundamental difficult problems in Databases

33 Future work
Sort constraint definitions to improve CSP techniques
Design bundling mechanisms for gap & linear constraints in Constraint Databases
Explore benefits of bundling in Databases
– Sampling operator
– Main-memory databases
– Automatic categorization of query results

34 Thanks!!

35 Related work
Join algorithms
– Well-established algorithms
– Do not focus on exploiting symmetry
Database compression
– Output results are not compressed
– Compression at value level, not tuple level

36 Related work (contd)
[Mamoulis & Papadias 1998]
– Join using FC for spatial DB
– Restricted to binary constraints
– No compaction of solution space
[Bayardo et al. 1996]
– Reduce the number of intermediate tuples of a sequence of joins
[Rich et al. 1993]
– Do not compact join attribute values
– Do not detect redundancy present in the grouped sub-relations

37 Analysis of overheads
For bundling
– Additional data structures: 2 arrays, 1 pointer
– Only 1 array (Processed values) may become cumbersome
Array size is largest
– when all the values of a variable are in one bundle
– But this case also leads to the best savings!

38 Sorting-based bundling
Heuristic for variable ordering
– Place variables linked by join conditions as close to each other as possible
– Resulting ordering: R1.A, R2.A, R1.B, R2.B, R1.C, R2.C
Sort relations R1 and R2 using the above ordering
Next: compute bundles of the variable ahead in the variable ordering (R1.A)

39 Join using bundling
Computing bundle for R1.A
– Select partition to compare for R1.A
– Symmetric partitions: adding to bundle of R1.A
– Current bundle of R1.A = {1, 5}
– Update processed values for R1.A: 5
Computing bundle for R2.A
– Select partition to compare
– Symmetric partitions: adding to bundle of R2.A
– Current bundle of R2.A = {1, 5}
– Update processed values for R2.A: 5
[Figure: relations R1 and R2 (attributes A, B, C) with their processed-values arrays, and the variable ordering R1.A, R2.A, R1.B, R2.B, R1.C, R2.C]

40 Join using bundling
Current bundle of R1.A = {1, 5}
Current bundle of R2.A = {1, 5}
Common(R1.A, R2.A) = {1, 5}
– Assign {1, 5} to R1.A; compute current constraint of R1
– Assign {1, 5} to R2.A; compute current constraint of R2
Next variable: R1.B
[Figure: relations R1 and R2 (attributes A, B, C) with their processed-values arrays, and the variable ordering R1.A, R2.A, R1.B, R2.B, R1.C, R2.C]

