1
Constraint Processing Techniques for Improving Join Computation: A Proof of Concept
Anagh Lal & Berthe Y. Choueiry
Constraint Systems Laboratory
Department of Computer Science & Engineering
University of Nebraska-Lincoln
2
An illustrative example
Join query:
SELECT R1.A, R1.B, R1.C FROM R1, R2 WHERE R1.A = R2.A AND R1.B = R2.B AND R1.C = R2.C
Result: 10 tuples represented in 3 nested tuples
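A nested tuple stands for the Cartesian product of its value sets, which is how 10 flat tuples can fit in 3 nested ones. The sketch below assumes a nested tuple is a Python tuple holding one set of values per attribute. This representation, the functions `count_flat` and `expand`, and the example values are all hypothetical and are not the authors' data structures.

```python
from itertools import product
from math import prod

# Hypothetical representation: one frozenset of values per attribute.
NestedTuple = tuple[frozenset, ...]

def count_flat(nested: NestedTuple) -> int:
    """Number of flat tuples that a nested tuple represents."""
    return prod(len(values) for values in nested)

def expand(nested: NestedTuple) -> list[tuple]:
    """All flat tuples represented by a nested tuple, in sorted order."""
    return sorted(product(*(sorted(values) for values in nested)))

# Hypothetical nested tuple over (A, B, C): A in {1, 5}, B = 2, C in {3, 4}.
example = (frozenset({1, 5}), frozenset({2}), frozenset({3, 4}))
print(count_flat(example))  # 4
```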
3
Advantages
Direct:
– Savings in the number of tuple comparisons
– Savings in I/O for the next operator
– Space reduction of materialized join queries
Future applications:
– Use for query size estimation
– Assistance in high-level data analysis & data mining
4
Our contributions
A new representation of a join query as a Constraint Satisfaction Problem (CSP)
A new sorting-based bundling algorithm
– Suitable for CSPs with fewer and larger constraints (i.e., joins)
– Improves memory usage
A new sort-merge join algorithm for producing (dynamically) bundled tuples
– Yields a compact representation and saves memory space
Identification of possible applications (suggested, not yet demonstrated)
– Data analysis
– Materialized views
– Assisting query-size estimation
5
Constraint Satisfaction Problem
Given $P = (V, D, C)$:
– $V = \{V_i\}$, a set of variables
– $D = \{D_{V_i}\}$, the set of their respective domains
– $C$, a set of constraints restricting the acceptable combinations of values for the variables
A solution is a consistent assignment of values to variables.
Query: find one solution, all solutions, etc.
[Figure: example CSP with variables V1, V2, V3, V4 and domains {a, b, c}, {a, b, d}, {c, d, e, f}, and {d}]
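To make the definition concrete, here is a minimal sketch of finding all solutions of a CSP with binary constraints by depth-first backtracking. The function name `solve_all`, the encoding of a constraint as `(x, y, predicate)`, and the example domains are hypothetical choices for illustration only.

```python
from typing import Callable

Assignment = dict[str, str]
Constraint = tuple[str, str, Callable[[str, str], bool]]  # (x, y, pred(val_x, val_y))

def solve_all(variables: list[str],
              domains: dict[str, list[str]],
              constraints: list[Constraint]) -> list[Assignment]:
    """Return all consistent assignments using depth-first backtracking."""
    solutions: list[Assignment] = []

    def consistent(assignment: Assignment) -> bool:
        # Check only the constraints whose two variables are both assigned.
        return all(pred(assignment[x], assignment[y])
                   for x, y, pred in constraints
                   if x in assignment and y in assignment)

    def backtrack(depth: int, assignment: Assignment) -> None:
        if depth == len(variables):
            solutions.append(dict(assignment))  # copy: assignment is mutated later
            return
        var = variables[depth]
        for value in domains[var]:
            assignment[var] = value
            if consistent(assignment):
                backtrack(depth + 1, assignment)
            del assignment[var]                 # undo before trying the next value

    backtrack(0, {})
    return solutions

# Hypothetical two-variable problem with the constraint V1 != V2.
sols = solve_all(["V1", "V2"],
                 {"V1": ["a", "b", "c"], "V2": ["c", "d"]},
                 [("V1", "V2", lambda a, b: a != b)])
```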
6
Solving CSPs
Typically, depth-first search (DFS) with backtracking
Improvement:
Static bundling [Freuder 91]
Dynamic bundling [our group]
– Based on dynamically identifying symmetries
– Guaranteed never to be less efficient than non-bundling or static bundling
[Figure: search trees over V1 and V2 for the example CSP without bundling, with static bundling, and with dynamic bundling]
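Static bundling in the sense of [Freuder 91] groups values that are neighborhood interchangeable: two values of a variable are bundled when, for every constraint on that variable, they are consistent with exactly the same values of the neighboring variable. The sketch below computes such bundles for binary constraints. It illustrates static bundling only, not the group's dynamic bundling, and the function name and example problem are hypothetical.

```python
from collections import defaultdict
from operator import ne
from typing import Callable

Pred = Callable[[str, str], bool]

def ni_bundles(var: str,
               domains: dict[str, list[str]],
               constraints: list[tuple[str, str, Pred]]) -> list[set[str]]:
    """Group the values of `var` that are neighborhood interchangeable."""
    # Orient every constraint on var as (neighbor, pred(value_of_var, value_of_neighbor)).
    oriented = []
    for x, y, pred in constraints:
        if x == var:
            oriented.append((y, pred))
        elif y == var:
            oriented.append((x, lambda a, b, p=pred: p(b, a)))
    groups: dict[tuple, set[str]] = defaultdict(set)
    for value in domains[var]:
        # Signature: the set of supporting values in each neighbor.
        signature = tuple(frozenset(w for w in domains[nbr] if pred(value, w))
                          for nbr, pred in oriented)
        groups[signature].add(value)
    return list(groups.values())

# Hypothetical example: V1 != V2 and V2 != V3.
domains = {"V1": ["a", "b", "c"], "V2": ["c", "d", "e", "f"], "V3": ["d"]}
bundles = ni_bundles("V2", domains, [("V1", "V2", ne), ("V2", "V3", ne)])
```

Here e and f end up in one bundle because both are supported by every value of V1 and by d in V3.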
7
Modeling Join as a CSP
Attributes of relations → CSP variables
Attribute values → variable domains
Relations → relational constraints
Join conditions → join-condition constraints
SELECT R1.A, R1.B, R1.C FROM R1, R2 WHERE R1.A = R2.A AND R1.B = R2.B AND R1.C = R2.C
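The mapping above can be sketched as a small builder. The `JoinCSP` container, the `model_join` function, and the sample rows are hypothetical. Each relation becomes one non-binary relational constraint over its qualified attributes, and each equality in the WHERE clause becomes a join-condition constraint.

```python
from dataclasses import dataclass, field

@dataclass
class JoinCSP:
    variables: list[str] = field(default_factory=list)                    # e.g. "R1.A"
    domains: dict[str, set] = field(default_factory=dict)                 # attribute values
    relational: dict[str, tuple[list[str], set[tuple]]] = field(default_factory=dict)
    join_conditions: list[tuple[str, str]] = field(default_factory=list)  # equality pairs

def model_join(relations: dict[str, tuple[list[str], list[tuple]]],
               equalities: list[tuple[str, str]]) -> JoinCSP:
    csp = JoinCSP()
    for name, (attrs, rows) in relations.items():
        scope = [f"{name}.{a}" for a in attrs]
        csp.variables.extend(scope)
        for i, var in enumerate(scope):
            csp.domains[var] = {row[i] for row in rows}  # domain = values in the column
        csp.relational[name] = (scope, set(rows))        # relation = relational constraint
    csp.join_conditions = list(equalities)
    return csp

# Hypothetical instances of R1 and R2 with the schema of the example query.
csp = model_join(
    {"R1": (["A", "B", "C"], [(1, 2, 3), (5, 2, 3)]),
     "R2": (["A", "B", "C"], [(1, 2, 3)])},
    [("R1.A", "R2.A"), ("R1.B", "R2.B"), ("R1.C", "R2.C")])
```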
8
Sorting-based bundling
Heuristic for variable ordering: place variables linked by join conditions as close to each other as possible
Ordering: R1.A, R2.A, R1.B, R2.B, R1.C, R2.C
Sort relations R1 and R2 using the above ordering
Next: compute the bundles of the first variable in the ordering (R1.A)
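One simple, greedy way to realize this ordering heuristic is to walk the join conditions and place both sides of each condition next to each other, then sort each relation lexicographically in that order. The function names below are hypothetical, and the greedy strategy is only one possible reading of the heuristic.

```python
def order_variables(attributes: list[str],
                    equalities: list[tuple[str, str]]) -> list[str]:
    """Place attributes linked by a join condition next to each other."""
    order: list[str] = []
    for left, right in equalities:
        for var in (left, right):
            if var not in order:
                order.append(var)
    # Attributes not involved in any join condition go last.
    order.extend(a for a in attributes if a not in order)
    return order

def sort_relation(name: str, attrs: list[str], rows: list[tuple],
                  order: list[str]) -> list[tuple]:
    """Sort a relation's rows lexicographically following the variable ordering."""
    qualified = [f"{name}.{a}" for a in attrs]
    positions = sorted(range(len(attrs)), key=lambda i: order.index(qualified[i]))
    return sorted(rows, key=lambda row: tuple(row[i] for i in positions))

order = order_variables(
    ["R1.A", "R1.B", "R1.C", "R2.A", "R2.B", "R2.C"],
    [("R1.A", "R2.A"), ("R1.B", "R2.B"), ("R1.C", "R2.C")])
r1_sorted = sort_relation("R1", ["A", "B", "C"], [(5, 1, 1), (1, 2, 3), (1, 1, 9)], order)
```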
9
Bundling an attribute
Partition of a constraint: the tuples of the relation having the same value of R1.A
Compare the projected tuples of the first partition with those of another partition
Compare with every other partition to get the complete bundle
[Figure: partitions of the sorted relation, showing unequal partitions, symmetric partitions, and the resulting bundle {1, 5}]
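The partition comparison can be sketched for a single in-memory relation sorted on its first attribute. Values whose partitions have identical projections (all columns except the first) fall into the same bundle. This is a simplified illustration that ignores the paging and bookkeeping structures of the actual algorithm. The function name and data are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

def bundles_of_first_attribute(rows: list[tuple]) -> list[list]:
    """Bundle the values of the first attribute whose partitions are symmetric.

    `rows` must be sorted lexicographically, so that each partition (the tuples
    sharing a first-attribute value) is contiguous and its projection is sorted.
    """
    partitions = [(value, [row[1:] for row in group])
                  for value, group in groupby(rows, key=itemgetter(0))]
    checked = [False] * len(partitions)
    bundles: list[list] = []
    for i, (value, projected) in enumerate(partitions):
        if checked[i]:
            continue
        bundle = [value]
        # Compare with every following partition to get the complete bundle.
        for j in range(i + 1, len(partitions)):
            if not checked[j] and partitions[j][1] == projected:
                bundle.append(partitions[j][0])
                checked[j] = True
        bundles.append(bundle)
    return bundles

# Hypothetical sorted relation: the partitions of 1 and 5 are symmetric.
rows = [(1, 2, 3), (1, 2, 4), (3, 2, 3), (5, 2, 3), (5, 2, 4)]
result = bundles_of_first_attribute(rows)
```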
10
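Join using dynamic bundling
[Flowchart]
1. Start: select the next variable.
2. Compute the next valid bundle for it.
3. Bundle found: assign the bundle. If this is the last variable, output one tuple and go to step 2; otherwise go to step 1.
4. No bundle found: if the variable is the first in the ordering, stop. Otherwise, move to the previous variable, undo its previous assignment, and go to step 2.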
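The control flow of the flowchart can be sketched as an iterative search that keeps one bundle iterator per depth. The `next_bundles(depth, partial)` callback is a hypothetical interface standing in for "compute next valid bundle". Its results, and the example source below, are illustrative assumptions, not the authors' code.

```python
from typing import Callable, Iterable, Iterator

Bundle = frozenset
# Hypothetical interface: yields the valid bundles for variable `depth`
# given the bundles already assigned to variables 0..depth-1.
BundleSource = Callable[[int, tuple], Iterable[Bundle]]

def bundled_search(num_vars: int, next_bundles: BundleSource) -> Iterator[tuple]:
    """Enumerate solutions as nested tuples (one bundle per variable)."""
    if num_vars == 0:
        return
    assignment: list[Bundle] = []
    iterators = [iter(next_bundles(0, ()))]
    while iterators:
        depth = len(iterators) - 1
        del assignment[depth:]                 # undo the previous assignment at this depth
        bundle = next(iterators[depth], None)  # compute the next valid bundle
        if bundle is None:                     # none found: move to the previous variable
            iterators.pop()                    # popping depth 0 means stop
            continue
        assignment.append(bundle)              # assign the bundle
        if depth == num_vars - 1:
            yield tuple(assignment)            # last variable: output one nested tuple
        else:                                  # select the next variable
            iterators.append(iter(next_bundles(depth + 1, tuple(assignment))))

def example_source(depth: int, partial: tuple) -> Iterator[Bundle]:
    # Hypothetical: only the bundle {1, 5} of the first variable has a support.
    if depth == 0:
        yield from (frozenset({1, 5}), frozenset({3}))
    elif partial[0] == frozenset({1, 5}):
        yield frozenset({2})

solutions = list(bundled_search(2, example_source))
```

Because the search is a generator, it resumes from the same depth after each output, which matches finding all solutions.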
11
Finding the valid bundle
[Figure: bundles {1, 5, x} and {1, 5, y, z}; their common values are {1, 5}]
1. Compute a bundle for the attribute.
2. Check the bundle's validity against future constraints.
3. If no common value is found, go to step 1.
Assign to the variable the values of the bundle that survive.
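For an equality join condition, the validity check of steps 1 to 3 reduces to intersecting each candidate bundle with the values the future constraint allows, and stopping at the first non-empty intersection. The function name and the notion of a precomputed `allowed_by_future` set are hypothetical simplifications.

```python
from typing import Iterable, Optional

def next_valid_bundle(candidate_bundles: Iterable[set],
                      allowed_by_future: set) -> Optional[set]:
    """Surviving values of the first bundle consistent with the future
    equality constraint, or None if no bundle survives."""
    for bundle in candidate_bundles:          # 1. compute a bundle
        common = bundle & allowed_by_future   # 2. check validity
        if common:
            return common                     # assign the surviving values
        # 3. no common value: go back to step 1 with the next bundle
    return None

valid = next_valid_bundle([{2}, {1, 5, "x"}], {1, 5, "y", "z"})
```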
12
Analysis of overheads
For bundling:
– Additional data structures: 2 arrays, 1 pointer
– Only 1 array may become cumbersome: its size is largest when all the values of a variable are in one bundle
– But this case also yields the best savings!
Improved implementation: use of bitmaps?
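One way the bitmap idea could look: encode every bundle of a variable as a bit vector over that variable's sorted domain, so storage is one bit per domain value regardless of bundle size, and checking common values becomes a bitwise AND. The `DomainBitmap` helper is hypothetical and uses Python integers as bit vectors.

```python
class DomainBitmap:
    """Encode subsets of one variable's domain as integer bitmaps (hypothetical helper)."""

    def __init__(self, domain):
        self.values = sorted(domain)
        self.index = {v: i for i, v in enumerate(self.values)}

    def encode(self, bundle) -> int:
        bits = 0
        for v in bundle:
            bits |= 1 << self.index[v]   # set the bit of each value in the bundle
        return bits

    def decode(self, bits: int) -> set:
        return {v for i, v in enumerate(self.values) if bits >> i & 1}

bm = DomainBitmap({1, 2, 3, 4, 5})
common_bits = bm.encode({1, 5}) & bm.encode({1, 3, 5})  # intersection via AND
```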
13
Progressive Merge Join
PMJ: a sort-merge join algorithm by [Dittrich et al. 03]
– Provides early results
– Assists in query size estimation
Two main phases:
– Sorting: starts producing results already in this phase
– Merging: merges the sorted runs
We use the framework of PMJ for our external join.
PMJ was implemented and evaluated with the XXL library; we use the same library for our implementation.
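For reference, the merge step underlying any sort-merge join can be sketched on two in-memory lists. This is the classic algorithm, not PMJ itself: it omits PMJ's early-result production during sorting, its run management, and all I/O. The function name and data are hypothetical.

```python
from typing import Any, Callable

def sort_merge_join(left: list[tuple], right: list[tuple],
                    key: Callable[[tuple], Any]) -> list[tuple[tuple, tuple]]:
    """Classic sort-merge equi-join of two in-memory lists."""
    left = sorted(left, key=key)
    right = sorted(right, key=key)
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        kl, kr = key(left[i]), key(right[j])
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            # Find the run of equal keys on both sides and emit their cross product.
            i_end = i
            while i_end < len(left) and key(left[i_end]) == kl:
                i_end += 1
            j_end = j
            while j_end < len(right) and key(right[j_end]) == kr:
                j_end += 1
            out.extend((l, r) for l in left[i:i_end] for r in right[j:j_end])
            i, j = i_end, j_end
    return out

R1 = [(1, 2, 3), (5, 2, 3), (1, 2, 4)]
R2 = [(1, 2, 3), (5, 9, 9)]
full_match = sort_merge_join(R1, R2, key=lambda t: t)  # A, B, and C all equal
```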
14
Preliminary experiments
Data sets:
– Random: 2 relations R1, R2 with the same schema as the example; each relation: 10,000 tuples; memory size: 4,000 tuples; page size: 200 tuples
– Real-world problem: 3 relations, 4 attributes
Compaction rate achieved:
– Random problem: 1.48; the savings compensate even for the worst case (of the current experimental implementation)
– Real-world problem: 2.26 (69 tuples in 32 nested tuples)
15
Related work
Join algorithms:
– Well-established algorithms
– Do not focus on exploiting symmetry
Database compression:
– Output results are not compressed
– Compression at the value level, not the tuple level
16
Related work (contd)
[Mamoulis & Papadias 1998]: join using forward checking (FC) for spatial DBs
– Restricted to binary constraints
– No compaction of the solution space
[Bayardo et al. 1996]: reduce the number of intermediate tuples of a sequence of joins
[Rich et al. 1993]:
– Do not compact join-attribute values
– Do not detect the redundancy present in the grouped sub-relations
17
Future work
Refine the implementation: use lighter data structures
Test usefulness in the context of constraint DBs, where values are continuous intervals (e.g., spatial databases)
Conduct thorough evaluations of overall performance & overhead (memory & CPU) on different data distributions
Investigate the benefit of using bundling for query size estimation and materialized views
18
Research supported by CAREER Award #0133568 from NSF
19
DB vs. CSP terminology
20
Bundling relations: Data structures
Considering the portion of the relation in memory:
– Current-Inst: stores the current instantiations of the past variables $V_p$ of R1
– Current-Constraint: selection of R’ in which the values of the past variables equal Current-Inst and the current variable $V_c$ is greater than all previous instantiations of $V_c$
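The Current-Constraint selection can be sketched as a filter over the in-memory tuples. This sketch assumes that the relation's columns follow the variable ordering, so the past variables occupy the first k columns and $V_c$ is column k. It also assumes that "greater than all previous instantiations" means greater than the largest of them. The function name and data are hypothetical.

```python
def current_constraint(rows: list[tuple], current_inst: tuple,
                       previous_instantiations=()) -> list[tuple]:
    """Tuples whose past-variable values equal Current-Inst and whose value
    of the current variable exceeds all previous instantiations of it."""
    k = len(current_inst)                      # past variables: columns 0..k-1
    bound = max(previous_instantiations, default=None)
    return [row for row in rows
            if row[:k] == tuple(current_inst)
            and (bound is None or row[k] > bound)]

rows = [(1, 2, 3), (1, 2, 4), (1, 3, 3), (5, 2, 3)]
selected = current_constraint(rows, (1,), [2])
```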
21
Bundling relations: Computing bundles (Algorithm 1)
NEXT-PARTITION(p) returns the first unchecked partition in Current-Constraint following partition p.
Because the constraints are sorted, checking the equality of tuples is efficient.
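Sorting is what makes the comparison cheap. Each partition is a contiguous index range, and two partitions can be compared position by position in one linear pass that stops at the first mismatch. The sketch below shows this alongside a NEXT-PARTITION that scans for the next unchecked partition. The helper names and the `checked` list are hypothetical stand-ins for the authors' structures.

```python
from itertools import groupby
from operator import itemgetter
from typing import Optional

def partition_ranges(rows: list[tuple]) -> list[range]:
    """Index ranges of the partitions of a relation sorted on column 0."""
    ranges, start = [], 0
    for _, group in groupby(rows, key=itemgetter(0)):
        size = sum(1 for _ in group)
        ranges.append(range(start, start + size))
        start += size
    return ranges

def next_partition(checked: list[bool], p: int) -> Optional[int]:
    """Index of the first unchecked partition following partition p, or None."""
    for q in range(p + 1, len(checked)):
        if not checked[q]:
            return q
    return None

def partitions_equal(rows: list[tuple], p: range, q: range) -> bool:
    """Compare the projections (dropping column 0) of two sorted partitions."""
    if len(p) != len(q):
        return False                  # different sizes: cannot be symmetric
    for i, j in zip(p, q):
        if rows[i][1:] != rows[j][1:]:
            return False              # first mismatch ends the comparison
    return True

rows = [(1, 2, 3), (1, 2, 4), (3, 2, 3), (5, 2, 3), (5, 2, 4)]
parts = partition_ranges(rows)
```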
22
Bundling relations: Data structures
Processed-Values: cumulatively stores the non-representative values of bundles
– When computing the bundles of $V_c$, the values of $V_c$ stored in it are ignored
A partition p is marked as checked when:
– Value(p) is in an instantiation bundle, or
– p is selected for comparison with other partitions to check for bundles
23
Join computation: In memory
Two subsets of the relations (some pages) are in memory; the algorithm finds the result of joining the two.
The join is computed as a search for all solutions:
– After finding one solution, the search resumes from the same depth
– The algorithm shown can be entered at any “depth” of the search
Uses Algorithm 1 to find the bundles to assign to variables
24
Join computation: In memory
Join as a search (Algorithm 2)
BACKTRACK undoes the effects of the previous instantiation:
– Variable[depth] in Current-Inst is reset
– Processed-Values for the variable is emptied
– The value in Current-Solution is reset
– Current-Constraint is recomputed
Expanded on next slide
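The four effects of BACKTRACK can be sketched on a small state object. The sketch simplifies to a single sorted relation whose columns follow `order`, so Current-Constraint is recomputed as the tuples matching the remaining past instantiations. `SearchState` and its fields are hypothetical names mirroring the slide's data structures.

```python
from dataclasses import dataclass, field

@dataclass
class SearchState:
    order: list[str]                                         # variable ordering
    current_inst: dict[str, object] = field(default_factory=dict)
    processed_values: dict[str, set] = field(default_factory=dict)
    current_solution: dict[str, frozenset] = field(default_factory=dict)
    current_constraint: list[tuple] = field(default_factory=list)

    def backtrack(self, depth: int, rows: list[tuple]) -> None:
        """Undo the effects of the instantiation of order[depth]."""
        var = self.order[depth]
        self.current_inst.pop(var, None)                # reset Variable[depth] in Current-Inst
        self.processed_values.get(var, set()).clear()   # empty Processed-Values for var
        self.current_solution.pop(var, None)            # reset the value in Current-Solution
        # Recompute Current-Constraint from the remaining past instantiations.
        past = tuple(self.current_inst[v] for v in self.order[:depth])
        self.current_constraint = [r for r in rows if r[:depth] == past]

rows = [(1, 2, 3), (1, 2, 4), (5, 2, 3)]
state = SearchState(order=["A", "B", "C"],
                    current_inst={"A": 1, "B": 2},
                    processed_values={"B": {7}},
                    current_solution={"A": frozenset({1, 5}), "B": frozenset({2})})
state.backtrack(1, rows)
```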
25
Join computation: In memory
COMMON($b_i$, bundles): the subset of $b_i$ that is consistent according to the join-condition constraints
– For equality, COMMON is intersection
– An empty result of COMMON signals an inconsistency, which triggers BACKTRACK
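COMMON can be sketched for an arbitrary join-condition predicate and then specialized to equality, where it becomes plain set intersection. The general version assumes that a value of $b_i$ survives when it has at least one consistent partner in every given bundle. This reading of "consistent" is an assumption, and the function names are hypothetical.

```python
from typing import Callable, Iterable

def common(b_i: set, bundles: Iterable[set],
           consistent: Callable[[object, object], bool]) -> set:
    """Values of b_i with a consistent partner in every given bundle."""
    bundles = list(bundles)
    return {v for v in b_i
            if all(any(consistent(v, w) for w in b) for b in bundles)}

def common_equality(b_i: set, bundles: Iterable[set]) -> set:
    """For equality join conditions, COMMON reduces to set intersection."""
    return set(b_i).intersection(*bundles)

survivors = common_equality({1, 5, "x"}, [{1, 5, "y", "z"}])
if not common_equality({1}, [{2}]):
    pass  # empty result: inconsistency, so the search would BACKTRACK here
```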