Optimization of Relational Algebra Expressions Database I.
Moore's law
Moore's law is the observation that, over the history of computing hardware, the number of transistors on integrated circuits doubles approximately every two years. The capabilities of many digital electronic devices are strongly linked to Moore's law: processing speed, memory capacity, sensors and even the number and size of pixels in digital cameras.
Our objective -- 1
On the other hand, other factors improve only linearly. An example of such a factor is the speed at which the disk in a hard disk drive moves. For large data sets, the amount of data moved between the primary and the secondary storage devices should therefore be minimized.
Our objective -- 2
Data is moved between the main memory and the hard disk in blocks. In other words, RDBMS computations should keep the number of blocks involved in I/O operations as low as possible. This can be achieved by keeping the transitory (intermediate) relations as small as possible.
Equivalence of relational algebra expressions
To reduce the size of the transitory relations, the relational algebra expressions are rewritten. Two relational algebra queries q and q' are equivalent if q(I) = q'(I) for every database instance I. Here:
– a database instance consists of relations,
– q(I) denotes the result of applying q to I.
Example
Consider the relations
– Likes(drinker, beer)
– Frequents(drinker, bar)
abbreviated as L and F. The following two expressions are equivalent:
– Π_{bar}(σ_{F.drinker=L.drinker ∧ beer=Bud}(F × L))
– Π_{bar}(σ_{F.drinker=L.drinker}(F × (σ_{beer=Bud}(L))))
In most cases the second query can be evaluated faster, because the selection on beer shrinks L before the product is formed.
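The effect on the transitory relation can be checked on a toy instance. The sketch below is an illustration only: the sample tuples and the helper names (cross, select, project) are made up, and relations are modeled as Python lists of dicts with qualified attribute names.

```python
from itertools import product

# Hypothetical sample instance; the tuples are invented for illustration.
likes = [
    {"L.drinker": "Ann", "beer": "Bud"},
    {"L.drinker": "Bob", "beer": "Miller"},
    {"L.drinker": "Cid", "beer": "Bud"},
]
frequents = [
    {"F.drinker": "Ann", "bar": "Joe's"},
    {"F.drinker": "Bob", "bar": "Joe's"},
    {"F.drinker": "Cid", "bar": "Sue's"},
]

def cross(r, s):
    """Cartesian product of two relations with disjoint attribute names."""
    return [{**t, **u} for t, u in product(r, s)]

def select(pred, r):
    """Selection: keep the tuples satisfying pred."""
    return [t for t in r if pred(t)]

def project(attrs, r):
    """Projection with duplicate elimination (set semantics)."""
    return {tuple(t[a] for a in attrs) for t in r}

# Plan 1: build the full product F x L, then select.
inter1 = cross(frequents, likes)
plan1 = project(["bar"], select(
    lambda t: t["F.drinker"] == t["L.drinker"] and t["beer"] == "Bud", inter1))

# Plan 2: filter Likes first, then build the (smaller) product.
inter2 = cross(frequents, select(lambda t: t["beer"] == "Bud", likes))
plan2 = project(["bar"], select(
    lambda t: t["F.drinker"] == t["L.drinker"], inter2))

print(plan1 == plan2)            # both plans return the same bars
print(len(inter1), len(inter2))  # sizes of the two transitory relations
```

On this instance the two plans agree, while the product built by the second plan has fewer tuples. A single instance does not prove equivalence, of course; it only illustrates the definition.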
Optimization algorithm -- sketch
The original relational algebra expression is rewritten into another one in which
– the selection operations are performed as early as possible,
– the unnecessary columns are then removed.
Next, each selection and the cross product directly below it are replaced with the appropriate join operator.
Our running example
Likes(drinker, beer)
Bar(name, city)
Frequents(drinker, bar)
Π_{L.drinker}(σ_{L.drinker=F.drinker ∧ name=bar ∧ beer=Bud ∧ city=N.Y.}(L × B × F))
Splitting the conditions
σ_{C_1 ∧ C_2}(E) is equivalent to σ_{C_1}(σ_{C_2}(E)).
Π_{L.drinker}(σ_{L.drinker=F.drinker ∧ name=bar ∧ beer=Bud ∧ city=N.Y.}(L × B × F))
is equivalent to
Π_{L.drinker}(σ_{L.drinker=F.drinker}(σ_{name=bar}(σ_{beer=Bud}(σ_{city=N.Y.}(L × B × F)))))
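The splitting rule can be sketched as a small tree rewrite. The nested-tuple encoding of expressions and the function name split_selections are assumptions made for this illustration, not part of the slides; conditions are kept as opaque strings.

```python
# Assumed encoding of expressions as nested tuples:
#   ("rel", name)
#   ("select", [cond, ...], child)   a list stands for a conjunction
#   ("project", [attr, ...], child)
#   ("product", left, right)

def split_selections(expr):
    """Rewrite every conjunctive selection into a chain of single-condition selections."""
    op = expr[0]
    if op == "rel":
        return expr
    if op == "select":
        _, conds, child = expr
        result = split_selections(child)
        # Wrap innermost-first so the first condition ends up outermost.
        for cond in reversed(conds):
            result = ("select", [cond], result)
        return result
    if op == "project":
        return ("project", expr[1], split_selections(expr[2]))
    if op == "product":
        return ("product", split_selections(expr[1]), split_selections(expr[2]))
    raise ValueError(f"unknown operator: {op}")

running = (
    "project", ["L.drinker"],
    ("select", ["L.drinker=F.drinker", "name=bar", "beer=Bud", "city=N.Y."],
     ("product", ("product", ("rel", "L"), ("rel", "B")), ("rel", "F"))),
)
split = split_selections(running)

# Walk down the chain of selections and list the conditions top to bottom.
node, chain = split[2], []
while node[0] == "select":
    chain.append(node[1][0])
    node = node[2]
print(chain)
```

The chain lists the four conditions in the same top-to-bottom order as the rewritten running example.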
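Expression trees
[Figure: two expression trees for the running example. Before splitting: Π_{L.drinker} above a single σ_{L.drinker=F.drinker ∧ name=bar ∧ beer=Bud ∧ city=N.Y.} above the product of L, B and F. After splitting: Π_{L.drinker} above the chain σ_{L.drinker=F.drinker}, σ_{name=bar}, σ_{beer=Bud}, σ_{city=N.Y.} above the same product.]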
Pulling down the conditions
σ_C(E_1 Θ E_2) ≡ (σ_C(E_1)) Θ E_2, where attr(C) ⊆ attr(E_1) and Θ ∈ {×, ⋈}. Here
– attr(C) and attr(E_1) denote the attributes appearing in condition C and in relational algebra expression E_1, respectively,
– ≡ denotes the equivalence relation.
Π_{L.drinker}(σ_{L.drinker=F.drinker}(σ_{name=bar}(σ_{beer=Bud}(σ_{city=N.Y.}(L × B × F)))))
is equivalent to
Π_{L.drinker}(σ_{L.drinker=F.drinker}(σ_{name=bar}((σ_{beer=Bud}(L)) × (σ_{city=N.Y.}(B)) × F)))
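A minimal sketch of this rule as a recursive rewrite follows. The tuple encoding, the SCHEMAS table with qualified drinker attributes, and the names attrs and push_down are assumptions made for the illustration; a selection is pushed into a side of a product only when attr(C) is a subset of that side's attributes.

```python
# Assumed schemas of the running example, with qualified drinker attributes.
SCHEMAS = {
    "L": {"L.drinker", "beer"},
    "B": {"name", "city"},
    "F": {"F.drinker", "bar"},
}

def attrs(expr):
    """attr(E): the set of attributes produced by expression E."""
    kind = expr[0]
    if kind == "rel":
        return SCHEMAS[expr[1]]
    if kind == "select":
        return attrs(expr[2])
    if kind == "product":
        return attrs(expr[1]) | attrs(expr[2])
    raise ValueError(f"unknown operator: {kind}")

def push_down(cond, cond_attrs, expr):
    """Place the selection on cond as deep in expr as the rule allows."""
    if expr[0] == "product":
        left, right = expr[1], expr[2]
        if cond_attrs <= attrs(left):
            return ("product", push_down(cond, cond_attrs, left), right)
        if cond_attrs <= attrs(right):
            return ("product", left, push_down(cond, cond_attrs, right))
    # The condition needs attributes from both sides (or expr is not a product).
    return ("select", cond, expr)

L, B, F = ("rel", "L"), ("rel", "B"), ("rel", "F")
plan = ("product", ("product", L, B), F)

# Conditions of the split chain, innermost first, each with its attr(C).
conditions = [
    ("city=N.Y.", {"city"}),
    ("beer=Bud", {"beer"}),
    ("name=bar", {"name", "bar"}),
    ("L.drinker=F.drinker", {"L.drinker", "F.drinker"}),
]
for cond, cond_attrs in conditions:
    plan = push_down(cond, cond_attrs, plan)
print(plan)
```

The two single-relation conditions end up directly above L and B, while the two conditions that compare attributes of different relations stay above the product.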
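[Figure: expression trees before and after pulling down the conditions. Before: the chain of four selections sits above the product of L, B and F. After: σ_{beer=Bud} sits directly above L and σ_{city=N.Y.} directly above B, while σ_{L.drinker=F.drinker} and σ_{name=bar} remain above the product.]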
Removal of the unnecessary columns
Π_X(E_1 Θ E_2) ≡ Π_Y(E_1) Θ Π_Z(E_2), where X = Y ∪ Z, Y ⊆ attr(E_1), Z ⊆ attr(E_2) and Θ ∈ {×, ⋈}. For Θ = ⋈, Y and Z must both also contain the common attributes of E_1 and E_2.
Π_X(σ_C(E)) ≡ Π_X(σ_C(Π_Y(E))), where Y = attr(C) ∪ X.
Π_{L.drinker}(σ_{L.drinker=F.drinker}(σ_{name=bar}((σ_{beer=Bud}(L)) × (σ_{city=N.Y.}(B)) × F)))
is equivalent to
Π_{L.drinker}(σ_{L.drinker=F.drinker}(σ_{name=bar}((Π_{drinker}(σ_{beer=Bud}(L))) × (Π_{name}(σ_{city=N.Y.}(B))) × F)))
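The second rule can be checked on a small instance. In the sketch below the sample tuples and helper names are invented for illustration; E is taken to be the product of Bar and Frequents, X = {F.drinker}, C is name=bar, and hence Y = attr(C) ∪ X drops only the city column.

```python
from itertools import product

# Hypothetical sample instance; the tuples are invented for illustration.
bars = [
    {"name": "Joe's", "city": "N.Y."},
    {"name": "Sue's", "city": "Boston"},
]
frequents = [
    {"F.drinker": "Ann", "bar": "Joe's"},
    {"F.drinker": "Bob", "bar": "Sue's"},
    {"F.drinker": "Cid", "bar": "Moe's"},
]

def cross(r, s):
    """Cartesian product of two relations with disjoint attribute names."""
    return [{**t, **u} for t, u in product(r, s)]

def select(pred, r):
    """Selection: keep the tuples satisfying pred."""
    return [t for t in r if pred(t)]

def project(attrs, r):
    """Projection with duplicate elimination, preserving first-seen order."""
    out = []
    for t in r:
        row = {a: t[a] for a in sorted(attrs)}
        if row not in out:
            out.append(row)
    return out

def cond(t):
    """Condition C: name = bar."""
    return t["name"] == t["bar"]

E = cross(bars, frequents)
X = {"F.drinker"}
Y = {"name", "bar"} | X   # Y = attr(C) united with X

lhs = project(X, select(cond, E))
rhs = project(X, select(cond, project(Y, E)))
print(lhs == rhs, lhs)
```

Both sides yield the same drinkers; the inner projection only narrows the tuples that the selection has to read.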
[Figure: expression trees before and after removing the unnecessary columns. After: Π_{drinker} is inserted above σ_{beer=Bud}(L) and Π_{name} above σ_{city=N.Y.}(B); the rest of the tree is unchanged.]
Note: evaluating the extra projections takes time as well, hence this rewriting step can be omitted.
Substitution with joins
By definition,
– E_1 ⋈_C E_2 ≡ σ_C(E_1 × E_2)
– E_1 ⋈ E_2 ≡ Π_L(σ_C(E_1 × E_2)), where condition C makes the common attributes of E_1 and E_2 equal and these common attributes occur only once in L.
Π_{L.drinker}(σ_{L.drinker=F.drinker}(σ_{name=bar}((Π_{drinker}(σ_{beer=Bud}(L))) × (Π_{name}(σ_{city=N.Y.}(B))) × F)))
is equivalent to
Π_{L.drinker}((Π_{drinker}(σ_{beer=Bud}(L))) ⋈ ((Π_{name}(σ_{city=N.Y.}(B))) ⋈_{name=bar} F))
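The final plan can be evaluated on a toy instance and compared with a direct evaluation of the running example. Everything below is an illustrative sketch: the tuples and helper names are invented, attribute names are unqualified so that the natural join can match on drinker, and the projection on Bar is assumed to keep the name attribute.

```python
# Hypothetical sample instance; the tuples are invented for illustration.
likes = [{"drinker": "Ann", "beer": "Bud"},
         {"drinker": "Bob", "beer": "Bud"},
         {"drinker": "Cid", "beer": "Miller"}]
bars = [{"name": "Joe's", "city": "N.Y."},
        {"name": "Sue's", "city": "Boston"}]
frequents = [{"drinker": "Ann", "bar": "Joe's"},
             {"drinker": "Bob", "bar": "Sue's"},
             {"drinker": "Cid", "bar": "Joe's"}]

def select(pred, r):
    """Selection: keep the tuples satisfying pred."""
    return [t for t in r if pred(t)]

def project(attrs, r):
    """Projection with duplicate elimination."""
    out = []
    for t in r:
        row = {a: t[a] for a in attrs}
        if row not in out:
            out.append(row)
    return out

def theta_join(r, s, pred):
    """Theta join as a selection over the product; attribute names must be disjoint."""
    return [{**t, **u} for t in r for u in s if pred({**t, **u})]

def natural_join(r, s):
    """Natural join: equate the common attributes and keep each of them once."""
    out = []
    for t in r:
        for u in s:
            if all(t[a] == u[a] for a in t.keys() & u.keys()):
                out.append({**t, **u})
    return out

# The rewritten plan, evaluated bottom-up.
bud_drinkers = project(["drinker"], select(lambda t: t["beer"] == "Bud", likes))
ny_bars = project(["name"], select(lambda t: t["city"] == "N.Y.", bars))
ny_visits = theta_join(ny_bars, frequents, lambda t: t["name"] == t["bar"])
plan = {t["drinker"] for t in natural_join(bud_drinkers, ny_visits)}

# Direct evaluation of the original query: one selection over L x B x F.
naive = {l["drinker"] for l in likes for b in bars for f in frequents
         if l["drinker"] == f["drinker"] and b["name"] == f["bar"]
         and l["beer"] == "Bud" and b["city"] == "N.Y."}
print(plan == naive, plan)
```

The direct evaluation enumerates every combination of L, B and F tuples, whereas the rewritten plan joins only the already filtered inputs.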
[Figure: expression trees before and after substituting joins. After: Π_{L.drinker} above a natural join ⋈ whose left input is Π_{drinker}(σ_{beer=Bud}(L)) and whose right input is the join ⋈_{name=bar} of Π_{name}(σ_{city=N.Y.}(B)) and F.]
Commutativity and associativity
Commutativity: E_1 Θ E_2 ≡ E_2 Θ E_1, where Θ ∈ {×, ⋈, ⋈_C}.
Associativity: (E_1 Θ E_2) Θ E_3 ≡ E_1 Θ (E_2 Θ E_3), where Θ ∈ {×, ⋈}.
Note: in general (E_1 ⋈_{C_1} E_2) ⋈_{C_2} E_3 is not equivalent to E_1 ⋈_{C_1} (E_2 ⋈_{C_2} E_3). Why?
Disjunctions in the conditions
Disjunctions in the conditions of selection operators may complicate the situation. As a first attempt one may use the equivalence rule σ_{C_1 ∨ C_2}(E) ≡ σ_{C_1}(E) ∪ σ_{C_2}(E) and then apply the previous algorithm to σ_{C_1}(E) and σ_{C_2}(E). However, in this case the relations appearing in E may be scanned twice, which is costly.
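The rule and its cost can be illustrated with a scan counter. This is only a sketch: the sample tuples, the two conditions and the scan helper are invented, and counting passes over a Python list merely stands in for counting reads of a stored relation.

```python
# Hypothetical Likes instance as (drinker, beer) pairs, invented for illustration.
likes = [("Ann", "Bud"), ("Bob", "Miller"), ("Cid", "Bud"), ("Dan", "Coors")]

scan_counts = {"one_pass": 0, "union": 0}

def scan(relation, label):
    """Return an iterator over relation and record one full scan under label."""
    scan_counts[label] += 1
    return iter(relation)

def c1(t):
    """Condition C1: beer = Bud."""
    return t[1] == "Bud"

def c2(t):
    """Condition C2: drinker = Bob."""
    return t[0] == "Bob"

# Selection with the disjunction C1 or C2, evaluated in a single pass.
one_pass = {t for t in scan(likes, "one_pass") if c1(t) or c2(t)}

# Union of the two selections: the relation is read once per disjunct.
union = ({t for t in scan(likes, "union") if c1(t)}
         | {t for t in scan(likes, "union") if c2(t)})

print(one_pass == union, scan_counts)
```

Both strategies return the same tuples, but the union form reads the relation once per disjunct.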