Download presentation
Presentation is loading. Please wait.
Published byAndrew Lawson Modified over 9 years ago
1
Join Synopses for Approximate Query Answering Swarup Acharya, Philip B. Gibbons, Viswanath Poosala, Sridhar Ramaswamy By Vladimir Gamaley
2
Abstract In large data environments its difficult to provide fast and reliable answers. In this paper we demonstrate the difficulties in the traditional approach and propose the new technics for evaluation and maintenance of Join Synopses.
3
Introduction Tradition query processing approach (exact answer though minimal time use) Not always an exact answer is needed Sometimes appropriate answer is enough
4
Introduction (continued) Schemes for providing approximate answers that rely on basic relations alone suffer from serious disadvantages. Use of precomputed small sets of distinguished joins.
5
Introduction (continued) Careful allocation of place Allocation heuristics Providing approximate bounds Join synopses maintenance Experimental study results
6
AQUA System The goal of AQUA system is to improve response times by avoiding accesses to the original data. Maintenance of small synopses of various samples and histograms.
7
AQUA System (Components) Statistics Collection Query Rewriting Maintenance
8
AQUA System (Architecture) Aqua Synopses Data Warehouse AQUA Answers Bounds Queries New Data
9
Problems with joins Uniform random samples provide: Non uniform result samples Small join results sizes
10
Problems with joins (example) R.X a a b b a b a1 a2 b1 S.X Base probability for tuple to be selected 1/r a1 and a2 - 1/r 3 a1 and b1 - 1/r 4 for k way foreign join - 1/r k
11
Join Synopses Foreign Key Join: A two way join r 1 r 2 is a foreign key join if the join attribute is a foreign key in r 1 (a key in r 2 ). For k > 2, a k-way foreign join if there is an ordering r 1,r 2..r k and for j = 1,2,.. K, s i - 1 r i is a 2-way foreign join where s i-1 is a relation obtained joining r 1, r 2, … r i-1
12
Join synopses TPC-D scheme L PS S N R C O P
13
Join synopses (continued) Lemma 1: The subgraph of G on the k nodes in any k - way foreign key join must be a connected graph with a single root node Lemma 2: There is a 1-1 correspondence between tuples in a relation r1 and tuple in any k-way foreign key join.
14
Join Synopses (continued) Join Synopses: For each node u in G, corresponding to a relation r 1, define J(u) to be the output of the maximum foreign key join r 1,r 2..r k with source k 1. Let S u be a uniform random sample of r 1. Define a join synopses J(S u ) to be the output of join S u, r 2,..r k. The join synopses for scheme consists of join synopses for all u’s.
15
Join Synopses (continued) Theorem: Let r1,r2…rk, k>=3 be an arbitrary k-way foreign join, with source relation r1. Let u be the node in G corresponding to r1 and let Su be a uniform random sample of r1. Let A be the set of attributes in r1,r1…rk Then: 1. J(Su) is a uniform random sample of J(u) with |Su| tuples 2. Join r1, r2…rk is a projection of J(u) on the attributes in r1, r2…rk
16
Join Synopses (continued) Lemma: From a single join synopses for a node whose maximum foreign key has k relations we can extract uniform random sample of between k-1 to 2 k-1 -1 distinct foreign key joins Lemma: For any node u whose maximum foreign key join is a k-way join, number of tuples in its renormalized join synopsis J(Su) is at most k|Su|
17
Space allocation strategies ni - numbr of tuples allocated to the join fi- fraction of queries for which the join is a relation or the source of foreign key join Theorem: ni = N’N’ = N/ si - size of join tuple
18
Space allocation strategies Heuristics EqJoin: Equally between relations CubeJoin: In proportion to the cube root of their join synopses tuple size PropJoin: In proportion to their join synopses size.
19
Maintenance of Join Synopses Adding a tuple Deleting a tuple
20
Experiments Results TestBed: TPC-D decision benchmark. Scale factor 0.3 (database of about 300 megabytes). 296MHz UltraSparc-II. I/O - 5 MB/sec
21
Experiment 1 accuracy - summary size EquiBase, PropBase - produce answers only when the summary size exceeds 1.5% of the database. EquiJoin, PropJoin - good results even for 0.1% of the database.
22
Experiment 2 execution timing Actual execution time - 122 seconds. The response time increases with the summary size. Query using Join Synopses needs in two orders less time!
23
Related Work Approximate query answering Statistical techniques
24
Conclusions Schemes based on join synopses provide better answer than those, based only on the basic relations samples. Approximate answering is becoming extremely important in new application of data warehouses. However, there are still more problems: group-bys, ranks etc...
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.