1
On Random Sampling over Joins
Surajit Chaudhuri (Microsoft Research), Rajeev Motwani (Stanford University), Vivek Narasayya (Microsoft Research)
2
Outline:
The difficulty of join sampling (example)
Semantics and algorithms of sampling
Two previous sampling strategies
New strategies for join sampling
Experimental results
3
The Difficulty of Join Sampling - Example: Suppose that we have the relations
4
Black-Box U2: Given relation R with n tuples, generate an unweighted WR (with-replacement) sample of size r.
1. N ← 0.
2. Initialize reservoir array A[1..r] with r dummy values.
3. While tuples are streaming by do
begin
(a) get next tuple t;
(b) N ← N + 1;
(c) for j = 1 to r, set A[j] to t with probability 1/N
end
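A minimal Python sketch of the reservoir loop above. It assumes that N is a running count of the tuples seen so far (called n_seen here), which is what the replacement probability 1/N implies. The function name, the parameter names, and the optional rng argument are illustrative and not taken from the slides.

```python
import random

def black_box_u2(stream, r, rng=None):
    """One-pass unweighted with-replacement (WR) sample of size r."""
    if rng is None:
        rng = random.Random()
    n_seen = 0                   # assumed meaning of N: tuples seen so far
    reservoir = [None] * r       # r dummy values
    for t in stream:
        n_seen += 1
        # Each slot is overwritten independently with probability 1/N,
        # so every slot ends up holding a uniform draw from the stream.
        for j in range(r):
            if rng.random() < 1.0 / n_seen:
                reservoir[j] = t
    return reservoir

sample = black_box_u2(range(100), 5, random.Random(0))
```

Because the slots are independent, the same tuple may appear in several slots, which is the with-replacement semantics. The first tuple always fills every slot, since 1/N = 1 when N = 1.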
5
Black-Box WR2: Given relation R with n tuples, generate a weighted WR (with-replacement) sample of size r.
1. W ← 0.
2. Initialize reservoir array A[1..r] with r dummy values.
3. While tuples are streaming by do
begin
(a) get next tuple t with weight w(t);
(b) W ← W + w(t);
(c) for j = 1 to r, set A[j] to t with probability w(t)/W
end
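The weighted variant can be sketched the same way in Python. The sketch assumes that W is the running sum of the weights seen so far (total_w here), as the probability w(t)/W suggests. Skipping zero-weight tuples is an addition of this sketch to avoid dividing by zero; all names are illustrative.

```python
import random

def black_box_wr2(stream, r, weight, rng=None):
    """One-pass weighted with-replacement (WR) sample of size r."""
    if rng is None:
        rng = random.Random()
    total_w = 0.0                # assumed meaning of W: weight seen so far
    reservoir = [None] * r       # r dummy values
    for t in stream:
        w = weight(t)
        if w <= 0:
            continue             # sketch-only guard: zero weight, never sampled
        total_w += w
        # Each slot is overwritten independently with probability w(t)/W.
        for j in range(r):
            if rng.random() < w / total_w:
                reservoir[j] = t
    return reservoir

rows = [("a", 0), ("b", 3), ("c", 1)]
sample = black_box_wr2(rows, 4, lambda t: t[1], random.Random(0))
```

With all weights equal, this reduces to the unweighted reservoir loop.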
6
The Classification of the Problem (for the two operand relations R1 and R2):
Case A: No information is available for either R1 or R2.
Case B: No information is available for R1, but indexes and/or statistics are available for R2.
Case C: Indexes/statistics are available for both R1 and R2.
7
Previous Sampling Strategies
Strategy Naive-Sample:
1. Compute the join J = R1 ⋈ R2.
2. As the tuples of J stream by, use Black-Box U1 or U2 to produce the sample.
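A rough Python sketch of Naive-Sample, under assumptions that are not in the slides: relations are lists of tuples, the join is an equi-join on the first field of each tuple, and the join is computed with a simple in-memory hash join. The reservoir loop is an unweighted with-replacement sampler in the style of Black-Box U2; all names are illustrative.

```python
import random
from collections import defaultdict

def join_stream(r1, r2):
    """Yield the joined pairs (t1, t2) one at a time (hash join on field 0)."""
    index = defaultdict(list)
    for t2 in r2:
        index[t2[0]].append(t2)
    for t1 in r1:
        for t2 in index.get(t1[0], ()):
            yield (t1, t2)

def naive_sample(r1, r2, r, rng=None):
    """Compute the full join, then sample r joined tuples from its stream."""
    if rng is None:
        rng = random.Random()
    n_seen = 0
    reservoir = [None] * r
    for joined in join_stream(r1, r2):
        n_seen += 1
        for j in range(r):
            if rng.random() < 1.0 / n_seen:
                reservoir[j] = joined
    return reservoir

r1 = [(1, "b0"), (2, "b1"), (2, "b2")]
r2 = [(2, "c0"), (1, "c1"), (1, "c2"), (3, "c3")]
sample = naive_sample(r1, r2, 4, random.Random(0))
```

The sketch makes the cost visible: every tuple of the join is produced, even when r is far smaller than the join.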
8
Previous Sampling Strategies
Strategy Olken-Sample:
1. Let M be an upper bound on m2(v), the number of tuples of R2 with join-attribute value v, for all values v.
2. repeat
(a) Sample a tuple t1 from R1 uniformly at random.
(b) Sample a random tuple t2 from among all tuples t in R2 that have t.A = t1.A.
(c) Output t1 ⋈ t2 with probability m2(t2.A)/M, and with the remaining probability reject the sample.
until r tuples have been produced.
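The accept/reject loop can be sketched in Python as follows. The sketch assumes an equi-join on the first field of each tuple, uses an in-memory dictionary as a stand-in for the index on the second relation, and takes M to be the largest number of matches any join value has there. The empty-join check is an addition of this sketch so that the loop terminates; all names are illustrative.

```python
import random
from collections import defaultdict

def olken_sample(r1, r2, r, rng=None):
    """Rejection-based with-replacement sample of r joined tuples."""
    if rng is None:
        rng = random.Random()
    index = defaultdict(list)            # stand-in for an index on r2
    for t2 in r2:
        index[t2[0]].append(t2)
    if not any(t1[0] in index for t1 in r1):
        raise ValueError("empty join: nothing to sample")
    M = max(len(group) for group in index.values())   # upper bound on matches
    out = []
    while len(out) < r:
        t1 = rng.choice(r1)              # uniform tuple from r1
        matches = index.get(t1[0], [])
        if not matches:
            continue                     # no join partner: always rejected
        t2 = rng.choice(matches)         # uniform among matching r2 tuples
        if rng.random() < len(matches) / M:
            out.append((t1, t2))         # accept with probability m2/M
    return out

r1 = [(1, "b0"), (2, "b1"), (2, "b2")]
r2 = [(2, "c0"), (1, "c1"), (1, "c2"), (3, "c3")]
sample = olken_sample(r1, r2, 4, random.Random(0))
```

The acceptance step corrects for the fact that a uniformly chosen tuple from the first relation is otherwise over-represented when it has few join partners. The price is that an iteration may produce no output.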
9
New Strategies for Join Sampling
Strategy Stream-Sample is more efficient than Olken-Sample:
1. No information is required for R1 (Case B).
2. No tuple is rejected after computing the join.
3. Only one iteration is needed for each output tuple.
10
New Strategies for Join Sampling
Strategy Stream-Sample:
1. Use Black-Box WR1 or WR2 to produce a WR sample S1 of R1 of size r, where the weight w(t) for a tuple t in R1 is set to m2(t.A), the number of tuples in R2 that join with t.
2. While tuples of S1 are streaming by do
begin
(a) get next tuple t1 and let v = t1.A;
(b) sample a random tuple t2 from among all tuples t in R2 that have t.A = v;
(c) output t1 ⋈ t2.
end
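A compact Python sketch of the two steps, under stated assumptions: an equi-join on the first field of each tuple, an in-memory dictionary standing in for the index on the second relation, and a weight for each first-relation tuple equal to its number of join partners. For brevity the weighted with-replacement step uses the standard library's random.choices instead of a streaming black box such as WR2; all names are illustrative.

```python
import random
from collections import defaultdict

def stream_sample(r1, r2, r, rng=None):
    """Weighted WR sample of r1, then one random join partner per tuple."""
    if rng is None:
        rng = random.Random()
    index = defaultdict(list)            # stand-in for an index on r2
    for t2 in r2:
        index[t2[0]].append(t2)
    # Step 1: weight of t1 = number of r2 tuples it joins with.
    weights = [len(index.get(t1[0], ())) for t1 in r1]
    if sum(weights) == 0:
        raise ValueError("empty join: nothing to sample")
    s1 = rng.choices(r1, weights=weights, k=r)   # with replacement
    # Step 2: each sampled t1 has at least one partner, so nothing is rejected.
    return [(t1, rng.choice(index[t1[0]])) for t1 in s1]

r1 = [(1, "b0"), (2, "b1"), (2, "b2"), (9, "b9")]
r2 = [(2, "c0"), (1, "c1"), (1, "c2"), (3, "c3")]
sample = stream_sample(r1, r2, 4, random.Random(0))
```

Weighting the first step by the number of join partners is what removes the reject step: a tuple with no partner has weight zero and is never drawn, and each draw yields exactly one output tuple.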
11
New Strategies for Join Sampling
Strategy Group-Sample:
1. Use Black-Box WR1 or WR2 to produce a WR sample S1 of R1 of size r, where the weight w(t) for a tuple t in R1 is set to m2(t.A), the number of tuples in R2 that join with t.
2. Let S1 consist of the tuples s1, ..., sr. Produce S2 = S1 ⋈ R2, whose tuples are grouped by the tuples of S1 that generated them.
3. Use r invocations of Black-Box U1 or U2 to sample r tuples, one from each group.
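The three steps might be sketched in Python as below. The assumptions are not from the slides: an equi-join on the first field of each tuple, frequency statistics for the second relation modeled by a Counter, and the join with the sampled tuples done by a single scan of the second relation rather than by index lookups. The weighted step uses random.choices as a stand-in for WR1/WR2, and each group is sampled with a size-1 reservoir; all names are illustrative.

```python
import random
from collections import Counter, defaultdict

def group_sample(r1, r2, r, rng=None):
    """Weighted WR sample of r1, join it with r2, pick one tuple per group."""
    if rng is None:
        rng = random.Random()
    freq = Counter(t2[0] for t2 in r2)           # stand-in for statistics on r2
    weights = [freq.get(t1[0], 0) for t1 in r1]
    if sum(weights) == 0:
        raise ValueError("empty join: nothing to sample")
    s1 = rng.choices(r1, weights=weights, k=r)   # step 1, with replacement
    # Step 2: group by the position i of the generating tuple in s1.
    positions = defaultdict(list)
    for i, t1 in enumerate(s1):
        positions[t1[0]].append(i)
    chosen = [None] * r
    seen = [0] * r
    for t2 in r2:                                # one scan of r2
        for i in positions.get(t2[0], ()):
            seen[i] += 1
            # Step 3: size-1 reservoir keeps a uniform member of group i.
            if rng.random() < 1.0 / seen[i]:
                chosen[i] = t2
    return list(zip(s1, chosen))

r1 = [(1, "b0"), (2, "b1"), (2, "b2"), (9, "b9")]
r2 = [(2, "c0"), (1, "c1"), (1, "c2"), (3, "c3")]
sample = group_sample(r1, r2, 4, random.Random(0))
```

In this sketch the groups are never materialized: the per-group reservoir picks its tuple while the second relation streams by.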
12
New Strategy for Join Sampling Strategy Frequency-Partition-Sample
13
Experimental Results:
16
Summary
The difficulty of join sampling (example).
The classification of the problem into 3 cases.
Previous strategies: Naive-Sample, Olken-Sample.
New strategies: Stream-Sample, Group-Sample, Frequency-Partition-Sample.
Conclusion: The new strategies are better than the earlier techniques.