Published by Julia Wouters. Modified over 6 years ago.
1
SPACE EFFICIENCY OF SYNOPSIS CONSTRUCTION ALGORITHMS
Sudipto Guha UPENN
2
Space efficiency in synopsis construction algorithms
Synopses
Given n input numbers, summarize the input using B numbers, minimizing some error.
Examples
  Histograms: piecewise constant representation.
  Wavelets: use the wavelet basis.
  Fourier, Bessel, SVD, what have you…
VLDB 2005 Space efficiency in synopsis construction algorithms
3
Why space efficiency
“Interestingly, according to modern astronomers, space is finite. This is a very comforting thought – particularly for people who can never remember where they left things.” Woody Allen.
From a computational viewpoint however…
4
Space is the cruelest resource
Resources
  Time: twiddle thumbs.
  Access (stream): make more passes.
  Space: the program simply will not run, or, if data is shifted to disk, will run quite a bit slower.
Further, if we had more space, maybe we could compute a better (more accurate) synopsis.
5
Examples - I
Histograms
  Many error measures.
  V-OPT, Jagadish et al., 1998: $O(n^2 B)$ time, $O(nB)$ space.
  Only $O(n)$ space at a time (working space).
  $O(n^2 B^2)$ time and $O(n)$ space.
  Is that the best?
  Here: $O(n^2 B)$ time, $O(n)$ space.
6
Example - II
(Haar) Wavelets
  Orthonormal systems.
  For $\ell_2$ error, store the largest B coefficients of the input.
  Does not work for non-$\ell_2$ errors.
  Find the best B coefficients to retain (note: restricted).
  Garofalakis & Kumar, 04: $O(n^2 B \log B)$ time, $O(n^2 B)$ space, but $O(nB)$ needed at a time (for $\ell_1$).
  Here: $O(n)$ space and $O(n^2)$ time.
7
Example - III
Extended Wavelets
  Multiple measures.
  Optimization is similar to Knapsack with choices.
  Previous best: Deligiannakis and Roussopoulos, 04, $O(Mn(B + \log n))$ time and $O(MnB)$ space, but needing $O(nM + MB)$ at a time.
  Guha, Kim, Shim, 04, reduced space to $O(BM + \min\{nM, B^2\})$.
  Here: $O(BM)$ space.
8
What we will not talk about
Approximation algorithms for histograms
Range Query Histograms
Basically an improvement of a factor of B in space across the board.
B is not always small, especially when n is large.
9
The main idea
Can we solve using a non-DP paradigm?
Well, divide & conquer …
Small details: how do we divide?
  Interaction.
  Does a small-interaction partitioning exist?
  How (much size) to represent it?
  Ease of finding it (in the given representation)?
10
A case study - Histograms
Formally, given a signal X, find a piecewise constant representation H with at most B pieces minimizing $\|X - H\|_2$.
Consider one bucket. The mean is the best value.
A natural DP …
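The one-bucket observation can be sketched in code. Minimizing $\|X - H\|_2$ is the same as minimizing the sum of squared errors, and for a single bucket that sum is smallest at the mean. The helper below is a minimal sketch, not taken from the talk: the name `make_bucket_error` and the prefix-sum trick are assumptions used to evaluate any bucket's error in constant time.

```python
from itertools import accumulate

def make_bucket_error(x):
    """Return error(i, j): squared error of covering x[i..j] (0-based, inclusive) by its mean."""
    # prefix[k] = x[0] + ... + x[k-1]; prefix_sq is the same for squares.
    prefix = [0.0] + list(accumulate(x))
    prefix_sq = [0.0] + list(accumulate(v * v for v in x))

    def error(i, j):
        count = j - i + 1
        total = prefix[j + 1] - prefix[i]
        total_sq = prefix_sq[j + 1] - prefix_sq[i]
        # sum (x_k - mean)^2 = sum x_k^2 - (sum x_k)^2 / count
        # max(...) guards against tiny negative values from rounding.
        return max(0.0, total_sq - total * total / count)

    return error

error = make_bucket_error([1.0, 3.0, 5.0, 5.0])
print(error(0, 1))  # bucket {1, 3}: mean 2, squared error 2.0
```

After the O(n) preprocessing, each call to `error` costs O(1), which is what makes the inner loop of a histogram DP cheap.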
11
The DP for histograms
Err[i,b] = error of approximating x_1,…,x_i using b buckets
For i = 1 to n do
  For b = 2 to B do
    For j = 1 to i-1 do
      Err[i,b] = min(Err[i,b], Err[j,b-1] + error(j+1,i))
[Figure: the DP table, with axes B and n]
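The recurrence above can be written out as a short runnable sketch. This is an illustration under stated assumptions, not the code from the talk: the function name `vopt_error` is hypothetical, indices are 0-based with half-open buckets `x[j:i]`, and the full table is kept, so it uses $O(nB)$ space and $O(n^2 B)$ time.

```python
from itertools import accumulate

def vopt_error(x, B):
    """Minimum total squared error of a histogram of x with at most B buckets."""
    n = len(x)
    pre = [0.0] + list(accumulate(x))
    pre_sq = [0.0] + list(accumulate(v * v for v in x))

    def sse(j, i):
        # Squared error of one bucket covering x[j:i] (i > j), using its mean.
        s = pre[i] - pre[j]
        return max(0.0, (pre_sq[i] - pre_sq[j]) - s * s / (i - j))

    INF = float("inf")
    # err[i][b]: best error for the prefix x[:i] using exactly b buckets.
    err = [[INF] * (B + 1) for _ in range(n + 1)]
    err[0][0] = 0.0
    for i in range(1, n + 1):
        for b in range(1, min(i, B) + 1):
            for j in range(b - 1, i):          # last bucket is x[j:i]
                cand = err[j][b - 1] + sse(j, i)
                if cand < err[i][b]:
                    err[i][b] = cand
    # "At most B" buckets: take the best over all feasible bucket counts.
    return min(err[n][1:B + 1]) if n > 0 else 0.0

print(vopt_error([1, 1, 5, 5, 9], 3))  # buckets {1,1}, {5,5}, {9}: 0.0
```

The table `err` is the $O(nB)$ object that the rest of the talk aims to avoid storing.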
12
What if
We could figure out what the story was at the midpoint!
Two questions
  So what?
  How? (Use a DP.)
13
Wait a minute …
We just replaced a DP by another and claimed something …!!!
Exactly. The second DP needs only $O(n)$ space.
Since the conquer steps reuse/share the same space, the total space is $O(n)$ too.
The idea is to use divide and conquer, and to use a (small) DP to find the divide step.
Is it really that simple?
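One possible reading of this idea, as a hedged sketch rather than the talk's actual code: a rolling two-row DP finds the bucket that contains the midpoint of the current interval, together with the number of buckets to its left, and the two sides are then solved recursively. All names here (`histogram_linear_space`, `crossing_bucket`) are hypothetical, and buckets are half-open index ranges `[s, e)`.

```python
from itertools import accumulate

def histogram_linear_space(x, B):
    """Return (total squared error, buckets) using O(n) working space."""
    n = len(x)
    pre = [0.0] + list(accumulate(x))
    pre_sq = [0.0] + list(accumulate(v * v for v in x))

    def sse(j, i):
        # Squared error of one bucket covering x[j:i], using its mean.
        s = pre[i] - pre[j]
        return max(0.0, (pre_sq[i] - pre_sq[j]) - s * s / (i - j))

    def crossing_bucket(lo, hi, k):
        # Small DP keeping only two rows. For an optimal k-bucket histogram
        # of x[lo:hi], return (s, e, left): the bucket [s, e) containing the
        # midpoint and the number of buckets used on x[lo:s].
        mid = (lo + hi) // 2
        INF = float("inf")
        m = hi - lo
        prev = [INF] * (m + 1)
        prev[0] = 0.0
        prev_cross = [None] * (m + 1)
        for b in range(1, k + 1):
            cur = [INF] * (m + 1)
            cur_cross = [None] * (m + 1)
            for i in range(lo + b, hi + 1):
                for j in range(lo + b - 1, i):      # last bucket is x[j:i]
                    cand = prev[j - lo] + sse(j, i)
                    if cand < cur[i - lo]:
                        cur[i - lo] = cand
                        if i > mid:
                            # Either the last bucket crosses mid, or we
                            # inherit the crossing bucket of the prefix.
                            cur_cross[i - lo] = (
                                (j, i, b - 1) if j <= mid else prev_cross[j - lo]
                            )
            prev, prev_cross = cur, cur_cross
        return prev_cross[m]

    buckets = []

    def solve(lo, hi, k):
        if k == 0 or lo >= hi:
            return
        if k == 1:
            buckets.append((lo, hi))
            return
        s, e, left = crossing_bucket(lo, hi, k)   # DP rows are freed on return
        buckets.append((s, e))
        solve(lo, s, left)                        # at most half the interval
        solve(e, hi, k - 1 - left)                # at most half the interval

    solve(0, n, min(B, n))
    buckets.sort()
    return sum(sse(s, e) for s, e in buckets), buckets

print(histogram_linear_space([1, 1, 5, 5, 9], 2)[1])  # [(0, 2), (2, 5)]
```

Each recursive call works on at most half of its parent's interval, and the DP rows are discarded before recursing, so the arrays alive at any time total $O(n)$ in this sketch.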
14
The code
15
The end of working space
If you can partition a problem using the working space, you can recompute the solution of the parts at a little extra cost.
Working space = total space.
16
How much is little?
17
Wavelets
A set of vectors
{1,-1,0,0,0,0…}, {0,0,1,-1,0,0,…}, {0,0,0,0,1,-1,0,0}, {0,0,0,0,0,0,1,-1}
{1,1,-1,-1,0,0,0,0}, {0,0,0,0,1,1,-1,-1}
{1,1,1,1,-1,-1,-1,-1}, {1,1,1,1,1,1,1,1}
A natural multi-resolution
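The vectors listed on this slide can be generated for any power-of-two length. The sketch below builds the unnormalized vectors and checks that they are pairwise orthogonal; the function name `haar_basis` and the coarse-to-fine ordering are assumptions made for illustration.

```python
def haar_basis(n):
    """Unnormalized Haar vectors for n a power of two, ordered coarse to fine."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    basis = [[1] * n]                 # the constant vector
    width = n
    while width >= 2:
        half = width // 2
        for start in range(0, n, width):
            v = [0] * n
            v[start:start + half] = [1] * half            # +1 on the left half
            v[start + half:start + width] = [-1] * half   # -1 on the right half
            basis.append(v)
        width = half
    return basis

basis = haar_basis(8)
dots = [
    sum(a * b for a, b in zip(u, v))
    for k, u in enumerate(basis)
    for v in basis[k + 1:]
]
print(len(basis), all(d == 0 for d in dots))  # 8 True
```

Dividing each vector by the square root of the size of its support would give an orthonormal system, which is the form used when coefficients are compared by magnitude.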
18
Wavelet Synopsis Construction
Formally, given a signal X and the Haar basis $\{\psi_i\}$, find a representation $F = \sum_i z_i \psi_i$ with at most B non-zero $z_i$, minimizing some error which is a function of $X - F$.
Restriction: $z_i$ is either 0 or $\langle X, \psi_i \rangle$.
Debate: unrestricted or restricted. Omit.
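To make the restricted problem concrete, here is a brute-force sketch that tries every subset of at most B coefficients and scores it with a caller-supplied error function. It is exponential in n and is only meant to illustrate the definition on tiny inputs; it is not the algorithm from the talk, and the names `orthonormal_haar` and `best_restricted_synopsis` are hypothetical.

```python
from itertools import combinations
from math import sqrt

def orthonormal_haar(n):
    """Orthonormal Haar basis for n a power of two, ordered coarse to fine."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    basis = [[1.0 / sqrt(n)] * n]
    width = n
    while width >= 2:
        half, scale = width // 2, 1.0 / sqrt(width)
        for start in range(0, n, width):
            v = [0.0] * n
            v[start:start + half] = [scale] * half
            v[start + half:start + width] = [-scale] * half
            basis.append(v)
        width = half
    return basis

def best_restricted_synopsis(x, B, err):
    """Try every set of at most B coefficients; each kept z_i equals <x, psi_i>."""
    n = len(x)
    basis = orthonormal_haar(n)
    coeffs = [sum(a * b for a, b in zip(x, psi)) for psi in basis]
    best = None
    for size in range(B + 1):
        for keep in combinations(range(n), size):
            f = [sum(coeffs[i] * basis[i][t] for i in keep) for t in range(n)]
            e = err([a - b for a, b in zip(x, f)])
            if best is None or e < best[0]:
                best = (e, keep)
    return best

def l2(residual):
    return sqrt(sum(r * r for r in residual))

def linf(residual):
    return max(abs(r) for r in residual)

e, keep = best_restricted_synopsis([4.0, 0.0, 0.0, 0.0], 1, l2)
print(keep)  # (2,): the single largest coefficient, as expected for l2
```

Passing `linf` or an $\ell_1$ function instead of `l2` changes which subset wins, which is why the largest-coefficients rule cannot simply be reused for those errors.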
19
Wavelets
$\|X - F\|_1$
Long history
  Matias, Vitter, Wang ’98
  Garofalakis, Gibbons ’02
  Garofalakis, Kumar ’04
State of the art
  $O(n^2 B \log B)$ time
  $O(n^2 B)$ space
  $O(nB)$ working space
Here
  $O(n^2 \log B)$ time
  $O(n)$ space
SEE ALSO NEXT TALK …
20
What happens to wavelets [GK04] ?
21
Extensions
Approximation Algorithms
Range Query Histograms
Extended Wavelets
22
Histograms
Saves space across all algorithms, except algorithms which extend to general error measures over streams.
23
Range Query
Same story.
Open question: a faster algorithm obeying the synopsis size.
24
That’s all folks