Published by Logan Lang. Modified over 9 years ago.
1
The Impact of Duality on Data Synopsis Problems. Panagiotis Karras. KDD, San Jose, August 13th, 2007. Joint work with Dimitris Sacharidis and Nikos Mamoulis.
2
Introduction: Data synopsis problems require the optimization of error under a bound on space. Classical approaches treat them in a direct manner, producing complicated solutions and sometimes resorting to heuristics. The parameters involved, space and error, have a monotonic relationship. Hence an alternative approach is possible, based on the dual, error-bounded problems.
3
Outline: Histograms. Restricted Haar Wavelet Synopses. Unrestricted Haar and Haar+ Synopses. Experiments. Conclusions.
4
Histograms: Approximate a data set $[d_1, d_2, \ldots, d_n]$ with $B$ buckets, $s_i = [b_i, e_i, v_i]$, so that a maximum-error metric is minimized. Classical solutions: Jagadish et al. VLDB 1998; Guha et al. VLDB 2004; Guha VLDB 2005. Recent solutions: Buragohain et al. ICDE 2007; Guha and Shim TKDE 19(7) 2007. For weighted error: Linear for:
5
Histograms: Solve the error-bounded problem. Example with maximum absolute error bound ε = 2: the data 4 5 6 2 15 17 3 6 9 12 … are covered by the buckets [4] [16] [4.5] [… Generalized to any weighted maximum-error metric. Each value $d_i$ defines a tolerance interval. A bucket is closed when the running intersection of intervals becomes empty. Complexity:
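The greedy procedure on this slide can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes the unweighted maximum absolute error, so that the tolerance interval of a value d is taken to be [d - ε, d + ε], and it picks the midpoint of the final intersection as the bucket value. The function name `error_bounded_histogram` is hypothetical.

```python
def error_bounded_histogram(data, eps):
    """Greedily build buckets (start, end, value) with max absolute error <= eps."""
    if not data:
        return []
    buckets = []
    start = 0
    # Running intersection of tolerance intervals [d - eps, d + eps]
    # (assumed form for the unweighted maximum absolute error).
    lo, hi = data[0] - eps, data[0] + eps
    for i in range(1, len(data)):
        new_lo = max(lo, data[i] - eps)
        new_hi = min(hi, data[i] + eps)
        if new_lo > new_hi:
            # The intersection became empty: close the bucket ending at i - 1.
            buckets.append((start, i - 1, (lo + hi) / 2))
            start = i
            lo, hi = data[i] - eps, data[i] + eps
        else:
            lo, hi = new_lo, new_hi
    # Close the last open bucket.
    buckets.append((start, len(data) - 1, (lo + hi) / 2))
    return buckets


if __name__ == "__main__":
    data = [4, 5, 6, 2, 15, 17, 3, 6, 9, 12]
    for bucket in error_bounded_histogram(data, eps=2):
        print(bucket)
```

On the ten values visible on the slide with ε = 2, the first three bucket values are 4, 16 and 4.5, matching the slide. The fourth bucket covers only 9 and 12 here because the slide's data continue beyond what the transcript shows.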
6
Histograms: Apply to the space-bounded problem. Perform binary search in the domain of the error bound ε. Complexity: For error values requiring space, with actual error, run an optimality test: the error-bounded algorithm running under constraint instead of. If it requires space, then the optimal solution has been reached. Independent of the number of buckets B.
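One way to read this slide is sketched below in Python, under stated assumptions. The slide's exact bounds and constraints did not survive in the transcript, so the optimality test here is a guess at the idea: once a feasible histogram with actual error e is found, rerun the error-bounded step requiring error strictly below e, and if that needs more than B buckets, e is optimal. The names `greedy_buckets`, `actual_error` and `space_bounded_histogram` are hypothetical, and the error metric is assumed to be the unweighted maximum absolute error.

```python
def greedy_buckets(data, eps, strict=False):
    """Error-bounded step: fewest buckets with max abs error <= eps (< eps if strict)."""
    buckets = []
    start, lo, hi = 0, data[0], data[0]
    for i in range(1, len(data)):
        new_lo, new_hi = min(lo, data[i]), max(hi, data[i])
        half_range = (new_hi - new_lo) / 2  # best error one bucket can achieve
        fits = half_range < eps if strict else half_range <= eps
        if fits:
            lo, hi = new_lo, new_hi
        else:
            buckets.append((start, i - 1, (lo + hi) / 2))
            start, lo, hi = i, data[i], data[i]
    buckets.append((start, len(data) - 1, (lo + hi) / 2))
    return buckets


def actual_error(data, buckets):
    """Maximum absolute error actually incurred by a set of buckets."""
    return max(abs(data[i] - v) for s, e, v in buckets for i in range(s, e + 1))


def space_bounded_histogram(data, B, max_iter=100):
    """Binary search on the error bound; assumes non-empty data and B >= 1."""
    lo, hi = 0.0, (max(data) - min(data)) / 2
    best = greedy_buckets(data, hi)  # a single bucket always fits at this bound
    best_err = actual_error(data, best)
    for _ in range(max_iter):
        # Optimality test (assumed form): no solution with error < best_err fits in B.
        if best_err == 0 or len(greedy_buckets(data, best_err, strict=True)) > B:
            break
        hi = best_err
        mid = (lo + hi) / 2
        candidate = greedy_buckets(data, mid)
        if len(candidate) <= B:
            best, best_err = candidate, actual_error(data, candidate)
        else:
            lo = mid
    return best, best_err
```

For example, `space_bounded_histogram([4, 5, 6, 2, 15, 17, 3, 6, 9, 12], 4)` returns four buckets with error 2.0. Note that B only appears in the comparison against the bucket count, never in the work done by the error-bounded step itself.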
7
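Restricted Haar Wavelet Synopses: Select a subset of the Haar wavelet decomposition coefficients, so that a maximum-error metric is minimized. Classical solutions: Garofalakis and Kumar PODS 2004; Guha VLDB 2005. [Figure: Haar decomposition tree of the data 34, 16, 2, 20, 20, 0, 36, 16, with pairwise averages 25, 11, 10, 26, overall average 18, and detail coefficients 0, 7, -8, 9, -9, 10, 10]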
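The numbers in this slide's figure are consistent with a standard Haar decomposition in which each detail coefficient is half the difference between the left and right averages. The Python sketch below reproduces them and shows what a restricted synopsis means: keep a subset of the coefficients, treat the rest as zero, and measure the maximum absolute error. The function names are hypothetical, the input length is assumed to be a power of two, and no attempt is made to choose the best subset.

```python
def haar_decompose(data):
    """Return [overall average] followed by detail coefficients, coarsest level first."""
    current = list(data)
    coeffs = []
    while len(current) > 1:
        pairs = list(zip(current[0::2], current[1::2]))
        details = [(a - b) / 2 for a, b in pairs]  # half the left-right difference
        current = [(a + b) / 2 for a, b in pairs]  # pairwise averages
        coeffs = details + coeffs                  # coarser levels go in front
    return current + coeffs


def haar_reconstruct(coeffs):
    """Invert haar_decompose: expand each average a with detail d into a + d, a - d."""
    current = [coeffs[0]]
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(current)]
        current = [v for a, d in zip(current, details) for v in (a + d, a - d)]
        pos += len(details)
    return current


def restricted_synopsis_error(data, keep):
    """Max absolute error when only the coefficient positions in `keep` are retained."""
    coeffs = haar_decompose(data)
    synopsis = [c if i in keep else 0 for i, c in enumerate(coeffs)]
    approx = haar_reconstruct(synopsis)
    return max(abs(d - a) for d, a in zip(data, approx))


if __name__ == "__main__":
    data = [34, 16, 2, 20, 20, 0, 36, 16]
    print(haar_decompose(data))
    print(restricted_synopsis_error(data, keep={0, 2, 3}))
```

On the slide's data, `haar_decompose` gives 18, 0, 7, -8, 9, -9, 10, 10. Keeping only the overall average and the two second-level details (positions 0, 2, 3) leaves a maximum absolute error of 10.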
8
Restricted Haar Wavelet Synopses: Solve the error-bounded problem (Muthukrishnan FSTTCS 2005). Local search within each of subtrees in bottom Haar tree levels. Complexity: Apply to the space-bounded problem. Complexity: no significant advantage.
9
Unrestricted Haar and Haar+ Synopses: Assign arbitrary values to Haar/Haar+ coefficients, so that a maximum-error metric is minimized. Classical solutions: Guha and Harb KDD 2005, SODA 2006. [Figure: Haar+ tree over data values d_0 to d_3, with nodes labeled c_0 to c_9 and C_1 to C_3] Time: Space: Karras and Mamoulis ICDE 2007.
10
Unrestricted Haar and Haar+ Synopses: Solve the error-bounded problem. Complexity: Apply to the space-bounded problem. Complexity (unrestricted Haar; Haar+): time, space. Significant time and space advantage.
11
Experiments: Histograms, Time vs. n
12
Experiments: Histograms, Time vs. B
13
Experiments: Haar Wavelets, Time vs. n
14
Experiments: Haar Wavelets, Time vs. B
15
Experiments: Haar+, Time vs. n
16
Experiments: Haar+, Time vs. B
17
Conclusions: Offline space-bounded data synopsis problems are more easily solvable through their error-bounded counterparts. Complexities are lower and independent of synopsis space. Dual-problem-based algorithms are simpler, more scalable, more general, more elegant, and more memory-parsimonious than the direct ones. Future work: application to other data representation models and to multi-measure, multi-dimensional data.
18
Related Work
H. V. Jagadish, N. Koudas, S. Muthukrishnan, V. Poosala, K. C. Sevcik, and T. Suel. Optimal histograms with quality guarantees. VLDB 1998.
S. Guha, K. Shim, and J. Woo. REHIST: Relative error histogram construction algorithms. VLDB 2004.
M. Garofalakis and A. Kumar. Wavelet synopses for general error metrics. TODS, 30(4):888–928, 2005 (also PODS 2004).
S. Guha. Space efficiency in synopsis construction algorithms. VLDB 2005.
S. Guha and B. Harb. Wavelet synopses for data streams: minimizing non-Euclidean error. KDD 2005.
S. Guha and B. Harb. Approximation algorithms for wavelet transform coding of data streams. SODA 2006.
S. Muthukrishnan. Subquadratic algorithms for workload-aware Haar wavelet synopses. FSTTCS 2005.
P. Karras and N. Mamoulis. The Haar+ tree: a refined synopsis data structure. ICDE 2007.
19
Thank you! Questions? More discussion at Board 17 this evening
21
Compact Hierarchical Histograms: Assign arbitrary values to CHH coefficients, so that a maximum-error metric is minimized. Heuristic solutions: Reiss et al. VLDB 2006. [Figure: CHH tree over data values d_0 to d_3, with nodes labeled c_0 to c_6] Time: Space: "The benefit of making node B a bucket (occupied) node depends on whether node A is a bucket node – and also on whether node C is a bucket node." [Reiss et al. VLDB 2006]
22
Compact Hierarchical Histograms: Solve the error-bounded problem. Next-to-bottom level case. [Figure: two diagrams, one with nodes c_i, c_{2i}, c_{2i+1} and one with nodes c_i, c_{2i}]
23
Compact Hierarchical Histograms: Solve the error-bounded problem. General, recursive case. Complexity: time, space. Apply to the space-bounded problem. Complexity: polynomially tractable.