The Impact of Duality on Data Synopsis Problems. Panagiotis Karras. KDD, San Jose, August 13th, 2007. Work with Dimitris Sacharidis and Nikos Mamoulis.

Presentation transcript:

The Impact of Duality on Data Synopsis Problems. Panagiotis Karras. KDD, San Jose, August 13th, 2007. Work with Dimitris Sacharidis and Nikos Mamoulis.

Introduction Data synopsis problems require the optimization of error under a bound on space. Classical approaches treat them directly, producing complicated solutions and sometimes resorting to heuristics. The two parameters involved, space and error, have a monotonic relationship. Hence an alternative approach is possible, based on the dual, error-bounded problems.

Outline Histograms. Restricted Haar Wavelet Synopses. Unrestricted Haar and Haar+ Synopses. Experiments. Conclusions.

Histograms Approximate a data set $[d_1, d_2, \ldots, d_n]$ with B buckets, $s_i = [b_i, e_i, v_i]$, so that a maximum-error metric is minimized. Classical solutions: Jagadish et al. VLDB 1998; Guha et al. VLDB 2004; Guha VLDB 2005. Recent solutions: Buragohain et al. ICDE 2007; Guha and Shim TKDE 19(7) 2007. For weighted error: Linear for:

Histograms Solve the error-bounded problem. Maximum Absolute Error bound ε = … [ 4 ][ 16 ][ 4.5 ][… Generalized to any weighted maximum-error metric. Each value $d_i$ defines a tolerance interval. A bucket is closed when the running intersection of these intervals becomes empty. Complexity:
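The greedy pass described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the maximum-absolute-error case, where the tolerance interval of a value d is taken to be [d - eps, d + eps] (the slide's own formula did not survive in the transcript), and it picks the midpoint of the final intersection as the bucket value. The function name `error_bounded_histogram` is hypothetical.

```python
def error_bounded_histogram(data, eps):
    """Greedy one-pass bucketing under a maximum-absolute-error bound eps.

    Assumption: each value d tolerates any bucket value in [d - eps, d + eps].
    Returns a list of buckets (begin_index, end_index, value).
    """
    if not data:
        return []
    buckets = []
    start = 0
    # Running intersection of the tolerance intervals of the open bucket.
    lo, hi = data[0] - eps, data[0] + eps
    for i in range(1, len(data)):
        new_lo = max(lo, data[i] - eps)
        new_hi = min(hi, data[i] + eps)
        if new_lo > new_hi:
            # Intersection became empty: close the bucket before position i.
            buckets.append((start, i - 1, (lo + hi) / 2))
            start = i
            lo, hi = data[i] - eps, data[i] + eps
        else:
            lo, hi = new_lo, new_hi
    buckets.append((start, len(data) - 1, (lo + hi) / 2))
    return buckets


# Illustrative data (not taken from the slides), with eps = 1.
print(error_bounded_histogram([3, 5, 4, 17, 15, 4, 5], 1))
# -> [(0, 2, 4.0), (3, 4, 16.0), (5, 6, 4.5)]
```

Each value is visited once and only the current intersection is kept, so the sketch runs in a single pass with constant extra state per open bucket.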

Histograms Apply to the space-bounded problem. Perform binary search in the domain of the error bound ε. Complexity: For error values requiring space, with actual error, run an optimality test: Error-bounded algorithm running under constraint instead of If requires space, then optimal solution has been reached. Independent of the number of buckets B.
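A rough sketch of the binary search over the error bound is given below. It assumes the maximum-absolute-error case with tolerance intervals [d - eps, d + eps], and it stops on a numeric tolerance `tol` instead of reproducing the slide's optimality test, whose formulas are missing from the transcript. The names `count_buckets` and `space_bounded_error` are hypothetical.

```python
def count_buckets(data, eps):
    """Number of buckets the greedy error-bounded pass needs for bound eps."""
    count = 1
    lo, hi = data[0] - eps, data[0] + eps
    for d in data[1:]:
        new_lo, new_hi = max(lo, d - eps), min(hi, d + eps)
        if new_lo > new_hi:
            # Running intersection is empty: open a new bucket at d.
            count += 1
            lo, hi = d - eps, d + eps
        else:
            lo, hi = new_lo, new_hi
    return count


def space_bounded_error(data, B, tol=1e-6):
    """Smallest error bound (up to tol) achievable with at most B buckets.

    Relies on monotonicity: a larger eps never needs more buckets.
    Assumes non-empty data and B >= 1.
    """
    low, high = 0.0, (max(data) - min(data)) / 2  # high always fits in 1 bucket
    if count_buckets(data, low) <= B:
        return low
    while high - low > tol:
        mid = (low + high) / 2
        if count_buckets(data, mid) <= B:
            high = mid   # feasible: try a tighter bound
        else:
            low = mid    # infeasible: the bound must grow
    return high


# Illustrative data (not taken from the slides).
print(space_bounded_error([3, 5, 4, 17, 15, 4, 5], 3))
```

The number of iterations depends on the range of the data and on `tol`, not on B, which matches the slide's remark that the cost is independent of the number of buckets.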

Restricted Haar Wavelet Synopses Select a subset of the Haar wavelet decomposition coefficients, so that a maximum-error metric is minimized. Classical solutions: Garofalakis and Kumar PODS 2004; Guha VLDB
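To make the restricted setting concrete, the sketch below computes an unnormalized Haar decomposition (pairwise averages and half-differences), reconstructs the data from a retained subset of coefficients with the rest set to zero, and reports the maximum absolute error. It only evaluates a given subset; it is not the selection algorithm from the cited papers. The function names and the power-of-two length requirement are assumptions made for this example.

```python
def haar_decompose(data):
    """Unnormalized Haar transform: [overall average, details coarse to fine]."""
    n = len(data)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    averages = list(data)
    details = []
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        # Detail coefficient = half the difference (left minus right).
        details = [(a - b) / 2 for a, b in pairs] + details
        averages = [(a + b) / 2 for a, b in pairs]
    return averages + details


def haar_reconstruct(coeffs):
    """Invert haar_decompose: left child = avg + detail, right = avg - detail."""
    values = list(coeffs[:1])
    pos = 1
    while pos < len(coeffs):
        level = coeffs[pos:pos + len(values)]
        pos += len(level)
        values = [v for a, c in zip(values, level) for v in (a + c, a - c)]
    return values


def max_abs_error(data, coeffs, keep):
    """Maximum absolute error when only the coefficient indices in keep survive."""
    kept = [c if i in keep else 0 for i, c in enumerate(coeffs)]
    approx = haar_reconstruct(kept)
    return max(abs(d - a) for d, a in zip(data, approx))


data = [2, 2, 0, 2, 3, 5, 4, 4]          # illustrative data
coeffs = haar_decompose(data)
print(coeffs)
print(max_abs_error(data, coeffs, {0}), max_abs_error(data, coeffs, {0, 1}))
```

In the restricted problem the retained coefficients keep exactly the values computed by the decomposition; only the choice of which indices to keep is optimized.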

Restricted Haar Wavelet Synopses Solve the error-bounded problem. Muthukrishnan FSTTCS 2005. Local search within each of subtrees in bottom Haar tree levels. Complexity: Apply to the space-bounded problem. Complexity: no significant advantage.

Unrestricted Haar and Haar+ Synopses Assign arbitrary values to Haar/Haar+ coefficients, so that a maximum-error metric is minimized. Classical solutions: Guha and Harb KDD 2005, SODA 2006; Karras and Mamoulis ICDE 2007. [Figure: Haar+ tree diagram with coefficients c0 to c9, triads C1 to C3, and data values d0 to d3; time and space complexity labels]
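A small example of why arbitrary coefficient values can help: with a single retained root coefficient, the restricted synopsis must use the data average, while an unrestricted synopsis may use any constant. Under maximum absolute error the best constant is the midrange, (min + max) / 2. The data and the helper name `max_abs_error_constant` below are made up for illustration and do not come from the slides.

```python
def max_abs_error_constant(data, value):
    """Maximum absolute error when every data point is approximated by value."""
    return max(abs(d - value) for d in data)


data = [0, 0, 0, 4]                               # illustrative data

restricted_root = sum(data) / len(data)           # Haar root coefficient (average)
unrestricted_root = (min(data) + max(data)) / 2   # midrange, free choice of value

print(max_abs_error_constant(data, restricted_root))    # error with the average
print(max_abs_error_constant(data, unrestricted_root))  # error with the midrange
```

On this input the average gives a maximum error of 3.0 and the midrange gives 2.0, so the unrestricted one-coefficient synopsis is strictly better.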

Unrestricted Haar and Haar+ Synopses Solve the error-bounded problem. Complexity: Apply to the space-bounded problem. Complexity (time and space, for unrestricted Haar and for Haar+): significant time & space advantage.

Experiments: Histograms, Time vs. n

Experiments: Histograms, Time vs. B

Experiments: Haar Wavelets, Time vs. n

Experiments: Haar Wavelets, Time vs. B

Experiments: Haar+, Time vs. n

Experiments: Haar+, Time vs. B

Conclusions Offline space-bounded data synopsis problems are more easily solved through their error-bounded counterparts. The resulting complexities are lower and independent of synopsis space. Dual-problem-based algorithms are simpler, more scalable, more general, more elegant, and more memory-parsimonious than the direct ones. Future work: application to other data representation models and to multi-measure, multi-dimensional data.

Related Work
H. V. Jagadish, N. Koudas, S. Muthukrishnan, V. Poosala, K. C. Sevcik, and T. Suel. Optimal histograms with quality guarantees. VLDB 1998.
S. Guha, K. Shim, and J. Woo. REHIST: Relative error histogram construction algorithms. VLDB 2004.
M. Garofalakis and A. Kumar. Wavelet synopses for general error metrics. TODS, 30(4):888–928, 2005 (also PODS 2004).
S. Guha. Space efficiency in synopsis construction algorithms. VLDB 2005.
S. Guha and B. Harb. Wavelet synopses for data streams: minimizing non-Euclidean error. KDD 2005.
S. Guha and B. Harb. Approximation algorithms for wavelet transform coding of data streams. SODA 2006.
S. Muthukrishnan. Subquadratic algorithms for workload-aware Haar wavelet synopses. FSTTCS 2005.
P. Karras and N. Mamoulis. The Haar+ tree: a refined synopsis data structure. ICDE 2007.

Thank you! Questions? More discussion at Board 17 this evening

Compact Hierarchical Histograms Assign arbitrary values to CHH coefficients, so that a maximum-error metric is minimized. Heuristic solutions: Reiss et al. VLDB 2006. [Figure: CHH tree diagram with coefficients c0 to c6 and data values d0 to d3; time and space complexity labels] The benefit of making node B a bucket (occupied) node depends on whether node A is a bucket node – and also on whether node C is a bucket node. [Reiss et al. VLDB 2006]

Compact Hierarchical Histograms Solve the error-bounded problem. Next-to-bottom level case. [Figure: node $c_i$ with children $c_{2i}$ and $c_{2i+1}$]

Compact Hierarchical Histograms Solve the error-bounded problem. General, recursive case. Complexity: time space. Apply to the space-bounded problem. Complexity: polynomially tractable.