Published by Juniper Dawson. Modified over 6 years ago.
1
Fast Approximate Query Answering over Sensor Data with Deterministic Error Guarantees
Chunbin Lin
Joint work with Etienne Boursier, Jacque Brito, Yannis Katsis, Yannis Papakonstantinou
2
Challenge & Motivation
Sensor data is big: GB per minute.
The DELPHI project collects large amounts of environmental and biomedical (spatiotemporal) data.
Public health analysts need online analytical queries, e.g., the cross-correlation between heart rate and temperature (for various participant subsets).
Performance is too slow.
Approximate answers with bounded errors are acceptable.
Errors should be bounded by user-provided budgets, e.g., budget = 10%.
Computation of the approximate answer should be efficient.
3
PlatoDB Overview
PlatoDB provides efficient approximate query processing with deterministic error guarantees.
Offline:
Construct a segment tree data structure. Each node is a segment that is represented by a linear scalable function minimizing the Euclidean distance.
Store error statistics for each segment.
Online:
Use the error statistics to get a tight estimated error for each operator.
Calculate query errors by combining the estimated operator errors.
Navigate the tree structures and incrementally update the estimated error.
4
Data model
Time series data: a sequence of (timestamp, value) pairs.
Assume fixed intervals and omit the timestamps:
( :00:00,15.72),( :00:01,15.74),...,( :59:59,2.32) becomes (15.72, 15.74, ..., 2.32)
*Dina Q. Goldin, Paris C. Kanellakis: On Similarity Queries for Time-Series Data: Constraint Specification and Implementation. CP 1995:
5
Query
Basic time series operators:
Serialize: Serialize(1.2, 5) creates a time series with 5 points whose values are all 1.2.
Shift: Shift(T, 2) moves the i-th point to the (i+2)-th position.
Plus: Plus(T1, T2) creates a new time series where d_i = d_i^(1) + d_i^(2).
Minus: Minus(T1, T2) creates a new time series where d_i = d_i^(1) - d_i^(2).
Times: Times(T1, T2) creates a new time series where d_i = d_i^(1) * d_i^(2).
Aggregation operator:
Sum: Sum(T, a, b) returns the value of d_a + d_(a+1) + ... + d_b.
Arithmetic manipulation operators (e.g., +, -, x, /).
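The operators above can be sketched on plain Python lists. This is only an illustration, not PlatoDB's implementation: the function names are hypothetical, Sum is assumed to use 1-based inclusive positions, and Shift is assumed to fill the vacated leading positions with a default value, which the slide does not specify.

```python
def serialize(value, n):
    """Serialize(value, n): a time series of n points, all equal to value."""
    return [value] * n

def shift(ts, k, fill=0.0):
    """Shift(T, k): move the i-th point to position i + k.
    Assumption: the k vacated leading positions are filled with `fill`."""
    return [fill] * k + list(ts)

def _check_same_length(t1, t2):
    if len(t1) != len(t2):
        raise ValueError("time series must have the same length")

def plus(t1, t2):
    """Plus(T1, T2): pointwise sum."""
    _check_same_length(t1, t2)
    return [a + b for a, b in zip(t1, t2)]

def minus(t1, t2):
    """Minus(T1, T2): pointwise difference."""
    _check_same_length(t1, t2)
    return [a - b for a, b in zip(t1, t2)]

def times(t1, t2):
    """Times(T1, T2): pointwise product."""
    _check_same_length(t1, t2)
    return [a * b for a, b in zip(t1, t2)]

def ts_sum(ts, a, b):
    """Sum(T, a, b): sum of the points at positions a..b.
    Assumption: positions are 1-based and inclusive."""
    return sum(ts[a - 1:b])
```

A query such as Sum(Times(T1, T2)) over the whole series then reads `ts_sum(times(t1, t2), 1, len(t1))`.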
6
Example Queries
[Figure: example queries, with their time series operators, aggregation operators, and arithmetic manipulation labeled]
7
PlatoDB Architecture
[Figure: PlatoDB architecture. Offline pre-processing: sensor data passes through noise removal, and the segment tree generator turns the clean data into time series segment trees that hold compressed data and error statistics (e.g., the segment tree for T1 has root S1, children S1.1 and S1.2, and grandchildren S1.1.1 and S1.1.2). Online query processing: the query processor takes a query Q and an error budget and works on the segment trees for T1 and T2.]
8
Segment tree
One tree structure for each time series.
Each node represents one segment.
Lower-level nodes represent smaller segments.
The tree may not be balanced.
How to build such a tree? Existing time series segmentation algorithms: top-down, bottom-up, sliding-window.
How to represent each segment? Existing time series representation algorithms: Piecewise Aggregate Approximation (PAA), Adaptive Piecewise Constant Approximation (APCA), Piecewise Linear Representation (PLR), Discrete Wavelet Transform (DWT).
*Fu, Tak-chung. "A review on time series data mining." Engineering Applications of Artificial Intelligence 24.1 (2011):
9
Segment tree
One tree structure for each time series.
Each node represents one segment.
Lower-level nodes represent smaller segments.
The tree may not be balanced.
Linear scalable family: a family of functions is linear scalable if, for every segment S, every function f of the family on S, and every subsegment S1 of S, the restriction f|S1 is still a function in the family.
The polynomial function family is a linear scalable family.
PlatoDB chooses a linear scalable function f to represent a segment, where f minimizes the Euclidean distance to the segment's data points.
Contribution 1: PlatoDB provides an O(n) algorithm to build the tree structure.
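As a rough illustration of a tree whose nodes hold a least-squares function and its Euclidean error, the sketch below fits a straight line to each segment and splits every segment in half. It is an assumption-laden sketch: `Node`, `fit_line`, `build_tree`, and the halving rule are hypothetical, and this simple recursion costs O(n log n), so it is not the O(n) construction the slide refers to.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    start: int                      # first index of the segment (inclusive)
    end: int                        # last index of the segment (exclusive)
    slope: float                    # fitted line: slope * x + intercept,
    intercept: float                # with x counted from the segment start
    l2_error: float                 # Euclidean distance between data and line
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def fit_line(values: List[float]):
    """Least-squares line through (0, v0), (1, v1), ...; returns slope, intercept, L2 error."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    squared = sum((y - (slope * x + intercept)) ** 2 for x, y in enumerate(values))
    return slope, intercept, squared ** 0.5

def build_tree(series: List[float], start: int = 0, end: Optional[int] = None,
               min_len: int = 2) -> Node:
    """Top-down halving (an assumption); each node stores its own fit and error."""
    if end is None:
        end = len(series)
    slope, intercept, err = fit_line(series[start:end])
    node = Node(start, end, slope, intercept, err)
    if end - start >= 2 * min_len:
        mid = (start + end) // 2
        node.left = build_tree(series, start, mid, min_len)
        node.right = build_tree(series, mid, end, min_len)
    return node
```

Because the parent's line restricted to a child is itself a candidate line for that child, the children's squared errors never add up to more than the parent's squared error, which is why descending the tree can only tighten the approximation.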
10
Segment tree
One tree structure for each time series.
Each node represents one segment.
Lower-level nodes represent smaller segments.
The tree may not be balanced.
Each node stores two kinds of values:
Coefficients of the estimation function in the orthonormal basis
Error statistics
PlatoDB stores by setting p = q = 2
11
Query Processing
Contribution 2: PlatoDB provides a top-down query processing algorithm.
Update one node at a time and update the current error.
Terminate when the current error < error budget.
Consider query = (Sum(Times(T1, T2)), 10%).
[Figure: segment trees for time series T1 and time series T2]
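The top-down loop can be sketched for Sum(Times(T1, T2)) under several simplifying assumptions that are not from the slides: both series share one aligned binary tree, each segment is modeled by its mean (a constant least-squares fit), the error bound of a segment is the product of the two residual Euclidean norms (which follows from Cauchy-Schwarz for least-squares fits), and the budget is absolute rather than a percentage. The names `PairNode`, `build`, and `answer` are hypothetical.

```python
import heapq
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class PairNode:
    approx: float                   # n * mean1 * mean2 over the segment
    bound: float                    # ||residual1||_2 * ||residual2||_2
    left: Optional["PairNode"] = None
    right: Optional["PairNode"] = None

def build(t1: List[float], t2: List[float], lo: int = 0,
          hi: Optional[int] = None) -> PairNode:
    """Aligned binary tree over two equal-length series (halving is an assumption)."""
    if hi is None:
        hi = len(t1)
    n = hi - lo
    m1 = sum(t1[lo:hi]) / n
    m2 = sum(t2[lo:hi]) / n
    e1 = math.sqrt(sum((v - m1) ** 2 for v in t1[lo:hi]))
    e2 = math.sqrt(sum((v - m2) ** 2 for v in t2[lo:hi]))
    node = PairNode(n * m1 * m2, e1 * e2)
    if n > 1:
        mid = (lo + hi) // 2
        node.left = build(t1, t2, lo, mid)
        node.right = build(t1, t2, mid, hi)
    return node

def answer(root: PairNode, budget: float) -> Tuple[float, float, int]:
    """Refine the node with the largest bound until the total bound fits the budget."""
    approx, bound = root.approx, root.bound
    heap = []                       # max-heap on bound, holding expandable nodes only
    tie = 0                         # tie-breaker so nodes are never compared directly

    def push(node: PairNode) -> None:
        nonlocal tie
        if node.left is not None:
            heapq.heappush(heap, (-node.bound, tie, node))
            tie += 1

    push(root)
    expansions = 0
    while bound > budget and heap:
        _, _, node = heapq.heappop(heap)
        # Replace the node by its two children and update the running totals.
        approx += node.left.approx + node.right.approx - node.approx
        bound += node.left.bound + node.right.bound - node.bound
        push(node.left)
        push(node.right)
        expansions += 1
    return approx, bound, expansions
```

Expanding the node with the largest bound first is one plausible greedy choice; the slide only states that one node is updated at a time. Whatever the order, the returned bound covers the distance between the approximate and the exact answer, up to floating-point rounding.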
13
Error Estimation
Contribution 3: PlatoDB provides minimal error estimation for each time series operator by using error statistics.
Store for each segment
Aligned segments (same length and same start point)
Based on the orthogonal projection property
In PlatoDB
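One way to read the aligned case for Sum(Times(T1, T2)), offered as a sketch rather than the slide's own formulas: assume f^(1) and f^(2) are least-squares fits from the same function family on the same segment, and that each segment stores the Euclidean norm of its residual. The residual symbols e^(1) and e^(2) are notation introduced here.

```latex
\begin{align*}
e^{(k)}_i &= d^{(k)}_i - f^{(k)}_i, \qquad k \in \{1, 2\} \\
\sum_i d^{(1)}_i d^{(2)}_i - \sum_i f^{(1)}_i f^{(2)}_i
  &= \sum_i f^{(1)}_i e^{(2)}_i + \sum_i f^{(2)}_i e^{(1)}_i + \sum_i e^{(1)}_i e^{(2)}_i \\
  &= \sum_i e^{(1)}_i e^{(2)}_i
  && \text{(each residual is orthogonal to the family)} \\
\Bigl| \sum_i e^{(1)}_i e^{(2)}_i \Bigr|
  &\le \lVert e^{(1)} \rVert_2 \, \lVert e^{(2)} \rVert_2
  && \text{(Cauchy--Schwarz)}
\end{align*}
```

The two cross terms vanish because a least-squares residual is orthogonal to every function in the fitted family, and both fits belong to that family when the segments are aligned. For unaligned segments this cancellation is not available in the same form.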
14
Error Estimation
Contribution 3: PlatoDB provides minimal error estimation for each time series operator by using error statistics.
Store for each segment
Unaligned segments (different lengths or different start points)
Symbolic computation
Contribution 4: PlatoDB provides an optimal algorithm in O(K1+K2) to find the segmentation with minimal error.
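The sketch below does not reproduce the optimal algorithm of Contribution 4. It only illustrates why a linear pass over both segmentations is enough to line them up: merging two sorted boundary lists into their common refinement takes O(K1+K2) steps. Treating K1 and K2 as the segment counts of the two series is an assumption, and `merge_boundaries` is a hypothetical name.

```python
from typing import List

def merge_boundaries(b1: List[int], b2: List[int]) -> List[int]:
    """Merge two sorted lists of segment end positions into their common refinement.

    Every segment of the result lies inside exactly one segment of each input,
    so per-segment statistics from both series can be combined piece by piece.
    """
    i = j = 0
    merged: List[int] = []
    while i < len(b1) and j < len(b2):
        if b1[i] < b2[j]:
            merged.append(b1[i])
            i += 1
        elif b1[i] > b2[j]:
            merged.append(b2[j])
            j += 1
        else:                       # shared boundary: emit it once
            merged.append(b1[i])
            i += 1
            j += 1
    merged.extend(b1[i:])
    merged.extend(b2[j:])
    return merged
```

For example, segment ends `[4, 8]` and `[3, 6, 8]` refine to `[3, 4, 6, 8]`.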
15
Further Optimizations
Performance-wise optimization
Contribution 5: PlatoDB provides an incremental update segmentation algorithm that gives ratio 1+ 2 compared with the optimal one.
Space-wise optimization
Contribution 6: PlatoDB can avoid storing the coefficients for the right nodes.
Only red nodes store coefficients.
Coefficients can be deduced from the parent node and the left sibling node via an inverse basis matrix.
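The space-wise idea is easiest to see in the simplest special case, a constant (mean) model, where no basis matrix is needed. This case is an assumption chosen for illustration; the slide's general case with higher-degree functions is not shown here, and `right_mean` is a hypothetical helper.

```python
def right_mean(parent_mean: float, parent_len: int,
               left_mean: float, left_len: int) -> float:
    """Recover the right child's mean from the parent and the left sibling.

    The parent's total equals the left total plus the right total, so the
    right child's coefficient does not have to be stored.
    """
    right_len = parent_len - left_len
    if right_len <= 0:
        raise ValueError("left child must be strictly shorter than the parent")
    return (parent_mean * parent_len - left_mean * left_len) / right_len
```

With parent values (1, 2, 3, 10, 20) and left child (1, 2), the parent mean 7.2 and left mean 1.5 give back the right child's mean of 11.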
16
Preliminary Experiment
Data: 1 billion temperature and ozone data points. Query: Correlation of T1 and T2
17
Fast Approximate Query Answering over Sensor Data with Deterministic Error Guarantees
Thank you for your time! Q & A
18
Techniques orthogonal to PlatoDB
Synopses
E.g., wavelets, histograms, sketches
Tightly tied to specific classes of queries
Queries are assumed to be known in advance
Samples
E.g., STRAT (single stratified sample), SciBORQ (biased samples), AQUA (single stratified sample), BlinkDB (multiple samples)
19
Sampling vs Compression
Sampling: a subset of the original data
Compression: use functions to model data