Probabilistic Data Aggregation Ling Huang, Ben Zhao, Anthony Joseph Sahara Retreat January, 2004.

Motivation
• Definition of data aggregation
• An important function for network infrastructures
• Exact results are not achievable in the face of loss and faults
• Low-overhead, accurate approximation is crucial in
  - Sensor networks
  - P2P networks
  - Network monitoring and intrusion detection systems
• But it is difficult to achieve
  - Existing approaches have many problems

Background
• Aggregate functions
  - MIN, MAX, AVG, COUNT, etc.
• In-network hierarchical processing
  - Reduces overhead
  - Query propagation
  - Tree construction
  - Aggregate calculation
• Addressing fault tolerance
  - Multi-root
  - Multi-tree
  - Reliable transmission
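A minimal sketch of in-network hierarchical aggregation, to make the overhead reduction concrete: each node merges its own reading with the partial aggregates of its children and forwards a single fixed-size summary upward. The names `Partial`, `Node`, and `aggregate` are hypothetical and do not come from the slides, and the recursion stands in for what would be message passing between nodes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Partial:
    """Fixed-size partial aggregate that can be merged on the way to the root."""
    count: int
    total: float
    minimum: float
    maximum: float

    @staticmethod
    def of(value: float) -> "Partial":
        # Summary of a single reading.
        return Partial(1, value, value, value)

    def merge(self, other: "Partial") -> "Partial":
        # Combine two summaries; the result is the same size as each input.
        return Partial(
            self.count + other.count,
            self.total + other.total,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
        )

    @property
    def average(self) -> float:
        return self.total / self.count


@dataclass
class Node:
    """Hypothetical tree node holding one local reading."""
    value: float
    children: List["Node"] = field(default_factory=list)


def aggregate(node: Node) -> Partial:
    """Bottom-up aggregation: merge this node's reading with its subtrees."""
    partial = Partial.of(node.value)
    for child in node.children:
        partial = partial.merge(aggregate(child))
    return partial
```

Each parent receives one summary per child regardless of subtree size, which is why the root never has to see individual readings.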

Problems in Existing Approaches
• Few approaches are designed to handle data loss and corruption
  - Only simple algorithms for data loss recovery
• Fragile for large process groups
  - Require all relevant nodes to participate
• Difficult to trade accuracy for communication overhead
  - Good applications need this tradeoff
  - They only need an approximation
  - But they must minimize resource consumption

Our Approach
• Probabilistic data aggregation: a scalable and robust approach
  - Model loss on links and failures on nodes
  - Apply statistical learning theory (SLT) to aggregation
  - Develop a protocol that handles loss and failures as an essential part of normal operation
• Self-repairing algorithm for aggregation tree maintenance
• Nodes participate in aggregation and communication according to a statistical sampling algorithm
• In the absence of data, estimate the value using a statistical learning algorithm

Design & System Architecture
• Building blocks
  - Spanning tree with a fault-detection and self-repairing algorithm for tree construction and maintenance
  - Statistical sampling for low overhead and scalability without much loss of accuracy
  - Distribution estimation to provide information for workload analysis, data prediction, and outlier detection
  - Data prediction to compensate for data loss in sampling, as well as uncontrolled loss on links
[Figure: system components: Sampler, Aggregator, Distribution Estimator, Data Predictor, Tree Constructor]
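The slides do not say which estimator the distribution-estimation block uses. As one possible sketch, a node could keep a running mean and variance per child with Welford's online algorithm and use them to flag suspicious values. The class name `RunningStats`, the method `is_outlier`, and the threshold `k` are assumptions made for illustration, not part of the original design.

```python
import math


class RunningStats:
    """Online mean/variance (Welford's algorithm); hypothetical estimator."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Constant memory per child: no need to store past readings.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; defined as 0.0 until two values have been seen.
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0

    @property
    def stddev(self) -> float:
        return math.sqrt(self.variance)

    def is_outlier(self, x: float, k: float = 3.0) -> bool:
        # Assumed rule: flag values more than k standard deviations from
        # the mean. The value is only flagged, never discarded here.
        if self.n < 2 or self.stddev == 0.0:
            return False
        return abs(x - self.mean) > k * self.stddev
```

Flagging rather than dropping leaves the decision about a suspicious value to the caller.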

Statistical Sampling
• A simple approach: sampling on the aggregation tree
  - Every child node reports the aggregation result of its subtree to its parent with a certain probability, which is the design parameter of the algorithm
  - Low overhead in control traffic and easy to implement
  - Might result in high data loss close to the root
• Distribution of the sampling rate on the tree
  - Uniform distribution on each level
  - Linear distribution on each level
  - Proportional to the number of nodes in the subtree
  - Value-based sampling
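A small sketch of why the sampling-rate distribution matters. It assumes each hop decides independently whether to report and that a skipped report drops the whole subtree's contribution for that epoch (no prediction). Under those assumptions a value that must cross d hops at uniform rate p reaches the root with probability p^d. The function names, the interpretation of "linear" as interpolation between a leaf rate and a root rate, and the `floor` parameter are all assumptions for illustration; the slides do not define these schemes precisely.

```python
import random
from typing import Sequence


def should_report(rate: float, rng: random.Random) -> bool:
    """Bernoulli decision: report to the parent with probability `rate`."""
    return rng.random() < rate


def delivery_probability(rates: Sequence[float]) -> float:
    """Probability that a value survives every hop on its path to the root."""
    prob = 1.0
    for rate in rates:
        prob *= rate
    return prob


def linear_rate(depth: int, max_depth: int, p_leaf: float, p_root: float) -> float:
    """Assumed 'linear' scheme: interpolate the rate by level.

    depth=1 is a child of the root, depth=max_depth is the deepest level.
    """
    if max_depth <= 1:
        return p_root
    fraction = (depth - 1) / (max_depth - 1)
    return p_root + (p_leaf - p_root) * fraction


def subtree_rate(subtree_size: int, largest_subtree: int, floor: float = 0.1) -> float:
    """Assumed 'proportional' scheme: larger subtrees report more often."""
    return max(floor, min(1.0, subtree_size / largest_subtree))
```

For example, `delivery_probability([0.9] * 5)` is about 0.59, so even a high uniform rate loses roughly 40% of the deepest values in a five-level tree, which is the motivation for raising the rate near the root.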

Prediction Algorithm
• Naive algorithm: use the value from the previous epoch as the current one
• Linear prediction: a linear algorithm with minimum mean square error (MMSE) estimation
  [Equation and its "Where:" definitions are not preserved in the transcript]
• More sophisticated algorithms, such as the Kalman filter, can be used to achieve better prediction results
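The slide's own prediction equation did not survive, so the following is a stand-in rather than the authors' formula: a first-order linear predictor x̂(t) = a·x(t-1) + b whose coefficients minimize the mean squared error over the observed history, alongside the naive predictor. The function names are hypothetical, and a non-empty history is assumed.

```python
from typing import Sequence


def predict_naive(history: Sequence[float]) -> float:
    """Naive predictor: repeat the value of the previous epoch."""
    return history[-1]


def predict_linear(history: Sequence[float]) -> float:
    """First-order linear predictor fitted by least squares.

    Fits x(t) ~ a * x(t-1) + b over consecutive pairs in `history`,
    then applies the fit to the most recent value. Falls back to the
    naive predictor when there is too little data or no variation.
    """
    if len(history) < 3:
        return predict_naive(history)
    prev = history[:-1]   # x(t-1)
    curr = history[1:]    # x(t)
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    var = sum((p - mean_prev) ** 2 for p in prev)
    if var == 0:
        return predict_naive(history)
    cov = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(prev, curr))
    a = cov / var                    # least-squares slope
    b = mean_curr - a * mean_prev    # least-squares intercept
    return a * history[-1] + b
```

On a steadily rising series the naive predictor lags by one step, while the fitted predictor follows the trend. A higher-order predictor or a Kalman filter would replace `predict_linear` without changing the caller.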

The Protocol
• Tree construction and query propagation start from the root of the query
• Aggregates are computed in each epoch from the bottom up
• When a node receives data from a child, it updates the distribution statistics using the distribution estimator
• If a node receives data from all its children in the epoch, it performs a normal data aggregation
• If a node has not received data from a child by the end of the epoch, it uses data prediction to estimate a value and then performs the aggregation
• Aggregates are reported from children to parents with a certain probability
• If necessary, a node might perform outlier detection on the data from a child. However:
  - It is very dangerous to discard data
  - Assuming neighboring nodes have physical locality, a parent can use both temporal and spatial statistics for outlier detection
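A sketch of one node's end-of-epoch step under stated assumptions: the aggregate is SUM, children are identified by string names, and the predictor is any function from a child's past values to an estimate. The class `AggregatorNode` and its method are hypothetical. Only values actually received are appended to a child's history, so estimates are not fed back into later predictions; that is a design choice of this sketch, not something the slides specify.

```python
from typing import Callable, Dict, List, Sequence


class AggregatorNode:
    """Hypothetical interior node that aggregates (SUM) once per epoch."""

    def __init__(
        self,
        own_value: float,
        children: Sequence[str],
        predict: Callable[[List[float]], float],
    ) -> None:
        self.own_value = own_value
        self.predict = predict
        # Per-child history of values that were actually received.
        self.history: Dict[str, List[float]] = {c: [] for c in children}

    def end_of_epoch(self, reports: Dict[str, float]) -> float:
        """Combine received reports, predicting values for silent children."""
        total = self.own_value
        for child, past in self.history.items():
            if child in reports:
                value = reports[child]
                past.append(value)          # record real data only
            elif past:
                value = self.predict(past)  # child was silent: estimate
            else:
                continue                    # never heard from this child
            total += value
        return total
```

A child that has never reported contributes nothing, since there is no history to predict from. Whether the parent then forwards `total` upward would be decided by the sampling step.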

Experimental Results

Future Work
• Integrated optimization by combining tree construction with statistical learning theory
  - Sampling on the graph before tree construction
• Non-linear estimation algorithms for data prediction
• Evaluation of the outlier detector in data aggregation
• System implementation
• System deployment and evaluation in a real environment
