CS 580S Sensor Networks and Systems
Professor Kyoung Don Kang
Lecture 7, February 13, 2006
Medians and Beyond: New Aggregation Techniques for Sensor Networks
Sensor networks
• Conserve power as much as possible
• Individual sensor readings are inherently unreliable
• TinyDB and Cougar support in-network data aggregation
  – Reduce communication costs
  – Improve reliability
Existing data aggregation schemes
• Limited to relatively simple types of queries, e.g., SUM, COUNT, AVG, and MIN/MAX
• Recall that holistic queries, e.g., the median, are not well supported, as discussed in the previous lecture
  – Approximation is required
Key contributions
• Support approximate quantiles
  – Median
  – Most frequent data values
  – Histogram
  – Range queries
• Each sensor aggregates the data it has received into a fixed-size message
• Novel data structure called quantile digest, or q-digest
  – Provable guarantees on approximation error and max resource consumption
  – Tradeoff between user-specified tolerance and memory/bandwidth consumption
  – If the sensor values are in [1, s], we can answer quantile queries using message size m with an error of O((log s)/m)
Key contributions (cont’d)
• Performance evaluation via simulation
  – Accuracy, scalability, and low resource utilization for a highly variable input data set
Assumptions
• Each sensor’s reading is an integer value in the range [1, s]
  – Strong assumption?
• Base station initiates queries
• Base station is the root of the spanning tree
• No routing loops
• No duplicate packets
  – Strong assumption?
  – What about packet losses?
Quantile digest
• Error-memory tradeoff
  – Error-conscious users can set a high max message size to achieve good accuracy
  – Resource-conscious users can specify the max message size they can tolerate
• Confidence factor
  – Theoretical worst-case error bound
• Multiple queries
  – Use the q-digest at the base station to answer other queries without further querying sensor nodes
Q-digest
Properties of q-digest
• Each sensor has a q-digest reflecting the summary of data available to it
• A q-digest consists of buckets of different sizes and their associated counts
• Binary partition of the value space 1, ..., s
• Depth of the tree is log s
• Each node v is a bucket [v.min, v.max]
  – E.g., the root has range [1, s] and its two children have ranges [1, s/2] and [s/2 + 1, s]
  – Every bucket has a counter count(v) associated with it
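The binary partition of the value space can be sketched in code. The node numbering below (root = 1, children of node i are 2i and 2i + 1, leaves s..2s−1 covering values 1..s, with s a power of two) is an assumed convention for illustration, not something the slides fix:

```python
def bucket_range(node_id, s):
    """Return the value range [v.min, v.max] covered by a tree node.

    Assumed numbering: root = 1, children of i are 2i and 2i + 1;
    s must be a power of two.
    """
    lo, hi = 1, s
    depth = node_id.bit_length() - 1
    # The bits of node_id below the leading 1 encode the root-to-node
    # path: 0 = left child (lower half), 1 = right child (upper half).
    for shift in range(depth - 1, -1, -1):
        mid = (lo + hi) // 2
        if (node_id >> shift) & 1:
            lo = mid + 1          # right child keeps the upper half
        else:
            hi = mid              # left child keeps the lower half
    return lo, hi
```

For s = 8, node 1 covers [1, 8], its children cover [1, 4] and [5, 8], and leaf 15 covers the single value 8.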
Data compression
• Compression factor k determines the size of the q-digest
• Digest properties
  1. count(v) ≤ ⌊n/k⌋
  2. count(v) + count(v_p) + count(v_s) > ⌊n/k⌋, where v_p is v’s parent and v_s is v’s sibling
  – Root and leaf nodes are exceptions
  – A leaf’s frequency can be larger than ⌊n/k⌋
  – The root has no parent or sibling
• Property 1: unless it is a leaf, no node should have a high count
• Property 2: if two children that are siblings have low counts, we merge them into their parent
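A minimal check of the two digest properties for a single node can be sketched as follows, under the same assumed numbering (root = 1, leaves s..2s−1); `q` is a dict from node id to count:

```python
def violates(q, v, n, k, s):
    """True if node v breaks a q-digest property.

    Property 1 (count(v) <= floor(n/k)) does not bind leaves;
    property 2 (v + parent + sibling must exceed floor(n/k))
    does not bind the root.
    """
    thresh = n // k
    cnt = q.get(v, 0)
    parent, sibling = v // 2, v ^ 1
    is_leaf = v >= s          # assumed: leaves occupy ids s..2s-1
    is_root = v == 1
    p1_ok = is_leaf or cnt <= thresh
    p2_ok = is_root or cnt + q.get(parent, 0) + q.get(sibling, 0) > thresh
    return not (p1_ok and p2_ok)
```

With n = 20 and k = 10 (threshold 2), two sibling leaves of count 1 each, with an empty parent, sum to exactly the threshold and so violate property 2; bumping one count to 2 restores it.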
Building a q-digest
• Consider a sensor with n data values
  – Each data value is in [1, s]
  – Exact representation is {f1, f2, ..., fs}, where fi is the frequency of value i
  – Sum(fi) = n
  – The worst-case storage requirement is O(n) or O(s), whichever is smaller
  – Construct a compact representation using a q-digest
Hierarchical merge
• Hierarchically merge and reduce the number of buckets
  – Go through all nodes bottom-up and check whether any node violates the digest property
  – Only Property 2 can be violated
  – Δv = count(v) + count(v_l) + count(v_r), where v_l and v_r are the left and right children of v
  – If a node v’s children violate Property 2, they can be merged into v by setting count(v) = Δv and deleting the children
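The bottom-up merge can be sketched as below. This is a hedged reconstruction, not the paper's exact pseudocode, and it reuses the assumed numbering (root = 1, leaves at ids s..2s−1):

```python
def compress(q, n, k, s):
    """Bottom-up q-digest compression sketch.

    Walk each level from the leaves up; whenever a node pair and
    their parent together hold at most floor(n/k) items (a property-2
    violation), fold both children into the parent.
    """
    thresh = n // k
    level_start = s               # assumed: leaves occupy ids s..2s-1
    while level_start > 1:
        for v in range(level_start, level_start * 2, 2):  # left child of each pair
            parent, sib = v // 2, v + 1
            total = q.get(v, 0) + q.get(sib, 0) + q.get(parent, 0)
            if total <= thresh:
                if total:
                    q[parent] = total   # merged count moves to the parent
                q.pop(v, None)
                q.pop(sib, None)
        level_start //= 2
    return q
```

With n = 10 and k = 5 (threshold 2), two leaves holding one value each get merged all the way up to the root, while a heavy leaf (count 5) survives untouched.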
Example (figure): some merges incur information loss, others no loss
Observations
• Detailed information about frequently occurring data values is maintained
• Less frequently occurring values are lumped into a larger bucket, resulting in information loss
Merging q-digests: build the q-digest in a distributed manner
• Example (figure): n1 = n2 = 200, k = 10
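Merging two q-digests for the distributed build can be sketched as: take the bucket-wise union of the counts, then re-run the compression step with n = n1 + n2. The numbering convention (root = 1, leaves at s..2s−1) remains an illustrative assumption:

```python
def merge_qdigests(q1, q2, n1, n2, k, s):
    """Union the bucket counts, then compress with n = n1 + n2."""
    q = dict(q1)
    for v, c in q2.items():
        q[v] = q.get(v, 0) + c
    n = n1 + n2
    thresh = n // k
    # Same bottom-up compression as in the single-sensor build.
    level_start = s
    while level_start > 1:
        for v in range(level_start, level_start * 2, 2):
            parent, sib = v // 2, v + 1
            total = q.get(v, 0) + q.get(sib, 0) + q.get(parent, 0)
            if total <= thresh:
                if total:
                    q[parent] = total
                q.pop(v, None)
                q.pop(sib, None)
        level_start //= 2
    return q
```

Merging two tiny digests of one item each, with a threshold of 20, collapses everything into the root bucket, which is exactly the coarsening the slides' information-loss example illustrates.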
Space complexity
• A q-digest is a subset of the complete tree containing only the nodes with significant counts
• Lemma 1: a q-digest Q constructed with compression parameter k has size at most 3k
  – Follows directly from Property 2
  – Sum over v in Q of (count(v) + count(v_p) + count(v_s)) > |Q| · n/k, where |Q| is the number of nodes in Q; the left-hand side is at most 3n, so |Q| ≤ 3k
Error bound
• In the worst case, the count of any node can deviate from its actual value by the sum of the counts of its ancestors
• Lemma 2: in a q-digest created using compression factor k, the max error of any node is (n/k) · log s
  – Proof: error(v) ≤ Σ_{x ∈ ancestors(v)} count(x) ≤ Σ n/k (by Property 1) ≤ (n/k) · log s
• The relative error error(v)/n is at most (1/k) · log s
• Lemma 3: the relative error is bounded by (1/k) · log s after the union step
• Theorem 1: given memory m to build a q-digest, it is possible to answer any quantile query with error ≤ (3/m) · log s
  – Follows directly from Lemmas 1 & 2: the digest has at most 3k buckets, so with memory m we can afford k = m/3
Representation of a q-digest
• Transmit, for each node, the tuple ⟨node id, count⟩
  – log(2s) + log(n) bits per tuple
• Example (figure): the q-digest serialized as a set of ⟨id, count⟩ tuples
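The per-tuple cost quoted above can be illustrated with a small bit-packing sketch. The id width comes from ids ranging over 1..2s−1 under the assumed numbering; the function names are hypothetical:

```python
import math

def encode_qdigest(q, n, s):
    """Pack each (node id, count) tuple into log(2s) + log(n) bits.

    Returns the concatenated bit string and the per-tuple cost in bits.
    """
    id_bits = math.ceil(math.log2(2 * s))    # ids range over 1..2s-1
    cnt_bits = math.ceil(math.log2(n + 1))   # counts range over 1..n
    bits = "".join(
        format(v, f"0{id_bits}b") + format(c, f"0{cnt_bits}b")
        for v, c in sorted(q.items())
    )
    return bits, id_bits + cnt_bits
```

For s = 8 and n = 15, each tuple costs 4 + 4 = 8 bits, so the message size grows linearly in the number of surviving buckets.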
Queries on q-digest
• Quantile query
  – Sort the nodes in the q-digest in postorder (by increasing v.max, breaking ties toward smaller ranges)
  – Traverse the postorder list, adding up the counts
  – Stop at the node at which the running sum first exceeds 0.5 · n (sum(count) = 10 in the slide’s example); report that bucket’s v.max
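The postorder traversal above can be sketched as follows, again under the assumed node numbering (root = 1, leaves s..2s−1), with each bucket's range recomputed from its id:

```python
def quantile_query(q, s, frac, n):
    """Approximate quantile from a q-digest.

    Visit buckets in postorder of their value ranges (increasing
    v.max, ties broken toward smaller ranges, so descendants come
    before ancestors) and report the upper endpoint of the bucket
    at which the running count first exceeds frac * n.
    """
    def rng(v):
        lo, hi = 1, s
        for shift in range(v.bit_length() - 2, -1, -1):
            mid = (lo + hi) // 2
            if (v >> shift) & 1:
                lo = mid + 1
            else:
                hi = mid
        return lo, hi

    nodes = sorted(q.items(),
                   key=lambda it: (rng(it[0])[1],
                                   rng(it[0])[1] - rng(it[0])[0]))
    total = 0
    for v, cnt in nodes:
        total += cnt
        if total > frac * n:
            return rng(v)[1]
    return s
```

For the digest {leaf 1: 3, leaf 2: 3, node [5, 8]: 4} with n = 10, the running sum passes 0.5 · n at the second leaf, so the estimated median is 2.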
Other queries
• Inverse quantile: given a value x, determine its rank in the sorted sequence of the input values
• Range query: find the number of values in [low, high]
• Consensus query, or frequent items: given a fraction f in (0, 1), find all values reported by more than f · n sensors
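The first two queries reduce to rank estimates over the digest's buckets. A hedged sketch (counting only buckets lying entirely below x, under the same assumed numbering):

```python
def bucket_range(v, s):
    """Value range covered by node v (assumed numbering: root = 1)."""
    lo, hi = 1, s
    for shift in range(v.bit_length() - 2, -1, -1):
        mid = (lo + hi) // 2
        if (v >> shift) & 1:
            lo = mid + 1
        else:
            hi = mid
    return lo, hi

def inverse_quantile(q, s, x):
    """Estimated rank of x: total count of buckets strictly below x."""
    return sum(c for v, c in q.items() if bucket_range(v, s)[1] < x)

def range_count(q, s, low, high):
    """Estimated number of values in [low, high], via two rank queries."""
    return inverse_quantile(q, s, high + 1) - inverse_quantile(q, s, low)
```

A frequent-items query could then scan the leaf-level buckets for counts above f · n; the error guarantees carry over from the quantile bound.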
Confidence factor
• Theorem 1 gives the worst-case error bound
• Confidence factor = (max weight of any path from root to leaf in Q) / n
• In their experiments, the worst-case error bound is 48%, but the confidence factor is about 9% when s = 2^16 and m = 100
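Reading the confidence factor as the maximum total count along any root-to-leaf path of the digest actually built, divided by n (a hedged interpretation of the slide), it can be computed directly:

```python
def confidence_factor(q, s, n):
    """Max total count along any root-to-leaf path, divided by n.

    Assumed numbering: root = 1, leaves occupy ids s..2s-1, so the
    path from a leaf to the root is repeated integer halving.
    """
    best = 0
    for leaf in range(s, 2 * s):
        v, weight = leaf, 0
        while v >= 1:
            weight += q.get(v, 0)
            v //= 2
        best = max(best, weight)
    return best / n
```

Because only the ancestors actually present in Q contribute, this data-dependent bound is typically far below the worst case, matching the 9%-vs-48% gap quoted above.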
Experimental evaluation
• Tree topology
• Constant density
  – E.g., 1000 sensors for a 1000 × 1000 area, 4000 sensors for a 2000 × 2000 area
  – Simulated 8000 nodes
• Random and correlated sensor values
  – 16-bit random numbers
  – Geographic elevation data
• Compared with a simple unaggregated approach
  – Summary is a list of distinct sensor values and a count for each value
Experimental results
400 bytes is enough to support 2% error
Max message size required to support 2% error
• With the q-digest, the max message size stays constant
• Without aggregation, 1% of the nodes, those near the base station, transmit a 30 KB message
Total data transmitted
Summary
• Quantifies tradeoffs between accuracy and resource usage, e.g., memory and bandwidth
• Communication/energy model is not realistic
  – Assumes a perfect radio model with no packet loss
  – Simplistic energy consumption model
Questions
• How will it work under a more realistic communication model?
• Any better approaches?
• Other questions? Suggestions?