Published by Poppy Cook. Modified over 9 years ago.
1
When processing is cheaper than transmitting
Daniel V Uhlig, Maryam Rahmaniheris
2
How to gather interesting data from thousands of motes?
Tens to thousands of motes
Unreliable individually
To collect and analyze data
Long-term, low-energy deployment
Can use the processing power at each mote
Analyze locally before sharing data
3
Transmission of data is expensive compared to CPU cycles
1 Kb transmitted 100 meters = 3 million CPU instructions
An AA-powered mote can transmit 1 message per day for about two months (assuming no other power draws)
Power density is growing very slowly compared to computation power, storage, etc.
Analyze and process locally, transmitting only what is required
4
Minimize communications
◦ Minimize broadcast/receive time
◦ Minimize message size
◦ Move computations to individual nodes
Nodes pass data in multi-hop fashion towards a root
Select connectivity so the graph helps with processing
Handle faulty nodes within the network
5
[Figure: example network with node values 5, 10, 6, 7, 6, 5, 5, 5]
6
Max is very simple
What about Count?
◦ Need to avoid double counting due to redundant paths
What about spatial events?
◦ Need to evaluate readings across multiple sensors
Correlation between events
Failures of nodes can lose branches of the tree
7
Connectivity graph: unstructured, or how to structure it
Diffusion of requests and how to combine data
Maintenance messages vs. query messages
Reliability of results
Load balancing: message traffic, storage
Storage costs at different nodes
8
S. Madden, M. Franklin, J. Hellerstein, and W. Hong
Intel Research, 2002
9
Aggregates values in a low-power, distributed network
Implemented on TinyOS motes
SQL-like language to search for values or sets of values
– Simple declarative language
Energy savings
Tree-based methodology
– Root node generates requests and disseminates them down to its children
10
Three functions to aggregate results
– f (merge function)
   Each node runs f to combine values: <z> = f(<x>, <y>)
   EX: <SUM1 + SUM2, COUNT1 + COUNT2> = f(<SUM1, COUNT1>, <SUM2, COUNT2>)
– i (initialize function)
   Generates a state record at the lowest level of the tree
   EX: <x, 1>
– e (evaluator function)
   Root uses e to generate the final result: RESULT = e(<z>)
   EX: SUM/COUNT
Functions must be preloaded on motes or distributed via software protocols
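The three functions can be sketched for the AVERAGE aggregate as below. This is a minimal illustration, not TAG's actual TinyOS code: the names `init`, `merge`, and `evaluate` and the sample readings are assumptions chosen for readability.

```python
from functools import reduce

def init(reading):
    """i: turn one sensor reading into a partial state record (sum, count)."""
    return (reading, 1)

def merge(a, b):
    """f: combine two partial state records into one."""
    return (a[0] + b[0], a[1] + b[1])

def evaluate(state):
    """e: compute the final answer at the root from the merged record."""
    total, count = state
    return total / count

# Hypothetical readings from four motes, merged on the way to the root.
readings = [5, 10, 6, 7]
root_state = reduce(merge, (init(r) for r in readings))
average = evaluate(root_state)
```

Because the partial state is a fixed-size pair, each node forwards one small record regardless of how many descendants it has.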
11
[Figure: Count and Max computed via the tree, with partial values shown at each node]
12
All searches have different properties that affect aggregate performance
Duplicate insensitive: unaffected by double counting, (Max, Min) vs. (Count, Average)
– Restricts network properties
Exemplary: returns one value (Max/Min)
– Sensitive to failure
Summary: computation over values (Average)
– Less sensitive to failure
13
Distributive: partial states are the same as the final state (Max)
Algebraic: partial states are of fixed size but differ from the final state (Average: Sum, Count)
Holistic: partial states contain all sub-records (Median)
Unique: similar to Holistic, but partial records may be smaller than holistic ones
Content sensitive: size of partial records depends on content (Count Distinct)
14
Diffusion of requests and then collection of information
Epochs subdivided for each level to complete its task
◦ Saves energy
◦ Limits rate of data flow
15
Snooping: broadcast messages so others can hear them
◦ Rejoin the tree if a parent fails
◦ Listen to other broadcasts and only broadcast if own values are needed
   In case of MAX, do not broadcast if a peer has transmitted a higher value
Hypothesis testing: root guesses at the value to minimize traffic
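The MAX suppression rule can be sketched as a single predicate. This is a hedged illustration only: `should_broadcast` and its arguments are hypothetical names, and real motes would also have to deal with timing and lost overheard messages.

```python
def should_broadcast(own_value, overheard_values):
    """Stay silent only if some peer has already transmitted a higher value."""
    return not any(v > own_value for v in overheard_values)

# A node holding 7 that overheard 5 and 6 still needs to report.
reports_7 = should_broadcast(7, [5, 6])
# A node holding 5 that overheard 10 can stay silent and save energy.
reports_5 = should_broadcast(5, [10])
```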
16
Theoretical results for
◦ 2500 nodes
Savings depend on function
Duplicate insensitive, summary best
◦ Distributive helps
Holistic is the worst
17
16-mote network
Count number of motes in 4-second epochs
No optimizations
Quality of count is due to less radio contention in TAG
Centralized used 4685 messages vs. TAG's 2330
50% reduction, but less than theoretical results
– Different loss model, node placement
18
Loss of nodes and subtrees
– Maintenance for structured connectivity
Single message per node per epoch
– Message size might increase at higher-level nodes
– Root gets overloaded (does it always matter?)
Epochs give a method for idling nodes
– Snooping not included, timing issues
19
Continuous aggregation
◦ Nodes constantly passing data towards aggregation points
Root free
◦ Any node can start a query
Query can take different paths
◦ Balances load between nodes
What costs, advantages over TAG?
20
S. Nath, P. Gibbons, S. Seshan, Z. Anderson
Microsoft Research, 2008
21
TAG
◦ Not robust against node or link failure
◦ A single node failure leads to loss of the entire sub-branch's data
Synopsis Diffusion
◦ Exploits the broadcast nature of the wireless medium to enhance reliability
◦ Separates routing from aggregation
◦ The final aggregated data at the sink is independent of the underlying routing topology
◦ Synopsis diffusion can be used on top of any routing structure
◦ The order of evaluations and the number of times each datum is included in the result are irrelevant
22
Not robust against node or link failure
[Figure: tree-based Count with partial counts at each node; Count = 10]
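The fragility of a tree can be illustrated with a small sketch. The topology, node names, and the `tree_count` helper below are hypothetical and are not taken from the slide's figure.

```python
def tree_count(children, node, failed=frozenset()):
    """Count the nodes whose data reaches `node` through the tree.

    A failed node contributes nothing, and neither does anything below it,
    because its descendants have no other path to the root.
    """
    if node in failed:
        return 0
    return 1 + sum(tree_count(children, c, failed) for c in children.get(node, []))

# Hypothetical six-node aggregation tree.
children = {"root": ["a", "b"], "a": ["c", "d"], "b": ["e"]}
full = tree_count(children, "root")
lossy = tree_count(children, "root", failed={"a"})
```

With node `a` down, the readings of `c` and `d` are lost as well, even though both nodes are alive.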
23
Multi-path routing
◦ Benefits
   Robust
   Energy-efficient
◦ Challenges
   Duplicate sensitivity
   Order sensitivity
[Figure: Count over multi-path routing]
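Duplicate sensitivity can be seen in a few lines. The readings below are made up for illustration; the point is only that a reading delivered twice changes a sum but not a maximum.

```python
# Hypothetical readings from three nodes.
readings = {"a": 4, "b": 7, "c": 2}

# With multi-path routing, b's reading reaches the sink along two paths.
arrived = [readings["a"], readings["b"], readings["b"], readings["c"]]

true_sum, true_max = sum(readings.values()), max(readings.values())
naive_sum, naive_max = sum(arrived), max(arrived)
# naive_max still equals true_max; naive_sum is inflated by the duplicate.
```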
24
A novel aggregation framework
◦ ODI synopsis: small-sized digest of the partial results
   Bit vectors
   Sample
   Histogram
Better aggregation topologies
◦ Multi-path routing
◦ Implicit acknowledgment
◦ Adaptive rings
Example aggregates
Performance evaluation
25
SG: Synopsis Generation, SF: Synopsis Fusion, SE: Synopsis Evaluation
The exact definition of these functions depends on the particular aggregation function:
◦ SG(.) takes a sensor reading and generates a synopsis
◦ SF(.,.) takes two synopses and generates a new one
◦ SE(.) translates a synopsis into the final answer
26
Distribution phase
◦ The aggregate query is flooded
◦ The aggregation topology is constructed
Aggregation phase
◦ Aggregated values are routed toward the sink
◦ SG() and SF() functions are used to create partial results
27
The sink is in R0
A node is in Ri if it is i hops away from the sink
Nodes in Ri-1 should hear the broadcasts by nodes in Ri
Loose synchronization between nodes in different rings
Each node transmits only once
◦ Energy cost same as tree
[Figure: rings R0 to R3 around the sink, with nodes A, B, C]
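Ring membership as defined above is just hop distance from the sink, which a breadth-first search computes. The sketch below assumes a known neighbor table; in a real deployment the rings would instead emerge while the query is flooded. The topology and the `assign_rings` name are hypothetical.

```python
from collections import deque

def assign_rings(neighbors, sink):
    """Return {node: i} where i is the node's hop count from the sink (ring Ri)."""
    ring = {sink: 0}
    queue = deque([sink])
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            if v not in ring:          # first time reached = shortest hop count
                ring[v] = ring[u] + 1
                queue.append(v)
    return ring

# Hypothetical radio connectivity: C can be heard by both A and B.
neighbors = {
    "sink": ["A", "B"],
    "A": ["sink", "C"],
    "B": ["sink", "C"],
    "C": ["A", "B"],
}
rings = assign_rings(neighbors, "sink")
```

Here `C` lands in R2 and its single broadcast can be heard by both `A` and `B` in R1, which is what gives the scheme its multi-path redundancy at the same transmission cost as a tree.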
28
Coin tossing experiment CT(x) used in Flajolet and Martin's algorithm:
◦ For i = 1, …, x-1: CT(x) = i with probability $2^{-i}$
◦ Simulates the behavior of the exponential hash function
◦ Synopsis: a bit vector of length k > log(n)
   n is an upper bound on the number of sensor nodes in the network
◦ SG(): a bit vector of length k with only the CT(k)-th bit set
◦ SF(): bitwise Boolean OR
◦ SE(): if i is the index of the lowest-order 0 in the bit vector, output $2^{i-1}/0.77351$ (0.77351 is the "magic constant")
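A minimal sketch of the Count synopsis follows, assuming the standard Flajolet-Martin form: CT(x) returns x when all x-1 tosses fail, bit vectors are stored as Python integers, and the constant is 0.77351. The function names mirror SG, SF, and SE but are otherwise illustrative.

```python
import random

PHI = 0.77351  # Flajolet-Martin correction constant (assumed value)

def ct(x, rng):
    """Coin toss: return i in 1..x-1 with probability 2**-i, otherwise x."""
    for i in range(1, x):
        if rng.random() < 0.5:
            return i
    return x

def sg(k, rng):
    """SG: k-bit vector (as an int) with only the CT(k)-th bit set."""
    return 1 << (ct(k, rng) - 1)

def sf(s1, s2):
    """SF: bitwise OR of two synopses."""
    return s1 | s2

def se(s):
    """SE: find the 1-based index i of the lowest-order 0 bit, return 2**(i-1)/PHI."""
    i = 1
    while s & 1:
        s >>= 1
        i += 1
    return 2 ** (i - 1) / PHI

# Fuse the synopses of 100 simulated nodes, in arbitrary order.
rng = random.Random(0)
k = 16
synopsis = 0
for _ in range(100):
    synopsis = sf(synopsis, sg(k, rng))
estimate = se(synopsis)
```

A single bit vector gives a coarse estimate with high variance; averaging over several independent vectors is the usual way to tighten it.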
29
The number of live sensor nodes, N, is proportional to $2^{i-1}$
[Figure: example bit vectors from individual nodes combined into one synopsis]
Intuition: the probability of N nodes all failing to set the i-th bit is $(1 - 2^{-i})^N$, which is approximately 0.37 when $i = \log_2 N$ and even smaller for larger N.
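The 0.37 figure can be checked numerically. The sketch below assumes each node sets bit i independently with probability 2^-i; `prob_bit_unset` is a hypothetical helper name and N = 1024 is an arbitrary choice.

```python
import math

def prob_bit_unset(n_nodes, i):
    """Probability that none of n_nodes sets bit i, each doing so with prob. 2**-i."""
    return (1 - 2.0 ** -i) ** n_nodes

n = 1024
i = int(math.log2(n))                 # i = log2(N) = 10
p_at_n = prob_bit_unset(n, i)         # close to 1/e
p_at_4n = prob_bit_unset(4 * n, i)    # same bit, four times as many nodes
```

The first value approaches 1/e, and the second shows how quickly the bit becomes almost certain to be set once N exceeds 2^i.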
30
[Figure: an aggregation DAG and its canonical left-deep tree, each applying SG and SF to readings r1 to r5 to produce synopsis s]
31
◦ P1: SG() preserves duplicates
   If two readings are considered duplicates, then the same synopsis is generated
◦ P2: SF() is commutative
   SF(s1, s2) = SF(s2, s1)
◦ P3: SF() is associative
   SF(s1, SF(s2, s3)) = SF(SF(s1, s2), s3)
◦ P4: SF() is same-synopsis idempotent
   SF(s, s) = s
Theorem: Properties P1-P4 are necessary and sufficient for ODI-correctness
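Properties P2 to P4 can be spot-checked for a candidate fusion function by brute force over small inputs. This is an illustrative test, not a proof: it uses bitwise OR as the fusion function and contrasts it with plain addition, which fails P4.

```python
from itertools import product

def sf(s1, s2):
    """Candidate fusion function: bitwise OR on small bit vectors."""
    return s1 | s2

def add(s1, s2):
    """Counter-example: ordinary addition, as used by a tree-based SUM."""
    return s1 + s2

samples = range(16)  # all 4-bit synopses

commutative = all(sf(a, b) == sf(b, a) for a, b in product(samples, repeat=2))
associative = all(
    sf(a, sf(b, c)) == sf(sf(a, b), c) for a, b, c in product(samples, repeat=3)
)
idempotent = all(sf(a, a) == a for a in samples)

# Addition is commutative and associative but not idempotent,
# so a duplicate arrival changes the result.
add_idempotent = all(add(a, a) == a for a in samples)
```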
32
Uniform Sample of Readings
◦ Synopsis: a sample of size K of tuples
◦ SG(): output the tuple
◦ SF(s, s'): output the K tuples in s ∪ s' with the K largest r_i
◦ SE(s): output the set of values val_i in s
◦ Useful for holistic aggregation
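A hedged sketch of the sampling synopsis follows. The tuple layout is an assumption, since the slide text does not show it: each tuple is taken to be (r, node id, value), with r a random number drawn once per node. K = 3 and all names are illustrative.

```python
import random

K = 3  # sample size (assumed for the example)

def sg(val, node_id, rng):
    """SG: one tuple per node, tagged with a random r drawn once by that node."""
    return {(rng.random(), node_id, val)}

def sf(s1, s2):
    """SF: keep the K tuples with the largest r from the union.

    Using a set union means the same tuple arriving via two paths counts once.
    """
    return set(sorted(s1 | s2, reverse=True)[:K])

def se(s):
    """SE: output the sampled values."""
    return sorted(val for _, _, val in s)

# Two partial synopses that both contain node 1's tuple (a multi-path duplicate).
s1 = {(0.9, 1, 10), (0.1, 2, 20)}
s2 = {(0.5, 3, 30), (0.9, 1, 10), (0.7, 4, 40)}
fused = sf(s1, s2)
sample = se(fused)
```

Because every node draws r independently, keeping the K largest r values yields a uniform sample of the distinct readings regardless of the routing paths taken.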
33
Frequent Items (items occurring at least T times)
◦ Synopsis: a set of (value, weight) pairs; the values are unique and the weights are at least log(T)
◦ SG(): compute CT(k), where k > log(n), and call this the weight; if it is at least log(T), output the (value, weight) pair
◦ SF(s, s'): for each distinct value, discard all but the pair with maximum weight; output the remaining pairs
◦ SE(s): for each pair in s, output its value as a frequent value together with its approximate count
◦ Intuition: a value occurring at least T times is expected to have at least one of its calls to CT() return at least log(T) (p = 1/T)
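The frequent-items synopsis can be sketched with a dictionary from value to weight. Several details here are assumptions rather than statements from the slide: logarithms are taken base 2, CT(x) returns x when every toss fails, and the approximate count reported by `se` is taken to be 2 raised to the weight.

```python
import math
import random

def ct(x, rng):
    """Coin toss: return i in 1..x-1 with probability 2**-i, otherwise x."""
    for i in range(1, x):
        if rng.random() < 0.5:
            return i
    return x

def sg(val, k, threshold, rng):
    """SG: emit {val: weight} only if the weight reaches log2(threshold)."""
    weight = ct(k, rng)
    return {val: weight} if weight >= math.log2(threshold) else {}

def sf(s1, s2):
    """SF: per distinct value, keep only the maximum weight."""
    merged = dict(s1)
    for val, w in s2.items():
        merged[val] = max(w, merged.get(val, 0))
    return merged

def se(s):
    """SE: report each surviving value with an approximate count (assumed 2**weight)."""
    return {val: 2 ** w for val, w in s.items()}

fused = sf({"x": 3, "y": 5}, {"x": 4})
report = se(fused)
```

Taking the maximum per value keeps fusion commutative, associative, and idempotent, so duplicate deliveries do not distort the result.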
34
Communication error
◦ 1 - percent contributing
◦ h: height of the DAG
◦ k: the number of neighbors each node has
◦ p: probability of loss
◦ The overall communication error upper bound:
◦ If p = 0.1 and h = 10, then the error is negligible with k = 3
Approximation error
◦ Introduced by the SG(), SF(), and SE() functions
◦ Theorem 2: any approximation error guarantees provided for the centralized data stream scenario immediately apply to a synopsis diffusion algorithm, as long as the data stream synopsis is ODI-correct.
35
Implicit acknowledgement provided by ODI synopses
◦ Retransmission
   High energy cost and delay
◦ Adapting the topology
   Triggered when the number of times a node's transmission is included in its parents' transmissions is below a threshold
   Assign the node to a ring where it can have a good number of parents
   Assign a node in ring i, with probability p, to:
   Ring i+1 if n_i > n_{i-1}, n_{i+1} > n_{i-1}, and n_{i+2} > n_i
   Ring i-1 if n_{i-2} > n_{i-1}, n_{i-1} … n_i
36
[Figure: Rings vs. Adaptive Rings]
37
The algorithms are implemented in the TAG simulator
600 sensors deployed randomly in a 20 ft × 20 ft grid
The query node is in the center
Loss probabilities are assigned based on the distance between nodes
38
[Figure: RMS Error and % Value Included]
39
Pros
◦ High reliability and robustness
◦ More accurate answers
◦ Implicit acknowledgment
◦ Dynamic topology adaptation
◦ Moderately affected by mobility
Cons
◦ Approximation error
◦ Low node density decreases the benefits
◦ The fusion functions should be defined for each aggregation function
◦ Increased message size
40
Is there any benefit in coupling routing with aggregation?
◦ Choosing the paths and finding the optimal aggregation points
◦ Routing the sensed data along a longer path to maximize aggregation
◦ Finding the optimal routing structure
   Considering energy cost of links
   NP-complete
   Heuristics (Greedy Incremental)
Considering data correlation in the aggregation process
◦ Spatial
◦ Temporal
   Defining a threshold
   TiNA
41
Could the energy saving gained by aggregation be outweighed by its cost?
◦ Aggregation function cost
   Storage cost
   Computation cost (number of CPU cycles)
No mobility
◦ Static aggregation tree
Structure-less or structured? That is the question…
◦ Continuous
◦ On-demand
42
Transmitting large amounts of data on the internet is slow
◦ Better to process locally and transmit only the interesting parts
43
How does query rate affect design decisions?
Load balancing between levels of the tree
◦ Overload root and main nodes
How will video capabilities of Imote affect aggregation models?
45
Query can originate at any node, not just the root
Histogram data so different levels of the tree hold different details of the data
◦ Children hold wider range / smaller area
◦ Parents hold smaller range / wider area
46
Avoid bottlenecks
◦ Query can originate anywhere
Avoid overloading the single root node
◦ Different nodes can answer different questions quickly
Must constantly aggregate data