Published by Edmund Boyd. Modified over 9 years ago.
1 Software Fault Tolerance (SWFT) SWFT for Wireless Sensor Networks (Lec 1) Dependable Embedded Systems & SW Group www.deeds.informatik.tu-darmstadt.de Prof. Neeraj Suri Abdelmajid Khelil Dept. of Computer Science TU Darmstadt, Germany
2 Typical Wireless Sensor Networks (WSN)
- Large number of cheap sensing devices (sensor nodes, or SN)
- Self-organized
- Multi-hop communication
[Figure labels: mote, Sink, User; Internet, GSM, Satellite, etc.]
3 Application Spectrum: Measurement vs. Event (Src: University of Virginia)
4 Typical Network Functionalities
- Data collection (SNs → Sink)
  - Convergecast
  - Aggregation
  - Routing
  - ...
- Data dissemination (Sink → SNs)
  - Code update (reprogramming)
  - Query
  - Diagnosis
  - ...
[Figure labels: Sink, Event]
5 Challenges for Fault-Tolerance (FT)
- Cheap components
  - Scarcity of resources: processing/storage/communication
  - Finite energy supply
- In situ in physical environment
  - Test environment different from the deployment environment
  - Evolvable deployment conditions
  - Difficulty of testing after deployment
- Failure-prone components
- FT is as critical as other performance metrics (energy efficiency, latency and accuracy) in supporting distributed applications
- Earlier work from wired networks does not directly apply to WSNs
6 Outline of Today's Lecture
- Fault Model
- Fault Tolerance in WSN
- Fault Tolerance in Collaborative Sensor Networks for Target Detection
© DEEDS Group SWFT WS '07
7 Fault, Error and Failure in WSN A sensor service running on node A is expected to periodically send the measurements of its sensors to an aggregation service running on node B. However, node A suffers an impact causing a loose connection with one of its sensors (Fault). Since the code implementing node A’s service is not designed to detect and overcome such situations, an erroneous state is reached (Error) when the sensor service tries to acquire data from the sensor. Due to this state, the service does not send sensor data (Failure) to the aggregation service within the specified time interval. This results in a crash or omission failure of node A observed by node B.
8 Sources of Faults
- Fragile sensor nodes
  - Node failures
    - Depletion of batteries
    - Harsh environment: damage, short circuits, enclosure, etc.
  - Incorrect sensor reading or processing
    - Due to low battery level, etc.
    - SW bugs (e.g. in routing layer)
- Erroneous wireless communication
  - Reading transmission failures
  - Temporary/permanent link failures
  - Packet corruption/loss
  - Network partitioning
- Sink faults
- Security breaches
- Byzantine faults: overlap between faults and security breaches, describing arbitrary behaviour
9 Failure Classification
- Omission: A service sporadically does not respond to requests (e.g. due to message loss).
- Crash: The service at some point stops responding to any request (e.g. after f omissions).
- Timing: The service's response is received outside the time interval specified by the application (too early or too late).
- Value: A service sends a timely response, but with insufficient accuracy.
- Arbitrary: Includes all other failures, e.g. Byzantine failures, which are in general caused by a malicious service that behaves erroneously and inconsistently.
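The classification above can be sketched from the point of view of a consumer (such as node B on slide 7) that observes a service's responses. This is only an illustrative sketch: the `Response` type, the function names, the time window `earliest`/`latest`, and the idea that the observer has a `reference` value with a `tolerance` are all assumptions made for the example, not part of the lecture. Arbitrary (Byzantine) failures are deliberately not covered, since they cannot be recognised from a single observation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Response:
    arrival: float  # seconds elapsed since the request was sent
    value: float    # reported sensor value


def classify_response(resp: Optional[Response], earliest: float, latest: float,
                      reference: float, tolerance: float) -> str:
    """Label one observation as seen by the consumer of the service."""
    if resp is None:
        return "omission"   # no response at all
    if resp.arrival < earliest or resp.arrival > latest:
        return "timing"     # response arrived too early or too late
    if abs(resp.value - reference) > tolerance:
        return "value"      # timely, but not accurate enough
    return "ok"


def looks_crashed(history: List[str], f: int) -> bool:
    """True if the last f observations (f >= 1) were all omissions."""
    if f < 1:
        raise ValueError("f must be at least 1")
    return len(history) >= f and all(label == "omission" for label in history[-f:])
```

A crash is treated here as f consecutive omissions, following the "e.g. after f omissions" remark on the slide; a real deployment would pick f and the time window per application.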
10 Node Architecture
[Figure: layered node architecture. Layer labels: Hardware, Operating System, Middleware, Application; also Power Management and Software. Other labels: System, Threads, Address space, Files, Hardware Drivers; Physical, Data Link, Network, Transport; Sensor Driver, Hardware Sensor; Middleware management, Algorithms, Modules, Services, Virtual Machine; App1, App2, App3.]
Src: Abdul-Halim Jallad and Tanya Vladimirova
11 Fault Classification and Propagation in WSN
[Figure labels: Sink, Event, User; Internet, GSM, Satellite, etc.]
12 Fault Tolerance (FT)
- Fault tolerance: the ability to sustain sensor network functionalities without any interruption due to failures
- FT techniques taxonomy
  - Fault prevention
  - Fault detection
  - Fault isolation
  - Fault identification
  - Fault recovery
- FT can be addressed at different layers (HW, SW, MW, network and application)
- There is always a trade-off between FT and efficiency
13 Fault Prevention
- Sensor node deployment/placement
  - Connectivity
  - Coverage
  - Relay node placement in two-tiered networks
- Power on-off
- Transmission range adjustment
- Sensor network monitoring ("watch the watch-dog!")
  - Active
  - Passive
14 Fault Detection (plus Isolation and Identification)
- Self-diagnosis: the node itself can identify faults. Examples:
  - Measuring the current battery voltage
  - Measuring communication link quality
- Cooperative diagnosis (group detection): several nodes monitor the behaviour of another node. Examples:
  - "Sensors from the same region should have similar values" (detailed in SWOSFT)
  - Consumer nodes observe the behaviour of the service provider (e.g. next-hop relay nodes)
- Hierarchical detection
  - Uses a tree
  - In general, detection is shifted to more powerful nodes (e.g. the sink)
- Frameworks: Memento, Sympathy and SNIF
- Trend: region-level diagnosis
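The cooperative-diagnosis rule "sensors from the same region should have similar values" can be illustrated with a small sketch. The choice of the median as the regional reference, the function name `suspect_nodes`, and the `max_deviation` parameter are assumptions made for this example; the lecture does not prescribe a particular test.

```python
from statistics import median
from typing import Dict, List


def suspect_nodes(readings: Dict[str, float], max_deviation: float) -> List[str]:
    """Return the nodes whose reading is far from the regional median.

    readings maps a node id to its latest reading; all nodes are assumed
    to lie in the same region and to observe the same phenomenon.
    """
    if not readings:
        return []
    center = median(readings.values())
    return sorted(node for node, value in readings.items()
                  if abs(value - center) > max_deviation)
```

For example, `suspect_nodes({"A": 20.1, "B": 19.8, "C": 35.0, "D": 20.4}, 2.0)` flags only node `C`. The median is used instead of the mean so that a single wildly wrong reading does not shift the reference itself.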
15 Fault Recovery in WSN
- The most common technique: replication (redundancy of components)
- Active replication (request is processed by all replicas)
  - Inherent in WSN
  - Multipath routing
  - Information in sensor value aggregation
  - Ignore values from faulty nodes
- Passive replication (request is processed by a primary replica; backup replicas are then synchronized)
  - 1. Fault detection
  - 2. Election (self, group, hierarchical)
  - 3. Service distribution
    - Pre-copy (dynamic role assignment)
    - Code update (Maté, Agilla, Impala, ...)
16 Outline of Today's Lecture
- Fault Model
- Fault Tolerance in WSN
- Fault Tolerance in Collaborative Sensor Networks for Target Detection
17 Fault Tolerance in Collaborative Sensor Networks for Target Detection IEEE TRANSACTIONS ON COMPUTERS, VOL. 53, NO. 3, MARCH 2004 Thomas Clouqueur, Kewal K. Saluja, and Parameswaran Ramanathan
18 Target Detection Problem
- Targets emit signals characterizing their presence (sound, pressure, temperature, etc.)
- Signal strength decreases with distance
- SNs collaborate by exchanging and fusing their local information to produce a result global to the sensor field
- Detection results need to be available at each node
19 Difference from Agreement Problem Nodes sharing information may contain local information that can be totally different from one node to another. In target detection, nodes close to the target report high signal measurement, while nodes far from the target report low signal measurements. Thus, in fusion, there is a lack of common truth in the measured values. Yet, it is desirable to arrive at a common value or common values and determine the impact of faults in the methods developed to arrive at consensus.
20 Fault Model
- Faults include misbehaviours ranging from simple crash faults, where a node becomes inactive, to Byzantine faults, where the node behaves arbitrarily or maliciously
- Faulty nodes are assumed to send inconsistent and arbitrary values to other nodes during information sharing
- No communication failures (reliable links)
[Figure example: the target is outside R and C is faulty; A and D may conclude the presence of a target.]
21 Motivation The algorithm for target detection needs to be robust to such inconsistent behavior, which can jeopardize the collaboration in the sensor network. For example, if the detection results trigger subsequent actions at each node, then inconsistent detection results can lead each node to operate in a different mode, resulting in the sensor network going out of service. The performance of fusion is therefore also defined by precision. Precision measures how close the nodes' decisions are to each other, the goal being that all nodes obtain the same decision.
22 Centralized/Decentralized Target Detection
- Centralized detection: all local sensors communicate their data to a central processor that performs optimal or near-optimal detection
  - The correctness of such a scheme relies on the central node's correctness; central node-based schemes therefore have low robustness to sensor failure
- Decentralized detection: some preliminary processing of data is performed at each sensor node, so that compressed information is gathered at the fusion center
  - Loss of performance, but reduced communication bandwidth
  - Improves reliability
  - The performance loss may be reduced by optimally processing the information at each sensor node
23 FT Fusion Algorithms
- Nodes share their information
- Nodes use a fusion rule to arrive at a decision
- Algorithms guarantee:
  - All non-faulty nodes obtain the same set S of data
  - S contains all data sent by non-faulty nodes
- Problem: consistent outliers remain in the set
  - The largest and smallest data are dropped
  - The average is computed over the remaining data
- Different fusion algorithms can be derived by varying the size of the information shared between sensor nodes
24 Value/Decision Fusion
- Two extreme cases:
  - Value fusion: nodes exchange raw measurements
  - Decision fusion: nodes exchange local detection decisions
- Value Fusion Algorithm (at each node):
  - Obtain raw measurements from every node
  - Drop the p largest values and the p smallest values (step needed for faulty nodes)
  - Compute the AVERAGE of the remaining values
  - Compare the AVERAGE to THRESHOLD for the final decision
- Decision Fusion Algorithm: works in the same way as the value fusion algorithm
- 2p < N/3
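The two fusion rules can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the use of `>=` for the threshold comparison, the 0/1 encoding of local decisions, and the separate `local_threshold` and `fusion_threshold` parameters are assumptions made for the example.

```python
from typing import Sequence


def trimmed_average(values: Sequence[float], p: int) -> float:
    """Average after dropping the p largest and the p smallest values."""
    if p < 0 or 2 * p >= len(values):
        raise ValueError("need 0 <= p and 2*p < number of values")
    kept = sorted(values)[p:len(values) - p]
    return sum(kept) / len(kept)


def value_fusion(measurements: Sequence[float], p: int, threshold: float) -> bool:
    """Fuse raw measurements received from all nodes into one decision."""
    return trimmed_average(measurements, p) >= threshold


def decision_fusion(measurements: Sequence[float], p: int,
                    local_threshold: float, fusion_threshold: float) -> bool:
    """Fuse local 0/1 decisions instead of raw measurements.

    In a real network each node would compute and send its own decision;
    here the local decisions are derived from the measurements for brevity.
    """
    local = [1.0 if m >= local_threshold else 0.0 for m in measurements]
    return trimmed_average(local, p) >= fusion_threshold
```

With seven readings `[0.2, 5.1, 4.9, 5.3, 100.0, 4.7, 5.0]` and `p = 1`, the outliers 0.2 and 100.0 are dropped before averaging, so a single faulty node reporting 100.0 cannot push the fused value above a threshold of 6.0 on its own.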
25 Evaluation Metrics
- Precision
  - Measures the closeness of the final decisions obtained by all sensor nodes, the goal being that all non-faulty nodes obtain the same decision
  - If f < N/3, precision is guaranteed
- Accuracy
  - Measures how well the node decisions represent the environment, the goal being that the decision of non-faulty nodes is "object detected" if and only if a target is present
  - Measured by false alarm probability and detection probability
  - Determined by THRESHOLD, noise, target position, node placement
  - If f < N/3, relative accuracy is guaranteed (due to noise)
- Communication overhead
  - Number and size of messages exchanged
- Robustness
  - Measured by system failure probability
  - System failure occurs when the number of faulty nodes exceeds the bound of tolerable faulty nodes
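A few of these metrics can be made concrete with a short sketch. It reads f as the number of faulty nodes and N as the total number of nodes, which the slide implies but does not state; the function names and the trial format (pairs of "target present" and "node decided present") are assumptions for illustration only.

```python
from typing import List, Sequence, Tuple


def within_fault_bound(f: int, n: int) -> bool:
    """Check the bound f < N/3 using integer arithmetic."""
    return 3 * f < n


def max_tolerable_faults(n: int) -> int:
    """Largest f that still satisfies f < N/3 (for n >= 1)."""
    return (n - 1) // 3


def decisions_agree(decisions: Sequence[bool]) -> bool:
    """Precision goal: all non-faulty nodes reach the same decision."""
    return len(set(decisions)) <= 1


def detection_and_false_alarm_rates(
        trials: List[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Estimate detection and false alarm probabilities from trials.

    Each trial is (target_present, decided_present). Both kinds of
    trial must occur at least once.
    """
    with_target = [decided for present, decided in trials if present]
    without_target = [decided for present, decided in trials if not present]
    if not with_target or not without_target:
        raise ValueError("need trials both with and without a target")
    detection = sum(with_target) / len(with_target)
    false_alarm = sum(without_target) / len(without_target)
    return detection, false_alarm
```

For N = 10 nodes, `max_tolerable_faults(10)` gives 3, and a system failure in the sense of the robustness metric would correspond to `within_fault_bound(f, 10)` being false.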
26 Comparison of Algorithms
- Compare metrics in absence/presence of faulty nodes
- Due to the reduced amount of information shared in decision fusion, the communication cost is lower in decision fusion than in value fusion.
- The robustness is identical for value and decision fusion, since failures depend on the number of faulty nodes present and not on the algorithm used.
- The performance measured in terms of precision and accuracy differs.