1 Software Fault Tolerance (SWFT) SWFT for Wireless Sensor Networks (Lec 1) Dependable Embedded Systems & SW Group www.deeds.informatik.tu-darmstadt.de.

1 Software Fault Tolerance (SWFT) SWFT for Wireless Sensor Networks (Lec 1) Dependable Embedded Systems & SW Group www.deeds.informatik.tu-darmstadt.de Prof. Neeraj Suri Abdelmajid Khelil Dept. of Computer Science TU Darmstadt, Germany

2 Typical Wireless Sensor Networks (WSN)  Large number of cheap sensing devices (sensor nodes, or SNs)  Self-organized  Multi-hop communication [Figure: motes forward data to a sink, which reaches the user via Internet, GSM, satellite, etc.]

3 Application Spectrum: Measurement vs. Event Src: University of Virginia

4 Typical Network Functionalities  Data Collection (SNs → Sink)  Convergecast  Aggregation  Routing ..  Data Dissemination (Sink → SNs)  Code update (Reprogramming)  Query  Diagnosis .. [Figure: event and sink]

5 Challenges for Fault-Tolerance (FT)  Cheap components  Scarcity of resources: Processing/storage/communication  Finite energy supply  In situ in physical environment  Test environment different from the deployment environment  Evolvable deployment conditions  Difficulty of testing after deployment  Failure-prone components  FT is as critical as other performance metrics (energy efficiency, latency & accuracy) in supporting distributed Apps.  Earlier work from wired networks does not directly apply to WSNs

6 Outline of Today’s Lecture  Fault Model  Fault Tolerance in WSN  Fault Tolerance in Collaborative Sensor Networks for Target Detection © DEEDS Group SWFT WS ‘07

7 Fault, Error and Failure in WSN A sensor service running on node A is expected to periodically send the measurements of its sensors to an aggregation service running on node B.  However, node A suffers an impact causing a loose connection with one of its sensors (Fault).  Since the code implementing node A’s service is not designed to detect and overcome such situations, an erroneous state is reached (Error) when the sensor service tries to acquire data from the sensor.  Due to this state, the service does not send sensor data (Failure) to the aggregation service within the specified time interval. This results in a crash or omission failure of node A observed by node B.

8 Sources of Faults  Fragile sensor nodes  Node failures Depletion of batteries Harsh environment: Damage, short circuits, enclosure etc.  Incorrect sensor reading or processing Due to low battery level, etc.  SW bugs (e.g. in routing layer)  Erroneous wireless communication  Reading transmission failures Temporary/permanent link failures Packet corruption/loss Network partitioning  Sink faults  Security breaches  Byzantine faults: Overlap between faults and security breaches describing arbitrary behaviour

9 Failure Classification  Omission  Crash  Omission: A service is sporadically not responding to requests (e.g. due to msg loss)  Crash: The service at some point stops responding to any request (e.g. after f omissions)  Timing: The service's response is received outside the time interval specified by the application (too early or too late).  Value: A service sends a timely response but with lack of accuracy.  Arbitrary: Includes all other failures!  E.g. Byzantine failures: Failures that are in general caused by a malicious service that behaves erroneously and inconsistently
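The failure classes above can be sketched as a small classifier. This is only an illustration under stated assumptions: the `Response` record, the reference value `true_value`, the `tolerance`, and the rule "crash = the last f observations were all omissions" are hypothetical names and choices, not part of the lecture.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Response:
    """One observed reply of a service (hypothetical record type)."""
    arrival_time: Optional[float]  # None means that no reply arrived at all
    value: Optional[float]


def classify(resp: Response, window: Tuple[float, float],
             true_value: float, tolerance: float) -> str:
    """Map a single reply to one of the failure classes from the slide."""
    if resp.arrival_time is None:
        return "omission"                      # no reply, e.g. message loss
    start, end = window
    if not (start <= resp.arrival_time <= end):
        return "timing"                        # too early or too late
    if abs(resp.value - true_value) > tolerance:
        return "value"                         # timely but inaccurate
    return "ok"


def is_crashed(history: List[str], f: int) -> bool:
    """Assumed rule: the service counts as crashed once the last f
    observations in `history` are all omissions."""
    if f <= 0 or len(history) < f:
        return False
    return all(label == "omission" for label in history[-f:])
```

Arbitrary (Byzantine) failures are deliberately not covered: by definition they cannot be recognised from a single reply, because the service may answer differently to different observers.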

10 Node Architecture [Figure: layered node architecture with the layers Hardware, Operating System, Middleware and Application, plus Power Management. Labels shown: threads, address space, files, hardware drivers; physical, data link, network, transport; sensor driver, sensor hardware; middleware management, algorithms, modules, services, virtual machine; App1, App2, App3.] Src: Abdul-Halim Jallad and Tanya Vladimirova

11 Fault Classification and Propagation in WSN [Figure: event, sensor network, sink, and user reached via Internet, GSM, satellite, etc.]

12 Fault Tolerance (FT)  Fault tolerance: the ability to sustain sensor network functionalities without any interruption due to failures  FT techniques taxonomy  Fault prevention  Fault detection  Fault isolation  Fault identification  Fault recovery  FT can be addressed at different layers (HW, SW, MW, network and application)  There is always a trade-off between FT and efficiency

13 Fault Prevention  Sensor node deployment/placement  Connectivity  Coverage  Relay node placement in two-tiered networks  Power on-off  Transmission range adjustment  Sensor network monitoring ("watch the watch-dog!")  Active  Passive

14 Fault Detection (Isolation and Identification)  Self-diagnosis Node itself can identify faults. Examples:  Measuring the current battery voltage  Measuring comm. link quality  Cooperative diagnosis (group detection) Several nodes monitor the behavior of another node. Examples:  "Sensors from the same region should have similar values" (detailed in SWOSFT)  Consumer nodes observe the behaviour of the service provider (e.g. next-hop relay nodes).  Hierarchical detection  Uses a tree  In general detection is shifted to more powerful nodes (e.g. sink)  Frameworks: Memento, Sympathy and SNIF  Trend: Region-level diagnosis
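A minimal sketch of the cooperative rule "sensors from the same region should have similar values". It assumes, purely for illustration, that the regional median is used as the reference and that a fixed `max_deviation` separates plausible from suspicious readings; neither choice comes from the lecture.

```python
from statistics import median
from typing import Dict, Set


def suspect_nodes(readings: Dict[str, float], max_deviation: float) -> Set[str]:
    """Return the nodes whose reading differs from the regional median
    by more than max_deviation (hypothetical detection rule)."""
    if not readings:
        return set()
    center = median(readings.values())
    return {node for node, value in readings.items()
            if abs(value - center) > max_deviation}


# Example: four neighbours measure temperature, node C reports an outlier.
region = {"A": 20.1, "B": 19.8, "C": 35.0, "D": 20.4}
print(suspect_nodes(region, max_deviation=2.0))  # {'C'}
```

The median is chosen over the mean so that a single faulty reading cannot pull the reference value towards itself.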

15 Fault Recovery in WSN The most common technique: Replication (Redundancy of components)  Active replication (request is processed by all replicas)  Inherent in WSN Multipath routing Information in sensor value aggregation Ignore values from faulty nodes  Passive replication (request is processed by a primary replica, backup replicas are then synchronized)  1. Fault detection  2. Election (self, group, hierarchical)  3. Service distribution Pre-Copy (dynamic role assignment) Code update (Maté, Agilla, Impala..)
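The first two steps of passive replication (fault detection, then election) might look as follows. This is a sketch under assumptions: liveness is judged by the age of the last heartbeat, and the election rule "highest node ID wins" is a hypothetical stand-in for the self, group or hierarchical election named on the slide. Step 3, service distribution, is left out.

```python
from typing import Dict, Set


def elect_primary(alive: Set[int]) -> int:
    """Election step: pick the live replica with the highest ID (assumed rule)."""
    if not alive:
        raise RuntimeError("no replica left to take over the service")
    return max(alive)


def recover(primary: int, replicas: Set[int],
            heartbeat_age: Dict[int, float], timeout: float) -> int:
    """Return the node that should act as primary after one detection round."""
    # 1. Fault detection: a replica counts as alive if its last heartbeat
    #    is at most `timeout` seconds old; unknown nodes count as dead.
    alive = {n for n in replicas
             if heartbeat_age.get(n, float("inf")) <= timeout}
    if primary in alive:
        return primary              # primary is healthy, nothing to do
    # 2. Election among the surviving backup replicas.
    return elect_primary(alive)


ages = {1: 12.0, 2: 0.5, 3: 1.0}    # seconds since the last heartbeat
print(recover(primary=1, replicas={1, 2, 3}, heartbeat_age=ages, timeout=5.0))  # 3
```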

16 Outline of Today’s Lecture  Fault Model  Fault Tolerance in WSN  Fault Tolerance in Collaborative Sensor Networks for Target Detection

17 Fault Tolerance in Collaborative Sensor Networks for Target Detection IEEE TRANSACTIONS ON COMPUTERS, VOL. 53, NO. 3, MARCH 2004 Thomas Clouqueur, Kewal K. Saluja, and Parameswaran Ramanathan

18 Target Detection Problem  Targets  Emit signals characterizing their presence (sound, pressure, temperature etc.)  Signal strength decreases with the distance  SNs collaborate by exchanging and fusing their local information to produce a result global to the sensor field.  Detection results need to be available at each node

19 Difference from Agreement Problem  Nodes sharing information may contain local information that can be totally different from one node to another. In target detection, nodes close to the target report high signal measurement, while nodes far from the target report low signal measurements. Thus, in fusion, there is a lack of common truth in the measured values. Yet, it is desirable to arrive at a common value or common values and determine the impact of faults in the methods developed to arrive at consensus.

20 Fault Model  Faults include misbehaviors ranging from simple crash faults, where a node becomes inactive, to Byzantine faults, where the node behaves arbitrarily or maliciously  Faulty nodes are assumed to send inconsistent and arbitrary values to other nodes during information sharing  No comm. failures (reliable links)  Example: Target is outside R, C is faulty → A and D may conclude the presence of a target!

21 Motivation  The algorithm for target detection needs to be robust to such inconsistent behavior that can jeopardize the collaboration in the sensor network.  For example, if the detection results trigger subsequent actions at each node, then inconsistent detection results can lead each node to operate in a different mode, resulting in the sensor network going out of service.  The performance of fusion is therefore also defined by precision. Precision measures the closeness of the decisions to each other, the goal being that all nodes obtain the same decision.

22 Centralized/Decentralized Target Detection  Centralized detection: All local sensors communicate their data to a central processor performing optimal or near optimal detection  The correctness of such a scheme relies on the central node’s correctness; central-node-based schemes therefore have low robustness to sensor failure.  Decentralized detection: Some preliminary processing of data is performed at each sensor node so that compressed information is gathered at the fusion center  Loss of performance, but reduced communication bandwidth  Improves reliability  The performance loss may be reduced by optimally processing the information at each sensor node

23 FT Fusion Algorithms  Nodes share their information  Nodes use a fusion rule to arrive at a decision  Algorithms guarantee  All non-faulty nodes obtain the same set S of data  S contains all data sent by non-faulty nodes  Problem: Consistent outliers remain in the set → the largest and smallest data are dropped.  Average is computed over remaining data  Different fusion algorithms can be derived by varying the size of the information shared between sensor nodes.

24 Value/Decision Fusion  Two extreme cases:  Value fusion: Nodes exchange raw measurements  Decision fusion: Nodes exchange local detection decisions  Value Fusion Algorithm  At each node Obtain raw measurements from every node Drop the p largest values and the p smallest values (step needed for faulty nodes) Compute the AVERAGE of remaining values Compare the AVERAGE to THRESHOLD for final decision  Decision Fusion Algorithm  It works in the same way as the value fusion algorithm 2* p < N/3
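The two fusion rules can be sketched as follows. The sketch assumes that the information-sharing step has already run, so each node holds the same list of values; the strict `>` comparison against the threshold, the parameter name `local_threshold`, and the sample numbers are illustrative assumptions.

```python
from typing import List


def value_fusion(measurements: List[float], p: int, threshold: float) -> bool:
    """Drop the p largest and p smallest values, average the rest and
    compare the average to the threshold. True means 'target detected'."""
    if p < 0 or 2 * p >= len(measurements):
        raise ValueError("p must satisfy 0 <= 2*p < number of measurements")
    ordered = sorted(measurements)
    kept = ordered[p:len(ordered) - p]      # trims both extremes
    average = sum(kept) / len(kept)
    return average > threshold


def decision_fusion(measurements: List[float], p: int,
                    local_threshold: float, threshold: float) -> bool:
    """Same procedure, but applied to local 0/1 decisions instead of
    raw measurements."""
    local = [1.0 if m > local_threshold else 0.0 for m in measurements]
    return value_fusion(local, p, threshold)


# Seven nodes, no target present; one faulty node reports 50.0.
samples = [1.0, 1.2, 0.9, 1.1, 50.0, 1.0, 0.8]
print(value_fusion(samples, p=0, threshold=2.0))   # True: false alarm without trimming
print(value_fusion(samples, p=1, threshold=2.0))   # False: the outlier is dropped
print(decision_fusion(samples, p=1, local_threshold=2.0, threshold=0.5))  # False
```

Decision fusion sends one bit per node instead of a raw measurement, which is where its lower communication cost comes from.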

25 Evaluation Metrics  Precision  Measures the closeness of the final decisions obtained by all sensor nodes, the goal being that all non-faulty nodes obtain the same decision  If f < N/3 → Precision is guaranteed  Accuracy  Measures how well the node decisions represent the environment, the goal being that the decision of non-faulty nodes is “object detected” if and only if a target is present  Measured by false alarm probability and detection probability  Determined by THRESHOLD, noise, target position, node placement.  If f < N/3 → Relative accuracy is guaranteed (due to noise)  Communication overhead  Number and size of messages exchanged  Robustness  Robustness is measured by system failure probability  System failure when the faulty nodes exceed the bound of tolerable faulty nodes
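As a worked example of the robustness metric, the system failure probability can be computed as a binomial tail. This assumes, beyond what the slide states, that each of the N nodes is faulty independently with the same probability q, and that the system fails as soon as the number of faulty nodes f no longer satisfies f < N/3.

```python
from math import comb


def system_failure_probability(n: int, q: float) -> float:
    """P(number of faulty nodes >= n/3) for n nodes that are each faulty
    independently with probability q (assumed failure model)."""
    return sum(comb(n, k) * q**k * (1 - q)**(n - k)
               for k in range(n + 1)
               if 3 * k >= n)          # k violates the bound f < n/3


# With 4 nodes the bound f < 4/3 tolerates one faulty node, so the
# system fails when 2, 3 or 4 nodes are faulty.
print(system_failure_probability(4, 0.5))   # 0.6875 = (6 + 4 + 1) / 16
```

Because the bound depends only on the number of faulty nodes, the same value applies to value fusion and to decision fusion.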

26 Comparison of Algorithms Compare metrics in absence/presence of faulty nodes  Due to the reduced amount of information shared in decision fusion, the communication cost is lower in decision fusion than in value fusion.  The robustness is identical for value and decision fusion since failures depend on the number of faulty nodes present and not on the algorithm used.  The performance measured in terms of precision and accuracy differs.

