Published by Edwin Watson. Modified over 9 years ago.
1
Filter Creation for Application Level Fault Tolerance and Detection
Eric Ciocca, Israel Koren, C.M. Krishna
ECE Department, UMass Amherst
2
Overview
Our approach to fault detection and tolerance relies on an application’s inherent familiarity with its own data.
Fault detection and tolerance are handled at the application level.
Applications do not need hardware or middleware to provide fault recovery.
To identify trends in application data, the developer must be familiar with what the data represents.
An existing trend can be used for fault detection, but it must be defined quantitatively so that the application can detect it.
3
What is ALFTD?
ALFTD complements existing system or algorithm level fault tolerance by leveraging information available only at the application level.
Using such application level semantic information significantly reduces the overall cost of providing fault tolerance.
ALFTD may be used alone or to supplement other fault detection schemes.
ALFTD is tunable: it permits users to trade off fault tolerance against computation overhead.
Allowing more overhead for ALFTD produces better results.
4
Principles of ALFTD
Every physical node runs its own work (P, primary) as well as a scaled-down copy of a neighboring node’s work (S, secondary).
If a fault corrupts a process, the corresponding secondary of that task will still produce output, albeit at a lower (but acceptable) quality.
Node 1: P1, S4
Node 2: P2, S1
Node 3: P3, S2
Node 4: P4, S3
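The node layout on this slide can be sketched in a few lines of Python. This is only an illustration of the placement shown in the figure (node i hosts the secondary of the previous node, wrapping around); the function name `assign_secondaries` is hypothetical and not taken from the presentation.

```python
def assign_secondaries(num_nodes):
    """Map each node to (primary, secondary) following the slide's ring layout.

    Node i runs primary i plus a scaled-down copy (secondary) of the
    previous node's task; node 1 wraps around to host the last node's copy.
    """
    layout = {}
    for node in range(1, num_nodes + 1):
        neighbor = node - 1 if node > 1 else num_nodes
        layout[node] = (f"P{node}", f"S{neighbor}")
    return layout


layout = assign_secondaries(4)
for node, (primary, secondary) in layout.items():
    print(f"Node {node}: {primary}, {secondary}")
```

With four nodes this reproduces the pairing in the figure, so a fault on any single node leaves that node's task with a secondary copy on a different node.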
5
Fault Detection
Faults do not always completely disable a node.
Malformed and corrupted data are also possible.
Hardware-disabling faults are easy to detect with watchdog hardware and “I am alive” messages.
Faulty data is difficult to detect without application syntax.
Fault detection is a necessary condition for ALFTD to schedule which secondary nodes to run.
Secondary processes can provide verification for ambiguously faulty data.
6
Principles of ALFTD Filters
Faults are detected by passing results through one or more acceptance filters.
Filters are unique to applications with certain data characteristics.
Value bound tests are applicable to most applications.
Sanity checks require knowledge of the expected output value and format.
[Figure labels: Results from Primary, Filter 1, Filter 2, Pass, Fail, Data is OK, Secondary Task Queue]
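A minimal sketch of an acceptance-filter chain, assuming that a result which fails any filter causes the corresponding task to be placed on a secondary task queue. The filter functions, the bounds of 0 to 100, and the queue itself are hypothetical stand-ins, not the presentation's implementation.

```python
def is_finite_number(value):
    """Sanity check: the result must be a real, finite number."""
    if not isinstance(value, (int, float)):
        return False
    return value == value and abs(value) != float("inf")  # rejects NaN and inf


def in_bounds(value, low=0.0, high=100.0):
    """Value bound test; the bounds here are illustrative only."""
    return low <= value <= high


secondary_queue = []  # tasks whose primary output was rejected


def accept_or_reschedule(task_id, result, filters):
    """Return True if the result passes every filter, else queue the secondary."""
    if all(check(result) for check in filters):
        return True
    secondary_queue.append(task_id)
    return False


filters = [is_finite_number, in_bounds]  # sanity check runs first
ok_good = accept_or_reschedule("t1", 42.0, filters)
ok_high = accept_or_reschedule("t2", 250.0, filters)
ok_nan = accept_or_reschedule("t3", float("nan"), filters)
```

Ordering the cheap sanity check first means later filters can assume well-formed input.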
7
OTIS Characteristics
ALFTD was applied to OTIS (Orbital Thermal Imaging Spectrometer), part of the REE suite.
OTIS reads radiation values from various bands and calculates temperature data.
The output can be viewed graphically or numerically.
OTIS lends itself to ALFTD because the output data (temperature) has two properties:
Local Correlation: data changes gradually over an area.
Absolute Bounds: data falls within some expected realistic range.
8
ALFTD in OTIS
Local Correlation and Absolute Bounds on the data led to the creation of two data fault filters.
Spatial Locality Filter: if the difference between pixel (x,y) and pixel (x-1,y) is greater than some threshold, the pixel may be the result of faulty data.
Absolute Bounds Filter: any pixel whose value does not fall between a lower and an upper bound may be the result of faulty data.
The filter thresholds are set based on the sample datasets provided.
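The two filters can be sketched as follows, assuming the image is a list of rows of temperature values and that `row[x - 1]` is the pixel at (x-1, y). The threshold and bound values in the example are made up for illustration; the slide does not give the actual constants.

```python
def bounds_filter(image, low, high):
    """Flag every pixel whose value is not strictly between low and high."""
    return [[not (low < value < high) for value in row] for row in image]


def spatial_locality_filter(image, threshold):
    """Flag pixel (x, y) when it differs from pixel (x-1, y) by more than threshold.

    The first pixel of each row has no left neighbor and is never flagged.
    """
    flags = []
    for row in image:
        if not row:
            flags.append([])
            continue
        row_flags = [False]
        for left, current in zip(row, row[1:]):
            row_flags.append(abs(current - left) > threshold)
        flags.append(row_flags)
    return flags


# Illustrative values only (not OTIS data).
image = [
    [290, 291, 350, 292],
    [289, 290, 291, 10],
]
bounds_flags = bounds_filter(image, low=200, high=330)
locality_flags = spatial_locality_filter(image, threshold=20)
```

Note that a single-neighbor comparison also flags the good pixel immediately to the right of a corrupted one (292 in the first row), since the difference is measured against the corrupted value.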
9
OTIS Datasets
[Figure: the “Blob”, “Stripe”, and “Spots” datasets, each shown in a faulty and a faultless version]
10
OTIS Datasets with ALFTD
[Figure: the “Blob”, “Stripe”, and “Spots” datasets; row labels: Faulty, ALFTD Corrected Faulty]
11
Problem
ALFTD filters require calibration.
Calibration constants are context sensitive.
Filter values can be approximated, but gains can be made in detection efficiency with well-tuned filters.
Heuristics are created based on characteristics of the most frequent data.
12
Frequency Plots (bounds filter)
[Plot: frequency of temperature values]
13
Frequency Plots (spatial locality filter)
[Plot: frequency of differences between adjacent pixels]
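The two frequency plots summarize the distribution of pixel values and of differences between horizontally adjacent pixels. A rough sketch of how such counts could be tabulated is given below; the binning scheme and function names are assumptions, and the sample values are illustrative rather than OTIS data.

```python
from collections import Counter


def value_histogram(image, bin_width):
    """Count pixel values, grouped into bins labeled by their lower edge."""
    return Counter((v // bin_width) * bin_width for row in image for v in row)


def difference_histogram(image, bin_width):
    """Count differences between each pixel and its left neighbor, binned."""
    return Counter(
        ((current - left) // bin_width) * bin_width
        for row in image
        for left, current in zip(row, row[1:])
    )


image = [
    [290, 291, 295],
    [300, 302, 310],
]
value_counts = value_histogram(image, bin_width=5)
diff_counts = difference_histogram(image, bin_width=5)
```

The sparsely populated tails of these histograms are where candidate filter cutoffs would be placed.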
14
Approach
To test the detection characteristics of a scheme, an erroneous case and a control case of the same data are needed.
Errors may produce different kinds and intensities of faults, so it is important to decide what sort of errors we want to detect.
In the case of OTIS, intensely faulty data (set-to-zero errors, memory gibberish) is easily detected, as it seldom falls inside the prescribed filters.
Our experiments include moderately faulty data: offsets in input values of up to 30%.
These faults tend to blend in with non-faulty data, making them especially hard to detect.
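One way to build an erroneous case from a control case is sketched below. It assumes that each value is corrupted independently with some probability and that the offset is drawn uniformly from plus or minus 30% of the value; the slide states only the 30% ceiling, so the distribution, the sign, and the name `inject_offsets` are assumptions.

```python
import random


def inject_offsets(values, fault_rate, max_offset=0.30, seed=0):
    """Return (faulty_values, fault_mask) derived from fault-free values.

    Each value is corrupted with probability fault_rate by a relative
    offset drawn uniformly from [-max_offset, +max_offset] (assumed model).
    """
    rng = random.Random(seed)  # fixed seed keeps the experiment repeatable
    faulty, fault_mask = [], []
    for value in values:
        if rng.random() < fault_rate:
            offset = rng.uniform(-max_offset, max_offset)
            faulty.append(value * (1.0 + offset))
            fault_mask.append(True)
        else:
            faulty.append(value)
            fault_mask.append(False)
    return faulty, fault_mask


control = [300.0] * 1000  # illustrative fault-free input
faulty, fault_mask = inject_offsets(control, fault_rate=0.1)
```

Keeping the mask alongside the corrupted data makes it possible to score a filter later: every flagged value can be classified as a true detection or a false alarm.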
15
Approach
Filters can be adjusted in steps of increasing complexity.
A single filter has a high and a low cutoff.
The “left” and “right” bounds of data are usually exclusive, therefore their detections act cumulatively.
In each filter, a tradeoff must be resolved between the desired fault detection rate and the number of incurred false alarms.
Multiple filters are independently calibrated.
Multiple filters will not necessarily detect different faults.
Many filters working at a low expected detection rate may detect the same or more faults for a system than a single filter working with a high expected detection rate.
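The tradeoff between detection rate and false alarms can be measured for any pair of cutoffs when a fault mask is available. The sketch below assumes a bounds-style filter that flags values outside `low < value < high`; the function name and sample numbers are hypothetical.

```python
def detection_rates(values, fault_mask, low, high):
    """Return (fault_detection_rate, false_alarm_rate) for one cutoff pair.

    Detection rate = flagged faulty values / all faulty values.
    False alarm rate = flagged fault-free values / all fault-free values.
    """
    detections = false_alarms = 0
    num_faults = sum(fault_mask)
    num_clean = len(fault_mask) - num_faults
    for value, is_fault in zip(values, fault_mask):
        flagged = not (low < value < high)
        if flagged and is_fault:
            detections += 1
        elif flagged:
            false_alarms += 1
    detection_rate = detections / num_faults if num_faults else 0.0
    false_alarm_rate = false_alarms / num_clean if num_clean else 0.0
    return detection_rate, false_alarm_rate


# Illustrative data: True marks a value known to be faulty.
values = [280, 290, 300, 310, 200, 400, 295, 305]
fault_mask = [False, False, False, False, True, True, True, False]

tight = detection_rates(values, fault_mask, low=285, high=320)
loose = detection_rates(values, fault_mask, low=250, high=320)
```

In this toy data the moderately faulty value 295 sits inside the normal range and is missed by both settings, while the tighter low cutoff adds a false alarm on 280 without catching any additional fault.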
16
Detection Plots (single side)
[Plot: fault detections and false alarms on a left-sided filter]
17
Detection Plots (both sides)
By overlaying the left and right filter plots, general detection traits can be observed.
18
Fault Detections, Numerically
[Table: Bounds Filter Fault Detections. Columns = left filter, rows = right filter.]
This table is used to find the possible configurations that satisfy a minimum fault detection rate.
19
False Alarms, Numerically
[Table: Bounds Filter False Alarms. Columns = left filter, rows = right filter.]
Of the possible combinations chosen from the previous table, we can choose the one with the minimum number of false alarms.
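The two-table selection procedure can be written as a short search. The sketch assumes each table is a mapping from a (left cutoff, right cutoff) pair to a rate; the numbers below are placeholders invented for the example and are not the values from the presentation's tables.

```python
def choose_configuration(detections, false_alarms, min_detection):
    """Pick the cutoff pair with the fewest false alarms among those
    that meet the minimum fault detection rate. Returns None if none do."""
    candidates = [
        config for config, rate in detections.items() if rate >= min_detection
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda config: false_alarms[config])


# Placeholder tables keyed by (left cutoff, right cutoff); values are made up.
detections = {
    (270, 320): 0.60,
    (270, 310): 0.75,
    (280, 320): 0.80,
    (280, 310): 0.90,
}
false_alarms = {
    (270, 320): 0.01,
    (270, 310): 0.04,
    (280, 320): 0.03,
    (280, 310): 0.10,
}

best = choose_configuration(detections, false_alarms, min_detection=0.75)
unreachable = choose_configuration(detections, false_alarms, min_detection=0.95)
```

Returning `None` when no configuration meets the target makes it explicit that the requested detection rate cannot be reached with this filter alone.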
20
Detection Plots (both sides, spatial locality filter)
By overlaying the left and right filter plots, general detection traits can be observed.
21
Multiple Filters
By combining multiple filters, fault detection is increased.
To be effective, filters should have distinct fault detection domains.
[Figure labels: Spatial Locality filter, Bounds filter]
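Combining filters amounts to taking the union of their detections, which is why overlapping detection domains add little. A minimal sketch, assuming each filter yields one boolean flag per data point (the flag lists below are illustrative):

```python
def combined_detections(*flag_lists):
    """A data point is flagged if any individual filter flags it."""
    return [any(flags) for flags in zip(*flag_lists)]


# Illustrative per-point flags from two filters.
bounds_flags = [True, False, False, True, False]
locality_flags = [True, True, False, False, False]

combined = combined_detections(bounds_flags, locality_flags)
print(sum(bounds_flags), sum(locality_flags), sum(combined))
```

Each filter flags two points on its own, but the combination flags three rather than four because one fault is caught by both.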
22
Relation Between Datasets
“Blob” is an average dataset, but we need to analyze the behavior of the other datasets.
“Stripe”: any filter setting achieves the same false alarm and fault detection rates as with “Blob”, within a few percent.
“Spots”: this does not hold for the bounds filter.
“Spots” has an average temperature 10 K lower than the others, pushing it closer to the “faulty” region of the bounds filter.
We can relax the filter and accept the cut in efficiency, or predict when the “Spots” climate should be expected and use modified filters.
This is the downfall of using absolute, instead of differential, data as criteria for the filters.
23
Extensions to Other Applications
OTIS was a likely candidate for ALFTD, due to the regularity of its data.
Natural phenomena tend to have regular and predictable behavior.
Other applications dealing with temperature, imaging (NGST), or even geological surveys could have success with these two basic filters.
These filter settings are only useful when considering environments similar to our sample datasets, but the method of calibrating filters is general enough to apply to other datasets and similar applications.
24
Extensions to Novel Datasets
Once a working set of filters is devised, it should be applicable to any dataset which has the same characteristics.
Precalculated filter calibrations could be created to allow for higher fault detection in very specific, localized datasets.
General purpose filters can also be extracted by running through many datasets, but incur performance penalties.
25
Dynamic Filter Calibration
Approximate settings are possible, but these may perform poorly when encountering new data cases.
The application may need to reconfigure its filters for the new data.
This process could be automated, assuming the calibrating computer can obtain at least one control (fault-free) dataset.
Without prior exposure to these novel datasets, automated dynamic reconfiguration should be implemented as a numerically based decision process.
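As one possible form of such a numerically based process, the bounds could be derived from a control dataset so that the false alarm rate on that dataset stays within a chosen budget. This is a sketch under that assumption only; the presentation does not specify the decision rule, and `calibrate_bounds` and its budget parameter are hypothetical. Here a value is accepted when `low <= value <= high`.

```python
def calibrate_bounds(control_values, false_alarm_budget=0.02):
    """Choose (low, high) from fault-free data.

    At most false_alarm_budget of the control values fall outside
    the inclusive range [low, high], split evenly between both tails.
    """
    if not control_values:
        raise ValueError("at least one control value is required")
    ordered = sorted(control_values)
    n = len(ordered)
    tail = int(n * false_alarm_budget / 2)  # values allowed outside per side
    return ordered[tail], ordered[n - 1 - tail]


control = [280 + (i % 40) for i in range(1000)]  # illustrative fault-free data
low, high = calibrate_bounds(control, false_alarm_budget=0.02)
false_alarms = sum(1 for v in control if not (low <= v <= high))
```

A larger budget tightens the bounds, which should raise the detection rate at the cost of more false alarms, matching the tradeoff described earlier in the deck.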
26
Conclusion
Filters are a critical part of ALFTD.
The efficiency of the ALFTD method is contingent on having a successful method of fault detection.
Careful calibration of filters can greatly improve the fault detection capability of ALFTD.
Options for novel datasets:
General purpose filter calibrations
Precalculated filter calibrations
Dynamic calibration
27
Thank you!
For further information, please contact:
Israel Koren (koren@ecs.umass.edu)
C.M. Krishna (krishna@ecs.umass.edu)
Eric Ciocca (eciocca@cyberlore.com)