“Revisiting Fault Diagnosis Agreement in a New Territory” S. C. Wang and K. Q. Yan Operating Systems Review, April 2004, p. 41– 61. An extension of the.

1 “Revisiting Fault Diagnosis Agreement in a New Territory,” S. C. Wang and K. Q. Yan, Operating Systems Review, April 2004, pp. 41–61. An extension of the Byzantine Generals algorithm, and hot off the press.

2 Agreement Problem
In the Byzantine Generals problem, a commanding general issues an “order” and all loyal lieutenant generals must agree on the same order.
A related subproblem is the consensus problem: each processor has its own initial value and must communicate with all other processors so that the healthy processors reach a common value.

3 Consensus constraints
- All the healthy processors agree on a common value (Consensus).
- If there exists a common initial value v_i among ALL the processors, then all the healthy processors must agree on v_i.
Most protocols for solving Byzantine Agreement or consensus are fault-masking protocols: they reach consensus without the fault affecting the outcome.

4 Fault Diagnosis Agreement (FDA)
The goal is to enable each healthy processor to detect and locate the faulty components in the distributed system.
- ALL the healthy processors identify a common set of faulty components in the process of reaching consensus (Agreement).
- No healthy component is falsely detected as faulty by any healthy processor (Fairness).

5 Paper assumes dual failure mode on the network
Most previous papers assume that the faulty components are processors only and that the network is fault-free.
- Here we assume that the processors are fault-free and that the network may be faulty.
Most other papers also assume that faults are malicious only. Here we assume dual failure:
- Malicious faults (a random value is sent), and
- Dormant faults (no value is sent because of a crash, or a stuck-at value is sent).
Assume that a healthy processor can detect components with dormant faults.

6 Assumptions
- A synchronous distributed system whose processors are reliable during the protocol execution.
- Faults (crash, stuck-at, noise, or an intruder) may interfere with message transmission.
- An n-processor fully connected network with m malicious faults and d dormant faults, where m <= ceiling[(n-d-3)/2].
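As a quick sanity check on the bound from slide 6, the sketch below evaluates m <= ceiling[(n-d-3)/2] for a given n and d. The function names are hypothetical, and clamping the result at zero for very small networks is an assumption rather than something the slides state.

```python
import math


def max_malicious(n: int, d: int) -> int:
    """Largest m permitted by the slide's bound m <= ceiling[(n - d - 3) / 2].

    Clamped at 0 (an assumption) so that tiny networks never yield a negative count.
    """
    return max(0, math.ceil((n - d - 3) / 2))


def within_bound(n: int, m: int, d: int) -> bool:
    """True if m malicious and d dormant faults satisfy the bound for n processors."""
    return 0 <= m <= max_malicious(n, d)


# The five-processor example in these slides has one dormant and one malicious fault.
print(max_malicious(5, 1))    # ceiling[(5 - 1 - 3) / 2] = ceiling[0.5] = 1
print(within_bound(5, 1, 1))  # True
print(within_bound(5, 2, 1))  # False
```

With n = 5 and d = 1 the bound allows exactly one malicious fault, which matches the worked example in the deck.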

7 Dual Fault Detection Consensus (DFDC) Algorithm
Three phases:
- Message exchange phase
- Decision making phase
- Fault detection phase
The message exchange phase and the decision making phase are similar to OM(1) in the Byzantine Generals paper. They produce a matrix of information at each processor, MAT_i, which is used to construct a majority vector, MAJ_i.

8 Fault detection phase
Each processor sends its MAT_i to every other processor. Each healthy processor i uses the matrices it receives to find the faults:
- Take the majority value in each position across the received matrices to get FDMAT_i.
- If no majority exists for the (i,j)th position, use the negative value of the (i,j)th position of the MAT_j that was sent.

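9 [Figure: five-processor network with P1 = 0, P2 = 0, P3 = 0, P4 = 1, P5 = 1; one component marked dormant faulty, one marked malicious faulty]
Initial value: V1 = 0, V2 = 0, V3 = 0, V4 = 1, V5 = 1
Vectors received after the first round:
V1  V2  V3  V4  V5
0   0   0   1   x
0   0   0   0   0
0   0   0   0   0
0   1   1   1   1
x   1   1   1   1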
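The first-round table can be reproduced with a small simulation. This is only a sketch under assumptions read off the table rather than stated in the slides: the dormant fault is taken to be the link P1–P5, the malicious fault the link P1–P4, `x` marks a value lost to a dormant fault, and the malicious link is modeled as a bit flip (the slides say a random value is sent). All names are hypothetical.

```python
# Initial values from slide 9.
INITIAL = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1}

# Assumed fault placement (inferred from the table, not given explicitly).
DORMANT_LINKS = {frozenset({1, 5})}
MALICIOUS_LINKS = {frozenset({1, 4})}


def transmit(src: int, dst: int, value: int):
    """Return what dst receives when src sends value over the link src-dst."""
    if src == dst:
        return value  # a processor knows its own value
    link = frozenset({src, dst})
    if link in DORMANT_LINKS:
        return "x"  # nothing usable arrives
    if link in MALICIOUS_LINKS:
        return 1 - value  # bit flip as a stand-in for an arbitrary value
    return value


def first_round(initial):
    """vectors[i] lists the values P_i holds for P_1..P_n after one round."""
    procs = sorted(initial)
    return {i: [transmit(j, i, initial[j]) for j in procs] for i in procs}


def table_rows(vectors):
    """Rows of the slide's table: row j holds entry j of V_1..V_n."""
    procs = sorted(vectors)
    return ["".join(str(vectors[i][k]) for i in procs) for k in range(len(procs))]


vectors = first_round(INITIAL)
for row in table_rows(vectors):
    print(row)
```

Under these assumptions the printed rows are 0001x, 00000, 00000, 01111, and x1111, the same rows shown on the slide, with column i holding the vector V_i received by processor P_i.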

10 [Figure and first-round vectors as on slide 9]
MAT_1          MAJ_1
0 0 0 1 x      0
0 0 0 1 x      0
0 0 0 0 x      0
0 1 1 1 x      1
x 1 1 0 x      1

11 [Figure and first-round vectors as on slide 9]
MAT_2,3        MAJ_2,3
0 0 0 1 x      0
0 0 0 0 0      0
0 0 0 0 0      0
0 1 1 1 1      1
x 1 1 1 1      1

12 [Figure and first-round vectors as on slide 9]
MAT_4          MAJ_4
0 0 0 1 x      0
1 0 0 0 0      0
1 0 0 0 0      0
1 1 1 1 1      1
0 1 1 1 1      1

13 [Figure and first-round vectors as on slide 9]
MAT_5          MAJ_5
x 0 0 1 x      0
x 0 0 0 0      0
x 0 0 0 0      0
x 1 1 1 1      1
x 1 1 1 1      1
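The MAJ columns on slides 10 to 13 are consistent with a row-wise majority over each MAT in which `x` entries do not vote. The sketch below illustrates that reading; it is an inference from the worked example, not the paper's stated procedure, and `majority_vector` is a hypothetical name. Each matrix is written as five strings, one per row, copied from the slides.

```python
from collections import Counter


def majority_vector(mat):
    """Row-wise majority of a MAT, ignoring 'x' entries (assumed rule).

    Ties fall to whichever value appears first in the row; a row made up
    entirely of 'x' yields 'x'. Neither case occurs in the slide data.
    """
    maj = []
    for row in mat:
        votes = Counter(ch for ch in row if ch != "x")
        maj.append(votes.most_common(1)[0][0] if votes else "x")
    return "".join(maj)


MAT_1 = ["0001x", "0001x", "0000x", "0111x", "x110x"]
MAT_2 = ["0001x", "00000", "00000", "01111", "x1111"]  # MAT_3 is identical
MAT_4 = ["0001x", "10000", "10000", "11111", "01111"]
MAT_5 = ["x001x", "x0000", "x0000", "x1111", "x1111"]

for name, mat in [("MAJ_1", MAT_1), ("MAJ_2", MAT_2), ("MAJ_4", MAT_4), ("MAJ_5", MAT_5)]:
    print(name, majority_vector(mat))
```

Every processor ends up with the same majority vector, 00011, even though P1, P4, and P5 each hold a MAT with a corrupted or missing column.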

14 Fault detection phase with processor P1
MAT from P1   MAT from P2   MAT from P3   MAT from P4   MAT from P5   FDMAT
0 0 0 1 x     0 0 0 1 x     0 0 0 1 x     1 1 0 0 1     x x x x x     0 0 0 1 x
0 0 0 1 x     0 0 0 0 0     0 0 0 0 0     0 0 0 0 0     x x x x x     0 0 0 0 0
0 0 0 0 x     0 0 0 0 0     0 0 0 0 0     1 0 0 1 0     x x x x x     0 0 0 0 0
0 1 1 1 x     0 1 1 1 1     0 1 1 1 1     0 0 1 0 0     x x x x x     0 1 1 1 1
x 1 1 0 x     x 1 1 1 1     x 1 1 1 1     0 0 1 1 1     x x x x x     x 1 1 1 1
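The FDMAT on slide 14 can be reproduced by a position-wise vote over the matrices P1 received. The sketch below rests on two assumptions that go beyond the slides: a matrix that arrives entirely as `x` is discarded (slide 5 says dormant faults are detectable), and the most frequent value wins each position, with `x` counted as an ordinary value. The slide's fallback rule for positions with no majority is not modeled. The name `fdmat` is hypothetical.

```python
from collections import Counter


def fdmat(received):
    """Position-wise plurality over received MATs (assumed rule, see lead-in)."""
    # Drop matrices lost entirely to a dormant fault.
    usable = [m for m in received if any(ch != "x" for row in m for ch in row)]
    size = len(usable[0])
    result = []
    for r in range(size):
        row = ""
        for c in range(size):
            votes = Counter(m[r][c] for m in usable)
            row += votes.most_common(1)[0][0]
        result.append(row)
    return result


# Matrices as received by P1 on slide 14, one string per row.
received_by_p1 = [
    ["0001x", "0001x", "0000x", "0111x", "x110x"],  # P1's own MAT
    ["0001x", "00000", "00000", "01111", "x1111"],  # from P2
    ["0001x", "00000", "00000", "01111", "x1111"],  # from P3
    ["11001", "00000", "10010", "00100", "00111"],  # from P4
    ["xxxxx", "xxxxx", "xxxxx", "xxxxx", "xxxxx"],  # from P5
]

for row in fdmat(received_by_p1):
    print(row)
```

On this data the vote yields 0001x, 00000, 00000, 01111, x1111, matching the FDMAT shown on the slide. Position (4,5) is decided by only two of the four usable matrices, so a strict-majority rule would need the slide's fallback there.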

