2
Tolerating Faults in Counting Networks. Marc D. Riedel, Jehoshua Bruck. California Institute of Technology, Parallel and Distributed Computing Group. http://www.paradise.caltech.edu
3
Multiprocessor Coordination: Shared Counting. Processes cooperate to assign successive values. Uses: scheduling, load balancing, resource allocation. [Figure: processes holding successive values 601 through 610.]
4
Multiprocessor Coordination: Centralized Solution. Serialized access. [Figure: processes obtaining values from one shared counter (600, 601, ..., 606).]
5
Multiprocessor Coordination: Centralized Solution. Disadvantages: high contention, low throughput.
6
Counting Networks. Concurrent data structure for multiprocessor coordination (Aspnes, Herlihy & Shavit, 1991). [Figure: animation of bits in the structure changing from 0 to 1.]
9
Counting Networks. Concurrent access by up to n processes. Each process accesses 1/n-th of the bits.
10
Counting Networks. Advantages: low contention, high throughput.
11
Balancer. Asynchronous token routing device with inputs, outputs, and 1 bit of memory. Result: balanced token counts on the outputs. [Figure: animation of tokens passing through the balancer.]
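The "balanced token counts" behaviour can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual toggle convention (not spelled out on the slide) that the single state bit names the output wire for the next token and flips after every token; the names `Balancer`, `route`, and `output_counts` are hypothetical.

```python
class Balancer:
    """Toggle balancer: one bit of state, two output wires (0 = top, 1 = bottom)."""

    def __init__(self):
        self.state = 0  # the single bit of memory

    def route(self):
        """Send one token to the wire named by the state bit, then flip the bit."""
        wire = self.state
        self.state ^= 1
        return wire


def output_counts(num_tokens):
    """Feed num_tokens tokens through a fresh balancer; count tokens per output wire."""
    balancer = Balancer()
    counts = [0, 0]
    for _ in range(num_tokens):
        counts[balancer.route()] += 1
    return counts
```

Under this convention the top output never holds more than one token above the bottom output, whichever input wire the tokens arrived on.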
22
Shared Memory Architectures. Balancer: shared boolean variable.
    type balancer
    begin
        state: boolean;
        top: ptr to balancer;
        bottom: ptr to balancer;
    end
Processes shepherd tokens through the network.
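As a rough Python rendering of the record above, each balancer holds `state`, `top`, and `bottom`, and a process "shepherds" a token by toggling a balancer and following the chosen pointer. The lock is an assumption standing in for whatever atomic toggle the real architecture provides, and the three-balancer tree in `run` is chosen only for brevity; it is not a network from the slides.

```python
import threading


class Balancer:
    """Mirror of the slide's record: state, top, bottom (None marks a network output)."""

    def __init__(self, name):
        self.name = name
        self.state = False
        self.top = None
        self.bottom = None
        self.lock = threading.Lock()  # assumption: stands in for an atomic toggle

    def toggle(self):
        """Atomically flip the state bit and return its previous value."""
        with self.lock:
            old = self.state
            self.state = not old
        return old


def shepherd(entry):
    """Walk one token from `entry` to an output; return (last balancer name, wire)."""
    node = entry
    while True:
        went_bottom = node.toggle()  # old state False -> top, True -> bottom
        nxt = node.bottom if went_bottom else node.top
        if nxt is None:
            return node.name, ("bottom" if went_bottom else "top")
        node = nxt


def run(num_tokens):
    """One thread per token, all entering a small tree a -> (b, c)."""
    a, b, c = Balancer("a"), Balancer("b"), Balancer("c")
    a.top, a.bottom = b, c
    results, results_lock = [], threading.Lock()

    def worker():
        out = shepherd(a)
        with results_lock:
            results.append(out)

    threads = [threading.Thread(target=worker) for _ in range(num_tokens)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because every toggle is atomic, the per-output totals in the quiescent state do not depend on how the threads interleave.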
23
Counting Network. Data structure for multiprocessor coordination (Aspnes, Herlihy & Shavit, 1991). [Figure: network of balancers labeled a through g, with inputs, outputs, and depth marked.]
24
Counting Network. The output is a step sequence. Isomorphic to Batcher’s bitonic sorting network. [Figure: network of balancers labeled a through g.]
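For readers who want something executable, the following Python sketch builds a bitonic counting network recursively (two half-width networks followed by a merger) and checks the step property on the output counts. It follows the textbook recursive construction rather than the slide's figure; the class names and the `exit_counts` helper are hypothetical, and the simulation sends tokens one at a time instead of concurrently.

```python
import random


class Balancer:
    def __init__(self):
        self.state = 0

    def traverse(self):
        wire = self.state
        self.state ^= 1
        return wire


class Merger:
    """Merger[width]: merges two step sequences of width/2 wires each."""

    def __init__(self, width):
        self.width = width
        self.layer = [Balancer() for _ in range(width // 2)]
        self.half = [Merger(width // 2), Merger(width // 2)] if width > 2 else None

    def traverse(self, wire):
        out = 0
        if self.width > 2:
            if wire < self.width // 2:   # first half: even wires go to half[0]
                out = self.half[wire % 2].traverse(wire // 2)
            else:                        # second half: odd wires go to half[0]
                out = self.half[1 - wire % 2].traverse(wire // 2)
        return 2 * out + self.layer[out].traverse()


class Bitonic:
    """Bitonic[width]: two Bitonic[width/2] networks feeding a Merger[width]."""

    def __init__(self, width):
        self.width = width
        self.merger = Merger(width)
        self.half = [Bitonic(width // 2), Bitonic(width // 2)] if width > 2 else None

    def traverse(self, wire):
        out = 0
        subnet = wire // (self.width // 2)
        if self.width > 2:
            out = self.half[subnet].traverse(wire % (self.width // 2))
        return self.merger.traverse(subnet * (self.width // 2) + out)


def has_step_property(counts):
    """True if 0 <= counts[i] - counts[j] <= 1 for every pair i < j."""
    return all(0 <= counts[i] - counts[j] <= 1
               for i in range(len(counts)) for j in range(i + 1, len(counts)))


def exit_counts(width, num_tokens, seed=0):
    """Send tokens in on random input wires; return the tokens counted per output wire."""
    rng = random.Random(seed)
    net = Bitonic(width)
    counts = [0] * width
    for _ in range(num_tokens):
        counts[net.traverse(rng.randrange(width))] += 1
    return counts
```

In this sequential setting the i-th token leaves on wire i mod width regardless of the input wire it arrived on, which is what lets the network act as a shared counter.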
25
Snapshot. Balancer with inputs, outputs, and 1 bit of memory. [Figure: wires labeled with token counts x and y.]
26
Counting Network. Execution trace: token counts on all wires. [Figure: wire counts as extracted: 3 1 3 0 1 2 2 2 2 1 2 2 2 2 1 2.]
27
Fault Tolerance. Dynamic faults in the concurrent data structure: corrupted data, inaccessible data. No errors in control: no lost tokens, no errors in network wiring.
28
Fault Model. After a fault, the balancer's state is inaccessible and tokens bypass the balancer, which leads to an imbalance in token counts. [Figure: animation of tokens bypassing the faulty balancer.]
35
Fault Model. Tokens bypass the balancer. [Figure: tokens received prior to the fault versus tokens received after the fault.]
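The effect of this fault model on a single balancer can be simulated as follows. The sketch assumes that a bypassing token simply stays on the wire it arrived on, which is one plausible reading of "tokens bypass balancer"; the `faulty` flag and the `run` helper are hypothetical names.

```python
class Balancer:
    def __init__(self):
        self.state = 0
        self.faulty = False

    def route(self, in_wire):
        if self.faulty:
            return in_wire      # state inaccessible: the token bypasses the balancer
        out = self.state        # healthy: toggle as usual
        self.state ^= 1
        return out


def run(in_wires, fault_after=None):
    """Route tokens arriving on in_wires; the balancer fails before token `fault_after`."""
    balancer = Balancer()
    counts = [0, 0]
    for i, wire in enumerate(in_wires):
        if i == fault_after:
            balancer.faulty = True
        counts[balancer.route(wire)] += 1
    return counts
```

With six tokens all arriving on the top wire, `run([0] * 6)` gives `[3, 3]`, while `run([0] * 6, fault_after=2)` gives `[5, 1]`: the two tokens received prior to the fault are balanced, and the four received after it are not.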
36
Fault Tolerance. Naïve approach: replicate every balancer. After a fault, the replicated balancers still yield an imbalance in token counts. Doesn’t work! [Figure: animation of tokens passing through replicated balancers, one of which faults.]
45
Fault-Tolerant Balancer. k+1 “pseudo-balancers”, two bits of memory each; tolerates k faults. [Figure: pseudo-balancers labeled L, F, F between inputs and outputs.]
46
Pseudo-Balancer. Two bits of memory: state (up or down) and status (leader (L) or follower (F)).
47
Fault Tolerance. 1st Solution: counting network constructed with FT balancers. [Figure: Counting Network becomes FT Counting Network, which tolerates k faults.]
48
Fault Tolerance. 2nd Solution: rectify errors with a correction network (better provided that …). [Figure: counting network with remapped faulty balancers, followed by a correction network built from FT balancers.]
49
Remapping Faulty Balancers. When a fault leaves a balancer inaccessible, redirect pointers to a spare balancer with a random initial state.
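Pointer redirection can be sketched on a pointer-based balancer record with `state`, `top`, and `bottom` fields. The sketch assumes the faulty balancer's wiring is still known (only its state is lost), that the network is available as a plain list of balancers, and that "random initial state" means a fair coin flip; `remap` and the primed spare name are hypothetical.

```python
import random


class Balancer:
    def __init__(self, name, state=False):
        self.name = name
        self.state = state
        self.top = None
        self.bottom = None


def remap(balancers, faulty, rng=random):
    """Replace `faulty` with a spare balancer and redirect every pointer to it."""
    # The spare starts in a random state: the lost state cannot be recovered.
    spare = Balancer(faulty.name + "'", state=bool(rng.getrandbits(1)))
    # Assumption: the wiring is intact, so the successors can be copied over.
    spare.top, spare.bottom = faulty.top, faulty.bottom
    for b in balancers:
        if b.top is faulty:
            b.top = spare
        if b.bottom is faulty:
            b.bottom = spare
    return [spare if b is faulty else b for b in balancers]
```

After the call no balancer points at the inaccessible one, but the spare's state may disagree with the lost state.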
53
Fault Model. Remapped balancer: the fault appears as a spurious state transition, which leads to an imbalance in token counts. [Figure: wires labeled with token counts x and y.]
59
Error Bound. Error bound for the output sequence of a balancing network with remapped balancers (k faults). [The bound itself did not survive extraction.]
60
Distance Measure. Definition: the distance between two sequences gives the number of “misplaced tokens”. [The symbols and formula did not survive extraction.] [Figure: balancing network with k faults.]
61
Error Bound. Two identical balancing networks, given the same inputs: one with k faults, one with no faults.
62
Error Bound. Execution without faults. [Figure: wire counts as extracted: 3 1 3 0 1 2 2 2 2 1 2 2 2 2 1 2.]
63
Error Bound. Execution with a fault. [Figure: wire counts as extracted, without the fault: 3 1 3 0 1 2 2 2 2 1 2 2 2 2 1 2; with the fault: 3 1 3 0 1 2 2 2 2 1 1 3 2 1 1 3.]
64
Error Bound. Distance between the sequences 2 2 1 2 and 2 1 1 3. [Per-term values as extracted: 1, 0, 1, 0.]
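Since the formula for the distance is missing from this transcript, the following Python sketch uses one reading that fits the phrase "number of misplaced tokens": half the sum of per-wire absolute differences, for two sequences with the same total. This is an assumption for illustration, not necessarily the authors' definition, and `misplaced_tokens` is a hypothetical name.

```python
def misplaced_tokens(x, y):
    """Half the L1 distance between two token-count sequences with equal totals.

    Assumption: every misplaced token shows up once as a surplus on one wire
    and once as a deficit on another, hence the division by two.
    """
    if len(x) != len(y) or sum(x) != sum(y):
        raise ValueError("sequences must have the same length and the same total")
    return sum(abs(a - b) for a, b in zip(x, y)) // 2
```

For the two sequences on the slide, `misplaced_tokens([2, 2, 1, 2], [2, 1, 1, 3])` evaluates to 1 under this reading.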
65
Correction Network. Strategy: construct a block, CORRECT[n], which reduces the error by one. Input: step sequence with k errors. Output: step sequence with k − 1 errors.
66
Correction Network. To reduce the error by one: balance the smallest and largest entries. BUTTERFLY[n] separates out the largest value and the smallest value. Input: step sequence with k errors. Output: step sequence with k − 1 errors.
67
Butterfly Network. Network which separates out the smallest and largest entries. [Figure: butterfly network with the largest value and smallest value marked on the outputs.]
68
Butterfly Network. Balance the smallest and largest entries: error reduced. [Figure: butterfly network followed by a balancer on the largest and smallest outputs.]
69
Correction Network. Strategy: to correct k faults, append k copies of CORRECT[n]. [Figure: CORRECT[n] #1 through CORRECT[n] #k in series; labels: step sequence with k errors, smooth sequence, step sequence.]
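The idea of balancing the largest and smallest entries, repeated once per error, can be tried out numerically. The Python sketch below is a deliberate simplification: picking the extremes with `max`/`min` stands in for BUTTERFLY[n], the error is measured on sorted counts (so wire order is ignored), and all function names are hypothetical. It is not the CORRECT[n] construction itself.

```python
def step_sequence(total, width):
    """Token counts of the ideal step sequence: wire i gets ceil((total - i) / width)."""
    return [(total + width - 1 - i) // width for i in range(width)]


def error(counts):
    """Misplaced tokens relative to the ideal step sequence, ignoring wire order."""
    ideal = step_sequence(sum(counts), len(counts))
    ordered = sorted(counts, reverse=True)
    return sum(abs(a - b) for a, b in zip(ordered, ideal)) // 2


def correct_once(counts):
    """Pass the largest and smallest entries through one balancer."""
    out = list(counts)
    hi = max(range(len(out)), key=out.__getitem__)
    lo = min(range(len(out)), key=out.__getitem__)
    total = out[hi] + out[lo]
    out[hi], out[lo] = (total + 1) // 2, total // 2  # a balancer splits tokens evenly
    return out


def correct(counts, k):
    """Apply k correction stages in series."""
    for _ in range(k):
        counts = correct_once(counts)
    return counts
```

On the faulty output from the earlier slide, `error([2, 1, 1, 3])` is 1 and `error(correct_once([2, 1, 1, 3]))` is 0; for `[5, 1, 1, 1]` the error falls from 3 to 0 over three stages.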
70
Fault Tolerance. The correction network, constructed with FT balancers, is appended to the counting network (whose faulty balancers are remapped).
71
Conclusions: upper bound on the error resulting from faults; practical method for tolerating faults with extra stages. Future Work: extend the concepts to Diffracting Trees (Shavit et al., 1996) and other constructs; general framework for fault-tolerant concurrent data structures.
72
Leader (L). Two bits of memory. Accepts tokens on either wire; incoming tokens are colored green. Colors outgoing tokens red. [Figure: animation of tokens passing through a leader.]
77
Follower (F). Two bits of memory. Accepts red tokens in order. Becomes a leader (L) if it receives a green token. [Figure: animation of red tokens passing through a follower, then a green token turning it into a leader.]
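The leader and follower rules can be combined into a small Python simulation of a fault-tolerant balancer built from k+1 pseudo-balancers. Several details are assumptions, since the slides do not specify them: a faulty pseudo-balancer is bypassed with the token's wire and color unchanged, a follower forwards a red token unchanged and records the opposite wire as the next output, and tokens are processed one at a time. All names are hypothetical.

```python
GREEN, RED = "green", "red"


class PseudoBalancer:
    def __init__(self, leader=False):
        self.leader = leader  # status bit: leader (True) or follower (False)
        self.state = 0        # state bit: 0 = up, 1 = down (wire for the next token)
        self.faulty = False

    def handle(self, wire, color):
        if self.faulty:
            return wire, color        # assumption: a faulty unit is bypassed
        if color == GREEN:
            self.leader = True        # green token: no live leader upstream
        if self.leader:
            out = self.state          # route like an ordinary balancer
            self.state ^= 1
            return out, RED           # outgoing tokens are colored red
        # Follower (assumption): forward the red token, track the leader's next wire.
        self.state = wire ^ 1
        return wire, RED


class FTBalancer:
    """Chain of k+1 pseudo-balancers: one leader followed by k followers."""

    def __init__(self, k):
        self.chain = [PseudoBalancer(leader=(i == 0)) for i in range(k + 1)]

    def route(self, wire):
        color = GREEN                 # incoming tokens are colored green
        for unit in self.chain:
            wire, color = unit.handle(wire, color)
        return wire
```

In this model the outputs keep alternating as long as at least one of the k+1 pseudo-balancers is still working, because the first surviving follower has been tracking the next output wire.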
87
Fault-Tolerant Balancer. k+1 pseudo-balancers. [Figure: animation of tokens passing through pseudo-balancers labeled L, F, F; the leader then becomes unknown (?, F, F), and one of the remaining pseudo-balancers is marked L.]