Architecture of Systola 1024


1 Architecture of Systola 1024
Diagram labels: interface processors, ISA, RAM NORTH, RAM WEST, host computer bus, controller, program memory
M. Kunde, H.W. Lang, M. Schimmler, H. Schmeck, H. Schröder
Special features of the ISA:
- fast local communication
- aggregate functions with constant period
- fast integer arithmetic

2 PIPS architecture
bit memory control; Torus 32x32; off-the-shelf SRAM as distributed shared memory with prefetch

3 reconfigurable mesh = mesh + interior connections
The additional cost of turning a mesh/torus into a reconfigurable architecture is very small. Instruction sets have to be developed though, since every communication instruction needs to be augmented to include a switch setting and information about which processors are reading and which are writing from/to which port.

There is substantial literature about the power of reconfigurable architectures. Much of it has little relevance to applications, because the hardware model underlying the evaluation of reconfigurable architectures ignores the signal propagation delay that will most certainly occur on long buses. Here we present matrix multiplication algorithms that profit from reconfigurability even if we assume that a signal can travel only a fixed number of grid positions every clock cycle.

In this talk we ignore these issues, but we also do not present algorithms that would look useless under any realistic model, such as the algorithms for constant-time sorting. Such algorithms are of significant theoretical importance because they show that the concept of reconfigurability is strictly stronger than any fixed connection network. In particular, they show that the widespread assumption that lower bounds developed for the PRAM model are general lower bounds is incorrect.

Slide labels: low cost; diameter 1 !!; 15 positions

4 Fault tolerance through reconfiguration
CAN; shadow-processors; majority voting

5 majority voting
Mission-critical tasks can be taken over by any processor.
PS(k) = probability of survival under the assumption that k processors fail.
For the “shadow system” with 16 = 8x2 processors: PS(1)=1, PS(2)=0.93, PS(4)=0.6, PS(6)=0.22, PS(8)=0.02
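The PS values for the shadow system can be reproduced with a short counting argument. The sketch below is an illustration, not taken from the slides: it assumes that the k failed processors are chosen uniformly at random among the 16, and that the system survives exactly when no processor fails together with its shadow. The function name `ps_shadow` and its parameters are made up for this example.

```python
from math import comb

def ps_shadow(k: int, pairs: int = 8) -> float:
    """Survival probability when k of 2*pairs processors fail at random.

    Assumed model: the system survives iff no (processor, shadow) pair
    loses both of its members.
    """
    total = 2 * pairs
    if k > pairs:
        return 0.0  # pigeonhole: some pair must lose both members
    # Pick the k pairs that are hit, then which member of each pair fails.
    surviving_patterns = comb(pairs, k) * 2**k
    return surviving_patterns / comb(total, k)

for k in (1, 2, 4, 6, 8):
    print(k, round(ps_shadow(k), 2))
```

Under these assumptions the results agree with the figures listed on the slide to the precision shown there, for example PS(2) = 112/120, which is about 0.93.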

6 A simple solution with high fault tolerance (torus)
Figure labels: “atomic fault pattern”, processor, instrument
Every fault pattern that does not contain a 2x2 array of faulty PEs survives. PS(7)=0.7
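The 2x2 criterion can be checked by brute force. The following sketch makes two assumptions that are not stated on this slide: the torus is 4x4 (16 PEs), and a fault pattern is fatal exactly when it contains a 2x2 array of faulty PEs, with wraparound. The names `N`, `has_faulty_block` and `ps_torus` are hypothetical.

```python
from itertools import combinations
from math import comb

N = 4  # assumed torus side length (not given on this slide)

def has_faulty_block(faulty: frozenset) -> bool:
    """True if some 2x2 block (with torus wraparound) is entirely faulty."""
    for r in range(N):
        for c in range(N):
            block = {
                (r, c),
                (r, (c + 1) % N),
                ((r + 1) % N, c),
                ((r + 1) % N, (c + 1) % N),
            }
            if block <= faulty:
                return True
    return False

def ps_torus(k: int) -> float:
    """Fraction of k-fault patterns that contain no faulty 2x2 block."""
    cells = [(r, c) for r in range(N) for c in range(N)]
    surviving = sum(
        1
        for pattern in combinations(cells, k)
        if not has_faulty_block(frozenset(pattern))
    )
    return surviving / comb(N * N, k)

print(round(ps_torus(7), 2))
```

Under these assumptions `ps_torus(7)` evaluates to roughly 0.72, which is consistent with the PS(7)=0.7 quoted on the slide. Any pattern with three or fewer faults survives trivially, since a 2x2 block needs four faulty PEs.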

7 Torus: 16 tasks, 32 processors
on the “other side”: PS(5)=1
How about 32 tasks and 16 processors ??

8 majority voting; 6-neighborhood torus; PS(5)=1

9 Distributed shared memory ? Migration of contents ?
Memory/voting/error-correction; Memory/controller/error-correction

10 Assumption: The processors are identical and have at least twice the required capacity. Other devices do not fail.
Figure labels: S N E W P I
4x4 torus = 4D hypercube
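The identity "4x4 torus = 4D hypercube" can be verified directly. The sketch below uses one standard construction, which the slides do not spell out: label each torus coordinate with a 2-bit cyclic Gray code, so that a step along a row or column flips exactly one bit. All names are chosen for this example.

```python
GRAY = [0b00, 0b01, 0b11, 0b10]  # cyclic 2-bit Gray code: neighbours differ in one bit

def label(r: int, c: int) -> int:
    """Map torus position (r, c) to a 4-bit hypercube node label."""
    return (GRAY[r] << 2) | GRAY[c]

def torus_neighbors(r: int, c: int) -> list:
    """The four neighbours of (r, c) on a 4x4 torus."""
    return [((r + 1) % 4, c), ((r - 1) % 4, c), (r, (c + 1) % 4), (r, (c - 1) % 4)]

def is_hypercube_embedding() -> bool:
    """Check that the labelling maps torus edges exactly onto hypercube edges."""
    labels = {label(r, c) for r in range(4) for c in range(4)}
    if labels != set(range(16)):
        return False  # the labelling must be a bijection onto the 16 nodes
    for r in range(4):
        for c in range(4):
            here = label(r, c)
            mapped = {label(*n) for n in torus_neighbors(r, c)}
            expected = {here ^ (1 << bit) for bit in range(4)}
            if mapped != expected:
                return False
    return True

print(is_hypercube_embedding())
```

Each torus node has four neighbours and each 4D hypercube node has four single-bit-flip neighbours, so matching the neighbour sets node by node is enough to show the two graphs are the same.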

11 If fewer than 9 processors fail, no processor needs to take more than 2 tasks.
The processors time-share between the tasks. PS(10)=1 ?
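The threshold of 9 follows from a simple counting bound. The sketch below assumes 16 tasks on 16 processors (the 4x4 torus of the previous slide) and ignores the topology entirely, so it only gives a lower bound on the load of the busiest surviving processor. The function name `min_max_load` is made up for this example.

```python
def min_max_load(failed: int, tasks: int = 16, processors: int = 16) -> int:
    """Lower bound on the number of tasks the busiest survivor must run."""
    alive = processors - failed
    if alive <= 0:
        raise ValueError("no surviving processors")
    return -(-tasks // alive)  # integer ceiling of tasks / alive

for failed in (0, 8, 9, 10):
    print(failed, min_max_load(failed))
```

With up to 8 failures the bound is at most 2 tasks per processor; from 9 failures on it rises to 3, so some processor would need more than twice the capacity of a single task.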

12 ?? 1-processor/1-task ?? spares - torus ?? control of switches ??
PS(7)=1
Figure labels: 3 2 2 1; S N E W P I

13 [Plot: PS versus # faults; axis ticks 1, 8, 16]
Number of switches: 2, 4, 6, 30
Wire area: 0, 2.8, 3.8, 4

14 Load balancing dynamic allocation of tasks

