Host Side Dynamic Reconfiguration with InfiniBand TM By Wei Lin Guay*, Sven-Arne Reinemo*, Olav Lysne*, Tor Skeie*, Bjørn Dag Johnsen^ and Line Holen^


1 Host Side Dynamic Reconfiguration with InfiniBand TM By Wei Lin Guay*, Sven-Arne Reinemo*, Olav Lysne*, Tor Skeie*, Bjørn Dag Johnsen^ and Line Holen^ *Simula Research Laboratory ^Sun Microsystems

2 Introduction The quest for ever-increasing computing power drives state-of-the-art large-scale clusters. In the Top500 list, more than 20 sites have supercomputers with more than 10k processors. The increasing cluster size challenges the reliability of interconnects such as InfiniBand.

3 Introduction
What are the available fault tolerance mechanisms?
• Checkpoint/restart: the application is halted and restarted from the last checkpoint. Disadvantage: not application transparent.
• Deadlock-free re-routing: application transparent. Disadvantage: inflexible.
Network dynamic reconfiguration is the trend!

4 Network Dynamic Reconfiguration
Network dynamic reconfiguration:
• Move from one routing function to another while the system is up and running.
• Application transparent.
• More flexible.
Challenges of network dynamic reconfiguration:
• Deadlock freedom in the transition phase.
• Assumes that the network interface attributes have not been changed.
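To illustrate why the transition phase is a challenge, here is a minimal sketch based on the channel dependency graph, a standard way to reason about routing deadlocks. The slides do not say how deadlock freedom is checked, so the function `has_cycle` and the channel names `c1` to `c4` are hypothetical. The point of the example: two routing functions can each be deadlock free, yet packets routed by both at the same time can create a cyclic dependency.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Channel = Hashable
# A dependency (held, wanted): a packet holding `held` may wait for `wanted`.
Dependency = Tuple[Channel, Channel]


def has_cycle(dependencies: Iterable[Dependency]) -> bool:
    """Return True if the channel dependency graph contains a cycle."""
    graph: Dict[Channel, List[Channel]] = {}
    for held, wanted in dependencies:
        graph.setdefault(held, []).append(wanted)
        graph.setdefault(wanted, [])

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    colour = {node: WHITE for node in graph}

    def visit(node: Channel) -> bool:
        colour[node] = GREY
        for nxt in graph[node]:
            if colour[nxt] == GREY:  # back edge, so there is a cycle
                return True
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[node] == WHITE and visit(node) for node in graph)


# Hypothetical dependencies of the old and the new routing function.
old_routing = [("c1", "c2"), ("c2", "c3")]
new_routing = [("c3", "c4"), ("c4", "c1")]

print(has_cycle(old_routing))                # False
print(has_cycle(new_routing))                # False
print(has_cycle(old_routing + new_routing))  # True
```

Each routing function is acyclic on its own, but their union contains the cycle c1, c2, c3, c4, c1, which is the situation that can arise while old and new routes coexist in the network.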

5 Host Side Dynamic Reconfiguration
Host side dynamic reconfiguration:
• Migrate the attributes of the connection (Queue Pair) from the old routing structure to the new one.
• Fault tolerance mechanism.
• Live migration.
• Policy changes (cluster maintenance).
Challenges of host side dynamic reconfiguration:
• Which component triggers the change of routing path when a fault happens?
• Should alternative paths be set up in advance?
• Is the network manager responsible for finding the new path?

6 Challenges of Dynamic Reconfiguration RC connection established between A and B.

7 Challenges of Dynamic Reconfiguration RC connection established between A and B. During the transmission, a link fails!

8 Challenges of Dynamic Reconfiguration RC connection established between A and B. During the transmission, a link fails! The SM regenerates a deadlock-free routing table.

9 Challenges of Dynamic Reconfiguration RC connection established between A and B. During the transmission, a link fails! The SM regenerates a deadlock-free routing table. Predefining a deadlock-free shortest path for every path is very difficult!

10 Host Side Dynamic Reconfiguration


17 Host Reconfiguration Keep track of active QPs created in each host stack.

18 Host Reconfiguration
Keep track of active QPs created in each host stack.
Modify the QP's context in the RTS state:
• Reset Queue Pair

19 Host Reconfiguration
Keep track of active QPs created in each host stack.
Modify the QP's context in the RTS state:
• Reset Queue Pair
• Send Queue Drain (SQD)

20 Host Reconfiguration
Keep track of active QPs created in each host stack.
Modify the QP's context in the RTS state:
• Reset Queue Pair
• Send Queue Drain (SQD)
• Automatic Path Migration (APM)
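The three options for changing the path of a QP that is already in RTS can be sketched with a small state model. This is a simplified illustration, not the authors' implementation and not the verbs API: `ModelQP`, `set_path` and the `reconfigure_via_*` helpers are hypothetical names, the path is a plain string, and the APM migration states are left out. The state names follow the InfiniBand QP state machine (RESET, INIT, RTR, RTS, SQD).

```python
from enum import Enum
from typing import List, Optional


class QPState(Enum):
    RESET = "RESET"
    INIT = "INIT"
    RTR = "RTR"  # Ready To Receive
    RTS = "RTS"  # Ready To Send
    SQD = "SQD"  # Send Queue Drain


# Forward transitions used in this sketch; a move to RESET is always allowed.
ALLOWED = {
    (QPState.RESET, QPState.INIT),
    (QPState.INIT, QPState.RTR),
    (QPState.RTR, QPState.RTS),
    (QPState.RTS, QPState.SQD),
    (QPState.SQD, QPState.RTS),
}


class ModelQP:
    """Hypothetical stand-in for an established RC queue pair."""

    def __init__(self, qp_num: int, path: str) -> None:
        self.qp_num = qp_num
        self.path = path
        self.alt_path: Optional[str] = None
        self.state = QPState.RTS
        self.history: List[QPState] = [QPState.RTS]

    def modify(self, new_state: QPState) -> None:
        allowed = (self.state, new_state) in ALLOWED
        if new_state is not QPState.RESET and not allowed:
            raise ValueError(f"{self.state.name} -> {new_state.name} not allowed")
        self.state = new_state
        self.history.append(new_state)

    def set_path(self, path: str) -> None:
        # Simplification: the model only accepts a new path in INIT or SQD.
        if self.state not in (QPState.INIT, QPState.SQD):
            raise ValueError(f"cannot change path in {self.state.name}")
        self.path = path


def reconfigure_via_reset(qp: ModelQP, new_path: str) -> None:
    """Tear the QP down and rebuild it with the new path."""
    qp.modify(QPState.RESET)
    qp.modify(QPState.INIT)
    qp.set_path(new_path)
    qp.modify(QPState.RTR)
    qp.modify(QPState.RTS)


def reconfigure_via_sqd(qp: ModelQP, new_path: str) -> None:
    """Drain the send queue, swap the path, then resume sending."""
    qp.modify(QPState.SQD)
    qp.set_path(new_path)
    qp.modify(QPState.RTS)


def reconfigure_via_apm(qp: ModelQP, new_path: str) -> None:
    """Load an alternate path and migrate to it; the QP stays in RTS."""
    qp.alt_path = new_path
    qp.path, qp.alt_path = qp.alt_path, None


# Track the active QPs of one host, then move them all to a new path.
active_qps = {n: ModelQP(n, "old-path") for n in (1, 2, 3)}
for qp in active_qps.values():
    reconfigure_via_sqd(qp, "new-path")
print([s.name for s in active_qps[1].history])  # ['RTS', 'SQD', 'RTS']
```

In this model the reset option walks through every state again, the SQD option pauses only the send side, and the APM option never leaves RTS.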

21 Performance Evaluation
Synthetic traffic patterns:
• 6-3:5-2:4-1:3-6:2-5:1-4
Application traffic patterns:
• HPCC b_eff
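The synthetic pattern string can be unpacked as follows. The slides do not define the notation, so this sketch assumes that each `a-b` item means "node a sends to node b" and that items are separated by colons. The helper names `parse_pattern` and `is_permutation` are hypothetical.

```python
from typing import List, Tuple


def parse_pattern(pattern: str) -> List[Tuple[int, int]]:
    """Split 'a-b:c-d:...' into (source, destination) pairs (assumed meaning)."""
    pairs = []
    for item in pattern.split(":"):
        src, dst = item.split("-")
        pairs.append((int(src), int(dst)))
    return pairs


def is_permutation(pairs: List[Tuple[int, int]]) -> bool:
    """True if every node appears exactly once as source and once as destination."""
    sources = sorted(src for src, _ in pairs)
    destinations = sorted(dst for _, dst in pairs)
    return sources == destinations == sorted(set(sources))


pairs = parse_pattern("6-3:5-2:4-1:3-6:2-5:1-4")
print(pairs)                  # [(6, 3), (5, 2), (4, 1), (3, 6), (2, 5), (1, 4)]
print(is_permutation(pairs))  # True
```

Under that reading, the pattern is a permutation over six nodes in which each node exchanges traffic with exactly one partner (1 with 4, 2 with 5, 3 with 6).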

22 Performance Evaluation Micro benchmark − Setup Phase: No additional overhead!

23 Performance Evaluation Synthetic traffic patterns

24 Performance Evaluation
HPCC b_eff without dynamic reconfiguration:
• The benchmark does not complete once the first fault happens.
• A deadlock occurs!

25 Conclusion
Novel fault tolerance mechanism:
• Feedback from the SM.
• Application transparent.
Evaluation of scalability:
• Event notification.
Live migration of virtualization.
Future Work

26 Thanks!

