High-Performance Networks for Dataflow Architectures Pravin Bhat Andrew Putnam.


1 High-Performance Networks for Dataflow Architectures Pravin Bhat Andrew Putnam

2 Overview
- Motivation & Design Constraints
- Network design
- Performance
- Adaptive Routing
- Conclusion

4 Motivation
- Signal delay on wires is more important than transistor switching speed
- Seriously decreased reliability in future processes
- Factory testing will not be possible
- Expect 20% of transistors to be DOA
- Expect 10% more to die over several months
- Dataflow is an answer, but the network is currently a bottleneck

5 Dataflow Characteristics
- Unpredictable traffic
- Cannot pre-allocate resources
- Highly bursty traffic
- Quick delivery of bursts is critical
- Nodes are not guaranteed to consume messages
- Potential for livelock & deadlock

7 Network Requirements
- High performance during bursts
- Area efficient
- Guarantee message delivery
- Deadlock & livelock free
- Fault tolerant
- Regular 2-D physical structure

8 Topology
- On-chip: must be implementable in 2-D
- Regular tiled structure suggests:
  - Grid
  - Torus
  - Hypercube
  - Fat Tree
- Hypercube is difficult to route and to scale
- Fat Tree has a single point of failure
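To make the grid versus torus comparison concrete, here is a minimal sketch that counts hops between tiles in both topologies. It assumes every hop costs the same and that routes take the shortest path; the function names and the 4x4 size are hypothetical and do not come from the slides.

```python
def grid_hops(src, dst):
    """Hop count between two (x, y) tiles on a grid with no wraparound links."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def torus_hops(src, dst, width, height):
    """Hop count when every row and column also has a wraparound link."""
    dx = abs(src[0] - dst[0])
    dy = abs(src[1] - dst[1])
    return min(dx, width - dx) + min(dy, height - dy)


def average_hops(width, height, hops):
    """Average hop count over all ordered pairs of distinct tiles."""
    tiles = [(x, y) for x in range(width) for y in range(height)]
    pairs = [(a, b) for a in tiles for b in tiles if a != b]
    return sum(hops(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    w = h = 4  # hypothetical array size
    print("grid :", average_hops(w, h, grid_hops))
    print("torus:", average_hops(w, h, lambda a, b: torus_hops(a, b, w, h)))
```

The torus shortens corner-to-corner routes because of the wraparound links, but the sketch ignores wire length, so it says nothing about per-link delay.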

9 Routing
- Static routing does not provide essential fault tolerance
- Use a modified Virtual Channel (VC) algorithm
- VCs guarantee deadlock freedom if nodes consume messages
- Dynamically adaptive to handle transient faults & congestion
- Initial studies used static routing

10 Flow Control
- Resource reservation not possible
- Long-latency wires prohibit handshakes
- Send messages assuming accept
- Buffer just enough to allow receiver to send reject signal on subsequent clock cycle
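The "send assuming accept" scheme can be sketched as a small cycle-by-cycle simulation. This is an illustration under stated assumptions, not the WaveScalar hardware: the `Sender`, `Receiver`, and `simulate` names are hypothetical, the sender keeps a one-entry retry buffer, and a reject for a message sent in one cycle is acted on in the next cycle.

```python
from collections import deque


class Receiver:
    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()

    def offer(self, msg):
        """Accept msg if there is room; False models the reject signal."""
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(msg)
        return True


class Sender:
    def __init__(self, messages):
        self.pending = deque(messages)
        self.in_flight = None   # one-entry retry buffer (assumed size)
        self.rejected = False   # reject signal seen for the last send

    def step(self, receiver):
        if self.rejected:
            msg = self.in_flight            # resend the rejected message
        elif self.pending:
            msg = self.pending.popleft()    # optimistic send, no handshake
        else:
            self.in_flight = None
            return
        self.in_flight = msg
        self.rejected = not receiver.offer(msg)


def simulate(messages, capacity=1, drain_every=2, max_cycles=100):
    """Return the messages consumed by the receiver, in consumption order."""
    sender, receiver = Sender(messages), Receiver(capacity)
    consumed = []
    for cycle in range(max_cycles):
        sender.step(receiver)
        # The receiving node consumes one message every `drain_every` cycles.
        if cycle % drain_every == drain_every - 1 and receiver.queue:
            consumed.append(receiver.queue.popleft())
        if len(consumed) == len(messages):
            break
    return consumed


if __name__ == "__main__":
    print(simulate(["a", "b", "c"]))
```

Because the sender holds each message until it is known to be accepted, a reject costs one extra cycle but never loses or reorders a message in this model.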

11 Deadlock-Free Operation
- Nodes cannot always consume messages
- Add a dedicated channel to and from memory
- Adds 8% area overhead
- Rotate stalled operands out of PEs to ensure forward progress
- Send first operand back at a faster rate to avoid livelock

13 Performance
- Ran network-centric simulations
- 20 billion instructions
- Spec2000, Splash2, and Dataflow benchmarks
- Goal is to find optimum balance of:
  - Number of Virtual Channels
  - Queue Length
  - Link Bandwidth
  - Packets per message

18 ASIC Model
- Performance must be balanced with area
- Developed RTL model of WaveScalar network architecture
- 90 nm process ASIC standard cell library
- Timing per link:
  - Grid links: 2.76 ns
  - Torus links: 6.16 ns
- Network switch is 11.6% of chip area
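As a rough worked example, the per-link timings can be turned into an upper bound on link clock frequency. This assumes a message crosses a link in exactly one clock cycle, which the slides do not state; the function name is hypothetical.

```python
def max_link_frequency_mhz(delay_ns):
    """Highest clock rate (MHz) at which a link with this delay fits in one cycle."""
    return 1000.0 / delay_ns


GRID_LINK_NS = 2.76   # from the slide
TORUS_LINK_NS = 6.16  # from the slide

if __name__ == "__main__":
    print(f"grid : {max_link_frequency_mhz(GRID_LINK_NS):.1f} MHz")
    print(f"torus: {max_link_frequency_mhz(TORUS_LINK_NS):.1f} MHz")
    print(f"torus/grid delay ratio: {TORUS_LINK_NS / GRID_LINK_NS:.2f}")
```

Under that assumption the grid link allows roughly 362 MHz and the torus link roughly 162 MHz, so a torus link is about 2.2 times slower per hop.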

23 Virtual Channels Flow Control
- In hardware, only the head of a queue can be dequeued in one clock cycle
- If the first message in a queue is blocked, then every message behind it is blocked
- Network utilization suffers due to idle links

24 Virtual Channels Flow Control
- Virtual channels: several small queues instead of one long queue
- Decouples buffer resources from link resources
- Increases network throughput by increasing link usage
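The head-of-queue blocking problem and the virtual channel remedy can be illustrated with a toy model. In this sketch (all names hypothetical) each message is represented only by its destination label, the link forwards at most one message per cycle, and the arbiter simply picks the first queue whose head is not blocked; the real arbitration policy is not described in the slides.

```python
from collections import deque


def drain(queues, blocked, cycles):
    """Forward at most one unblocked head-of-queue message per cycle.

    queues  -- list of lists of destination labels, one list per virtual channel
    blocked -- set of destinations that cannot currently accept messages
    """
    queues = [deque(q) for q in queues]
    sent = []
    for _ in range(cycles):
        for q in queues:
            if q and q[0] not in blocked:
                sent.append(q.popleft())
                break  # the physical link carries one message per cycle
    return sent


if __name__ == "__main__":
    blocked = {"A"}
    # One long queue: the blocked message for A stalls everything behind it.
    print(drain([["A", "B", "B", "B"]], blocked, cycles=4))
    # Same four buffer entries split into two virtual channels.
    print(drain([["A", "B"], ["B", "B"]], blocked, cycles=4))
```

With a single queue nothing is sent while A is blocked. With two virtual channels the link keeps working on the second queue, although the B message queued behind A in the first channel is still stuck.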

25 Dimension Order Routing
- Old WaveScalar routing protocol
- Network topology is a static grid
- Packets first travel to the correct x-coordinate and then to the correct y-coordinate
- Low network utilization from not using all available paths
- Not fault tolerant
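Dimension order routing on a grid is simple enough to sketch directly. The function names and the (x, y) tuple representation below are hypothetical; the routing rule itself (x first, then y) is the one stated on the slide.

```python
def dor_next_hop(cur, dst):
    """Return the next tile on the dimension-order route, or None on arrival."""
    x, y = cur
    dst_x, dst_y = dst
    if x != dst_x:                       # correct the x-coordinate first
        return (x + (1 if dst_x > x else -1), y)
    if y != dst_y:                       # then correct the y-coordinate
        return (x, y + (1 if dst_y > y else -1))
    return None


def dor_path(src, dst):
    """List every tile visited from src to dst, inclusive."""
    path = [src]
    nxt = dor_next_hop(src, dst)
    while nxt is not None:
        path.append(nxt)
        nxt = dor_next_hop(nxt, dst)
    return path


if __name__ == "__main__":
    print(dor_path((0, 0), (2, 1)))
```

Every source and destination pair has exactly one route here, which is why a single failed or congested link on that route cannot be avoided.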

26 Adaptive Routing
- Progressively chooses longer routes instead of waiting for an unavailable resource
- High network utilization
- Fault tolerant
- Can cause deadlock

27 Deadlock-Free Adaptive Routing
- Some Virtual Channels are reserved for Dimension Order Routing; the rest are used for adaptive routing
- Every time a packet is routed in the wrong direction, its Dimension Reversal count is incremented
- No packet is allowed to wait in a virtual channel with a packet that has a lower Dimension Reversal count
- Mathematically proven to be deadlock free
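The waiting rule on this slide can be sketched as a channel-selection check. This is only an illustration: the `Packet` and `VirtualChannel` types and the function names are hypothetical, and two details are assumptions not stated on the slide, namely that a packet refused by every adaptive channel falls back to a channel reserved for Dimension Order Routing, and that the reversal-count rule is applied only to adaptive channels.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    dst: tuple
    reversals: int = 0  # Dimension Reversal count


@dataclass
class VirtualChannel:
    reserved_for_dor: bool
    packets: list = field(default_factory=list)


def record_misroute(packet):
    """Call when the packet is routed in the wrong direction."""
    packet.reversals += 1


def may_wait_in(packet, vc):
    """Slide rule: never wait with a packet that has a lower reversal count."""
    return all(resident.reversals >= packet.reversals for resident in vc.packets)


def choose_channel(packet, channels):
    """Prefer an adaptive channel; otherwise fall back to a DOR channel (assumed)."""
    for vc in channels:
        if not vc.reserved_for_dor and may_wait_in(packet, vc):
            return vc
    for vc in channels:
        if vc.reserved_for_dor:
            return vc
    return None


if __name__ == "__main__":
    dor = VirtualChannel(reserved_for_dor=True)
    adaptive = VirtualChannel(reserved_for_dor=False, packets=[Packet((1, 1))])
    p = Packet((3, 0))
    record_misroute(p)
    print(choose_channel(p, [adaptive, dor]) is dor)
```

In the usage example the misrouted packet has a higher reversal count than the packet already waiting in the adaptive channel, so it is steered to the DOR channel instead.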

31 Conclusion
- Best performance per area with:
  - 2 Virtual Channels
  - 2 Links
  - 2-4 entries per queue
  - Torus topology
  - Adaptive routing
- Dataflow chip networks can be high-performance at reasonable area

