NoC General concepts Andreas Ehliar - Per Karlström
Outline Background Some Implementations Design Issues / Tools Example Application Conclusions
Current situation Many transistors –Hard to design IP cores –Solve design issue Communication problem –BUS TDM Can't handle many nodes
Current Situation [Figure: transistors versus time]
Current Situation IP
NOC implementations SoCBUS xPipes Pleiades Eclipse (FPGA)
SoCBUS Arbitrary topology Packet Connected Circuit (PCC) –No need for buffers –Payload transfer latency = 1 clock cycle Mesochronous –No asynchronous bridging –Only retiming Timing per link
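A minimal sketch of the packet connected circuit idea, assuming that a request reserves one link at a time along a precomputed route and gives up completely when it meets a busy link, so that the payload can then flow without buffers. The function names, the retry behaviour, and the node names are hypothetical and are not taken from SoCBUS itself.

```python
def try_setup_circuit(route, locked_links):
    """Try to reserve every link on `route` (a list of (src, dst) hops).

    Returns True and leaves the links locked if the whole route is free.
    Returns False and releases anything it locked if a hop is busy,
    so the sender can retry later (assumed behaviour).
    """
    acquired = []
    for link in route:
        if link in locked_links:
            # Blocked: tear down the partial circuit.
            for held in acquired:
                locked_links.discard(held)
            return False
        locked_links.add(link)
        acquired.append(link)
    return True


def release_circuit(route, locked_links):
    """Free all links of an established circuit after the payload is sent."""
    for link in route:
        locked_links.discard(link)


locked = set()
route_a = [("n0", "n1"), ("n1", "n2")]
route_b = [("n3", "n1"), ("n1", "n2")]   # shares the n1->n2 link with route_a

first = try_setup_circuit(route_a, locked)    # route is free
second = try_setup_circuit(route_b, locked)   # n1->n2 is already held
release_circuit(route_a, locked)
third = try_setup_circuit(route_b, locked)    # succeeds after the release
```

Because a link is either free or held by exactly one circuit, no payload ever waits inside the network in this model; contention is paid for at setup time instead.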
SoCBUS
xPipes Multi-GHz Heterogeneous Packet-switched Parameterizable components –Compilable Wormhole switching Street sign routing Error detection Pipelined links
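A minimal sketch of street sign routing, assuming it means that the source writes the sequence of output ports into the packet header and each switch consumes one entry. The topology, port names, and function name below are hypothetical and only serve the illustration.

```python
from collections import deque


def route_street_sign(header_ports, topology, start):
    """Follow a source-routed header through `topology`.

    header_ports: output port chosen at each switch, in hop order.
    topology: dict mapping (switch, output_port) -> next node.
    Returns the list of nodes visited, ending at the destination.
    """
    directions = deque(header_ports)
    node = start
    visited = [node]
    while directions:
        port = directions.popleft()      # each switch consumes one direction
        node = topology[(node, port)]
        visited.append(node)
    return visited


# Hypothetical 3-switch network used only for illustration.
topology = {
    ("s0", "east"): "s1",
    ("s1", "south"): "s2",
    ("s2", "local"): "core_b",
}
path = route_street_sign(["east", "south", "local"], topology, "s0")
```

The switches need no routing tables in this scheme, since the whole route is decided at the source.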
xPipes
Pleiades Platform –Instantiation: Maia processor –Heterogeneous FUs ALU Memories FPGAs MACs etc GALS Two level network
Pleiades [Figure; block labels: MEM, ALU, FPGA, MAC, ALU, DSP, MEM, MAC, etc.]
Eclipse Embedded chip level integrated parallel supercomputer 2D sparse mesh –High bandwidth MTAC processors –Multi Threaded Architecture with Chaining Thread Level Parallelism
Eclipse
FPGA Field Programmable Gate Array Archetype of future NoC Fine grained NoC Homogeneous blocks Heterogeneous links
FPGA
Different NoC Homogeneous –In function –Simplify design and floorplan Heterogeneous –In function –Better functionality and silicon usage
Homogeneous NoC [Figure; block label: FU]
Heterogeneous NoC [Figure; block labels: FU, MUL, ALU, DSP]
Quality of Service Guaranteed latency Guaranteed bandwidth Correctness
Design Issues Physical layer –low swing –Differential signaling –Pseudo differential signaling –Clocks (GALS, Mesochronous)
Design Issues - Signaling [Figure: voltage V versus time t]
Design Issues - Clocking [Figure; block labels: MEM, FPGA, ALU, DSP]
Design Issues - Architecture FU
Design Issues Data-link –Error detection/correction –Media access control
Design Issues - Errors [Figure; axes: N_e/N_p and Cost; curves: Error correction, Error detection]
Design Issues Network layer –Architecture Hierarchy –Switching scheme Packet switched Circuit switched
Design Issues Transport layer –Connection-oriented/-less –Flow control Packet segmentation / reassembly Reordering
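A minimal sketch of segmentation, reassembly, and reordering, assuming each packet carries a sequence number next to its payload chunk. The function names and the payload size are hypothetical.

```python
def segment(message: bytes, payload_size: int):
    """Split `message` into (sequence_number, chunk) packets."""
    return [
        (seq, message[offset:offset + payload_size])
        for seq, offset in enumerate(range(0, len(message), payload_size))
    ]


def reassemble(packets):
    """Rebuild the message from packets that may arrive in any order."""
    return b"".join(chunk for _, chunk in sorted(packets))


packets = segment(b"network on chip", payload_size=4)
out_of_order = list(reversed(packets))     # simulate arrival in reverse order
restored = reassemble(out_of_order)
```

The sequence numbers are what make reordering possible at the receiver; a network that always delivers in order could omit them.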
Design Issues - Flow Control
Design Issues - Power Control Application Layer –Power management Node/Network centric –Power aware API
Design Issues - Effect of Design [Figure; design levels: Algorithm, Architecture, RTL, Gate, Transistor, Silicon]
Design Issues - Power Control
Design Issues - Long Wires Solving the global interconnect mess –Delay –Bit errors –Repeaters –Clock domains Create one optimized solution that can be reused
Design Issues - Long Wires Add flip flops to increase clock frequency What about ACKs? What about bit errors? [Figure; block labels: NoC Router, NoC Router]
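One way to see why the ACK question matters is a rough throughput model. It assumes that data and ACK each take the same number of cycles to cross the pipelined wire, that the sender injects at most one word per cycle, and that it may have a limited number of unacknowledged words in flight. The function name and the parameter values are hypothetical.

```python
def words_per_cycle(one_way_latency: int, window: int) -> float:
    """Sustained throughput when at most `window` words may be unacknowledged.

    Assumes data and ACK each take `one_way_latency` cycles and the
    sender can inject at most one word per cycle.
    """
    round_trip = 2 * one_way_latency
    return min(1.0, window / round_trip)


unpipelined = words_per_cycle(one_way_latency=1, window=1)   # wait for every ACK
pipelined = words_per_cycle(one_way_latency=4, window=1)     # extra flip flops added
windowed = words_per_cycle(one_way_latency=4, window=8)      # enough words in flight
```

Under these assumptions, every added flip flop stage lowers the throughput of a sender that waits for each ACK, unless the protocol allows enough words in flight to cover the round trip.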
Design Issues - Long Wires Bit errors on long wires will not be avoidable in the future Use error correcting codes –Disadvantage: More wires Use parity bits to discover errors –Resend damaged packets –No longer possible to guarantee real-time performance
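A minimal sketch of error detection with one even-parity bit per word. The function names and the 4-bit example word are hypothetical; a real link would more likely protect a full 64-bit word or use a stronger code.

```python
def add_even_parity(word: int, width: int = 64) -> int:
    """Return `word` with one even-parity bit placed above its `width` data bits."""
    data = word & ((1 << width) - 1)
    parity = bin(data).count("1") & 1
    return (parity << width) | data


def parity_ok(coded: int, width: int = 64) -> bool:
    """True when the number of set bits in data plus parity is even."""
    return bin(coded & ((1 << (width + 1)) - 1)).count("1") % 2 == 0


coded = add_even_parity(0b1011, width=4)
clean = parity_ok(coded, width=4)                    # no error on the wire
single_flip = parity_ok(coded ^ 0b00100, width=4)    # one bit flipped: detected
double_flip = parity_ok(coded ^ 0b00110, width=4)    # two bits flipped: missed
```

Parity costs only one extra wire but can only detect an odd number of flipped bits and cannot correct anything, which is why a damaged packet has to be resent.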
Design Issues - Long Wires Possibility to create heavily optimized solution –Low voltage signaling –Advanced symbol encoding/decoding –Wave pipelining
Design Issues - Long Wires High performance interconnect through wave pipelining –Need very careful analysis [Figure; block labels: four NoC Routers]
Design Issues - Long Wires Wave pipelining performance –3.45 GHz signaling on one bit line in 0.25 µm –More energy efficient than regular pipeline –Faster than regular pipeline Disadvantage –Much harder to test/verify
System design Typical tools –Simulator –Network generator
System design What I would want –Graphical frontend to design NoC –C and RTL models of the finished NoC –C API to create C level models of the NoC –Mix C and RTL models in RTL simulator –And of course...
System design IP cores
Example: Core Router SoCBUS Simulation Study of 16 port core router on a chip 16 x 10 Gigabit Ethernet Ports Prove feasibility of using SoCBUS
Example: Core Router [Figure; block labels: IPP, FT, PB, OPP, CPU, MU]
Example: Core Router IPP (Input Packet Processor) –Receive packet from network –Validate Packet/Filter packet –Send lookup request to forwarding table –Send packet to Packet Buffer FT (Forwarding Table) –Get IP address from IPP –Perform Lookup and send the output port to the packet buffer OPP (Output Packet Processor) –Send packet to Network
Example: Core Router PB (Packet Buffer) –Responsible for packet buffering –Buffers packets until output port information is received from the forwarding table MU (Multicast Unit) –Handle multicast packets CPU
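The block roles described above can be sketched as a small software model. This is an illustration only: the class and function names are hypothetical, the forwarding table is reduced to a plain dictionary keyed by destination address, and multicast and the CPU are left out.

```python
class PacketBuffer:
    """Holds packets until the forwarding table reports an output port."""

    def __init__(self):
        self.waiting = {}                      # packet id -> packet

    def store(self, packet_id, packet):
        self.waiting[packet_id] = packet

    def port_resolved(self, packet_id, port):
        # Release the packet towards the output packet processor.
        return port, self.waiting.pop(packet_id)


def input_packet_processor(packet_id, packet, forwarding_table, buffer):
    """Validate the packet, buffer it, and return the lookup result."""
    if "dst" not in packet:
        return None                            # filtered: malformed packet
    buffer.store(packet_id, packet)            # IPP -> PB
    return forwarding_table[packet["dst"]]     # IPP -> FT lookup request


def forward(packet_id, packet, forwarding_table, buffer):
    """Run one packet through IPP, FT, and PB; return (port, packet) for the OPP."""
    port = input_packet_processor(packet_id, packet, forwarding_table, buffer)
    if port is None:
        return None
    return buffer.port_resolved(packet_id, port)   # FT -> PB, then PB -> OPP


table = {"10.0.0.7": 3}                        # hypothetical route
buf = PacketBuffer()
result = forward(1, {"dst": "10.0.0.7", "data": b"hi"}, table, buf)
dropped = forward(2, {"data": b"no destination"}, table, buf)
```

In the real design the lookup and the buffering happen in separate blocks connected by the network, so the packet waits in the PB while the FT request is in flight.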
Example: Core Router Data flow for a single packet [Figure; block labels: Input Packet Processor, Forwarding Table, Packet Buffer, Output Packet Processor]
Example: Core Router Assumptions: –Each link can transfer 64 bits each clock cycle –SoCBUS can be clocked at 1.2 GHz –Packet buffers are “large enough”
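The link assumptions translate into a raw bandwidth figure that can be compared with the Ethernet ports. The arithmetic below uses only the numbers stated in the slides; the variable names are hypothetical, and the result ignores circuit setup and any header overhead.

```python
LINK_WIDTH_BITS = 64          # bits per clock cycle, from the assumptions
CLOCK_HZ = 1.2e9              # SoCBUS clock, from the assumptions
PORT_RATE_BPS = 10e9          # one 10 Gigabit Ethernet port
NUM_PORTS = 16

link_bps = LINK_WIDTH_BITS * CLOCK_HZ          # raw capacity of one link
aggregate_in_bps = NUM_PORTS * PORT_RATE_BPS   # traffic entering the router
ports_per_link = link_bps / PORT_RATE_BPS      # line-rate ports one link could carry

print(f"one link: {link_bps / 1e9:.1f} Gbit/s")              # 76.8 Gbit/s
print(f"all ports: {aggregate_in_bps / 1e9:.0f} Gbit/s")      # 160 Gbit/s
```

A single link therefore has raw capacity for several ports, but not for all 16 at once, so the traffic must be spread over multiple links.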
Example: Core Router Results for “Internet Mix” packet sizes [Figure]
Example: Core Router Results for minimum size packets [Figure]
Network utilization
Example: Core Router Bottleneck in forwarding table access –Current version of SoCBUS creates a virtual circuit for each request Proposal: Extend SoCBUS –Reliable delivery of small (64 bit or less) packets without setting up a virtual circuit
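A rough estimate of the load on the forwarding table helps explain the bottleneck. It assumes one lookup per packet and standard Ethernet framing for minimum size packets (64-byte frame plus 8 bytes of preamble and a 12-byte inter-frame gap); the slides do not state which packet size the simulation used, so these numbers are an assumption.

```python
PORT_RATE_BPS = 10e9
NUM_PORTS = 16
CLOCK_HZ = 1.2e9
# Assumption: standard Ethernet minimum frame + preamble + inter-frame gap.
WIRE_BYTES_PER_MIN_FRAME = 64 + 8 + 12

packets_per_port = PORT_RATE_BPS / (WIRE_BYTES_PER_MIN_FRAME * 8)
lookups_per_second = NUM_PORTS * packets_per_port
cycles_per_lookup = CLOCK_HZ / lookups_per_second

print(f"{packets_per_port / 1e6:.2f} Mpackets/s per port")    # 14.88
print(f"{cycles_per_lookup:.2f} cycles per lookup")           # 5.04
```

Under these assumptions the forwarding table sees a new request roughly every five clock cycles, which leaves very little room for setting up and tearing down a circuit per request.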
Example: Core Router Conclusion on this application example –Initial concept seems to work in simulation Current work: –Master's thesis to test concept in an FPGA
Our Reflections Many papers use routers for each connection core –Not every IP core has to have a NoC Uplink –Probably better to use local shared buses with a common NoC Uplink –On the Internet, terminals are not connected directly to routers Hard to design a network if the traffic is unknown
Our Reflections Research on how to improve NoCs can often be used to improve non-NoC based designs –Communication over long distances –Improved crossbars It will be hard to guarantee real-time performance on NoCs
Conclusions NoC seems to be a reasonable tradeoff –Similar to how standard cells make it easier to design chips No industry usage (yet?) As yet, no killer application has been demonstrated Next level of abstraction –IP centric design
Questions/Discussion Will future chips have communication patterns favoring NoCs?