A Survey of Architectural Design and Implementation Tradeoffs in Network on Chip Systems Dan Marconett Next-Generation Networking Systems Lab University of California, Davis dmarconett@ucdavis.edu Slide_1
Overview Introduction SoC/NoC Architectures Routing Strategies Energy Dissipation Conclusion Slide_2
SoC What is System-on-Chip (SoC)? Integration of multiple computer components (e.g., microcontroller, memory blocks, timers) onto a single silicon chip. Each on-chip component is referred to as a block. The block abstraction enables component-level design of an SoC containing multiple proprietary elements. Slide_3
NoC What is Network-on-Chip (NoC)? Leverages existing computer networking principles to improve inter-component, intra-chip communication for SoCs. Each on-chip component is connected by a switch to one or more communication wires. Improves on standard bus-based interconnects for SoC architectures in terms of throughput. Slide_4
Overview Introduction SoC/NoC Architectures Routing Strategies Energy Dissipation Conclusion Slide_5
Architectures: CLICHE CLICHÉ: Chip-Level Integration of Communicating Heterogeneous Elements. Two-dimensional mesh network layout for NoC design. Every switch is connected to its four nearest neighboring switches and to its target resource block, except for switches on the edge of the layout, which have fewer neighbors. Each connection consists of two unidirectional links. Slide_6
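The mesh connectivity rule can be sketched in a few lines. This is only an illustration: the function name `mesh_neighbors` and the (row, col) addressing are assumptions made here, not something the slides define.

```python
def mesh_neighbors(row, col, rows, cols):
    """Return the switches directly linked to switch (row, col) in a 2D mesh."""
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    # Switches on the edge of the layout simply have fewer valid neighbours.
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]


# In a 4x4 mesh an interior switch has four neighbours, a corner switch two.
interior = mesh_neighbors(1, 1, 4, 4)
corner = mesh_neighbors(0, 0, 4, 4)
```

Each neighbour pair would be joined by two unidirectional links, one per direction, and every switch additionally connects to its own resource block.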
Architectures: Folded Torus Similar to mesh-based architectures. Wires wrap around from the top component to the bottom and from the rightmost to the leftmost. Compared with a mesh: smaller hop count, higher bandwidth, and decreased contention, at the cost of increased chip area usage. Slide_7
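To see why the wrap-around wires reduce hop count, the sketch below compares average minimal hop counts for a mesh and a torus of the same size. It assumes minimal (shortest-path) routing and treats the folded torus as having the same connectivity as a plain torus, since folding only changes the physical layout; the helper names are illustrative.

```python
from itertools import product


def mesh_hops(a, b):
    """Minimal hop count between switches a and b in a 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def torus_hops(a, b, rows, cols):
    """Minimal hop count in a torus: each dimension may go either way round."""
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    return min(dr, rows - dr) + min(dc, cols - dc)


def average_hops(rows, cols, hops):
    """Average of hops(a, b) over all ordered pairs of distinct switches."""
    nodes = list(product(range(rows), range(cols)))
    pairs = [(a, b) for a in nodes for b in nodes if a != b]
    return sum(hops(a, b) for a, b in pairs) / len(pairs)


mesh_avg = average_hops(4, 4, mesh_hops)
torus_avg = average_hops(4, 4, lambda a, b: torus_hops(a, b, 4, 4))
```

For a 4x4 layout the corner-to-corner distance drops from 6 hops in the mesh to 2 in the torus, and the average over all pairs is lower as well.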
Architectures: BFT BFT: Butterfly Fat Tree. Each node in the tree model has coordinates (level, position), where level is the depth and position runs from left to right. Leaves are component blocks; interior nodes are switches. Each switch has four child ports and two parent ports. There are log_4 N levels, and the i-th level has N/2^(i+1) switches, where N is the number of leaves (blocks). Traffic aggregation is used to reduce congestion. Slide_8
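The switch count per level can be worked through numerically. The sketch below assumes that N is a power of 4, that switch levels are numbered from 1 directly above the leaves, and that the level count is log base 4 of N; the function names are illustrative.

```python
def bft_levels(n_blocks):
    """Number of switch levels (log base 4 of n_blocks), for powers of 4 only."""
    levels, n = 0, n_blocks
    while n > 1:
        if n % 4:
            raise ValueError("n_blocks must be a power of 4")
        n //= 4
        levels += 1
    return levels


def bft_switches_per_level(n_blocks):
    """Switches at levels 1..L, using N / 2^(i+1) for level i."""
    return [n_blocks // 2 ** (i + 1) for i in range(1, bft_levels(n_blocks) + 1)]


per_level_64 = bft_switches_per_level(64)    # [16, 8, 4]
per_level_256 = bft_switches_per_level(256)  # [64, 32, 16, 8]
```

The count halves at each level, so the total stays below N/2 switches: 28 for 64 blocks and 120 for 256 blocks.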
Architectures: SPIN SPIN: Scalable, Programmable, Integrated Network. Leverages the Butterfly Fat Tree design, but every level now has the same number of switches. The network grows as (N log N)/8. Trades area overhead and decreased power efficiency for higher throughput. Illustrative of the performance vs. power consumption trade-off. Slide_9
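A quick numerical check of the growth figure, under two assumptions that the slide does not state explicitly: the logarithm is base 2, and each of the log_4 N levels holds N/4 switches. With those assumptions the per-level count and the (N log N)/8 expression agree. Function names are illustrative.

```python
def spin_levels(n_blocks):
    """Number of switch levels, assuming n_blocks is a power of 4."""
    levels, n = 0, n_blocks
    while n > 1:
        if n % 4:
            raise ValueError("n_blocks must be a power of 4")
        n //= 4
        levels += 1
    return levels


def spin_switch_count(n_blocks):
    """Total switches, assuming every level holds n_blocks / 4 switches."""
    return spin_levels(n_blocks) * (n_blocks // 4)


def spin_growth_formula(n_blocks):
    """(N * log2 N) / 8, evaluated exactly for powers of two."""
    log2_n = n_blocks.bit_length() - 1
    return n_blocks * log2_n // 8


counts = {n: spin_switch_count(n) for n in (16, 64, 256)}
```

Under these assumptions a 256-block SPIN network needs 256 switches, which illustrates the area overhead the slide refers to.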
Architectures: Octagon Standard model: 8 components, 12 interconnects. Design complexity increases linearly with the number of nodes. The largest packet travel distance is two hops. High throughput. Shortest-path routing is easy to implement. Slide_10
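One way to see the two-hop bound is to model the octagon as a ring of eight nodes plus four cross links between opposite nodes, and to route on the relative address (destination minus current node, modulo 8). This is a sketch of that commonly described scheme; the node numbering and function names are assumptions made for illustration.

```python
def octagon_links():
    """The 12 bidirectional links: 8 ring links plus 4 cross links."""
    ring = {frozenset((i, (i + 1) % 8)) for i in range(8)}
    cross = {frozenset((i, (i + 4) % 8)) for i in range(8)}
    return ring | cross


def octagon_next_hop(current, dest):
    """Choose the next node from the relative address (dest - current) mod 8."""
    rel = (dest - current) % 8
    if rel == 0:
        return current            # already at the destination
    if rel in (1, 2):
        return (current + 1) % 8  # clockwise along the ring
    if rel in (6, 7):
        return (current - 1) % 8  # counter-clockwise along the ring
    return (current + 4) % 8      # cross link to the opposite node


def octagon_route(src, dest):
    """Full node sequence from src to dest."""
    path = [src]
    while path[-1] != dest:
        path.append(octagon_next_hop(path[-1], dest))
    return path


longest = max(len(octagon_route(s, d)) - 1 for s in range(8) for d in range(8))
```

For example, `octagon_route(0, 3)` takes the cross link to node 4 and then one ring hop back to node 3, so no pair of nodes is more than two hops apart.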
Overview Introduction SoC/NoC Architectures Routing Strategies Energy Dissipation Conclusion Slide_11
Routing: Circuit/Packet Switching Circuit switching: a dedicated path, or circuit, is established over which data packets will travel. It naturally lends itself to time-sensitive guaranteed service due to resource allocation. Reservation of bandwidth decreases overall throughput and increases average delays. Packet switching: intermediate routers are responsible for routing individual packets through the network, rather than all packets following a single path. Provides so-called best-effort service. Slide_12
Routing: Wormhole/Virtual Cut-Through Wormhole switching: the message is divided into smaller, fixed-length flow units called flits. Only the first flit contains routing information; subsequent flits follow it. Buffer size is significantly reduced because only a limited number of flits need to be buffered at any given time. Virtual cut-through switching: much like wormhole switching. The header flit can travel ahead and undergo processing while the remaining flits are still navigating the network. Higher acceptance rates and lower latencies than wormhole switching. Slide_13
Routing: Contention Contention occurs when routers or IP blocks attempt to send data over the same link at the same time. For circuit switching, contention is resolved at connection setup time. For packet switching, contention is resolved at a much finer granularity, with the router buffering and scheduling individual packets. Packet-switched networks offer better overall performance at the cost of service guarantees. Slide_14
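The division of a message into flits can be sketched as follows. The `Flit` record, the 4-byte flit size, the head/body/tail labels, and the zero padding of the last flit are all illustrative assumptions; the slides only state that flits are fixed length and that the first one carries the routing information.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Flit:
    kind: str                   # "head", "body", or "tail"
    payload: bytes
    dest: Optional[int] = None  # routing information, present on the head flit only


def to_flits(message: bytes, dest: int, flit_bytes: int = 4):
    """Split a message into a head flit followed by fixed-length data flits."""
    chunks = [message[i:i + flit_bytes] for i in range(0, len(message), flit_bytes)]
    flits = [Flit("head", b"", dest)]
    for i, chunk in enumerate(chunks):
        kind = "tail" if i == len(chunks) - 1 else "body"
        # Pad the final chunk so every data flit has the same length.
        flits.append(Flit(kind, chunk.ljust(flit_bytes, b"\x00")))
    return flits


flits = to_flits(b"abcdefghij", dest=3)
```

Because only the head flit carries a destination, the body and tail flits have to follow the path it sets up, which is why a router needs to buffer just a few flits rather than a whole packet.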
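As an illustration of per-packet contention resolution in a packet-switched router, the sketch below uses a round-robin arbiter to decide which input port gets a contended output link in each cycle. Round-robin is one common policy chosen here for illustration; the slides do not say which scheduling policy the surveyed routers use, and the class name is hypothetical.

```python
class RoundRobinArbiter:
    """Grants one contended output link to one requesting input port per cycle."""

    def __init__(self, n_inputs):
        self.n_inputs = n_inputs
        self.last_granted = n_inputs - 1  # so that port 0 has priority first

    def grant(self, requests):
        """requests: set of input port indices that want the link this cycle."""
        for offset in range(1, self.n_inputs + 1):
            port = (self.last_granted + offset) % self.n_inputs
            if port in requests:
                self.last_granted = port
                return port
        return None  # no port requested the link


arbiter = RoundRobinArbiter(4)
grants = [arbiter.grant({1, 3}) for _ in range(3)]
```

Packets from ports that lose arbitration stay in their buffers until a later cycle, which is the buffering and scheduling cost the slide mentions.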
Overview Introduction SoC/NoC Architectures Routing Strategies Energy Dissipation Conclusion Slide_15
Energy Dissipation: Architectures Energy is dissipated in two places: switches and wire segments. Many parameters in the architectural design phase affect the key trade-off of performance vs. power dissipation: length of physical wires, switching techniques, buffer allocation, types of guaranteed service, and the topology itself. Slide_16
Energy Dissipation: Architectures (2) Pande et al. [10] used a simulator to investigate various metrics, including energy dissipation, for the five main architectures. The metric is average dynamic energy dissipated per event, with each layout containing 256 functional blocks. Energy dissipation increases linearly with the number of virtual channels for all five architectures. A small number (4) of virtual channels keeps energy dissipation low without giving up throughput. When the traffic load was analyzed, energy dissipation reached an upper limit once throughput was maximized. Architectures with more elaborate topologies, and therefore higher degrees of connectivity (such as SPIN and Octagon), have much greater energy dissipation on average (~60 nJ vs. 250-350 nJ). Slide_17
Energy Dissipation: Switching How can information be routed from block A to block B so that constraints on energy consumption are met? Banerjee et al. [9] address this issue through a modeling approach based on a 4x4 mesh layout, comparing virtual cut-through (VCT) switching with wormhole switching. For both switching techniques, energy dissipation rises linearly with the packet injection rate until the network is fully congested, after which it is constant. Both techniques yield the same power consumption. VCT switching produces higher acceptance rates and lower latencies than wormhole switching, so VCT is preferred. Slide_18
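The two sources of dissipation suggest a simple additive estimate for one packet: a per-switch term for every switch traversed plus a wire term proportional to the total wire length. This is a deliberately simplified sketch, not the model from the cited papers, and the parameter names and values are hypothetical.

```python
def packet_energy(switches_traversed, energy_per_switch, wire_lengths_mm, energy_per_mm):
    """Estimate energy for one packet as switch energy plus wire energy."""
    switch_energy = switches_traversed * energy_per_switch
    wire_energy = energy_per_mm * sum(wire_lengths_mm)
    return switch_energy + wire_energy


# Hypothetical values: 3 switches at 2.0 units each, two 1 mm wire segments at 0.5 units/mm.
estimate = packet_energy(3, 2.0, [1.0, 1.0], 0.5)  # 7.0
```

In this form, a topology with fewer hops lowers the switch term, while one with longer wires, such as wrap-around links, raises the wire term.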
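The linear-then-constant behaviour described above can be written as a small piecewise model. It is a sketch under stated assumptions: the line passes through the origin, and the saturation rate and slope are hypothetical parameters rather than values from Banerjee et al.

```python
def network_energy(injection_rate, saturation_rate, energy_per_unit_rate):
    """Energy grows linearly with injection rate, then stays flat at saturation."""
    effective_rate = min(injection_rate, saturation_rate)
    return energy_per_unit_rate * effective_rate


# Hypothetical network that saturates at an injection rate of 0.5.
below = network_energy(0.2, 0.5, 10)  # 2.0
above = network_energy(0.8, 0.5, 10)  # 5.0
```

Past the saturation point, additional injected packets are not accepted by the network, so they add no further switching activity and the energy stays flat.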
Overview Introduction SoC/NoC Architectures Routing Strategies Energy Dissipation Conclusion Slide_19
Conclusion More elaborate layouts with higher degrees of connectivity (SPIN and Octagon) were seen to have much higher energy dissipation; however, they also yield increased throughput. Elaborate architectures also take up more area on the silicon chip. VCT is preferred to wormhole switching due to decreased latency, though both have the same energy dissipation for a given traffic load. Decide on priorities: communication reliability, energy efficiency, increased throughput, decreased latency...? Slide_20
References
[1] E. Rijpkema, K. Goossens, A. Radulescu, J. Dielissen, J. van Meerbergen, P. Wielage, and E. Waterlander, “Trade-offs in the Design of a Router with Both Guaranteed and Best-Effort Services for Networks on Chip,” IEE Proceedings Computers and Digital Techniques, vol. 150, no. 5, pp. 294-302, Sept. 2003.
[2] W. Dally and C. Seitz, “Deadlock-free Message Routing in Multiprocessor Interconnection Networks,” IEEE Transactions on Computers, vol. C-36, no. 5, pp. 547-553, May 1987.
[3] S. Kumar, A. Jantsch, J. Soininen, M. Forsell, M. Millberg, J. Oberg, K. Tiensyrja, and A. Hemani, “A Network on Chip Architecture and Design Methodology,” Proceedings International Symposium VLSI (ISVLSI), pp. 117-124, 2002.
[4] W. J. Dally and B. Towles, “Route Packets, Not Wires: On-Chip Interconnection Networks,” Proceedings Design and Automation Conference (DAC), pp. 683-689, 2001.
[5] P. P. Pande, C. Grecu, A. Ivanov, and R. Saleh, “Design of a Switch for Network on Chip Applications,” Proceedings International Symposium on Circuits and Systems (ISCAS), vol. 5, pp. 217-220, May 2003.
[6] P. Guerrier and A. Greiner, “A Generic Architecture for On-Chip Packet-Switched Interconnections,” Proceedings Design and Test in Europe (DATE), pp. 250-256, Mar. 2000.
[7] F. Karim, A. Nguyen, and S. Dey, “An Interconnect Architecture For Networking Systems on Chips,” IEEE Micro, vol. 22, no. 5, pp. 36-45, Sept./Oct. 2002.
[8] Arteris, “A comparison of Network-on-Chip and Buses,” http://www.arteris.com/noc_whitepaper.pdf.
[9] N. Banerjee, P. Vellanki, and K. S. Chatha, “A Power and Performance Model for Network-on-Chip Architectures,” Proceedings of the Design, Automation, and Test in Europe Conference and Exhibition (DATE), p. 21250, 2004.
[10] P. P. Pande, C. Grecu, M. Jones, A. Ivanov, and R. Saleh, “Performance Evaluation and Design Trade-Offs for Network-on-Chip Interconnect Architectures,” IEEE Transactions on Computers, vol. 54, no. 8, pp. 1025-1040, Aug. 2005.
Slide_21