Fault-Tolerant Network-Interface for Spatial Division Multiplexing Based Network-on-Chip By Anup Das
Content
- NoC Overview
  - TDM-Based
  - SDM-Based
- Existing NI Architecture
- New Area Optimized Architecture
- Need for Fault-Tolerance
- Fault-Tolerant NI Architectures
  - Centralized Approach
  - Distributed Approach
- Results
- Conclusion
Network-on-Chip
- Increasing number of IPs/PEs per die
- Communication bottleneck with a shared bus
- Need for a scalable alternative
- Use of networking concepts
- NoC proposed by Benini et al.
[Figure: NoC structure. Labels: Switch, NI, IP]
Network-on-Chip (contd.)
- Two techniques for communication
  - Time Division Multiplexing (TDM)
  - Spatial Division Multiplexing (SDM)
[Figure: TDM-based NoC and SDM-based NoC. Labels: NI, IP, Switch, A, B, C]
Network Interface Architecture
- N-to-1 bit serializers, one for each outgoing wire
- Data distributor sends data from the output queues to one of the serializers
- Each distributor can send data to each of the serializers
- Not all the distributors are loaded all the time
- A single distributor can serve all the serializers
Network Interface Architecture
[Figure: existing NI. Labels: PE, 32, Queue 1, Queue 2, Queue 3, Distributor 1, Distributor 2, Distributor 3, n to 1, out[0], out[1], out[7], Switch]
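The N-to-1 serializers described above can be pictured with a short sketch. It assumes 32-bit words (the packet size used in the experiments) and a least-significant-bit-first shift order. The bit order and the function names `serialize` and `deserialize` are illustrative assumptions, not details taken from the slides.

```python
from typing import List

WORD_BITS = 32  # packet width from the slides; the LSB-first order is an assumption


def serialize(word: int, width: int = WORD_BITS) -> List[int]:
    """Shift a word out one bit per cycle, least significant bit first."""
    if not 0 <= word < (1 << width):
        raise ValueError("word does not fit in the given width")
    return [(word >> i) & 1 for i in range(width)]


def deserialize(bits: List[int]) -> int:
    """Rebuild the word on the receiving side from an LSB-first bit stream."""
    return sum(bit << i for i, bit in enumerate(bits))


# One word occupies one outgoing wire for 32 cycles.
bits = serialize(0xDEADBEEF)
```

With eight outgoing wires, eight such bit streams can be in flight in parallel, one per serializer.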
New Area Optimized NI
- Single distributor for all the serializers
- New component called “requester” added for interfacing with the queues
- Two IDs introduced: serializer ID (sID) and queue ID (qID)
- At connection setup time, each serializer is assigned to a queue
- A serializer requests data, and the request is forwarded to the corresponding queue
- Data from the queue travels back to the requesting serializer

sID | qID
001 | 000, 001, 010
010 | 011, 100, 101
100 | 110, 111
New Area Optimized NI
[Figure: area-optimized NI. Labels: PE, 32, Queue 1, Queue 2, Queue 3, Requester, Distributor, 32 to 1, out[0], out[1], out[7], Switch]
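The request path of the area-optimized NI can be sketched as follows. The 3/3/2 grouping of serializers per queue mirrors the ID table, but the integer indices, the class names, and the behaviour on an empty queue are assumptions made only for illustration.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple

# Hypothetical connection-setup table: serializer index -> queue index.
SERIALIZER_TO_QUEUE: Dict[int, int] = {
    0: 0, 1: 0, 2: 0,
    3: 1, 4: 1, 5: 1,
    6: 2, 7: 2,
}


class Requester:
    """Fetches one word from the queue selected by the queue ID."""

    def __init__(self, queues: Dict[int, Deque[int]]) -> None:
        self.queues = queues

    def fetch(self, q_id: int) -> Optional[int]:
        queue = self.queues[q_id]
        return queue.popleft() if queue else None  # None: nothing to send


class Distributor:
    """Single distributor shared by all serializers."""

    def __init__(self, requester: Requester, table: Dict[int, int]) -> None:
        self.requester = requester
        self.table = table

    def serve(self, s_id: int) -> Tuple[int, Optional[int]]:
        """Handle a request from serializer s_id and return (s_id, word)."""
        q_id = self.table[s_id]            # look up the queue assigned at setup
        word = self.requester.fetch(q_id)  # requester reads the queue
        return s_id, word                  # data goes back to the requester's serializer


queues = {0: deque([10, 11]), 1: deque([20]), 2: deque()}
distributor = Distributor(Requester(queues), SERIALIZER_TO_QUEUE)
```

Calling `distributor.serve(1)` returns the next word of queue 0 tagged with serializer 1, which is the round trip the slide describes.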
Need for Fault-Tolerance
- Transistor density on the rise
- Shrinking feature size
- Increasing number of faults manifesting post fabrication
- Yield loss
- Need for fault-tolerance
  - IP/PE level
  - Interconnect level
- Idea is to provide graceful degradation of performance in the event of faults
NI Fault-Tolerance - Centralized
- Controller introduced between the distributors and the IP queues
- Controller changes the data mapping dynamically, with load balancing, when a fault occurs
[Figure: centralized fault-tolerant NI. Labels: PE, 32, Queue 1, Queue 2, Queue 3, Controller, Distributor 1, Distributor 2, Distributor 3, n to 1, out[0], out[1], out[7], Switch]
Centralized NI Operation
[Figure: centralized NI operation. Labels: Controller, S1 to S8, D1 to D3, Queue 1, Queue 2, Queue 3]
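One way to picture the controller's load-balanced remapping is the greedy sketch below. The slides do not state the actual policy, so the greedy heuristic, the per-queue load figures, and the function name `remap_queues` are all assumptions.

```python
from typing import Dict, Iterable


def remap_queues(
    queue_load: Dict[str, int],
    distributors: Iterable[str],
    faulty: Iterable[str],
) -> Dict[str, str]:
    """Map every queue to a healthy distributor.

    Assumed policy: place the heaviest queue first, always on the healthy
    distributor that currently carries the least load (ties broken by name).
    """
    faulty_set = set(faulty)
    healthy = [d for d in distributors if d not in faulty_set]
    if not healthy:
        raise RuntimeError("no healthy distributor left")

    load = {d: 0 for d in healthy}
    mapping: Dict[str, str] = {}
    for queue in sorted(queue_load, key=lambda q: (-queue_load[q], q)):
        target = min(healthy, key=lambda d: (load[d], d))
        mapping[queue] = target
        load[target] += queue_load[queue]
    return mapping


# Hypothetical loads (for example, words waiting in each queue).
loads = {"Q1": 5, "Q2": 3, "Q3": 2}
no_fault = remap_queues(loads, ["D1", "D2", "D3"], faulty=[])
after_fault = remap_queues(loads, ["D1", "D2", "D3"], faulty=["D2"])
```

In this example the failure of D2 moves Q2 to D3, so D1 and D3 end up with equal load. Because every remapping decision is taken in one place, a fault in the controller itself is not covered by this scheme.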
NI Fault-Tolerance - Distributed
- Multiple distributors and requesters, each capable of fault recovery
- Two other IDs included: dID (distributor ID) and rID (requester ID)
- When forwarding a request to a requester, the distributor forwards dID, sID and qID
  - qID: used by the requester to forward the request to a queue
  - dID: used by the requester to send data from the queue back to the requesting distributor
  - sID: used by the distributor to send data to the requesting serializer
Distributed NI Operation
[Figure: distributed NI operation. Labels: Queue 1, Queue 2, Queue 3]
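The role of the three forwarded IDs can be sketched as a small message exchange. The class and field names are hypothetical, the slides do not say how rID is used (here it only labels the requester), and the fault-recovery logic itself is not modelled.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Request:
    d_id: int  # distributor that issued the request
    s_id: int  # serializer that asked for data
    q_id: int  # queue assigned to that serializer


@dataclass(frozen=True)
class Response:
    d_id: int
    s_id: int
    data: Optional[int]


class Requester:
    def __init__(self, r_id: int, queues: Dict[int, Deque[int]]) -> None:
        self.r_id = r_id
        self.queues = queues

    def handle(self, req: Request) -> Response:
        queue = self.queues[req.q_id]              # qID selects the queue
        data = queue.popleft() if queue else None
        return Response(req.d_id, req.s_id, data)  # dID routes the data back


class Distributor:
    def __init__(self, d_id: int) -> None:
        self.d_id = d_id
        self.delivered: List[Tuple[int, int]] = []  # (sID, word) pairs

    def make_request(self, s_id: int, q_id: int) -> Request:
        return Request(self.d_id, s_id, q_id)

    def accept(self, resp: Response) -> None:
        if resp.d_id != self.d_id:
            raise ValueError("response routed to the wrong distributor")
        if resp.data is not None:
            self.delivered.append((resp.s_id, resp.data))  # sID picks the serializer


queues = {0: deque([7]), 1: deque([8, 9])}
requester = Requester(r_id=0, queues=queues)
d0, d1 = Distributor(0), Distributor(1)
d1.accept(requester.handle(d1.make_request(s_id=5, q_id=1)))
d0.accept(requester.handle(d0.make_request(s_id=2, q_id=0)))
```

Because each request carries its own dID and sID, any requester can answer any distributor, which is what lets the other units take over when one of them fails.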
Results
Experimental Setup
- NoC considered with 8 links per node
- Data packets of size 32 bits
- Centralized design coded in VHDL
- Distributed design coded in Verilog
- Synopsys Design Compiler for ASIC synthesis
- UMC 65nm standard cells
- Area and power numbers taken from the synthesis tool
- Area numbers converted to gate count for comparison across technologies
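The slides do not say how area was converted to gate count. A common convention, assumed here, is to divide the synthesized cell area by the area of a two-input NAND gate in the same library. The function name and the numbers in the example are hypothetical and are not UMC 65nm values.

```python
def gate_count(cell_area_um2: float, nand2_area_um2: float) -> float:
    """Express a synthesized area as NAND2-equivalent gates (assumed convention)."""
    if nand2_area_um2 <= 0:
        raise ValueError("NAND2 area must be positive")
    return cell_area_um2 / nand2_area_um2


# Hypothetical figures: 15000 um^2 of cells and a 1.5 um^2 NAND2 gate.
example_gates = gate_count(15000.0, 1.5)
```

Normalizing by a reference gate removes most of the dependence on feature size, which is why gate counts can be compared across technologies.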
Area Breakup

Components             | Centralized Design | Distributed Design
Distributor            | 1.8K               | 2.2K
Requester              | -                  | 0.5K
Controller             | 1.5K               | -
Serializer + Other     | 5K                 | 4.5K
Total (2 Distributors) | 10.1K              | 9.9K
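The totals in the area table can be reproduced from the per-component figures. The instance counts below (two distributors in both designs, two requesters in the distributed design, one controller in the centralized design) are an inference that happens to match the reported totals, not something the slides state. The K values are taken to be thousands of gates.

```python
from typing import Dict

# Per-instance area in K (assumed: thousands of gates), copied from the table.
AREA_K: Dict[str, Dict[str, float]] = {
    "centralized": {"distributor": 1.8, "requester": 0.0,
                    "controller": 1.5, "serializer_other": 5.0},
    "distributed": {"distributor": 2.2, "requester": 0.5,
                    "controller": 0.0, "serializer_other": 4.5},
}

# Assumed instance counts for the "2 Distributors" configuration.
INSTANCES: Dict[str, Dict[str, int]] = {
    "centralized": {"distributor": 2, "requester": 0,
                    "controller": 1, "serializer_other": 1},
    "distributed": {"distributor": 2, "requester": 2,
                    "controller": 0, "serializer_other": 1},
}


def total_area_k(design: str) -> float:
    """Sum area times instance count, rounded to one decimal like the table."""
    parts = AREA_K[design]
    counts = INSTANCES[design]
    return round(sum(parts[name] * counts[name] for name in parts), 1)


centralized_total = total_area_k("centralized")  # 2*1.8 + 1.5 + 5.0
distributed_total = total_area_k("distributed")  # 2*2.2 + 2*0.5 + 4.5
```

Under these assumptions, each added distributor costs 1.8K in the centralized design but 2.2K plus a 0.5K requester in the distributed one, which is consistent with the centralized design becoming more efficient as the number of distributors grows.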
Area and Power Comparison
Increasing Fault-Tolerance
Throughput
Summary
- Distributed design is more area and power efficient, but the centralized design becomes more efficient with more distributors
- A single fault in the controller of the centralized design renders it useless
- No single fault affects the behavior of the distributed NI
- Next steps
  - Increase granularity of load balancing
  - Fault-tolerance of the serializer
Thank you