SARC Proprietary and Confidential
Processor-to-Memory-Blocks NoC with Pre-Configured (but run-time reconfigurable) Low-Latency Routes
G. Mihelogiannakis, M. Katevenis, D. Pnevmatikatos
FORTH-ICS, Crete, Greece
SARC – Preliminary Draft of May 2006

Traditional Multiprocessor View
Local (cache) memory(ies) seen as monolithic blocks, each

Proposed View for Chip Multiprocessors
Simple processors
Lots of memory
  – to compensate for limited chip I/O throughput
Large memories need to be built out of multiple smaller blocks
  – in order to bound word-line and bit-line capacitance within each block

Opportunities for (Re-)Configurability
Uniform allocation of memory blocks to processors
Non-uniform allocation of memory blocks to processors
Challenge: make reconfigurable allocation almost as fast as fixed allocation

Long on-chip Wires already contain Active Elements
Periodic buffers, due to the quadratic nature of RC wire delay
Approximate worst-case numbers for a 130-nm technology
  – as currently available to European Universities
  – as synthesized, placed-&-routed, optimized
  – Synopsys DC V SP2, SOC-Encounter 3.3, Cadence NC Verilog
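
The quadratic-delay argument for periodic buffers can be sketched numerically. The model below is a simplified first-order approximation, not the measured 130-nm data: it assumes a distributed RC wire whose delay is roughly 0.38·R·C, and it ignores buffer drive resistance and input capacitance. All function names, parameters, and values are hypothetical and chosen for illustration only.

```python
import math

def unbuffered_delay(length_mm, r_per_mm, c_per_mm, k_dist=0.38):
    """Delay of an unbuffered distributed RC wire (seconds).

    Total R and total C both grow with length, so the delay grows
    with length squared. k_dist is an assumed distributed-RC factor.
    """
    return k_dist * r_per_mm * c_per_mm * length_mm ** 2

def buffered_delay(length_mm, r_per_mm, c_per_mm, n_segments, t_buf, k_dist=0.38):
    """Delay of the same wire cut into n_segments, each followed by a buffer.

    Assumption: every buffer adds a fixed delay t_buf and fully
    isolates its segment from the next one.
    """
    seg = length_mm / n_segments
    return n_segments * (unbuffered_delay(seg, r_per_mm, c_per_mm, k_dist) + t_buf)

def best_segment_count(length_mm, r_per_mm, c_per_mm, t_buf, k_dist=0.38):
    """Integer segment count that minimizes buffered_delay in this model."""
    ideal = length_mm * math.sqrt(k_dist * r_per_mm * c_per_mm / t_buf)
    candidates = {max(1, math.floor(ideal)), max(1, math.ceil(ideal))}
    return min(
        candidates,
        key=lambda n: buffered_delay(length_mm, r_per_mm, c_per_mm, n, t_buf, k_dist),
    )

# Hypothetical wire: 10 mm, 100 ohm/mm, 0.2 pF/mm, 50 ps per buffer.
n = best_segment_count(10, 100, 0.2e-12, 50e-12)
plain = unbuffered_delay(10, 100, 0.2e-12)
split = buffered_delay(10, 100, 0.2e-12, n, 50e-12)
print(f"segments={n}, unbuffered={plain * 1e12:.0f} ps, buffered={split * 1e12:.0f} ps")
```

In this model, doubling the wire length quadruples the unbuffered delay, while the buffered delay grows roughly linearly once the segment count is scaled with the length. That is why long wires already carry active elements that could double as configuration points.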

Turn these into Low-Latency Configurability Elements
2-to-1 multiplexor made of (semi-custom) and-or-buffer gates
  – can we do better with (custom) transmission gates?

Pre-Configuration is critical for Low Latency
Control logic plus fan-out to 32 mux bits adds considerable delay
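
A bit-level sketch of a 32-bit-wide 2-to-1 and-or multiplexor shows where the control fan-out comes from. This is a functional model only, with no timing: the names `WIDTH`, `fan_out`, and `mux2` are hypothetical, and the 32-bit width is taken from the "32 mux bits" figure above.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1

def fan_out(sel: int) -> int:
    """Replicate the single select bit to all WIDTH mux bit-slices.

    In hardware this is the fan-out that must settle before data can
    pass, which is why the select is set ahead of data arrival.
    """
    return MASK if sel else 0

def mux2(a: int, b: int, sel: int) -> int:
    """2-to-1 mux as and-or logic: (a AND NOT s) OR (b AND s), per bit."""
    s = fan_out(sel)
    return (a & ~s & MASK) | (b & s)

# With the select already configured, only the and-or stage is on the data path.
print(hex(mux2(0xDEADBEEF, 0x12345678, 0)))
print(hex(mux2(0xDEADBEEF, 0x12345678, 1)))
```

The point of the sketch is the split between `fan_out` (control, done in advance) and the and-or expression (data path). Pre-configuration moves the former off the critical path.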

Configure “Preferred” Paths before Data Arrival
Preconfigure (speculatively set) control for “preferred” path
Alternate paths still work, at increased latency
Configuration can change at run-time, quite fast

Prior Art: Low Latency NoC Routers
Optimize routing decision, crossbar arbitration, VC allocation for one-clock-cycle operation
  – Mullins, West, Moore: “Low-Latency Virtual-Channel Routers for On-Chip Networks”, ISCA 2004
  – Kim, Park, Theocharides, Vijaykrishnan, Das: “A Low Latency Router Supporting Adaptivity for On-Chip Interconnects”, DAC 2005

Contribution: Decouple Data Rate from Configuration
Configure “preferred” paths at whatever rate is convenient
When header/address/data arrive, forward along preferred path and, in parallel, check header
  – if destination was not along preferred path, recover at longer latency
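
The forward-then-check behavior can be modeled in a few lines. This is a behavioral sketch under stated assumptions, not the proposed hardware: the `Hop` class, its routing table, and the latency values (1 unit on the preferred path, 4 units after recovery) are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Hop:
    route_table: dict      # destination -> correct output port (hypothetical)
    preferred: int         # output port pre-configured before data arrives
    fast_delay: int = 1    # assumed latency along the preferred path
    slow_delay: int = 4    # assumed latency when recovery is needed

    def forward(self, dest):
        """Return (output_port, latency) for one header.

        Data is sent speculatively on the preferred port while the
        header is checked; the check is modeled as a table lookup.
        """
        correct = self.route_table[dest]
        if correct == self.preferred:
            return correct, self.fast_delay   # speculation was right
        return correct, self.slow_delay       # recover via the alternate path

    def reconfigure(self, port):
        """Change the preferred port at run-time, independent of data rate."""
        self.preferred = port

hop = Hop(route_table={"memA": 0, "memB": 1}, preferred=0)
print(hop.forward("memA"))   # destination lies on the preferred path
print(hop.forward("memB"))   # destination off the preferred path
hop.reconfigure(1)
print(hop.forward("memB"))   # now on the preferred path
```

Note that `reconfigure` is never called from `forward`: configuration changes at its own rate, while every header still reaches the correct port, only at different latency.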

Conclusion
Coarse-grain reconfigurability
  – at the level of memory block, compute processor, compute engine, or (simple) control processor (FSM)
Configure “preferred routes” in the chip, along which information flows at very low latency
Other routes still available, but at longer latency
Preferred routes easily reconfigurable, at run-time