Using Packet Information for Efficient Communication in NoCs Prasanna Venkatesh R, Madhu Mutyam PACE Lab IIT Madras 20-11-2018
Agenda
  Motivation
  Existing techniques to handle multicasts in NoCs
  Dynamic Multicast Tree
  VC as Cache
  Packet Concatenation
  IPC Results
  Energy Analysis
  Conclusion
Motivation
  SPLASH and PARSEC benchmarks have up to 87% of nodes participating in a multicast, but the average is only 7.5%.
Motivation
  SPLASH and PARSEC benchmarks have up to 87% of nodes participating in a multicast, but this maximum level of communication occurs for less than 4% of the time.
Multicasts: Solutions in the literature
  Separate injections: flood the network with redundant copies
  Multicasts: a single copy travels along the common path, then forks into multiple copies
  Simplifies routing logic
  Dynamic multicast routing can make use of idle paths to avoid congestion, but is it possible to meet timing constraints?
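The difference between separate injections and a forking multicast can be illustrated by counting link traversals on a 2D mesh. The sketch below is only an illustration: it assumes deterministic XY routing (not the routing used in this work), and the function names are hypothetical.

```python
def xy_path_links(src, dst):
    """Directed links on the XY route from src to dst (assumed routing, for illustration)."""
    x, y = src
    links = []
    while x != dst[0]:                      # travel along X first
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:                      # then along Y
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def separate_injection_cost(src, dests):
    # One full copy per destination: every hop of every path is traversed.
    return sum(len(xy_path_links(src, d)) for d in dests)


def tree_multicast_cost(src, dests):
    # A single copy on shared links: each distinct link is traversed once.
    links = set()
    for d in dests:
        links.update(xy_path_links(src, d))
    return len(links)


src = (0, 0)
dests = [(3, 0), (3, 1), (3, 2)]
print(separate_injection_cost(src, dests))  # 12 link traversals
print(tree_multicast_cost(src, dests))      # 5 link traversals
```

In this small example the three destinations share the first three hops, so the forking multicast traverses 5 links where separate injections traverse 12.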
Our proposals to achieve multicast efficiency
  Dynamic multicast tree construction using redundant route computation (RC) units
    Will it penalize unicasts and create starvation?
  Three optimizations on unicasts to enhance dynamic multicasting
    VC as cache
    Packet concatenation
    Critical word first
Critical Word First
  Borrowed from the cache data-transfer optimization technique
  Makes efficient use of the flit-level split of a packet containing a cache block
  Sends the requested word with the header flit
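A minimal sketch of how a cache block might be split into flits with the requested word travelling in the header flit. The wrap-around word order, the 8-word block, and the two-words-per-body-flit split are assumptions chosen so that the packet comes out at 5 flits; the function name and flit layout are hypothetical.

```python
def packetize_critical_word_first(block_words, critical_index, words_per_body_flit=2):
    """Split a cache block into flits, placing the requested word in the head flit.

    Assumption: remaining words follow in wrap-around order after the critical word.
    """
    n = len(block_words)
    order = [block_words[(critical_index + i) % n] for i in range(n)]
    flits = [{"type": "head", "words": order[:1]}]   # requested word rides with the header
    rest = order[1:]
    for i in range(0, len(rest), words_per_body_flit):
        flits.append({"type": "body", "words": rest[i:i + words_per_body_flit]})
    return flits


block = [f"w{i}" for i in range(8)]
flits = packetize_critical_word_first(block, critical_index=5)
for flit in flits:
    print(flit["type"], flit["words"])
```

With this layout the requesting core can resume as soon as the head flit arrives, while the remaining words follow in the body flits.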
Dynamic Multicast Tree
  Method
    Compute the Odd-Even route at each router for all multicast destinations
    Takes one RC cycle per destination
    Add a redundant RC unit to speed up this process
    No extra chip area because of the simplicity
  Caveats
    Bottlenecks unicasts
    Slow when there is no congestion
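The per-router step above can be sketched as follows. The routing function follows the commonly cited minimal odd-even turn-model formulation, written from memory rather than taken from the authors' implementation. Several things are assumptions for illustration: the coordinate convention (x grows eastward, y grows northward), picking the first legal port instead of a congestion-aware choice, and the model of RC units working on destinations in parallel.

```python
from math import ceil


def odd_even_ports(cur, src, dst):
    """Legal output ports under minimal odd-even routing (assumed formulation)."""
    (c0, c1), (s0, _), (d0, d1) = cur, src, dst
    e0, e1 = d0 - c0, d1 - c1
    if e0 == 0 and e1 == 0:
        return ["local"]
    vertical = "north" if e1 > 0 else "south"
    if e0 == 0:
        return [vertical]
    if e0 > 0:                                  # eastbound
        if e1 == 0:
            return ["east"]
        ports = []
        if c0 % 2 == 1 or c0 == s0:             # turning away from east: odd column or source column
            ports.append(vertical)
        if d0 % 2 == 1 or e0 != 1:              # keep going east unless that forces an illegal turn
            ports.append("east")
        return ports
    ports = ["west"]                            # westbound
    if c0 % 2 == 0 and e1 != 0:                 # vertical moves only from even columns
        ports.append(vertical)
    return ports


def route_multicast(cur, src, dests, rc_units=2):
    """Group destinations by output port and estimate the RC cycles needed."""
    forks = {}
    for dst in dests:
        port = odd_even_ports(cur, src, dst)[0]  # assumption: take the first legal port
        forks.setdefault(port, []).append(dst)
    rc_cycles = ceil(len(dests) / rc_units)      # one RC cycle per destination per RC unit
    return forks, rc_cycles


dests = [(4, 3), (4, 2), (1, 0), (2, 5), (3, 1)]
print(route_multicast((2, 2), (2, 2), dests, rc_units=1))
print(route_multicast((2, 2), (2, 2), dests, rc_units=2))
```

Each key in the returned dictionary corresponds to one fork of the multicast at this router. Under these assumptions, five destinations take five RC cycles with a single RC unit and three with a redundant one.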
VC as Cache: Scenario
  A shared cache block is requested by more than one node within a given time frame
  The owner sends a multicast of the block to all the requestors
  A request arrives after this multicast
  The owner resends the block after processing this request
Solution: add the new requestor to the already processed multicast midway
  Compare up to five multicast packets with an incoming request packet at the router
  If matched:
    Forward the request to the owner for coherence and bookkeeping, with a time stamp of the previous message
    Add this requestor to the multicast destinations
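The matching step at the router can be sketched as below. All field and function names are hypothetical, and the match criterion (same block address and same owner) is an assumption; only the limit of five compared multicast packets comes from the slide.

```python
from dataclasses import dataclass
from typing import Optional

MAX_COMPARED = 5  # up to five buffered multicast packets are compared per request


@dataclass
class MulticastPacket:
    block_addr: int
    owner: int
    destinations: set
    timestamp: int


@dataclass
class Request:
    block_addr: int
    requestor: int
    owner: int
    matched_timestamp: Optional[int] = None  # time stamp of the multicast it was merged into


def handle_request(buffered_multicasts, request):
    """Try to merge a request into a multicast still buffered in the VCs.

    Returns True on a match. The request is forwarded to the owner either way;
    on a match it carries the multicast's time stamp for bookkeeping.
    """
    for pkt in buffered_multicasts[:MAX_COMPARED]:
        if pkt.block_addr == request.block_addr and pkt.owner == request.owner:
            pkt.destinations.add(request.requestor)
            request.matched_timestamp = pkt.timestamp
            return True
    return False


mc = MulticastPacket(block_addr=0x40, owner=7, destinations={1, 2}, timestamp=100)
late = Request(block_addr=0x40, requestor=9, owner=7)
print(handle_request([mc], late), sorted(mc.destinations), late.matched_timestamp)
```

On a match, the owner can recognize from the time stamp that the block is already on its way and does not need to resend it.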
Packet Concatenation
  A request is a single-flit packet
  When RC units are busy, single-flit packets to the same destination can be combined to form a "super-packet"
  From there on, one RC cycle computes the route for multiple packets
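A minimal sketch of the grouping step, assuming packets are represented as dictionaries with hypothetical `id` and `dst` fields and that every packet waiting for a busy RC unit is eligible for concatenation.

```python
def concatenate(waiting_packets):
    """Group single-flit packets by destination into super-packets.

    Each super-packet needs only one route computation downstream.
    """
    groups = {}
    for pkt in waiting_packets:
        groups.setdefault(pkt["dst"], []).append(pkt)
    return [{"dst": dst, "flits": pkts} for dst, pkts in groups.items()]


waiting = [
    {"id": 1, "dst": 5},
    {"id": 2, "dst": 9},
    {"id": 3, "dst": 5},
    {"id": 4, "dst": 5},
]
supers = concatenate(waiting)
print(len(waiting), "packets ->", len(supers), "route computations")
```

Here four waiting requests collapse into two super-packets, so two route computations replace four at each subsequent router.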
Configuration for simulations
  Simulators: Multi2sim 4.0.1, Booksim 2.0, Orion 2.0
  Real time simulation
  64 nodes: 32 core + L1 nodes and 32 shared, distributed L2 cache banks
  1 flit for request and coherence packets, 5 flits for a cache block
  Benchmarks: SPLASH2 and PARSEC workloads with 32 threads
  All high-injection workloads are picked after an initial study of their injection rates
IPC Results
  Abbreviations: C = Critical Word First, V = VC as Cache, D = Dynamic Multicast Tree, P = Packet Concatenation
Scaling to 512 Nodes: IPC Results
  Abbreviations: C = Critical Word First, V = VC as Cache, D = Dynamic Multicast Tree, P = Packet Concatenation
Fine-Grained Energy Footprint of Barnes
  Abbreviations: C = Critical Word First, V = VC as Cache, D = Dynamic Multicast Tree, P = Packet Concatenation
Conclusion and future extensions
  Scalable solution for multicasts
  Can fit with existing techniques
  Easy to implement
  Energy efficient
  Packet Concatenation can be switched on selectively depending on the load requirements
  Other architecture-level inputs can also be used for further performance gains, for example the number of instructions waiting or memory-level parallelism
Thank you