1
Using Packet Information for Efficient Communication in NoCs
Prasanna Venkatesh R and Madhu Mutyam, PACE Lab, IIT Madras
2
Agenda
- Motivation
- Existing techniques to handle multicasts in NoCs
- Dynamic Multicast Tree
- VC as Cache
- Packet Concatenation
- IPC Results
- Energy Analysis
- Conclusion
3
Motivation
- In SPLASH and PARSEC benchmarks, up to 87% of nodes participate in a multicast, but the average is only 7.5%.
4
Motivation
- In SPLASH and PARSEC benchmarks, up to 87% of nodes participate in a multicast, but this maximum level of communication exists for less than 4% of the time.
5
Multicasts: Solutions in the literature
- Separate injections per destination flood the network with redundant copies.
- Tree-based multicast: a single copy travels along the common path and forks into multiple copies where destinations diverge; this simplifies routing logic.
- Dynamic multicast routing can exploit idle paths to avoid congestion, but can it still meet timing constraints?
6
Our Proposals to achieve multicast efficiency
- Dynamic multicast tree construction using redundant route computation (RC) units. Will it penalize unicasts and cause starvation?
- Three optimizations on unicasts to enhance dynamic multicasting:
  - VC as Cache
  - Packet Concatenation
  - Critical Word First
7
Critical Word First
- Borrowed from the critical-word-first cache data-transfer optimization.
- Makes efficient use of the flit-level split of a packet carrying a cache block.
- Sends the requested (critical) word with the header flit (see the sketch below).
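Below is a minimal sketch of the idea under stated assumptions: the flit layout and field names are illustrative only, not the authors' hardware. The cache block is reordered in wrap-around fashion so the requested word rides in the header flit and the remaining words follow.

# Hypothetical sketch: critical-word-first ordering of a cache-block reply.
# Flit/field names are assumptions; a real router works on bit fields, not dicts.
def build_reply_flits(block_words, critical_index, dest):
    """Order the block so the requested (critical) word rides with the header flit."""
    n = len(block_words)
    # Wrap-around order starting at the critical word, as in CWF cache refills.
    order = [(critical_index + i) % n for i in range(n)]
    flits = [{"type": "HEAD", "dest": dest, "payload": block_words[order[0]]}]
    for idx in order[1:]:
        flits.append({"type": "BODY", "payload": block_words[idx]})
    if len(flits) > 1:
        flits[-1]["type"] = "TAIL"
    return flits

# Example: word 2 was requested, so it leaves first with the header flit.
print(build_reply_flits(["w0", "w1", "w2", "w3"], critical_index=2, dest=17))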
8
Dynamic Multicast Tree
Method
- Compute the Odd-Even route at each router for all multicast destinations; this takes one RC cycle per destination.
- Add a redundant RC unit to speed up this process; it adds no extra chip area owing to its simplicity (see the sketch below).
Caveats
- Can bottleneck unicasts.
- Slow when there is no congestion.
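A rough sketch of the route-computation loop, under stated assumptions: a plain dimension-order decision stands in for the actual Odd-Even routing function, rc_units models the redundant RC unit (two destinations resolved per cycle), and destinations are grouped by output port so the multicast forks wherever ports diverge. Names and structure are illustrative, not the authors' implementation.

# Hypothetical sketch of per-destination route computation for a multicast.
def route_port(cur, dst):
    # Placeholder dimension-order decision standing in for Odd-Even routing;
    # a real router would apply the Odd-Even turn restrictions here.
    cx, cy = cur
    dx, dy = dst
    if dx != cx:
        return "EAST" if dx > cx else "WEST"
    if dy != cy:
        return "NORTH" if dy > cy else "SOUTH"
    return "LOCAL"

def compute_multicast_ports(cur, destinations, rc_units=2):
    """Each RC cycle resolves `rc_units` destinations; returns ports and cycle count."""
    ports = {}
    cycles = 0
    for i in range(0, len(destinations), rc_units):
        cycles += 1                       # one RC cycle per batch of destinations
        for dst in destinations[i:i + rc_units]:
            ports.setdefault(route_port(cur, dst), []).append(dst)
    return ports, cycles

# Destinations mapped to different ports fork into separate copies downstream.
print(compute_multicast_ports((2, 2), [(0, 3), (5, 2), (2, 6), (4, 0)]))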
9
VC as Cache: Scenario
- A shared cache block is requested by more than one node within a given time frame.
- The owner sends a multicast of the block to all requestors.
- Another request for the same block arrives after this multicast.
- The owner has to resend the block after processing this request.
10
Solution: add the new requestor to the in-flight multicast midway!
- Compare up to five multicast packets with an incoming request packet at the router.
- If a match is found:
  - Forward the request to the owner for coherence bookkeeping, tagged with the timestamp of the earlier message.
  - Add this requestor to the multicast's destinations (see the sketch below).
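A sketch of this lookup, with assumed data-structure and field names (not the authors' implementation): the router compares the incoming request's block address against up to five multicast packets held in its VCs, appends the new requestor to a matching multicast, and forwards the request to the owner tagged with the earlier message's timestamp.

# Hypothetical sketch of the "VC as cache" comparison at a router.
MAX_COMPARED_MULTICASTS = 5

def vc_as_cache_lookup(vc_multicasts, request):
    """Try to piggyback `request` on a buffered multicast of the same block."""
    for mcast in vc_multicasts[:MAX_COMPARED_MULTICASTS]:
        if mcast["block_addr"] == request["block_addr"]:
            # Serve the block from the in-flight multicast...
            mcast["destinations"].append(request["requestor"])
            # ...and still notify the owner, tagged with the original send time,
            # so directory/coherence state stays consistent.
            return {
                "requestor": request["requestor"],
                "block_addr": request["block_addr"],
                "piggybacked": True,
                "timestamp": mcast["send_time"],
            }
    return None  # no match: the request travels to the owner as usual

# Example: a request for block 0x80 joins a multicast already sitting in a VC.
vcs = [{"block_addr": 0x80, "destinations": [3, 9], "send_time": 120}]
print(vc_as_cache_lookup(vcs, {"requestor": 14, "block_addr": 0x80}))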
11
Packet Concatenation
- A request is a single-flit packet.
- When the RC units are busy, single-flit packets headed to the same destination can be clubbed into a "super-packet".
- From then on, one RC cycle computes the route for multiple packets at once (see the sketch below).
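A minimal sketch of the clubbing step, assuming a simple pending-packet queue with illustrative names (not the authors' microarchitecture): when the RC units are busy, single-flit requests to the same destination are merged into one super-packet, so a single route computation covers all of them.

# Hypothetical sketch of packet concatenation at the RC stage.
from collections import defaultdict

def concatenate_requests(pending_single_flits, rc_busy):
    if not rc_busy:
        return pending_single_flits       # no RC pressure: send packets as-is
    by_dest = defaultdict(list)
    for pkt in pending_single_flits:
        by_dest[pkt["dest"]].append(pkt)
    super_packets = []
    for dest, pkts in by_dest.items():
        if len(pkts) == 1:
            super_packets.append(pkts[0])
        else:
            # One header, one RC cycle; member requests ride as body flits.
            super_packets.append({"dest": dest, "members": pkts, "super": True})
    return super_packets

# Two requests to node 5 become one super-packet; the request to node 9 stays alone.
reqs = [{"dest": 5, "id": 1}, {"dest": 5, "id": 2}, {"dest": 9, "id": 3}]
print(concatenate_requests(reqs, rc_busy=True))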
12
Configuration for simulations
- Simulators: Multi2sim 4.0.1, Booksim 2.0, Orion 2.0 (real-time simulation)
- 64 nodes: 32 core + L1 nodes and 32 shared, distributed L2 cache banks
- Packet sizes: 1 flit for request and coherence packets, 5 flits for a cache block
- Benchmarks: SPLASH-2 and PARSEC workloads with 32 threads
- All high-injection workloads were picked after an initial study of their injection rates
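For reference, the stated parameters collected in one place as a plain Python dictionary; the key names are illustrative, and this is not an actual Multi2sim or Booksim configuration file.

# Summary of the simulated system as described on this slide (key names assumed).
SIM_CONFIG = {
    "simulators": ["Multi2sim 4.0.1", "Booksim 2.0", "Orion 2.0"],
    "nodes": 64,                     # 32 core + L1 nodes, 32 shared L2 banks
    "core_l1_nodes": 32,
    "shared_l2_banks": 32,
    "flits_per_request": 1,          # request and coherence packets
    "flits_per_cache_block": 5,
    "benchmarks": ["SPLASH-2", "PARSEC"],
    "threads": 32,
}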
13
IPC Results
Abbreviations: C – Critical Word First, V – VC as Cache, D – Dynamic Multicast Tree, P – Packet Concatenation
14
IPC Results (continued)
Abbreviations: C – Critical Word First, V – VC as Cache, D – Dynamic Multicast Tree, P – Packet Concatenation
15
Scaling to 512 Nodes: IPC Results
Abbreviations: C – Critical Word First, V – VC as Cache, D – Dynamic Multicast Tree, P – Packet Concatenation
16
Fine-Grained Energy Footprint of Barnes
Abbreviations: C – Critical Word First, V – VC as Cache, D – Dynamic Multicast Tree, P – Packet Concatenation
17
Conclusion and future extensions
- A scalable solution for multicasts: it fits alongside existing techniques, is easy to implement, and is energy efficient.
- Packet Concatenation can be switched on selectively depending on load requirements.
- Other architecture-level inputs can also be used for further performance gains, e.g., the number of waiting instructions or memory-level parallelism.
18
Thank you