Using Packet Information for Efficient Communication in NoCs

Presentation transcript:

Using Packet Information for Efficient Communication in NoCs
Prasanna Venkatesh R, Madhu Mutyam
PACE Lab, IIT Madras
20-11-2018

Agenda
Motivation
Existing techniques to handle multicasts at NoC
Dynamic Multicast Tree
VC as Cache
Packet Concatenation
IPC Results
Energy Analysis
Conclusion

Motivation
In SPLASH and PARSEC benchmarks, up to 87% of nodes participate in a multicast, but the average is only 7.5%.

Motivation
In SPLASH and PARSEC benchmarks, up to 87% of nodes participate in a multicast, but the maximum communication occurs for less than 4% of the time.

Multicasts: Solutions in the literature
Separate injections: flood the network with redundant copies
Multicasts: a single copy travels along the common path, then forks into multiple copies
Simplifies routing logic
Dynamic multicast routing can make use of idle paths to avoid congestion, but is it possible to meet timing constraints?

Our proposals to achieve multicast efficiency
Dynamic multicast tree construction using redundant route computation units
Will it penalize unicasts and create starvation?
Three optimizations on unicasts to enhance dynamic multicasting:
VC as cache
Packet concatenation
Critical word first

Critical Word First
Borrowed from the cache data transfer optimization technique
Makes efficient use of the flit-level split of a packet containing a cache block
Sends the requested word with the header flit
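The idea on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name `critical_word_first`, the one-word-per-flit layout, and the wrap-around order of the remaining words are all assumptions made for the example. The slides state only that the requested word travels with the header flit.

```python
def critical_word_first(words, requested):
    """Build the flits of a cache-block packet with the requested word first.

    Assumptions (not from the slides): one word per flit, and the remaining
    words follow in wrap-around order after the requested one.
    """
    if not 0 <= requested < len(words):
        raise ValueError("requested index out of range")

    # Rotate the block so the requested (critical) word comes first.
    rotated = words[requested:] + words[:requested]

    # The header flit carries the critical word.
    flits = [{"type": "head", "payload": rotated[0]}]
    flits += [{"type": "body", "payload": w} for w in rotated[1:]]

    # Mark the last flit as the tail when the packet has more than one flit.
    if len(flits) > 1:
        flits[-1]["type"] = "tail"
    return flits


flits = critical_word_first(["w0", "w1", "w2", "w3"], requested=2)
print([f["payload"] for f in flits])
```

With this ordering the requesting core can resume as soon as the header flit arrives, rather than waiting for the whole block.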

Dynamic Multicast Tree
Method:
Compute the Odd-Even route at each router for all multicast destinations
Takes one RC cycle per destination
Add a redundant RC unit to speed up this process
No extra chip area because of its simplicity
Caveats:
Bottlenecks unicasts
Slow when there is no congestion
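A rough Python sketch of the two points above: the route-computation (RC) cost per multicast, and how a router forks a multicast by grouping destinations per output port. The names `rc_cycles`, `next_port`, and `fork_multicast` are hypothetical. `next_port` is a simple X-then-Y stand-in and is NOT the Odd-Even routing used in the slides. The assumption that RC units split the destinations evenly is also ours.

```python
from collections import defaultdict
from math import ceil


def rc_cycles(num_destinations, rc_units=1):
    """RC cycles for one multicast, at one cycle per destination per RC unit.

    Assumes the RC units share the destinations evenly.
    """
    if rc_units < 1:
        raise ValueError("need at least one RC unit")
    return ceil(num_destinations / rc_units)


def next_port(current, dest):
    """Stand-in minimal routing (X first, then Y); not Odd-Even."""
    cx, cy = current
    dx, dy = dest
    if dx > cx:
        return "E"
    if dx < cx:
        return "W"
    if dy > cy:
        return "N"
    if dy < cy:
        return "S"
    return "LOCAL"


def fork_multicast(current, destinations):
    """Group destinations by output port; each group becomes one branch."""
    branches = defaultdict(list)
    for dest in destinations:
        branches[next_port(current, dest)].append(dest)
    return dict(branches)


print(rc_cycles(5, rc_units=1), rc_cycles(5, rc_units=2))
print(fork_multicast((1, 1), [(3, 1), (2, 0), (1, 1), (0, 2)]))
```

Under these assumptions a second RC unit roughly halves the per-router RC time for a multicast, which is the motivation for the redundant unit.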

VC as Cache: Scenario
A shared cache block is requested by more than one node within a given time frame
The owner sends a multicast of the block to all the requestors
A request arrives after this multicast
The owner resends the block after processing this request

Solution: add the new requestor to the processed multicast midway!
Compare up to five multicast packets with an incoming request packet at the router
If matched:
Forward the request to the owner for coherence and bookkeeping, with the timestamp of the previous message
Add this requestor to the multicast destinations
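The matching step can be sketched as follows. This is a simplified model under stated assumptions: packets are plain dictionaries, a match means the same block address, and the field names (`block`, `destinations`, `timestamp`, `requestor`, `served_by_timestamp`) and the function `try_merge_request` are hypothetical. Only the limit of five comparisons comes from the slides.

```python
MAX_COMPARE = 5  # from the slides: up to five multicast packets are compared


def try_merge_request(buffered_multicasts, request):
    """Try to serve a late request from a multicast already held in the VCs.

    Returns the bookkeeping message to forward to the owner on a match,
    or None when the request must travel to the owner as usual.
    """
    for multicast in buffered_multicasts[:MAX_COMPARE]:
        if multicast["block"] == request["block"]:
            # Add the late requestor to the in-flight multicast.
            multicast["destinations"].add(request["requestor"])
            # Forward the request to the owner for coherence bookkeeping,
            # tagged with the timestamp of the earlier message.
            return {**request, "served_by_timestamp": multicast["timestamp"]}
    return None


buffered = [{"block": 0x40, "destinations": {2, 7}, "timestamp": 100}]
note = try_merge_request(buffered, {"block": 0x40, "requestor": 9})
print(note, buffered[0]["destinations"])
```

On a match the owner does not have to resend the block; it only updates its sharer list from the forwarded message.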

Packet Concatenation
A request is a single-flit packet
When the RC units are busy, single-flit packets to the same destination can be combined to form a "super-packet"
From then on, a single RC cycle computes the route for multiple packets
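A minimal sketch of the grouping step, assuming packets are dictionaries with a `dest` field and that a super-packet is simply a list of the packets it carries. The function name `concatenate` and the `rc_busy` flag are illustrative and do not come from the slides.

```python
def concatenate(single_flit_packets, rc_busy):
    """Group waiting single-flit packets into super-packets per destination.

    When the RC units are idle, every packet is routed on its own.
    Each returned group needs one route computation.
    """
    if not rc_busy:
        return [[packet] for packet in single_flit_packets]

    groups = {}  # dicts keep insertion order, so arrival order is preserved
    for packet in single_flit_packets:
        groups.setdefault(packet["dest"], []).append(packet)
    return list(groups.values())


waiting = [{"id": 0, "dest": 3}, {"id": 1, "dest": 5}, {"id": 2, "dest": 3}]
super_packets = concatenate(waiting, rc_busy=True)
print(len(super_packets), "route computations instead of", len(waiting))
```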

Configuration for simulations
Simulators: Multi2sim 4.0.1, Booksim 2.0, Orion 2.0
Real time simulation
64 nodes: 32 cores + L1 nodes and 32 shared distributed L2 cache banks
1 flit for request and coherence packets, 5 flits for a cache block
Benchmarks: SPLASH2 and PARSEC workloads with 32 threads
All high-injection workloads are picked after an initial study of their injection rates

IPC Results
Abbreviations:
C: Critical Word First
V: VC as Cache
D: Dynamic Multicast Tree
P: Packet Concatenation

Scaling to 512 Nodes: IPC Results
Abbreviations:
C: Critical Word First
V: VC as Cache
D: Dynamic Multicast Tree
P: Packet Concatenation

Fine-Grained Energy Footprint of Barnes
Abbreviations:
C: Critical Word First
V: VC as Cache
D: Dynamic Multicast Tree
P: Packet Concatenation

Conclusion and future extensions
Scalable solution for multicasts
Can fit with existing techniques
Easy to implement
Energy efficient
Packet concatenation can be switched on selectively depending on the load requirements
Other architecture-level inputs can also be used for further performance gains, for example the number of instructions waiting and memory-level parallelism

Thank you