José Vicente Escamilla, José Flich, Pedro Javier García

• Introduction / Motivation
• ICARO overview
• ICARO description
  ◦ Detection
  ◦ Notification
  ◦ Isolation
• Results
• Conclusions
• Questions

• CMPs and MPSoCs use a network to interconnect nodes.
• Network performance can degrade due to:
  ◦ Power-saving mechanisms (DVFS)
  ◦ Bursty traffic patterns
  ◦ Heterogeneous system designs
• Performance degradation may lead to congestion.
[Figure: CMP and MPSoC examples; Tile-Gx (72 cores)]

• ICARO does not remove congestion; it isolates it.
• Two types of traffic:
  ◦ Congested
  ◦ Non-congested
• Goal: isolate congested traffic from non-congested traffic in order to avoid head-of-line (HoL) blocking.
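The HoL-blocking problem that motivates ICARO can be illustrated with a toy Python model. It is only a sketch, not the authors' simulator: the function `drain`, the message labels and the one-message-per-queue-per-cycle rule are hypothetical simplifications, and a "blocked" destination simply stands for traffic heading into a congested point.

```python
from collections import deque


def drain(queues, blocked, cycles):
    """Toy HoL model (hypothetical): each cycle, the head message of every queue
    leaves the NI unless its destination is currently blocked by congestion."""
    delivered = []
    for _ in range(cycles):
        for q in queues:
            if q and q[0] not in blocked:
                delivered.append(q.popleft())
    return delivered


# One shared queue: the congested message at the head blocks everything behind it.
shared = drain([deque(["hot", "a", "b"])], blocked={"hot"}, cycles=3)
# Two queues (regular + extra): non-congested traffic bypasses the congested message.
split = drain([deque(["a", "b"]), deque(["hot"])], blocked={"hot"}, cycles=3)
print(shared, split)  # [] ['a', 'b']
```

Separating the two traffic classes does not make "hot" any faster; it only stops it from delaying "a" and "b", which is exactly the isolate-rather-than-remove goal stated above.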

• RCA, P. Gratz et al.
  ◦ Redirects traffic at each router based on congestion metrics.
  ◦ Metrics are piggybacked.
      ▪ Vicious cycles may be created.
• "Prediction-based Flow Control for Network-on-Chip Traffic", U. Ogras et al.
  ◦ Injection control based on prediction models.
  ◦ The prediction model uses link status sent through a dedicated network.
      ▪ Injection throttling may produce performance oscillations.
• AVADA/FVADA, Yi Xu et al.
  ◦ Map different flows to different queues based on the output port requested in the next router (lookahead routing).
      ▪ Require lookahead routing and credit-based flow control.
      ▪ Congested and non-congested flows may share queues, still causing some HoL blocking, because the mapping policy only considers one hop of the message path.

[Figure: credit-based flow control example (Credits=2, Credits=0)]

• ICARO uses two types of Virtual Networks (VNs):
  ◦ Regular VN: non-congested traffic
  ◦ Extra VN: congested traffic
• Three stages:
  ◦ Detection: congestion is detected at routers.
  ◦ Notification: routers notify all Network Interfaces (NIs).
  ◦ Isolation: NIs isolate congested traffic from non-congested traffic.

[Figure: 4x4 mesh of switches SW0 to SW15, each attached to a network interface (NI 0 to NI 15) with a Regular VN queue and an Extra VN queue]

• Performed at routers.
• Detects congestion points (CPs), i.e. {router, port} pairs.
• When a message arrives or leaves:
  ◦ Buffer saturation check:
      ▪ If buffer.level > HIGH_THR, the buffer is marked as saturated.
      ▪ If buffer.level < LOW_THR, the buffer is marked as not saturated (hysteresis).
      ▪ If any buffer of an input port is marked as saturated, the whole input port is marked as saturated as well.
  ◦ Congestion check:
      ▪ The requests from saturated input ports to each output port are counted.
      ▪ Each output port requested by more than one saturated input port is marked as congested.
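A minimal Python sketch of the detection stage, assuming a router with `num_ports` input ports, `vcs_per_port` buffers per port and buffer occupancy measured in flits. The class name `DetectionUnit`, its methods and the threshold values are hypothetical; the slides only fix the rules (hysteresis between HIGH_THR and LOW_THR, per-port saturation, and the more-than-one-saturated-input test).

```python
HIGH_THR = 6  # hypothetical saturation thresholds, in flits
LOW_THR = 2


class DetectionUnit:
    """Per-router congestion detection sketch (names and thresholds are assumptions)."""

    def __init__(self, router_id, num_ports, vcs_per_port):
        self.router_id = router_id
        self.saturated = [[False] * vcs_per_port for _ in range(num_ports)]

    def on_buffer_change(self, port, vc, level):
        """Call when a flit arrives at or leaves buffer (port, vc); level = new occupancy."""
        if level > HIGH_THR:
            self.saturated[port][vc] = True
        elif level < LOW_THR:
            self.saturated[port][vc] = False
        # LOW_THR <= level <= HIGH_THR: keep the previous state (hysteresis)

    def input_saturated(self, port):
        # An input port is saturated if any of its buffers is saturated.
        return any(self.saturated[port])

    def congestion_points(self, requests):
        """requests: (input_port, output_port) pairs currently requested.
        Returns {(router, output_port)} for outputs wanted by >1 saturated input."""
        sat_inputs = {}
        for inp, out in requests:
            if self.input_saturated(inp):
                sat_inputs.setdefault(out, set()).add(inp)
        return {(self.router_id, out) for out, ins in sat_inputs.items() if len(ins) > 1}


r = DetectionUnit(router_id=5, num_ports=5, vcs_per_port=2)
r.on_buffer_change(1, 0, 7)  # input port 1 becomes saturated
r.on_buffer_change(2, 1, 8)  # input port 2 becomes saturated
print(r.congestion_points([(1, 3), (2, 3), (4, 3), (1, 2)]))  # {(5, 3)}
```

Output port 3 is flagged because two saturated inputs (1 and 2) request it; port 2 is not, since only one saturated input requests it, and the non-saturated input 4 does not count.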

• Segmented ring connecting routers and NIs.
• Network width (wires): $1 + \log_2 N + \log_2 p$, where N = number of nodes and p = router radix.
• Process:
  ◦ Notifications are injected into the register when it is free.
  ◦ Each cycle, notifications move from one register to the next.
  ◦ Notifications are discarded when they reach their origin register.

[Figure: segmented ring of registers linking SW0 to SW15; detail of SW 7 and NI 7 showing the CNN in/out ports, notification injection and notification reception]
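The notification process can be sketched as a cycle-level Python model of the segmented ring. Assumptions, none of them from the slides: the ring visits nodes in index order 0..N-1, each node owns one register, a node's own NI sees a notification in the cycle it is injected, and the width formula is read as one extra bit (assumed here to be a valid flag) plus the bits needed to name a router and a port, rounded up. `Notification`, `cnn_width` and `simulate_ring` are hypothetical names.

```python
import math
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Notification:
    origin: int  # ring position of the router that detected the CP
    router: int  # congested switch id
    port: str    # congested output port


def cnn_width(num_nodes, radix):
    """Wires per ring segment: 1 extra bit (assumed valid flag) + router id + port id."""
    return 1 + math.ceil(math.log2(num_nodes)) + math.ceil(math.log2(radix))


def simulate_ring(num_nodes, pending, cycles):
    """pending: {node: [Notification, ...]} waiting to be injected.
    Returns {node: [Notification, ...]} as seen by each node's NI."""
    regs = [None] * num_nodes  # one register per node
    queues = {n: deque(pending.get(n, [])) for n in range(num_nodes)}
    received = {n: [] for n in range(num_nodes)}
    for _ in range(cycles):
        # Advance every notification one register; drop it when it is back at its origin.
        shifted = [None] * num_nodes
        for i, note in enumerate(regs):
            if note is not None and (i + 1) % num_nodes != note.origin:
                shifted[(i + 1) % num_nodes] = note
        regs = shifted
        # Inject only where the local register is free.
        for n in range(num_nodes):
            if regs[n] is None and queues[n]:
                regs[n] = queues[n].popleft()
        # Every NI reads the notification currently held in its register.
        for n, note in enumerate(regs):
            if note is not None:
                received[n].append(note)
    return received


seen = simulate_ring(4, {2: [Notification(origin=2, router=10, port="S")]}, cycles=8)
print({n: len(v) for n, v in seen.items()})  # {0: 1, 1: 1, 2: 1, 3: 1}
print(cnn_width(64, 5))  # 10
```

Under these assumptions every NI sees each notification exactly once before it returns to its origin and is dropped, and an 8x8 mesh (N = 64) with radix-5 routers needs 1 + 6 + 3 = 10 wires per ring segment.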


• Notifications are stored in a cache memory.
• Useless notifications are discarded:
  ◦ Unreachable CPs
  ◦ Redundant notifications (merge)
[Table: CPs cache with SW/Port entries {5, E} and {10, S}]

[Figure: 4x4 mesh (SW0 to SW15) with XY routing. CPs cache at NI 0: {10, S}, second entry empty (--). CPs cache at NI 4: {5, E}, {10, S}]

[Figure: 4x4 mesh with XY routing, CPs cache at NI 4: {5, E}, {10, S}]
{SW10, Port S} notification is IGNORED.
{SW5, Port E} and {SW10, Port S} notifications are MERGED.
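The filtering of useless notifications can be sketched for the 4x4 mesh with XY routing used in the example. Assumptions: switch id = y * 4 + x with SW0 in the top-left corner, ports named N/S/E/W, and "redundant" interpreted as "every destination whose route crosses this CP also crosses another cached CP", so dropping it cannot change any isolation decision. Under that reading the sketch reproduces the example: NI 0 keeps only {SW10, S} ({SW5, E} is unreachable from SW0 with XY routing), and at NI 4 {SW10, S} is merged into {SW5, E}. Function names are hypothetical.

```python
WIDTH, HEIGHT = 4, 4  # 4x4 mesh; switch id = y * WIDTH + x, SW0 top-left (assumption)
STEP = {"E": (1, 0), "W": (-1, 0), "S": (0, 1), "N": (0, -1)}


def xy_route(src, dst):
    """(switch, output port) pairs crossed by an XY-routed message."""
    x, y, dx, dy = src % WIDTH, src // WIDTH, dst % WIDTH, dst // WIDTH
    hops = []
    while (x, y) != (dx, dy):
        port = ("E" if dx > x else "W") if x != dx else ("S" if dy > y else "N")
        hops.append((y * WIDTH + x, port))
        x, y = x + STEP[port][0], y + STEP[port][1]
    return hops


def covered(src, cp):
    """Destinations whose XY route from switch `src` crosses congestion point `cp`."""
    return frozenset(d for d in range(WIDTH * HEIGHT) if cp in xy_route(src, d))


def filter_cps(src, cps):
    """Drop unreachable CPs, then merge CPs whose covered destinations are a
    subset of another CP's (ties keep the earliest notification)."""
    reach = {cp: covered(src, cp) for cp in cps}
    kept = [cp for cp in cps if reach[cp]]  # unreachable: no route crosses it
    result = []
    for i, cp in enumerate(kept):
        redundant = any(
            reach[cp] < reach[other] or (reach[cp] == reach[other] and j < i)
            for j, other in enumerate(kept)
            if j != i
        )
        if not redundant:
            result.append(cp)
    return result


cps = [(5, "E"), (10, "S")]  # {SW5, Port E} and {SW10, Port S}
print(filter_cps(0, cps))  # NI 0: [(10, 'S')]
print(filter_cps(4, cps))  # NI 4: [(5, 'E')]
```

From SW4, the only XY route through {SW10, S} is the one to SW14 (SW4 → SW5 → SW6 → SW10 → SW14), and it already crosses {SW5, E}, so keeping both entries would add nothing.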

• Performed at NIs.
• Process:
  ◦ Initially, all traffic is allocated to regular VNs.
  ◦ At each cycle, the post-processor module checks the messages at the head of all regular VNs in parallel.
  ◦ If a message's route crosses any of the CPs stored in the CPs cache memory, the message is reallocated to an extra VN.

[Figure: Network Interface 4 with arbiter, post-processor and CPs cache (SW 5, Port E); regular-VN and extra-VN queues holding messages dst:12, dst:15, dst:6, connected to Router 4]
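The isolation step can be sketched in the same style. This assumes one extra VN paired with each regular VN, a CPs cache holding (switch, port) pairs, messages represented only by their destination switch, and the same 4x4 XY-routed mesh with SW0 in the top-left corner; `post_process` is a hypothetical name for the post-processor. With {SW5, E} cached at NI 4, the messages to SW6 and SW15 move to extra VNs, while the one to SW12 stays in its regular VN because its XY route (SW4 → SW8 → SW12) never crosses SW5.

```python
from collections import deque

WIDTH = 4  # 4x4 mesh; switch id = y * WIDTH + x, SW0 top-left (assumption)
STEP = {"E": (1, 0), "W": (-1, 0), "S": (0, 1), "N": (0, -1)}


def xy_route(src, dst):
    """(switch, output port) pairs crossed by an XY-routed message."""
    x, y, dx, dy = src % WIDTH, src // WIDTH, dst % WIDTH, dst // WIDTH
    hops = []
    while (x, y) != (dx, dy):
        port = ("E" if dx > x else "W") if x != dx else ("S" if dy > y else "N")
        hops.append((y * WIDTH + x, port))
        x, y = x + STEP[port][0], y + STEP[port][1]
    return hops


def post_process(src, regular_vns, extra_vns, cp_cache):
    """One cycle: check the head of every regular VN (a sequential loop standing in
    for the parallel hardware check) and move it to its paired extra VN if its
    route crosses any cached congestion point."""
    for regular, extra in zip(regular_vns, extra_vns):
        if regular and any(hop in cp_cache for hop in xy_route(src, regular[0])):
            extra.append(regular.popleft())


regular = [deque([6]), deque([12]), deque([15])]  # messages = destination switches
extra = [deque(), deque(), deque()]
post_process(4, regular, extra, cp_cache={(5, "E")})
print([list(q) for q in regular], [list(q) for q in extra])
# [[], [12], []] [[6], [], [15]]
```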

• Simulation:
  ◦ NoC simulator developed in our research group.
• Compared against FVADA/AVADA with different numbers of virtual queues:
  ◦ FVADA: restricted to 4 VCs.
  ◦ ICARO: uses VNs instead of VCs.
• Overheads analysis:
  ◦ Tools used:
      ▪ Synthesis: Design Vision (Synopsys)
      ▪ Place & Route: Encounter (Cadence)
      ▪ Library: 45nm Nangate Open Cell (typical conditions)

Parameter      Value
Topology       8x8 2D mesh
Routing        XY
Switching      Wormhole (flit-level switching)
Flow control   Credit-based
Flit size      128 bits
Message size   5 flits
Traffic        0.3 flits/cycle (background) + 1 flit/cycle (hotspot 4-to-1, from cycle 10k to 20k)

[Figure: results with 2, 4 and 8 VCs/VNs (panels 2VC/VN, 4VC/VN, 8VC/VN)]

• Area overhead: ~6%.
• Power overhead: from 6% to 10%.

• Area overhead: from 3.8% to 6%.
• Power overhead: from 4.5% to 5.4%.

• Conclusions:
  ◦ A mechanism to avoid HoL blocking in networks-on-chip has been presented.
  ◦ ICARO isolates harmful traffic from non-harmful traffic by using VNs, achieving an overall latency improvement of up to 82%.
• Future work:
  ◦ Analyze a hierarchical CNN to improve scalability.
  ◦ Implement in-order delivery support.

Questions?