Lecture 17: NoC Innovations

Slides:



Advertisements
Similar presentations
Prof. Natalie Enright Jerger
Advertisements

A Novel 3D Layer-Multiplexed On-Chip Network
Flattened Butterfly Topology for On-Chip Networks John Kim, James Balfour, and William J. Dally Presented by Jun Pang.
Miguel Gorgues, Dong Xiang, Jose Flich, Zhigang Yu and Jose Duato Uni. Politecnica de Valencia, Spain School of Software, Tsinghua University, China, Achieving.
1 Lecture 17: On-Chip Networks Today: background wrap-up and innovations.
1 Lecture 12: Interconnection Networks Topics: dimension/arity, routing, deadlock, flow control.
1 Lecture 23: Interconnection Networks Topics: communication latency, centralized and decentralized switches (Appendix E)
1 Lecture 23: Interconnection Networks Paper: Express Virtual Channels: Towards the Ideal Interconnection Fabric, ISCA’07, Princeton.
1 Lecture 16: On-Chip Networks Today: on-chip networks background.
1 Lecture 21: Router Design Papers: Power-Driven Design of Router Microarchitectures in On-Chip Networks, MICRO’03, Princeton A Gracefully Degrading and.
1 Lecture 13: Interconnection Networks Topics: flow control, router pipelines, case studies.
1 Lecture 25: Interconnection Networks Topics: flow control, router microarchitecture Final exam:  Dec 4 th 9am – 10:40am  ~15-20% on pre-midterm  post-midterm:
1 Lecture 24: Interconnection Networks Topics: topologies, routing, deadlocks, flow control Final exam reminders:  Plan well – attempt every question.
1 Lecture 24: Interconnection Networks Topics: communication latency, centralized and decentralized switches (Sections 8.1 – 8.5)
1 Lecture 25: Interconnection Networks, Disks Topics: flow control, router microarchitecture, RAID.
1 Lecture 24: Interconnection Networks Topics: topologies, routing, deadlocks, flow control.
1 Lecture 26: Interconnection Networks Topics: flow control, router microarchitecture.
1 Lecture 13: Interconnection Networks Topics: lots of background, recent innovations for power and performance.
1 Lecture 25: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Review session,
Dragonfly Topology and Routing
Performance and Power Efficient On-Chip Communication Using Adaptive Virtual Point-to-Point Connections M. Modarressi, H. Sarbazi-Azad, and A. Tavakkol.
High Performance Embedded Computing © 2007 Elsevier Lecture 16: Interconnection Networks Embedded Computing Systems Mikko Lipasti, adapted from M. Schulte.
1 Lecture 23: Interconnection Networks Topics: Router microarchitecture, topologies Final exam next Tuesday: same rules as the first midterm Next semester:
Déjà Vu Switching for Multiplane NoCs NOCS’12 University of Pittsburgh Ahmed Abousamra Rami MelhemAlex Jones.
SMART: A Single- Cycle Reconfigurable NoC for SoC Applications -Jyoti Wadhwani Chia-Hsin Owen Chen, Sunghyun Park, Tushar Krishna, Suvinay Subramaniam,
1 Flow Identification Assume you want to guarantee some type of quality of service (minimum bandwidth, maximum end-to-end delay) to a user Before you do.
George Michelogiannakis William J. Dally Stanford University Router Designs for Elastic- Buffer On-Chip Networks.
A Lightweight Fault-Tolerant Mechanism for Network-on-Chip
© Sudhakar Yalamanchili, Georgia Institute of Technology (except as indicated) Switch Microarchitecture Basics.
1 Lecture 15: Interconnection Routing Topics: deadlock, flow control.
BZUPAGES.COM Presentation On SWITCHING TECHNIQUE Presented To; Sir Taimoor Presented By; Beenish Jahangir 07_04 Uzma Noreen 07_08 Tayyaba Jahangir 07_33.
Lecture # 03 Switching Course Instructor: Engr. Sana Ziafat.
Run-time Adaptive on-chip Communication Scheme 林孟諭 Dept. of Electrical Engineering National Cheng Kung University Tainan, Taiwan, R.O.C.
Lecture 16: Router Design
Ch 8. Switching. Switch  Devices that interconnected with each other  Connecting all nodes (like mesh network) is not cost-effective  Some topology.
Intel Slide 1 A Comparative Study of Arbitration Algorithms for the Alpha Pipelined Router Shubu Mukherjee*, Federico Silla !, Peter Bannon $, Joel.
1 Lecture 15: NoC Innovations Today: power and performance innovations for NoCs.
Connection-orientated Service. Connection in data networks Another type of network, such as the telephone network, was developed based on the connection.
1 Lecture 22: Router Design Papers: Power-Driven Design of Router Microarchitectures in On-Chip Networks, MICRO’03, Princeton A Gracefully Degrading and.
1 Lecture 24: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix F)
Predictive High-Performance Architecture Research Mavens (PHARM), Department of ECE The NoX Router Mitchell Hayenga Mikko Lipasti.
1 Lecture 29: Interconnection Networks Papers: Express Virtual Channels: Towards the Ideal Interconnection Fabric, ISCA’07, Princeton Interconnect Design.
1 Lecture 22: Interconnection Networks Topics: Routing, deadlock, flow control, virtual channels.
McGraw-Hill©The McGraw-Hill Companies, Inc., 2000 Muhammad Waseem Iqbal Lecture # 20 Data Communication.
Congestion Control in Data Networks and Internets
The network-on-chip protocol
Chapter 8 Switching Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Dynamic connection system
Lecture 23: Interconnection Networks
Datacenter Interconnection Network Design
Pablo Abad, Pablo Prieto, Valentin Puente, Jose-Angel Gregorio
Lecture 23: Router Design
OpenSMART: Single-cycle Multi-hop NoC Generator in BSV and Chisel
Switching Techniques In large networks there might be multiple paths linking sender and receiver. Information may be switched as it travels through various.
Lecture 16: On-Chip Networks
NoC Switch: Basic Design Principles &
Static and Dynamic Networks
Lecture 14: Interconnection Networks
Lecture: Transactional Memory, Networks
Lecture: Interconnection Networks
PRESENTATION COMPUTER NETWORKS
Switching Techniques.
Natalie Enright Jerger, Li Shiuan Peh, and Mikko Lipasti
Low-Latency Virtual-Channel Routers for On-Chip Networks Robert Mullins, Andrew West, Simon Moore Presented by Sailesh Kumar.
Lecture: Networks Topics: TM wrap-up, networks.
Lecture: Interconnection Networks
CS 6290 Many-core & Interconnect
A Case for Interconnect-Aware Architectures
Lecture 25: Interconnection Networks
Multiprocessors and Multi-computers
Presentation transcript:

Lecture 17: NoC Innovations Today: power and performance innovations for NoCs

Network Power Power-Driven Design of Router Microarchitectures in On-Chip Networks, MICRO’03, Princeton Energy for a flit = ER . H + Ewire . D = (Ebuf + Exbar + Earb) . H + Ewire . D ER = router energy H = number of hops Ewire = wire transmission energy D = physical Manhattan distance Ebuf = router buffer energy Exbar = router crossbar energy Earb = router arbiter energy This paper assumes that Ewire . D is ideal network energy (assuming no change to the application and how it is mapped to physical nodes)

Segmented Crossbar By segmenting the row and column lines, parts of these lines need not switch  less switching capacitance (especially if your output and input ports are close to the bottom-left in the figure above) Need a few additional control signals to activate the tri-state buffers Overall crossbar power savings: ~15-30%

Cut-Through Crossbar Attempts to optimize the common case: in dimension-order routing, flits make up to one turn and usually travel straight 2/3rd the number of tristate buffers and 1/2 the number of data wires “Straight” traffic does not go thru tristate buffers Some combinations of turns are not allowed: such as E  N and N  W (note that such a combination cannot happen with dimension-order routing) Crossbar energy savings of 39-52%

Write-Through Input Buffer Input flits must be buffered in case there is a conflict in a later pipeline stage If the queue is empty, the input flit can move straight to the next stage: helps avoid the buffer read To reduce the datapaths, the write bitlines can serve as the bypass path Power savings are a function of rd/wr energy ratios and probability of finding an empty queue

Express Channels Express channels connect non-adjacent nodes – flits traveling a long distance can use express channels for most of the way and navigate on local channels near the source/destination (like taking the freeway) Helps reduce the number of hops The router in each express node is much bigger now

Express Channels Routing: in a ring, there are 5 possible routes and the best is chosen; in a torus, there are 17 possible routes A large express interval results in fewer savings because fewer messages exercise the express channels

Express Virtual Channels To a large extent, maintain the same physical structure as a conventional network (changes to be explained shortly) Some virtual channels are treated differently: they go through a different router pipeline and can effectively avoid most router overheads

Router Pipelines If Normal VC (NVC): at every router, must compete for the next VC and for the switch will get buffered in case there is a conflict for VA/SA If EVC (at intermediate bypass router): need not compete for VC (an EVC is a VC reserved across multiple routers) similarly, the EVC is also guaranteed the switch (only 1 EVC can compete for an output physical channel) since VA/SA are guaranteed to succeed, no need for buffering simple router pipeline: incoming flit directly moves to ST stage If EVC (at EVC source/sink router): must compete for VC/SA as in a conventional pipeline before moving on, must confirm free buffer at next EVC router

Title Use buses to reduce routers Use bloom filters to stifle broadcasts Use page coloring to minimize travel beyond a bus

MC Placement Abts et al., ISCA 2009 Bullet Tilera

MC Placement Diamond placement has the least link contention as it distributes traffic XY routing for requests and YX routing for replies is also effective in distributing traffic

NoX Router Hayenga and Lipasti, MICRO 2011 In speculative routers, two packets may be sent out at at the same time on the link; receiver gets garbage Instead, send the XOR of the colliding packets In the next cycles, send the XOR of colliding packets, minus one of the colliding packets Receiver decodes all of the packets with XORs Saves one wasted cycle on every collision or mis-speculation

Title Bullet