1 Lecture 29: Interconnection Networks Papers: Express Virtual Channels: Towards the Ideal Interconnection Fabric, ISCA’07, Princeton Interconnect Design.

Slides:



Advertisements
Similar presentations
Prof. Natalie Enright Jerger
Advertisements

A Novel 3D Layer-Multiplexed On-Chip Network
Flattened Butterfly Topology for On-Chip Networks John Kim, James Balfour, and William J. Dally Presented by Jun Pang.
What is Flow Control ? Flow Control determines how a network resources, such as channel bandwidth, buffer capacity and control state are allocated to packet.
ECE 1749H: Interconnection Networks for Parallel Computer Architectures: Flow Control Prof. Natalie Enright Jerger.
1 Lecture 17: On-Chip Networks Today: background wrap-up and innovations.
1 Lecture 12: Interconnection Networks Topics: dimension/arity, routing, deadlock, flow control.
1 Lecture 15: PCM, Networks Today: PCM wrap-up, projects discussion, on-chip networks background.
1 Lecture 23: Interconnection Networks Paper: Express Virtual Channels: Towards the Ideal Interconnection Fabric, ISCA’07, Princeton.
1 Lecture 16: On-Chip Networks Today: on-chip networks background.
Lei Wang, Yuho Jin, Hyungjun Kim and Eun Jung Kim
1 Lecture 21: Router Design Papers: Power-Driven Design of Router Microarchitectures in On-Chip Networks, MICRO’03, Princeton A Gracefully Degrading and.
1 Lecture 13: Interconnection Networks Topics: flow control, router pipelines, case studies.
1 Lecture 25: Interconnection Networks Topics: flow control, router microarchitecture Final exam:  Dec 4 th 9am – 10:40am  ~15-20% on pre-midterm  post-midterm:
1 Lecture 24: Interconnection Networks Topics: topologies, routing, deadlocks, flow control Final exam reminders:  Plan well – attempt every question.
CSE 291-a Interconnection Networks Lecture 15: Router (cont’d) March 5, 2007 Prof. Chung-Kuan Cheng CSE Dept, UC San Diego Winter 2007 Transcribed by Ling.
1 Lecture 24: Interconnection Networks Topics: communication latency, centralized and decentralized switches (Sections 8.1 – 8.5)
1 Lecture 25: Interconnection Networks, Disks Topics: flow control, router microarchitecture, RAID.
1 Lecture 24: Interconnection Networks Topics: topologies, routing, deadlocks, flow control.
1 Lecture 26: Interconnection Networks Topics: flow control, router microarchitecture.
1 Lecture 13: Interconnection Networks Topics: lots of background, recent innovations for power and performance.
1 Lecture 25: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Review session,
Feb 14 th 2005University of Utah1 Microarchitectural Wire Management for Performance and Power in partitioned architectures Rajeev Balasubramonian Naveen.
CS 7810 Lecture 17 Managing Wire Delay in Large CMP Caches B. Beckmann and D. Wood Proceedings of MICRO-37 December 2004.
Feb 14 th 2005University of Utah1 Microarchitectural Wire Management for Performance and Power in Partitioned Architectures Rajeev Balasubramonian Naveen.
Low-Latency Virtual-Channel Routers for On-Chip Networks Robert Mullins, Andrew West, Simon Moore Presented by Sailesh Kumar.
Dragonfly Topology and Routing
Performance and Power Efficient On-Chip Communication Using Adaptive Virtual Point-to-Point Connections M. Modarressi, H. Sarbazi-Azad, and A. Tavakkol.
McRouter: Multicast within a Router for High Performance NoCs
1 Lecture 23: Interconnection Networks Topics: Router microarchitecture, topologies Final exam next Tuesday: same rules as the first midterm Next semester:
TLC: Transmission Line Caches Brad Beckmann David Wood Multifacet Project University of Wisconsin-Madison 12/3/03.
1 University of Utah & HP Labs 1 Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0 Naveen Muralimanohar Rajeev Balasubramonian.
On-Chip Networks and Testing
R OUTE P ACKETS, N OT W IRES : O N -C HIP I NTERCONNECTION N ETWORKS Veronica Eyo Sharvari Joshi.
Networks-on-Chips (NoCs) Basics
SMART: A Single- Cycle Reconfigurable NoC for SoC Applications -Jyoti Wadhwani Chia-Hsin Owen Chen, Sunghyun Park, Tushar Krishna, Suvinay Subramaniam,
Sami Al-wakeel 1 Data Transmission and Computer Networks The Switching Networks.
1 Lecture 26: Networks, Storage Topics: router microarchitecture, disks, RAID (Appendix D) Final exam: Monday 30 th Apr 10:30-12:30 Same rules as the midterm.
© Sudhakar Yalamanchili, Georgia Institute of Technology (except as indicated) Switch Microarchitecture Basics.
1 Lecture 15: Interconnection Routing Topics: deadlock, flow control.
BZUPAGES.COM Presentation On SWITCHING TECHNIQUE Presented To; Sir Taimoor Presented By; Beenish Jahangir 07_04 Uzma Noreen 07_08 Tayyaba Jahangir 07_33.
Lecture 16: Router Design
McGraw-Hill©The McGraw-Hill Companies, Inc., 2000 CH. 8: SWITCHING & DATAGRAM NETWORKS 7.1.
Topology-aware QOS Support in Highly Integrated CMPs Boris Grot (UT-Austin) Stephen W. Keckler (NVIDIA/UT-Austin) Onur Mutlu (CMU) WIOSCA '10.
1 Lecture 15: NoC Innovations Today: power and performance innovations for NoCs.
1 Lecture 22: Router Design Papers: Power-Driven Design of Router Microarchitectures in On-Chip Networks, MICRO’03, Princeton A Gracefully Degrading and.
Virtual-Channel Flow Control William J. Dally
1 Lecture 24: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix F)
1 Lecture 14: Interconnection Networks Topics: dimension vs. arity, deadlock.
Predictive High-Performance Architecture Research Mavens (PHARM), Department of ECE The NoX Router Mitchell Hayenga Mikko Lipasti.
1 Lecture 22: Interconnection Networks Topics: Routing, deadlock, flow control, virtual channels.
University of Utah 1 Interconnect Design Considerations for Large NUCA Caches Naveen Muralimanohar Rajeev Balasubramonian.
The network-on-chip protocol
Lecture 23: Interconnection Networks
Physical constraints (1/2)
Interconnection Networks: Flow Control
Lecture 23: Router Design
OpenSMART: Single-cycle Multi-hop NoC Generator in BSV and Chisel
Lecture 16: On-Chip Networks
Lecture 17: NoC Innovations
Lecture 13: Large Cache Design I
Lecture 14: Interconnection Networks
Lecture: Interconnection Networks
Natalie Enright Jerger, Li Shiuan Peh, and Mikko Lipasti
Lecture: Networks Topics: TM wrap-up, networks.
Lecture: Interconnection Networks
CS 6290 Many-core & Interconnect
A Case for Interconnect-Aware Architectures
Lecture 25: Interconnection Networks
Presentation transcript:

1 Lecture 29: Interconnection Networks Papers: Express Virtual Channels: Towards the Ideal Interconnection Fabric, ISCA’07, Princeton Interconnect Design Considerations for Large NUCA Caches, ISCA’07, Utah Reminders: Take-home final (20%): available this weekend, due May 10 th Project (50%): due May 10 th ; peer-review due May 12 th Project reports: conference paper style, at least 4 pages double-column; abstract, intro, background, proposal, methodology, results, related work, conclusions

2 Router Pipeline Four typical stages:  RC routing computation: compute the output channel  VA virtual-channel allocation: allocate VC for the head flit  SA switch allocation: compete for output physical channel  ST switch traversal: transfer data on output physical channel RCVASAST -- SAST -- SAST -- SAST Cycle Head flit Body flit 1 Body flit 2 Tail flit

3 Express Physical Channels Express channels connect non-adjacent nodes – flits traveling a long distance can use express channels for most of the way and navigate on local channels near the source/destination (like taking the freeway) Helps reduce the number of hops The router in each express node is much bigger now

4 Express Virtual Channels To a large extent, maintain the same physical structure as a conventional network (changes to be explained shortly) Some virtual channels are treated differently: they go through a different router pipeline and can effectively avoid most router overheads

5 Router Pipelines If Normal VC (NVC):  at every router, must compete for the next VC and for the switch  will get buffered in case there is a conflict for VA/SA If EVC (at intermediate bypass router):  need not compete for VC (an EVC is a VC reserved across multiple routers)  similarly, the EVC is also guaranteed the switch (only 1 EVC can compete for an output physical channel)  since VA/SA are guaranteed to succeed, no need for buffering  simple router pipeline: incoming flit directly moves to ST stage If EVC (at EVC source/sink router):  must compete for VC/SA as in a conventional pipeline  before moving on, must confirm free buffer at next EVC router

6 Bypass Router Pipelines Non aggressive pipeline in a bypass node: an express flit simply goes through the crossbar and then on the link; the prior SA stage must know that an express flit is arriving so that the switch control signals can be appropriately set up; this requires the flit to be preceded by a single-bit control signal (similar to cct-switching, but much cheaper) Aggressive pipeline: the express flit avoids the switch and heads straight to the output channel (dedicated hardware)… will still need a mechanism to control ST for other flits

7 Dynamic EVCs Any node can be an EVC source/sink The EVC can have length 2 to l max

8 VC Allocation All the VCs at a router are now partitioned into l max bins More buffers for short-hop EVCs Flow control credits have to propagate l max nodes upstream Can also dynamically allocate buffers to EVCs (although one buffer must be reserved per EVC to avoid deadlock) EVCs can potentially starve NVCs at bypass nodes: if a bypass node is starved for n cycles, it sends a token upstream to prevent EVC transmission for the next p cycles

9 Ideal Network Fully-connected: every node has a dedicated link to every other node Bisection bandwidth: For a 7x7 network, L edge will be 69mm and chip area will be 4760mm 2 (for a single metal layer) An ideal network will provide the least latency, least power, and highest throughput, but will have an inordinate overhead, as specified above

10 Approaching the Ideal Network The interconnection network models employ different forms of speculative router pipelines

11 Results Roughly 40% of all nodes are bypassed

12 Non-Uniform Cache Access From Beckmann et al. (MICRO’04) and Huh et al. (ICS’05)

13 Improving NUCA Methodologies Design space exploration: iterate over bank counts, organizations, and destinations to compute the optimal cache structure

14 Forms of Heterogeneity Different networks for data and address – the latter has lower bandwidth demands and can employ faster wires on higher metal layers Parts of the address are more critical: the index bits are transmitted on low-latency links so that cache access can begin early – the rest of the address arrives in time for tag comparison

15 Hybrid Network Topology

16 Title Bullet