Physical constraints (1/2)

Physical constraints (1/2)
One physical constraint on the implementation of an interconnection network is the available wiring area. The bisection width is the minimum number of wires that must be cut when the network is divided into two equal sets of nodes. Even the failure of a single link can destroy the deadlock freedom properties. A second constraint is the number of I/Os available per router, referred to as the node size.
Distributed Processing Theory 2 (No. 9)

Physical constraints (2/2)
Network        Bisection width   Node size
k-ary n-cube   2W·k^(n-1)        2Wn
Binary n-cube  NW/2              nW
n-D mesh       W·k^(n-1)
Omega net.     NW                2tW
Channels are W bits wide and N is the number of nodes; the Omega network uses t×t switches.
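The table entries can be evaluated for concrete networks. The following is a minimal sketch, not part of the original slides: the function names are hypothetical, and the binary n-cube bisection width is taken as NW/2 with N = 2^n nodes.

```python
def kary_ncube(W, k, n):
    """Return (bisection width, node size) in wires for a k-ary n-cube."""
    return 2 * W * k ** (n - 1), 2 * W * n


def binary_ncube(W, n):
    """Return (bisection width, node size) for a binary n-cube.

    Assumes N = 2**n nodes and a bisection width of N*W/2.
    """
    N = 2 ** n
    return N * W // 2, n * W


def mesh_bisection(W, k, n):
    """Return the bisection width of an n-dimensional mesh with k nodes per dimension."""
    return W * k ** (n - 1)


def omega(W, N, t):
    """Return (bisection width, node size) for an Omega network of t x t switches."""
    return N * W, 2 * t * W


# Example with 16-bit channels (W = 16), chosen only for illustration.
print(kary_ncube(16, 8, 2))    # 8-ary 2-cube
print(binary_ncube(16, 4))     # binary 4-cube, 16 nodes
print(mesh_bisection(16, 8, 2))
print(omega(16, 64, 2))
```

For a fixed wiring budget, these numbers show how the channel width W must shrink as the dimension n grows.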

Bisection width

Hardware cost and speed model
Chien (1993) proposed a cost and speed model for wormhole routers to compare their complexity and performance. Using a canonical router model, he gave the gate count and delay of each router component. Intrarouter delay can be parametrized by the number of ports on the crossbar switch and the routing freedom, with technology-dependent constants.

Canonical router model
[Figure: block diagram with input channels, output channels, an injection channel, an ejection channel, link controllers (LC), a switch, and a routing and arbitration unit.]
LC: link controller

Buffer partitioning
[Figure: router block diagrams with input channels, output channels, injection and ejection channels, link controllers (LC), virtual channel controllers (VC), a switch, and a routing and arbitration unit.]
LC: link controller, VC: virtual channel controller

Alternative partitioning
[Figure: router block diagram with input channels, output channels, injection and ejection channels, link controllers (LC), virtual channel controllers (VC), multiplexers (mux), a switch, and a routing and arbitration unit.]
LC: link controller, VC: virtual channel controller

Circular buffer
[Figure: a circular buffer holding entries a to h, with head and tail pointers.]
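A circular buffer with head and tail pointers can be sketched as follows. This is an illustrative software model under assumed names (`CircularBuffer`, `push`, `pop`), not the hardware design shown on the slide.

```python
class CircularBuffer:
    """Fixed-size FIFO; head and tail indices wrap around the storage array."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.head = 0    # index of the oldest stored flit
        self.tail = 0    # index of the next free slot
        self.count = 0   # number of flits currently stored

    def push(self, flit):
        """Store a flit at the tail; return False if the buffer is full."""
        if self.count == len(self.slots):
            return False
        self.slots[self.tail] = flit
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1
        return True

    def pop(self):
        """Remove and return the flit at the head, or None if empty."""
        if self.count == 0:
            return None
        flit = self.slots[self.head]
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        return flit


buf = CircularBuffer(4)
for flit in "abcd":
    buf.push(flit)
print(buf.pop())      # oldest flit leaves first
buf.push("e")         # reuses the slot freed at index 0
print(buf.head, buf.tail)
```

Keeping an explicit count distinguishes the full and empty states, which both have head equal to tail.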

A parametrization
Module            Parameter     Gate count  Delay
Crossbar          P (#ports)    O(P^2)      c0 + c1 × log P
Flow control      -             O(1)        c2
Address decoder                             c3
Routing decision  F (freedom)   O(F^2)      c4 + c5 × log F
Header selection                O(log F)    c6 + c7 × log F
VC                V (#VCs)      O(V)        c8 + c9 × log V
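The delay column can be evaluated once the constants are known. The sketch below is a hedged illustration: the constants c0 to c9 are technology dependent and are not given in the slides, so the values used here are placeholders, and the logarithm is assumed to be base 2.

```python
import math


def router_delays(P, F, V, c):
    """Evaluate the per-module delay expressions.

    P: number of crossbar ports, F: routing freedom, V: number of VCs.
    c: sequence of ten technology constants c[0]..c[9] (placeholder values).
    Logarithms are assumed to be base 2.
    """
    return {
        "crossbar": c[0] + c[1] * math.log2(P),
        "flow_control": c[2],
        "address_decoder": c[3],
        "routing_decision": c[4] + c[5] * math.log2(F),
        "header_selection": c[6] + c[7] * math.log2(F),
        "virtual_channel": c[8] + c[9] * math.log2(V),
    }


# Placeholder constants, all set to 1.0 purely for illustration.
delays = router_delays(P=4, F=2, V=2, c=[1.0] * 10)
for module, delay in delays.items():
    print(module, delay)
```

With real constants, the same function shows how adding ports, routing freedom, or virtual channels lengthens each delay term logarithmically.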

Pipelined routing
Cycle         1   2   3   4   5   6   7
Head flit     RC  VA  SA  ST
Body flit 1               SA  ST
Body flit 2                   SA  ST
Tail flit                         SA  ST
RC: routing computation, VA: virtual channel allocation, SA: switch allocation, ST: switch traversal
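The stall-free timing of a packet through the four stages can be generated programmatically. This is a minimal sketch under the assumption that the head flit uses RC, VA, SA, and ST in cycles 1 to 4 and that each following flit needs only SA and ST, one cycle behind its predecessor; `pipeline_schedule` is a hypothetical name.

```python
def pipeline_schedule(num_flits):
    """Return {flit index: {cycle: stage}} for a packet with no stalls.

    Flit 0 is the head flit; the last flit is the tail flit.
    """
    schedule = {}
    for i in range(num_flits):
        if i == 0:
            stages, start = ["RC", "VA", "SA", "ST"], 1
        else:
            # Body and tail flits inherit the route and virtual channel,
            # so they only perform switch allocation and traversal.
            stages, start = ["SA", "ST"], 3 + i
        schedule[i] = {start + j: stage for j, stage in enumerate(stages)}
    return schedule


for flit, timing in pipeline_schedule(4).items():
    print(flit, timing)
```

For a four-flit packet the tail flit completes switch traversal in cycle 7, matching the seven cycles in the diagram.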

Pipeline stalls
Pipeline stalls occur when a pipeline stage cannot be completed in the current cycle. Stalls may occur at any stage, for example when no output port (virtual channel) is available or when the input buffer is empty. The latency of an interconnection network is directly related to pipeline depth.

Virtual-channel allocation stall
[Figure: timing diagram over cycles 1 to 9 with rows for head flit 1, body flit 1, tail flit 1, and tail flit 2, showing the RC, VA, SA, and ST stages.]
Head flit 1 cannot allocate a virtual channel until cycle 5, so allocation is retried until then.

Physical channel (1/2)
Half-duplex channels have the disadvantage that both sides of the link must arbitrate for its use. Unidirectional channels double the average distance traveled by a message in tori. In full-duplex channels, built from a pair of unidirectional channels, channel widths are reduced, approximately halved, because bandwidth is statically allocated to each direction.

Physical channel (2/2)
Lower dimensionality and communication traffic below network saturation tend to favor half-duplex channels. Cost considerations encourage low-cost packaging, which also favors half-duplex channels and the use of command/data encodings to reduce the overall pin count. Large systems with a higher number of dimensions (where wire delays dominate), higher clock speeds, communication-intensive applications, and pipelined links favor full-duplex channels.

A bidirectional half-duplex channel
[Figure: two routers connected by shared data lines and control lines.]
Fair arbitration requires status information (ownership) to be transmitted across the channel to indicate the availability of data to be transmitted.

A unidirectional channel
[Figure: two routers connected by data and control lines running in one direction.]
Arbitration overhead is avoided, and therefore links can generally be run faster.

A bidirectional full-duplex channel
[Figure: two routers connected by a separate set of data and control lines for each direction.]
When data are being transmitted in only one direction across the channel, 50% of the pin bandwidth is unused.

A demand-driven mutual exclusion ring
[Figure: virtual channel controllers (VC) with arbiters (ARB) and link control blocks, connected by ready and ack signals.]
A signal representing the privilege to drive the physical channel circulates around the mutual exclusion ring.
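One way to picture the circulating privilege is as a token that moves to the next requester around the ring. The sketch below is an assumed software analogy, not the circuit on the slide: `next_owner` and the boolean request list are hypothetical, and the token is assumed to stay with its holder when nobody requests the channel.

```python
def next_owner(holder, requests):
    """Return the index of the next arbiter to receive the privilege.

    holder: index of the arbiter currently holding the privilege.
    requests: list of booleans, one per arbiter on the ring.
    The search starts just after the holder and wraps around, so the
    holder itself is considered last.
    """
    n = len(requests)
    for step in range(1, n + 1):
        candidate = (holder + step) % n
        if requests[candidate]:
            return candidate
    return holder  # no demand: the privilege does not move


print(next_owner(0, [False, False, True, False]))
print(next_owner(2, [True, False, True, False]))
print(next_owner(1, [False, False, False, False]))
```

Because the search always starts after the current holder, every requester is reached within one trip around the ring.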

Available buffer space
Assume that the propagation delay is P ns per unit length and that the wire in the channel is L units long. When the receiver requests the sender to stop transmitting, there may be LPB phits in transit on a channel running at B Gphits/s. A further LPB phits may be placed on the channel while the flow control signal propagates to the sender. If the flow control operation latency is F ns, the sender places another FB phits on the channel. Thus the receiver must have at least 2LPB + FB phits of buffer space available when it asserts flow control.
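The buffer requirement is simple arithmetic on the four quantities defined above. The following sketch evaluates it; the function name and the example values are assumptions chosen for illustration.

```python
def required_buffer_phits(L, P, B, F):
    """Return the buffer space (in phits) needed to absorb in-flight data.

    L: wire length in units
    P: propagation delay in ns per unit length
    B: channel rate in Gphits/s (equivalently phits per ns)
    F: flow control operation latency in ns
    """
    in_flight = L * P * B          # phits already on the wire
    return 2 * in_flight + F * B   # 2LPB + FB


# Illustrative values: 10 units of wire, 0.5 ns per unit, 1 Gphit/s, 3 ns latency.
print(required_buffer_phits(L=10, P=0.5, B=1, F=3))
```

The factor of two reflects the round trip: phits already in transit plus those sent while the stop signal travels back to the sender.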

Simultaneous bidirectional signaling
Simultaneous bidirectional signaling allows two routers to signal at the same time across a single signal line (full-duplex bidirectional communication). A logic 1 (0) is transmitted as a positive (negative) current. The signal on the line is the superposition of the signals transmitted from both sides of the channel. Each transmitter generates a reference signal that is subtracted from the superimposed signal to recover the received signal.
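The superposition and subtraction steps can be modeled numerically. This is an idealized sketch with assumed unit current levels (+1.0 for logic 1, -1.0 for logic 0) and no noise or attenuation; the function names are hypothetical.

```python
def encode(bit):
    """Map a logic value to a signed current: 1 -> +1.0, 0 -> -1.0."""
    return 1.0 if bit else -1.0


def line_level(bit_a, bit_b):
    """The wire carries the superposition of both transmitters' currents."""
    return encode(bit_a) + encode(bit_b)


def receive(level, own_bit):
    """Subtract the local reference (own transmitted signal) to recover the remote bit."""
    remote = level - encode(own_bit)
    return 1 if remote > 0 else 0


for a in (0, 1):
    for b in (0, 1):
        level = line_level(a, b)
        # Router A recovers b, router B recovers a, from the same wire level.
        print(a, b, level, receive(level, a), receive(level, b))
```

The wire takes one of three levels (-2, 0, +2), and each side resolves the ambiguity at level 0 because it knows what it transmitted.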

Separate crossbar
[Figure: router built from separate X and Y crossbars, shown in two versions; labels include X+, X-, Y+, Y-, From node, To node, To Yin, and Yin, and the second version adds AD and FC blocks on each channel.]

Planar-adaptive router
[Figure: router block diagram with two crossbars, multiplexers (mux), virtual channels (vc), and links labeled L1 to L4.]

Cray T3D router
[Figure: router block diagram with three crossbars (xbar), network interface (NI) connections, and channels X+, X-, Y+, Y-, Z+, and Z-.]

Message processing datapath on T3D
[Figure: block diagram with the processor, data translation buffer, addressing, messaging support, local memory, message queue control, addressing and routing tag lookup, and the network interface with input and output buffers.]

Intel Cavallino router
[Figure: a crossbar connected to channels X+, X-, Y+, Y-, Z+, and Z-, each with virtual channels VC0 to VC3.]

SGI SPIDER chip
[Figure: a crossbar surrounded by link interfaces (I/F), with virtual channels (VCs) and message control.]
The SPIDER chip is the first router to compute the route one step ahead.

Routing control for PCS
[Figure: block diagram from input VC to output VC, with the routing header, channel mappings, decode, history store, decision unit, Inc/Dec banks, and modified header.]

Pipelined circuit switching (PCS)
Routers that implement PCS-based routing protocols are more complex than wormhole-switched routers because they must support backtracking. Backtracking relies on history information: when a routing header backtracks, its history mask must be retrieved from the history store.

Buffered wormhole (IBM SP family)
[Figure: input and output ports with flow control (FC), FIFOs, serializers, and deserializers on 8-bit paths, a central queue on 64-bit paths, a routing unit, and a bypass crossbar.]
Packets are buffered in the central queue when they fail to win access to the bypass crossbar (output port).