Basic Low Level Concepts


1 Basic Low Level Concepts

2 Course Outline
Operation through multiple switches: Topologies & Routing
- Direct, indirect, regular, irregular
- Formal models and analysis for deadlock and livelock freedom
Operation through a single switch: Router micro-architectures
- Buffering, arbitration, scheduling, datapath
- Case studies
Optimization: technology, congestion, reliability
Operation of a single link: switching and flow control

3 Overview
Main architectural issues for communication over a single link:
- Message units
- Flow control (lossless links)
- Switching (next)
- Buffer management (later)
- Arbitration & scheduling (later)
System goals: high levels of link utilization and minimal impact on end-to-end latency

4 Sources
"Interconnection Networks: An Engineering Approach", J. Duato, S. Yalamanchili and L. Ni, Morgan Kaufmann (pubs.), Chapters 1 & 2
Papers:
- Virtual Channel Flow Control
- Optimistic Flow Control – Illinois Fast Messages

5 Message Passing Communication Protocol
Typical steps followed by the sender:
- System call by application
- Copies the data into OS and/or network interface memory
- Packetizes the message (if needed)
- Prepares headers and trailers of packets
- Checksum is computed and added to header/trailer
- Timer is started and the network interface sends the packets
[Figure: sender and receiver nodes (processor, memory, network interface) connected by the interconnection network; the user writes data in memory (user space), a system call copies it into system space (copy 1), and the NI performs a pipelined transfer, e.g., via DMA]
© T.M. Pinkston, J. Duato, with major contributions by J. Flich

6 Message Passing Communication Protocol
Typical steps followed by the receiver:
- NI allocates received packets into its memory or OS memory
- Checksum is computed and compared for each packet
- If the checksum matches, the NI sends back an ACK packet
- Once all packets are correctly received:
  - The message is reassembled and copied to the user's address space
  - The corresponding application is signalled (via polling or interrupt)
[Figure: pipelined reception into system space, e.g., via DMA; an interrupt signals the application, the data is copied to user space (copy 2), and the data is ready]
© T.M. Pinkston, J. Duato, with major contributions by J. Flich
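To make the sender/receiver steps above concrete, here is a minimal sketch in Python. The packet fields and the additive checksum are illustrative assumptions standing in for whatever format and CRC a real network interface uses.

```python
# Minimal sketch of the packetize / checksum / ACK steps described above.
# The packet layout and the additive checksum are assumptions, not the
# format of any particular network interface.

MAX_PAYLOAD = 64  # bytes per packet (assumed)

def checksum(data: bytes) -> int:
    """Toy additive checksum; real NIs typically use CRCs."""
    return sum(data) & 0xFFFF

def packetize(message: bytes, dest: int):
    """Split a message into packets, each carrying a header and checksum."""
    packets = []
    for seq, off in enumerate(range(0, len(message), MAX_PAYLOAD)):
        payload = message[off:off + MAX_PAYLOAD]
        packets.append({
            "dest": dest,
            "seq": seq,
            "last": off + MAX_PAYLOAD >= len(message),
            "payload": payload,
            "checksum": checksum(payload),
        })
    return packets

def receive(packet) -> str:
    """Receiver side: verify the checksum, then ACK or NACK the packet."""
    if checksum(packet["payload"]) == packet["checksum"]:
        return "ACK"   # packet accepted; reassembly proceeds
    return "NACK"      # corrupted; sender retransmits on timeout/NACK

# Example: a 150-byte message becomes 3 packets, each individually ACKed.
pkts = packetize(bytes(150), dest=7)
assert [receive(p) for p in pkts] == ["ACK", "ACK", "ACK"]
```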

7 Shared Memory
L2 miss:
- Miss Status Handling Register (MSHR) allocation, address mapping, packet construction
- Message injection, flow control set-up, rate control
Message reception:
- Packet ejection, update of message status (return), and control processing (end-to-end flow control)
- Packet servicing, message injection

9 The Network Model
Metrics (for now): latency and bandwidth
[Figure: message path through the network, shaped by routing, switching, flow control, and error control]

10 Basic Switch Microarchitecture
[Figure: canonical router datapath with per-input demux into buffers, a crossbar, per-output mux, and link control on each physical channel. Functions: Route Computation, Switch & VC Allocation, Switch Traversal, Link Traversal. Parameters: D hops, L-bit message, W-bit-wide channels]
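The stage names recovered from the figure correspond to the canonical router pipeline. Below is a small illustrative model (assuming one cycle per stage, which is not a statement about any particular router) that also shows the classic zero-load latency estimate for the figure's D, L, W parameters.

```python
# Illustrative model of the pipeline stages named in the figure above and of
# the classic zero-load latency estimate (D hops, L-bit message, W-bit
# channels). One cycle per stage is an assumption.

PIPELINE = [
    "RC",  # Route Computation: pick the output port from the destination
    "VA",  # VC Allocation: claim a virtual channel on that output port
    "SA",  # Switch Allocation: win a crossbar time slot
    "ST",  # Switch Traversal: cross the crossbar
    "LT",  # Link Traversal: drive the flit onto the outgoing physical channel
]

def zero_load_latency(D: int, L: int, W: int, t_router: int = len(PIPELINE)) -> int:
    """D hops of router delay plus L/W cycles to serialize an L-bit message
    over W-bit-wide channels (wire delay ignored)."""
    return D * t_router + L // W

print(zero_load_latency(D=3, L=512, W=128))  # 19 cycles under these assumptions
```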

11 Off-Chip vs. On-Chip
On-Chip: wide links, shallow pipelines, not pin limited, low flow control latency, smaller buffers
Off-Chip: narrow links, deeper pipelines, pin limited, larger flow control latency, deeper buffers

12 The Hardware Message Stack
- Routing Layer (Where?): destination decisions, i.e., which output port. Largely responsible for deadlock and livelock properties.
- Switching Layer (When?): when data is forwarded. Largely responsible for latency, bandwidth and energy properties.
- Physical Layer (How?): synchronization of data transfer.
Switching is tightly coupled with flow control & buffer management; relative timing is key to performance.

13 Messaging Units
Data is transmitted based on a hierarchical data structuring mechanism:
Messages → packets → flits → phits
- Packets carry a header (type, destination info, sequence #, misc.) and data
- Flits: flow control digits (head flit ... tail flit)
- Phits: physical transfer digits
While flits and phits are fixed size, packets and data may be variable sized.
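A short sketch of the hierarchy above; the sizes are arbitrary assumptions chosen only to make the nesting visible, since real networks derive them from link width and buffer design.

```python
# Sketch of the messages -> packets -> flits -> phits hierarchy. All sizes
# here are illustrative assumptions.

PACKET_BYTES = 32   # max payload per packet (packets are variable sized in general)
FLIT_BYTES   = 8    # fixed flit size
PHIT_BYTES   = 2    # fixed phit size (one physical transfer on the link)

def split(data: bytes, unit: int):
    return [data[i:i + unit] for i in range(0, len(data), unit)]

def message_to_phits(message: bytes):
    """Break a message into packets, each packet into flits (head ... tail),
    and each flit into the phits actually transferred over the wires."""
    for packet in split(message, PACKET_BYTES):
        flits = split(packet, FLIT_BYTES)
        for i, flit in enumerate(flits):
            kind = "head" if i == 0 else "tail" if i == len(flits) - 1 else "body"
            yield kind, split(flit, PHIT_BYTES)

# A 100-byte message: 4 packets, each up to 4 flits, each flit up to 4 phits.
for kind, phits in message_to_phits(bytes(100)):
    assert len(phits) <= FLIT_BYTES // PHIT_BYTES
```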

14 Link Level Flow Control

15 Flow Control
A synchronization protocol for the lossless transmission of bits.
- Determines how network resources (buffers) are allocated
- Determines how conflicts are resolved: how (e.g., priorities) and when resources are assigned

16 For Synchronized Transfers
- Unit of synchronized communication: the smallest unit whose transfer is requested by the sender and acknowledged by the receiver
- No restriction on the relative timing of control vs. data transfers
- Is a form of backpressure
[Figure: sender requests a transfer; receiver acknowledges receipt]

17 For Buffer Management
Flow control conveys buffer availability information and occurs at two levels:
- Level of buffer management (flits/packets)
- Level of physical transfers (phits)
The relationship between flits and phits is machine & technology specific.
What if there are no buffers? Bufferless switching/flow control (later)

18 Physical Channel Flow Control
- Asynchronous flow control: what is the limiting factor on link throughput?
- Synchronous flow control: how is buffer space availability indicated?

19 Flow Control Mechanisms
- Credit-based flow control
- On/off flow control
- Optimistic/reliable flow control
- Virtual channel flow control

20 Credit-Based Flow Control
The sender sends packets whenever its credit counter is not zero.
[Figure: sender with a credit counter transmits packets over a pipelined link into the receiver's queue; when the queue is not serviced, the credit counter counts down and the sender stops]
© T.M. Pinkston, J. Duato, with major contributions by J. Flich

21 Credit-Based Flow Control
The receiver sends credits back as buffer slots become available; the sender then resumes injection.
[Figure: the receiver's queue is serviced, +5 credits are returned, the sender's credit counter is replenished and packet injection resumes]
© T.M. Pinkston, J. Duato, with major contributions by J. Flich
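The two slides above amount to a simple handshake: a sender-side counter initialized to the receiver's buffer depth, decremented per unit sent, replenished as the receiver drains its queue. A minimal sketch, with buffer depth and drain pattern chosen arbitrarily for illustration:

```python
# Minimal sketch of credit-based flow control: send while credits remain,
# stall at zero, resume when the receiver returns credits.

class CreditLink:
    def __init__(self, buffer_depth: int):
        self.credits = buffer_depth   # sender-side credit counter
        self.rx_queue = []            # receiver-side flit buffer

    def try_send(self, flit) -> bool:
        """Sender: transmit only while the credit counter is non-zero."""
        if self.credits == 0:
            return False              # back-pressured: queue not being drained
        self.credits -= 1
        self.rx_queue.append(flit)
        return True

    def drain(self, n: int):
        """Receiver: free n buffer slots and return n credits to the sender."""
        freed = min(n, len(self.rx_queue))
        del self.rx_queue[:freed]
        self.credits += freed

link = CreditLink(buffer_depth=5)
sent = [link.try_send(f) for f in range(10)]
assert sent == [True] * 5 + [False] * 5              # stalls after 5 flits
link.drain(5)                                        # receiver services its queue
assert all(link.try_send(f) for f in range(5, 10))   # sender resumes
```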

22 Timeline*
[Figure: credit/flit exchange between Node 1 and Node 2; the round-trip credit time t_rt (flit sent, credit processed and returned, next flit released) can equivalently be expressed in number of flow control buffer units]
*From W. J. Dally & B. Towles, "Principles and Practices of Interconnection Networks," Morgan Kaufmann, 2004

23 Performance of Credit-Based Schemes
- The control bandwidth can be reduced by submitting block credits
- Buffers must be sized to maximize link utilization: large enough to host the packets in transit during one credit round trip,
  # flit buffers ≥ (t_rt × link bandwidth) / flit size
*From W. J. Dally & B. Towles, "Principles and Practices of Interconnection Networks," Morgan Kaufmann, 2004
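A worked instance of the sizing rule above, using made-up link parameters (the round-trip time, bandwidth, and flit size here are assumptions, not figures from the source):

```python
# Worked example: enough flit buffers to cover one credit round trip keeps
# the link busy. All parameter values are assumed for illustration.
import math

t_rt      = 8e-9     # round-trip credit time in seconds (assumed)
bandwidth = 16e9     # link bandwidth in bytes/s (assumed: 128 bits @ 1 GHz)
flit_size = 16       # flit size in bytes (assumed)

min_flit_buffers = math.ceil(t_rt * bandwidth / flit_size)
print(min_flit_buffers)   # 8 flit buffers under these assumptions
```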

24 On/Off Flow Control
Xon/Xoff flow control: a packet is injected only if the control bit is in the Xon state.
[Figure: sender with an Xon/Xoff control bit transmits packets over a pipelined link into the receiver's queue, which has Xoff (high) and Xon (low) thresholds]
© T.M. Pinkston, J. Duato, with major contributions by J. Flich

25 On/Off Flow Control
When the Xoff threshold is reached, an Xoff notification is sent. While in Xoff, the sender cannot inject packets.
[Figure: the receiver's queue is not serviced and fills past the Xoff threshold; the control bit flips to Xoff and injection stops]
© T.M. Pinkston, J. Duato, with major contributions by J. Flich

26 On/Off Flow Control
When the Xon threshold is reached, an Xon notification is sent and the sender may resume injection.
[Figure: the receiver's queue drains below the Xon threshold; the control bit flips back to Xon]
© T.M. Pinkston, J. Duato, with major contributions by J. Flich

27 Timeline*
[Figure: on/off exchange between Node 1 and Node 2; flits accumulate until the high water mark is hit and a Stop (off) flit is sent, then drain until the low water mark is hit and a Go (on) flit is sent]
*From W. J. Dally & B. Towles, "Principles and Practices of Interconnection Networks," Morgan Kaufmann, 2004

28 Performance of On/Off Schemes
- Performance depends on buffer sizing and the position of the Stop and Go watermarks
- To operate at full speed, the buffer must hold at least 2F flits, where F = (t_rt × link bandwidth) / flit size is the number of flits in flight during one round trip
*From W. J. Dally & B. Towles, "Principles and Practices of Interconnection Networks," Morgan Kaufmann, 2004
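A minimal sketch of the Xon/Xoff behavior in the slides above; the watermark values are arbitrary assumptions, and the model flips the control bit instantaneously (real links must also absorb the flits in flight during the Xoff round trip, which is what the 2F buffer bound accounts for).

```python
# Sketch of on/off (Xon/Xoff) flow control: the receiver flips the sender's
# control bit to Xoff at the Stop (high water) mark and back to Xon at the
# Go (low water) mark. Thresholds are illustrative assumptions.

class OnOffLink:
    def __init__(self, stop_mark: int, go_mark: int):
        self.stop_mark, self.go_mark = stop_mark, go_mark
        self.rx_queue = []
        self.xon = True               # sender-side control bit

    def try_send(self, flit) -> bool:
        """Sender injects only while the control bit is Xon."""
        if not self.xon:
            return False
        self.rx_queue.append(flit)
        if len(self.rx_queue) >= self.stop_mark:
            self.xon = False          # receiver sends an Xoff notification
        return True

    def drain(self, n: int):
        """Receiver services its queue; crossing the Go mark re-enables Xon."""
        del self.rx_queue[:n]
        if len(self.rx_queue) <= self.go_mark:
            self.xon = True           # receiver sends an Xon notification

link = OnOffLink(stop_mark=6, go_mark=2)
assert sum(link.try_send(f) for f in range(10)) == 6   # stopped at the Stop mark
link.drain(5)                                          # below the Go mark again
assert link.try_send("next")                           # injection resumes
```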

29 Comparison of Flow Control Schemes
Xon/Xoff (Stop & Go) vs. credit-based flow control over time:
- Stop & Go: Stop signal returned by receiver; sender stops transmission; last packet reaches the receiver buffer; packets in the buffer get processed; Go signal returned to sender; sender resumes transmission; first packet reaches the buffer. The flow control latency observed by the receiver buffer spans this whole sequence.
- Credit based: sender uses its last credit; last packet reaches the receiver buffer; packets get processed and credits are returned; # credits returned to sender; sender transmits packets; first packet reaches the buffer.
© T.M. Pinkston, J. Duato, with major contributions by J. Flich

30 Comparing Credit-Based & On/Off Flow Control
- Both schemes can fully utilize buffers
- Restart latency is lower for credit-based schemes; therefore:
  - Credit-based flow control has higher average buffer occupancy at high loads
  - Credit-based flow control leads to higher throughput at high loads (smaller inter-packet gap)

31 Comparing Credit-Based & On/Off Flow Control (cont.)
- Control traffic is higher for credit schemes; block credits can be used to tune link behavior
- Buffer sizes are independent of round-trip latency for credit schemes (at the expense of performance); not true for on/off without dropping packets
- Credit schemes carry higher information content → useful for QoS schemes
- On/off schemes are better suited for many-to-one relationships

32 Optimistic Flow Control
- Optimistically send messages, allocating space for returned (rejected) messages
- Deallocate on reception of an Ack; retransmit on reception of a Nack
- Buffer sizes are proportional to the number of packets rather than the number of senders
[Figure: sending and receiving endpoints with network interfaces on their buses; data crosses the network, Ack/Nack responses return, and rejected messages land in a reject queue at the sender]

33 Reliable Flow Control
- Transmit packets when available
- De-allocate when reception is acknowledged
- Re-transmit if a packet is dropped (and a negative ACK is received)
- Derived from traditional telecom networks; employed over long and error-prone links
- Extended to operate over the network → end-to-end
[Figure: sending and receiving endpoints with network interfaces; data and ACK/NACK packets cross the network]

34 Reliable Flow Control
- Packets are tagged with sequence numbers (which must eventually be recycled)
- The receiver acknowledges (Acks) received packets and detects out-of-sequence reception
- Time-outs are used to detect lost packets
[Figure: sender and receiver each track sequence numbers 1-8; the last transmitted, last acknowledged, and last received packets bound the retransmission interval]

35 Reliable Flow Control
Data structures hold the transmitted packets and the order in which they were transmitted:
- Utilize the send buffers: go-back-N strategy
- Or maintain a separate data structure that preserves the original order and minimizes redundant transmissions
- Block acknowledgements minimize the flow control bandwidth used
[Figure: sending and receiving endpoints with network interfaces; ACKs return over the network]
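A sketch of the go-back-N strategy named above, assuming a sender that keeps transmitted packets in its send buffer until they are acknowledged and rewinds to the oldest unacknowledged sequence number on a timeout or NACK. The window size and the cumulative-ACK convention are illustrative assumptions.

```python
# Go-back-N sender sketch: send inside the window, slide it on cumulative
# ACKs, and resend from the oldest unacknowledged packet after a timeout.

class GoBackNSender:
    def __init__(self, packets, window: int = 4):
        self.packets = packets        # packets tagged by their index (seq #)
        self.window = window
        self.base = 0                 # oldest unacknowledged sequence number
        self.next_seq = 0             # next sequence number to transmit

    def transmit(self):
        """Send packets while the window of unacknowledged packets is open."""
        sent = []
        while self.next_seq < len(self.packets) and \
              self.next_seq < self.base + self.window:
            sent.append(self.next_seq)
            self.next_seq += 1
        return sent

    def ack(self, seq: int):
        """Cumulative ACK: everything up to and including seq is delivered."""
        self.base = max(self.base, seq + 1)

    def timeout(self):
        """Lost packet detected: go back and resend from the oldest unacked."""
        self.next_seq = self.base

tx = GoBackNSender(packets=list(range(8)), window=4)
assert tx.transmit() == [0, 1, 2, 3]    # window full, sender waits
tx.ack(1)                               # packets 0 and 1 acknowledged
assert tx.transmit() == [4, 5]          # window slides forward
tx.timeout()                            # e.g. packet 2 was dropped
assert tx.transmit() == [2, 3, 4, 5]    # retransmit from the unacked base
```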

36 Buffering
- Used for long and error-prone links
- Also known as Ack/Nack flow control
- As with credits, buffers must cover the flits in flight during one round trip:
  # flit buffers ≥ (t_rt × link bandwidth) / flit size

37 Optimistic/Reliable Flow Control
Optimistic:
- Inefficient/increased buffer usage
- Messages held at the source
- Re-ordering may be required due to out-of-order reception
Reliable:
- Must deal with out-of-order reception
- Needs sophisticated buffer management schemes for multi-source control
Both generally give way to credit-based or stop-and-go schemes:
- Small buffers → credit-based
- Large buffers → stop-and-go

38 Virtual Channel Flow Control
- Channels and buffers are dynamically allocated network resources
- Physical channels are idle when messages block
[Figure: packets A and B sharing a physical channel, with and without virtual channels]

39 Virtual Channels
- Each virtual channel is a unidirectional channel
- Independently managed buffers are multiplexed over the physical channel
- Each virtual channel is independently flow controlled
- Improves performance through reduction of blocking delay
- Important in realizing deadlock freedom (later)
[Figure: a unidirectional physical channel with per-VC input and output buffers, demux/mux, link control, and per-VC state (status, credits)]

40 Virtual Channel Flow Control
As the number of virtual channels increases, the increased channel multiplexing has multiple effects (more later):
- Overall performance
- Router complexity and critical path
Flits/phits must now record VC information (e.g., a VC field alongside the flit type), or the VC information must be sent out of band.
[Figure: a packet broken into flits; each flit carries a type field and a VC identifier]
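A sketch of virtual-channel flow control as described in the two slides above: each VC has its own buffer and credit count, flits carry an in-band VC identifier, and the single physical channel is multiplexed among VCs that have both a flit and a credit. The VC count, buffer depth, and fixed-priority scan (rather than a fair arbiter) are illustrative assumptions.

```python
# Sketch of VC flow control: per-VC buffers and credits multiplexed over one
# physical channel; a blocked VC does not idle the link for the others.
from collections import deque

class VirtualChannelLink:
    def __init__(self, num_vcs: int = 4, depth: int = 4):
        self.credits = [depth] * num_vcs              # per-VC credit counters
        self.out_queues = [deque() for _ in range(num_vcs)]

    def enqueue(self, vc: int, flit):
        """Flits record the VC they belong to (in-band VC identifier)."""
        self.out_queues[vc].append({"vc": vc, "payload": flit})

    def cycle(self):
        """One physical-channel cycle: scan the VCs in a fixed order and send
        at most one flit from a VC that is neither empty nor out of credits
        (a real router would arbitrate fairly)."""
        for vc, q in enumerate(self.out_queues):
            if q and self.credits[vc] > 0:
                self.credits[vc] -= 1
                return q.popleft()                    # wins the physical channel
        return None                                   # channel idles this cycle

    def credit_return(self, vc: int):
        self.credits[vc] += 1                         # downstream freed a buffer

link = VirtualChannelLink(num_vcs=2, depth=1)
link.enqueue(0, "A0"); link.enqueue(0, "A1"); link.enqueue(1, "B0")
assert link.cycle()["payload"] == "A0"   # VC0 uses its only credit
assert link.cycle()["payload"] == "B0"   # VC0 blocked, but VC1 still advances
link.credit_return(0)
assert link.cycle()["payload"] == "A1"   # VC0 resumes after a credit returns
```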

41 Intel Single Chip Cloud Computer (SCC)
- 24 dual-core tiles, each with a router (R)
- 8 voltage and 28 frequency islands
- X-Y routed mesh with 144-bit physical channels
- 4 on-die memory controllers
[Figure: 6x4 mesh of tiles and routers with memory controllers on the periphery and voltage islands V1, V2 marked]

42 Intel SCC Message Format
Flit types: null, credit, body/tail control
J. Howard et al., "A 48-Core IA-32 Processor in 45 nm CMOS Using On-Die Message-Passing and DVFS for Performance and Power Scaling," IEEE Journal of Solid-State Circuits, vol. 46, no. 1, January 2011.

43 Flow Control: Global View
- Flow control parameters are tuned based on link length, link width, and processing overhead at the end-points
- Effective flow control and buffer management are necessary for high link utilization → network throughput
- In-band vs. out-of-band flow control
- Links may be non-uniform, e.g., lengths/widths on chips; buffer sizing for long links

44 Flow Control: Global View
- Latency: overlapping flow control, buffer management and switching → impacts end-to-end latency
- In-band vs. out-of-band flow control: use link bandwidth vs. additional side-band signals

45 Commercial Examples
- AMD HyperTransport – credit based
- Intel QuickPath – credit based
- InfiniBand – credit based
- Ethernet – on/off
- Myrinet – stop-and-go
- PCI Express – credit based
- IBM Blue Gene – token flow control
- Cray T3E – credit based

46 Some Research Questions
- Reliable flow control
- PVT effects for high speed links
- Encoding schemes, e.g., for power efficiency
- Adaptive flow control
- Buffer and congestion management
- Quality of Service (QoS)
- End-to-end flow control for multicast
- Multisource flow control (networks)
- Low power designs
- Error rate vs. voltage scaling
- Link and buffer widths and depths
- On/off schemes

47 Summary
- Flow control, buffer management and switching are closely related and generally co-designed
- They sit closest to the physical layer and directly impact utilization and latency
- They are the object of significant tuning
- How are these schemes impacted by, and integrated with, switch designs?

