Virtual-Channel Flow Control
William J. Dally
Presented by: Nick Kirchem
March 5, 2004
Motivation
- The interconnection network is critical: performance is sensitive to network latency and throughput
- The interconnect accounts for a large fraction of system cost and power consumption
- Interconnect throughput is limited to a fraction of channel capacity because resource allocation is coupled:
  - A single buffer is associated with each physical channel
  - A blocked packet therefore blocks the entire physical channel
  - True for both circuit switching and wormhole routing
Solution: Virtual Channels
- Add "lanes" to each physical channel (each lane is a virtual channel)
VCs and Flow Control Background
- Virtual channels decouple physical channels from buffer memory
  - These are the most costly resources of an interconnection network
- Multiple virtual channels are associated with a single physical channel
- The paper analyzes flow control, which determines:
  - How resources are allocated, and
  - How collisions over resources are resolved
- Virtual channels are most beneficial to flow control strategies that block packets (rather than dropping or misrouting them)
Virtual Channels Structure
- Each node contains a set of flit buffers and a switch
- The flit buffers associated with each physical channel are organized into several lanes
Virtual Channels State Logic
- Status register at the transmitting node, per lane:
  - Lane-is-free bit
  - Count of free flit buffers remaining in the lane at the receiving end
  - (Optionally) the priority of the packet occupying the lane
- Status register at the receiving node, per lane:
  - Input and output pointers for the lane's flit buffer
  - Channel state (free, waiting, or active)
Virtual Channels State Logic (figure)
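Below is a minimal C sketch of the per-lane status registers just listed, to make the state concrete. The names and field widths (LANES, FLITS_PER_LANE, uint8_t counters) are illustrative assumptions, not the paper's hardware design.

```c
#include <stdbool.h>
#include <stdint.h>

#define LANES          4    /* assumed number of virtual channels (lanes) per physical channel */
#define FLITS_PER_LANE 4    /* assumed flit-buffer depth of each lane */

/* State of a lane as seen by the receiving node: free, waiting, or active. */
typedef enum { LANE_FREE, LANE_WAITING, LANE_ACTIVE } lane_state_t;

/* Status register kept at the transmitting end of the physical channel. */
typedef struct {
    bool    lane_is_free;       /* lane is not currently assigned to a packet       */
    uint8_t free_flit_buffers;  /* free flit buffers remaining at the receiving end */
    uint8_t priority;           /* optional priority of the packet in the lane      */
} tx_lane_status_t;

/* Status register kept at the receiving end of the physical channel. */
typedef struct {
    uint8_t      input_ptr;     /* where the next arriving flit is written          */
    uint8_t      output_ptr;    /* next flit to forward through the switch          */
    lane_state_t state;         /* free / waiting / active                          */
} rx_lane_status_t;

/* One physical channel carries LANES independent virtual channels. */
typedef struct {
    tx_lane_status_t tx[LANES];
    rx_lane_status_t rx[LANES];
} channel_status_t;
```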
VC State Logic Storage Overhead
- The storage overhead grows with the number of lanes l, the number of flit buffers b, and the number of priority bits pri
- For a typical configuration (b = 16, l = 4, pri = 0):
  - 36 bits of state overhead with virtual channels
  - 17 bits without virtual channels
- Small compared to the total buffer storage of 512 bits
VC Operation
- A packet arriving at a node is assigned an output channel by the routing algorithm, based on its destination and the status of the output channels
- The packet is then assigned any free virtual channel (lane) on that output channel, blocking if none is available
- Each flit is advanced by flow control (see the sketch below); to advance, it must gain:
  - Access to a path through the switch, and
  - Access to the physical channel into the input of the next node
- The lane is deallocated when the last flit of the packet leaves the node
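The flit-advance step can be sketched as follows, building on the types from the earlier state-register sketch. switch_grant() is a hypothetical stand-in for switch arbitration; the point is only the two conditions a flit must satisfy and the lane deallocation on the tail flit.

```c
/* Hypothetical switch arbiter hook: returns true if this lane has been
 * granted a path through the switch this cycle.  Stubbed out here. */
static bool switch_grant(int lane) { (void)lane; return true; }

/* One step of the flit-advance decision described above.  A flit moves only
 * if (1) it wins a path through the switch and (2) the receiving node has a
 * free flit buffer for its lane.  When the tail flit departs, the lane is
 * released for reuse by another packet. */
static bool try_advance_flit(channel_status_t *ch, int lane, bool is_tail_flit)
{
    tx_lane_status_t *tx = &ch->tx[lane];

    if (tx->free_flit_buffers == 0)   /* no room at the receiving end: block      */
        return false;
    if (!switch_grant(lane))          /* lost switch arbitration this cycle       */
        return false;

    tx->free_flit_buffers--;          /* consume one downstream buffer credit     */
    if (is_tail_flit)
        tx->lane_is_free = true;      /* lane deallocated as the last flit leaves */
    return true;
}
```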
Allocation Policies
- Physical channel bandwidth is allocated only to lanes that:
  - Have a flit ready to transmit, and
  - Have room for that flit at the receiving end
- Any arbitration algorithm can be used among the eligible lanes:
  - Random, round-robin, or priority
  - Deadline scheduling (e.g., schedule by packet age)
- A round-robin example is sketched below
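As an example of one such policy, here is a round-robin arbiter sketch over the lanes of a single physical channel, again using the assumed types from the earlier sketch. Swapping the selection order or adding a priority or age comparison would give the random, priority, or deadline variants mentioned above.

```c
/* Round-robin arbitration over the lanes of one physical channel.  Only lanes
 * that both hold a flit and have a free buffer at the receiving end compete
 * for the channel this cycle. */
static int arbitrate_round_robin(const channel_status_t *ch,
                                 const bool flit_ready[LANES], /* lane has a flit queued locally */
                                 int last_winner)
{
    for (int i = 1; i <= LANES; i++) {
        int lane = (last_winner + i) % LANES;
        if (flit_ready[lane] &&                      /* flit ready to transmit        */
            ch->tx[lane].free_flit_buffers > 0)      /* room for it at the receiver   */
            return lane;                             /* this lane gets the channel    */
    }
    return -1;                                       /* channel stays idle this cycle */
}
```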
VC Implementation Issues
- Integration requires design changes:
  - Replace FIFO buffers with multilane buffers
  - Modify the switch to handle a larger number of inputs and outputs
  - Modify the flow control protocol
- Switch complexity increases
- The acknowledgment sent when free buffer space opens up becomes more complex: it must identify the lane, requiring additional bits (see the sketch below)
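The widened acknowledgment can be illustrated with a small sketch: the signal returned when a flit buffer frees up must now carry a lane identifier, costing on the order of log2(l) extra bits. The bit-field layout below is an assumption for l = 4, not the paper's encoding.

```c
/* Acknowledgment/credit returned to the transmitting node.  With virtual
 * channels it must name the lane whose buffer was freed. */
typedef struct {
    unsigned lane : 2;   /* which lane's buffer was freed (2 bits for 4 lanes) */
    unsigned ack  : 1;   /* a flit buffer has been freed at the receiver       */
} credit_ack_t;

/* On receipt, the transmitting node returns the credit to the named lane. */
static void process_ack(channel_status_t *ch, credit_ack_t a)
{
    if (a.ack)
        ch->tx[a.lane].free_flit_buffers++;
}
```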
Virtual Channel Analysis
- Assumptions:
  - Packet destinations are uniformly and randomly distributed
  - An arriving packet is consumed at its destination without waiting
  - Each lane has a single flit buffer
  - Packet blocking probabilities are independent
- Lots of math follows...
VC Analysis Results (figure)
Experimental Results
- Simulator written as a C program
- Evaluated various topologies and lane (virtual channel) depths
- Measured throughput and latency
- Simulation results match the performance predicted by the analysis
- For a fixed amount of buffer storage, it is better to have more lanes of smaller depth than fewer, deeper lanes
- Scheduling algorithms show the performance achievable when priorities or deadlines are taken into account
Experimental Results (figure)
Conclusion and Questions
- Network throughput and latency are improved by decoupling physical channels from buffer memory
- Is it worth the added complexity?
- Under which systems and network topologies would virtual channels be most useful?
- Where would they be less useful?