Adaptive Backpressure: Efficient Buffer Management for On-Chip Networks
Authors: Daniel U. Becker, Nan Jiang, George Michelogiannakis, William J. Dally (Stanford University)
Presenter: Han Liu (University of California, San Diego)
Background
NoCs are becoming huge
  – Hundreds of cores on a single die
Current designs use input-queued routers
  – Input buffers consume significant resources
Input buffer sharing is attractive in NoCs
  – Pro: improves area and power efficiency
  – Con: facilitates the spread of congestion
Han Liu, 04/29/13
Overview
Adaptive Backpressure (ABP) mitigates performance degradation by avoiding unproductive use of buffer space in the presence of congestion
  – Avoids the downsides of buffer sharing while maintaining its benefits in the benign case
Motivation
Assumption: buffers are good
  – Enable more flexible routing
  – Let traffic wait closer to its destination
Is this always true?
  – Energy and area efficiency
  – Implementation difficulty
Train Example
[Figure: San Diego (source), Denver (buffer), Boston (destination)]
Buffers are good
Motivation: Static vs. Dynamic Buffer Management
[Figure: static vs. dynamic allocation of buffer slots between VC1 and VC2; the static panel marks wasted buffer space]
Dynamic Buffer Management
Buffer space is an expensive resource in NoCs
  – 30-35% of network power (MIT RAW, UT TRIPS)
Dynamic management increases utilization by sharing buffer space among multiple VCs
  – Optimizes use of expensive buffer resources
  – Decreases the incremental cost of VCs
Improved area and power efficiency
  – 25% more throughput or 34% less power [Nicopoulos06]
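To make the utilization argument concrete, here is a minimal sketch assuming a 16-slot input buffer and 4 VCs (the evaluation configuration) with only one VC active. The names `StaticBuffer`, `SharedBuffer`, and `fill_single_vc` are hypothetical, and the model ignores the bookkeeping hardware a real shared buffer needs; it only shows why static partitioning strands slots that sharing can use.

```python
"""Toy contrast of static (partitioned) vs. dynamic (shared) input-buffer
management for one router input port. Hypothetical names, illustration only."""


class StaticBuffer:
    """Each VC owns a fixed, private slice of the buffer."""

    def __init__(self, total_slots: int, num_vcs: int) -> None:
        self.per_vc = total_slots // num_vcs
        self.occupancy = [0] * num_vcs

    def can_accept(self, vc: int) -> bool:
        return self.occupancy[vc] < self.per_vc

    def accept(self, vc: int) -> None:
        assert self.can_accept(vc)
        self.occupancy[vc] += 1


class SharedBuffer:
    """All VCs draw slots from one common free pool."""

    def __init__(self, total_slots: int, num_vcs: int) -> None:
        self.total = total_slots
        self.occupancy = [0] * num_vcs

    def can_accept(self, vc: int) -> bool:
        return sum(self.occupancy) < self.total

    def accept(self, vc: int) -> None:
        assert self.can_accept(vc)
        self.occupancy[vc] += 1


def fill_single_vc(buf, vc: int) -> int:
    """Count how many flits one active VC can buffer while the others are idle."""
    stored = 0
    while buf.can_accept(vc):
        buf.accept(vc)
        stored += 1
    return stored


if __name__ == "__main__":
    print(fill_single_vc(StaticBuffer(16, 4), vc=0))  # 4: the other 12 slots sit idle
    print(fill_single_vc(SharedBuffer(16, 4), vc=0))  # 16: the whole buffer is usable
```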
Sharing
Pros
  – Economical
  – Efficient
Cons
  – Inconvenient
  – Troublesome
Border Example
[Figure: HWY 5 and HWY 805 at the Mexico / US border]
Buffer Monopolization
Blocked flits from a congested VC accumulate in the shared buffer
Effective buffer size is reduced for the other VCs
Performance degrades (latency / throughput)
Congestion spreads across VCs (flows / apps / VMs / …)
[Figure: shared buffer occupied by VC 0 and VC 1]
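The monopolization effect can be reproduced with a toy cycle-level model. This is a minimal sketch under assumed, hypothetical conditions: two VCs share 16 slots, upstream offers one flit per VC per cycle, VC 1 is blocked downstream and never drains, and VC 0 drains one flit per cycle. The optional `vc1_cap` is a fixed limit used purely for contrast; it is not the ABP mechanism.

```python
"""Toy cycle-level model of buffer monopolization in a shared input buffer.
All parameters and names are hypothetical, for illustration only."""

from typing import List, Optional, Tuple


def simulate(cycles: int, total_slots: int = 16,
             vc1_cap: Optional[int] = None) -> Tuple[int, List[int]]:
    occ = [0, 0]          # flits currently buffered for VC 0 and VC 1
    delivered_vc0 = 0     # flits VC 0 has forwarded downstream
    for _ in range(cycles):
        # Departures: only VC 0 makes progress; VC 1 is congested downstream.
        if occ[0] > 0:
            occ[0] -= 1
            delivered_vc0 += 1
        # Arrivals: the congested VC grabs a free slot first (worst case).
        for vc in (1, 0):
            under_cap = vc != 1 or vc1_cap is None or occ[1] < vc1_cap
            if sum(occ) < total_slots and under_cap:
                occ[vc] += 1
    return delivered_vc0, occ


if __name__ == "__main__":
    print(simulate(100))             # (15, [0, 16]): VC 1 monopolizes the buffer
    print(simulate(100, vc1_cap=8))  # (99, [1, 8]): VC 0 keeps flowing
```

Without any limit, VC 1 fills all 16 slots after a few cycles and VC 0 stops receiving flits entirely, even though its own path is uncongested.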
Adaptive Backpressure
Goal: avoid unproductive use of buffer space in dynamic buffer management
  – But allow sharing when it is beneficial
Approach: match the arrival and departure rates for each VC by regulating credit availability (backpressure)
  – Derive quotas from credit round-trip times
Buffer Monopolization
[Figure: shared buffer occupied by VC 0 and VC 1]
Want a way to regulate the otherwise unlimited credit supply to congested VC 1
  – Give VC 0 more credits and buffer space
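Quota Motivation (1)
Without congestion, full throughput requires $T_{crt,0}$ credits
  – At one flit per cycle, each credit is away for a full round trip of $T_{crt,0}$ cycles
Credit stall: an insufficient credit supply causes idle cycles downstream
[Figure: timeline of flits and credits between Router 0 and Router 1, showing the round trip $T_{crt,0}$, a credit stall, and the resulting idle cycle]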
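The credit-stall argument can be checked with a small model of one credit loop. This sketch assumes a link that carries at most one flit per cycle and a fixed credit round trip (`rtt` plays the role of $T_{crt,0}$); `link_throughput` and its parameters are hypothetical names. With fewer than `rtt` credits, the sender runs out before the first credit returns and the link idles.

```python
"""Toy model of the credit loop on one link (hypothetical parameters).
A flit sent in cycle t returns its credit in cycle t + rtt; the sender may
send at most one flit per cycle and only while it holds a credit."""

from collections import deque


def link_throughput(credits: int, rtt: int, cycles: int = 1000) -> float:
    available = credits
    returns = deque()            # cycles at which outstanding credits come back
    sent = 0
    for t in range(cycles):
        # Collect credits whose round trip completes this cycle.
        while returns and returns[0] == t:
            returns.popleft()
            available += 1
        if available > 0:        # otherwise: credit stall, the link idles
            available -= 1
            sent += 1
            returns.append(t + rtt)
    return sent / cycles


if __name__ == "__main__":
    print(link_throughput(credits=4, rtt=4))  # 1.0: credits cover the round trip
    print(link_throughput(credits=2, rtt=4))  # 0.5: two idle cycles per round trip
```

Credits beyond `rtt` do not raise throughput further, since the link is already busy every cycle.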
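Quota Motivation (2)
A congestion stall downstream causes unproductive buffer occupancy: excess flits wait in the buffer
  – Each congestion or queuing stall stretches the credit round trip to $T_{crt,0} + T_{stall}$
Matching stalls avoids unproductive buffer occupancy: one credit stall upstream per congestion stall downstream lets the excess drain
[Figure: Router 0 / Router 1 timelines showing congestion stalls, queuing stalls, credit stalls, and excess flits being drained]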
Quota Algorithm
A VC's quota value = throughput × $RTT_{min}$
  – The throughput of the upstream router is hard to measure
  → Compute quota values based on the observed RTT of individual credits
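As a worked instance of the quota formula, assume (hypothetically) a VC that drains at $\lambda = 0.5$ flits/cycle over a link with $RTT_{min} = 8$ cycles; $\lambda$ is our symbol for the throughput term on the slide. The quota is the number of credits needed to cover one round trip at that rate:

```latex
Q = \lambda \cdot RTT_{min}, \qquad
\lambda = 0.5\ \text{flits/cycle},\quad RTT_{min} = 8\ \text{cycles}
\;\Rightarrow\; Q = 0.5 \times 8 = 4\ \text{credits}
```

Granting this VC more than 4 credits would only let its flits accumulate in the shared buffer without raising its throughput.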
Quota Heuristic
Track the credit RTT for each output VC
If RTT = $RTT_{min}$: set the quota to $RTT_{min}$
  – No downstream congestion
  – Allow one flit in each cycle of the RTT interval
If RTT > $RTT_{min}$: subtract the difference from $RTT_{min}$
  – Each congestion and queuing stall adds to the RTT
  – Allow one credit stall per downstream stall
Quota Equation
$Q = \max\left(T_{crt,base} - (T_{crt,obs} - T_{crt,base}),\ 1\right) = \max\left(2\,T_{crt,base} - T_{crt,obs},\ 1\right)$
  – When $T_{crt,obs}$ is large, $Q$ is small
  – $Q_{min} = 1$ guarantees that quota values can continue to be updated
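The equation translates directly into a small function. This is a sketch: the name `abp_quota`, integer-cycle units, and the input check are our assumptions, not part of the slides. Each cycle by which the observed round trip exceeds the base removes one credit from the quota.

```python
"""Sketch of the ABP quota update from the slide's equation.
Function name, integer-cycle units, and the input check are assumptions."""


def abp_quota(t_crt_base: int, t_crt_obs: int) -> int:
    """Return Q = max(2 * T_crt,base - T_crt,obs, 1).

    t_crt_base: minimum (uncongested) credit round-trip time of the link
    t_crt_obs:  round-trip time observed for the most recently tracked credit
    """
    # Sanity check we added: an observed RTT below the link minimum is a bug.
    if t_crt_obs < t_crt_base:
        raise ValueError("observed RTT cannot be below the link minimum")
    # Floor of 1 keeps at least one flit allowed, so RTT samples keep arriving.
    return max(2 * t_crt_base - t_crt_obs, 1)


if __name__ == "__main__":
    for obs in (8, 11, 15, 30):
        print(obs, abp_quota(8, obs))  # prints 8 8, 11 5, 15 1, 30 1
```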
Implementation
The network design determines $RTT_{min}$ for each link
Track the RTT of a single in-flight credit per VC
  – Update the quota value when that credit returns
The switch allocator masks all VCs that exceed their quota
Simple extension to existing flow-control logic
  – No additional signaling required
  – < 5% overhead for a 16x64b buffer with 4 VCs
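The bookkeeping on this slide can be sketched per output VC at the upstream router. This is a minimal sketch under our own assumptions, not the authors' hardware: `AbpVcState` and `switch_allocator_mask` are hypothetical names; credits for a VC are assumed to return in the order their flits were sent, so a sequence number identifies the single tracked credit; and "exceeding the quota" is modeled as having at least `quota` credits outstanding.

```python
"""Hypothetical per-output-VC bookkeeping for Adaptive Backpressure.
Assumptions (ours): a VC's credits return in send order, and a VC exceeds
its quota once its outstanding (unreturned) credits reach the quota."""

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AbpVcState:
    rtt_min: int                       # set by the network design for this link
    quota: int = 0                     # initialized to rtt_min in __post_init__
    outstanding: int = 0               # credits consumed but not yet returned
    next_seq: int = 0                  # sequence number of the next flit sent
    returned_seq: int = 0              # number of credits returned so far
    tracked_seq: Optional[int] = None  # flit whose credit RTT is being timed
    tracked_sent_at: int = 0

    def __post_init__(self) -> None:
        self.quota = self.rtt_min      # no congestion observed yet

    def on_flit_sent(self, now: int) -> None:
        if self.tracked_seq is None:   # time only one in-flight credit per VC
            self.tracked_seq = self.next_seq
            self.tracked_sent_at = now
        self.next_seq += 1
        self.outstanding += 1

    def on_credit_returned(self, now: int) -> None:
        self.outstanding -= 1
        if self.returned_seq == self.tracked_seq:
            rtt_obs = now - self.tracked_sent_at
            # Q = max(2 * RTT_min - RTT_obs, 1), updated upon return
            self.quota = max(2 * self.rtt_min - rtt_obs, 1)
            self.tracked_seq = None    # next flit sent starts a new measurement
        self.returned_seq += 1


def switch_allocator_mask(vcs: List[AbpVcState]) -> List[bool]:
    """True = VC may bid for the switch; VCs at or over quota are masked."""
    return [vc.outstanding < vc.quota for vc in vcs]


if __name__ == "__main__":
    vc = AbpVcState(rtt_min=4)
    for t in range(4):
        vc.on_flit_sent(t)
    print(switch_allocator_mask([vc]))  # [False]: 4 credits outstanding, quota 4
    vc.on_credit_returned(now=10)       # tracked credit took 10 cycles, not 4
    print(vc.quota)                     # 1 = max(2*4 - 10, 1)
```

In a router, this mask would be ANDed with the usual credit-availability check before switch allocation, since the quota only restricts a VC and never grants it extra buffer space.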
Evaluation Methodology
BookSim 2.0
8x8 2D mesh, 64-bit channels, dimension-order routing (DOR)
16-slot input buffers, 4 VCs
Combined VC and switch allocation
Synthetic traffic and application benchmarks
Compare ABP to unrestricted sharing
Network Stability (1)
For adversarial traffic, throughput in the mesh is unstable at high load
  – Traffic merging causes starvation
  – Tree saturation causes widespread congestion
ABP improves stability
  – Throttles sources that inject at a very high rate
  – Efficient buffer use reduces tree saturation
  – Faster recovery from transient congestion
Network Stability (2)
[Figure: tornado traffic; annotation: 6.3x]
Network Stability (3)
[Figure: foreground traffic at 50% injection rate; annotations: 3.3x, -13% saturation rate]
Performance Isolation (1)
Inject two classes of traffic into the network
  – Shared buffer space, separate VCs
Sharing causes interference between the classes, which degrades latency
ABP reduces interference
  – Contains the effects of congestion within a class
  – Better isolation between workloads, VMs, …
Performance Isolation (2)
[Figure: uniform random foreground traffic, with hotspot background traffic and with uniform random background traffic; annotations: -33%, -38%]
Performance Isolation (3)
[Figure: 50% uniform random background traffic; annotations: -31%, w/o background]
Application Performance
[Figure: 12.5% injection rate for streaming traffic; annotations: -31%, w/o background]
Conclusions
Sharing improves buffer utilization, but can lead to undesired interference effects
Adaptive Backpressure regulates credit flow to avoid unproductive use of shared buffer space
  – Mitigates performance degradation in the presence of adversarial traffic
  – But maintains the key benefits of buffer sharing under benign conditions
THE END
Thank you for your attention!
Questions?