Adaptive Backpressure: Efficient Buffer Management for On-Chip Networks Authors: Daniel U. Becker, Nan Jiang, George Michelogiannakis, William J. Dally.

Presentation transcript:

Adaptive Backpressure: Efficient Buffer Management for On-Chip Networks
Authors: Daniel U. Becker, Nan Jiang, George Michelogiannakis, William J. Dally (Stanford University)
Presenter: Han Liu (University of California, San Diego)

Background
NoCs are becoming huge: hundreds of cores on a single die.
Current designs use input-queued routers:
– Input buffer resources become significant.
Input buffer sharing is attractive in NoCs:
– Pros: improves area and power efficiency.
– Cons: facilitates the spread of congestion.
(Han Liu, 04/29/13)

Overview
Adaptive Backpressure (ABP) mitigates performance degradation by avoiding unproductive use of buffer space in the presence of congestion.
It avoids the downsides of buffer sharing while maintaining its benefits in the benign case.

Motivation
Assumption: buffers are good.
– More flexible routing.
– Lets traffic wait closer to its destination.
Is this always true?
– Energy and area efficiency.
– Implementation difficulty.

Train Example
[Figure: San Diego (source), Denver (buffer), Boston (destination)]
Buffers are good.

Motivation
Static vs. dynamic buffer management
[Figure: VC1 and VC2 with static partitioning, showing wasted buffer space, vs. VC1 and VC2 with dynamic management]

Dynamic Buffer Management
Buffer space is an expensive resource in NoCs:
– 30-35% of network power (MIT RAW, UT TRIPS).
Dynamic management increases utilization by sharing buffer space among multiple VCs:
– Optimizes the use of expensive buffer resources.
– Decreases the incremental cost of VCs.
Improved area and power efficiency: 25% more throughput or 34% less power [Nicopoulos06].
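The static-versus-dynamic contrast can be made concrete with a minimal Python sketch. The classes StaticBuffer and SharedBuffer and the helper fill are hypothetical illustrations, not BookSim code. The 16-slot, 4-VC sizing is borrowed from the evaluation setup. The sketch shows only the utilization side: with one active VC, a static partition strands most of the buffer, while a shared pool lets that VC use all of it.

```python
class StaticBuffer:
    """Hypothetical model: each VC owns a fixed, private slice of the input buffer."""

    def __init__(self, total_slots: int, num_vcs: int) -> None:
        self.per_vc = total_slots // num_vcs
        self.occupancy = [0] * num_vcs

    def can_accept(self, vc: int) -> bool:
        return self.occupancy[vc] < self.per_vc

    def accept(self, vc: int) -> None:
        assert self.can_accept(vc)
        self.occupancy[vc] += 1


class SharedBuffer:
    """Hypothetical model: all VCs draw slots from one pool on demand."""

    def __init__(self, total_slots: int, num_vcs: int) -> None:
        self.total = total_slots
        self.occupancy = [0] * num_vcs

    def can_accept(self, vc: int) -> bool:
        # Any VC may take a slot as long as the pool as a whole has room.
        return sum(self.occupancy) < self.total

    def accept(self, vc: int) -> None:
        assert self.can_accept(vc)
        self.occupancy[vc] += 1


def fill(buf, vc: int) -> int:
    """Count how many flits a single active VC can buffer before it stalls."""
    n = 0
    while buf.can_accept(vc):
        buf.accept(vc)
        n += 1
    return n


print(fill(StaticBuffer(16, 4), vc=1))  # 4
print(fill(SharedBuffer(16, 4), vc=1))  # 16
```

The shared pool's flexibility is also what enables the monopolization problem discussed next: nothing stops one VC from taking every slot.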

Sharing
Pros: economical, efficient.
Cons: inconvenient, troublesome.

Border Example
[Figure: highways HWY5 and HWY805 at the Mexico/US border]

Buffer Monopolization
Blocked flits from a congested VC accumulate in the buffer.
The effective buffer size is reduced for the other VCs.
Result: performance degradation (latency/throughput).
Congestion spreads across VCs (flows/apps/VMs/…).
[Figure: VC 0 and VC 1 sharing an input buffer]
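Here is a toy cycle-level model of monopolization. It assumes one shared 16-slot buffer and an upstream link that sends one flit per cycle, alternating between VC 0 and VC 1. VC 0 drains one flit per cycle, while VC 1 is blocked downstream and never drains. The function simulate and its optional vc1_cap parameter are hypothetical. The cap is only a crude stand-in for limiting VC 1's share, not the ABP mechanism itself.

```python
from typing import Optional


def simulate(total_slots: int = 16, cycles: int = 40,
             vc1_cap: Optional[int] = None) -> tuple[int, int]:
    """Return (VC 1 occupancy, number of VC 0 flits refused) after `cycles`."""
    occupancy = [0, 0]
    vc0_stalls = 0
    for cycle in range(cycles):
        # Departures: VC 0 drains one flit per cycle; VC 1 is blocked downstream.
        if occupancy[0] > 0:
            occupancy[0] -= 1
        # Arrivals: the upstream link alternates between the two VCs.
        vc = cycle % 2
        buffer_full = sum(occupancy) >= total_slots
        over_cap = vc == 1 and vc1_cap is not None and occupancy[1] >= vc1_cap
        if buffer_full or over_cap:
            if vc == 0:
                vc0_stalls += 1  # VC 0 is refused even though it drains freely
        else:
            occupancy[vc] += 1
    return occupancy[1], vc0_stalls


print(simulate())           # (16, 4): VC 1 fills the buffer and VC 0 starts stalling
print(simulate(vc1_cap=4))  # (4, 0): VC 0 is unaffected
```

Even though VC 0's flits would leave immediately, it is refused once VC 1's blocked flits have claimed every slot. That is the reduced effective buffer size described on the slide.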

Adaptive Backpressure
Goal: avoid unproductive use of buffer space in dynamic buffer management, but allow sharing when it is beneficial.
Approach:
– Match the arrival and departure rates for each VC by regulating credit availability (backpressure).
– Derive quotas from credit round-trip times.

Buffer Monopolization
[Figure: VC 0 and VC 1 sharing an input buffer]
We want a way to regulate the otherwise unlimited supply of credits to the congested VC 1:
– Give VC 0 more credits and buffer space.

Quota Motivation (1)
Without congestion, full throughput requires $T_{crt,0}$ credits, where $T_{crt,0}$ is the credit round-trip time.
[Figure: Router 0/Router 1 timelines showing a credit stall and the resulting idle cycle]
Credit stall: an insufficient credit supply causes an idle cycle downstream.
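A small sketch of the credit loop helps explain why full throughput needs $T_{crt,0}$ credits. It assumes an upstream router that sends at most one flit per cycle whenever it holds a credit, and that gets each credit back exactly t_crt cycles after the flit left (no downstream congestion). The function credit_loop_throughput is hypothetical.

```python
from collections import deque


def credit_loop_throughput(credits: int, t_crt: int, cycles: int = 10_000) -> float:
    """Fraction of cycles in which the upstream router sends a flit."""
    available = credits
    returns = deque()  # cycle at which each outstanding credit comes back (ascending)
    sent = 0
    for cycle in range(cycles):
        # Credits whose round trip has completed become usable again.
        while returns and returns[0] <= cycle:
            returns.popleft()
            available += 1
        if available > 0:
            available -= 1
            returns.append(cycle + t_crt)
            sent += 1
        # Otherwise this is a credit stall: the link idles this cycle.
    return sent / cycles


print(credit_loop_throughput(credits=4, t_crt=4))  # 1.0
print(credit_loop_throughput(credits=2, t_crt=4))  # 0.5
```

With fewer credits than the round-trip time, throughput falls to credits / t_crt. With t_crt or more, the link stays busy every cycle.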

Quota Motivation (2)
[Figure: Router 0/Router 1 timelines showing congestion stalls, queuing stalls, and credit stalls; labels include $T_{crt,0} + T_{stall}$, "excess flits", and "excess drained"]
A congestion stall causes unproductive buffer occupancy (excess flits).
Matching credit stalls to congestion stalls avoids unproductive buffer occupancy.

Quota Algorithm
A VC's quota value = throughput × RTT_min.
– The throughput of the upstream router is hard to measure.
→ Compute quota values from the observed RTT of individual credits.
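One way to read the quota formula is as Little's law applied to the credit loop: it gives the number of credits that keeps a VC flowing at a given rate. Below is a worked example with illustrative numbers (not from the talk). Here $\lambda$ stands for the VC's sustainable throughput in flits per cycle, a symbol introduced only for this example.

```latex
Q = \lambda \cdot \mathrm{RTT}_{min}, \qquad
\lambda = 0.5~\text{flits/cycle},\ \mathrm{RTT}_{min} = 4~\text{cycles}
\;\Rightarrow\; Q = 0.5 \times 4 = 2~\text{credits}
```

Granting this VC more than 2 credits would not raise its throughput; the extra flits would only sit in the shared buffer.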

Quota Heuristic
Track the credit RTT for each output VC.
If RTT = RTT_min, set the quota to RTT_min:
– No downstream congestion.
– Allow one flit in each cycle of the RTT interval.
If RTT > RTT_min, subtract the difference from RTT_min:
– Each congestion and queuing stall adds to the RTT.
– Allow one credit stall per downstream stall.

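Quota Equation
$Q = \max\left(T_{crt,base} - (T_{crt,obs} - T_{crt,base}),\ 1\right) = \max\left(2\,T_{crt,base} - T_{crt,obs},\ 1\right)$
where $T_{crt,base}$ is the minimum credit round-trip time (RTT_min) and $T_{crt,obs}$ is the observed one.
– When $T_{crt,obs}$ is large, $Q$ is small.
– $Q_{min} = 1$ guarantees that quota values can continue to be updated: with at least one credit allowed, a flit can still be sent and its credit timed.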
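The quota equation translates directly into a one-line function. The name abp_quota is hypothetical, and the cycle counts below are illustrative. Each cycle by which the observed round trip exceeds the base round trip removes one credit, down to the floor of 1.

```python
def abp_quota(t_crt_base: int, t_crt_obs: int) -> int:
    """Q = max(2 * T_crt,base - T_crt,obs, 1)."""
    return max(2 * t_crt_base - t_crt_obs, 1)


for t_obs in (4, 5, 6, 7, 8, 12):
    print(t_obs, abp_quota(4, t_obs))
# 4 4
# 5 3
# 6 2
# 7 1
# 8 1
# 12 1
```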

Implementation
The network design determines RTT_min for each link.
Track the RTT of a single in-flight credit per VC.
Update the quota value when that credit returns.
The switch allocator masks all VCs that exceed their quota.
A simple extension to the existing flow-control logic; no additional signaling is required.
< 5% overhead for a 16×64b buffer with 4 VCs.
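The bookkeeping above can be sketched per output VC as follows. This is a minimal Python sketch, not the authors' hardware. It makes several assumptions:

- Credits for one VC return in FIFO order.
- The quota is compared against the number of flits sent on that VC whose credits have not yet returned.
- The quota starts at RTT_min until a measurement arrives.

OutputVCQuota and allocator_mask are hypothetical names. The normal credit check on free buffer slots would still apply separately.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OutputVCQuota:
    """Hypothetical per-output-VC ABP state kept by the upstream router."""

    rtt_min: int                     # minimum credit RTT of the link, fixed at design time
    quota: int = field(init=False)
    outstanding: int = 0             # flits sent whose credits have not come back yet
    sent: int = 0                    # sequence number of the next flit to send
    returned: int = 0                # number of credits received so far
    probe_seq: Optional[int] = None  # flit whose credit is currently being timed
    probe_start: int = 0             # cycle at which that flit was sent

    def __post_init__(self) -> None:
        self.quota = self.rtt_min    # assumption: treat the link as uncongested at first

    def on_flit_sent(self, now: int) -> None:
        if self.probe_seq is None:   # time only one in-flight credit per VC
            self.probe_seq = self.sent
            self.probe_start = now
        self.sent += 1
        self.outstanding += 1

    def on_credit_returned(self, now: int) -> None:
        # Within one VC, flits leave the downstream buffer in order, so the
        # k-th returned credit belongs to the k-th flit sent.
        seq = self.returned
        self.returned += 1
        self.outstanding -= 1
        if seq == self.probe_seq:
            rtt_obs = now - self.probe_start
            self.quota = max(2 * self.rtt_min - rtt_obs, 1)
            self.probe_seq = None

    def within_quota(self) -> bool:
        return self.outstanding < self.quota


def allocator_mask(vcs: List[OutputVCQuota]) -> List[bool]:
    """True for each VC the switch allocator may still consider this cycle."""
    return [vc.within_quota() for vc in vcs]


vc = OutputVCQuota(rtt_min=4)
vc.on_flit_sent(now=0)
vc.on_credit_returned(now=7)  # 3 cycles of downstream stall
print(vc.quota)               # 1
vc.on_flit_sent(now=8)
print(allocator_mask([vc]))   # [False]
```

Timing only one credit at a time keeps the state to a sequence number and a timestamp per VC, in line with the slide's claim of a simple extension to existing flow-control logic.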

Evaluation Methodology
BookSim 2.0.
8×8 2D mesh, 64-bit channels, dimension-order routing (DOR).
16-slot input buffers, 4 VCs.
Combined VC and switch allocation.
Synthetic traffic and application benchmarks.
Compare ABP to unrestricted sharing.

Network Stability (1)
For adversarial traffic, throughput in a mesh is unstable at high load:
– Traffic merging causes starvation.
– Tree saturation causes widespread congestion.
ABP improves stability:
– Throttles sources that inject at a very high rate.
– Efficient buffer use reduces tree saturation.
– Faster recovery from transient congestion.

Network Stability (2)
[Figure: tornado traffic; annotation: 6.3×]

Network Stability (3)
[Figure: foreground traffic at 50% injection rate; annotations: 3.3×, -13% saturation rate]

Performance Isolation (1)
Inject two classes of traffic into the network:
– Shared buffer space, separate VCs.
Sharing causes interference between the classes, leading to latency problems.
ABP reduces interference:
– Contains the effects of congestion within a class.
– Better isolation between workloads, VMs, …

Performance Isolation (2)
[Figure: uniform random foreground traffic, with hotspot background traffic and with uniform random background traffic; annotations: -33%, -38%]

Performance Isolation (3)
[Figure: 50% uniform random background traffic; annotations: -31%, "w/o background"]

Application Performance
[Figure: 12.5% injection rate for streaming traffic; annotations: -31%, "w/o background"]

Conclusions
Sharing improves buffer utilization but can lead to undesired interference effects.
Adaptive Backpressure regulates credit flow to avoid unproductive use of shared buffer space.
It mitigates performance degradation in the presence of adversarial traffic,
but maintains the key benefits of buffer sharing under benign conditions.

THE END
Thank you for your attention! Questions?