FlexiBuffer: Reducing Leakage Power in On-Chip Network Routers

FlexiBuffer: Reducing Leakage Power in On-Chip Network Routers
Gwangsun Kim, John Kim, Dept. of Computer Science, Korea Advanced Institute of Science and Technology
Sungjoo Yoo, Dept. of Electronic and Electrical Engineering, Pohang University of Science and Technology

Motivation
On-chip networks are becoming more critical. Buffer size has a huge impact on performance, and buffers take a large portion of router power. However, not all buffer entries are fully utilized, even at a high load.
Use power-gating and turn off unused entries! [Kumar et al., ICCD'07]
[Figure: router power breakdown across input buffer, crossbar switch, clock, and allocator. The allocator slice is labeled 3%; the other slice labels are 46%, 16%, and 35%. A second plot has the axis label "Buffer size".]

Our Approach
Dynamically adjust the active window size.
Active window: the set of ON (or active) entries of a buffer.
[Figure: the active window (ON entries) and the OFF entries of a buffer, shown at a low traffic load and at a high traffic load.]

Issue 1: Flow Control
Routers need to communicate the availability of buffers.
Case 1: Increase the active window size using early credit.
When? There is an incoming flit. There is an OFF buffer entry. There is congestion in both the upstream and the local router.
[Figure: Router 0, Router 1, and Router 2 exchanging flits and credits; the buffer has ON and OFF entries, and the credit counter (CR) is shown with the values 2 and 1.]

Issue 1: Flow Control (cont'd)
Case 2: Decrease the active window size by withholding credit.
When? There is an outgoing flit. There are more than the minimum number of ON entries.
[Figure: Router 0, Router 1, and Router 2 exchanging flits and credits; the credit counter (CR) is shown with the value 2.]
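The two flow-control cases can be sketched as a small model of one buffer's active window. This is only an illustration under stated assumptions, not the authors' implementation: the class name `ActiveWindow`, the fields `depth`, `min_on`, and `on`, and both method names are hypothetical, and the sketch assumes the conditions listed under each "When?" are jointly sufficient to trigger the resize.

```python
from dataclasses import dataclass


@dataclass
class ActiveWindow:
    """Hypothetical model of one VC buffer's active window."""
    depth: int = 8    # total buffer entries (matches the evaluated VC buffer depth)
    min_on: int = 1   # assumed minimum number of ON entries
    on: int = 1       # current active window size (number of ON entries)

    def on_flit_arrival(self, upstream_congested: bool, local_congested: bool) -> bool:
        """Case 1: return True if an early credit is sent upstream.

        Assumption: the window grows by one entry whenever an OFF entry
        exists and both the upstream and the local router are congested.
        """
        has_off_entry = self.on < self.depth
        if has_off_entry and upstream_congested and local_congested:
            self.on += 1   # turn one OFF entry ON
            return True    # the early credit advertises the new entry
        return False

    def on_flit_departure(self) -> bool:
        """Case 2: return True if the usual credit is returned, False if withheld.

        Assumption: the window shrinks by one entry whenever it is above
        the minimum size.
        """
        if self.on > self.min_on:
            self.on -= 1   # turn the freed entry OFF
            return False   # withheld credit hides that entry from upstream
        return True
```

In this model an early credit raises the number of entries the upstream router believes it may use, and a withheld credit lowers it, so the upstream credit count always tracks the active window rather than the full buffer depth.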

Issue 2: Circular Queue Problem
When utilization is low, each incoming flit turns on an entry.
→ Each activation of an entry incurs power overhead!
Problematic circular buffer: each flit activates an entry. Large power overhead.
Ideal buffer management: the same entry is reused. No power overhead.
[Figure: FLIT 0 through FLIT 4 arriving one at a time. In the circular buffer each flit lands in a different entry, which is switched from OFF to ON; with ideal management the same ON entry holds every flit.]
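The difference between the two cases can be made concrete by counting OFF-to-ON transitions. The sketch below is a toy model, not the authors' simulator: the function name `count_activations` and its parameters are hypothetical, and it assumes that exactly one entry starts ON, that each flit is read out before the next one arrives, and that a drained entry is gated OFF immediately when the write pointer moves on.

```python
def count_activations(num_flits: int, depth: int, reuse_entry: bool) -> int:
    """Count OFF-to-ON entry transitions under low utilization (toy model)."""
    on = [False] * depth
    on[0] = True          # assumption: one entry starts ON
    write_ptr = 0
    activations = 0
    for _ in range(num_flits):
        if not on[write_ptr]:
            on[write_ptr] = True   # entry must be powered up before the write
            activations += 1
        # The flit is written and then read out before the next one arrives.
        if not reuse_entry:
            # Circular queue: the drained entry is gated OFF and the write
            # pointer advances, so the next flit lands in an OFF entry.
            on[write_ptr] = False
            write_ptr = (write_ptr + 1) % depth
    return activations
```

With five flits and a depth of 8, the circular policy powers up a new entry for every flit after the first, while the reuse policy never powers up anything.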

Split Queue
A buffer is separated into two regions, primary and secondary.
Use the primary region only (as long as possible); this is unified mode.
Adjust the active window size dynamically.
Operate like a circular queue.
[Figure: unified mode, with FLIT 0, FLIT 1, and FLIT 2 in ON entries of the primary region and the secondary region OFF and not used.]

Split Queue (cont'd)
Cannot stay in the unified mode indefinitely. Switch to split mode.
When the primary region is empty, switch back to unified mode.
[Figure: primary and secondary regions holding FLIT 1 through FLIT 5, shown in split mode and again in unified mode. Annotations: "Next flit's place is NOT available.", "Yet, there are unused entries.", "Flits are read out from here.", "Flits are written to here.", "Primary region is empty!"]
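The mode switching described on this slide can be sketched as a two-state machine. This is a minimal illustration that covers only the transitions the slide states; it does not model where flits are stored or how pointers move. The names `Mode`, `next_mode`, `next_slot_free`, `unused_entries_exist`, and `primary_empty` are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    UNIFIED = "unified"
    SPLIT = "split"


def next_mode(mode: Mode, next_slot_free: bool,
              unused_entries_exist: bool, primary_empty: bool) -> Mode:
    """Return the buffer mode after evaluating the slide's two conditions."""
    # Unified -> split: the next flit's place is not available,
    # yet there are unused entries elsewhere in the buffer.
    if mode is Mode.UNIFIED and not next_slot_free and unused_entries_exist:
        return Mode.SPLIT
    # Split -> unified: the primary region has drained completely.
    if mode is Mode.SPLIT and primary_empty:
        return Mode.UNIFIED
    return mode
```

For example, `next_mode(Mode.UNIFIED, False, True, False)` yields `Mode.SPLIT`, and the buffer stays in split mode until `primary_empty` becomes true.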

Summary of Evaluation
Simulator: cycle-accurate OCN simulator (BookSim)
Power measurement: Orion 2.0
Parameter          Value
Topology           8x8 2D mesh
# of VCs           4
VC buffer depth    8
Technology node    32nm
Clock frequency    1.5GHz
Vdd                1.0 V
[Figure: performance and power consumption results; the power consumption plot is annotated with 13% and 39%.]

Conclusions
There is a huge opportunity for power saving with fine-grained power gating when buffers are large.
We proposed a modified credit-based flow control.
The split queue is proposed to minimize activation power overhead.
Our simulation results show that, with minimal performance loss, FlexiBuffer + SQ can save 39% of router power at low traffic load and 13% of router power at high traffic load.

Thank you! Questions?
For more discussion, please come to my poster!