Rahul Boyapati, Jiayi Huang

Fly-Over: A Light-Weight Distributed Power-Gating Mechanism for Energy-Efficient Networks-on-Chip
Rahul Boyapati*, Jiayi Huang*, Ningyuan Wang, Kyung Hoon Kim, Ki Hwan Yum, Eun Jung Kim

Motivation
The static-power share of NoC power increases as technology scales down. Static power saving is therefore crucial for power-efficient NoC design, and power-gating is one way to save static power.

Router Power-Gating
Router power-gating schemes fall into two categories, depending on what drives the gating decision:
- Power state of the attached core: the OS power-gates cores based on workloads, and the router uses this information to power-gate the attached router.
- Network traffic status: gating is independent of the attached core's power state, but inaccurate traffic detection leads to frequent on/off transitions.

Fly-Over belongs to the first category: it power-gates a router according to the power state of its attached core, which the OS sets based on workloads.

Challenges and Prior Work
Challenges:
- Packet detours can degrade performance.
- Power-gating routers can disconnect the network.
- Network (re)configuration incurs overhead.
Prior work: Router Parking (Samih et al., HPCA'13)
- Powers on more routers to keep the network connected.
- Packets take more detours around powered-off routers.
- Centralized control incurs high reconfiguration overhead.

Fly-Over (FLOV)

Key Idea
Inspired by fly-over transportation networks (image source: http://cartoonisawadhesh.blogspot.com/2010/06/?m=0), packets can fly over powered-off routers without detours.

[Figure: two paths from source S to destination D, one taking a detour and one using Fly-Over (FLOV).]

Fly-Over Implementation
- FLOV router microarchitecture: handshake controller, power state registers, credit control logic
- Handshake protocol
- Dynamic FLOV routing algorithm

FLOV Router Microarchitecture
[Figure: baseline router with input and output ports N, E, S, W, extended with a handshake controller, power state registers (PSRs), credit control logic, and a FLOV latch.]
- Handshake controller: handshakes with neighbors for power state transitions.
- Power State Registers (PSRs): keep the power states of physical and logical neighbors.
- Credit control logic: augmented to relay credits while the router core is power-gated.
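The PSRs track both physical and logical neighbors. As an illustration only, assuming a router's "logical neighbor" in a direction is the nearest active router reached by flying over gated ones (our reading, not a definition given on the slides), the lookup along one mesh row can be sketched as below. The function name `logical_neighbors` is hypothetical.

```python
from typing import List, Optional


def logical_neighbors(row_active: List[bool], step: int) -> List[Optional[int]]:
    """For each router in one mesh row, return the index of the nearest
    active router in direction `step` (+1 = east, -1 = west), skipping
    power-gated routers, or None if there is no active router that way.

    Assumption (not from the slides): the logical neighbor is the first
    active router reached by flying over gated ones.
    """
    n = len(row_active)
    result: List[Optional[int]] = []
    for i in range(n):
        j = i + step
        # Skip over power-gated routers; packets fly over them.
        while 0 <= j < n and not row_active[j]:
            j += step
        result.append(j if 0 <= j < n else None)
    return result


# Routers 1 and 2 are power-gated in a 4-router row.
row = [True, False, False, True]
print(logical_neighbors(row, +1))  # [3, 3, 3, None]
print(logical_neighbors(row, -1))  # [None, 0, 0, 0]
```

Under this reading, router 0's east-port credit state would refer to router 3's input buffer while the gated routers 1 and 2 relay credits; that is a plausible interpretation of the credit control logic, not a confirmed detail.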

FLOV Handshake Protocols
Distributed router power transitions must be coordinated through handshakes.
Restricted FLOV (rFLOV): no two consecutive routers in a row or column can be power-gated. Control is simpler, but power savings are limited.
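The rFLOV rule is a simple adjacency constraint on the mesh. A minimal sketch of a checker follows; the function name and the `gated[y][x]` grid encoding are hypothetical, and it checks only the adjacency rule (not, for example, which columns must stay active).

```python
from typing import List


def satisfies_rflov(gated: List[List[bool]]) -> bool:
    """Return True if no two adjacent routers in the same row or column
    are both power-gated (the rFLOV constraint).

    gated[y][x] is True when the router at column x, row y is power-gated.
    """
    rows, cols = len(gated), len(gated[0])
    for y in range(rows):
        for x in range(cols):
            if not gated[y][x]:
                continue
            # Checking only the east and south neighbors covers every
            # adjacent pair exactly once.
            if x + 1 < cols and gated[y][x + 1]:
                return False
            if y + 1 < rows and gated[y + 1][x]:
                return False
    return True


allowed = [[False, True, False],
           [True, False, False],
           [False, True, False]]
print(satisfies_rflov(allowed))               # True
print(satisfies_rflov([[True, True, False]]))  # False
```

gFLOV drops this constraint, which is why its handshake protocol has to handle chains of consecutive gated routers.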

FLOV Handshake Protocols
Generalized FLOV (gFLOV): consecutive routers can be power-gated. The protocol is more complex, but power savings are more aggressive.

FLOV Handshake Protocols
Each router moves through four power states: Active, Draining, Sleep, and Wakeup.
- Power-gating: Active → Draining (finish ongoing transmissions) → Sleep.
- Power-on: Sleep → Wakeup (finish ongoing transmissions) → Active.
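The four-state cycle can be sketched as a small state machine. This is a simplification under stated assumptions: `gate_request` and `wake_request` are hypothetical trigger signals (for example, the attached core being turned off or on), and the handshakes with neighbors that actually govern each transition are not modeled.

```python
from enum import Enum


class PowerState(Enum):
    ACTIVE = "Active"
    DRAINING = "Draining"
    SLEEP = "Sleep"
    WAKEUP = "Wakeup"


def next_state(state: PowerState, gate_request: bool, wake_request: bool,
               in_flight_flits: int) -> PowerState:
    """One step of a per-router power state machine.

    gate_request / wake_request are hypothetical triggers; neighbor
    handshakes and the wakeup latency are not modeled.
    """
    if state is PowerState.ACTIVE:
        return PowerState.DRAINING if gate_request else state
    if state is PowerState.DRAINING:
        # Stay until ongoing transmissions through this router have finished.
        return PowerState.SLEEP if in_flight_flits == 0 else state
    if state is PowerState.SLEEP:
        return PowerState.WAKEUP if wake_request else state
    # WAKEUP: finish ongoing transmissions before resuming normal operation.
    return PowerState.ACTIVE if in_flight_flits == 0 else state


s = next_state(PowerState.ACTIVE, gate_request=True, wake_request=False, in_flight_flits=2)
s = next_state(s, gate_request=False, wake_request=False, in_flight_flits=2)
print(s)  # PowerState.DRAINING
s = next_state(s, gate_request=False, wake_request=False, in_flight_flits=0)
print(s)  # PowerState.SLEEP
```

The two transient states exist so that no packet is caught mid-transfer when the router switches between its pipeline and the fly-over path.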

FLOV Routing Algorithm
In the FLOV architecture, the routers in the right-most column are ALWAYS active; they maintain network connectivity.

FLOV Routing Algorithm
[Figure: (a) destination partitioning; (b) routing example.]
FLOV uses a dynamic routing algorithm based on YX routing that performs best-effort minimal routing.
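To make the fly-over idea concrete, the sketch below computes the routers visited by plain YX dimension-order routing on a mesh and lists the power-gated routers on that path. It also flags whether the router where the packet turns from Y to X is active. We assume (our assumption, not a detail from the slides) that a packet can pass straight through a gated router but cannot turn inside one; that is where a dynamic fallback, such as using the always-active right-most column, would be needed. The fallback and the destination partitioning are not modeled, and all names are hypothetical.

```python
from typing import List, Set, Tuple

Coord = Tuple[int, int]  # (x, y) position of a router in the mesh


def yx_path(src: Coord, dst: Coord) -> List[Coord]:
    """Routers visited by YX dimension-order routing: along Y first, then X."""
    x, y = src
    path = [(x, y)]
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    return path


def gated_on_path(src: Coord, dst: Coord,
                  gated: Set[Coord]) -> Tuple[List[Coord], bool]:
    """Return (gated intermediate routers on the YX path, whether the turn is possible).

    Assumption: the turn router (src x, dst y) must be active.
    """
    path = yx_path(src, dst)
    flown_over = [r for r in path[1:-1] if r in gated]
    needs_turn = src[0] != dst[0] and src[1] != dst[1]
    turn_ok = not needs_turn or (src[0], dst[1]) not in gated
    return flown_over, turn_ok


print(gated_on_path((0, 0), (2, 2), {(0, 1), (1, 2)}))  # ([(0, 1), (1, 2)], True)
print(gated_on_path((0, 0), (2, 2), {(0, 2)}))          # ([(0, 2)], False)
```

In the second call the plain YX path is unusable under our assumption, which illustrates why a purely static YX route is not enough once routers can be gated.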

Evaluation

Experimental Setup
Architecture:
- 2 GHz Alpha cores
- 32 KB L1 I/D caches, 8 MB L2 cache
- MESI coherence, 4 memory controllers (MCs) at the 4 corners
Tools:
- gem5 + BookSim2
- DSENT power model
Evaluated schemes:
- No Power-Gating (Baseline)
- Router Parking (RP)
- Restricted FLOV (rFLOV)
- Generalized FLOV (gFLOV)
NoC:
- 8x8 mesh
- Default Y-X routing
- 3-stage pipeline router
- 3 regular virtual channels (VCs) and 1 escape VC
- 6-flit input buffer depth
- 4-flit packets for synthetic traffic
- 1 mm links, 1 cycle, 16-byte width
Power parameters:
- 32 nm technology node
- 17.7 pJ power-gating overhead
- 10-cycle wakeup latency
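The 17.7 pJ power-gating overhead implies a break-even time: a router must stay gated long enough for the static energy it saves to cover that overhead. A minimal sketch of the arithmetic follows. It assumes the router runs at the same 2 GHz as the cores, and the 5 mW per-router static power is a purely hypothetical value, not a number from the slides. The 10-cycle wakeup latency, which is a performance cost rather than an energy cost, is not included.

```python
import math


def break_even_cycles(overhead_pj: float, static_power_mw: float,
                      freq_ghz: float) -> int:
    """Minimum number of gated cycles for static-energy savings to cover
    the one-time power-gating energy overhead."""
    # mW / GHz = (1e-3 J/s) / (1e9 1/s) = 1e-12 J, i.e. pJ saved per cycle.
    saved_per_cycle_pj = static_power_mw / freq_ghz
    return math.ceil(overhead_pj / saved_per_cycle_pj)


# 17.7 pJ overhead from the setup; 5 mW static power is hypothetical.
print(break_even_cycles(17.7, 5.0, 2.0))  # 8
```

With these numbers each gated cycle saves 2.5 pJ, so gating pays off only if the router stays off for at least 8 cycles.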

Static Power for Synthetic Workload
Uniform Random traffic (0.08 flits/node/cycle):
- gFLOV power-gates more routers.
- RP keeps more routers on to maintain network connectivity.
- rFLOV's power saving is limited.

Dynamic Power for Synthetic Workload
Uniform Random traffic (0.08 flits/node/cycle):
- FLOV even consumes less dynamic power, because packets fly over the pipelines of power-gated routers.
- RP consumes the most dynamic power due to detours.

Packet Latency for Synthetic Workload
Uniform Random traffic (0.08 flits/node/cycle):
FLOV's latency is close to Baseline's, thanks to best-effort minimal routing and fast FLOV links.

Energy for PARSEC 2.1
NoC energy consumption, normalized to Baseline:
- FLOV reduces static energy by 43% compared to Baseline and by 22% compared to RP.
- FLOV reduces total energy by 36% compared to Baseline and by 18% compared to RP.
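The two static-energy figures together also locate RP relative to Baseline. This is a worked example under an assumption: both percentages are taken relative to the same aggregate energy. If they are per-benchmark averages, the relation holds only approximately. With $E$ denoting static energy:

```latex
E_{\mathrm{FLOV}} = (1 - 0.43)\,E_{\mathrm{Base}} = (1 - 0.22)\,E_{\mathrm{RP}}
\;\Longrightarrow\;
E_{\mathrm{RP}} = \frac{0.57}{0.78}\,E_{\mathrm{Base}} \approx 0.73\,E_{\mathrm{Base}}
```

The same reasoning for total energy gives $0.64 / 0.82 \approx 0.78$. Under this reading, RP itself saves roughly 27% static and 22% total energy over Baseline.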

Performance for PARSEC 2.1
Full-system application runtime, normalized to Baseline: FLOV degrades performance by less than 1%.

Network Reconfiguration Overhead
[Figure: two marked points where reconfiguration starts.]
- FLOV power-gating is light-weight in terms of latency.
- RP's centralized power-gating control incurs more than 700 cycles of reconfiguration overhead, leading to high average packet latency.

Summary
We proposed Fly-Over (FLOV), a power-gating mechanism that:
- is distributed,
- ensures seamless NoC functionality, and
- achieves a good performance-power tradeoff.
FLOV comprises a router microarchitecture enhancement, handshake protocols, and a dynamic routing algorithm.
FLOV saves 22% static energy compared to the state of the art, with less than 1% performance degradation.

Fly-Over: Thank you

Backup

Total Power for Synthetic Workload
Uniform Random traffic (0.08 flits/node/cycle):
- FLOV power-gates aggressively.
- Its net gain comes from both dynamic and static power savings.

Packet Latency Decomposition
Uniform Random traffic (0.08 flits/node/cycle):
- FLOV latency: time spent on fast FLOV links through power-gated routers.
- Accumulated router latency: router pipeline latency in active routers.
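The decomposition can be illustrated with a zero-load latency estimate along a single path, using the setup's 3-stage router pipeline and 1-cycle links. The 1-cycle delay through a gated router's FLOV latch is a hypothetical value, and contention and serialization are ignored. This sketches the bookkeeping only; it is not the paper's latency model.

```python
from typing import List, Tuple

ROUTER_PIPELINE_CYCLES = 3  # 3-stage router pipeline (experimental setup)
LINK_CYCLES = 1             # 1-cycle, 1 mm links (experimental setup)
FLOV_LATCH_CYCLES = 1       # hypothetical delay through a gated router's FLOV latch


def zero_load_latency(router_active: List[bool]) -> Tuple[int, int, int]:
    """Split head-flit zero-load latency along a path into
    (accumulated router latency, FLOV latency, link latency).

    router_active lists every router on the path in order, source and
    destination included; True means the router is active (not gated).
    """
    n_active = sum(router_active)
    n_gated = len(router_active) - n_active
    router_latency = n_active * ROUTER_PIPELINE_CYCLES
    flov_latency = n_gated * FLOV_LATCH_CYCLES
    link_latency = (len(router_active) - 1) * LINK_CYCLES
    return router_latency, flov_latency, link_latency


# Source and destination active, two gated routers in between.
print(zero_load_latency([True, False, False, True]))  # (6, 2, 3)
```

Under these assumptions, every router that is flown over contributes 1 cycle instead of a full 3-cycle pipeline traversal. This is the intuition behind FLOV's latency staying close to Baseline's.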

Static Power for Synthetic Workload
Tornado traffic (0.08 flits/node/cycle):
- gFLOV power-gates more routers.
- RP keeps more routers on to maintain network connectivity.
- rFLOV's power saving is limited.

Dynamic Power for Synthetic Workload
Tornado traffic (0.08 flits/node/cycle):
- FLOV even consumes less dynamic power, because packets fly over the pipelines of power-gated routers.
- RP consumes the most dynamic power due to detours.

Packet Latency for Synthetic Workload
Tornado traffic (0.08 flits/node/cycle):
FLOV's latency is close to Baseline's, thanks to best-effort minimal routing and fast FLOV links.

Packet Latency Decomposition
Tornado traffic (0.08 flits/node/cycle):
- FLOV latency: time spent on fast FLOV links through power-gated routers.
- Accumulated router latency: router pipeline latency in active routers.