Fly-Over: A Light-Weight Distributed Power-Gating Mechanism for Energy-Efficient Networks-on-Chip Rahul Boyapati*, Jiayi Huang*, Ningyuan Wang, Kyung Hoon Kim, Ki Hwan Yum, Eun Jung Kim
Motivation: The static portion of NoC power increases as technology shrinks. Static power saving is crucial for power-efficient NoC design. Power-gating is one solution to save static power.
Router Power-Gating: Router power-gating schemes fall into two categories. (1) Power state of the attached core (the category Fly-Over belongs to): the OS power-gates cores based on workloads, and this information is used to power-gate the attached routers. (2) Network traffic status: independent of the attached core’s power state, but inaccurate traffic detection leads to frequent on/off transitions.
Challenges and Prior Work: The challenges are packet detours that can degrade performance, network disconnection, and network (re)configuration overhead. Prior work, Router Parking (Samih et al., HPCA’13), powers on more routers to preserve network connectivity, causes more detours around off routers, and relies on centralized control that incurs high reconfiguration overhead.
Fly-Over (FLOV)
Key Idea: Inspired by the fly-over transportation network. Source: http://cartoonisawadhesh.blogspot.com/2010/06/?m=0 Packets can fly over the off routers without detour.
Key Idea: [Figure: path from source S to destination D under Detour versus Fly-Over (FLOV).]
Fly-Over Implementation: FLOV router microarchitecture (handshake controller, power state registers, credit control logic); handshake protocol; dynamic FLOV routing algorithm.
FLOV Router Microarchitecture: [Figure: baseline router extended with a handshake controller, credit control logic, FLOV latch, and PSRs; input and output ports N, E, S, W.] Handshake Controller: handshakes with neighbors for power state transitions. Power State Registers (PSRs): keep the power states of physical/logical neighbors. Credit Control Logic: augmented to relay credits while the router core is gated.
FLOV Handshake Protocols: Handshaking is needed to facilitate distributed router power transitions. Restricted FLOV (rFLOV): no consecutive routers in a row/column can be power-gated. Simpler control, but limited power savings.
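The rFLOV restriction can be illustrated with a small check. This is only a sketch under stated assumptions: the names `PowerState` and `can_gate_rflov` are hypothetical, the mesh state is held in one global dictionary for simplicity (the actual mechanism is distributed and uses handshakes), and the always-active right-most column is not modeled.

```python
from enum import Enum


class PowerState(Enum):
    ACTIVE = "active"
    GATED = "gated"


def can_gate_rflov(states, x, y):
    """Return True if router (x, y) may be gated under the rFLOV rule.

    `states` maps (x, y) -> PowerState for every router in the mesh
    (a hypothetical global view, used here only for illustration).
    Rule from the slide: no two consecutive routers in a row or column
    may be gated, so every existing N/E/S/W neighbor must be active.
    """
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        neighbor = (x + dx, y + dy)
        if neighbor in states and states[neighbor] is PowerState.GATED:
            return False
    return True


# 4x4 example mesh with a single gated router at (1, 1).
mesh = {(x, y): PowerState.ACTIVE for x in range(4) for y in range(4)}
mesh[(1, 1)] = PowerState.GATED

print(can_gate_rflov(mesh, 2, 1))  # row neighbor of (1, 1)
print(can_gate_rflov(mesh, 2, 2))  # diagonal to (1, 1), not restricted
```

In a distributed setting each router would consult only its own PSRs for the neighbor states rather than a shared table.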
FLOV Handshake Protocols: Handshaking is needed to facilitate distributed router power transitions. Generalized FLOV (gFLOV): consecutive routers can be power-gated. More complex protocol, but aggressive power savings.
FLOV Handshake Protocols: Router power states are Active, Draining, Sleep, and Wakeup. Power-gating: Active -> Draining (finish intermittent transmission) -> Sleep. Power-on: Sleep -> Wakeup (finish intermittent transmission) -> Active.
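The two transition sequences can be written as a small state machine. This is a minimal sketch, not the authors' controller: the event names (`gate_request`, `wake_request`, `transmissions_done`) are hypothetical, and the handshake messages exchanged with neighbors are not modeled.

```python
from enum import Enum, auto


class State(Enum):
    ACTIVE = auto()
    DRAINING = auto()
    SLEEP = auto()
    WAKEUP = auto()


# Allowed transitions, following the two sequences on the slide.
# Event names are assumptions made for this illustration.
TRANSITIONS = {
    (State.ACTIVE, "gate_request"): State.DRAINING,
    (State.DRAINING, "transmissions_done"): State.SLEEP,
    (State.SLEEP, "wake_request"): State.WAKEUP,
    (State.WAKEUP, "transmissions_done"): State.ACTIVE,
}


def step(state, event):
    """Return the next state; events that do not apply leave it unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(state, events):
    """Apply a sequence of events and return the final state."""
    for event in events:
        state = step(state, event)
    return state


print(run(State.ACTIVE, ["gate_request", "transmissions_done"]))
print(run(State.SLEEP, ["wake_request", "transmissions_done"]))
```

Keeping the table explicit makes it easy to see that a router cannot jump directly between Active and Sleep without passing through Draining or Wakeup.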
FLOV Routing Algorithm: In the FLOV architecture, routers in the right-most column are ALWAYS active; they maintain network connectivity.
FLOV Routing Algorithm: [Figure: (a) Destination Partitioning. (b) Routing Example.] The dynamic routing algorithm is based on YX routing and performs best-effort minimal routing.
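For reference, plain YX dimension-order routing, the base that the dynamic algorithm builds on, can be sketched as below. This shows only the static baseline: the destination partitioning and the way FLOV adapts to gated routers are not specified on the slide and are not modeled. The function names and the (x, y) coordinate convention are assumptions.

```python
def yx_next_hop(cur, dst):
    """Return the next router on a plain YX dimension-order route.

    Coordinates are (x, y) tuples. The packet first corrects its Y
    coordinate, then its X coordinate; returns None on arrival.
    """
    cx, cy = cur
    dx, dy = dst
    if cy != dy:
        return (cx, cy + (1 if dy > cy else -1))
    if cx != dx:
        return (cx + (1 if dx > cx else -1), cy)
    return None


def yx_route(src, dst):
    """Return the full list of routers visited from src to dst."""
    path = [src]
    while (nxt := yx_next_hop(path[-1], dst)) is not None:
        path.append(nxt)
    return path


print(yx_route((0, 0), (2, 3)))
# [(0, 0), (0, 1), (0, 2), (0, 3), (1, 3), (2, 3)]
```

The route is minimal: its hop count equals the Manhattan distance between source and destination.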
Evaluation
Experimental Setup
Architecture: 2 GHz Alpha cores; 32 KB L1 I/D$; 8 MB L2$; MESI; 4 MCs at the 4 corners.
Tools: Gem5 + Booksim2; DSENT power model.
Evaluated schemes: No Power-Gating (Baseline); Router Parking (RP); Restricted FLOV (rFLOV); Generalized FLOV (gFLOV).
NoC: 8x8 mesh; default Y-X routing; 3-stage pipeline router; 3 regular virtual channels (VCs) and 1 escape VC; 6-flit input buffer depth; 4-flit packets for synthetic traffic; 1 mm links, 1 cycle, 16-byte width.
Power parameters: 32 nm technology node; 17.7 pJ power-gating overhead; 10-cycle wakeup latency.
Static Power for Synthetic Workload: Uniform Random (0.08 flits/node/cycle). gFLOV power-gates more routers. RP keeps more routers on for network connectivity. rFLOV power savings are limited.
Dynamic Power for Synthetic Workload: Uniform Random (0.08 flits/node/cycle). FLOV consumes even less dynamic power because packets fly over router pipelines. RP consumes the highest power due to detours.
Packet Latency for Synthetic Workload: Uniform Random (0.08 flits/node/cycle). FLOV latency is close to Baseline, owing to best-effort minimal routing and fast FLOV links.
Energy for PARSEC 2.1: NoC energy consumption normalized to Baseline. FLOV achieves 43% and 22% static energy reduction compared to Baseline and RP, respectively. FLOV saves 36% and 18% total energy over Baseline and RP, respectively.
Performance for PARSEC 2.1: Application full-system runtime normalized to Baseline. FLOV degrades performance by less than 1%.
Network Reconfiguration Overhead: [Figure: plots annotated with "Reconfiguration starts".] FLOV power-gating is light-weight in terms of latency. RP’s centralized power-gating control has a reconfiguration overhead of more than 700 cycles, leading to high average packet latency.
Summary: Proposed the Fly-Over (FLOV) power-gating mechanism: a distributed mechanism that ensures seamless NoC functionality and achieves a performance-power tradeoff. FLOV comprises a router microarchitecture enhancement, handshake protocols, and a dynamic routing algorithm. FLOV saves 22% static energy compared to the state-of-the-art with less than 1% performance degradation.
Fly-Over: Thank you
Backup
Total Power for Synthetic Workload: Uniform Random (0.08 flits/node/cycle). FLOV power-gates aggressively and achieves a net gain from both dynamic and static power savings.
Packet Latency Decomposition: Uniform Random (0.08 flits/node/cycle). FLOV latency: latency of the fast FLOV links through gated routers. Accumulated router latency: router pipeline latency in active routers.
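The two components named above can be illustrated with a toy calculation. This is a sketch under assumptions, not the measured model: the 3-cycle cost per active router is taken from the 3-stage pipeline in the experimental setup, while the 1-cycle cost for passing a gated router is purely an assumed placeholder. Link, serialization, and contention delays are ignored.

```python
ROUTER_PIPELINE_CYCLES = 3  # 3-stage pipeline router (experimental setup)
FLOV_BYPASS_CYCLES = 1      # ASSUMPTION: cost of flying over one gated router


def decompose_latency(path_active):
    """Split a path's router-related latency into its two components.

    `path_active` is a list of booleans, one per router on the path:
    True if that router is active, False if it is power-gated.
    Returns (accumulated_router_latency, flov_latency) in cycles.
    """
    active = sum(1 for is_active in path_active if is_active)
    gated = len(path_active) - active
    return active * ROUTER_PIPELINE_CYCLES, gated * FLOV_BYPASS_CYCLES


# Path through four routers, the middle two of which are gated.
router_latency, flov_latency = decompose_latency([True, False, False, True])
print(router_latency, flov_latency)  # 6 2
```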
Static Power for Synthetic Workload: Tornado (0.08 flits/node/cycle). gFLOV power-gates more routers. RP keeps more routers on for network connectivity. rFLOV power savings are limited.
Dynamic Power for Synthetic Workload: Tornado (0.08 flits/node/cycle). FLOV consumes even less dynamic power because packets fly over router pipelines. RP consumes the highest power due to detours.
Packet Latency for Synthetic Workload: Tornado (0.08 flits/node/cycle). FLOV latency is close to Baseline, owing to best-effort minimal routing and fast FLOV links.
Packet Latency Decomposition: Tornado (0.08 flits/node/cycle). FLOV latency: latency of the fast FLOV links through gated routers. Accumulated router latency: router pipeline latency in active routers.