MINIMISING DYNAMIC POWER CONSUMPTION IN ON-CHIP NETWORKS Robert Mullins Computer Architecture Group Computer Laboratory University of Cambridge, UK.


1 MINIMISING DYNAMIC POWER CONSUMPTION IN ON-CHIP NETWORKS Robert Mullins Computer Architecture Group Computer Laboratory University of Cambridge, UK

2 2/19 Communication-Centric Architectures
Future performance gains will primarily come from increasing the number of IP cores in a system, not their complexity or operating frequency
Many reasons:
– Diminishing returns from simply scaling what we have
– Energy efficiency
– Complexity
– Fault tolerance
– Economics

3 3/19 On-Chip Networks
An efficient general-purpose chip-wide communication infrastructure is becoming essential
One flexible networking option is to use packet-switched networks with support for virtual channels

4 4/19 The Lochside Router
Router Architecture
– Highly parameterised implementation
– Packet-switched network with virtual-channel flow control
– Best-case latency is one cycle per network hop
Results presented here are from post-P&R simulations targeting a 90nm technology
[Figure: Lochside Chip (2004/05), 180nm technology. Labels: TILE; Traffic Generator, Debug & Test; R]

5 5/19 Exploiting Speculation to Reduce Communication Latency Peh/Dally (2001)

6 6/19 Exploiting Speculation to Reduce Communication Latency

7 7/19 Aims of this work
Apply existing power saving techniques to an on-chip network design
– e.g. clock and signal gating, gate-level optimisations etc.
– Importance of applying such techniques before making comparisons
Measure power consumption and provide an accurate breakdown of where the remaining power is dissipated
Where is the best place to look for future power savings?

8 8/19 Measuring and Optimizing Dynamic Power
Our Test Case
– 8mm x 8mm die
– 4x4 mesh network
– Low-latency routers, best-case latency is one cycle per hop (incl. interconnect)
– 1.2V, 90nm technology
– 4 input buffers per VC
– 4 VCs per input port
– 48 x 80-bit network links
– 800MHz @ WC PVT, ~32 FO4 clock period
– Results reported at 250MHz
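The test-case figures can be cross-checked with a little arithmetic. The sketch below assumes that the 48 links count each direction of a router-to-router connection separately, which is one reading that matches a 4x4 mesh, and it derives the implied FO4 delay from the quoted 800MHz and ~32 FO4. The function names are illustrative and not from the presentation.

```python
def mesh_links(rows: int, cols: int) -> int:
    """Count unidirectional router-to-router links in a rows x cols 2D mesh."""
    horizontal = rows * (cols - 1)   # neighbouring pairs within each row
    vertical = cols * (rows - 1)     # neighbouring pairs within each column
    return 2 * (horizontal + vertical)  # one link per direction (assumption)


def clock_period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1e3 / freq_mhz


links = mesh_links(4, 4)                  # 48 under the assumption above
total_wires = links * 80                  # 80-bit links
period_ns = clock_period_ns(800)          # 1.25 ns at worst-case PVT
fo4_ps = period_ns * 1e3 / 32             # implied FO4 delay, roughly 39 ps

print(links, total_wires, period_ns, round(fo4_ps, 1))
```

Under these assumptions the mesh has 24 neighbouring router pairs, so 48 links, and the network carries 3840 data wires in total.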

9 9/19 Interconnect Delay/Energy Trade-offs
Power dissipated in network links depends on how links are spaced and buffered
At least a factor of 3 difference in energy consumption over the range of potential interconnect options
Could move to low-swing differential schemes for even greater energy savings
For the results we assume minimum-spaced wires and an optimal energy x delay product
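The dependence of link power on spacing and buffering can be illustrated with the standard first-order model for full-swing dynamic power, P = alpha x C x Vdd^2 x f. The sketch below is not the presentation's model. The activity factor, the capacitance per millimetre, and the 2mm link length (taken as 8mm divided by 4 tiles) are placeholder assumptions; only the 1.2V supply, the 250MHz reporting frequency, and the 80-bit width come from the slides.

```python
def link_dynamic_power_w(alpha: float, cap_f_per_mm: float, length_mm: float,
                         vdd: float, freq_hz: float, width_bits: int) -> float:
    """First-order full-swing dynamic power of one link: alpha * C * Vdd^2 * f per wire."""
    c_wire = cap_f_per_mm * length_mm          # switched capacitance of one wire
    return alpha * c_wire * vdd ** 2 * freq_hz * width_bits


# Placeholder values (assumptions, not measured data).
ALPHA = 0.25            # probability of a charging transition per wire per cycle
CAP_LOW = 0.2e-12       # F/mm for a hypothetical low-capacitance wiring option
CAP_HIGH = 3 * CAP_LOW  # a hypothetical option with three times the capacitance

p_low = link_dynamic_power_w(ALPHA, CAP_LOW, 2.0, 1.2, 250e6, 80)
p_high = link_dynamic_power_w(ALPHA, CAP_HIGH, 2.0, 1.2, 250e6, 80)

print(f"low-C link:  {p_low * 1e3:.2f} mW")
print(f"high-C link: {p_high * 1e3:.2f} mW")
```

In this model power scales linearly with switched capacitance, so a threefold spread in effective capacitance across spacing and buffering options would give a threefold spread in link energy. Low-swing signalling is not modelled here.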

10 10/19 Clock Gating
Clock gating optimisations applied at two levels:
– Local Clock Gating
  – Automated clock gating within router
  – Some tuning of RTL involved to maximise opportunities for synthesis tool
– Router Level Clock Gating
  – Exploit opportunities to gate clock as it enters the router
  – Isolates router’s clock completely, only static power consumption remains

11 11/19 Router-Level Clock Gating
Clock gating exposes clock tree insertion delay
Need to know early if router will be required
Generate ‘early valid’ signals in neighbouring routers
– Early-valid signals are slightly pessimistic
– Based on what is requested not granted
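The pessimism of the early-valid signal can be seen in a small behavioural model. This is a sketch under stated assumptions, not the Lochside RTL: each neighbour reports per cycle whether a flit requested the link towards this router and whether that request was granted. The router's clock is enabled whenever any neighbour requested the link. The `NeighbourCycle` type and the trace are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class NeighbourCycle:
    requested: bool  # a flit in the neighbour asked for the link towards us
    granted: bool    # the neighbour's arbiter granted that request


def early_valid(neighbours: List[NeighbourCycle]) -> bool:
    """Pessimistic wake-up: asserted on any request, whether or not it is granted."""
    return any(n.requested for n in neighbours)


def simulate(trace: List[List[NeighbourCycle]]) -> Tuple[List[bool], List[bool]]:
    """Return per-cycle (clock enabled, data actually arrives) for one router."""
    enabled = [early_valid(cycle) for cycle in trace]
    arrives = [any(n.granted for n in cycle) for cycle in trace]
    return enabled, arrives


NC = NeighbourCycle
trace = [
    [NC(False, False), NC(False, False)],  # idle: clock stays gated
    [NC(True, True), NC(False, False)],    # request granted: clock needed
    [NC(True, False), NC(False, False)],   # request lost arbitration: wasted wake-up
    [NC(False, False), NC(True, True)],    # request granted: clock needed
]
enabled, arrives = simulate(trace)
wasted = sum(e and not a for e, a in zip(enabled, arrives))
missed = sum(a and not e for e, a in zip(enabled, arrives))
print(enabled, arrives, wasted, missed)
```

Because a grant implies a request, the clock is never gated when data arrives. The cost is the occasional wake-up for a request that loses arbitration.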

12 12/19 Gate-Level Optimizations and Signal Gating
Automated signal gating and gate-level power optimisations had minimal impact
Inserting signal gating logic manually did reduce input FIFO power requirements significantly
The reported results could be further improved (by 12%) by enabling logic optimisation across module boundaries
– This was restricted in order to determine accurately where power is dissipated

13 13/19 Analysis of Power Consumption
[Figure: Power consumption of a single router and its links]
Simple power optimisations can quarter power requirements, with many more opportunities to save power
Network is ~5% of core area
Perhaps 10% of system power at present
Don’t make comparisons without optimizing power!

14 14/19 Analysis of Power Consumption
22% Static power
11% Inter-Router Links
~1% Global Clock tree
65% Dynamic Power
– Power Breakdown
  – ~50% of dynamic power is consumed in local clock tree and input FIFOs
  – ~30% on router datapath
  – ~20% on scheduling and arbitration
– Scheduling is probably more complex than typical implementations due to speculation
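To see what the dynamic-power split means in terms of total power, the percentages can be combined directly. The sketch assumes that the ~50/30/20 split applies to the 65% dynamic share; that is one reading of the slide rather than something it states outright. All figures are the approximate values quoted above.

```python
# Top-level shares of total power for one router and its links (from the slide).
total_shares = {
    "static": 0.22,
    "inter_router_links": 0.11,
    "global_clock_tree": 0.01,
    "router_dynamic": 0.65,
}

# Approximate split of the router dynamic power (from the slide).
dynamic_split = {
    "local_clock_tree_and_input_fifos": 0.50,
    "router_datapath": 0.30,
    "scheduling_and_arbitration": 0.20,
}

# Each dynamic component as a fraction of total power.
share_of_total = {
    name: total_shares["router_dynamic"] * fraction
    for name, fraction in dynamic_split.items()
}

accounted = sum(total_shares.values())  # the quoted shares are rounded

for name, share in share_of_total.items():
    print(f"{name}: {share:.1%} of total power")
print(f"accounted for: {accounted:.0%}")
```

On this reading the local clock tree and input FIFOs account for about a third of all power, more than static power and far more than the inter-router links.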

15 15/19 Low-Power On-Chip Networks
Interconnect and static power set to increase
– Many low-power link technologies
  – Low-swing differential techniques
– Power gating and other leakage reduction techniques
Potential power savings begin to require lots of different techniques: no one silver bullet?

16 16/19 Low-Power On-Chip Networks
Topology
– Don’t want to sacrifice general or at least multi-purpose nature of our networked SoC
– Results suggest higher radix routers and longer interconnects could reduce power
  – Probably not a long term solution
  – Reduces path diversity, bad for fault-tolerance
Architecture
– Scope for minimising memory required to store precomputed router schedule (particular to our router)
– Simpler routers
– Single cycle routers reduce power? Speculation for low-power?

17 17/19 Supporting Best-Effort (BE) and Guaranteed Services (GS) Efficiently
Current timing of the datapath and link suggests additional GS data could be routed in the same clock cycle
– Allocate datapath/link to GS traffic for first ½ of clock cycle
  – Double capacity of network
– Exploit simpler GS circuit-switched routing when possible
– Reduce power
Very little additional overhead
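The "double capacity" claim follows from moving two transfers per clock cycle over the same link. The arithmetic below uses the 80-bit link width and the 250MHz and 800MHz figures from the test case. The `transfers_per_cycle` parameter is an illustrative way to express the half-cycle GS slot and is not terminology from the presentation.

```python
def link_bandwidth_gbps(width_bits: int, freq_mhz: float,
                        transfers_per_cycle: int = 1) -> float:
    """Raw link bandwidth in Gbit/s: width x frequency x transfers per cycle."""
    return width_bits * freq_mhz * transfers_per_cycle / 1e3


be_only = link_bandwidth_gbps(80, 250)                             # BE traffic alone
be_plus_gs = link_bandwidth_gbps(80, 250, transfers_per_cycle=2)   # GS uses first half-cycle
be_only_wc = link_bandwidth_gbps(80, 800)                          # at the worst-case clock rate

print(be_only, be_plus_gs, be_only_wc)
```

At 250MHz one 80-bit link carries 20 Gbit/s, and sharing each cycle between GS and BE traffic would raise that to 40 Gbit/s without changing the clock.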

18 18/19 Clocking On-Chip Networks
Network system timing issues are interesting
– naturally event-driven not synchronous
Work is investigating placing local data-driven clock generators in each network router
– Clock is stretched when no data to be routed
– Clock matches rate of incoming data streams
– Robust synchronisation solution (true GALS)
– Also investigating incorporating power gating support
See also Distributed Clock Generator (DCG) (Fairbanks/Moore)
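The effect of a stretchable, data-driven clock can be sketched behaviourally. The model below is an assumption for illustration, not the DCG circuit: the local clock fires one edge per arriving flit, never faster than a minimum period, and otherwise stays stretched. The arrival times and the 4ns period (250MHz) are hypothetical inputs.

```python
from typing import List


def data_driven_edges(arrival_times_ns: List[float], min_period_ns: float) -> List[float]:
    """Clock edges of a stretchable clock: one per flit, at least min_period_ns apart."""
    edges = []
    last = float("-inf")
    for t in sorted(arrival_times_ns):
        edge = max(t, last + min_period_ns)  # wait for data, but respect the minimum period
        edges.append(edge)
        last = edge
    return edges


def free_running_edge_count(duration_ns: float, period_ns: float) -> int:
    """Edges of a conventional clock at 0, T, 2T, ... up to and including duration_ns."""
    return int(duration_ns // period_ns) + 1


arrivals = [0.0, 4.0, 5.0, 40.0]   # hypothetical flit arrival times
edges = data_driven_edges(arrivals, min_period_ns=4.0)
free_edges = free_running_edge_count(40.0, 4.0)

print(edges, len(edges), free_edges)
```

In this trace the data-driven clock produces 4 edges over 40ns where a free-running 250MHz clock would produce 11, which is the saving the stretched clock is aiming for when the router is mostly idle.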

19 19/19 Challenges and Future Work
These are early results in a much more rigorous study on the power requirements of networked on-chip communication
– Much more soon!
Exploiting a general-purpose on-chip network
– Exploiting execution diversity to improve energy-efficiency
– Multi-use platforms and Virtual-IP
– Fault tolerance
– Networks of processing elements or networks that process?
  – Scope for removing unnecessary interfaces and boundaries
  – Impact of networking on IP and processor core design

20 Thank You

