1
Decomposition of Instruction Decoder for Low Power Design
TingTing Hwang
Department of Computer Science, Tsing Hua University
2
Power Dissipation
- Static dissipation due to leakage current
- Short-circuit dissipation
- Charge and discharge of the output load capacitor
3
Power Dissipation
[Figure: CMOS inverter with input V_in, output V_out, supply V_DD, and ground GND]
4
Dynamic Power Dissipation Model
$P = \frac{1}{2} \cdot \frac{C \, V_{dd}^{2} \, E}{T_{cyc}}$
- P: power dissipation
- C: load capacitance
- E: average transition count of the gate per clock cycle
- V_dd: supply voltage
- T_cyc: clock period
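As a quick numeric check of the model, the sketch below evaluates P for a single gate. The load, activity, voltage, and clock values are hypothetical, chosen only for illustration, and `dynamic_power` is an illustrative name rather than part of any tool flow.

```python
def dynamic_power(c_load: float, e_trans: float, v_dd: float, t_cyc: float) -> float:
    """P = (1/2) * C * Vdd^2 * E / Tcyc.

    c_load:  load capacitance C (farads)
    e_trans: average output transitions per clock cycle E
    v_dd:    supply voltage (volts)
    t_cyc:   clock period (seconds)
    """
    # A 0->1 plus 1->0 pair dissipates C*Vdd^2 in total,
    # i.e. (1/2)*C*Vdd^2 per transition on average.
    return 0.5 * c_load * v_dd ** 2 * e_trans / t_cyc


# Hypothetical gate: 10 fF load, 0.3 transitions/cycle, 1.8 V supply, 10 ns clock.
p = dynamic_power(10e-15, 0.3, 1.8, 10e-9)
print(f"{p:.3e} W")
```

Because P is linear in E, halving a gate's average transition count halves its dynamic power. Keeping most of the decoder's gates from switching is presumably the lever the decomposition relies on.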
6
Motivation
- The execution frequency of instructions is uneven
- Take the MOV class as an example: its three instructions account for 22% of the execution frequency
- Profiling data from the Powerstone benchmarks
7
Coupling Sub-decoders
- Partition an instruction decoder into two coupled sub-decoders
- The smaller decoder decodes only a small number of instructions
- When the smaller decoder is active, the larger decoder is turned off
- The smaller decoder is active frequently
8
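Architecture of Coupling Sub-decoders
- Control signals turn the sub-decoders on and off
[Figure: instruction bits are held in flip-flops FF1 … FFn; an I-Activate Control block drives I-Control0/I-Control1 to enable I-Decoder0 and I-Decoder1, and an S-Activate Control block drives S-Control0/S-Control1 to enable S-Decoder0 and S-Decoder1; the activate-control input logic is labeled AND-OR, and corresponding sub-decoder outputs (e.g., output bit0) are combined by an OR gate]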
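The sketch below is one plausible logic-level reading of this architecture. The activate control enables exactly one sub-decoder per instruction, a disabled sub-decoder holds its outputs at 0, and the two outputs are ORed. The split into `SMALL_TABLE`/`LARGE_TABLE`, the 4-bit control words, and the example trace are all hypothetical. The code models behavior only, not the ARM7TDMI decoder or its power.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL split and control words (not the ARM7TDMI encoding).
SMALL_TABLE: Dict[str, int] = {"mov": 0b0001, "b": 0b0010}                   # frequent
LARGE_TABLE: Dict[str, int] = {"ldr": 0b0100, "mul": 0b1000, "cmp": 0b1100}  # the rest


def activate_control(op: str) -> Tuple[bool, bool]:
    """Return (small_on, large_on): exactly one sub-decoder is enabled."""
    small_on = op in SMALL_TABLE
    return small_on, not small_on


def sub_decoder(table: Dict[str, int], op: str, enabled: bool) -> int:
    # A disabled sub-decoder holds its outputs at 0 instead of decoding.
    return table[op] if enabled else 0


def decode(op: str) -> int:
    small_on, large_on = activate_control(op)
    # The disabled side contributes 0, so OR-ing the outputs selects the active one.
    return sub_decoder(SMALL_TABLE, op, small_on) | sub_decoder(LARGE_TABLE, op, large_on)


TRACE: List[str] = "mov ldr mov b mul mov mul cmp mul b mov b ldr b mul".split()
small_cycles = sum(activate_control(op)[0] for op in TRACE)
print([decode(op) for op in TRACE], f"small decoder active {small_cycles}/{len(TRACE)}")
```

With this hypothetical split, the small sub-decoder handles 8 of the 15 instructions in the trace, so the larger one could stay idle for those cycles.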
9
Instruction Grouping Problem
How to decompose the decoder so that:
- the smaller sub-decoder is small
- the smaller sub-decoder is executed frequently
- the activate logic is small
10
Weighted Graph Model of Execution Sequence
- Node: instruction type
- Edge (U,V): instruction U is executed immediately after V, or V immediately after U
- Weights on nodes and edges: execution frequency
Example execution sequence: mov ldr mov b mul mov mul cmp mul b mov b ldr b mul
[Figure: weighted graph over the instruction types mov, ldr, b, mul, and cmp]
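The sketch below builds this graph from an execution trace, using the slide's example sequence as input. Node weights count executions of each instruction type, and edge weights count how often two types execute back to back, in either order. The weights drawn on the original figure are not recoverable from the text, so these counts come only from the short example sequence. `build_weighted_graph` is a hypothetical helper name.

```python
from collections import Counter
from typing import List, Tuple

TRACE: List[str] = "mov ldr mov b mul mov mul cmp mul b mov b ldr b mul".split()


def build_weighted_graph(trace: List[str]) -> Tuple[Counter, Counter]:
    """Node weight: executions of each instruction type.
    Edge weight: times two types execute back to back, in either order."""
    node_w: Counter = Counter(trace)
    edge_w: Counter = Counter()
    for u, v in zip(trace, trace[1:]):
        # frozenset makes (U, V) and (V, U) the same undirected edge;
        # a repeated instruction (U, U) becomes a one-element key (self-loop).
        edge_w[frozenset((u, v))] += 1
    return node_w, edge_w


node_w, edge_w = build_weighted_graph(TRACE)
print(dict(node_w))
for edge, w in sorted((tuple(sorted(e)), w) for e, w in edge_w.items()):
    print(edge, w)
```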
11
Power Model
- SF_i: transition frequency from M_i to M_i
- CF_ij: transition frequency between M_i and M_j
- Power_i: power of M_i, estimated by Synopsys
[Figure: the weighted instruction graph (mov, ldr, b, mul, cmp) partitioned into groups M_i and M_j]
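Given a grouping of instruction types into sub-decoders, SF_i and CF_ij can be tallied from consecutive instruction pairs. This is equivalent to summing the graph's edge weights within and across groups. The sketch below returns raw counts, and dividing by the number of transitions turns them into frequencies. The grouping {mov, b} / {ldr, mul, cmp} and the helper name `transition_counts` are hypothetical. The slide's formula combining these terms with Power_i is not recoverable from the text, so it is not reproduced here.

```python
from collections import Counter
from typing import Dict, List, Tuple

TRACE: List[str] = "mov ldr mov b mul mov mul cmp mul b mov b ldr b mul".split()
# HYPOTHETICAL grouping: M0 = {mov, b}, M1 = {ldr, mul, cmp}
GROUP_OF: Dict[str, int] = {"mov": 0, "b": 0, "ldr": 1, "mul": 1, "cmp": 1}


def transition_counts(trace: List[str], group_of: Dict[str, int]) -> Tuple[Counter, Counter]:
    """Return (SF, CF): SF[i] counts consecutive pairs that stay in M_i;
    CF[(i, j)] with i < j counts pairs that switch between M_i and M_j."""
    sf: Counter = Counter()
    cf: Counter = Counter()
    for u, v in zip(trace, trace[1:]):
        gi, gj = group_of[u], group_of[v]
        if gi == gj:
            sf[gi] += 1
        else:
            cf[(min(gi, gj), max(gi, gj))] += 1
    return sf, cf


sf, cf = transition_counts(TRACE, GROUP_OF)
n_trans = len(TRACE) - 1
print({i: c / n_trans for i, c in sf.items()}, {k: c / n_trans for k, c in cf.items()})
```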
12
Instruction Grouping Problem: Graph Partitioning
- Generation of the transition graph
- Initial clustering by random walk
- Initial partition of clusters into groups
- Iterative improvement by moving clusters among groups
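The sketch below illustrates the iterative-improvement step under several assumptions:
- Each instruction type is its own cluster, so the random-walk clustering is skipped.
- There are exactly two groups.
- The Synopsys-estimated Power_i is replaced by a hypothetical proxy cost: the expected number of instruction types in the active sub-decoder, plus `lam` times the fraction of transitions that cross groups.

The search is plain steepest descent over single-node moves and is not necessarily the authors' procedure. `proxy_cost`, `improve`, and `lam` are illustrative names.

```python
from typing import Dict, Tuple

# HYPOTHETICAL inputs: node/edge weights counted from the example trace
# "mov ldr mov b mul mov mul cmp mul b mov b ldr b mul".
NODE_W: Dict[str, int] = {"mov": 4, "ldr": 2, "b": 4, "mul": 4, "cmp": 1}
EDGE_W: Dict[Tuple[str, str], int] = {
    ("ldr", "mov"): 2, ("b", "mov"): 3, ("b", "mul"): 3,
    ("mov", "mul"): 2, ("cmp", "mul"): 2, ("b", "ldr"): 2,
}


def proxy_cost(part: Dict[str, int], lam: float) -> float:
    """HYPOTHETICAL cost (not the paper's power model): for each group, the
    fraction of executions it decodes times its number of instruction types,
    plus lam times the fraction of transitions that switch groups."""
    total_exec = sum(NODE_W.values())
    total_edges = sum(EDGE_W.values())
    cost = 0.0
    for g in (0, 1):
        members = [n for n, grp in part.items() if grp == g]
        active = sum(NODE_W[n] for n in members) / total_exec
        cost += active * len(members)
    cross = sum(w for (u, v), w in EDGE_W.items() if part[u] != part[v])
    return cost + lam * cross / total_edges


def improve(part: Dict[str, int], lam: float = 0.5) -> Tuple[Dict[str, int], float]:
    """Steepest descent: apply the single-node move that lowers the cost most,
    and stop when no move helps. Both groups are kept non-empty."""
    part = dict(part)
    best = proxy_cost(part, lam)
    while True:
        best_move = None
        for n in part:
            cand = dict(part)
            cand[n] = 1 - cand[n]
            if len(set(cand.values())) < 2:
                continue  # moving n would empty a group
            c = proxy_cost(cand, lam)
            if c < best - 1e-12:
                best, best_move = c, n
        if best_move is None:
            return part, best
        part[best_move] = 1 - part[best_move]


start = {"mov": 1, "ldr": 1, "b": 1, "mul": 1, "cmp": 0}  # deliberately poor start
part, cost = improve(start)
groups = [sorted(n for n, g in part.items() if g == k) for k in (0, 1)]
small = min(groups, key=len)
print(groups, small, round(cost, 4))
```

Starting from the deliberately poor grouping {cmp} / rest, the search moves mul and then ldr. It stops with {mov, b} as the smaller group: two instruction types that account for 8 of the 15 executions in the example trace, in line with the goal of a small, frequently active sub-decoder.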
13
Experimental Process
- ARM7TDMI circuit described in Verilog
- Circuit synthesized by Synopsys Design Compiler
- Power estimated by PrimePower: switching activities are collected by simulating the Powerstone benchmark set
14
Results on Two-way Decomposition
15
Power Consumption Comparisons
                      Power (W)
                      Orig.      Decomp.    Improvement
Instruction Decoder   4.01E-4    2.81E-4    29.97%
Control Unit          1.03E-3    8.35E-4    18.94%
- Lower power consumption
16
Critical Path and Area Comparisons
- Shorter critical path timing
- Area overhead
                      Critical Path Timing (ns)      Area
                      Orig.   Decomp.  Improvement   Orig.   Decomp.  Overhead
Instruction Decoder   35.49   30.90    12.93%        31190   34242    9.78%
Control Unit          37.05   32.47    12.36%        66268   70438    6.29%
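The Improvement and Overhead columns are relative changes with respect to the original design, (orig - decomp) / orig. The short check below recomputes them from the table values. `rel_change` is a hypothetical helper, and a negative result denotes an increase (the area overhead).

```python
from typing import Dict, Tuple


def rel_change(orig: float, new: float) -> float:
    """Percentage reduction relative to orig (negative means an increase)."""
    return (orig - new) / orig * 100.0


ROWS: Dict[str, Tuple[float, float]] = {
    "Decoder power (W)": (4.01e-4, 2.81e-4),
    "Control unit power (W)": (1.03e-3, 8.35e-4),
    "Decoder critical path (ns)": (35.49, 30.90),
    "Control unit critical path (ns)": (37.05, 32.47),
    "Decoder area": (31190, 34242),
    "Control unit area": (66268, 70438),
}
for name, (orig, new) in ROWS.items():
    print(f"{name}: {rel_change(orig, new):+.2f}%")
```

Recomputing from the rounded values shown on the slide reproduces the critical-path and control-unit area figures exactly, and the other figures to within 0.05 percentage points. The slide's percentages were presumably computed from unrounded measurements.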
17
Results on Multiple-way Decomposition
18
Power Consumption for Different Multi-way Grouping
- Two-way decomposition has the best power reduction
- More groups bring more overhead
[Figure: power (W) of the decoder and the overhead for the original design and 2-way, 3-way, and 4-way decompositions]
19
Critical Path Timing for Different Multi-way Grouping
- Four-way decomposition has the best timing reduction
[Figure: critical path timing (ns) of the decoder and the overhead for the original design and 2-way through 5-way decompositions]
20
Area Comparisons
[Figure: area for different multi-way groupings: original, 2-way, 3-way, 4-way, and 5-way]
21
Conclusions
- Two-way partitioning has the best results for the 142-instruction set
- Compared to the undecomposed decoder:
  - 30% reduction in power consumption
  - 13% improvement in critical path timing
- Compared to the undecomposed control unit:
  - 19% reduction in power consumption
  - 12% improvement in critical path timing
22
Thank You