Published by Ronald Hubbard. Modified over 9 years ago.
1
Approaches to Low-Power Implementations of DSP Systems
Class Advisor: Dr. Fakhraie
Presenter: Nariman Moezi
DSP Design & Implementation Course Seminar, Spring 2004
2
Outline
Reduced two’s complement representation
Low-power scheduling techniques for embedded DSP software
Low-power multipliers
- Mitchell-based logarithmic multiplier
- Power-aware pipelined multiplier
3
Reduced two’s complement representation
Two’s complement representation is widely used in the implementation of arithmetic operations. If X has a small magnitude and switches between positive and negative values, its sign extension toggles between strings of zeros and strings of ones, which causes unnecessary switching in the upper bits. If X has a magnitude less than $2^{m-1}$ (with m < N), we can represent it as the sum of an m-bit vector and a constant vector that has a string of ones from bit N-1 down to bit m-1 at the MSB side. (Zhan Yu et al., 2002)
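The decomposition described on this slide can be sketched in Python. This is a minimal illustration and not the circuit from the cited paper: the helper names `reduce_twos_complement` and `constant_bits` and the parameter `n_bits` are assumptions. The split follows from the fact that an N-bit pattern with ones from bit N-1 down to bit m-1 has the two’s complement value $-2^{m-1}$.

```python
def reduce_twos_complement(x, m):
    """Split x into (vector, constant) such that x == vector + constant.

    Hypothetical helper for illustration; requires |x| < 2**(m-1).
    """
    if not -(1 << (m - 1)) < x < (1 << (m - 1)):
        raise ValueError("magnitude of x must be less than 2**(m-1)")
    constant = -(1 << (m - 1))   # ones from bit N-1 down to bit m-1
    vector = x - constant        # non-negative and fits in m bits
    return vector, constant


def constant_bits(m, n_bits):
    """N-bit two's complement bit pattern of the constant vector."""
    return format(-(1 << (m - 1)) & ((1 << n_bits) - 1), f"0{n_bits}b")


if __name__ == "__main__":
    for x in (3, -3):
        vector, constant = reduce_twos_complement(x, m=4)
        # Only the m-bit vector changes when x changes sign.
        print(x, format(vector, "04b"), constant)
    print(constant_bits(m=4, n_bits=8))
```

In this sketch the upper N-m bits never toggle when X changes sign, because they live in the constant; only the m-bit vector switches.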
4
Application: low-power FIR filter using reduced two’s complement representation
Consider a hybrid-form adaptive FIR filter whose inputs are 5-level data symbols taking values in {-2, -1, 0, 1, 2} and whose coefficients are N-bit two’s complement numbers. Multiplying a coefficient by such a symbol reduces to shift and complement operations. If we detect that the maximum magnitude of a coefficient H is less than $2^{m-2}$, then the corresponding partial product P has a magnitude less than $2^{m-1}$, because the largest symbol magnitude is 2.
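A short Python sketch of the shift-and-complement multiplication, under the assumption that the symbol alphabet is {-2, -1, 0, 1, 2}. The function name `multiply_by_symbol` is hypothetical, and Python integers stand in for the N-bit hardware words.

```python
SYMBOLS = (-2, -1, 0, 1, 2)


def multiply_by_symbol(h, symbol):
    """Form the partial product h * symbol using only shift and negate."""
    if symbol not in SYMBOLS:
        raise ValueError("symbol must be one of -2, -1, 0, 1, 2")
    if symbol == 0:
        return 0
    shifted = h << (abs(symbol) - 1)   # shift by 1 for +/-2, by 0 for +/-1
    return -shifted if symbol < 0 else shifted


if __name__ == "__main__":
    m = 6
    h = 13                             # |h| < 2**(m-2) = 16
    for s in SYMBOLS:
        p = multiply_by_symbol(h, s)
        # The partial product stays below 2**(m-1) in magnitude.
        print(s, p, abs(p) < 2 ** (m - 1))
```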
5
- Coefficient maximum-magnitude detection (an example with two taps and 6-bit coefficients)
- Partial-product generation using reduced two’s complement representation
6
- As the adaptive filter updates the coefficients, the word length of the reduced representation changes, and so does the error introduced by using the reduced representation. A compensation-vector correction path that imitates the error propagation in the accumulation path can be built to compensate for it.
- A test chip was implemented in 0.25 µm CMOS technology. It uses a hybrid-form filter of 160 taps with 8 taps per hybrid section, and the coefficient word length is 10 bits. When operating at 2.5 V with a 100 MHz clock, a 32% power saving was measured, as summarized in the slide’s table.
7
Low-Power Scheduling Techniques for Embedded DSP Software
This section describes an instruction-level power model for a Fujitsu DSP processor and techniques to reduce the power it consumes. The processor has a special architecture that allows instructions to be packed into pairs. Its Booth multiplier is a major source of energy consumption for DSP programs, so a micro-architectural power model for the on-chip Booth multiplier is developed and analyzed for further power minimization. Based on this model, local code modification by operand swapping is used to further reduce power consumption. (Lee, Tiwari, Malik, and Fujita, IEEE Trans. VLSI Syst., 1997)
8
The sum of the measured currents for the four instructions is 204 mA. The sum of the base costs (37.2 + 14.4 + 36.6 + 14.4 = 102.6 mA) and the overhead costs of adjacent instructions (18.4 + 18.4 + 18.4 + 18.4 = 73.6 mA) is only 176.2 mA, which underestimates the actual cost by 13.6%. The difference of 27.8 mA between the two figures comes from the circuit-state overhead between the non-adjacent instructions 1 and 3. It is due to a special design at the inputs of the multiplier: there is a latch between each operand and the multiplier that retains the old values until the next multiply instruction is executed. This overhead therefore depends on the previous and current values of the input latches for each multiply operation.
An example of a sequence of four instructions where the overhead cost between instructions 1 and 3 cannot be ignored
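The arithmetic on this slide can be checked with a few lines of Python. The numbers are the ones quoted on the slide; the variable names are illustrative only.

```python
# Currents in mA, as quoted on the slide.
base_costs = [37.2, 14.4, 36.6, 14.4]
adjacent_overheads = [18.4, 18.4, 18.4, 18.4]
measured_total = 204.0

# Instruction-level estimate: base costs plus adjacent-instruction overheads.
estimate = sum(base_costs) + sum(adjacent_overheads)

# Part of the measured current that the estimate does not account for.
shortfall = measured_total - estimate
shortfall_percent = 100.0 * shortfall / measured_total

if __name__ == "__main__":
    print(f"estimate  = {estimate:.1f} mA")
    print(f"shortfall = {shortfall:.1f} mA ({shortfall_percent:.1f}%)")
```

The shortfall is the part attributed to the latch state carried between the non-adjacent instructions 1 and 3.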
9
Instruction packing for low power
A special feature of the target DSP processor is its ability to pack an ALU-type instruction and a data-transfer instruction into one codeword for simultaneous execution. The average current for a packed instruction is only slightly higher than the average current for a sequence of the two unpacked instructions.
Comparison of energy consumed by packed and unpacked instructions
10
As to the overhead cost of MAC instructions: when MAC is packed with a data-transfer instruction, especially LAB, which changes the data values in registers A and B used by MAC as inputs, a wide variation in overhead cost is observed (from 1.4 mA to 33.0 mA). This variation is mainly due to the complex Booth multiplier implemented in the MAC unit. The fundamental idea behind the Booth multiplier is to recode B by the “skipping over 1s” technique. For example, the 7-digit B value 0011110 (weight = 4), which would need four additions of shifted A, can be recoded to 0 +1 0 0 0 -1 0 (weight = 2), which requires one addition and one subtraction.
Micro-architectural model for the Booth multiplier
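The “skipping over 1s” recoding can be illustrated with a radix-2 Booth recoder in Python. This is a sketch under the assumption of plain radix-2 recoding; the slide does not say which Booth variant the processor’s multiplier uses, and the function names are hypothetical.

```python
def booth_recode(b, width):
    """Radix-2 Booth digits of the width-bit pattern b, least significant first.

    Each digit is -1, 0 or +1 and equals bit[i-1] - bit[i], with bit[-1] = 0.
    """
    digits = []
    previous_bit = 0
    for i in range(width):
        bit = (b >> i) & 1
        digits.append(previous_bit - bit)
        previous_bit = bit
    return digits


def booth_weight(b, width):
    """Number of additions and subtractions the recoded value needs."""
    return sum(1 for d in booth_recode(b, width) if d != 0)


if __name__ == "__main__":
    b = 0b0011110
    digits = booth_recode(b, 7)
    print(digits[::-1])                 # most significant digit first
    print(bin(b).count("1"), booth_weight(b, 7))
```

For 0011110 the recoded value is 32 - 2, so the four additions become one addition and one subtraction.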
11
We can reduce the number of additions and subtractions simply by swapping the operands in registers A and B, which can reduce the current. The table gives three experiments in which the operands are swapped. Another factor that determines the power consumption of the multiplier is switching activity. For the Booth multiplier, the relevant characteristic of A is its switching activity, and for B it is both the weight factor and the switching activity.
Variation of measured current by swapping operands op1 and op2 in registers A and B for MAC:LAB instructions.
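A hedged sketch of the operand-swapping idea: place the operand with the lower Booth weight in register B, since B is the one that gets recoded. It considers the weight factor only and ignores switching activity. The radix-2 weight measure and the function names are assumptions made for illustration, not the exact cost model of the cited work.

```python
def booth_weight(b, width):
    """Count non-zero radix-2 Booth digits of the width-bit pattern b."""
    weight = 0
    previous_bit = 0
    for i in range(width):
        bit = (b >> i) & 1
        if bit != previous_bit:        # digit is +1 or -1 at this position
            weight += 1
        previous_bit = bit
    return weight


def order_operands(op1, op2, width):
    """Return (a, b) so that the recoded operand b has the lower weight."""
    if booth_weight(op2, width) <= booth_weight(op1, width):
        return op1, op2
    return op2, op1                    # swap: op1 is cheaper to recode


if __name__ == "__main__":
    a, b = order_operands(0b0011110, 0b0101010, width=7)
    print(format(a, "07b"), format(b, "07b"))
```

Multiplication is commutative, so the swap changes only the number of add and subtract steps and not the result.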
12
Average current drawn by MAC:LAB for different characteristics of consecutive values in A and B.
In a typical DSP application, MAC:LAB instructions are usually applied to a sequence of data for filter operations, such as a sum of products of coefficients C and data X. Since only C is known in advance and there is no information about X, we treat C as the value in B. If the switching activity or the weight factor of C is high, we can swap the operands.
Comparison of power consumption for 5 DSP programs under different scheduling techniques
13
Improved Mitchell-Based Logarithmic Multiplier for Low-Power DSP Applications
Multiplying two numbers using logarithms is simple: take the logarithms of the two multiplicands, add them, and then take the antilogarithm of the sum. Mitchell’s method of calculating logarithms: assume $N = 25_{10} = 11001_2$. The most significant 1 is at bit 4, which gives a characteristic of $100_2$, and the remaining bits ($1001_2$) give the fraction. This gives a logarithm value of $100.1001_2$ ($= 4.5625_{10}$). The correct value of $\log_2 25$ is 4.6439. (Duncan J. McLaren et al., IEEE 2003)
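The worked example for N = 25 can be reproduced with a small Python sketch. The function name `mitchell_log2` is illustrative, and a floating-point fraction stands in for the fixed-point bit field that hardware would use.

```python
import math


def mitchell_log2(n):
    """Mitchell approximation of log2(n) for a positive integer n."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    k = n.bit_length() - 1             # characteristic: position of the leading 1
    x = (n - (1 << k)) / (1 << k)      # remaining bits read as a binary fraction
    return k + x


if __name__ == "__main__":
    print(mitchell_log2(25))           # approximation
    print(math.log2(25))               # true value for comparison
```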
14
A binary number N can be written in terms of a characteristic k and a binary fraction x, with x in the range 0 ≤ x < 1. The true logarithm and the Mitchell approximation are both expressed in terms of k and x. The logarithm of a product is equal to the sum of the logarithms of the multiplicands, so taking the antilogarithm of each sum gives the true product and the Mitchell product respectively. A correction term is then used to remove the difference between the two.
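The slide’s own equations did not survive extraction. The standard form of Mitchell’s relations, written with the symbols k and x used on this slide, is given below as a reference. The last line is the exact error of the Mitchell product, which follows algebraically from the two lines before it. Whether the slide used exactly this notation is an assumption.

```latex
\begin{align}
N &= 2^{k}(1 + x), \qquad 0 \le x < 1 \\
\log_2 N &= k + \log_2(1 + x) \approx k + x \\
\log_2(N_1 N_2) &= k_1 + k_2 + \log_2(1 + x_1) + \log_2(1 + x_2)
                 \approx k_1 + k_2 + x_1 + x_2 \\
N_1 N_2 &= 2^{k_1 + k_2}\,(1 + x_1 + x_2 + x_1 x_2) \\
(N_1 N_2)_{\text{Mitchell}} &=
  \begin{cases}
    2^{k_1 + k_2}\,(1 + x_1 + x_2), & x_1 + x_2 < 1 \\
    2^{k_1 + k_2 + 1}\,(x_1 + x_2), & x_1 + x_2 \ge 1
  \end{cases} \\
N_1 N_2 - (N_1 N_2)_{\text{Mitchell}} &=
  \begin{cases}
    2^{k_1 + k_2}\, x_1 x_2, & x_1 + x_2 < 1 \\
    2^{k_1 + k_2}\,(1 - x_1)(1 - x_2), & x_1 + x_2 \ge 1
  \end{cases}
\end{align}
```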
15
This shows that, to provide the correct answer, an error-correction factor should be added to the summation before the antilogarithm is calculated. Computing this factor exactly would be impractical, however. The approach is instead to average the value of the correction factor over a range of x values and add this average to the summation, which results in a multiplier of improved accuracy. The two fractional parts are each split into 8 ranges, from 0 to 1 in steps of 0.125, so the 3 most significant bits of each x can be used to select the precalculated error-correction factor.
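A rough Python sketch of a Mitchell multiplier with an averaged correction selected by the 3 most significant bits of each fraction. The correction values here are an assumption: they are Mitchell’s exact error terms evaluated at the midpoint of each 0.125-wide range, not the table from the cited paper, and the carry case is chosen from the uncorrected sum. All function names are hypothetical.

```python
def split(n):
    """Return (k, x) with n == 2**k * (1 + x) and 0 <= x < 1."""
    k = n.bit_length() - 1
    return k, (n - (1 << k)) / (1 << k)


def range_midpoint(x):
    """Midpoint of the 0.125-wide range selected by the 3 MSBs of x."""
    return (int(x * 8) + 0.5) / 8


def mitchell_multiply(a, b, corrected=False):
    """Approximate a * b for positive integers a and b."""
    k1, x1 = split(a)
    k2, x2 = split(b)
    k, s = k1 + k2, x1 + x2
    m1, m2 = range_midpoint(x1), range_midpoint(x2)
    if s < 1:
        # No carry out of the fraction sum: missing term is x1 * x2.
        c = m1 * m2 if corrected else 0.0
        return 2 ** k * (1 + s + c)
    # Carry into the characteristic: missing term is (1 - x1)(1 - x2) / 2.
    c = (1 - m1) * (1 - m2) / 2 if corrected else 0.0
    return 2 ** (k + 1) * (s + c)


def max_relative_error(corrected, limit=64):
    """Worst relative error over all operand pairs below limit."""
    return max(abs(mitchell_multiply(a, b, corrected) - a * b) / (a * b)
               for a in range(1, limit) for b in range(1, limit))


if __name__ == "__main__":
    print(mitchell_multiply(13, 11), mitchell_multiply(13, 11, corrected=True))
    print(max_relative_error(False), max_relative_error(True))
```

In this sketch the correction lowers the worst-case relative error, although individual products that the plain Mitchell method gets exactly right (for example powers of two) pick up a small error from the averaged term.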
16
To test the multiplier further, it was used as part of a real application, in this case a finite impulse response (FIR) filter. The filter was an 11-tap low-pass FIR filter with a normalized cut-off frequency of 0.25. It was implemented in Verilog using the standard multiplier, the unmodified Mitchell multiplier, and the improved Mitchell multiplier. The input was 16 bits wide and the output was 32 bits wide. The figure shows the magnitude response of each of the three implementations.
17
Power-Aware Pipelined Multiplier Design Based on 2-Dimensional Pipeline Gating
Although Boolean multipliers are naturally power-aware with respect to changes in input precision, deeply pipelined designs do not have this benefit. In unpipelined Boolean multipliers, a low-input-precision calculation (such as 0001×0001) dissipates much less power than a high-input-precision calculation (such as 1111×1111). In deeply pipelined designs, the number of registers is much larger than that of other elements, and the registers are clocked regardless of the data, so these designs lack that natural power awareness. (Jia Di, J. S. Yuan et al., GLSVLSI 2003)
18
To solve this problem and improve the power awareness of deeply pipelined multipliers, a novel technique, 2-dimensional pipeline gating, is proposed. It gates the clock to the registers in both the vertical and the horizontal direction.
19
In a 4×4 multiplier, when the input precision is 4, for example when calculating 1111×1111, S is generated from all inner partial products. If the input precision is 2, for example when calculating 0011×0011, the partial products containing X2 or Y2 (the ones enclosed by a rectangle) can also be disabled.
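The gating decision can be sketched in Python as a map of which partial products $X_i Y_j$ stay enabled for a given pair of operands. This is an illustration only. It assumes unsigned operands and detects precision from the position of the leading 1, and it does not model the pipeline registers or clock-gating cells of the cited design. The function names are hypothetical.

```python
def active_partial_products(x, y, width=4):
    """Return grid[j][i], True where partial product X_i * Y_j stays enabled."""
    if not (0 <= x < (1 << width) and 0 <= y < (1 << width)):
        raise ValueError("operands must fit in the given width")
    px, py = x.bit_length(), y.bit_length()   # detected input precision
    return [[i < px and j < py for i in range(width)] for j in range(width)]


def count_active(x, y, width=4):
    """Number of partial products that are not gated off."""
    return sum(cell for row in active_partial_products(x, y, width) for cell in row)


if __name__ == "__main__":
    print(count_active(0b1111, 0b1111))   # full precision
    print(count_active(0b0011, 0b0011))   # precision 2 in both dimensions
```

Gating along both operands is what makes the scheme two-dimensional: columns are disabled by the precision of X and rows by the precision of Y.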
22
References
M. T. Lee, V. Tiwari, S. Malik, and M. Fujita, “Power analysis and minimization techniques for embedded DSP software,” IEEE Trans. VLSI Syst., vol. 5, pp. 123-135, Mar. 1997.
Jia Di, J. S. Yuan et al., “Power-aware Pipelined Multiplier Design Based on 2-Dimensional Pipeline Gating,” GLSVLSI’03, April 28-29, 2003.
Zhan Yu et al., “A Low Power Adaptive Filter Using Dynamic Reduced 2’s-Complement Representation,” IEEE Custom Integrated Circuits Conference, 2002.
Duncan J. McLaren et al., “Improved Mitchell-Based Logarithmic Multiplier for Low Power DSP Applications,” IEEE, 2003.