Para-CORDIC: Parallel CORDIC Rotation Algorithm and Architecture


1 Para-CORDIC: Parallel CORDIC Rotation Algorithm and Architecture
(IEEE T-CAS I, Vol. 51, No. 8, pp , Aug. 2004) Tso-Bing Juang, Ph.D VLSI Design LAB, Dept. CSE, NSYSU

2 My Research – Computer Arithmetic
Applications of arithmetic components:
- DSP (Digital Signal Processing)
- 3-D graphics
- Computer communications, etc.
Topics of arithmetic [Ercegovac 2004]:
- Addition/Subtraction
- Multiplication/Division
- Floating-point operations
- CORDIC (COordinate Rotation DIgital Computer)

3 International Conference
My Publications ( ) Topics SCI Journal International Conference Domestic Conference CORDIC 3 Multiplier 2 4 1 DCT

4 Academic Honors
- Best thesis award, Xerox Co. Ltd, 1995
- Attended the Midwest Symposium on Circuits and Systems (MWSCAS), supported by NSC, 1999
- First prize award of FPGA, National Intellectual Property Contest, 2000
- First prize award of Full Custom Design Contest, 2001
- Attended the Asia-Pacific Conference on Circuits and Systems (APCCAS), supported by MOE, 2002
- 2005 Marquis, Who's Who in Science and Engineering, Edition
- 2006 Marquis, Who's Who in the World

5 Outline
1. Basic Concept of CORDIC
2. Bottleneck of CORDIC Rotation
3. Proposed Methods
4. Previous Methods
5. Comparisons
6. Applications
7. Conclusions

6 1. Basic Concept of CORDIC

7 What is CORDIC?
CORDIC (COordinate Rotation DIgital Computer)
- Rotate vector (1,0) by φ to get (cos φ, sin φ)
- Can evaluate many arithmetic functions
- Rotation realized by shift-add operations
- Convergence method (iterative)
- About n iterations for n-bit accuracy
Methods that we have seen so far:
- Table lookup: too much area
- Polynomial approx: too many multiply/adds

8 Conventional CORDIC Rotation
In each iteration, x and y perform one micro-rotation based on the sign of z

9 CORDIC Functions

10 Pre-computation of tan(ai)
Find a_i such that tan(a_i) = 2^{-i} (or, a_i = tan^{-1}(2^{-i})).
Possible to write any angle φ = ±a_0 ± a_1 ± … ± a_n as long as -99.7° ≤ φ ≤ 99.7° (which covers -90°..90°).
Let's say we want to rotate (x,y) by only 7.1 degrees. How do we rewrite the equations in the box on the previous slide?
What if we want to rotate by 10.7 degrees? (note: 10.7 = ). What is the exact sequence of operations you perform? What if 8.9 degrees ( )?
So, for example, to get 90 degrees, you have to add almost all the angles. For -90, subtract almost all.
Do we have to use ALL the a_i's? (e.g., if you want to evaluate 71.6 degrees, adding the first two angles will do it) Important question. We will get back to that later.
Why is it good news that we can cover -90..90?
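The angle table this slide relies on can be regenerated with a few lines of Python. This is a minimal sketch, not part of the original deck: the function name `cordic_angles_deg` and the choice of ten table entries are assumptions (ten entries, rounded to one decimal, happen to add up to the 99.7° quoted above).

```python
import math

def cordic_angles_deg(n):
    """Return [a_0, ..., a_{n-1}] with a_i = atan(2**-i), in degrees."""
    return [math.degrees(math.atan(2.0 ** -i)) for i in range(n)]

angles = cordic_angles_deg(10)
for i, a in enumerate(angles):
    print(f"a_{i} = atan(2^-{i}) = {a:.1f} deg")

# Adding every (rounded) table entry gives the largest reachable angle.
reach = sum(round(a, 1) for a in angles)
print(f"reachable range: +/-{reach:.1f} deg")

# Some targets need only a few entries, e.g. a_0 + a_1 or a_3 + a_4.
print(f"a_0 + a_1 = {angles[0] + angles[1]:.1f} deg")
print(f"a_3 + a_4 = {angles[3] + angles[4]:.1f} deg")
```

Because each a_i is roughly half of the previous one, any target inside the reachable range can be approached by adding or subtracting every entry in turn.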

11 Conventional CORDIC Rotation
Algorithm: (z is the current angle) "At each step, try to make z approach zero"
Initialize x_0 = K, y_0 = 0, z_0 = φ
For i = 0 to n
  d_i = 1 when z_i >= 0, else -1 [i.e., d_i = sign(z_i); d_i is the rotation direction, written σ_i on later slides]
  x_{i+1} = x_i - d_i 2^{-i} y_i
  y_{i+1} = y_i + d_i 2^{-i} x_i
  z_{i+1} = z_i - d_i a_i
End For
Result: x_{n+1} = cos(φ), y_{n+1} = sin(φ)
Precision: n bits
This is the basic CORDIC iteration.
Note: x_i, y_i shown on the right are misleading. The real axis is rotated by -φ degrees.
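The iteration on this slide can be tried out directly in floating point. The sketch below is an illustration only: the names `cordic_gain` and `cordic_rotate` are made up here, angles are in radians, and real hardware would use fixed-point shifts instead of multiplying by 2**-i. The constant K is computed as the product of cos(a_i), which comes out to roughly 0.6073.

```python
import math

def cordic_gain(n):
    """K = product of cos(atan(2**-i)) for i = 0..n; cancels the rotation gain."""
    k = 1.0
    for i in range(n + 1):
        k *= math.cos(math.atan(2.0 ** -i))
    return k

def cordic_rotate(phi, n=24):
    """Approximate (cos(phi), sin(phi)); phi in radians, |phi| <= about 1.74."""
    x, y, z = cordic_gain(n), 0.0, phi
    for i in range(n + 1):
        d = 1 if z >= 0 else -1          # rotation direction: sign of z
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)    # drive the residual angle toward zero
    return x, y

c, s = cordic_rotate(math.radians(30))
print(c, s, math.cos(math.radians(30)), math.sin(math.radians(30)))
```

Note that the direction d for step i+1 cannot be chosen until z has been updated in step i, which is the sequential dependency the later slides set out to remove.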

12 Example (z0 = φ = 30° = )
The angles come from the table on the previous slide. The figure shows only a few steps. Note that the sign does not always alternate between + and - (we have two consecutive +'s).
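The sign sequence for the 30° example can be traced with a short script. This is a sketch under assumptions: `rotation_directions` is a hypothetical helper, and it uses unrounded angles, so the last one or two signs may differ from a hand calculation based on the one-decimal table.

```python
import math

def rotation_directions(phi_deg, n):
    """Return the first n rotation directions and the residual angle (degrees)."""
    z = phi_deg
    signs = []
    for i in range(n):
        d = 1 if z >= 0 else -1
        signs.append(d)
        z -= d * math.degrees(math.atan(2.0 ** -i))
    return signs, z

signs, residual = rotation_directions(30.0, 10)
print("directions:", signs)
print("residual angle:", residual)
```

The fifth and sixth directions are both +1, which is the pair of consecutive +'s mentioned on the slide.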

13 CORDIC Hardware
What does each of the adders do? What is the table lookup for? Not shown: logic for determining di

14 Three Important Factors of CORDIC
Large additions/subtractions Scaling factor (constant vs. non-constant) Sequential execution

15 Research Topics about CORDIC
Redundant CORDIC architecture Error analysis of CORDIC Application of CORDIC architectures CORDIC algorithm with non-constant scaling factors Parallel CORDIC architecture

16 2. Bottleneck of CORDIC Rotation

17 Conventional CORDIC Rotation (Revisited)
Sequential determination of σi based on zi

18 Sequential CORDIC Rotation Architecture
The actual speed bottleneck lies in the sequential determination of the value of σi

19 3. Proposed Methods

20 How to parallelize? Using each bit of input angle to determine σi
Remove the bottleneck (B: bit accuracy). In the first m-1 iterations → sequential; in the other iterations → parallel
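The transcript does not say how m is chosen. A common argument in the CORDIC literature is that, beyond some index, atan(2^-i) equals 2^-i to within B bits, so the remaining directions can be read off the bits of the angle. Assuming that is the criterion intended here, the sketch below finds that index numerically; `first_parallel_index` is a hypothetical name.

```python
import math

def first_parallel_index(B):
    """Smallest i with |2**-i - atan(2**-i)| < 2**-B (B-bit accuracy)."""
    i = 0
    while abs(2.0 ** -i - math.atan(2.0 ** -i)) >= 2.0 ** -B:
        i += 1
    return i

for B in (16, 24, 32):
    print(f"B = {B}: atan(2^-i) matches 2^-i from i = {first_parallel_index(B)}")
```

Since atan(x) is approximately x - x^3/3 for small x, the index grows only as about B/3, which leaves most of the iterations in the parallel part.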

21 Our Proposed Techniques
MAR (Micro-rotation to Angle Recoding): obtain the combinations of tan^{-1} terms in each 2^{-i}, i = 1 to m-1
BBR (Binary to Bipolar Recoding): obtain the polarity {-1,+1} of each binary {1,0} weight of the input angle → hardware free
For example, B = 24
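One way to see why a binary-to-bipolar recoding needs no hardware is the identity b·2^-i = 2^-(i+1) + (2b-1)·2^-(i+1): a bit b in {0,1} becomes a digit 2b-1 in {-1,+1} of half the weight, plus a constant. The sketch below checks that identity with exact fractions. It illustrates the general idea only; the exact formulation in the paper may differ, and all function names are hypothetical.

```python
from fractions import Fraction

def binary_value(bits, first):
    """Value of sum(b_k * 2**-(first + k)) for bits b_k in {0, 1}."""
    return sum(Fraction(b, 2 ** (first + k)) for k, b in enumerate(bits))

def binary_to_bipolar(bits, first):
    """Recode {0,1} bits into {-1,+1} digits of half the weight.

    Digit k has weight 2**-(first + k + 1); a constant offset absorbs the rest.
    """
    digits = [2 * b - 1 for b in bits]
    offset = sum(Fraction(1, 2 ** (first + k + 1)) for k in range(len(bits)))
    return digits, offset

def bipolar_value(digits, first, offset):
    """Value represented by the bipolar digits plus the constant offset."""
    return offset + sum(Fraction(d, 2 ** (first + k + 1)) for k, d in enumerate(digits))

bits = [1, 0, 1, 1, 0, 0, 1, 0]
digits, offset = binary_to_bipolar(bits, first=8)
print(digits, offset)
print(binary_value(bits, 8) == bipolar_value(digits, 8, offset))
```

Each bipolar digit depends only on its own input bit, so all of them are available at once, and the offset depends only on the word length and can be precomputed.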

22 Example (B=24) Three extra micro-rotation stages are required Phase 1

23 Architecture of a 24-b CORDIC –based SIN/COS Generator

24 Algorithm of MAR

25 Our MAR Results

26 Our MAR Results

27 Para-CORDIC Architecture -1/2

28 Para-CORDIC Architecture -2/2
[Figure labels: S(1), S(5), S(8), σ1, R(1), R(i)]

29 Carry-save Adder-Based Realization for Micro-Rotation Stages
A 4:2 compressor is exploited to produce the carry save form (a sum and a carry)
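A word-level model shows what "carry-save form" means: several operands are reduced to a sum word and a carry word without propagating carries, and only one final addition resolves them. The sketch below builds a 4:2 reduction from two 3:2 steps on non-negative Python integers. It is a behavioural illustration with hypothetical function names, not the gate-level 4:2 compressor cell used in the architecture.

```python
def compress_3_2(a, b, c):
    """Reduce three words to (sum, carry) with a + b + c == sum + carry."""
    s = a ^ b ^ c                                # bitwise sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1   # majority bits, moved up one place
    return s, carry

def compress_4_2(a, b, c, d):
    """Reduce four words to (sum, carry) using two 3:2 steps."""
    s1, c1 = compress_3_2(a, b, c)
    return compress_3_2(s1, c1, d)

s, carry = compress_4_2(0b1011, 0b0110, 0b1101, 0b0111)
print(bin(s), bin(carry), s + carry)
```

Each output bit depends on only a few input bits, so the delay of a reduction step does not grow with the word length.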

30 Evaluation of the Z Datapath
Delay is: Area is:

31 The delay of Z Datapath

32 Merged Rotations of the Second Half Iterations
Delay savings

33 4. Previous Methods

34 Comments of Previous Proposed CORDIC Rotation – 1/4
[Wang 1997]: IEEE T-Computers The first m-1 iterations are sequential Area saving

35 Comments of Previous Proposed CORDIC Rotation - 2/4
[Phatak 1998]: IEEE T-Computers Double hardware to perform clockwise/counterclockwise rotations Area cost is high (signed-digit realization of X/Y/Z iterations)

36 Comments of Previous Proposed CORDIC Rotation - 3/4
[Kwak 2000] Proc. MWSCAS Complicated logic circuits to generate the first m-1 rotation directions

37 Comments of Previous Proposed CORDIC Rotation - 4/4
[Kuhlmann 2002]: EURASIP Using ROM to generate the first m-1 directions

38 Our Proposed Para-CORDIC
The delay and the area costs of Para-CORDIC are: and

39 5. Comparisons

40 Latency Comparisons

41 Area Comparisons

42 6. Applications

43 ROM-based Implementations for sine/cosine generation
When x1 and y1 are constant (x1 = K, y1 = 0, x_{B+1} = cos(φ), y_{B+1} = sin(φ)): can reduce the extra micro-rotation stages

44 Optimal Number of ROM Entries

45 Optimal Number of ROM Entries

46 7. Conclusions

47 Summary
- Parallel CORDIC rotation (Para-CORDIC)
- Better latency/area
- Improves on the original sequential execution of CORDIC rotation
- Complete proof of the proposed theorems
Submission information:
- 2003/7/11 submitted
- 2004/4/21 fully accepted
- 2004/ published

48 Future Work
- Physical implementation of Para-CORDIC
- Dealing with negative numbers when performing carry-save addition
- Floating-point representation of data
- Reduced micro-rotation stages in MAR
- Parallel CORDIC vectoring methods (must deal with two concurrent variables)

49 Low-Error Fixed-Width Carry-Free Multipliers Design
( To appear in IEEE T-CAS II, 2005)

50 Definition
An n × n fixed-width multiplier:
- Has n most significant product bits
- Needs a small compensation circuit to generate error compensation value (ECV)
ECV types:
- Constant (fixed): simple implementation, large errors
- Adaptive (variable): complex implementation, lower errors
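The effect of a constant ECV can be measured on a toy model. The sketch below is an assumption-laden illustration, not the multiplier from the talk: it uses a plain unsigned array multiplier (no Booth encoding, no carry-free arithmetic), drops every partial-product bit a_i·b_j with i + j < n, and uses the rounded mean of the dropped part as a hypothetical constant ECV.

```python
def truncated_product(a, b, n):
    """High part of a*b in units of 2**n, summing only the partial-product
    bits a_i*b_j with i + j >= n (the low columns are never generated)."""
    total = 0
    for i in range(n):
        if (a >> i) & 1:
            total += b >> (n - i)   # bits b_j with j >= n - i, scaled by 2**-n
    return total

def constant_ecv(n):
    """Hypothetical constant ECV: rounded mean of the dropped columns,
    assuming uniformly distributed unsigned operands."""
    mean_dropped = ((n - 1) * 2 ** n + 1) / 4
    return round(mean_dropped / 2 ** n)

def mean_abs_error(n, ecv):
    """Average |exact - approximation| over all operand pairs, in units of 2**n."""
    total = 0.0
    for a in range(2 ** n):
        for b in range(2 ** n):
            total += abs(a * b / 2 ** n - (truncated_product(a, b, n) + ecv))
    return total / 4 ** n

n = 8
err_plain = mean_abs_error(n, 0)
err_comp = mean_abs_error(n, constant_ecv(n))
print(f"no compensation: {err_plain:.3f}; constant ECV = {constant_ecv(n)}: {err_comp:.3f}")
```

A constant removes the average bias but cannot follow the operands; an adaptive ECV derived from the most significant dropped column is the usual next step.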

51 An 8×8 Carry-Free Fixed-Width Multiplier using Modified Booth Encoding (MBE)
LPminor = others in truncated parts Mpost = truncates the bit after multiplication

52 Direct Implementation – Mdirect (only considers LPmajor)
The ECV is for n-bit accuracy RFA/RHA : Redundant Full/Half Adders

53 The Concept of Our Derivation of Compensation Circuits
Using the basic definition of MBE to obtain the probability that each partial product digit equals 1, -1, or 0. Previous works: same probability for each partial product.
Using statistical analysis to derive the relationship between LPminor and LPmajor. Previous works: only make use of LPmajor.
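A starting point for such a probability argument is the distribution of the Booth digits themselves. The sketch below enumerates radix-4 modified Booth digits under the assumption of uniformly random multiplier bits. It covers only the encoder digits, not the partial-product digit probabilities derived in the talk, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product

def booth_digit(b_hi, b_mid, b_lo):
    """Radix-4 Booth digit in {-2,...,2} from bits (b_{2i+1}, b_{2i}, b_{2i-1})."""
    return -2 * b_hi + b_mid + b_lo

def booth_encode(value, n):
    """Booth digits (least significant first) of an n-bit two's-complement value, n even."""
    bits = [(value >> k) & 1 for k in range(n)]
    digits, prev = [], 0            # b_{-1} is defined as 0
    for k in range(0, n, 2):
        digits.append(booth_digit(bits[k + 1], bits[k], prev))
        prev = bits[k + 1]
    return digits

# Digit frequencies over the eight equally likely bit triplets.
counts = Counter(booth_digit(*t) for t in product((0, 1), repeat=3))
for digit in sorted(counts):
    print(f"P(digit = {digit:+d}) = {counts[digit]}/8")

print(booth_encode(-77, 8))
```

The digits are not equally likely (0, +1 and -1 each occur twice in eight triplets, +2 and -2 once), which is why treating every partial-product digit as equiprobable loses accuracy.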

54 Derivation Process

55 Derivation of Compensation Value and Circuit

56 Probability of the Partial Product Digits After MBE

57 Derivation of Compensation Value and Circuit
The expected value can be derived by considering three conditions when (1)

58 Derivation of Compensation Value and Circuit
(2)

59 Derivation of Compensation Value and Circuit
(3)

60 Derivation of Compensation Value and Circuit
Combining (1)(2)(3), Using similar methods, we have

61 Our Proposed Low-Error Carry-Free Fixed-Width Multipliers
Half of partial products are reduced in the compensation circuit, LPmajor only

62 Previous Proposed Fixed-Width Multipliers
All are binary representations:
- [Kidambi 1996]: the ECV is a pre-determined constant
- [Jou 1999]: LPmajor to generate ECV
- [Van 2000]: program-based exhaustive search method to obtain ECV
- [Jou 2000]: MBE, similar to the direct implementation
- [Cho 2004]: LPmajor and LPminor are required to calculate the ECV

63 Comparisons of Previous Methods

64 Absolute Average Error Analysis and Variance Analysis

65 Area ratios of three kinds of BSD fixed-width multipliers

66 Quality Analysis of Fixed-Width Multiplications in JPEG Image Compressions

67 Summary Our proposed fixed-width multipliers
Lower average errors and variances Low-cost compensation circuits Can be applied to high-speed DSP applications

68 Future Research Topics
Chip Implementation of proposed CORDIC and fixed-width multipliers Low-power RNS multiplier design Automatic datapath synthesizer for embedded systems Design and analysis of high-speed dividers using proposed multipliers

69 Thank you very much, I love Dept. of IECS at Feng Chia!

