Published by Ashley Benedict Gardner. Modified over 6 years ago.
1
Para-CORDIC: Parallel CORDIC Rotation Algorithm and Architecture
(IEEE T-CAS I, Vol. 51, No. 8, pp , Aug. 2004) Tso-Bing Juang, Ph.D VLSI Design LAB, Dept. CSE, NSYSU
2
My Research – Computer Arithmetic
Applications of arithmetic components:
  DSP (Digital Signal Processing)
  3-D graphics
  Computer communications, etc.
Topics of arithmetic [Ercegovac 2004]:
  Addition/Subtraction
  Multiplication/Division
  Floating-point operations
  CORDIC (COordinate Rotation DIgital Computer)
3
International Conference
My Publications ( ) Topics SCI Journal International Conference Domestic Conference CORDIC 3 Multiplier 2 4 1 DCT
4
Academic Honors
Best thesis award, Xerox Co. Ltd, 1995
Attended Midwest Symposium on Circuits and Systems (MWSCAS), supported by NSC, 1999
First prize award, FPGA, National Intellectual Property Contest, 2000
First prize award, Full Custom Design Contest, 2001
Attended Asia-Pacific Conference on Circuits and Systems (APCCAS), supported by MOE, 2002
2005 Marquis Who's Who in Science and Engineering, Edition
2006 Marquis Who's Who in the World
5
Outline
1. Basic Concept of CORDIC
2. Bottleneck of CORDIC Rotation
3. Proposed Methods
4. Previous Methods
5. Comparisons
6. Applications
7. Conclusions
6
1. Basic Concept of CORDIC
7
What is CORDIC? CORDIC (COordinate Rotation DIgital Computer)
Rotate vector (1,0) by φ to get (cos φ, sin φ)
Can evaluate many arithmetic functions
Rotation realized by shift-add operations
Convergence method (iterative)
About n iterations for n-bit accuracy
Methods that we have seen so far:
  Table lookup: too much area
  Polynomial approx: too many multiply/adds
8
Conventional CORDIC Rotation
In each iteration, x and y perform one micro-rotation based on the sign of z.
9
CORDIC Functions
10
Pre-computation of tan(α_i)
Find α_i such that tan(α_i) = 2^-i (i.e., α_i = tan^-1(2^-i)).
Any angle can be written as φ = ±α_0 ± α_1 ± … ± α_n as long as -99.7° ≤ φ ≤ 99.7° (which covers -90°..90°).
Say we want to rotate (x,y) by only 7.1 degrees. How do we rewrite the equations in the box on the previous slide?
What if we want to rotate by 10.7 degrees? (note: 10.7 = ) What is the exact sequence of operations you perform?
What if 8.9 degrees ( )?
So, for example, to get 90 degrees, you have to add almost all the angles. For -90, subtract almost all.
Do we have to use ALL α_i's? (e.g., if you want to evaluate 71.6 degrees, adding the first two angles will do it.) Important question; we will get back to it later.
Why is it good news that we can cover -90°..90°?
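The elementary angles are easy to tabulate, which helps when working through the questions above by hand. The following is a minimal sketch, not part of the original slides; the function name `elementary_angles` is a hypothetical helper chosen for illustration.

```python
import math

def elementary_angles(n):
    """Return [alpha_0, ..., alpha_n] in degrees, where tan(alpha_i) = 2^-i."""
    return [math.degrees(math.atan(2.0 ** -i)) for i in range(n + 1)]

if __name__ == "__main__":
    angles = elementary_angles(8)
    for i, a in enumerate(angles):
        print(f"alpha_{i} = {a:7.3f} deg")
    # The sum of all angles bounds the range of phi that can be reached.
    print(f"sum = {sum(angles):.3f} deg")
```

Adding the first two entries of the table gives the 71.6 degrees mentioned above, and the running sum of the angles shows how the reachable range of φ is bounded.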
11
Conventional CORDIC Rotation
Algorithm: (z is the current angle) “At each step, try to make z approach zero”
Initialize x_0 = K (the scaling constant), y_0 = 0, z_0 = φ
For i = 0 to n
  d_i = 1 when z_i >= 0, else -1 [i.e., d_i = sign(z_i)]
  x_(i+1) = x_i - d_i 2^-i y_i
  y_(i+1) = y_i + d_i 2^-i x_i
  z_(i+1) = z_i - d_i α_i
End For
Result: x_(n+1) = cos(φ), y_(n+1) = sin(φ)
Precision: n bits
This is the basic CORDIC iteration.
Note: x_i, y_i shown on the right are misleading. The real axis is rotated by -φ degrees.
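The iteration above can be sketched in a few lines of Python. This is a floating-point illustration under stated assumptions, not the hardware datapath: multiplications by 2^-i stand in for shifts, and the names `cordic_gain` and `cordic_rotate` are hypothetical. The scaling constant K is computed as the product of 1/sqrt(1 + 2^-2i), which is the standard CORDIC gain correction.

```python
import math

def cordic_gain(n):
    """Scaling constant K = prod_{i=0..n} 1/sqrt(1 + 2^-2i) (about 0.60725)."""
    k = 1.0
    for i in range(n + 1):
        k /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return k

def cordic_rotate(phi, n=32):
    """Return approximately (cos(phi), sin(phi)) for phi in radians.

    phi must lie inside the convergence range (about +/-99.7 degrees).
    """
    alphas = [math.atan(2.0 ** -i) for i in range(n + 1)]  # pre-computed table
    x, y, z = cordic_gain(n), 0.0, phi
    for i in range(n + 1):
        d = 1 if z >= 0 else -1           # direction: drive z toward zero
        # Both updates use the old x and y, hence the tuple assignment.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * alphas[i]
    return x, y

if __name__ == "__main__":
    c, s = cordic_rotate(math.radians(30.0))
    print(c, s)
```

Note that every value of d depends on the sign of the z produced by the previous step, so the loop cannot be unrolled into independent iterations as written.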
12
Example (z_0 = φ = 30°)
The angles come from the table on the previous slide.
The figure shows only a few steps.
Note that the sign does not always alternate between + and - (we have two consecutive +'s).
13
CORDIC Hardware
What does each of the adders do?
What is the table lookup for? Not shown: logic for determining di
14
Three Important Factors of CORDIC
Large additions/subtractions
Scaling factor (constant vs. non-constant)
Sequential execution
15
Research Topics about CORDIC
Redundant CORDIC architecture
Error analysis of CORDIC
Application of CORDIC architectures
CORDIC algorithm with non-constant scaling factors
Parallel CORDIC architecture
16
2. Bottleneck of CORDIC Rotation
17
Conventional CORDIC Rotation (Revisited)
Sequential determination of σi based on zi
18
Sequential CORDIC Rotation Architecture
The actual speed bottleneck lies in the sequential determination of the value of σi.
19
3. Proposed Methods
20
How to parallelize?
Use each bit of the input angle to determine σi
Removes the bottleneck (B: bit accuracy)
In the first m-1 iterations: sequential
In the other iterations: parallel
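One way to see why only the early iterations need special treatment: for small angles, tan^-1(2^-i) is almost exactly 2^-i, so a bit of weight 2^-i in the input angle can stand in directly for the i-th micro-rotation. The sketch below assumes (this criterion is not stated on the slide) that an iteration can be driven directly by an angle bit once |2^-i - tan^-1(2^-i)| falls below 2^-B. The function name `first_parallel_index` is hypothetical.

```python
import math

def first_parallel_index(B):
    """Smallest i with |2^-i - atan(2^-i)| < 2^-B (assumed criterion)."""
    i = 0
    while abs(2.0 ** -i - math.atan(2.0 ** -i)) >= 2.0 ** -B:
        i += 1
    return i

if __name__ == "__main__":
    for B in (16, 24, 32):
        print(B, first_parallel_index(B))
```

Under this assumption B = 24 gives an index of 8, so the iterations below it are the ones that need the recoding techniques described next.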
21
Our Proposed Techniques
MAR (Micro-rotation to Angle Recoding)
  Obtain the combination of tan^-1 terms for each 2^-i, i = 1 to m-1
BBR (Binary to Bipolar Recoding)
  Obtain the polarity {-1,+1} of each binary {1,0} weight of the input angle
  Hardware free
For example, B = 24
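The BBR idea can be illustrated with the identity b·2^-i = 2^-(i+1) + (2b - 1)·2^-(i+1): each binary bit b in {0,1} becomes a bipolar digit r = 2b - 1 in {-1,+1} at half the weight, plus a constant offset that does not depend on the input. The sketch below is a numerical check of that identity, not the paper's circuit; the function names are hypothetical.

```python
def binary_to_bipolar(bits):
    """Recode bits b_1..b_N (weights 2^-1..2^-N) into bipolar digits.

    Returns (r, offset) such that
        sum(b_i * 2^-i) == offset + sum(r_i * 2^-(i+1)),
    with r_i = 2*b_i - 1 and offset = 2^-1 - 2^-(N+1).
    """
    r = [2 * b - 1 for b in bits]         # 1 -> +1, 0 -> -1: a relabeling only
    offset = 0.5 - 2.0 ** -(len(bits) + 1)
    return r, offset

def binary_value(bits):
    return sum(b * 2.0 ** -(i + 1) for i, b in enumerate(bits))

def bipolar_value(r, offset):
    return offset + sum(d * 2.0 ** -(i + 2) for i, d in enumerate(r))

if __name__ == "__main__":
    bits = [1, 0, 1, 1]                   # 0.1011 in binary = 0.6875
    r, offset = binary_to_bipolar(bits)
    print(r, offset, bipolar_value(r, offset))
```

Because the recoding only reinterprets each bit as a sign, it needs no logic of its own, which matches the "hardware free" remark above.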
22
Example (B=24) Three extra micro-rotation stages are required Phase 1
23
Architecture of a 24-bit CORDIC-based Sine/Cosine Generator
24
Algorithm of MAR
25
Our MAR Results
26
Our MAR Results
27
Para-CORDIC Architecture -1/2
28
Para-CORDIC Architecture -2/2
[Figure: Para-CORDIC architecture, with block labels S(1), S(5), S(8), σ1, R(1), R(i)]
29
Carry-save Adder-Based Realization for Micro-Rotation Stages
A 4:2 compressor is exploited to produce the carry save form (a sum and a carry)
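A carry-save representation keeps a number as a (sum, carry) pair so that no carry has to ripple across the word during accumulation. The sketch below models a 4:2 compressor as two cascaded 3:2 carry-save stages on non-negative Python integers. This is a common textbook construction and an assumption here; the slide does not show the gate-level design, and the function names are hypothetical.

```python
def carry_save_add(a, b, c):
    """3:2 stage: reduce three operands to (sum, carry) bitwise, no carry chain."""
    s = a ^ b ^ c                                # per-bit sum
    carry = ((a & b) | (a & c) | (b & c)) << 1   # per-bit majority, shifted left
    return s, carry

def compress_4_to_2(a, b, c, d):
    """4:2 compressor modeled as two cascaded 3:2 stages."""
    s1, c1 = carry_save_add(a, b, c)
    return carry_save_add(s1, c1, d)

if __name__ == "__main__":
    s, c = compress_4_to_2(13, 7, 9, 30)
    print(s, c, s + c)   # s + c equals 13 + 7 + 9 + 30
```

Only the final conversion from (sum, carry) to a single word needs a carry-propagate adder.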
30
Evaluation of the Z Datapath
Delay is: Area is:
31
The delay of Z Datapath
32
Merged Rotations of the Second Half Iterations
Delay savings
33
4. Previous Methods
34
Comments on Previously Proposed CORDIC Rotation - 1/4
[Wang 1997]: IEEE T-Computers
The first m-1 iterations are sequential
Area saving
35
Comments on Previously Proposed CORDIC Rotation - 2/4
[Phatak 1998]: IEEE T-Computers
Double hardware to perform clockwise/counterclockwise rotations
Area cost is high (signed-digit realization of X/Y/Z iterations)
36
Comments on Previously Proposed CORDIC Rotation - 3/4
[Kwak 2000]: Proc. MWSCAS
Complicated logic circuits to generate the first m-1 rotation directions
37
Comments on Previously Proposed CORDIC Rotation - 4/4
[Kuhlmann 2002]: EURASIP
Uses ROM to generate the first m-1 directions
38
Our Proposed Para-CORDIC
The delay and the area costs of Para-CORDIC are: and
39
5. Comparisons
40
Latency Comparisons
41
Area Comparisons
42
6. Applications
43
ROM-based Implementations for sine/cosine generation
When x_1 and y_1 are constant (x_1 = K, y_1 = 0, x_(B+1) = cos(φ), y_(B+1) = sin(φ))
Can reduce the extra micro-rotation stages
44
Optimal Number of ROM Entries
45
Optimal Number of ROM Entries
46
7. Conclusions
47
Summary
Parallel CORDIC rotation (Para-CORDIC)
  Improves the original sequential execution of CORDIC rotation
  Better latency/area
  Complete proof of the proposed theorems
Submission information
  2003/7/11 submitted
  2004/4/21 fully accepted
  2004/ published
48
Future Work
Physical implementation of Para-CORDIC
  Dealing with negative numbers when performing carry-save addition
  Floating-point representation of data
Reduced micro-rotation stages in MAR
Parallel CORDIC vectoring methods
  Must deal with two concurrent variables
49
Low-Error Fixed-Width Carry-Free Multipliers Design
(To appear in IEEE T-CAS II, 2005)
50
Definition
An n × n fixed-width multiplier:
  Has n most significant product bits
  Needs a small compensation circuit to generate an error compensation value (ECV)
ECV types:
  Constant: fixed value; simple implementation, large errors
  Adaptive: variable value; complex implementation, lower errors
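The trade-off behind the ECV can be illustrated on a plain unsigned array multiplier. The sketch below is a deliberately simplified model, not the Booth-encoded carry-free design discussed in these slides. `post_truncated` computes the full product and then drops the low n bits, while `direct_truncated` never forms the partial-product bits below weight 2^n and adds a constant `ecv` instead. All function names and the `ecv` parameter are hypothetical.

```python
def post_truncated(a, b, n):
    """Full 2n-bit product, truncated to its n most significant bits."""
    return (a * b) >> n

def direct_truncated(a, b, n, ecv=0):
    """Sum only partial-product bits of weight >= 2^n, then add a constant ECV."""
    acc = 0
    for i in range(n):
        for j in range(n):
            if i + j >= n and (a >> i) & 1 and (b >> j) & 1:
                acc += 1 << (i + j - n)
    return acc + ecv

def mean_abs_error(n, ecv=0):
    """Average |direct - post| over all pairs of n-bit unsigned inputs."""
    total = 0
    for a in range(1 << n):
        for b in range(1 << n):
            total += abs(direct_truncated(a, b, n, ecv) - post_truncated(a, b, n))
    return total / (1 << (2 * n))

if __name__ == "__main__":
    for ecv in (0, 1, 2):
        print(ecv, mean_abs_error(4, ecv))
```

Without compensation the direct form can only underestimate the post-truncated result, which is why a compensation value is added. A constant ECV is cheap but cannot follow the input-dependent part of the discarded columns.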
51
An 8×8 Carry-Free Fixed-Width Multiplier using Modified Booth Encoding (MBE)
LPminor = others in truncated parts
Mpost = truncates the bits after multiplication
52
Direct Implementation – Mdirect (only considers LPmajor)
The ECV is for n-bit accuracy
RFA/RHA: Redundant Full/Half Adders
53
The Concept of Our Derivation of Compensation Circuits
Use the basic definition of MBE to obtain the probability that each partial-product digit equals 1, -1, or 0.
  Previous works: assume the same probability for each partial product
Use statistical analysis to derive the relationship between LPminor and LPmajor.
  Previous works: only make use of LPmajor
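As a starting point for this kind of probability argument, the radix-4 modified Booth digit for a bit triplet (b_(2i+1), b_(2i), b_(2i-1)) is -2·b_(2i+1) + b_(2i) + b_(2i-1). The sketch below enumerates all eight triplets to get the distribution of the encoded digit, assuming independent, uniformly distributed multiplier bits (an assumption made here for illustration). It covers only the Booth digit itself, not the per-bit partial-product probabilities derived on the following slides, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product

def booth_digit(b_hi, b_mid, b_lo):
    """Radix-4 modified Booth digit in {-2, -1, 0, 1, 2} for one bit triplet."""
    return -2 * b_hi + b_mid + b_lo

def booth_digit_distribution():
    """Probability of each digit when the three bits are uniform and independent."""
    counts = Counter(booth_digit(*bits) for bits in product((0, 1), repeat=3))
    return {d: counts[d] / 8 for d in sorted(counts)}

if __name__ == "__main__":
    print(booth_digit_distribution())
```

The digits are not equally likely, so treating every partial product as having the same probability loses information.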
54
Derivation Process
55
Derivation of Compensation Value and Circuit
56
Probability of the Partial Product Digits After MBE
57
Derivation of Compensation Value and Circuit
The expected value can be derived by considering three conditions when (1)
58
Derivation of Compensation Value and Circuit
(2)
59
Derivation of Compensation Value and Circuit
(3)
60
Derivation of Compensation Value and Circuit
Combining (1)(2)(3), Using similar methods, we have
61
Our Proposed Low-Error Carry-Free Fixed-Width Multipliers
Half of partial products are reduced in the compensation circuit, LPmajor only
62
Previously Proposed Fixed-Width Multipliers
All are binary representations
[Kidambi 1996]: the ECV is a pre-determined constant
[Jou 1999]: uses LPmajor to generate the ECV
[Van 2000]: program-based exhaustive search method to obtain the ECV
[Jou 2000]: MBE, similar to the direct implementation
[Cho 2004]: LPmajor and LPminor are required to calculate the ECV
63
Comparisons of Previous Methods
64
Absolute Average Error Analysis and Variance Analysis
65
Area ratios of three kinds of BSD fixed-width multipliers
66
Quality Analysis of Fixed-Width Multiplications in JPEG Image Compression
67
Summary
Our proposed fixed-width multipliers:
  Lower average errors and variances
  Low-cost compensation circuits
  Can be applied to high-speed DSP applications
68
Future Research Topics
Chip implementation of the proposed CORDIC and fixed-width multipliers
Low-power RNS multiplier design
Automatic datapath synthesizer for embedded systems
Design and analysis of high-speed dividers using the proposed multipliers
69
Thank you very much, I love Dept. of IECS at Feng Chia!