Published by Tracey Byrd. Modified over 9 years ago.
1
Datapath Designs. CK Cheng, CSE Department, UC San Diego
2
Prefix Adder – Well-known and Well-developed? Classic prefix networks: Sklansky, Kogge-Stone, Brent-Kung, Ladner-Fischer, Han-Carlson, Knowles etc.
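As an illustration of one of the classic networks named above, here is a minimal Python sketch of a Kogge-Stone prefix adder working on generate/propagate bits. The function name `kogge_stone_add` and the `width` parameter are hypothetical names chosen for this sketch; it models only the logic, not timing, fanout, or wiring.

```python
def kogge_stone_add(a, b, width=8):
    """Add two unsigned `width`-bit integers with a Kogge-Stone prefix network.

    Returns (sum including carry-out, number of prefix levels used).
    """
    # Bit-level generate (g) and propagate (p) signals.
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]

    G, P = g[:], p[:]
    dist, levels = 1, 0
    while dist < width:
        new_G, new_P = G[:], P[:]
        for i in range(dist, width):
            # Prefix operator: combine span ending at i with span ending at i - dist.
            new_G[i] = G[i] | (P[i] & G[i - dist])
            new_P[i] = P[i] & P[i - dist]
        G, P = new_G, new_P
        dist *= 2
        levels += 1

    # Carry into bit i is the group generate of bits i-1..0 (carry-in is 0).
    total = 0
    for i in range(width):
        carry_in = G[i - 1] if i > 0 else 0
        total |= (p[i] ^ carry_in) << i
    total |= G[width - 1] << width  # carry-out
    return total, levels
```

For an 8-bit adder the loop runs with distances 1, 2, and 4, so the network has three prefix levels; the other networks in the list trade that depth against fanout and wire tracks.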
3
Prefix Adder – New Respects, New Method
Realistic design considerations: timing, power, and area.
Integer linear programming for prefix adders:
–Logical effort timing model (gate capacitance + wire capacitance)
–Activity-statistic power model
–Non-uniform signal arrival/required times
[Figure labels: Logic Levels, Max Fanouts, Max Wire Tracks; Timing, Power, Area]
4
Prefix Adder – Optimum Prefix adders
Uniform signal arrival/required times
[Figure panels: Sklansky adder; Kogge-Stone adder; fastest depth-4 optimal prefix adder; fastest depth-3 optimal prefix adder]
5
Prefix Adder – Optimum Prefix adders Uniform signal arrival/required times
6
Prefix Adder – Optimum Prefix adders
Non-uniform signal arrival/required times
[Figure panels: increasing signal arrival times; decreasing signal arrival times; convex signal arrival times]
7
Division – Iteration effort
Pencil-and-paper method ($A = Q \cdot B + 2^{-n} R$ and $R < B$): 1 bit of partial quotient per iteration, n iterations.
Example: A = 0.1001, B = 0.1010; Q = A / B; Q = 0.1101
$Q_i$: partial quotient; $R_i$: partial remainder; $R_{i+1} = R_i - B \cdot Q_i$
$Q_1$ = 0.1, $Q_2$ = 0.01, $Q_3$ = 0.000, $Q_4$ = 0.0001
[Figure: long-division layout showing the partial remainders $R_0 = A$, $R_1$, $R_2$, $R_3$, $R_4$]
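The one-bit-per-iteration scheme can be sketched in Python as a restoring shift-and-subtract loop. This is an illustrative sketch, not the slide's own code: the function name `restoring_divide` and the integer encoding (numerators of fractions that share one power-of-two denominator) are assumptions made here.

```python
def restoring_divide(a, b, n):
    """Return (q, r) with a * 2**n == q * b + r and 0 <= r < b.

    a and b are numerators of two fractions over the same power-of-two
    denominator, with 0 <= a < b, so the quotient is a fraction below 1
    and q holds its first n fractional bits.
    """
    if not 0 <= a < b:
        raise ValueError("expected 0 <= a < b")
    q, r = 0, a
    for _ in range(n):
        r <<= 1          # shift the partial remainder left by one bit
        q <<= 1
        if r >= b:       # trial subtraction succeeds: quotient bit is 1
            r -= b
            q |= 1
    return q, r
```

With operands chosen here for illustration, `restoring_divide(0b10000010, 0b10100000, 4)` returns `(0b1101, 0)`: four iterations yield the four quotient bits 1, 1, 0, 1 and a zero remainder, which satisfies $A = Q \cdot B + 2^{-n} R$.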
8
Division – Memory effort
A lookup table is the simplest way to obtain multiple partial quotient bits in each iteration.
SRT method: a lookup table stores m-bit partial quotients selected by m bits of the partial remainder and m bits of the divisor.
Table size: $2^{2m} \cdot m$ bits.
The SRT method is limited by the memory wall.
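To make the memory wall concrete, the following short Python sketch evaluates the table size for a few values of m, reading the slide's size expression as $2^{2m}$ entries (2m address bits) of m bits each. The function name `srt_table_bits` is a hypothetical name used only here.

```python
def srt_table_bits(m):
    """Bits in a table addressed by m remainder bits and m divisor bits,
    where each entry holds an m-bit partial quotient."""
    entries = 2 ** (2 * m)   # one entry per (remainder, divisor) bit pattern
    return entries * m


if __name__ == "__main__":
    for m in (2, 4, 8, 12, 16):
        print(f"m = {m:2d}: {srt_table_bits(m):,} bits")
```

Each extra quotient bit per iteration multiplies the number of entries by four, which is why the table-based approach stops scaling after a few bits.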
9
Division – Arithmetic effort
The partial quotient is calculated by arithmetic functions.
Methods: prescaling, Taylor expansion, series expansion.
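The formulas for this slide did not survive in the transcript. As a stand-in, the sketch below uses the standard geometric-series identity $1/B = 1/(1 - x) = 1 + x + x^2 + \dots$ with $x = 1 - B$; it is an assumption that this matches the slide's series expansion, and the name `series_divide` is hypothetical. Exact rational arithmetic is used so that the truncation error is visible.

```python
from fractions import Fraction


def series_divide(a, b, terms):
    """Approximate a / b as a * (1 + x + x**2 + ...), x = 1 - b, using `terms` terms.

    Converges when |1 - b| < 1, and faster the closer b is to 1.
    """
    x = 1 - b
    total = Fraction(0)
    power = Fraction(1)
    for _ in range(terms):
        total += power
        power *= x
    return a * total


if __name__ == "__main__":
    a, b = Fraction(1, 2), Fraction(3, 4)
    for k in (1, 2, 4, 8):
        approx = series_divide(a, b, k)
        print(k, float(approx), float(a / b - approx))
```

After k terms the remaining error is exactly $(A/B)\,x^k$, so a divisor close to 1 (small x) needs very few terms.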
10
Division – Solution space
Modern FPGAs contain plenty of memory and built-in multipliers, which enable high-performance dividers.
[Figure: solution space with axes Iteration Effort, Memory Effort, and Arithmetic Effort; labels: Pencil-and-paper, SRT, Memory Wall, Prescaling, Taylor Expansion, Series Expansion, Low area, Low latency, Our target]
11
Division – PST algorithm
Use the power of series expansion, which needs a good starting point.
Prescaling provides a scaled divisor close to 1.
A 0-order Taylor expansion iterates to reach the final quotient.
12
Division – PST algorithm
$E_0 = \mathrm{Table}(B^{(m)}) \approx 1/B$
$A_1 = A \times E_0$; $B_1 = B \times E_0$
$E_1 = (2 - B_1) \approx \mathrm{INV}(B_1^{(2m)})$
$Q_i = R_{i-1} \times E_1$
$R_i = R_{i-1} - Q_i \times B_1$
$Q = Q + Q_i$
Example:
A = 0.1011,0110
B = 0.1100,1011
$B^{(m)}$ = 0.1100
$E_0$ = 1.0011
$E_1 = \mathrm{INV}(B_1^{(2m)})$ = 1.0000,1110
$A_1 = A \times E_0$ = 0.1101,1000,0010
$B_1 = B \times E_0$ = 0.1111,0001,0001
$Q_1 = A_1 \times E_1$ = 0.1110,0011
$R_1 = A_1 - Q_1 \times B_1$ = 0.0000,0010,0101,1110,1101
$Q_2 = R_1 \times E_1$ = 0.1001,1111
$R_2 = R_1 - Q_2 \times B_1$ = 0.0000,0001,1111,1011,0001
Q = 0.1110,0011 + 0.0000,0010,0111,11 = 0.1110,0101,0111,11
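The recurrence on this slide can be traced with exact rational arithmetic in Python. This is a sketch under stated assumptions rather than the authors' implementation: the names `pst_divide` and `truncate` are hypothetical, INV is modeled as the one's complement of the top 2m fractional bits of $B_1$ with a 1 in the integer position, $B_1$ is assumed to be below 1, and the quotient widths per iteration (8 and 14 fractional bits) are inferred from the values printed in the example.

```python
from fractions import Fraction
from math import floor


def truncate(x, bits):
    """Truncate a non-negative Fraction to `bits` fractional bits."""
    return Fraction(floor(x * 2**bits), 2**bits)


def pst_divide(A, B, E0, m, quotient_bits):
    """Trace the prescale-then-iterate recurrence; returns (Q, final remainder).

    E0 is the prescaling factor (the table output), assumed given.
    quotient_bits lists the truncation width used for each partial quotient.
    """
    A1, B1 = A * E0, B * E0                       # prescaling step
    top = floor(B1 * 2**(2 * m))                  # top 2m fractional bits of B1 (B1 < 1 assumed)
    mask = 2**(2 * m) - 1
    E1 = 1 + Fraction(~top & mask, 2**(2 * m))    # one's complement, approximates 2 - B1
    Q, R = Fraction(0), A1                        # R_0 = A_1
    for bits in quotient_bits:
        Qi = truncate(R * E1, bits)               # Q_i = R_{i-1} * E_1
        R = R - Qi * B1                           # R_i = R_{i-1} - Q_i * B_1
        Q += Qi                                   # Q = Q + Q_i
    return Q, R


if __name__ == "__main__":
    A = Fraction(0b10110110, 2**8)
    B = Fraction(0b11001011, 2**8)
    E0 = Fraction(0b10011, 2**4)                  # 1.0011, taken from the example
    Q, R = pst_divide(A, B, E0, m=4, quotient_bits=[8, 14])
    print(format(Q.numerator, "014b"), float(Q), float(A / B))
```

Run on the slide's operands, the sketch gives $Q_1$ = 0.1110,0011 after one iteration and Q = 0.1110,0101,0111,11 after two, with $R_1 = A_1 - Q_1 \times B_1$ matching the printed 0.0000,0010,0101,1110,1101.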
13
Division – FPGA Implementation
The PST algorithm is suitable for high-performance division unit design in FPGAs.
32-bit division with 5-cycle latency.
Columns: Fmax (period); ALUTs; memory bits; DSP blocks; power consumption (dynamic + static); throughput.
IP Core (no DSP): 50.16 MHz (19.935 ns); ALUTs, memory bits, and DSP blocks 1203840 (digits run together in the source); 381 mW (52 mW + 329 mW); 50.16 Mdiv/s
PST (DSP): 72.8 MHz (13.737 ns); 213; 768; 28; 350 mW (23 mW + 327 mW); 24.3 Mdiv/s
PST (no DSP): 73.20 MHz (13.661 ns); 1437; 768; 0; 378 mW (50 mW + 328 mW); 24.4 Mdiv/s
PST-pipelined (DSP): 74.15 MHz (13.486 ns); 261; 768; 40; 344 mW (17 mW + 327 mW); 74.15 Mdiv/s
PSTp (no DSP): 76.05 MHz (13.150 ns); 1940; 768; 0; 359 mW (31 mW + 328 mW); 76.05 Mdiv/s