More Realistic 16-Tap FIR
  Presented by: Lihua DONG, Deyan LIU
Overview
  FIR Architecture
  Original Design
  Arithmetic Improvement
    Parallel Multiplication
    Tree Addition
    Carry-Save-Adder
  Implemented Datapath
  Simulation Waveform
  Synthesis Results
FIR Architecture
  FIR ASIC Design Overview
FIR Architecture (cont’d)
  FIR Basic Structure
Original Design
  Sequential Arithmetic Operations on “+” and “*”
    acc <= rin * c0;
    acc <= rs1 * c1 + acc;
    acc <= rs2 * c2 + acc;
    ...
    acc <= rs15 * c15 + acc;
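The chained updates above can be sketched as a small behavioral model. This is only an illustration in Python, not the presenters' HDL: the function name `fir_sequential` and the sample and coefficient values are assumptions made for the example. It shows that every step needs the previous `acc`, so the 16 multiply-add steps cannot overlap.

```python
def fir_sequential(rin, rs, c):
    """Model the original design: one multiply and one add per step."""
    taps = [rin] + list(rs)            # rin, rs1 .. rs15
    acc = taps[0] * c[0]               # acc <= rin * c0
    for k in range(1, len(c)):
        acc = taps[k] * c[k] + acc     # acc <= rsk * ck + acc (needs previous acc)
    return acc


coeffs = list(range(1, 17))            # hypothetical coefficients c0..c15
samples = [1] * 16                     # hypothetical rin, rs1..rs15
print(fir_sequential(samples[0], samples[1:], coeffs))  # prints 136
```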
Arithmetic Improvement
  Observation
    “+” and “*” are interleaved redundantly
    NO data dependency among the “*” operations
      rin * c0;
      rs1 * c1;
      rs2 * c2;
      ...
      rs15 * c15;
Arithmetic Improvement (cont’d)
  Improvement Strategy I
    Partition on “+” and “*”
    16 Parallel “*”
      rin <= sample;
      tmp0 <= sample * c0;
      tmp1 <= rs1 * c1;
      tmp2 <= rs2 * c2;
      tmp3 <= rs3 * c3;
      ...
      tmp15 <= rs15 * c15;
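Strategy I can be illustrated with a short Python sketch, assuming the same register names as the slide (`sample`, `rs1..rs15`, `c0..c15`). The helper `parallel_products` is a hypothetical name. Each product depends only on its own tap and coefficient, which is what lets the hardware compute all 16 at once.

```python
def parallel_products(sample, rs, c):
    """Compute tmp0..tmp15; no product depends on any other product."""
    taps = [sample] + list(rs)         # sample feeds tmp0, rs1..rs15 feed the rest
    return [x * ck for x, ck in zip(taps, c)]


tmp = parallel_products(2, [1] * 15, [3] * 16)   # hypothetical values
print(tmp[0], tmp[1], len(tmp))
```

The additions are deferred to a separate step that consumes the `tmp` values.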
Arithmetic Improvement (cont’d)
  Critical Path
    A single, very long “+” statement
      result <= tmp0 + tmp1 + ... + tmp15;
Arithmetic Improvement (cont’d)
  Improvement Strategy II
    Partition on level of “+”
    Tree-structured “+”
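A tree-structured addition can be sketched as pairwise reduction. The Python below is an illustrative model, with `tree_sum` as a hypothetical helper that assumes a non-empty input. It returns the total together with the number of adder levels: for 16 products the tree has 4 levels, compared with 15 adders in a row when the sum is written as one chain.

```python
def tree_sum(values):
    """Add values pairwise, level by level; return (total, number_of_levels)."""
    level = list(values)               # assumes at least one value
    depth = 0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:             # an odd element passes through unchanged
            nxt.append(level[-1])
        level = nxt
        depth += 1
    return level[0], depth


print(tree_sum(range(1, 17)))          # prints (136, 4)
```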
Arithmetic Improvement (cont’d)
  Improvement Strategy III
    Carry-Save-Adder (CSA)
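As a reminder of how a carry-save adder works, here is a minimal Python sketch on non-negative integers. A CSA compresses three operands into a sum word and a carry word without propagating carries, so only the final addition needs a full carry-propagate adder. The names `csa` and `csa_sum` are hypothetical, and the way the slide's design arranges its CSAs is not stated, so the simple reduction order here is an assumption.

```python
def csa(a, b, c):
    """3:2 compressor: a + b + c == s + carry, with no carry propagation."""
    s = a ^ b ^ c                                  # bitwise sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1     # majority bits, shifted left
    return s, carry


def csa_sum(values):
    """Reduce the operands to two words with CSAs, then do one ordinary add."""
    ops = list(values)                 # non-negative integers assumed
    while len(ops) > 2:
        a, b, c = ops.pop(), ops.pop(), ops.pop()
        ops.extend(csa(a, b, c))       # three operands in, two out
    return sum(ops)                    # the single carry-propagate addition


print(csa(5, 6, 7))                    # prints (4, 14)
print(csa_sum(range(1, 17)))           # prints 136
```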
Implemented Datapath
  Combinational Logic for Addition
  Sequential Logic for Multiplication
  2-State FIR Filter Design
    1st state: Data-waiting & Multiplication
    2nd state: Addition & Register-shifting
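The two-state behavior can be modeled cycle by cycle. The Python class below is a rough sketch under assumptions: the class name `TwoStateFir`, the `clock` method, and the zero-initialized registers are all hypothetical, and the real design's signal names and reset behavior are not given on the slide. State 1 waits for a sample and registers the products; state 2 adds them and shifts the delay line.

```python
class TwoStateFir:
    """Behavioral model of a 2-state FIR: multiply in state 1, add and shift in state 2."""

    def __init__(self, coeffs):
        self.c = list(coeffs)
        self.regs = [0] * len(self.c)  # rin, rs1, rs2, ... (assumed reset to 0)
        self.tmp = [0] * len(self.c)   # registered products tmp0, tmp1, ...
        self.state = 1

    def clock(self, sample=None):
        """Advance one clock cycle; return a result only at the end of state 2."""
        if self.state == 1:
            if sample is None:         # data-waiting: stay in state 1
                return None
            self.regs[0] = sample      # rin <= sample
            self.tmp = [x * ck for x, ck in zip(self.regs, self.c)]
            self.state = 2
            return None
        result = sum(self.tmp)         # addition
        self.regs = [0] + self.regs[:-1]   # register shifting: rs1 <= rin, ...
        self.state = 1
        return result


fir = TwoStateFir([4, 5, 6])           # hypothetical 3-tap coefficients
outputs = []
for x in [1, 0, 0]:                    # unit impulse
    fir.clock(x)                       # state 1: accept sample, multiply
    outputs.append(fir.clock())        # state 2: add, shift, output
print(outputs)                         # prints [4, 5, 6]
```

In this model the output for a sample appears on the cycle right after the sample is accepted.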
Simulation Waveform
  Only ONE Cycle Input-Output Delay
Synthesis Results
  One Clock Cycle == 6ns
  Clock Frequency == 167MHz
  Total Cell Area ==
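The clock frequency follows directly from the clock period. As a quick check in Python (the helper name `period_to_mhz` is hypothetical):

```python
def period_to_mhz(period_ns):
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1e3 / period_ns             # 1 / (period_ns * 1e-9 s) = 1e3 / period_ns MHz


print(round(period_to_mhz(6)))         # prints 167
```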