FPGA Implementations for Volterra DFEs
Andreas Emeretlis, George Theodoridis
Outline (PCI 2014): Volterra Decision Feedback Equalizers; Hardware Architecture; Implementation Considerations; Experimental Results and Comparisons; Conclusions
Electronic Equalization in Optical Systems: Limited capacity of optical fibers. Channel impairments causing Intersymbol Interference (ISI): Chromatic Dispersion (CD), Polarization Mode Dispersion (PMD). Reduction of costly optical equalization. Implementation of complex DSP algorithms.
Decision Feedback Equalizer (DFE): General form: Feed-Forward Filter (FFF) for pre-cursor ISI; Feedback Filter (FBF) for post-cursor ISI; quantizer for the symbol decision; adder. Implementation challenges: pipelining the feedback loop; parallelism of the quantizer loop; non-linear filters (increased complexity, hardware resources).
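The general form above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes binary symbols in {0, 1}, symbol-spaced samples, a fixed decision threshold of 0.5, and a feedback output that is subtracted at the adder. The function and parameter names are hypothetical.

```python
def dfe_equalize(samples, ff_taps, fb_taps, threshold=0.5):
    """Symbol-spaced linear DFE for binary (0/1) symbols (illustrative sketch)."""
    decisions = []
    for n in range(len(samples)):
        # Feed-forward filter (FFF): weighted sum of current and past samples.
        y_f = sum(c * samples[n - k]
                  for k, c in enumerate(ff_taps) if n - k >= 0)
        # Feedback filter (FBF): weighted sum of previous decisions.
        y_b = sum(b * decisions[n - 1 - k]
                  for k, b in enumerate(fb_taps) if n - 1 - k >= 0)
        # Adder followed by the quantizer (symbol decision).
        z = y_f - y_b
        decisions.append(1 if z > threshold else 0)
    return decisions
```

The decision for symbol n depends on the decision for symbol n-1, which is why the feedback loop is the hard part to pipeline or parallelize in hardware.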
Linear vs Non-linear DFEs: [figure panels: Linear DFEs; Non-linear DFEs]
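For reference, the two filter families can be written in a common textbook form. The notation is an assumption rather than the slide's own: x(n) are received samples, Î(n) are decisions, Q[·] is the quantizer, c are feed-forward coefficients, b are feedback coefficients, and the feedback contribution is taken with a minus sign.

```latex
% Linear DFE
\hat{I}(n) = Q\!\left[\, \sum_{i} c_i\, x(n-i) \;-\; \sum_{j \ge 1} b_j\, \hat{I}(n-j) \right]

% Second-order Volterra (non-linear) DFE: each filter gains product terms
\hat{I}(n) = Q\!\left[\, \sum_{i} c_i\, x(n-i)
           + \sum_{i}\sum_{k \ge i} c_{ik}\, x(n-i)\, x(n-k)
           - \sum_{j \ge 1} b_j\, \hat{I}(n-j)
           - \sum_{j \ge 1}\sum_{l \ge j} b_{jl}\, \hat{I}(n-j)\, \hat{I}(n-l) \right]
```

The number of second-order terms grows quadratically with the filter length, which is the source of the added hardware complexity.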
Volterra Decision Feedback Equalizer: Direct detection introduces non-linear distortion, handled by 2nd-order Volterra filters (VDFE). Sensitivity to sampling phase is handled by fractional spacing: processing of 2 samples/symbol.
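A second-order Volterra filter adds pairwise products of the inputs to the usual linear sum. The sketch below shows one output computation and a helper that forms fractionally spaced input windows at 2 samples/symbol. The names `volterra2` and `fractional_windows`, and the dictionary layout for the quadratic kernel, are assumptions made for illustration.

```python
def volterra2(window, linear, quadratic):
    """Second-order Volterra output for one input window.

    linear[i] weights window[i]; quadratic[(i, j)] with i <= j
    weights the product window[i] * window[j].
    """
    y = sum(linear[i] * window[i] for i in range(len(window)))
    for (i, j), coeff in quadratic.items():
        y += coeff * window[i] * window[j]
    return y


def fractional_windows(samples, taps, sps=2):
    """Yield one window per symbol, most recent sample first.

    The window slides by `sps` samples, so with sps=2 each symbol
    is processed using two samples per symbol period.
    """
    for end in range(taps - 1, len(samples), sps):
        yield [samples[end - k] for k in range(taps)]
```

Feeding each window from `fractional_windows` into `volterra2` gives one feed-forward output per symbol.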
Feed-forward Transformations: Parallelism: unrolling the filter equation. Pipelining: registers between filter elements; synchronization registers.
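Unrolling the filter equation means writing it out for several consecutive outputs so that they can be computed side by side. The sketch below models this for a plain FIR filter: each pass of the outer loop stands for one clock cycle that produces `parallel` outputs. It illustrates the idea only; pipeline registers are not modeled, and the function names are hypothetical.

```python
def fir_serial(x, taps):
    """Reference FIR: one output per step."""
    return [sum(c * x[n - k] for k, c in enumerate(taps) if n - k >= 0)
            for n in range(len(x))]


def fir_unrolled(x, taps, parallel):
    """Block-parallel FIR: `parallel` outputs per modeled clock cycle."""
    out = []
    for base in range(0, len(x), parallel):      # one modeled clock cycle
        # In hardware each lane is a separate copy of the filter datapath;
        # the lanes have no dependency on each other and run concurrently.
        for lane in range(parallel):
            n = base + lane
            if n >= len(x):
                break
            out.append(sum(c * x[n - k]
                           for k, c in enumerate(taps) if n - k >= 0))
    return out
```

Because the feed-forward part has no feedback, the unrolled form gives the same results as the serial one at any parallelism level.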
Feedback Transformations: Loop precomputation: computational units in the FF part; multiplexer loop. Loop pipelining: lookahead; loop unrolling.

Precomputed feedback outputs (FB input: Î(n-1), Î(n-2), Î(n-3); FB output: yB(n)):

Î(n-1)  Î(n-2)  Î(n-3)  yB(n)
0       0       1       b33 = bp1
0       1       0       b22 = bp2
0       1       1       b22 + b23 + b33 = bp3
1       0       0       b11 = bp4
1       0       1       b11 + b13 + b33 = bp5
1       1       0       b11 + b12 + b22 = bp6
1       1       1       b11 + b12 + b13 + b22 + b23 + b33 = bp7

Product terms: J0(n) = Î(n-1) Î(n-1); J1(n) = Î(n-1) Î(n-2); J2(n) = Î(n-1) Î(n-3)
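Loop precomputation can be illustrated as follows, assuming binary decisions in {0, 1} and three feedback taps as on this slide. Every possible feedback output is computed ahead of time, a candidate decision is formed for each of them outside the loop, and the loop itself only selects one candidate with a multiplexer. The function names, the threshold, and the subtraction at the adder are assumptions of this sketch.

```python
from itertools import product


def precompute_feedback(b):
    """Return bp[0..7], indexed by the bits (Î(n-1), Î(n-2), Î(n-3)).

    b maps (i, j) with 1 <= i <= j <= 3 to the coefficient b_ij.
    """
    table = []
    for bits in product((0, 1), repeat=3):
        table.append(sum(c * bits[i - 1] * bits[j - 1]
                         for (i, j), c in b.items()))
    return table


def vdfe_feedback_loop(y_ff, table, threshold=0.5):
    """Decide symbols using precomputed candidates and a multiplexer."""
    history = (0, 0, 0)                      # Î(n-1), Î(n-2), Î(n-3)
    decisions = []
    for y in y_ff:
        # Outside the loop in hardware: one candidate per feedback value.
        candidates = [1 if y - bp > threshold else 0 for bp in table]
        # Inside the loop: only a multiplexer driven by past decisions.
        index = history[0] * 4 + history[1] * 2 + history[2]
        d = candidates[index]
        decisions.append(d)
        history = (d, history[0], history[1])
    return decisions
```

The cost is area: the number of candidates doubles with every additional feedback tap.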
Feedback Architectures – Area Reduction: Straightforward Approach vs. Incremental Processing Approach [figure: pipeline diagrams labelled N, L, L-1 stages, L-2 stages, L-3 stages, L-N]
Employed FPGA Platform (1/2): Configurable logic architecture. Configurable Logic Blocks (CLBs), interconnected via a switch matrix. CLB: 2 slices. Slice: 4 look-up tables, carry computation chain, 8 flip-flops. Drawbacks: predefined geometry; high routing delay; no 100% occupation of each slice.
Employed FPGA Platform (2/2): Hardcore DSP logic architecture. On-chip hardwired modules: low area occupation; high-speed implementation of DSP algorithms; dedicated high-speed interconnection resources. DSP48E1 slice: 25 × 18-bit multiplier; 48-bit accumulator; bypass multiplexers; SIMD adder; internal pipeline registers; cascading I/O ports.
Implementation Considerations: Wordlength: Input: 7 bits (6 fractional). Volterra inputs: 9 bits (8 fractional). Coefficients: 13 bits (12 fractional). Datapath: 14 bits (13 fractional).
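The effect of these wordlengths can be explored with a small fixed-point quantizer. The sketch assumes signed two's-complement values (so each format has one sign bit), rounding to nearest, and saturation on overflow; the slide does not state the rounding or overflow mode, and the names `quantize` and `FORMATS` are hypothetical.

```python
# (total bits, fractional bits) as listed on the slide.
FORMATS = {
    "input": (7, 6),
    "volterra_input": (9, 8),
    "coefficient": (13, 12),
    "datapath": (14, 13),
}


def quantize(value, total_bits, frac_bits):
    """Return the real value represented after fixed-point quantization.

    Assumes signed two's complement, round-to-nearest (Python's round,
    which resolves exact ties to even), and saturation at the range limits.
    """
    scale = 1 << frac_bits
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    code = max(lowest, min(highest, round(value * scale)))
    return code / scale
```

For example, `quantize(0.3, *FORMATS["input"])` returns the nearest value representable on the 7-bit input grid, whose step is 1/64.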
Implementation Considerations: FIR filters: pipelined Mul-Add modules; adder cascades. Volterra kernel: pipelined standalone adders; fabric interconnection of DSP slices. Pre-computation stage: manual SIMD mode (3 × 14 bits).
Experimental Results

Straightforward Approach
Device  Speed [Gb/s]  Parallel/Pipeline Level  Freq. [MHz]  Area [Slices]  Area [DSPs]
V6      5             12/37                    417          3,781          744
V6      7             18/43                    405          5,812          1,116
V6      10            25/50                    400          9,941          1,550
V7      5             11/27                    463          3,021          682
V7      7             16/30                    443          4,962          992
V7      10            24/31                    428          8,047          1,488

Incremental Processing Approach
Device  Speed [Gb/s]  Parallel/Pipeline Level  Freq. [MHz]  Area [Slices]  Area [DSPs]
V6      5             12/35                    419          3,537          744
V6      7             17/49                    418          4,646          1,054
V6      10            24/70                    417          6,324          1,488
V7      5             10/42                    503          3,068          620
V7      7             15/38                    467          4,543          930
V7      10            24/41                    430          6,911
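The reported speeds are consistent with a simple relation between parallelism level and clock frequency. The check below assumes each parallel lane delivers one bit per clock cycle, which is an interpretation of the figures rather than something stated on the slide; `throughput_gbps` is a hypothetical helper.

```python
def throughput_gbps(parallel_level, freq_mhz):
    """Throughput in Gb/s, assuming one bit per lane per clock cycle."""
    return parallel_level * freq_mhz / 1000.0


# Parallelism level and frequency taken from the V6 straightforward rows.
for level, freq in [(12, 417), (18, 405), (25, 400)]:
    print(level, freq, throughput_gbps(level, freq))
```

Under this assumption a higher achievable clock frequency lets the same target speed be met with fewer parallel lanes, and therefore fewer DSP slices.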
Experimental Results: Performance Comparison [charts: Straightforward Approach; Incremental Processing Approach]
Experimental Results: DSP Utilization Comparison [charts: Straightforward Approach; Incremental Processing Approach]
Conclusions: Performance is not predictable. Important progress of reconfigurable technology. Efficiency of hardwired modules. FPGA is a suitable platform for high-speed communications: 10 Gb/s with ~50% of DSPs; 17 Gb/s with ~100% of DSPs.
Thank you for your attention. Questions?