Design and Implementation of Turbo Decoder for 4G standards IEEE 802.16e and LTE
Syed Z. Gilani
Motivation
Conventional serial decoding architectures can be a performance bottleneck
– Example: 6144-bit block, 8 iterations at 250 MHz, 1 bit processed per cycle
– Data rate < 6144 / (6144 * 8 * 4 ns), i.e. ~31 Mbps
Data rates for LTE can be 100 Mbps to 300 Mbps
A parallel architecture is necessary to support high-throughput decoding
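The bound on this slide can be reproduced with a few lines of arithmetic. The sketch below follows the slide's simplified model only (one bit per cycle, every iteration sweeps the whole block once); the function name and the `bits_per_cycle` parameter are illustrative and not part of the original design.

```python
def serial_throughput_bps(block_bits, iterations, clock_hz, bits_per_cycle=1):
    """Upper bound on decoded bits per second for a serial decoder.

    Simplified model: each iteration visits every bit of the block once,
    and the decoder handles `bits_per_cycle` bits per clock cycle.
    """
    cycles_per_block = block_bits * iterations / bits_per_cycle
    seconds_per_block = cycles_per_block / clock_hz
    return block_bits / seconds_per_block


if __name__ == "__main__":
    rate = serial_throughput_bps(block_bits=6144, iterations=8, clock_hz=250e6)
    print(f"{rate / 1e6:.2f} Mbps")  # about 31.25 Mbps
```

In this model the block length cancels out, so the bound is simply the clock frequency divided by the number of iterations. Raising throughput therefore requires processing more than one bit per cycle.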
Turbo Decoder Overview
Maximum a posteriori (MAP) algorithm
– Alpha
– Beta
– Gamma
– LLR
(De)Interleaver
– LTE: $P(i) = (f_1 i + f_2 i^2) \bmod N$
– WiMAX: switch (i mod 4)
    case 0: $P(i) = (P_0 i + 1) \bmod N$
    case 1: $P(i) = (P_0 i + 1 + N/2 + P_1) \bmod N$
    case 2: $P(i) = (P_0 i + 1 + P_2) \bmod N$
    case 3: $P(i) = (P_0 i + 1 + N/2 + P_3) \bmod N$
Optimizations
– Resource sharing
– Retiming
– Look-ahead transformation
– Variable and adaptive parallelism
– Multiplierless interleaver
Parallelization
[Figure: States versus Time (cycles); labels PE 1, PE 2, PE 3, PE 4]
Variable Parallelization
[Figure: Parallel Interleaver with two memory banks (Bank 0, Bank 1); Coded Bits in, Decoded Bits out]
Variable Parallelization
[Figure: Parallel Interleaver with four memory banks (Bank 0 to Bank 3); Coded Bits in, Decoded Bits out]
Interleaver Optimization
Interleaving functions
– LTE: $P(i) = (f_1 i + f_2 i^2) \bmod N$
– WiMAX: switch (i mod 4)
    case 0: $P(i) = (P_0 i + 1) \bmod N$
    case 1: $P(i) = (P_0 i + 1 + N/2 + P_1) \bmod N$
    case 2: $P(i) = (P_0 i + 1 + P_2) \bmod N$
    case 3: $P(i) = (P_0 i + 1 + N/2 + P_3) \bmod N$
Unoptimized memory requirements
– Don't want to use multipliers and dividers
– Storing all memory addresses in RAM
– LTE alone supports 188 different block lengths with different interleaving parameters
– Block lengths vary from 40 bits to 6144 bits
Interleaver Optimization
On-the-fly address generation
LTE interleaving function
– $P(i) = (f_1 i + f_2 i^2) \bmod N$
– $P(i+1) = (f_1 (i+1) + f_2 (i+1)^2) \bmod N = (P(i) + f_1 + f_2 + 2 f_2 i) \bmod N$
– The increment $g(i) = (f_1 + f_2 + 2 f_2 i) \bmod N$ is itself updated by addition: $g(i+1) = (g(i) + 2 f_2) \bmod N$
WiMAX interleaving function
– switch (i mod 4)
    case 0: $P(i) = (P_0 i + 1) \bmod N$
    case 1: $P(i) = (P_0 i + 1 + N/2 + P_1) \bmod N$
    case 2: $P(i) = (P_0 i + 1 + P_2) \bmod N$
    case 3: $P(i) = (P_0 i + 1 + N/2 + P_3) \bmod N$
– $P(i+1) = (P_0 i + P_0 + \text{constant factor}) \bmod N$
Whenever a sum reaches or exceeds N, replace it by its residue with a single subtraction, avoiding a mod N operation
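The recursions above can be sketched in software as follows. This is a minimal illustration, not the hardware design from the slides: the function names are hypothetical, and it assumes the parameters satisfy 0 <= f1, f2, P0 < N so that one compare-and-subtract is always enough.

```python
def wrap(x, n):
    """Reduce x from [0, 2n) to [0, n) with one compare-and-subtract."""
    return x - n if x >= n else x


def lte_addresses(f1, f2, n):
    """Yield P(0), ..., P(n-1) for P(i) = (f1*i + f2*i^2) mod n.

    The loop uses only additions and conditional subtractions.
    Assumes 0 <= f1, f2 < n.
    """
    p = 0                      # P(0) = 0
    g = wrap(f1 + f2, n)       # g(0) = (f1 + f2) mod n
    step = wrap(f2 + f2, n)    # g(i+1) - g(i) = 2*f2 mod n
    for _ in range(n):
        yield p
        p = wrap(p + g, n)     # P(i+1) = (P(i) + g(i)) mod n
        g = wrap(g + step, n)  # g(i+1) = (g(i) + 2*f2) mod n


def wimax_addresses(p0, p1, p2, p3, n):
    """Yield P(0), ..., P(n-1) for the four-case WiMAX function.

    The per-case constants are reduced mod n once per block size (setup),
    so the per-address work is one addition and one conditional subtraction.
    Assumes 0 <= p0 < n and n even.
    """
    offsets = [
        1 % n,
        (1 + n // 2 + p1) % n,
        (1 + p2) % n,
        (1 + n // 2 + p3) % n,
    ]
    base = 0  # running value of (p0 * i) mod n
    for i in range(n):
        yield wrap(base + offsets[i & 3], n)  # i & 3 is i mod 4
        base = wrap(base + p0, n)


if __name__ == "__main__":
    # f1 = 3, f2 = 10 is assumed here to be the 3GPP TS 36.212 entry for a
    # 40-bit block; verify against the specification before relying on it.
    print(list(lte_addresses(3, 10, 40))[:5])  # [0, 13, 6, 19, 12]
```

In hardware, `wrap` corresponds to a comparator and a subtractor, and `i & 3` to a 2-bit counter, so no multiplier or divider is needed in the address path.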
Interleaver Optimization
[Two tables with columns: PE, i, P(i), Bank Add., Bit Add.]
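The PE, i, P(i), bank address and bit address columns can be regenerated with a short sketch. The mapping used here is an assumption, since the slides do not spell it out: the block is split into equal windows, PE k reads index i = k*W + t at step t, the bank is P(i) divided by the window length W, and the bit address is the remainder. The function names are hypothetical, and the direct formula is used for brevity rather than the multiplierless recursion.

```python
def qpp(i, f1, f2, n):
    """Direct LTE interleaving function P(i) = (f1*i + f2*i^2) mod n."""
    return (f1 * i + f2 * i * i) % n


def bank_schedule(f1, f2, n, num_pes):
    """Return, per time step, a list of (pe, i, P(i), bank, bit_address).

    Assumes num_pes divides n. Each PE owns one window of n // num_pes
    consecutive indices, and each bank holds one window of addresses.
    """
    window = n // num_pes
    schedule = []
    for t in range(window):
        row = []
        for pe in range(num_pes):
            i = pe * window + t
            addr = qpp(i, f1, f2, n)
            row.append((pe, i, addr, addr // window, addr % window))
        schedule.append(row)
    return schedule


def is_contention_free(schedule):
    """True if no two PEs access the same bank in the same time step."""
    return all(
        len({bank for (_, _, _, bank, _) in row}) == len(row)
        for row in schedule
    )


if __name__ == "__main__":
    # N = 40 with f1 = 3, f2 = 10 is an assumed example parameter set.
    for num_pes in (2, 4):
        sched = bank_schedule(3, 10, 40, num_pes)
        print(num_pes, "PEs, contention free:", is_contention_free(sched))
        print("  step 0:", sched[0])
```

With this mapping, all PEs share the same bit address within a step and differ only in the bank, which is what lets the same interleaver serve both the two-bank and four-bank configurations.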
Lookahead Transformation
[Figure: trellis section from $t_k$ to $t_{k+1}$, and merged section from $t_k$ to $t_{k+2}$]
16 comparisons required for the lookahead transformation in duo-binary WiMAX turbo codes
Increases throughput by 2x
Maximum clock rate decreases from 500 MHz to ~300 MHz, along with a significant increase in area
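The idea behind the transformation can be illustrated with a generic max-log state-metric recursion. This is a sketch under assumptions, not the decoder's actual datapath: it uses dense integer branch-metric tables of a hypothetical size, whereas a real trellis has only a few valid transitions per state.

```python
import random


def step(alpha, gamma):
    """One forward step: new[s] = max over s0 of alpha[s0] + gamma[s0][s]."""
    n = len(alpha)
    return [max(alpha[s0] + gamma[s0][s] for s0 in range(n)) for s in range(n)]


def merge(gamma_a, gamma_b):
    """Combine two consecutive branch-metric tables into one two-step table.

    merged[s0][s] is the best metric over all intermediate states m.
    It does not depend on alpha, so it can be computed outside the
    recursive loop.
    """
    n = len(gamma_a)
    return [
        [max(gamma_a[s0][m] + gamma_b[m][s] for m in range(n)) for s in range(n)]
        for s0 in range(n)
    ]


if __name__ == "__main__":
    random.seed(0)
    states = 8  # assumed state count for illustration
    alpha = [random.randint(-50, 50) for _ in range(states)]
    g1 = [[random.randint(-50, 50) for _ in range(states)] for _ in range(states)]
    g2 = [[random.randint(-50, 50) for _ in range(states)] for _ in range(states)]

    two_single_steps = step(step(alpha, g1), g2)
    one_lookahead_step = step(alpha, merge(g1, g2))
    print(two_single_steps == one_lookahead_step)  # True
```

The recursion now advances two trellis stages per update, at the cost of a wider maximum over more candidate paths per state. That wider comparison is consistent with the lower clock rate and larger area reported on the slide.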
Results
Clock: 500 MHz
No. of iterations   Number of PEs   Throughput   Serial throughput
2                   2               490 Mbps     243 Mbps
2                   4               909 Mbps     243 Mbps
[missing]           [missing]       [missing]    243 Mbps
4                   2               245 Mbps     122 Mbps
4                   4               455 Mbps     122 Mbps
4                   8               833 Mbps     122 Mbps
8                   2               [missing]    60 Mbps
8                   4               228 Mbps     60 Mbps
Questions
Outline
Motivation
Turbo Encoding
Turbo Decoding
Optimizations
– Look-ahead transformation
– Variable and adaptive parallelism
– Multiplierless interleaver
Results
Summary
Turbo Encoder
[Figures: LTE Turbo Encoding; WiMAX Turbo Encoding]
Parallelization
Example
– 4-state trellis
– 1 decoded symbol per cycle
[Figure: States versus Time (cycles)]