1 Design and Implementation of Turbo Decoders for Software Defined Radio Yuan Lin [1], Scott Mahlke [1], Trevor Mudge [1], Chaitali Chakrabarti [2], Alastair Reid [3], Krisztian Flautner [3] [1] Advanced Computer Architecture Lab, University of Michigan [2] Department of Electrical Engineering, Arizona State University [3] ARM, Ltd.
2 Advantages of Software Defined Radio Multi-mode operations Lower costs –Faster time to market –Prototyping and bug fixes –Chip volumes –Longevity of platforms Protocol complexity favors software dominated solutions Enables future wireless communication innovations –Cognitive radio
3 SDR Design Objectives for W-CDMA Programmable processor –Same hardware should support Turbo decoder as well as other DSP algorithms Throughput requirements –2Mbps Power constraints –100mW ~ 500mW
4 SODA: DSP Processor for SDR
5 SODA PE SIMD Pipeline
6 SODA PE SIMD Shuffle Network
7 SODA PE Scalar Pipeline
8 Turbo Decoder on SODA Most computationally intensive algorithm in W-CDMA Hardest algorithm to parallelize Implementation outline –MaxLogMAP trellis computation with SIMD operations –Parallelizing trellis computations through sliding window –Interleaver implementation
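As a rough illustration of the Max-Log-MAP trellis computation named in the outline above, the following sketch performs one forward (alpha) recursion step on an 8-state trellis in scalar form. It assumes the 3GPP constituent code (feedback 1 + D^2 + D^3, parity 1 + D + D^3) and a sign convention in which a positive LLR favours bit 0; the function names are hypothetical and are not taken from the SODA implementation.

```python
# Sketch only: one Max-Log-MAP forward (alpha) step, scalar reference form.
# Assumptions: 3GPP constituent code trellis, positive LLR means bit 0.

NUM_STATES = 8
NEG_INF = float("-inf")

def step(state, u):
    """Return (next_state, parity_bit) for input bit u from the given state."""
    s1, s2, s3 = (state >> 2) & 1, (state >> 1) & 1, state & 1
    a = u ^ s2 ^ s3        # feedback bit (1 + D^2 + D^3)
    parity = a ^ s1 ^ s3   # parity output (1 + D + D^3)
    return (a << 2) | (s1 << 1) | s2, parity

def branch_metric(u, p, l_sys, l_par):
    """Correlation metric; bit 0 maps to +1 and bit 1 maps to -1."""
    return 0.5 * ((1 - 2 * u) * l_sys + (1 - 2 * p) * l_par)

def forward_step(alpha, l_sys, l_par):
    """Max-Log-MAP update: keep the larger of the two incoming path metrics."""
    new_alpha = [NEG_INF] * NUM_STATES
    for s in range(NUM_STATES):
        for u in (0, 1):
            ns, p = step(s, u)
            cand = alpha[s] + branch_metric(u, p, l_sys, l_par)
            if cand > new_alpha[ns]:
                new_alpha[ns] = cand
    return new_alpha
```

Max-Log-MAP replaces the log-sum-exp of the full MAP algorithm with a plain maximum, which is why the inner update reduces to an add followed by a compare.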
9 Trellis Computation on SODA Two types of trellis diagram configurations –Blue edges: (0-branch), Red edges: (1-branch) Mapping trellis of size S onto SODA of SIMD size T
10 Forward Trellis on SODA (S = T) Misaligned SIMD operation
11 Handling SIMD Misalignment
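The slide figure is not available in this text, so the following is only a guess at the general idea: when all S states are updated with one vector operation, each candidate metric is produced in the lane of its source state but is needed in the lane of its destination state, and a lane permutation (such as the one a SIMD shuffle network provides) can realign it. The sketch assumes the 3GPP 8-state trellis; `shuffle`, `vadd`, `vmax` and the pattern tables are hypothetical helpers, not SODA instructions.

```python
# Sketch only: forward trellis step written as lane-wise vector operations
# plus permutations. All helper names are illustrative assumptions.

NUM_STATES = 8

def next_state(state, u):
    """Next state of the assumed 3GPP constituent encoder for input bit u."""
    s1, s2, s3 = (state >> 2) & 1, (state >> 1) & 1, state & 1
    a = u ^ s2 ^ s3
    return (a << 2) | (s1 << 1) | s2

def inverse_pattern(u):
    """pattern[dst] = src, so shuffle() moves each metric to its destination lane."""
    pattern = [0] * NUM_STATES
    for s in range(NUM_STATES):
        pattern[next_state(s, u)] = s
    return pattern

PATTERN0 = inverse_pattern(0)  # realignment for 0-branches
PATTERN1 = inverse_pattern(1)  # realignment for 1-branches

def shuffle(vec, pattern):
    """Lane permutation: output lane i takes input lane pattern[i]."""
    return [vec[j] for j in pattern]

def vadd(a, b):
    return [x + y for x, y in zip(a, b)]

def vmax(a, b):
    return [max(x, y) for x, y in zip(a, b)]

def forward_step_simd(alpha, gamma0, gamma1):
    """gamma0[s] / gamma1[s]: branch metric for leaving state s with bit 0 / 1."""
    cand0 = shuffle(vadd(alpha, gamma0), PATTERN0)
    cand1 = shuffle(vadd(alpha, gamma1), PATTERN1)
    return vmax(cand0, cand1)
```

Because each input bit maps the states one-to-one onto the next states in this trellis, both patterns are true permutations and no lane is written twice.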
12 Sliding Window on SODA Problem: –W-CDMA uses constraint length K = 4, giving an 8-state trellis –SODA has 32-wide SIMD Solution: –Parallelize trellis computation with sliding windows –Fully utilize the SIMD width, achieving higher throughput in the process
13 Sliding Window Parallelization
14 Sliding Window on SODA (S < T)
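One way to picture the S < T case is as a lane-packing problem: with an 8-state trellis and 32 SIMD lanes, four windows can share each vector operation. The sketch below splits a block into windows, attaches a dummy (warm-up) region after each one, and groups the windows four at a time. The window length, dummy length and function names are hypothetical; the slides do not state which recursion uses the dummy calculation or what sizes SODA uses.

```python
# Sketch only: planning sliding windows and packing them into SIMD lanes.
# Window and dummy lengths are illustrative parameters, not SODA's values.

def plan_windows(block_len, window_len, dummy_len):
    """Return (start, end, dummy_end) per window; the dummy region is clipped at the block end."""
    windows = []
    for start in range(0, block_len, window_len):
        end = min(start + window_len, block_len)
        dummy_end = min(end + dummy_len, block_len)
        windows.append((start, end, dummy_end))
    return windows

def group_for_simd(windows, simd_width=32, num_states=8):
    """Group windows so that each group fills the SIMD width (T / S windows per group)."""
    per_op = simd_width // num_states
    return [windows[i:i + per_op] for i in range(0, len(windows), per_op)]

def lane_index(window_slot, state, num_states=8):
    """Lane holding the metric of `state` for the window in slot `window_slot`."""
    return window_slot * num_states + state
```

With these defaults, slot 0 occupies lanes 0 to 7 and slot 3 occupies lanes 24 to 31, so one 32-wide operation advances four windows by one trellis step.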
15 Turbo Decoder System Operations
16 SODA DMA Modifications Traditional DMA controller –Designed for block data transfer –1 source and 1 destination address per block Modified DMA controller –Adding data interleaving functionality to DMA –Needs to handle scalar data transfers –1 source and 1 destination address per scalar
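The contrast between the two controllers can be sketched in a few lines: a block transfer needs one source and one destination address for the whole block, while an interleaving transfer needs an address pair for every scalar. The functions and the 4-element permutation below are illustrative assumptions; they are not the SODA DMA interface or the W-CDMA interleaver pattern.

```python
# Sketch only: block DMA versus per-scalar (interleaving) DMA.

def block_dma(src, src_addr, dst, dst_addr, length):
    """Traditional transfer: one source and one destination address per block."""
    dst[dst_addr:dst_addr + length] = src[src_addr:src_addr + length]

def interleaving_dma(src, dst, src_addrs, dst_addrs):
    """Modified transfer: one source and one destination address per scalar."""
    for s, d in zip(src_addrs, dst_addrs):
        dst[d] = src[s]

# Usage: interleave with a hypothetical permutation, then undo it.
perm = [2, 0, 3, 1]
data = [10, 20, 30, 40]
interleaved = [None] * 4
interleaving_dma(data, interleaved, perm, range(4))      # read permuted, write in order
restored = [None] * 4
interleaving_dma(interleaved, restored, range(4), perm)  # read in order, write permuted
```

Swapping the two address lists turns the interleaver into the deinterleaver, so the same transfer routine can serve both directions.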
17 Achieved Performance on SODA SODA operates at 400 MHz Can achieve 2.08 Mbps with I = 5 [Throughput equation not recovered from the slide; its term labels follow] –Overall Turbo decoder throughput –SODA operation frequency –Number of Turbo iterations (I) –Number of sliding windows processed in parallel –Average number of cycles for one trellis block (T_block) –Size of one trellis block (L) –Cycles for 1-bit trellis computation = T_block / L –Cycle contributions annotated on the slide: Alpha, Beta and LLC computation for 1 bit; dummy calculation; data memory access; extrinsic scaling
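The two headline figures on this slide imply an overall cycle budget, which can be checked with simple arithmetic. The sketch below only divides the clock frequency by the throughput and by the iteration count; it does not reproduce the slide's throughput equation, and the function name is hypothetical.

```python
# Sketch only: cycle budget implied by the reported clock rate and throughput.

def cycle_budget(freq_hz, throughput_bps, iterations):
    """Return (cycles per decoded bit, cycles per decoded bit per Turbo iteration)."""
    total = freq_hz / throughput_bps
    return total, total / iterations

total, per_iteration = cycle_budget(400e6, 2.08e6, 5)
```

With the reported numbers this comes to roughly 192 cycles per decoded bit, or about 38 cycles per bit for each of the 5 iterations, which is the budget that all per-bit work in an iteration has to fit within.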
18 Conclusion & Future Work Implementation summary –SODA consumes <100mW in 90nm –Meets W-CDMA throughput requirements –Hardware features: wide SIMD execution, SIMD permutation network, smart DMA Beyond 3G –Support for higher-throughput 3G+ protocols (multi-processor SODA for Turbo decoder) –LDPC decoding
19 Questions?