TI DSPS FEST 1999
Implementation of Channel Estimation and Multiuser Detection Algorithms for W-CDMA on Digital Signal Processors
Sridhar Rajagopal, Gang Xu, Joseph R. Cavallaro
Electrical & Computer Engineering Dept., Rice University
August 5, 1999
Outline
• Introduction to W-CDMA
• Use of DSPs in wireless communications
• Channel Estimation and Multistage Detection
• DSP implementation issues
• Real-time requirements
• Conclusions
Wideband-CDMA
• Third-generation wireless communication systems
• Multimedia capabilities
• Multirate services
• Quality of service
• Higher data rates: 2 Mbps, 384 kbps, 144 kbps
DSPs in Wireless Communications
• Digital Signal Processor
• Signal Processing and Communications
• Prototyping advanced communication algorithms
• Features:
  – Low power consumption
  – Low cost
  – High performance
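The Wireless Channel: Multiuser, Multipath
[Figure: the desired user's signal reaches the base-station antenna over a direct path and reflected paths, together with noise and multiple access interference (MAI).]
• The signal faces attenuation, delays and Doppler effects: unknown channel parameters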
Base-Station Receiver
[Figure: block diagram of the base-station receiver. Blocks: Antenna, Demodulator, Demux, Channel Estimator, Multiuser Detector, Decoder. Signal labels: Pilot, Data, Estimated Amplitudes & Delays.]
CDMA Uplink System
[Figure: block diagram of the CDMA uplink. The data of users 1 to K (d_1, d_2, ..., d_K) pass through a Channel Encoder and Spreading for each user; the transmitted signals add together with AWGN to form the received signal R(t). Receiver blocks: Demux, Channel Estimator, a bank of Matched Filters with outputs y_1, y_2, ..., y_K, Multi-User Detector and Channel Decoder, producing the estimates d_1', d_2', ..., d_K'.]
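The spreading and matched-filter stages of the uplink diagram can be illustrated with a toy model. This is only a sketch under strong assumptions that are not in the slides: synchronous users, a single path, BPSK bits, no noise, and hypothetical length-4 Walsh codes (the results later in the deck use N = 31). All function names are illustrative.

```python
# Hypothetical toy parameters: 3 users with orthogonal length-4 Walsh codes.
CODES = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
]


def spread(bits, code):
    """Replace each BPSK bit (+1/-1) by the user's code scaled by that bit."""
    return [b * c for b in bits for c in code]


def channel_sum(tx_signals, noise=None):
    """Add the users' chip streams chip by chip; optionally add noise samples."""
    total = [sum(chips) for chips in zip(*tx_signals)]
    if noise is not None:
        total = [t + n for t, n in zip(total, noise)]
    return total


def matched_filter(received, code):
    """Correlate each bit interval of the received signal with one user's code."""
    n = len(code)
    return [
        sum(r * c for r, c in zip(received[i:i + n], code)) / n
        for i in range(0, len(received), n)
    ]


def detect(soft):
    """Hard decision on the matched-filter outputs."""
    return [1 if y >= 0 else -1 for y in soft]


user_bits = [[1, -1, 1], [-1, -1, 1], [1, 1, -1]]
r = channel_sum([spread(b, c) for b, c in zip(user_bits, CODES)])
y = [matched_filter(r, c) for c in CODES]
decisions = [detect(yk) for yk in y]
```

With orthogonal codes and no noise the matched filter alone recovers every bit. With non-orthogonal codes, multipath and asynchronous users, the outputs y_k contain interference from the other users, which is the case the multiuser detector is meant to handle.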
Maximum Likelihood - Channel Estimation
• Send a time-multiplexed preamble (pilot).
• Channel properties are extracted from the received signal.
• Compare the received signal with the known pilot and estimate the channel parameters.
• Keep the estimate for the remaining data bits (static channel).
• Repeat the preamble every frame if there is no tracking.
The Maximum Likelihood Algorithm
• Compute the correlation matrices.
• Compute the channel estimate:
  – Calculate the noise covariance matrix K.
  – Calculate the channel impulse response vector z.
• Extract the amplitudes and delays from the channel impulse response vector using a least squares fit.
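The steps above can be sketched for a much simpler setting than the one in the talk. The example below assumes a single user, real-valued signals and unity noise covariance, in which case the maximum likelihood estimate reduces to ordinary least squares: z = R_bb^-1 R_br, where R_bb and R_br are the pilot autocorrelation and pilot/received cross-correlation. The pilot, the true channel and every function name are hypothetical, and the amplitude/delay extraction is replaced by a plain peak pick rather than the least squares fit named on the slide.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def estimate_channel(pilot, received, taps):
    """Least-squares estimate of a `taps`-long impulse response from a known pilot."""
    # Row t holds the pilot chips that contribute to received[t].
    rows = [[pilot[t - d] if 0 <= t - d < len(pilot) else 0 for d in range(taps)]
            for t in range(len(received))]
    # Correlation matrices: pilot autocorrelation and pilot/received cross-correlation.
    r_bb = [[sum(row[i] * row[j] for row in rows) for j in range(taps)]
            for i in range(taps)]
    r_br = [sum(row[i] * y for row, y in zip(rows, received)) for i in range(taps)]
    return solve(r_bb, r_br)


def strongest_path(z):
    """Peak pick: amplitude and delay (in chips) of the largest tap."""
    delay = max(range(len(z)), key=lambda d: abs(z[d]))
    return z[delay], delay


PILOT = [1, -1, 1, 1, -1, -1, 1, -1]   # hypothetical pilot chips
TRUE_Z = [0.0, 0.8, 0.0, -0.3]         # hypothetical channel: two paths
received = [
    sum(TRUE_Z[d] * PILOT[t - d] for d in range(len(TRUE_Z)) if 0 <= t - d < len(PILOT))
    for t in range(len(PILOT) + len(TRUE_Z) - 1)
]
z_hat = estimate_channel(PILOT, received, taps=4)
amplitude, delay = strongest_path(z_hat)
```

In this noise-free toy case the estimate matches the true taps up to rounding. The inner sums that build r_bb and r_br are dot products and matrix-vector products, which is where the following slide locates the critical code.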
The ML Algorithm Complexity
• Complex-real dot product
• Complex-real matrix product
• Complex-real product
• Real square roots
  – Solving a quadratic equation for the least squares fit
• Critical code: matrix-vector multiplications / dot product
• Slide annotations: "Assuming Unity Noise Covariance", "Offline"
Differencing Multistage - Multiuser Detection
• Based on the principle of Parallel Interference Cancellation (PIC)
• Cross-correlation information is used to remove the interference of other users from the desired user
• Iterations are repeated until convergence
• Differencing techniques are applied to improve the performance of the algorithm
The Differencing Multistage Detector
• Split the cross-correlation matrix into its lower, upper and diagonal parts.
• Calculate the channel impulse response iteratively using [update equation not recovered from the slide]; x is called the differencing vector.
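The idea behind differencing can be sketched on a synchronous toy problem. This is an illustration under stated assumptions, not the update equation from the slide: it assumes one bit interval, a real K x K cross-correlation matrix R with amplitudes folded in, and hypothetical matched-filter outputs y. In this setting the lower and upper parts together are simply the off-diagonal part of R. A conventional stage recomputes y - M d from scratch; the differencing version updates the previous soft output using x = d(l) - d(l-1), whose entries are 0 or ±2, and skips the zero entries.

```python
def sign(v):
    return 1 if v >= 0 else -1


def off_diagonal(r):
    """Lower plus upper triangular parts of R (diagonal set to zero)."""
    k = len(r)
    return [[r[i][j] if i != j else 0.0 for j in range(k)] for i in range(k)]


def pic_conventional(y, r, stages):
    """Parallel interference cancellation, recomputing every stage in full."""
    m, k = off_diagonal(r), len(y)
    d = [sign(v) for v in y]                 # stage 0: matched-filter decisions
    macs = 0                                 # multiply-accumulate count
    for _ in range(stages):
        soft = [y[i] - sum(m[i][j] * d[j] for j in range(k)) for i in range(k)]
        macs += k * k
        d = [sign(v) for v in soft]
    return d, macs


def pic_differencing(y, r, stages):
    """Same decisions as pic_conventional for stages >= 1, updating only changed bits."""
    m, k = off_diagonal(r), len(y)
    d_prev = [sign(v) for v in y]
    soft = [y[i] - sum(m[i][j] * d_prev[j] for j in range(k)) for i in range(k)]
    macs = k * k                             # the first stage is computed in full
    d = [sign(v) for v in soft]
    for _ in range(stages - 1):
        x = [a - b for a, b in zip(d, d_prev)]        # differencing vector
        changed = [j for j in range(k) if x[j] != 0]  # bits that flipped
        soft = [soft[i] - sum(m[i][j] * x[j] for j in changed) for i in range(k)]
        macs += k * len(changed)
        d_prev, d = d, [sign(v) for v in soft]
    return d, macs


# Hypothetical 3-user example; true bits are [1, -1, 1], user 2 is hit by noise.
R = [[1.0, 0.25, 0.5], [0.25, 1.0, 0.25], [0.5, 0.25, 1.0]]
Y = [1.25, 0.25, 1.25]
conv_bits, conv_macs = pic_conventional(Y, R, stages=3)
diff_bits, diff_macs = pic_differencing(Y, R, stages=3)
```

Both functions reach the same decisions because y - M d(l) equals the previous soft output minus M x(l). As the decisions converge, x becomes mostly zero, so the later stages cost far fewer multiply-accumulates; in this toy case the count drops from 27 to 12.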
Multistage Detector Complexity
• Matrix multiplication:
  – Computed only once per frame
• Dot product:
  – Computed iteratively
• Critical code: dot product
TI Tools Used
• Evaluation Modules (EVM) for the C6201 (fixed-point) and C6701 (floating-point) DSPs
  – 64 KB each of internal program and data memory
  – 256 KB SBSRAM, 8 MB SDRAM (external)
• C Compiler ver. 3.0 from the Code Generation Tools
• Code Composer ver. 4.02 for profiling the code
DSP Implementation: Channel Estimation
• A floating-point implementation was found more feasible because of the matrix inversions and square roots.
• Code optimized for the DSP
• Use of specialized approximate instructions
  – Approximate reciprocal square roots
  – Approximate reciprocals
• Use of assembly code for the critical part
  – TI's C67 floating-point benchmarks for matrix-vector multiplication and dot product
• Data memory requirements for channel estimation
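How an approximate reciprocal or reciprocal square root is typically turned into a usable result can be sketched with Newton-Raphson refinement. The slides do not say how the approximate results were refined, so this is an assumption about common practice rather than the authors' method. The `coarse` helper is a hypothetical stand-in for a low-precision hardware seed (here 8 mantissa bits); it does not model any specific instruction.

```python
import math


def coarse(value, bits=8):
    """Round to `bits` mantissa bits: a stand-in for a low-precision hardware seed."""
    mant, exp = math.frexp(value)
    scale = 1 << bits
    return math.ldexp(round(mant * scale) / scale, exp)


def rsqrt_seed(v):
    """Hypothetical approximate 1/sqrt(v), accurate to roughly 8 bits."""
    return coarse(1.0 / math.sqrt(v))


def rsqrt_refine(v, x):
    """One Newton-Raphson step toward 1/sqrt(v); uses only multiplies and subtracts."""
    return x * (1.5 - 0.5 * v * x * x)


def rcp_refine(v, x):
    """One Newton-Raphson step toward 1/v; uses only multiplies and subtracts."""
    return x * (2.0 - v * x)


v = 7.0
exact = 1.0 / math.sqrt(v)
x0 = rsqrt_seed(v)
x1 = rsqrt_refine(v, x0)
x2 = rsqrt_refine(v, x1)
errors = [abs(x - exact) / exact for x in (x0, x1, x2)]
```

Each step roughly squares the relative error, so two steps take an 8-bit seed to about single-precision accuracy without a divide or a library square root. A square root then follows as v times the refined reciprocal square root.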
Use of Approximate Instructions
[Figure: execution time (in milliseconds) versus number of users, showing the use of specialized instructions and assembly code on the C6701 DSP. Parameters: L = 150, P = 3, N = 31, SNR = 5 dB, SINR = -10 dB. Curves: C6701 original, C6701 with intrinsics, C6701 with assembly. Annotations: 10% improvement, 100% improvement.]
Optimization Effects for Channel Estimation
[Figure: normalized execution time for channel estimation on the C6701 under three builds: Base (-o3 -pm), Approx. (-o3 -pm with intrinsics), Assembly opt. (-o3 -pm with asm). Annotations: 2.34X improvement, 1.08X improvement.]
Data Memory Requirements
• Data to be placed in external memory
[Table or figure not recovered; surviving values: 130, 6.]
DSP Implementation: Multistage Detection
• 16-bit fixed-point C code
• Code optimized for the DSP
• Use of assembly code for the critical part
  – TI's C62 fixed-point assembly benchmarks for dot product
• Data memory requirements for multistage detection
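A 16-bit fixed-point dot product, the critical kernel named above, can be emulated as follows. The Q15 format, the saturating 32-bit accumulator and all function names are assumptions for illustration; the slides only state that 16-bit fixed-point code was used.

```python
Q15_ONE = 1 << 15  # scale factor for Q15: 1 sign bit, 15 fraction bits


def to_q15(x):
    """Quantize a real value to a 16-bit Q15 integer, saturating outside [-1, 1)."""
    return max(-Q15_ONE, min(Q15_ONE - 1, round(x * Q15_ONE)))


def sat32(v):
    """Clamp to the range of a signed 32-bit accumulator."""
    return max(-(1 << 31), min((1 << 31) - 1, v))


def dot_q15(a, b):
    """Dot product of two Q15 vectors; the result is a 32-bit value in Q30."""
    acc = 0
    for x, y in zip(a, b):
        acc = sat32(acc + x * y)   # 16x16-bit multiply, 32-bit accumulate
    return acc


def q30_to_float(v):
    """Convert a Q30 accumulator value back to a real number."""
    return v / float(1 << 30)


a = [to_q15(v) for v in (0.5, -0.25, 0.125)]
b = [to_q15(v) for v in (0.5, 0.5, 0.5)]
acc = dot_q15(a, b)
result = q30_to_float(acc)   # 0.25 - 0.125 + 0.0625
```

The inputs here are exactly representable in Q15, so the fixed-point result equals the real-valued dot product, 0.1875. In general the quantization in `to_q15` introduces rounding error, and long vectors need scaling or guard bits to keep the accumulator from saturating.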
Optimization Effects for Multistage Detector
[Figure: normalized execution time for multistage detection on C[...] under three builds: Global opt. (-o3 -pm -mu), Software Pipelining (-o3 -pm), Assembly opt. (-o3 -pm with asm). Annotations: 5.22X improvement, 7.47X improvement.]
Data Memory Requirements
• Data can be placed completely in internal memory
Flops Count
[Figure: number of flops (x 10^4) versus total number of iterations, for K = 15 users at SNR = 6 dB. Curves: Conventional Method, Differencing Method.]
• 2X speedup for a three-stage detector
Real-Time Requirements
[Figure: real-time capability of the C6201 DSP, plotted as maximum bit rate per user (kb/s) versus number of users, at SNR = 10 dB with window size = 12. Curves: Conventional Method, Differencing Method. Marked point: 12 users, 150 kb/s.]
Trends in Recent DSPs
• More internal memory and higher clock speeds
  – C6203: 512 KB data, 384 KB program, 250 MHz
  – Useful for uplink channel estimation algorithms
• Specialized blocks in the DSP core
  – Viterbi decoding in the C54
• Lower-voltage operation
  – 1.2 V in the C5402, useful for reducing power consumption in the mobile
Conclusions
• Implementation issues: estimation and detection algorithms
• Channel estimation: floating point / external memory
• Multistage detection: fixed point / internal memory
• Specialized instructions: square roots / reciprocals
• Additional support for complex arithmetic would be useful.
• Recent trends in DSPs are highly encouraging for next-generation wireless communication applications.
Future Work
• Effect of caches and DMA controllers
  – C6211 (4 KB each of L1 program and data cache, 64 KB L2)
• DSP implementations for W-CDMA uplink and downlink
  – Blind algorithms
  – Adaptive algorithms
• Architectural bottlenecks and compiler issues in DSPs, to enhance suitability for next-generation W-CDMA systems