A bit-streaming, pipelined multiuser detector for wireless communications
Sridhar Rajagopal and Joseph R. Cavallaro
Rice University, {sridhar,cavallar}@rice.edu
This work is supported by Nokia, TI, TATP and NSF.
Multiuser detection
[Figure: amplitude versus time at the base station; User 1 arrives via a direct path and reflections, together with base-station noise, and User 2 adds multiple access interference.]
Jointly detect the data of all users.
Benefits of multiuser detection
[Figure: error rate vs. SNR; bit error rate (log scale, 10^-4 to 10^-1) against SNR (2 to 16 dB) for single-user (channel estimation + detection), multiuser estimation + single-user detection, and multiuser (channel estimation + detection).]
Motivation
- Goal: implement multiuser detection for 3G wireless CDMA base-station receivers
- Current implementations are unable to meet the 3GPP real-time requirement: 128 Kbps for 32 users with a spreading factor of 32 chips/bit
- Challenges:
  - large computational complexity
  - block-based algorithms, which add latency
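A quick check of what the target implies. This sketch assumes the 128 Kbps figure is a per-user rate (the slide does not say so explicitly); under that reading, the detector must deliver a new decision for every user once per bit period.

```python
# Real-time budget implied by the 3GPP target above.
# Assumption: "128 Kbps for 32 users" means 128 kbit/s for *each* of the 32 users.
USERS = 32               # K, number of users
RATE_PER_USER = 128_000  # bit/s per user (assumed per-user)
SPREADING = 32           # chips per bit

bit_period_s = 1.0 / RATE_PER_USER        # time to detect one bit of every user
chip_rate = RATE_PER_USER * SPREADING     # chip/s that must be despread per user
aggregate_rate = USERS * RATE_PER_USER    # detected bit/s across all users

if __name__ == "__main__":
    print(f"bit period     : {bit_period_s * 1e6:.4f} us")
    print(f"chip rate      : {chip_rate / 1e6:.3f} Mchip/s per user")
    print(f"aggregate rate : {aggregate_rate / 1e6:.3f} Mbit/s")
```

Under this assumption each bit period is 7.8125 µs, the per-user chip rate is 4.096 Mchip/s, and the base station must output 4.096 Mbit/s of detected data.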
Contributions
- Developed a simple architecture for asynchronous multiuser detection for CDMA that needs only additions and multiplications [+, ×]
- Bit-streaming:
  - reduced latency
  - eliminates window edge computations
  - lower memory requirements
- Pipelined stages: higher throughput (with more hardware)
- Real-time implementation of multiuser detection is now possible for 3GPP!
Asynchronous multiuser interference
Interference comes from the past, current and future bits of other users.
[Figure: received signal r_{i-1}, r_i, r_{i+1}, r_{i+2} along a time axis; bit I of the desired user (delay d_1) overlaps neighboring bits I-1, I and I+1 of users with delays d_j and d_k, giving interference from both previous and future bits of other users.]
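One way to write the asynchronous model as code. This is a minimal real-valued (BPSK, noiseless) sketch under an assumed convention: user k's delay d_k is an integer number of chips with 0 <= d_k < N, and each N-chip window r_i holds the start of bit i and the end of bit i-1 of every user, so r_i = A0 b_i + A1 b_{i-1}. The function names are hypothetical, and the slides' own split A = [A0 A1] may follow a different convention.

```python
def async_code_matrices(codes, delays):
    """Split each delayed spreading code into A0 (chips inside the current
    N-chip window) and A1 (chips spilling into the next window).

    codes:  K lists of N chips (+1/-1); delays: K integers, 0 <= d_k < N.
    Returns A0, A1 as N x K nested lists with r_i = A0 b_i + A1 b_{i-1}.
    """
    n, k_users = len(codes[0]), len(codes)
    a0 = [[0] * k_users for _ in range(n)]
    a1 = [[0] * k_users for _ in range(n)]
    for k, (code, d) in enumerate(zip(codes, delays)):
        assert 0 <= d < n, "delay must be shorter than one bit"
        for t in range(n - d):
            a0[d + t][k] = code[t]      # head of bit i fills chips d..N-1
        for t in range(d):
            a1[t][k] = code[n - d + t]  # tail of bit i-1 fills chips 0..d-1
    return a0, a1


def received_windows(a0, a1, bits):
    """Noiseless windows r_i = A0 b_i + A1 b_{i-1}, with b_{-1} = 0.

    bits[i][k] is user k's i-th bit (+1/-1). The tail of the last bit, which
    would fall into one extra window, is not generated.
    """
    n, k_users = len(a0), len(a0[0])
    prev = [0] * k_users
    windows = []
    for b in bits:
        windows.append([
            sum(a0[t][k] * b[k] + a1[t][k] * prev[k] for k in range(k_users))
            for t in range(n)
        ])
        prev = b
    return windows
```

Because every window mixes bit i-1 and bit i of each user, the desired user's bit overlaps the previous and next bits of every other user, which is the interference pattern on this slide.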
Multistage Parallel Interference Cancellation (PIC)
[Figure: the received signal r_1, ..., r_D passes through the conventional code matched filter (MF), which uses the channel estimate A = [A0 A1], and then through PIC stages 1, 2 and 3, separated by delays of D; the PIC stages use B = A^H A - diag(A^H A); stage 3 outputs the detected bits.]
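The multistage PIC can be sketched for one block of bits as follows. This is a simplified real-valued (BPSK) illustration, not the authors' implementation: y = A^T r is the matched-filter output, B = A^T A - diag(A^T A) (A^H becomes A^T for real signals), stage 0 takes hard decisions on y, and each later stage re-decides after subtracting the interference B b estimated from the previous stage. The near-far numbers in the example are hypothetical.

```python
def matvec(m, v):
    """Plain matrix-vector product for nested lists."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def sign(x):
    """Hard BPSK decision (ties resolved to +1)."""
    return 1 if x >= 0 else -1


def multistage_pic(y, b_mat, stages=3):
    """Hard-decision multistage PIC on matched-filter outputs y."""
    decisions = [sign(v) for v in y]             # conventional MF decisions
    for _ in range(stages):
        interference = matvec(b_mat, decisions)  # B b from the previous stage
        decisions = [sign(yk - ik) for yk, ik in zip(y, interference)]
    return decisions


# Hypothetical near-far example: a strong user (amplitude 3) and a weak user
# (amplitude 1) with correlated length-4 codes; noiseless for clarity.
A = [[3, 1], [3, 1], [3, 1], [3, -1]]            # N x K, column k = amplitude * code
true_bits = [1, -1]
r = matvec(A, true_bits)                         # received chips
At = [list(col) for col in zip(*A)]              # A^T
y = matvec(At, r)                                # matched filter: A^T r
R = [[sum(a * b for a, b in zip(ci, cj)) for cj in At] for ci in At]
B = [[0 if p == q else R[p][q] for q in range(len(R))] for p in range(len(R))]

if __name__ == "__main__":
    print("MF decisions :", [sign(v) for v in y])
    print("PIC decisions:", multistage_pic(y, B))
```

Here the weak user's matched-filter output (y = [30, 2]) is swamped by the strong user's interference, so the MF decides [1, 1]; the first PIC stage removes that interference and recovers [1, -1].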
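Block pipelined detector
[Figure: timeline in which the MF and each of the three PIC stages process overlapping windows of bits 1-12, then 11-22, outputting bits 2-11, then bits 12-21.]
- Latency is variable: in the worst case (the first bit of a window), it is D times the per-bit latency.
- Each stage performs 2 extra edge-bit computations per window.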
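The overlapping-window schedule on this slide can be reproduced in a few lines. This sketch assumes windows of D + 2 bits whose two edge bits are recomputed by the next window, which matches the slide's example for D = 10 (windows over bits 1-12 and 11-22 producing bits 2-11 and 12-21); the function names are hypothetical.

```python
def block_windows(total_bits, d):
    """Return (processed, output) bit ranges, 1-indexed and inclusive, for a
    block detector that processes d + 2 bits per window but outputs only the
    d interior bits (the edge bits lack one neighbour inside the window)."""
    schedule = []
    start = 1
    while start + d + 1 <= total_bits:
        processed = (start, start + d + 1)
        output = (start + 1, start + d)
        schedule.append((processed, output))
        start += d                       # next window re-processes 2 edge bits
    return schedule


def edge_overhead(schedule):
    """Extra bit computations per output bit caused by the edge bits."""
    processed = sum(hi - lo + 1 for (lo, hi), _ in schedule)
    output = sum(hi - lo + 1 for _, (lo, hi) in schedule)
    return (processed - output) / output


if __name__ == "__main__":
    sched = block_windows(32, 10)
    for processed, output in sched:
        # The first output bit of a window waits for the window's last bit,
        # i.e. about d bit periods: the variable, worst-case latency.
        print("process bits %d-%d -> output bits %d-%d" % (*processed, *output))
    print("edge overhead per stage:", edge_overhead(sched))  # 2/d
```

Each window spends 2 of its D + 2 bit computations on edge bits, an overhead of 2/D per stage, and the first output bit of each window waits roughly D bit periods for the rest of the window to arrive.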
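Bit-streaming the multiuser detection algorithm
- B [KD × KD] is a tri-diagonal block Toeplitz matrix, where D is the detection window length.
- Since the same K × K blocks repeat along each block diagonal, a bit-streaming detector needs to store only those blocks rather than the full matrix, saving memory by a factor on the order of D^2.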
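A sketch of where the tri-diagonal block Toeplitz structure comes from, under an assumed real-valued windowed model r_i = A0 b_i + A1 b_{i-1} (A0, A1 are N × K). Bit i of every user spans windows i and i+1, so its matched-filter output is y_i = A0^T r_i + A1^T r_{i+1} = (A0^T A0 + A1^T A1) b_i + A0^T A1 b_{i-1} + A1^T A0 b_{i+1}: only bits i-1, i and i+1 appear. B over D bits is therefore assembled from three K × K blocks. The function names are hypothetical.

```python
def at_b(x, y):
    """x^T y for N x K nested lists."""
    n = len(x)
    return [[sum(x[t][p] * y[t][q] for t in range(n)) for q in range(len(y[0]))]
            for p in range(len(x[0]))]


def correlation_blocks(a0, a1):
    """K x K blocks of B: diagonal block (self-terms removed), the block
    coupling bit i to bit i-1, and the block coupling bit i to bit i+1."""
    r0 = [[u + v for u, v in zip(ru, rv)]
          for ru, rv in zip(at_b(a0, a0), at_b(a1, a1))]
    b0 = [[0 if p == q else r0[p][q] for q in range(len(r0))] for p in range(len(r0))]
    return b0, at_b(a0, a1), at_b(a1, a0)


def full_b(b0, b_prev, b_next, d):
    """Assemble the KD x KD tri-diagonal block Toeplitz matrix (for comparison)."""
    k = len(b0)
    big = [[0] * (k * d) for _ in range(k * d)]
    for i in range(d):
        for j, blk in ((i, b0), (i - 1, b_prev), (i + 1, b_next)):
            if 0 <= j < d:
                for p in range(k):
                    for q in range(k):
                        big[i * k + p][j * k + q] = blk[p][q]
    return big


def memory_ratio(k, d):
    """Entries in the full KD x KD matrix versus the three K x K blocks."""
    return (k * d) ** 2 / (3 * k * k)    # = d^2 / 3, i.e. O(D^2) savings
```

Because the block for bit i+1 is the transpose of the block for bit i-1 in this real-valued model, a detector could store even fewer entries; the ratio above counts all three blocks to stay conservative.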
Pipelining the multiuser detector
[Figure: causal matched filter, PIC stage 1, PIC stage 2 and PIC stage 3 laid out along a time axis.]
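The bit-streaming pipeline maps naturally onto chained Python generators. In this sketch (hypothetical function names, real-valued hard decisions, matched-filter outputs y_i assumed already available), each PIC stage uses only the three K × K blocks of the tri-diagonal block Toeplitz B and emits its decision for bit i as soon as the previous stage has produced bit i+1. Every stage therefore adds a constant one-bit delay regardless of the window length D, and with zero-padded boundaries the result equals running the block PIC over the whole window.

```python
def sign(x):
    return 1 if x >= 0 else -1


def _cancel(y, b0, b_prev, b_next, d_before, d_cur, d_after):
    """Re-decide one bit vector after subtracting interference estimated from
    the previous stage's decisions for bits i-1, i and i+1."""
    k = len(y)
    return [
        sign(y[p] - sum(b0[p][q] * d_cur[q] + b_prev[p][q] * d_before[q]
                        + b_next[p][q] * d_after[q] for q in range(k)))
        for p in range(k)
    ]


def mf_stage(y_stream):
    """Conventional detector: pass y_i along with its hard decisions."""
    for y in y_stream:
        yield y, [sign(v) for v in y]


def pic_stage(stream, b0, b_prev, b_next):
    """One streaming PIC stage with a one-bit look-ahead."""
    it = iter(stream)
    try:
        y_cur, d_cur = next(it)
    except StopIteration:
        return
    zero = [0] * len(y_cur)
    d_before = zero                      # no bit precedes the first one
    for y_next, d_next in it:
        yield y_cur, _cancel(y_cur, b0, b_prev, b_next, d_before, d_cur, d_next)
        d_before, y_cur, d_cur = d_cur, y_next, d_next
    yield y_cur, _cancel(y_cur, b0, b_prev, b_next, d_before, d_cur, zero)


def streaming_detector(y_stream, b0, b_prev, b_next, stages=3):
    """MF followed by `stages` pipelined PIC stages; yields decisions lazily."""
    stream = mf_stage(y_stream)
    for _ in range(stages):
        stream = pic_stage(stream, b0, b_prev, b_next)
    for _, decisions in stream:
        yield decisions
```

Pulling the first decision out of a three-stage chain consumes only y_0 to y_3, which illustrates the constant latency; a hardware pipeline would run the stages concurrently rather than lazily.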
Pipelined architecture for multiuser detection
FPGAs for pipelining
- DSPs are not suited to exploiting bit-level parallelism.
- FPGAs: more flexible than ASICs, yet well suited to parallelism and bit-level operations.
[Figure: received bits → DSP [×] running the MF → FPGA1 [+] running PIC stage 1 → FPGA2 [+] running PIC stage 2 → FPGA3 [+] running PIC stage 3 → detected bits.]
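One reading of the [×] and [+] labels: with hard decisions of ±1, every product B[p][q]·b[q] in a PIC stage reduces to adding or subtracting B[p][q], so the stages need only adders, while the matched filter, which correlates multi-bit received samples with the channel estimates, still needs true multiplications. The helper below is a hypothetical sketch of that add-only cancellation for one K-user bit vector.

```python
def cancel_add_only(y, b_mat, decisions):
    """Interference cancellation for one bit vector using only additions and
    subtractions; decisions must be +1/-1 hard decisions."""
    out = []
    for p, acc in enumerate(y):
        for q, dq in enumerate(decisions):
            if dq > 0:
                acc -= b_mat[p][q]   # subtract +B[p][q]
            else:
                acc += b_mat[p][q]   # subtract -B[p][q]
        out.append(1 if acc >= 0 else -1)
    return out


if __name__ == "__main__":
    y = [30, 2]
    b_mat = [[0, 6], [6, 0]]         # hypothetical B with a zero diagonal
    print(cancel_add_only(y, b_mat, [1, 1]))
```

The result is identical to computing sign(y - B b) with multiplications; only the arithmetic used to get there changes.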
Performance comparisons
[Figure: execution time (seconds, log scale, 10^-6 to 10^-2) versus number of users (5 to 35) for a 1-DSP implementation, MF on 1 DSP + PIC on 3 FPGAs, and MF on K DSPs + PIC on 3 FPGAs, with the 128 Kbps target data rate marked; annotations: t_MF = O(K), t_PIC = O(K^2).]
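The complexity annotations t_MF = O(K) and t_PIC = O(K^2) suggest why pipelining and splitting the MF across DSPs help. The toy model below uses hypothetical unit costs c_mf and c_pic, not measured values: a single processor must run the MF and all three PIC stages for every bit vector, whereas a pipeline only has to keep up with its slowest stage.

```python
def time_per_bit_vector(k, c_mf, c_pic, stages=3, pipelined=True, mf_processors=1):
    """Toy cost model with hypothetical constants c_mf and c_pic.

    t_MF grows as K and t_PIC as K^2 (per stage). A pipeline delivers one bit
    vector every max(stage time); a single processor needs the sum.
    """
    t_mf = c_mf * k / mf_processors     # MF work split evenly across DSPs
    t_pic = c_pic * k * k
    if pipelined:
        return max(t_mf, t_pic)
    return t_mf + stages * t_pic


def meets_target(t_per_bit_vector, rate_per_user=128_000):
    """Real time if a bit vector is finished within one bit period."""
    return t_per_bit_vector <= 1.0 / rate_per_user
```

With mf_processors=K the MF term becomes constant, leaving the O(K^2) PIC stages as the bottleneck of the pipeline.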
Summary
- Simple, bit-streaming, pipelined multiuser detector
- Avoids block computations:
  - memory savings by a factor of D^2
  - no edge-bit computations within a window: 2/D computational savings per stage
  - constant latency, lower by a factor of D
- Leads to a real-time DSP implementation for 3GPP.
Prototype chip built at Rice
- Number of users supported: 4
- Area available: 3000 x 3000 inside the pad frame
- Area used: ~85%
- CMOS process: 0.5 micron
- Chip speed: 2 Mbps
http://www.owlnet.rice.edu/~sunbeam/422/
Multistage Parallel Interference Cancellation (PIC)
[Figure: received bits pass through the conventional code matched filter (MF) and PIC stages 1, 2 and 3 to produce the detected bits; a multiuser estimation block supplies the estimates used by the MF and PIC stages.]
Structure of the B matrix
- B [KD × KD] is a tri-diagonal block Toeplitz matrix, where D is the detection window length.
- Previous work: make the block Toeplitz matrix circulant (S. Das, J. R. Cavallaro, and B. Aazhang, "Computationally Efficient Multiuser Detectors," PIMRC 1997).
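For comparison with that earlier approach, a tri-diagonal block Toeplitz matrix can be made block circulant by adding the two missing corner blocks, after which every block row is a cyclic shift of the first; block circulant matrices are block-diagonalized by the DFT, which is what makes them cheap to invert. This is a hypothetical sketch of the wrap-around only, not the cited paper's algorithm.

```python
def block_circulant(b0, b_prev, b_next, d):
    """Wrap a D-block tri-diagonal block Toeplitz matrix (b0 on the diagonal,
    b_prev below it, b_next above it) into a block circulant matrix: b_prev
    also goes in the top-right corner and b_next in the bottom-left."""
    assert d >= 3, "need at least 3 block rows for distinct neighbours"
    k = len(b0)
    big = [[0] * (k * d) for _ in range(k * d)]
    for i in range(d):
        for j, blk in ((i, b0), ((i - 1) % d, b_prev), ((i + 1) % d, b_next)):
            for p in range(k):
                for q in range(k):
                    big[i * k + p][j * k + q] = blk[p][q]
    return big
```

The wrap-around couples the last bit of the window to the first, which is an approximation of the true tri-diagonal structure rather than an exact rewrite of it.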