A bit-streaming, pipelined multiuser detector for wireless communications
Sridhar Rajagopal and Joseph R. Cavallaro
Rice University
{sridhar,cavallar}@rice.edu
This work is supported by Nokia, TI, TATP and NSF.
Motivation
- Implementing multiuser detection for 3G wireless systems at the base station
- Challenges:
  - large complexity
  - block-based algorithms (latency)
- Unable to meet real-time requirements (3GPP)
Contributions
- Developed a simple architecture for asynchronous multiuser detection [+, x]
- Bit-streaming
  - reduced latency
  - no window edge computations
  - lower memory requirements
- Pipelined stages
  - higher throughput (with more hardware)
- DSP-based implementation closer to real-time
Multiuser detection
[Figure: User 1 and User 2 reach the base station over direct paths and reflections; the received signal contains noise + interference.]
- Jointly detect data of all users
Benefits of multiuser detection
[Figure: Error rate vs. SNR. x-axis: SNR (in dB), 2 to 16. y-axis: bit error rate, tick labels -4 to -1.]
Curves:
- Single-user (channel estimation + detection)
- Multi-user estimation + single-user detection
- Multi-user (channel estimation + detection)
Asynchronous multiuser interference
- Interference due to past, current and future bits of other users
[Figure: timing diagram. Bit i of the desired user overlaps, because of the delay, with bits i-1, i and i+1 of the other users across the received symbols r(i-1) to r(i+2). Marked regions: interference from previous bits of other users, interference from future bits of other users.]
Multistage Parallel Interference Cancellation (PIC)
- Conventional code matched filter
- Notation: A - channel estimates, y - soft decision, d - detected bits
- Iterate for convergence (PIC)
- S = diag(A^H A)
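The PIC iteration named on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the common hard-decision form in which each stage subtracts (A^H A - S) d from the matched-filter output y, and it uses a small real-valued, synchronous two-user example. The names `pic_detect`, `R` and `R_off` are hypothetical.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def sign(x):
    """Hard decision on a soft value (ties go to +1)."""
    return 1 if x >= 0 else -1


def pic_detect(R, y, stages=3):
    """Multistage PIC on matched-filter outputs (assumed form, see lead-in).

    R      : K x K cross-correlation matrix, assumed to equal A^H A
    y      : matched-filter soft decisions, one per user
    stages : number of cancellation stages (0 = matched filter only)
    """
    K = len(y)
    # R - S with S = diag(R): the interference each user sees from the others.
    R_off = [[R[i][j] if i != j else 0.0 for j in range(K)] for i in range(K)]
    d = [sign(v) for v in y]  # stage 0: conventional matched-filter decisions
    for _ in range(stages):
        interference = matvec(R_off, d)
        y_clean = [yi - ii for yi, ii in zip(y, interference)]
        d = [sign(v) for v in y_clean]
    return d


# Illustrative data: two users, true bits [+1, -1], noise-free y = R b.
# User 2 is weak, so the matched filter alone decides its bit wrongly.
R = [[1.0, 0.3], [0.3, 0.2]]
y = [0.7, 0.1]
print(pic_detect(R, y, stages=0))  # matched filter only
print(pic_detect(R, y, stages=3))  # after three PIC stages
```

In this toy case the matched filter returns [1, 1], and the PIC stages correct the weak user's bit to give [1, -1].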
Multistage Parallel Interference Cancellation (PIC)
- Tri-diagonal block Toeplitz matrix [KD x KD]
  - D - detection window length
- Previous work: make the block Toeplitz matrix circulant
  - S. Das, J. R. Cavallaro, and B. Aazhang, "Computationally Efficient Multiuser Detectors," PIMRC 1997
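Block Based Detector
- 2 extra edge bit computations per stage
- Latency - variable [worst case (1st bit): D*latency]
[Figure: timing diagram against a TIME axis. The first window (bits 1-12) passes through MF, PIC1, PIC2 and PIC3 and yields bits 2-11; the second window (bits 11-22) passes through the same stages and yields bits 12-21.]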
Bit-streaming the multiuser detection algorithm
- Savings in memory by D^2
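One way to picture the bit-streaming idea is a PIC stage that keeps only three K x K blocks and a three-bit sliding window instead of the full KD x KD tri-diagonal block Toeplitz matrix. The sketch below is an assumption-laden illustration, not the authors' design: the block names `L`, `C` and `U` (sub-diagonal, diagonal and super-diagonal blocks, with the diagonal of `C` already removed) and all function names are hypothetical, and values are real rather than complex. `block_pic_stage` builds the full matrix only as a reference to check that both forms agree.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def cancel(y_i, d_prev, d_cur, d_next, L, C, U):
    """Remove interference from the previous, current and next bits."""
    past, cur, fut = matvec(L, d_prev), matvec(C, d_cur), matvec(U, d_next)
    return [y - p - c - f for y, p, c, f in zip(y_i, past, cur, fut)]


def stream_pic_stage(pairs, L, C, U):
    """One PIC stage as a generator over a stream of (y_i, d_i) pairs.

    Only the previous, current and next bit vectors are held, so memory
    does not grow with the number of bits processed.
    """
    zero = [0.0] * len(C)
    d_prev, current = zero, None
    for y_next, d_next in pairs:
        if current is not None:
            y_i, d_i = current
            yield cancel(y_i, d_prev, d_i, d_next, L, C, U)
            d_prev = d_i
        current = (y_next, d_next)
    if current is not None:  # last bit of the stream: nothing follows it
        y_i, d_i = current
        yield cancel(y_i, d_prev, d_i, zero, L, C, U)


def block_pic_stage(y, d, L, C, U):
    """Reference: the same stage using the full KD x KD matrix."""
    D, K = len(y), len(C)
    T = [[0.0] * (K * D) for _ in range(K * D)]
    for i in range(D):
        for blk, j in ((L, i - 1), (C, i), (U, i + 1)):
            if 0 <= j < D:
                for r in range(K):
                    for c in range(K):
                        T[i * K + r][j * K + c] = blk[r][c]
    flat_y = [x for bit in y for x in bit]
    flat_d = [x for bit in d for x in bit]
    out = [a - b for a, b in zip(flat_y, matvec(T, flat_d))]
    return [out[i * K:(i + 1) * K] for i in range(D)]


# Illustrative data: K = 2 users, D = 4 bits.
L = [[0.0, 0.25], [0.125, 0.0]]
C = [[0.0, 0.5], [0.5, 0.0]]
U = [[0.0, 0.125], [0.25, 0.0]]  # transpose of L
y = [[1.0, -0.5], [0.25, 0.75], [-1.0, 0.5], [0.5, -0.25]]
d = [[1, -1], [1, 1], [-1, 1], [1, -1]]
streamed = list(stream_pic_stage(zip(y, d), L, C, U))
print(streamed == block_pic_stage(y, d, L, C, U))
```

The streaming form stores three K x K blocks, while the reference stores (KD)^2 entries, which is where a memory saving on the order of D^2 would come from under these assumptions.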
Pipelining the multiuser detector
[Figure: timing diagram against a TIME axis with four rows, each showing bits 1-12: Matched Filter (causal), PIC - Stage 1, PIC - Stage 2, PIC - Stage 3.]
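The timing behaviour of such a pipeline can be explored with a toy schedule. The model below is an assumption, not taken from the slides: time is counted in bit slots, the causal matched filter finishes bit i at the end of slot i, and each PIC stage needs bit i+1 from the previous stage (the future-bit interference) and then takes one slot. The function names are hypothetical.

```python
def pipeline_schedule(num_bits, stages):
    """Return ready[s][i]: the slot at whose end stage s has produced bit i.

    Stage 0 is the matched filter; stages 1..stages are PIC stages.
    Toy model: each PIC stage waits for bit i+1 from the stage before it.
    """
    ready = [{i: i for i in range(1, num_bits + 1)}]
    for _ in range(stages):
        prev = ready[-1]
        cur = {}
        for i in sorted(prev):
            if i + 1 in prev:  # one-bit lookahead must be available
                cur[i] = prev[i + 1] + 1
        ready.append(cur)
    return ready


def block_fill_wait(position, window):
    """Slots a bit waits for its block to fill before processing can start."""
    return window - position


sched = pipeline_schedule(num_bits=12, stages=3)
latencies = {i: t - i for i, t in sched[3].items()}
print(latencies)               # per-bit latency after the last PIC stage
print(block_fill_wait(1, 12))  # first bit of a 12-bit block
```

Under these assumptions every bit leaves the last stage a fixed number of slots after it arrives and each stage handles one bit per slot, whereas in a block scheme the first bit of a window must wait for the rest of the window before any processing begins.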
Pipelined architecture for multiuser detection
- FPGAs for pipelining
  - Flexibility of ASICs
  - Good for parallelism and bit-level operations
[Figure: block diagram. Received bits and multiuser estimation feed the code matched filter detector on the DSP, followed by PIC (Stage 1) on FPGA1, PIC (Stage 2) on FPGA2 and PIC (Stage 3) on FPGA3, which outputs the detected bits.]
DSP simulations
[Figure: execution time (in seconds) vs. number of users, 5 to 35; y-axis tick labels -6 to -2.]
Curves:
- DSP implementation
- Target data rate - 128 Kbps/user
- DSP (MF) + FPGAs (PIC)
Summary
- Simple, bit-streaming, pipelined multiuser detector
- Avoids block computations
  - Savings in memory by D^2
- No edge bit computations in a window
  - 2/D computational savings per stage
- Lower, constant latency (by a factor of D relative to the block-based worst case)
- Can achieve real-time for up to 7 users
Test chip built as part of a VLSI course project
- Number of users supported: 4
- Area available: 3000x3000 inside the pad frame
- Area used: ~85%
- CMOS process: 0.5 micron
- Chip speed: 2 Mbps
http://www.owlnet.rice.edu/~sunbeam/422/