DSP Architectures for Future Wireless Base-Stations
Sridhar Rajagopal and Joseph Cavallaro
ECE Department, Rice University
April 10, 2000
This work is supported by Texas Instruments, Nokia, the Texas Advanced Technology Program, and NSF.
Overview
- Future Base-Stations
- Current DSP Implementation
- Our Approach
  - Make algorithms computationally effective
  - Task partitioning for pipelining and parallelism
  - DSP/ARM extensions for performance acceleration
TI Meeting 4/10/00
Evolution of Wireless Communication
- First Generation: Voice
- Second/Current Generation: Voice + Low-rate Data (9.6 Kbps)
- Third Generation+: Voice + High-rate Data (2 Mbps) + Multimedia
- W-CDMA
Communication System Uplink
[Figure: User 1 and User 2 transmit to the Base Station over a direct path and reflected paths; noise and multiple access interference (MAI) are added.]
Main Processing Blocks
[Figure: Baseband layer of the base-station receiver: Channel Estimation, Detection, Decoding.]
Proposed Base-Station
[Figure: TI's Wireless Basestation (http://www.ti.com/sc/docs/psheets/diagrams/basestat.htm)]
- No multiuser detection
Real-Time Requirements
- Multiple data rates by varying spreading factors
- Detection needs to be done in real time
- 1953 cycles available in a C6x DSP at 250 MHz to detect 1 bit at 128 Kbps
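The cycle budget quoted above is the clock rate divided by the bit rate. A minimal sketch of that arithmetic follows; the function name is hypothetical, and the 2 Mbps case is included only because it is the third-generation rate named earlier in the deck.

```python
def cycles_per_bit(clock_hz: float, bit_rate_bps: float) -> float:
    """Clock cycles available to process one bit in real time."""
    return clock_hz / bit_rate_bps


# 250 MHz C6x clock at the 128 Kbps target rate (figures from the slide).
print(int(cycles_per_bit(250e6, 128e3)))  # 1953

# Same clock at the 2 Mbps third-generation rate.
print(int(cycles_per_bit(250e6, 2e6)))    # 125
```

The budget covers all per-bit work on a single processor, so any work that exceeds it has to be removed, pipelined, or run in parallel.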
Current DSP Implementation
[Figure: Data Rate Comparisons for Matched Filter and Multiuser Detector. x-axis: Number of Users (9 to 15); y-axis: Data Rates Achieved. Curves: Multiuser Detector (C67), Matched Filter (C67), Multiuser Detector (C64)*, Matched Filter (C64)*, Targeted Data Rate.]
- Targeted data rate = 128 Kbps
- C67 at 166 MHz
- * Projected (8x)
Complexity
- Algorithm choice limited by complexity
- Multistage reduces data rate by half
Main Features
- Matrix-based operations
- High levels of parallelism
- Bit-level computations
- 32x32 problem size for the detector shown
- Estimation and decoding assumed pipelined
Reasons
- Sophisticated, compute-intensive algorithms
- Need more MIPS/FLOPS performance
- Unable to fully exploit pipelining or parallelism
- Bit-level computations / storage
Our Approach
- Make algorithms computationally effective without sacrificing error-rate performance
- Task partitioning on multiple processing elements
  - DSPs: core
  - FPGAs: application-specific / bit-level computations
- VLSI implementation to find extensions for DSPs
Algorithms
- Channel Estimation: avoid inversion by using an iterative scheme
- Detection: avoid block-based detection by pipelining
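Computations Involved
[Figure: timeline showing the delays of the asynchronous users, with r_i, b_i, and b_{i+1} marked along the time axis.]
- Model
  - b_i, b_{i+1}: bits of K asynchronous users aligned at times i and i-1
  - r_i: received bits of spreading length N for K users
- Compute correlation matrices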
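The slide does not spell out how the correlation matrices are formed. A minimal sketch follows, assuming they are running sums of outer products over known bits: Rbb accumulates b b^T and Rbr accumulates b r^H, which matches the per-bit O(K^2) and O(KN) costs listed in the task decomposition. The function names and the plain-list layout are hypothetical.

```python
def new_correlations(bit_len, chip_len):
    """Zero-initialised Rbb (bit_len x bit_len) and Rbr (bit_len x chip_len)."""
    Rbb = [[0] * bit_len for _ in range(bit_len)]
    Rbr = [[0] * chip_len for _ in range(bit_len)]
    return Rbb, Rbr


def update_correlations(Rbb, Rbr, b, r):
    """Add one bit interval: Rbb += b b^T and Rbr += b r^H (in place).

    b: known bits (+1/-1) for this interval, stacked across users.
    r: received chips for this interval (complex values).
    """
    for i, bi in enumerate(b):
        for j, bj in enumerate(b):
            Rbb[i][j] += bi * bj              # bit-level products only
        for n, rn in enumerate(r):
            Rbr[i][n] += bi * rn.conjugate()  # a sign flip of conj(r_n)


# Usage: two stacked bits, three received chips (made-up numbers).
Rbb, Rbr = new_correlations(2, 3)
update_correlations(Rbb, Rbr, [1, -1], [1 + 1j, 2, 0])
print(Rbb)  # [[1, -1], [-1, 1]]
```

Because the bits are +1/-1, every product here is a sign change or an add, which is why this step is a candidate for bit-level hardware rather than a multiplier-heavy DSP loop.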
- Solve for the channel estimate, A_i
- Multishot Detection
Differencing Multistage Detection
- Stage 0: Matched Filter
- Stage 1
- Successive Stages
- S = diag(A^H A)
- y: soft decision
- d: detected bits (hard decision)
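The stage equations did not survive on this slide. As a hedged illustration, the sketch below uses a standard multistage interference-cancellation update written in difference form: with G = A^H A and S = diag(G), each stage updates the soft decision as y <- y - (G - S)(d - d_prev), so only bits that changed since the previous stage cost any work. Real-valued BPSK signals, the function names, and the example numbers are all assumptions, not taken from the deck.

```python
def sign(v):
    """Hard decision on a soft value."""
    return 1 if v >= 0 else -1


def multistage_differencing(G, y0, stages):
    """Multistage detection in difference form.

    G: K x K matrix A^H A (real-valued here for simplicity).
    y0: matched-filter soft outputs (stage 0).
    Returns (d, y): hard and soft decisions after `stages` stages.
    """
    K = len(y0)
    # Interference matrix G - S, where S = diag(G).
    M = [[0.0 if i == j else G[i][j] for j in range(K)] for i in range(K)]
    y = list(y0)
    d_prev = [0] * K
    d = [sign(v) for v in y]                 # stage 0: matched filter
    for _ in range(stages):
        x = [d[k] - d_prev[k] for k in range(K)]
        for j, xj in enumerate(x):
            if xj == 0:                      # unchanged bit: nothing to do
                continue
            for i in range(K):
                y[i] -= M[i][j] * xj
        d_prev, d = d, [sign(v) for v in y]
    return d, y


# Usage: a weak user 1 next to a strong user 2; transmitted bits are [1, -1].
G = [[1.0, 1.5], [1.5, 9.0]]
y0 = [-0.5, -7.5]                            # noise-free G times [1, -1]
print(multistage_differencing(G, y0, 0)[0])  # [-1, -1]
print(multistage_differencing(G, y0, 2)[0])  # [1, -1]
```

In this example the matched filter gets the weak user's bit wrong, and one cancellation stage corrects it. In the second stage only one bit has changed, so only one column of the interference matrix is touched.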
Iterative Scheme
- Tracking
- Method of steepest descent
- Stable convergence behavior
- Same performance
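One way to read "avoid inversion" is to solve Rbb X = Rbr (with X standing in for A^H) by repeated gradient steps rather than by inverting Rbb. The sketch below is a minimal steepest-descent loop under these assumptions: real-valued matrices, a fixed step size mu chosen by hand, and hypothetical function names. The deck's actual update rule and step-size choice are not given.

```python
def steepest_descent_step(Rbb, Rbr, X, mu):
    """One step of X <- X - mu * (Rbb X - Rbr); returns the new X."""
    rows, cols = len(X), len(X[0])
    new_X = []
    for i in range(rows):
        new_row = []
        for j in range(cols):
            # (Rbb X - Rbr)[i][j]: the gradient of the quadratic cost.
            grad = sum(Rbb[i][k] * X[k][j] for k in range(rows)) - Rbr[i][j]
            new_row.append(X[i][j] - mu * grad)
        new_X.append(new_row)
    return new_X


# Usage: a 2x2 system whose exact solution is X = [[1], [2]].
Rbb = [[2.0, 1.0], [1.0, 3.0]]
Rbr = [[4.0], [7.0]]
X = [[0.0], [0.0]]
for _ in range(100):
    X = steepest_descent_step(Rbb, Rbr, X, mu=0.2)
print([round(row[0], 6) for row in X])  # [1.0, 2.0]
```

Each step costs matrix products only, and the previous estimate can seed the next one, which fits the "Tracking" bullet: as the correlation matrices are updated, a few steps can refresh the estimate instead of a fresh inversion. The step size has to stay below 2 divided by the largest eigenvalue of Rbb for the iteration to converge.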
Simulations: AWGN Channel
[Figure: Comparison of Bit Error Rates (BER). x-axis: Signal to Noise Ratio (SNR), 4 to 12; y-axis: BER (log scale). Curves: MF, ActMF, ML, ActML. Complexity annotations: O(K^2 N) and O(K^3 + K^2 N).]
- Detection window = 12
- SINR = 0
- Paths = 3
- Preamble L = 150
- Spreading N = 31
- Users K = 15
- 10000 bits/user
- MF: Matched Filter; ML: Maximum Likelihood; Act: using inversion
Block-Based Detector
[Figure: each window passes through Matched Filter, Stage 1, Stage 2, and Stage 3. The window over bits 1 to 12 yields bits 2-11; the window over bits 11 to 22 yields bits 12-21.]
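The window pattern in the figure can be reproduced with a short generator. This sketch assumes each window discards its first and last bit and that consecutive windows overlap by two bits, which is what the bit ranges in the figure imply; the function name and the 1-based indexing are illustrative choices.

```python
def block_windows(num_bits, window=12):
    """Yield ((first, last), (first_kept, last_kept)) for each full window.

    Bit indices are 1-based, matching the slide. The edge bits of every
    window are dropped, so windows advance by window - 2 bits.
    """
    start = 1
    while start + window - 1 <= num_bits:
        end = start + window - 1
        yield (start, end), (start + 1, end - 1)
        start += window - 2


for span, kept in block_windows(22):
    print(span, "->", kept)
# (1, 12) -> (2, 11)
# (11, 22) -> (12, 21)
```

Each window processes 12 bits to deliver 10, and no output is available until a whole window has been received and has passed through every stage.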
Pipelined Detector
[Figure: bits 1 to 12 stream through the Matched Filter, Stage 1, Stage 2, and Stage 3 as a pipeline, with each stage working on the bit stream rather than on whole blocks.]
Task Decomposition [Asilomar99]
[Figure: block diagram of the receiver tasks. Signals: Pilot, Data, Data', b, d, with MUX blocks selecting between them.]
Channel Estimation
- Block I: Correlation Matrices (per bit)
  - Rbb: O(K^2)
  - Rbr[R]: O(KN)
  - Rbr[I]: O(KN)
- Block II: Inverse
  - Rbb A^H = Rbr[R]: O(K^2 N)
  - Rbb A^H = Rbr[I]: O(K^2 N)
- Block III: Matrix Products
  - A0^H A0: O(K^2 N)
  - A0^H A1: O(K^2 N)
  - A1^H A1: O(K^2 N)
Multistage Detection (per window)
- Block IV
  - Matched Filter, A^H r: O(KND)
  - Multistage Detector, d: O(D K^2 Me)
Achieved Data Rates
[Figure: Data Rates for Different Levels of Pipelining and Parallelism. x-axis: Number of Users (9 to 15); y-axis: Data Rates (0.5 to 3 x 10^5). Curves: (Parallel A) (Parallel+Pipe B); (Parallel A) (Pipe B); (Parallel A) B; A B; Sequential A + B.]
- Data rate requirement = 128 Kbps
VLSI Implementation
- Channel estimation as a case study
- Area-time efficient architecture
- Real-time implementation
- Bit-level computations: FPGAs
- Core operations: DSPs
DSP Extensions for Performance
- Bit-level storage / processing support
  - Registers / Memory / ALU
- Efficient matrix-based operations
  - Matrix-vector multiply
- Support for complex-valued data
- Efficient memory accesses
  - Pre-fetching data (C64)
Use of ARM Core
- Work on higher base-station layers
[Figure: OSI layer stack with ARM and DSP labels.]
- Layers 3-7: User Interface, Translation, Synchronization, Transport, Network
- Layer 2: Data Link Layer (converts frames to bits)
- Layer 1: Physical Layer (hardware; raw bit stream)
Software Suggestions
- Limited OS support
- Compiler efficiency
  - No more assembly!
- Performance analysis tools
  - Code Composer Studio 1.2
Conclusions
- DSPs will play a major role in future base-station implementations.
- Search for computationally efficient algorithms and better processor designs to meet real-time requirements.