Published by Megan Bates. Modified over 9 years ago.
1
Efficient VLSI architectures for baseband signal processing in wireless base-station receivers
Sridhar Rajagopal, Srikrishna Bhashyam, Joseph R. Cavallaro, and Behnaam Aazhang
This work is supported by Nokia, TI, TATP and NSF.
2
Introduction
A real-time VLSI architecture for channel estimation
Channel estimation is usually neglected, but has high computational complexity
Current DSP solutions do not meet real-time requirements
Iterative fixed-point algorithm developed
Area-time tradeoffs discussed
–Area-constrained (pico-cells)
–Time-constrained (theoretical data rates)
–Area-time efficient (real-time solution)
3
Outline
What is multiuser channel estimation?
Need for multiuser channel estimation
Implementation problems
Algorithm enhancements
VLSI architectures
–Area-constrained, time-constrained, area-time efficient
Comparisons with DSP solutions
Related work and conclusions
4
Evolution of mobile communications
First generation: voice
Second/current generation: voice + low-rate data (9.6 Kbps)
Third generation: voice + high-rate data (2 Mbps/384 Kbps/128 Kbps) + multimedia
5
Channel estimation
[Figure: User 1 and User 2 transmit to the base station over a direct path and a reflected path; the received signal also contains noise and MAI.]
6
Need for channel estimation
To compensate for unknown fading amplitudes and asynchronous delays.
Detector performance depends on the accuracy of the channel estimator.
7
Computing channel estimates
Estimates are computed by sending a training sequence of known bits to the receiver.
When the training sequence is absent, detected bits can be used to update the estimates in a decision-feedback mode for tracking.
The importance of channel estimation is usually neglected.
Its complexity may exceed that of the detector.
8
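Baseband signal processing
[Figure: base-station receiver block diagram. Signals from multiple users arrive at the antenna and pass through channel estimation, detection and decoding. Channel estimation is driven by training, and by detected bits for tracking.]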
9
Implementation complexity
Matrix inversions (size 32x32) per window
Unable to meet real-time requirements on DSPs [Asilomar’99]
VLSI fixed-point architectures for matrix inversions
–Precision problems
Typically, simpler single-user sliding correlator structures are used.
11
Iterative scheme for channel estimation
Method of gradient descent
Stable convergence behavior
Same performance
Simpler bit-streaming hardware implementation
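The gradient-descent idea on this slide can be sketched in a few lines. This is a minimal floating-point sketch, not the authors' implementation: it assumes the estimate A solves R_bb A = R_br and is refined as A <- A - mu (R_bb A - R_br), so no matrix inversion is needed. The function names, the sizes in `demo`, the normalization of the correlation matrices by the window length, and the step size `mu = 0.5` are all illustrative assumptions.

```python
import random

def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def gradient_descent_estimate(R_bb, R_br, mu, iterations):
    """Iterate A <- A - mu * (R_bb A - R_br), starting from A = 0."""
    rows, cols = len(R_br), len(R_br[0])
    A = [[0.0] * cols for _ in range(rows)]
    for _ in range(iterations):
        G = matmul(R_bb, A)
        A = [[A[i][j] - mu * (G[i][j] - R_br[i][j]) for j in range(cols)]
             for i in range(rows)]
    return A

def demo(seed=1):
    """Estimate a random real channel from noisy training data; return the max error."""
    rng = random.Random(seed)
    M, N, L = 4, 6, 150  # M stands in for 2K; all sizes are illustrative
    A_true = [[rng.uniform(-1, 1) for _ in range(N)] for _ in range(M)]
    R_bb = [[0.0] * M for _ in range(M)]
    R_br = [[0.0] * N for _ in range(M)]
    for _ in range(L):
        b = [rng.choice((-1, 1)) for _ in range(M)]
        r = [sum(b[i] * A_true[i][j] for i in range(M)) + rng.gauss(0, 0.1)
             for j in range(N)]
        for i in range(M):
            for k in range(M):
                R_bb[i][k] += b[i] * b[k] / L  # normalized by L (assumption)
            for j in range(N):
                R_br[i][j] += b[i] * r[j] / L
    A_hat = gradient_descent_estimate(R_bb, R_br, mu=0.5, iterations=200)
    return max(abs(A_hat[i][j] - A_true[i][j]) for i in range(M) for j in range(N))
```

Calling `demo()` returns the largest element-wise error between the iterated estimate and the channel used to generate the data. Each iteration needs only a matrix product and a subtraction.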
12
Simulations - Static multipath channel
[Figure: simulation results. Parameters: SINR = 0, Paths = 3, Preamble L = 150, Spreading N = 31, Users K = 15.]
13
Fading channel with tracking
[Figure: simulation results. Doppler = 10 Kmph.]
15
Area-Time Tradeoffs
Design for K = 32 users and spreading code length N = 32
Target data rate = 128 Kbps
Low-power issues ignored!
Area-constrained architecture
–Pico-cells; lower data rates
Time-constrained architecture
–Maximum achievable data rates
Area-time efficient architecture
–Real-time with minimum area overhead
16
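Task Decomposition
[Figure: task decomposition along the time axis. MUXes select between pilot bits and detected bits, and between pilot and data. Correlation matrices are updated per bit: R_bb, O(2K^2, 8), and R_br, O(2KN, 8). The iteration on A, O(4K^2 N, 8), produces the channel estimate passed to the detector. The tracking window of length L is shown with b (2K, 1), r (N, 8), b_0 (2K, 1) and r_0 (N, 8).]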
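To get a feel for the workload, the orders annotated in the figure can be evaluated at the design point stated in the deck (K = 32, N = 32, 128 Kbps). This is back-of-the-envelope arithmetic only. It assumes the first entry of each O(., 8) annotation is an operation count per bit and the second a bit width, and that one iteration on A is run per bit; neither reading is confirmed by the slides.

```python
K = 32                    # users, from the area-time tradeoffs slide
N = 32                    # spreading code length, from the same slide
DATA_RATE_BPS = 128_000   # target data rate of 128 Kbps

# Operation counts per bit, read off the O(.) annotations (assumed interpretation).
ops_per_bit = {
    "R_bb update": 2 * K**2,
    "R_br update": 2 * K * N,
    "A iteration": 4 * K**2 * N,
}

# Time available per bit at the target data rate, in microseconds.
bit_period_us = 1e6 / DATA_RATE_BPS

for name, ops in ops_per_bit.items():
    print(f"{name}: {ops} operations per {bit_period_us} us bit period")
```

Under these assumptions the iteration on A needs 131072 operations in a 7.8125 µs bit period, against 2048 for each correlation update. This agrees with the deck's statement that the multiplication forms the bottleneck.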
17
Architecture Design: Auto-correlation
b = {+1,-1}
Multiplication is an XNOR operation
The entire matrix can be updated sequentially or in parallel using XNOR gates
The auto-correlation matrix is implemented as UP/DOWN counter(s)
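The XNOR and up/down counter idea can be checked in software. The sketch below assumes bits are stored as 0/1, with 1 standing for +1 and 0 for -1; the function names are illustrative. Each counter counts up when two bits agree (XNOR = 1) and down when they differ, which matches accumulating the ±1 product.

```python
def xnor(x, y):
    """XNOR of two single bits (each 0 or 1)."""
    return 1 - (x ^ y)

def update_autocorrelation(counters, bits):
    """One bit-time update: counter (i, j) counts up if bits i and j agree, else down."""
    n = len(bits)
    for i in range(n):
        for j in range(n):
            counters[i][j] += 1 if xnor(bits[i], bits[j]) else -1
    return counters

def reference_autocorrelation(bit_vectors):
    """Same matrix computed with explicit +1/-1 multiplications, for comparison."""
    n = len(bit_vectors[0])
    R = [[0] * n for _ in range(n)]
    for bits in bit_vectors:
        signs = [2 * x - 1 for x in bits]  # map 0 -> -1 and 1 -> +1
        for i in range(n):
            for j in range(n):
                R[i][j] += signs[i] * signs[j]
    return R
```

Feeding the same bit vectors to both functions gives identical matrices, so no multiplier is needed for R_bb. The diagonal counters only ever count up, since a bit always agrees with itself.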
18
Architecture Design: Cross-Correlation
b = {+1,-1}, r = 8-bit integer vector (complex)
Multiplications reduce to additions/subtractions
The entire matrix (complex) can be updated sequentially or in parallel using 8-bit adders
The cross-correlation matrix is stored in RAM.
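Because each bit is ±1, the product b(i)·r(j) is either r(j) or -r(j), so the update needs only an adder/subtractor per element. A minimal sketch follows, assuming bits are stored as 0/1 (1 for +1, 0 for -1) and using Python's built-in complex type in place of separate 8-bit real and imaginary parts; overflow and word length are not modeled.

```python
def update_cross_correlation(R_br, bits, r):
    """One bit-time update: R_br[i][j] += r[j] if bit i is 1 (+1), else -= r[j]."""
    for i, bit in enumerate(bits):
        row = R_br[i]
        for j, sample in enumerate(r):
            # The bit only selects add or subtract; no multiplication is performed.
            row[j] = row[j] + sample if bit else row[j] - sample
    return R_br
```

For example, starting from a zero matrix with `bits = [1, 0]` and `r = [3-5j, -7+2j]`, the first row becomes `r` and the second row becomes `-r`.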
19
Architecture Design: Channel Estimate
A = 8-bit integer matrix (complex)
µ << 1: truncated multiplication [Schulte’93]
Matrix-matrix (real-complex) multiplication of integers
Forms the bottleneck
Can be done sequentially with a single multiplier, totally in parallel, or partially in parallel
Concentrate on multiplication for area-time tradeoffs!
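One integer iteration of the estimate can be sketched as follows. This sketch assumes the step size is a power of two, µ = 2^-shift, applied as an arithmetic right shift (suggested by the ">>" block in the architecture diagrams), and it uses real integers only; a complex A would run the same update on real and imaginary parts. It does not model the truncated multiplier of [Schulte’93] or 8-bit saturation. The function name and the `shift` parameter are illustrative.

```python
def iterate_fixed_point(A, R_bb, R_br, shift):
    """One integer update A <- A - ((R_bb A - R_br) >> shift), i.e. mu = 2**-shift."""
    rows, cols = len(A), len(A[0])
    A_new = []
    for i in range(rows):
        new_row = []
        for j in range(cols):
            # Multiply-accumulate over one row of R_bb: the multiplication bottleneck.
            acc = sum(R_bb[i][k] * A[k][j] for k in range(rows))
            grad = acc - R_br[i][j]
            # Python's >> on negative integers rounds toward minus infinity.
            new_row.append(A[i][j] - (grad >> shift))
        A_new.append(new_row)
    return A_new
```

Every element of `A_new` is computed from the old `A`, so the inner multiply-accumulate can be mapped onto one multiplier used sequentially or onto many multipliers in parallel without changing the result.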
20
Area-Constrained Architecture
[Figure: area-constrained architecture block diagram, with inputs b_0 and b.]
21
Area-constrained Architecture: Hardware Requirements
22
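Time-constrained Architecture
[Figure: time-constrained architecture. MUXes select between b and b_0 and between r and r_0. The products b*b^T and b_0*b_0^T update R_bb, and R_br is updated alongside it. A Mult, Subtract, >>, Subtract datapath updates A and outputs the channel estimate. Bus widths shown: 2K*1, K(2K-1)*1, 2K^2*8, 2KN*16, 2KN*8, 2K*1, N*8, 2KN*8.]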
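The paired inputs b, b_0 and r, r_0 in the diagram suggest a sliding-window update of the correlation matrices. The sketch below assumes b_0 and r_0 are the bit vector and received vector leaving the tracking window while b and r enter it; this reading, the function names, and the use of plain integers with +1/-1 bits are assumptions, not details confirmed by the slides.

```python
def correlate(pairs, n_bits, n_samples):
    """Recompute R_bb and R_br from scratch over every (b, r) pair in the window."""
    R_bb = [[0] * n_bits for _ in range(n_bits)]
    R_br = [[0] * n_samples for _ in range(n_bits)]
    for b, r in pairs:
        for i in range(n_bits):
            for k in range(n_bits):
                R_bb[i][k] += b[i] * b[k]
            for j in range(n_samples):
                R_br[i][j] += b[i] * r[j]
    return R_bb, R_br

def slide_window(R_bb, R_br, b_new, r_new, b_old, r_old):
    """Add the newest pair's outer products and subtract the oldest pair's, in place."""
    for i in range(len(b_new)):
        for k in range(len(b_new)):
            R_bb[i][k] += b_new[i] * b_new[k] - b_old[i] * b_old[k]
        for j in range(len(r_new)):
            R_br[i][j] += b_new[i] * r_new[j] - b_old[i] * r_old[j]
    return R_bb, R_br
```

With this update the cost per bit does not depend on the window length L, because only the entering and leaving pairs are touched.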
23
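Auto-correlation Update in Parallel
[Figure: the bit vector b (2K), shown with bits a, b, c, d, feeds an array of XNORs that forms bb^T (K*{2K-1}*1), i.e. the products a·b, a·c, a·d, b·c, b·d, c·d. Each bb^T(i,j) output drives the U/D# input of the counter holding R_bb(i,j); the R_bb(i,i) counter is shown with 1 on U/D#. The array of counters holds R_bb (2K^2*8).]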
24
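Cross-Correlation Update in Parallel
[Figure: b (2K*1), shown with bits a, b, c, d, and r (N*8) update R_br (2KN*8). Each element uses an 8-bit adder that combines R_br(i,j) with r(j), with the 1-bit b(i) driving the Add/Sub# input.]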
25
Time-constrained Architecture: Hardware Requirements
26
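Area-Time Efficient Architecture
[Figure: area-time efficient architecture. MUXes select between b and b_0 and between r and r_0. The products b*b^T and b_0*b_0^T update R_bb, held in counters, and a 1*8 adder updates R_br. Values are loaded and stored through a MUX and DEMUX. A Mult, Subtract, >>, Subtract datapath computes A_new from A. Bus widths shown: 2K*1, 2K*8, 1*16, 1*8, 1*1, N*8.]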
27
Area-Time Efficient Architecture: Hardware Requirements
29
DSP Comparisons
DSPs are unable to exploit bit-level parallelism
Inefficient storage of bits
Replacing multiplications by additions/subtractions
30
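Related Work: DSP Extensions
[Figure: a 64-bit register D[i][j] passes through a +/- unit controlled by an 8-bit control register b[i], and the result is written to a 64-bit register D[i][j]; the data paths are marked 8 (cross-correlation).]
For i = 1..8, j = 1..8: D[i][j] = D[i][j] + b[i]*C[j]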
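The proposed extension can be simulated in software to make the data layout concrete. This sketch assumes each row D[i][1..8] and the vector C[1..8] are packed as eight 8-bit lanes of a 64-bit word, that bit i of the control register selects add (1, for b[i] = +1) or subtract (0, for b[i] = -1), and that lanes wrap modulo 256 with no carry between them. The lane order, the wrap-around behaviour, and all function names are assumptions.

```python
LANES = 8
MASK = 0xFF

def pack(values):
    """Pack eight small integers into one 64-bit word, lane 0 in the low byte."""
    word = 0
    for lane, v in enumerate(values):
        word |= (v & MASK) << (8 * lane)
    return word

def unpack(word):
    """Split a 64-bit word back into its eight 8-bit lanes."""
    return [(word >> (8 * lane)) & MASK for lane in range(LANES)]

def packed_add_sub(d_reg, c_reg, add):
    """Lane-wise D + C (add=True) or D - C (add=False); each lane wraps modulo 256."""
    out = 0
    for lane in range(LANES):
        shift = 8 * lane
        d = (d_reg >> shift) & MASK
        c = (c_reg >> shift) & MASK
        out |= ((d + c if add else d - c) & MASK) << shift
    return out

def cross_correlation_step(d_rows, c_reg, control):
    """Apply D[i][j] = D[i][j] + b[i]*C[j] to eight packed rows; control bit i encodes b[i]."""
    return [packed_add_sub(d_rows[i], c_reg, bool((control >> i) & 1))
            for i in range(LANES)]
```

One call to `cross_correlation_step` performs the full 8x8 update from the slide as eight packed add/subtract operations.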
31
Related Work: Online Arithmetic
Multiuser detection
–Need to compute only the sign bit (most significant digit)
–No back-conversion to conventional representation
–Complex-number representation possible
–Integration with channel estimation also
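The reason that only the most significant digit matters for detection can be shown with a small sketch. It assumes a radix-2 signed-digit result with digits in {-1, 0, 1} delivered most significant digit first, as is usual for online arithmetic; the digit set, the weights 2^-1, 2^-2, ..., and the function names are assumptions. In this representation the first nonzero digit fixes the sign, so a detector that only needs the sign can stop there.

```python
def sign_from_msd_first(digits):
    """Return (sign, digits_consumed) for signed digits arriving most significant first."""
    consumed = 0
    for d in digits:
        consumed += 1
        if d:
            # The remaining digits sum to less than this digit's weight,
            # so they cannot change the sign.
            return (1 if d > 0 else -1), consumed
    return 0, consumed

def value(digits):
    """Conventional value of the digit string, with weights 2**-1, 2**-2, ..."""
    return sum(d * 2.0 ** -(k + 1) for k, d in enumerate(digits))
```

For the digit string `[0, -1, 1, 1]` the sign is known after two digits, and `value` confirms that the full number is negative without any conversion being needed for the decision.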
32
Related Work: DSP-FPGA solutions
Multiple DSP-FPGA task partitioning
Bit-level parallelism on FPGAs
Multiplications on DSPs
Sundance Multi-DSP System
–2 TI C67 DSPs
–2 Xilinx Virtex FPGAs
–http://www.sundance.com
33
Conclusions
Real-time VLSI architecture for multiuser channel estimation
Iterative fixed-point algorithm developed to avoid matrix inversions
Area-time tradeoffs discussed
–Area-constrained (pico-cells)
–Time-constrained (data rates)
–Area-time efficient (real-time)
VLSI architectures better exploit bit-level computations and parallelism to meet real-time constraints than DSPs.