Download presentation
Presentation is loading. Please wait.
1
Sridhar Rajagopal April 26, 2000
Baseband Architecture Design for Future Wireless Base-Station Receivers Sridhar Rajagopal April 26, 2000 This work is supported by Nokia, Texas Instruments, Texas Advanced Technology Program and NSF
2
Outline Background Multiuser Channel Estimation and Detection
DSP Implementation and Task Partitioning Reduced Complexity Algorithms VLSI Architecture Architecture/Extensions for DSPs and GPPs
3
Evolution of Wireless Communications
First Generation Voice Second/Current Generation Voice + Low-rate Data (9.6Kbps) Third Generation + Voice + High-rate Data (2 Mbps) + Multimedia W-CDMA
4
Communication System Uplink
Direct Path Reflected Paths Noise +MAI User 1 User 2 Base Station
5
Main Processing Blocks
Multiple Users Channel Estimation Multiuser Detection Decoder Data Pilot Demod -ulator Antenna Decision Feedback MU X Detected Bits + Base-station Receiver Delay d b Baseband Layer of Base-Station Receiver
6
Real -Time Requirements
Multiple Data Rates by Varying Spreading Factors Detection needs to be done in real-time 1953 cycles available in a C6x DSP at 250MHz to detect 1 bit at 128 Kbps
7
Outline Background Multiuser Channel Estimation and Detection
DSP Implementation and Task Partitioning Reduced Complexity Algorithms VLSI Architecture Architecture/Extensions for DSPs and GPPs
8
Bits of K async. users aligned at times I and I-1
Channel Model delay Compute Correlation Matrices ri bi bi+1 time Bits of K async. users aligned at times I and I-1 Received bits of spreading length N for K users
9
Solve for the channel estimate, Ai
Channel Estimation Solve for the channel estimate, Ai Multishot
10
Differencing Multistage Detection
Stage 0- Matched Filter Stage 1 Successive Stages S=diag(AHA) y - soft decision d - detected bits (hard decision)
11
Block Bi-Diagonal Matrix
Structure of AHA Block Bi-Diagonal Matrix
12
Outline Background Multiuser Channel Estimation and Detection
DSP Implementation and Task Partitioning Reduced Complexity Algorithms VLSI Architecture Architecture/Extensions for DSPs and GPPs
13
Current DSP Implementation
9 10 11 12 13 14 15 2 4 6 8 16 18 x 10 Number of Users Data Rates Achieved Data Rate Comparisons for Matched Filter and Multiuser Detector Multiuser Detector(C67) Matched Filter(C67) Multiuser Detector(C64)* Matched Filter(C64)* Targeted Data Rate Targeted Data Rate = 128Kbps C67 at 166MHz Projected (8x)
14
Reasons for Poor Performance
Sophisticated, Compute-Intensive Algorithms Need more MIPs/FLOPs performance Unable to fully exploit pipelining or parallelism Bit - level computations / Storage
15
Task Decomposition [Asilomar’99]
Correlation Matrices (Per Bit) Rbr[I] O(KN) b Pilot M U X d Data’ Rbr[R] Rbb O(K2) Block I Channel Estimation Inverse RbbAH = Rbr[I] O(K2N) = Rbr[R] Block II Matrix Products A0HA1 O(K2N) AHr O(KND) A1HA1 A0HA0 Data Block III Matched Filter Multistage Detection (Per Window) O(DK2Me) d Block IV Multistage Detector
16
Achieved Data Rates 5 9 10 11 12 13 14 15 0.5 1 1.5 2 2.5 3 x 10 Number of Users Data Rates Data Rates for Different Levels of Pipelining and Parallelism Data Rate Requirement = 128 Kbps
17
Task Partitioning Hardware Req.
O(K2) processing elements 1024 for K =32 Can meet Real-Time Not feasible in hardware
18
Outline Background Channel Estimation and Detection
DSP Implementation and Task Partitioning Reduced Complexity Algorithms VLSI Architecture Architecture/Extensions for DSPs and GPPs
19
Iterative Scheme for Estimation
Tracking Method of Gradient Descent Stable convergence behavior Symmetric, Positive Definite Rbb µ - MAI, SNR, Preamble length Same Performance
20
Simulations - AWGN Channel
4 5 6 7 8 9 10 11 12 -3 -2 -1 Comparison of Bit Error Rates (BER) Signal to Noise Ratio (SNR) BER MF ActMF ML ActML O(K2N) O(K3+K2N) Detection Window = 12 SINR = 0 Paths =3 Preamble L =150 Spreading N = 31 Users K = 15 10000 bits/user MF – Matched Filter ML- Maximum Likelihood ACT – using inversion
21
Fading Channel with Tracking
Doppler = 10 Hz, 1000 Bits,15 users, 3 Paths 4 5 6 7 8 9 10 11 12 -3 -2 -1 SNR BER MF - Static MF - Tracking ML - Static ML - Tracking
22
Pre-computed Preamble
Preamble bits bi known at the receiver Reduces Complexity, if pre-computed.
23
Computational Savings in Estimation
Pre-computed Auto-correlation has large savings Can be used only for quasi-static channels and initial acquisition.
24
Detection
25
Pipelined Detection Scheme
ri bi bi+1 time bi-2 bi-1 bi bi+1 Interference from previous bits of other users future bits of other users Desired User User 1 User j
26
Block Based Detector Matched Filter 1 12 Stage 1 Stage 2 Stage 3
Matched Filter Stage 1 Stage 2 Stage 3 Bits 2-11 Bits 12-21
27
Pipelined Detector Matched Filter Stage 1 Stage 2 Stage 3
28
Computational Savings in Detection
Edge Bits are not computed Bit-streaming Simpler Hardware Structure 6K2 per Window Savings
29
Outline Background Real-Time Requirements
DSP Implementation and Task Partitioning Reduced Complexity Algorithms VLSI Architecture Architecture/Extensions for DSPs and GPPs
30
VLSI Implementation [ASAP’2000]
Channel Estimation as a Case Study Area - Time Efficient Architecture Real - Time Implementation Minimum Area Overhead Bit- Level Computations - FPGAs Core Operations - DSPs
31
Area-Time Tradeoffs Area-Constrained Architecture
Pico-cells ; lower data rates Time-Constrained Architecture Maximum achieve-able data rates Area-Time Efficient Architecture Real-Time with minimum area overhead
32
Outline Background Channel Estimation and Detection
DSP Implementation and Task Partitioning Reduced Complexity Algorithms VLSI Architecture Architecture/Extensions for DSPs and GPPs
33
Motivation for Architecture
Wireless, the next wave after Multimedia Highly Compute-Intensive Algorithms Real-Time Requirements
34
Characteristics of Wireless Algorithms
Massive Parallelism Bit-level Computations Matrix Based Operations Memory Intensive Complex-valued Data Approximate Computations
35
Why Reconfigurable Adapt algorithms to environment
Seamless and Continuous Data Processing during Handoffs Home Area Wireless LAN High Speed Office Wireless LAN Outdoor CDMA Cellular Network
36
Different Protocols MPEG-4, H.723 - Voice,Multimedia
Convolutional,Turbo - Channel Coding Source Coding Channel Coding Channel Decoding Source Multiuser Detection Estimation
37
A New Architecture Main Memory Cache Q Q Crossbar Real-Time I/O
Processor Core (GPP/DSP) Cache Q Q Crossbar Real-Time I/O Bit Stream Reconfigurable Logic RF Unit Add-on PCMCIA Card Processor
38
Reconfigurable Support
Configuration Caches Recently Displaced Configurations (5 cycles) Can hold 4 full size Configurations Independent Execution
39
Permutation Based Interleaved Memory
High Memory Bandwidth Needed Stride-Insensitive Memory System for Matrices Randomizes access Multiple Banks Sustained Peak Throughput (95%)
40
Instruction Set Extensions
To accelerate Bit level computations in Wireless Integer - Bit Multiplications Multiuser Detection, Decoding, Cross Correlation Bit - Bit Multiplications Auto-Correlation, Channel Estimation Useful in other Signal Processing applications Speech, Video,,,
41
SIMD Parallelism 64-bit Register A 64-bit Register C 64-bit Register B
+ 64-bit Register C 8 64-bit Register B x
42
Integer - Bit Multiplications
64-bit Register C[j] 64-bit Register D[i][j] +/- 8-bit Control Register b[i] 8 For i = 1..8, j= 1..8 D[i][j] = D[i][j] + b[i]*C[j] (Cross-Correlation)
43
Computational Savings
Avoid bit multiplications and control structures 4 8-bit Multiply Latency 3 8 8-bit Add Latency 1 Cross-Correlation Example 64 multiply, 64 add
44
Truncated Multipliers
Many applications need approximate computations Adaptive Algorithms :Y = Y + mu*(Y*C) Truncate lower bits Half the area/half the delay Can do 2 truncated multiplies in parallel with regular Multiplier 1 Multiplier 2 Truncated Multiplier ALU Multipliers
45
Future Work Long Codes - Implementation Online Arithmetic
Multiprocessing on DSPs and FPGAs
46
Conclusions Architecture and Algorithms to meet real-time
Task Decomposition Real Time with Multiple Processing Elements Iterative Algorithms Reduce Complexity, Simpler Implementation VLSI Implementation Real-Time with minimum Area Overhead Architecture/Extensions to DSPs and GPPs
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.