Presentation is loading. Please wait.

Presentation is loading. Please wait.

Sridhar Rajagopal April 26, 2000

Similar presentations


Presentation on theme: "Sridhar Rajagopal April 26, 2000"— Presentation transcript:

1 Sridhar Rajagopal April 26, 2000
Baseband Architecture Design for Future Wireless Base-Station Receivers Sridhar Rajagopal April 26, 2000 This work is supported by Nokia, Texas Instruments, Texas Advanced Technology Program and NSF

2 Outline Background Multiuser Channel Estimation and Detection
DSP Implementation and Task Partitioning Reduced Complexity Algorithms VLSI Architecture Architecture/Extensions for DSPs and GPPs

3 Evolution of Wireless Communications
First Generation Voice Second/Current Generation Voice + Low-rate Data (9.6Kbps) Third Generation + Voice + High-rate Data (2 Mbps) + Multimedia W-CDMA

4 Communication System Uplink
Direct Path Reflected Paths Noise +MAI User 1 User 2 Base Station

5 Main Processing Blocks
Multiple Users Channel Estimation Multiuser Detection Decoder Data Pilot Demod -ulator Antenna Decision Feedback MU X Detected Bits + Base-station Receiver Delay d b Baseband Layer of Base-Station Receiver

6 Real -Time Requirements
Multiple Data Rates by Varying Spreading Factors Detection needs to be done in real-time 1953 cycles available in a C6x DSP at 250MHz to detect 1 bit at 128 Kbps

7 Outline Background Multiuser Channel Estimation and Detection
DSP Implementation and Task Partitioning Reduced Complexity Algorithms VLSI Architecture Architecture/Extensions for DSPs and GPPs

8 Bits of K async. users aligned at times I and I-1
Channel Model delay Compute Correlation Matrices ri bi bi+1 time Bits of K async. users aligned at times I and I-1 Received bits of spreading length N for K users

9 Solve for the channel estimate, Ai
Channel Estimation Solve for the channel estimate, Ai Multishot

10 Differencing Multistage Detection
Stage 0- Matched Filter Stage 1 Successive Stages S=diag(AHA) y - soft decision d - detected bits (hard decision)

11 Block Bi-Diagonal Matrix
Structure of AHA Block Bi-Diagonal Matrix

12 Outline Background Multiuser Channel Estimation and Detection
DSP Implementation and Task Partitioning Reduced Complexity Algorithms VLSI Architecture Architecture/Extensions for DSPs and GPPs

13 Current DSP Implementation
9 10 11 12 13 14 15 2 4 6 8 16 18 x 10 Number of Users Data Rates Achieved Data Rate Comparisons for Matched Filter and Multiuser Detector Multiuser Detector(C67) Matched Filter(C67) Multiuser Detector(C64)* Matched Filter(C64)* Targeted Data Rate Targeted Data Rate = 128Kbps C67 at 166MHz Projected (8x)

14 Reasons for Poor Performance
Sophisticated, Compute-Intensive Algorithms Need more MIPs/FLOPs performance Unable to fully exploit pipelining or parallelism Bit - level computations / Storage

15 Task Decomposition [Asilomar’99]
Correlation Matrices (Per Bit) Rbr[I] O(KN) b Pilot M U X d Data’ Rbr[R] Rbb O(K2) Block I Channel Estimation Inverse RbbAH = Rbr[I] O(K2N) = Rbr[R] Block II Matrix Products A0HA1 O(K2N) AHr O(KND) A1HA1 A0HA0 Data Block III Matched Filter Multistage Detection (Per Window) O(DK2Me) d Block IV Multistage Detector

16 Achieved Data Rates 5 9 10 11 12 13 14 15 0.5 1 1.5 2 2.5 3 x 10 Number of Users Data Rates Data Rates for Different Levels of Pipelining and Parallelism Data Rate Requirement = 128 Kbps

17 Task Partitioning Hardware Req.
O(K2) processing elements 1024 for K =32 Can meet Real-Time Not feasible in hardware

18 Outline Background Channel Estimation and Detection
DSP Implementation and Task Partitioning Reduced Complexity Algorithms VLSI Architecture Architecture/Extensions for DSPs and GPPs

19 Iterative Scheme for Estimation
Tracking Method of Gradient Descent Stable convergence behavior Symmetric, Positive Definite Rbb µ - MAI, SNR, Preamble length Same Performance

20 Simulations - AWGN Channel
4 5 6 7 8 9 10 11 12 -3 -2 -1 Comparison of Bit Error Rates (BER) Signal to Noise Ratio (SNR) BER MF ActMF ML ActML O(K2N) O(K3+K2N) Detection Window = 12 SINR = 0 Paths =3 Preamble L =150 Spreading N = 31 Users K = 15 10000 bits/user MF – Matched Filter ML- Maximum Likelihood ACT – using inversion

21 Fading Channel with Tracking
Doppler = 10 Hz, 1000 Bits,15 users, 3 Paths 4 5 6 7 8 9 10 11 12 -3 -2 -1 SNR BER MF - Static MF - Tracking ML - Static ML - Tracking

22 Pre-computed Preamble
Preamble bits bi known at the receiver Reduces Complexity, if pre-computed.

23 Computational Savings in Estimation
Pre-computed Auto-correlation has large savings Can be used only for quasi-static channels and initial acquisition.

24 Detection

25 Pipelined Detection Scheme
ri bi bi+1 time bi-2 bi-1 bi bi+1 Interference from previous bits of other users future bits of other users Desired User User 1 User j

26 Block Based Detector Matched Filter 1 12 Stage 1 Stage 2 Stage 3
Matched Filter Stage 1 Stage 2 Stage 3 Bits 2-11 Bits 12-21

27 Pipelined Detector Matched Filter Stage 1 Stage 2 Stage 3

28 Computational Savings in Detection
Edge Bits are not computed Bit-streaming Simpler Hardware Structure 6K2 per Window Savings

29 Outline Background Real-Time Requirements
DSP Implementation and Task Partitioning Reduced Complexity Algorithms VLSI Architecture Architecture/Extensions for DSPs and GPPs

30 VLSI Implementation [ASAP’2000]
Channel Estimation as a Case Study Area - Time Efficient Architecture Real - Time Implementation Minimum Area Overhead Bit- Level Computations - FPGAs Core Operations - DSPs

31 Area-Time Tradeoffs Area-Constrained Architecture
Pico-cells ; lower data rates Time-Constrained Architecture Maximum achieve-able data rates Area-Time Efficient Architecture Real-Time with minimum area overhead

32 Outline Background Channel Estimation and Detection
DSP Implementation and Task Partitioning Reduced Complexity Algorithms VLSI Architecture Architecture/Extensions for DSPs and GPPs

33 Motivation for Architecture
Wireless, the next wave after Multimedia Highly Compute-Intensive Algorithms Real-Time Requirements

34 Characteristics of Wireless Algorithms
Massive Parallelism Bit-level Computations Matrix Based Operations Memory Intensive Complex-valued Data Approximate Computations

35 Why Reconfigurable Adapt algorithms to environment
Seamless and Continuous Data Processing during Handoffs Home Area Wireless LAN High Speed Office Wireless LAN Outdoor CDMA Cellular Network

36 Different Protocols MPEG-4, H.723 - Voice,Multimedia
Convolutional,Turbo - Channel Coding Source Coding Channel Coding Channel Decoding Source Multiuser Detection Estimation

37 A New Architecture Main Memory Cache Q Q Crossbar Real-Time I/O
Processor Core (GPP/DSP) Cache Q Q Crossbar Real-Time I/O Bit Stream Reconfigurable Logic RF Unit Add-on PCMCIA Card Processor

38 Reconfigurable Support
Configuration Caches Recently Displaced Configurations (5 cycles) Can hold 4 full size Configurations Independent Execution

39 Permutation Based Interleaved Memory
High Memory Bandwidth Needed Stride-Insensitive Memory System for Matrices Randomizes access Multiple Banks Sustained Peak Throughput (95%)

40 Instruction Set Extensions
To accelerate Bit level computations in Wireless Integer - Bit Multiplications Multiuser Detection, Decoding, Cross Correlation Bit - Bit Multiplications Auto-Correlation, Channel Estimation Useful in other Signal Processing applications Speech, Video,,,

41 SIMD Parallelism 64-bit Register A 64-bit Register C 64-bit Register B
+ 64-bit Register C 8 64-bit Register B x

42 Integer - Bit Multiplications
64-bit Register C[j] 64-bit Register D[i][j] +/- 8-bit Control Register b[i] 8 For i = 1..8, j= 1..8 D[i][j] = D[i][j] + b[i]*C[j] (Cross-Correlation)

43 Computational Savings
Avoid bit multiplications and control structures 4 8-bit Multiply Latency 3 8 8-bit Add Latency 1 Cross-Correlation Example 64 multiply, 64 add

44 Truncated Multipliers
Many applications need approximate computations Adaptive Algorithms :Y = Y + mu*(Y*C) Truncate lower bits Half the area/half the delay Can do 2 truncated multiplies in parallel with regular Multiplier 1 Multiplier 2 Truncated Multiplier ALU Multipliers

45 Future Work Long Codes - Implementation Online Arithmetic
Multiprocessing on DSPs and FPGAs

46 Conclusions Architecture and Algorithms to meet real-time
Task Decomposition Real Time with Multiple Processing Elements Iterative Algorithms Reduce Complexity, Simpler Implementation VLSI Implementation Real-Time with minimum Area Overhead Architecture/Extensions to DSPs and GPPs


Download ppt "Sridhar Rajagopal April 26, 2000"

Similar presentations


Ads by Google