Suman Das, Sridhar Rajagopal, Chaitali Sengupta and Joseph R.Cavallaro


1 Arithmetic Acceleration Techniques for Wireless Communication Receivers
Suman Das, Sridhar Rajagopal, Chaitali Sengupta and Joseph R. Cavallaro
Rice University
This work is supported by Nokia, Texas Instruments, Texas Advanced Technology Program and NSF

2 Objective
Next generation Wireless Base-station
Real-Time Requirements
Multiuser Channel Estimation and Detection
High Complexity Algorithms for Advanced Receiver Structures
Task Decomposition
Potential for parallelism
Application-Specific Design / Single Processor

3 Outline
Motivation
Real-time Requirements
Joint Estimation and Detection
Task Decomposition
Results
Summary

4 Motivation
Next Generation Wireless Systems
  Higher Data Rates, up to 2 Mbps
  Multimedia Capabilities
  Multi-rate, QoS
High Complexity in Proposed Algorithms
  Pressure on existing hardware
  Time, power, size constraints
Acceleration on Hardware Needed

5 Wireless Communication Uplink
Asynchronous CDMA System
Multiple Users
Channel Effects: Fading, Multiple paths, Multiple Access Interference
[Figure: User 1 and User 2 transmit to the Base Station over a direct path and reflected paths; Noise + MAI is added.]

6 Base-Station Receiver
[Figure: The Physical Layer. Block diagram of the base-station receiver for multiple users. Labeled blocks: Antenna, Demodulator, MUX (Pilot, Data), Channel Estimation, Multiuser Detection, Decoder, Detected Bits, Decision Feedback, Delay. Labeled signals: d, b.]

7 Real-Time Requirements
W-CDMA
Transmission done by multiplication of signature waveform (Spreading)
Data Transmission in 10 ms Frames
Multiple Data Rates by Varying Spreading Factors
Detection needs to be done in real-time
1953 cycles available in a C6x DSP at 250 MHz to detect 1 bit at 128 Kbps
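The 1953-cycle figure follows directly from the clock rate and the target bit rate. A minimal sketch of that arithmetic, assuming "Kbps" means 1000 bits per second (the function name is illustrative, not from the slides):

```python
def cycles_per_bit(clock_hz: int, bit_rate_bps: int) -> int:
    """Whole clock cycles available to process one bit in real time."""
    return clock_hz // bit_rate_bps


CLOCK_HZ = 250_000_000   # C6x DSP clock quoted on the slide
BIT_RATE_BPS = 128_000   # target data rate, assuming 1 Kbps = 1000 bit/s

budget = cycles_per_bit(CLOCK_HZ, BIT_RATE_BPS)
print(budget)  # 250e6 / 128e3 = 1953.125, so 1953 whole cycles
```

Any detection scheme that needs more cycles per bit than this budget cannot keep up with a 128 Kbps stream on one such processor.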

8 Joint Estimation and Detection
Algorithm to jointly estimate the channel response and detect all the users’ bits.
Shown to have better performance as well as reduced computational complexity.
Maximum Likelihood Based Channel Estimation [C. Sengupta et al.: PIMRC’1998, WCNC’1999]
Differencing Multistage Detection based on Parallel Interference Cancellation [G. Xu et al.: SPIE’1999]

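9 Computations Involved
Model: [Figure: received vector r_i on a time axis, overlapping bits b_i and b_{i-1} of the users, offset by a delay]
Compute Correlation Matrices
b_i, b_{i-1}: bits of K async. users aligned at times i and i-1
r_i: received bits of spreading length N for K users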

10 Solve for the channel estimate, A_i
Multishot Detection

11 Differencing Multistage Detection
Successive Stages
S = diag(A^H A)
y: soft decision
d: detected bits (hard decision)
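The slide's equations did not survive in the transcript, so the following is only a sketch of a generic parallel interference cancellation (PIC) recurrence and its differencing form, not necessarily the authors' exact formulation. It assumes real-valued BPSK decisions, takes G = A^H A as given, and uses hypothetical function names. The conventional stage computes y(l) = y(0) - (G - S) d(l-1); the differencing stage computes y(l) = y(l-1) - (G - S)(d(l-1) - d(l-2)), which touches only the columns whose decision changed.

```python
def sign(x):
    """Hard decision: map a soft value to +1 or -1."""
    return 1 if x >= 0 else -1


def off_diagonal(G):
    """Return G - S, where S = diag(G)."""
    K = len(G)
    return [[G[i][j] if i != j else 0.0 for j in range(K)] for i in range(K)]


def pic_conventional(G, y0, stages):
    """Each stage recomputes y(l) = y0 - (G - S) d(l-1) from scratch."""
    L = off_diagonal(G)
    d = [sign(v) for v in y0]
    for _ in range(stages):
        y = [y0[i] - sum(L[i][j] * d[j] for j in range(len(d)))
             for i in range(len(d))]
        d = [sign(v) for v in y]
    return d


def pic_differencing(G, y0, stages):
    """Each stage updates y in place using only the bits that changed."""
    L = off_diagonal(G)
    K = len(y0)
    y = list(y0)
    d_prev = [0] * K                 # d(-1) = 0, so stage 1 removes all interference
    d = [sign(v) for v in y0]
    for _ in range(stages):
        for j in range(K):
            x = d[j] - d_prev[j]     # 0 when the decision for user j is unchanged
            if x != 0:
                for i in range(K):
                    y[i] -= L[i][j] * x
        d_prev, d = d, [sign(v) for v in y]
    return d


# Toy example (made-up numbers): three users with true bits [1, -1, 1].
G = [[1.0, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 1.0]]
y0 = [0.75, 0.0, 0.75]               # matched-filter output G @ [1, -1, 1]
print(pic_conventional(G, y0, 2), pic_differencing(G, y0, 2))
```

Because few decisions change in later stages, most columns are skipped there, which is the reason the differencing form needs less work on successive stages.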

12 Structure of A^H A
Block Bi-Diagonal Matrix

13 Bottlenecks
Identify using C6x DSP Implementation
Channel Estimation
  Can be done less frequently
  Depends on BER needed
Multiuser Detection
  Needs to be done all the time
Differencing Multistage
  Fewer computations on successive stages
Analysis on Various levels of Optimization for Detection

14 Task Decomposition
[Figure: block diagram; the recoverable labels and operation counts are listed below.]
Channel Estimation (inputs: Pilot, Data’)
  Block I, Correlation Matrices (Per Bit): Rbb O(K^2), Rbr[R] O(KN), Rbr[I] O(KN)
  Block II, Inverse: Rbb A^H = Rbr[R] O(K^2 N), Rbb A^H = Rbr[I] O(K^2 N)
  Block III, Matrix Products: A0^H A1 O(K^2 N), A0^H A0 O(K^2 N), A1^H A1 O(K^2 N)
Multistage Detection (Per Window) (input: Data)
  Block IV, Task A: A^H r O(KND)
  Block IV, Task B: d O(D K^2 Me)
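The per-bit correlation step can be illustrated with a rank-one update. This is a sketch under assumptions not stated on the slide: `b` is the vector of known bits for one bit interval, `r` is the complex received vector of length N, and the real and imaginary parts are accumulated separately to mirror the Rbr[R] and Rbr[I] labels. All names are hypothetical, and how current and previous bits are stacked into `b` is left out.

```python
def zeros(rows, cols):
    """Allocate a rows x cols matrix of zeros as nested lists."""
    return [[0.0] * cols for _ in range(rows)]


def update_correlations(Rbb, Rbr_re, Rbr_im, b, r):
    """Accumulate one bit interval: Rbb += b b^T, Rbr += b r^T (real and imaginary)."""
    for i, bi in enumerate(b):
        for j, bj in enumerate(b):
            Rbb[i][j] += bi * bj            # K*K multiply-adds per bit
        for n, rn in enumerate(r):
            Rbr_re[i][n] += bi * rn.real    # K*N multiply-adds per bit
            Rbr_im[i][n] += bi * rn.imag    # K*N multiply-adds per bit


# Toy sizes (made-up data): K = 2 users, spreading length N = 3.
K, N = 2, 3
Rbb, Rbr_re, Rbr_im = zeros(K, K), zeros(K, N), zeros(K, N)
update_correlations(Rbb, Rbr_re, Rbr_im, [1, -1], [1 + 2j, -1j, 3 + 0j])
print(Rbb, Rbr_re, Rbr_im)
```

The loop bounds match the O(K^2) and O(KN) counts listed for Rbb and Rbr. Solving Rbb A^H = Rbr for the channel estimate is a separate step that this sketch omits.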

15 Sequential / Pipeline
Block IV: Task A computes A^H r from Data, O(KND); Task B computes d, O(D K^2 Me)
Task A: 13272 cycles
Task B: 3367*Me cycles
Real-time: 1953 cycles, 128 Kbps
(Single PE) Sequential: A + B: 13272 + 3367*Me: 10.7 Kbps
(2 PE) Pipeline: A, B: max(13272, 3367*Me): 18.8 Kbps
*Me = 3
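The two quoted rates can be reproduced from the cycle counts. The sketch below assumes Task A takes 13272 cycles and Task B takes 3367 cycles per stage (the reading consistent with the quoted rates), a 250 MHz clock, and that the achievable bit rate is the clock rate divided by the cycles spent on the slowest step per bit. Variable names are illustrative.

```python
CLOCK_HZ = 250_000_000           # C6x clock from the real-time requirements slide
TASK_A_CYCLES = 13272            # assumed: matched filter A^H r
TASK_B_CYCLES_PER_STAGE = 3367   # assumed: one detection stage
ME = 3                           # number of stages, from "*Me = 3"


def rate_kbps(cycles_per_bit):
    """Bit rate in Kbps when each bit costs the given number of cycles."""
    return CLOCK_HZ / cycles_per_bit / 1000


task_b_cycles = TASK_B_CYCLES_PER_STAGE * ME

# Single PE: A and B run back to back, so their costs add.
sequential = rate_kbps(TASK_A_CYCLES + task_b_cycles)

# Two PEs: A and B overlap, so the slower of the two sets the pace.
pipelined = rate_kbps(max(TASK_A_CYCLES, task_b_cycles))

print(round(sequential, 1), round(pipelined, 1))
```

Both results fall well short of the 128 Kbps requirement, which motivates parallelizing Task A next.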

16 Parallel A
Block IV: Task A computes A^H r from Data on PEs 1 to K, O(ND) each; Task B computes d, O(D K^2 Me)
Task A: 885 cycles
Task B: 3367*Me cycles
Real-time: 1953 cycles, 128 Kbps
(K+1 PE) Parallel A, B: 3367*Me: 24.75 Kbps
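The 885-cycle figure is consistent with splitting the 13272-cycle Task A evenly across K processing elements. The sketch below assumes K = 15 (the slides do not state K here; 15 is the largest user count on the results plot), Me = 3, and a 250 MHz clock. With Task A this cheap, Task B becomes the bottleneck.

```python
import math

CLOCK_HZ = 250_000_000
TASK_A_CYCLES = 13272
TASK_B_CYCLES = 3367 * 3    # Me = 3 stages on a single PE
K = 15                      # assumption: number of users, one PE per user for Task A

# Each PE handles one user's share of A^H r.
task_a_per_pe = math.ceil(TASK_A_CYCLES / K)

# Task A (in parallel) and Task B overlap; the slower one limits throughput.
bottleneck = max(task_a_per_pe, TASK_B_CYCLES)
parallel_a_kbps = CLOCK_HZ / bottleneck / 1000

print(task_a_per_pe, round(parallel_a_kbps, 2))
```

Further gains therefore have to come from restructuring Task B rather than Task A.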

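17 Parallel A, Pipeline B / Parallel A, Parallel + Pipeline B
Task A on PEs 1 to K, followed by Task B
Cycle counts shown: 885 cycles, O(N); 3367 cycles, O(K^2); 225 cycles, O(K)
Real-time: 1953 cycles, 128 Kbps
(K+3 PE) Parallel A, Pipeline B: [cycle count and data rate missing in transcript]
((Me+1)K PE) Parallel A, Parallel + Pipeline B: [cycle count and data rate missing in transcript]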

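18 Multistage Detection
[Figure: diagram of the decomposition at this step. Labels: Block I & II, Block III, Block IV; Data enters Task A (PEs 1 to K), followed by Task B split into successive stages (Stage, Stage, Stage 3, …).]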

19 Achieved Data Rates
[Figure: Data Rates for Different Levels of Pipelining and Parallelism. x-axis: Number of Users (9 to 15); y-axis: Data Rates (0.5 to 3 x 10^5). Curves: (Parallel A) (Parallel+Pipe B); (Parallel A) (Pipe B); (Parallel A) B; A B; Sequential A + B. Reference line: Data Rate Requirement = 128 Kbps.]

20 Mapping to Hardware
Analysis independent of hardware
  DSP with coprocessors
  Multiple Processors
  Combination of a processor with ASIC/FPGA
  Single ASIC
Minimize Idle time in processing elements
Some computations can be shared
Assumptions
  Critical processing elements have functional units similar to C6x
  No communication overhead between processors
  Number of elements dependent on number of users

21 Summary
Acceleration Techniques for Multiuser Estimation and Detection: computationally intensive algorithm
Task Decomposition
C6x DSP Simulator
Real-time Analysis
Hardware Mapping Issues
Application Specific Design more effective than a single processor solution

22 Future Work
Fixed Point Implementation
LU Decomposition
Other Algorithms for decomposition
Matrix Oriented Architectures
Vector Processor with SIMD
2 Levels of Parallelism
Complex Arithmetic

23 DSP Implementation
Texas Instruments C6x Simulator
TI TMS320C6701 Floating Point DSP
Code and Program optimized to fit in internal memory
32-bit VLIW Architecture
8 Functional Units: 2 Multipliers, 4 Adders, 2 Load/Store
TI C Compiler

