Sridhar Rajagopal, Bryan A. Jones and Joseph R. Cavallaro


1 Task partitioning wireless base-station receiver algorithms on multiple DSPs and FPGAs
Sridhar Rajagopal, Bryan A. Jones and Joseph R. Cavallaro
Rice University
This work is supported by Nokia, TI, TATP and NSF

2 Motivation
Build wireless multimedia communication systems
  Kbps to Mbps
Sophisticated algorithms - exponential complexity
Approaches:
  Sub-optimal algorithms - O(n^2), O(n^3) complexity
  Better hardware implementations needed

3 Hardware implementations
DSP - programmable
ASIC - customized hardware
FPGA - programmable ASICs
Single DSP - too slow
Need
  flexibility - for different protocols
  speed - to meet real-time
Multiple DSP-FPGA solution investigated

4 Contributions
Efficient task-partitioning multiuser estimation and detection algorithms on fixed hardware
  maximize performance, minimize overhead
1.19X-5.92X speedup with 2 DSPs
  additional processing power and internal memory
Use of FPGAs to accelerate multiuser detection
Multiple DSP-FPGA to meet real-time requirements

5 Outline
Introduction
Multiprocessor system at Rice
Single and multiprocessor simulations
FPGAs for acceleration
Summary

6 Multiuser estimation and detection
[Figure labels: Base-station, User 1, User 2, Direct, Reflections, noise + interference]
Jointly estimate attenuations, fading and delays
Jointly detect data of all users

7 Benefits of multiuser estimation & detection
[Figure: Error rate vs. SNR. Bit error rate against SNR (in dB) for three receivers: single-user (channel estimation + detection); multi-user estimation + single-user detection; multi-user (channel estimation + detection)]

8 Base-station receiver
[Block diagram labels: Antenna, Multiuser channel estimation (Training, Tracking), Multiuser detection, Decoding, Information bits]

9 Sub-optimal estimation and detection
Maximum likelihood estimation
  O( User^2 * spreading gain )
  avoids matrix inversion by an iterative scheme
Multi-user detection with interference cancellation
  Single user detector (code matched filter): O( User * spreading gain )
  3 stages of parallel interference cancellation: O( User^2 )
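The detection path on this slide, a code matched filter followed by three stages of parallel interference cancellation (PIC), can be sketched as below. This is a minimal illustration in plain Python, not the authors' implementation: the function names, the real-valued synchronous signal model, the unnormalized spreading codes and the known per-user amplitudes `amps` are all assumptions made for the example. In a real receiver the amplitudes would come from the channel estimator and the code cross-correlations would be precomputed rather than rebuilt for every bit.

```python
def sign(x):
    """Hard decision on a soft value: +1 or -1."""
    return 1 if x >= 0 else -1


def matched_filter(r, codes):
    """Correlate the received chips with each user's code: O(users * spreading gain)."""
    return [sum(c * x for c, x in zip(code, r)) for code in codes]


def pic_detect(r, codes, amps, stages=3):
    """Matched filter followed by `stages` rounds of parallel interference cancellation."""
    users = len(codes)
    y = matched_filter(r, codes)
    # Cross-correlations between codes (assumed precomputable; rebuilt here for clarity).
    corr = [[sum(a * b for a, b in zip(codes[i], codes[j])) for j in range(users)]
            for i in range(users)]
    bits = [sign(v) for v in y]  # initial single-user decisions
    for _ in range(stages):
        # Each stage subtracts every other user's estimated contribution: O(users^2).
        bits = [
            sign(y[k] - sum(corr[k][j] * amps[j] * bits[j]
                            for j in range(users) if j != k))
            for k in range(users)
        ]
    return bits
```

With two correlated codes and a strong interferer, for instance `codes = [[1, 1, 1, 1], [1, 1, 1, -1]]`, `amps = [1.0, 3.0]` and transmitted bits `[+1, -1]`, the matched filter alone misdetects the weak user while the cancellation stages recover it.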

10 Outline
Introduction
Multiprocessor system at Rice
Single and multiprocessor simulations
FPGAs for acceleration
Summary

11 Multiprocessor implementations
Single DSP - too slow
Multiple DSPs - communication overhead
Partition estimation and detection on different DSPs
  Narrow communication link
  Maximize performance
  Data rates dependent only on detection

12 Multiprocessor system at Rice
Prototype multiprocessor board from Sundance Inc.
Two TI C67 DSPs and two Xilinx 300K gate FPGAs
Inter-processor communication at 20 MBps
[Block diagram labels: PC, Host DSP, DSP1, DSP2, FPGA1, FPGA2, Multiuser estimation, detection, Received bits, Detected bits]

13 Outline
Introduction
Multiprocessor system at Rice
Single and multiprocessor simulations
FPGAs for acceleration
Summary

14 Base case implementation
Single DSP
Multiuser estimation 10X-50X slower than multiuser detection
  Different algorithm complexity
  Multiuser detection in internal memory (64 KB)
  Multiuser estimation in internal and off-chip memory

15 Base case simulation
[Figure: execution time (in seconds) vs. number of users, with curves for multi-user estimation, single-user estimation, multi-user detection and single-user detection]

16 Dual DSP implementation
Both estimation and detection now in internal memory
2X X speedup in estimation (DSP1 vs. DSP)
No change in detection performance
Estimation still 3X slower than detection
Inter-processor communication overhead
  O( users * spreading gain ) = KB overhead
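To get a feel for the inter-processor overhead, the transfer time of one channel-estimate update over the 20 MBps link can be estimated from the O( users * spreading gain ) payload. The sketch below is a back-of-the-envelope model only: the function name, the 8 bytes per transferred value and the reading of "MBps" as 10^6 bytes per second are assumptions, not figures from the slides.

```python
LINK_BYTES_PER_SECOND = 20e6  # 20 MBps link quoted for the board; 1 MB = 10^6 bytes assumed


def estimate_transfer_seconds(users, spreading_gain, bytes_per_value=8):
    """Time to ship one users-by-spreading-gain block of estimates between DSPs.

    bytes_per_value is a hypothetical element size (for example one complex
    value stored as two 32-bit floats).
    """
    payload_bytes = users * spreading_gain * bytes_per_value
    return payload_bytes / LINK_BYTES_PER_SECOND
```

Under these assumptions, 32 users with a spreading gain of 32 give an 8 KB payload and roughly 0.41 ms per update, and the cost grows linearly with the number of users.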

17 Dual DSP simulations
[Figure: execution time (in seconds) vs. number of users, with curves labelled: Multi-user estimation - DSP; Multi-user estimation - DSP1; Multi-user detection - DSP; Multi-user detection - DSP2; Comm. overhead DSP1 - DSP2]

18 Balancing division of tasks
Unbalanced task division
  Estimation 3X slower than detection
  Huge communication overhead > estimation, detection
Data rates dependent only on detection
Update channel estimates less frequently
  reasonable for slow fading channels (indoor environments)

19 Frequency of estimation updates
Can update more frequently with more users
  Once every 48 bits - single user
  Once every 9 bits - 32 users
Relatively larger overhead for fewer users
  Estimation, detection = O( User^2 )
  Communication overhead = O( User )
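The trend on this slide follows from the complexities it lists: if detection and estimation both grow as O( User^2 ) while communication grows as O( User ), the communication term shrinks relative to detection as users are added. A minimal sketch of that reasoning is below. It assumes the detector keeps running while one estimate is computed and transferred, so the update interval is the ratio of (estimation + transfer) time to per-bit detection time. The cost constants are hypothetical and are not fitted to the 48-bit and 9-bit figures above.

```python
import math


def update_interval_bits(users, det_cost=1.0, est_cost=3.0, comm_cost=20.0):
    """Bits detected while one channel-estimate update is computed and shipped.

    All three cost constants are illustrative placeholders in arbitrary time
    units; only the growth orders come from the slide.
    """
    detect_time = det_cost * users ** 2      # O(users^2) per detected bit
    estimate_time = est_cost * users ** 2    # O(users^2) per estimate update
    transfer_time = comm_cost * users        # O(users) per estimate transfer
    return math.ceil((estimate_time + transfer_time) / detect_time)
```

With these placeholder constants the interval falls from 23 bits for a single user to 4 bits for 32 users, which matches the direction of the slide's numbers but not their values.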

20 Frequency of channel estimate updates
[Figure: frequency of estimation updates (1 in 'x' bits) vs. number of users]

21 Outline
Introduction
Multiprocessor system at Rice
Single and multiprocessor simulations
FPGAs for acceleration
Summary

22 Limitations of DSP implementations
Further acceleration needed for real-time performance
Matrix based massively parallel algorithms
Detection of bits {+1,-1} : bit-level operations
DSPs
  Bit multiplications not needed - (add/subtract on FPGA)
  Bit storage not convenient
  Not fully able to exploit parallelism
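The point about bit multiplications can be made concrete: because detected bits are only +1 or -1, multiplying a weight by a bit reduces to choosing between adding and subtracting that weight, which is the operation the slide assigns to the FPGA. The sketch below compares the two forms in Python. The function names and the scalar formulation are illustrative assumptions, not the authors' hardware design.

```python
def cancel_with_multiply(y, weights, bits):
    """Reference form: subtract sum(weight * bit) using multiplications."""
    return y - sum(w * b for w, b in zip(weights, bits))


def cancel_with_add_subtract(y, weights, bits):
    """Multiplier-free form: each bit in {+1, -1} only selects add or subtract."""
    residual = y
    for w, b in zip(weights, bits):
        if b == 1:
            residual -= w   # a +1 bit contributes +w, so remove it
        else:
            residual += w   # a -1 bit contributes -w, so add it back
    return residual
```

Both functions return the same value for any bit pattern, and each term of the second form is independent of the others, so the adds and subtracts could be evaluated in parallel.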

23 FPGAs for acceleration
Flexibility of ASICs
Good for parallelism and bit-level operations
[Block diagram labels: Received bits, Multiuser estimation, Code matched filter detector, PIC (Stage 1), PIC (Stage 2), Detected bits, DSP1, DSP2, FPGA1, FPGA2]

24 Multiprocessor simulations
[Figure: execution time (in seconds) vs. number of users for the single DSP implementation, the 2 DSP implementation and 2 DSPs + 2 FPGAs, compared against the target data rate (Kbps/user)]

25 Multiprocessor advantages
1.19X-5.92X speedup using 2 DSPs
Up to 50X acceleration possible by task balancing with additional FPGAs
  DSP - FPGA communication overhead
Just 2 DSPs and 2 FPGAs can meet 128 Kbps/user real-time requirements for up to 7 users
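The 128 Kbps/user requirement translates into a time budget per detected bit, which is what the execution-time curves are compared against. A minimal sketch of that check follows. It assumes 1 Kbps = 1000 bits per second and that `seconds_per_bit` is the measured time to detect one bit for every active user; both the function names and that interpretation are assumptions for illustration.

```python
def per_bit_budget_seconds(kbps_per_user):
    """Time available to process one bit per user (1 Kbps taken as 1000 bit/s)."""
    return 1.0 / (kbps_per_user * 1000.0)


def meets_real_time(seconds_per_bit, kbps_per_user=128):
    """True if one bit for all users is detected within the per-bit budget."""
    return seconds_per_bit <= per_bit_budget_seconds(kbps_per_user)
```

At 128 Kbps/user the budget is about 7.8 microseconds per bit, so a configuration that needs 5 microseconds per bit would meet the target and one that needs 10 microseconds would not.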

26 Outline
Introduction
Multiprocessor system at Rice
Single and multiprocessor simulations
FPGAs for acceleration
Future work and summary

27 Future work
DSP - FPGA communication overhead
  Transferring KBs of data into FPGAs
Implementation of channel decoding
Complete real-time system

28 Summary
Efficient task-partitioning multiuser estimation and detection algorithms on fixed hardware
  maximize performance, minimize overhead
1.19X-5.92X speedup with 2 DSPs
  additional processing power and internal memory
Use of FPGAs to accelerate multiuser detection
Multiple DSP-FPGA to meet real-time requirements

