Implementing algorithms for advanced communication systems -- My bag of tricks Sridhar Rajagopal Electrical and Computer Engineering This work is supported.


1 Implementing algorithms for advanced communication systems -- My bag of tricks Sridhar Rajagopal Electrical and Computer Engineering This work is supported by Nokia, TI, TATP and NSF

2 Motivation: Build wireless multimedia communication systems (Kbps to Mbps). Sophisticated algorithms have exponential complexity. Approaches: sub-optimal algorithms with O(n^2) or O(n^3) complexity; better hardware implementations needed.

3 Contributions: Develop algorithms suitable for implementation. Bit-level extensions to microprocessors. Pipelining to reduce latency and memory. On-line arithmetic for Most Significant Digit First computations.

4 Outline: Advanced communication systems; Algorithms for efficient implementation; Pipelining; On-line arithmetic; Bit-level extensions to microprocessors; Summary

5 Communication System - Physical layer: Transmitter. [Block diagram: information bits (from higher layers) -> coding -> spreading -> D/A -> RF unit -> antenna. The blocks before the D/A are digital; those after it are analog.]

6 Communication System - Physical layer: Channel. Multipath reflections, attenuation, noise, multiple-user interference.

7 Communication System - Physical layer: Receiver. [Block diagram: antenna -> RF unit -> A/D -> channel estimation, detection, decoding -> information bits (to higher layers). The blocks before the A/D are analog; those after it are digital.]

8 Questions: Higher data rates => sophisticated algorithms => strain on hardware => lower data rates. 1. Which is the best algorithm to use for implementation? 2. How best to do the digital part: VLSI, DSP, FPGA, microprocessor, or a combination of these?

9 Outline: Advanced communication systems; Algorithms for efficient implementation; Pipelining; On-line arithmetic; Bit-level extensions to microprocessors; Summary

10 Multiuser Channel Estimation Algorithm. Training/tracking bits: values in {+1, -1}. Received signal: 8-bit integers (complex). N = spreading gain (typically fixed, e.g. 32). K = number of users (variable, K <= N). Output: the Maximum Likelihood channel estimate.

11 Iterative hardware-efficient scheme: Bit-streaming, suitable for tracking (window length L). Method of gradient descent. Stable convergence behavior. Simple fixed-point VLSI architecture.
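
The slides do not give the update equations, so the following is only a minimal sketch of what a bit-streaming gradient-descent estimator could look like. It assumes a synchronous, real-valued model r = A b + noise, keeps running correlation sums as each training symbol arrives, and takes one descent step on the least-squares cost per symbol instead of inverting a matrix. The function name `estimate_channel`, the step size `mu` and its default 1/K are illustrative assumptions, not the author's design.

```python
def estimate_channel(bits, received, mu=None):
    """Gradient-descent least-squares estimate of A in r = A b (+ noise).

    bits:     list of length-K vectors with entries +1/-1 (training bits)
    received: list of length-N real vectors (one per training symbol)
    mu:       step size (hypothetical parameter); 1/K is stable because the
              averaged bit correlation matrix has trace K
    """
    K, N = len(bits[0]), len(received[0])
    mu = 1.0 / K if mu is None else mu
    R_bb = [[0.0] * K for _ in range(K)]  # running sum of b b^T
    R_br = [[0.0] * K for _ in range(N)]  # running sum of r b^T
    A = [[0.0] * K for _ in range(N)]     # current channel estimate

    for t, (b, r) in enumerate(zip(bits, received), start=1):
        # Streaming correlation updates: only additions and subtractions,
        # since every b[k] is +1 or -1.
        for i in range(K):
            for j in range(K):
                R_bb[i][j] += b[i] * b[j]
        for n in range(N):
            for k in range(K):
                R_br[n][k] += r[n] * b[k]
        # One descent step: A <- A - mu * (A R_bb - R_br) / t
        for n in range(N):
            row = A[n][:]  # snapshot so the whole row updates together
            for k in range(K):
                grad = (sum(row[j] * R_bb[j][k] for j in range(K)) - R_br[n][k]) / t
                A[n][k] = row[k] - mu * grad
    return A
```

A sliding window of length L, complex samples and fixed-point word lengths are all left out of this sketch.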

12 Simulations - Static multipath channel: SINR = 0 dB, Paths = 3, Preamble = 150, Spreading N = 31, Users K = 15.

13 Outline: Advanced communication systems; Algorithms for efficient implementation; Pipelining; On-line arithmetic; Bit-level extensions to microprocessors; Summary

14 Multiuser interference. [Figure: timeline of received vectors r_{i-2}, r_{i-1}, r_i, r_{i+1} for User 1 through User j, with bits b_i and b_{i+1} marked. Labels: desired user; interference from previous bits of other users; interference from future bits of other users.]

15 Block Based Detector. [Figure: two input blocks, labeled 1-12 and 11-22, each pass through a matched filter followed by Stage 1, Stage 2 and Stage 3; the outputs are bits 2-11 and bits 12-21.]

16 Detection: Matched filter. Iterate for convergence.
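
The detection equations are missing from this transcript. As a rough illustration of "matched filter, then iterate", here is a sketch of hard-decision parallel interference cancellation (PIC, which the deck names on a later slide). It assumes a synchronous, real-valued model r = A b, which is simpler than the asynchronous case the figures describe; `pic_detect` and `stages` are hypothetical names.

```python
def sign(x):
    """Hard decision to +1/-1 (zero is mapped to +1 by convention here)."""
    return 1 if x >= 0 else -1


def pic_detect(A, r, stages=3):
    """Matched filter followed by `stages` rounds of interference cancellation.

    A: N x K matrix whose column k is user k's received signature
    r: length-N received vector
    """
    N, K = len(A), len(A[0])
    # Matched filter outputs y = A^T r
    y = [sum(A[n][k] * r[n] for n in range(N)) for k in range(K)]
    # Cross-correlations between user signatures, R = A^T A
    R = [[sum(A[n][i] * A[n][j] for n in range(N)) for j in range(K)]
         for i in range(K)]
    b = [sign(v) for v in y]  # stage 0: plain matched-filter decisions
    for _ in range(stages):
        # Subtract the interference estimated from the previous stage's
        # decisions of all other users, then decide again.
        b = [sign(y[k] - sum(R[k][j] * b[j] for j in range(K) if j != k))
             for k in range(K)]
    return b
```

In a near-far case such as `A = [[1, 3], [1, 3], [1, 3], [1, -3]]` with `r = [-2, -2, -2, 4]`, the matched filter alone decides the weak user's bit incorrectly, and one cancellation stage is enough to correct it.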

17 Pipelined detection scheme. [Figure: timeline of received vectors r_{i-2}, r_{i-1}, r_i, r_{i+1} for User 1 through User j, with bits b_i and b_{i+1} marked. Labels: desired user; interference from previous bits of other users; interference from future bits of other users.]

18 Pipelined Detector. [Figure: bits 1 through 12 flow through the matched filter, Stage 1, Stage 2 and Stage 3 as a pipeline.]

19 Chip being built as part of the Elec 422 VLSI course project

20 Outline: Advanced communication systems; Algorithms for efficient implementation; Pipelining; On-line arithmetic; Bit-level extensions to microprocessors; Summary

21 On-line arithmetic: Only the sign of the dot-product computations is needed. High-precision operations are done just to find that sign. They can be avoided with Most Significant Digit First computation using redundant number systems.
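
To make the early-termination idea concrete, here is a small sketch. It assumes a radix-2 signed-digit result with digits in {-1, 0, +1} arriving most significant digit first; in that representation the first nonzero digit already fixes the sign, because the remaining digits can never outweigh it. The sketch does not implement on-line adders or multipliers, only the sign test on their output stream, and the function names are illustrative.

```python
def sign_msd_first(digits):
    """Return (sign, digits_consumed) for a signed-digit number, MSD first.

    digits: iterable of -1, 0 or +1. Iteration stops at the first nonzero
    digit, so lower-order digits never need to be produced.
    """
    consumed = 0
    for d in digits:
        consumed += 1
        if d != 0:
            return d, consumed
    return 0, consumed  # every digit was zero


def value(digits):
    """Reference: full integer value of the digit string (radix 2)."""
    total = 0
    for d in digits:
        total = 2 * total + d
    return total
```

For example, the digit string 0, +1, -1, -1 has value 1, and the sign is already known after the second digit.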

22 Outline: Advanced communication systems; Algorithms for efficient implementation; Pipelining; On-line arithmetic; Bit-level extensions to microprocessors; Summary

23 DSP/microprocessor implementations: Further acceleration needed for real-time performance. Matrix-based, massively parallel algorithms. Detection of bits {+1,-1}: bit-level operations. DSPs: bit multiplications not needed (add/subtract on FPGA); bit storage not convenient; not fully able to exploit parallelism.

24 FPGAs for acceleration: Flexibility of ASICs. Good for parallelism and bit-level operations. [Figure: received bits pass through multiuser estimation, the code matched filter detector, PIC (Stage 1) and PIC (Stage 2) to give detected bits; the blocks are mapped onto DSP1, DSP2, FPGA1 and FPGA2.]

25 Multiprocessor simulations

26 Instruction Set Extensions: To accelerate bit-level computations in wireless. Real/complex integer-bit multiplications: used in multiuser detection, decoding. Bit-bit multiplications: used in outer product updates, correlation, channel estimation. Complex integer-integer multiplications: useful in other signal processing applications (speech, video, ...).

27 SIMD Parallelism. [Figure: 64-bit registers A and B, each split into 8-bit fields, are combined field by field (+ or x) into 64-bit register C.]

28 Integer-Bit Multiplications (cross-correlation): For i = 1..8, j = 1..8: D[i][j] = D[i][j] + b[i]*C[j]. [Figure: 64-bit register C[j], split into 8-bit fields, is added to or subtracted from 64-bit register D[i][j] under control of the 8-bit control register b[i].]
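
The update on this slide can be sketched in software as a conditional add/subtract, which is what makes the multiplier unnecessary. The sketch below assumes the control bit encoding 1 -> +1 and 0 -> -1 (the slide does not state it) and ignores 8-bit overflow or saturation; `cross_correlation_update` is a hypothetical name.

```python
def cross_correlation_update(D, b_bits, C):
    """In-place D[i][j] += b[i] * C[j] without any multiplication.

    D:      list of rows of integers (accumulators)
    b_bits: control bits, assumed encoding 1 -> +1, 0 -> -1
    C:      list of integers (one packed register's worth of fields)
    """
    for i, bit in enumerate(b_bits):
        row = D[i]
        if bit:
            for j, c in enumerate(C):
                row[j] += c  # b[i] = +1: add the whole vector
        else:
            for j, c in enumerate(C):
                row[j] -= c  # b[i] = -1: subtract the whole vector
    return D
```

In the proposed hardware the inner loop would be a single packed add or subtract over all fields of the register, selected by one bit of the control register.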

29 Computational Savings: Avoid bit multiplications and control structures. Four 8-bit multiplies: latency 3 cycles. Eight 8-bit adds: latency 1 cycle. Cross-correlation example: 64 multiplies, 64 adds.

30 Bit-Bit Multiplications: D = D + b*b^T (e.g. auto-correlation). [Figure: 64-bit register A = b1 and 64-bit register B = b2 are XNORed to give 64-bit register C = b1*b2.]

31 8-bit to 64-bit conversions: D = D + b*b^T (e.g. auto-correlation). b1 = b(1:8), b(1:8), ..., b(1:8); b2 = b(1) b(1) ... b(8) b(8). [Figure: the 8-bit register b is expanded into a 64-bit register A in each of these two layouts.]

32 Increment/Decrement: D = D + b*b^T (e.g. auto-correlation). [Figure: 64-bit register D is incremented or decremented field by field, under control of the bits of b1*b2, to give 64-bit register D + b1*b2.]
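
The three steps for the auto-correlation update (expand 8 bits to 64, XNOR, increment/decrement) can be emulated with plain integers. This is only a sketch: it assumes bit value 1 stands for +1 and 0 for -1, places b(1) in the least significant bit, and uses hypothetical function names; the actual register layout in the proposed extension may differ.

```python
MASK64 = (1 << 64) - 1


def replicate_vector(b8):
    """b1 layout: the whole 8-bit vector repeated in each of the 8 bytes."""
    return sum(b8 << (8 * i) for i in range(8))


def replicate_bits(b8):
    """b2 layout: byte i holds bit i of b8 repeated 8 times."""
    word = 0
    for i in range(8):
        if (b8 >> i) & 1:
            word |= 0xFF << (8 * i)
    return word


def autocorr_update(D, b8):
    """In-place D += b b^T for a packed 8-bit vector of +1/-1 values."""
    # XNOR of the two layouts gives all 64 products at once:
    # bit 8*i + j is 1 exactly when b(i) and b(j) are equal (product +1).
    prod = ~(replicate_vector(b8) ^ replicate_bits(b8)) & MASK64
    for i in range(8):
        for j in range(8):
            if (prod >> (8 * i + j)) & 1:
                D[i][j] += 1  # product is +1: increment
            else:
                D[i][j] -= 1  # product is -1: decrement
    return D
```

The double loop stands in for the packed increment/decrement; the point of the sketch is that no multiplier is involved at any step.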

33 Truncated Multipliers: Many applications need only approximate computations, e.g. adaptive algorithms: Y = Y + mu*(Y*C). Truncate lower bits. Truncated multipliers: half the area, half the delay. Can do 2 truncated multiplies in parallel with a regular multiply. [Figure: ALU alongside Multiplier 1, Multiplier 2 and a truncated multiplier.]
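
As a rough software model of the idea, the sketch below multiplies two unsigned n-bit values but sums only the partial-product bits that fall in the upper columns of the multiplier array, which is roughly half of them. It is an assumption about how the truncation is done; real truncated multipliers usually also add a correction constant, which is omitted here, and `truncated_multiply` is a hypothetical name.

```python
def truncated_multiply(a, b, n=8):
    """Approximate (a * b) >> n for unsigned n-bit a and b.

    Only partial-product bits a_i * b_j with i + j >= n are accumulated;
    the lower columns of the array are never formed.
    """
    acc = 0
    for i in range(n):
        if (a >> i) & 1:
            for j in range(n):
                if i + j >= n and (b >> j) & 1:
                    acc += 1 << (i + j)
    return acc >> n
```

Under these assumptions the result never exceeds the exact upper half of the product and falls short of it by at most n - 1, an error that an adaptive update such as Y = Y + mu*(Y*C) may be able to tolerate.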

34 Open Questions: VLIW simulator? Showing performance improvement for different algorithms. Compiler and software support.

35 Outline: Advanced communication systems; Algorithms for efficient implementation; Pipelining; On-line arithmetic; Bit-level extensions to microprocessors; Summary

36 Conclusions: Data rates for advanced communication systems are limited by hardware, not by algorithms. Need to find efficient solutions to tackle this problem: hardware-software co-design. Presented my ways of attacking this problem.

37 Future Work: RENÉ: single re-configurable hardware to switch between 2 communication standards. Designing algorithms conditioned on the availability of only finite precision. http://www.ece.rice.edu/~sridhar/research.htm http://cmc.rice.edu

