Published by Blanche Crawford. Modified over 9 years ago.
1
Implementing algorithms for advanced communication systems -- My bag of tricks
Sridhar Rajagopal
Electrical and Computer Engineering
This work is supported by Nokia, TI, TATP and NSF
2
Motivation
Build wireless multimedia communication systems - Kbps to Mbps
Sophisticated algorithms - exponential complexity
Approaches:
Sub-optimal algorithms - O(n^2), O(n^3) complexity
Better hardware implementations needed
3
Contributions
Develop algorithms suitable for implementation
Bit-level extensions to microprocessors
Pipelining to reduce latency and memory
On-line arithmetic for Most Significant Digit First computations
4
Outline
Advanced communication systems
Algorithms for efficient implementation
Pipelining
On-line arithmetic
Bit-level extensions to microprocessors
Summary
5
Communication System - Physical layer: Transmitter
[Block diagram: Information bits (from higher layers) -> Coding -> Spreading -> D/A -> RF unit -> Antenna. Digital up to the D/A, analog after it.]
6
Communication System - Physical layer: Channel
Multipath reflections, attenuation, noise, multiple-user interference
7
Communication System - Physical layer: Receiver
[Block diagram: Antenna -> RF unit -> A/D -> Channel estimation, Detection -> Decoding -> Information bits (to higher layers). Analog up to the A/D, digital after it.]
8
Questions
Higher data rates => sophisticated algorithms => strain on hardware => lower data rates
1. Which is the best algorithm to use for implementation?
2. How best to do the digital part: VLSI, DSP, FPGA, microprocessor, or a combination of these?
9
Outline
Advanced communication systems
Algorithms for efficient implementation
Pipelining
On-line arithmetic
Bit-level extensions to microprocessors
Summary
10
Multiuser Channel Estimation Algorithm
= {+1, -1} : Training/Tracking bits
= 8-bit integer (complex) : Received signal
N = spreading gain (typically fixed, e.g. 32)
K = number of users (variable, <= N)
= Maximum Likelihood channel estimate
11
Iterative hardware-efficient scheme
Bit-streaming: suitable for tracking (window length L)
Method of gradient descent
Stable convergence behavior
Simple fixed-point VLSI architecture
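A minimal sketch of what a gradient-descent channel estimate could look like. Everything here is an assumption for illustration and is not taken from the slides: a real-valued model r = A b, correlations R_bb and R_br averaged over a window of L bits, the update A <- A - mu*(A*R_bb - R_br), the step size mu, and the helper names `outer` and `estimate_channel`.

```python
import random

def outer(u, v):
    """Outer product u v^T as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]

def estimate_channel(bits, received, mu=0.5, iterations=100):
    """Gradient-descent estimate of A in r = A b (hypothetical helper).

    bits:     L training vectors, each with K entries in {+1, -1}
    received: L received vectors, each with N entries
    """
    L, K, N = len(bits), len(bits[0]), len(received[0])
    # Correlations over the window, normalised by the window length L.
    R_bb = [[0.0] * K for _ in range(K)]
    R_br = [[0.0] * K for _ in range(N)]
    for b, r in zip(bits, received):
        for k, row in enumerate(outer(b, b)):
            for j, v in enumerate(row):
                R_bb[k][j] += v / L
        for n, row in enumerate(outer(r, b)):
            for j, v in enumerate(row):
                R_br[n][j] += v / L
    A = [[0.0] * K for _ in range(N)]
    for _ in range(iterations):
        for n in range(N):
            # Row n of the gradient (A R_bb - R_br); rows are independent.
            grad = [sum(A[n][k] * R_bb[k][j] for k in range(K)) - R_br[n][j]
                    for j in range(K)]
            A[n] = [a - mu * g for a, g in zip(A[n], grad)]
    return A

# Noise-free demo with made-up sizes (assumed values, not from the slides).
random.seed(1)
N, K, L = 4, 3, 200
A_true = [[random.uniform(-1, 1) for _ in range(K)] for _ in range(N)]
bits = [[random.choice((-1, 1)) for _ in range(K)] for _ in range(L)]
received = [[sum(A_true[n][k] * b[k] for k in range(K)) for n in range(N)]
            for b in bits]
A_hat = estimate_channel(bits, received)
max_err = max(abs(A_hat[n][k] - A_true[n][k])
              for n in range(N) for k in range(K))
```

For tracking, the two correlation sums could be kept up to date by adding the newest outer product and dropping the oldest one in the window, so no matrix inversion is needed at any point.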
12
Simulations - Static multipath channel
SINR = 0 dB
Paths = 3
Preamble = 150
Spreading N = 31
Users K = 15
13
Outline
Advanced communication systems
Algorithms for efficient implementation
Pipelining
On-line arithmetic
Bit-level extensions to microprocessors
Summary
14
Multiuser interference
[Figure: timeline of received symbols r_{i-2}, r_{i-1}, r_i, r_{i+1} for the desired user, user 1 and user j, with bits b_i and b_{i+1} marked. The desired user's bit sees interference from previous bits of other users and from future bits of other users.]
15
Block Based Detector
[Figure: two blocks of received bits, 1-12 and 11-22, each pass through Matched Filter, Stage 1, Stage 2, Stage 3. Outputs: bits 2-11 and bits 12-21.]
16
Detection
Iterate for convergence
Matched filter
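A sketch of a matched filter followed by iterated interference-cancellation stages, to make "iterate for convergence" concrete. It assumes a simplified synchronous, real-valued model r = A b, so the interference from previous and future bits discussed on the neighbouring slides is left out. The function names and the toy channel are hypothetical.

```python
def sign(x):
    """Hard decision; ties are resolved to +1 by convention."""
    return 1 if x >= 0 else -1

def matched_filter(A, r):
    """Correlate the received vector r with each user's column of A."""
    N, K = len(A), len(A[0])
    return [sum(A[n][k] * r[n] for n in range(N)) for k in range(K)]

def pic_detect(A, r, stages=3):
    """Matched filter, then `stages` rounds of parallel interference cancellation."""
    N, K = len(A), len(A[0])
    y = matched_filter(A, r)
    # Cross-correlation between users: R[k][j] = sum_n A[n][k] * A[n][j]
    R = [[sum(A[n][k] * A[n][j] for n in range(N)) for j in range(K)]
         for k in range(K)]
    b_hat = [sign(v) for v in y]
    for _ in range(stages):
        # Subtract the interference implied by the previous stage's decisions.
        b_hat = [sign(y[k] - sum(R[k][j] * b_hat[j] for j in range(K) if j != k))
                 for k in range(K)]
    return b_hat

# Toy near-far case: user 0 is three times stronger than user 1.
A = [[3, 1], [3, 1], [3, -1], [3, 1]]
r = [2, 2, 4, 2]                                   # r = A b with b = [+1, -1]
mf_decisions = [sign(v) for v in matched_filter(A, r)]   # [1, 1]: user 1 is wrong
pic_decisions = pic_detect(A, r, stages=3)               # [1, -1]
```

In this toy case the matched filter alone gets the weak user wrong, and the first cancellation stage already corrects it.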
17
Pipelined detection scheme
[Figure: timeline of received symbols r_{i-2}, r_{i-1}, r_i, r_{i+1} for the desired user, user 1 and user j, with bits b_i and b_{i+1} marked. The desired user's bit sees interference from previous bits of other users and from future bits of other users.]
18
Pipelined Detector
[Figure: bits 1-12 stream through Matched Filter, Stage 1, Stage 2, Stage 3 as a pipeline.]
19
Chip being built as part of the Elec 422 VLSI course project
20
Outline
Advanced communication systems
Algorithms for efficient implementation
Pipelining
On-line arithmetic
Bit-level extensions to microprocessors
Summary
21
On-line arithmetic
Sign of dot-product computations
High-precision operations are done only to find the sign
These can be avoided with Most Significant Digit First computation using redundant number systems
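A sketch of why most-significant-digit-first processing helps when only the sign is needed. It is a simplification: it scans sign-magnitude bit planes of the terms from the top and stops once the unseen lower bits can no longer change the sign. It does not model a true on-line arithmetic unit or a redundant number system, and `msd_first_sign` is a hypothetical name.

```python
def msd_first_sign(terms, width=8):
    """Return (sign, planes_examined) for sum(terms).

    Each term is a signed integer with |term| < 2**(width - 1),
    for example b[k] * r[k] with b[k] in {+1, -1}.
    """
    signs = [1 if t >= 0 else -1 for t in terms]
    mags = [abs(t) for t in terms]
    partial = 0
    planes = 0
    for j in range(width - 2, -1, -1):      # most significant magnitude bit first
        planes += 1
        for s, m in zip(signs, mags):
            if (m >> j) & 1:
                partial += s * (1 << j)
        # Largest magnitude the unseen lower bit planes could still contribute.
        remaining = len(terms) * ((1 << j) - 1)
        if abs(partial) > remaining:
            return (1 if partial > 0 else -1), planes
    return 0, planes                        # all planes seen and the sum is zero

decided = msd_first_sign([100, -3, -5])     # sign known after 2 of 7 planes
```

The low-order planes are never touched when the leading digits already settle the sign, which is the saving the slide points at.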
22
Outline
Advanced communication systems
Algorithms for efficient implementation
Pipelining
On-line arithmetic
Bit-level extensions to microprocessors
Summary
23
DSP/microprocessor implementations
Further acceleration needed for real-time performance
Matrix-based, massively parallel algorithms
Detection of bits {+1, -1}: bit-level operations
DSPs:
Bit multiplications not needed (add/subtract on FPGA)
Bit storage not convenient
Not fully able to exploit parallelism
24
FPGAs for acceleration
Flexibility of ASICs
Good for parallelism and bit-level operations
[Figure: Received bits -> Code matched filter detector, Multiuser estimation, PIC (Stage 1), PIC (Stage 2) -> Detected bits, mapped onto DSP1, DSP2, FPGA1 and FPGA2.]
25
Multiprocessor simulations
26
Instruction Set Extensions
To accelerate bit-level computations in wireless
Real/Complex Integer - Bit Multiplications: used in Multiuser Detection, Decoding
Bit - Bit Multiplications: used in Outer Product Updates (Correlation, Channel Estimation)
Complex Integer - Integer Multiplications: useful in other signal processing applications (Speech, Video, ...)
27
SIMD Parallelism
[Figure: 64-bit registers A and B, split into 8-bit lanes, are combined lane by lane (+, x) into 64-bit register C.]
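A sketch of the lane-wise idea in software: eight 8-bit additions carried out on one 64-bit word without carries crossing lane boundaries. The helper names are hypothetical, and the masking trick stands in for what a SIMD adder does in hardware.

```python
MASK64 = (1 << 64) - 1
LOW7 = 0x7F7F7F7F7F7F7F7F    # lower 7 bits of every 8-bit lane
HIGH1 = 0x8080808080808080   # top bit of every 8-bit lane

def pack(lanes):
    """Pack eight 8-bit values into one 64-bit word (lane 0 in the low byte)."""
    word = 0
    for i, v in enumerate(lanes):
        word |= (v & 0xFF) << (8 * i)
    return word

def unpack(word):
    """Split a 64-bit word back into eight 8-bit lanes."""
    return [(word >> (8 * i)) & 0xFF for i in range(8)]

def packed_add8(a, b):
    """Add eight 8-bit lanes at once, each lane wrapping modulo 256."""
    low = (a & LOW7) + (b & LOW7)            # carries stay inside each lane
    return (low ^ ((a ^ b) & HIGH1)) & MASK64

x = [250, 1, 2, 3, 4, 5, 6, 255]
y = [10, 1, 1, 1, 1, 1, 1, 1]
lane_sums = unpack(packed_add8(pack(x), pack(y)))
```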
28
Integer - Bit Multiplications
For i = 1..8, j = 1..8: D[i][j] = D[i][j] + b[i]*C[j] (Cross-Correlation)
[Figure: 64-bit register C[j] is added to or subtracted from (+/-) 64-bit register D[i][j] in 8-bit lanes, selected by the 8-bit control register b[i].]
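A sketch of the update D[i][j] = D[i][j] + b[i]*C[j] with the multiplication replaced by an add or a subtract chosen by a control bit. The encoding (control bit 1 means +1, control bit 0 means -1), the 0-based indexing and the function name are assumptions. Lane overflow is ignored here.

```python
def integer_bit_update(D, C, b_ctrl):
    """Update an 8x8 cross-correlation D in place from 8 integers C.

    b_ctrl is an 8-bit control word: bit i set means b[i] = +1 (add C),
    bit i clear means b[i] = -1 (subtract C). No multiplier is used.
    """
    for i in range(8):
        add = (b_ctrl >> i) & 1
        for j in range(8):
            D[i][j] += C[j] if add else -C[j]
    return D

D = [[0] * 8 for _ in range(8)]
C = [1, 2, 3, 4, 5, 6, 7, 8]
integer_bit_update(D, C, 0b00000101)   # b[0] = b[2] = +1, all others -1
```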
29
Computational Savings
Avoid bit multiplications and control structures
4 8-bit Multiply: latency 3 cycles
8 8-bit Add: latency 1 cycle
Cross-Correlation example: 64 multiply, 64 add
30
Bit-Bit Multiplications
D = D + b*b^T (e.g. Auto-Correlation)
[Figure: 64-bit register A = b1 and 64-bit register B = b2 are combined by XNOR into 64-bit register C = b1*b2.]
31
8-bit to 64-bit conversions
D = D + b*b^T (e.g. Auto-Correlation)
b1 = b(1:8), b(1:8), ..., b(1:8)
b2 = b(1)b(1)...b(8)b(8)
[Figure: the 8-bit register b is expanded into a 64-bit register A.]
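A sketch of the two expansions and the XNOR product, assuming bit value 1 encodes +1 and bit value 0 encodes -1, with 0-based bit indices instead of the slide's b(1)..b(8). With that encoding the product of two bits is +1 exactly when the bits are equal, which is XNOR. Function names are hypothetical.

```python
MASK64 = (1 << 64) - 1

def replicate_byte(b):
    """b1: the whole byte b copied into each of the 8 bytes of a 64-bit word."""
    return (b & 0xFF) * 0x0101010101010101

def spread_bits(b):
    """b2: bit i of b copied into all 8 bits of byte i."""
    word = 0
    for i in range(8):
        if (b >> i) & 1:
            word |= 0xFF << (8 * i)
    return word

def outer_product_bits(b):
    """64 one-bit products: bit 8*i + j is 1 exactly when b[i]*b[j] = +1."""
    return ~(replicate_byte(b) ^ spread_bits(b)) & MASK64

word = outer_product_bits(0b00000001)   # b[0] = +1, b[1..7] = -1
```

One XNOR on 64-bit registers therefore yields all 64 entries of b*b^T at once.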
32
Increment/Decrement
D = D + b*b^T (e.g. Auto-Correlation)
[Figure: each 8-bit lane of 64-bit register D is incremented or decremented (+/-) by 1 according to the 8-bit register b1*b2, giving 64-bit register (D + b1*b2).]
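A sketch of the increment/decrement step, assuming each product bit selects +1 (bit set) or -1 (bit clear) for one lane of D, and that a 64-bit product word holds the row-i products in byte i. Lane overflow is ignored and the function names are hypothetical.

```python
def increment_decrement(lanes, product_byte):
    """Add +1 or -1 to each of 8 lanes according to the bits of product_byte."""
    return [v + (1 if (product_byte >> j) & 1 else -1)
            for j, v in enumerate(lanes)]

def autocorrelation_update(D, product_word):
    """Apply D = D + b*b^T for an 8x8 D, given the 64 product bits."""
    return [increment_decrement(D[i], (product_word >> (8 * i)) & 0xFF)
            for i in range(8)]

D0 = [[0] * 8 for _ in range(8)]
D_all_plus = autocorrelation_update(D0, (1 << 64) - 1)   # every product is +1
D_mixed = autocorrelation_update(D0, 0x01)               # only entry (0, 0) is +1
```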
33
Truncated Multipliers
Many applications need only approximate computations
Adaptive algorithms: Y = Y + mu*(Y*C)
Truncate lower bits
Truncated multipliers: half the area, half the delay
Can do 2 truncated multiplies in parallel with the regular multiplier
[Figure: ALU and multipliers, showing Multiplier 1, Multiplier 2 and the truncated multiplier.]
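A sketch of the simplest kind of truncated multiplier, assuming unsigned operands: partial-product bits in the lower columns are never formed, so only the upper half of the product is computed and the result slightly underestimates the exact value. Practical designs often add a correction term, which is omitted here. The function name and the `width` parameter are hypothetical.

```python
def truncated_multiply(a, b, width=8):
    """Approximate the top `width` bits of a*b for unsigned a, b < 2**width.

    Partial-product bits a_i & b_j with i + j < width are skipped,
    which is roughly half of the partial-product array.
    """
    total = 0
    for i in range(width):
        if not (a >> i) & 1:
            continue
        for j in range(width):
            if (b >> j) & 1 and i + j >= width:
                total += 1 << (i + j)
    return total >> width

exact = (200 * 100) >> 8
approx = truncated_multiply(200, 100)
error = exact - approx      # never negative: bits are only dropped
```

For an adaptive update such as Y = Y + mu*(Y*C), a small bounded error of this kind in the product may be acceptable, which is the trade the slide describes.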
34
Open Questions
VLIW simulator??
Showing performance improvement for different algorithms
Compiler and software support
35
Outline
Advanced communication systems
Algorithms for efficient implementation
Pipelining
On-line arithmetic
Bit-level extensions to microprocessors
Summary
36
Conclusions
Data rates for advanced communication systems are limited by hardware, not by algorithms
Need to find efficient solutions to tackle this problem: hardware-software co-design
Presented my ways of attacking this problem
37
Future Work
RENÉ: Single re-configurable hardware to switch between 2 communication standards
Designing algorithms, conditioned on the availability of only finite precision
http://www.ece.rice.edu/~sridhar/research.htm
http://cmc.rice.edu