1
DSPs in emerging wireless systems
2
Motivation
Software solutions are becoming important in the physical layer:
Multi-standard systems
Algorithms tailored to the environment, SNR, etc.
Flexible parameters for spreading and coding
Computational requirements exceed the real-time capability of current-generation DSPs by more than 2 orders of magnitude
3
Current approaches
HW/SW co-design
Maximize programmability in DSPs
Complex tasks run on co-processors, e.g. the Viterbi and Turbo co-processors on the TI C6416
How is this going to scale in 4G? Keep on adding co-processors?
4
Our approach
With the co-processor approach:
The DSP's role is restricted to controlling co-processors as computational demands increase
The final system is as inflexible as a traditional ASIC design
Investigating Scalable Wireless Application-specific Processors (SWAPs):
Identify bottlenecks in the architectures and the gap with respect to ASICs
Investigate solutions to bridge the gap
5
Scalable Wireless Application-specific Processors
Multi-cluster, stream-based architecture based on the “Imagine” media processor from Stanford
Why a streaming processor: GPP architectures are not well suited to media and wireless workloads, whereas streaming processors have been shown to perform well on media kernels such as FFT and FIR
Media and communication algorithms are similar
Media architectures are popular; can they serve as wireless architectures?
6
Scalable architectures
7
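Programming model
Kernels: computation
Streams: communication

Kernel (computation):

    KERNEL example1(istream<int> a, istream<int> b, ostream<int> c) {
      loop_stream(a) {            // iterate once per element of stream a
        int ai, bi, ci;
        a >> ai;                  // read one element from each input stream
        b >> bi;
        ci = ai * 2 + bi * 3;
        c << ci;                  // write the result to the output stream
      }
    }

Streams (communication):

    void main() {
      Stream<int> a(256);
      Stream<int> b(256);
      Stream<int> c(256);
      Stream<int> d(1024);
      ...
      example1(a, b, c);          // c is produced here ...
      example2(c, d);             // ... and consumed by the next kernel
    }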
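The kernel/stream split above can be imitated in ordinary Python to see the data flow. This is only a minimal sketch: the `Stream` class and `run_example` helper are hypothetical stand-ins invented for illustration and are not part of the Imagine tool chain; only the arithmetic `ci = ai * 2 + bi * 3` comes from the slide.

```python
from collections import deque


class Stream:
    """Hypothetical stand-in for a stream: a FIFO of integer records."""

    def __init__(self, data=()):
        self._q = deque(data)

    def push(self, value):
        self._q.append(value)

    def pop(self):
        return self._q.popleft()

    def __len__(self):
        return len(self._q)

    def to_list(self):
        return list(self._q)


def example1(a, b, c):
    """Kernel: consume a and b element by element, emit 2*a + 3*b into c."""
    while len(a):              # plays the role of loop_stream(a)
        ai = a.pop()           # a >> ai
        bi = b.pop()           # b >> bi
        c.push(ai * 2 + bi * 3)  # c << ci


def run_example():
    """Stream-level program: allocate streams, then call the kernel."""
    a = Stream(range(4))
    b = Stream(range(10, 14))
    c = Stream()
    example1(a, b, c)
    return c.to_list()
```

The kernel only touches its stream arguments, which is what lets a stream processor schedule the data movement separately from the computation.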
8
Architecture evaluation
Benchmark kernels currently used: matrix-vector multiplications, FFT, Viterbi
This was fine for ASIC solutions
Programmable architectures need to investigate the interaction between kernels
Data may need to be re-ordered between kernels
9
Rice Benchmark for wireless systems
Investigate the chain of multiuser estimation, multiuser detection, and Viterbi decoding algorithms
10
Bottlenecks in multi-cluster architectures
Packed data (subword parallelism)
  Not always good to pack data
Matrix transposes (interleaving)
  Doing them in the ALUs may be cheaper and lower power
  Cannot be avoided in packed matrices
Viterbi: shuffling of path metrics and survivor states using register exchange
  Register exchange is needed for parallel computations
11
DSP comparisons
[Figure: number of cycles for estimation, detection, and decoding. Configurations: 1c2a2m fl, 1c3a3m fx, 8c3a3m. Labels: Ideal, DSP cluster, 8 clusters.]
12
Packing in multi-cluster architectures
    Kernel (in, out) {
      half2 a;              // packed a
      int p, q;
      in >> a;
      p = mul_low(a, a);
      q = mul_high(a, a);
      out << p << q;
    }
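To make the cost of packing concrete, the kernel above can be mimicked in Python. This is a sketch under stated assumptions: a `half2` is taken to be two unsigned 16-bit values in one 32-bit word, and `mul_low` / `mul_high` are assumed to multiply the low and high halves respectively and return full-width products. The helpers `pack2` and `square_packed` are hypothetical names introduced here.

```python
MASK16 = 0xFFFF


def pack2(lo, hi):
    """Pack two unsigned 16-bit values into one 32-bit word (assumed layout)."""
    return ((hi & MASK16) << 16) | (lo & MASK16)


def mul_low(x, y):
    """Assumed semantics: product of the low 16-bit halves of x and y."""
    return (x & MASK16) * (y & MASK16)


def mul_high(x, y):
    """Assumed semantics: product of the high 16-bit halves of x and y."""
    return ((x >> 16) & MASK16) * ((y >> 16) & MASK16)


def square_packed(words):
    """Mirror of the slide's kernel: one packed word in, two wide results out."""
    out = []
    for a in words:
        out.append(mul_low(a, a))   # p
        out.append(mul_high(a, a))  # q
    return out
```

Each packed input word yields two results that no longer fit in 16 bits, so the output stream is twice as long as the input. That growth is one reason packing is not always a win.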
13
Matrix Transpose in Memory
14
Matrix Transpose in kernel
15
Data re-ordering for Viterbi
a. Trellis: X(0) X(1) X(2) X(3) X(4) X(5) X(6) X(7) X(8) X(9) X(10) X(11) X(12) X(13) X(14) X(15)
b. Shuffled trellis: X(0) X(2) X(4) X(6) X(8) X(10) X(12) X(14) X(1) X(3) X(5) X(7) X(9) X(11) X(13) X(15)
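One way to see why the shuffled ordering helps is to write a single add-compare-select (ACS) step both ways. The sketch below assumes a shift-register trellis in which new state s has predecessors 2s mod N and 2s mod N + 1; that convention, the `branch[p][s]` cost table, and all function names are assumptions made for illustration, not details taken from the slides.

```python
def predecessors(s, n):
    """Assumed convention: the two predecessors of state s in an n-state trellis."""
    base = (2 * s) % n
    return base, base + 1


def acs_reference(metrics, branch):
    """Direct ACS: for every new state, pick the cheaper incoming path."""
    n = len(metrics)
    new = []
    for s in range(n):
        p0, p1 = predecessors(s, n)
        new.append(min(metrics[p0] + branch[p0][s],
                       metrics[p1] + branch[p1][s]))
    return new


def odd_even_shuffle(x):
    """Even-indexed entries first, then odd-indexed (the shuffled trellis)."""
    return x[0::2] + x[1::2]


def acs_shuffled(metrics, branch):
    """ACS on shuffled metrics: position j of each half holds one butterfly."""
    n = len(metrics)
    half = n // 2
    shuffled = odd_even_shuffle(metrics)
    even, odd = shuffled[:half], shuffled[half:]
    new = [0] * n
    for j in range(half):            # each j is independent (one lane/cluster)
        for s in (j, j + half):      # the two new states fed by this pair
            new[s] = min(even[j] + branch[2 * j][s],
                         odd[j] + branch[2 * j + 1][s])
    return new
```

After the shuffle, the two old metrics each butterfly needs sit at the same index of the two halves, so the loop over `j` maps onto parallel lanes. The result comes out in natural order, so the shuffle has to be repeated before the next trellis stage.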
16
Performance loss due to re-ordering data for parallelism
[Figure: frequency needed to attain real time (in MHz) versus number of clusters (1 to 4), for Viterbi decoding (rate 1/2, constraint length 9) for 32 users at 128 Kbps per user. Curves: actual with communication overhead; ideal without communication overhead.]
Speedup per cluster added = 0.5 due to parallelizing the Viterbi trellis
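The workload behind this plot can be sized with a few lines of arithmetic. The sketch below assumes 1 Kbps = 1000 bit/s and reads "speedup per cluster added = 0.5" as a linear model in which every cluster beyond the first contributes half a cluster's worth of throughput; that reading, the function names, and the `single_cluster_mhz` input are assumptions, not figures from the slides.

```python
def aggregate_rate_bps(users, rate_bps):
    """Total decoded bit rate across all users."""
    return users * rate_bps


def trellis_states(constraint_length):
    """Number of states in a convolutional-code trellis: 2^(K-1)."""
    return 2 ** (constraint_length - 1)


def speedup(clusters, per_cluster_gain=0.5):
    """Assumed linear model: each added cluster yields per_cluster_gain."""
    return 1 + per_cluster_gain * (clusters - 1)


def required_mhz(single_cluster_mhz, clusters, per_cluster_gain=0.5):
    """Clock needed with several clusters, given the single-cluster clock."""
    return single_cluster_mhz / speedup(clusters, per_cluster_gain)
```

With these definitions, `aggregate_rate_bps(32, 128_000)` gives the combined rate for the 32 users and `trellis_states(9)` gives the number of path metrics that must be updated per decoded bit. Under the assumed model, four clusters give a speedup of 2.5 rather than the ideal 4.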
17
Communication pattern
All data re-arrangement problems share a common communication pattern: an odd-even permutation of the data
Investigating solutions to this problem to bridge the gap between multi-cluster and 1-cluster systems
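The shared pattern can be illustrated in a few lines of Python. In this sketch (all function names are hypothetical), the odd-even permutation is shown to coincide with transposing a two-column matrix stored in row-major order, and its inverse is the interleave that puts the data back.

```python
def odd_even(x):
    """Odd-even permutation: even-indexed elements first, then odd-indexed."""
    x = list(x)
    return x[0::2] + x[1::2]


def interleave(x):
    """Inverse of odd_even for even-length input: alternate the two halves."""
    x = list(x)
    half = len(x) // 2
    out = []
    for lo, hi in zip(x[:half], x[half:]):
        out.extend([lo, hi])
    return out


def transpose_flat(x, rows, cols):
    """Transpose a rows x cols matrix stored row-major in a flat list."""
    return [x[r * cols + c] for c in range(cols) for r in range(rows)]
```

For example, `odd_even(range(16))` reproduces the shuffled trellis ordering, and `transpose_flat(x, 8, 2)` produces the same list as `odd_even(x)` for any 16-element `x`.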