DSPs in emerging wireless systems


Motivation
Software solutions are becoming important in the physical layer:
  Multi-standard systems
  Algorithms tailored to the environment, SNR, etc.
  Flexible parameters for spreading and coding
Real-time computational requirements exceed the capability of current-generation DSPs by more than 2 orders of magnitude

Current approaches
HW/SW co-design
Maximize programmability in DSPs
Complex tasks run on co-processors, e.g. the Viterbi and Turbo co-processors of the TI C6416
How is this going to scale in 4G? Keep adding co-processors?

Our approach
With increasing computational demands, the DSP's role is restricted to controlling co-processors
The final system is then as inflexible as a traditional ASIC design
Investigating Scalable Wireless Application-specific Processors (SWAPs):
  Identify bottlenecks in the architectures and the gap w.r.t. ASICs
  Investigate solutions to bridge the gap

Scalable Wireless Application-specific Processors
Multi-cluster, stream-based architecture based on the “Imagine” media processor from Stanford
Why a streaming processor:
  GPP architectures are not well suited to media or wireless workloads
  Streaming processors have been shown to work well for media kernels such as FFT and FIR
  Media and communication algorithms are similar
Media architectures are popular; can they serve as wireless architectures?

Scalable architectures

Programming model
Kernels: computation
Streams: communication

Kernels (computation):

    KERNEL example1(istream<int> a, istream<int> b, ostream<int> c)
    {
        loop_stream(a) {
            int ai, bi, ci;
            a >> ai;
            b >> bi;
            ci = ai * 2 + bi * 3;
            c << ci;
        }
    }

Streams (communication):

    void main()
    {
        Stream<int> a(256);
        Stream<int> b(256);
        Stream<int> c(256);
        Stream<int> d(1024);
        ...
        example1(a, b, c);
        example2(c, d);
    }
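The kernel/stream split can be imitated in plain Python for experimentation. This is only a minimal sketch, not Imagine's kernel or stream language: streams are modelled as Python lists, `example1` mirrors the kernel from the slide, and `run_example1` is a hypothetical driver that does not appear in the original.

```python
def example1(a, b):
    """Kernel: element-wise computation over two input streams."""
    c = []
    for ai, bi in zip(a, b):          # stands in for loop_stream(a)
        c.append(ai * 2 + bi * 3)     # same arithmetic as the slide's kernel
    return c


def run_example1(n=256):
    """Hypothetical stream-level driver: builds two streams and runs the kernel."""
    a = list(range(n))                # illustrative input data, not from the slides
    b = [1] * n
    return example1(a, b)
```

The kernel only ever sees one element of each stream at a time, which is what lets a multi-cluster machine run the same kernel on many elements in parallel.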

Architecture evaluation
Benchmark kernels currently used: matrix-vector multiplication, FFT, Viterbi
Benchmarking isolated kernels was fine for ASIC solutions
For programmable architectures, the interaction between kernels must be investigated
Data may need to be re-ordered between kernels

Rice Benchmark for wireless systems
Investigate a chain of multiuser estimation, multiuser detection and Viterbi decoding algorithms

Bottlenecks in multi-cluster architectures
Packed data (subword parallelism)
  Not always good to pack data
Matrix transposes (interleaving)
  Doing them in the ALUs may be cheaper and lower power
  Cannot be avoided with packed matrices
Viterbi: shuffling of path metrics and survivor states using register exchange
  Register exchange is needed for parallel computations

DSP comparisons
[Figure: number of cycles for estimation, detection and decoding, compared across an ideal DSP, a single cluster and 8 clusters. Configuration labels: 1c2a2m fl, 1c3a3m fx, 8c3a3m.]

Packing in multi-cluster architectures
    Kernel (in, out)
    {
        half2 a;              // packed a
        int p, q;
        in >> a;
        p = mul_low(a, a);
        q = mul_high(a, a);
        out << p << q;
    }
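The effect of packing can be tried out in Python. This is a rough sketch under stated assumptions: a `half2` is taken to be two unsigned 16-bit values in one 32-bit word, and `mul_low` / `mul_high` are assumed to multiply the lower and upper halves respectively and return full-width products. The slide does not define their exact semantics, and `pack_half2` is a hypothetical helper.

```python
MASK16 = 0xFFFF


def pack_half2(lo, hi):
    """Pack two unsigned 16-bit values into one 32-bit word (hypothetical helper)."""
    return ((hi & MASK16) << 16) | (lo & MASK16)


def mul_low(x, y):
    """Assumed semantics: product of the lower 16-bit halves."""
    return (x & MASK16) * (y & MASK16)


def mul_high(x, y):
    """Assumed semantics: product of the upper 16-bit halves."""
    return ((x >> 16) & MASK16) * ((y >> 16) & MASK16)


def kernel(packed_in):
    """Mirror of the slide's kernel: one packed word in, two full-width words out."""
    out = []
    for a in packed_in:
        out.append(mul_low(a, a))
        out.append(mul_high(a, a))
    return out
```

Under these assumptions each packed input word produces two full-width results, so the output stream is no longer packed.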

Matrix Transpose in Memory

Matrix Transpose in kernel

Data re-ordering for Viterbi
a. Trellis (natural state order):
   X(0) X(1) X(2) X(3) X(4) X(5) X(6) X(7) X(8) X(9) X(10) X(11) X(12) X(13) X(14) X(15)
b. Shuffled trellis (even-indexed states first, then odd-indexed states):
   X(0) X(2) X(4) X(6) X(8) X(10) X(12) X(14) X(1) X(3) X(5) X(7) X(9) X(11) X(13) X(15)
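Why the shuffled order helps can be seen in a small add-compare-select (ACS) sketch. It assumes a shift-register code in which the new input bit enters the most significant state bit, so states 2i and 2i+1 both feed states i and i + N/2. That convention, the names `odd_even_shuffle` and `acs_step`, and the branch-metric table `bm` are assumptions for illustration and are not taken from the slides. Survivor-state bookkeeping is omitted.

```python
def odd_even_shuffle(x):
    """Reorder to even-indexed elements first, then odd-indexed elements."""
    return x[0::2] + x[1::2]


def acs_step(metrics, bm):
    """One ACS step on path metrics held in natural state order.

    bm[s][b] is the branch metric for leaving state s with input bit b.
    Returns the new path metrics in natural state order.
    """
    n = len(metrics)
    half = n // 2
    shuffled = odd_even_shuffle(metrics)
    even, odd = shuffled[:half], shuffled[half:]
    new = [0] * n
    # After the shuffle, both predecessors of a state sit at the same index i,
    # so each iteration is independent and could be mapped to its own cluster.
    for i in range(half):
        for b in (0, 1):
            new[i + b * half] = min(even[i] + bm[2 * i][b],
                                    odd[i] + bm[2 * i + 1][b])
    return new
```

The shuffle is the only step that moves data between positions; the ACS arithmetic itself is purely element-wise.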

Performance loss due to re-ordering data for parallelism
[Figure: frequency needed to attain real-time (in MHz) versus number of clusters, for Viterbi decoding, rate 1/2, constraint length 9, 32 users at 128 Kbps per user. Curves: actual with communication overhead; ideal without communication overhead.]
Speedup per cluster added = 0.5 due to parallelizing the Viterbi trellis

Communication pattern
All of the data re-arrangement problems share a common communication pattern: an odd-even permutation of the data
Investigating solutions to this problem, to bridge the gap between multi-cluster and 1-cluster systems
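As an illustration of how one permutation can cover several re-ordering tasks, the sketch below builds a matrix transpose out of repeated odd-even permutations. It assumes a row-major matrix whose dimensions are powers of two; the function names are hypothetical and the slides do not state that the authors implement the transpose this way.

```python
def odd_even(x):
    """Odd-even permutation: even-indexed elements first, then odd-indexed."""
    return x[0::2] + x[1::2]


def transpose_by_odd_even(flat, rows, cols):
    """Transpose a row-major rows x cols matrix with log2(cols) odd-even passes.

    Each pass rotates the bits of every element's index right by one, so
    log2(cols) passes swap the row and column fields of the index.
    """
    passes = cols.bit_length() - 1
    if rows * cols != len(flat):
        raise ValueError("size mismatch")
    if (1 << passes) != cols or rows & (rows - 1):
        raise ValueError("dimensions must be powers of two")
    for _ in range(passes):
        flat = odd_even(flat)
    return flat
```

For a 4 x 4 matrix stored as 16 consecutive elements, two passes are enough to produce the transposed layout.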
