RICE UNIVERSITY DSPs for 4G wireless systems Sridhar Rajagopal, Scott Rixner, Joseph R. Cavallaro and Behnaam Aazhang This work has been supported by Nokia, TI, TATP and NSF

Motivation
[Figure: wireless mobile device block diagram with RF unit, A/D and D/A converters, and a baseband programmable communications processor]
- Mobile: switches between standards and between parameters
- Base station: varying number of users with different parameters
- Programmability: flexibility is good

The problem
[Figure: GPP, DSP, FPGA and VLSI compared on performance and flexibility]
Which architecture is best under power and area constraints?

An approach for the solution
- Algorithms are well understood at the data-flow level
- Real-time systems can be designed in VLSI
- Pushing the implementation higher in the chain
- Current DSPs are not powerful enough for our application
- Use an architecture simulator to design our own

Proposed solution
- Current solutions to meet real time: racks of DSPs
- Future wireless architectures: a programmable processor for 4G wireless systems, smaller than x cm
  - x = 2.5 (W-CDMA BS)
  - x = 2.0 (W-LAN BS)
  - x = 1.5 (mobile handset)

Past work
[Figure: system design timeline spanning distant past, recent past, and recent and near future. Labels: Algorithms, DSP, VLSI, FPGA, IMAGINE; multiuser channel estimation, multiuser detection; task partitioning, parallelism, pipelining; conventional arithmetic, on-line arithmetic; architecture innovations; functional unit design and usage]

Contents
- Motivation
- The "Imagine" simulator
- Parallel algorithms for estimation/detection/decoding
- Performance comparisons and results

The IMAGINE architecture

Why the IMAGINE simulator?
- RSIM, SimpleScalar: GPP simulators
- Great for media processing algorithms
- Has a VLIW-based cluster, useful for DSP comparisons
- A good base architecture: 1024-pt FFT

Simulator knobs that we can turn
- Cycle-accurate simulator
- Varying number of functional units and their design
- Varying memory and register sizes
- Graphical tools to investigate FU utilization, bottlenecks, memory stalls, communication overhead, ...
- Almost anything can be changed, some changes more easily than others!

Caveats
- Two-level C++ programming
  - StreamC: transfers streams of data between main memory and the stream register file (SRF)
  - KernelC: transfers streams from the SRF to the ALU clusters
- Code is optimized for the number of ALU clusters and the size of the data
- Compiler not yet fully developed

Typical workload representation (base station)
- Wireless LAN
  - Equalization?
  - FFT
  - Viterbi decoding
- W-CDMA
  - Multiuser channel estimation
  - Multiuser detection
  - Viterbi decoding
- Advanced receiver schemes
  - Turbo decoding
  - Multiple antenna systems (MIMO)

Parallel estimation/detection/decoding
- Multiuser estimation
  - Matrix inversion replaced by gradient descent
- Multiuser detection
  - Parallel Interference Cancellation (PIC)
  - Pipelined algorithm that avoids block-based detection
- Viterbi decoding
  - Trellis structures suited for decoding
  - Register exchange for survivor memory
  - No traceback latency
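The slide does not state the update rule used for multiuser estimation, so the following is only a generic sketch of the idea "gradient descent instead of matrix inversion": it solves A x = b iteratively for a symmetric positive-definite A. The function names, the real-valued 2x2 system, the step size and the iteration count are all illustrative assumptions, not values from the presentation.

```python
def matvec(a, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def gradient_descent_solve(a, b, step, iterations):
    """Approximate the solution of a @ x = b without inverting a.

    Assumes a is symmetric positive definite and that
    step < 2 / (largest eigenvalue of a), so the iteration converges.
    """
    x = [0.0] * len(b)
    for _ in range(iterations):
        # The residual b - a @ x is the negative gradient of
        # 0.5 * x^T a x - b^T x, so stepping along it descends.
        residual = [b_i - ax_i for b_i, ax_i in zip(b, matvec(a, x))]
        x = [x_i + step * r_i for x_i, r_i in zip(x, residual)]
    return x


# Hypothetical 2x2 system; its exact solution is (1/11, 7/11).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = gradient_descent_solve(A, b, step=0.2, iterations=200)
print(x)
```

Each iteration needs only matrix-vector products and vector updates, which is the kind of regular multiply-add work that maps onto parallel ALU clusters.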

Estimation/Detection (64, 32 sizes)
- Kernels 1, 2, 3: multiuser estimation
- Kernels 4, 5: massaging matrices for detection
- Kernels 6, 7: multiuser detection

[Figure: trellis for a rate ½ code with K = 5, drawn over states X(0) to X(15). Panels: a. Unsuitable trellis; b. Suitable trellis; c. Shuffled suitable trellis]
- Upper bound on parallel clusters for good FU utilization: N/2^k
- Maximum of 8 parallel units for rate ½ with 16 states

Trellis structures for parallel Viterbi
Definition: Let {m_p} be the set of next states reachable from a present state p ∈ [1..N], i.e. p → {m_p}, where {m_p} has 2^k elements and k is the number of inputs at the encoder. If, for all i, j ∈ [1..N], either {m_i} = {m_j} or {m_i} ∩ {m_j} = ∅, the trellis is called a "separable" or "fast" trellis.
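The separability property can be checked mechanically. The sketch below assumes a hypothetical single-shift-register trellis in which state p moves to (p * 2^k + u) mod N for each input u; the actual encoders behind the slides are not specified, and states are numbered from 0 rather than 1. It also evaluates the N/2^k bound on parallel units quoted on the trellis slides.

```python
def next_states(p, n_states, k):
    """Next-state set of p in a hypothetical shift-register trellis
    with 2**k branches per state (an assumption, not from the slides)."""
    return frozenset((p * 2**k + u) % n_states for u in range(2**k))


def is_separable(next_sets):
    """True if every pair of next-state sets is identical or disjoint."""
    sets = list(next_sets)
    return all(a == b or a.isdisjoint(b) for a in sets for b in sets)


def max_parallel_units(n_states, k):
    """Upper bound N / 2**k on parallel units with good FU utilization."""
    return n_states // 2**k


# 16-state trellises with k = 1 (rate 1/2) and k = 2 (rate 2/3) inputs.
half = [next_states(p, 16, 1) for p in range(16)]
two_thirds = [next_states(p, 16, 2) for p in range(16)]

# A made-up trellis whose next-state sets overlap only partially.
overlapping = [frozenset({p, (p + 1) % 16}) for p in range(16)]

print(is_separable(half), max_parallel_units(16, 1))
print(is_separable(two_thirds), max_parallel_units(16, 2))
print(is_separable(overlapping))
```

Under these assumptions the number of distinct next-state sets equals N/2^k, so each set can be assigned to one parallel unit without sharing path metrics between units.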

[Figure: trellis for a rate 2/3 code with K = 5, drawn from states X(0) to X(15) to states Y(0) to Y(15). Panels: a. Shuffled suitable trellis for k = 2; b. Rearranged shuffled suitable trellis for k = 2]
- Upper bound on parallel clusters for good FU utilization: N/2^k
- Maximum of 4 parallel units for rate 2/3 with 16 states (having 8 would involve interprocessor communication overhead)

Survivor management in Viterbi
- Two techniques
  - Traceback: commonly used
  - Register exchange
- Traceback is good for VLSI architectures, where the information bits can be decoded sequentially by proper survivor memory addressing
- Drawback: sequential, with additional latency

Register exchange for decoding
- The register for a given node at a given time contains the information bits associated with the surviving partial path that ends in that state
- Survivors are calculated in conjunction with the path metrics
- The latency of conventional traceback is avoided
- Higher power consumption, as the entire survivor memory contents are updated for all states for each bit
- Suited to a parallel programmable implementation, as storing bits in a register for traceback touches the previous survivors anyway
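As a rough illustration of register exchange, the sketch below decodes a small hard-decision convolutional code. It is not the authors' kernel: it assumes the classic rate ½, K = 3 code with generators (7, 5) instead of the K = 5 codes on the slides, and all names are hypothetical. Each state carries a register holding the bits of its surviving path, and the register is copied and extended whenever a survivor is chosen, so no traceback pass is needed at the end.

```python
K = 3                      # constraint length (assumption: small demo code)
G = (0b111, 0b101)         # generator polynomials (assumption)
N_STATES = 1 << (K - 1)


def parity(x):
    return bin(x).count("1") & 1


def branch(state, bit):
    """Return (next_state, output_bits) for one encoder transition."""
    reg = (bit << (K - 1)) | state      # newest bit sits in the MSB
    return reg >> 1, tuple(parity(reg & g) for g in G)


def encode(bits):
    state, out = 0, []
    for b in bits:
        state, o = branch(state, b)
        out.extend(o)
    return out


def viterbi_register_exchange(received):
    inf = float("inf")
    metrics = [0.0] + [inf] * (N_STATES - 1)   # encoder starts in state 0
    registers = [[] for _ in range(N_STATES)]  # survivor bits per state
    for t in range(0, len(received), len(G)):
        rx = received[t:t + len(G)]
        new_metrics = [inf] * N_STATES
        new_registers = [None] * N_STATES
        for s in range(N_STATES):
            if metrics[s] == inf:
                continue                       # state not reachable yet
            for bit in (0, 1):
                ns, out = branch(s, bit)
                m = metrics[s] + sum(a != b for a, b in zip(out, rx))
                if m < new_metrics[ns]:
                    new_metrics[ns] = m
                    # Register exchange: copy the winner's bits, append one.
                    new_registers[ns] = registers[s] + [bit]
        metrics, registers = new_metrics, new_registers
    best = min(range(N_STATES), key=lambda s: metrics[s])
    return registers[best]


message = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]   # last two zeros flush the encoder
received = encode(message)
received[3] ^= 1                           # inject one channel bit error
decoded = viterbi_register_exchange(received)
print(decoded == message)
```

The copy of every survivor register at every step is the cost the slide refers to as higher power consumption; in exchange, the decoded bits are available as soon as the last metric update finishes.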

Lower bounds on + and *
[Figure: adders and multipliers required to meet real time for estimation, detection and decoding in a W-CDMA multiuser system. Labels: number of users; data rates; curves for Add and Mul]
- Slow fading: estimation every 1000 bits
- Medium fading: estimation every 100 bits
- Fast fading: estimation every 10 bits

Kernel 2 (mmult) for 3 +, 2 *
[Figure: functional unit schedule over time, with the loop marked. Legend: communication (waiting for input); FU unavailable (input ready but FU busy)]
- O(N^3) multiplications, O(N^3) additions
- Adders have limited FU utilization
- Multipliers are 100% utilized in the loop
- Divider is not being utilized: replace / with *

Kernel 2 (mmult) for 3 +, 3 *
[Figure: functional unit schedule over time]
- Better adder utilization
- Needs sufficient registers for scaling (register allocation may fail)
- Code may also need slight tuning of variables for optimization

Kernel computational time
Time available at 128 Kbps for each of 32 users at 500 MHz: 4000 cycles
*Numbers subject to change
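The 4000-cycle figure appears to be the rounded number of clock cycles in one bit period, which can be checked with a few lines of arithmetic. The variable names are illustrative, and the per-user split at the end is only an assumption about how the budget would divide if the 32 users were processed one after another on a single processor; the slide does not say this.

```python
CLOCK_HZ = 500e6        # 500 MHz clock, from the slide
BIT_RATE_BPS = 128e3    # 128 Kbps per user, from the slide
USERS = 32              # number of users, from the slide

# Clock cycles that elapse while one bit arrives from each user.
cycles_per_bit_period = CLOCK_HZ / BIT_RATE_BPS

# Assumption: users handled serially and sharing the budget evenly.
cycles_per_user_bit = cycles_per_bit_period / USERS

print(cycles_per_bit_period)   # about 3906, which the slide rounds to 4000
print(cycles_per_user_bit)
```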

Communication overhead
[Figure: breakdown of execution time into the following categories]
- Kernels (micro-controller executing)
- Memory operations
- Initialization
- Idle time between kernels

Comparisons with TI C6701 DSPs
[Figure: execution time (in seconds, log scale from 10^-6 to 10^-2) versus number of users (0 to 35). Curves: single DSP implementation; 2 DSP implementation; our architecture based on Imagine. Reference level: target data rate of 128 Kbps/user]

Kernel comparisons
[Figure: execution time per kernel for IMAGINE, TI C67 with internal memory, and TI C67 with external memory]

4Gone Conclusions
- Various programmable architectures can be investigated QUICKLY for 4G systems, depending on algorithms, time, area and power constraints
- Enormous potential for 4G system prototyping
- Programmable baseband processor design with broad system functionality, flexibility and low power consumption that allows a smooth and fast transition from 2G to 3G to 4G systems

Future work
- Investigating bottlenecks, functional unit design and other innovations needed to attain real time
- Power and area constraints
- Scalability with data rates
- Handset algorithms
- The insights gained from the design can also be applied to DSPs and other processors
