RICE UNIVERSITY DSPs for 4G wireless systems Sridhar Rajagopal, Scott Rixner, Joseph R. Cavallaro and Behnaam Aazhang This work has been supported by Nokia,

Presentation transcript:

RICE UNIVERSITY DSPs for 4G wireless systems Sridhar Rajagopal, Scott Rixner, Joseph R. Cavallaro and Behnaam Aazhang This work has been supported by Nokia, TI, TATP and NSF

Motivation
[Figure: wireless mobile device with an RF unit, A/D and D/A converters, and a baseband programmable communications processor]
▪ Mobile: switches between standards and between parameters
▪ Base-station: varying number of users with different parameters
▪ Programmability: flexibility is good

The problem
[Figure: GPP, DSP, FPGA and VLSI placed on performance and flexibility axes]
Which architecture is best under given power and area constraints?

An approach for the solution
▪ Algorithms are well understood at the data-flow level
▪ Real-time systems can be designed in VLSI
▪ Pushing the implementation higher in the chain
▪ Current DSPs are not powerful enough for our application
▪ Use an architecture simulator to design our own

Proposed solution
▪ Current solutions to meet real-time: racks of DSPs
▪ Future wireless architectures: a programmable processor for 4G wireless systems, < x cm
  ▪ x = 2.5 (W-CDMA BS)
  ▪ x = 2.0 (W-LAN BS)
  ▪ x = 1.5 (Mobile Handset)
[Figure: processor chip labeled "JOE"]

Past work
[Diagram: past work on a timeline (Distant Past, Recent Past, Recent and Near Future) across Algorithms, DSP, VLSI, FPGA, IMAGINE and System Design. Topics shown: multiuser channel estimation, multiuser detection, task-partitioning, parallelism, pipelining, conventional arithmetic, on-line arithmetic, architecture innovations, functional unit design and usage.]

Contents
▪ Motivation
▪ The "Imagine" simulator
▪ Parallel algorithms for estimation/detection/decoding
▪ Performance comparisons and results

The IMAGINE architecture

Why the IMAGINE simulator?
▪ RSIM and SimpleScalar are GPP simulators
▪ Great for media processing algorithms
▪ Has a VLIW-based cluster, which allows DSP comparisons
▪ A good base architecture: 1024-pt FFT

Simulator knobs that we can turn
▪ Cycle-accurate simulator
▪ Varying number of functional units (FUs) and their design
▪ Varying memory and register sizes
▪ Graphical tools to investigate FU utilization, bottlenecks, memory stalls, communication overhead …
▪ Almost anything can be changed, though some changes are easier than others

Caveats
▪ Two-level C++ programming
  ▪ StreamC: transfers streams of data between main memory and the stream register file (SRF)
  ▪ KernelC: transfers streams from the SRF to the ALU clusters
▪ Code is optimized to the number of ALU clusters and the size of the data
▪ Compiler not yet fully developed

Contents
▪ Motivation
▪ The "Imagine" simulator
▪ Parallel algorithms for estimation/detection/decoding
▪ Performance comparisons and results

Typical workload representation (Base-station)
Wireless LAN
▪ Equalization?
▪ FFT
▪ Viterbi decoding
W-CDMA
▪ Multiuser channel estimation
▪ Multiuser detection
▪ Viterbi decoding
Advanced receiver schemes
▪ Turbo decoding
▪ Multiple antenna systems (MIMO)

Parallel estimation/detection/decoding
▪ Multiuser estimation
  ▪ Matrix inversion replaced by gradient descent
▪ Multiuser detection
  ▪ Parallel Interference Cancellation (PIC)
  ▪ Pipelined algorithm that avoids block-based detection
▪ Viterbi decoding
  ▪ Trellis structures suited for decoding
  ▪ Register exchange for survivor memory
  ▪ No traceback latency
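The slide says only that matrix inversion was replaced by gradient descent. As a minimal sketch of that general idea (not the authors' estimator), the code below solves a linear system R x = b iteratively using nothing but multiply-accumulate operations. The matrix `R`, the vector `b`, the step size and the iteration count are all hypothetical values chosen for illustration.

```python
def gradient_descent_solve(R, b, step, iterations):
    """Iteratively solve R x = b without forming the inverse of R.

    R is assumed symmetric positive definite (list of rows); b is a list.
    Each iteration needs only multiplications and additions.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        # residual = R x - b is the gradient of 0.5 x^T R x - b^T x
        residual = [sum(R[i][j] * x[j] for j in range(n)) - b[i]
                    for i in range(n)]
        x = [x[i] - step * residual[i] for i in range(n)]
    return x


# Hypothetical 2x2 example; the exact solution is x = [0, 2].
R = [[2.0, 0.5], [0.5, 1.0]]
b = [1.0, 2.0]
x = gradient_descent_solve(R, b, step=0.4, iterations=200)
print(x)
```

The step size must be smaller than 2 divided by the largest eigenvalue of R for the iteration to converge. The inner products are independent per row, which is the kind of regular structure that maps onto parallel adders and multipliers.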

Estimation/Detection (64,32 sizes)
▪ Multiuser estimation: Kernels 1, 2, 3
▪ Massaging matrices for detection: Kernels 4, 5
▪ Multiuser detection: Kernels 6, 7

[Figure: trellis for a rate ½ code with K = 5, states X(0) to X(15). Panels: a. Unsuitable trellis, b. Suitable trellis, c. Shuffled suitable trellis]
▪ Upper bound on parallel clusters for good FU utilization: $N/2^k$
▪ Maximum 8 parallel units for rate ½ with 16 states

Trellis structures for parallel Viterbi
Definition: If from a present state $p \in [1..N]$ the set of next states is $\{m_p\}$ ($m_p$ has $2^k$ elements, where $k$ is the number of inputs at the encoder), i.e. $p \to \{m_p\}$, then $\forall\, i,j \in [1..N]$ either $\{m_i\} = \{m_j\}$ or $\{m_i\} \cap \{m_j\} = \emptyset$. A trellis that satisfies this property is denoted a "separable" or a "fast" trellis.
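The definition can be checked mechanically. The sketch below tests whether every pair of next-state sets is either identical or disjoint, and counts the distinct sets, which corresponds to the N/2^k bound on parallel units quoted on the neighbouring slides. The shift-register transition model and the function names are assumptions made for illustration, and states are numbered from 0 rather than 1.

```python
def shift_register_next_states(p, num_states, k):
    """Next-state set of state p when k input bits are shifted in per step.

    Assumes a plain shift-register encoder (an illustrative model, not
    taken from the slides); states are numbered 0..num_states-1.
    """
    return frozenset(((p << k) % num_states) | u for u in range(1 << k))


def is_separable(next_sets):
    """True if every pair of next-state sets is either equal or disjoint."""
    return all(a == b or a.isdisjoint(b) for a in next_sets for b in next_sets)


def parallel_groups(next_sets):
    """Number of distinct next-state sets, i.e. independent groups of states."""
    return len(set(next_sets))


N = 16  # 16 states, as on the slides
rate_half = [shift_register_next_states(p, N, 1) for p in range(N)]
rate_two_thirds = [shift_register_next_states(p, N, 2) for p in range(N)]
# Hypothetical counterexample: state p can move to p or p+1 (mod N).
overlapping = [frozenset({p, (p + 1) % N}) for p in range(N)]

print(is_separable(rate_half), parallel_groups(rate_half))              # True 8
print(is_separable(rate_two_thirds), parallel_groups(rate_two_thirds))  # True 4
print(is_separable(overlapping))                                        # False
```

Under this model, states whose next-state sets coincide form a group that can be updated together without exchanging path metrics with other groups. That gives 8 groups for k = 1 and 4 groups for k = 2 with 16 states.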

[Figure: trellis for a rate 2/3 code with K = 5, present states X(0) to X(15) and next states Y(0) to Y(15). Panels: a. Shuffled suitable trellis for k = 2, b. Rearranged shuffled suitable trellis for k = 2]
▪ Upper bound on parallel clusters for good FU utilization: $N/2^k$
▪ Maximum 4 parallel units for rate 2/3 with 16 states (having 8 will involve interprocessor communication overhead)

Survivor Management in Viterbi
▪ Two techniques
  ▪ Traceback: commonly used
  ▪ Register exchange
▪ Traceback suits VLSI architectures, where the information bits can be decoded sequentially by proper survivor-memory addressing
▪ Drawback: it is sequential and adds latency

Register exchange for decoding
▪ The register for a given node at a given time contains the information bits associated with the surviving partial path that ends in that state
▪ Survivors are calculated in conjunction with the path metrics
▪ The latency of conventional traceback is avoided
▪ Higher power consumption, as the entire survivor memory contents are updated for all states for each bit
▪ Suited to a parallel programmable implementation, as storing bits in a register for traceback touches the previous survivors anyway
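To make the register-exchange idea concrete, here is a minimal hard-decision sketch. It deliberately uses a small rate ½ code with K = 3 and generators (7, 5) octal, an assumption chosen for brevity instead of the K = 5 code on the slides, and all function names are hypothetical. Each state keeps a register holding the decoded bits of its survivor. On every step the register of the winning predecessor is copied and extended, so no traceback pass is needed at the end.

```python
def parity(x):
    """Parity (XOR of all bits) of a non-negative integer."""
    return bin(x).count("1") & 1


def branch(state, u):
    """Next state and output pair for input bit u (K = 3, generators 7 and 5)."""
    reg = (u << 2) | state
    out = (parity(reg & 0b111), parity(reg & 0b101))
    return reg >> 1, out


def encode(bits):
    """Convolutionally encode a list of bits, starting from state 0."""
    state, out = 0, []
    for u in bits:
        state, pair = branch(state, u)
        out.extend(pair)
    return out


def viterbi_register_exchange(received, num_states=4):
    """Hard-decision Viterbi decoder using register exchange for survivors."""
    inf = float("inf")
    metrics = [0] + [inf] * (num_states - 1)     # encoder starts in state 0
    registers = [[] for _ in range(num_states)]  # decoded bits per survivor
    for t in range(0, len(received), 2):
        r = received[t:t + 2]
        new_metrics = [inf] * num_states
        new_registers = [None] * num_states
        for s in range(num_states):
            if metrics[s] == inf:
                continue  # state not reachable yet
            for u in (0, 1):
                nxt, out = branch(s, u)
                m = metrics[s] + (out[0] != r[0]) + (out[1] != r[1])
                if m < new_metrics[nxt]:
                    new_metrics[nxt] = m
                    # Register exchange: copy the predecessor's register
                    # and append the bit that caused this transition.
                    new_registers[nxt] = registers[s] + [u]
        metrics, registers = new_metrics, new_registers
    best = min(range(num_states), key=lambda s: metrics[s])
    return registers[best]


message = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
received = encode(message)
received[3] ^= 1  # inject a single channel error
decoded = viterbi_register_exchange(received)
print(decoded == message)
```

In this sequential sketch the per-state updates run in a loop. On a clustered architecture the updates for different states could proceed in lockstep, at the cost of rewriting every survivor register on every bit, which is the power trade-off the slide mentions.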

Contents
▪ Motivation
▪ The "Imagine" simulator
▪ Parallel algorithms for estimation/detection/decoding
▪ Performance comparisons and results

Lower bounds on + and *
Adders/multipliers required to meet real-time estimation, detection and decoding in a W-CDMA multiuser system
[Figure: number of adders (Add) and multipliers (Mul) versus number of users and data rates, for slow fading (estimation every 1000 bits), medium fading (estimation every 100 bits) and fast fading (estimation every 10 bits)]

Kernel 2 (mmult) for 3 +, 2 *
[Figure: FU utilization over time for the kernel loop. Legend: communication (waiting for input); FU unavailable (input ready but FU busy)]
▪ $O(N^3)$ *, $O(N^3)$ +
▪ Adders have limited FU utilization
▪ Multipliers 100% utilized in the loop
▪ Divider not being utilized
▪ Replace / with *

Kernel 2 (mmult) for 3 +, 3 *
[Figure: FU utilization over time]
▪ Better adder utilization
▪ Needs sufficient registers for scaling [register allocation may fail]
▪ Code may also need slight tuning of variables for optimization

Kernel computational time
[Table: computational time per kernel; values not preserved in the transcript]
Time available at 128 Kbps for each of 32 users at 500 MHz: 4000 cycles
*Numbers subject to change
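The 4000-cycle figure can be reproduced approximately from the clock rate and the data rate. The sketch below assumes the budget is the number of clock cycles in one bit period, during which one bit for each of the 32 users has to be processed. The even per-user split is an illustrative assumption, not a number from the slides.

```python
def cycles_per_bit_period(clock_hz, bit_rate_bps):
    """Clock cycles that elapse during one bit period of a single user."""
    return clock_hz / bit_rate_bps


clock_hz = 500e6      # 500 MHz processor clock
bit_rate_bps = 128e3  # 128 Kbps per user
users = 32

budget = cycles_per_bit_period(clock_hz, bit_rate_bps)
per_user = budget / users  # assumes the budget is shared evenly by all users

print(budget)    # 3906.25, which the slide rounds to 4000
print(per_user)
```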

Communication overhead
[Figure: breakdown of execution time into kernels (micro-controller executing), memory operations, initialization, and idle time between kernels]

Comparisons with TI C6701 DSPs
[Figure: execution time (in seconds) versus number of users for a single-DSP implementation, a 2-DSP implementation, and our architecture based on Imagine, compared against the target data rate (Kbps/user)]

Kernel comparisons
[Figure: execution time per kernel for IMAGINE, TI C67 with internal memory, and TI C67 with external memory]

4Gone Conclusions
▪ Various programmable architectures can be investigated QUICKLY for 4G systems, depending on algorithms, time, area and power constraints
▪ Enormous potential for 4G system prototyping
▪ Programmable baseband processor design with broad system functionality, flexibility and low power consumption that allows a smooth and fast transition from 2G to 3G to 4G systems

Future work
▪ Investigating bottlenecks, functional unit design and other innovations needed to attain real-time
▪ Power and area constraints
▪ Scalability with data rates
▪ Handset algorithms
▪ The insights gained from the design can also be applied to DSPs and other processors