Suman Das, Sridhar Rajagopal, Chaitali Sengupta and Joseph R. Cavallaro

Arithmetic Acceleration Techniques for Wireless Communication Receivers
  {suman,sridhar,chaitali,cavallar}@rice.edu
  Rice University
  http://www.ece.rice.edu/
  This work is supported by Nokia, Texas Instruments, Texas Advanced Technology Program and NSF

Objective
  Next generation Wireless Base-station
  Real-Time Requirements
  Multiuser Channel Estimation and Detection
  High Complexity Algorithms for Advanced Receiver Structures
  Task Decomposition
  Potential for parallelism
  Application-Specific Design / Single Processor

Outline
  Motivation
  Real-time Requirements
  Joint Estimation and Detection
  Task Decomposition
  Results
  Summary

Motivation
  Next Generation Wireless Systems
    Higher Data Rates, up to 2 Mbps
    Multimedia Capabilities
    Multi-rate, QoS
  High Complexity in Proposed Algorithms
    Pressure on existing hardware
    Time, power, size constraints
  Acceleration on Hardware Needed

Wireless Communication Uplink
  Asynchronous CDMA System
  Multiple Users
  Channel Effects
    Fading
    Multiple paths
  Multiple Access Interference (MAI)
  [Figure: User 1 and User 2 transmit to the Base Station over a direct path and reflected paths; the received signal includes noise + MAI.]

Base-Station Receiver
  The Physical Layer
  [Figure: base-station receiver block diagram for multiple users. Labels: Antenna, Demodulator, MUX (Pilot, Data), Channel Estimation, Multiuser Detection, Decoder, Decision Feedback, Delay, d, b, Detected Bits.]

Real-Time Requirements
  W-CDMA
    Transmission done by multiplication with a signature waveform (Spreading)
    Data Transmission in 10 ms Frames
    Multiple Data Rates by Varying Spreading Factors
  Detection needs to be done in real time
    1953 cycles available on a C6x DSP at 250 MHz to detect 1 bit at 128 Kbps
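The 1953-cycle figure follows directly from dividing the clock rate by the target bit rate. A minimal sketch of that arithmetic, where the function and constant names are illustrative and not taken from the slides:

```python
def cycles_per_bit(clock_hz: float, bit_rate_bps: float) -> int:
    """Whole processor cycles available to process one bit in real time."""
    return int(clock_hz // bit_rate_bps)


CLOCK_HZ = 250e6    # C6x clock rate quoted on the slide
TARGET_BPS = 128e3  # target data rate quoted on the slide

budget = cycles_per_bit(CLOCK_HZ, TARGET_BPS)
print(budget)  # 250e6 / 128e3 = 1953.125, so 1953 whole cycles
```

Any detector whose bottleneck stage needs more cycles than this per bit cannot keep up with 128 Kbps on one such processor.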

Joint Estimation and Detection
  Algorithm to jointly estimate the channel response and detect all users' bits
  Shown to have better performance as well as reduced computational complexity
  Maximum Likelihood Based Channel Estimation [C. Sengupta et al.: PIMRC'1998, WCNC'1999]
  Differencing Multistage Detection based on Parallel Interference Cancellation [G. Xu et al.: SPIE'1999]

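Computations Involved
  Model
    r_i: received bits of spreading length N for K users
    b_i, b_{i-1}: bits of K asynchronous users aligned at times i and i-1
    [Equation: model relating r_i, b_i and b_{i-1}]
  Compute Correlation Matrices
  [Figure: timing diagram showing the delay between users over time]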

Solve for the channel estimate, A_i
Multishot Detection

Differencing Multistage Detection
  Successive Stages
  S = diag(A^H A)
  y: soft decision
  d: detected bits (hard decision)
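The stage equations did not survive in the transcript, so the following is only a sketch of the general idea, not the authors' implementation. It assumes real-valued BPSK bits, a single K x K correlation matrix H standing in for A^H A, and the standard parallel interference cancellation update y = y0 - (H - S) d with S = diag(H). The differencing variant updates the previous soft decision using only the bits that changed between stages. All function names are hypothetical.

```python
def sign(x):
    """Hard decision on one soft value (ties resolved to +1)."""
    return 1 if x >= 0 else -1


def multistage_detect(H, y0, stages):
    """Plain multistage PIC: every stage recomputes y0 - (H - S) d from scratch."""
    K = len(y0)
    d = [sign(v) for v in y0]  # stage 0: matched-filter decisions
    for _ in range(stages):
        y = [y0[k] - sum(H[k][j] * d[j] for j in range(K) if j != k)
             for k in range(K)]
        d = [sign(v) for v in y]
    return d


def differencing_detect(H, y0, stages):
    """Differencing form (assumes stages >= 1): reuse y and cancel only changed bits."""
    K = len(y0)
    d_prev = [sign(v) for v in y0]
    # The first stage is a full interference cancellation.
    y = [y0[k] - sum(H[k][j] * d_prev[j] for j in range(K) if j != k)
         for k in range(K)]
    d = [sign(v) for v in y]
    for _ in range(stages - 1):
        x = [d[j] - d_prev[j] for j in range(K)]      # entries are 0, +2 or -2
        changed = [j for j in range(K) if x[j] != 0]  # usually few after stage 1
        y = [y[k] - sum(H[k][j] * x[j] for j in changed if j != k)
             for k in range(K)]
        d_prev, d = d, [sign(v) for v in y]
    return d


# Illustrative 3-user example (made-up numbers): true bits are [1, -1, -1],
# and noise flips user 0 at the matched-filter output.
H = [[1.0, 0.4, 0.3],
     [0.4, 1.0, 0.2],
     [0.3, 0.2, 1.0]]
y0 = [-0.1, -0.8, -0.9]

print(multistage_detect(H, y0, 3))
print(differencing_detect(H, y0, 3))
```

Both functions compute the same decisions; the differencing form does less work in later stages because the difference vector becomes sparse as the decisions settle.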

Structure of A^H A
  Block Bi-Diagonal Matrix

Bottlenecks
  Identified using a C6x DSP Implementation
  Channel Estimation
    Can be done less frequently
    Depends on BER needed
  Multiuser Detection
    Needs to be done all the time
    Differencing Multistage: fewer computations on successive stages
  Analysis of various levels of optimization for detection

Task Decomposition
  Channel Estimation (input: Pilot)
    Block I: Correlation Matrices (Per Bit)
      Rbb: O(K^2)
      Rbr[R]: O(KN)
      Rbr[I]: O(KN)
    Block II: Inverse
      Rbb A^H = Rbr[R]: O(K^2 N)
      Rbb A^H = Rbr[I]: O(K^2 N)
    Block III: Matrix Products
      A0^H A0: O(K^2 N)
      A0^H A1: O(K^2 N)
      A1^H A1: O(K^2 N)
  Multistage Detection (Per Window) (input: Data; output: detected bits d)
    Block IV (Task A)
      A^H r: O(KND)
    Task B
      Multistage detection: O(D K^2 Me)
  [Figure: block diagram connecting the blocks above through MUX elements, with signals b, d, Data and Data'.]

Sequential / Pipeline
  Real-time: 1953 cycles, 128 Kbps
  Task A (Block IV, A^H r, O(KND)): 13272 cycles
  Task B (Data to d, O(D K^2 Me)): 3367*Me cycles, with Me = 3
  (Single PE) Sequential A+B: 13272 + 3367*Me cycles, 10.7 Kbps
  (2 PE) Pipeline A, B: max(13272, 3367*Me) cycles, 18.8 Kbps

Parallel A
  Real-time: 1953 cycles, 128 Kbps
  Task A (Block IV, A^H r) split across elements 1 ... K: O(ND) each, 885 cycles
  Task B (Data to d, O(D K^2 Me)): 3367*Me cycles
  (K+1 PE) Parallel A, B: 3367*Me cycles, 24.75 Kbps

Parallel A, Pipeline B / Parallel A, Parallel + Pipeline B
  Real-time: 1953 cycles, 128 Kbps
  Task A split across elements 1 ... K: 885 cycles, O(N)
  Task B, one pipelined stage: 3367 cycles, O(K^2)
  Task B stage with parallelism: 225 cycles, O(K)
  (K+3 PE) Parallel A, Pipeline B: 3367 cycles, 74.25 Kbps
  ((Me+1)K PE) Parallel A, Parallel + Pipeline B: 885 cycles, 282.5 Kbps
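The data rates on the last three slides can be reproduced from the quoted cycle counts: each schedule's rate is the 250 MHz clock divided by the cycle count of its slowest stage. The sketch below assumes that reading. The user count K = 15 is an assumption (it is consistent with 13272 / 885, but the slides do not state it) and only affects the processing-element counts. All names are illustrative.

```python
CLOCK_HZ = 250e6     # C6x clock rate quoted on the slides
TARGET_KBPS = 128.0  # real-time data rate requirement
ME = 3               # number of detection stages (Me) used on the slides
K = 15               # ASSUMPTION: number of users, not stated on these slides

# Cycle counts quoted on the slides
A_SINGLE = 13272        # Task A on one processing element
A_PARALLEL = 885        # Task A split across K processing elements
B_STAGE = 3367          # one stage of Task B
B_STAGE_PARALLEL = 225  # one stage of Task B with parallelism

# schedule name -> (processing elements, bottleneck cycles)
SCHEDULES = {
    "sequential A+B": (1, A_SINGLE + B_STAGE * ME),
    "pipeline A, B": (2, max(A_SINGLE, B_STAGE * ME)),
    "parallel A, B": (K + 1, max(A_PARALLEL, B_STAGE * ME)),
    "parallel A, pipeline B": (K + 3, max(A_PARALLEL, B_STAGE)),
    "parallel A, parallel+pipeline B":
        ((ME + 1) * K, max(A_PARALLEL, B_STAGE_PARALLEL)),
}


def rate_kbps(bottleneck_cycles):
    """Achievable data rate when the slowest stage takes this many cycles."""
    return CLOCK_HZ / bottleneck_cycles / 1e3


RATES = {name: rate_kbps(cycles) for name, (_, cycles) in SCHEDULES.items()}

for name, (pes, cycles) in SCHEDULES.items():
    verdict = "meets" if RATES[name] >= TARGET_KBPS else "misses"
    print(f"{name}: {pes} PE, {cycles} cycles, "
          f"{RATES[name]:.2f} Kbps ({verdict} {TARGET_KBPS:.0f} Kbps)")
```

Under these numbers only the last schedule exceeds the 128 Kbps requirement, and its bottleneck is Task A (885 cycles) rather than the 225-cycle detection stage.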

At this step
  [Figure: block diagram of the decomposition. Labels: Block I & II, Block III, Block IV, Task A (elements 1 ... K), Data, Task B: Multistage Detection with Stage 1, Stage 2, Stage 3, ...]

Achieved Data Rates
  [Figure: Data Rates for Different Levels of Pipelining and Parallelism. x-axis: Number of Users (9 to 15); y-axis: Data Rates (0.5 to 3 x 10^5). Curves: Sequential A + B; A B; (Parallel A) B; (Parallel A) (Pipe B); (Parallel A) (Parallel+Pipe B). Reference line: Data Rate Requirement = 128 Kbps.]

Mapping to Hardware
  Analysis independent of hardware
    DSP with coprocessors
    Multiple Processors
    Combination of a processor with ASIC/FPGA
    Single ASIC
  Minimize Idle time in processing elements
  Some computations can be shared
  Assumptions
    Critical processing elements have functional units similar to C6x
    No communication overhead between processors
    Number of elements dependent on number of users

Summary
  Acceleration Techniques for Multiuser Estimation and Detection: a computationally intensive algorithm
  Task Decomposition
  C6x DSP Simulator
  Real-time Analysis
  Hardware Mapping Issues
  Application Specific Design more effective than a single processor solution

Future Work
  Fixed Point Implementation
  Matrix Oriented Architectures
  LU Decomposition
  Other Algorithms for decomposition
  Vector Processor with SIMD
  2 Levels of Parallelism
  Complex Arithmetic

DSP Implementation
  Texas Instruments C6x Simulator
  TI TMS320C6701 Floating Point DSP
  Code and Program optimized to fit in internal memory
  32-bit VLIW Architecture
  8 Functional Units
    2 Multipliers
    4 Adders
    2 Load/Store
  TI C Compiler