Channel Equalization in MIMO Downlink and ASIP Architectures Predrag Radosavljevic Rice University March 29, 2004.

Presentation transcript:

Channel Equalization in MIMO Downlink and ASIP Architectures Predrag Radosavljevic Rice University March 29, 2004

Wireless System: downlink transmission in a MIMO wireless system; physical layer of the mobile handset; linear channel equalization; hardware implementation using ASIP architectures.

Motivation: MIMO Downlink and Equalization. MIMO offers high data rates and high spectral efficiency, but interference from each transmit antenna introduces multiple access interference (MAI). For DS-CDMA signals in a multipath environment, user orthogonality is destroyed, which causes inter-symbol interference (ISI). Solution: powerful channel equalization that mitigates ISI and MAI in order to restore user orthogonality. Approach: chip-level channel equalization based on the iterative CG and adaptive LMS algorithms.

Motivation: ASIP Hardware Implementation. Future generations of mobile handsets require high speed, flexibility, and low power. Traditional approaches are ASICs and DSP processors. ASIC: no flexibility, so a family of ASICs is needed; high probability of design errors; high design cost. DSP: not optimized for a given application; often limited instruction- and data-level parallelism. ASIP: a tradeoff between the efficiency of ASICs and the flexibility of DSPs.

Thesis Contributions. Channel equalization in a broad range of environments; 16-bit fixed-point implementation. Flexible ASIP architecture design: same hardware for different equalization schemes (slow/fast fading, CG/LMS); extension of the ASIP instruction set with application-specific operations. Customized architecture: real-time requirements for the 1xEV-DV standard ( Mc/s); reasonable clock frequency (up to 150 MHz) and power dissipation. Automatic hardware design from C to gate level; hardware synthesis for FPGA and CMOS libraries.

Outline: data model; channel equalization; ASIP hardware implementations; conclusions and future work.

Data Model: Transmission Side. Symbols alternate over the transmit antennas. Spreading: orthogonality between users. Scrambling: reduction of inter-cell interference. Transmission over correlated multipath channels.
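
The transmit chain listed on this slide can be sketched in a few lines. This is a minimal illustration, not the model used in the thesis: Walsh-Hadamard spreading codes, a real-valued +/-1 scrambling sequence, an ideal channel, and all function names are assumptions made for the example.

```python
# Illustrative sketch only: code family, scrambling sequence and names are assumptions.

def alternate(symbols, n_tx):
    """Split one symbol stream into n_tx streams, one per transmit antenna."""
    return [symbols[i::n_tx] for i in range(n_tx)]

def walsh_codes(n):
    """Return n mutually orthogonal Walsh-Hadamard codes of length n (n a power of two)."""
    codes = [[1]]
    while len(codes) < n:
        codes = ([row + row for row in codes]
                 + [row + [-c for c in row] for row in codes])
    return codes

def transmit(user_symbols, codes, scrambling):
    """Spread each user's symbols, sum the users, then apply the cell scrambling sequence."""
    chips = [0] * len(scrambling)
    for symbols, code in zip(user_symbols, codes):
        spread = [s * c for s in symbols for c in code]
        for i, chip in enumerate(spread):
            chips[i] += chip
    return [c * s for c, s in zip(chips, scrambling)]

def despread(chips, code, scrambling):
    """Undo the +/-1 scrambling, then correlate with one user's code (ideal channel)."""
    n = len(code)
    descrambled = [c * s for c, s in zip(chips, scrambling)]
    return [sum(descrambled[k + i] * code[i] for i in range(n)) / n
            for k in range(0, len(descrambled), n)]

codes = walsh_codes(4)
scrambling = [1, -1, -1, 1, 1, 1, -1, -1]
tx = transmit([[1, -1], [-1, -1]], codes, scrambling)
user0 = despread(tx, codes[0], scrambling)
user1 = despread(tx, codes[1], scrambling)
```

Over an ideal channel each user's symbols are recovered exactly because the codes are orthogonal; a multipath channel breaks that orthogonality, which is what the equalizer is meant to restore.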

Receiver Implementations: RAKE receiver, multiuser detector, Kalman filter, LMMSE equalization. RAKE: deteriorated performance in a highly loaded system; not appropriate for MIMO environments. Multiuser detectors: high computational complexity; limited knowledge about the activity of other users. Kalman filter: optimal solution in the MSE sense; prohibitive complexity in MIMO environments.

LMMSE Equalization. Lower complexity in comparison with other receivers; independent of the number of users; iterative solutions; good performance in highly scattered environments. Figure: LMMSE receiver.

LMMSE Equalization. Linear system to be solved: covariance matrix is block Toeplitz and positive definite. A and B: Toeplitz Hermitian matrices. C: Toeplitz matrix.

LMMSE Approaches. LMMSE solution by Cholesky decomposition: more complex hardware primitives. Conjugate Gradient (CG): iterative solution with fast convergence; block algorithm, with modifications for fast fading channels. Least Mean Square (LMS): adaptive algorithm; sensitive to the learning step.

Equalization in Time-Varying Channels. Spatially correlated, frequency-selective (multipath) fading channels. Data rate: MChips/sec. Antenna correlation: base station 50.18%, mobile 43.99%.

Channel Equalization: CG Algorithm. N samples: 4096 in slow fading channels.
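
For reference, the textbook conjugate gradient iteration for a Hermitian positive definite system R w = h can be sketched as below. The symbols R, h and w, the function name, and the stopping rule are assumptions for illustration; the slide's own formulation and its block update over N samples are not preserved in the transcript.

```python
# Textbook conjugate gradient in pure Python; a sketch, not the thesis implementation.

def conjugate_gradient(R, h, iterations=None, tol=1e-12):
    """Solve R w = h for Hermitian positive definite R (list of lists)."""
    n = len(h)
    w = [0j] * n
    r = list(h)                      # residual h - R w, with w = 0 initially
    p = list(r)                      # first search direction
    rs_old = sum((x.conjugate() * x).real for x in r)
    for _ in range(iterations or n):
        if rs_old < tol:             # residual energy already negligible
            break
        Rp = [sum(R[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs_old / sum((p[i].conjugate() * Rp[i]).real for i in range(n))
        w = [w[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Rp[i] for i in range(n)]
        rs_new = sum((x.conjugate() * x).real for x in r)
        p = [r[i] + (rs_new / rs_old) * p[i] for i in range(n)]
        rs_old = rs_new
    return w

# Small Hermitian positive definite example (hypothetical values).
R = [[2, 1j], [-1j, 2]]
h = [1, 0]
w = conjugate_gradient(R, h)
```

Each iteration needs one matrix-vector product and a few inner products, and in exact arithmetic the method converges in at most n iterations, which is why only multiply-accumulate style primitives are required rather than a matrix factorization.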

CG Equalization in Vehicular A, 30 km/h. Sliding Window (SW) approach. Faster channel variations: more frequent update of the filter coefficients.

CG Equalization: Velocity of 120 km/h. Multiple sub-blocks instead of two blocks, with partial channel estimation for each sub-block. Weights are applied to form the global channel estimate. The weights are adjusted according to the channel variations: the faster the channel fades, the faster the weight coefficients drop to 0.
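
One way to picture the weighted combination of sub-block estimates is an exponentially decaying weight per sub-block. The slide does not give the actual weighting rule, so the forgetting factor, the normalization, and the function name below are purely illustrative assumptions.

```python
# Hypothetical weighting rule: exponential forgetting over sub-block estimates.

def combine_estimates(partial_estimates, forgetting):
    """Weighted average of per-sub-block channel estimates (oldest first, newest last).

    forgetting close to 1 -> old sub-blocks still count (slow fading);
    forgetting close to 0 -> weights of old sub-blocks drop towards 0 (fast fading).
    """
    m = len(partial_estimates)
    weights = [forgetting ** (m - 1 - i) for i in range(m)]
    total = sum(weights)
    taps = len(partial_estimates[0])
    return [sum(wt * est[t] for wt, est in zip(weights, partial_estimates)) / total
            for t in range(taps)]

slow = combine_estimates([[1.0], [3.0]], forgetting=1.0)   # plain average
fast = combine_estimates([[1.0], [3.0]], forgetting=0.0)   # newest sub-block only
```

With this rule a single parameter moves the estimator between plain averaging and using only the newest sub-block.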

Architectural Alternative: LMS Equalization Adaptive LMS:
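
The update equation on this slide is not preserved in the transcript. As a stand-in, the standard complex LMS recursion (output y = w^H x, error e = d - y, update w <- w + mu * x * conj(e)) is sketched below; the function name, the training-based error signal, and the toy channel are assumptions.

```python
# Standard complex LMS adaptive filter; a generic sketch, not the slide's exact form.

def lms_equalize(received, training, taps, mu):
    """Adapt an FIR equalizer so that w^H x tracks the known training symbols."""
    w = [0j] * taps
    errors = []
    for n in range(taps - 1, len(received)):
        x = [received[n - k] for k in range(taps)]           # newest sample first
        y = sum(wk.conjugate() * xk for wk, xk in zip(w, x))  # filter output w^H x
        e = training[n] - y                                   # error against training
        w = [wk + mu * e.conjugate() * xk for wk, xk in zip(w, x)]
        errors.append(abs(e))
    return w, errors

# Toy case: a flat channel with gain 0.5, so the ideal one-tap equalizer is 2.
training = [1 if (i * 7) % 3 else -1 for i in range(200)]
received = [0.5 * d for d in training]
w, errors = lms_equalize(received, training, taps=1, mu=0.5)
```

The step size mu is the learning step whose sensitivity is mentioned on the LMMSE Approaches slide: too small and the filter cannot follow the channel, too large and the recursion diverges.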

Performance: Slow Fading Environments. From 32-bit floating point to 16-bit fixed point, with control of the quantization error. Channels: Pedestrian A at 3 km/h; Pedestrian B at 10 km/h.
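
The move from floating point to 16-bit fixed point can be illustrated with a Q15 format (1 sign bit, 15 fractional bits). The slides only state "16-bit fixed point"; the Q15 choice, the saturation behavior, and the helper names are assumptions.

```python
# Q15 fixed-point helpers; the format is an assumption, the slides only say "16-bit".

Q15_MIN, Q15_MAX = -32768, 32767

def saturate(value):
    """Clamp an integer to the signed 16-bit range."""
    return max(Q15_MIN, min(Q15_MAX, value))

def to_q15(x):
    """Quantize a real number in [-1, 1) to a 16-bit integer, saturating at the edges."""
    return saturate(int(round(x * 32768)))

def from_q15(q):
    """Convert a Q15 integer back to a float."""
    return q / 32768

def q15_mul(a, b):
    """Multiply two Q15 numbers: 32-bit product shifted back by 15 bits."""
    return saturate((a * b) >> 15)

half = to_q15(0.5)
quarter = q15_mul(half, half)
error = abs(from_q15(to_q15(0.3)) - 0.3)   # quantization error of a single value
```

The rounding error of one quantized value is bounded by half a least significant bit (2^-16 here); controlling how such errors accumulate through the equalizer is what keeps the fixed-point result close to the floating-point one.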

Performance: Vehicular A, 30 km/h. CG with sliding window (CG-SW): improvement in comparison with basic CG.

CG-SW Approach: Fixed Point. 32-bit floating point versus 16-bit fixed point: about 1% BER difference. Channel: Vehicular A, 30 km/h.

Performance: Velocity of 120 km/h. CG with sliding window and weights averaging (CG-SW-WA), with different numbers of sub-blocks. Performance improves if weights are applied. Channels: Pedestrian A at 120 km/h; Vehicular A at 120 km/h.

Computational Complexity. Number of operations per chip in 1 second. The CG filter update is less complex. Reason: block-level filter update algorithm.

Directions for Architecture Implementation. Equalization in different environments: block CG and adaptive LMS for slow fading environments; modifications of CG for fast fading channels; different computational complexity and amount of parallelism. Flexible hardware for different equalizations and CG modifications: programmable architecture; application specific.

ASIP Architecture for Equalization: Required Features. Flexible architecture able to operate in different channel environments: slow/fast fading, low/high scattering. Architecture customization: implementation of application-specific operations. Instruction- and data-level parallelism: fast execution of complex algorithms. Automatic hardware-software co-design: fast processor design starting from the C/C++ code of the application.

ASIP Architecture Based on TTA (Transport Triggered Architecture). Flexible architecture: no limitations on adding new FUs, buses, or registers. Customizable architecture: implementation of Special Function Units (SFUs). Instruction- and data-level parallelism: VLIW architecture principle; efficient and parallel data flow. Fast processor design: automatic search for the best processor; VHDL processor representation.

General Structure of TTA. Transport of operands triggers the appropriate operation as a side effect. Only one instruction: the “move” instruction. 32-bit architecture.
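
The idea that a data transport triggers the operation can be shown with a toy model. Everything here (the `AddUnit` class, the port names, the program encoding) is hypothetical and far simpler than a real TTA or the MOVE framework; it only shows that the program consists of moves and that writing a trigger port starts the operation.

```python
# Toy transport-triggered machine; class, port names and encoding are hypothetical.

class AddUnit:
    """Function unit with an operand port, a trigger port and a result port."""

    def __init__(self):
        self.operand = 0
        self.result = 0

    def write(self, port, value):
        if port == "operand":
            self.operand = value                  # plain storage, nothing happens yet
        elif port == "trigger":
            self.result = self.operand + value    # writing here fires the addition
        else:
            raise KeyError(port)

def run(program, units, registers):
    """Execute a list of (source, destination) moves, the only instruction type."""
    for source, destination in program:
        if isinstance(source, int):               # immediate value
            value = source
        elif "." in source:                       # result port of a function unit
            unit, _port = source.split(".")
            value = units[unit].result
        else:                                     # general-purpose register
            value = registers[source]
        if "." in destination:
            unit, port = destination.split(".")
            units[unit].write(port, value)
        else:
            registers[destination] = value
    return registers

program = [(2, "add.operand"), (3, "add.trigger"), ("add.result", "r0")]
registers = run(program, {"add": AddUnit()}, {})
```

A real processor of this kind issues several such moves per cycle, one per bus, which is where the VLIW-style instruction-level parallelism comes from.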

TTA Design Flow: MOVE Tool. Design space exploration for the optimal architecture.

Customization of ASIP. Implementation of application-specific operations as user-defined Special Function Units (SFUs), sacrificing architecture generality for optimization and performance improvement. Designed SFUs: real multiplication with shifting ability; complex multiplication with shifting; sub-word arithmetic operations; sign-test and add/subtract.

SFU: Complex Multiplication. Reduction of data transports between FUs. Fewer buses and a smaller interconnection network. Smaller instruction word. Instruction and data parallelism is placed inside CXMUL.
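
Functionally, a complex multiplication with shifting folds four real multiplications, two additions, and a rescaling shift into one operation. The sketch below models that behavior on integers; the name `cxmul` mirrors the CXMUL unit on the slide, but the argument order, the separate real and imaginary arguments, and the single shared shift amount are assumptions.

```python
# Behavioral model of a complex multiply-with-shift unit; the interface is an assumption.

def cxmul(a_re, a_im, b_re, b_im, shift):
    """Return (a_re + j*a_im) * (b_re + j*b_im), each part shifted right by `shift` bits."""
    re = (a_re * b_re - a_im * b_im) >> shift   # real part: two multiplies, one subtract
    im = (a_re * b_im + a_im * b_re) >> shift   # imaginary part: two multiplies, one add
    return re, im

# Q15-style operands: 0.5 * (0.5 + 0.5j) = 0.25 + 0.25j, i.e. (8192, 8192) after shift 15.
product = cxmul(16384, 0, 16384, 16384, 15)
```

Built from general-purpose units, the same computation would need the four products and two sums moved between units over the buses; keeping those intermediate values inside one unit is what reduces data transports and bus count.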

Performance Improvement with SFUs. Bus reduction of 50%; instruction word length reduction of about 50%.

TTA Processors for MIMO Equalization. 1. Two co-processors (CG equalization): a co-processor for updating equalizer coefficients and a co-processor for filtering and user detection. 2. Single processor for all parts of the equalization algorithm (CG/LMS equalization). Identical architectures for slow and fast fading environments.

Single Processor vs. Two Coprocessors. Single processor: smaller area and power dissipation; higher clock frequency.

Processor Flexibility. Identical customized processor for a broad range of channel environments; identical processor for LMS and CG equalization.

Example of Designed Processor: coprocessor for CG filter update.

Hardware Synthesis Design Flow. MOVEGen: generates the VHDL representation of the processor core. Xilinx tools for fast FPGA prototyping. Mentor Graphics tools for CMOS gate-level design.

VHDL Template of TTA Processor. Automatic VHDL generation of processor core, control, and interconnection. FUs, SFUs, and peripherals: pre-designed or defined by the user.

MoveProc Synthesis on Xilinx FPGA. CG/LMS equalizer including user detection; no SFUs; 32 buses. Xilinx FPGA part: XC2V8000. Slices: 38,757 out of 46,592. BRAMs: 148 out of 168. IOBs: 263 out of 1108. MULT18x18s: 24 out of 168.

MoveProc Synthesis on Xilinx FPGA. Customized CG/LMS equalizer including user detection; with SFUs; 16 buses. Xilinx FPGA part: XC2V6000. Slices: 21,126 out of 33,792. BRAMs: 107 out of 144. IOBs: 229 out of 1104. MULT18x18s: 11 out of 144.

Gate Level CMOS Synthesis. Mentor Graphics tools; 0.5 µm CMOS library. Customized CG/LMS equalizer including user detection (with SFUs). Synthesis estimate of processor core: 182,887 gates.

Conclusions. Equalization algorithms for a broad range of channel environments: CG/LMS for slow fading; modifications of basic CG equalization for fast fading. ASIP architecture design based on TTA: same architecture for different equalization algorithms; optimization with application-specific operations; reasonable frequency and power dissipation for the 3GPP data rate. Fast processor design: VHDL representation of the optimal processor; FPGA synthesis and CMOS gate-level synthesis.

Future Work. Processor layout synthesis: IC Station software tool from Mentor Graphics; precise timing, area, and power analysis. Implementation of hybrid word length: reduced precision for the filter application part. Implementation on a C5x DSP for comparison.

Acknowledgements. Thanks to: Professor Cavallaro, Dr. De Baynast, Professor Aazhang, Dr. Dabak, Dr. Sabharwal, Texas Instruments, Nokia.