Published by Gwendoline Shanna Jefferson. Modified over 8 years ago.
1
Channel Equalization in MIMO Downlink and ASIP Architectures. Predrag Radosavljevic, Rice University, March 29, 2004
2
Wireless System: downlink transmission in a MIMO wireless system; physical layer of the mobile handset; linear channel equalization; hardware implementation using ASIP architectures.
3
Motivation: MIMO Downlink and Equalization. MIMO offers high data rate and high spectral efficiency, but interference from each antenna introduces multiple-access interference (MAI). DS-CDMA signals in a multipath environment: user orthogonality is destroyed, which causes inter-symbol interference (ISI). Solution: powerful channel equalization to mitigate ISI and MAI and restore the users' orthogonality. Chip-level channel equalization based on the iterative conjugate gradient (CG) and adaptive least mean square (LMS) algorithms.
4
Motivation: ASIP Hardware Implementation. Future generations of mobile handsets need high speed, flexibility and low power. Traditional approaches: ASICs and DSP processors. ASIC: no flexibility (a family of ASICs is needed); high probability of design errors; high design cost. DSP: not optimized for a given application; often limited instruction- and data-level parallelism. ASIP: a tradeoff between the efficiency of ASICs and the flexibility of DSPs.
5
Thesis Contributions. Channel equalization in a broad range of environments; 16-bit fixed-point implementation. Flexible ASIP architecture design: same hardware for different equalization (slow/fast fading, CG/LMS); extension of the ASIP instruction set with application-specific operations. Customized architecture: real-time requirements for the 1xEV-DV standard (1.2288 Mc/s); reasonable clock frequency (up to 150 MHz) and power dissipation. Automatic hardware design from C to gate level; hardware synthesis for FPGA and CMOS libraries.
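The real-time requirement can be made concrete with a little arithmetic. The sketch below uses only the two figures quoted on this slide (chip rate and maximum clock frequency); the function and constant names are illustrative, and the result is an upper bound on cycles per chip, not a measured figure from the thesis.

```python
# Cycle budget per chip, using only the two figures quoted on the slide.
CHIP_RATE_HZ = 1.2288e6   # 1xEV-DV chip rate (1.2288 Mc/s)
MAX_CLOCK_HZ = 150e6      # upper clock frequency quoted on the slide


def cycles_per_chip(clock_hz, chip_rate_hz):
    """Clock cycles available to process one received chip in real time."""
    return clock_hz / chip_rate_hz


budget = cycles_per_chip(MAX_CLOCK_HZ, CHIP_RATE_HZ)
print(f"{budget:.1f} cycles per chip")  # prints "122.1 cycles per chip"
```

Every operation the equalizer performs per chip (filtering plus the amortized cost of the filter update) has to fit inside this budget, which is why instruction-level parallelism matters.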
6
Outline Data model Channel equalization ASIP hardware implementations Conclusions and future work
7
Data Model: Transmission Side. Alternating symbols over transmit antennas; spreading (orthogonality between users); scrambling (reduction of inter-cell interference); transmission over correlated multipath channels.
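The role of spreading can be illustrated with a minimal sketch. It assumes Walsh-Hadamard codes (the slide does not name the code family), an ideal single-path channel, and no scrambling; all function names are hypothetical.

```python
def walsh_codes(n):
    """Sylvester construction: n Walsh-Hadamard codes of length n (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    codes = [[1]]
    while len(codes) < n:
        codes = ([row + row for row in codes]
                 + [row + [-c for c in row] for row in codes])
    return codes


def spread(symbol, code):
    """Multiply one data symbol by every chip of the user's code."""
    return [symbol * c for c in code]


def despread(chips, code):
    """Correlate the received chips with a code and normalize by the code length."""
    return sum(x * c for x, c in zip(chips, code)) / len(code)


codes = walsh_codes(8)
sym_a, sym_b = 1 + 1j, -1 + 1j           # two users' QPSK symbols
tx = [a + b for a, b in zip(spread(sym_a, codes[1]), spread(sym_b, codes[5]))]
rx_a = despread(tx, codes[1])            # recovers sym_a: the codes are orthogonal
rx_b = despread(tx, codes[5])            # recovers sym_b
```

Over a multipath channel the received chips are a sum of delayed copies, so the correlation in `despread` no longer cancels the other user exactly; that loss of orthogonality is what the equalizer is meant to undo.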
8
Receiver Implementations: RAKE receiver, multiuser detector, Kalman filter, LMMSE equalization. RAKE: deteriorated performance in a highly loaded system; not appropriate for MIMO environments. Multiuser detectors: high computational complexity; limited knowledge about the activity of other users. Kalman filter: optimal solution in the MSE sense; prohibitive complexity in MIMO environments.
9
LMMSE Equalization: lower complexity than the other receivers; independent of the number of users; iterative solutions; good performance in highly scattered environments. Figure: LMMSE receiver.
10
LMMSE Equalization. Linear system to be solved: the covariance matrix is block Toeplitz and positive definite; A and B are Toeplitz Hermitian matrices; C is a Toeplitz matrix.
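The slide's own equation is not in the transcript. As a hedged illustration, a chip-level LMMSE equalizer is commonly written as the Wiener solution below. The symbols w (equalizer taps), r (stacked received samples), d (desired chip), R and p are assumptions, and the two-by-two block layout of R is one arrangement (two receive antennas) that would match the A, B, C description; none of this is taken from the slide.

```latex
\mathbf{w} = \arg\min_{\mathbf{w}} \; E\!\left[\,\lvert d - \mathbf{w}^{H}\mathbf{r}\rvert^{2}\right]
\quad\Longrightarrow\quad
\mathbf{R}\,\mathbf{w} = \mathbf{p},
\qquad
\mathbf{R} = E\!\left[\mathbf{r}\,\mathbf{r}^{H}\right],\quad
\mathbf{p} = E\!\left[\mathbf{r}\,d^{*}\right]

\mathbf{R} =
\begin{bmatrix}
\mathbf{A} & \mathbf{C} \\
\mathbf{C}^{H} & \mathbf{B}
\end{bmatrix}
```

Under this reading, A and B are the per-antenna autocovariances (Toeplitz and Hermitian) and C is the cross-covariance between the two antennas (Toeplitz but not Hermitian).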
11
LMMSE Approaches. Direct LMMSE solution: Cholesky decomposition; more complex hardware primitives. Conjugate Gradient (CG): iterative solution, fast convergence; block algorithm, with modifications for fast fading channels. Least Mean Square (LMS): adaptive algorithm; sensitive to the learning step.
12
Equalization in Time-Varying Channels: spatially correlated, frequency-selective (multipath), fading channels. Data rate: 1.2288 Mchips/s. Antenna correlation: base station 50.18%, mobile 43.99%.
13
Channel Equalization: CG Algorithm. N samples: 4096 in slow fading channels.
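The algorithm on this slide is not in the transcript, so the following is a textbook conjugate gradient solver for a Hermitian positive definite system A x = b, given as a minimal sketch rather than the thesis implementation. The small test matrix is made up for illustration; in the equalizer, A would be the covariance matrix estimated from the N received samples.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def inner(x, y):
    """Hermitian inner product x^H y."""
    return sum(a.conjugate() * b for a, b in zip(x, y))


def conjugate_gradient(A, b, iterations, tol=1e-12):
    """Solve A x = b for Hermitian positive definite A, starting from x = 0."""
    x = [0j] * len(b)
    r = list(b)                      # residual b - A x for x = 0
    p = list(r)                      # first search direction
    rs_old = inner(r, r).real
    for _ in range(iterations):
        if rs_old <= tol:            # squared residual small enough: stop early
            break
        Ap = matvec(A, p)
        alpha = rs_old / inner(p, Ap).real
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = inner(r, r).real
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Hypothetical 3x3 Hermitian, diagonally dominant (hence positive definite) system.
A = [[4, 1 - 1j, 0],
     [1 + 1j, 3, 0.5j],
     [0, -0.5j, 2]]
b = [1, 2j, -1]
x = conjugate_gradient(A, b, iterations=10)
residual = max(abs(ax - bi) for ax, bi in zip(matvec(A, x), b))
```

Each iteration costs one matrix-vector product plus a few inner products, which is why the method maps well onto multiply-accumulate hardware and avoids the divisions and square roots of a Cholesky factorization.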
14
CG Equalization in Vehicular A, 30 km/h: sliding window (SW) approach. Faster channel variations require more frequent updates of the filter coefficients.
15
CG Equalization: Velocity of 120 km/h. Multiple sub-blocks instead of two blocks; partial channel estimation for each sub-block. Weights are applied to form the global channel estimate and are adjusted according to the channel variations: the faster the channel fading, the faster the coefficients drop to 0.
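The weighting rule itself is not given in the transcript. One simple way to obtain weights that "drop to 0" faster for faster fading is a geometric decay with sub-block age, sketched below. The decay parameter, the normalization, and the function names are all assumptions for illustration, not the scheme used in the thesis.

```python
def subblock_weights(num_subblocks, decay):
    """Normalized weights; index 0 is the newest sub-block, higher index is older.

    A smaller decay (0 < decay <= 1) models faster fading: old sub-blocks
    lose their influence more quickly.
    """
    raw = [decay ** age for age in range(num_subblocks)]
    total = sum(raw)
    return [r / total for r in raw]


def combine_estimates(estimates, decay):
    """Weighted average of per-sub-block channel estimates (lists of taps)."""
    weights = subblock_weights(len(estimates), decay)
    num_taps = len(estimates[0])
    return [sum(w * est[t] for w, est in zip(weights, estimates))
            for t in range(num_taps)]


slow = subblock_weights(4, decay=0.8)   # slowly varying channel
fast = subblock_weights(4, decay=0.2)   # quickly varying channel
# Hypothetical partial estimates of a 2-tap channel, newest first.
partial = [[1.0 + 0.0j, 0.5j], [0.9 + 0.1j, 0.4j], [0.7 + 0.2j, 0.3j]]
global_estimate = combine_estimates(partial, decay=0.5)
```

With `decay=1.0` the combination reduces to a plain average over all sub-blocks, which corresponds to treating the channel as constant over the whole block.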
16
Architectural Alternative: LMS Equalization Adaptive LMS:
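The update equation on this slide is not in the transcript. The sketch below shows the standard complex LMS recursion (output y = w^H u, error e = d - y, update w + mu * u * conj(e)) on a noise-free toy problem. The step size, the two-tap target filter, and the QPSK training sequence are assumptions chosen only to show convergence.

```python
import math
import random


def lms_step(w, u, d, mu):
    """One complex LMS update: returns the new taps and the a-priori error."""
    y = sum(wi.conjugate() * ui for wi, ui in zip(w, u))       # y = w^H u
    e = d - y
    w_next = [wi + mu * ui * e.conjugate() for wi, ui in zip(w, u)]
    return w_next, e


def run_lms(w_true, num_samples=500, mu=0.1, seed=0):
    """Adapt towards w_true using unit-power QPSK input and noise-free targets."""
    rng = random.Random(seed)
    scale = 1 / math.sqrt(2)
    x = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) * scale
         for _ in range(num_samples)]
    w = [0j] * len(w_true)
    for n in range(len(w_true) - 1, num_samples):
        u = [x[n - k] for k in range(len(w_true))]             # tap-delay line
        d = sum(wt.conjugate() * ui for wt, ui in zip(w_true, u))
        w, _ = lms_step(w, u, d, mu)
    return w


w_true = [0.8 + 0.1j, -0.3 + 0.2j]      # hypothetical target filter
w_hat = run_lms(w_true)
max_tap_error = max(abs(a - b) for a, b in zip(w_hat, w_true))
```

The step size `mu` is the "learning step" the earlier slide calls sensitive: too small and the filter cannot track the channel, too large and the recursion diverges.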
17
Performance: Slow Fading Environments. From 32-bit floating point to 16-bit fixed point, with control of the quantization error. Channels: Pedestrian A (3 km/h), Pedestrian B (10 km/h).
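The floating-point to fixed-point step can be sketched as follows. The slide only says "16-bit fixed point"; the Q15 format (1 sign bit, 15 fractional bits), the rounding mode, and the saturation behavior below are assumptions for illustration.

```python
Q15_SCALE = 1 << 15          # 32768: 15 fractional bits (assumed format)
INT16_MIN, INT16_MAX = -32768, 32767


def to_q15(value):
    """Quantize a float in [-1, 1) to a saturated 16-bit integer."""
    q = int(round(value * Q15_SCALE))        # round to nearest (ties to even)
    return max(INT16_MIN, min(INT16_MAX, q))


def from_q15(q):
    """Convert a Q15 integer back to a float."""
    return q / Q15_SCALE


def quantization_error(value):
    """Absolute error introduced by one float -> Q15 -> float round trip."""
    return abs(from_q15(to_q15(value)) - value)


err = quantization_error(0.3)    # at most half an LSB, i.e. 2**-16
```

For values inside the representable range the error is bounded by half an LSB; controlling the quantization error in practice mostly means scaling intermediate results so that they neither saturate nor lose their significant bits.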
18
Performance: Vehicular A 30km/h CG with sliding window (CG-SW): Improvement in comparison with basic CG
19
CG-SW Approach: Fixed Point. 32-bit floating point versus 16-bit fixed point: about 1% BER difference. Channel: Vehicular A, 30 km/h.
20
Performance: Velocity of 120 km/h. CG with sliding window and weights averaging (CG-SW-WA) with different numbers of sub-blocks; performance improves if weights are applied. Channels: Pedestrian A (120 km/h), Vehicular A (120 km/h).
21
Computational Complexity: number of operations per chip in 1 second. The CG filter update is less complex because the filter is updated at block level.
22
Directions for Architecture Implementation Equalization in different environments Block CG, adaptive LMS for slow fading environments Modifications of CG for fast fading channels Different computational complexity and amount of parallelism Flexible hardware for different equalizations and CG modifications Programmable architecture Application specific
23
ASIP Architecture for Equalization: Required Features. Flexible architecture able to operate in different channel environments (slow/fast fading, low/high scattering). Architecture customization: implementation of application-specific operations. Instruction- and data-level parallelism: fast execution of complex algorithms. Automatic hardware-software co-design: fast processor design starting from the C/C++ code of the application.
24
ASIP Architecture Based on TTA. Flexible architecture: no limitations on adding new FUs, buses, registers. Customizable architecture: implementation of Special Function Units (SFUs). Instruction- and data-level parallelism: VLIW architecture principle; efficient and parallel data flow. Fast processor design: automatic search for the best processor; VHDL processor representation.
25
General Structure of TTA. Transporting operands triggers the appropriate operation as a side effect. Only one instruction: the “move” instruction. 32-bit architecture.
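The "operation as a side effect of a move" idea can be shown with a toy model. This is a didactic sketch only: the port names (`operand`, `trigger`, `result`), the string syntax for moves, and the sequential execution are assumptions, and a real TTA schedules several moves per cycle on parallel buses.

```python
class FunctionUnit:
    """A unit with an operand port, a trigger port and a result port."""

    def __init__(self, operation):
        self.operation = operation
        self.operand = 0
        self.result = 0

    def write(self, port, value):
        if port == "operand":
            self.operand = value                      # plain storage, no side effect
        elif port == "trigger":
            # Writing the trigger port fires the operation.
            self.result = self.operation(self.operand, value)
        else:
            raise ValueError(f"unknown input port: {port}")

    def read(self, port):
        if port != "result":
            raise ValueError(f"unknown output port: {port}")
        return self.result


def run_moves(moves, units, registers):
    """Execute (source, destination) moves one after another."""
    for source, destination in moves:
        if isinstance(source, int):                   # immediate value
            value = source
        elif "." in source:                           # "unit.port"
            unit, port = source.split(".")
            value = units[unit].read(port)
        else:                                         # general-purpose register
            value = registers[source]
        if "." in destination:
            unit, port = destination.split(".")
            units[unit].write(port, value)
        else:
            registers[destination] = value
    return registers


units = {"add": FunctionUnit(lambda a, b: a + b),
         "mul": FunctionUnit(lambda a, b: a * b)}
program = [(3, "add.operand"), (4, "add.trigger"), ("add.result", "r1"),
           ("r1", "mul.operand"), (5, "mul.trigger"), ("mul.result", "r2")]
regs = run_moves(program, units, {})    # r1 = 3 + 4, r2 = r1 * 5
```

The program contains no add or multiply instruction: the arithmetic happens only because data was moved to a trigger port, which is what lets the designer add new units without changing the instruction format.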
26
TTA Design Flow: MOVE Tool Design space exploration for optimal architecture
27
Customization of ASIP: implementation of application-specific operations as user-defined Special Function Units (SFUs), sacrificing architecture generality for optimization and performance improvement. Designed SFUs: real multiplication with shifting; complex multiplication with shifting; sub-word arithmetic operations; sign-test and add/subtract.
28
SFU: Complex Multiplication. Reduces data transports between FUs; fewer buses and a smaller interconnection network; smaller instruction word. Instruction and data parallelism is placed inside CXMUL.
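What a complex-multiply-with-shift unit computes can be sketched in integer arithmetic. The internals of CXMUL are not described in the transcript, so the operand layout, the arithmetic right shift, and the saturation to 16 bits below are assumptions.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def saturate16(value):
    """Clamp an integer to the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, value))


def cxmul_shift(a_re, a_im, b_re, b_im, shift):
    """Multiply two 16-bit complex numbers and scale the product back by `shift` bits.

    The four real multiplications and two additions are independent enough to
    run in parallel inside one unit, which is the point of the SFU.
    """
    re = a_re * b_re - a_im * b_im
    im = a_re * b_im + a_im * b_re
    # Python's >> on negative ints is an arithmetic (sign-preserving) shift.
    return saturate16(re >> shift), saturate16(im >> shift)


# (0.5 + 0.5j) * (0.5 - 0.5j) = 0.5, with 15 fractional bits (0.5 -> 16384).
product = cxmul_shift(16384, 16384, 16384, -16384, shift=15)   # (16384, 0)
```

Built from separate multiplier and adder units, the same product needs roughly a dozen operand and result moves; as a single unit it needs four operand moves, one shift amount, and two result reads, which is where the bus and instruction-word savings come from.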
29
Performance Improvement with SFUs Bus reduction of 50% Instruction word length reduction of about 50%
30
TTA Processors for MIMO Equalization 1. Two co-processors (CG equalization) Co-processor for updating equalizer coefficients Co-processor for filtering and user detection 2. Single processor for all parts of equalization algorithm (CG/LMS equalization) Identical architectures for slow and fast fading environments
31
Single Processor vs. Two Coprocessors Single processor Smaller area and power dissipation Higher clock frequency
32
Processor Flexibility Identical customized processor for broad range of channel environments Identical processor for LMS and CG equalization
33
Example of Designed Processor Coprocessor for CG filter update
34
Hardware synthesis design flow MOVEGen: generates VHDL representation of processor core Xilinx tools for fast FPGA prototyping Mentor Graphics tools for CMOS gate level design
35
VHDL Template of TTA Processor Automatic VHDL generation of processor core, control and interconnection FUs, SFUs, peripherals: pre-designed or defined by user
36
MoveProc Synthesis on Xilinx FPGA: CG/LMS equalizer including user detection, no SFUs, 32 buses. Xilinx FPGA part: XC2V8000. Slices: 38,757 out of 46,592; BRAMs: 148 out of 168; IOBs: 263 out of 1108; MULT18x18s: 24 out of 168.
37
MoveProc Synthesis on Xilinx FPGA: customized CG/LMS equalizer including user detection, with SFUs, 16 buses. Xilinx FPGA part: XC2V6000. Slices: 21,126 out of 33,792; BRAMs: 107 out of 144; IOBs: 229 out of 1104; MULT18x18s: 11 out of 144.
38
Gate Level CMOS Synthesis Mentor Graphics Tools 0.5 CMOS library Customized CG/LMS equalizer including user detection (with SFUs) Synthesis estimate of processor core: 182,887 gates
39
Conclusions. Equalization algorithms for a broad range of channel environments: CG/LMS for slow fading; modifications of basic CG equalization for fast fading. ASIP architecture design based on TTA: same architecture for different equalization algorithms; optimization with application-specific operations; reasonable frequency and power dissipation for the 1xEV-DV chip rate (1.2288 Mc/s). Fast processor design: VHDL representation of the optimal processor; FPGA synthesis and CMOS gate-level synthesis.
40
Future Work. Processor layout synthesis with the IC Station software tool from Mentor Graphics: precise timing, area, and power analysis. Implementation of hybrid word length: reduced precision for the filter application part. Implementation on C5x DSP for comparison.
41
Acknowledgements. Thanks to: Professor Cavallaro, Dr. De Baynast, Professor Aazhang, Dr. Dabak, Dr. Sabharwal, Texas Instruments, Nokia.