Overview of Implementation Issues for Multitier Networks on DSPs Joseph R. Cavallaro Electrical & Computer Engineering Dept. Rice University August 17,

Presentation transcript:

Overview of Implementation Issues for Multitier Networks on DSPs Joseph R. Cavallaro Electrical & Computer Engineering Dept. Rice University August 17, 1999

Outline
• Overview of Multitier Networks
• DSP Rapid Prototyping Tools
• Channel Estimation and Multistage Detection
• DSP Implementation and Real-Time Issues
• ASIC Implementation of Algorithm Modules
• Conclusions and Future Directions

Multitier Overlay Networks
[Figure: overlay network tiers. Labels: Home Area Wireless LAN, High Speed Office Wireless LAN, Outdoor CDMA Cellular Network]

Time Scales in Multitier Networks
• Multiple Radio Interfaces
• Reconfigurability and Commonality of Modules
• Multitier Network Interface Card

mNIC
[Figure: mNIC system diagram. Labels: Server, Mobile Platform, Network Protocols, Proxy, File System, Transcoders, Application, Proxy Awareness, mNIC, NIC, BS, Internet]

Current Group
• Suman Das - Universal Baseline Software System
• Vishwas Sundaramurthy - System Design Issues
• Sridhar Rajagopal - Channel Estimation Algorithms
• Oscar Pan - Real-Time Workshop Implementation
• Recent Graduates:
  – Chaitali Sengupta - ML Synchronization
  – Gang Xu - Differencing Multistage Detector

W-CDMA Simulation Testbed Overview
• Development of an integrated software testbed
• Unified framework to evaluate new algorithms for coding, synchronization, detection, etc.
• Construction of a faster, more efficient, and possibly hardware-accelerated simulation testbed
• TI TMS320C6201 / TMS320C6701 based system – Base Station
• TI TMS320C54 and FPGA / ASIC – Mobile

Software Rapid Prototyping Methodology
[Figure: tool-flow diagram. Labels: MATLAB code, MATLAB Compiler, C MEX code, wrapper (C code or Simulink), C code, DSP code generation tools, DSP code, host, DSP hardware]
• Communication and Signal Processing Algorithms in MATLAB and “C”
• Faster Execution of “C” Code
• Acceleration on DSP Boards
• Multiple DSP Boards

Simulink
• Simulink
  – Good system for algorithm evaluation in communication systems and signal processing
  – Ties in well with MATLAB environment and functions
  – More intuitive than code-based (C/MATLAB) evaluation
• Used in software version of wireless testbed

RTW
• Real-Time Workshop
  – Generates ANSI C code for Simulink block diagrams
  – Tool for DSP rapid prototyping
  – Quick but inefficient/non-optimized C code
• RTW support for C67x generation boards
  – Hardware (DSP)-in-the-loop simulations

CDMA Wireless System Testbed, Simulink Version
[Figure: Simulink block diagram. Block names: User_Data, Wireless Channel, Chip MF, Max. Likelihood Channel Est., Decorrelating Detector, Multiuser Detector, Error Counter, Show Stats, Update Parameters. Callouts: User Data, AWGN Channel, Chip matched filter, Channel Estimation, Multiuser Detection, Error Rate Calculation, Statistics, Parameters]

Hardware Platform Issues
• Current System
  – TI TMS320C6201 and TMS320C6701 EVM boards
• Multiple DSP Processor Configuration Issues and Task Decomposition
• Planned Upgrade to BlueWave, Spectrum

DSPs in Simulink-based Wireless Testbed
• Use of C67 based boards for simulations
  – Useful for study of individual algorithms on C67 generation processors
• Multiprocessing issues
  – Need block diagram partitioning and code generation support from Simulink/RTW
  – Need cleaner external communication mechanisms in the C67x DSP
  – Need support for controlling multiple DSPs

Architectural Issues
• Memory
  – More internal memory for large temporary matrices
• Prefetch Buffers
  – Matrices stored as arrays in memory
• ASIC / FPGA glue support
  – To explore HW acceleration of critical parts of the code
• Specialized instructions: square roots, reciprocals, rotations?

Compiler Support
• Compilers for VLIW
  – Scheduling and tracking functional units is difficult in manual assembly
  – Generating code that keeps all units busy is a challenge
  – Small operating system support
• Architectural improvements require coordinated advances in compiler support.

W-CDMA Software Testbed Experiments
• Third generation wireless communication systems
• Multimedia capabilities
• Multirate services
• Quality of service
• Higher data rates: 2 Mbps, 384 kbps, 144 kbps

The Wireless Channel: Multiuser, Multipath
[Figure: multiuser, multipath channel. Labels: Desired User, Direct Path, Reflected Paths, Antenna, Noise + MAI]
• Signal faces attenuation, delays and Doppler effects: unknown channel parameters

W-CDMA Base-Station Receiver
[Figure: receiver block diagram. Blocks: Antenna, Demodulator, Demux, Channel Estimator, Multiuser Detector, Decoder. Signals: Data, Pilot, Estimated Amplitudes & Delays]

CDMA Uplink System
[Figure: uplink block diagram. Inputs: User 1 (d_1), User 2 (d_2), ..., User K (d_K). Blocks: Channel Encoder (per user), Spreading, AWGN adder producing R(t), Matched Filters producing y_1, y_2, ..., y_K, Channel Estimator, Multi-User Detector, Demux, Channel Decoder. Outputs: d_1', d_2', ..., d_K']

Maximum Likelihood - Channel Estimation
• Send a time-multiplexed preamble (pilot).
• Channel properties extracted from received signal.
• Compare received signal with known pilot and estimate channel parameters.
• Keep estimate for remaining data bits (static).
• Repeat preamble every frame, if no tracking.

The Maximum Likelihood Algorithm
• Compute the correlation matrices.
• Compute the channel estimate.
  – Calculate the noise covariance matrix K.
  – Calculate the channel impulse response vector z.
• Extract the amplitudes and delays from the channel impulse response vector using a least squares fit.
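The first two steps above can be sketched in a few lines. This is only an illustrative, real-valued toy model, not the estimator used in the testbed: it assumes unity noise covariance (so the ML estimate reduces to a least squares solve), ignores delays and the final amplitude/delay extraction, and all function and variable names (`correlation_matrices`, `solve`, `estimate_channel`, `true_channel`) are hypothetical.

```python
def correlation_matrices(pilots, received):
    """Accumulate R_bb = sum_i b_i b_i^T and R_br = sum_i b_i r_i^T."""
    k = len(pilots[0])      # number of users
    n = len(received[0])    # samples per pilot symbol
    r_bb = [[0.0] * k for _ in range(k)]
    r_br = [[0.0] * n for _ in range(k)]
    for b, r in zip(pilots, received):
        for i in range(k):
            for j in range(k):
                r_bb[i][j] += b[i] * b[j]
            for j in range(n):
                r_br[i][j] += b[i] * r[j]
    return r_bb, r_br


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    k = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(aug[row][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("pilot correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for row in range(k):
            if row != col:
                factor = aug[row][col]
                aug[row] = [v - factor * p for v, p in zip(aug[row], aug[col])]
    return [row[k:] for row in aug]


def estimate_channel(pilots, received):
    """Least squares channel estimate (assumes unity noise covariance)."""
    r_bb, r_br = correlation_matrices(pilots, received)
    return solve(r_bb, r_br)


# Noise-free toy example (made-up numbers): 2 users, 3 samples per symbol.
true_channel = [[0.9, 0.3, 0.0], [0.0, 0.5, 0.2]]
pilots = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
received = [
    [sum(b[u] * true_channel[u][c] for u in range(2)) for c in range(3)]
    for b in pilots
]
estimate = estimate_channel(pilots, received)
```

The inner loops are dot products and matrix-vector style accumulations, which is consistent with the slides identifying those operations as the critical code.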

The ML Algorithm Complexity
• Complex-Real Dot Product
• Complex-Real Matrix Product
• Complex-Real Product
• Real Square Roots
  – Solving quadratic equation for least squares fit
• Critical code: Matrix-vector multiplications / Dot Product
[Slide annotations: Assuming Unity Noise Covariance; Offline]

Differencing Multistage - Multiuser Detection
• Based on the principle of Parallel Interference Cancellation (PIC)
• Cross-correlation information used to remove interference of other users from desired user
• Repeated iterations for convergence
• Differencing techniques applied for improving the performance of the algorithm

The Differencing Multistage Detector
• Split the cross-correlation matrix into lower, upper and diagonal matrices.
• Calculate the channel impulse response iteratively using [equation not recoverable from the transcript]
  – x is called the differencing vector.
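The slide's own update equation did not survive, so the following is only a sketch of the general idea under stated assumptions: synchronous users, real-valued BPSK bits, and an interference matrix `m` equal to the off-diagonal part of the cross-correlation matrix scaled by the user amplitudes. The differencing vector x is taken to be the change in hard decisions between consecutive stages. All names and numbers are hypothetical.

```python
def sign(v):
    return 1 if v >= 0 else -1


def interference_cancel(y, m, d):
    """Conventional PIC stage: soft outputs y - M d, recomputed in full."""
    k = len(y)
    return [y[i] - sum(m[i][j] * d[j] for j in range(k)) for i in range(k)]


def differencing_update(soft, m, d_new, d_old):
    """Differencing stage: correct the previous soft outputs using only
    the users whose decisions changed (x has entries in {-2, 0, +2})."""
    k = len(soft)
    x = [d_new[j] - d_old[j] for j in range(k)]
    changed = [j for j in range(k) if x[j] != 0]
    return [soft[i] - sum(m[i][j] * x[j] for j in changed) for i in range(k)]


def detect(y, m, stages, differencing=True):
    d_prev = [sign(v) for v in y]              # matched-filter decisions
    soft = interference_cancel(y, m, d_prev)   # first cancellation stage
    for _ in range(stages - 1):
        d = [sign(v) for v in soft]
        if differencing:
            soft = differencing_update(soft, m, d, d_prev)
        else:
            soft = interference_cancel(y, m, d)
        d_prev = d
    return [sign(v) for v in soft]


# Toy noise-free example with 3 users (made-up numbers).
amplitudes = [1.0, 0.6, 1.5]
cross_corr = [[0.0, 0.3, 0.2], [0.3, 0.0, 0.4], [0.2, 0.4, 0.0]]
m = [[cross_corr[i][j] * amplitudes[j] for j in range(3)] for i in range(3)]
bits = [1, -1, 1]
y = [amplitudes[i] * bits[i] + sum(m[i][j] * bits[j] for j in range(3))
     for i in range(3)]

matched_filter_only = [sign(v) for v in y]
with_differencing = detect(y, m, stages=3, differencing=True)
conventional = detect(y, m, stages=3, differencing=False)
```

Because y - M d_new equals (y - M d_old) - M (d_new - d_old), both variants produce the same soft outputs. The differencing form skips the users whose bits did not change, which is where the savings come from once the decisions start to converge.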

Multistage Detector Complexity
• Matrix Multiplication:
  – Computed only once for one frame
• Dot Product:
  – Computed iteratively
• Critical code: Dot Product

TI Tools Used
• Evaluation Modules (EVM) for C6201 and C6701 fixed and floating point DSPs
  – 64 KB each internal program & data memory
  – 256 KB SBSRAM, 8 MB SDRAM (external)
• C Compiler ver 3.0 from Code Generation Tools
• Code Composer ver 4.02 for profiling the code

DSP Implementation: Channel Estimation
• Floating point implementation found more feasible due to matrix inversions and square roots.
• Code optimized for the DSP
• Use of specialized approximate instructions
  – Approximate reciprocal square roots
  – Approximate reciprocals
• Use of assembly code for critical part
  – TI's C67 floating point benchmarks for Matrix-Vector Multiplication & Dot Product
• Data memory requirements for Channel Estimation
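As an illustration of how approximate reciprocal and reciprocal-square-root instructions are typically used, the sketch below refines a low-precision seed with Newton-Raphson iterations. The seed functions are hypothetical stand-ins that emulate a result accurate to about 8 mantissa bits (an assumption made for this example); they are not a model of any particular DSP instruction.

```python
import math


def low_precision(value, bits=8):
    """Round a float to roughly `bits` mantissa bits (emulated seed accuracy)."""
    mantissa, exponent = math.frexp(value)
    scale = 1 << bits
    return math.ldexp(round(mantissa * scale) / scale, exponent)


def rsqrt_seed(v):
    """Hypothetical stand-in for an approximate 1/sqrt(v) instruction."""
    return low_precision(1.0 / math.sqrt(v))


def rcp_seed(v):
    """Hypothetical stand-in for an approximate 1/v instruction."""
    return low_precision(1.0 / v)


def rsqrt_refined(v, iterations=2):
    """Newton-Raphson for 1/sqrt(v): x <- x * (1.5 - 0.5 * v * x * x)."""
    x = rsqrt_seed(v)
    for _ in range(iterations):
        x = x * (1.5 - 0.5 * v * x * x)
    return x


def rcp_refined(v, iterations=2):
    """Newton-Raphson for 1/v: x <- x * (2 - v * x)."""
    x = rcp_seed(v)
    for _ in range(iterations):
        x = x * (2.0 - v * x)
    return x
```

Each iteration roughly squares the relative error, so two iterations on an 8-bit seed are enough for single-precision work in this sketch. The refinement uses only multiplies and subtracts, which is why the approach suits hardware without a fast divider or square-root unit.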

Use of Approximate Instructions
[Figure: use of specialized instructions and assembly code on C6701 DSP. Execution time (in milliseconds) versus number of users, for L = 150, P = 3, N = 31, SNR = 5 dB, SINR = -10 dB. Curves: C6701 original, C6701 with intrinsics, C6701 with assembly. Annotations: 10% improvement, 100% improvement]

Optimization Effects for Channel Estimation
[Figure: effect of optimizations for Channel Estimation on C6701, execution time (normalized). Bars: Base (-o3 -pm), Approx. (-o3 -pm with intrinsics), Assembly opt. (-o3 -pm with asm). Annotations: 2.34X improvement, 1.08X improvement]

Data Memory Requirements
• Data to be placed in external memory
[Table values not recoverable from the transcript]

DSP Implementation: Multistage Detection
• 16-bit fixed point C code
• Code optimized for the DSP
• Use of assembly code for critical part
  – TI's C62 fixed point assembly benchmarks for Dot Product
• Data memory requirements for Multistage Detection
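To make the 16-bit fixed point dot product concrete, here is a small sketch using the common Q15 format (values in [-1, 1) stored as 16-bit integers). The Q15 choice and all function names are assumptions for illustration; the slides state only that the code is 16-bit fixed point. Python integers do not overflow, so the accumulator width a real DSP would need is not modeled.

```python
def to_q15(x):
    """Quantize a value in [-1, 1) to a 16-bit Q15 integer, with saturation."""
    q = int(round(x * 32768))
    return max(-32768, min(32767, q))


def dot_q15(a, b):
    """Dot product of two Q15 vectors; each product is Q30 and the sum is
    kept in a wide accumulator, as a multiply-accumulate loop would do."""
    acc = 0
    for u, v in zip(a, b):
        acc += u * v
    return acc  # Q30 result


def q30_to_float(acc):
    """Convert a Q30 accumulator value back to floating point."""
    return acc / float(1 << 30)


# Example with values that are exactly representable in Q15.
a = [to_q15(v) for v in (0.5, -0.25, 0.125, 0.75)]
b = [to_q15(v) for v in (0.5, 0.5, -0.5, 0.25)]
result = q30_to_float(dot_q15(a, b))
```

Keeping the products at full Q30 precision and converting only once at the end avoids accumulating a rounding error on every term.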

Optimization Effects for Multistage Detector
[Figure: effect of optimizations for Multistage Detection, execution time (normalized). Bars: Global opt. (-o3 -pm -mu), Software Pipelining (-o3 -pm), Assembly opt. (-o3 -pm with asm). Annotations: 5.22X improvement, 7.47X improvement]

Data Memory Requirements
• Data can be placed completely in internal memory

Flops Count
[Figure: number of flops (×10^4) versus total number of iterations, for K = 15 users and SNR = 6 dB. Curves: conventional method, differencing method. Annotation: 2X speedup for a three-stage detector]

Real-Time Requirements
[Figure: real-time capability of the C6201 DSP. Maximum bit rate per user (kb/s) versus number of users, for SNR = 10 dB and window size = 12. Curves: conventional method, differencing method. Annotation: 12 users, 150 kb/s]

Trends in Recent DSPs
• More internal memory and higher clock speeds
  – C6203: 512 KB data, 384 KB program, 250 MHz; useful for uplink channel estimation algorithms
• Specialized blocks in the DSP core
  – Viterbi decoding in C54
• Lower voltage operation
  – 1.2 V in C5402, useful for saving power consumption in the mobile

ASIC Implementation
• Differencing Multistage Detector Block
• MOSIS Tiny-Chip (40-pin DIP)
  – 8 synchronous users
  – 12-bit fixed point implementation
  – 6000 transistors
  – 1.2 µm CMOS technology
  – 190 kb/s for each user
  – 3-stage cascade delay < 15 µs

Chip (Single Stage) Architecture
[Figure: single-stage datapath. Blocks: SHIFT, ALU, RECODER, REG, (L+L’)A, Control Logic. Legend: internal signals, external signals]

ASIC Architecture Features

Chip Layout
[Figure: chip layout, 2.0 mm scale. Labeled regions: 12-bit ALU, Soft Decisions, Cross-Correlation, Recoding logic]

3-stage Cascade Mode
[Figure: three cascaded detector chips, each with ports Sin, Hin, Fin, Load, CLK, Sout, Hout, Fout and 1/2. External signals: Matched Filter Output, Detector Output, Hand Shaking, Load, R, Clock, Output Valid]

Current Work – GPP vs. DSP
Joint work with Prof. Sarita Adve, Praful Kaul, and Parthasarathy Ranganathan
• Performance of general-purpose systems
• Comparing GPP and DSP performance
• Complete 3G benchmark suite with all components
• Identification of key performance bottlenecks

Preliminary Results (1 of 4)
• (4 algorithms: channel estimation, multi-stage detection, FIR filter, dot product)
• Performance of general-purpose processors
  – Instruction-level parallelism features help (3.4X to 4.4X)
  – Media ISA extensions help (1.2X to 5.4X)
    ▪ New extensions for packing/multiplication useful
• Comparing GPP and DSP performance
  – GPPs outperform DSPs
    ▪ UltraSPARC-II+VIS 2-4X better than TI TMS320C6701
    ▪ Caveat: compiler issues with DSP

Preliminary Results (2 of 4)
• Important to study complete system including all components
  – Need for complete benchmark suite
[Figure: end-to-end system for K users. Transmitter (mobile user): user's bits, source coding, channel coding, spreading, modulation. Receiver (base station): demodulation, channel estimation, detector, decoder, detected bits of all K users]

Preliminary Results (3 of 4)
• Complete 3G benchmark suite with all components
  – Source coding
  – Channel coding
  – Spreading
  – Modulation/De-modulation
  – Multi-stage detection
  – Channel estimation
  – Channel decoding
  – Source decoding
• Used either public-domain or in-house “C” code
    ▪ Optimized with ISA extensions

Preliminary Results (4 of 4)
• Choice of source coding standard makes big difference
  – G728 system: source coding/decoding dominant
  – GSM system: channel estimation/detection dominant

Conclusions
• Implementation issues: Estimation & Detection Algorithms
• Channel Estimation - Floating Point / External Memory
• Multistage Detection - Fixed Point / Internal Memory
• Specialized instructions: square roots / reciprocals
• Additional support for complex arithmetic useful
• Recent trends in GPPs / DSPs highly encouraging for next generation wireless communication applications

Future Work
• FPGA / ASIC Implementation via VHDL models and SPW
• Program & DSP implementations for W-CDMA uplink and downlink
  – Blind Algorithms
  – Adaptive Algorithms
• Architectural bottlenecks and compiler issues in DSPs to enhance suitability for next generation W-CDMA systems
• Multiple DSPs – mixed DSP / FPGA for mNIC