Published by Horace Gilbert. Modified over 9 years ago.
1
RICE UNIVERSITY
DSPs for future wireless systems
Sridhar Rajagopal
2
Motivation
[Diagram labels: Wireless mobile device; RF unit; A/D; D/A; Baseband programmable communications processor; Higher layers; Add-on PCMCIA network interface card]
- Mobile: switch between standards and between parameters
- Base-station: varying number of users with different parameters
3
The problem
[Figure labels: platforms GPP, DSP, FPGA, VLSI; attributes Performance, Power, Flexibility]
4
An approach for the solution
- Algorithms are well understood at the VLSI level; real-time systems can be designed there
- Pushing it higher in the chain
- Current DSPs are not powerful enough for our application
- Using the IMAGINE simulator to see which architecture features would be useful in a future DSP for such applications
5
History of my work
[Diagram labels: Algorithms; DSP; VLSI; FPGA; IMAGINE; Multiuser channel estimation; Multiuser detection; Task-partitioning; Parallelism; Pipelining; Conventional arithmetic; On-line arithmetic; Instruction set extensions; Co-processor support; Functional unit design and usage]
[Timeline: Distant past; Recent past; Recent and near future]
6
Contents
- Programmable architecture design using the IMAGINE simulator
- Multiuser estimation and detection implementation
- Performance comparisons and results
- Other extensions for possible integration
- Conclusions
7
The IMAGINE architecture and simulator
- IMAGINE is a media signal processor
8
Why the IMAGINE simulator?
- Great for media processing algorithms
- Has a VLIW-based cluster (DSP comparisons)
- A good base architecture: 1024-point FFT
- RSIM, SimpleScalar, ...: more general-purpose architecture simulators
9
What does the simulator give us?
- Execution time for the different parts of the code
- Functional unit utilization
- Insights into the bottlenecks
- Flexibility to add and remove functional units already present, or to design your own
- Graphical view of the schedule on the functional units
10
Down-side
- Two-level C++ programming
  - StreamC: transfers streams of data between main memory and the stream register file (SRF)
  - KernelC: transfers streams from the SRF to the ALU clusters
- Code is optimized to the number of ALU clusters and the size of the data
- The compiler may fail register allocation if there are too many variables or the functional units are modified
12
Typical workload representation (Base-station)
- Wireless LAN: equalization, FFT, Viterbi decoding
- W-CDMA: channel estimation, multiuser detection, Viterbi/Turbo decoding
- If you felt that life was too easy: multiple antennas, long spreading codes, space-time codes
13
Estimation/Detection (64, 32 sizes)
- Multiuser estimation: kernels 1, 2, 3
- Multiuser detection: kernels 6, 7
- Massaging matrices for detection: kernels 4, 5
14
Kernels
1. Update: update Rbb, Rbr
2. Mmult: multiply Rbb * A
3. Iterate: gradient descent
4. MmultL: calculate L
5. MmultC: calculate C
6. Mf: matched filter
7. Pic: one parallel interference cancellation stage
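Kernels 6 and 7 can be illustrated with a minimal sketch of the textbook matched filter followed by one hard-decision parallel interference cancellation stage. This is not the author's kernel code: the matrix `S` (each user's amplitude-scaled signature as a column), the helper names, and the two-user numbers are all hypothetical, chosen so that a strong user corrupts a weak user's first decision.

```python
def sign(x):
    """Hard decision on a real soft value: +1 or -1."""
    return 1 if x >= 0 else -1

def matched_filter(S, r):
    """Correlate the received vector r with each column of S (hypothetical signatures)."""
    n_chips, n_users = len(S), len(S[0])
    return [sum(S[i][k] * r[i] for i in range(n_chips)) for k in range(n_users)]

def gram(S):
    """Cross-correlation matrix R = S^T S between user signatures."""
    n_chips, n_users = len(S), len(S[0])
    return [[sum(S[i][a] * S[i][b] for i in range(n_chips)) for b in range(n_users)]
            for a in range(n_users)]

def pic_stage(y, R, b):
    """One PIC stage: subtract every other user's estimated contribution, then re-decide."""
    n = len(y)
    return [sign(y[k] - sum(R[k][j] * b[j] for j in range(n) if j != k))
            for k in range(n)]

# Assumed toy case: user 0 is weak (amplitude 1), user 1 is strong (amplitude 3).
S = [[1, 3], [1, 3], [1, 3], [1, -3]]       # rows are chips, columns are users
true_bits = [-1, 1]
r = [sum(S[i][k] * true_bits[k] for k in range(2)) for i in range(4)]  # noiseless

y = matched_filter(S, r)
first_guess = [sign(v) for v in y]          # matched filter alone gets user 0 wrong
after_pic = pic_stage(y, gram(S), first_guess)
print(first_guess, after_pic)
```

In this toy case the matched filter decides `[1, 1]`, and a single cancellation stage recovers the transmitted bits `[-1, 1]`.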
15
Kernel 2 (mmult) with 3 adders, 2 multipliers
- Divider is not being utilized
- Adders have limited FU utilization
- O(N^3) multiplications, O(N^3) additions
- Multipliers are 100% utilized in the loop
- Replace the divider with a multiplier
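The operation counts on this slide can be checked with a small sketch that multiplies two N x N matrices and counts multiplications and additions. The function names are hypothetical, and the cycle bound rests on a simplifying assumption that is not from the slides: one cluster, with each functional unit completing one operation per cycle and no pipeline or memory stalls.

```python
import math

def matmul_with_counts(X, Y):
    """Naive matrix multiply that also returns the number of multiplies and adds."""
    n, m, p = len(X), len(Y), len(Y[0])
    mults = adds = 0
    Z = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            acc = 0
            for k in range(m):
                acc += X[i][k] * Y[k][j]   # one multiply and one accumulate-add
                mults += 1
                adds += 1
            Z[i][j] = acc
    return Z, mults, adds

def min_cycles(ops, units):
    """Lower bound on cycles if each unit retires one op per cycle (assumption)."""
    return math.ceil(ops / units)

N = 32
I = [[1 if i == j else 0 for j in range(N)] for i in range(N)]
_, mults, adds = matmul_with_counts(I, I)

# With equal multiply and add counts, the scarcer unit type sets the bound.
print(mults, adds)
print(min_cycles(mults, 2), min_cycles(adds, 3))   # 2 multipliers vs 3 adders
print(min_cycles(mults, 3))                        # after adding a third multiplier
```

Under these assumptions the two multipliers are the limiting resource while the three adders sit partly idle, which is consistent with the slide's observation that the multipliers are fully utilized in the loop.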
16
Kernel 2 (mmult) with 3 adders, 3 multipliers
- Better adder utilization
- Needs sufficient registers for scaling [register allocation may fail]
- Code may also need slight tuning of variables for optimization
18
FU utilization on each cluster
- Time for detection at 128 Kbps for each of 32 users at 500 MHz: 4000 cycles
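One plausible reading of the 4000-cycle figure, offered here as an assumption rather than something the slide states, is that it is the rounded number of clock cycles available per bit period at the target data rate. The arithmetic can be sketched as follows; the function name is hypothetical.

```python
def cycles_per_bit(clock_hz, bit_rate_bps):
    """Clock cycles that elapse during one bit period."""
    return clock_hz / bit_rate_bps

clock_hz = 500e6        # 500 MHz clock, from the slide
bit_rate_bps = 128e3    # 128 Kbps per user, from the slide

budget = cycles_per_bit(clock_hz, bit_rate_bps)
bit_period_us = 1e6 / bit_rate_bps

print(budget)           # cycles available per bit
print(bit_period_us)    # bit period in microseconds
```

This gives 3906.25 cycles in a 7.8125 microsecond bit period, which rounds to roughly 4000 cycles. Under this reading, detection for all 32 users has to finish within that budget to keep up with the data rate.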
19
Comparisons with DSPs
[Figure: execution time in seconds (log scale, 10^-6 to 10^-2) versus number of users (0 to 35). Legend: single DSP implementation; 2 DSP implementation; target data rate of 128 Kbps/user; our architecture based on Imagine (marked with x).]
20
Current work
- Evaluating the performance of wireless communication algorithms such as estimation, detection and decoding on this architecture
- Studying the bottlenecks and the functional unit design needed to attain real-time performance
- The insights gained from the design can also be applied to other processors such as DSPs