Published by Byron Nichols. Modified over 9 years ago.
1
RICE UNIVERSITY
“Joint” architecture & algorithm designs for baseband signal processing
Sridhar Rajagopal and Joseph R. Cavallaro
Rice Center for Multimedia Communications
This work has been supported by Nokia, TI, TATP and NSF.
2
Single-slide version of my talk
- Areas: Algorithms, DSP, VLSI, FPGA, IMAGINE
- Multiuser channel estimation, multiuser detection
- Task-partitioning, parallelism, pipelining
- Conventional arithmetic, on-line arithmetic
- Instruction set extensions, co-processor support, functional unit design and usage
- Timeline: Distant Past, Recent Past, Recent and Near Future
3
Contents
- Algorithms for channel estimation and detection
- Conventional and on-line arithmetic designs
- Programmable architecture design using the IMAGINE simulator
4
Estimation - detection algorithms?
- Sophisticated, computationally complex algorithms proposed for 3G - 4G standards
- Typically need complex operations, huge matrix sizes, matrix inversions
- Difficult for hardware implementation and for real-time performance
5
Multiuser channel estimation algorithm
- Training/tracking bits: each in {+1, -1}
- Received signal: 8-bit integer (complex)
- N = spreading gain (typically fixed, e.g. 32)
- K = number of users (variable, <= N)
- Output: maximum likelihood channel estimate
6
Iterative scheme for channel estimation
- Bit-streaming: suitable for tracking (window length L)
- Method of gradient descent
- Stable convergence behavior
- Simple fixed-point VLSI architecture [ASAP 2000]
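A minimal sketch of a gradient-descent channel estimator, under assumptions not stated on the slide: the received vector is modeled as r = A b plus noise, the signals are real-valued rather than complex, and the update is A <- A - step * (A R_bb - R_br), where R_bb and R_br are correlations averaged over the window. The function names, the step-size choice and the floating-point arithmetic are illustrative only and are not the fixed-point design of [ASAP 2000].

```python
import random


def make_training_data(A_true, L, seed=0):
    """Hypothetical helper: L random +/-1 bit vectors and noiseless r = A b."""
    rng = random.Random(seed)
    N, K = len(A_true), len(A_true[0])
    bits = [[rng.choice((1, -1)) for _ in range(K)] for _ in range(L)]
    received = [[sum(A_true[n][k] * b[k] for k in range(K)) for n in range(N)]
                for b in bits]
    return bits, received


def gradient_descent_estimate(bits, received, n_iters=200, step=None):
    """Iteratively estimate the N x K matrix A from a window of L bit vectors."""
    L, K, N = len(bits), len(bits[0]), len(received[0])
    # Correlations averaged over the window. Since every bit is +1 or -1,
    # each "product" here is really just an addition or a subtraction.
    R_bb = [[sum(b[i] * b[j] for b in bits) / L for j in range(K)]
            for i in range(K)]
    R_br = [[sum(r[n] * b[k] for b, r in zip(bits, received)) / L
             for k in range(K)] for n in range(N)]
    if step is None:
        # The eigenvalues of R_bb are at most its trace, which equals K,
        # so this step size cannot make the iteration diverge.
        step = 1.0 / K
    A = [[0.0] * K for _ in range(N)]
    for _ in range(n_iters):
        for n in range(N):
            # Row n of the gradient (A R_bb - R_br).
            grad = [sum(A[n][j] * R_bb[j][k] for j in range(K)) - R_br[n][k]
                    for k in range(K)]
            A[n] = [A[n][k] - step * grad[k] for k in range(K)]
    return A
```

With noiseless data and linearly independent training bits, the iterate approaches the matrix used to generate the data. No matrix inversion is needed at any step.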
7
Comparisons
- DSPs unable to exploit bit-level parallelism
- Inefficient storage of bits
- Replacing multiplications by additions/subtractions
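To illustrate the last two points, here is a small sketch that packs +/-1 bits one per machine bit and computes a correlation using only additions and subtractions. The packing convention (a set bit means +1) and the function names are assumptions made for this example.

```python
def pack_bits(bits):
    """Pack a list of +1/-1 values into one integer, one bit each (+1 -> 1)."""
    word = 0
    for i, b in enumerate(bits):
        if b == 1:
            word |= 1 << i
    return word


def signed_accumulate(word, samples):
    """Correlate samples with the packed +/-1 bits without any multiplication."""
    acc = 0
    for i, x in enumerate(samples):
        if (word >> i) & 1:
            acc += x   # bit is +1: add the sample
        else:
            acc -= x   # bit is -1: subtract the sample
    return acc
```

The result equals the ordinary dot product of the bit vector with the samples. A processor that stores each bit in a full word and multiplies spends storage and a multiplier on what is only a sign selection.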
8
Multiuser detection innovations
- Developed a simple architecture for asynchronous multiuser detection for CDMA [+, x]
- Bit-streaming
  - reduced latency
  - eliminates window edge computations
  - lower memory requirements
- Pipelined stages
  - higher throughput (with more hardware)
9
Block Pipelined Detector
- Variable latency [worst case (1st bit): D * latency per bit]
- 2 extra edge bit computations per stage
[Figure: timing diagram along a TIME axis. A matched filter (MF) and successive PIC stages process one block of bits (bits 2-11), then the next block (bits 12-21).]
10
Bit-streaming multiuser detection
- Savings in memory by D 2
11
Pipelining the multiuser detector
- Pipeline over time: Matched Filter (causal), PIC - Stage 1, PIC - Stage 2, PIC - Stage 3
- Latency = 2 * latency per bit (D/2 speedup over block)
- Eliminated edge bit computations
[ISCAS 2001]
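The matched filter followed by parallel interference cancellation (PIC) stages can be sketched as below. This is a simplified synchronous, single-bit-interval version, so it has no edge bits and no streaming. The matrix R, assumed here to hold cross-correlations with the user amplitudes folded in, and all function names are illustrative and do not reproduce the architecture of [ISCAS 2001].

```python
def sign(x):
    """Hard decision: +1 for non-negative values, -1 otherwise."""
    return 1 if x >= 0 else -1


def matched_filter_decisions(y):
    """Initial bit estimates taken directly from the matched filter outputs."""
    return [sign(v) for v in y]


def pic_stage(y, R, b_prev):
    """One PIC stage: subtract every other user's estimated interference."""
    K = len(y)
    decisions = []
    for k in range(K):
        interference = sum(R[k][j] * b_prev[j] for j in range(K) if j != k)
        decisions.append(sign(y[k] - interference))
    return decisions


def detect(y, R, n_stages=3):
    """Matched filter decisions refined by n_stages PIC stages."""
    b = matched_filter_decisions(y)
    for _ in range(n_stages):
        b = pic_stage(y, R, b)
    return b
```

Each stage uses only the previous stage's decisions, which is why the stages can be laid out as a hardware pipeline. In a near-far case such as R = [[1.0, 1.5], [0.5, 3.0]] with transmitted bits [+1, -1], the matched filter alone gets the weak user wrong and the first PIC stage corrects it.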
12
Contents
- Algorithms for channel estimation and detection
- Conventional and on-line arithmetic designs
- Programmable architecture design using the IMAGINE simulator
13
Matched filter with conventional arithmetic
- T ~ log(N) * log(d)
- N: dot product size
- d: precision
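The log(N) factor can be read as the depth of a balanced adder tree that sums the N products of the dot product. The sketch below computes the matched filter output and counts the tree levels. The further reading that each level costs about log(d) in adder delay is an assumption about the intended model, and the function names are hypothetical.

```python
def adder_tree_sum(values):
    """Sum a non-empty list pairwise, returning (total, number of tree levels)."""
    level = list(values)
    depth = 0
    while len(level) > 1:
        # Add neighbours in pairs; an unpaired last element passes through.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
        depth += 1
    return level[0], depth


def matched_filter(code, samples):
    """Dot product of a spreading code with received samples via an adder tree."""
    products = [c * x for c, x in zip(code, samples)]
    return adder_tree_sum(products)
```

For a spreading gain of N = 32 the tree has 5 levels, so the critical path grows with log2(N) rather than with N.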
14
Conventional MF using CSAs
- T ~ a + log(d + c)
- a, c: constants
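A carry-save adder (CSA) reduces three operands to a sum word and a carry word without propagating carries, so only the final addition pays for carry propagation. The sketch below shows the idea on non-negative Python integers. It accumulates sequentially for brevity, whereas a hardware design would more likely arrange the CSAs as a tree, and the function names are illustrative.

```python
def carry_save_add(a, b, c):
    """3:2 compression: return (s, carry) such that a + b + c == s + carry."""
    s = a ^ b ^ c                                # bitwise sum, carries ignored
    carry = ((a & b) | (a & c) | (b & c)) << 1   # majority bits, shifted left
    return s, carry


def csa_sum(values):
    """Sum non-negative integers, deferring carry propagation to the end."""
    s, c = 0, 0
    for v in values:
        s, c = carry_save_add(s, c, v)
    return s + c   # the single carry-propagate addition
```

Every call to `carry_save_add` works on all bit positions independently, so its delay does not depend on the word length.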
15
Key concept in on-line arithmetic
- Conventional detection: high-precision operations (8-32 bits) followed by a test for sign.
- Actual detection depends only on the most significant digits (1-3 bits).
- Use most-significant-digit-first (MSDF) computation to find the sign and avoid computing the successive digits.
[Arith-15]
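A minimal sketch of early sign detection on a most-significant-digit-first stream, assuming a radix-2 signed-digit representation with digits in {-1, 0, +1} and value sum(d_i * 2**-i). In that representation the remaining digits can never outweigh the first nonzero digit, so the sign is known as soon as that digit arrives. This illustrates the principle only and is not the design of [Arith-15]; the function names are hypothetical.

```python
from fractions import Fraction


def online_sign(digit_stream):
    """Return (sign, digits_consumed) for an MSDF signed-digit stream."""
    consumed = 0
    for d in digit_stream:
        consumed += 1
        if d != 0:
            # All later digits together are worth less than this one.
            return (1 if d > 0 else -1), consumed
    return 0, consumed   # every digit was zero


def sd_value(digits):
    """Exact value of a radix-2 signed-digit fraction (for checking only)."""
    return sum(Fraction(d, 2 ** (i + 1)) for i, d in enumerate(digits))
```

Values of large magnitude tend to have an early nonzero digit and are decided after one or two digits, while values near zero need more digits before the sign is known.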
16
Comparisons of arithmetic schemes
17
Using on-line arithmetic for detection
[Figure: time taken for addition (normalized) versus received signal amplitude (normalized), for channel symbols -1 and +1.]
18
Equations
- Probability of error for optimal BPSK detection
- Probability of error for on-line BPSK detection
- r: radix of the number system
- p: number of digits
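For reference, the standard textbook result for optimal coherent BPSK detection in additive white Gaussian noise is P_e = Q(sqrt(2 Eb/N0)), which equals 0.5 * erfc(sqrt(Eb/N0)). The sketch below evaluates only that expression. It does not attempt the on-line expression in r and p, and the function name and the dB interface are assumptions made for this example.

```python
import math


def bpsk_error_probability(ebn0_db):
    """Bit error probability of optimal BPSK in AWGN, for Eb/N0 given in dB."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)            # convert dB to a linear ratio
    return 0.5 * math.erfc(math.sqrt(ebn0))    # Q(sqrt(2 Eb/N0))
```

The error probability falls steeply with Eb/N0 and reaches roughly 1e-5 near 9.6 dB.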
19
Probability of error using on-line
20
On-line MF implementation
- T ~ c
- c: constant
21
Throughput comparisons
22
Area comparisons
23
Implementing higher modulation schemes
24
Conclusions on arithmetic schemes
- CSAs better than straightforward implementation
  - 1.35 - 1.6X speedup for 8-32 bit precision
  - 1.64 - 1.14X less area
- With reduced-precision computations, on-line arithmetic is better still
  - 1.67 - 2.12X speedup over CSA
  - 0.64 - 12.73X less area than CSA
25
Contents
- Algorithms for channel estimation and detection
- Conventional and on-line arithmetic designs
- Programmable architecture design using the IMAGINE simulator
26
A programmable architecture?
- Flexibility in the algorithm requirements
  - channel dependent computations
  - changing algorithms on-the-fly
  - seamless switching between wireless LAN and wideband CDMA (RENE)
- Simulator needed to test
  - performance of algorithms
  - extensions/modifications for critical operations
27
Algorithms needed for 3G base-band base-station implementation
- Wireless LAN: equalization, FFT, Viterbi decoding
- W-CDMA: channel estimation, multiuser detection, Viterbi/Turbo decoding
- If you felt that life was too easy: multiple antennas, long spreading codes, space-time codes
28
The IMAGINE architecture and simulator
- IMAGINE is a media signal processor, built at Stanford.
- Many common workload features
- Good starting point to explore
- Local expertise: Dr. Scott Rixner (rixner@rice.edu)
29
IMAGINE architecture
- Great for media processing algorithms
  - 1024-point FFT in 7.4 µs on a 500 MHz processor with 8 clusters (48 units), at 3.8 W of power
- Great for parallel, vector and streaming computations
- Performance of, and extensions for, sequential computation kernels such as Viterbi traceback need to be investigated.
30
Conclusions
Algorithm steps for designing communication systems:
- Design hardware-efficient versions
- Fixed-point implementation
- DSP implementation - bottlenecks
- Task partitioning, pipelining, parallelism
- Computer arithmetic ideas -- VLSI
- Integration into a programmable processor