RICE UNIVERSITY
On the architecture design of a 3G W-CDMA/W-LAN receiver
Sridhar Rajagopal and Joseph R. Cavallaro
Rice University, Center for Multimedia Communication
This work is supported by Nokia, TI, TATP and NSF
Introduction
A baseband communications processor for Wireless LAN and Wideband CDMA
[Figure labels: Wireless LAN, Wideband CDMA, RENE]
Motivation
- No architecture has yet been developed that meets the real-time requirements of 3G systems:
  - Mbps range for wideband CDMA
  - 100 Mbps range for wireless LAN
- Design factors that make the problem harder:
  - Low power
  - Flexibility
Previous Work
- Designing algorithms from an implementation perspective:
  - algorithms with a high degree of parallelism
  - fixed-point computations
  - simple operations (multiplications/additions)
  - Example: multiuser channel estimation and detection
- Real-time implementation on DSPs/FPGAs/ASICs:
  - area-time tradeoffs
Possible contributions of this work
- A real-time, low-power VLSI architecture design using on-line arithmetic
- A real-time programmable architecture design using a media processor simulator (IMAGINE)
- Integrating these two architectures into one
Contents
- Low-power VLSI architecture design using on-line arithmetic
- Programmable architecture design using the IMAGINE simulator
- Conclusions
On-line arithmetic
- Uses a redundant number representation.
- Pipelined, digit-serial arithmetic that computes most significant digit first (MSDF).
- Successive computations can start as soon as the first δ input digits are available (online delay δ = 1..4, typically).
- Algorithms are available for various operations (+, *, /, sqrt) and for fixed-point computations.
[Figure: digit-serial streams, most significant digit first: input x = x1 x2 x3 x4 x5 ..., input y = y1 y2 y3 y4 y5 ..., output z = z1 z2 z3 z4 z5 ...]
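To make the MSDF idea concrete, here is a small Python sketch (an illustration, not the authors' hardware). It streams the radix-2 digits of a fraction in [0, 1) most significant first, so every prefix of k digits is already within 2^-k of the final value. The helper names `msdf_digits` and `prefix_value` are hypothetical, and plain (non-redundant) binary digits are assumed for simplicity.

```python
from fractions import Fraction
from itertools import islice
from typing import Iterable, Iterator


def msdf_digits(v: Fraction) -> Iterator[int]:
    """Yield the radix-2 digits of v (0 <= v < 1), most significant first.

    Digit i (starting at 1) has weight 2**-i. The stream is infinite;
    callers take as many digits as they need.
    """
    if not 0 <= v < 1:
        raise ValueError("v must lie in [0, 1)")
    while True:
        v *= 2
        digit = 1 if v >= 1 else 0
        v -= digit
        yield digit


def prefix_value(digits: Iterable[int]) -> Fraction:
    """Value of a finite MSDF digit prefix d_1 d_2 ... d_k."""
    return sum((Fraction(d, 2**i) for i, d in enumerate(digits, start=1)), Fraction(0))


if __name__ == "__main__":
    v = Fraction(7, 16)
    for k in range(1, 6):
        prefix = list(islice(msdf_digits(v), k))
        # Each extra digit refines the estimate; the remaining error is below 2**-k.
        print(k, prefix, prefix_value(prefix), float(v - prefix_value(prefix)))
```

A downstream MSDF operator can consume these prefixes as they arrive instead of waiting for the full word, which is what allows successive on-line operations to overlap.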
Why is on-line arithmetic useful?
- Operations in 3G wireless systems are conventionally high-precision (16-32 bits), but their outputs need only low precision.
- Only the most significant digits (1-3 bits) are needed, e.g. for detection.
- MSDF computation finds the needed digits and avoids computing the remaining digits.
- Computations are digit-serial, and hence low power.
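Hard-decision detection is the extreme case: only the sign of the result is needed. A minimal Python sketch, assuming the result arrives as radix-2 signed digits in {-1, 0, 1} (MSDF, digit i weighted 2^-i): the first nonzero digit already fixes the sign, because all less significant digits of a finite string together are worth strictly less than that digit, so the scan can stop there. The function `sign_msdf` and its `max_digits` cutoff are illustrative assumptions, not taken from the slides.

```python
from itertools import islice
from typing import Iterable, Tuple


def sign_msdf(digits: Iterable[int], max_digits: int) -> Tuple[int, int]:
    """Return (sign, digits_examined) for a finite signed-digit stream.

    Digits are in {-1, 0, 1}, most significant first, with weight 2**-i.
    The scan stops at the first nonzero digit: the remaining digits of a
    finite string sum to less than 2**-i in magnitude, so they cannot
    flip the sign. If the first max_digits digits are all zero, the sign
    is reported as 0 (|value| < 2**-max_digits).
    """
    examined = 0
    for d in islice(digits, max_digits):
        examined += 1
        if d != 0:
            return (1 if d > 0 else -1), examined
    return 0, examined


if __name__ == "__main__":
    # -1/8 + 1/16 + 1/32 + 1/64 = -1/64: negative, decided after 3 digits.
    print(sign_msdf([0, 0, -1, 1, 1, 1], max_digits=8))  # (-1, 3)
```

The number of digits examined, not the word length, sets the cost; a large-amplitude value is decided after fewer digits than one close to zero.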
Redundant number systems
- Radix-r number system: each digit takes one of r values: 0, 1, 2, ..., r-1.
- Redundant number system: each digit takes one of q > r values (r+1 ≤ q ≤ 2r-1).
- Example: a signed-digit representation, where each digit carries its own sign. In radix 10, 1 0 (-1) 2 = 1000 - 10 + 2 = 992, so 992 can be written both as 992 and as 10(-1)2.
- Redundancy enables carry-free addition, which is what makes MSDF computation possible.
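The equivalence in the radix-10 example is easy to check mechanically. A short Python sketch (brute force, for illustration only; `sd_value` and `representations` are hypothetical helpers) evaluates signed-digit strings and lists every representation of a value for a given radix and symmetric digit set {-a, ..., a}.

```python
from itertools import product
from typing import List, Sequence, Tuple


def sd_value(digits: Sequence[int], radix: int) -> int:
    """Integer value of a signed-digit string, most significant digit first."""
    value = 0
    for d in digits:
        value = value * radix + d  # Horner's rule; digits may be negative
    return value


def representations(value: int, radix: int, max_digit: int,
                    n_digits: int) -> List[Tuple[int, ...]]:
    """All n_digits-long strings over {-max_digit, ..., max_digit} equal to value.

    Brute force over every string, so only suitable for small examples.
    """
    digit_set = range(-max_digit, max_digit + 1)
    return [ds for ds in product(digit_set, repeat=n_digits)
            if sd_value(ds, radix) == value]


if __name__ == "__main__":
    # Radix-10 example: 1 0 (-1) 2 and 0 9 9 2 both denote 992.
    print(sd_value((1, 0, -1, 2), 10), sd_value((0, 9, 9, 2), 10))  # 992 992
    # Radix 2 with digits {-1, 0, 1}: three 3-digit strings denote 1.
    print(representations(1, radix=2, max_digit=1, n_digits=3))
    # [(0, 0, 1), (0, 1, -1), (1, -1, -1)]
```

This freedom in choosing digits is what an on-line adder exploits: it can commit to a more significant output digit before the less significant inputs are known, because a later correction can still be absorbed by the redundant digit set.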
Adder Implementation
- t_conv: conventional adder time per bit
- t_OL: on-line adder delay per digit
- d: precision in bits
On-line radix-4 adder
[Figure: block diagram with digit-serial inputs, digit selection, carry-save adders, residual feedback, and digit-serial outputs]
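The adder on this slide is radix 4, with digit selection and residual feedback. As a simpler stand-in that shows the same key properties (carry-free, MSDF, bounded online delay), here is a Python sketch of a radix-2 signed-digit on-line adder. It uses the classic two-step transfer/interim-sum rule and has online delay 2. This is a swapped-in technique, not the radix-4 design on the slide, and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import chain
from typing import Iterable, Iterator, List, Tuple


def split_position(p: int, p_next: int) -> Tuple[int, int]:
    """Split a position sum p = x_i + y_i (in -2..2) into (t_i, w_i), p == 2*t_i + w_i.

    The choice depends only on the sign of the next, less significant position
    sum p_next, so that the output digit w_i + t_{i+1} stays in {-1, 0, 1}.
    """
    if p in (2, -2):
        return p // 2, 0
    if p == 0:
        return 0, 0
    if p_next >= 0:  # t_{i+1} will be 0 or 1, so keep w_i in {-1, 0}
        return (1, -1) if p == 1 else (0, -1)
    # t_{i+1} will be -1 or 0, so keep w_i in {0, 1}
    return (0, 1) if p == 1 else (-1, 1)


def online_add(xs: Iterable[int], ys: Iterable[int]) -> Iterator[int]:
    """Radix-2 on-line addition of two equal-length signed-digit streams.

    Inputs: digits in {-1, 0, 1}, MSD first, digit i weighted 2**-i.
    Outputs: z_0 (weight 1), then z_1, z_2, ...; z_j is emitted right after
    input digit j + 2 arrives (online delay 2). Two zero digits are appended
    to flush the last outputs.
    """
    p: List[int] = []               # position sums p_i = x_i + y_i
    tw: List[Tuple[int, int]] = []  # (t_i, w_i), known once p_{i+1} has arrived
    for x, y in zip(chain(xs, (0, 0)), chain(ys, (0, 0))):
        p.append(x + y)
        if len(p) < 2:
            continue  # need p_{i+1} before position i can be split
        tw.append(split_position(p[-2], p[-1]))
        if len(tw) == 1:
            yield tw[0][0]  # z_0 = t_1, the transfer into the integer position
        else:
            yield tw[-2][1] + tw[-1][0]  # z_i = w_i + t_{i+1}: no carry propagation


def value(digits: Iterable[int], first_weight: int) -> Fraction:
    """Value of a digit string whose first digit has weight 2**-first_weight."""
    return sum((Fraction(d, 2 ** (i + first_weight)) for i, d in enumerate(digits)),
               Fraction(0))


if __name__ == "__main__":
    x = [1, 0, -1, 1]  # 7/16
    y = [1, 1, 0, -1]  # 11/16
    z = list(online_add(x, y))
    print(z, value(z, 0))  # [1, 0, 1, -1, 0] 9/8
```

Each output digit depends on only three input positions, so the per-digit delay does not grow with the precision. The radix-4 version on the slide trades a more complex digit-selection step for fewer digit cycles per word.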
Comparison with regular adders
- On-line addition: the time per digit and the adder area are independent of the precision (the area of a conventional adder depends on the precision).
- Chaining operations saves time, since each successive operation can start as soon as the most significant digit (MSD) of its input is available.
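The chaining argument can be captured in a first-order timing model. The Python sketch below is an assumption-laden illustration, not the authors' analysis. It assumes a conventional log-depth adder costs log2(d)·t_conv per operation and must finish before the next operation starts. It charges each on-line stage its online delay δ and lets the final stage stop after the digits it actually needs. All function and parameter names are hypothetical.

```python
import math


def conventional_chain_latency(ops: int, d: int, t_conv: float) -> float:
    """Chain of `ops` additions on d-bit words, each waiting for the previous result.

    Assumes a log-depth (carry-lookahead style) adder taking log2(d) * t_conv.
    """
    return ops * math.log2(d) * t_conv


def online_chain_latency(ops: int, digits_needed: int, delta: int, t_ol: float) -> float:
    """Chain of `ops` on-line operations, each with online delay `delta`.

    Stage k emits its first digit delta digit-cycles after stage k-1 does, so
    the last stage starts producing after ops * delta cycles; the remaining
    digits then stream out one per cycle of length t_ol. Only `digits_needed`
    output digits are generated, since MSDF allows stopping early.
    """
    return (ops * delta + digits_needed) * t_ol


if __name__ == "__main__":
    # Illustrative parameters only (not taken from the slides).
    for ops in (1, 2, 4, 8):
        conv = conventional_chain_latency(ops, d=16, t_conv=1.0)
        onl = online_chain_latency(ops, digits_needed=3, delta=2, t_ol=1.0)
        print(f"ops={ops}: conventional={conv:.0f}, on-line={onl:.0f}")
```

Under these placeholder numbers, each extra on-line stage adds only δ digit cycles, while each extra conventional stage adds a full addition time. The actual crossover depends on the real ratio of t_OL to t_conv.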
Dependency of execution time for on-line addition on SNR
[Figure: time taken for addition vs. signal amplitude, for on-line addition and conventional addition]
Detection Example
Detector | Metric | Conventional | On-line | Speedup
Single user | Latency | (log2(N)+2)*log2(d)*t_conv | (log2(N)+2)*t_OL + t_stop = 14 |
Single user | Throughput | log2(d)*t_conv | m*t_OL = 8 |
Multiuser | Latency | (2*S-1)*t_CPIC + 2*t_CMF | t_MF + S*t_PIC + m*S*t_OL = 94 |
Multiuser | Throughput | t_CMF = 24 | m*t_OL = 8 | 3.00
Low power VLSI design
- Power savings come from two sources:
  - eliminating unneeded computations
  - digit-serial hardware
- Real-time requirements are met by properly pipelining the computations and exploiting the parallelism in the algorithms.
A programmable architecture simulator
- Flexibility in the algorithm requirements:
  - channel-dependent computations
  - changing algorithms on the fly
  - seamless switching between wireless LAN and wideband CDMA
- A simulator is needed to test:
  - the performance of the algorithms
  - extensions/modifications for critical operations
The IMAGINE architecture and simulator
- IMAGINE is a media signal processor built at Stanford.
- It shares many workload features with baseband processing, so it is a good starting point to explore.
- Local expertise: Dr. Scott Rixner
IMAGINE architecture
- Great for media processing algorithms:
  - 1024-point FFT in 7.4 µs on a 500 MHz processor with 8 clusters (48 units), at 3.8 W
- Great for parallel, vector and streaming computations
- The performance of, and extensions for, sequential computation kernels such as Viterbi traceback need to be investigated.
Conclusions
- On-line arithmetic is useful for a low-power, real-time implementation.
- A programmable real-time architecture is being investigated using the IMAGINE simulator.
- The aim is then to integrate these two features.