1
Software Defined Radio – A High Performance Embedded Challenge
Hyunseok Lee, Yuan Lin, Yoav Harel, Mark Woh, Scott Mahlke, Trevor Mudge, and Krisztian Flautner (1)
University of Michigan; (1) ARM Ltd
2
Advanced Computer Architecture Laboratory, University of Michigan
Contents
  Software defined radio
  Categories of wireless networks
  Core technologies for future networks
  Case study: W-CDMA network
  Major algorithms
  Workload characterization
  Architectural implications
3
Software Defined Radio
4
Wireless Communication System
  [Figure: layered diagram. Labels: Application; Upper Protocol Layers (Transport: TCP/UDP, Network: IP, Link: PPP, MAC); Physical Layer (PHY): Baseband Processing, Analog Front-end; Packets; bits; "Air"]
5
Anatomy of a Cellular Phone
6
Protocol on Wireless Platform
  [Figure: protocol functions mapped onto hardware. Labels: Source coding (Audio: AMR/QCELP, Video: MPEG); Upper layers (MAC, Link, Network, Transport); Physical layer (PHY); ASIC (Hardware); GPP (Software); DSP/Accelerator; Application Processor; Baseband Processor]
7
Software Defined Radio (SDR)
  Use software routines instead of ASICs for the physical layer operations of a wireless communication system.
  [Figure: ASICs (PHY) replaced by software routines running on programmable hardware]
  Both the analog front-end and the digital baseband are within the scope of SDR.
8
Levels of SDR
  Tier | Name | Description
  Tier 0 | Hardware Radio (HR) | Implemented using hardware components. Cannot be modified.
  Tier 1 | Software Controlled Radio (SCR) | Only control functions are implemented in software: inter-connects, power levels, etc.
  Tier 2 | Software Defined Radio (SDR) | Software control of a variety of modulation techniques, wide-band or narrow-band operation, security functions, etc.
  Tier 3 | Ideal Software Radio (ISR) | Programmability extends to the entire system, with analog conversion only at the antenna.
  Tier 4 | Ultimate Software Radio (USR) | Defined for comparison purposes only.
9
Why do we need SDR?
  Seamless wireless connection (end user)
    Wireless protocols differ widely:
      TDMA: GSM, AMPS
      CDMA: IS-95, cdma2000, W-CDMA, IEEE 802.11b
      OFDM: IEEE 802.11a/g/n, WiMAX
    A terminal that can support multiple wireless protocols is needed.
  Easy infrastructure upgrade (service provider)
    Wireless protocols evolve continuously.
    Example: W-CDMA → W-CDMA + HSDPA
  Time to market (manufacturer)
    Reduces hardware development time and cost.
10
Where can we use SDR?
  Basestations
    Weak constraints on power and area.
    Support several hundred subscribers.
    Will be commercialized first.
  Wireless terminals
    Tight constraints on power and area.
    Will be commercialized next.
11
Why is SDR challenging?
  Analog front-end
    Must be tunable across a range of carrier frequencies and bandwidths.
  Digital baseband
    Supercomputer-level computation power: > 50 Gops per subscriber.
    Tight power budget: 200 to 300 mW at the terminal.
    High level of programmability: a combination of heterogeneous signal processing algorithms.
12
Our Strategy
  Performance
    Exploit the parallelism in signal processing and forward error correction (FEC) algorithms.
  Power
    Limit the programmability to minimize power consumption.
    Minimize both active and idle mode power consumption.
  There is a trade-off between power efficiency and programmability.
13
Categories of Wireless Networks
14
Categories of Wireless Networks
15
WWAN (Wireless Wide Area Network)
16
WLAN / WMAN
  WMAN: Wireless Metropolitan Area Network
    Addresses the last mile problem.
    802.16d: Fixed WiMAX
    802.16e: Mobile WiMAX
  WLAN: Wireless Local Area Network
    High data rate.
    Poor mobility support.
17
WPAN (Wireless Personal Area Network)
  Interconnects personal devices.
18
Core technologies of future networks
19
OFDM (Orthogonal Frequency Division Multiplexing)
  Transmits the signal over several sub-carriers.
  The frequency spectra of the sub-carriers overlap (high spectral efficiency).
  Highly susceptible to frequency error in the receiver.
20
Major Computation in OFDM System
  FFT / IFFT
    N = 64: IEEE 802.11a
    N = 256 to 2048: IEEE 802.16 WiMAX
    Data precision: 12 to 16 bits
  Amount of computation for OFDM operation
    ~10^8 complex multiplications/sec
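As a rough illustration of the FFT/IFFT pair named on this slide, here is a minimal sketch of a recursive radix-2 FFT in Python. It is not taken from the presentation: the function names `fft` and `ifft` are hypothetical, the input length is assumed to be a power of two, and a real baseband implementation would use fixed-point arithmetic at the 12 to 16 bit precision mentioned above rather than Python complex numbers.

```python
import cmath


def fft(x):
    """Recursive radix-2 decimation-in-time FFT (length must be a power of two)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # One complex multiplication per butterfly (the twiddle factor).
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(x):
    """Inverse FFT via the conjugate trick: ifft(x) = conj(fft(conj(x))) / N."""
    n = len(x)
    y = fft([v.conjugate() for v in x])
    return [v.conjugate() / n for v in y]


# Illustrative OFDM round trip: one symbol per sub-carrier (N = 8 here).
subcarrier_symbols = [1 + 1j, -1 + 1j, 1 - 1j, -1 - 1j] * 2
time_samples = ifft(subcarrier_symbols)   # transmitter side
recovered = fft(time_samples)             # receiver side
```

In this sketch the transmitter maps one symbol onto each sub-carrier with the IFFT and the receiver recovers them with the FFT. A radix-2 FFT of length N needs on the order of (N/2) log2 N complex multiplications, which is why the transform size N dominates the OFDM workload.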
21
MIMO (Multiple Input Multiple Output)
  Uses multiple antennas for signal transmission and reception.
  In the ideal case, channel capacity increases linearly.
  Can effectively compensate for the multipath fading effect.
  Significantly increases receiver complexity.
  Channel capacity with a single antenna pair: $C = W \log_2(1 + \mathrm{SNR})$
  Channel capacity with n and m antennas at the two ends of the link: $C = \min(n, m) \cdot W \log_2(1 + \mathrm{SNR})$
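The two capacity formulas on this slide can be evaluated directly. The sketch below is illustrative only: the function names and the example bandwidth and SNR are assumptions, not values from the presentation, and the SNR is taken as a linear ratio rather than in dB.

```python
import math


def siso_capacity(bandwidth_hz, snr_linear):
    """C = W * log2(1 + SNR) for a single antenna pair."""
    return bandwidth_hz * math.log2(1.0 + snr_linear)


def mimo_capacity(n, m, bandwidth_hz, snr_linear):
    """Ideal-case C = min(n, m) * W * log2(1 + SNR)."""
    return min(n, m) * siso_capacity(bandwidth_hz, snr_linear)


# Hypothetical example: 20 MHz channel, SNR = 15 (linear), 4 x 4 antennas.
single = siso_capacity(20e6, 15)      # 20e6 * log2(16) bits/s
multi = mimo_capacity(4, 4, 20e6, 15)
```

With these example numbers the ideal 4 x 4 system has four times the single-antenna capacity, which is the linear scaling the slide refers to.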
22
Computation in MIMO Receiver
  [Equation: amount of computation in a MIMO receiver, expressed in terms of M, L_T, and L_P]
    M: number of Tx/Rx antennas
    L_T: length of preamble
    L_P: length of payload
  4 Tx/Rx antennas, 100 Mbps, 64 QAM, 1/2 coding rate
    ~6 x 10^8 computations/sec
23
LDPC Code
  Low Density Parity Check (LDPC) code
    Coding gain comparable to turbo codes, at lower implementation cost.
  Encoding
    Matrix multiplication, $c = xG$.
    G (generator matrix) is a large matrix (e.g. 4K x 4K).
  Decoding
    Equivalent to finding the most probable vector x such that $Hx \bmod 2 = 0$.
    H (parity check matrix) is a large sparse matrix.
  Implementation
    There is a trade-off between coding gain and implementation complexity.
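To make the roles of G and H concrete, here is a small sketch of encoding by matrix multiplication and checking a received word against the parity check matrix, both modulo 2. It uses a tiny (7,4) Hamming code as a stand-in: that code is an assumption chosen for readability, is not an LDPC code, and is far smaller than the 4K x 4K matrices mentioned on the slide. The function names are hypothetical.

```python
# Stand-in generator matrix G = [I | P] and parity check matrix H = [P^T | I]
# of a (7,4) Hamming code. A real LDPC code uses a much larger, sparse H.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]


def encode(x):
    """c = xG (mod 2): message row vector times the generator matrix."""
    return [sum(x[i] * G[i][j] for i in range(len(x))) % 2
            for j in range(len(G[0]))]


def syndrome(word):
    """H * word (mod 2); all zeros means every parity check is satisfied."""
    return [sum(h * w for h, w in zip(row, word)) % 2 for row in H]


codeword = encode([1, 0, 1, 1])
corrupted = list(codeword)
corrupted[2] ^= 1  # flip one bit to emulate a channel error
```

A valid codeword gives an all-zero syndrome, while the corrupted word does not. An actual LDPC decoder goes further and iteratively searches for the most probable word that satisfies all checks, which this sketch does not attempt.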
24
Hybrid ARQ
  Reuses erroneous frames when decoding the retransmitted frame.
  Requires a large amount of buffer space.
25
Case Study : W-CDMA system
26
Major Algorithms
27
Physical Layer of W-CDMA
  [Figure: block diagram of the W-CDMA physical layer, annotated with the purpose of each stage]
    Error correction
    Overcome severe errors within a short time interval
    Assign the signal waveform that is optimal for data transmission
    Suppress signal components beyond the stop-band frequency
28
Channel Encoder/Decoder
  Encoder
    Adds systematic redundancy to the source data.
  Decoder
    Fixes errors in the received data using the redundancy information generated by the encoder.
  The W-CDMA system uses
    Convolutional code (for short voice and control messages)
    Turbo code (for video streams and high speed packet data)
29
Channel Encoder
  Consists of flip-flops and exclusive OR gates.
  Has negligible impact on workload.
  [Figure: shift register of eight delay elements (D) fed by Input, with two outputs]
    Output 0: G_0 = 561 (octal)
    Output 1: G_1 = 753 (octal)
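The flip-flop and XOR structure of this encoder can be sketched in a few lines of Python. The generator polynomials 561 and 753 (octal) and the eight delay elements come from the slide; the tap convention (most significant bit of each polynomial applies to the current input bit) and the function names are assumptions made for this illustration.

```python
G0 = 0o561  # generator polynomial for output 0 (from the slide)
G1 = 0o753  # generator polynomial for output 1 (from the slide)
K = 9       # constraint length: current input bit + 8 delay elements


def parity(value):
    """XOR of all bits in value."""
    return bin(value).count("1") & 1


def conv_encode(bits):
    """Rate-1/2 convolutional encoder; emits two output bits per input bit."""
    reg = 0  # shift register, newest bit in the most significant position
    out = []
    for b in bits:
        reg = (b << (K - 1)) | (reg >> 1)
        out.append(parity(reg & G0))  # output 0
        out.append(parity(reg & G1))  # output 1
    return out


encoded = conv_encode([1, 0])
```

Each input bit costs one shift and two parity computations, which is consistent with the slide's remark that the encoder has negligible impact on the workload.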
30
Channel Decoder
  Determines the most probable code sequence from the received sequence.
  Selects the codeword C that has the minimum distance to the received sequence r.
  One of the dominant workloads.
  [Figure: received signal r and its distances d_1, d_2, ..., d_N to the codewords C_1, C_2, ..., C_N]
    {c_i}: code set
    r: received signal
31
Channel Decoder – Viterbi Algorithm
  Most popular decoding algorithm for convolutional codes.
  Consists of three steps:
    Branch metric calculation (BMC): abs(a-b), parallelizable
    Add compare select (ACS): min(a+b, c+d), parallelizable
    Trace back (TB): recursive pointer tracing, sequential
  Amount of operation in W-CDMA
    16 Kbps voice: ~2 Gops
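The three steps can be seen together in a compact hard-decision Viterbi decoder. This is a sketch under stated assumptions rather than the benchmark implementation: it uses hard bits instead of soft values, assumes the encoder was flushed with K-1 zero tail bits so that trace back can start from state 0, and the helper names (`encode`, `viterbi_decode`) and the tap convention of the encoder are illustrative choices.

```python
def parity(value):
    return bin(value).count("1") & 1


def encode(bits, gens, k):
    """Convolutional encoder used only to produce test input for the decoder."""
    reg, out = 0, []
    for b in bits:
        reg = (b << (k - 1)) | (reg >> 1)
        out.extend(parity(reg & g) for g in gens)
    return out


def viterbi_decode(received, gens, k):
    """Hard-decision Viterbi decoder for a zero-terminated sequence."""
    n_out = len(gens)
    n_states = 1 << (k - 1)
    inf = float("inf")
    metric = [0] + [inf] * (n_states - 1)  # encoder starts in state 0
    history = []
    for t in range(len(received) // n_out):
        r = received[t * n_out:(t + 1) * n_out]
        new_metric = [inf] * n_states
        back = [None] * n_states
        for state in range(n_states):
            if metric[state] == inf:
                continue
            for b in (0, 1):
                reg = (b << (k - 1)) | state
                expected = [parity(reg & g) for g in gens]
                # BMC: distance between received and expected branch output.
                bm = sum(abs(x - y) for x, y in zip(r, expected))
                # ACS: add branch metric, compare, keep the smaller path metric.
                nxt = reg >> 1
                cand = metric[state] + bm
                if cand < new_metric[nxt]:
                    new_metric[nxt] = cand
                    back[nxt] = (state, b)
        metric = new_metric
        history.append(back)
    # TB: follow the stored pointers backwards from the terminating state 0.
    state, decoded = 0, []
    for back in reversed(history):
        state, b = back[state]
        decoded.append(b)
    decoded.reverse()
    return decoded


GENS, K = (0o561, 0o753), 9
message = [1, 0, 1, 1, 0, 0, 1, 0]
sent = encode(message + [0] * (K - 1), GENS, K)
noisy = list(sent)
noisy[3] ^= 1
noisy[20] ^= 1  # two channel bit errors
decoded = viterbi_decode(noisy, GENS, K)[:len(message)]
```

The inner loops over `state` and `b` are independent within one time step, which is the parallelism noted for BMC and ACS, while the trace back loop depends on the previous iteration and is inherently sequential.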
32
Channel Decoder – Turbo Decoder
  Two algorithms are widely used:
    SOVA (Soft Output Viterbi Algorithm)
      Less computation intensive
      Lower error correction performance
    Max-LogMap algorithm
      More computation required
      Higher error correction performance
  Amount of operation in W-CDMA
    For 128 Kbps streaming data: ~18 Gops
33
Turbo Decoder
  Based on multiple iterations of SOVA / Max-LogMap blocks.
  More iterations give better error correction performance.
34
Block Interleaver/Deinterleaver
  Overcomes the severe signal attenuation within a short time interval that frequently appears on wireless channels.
  Interleaver (at the transmitter): randomizes the sequence of the source data.
  Deinterleaver (at the receiver): recovers the original sequence by reordering.
  Amount of operation: < 10 Mops
  Example: 1 2 3 4 5 6 7 8 9 → (interleaving) → 1 4 7 2 5 8 3 6 9 → (deinterleaving) → 1 2 3 4 5 6 7 8 9
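The 1 to 9 example on this slide corresponds to writing the data row by row into a 3 x 3 block and reading it column by column. A minimal sketch follows; it omits the predefined column permutation pattern that the W-CDMA interleaver applies, and the function names are hypothetical.

```python
def interleave(data, rows, cols):
    """Write row by row into a rows x cols block, read column by column."""
    if len(data) != rows * cols:
        raise ValueError("data length must equal rows * cols")
    return [data[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(data, rows, cols):
    """Inverse operation: write column by column, read row by row."""
    if len(data) != rows * cols:
        raise ValueError("data length must equal rows * cols")
    return [data[c * rows + r] for r in range(rows) for c in range(cols)]


source = [1, 2, 3, 4, 5, 6, 7, 8, 9]
shuffled = interleave(source, 3, 3)
restored = deinterleave(shuffled, 3, 3)
```

After interleaving, symbols that were adjacent in the source are three positions apart, so a short burst of channel errors is spread over several places in the deinterleaved stream.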
35
Spreader/Despreader
  Allows several signals to be transmitted at the same time (x[n] and y[n] in the diagram).
  Based on the orthogonality between spreading codes.
36
Spreader/Despreader
  The spreader / despreader also suppresses noise.
  Amount of operation: ~4 Gops
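The orthogonality argument can be illustrated with two short spreading codes. In this sketch the length-4 codes, the symbol values, and the function names are assumptions for illustration; W-CDMA uses longer codes with variable spreading factors.

```python
def spread(symbols, code):
    """Replace each symbol by the symbol multiplied with every chip of the code."""
    return [s * c for s in symbols for c in code]


def despread(chips, code):
    """Correlate each block of chips with the code and normalize."""
    n = len(code)
    return [sum(chips[i + j] * code[j] for j in range(n)) / n
            for i in range(0, len(chips), n)]


# Two orthogonal codes: their chip-wise product sums to zero.
code_x = [1, -1, 1, -1]
code_y = [1, 1, -1, -1]

x = [1, -1, 1]    # symbols of signal x[n]
y = [-1, -1, 1]   # symbols of signal y[n]

# Both signals share the channel at the same time.
channel = [a + b for a, b in zip(spread(x, code_x), spread(y, code_y))]

x_received = despread(channel, code_x)
y_received = despread(channel, code_y)
```

Because the codes are orthogonal, correlating the combined signal with one code cancels the contribution of the other. The averaging over the code length is also what reduces uncorrelated noise.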
37
Scrambler/Descrambler
  Randomizes the output signal by multiplying it with a pseudo-random sequence called the scrambling code.
  Allows multiple terminals to communicate at the same time.
  Amount of operation: ~3 Gops
  [Figure: terminal 1 with scrambling code n, terminal 2 with scrambling code m]
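A minimal sketch of scrambling and descrambling with a pseudo-random sequence of +1/-1 chips is given below. The sequence here comes from Python's seeded `random.Random`, which is only a stand-in: the actual W-CDMA scrambling codes are defined by the specification, and the names `scrambling_code` and `scramble` are hypothetical.

```python
import random


def scrambling_code(seed, length):
    """Stand-in pseudo-random +1/-1 sequence; one seed per terminal."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(length)]


def scramble(chips, code):
    """Chip-wise multiplication with the scrambling code."""
    return [x * c for x, c in zip(chips, code)]


chips = [1, -1, -1, 1, 1, 1, -1, 1]
code_n = scrambling_code(seed=1, length=len(chips))

scrambled = scramble(chips, code_n)
# Descrambling is the same multiplication, because every chip squared is 1.
descrambled = scramble(scrambled, code_n)
```

Applying the same code twice restores the original chips, so the receiver needs only the code of the terminal it wants to hear.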
38
Low Pass Filter
  Suppresses signal components beyond the stop-band frequency.
  [Figure: filtering shown in the time domain (an impulse signal becomes a sinc function) and in the frequency domain (a band-unlimited signal becomes a band-limited signal)]
39
Low Pass Filter
  Uses a conventional FIR filter.
  Number of filter taps (N) = 32 to 64
  Amount of operation: ~12 Gops
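A direct-form FIR filter is a sum of delayed input samples weighted by the filter taps. The sketch below is illustrative: the three-tap coefficients are made up for readability (the slide quotes 32 to 64 taps), samples before the start of the input are assumed to be zero, and `fir_filter` is a hypothetical name.

```python
def fir_filter(samples, taps):
    """Direct-form FIR filter: out[n] = sum over k of taps[k] * samples[n - k]."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, h in enumerate(taps):
            if n - k >= 0:  # samples before the start are treated as zero
                acc += h * samples[n - k]
        out.append(acc)
    return out


# Feeding an impulse returns the tap values themselves (the impulse response).
impulse_response = fir_filter([1, 0, 0, 0], [0.25, 0.5, 0.25])
smoothed = fir_filter([4, 4, 4, 4], [0.5, 0.5])
```

Every output sample costs one multiply-accumulate per tap, so the workload grows linearly with both the number of taps and the sample rate.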
40
Rake Receiver – Multipath fading
  The rake receiver mitigates the multipath fading effect.
  Multipath fading is a major cause of unreliable wireless channel characteristics.
  [Figure: transmitted signal x(t) and the received signal y(t) as propagation paths are added]
    One path: $y(t) = a_0 x(t)$
    Two paths: $y(t) = a_0 x(t) + a_1 x(t - d_1)$
    Three paths: $y(t) = a_0 x(t) + a_1 x(t - d_1) + a_2 x(t - d_2)$
41
Rake Receiver - Functions
  Ideally, the rake receiver aggregates the signal terms with proper delay compensation.
  Received signal: $y(t) = a_0 x(t) + a_1 x(t - d_1) + a_2 x(t - d_2)$
  Rake receiver output: $r(t) = a_0 x(t - t_{delay}) + a_1 x(t - d_1 - d_{est1}) + a_2 x(t - d_2 - d_{est2}) = (a_0 + a_1 + a_2) \, x(t - t_{delay})$
  The delay spread of the received signal varies randomly, so it must be estimated.
42
Rake Receiver – Detect Delay Spread
  Scans the received signal in the frame buffer while computing its correlation with the scrambling code sequence.
  [Figure: a correlation window slides over the received signal; the correlation result shows peaks a_0, a_1, a_2 at delays 0, d_1, d_2]
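The delay search can be sketched as a sliding correlation: the known code is correlated with the received samples at every offset, and offsets with a large correlation magnitude are reported as path delays. Everything specific in this sketch is an assumption for illustration: a 13-chip Barker sequence stands in for the scrambling code because of its low autocorrelation sidelobes, the path gains and delays are made up, and `search_delays` is a hypothetical name.

```python
# Stand-in reference sequence (Barker-13); the real searcher correlates
# against the scrambling code.
CODE = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]


def correlate_at(received, code, offset):
    """Correlation of the code with the received samples at one offset."""
    return sum(received[offset + i] * code[i] for i in range(len(code)))


def search_delays(received, code, threshold):
    """Slide the correlation window and keep offsets above the threshold."""
    n = len(code)
    hits = []
    for offset in range(len(received) - n + 1):
        value = correlate_at(received, code, offset) / n
        if abs(value) >= threshold:
            hits.append((offset, value))
    return hits


# Hypothetical multipath channel: three copies of the code with
# gains 1.0, 0.5, 0.5 arriving at delays 0, 4, 9.
paths = [(0, 1.0), (4, 0.5), (9, 0.5)]
received = [0.0] * 24
for delay, gain in paths:
    for i, chip in enumerate(CODE):
        received[delay + i] += gain * chip

found = search_delays(received, CODE, threshold=0.3)
delays = [offset for offset, _ in found]
```

The nested loop structure, one correlation window per buffer offset, is the product of window length and buffer size that makes the searcher such a heavy workload.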
43
Computation of Rake Receiver
  Correlation computation: $L_W \times L_B \times F \approx 80$ mega multiplications/sec
    $L_W$: correlation window = 320
    $L_B$: frame buffer size = 5120
    $F$: operation frequency = 50
  Multiplications can be converted into subtractions.
  Amount of operation in W-CDMA: ~25 Gops
  Most dominant workload.
44
Rake Receiver – Overall Architecture
  [Figure: overall architecture, with blocks that detect the delay spread, compensate the propagation delay, and recombine the signal terms without delay]
45
Power Control
  The receiver controls the transmission power of the transmitter in order to minimize interference to other users.
  Required computation is negligible.
  [Figure: terminal and basestation; pilot signal strength plotted against a reference level, with the power control command sequence u d u u d d u]
    Pilot signal strength below the reference level: terminal sends an UP command.
    Pilot signal strength above the reference level: terminal sends a DOWN command.
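The decision rule on this slide is a single comparison per measurement, which is why its computation is negligible. The sketch below reproduces the u/d command pattern from the figure using made-up pilot strengths; the function name, the numeric values, and the choice to send DOWN when the strength exactly equals the reference are assumptions.

```python
def power_control_commands(pilot_strengths, reference):
    """Return 'u' (UP) when the pilot is below the reference, else 'd' (DOWN)."""
    # Assumption: a measurement exactly at the reference level maps to DOWN.
    return ["u" if strength < reference else "d" for strength in pilot_strengths]


# Hypothetical pilot measurements around a reference level of 1.0.
measurements = [0.8, 1.2, 0.9, 0.7, 1.1, 1.3, 0.95]
commands = power_control_commands(measurements, reference=1.0)
```

Joined together, the commands for these sample measurements read `uduuddu`.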
46
H/W Operation States
  Radio resource control states are defined in the W-CDMA specification; the operation states here are defined according to H/W activity.
  Idle
    For long idle periods between sessions.
    Periodic wake-up for control message reception.
    Minimum workload, but dominates terminal standby time.
  Control Hold
    For short idle periods between packet bursts.
    Holds a narrow control channel for fast transition to Active.
    Intermediate workload.
  Active
    For packet burst transmission periods.
    Uses high speed packet channels up to 2 Mbps.
    Most heavily loaded state.
47
Workload Characterization
48
Workload Profile
  One operation is equivalent to one RISC instruction.
  Searcher, turbo decoder, and LPF are the dominant workloads.
  The workload profile varies with the operation state.
49
Processing Time Requirement
  Mixture of algorithms with various processing time requirements.
  Classified into two categories:
    Heavy workload with long processing time (turbo decoder, searcher)
    Light workload with short processing time (scrambler, spreader, LPF, power control)
50
Parallelism
  Most heavy-workload algorithms have significant vector parallelism.
  The data width of most operations is 8 bits.
51
Memory Access Pattern
  A large memory is not required.
  Traffic between algorithms is not dominant.
  The access rate of the scratch pad memory is very high.
52
Instruction Breakdown
  ADD/SUB are the dominant instructions.
  Multiplication is not dominant in heavy workloads.
53
Frequent Computations
  Most multiplications are simplified into cheaper operations.
  Multiplication in LPF-Rx cannot be simplified because both operands are 16-bit integers.
54
Architectural Implications
55
Architectural Implications
  SIMD, because
    Vector parallelism in W-CDMA algorithms can be exploited.
    High power efficiency can be achieved by sharing control logic between datapath elements.
  Chip multiprocessor, because
    There is substantial algorithm-level parallelism.
    There are many tiny sequential algorithms.
  Multiple SIMD + scalar
  [Figure: several SIMD and scalar units connected by an interconnection network]
56
Architectural Implications
  Memory structure
    Cache free: the memory access pattern exhibits very dense spatial locality.
    Small data memory (<64K)
    Small instruction memory (<4K)
  Simple interconnection network
    Inter-processor communication can be kept low by mapping tasks onto each PE at the algorithm level.
57
Architectural Implications
  Power management
    The workload varies widely with the operation state and with changes in radio channel conditions.
    Various power management schemes can be applied: DVS, DFS, clock gating.
    Idle mode power must be minimized because it dominates terminal standby time.
58
W-CDMA Benchmark Suite
  C-based implementation of the W-CDMA physical layer operations.
  Used for the workload characterization done in this paper.
  Available at www.eecs.umich.edu/~sdrg
59
Conclusion
  We discussed:
    what SDR is and why it is a challenging topic for embedded systems;
    the evolution of wireless protocols and the core technologies of emerging protocols.
  We analyzed:
    the workload characteristics of the W-CDMA protocol and their architectural implications.
60
Backup Slides
61
Viterbi Algorithm – Trellis Diagram
  The Viterbi algorithm is based on the trellis diagram.
  The trellis diagram represents all possible state transitions of the encoder.
62
Viterbi Algorithm - BMC
  The BMC (branch metric calculation) operation computes the difference between the received sequence r and the outputs of the trellis diagram.
    $BMC_{i,j} = \mathrm{distance}(r_{ij}, o_{ij}) = \mathrm{abs}(r_{ij} - o_{ij})$
    $o_{ij}$: output of the state transition from i to j
    $r_{ij}$: corresponding received sequence
  All BMC operations in a trellis diagram can be done in parallel.
  Example: distance between r (01) and $C_n$ (10) = 1 + 1 = 2
63
Viterbi Algorithm - ACS
  The ACS (add compare select) operation is:
    [Figure: add, then compare and select]
  This procedure is equivalent to finding a locally optimal code sequence.
  If $C_1$ has the smallest ACS value at node state i, then the ACS values of $C_2$ and $C_3$ are always greater than that of $C_1$.
64
Viterbi Algorithm - TB
  Traces back the code sequence that is closest to the received sequence.
  Sequential algorithm.
65
Block Interleaver/Deinterleaver
  Interleaver
    Write row by row sequentially.
    Read column by column according to the predefined permutation pattern.
  Deinterleaver
    Write column by column according to the predefined permutation pattern.
    Read row by row sequentially.