Published by Ethan Gregory. Modified over 9 years ago.
1
RICE UNIVERSITY
Implementing the Viterbi algorithm on programmable processors
Sridhar Rajagopal
Elec 696
sridhar@rice.edu
2
Motivation
- Viterbi decoding is one of the major bottlenecks in baseband (PHY) processing
- The algorithm parameters need to be flexible because different protocols use different values (read: "programmable")
- No architecture has been developed yet that meets the real-time requirements of 3G systems
  - 2 - 8 Mbps range for wideband CDMA
  - 100 Mbps range for wireless LAN
3
Today
- Background
- Advanced DSP architectures -- TI C6x [15]
- Viterbi algorithm basics [10]
- Viterbi on TI DSPs [10]
- A programmable processor specifically designed for Viterbi [15]
4
TI C6x architecture
- VLIW [Very Long Instruction Word] architecture
- Similar to a vector processor, but multiple instructions go to multiple functional units (FUs)
- FUs are not all the same
- 32-bit architecture
- 8 functional units
[Figure: 4-wide VLIW, with Inst 1 to Inst 4 issued to FU 1 to FU 4]
6
8 VelociTI principles
- Parallel fetch, decode and execute
- Pipelined enough to make ADD the critical path
- Instructions based on RISC
- Load-store architecture
- Orthogonal instruction set and register file
- Determinism
- Conditional instructions
- Instruction packing
7
2 * 4 = 8 Functional Units
- .M Multiplication unit
  - 16 bit x 16 bit, signed/# packed/#
- .L arithmetic Logic unit
  - Comparisons and logic operations
  - Saturation arithmetic and absolute value
- .S Shifter unit
  - Bit manipulation (set, get, shift, rotate)
  - Branching, addition and packed addition
- .D Data unit
  - Load/store to memory
  - Addition and pointer arithmetic
8
How powerful am I?
- 8 instructions per cycle
- Max per cycle:
  - 6 adds
  - 2 multiplies
  - 2 load/stores
  - 2 branches
- The idea is that code uses instructions in these ratios to get full FU utilization
9
C6x DSP Core
10
C6x Datapath
11
C6x Resource Constraints
- Instructions using the same FU
  - 1 instruction per FU
- Cross paths
  - Only 1 operand from the other register file to (L, S, M)
- Loads and stores
  - 2 loads and stores from 2 different register files
- Reads and writes
  - Max 4 reads from the same register
  - No 2 writes to the same register :)
12
Instruction Packing
- Fetch packet vs. execute packet
- Avoid NOPs in the instruction code
- Multi-cycle NOPs if absolutely necessary
- LSB "p" bit of each instruction is used for packing (p = 1: the next instruction executes in parallel with this one)
- Example: A || B || C, D || E, F, G || H needs 8 instructions instead of 32

  Instruction: A B C D E F G H
  p bit:       1 1 0 1 0 0 1 0
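The p-bit example on this slide can be checked with a short sketch. The function name `split_execute_packets` is hypothetical and only illustrates the grouping rule; it is not TI tooling. It assumes the C62x convention that a p bit of 1 chains the next instruction into the same execute packet and a p bit of 0 closes the packet.

```python
def split_execute_packets(names, p_bits):
    """Group the instructions of one fetch packet into execute packets.

    Assumed rule: p = 1 means the NEXT instruction runs in parallel with
    this one; p = 0 closes the current execute packet.
    """
    packets, current = [], []
    for name, p in zip(names, p_bits):
        current.append(name)
        if p == 0:
            packets.append(current)
            current = []
    if current:
        # A trailing p = 1 would chain past the end of the fetch packet;
        # this sketch simply closes the packet at the boundary.
        packets.append(current)
    return packets


packets = split_execute_packets("ABCDEFGH", [1, 1, 0, 1, 0, 0, 1, 0])
for cycle, packet in enumerate(packets):
    print(cycle, " || ".join(packet))
```

The eight instructions form four execute packets, so the fetch packet issues over four cycles without storing any padding NOPs. Padding each of those four cycles out to a full 8-wide word would take 32 instruction slots, which is the "8 instead of 32" on the slide.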
13
Conditional Instructions
- All instructions can be conditioned on the value in registers A1, A2, B0, B1, B2
- Avoids branch latencies
- If the condition is not met by the end of the first phase of execution, results are not written back to the register file
- Conditional loads/stores are squashed before the data phase
14
C6x Pipeline
- Fetch (if necessary) - 4 phases
  - Address Generate
  - Address Send
  - Access Ready Wait
  - Fetch Packet Receive
- Decode - 2 phases
  - Instruction dispatch (if necessary)
  - Instruction decode
- Execute - 10 phases
  - Most instructions take 1 phase
15
Some interesting instructions
- Saturation
- Bit-counting -- image coding
- Integer comparison
- Bit manipulation
- Seed generation for reciprocal instructions
16
Other details
- 64 KB internal program and data memory
- DMA - peripherals to memory
- Intrinsics in code for better programming
  - Similar to using "VIS" in UltraSPARC
- Software pipelining of loops
- PERFORMANCE: 5-10X higher clock -- higher pipeline (2-4X)
  - Additional ALUs
17
Additional features in C64x
- SIMD support
- Communication-specific instructions
  - Interleaving, Galois field multiply
- Bit count and rotate hardware
- 64 32-bit registers
- Lower resource constraints
- No more NOPs needed ever [no boundaries]
18
C64x DSP Core
19
Today
- Background
- Advanced DSP architectures -- TI C6x [15]
- Viterbi algorithm basics [10]
- Viterbi on TI DSPs [10]
- A programmable processor specifically designed for Viterbi [15]
20
Viterbi Decoding
[Figure: k bits -> Encoder -> n bits (n > k) -> Decoder -> k bits]
- Rate k/n = 1/2
- Convolutional encoder
21
Error Protection
- Number of states = 2^(number of flip-flops) = 2^(Constraint Length - 1)
- Not every state can be reached from every other state in one step
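A minimal sketch of a rate-1/2 convolutional encoder makes the state count and the restricted transitions concrete. The constraint length K = 3 and the generator polynomials (7, 5) in octal are assumptions chosen for illustration; the slides do not specify a particular code. The function names are hypothetical.

```python
def conv_encode(bits, generators=(0b111, 0b101), K=3):
    """Rate 1/len(generators) convolutional encoder (assumed (7, 5) code).

    The state holds the K - 1 most recent input bits (the flip-flops).
    """
    state, out = 0, []
    for b in bits:
        reg = (b << (K - 1)) | state          # newest bit enters at the MSB
        for g in generators:
            out.append(bin(reg & g).count("1") & 1)   # parity of tapped bits
        state = reg >> 1                      # shift: the oldest bit drops out
    return out


def next_states(state, K=3):
    """The only two states reachable from `state` in one step."""
    return [(b << (K - 2)) | (state >> 1) for b in (0, 1)]


num_states = 1 << (3 - 1)                     # 2^(K - 1) = 4 states
coded = conv_encode([1, 0, 1, 1])
print(num_states, coded, next_states(3))
```

Each state has exactly two successors, one per input bit, which is why the trellis is sparse rather than fully connected.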
22
Trellis for decoding
23
Trellis for an input sequence
24
Error detection
- Branch metric = "distance" between the received symbol pair and each possible symbol pair
- Path metric = accumulated error metric
25
Error-correction
26
Stages in Viterbi Decoding
- Calculate branch metrics for all states at every stage
- Update path metrics for all states at every stage
- At the end, trace back through the trellis to get the decoded bits
27
Computations
- Branch metrics:
  - Hamming distance: XOR and count 1's
  - Euclidean distance: squared distance
- Path metrics:
  - Add branch metrics to existing path metrics
  - Compare for minimum and select the minimum
- Survivor traceback:
  - Linked list / pointer chasing
  - Memory-intensive, sequential operations
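The three computations listed above can be put together in a small hard-decision decoder. This is a sketch under assumptions, not the implementation discussed in the talk: it assumes a K = 3, rate-1/2 code with generators (7, 5) octal, an encoder that starts in state 0, and Hamming-distance branch metrics. All function names are hypothetical.

```python
GENERATORS = (0b111, 0b101)    # assumed (7, 5) code
K = 3                          # assumed constraint length
NUM_STATES = 1 << (K - 1)
INF = float("inf")


def encode(bits):
    state, out = 0, []
    for b in bits:
        reg = (b << (K - 1)) | state
        out += [bin(reg & g).count("1") & 1 for g in GENERATORS]
        state = reg >> 1
    return out


def viterbi_decode(received):
    n = len(GENERATORS)
    path = [0] + [INF] * (NUM_STATES - 1)     # path metrics, start in state 0
    survivors = []                            # per stage: (prev_state, bit) per state
    for t in range(0, len(received), n):
        symbol = received[t:t + n]
        new_path = [INF] * NUM_STATES
        decisions = [None] * NUM_STATES
        for s in range(NUM_STATES):
            if path[s] == INF:
                continue                      # state not reachable yet
            for b in (0, 1):
                reg = (b << (K - 1)) | s
                expected = [bin(reg & g).count("1") & 1 for g in GENERATORS]
                # Branch metric: Hamming distance = XOR, then count the 1's
                bm = sum(e ^ r for e, r in zip(expected, symbol))
                nxt = reg >> 1
                cand = path[s] + bm           # Add
                if cand < new_path[nxt]:      # Compare and Select
                    new_path[nxt] = cand
                    decisions[nxt] = (s, b)
        path = new_path
        survivors.append(decisions)
    # Traceback: follow predecessor pointers from the best final state
    state = min(range(NUM_STATES), key=lambda s: path[s])
    bits = []
    for decisions in reversed(survivors):
        state, b = decisions[state]
        bits.append(b)
    return bits[::-1]


message = [1, 0, 1, 1, 0, 0]                  # last two zeros flush the encoder
noisy = encode(message)
noisy[3] ^= 1                                 # inject a single channel bit error
decoded = viterbi_decode(noisy)
print(decoded)
```

The inner loop is the add-compare-select work that dominates the cycle count, while the traceback at the end is the pointer-chasing, memory-bound part noted on the slide.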
28
Today
- Background
- Advanced DSP architectures -- TI C6x [15]
- Viterbi algorithm basics [10]
- Viterbi on TI DSPs [10]
- A programmable processor specifically designed for Viterbi [15]
29
Viterbi support in different processors
- C54x: special hardware accelerator
  - ACS unit with 2 accumulators (ACC) and split ALU
  - Viterbi butterfly (2 ACS) in 4 cycles
- C62x: nothing special
- C6416: Viterbi coprocessor
  - K = 5-9, Rate = 1/2, 1/3, 1/4
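The "butterfly (2 ACS)" mentioned for the C54x can be sketched as follows. This illustrates the general butterfly structure and is not the C54x instruction sequence. It assumes a state numbering in which predecessors 2j and 2j+1 feed successors j and j + N/2, Hamming branch metrics, and generator polynomials whose first and last taps are both set, so the four branches of a butterfly need only one metric and its complement. The function name and arguments are hypothetical.

```python
def butterfly(pm_even, pm_odd, bm, n_out=2):
    """One Viterbi butterfly, i.e. two add-compare-select (ACS) operations.

    pm_even, pm_odd : path metrics of predecessor states 2j and 2j + 1
    bm              : Hamming branch metric of the transition 2j -> j
    n_out           : coded bits per trellis stage (2 for a rate-1/2 code)

    Returns ((metric, pred), (metric, pred)) for states j and j + N/2,
    where pred is 0 if state 2j survives and 1 if state 2j + 1 survives.
    """
    bm_c = n_out - bm                  # metric of the complementary branch
    a, b = pm_even + bm, pm_odd + bm_c         # candidates for state j
    c, d = pm_even + bm_c, pm_odd + bm         # candidates for state j + N/2
    low = (a, 0) if a <= b else (b, 1)
    high = (c, 0) if c <= d else (d, 1)
    return low, high


print(butterfly(0, 3, 0))
print(butterfly(4, 1, 2))
```

Because both ACS operations share the same two path metrics and the same branch metric, hardware with two accumulators and a split ALU can evaluate them together.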
30
Viterbi Coprocessor in C6416
31
Viterbi Coprocessor in C6416
- SM, SD and HD memory are not accessible to the DSP
32
Today
- Background
- Advanced DSP architectures -- TI C6x [15]
- Viterbi algorithm basics [10]
- Viterbi on TI DSPs [10]
- A programmable processor specifically designed for Viterbi [15]
33
Need for VSP architecture
- Large amount of memory access
- Traceback decoding
- Not efficient on a general-purpose processor (GPP)
- On a GPP, the number of program instructions is of a higher order than the complexity of the algorithm
34
VSP architecture
35
Branch Metric Calculation
36
Path Metric Calculation
37
Traceback Unit
38
Traceback with survivor updates
[Figure labels: Start filling the trellis; Start traceback; 5 * Constraint Length; Symbol decoded; Update survivor path for most recent symbol]
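The figure labels suggest that traceback starts once about 5 * constraint length stages of the trellis are filled, after which one symbol is decoded per new stage. The sketch below shows that fixed-depth (sliding window) traceback in software. It is not the VSP hardware scheme: the (7, 5) code with K = 3, the Hamming metric, and all function names are assumptions for illustration.

```python
from collections import deque

GENERATORS = (0b111, 0b101)    # assumed (7, 5) code
K = 3                          # assumed constraint length
NUM_STATES = 1 << (K - 1)
INF = float("inf")


def encode(bits):
    state, out = 0, []
    for b in bits:
        reg = (b << (K - 1)) | state
        out += [bin(reg & g).count("1") & 1 for g in GENERATORS]
        state = reg >> 1
    return out


def acs_stage(path, symbol):
    """One trellis stage: new path metrics and (prev_state, bit) per state."""
    new_path = [INF] * NUM_STATES
    decisions = [None] * NUM_STATES
    for s in range(NUM_STATES):
        if path[s] == INF:
            continue
        for b in (0, 1):
            reg = (b << (K - 1)) | s
            bm = sum((bin(reg & g).count("1") & 1) ^ r
                     for g, r in zip(GENERATORS, symbol))
            nxt = reg >> 1
            if path[s] + bm < new_path[nxt]:
                new_path[nxt] = path[s] + bm
                decisions[nxt] = (s, b)
    return new_path, decisions


def trace(window, path):
    """Walk back from the best current state; bits come out newest first."""
    state = min(range(NUM_STATES), key=lambda s: path[s])
    bits = []
    for decisions in reversed(window):
        state, b = decisions[state]
        bits.append(b)
    return bits


def sliding_window_decode(received, depth=5 * K):
    n = len(GENERATORS)
    path = [0] + [INF] * (NUM_STATES - 1)
    window, decoded = deque(), []             # window: most recent decisions
    for t in range(0, len(received), n):
        path, decisions = acs_stage(path, received[t:t + n])
        window.append(decisions)
        if len(window) == depth:              # trellis filled: start traceback
            decoded.append(trace(window, path)[-1])   # emit the oldest bit
            window.popleft()                  # circular reuse of survivor memory
    decoded.extend(reversed(trace(window, path)))     # flush the remaining bits
    return decoded


message = [1, 0, 1, 1, 0, 0, 1, 0] * 5 + [0, 0]
noisy = encode(message)
noisy[10] ^= 1                                # two isolated channel bit errors
noisy[50] ^= 1
decoded = sliding_window_decode(noisy)
print(decoded == message)
```

Only `depth` stages of survivor decisions are kept at any time, so memory stays fixed regardless of message length, at the cost of a decoding latency of `depth` symbols.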
39
Survivor Path Updates
40
Circular updates
41
Software Programming
- Small but specialized instruction set
  - LOAD, ACS
- Shorter execution time
- All 3 subprocessors programmed independently
- 10 ns cycle (100 MHz) in 1990 to get 1.5 Mbps
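The last figure implies a per-bit cycle budget, worked out below as back-of-the-envelope arithmetic. It assumes the 1.5 Mbps is decoded throughput at the 100 MHz clock, and that the cycles-per-bit cost would stay the same at higher data rates, which the slides do not claim.

```python
clock_hz = 100e6                 # 100 MHz clock from the slide
cycle_time_ns = 1e9 / clock_hz   # corresponds to the 10 ns on the slide
throughput_bps = 1.5e6           # 1.5 Mbps from the slide

# Clock cycles available per decoded bit at that operating point
cycles_per_bit = clock_hz / throughput_bps

# Hypothetical: clock needed for 100 Mbps at the same cycles per bit
target_bps = 100e6
required_clock_ghz = cycles_per_bit * target_bps / 1e9

print(cycle_time_ns, round(cycles_per_bit, 1), round(required_clock_ghz, 2))
```

Under these assumptions the processor spends roughly 67 cycles per decoded bit, so reaching the 100 Mbps wireless LAN range would take more parallelism rather than clock scaling alone.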
42
Conclusions
- The Viterbi algorithm is important to implement in a programmable communication receiver
- Approaches so far have been co-processor support for DSPs or specialized processors
- We are yet to design programmable processors that meet real-time requirements for 100 Mbps applications