1
Intro to the “c6x” VLIW processor
Texas Instruments TMS320C6000 series
TMS320C6700 subseries: includes floating point
VLIW = Very Long Instruction Word
2
Operations in Parallel
[Diagram: registers feeding the function units]
3
Operations in Parallel
[Diagram: registers, bypassing, function units]
4
Non-orthogonal
[Diagram: two register files, bypass, function units]
5
Non-orthogonal
[Diagram: register files A and B, bypass, function units L1 S1 M1 D1 and L2 S2 M2 D2]
*** See TI's picture ***
6
Specialized Function Units
L units: arithmetic, compare, and logical ops
S units: arithmetic, logical, branches, constant generation
M units: multiplies
D units: address generation / memory accesses
7
Complicated hardware
[Diagram: two register files]
8
Explicit parallelism
[Diagram: two register files]
9
Simple VLIW encoding
Slots that cannot be utilized are filled with no-ops
Bad for code density, cache utilization, energy, ...
10
C6X: Packets
One bit of each instruction indicates whether the next instruction can be executed in parallel (0 = “EOP”)
Any slot can go to any function unit
[Diagram: parallel bits 1 1 1 1 1 1]
11
C6X: Packets
One bit of each instruction indicates whether the next instruction can be executed in parallel
Any slot can go to any function unit
[Diagram: parallel bits 1 1 1 1 1 1]
12
C6X: Packets
One bit of each instruction indicates whether the next instruction can be executed in parallel
Any slot can go to any function unit
[Diagram: parallel bits 1 1 1 1 1 1 and 1 1 1 1 1 1]
Packet cannot cross an 8-word boundary
Resources constrain which instructions can be combined in the same packet
You can branch into the middle of a packet!
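The parallel-bit rule can be sketched in a few lines of Python. This is only an illustration of the grouping logic described on the slide: the function name `split_execute_packets`, the `(mnemonic, p_bit)` tuples, and the sample instruction mix are all made up for the example, and resource constraints and the 8-word boundary rule are not modelled.

```python
def split_execute_packets(fetch_packet):
    """Group (mnemonic, p_bit) pairs into lists of instructions issued together.

    p_bit == 1: the next instruction executes in parallel with this one.
    p_bit == 0: this instruction ends the current packet ("EOP").
    """
    packets, current = [], []
    for mnemonic, p_bit in fetch_packet:
        current.append(mnemonic)
        if p_bit == 0:          # end of packet: close the current group
            packets.append(current)
            current = []
    if current:                 # chain left open at the end of the input
        packets.append(current)
    return packets


# Hypothetical 8-instruction fetch packet (mnemonics chosen arbitrarily).
fetch_packet = [
    ("add", 1), ("mpy", 1), ("ldw", 0),
    ("sub", 0),
    ("b", 1), ("mvk", 1), ("cmplt", 1), ("stw", 0),
]
execute_packets = split_execute_packets(fetch_packet)
for group in execute_packets:
    print(" || ".join(group))
```

Because the grouping is carried by one bit per instruction, no slot has to be padded with a no-op just to mark a packet boundary.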
13
Explicit scheduling
Delay slots must be respected: there are no HW interlocks or scoreboarding
Multiply: 1 delay slot
Load: 4 delay slots
Branch: 5 delay slots
Right:
    B5 := B3 * B2
    <nop>
    B7 := B5 + B1
Wrong:
    B5 := B3 * B2
    B7 := B5 + B1
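To see why the multiply's delay slot matters when there are no interlocks, here is a toy Python model. It is a sketch under simplifying assumptions, not a description of the real pipeline: one instruction issues per cycle, a result becomes visible a fixed number of cycles after issue (`LATENCY`, a hypothetical table: 2 for a multiply, i.e. 1 delay slot, and 1 for an add), and the initial register values are invented.

```python
# Cycles after issue at which a result becomes readable (assumed values).
LATENCY = {"mul": 2, "add": 1}


def run(program, regs):
    """Issue one instruction per cycle with no interlocks.

    program: list of (op, dst, src1, src2) tuples, or None for a nop.
    Sources are read at issue time, so a too-early read sees the old value.
    """
    regs = dict(regs)
    pending = []  # (ready_cycle, dst, value)
    for cycle, instr in enumerate(program):
        # Commit every result whose latency has elapsed before reading.
        for ready, dst, value in [p for p in pending if p[0] <= cycle]:
            regs[dst] = value
        pending = [p for p in pending if p[0] > cycle]
        if instr is None:
            continue
        op, dst, a, b = instr
        value = regs[a] * regs[b] if op == "mul" else regs[a] + regs[b]
        pending.append((cycle + LATENCY[op], dst, value))
    for _, dst, value in sorted(pending):  # drain results still in flight
        regs[dst] = value
    return regs


init = {"B1": 1, "B2": 3, "B3": 4, "B5": 100, "B7": 0}
right = run([("mul", "B5", "B3", "B2"), None, ("add", "B7", "B5", "B1")], init)
wrong = run([("mul", "B5", "B3", "B2"), ("add", "B7", "B5", "B1")], init)
print(right["B7"], wrong["B7"])
```

In the schedule without the gap, the add reads the stale B5 (100 here) instead of the product, and nothing in the hardware stops it.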
14
Predicated execution
Why? To get rid of branches (5 delay slots * 8 wide ....)
Basic idea: a comparison result is stored in a condition register; this register is then used as an operand of other instructions, and its value causes those operations to be selectively enabled or squashed.
[Condition registers: A1, A2, B0, B1, B2]
Example: if (B3<B4) B3++ else B4++
15
Predicated execution
With branches:
        cmp B3, B4
        bge L2
        <nop>
        B3 := B3+1
        b DONE
L2:     B4 := B4+1
DONE:
With predicates:
        cmplt B3, B4, B0
        [B0]  B3 := B3+1
        [!B0] B4 := B4+1
...and the last two can be issued in parallel!
The control dependency has been converted to a data dependency...
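The equivalence of the two versions can be checked with a small Python sketch. The function names are hypothetical; the predicated version models each guarded operation as always being issued, with the guard deciding whether its result is written back.

```python
def with_branches(b3, b4):
    """if (B3 < B4) B3++ else B4++, using control flow."""
    if b3 < b4:
        b3 += 1
    else:
        b4 += 1
    return b3, b4


def with_predicates(b3, b4):
    """Same computation with the branch replaced by a condition register."""
    b0 = 1 if b3 < b4 else 0            # cmplt B3, B4, B0
    # Both guarded operations are issued; neither depends on the other,
    # which is why they could share one packet.
    new_b3 = b3 + 1 if b0 else b3       # [B0]  B3 := B3+1
    new_b4 = b4 + 1 if not b0 else b4   # [!B0] B4 := B4+1
    return new_b3, new_b4


for b3 in range(-2, 3):
    for b4 in range(-2, 3):
        assert with_branches(b3, b4) == with_predicates(b3, b4)
print(with_predicates(1, 5), with_predicates(5, 1))
```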
16
Assembly details
        .text
        .align 32
        .global proc
proc:   mvk 4, b3
        cmpgt b3, b4, b0
        [ b0] mvk.S2 9, b5
||      [!b0] mvk.S1 8, a5
        stw a5, *-a15[4]
        .....
17
Fetch/execute pipeline
PG  generate program address
PS  program address send
PW  program memory access
PR  fetch reaches CPU boundary
DP  instruction dispatch
DC  instruction decode
E1  execute 1
E2  execute 2
E3  execute 3
E4  execute 4
E5  execute 5
18
Addressing Modes
Mode             C equivalent
*R               (*R)
*+R[ucst5]       (R[ucst5])
*+R[offsetR]     (R[offsetR])
*-R[offsetR]     (R[-offsetR])
Special case: 15b offsets: *+B15[ucst15], *+B14[ucst15]
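The C equivalents imply array-style indexing, so the bracketed offset counts elements rather than bytes. A minimal Python sketch of the resulting address arithmetic follows; the helper name `effective_address` and the default element size of 4 bytes (a word access) are assumptions made for the example.

```python
def effective_address(base, offset=0, sign=+1, elem_size=4):
    """Byte address for *+R[offset] (sign=+1) or *-R[offset] (sign=-1).

    elem_size is the size in bytes of the accessed element; 4 is assumed
    here for a word access such as the stw in the assembly example.
    """
    return base + sign * offset * elem_size


print(hex(effective_address(0x1000)))          # *R
print(hex(effective_address(0x1000, 3)))       # *+R[3]
print(hex(effective_address(0x1000, 4, -1)))   # *-R[4]
```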
19
Addressing Modes
Pre/post increment/decrement
*++R, *R++
*++R[ucst5], *R++[ucst5]
*--R[ucst5], *R--[ucst5]
*++R[offsetR], *R++[offsetR]
*--R[offsetR], *R--[offsetR]
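The difference between the pre and post forms is only which value of the base register is used for the access; in both cases the register is updated. The Python sketch below illustrates this. The helper `access`, the register name, the default offset of one element for the forms without brackets, and the 4-byte element size are assumptions for the example.

```python
def access(regs, base, offset=1, sign=+1, pre=True, elem_size=4):
    """Return the address used by the access and update regs[base] in place.

    pre=True  models *++R[offset] / *--R[offset]: update, then access.
    pre=False models *R++[offset] / *R--[offset]: access, then update.
    """
    updated = regs[base] + sign * offset * elem_size
    address = updated if pre else regs[base]
    regs[base] = updated
    return address


regs = {"A4": 0x100}
first = access(regs, "A4", pre=True)             # *++A4
second = access(regs, "A4", pre=False)           # *A4++
third = access(regs, "A4", 2, -1, pre=True)      # *--A4[2]
print(hex(first), hex(second), hex(third), hex(regs["A4"]))
```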
20
Resources