TI C6701 VLIW MIMD.


1 TI C6701 VLIW MIMD

2 Presentation Outline
- Introduction / Overview
- Differentiating Features
- Assembly Syntax
- Instruction Flow
- Pipelining and Optimization
- Conclusion

3 Introduction
- TI’s C6000 family of VLIW architectures
- Flexibility from software

4 Characteristics Chart
Characteristic                          Value
Architecture                            VLIW
FPU                                     Yes
MFLOPs (Peak)                           1000
16x16 MACs (MMAC/s)                     334
8x8 MACs (MMAC/s)                       -
MIPS (Peak)                             1336
MOPS (Peak)                             336
Memory Bus Bandwidth (MB/s)             332
1K FP cfft (µsec)                       108
1K 16 bit cfft (µsec)                   -
1K FP dot product (µsec)                3.07
1K 16 bit dot product (µsec)            -
512² xFP Conv3x3 (msec)                 7.11
512² x8 bit Conv3x3 (msec)              -
512² x8 bit Erosion/Dilation (msec)     3.62
Figure 1: TI Data Sheet

5 Basic Overview
- Eight 32-bit instructions are fetched per clock cycle; this group is called a fetch packet
- Two multipliers and six ALUs for execution
- Two general-purpose register files (A and B)
- Eight functional units (.L1, .L2, .S1, .S2, .M1, .M2, .D1, and .D2)
- Two load-from-memory data paths per register file (LD1a and LD1b for A, LD2a and LD2b for B)
- Two data address paths (DA1 and DA2)
- Two register file data cross paths (1X and 2X)
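The fetch-packet size and the peak MIPS figure from the characteristics chart can be tied together with a little arithmetic. The sketch below assumes that peak MIPS is simply instructions issued per cycle times the clock rate in MHz; the implied clock rate is an inference from the chart, not a number stated on the slides.

```python
# Figures taken from the slides.
INSTRUCTIONS_PER_FETCH_PACKET = 8
INSTRUCTION_BITS = 32
PEAK_MIPS = 1336  # "MIPS (Peak)" in the characteristics chart

# Width of one fetch packet in bits.
fetch_packet_bits = INSTRUCTIONS_PER_FETCH_PACKET * INSTRUCTION_BITS

# Assumption: peak MIPS = instructions per cycle * clock rate (MHz),
# so the clock rate implied by the chart is:
implied_clock_mhz = PEAK_MIPS / INSTRUCTIONS_PER_FETCH_PACKET

print(fetch_packet_bits)   # 256
print(implied_clock_mhz)   # 167.0
```

The 256-bit result matches the fetch-packet boundary mentioned later in the deck.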

6 Architecture Overview

7 Differentiating Features
The features that differentiate the TI C6701 from other VLIW architectures are:
- Execute packets of variable length (each instruction is still a fixed 32 bits)
- Predication in all instructions
- Pipelining of branch instructions

8 Assembly Syntax
- Label
- Parallel Bars
- Conditions
- Instruction
- Functional Unit
- Operands
- Comments
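To make the field order above concrete, here is a minimal Python sketch that splits one assembly line into those seven fields. It assumes the common TI layout `label: || [condition] MNEMONIC .unit operands ; comment`, requires a colon after the label, and only recognizes basic unit names such as `.L1` or `.M2X`. The helper `parse_line` and the sample lines are hypothetical illustrations, not taken from the slides or from TI's tools.

```python
import re
from typing import List, NamedTuple, Optional


class AsmLine(NamedTuple):
    label: Optional[str]
    parallel: bool
    condition: Optional[str]
    mnemonic: str
    unit: Optional[str]
    operands: List[str]
    comment: Optional[str]


# Simplified pattern (an assumption, not TI's full grammar).
_LINE = re.compile(
    r"""^\s*
    (?:(?P<label>[A-Za-z_]\w*):)?\s*   # optional label followed by a colon
    (?P<parallel>\|\|)?\s*             # optional parallel bars
    (?:\[(?P<cond>!?\w+)\])?\s*        # optional condition in brackets
    (?P<mnemonic>[A-Za-z]\w*)\s*       # instruction mnemonic
    (?P<unit>\.[LSMD][12]X?)?\s*       # optional functional unit
    (?P<operands>[^;]*?)\s*            # operands, up to any comment
    (?:;(?P<comment>.*))?$             # optional comment after a semicolon
    """,
    re.VERBOSE,
)


def parse_line(line: str) -> AsmLine:
    """Split one assembly line into the fields listed on the slide."""
    m = _LINE.match(line)
    if m is None:
        raise ValueError(f"cannot parse: {line!r}")
    operands = [op.strip() for op in m["operands"].split(",") if op.strip()]
    comment = m["comment"].strip() if m["comment"] is not None else None
    return AsmLine(
        label=m["label"],
        parallel=m["parallel"] is not None,
        condition=m["cond"],
        mnemonic=m["mnemonic"],
        unit=m["unit"],
        operands=operands,
        comment=comment,
    )


first = parse_line("loop:   [!B0] ADD .L1 A1,A2,A3 ; running sum")
second = parse_line("|| MPY .M1 A4,A5,A6")
print(first)
print(second)
```

The second sample line starts with parallel bars, meaning it would execute in the same cycle as the line before it.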

9 Assembly Example

10 Instruction Flow
- Eight functional units, in two separate groups of four
- Each group has its own data path and its own half of the general-purpose registers
- The units in each pair are named .L1 and .L2, .M1 and .M2, .S1 and .S2, and .D1 and .D2
- The .L units are responsible for:
  - Logical operations
  - Data packing and unpacking
  - Some arithmetic

11 Instruction Flow
- 32 general-purpose registers (16 in each register file)
- 64-bit loads use the LDDW instruction
- LD1a carries the least-significant 32 bits and LD1b carries the most-significant 32 bits
- The .D units are cross-connected, so data can be loaded into either register file regardless of which register file supplied the data address
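The way a 64-bit load is divided between the two halves of a load path can be sketched in a few lines of Python. This only models the bit split described above (low word on LD1a, high word on LD1b); the function name `split_doubleword` is a hypothetical helper, and nothing here models the actual LDDW addressing or register-pair rules.

```python
from typing import Tuple


def split_doubleword(value: int) -> Tuple[int, int]:
    """Return (low32, high32) for a 64-bit value.

    low32 models what travels on LD1a, high32 what travels on LD1b.
    """
    if not 0 <= value < (1 << 64):
        raise ValueError("value must fit in 64 bits")
    low32 = value & 0xFFFFFFFF   # least-significant 32 bits
    high32 = value >> 32         # most-significant 32 bits
    return low32, high32


low, high = split_doubleword(0x1122334455667788)
print(hex(low), hex(high))   # 0x55667788 0x11223344
```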

12 Instruction Flow
- Fetch packets are aligned on 256-bit boundaries
- Important! An execute packet can’t cross a fetch packet boundary
- Execute packets are formed from the p-bit, the least-significant bit (bit 0) of each instruction: a p-bit of 1 means the next instruction executes in parallel with this one
- A maximum of eight instructions execute in parallel
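The p-bit rule can be sketched as a small Python routine that splits one fetch packet into execute packets. The sketch assumes the p-bit is bit 0 of each 32-bit word and that a p-bit of 1 chains the following instruction into the same execute packet; the instruction words in the example are made-up placeholders, not real opcodes.

```python
from typing import List


def split_execute_packets(fetch_packet: List[int]) -> List[List[int]]:
    """Group the eight words of a fetch packet into execute packets."""
    if len(fetch_packet) != 8:
        raise ValueError("a fetch packet holds exactly eight instructions")
    if fetch_packet[-1] & 1:
        # An execute packet may not cross the fetch packet boundary,
        # so the last instruction cannot chain to a following one.
        raise ValueError("p-bit of the last instruction must be 0")

    packets: List[List[int]] = []
    current: List[int] = []
    for word in fetch_packet:
        current.append(word)
        if word & 1 == 0:        # p-bit clear: this execute packet ends here
            packets.append(current)
            current = []
    return packets


# Placeholder words: the upper bits are an index, bit 0 is the p-bit.
p_bits = [1, 1, 0, 0, 1, 0, 0, 0]
words = [(i << 1) | p for i, p in enumerate(p_bits)]
sizes = [len(packet) for packet in split_execute_packets(words)]
print(sizes)   # [3, 1, 2, 1, 1]
```

With all p-bits set except the last, the whole fetch packet becomes a single eight-instruction execute packet; with all p-bits clear, the eight instructions execute one per cycle.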

13 Architecture Overview

14 Pipelining & Optimization
- The C6701 cannot look ahead and schedule instructions in hardware
- The number of instructions in each execute packet is the key to optimizing the code
- The number of extra clock cycles before an instruction’s result becomes available is called its number of delay slots
- Multi-cycle instructions have more delay slots, which strongly affects how the code around them must be scheduled
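A short Python sketch can illustrate how delay slots determine when a result may be consumed. The delay-slot counts in the table are the values commonly quoted for the C62x/C67x family (0 for single-cycle instructions such as ADD, 1 for MPY, 4 for LDW, 5 for a branch); they are not given on the slides, so treat them as assumptions to check against TI's instruction set reference. The function name is hypothetical.

```python
# Assumed delay-slot counts (check against TI's reference guide).
DELAY_SLOTS = {
    "ADD": 0,  # single-cycle instruction
    "MPY": 1,  # 16x16 multiply
    "LDW": 4,  # load word
    "B": 5,    # branch
}


def first_usable_cycle(mnemonic: str, issue_cycle: int) -> int:
    """Earliest cycle in which a later instruction can see the effect.

    An instruction with N delay slots issued in cycle c has its result
    available to an instruction issued in cycle c + N + 1.
    """
    return issue_cycle + DELAY_SLOTS[mnemonic] + 1


for name in DELAY_SLOTS:
    print(name, first_usable_cycle(name, issue_cycle=0))
# ADD 1
# MPY 2
# LDW 5
# B 6
```

Because the hardware does not look ahead, the cycles between issue and first use have to be filled by the programmer or compiler, either with independent instructions or with NOPs.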

15 Pipelining & Optimization
- An execute packet may contain NOPs
- Covering the delay slots of a multi-cycle instruction with NOPs ensures that the next execute packet can use that instruction’s result
- If a multi-cycle instruction uses a cross path, that cross path can’t be used again until the instruction has finished
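The NOP padding described above can be sketched as a naive Python pass over a straight-line sequence. It assumes one instruction per execute packet, no branches, and the same commonly quoted delay-slot counts (ADD 0, MPY 1, LDW 4), which are assumptions rather than figures from the slides. The tuple format and the function `pad_with_nops` are hypothetical. A real schedule would try to fill these cycles with useful parallel work instead of NOPs.

```python
from typing import Dict, List, Optional, Tuple

# Assumed delay-slot counts (check against TI's reference guide).
DELAY_SLOTS = {"ADD": 0, "MPY": 1, "LDW": 4}

# (mnemonic, destination register or None, source registers)
Instr = Tuple[str, Optional[str], List[str]]


def pad_with_nops(program: List[Instr]) -> List[str]:
    """Insert 'NOP n' wherever an operand is not ready yet."""
    ready: Dict[str, int] = {}   # register -> first cycle its value is readable
    out: List[str] = []
    cycle = 0
    for mnemonic, dest, sources in program:
        needed = max((ready.get(reg, 0) for reg in sources), default=0)
        stall = needed - cycle
        if stall > 0:
            out.append(f"NOP {stall}")   # multi-cycle NOP covers the gap
            cycle += stall
        out.append(mnemonic)
        if dest is not None:
            ready[dest] = cycle + DELAY_SLOTS[mnemonic] + 1
        cycle += 1
    return out


program = [
    ("LDW", "A1", ["A0"]),         # load A1 from the address in A0
    ("ADD", "A2", ["A1", "A3"]),   # needs the loaded value
]
padded = pad_with_nops(program)
print(padded)   # ['LDW', 'NOP 4', 'ADD']
```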

16 Execution Pipeline

17 AD vs. TI vs. Motorola

18 Conclusion
- The C6701 allows scheduling of instructions in the assembly code
- Unfortunately, a good understanding of the hardware is still necessary to schedule instructions in an optimized way
Thank You

