Published by Jack Sherman. Modified over 9 years ago.
1
Texas Instruments Course DSP
2
Module 1 Outline: Introduction; TI DSP Families; C6000 Architecture; C6000 Roadmap
3
Learning Objectives Why process signals digitally? Definition of a real-time application. Why use Digital Signal Processing processors? What are the typical DSP algorithms? Parameters to consider when choosing a DSP processor. Programmable vs ASIC DSP.
4
Present Day Applications (DSP: Technology Enabler)
Consumer Audio: Stereo A/D, D/A; PLL; Mixers
Multimedia: Stereo audio; Imaging; Graphics palette; Voltage regulation
Wireless / Cellular: Voice-band audio; RF codecs; Voltage regulation
HDD: PRML read channel; MR pre-amp; Servo control; SCSI transceivers
Automotive: Digital radio A/D/A; Active suspension; Voltage regulation
DTAD: Speech synthesizer; Mixed-signal processor
5
System Considerations
Performance
Interfacing
Power
Size
Ease-of-Use: Programming; Interfacing; Debugging
Integration: Memory; Peripherals
Cost: Device cost; System cost; Development cost; Time to market
6
Different Needs? Multiple Families!
C2000 (C20x/24x/28x), 'C1x, 'C2x: Lowest Cost. Control Systems: Motor Control; Storage; Digital Ctrl Systems
C5000 (C54x/55x), 'C5x: Efficiency, Best MIPS per Watt / Dollar / Size. Wireless phones; Internet audio players; Digital still cameras; Modems; Telephony; VoIP
C6000 (C62x/64x/67x), 'C3x, 'C4x, 'C8x: Max Performance with Best Ease-of-Use. Multi Channel and Multi Function App's: Comm Infrastructure; Wireless Base-stations; DSL; Imaging; Multi-media Servers; Video
7
Why go digital? Digital signal processing techniques are now so powerful that sometimes it is extremely difficult, if not impossible, for analogue signal processing to achieve similar performance. Examples: FIR filter with linear phase. Adaptive filters.
8
Why go digital? Analogue signal processing is achieved by using analogue components such as: Resistors. Capacitors. Inductors. The inherent tolerances associated with these components, temperature, voltage changes and mechanical vibrations can dramatically affect the effectiveness of the analogue circuitry.
9
Why go digital? With DSP it is easy to: Change applications. Correct applications. Update applications. Additionally DSP reduces: Noise susceptibility. Chip count. Development time. Cost. Power consumption.
10
Why NOT go digital? High frequency signals cannot be processed digitally for two reasons: Analog-to-Digital Converters (ADCs) cannot work fast enough. The application can be too complex to be performed in real-time.
11
Real-time processing. DSP processors have to perform tasks in real-time, so how do we define real-time? The definition of real-time depends on the application. Example: a 100-tap FIR filter is performed in real-time if the DSP can perform and complete the following operation between two samples:
$y(n) = \sum_{k=0}^{99} a_k * x(n-k)$
12
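Real-time processing. We can say that we have a real-time application if: Waiting Time ≥ 0, where Sample Time = Processing Time + Waiting Time.
[Figure: timeline from sample n to sample n+1; the Sample Time is split into Processing Time followed by Waiting Time]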
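The waiting-time condition can be checked with simple arithmetic. The sketch below is only an illustration: the function names are hypothetical (not a TI API), and the clock rate, sample rate, and per-sample cycle count are values the caller must supply from measurement or a datasheet.

```c
/* Cycles available between two consecutive samples (the sample time,
   expressed in CPU cycles). Integer division rounds down. */
long cycles_per_sample(long cpu_hz, long sample_hz)
{
    return cpu_hz / sample_hz;
}

/* Waiting time in cycles: sample time minus processing time. */
long waiting_cycles(long cpu_hz, long sample_hz, long processing_cycles)
{
    return cycles_per_sample(cpu_hz, sample_hz) - processing_cycles;
}

/* Real-time when the waiting time is zero or positive. */
int is_real_time(long cpu_hz, long sample_hz, long processing_cycles)
{
    return waiting_cycles(cpu_hz, sample_hz, processing_cycles) >= 0;
}
```

For example, with an assumed 200 MHz clock and 48 kHz sampling there are 4166 cycles per sample, so a routine that needs 100 cycles per sample leaves 4066 cycles of waiting time, while one that needs 5000 cycles is not real-time.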
13
Why do we need DSP processors? Why not use a General Purpose Processor (GPP) such as a Pentium instead of a DSP processor? What is the power consumption of a Pentium and a DSP processor? What is the cost of a Pentium and a DSP processor?
14
Why do we need DSP processors? Use a DSP processor when the following are required: Cost saving. Smaller size. Low power consumption. Processing of many “high” frequency signals in real-time. Use a GPP when the following are required: Large memory. Advanced operating systems.
15
What are the typical DSP algorithms? The Sum of Products (SOP) is the key element in most DSP algorithms:
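As one instance of a sum of products, an FIR filter computes each output sample as a weighted sum of the most recent inputs. The following is a minimal sketch, assuming 16-bit samples and coefficients; the function name and the memory layout (older samples stored directly before the newest one) are illustrative assumptions.

```c
#include <stddef.h>

/* One output sample of an N-tap FIR filter:
   y(n) = sum over k of a[k] * x(n - k).
   x_newest points at x(n); the caller guarantees that the
   taps - 1 older samples sit directly before it in memory. */
long fir_sample(const short *a, const short *x_newest, size_t taps)
{
    long acc = 0;
    for (size_t k = 0; k < taps; k++) {
        /* One multiply-accumulate (MAC) per tap. */
        acc += (long)a[k] * x_newest[-(ptrdiff_t)k];
    }
    return acc;
}
```

With coefficients {1, 2, 3} and input history {4, 5, 6} (6 being the newest sample), the result is 1*6 + 2*5 + 3*4 = 28.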
16
Outline: Introduction; TI DSP Families; C6000 Architecture (CPU; Buses and Peripherals); C6000 Roadmap
17
'C6000 Block Diagram
[Figure: CPU connected by internal buses to internal memory, external memory, and peripherals]
18
'C6000 System Block Diagram
[Figure: CPU with functional units .D1 .M1 .L1 .S1 and .D2 .M2 .L2 .S2 and register files A0-A15 and B0-B15; internal buses connect the CPU to internal memory, external memory, and peripherals]
19
What Problem Are We Trying To Solve?
Digital sampling of an analog signal. [Figure: amplitude A versus time t; signal chain ADC -> x -> DSP -> Y -> DAC]
Most DSP algorithms can be expressed with MAC:
$Y = \sum_{i=1}^{\text{count}} a_i * x_i$
    for (i = 0; i < count; i++) {
        sum += m[i] * n[i];
    }
What does it take to do this fast … and easy?
20
Fast MAC using only C
Fastest Execution of MACs: the 'C6x roadmap... from 200 to 2400 MMACs
Ease of C Programming: even using natural C, the 'C6000 architecture can perform 2 to 4 MACs per cycle; the compiler generates 80-100% efficient code
Multiply-Accumulate (MAC) in natural C code:
    for (i = 0; i < count; i++) {
        sum += m[i] * n[i];
    }
How does the 'C6000 achieve such performance from C?
21
Sample Compiler Benchmarks
Great out-of-box experience
Completely natural C code (non 'C6x specific)
Code available at: www.ti.com/sc/c6000compiler
Versus hand-coded assembly based on cycle count
How does the 'C6000 achieve such performance from C?
22
'C6000 Architecture: Built for Speed
[Figure: CPU core with register file A (A0 ... A15 ... A31), register file B (B0 ... B15 ... B31), functional units .M1 .L1 .D1 .S1 and .M2 .L2 .D2 .S2, controller/decoder, and memory]
'C6000 compiler excels at natural C
While dual-MAC speeds math-intensive algorithms, the flexibility of 8 independent functional units allows the compiler to quickly perform other types of processing
All 'C6000 instructions are conditional, allowing efficient hardware pipelining
Instruction set and CPU hardware orthogonality allow the compiler to achieve 80-100% efficiency
23
Fastest MAC using Natural C
    float mac(float *m, float *n, int count)
    {
        int i;
        float sum = 0;
        for (i = 0; i < count; i++) {
            sum += m[i] * n[i];
        }
        …
Compiler output:
    ;** --------------------------------------------------*
    LOOP:   ; PIPED LOOP KERNEL
               LDDW    .D1    *A4++,A7:A6
    ||         LDDW    .D2    *B4++,B7:B6
    ||         MPYSP   .M1X   A6,B6,A5
    ||         MPYSP   .M2X   A7,B7,B5
    ||         ADDSP   .L1    A5,A8,A8
    ||         ADDSP   .L2    B5,B8,B8
    || [A1]    B       .S2    LOOP
    || [A1]    SUB     .S1    A1,1,A1
    ;** --------------------------------------------------*
[Figure: CPU core with register file A (A0 ... A15 ... A31), register file B (B0 ... B15 ... B31), functional units .M1 .L1 .D1 .S1 and .M2 .L2 .D2 .S2, controller/decoder, and memory]
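The piped loop kernel above loads two floats per array per iteration and keeps one running sum on each side of the CPU (A8 and B8). A rough C sketch of that structure follows. It is an illustration, not the compiler's actual transformation: the name mac2 is hypothetical, count is assumed to be even, and the hardware's multi-cycle add latency is ignored.

```c
/* Sum of products with two independent partial sums, mirroring the
   kernel's A-side (A8) and B-side (B8) accumulators.
   Assumption: count is even, as the paired double-word loads imply. */
float mac2(const float *m, const float *n, int count)
{
    float sum_a = 0.0f;  /* products of even-indexed elements */
    float sum_b = 0.0f;  /* products of odd-indexed elements */
    for (int i = 0; i < count; i += 2) {
        sum_a += m[i] * n[i];
        sum_b += m[i + 1] * n[i + 1];
    }
    /* Combine the two partial sums once, after the loop. */
    return sum_a + sum_b;
}
```

Because floating-point addition is not associative, splitting the sum this way can round differently from the single-accumulator loop; for inputs such as {1, 2, 3, 4} and {5, 6, 7, 8} both give exactly 70.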
24
Outline: Introduction; TI DSP Families; 'C6000 Architecture (CPU; Buses and Peripherals); 'C6000 Roadmap
25
'C6000 System Block Diagram
[Figure: CPU with functional units .D1 .M1 .L1 .S1 and .D2 .M2 .L2 .S2, Register Set A, and Register Set B; internal buses connect the CPU to internal memory, external memory, and peripherals]
Looking at the internal buses...
26
'C6000 Internal Buses
Program (from PC): Program Addr x32; Program Data x256
Data (A regs, B regs): Data Addr T1 x32; Data Data T1 x32/64; Data Addr T2 x32; Data Data T2 x32/64
DMA: DMA Addr Read; DMA Data Read; DMA Addr Write; DMA Data Write
The buses connect to Internal Memory, External Memory, and Peripherals
27
'C6000 System Block Diagram
[Figure: CPU with functional units .D1 .M1 .L1 .S1 and .D2 .M2 .L2 .S2, Register Set A, and Register Set B; internal buses connect the CPU to internal memory and external memory]
Next, the internal memory...
28
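'C6711 Memory
Memory map (0000_0000 to FFFF_FFFF):
0000_0000  64KB Internal
0180_0000  On-chip Peripherals
8000_0000  External space 0
9000_0000  External space 1
A000_0000  External space 2
B000_0000  External space 3
External total: 128MB
[Figure: CPU with 4K Program Cache and 4K Data Cache in front of 64K Prog / Data (Level 2); links to cache details and cache logic]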
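The four external base addresses differ only in their top hex digit, so the external space number can be read straight from the address. The sketch below assumes each space runs up to the next base address and that space 3 ends at BFFF_FFFF; the slide gives only the base addresses, so those end points and the function name are assumptions.

```c
#include <stdint.h>

/* Returns the external space number (0-3) for an address in
   8000_0000..BFFF_FFFF, or -1 for any other address.
   Assumption: each space extends to the next listed base address. */
int external_space(uint32_t addr)
{
    uint32_t top = addr >> 28;           /* top hex digit of the address */
    if (top < 0x8u || top > 0xBu) {
        return -1;                       /* internal memory, peripherals, etc. */
    }
    return (int)(top - 0x8u);
}
```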
29
'C6711 Cache Logic
1. CPU requests data.
2. Is data in L1? Yes: go to step 5. No: go to step 3.
3. Is data in L2? Yes: go to step 4. No: copy data from external memory to L2, then go to step 4.
4. Copy data from L2 to L1.
5. Send data to CPU.
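The lookup flow can be written as a small state model. This is a toy sketch under stated assumptions: memory is a handful of lines tracked by presence flags, nothing is ever evicted automatically, and all names (cache_model, cpu_request, NUM_LINES) are hypothetical.

```c
#define NUM_LINES 8   /* hypothetical tiny memory, measured in cache lines */

typedef enum { FROM_L1, FROM_L2, FROM_EXTERNAL } source_t;

typedef struct {
    int in_l1[NUM_LINES];   /* nonzero if the line is present in L1 */
    int in_l2[NUM_LINES];   /* nonzero if the line is present in L2 */
} cache_model;

/* Follows the flowchart and reports where the data was found. */
source_t cpu_request(cache_model *c, int line)
{
    if (c->in_l1[line]) {
        return FROM_L1;          /* L1 hit: send data to CPU */
    }
    source_t src = FROM_L2;
    if (!c->in_l2[line]) {
        c->in_l2[line] = 1;      /* copy data from external memory to L2 */
        src = FROM_EXTERNAL;
    }
    c->in_l1[line] = 1;          /* copy data from L2 to L1 */
    return src;                  /* then send data to CPU */
}
```

A first request for a line reports FROM_EXTERNAL, a repeat request reports FROM_L1, and if the L1 flag is cleared by hand the next request reports FROM_L2.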
30
'C6711 Cache Details
Level 1 Program: always cache; 1 way cache (direct mapped); zero wait-state; line size: 512 bits (or 16 instr)
Level 1 Data: always cache; 2 way cache; zero wait-state; line size: 256 bits
Level 2: unified (prog or data); RAM or cache; 1-4 way cache; 32 data bytes in 4 cycles; 16 instr. in 5 cycles; line size: 1024 bits (or 128 bytes)
[Figure: CPU, L1 Prog (4KB), L1 Data (4KB), L2 Unified (64KB); bus widths labeled 256, 8/16/32/64, 128, 256]
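The sizes and line lengths above determine how many lines and sets each cache holds. The following sketch works that arithmetic out; the struct and function names are illustrative, and the Level 2 figures assume the whole 64KB is configured as 4-way cache.

```c
typedef struct {
    unsigned long size_bytes;  /* total cache capacity */
    unsigned long line_bits;   /* line size in bits, as given on the slide */
    unsigned long ways;        /* associativity (1 = direct mapped) */
} cache_cfg;

/* Line size in bytes. */
unsigned long line_bytes(cache_cfg c) { return c.line_bits / 8ul; }

/* Total number of lines the cache can hold. */
unsigned long num_lines(cache_cfg c) { return c.size_bytes / line_bytes(c); }

/* Number of sets: lines divided by associativity. */
unsigned long num_sets(cache_cfg c) { return num_lines(c) / c.ways; }
```

L1 Program (4KB, 512-bit lines, 1 way) comes to 64 lines of 64 bytes; L1 Data (4KB, 256-bit lines, 2 way) comes to 128 lines in 64 sets; Level 2 (64KB, 1024-bit lines) comes to 512 lines of 128 bytes.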
31
'C6000 System Block Diagram
[Figure: CPU with functional units .D1 .M1 .L1 .S1 and .D2 .M2 .L2 .S2, Register Set A, and Register Set B; internal buses connect the CPU to internal memory, external memory, and peripherals]
Looking at each peripheral...
32
'C6000 Peripherals
Peripherals: EMIF; XB, PCI, Host Port; McBSP's; Utopia; GPIO; VCP; TCP; DMA, EDMA (Boot); Timers; PLL
[Figure: 'C6000 block diagram with the CPU, internal buses, internal memory, external memory, and the peripherals listed above]
33
External Memory Interface (EMIF)
Glueless access to async/sync memory
Works with PC100 SDRAM (cheap, fast, and easy!)
Byte-wide data access
16, 32, or 64-bit bus widths
[Figure: 'C6000 block diagram with the EMIF connecting to SDRAM, Async, and SBSRAM memory]
34
Internal and External Memory
Devices | Internal | EMIF (A) size of range | EMIFB
C6201, C6204, C6205, C6701 | P = 64 KB, D = 64 KB | 52M Bytes (32-bits wide) | N/A
C6202 | P = 256 KB, D = 128 KB | 52M Bytes (32-bits wide) | N/A
C6203 | P = 384 KB, D = 512 KB | 52M Bytes (32-bits wide) | N/A
C6211, C6711 | L1P=4 KB, L1D=4 KB, L2=64 KB | 128M Bytes (32-bits wide) | N/A
C6712 | L1P=4 KB, L1D=4 KB, L2=64 KB | 64M Bytes (16-bits wide) | N/A
C6414, C6415, C6416 | L1P=16 KB, L1D=16 KB, L2=1 MB | 256M Bytes (64-bits wide) | 64M Bytes (16-bits wide)
35
HPI / XBUS / PCI: Parallel Peripheral Interface
HPI: Dedicated, slave-only, async 16/32-bit bus allows host-µP access to C6000 memory
XBUS: Similar to HPI but provides … master/slave and sync modes; glueless i/f to FIFOs (up to single-cycle xfer rate)
PCI: Standard 32-bit, 33MHz PCI interface
These interfaces provide means to bootstrap the C6000
[Figure: 'C6000 block diagram highlighting XBUS, PCI, Host Port alongside the EMIF]
36
General Purpose Input/Output (GPIO)
'C64x provides 8 or 16 bits of general purpose bitwise I/O
Use to observe or control the signal of a single pin
[Figure: 'C6000 block diagram highlighting GPIO]
37
McBSP and Utopia
Multi-Channel Buffered Serial Port (McBSP): 2 (or 3) full-duplex, synchronous serial ports; up to 100 Mb/sec performance; supports multi-channel operation (T1, E1, MVIP, …)
Utopia (C64x): ATM connection; 50 MHz wide area network connectivity
[Figure: 'C6000 block diagram highlighting McBSP's and Utopia]
38
Direct Memory Access (DMA / EDMA)
Transfers any set of memory locations to another
4 / 16 / 64 channels (transfer parameter sets)
Transfers can be triggered by any interrupt (sync)
Operates independent of CPU
On reset, provides bootstrap from memory
[Figure: 'C6000 block diagram highlighting DMA, EDMA (Boot)]
39
Timer / Counter
Two (or three) 32-bit timer/counters
Can generate interrupts
Both input and output pins
[Figure: 'C6000 block diagram highlighting Timers]
40
VCP / TCP: 3G Wireless
Turbo Coprocessor (TCP): supports 35 data channels at 384 kbps; 3GPP / IS2000 Turbo coder; programmable parameters include mode, rate and frame length
Viterbi Coprocessor (VCP): supports >500 voice channels at 8 kbps; programmable decoder parameters include constraint length, code rate, and frame length
[Figure: 'C6000 block diagram highlighting VCP and TCP]
41
PLL
External clock multiplier
Reduces EMI and cost
Pin selectable
Input: CLKIN
Output: CLKOUT1 (output rate of PLL; instruction (MIP) rate); CLKOUT2 (1/2 rate of CLKOUT1)
[Figure: 'C6000 block diagram highlighting the PLL]
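The clock relationships can be expressed in a few lines. This is a sketch only: the struct and function names are hypothetical, and the multiplier is passed in as a parameter because the slide says only that it is pin selectable without listing the available ratios.

```c
typedef struct {
    double clkout1_hz;  /* PLL output rate, equal to the instruction rate */
    double clkout2_hz;  /* half the rate of CLKOUT1 */
} pll_clocks;

/* Derives both output clocks from CLKIN and the selected multiplier. */
pll_clocks pll_outputs(double clkin_hz, unsigned multiplier)
{
    pll_clocks out;
    out.clkout1_hz = clkin_hz * multiplier;
    out.clkout2_hz = out.clkout1_hz / 2.0;
    return out;
}
```

For instance, an assumed 50 MHz CLKIN with an assumed x4 setting gives CLKOUT1 = 200 MHz and CLKOUT2 = 100 MHz.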
42
'C6000 Peripherals
Peripherals: EMIF; XB, PCI, Host Port; McBSP's; Utopia; GPIO; VCP; TCP; DMA, EDMA (Boot); Timers; PLL
[Figure: 'C6000 block diagram with the CPU, internal buses, internal memory, external memory, and the peripherals listed above]
43
Outline: Introduction; TI DSP Families; 'C6000 Architecture; 'C6000 Roadmap
44
C6000 Roadmap
[Figure: performance versus time; all devices software compatible]
1st Generation, C62x™ and C67x™ (floating point): C6201, C6701, C6202, C6203, C6211, C6711, C6204, C6205, C6712
2nd Generation, C64x™ DSP: C6414, C6415, C6416 (General Purpose, Media Gateway, 3G Wireless Infrastructure)
Highest Performance: C64x™ DSP 1.1 GHz; Multi-core
45
'C6000 Floating-Point
[Figure: performance versus time]
C30, C31, C32, C33 (150 MFLOPS)
C6701: 1 GFLOPS
C6711: 900 MFLOPS
C6712: 600 MFLOPS
C67x: 3 GFLOPS and beyond
46
TI Floating-Point Innovation
TI Floating Point - A History of Firsts:
First commercially-successful floating-point DSP: 'C30 (1987)
First floating-point DSP with multiprocessing support: 'C40 (1991)
First $10 floating-point DSP: 'C32 (1995)
First 1-GFLOPS DSP: 'C6701 (1998)
First $5 floating-point DSP: 'C33 (1999)
First 2-level cache floating-point DSP: 'C6711 (1999)
First to offer 600 MFLOPS for under $10: 'C6712 (2000)
47
Outline: Introduction; TI DSP Families; 'C6000 Architecture; 'C6000 Roadmap
Go to Module 2
Optional Topics: 'C6000 CPU Details; 'C6000 Instruction Summaries
48
What Problem Are We Trying To Solve?
Digital sampling of an analog signal. [Figure: amplitude A versus time t; signal chain ADC -> x -> DSP -> Y -> DAC]
Most DSP algorithms can be expressed as:
$Y = \sum_{i=1}^{40} a_i * x_i$
What are the two primary instructions?
49
The Core of DSP: Sum of Products
$y = \sum_{n=1}^{40} a_n * x_n$
The 'C6000 is designed to handle DSP's math-intensive calculations
Mult (.M unit):   MPY  .M  a, x, prod
ALU (.L unit):    ADD  .L  y, prod, y
Note: You don't have to specify functional units (.M or .L)
Where are the variables?
50
Working Variables: The Register File
$y = \sum_{n=1}^{40} a_n * x_n$
    MPY  .M  a, x, prod
    ADD  .L  y, prod, y
[Figure: Register File A, 16 registers of 32 bits each, holding a, x, prod, and y, feeding the .M and .L units]
How is the number of iterations specified?
51
Loops: Coding on a RISC Processor
1. Program flow: the branch instruction
    B    loop
2. Initialization: setting the loop count
    MVK  40, cnt
3. Decrement: subtract 1 from the loop counter
    SUB  cnt, 1, cnt
52
The “S” Unit: For Standard Operations
$y = \sum_{n=1}^{40} a_n * x_n$
            MVK  .S  40, cnt
    loop:   MPY  .M  a, x, prod
            ADD  .L  y, prod, y
            SUB  .L  cnt, 1, cnt
            B    .S  loop
[Figure: Register File A, 16 registers of 32 bits each, holding a, x, prod, y, and cnt, feeding the .M, .L, and .S units]
How is the loop terminated?
53
Conditional Instruction Execution
To minimize branching, all instructions are conditional
    [condition]  B  loop
Execution based on [zero/non-zero] value of specified variable
Code Syntax | Execute if:
[ cnt ]     | cnt ≠ 0
[ !cnt ]    | cnt = 0
Note: if condition is false, execution replaced with nop
54
Loop Control via Conditional Branch
$y = \sum_{n=1}^{40} a_n * x_n$
            MVK  .S  40, cnt
    loop:   MPY  .M  a, x, prod
            ADD  .L  y, prod, y
            SUB  .L  cnt, 1, cnt
      [cnt] B    .S  loop
[Figure: Register File A, 32 bits wide, holding a, x, prod, y, and cnt, feeding the .M, .L, and .S units]
How are the a and x array values brought in from memory?
55
Memory Access via “.D” Unit
$y = \sum_{n=1}^{40} a_n * x_n$
            MVK  .S  40, cnt
    loop:   LDH  .D  *ap, a
            LDH  .D  *xp, x
            MPY  .M  a, x, prod
            ADD  .L  y, prod, y
            SUB  .L  cnt, 1, cnt
      [cnt] B    .S  loop
[Figure: Register File A (16 registers) holding a, x, prod, y, cnt, *ap, *xp, and *yp, feeding the .M, .L, .S, and .D units; Data Memory holding x(40), a(40), y]
How do we increment through the arrays?
56
Auto-Increment of Pointers
$y = \sum_{n=1}^{40} a_n * x_n$
            MVK  .S  40, cnt
    loop:   LDH  .D  *ap++, a
            LDH  .D  *xp++, x
            MPY  .M  a, x, prod
            ADD  .L  y, prod, y
            SUB  .L  cnt, 1, cnt
      [cnt] B    .S  loop
[Figure: Register File A (16 registers) holding a, x, prod, y, cnt, *ap, *xp, and *yp, feeding the .M, .L, .S, and .D units; Data Memory holding x(40), a(40), y]
How do we store results back to memory?
57
Storing Results Back to Memory
$y = \sum_{n=1}^{40} a_n * x_n$
            MVK  .S  40, cnt
    loop:   LDH  .D  *ap++, a
            LDH  .D  *xp++, x
            MPY  .M  a, x, prod
            ADD  .L  y, prod, y
            SUB  .L  cnt, 1, cnt
      [cnt] B    .S  loop
            STW  .D  y, *yp
[Figure: Register File A holding a, x, prod, y, cnt, *ap, *xp, and *yp, feeding the .M, .L, .S, and .D units; Data Memory holding x(40), a(40), y]
But wait - that's only half the story...
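For readers more at home in C, the assembly loop maps roughly onto the function below, one statement per instruction. This is a sketch under assumptions: the function name is hypothetical, y is assumed to start at zero (the listing never initializes it), int is assumed to be 32 bits wide, and branch delay slots and load latencies are ignored.

```c
#define COUNT 40

/* a[] and x[] hold 16-bit values (LDH loads halfwords);
   the result is stored as a 32-bit word (STW). */
void sum_of_products(const short *ap, const short *xp, int *yp)
{
    int y = 0;             /* assumption: accumulator starts at zero */
    int cnt = COUNT;       /* MVK 40, cnt */
    do {
        short a = *ap++;   /* LDH *ap++, a */
        short x = *xp++;   /* LDH *xp++, x */
        int prod = a * x;  /* MPY a, x, prod */
        y += prod;         /* ADD y, prod, y */
        cnt -= 1;          /* SUB cnt, 1, cnt */
    } while (cnt);         /* [cnt] B loop: branch while cnt is non-zero */
    *yp = y;               /* STW y, *yp */
}
```

The do-while form matches the assembly, where the loop body always runs once before the conditional branch is evaluated.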
58
Dual Resources: Twice as Nice
[Figure: Register File A (A0-A15, 32 bits wide) holding a_n, x_n, prd, sum, cnt, *a, *x, and *y, with units .M1 .L1 .S1 .D1; Register File B (B0-B15, 32 bits wide) with units .M2 .L2 .S2 .D2]
Our final view of the sum of products example...
59
'C6000 System Block Diagram
[Figure: CPU with functional units .D1 .M1 .L1 .S1 and .D2 .M2 .L2 .S2, Register Set A, and Register Set B; internal buses connect the CPU to internal memory, external memory, and peripherals]
To summarize each unit's instructions...
60
'C62x RISC-like instruction set
.S Unit: ADD ADDK ADD2 AND B CLR EXT MV MVC MVK MVKH NEG NOT OR SET SHL SHR SSHL SUB SUB2 XOR ZERO
.L Unit: ABS ADD AND CMPEQ CMPGT CMPLT LMBD MV NEG NORM NOT OR SADD SAT SSUB SUB SUBC XOR ZERO
.M Unit: MPY MPYH MPYLH MPYHL SMPY SMPYH
.D Unit: ADD ADDAB (B/H/W) LDB (B/H/W) MV NEG STB (B/H/W) SUB SUBAB (B/H/W) ZERO
No Unit Used: IDLE NOP
61
'C67x: Superset of Fixed-Point
.S Unit: ADD ADDK ADD2 AND B CLR EXT MV MVC MVK MVKH NEG NOT OR SET SHL SHR SSHL SUB SUB2 XOR ZERO; floating-point additions: ABSSP ABSDP CMPGTSP CMPEQSP CMPLTSP CMPGTDP CMPEQDP CMPLTDP RCPSP RCPDP RSQRSP RSQRDP SPDP
.L Unit: ABS ADD AND CMPEQ CMPGT CMPLT LMBD MV NEG NORM NOT OR SADD SAT SSUB SUB SUBC XOR ZERO; floating-point additions: ADDSP ADDDP SUBSP SUBDP INTSP INTDP SPINT DPINT SPTRUNC DPTRUNC DPSP
.M Unit: MPY MPYH MPYLH MPYHL SMPY SMPYH; additions: MPYSP MPYDP MPYI MPYID
.D Unit: ADD ADDAB (B/H/W) LDB (B/H/W) MV NEG STB (B/H/W) SUB SUBAB (B/H/W) ZERO; addition: LDDW
No Unit Used: IDLE NOP
62
'C64x: Superset of 'C62x
.S Unit: PACK2 PACKH2 PACKLH2 PACKHL2 UNPKHU4 UNPKLU4 SWAP2 SPACK2 SPACKU4 SADD2 SADDUS2 SADD4 ANDN SHR2 SHRU2 SHLMB SHRMB CMPEQ2 CMPEQ4 CMPGT2 CMPGT4 BDEC BPOS BNOP ADDKPC
.L Unit: SHLMB SHRMB MVK (5-bit) ABS2 ADD2 ADD4 MAX MIN SUB2 SUB4 SUBABS4 ANDN PACK2 PACKH2 PACKLH2 PACKHL2 PACKH4 PACKL4 UNPKHU4 UNPKLU4 SWAP2/4
.D Unit: LDDW LDNW LDNDW STDW STNW STNDW MVK (5-bit) ADD2 SUB2 AND ANDN OR XOR ADDAD
.M Unit: MVD BITC4 BITR DEAL SHFL MPYHI MPYLI MPYHIR MPYLIR AVG2 AVG4 ROTL SSHVL SSHVR MPY2/SMPY2 DOTP2 DOTPN2 DOTPRSU2 DOTPNRSU2 DOTPU4 DOTPSU4 GMPY4 XPND2/4
Double-size Register sets (A16-A31) (B16-B31)
Advanced Instruction Packing (minimizes code-size)
Advanced Emulation Features