Overview of TigerSHARC processor ADSP-TS101 Compute Operations


Steve Daeninck, Electrical and Computer Engineering, University of Calgary, Alberta, Canada
smithmr @ ucalgary.ca

ENEL 619.23 -- Review of TigerSHARC Processor
1/12/2019

To be tackled today
    Reference sources
    Register file and operations
    ALU operations
    MAC operations (multiply and accumulate)
    SHIFTER operations
    Common errors
    Example exercises

Reference Sources
    ADSP-TS101 TigerSHARC Processor Programming Reference, Analog Devices web site.
    ADSP-TS101 TigerSHARC Hardware Reference, Analog Devices web site.
    ENEL 619.23 Course, Reference and Laboratory Notes.

Register File and COMPUTE Units

Key Points
    Each block can load/store 4x32 bit registers in a cycle.
    There are 4 inputs to the Compute block, but only 3 outputs back to the Register block.
    Highly parallel operations are possible UNDER THE RIGHT CONDITIONS.

Register File - Syntax

Key Points
    Each block has 32x32 bit data registers.
    Each register can store 4x8 bit, 2x16 bit or 1x32 bit words.
    Registers can be combined into dual or quad groups.
    These groups can store 8, 16, 32, 40 or 64 bit words.
    A dual group starts at a register number that is a multiple of 2; a quad group starts at a multiple of 4.

Register Syntax
    XR7     -> 1x32 bit word
    XSR3:2  -> 4x16 bit words
    XBR3:0  -> 16x8 bit words
    XFR1:0  -> 1x40 bit float
    XLR7:6  -> 1x64 bit word
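The packing rules above can be sketched in C. This is only an illustrative model, not TigerSHARC code: the function names are hypothetical, and placing element 0 in the least significant bits is an assumption of the sketch rather than something the slide states.

```c
#include <assert.h>
#include <stdint.h>

/* Pack four 8-bit values into one 32-bit register image.
   Assumption: b0 goes in the low byte. */
uint32_t pack_bytes(uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3)
{
    return (uint32_t)b0 | ((uint32_t)b1 << 8) |
           ((uint32_t)b2 << 16) | ((uint32_t)b3 << 24);
}

/* Pack two 16-bit values into one 32-bit register image (s0 in the low half). */
uint32_t pack_shorts(uint16_t s0, uint16_t s1)
{
    return (uint32_t)s0 | ((uint32_t)s1 << 16);
}

/* Read 16-bit lane 0 or 1 back out of a 32-bit register image. */
uint16_t get_short(uint32_t reg, unsigned lane)
{
    assert(lane < 2);
    return (uint16_t)(reg >> (16u * lane));
}

/* A dual group must start at a multiple of 2, a quad group at a multiple of 4,
   and the whole group must lie inside R0..R31. Returns 1 if valid, else 0. */
int is_valid_group(unsigned first_reg, unsigned count)
{
    if (count != 2 && count != 4) {
        return 0;
    }
    return first_reg < 32 && first_reg + count <= 32 && first_reg % count == 0;
}
```

For instance, `is_valid_group(2, 2)` accepts the dual group R3:2, while `is_valid_group(6, 4)` rejects a quad group starting at R6.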

Register File - BIT STORAGE
    128 bit examples are not shown, but they follow the same layout.

Volatile Data Registers

Volatile registers - no need to save
    24 volatile data registers in each block
        XR0 - XR23
        YR0 - YR23
    2 ALU summation registers in each block
        XPR0, XPR1, YPR0, YPR1
    5 MAC accumulate registers in each block
        XMR0 - XMR3, YMR0 - YMR3
        XMR4, YMR4 - overflow registers

PR stands for Parallel Results register.
MR stands for Multiplier Results register.

Arithmetic Logic Unit (ALU)
    2x64 bit input paths
    2x64 bit output paths
    8, 16, 32 or 64 bit addition/subtraction - fixed-point
    32 or 64 bit logical operations - fixed-point
    32 or 40 bit floating-point operations
    DAB - Data Alignment Buffer (2x128 bit FIFO), used to align misaligned quad or dual 32 bit data loads

Sample ALU Instruction

Example of 16 bit addition
    XYSR1:0 = R31:30 + R25:24;;

    Performs the addition in both the X and Y Compute Blocks.
    Other additions/subtractions look the same, but use 32 or 8 bit words.
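To make the idea of a 16 bit addition on a dual register concrete, here is a minimal C sketch of four independent 16 bit lanes being added inside one 64 bit value. The function name is hypothetical, and the wrap-around (non-saturating) behaviour is an assumption of the sketch; the slide does not say how overflow is handled.

```c
#include <stdint.h>

/* Add four independent 16-bit lanes held in a 64-bit value.
   Carries never cross a lane boundary.
   Assumption: each lane wraps modulo 2^16 instead of saturating. */
uint64_t add_4x16(uint64_t a, uint64_t b)
{
    uint64_t result = 0;
    for (unsigned lane = 0; lane < 4; ++lane) {
        unsigned shift = 16u * lane;
        uint16_t x = (uint16_t)(a >> shift);
        uint16_t y = (uint16_t)(b >> shift);
        uint16_t sum = (uint16_t)(x + y);   /* truncated to the lane width */
        result |= (uint64_t)sum << shift;
    }
    return result;
}
```

A plain 64 bit add would let the carry out of one lane corrupt its neighbour; treating the lanes separately is what distinguishes the short-word form from the long-word form.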

Sample ALU Instructions
    A neat instruction is the sideways sum (SUM).
    (Slide panels: Fixed-Point, Floating-Point.)
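A sideways sum adds the elements packed inside one register rather than adding two registers together. The C sketch below shows the idea for four 8 bit lanes in a 32 bit word. The function name is hypothetical and the lanes are treated as unsigned, which is an assumption; it does not model the exact operand forms of the SUM instruction.

```c
#include <stdint.h>

/* Sideways sum: add the four 8-bit lanes of one 32-bit word into one total.
   Assumption: lanes are treated as unsigned values. */
uint32_t sideways_sum_bytes(uint32_t reg)
{
    uint32_t total = 0;
    for (unsigned lane = 0; lane < 4; ++lane) {
        total += (reg >> (8u * lane)) & 0xFFu;
    }
    return total;
}
```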

Example

C code:
    int x_two = 64, y_two = 16;
    int x_three = 128, y_three = 8;
    int x_four = 128, y_four = 8;
    int x_five = 64, y_five = 16;
    int x_odd = 0, y_odd = 0;
    int x_even = 0, y_even = 0;

    x_odd = x_five + x_three;
    x_even = x_four + x_two;
    y_odd = y_five + y_three;
    y_even = y_four + y_two;

Assembly:
    XR2 = 64;;
    XR3 = 128;;
    XR4 = 128;;
    XR5 = 64;;
    YR2 = 16;;
    YR3 = 8;;
    YR4 = 8;;
    YR5 = 16;;
    XYR1:0 = R5:4 + R3:2;;
    // XR1 = x_odd, XR0 = x_even
    // YR1 = y_odd, YR0 = y_even

A nice example of the TigerSHARC: the four C additions collapse into a single instruction line.
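The register mapping in this example can be checked with a small C model. This is a sketch only: `ComputeBlock`, `dual_add`, `xy_dual_add` and `slide_example` are hypothetical names, and the model covers nothing beyond two 32 bit adds per block.

```c
#include <stdint.h>

/* One compute block's data registers R0..R31. */
typedef struct { int32_t r[32]; } ComputeBlock;

/* Model of Rd+1:d = Ra+1:a + Rb+1:b inside one block: two 32-bit adds. */
void dual_add(ComputeBlock *blk, unsigned d, unsigned a, unsigned b)
{
    blk->r[d]     = blk->r[a]     + blk->r[b];
    blk->r[d + 1] = blk->r[a + 1] + blk->r[b + 1];
}

/* Model of the XY prefix: the same operation runs in both blocks. */
void xy_dual_add(ComputeBlock *x, ComputeBlock *y,
                 unsigned d, unsigned a, unsigned b)
{
    dual_add(x, d, a, b);
    dual_add(y, d, a, b);
}

/* Replays the slide's loads and the single add line.
   Returns XR0, XR1, YR0, YR1 for which = 0, 1, 2, 3. */
int32_t slide_example(unsigned which)
{
    ComputeBlock x = {{0}}, y = {{0}};
    x.r[2] = 64; x.r[3] = 128; x.r[4] = 128; x.r[5] = 64;
    y.r[2] = 16; y.r[3] = 8;   y.r[4] = 8;   y.r[5] = 16;
    xy_dual_add(&x, &y, 0, 4, 2);           /* XYR1:0 = R5:4 + R3:2 */
    const int32_t out[4] = { x.r[0], x.r[1], y.r[0], y.r[1] };
    return out[which & 3u];
}
```

With the slide's values, R1 receives R5 + R3 (the "odd" sum) and R0 receives R4 + R2 (the "even" sum) in each block.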

Multiplier

Operates on fixed, floating and complex numbers.

Fixed-point numbers
    32x32 bit with 32 or 64 bit results
    4 (16x16 bit) with 4x16 or 4x32 bit results
    Data compaction: inputs of 16, 32 or 64 bits, outputs of 16 or 32 bit results

Floating-point numbers
    32x32 bit with 32 bit result
    40x40 bit with 40 bit result

Complex numbers
    32x32 bit with results stored in the MR register
    Fixed-point only
    The imaginary part is in the most significant half of the 32 bit word
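The complex layout described above (imaginary part in the upper half of the 32 bit word, real part in the lower half) can be sketched in C as follows. The names are hypothetical, and returning the result as two 64 bit integers is an assumption standing in for the wide MR accumulator; rounding, saturation and fractional formats are not modelled.

```c
#include <stdint.h>

/* Full-precision complex product; stands in for the wide MR result. */
typedef struct { int64_t re; int64_t im; } ComplexResult;

/* Real part lives in the low 16 bits, imaginary part in the high 16 bits. */
static int16_t real_part(uint32_t w) { return (int16_t)(uint16_t)(w & 0xFFFFu); }
static int16_t imag_part(uint32_t w) { return (int16_t)(uint16_t)(w >> 16); }

/* Build a packed complex word from signed 16-bit parts. */
uint32_t pack_complex(int16_t re, int16_t im)
{
    return (uint32_t)(uint16_t)re | ((uint32_t)(uint16_t)im << 16);
}

/* (a + jb)(c + jd) = (ac - bd) + j(ad + bc) */
ComplexResult complex_multiply(uint32_t p, uint32_t q)
{
    int64_t a = real_part(p), b = imag_part(p);
    int64_t c = real_part(q), d = imag_part(q);
    ComplexResult out = { a * c - b * d, a * d + b * c };
    return out;
}
```

For example, multiplying the packed forms of 1 + 2j and 3 + 4j gives -5 + 10j.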

Multiplier

MR stands for Multiplier Results register.

Fixed-point:
    XR0 = R1*R2;;
    XR1:0 = R3*R5;;
    XMR1:0 = R3*R5;;                    // uses XMR4 overflow
    XR2 = MR3:2, XMR3:2 = R3*R5;;       // integer multiply: R2 gets MR2; fractional: R2 gets MR3
    XR3:2 = MR1:0, XMR1:0 = R3*R5;;

Floating-point:
    XFR0 = R1*R2;;
    XFR1:0 = R3:2*R5:4;;                // 40 bit multiply, 32 bit mantissa
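The note about which half of the MR pair lands in the result register follows from where the useful bits of a 64 bit product sit. A minimal C sketch, with hypothetical function names: the integer result is the low word, and the fractional result is modelled as a Q31 value taken from the top of the product. The Q31 alignment (a shift by 31) is an assumption of this sketch, and saturation is not modelled.

```c
#include <stdint.h>

/* Full 64-bit product of two signed 32-bit operands (the MR-pair view). */
int64_t mul_full(int32_t a, int32_t b)
{
    return (int64_t)a * (int64_t)b;
}

/* Integer multiply with a 32-bit result: keep the low word of the product. */
int32_t mul_integer(int32_t a, int32_t b)
{
    return (int32_t)(uint32_t)((uint64_t)mul_full(a, b) & 0xFFFFFFFFu);
}

/* Fractional multiply with a 32-bit result: keep the top of the product.
   Assumption: operands are Q31, so the product is shifted right by 31.
   Right-shifting a negative value is arithmetic on common compilers. */
int32_t mul_fractional(int32_t a, int32_t b)
{
    return (int32_t)(mul_full(a, b) >> 31);
}
```

Here `mul_integer(6, 7)` is 42, while `mul_fractional(0x40000000, 0x40000000)` is `0x20000000`, that is 0.5 x 0.5 = 0.25 in Q31.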

Multiplier

16 bit multiplies
    Results can be 16 bit or 32 bit; each instruction below is labelled with its result width.
    MR4 contains four overflow bits for every MR register.
    No need to tell the instruction if it is a short or normal word.

    XR5:4 = R1:0*R3:2;;                     // 16 bit results
    XR7:4 = R3:2*R5:4;;                     // 32 bit results
    XMR1:0 += R3:2*R5:4;;                   // 16 bit results
    XMR3:0 += R3:2*R5:4;;                   // 32 bit results
    XR3:2 = MR3:2, XMR3:2 = R1:0*R5:4;;     // 16 bit results
    XR3:0 = MR3:0, XMR3:0 = R1:0*R5:4;;     // 32 bit results

Practice Examples

Convert from "C" into assembly code - use volatile registers

Exercise 1:
    long int value = 6;
    long int number = 7;
    long int temp = 8;
    value = number * temp;

Exercise 2:
    float value = 6;
    float number = 7;
    long int temp = 8;
    value = number * temp;

Avoiding common design errors

C code:
    float value = 6.0;
    float number = 7.0;
    long int temp = 8;
    value = value + (float) 1;
    number = number + (float) 2;
    temp = (int) (value + number);

Assembly:
    XR12 = 6.0;;
    XR13 = 7.0;;
    XR18 = 8;;
    XR0 = 1.0;;
    XR1 = 2.0;;
    XFR12 = R12 + R0;;
    XFR13 = R13 + R1;;
    XFR14 = R12 + R13;;
    XR18 = FIX FR14;;

Notes:
    It is questionable whether XFR12 = 1.0;; is allowed; the assembler complains.
    XR23:22 = 10.0;; is not allowed; there may not be an immediate load for 40 bit floats.

Avoiding common design errors

Convert from "C" into assembly code - use volatile registers

C code:
    float value = 6.0;
    float number = 7.0;
    long int temp = 8;
    value = number * (float) temp;

Assembly:
    XR12 = 6.0;;              // valueF12
    XR13 = 7.0;;              // numberF13
    XR18 = 8;;                // tempR18
    // (float) tempR18
    XFR18 = FLOAT R18;;
    // valueF12 = numberF13 * tempF18
    XFR12 = R13 * R18;;

Notes:
    XFR23:22 = R21*R22;; is not allowed.
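The reason the explicit FLOAT step matters can be shown in C: converting an integer's value to floating point is very different from handing the same 32 bits to a floating-point operation. The sketch below uses hypothetical function names and assumes a 32 bit IEEE-754 `float`. The C cast truncates toward zero, which may differ from the rounding FIX applies by default.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* What FLOAT does conceptually: convert the integer's value. */
float convert_to_float(int32_t n)
{
    return (float)n;
}

/* What happens without the conversion: the same 32 bits read as a float.
   Assumption: float is a 32-bit IEEE-754 single. */
float reinterpret_as_float(int32_t n)
{
    float f;
    assert(sizeof f == sizeof n);
    memcpy(&f, &n, sizeof f);
    return f;
}

/* What FIX does conceptually: convert a float's value to an integer.
   The C cast truncates toward zero. */
int32_t convert_to_int(float f)
{
    return (int32_t)f;
}
```

With `temp = 8`, `7.0f * convert_to_float(8)` gives 56.0, whereas `reinterpret_as_float(8)` is a tiny denormal number, so skipping the conversion produces a product near zero.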

Shifter Instructions
    2x64 bit input paths and 2x64 bit output paths
    32 or 64 bit shifting operations
    32 or 64 bit manipulation operations
    FEXT - bit field extraction
    FDEP - bit field deposit
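Bit field extraction and deposit can be sketched in C as a shift plus a mask. This is an illustration of the concept only: the function names and the `(pos, len)` parameters are hypothetical, the way FEXT and FDEP encode position and length in a control operand is not modelled, and merging the field into an existing target value is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low `len` bits set, for len in 1..32. */
static uint32_t field_mask(unsigned len)
{
    return (len == 32) ? 0xFFFFFFFFu : (((uint32_t)1 << len) - 1u);
}

/* Extract `len` bits starting at bit `pos`, right-justified, zero-filled. */
uint32_t field_extract(uint32_t value, unsigned pos, unsigned len)
{
    assert(len >= 1 && pos < 32 && len <= 32 - pos);
    return (value >> pos) & field_mask(len);
}

/* Deposit the low `len` bits of `field` into `target` at bit `pos`.
   Assumption: bits of `target` outside the field are preserved. */
uint32_t field_deposit(uint32_t target, uint32_t field,
                       unsigned pos, unsigned len)
{
    assert(len >= 1 && pos < 32 && len <= 32 - pos);
    uint32_t mask = field_mask(len) << pos;
    return (target & ~mask) | ((field << pos) & mask);
}
```

For example, `field_extract(0xABCD1234, 8, 8)` returns `0x12`, and depositing that value back at the same position leaves the word unchanged.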

Examples

C code:
    long int value = 128;
    long int high, low;
    low = value >> 2;
    high = value << 2;

Shift count sign:
    POSITIVE VALUE - LEFT SHIFT
    NEGATIVE VALUE - RIGHT SHIFT

Assembly:
    XR0 = 2;;
    XR1 = -2;;                    // negated shift count, used for the right shift
    XR2 = 128;;
    // low = value >> 2;
    XR23 = ASHIFT R2 BY -2;;
    // or
    XR23 = ASHIFT R2 BY R1;;
    // high = value << 2;
    XR22 = ASHIFT R2 BY 2;;
    // or
    XR22 = ASHIFT R2 BY R0;;
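C has separate `<<` and `>>` operators, while the shifter takes one signed count. A minimal C sketch of that convention, with a hypothetical function name and the shift count limited to -31..31 as an assumption:

```c
#include <assert.h>
#include <stdint.h>

/* Signed-count shift: positive count shifts left, negative shifts right.
   The right shift of a negative value is arithmetic on common compilers,
   although the C standard leaves it implementation-defined. */
int32_t ashift(int32_t value, int count)
{
    assert(count > -32 && count < 32);
    if (count >= 0) {
        return (int32_t)((uint32_t)value << count);
    }
    return value >> (-count);
}
```

With `value = 128`, `ashift(value, -2)` gives 32 (the C `low`) and `ashift(value, 2)` gives 512 (the C `high`).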

TS101 ALU instructions
    Under the RIGHT conditions, multiple operations can be done in a single instruction line.
    An instruction line has 4x32 bit instruction slots.
    Can do 2 compute and 2 memory operations. This is actually 4 compute operations, counting both compute blocks.
    One instruction per unit of a compute block, e.g. the ALU.
    Since there are only 3 result buses, only one unit (ALU or Multiplier) can use 2 result buses.
    Not all instructions can be used in parallel.

Dual Operation Examples

FRm = Rx + Ry, FRn = Rx - Ry;;
    Note that this uses 4 (8) different registers and not 6 (12).
    The source registers used around the + and - must be the same.
    Can be a floating-point (single or extended precision) or fixed-point (32 or 64 bit) add/subtract.

FRm = MRa, MRa += Rx * Ry;;
    MRa must be the same register(s).
    Can be used on fixed (32 or 64 bit results), floating (32 or 40 bit results) and complex numbers.
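The dual add/subtract produces a sum and a difference from the same two sources, which is the pattern of an FFT butterfly. A minimal C sketch of that one operation, with hypothetical names; the multiply-accumulate form is not modelled here.

```c
/* Both results of the dual add/subtract operation. */
typedef struct { float sum; float diff; } AddSub;

/* FRm = Rx + Ry, FRn = Rx - Ry: two results from one pair of sources. */
AddSub add_sub(float x, float y)
{
    AddSub out = { x + y, x - y };
    return out;
}
```

Because both results read the same `x` and `y`, only two source registers and two destination registers are involved, which matches the slide's count of 4 rather than 6.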

Tackled today
    Reference sources
    Register file and operations
    ALU operations
    Multiplier operations
    SHIFTER operations
    Common errors
    Example exercises