

TigerSHARC processor General Overview

6/28/2015 TigerSHARC processor, M. Smith, ECE, University of Calgary, Canada
2 Concepts tackled
– Introduction to capabilities of the TigerSHARC ADSP-TS201 processor
– Warning: you have the TS201S instruction manual and the TS101S hardware manual. At the moment the TS201S hardware manual is only available on the web.

3 Processor Architecture
– bit data busses
– 2 Integer ALU
– 2 Computational Blocks, each containing:
  – ALU (float and integer)
  – SHIFTER
  – MULTIPLIER
  – COMMUNICATIONS CLU

4 Integer ALU
– Except for having NO multiplier, it is essentially a “processor” unit with the capabilities of a 68K or MIPS processor
– Intended more as a DAG (data address generator), but can do integer math

5 X and Y Register File
– X: 32 locations
– Y: 32 locations
– Holds “bit patterns”
– Those bit patterns can be “floating point number” bit patterns OR “integer number” bit patterns, BUT NOT BOTH AT THE SAME TIME
– 10% of marks lost in the final and midterm will be associated with not understanding this issue (30% of time wasted in labs too)
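The “bit pattern” point can be made concrete off-target. The following is a minimal Python sketch, not TigerSHARC code: it assumes a 32-bit register and IEEE-754 single precision, and the helper names `float_to_bits` and `bits_to_float` are hypothetical. It shows one stored pattern read two ways.

```python
import struct

def float_to_bits(value: float) -> int:
    """Return the 32-bit IEEE-754 single-precision pattern for value."""
    return struct.unpack("<I", struct.pack("<f", value))[0]

def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE-754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# One hypothetical 32-bit register holding the pattern for the float 1.0
register = float_to_bits(1.0)

as_integer = register               # the same bits read as an integer
as_float = bits_to_float(register)  # the same bits read as a float

print(hex(register), as_integer, as_float)
# prints: 0x3f800000 1065353216 1.0
```

The register itself carries no type tag in this model; only the instruction that reads it decides whether 0x3F800000 means 1.0 or 1065353216.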

6 X and Y ALU
– Can handle floating point and integer operations by taking “bit patterns” from the register file and operating on them
– Very flexible
– 10% of marks lost in the final and midterm will be associated with not understanding this functionality (30% of time wasted in labs too)
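Why this costs marks: the ALU will apply whichever operation the instruction names, regardless of what the bit patterns were meant to be. The sketch below is an illustration in Python under the assumption of 32-bit registers and IEEE-754 single precision; `integer_add` and `float_add` are hypothetical stand-ins for an integer and a floating point ALU add, not real instruction names.

```python
import math
import struct

def f2b(value: float) -> int:
    """32-bit IEEE-754 pattern of a float."""
    return struct.unpack("<I", struct.pack("<f", value))[0]

def b2f(bits: int) -> float:
    """Float represented by a 32-bit pattern."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def integer_add(a_bits: int, b_bits: int) -> int:
    """Hypothetical integer add: treats both patterns as 32-bit integers."""
    return (a_bits + b_bits) & 0xFFFFFFFF

def float_add(a_bits: int, b_bits: int) -> int:
    """Hypothetical float add: treats both patterns as floats."""
    return f2b(b2f(a_bits) + b2f(b_bits))

a = f2b(1.5)    # 0x3FC00000
b = f2b(2.25)   # 0x40100000

right = float_add(a, b)     # pattern for 3.75
wrong = integer_add(a, b)   # integer sum of two float patterns

print(hex(right), b2f(right))
print(hex(wrong), b2f(wrong))
```

Using the integer add on two float patterns here produces 0x7FD00000, which decodes as a NaN rather than 3.75.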

7 SHIFTER
– Can handle integer operations by taking “bit patterns” from the register file and operating on them
– Very flexible
– 10% of marks lost in the final and midterm will be associated with not understanding this functionality

8 MULTIPLIER
– Multiplies integer and floating point “bit patterns” from the register file
– Very flexible
– 10% of marks lost in the final and midterm will be associated with not understanding this functionality (15% of time wasted in labs too)

9 CLU
– VERY, VERY FANCY
– COMPLEX ARITHMETIC (2, 8 and 16 bits)
– TRELLIS, VITERBI etc. capability
– Excellent individual projects, also Q9 on final exam (D-I-Yourself)

10 J and K DATA BUSES
– VERY FANCY
– 32-bit accesses
– 64-bit accesses
– 128-bit accesses
– 256-bit loads possible (128 bits to 4 X registers and 128 bits to 4 Y registers)
– Some special issues when loading QUAD values (4 at the same time) that are offset; these are handled with the DAB (data alignment buffer)
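The “offset” issue with QUAD loads is an alignment problem: a 128-bit access wants an address that is a multiple of 4 words. The Python sketch below is a simplified model only. It assumes memory is a list of 32-bit words and builds an offset quad from two aligned quads; the function names are hypothetical, and the way the real DAB buffers data across successive accesses is not modelled.

```python
QUAD = 4  # words per quad access (4 x 32 bits = 128 bits)

def aligned_quad_load(memory, address):
    """Load 4 consecutive words; the address must be a multiple of 4."""
    if address % QUAD != 0:
        raise ValueError(f"address {address} is not quad aligned")
    return memory[address:address + QUAD]

def offset_quad_load(memory, address):
    """Load 4 words from any address by combining two aligned quads.

    This only mimics the effect of an alignment buffer.
    """
    base = address - (address % QUAD)
    low = aligned_quad_load(memory, base)
    if address == base:
        return low
    high = aligned_quad_load(memory, base + QUAD)
    offset = address - base
    return (low + high)[offset:offset + QUAD]

memory = list(range(100, 116))       # 16 words holding 100..115
print(aligned_quad_load(memory, 4))  # prints: [104, 105, 106, 107]
print(offset_quad_load(memory, 5))   # prints: [105, 106, 107, 108]
```

An offset quad costs two aligned fetches in this model, which is one way to see why such loads need special handling.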

11 I-BUS DATA BUS
– VERY FANCY
– VLIW: very long and variable-length instruction word
– 32-bit, 64-bit, 96-bit and 128-bit accesses, all done with the IAB (instruction alignment buffer)
– BTB (branch target buffer) assists with many pipeline issues
– 10% of marks lost in the final and midterm will be associated with not understanding this functionality
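The four access sizes correspond to instruction lines of one to four 32-bit slots. As a rough illustration, the Python sketch below splits assembly text on `;;` (end of an instruction line) and `;` (separator within a line) and reports each line's width. It assumes every instruction occupies exactly one 32-bit slot, which is a simplification, and the function names are hypothetical.

```python
def instruction_lines(source):
    """Split assembly text into instruction lines (';;') and slots (';')."""
    lines = []
    for chunk in source.split(";;"):
        slots = [slot.strip() for slot in chunk.split(";") if slot.strip()]
        if slots:
            lines.append(slots)
    return lines

def line_width_bits(slots, bits_per_slot=32):
    """Width of one instruction line, assuming one 32-bit slot per instruction."""
    if not 1 <= len(slots) <= 4:
        raise ValueError("an instruction line holds 1 to 4 instructions")
    return len(slots) * bits_per_slot

source = "R1 = R2 + R3; R0 = [J1 += J5];; R4 = R5 + R1;;"
for slots in instruction_lines(source):
    print(line_width_bits(slots), slots)
# prints: 64 ['R1 = R2 + R3', 'R0 = [J1 += J5]']
# prints: 32 ['R4 = R5 + R1']
```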

12 Pipeline issues -- Normal instruction

13 Terminology
– The C++ compiler often inserts comments about “bubbles”
– Instr 2 needs the result from Instr 1
– But the Instr 1 result is not available until the end of the pipeline, so Instr 2 stalls

14 The explanation is not entirely clear; it seems 1 cycle out by my model
– STALL
– BUBBLE
– Once the STALL is broken, a BUBBLE (virtual NOP?) is inserted into the instruction stream
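A toy model helps when counting stalls by hand. The Python sketch below is not the TigerSHARC pipeline: it assumes in-order issue of one instruction per cycle and a single hypothetical parameter, `result_latency`, for how many cycles after issue a result becomes usable. The value 2 is a placeholder, not a figure from the hardware manual.

```python
def schedule(program, result_latency=2):
    """Return (dest, issue_cycle, stall_cycles) for each instruction.

    program is a list of (dest_register, [source_registers]).
    result_latency is a hypothetical placeholder, not a TigerSHARC figure.
    """
    ready_at = {}   # register -> first cycle its new value can be read
    cycle = 0       # earliest cycle the next instruction could issue
    timeline = []
    for dest, sources in program:
        earliest = max([ready_at.get(src, 0) for src in sources] + [cycle])
        timeline.append((dest, earliest, earliest - cycle))
        ready_at[dest] = earliest + result_latency
        cycle = earliest + 1
    return timeline

# Instr 2 needs R1 straight away, so it stalls
dependent = [("R1", ["R2", "R3"]), ("R4", ["R5", "R1"])]
print(schedule(dependent))   # prints: [('R1', 0, 0), ('R4', 2, 1)]

# An independent instruction placed in between hides the wait
filled = [("R1", ["R2", "R3"]), ("R0", ["J1"]), ("R4", ["R5", "R1"])]
print(schedule(filled))      # prints: [('R1', 0, 0), ('R0', 1, 0), ('R4', 2, 0)]
```

In both orderings R4 issues in cycle 2; the difference is whether cycle 1 does useful work or is lost to the stall.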

15 Many types of coding used in this course
– Compiler “debug” mode: inefficient code
– Compiler “release” mode: more efficient code (inter-procedural optimization, general parallel instructions). Use the .s output as a starting point for optimizing assembly code and for learning about instructions and optimizing techniques.
– Custom assembler “SISD” “debug mode”: not highly optimized, but no general inefficiencies. Lab 1 and Lab 2. Generally quizzable.
– Custom assembler “SISD” and “SIMD” “release modes”: coded in a way that we “avoid” probable stalls, rather than completely understanding them. Lab 2 and Lab 3. Somewhat quizzable.
– Custom assembler “SISD”, “SIMD” and “MIMD” “highly optimized mode”: understanding the concepts; very difficult to put actual questions into a quiz (too time consuming). Probably demonstrable in final lab. Lab 4, individual assignments. Need to know “what to worry about”.
– Dual processor mode: all of the above. Probably demonstrable in final lab. Lab 4, individual assignments. Need to know “what to worry about”.

16 Predicted jump – BTB hit
– BIG DELAYS
– NO DELAYS

17 Predicted jump – BTB miss
– BIG DELAYS
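The hit and miss cases can be pictured with a very small model of a branch target buffer. This Python sketch is an assumption-laden illustration: the class and its method are hypothetical, the buffer has unlimited capacity, and no cycle penalties are modelled. It only shows that the first pass through a loop branch misses and later passes hit.

```python
class BranchTargetBuffer:
    """Toy BTB: remembers the last target seen for each branch address."""

    def __init__(self):
        self.targets = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, branch_address, actual_target):
        """Record one executed branch; return True on a BTB hit."""
        if self.targets.get(branch_address) == actual_target:
            self.hits += 1
            return True
        self.misses += 1
        self.targets[branch_address] = actual_target  # learn for next time
        return False

btb = BranchTargetBuffer()
# A loop-closing branch at a made-up address 0x40 jumping back to 0x10, taken 5 times
outcomes = [btb.lookup(0x40, 0x10) for _ in range(5)]
print(outcomes)              # prints: [False, True, True, True, True]
print(btb.hits, btb.misses)  # prints: 4 1
```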

18 Non predicted branches
– XY: big loss

19 Non predicted branches
– JK: less loss

20 Pipeline issues – Predicted – not taken

21 Pipeline issues -- many
Predicted Branch not taken
  R1 = R2 + R3;;
  R0 = [J1 += J5];;
  R4 = R5 + R1;;
– Conflict on J-bus
– Data dependencies
Sort of acts like
  R1 = R2 + R3;;
  R0 = [J1];;
  stall
  J1 = J1 + J5;;
  R4 = R5 + R1;;

22 Pipeline issues -- many
  R1 = R2 + R3;;
  R0 = [J1 += J5];;
  R4 = R5 + R1;;
– Conflict on J-bus
– Data dependencies
Sort of acts like
  R1 = R2 + R3;;
  R0 = [J1];;
  stall
  J1 = J1 + J5;;
  R4 = R5 + R1;;
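The “sort of acts like” claim is about timing; the results are meant to be identical. The Python sketch below checks only the results. It treats registers as a dictionary and memory as a list of words, both hypothetical stand-ins, and it writes the final add as R4 = R5 + R1 in both versions so that the two sequences are comparable. The stall is not modelled.

```python
def run_combined(regs, memory):
    """R1 = R2 + R3;;  R0 = [J1 += J5];;  R4 = R5 + R1;;"""
    r = dict(regs)
    r["R1"] = r["R2"] + r["R3"]
    r["R0"] = memory[r["J1"]]     # load from the address in J1 ...
    r["J1"] = r["J1"] + r["J5"]   # ... then post-modify J1, same instruction
    r["R4"] = r["R5"] + r["R1"]
    return r

def run_split(regs, memory):
    """R1 = R2 + R3;;  R0 = [J1];;  (stall)  J1 = J1 + J5;;  R4 = R5 + R1;;"""
    r = dict(regs)
    r["R1"] = r["R2"] + r["R3"]
    r["R0"] = memory[r["J1"]]
    # the stall costs a cycle but changes no register, so nothing happens here
    r["J1"] = r["J1"] + r["J5"]
    r["R4"] = r["R5"] + r["R1"]
    return r

# Made-up starting values for illustration
regs = {"R0": 0, "R1": 0, "R2": 2, "R3": 3, "R4": 0, "R5": 5, "J1": 1, "J5": 2}
memory = [10, 20, 30, 40]

combined = run_combined(regs, memory)
split = run_split(regs, memory)
print(combined == split)                             # prints: True
print(combined["R0"], combined["J1"], combined["R4"])  # prints: 20 3 10
```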


24 Concepts tackled
– Introduction to capabilities of the TigerSHARC ADSP-TS201 processor
– Warning: you have the TS201S instruction manual and the TS101S hardware manual. At the moment the TS201S hardware manual is only available on the web.