Generating a software loop with memory accesses

Presentation transcript:

Generating a software loop with memory accesses TigerSHARC assembly syntax

Concepts
Learning just enough TigerSHARC assembly code to make a software loop “work”
Comparing the timings for rectification of integer and floating-point arrays, using debug C++ code, release C++ code, and our FIRST_ASM code
Looking in “MIXED mode” at the code generated by the compiler
12/5/2018 TigerSHARC assemble code 1, M. Smith, ECE, University of Calgary, Canada

Passing integer rectify

Add the “ASM” tests
Want the link to fail to find the mangled name
Name-mangled function name

More detailed look at the code
As with 68K, needs a .section, but the name and format are different
As with 68K, needs a .align statement. Is the “4” in bytes (8 bits) or words (32 bits)?
As with 68K, needs .global to tell other code that this function exists
Single semi-colons
Double semi-colons
Start function label
End function label
Label format similar to 68K: needs a leading underscore and a final colon

Using J8 for the returned int * value
Now passing this test “by accident”
Should be conditionally passing back NULL

Parameter passing
Spaces for the first four parameters are present on the stack (as with 68K)
But the first four parameters are passed in registers (J4, J5, J6 and J7 most of the time) (as with MIPS)
The parameters passed in registers are often stored into the spaces on the stack (like the MIPS) when assembly code functions call assembly code functions
J4, J5, J6 and J7 are volatile registers

Coding convention
// int *HalfWaveRectifyRelease(int initial_array[ ],
//                             int final_array[ ], int N)
#define initial_pt_inpar1 J4
#define final_pt_inpar2 J5
#define M_J6_inpar3 J6
#define return_pt_J8 J8
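For reference, the behaviour that the assembly routine has to reproduce can be sketched in C++. This is a minimal sketch, not the course's actual code: the slides give only the signature, so the condition for passing back NULL (a non-positive N or a null pointer) is an assumption.

```cpp
// Hypothetical C++ reference for the routine named in the slide comment.
// Assumption: NULL is returned for a non-positive N or a null pointer; the
// slides only say that NULL should be passed back "conditionally".
int *HalfWaveRectifyRelease(int initial_array[], int final_array[], int N) {
    if (N <= 0 || initial_array == nullptr || final_array == nullptr) {
        return nullptr;
    }
    for (int i = 0; i < N; ++i) {
        // Half-wave rectification: negative samples become zero.
        final_array[i] = (initial_array[i] < 0) ? 0 : initial_array[i];
    }
    return final_array;  // the assembly version returns this pointer in J8
}
```

The three parameters correspond to the registers named in the #define lines above: J4, J5 and J6, with the returned pointer in J8.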

ELSE is a KEYWORD
Missing ;; means all these instructions are joined into “1-line” of more than 4 instructions
Note END_IF not defined and not yet recognized as an error

Two issues
Jumps can be predicted to happen (default)
Quad stuff issue
Personally, because of name-mangling issues, I cut and paste the function name into labels

The code was not exactly what we designed (C++ equivalent)
NEXT STEP: refactor, and retest after the refactoring

For-loop structure: use 68K style of looping (jumps)

For-loop structure: use 68K style of looping (tests and jumps)
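The slide's assembly listing is not in the transcript, so the following is only a sketch of one common way to lay out a for-loop as a test plus jumps, written in C++ with labels and goto so that each line maps onto a single test or jump. The function name and label names are hypothetical.

```cpp
// Sums N values using only labels, tests and jumps, mirroring how the loop
// could be laid out in assembly code. All names here are illustrative.
int SumWithJumps(const int values[], int N) {
    int sum = 0;
    int count = 0;                     // loop counter set up before the loop
LOOP_TEST:
    if (count >= N) goto LOOP_END;     // test at the top; jump out when done
    sum += values[count];              // loop body
    count++;                           // increment the counter
    goto LOOP_TEST;                    // unconditional jump back to the test
LOOP_END:
    return sum;
}
```

Putting the test at the top means the body never runs when N is zero or negative, which matches the behaviour of a C++ for-loop.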

Accessing memory: basic mode
Special register J31 acts as zero when used in additions
Pt_J5 is a pointer register into an array
Read_J1 is being used as a data register
J registers are like MIPS registers (used as pointer and data), NOT like 68K registers (either data or address but not both)
Read_J1 = [Pt_J5];;
    Read value from memory location pointed to by J5
    Compare to 68K MOVE.L (A5), D1
Read_J1 = [Pt_J5 + 8];;
    Read value from memory location pointed to by the value (J5 + 8)
    Compare to 68K MOVE.L 8(A5), D1
    PRE-MODIFY: address used is J5 + 8, no change in J5
Read_J1 = [Pt_J5 + J31];;
    Read value from memory location pointed to by J5
    I read somewhere that this CAN be faster than just Read_J1 = [Pt_J5];; (NEED TO CONFIRM)

Accessing memory, step 2: basic mode
Pt_J5 is a pointer register into an array
Offset_J4 is used as an offset
Read_J1 is being used as a data register
Read_J1 = [Pt_J5 + Offset_J4];;
    Read value from memory location pointed to by (J5 + J4)
    PRE-MODIFY: address used is J5 + J4, no change in J5
    Compare to 68K MOVE.L (A5, D4), D1
Read_J1 = [Pt_J5 += Offset_J4];;
    Read value from memory location pointed to by J5, and then perform the add
    POST-MODIFY: address used is J5, then perform J5 = J5 + J4
    Compare to 68K MOVE.L (A5), D1 followed by ADD.L A4, A5, but as a single instruction
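The difference between the two modes can be modelled with C++ pointers. This is a sketch only: the helper names are hypothetical, and the offset is assumed to count array elements rather than bytes.

```cpp
// Models "Read = [Pt + Offset];;" (PRE-MODIFY):
// the address Pt + Offset is used and Pt itself is left unchanged.
int ReadPreModify(const int *&pt, int offset) {
    return *(pt + offset);
}

// Models "Read = [Pt += Offset];;" (POST-MODIFY):
// the address Pt is used, and afterwards Pt = Pt + Offset.
int ReadPostModify(const int *&pt, int offset) {
    int value = *pt;   // read from the current pointer first
    pt += offset;      // then update the pointer
    return value;
}
```

With the post-modify form and an offset of 1, successive calls walk through an array one element at a time, which is why that mode is convenient inside a loop.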

Many other addressing modes
Normal memory accesses
Merged memory accesses
Broadcast memory accesses
Single register accesses
Dual register accesses
Quad register accesses
Cross-over accesses
Access of COMPLEX numbers

For-loop structure: use 68K style of looping
QUAD ERROR ISSUE AGAIN

Write the “float-asm”
Integer 0 has bit pattern 0x0000 0000
Float 0.0 has bit pattern 0x0000 0000
Integer has format b S??? ???? ???? ???? ???? ???? ???? ????
Float has format b S??? ???? ???? ???? ???? ???? ???? ????
EXPONENT
Float algorithm: if S == 1 (negative), set to zero; otherwise leave unchanged. Same as the integer algorithm
Just re-use the integer algorithm with a change of name
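The sign-bit argument can be checked in C++ by treating each float's 32-bit pattern as a signed integer. This is a sketch under the assumption that float is a 32-bit IEEE 754 value; the function name is hypothetical, and the early return for a non-positive N is also an assumption.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::int32_t), "sketch assumes 32-bit float");

// Rectifies floats using the integer algorithm applied to the raw bit pattern.
float *HalfWaveRectifyFloatBits(const float initial_array[], float final_array[], int N) {
    if (N <= 0) return nullptr;   // assumed error convention
    for (int i = 0; i < N; ++i) {
        std::int32_t bits;
        std::memcpy(&bits, &initial_array[i], sizeof bits);   // copy the raw bits
        if (bits < 0) bits = 0;   // S == 1: store pattern 0x0000 0000, which is 0.0
        std::memcpy(&final_array[i], &bits, sizeof bits);
    }
    return final_array;
}
```

One side effect of working on the bit pattern is that -0.0, which has S == 1, is also stored as +0.0.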

Float ASM test

Do the timing tests

Weird results
             DEBUG       RELEASE     FIRST_ASM
INTEGER      426, 416    124, 118    316, 320
FLOAT        462, 458    210, 216    224, 222
Variation of about 6 cycles in testing
Our first ASM is faster than debug and slower than release; that was expected
Our integer code was slower than our float code; that was unexpected, since it is the same code
Can we optimize and improve the timing?

Integer release code – identify new instructions

Float release – identify new instructions

Exercise 1 – needed for Lab. 1
FIR filter operation: data and filter coefficients are both integer arrays, in C++

Exercise 1 – needed for Lab. 1
FIR filter operation: data and filter coefficients are both integer arrays, in ASM

Insert C++ code
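As a starting point for the exercise, one possible shape for the C++ version is sketched below. It is not the lab's reference solution: the function name, the signature, and the choice to pair data[i] with coeffs[i] (rather than reversing one array) are all assumptions, since the slides state only that both arrays hold integers.

```cpp
// Computes one FIR output sample as a sum of products over numTaps taps.
// Hypothetical signature; the pairing order of data and coefficients is assumed.
int FIRFilter(const int data[], const int coeffs[], int numTaps) {
    int sum = 0;
    for (int i = 0; i < numTaps; ++i) {
        sum += data[i] * coeffs[i];   // multiply-accumulate for tap i
    }
    return sum;
}
```

The loop body is a memory read from each array followed by a multiply-accumulate, so the assembly version can use the memory access modes covered earlier.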

Insert assembler code version

Concepts
Learning just enough TigerSHARC assembly code to make a software loop “work”
Comparing the timings for rectification of integer and floating-point arrays, using debug C++ code, release C++ code, and our FIRST_ASM code
Looking in “MIXED mode” at the code generated by the compiler