Process for changing “C-based” design to SHARC assembler: ADDITIONAL EXAMPLE
M. R. Smith, Electrical and Computer Engineering, University of Calgary, Canada
smithmr@ucalgary.ca

To be tackled today
- Need to set up a review process to look for, and remove, common errors when writing assembly code
- Process to translate a “C” program involving arrays into SHARC code
- Comparison of timings for non-optimized code, optimized code, hardware loops and the super-scalar architecture
1/1/2019 ENEL515 -- Translating “C-based” design to 21061 code Copyright smithmr@ucalgary.ca

Code review sheet -- PSP
Need to identify common errors:
- Constructs to link to “C” (CODE REVIEW): are all declarations at the start of the subroutine? #define CONSTANTS, variables and FunctionNames; EXPORT names with leading underscores; .segment declarations
- Assembly syntax (CODE REVIEW): self-documenting code using clanguage_register_defines.I; missing semicolons
- Conditional delayed branching properly handled (DESIGN REVIEW)
- Load/store architecture (DESIGN REVIEW): you can’t do R1 = R2 + 4; it becomes temp = 4; R1 = R2 + temp;
- Register operations, volatile registers, order of I and M registers (dm(I, M) post-modifies, dm(M, I) pre-modifies) (CODE REVIEW)
What is your favourite time-wasting error?

Simpler example of array handling
    void MakeRamp(float re_array[ ], int num) {
        int count;
        for (count = 0; count < num; count++) {
            re_array[count] = count;
        }
    }
THINGS TO WORRY ABOUT DURING TRANSLATION
- Prologue, epilogue (REVIEW)
- How to handle the LOAD/STORE architecture
- How to handle the for-loop
- How to handle the = count operation (int to float conversion)
- How to handle stepping through the array (post-modify)
- How to handle parameter passing
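As a reference point for the translation steps, here is a minimal C sketch of the slide's MakeRamp( ), with the opening brace of the parameter list corrected to a parenthesis so it compiles as plain C. Any array sizes used to exercise it are illustrative only.

```c
#include <assert.h>

/* Reference C version of MakeRamp( ) from the slide (parameter list fixed
   to use '(' so it compiles). Fills re_array[0..num-1] with 0.0, 1.0, ...
   Nothing is written when num <= 0. */
void MakeRamp(float re_array[], int num)
{
    int count;
    for (count = 0; count < num; count++) {
        re_array[count] = count;   /* implicit int-to-float conversion */
    }
}
```

Each later step can be checked against this version by comparing the arrays the two produce for the same num.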

Step 1 -- int to float conversion
Int to float conversion must be handled by YOU.
    void MakeRamp(float re_array[ ], int num) {
        int count;
        for (count = 0; count < num; count++) {
            re_array[count] = (float) count;
        }
    }
THINGS TO WORRY ABOUT DURING TRANSLATION
- Prologue, epilogue (REVIEW)
- How to handle the LOAD/STORE architecture
- How to handle the for-loop
- How to handle the = count operation (int to float)
- How to handle stepping through the array (post-modify)
- How to handle parameter passing

Watch for SHARC assembler nastiness
- The code F2 = dm(I1,1) disassembles as R2 = dm(I1,1), MEANING no special instruction is needed, as F2 and R2 are the same register. The translation is handled by the assembler.
- F2 = 1.0 is translated as R2 = bit pattern for 1.0.
- NASTY SIDE EFFECT: F2 = 1 is translated as R2 = bit pattern for 1, and is NOT translated as R2 = bit pattern for (float) 1. You get the effect of F2 = roughly 10^-45 (the integer bit pattern 0x00000001 read as an IEEE float is 2^-149, about 1.4 × 10^-45), which is not what you intended.
- Make sure that you always add the decimal point: .0
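The effect can be reproduced on an ordinary host with a short C sketch that exposes the two bit patterns. It assumes the host float is 32-bit IEEE-754 single precision; the helper names float_bits( ) and bits_as_float( ) are hypothetical, and whether the SHARC itself keeps or flushes the resulting denormal is not addressed here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is 32-bit IEEE-754 single precision on this host.
   Hypothetical helper: return the raw bit pattern stored for a float. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* copy bytes: avoids type-punning UB */
    return bits;
}

/* Hypothetical helper: treat a raw 32-bit pattern as a float, which is
   what "F2 = 1" amounts to when the integer pattern of 1 is loaded. */
float bits_as_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

float_bits(1.0f) is 0x3F800000, the pattern that "F2 = 1.0" loads, while bits_as_float(1u) is 2^-149, the smallest positive denormal.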

Step 2 -- Convert to use local pointers (in scope)
Use a local pointer, set to the pointer value passed in as a parameter.
    void MakeRamp(float *re_array, int num) {
        int count;
        dm float *arraypt = re_array;    // re_array itself is NOT A USEABLE POINTER
        for (count = 0; count < num; count++) {
            *arraypt = (float) count;
            arraypt++;
        }
    }
THINGS TO WORRY ABOUT DURING TRANSLATION
- Prologue, epilogue (REVIEW)
- How to handle the LOAD/STORE architecture
- How to handle the for-loop
- How to handle stepping through the array (post-modify)
- How to handle parameter passing
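A plain C sketch of Step 2 follows. Standard C has no dm memory qualifier, so it is dropped here; the assumption is that for this illustration it only affects which memory the data lives in, not the values written.

```c
#include <assert.h>

/* Step 2 sketch: walk the array with a local pointer instead of indexing.
   Assumption: the SHARC "dm" qualifier is omitted because plain C lacks it. */
void MakeRamp(float *re_array, int num)
{
    int count;
    float *arraypt = re_array;          /* local pointer, in scope */
    for (count = 0; count < num; count++) {
        *arraypt = (float) count;       /* explicit int-to-float conversion */
        arraypt++;                      /* post-modify: step to next element */
    }
}
```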

Step 3 -- load-store architecture
Use register variables and a scratch register.
    void MakeRamp(register float *re_array, register int num) {
        register int count = GARBAGE;
        register float scratch = GARBAGE;
        register dm float *arraypt = re_array;
        for (count = 0; count < num; count++) {
            scratch = (float) count;    // was *arraypt = (float) count;
            *arraypt = scratch;
            arraypt++;
        }
    }
THINGS TO WORRY ABOUT DURING TRANSLATION
- Prologue, epilogue (REVIEW)
- How to handle the LOAD/STORE architecture
- How to handle the for-loop
- How to handle parameter passing
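Step 3 can also be written as compilable C. GARBAGE is not a C identifier, so in this sketch the register variables are simply left uninitialised until first assigned, and dm is again omitted. The point is that the value is first computed into scratch and only then stored, mirroring a load/store machine.

```c
#include <assert.h>

/* Step 3 sketch: compute into a scratch "register" first, then store it.
   Assumptions: GARBAGE initialisers and the "dm" qualifier are omitted. */
void MakeRamp(register float *re_array, register int num)
{
    register int count;
    register float scratch;                 /* scratch register for the value */
    register float *arraypt = re_array;
    for (count = 0; count < num; count++) {
        scratch = (float) count;            /* compute into a register ...    */
        *arraypt = scratch;                 /* ... then a separate store      */
        arraypt++;
    }
}
```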

Step 4 -- convert the for-loop
    void MakeRamp(register float *re_array, register int num) {
        register int count = GARBAGE;
        register float scratch = GARBAGE;
        register dm float *arraypt = re_array;
        count = 0;
        while (count < num) {
            scratch = (float) count;
            *arraypt = scratch;
            arraypt++;
            count = count + 1;
        }
    }
THINGS TO WORRY ABOUT DURING TRANSLATION
- Prologue, epilogue (REVIEW)
- How to handle the for-loop: 68K-like, NOT OPTIMIZED
- How to handle parameter passing
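Step 4's while-loop can be checked the same way. In this sketch (dm omitted, as before) an extra trailing array element can serve as a sentinel to confirm that exactly num elements are written.

```c
#include <assert.h>

/* Step 4 sketch: the for-loop rewritten as an explicit while-loop, the
   68K-like form that maps onto a test-at-top / jump-back assembly loop.
   Assumption: the SHARC-specific "dm" qualifier is omitted. */
void MakeRamp(register float *re_array, register int num)
{
    register int count;
    register float scratch;
    register float *arraypt = re_array;

    count = 0;
    while (count < num) {           /* test at the top of the loop */
        scratch = (float) count;
        *arraypt = scratch;
        arraypt++;
        count = count + 1;          /* increment, then jump back   */
    }
}
```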

Step 4A -- convert the for-loop (alternative)
    void MakeRamp(register float *re_array, register int num) {
        register int count = GARBAGE;
        register float scratch = GARBAGE;
        register dm float *arraypt = re_array;
        count = num;
        if (num > 0) do {
            scratch = (float) count;    // PROBLEM: count now runs down from num to 1
            *arraypt = scratch;         // ALSO: how to handle this with a hardware loop (HWL)?
            arraypt++;
        } while (--count > 0);
    }
THINGS TO WORRY ABOUT DURING TRANSLATION
- Prologue, epilogue (REVIEW)
- How to handle the for-loop: 68K-like, NOT OPTIMIZED
- How to handle parameter passing
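The PROBLEM in Step 4A shows up as soon as the loop is run. The first function below is the slide's count-down loop; the second is one possible fix, not taken from the slides, in which the down-counter only controls the number of passes (much as a hardware loop counter would) and a separate up-counter supplies the stored value. Both function names are hypothetical.

```c
#include <assert.h>

/* Step 4A as on the slide (hypothetical name): because count runs down,
   the values stored are num, num-1, ..., 1 instead of 0, 1, ..., num-1. */
void MakeRampCountDown(float *re_array, int num)
{
    int count = num;
    float *arraypt = re_array;
    if (num > 0) {
        do {
            *arraypt = (float) count;   /* PROBLEM: uses the down-counter */
            arraypt++;
        } while (--count > 0);
    }
}

/* One possible fix (an assumption, hypothetical name): the down-counter
   only counts passes, while a separate up-counter provides the value. */
void MakeRampFixed(float *re_array, int num)
{
    int trips = num;
    int value = 0;
    float *arraypt = re_array;
    if (num > 0) {
        do {
            *arraypt = (float) value;
            value = value + 1;
            arraypt++;
        } while (--trips > 0);
    }
}
```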

Step 5 -- Prologue -- which registers?
    void MakeRamp(register float *re_array,      // INPAR1 (R4)
                  register int num) {            // INPAR2 (R8)
        register int count = GARBAGE;            // scratchR1
        register float scratch = GARBAGE;        // scratchF2 (not R2)
        register dm float *arraypt = re_array;   // scratchDMpt
        count = 0;
        while (count < num) {
            scratch = (float) count;
            *arraypt = scratch;
            arraypt++;
            count = count + 1;
        }
    }
NOW SEE WHY INPAR1 IS NOT A POINTER: re_array arrives in R4, a data register, while dm( ) accesses need an I (index) register, so it must be copied into scratchDMpt.
- Prologue: leaf routine, so no stack changes
- Epilogue: since it is a leaf routine, the standard 5 lines
- How to handle parameter passing

Step 6 -- Handle loop -- Part 1
    // void MakeRamp(register float *re_array, register int num) {
    #define numINPAR2 INPAR2
    #define countR1 scratchR1                  // register int count = GARBAGE;
        countR1 = 0;                           // count = 0;
    MR_WHILE:                                  // while (count < num) {
        ????                                   //     Loop body
        countR1 = countR1 + 1;                 //     count = count + 1;
        JUMP(PC, MR_WHILE) (DB);               // }
        nop;
        nop;
                                               // } end MakeRamp()

Step 7 -- Handle loop -- Part 2
    // void MakeRamp(register float *re_array, register int num) {
    #define numINPAR2 INPAR2
    #define countR1 scratchR1                  // register int count;
        countR1 = 0;                           // count = 0;
    MR_WHILE:
        COMP(countR1, numINPAR2);              // while (count < num) {
        if GE JUMP(PC, MR_ENDLOOP) (DB);       //   leave once count >= num
        nop;
        nop;
        ????                                   //     Loop body
        countR1 = countR1 + 1;                 //     count = count + 1;
        JUMP(PC, MR_WHILE) (DB);               // }
        nop;
        nop;
    MR_ENDLOOP:
        5 magic lines of code for “C” return   // }
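Before filling in the loop body, it may help to see Steps 6 to 8 as C with goto, one statement per assembly line. This is a sketch under the assumption that countR1, arraypt and tempF2 stand for the scratch registers named on the slides; the exit test must fire exactly when count < num is false, that is when count >= num, otherwise one extra element is written.

```c
#include <assert.h>

/* Sketch of the assembly loop structure in C. Assumption: the variables
   stand in for the scratch registers named on the slides. */
void MakeRamp(float *re_array, int num)
{
    int countR1 = 0;                 /* countR1 = 0;                          */
    float *arraypt = re_array;       /* arraypt = re_arrayINPAR1;             */
    float tempF2;

MR_WHILE:
    if (countR1 >= num)              /* COMP(countR1, numINPAR2) + exit jump  */
        goto MR_ENDLOOP;
    tempF2 = (float) countR1;        /* tempF2 = FLOAT countR1;               */
    *arraypt++ = tempF2;             /* dm(arraypt, 1) = tempF2;              */
    countR1 = countR1 + 1;           /* countR1 = countR1 + 1;                */
    goto MR_WHILE;                   /* JUMP(PC, MR_WHILE)                    */

MR_ENDLOOP:
    return;                          /* stands in for the 5 "magic" lines     */
}
```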

Reminder of what we are trying to do!
    void MakeRamp(register float *re_array, register int num) {
        register int count;
        register float scratch, *arraypt = re_array;
        for (count = 0; count < num; count++) {
            scratch = (float) count;
            *arraypt = scratch;
            arraypt++;
        }
    }

Step 8 -- handle loop body
    // void MakeRamp(register float *re_array, register int num) {
    .segment seg_pmco;
    .global _MakeRamp;
    _MakeRamp:
    #define re_arrayINPAR1 INPAR1
                                               // register int count;
    #define tempF2 scratchF2                   // register float temp = GARBAGE;
    #define arraypt scratchDMpt                // register dm float *arraypt = GARBAGE;
        arraypt = re_arrayINPAR1;              // arraypt = re_array;
                                               // for (count = 0; count < num; count++) {
        tempF2 = FLOAT countR1;                //     temp = (float) count;
        dm(arraypt, 1) = tempF2;               //     *arraypt = temp;
                                               //     arraypt++;  (done by the post-modify)
                                               // }
                                               // }

Final “C” Code Translation
- Code as directly translated
- Possible optimization: decide whether it is worth the effort of optimizing
- Optimized: don’t do it in this course unless asked for in quizzes and labs; it is very easy to get it wrong

    #define re_arrayINPAR1 INPAR1
    #define numINPAR2 INPAR2
    .global _MakeRamp;
    _MakeRamp:
    #define countR1 scratchR1
    #define arraypt scratchDMpt
        countR1 = 0;
        arraypt = re_arrayINPAR1;
    MR_WHILE:
        COMP(countR1, numINPAR2);
        if GE JUMP(PC, MR_ENDLOOP) (DB);
        nop;
        nop;
    #define tempF2 scratchF2
        tempF2 = FLOAT countR1;
        dm(arraypt, 1) = tempF2;
        countR1 = countR1 + 1;
        JUMP(PC, MR_WHILE) (DB);
        nop;
        nop;
    MR_ENDLOOP:
        5 magic lines of code for “C” return

Final “C” Code Translation
- Code as directly translated (7 + num * 10 instructions)
- Possible optimization, worth the effort? Best case would be 7 + num * 6 instructions
- Optimized: don’t do it in this course unless asked for in quizzes and labs; it is very easy to get it wrong
- Improved algorithm using the DSP architecture:
  - Hardware loop capability (8 + num * 2 instructions)
  - Activate super-scalar capability (7 + num * 1 instructions)

    #define re_arrayINPAR1 INPAR1
    #define numINPAR2 INPAR2
    .global _MakeRamp;
    _MakeRamp:
    #define countR1 scratchR1
    #define arraypt scratchDMpt
        countR1 = 0;                           // CAN’T BE MOVED
        arraypt = re_arrayINPAR1;              // CAN’T BE MOVED
    MR_WHILE:
        COMP(countR1, numINPAR2);
        if GE JUMP(PC, MR_ENDLOOP) (DB);
        nop;
    #define tempF2 scratchF2
        tempF2 = FLOAT countR1;
        JUMP(PC, MR_WHILE) (DB);
        dm(arraypt, 1) = tempF2;
        countR1 = countR1 + 1;
    MR_ENDLOOP:
        5 magic lines of code for “C” return
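Delayed branches are the subtle part of this optimization. The C model below is an illustration, not SHARC code: it assumes that with (DB) the two instructions after a jump always execute before the jump takes effect. The conditional exit's second delay slot holds FLOAT, which therefore also runs on the final pass; that is harmless because tempF2 is a scratch register whose value is then discarded.

```c
#include <assert.h>

/* C model of the delayed-branch schedule (an assumption about how to
   picture (DB), not SHARC code). Per pass: test, nop, FLOAT, then the
   backward jump whose two delay slots hold the store and the increment. */
void MakeRamp(float *re_array, int num)
{
    int countR1 = 0;
    float *arraypt = re_array;
    float tempF2;
    int exit_taken;

    for (;;) {
        exit_taken = (countR1 >= num);   /* COMP + conditional JUMP (DB)        */
        /* delay slot 1: nop */
        tempF2 = (float) countR1;        /* delay slot 2: runs even on exit     */
        if (exit_taken)
            break;                       /* the exit branch takes effect here   */
        /* JUMP(PC, MR_WHILE) (DB) issued; its two delay slots follow */
        *arraypt++ = tempF2;             /* delay slot 1: dm(arraypt, 1) = ...  */
        countR1 = countR1 + 1;           /* delay slot 2: increment             */
    }
}
```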

Final “C” Code Translation
- Code as directly translated (7 + num * 10 instructions)
- Possible optimization, worth the effort? Best case would be 7 + num * 6 instructions; the actual optimized code was 7 + num * 7 instructions
- Optimized: don’t do it in this course unless asked for in quizzes and labs; it is very easy to get it wrong
- Improved algorithm using the DSP architecture:
  - Hardware loop capability (8 + num * 2 instructions)
  - Activate super-scalar capability (7 + num * 1 instructions)
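To make the quoted figures concrete, the counts can be evaluated for a particular num. The function names are hypothetical, and the formulas are exactly those on the slide, not independent measurements.

```c
#include <assert.h>

/* Instruction counts per call, as quoted on the slide (hypothetical names).
   These restate the slide's formulas; they are not measured values. */
long direct_count(long num)      { return 7 + 10 * num; }
long best_case_count(long num)   { return 7 + 6 * num; }
long optimized_count(long num)   { return 7 + 7 * num; }
long hwloop_count(long num)      { return 8 + 2 * num; }
long superscalar_count(long num) { return 7 + 1 * num; }
```

For num = 100 the formulas give 1007 (direct), 707 (actual optimized), 208 (hardware loop) and 107 (super-scalar) instructions, so the hardware loop is roughly 4.8 times, and the super-scalar version roughly 9.4 times, shorter than the direct translation.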

Hardware loop
    void MakeRamp(register float *re_array, register int num) {
        register int count;
        register float scratch, *arraypt = re_array;
        num_INPAR2 = PASS num_INPAR2;
        if LE JUMP PASTFOR;                          // not delayed
        count_R0 = 0;
        LCNTR = num_INPAR2, DO (PC, PASTFOR-1) UNTIL LCE;
                                                     // for (count = 0; count < num; count++) {
            scratch_F1 = (float) count_R0;
            count_R0 = count_R0 + 1;                 // NOTE THIS
            *arraypt = scratch_F1;
            arraypt++;
        }
    PASTFOR:
    }

Hardware loop – VBasic mode
    void MakeRamp(register float *re_array, register int num) {
        register int count;
        register float scratch, *arraypt = re_array;
        num_INPAR2 = PASS num_INPAR2;
        if LE JUMP PASTFOR;                          // not delayed
        count_F1 = 0.0;
        plus1_F2 = 1.0;
        LCNTR = num_INPAR2, DO (PC, PASTFOR-1) UNTIL LCE;
                                                     // for (count = 0; count < num; count++) {
            *arraypt = count_F1;
            arraypt++;
            count_F1 = count_F1 + plus1_F2;          // NOTE THIS
        }
    PASTFOR:
    }
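The VBasic-mode idea, keeping count as a float and adding 1.0 each pass instead of converting, can be sketched in C as follows. The down-counting trips variable is a stand-in for LCNTR and is an assumption of this sketch.

```c
#include <assert.h>

/* Sketch of the float-accumulator loop. Assumption: "trips" stands in for
   the LCNTR hardware loop counter; dm and register mapping are omitted. */
void MakeRamp(float *re_array, int num)
{
    float count_F1 = 0.0f;
    const float plus1_F2 = 1.0f;     /* note the decimal point */
    float *arraypt = re_array;
    int trips;

    if (num <= 0)
        return;                      /* if LE JUMP PASTFOR */
    for (trips = num; trips > 0; trips--) {
        *arraypt++ = count_F1;               /* store the current value   */
        count_F1 = count_F1 + plus1_F2;      /* no int-to-float needed    */
    }
}
```

One caveat of the float accumulator: single-precision floats represent every integer exactly only up to 2^24 = 16777216, so beyond that point adding 1.0 no longer changes the value and the ramp stops increasing.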

Tackled today
- Need to set up a review process to look for, and remove, common errors when writing assembly code
- Process to translate a “C” program involving arrays into SHARC code
- Comparison of timings for non-optimized code, optimized code, hardware loops and the super-scalar architecture