Process for systematic conversion of a design in “C-pseudo code” to SHARC 21061 assembly code M. Smith, Electrical and Computer Engineering, University.


1 Process for systematic conversion of a design in “C-pseudo code” to SHARC 21061 assembly code
M. Smith, Electrical and Computer Engineering, University of Calgary, Canada
smithmr @ ucalgary.ca

2 6/15/2015 ENCM515 -- Process for “pseudo-C” design to 21k assembly Copyright smithmr@ucalgary.ca 2 / 56 -- 2 DAYS
Lots To Be Tackled Today (2 days)
    Setting up special processor constants and registers to gain speed during assembly language constructs
    Review of use of index and modify registers
    Prologue, Body and Epilogue of “C” program translated to assembly code (NO DIFFERENCE by hand or by compiler)
    Example conversion of “C” program into ADSP21061 using a standard procedure
        Take into account register architecture
        Take into account LOAD/STORE architecture
        Take into account standard assembly code problems
        Handle Program Flow Constructs
        Then do conversion of code on line by line basis
    Learning why to avoid calling “C” from assembly

3 ADSP-2106x Core Architecture

4 Typical 68K operations to Memory using Indirect Addressing
Manipulate a value using address register as a pointer
    MOVE.L (0, A0), D0        variable_D0 = *pt_A0
Read Adjacent Elements in an Array by incrementing the pointer
    MOVE.L (0, A0), D0        variable_D0 = *pt_A0
    ADD.L #4, A0              pt_A0++ (increment by 1)
    MOVE.L (0, A0), D1        variable_D1 = *pt_A0

5 Typical 21k operations to Memory using Indirect Addressing
Manipulate a value using address register as a pointer
    R0 = dm(0, I4)            c.f. MOVE.L (0, A0), D0
Read Adjacent Elements in an Array by incrementing the pointer
Note increment by 1 may change I4 by 2 or 4 -- WAIL
    R0 = dm(0, I4)            c.f. MOVE.L (0, A0), D0
    I4 = I4 + 1;              c.f. ADD.L #4, A0        **** ILLEGAL SHARC OPERATION
    R1 = dm(0, I4)            c.f. MOVE.L (0, A0), D1

6 Register and Register Ops in DAG1 [figure; callouts: SPECIAL CIRCBUFFER STUFF, SPECIAL FFT BIT]

7 Typical 21k operations to Memory using Indirect Addressing (Manual)
Read Adjacent Elements in an Array by incrementing the pointer manually
Note increment by 1 may change I4 by 2 or 4 bytes (WAIL)
    R0 = dm(0, I4)            c.f. MOVE.L (0, A0), D0
    M6 = 1;                   c.f. ADD.L #4, A0
    Modify(I4, M6)
    R1 = dm(0, I4)            c.f. MOVE.L (0, A0), D1
NOTE -- 68k D0, D1 equivalent to 21k R0, R1 but 68k A0 is similar to 21k I4
WHY M6 and not M4?

8 Typical 21k operations to Memory using Indirect Addressing
Read Adjacent Elements in an Array by incrementing the pointer -- TWO APPROACHES
Note increment by 1 may change I4 by 2 or 4 bytes
    M6 = 1;
    R0 = dm(M6, I4)           c.f. MOVE.L (4, A0), D0
        WATCH OUT -- OFFSET NOT INCREMENT
    R0 = dm(I4, M6)           c.f. MOVE.L (0, A0), D0
                                   ADD.L #4, A0
        WATCH OUT -- INCREMENT NOT OFFSET, BUT WITH THE POTENTIAL OF BEING A FASTER INSTRUCTION

9 PSP -- Code review to avoid DEFECT: Post-incrementing and OFFSET
    M6 = 1;
    R0 = dm(I4, M6);          POST-INCREMENT
        means R0 = dm(I4) and then I4 = I4 + M6
BUT
    R0 = dm(M6, I4);          OFFSET INDEX ONLY
        means R0 = dm(M6 + I4) and still keeps I4 = I4
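
In C terms, the two addressing forms on this slide behave roughly like `p[1]` (offset, pointer unchanged) and `*p++` (post-increment). The following is only a sketch of that analogy: the function names are hypothetical, and an ordinary C array stands in for SHARC data memory.

```c
/* Offset access, in the spirit of R0 = dm(M6, I4) with M6 = 1:
   reads the element one past p and leaves the pointer alone. */
int read_with_offset(const int *p) {
    return p[1];
}

/* Post-increment access, in the spirit of R0 = dm(I4, M6) with M6 = 1:
   reads the element p points at, then advances the caller's pointer. */
int read_post_increment(const int **pp) {
    int value = **pp;
    *pp += 1;
    return value;
}
```

Calling `read_with_offset` twice returns the same element both times, while calling `read_post_increment` twice walks through two adjacent elements, which is the defect the code review is meant to catch.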

10 Worked Example
    B4 = 4000;
    L4 = 0;                   ***** NORMAL APPROACH -- set to 0
    I4 = 4002;
    M6 = 1;                   PRESET TO 1 in “C” startup
    R0 = dm(M6, I4);          OFFSET INDEX ONLY
    R1 = dm(M6, I4);
        means R0 = dm(4002 + 1) and R1 = dm(4002 + 1) with I4 = 4002 still unchanged at the end of the code
    R0 = dm(I4, M6);          POST-INCREMENT
    R1 = dm(I4, M6);
        means R0 = dm(4002) and R1 = dm(4003) with I4 = 4004 at the end of the code

11 Effect of length register and Post-incrementing and OFFSET -- Lab. 2
    B4 = 4000;
    L4 = 3;                   *********** Normally set to zero NOT HERE
    I4 = 4002;                ** Allows 1 21k instruction = 63 68k instructions
    M6 = 1;                   ** Key DSP architecture characteristic
    R0 = dm(M6, I4);          OFFSET INDEX ONLY
    R1 = dm(M6, I4);
        means R0 = dm(4002 + 1) and R1 = dm(4002 + 1) with I4 = 4002 still unchanged
        NO CIRCULAR BUFFER!
    R0 = dm(I4, M6);          POST-INCREMENT
    R1 = dm(I4, M6);
        means R0 = dm(4002) BUT R1 = dm(4000) *(4003 - 3)* with I4 = 4001
        HARDWARE CIRCULAR BUFFER
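
The two worked examples (L4 = 0 and L4 = 3) can be checked with a small software model. This is a hedged sketch, not Analog Devices code: the `Dag` struct and both function names are hypothetical, and the wrap rule assumes a single modify step never exceeds the buffer length.

```c
/* Hypothetical model of one index/base/length register triple. */
typedef struct {
    int I;  /* index (pointer) register                 */
    int B;  /* base address of the circular buffer      */
    int L;  /* buffer length; 0 disables circular wrap  */
} Dag;

/* dm(M, I): address is I + M, I itself is not changed, no wrap is applied. */
int dag_offset_address(const Dag *d, int m) {
    return d->I + m;
}

/* dm(I, M): use the current I as the address, then update I = I + M,
   wrapping back into [B, B + L) when a length is set. */
int dag_post_modify(Dag *d, int m) {
    int address = d->I;
    int next = d->I + m;
    if (d->L != 0) {
        if (next >= d->B + d->L) {
            next -= d->L;
        } else if (next < d->B) {
            next += d->L;
        }
    }
    d->I = next;
    return address;
}
```

With B = 4000, L = 3 and I = 4002, two post-modify reads use addresses 4002 and 4000 and leave I at 4001, matching the slide; with L = 0 the same reads use 4002 and 4003 and leave I at 4004.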

12 ADSP-2106x Core Architecture

13 Register and Register Ops in DAG1 [figure; callouts: SPECIAL CIRCBUFFER STUFF, SPECIAL FFT BIT]

14 21k Code example
    .global _ConvertUsingPointers;
    // PROCEDURE: ConvertUsingPointers
_ConvertUsingPointers:
    modify(i7,-2);
    r2=i1;
    dm(-3,i6)=r2;
    i4=_centigrade;  // line 37
    r4=1072064102;  // line 38
    r2=dm(i4,m6);
    i1=_fahrenheit;
    r0=1107296256;  // line 40
    F12=F2*F4;
    lcntr=128, do(pc,_L$566002-1)until lce;
        F1=F0+F12, r2=dm(i4,m6);  // line 41
        F12=F2*F4, dm(i1,m6)=r1;
_L$566002:
    i12=dm(m7,i6);  // line 42
    jump(m14,i12) (DB);
        i1=dm(-3,i6);
        rframe;
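
The slides do not show the C source behind this listing, so the following is a reconstruction under stated assumptions rather than the original code. The loop count comes from `lcntr=128`; the constants 1.8 and 32.0 are what the two integer literals decode to when read as IEEE-754 single-precision bit patterns. The function name and the pointer parameters are hypothetical (the listing uses the globals `_centigrade` and `_fahrenheit`).

```c
#define CONVERT_COUNT 128  /* taken from lcntr=128 in the listing */

/* Assumed C equivalent of the loop: one multiply and one add per element,
   with both pointers post-incremented, as dm(i4,m6) and dm(i1,m6) do. */
void convert_using_pointers(const float *centigrade, float *fahrenheit) {
    for (int i = 0; i < CONVERT_COUNT; i++) {
        *fahrenheit++ = *centigrade++ * 1.8f + 32.0f;
    }
}
```

The assembly version is software-pipelined: the first load and multiply are issued before the loop, which is why each loop instruction pairs a computation for one element with a memory access for another.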

15 “C” on a Super-scalar RISC DSP (e.g. SHARC)
68K “C” involves many stack operations
    subroutine parameters passed on stack
    local variables stored on stack
    local arrays stored on stack
    return address on stack
    subroutines deeply nested
        int SomeFunction(int inpar1, float inpar2) {
            int count;
            float array[200];
            etc.
        }
“C” is not natural to SHARC processor -- SHARC has small hardware stack
“C” must be made to happen using stack operations working on a LIFO stack in data memory

16 21k Code example reformatted
    .global _ConvertUsingPointers;
    // PROCEDURE: ConvertUsingPointers
_ConvertUsingPointers:
    modify(i7,-2);  // PROLOGUE
    r2=i1;
    dm(-3,i6)=r2;
    i4=_centigrade;  // line 37
    r4=1072064102;  // line 38
    r2=dm(i4,m6);
    i1=_fahrenheit;
    r0=1107296256;  // line 40
    F12=F2*F4;
    lcntr=128, do(pc,_L$566002-1)until lce;
        F1=F0+F12, r2=dm(i4,m6);  // line 41
        F12=F2*F4, dm(i1,m6)=r1;
_L$566002:
    i12=dm(m7,i6);  // line 42
    // Hidden automatic processor NOP -- WAIL
    jump(m14,i12) (DB);  // RETURN TO “C”
        i1=dm(-3,i6);  // EPILOGUE
        rframe;

17 Making “C” work on the 21k
“C” is not natural to SHARC processor
Set aside certain index and length registers for STACK operations
    Index registers I6, I7 -- Corresponding length registers L6 and L7
    Index registers I6 (FP) and I7 (CTOPstack) NOT SP
        SP is a specialized SHARC hardware register for LIMITED non-C subroutine return addresses and not for parameter passing
    Corresponding length registers L6, L7 must be kept as 0
        LENGTH registers can provide some very useful properties to arrays and array handling -- circular buffers etc. -- Labs 2, 3 and 4
        These “useful” properties are EXACTLY what we DON’T want to happen with the array used as the “C” stack

18 21k Code example reformatted
    .global _ConvertUsingPointers;
    // PROCEDURE: ConvertUsingPointers
_ConvertUsingPointers:
    modify(CTOPofSTACK,-2);  // PROLOGUE
    r2=i1;
    dm(-3, FP)=r2;
    i4=_centigrade;  // line 37
    r4=1072064102;  // line 38
    r2=dm(i4,m6);
    i1=_fahrenheit;
    r0=1107296256;  // line 40
    F12=F2*F4;
    lcntr=128, do(pc,_L$566002-1)until lce;
        F1=F0+F12, r2=dm(i4,m6);  // line 41
        F12=F2*F4, dm(i1,m6)=r1;
_L$566002:
    i12=dm(m7, FP);  // line 42
    // Hidden automatic processor NOP -- WAIL
    jump(m14, i12) (DB);  // RETURN TO “C”
        i1=dm(-3, FP);  // EPILOGUE
        rframe;

19 Optimize SHARC assembly code for speed with special registerized constants in Modify registers
Store certain commonly used fixed constant offsets in non-volatile Modify registers -- These registers are set automatically during “C” Start-up code IF USED BY PROGRAM
    DAG1 -- M5 (0), M6 (+1), M7 (-1) -- accessing DM memory data
    DAG2 -- M13 (0), M14 (+1), M15 (-1) -- accessing PM memory data
Highly confusing to remember which register contains what when hand coding (and writing exams)
We will make use of a SHARC process involving a cdefines.i file to define standard register names for use with the 21k architecture when coding assembly language programs that link to “C” code during labs and exams.
Call it cdefines.i in lectures and labs. Actual file name clanguage_register_defines.i -- see Lab. 0/1 web

20 Hand-coded “C” compatible assembly
SHARC process -- use standard “clanguage_register_defines.i” file
    #define zeroDM    M5      (0 offset for DM ops)
    #define zeroPM    M13     (0 offset for PM ops)
    #define plus1DM   M6      (+1 offset for DM ops)
    #define plus1PM   M14     (+1 offset for PM ops)
    #define minus1DM  M7      (-1 offset for DM ops)
    #define minus1PM  M15     (-1 offset for PM ops)
Note how we must take the Harvard Architecture into account, so we can adjust both DM (data memory) and PM (program memory) index registers using Modify registers from both DAGs -- but they can’t be cross used

21 21k Code example reformatted
    .global _ConvertUsingPointers;
    // PROCEDURE: ConvertUsingPointers
_ConvertUsingPointers:
    modify(CTOPofSTACK,-2);  // PROLOGUE
    r2=i1;
    dm(-3, FP)=r2;
    i4=_centigrade;  // line 37
    r4=1072064102;  // line 38
    r2=dm(i4, plus1DM);
    i1=_fahrenheit;
    r0=1107296256;  // line 40
    F12=F2*F4;
    lcntr=128, do(pc,_L$566002-1)until lce;
        F1=F0+F12, r2=dm(i4, plus1DM);  // line 41
        F12=F2*F4, dm(i1, plus1DM)=r1;
_L$566002:
    i12=dm(minus1DM, FP);  // line 42
    // Hidden automatic processor NOP -- WAIL
    jump(plus1PM, i12) (DB);  // RETURN TO “C”
        i1=dm(-3, FP);  // EPILOGUE
        rframe;

22 SHARC process -- Respect the registers that the “C” compiler uses
Volatile Registers (not used by “C” compiler -- destroyed by “C”)
    R0, R1, R2 (also F0, F1, F2)
    R4, I4, M4 (also F4) (S.O.T.T.)
    R8 (also F8) (S.O.T.T.)
    R12, I12, M12 (also F12) (S.O.T.T.)
    S.O.T.T. means Some Of The Time -- special issues
Non-volatile Registers (used by “C” compiler)
    EVERYTHING ELSE
SHARC PROCESS -- Save and recover NON-VOLATILE registers

23 21k Code example reformatted
    .global _ConvertUsingPointers;
    // PROCEDURE: ConvertUsingPointers
_ConvertUsingPointers:
    modify(CTOPofSTACK,-2);  // PROLOGUE
    r2=i1;
    dm(-3, FP)=r2;
    i4=_centigrade;  // line 37
    r4=1072064102;  // line 38
    r2=dm(i4, plus1DM);
    i1=_fahrenheit;
    r0=1107296256;  // line 40
    F12=F2*F4;
    lcntr=128, do(pc,_L$566002-1)until lce;
        F1=F0+F12, r2=dm(i4, plus1DM);  // line 41
        F12=F2*F4, dm(i1, plus1DM)=r1;
_L$566002:
    i12=dm(minus1DM, FP);  // line 42
    // Hidden automatic processor NOP -- WAIL
    jump(plus1PM, i12) (DB);  // RETURN TO “C”
        i1=dm(-3, FP);  // EPILOGUE
        rframe;  // Hidden changes to FP and CTOPofSTACK

24 Conversion of “C” code to assembly -- an example
There is a one-to-one equivalence in concept to what happens on
    MIPS processor (2nd year)
    68K processor (3rd year)
    ADSP21061 processor (4th year)
    Most other processors
Remember that the concept is exactly the same on all processors EXCEPT THE IMPLEMENTATION IS DIFFERENT

25 3 parts of SHARC process to obtain “C” compatible assembly language code
    PROLOGUE     <- Always the “same”
    CODE BODY
    EPILOGUE     <- Always the “same”
Always the “same” means that you learn to write the code once and then use it with only minor modification each time you write code in the future
Just the same as with the 68K “C”/assembly compatibility taught in ENCM415. Look up those web-pages http://www.enel.ucalgary.ca/People/Smith/2001webs/01encml415

26 TYPICAL PROLOGUE FOR A SHARC PROCESS FUNCTION
21K
    .segment/pm seg_pmco;     <- to go into PM memory
    .global _Example;         WARNING -- SEMIColon
_Example:                     WARNING -- Colon
    (Set stack frame and save non-volatile registers)
These semicolons are needed because of the parallel capability of the processor instructions -- 4 operations in one instruction
68K
    .section code
    .export _Example
_Example:
    (Set stack frame and save non-volatile registers)

27 TYPICAL BODY OF FUNCTION
Use and reuse scratch registers
Use a standard process to avoid errors -- REQUIRED
    21K  scratchR1, scratchR2, scratchDMpt (I4), scratchPMpt (I12), scratchDMmod (M4), scratchPMmod
    68K  scratchD0, scratchD1, scratchA0pt, scratchA1pt
If you use non-volatile registers as well, then you must save them to the stack (during PROLOGUE) and recover them from the stack (during EPILOGUE) -- could slow the code, e.g. during interrupts

28 21k Code example reformatted
    .global _ConvertUsingPointers;
    // PROCEDURE: ConvertUsingPointers
_ConvertUsingPointers:
    modify(CTOPofSTACK,-2);  // PROLOGUE
    scratchR2 = i1;
    dm(-3, FP) = scratchR2;
    scratchDMpt = _centigrade;  // line 37
    r4=1072064102;  // line 38
    scratchR2 = dm(scratchDMpt, plus1DM);
    i1=_fahrenheit;
    r0=1107296256;  // line 40
    scratchF12 = scratchF2 * scratchF4;
    lcntr=128, do(pc,_L$566002-1)until lce;
        scratchF1 = scratchF0 + scratchF12, scratchR2 = dm(scratchDMpt, plus1DM);
        scratchF12 = scratchF2 * scratchF4, dm(i1, plus1DM) = scratchR1;
_L$566002:
    scratchPMpt = dm(minus1DM, FP);  // line 42
    // Hidden automatic processor NOP -- WAIL
    jump(plus1PM, scratchPMpt) (DB);  // RETURN TO “C”
        i1=dm(-3, FP);  // EPILOGUE
        rframe;  // Hidden changes to FP and CTOPofSTACK

29 Hidden issues
    scratchR2 = dm(scratchDMpt, plus1DM);
    i1=_fahrenheit;
    r0=1107296256;  // line 40
    scratchF12 = scratchF2 * scratchF4;          Where did F2 come from?
    lcntr=128, do(pc,_L$566002-1)until lce;
        scratchF1 = scratchF0 + scratchF12, scratchR2 = dm(scratchDMpt, plus1DM);
        scratchF12 = scratchF2 * scratchF4, dm(i1, plus1DM) = scratchR1;          Where did F1 go to? Where did F2 come from?
scratchR2 = dm(scratchDMpt, plus1DM) is a “bit-pattern fetch from memory” instruction (char, int, float) NOT an “integer fetch from memory” instruction
The registers STORE ‘bit patterns’ not integer or floating point values. The “floating-point-ness” or “integer-value-ness” is a property of the ALU (operations) and NOT the registers themselves!!!!!
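
The “bit pattern” point can be seen directly in the two constants the listing loads into r4 and r0. A minimal sketch, assuming `float` is a 32-bit IEEE-754 single-precision type on the host; the helper name `bits_to_float` is hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float without changing any bits.
   memcpy is used so the reinterpretation is well defined in C. */
float bits_to_float(uint32_t bits) {
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

Under that assumption, `bits_to_float(1072064102u)` is 1.8f and `bits_to_float(1107296256u)` is 32.0f: the same register contents that look like large integers in the listing act as the scale and offset of a Centigrade-to-Fahrenheit conversion once a floating-point operation uses them.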

30 SHARC PROCESS TYPICAL EPILOGUE
68K ---- “C” Stack is part of normal processor
    (Recover any saved non-volatile registers)
    ADD.L #FRAME_SIZE, SP     <-- Recover stack space (destroy stack frame)
    RTS                       <--- Uses SP (A7) by design
21K ---- There is a 21061 Hardware stack -- 6 or 8 deep. “C” Stack is NOT part of this hardware stack.
    (Recover any saved non-volatile registers)
    Activate 21061 code to perform destroy stack frame equivalent -- 68k UNLINK FP
    Activate 21061 code to perform RTS equivalent
Designers have added instructions to the architecture to support the software stack associated with “C” coding -- CJUMP and RFRAME

31 21K return from “C” -- 5 STANDARD MAGIC LINES
    scratchPMpt = dm(minus1DM, FP);
    nop;  // might be carefully filled -- TIMING ISSUE
    jump(plus1PM, scratchPMpt) (DB);
        nop;  // might be carefully filled
        RFRAME;  // “C” specific assembler instruction
Note use of the SHARC PROCESS of INDENTING INSTRUCTIONS to denote delayed branch instructions
Note all the key nops in the code (timing issues)
Always the same code. Cut and paste for now. Will become obvious later
Timing issue -- fetching of a DAG register and then using it

32 21k Code example reformatted
    .global _ConvertUsingPointers;
    // PROCEDURE: ConvertUsingPointers
_ConvertUsingPointers:
    modify(CTOPofSTACK,-2);  // PROLOGUE
    scratchR2 = i1;
    dm(-3, FP) = scratchR2;
    scratchDMpt = _centigrade;  // line 37
    r4=1072064102;  // line 38
    scratchR2 = dm(scratchDMpt, plus1DM);
    i1=_fahrenheit;
    r0=1107296256;  // line 40
    scratchF12 = scratchF2 * scratchF4;
    lcntr=128, do(pc,_L$566002-1)until lce;
        scratchF1 = scratchF0 + scratchF12, scratchR2 = dm(scratchDMpt, plus1DM);
        scratchF12 = scratchF2 * scratchF4, dm(i1, plus1DM) = scratchR1;
_L$566002:
    scratchPMpt = dm(minus1DM, FP);  // line 42
    // Hidden automatic processor NOP -- WAIL
    jump(plus1PM, scratchPMpt) (DB);  // RETURN TO “C”
        i1=dm(-3, FP);  // EPILOGUE
        rframe;  // Hidden changes to FP and CTOPofSTACK

33 STANDARD SHARC PROCESS
We want to have a PROCESS to convert the basic parts of a design in “C” pseudo-code to SHARC 21k assembly code
Minimize ERRORS -- jumping backwards and forwards between editor, assembler and linker while developing a prototype
    ERRORS become the big time waster when jumping to and from the simulator while testing this prototype.
Minimize DEFECTS -- Defects are the carry over of the mistakes from one apparently working prototype into another prototype -- HUGE TIME WASTER

34 Remember the 5-OR-60 rule
Spend enough time in design and code review. An EXTREME PROGRAMMING APPROACH with 5 minutes for design and code review will save you 60 minutes during testing.
What’s enough time? -- SEI INDUSTRY VALIDATION

35 SHARC code -- FM-STEREO Example
AM - amplitude modulation -- typically MONO
    Carrier with varying amplitude
    Mix to bring to base frequency then rectify
FM - frequency modulation
    Carrier with varying frequency/phase
    Use FM demodulator to convert frequency changes into amplitude changes
    Get DC components (0 -- 10 kHz) plus an AM modulated carrier (10 - 30 kHz)
    Channel 1 -- Left sound + Right Sound from DC
    Channel 2 -- Left Sound - Right Sound from carrier

36 void DecodeFMSTEREO(int, int *, int *)
void DecodeFMSTEREO(int channel_two_strength, int *channel_one, int *channel_two) {
    int temp_one = *channel_one;
    int temp_two = *channel_two;
    static int comment = 0;
    if (!comment) {    // Jump to “C” -- printf( ) -- why code the slow and obvious
        printf("Smith DecodeFMStereo() -- FM_STEREO demodulation example");
        comment = 1;
    }
    // If Channel Strength is too weak then just use channel_one on both channels
    if (channel_two_strength < 25)
        *channel_two = *channel_one;    // L + R
    else {
        *channel_one = (temp_one + temp_two) >> 1;    // L + R + (L - R)
        *channel_two = (temp_one - temp_two) >> 1;    // L + R - (L - R)
    }
}
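
To see what the routine computes, it helps to feed it a known left/right pair. The sketch below is a printf-free variant written for illustration only; the name `decode_fm_stereo` is hypothetical, and the sample values are kept non-negative because right-shifting a negative `int` is implementation-defined in C.

```c
/* Hypothetical printf-free variant of the slide's routine.
   On entry *channel_one holds L + R and *channel_two holds L - R. */
void decode_fm_stereo(int channel_two_strength, int *channel_one, int *channel_two) {
    int temp_one = *channel_one;
    int temp_two = *channel_two;

    if (channel_two_strength < 25) {
        /* Carrier too weak: play the mono sum on both channels. */
        *channel_two = *channel_one;
    } else {
        *channel_one = (temp_one + temp_two) >> 1;  /* ((L+R) + (L-R)) / 2 = L */
        *channel_two = (temp_one - temp_two) >> 1;  /* ((L+R) - (L-R)) / 2 = R */
    }
}
```

For example, L = 100 and R = 60 arrive as 160 and 40; with a strong carrier the routine leaves 100 and 60 in the two channels, and with a weak carrier it leaves 160 in both.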

37 SHARC PROCESS -- STEP 1A Convert C-design to account for RISC architecture
    void DecodeFMSTEREO(int channel_two_strength, int *channel_one, int *channel_two) {
ON SHARC -- First three subroutine parameters are PLACED in DATA registers even if the parameters are copies of values of pointer registers (index registers)
    void DecodeFMSTEREO(register int channel_two_strength, register int *channel_one, register int *channel_two) {

38 SHARC PROCESS -- STEP 1B Convert C-design to account for RISC architecture
    int temp_one = *channel_one;
    int temp_two = *channel_two;
    static int comment = 0;
    if (!comment) {
        …………….
    }
BECOMES
    register int temp_one = *channel_one;
    register int temp_two = *channel_two;
    static int comment = 0;    <- must be stored in memory and not register
    if (comment == 0) {        <- Got to be specific when writing assembly
        ………...
    }

39 SHARC PROCESS -- STEP 1C Convert C-design to account for RISC architecture
    static int comment = 0;    <- Must be stored in memory and not register
    if (comment == 0) {        <- Tests can’t be done on memory values in a RISC processor architecture
        printf( );
        comment = 1;
    }
BECOMES
    static int comment = 0;    <- Must be stored in memory -- not register
    register int temp_comment;
    temp_comment = comment;    <- Grab the value from memory
    if (temp_comment == 0) {   <- Test using a register
        printf( );
        comment = 1;           <- Still okay in THIS RISC architecture
    }
ENDIF:                         <- Must add this to handle assembly code GOTO structure

40 SHARC PROCESS -- STEP 1D Convert C-design to account for RISC architecture
    if (channel_two_strength < 25)
        *channel_two = *channel_one;
    else {
        *channel_one = (temp_one + temp_two) >> 1;
        *channel_two = (temp_one - temp_two) >> 1;
    }
BECOMES
    register int temp_constant;
    temp_constant = 25;    ***************!!!!!*****!!!!!*********
    if (channel_two_strength < temp_constant)
        *channel_two = *channel_one;
    else {
        *channel_one = (temp_one + temp_two);
        *channel_one = *channel_one >> 1;
        *channel_two = (temp_one - temp_two);
        *channel_two = *channel_two >> 1;
    }

41 void DecodeFMSTEREO(register int channel_two_strength, register int *channel_one, register int *channel_two) {
    register int temp_one = *channel_one;
    register int temp_two = *channel_two;
    register int temp_value;
    static int comment = 0;
    temp_value = comment;
    if (temp_value == 0) {    // WARNING -- SPECIAL CASE
        printf("Smith DecodeFMStereo() -- FM_STEREO demodulation example");
        comment = 1;
    }
    else /* DO NOTHING */;    // WARNING -- MUST ADD THIS
    temp_value = 25;
    if (channel_two_strength < temp_value)
        *channel_two = *channel_one;
    else {
        *channel_one = (temp_one + temp_two);
        *channel_one = *channel_one >> 1;
        *channel_two = (temp_one - temp_two);
        *channel_two = *channel_two >> 1;
    }
}
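
One further step toward assembly is to rewrite the IF-ELSE with explicit jumps, since a processor can only branch over the code it wants to skip. The sketch below is an illustration, not part of the slides: the function name `decode_goto_form` is hypothetical, the printf block is left out, and the labels follow the DO_ELSE / ENDIF naming the slides use. Note that the test is inverted: the jump to DO_ELSE is taken when the original condition is false.

```c
/* Hypothetical goto-structured form of the second IF-ELSE in the design. */
void decode_goto_form(int channel_two_strength, int *channel_one, int *channel_two) {
    register int temp_one = *channel_one;
    register int temp_two = *channel_two;
    register int temp_value;

    temp_value = 25;
    if (channel_two_strength >= temp_value) goto DO_ELSE;  /* inverted test */

    *channel_two = temp_one;  /* *channel_one is still equal to temp_one here */
    goto ENDIF;

DO_ELSE:
    temp_value = temp_one + temp_two;
    temp_value = temp_value >> 1;
    *channel_one = temp_value;
    temp_value = temp_one - temp_two;
    temp_value = temp_value >> 1;
    *channel_two = temp_value;

ENDIF:
    return;
}
```

Each statement now performs one load, one ALU operation, or one store, which is roughly the granularity a load/store architecture needs before a line-by-line translation.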

42 SHARC PROCESS -- STEP 2 Develop the subroutine PROLOGUE
    void DecodeFMSTEREO(register int channel_two_strength, register int *channel_one, register int *channel_two) {
        register int temp_one = *channel_one;
        register int temp_two = *channel_two;
        register int temp_value;
Incoming register int channel_two_strength -- INPAR1 -- in R4 -- leave it there
Incoming register int *channel_one -- INPAR2 -- in R8 -- CAN’T leave it there
    Must move into volatile DM pointer -- I4
Incoming register int *channel_two -- INPAR3 -- in R12 -- CAN’T leave it there
    Must move into volatile DM pointer -- BUT I4 already in use
register int temp_one = *channel_one;    Allowed in R1?
register int temp_two = *channel_two;    Allowed in R2?
register int temp_value;                 Allowed in R3?

43 Make use of a standard format for register names -- “cdefines.i”
    #define scratchR0     R0     (WARNING -- also retvalueR0)
    #define scratchR1     R1
    #define scratchR2     R2
    #define scratchF1     F1     (WARNING -- identical to R1 for storage)
    #define scratchF2     F2     (WARNING -- identical to R2 for storage)
    #define scratchDMpt   I4
    #define scratchDMmod  M4
    #define scratchPMpt   I12    (WARNING -- Program Memory DAG)
    #define scratchPMmod  M12    (WARNING -- Program Memory DAG)
    #define INPAR1        R4     (WARNING -- DATA register NOT POINTER even when used to pass copy of pointer)
    #define INPAR2        R8
    #define INPAR3        R12
    #define scratchR4     R4
    etc.

44 SHARC PROCESS -- STEP 2A Develop the subroutine PROLOGUE
// Show the parameters being passed as part of documentation
#define channel_two_strengthR4 scratchR4  // Same as INPAR1
void DecodeFMSTEREO(register int channel_two_strength, register int *channel_one, register int *channel_two) {
#define temp_oneR1 scratchR1  // register int temp_one = GARBAGE
    register int temp_one = *channel_one;
#define temp_twoR2 scratchR2  // register int temp_two = GARBAGE
    register int temp_two = *channel_two;
#define temp_valueR0 scratchR0  // register temp_value = GARBAGE
Incoming register int *channel_one -- INPAR2 -- in R8 -- CAN’T leave it there
    Must move into volatile DM pointer -- I4
Incoming register int *channel_two -- INPAR3 -- in R12 -- CAN’T leave it there
    Must move into volatile DM pointer -- BUT I4 already in use
CHOICES -- Place I3 onto stack or Reuse I4 -- worry about speed later

45 SHARC PROCESS -- STEP 2B Develop the subroutine PROLOGUE
// Show the parameters being passed as part of documentation
#define channel_two_strengthR4 scratchR4  // Same as INPAR1
void DecodeFMSTEREO(register int channel_two_strength, register int *channel_one, register int *channel_two) {
#define temp_oneR1 scratchR1  // register int temp_one = GARBAGE
    scratchDMpt = INPAR2;  // register int temp_one = *channel_one;
    temp_oneR1 = dm(scratchDMpt);
#define temp_twoR2 scratchR2  // register int temp_two = GARBAGE
    YOU ADD THE CODE  // register int temp_two = *channel_two;
#define temp_valueR0 scratchR0  // register temp_value = GARBAGE
Placing I3 onto stack  // Two extra lines -- if you get it right (Save/Recover)
Reuse I4  // Four EXTRA lines of which only two shown here
          // Actually do-able in 3 (a little dicey)

46 SHARC PROCESS -- STEP 2C Correct the subroutine PROLOGUE
.segment/pm seg_pmco;
// void DecodeFMSTEREO(register int channel_two_strength, register int *channel_one, register int *channel_two) {
    .global _DecodeFMSTEREO, DecodeFM_STEREO
_DecodeFMSTEREO:
DecodeFMSTEREO:
    // Show the parameters being passed as part of documentation
    #define channel_two_strengthR4 scratchR4  // Same as INPAR1
    #define temp_oneR1 scratchR1  // register int temp_one = GARBAGE
    scratchDMpt = INPAR2;  // register int temp_one = *channel_one;
    temp_oneR1 = dm(scratchDMpt);
    #define temp_twoR2 scratchR2  // register int temp_two = GARBAGE
    YOU ADD THE CODE  // register int temp_two = *channel_two;
    #define temp_valueR0 scratchR0  // register temp_value = GARBAGE

47 SHARC PROCESS -- STEP 2D Correct the subroutine PROLOGUE CORRECTLY
void DecodeFMSTEREO(int channel_two_strength, int *channel_one, int *channel_two) {
    int temp_one = *channel_one;
    int temp_two = *channel_two;
    static int comment = 0;    <- NOT A REGISTER NOR A STACK VALUE
    if (!comment) {
        printf("Smith DecodeFMStereo() -- FM_STEREO demodulation example");
        comment = 1;
    }
    // If Channel Strength is too weak then just use channel_one on both channels
    if (channel_two_strength < 25)
        *channel_two = *channel_one;
    else {
        *channel_one = (temp_one + temp_two) >> 1;
        *channel_two = (temp_one - temp_two) >> 1;
    }
}

48 SHARC PROCESS -- STEP 2D Correct the subroutine PROLOGUE CORRECTLY
.segment/dm seg_dmda
    var int comment = 0;  // NASTY HIDDEN ERROR
.endseg;
.segment/pm seg_pmco;
// void DecodeFMSTEREO(register int channel_two_strength, register int *channel_one, register int *channel_two) {
    .global _DecodeFMSTEREO, DecodeFM_STEREO
_DecodeFMSTEREO:
DecodeFMSTEREO:
    // Show the parameters being passed as part of documentation
    #define channel_two_strength scratchR4  // Same as INPAR1
What’s missing?

49 SHARC PROCESS -- STEP 3 Modify the standard EPILOGUE
    // Place the return value in retvalueR0 -- N/A
    // Recover non-volatile registers from stack -- N/A
    scratchPMpt = dm(minus1DM, FP);
    nop;  // might be carefully filled
    jump(plus1PM, scratchPMpt) (DB);
        nop;  // might be carefully filled
        RFRAME;
.endseg;
Just a CUT-AND-PASTE job

50 SHARC PROCESS -- STEP 4A Convert C-design body -- standard IF-ELSE
    scratchR0 = 25;  // temp_constant = 25;
    // if (channel_two_strength < temp_constant)
    COMP(channel_two_strength, scratchR0);  // dead <- scratchR0
    if GE jump(PC, DO_ELSE) (DB);  // skip the IF part when strength >= 25
        nop;  // Are these delayed branches fillable
        nop;
    scratchDMpt = INPAR2;  // *channel_two = *channel_one;
    scratchR0 = dm(scratchDMpt);
    scratchDMpt = INPAR3;  // Note the indenting as part of the documentation
    dm(scratchDMpt) = scratchR0;
    jump (PC, ENDIF) (DB);
        nop;
DO_ELSE:  // else {
    // *channel_one = (temp_one + temp_two);
    // *channel_one = *channel_one >> 1;
    // *channel_two = (temp_one - temp_two);
    // *channel_two = *channel_two >> 1;
    // }

51 SHARC PROCESS -- STEP 4A -- in this particular subroutine Convert C-design body -- standard IF-ELSE
    scratchR0 = 25;  // temp_constant = 25;
    // if (channel_two_strength < temp_constant)
    COMP(channel_two_strength, scratchR0);  // dead <- scratchR0
    if GE jump(PC, DO_ELSE) (DB);  // skip the IF part when strength >= 25
        nop;  // Are these delayed branches fillable
        nop;
    dm(scratchDMpt) = temp_oneR1;  // *channel_two = *channel_one (temp_one);
    // INPAR3 just HAPPENS to be in scratchDMpt already
    // because of the code you added earlier
    jump (PC, ENDIF) (DB);
        nop;
DO_ELSE:  // else {
    // *channel_one = (temp_one + temp_two);
    // *channel_one = *channel_one >> 1;
    // *channel_two = (temp_one - temp_two);
    // *channel_two = *channel_two >> 1;
    // }

52 ENORMOUS DEFECT INTRODUCED
You can’t do any of this -- ALL WRONG
You have forgotten what you are coding as a whole while micromanaging the details
Key issues -- volatile/non-volatile register use.
21k “C” subroutines -- like 68k “C” subroutines -- destroy volatile registers (R0)

53 void DecodeFMSTEREO(int, int *, int *)
void DecodeFMSTEREO(int channel_two_strength, int *channel_one, int *channel_two) {
    int temp_one = *channel_one;
    int temp_two = *channel_two;
    static int comment = 0;
    if (!comment) {
        printf("Smith DecodeFMStereo() -- FM_STEREO demodulation example");
        comment = 1;
    }
    // If Channel Strength is too weak then just use channel_one on both channels
    if (channel_two_strength < 25)
        *channel_two = *channel_one;
    else {
        *channel_one = (temp_one + temp_two) >> 1;
        *channel_two = (temp_one - temp_two) >> 1;
    }
}
The printf( ) call: Probably destroys R1, R0, R4, I4 etc
The rest of the routine: Using R1, R0, R4, I4 etc

54 Program smart -- and cut the DEFECTS
void DecodeFMSTEREO(int channel_two_strength, int *channel_one, int *channel_two) {
    int temp_one = *channel_one;
    int temp_two = *channel_two;
    static int comment = 0;
    // printf( ) CODE WAS HERE
    // If Channel Strength is too weak then just use channel_one on both channels
    if (channel_two_strength < 25)
        *channel_two = *channel_one;
    else {
        *channel_one = (temp_one + temp_two) >> 1;
        *channel_two = (temp_one - temp_two) >> 1;
    }
    if (!comment) {
        printf("Smith DecodeFMStereo() -- FM_STEREO demodulation example");
        comment = 1;
    }
}
With printf( ) moved to the end, “C” CAN DESTROY VOLATILES TO HEART’S CONTENT
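The reordered routine on the slide above can be tried on a desktop compiler before any hand conversion. The block below is a minimal sketch of that routine as a complete C function. Two details are assumptions added for illustration: the trailing newline in the banner string, and the reliance on `>>` behaving as an arithmetic shift, which standard C only guarantees for non-negative values.

```c
#include <stdio.h>

/* Reordered version from the slide: all work on the two channels is done
   first, and the one-time printf() banner is issued last, so the call may
   destroy volatile registers without touching any value still in use. */
void DecodeFMSTEREO(int channel_two_strength, int *channel_one, int *channel_two)
{
    int temp_one = *channel_one;
    int temp_two = *channel_two;
    static int comment = 0;

    /* If channel strength is too weak, use channel_one on both channels. */
    if (channel_two_strength < 25) {
        *channel_two = *channel_one;
    } else {
        /* Note: >> on a negative int is implementation-defined in C. */
        *channel_one = (temp_one + temp_two) >> 1;
        *channel_two = (temp_one - temp_two) >> 1;
    }

    /* Banner is printed on the first call only. */
    if (!comment) {
        printf("Smith DecodeFMStereo() -- FM_STEREO demodulation example\n");
        comment = 1;
    }
}
```

Calling it with a strength of 30 on the samples 10 and 4 takes the else branch; a strength of 10 copies channel one into channel two.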

55 SHARC PROCESS -- STEP 4A -- in this particular subroutine
Convert C-design body -- standard IF-ELSE
scratchR0 = 25; // temp_constant = 25;
// if (channel_two_strength < temp_constant)
COMP(channel_two_strength, scratchR0); // dead <- scratchR0
if GE jump (PC, DO_ELSE) (DB); // branch to the ELSE when NOT (channel_two_strength < 25)
nop; // Are these delayed branches fillable?
nop;
    dm(scratchDMpt) = temp_oneR1; // *channel_two = *channel_one (temp_one);
    jump (PC, ENDIF) (DB);
    nop;
    nop;
DO_ELSE: // else {
    scratchR0 = temp_oneR1 + temp_twoR2; // *channel_one = (temp_one + temp_two);
    scratchR0 = ASHIFT scratchR0 BY -1; // *channel_one = *channel_one >> 1;
    scratchDMpt = INPAR2;
    dm(scratchDMpt) = scratchR0; // dead <- scratchR0
    // *channel_two = (temp_one - temp_two);
    // *channel_two = *channel_two >> 1;
    // YOU COMPLETE
ENDIF: // }
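One way to check the branch structure before writing the remaining assembly is to restate the IF-ELSE in C using only a compare, an inverted conditional jump and labels. The sketch below does that. The function name `decode_goto_form` and the local `scratch` are hypothetical names introduced here; the labels and the one-operation-per-line style follow the slide, and the channel-two lines follow the C design rather than any assembly shown in the deck.

```c
/* Goto-form restatement of the IF-ELSE body (illustrative only).
   Each statement corresponds to roughly one load, ALU or store step. */
void decode_goto_form(int channel_two_strength, int *channel_one, int *channel_two)
{
    int temp_one = *channel_one;   /* plays the role of temp_oneR1 */
    int temp_two = *channel_two;   /* plays the role of temp_twoR2 */
    int scratch;                   /* plays the role of scratchR0  */

    scratch = 25;                                       /* temp_constant = 25 */
    if (channel_two_strength >= scratch) goto DO_ELSE;  /* inverted test */

    *channel_two = temp_one;       /* *channel_two = *channel_one */
    goto ENDIF;

DO_ELSE:
    scratch = temp_one + temp_two; /* *channel_one = (temp_one + temp_two) */
    scratch = scratch >> 1;        /* *channel_one = *channel_one >> 1     */
    *channel_one = scratch;
    scratch = temp_one - temp_two; /* *channel_two = (temp_one - temp_two) */
    scratch = scratch >> 1;        /* *channel_two = *channel_two >> 1     */
    *channel_two = scratch;

ENDIF:
    return;
}
```

The point of the exercise is the inverted test: the jump to `DO_ELSE` is taken when the original `if` condition is false.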

56 void DecodeFMSTEREO(int, int *, int *)
static int comment = 0;
if (comment == 0) {
    printf("Smith DecodeFMStereo() -- FM_STEREO demodulation example");
    comment = 1;
}
The variable got placed in seg_dmda in the PROLOGUE and given the label “comment”
Label means “address-location” not value
scratchR0 = dm(comment); // NOT scratchR0 = comment
// This operation would set 68k N and Z flags
// which could then be used to control conditional branch
// Not true on the 21k
scratchR0 = PASS scratchR0; // Test for Zero and Negative
// NOT pass(scratchR0), which is MFE
if NE jump (PC, NOCOMMENT) (DB);
NOP;
NOP;

57 void DecodeFMSTEREO(int, int *, int *)
static int comment = 0;
if (comment == 0) {
    printf("Smith DecodeFMStereo() -- FM_STEREO demodulation example");
    comment = 1;
}
#define commentR0 scratchR0 // Better code maintainability
commentR0 = dm(comment);
commentR0 = PASS commentR0; // Test for Zero and Negative
if NE jump (PC, NOCOMMENT) (DB); // dead <- R0
NOP;
NOP;
Code to call printf ( ) here
NOCOMMENT:
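The distinction between the label (an address) and the value stored there can be spelled out in C with an explicit pointer. This is a small sketch only: `banner_needed` and `comment_label` are hypothetical names, and the function returns a flag in place of the printf( ) call so that it stays self-contained.

```c
static int comment = 0;   /* static storage, like the word placed in seg_dmda */

/* Returns 1 the first time it is called (banner should be printed),
   and 0 on every later call. */
int banner_needed(void)
{
    int *comment_label = &comment;   /* the label is an address-location   */
    int commentR0 = *comment_label;  /* commentR0 = dm(comment): the value */

    if (commentR0 != 0) {            /* explicit test, then branch if NE   */
        return 0;                    /* NOCOMMENT path                     */
    }

    *comment_label = 1;              /* comment = 1, stored back to memory */
    return 1;
}
```

Writing `commentR0 = comment_label` instead of `commentR0 = *comment_label` is the C analogue of loading the label rather than `dm(comment)`.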

58 Why we don’t “Call C” from assembly
Coding the “printf( )” call
printf(“Print out the value of %d”, comment);
.segment/dm seg_dmda
var int comment = 0;
FORMAT1_LABEL:
.var FORMAT1_STRING[ ] = 80,114,105,110,116,32,111,117, etc, 0; // ASCII codes for “Print out the value of %d”
// Don’t forget me! -- the final 0 is the “C” EOS
.endseg;
.segment/pm seg_pmco;
OUTPAR2 = FORMAT1_LABEL; // Pointer to string
OUTPAR1 = dm(comment); // Value NOT pointer
CALL _printf (DB);
nop;
nop;

59 Why we don’t “Call C” from assembly
Coding the “printf( )” call
// Get starting address of printf format
scratchR0 = FORMAT1_LABEL;
// Note that this is not the stack controlled by SP
dm(CTOPstack,minus1DM) = scratchR0;
.extern _printf;
cjump _printf (DB);
dm(CTOPstack,minus1DM) = r2; // Save FP (as R2)
dm(CTOPstack,minus1DM) = pc; // Save Return Address (one off)
modify(CTOPstack,plus1DM);
GOT ONE LINE RIGHT -- USING CJUMP not CALL
CJUMP causes R2 <- FP (I6)
R2 is destroyed “internally”
3 Values placed on stack
Only 1 taken off here

60 The importance of “C”
The use of “C” language is so important that there are specialized instructions added to the processor instruction set in order to support an efficient “C” language interface
CJUMP
RFRAME
What rules does the compiler use to determine whether to call CJUMP or CALL?
No idea -- but I have never had a problem where the compiler generated the wrong code to access my “C” or assembly routines.
Suspect -- “CJUMP” for library calls (flag in header file?)
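The stack bookkeeping around CJUMP and RFRAME can be modelled in a few lines of C. The sketch below is a toy model, not the processor's definition: it assumes that CJUMP copies FP to R2 and then sets FP to the current stack pointer, and that RFRAME in the callee sets the stack pointer back to FP and reloads FP from the word it points at. The array `dm`, the helper `push` and the function `simulate_printf_call` are hypothetical names used only here.

```c
#include <assert.h>

enum { STACK_WORDS = 16 };

static int dm[STACK_WORDS];               /* toy data memory holding the C stack */
static int ctop_stack = STACK_WORDS - 1;  /* models CTOPstack (I7)               */
static int fp = STACK_WORDS - 1;          /* models FP (I6)                      */

/* Models dm(CTOPstack, minus1DM) = value: store, then post-modify by -1. */
static void push(int value)
{
    assert(ctop_stack >= 0);
    dm[ctop_stack] = value;
    ctop_stack += -1;
}

/* Walks through the call sequence and returns how many words are still
   left on the stack afterwards (0 means the stack is balanced). */
int simulate_printf_call(int format_address, int return_address)
{
    int start = ctop_stack;
    int r2;

    push(format_address);    /* parameter: first of 3 values placed on stack */

    r2 = fp;                 /* CJUMP (assumed): R2 <- FP ...                */
    fp = ctop_stack;         /* ... and FP <- current stack pointer          */
    push(r2);                /* delay slot 1: save FP (as R2)                */
    push(return_address);    /* delay slot 2: save return address            */

    /* Callee epilogue, assumed to end with RFRAME. */
    ctop_stack = fp;         /* discards the saved FP and return address     */
    fp = dm[fp];             /* restores the caller's FP                     */

    ctop_stack += 1;         /* modify(CTOPstack, plus1DM): pop the parameter */

    return start - ctop_stack;
}
```

Under these assumptions the caller removes only the parameter it pushed, and the other two words are removed by the callee, which is consistent with the "3 values placed on stack, only 1 taken off here" remark.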

61 How should I code in?
Assume that your routines will always get their parameters passed to them in INPAR1, INPAR2 and INPAR3
THE CONCEPT MUST WORK
This is the first year I have worried about CJUMP and I have not had problems before.
WORRIED is the wrong word -- never ever noticed the distinction before
Don’t call “C” routines from your assembly unless you know what you are doing!
Call “C” from “C” instead

62 SHARC PROCESS -- STEP 5
OPTIMIZE THE CODE
Remember -- not normally worth the effort
Going to require
Knowing the parallel instructions
Knowing which ones are valid in combination
Taking into account the limitations associated with the finite number of bits in the op-code to describe the parallel operations wanted
Understanding Hardware loops
Understanding memory and ALU pipelining
Optimization is “NEXT WEEK COUNTRY”

63 Other examples of code conversion
Many examples in previous years’ web pages
Take a look at the “assembly output” generated by the “C”-compiler for Lab. 0
Use the -S option and look for the .asm file
Get to know the “required stuff” so you can quickly break through the “barrier” and get to the stuff you really want to do -- DSP customization
KEY -- Develop a PSP code review process

64 Tackled over the past 2 lectures
Setting up special processor constants and registers to gain speed during assembly language constructs
Review of use of index and modify registers
Prologue, Body and Epilogue of “C” program translated to assembly code (NO DIFFERENCE by hand or by compiler)
Example conversion of “C” program into ADSP21061 using a standard procedure
Take into account register architecture
Take into account LOAD/STORE architecture
Take into account standard assembly code problems
Handle Program Flow Constructs
Then do conversion of code on line by line basis
Learning why to avoid calling “C” from assembly

