Program Flow on ADSP2106X SHARC -- Pipeline issues
M. R. Smith, Electrical and Computer Engineering, University of Calgary, Alberta, Canada
smithmr@ucalgary.ca
To be tackled today
  - Parts of the SHARC program sequencer
  - Similarity to “old” micro-sequencers used when designing custom bit-slice array processors back in the early 80’s
  - Pipelining issues
  - Resource conflict between instructions
  - Delayed branches -- nops or instructions to find
  - Loops: restrictions and “short loops”, counter and non-counter based loops
  - Interrupt concepts -- see later lecture
  - Instruction cache
11/24/2018 ENCM515 -- Program flow issues on SHARC ADSP21061 Copyright smithmr@ucalgary.ca
SHARC OVERVIEW
  - Linear flow
  - Loops -- one sequence of instructions executed several times with no overhead
  - Subroutines -- temporary interruption of sequential flow with associated return
  - Jumps -- permanently transfer flow
  - Interrupts -- special case of “subroutines” triggered by an event at run time
  - Idle -- cease current operation, hold state till interrupt -- low power mode? -- event driven programs?
Functionality of the Sequencer
  - More sophisticated than the sequencer in the 68K CPU
  - The 21061 sequencer selects the address of the next instruction from addresses pre-calculated earlier
  - Generating the many address choices also involves
      - incrementing the fetch address
      - maintaining small hardware stacks
      - evaluating whether to do conditional operations
      - decrementing the hardware loop counter
      - calculating new addresses -- circular buffer, bit reverse
      - handling interrupts
21K Program Sequencer
Nothing new in this world
  - I could just as easily hand round a “CUSTOM” microcoded DSP array processor using the AMD2911 instruction sequencer, from the ENEL515 notes from 1981.
Generic “microcoded CPU”
AMD2911 Microsequencer chip/library
  - The ‘micro’PC can be kept the same (“single cycle” loops) or loaded from the “hardware increment of NEXT ADDRESS”
  - The “next address” does not come from the microPC but from the “selection” Mux
AMD2911 Microsequencer chip/library
  - Entry path for JUMP address or counter value (cf. 21k LCNTR)
  - Hardware counter -- cf. 21k LCE or “Remember the bottom of loop”
  - Hardware PC stack with hardware SP
      - Used for subroutines and for “remembering” the first address in a hardware controlled loop
      - Since the hardware stack is of limited size, stack overflow must be allowed to activate an exception handler
AMD2911 Microsequencer chip/library
  - Sequencer control logic and associated control signals
  - Multiplexer -- allows “immediate selection” of a variety of pre-calculated “next addresses”
AMD2911 -- Handwired (ENEL515 -- 1981)
ADSP21061 (ENCM515 -- 2002)
Program sequencer registers
  - For more information see p 3-5
  - FADDR (fetch), DADDR (decode), PC (execute)
  - PCSTK, PCSTKP (PC stack control)
  - LADDR, CURLCNTR, LCNTR (loop)
  - Also many related system registers, including user defined status flags -- means loop till user says not to
21k instruction cycle -- 3 phases
  - Fetch: the instruction is read from either the on-chip instruction cache or program memory
  - Decode: the instruction is decoded, generating conditions that control execution
      - Not the same as the 68k decode phase
  - Execute: the operations specified by the instruction are completed
      - Essentially the 68k decode, execute and writeback phases at the same time
Instruction Cycles are overlapped/pipelined
  Cycle #   1     2     3     4     5     6
  EXECUTE               n     n+1   n+2   n+3
  DECODE          n     n+1   n+2   n+3   n+4
  FETCH     n     n+1   n+2   n+3   n+4
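The overlap of the three phases can be reproduced with a few lines of Python. This is only a sketch of an ideal pipeline with no stalls, branches or cache conflicts; the function name `pipeline_table` and the labels n, n+1, ... are illustrative and not part of any SHARC tool.

```python
def pipeline_table(n_instructions, n_cycles):
    """Return {stage: [label per cycle]} for an ideal 3-stage pipeline.

    An empty string means the stage holds nothing of interest in that cycle.
    """
    stages = ["FETCH", "DECODE", "EXECUTE"]
    table = {}
    for delay, stage in enumerate(stages):
        row = []
        for cycle in range(1, n_cycles + 1):
            # An instruction fetched in cycle c is decoded in c+1, executed in c+2.
            index = cycle - 1 - delay
            if 0 <= index < n_instructions:
                row.append("n" if index == 0 else f"n+{index}")
            else:
                row.append("")
        table[stage] = row
    return table


if __name__ == "__main__":
    for stage, row in pipeline_table(5, 6).items():
        print(f"{stage:8}", " ".join(f"{label:5}" for label in row))
```

Once the pipeline is full (cycle 3 onwards) one instruction completes per cycle, even though each instruction takes three cycles from fetch to the end of execute.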
Interesting implications
  - Since the instruction stream is pipelined, “3 PCs” are needed in order to be able to restart an interrupted program sequence
  - Implies that interrupts have considerable overhead -- getting in, getting out and restarting the old sequence
  - Several “PCs” -- program sequencer registers FADDR (fetch), DADDR (decode), PC (execute)
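A toy model of the three address registers shows why a single PC is not enough. The register names come from the slide; the `step` function and the addresses 0x100 to 0x102 are made up for illustration, and the real hardware does considerably more than this.

```python
def step(regs, next_fetch):
    """Advance one cycle: every address moves one stage down the pipeline."""
    return {
        "FADDR": next_fetch,     # address now being fetched
        "DADDR": regs["FADDR"],  # fetched last cycle, now being decoded
        "PC": regs["DADDR"],     # decoded last cycle, now executing
    }


regs = {"FADDR": None, "DADDR": None, "PC": None}
for address in (0x100, 0x101, 0x102):  # hypothetical program addresses
    regs = step(regs, address)

if __name__ == "__main__":
    print({name: hex(value) for name, value in regs.items()})
```

After three cycles all three registers hold different addresses, so an interrupt arriving now has three instructions in flight to account for.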
Conditional Branches
  - MOST INSTRUCTIONS CONDITIONAL -- TRUE, FALSE
  - NOTE: WHOLE OF INSTRUCTION is conditional, NOT PARTS
  - 21K condition codes with 68K equivalent
      - EQ, LT, LE
      - AC (unsigned carry), AV (signed overflow)
  - ‘Special’ 21K condition codes -- parallel operations
      - Multiplier (MV overflow, MS sign) and Shifter (SV overflow, SZ zero)
      - Flags (FLAG0_IN, FLAG1_IN, FLAG2_IN, FLAG3_IN -- specialized “built-in” I/O lines)
      - BM (Bus master)
      - LCE (Loop counter expired)
Implications -- JUMP (PC, NEXT); non-delayed branch -- no (DB)
  - If CALL (PC, NEXT); then address n+1 is pushed to the PC hardware stack as instruction j is fetched
  Cycle #   1         2         3         4         5         6
  EXECUTE                       CondBr n  nop       nop       j
  DECODE              CondBr n  n+1->nop  n+2->nop  j         j+1
  FETCH     CondBr n  n+1       n+2       j         j+1
Implications -- JUMP (PC, NEXT) (DB); delayed branch
  - If CALL (PC, NEXT) (DB); then address n+3 is pushed to the PC hardware stack as instruction j is fetched
  Cycle #   1         2         3         4         5         6
  EXECUTE                       CondBr n  n+1       n+2       j
  DECODE              CondBr n  n+1       n+2       j         j+1
  FETCH     CondBr n  n+1       n+2       j         j+1
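The difference between the two branch forms can be summarised in a short Python sketch. It only covers a taken branch at address n, as in the two diagrams; the function names are hypothetical, and the return addresses n+1 and n+3 are the ones quoted on the slides.

```python
def execute_slots(delayed):
    """EXECUTE-stage contents for cycles 3..6 when the jump at n is taken."""
    # Non-delayed: the two already-fetched instructions are turned into nops.
    # Delayed (DB): the two instructions after the jump are executed.
    after_jump = ["n+1", "n+2"] if delayed else ["nop", "nop"]
    return ["CondBr n"] + after_jump + ["j"]


def wasted_cycles(delayed):
    """Number of execute slots lost to nops for one taken branch."""
    return execute_slots(delayed).count("nop")


def pushed_return_address(n, delayed):
    """Return address pushed on the PC stack by a CALL at address n."""
    return n + 3 if delayed else n + 1


if __name__ == "__main__":
    for delayed in (False, True):
        print(delayed, execute_slots(delayed), wasted_cycles(delayed))
```

The delayed form recovers the two lost cycles only if two useful instructions can be found to place after the jump; otherwise the programmer ends up writing the nops by hand.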
Key Issue
  - All jumps are conditional
      - The condition is assumed to be TRUE if no condition is specified -- i.e. the default is TRUE
  - Conditional JUMP
      - Transparent stalls after the instruction
  - Conditional JUMP (DB)
      - Always executes the two instructions after the jump, whether the conditional jump is taken or not
Loops (DO UNTIL)
    LCNTR = 30, DO ENDLOOP - 1 UNTIL LCE;       <-- TERMINATION condition (LCE)
        R0 = dm(I0, M0), F2 = PM(I8, M8);       <-- TOP OF LOOP
        R1 = R0 - R14;
        F4 = F2 + F3;                           <-- LAST ADDRESS (ENDLOOP - 1)
    ENDLOOP:
  - During DO UNTIL, the sequencer
      - pushes the last address and the termination condition onto the LOOP STACK
      - pushes the top of loop address onto the PC stack
  - Maximum of 6 entries in the LOOP STACK
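The bookkeeping done by DO UNTIL can be sketched as two small stacks. This is a simplified model under stated assumptions: it ignores pipeline timing, assumes the counter is at least 1, and uses made-up addresses. The class `LoopHardware` and function `run_counter_loop` are illustrative names; only the 6-entry loop stack limit comes from the slide.

```python
LOOP_STACK_DEPTH = 6  # maximum loop stack entries, from the slide


class LoopHardware:
    def __init__(self):
        self.loop_stack = []  # (last_address, termination_condition)
        self.pc_stack = []    # top-of-loop addresses

    def do_until(self, top, last, termination="LCE"):
        """Model the pushes performed when DO UNTIL executes."""
        if len(self.loop_stack) >= LOOP_STACK_DEPTH:
            raise OverflowError("loop stack overflow")
        self.loop_stack.append((last, termination))
        self.pc_stack.append(top)

    def end_loop(self):
        """Model the pops performed when the loop terminates."""
        self.loop_stack.pop()
        self.pc_stack.pop()


def run_counter_loop(hw, lcntr, top, last):
    """Execute addresses top..last lcntr times and return the execute trace."""
    hw.do_until(top, last)
    trace = []
    current_count = lcntr  # plays the role of CURLCNTR
    while True:
        trace.extend(range(top, last + 1))
        current_count -= 1
        if current_count == 0:  # LCE: loop counter expired
            break
    hw.end_loop()
    return trace


if __name__ == "__main__":
    hw = LoopHardware()
    print(len(run_counter_loop(hw, 30, 100, 102)))  # 3 instructions x 30 passes
```

Because each active loop takes one loop stack entry, nesting deeper than six hardware loops raises the overflow in this model, which is one of the limits that matters when compiled “C” code nests loops.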
Flow during the hardware “Loop Back” portion
  - Note that the “beginning of loop” is pre-fetched, with no stalls in the instruction fetches
  - E = END OF LOOP, B = BEGIN OF LOOP
  Cycle #   1     2     3     4     5     6
  EXECUTE               E-2   E-1   E     B
  DECODE          E-2   E-1   E     B     B+1
  FETCH     E-2   E-1   E     B     B+1
Flow during the hardware loop termination
  - Note that the “beginning of loop” is NOT pre-fetched, but “instructions outside the loop” are fetched with no stalls
  Cycle #   1     2     3     4     5     6
  EXECUTE               E-2   E-1   E     E+1
  DECODE          E-2   E-1   E     E+1   E+2
  FETCH     E-2   E-1   E     E+1   E+2
Implications
  - Maximum depth of loops and subroutine calls
      - Implications on running “C” code
  - Condition is tested “two cycles” before the end of the instruction
      - Means the loop counter (if used) is decremented “two cycles” before the end of the loop -- what if the loop is only 1 cycle long?
      - May have to “unwind” some instruction fetches/decodes
  - Nested loops can’t end on the same instruction
  - Last three instructions of a loop can’t be a branch, jump, call or return
      - Exception -- non-delayed call where the subroutine uses the RTS (LR) instruction (loop return)
Non-counter based loops
  - Not using LCE (counter = 0) as the loop exit condition
  - If this is the outer loop of nested loops, then the end address of the loop must be at least 2 addresses after the last address of the inner loop
  - Pipeline and loops
      - 3 instruction loop -- tested at top of loop, COMPLETES loop before exiting -- do-while, not while
      - 2 instruction loop -- if condition already true, completes this and the next pass -- even if not wanted?
      - 1 instruction loop -- completes three more cycles through the loop -- even if not wanted?
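The three short-loop cases can be reproduced by a small simulation. It rests on one modelling assumption, not a statement from the manual: the exit condition is sampled in the cycle in which the last loop instruction sits in FETCH, that is while the instruction two slots earlier is in EXECUTE. The function name and the choice of an external condition that turns true at a given cycle are illustrative.

```python
def executed_after_condition(loop_len, true_from_cycle, max_cycles=100):
    """Loop offsets executed from the cycle the condition turns true until exit.

    Offsets run 0..loop_len-1. The condition is external (for example a flag
    pin) and reads true from cycle `true_from_cycle` onwards.
    """
    fetched = []  # fetched[c - 1] is what FETCH holds in cycle c
    offset = 0
    exiting = False
    for cycle in range(1, max_cycles + 1):
        fetched.append("exit" if exiting else offset)
        if exiting:
            continue
        if offset == loop_len - 1:
            # Last loop instruction is in FETCH: sample the condition now.
            if cycle >= true_from_cycle:
                exiting = True
            offset = 0
        else:
            offset += 1

    executed = []
    for cycle in range(max(true_from_cycle, 3), max_cycles + 1):
        in_execute = fetched[cycle - 3]  # fetched two cycles earlier
        if in_execute == "exit":
            break
        executed.append(in_execute)
    return executed


if __name__ == "__main__":
    print(executed_after_condition(1, 10))
    print(executed_after_condition(2, 10))
    print(executed_after_condition(3, 9))
```

Under this assumption a 1-instruction loop runs three more times once the condition is true, a 2-instruction loop finishes the current pass and runs one more, and a 3-instruction loop whose condition is true at the top completes that pass before exiting, which matches the three bullets on the slide.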
That looks like a midterm/final exam question if I ever saw one
  - Implement and discuss the operation of exiting from a hardware loop based on whether FLAG1 is set or not
  Cycle #   1     2     3     4     5     6
  EXECUTE               n     n+1   n+2   n+3
  DECODE          n     n+1   n+2   n+3   n+4
  FETCH     n     n+1   n+2   n+3   n+4
Interrupt issues
  - Plan to tackle in Lab. 4
  - Latency
  - Interrupt Vector Table, ISR, trampoline code
  - Instruction completion
  - Multiple interrupts
  - Timer and IDLE, IDLE16
  - For more details see Manual 3-21
Cache issues
  - Key issues -- see Manual 3-38
  - Fetching an instruction from PM can conflict with another instruction fetching data from PM
  - An instruction is not cached until it has been used once, so a data/instruction conflict is possible the first time round a loop
  - Cache is limited in size -- 32 instructions
Example of conflict issues
    1 - R1 = 0, R3 = dm(I4, plus1DM);
    2 - R1 = R1 + R3, R2 = dm(I4, plus1DM);
    3 - R1 = R1 + R2, R2 = dm(I4, plus1DM);
    4 - R1 = R1 + R2, R2 = dm(I4, plus1DM);
  - PM FETCH of INSTR1
  - PM FETCH of INSTR2, DECODE of INSTR1
  - PM FETCH of INSTR3, DECODE of INSTR2, EXECUTE of INSTR1 (DM)
  - PM FETCH of INSTR4, DECODE of INSTR3, EXECUTE of INSTR2 (DM)
  - NO PROBLEM HERE
Example of conflict issues
    1 - R1 = 0, R3 = pm(I12, plus1PM);
    2 - R1 = R1 + R3, R2 = pm(I12, plus1PM);
    3 - R1 = R1 + R2, R2 = pm(I12, plus1PM);
    4 - R1 = R1 + R2, R2 = pm(I12, plus1PM);
  - PM FETCH of INSTR1 -- NO CACHE OCCURS
  - PM FETCH of INSTR2, DECODE of INSTR1 -- NO CACHE OCCURS
  - PM FETCH of INSTR3, DECODE of INSTR2, EXECUTE of INSTR1 (PM)
      - CONFLICT -- TRANSPARENT EXTRA CYCLE -- CACHE 3
  - PM FETCH of INSTR4, DECODE of INSTR3, EXECUTE of INSTR2 (PM)
      - CONFLICT -- TRANSPARENT EXTRA CYCLE -- CACHE 4
  - TRANSPARENT -- programmer does not need to add NOP (cf. Intel i860)
Example of conflict issues -- in loop
    1  - R1 = 0, R3 = pm(I12, plus1PM);
    2  - R1 = R1 + R3, R2 = pm(I12, plus1PM);
    3  - R1 = R1 + R2, R2 = pm(I12, plus1PM);
    3B - R1 = R1 + R2, R2 = pm(I12, plus1PM);   (IN LOOP -- second pass of INSTR3)
  - PM FETCH of INSTR1 -- NO CACHE OCCURS
  - PM FETCH of INSTR2, DECODE of INSTR1 -- NO CACHE OCCURS
  - PM FETCH of INSTR3, DECODE of INSTR2, EXECUTE of INSTR1 (PM)
      - CONFLICT -- TRANSPARENT EXTRA CYCLE -- CACHE 3
  - CACHED INSTR3B, DECODE of INSTR3, EXECUTE of INSTR2 (PM)
      - NO CONFLICT
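The three conflict examples can be checked with a small stall counter. This is a sketch under simplifying assumptions: an instruction executing a PM data access conflicts with the fetch of the instruction two places later unless that instruction is already cached, each conflict costs one extra cycle and caches the fetched instruction, and the cache is treated as unlimited in size. The function name and the use of instruction numbers as addresses are illustrative.

```python
def count_stall_cycles(executed, uses_pm_data, cache=None):
    """Count transparent extra cycles for a sequence of executed instructions.

    executed     -- instruction numbers in execution order
    uses_pm_data -- set of instruction numbers that read data over the PM bus
    cache        -- optional set of already cached instruction numbers
    """
    cache = set() if cache is None else cache
    stalls = 0
    for i, instr in enumerate(executed):
        if instr not in uses_pm_data or i + 2 >= len(executed):
            continue
        being_fetched = executed[i + 2]  # in FETCH while instr is in EXECUTE
        if being_fetched not in cache:
            stalls += 1               # fetch must wait for the data access
            cache.add(being_fetched)  # cached, so no conflict next time
    return stalls, cache


if __name__ == "__main__":
    print(count_stall_cycles([1, 2, 3, 4], set()))         # dm accesses only
    print(count_stall_cycles([1, 2, 3, 4], {1, 2, 3, 4}))  # pm, straight line
    print(count_stall_cycles([1, 2, 3, 3, 3, 3], {1, 2, 3}))  # pm, in loop
```

In the looped case only the first fetch of instruction 3 conflicts; every later pass is served from the cache, which is why the cost of PM data accesses inside a loop is paid only on the first time round.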
Reference Material
  - SHARC manual
  - You might want to look at www.techonline.com
  - Also look at SHARC NAVIGATOR ONLINE!
Tackled today
  - Parts of the SHARC program sequencer
  - Similarity to “old” micro-sequencers used when designing custom bit-slice array processors back in the early 80’s
  - Pipelining issues
  - Resource conflict between instructions
  - Delayed branches -- nops or instructions to find
  - Loops: restrictions and “short loops”, counter and non-counter based loops
  - Interrupt concepts -- see later lecture
  - Instruction cache -- opens up a third bus when the fetch of instruction N+2 conflicts with the execution of instruction N