Published by Φιλοκράτης Κοντόσταυλος. Modified over 6 years ago.
1
Program Flow on ADSP2106X SHARC Pipeline issues
M. R. Smith, Electrical and Computer Engineering, University of Calgary, Alberta, Canada. ucalgary.ca
2
To be tackled today
Parts of the SHARC program sequencer
Similarity to the “old” micro-sequencers used when designing custom bit-slice array processors back in the early 80’s
Pipelining issues
Resource conflicts between instructions
Delayed branches -- nops, or useful instructions to find for the delay slots
Loops -- restrictions and “short loops”, counter-based and non-counter-based loops
Interrupt concepts -- see later lecture
Instruction cache
11/24/2018 ENCM Program flow issues on SHARC ADSP21061 Copyright
3
SHARC OVERVIEW
Linear flow
Loops -- one sequence of instructions executed several times with no overhead
Subroutines -- temporary interruption of sequential flow with an associated return
Jumps -- permanently transfer flow
Interrupts -- special case of “subroutines” triggered by an event at run time
Idle -- cease current operation, hold state until an interrupt -- low power mode? event-driven programs?
4
Functionality of the Sequencer
More sophisticated than the sequencer in the 68K CPU
The 21061 sequencer selects the address of the next instruction from addresses pre-calculated earlier
It generates many address choices and is also responsible for:
Incrementing the fetch address
Maintaining small hardware stacks
Evaluating whether to do conditional operations
Decrementing the hardware loop counter
Calculating new addresses -- circular buffer, bit reverse
Handling interrupts
5
21K Program Sequencer
6
Nothing new in this world
I could just as easily hand round my ENEL515 notes from 1981 on a “CUSTOM” microcoded DSP array processor built around the AMD2911 instruction sequencer.
7
Generic “microcoded CPU”
8
AMD2911 Microsequencer chip/library
The ‘micro’PC can be held at the same value (“single cycle” loops) or loaded from the “hardware increment of NEXT ADDRESS”
The “next address” does not come straight from the microPC but from a “selection” mux
9
AMD2911 Microsequencer chip/library
Entry path for the JUMP address or counter value (cf. 21k LCNTR)
Hardware counter -- cf. 21k LCE, or “remember the bottom of the loop”
Hardware PC stack with hardware SP -- used for subroutines and for “remembering” the first address in a hardware-controlled loop
Since the hardware stack is limited in size, a stack overflow must be allowed to activate an exception handler
10
AMD2911 Microsequencer chip/library
Sequencer control logic and associated control signals
Multiplexer -- allows “immediate selection” from a variety of pre-calculated “next addresses”
11
AMD2911 -- Handwired (ENEL515 -- 1981)
[Figure: ADSP21061 (ENCM) alongside the handwired AMD2911 system (ENEL)]
12
For more information see p 3-5
Program sequencer registers
FADDR (fetch), DADDR (decode), PC (execute)
PCSTK, PCSTKP (PC stack control)
LADDR, CURLCNTR, LCNTR (loop)
Also many related system registers, including user-defined status flags
User-defined flags mean: loop until the user says not to
13
21k instruction cycle -- 3 phases
In the fetch cycle, the processor reads the instruction from either the on-chip instruction cache or program memory
During the decode cycle, the instruction is decoded, generating conditions that control execution -- not the same as the 68k decode phase
In the execute cycle, the operations specified by the instruction are completed -- essentially the 68k decode, execute and writeback phases at the same time
14
Instruction Cycles are overlapped/pipelined
Cycle #    1    2    3    4    5
FETCH      n    n+1  n+2  n+3  n+4
DECODE     -    n    n+1  n+2  n+3
EXECUTE    -    -    n    n+1  n+2
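The overlap of fetch, decode and execute can be replayed in a few lines of Python. This is only a sketch of an ideal three-stage pipeline with no stalls or branches; the function names `pipeline_occupancy` and `label` are made up for this illustration and are not part of any SHARC tool.

```python
def pipeline_occupancy(num_instructions, stages=("FETCH", "DECODE", "EXECUTE")):
    """Return {stage: [instruction index or None, one entry per cycle]}.

    Assumption: an ideal pipeline, one instruction enters FETCH per cycle,
    and each later stage lags the previous one by exactly one cycle.
    """
    depth = len(stages)
    total_cycles = num_instructions + depth - 1
    table = {}
    for lag, stage in enumerate(stages):
        row = []
        for cycle in range(total_cycles):
            index = cycle - lag
            row.append(index if 0 <= index < num_instructions else None)
        table[stage] = row
    return table


def label(index):
    """Format an instruction index the way the slides do: n, n+1, n+2, ..."""
    if index is None:
        return "-"
    return "n" if index == 0 else f"n+{index}"


if __name__ == "__main__":
    occupancy = pipeline_occupancy(5)
    for stage in ("FETCH", "DECODE", "EXECUTE"):
        print(f"{stage:8}" + " ".join(f"{label(i):>4}" for i in occupancy[stage]))
```

Five instructions need seven cycles to drain through three stages, which is why three separate address registers are in flight at any moment.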
15
Interesting implications
Since the instruction stream is pipelined, “3 PCs” are needed in order to restart an interrupted program sequence
This implies that interrupts have considerable overhead -- getting in, getting out and restarting the old sequence
The several “PCs” are the program sequencer registers FADDR (fetch), DADDR (decode) and PC (execute)
16
Conditional Branches -- MOST INSTRUCTIONS ARE CONDITIONAL
TRUE, FALSE
NOTE: the WHOLE OF the INSTRUCTION is conditional, NOT PARTS of it
21K condition codes with a 68K equivalent: EQ, LT, LE, AC (unsigned carry), AV (signed overflow)
‘Special’ 21K condition codes -- parallel operations
Multiplier (MV overflow, MS sign) and Shifter (SV overflow, SZ zero)
Flags (FLAG0_IN, 1, 2, 3 -- specialized “built-in” I/O lines)
BM (Bus master)
LCE (Loop counter expired)
17
Implications -- JUMP (PC, NEXT); non-delayed branch -- no (DB)
(For CALL (PC, NEXT); address n+1 is pushed onto the PC hardware stack as instruction j is fetched)
CondBr is instruction n. Each row lists, in order, the instructions passing through that stage; DECODE runs one cycle behind FETCH, and EXECUTE one cycle behind DECODE.
EXECUTE: n, nop, nop, j
DECODE: n, n+1 -> nop, n+2 -> nop, j, j+1
FETCH: n, n+1, n+2, j, j+1
18
Implications -- JUMP (PC, NEXT) (DB); delayed branch
(For CALL (PC, NEXT) (DB); address n+3 is pushed onto the PC hardware stack as instruction j is fetched)
CondBr is instruction n. Each row lists, in order, the instructions passing through that stage; DECODE runs one cycle behind FETCH, and EXECUTE one cycle behind DECODE.
EXECUTE: n, n+1, n+2, j
DECODE: n, n+1, n+2, j, j+1
FETCH: n, n+1, n+2, j, j+1
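The difference between the non-delayed and the delayed branch can be checked with a toy three-stage pipeline model. This is a sketch under stated assumptions: the branch is always taken, the redirect happens in the cycle the branch executes, and `simulate` and `call_return_address` are hypothetical helper names invented for this example, not SHARC tool functions.

```python
def simulate(program, branch_addr, target_addr, delayed, cycles):
    """Return the instructions that reach EXECUTE, in order.

    program: instruction names indexed by address.
    Assumption: the branch at branch_addr is always taken.
    """
    fetch = decode = execute = None      # each latch: [address, squashed]
    next_fetch = 0
    trace = []
    for _ in range(cycles):
        # Advance the pipeline by one cycle.
        execute, decode = decode, fetch
        fetch = [next_fetch, False] if next_fetch < len(program) else None
        next_fetch += 1
        if execute is not None and execute[0] == branch_addr and not execute[1]:
            if not delayed:
                # Non-delayed: the two instructions behind the branch become nops.
                for latch in (decode, fetch):
                    if latch is not None:
                        latch[1] = True
            next_fetch = target_addr     # j is fetched on the following cycle
        if execute is not None:
            trace.append("nop" if execute[1] else program[execute[0]])
    return trace


def call_return_address(n, delayed):
    """Return address pushed by a CALL at address n, as stated on the slides."""
    return n + 3 if delayed else n + 1


if __name__ == "__main__":
    prog = ["n", "n+1", "n+2", "n+3", "j", "j+1"]
    print(simulate(prog, 0, 4, delayed=False, cycles=6))
    print(simulate(prog, 0, 4, delayed=True, cycles=6))
```

Both variants reach j in the same cycle; the delayed form simply does useful work in the two slots that the non-delayed form fills with nops.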
19
Key Issue -- All jumps are conditional
The condition is assumed to be TRUE if no condition is specified -- i.e. the default is TRUE
Conditional JUMP -- transparent stalls after the instruction when the jump is taken
Conditional JUMP (DB) -- always executes the two instructions after the jump, whether the conditional jump is taken or not
20
Loops (DO UNTIL)
LCNTR = 30, DO ENDLOOP - 1 UNTIL LCE;      <- LCE is the TERMINATION condition
    R0 = dm(I0, M0), F2 = PM(I8, M8);      <- TOP OF LOOP
    R1 = R0 - R14;
    F4 = F2 + F3;                          <- LAST ADDRESS
ENDLOOP:
During DO UNTIL, the sequencer:
pushes the last address and the termination condition onto the LOOP STACK
pushes the top-of-loop address onto the PC stack
Maximum of 6 entries in the LOOP STACK
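The bookkeeping done by DO UNTIL can be mimicked with two Python lists. This is a functional sketch only: it ignores pipeline timing, the class name `Sequencer` and its methods are invented for the illustration, and the depth of 6 is the figure quoted on the slide.

```python
LOOP_STACK_DEPTH = 6  # maximum number of loop stack entries, per the slide


class Sequencer:
    """Toy model of the loop stack and PC stack used by DO ... UNTIL LCE."""

    def __init__(self):
        self.pc_stack = []     # holds top-of-loop addresses
        self.loop_stack = []   # holds last address, termination condition, counter

    def do_until_lce(self, top, last, count):
        if count < 1:
            raise ValueError("this sketch only models counts of 1 or more")
        if len(self.loop_stack) >= LOOP_STACK_DEPTH:
            raise OverflowError("loop stack overflow")
        self.loop_stack.append({"last": last, "until": "LCE", "counter": count})
        self.pc_stack.append(top)

    def run_innermost(self):
        """Run the innermost loop to completion; return the executed addresses."""
        entry = self.loop_stack[-1]
        top = self.pc_stack[-1]
        executed = []
        while True:
            executed.extend(range(top, entry["last"] + 1))
            entry["counter"] -= 1
            if entry["counter"] == 0:   # LCE: loop counter expired
                break
        self.loop_stack.pop()
        self.pc_stack.pop()
        return executed


if __name__ == "__main__":
    seq = Sequencer()
    seq.do_until_lce(top=10, last=12, count=30)   # three-instruction loop body
    print(len(seq.run_innermost()))
```

With a three-instruction body and LCNTR = 30, the model executes 90 instructions and leaves both stacks empty again, which is the behaviour the slide describes.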
21
Flow during the hardware “Loop Back” portion
Note that the “beginning of loop” is pre-fetched, with no stalls in the instruction fetches
E = END OF LOOP, B = BEGIN OF LOOP
EXECUTE: E-2, E-1, E, B
DECODE: E-2, E-1, E, B, B+1
FETCH: E-2, E-1, E, B, B+1
22
Flow during the hardware loop termination
Note that the “beginning of loop” is NOT pre-fetched; instead the “instructions outside the loop” are fetched, with no stalls
EXECUTE: E-2, E-1, E, E+1
DECODE: E-2, E-1, E, E+1, E+2
FETCH: E-2, E-1, E, E+1, E+2
23
Implications
Maximum depth of loops and subroutine calls -- implications for running “C” code
The condition is tested “two cycles” before the end of the loop
This means the loop counter (if used) is decremented “two cycles” before the end of the loop -- what if the loop is only 1 cycle long?
The sequencer may have to “unwind” some instruction fetches/decodes
Nested loops can’t end on the same instruction
The last three instructions of a loop can’t be a branch, jump, call or return
Exception -- a non-delayed call where the subroutine uses the RTS (LR) instruction (loop return)
24
Non-counter based loops
Not using LCE (counter = 0) as the loop exit condition
If this is the outer loop of nested loops, then the end address of the loop must be at least 2 addresses after the last address of the inner loop
Pipeline and loops
3 instruction loop -- tested at top of loop, COMPLETES the loop before exiting -- do-while, not while
2 instruction loop -- if the condition is already true, completes this and the next pass -- even if not wanted?
1 instruction loop -- completes three more cycles through the loop -- even if not wanted?
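The three short-loop cases above can be written down as a small lookup. This sketch takes the slide's wording at face value and does not model the pipeline itself; the table `PASSES_AFTER_TRUE` and both function names are invented for this illustration, and the exact counts should be confirmed against the SHARC manual.

```python
# Passes still executed once the exit condition is true, as worded on the slide:
# 1-instruction loop: three more passes; 2-instruction loop: this pass and the next.
PASSES_AFTER_TRUE = {1: 3, 2: 2}


def passes_still_executed(loop_length):
    """Passes through the loop that still run after the exit condition is true."""
    if loop_length < 1:
        raise ValueError("a loop needs at least one instruction")
    # Loops of three or more instructions complete the current pass only.
    return PASSES_AFTER_TRUE.get(loop_length, 1)


def instructions_still_executed(loop_length):
    """Same quantity expressed in instructions rather than passes."""
    return passes_still_executed(loop_length) * loop_length


if __name__ == "__main__":
    for length in (1, 2, 3, 4):
        print(length, passes_still_executed(length), instructions_still_executed(length))
```

The point of the table is that a non-counter loop behaves like do-while rather than while, and the shorter the loop the more unwanted passes can slip through.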
25
That looks like a midterm/final exam question if I ever saw one
Implement and discuss the operation of exiting from a hardware loop based on whether FLAG1 is set or not
EXECUTE: n, n+1, n+2, n+3
DECODE: n, n+1, n+2, n+3, n+4
FETCH: n, n+1, n+2, n+3, n+4
26
Interrupt issues -- plan to tackle in Lab 4
Latency
Interrupt Vector Table
ISR
Trampoline code
Instruction completion
Multiple interrupts
Timer and IDLE, IDLE16
For more details see Manual 3-21
27
Cache issues -- Key issues
Fetching an instruction from PM can conflict with another instruction fetching data from PM
An instruction is not cached until it has been used once, so a data/instruction conflict is possible the first time round a loop
The cache is limited in size (a fixed number of instructions)
See Manual 3-38
28
Example of conflict issues
1 - R1 = 0, R3 = dm(I4, plus1DM);
2 - R1 = R1 + R3, R2 = dm(I4, plus1DM);
3 - R1 = R1 + R2, R2 = dm(I4, plus1DM);
4 - R1 = R1 + R2, R2 = dm(I4, plus1DM);
PM FETCH of INSTR1
PM FETCH of INSTR2, DECODE of INSTR1
PM FETCH of INSTR3, DECODE of INSTR2, EXECUTE of INSTR1 (DM)
PM FETCH of INSTR4, DECODE of INSTR3, EXECUTE of INSTR2 (DM)
NO PROBLEM HERE
29
Example of conflict issues
1 - R1 = 0, R3 = pm(I12, plus1PM);
2 - R1 = R1 + R3, R2 = pm(I12, plus1PM);
3 - R1 = R1 + R2, R2 = pm(I12, plus1PM);
4 - R1 = R1 + R2, R2 = pm(I12, plus1PM);
PM FETCH of INSTR1 -- NO CACHE OCCURS
PM FETCH of INSTR2, DECODE of INSTR1 -- NO CACHE OCCURS
PM FETCH of INSTR3, DECODE of INSTR2, EXECUTE of INSTR1 (PM) -- CONFLICT -- TRANSPARENT EXTRA CYCLE -- CACHE 3
PM FETCH of INSTR4, DECODE of INSTR3, EXECUTE of INSTR2 (PM) -- CONFLICT -- TRANSPARENT EXTRA CYCLE -- CACHE 4
TRANSPARENT -- the programmer does not need to add a NOP (cf. Intel 860)
30
Example of conflict issues -- in loop
1 - R1 = 0, R3 = pm(I12, plus1PM);
2 - R1 = R1 + R3, R2 = pm(I12, plus1PM);
3 - R1 = R1 + R2, R2 = pm(I12, plus1PM);
3B - R1 = R1 + R2, R2 = pm(I12, plus1PM); (instruction 3 again, IN LOOP)
PM FETCH of INSTR1 -- NO CACHE OCCURS
PM FETCH of INSTR2, DECODE of INSTR1 -- NO CACHE OCCURS
PM FETCH of INSTR3, DECODE of INSTR2, EXECUTE of INSTR1 (PM) -- CONFLICT -- TRANSPARENT EXTRA CYCLE -- CACHE 3
CACHED INSTR3B, DECODE of INSTR3, EXECUTE of INSTR2 (PM) -- NO CONFLICT
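The three conflict examples can be reproduced by counting extra cycles in a simple model. This is a sketch under explicit assumptions: an instruction that reads data over the PM bus conflicts with the fetch of the instruction two places later in the execution stream, a conflicting fetch costs one extra cycle and caches that instruction, and the cache is treated as unbounded (the real cache is limited in size). The function name `count_stalls` is invented for this illustration.

```python
def count_stalls(stream, uses_pm_data, cache=None):
    """Count transparent extra cycles for a stream of executed instructions.

    stream: instruction numbers in execution order (loops are unrolled).
    uses_pm_data: set of instruction numbers that access data over the PM bus.
    cache: optional set of already cached instructions (assumed unbounded).
    """
    cache = set() if cache is None else cache
    stalls = 0
    for position, instruction in enumerate(stream):
        if instruction not in uses_pm_data:
            continue                      # DM access only: no PM bus conflict
        if position + 2 >= len(stream):
            continue                      # nothing is being fetched behind it
        fetched = stream[position + 2]    # fetched while `instruction` executes
        if fetched in cache:
            continue                      # the cache supplies the instruction
        stalls += 1                       # transparent extra cycle
        cache.add(fetched)                # the conflicting instruction is cached
    return stalls


if __name__ == "__main__":
    print(count_stalls([1, 2, 3, 4], uses_pm_data=set()))          # DM example
    print(count_stalls([1, 2, 3, 4], uses_pm_data={1, 2, 3, 4}))   # PM example
    print(count_stalls([1, 2, 3, 3, 3, 3], uses_pm_data={1, 2, 3}))  # loop example
```

In the loop case only the first fetch of instruction 3 conflicts; every later pass is served from the cache, which is why the penalty is paid once per loop rather than once per iteration.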
31
Reference Material
SHARC manual
You might want to look at
Also look at SHARC NAVIGATOR ONLINE!
32
Tackled today
Parts of the SHARC program sequencer
Similarity to the “old” micro-sequencers used when designing custom bit-slice array processors back in the early 80’s
Pipelining issues
Resource conflicts between instructions
Delayed branches -- nops, or useful instructions to find for the delay slots
Loops -- restrictions and “short loops”, counter-based and non-counter-based loops
Interrupt concepts -- see later lecture
Instruction cache -- opens up a third bus when the fetch of instruction N+2 conflicts with the execution of instruction N