Comparing 68k (CISC) with 21k (Superscalar RISC DSP)
M. R. Smith, Electrical and Computer Engineering, University of Calgary, Alberta, Canada
6/20/2015 ENCM Compare 68k and 21k Copyright 2 / 37 To be tackled today When to use assembly code Useful sub-set of 68K CISC instructions Recap Effective addressing modes Load/Store Programming style for 68K Load/Store Architecture of 21K by comparison with 68K
6/20/2015 ENCM Compare 68k and 21k Copyright 3 / 37 “Reminder” Reuse the following ENCM415 concepts Don’t use “Assembly Code” unless “really have” to Write in “C/C++” whenever appropriate Connect to the hardware “in assembler” using instructions that always work -- RISC-like (MIPS) Understand linkages between “assembly” and “C” Customize “C” only when necessary ENCM515 Basic requirement for “Custom DSP” code -- need to know features of processor Recognize that speed comes from instructions that work only under special conditions because of processor architectural constraints -- opcode size, bus availability
6/20/2015 ENCM Compare 68k and 21k Copyright 4 / 37 Very limited set of instructions used in Assembly Code most of the time
Operational Instructions: MOVE; ADD, SUB (FADD, FSUB); AND, OR
Program Flow: BRA, JMP, JSR, RTS, TRAP; CMP, BNE, BEQ; BHI, BLO, BLS (unsigned branches); BGE, BLT, BGT (signed branches)
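The split between unsigned and signed branches can be sketched in C. This is only an illustrative model, not 68k code: the function name `branch_taken` and the enum `BranchCond` are hypothetical, and the model assumes the branch follows `CMP.L D0, D1`, which compares D1 against D0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of two 68k branches taken after CMP.L D0, D1. */
typedef enum { COND_HI, COND_GT } BranchCond;

bool branch_taken(BranchCond cond, uint32_t d0, uint32_t d1) {
    switch (cond) {
    case COND_HI:  /* BHI: D1 > D0, bit patterns read as unsigned */
        return d1 > d0;
    case COND_GT:  /* BGT: D1 > D0, same bit patterns read as signed
                      (assumes the usual two's-complement conversion) */
        return (int32_t)d1 > (int32_t)d0;
    }
    assert(!"unknown branch condition");
    return false;
}
```

With D0 = 1 and D1 = 0xFFFFFFFF, the unsigned branch is taken and the signed branch is not, because the same bit pattern reads as a very large number in one view and as -1 in the other.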
6/20/2015 ENCM Compare 68k and 21k Copyright 5 / 37 Easiest way to program 68K in assembly Have a PSP process to avoid the stupid mistakes that stop you getting to the stuff that is worth doing Never bother with the complex EA-mode instructions Don’t gain much any way Program CISC as if had “LOAD/STORE” architecture like the MIPS processor MOVE memory to register (LOAD) MOVE register to memory(STORE) OPERATE register on register -- Memory access in FETCH only Plus a few other non-RISC instructions that you find very useful to use (e.g. ADD.L #5, D0) Customize for speed later -- if it is worth the effort EASIER TO CUSTOMIZE when in this “simple” mode
6/20/2015 ENCM Compare 68k and 21k Copyright 6 / 37 Easiest way to program 21k in assembly Have a PSP process to avoid the stupid mistakes that stop you getting to the stuff that is worth doing Never bother with the complex EA-mode instructions Don’t gain much any way Program Superscalar RISC DSP which has “LOAD/STORE” architecture like the MIPS processor PLUS DSP-special MOVE memory to register (LOAD) MOVE register to memory(STORE) OPERATE register on register Plus a few other non-RISC instructions that you find very useful to use (e.g. ADD.L #5, D0) Customize for speed later -- if it is worth the effort
6/20/2015 ENCM Compare 68k and 21k Copyright 7 / 37 Some of the effective address modes for 68k MOVE
Register to Register -- RISC like: MOVE.L D1, D0   [D0] <- [D1] (31:0)   21k equivalent: R0 = R1;
Immediate to Register -- RISC like: MOVE.L #0x5000, D1   [D1] <- 0x5000 (31:0)   21k equivalent: R1 = 0x5000;
Memory to Register -- RISC like: MOVE.L 0x5000, D1   [D1] <- [M(0x5000)] (31:0)   21k equivalent: R1 = dm(0x5000);
Memory to Memory -- CISC: MOVE.L 0x5000, 0x6000   [M(0x6000)] <- [M(0x5000)] (31:0)   21k equivalent: no single instruction; R0 = dm(0x5000); dm(0x6000) = R0;
6/20/2015 ENCM Compare 68k and 21k Copyright 8 / 37 Look behind the instruction at the architecture
68k --- MOVE.L D1, D0 Involves fetching the instruction (4 cycles) and then everything else is done without extra (slow) memory operations
21k --- R0 = R1 Involves fetching the instruction (1 cycle) and then everything else is done without extra memory operations. Pipelining issue
6/20/2015 ENCM Compare 68k and 21k Copyright 9 / 37 Look behind the instruction at the architecture
68k --- MOVE.L #0x5000, D0 Involves fetching the instruction (4 cycles), then fetching the hi (4 cycles) and low (4 cycles) components of the constant stored in program space, and then everything else is done without extra memory operations -- Really MOVE.L #0x00005000, D0
21k --- R0 = 0x5000 Involves fetching the instruction (1 cycle) and then everything else is done without extra memory operations. More like MOVEQ.L #0x5000, D0 where the constant is built into the op-code
6/20/2015 ENCM Compare 68k and 21k Copyright 10 / 37 Look behind the instruction at the architecture
68k --- MOVE.L 0x5000, D0 Involves fetching the instruction (4), then fetching the hi (4) and low (4) components of the address constant stored in program space, then fetching the hi (4) and low (4) values from adjacent addresses in data space, and then everything else is done without extra memory operations. Again really MOVE.L 0x00005000, D0
21k --- R0 = dm(0x5000) Involves fetching the instruction (1) and then later fetching the value from data memory space (1). More like MOVE.L (Address_temp), D0 with the address register being preloaded during the instruction fetch.
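The memory-cycle counts quoted on these "look behind the instruction" slides can be tallied with a small C model. It is a rough sketch that counts memory accesses only, exactly as the slides do, and ignores execution time and pipelining; the function names and the two constants are hypothetical.

```c
#include <assert.h>

/* Assumed from the slides: each 16-bit 68k bus access costs 4 cycles,
   each 21k instruction fetch or data access costs 1 cycle. */
enum { M68K_CYCLES_PER_BUS_ACCESS = 4, SHARC_CYCLES_PER_ACCESS = 1 };

/* extension_words: 16-bit words that follow the opcode word
   data_words:      16-bit data reads or writes made by the instruction */
int m68k_memory_cycles(int extension_words, int data_words) {
    assert(extension_words >= 0 && data_words >= 0);
    return M68K_CYCLES_PER_BUS_ACCESS * (1 + extension_words + data_words);
}

/* data_accesses: data-memory accesses made after the single fetch */
int sharc_memory_cycles(int data_accesses) {
    assert(data_accesses >= 0);
    return SHARC_CYCLES_PER_ACCESS * (1 + data_accesses);
}
```

Under these assumptions, MOVE.L D1, D0 costs 4 cycles, MOVE.L #0x5000, D0 costs 12, and MOVE.L 0x5000, D0 costs 20, against 1, 1 and 2 for the corresponding 21k instructions.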
6/20/2015 ENCM Compare 68k and 21k Copyright 11 / 37 Some of the effective address modes for ADD
Register to Register -- RISC like: ADD.L D1, D0   [D0] <- [D0] + [D1]   21k equivalent: R0 = R0 + R1;
Immediate to Register -- CISC: ADD.L #0x5000, D1   [D1] <- [D1] + 0x5000
Memory to Register -- CISC: ADD.L 0x5000, D1   [D1] <- [D1] + [M(0x5000)]
Memory to Memory -- CISC: ADD.L 0x5000, 0x6000 -- illegal on 68K   [M(0x6000)] <- [M(0x6000)] + [M(0x5000)]   21k equivalent: illegal on 21k too
6/20/2015 ENCM Compare 68k and 21k Copyright 12 / 37 Look behind the instruction at the architecture
68k --- ADD.L #0x5000, D0 Involves fetching the instruction (4), then fetching the hi (4) and low (4) components of the constant stored in program space, and then doing the addition during the “execution” phase. On the 68k the 32-bit add takes extra cycles.
21k --- R1 = 0x5000; R0 = R1 + R0; Involves fetching the two instructions and then everything else is done without extra memory operations. More like MOVEQ.L #0x5000, D0
6/20/2015 ENCM Compare 68k and 21k Copyright 13 / 37 Basic LOAD/STORE operations
LOAD -- Memory to register: [Reg] <- [Memory(address)]
  68k: MOVE.L 0x5000, D1   [D1] <- [Memory(0x5000)]
  21k: R1 = dm(0x5000);   R1 = pm(0x5000);   F1 = dm(0x5000);
STORE -- Register to Memory: [Memory(address)] <- [Reg]
  68k: MOVE.L D1, 0x5000   [Memory(0x5000)] <- [D1]
  21k: dm(0x5000) = R1;   pm(0x5000) = R1;
CAREFUL!!!! 21k -- NOT QUITE: 2 memory busses
6/20/2015 ENCM Compare 68k and 21k Copyright 14 / 37 Basic LOAD/STORE operations
LOAD register with a constant: [Reg] <- constant value
  68k: MOVE.L #0x5000, D1   [D1] <- 0x5000
  21k: R1 = 0x5000;
CAREFUL!!!! 21k -- NOT QUITE: Can’t always make parallel
6/20/2015 ENCM Compare 68k and 21k Copyright 15 / 37 Basic Register-to-Register operations
LOAD -- Register to register: [Reg] <- [Reg2]
  68k: MOVE.L D1, D0   [D0] <- [D1]
  21k: R0 = R1;   Sometimes R0 = pass R1; is better
Operation -- Register to register: [Reg] <- [Reg] Operation [Reg2]
  68k: ADD.L D1, D0   [D0] <- [D0] + [D1]
  21k: R0 = R0 + R1;   R0 = R1 + R2; is also possible on 21k
CAREFUL!!!! 21k -- NOT QUITE, especially when parallel
6/20/2015 ENCM Compare 68k and 21k Copyright 16 / 37 Basic 68k Register-to-Register operations
Operation -- Register to register: [Reg] <- [Reg] Operation [Reg2]
  ADD.L D0, D1   [D1] <- [D1] + [D0]
  SUB.L D0, D1   [D1] <- [D1] - [D0]
  AND.L D0, D1   [D1] <- [D1] & [D0]
  OR.L D0, D1    [D1] <- [D1] | [D0]
  CMP.L D0, D1   [CCR] <- [D1] - [D0] (sets the condition codes only; D1 is unchanged)
  ASR #3, D0     [D0] <- [D0] >> 3 (signed)
  LSR #3, D0     [D0] <- [D0] >> 3 (unsigned)
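The difference between the signed (ASR) and unsigned (LSR) right shifts can be shown with a short C sketch. The helper names `asr32` and `lsr32` are hypothetical. The arithmetic shift is built from unsigned operations because right-shifting a negative signed value in C is implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* LSR: logical shift right, zeros enter at the top (unsigned view). */
uint32_t lsr32(uint32_t x, unsigned n) {
    assert(n < 32);
    return x >> n;
}

/* ASR: arithmetic shift right, copies of the sign bit enter at the top. */
int32_t asr32(int32_t x, unsigned n) {
    assert(n < 32);
    uint32_t ux = (uint32_t)x;
    if (x < 0) {
        /* Shift the complement, then complement again, so that ones
           rather than zeros fill the vacated high bits. */
        return (int32_t)~(~ux >> n);
    }
    return (int32_t)(ux >> n);
}
```

For example, shifting the pattern 0xFFFFFFF0 right by 3 gives -2 when treated as the signed value -16, but 0x1FFFFFFE when treated as unsigned.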
6/20/2015 ENCM Compare 68k and 21k Copyright 17 / 37 Basic 21k Register-to-Register operations
Operation -- Register to register: [Reg] <- [Reg1] Operation [Reg2]
  [D1] <- [D2] + [D0]
  [D1] <- [D1] - [D2]
  [D1] <- [D1] & [D2]
  [D1] <- [D1] | [D2]
  Compare
  [D0] <- [D0] >> 3 (signed)
  [D0] <- [D0] >> 3 (unsigned)
YOU COMPLETE THE 21k Instructions
6/20/2015 ENCM Compare 68k and 21k Copyright 18 / 37 Basic Indirect Addressing Operations to Memory
LOAD INDIRECT: [Reg] <- [Memory([AddressReg2])]
  68k: MOVE.L (A0), D0   [D0] <- [Memory([A0])]
  21k: R1 = dm(0, I4);   R1 = dm(I4, 0);   R1 = pm(I12, 0);
  21k: R1 = dm(I12, 0); NO!
LOAD INDIRECT with CONSTANT offset: [Reg] <- [Memory([AddressReg2 + offset])]
  68k: MOVE.L (8, A0), D0   [D0] <- [Memory([A0] + 8)]
  21k: R1 = dm(2, I4);
  21k: R1 = dm(I4, 2); NO!
Same with store operations
CAREFUL!!!! 21k -- NOT QUITE
6/20/2015 ENCM Compare 68k and 21k Copyright 19 / 37 Indirect Addressing Operations to Memory
LOAD INDIRECT with Register offset: [Reg] <- [Memory([AddressReg2 + offset])]
  68k: MOVE.L (A0,D1), D0   [D0] <- [Memory([A0] + [D1])]   (D1 used as loop counter)
  21k: R0 = dm(R1, I4); NO!!
  21k: M4 = R1; R0 = dm(M4, I4);
  21k: R1 = dm(I4, M4); NO!!
  21k: R1 = pm(M12, I12);
  21k: R1 = pm(M4, I12); NO!!
Same with store operations
LOAD INDIRECT with Register + constant offset: [Reg] <- [Memory([AddressReg2 + offset1 + offset2])]
  68k: MOVE.L (8, A0, D1), D0   [D0] <- [Memory([A0] + [D1] + 8)]
  21k: NO!!, multiple 21k instructions
CAREFUL!!!! 21k -- NOT QUITE
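The reason the order of the operands inside dm( ) matters can be sketched in C. This model assumes the usual SHARC reading of the two forms: with the modifier first, dm(M, I), the offset is added only to form the address and I is left unchanged; with the index first, dm(I, M), the access uses I and then I is updated by M. The `Dag` struct and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one index (I) / modify (M) register pair. */
typedef struct {
    int32_t i;  /* index register: word address */
    int32_t m;  /* modify register: word offset */
} Dag;

/* Models R0 = dm(M, I): read Memory[I + M]; I is left unchanged. */
int32_t load_premodify(const int32_t *mem, size_t len, const Dag *dag) {
    int32_t addr = dag->i + dag->m;
    assert(addr >= 0 && (size_t)addr < len);
    return mem[addr];
}

/* Models R0 = dm(I, M): read Memory[I], then update I by M. */
int32_t load_postmodify(const int32_t *mem, size_t len, Dag *dag) {
    assert(dag->i >= 0 && (size_t)dag->i < len);
    int32_t value = mem[dag->i];
    dag->i += dag->m;
    return value;
}
```

Only the first form behaves like the 68k MOVE.L (A0,D1), D0, where A0 and D1 must remain unchanged after the access.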
6/20/2015 ENCM Compare 68k and 21k Copyright 20 / 37 MOVE.L (8,A0,D1),D0
Fetch the MOVE instruction (4 cycles)
Fetch the Value 8 (4 cycles)
Move A0 to ALU then add D1 (loop variable)
Move result of ALU to ALU then add 8 (structure offset)
Move result to address register -- fetch memory value and store in high part of D0 (4 cycles)
Move result of ALU and add 2 (next address) (?)
Move result to address register -- fetch (4 cycles) memory value and store in low part of D0
Note A0 and D1 must remain unchanged
6/20/2015 ENCM Compare 68k and 21k Copyright 21 / 37 MOVE.L (8,A0,D1),D0 -- 21k style
A0 -> I4, D1 -> R1, D0 -> R0
Fetch the MOVE instruction (4 cycles)
Fetch the Value 8 (4 cycles)   |   R2 = 8;
Move A0 to ALU then add D1   |   R2 = R1 + R2;
Move result of ALU to ALU then add 8   |   M4 = R2;
Move result to address register -- fetch memory value and store in high part of D0
Move result of ALU and add 2 (next address)
Move result to address register -- fetch memory value and store in low part of D0   |   R0 = dm(M4, I4);
If using 21k hardware loop, how do you access the loop counter with minimum overhead?
6/20/2015 ENCM Compare 68k and 21k Copyright 22 / 37 Indirect Addressing Operations to Memory
LOAD INDIRECT with register post-increment: [Reg] <- [Memory([AddressReg2])] ; [AddressReg2] <- [AddressReg2] + 4
  68k: MOVE.L (A0)+, D0   [D0] <- [Memory([A0])] ; [A0] <- [A0] + 4
  21k: R0 = dm(I4, 1);
LOAD INDIRECT with register pre-decrement: [AddressReg2] <- [AddressReg2] - 4 ; [Reg] <- [Memory([AddressReg2])]
  68k: MOVE.L -(A0), D0   [A0] <- [A0] - 4 ; [D0] <- [Memory([A0])]
  21k: Modify (I4, -1); R0 = dm(0, I4);
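In C terms, the two 68k modes on this slide correspond to `*p++` and `*--p` on a pointer to a 32-bit value. The sketch below is illustrative only, and the function names are hypothetical. On the byte-addressed 68k the address steps by 4; the 21k lines on the slide step by 1, which suggests word addressing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models MOVE.L (A0)+, D0: read through the pointer, then advance it
   by one 32-bit element (4 bytes on the byte-addressed 68k). */
int32_t load_postincrement(const int32_t **a0) {
    assert(a0 != NULL && *a0 != NULL);
    int32_t d0 = **a0;
    (*a0)++;
    return d0;
}

/* Models MOVE.L -(A0), D0: step the pointer back by one 32-bit
   element first, then read through it. */
int32_t load_predecrement(const int32_t **a0) {
    assert(a0 != NULL && *a0 != NULL);
    (*a0)--;
    return **a0;
}
```

A post-increment load followed by a pre-decrement load returns the same value twice and leaves the pointer where it started, which is why the pair works as a stack pop and re-read.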
6/20/2015 ENCM Compare 68k and 21k Copyright 23 / 37 21k processor is DSP Digital Signal Processing Processor Customized for DSP In real life, programmer must be really close to the architecture if want speed However most of the time, treat like a version of the 68K
6/20/2015 ENCM Compare 68k and 21k Copyright 24 / 37 Compare MOVE on 21K and 68K
Register to Register:    R1 = R0   |   MOVE.L D0, D1
Immediate to Register:   R0 = 0x5000   |   MOVE.L #0x5000, D0
Memory to Register:      R0 = dm(0x5000)   |   MOVE.L 0x5000, D0
                         R0 = pm(0x5000)   |   --- No equivalent ---
Memory to Memory:        -- No equivalent --   |   MOVE.L 0x5000, 0x6000
                         R0 = dm(0x5000); dm(0x6000) = R0;
6/20/2015 ENCM Compare 68k and 21k Copyright 25 / 37 Comparing ADD operations
Register to Register Add:    R1 = R1 + R0   |   ADD.L D0, D1
Immediate to Register Add:   -- No equivalent --   |   ADD.L #0x5000, D0
                             R1 = 0x5000; R0 = R1 + R0;
Memory to Register Add:      -- No equivalent --   |   ADD.L 0x5000, D0
                             What are the equivalents?
Memory to Memory:            Not available on EITHER processor
                             What are the equivalents?
6/20/2015 ENCM Compare 68k and 21k Copyright 26 / 37 Easiest way to program 21K assembly
Can’t bother with the complex instructions
DSP has “LOAD/STORE” architecture like the MIPS processor
  MOVE memory to register (LOAD)
  MOVE register to memory (STORE)
  OPERATE register on register
  There are no other types of instruction
Customize for speed later using hardware
Develop a process to avoid the standard simple errors so that you can get to the stuff that is important. Most of you will not bother to use the process for 5 minutes in order to avoid wasting 1 hour of time
6/20/2015 ENCM Compare 68k and 21k Copyright 27 / 37 Basic LOAD/STORE operations
LOAD -- Memory to register: [Reg] <- [Memory(address)]
  R0 = dm(0x5000)   |   MOVE.L 0x5000, D0
STORE -- Register to Memory: [Memory(address)] <- [Reg]
  pm(0x5000) = R0   |   no 68k equivalent for pm
6/20/2015 ENCM Compare 68k and 21k Copyright 28 / 37 Basic LOAD/STORE operations
LOAD register with a constant: [Reg] <- constant value
  R0 = 0x5000   |   MOVE.L #0x5000, D0
6/20/2015 ENCM Compare 68k and 21k Copyright 29 / 37 Basic Register-to-Register operations
LOAD -- Register to register: [Reg] <- [Reg2]
  R0 = R1   |   MOVE.L D1, D0
Operation -- Register to register: [Reg] <- [Reg] Operation [Reg2]
  R1 = R1 + R0   |   ADD.L D0, D1
  R1 = R2 + R3   |   -- no equivalent --
6/20/2015 ENCM Compare 68k and 21k Copyright 30 / 37 Basic Register-to-Register operations
Operation -- Register to register: [Reg] <- [Reg] Operation [Reg2]
  R1 = R1 + R0     |   ADD.L D0, D1
  R1 = R1 - R0     |   SUB.L D0, D1
  R1 = R1 AND R0   |   AND.L D0, D1
  R1 = R1 OR R0    |   OR.L D0, D1
  -- many alternatives --   |   CMP.L D0, D1
6/20/2015 ENCM Compare 68k and 21k Copyright 31 / 37 Basic Indirect Addressing Operations to Memory
LOAD INDIRECT: [Reg] <- [Memory([AddressReg2])]
  R0 = dm(I0)   |   MOVE.L (A0), D0
LOAD INDIRECT with CONSTANT offset: [Reg] <- [Memory([AddressReg2 + offset])]
  R0 = dm(2, I4)   |   MOVE.L (8, A0), D0
  but R0 = pm(2, I12)   |   -- No need for distinction --
Special DAGS for custom data and program memory ops
6/20/2015 ENCM Compare 68k and 21k Copyright 32 / 37 Indirect Addressing Operations to Memory
LOAD INDIRECT with Register offset: [Reg] <- [Memory([AddressReg2 + offset])]
  R0 = dm(M4, I4)   |   MOVE.L (A0,D1), D0
  Order is absolutely key -- dm(I4, M4) means something VERY different
Same with store operations
LOAD INDIRECT with Register + constant offset: [Reg] <- [Memory([AddressReg2 + offset1 + offset2])]
  -- NO Equivalent --   |   MOVE.L (8, A0, D1), D0
  but wait till Lab. 2, 3 and 4 for some REALLY fancy SHARC addressing modes
6/20/2015 ENCM Compare 68k and 21k Copyright 33 / 37 Indirect Addressing Operations to Memory
LOAD INDIRECT with register post-increment: [Reg] <- [Memory([AddressReg2])] ; [AddressReg2] <- [AddressReg2] + 4
  R0 = dm(I4, M6)   |   MOVE.L (A0)+, D0   (with M6 preset to 1)
  R0 = dm(I4, 1) -- An instruction that is only useful on a Monday/Weds and our labs are on Friday and exams on Tues!
LOAD INDIRECT with register pre-decrement
  R0 = dm(I4, M7)   |   MOVE.L -(A0), D0   (with M7 preset to -1)
  R0 = dm(I4, -1) -- Only useful on a Monday/Weds
  R0 = dm(I4, M15) illegal but R0 = pm(I12, M15) is OKAY
6/20/2015 ENCM Compare 68k and 21k Copyright 34 / 37 You complete, without next slide
// long int value = 6;
// Memory[2000] = value;
// Memory[3000] = 7;
// long int *pt = &Memory[4000];
// *pt = value;
// *pt = 9;
// *pt++ = value + 1;
// *pt-- = value + 2;
6/20/2015 ENCM Compare 68k and 21k Copyright 35 / 37 Fix RISC architecture and speed Issues
#define valueR1 R1
valueR1 = 6;                    // long int value = 6;
dm(2000) = valueR1;             // Memory[2000] = value;
#define tempR0 R0
tempR0 = 7;
dm(3000) = tempR0;              // Memory[3000] = 7;
#define ptI4 I4                 // long int *pt = &Memory[4000];
ptI4 = 4000;
dm(ptI4) = valueR1;             // *pt = value;
tempR0 = 9;                     // *pt = 9;
dm(ptI4, M5) = tempR0;          // M5 preset to 0 by C start-up procedure
#define tempR2 R2
tempR0 = 1;                     // *pt++ = value + 1; (no immediate add on 21k)
tempR2 = valueR1 + tempR0;
dm(ptI4, M6) = tempR2;          // M6 preset to +1 by C start-up procedure
tempR0 = 2;                     // *pt-- = value + 2;
tempR2 = valueR1 + tempR0;
dm(ptI4, M7) = tempR2;          // M7 preset to -1 by C start-up procedure
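To check an assembly answer against what the C statements are supposed to do, the commented C fragment from the exercise can be run as ordinary C. This is a sketch under assumptions: `Memory` is modelled as a plain array of 5000 words, `pt` is taken to be a pointer, and the wrapper `run_exercise` is a hypothetical name that returns the final index `pt` refers to.

```c
#include <assert.h>

enum { MEMORY_WORDS = 5000 };   /* assumed size, large enough for index 4001 */
long Memory[MEMORY_WORDS];

/* Runs the exercise statements and returns the index pt ends up at. */
long run_exercise(void) {
    long value = 6;
    Memory[2000] = value;
    Memory[3000] = 7;
    long *pt = &Memory[4000];
    *pt = value;
    *pt = 9;
    *pt++ = value + 1;          /* writes 7 at 4000, then pt -> 4001 */
    *pt-- = value + 2;          /* writes 8 at 4001, then pt -> 4000 */
    assert(pt >= Memory && pt < Memory + MEMORY_WORDS);
    return (long)(pt - Memory);
}
```

After the run, Memory[4000] holds 7 rather than 9, because the post-increment store overwrites the earlier value before the pointer moves on, and the pointer finishes back at index 4000.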
6/20/2015 ENCM Compare 68k and 21k Copyright 36 / 37 NON-NEGOTIABLE
NON-NEGOTIABLE -- means that is the way the processor is designed and you can’t fight it
NON-NEGOTIABLE -- means that if you don’t do it this way you will waste a lot of time in the labs on the simple stuff -- and lose many marks in quizzes
NON-NEGOTIABLE -- means that this is fixed, standard, life. Develop a simple PSP process to review code to make sure this stuff is not there and you can get onto the interesting stuff.
CONTRACT -- The moment the class stops making 80% of these simple errors, I will stop taking most marks off in the quizzes for the simple stuff.
6/20/2015 ENCM Compare 68k and 21k Copyright 37 / 37 Tackled today When to use assembly code Useful sub-set of 68K CISC instructions Recap Effective addressing modes Load/Store Programming style for 68K Load/Store Architecture of 21K by comparison with 68K 21K architecture is customized for DSP