Embedded Processor Architecture and Programming (嵌入式處理器架構與程式設計). 王建民, Institute of Information Science, Academia Sinica (中央研究院 資訊所), July 2008.



Contents
- Introduction
- Computer Architecture
- ARM Architecture
- Development Tools
    - GNU Development Tools
- ARM Instruction Set
- ARM Assembly Language
- ARM Assembly Programming
    - GNU ARM ToolChain
- Interrupts and Monitor

Lecture 6 ARM Instruction Set

Outline
- Main Features
- Data Processing and Branch Instructions
- Data Transfer Instructions

Main Features 1
- Fully 32-bit instruction set in native operating modes
    - 32-bit long instruction word
- All instructions are conditional
    - Normal execution with condition AL (always)
- Most instructions execute in a single cycle.
- For a RISC processor, the instruction set is quite diverse, with different addressing modes
    - 36 instruction formats

Main Features 2
- A load/store architecture
    - Data processing instructions act only on registers
        - Three-operand format
        - Combined ALU and shifter for high-speed bit manipulation
    - Specific memory access instructions with powerful auto-indexing addressing modes
        - 32-bit and 8-bit data types, and also 16-bit data types on ARM Architecture v4
        - Flexible multiple-register load and store instructions
- Instruction set extension via coprocessors

ARM Instruction Set Format
[Figure: bit-field layouts of the 32-bit ARM instruction formats. Every format starts with the 4-bit cond field; the data processing format continues with I, opcode, S, Rn, Rd and operand2. Formats shown: data processing, multiply, long multiply, swap, load/store, halfword transfer, branch, branch exchange, coprocessor, software interrupt.]

Conditional Execution 1
- Most instruction sets only allow branches to be executed conditionally.
- However, by reusing the condition evaluation hardware, ARM effectively increases the number of instructions.
    - All instructions contain a condition field which determines whether the CPU will execute them.
    - Non-executed instructions still take 1 cycle.
        - The cycle has to complete so that the following instructions can be fetched and decoded.

Conditional Execution 2
- This removes the need for many branches, which stall the pipeline (3 cycles to refill).
    - Allows very dense in-line code, without branches.
    - The time penalty of not executing several conditional instructions is frequently less than the overhead of the branch or subroutine call that would otherwise be needed.

    With a branch:                With conditional execution:
        CMP   r3,#0                   CMP   r3,#0
        BEQ   skip                    ADDNE r0,r1,r2
        ADD   r0,r1,r2
    skip

Conditional Execution and Flags
- By default, data processing instructions do not affect the condition code flags, but the flags can optionally be set by appending "S". CMP does not need "S".

    loop
        ...
        SUBS r1,r1,#1    ; decrement r1 and set flags
        BNE  loop        ; if Z flag clear then branch

The Condition Field
The Cond field occupies the top four bits of every instruction:
    0000 = EQ - Z set (equal)
    0001 = NE - Z clear (not equal)
    0010 = HS / CS - C set (unsigned higher or same)
    0011 = LO / CC - C clear (unsigned lower)
    0100 = MI - N set (negative)
    0101 = PL - N clear (positive or zero)
    0110 = VS - V set (overflow)
    0111 = VC - V clear (no overflow)
    1000 = HI - C set and Z clear (unsigned higher)
    1001 = LS - C clear or Z set (unsigned lower or same)
    1010 = GE - N set and V set, or N clear and V clear (>=)
    1011 = LT - N set and V clear, or N clear and V set (<)
    1100 = GT - Z clear, and either N set and V set, or N clear and V clear (>)
    1101 = LE - Z set, or N set and V clear, or N clear and V set (<=)
    1110 = AL - always
    1111 = NV - reserved

Condition Codes
AL is the default and does not need to be specified.

    Suffix   Description               Flags tested
    EQ       Equal                     Z=1
    NE       Not equal                 Z=0
    CS/HS    Unsigned higher or same   C=1
    CC/LO    Unsigned lower            C=0
    MI       Minus                     N=1
    PL       Positive or zero          N=0
    VS       Overflow                  V=1
    VC       No overflow               V=0
    HI       Unsigned higher           C=1 & Z=0
    LS       Unsigned lower or same    C=0 or Z=1
    GE       Greater or equal          N=V
    LT       Less than                 N!=V
    GT       Greater than              Z=0 & N=V
    LE       Less than or equal        Z=1 or N!=V
    AL       Always
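The flag tests in the table can be checked with a small Python model. This is only an illustrative sketch, not ARM tooling: the function names `condition_passed` and `cmp_flags` are made up for this example, and the flags are modelled as plain booleans.

```python
MASK = 0xFFFFFFFF


def cmp_flags(a: int, b: int):
    """Return (N, Z, C, V) as set by CMP a, b, i.e. by computing a - b on 32 bits."""
    a &= MASK
    b &= MASK
    result = (a - b) & MASK
    n = bool(result >> 31)                      # sign bit of the result
    z = result == 0
    c = a >= b                                  # C set means "no borrow"
    v = bool(((a ^ b) & (a ^ result)) >> 31)    # signed overflow of a - b
    return n, z, c, v


def condition_passed(cond: str, n: bool, z: bool, c: bool, v: bool) -> bool:
    """Return True if an instruction carrying suffix `cond` would execute."""
    table = {
        "EQ": z, "NE": not z,
        "CS": c, "HS": c,
        "CC": not c, "LO": not c,
        "MI": n, "PL": not n,
        "VS": v, "VC": not v,
        "HI": c and not z,
        "LS": (not c) or z,
        "GE": n == v,
        "LT": n != v,
        "GT": (not z) and n == v,
        "LE": z or n != v,
        "AL": True,
    }
    return table[cond.upper()]
```

For instance, `condition_passed("LT", *cmp_flags(3, 5))` models whether an instruction with the LT suffix runs after `CMP` has compared 3 with 5.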

Examples of Conditional Execution 1
- Use a sequence of several conditional instructions

    if (a==0) func(1);

        CMP   r0,#0
        MOVEQ r0,#1
        BLEQ  func

- Set the flags, then use various condition codes

    if (a==0) x=0;
    if (a>0)  x=1;

        CMP   r0,#0
        MOVEQ r1,#0
        MOVGT r1,#1

Examples of Conditional Execution 2
- Use conditional compare instructions

    if (a==4 || a==10) x=0;

        CMP   r0,#4
        CMPNE r0,#10
        MOVEQ r1,#0

Outline
- Main Features
- Data Processing and Branch Instructions
- Data Transfer Instructions

Branch Instructions 1
- Branch:           B{<cond>} label
- Branch with Link: BL{<cond>} subroutine_label

    Encoding: Cond (bits 31-28) | 1 0 1 | L (bit 24) | Offset (bits 23-0)
    - Cond: condition field
    - L: link bit, 0 = Branch, 1 = Branch with link

- The processor core shifts the offset field left by 2 positions, sign-extends it and adds it to the PC
    - ± 32 Mbyte range
    - How to perform longer branches?
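The offset calculation can be sketched in Python. The helper names below are hypothetical, and the sketch assumes classic ARM state, where the PC reads as the address of the branch instruction plus 8 because of the pipeline.

```python
def encode_branch_offset(instr_addr: int, target: int) -> int:
    """Return the 24-bit offset field for a B/BL at instr_addr that jumps to target."""
    delta = target - (instr_addr + 8)      # assumption: PC = instruction address + 8
    if delta % 4:
        raise ValueError("target must be word aligned")
    words = delta >> 2                     # the field counts words, not bytes
    if not -(1 << 23) <= words < (1 << 23):
        raise ValueError("target is outside the +/-32 Mbyte branch range")
    return words & 0xFFFFFF


def decode_branch_target(instr_addr: int, field: int) -> int:
    """Sign-extend the 24-bit field, shift it left by 2 and add it to the PC."""
    if field & 0x800000:
        field -= 1 << 24
    return (instr_addr + 8 + (field << 2)) & 0xFFFFFFFF
```

A branch to itself gives the field 0xFFFFFE (minus two words), which shows why the offset is relative to the PC rather than to the instruction address.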

Branch Instructions 2
- The "Branch with link" instruction implements a subroutine call by writing PC-4 into the LR of the current bank.
    - i.e. the address of the next instruction following the branch with link (allowing for the pipeline).
- To return from the subroutine, simply restore the PC from the LR:
    MOV pc, lr
    - Again, the pipeline has to refill before execution continues.
- The "Branch" instruction does not affect LR.

Data Processing Instructions
- Consist of:
    - Arithmetic:    ADD ADC SUB SBC RSB RSC
    - Logical:       AND ORR EOR BIC
    - Comparisons:   CMP CMN TST TEQ
    - Data movement: MOV MVN
- These instructions only work on registers, NOT memory.
- Syntax: <Operation>{<cond>}{S} Rd, Rn, Operand2
    - Comparisons set flags only; they do not specify Rd
    - Data movement does not specify Rn
- The second operand is sent to the ALU via the barrel shifter.

Arithmetic Operations
- Operations are:
    ADD    Rd = operand1 + operand2
    ADC    Rd = operand1 + operand2 + carry
    SUB    Rd = operand1 - operand2
    SBC    Rd = operand1 - operand2 + carry - 1
    RSB    Rd = operand2 - operand1
    RSC    Rd = operand2 - operand1 + carry - 1
- Examples:
    ADD    r0, r1, r2
    SUBGT  r3, r3, #1
    RSBLES r4, r5, #5
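The carry terms exist so that wider arithmetic can be chained from 32-bit pieces. The Python sketch below models that idea; the helper names are invented for illustration, and each 64-bit value is assumed to be split into a low and a high 32-bit word.

```python
MASK = 0xFFFFFFFF


def adds(a: int, b: int, carry_in: int = 0):
    """Model ADDS (carry_in = 0) or ADCS: return (result, carry_out)."""
    total = (a & MASK) + (b & MASK) + carry_in
    return total & MASK, total >> 32


def subs(a: int, b: int, carry_in: int = 1):
    """Model SUBS (carry_in = 1) or SBCS: a - b + carry - 1 equals a + NOT(b) + carry."""
    total = (a & MASK) + (~b & MASK) + carry_in
    return total & MASK, total >> 32


def add64(a_lo, a_hi, b_lo, b_hi):
    lo, carry = adds(a_lo, b_lo)          # ADDS lo, a_lo, b_lo
    hi, _ = adds(a_hi, b_hi, carry)       # ADC  hi, a_hi, b_hi
    return lo, hi


def sub64(a_lo, a_hi, b_lo, b_hi):
    lo, carry = subs(a_lo, b_lo)          # SUBS lo, a_lo, b_lo
    hi, _ = subs(a_hi, b_hi, carry)       # SBC  hi, a_hi, b_hi
    return lo, hi
```

After a subtraction the carry flag is the inverse of a borrow, which is why SBC adds `carry - 1` rather than subtracting the carry.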

Logical Operations
- Operations are:
    AND    Rd = operand1 & operand2
    EOR    Rd = operand1 ^ operand2
    ORR    Rd = operand1 | operand2
    BIC    Rd = operand1 & NOT operand2   [i.e. bit clear]
- Examples:
    AND    r0, r1, r2
    BICEQ  r2, r3, #7
    EORS   r1, r3, r0

Comparisons
- The only effect of the comparisons is to update the condition flags. There is no need to set the S bit and no need to specify Rd.
- Operations are:
    CMP    operand1 - operand2, but result not written
    CMN    operand1 + operand2, but result not written
    TST    operand1 & operand2, but result not written
    TEQ    operand1 ^ operand2, but result not written
- Examples:
    CMP    r0, r1
    TSTEQ  r2, #5

Data Movement
- Operations are:
    MOV    Rd = operand2
    MVN    Rd = NOT operand2
  Note that these make no use of operand1.
- Examples:
    MOV    r0, r1
    MOVS   r2, #10
    MVNEQ  r1, #0

Quiz #2
Convert the GCD algorithm given in this flowchart into
1) "Normal" assembly, where only branches can be conditional.
2) ARM assembly, where all instructions are conditional, thus improving code density.
The only instructions you need are CMP, B and SUB.
[Flowchart: Start. Test "r0 = r1 ?": if Yes, Stop. If No, test "r0 > r1 ?": if Yes, r0 = r0 - r1; if No, r1 = r1 - r0. Then return to the first test.]
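As a reference for checking an assembly answer, the flowchart can be modelled directly in Python. This is not the quiz solution itself: the function name is hypothetical, and the sketch assumes both inputs are positive integers, as the subtraction-based algorithm requires.

```python
def gcd_by_subtraction(r0: int, r1: int) -> int:
    """Follow the flowchart: subtract the smaller value from the larger until both are equal."""
    while r0 != r1:          # "r0 = r1 ?" test; equal means stop
        if r0 > r1:          # "r0 > r1 ?" test
            r0 = r0 - r1
        else:
            r1 = r1 - r0
    return r0
```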

The Barrel Shifter
- LSL: Logical Shift Left
    - Multiplication by a power of 2
- LSR: Logical Shift Right
    - Division by a power of 2
- ASR: Arithmetic Shift Right
    - Division by a power of 2, preserving the sign bit
- ROR: Rotate Right
    - Bit rotate with wrap around from LSB to MSB
- RRX: Rotate Right Extended
    - Single-bit rotate with wrap around from CF to MSB
[Figure: for each operation, the path of bits through the destination register and the carry flag (CF); LSL and LSR shift zeros in.]
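The five shift types can be modelled on 32-bit values in Python. The sketch below is illustrative only: the function names are invented here, shift amounts are restricted to 1 to 31 for simplicity, and each function returns the shifted value together with the carry-out bit.

```python
MASK = 0xFFFFFFFF


def lsl(x: int, n: int):
    """Logical shift left by n (1..31): zeros enter at the bottom."""
    x &= MASK
    return (x << n) & MASK, (x >> (32 - n)) & 1


def lsr(x: int, n: int):
    """Logical shift right by n (1..31): zeros enter at the top."""
    x &= MASK
    return x >> n, (x >> (n - 1)) & 1


def asr(x: int, n: int):
    """Arithmetic shift right by n (1..31): copies of the sign bit enter at the top."""
    x &= MASK
    result = x >> n
    if x & 0x80000000:
        result |= (MASK << (32 - n)) & MASK
    return result, (x >> (n - 1)) & 1


def ror(x: int, n: int):
    """Rotate right by n (1..31): bits leaving the LSB re-enter at the MSB."""
    x &= MASK
    result = ((x >> n) | (x << (32 - n))) & MASK
    return result, result >> 31


def rrx(x: int, carry_in: int):
    """Rotate right extended: a one-bit rotate through the carry flag."""
    x &= MASK
    return ((carry_in & 1) << 31) | (x >> 1), x & 1
```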

Using the Barrel Shifter
The second operand can be either:
- A register, optionally with a shift operation
    - The shift value can be either:
        - a 5-bit unsigned integer
        - specified in the bottom byte of another register
    - Used for multiplication by a constant
- An immediate value
    - 8-bit number, 0 to 255
        - Rotated right through an even number of positions
    - Allows an increased range of 32-bit constants to be loaded directly into registers
[Figure: Operand 1 feeds the ALU directly; Operand 2 passes through the barrel shifter before reaching the ALU, which produces the result.]

Second Operand: Shifted Register
- The amount by which the register is to be shifted is contained in either:
    - the immediate 5-bit field in the instruction
        - NO OVERHEAD
        - The shift is done for free; the instruction executes in a single cycle.
    - the bottom byte of a register (not PC)
        - Then takes an extra cycle to execute
        - ARM doesn't have enough read ports to read 3 registers at once.
        - Then the same as on other processors where the shift is a separate instruction.
- If no shift is specified then a default shift is applied: LSL #0
    - i.e. the barrel shifter has no effect on the value in the register.

Using a Shifted Register
- A more efficient solution for multiplication can often be found by using some combination of MOVs, ADDs, SUBs and RSBs with shifts.
    - Multiplications by a constant of the form (power of 2) ± 1 can be done in one cycle.
- Example: r0 = r1 * 5 = r1 + (r1 * 4)
    ADD r0, r1, r1, LSL #2
- Example: r2 = r3 * 105 = r3 * 15 * 7 = r3 * (16 - 1) * (8 - 1)
    RSB r2, r3, r3, LSL #4    ; r2 = r3 * 15
    RSB r2, r2, r2, LSL #3    ; r2 = r2 * 7
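The two examples can be checked with a short Python model of ADD and RSB with an LSL-shifted second operand. The helper names are hypothetical and the arithmetic is reduced modulo 2^32 to mimic 32-bit registers.

```python
MASK = 0xFFFFFFFF


def add_lsl(rn: int, rm: int, shift: int) -> int:
    """Model ADD Rd, Rn, Rm, LSL #shift."""
    return (rn + (rm << shift)) & MASK


def rsb_lsl(rn: int, rm: int, shift: int) -> int:
    """Model RSB Rd, Rn, Rm, LSL #shift, i.e. Rd = (Rm << shift) - Rn."""
    return ((rm << shift) - rn) & MASK


def times5(r1: int) -> int:
    return add_lsl(r1, r1, 2)            # r1 + r1 * 4


def times105(r3: int) -> int:
    r2 = rsb_lsl(r3, r3, 4)              # r3 * 16 - r3 = r3 * 15
    return rsb_lsl(r2, r2, 3)            # r2 * 8 - r2 = r2 * 7
```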

Immediate Constants 1
- No ARM instruction can contain a 32-bit immediate constant
    - All ARM instructions are fixed at 32 bits long
- The data processing instruction format has 12 bits available for operand2: a 4-bit rotate value (rot) and an 8-bit immediate (immed_8)
    - The 4-bit rotate value (0-15) is multiplied by two to give a range of 0-30 in steps of 2
- Rule to remember: "8 bits shifted by an even number of bit positions"
[Figure: immed_8 passes through the shifter, which applies ROR by rot x 2.]
- Quick Quiz: 0xe3a004ff is MOV r0, #???

Immediate Constants 2
- Examples:
    ror #0     range 0-0x000000ff   step 0x00000001
    ror #8     range 0-0xff000000   step 0x01000000
    ror #30    range 0-0x000003fc   step 0x00000004
- The assembler converts immediate values to the rotate form:
    MOV r0,#4096            ; uses 0x40 ror 26
    ADD r1,r2,#0xFF0000     ; uses 0xFF ror 16
- The bitwise complements can also be formed using MVN:
    MOV r0,#0xFFFFFFFF      ; assembles to MVN r0,#0
- Values that cannot be generated in this way will cause an error.
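The "8 bits rotated right by an even amount" rule can be tried out in Python. The sketch below is only an illustration with invented function names; `encode_operand2_immediate` returns the first encoding it finds, which may differ from the rotation a particular assembler chooses, and returns `None` when the value cannot be encoded.

```python
MASK = 0xFFFFFFFF


def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by amount bits."""
    amount %= 32
    value &= MASK
    return ((value >> amount) | (value << (32 - amount))) & MASK


def decode_operand2_immediate(imm12: int) -> int:
    """Expand a 12-bit operand2 field: immed_8 rotated right by 2 * rot."""
    rot = (imm12 >> 8) & 0xF
    imm8 = imm12 & 0xFF
    return ror32(imm8, 2 * rot)


def encode_operand2_immediate(value: int):
    """Return a 12-bit field that decodes to value, or None if none exists."""
    for rot in range(16):
        imm8 = ror32(value, 32 - 2 * rot)   # rotating left undoes the ROR
        if imm8 <= 0xFF:
            return (rot << 8) | imm8
    return None
```

Decoding the low 12 bits of the quick-quiz word, `decode_operand2_immediate(0xe3a004ff & 0xFFF)`, is one way to check an answer.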

Loading 32 Bit Constants
- To allow larger constants to be loaded, the assembler offers a pseudo-instruction:
    LDR rd,=const
- This will either:
    - Produce a MOV or MVN instruction to generate the value (if possible), or
    - Generate an LDR instruction with a PC-relative address to read the constant from a literal pool.
- For example:
    LDR r0,=0xFF    =>  MOV r0,#0xFF
    LDR r0,=0x…     =>  LDR r0,[PC,#Imm12]
                        …
                        DCD 0x…

Multiplication Instructions 1
- Two multiplication instructions:
    - Multiply
        MUL{<cond>}{S} Rd,Rm,Rs       ; Rd = Rm*Rs
    - Multiply Accumulate, which does the addition for free
        MLA{<cond>}{S} Rd,Rm,Rs,Rn    ; Rd = (Rm*Rs)+Rn
- Restrictions on use:
    - Rd and Rm cannot be the same register
        - Can be avoided by swapping Rm and Rs around.
    - Cannot use PC.
  These will be picked up by the assembler if overlooked.
- Operands can be considered signed or unsigned
    - It is up to the user to interpret the result correctly.

Multiplication Instructions 2
- Cycle time
    - Basic MUL instruction
        - 2-5 cycles on ARM7TDMI
        - 1-3 cycles on StrongARM/XScale
        - 2 cycles on ARM9E/ARM102xE
    - +1 cycle for ARM9TDMI (over ARM7TDMI)
    - +1 cycle for accumulate (not on 9E, though the result delay is one cycle longer)
    - +1 cycle for "long"
- The above are "general rules"; refer to the TRM for the core you are using for the exact details.

Multiply-Long Instructions
- Instructions are:
    MULL    RdHi,RdLo := Rm*Rs
    MLAL    RdHi,RdLo := (Rm*Rs) + RdHi,RdLo
- The full 64 bits of the result now matter
    - Need to specify whether the operands are signed or unsigned
- Therefore the syntax of the new instructions is:
    UMULL{<cond>}{S} RdLo,RdHi,Rm,Rs
    UMLAL{<cond>}{S} RdLo,RdHi,Rm,Rs
    SMULL{<cond>}{S} RdLo,RdHi,Rm,Rs
    SMLAL{<cond>}{S} RdLo,RdHi,Rm,Rs
- Not generated by the compiler.
- Warning: unpredictable on non-M ARMs.
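Why signedness matters for the 64-bit result can be seen in a small Python model. The function names mirror the mnemonics but are just illustrative helpers; each returns the pair (RdLo, RdHi).

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(x: int) -> int:
    """Interpret a 32-bit pattern as a two's complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def umull(rm: int, rs: int):
    """Unsigned 32 x 32 -> 64 multiply."""
    product = (rm & MASK32) * (rs & MASK32)
    return product & MASK32, product >> 32


def smull(rm: int, rs: int):
    """Signed 32 x 32 -> 64 multiply."""
    product = (to_signed32(rm) * to_signed32(rs)) & MASK64
    return product & MASK32, product >> 32


def umlal(rdlo: int, rdhi: int, rm: int, rs: int):
    """Unsigned multiply, accumulating into the 64-bit value RdHi:RdLo."""
    acc = (((rdhi & MASK32) << 32) | (rdlo & MASK32)) + (rm & MASK32) * (rs & MASK32)
    acc &= MASK64
    return acc & MASK32, acc >> 32
```

The same bit pattern 0xFFFFFFFF multiplied by 2 gives a different high word depending on whether it is read as 4294967295 or as -1; the low word is identical in both cases.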

Quiz #3
1. Specify instructions which will implement the following:
    a) r0 = 16
    b) r1 = r0 * 4
    c) r0 = r1 / 16 (r1 signed 2's complement)
    d) r1 = r2 * 7
2. What will the following instructions do?
    a) ADDS r0, r1, r1, LSL #2
    b) RSB r2, r1, #0
3. What does the following instruction sequence do?
    ADD r0, r1, r1, LSL #1
    SUB r0, r0, r1, LSL #4
    ADD r0, r0, r1, LSL #7

Outline
- Main Features
- Data Processing and Branch Instructions
- Data Transfer Instructions

Load / Store Instructions
- The ARM is a Load / Store Architecture:
    - It does not support memory-to-memory data processing operations.
    - Data values must be moved into registers before they are used.
- This might sound inefficient, but in practice it isn't:
    - Load data values from memory into registers.
    - Process data in registers using a number of data processing instructions, which are not slowed down by memory access.
    - Store results from registers out to memory.

Single Register Data Transfer
- Operations are:
    LDR     STR     Word
    LDRB    STRB    Byte
    LDRH    STRH    Halfword
    LDRSB           Signed byte load
    LDRSH           Signed halfword load
- The memory system must support all access sizes
- Syntax:
    LDR{<cond>}{<size>} Rd, <address>
    STR{<cond>}{<size>} Rd, <address>
  e.g. LDREQB

Load/Store Memory Address 1
- The address accessed by LDR/STR is specified by a base register plus an offset.
- For word and unsigned byte accesses, the offset can be:
    - An unsigned 12-bit immediate value (i.e. 0 - 4095 bytes).
        LDR r0,[r1,#8]
    - A register, optionally shifted by an immediate value
        LDR r0,[r1,r2]
        LDR r0,[r1,r2,LSL#2]

Load/Store Memory Address 2
- The offset can be either added to or subtracted from the base register:
    LDR r0,[r1,#-8]
    LDR r0,[r1,-r2]
    LDR r0,[r1,-r2,LSL#2]
- For halfword and signed halfword / byte accesses, the offset can be:
    - An unsigned 8-bit immediate value (i.e. 0 - 255 bytes).
    - A register (unshifted).
- Choice of pre-indexed or post-indexed addressing

Example: Based Addressing
- The memory location to be accessed is held in a base register

    STR r0, [r1]    ; Store contents of r0 to location
                    ; pointed to by contents of r1.
    LDR r2, [r1]    ; Load r2 with contents of memory
                    ; location pointed to by contents of r1.

[Figure: base register r1 = 0x200; STR writes the source register r0 = 0x5 to memory at 0x200; LDR reads that location into the destination register r2 = 0x5.]

Example: Indexed Addressing
- The memory location to be accessed is calculated from the values held in a base register and an index register (optionally shifted by a constant).

    STR r0, [r1, r2, LSL #2]    ; Addr = (r1) + (r2) * 4
    LDR r3, [r1, r2, LSL #2]    ; Addr = (r1) + (r2) * 4

[Figure: base register r1 = 0x200 and index register r2 = 0x20; r2 x 4 is added to r1 to give address 0x280, where STR writes r0 = 0x5 and from where LDR loads r3 = 0x5.]

Pre or Post Indexed Addressing?
- Pre-indexed: STR r0,[r1,#12]
    - Base register r1 = 0x200, offset 12: the source register r0 = 0x5 is stored at 0x20c.
    - r1 is left unchanged.
    - Auto-update form: STR r0,[r1,#12]!
- Post-indexed: STR r0,[r1],#12
    - Original base register r1 = 0x200: the source register r0 = 0x5 is stored at 0x200.
    - The offset 12 is then added, giving the updated base register r1 = 0x20c.
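The three store forms differ only in which address is used and whether the base register is written back. A hedged Python sketch follows, with memory modelled as a dictionary and a hypothetical `store_word` helper whose `mode` names are invented for this example.

```python
MASK = 0xFFFFFFFF


def store_word(memory: dict, base: int, offset: int, mode: str, value: int) -> int:
    """Model STR with an immediate offset and return the new base register value.

    mode "offset": STR r0,[r1,#off]   (pre-indexed, base unchanged)
    mode "pre":    STR r0,[r1,#off]!  (pre-indexed with write-back)
    mode "post":   STR r0,[r1],#off   (post-indexed, always writes back)
    """
    if mode not in ("offset", "pre", "post"):
        raise ValueError("unknown mode")
    address = base if mode == "post" else (base + offset) & MASK
    memory[address] = value
    return base if mode == "offset" else (base + offset) & MASK
```

With the slide's values (r1 = 0x200, r0 = 0x5, offset 12), only the post-indexed form stores at 0x200, and only the plain pre-indexed form leaves r1 at 0x200.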

User Mode Privilege
- When using post-indexed addressing, there is a further form of Load/Store Word/Byte:
    LDR{<cond>}{B}T Rd, <post_indexed_address>
    STR{<cond>}{B}T Rd, <post_indexed_address>
- When used in a privileged mode, this does the load/store with user mode privilege.
    - Normally used by an exception handler that is emulating a memory access instruction that would normally execute in user mode.

Usage of Pre-indexed Addressing Mode
- Imagine an array, the first element of which is pointed to by the contents of r0.
- If we want to access a particular element, then we can use pre-indexed addressing:
    - r1 is the index of the element we want.
    LDR r2, [r0, r1, LSL #2]
[Figure: r0 points to the start of the array in memory; the offset selects the element.]

Usage of Post-indexed Addressing Mode
- If we want to step through every element of the array, for instance to produce the sum of the elements in the array, then we can use post-indexed addressing within a loop:
    - r1 is the address of the current element (initially equal to r0).
    LDR r2, [r1], #4
- Use a further register to store the address of the final element, so that the loop can be correctly terminated.

Effect of Endianness
- The ARM can be set up to access its data in either little-endian or big-endian format.
    - Little endian: the least significant byte of a word is stored in bits 0-7 of an addressed word.
    - Big endian: the least significant byte of a word is stored in bits 24-31 of an addressed word.
- This has no real relevance unless data is stored as words and then accessed in smaller sized quantities (halfwords or bytes).
    - Which byte / halfword is accessed will depend on the endianness of the system involved.

Endianness Example
    r1 = 0x100, r0 = 0x…
    STR  r0, [r1]
    LDRB r2, [r1]
- Little-endian: r2 = 0x44
- Big-endian:    r2 = 0x11
[Figure: the memory word at r1 = 0x100 under each byte order.]
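The STR followed by LDRB experiment can be reproduced in Python with `int.to_bytes`. The sketch is illustrative only: the helper names are made up, and the word 0x11223344 is an assumed value chosen so that each byte is distinguishable.

```python
def store_word(memory: bytearray, address: int, value: int, byteorder: str) -> None:
    """Model STR: write a 32-bit word using the given byte order ("little" or "big")."""
    memory[address:address + 4] = value.to_bytes(4, byteorder)


def load_byte(memory: bytearray, address: int) -> int:
    """Model LDRB: read the single byte at address."""
    return memory[address]


def first_byte_after_store(value: int, byteorder: str) -> int:
    memory = bytearray(0x200)
    store_word(memory, 0x100, value, byteorder)   # STR  r0, [r1]
    return load_byte(memory, 0x100)               # LDRB r2, [r1]
```

Calling `first_byte_after_store(0x11223344, "little")` and then the same with `"big"` shows the two byte orders returning opposite ends of the word from the same address.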

Quiz #4
- Write a segment of code that adds together elements x to x+(n-1) of an array, where the element x=0 is the first element of the array.
- Each element of the array is word sized.
- The segment should use post-indexed addressing.
- At the start of your segment, you should assume that:
    - r0 points to the start of the array.
    - r1 = x
    - r2 = n
[Figure: the array starting at r0, from element 0 through elements x, x + 1, ..., x + (n - 1); the n elements from x to x + (n - 1) are the ones to add.]

Block Data Transfer 1
- The LDM/STM instructions allow between 1 and 16 registers to be transferred to or from memory.
- The transferred registers can be either:
    - Any subset of the current bank of registers (default).
    - Any subset of the user mode bank of registers when in a privileged mode (postfix the instruction with a '^').

    Encoding: Cond | 1 0 0 | P | U | S | W | L | Rn | Register list
    - Cond: condition field
    - P: Pre/Post indexing bit. 0 = Post; add offset after transfer. 1 = Pre; add offset before transfer.
    - U: Up/Down bit. 0 = Down; subtract offset from base. 1 = Up; add offset to base.
    - S: PSR and force user bit. 0 = don't load PSR or force user mode. 1 = load PSR or force user mode.
    - W: Write-back bit. 0 = no write-back. 1 = write address into base.
    - L: Load/Store bit. 0 = Store to memory. 1 = Load from memory.
    - Rn: base register
    - Register list: each bit corresponds to a particular register. For example, bit 0 set causes r0 to be transferred, and bit 0 unset causes r0 not to be transferred. At least one register must be transferred, as the list cannot be empty.

Block Data Transfer 2
- The base register is used to determine where the memory access should occur.
    - 4 different addressing modes allow increment and decrement, inclusive or exclusive of the base register location.
    - The base register can optionally be updated following the transfer (by appending a '!' to it).
    - The lowest register number is always transferred to/from the lowest memory location accessed.
- These instructions are very efficient for:
    - Saving and restoring context
        - For this it is useful to view memory as a stack.
    - Moving large blocks of data around memory
        - For this it is useful to directly represent the functionality of the instructions.

Stacks
- A stack is an area of memory which grows as new data is "pushed" onto the "top" of it, and shrinks as data is "popped" off the top.
- Two pointers define the current limits of the stack.
    - A base pointer
        - used to point to the "bottom" of the stack (the first location).
    - A stack pointer
        - used to point to the current "top" of the stack.
[Figure: a stack between BASE and SP before and after a POP; the values 1 and 2 remain on the stack and the result of the pop is 3.]

Stack Operation 1
- Traditionally, a stack grows down in memory, with the last "pushed" value at the lowest address. The ARM also supports ascending stacks, where the stack structure grows up through memory.
- The value of the stack pointer can either:
    - Point to the last occupied address (Full stack)
        - and so needs pre-decrementing (i.e. before the push)
    - Point to the next address to be occupied (Empty stack)
        - and so needs post-decrementing (i.e. after the push)

Stack Operation 2
- The stack type to be used is given by the postfix to the instruction:
    - STMFD / LDMFD : Full Descending stack
    - STMFA / LDMFA : Full Ascending stack
    - STMED / LDMED : Empty Descending stack
    - STMEA / LDMEA : Empty Ascending stack
- Note: the ARM Compiler will always use a Full Descending stack.

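Stack Examples
    STMFD sp!, {r0,r1,r3-r5}
    STMED sp!, {r0,r1,r3-r5}
    STMFA sp!, {r0,r1,r3-r5}
    STMEA sp!, {r0,r1,r3-r5}
[Figure: for each of the four instructions, the memory region marked 0x3e8, 0x400 and 0x418, showing where r0, r1, r3, r4 and r5 are stored (r0 at the lowest address, r5 at the highest) together with the old SP and the new SP.]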
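Where each stack type puts the registers can be worked out with a small Python model. This is a sketch under stated assumptions: the `stm` helper is hypothetical, write-back (`sp!`) is always applied, and the old SP is taken to be 0x400.

```python
def stm(mode: str, sp: int, registers):
    """Model STM<mode> sp!, {registers}.

    Returns (layout, new_sp), where layout maps each address to the register stored there.
    """
    regs = sorted(registers)              # lowest register goes to the lowest address
    size = 4 * len(regs)
    if mode == "FD":                      # full descending = decrement before
        start, new_sp = sp - size, sp - size
    elif mode == "ED":                    # empty descending = decrement after
        start, new_sp = sp - size + 4, sp - size
    elif mode == "FA":                    # full ascending = increment before
        start, new_sp = sp + 4, sp + size
    elif mode == "EA":                    # empty ascending = increment after
        start, new_sp = sp, sp + size
    else:
        raise ValueError("mode must be FD, ED, FA or EA")
    layout = {start + 4 * i: f"r{r}" for i, r in enumerate(regs)}
    return layout, new_sp
```

For example, `stm("FD", 0x400, [0, 1, 3, 4, 5])` leaves the new SP pointing at the word holding r0, whereas the ED form leaves it one word below r0.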

Stacks and Subroutines 1
- One use of stacks is to create temporary register workspace for subroutines. Any registers that are needed can be pushed onto the stack at the start of the subroutine and popped off again at the end, so as to restore them before returning to the caller:

    caller:
        :
        BL    func1
        :

    func1:
        STMFD sp!,{regs,lr}
        :
        BL    func2
        :
        LDMFD sp!,{regs,pc}

    func2:
        :
        MOV   pc, lr

Stacks and Subroutines 2
- See the chapter on the ARM Procedure Call Standard in the SDT Reference Manual for further details of register usage within subroutines.
- If the pop instruction also had the 'S' bit set (using '^'), then the transfer of the PC when in a privileged mode would also cause the SPSR to be copied into the CPSR.

Direct Block Data Transfer
- When LDM / STM are not being used to implement stacks, it is clearer to specify exactly what the functionality of the instruction is:
    - i.e. specify whether to increment / decrement the base pointer, before or after the memory access.
- In order to do this, LDM / STM support a further syntax in addition to the stack one:
    - STMIA / LDMIA : Increment After
    - STMIB / LDMIB : Increment Before
    - STMDA / LDMDA : Decrement After
    - STMDB / LDMDB : Decrement Before

Example: Block Copy
- Copy a block of memory, which is an exact multiple of 12 words long, from the location pointed to by r12 to the location pointed to by r13. r14 points to the end of the block to be copied.

    ; r12 points to the start of the source data
    ; r14 points to the end of the source data
    ; r13 points to the start of the destination data
    loop
        LDMIA r12!, {r0-r11}    ; load 48 bytes
        STMIA r13!, {r0-r11}    ; and store them
        CMP   r12, r14          ; check for the end
        BNE   loop              ; and loop until done

- This loop transfers 48 bytes in 31 cycles
- Over 50 Mbytes/sec at 33 MHz
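Both the loop structure and the throughput figure can be checked with a short Python model. The sketch makes several simplifying assumptions: memory is a list of words indexed by word number rather than by byte address, and the function names are invented for this example.

```python
def block_copy(memory: list, src: int, dst: int, end: int, words_per_pass: int = 12) -> int:
    """Copy words from src up to end into dst, 12 words per pass; return the pass count."""
    passes = 0
    while True:
        block = memory[src:src + words_per_pass]       # LDMIA r12!, 12 registers
        src += words_per_pass
        memory[dst:dst + words_per_pass] = block       # STMIA r13!, 12 registers
        dst += words_per_pass
        passes += 1
        if src == end:                                 # CMP r12, r14 / BNE loop
            return passes


def throughput(bytes_per_pass: int, cycles_per_pass: int, clock_hz: int) -> float:
    """Bytes per second moved by a loop with the given per-pass cost."""
    return bytes_per_pass * clock_hz / cycles_per_pass
```

With 48 bytes per pass, 31 cycles per pass and a 33 MHz clock, `throughput(48, 31, 33_000_000)` comes out at roughly 51 million bytes per second, consistent with the slide's "over 50 Mbytes/sec".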

Quiz #5
- The contents of registers r0 to r6 need to be swapped around thus:
    - r0 moved into r3
    - r1 moved into r4
    - r2 moved into r6
    - r3 moved into r5
    - r4 moved into r0
    - r5 moved into r1
    - r6 moved into r2
- Write a segment of code that uses full descending stack operations to carry this out, and requires no use of any other registers for temporary storage.

Swap and Swap Byte Instructions
- Atomic operation of a memory read followed by a memory write, which moves byte or word quantities between registers and memory.
- Syntax:
    SWP{<cond>}{B} Rd, Rm, [Rn]
- Thus to implement an actual swap of contents, make Rd = Rm.
- The compiler cannot produce this instruction.
[Figure: the memory location addressed by Rn is read into a temporary, Rm is written to that location, and the temporary is then written to Rd.]

Software Interrupt (SWI)
    Encoding: Cond (bits 31-28) | 1 1 1 1 | SWI number (bits 23-0, ignored by processor)
- Causes an exception trap to the SWI hardware vector
- The SWI handler can examine the SWI number to decide what operation has been requested.
- By using the SWI mechanism, an operating system can implement a set of privileged operations which applications running in user mode can request.
- Syntax:
    SWI{<cond>} <SWI number>

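PSR Transfer Instructions
- MRS and MSR allow the contents of CPSR / SPSR to be transferred to / from a general purpose register.
- Syntax:
    MRS{<cond>} Rd,<psr>                ; Rd = <psr>
    MSR{<cond>} <psr[_fields]>,Rm       ; <psr[_fields]> = Rm
    MSR{<cond>} <psr_fields>,#Immediate
  where
    <psr> = CPSR or SPSR
    [_fields] = any combination of 'fsxc'
- In User Mode, all bits can be read but only the condition flags (_f) can be written
[Figure: PSR layout. Bits 31-27 hold the N, Z, C, V and Q flags, bit 24 is J, bits 7-5 are I, F and T, and bits 4-0 are the mode; the remaining bits are undefined. The field masks f, s, x and c cover bits 31-24, 23-16, 15-8 and 7-0.]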

Using MRS and MSR
- Currently reserved bits may be used in future, therefore:
    - they must be preserved when altering a PSR.
    - the value they return must not be relied upon when testing other bits.
- Thus a read-modify-write strategy must be followed when modifying any PSR:
    - Transfer the PSR to a register using MRS
    - Modify the relevant bits
    - Transfer the updated value back to the PSR using MSR
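The read-modify-write strategy amounts to masking in only the bits you mean to change. The Python sketch below illustrates this with the IRQ disable bit (I, bit 7 of the CPSR); the function names are hypothetical, and the register value is passed in and returned rather than read from real hardware.

```python
MASK = 0xFFFFFFFF
I_BIT = 0x80    # IRQ disable bit, bit 7 of the CPSR


def disable_irq(cpsr: int) -> int:
    """Value to write back with MSR after reading cpsr with MRS: set I, keep every other bit."""
    return (cpsr | I_BIT) & MASK


def enable_irq(cpsr: int) -> int:
    """Clear I and keep every other bit, including the reserved ones."""
    return cpsr & ~I_BIT & MASK
```

In assembly the same pattern is an MRS, one ORR or BIC on the register copy, and an MSR, so reserved bits pass through untouched.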

Quiz #6
- Write a short code segment that performs a mode change by modifying the contents of the CPSR.
    - The mode you should change to is user mode, which has the value 0x10.
    - This assumes that the current mode is a privileged mode such as supervisor mode.
    - This would happen, for instance, when the processor is reset: reset code would be run in supervisor mode, which would then need to switch to user mode before calling the main routine in your application.
    - You will need to use MSR and MRS, plus 2 logical operations.