Advanced RISC Machines

Advanced RISC Machines
ARM Advanced RISC Machines ARM Instruction Set

Content Coverage

Main features of the ARM Instruction Set
All instructions are 32 bits long. Most instructions execute in a single cycle. Every instruction can be conditionally executed. A load/store architecture Data processing instructions act only on registers Three operand format Combined ALU and shifter for high speed bit manipulation Specific memory access instructions with powerful auto-indexing addressing modes. 32 bit and 8 bit data types and also 16 bit data types on ARM Architecture v4. Flexible multiple register load and store instructions Instruction set extension via coprocessors

Accessing Registers using ARM Instructions
No breakdown of currently accessible registers. All instructions can access r0-r14 directly. Most instructions also allow use of the PC. Specific instructions to allow access to CPSR and SPSR. Note : When in a privileged mode, it is also possible to load / store the (banked out) user mode registers to or from memory.

Condition Flags Flags Logical Instruction Arithmetic Instruction
Negative No meaning Bit 31 of the result has been set (N=‘1’) Indicates a negative number in signed operations Zero Result is all zeroes Result of operation was zero (Z=‘1’) Carry After Shift operation Result was greater than 32 bits (C=‘1’) ‘1’ was left in carry flag oVerflow No meaning Result was greater than 31 bits (V=‘1’) Indicates a possible corruption of the sign bit in signed numbers N flag SUB r0, r1, r2 where r1<r2 Z flag SUB r0, r1, r2 where r1=r2 (also used for results of logical operations) C flag ADD r0, r1, r2 where r1+r2>0xFFFFFFFF V flag ADD r0, r1, r2 where r1+r2>0x7FFFFFFF (if numbers are signed, ALU sign bit will be corrupted) (0x7FFFFFF+0x =0x ) (answer okay for unsigned but wrong for signed)

The Program Counter (R15)
When the processor is executing in ARM state: All instructions are 32 bits in length All instructions must be word aligned Therefore the PC value is stored in bits [31:2] with bits [1:0] equal to zero (as instruction cannot be half-word or byte aligned). R14 is used as the subroutine link register (LR) and stores the return address when Branch with Link operations are performed, calculated from the PC. Thus to return from a linked branch MOV r15,r14 or MOV pc,lr When used in relation to the ARM Byte means 8 bits Halfword means 16 bits (two bytes) Word means 32 bits (four bytes)

Exception Handling and the Vector Table
When an exception occurs, the core: Copies CPSR into SPSR_<mode> Sets appropriate CPSR bits If core implements ARM Architecture 4T and is currently in Thumb state, then ARM state is entered. Mode field bits Interrupt disable flags if appropriate. Maps in appropriate banked registers Stores the “return address” in LR_<mode> Sets PC to vector address To return, exception handler needs to: Restore CPSR from SPSR_<mode> Restore PC from LR_<mode>

The Instruction Pipeline
The ARM uses a pipeline in order to increase the speed of the flow of instructions to the processor. Allows several operations to be undertaken simultaneously, rather than serially. Rather than pointing to the instruction being executed, the PC points to the instruction being fetched. FETCH DECODE EXECUTE Instruction fetched from memory Decoding of registers used in instruction Register(s) read from Register Bank Shift and ALU operation Write register(s) back to Register Bank PC PC - 4 PC - 8 ARM Each instruction is one word (or 32 bits) Thus each stage in pipeline is one word In other words 4 bytes, hence the offsets of 4 and 8 used here. Most instructions execute in a single cycle Helps to keep the pipeline operating efficiently - only stalls if executing instruction takes several cycles. Thus every cycle, processor can be loading one instruction, decoding another, whilst executing a third. Typically the PC can be assumed to be current instruction plus 8 Cases when not case include When exceptions taken the address stored in LR varies - see Exception Handling module for more details. When PC used in some data processing operations value is unpredictable

ARM Instruction Set Format
31 28 27 16 15 8 7 Instruction type Data processing / PSR Transfer Multiply Long Multiply (v3M / v4 only) Swap Load/Store Byte/Word Load/Store Multiple Halfword transfer : Immediate offset (v4 only) Halfword transfer: Register offset (v4 only) Branch Branch Exchange (v4T only) Coprocessor data transfer Coprocessor data operation Coprocessor register transfer Software interrupt Cond I Opcode S Rn Rd Operand2 Cond A S Rd Rn Rs Rm Cond U A S RdHi RdLo Rs Rm Cond B Rn Rd Rm Cond I P U B W L Rn Rd Offset Cond P U S W L Rn Register List Cond P U 1 W L Rn Rd Offset1 1 S H 1 Offset2 Cond P U 0 W L Rn Rd S H 1 Rm Cond L Offset Cond P U N W L Rn CRd CPNum Offset Cond Op1 CRn CRd CPNum Op CRm Cond Op1 L CRn Rd CPNum Op CRm Cond SWI Number Cond Rn

Conditional Execution
Most instruction sets only allow branches to be executed conditionally. However by reusing the condition evaluation hardware, ARM effectively increases number of instructions. All instructions contain a condition field which determines whether the CPU will execute them. Non-executed instructions soak up 1 cycle. Still have to complete cycle so as to allow fetching and decoding of following instructions. This removes the need for many branches, which stall the pipeline (3 cycles to refill). Allows very dense in-line code, without branches. The Time penalty of not executing several conditional instructions is frequently less than overhead of the branch or subroutine call that would otherwise be needed.

The Condition Field cond 27 28 31 1110 AL Always any Opcode [31:28]
27 28 31 Opcode [31:28] Mnemonic extension Interpretation Status flag state for execution 0000 EQ Equal / equals zero Z set 0001 NE Not equal Z clear 0010 CS/HS Carry set / unsigned higher or some C set 0011 CC/LO Carry clear / unsigned lower C clear 0100 MI Minus / negative N set 0101 PL Plus / positive or zero N clear 0110 VS Overflow V set 0111 VC No overflow V clear 1000 HI Unsigned higher C set and Z clear 1001 LS Unsigned lower or same C clear or Z set 1010 GE Signed greater than or equal N equals V 1011 LT Signed less than N is not equal to V 1100 GT Signed greater than Z clear and N equals V 1101 LE Signed less than or equal Z sets or N is not equal to V 1110 AL Always any 1111 NV Never (do not use!) none 4 bit Condition Field refers to the values of the appropriate bits in the CPSR, as indicated on slide. Most instructions assembled with default condition code "always" (1110, AL). This means the instruction will be executed irrespective of the flags in CPSR. The "never" code (1111, NV) is reserved. This family of conditions will be redefined for future use in other ARM devices. Use MOV r0, r0 as NOP operation. Conditional instructions aids code density

Using and updating the Condition Field
To execute an instruction conditionally, simply postfix it with the appropriate condition: For example an add instruction takes the form: ADD r0,r1,r2 ; r0 = r1 + r2 (ADDAL) To execute this only if the zero flag is set: ADDEQ r0,r1,r2 ; If zero flag set then… ; ... r0 = r1 + r2 By default, data processing operations do not affect the condition flags (apart from the comparisons where this is the only effect). To cause the condition flags to be updated, the S bit of the instruction needs to be set by post-fixing the instruction (and any condition code) with an “S”. For example to add two numbers and set the condition flags: ADDS r0,r1,r2 ; r0 = r1 + r ; ... and set flags

ARM instructions by instruction class
Data Processing Instructions Branch Instructions Load-Store Instructions Software Interrupt Instruction Program Status Register Instructions

Data Processing Instructions
Move Instructions Barrel Shifter Arithmetic Instructions Using the Barrel Shifter with Arithmetic Instructions Logical Instructions Comparison Instructions Multiply Instructions

The basic encoding format for the instructions, such as Load, Store, Move, Arithmetic, and Logic instructions, is shown below An instruction specifies a conditional execution code (Condition), the OP code, two or three registers (Rn, Rd, and Rm), and some other information

Data Processing Instruction Format
cond 0 0 operand 2 # opcode S Rn Rd 31 28 27 26 25 24 21 20 19 16 15 12 1 destination register first operand register set condition codes arithmetic/logic function 8-bit immediate 8 7 #rot Rm 6 5 4 3 #shift Rs Sh immediate alignment immediate shift length shift type second operand register register shift length

Move Instructions Syntax: <instruction>{<cond>}{S} Rd, N
Move is the simplest ARM instruction. It copies N into a destination register Rd, where N is a register or immediate value. This instruction is useful for setting initial values and transferring data between registers. Syntax: <instruction>{<cond>}{S} Rd, N Usually it is a register Rm or a constant preceded by #. MOV Move a 32-bit value into a register Rd = N MVN move the NOT of the 32-bit value into a register Rd =∼N

Example 1 This example shows a simple move instruction. The MOV instruction takes the contents of register r5 and copies them into register r7, in this case, taking the value 5, and overwriting the value 8 in register r7. PRE r5 = 5 r7 = 8 MOV r7, r5 ; let r7 = r5 (comment) POST r5 = 5 r7 = 5

Loading the constant 0xff00ffff into register r0 using an MVN.
Example 2 Loading the constant 0xff00ffff into register r0 using an MVN. PRE none... MVN r0, #0x00ff0000 POST r0 = 0xff00ffff

Barrel Shifter MOV instruction where N is a simple register. But N can be more than just a register or immediate value; it can also be a register Rm that has been preprocessed by the barrel shifter prior to being used by a data processing instruction. Data processing instructions are processed within the arithmetic logic unit (ALU). A unique and powerful feature of the ARM processor is the ability to shift the 32-bit binary pattern in one of the source registers left or right by a specific number of positions before it enters the ALU. This shift increases the power and flexibility of many data processing operations. Pre-processing or shift occurs within the cycle time of the instruction. This is particularly useful for loading constants into a register and achieving fast multiplies or division by a power of 2.

For example, the contents of bit 0 are shifted to bit 1
For example, the contents of bit 0 are shifted to bit 1. Bit 0 is cleared. The C flag is updated with the last bit shifted out of the register. This is bit (32−y) of the original value, where y is the shift amount. When y is greater than one, then a shift by y positions is the same as a shift by one position executed y times.

Barrel Shifter Operation
Table 1

Barrel Shifter - Left Shift
Shifts left by the specified amount (multiplies by powers of two) e.g. LSL #5 = multiply by 32 Logical Shift Left (LSL) CF Destination Effectively x 2n. bits of operand shifted left by specified number of bits. Zeros are inserted into the right most end of the word last bit removed from the left most end placed in carry flag. LSL #0 does nothing and is the default shift if none is specified.

Barrel Shifter - Right Shifts
Logical Shift Right Shifts right by the specified amount (divides by powers of two) e.g. LSR #5 = divide by 32 Arithmetic Shift Right Shifts right (divides by powers of two) and preserves the sign bit, for 2's complement operations. e.g. ASR #5 = divide by 32 Logical Shift Right Destination CF ...0 Arithmetic Shift Right LSR moves each bit of the register (Rm) to the right by the specified amount. eg / LSR #5 all bits are moved 5 places to right (i.e. into less significant positions). most significant positions then filled with zeros least significant five bits of Rm are removed. The last of these five bits becomes the shifter Carry output, and may set the CPSR's C flag (when the ALU operation is in the logical class). An LSR 0 is translated into an LSR 32, to put bit 31 of the word into the carry flag. ASR similar to LSR except that high bits are filled with bit 31 of Rm (the sign bit) instead of zeros. This preserves the sign in 2's complement notation. eg/ Divide a 2's complement number B (or -126) in Ra by 2. MOV Ra, Ra, ASR #2 From Ra= B we get RA= B (or - 63) An ASR #0 is translated into an ASR #32, to propagate bit 31 of the word into every bit in the word as well as the carry flag. Destination CF Sign bit shifted in

Barrel Shifter - Rotations
Rotate Right (ROR) • Similar to an ASR but the bits wrap around as they leave the LSB and appear as the MSB. e.g. ROR #5 Note the last bit rotated is also used as the Carry Out. Rotate Right Extended (RRX) • This operation uses the CPSR C flag as a 33rd bit. Rotates right by 1 bit. Encoded as ROR #0. Rotate Right Destination CF Rotate Right (ROR), and Rotate Right Extended (RRX) operations used to manipulate the bits in a register without destroying those bits. move each bit of the register (Rm) by the specified amount. eg/ ROR #5 all bits moved 5 places to the right (i.e. into less significant positions). However, rather than being discarded the bits shifted out are then rotated back to other high of the register in ROR the bit previously at [0] is rotated to the MSB, [31] In RRX - Carry flag included as a 33rd bit in the operation. Bit 0 becomes the Carry out, while the C becomes the MSB RRX actually encoded as ROR #0 Example of use: in random number generator given in datasheet . Destination CF Rotate Right through Carry

Using the Barrel Shifter: The Second Operand
Result ALU Barrel Shifter Operand 2 Register, optionally with shift operation applied. Shift value can be either be: 5 bit unsigned integer Specified in bottom byte of another register. Immediate value 8 bit number Can be rotated right through an even number of positions. Assembler will calculate rotate for you from constant.

Second Operand : Shifted Register
The amount by which the register is to be shifted is contained in either: the immediate 5-bit field in the instruction NO OVERHEAD Shift is done for free - executes in single cycle. the bottom byte of a register (not PC) Then takes extra cycle to execute ARM doesn’t have enough read ports to read 3 registers at once. Then same as on other processors where shift is separate instruction. If no shift is specified then a default shift is applied: LSL #0 i.e. barrel shifter has no effect on value in register.

Example 3 We apply a logical shift left (LSL) to register Rm before moving it to the destination register. This is the same as applying the standard C language shift operator to the register. The MOV instruction copies the shift operator result N into register Rd. PRE r5 = 5 r7 = 8 MOV r7, r5, LSL #2 ; let r7 = r5*4 = (r5 << 2) POST r5 = 5 r7 = 20 The example multiplies register r5 by four and then places the result into register r7.

Table 2

PRE cpsr = nzcvqiFt_USER r0 = 0x00000000 r1 = 0x80000004
Example 4 This example of a MOVS instruction shifts register r1 left by one bit. This multiplies register r1 by a value 21. PRE cpsr = nzcvqiFt_USER r0 = 0x r1 = 0x MOVS r0, r1, LSL #1 POST cpsr = nzCvqiFt_USER r0 = 0x As you can see, the C flag is updated in the cpsr because the S suffix is present in the instruction mnemonic.

Arithmetic Instructions
The arithmetic instructions implement addition and subtraction of 32-bit signed and unsigned values. Syntax: <instruction>{<cond>}{S} Rd, Rn, N N is the result of the shifter operation.

Example 5 This simple subtract instruction subtracts a value stored in register r2 from a value stored in register r1. The result is stored in register r0. Syntax: <instruction>{<cond>}{S} Rd, Rn, N PRE r0 = 0x r1 = 0x r2 = 0x SUB r0, r1, r2 POST r0 = 0x

Example 6 This reverse subtract instruction (RSB) subtracts r1 from the constant value #0, writing the result to r0. You can use this instruction to negate numbers. Syntax: <instruction>{<cond>}{S} Rd, Rn, N PRE r0 = 0x r1 = 0x RSB r0, r1, #0 ; Rd = 0x0 - r1 POST r0 = -r1 = 0xffffff89

Example 7 The SUBS instruction is useful for decrementing loop counters. In this example we subtract the immediate value one from the value one stored in register r1. The result value zero is written to register r1. The cpsr is updated with the Z flags being set. Syntax: <instruction>{<cond>}{S} Rd, Rn, N PRE cpsr = nzCvqiFt_USER r1 = 0x SUBS r1, r1, #1 POST cpsr = nZCvqiFt_USER r1 = 0x

Using the Barrel Shifter with Arithmetic Instructions
The wide range of second operand shifts available on arithmetic and logical instructions is a very powerful feature of the ARM instruction set.

Syntax: <instruction>{<cond>}{S} Rd, Rn, N
Example 8 Register r1 is first shifted one location to the left to give the value of twice r1. The ADD instruction then adds the result of the barrel shift operation to register r1. The final result transferred into register r0 is equal to three times the value stored in register r1. Syntax: <instruction>{<cond>}{S} Rd, Rn, N PRE r0 = 0x r1 = 0x ADD r0, r1, r1, LSL #1 POST r0 = 0x f

Logical Instructions Logical instructions perform bitwise logical operations on the two source registers. Syntax: <instruction>{<cond>}{S} Rd, Rn, N r0:= r1ANDr2 r0:= r1ORr2 r0:= r1XORr2 r0:= r1AND(NOT r2) , bit clear

Example 9 This example shows a logical OR operation between registers r1 and r2. r0 holds the result. PRE r0 = 0x r1 = 0x r2 = 0x ORR r0, r1, r2 POST r0 = 0x

Example 10 This example shows a more complicated logical instruction called BIC, which carries out a logical bit clear. PRE r1 = 0b1111 r2 = 0b0101 BIC r0, r1, r2; Rd = Rn AND NOT(N) POST r0 = 0b1010 The logical instructions update the cpsr flags only if the S suffix is present. These instructions can use barrel-shifted second operands in the same way as the arithmetic instructions.

Comparison Instructions
The comparison instructions are used to compare or test a register with a 32-bit value. They update the cpsr flag bits according to the result, but do not affect other registers. After the bits have been set, the information can then be used to change program flow by using conditional execution. You do not need to apply the S suffix for comparison instructions to update the flags.

Syntax: <instruction>{<cond>} Rn, N
The CMP is effectively a subtract instruction with the result discarded; similarly the TST instruction is a logical AND operation, and TEQ is a logical exclusive OR operation. For each, the results are discarded but the condition bits are updated in the cpsr. It is important to understand that comparison instructions only modify the condition flags of the cpsr and do not affect the registers being compared.

PRE cpsr = nzcvqiFt_USER r0 = 4 r9 = 4 CMP r0, r9
Example 11 This example shows a CMP comparison instruction. You can see that both registers, r0 and r9, are equal before executing the instruction. The value of the z flag prior to execution is 0 and is represented by a lowercase z. After execution the z flag changes to 1 or an uppercase Z. This change indicates equality. PRE cpsr = nzcvqiFt_USER r0 = 4 r9 = 4 CMP r0, r9 POST cpsr = nZcvqiFt_USER

Multiply Instructions
The multiply instructions multiply the contents of a pair of registers and, depending upon the instruction, accumulate the results in with another register. The long multiplies accumulate onto a pair of registers representing a 64-bit value. The final result is placed in a destination register or a pair of registers.

32-bit product (Least Significant)
Syntax: MLA{<cond>}{S} Rd, Rm, Rs, Rn MUL{<cond>}{S} Rd, Rm, Rs 64-bit Product Syntax: <instruction>{<cond>}{S} RdLo, RdHi, Rm, Rs

Example 12 This example shows a simple multiply instruction that multiplies registers r1 and r2 together and places the result into register r0. In this example, register r1 is equal to the value 2, and r2 is equal to 2. The result, 4, is then placed into register r0. PRE r0 = 0x r1 = 0x r2 = 0x MUL r0, r1, r2 ; r0 = r1*r2 POST r0 = 0x The long multiply instructions (SMLAL, SMULL, UMLAL, and UMULL) produce a 64-bit result. The result is too large to fit a single 32-bit register so the result is placed in two registers labeled RdLo and RdHi. RdLo holds the lower 32 bits of the 64-bit result, and RdHi holds the higher 32 bits of the 64-bit result. Example 3.12 shows an example of a long unsigned multiply instruction.

Example 13 The instruction multiplies registers r2 and r3 and places the result into register r0 and r1. Register r0 contains the lower 32 bits, and register r1 contains the higher 32 bits of the 64-bit result. PRE r0 = 0x r1 = 0x r2 = 0xf r3 = 0x UMULL r0, r1, r2, r3 ; [r1,r0] = r2*r3 POST r0 = 0xe ; = RdLo r1 = 0x ; = RdHi

Branch Instructions A branch instruction changes the flow of execution or is used to call a routine. This type of instruction allows programs to have subroutines, if-then-else structures, and loops. The change of execution flow forces the program counter pc to point to a new address.

Syntax: B{<cond>} label BL{<cond>} label
BX{<cond>} Rm BLX{<cond>} label | Rm The address label is stored in the instruction as a signed pc-relative offset and must be within approximately 32 MB of the branch instruction. T refers to the Thumb bit in the cpsr. When instructions set T, the ARM switches to Thumb state.

Conditional Branch

Unconditional Jump Infinite loop Example 14
This example shows a forward and backward branch. Because these loops are address specific, we do not include the pre- and post-conditions. The forward branch skips three instructions. The backward branch creates an infinite loop. Unconditional Jump Infinite loop B forward ADD r1, r2, #4 ADD r0, r6, #2 ADD r3, r7, #4 forward SUB r1, r2, #4 backward ADD r1, r2, #4 SUB r1, r2, #4 ADD r4, r6, r7 B backward Branches are used to change execution flow. Most assemblers hide the details of a branch instruction encoding by using labels. In this example, forward and backward are the labels. The branch labels are placed at the beginning of the line and are used to mark an address that can be used later by the assembler to calculate the branch offset.

BL subroutine ; branch to subroutine CMP r1, #5 ; compare r1 with 5
Example 15 The branch with link, or BL, instruction is similar to the B instruction but overwrites the link register lr with a return address. It performs a subroutine call. This example shows a simple fragment of code that branches to a subroutine using the BL instruction. To return from a subroutine, you copy the link register to the pc. BL subroutine ; branch to subroutine CMP r1, #5 ; compare r1 with 5 MOVEQ r1, #0 ; if (r1==5) then r1 = 0 : subroutine <subroutine code> MOV pc, lr ; return by moving pc = lr The branch exchange (BX) and branch exchange with link (BLX) are the third type of branch instruction. The BX instruction uses an absolute address stored in register Rm. It is primarily used to branch to and from Thumb code. The T bit in the cpsr is updated by the least significant bit of the branch register. Similarly the BLX instruction updates the T bit of the cpsr with the least significant bit and additionally sets the link register with the return address.

Load-Store Instructions
Three basic forms to move data between ARM registers and memory Single register load and store instruction A byte, a 16-bit half word, a 32-bit word Multiple register load and store instruction To save or restore workspace registers for procedure entry and exit To copy clocks of data Single register swap instruction A value in a register to be exchanged with a value in memory To implement semaphores to ensure mutual exclusion on accesses

Single Register Data Transfer
Word transfer LDR / STR Byte transfer LDRB / STRB Half-word transfer LDRH / STRH Load singled byte or half-word-load value and sign extended to 32 bits LDRSB / LDRSH All of these can be conditionally executed by inserting the appropriate condition code after STR/LDR LDREQB

Single Word and Unsigned Byte Data Transfer instructions
Pre-indexed form LDR|STR{<cond>}{B} Rd, [Rn, <offset>]{!} Post-indexed form LDR|STR{<cond>}{B} Rd, [Rn], <offset> PC-relative form LDR|STR{<cond>}{B} Rd, LABEL LDR: ’load register’; STR: ’store register’ ‘B’ unsigned byte transfer, default is word; <offset> may be # +/-<12-bit immediate> or +/- Rm{, shift} !: auto-indexing T flag selects the user view of the memory translation and protection system

Addressing Modes Register-indirect addressing
Base-plus-offset addressing Base register r0 – r15 Offset, and or subtract an unsigned number Immediate Register (not PC) Scaled register (only available for word and unsigned byte instructions) Stack addressing Block-copy addressing

Register-Indirect Addressing
Use a value in one register (base register) as a memory address LDR r0,[r1] ;r0:= mem32[r1] STR r0,[r1] ;mem32[r1]:= r0 Other forms Adding immediate or register offsets to the base address Base Register

Base-plus-offset Addressing (1/2)
Pre-indexing LDR r0,[r1,#4] ;r0:=mem32[r1+4] Offset up to 4K, added or subtracted, (# -4) Post-indexing LDR r0,[r1],#4 ;r0:=mem32[r1], r1:=r1+4 Equivalent to a simple register-indirect load, but faster, less code space Auto-indexing / Prep-indexing with write-back LDR r0, [r1,#4]! ;r0:=mem32[r1+4],r1:=r1+4 No extra time, auto-indexing performed while the data is being fetched from memory

Pre-indexing Post-indexing LDR r0,[r1,#4]; r0:=mem32[r1+4]
PRE(Before Execution) r0 = 0X192 r1 = 0X 0X C 0X003 0X 0X045 offset 0X 0X023 r0 = 0X023 #4 r1 (base register) 0X 0X 0X040 Post-indexing LDR r0,[r1],#4 ;r0:=mem32[r1], r1:=r1+4 PRE(Before Execution) r0 = 0X192 r1 = 0X 0X C 0X003 0X 0X045 offset 0X 0X023 #4 r1 (base register) 0X 0X 0X040 r0 = 0X040

Base-plus-offset Addressing (2/2)

Example 16 Preindex with writeback calculates an address from a base register plus address offset and then updates that address base register with the new address. In contrast, the preindex offset is the same as the preindex with writeback but does not update the address base register. Postindex only updates the address base register after the address is used. The preindex mode is useful for accessing an element in a data structure. The postindex and preindex with writeback modes are useful for traversing an array. PRE r0 = 0x r1 = 0x mem32[0x ] = 0x mem32[0x ] = 0x LDR r0, [r1, #4]! ; Preindexing with writeback POST(1) r0 = 0x r1 = 0x LDR r0, [r1, #4] ; Preindexing POST(2) r0 = 0x r1 = 0x LDR r0, [r1], #4 ; Postindexing POST(3) r0 = 0x r1 = 0x

Single-register load-store addressing, word or unsigned byte.
The addressing modes available with a particular load or store instruction depend on the instruction class. Table shows the addressing modes available for load and store of a 32-bit word or an unsigned byte. A signed offset or register is denoted by “+/−”, identifying that it is either a positive or negative offset from the base address register Rn. The base address register is a pointer to a byte in memory, and the offset specifies a number of bytes. Immediate means the address is calculated using the base address register and a 12-bit offset encoded in the instruction. Register means the address is calculated using the base address register and a specific register’s contents. Scaled means the address is calculated using the base address register and a barrel shift operation.

Multiple Register Data Transfer
Load-store multiple instructions can transfer multiple registers between memory and the processor in a single instruction. The transfer occurs from a base address register Rn pointing into memory. These instruction are very efficient for Moving block of data around memory Saving and restoring context – stack

Load-store multiple instructions can increase interrupt latency
Load-store multiple instructions can increase interrupt latency. ARM implementations do not usually interrupt instructions while they are executing. For example, on an ARM7 a load multiple instruction takes Nt cycles, where N is the number of registers to load and t is the number of cycles required for each sequential access to memory. If an interrupt has been raised, then it has no effect until the load-store multiple instruction is complete.

Syntax: <LDM|STM>{<cond>}<addressing mode> Rn{
Syntax: <LDM|STM>{<cond>}<addressing mode> Rn{!},<registers>{ˆ} Any subset of the current bank of registers can be transferred to memory or fetched from memory. The base register Rn determines the source or destination address for a load-store multiple instruction. This register can be optionally updated following the transfer. This occurs when register Rn is followed by the ! character, similiar to the single-register load-store using preindex with writeback.

LDMIA r0!, {r1-r3} or LDMIA r0!,{r1,r2,r3} POST r0 = 0x0008001c
Example 17 In this example, register r0 is the base register Rn and is followed by !, indicating that the register is updated after the instruction is executed. You will notice within the load multiple instruction that the registers are not individually listed. Instead the “-” character is used to identify a range of registers. In this case the range is from register r1 to r3 inclusive. Each register can also be listed, using a comma to separate each register within “{” and “}” brackets. PRE mem32[0x80018] = 0x03 mem32[0x80014] = 0x02 mem32[0x80010] = 0x01 r0 = 0x r1 = 0x r2 = 0x r3 = 0x LDMIA r0!, {r1-r3} or LDMIA r0!,{r1,r2,r3} POST r0 = 0x c r1 = 0x r2 = 0x r3 = 0x

Let us see graphical representation
The base register r0 points to memory address 0x80010 in the PRE condition. Memory addresses 0x80010, 0x80014, and 0x80018 contain the values 1, 2, and 3 respectively. After the load multiple instruction executes registers r1, r2, and r3 contain these values. The base register r0 now points to memory address 0x8001c after the last loaded word. Pre-condition for LDMIA instruction.

Post-condition for LDMIA instruction.
Post-condition for LDMIB instruction.

Load-store multiple pairs when base update used.
Above Table shows a list of load-store multiple instruction pairs. If you use a store with base update, then the paired load instruction of the same number of registers will reload the data and restore the base address pointer. This is useful when you need to temporarily save a group of registers and restore them later.

Example 18 This example shows an STM increment before instruction followed by an LDM decrement after instruction. PRE (1) r0 = 0x r1 = 0x r2 = 0x r3 = 0x STMIB r0!, {r1-r3} MOV r1, #1 MOV r2, #2 MOV r3, #3 LDMDA r0!, {r1-r3} POST r0 = 0x r1 = 0x r2 = 0x r3 = 0x PRE(2) r0 = 0x c r1 = 0x r2 = 0x r3 = 0x The STMIB instruction stores the values 7, 8, 9 to memory. We then corrupt register r1 to r3. The LDMDA reloads the original values and restores the base pointer r0.

Example 19 We illustrate the use of the load-store multiple instructions with a block memory copy example. This example is a simple routine that copies blocks of 32 bytes from a source address location to a destination address location. The example has two load-store multiple instructions, which use the same increment after addressing mode. ; r9 points to start of source data say 0x ; r10 points to start of destination data ; r11 points to end of the source loop LDMIA r9!, {r0-r7}; load 32 bytes from source and update r9 pointer STMIA r10!, {r0-r7} ; store 32 bytes to destination and update r10 pointer and store them ; have we reached the end CMP r9, r11 BNE loop

LDMIA r9!, {r0-r7} r9 0X008 0X00002020 INCREMENT AFTER FORMULAE 0X083
Starting Address = Rn i.e. r9 here(Base register) 0x Ending Address = Rn + 4*N -4 = (0x ) ∗𝑁 − = (0x 𝐶) 16 0X083 0X C 0X045 0X 0X023 0X 0X040 0X r0 <- mem32[r9] = 0X040 r1 <- mem32[r9] + 4 = 0X023 r2 <- mem32[r9] + 8 = 0X055 r3 <- mem32[r9] + 12 = 0X003 r4 <- mem32[r9] + 16 = 0X040 r5 <- mem32[r9] + 20 = 0X023 r6 <- mem32[r9] + 24 = 0X045 r7 <- mem32[r9] + 28 = 0X083 0X003 0X C 0X055 0X 0X023 0X 0X040 0X Source Memory Block Now as we have r9! So it is auto updated by formulae Rn + 4* N i.e. r9 = 0x

STMIA r10!, {r0-r7} r10 0X008 0X 0X083 0X C INCREMENT AFTER FORMULAE Let destination address : 0x Starting Address = Rn i.e. r10 here(Base register) 0x Ending Address = Rn + 4*N -4 = (0x ) ∗𝑁 − = (0x 𝐶) 16 0X045 0X 0X023 0X 0X040 0X 0X003 0X C 0X055 0X r0 -> mem32[r10] = 0X r1 -> mem32[r10] + 4 = 0X r2 -> mem32[r10] + 8 = 0X r3 -> mem32[r10] + 12 = 0X C r4 -> mem32[r = 0X r5 -> mem32[r10] + 20 = 0X r6 -> mem32[r10] + 24 = 0X r7 -> mem32[r10] + 28 = 0X C 0X023 0X 0X040 0X Destination Memory Block Now as we have r9! So it is auto updated by formulae Rn + 4* N i.e. r10 = 0x

This routine relies on registers r9, r10, and r11 being set up before the code is executed.
Registers r9 and r11 determine the data to be copied, and register r10 points to the destination in memory for the data. LDMIA loads the data pointed to by register r9 into registers r0 to r7. It also updates r9 to point to the next block of data to be copied. STMIA copies the contents of registers r0 to r7 to the destination memory address pointed to by register r10. It also updates r10 to point to the next destination location. CMP and BNE compare pointers r9 and r11 to check whether the end of the block copy has been reached. If the block copy is complete, then the routine finishes; otherwise the loop repeats with the updated values of register r9 and r10. The BNE is the branch instruction B with a condition mnemonic NE (not equal). If the previous compare instruction sets the condition flags to not equal, the branch instruction is executed.

Figure : Block memory copy in the memory map

Stack Processing A stack is usually implemented as a linear data structure which grows up (an ascending stack) or down (a descending stack) memory A stack pointer holds the address of the current top of the stack, either by pointing to the last valid data item pushed onto the stack (a full stack), or by pointing to the vacant slot where the next data item will be placed (an empty stack) ARM multiple register transfer instructions support all four forms of stacks Full ascending: grows up; base register points to the highest address containing a valid item empty ascending: grows up; base register points to the first empty location above the stack Full descending: grows down; base register points to the lowest address containing a valid data empty descending: grows down; base register points to the first empty location below the stack

The ARM architecture uses the load-store multiple instructions to carry out stack operations.
The pop operation (removing data from a stack) uses a load multiple instruction; similarly, the push operation (placing data onto the stack) uses a store multiple instruction. When using a stack you have to decide whether the stack will grow up or down in memory. A stack is either ascending (A) or descending (D). Ascending stacks grow towards higher memory addresses; in contrast, descending stacks grow towards lower memory addresses. When you use a full stack (F), the stack pointer sp points to an address that is the last used or full location (i.e., sp points to the last item on the stack). In contrast, if you use an empty stack (E) the sp points to an address that is the first unused or empty location (i.e., it points after the last item on the stack). There are a number of load-store multiple addressing mode aliases available to support stack operations (see Table). Next to the pop column is the actual load multiple instruction equivalent.

For example, a full ascending stack would have the notation FA appended to the load multiple instruction—LDMFA. This would be translated into an LDMDA instruction.

Example 20 The STMFD instruction pushes registers onto the stack, updating the sp. Figure shows a push onto a full descending stack. You can see that when the stack grows the stack pointer points to the last full entry in the stack. PRE r1 = 0x r4 = 0x sp = 0x STMFD sp!, {r1,r4} POST r1 = 0x sp = 0x c NOTE : Stack pointer points to the last full entry in the stack.

Example 21 In contrast, Next figure shows a push operation on an empty stack using the STMED instruction. The STMED instruction pushes the registers onto the stack but updates register sp to point to the next empty location. PRE r1 = 0x r4 = 0x sp = 0x STMED sp!, {r1,r4} POST r1 = 0x sp = 0x NOTE : SP to point to the next empty location.

Block Copy Addressing

Stack Examples r5 r4 r3 r1 r0 SP 0x400 0x418 0x3e8 r5 r4 r3 r1 r0 SP
STMFD sp!, {r0,r1,r3-r5} STMED sp!, {r0,r1,r3-r5} r5 r4 r3 r1 r0 SP Old SP STMFA sp!, {r0,r1,r3-r5} 0x400 0x418 0x3e8 STMEA sp!, {r0,r1,r3-r5} r5 r4 r3 r1 r0 Old SP Lowest register mapped to lowest memory address. ‘!’ causes stack pointer updated in all these cases. SP

Single Register Swap Instruction
The swap instruction is a special case of a load-store instruction. It swaps the contents of memory with the contents of a register. This instruction is an atomic operation—it reads and writes a location in the same bus operation, preventing any other instruction from reading or writing to that location until it completes. Swap cannot be interrupted by any other instruction or any other bus access. We say the system “holds the bus” until the transaction is complete.

Syntax: SWP{B}{<cond>} Rd,Rm,[Rn]
Rd <- [Rn], [Rn] <- Rm Rm Rd Rn 3 2 1 temp Memory

Example 21 The swap instruction loads a word from memory into register r0 and overwrites the memory with register r1. PRE mem32[0x9000] = 0x r0 = 0x r1 = 0x r2 = 0x SWP r0, r1, [r2] POST mem32[0x9000] = 0x r0 = 0x This instruction is particularly useful when implementing semaphores and mutual exclusion in an operating system. You can see from the syntax that this instruction can also have a byte size qualifier B, so this instruction allows for both a word and a byte swap.

SWP r0, r1, [r2] PRE mem32[0x9000] = 0x12345678 0X54233083 0X00009008
r0 = 0x r1 = 0x r2 = 0x 0X 0X 0X 0X 0X 0X 0X r0 0X 0X 0X r1 0X r2 STORE SWP r0, r1, [r2] LOAD POST mem32[0x9000] = 0x r0 = 0x r1 = 0x r2 = 0x 0X 0X 0X 0X 0X 0X 0x r0 0X 0X 0X r1 0X r2

Concept of SEMAPHORE In computer science, a semaphore is a variable or abstract data type that provides a simple but useful abstraction for controlling access by multiple processes to a common resource in a parallel programming environment. A semaphore, in its most basic form, is a protected integer variable that can facilitate and restrict access to shared resources in a multi-processing environment. The two most common kinds of semaphores are counting semaphores and binary semaphores. Counting semaphores represent multiple resources, while binary semaphores, as the name implies, represents two possible states (generally 0 or 1; locked or unlocked).

A semaphore can only be accessed using the following operations: wait() and release().
wait() is called when a process wants access to a resource. This would be equivalent to the arriving customer trying to get an open table. If there is an open table, or the semaphore is greater than zero, then he can take that resource and sit at the table. If there is no open table and the semaphore is zero, that process must wait until it becomes available. signal() is called when a process is done using a resource, or when the patron is finished with his meal. The following is an implementation of this counting semaphore (where the value can be greater than 1):

In this implementation, a process wanting to enter its critical section it has to acquire the binary semaphore which will then give it mutual exclusion until it signals that it is done. For example, we have semaphore s, and two processes, P1 and P2 that want to enter their critical sections at the same time. P1 first calls wait(s). The value of s is decremented to 0 and P1 enters its critical section. While P1 is in its critical section, P2 calls wait(s), but because the value of s is zero, it must wait until P1 finishes its critical section and executes signal(s). When P1 calls signal, the value of s is incremented to 1, and P2 can then proceed to execute in its critical section (after decrementing the semaphore again). Mutual exclusion is achieved because only one process can be in its critical section at any time.

SWP r3, r2, [r1] ; hold the bus until complete CMP r3, #1 BEQ loop
Example 22 This example shows a simple data guard that can be used to protect data from being written by another task. The SWP instruction “holds the bus” until the transaction is complete. loop MOV r1, =semaphore MOV r2, #1 SWP r3, r2, [r1] ; hold the bus until complete CMP r3, #1 BEQ loop The address pointed to by the semaphore either contains the value 0 or 1. When the semaphore equals 1, then the service in question is being used by another process. The routine will continue to loop around until the service is released by the other process—in other words, when the semaphore address location contains the value 0.

Software Interrupt Instruction
Introduction The software interrupt instruction is used for calls to the operating system and is often called a 'supervisor call'. It puts the processor into supervisor mode and begins executing instructions from address 0x08. Binary encoding COND OPCODE 24-BIT (INTERPRETED) IMMEDIATE 24 23

24-BIT (INTERPRETED) IMMEDIATE
Binary encoding COND OPCODE 24-BIT (INTERPRETED) IMMEDIATE 24 23 Description The 24-bit immediate field does not influence the operation of the instruction but may be interpreted by the system code. If the condition is passed the instruction enters supervisor mode using the standard ARM exception entry sequence. In detail, the processor actions are: Save the address of the instruction after the SWI in r14_svc. Save the CPSR in SPSR_svc. Enter supervisor mode and disable IRQs (but not FIQs) by setting CPSR[4:0] to and CPSR[7] tol. Set the PC to 𝟎𝟖 𝟏𝟔 and begin executing the instructions there. To return to the instruction after the SWI the system routine must not only copy r14_svc back into the PC, but it must also restore the CPSR from SPSR_svc.

Syntax: SWI{<cond>} SWI_number

Example 23 Here we have a simple example of an SWI call with SWI number 0x123456, used by ARM toolkits as a debugging SWI. Typically the SWI instruction is executed in user mode. PRE cpsr = nzcVqift_USER pc = 0x lr = 0x003fffff; lr = r14 r0 = 0x12 0x SWI 0x123456 POST cpsr = nzcVqIft_SVC spsr = nzcVqift_USER pc = 0x lr = 0x Since SWI instructions are used to call operating system routines, you need some form of parameter passing. This is achieved using registers. In this example, register r0 is used to pass the parameter 0x12. The return values are also passed back via registers. Code called the SWI handler is required to process the SWI call. The handler obtains the SWI number using the address of the executed instruction, which is calculated from the link register lr.

Data Processing Instructions Branch Instructions Load-Store Instructions Software Interrupt Instruction Program Status Register Instructions(MSR, MRS) (Self Study!!!) Refer Steve Furber

Byte organizations Little-endian mode: Big-endian mode:
- with the lowest-order byte residing in the low-order bits of the word Big-endian mode: - the lowest-order byte stored in the highest bits of the word

Byte organizations

Thumb Mode Thumb is a 16-bit instruction set
Optimized for code density from C code Improved performance form narrow memory Subset of the functionality of the ARM instruction set Core has two execution states – ARM and Thumb Switch between them using BX instruction Thumb has characteristic features: Most Thumb instruction are executed unconditionally Many Thumb data process instruction use a 2-address format Thumb instruction formats are less regular than ARM instruction formats, as a result of the dense encoding.

Thumb has higher code density !
Code density: it is define as the space taken up in memory by an executable program. On average, a Thumb implementation of the same code takes up around 30% less memory than the equivalent ARM implementation. Figure 4.1 shows the same divide code routine implemented in ARM and Thumb assembly code. Even though the Thumb implementation uses more instructions, the overall memory footprint is reduced. Code density was the main driving force for the Thumb instruction set.

Thumb implementation uses more instructions, the overall memory footprint is reduced.
Code density was the main driving force for the Thumb instruction set. Because it was also designed as a compiler target, rather than for hand-written assembly code, we recommend that you write Thumb-targeted code in a high-level language like C or C++.

Thumb Register Usage In Thumb state, you do not have direct access to all registers. Only the low registers r0 to r7 are fully accessible. The higher registers r8 to r12 are only accessible with MOV, ADD, or CMP instructions. CMP and all the data processing instructions that operate on low registers update the condition flags in the cpsr.

Thumb Instruction Set (1/3)

Thumb Instruction Entry and Exit
T bit, bit 5 of CPSR If T = 1, the processor interprets the instruction stream as 16-bit Thumb instruction If T = 0, the processor interprets if as standard ARM instructions Thumb Entry ARM cores startup, after reset, execution ARM instructions Executing a branch and Exchange instruction (BX) Set the T bit if the bottom bit of the specified register was set Switch the PC to the address given in the remainder of the register Thumb Exit Executing a thumb BX instruction

ARM-Thumb Interworking
ARM-Thumb interworking is the name given to the method of linking ARM and Thumb code together for both assembly and C/C++. To call a Thumb routine from an ARM routine, the core has to change state. This state change is shown in the T bit of the cpsr. The BX and BLX branch instructions cause a switch between ARM and Thumb state while branching to a routine. The BX lr instruction returns from a routine, also with a state switch if necessary.

There are two versions of the BX or BLX instructions: an ARM instruction and a Thumb equivalent.
The ARM BX instruction enters Thumb state only if bit 0 of the address in Rn is set to binary 1; otherwise it enters ARM state. The Thumb BX instruction does the same. Syntax: BX Rn BLX Rn | label

Interworking Instructions
Interworking is achieved using the Branch Exchange instructions In Thumb state BX Rn In ARM state (on Thumb-aware cores only) BX<condition> Rn Where Rn can be any registers (R0 to R15) The performs a branch to an absolute address in 4GB address space by copying Rn to the program counter Bit 0 of Rn specifies the state to change to

Switching between States

Example 24 ;Start off in ARM state CODE32
ADR r0,Into_Thumb+1 ;generate branch target ;address & set bit 0 ;hence arrive Thumb state BX r0 ;branch exchange to Thumb … CODE16 ;assemble subsequent as Thumb Into_Thumb … ADR r5,Back_to_ARM ;generate branch target to ;word-aligned address, ;hence bit 0 is cleared. BX r5 ;branch exchange to ARM CODE32 ;assemble subsequent as ARM Back_to_ARM …

Summary

ARM data instructions ADD ADC SUB SBC RSB RSC MUL MLA Add
Add with carry Subtract Subtract with carry Reverse subtract ,RSB r0,r1,r2, r0=r2 – r1 Reverse subtract with carry Multiply Multiply and accumulate MLA r0,rl,r2,r3 ,r0=r1 x r2 + r3

ARM data instructions BIC r0,r1,r2 sets r0 to r1 and not r2
- uses the second source operand as a mask, a bit in mask is 1, the corresponding bit in first source operand is cleared AND ORR EOR BIC Bit-wise and Bit-wise or Bit-wise exclusive-or Bit clear

ARM data instructions LSL LSR ASL ASR ROR RRX
Logical shift left (zero fill) Logical shift right (zero fill) Arithmetic shift left Arithmetic shift right, copies the sign bit Rotate right Rotate right extended with C, performs a 33-bit rotate

ARM comparison instructions
only set the values of the NZCV bits CMP CMN TST TEQ Compare Negated compare, uses an addition to set the status bits Bit-wise test, a bit-wise AND Bit-wise negated test, an exclusive-or

ARM move instructions MOV MVN Move MOV r0,r1 ; r0=r1 Move negated
Mvn r0,r1 ; r0=not(r1)

ARM load-store instructions
LDR STR LDRH STRH LDRSH LDRB STRB ADR Load Store Load half-word Store half-word Load half-word signed Load byte Store byte Set register to address

C Assignments in ARM Instructions
x = (a + b) - c; using r0 for a, r1 for b, r2 for c, and r3 for x. registers for indirect addressing. Indirect r4 load values of a, b, and c into registers store value of x back to memory

C Assignments in ARM Instructions x = (a + b) - c;
ADR r4,a ; get address for a LDR r0,[r4] ; get value of a ADR r4,b ; get address for b, using r4 LDR r1,[r4] ; load value of b ADD r3,r0,r1 ; set result for x to a + b ADR r4,c ; get address for c LDR r2,[r4] ; get value of c SUB r3,r3,r2 ; complete computation of x ADR r4,x ; get address for x STR r3,[r4] ; store x at proper location

y = a*(b + c); using r0 for both a and b, r1 for c, and r2 for y use r4 to store addresses for indirect addressing

C Assignments in ARM Instructions y = a*(b + c);
ADR r4,b ; get address for b LDR r0,[r4] ;get value of b ADR r4,c ; get address for c LDR r1,[r4] ; get value of c ADD r2,r0,r1 ; compute partial result of y=b+c ADR r4,a ; get address for a LDR r0,[r4] ; get value of a MUL r2,r2,r0 ; compute final value of y=a*(b+c) ADR r4,y ; get address for y STR r2,[r4] ; store value of y at proper location

z = (a « 2) | (b & 15); using r0 for a and z, r1 for b, r4 for addresses

C Assignments in ARM Instructions z = (a « 2) | (b & 15);
ADR r4,a ; get address for a LDR r0,[r4] ; get value of a MOV r0,r0,LSL 2 ; perform shift (a « 2) ADR r4,b ; get address for b LDR r1,[r4] ; get value of b AND r1,r1,#15 ; perform logical AND (b & 15) ORR r1,r0,r1 ; compute final value of z ADR r4,z ; get address for z STR r1,[r4] ; store value of z

Advanced RISC Machines

Similar presentations

Presentation on theme: "Advanced RISC Machines"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Advanced RISC Machines

Similar presentations

Presentation on theme: "Advanced RISC Machines"— Presentation transcript:

Similar presentations

About project

Feedback