ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION – ARM
ARM architecture processors are popular in mobile phone systems.
ARM (Advanced RISC Machine) Features
ARM has a 32-bit architecture but also supports 16-bit and 8-bit data types.
ARM can be programmed for little-endian or big-endian data alignment in memory.
ARM combines RISC and CISC features: it offers the functionality of a CISC, through an instruction set built on complex addressing modes, together with the faster program execution and reduced code length of a RISC.
The ARM processor has a RISC core for processing.
In-built compilation unit
Compiles the CISC instructions into RISC formats, which are then executed by the RISC core of the processor. Internally, many instructions are implemented as in a RISC (without a micro-programmed unit).
Jazelle technology
Faster execution of Java code.
ARM Thumb 16-bit instructions
The Thumb set is designed for 16-bit word lengths and instructions, which are executed internally by the same 32-bit core.
An instruction fetch is 2 bytes in Thumb mode instead of 4 bytes in ARM mode.
Instructions are aligned at steps of 2 bytes in Thumb mode instead of 4 bytes in ARM mode.
Thumb code gives memory savings of up to 35% over the equivalent 32-bit code, while retaining all the benefits of a 32-bit system (such as access to a full 32-bit address space).
It enables 32-bit performance at the memory cost of an 8/16-bit system.
Thumb and 32-bit ARM modes
The processor can switch from one mode to the other.
There is no overhead (in time or memory) in moving between Thumb and the normal ARM state of the code.
The two states are compatible in normal use.
This gives the code designer complete control over performance and code-size optimization.
ARM7 versions
ARM7TDMI® (integer core)
ARM7TDMI-S™ (synthesisable version of ARM7TDMI)
ARM7EJ-S™ (synthesisable core with DSP and Jazelle technology)
ARM720T™ (cached processor macrocell: 8K cached core with Memory Management Unit (MMU), supporting operating systems including Windows CE, Palm OS, Symbian OS and Linux)
Performance: 130 MIPS on the Dhrystone 2.1 benchmark in a typical 0.13μm process
ARM9 versions
ARM920T (dual 16K caches with MMU, to support multiple OSs)
ARM922T (dual 8K caches for applications, to support multiple OSs)
ARM940T™ (dual 4K caches for embedded control applications running an RTOS)
32-bit RISC processor core with a 5-stage integer pipeline
8-entry write buffer to avoid blocking the processor on external memory writes
Achieves 1.1 MIPS/MHz, 300 MIPS (Dhrystone 2.1) in a typical 0.13μm process
ARM11 versions
Families with the ARMv6 instruction set architecture, which includes the Thumb® extensions for code density, Jazelle™ technology for Java™ acceleration, ARM DSP extensions, and SIMD media processing extensions.
MMU supporting operating systems, including Palm OS.
32-bit RISC processor core with an 8-stage integer pipeline, static and dynamic branch prediction, and separate load-store and arithmetic pipelines to maximize instruction throughput.
Targets a performance range of 400 to 1200 Dhrystone MIPS.
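Memory Architecture
ARM7 has a Princeton (von Neumann) memory architecture, with a single memory interface shared by instructions and data.
The ARM9 processor has a Harvard architecture, with separate instruction and data interfaces.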
Faster implementation and reduced code lengths
Faster implementation: the register word is instantly available to the execution unit.
Reduced code lengths: most instructions use registers as operands. Only a few bits in the instruction are needed to specify a register as an operand, whereas a memory operand needs 8, 16 or 32 bits in the instruction for the memory address and the displacement.
ARM registers
R0 to R15. R15 also functions as the program counter. R14 functions as the link register. R13 may be used as the stack pointer.
CPSR (current program status register).
SPSR (saved program status register).
Processor Modes
The ARM has seven basic operating modes:
User: unprivileged mode under which most tasks run
FIQ: entered when a high priority (fast) interrupt is raised
IRQ: entered when a low priority (normal) interrupt is raised
Supervisor: entered on reset and when a Software Interrupt instruction is executed
Abort: used to handle memory access violations
Undef: used to handle undefined instructions
System: privileged mode using the same registers as User mode

The programmer's model can be split into two elements: the processor modes and the processor registers. A typical application runs in the unprivileged mode known as User mode, whereas the various exception types are dealt with in one of the privileged modes: Fast Interrupt, Normal Interrupt, Supervisor, Abort and Undefined.

What is the difference between the privileged and unprivileged modes? In practice, very little. The ARM core has an output signal (nTRANS on ARM7TDMI; InTRANS and DnTRANS on ARM9; or encoded as part of HPROT or BPROT in AMBA) which indicates whether the current mode is privileged or unprivileged. A memory controller can use this, for instance, to allow I/O access only in a privileged mode. In addition, some operations are permitted only in a privileged mode, such as directly changing the mode and enabling interrupts.

All current ARM cores implement System mode (added in architecture v4). This is simply a privileged version of User mode. It is important for re-entrant exceptions because no exception can cause System mode to be entered.
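The mode list can be made concrete with a small C sketch. The numeric values below are the CPSR mode-field encodings from the ARM architecture; they are not given on the slide, and the function names are hypothetical helpers chosen for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPSR mode field M[4:0] encodings for the seven basic modes.
 * These values come from the ARM architecture, not from the slide. */
enum arm_mode {
    MODE_USER   = 0x10,
    MODE_FIQ    = 0x11,
    MODE_IRQ    = 0x12,
    MODE_SVC    = 0x13,  /* Supervisor */
    MODE_ABORT  = 0x17,
    MODE_UNDEF  = 0x1B,
    MODE_SYSTEM = 0x1F
};

/* Every mode except User is privileged. */
bool arm_mode_is_privileged(uint32_t cpsr)
{
    return (cpsr & 0x1Fu) != (uint32_t)MODE_USER;
}

/* System mode shares the User registers, so only the five
 * exception modes have a saved program status register. */
bool arm_mode_has_spsr(uint32_t cpsr)
{
    uint32_t mode = cpsr & 0x1Fu;
    return mode != (uint32_t)MODE_USER && mode != (uint32_t)MODE_SYSTEM;
}
```

For example, a CPSR value of 0x000000D3 selects Supervisor mode, which is privileged and has its own SPSR, whereas 0x0000001F selects System mode, which is privileged but has no SPSR.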
The ARM Register Set
[Figure: for each processor mode, the currently visible registers and the registers that are banked out while in that mode.]
User mode: r0-r12, r13 (sp), r14 (lr), r15 (pc) and cpsr are visible; the FIQ, IRQ, SVC, Undef and Abort registers are banked out.
FIQ mode: r8-r12, r13 (sp), r14 (lr) and spsr come from the FIQ bank; the User, IRQ, SVC, Undef and Abort registers are banked out.
IRQ, SVC, Undef and Abort modes: r13 (sp), r14 (lr) and spsr come from that mode's own bank; the registers of the other modes are banked out.
The Registers
ARM has 37 registers, all of which are 32 bits long:
1 dedicated program counter
1 dedicated current program status register
5 dedicated saved program status registers
30 general purpose registers

The registers are arranged into several banks, and the current processor mode governs which bank is accessible. In each mode, the core can access:
a particular set of 13 general purpose registers (r0-r12);
a particular r13, typically used as the stack pointer (sp). Each exception mode has its own r13, which allows each exception type to have its own stack;
a particular r14, used as the link (return address) register (lr). Again, each exception mode has its own r14;
r15, whose only use is as the program counter (pc);
the CPSR (Current Program Status Register), which stores additional information about the state of the processor.
Privileged modes (except System) can also access a particular SPSR (Saved Program Status Register). This stores a copy of the previous CPSR value when an exception occurs. Combined with the link register, it allows exceptions to return without corrupting the processor state.
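The total of 37 can be checked by modelling the banking in C. This is a sketch under stated assumptions: the physical index layout, the mode numbering and the function names are all hypothetical, chosen only so that registers sharing a bank get the same index. It assumes that FIQ banks r8-r14, that the other exception modes bank r13-r14, and that System mode uses the User registers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mode numbering used only by this sketch. */
enum reg_mode { M_USER, M_FIQ, M_IRQ, M_SVC, M_ABORT, M_UNDEF, M_SYSTEM };

/* Map (mode, r0..r15) to an illustrative physical register index 0..30. */
int phys_reg(enum reg_mode m, int r)
{
    assert(r >= 0 && r <= 15);
    if (r <= 7)
        return r;                      /* r0-r7 are never banked */
    if (r == 15)
        return 30;                     /* one shared program counter */
    if (r <= 12)                       /* r8-r12: banked in FIQ only */
        return (m == M_FIQ) ? 13 + (r - 8) : r;
    /* r13 and r14: one pair per bank; System uses the User pair */
    int bank = (m == M_SYSTEM) ? 0 : (int)m;
    return 18 + 2 * bank + (r - 13);
}

/* Count distinct physical registers, then add the status registers. */
int count_registers(void)
{
    bool seen[31] = { false };
    int count = 0;
    for (int m = M_USER; m <= M_SYSTEM; m++) {
        for (int r = 0; r <= 15; r++) {
            int p = phys_reg((enum reg_mode)m, r);
            if (!seen[p]) {
                seen[p] = true;
                count++;
            }
        }
    }
    /* 1 CPSR plus 1 SPSR for each of the five exception modes */
    return count + 1 + 5;
}
```

The loop finds 31 distinct registers (30 general purpose plus the program counter), and the six status registers bring the total to 37.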
Program Status Registers
Bit layout:
Bit 31: N
Bit 30: Z
Bit 29: C
Bit 28: V
Bit 27: Q
Bit 24: J
Bits 23-8: undefined
Bit 7: I
Bit 6: F
Bit 5: T
Bits 4-0: mode
Byte fields: f = bits 31-24, s = bits 23-16, x = bits 15-8, c = bits 7-0

Condition code flags (ALU status flags, set if the instruction's "S" bit is set; this is implied in Thumb state)
N = Negative result from ALU
Z = Zero result from ALU
C = ALU operation Carried out
V = ALU operation oVerflowed

Sticky overflow flag, Q (architecture 5TE/J only)
Indicates that saturation has occurred. The flag is set either when saturation occurs during QADD, QDADD, QSUB or QDSUB, or when the result of SMLAxy or SMLAWx overflows 32 bits. Once set, the flag cannot be cleared by these instructions; software must write to the CPSR with an MSR instruction to clear it.

J bit (architecture 5TEJ only)
J = 1: processor in Jazelle state

Interrupt disable bits
I = 1: disables the IRQ
F = 1: disables the FIQ

T bit (architecture xT only)
T = 0: processor in ARM state
T = 1: processor in Thumb state

Mode bits
Specify the processor mode

The Q, J and T bits are present only in certain versions of the ARM architecture.

PSRs are split into four 8-bit fields that can be individually written:
Control (c): bits 0-7
Extension (x): bits 8-15, reserved for future use
Status (s): bits 16-23, reserved for future use
Flags (f): bits 24-31

Bits that are reserved for future use should not be modified by current software. Typically, a read-modify-write strategy should be used to update the value of a status register, to ensure future compatibility. Note that the T and J bits in the CPSR should never be changed directly by writing to the PSR (use the BX/BXJ instruction to change state instead). However, where the processor state is known in advance (e.g. on reset, following an interrupt, or some other exception), an immediate value may be written directly into the status registers to change only specific bits (e.g. to change mode). The bits added in ARMv6 are not shown.
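As an illustration of the bit positions described for the status registers, the following C sketch decodes a 32-bit PSR value and performs a read-modify-write update of the mode bits. The structure and function names are hypothetical; on real hardware the CPSR is read and written with the MRS and MSR instructions, which this model does not attempt to reproduce.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of a program status register (illustrative only). */
struct psr_fields {
    bool n, z, c, v;  /* condition code flags, bits 31..28 */
    bool q;           /* sticky overflow, bit 27 */
    bool j;           /* Jazelle state, bit 24 */
    bool i, f;        /* IRQ and FIQ disable, bits 7 and 6 */
    bool t;           /* Thumb state, bit 5 */
    uint32_t mode;    /* mode bits 4..0 */
};

struct psr_fields psr_decode(uint32_t psr)
{
    struct psr_fields p;
    p.n = (psr >> 31) & 1u;
    p.z = (psr >> 30) & 1u;
    p.c = (psr >> 29) & 1u;
    p.v = (psr >> 28) & 1u;
    p.q = (psr >> 27) & 1u;
    p.j = (psr >> 24) & 1u;
    p.i = (psr >> 7) & 1u;
    p.f = (psr >> 6) & 1u;
    p.t = (psr >> 5) & 1u;
    p.mode = psr & 0x1Fu;
    return p;
}

/* Read-modify-write: change only the mode bits and leave every
 * other bit, including the reserved ones, untouched. */
uint32_t psr_with_mode(uint32_t psr, uint32_t mode)
{
    return (psr & ~0x1Fu) | (mode & 0x1Fu);
}
```

For instance, decoding 0x600000D3 gives Z and C set, IRQ and FIQ disabled, ARM state, and mode bits 0x13.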
Program Counter (r15)
When the processor is executing in ARM state:
All instructions are 32 bits wide.
All instructions must be word aligned.
Therefore the pc value is stored in bits [31:2], with bits [1:0] undefined (an instruction cannot be halfword or byte aligned).
When the processor is executing in Thumb state:
All instructions are 16 bits wide.
All instructions must be halfword aligned.
Therefore the pc value is stored in bits [31:1], with bit [0] undefined (an instruction cannot be byte aligned).
When the processor is executing in Jazelle state:
All instructions are 8 bits wide.
The processor performs a word access to read 4 instructions at once.

ARM is designed to access memory efficiently using a single memory access cycle, so word accesses must be on a word address boundary and halfword accesses on a halfword address boundary. This includes instruction fetches. Strictly, the bottom bits of the PC do not exist within the ARM core, hence they are 'undefined'; the memory system must ignore them for instruction fetches. In Jazelle state the processor does not perform 8-bit fetches from memory. Instead it performs aligned 32-bit fetches (4-byte prefetching), which is more efficient. The PC is not described for Jazelle state because the 'Jazelle PC' is actually stored in r14; this technical detail is completely hidden by the Jazelle support code.
ARM Codes
ARM code is forward compatible with higher versions.
ARM7 code is forward compatible with the ARM9, ARM9E and ARM10 processors as well as the Intel XScale micro-architecture.
The ARM9E and ARM10 families use a Vector Floating Point (VFP) ARM coprocessor, which adds full floating-point operations.
VFP also speeds up development in SoC design when using tools like MatLab®.
Applications are in image processing (scaling), 2D and 3D transformations, font generation and digital filters.
ARM Intelligent Energy Manager (IEM) technology
IEM uses advanced algorithms to optimally balance processor workload and energy consumption while maximizing system responsiveness.
IEM works with the operating system, including mobile OSs.
It dynamically adjusts the CPU performance level to what the applications running on a mobile phone require.
ARM processors AHB (AMBA Advanced High-performance Bus) interface
AMBA is an established open specification for on-chip interconnects.
AMBA serves as a framework for SoC designs and for development of the IP library.
AHB is supported in all new ARM cores.
AHB provides a high-performance, fully synchronous back plane. (Back plane means an additional set of controllers that can access another common bus, distinct from the system bus, in a system with multilevel buses.)
Multi-layer AHB, in the ARM926EJ-S and all members of the ARM10 family, represents a significant advancement. It reduces access latencies and increases the bandwidth available to multi-master systems.
ARM Instruction Set Features
Two instruction sets: 16-bit Thumb and 32-bit ARM mode instructions.
Operations on 8-bit, 16-bit or 32-bit data types.
Data alignment in memory: two-byte words in the Thumb set and four-byte words in 32-bit ARM mode.
ARM7 instruction set: Data Transfer Instructions
Instructions for transfer between registers and memory. The memory address is given by a register used as an index, or by the index-relative or post auto-index addressing mode. [A word in ARM is 32 bits.]
Load a byte into a register (LDRB).
Store a byte from a register (STRB).
Store a half word from a register (STRH).
Load a half word into a register, as such or sign-extended (LDRH or LDRSH).
Load a word into a register (LDR).
Store a word from a register (STR).
Set a memory address into a register (ADR). The address is of 12 bits. [The alternative for setting a 16-bit address in a register is to use any register or r15 in an arithmetic operation.]
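The addressing modes mentioned for loads and stores can be sketched in C as pure functions that return both the address accessed and the value left in the base register. Reading the slide's terms as the offset, pre-indexed and post-indexed forms of ARM load/store addressing is an interpretation, and the type and function names are hypothetical.

```c
#include <stdint.h>

/* Result of an address calculation: where the access goes and
 * what the base register holds afterwards. */
struct mem_access {
    uint32_t address;
    uint32_t base_after;
};

/* Offset form, e.g. LDR r0, [r1, #4]: base register unchanged. */
struct mem_access ea_offset(uint32_t base, int32_t offset)
{
    struct mem_access a = { base + (uint32_t)offset, base };
    return a;
}

/* Pre-indexed form, e.g. LDR r0, [r1, #4]!: the base is updated
 * first and the updated value is used as the address. */
struct mem_access ea_pre_index(uint32_t base, int32_t offset)
{
    uint32_t updated = base + (uint32_t)offset;
    struct mem_access a = { updated, updated };
    return a;
}

/* Post-indexed form, e.g. LDR r0, [r1], #4: the old base is used
 * as the address and the base is updated afterwards. */
struct mem_access ea_post_index(uint32_t base, int32_t offset)
{
    struct mem_access a = { base, base + (uint32_t)offset };
    return a;
}
```

Post-indexing is the natural fit for walking an array: each load uses the current pointer and leaves the base register pointing at the next element.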
Word transfer between registers
Move (MOV). Move NOT, which moves the bitwise complement of the operand (MVN).
Conditional execution of load, move or store instructions
Conditions on signed numbers: LT (less than), GT (greater than), LE (less or equal), GE (greater or equal).
Conditions on unsigned numbers: HI (higher), LS (lower or same).
Conditions on individual flags: EQ (equal), NE (not equal), VS (overflow), VC (no overflow), PL (plus, not negative), MI (minus), CC (carry bit reset), and CS (carry bit set).
Example: MOVLT r3, #10 moves the immediate operand 10 to r3, provided that a previous comparison instruction showed the first source as less than the second.
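How each condition mnemonic is decided from the N, Z, C and V flags can be sketched in C. The flag tests follow the ARM architecture definitions; the packing of the flags into a 4-bit value (N in bit 3, Z in bit 2, C in bit 1, V in bit 0) and the function names are assumptions made for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

enum arm_cond {
    COND_EQ, COND_NE, COND_CS, COND_CC, COND_MI, COND_PL, COND_VS,
    COND_VC, COND_HI, COND_LS, COND_GE, COND_LT, COND_GT, COND_LE
};

/* nzcv packs the flags as N=bit 3, Z=bit 2, C=bit 1, V=bit 0. */
bool cond_passed(enum arm_cond cond, unsigned nzcv)
{
    bool n = (nzcv >> 3) & 1u;
    bool z = (nzcv >> 2) & 1u;
    bool c = (nzcv >> 1) & 1u;
    bool v = nzcv & 1u;

    switch (cond) {
    case COND_EQ: return z;
    case COND_NE: return !z;
    case COND_CS: return c;
    case COND_CC: return !c;
    case COND_MI: return n;
    case COND_PL: return !n;
    case COND_VS: return v;
    case COND_VC: return !v;
    case COND_HI: return c && !z;       /* unsigned higher */
    case COND_LS: return !c || z;       /* unsigned lower or same */
    case COND_GE: return n == v;        /* signed greater or equal */
    case COND_LT: return n != v;        /* signed less than */
    case COND_GT: return !z && n == v;  /* signed greater than */
    case COND_LE: return z || n != v;   /* signed less or equal */
    default:      return false;
    }
}

/* Model of a conditional move such as MOVLT r3, #10: the register
 * keeps its old value when the condition fails. */
uint32_t mov_cond(enum arm_cond cond, unsigned nzcv,
                  uint32_t old_value, uint32_t operand)
{
    return cond_passed(cond, nzcv) ? operand : old_value;
}
```

With N set and V clear (as after comparing 3 with 5), the LT condition passes, so the MOVLT model writes 10; with all flags clear it leaves the register unchanged.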
Bit Transfer or Manipulation Instructions
Register bits logical shift left (LSL).
Register bits arithmetic shift left (ASL), which gives the same result as LSL.
Register bits logical shift right (LSR).
Register bits arithmetic shift right (ASR), which preserves the sign bit.
Register bits rotate right (ROR).
Register bits rotate right extended, with the carry bit included in the rotation (RRX).
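The shift and rotate operations can be modelled on 32-bit values in C. This sketch is restricted to shift amounts of 0 to 31 and does not compute the carry-out that the real shifter also produces; the function names simply mirror the mnemonics.

```c
#include <assert.h>
#include <stdint.h>

/* Logical shift left: zeros enter from the right. */
uint32_t lsl(uint32_t x, unsigned n)
{
    assert(n < 32);
    return x << n;
}

/* Logical shift right: zeros enter from the left. */
uint32_t lsr(uint32_t x, unsigned n)
{
    assert(n < 32);
    return x >> n;
}

/* Arithmetic shift right: copies of the sign bit enter from the left. */
uint32_t asr(uint32_t x, unsigned n)
{
    assert(n < 32);
    uint32_t shifted = x >> n;
    if ((x & 0x80000000u) && n > 0)
        shifted |= ~(0xFFFFFFFFu >> n);  /* fill the top n bits with 1s */
    return shifted;
}

/* Rotate right: bits leaving on the right re-enter on the left. */
uint32_t ror(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x >> n) | (x << (32u - n)) : x;
}

/* Rotate right extended: a one-bit rotate through the carry flag,
 * so the old carry becomes the new bit 31. */
uint32_t rrx(uint32_t x, unsigned carry_in)
{
    return (x >> 1) | ((uint32_t)(carry_in & 1u) << 31);
}
```

The difference between LSR and ASR shows up only for values with bit 31 set: shifting 0x80000000 right by 4 gives 0x08000000 with LSR but 0xF8000000 with ASR.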
Arithmetical Instructions
Three operands, from the registers. One source may, however, use immediate operand addressing in addition and subtraction.
Add two words without carry; the result goes to the destination register (ADD).
Add two words with carry; the result goes to the destination register (ADC).
Subtract two words without carry; the result goes to the destination register (SUB).
Subtract two words with carry; the result goes to the destination register (SBC). [Carry bit used as borrow.]
Arithmetical Instructions (continued)
Subtract reverse (the first source from the second) without carry; the result goes to the destination register (RSB).
Subtract reverse with carry; the result goes to the destination register (RSC). [Carry bit used as borrow.]
Multiply two different registers; the result goes to the destination register (MUL).
Multiply two source registers, add the product to a third source register, and accumulate the new result in the destination register (MLA).
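The arithmetic instructions can be summarised as C expressions on 32-bit values. This is a sketch of the result only: flag updates are omitted and the function names are hypothetical. One detail not spelled out on the slide is assumed from the ARM architecture: in subtraction the carry flag acts as an inverted borrow, so a clear carry subtracts one extra.

```c
#include <stdint.h>

/* rn is the first source and op2 the second; carry is the C flag (0 or 1). */
uint32_t arm_add(uint32_t rn, uint32_t op2) { return rn + op2; }

uint32_t arm_adc(uint32_t rn, uint32_t op2, unsigned carry)
{
    return rn + op2 + (carry & 1u);
}

uint32_t arm_sub(uint32_t rn, uint32_t op2) { return rn - op2; }

/* Carry clear means a borrow is pending, so one more is subtracted. */
uint32_t arm_sbc(uint32_t rn, uint32_t op2, unsigned carry)
{
    return rn - op2 - (1u - (carry & 1u));
}

/* Reverse subtract: the operands swap roles. */
uint32_t arm_rsb(uint32_t rn, uint32_t op2) { return op2 - rn; }

uint32_t arm_rsc(uint32_t rn, uint32_t op2, unsigned carry)
{
    return op2 - rn - (1u - (carry & 1u));
}

/* MUL and MLA keep only the low 32 bits of the product. */
uint32_t arm_mul(uint32_t rm, uint32_t rs) { return rm * rs; }

uint32_t arm_mla(uint32_t rm, uint32_t rs, uint32_t rn)
{
    return rm * rs + rn;
}
```

ADC and SBC exist mainly for multi-word arithmetic: the low words are combined with ADD or SUB, and the carry they produce feeds the high-word ADC or SBC.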
Logic Instructions
Bitwise OR of two words; the result goes to the destination register (ORR).
Bitwise AND of two words; the result goes to the destination register (AND).
Bitwise exclusive OR of two words; the result goes to the destination register (EOR).
Bit clear (BIC). [One source supplies the bits and a second source supplies the mask; each bit set in the mask is cleared in the result, which goes to the destination register.]
Conditional execution of arithmetic or logical instructions
Example: SUBGE r1, r3, r5. The operand in r5 is subtracted from r3 and the result is placed in r1, if the GE condition resulted earlier (N and V status bits equal on comparison of two signed numbers).
Conditions can be the results of a comparison or test.
Compare and Test Instructions
The result goes to the CPSR, which stores the four condition bits N, Z, C and V.
Bitwise test (AND) of two words (TST).
Bitwise test for equivalence (exclusive OR) of two words (TEQ).
Compare two words by subtraction; the result is in the CPSR condition bits (CMP).
Compare negative, which compares the first word with the negative of the second by addition; the result is in the CPSR condition bits (CMN).
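The way the compare and test instructions produce condition bits can be sketched in C. The results are packed as N in bit 3, Z in bit 2, C in bit 1 and V in bit 0, which is a convention of this sketch only. For the two test functions only N and Z are modelled, since the real C flag then comes from the shifter and V is left unchanged.

```c
#include <stdint.h>

static unsigned pack_nzcv(unsigned n, unsigned z, unsigned c, unsigned v)
{
    return (n << 3) | (z << 2) | (c << 1) | v;
}

/* CMP: flags of rn - op2. C set means no borrow occurred. */
unsigned flags_cmp(uint32_t rn, uint32_t op2)
{
    uint32_t res = rn - op2;
    unsigned v = ((rn ^ op2) & (rn ^ res)) >> 31;  /* signed overflow */
    return pack_nzcv(res >> 31, res == 0, rn >= op2, v);
}

/* CMN: flags of rn + op2. C set means a carry out of bit 31. */
unsigned flags_cmn(uint32_t rn, uint32_t op2)
{
    uint32_t res = rn + op2;
    unsigned v = (~(rn ^ op2) & (rn ^ res)) >> 31;
    return pack_nzcv(res >> 31, res == 0, res < rn, v);
}

/* TST: N and Z of rn AND op2 (C and V reported as 0 here). */
unsigned flags_tst(uint32_t rn, uint32_t op2)
{
    uint32_t res = rn & op2;
    return pack_nzcv(res >> 31, res == 0, 0, 0);
}

/* TEQ: N and Z of rn EOR op2 (C and V reported as 0 here). */
unsigned flags_teq(uint32_t rn, uint32_t op2)
{
    uint32_t res = rn ^ op2;
    return pack_nzcv(res >> 31, res == 0, 0, 0);
}
```

Comparing 3 with 5 gives N set and V clear, which is exactly the state in which the signed LT condition holds; comparing equal values sets Z and C.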
Program-Flow Control Instructions
Branch (B) or conditional branch operations.
Branch to an address relative to the PC word in r15 (B).
'B #1A8' means add 1A8 to the PC and change the program flow.
'BGE #100' means that if a GE condition resulted from an earlier compare or test, add 100 to the PC.
There are similar instructions for the other conditions of the processor status flags.
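The PC-relative nature of the branch can be made concrete with the ARM-state instruction encoding, which the slide does not spell out and is taken here from the ARM architecture: the low 24 bits of a B instruction hold a signed word offset, and the PC reads as the instruction address plus 8 because of pipelining. The function name is hypothetical.

```c
#include <stdint.h>

/* Compute the destination of an ARM-state B instruction located at
 * instr_addr. The low 24 bits of instr are a signed offset in words. */
uint32_t branch_target(uint32_t instr_addr, uint32_t instr)
{
    uint32_t imm24 = instr & 0x00FFFFFFu;

    /* Sign-extend from 24 to 32 bits. */
    uint32_t offset = (imm24 & 0x00800000u) ? (imm24 | 0xFF000000u)
                                            : imm24;

    offset <<= 2;  /* words to bytes */

    /* The PC is two instructions (8 bytes) ahead of the branch. */
    return instr_addr + 8u + offset;
}
```

For example, the encoding 0xEAFFFFFE carries an offset of -2 words, so the branch lands on its own address, which is the classic "branch to self" idle loop.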
Software Interrupt instruction
SWI has an 8-bit opcode, and the remaining 24 bits are not used by the processor.
SWI gives a single vector address for its ISR.
The programmer's handler reads back the remaining bits of the SWI instruction to compute the ISR and ISR parameter pointers.
This unique feature permits handling the large number of SWIs required by the OS and by application functions, threads or tasks.
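Recovering the unused bits of an SWI can be sketched in C. This assumes the ARM-state encoding from the ARM architecture (condition in bits 31-28, 1111 in bits 27-24, a 24-bit field ignored by the processor in bits 23-0); the function names are hypothetical. In a real handler the instruction word would be read from memory at the link register value minus 4.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the 32-bit ARM-state instruction word is an SWI. */
bool is_swi(uint32_t instr)
{
    return (instr & 0x0F000000u) == 0x0F000000u;
}

/* The 24 bits the processor ignores; the handler can use them as a
 * service number to select the routine and its parameters. */
uint32_t swi_number(uint32_t instr)
{
    return instr & 0x00FFFFFFu;
}
```

A handler would typically use the extracted number as an index into a table of service routines, which is how one vector can serve many software interrupts.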