ARM System-on-Chip Architecture
INTRODUCTION
ARM is a RISC processor. It targets applications that need small size and high performance. Its simple architecture keeps power consumption low.
TIMELINE (1/2)
1985: Acorn Computer Group manufactures the first commercial RISC microprocessor.
1990: The participation of Acorn and Apple leads to the founding of Advanced RISC Machines (ARM).
1991: ARM6, the first embeddable RISC microprocessor.
1992-1994: Various companies use ARM (Sharp, Samsung); in 1993 ARM7, the first multimedia microprocessor, is introduced.
TIMELINE (2/2)
1995: Introduction of Thumb and ARM8.
1996-2000: Alcatel, Hyundai, Philips and Sony use ARM; in 1999 ARM cooperates with Ericsson on the development of Bluetooth.
2000-2002: ARM's share of the 32-bit embedded RISC microprocessor market is 80%. ARM Developer Suite is introduced.
THE ARM ARCHITECTURE
GENERAL INFO (1/2)
Aim: simple design
Load-store architecture
32-bit data bus
3 addressing modes
GENERAL INFO (2/2)
Simple architecture + simple instruction set + code density lead to:
Small size
Low power consumption
Registers
32-bit general-purpose registers: 31 in total, of which 16 are visible in any one mode
7 modes of operation
Each mode sees a different set of registers and has a different level of control over the CPSR.
The visible ARM registers
Usable in user mode: r0-r15 (r15 is the PC) and the CPSR
System modes only (banked registers):
    fiq mode:        r8_fiq-r14_fiq, SPSR_fiq
    svc mode:        r13_svc, r14_svc, SPSR_svc
    abort mode:      r13_abt, r14_abt, SPSR_abt
    irq mode:        r13_irq, r14_irq, SPSR_irq
    undefined mode:  r13_und, r14_und, SPSR_und
Abort mode: entered when an attempted memory access fails
Supervisor mode: entered immediately after reset; used by the operating system kernel
System mode: special user mode with full access to the CPSR
Undefined mode: entered on an undefined instruction, or an instruction not supported by the implementation
CPSR
N: Negative
Z: Zero
C: Carry
V: Overflow
Q: Saturation (for enhanced DSP instructions)
[Figure: ARM CPSR format]
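The flag definitions above can be illustrated with a minimal Python sketch. It assumes the usual CPSR bit positions (N=31, Z=30, C=29, V=28, Q=27); the function names `decode_flags` and `flags_after_adds` are illustrative only and do not belong to any ARM tool.

```python
# Bit positions of the condition flags in the 32-bit CPSR (assumed layout).
FLAG_BITS = {"N": 31, "Z": 30, "C": 29, "V": 28, "Q": 27}
MASK32 = 0xFFFFFFFF


def decode_flags(cpsr):
    """Return a dict mapping each flag name to 0 or 1."""
    return {name: (cpsr >> bit) & 1 for name, bit in FLAG_BITS.items()}


def flags_after_adds(a, b):
    """Model the N, Z, C, V flags that a 32-bit ADDS of a and b would produce."""
    a &= MASK32
    b &= MASK32
    full = a + b                      # unbounded sum, used to detect carry out
    result = full & MASK32            # value written to the destination register
    n = result >> 31                  # sign bit of the result
    z = int(result == 0)
    c = int(full > MASK32)            # carry out of bit 31
    # Signed overflow: both operands have the same sign, the result does not.
    v = ((~(a ^ b)) & (a ^ result)) >> 31 & 1
    return result, {"N": n, "Z": z, "C": c, "V": v}
```

For example, adding 1 to 0x7FFFFFFF in this model sets N and V, because the result no longer fits in a signed 32-bit value.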
Memory Organization
Address bus: 32 bits
1 word = 32 bits
Instruction Set
Three instruction types:
Data processing
Data transfer
Control flow
Supervisor mode
User-mode code cannot perform privileged operations itself; the operating system handles them on its behalf.
By issuing a "supervisor call", a user program enters the system level, where the requested system function is performed.
I/O System
ARM handles peripherals as "memory-mapped devices with interrupt support".
Interrupts:
IRQ: normal interrupt
FIQ: fast interrupt
Exceptions
Exception types: interrupts, supervisor calls, traps
When an exception takes place:
The value of the PC is copied to r14_exc and the CPSR is saved in SPSR_exc.
The operating mode changes to the respective exception mode.
The PC is loaded with the exception handler's vector address.
Exception: internally generated error
Trap: software interrupt (supervisor mode)
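The exception entry sequence can be sketched as a small Python model. This is a simplified illustration, not a cycle-accurate description: it assumes the classic ARM vector addresses and mode encodings, it stores the PC unchanged in r14 as the slide describes (real hardware stores an adjusted return address), and the dictionary keys and the name `take_exception` are hypothetical.

```python
# Classic ARM exception vector addresses (assumed: low vectors at 0x00000000).
VECTORS = {"reset": 0x00, "undefined": 0x04, "swi": 0x08,
           "prefetch_abort": 0x0C, "data_abort": 0x10,
           "irq": 0x18, "fiq": 0x1C}

# Mode each exception enters, and the CPSR mode-field encoding of that mode.
EXC_MODE = {"reset": "svc", "undefined": "und", "swi": "svc",
            "prefetch_abort": "abt", "data_abort": "abt",
            "irq": "irq", "fiq": "fiq"}
MODE_BITS = {"fiq": 0x11, "irq": 0x12, "svc": 0x13, "abt": 0x17, "und": 0x1B}

I_BIT = 0x80  # disables IRQ when set
F_BIT = 0x40  # disables FIQ when set


def take_exception(state, kind):
    """Return the new processor state after taking exception `kind`."""
    mode = EXC_MODE[kind]
    new = dict(state)
    new["r14_" + mode] = state["pc"]      # save the return address
    new["spsr_" + mode] = state["cpsr"]   # save the old status register
    # Clear the mode field and the T bit, then enter the exception mode.
    cpsr = (state["cpsr"] & ~0x3F) | MODE_BITS[mode] | I_BIT
    if kind in ("fiq", "reset"):
        cpsr |= F_BIT                     # these also mask fast interrupts
    new["cpsr"] = cpsr
    new["pc"] = VECTORS[kind]             # jump to the handler's vector
    return new
```

Returning from the handler reverses these steps: the saved SPSR is copied back to the CPSR and the PC is restored from r14 of the exception mode.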
ARM programming model
[Figure: the ARM register set, showing the registers usable in user mode and the banked registers of the fiq, svc, abort, irq and undefined modes]
THE ARM INSTRUCTION SET
Data Processing Instructions (1/2)
Arithmetic operations:
    ADD   r0, r1, r2    ; r0 := r1 + r2, flags not updated
    ADDS  r0, r1, r2    ; r0 := r1 + r2, flags updated
Logical operations:
    AND   r0, r1, r2    ; r0 := r1 AND r2
Register movement:
    MOV   r0, r2        ; r0 := r2
Comparison:
    CMP   r1, r2        ; set flags on r1 - r2
Data Processing Instructions (2/2)
Operands:
Immediate operands:
    ADD   r3, r3, #1            ; r3 := r3 + 1
Shifted register operands:
    ADD   r3, r2, r1, LSL #3    ; r3 := r2 + 8 * r1
Miscellaneous data processing instructions:
Multiplication:
    MUL   r4, r3, r2            ; r4 := low 32 bits of r3 * r2
    SMULL, UMULL                ; signed and unsigned 32 x 32 -> 64-bit multiply
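As a rough illustration of the shifted register operand and the long multiply, the following Python sketch models the arithmetic on 32-bit values. The helper names `lsl`, `add_shifted` and `umull` are made up for this example; only the arithmetic is taken from the instruction definitions.

```python
MASK32 = 0xFFFFFFFF


def lsl(value, amount):
    """Logical shift left of a 32-bit value, discarding bits shifted out."""
    return (value << amount) & MASK32


def add_shifted(rn, rm, shift):
    """Model ADD rd, rn, rm, LSL #shift: rd := rn + (rm << shift)."""
    return (rn + lsl(rm, shift)) & MASK32


def umull(rm, rs):
    """Model UMULL: return (low word, high word) of the 64-bit product."""
    product = (rm & MASK32) * (rs & MASK32)
    return product & MASK32, product >> 32
```

With r2 = 10 and r1 = 5, `add_shifted(10, 5, 3)` gives 50, the value ADD r3, r2, r1, LSL #3 would leave in r3. A shift by 3 is a cheap way to scale an index by 8, for example when indexing an array of 8-byte elements.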
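Data transfer instructions
Load and store instructions:
    LDR   r0, [r1]            ; r0 := mem32[r1]
    STR   r0, [r1]            ; mem32[r1] := r0
Offset:
    LDR   r0, [r1, #4]        ; r0 := mem32[r1 + 4]
Post-indexed:
    LDR   r0, [r1], #16       ; r0 := mem32[r1], then r1 := r1 + 16
Auto-indexed:
    LDR   r0, [r1, #16]!      ; r1 := r1 + 16, then r0 := mem32[r1]
Multiple data transfers:
    LDMIA r1, {r0, r2, r5}    ; r0 := mem32[r1], r2 := mem32[r1 + 4], r5 := mem32[r1 + 8]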
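The difference between the addressing modes is easiest to see in a toy model. The Python sketch below assumes memory is a dictionary from byte address to 32-bit word and registers are a dictionary from register number to value; all function names are illustrative.

```python
def ldr_offset(mem, regs, rd, rn, offset):
    """LDR rd, [rn, #offset]: the base register is left unchanged."""
    regs[rd] = mem[regs[rn] + offset]


def ldr_post_indexed(mem, regs, rd, rn, offset):
    """LDR rd, [rn], #offset: load from rn, then add the offset to rn."""
    regs[rd] = mem[regs[rn]]
    regs[rn] += offset


def ldr_auto_indexed(mem, regs, rd, rn, offset):
    """LDR rd, [rn, #offset]!: add the offset to rn, then load from rn."""
    regs[rn] += offset
    regs[rd] = mem[regs[rn]]


def ldmia(mem, regs, rn, reglist):
    """LDMIA rn, {reglist}: load consecutive words, lowest register first.

    Without '!' the base register is not written back.
    """
    addr = regs[rn]
    for reg in sorted(reglist):
        regs[reg] = mem[addr]
        addr += 4
```

Post-indexing and auto-indexing both update the base register, so they suit loops that walk through a table; they differ only in whether the update happens after or before the access.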
Control flow instructions
Branch instruction:
        B     label
Conditional branch:
        BNE   label
Branch and link:
        BL    loop
        … …
loop    … …
        MOV   pc, r14    ; return
ARM ORGANIZATION AND IMPLEMENTATION
3-stage pipeline (ARM7, 80 MHz)
Stages: Fetch, Decode, Execute
Throughput: 1 instruction per cycle
5-stage pipeline (1/2)
Program execution time: $T_{prog} = \frac{N_{inst} \times CPI}{f_{clk}}$
Ways to reduce it:
Increase $f_{clk}$: logic simplification
Reduce CPI: reduce the number of multicycle instructions
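A short Python sketch of the execution-time relation, assuming the standard form T_prog = N_inst x CPI / f_clk. The function name and the numbers in the usage note are hypothetical and are not measurements of any ARM core.

```python
def execution_time(n_inst, cpi, f_clk_hz):
    """Program execution time in seconds.

    n_inst   -- number of instructions executed
    cpi      -- average clock cycles per instruction
    f_clk_hz -- clock frequency in hertz
    """
    return n_inst * cpi / f_clk_hz


def speedup(old, new):
    """Ratio of two execution times; above 1 means `new` is faster."""
    return old / new
```

For instance, with made-up figures, raising the clock from 80 MHz to 150 MHz while lowering the CPI from 1.9 to 1.5 gives `speedup(execution_time(n, 1.9, 80e6), execution_time(n, 1.5, 150e6))` of roughly 2.4 for any instruction count n, which shows how the two levers multiply.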
5-stage pipeline (ARM9, 150 MHz) (2/2)
Stages: Fetch, Decode, Execute, Buffer/Data, Write-back
ARM10: 6-stage pipeline, 260 MHz
ARM11: 8-stage pipeline, 335 MHz
ARM coprocessor interface
ARM supports up to 16 coprocessors, which can be emulated in software.
Each coprocessor has up to 16 general-purpose registers.
ARM is a load-store architecture.
Coprocessors usually handle on-chip functions, such as cache and memory management.
ARCHITECTURAL SUPPORT FOR HIGH – LEVEL LANGUAGES
Floating-point accelerator (1/2)
For floating-point operations, ARM has the FPE software emulator and the FPA10 hardware floating-point accelerator.
FPA10 includes:
Coprocessor interface
Load/store unit
Register bank (8 registers of 80 bits each)
ALU (adder, multiplier, divider)
Floating-point accelerator (2/2)
APCS (1/2)
APCS (ARM Procedure Call Standard) is a set of rules governing entry to and exit from C procedures.
It assigns specific roles to the general-purpose registers (r0-r3: arguments, r4-r8: register variables, r10: stack limit, etc.).
Procedure entry and exit:
        BL    Loop
        …
Loop    …
        MOV   pc, lr
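APCS (2/2)
C code:
    void f1(int a) { f2(a); }
Assembly code (assuming a full descending stack):
    f1      LDR   r0, [r13]           ; load the argument a from the top of the stack
            STR   r14, [r13, #-4]!    ; push the return address (lr)
            STR   r0, [r13, #-4]!     ; push a as the argument for f2
            BL    f2                  ; call f2
            ADD   r13, r13, #4        ; discard the pushed argument
            LDR   r15, [r13], #4      ; pop the return address into pc
[Figure: stack contents at offsets 4, 8 and 16 relative to the stack pointer]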
THUMB PROGRAMMER’S MODEL
General information
Thumb objective: code density.
Thumb has a 16-bit instruction set: a subset of the ARM instruction set is encoded in a 16-bit space.
Used appropriately, it can bring large benefits in terms of:
Power efficiency
Enhanced performance
Going in and out of Thumb mode
Entering Thumb state: use the BX instruction in ARM state, e.g. BX r0.
If r0[0] is 1, the T bit in the CPSR is set to 1 and the PC is set to the address given by the remaining bits of r0.
Instructions are assembled as 16-bit Thumb instructions when the appropriate assembler directive is used.
Returning to ARM state: use the BX instruction in Thumb state (with bit 0 of the target register clear).
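The state switch performed by BX can be sketched in Python as follows. The sketch assumes the T bit is bit 5 of the CPSR and models only the two effects described above; the function name `bx` is illustrative.

```python
T_BIT = 1 << 5  # Thumb state bit in the CPSR (assumed position)


def bx(cpsr, target):
    """Model BX Rm: return the new (cpsr, pc) pair.

    Bit 0 of the target selects the instruction set; the remaining
    bits give the branch address.
    """
    if target & 1:
        cpsr |= T_BIT        # bit 0 set: continue in Thumb state
    else:
        cpsr &= ~T_BIT       # bit 0 clear: continue in ARM state
    pc = target & ~1         # bit 0 is never part of the address
    return cpsr, pc
```

Calling `bx(0x10, 0x8001)` models a switch from ARM to Thumb state at address 0x8000, and `bx(0x30, 0x9000)` models the return to ARM state.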
The Thumb programmer's model
[Figure: Thumb registers]
ARM vs. Thumb (1/3)
Thumb:
Code size about 70% of the equivalent ARM code
40% more instructions
45% faster code with 16-bit memory
Requires about 30% less external memory
ARM:
40% faster code when coupled with a 32-bit memory
ARM vs. Thumb (2/3)
If performance is critical: ARM
If cost and power consumption are critical: Thumb
ARM and Thumb interaction
A 32-bit ARM system can switch to Thumb mode for specific routines in order to meet power and memory constraints.
A 16-bit system can use an on-chip 32-bit memory for ARM-state routines, and a 16-bit off-chip memory with Thumb code for the rest of the application.
ARCHITECTURAL SUPPORT FOR SYSTEM DEVELOPMENT
The ARM memory interface
[Figure: a basic ARM memory system]
AMBA (1/4)
Advanced Microcontroller Bus Architecture (AMBA) defines three buses:
Advanced High-performance Bus (AHB)
Advanced System Bus (ASB)
Advanced Peripheral Bus (APB)
AMBA objectives:
Technology independence
To encourage modular system design
Supports CPUs, memories and peripherals integrated in a SoC.
AHB: pipelining, burst transfers, multiple masters
APB: bridge required
AMBA (2/4)
[Figure: a typical AMBA-based system]
AMBA (3/4)
AHB bus:
Burst transactions
Split transactions
Data bus width: 64 to 128 bits
AMBA (4/4)
AMBA Design Kit (ADK)
An environment that assists designers in developing AMBA-based components and SoC designs.
Signal Processing Support (1/2)
Piccolo DSP coprocessor.
Various data memories for maximizing throughput.
Signal Processing Support (2/2) Piccolo
MEMORY HIERARCHY
Memory hierarchy
Moving down the table, size grows and speed drops.

Memory type       Size               Speed
Registers         32-bit             A few nsec
On-chip cache     8-32 kbytes        10 nsec
Off-chip cache    100-200 kbytes     10-30 nsec
RAM               Mbytes             100 nsec
On-chip memory
Necessary for performance.
Some systems prefer on-chip RAM to an on-chip cache: it is simpler, cheaper and less power-hungry.
Cache types
Cache types:
Unified cache
Separate instruction and data caches
Performance: hit rate and miss rate
Compulsory miss: the first time an address is accessed
Capacity miss: the cache is full
Conflict miss: two addresses compete for the same place in the cache
The data is requested from the cache and from main memory at the same time.
Cache disadvantage: execution time becomes less predictable.
Replacement policy and implementation
Replacement policies: Least Recently Used (LRU), Least Frequently Used (LFU), data prediction
Implementations: fully associative, direct-mapped, set-associative
Direct-mapped cache (1/2)
A line of data is stored together with its address tag in a memory.
Direct-mapped cache (2/2)
Each memory location has exactly one place in the cache.
Tag and data can be accessed at the same time.
The tag RAM is smaller than the data RAM and therefore has a shorter access time, so the tag comparison can complete while the data RAM access is still in progress.
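To make the "one place per memory location" rule concrete, here is a minimal Python sketch of a direct-mapped cache that tracks only tags. The geometry (256 lines of 16 bytes) and the class name `DirectMappedCache` are assumptions chosen for the example, not parameters of any particular ARM cache.

```python
class DirectMappedCache:
    """Tag-only model of a direct-mapped cache (no data is stored)."""

    def __init__(self, num_lines=256, line_bytes=16):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.tags = [None] * num_lines   # one tag slot per cache line
        self.hits = 0
        self.misses = 0

    def split(self, addr):
        """Split an address into (tag, index); the byte offset is dropped."""
        line_addr = addr // self.line_bytes
        return line_addr // self.num_lines, line_addr % self.num_lines

    def access(self, addr):
        """Return True on a hit; on a miss, replace the line and return False."""
        tag, index = self.split(addr)
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag           # the only possible place for this line
        self.misses += 1
        return False
```

With this geometry, addresses 0x0000 and 0x1000 map to the same index, so alternating between them misses every time. That is the conflict miss that associativity is meant to remove.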
2-way set-associative cache (1/3)
Set-associative cache (2/3)
An n-way set-associative cache offers n possible locations for each line.
Two addresses that would compete for the same location in a direct-mapped cache can be stored in different locations and accessed independently.
Set-associative cache (3/3)
Set selection (which location to replace on a miss):
Random allocation
Least recently used (LRU)
Round-robin (cyclic)
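A set-associative cache with LRU replacement can be sketched along the same lines. The geometry (128 sets, 2 ways, 16-byte lines) and the class name are assumptions for illustration; random or round-robin replacement would change only the choice of victim.

```python
class SetAssociativeCache:
    """Tag-only model of an n-way set-associative cache with LRU replacement."""

    def __init__(self, num_sets=128, ways=2, line_bytes=16):
        self.num_sets = num_sets
        self.ways = ways
        self.line_bytes = line_bytes
        # Each set is a list of tags ordered from least to most recently used.
        self.sets = [[] for _ in range(num_sets)]

    def access(self, addr):
        """Return True on a hit; on a miss, insert the line and return False."""
        line_addr = addr // self.line_bytes
        index = line_addr % self.num_sets
        tag = line_addr // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.remove(tag)
            lines.append(tag)            # mark as most recently used
            return True
        if len(lines) == self.ways:
            lines.pop(0)                 # evict the least recently used tag
        lines.append(tag)
        return False
```

Here addresses 0x0000 and 0x0800 share a set but can both stay resident, so alternating between them hits after the first two accesses. A third conflicting address evicts whichever of the two was used least recently.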
Fully associative (1/2)
Write strategies
Write-through: all write operations are passed to main memory.
Write-through with buffered write: write operations are passed to main memory through the write buffer.
Copy-back (write-back): write operations update only the cache.
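The trade-off between write-through and copy-back shows up in the number of main-memory writes. The Python sketch below is a deliberately simplified model (one word per line, memory as a dictionary) with hypothetical class names; it leaves out the write buffer.

```python
class WriteThroughCache:
    """Every write updates both the cache and main memory."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}
        self.memory_writes = 0

    def write(self, addr, value):
        self.lines[addr] = value
        self.memory[addr] = value        # passed straight to main memory
        self.memory_writes += 1


class CopyBackCache:
    """Writes update only the cache; memory is updated on eviction."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}
        self.dirty = set()
        self.memory_writes = 0

    def write(self, addr, value):
        self.lines[addr] = value
        self.dirty.add(addr)             # remember that memory is stale

    def evict(self, addr):
        if addr in self.dirty:
            self.memory[addr] = self.lines[addr]   # copy the line back
            self.memory_writes += 1
            self.dirty.discard(addr)
        self.lines.pop(addr, None)
```

Three writes to the same address cost three memory writes in the write-through model, but only one in the copy-back model, paid when the line is evicted. The price of copy-back is that main memory is out of date in the meantime.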
Cache feature summary
'Perfect' cache performance
MMU (1/3)
Two memory management approaches:
Segmentation
Paging
MMUs map memory: they translate virtual addresses into physical addresses.
MMU (2/3)
Segmented memory management:
MMU (3/3)
Paging memory management:
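Paged address translation can be sketched in a few lines of Python. This is a simplified single-level model with an assumed 4 KB page size and a dictionary as the page table; real ARM MMUs use multi-level translation tables held in memory, and the names here are illustrative.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def translate(page_table, vaddr):
    """Translate a virtual address to a physical address.

    page_table maps a virtual page number to a physical page number.
    A missing entry models a page fault.
    """
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn not in page_table:
        raise LookupError("page fault at virtual address 0x%08X" % vaddr)
    return page_table[vpn] * PAGE_SIZE + offset
```

Only the page number is translated; the offset within the page passes through unchanged, which is why pages must be aligned to their size.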
ARCHITECTURAL SUPPORT FOR OPERATING SYSTEMS
CP15
On-chip coprocessor that controls the MMU, the cache and the protection unit.
Control is exercised through its registers, using instructions executed in supervisor mode.
Protection Unit
Simpler alternative to the MMU.
Requires simpler software and hardware.
Does not use translation tables, but 8 protection regions instead.
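A region-based permission check, in contrast to table-based translation, might look like the Python sketch below. The `Region` fields, the rule that a higher-numbered region overrides a lower one where they overlap, and the default of denying unmapped addresses are assumptions made for this illustration.

```python
from dataclasses import dataclass

MAX_REGIONS = 8  # the protection unit described above has 8 regions


@dataclass
class Region:
    base: int          # start address of the region
    size: int          # length of the region in bytes
    user_read: bool    # user-mode reads allowed
    user_write: bool   # user-mode writes allowed


def check_access(regions, addr, write):
    """Return True if a user-mode access to addr is permitted.

    Assumption: when regions overlap, the highest-numbered one wins.
    Addresses outside every region are denied. No address is translated.
    """
    if len(regions) > MAX_REGIONS:
        raise ValueError("at most %d regions are supported" % MAX_REGIONS)
    for region in reversed(regions):
        if region.base <= addr < region.base + region.size:
            return region.user_write if write else region.user_read
    return False
```

Because the check is a handful of range comparisons rather than a table walk, it needs no translation tables in memory, which is where the simpler hardware and software come from.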
ARM DEVELOPER SUITE
ARMULATOR (1/2)
ARMulator: emulator of various ARM processors.
Allows project development in C, C++ or assembly.
Together with the debugger, compilers and assembler, it forms the ARM Developer Suite (ADS).
ARMULATOR (2/2)
Possible project options:
ARM and Thumb interworking
Mixing C, C++ and assembly
Code for ROM
Exception handlers
MM
ARMULATOR TUTORIAL
CODEWARRIOR ENVIRONMENT