ARM Programming CMPE 450/490 ©2010 Elliott, Durdle, Minderman


1 ARM Programming CMPE 450/490 ©2010 Elliott, Durdle, Minderman
Portions courtesy of ARM, Green Hills

2 Integrated Development Environment
Green Hills MULTI IDE

3 MULTI
MULTI is a complete Integrated Development Environment (IDE) designed especially for embedded systems engineers, to assist them in analyzing, editing, compiling, optimizing, and debugging embedded applications. The MULTI IDE includes graphical tools for each part of the software development process.
IDE launcher
  MULTI Launcher -- The gateway to the MULTI IDE: launch any of the primary MULTI tools, access open windows, and manage MULTI workspaces
Editing tools
  MULTI Editor, Checkout Browser, Diff Viewer, Hex Editor
Building tools
  MULTI Builder -- A graphical interface for managing and building projects
  CodeBalance -- A graphical interface for optimizing an executable for size or speed
  INTEGRATE -- A graphical utility for configuring tasks, connections, and kernel objects across multiple address spaces
  Linker Directives File Editor -- A graphical editor for creating and modifying linker directives files
Debugging tools
  MULTI Debugger (multi) -- A graphical source-level debugger
  EventAnalyzer -- A graphical viewer for monitoring complex real-time interactions
  ResourceAnalyzer -- A graphical viewer for monitoring CPU and memory usage
  Script Debugger -- A graphical debugger for writing, recording, and debugging scripts
  Serial Terminal -- A serial terminal emulator for connecting to serial ports on embedded devices
Miscellaneous and administrative tools

4 Launcher

5 MULTI Debugger (I)
A powerful graphical debugger that supports source, assembly, and mixed-language debugging. It allows you to perform the following tasks quickly and easily:
  Browse, view, and search all aspects of your program code
  Download, execute, control, and debug embedded applications written in C, C++, FORTRAN, assembly, or a combination of these languages
  View and edit variables, pointers, structures, registers, and memory ranges
  Create, view, edit, and remove conditional breakpoints
  View performance profiling, function profiling, memory allocation, code coverage, and stack trace information
  Interface seamlessly with the MULTI Editor and the MULTI Builder, or with third-party editors and compilers
  Perform multiprocess debugging through a single JTAG connection, even when those processes are running on multiple processors
  Perform non-intrusive field debugging of live systems
  Develop board setup scripts

6 MULTI Debugger (II)

7 Main Debugger Window (I)

8 Main Debugger Window (II)

9 Introduction to the ARM7 Microprocessor Architecture

10 ARM7 RISC CPU Architecture
  Load/Store architecture
  Large register bank
    Typically thirty-two 32-bit registers
  Fixed size for all instructions
    32 bits long
  Pipelined execution
  Single-cycle execution
  Orthogonal instruction set
  Hardwired instruction decode logic

11 ARM7 32-bit RISC Architecture
  Von Neumann enhanced RISC architecture
  Three-stage pipeline
    Fetch, Decode & Execute
  Conditional execution of every instruction
  32-bit flat address space (4 GB memory map)
  Most instructions execute in a single cycle
  Combined ALU and shifter for high-speed bit manipulation

12 ARM7 RISC Architecture (cont.)
  Powerful multiple load and store instructions combined with auto-indexing addressing modes
    Block copy
    Stack manipulation
  Open instruction set extension via coprocessors

13 ARM Powered Products iPOD, Gameboy, Toshiba PDA, Samsung Video Recorder, etc.

14 ARM7 Block Diagram
  Von Neumann architecture
  3-stage pipeline
    fetch, decode, execute
  32-bit data bus
  32-bit address bus
  37 32-bit registers
  32-bit ARM instruction set
  16-bit THUMB instruction set
  32x8 multiplier
  Barrel shifter

15 Pipeline Organization
3-stage pipeline: Fetch - Decode - Execute
Three-cycle latency, one instruction per cycle throughput
  instruction
  i        Fetch   Decode  Execute
  i+1              Fetch   Decode  Execute
  i+2                      Fetch   Decode  Execute
  cycle    t       t+1     t+2     t+3     t+4

16 Pipeline Organization (2)
  The pipeline is flushed and refilled on a branch, which slows execution down
  Special features in the instruction set eliminate small jumps in code to obtain the best flow through the pipeline

17 Operating Modes
Seven operating modes:
  User
  Privileged:
    System
    Exception modes: FIQ, IRQ, Abort, Undefined, Supervisor

18 Operating Modes (2)
User mode:
  Normal program execution mode
  System resources unavailable
  Mode changed by exception or software interrupt (trap instruction)
Exception modes:
  Entered upon exception
  Full access to system resources
  Mode changed freely

19 Table 1 - Exception types, sorted by Interrupt Vector addresses
  Exception              Exception Mode  Priority  IV Address
  Reset                  Supervisor      1         0x00000000
  Undefined instruction  Undefined       6         0x00000004
  Software interrupt     Supervisor      6         0x00000008
  Prefetch Abort         Abort           5         0x0000000C
  Data Abort             Abort           2         0x00000010
  Interrupt              IRQ             4         0x00000018
  Fast interrupt         FIQ             3         0x0000001C

20 ARM Register Organization
[Figure: general purpose registers as banked in User, FIQ, IRQ, Supervisor, Abort and Undef modes]

21 Thumb Code Compression
Thumb Code Example
In C:
    int iabs(int x)
    {
        if (x >= 0) return x;
        else return -x;
    }
In ARM code (12 bytes):
            CMP   r0, #0
            RSBLT r0, r0, #0
            MOV   pc, lr
In Thumb code (8 bytes, 67% of the ARM size):
            CMP   r0, #0
            BGE   return
            NEG   r0, r0
    return  MOV   pc, lr

22 Thumb Features (ARM7TM Block Diagram)
  Thumb addresses code density
  All Thumb instructions are 16 bits long
  Thumb may be viewed as a compressed form of a subset of the 32-bit ARM instruction set
  Implementations of Thumb use dynamic decompression in an ARM instruction pipeline: this logic translates each 16-bit Thumb instruction into its equivalent 32-bit ARM instruction
  The decompression logic was added without compromising cycle time or pipeline latency, because the original ARM7 pipeline did very little work in phase one of the decode cycle
  Programmer’s Model - r0-r7, r13, r15

23 Thumb Applications
A typical early embedded system, e.g. a mobile phone, will include a small amount of fast 32-bit memory (to store speed-critical DSP code) and 16-bit off-chip memory to store the control code.
  Thumb code requires 70% of the space of the ARM code
  Thumb code uses 40% more instructions than ARM code
  With 32-bit memory, the ARM code is 40% faster than Thumb code
  With 16-bit memory, the Thumb code is 45% faster than ARM code
  Thumb code uses 30% less external memory power than ARM code

24 ARM7 Family MIPS

25 Code Examples

26 Example 1

27 Basic Arithmetic Operations
    ADD r0, r1, r2    ; r0 := r1 + r2
    ADC r0, r1, r2    ; r0 := r1 + r2 + C
    SUB r0, r1, r2    ; r0 := r1 - r2
    SBC r0, r1, r2    ; r0 := r1 - r2 + C - 1
    RSB r0, r1, r2    ; r0 := r2 - r1
    RSC r0, r1, r2    ; r0 := r2 - r1 + C - 1
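The register-transfer comments on this slide map directly onto C. The following is a minimal sketch, assuming 32-bit wraparound arithmetic and a carry flag passed in as 0 or 1; the `arm_*` function names are hypothetical helpers for illustration, not part of any ARM toolchain.

```c
#include <assert.h>
#include <stdint.h>

/* Each function models the result register only, not the flags.
   c is the carry flag on entry and must be 0 or 1. */
uint32_t arm_add(uint32_t r1, uint32_t r2) { return r1 + r2; }
uint32_t arm_sub(uint32_t r1, uint32_t r2) { return r1 - r2; }
uint32_t arm_rsb(uint32_t r1, uint32_t r2) { return r2 - r1; }

uint32_t arm_adc(uint32_t r1, uint32_t r2, unsigned c)
{
    assert(c <= 1);
    return r1 + r2 + c;
}

uint32_t arm_sbc(uint32_t r1, uint32_t r2, unsigned c)
{
    assert(c <= 1);
    return r1 - r2 + c - 1;   /* C clear means "borrow one more" */
}

uint32_t arm_rsc(uint32_t r1, uint32_t r2, unsigned c)
{
    assert(c <= 1);
    return r2 - r1 + c - 1;   /* reversed operands, same borrow rule */
}
```

Note that for the subtract-with-carry forms a set carry means "no borrow", which is why the expressions end in `+ C - 1`.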

28 Extended Precision
E.g. add two 64-bit numbers X and Y and store the result in Z
Store X in r1:r0, Y in r3:r2 and Z in r5:r4
    ADDS r4, r0, r2    ; add least sig. word, result in r4
    ADC  r5, r1, r3    ; add most sig. word, result in r5
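The carry hand-off between the two instructions can be sketched in C. This is an illustrative model, assuming the 64-bit values are held as low/high 32-bit halves; the `pair64` type and `add64` function are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: a 64-bit value split as hi:lo, like r1:r0. */
typedef struct { uint32_t lo, hi; } pair64;

pair64 add64(pair64 x, pair64 y)
{
    pair64 z;
    z.lo = x.lo + y.lo;                         /* ADDS r4, r0, r2 */
    uint32_t carry = (z.lo < x.lo) ? 1u : 0u;   /* C flag: low word wrapped */
    z.hi = x.hi + y.hi + carry;                 /* ADC  r5, r1, r3 */

    /* Sanity check against native 64-bit arithmetic. */
    uint64_t expect = (((uint64_t)x.hi << 32) | x.lo)
                    + (((uint64_t)y.hi << 32) | y.lo);
    assert(z.lo == (uint32_t)expect);
    assert(z.hi == (uint32_t)(expect >> 32));
    return z;
}
```

The S suffix on ADDS matters: without it the first addition would not update the carry flag, and ADC would add a stale carry.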

29 Operations with Shifts
    ADD r3, r2, r1, LSL #3
    ADD r5, r5, r3, LSL r2
Types of shift: LSR, LSL, ASR, ROR, RRX
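The five shift types can be modelled in C as follows. This is a sketch under stated assumptions: shift amounts are restricted to 1..31 so that every C shift is well defined, the carry-out of each shift is not modelled, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Logical shift left: zeros enter from the right. */
uint32_t lsl(uint32_t x, unsigned n) { assert(n >= 1 && n <= 31); return x << n; }

/* Logical shift right: zeros enter from the left. */
uint32_t lsr(uint32_t x, unsigned n) { assert(n >= 1 && n <= 31); return x >> n; }

/* Arithmetic shift right: copies of the sign bit enter from the left. */
uint32_t asr(uint32_t x, unsigned n)
{
    assert(n >= 1 && n <= 31);
    uint32_t fill = (x & 0x80000000u) ? ~(UINT32_MAX >> n) : 0u;
    return (x >> n) | fill;
}

/* Rotate right: bits leaving on the right re-enter on the left. */
uint32_t ror(uint32_t x, unsigned n)
{
    assert(n >= 1 && n <= 31);
    return (x >> n) | (x << (32 - n));
}

/* Rotate right extended: one-bit rotate through the carry flag. */
uint32_t rrx(uint32_t x, unsigned carry_in)
{
    assert(carry_in <= 1);
    return (x >> 1) | ((uint32_t)carry_in << 31);
}

/* ADD r3, r2, r1, LSL #n : the shift is applied to the last operand only. */
uint32_t add_lsl(uint32_t r2, uint32_t r1, unsigned n) { return r2 + lsl(r1, n); }
```

With n = 3, `add_lsl` computes r2 + 8 x r1, which is the first instruction on the slide.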

30 ARM Instructions I

31 Instruction Set
Two instruction sets:
  ARM
    Standard 32-bit instruction set
  THUMB
    16-bit compressed form
    Code density better than most CISC
    Dynamic decompression in pipeline

32 ARM Instruction Set
Features:
  Load / Store architecture
  3-address data processing instructions
  Conditional execution
  Load / Store multiple registers
  Shift & ALU operation in single clock cycle

33 ARM Instruction Set (2)
Conditional execution:
  Each data processing instruction prefixed by condition code
  Result: smooth flow of instructions through pipeline
  16 condition codes:
    EQ  equal                      MI  negative          HI  unsigned higher                  GT  signed greater than
    NE  not equal                  PL  positive or zero  LS  unsigned lower or same           LE  signed less than or equal
    CS  unsigned higher or same    VS  overflow          GE  signed greater than or equal     AL  always
    CC  unsigned lower             VC  no overflow       LT  signed less than                 NV  special purpose
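Each condition code is a test on the N, Z, C and V flags. The sketch below shows the standard flag test behind each mnemonic; the `flags_t` and `cond_t` types and the `cond_passes` function are hypothetical names, and treating NV as "never executes" is an assumption of this model (the slide only calls it "special purpose").

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the four condition flags. */
typedef struct { bool n, z, c, v; } flags_t;

/* Listed in encoding order, 0000 (EQ) to 1111 (NV). */
typedef enum { EQ, NE, CS, CC, MI, PL, VS, VC,
               HI, LS, GE, LT, GT, LE, AL, NV } cond_t;

bool cond_passes(cond_t cond, flags_t f)
{
    switch (cond) {
    case EQ: return f.z;                    /* equal */
    case NE: return !f.z;                   /* not equal */
    case CS: return f.c;                    /* unsigned higher or same */
    case CC: return !f.c;                   /* unsigned lower */
    case MI: return f.n;                    /* negative */
    case PL: return !f.n;                   /* positive or zero */
    case VS: return f.v;                    /* overflow */
    case VC: return !f.v;                   /* no overflow */
    case HI: return f.c && !f.z;            /* unsigned higher */
    case LS: return !f.c || f.z;            /* unsigned lower or same */
    case GE: return f.n == f.v;             /* signed greater than or equal */
    case LT: return f.n != f.v;             /* signed less than */
    case GT: return !f.z && f.n == f.v;     /* signed greater than */
    case LE: return f.z || f.n != f.v;      /* signed less than or equal */
    case AL: return true;                   /* always */
    case NV: break;                         /* modelled as never executing */
    }
    assert(cond == NV);
    return false;
}
```

For example, after `CMP r0, r1` with r0 = 3 and r1 = 5 the flags are N=1, Z=0, C=0, V=0, so LT and CC pass while GE and HI fail.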

34 ARM Instruction Set (3)

35 Data Processing Instructions
  Arithmetic and logical operations
  3-address format:
    Two 32-bit operands (op1 is a register, op2 is a register or an immediate)
    32-bit result placed in a register
  Barrel shifter for operand 2 allows a full 32-bit shift within the instruction cycle

36 Data Processing Instructions (2)
  Arithmetic operations: ADD, ADC, SUB, SBC, RSB, RSC
  Bit-wise logical operations: AND, EOR, ORR, BIC
  Register movement operations: MOV, MVN
  Comparison operations: TST, TEQ, CMP, CMN

37 Data Processing Instructions
Condition codes + data processing instructions + barrel shifter = powerful tools for efficiently coded programs

38 Data Processing Instructions
e.g.: if (Z == 1) R1 = R2 + (R3 * 4)
compiles to
    ADDEQS R1, R2, R3, LSL #2
(SINGLE INSTRUCTION!)

39 Data Transfer Instructions
Load/store instructions
  Used to move signed and unsigned Word, Half Word and Byte to and from registers
  Can be used to load PC (if target address is beyond branch instruction range)
    LDR    Load Word                STR   Store Word
    LDRH   Load Half Word           STRH  Store Half Word
    LDRSH  Load Signed Half Word
    LDRB   Load Byte                STRB  Store Byte
    LDRSB  Load Signed Byte
  Only loads have signed forms, because sign extension happens when a half word or byte is widened to 32 bits; STRH and STRB store the low half word or byte of the register for signed and unsigned data alike.

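40 Block Transfer Instructions
  Load / Store Multiple instructions (LDM / STM)
  Whole register bank or a subset copied to memory or restored with a single instruction
[Figure: memory words Mi, Mi+1, Mi+2, ..., Mi+14, Mi+15 alongside registers R0, R1, R2, ..., R14, R15; LDM copies memory into the registers, STM copies the registers into memory]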

41 ARM Addressing Modes

42 Addressing Modes
  Immediate Addressing
    The desired value is a binary value in the instruction
  Absolute Addressing
    The instruction contains the full binary address
  Indirect addressing
    The instruction contains the binary address of a memory location containing the binary address
  Base relative addressing
    Plus offset
    Plus index
    Plus scaled index
  Stack addressing
During the normal flow of program execution, it is common for an event to occur that requires the microcontroller to stop what it is doing and perform another task, such as reading an A/D register, programming a timer, or responding to an external event. The event that can cause program execution to stop and direct the core's attention to another task is called an interrupt. The microcontroller then redirects program flow to an Interrupt Service Routine, which is a piece of software written to respond to the specific interrupt event. A microcontroller may be executing the main loop of software, running tasks. When an interrupt occurs, program execution stops and the present state of the microcontroller, including the instruction being executed, is saved. When the interrupt service routine is finished, the program retrieves the state of the microcontroller that was stored before and resumes program execution where it left off.

43 Immediate Addressing
  Used to load an immediate 8-bit value into a register
    e.g. mov r0, #0xFF
  Used to control the operation of the barrel shifter on the 3rd operand
    e.g. add r3, r2, r1, LSL #3    ; r3 := r2 + 8 x r1
External interrupts can occur from any source. Pins on the microcontroller, called interrupt pins, can alert the microcontroller of an event by a transition on the pin. On Atmel AT91 microcontrollers, an interrupt can be initiated by a high or low level on the interrupt pin, or by the pin changing from high to low, or from low to high. The interrupt signal can originate from an external peripheral or system.

44 Absolute Addressing
To load an absolute address into a register
example:
    start:   ldr r1, =address
             ldr r0, [r1]
    address: .word 0x
Internal interrupts usually originate from one of the on-chip peripherals. For example, an A/D converter can interrupt the core when it has finished converting, so that the software may read and act on the data. A timer may generate an interrupt after it has completed measuring a period of time. There is also a special class of internal interrupt called a software interrupt.

45 Indirect Addressing
    ldr r0, [r1]    ; r0 := mem32[r1]
    str r0, [r1]    ; mem32[r1] := r0
The ARM7TDMI processor implements two physically independent sources of interrupt:
  The FIQ (Fast Interrupt Request) is designed to support a data transfer and has sufficient private registers to remove the need for register saving (thus minimising the overhead of context switching). FIQ may be disabled by setting the CPSR’s F flag.
  The IRQ (Interrupt Request) is a normal interrupt. IRQ has a lower priority than FIQ and is masked out when a FIQ sequence is entered. It may be disabled at any time by setting the I bit in the CPSR.

46 Base Plus Offset Addressing
    ldr r0, [r1, #4]     ; r0 := mem32[r1+4], r1 is not altered
Another form is
    ldr r0, [r1, #4]!    ; r0 := mem32[r1+4]
                         ; "!" means update: r1 := r1+4
And another
    ldr r0, [r1], #4     ; r0 := mem32[r1]
                         ; then r1 := r1+4
The AT91 microcontroller features the Advanced Interrupt Controller (AIC), an 8-level priority, individually maskable, vectored interrupt controller. Internal sources are programmed to be level sensitive or edge triggered. External sources can be programmed to be positive or negative edge triggered or high or low level sensitive.
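The three base-plus-offset forms of `ldr` on this slide differ only in when the offset is applied and whether the base register is written back. A minimal C sketch, assuming a small word-addressed array stands in for memory; `mem`, `mem32` and the `ldr_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 8u

/* Hypothetical memory image: the word at byte address 4*i holds 10+i. */
static uint32_t mem[MEM_WORDS] = { 10, 11, 12, 13, 14, 15, 16, 17 };

/* mem32[addr] for a word-aligned byte address inside the image. */
static uint32_t mem32(uint32_t addr)
{
    assert(addr % 4 == 0 && addr / 4 < MEM_WORDS);
    return mem[addr / 4];
}

/* ldr r0, [r1, #off]  : offset applied, base unchanged */
uint32_t ldr_offset(uint32_t *r1, uint32_t off)
{
    return mem32(*r1 + off);
}

/* ldr r0, [r1, #off]! : offset applied first, base updated */
uint32_t ldr_pre_index(uint32_t *r1, uint32_t off)
{
    *r1 += off;
    return mem32(*r1);
}

/* ldr r0, [r1], #off  : access at the base, base updated afterwards */
uint32_t ldr_post_index(uint32_t *r1, uint32_t off)
{
    uint32_t value = mem32(*r1);
    *r1 += off;
    return value;
}
```

The post-indexed form is the one that walks an array: each load returns the current element and leaves the base pointing at the next.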

47 Base Plus Index Addressing
    ldr r1, =base       ; load r1 with base address
    ldr r2, =index      ; load r2 with an index
    ldr r0, [r1, r2]    ; get data record into r0
The interrupt controller is connected to the NFIQ (fast interrupt request) and the NIRQ (standard interrupt request) inputs of the ARM7TDMI processor. The processor's NFIQ line can only be asserted by the external fast interrupt request input: FIQ. The NIRQ line can be asserted by the interrupts generated by the on-chip peripherals and the external interrupt sources.

48 Base Plus Scaled Index Addressing
    ldr r1, =base               ; load r1 with base address
    ldr r2, =index              ; load r2 with an index
    ldr r0, [r1, r2, LSL #2]    ; r0 := mem32[r1 + 4*r2]
The Advanced Interrupt Controller (AIC) can have up to 32 interrupt sources. The interrupt sources are listed in this table.

49 Direct functionality of Block Data Transfer
When LDM / STM are not being used to implement stacks, it is clearer to specify exactly what the functionality of the instruction is, i.e. whether to increment or decrement the base pointer, and whether to do so before or after the memory access.
In order to do this, LDM / STM support a further syntax in addition to the stack one (the C analogue assumes int *p):
    STMIA / LDMIA : Increment After  : p++
    STMIB / LDMIB : Increment Before : ++p
    STMDA / LDMDA : Decrement After  : p--
    STMDB / LDMDB : Decrement Before : --p
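The four suffixes determine which addresses a block transfer touches and where the base ends up when writeback (`!`) is requested. The sketch below computes both for a transfer of `nregs` registers; the type and function names are hypothetical, and it relies on the fact stated later in the deck that the lowest register always maps to the lowest address.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { MODE_IA, MODE_IB, MODE_DA, MODE_DB } block_mode;

/* lowest_addr: address of the first (lowest) word transferred.
   final_base:  value written back to the base register when '!' is used. */
typedef struct { uint32_t lowest_addr; uint32_t final_base; } block_span;

block_span block_transfer_span(block_mode mode, uint32_t base, unsigned nregs)
{
    assert(nregs >= 1 && nregs <= 16);
    uint32_t bytes = 4u * nregs;
    block_span s;
    switch (mode) {
    case MODE_IA:   /* Increment After: first word at base */
        s.lowest_addr = base;              s.final_base = base + bytes; break;
    case MODE_IB:   /* Increment Before: first word at base + 4 */
        s.lowest_addr = base + 4u;         s.final_base = base + bytes; break;
    case MODE_DA:   /* Decrement After: last word at base */
        s.lowest_addr = base - bytes + 4u; s.final_base = base - bytes; break;
    default:        /* MODE_DB, Decrement Before: last word at base - 4 */
        s.lowest_addr = base - bytes;      s.final_base = base - bytes; break;
    }
    return s;
}
```

For three registers and a base of 0x1000, IA covers 0x1000 to 0x1008 and DB covers 0xFF4 to 0xFFC; IB and DA cover the same block sizes shifted by one word.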

50 Example: Block Copy
Copy a block of memory, which is an exact multiple of 12 words long, from the location pointed to by r12 to the location pointed to by r13. r14 points to the end of the block to be copied.
    ; r12 points to the start of the source data
    ; r14 points to the end of the source data
    ; r13 points to the start of the destination data
    loop    LDMIA r12!, {r0-r11}    ; load 48 bytes
            STMIA r13!, {r0-r11}    ; and store them
            CMP   r12, r14          ; check for the end
            BNE   loop              ; and loop until done
This loop transfers 48 bytes in 31 cycles: over 50 Mbytes/sec at 33 MHz.
[Figure: r12, r14 and r13 pointing into memory, with memory addresses increasing upward]
Using r12, r13 and r14 as pointers leaves r0-r11 for use in the block copy. The routine would need to have stored r0-r12 plus r14 onto the stack (so that the original values can be restored when the copy is finished), and also to have stored r13 to a known word in memory so that it can be restored as well.
Using Increment After addressing, 4 bytes per register (12 registers) => 48 bytes per iteration
    LDM - 14 cycles
    STM - 13 cycles
    CMP - 1 cycle
    BNE - 3 cycles
    Total = 31 cycles to move 48 bytes
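The copy loop and its quoted throughput can be checked with a short C sketch. The function names are hypothetical, the 12-word local array stands in for r0-r11, and the cycle counts are simply the ones quoted on the slide.

```c
#include <assert.h>
#include <stdint.h>

enum { WORDS_PER_ITER = 12 };   /* r0-r11 */

/* src, src_end and dst play the roles of r12, r14 and r13. Like the
   assembly loop, this copies at least one block, so the length must be
   a non-zero multiple of 12 words. */
void block_copy(uint32_t *dst, const uint32_t *src, const uint32_t *src_end)
{
    assert(src_end > src && (src_end - src) % WORDS_PER_ITER == 0);
    do {
        uint32_t regs[WORDS_PER_ITER];
        for (int i = 0; i < WORDS_PER_ITER; i++)
            regs[i] = *src++;               /* LDMIA r12!, {r0-r11} */
        for (int i = 0; i < WORDS_PER_ITER; i++)
            *dst++ = regs[i];               /* STMIA r13!, {r0-r11} */
    } while (src != src_end);               /* CMP r12, r14 / BNE loop */
}

/* Bytes per second for the loop, using the slide's cycle counts. */
double copy_rate_bytes_per_sec(double clock_hz)
{
    const int cycles_per_iter = 14 + 13 + 1 + 3;   /* LDM + STM + CMP + BNE */
    return 48.0 * clock_hz / cycles_per_iter;
}
```

At 33 MHz this gives 48 x 33,000,000 / 31, roughly 51.1 million bytes per second, which is consistent with the "over 50 Mbytes/sec" figure.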

51 Stacks
A stack is an area of memory which grows as new data is “pushed” onto the “top” of it, and shrinks as data is “popped” off the top.
Two pointers define the current limits of the stack:
  A base pointer, used to point to the “bottom” of the stack (the first location)
  A stack pointer, used to point to the current “top” of the stack
[Figure: PUSH {1,2,3} leaves 1, 2, 3 above BASE with SP at 3; POP returns 3 and leaves 1, 2 with SP at 2]

52 Stack Operation
Traditionally, a stack grows down in memory, with the last “pushed” value at the lowest address. The ARM also supports ascending stacks, where the stack structure grows up through memory.
The value of the stack pointer can either:
  Point to the last occupied address (Full stack), and so needs pre-decrementing (i.e. before the push)
  Point to the next address to be occupied (Empty stack), and so needs post-decrementing (i.e. after the push)
The stack type to be used is given by the postfix to the instruction:
    STMFD / LDMFD : Full Descending stack
    STMFA / LDMFA : Full Ascending stack
    STMED / LDMED : Empty Descending stack
    STMEA / LDMEA : Empty Ascending stack

53 Stack Examples
[Figure: memory from 0x3e8 to 0x418 showing where r0, r1, r3, r4 and r5 are stored, with the old and new SP, for each of the following instructions]
    STMFD sp!, {r0,r1,r3-r5}
    STMED sp!, {r0,r1,r3-r5}
    STMFA sp!, {r0,r1,r3-r5}
    STMEA sp!, {r0,r1,r3-r5}
The lowest register is mapped to the lowest memory address.
‘!’ causes the stack pointer to be updated in all these cases.

54 Stacks and Subroutines
One use of stacks is to create temporary register workspace for subroutines. Any registers that are needed can be pushed onto the stack at the start of the subroutine and popped off again at the end, so as to restore them before returning to the caller:
    STMFD sp!, {r0-r12, lr}    ; stack all registers
                               ; and the return address
    ...
    LDMFD sp!, {r0-r12, pc}    ; load all the registers
                               ; and return automatically
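The save-and-restore pattern above can be sketched in C as a full descending stack. This is an illustrative model only: the stack is a small word array indexed by a hypothetical `sp_index`, and `push_fd` / `pop_fd` stand in for `STMFD sp!` and `LDMFD sp!`.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 32u

static uint32_t stack_mem[STACK_WORDS];
/* Full descending: SP points at the last occupied word, so an empty
   stack has SP just past the highest word. */
static uint32_t sp_index = STACK_WORDS;

/* STMFD sp!, {...}: decrement first, then store, lowest register lowest. */
void push_fd(const uint32_t *regs, unsigned n)
{
    assert(n <= sp_index);                   /* room left below SP */
    sp_index -= n;
    for (unsigned i = 0; i < n; i++)
        stack_mem[sp_index + i] = regs[i];
}

/* LDMFD sp!, {...}: load from SP upward, then increment. */
void pop_fd(uint32_t *regs, unsigned n)
{
    assert(n <= STACK_WORDS - sp_index);     /* enough words on the stack */
    for (unsigned i = 0; i < n; i++)
        regs[i] = stack_mem[sp_index + i];
    sp_index += n;
}
```

A subroutine would call `push_fd` on entry and `pop_fd` with the same count on exit; on the ARM, loading the saved lr value into pc in that final pop is what performs the return.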

