Embedded Systems Architecture



Embedded Systems Architecture ARM Processor


Why ARM?
As of 2007, about 98% of the more than one billion mobile phones sold each year used at least one ARM processor, and as of 2009, ARM processors accounted for approximately 90% of all embedded 32-bit RISC processors (source: http://en.wikipedia.org/wiki/ARM_architecture).

History
ARM was developed at Acorn Computers Limited of Cambridge, England, between 1983 and 1985. The RISC concept had been introduced in 1980 at Stanford and Berkeley. ARM Limited was founded in 1990. ARM licenses its cores to partners as soft cores, and the partners develop and fabricate new microcontrollers around them.

ARM Architecture
ARM is based on a RISC architecture, with enhancements to meet the requirements of embedded applications:
- a large, uniform register file
- a load-store architecture, in which data processing operations operate on register contents only
- uniform, fixed-length instructions
- a 32-bit processor with 32-bit instructions
- a good speed/power-consumption ratio
- high code density

Enhancements to Basic RISC Features
- Variable-cycle execution for certain instructions, such as load-store-multiple
- An inline barrel shifter, which leads to more complex instructions by preprocessing one of the input registers before it is used
- The Thumb 16-bit instruction set, which improves code density by 30% over 32-bit instructions
- Enhanced DSP instructions that support fast 16x16 multiply operations

Enhancements to Basic RISC Features (2)
- Auto-increment and auto-decrement addressing modes to optimize program loops
- Load and store multiple instructions to maximize data throughput
- Conditional execution of instructions to maximize execution throughput

ARM Architecture Versions
- Version 1: 26-bit addressing, no multiply or coprocessor support
- Version 2: adds a multiply instruction with a 32-bit result, and coprocessor support
- Version 3: 32-bit addressing
- Version 4: adds signed and unsigned half-word and signed byte load and store instructions
- Version 4T: introduces the 16-bit Thumb compressed instruction form

ARM Architecture Versions (2)
- Version 5T: a superset of 4T that adds new instructions
- Version 5TE: adds signal processing extensions
Examples: ARM6: v3; ARM7: v3; ARM7TDMI: v4T; StrongARM: v4; ARM9E-S: v5TE.

Overview: Core Data Path
Data items are placed in the register file; no data processing instruction manipulates data in memory directly. Instructions typically use two source registers and a single result (destination) register. A barrel shifter on the data path can preprocess data before it enters the ALU, and increment/decrement logic can update register contents for sequential access independently of the ALU.

Basic ARM Organization

Registers
General-purpose registers hold either data or addresses, and all registers are 32 bits wide. In user mode, 16 data registers and 2 status registers are visible. The data registers are r0 to r15, of which three perform special functions:
- r13: stack pointer
- r14: link register
- r15: program counter

Registers (2)
Depending on context, r13 and r14 can also be used as general-purpose registers. The register set is orthogonal: any instruction that can use r0 can equally use any other general-purpose register (r1 to r13). In addition, there are two status registers:
- CPSR: current program status register
- SPSR: saved program status register

Status Registers
The CPSR monitors and controls internal operations.

CPSR: Example

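ARM Status Bits
Arithmetic, logical, and shift operations set the CPSR condition bits N (negative), Z (zero), C (carry), and V (overflow) when the instruction carries the S suffix; compare instructions always set them. Example: -1 + 1 = 0 gives NZCV = 0110, because the result is zero (Z = 1) and adding 0xFFFFFFFF to 0x00000001 produces a carry out of bit 31 (C = 1).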
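A small C model makes the flag example concrete. This is a sketch of what an ADDS instruction leaves in the condition bits, not ARM code; the names `Flags`, `adds_flags`, and `nzcv_bits` are hypothetical, and carry and overflow are computed with the usual unsigned-carry and same-sign-operands rules.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct { bool n, z, c, v; } Flags;

/* Models the NZCV bits left by ADDS Rd, Rn, Op2 (hypothetical helper). */
static Flags adds_flags(uint32_t a, uint32_t b, uint32_t *result)
{
    uint64_t wide = (uint64_t)a + (uint64_t)b;   /* bit 32 holds the carry out */
    uint32_t r = (uint32_t)wide;
    Flags f;
    f.n = (r >> 31) & 1u;                        /* sign bit of the result */
    f.z = (r == 0u);                             /* result is zero */
    f.c = (wide >> 32) & 1u;                     /* unsigned carry out of bit 31 */
    f.v = ((~(a ^ b) & (a ^ r)) >> 31) & 1u;     /* same-sign operands, different-sign result */
    if (result) *result = r;
    return f;
}

/* Packs the flags as a 4-bit value that reads N, Z, C, V from left to right. */
static unsigned nzcv_bits(Flags f)
{
    return ((unsigned)f.n << 3) | ((unsigned)f.z << 2) | ((unsigned)f.c << 1) | (unsigned)f.v;
}
```

Calling `adds_flags(0xFFFFFFFF, 1, &r)` reproduces the -1 + 1 case: the result is 0 and `nzcv_bits` returns binary 0110.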

Processor Modes
The processor mode determines which registers are active and the access rights to the CPSR itself. Each processor mode is either:
- privileged: full read-write access to the CPSR, or
- non-privileged: read-only access to the control field of the CPSR, but read-write access to the condition flags.

Processor Modes (2)
ARM has seven modes:
- privileged: abort, fast interrupt request, interrupt request, supervisor, system, and undefined
- non-privileged: user
User mode is used for programs and applications.

Privileged Modes
- Abort: entered when an attempt to access memory fails.
- Fast interrupt request (FIQ) and interrupt request (IRQ): correspond to the two interrupt levels available on ARM.
- Supervisor: the state after reset, and generally the mode in which the OS kernel executes.

Privileged Modes (2)
- System: a special version of user mode that allows full read-write access to the CPSR.
- Undefined: entered when the processor encounters an undefined instruction.
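The current mode is recorded in the low five bits of the CPSR, next to the interrupt-mask and Thumb-state bits. The sketch below decodes those fields in C; the helper names are hypothetical, and the bit positions and mode encodings are the ones used by classic (pre-Cortex-M) ARM cores such as the ARM7TDMI.

```c
#include <stdint.h>
#include <stdbool.h>

/* Encodings of the CPSR mode field M[4:0] on classic ARM cores. */
typedef enum {
    MODE_USR = 0x10, MODE_FIQ = 0x11, MODE_IRQ = 0x12, MODE_SVC = 0x13,
    MODE_ABT = 0x17, MODE_UND = 0x1B, MODE_SYS = 0x1F
} CpuMode;

static CpuMode cpsr_mode(uint32_t cpsr)     { return (CpuMode)(cpsr & 0x1Fu); }
static bool    is_privileged(uint32_t cpsr) { return cpsr_mode(cpsr) != MODE_USR; }
static bool    irq_masked(uint32_t cpsr)    { return (cpsr >> 7) & 1u; }  /* I bit */
static bool    fiq_masked(uint32_t cpsr)    { return (cpsr >> 6) & 1u; }  /* F bit */
static bool    thumb_state(uint32_t cpsr)   { return (cpsr >> 5) & 1u; }  /* T bit */
```

For example, a low byte of 0xD3 decodes as supervisor mode with both IRQ and FIQ masked in ARM state, which matches the mode and mask bits a classic ARM core sets on reset.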

Processor Modes


Banked Registers
The register file contains 37 registers in all, 20 of which are hidden from the program at different times. These are called banked registers, and they are available only when the processor is in a particular mode. Every processor mode other than system mode has a set of associated banked registers, which are a subset of the 16 user-visible registers; each banked register maps one-to-one onto a user-mode register.

Register Banking

SPSR
Each privileged mode except system mode has an associated Saved Program Status Register (SPSR). The SPSR saves the state of the CPSR (Current Program Status Register) when the privileged mode is entered, so that the user state can be fully restored when the user process resumes.

Mode Changing
The mode changes either when software running in a privileged mode writes directly to the CPSR, or when the hardware responds to an exception or interrupt. To return to user mode, a special return instruction is used that tells the core to restore the original CPSR and banked registers.

Mode Changing

ARM Instruction Set

Instructions
Instructions process data held in registers and access memory with load and store instructions. The classes of instructions are:
- data processing
- branch
- load-store
- software interrupt
- program status register

Features of the ARM Instruction Set
- 3-address data processing instructions
- conditional execution of every instruction
- load and store multiple registers
- a shift and an ALU operation in a single instruction

ARM Data Instructions
Basic format: ADD r0,r1,r2 computes r1 + r2 and stores the result in r0. With an immediate operand, ADD r0,r1,#2 computes r1 + 2 and stores the result in r0.

Data Processing
Data processing instructions manipulate data within registers. They comprise move, arithmetic, logical, and comparison instructions. The S suffix on a data processing instruction updates the flags in the CPSR.

Data Processing Instructions
Operands are 32 bits wide and come either from registers or from literals (immediate operands) specified in the instruction itself. The second operand is sent to the ALU via the barrel shifter. The 32-bit result is placed in a register; long multiply instructions produce a 64-bit result.

Move Instructions
MOV Rd, N
MVN Rd, N
Rd is the destination register, and N can be an immediate value or a source register. MOV copies N into Rd (example: MOV r7, r5). MVN moves into Rd the bitwise NOT (inverse) of the 32-bit value N.

Using the Barrel Shifter
The barrel shifter shifts the 32-bit operand in one of the source registers left or right by a specified number of positions. Its basic operations are shift left, shift right, and rotate right. It enables fast multiplication and division by powers of two and increases code density. Example: MOV r7, r5, LSL #2 multiplies the contents of r5 by 4 and puts the result in r7.

Using Barrel Shifter

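Barrel Shift Instructions
- LSL, LSR: logical shift left/right; vacated bits are filled with zeros.
- ASL, ASR: arithmetic shift left/right. ASL is identical to LSL; ASR fills the vacated high bits with copies of the sign bit (bit 31), preserving the sign of a two's-complement value.
- ROR: rotate right.
- RRX: rotate right extended; performs a 33-bit rotate right by one bit, with the C bit from the CPSR positioned above the sign bit.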
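The following C functions sketch what each barrel-shifter operation does to a 32-bit value. They are illustrative models with hypothetical names; shift amounts are assumed to be 1 to 31, so ARM's special encodings for shifts of 0 and 32 are not modelled. ASR is written out by hand because right-shifting a negative signed integer is implementation-defined in C.

```c
#include <stdint.h>
#include <stdbool.h>

/* Shift amount n is assumed to be in the range 1..31. */
static uint32_t lsl(uint32_t x, unsigned n) { return x << n; }  /* zeros enter at the right */
static uint32_t lsr(uint32_t x, unsigned n) { return x >> n; }  /* zeros enter at the left */

/* ASR: copies of bit 31 enter at the left. */
static uint32_t asr(uint32_t x, unsigned n)
{
    uint32_t sign_fill = (x & 0x80000000u) ? ~(0xFFFFFFFFu >> n) : 0u;
    return (x >> n) | sign_fill;
}

/* ROR: bits shifted out at the right re-enter at the left. */
static uint32_t ror(uint32_t x, unsigned n) { return (x >> n) | (x << (32u - n)); }

/* RRX: 33-bit rotate right by one through the carry flag. */
static uint32_t rrx(uint32_t x, bool carry_in, bool *carry_out)
{
    *carry_out = x & 1u;                              /* bit 0 moves into C */
    return (x >> 1) | ((uint32_t)carry_in << 31);     /* old C enters at bit 31 */
}
```

With these, `lsl(r5, 2)` corresponds to the MOV r7, r5, LSL #2 example, and `asr(0x80000000, 4)` gives 0xF8000000 where `lsr` would give 0x08000000.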

Barrel Shift with Carry

Arithmetic Instructions
These instructions implement 32-bit addition and subtraction in 3-operand form. Examples:
- SUB r0, r1, r2 subtracts the value in r2 from the value in r1 and stores the result in r0.
- SUBS r1, r1, #1 subtracts 1 from r1, stores the result in r1, and updates the N, Z, C, and V flags.

Arithmetic Instructions (2)
- ADD, ADC: add (with carry)
- SUB, SBC: subtract (with carry)
- RSB, RSC: reverse subtract (with carry)
- MUL, MLA: multiply (and accumulate)
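One detail the list does not show is how the carry flag behaves in subtraction: ARM sets C when no borrow occurs, so SBC and RSC subtract NOT C. The C sketch below models that convention and uses it to chain a 64-bit subtraction from SUBS (low words) and SBC (high words); the function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* SUBS: C is set when NO borrow occurs (unsigned a >= b). */
static uint32_t subs(uint32_t a, uint32_t b, bool *c)
{
    *c = (a >= b);
    return a - b;
}

/* SBC Rd, Rn, Op2: Rn - Op2 - NOT C */
static uint32_t sbc(uint32_t a, uint32_t b, bool c_in) { return a - b - (uint32_t)!c_in; }

/* RSB Rd, Rn, Op2: Op2 - Rn (operands reversed) */
static uint32_t rsb(uint32_t rn, uint32_t op2) { return op2 - rn; }

/* ADC Rd, Rn, Op2: Rn + Op2 + C */
static uint32_t adc(uint32_t a, uint32_t b, bool c_in) { return a + b + (uint32_t)c_in; }

/* 64-bit subtraction as SUBS on the low words followed by SBC on the high words. */
static uint64_t sub64(uint64_t x, uint64_t y)
{
    bool c;
    uint32_t lo = subs((uint32_t)x, (uint32_t)y, &c);
    uint32_t hi = sbc((uint32_t)(x >> 32), (uint32_t)(y >> 32), c);
    return ((uint64_t)hi << 32) | lo;
}
```

For instance, `rsb(r1, 0)` is the usual way to negate a register with RSB, and `sub64(0x100000000, 1)` propagates the borrow into the high word to give 0xFFFFFFFF.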

Multiply Instructions
These multiply the contents of a pair of registers; long multiplies generate a 64-bit result. Examples:
- MUL r0, r1, r2: the contents of r1 and r2 are multiplied and the result is put in r0.
- UMULL r0, r1, r2, r3: unsigned multiply of r2 by r3, with the 64-bit result stored in r0 (low word) and r1 (high word).
The number of cycles taken to execute a multiply instruction depends on the processor implementation.

Multiply and Accumulate
The result of a multiplication can be accumulated with the contents of another register:
MLA Rd, Rm, Rs, Rn: Rd = (Rm * Rs) + Rn
UMLAL RdLo, RdHi, Rm, Rs: [RdHi, RdLo] = [RdHi, RdLo] + (Rm * Rs)
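The multiply family can be modelled in C as below, which shows why MUL keeps only the low 32 bits while UMULL and UMLAL work on a 64-bit register pair. These are hypothetical helper functions written for illustration, with the RdLo/RdHi pair passed as two pointers.

```c
#include <stdint.h>

/* MUL Rd, Rm, Rs: only the low 32 bits of the product are kept. */
static uint32_t mul(uint32_t rm, uint32_t rs) { return rm * rs; }

/* MLA Rd, Rm, Rs, Rn: (Rm * Rs) + Rn, modulo 2^32. */
static uint32_t mla(uint32_t rm, uint32_t rs, uint32_t rn) { return rm * rs + rn; }

/* UMULL RdLo, RdHi, Rm, Rs: full 64-bit unsigned product. */
static void umull(uint32_t *lo, uint32_t *hi, uint32_t rm, uint32_t rs)
{
    uint64_t p = (uint64_t)rm * rs;
    *lo = (uint32_t)p;
    *hi = (uint32_t)(p >> 32);
}

/* UMLAL RdLo, RdHi, Rm, Rs: adds the 64-bit product to the register pair. */
static void umlal(uint32_t *lo, uint32_t *hi, uint32_t rm, uint32_t rs)
{
    uint64_t acc = ((uint64_t)*hi << 32) | *lo;
    acc += (uint64_t)rm * rs;
    *lo = (uint32_t)acc;
    *hi = (uint32_t)(acc >> 32);
}
```

Multiplying 0x10000 by 0x10000 illustrates the difference: `mul` returns 0, while `umull` leaves 0 in the low word and 1 in the high word.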

Logical Instructions
Bitwise logical operations on the two source registers. Operators: AND, ORR (OR), EOR (exclusive OR), and BIC (bit clear). Example: BIC r0, r1, r2, where r2 contains a binary pattern in which every 1 clears the corresponding bit of r1; the result is stored in r0. This is useful for manipulating status flags and interrupt masks.
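As a sketch of the interrupt-mask use case, the C helpers below mirror BIC, ORR, and EOR on a 32-bit value (hypothetical names). The test values treat bit 7 as the CPSR IRQ-disable bit, so clearing it with BIC models unmasking IRQs.

```c
#include <stdint.h>

static uint32_t bic(uint32_t rn, uint32_t mask) { return rn & ~mask; }  /* clear bits set in mask */
static uint32_t orr(uint32_t rn, uint32_t mask) { return rn | mask; }   /* set bits set in mask */
static uint32_t eor(uint32_t rn, uint32_t mask) { return rn ^ mask; }   /* toggle bits set in mask */
```

Applied to a status value of 0xD3, `bic(0xD3, 0x80)` gives 0x53 (IRQ bit cleared) and `orr(0x53, 0x80)` restores 0xD3.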

With the Barrel Shifter
Using the barrel shifter with arithmetic and logical instructions increases the set of available operations. Example: ADD r0, r1, r1, LSL #1 shifts r1 left by 1 (doubling it), adds the original r1, and stores the result (3 times r1) in r0.
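Shifted operands let a single instruction multiply by some small constants. The C functions below (hypothetical names) show the arithmetic behind three such forms; the RSB variant relies on reverse subtract computing Op2 - Rn.

```c
#include <stdint.h>

/* ADD r0, r1, r1, LSL #1  ->  r1 + 2*r1 = 3*r1 */
static uint32_t times3(uint32_t x) { return x + (x << 1); }

/* ADD r0, r1, r1, LSL #2  ->  r1 + 4*r1 = 5*r1 */
static uint32_t times5(uint32_t x) { return x + (x << 2); }

/* RSB r0, r1, r1, LSL #3  ->  8*r1 - r1 = 7*r1 */
static uint32_t times7(uint32_t x) { return (x << 3) - x; }
```

Because all arithmetic is modulo 2^32, these agree with an ordinary 32-bit multiply even when the product overflows.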

Compare Instructions
These compare 32-bit values. They update the CPSR flags but do not affect any other register. Examples:
- CMP r0, r9: flags set as a result of r0 - r9
- TEQ r0, r9: flags set as a result of r0 EOR r9
- TST r0, r9: flags set as a result of r0 AND r9

Compare Instructions (2)
- CMP: compare
- TST: bitwise test (AND)
- TEQ: test equivalence (exclusive OR)
These instructions set only the NZCV bits of the CPSR.

Load-Store Instructions
These transfer data between memory and processor registers:
- Single-register transfer: the supported data types are signed and unsigned words (32 bits), half-words, and bytes.
- Multiple-register transfer: transfers multiple registers between memory and the processor in a single instruction.
- Swap: swaps the contents of a memory location with the contents of a register.

Single Transfer Instructions
These load and store data on a boundary alignment:
- LDR, LDRH, LDRB: load (word, half-word, byte)
- STR, STRH, STRB: store (word, half-word, byte)
They support three primary addressing modes (preindex with writeback, preindex, and postindex) and almost nine derived addressing modes (immediate, register, scaled register, …).

Addressing Modes (1)
LDR r0, [r1, #4]! : preindex with writeback. The base register is updated with the new address.

Addressing Modes (2)
LDR r0, [r1, #4] : preindex (immediate offset). A 12-bit offset is added to the base register to form the address; the base register itself is not changed.

Addressing Modes (3)
LDR r0, [r1], #4 : postindex. The base register is updated after the address has been used.

Example (1)
Initial state:
r0 = 0x00000000
r1 = 0x00009000
mem32[0x00009000] = 0x01010101
mem32[0x00009004] = 0x02020202
Preindexing with writeback, LDR r0, [r1, #4]!:
r0 = 0x02020202
r1 = 0x00009004
Preindexing, LDR r0, [r1, #4]:
r0 = 0x02020202
r1 = 0x00009000

Example (2)
Initial state:
r0 = 0x00000000
r1 = 0x00009000
mem32[0x00009000] = 0x01010101
mem32[0x00009004] = 0x02020202
Postindexing, LDR r0, [r1], #4:
r0 = 0x01010101
r1 = 0x00009004

Derived Addressing Modes
- Register indirect: LDR r0, [r1]
- Register offset: LDR r0, [r1, -r2]; the address is calculated from the base register and another register (here r1 - r2).
- Scaled register offset: LDR r0, [r1, r2, LSL #2]; the address is calculated from the base register and a barrel-shifted register (here r1 + (r2 << 2)).
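The addressing modes and the two worked examples can be replayed with a small C model. This sketch assumes a hypothetical four-word memory window starting at 0x00009000 and word-aligned accesses only; each function is named after the mode it imitates and returns the loaded value, updating the base register only where the mode requires it.

```c
#include <stdint.h>

/* Hypothetical memory window: 4 words starting at MEM_BASE.
   Addresses are assumed to be word-aligned and inside the window. */
#define MEM_BASE  0x00009000u
#define MEM_WORDS 4u
static uint32_t mem[MEM_WORDS];

static uint32_t load32(uint32_t addr) { return mem[(addr - MEM_BASE) / 4u]; }

/* LDR r0, [r1, #off]    preindex: base register unchanged */
static uint32_t ldr_pre(const uint32_t *r1, int32_t off) { return load32(*r1 + (uint32_t)off); }

/* LDR r0, [r1, #off]!   preindex with writeback: base updated, then used */
static uint32_t ldr_pre_wb(uint32_t *r1, int32_t off)
{
    *r1 += (uint32_t)off;
    return load32(*r1);
}

/* LDR r0, [r1], #off    postindex: old base used, then updated */
static uint32_t ldr_post(uint32_t *r1, int32_t off)
{
    uint32_t value = load32(*r1);
    *r1 += (uint32_t)off;
    return value;
}

/* LDR r0, [r1, r2, LSL #2]   scaled register offset */
static uint32_t ldr_scaled(uint32_t r1, uint32_t r2) { return load32(r1 + (r2 << 2)); }
```

Seeding `mem[0]` with 0x01010101 and `mem[1]` with 0x02020202 and starting from r1 = 0x00009000 reproduces the register values in Examples (1) and (2).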

More Examples

Example: C assignments
C:
    x = (a + b) - c;
Assembler:
    ADR r4,a      ; get address for a
    LDR r0,[r4]   ; get value of a
    ADR r4,b      ; get address for b, reusing r4
    LDR r1,[r4]   ; get value of b
    ADD r3,r0,r1  ; compute a+b
    ADR r4,c      ; get address for c
    LDR r2,[r4]   ; get value of c

C assignment, cont’d.
    SUB r3,r3,r2  ; complete computation of x
    ADR r4,x      ; get address for x
    STR r3,[r4]   ; store value of x

Example: C assignment
C:
    y = a*(b+c);
Assembler:
    ADR r4,b      ; get address for b
    LDR r0,[r4]   ; get value of b
    ADR r4,c      ; get address for c
    LDR r1,[r4]   ; get value of c
    ADD r2,r0,r1  ; compute partial result
    ADR r4,a      ; get address for a
    LDR r0,[r4]   ; get value of a

C assignment, cont’d.
    MUL r2,r0,r2  ; compute final value for y (before ARMv6, Rd must differ from Rm)
    ADR r4,y      ; get address for y
    STR r2,[r4]   ; store y

Example: C assignment
C:
    z = (a << 2) | (b & 15);
Assembler:
    ADR r4,a          ; get address for a
    LDR r0,[r4]       ; get value of a
    MOV r0,r0,LSL #2  ; perform shift
    ADR r4,b          ; get address for b
    LDR r1,[r4]       ; get value of b
    AND r1,r1,#15     ; perform AND
    ORR r1,r0,r1      ; perform OR

C assignment, cont’d.
    ADR r4,z      ; get address for z
    STR r1,[r4]   ; store value for z

Multiple Register Transfer
Load-store multiple instructions transfer the contents of multiple registers between memory and the processor in a single instruction. They are more efficient for moving blocks of memory and for saving and restoring context and stacks. However, they can increase interrupt latency, because ARM usually does not interrupt an instruction while it is executing. On an ARM7, a load multiple takes 2 + Nt cycles, where N is the number of registers to load and t is the number of cycles required for each sequential access to memory.

Load-Store Multiple
Any subset of the current bank of registers can be transferred to memory (STM) or fetched from memory (LDM). Syntax:
<LDM|STM>{<cond>}<addressing mode> Rn{!},<registers>{^}
The base register Rn determines the source or destination address.

Address Modes (load-store multiple)

Load/Store Multiple Addressing
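The address arithmetic behind the four load-store multiple addressing modes can be summarised in C. The sketch below (hypothetical names) returns the address used for the lowest-numbered register and the value written back to Rn when `!` is given; registers always occupy ascending addresses, with the lowest-numbered register at the lowest address.

```c
#include <stdint.h>

typedef enum { IA, IB, DA, DB } LdmMode;  /* increment/decrement, after/before */

/* Address of the lowest-numbered register in the transfer. */
static uint32_t ldm_start(LdmMode m, uint32_t rn, unsigned nregs)
{
    switch (m) {
    case IA: return rn;                       /* first word at Rn */
    case IB: return rn + 4u;                  /* first word just above Rn */
    case DA: return rn - 4u * nregs + 4u;     /* last word at Rn */
    default: return rn - 4u * nregs;          /* DB: last word just below Rn */
    }
}

/* Value written back to Rn when the '!' suffix is used. */
static uint32_t ldm_writeback(LdmMode m, uint32_t rn, unsigned nregs)
{
    return (m == IA || m == IB) ? rn + 4u * nregs : rn - 4u * nregs;
}
```

The full-descending stack forms used with subroutines map onto these modes: STMFD is STMDB and LDMFD is LDMIA.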

SWAP Instruction
A special case of load-store instruction:
- SWP: swap a word between memory and a register
- SWPB: swap a byte between memory and a register

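SWAP Instruction (2)
Swap is useful for implementing synchronization primitives such as semaphores, because the memory read and the memory write are performed by a single instruction.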
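In C11 the same idea is expressed with an atomic exchange: write 1 into a lock word and look at what was there before. The sketch below is a minimal spinlock under that assumption (the `SpinLock` type and its functions are hypothetical); the compiler picks the actual instructions, which on newer ARM cores are not necessarily SWP.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Lock word: 0 = free, 1 = taken. */
typedef struct { atomic_uint word; } SpinLock;

/* Like SWP: store 1 and get the previous value in one indivisible step. */
static bool spin_trylock(SpinLock *l)
{
    return atomic_exchange(&l->word, 1u) == 0u;  /* old value 0 means we acquired it */
}

static void spin_lock(SpinLock *l)
{
    while (!spin_trylock(l)) {
        /* busy-wait until the holder releases the lock */
    }
}

static void spin_unlock(SpinLock *l)
{
    atomic_store(&l->word, 0u);
}
```

A second `spin_trylock` on a held lock fails because the exchange returns the 1 left by the first caller.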

Control Flow Instructions
- Branch instructions
- Conditional branches
- Conditional execution
- Branch and link instructions
- Subroutine return instructions

Branch Instruction
Branch instruction: B label. Example: B forward. The address label is stored in the instruction as a signed PC-relative offset.
Conditional branch: B<cond> label. Example: BNE loop. The branch has a condition associated with it and is taken only if the condition codes have the correct value.
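The signed PC-relative offset can be computed as in the C sketch below. It assumes ARM state, where a B or BL stores a 24-bit word offset measured from the branch address plus 8 (the PC reads two instructions ahead), giving a range of about ±32 MB; the function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Computes the 24-bit offset field of an ARM-state B/BL.
   Returns false if the target is misaligned or out of range. */
static bool encode_branch_offset(uint32_t branch_addr, uint32_t target, uint32_t *imm24)
{
    int64_t delta = (int64_t)target - ((int64_t)branch_addr + 8);  /* PC = branch + 8 */
    if (delta % 4 != 0) return false;                              /* must be word-aligned */
    int64_t words = delta / 4;
    if (words < -(1 << 23) || words > (1 << 23) - 1) return false; /* signed 24-bit range */
    *imm24 = (uint32_t)words & 0x00FFFFFFu;
    return true;
}

/* Recovers the target: sign-extend the field, scale by 4, add PC (branch + 8). */
static uint32_t decode_branch_target(uint32_t branch_addr, uint32_t imm24)
{
    uint32_t ext = (imm24 & 0x00800000u) ? (imm24 | 0xFF000000u) : imm24;
    return branch_addr + 8u + (ext << 2);   /* unsigned wrap-around does the subtraction */
}
```

A branch to itself, for example, encodes an offset of -2 words (0xFFFFFE), because the target lies 8 bytes behind the PC value.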

Example: Block memory copy
    Loop  LDMIA r9!, {r0-r7}
          STMIA r10!, {r0-r7}
          CMP   r9, r11
          BNE   Loop
r9 points to the source data, r10 points to the start of the destination, and r11 points to the end of the source.
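In C, the block copy loop corresponds roughly to the function below. It is a sketch with a hypothetical name that, like the assembly, assumes the source length is a non-zero multiple of eight words; the temporary array stands in for r0 to r7.

```c
#include <stdint.h>

/* src plays r9, dst plays r10, src_end plays r11. */
static void block_copy(uint32_t *dst, const uint32_t *src, const uint32_t *src_end)
{
    do {
        uint32_t r[8];                                 /* r0-r7 */
        for (int i = 0; i < 8; i++) r[i] = *src++;     /* LDMIA r9!, {r0-r7} */
        for (int i = 0; i < 8; i++) *dst++ = r[i];     /* STMIA r10!, {r0-r7} */
    } while (src != src_end);                          /* CMP r9, r11 ; BNE Loop */
}
```

The do-while shape mirrors the assembly, which also copies one block before making its first comparison.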

Conditional Execution
An unusual feature of the ARM instruction set is that conditional execution applies not only to branches but to all ARM instructions. Example: ADDEQ r0, r1, r2 is executed only when the zero flag is set to 1.
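Whether a conditional instruction such as ADDEQ executes depends only on the NZCV flags. The C sketch below evaluates the standard ARM condition codes; the enum order follows their 4-bit encoding (EQ = 0 up to AL = 14), and the function name is hypothetical.

```c
#include <stdbool.h>

typedef enum { EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE, AL } Cond;

/* Returns true if an instruction with condition c would execute. */
static bool cond_passed(Cond c, bool n, bool z, bool cf, bool v)
{
    switch (c) {
    case EQ: return z;                /* equal */
    case NE: return !z;               /* not equal */
    case CS: return cf;               /* unsigned >= */
    case CC: return !cf;              /* unsigned < */
    case MI: return n;                /* negative */
    case PL: return !n;               /* positive or zero */
    case VS: return v;                /* overflow */
    case VC: return !v;               /* no overflow */
    case HI: return cf && !z;         /* unsigned > */
    case LS: return !cf || z;         /* unsigned <= */
    case GE: return n == v;           /* signed >= */
    case LT: return n != v;           /* signed < */
    case GT: return !z && n == v;     /* signed > */
    case LE: return z || n != v;      /* signed <= */
    default: return true;             /* AL: always */
    }
}
```

ADDEQ r0, r1, r2 then behaves like `if (cond_passed(EQ, n, z, c, v)) r0 = r1 + r2;`.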

Advantages
- Reduces the number of branches
- Reduces the number of pipeline flushes
- Increases code density
- Improves the performance of the code
Rule of thumb: whenever the conditional sequence is three instructions or fewer, conditional execution is smaller and faster than a branch, so use it instead.

Branch & Link Instruction
BL performs a branch and saves the address following the branch in the link register, r14. Example: BL subroutine.
For nested subroutines, push r14 and any work registers that must be preserved onto a stack in memory. Example:
          BL    sub1
          ...
    sub1  STMFD r13!, {r0-r2, r14}
          BL    sub2

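Subroutine Return Instructions
There are no dedicated return instructions. Example (1), returning through the link register:
    sub   ...
          MOV   pc, r14
Example (2), when the return address has been pushed onto the stack (the register list must match the one pushed on entry):
    sub2  ...
          LDMFD r13!, {r0-r12, pc}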
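The STMFD push used for nested subroutines and the matching LDMFD pop can be modelled in C to show how a full-descending stack works. This sketch uses a hypothetical 16-word stack array and keeps SP as a word index rather than a byte address; the last word popped plays the role of the saved r14 being loaded into pc.

```c
#include <stdint.h>

/* Hypothetical stack region; sp starts one past the top (empty stack). */
#define STACK_WORDS 16
static uint32_t stack_mem[STACK_WORDS];

/* STMFD sp!, {regs}: decrement first, lowest register at lowest address. */
static void stmfd(unsigned *sp, const uint32_t *regs, unsigned n)
{
    *sp -= n;
    for (unsigned i = 0; i < n; i++) stack_mem[*sp + i] = regs[i];
}

/* LDMFD sp!, {regs}: load upward from sp, then increment. */
static void ldmfd(unsigned *sp, uint32_t *regs, unsigned n)
{
    for (unsigned i = 0; i < n; i++) regs[i] = stack_mem[*sp + i];
    *sp += n;
}
```

Pushing {r0-r2, r14} and popping the same number of words into {r0-r2, pc} returns SP to its starting value and puts the saved return address in the last slot.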

Thumb
Thumb encodes a subset of the 32-bit instruction set into a 16-bit instruction space. Thumb has higher performance than ARM on a processor with a 16-bit data bus, and it has higher code density, which suits memory-constrained embedded systems. On average, a Thumb implementation takes 30% less memory than the equivalent ARM implementation (source: ARM System Developer’s Guide).

Code density

Thumb Instruction Decoding
Each Thumb instruction is related to a 32-bit ARM instruction.

ARMv5E Extensions
These extensions facilitate signal processing operations. They add signed multiply-accumulate instructions and give greater flexibility and efficiency when manipulating 16-bit values, for applications such as 16-bit digital audio processing.
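As an illustration of a 16-bit signed multiply-accumulate, the C function below models SMLABB, one of the ARMv5E instructions: it multiplies the signed bottom halves of two registers and adds a 32-bit accumulator. The function name and the `q` parameter (standing in for the sticky Q flag, set when the signed addition overflows) are part of this sketch.

```c
#include <stdint.h>
#include <stdbool.h>

/* SMLABB Rd, Rm, Rs, Rn: Rd = Rm[15:0] * Rs[15:0] + Rn (signed). */
static uint32_t smlabb(uint32_t rm, uint32_t rs, uint32_t rn, bool *q)
{
    int32_t a = (int16_t)(rm & 0xFFFFu);          /* signed bottom half of Rm */
    int32_t b = (int16_t)(rs & 0xFFFFu);          /* signed bottom half of Rs */
    int64_t sum = (int64_t)(a * b) + (int64_t)(int32_t)rn;  /* 16x16 product fits in 32 bits */
    if (sum > INT32_MAX || sum < INT32_MIN) *q = true;      /* sticky: never cleared here */
    return (uint32_t)sum;                         /* result wraps to 32 bits */
}
```

The top halves of the operands are ignored, so packed 16-bit samples can be processed without first unpacking them.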

Summary
- We have studied the instruction set of ARM processors.
- We discussed the use of the barrel shifter.
- We studied various addressing modes.
- We have examined the Thumb mode of operation.