Intel Xscale® Assembly Language and C. The Intel Xscale® Programmer’s Model (1) (We will not be using the Thumb instruction set.) Memory Formats –We will.

Presentation transcript:

Intel Xscale® Assembly Language and C

The Intel Xscale® Programmer’s Model (1)
(We will not be using the Thumb instruction set.)
Memory Formats
–We will be using the Little Endian format: the lowest numbered byte of a word is considered the word’s least significant byte, and the highest numbered byte is considered the most significant byte.
Instruction Length
–All instructions are 32 bits long (ARM instructions).
Data Types
–8-bit bytes and 32-bit words.
Processor Modes (of interest)
–User: the “normal” program execution mode.
–IRQ: used for general-purpose interrupt handling.
–Supervisor: a protected mode for the operating system.
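As a small illustration of the Little Endian rule above, the following C sketch lays a 32-bit word out byte by byte. The helper names store_le32 and load_le32 are hypothetical and not part of the lecture material; the code writes the byte order explicitly, so it behaves the same on any host.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: store a 32-bit word at dst[0..3] in little-endian order. */
static void store_le32(uint8_t dst[4], uint32_t word)
{
    dst[0] = (uint8_t)(word & 0xFFu);          /* lowest address: least significant byte */
    dst[1] = (uint8_t)((word >> 8) & 0xFFu);
    dst[2] = (uint8_t)((word >> 16) & 0xFFu);
    dst[3] = (uint8_t)((word >> 24) & 0xFFu);  /* highest address: most significant byte */
}

/* Hypothetical helper: rebuild the word from its little-endian bytes. */
static uint32_t load_le32(const uint8_t src[4])
{
    return (uint32_t)src[0]
         | ((uint32_t)src[1] << 8)
         | ((uint32_t)src[2] << 16)
         | ((uint32_t)src[3] << 24);
}
```

Storing 0x11223344 this way puts 0x44 at the lowest address and 0x11 at the highest, which is the layout the slide describes.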

The Intel Xscale® Programmer’s Model (2)
The Intel Xscale® Register Set
–Registers R0-R15 + CPSR (Current Program Status Register)
–R13: Stack Pointer
–R14: Link Register
–R15: Program Counter, where bits 1:0 are ignored
Program Status Registers
–CPSR (Current Program Status Register)
    holds info about the most recently performed ALU operation
    –contains N (negative), Z (zero), C (Carry) and V (oVerflow) bits
    controls the enabling and disabling of interrupts
    sets the processor operating mode
–SPSR (Saved Program Status Registers)
    used by exception handlers
Exceptions
–reset, undefined instruction, SWI, IRQ.
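To make the N, Z, C and V bits concrete, here is a hedged C sketch of the flags a flag-setting 32-bit addition would produce. The struct and function names are invented for illustration, and the model covers addition only, not the full CPSR (interrupt and mode bits are left out).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the four condition flags. */
struct flags { int n, z, c, v; };

/* Compute a + b and the flags a flag-setting add would produce. */
static struct flags adds_flags(uint32_t a, uint32_t b, uint32_t *result)
{
    uint32_t r = a + b;                      /* wraps modulo 2^32 */
    struct flags f;
    f.n = (int)((r >> 31) & 1u);             /* N: bit 31 of the result */
    f.z = (r == 0);                          /* Z: result is zero */
    f.c = (r < a);                           /* C: unsigned carry out of bit 31 */
    /* V: operands share a sign but the result's sign differs */
    f.v = (int)(((~(a ^ b) & (a ^ r)) >> 31) & 1u);
    *result = r;
    return f;
}
```

For example, 0x7FFFFFFF + 1 sets N and V (signed overflow) but not C, while 0xFFFFFFFF + 1 sets Z and C but not V.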

Intro to Intel Xscale® Assembly Language
“Load/store” architecture
32-bit instructions
32-bit and 8-bit data types
32-bit addresses
37 registers (30 general-purpose registers, 6 status registers and a PC)
–only a subset is accessible at any point in time
Load and store multiple instructions
No instruction to move an arbitrary 32-bit constant to a register (why? every instruction is itself only 32 bits long, so a full 32-bit immediate cannot fit next to the opcode)
Conditional execution
Barrel shifter
–scaled addressing, multiplication by a small constant, and ‘constant’ generation
Co-processor instructions (we will not use these)
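The "why?" about 32-bit constants can be explored with a short C sketch. It assumes the classic ARM data-processing immediate encoding (an 8-bit value rotated right by an even amount from 0 to 30), which the slides do not spell out; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (n taken modulo 32). */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x << n) | (x >> (32u - n)) : x;
}

/* Hypothetical check: can `value` be written as an 8-bit constant
 * rotated right by an even amount? Assumes the classic ARM immediate form. */
static int is_arm_immediate(uint32_t value)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Rotating left by `rot` undoes a rotate-right by `rot`. */
        if (rotl32(value, rot) <= 0xFFu)
            return 1;
    }
    return 0;
}
```

Under that assumption, values such as 0xFF, 0x3FC and 0xFF000000 fit in a single instruction, while 0x101 or 0x12345678 do not and are typically loaded from a literal pool with LDR, as the later examples do with .L2.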

Intel Xscale® Assembly Language Basics
Conditional Execution
The Intel Xscale® Barrel Shifter
Loading Constants into Registers
Loading Addresses into Registers
Jump Tables
Using the Load and Store Multiple Instructions
Check out Chapters 1 through 5 of the ARM Architecture Reference Manual

Generating Assembly Language Code from C
Use the command-line option -S.
–When you compile a .c file, you get a .s file
–This .s file contains the assembly language code generated by the compiler
When assembled, this code can potentially be linked and loaded as an executable

Register Names and Use
Register #   APCS Name   APCS Role
R0           a1          argument 1
R1           a2          argument 2
R2           a3          argument 3
R3           a4          argument 4
R4..R8       v1..v5      register variables
R9           sb/v6       static base/register variable
R10          sl/v7       stack limit/register variable
R11          fp          frame pointer
R12          ip          scratch reg/new-sb in inter-link-unit calls
R13          sp          low end of current stack frame
R14          lr          link address/scratch register
R15          pc          program counter

“Frame Pointer”
foo:
    MOV   ip, sp
    STMDB sp!, {a1-a3, fp, ip, lr, pc}
    LDMDB fp, {fp, sp, pc}
[Stack diagram: addresses 0x90 down to 0x70, showing the saved pc, lr, ip and fp followed by a3, a2 and a1, with the positions of ip, fp and SP marked.]
frame pointer (fp) points to the top of stack for function

The Frame Pointer
fp points to top of the stack area for the current function
–Or zero if not being used
By using the frame pointer and storing it at the same offset for every function call, it creates a singly-linked list of activation records
Creating the stack “backtrace” structure
    MOV   ip, sp
    STMFD sp!, {a1-a4, v1-v5, sb, sl, fp, ip, sp, lr, pc}
    SUB   fp, ip, #4
[Stack diagram: addresses 0x90 down to 0x50. “SP before” marks the top; below it sit pc, lr, ip, fp, v7 to v1 and a4 to a1; “FP after” and “SP after” mark the frame once the prologue has run.]

The Frame Pointer (cont’d)
The singly-linked list of activation records is chained through the saved fp values:
–The fp register points to the stack backtrace structure for the currently executing function.
–The saved fp value is (zero or) a pointer to a stack backtrace structure created by the function which called the current function.
–The saved fp value in this structure is a pointer to the stack backtrace structure for the function that called the function that called the current function; and so on back until the first function.
[Stack diagram: addresses 0x90 down to 0x50. “SP before” marks the top; the structure holds (saved) pc, (saved) lr, (saved) sb, (saved) ip, (saved) fp, then v7 to v1 and a4 to a1; “FP current” and “SP current” mark the current frame.]
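The linked list of backtrace structures can be sketched in C on a simulated stack. This is only a model: memory is an array of words, and the frame layout assumes the four-register prologue (MOV ip, sp; STMFD sp!, {fp, ip, lr, pc}; SUB fp, ip, #4) used by the compiler-generated examples later in this transcript, so the saved fp sits 12 bytes below the address fp points at. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated registers; word index i of mem[] stands for byte address 4*i. */
struct regs { uint32_t sp, fp; };

/* Model of the prologue: MOV ip, sp / STMFD sp!, {fp, ip, lr, pc} / SUB fp, ip, #4 */
static void enter_function(uint32_t *mem, struct regs *r, uint32_t lr, uint32_t pc)
{
    uint32_t ip = r->sp;          /* MOV ip, sp */
    r->sp -= 16;                  /* four words pushed, lowest register lowest */
    mem[r->sp / 4 + 0] = r->fp;   /* saved fp */
    mem[r->sp / 4 + 1] = ip;      /* saved sp (held in ip) */
    mem[r->sp / 4 + 2] = lr;      /* saved lr */
    mem[r->sp / 4 + 3] = pc;      /* saved pc */
    r->fp = ip - 4;               /* SUB fp, ip, #4: fp points at the saved pc */
}

/* Follow saved fp values until a zero fp ends the chain. */
static size_t backtrace_depth(const uint32_t *mem, size_t nwords, uint32_t fp)
{
    size_t depth = 0;
    while (fp != 0 && depth < nwords) {   /* bound guards against a corrupt chain */
        size_t idx = fp / 4;
        if (idx < 3 || idx >= nwords)
            break;
        depth++;
        fp = mem[idx - 3];                /* saved fp is 12 bytes below the saved pc */
    }
    return depth;
}
```

Calling enter_function three times (for main, foo and bar, say) and then walking from the current fp visits three structures before reaching the zero that terminates the list.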

Example Backtrace
[Figure: three backtrace structures stacked in memory, labelled main’s frame, foo’s frame and bar’s frame. Each holds (saved) pc, (saved) lr, (saved) sb, (saved) ip, (saved) fp, then v7 to v1 and a4 to a1; fp points to bar’s frame.]
If main calls foo which calls bar

Creating the “backtrace” structure
    MOV   ip, sp
    STMFD sp!, {a1-a4, v1-v5, sb, fp, ip, lr, pc}
    SUB   fp, ip, #4
    …
    LDMDB fp, {sb, fp, sp, pc}    ; decrement-before: reads the words just below the saved pc
[Stack diagram: addresses 0x90 down to 0x50. “SP before” marks the top; the structure holds (saved) pc, (saved) lr, (saved) sb, (saved) ip, (saved) fp, then v7 to v1 and a4 to a1; “FP after” and “SP current” mark the frame.]

How Does STM Place Things into Memory?
    STMDB sp!, {r0-r15}
The instruction encodes the registers to be saved as a bit-vector, one bit per register
The architecture places the lowest numbered register into the lowest address
A plain STM defaults to STMIA (increment after); pushing onto a full descending stack uses STMDB (equivalently STMFD)
[Stack diagram: addresses 0x90 down to 0x50. “SP before” marks the top; from high to low the stored registers are pc, lr, sp, ip, fp, v7 to v1 and a4 to a1; “SP after” marks the bottom.]
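The placement rule above (lowest numbered register at the lowest address) can be modelled with a short C sketch of a decrement-before store multiple with writeback. It is a simplification with hypothetical names: memory is an array of words, and the value a real core stores for pc is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Model of STMDB sp!, {reglist}: returns the written-back sp.
 * Bit i of reglist selects register i; mem[] is indexed by byte address / 4. */
static uint32_t stmdb_writeback(uint32_t *mem, uint32_t sp,
                                uint16_t reglist, const uint32_t regs[16])
{
    unsigned count = 0;
    for (unsigned i = 0; i < 16; i++)
        count += (reglist >> i) & 1u;

    assert(sp % 4 == 0 && sp >= 4u * count);
    uint32_t addr = sp - 4u * count;      /* lowest address that will be written */
    uint32_t new_sp = addr;

    for (unsigned i = 0; i < 16; i++) {   /* ascending register number ... */
        if ((reglist >> i) & 1u) {
            mem[addr / 4] = regs[i];      /* ... goes to ascending addresses */
            addr += 4;
        }
    }
    return new_sp;
}
```

With sp at 0x90 and the list {fp, ip, lr, pc}, the model stores fp at 0x80, ip at 0x84, lr at 0x88 and pc at 0x8c, and returns 0x80 as the new sp.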

Example 1: A Simple Program
C source:
    int a, b;
    int main()
    {
        a = 3;
        b = 4;
    } /* end main() */
Generated assembly:
        .text                       /* section declaration */
        .align 2
        .global main                /* export entry point */
        .type main, %function
    main:
        mov   ip, sp
        stmfd sp!, {fp, ip, lr, pc}
        sub   fp, ip, #4
        ldr   r2, .L2
        mov   r3, #3
        str   r3, [r2, #0]          /* a=3 */
        ldr   r2, .L2+4
        mov   r3, #4
        str   r3, [r2, #0]          /* b=4 */
        mov   r0, r3
        ldmfd sp, {fp, sp, pc}      /* return */
    .L3:
        .align 2
    .L2:
        .word a
        .word b
STMFD - store multiple, full descending
    sp <-- sp - 4; mem[sp] = pc    ; program counter
    sp <-- sp - 4; mem[sp] = lr    ; link register
    sp <-- sp - 4; mem[sp] = ip    ; new stack base
    sp <-- sp - 4; mem[sp] = fp    ; frame pointer
LDMFD - load multiple, full descending (no writeback here, so every address is relative to the sp value before the instruction)
    fp = mem[old sp]        (fp)   ; frame pointer
    sp = mem[old sp + 4]    (ip)   ; stack pointer
    pc = mem[old sp + 8]    (lr)   ; program counter

Example 2: Calling A Function
C source:
    int tmp, a, b;
    void swap(int a, int b);
    int main()
    {
        a = 3;
        b = 4;
        swap(a, b);
    } /* end main() */
    void swap(int a, int b)
    {
        tmp = a;
        a = b;
        b = tmp;
    } /* end swap() */
Generated assembly for main:
        .global main                /* export entry point */
        .type main, %function
    main:
        mov   ip, sp
        stmfd sp!, {fp, ip, lr, pc}
        sub   fp, ip, #4
        ldr   r2, .L2
        mov   r3, #3
        str   r3, [r2, #0]          /* a=3 */
        ldr   r2, .L2+4
        mov   r3, #4
        str   r3, [r2, #0]          /* b=4 */
        ldr   r3, .L2
        ldr   r2, .L2+4
        ldr   r0, [r3, #0]          /* a */
        ldr   r1, [r2, #0]          /* b */
        bl    swap                  /* function call */
        mov   r0, r3
        ldmfd sp, {fp, sp, pc}      /* return */
    .L3:
        .align 2
    .L2:
        .word a
        .word b

Example 2: Calling A Function (Cont’d)
C source:
    void swap(int a, int b)
    {
        tmp = a;
        a = b;
        b = tmp;
    } /* end swap() */
Generated assembly for swap:
        .global swap
        .type swap, %function
    swap:
        mov   ip, sp
        stmfd sp!, {fp, ip, lr, pc}
        sub   fp, ip, #4
        sub   sp, sp, #8
        str   r0, [fp, #-16]        /* a */
        str   r1, [fp, #-20]        /* b */
        ldr   r2, .L5               /* r2 = &tmp */
        ldr   r3, [fp, #-16]        /* r3 = a */
        str   r3, [r2, #0]          /* tmp = a */
        ldr   r2, [fp, #-20]        /* r2 = b (value not used afterwards) */
        str   r3, [fp, #-16]        /* a (r3 still holds a, so this store is redundant) */
        ldr   r3, .L5               /* r3 = &tmp */
        ldr   r3, [r3, #0]          /* tmp (overwritten by the next load) */
        ldr   r3, [fp, #-20]        /* r3 = b */
        str   r3, [fp, #-16]        /* a = b */
        ldr   r3, .L5
        ldr   r3, [r3, #0]          /* tmp */
        str   r3, [fp, #-20]        /* b = tmp */
        sub   sp, fp, #12
        ldmfd sp, {fp, sp, pc}      /* return */
    .L6:
        .align 2
    .L5:
        .word tmp

Example 3: Manipulating Pointers
C source:
    int tmp;
    int a, b;
    void swap(int *a, int *b);
    int main()
    {
        a = 3;
        b = 4;
        swap(&a, &b);
    } /* end main() */
    void swap(int *a, int *b)
    {
        tmp = *a;
        *a = *b;
        *b = tmp;
    } /* end swap() */
Generated assembly for main:
        .global main                /* export entry point */
        .type main, %function
    main:
        mov   ip, sp
        stmfd sp!, {fp, ip, lr, pc}
        sub   fp, ip, #4
        ldr   r2, .L2
        mov   r3, #3
        str   r3, [r2, #0]          /* a=3 */
        ldr   r2, .L2+4
        mov   r3, #4
        str   r3, [r2, #0]          /* b=4 */
        ldr   r0, .L2               /* r0 = &a (first argument) */
        ldr   r1, .L2+4             /* r1 = &b (second argument) */
        bl    swap                  /* function call */
        mov   r0, r3
        ldmfd sp, {fp, sp, pc}      /* return */
    .L3:
        .align 2
    .L2:
        .word a
        .word b

Example 3 (cont’d)
C source:
    void swap(int *a, int *b)
    {
        tmp = *a;
        *a = *b;
        *b = tmp;
    } /* end swap() */
Generated assembly for swap:
        .global swap
        .type swap, %function
    swap:
        mov   ip, sp
        stmfd sp!, {fp, ip, lr, pc}
        sub   fp, ip, #4
        sub   sp, sp, #8
        str   r0, [fp, #-16]        /* &a */
        str   r1, [fp, #-20]        /* &b */
        ldr   r2, .L5               /* r2 = &tmp */
        ldr   r3, [fp, #-16]        /* r3 = &a */
        ldr   r3, [r3, #0]          /* r3 = a */
        str   r3, [r2, #0]          /* tmp = a */
        ldr   r2, [fp, #-16]        /* r2 = &a */
        ldr   r3, [fp, #-20]        /* r3 = &b */
        ldr   r3, [r3, #0]          /* r3 = b */
        str   r3, [r2, #0]          /* a = b */
        ldr   r2, [fp, #-20]        /* r2 = &b */
        ldr   r3, .L5
        ldr   r3, [r3, #0]          /* r3 = tmp */
        str   r3, [r2, #0]          /* b = tmp */
        sub   sp, fp, #12
        ldmfd sp, {fp, sp, pc}      /* return */
    .L6:
        .align 2
    .L5:
        .word tmp
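The difference between Examples 2 and 3 is visible from C alone: the by-value swap only changes its own stack copies of a and b, while the pointer version writes through to the caller's variables. The sketch below renames the two functions (swap_by_value and swap_by_pointer are hypothetical names) so both can live in one file.

```c
#include <assert.h>

static int tmp;

/* Example 2 style: a and b are private copies, as in the [fp, #-16] and
 * [fp, #-20] slots of the generated code, so the caller sees no change. */
static void swap_by_value(int a, int b)
{
    tmp = a;
    a = b;
    b = tmp;
    (void)a;   /* silence "set but not used" warnings */
    (void)b;
}

/* Example 3 style: the stores go through the pointers to the caller's data. */
static void swap_by_pointer(int *a, int *b)
{
    tmp = *a;
    *a = *b;
    *b = tmp;
}
```

Starting from x = 3 and y = 4, swap_by_value(x, y) leaves both unchanged, whereas swap_by_pointer(&x, &y) leaves x = 4 and y = 3.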

Example 4: Dealing with Lots of Arguments
C source:
    int tmp;
    void test(int a, int b, int c, int d, int *e);
    int main()
    {
        int a, b, c, d, e;
        a = 3;
        b = 4;
        c = 5;
        d = 6;
        e = 7;
        test(a, b, c, d, &e);
    } /* end main() */
    void test(int a, int b, int c, int d, int *e)
    {
        tmp = a;
        a = b;
        b = tmp;
        c = b;
        b = d;
        *e = d;
    } /* end test() */
Generated assembly for main:
    main:
        mov   ip, sp
        stmfd sp!, {fp, ip, lr, pc}
        sub   fp, ip, #4
        sub   sp, sp, #24
        mov   r3, #3
        str   r3, [fp, #-16]
        mov   r3, #4
        str   r3, [fp, #-20]
        mov   r3, #5
        str   r3, [fp, #-24]
        mov   r3, #6
        str   r3, [fp, #-28]
        mov   r3, #7
        str   r3, [fp, #-32]
        sub   r3, fp, #32
        str   r3, [sp, #0]          /* &e */
        ldr   r0, [fp, #-16]        /* a */
        ldr   r1, [fp, #-20]        /* b */
        ldr   r2, [fp, #-24]        /* c */
        ldr   r3, [fp, #-28]        /* d */
        bl    test
        mov   r0, r3
        sub   sp, fp, #12
        ldmfd sp, {fp, sp, pc}

Example 4 (cont’d)
Generated assembly for test:
    test:
        mov   ip, sp
        stmfd sp!, {fp, ip, lr, pc}
        sub   fp, ip, #4
        sub   sp, sp, #16
        str   r0, [fp, #-16]
        str   r1, [fp, #-20]
        str   r2, [fp, #-24]
        str   r3, [fp, #-28]
        ldr   r2, .L3               /* tmp */
        ldr   r3, [fp, #-16]
        str   r3, [r2, #0]          /* tmp = a */
        ldr   r3, [fp, #-20]
        str   r3, [fp, #-16]        /* a = b */
        ldr   r3, .L3
        ldr   r3, [r3, #0]
        str   r3, [fp, #-20]        /* b = tmp */
        ldr   r3, [fp, #-20]
        str   r3, [fp, #-24]        /* c = b */
        ldr   r3, [fp, #-28]
        str   r3, [fp, #-20]        /* b = d */
        ldr   r2, [fp, #4]
        ldr   r3, [fp, #-28]
        str   r3, [r2, #0]          /* *e = d */
        sub   sp, fp, #12
        ldmfd sp, {fp, sp, pc}
[Stack diagram: test’s frame holds the saved pc, lr, ip and fp with the local copies of a, b, c and d below them; the fifth argument e sits just above the saved pc, in the caller’s frame; fp, ip and sp are marked.]

Mixing C and Assembly Language
[Figure: toolchain flow with boxes for C Source Code, Compiler, XScale Assembly Code, Assembler, C Library, Linker and XScale Executable.]

Interfacing C and Assembly Language
ARM has developed a standard called the “ARM Procedure Call Standard” (APCS) which defines:
–constraints on the use of registers
–stack conventions
–format of a stack backtrace data structure
–argument passing and result return
–support for ARM shared library mechanism
Compiler-generated code conforms to the APCS
–It's just a standard, not an architectural requirement
–Cannot avoid the standard when interfacing C and assembly code
–Can avoid the standard when just writing assembly code or when writing assembly code that isn't called by C code

Multiply
Multiply instruction can take multiple cycles
–Can convert Y * Constant into series of adds and shifts
–Y * 9 = Y * 8 + Y * 1
–Assume R1 holds Y and R2 will hold the result
    ADD R2, R1, R1, LSL #3   ; multiplication by 9: (Y * 8) + (Y * 1)
    RSB R2, R1, R1, LSL #3   ; multiplication by 7: (Y * 8) - (Y * 1)
    (RSB: reverse subtract, the operands of the subtraction are reversed)
Another example: Y * 105
–105 = 128 - 23 = 128 - (16 + 7) = 128 - (16 + (8 - 1))
    RSB r2, r1, r1, LSL #3   ; r2 <-- Y*7 = Y*8 - Y*1 (assume r1 holds Y)
    ADD r2, r2, r1, LSL #4   ; r2 <-- r2 + Y * 16 (r2 held Y*7; now holds Y*23)
    RSB r2, r2, r1, LSL #7   ; r2 <-- (Y * 128) - r2 (r2 now holds Y*105)
Or Y * 105 = Y * (15 * 7) = Y * (16 - 1) * (8 - 1)
    RSB r2, r1, r1, LSL #4   ; r2 <-- (r1 * 16) - r1
    RSB r3, r2, r2, LSL #3   ; r3 <-- (r2 * 8) - r2
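The shift-and-add decompositions above can be checked in C, with each statement standing in for one ADD or RSB that uses the barrel shifter. The function names are hypothetical; unsigned 32-bit arithmetic is used so that overflow wraps the same way a 32-bit register would.

```c
#include <assert.h>
#include <stdint.h>

/* Y * 9 = (Y << 3) + Y, one ADD with a shifted operand */
static uint32_t mul9(uint32_t y)  { return y + (y << 3); }

/* Y * 7 = (Y << 3) - Y, one RSB with a shifted operand */
static uint32_t mul7(uint32_t y)  { return (y << 3) - y; }

/* Y * 105 as 128 - (16 + (8 - 1)): three instructions */
static uint32_t mul105_a(uint32_t y)
{
    uint32_t r2 = (y << 3) - y;   /* r2 = Y * 7   */
    r2 = r2 + (y << 4);           /* r2 = Y * 23  */
    r2 = (y << 7) - r2;           /* r2 = Y * 105 */
    return r2;
}

/* Y * 105 as (16 - 1) * (8 - 1): two instructions */
static uint32_t mul105_b(uint32_t y)
{
    uint32_t r2 = (y << 4) - y;   /* r2 = Y * 15  */
    return (r2 << 3) - r2;        /* r3 = r2 * 7  */
}
```

Both Y * 105 variants agree with an ordinary multiplication for every 32-bit input, including values where the product wraps around.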