Efficient C Code
Slides created by: Professor Ian G. Harris

Presentation transcript:

Efficient C Code
- Your C program is not exactly what is executed.
- Machine code is specific to each microcontroller.
- Complete understanding of code execution requires:
  1. Understanding the compiler
  2. Understanding the computer architecture
- Flow: C code -> compiler -> machine code -> microcontroller

ARM Instruction Set
- An instruction set is the set of all machine instructions supported by the architecture.
- ARM is a load-store architecture:
  - Data processing occurs in registers.
  - Load and store instructions move data between memory and registers.
  - Square brackets [] indicate a memory address.
- Ex.
    LDR r0, [r1]    ; moves data into r0 from memory at the address in r1
    STR r0, [r1]    ; moves data from r0 into memory at the address in r1

Data Processing Instructions
- Move instructions
    MOV r0, r1            ; moves the contents of r1 into r0
    MOV r0, #3            ; moves the number 3 into r0
- Shift instructions: inputs to operations can be shifted
    MOV r0, r1, LSL #2    ; moves (r1 << 2) into r0
    MOV r0, r1, ASR #2    ; moves (r1 >> 2) into r0, sign-extended
- Arithmetic instructions
    ADD r3, r4, r5        ; places (r4 + r5) in r3

Condition Flags
- The Current Program Status Register (CPSR) holds the status of comparison instructions and some arithmetic instructions.
- Flags: N (negative), Z (zero), C (unsigned carry), V (signed overflow), Q (saturation).
- Flags are set as a result of a comparison instruction or an arithmetic instruction with an 'S' suffix.
- Ex.
    CMP r0, r1         ; sets status bits as a result of (r0 - r1)
    ADDS r0, r1, r2    ; r0 = r1 + r2 and status bits set
    ADD r0, r1, r2     ; r0 = r1 + r2 but no status bits set

Conditional Execution
- Almost all ARM instructions can be executed conditionally based on the flags in the CPSR.
- The appropriate condition suffix needs to be added to the instruction.
- NE: not equal; EQ: equal; CC: less than (unsigned); LT: less than (signed).
- Ex.
    CMP r0, r1
    ADDNE r3, r4, r5
    BCC test
- ADDNE is executed if r0 is not equal to r1.
- BCC is executed (the branch is taken) if r0 is less than r1 as unsigned values.
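
As a rough illustration of how a comparison produces the flags that the condition suffixes test, the following C sketch models CMP as a 32-bit subtraction. The type `Flags` and the functions `cmp_flags`, `cond_eq`, `cond_ne`, `cond_cc`, and `cond_lt` are hypothetical names chosen for this example; they are not part of any ARM toolchain, and the Q flag is left out.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model of the N, Z, C, V flags; names are illustrative only. */
typedef struct {
    bool n, z, c, v;
} Flags;

/* Model "CMP a, b": compute a - b, derive the flags, discard the result. */
static Flags cmp_flags(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;                      /* wraps modulo 2^32 */
    Flags f;
    f.n = (r >> 31) != 0;                    /* bit 31 of the result */
    f.z = (r == 0);                          /* result is zero */
    f.c = (a >= b);                          /* on subtraction, C is set when there is NO borrow */
    f.v = (((a ^ b) & (a ^ r)) >> 31) != 0;  /* signed overflow of a - b */
    return f;
}

static bool cond_eq(Flags f) { return f.z; }         /* EQ: Z set */
static bool cond_ne(Flags f) { return !f.z; }        /* NE: Z clear */
static bool cond_cc(Flags f) { return !f.c; }        /* CC: C clear, unsigned lower */
static bool cond_lt(Flags f) { return f.n != f.v; }  /* LT: N != V, signed less than */
```

For example, comparing 0xFFFFFFFF with 1 satisfies LT (it is -1 as a signed value) but not CC (it is the largest unsigned value), which is why signed and unsigned comparisons need different suffixes.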

Variable Types and Casting
    int checksum_v1 (int *data) {
        char i;
        int sum = 0;
        for (i = 0; i < 64; i++) {
            sum += data[i];
        }
        return sum;
    }
- The program computes the sum of the first 64 elements of the data array.
- Variable i is declared as a char in an attempt to save space:
  - i never needs more than 8 bits.
  - It may seem to use less register space and/or stack space.
- Declaring i as a char does NOT save any space:
  - All stack entries and registers are 32 bits long.

Declaring Shorter Variables
    void test (void) {
        char i = 255;
        int j = 255;
        i++;    // i = 0 (plain char is unsigned on ARM)
        j++;    // j = 256
    }
- Shorter variables may save space for data in the heap, but not on the stack.
- The compiler needs to mimic the behavior of a short variable with a long variable.
- If i is a char, its value wraps around to 0 after 255.
- i is held in a 32-bit register.
- The compiler must make i's 32-bit register wrap around after 255.
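
To make the wrap-around concrete, here is a minimal C sketch comparing an increment done in a true 8-bit type with an increment done in a 32-bit value followed by a mask. It uses `uint8_t` rather than plain `char` so the result does not depend on whether `char` is signed on a given compiler; `inc_as_u8` and `inc_masked` are hypothetical helper names.

```c
#include <stdint.h>

/* Increment a counter held in an 8-bit type: 255 wraps to 0. */
static uint32_t inc_as_u8(uint32_t i)
{
    uint8_t c = (uint8_t)i;
    c++;
    return c;
}

/* The same behavior when the counter lives in a 32-bit register:
   increment, then mask the result back down to 8 bits. */
static uint32_t inc_masked(uint32_t i)
{
    return (i + 1u) & 0xffu;
}
```

Both functions return the same value for every input from 0 to 255; the mask is the extra work the compiler has to add to emulate the narrow type.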

Assembly Code for Checksum
    checksum_v1
        MOV r2, r0                  ; r2 = data
        MOV r0, #0                  ; sum = 0
        MOV r1, #0                  ; i = 0
    checksum_v1_loop
        LDR r3, [r2, r1, LSL #2]    ; r3 = data[i]
        ADD r1, r1, #1              ; r1 = i+1
        AND r1, r1, #0xff           ; i = (char)r1
        CMP r1, #0x40               ; compare i, 64
        ADD r0, r3, r0              ; sum += r3
        BCC checksum_v1_loop        ; if i<64 goto loop
        MOV pc, r14                 ; return sum
- The argument, data, is passed in r0.
- The return address is stored in r14.
- The stack is avoided to reduce delay.
- LSL #2 scales the index i by 4, the size of an int in bytes.
- The AND instruction (highlighted on the slide) is needed to mimic char.
- It is 1 of the 6 instructions in the loop: about 17% instruction overhead.
- Declaring i as an unsigned int would fix the problem.
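
The fix mentioned in the last point can be sketched in C as follows. The name `checksum_v2` and the `const` qualifier are choices made for this example, not taken from the slides; the only intended change from checksum_v1 is the type of the loop counter.

```c
/* Sketch: the same checksum with a 32-bit unsigned loop counter, so the
   compiler has no narrow type to emulate and needs no masking step. */
int checksum_v2(const int *data)
{
    unsigned int i;
    int sum = 0;

    for (i = 0; i < 64; i++) {
        sum += data[i];
    }
    return sum;
}
```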

Shorter Variable Example 2
    int checksum_v1 (short *data) {
        unsigned int i;
        short sum = 0;
        for (i = 0; i < 64; i++) {
            sum = (short) (sum + data[i]);
        }
        return sum;
    }
- data is an array of shorts, not ints.
- The type cast is needed because + only takes 32-bit arguments, so sum + data[i] is an int.
- Problems:
  1. sum is a short, not an int.
  2. Loading a halfword (16 bits) supports only limited addressing modes.

Assembly Code for Example 2
    checksum_v1
        MOV r2, r0                ; r2 = data
        MOV r0, #0                ; sum = 0
        MOV r1, #0                ; i = 0
    checksum_v1_loop
        ADD r3, r2, r1, LSL #1    ; r3 = &data[i]
        LDRH r3, [r3, #0]         ; r3 = data[i]
        ADD r1, r1, #1            ; r1 = i+1
        CMP r1, #0x40             ; compare i, 64
        ADD r0, r3, r0            ; sum += r3
        MOV r0, r0, LSL #16
        MOV r0, r0, ASR #16       ; sum = (short)r0
        BCC checksum_v1_loop      ; if i<64 goto loop
        MOV pc, r14               ; return sum
- LDRH cannot take a shifted register offset, so the separate ADD is needed to compute the address.
- sum is signed, so ASR is needed to sign-extend it.
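
The LSL #16 / ASR #16 pair narrows a 32-bit value to a signed 16-bit one. A minimal C sketch of the same effect is shown below; `sign_extend16` is a hypothetical helper, and it deliberately avoids right-shifting a negative value, which is implementation-defined in C, by using an XOR-and-subtract formulation instead.

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of x to 32 bits, i.e. the value that is left
   in the register after shifting left by 16 and arithmetic-shifting back. */
static int32_t sign_extend16(uint32_t x)
{
    uint32_t low = x & 0xffffu;   /* keep bits 15..0 */
    /* Flipping bit 15 and subtracting 0x8000 maps 0x8000..0xffff
       to -32768..-1 and leaves 0x0000..0x7fff unchanged. */
    return (int32_t)(low ^ 0x8000u) - 0x8000;
}
```

In the loop above this narrowing runs on every iteration, which is the cost of declaring sum as a short.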

Shorter Variable Example 3
    int checksum_v1 (short *data) {
        unsigned int i;
        int sum = 0;
        for (i = 0; i < 64; i++) {
            sum += *(data++);
        }
        return (short) sum;
    }
- sum is an int.
- data is incremented; i is not used as an array index.
- Incrementing data can be part of the load instruction.

Assembly Code for Example 3
    checksum_v1
        MOV r2, #0               ; sum = 0
        MOV r1, #0               ; i = 0
    checksum_v1_loop
        LDRSH r3, [r0], #2       ; r3 = *(data++)
        ADD r1, r1, #1           ; r1 = i+1
        CMP r1, #0x40            ; compare i, 64
        ADD r2, r3, r2           ; sum += r3
        BCC checksum_v1_loop     ; if i<64 goto loop
        MOV r0, r2, LSL #16
        MOV r0, r0, ASR #16      ; r0 = (short)sum
        MOV pc, r14              ; return sum
- The pointer data is incremented as part of the LDRSH instruction (post-indexed addressing).
- The cast to short occurs once, outside of the loop.

Loops, Fixed Iterations
    checksum_v1
        MOV r2, #0               ; sum = 0
        MOV r1, #0               ; i = 0
    checksum_v1_loop
        LDRSH r3, [r0], #2       ; r3 = *(data++)
        ADD r1, r1, #1           ; r1 = i+1
        CMP r1, #0x40            ; compare i, 64
        ADD r2, r3, r2           ; sum += r3
        BCC checksum_v1_loop     ; if i<64 goto loop
        MOV pc, r14              ; return sum
- A lot of time is spent in loops.
- Loops are a common target for optimization.
- Three instructions implement the loop control: add, compare, branch.
- Replace them with two: a subtract that also does the comparison, and a branch.
- The result of the subtract can be used to set the condition flags.

Condensing a Loop
- The current loop counts up from 0 to 64.
- i is compared to 64 to check for loop termination.
- The optimized loop can count down from 64 to 0.
- i does not need to be explicitly compared to 0.
  - Add the 'S' suffix to the subtract so it sets the condition flags.
- Ex.
    SUBS r1, r1, #1
    BNE loop
- BNE checks the Zero flag in the CPSR.
- No compare instruction is needed.

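Loops, Counting Down
    checksum
        MOV r2, r0             ; r2 = data
        MOV r0, #0             ; sum = 0
        MOV r1, #0x40          ; i = 64
    checksum_loop
        LDR r3, [r2], #4       ; r3 = *(data++)
        SUBS r1, r1, #1        ; i-- and set flags
        ADD r0, r3, r0         ; sum += r3
        BNE checksum_loop      ; if i!=0 goto loop
        MOV pc, r14            ; return sum
- One comparison instruction is removed from inside the loop.
- This is possible because SUBS sets the Zero flag when the result is 0, so the comparison with 0 comes for free.
- The branch must be BNE, which tests the Zero flag. BCC would not work here: SUBS leaves the carry flag set whenever there is no borrow, so BCC would never be taken.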
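
At the C level, the count-down form can be sketched as below. The function name `checksum_down` is hypothetical; the point is only that the loop test becomes `i != 0`, which a compiler can fold into a flag-setting decrement instead of emitting a separate compare.

```c
/* Sketch: sum 64 ints with a counter that runs from 64 down to 0. */
int checksum_down(const int *data)
{
    unsigned int i;
    int sum = 0;

    for (i = 64; i != 0; i--) {
        sum += *(data++);
    }
    return sum;
}
```

Whether the generated code actually uses a single flag-setting subtract depends on the compiler and its optimization settings.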

Loop Unrolling
- Loop overhead is the performance cost of implementing the loop.
  - Ex. SUBS, BNE
- For ARM, the overhead is 4 clock cycles per iteration.
  - SUBS = 1 clk, taken BNE = 3 clks
- Overhead can be avoided by unrolling the loop.
  - Repeating the loop body many times
- For fixed-iteration loops, unrolling can reduce the overhead to 0.
- For variable-iteration loops, the overhead is greatly reduced.
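
For the variable-iteration case, one common approach is to unroll by a fixed factor and handle the leftover elements in a short second loop. The sketch below assumes an unroll factor of 4; the function name `checksum_unrolled_n` and its parameters are hypothetical.

```c
#include <stddef.h>

/* Sketch: sum n ints with the loop body unrolled four times. */
int checksum_unrolled_n(const int *data, size_t n)
{
    int sum = 0;
    size_t blocks = n / 4;   /* number of full groups of four */
    size_t rest = n % 4;     /* elements left over after the groups */

    /* One decrement and one branch now cover four additions. */
    for (; blocks != 0; blocks--) {
        sum += data[0];
        sum += data[1];
        sum += data[2];
        sum += data[3];
        data += 4;
    }

    /* Remainder loop: at most three iterations. */
    for (; rest != 0; rest--) {
        sum += *(data++);
    }
    return sum;
}
```

The loop overhead is paid roughly once per four elements instead of once per element, at the cost of a larger function.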

Unrolling, Fixed Iterations
    checksum
        MOV r2, r0             ; r2 = data
        MOV r0, #0             ; sum = 0
        MOV r1, #0x20          ; i = 32
    checksum_loop
        SUBS r1, r1, #1        ; i-- and set flags
        LDR r3, [r2], #4       ; r3 = *(data++)
        ADD r0, r3, r0         ; sum += r3
        LDR r3, [r2], #4       ; r3 = *(data++)
        ADD r0, r3, r0         ; sum += r3
        BNE checksum_loop      ; if i!=0 goto loop
        MOV pc, r14            ; return sum
- Only 32 iterations are needed because the loop body is duplicated, so the counter starts at 0x20, not 0x40.
- Loop overhead is cut in half.
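
The same two-way unrolling can be written directly in C. This is a sketch under the slide's assumption of exactly 64 elements; `checksum_unrolled2` is a hypothetical name.

```c
/* Sketch: sum 64 ints in 32 iterations, two elements per iteration. */
int checksum_unrolled2(const int *data)
{
    unsigned int i;
    int sum = 0;

    for (i = 32; i != 0; i--) {
        sum += *(data++);
        sum += *(data++);
    }
    return sum;
}
```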

Unrolling Side Effects
- Advantages:
  - Reduces loop overhead, improves performance
- Disadvantages:
  - Increases code size
  - Displaces lines from the instruction cache
  - Degraded cache performance may offset the gains

Register Allocation
- The compiler must choose registers to hold all data used: i, data[i], sum, etc.
- If the number of variables exceeds the number of registers, the stack must be used, which is very slow.
- Try to keep the number of local variables small.
  - Approximately 12 registers are available in ARM.
  - There are 16 registers in total, but some are reserved (SP, PC, etc.).

Function Calls, Arguments
- ARM passes the first 4 arguments through r0, r1, r2, and r3.
- The stack is used for arguments only if there are 5 or more.
- Keep the number of arguments <= 4.
- Arguments can be merged into structures which are passed by reference.
    #include <math.h>

    typedef struct {
        float x;
        float y;
        float z;
    } Point;

    float distance (Point *a, Point *b) {
        float t1, t2;
        t1 = (a->x - b->x) * (a->x - b->x);    /* squared difference in x */
        t2 = (a->y - b->y) * (a->y - b->y);    /* squared difference in y */
        return sqrt(t1 + t2);
    }
- Pass two pointers rather than six floats.

Preserving Registers
- The caller must preserve registers that the callee might corrupt.
- Registers are preserved by writing them to memory and reading them back later.
- Example:
  - Function foo() calls function bar().
  - Both foo() and bar() use r4 and r5.
  - Before the call, foo() writes the registers to memory (STR).
  - After the call, foo() reads them back from memory (LDR).
- If foo() and bar() are in different .c files, the compiler will preserve all corruptible registers.
- If foo() and bar() are in the same file, the compiler will only save the registers that are actually corrupted.

Function Calls, Inlining
- Code for a called function can be inserted into the code of the caller.
    int bar(int y) {
        return(y+3);
    }

    int foo(int x) {
        int z;
        z = bar(x);
        return(z*2);
    }

    int foo_inline(int x) {
        int z;
        z = x + 3;
        return(z*2);
    }
- The machine code is inlined, not the C code.
- Code size is increased; inlining works well for small functions.
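
In C99 and later, a small function can be marked as an inlining candidate with `static inline`. The sketch below reuses the names foo and bar from the slide. The keyword is only a hint: the compiler may still emit a real call, or may inline functions that are not marked at all.

```c
/* Small helper marked as a candidate for inlining (C99). */
static inline int bar(int y)
{
    return y + 3;
}

int foo(int x)
{
    int z = bar(x);   /* the compiler may substitute x + 3 here */
    return z * 2;
}
```

The source still reads as a function call, so readability is kept while the call overhead can be removed in the generated machine code.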