Chapter 13 Inline Code.



Our Goal: Performance

What's slow? Division and branching.

Remedies we've used:
- Replace division by a constant
- Replace a conditional branch with an IT block

But also consider:
- We often choose to write small functions in assembly to improve performance, especially when the function is called frequently.
- The call and the return are each a branch, and can be a significant portion of the function's execution time.

INLINE CODE

Goal: Optimize execution time
Issue: Call-return overhead of BL and BX
Solutions:
(1) Inline functions
(2) Inline assembly
(3) Combining (1) and (2)

INLINE FUNCTIONS

Is NOT an inline function:

    int32_t Add1(int32_t a)
    {
        return a + 1 ;
    }

The function call y = Add1(x) ; compiles to:

    LDR R0,x
    BL  Add1
    STR R0,y

IS an inline function:

    static inline int32_t Add1(int32_t a)
    {
        return a + 1 ;
    }

The same function call compiles to:

    LDR R0,x
    ADD R0,R0,1
    STR R0,y

Calling a regular function requires time to execute the BL and BX instructions, but for large functions it saves memory at the expense of speed. Making the function inline eliminates the BL and BX instructions and their execution time, but replicates the function code everywhere a call (BL) would have appeared, saving time at the possible expense of memory.

INLINE FUNCTIONS

Facts about inline functions:
- Inline functions are replicated, not called
- An inline function is written in C
- Easier to use than inline assembly
- Independent of the target processor
- Must be defined and called in the same file
- More appropriate for small functions
- The compiler makes the decision to inline or not
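As a rough illustration of these facts, the sketch below places a regular function and a static inline function in the same file as their caller. The names Add1Inline and AddTwo are hypothetical and not from the slides; whether the compiler actually inlines the calls depends on its optimization settings.

```c
#include <stdint.h>

/* Regular function: each call normally costs a BL and a BX. */
int32_t Add1(int32_t a)
{
    return a + 1;
}

/* Inline candidate: defined in the same file as its callers.
 * The compiler may replicate the body at each call site. */
static inline int32_t Add1Inline(int32_t a)
{
    return a + 1;
}

/* Hypothetical caller with two call sites. If both are inlined,
 * the function reduces to two additions with no branches. */
int32_t AddTwo(int32_t x)
{
    return Add1Inline(Add1Inline(x));
}
```

Both versions are plain C, so this code is independent of the target processor.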

INLINE ASSEMBLY

A way to insert assembly language source code statements directly into a C program.

Operations not easily implemented in C:
- Rotates and arithmetic shift right
- Bit-field insertion and extraction
- Bit and byte reversals
- Double-length products

Eliminates the need to call an assembly language function.
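To see why such operations are awkward in C, here is a minimal sketch of a rotate and a byte reversal written in portable C. The function names are hypothetical. On ARM each of these corresponds to a single instruction (ROR and REV), which is what makes inline assembly attractive for them.

```c
#include <stdint.h>

/* Rotate x right by n bits (n is reduced modulo 32).
 * Masking both shift counts avoids an undefined shift by 32. */
static inline uint32_t rotate_right(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x >> n) | (x << ((32u - n) & 31u));
}

/* Reverse the order of the four bytes in x. */
static inline uint32_t reverse_bytes(uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
}
```

An optimizing compiler may recognize these patterns and emit the single instruction anyway, but that is not guaranteed.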

INLINE ASSEMBLY

Two forms:
- Basic asm: Gives specific instruction parameters that the compiler cannot modify.
- Extended asm: More difficult to write than basic asm, but sometimes necessary to interact correctly with the compiler's optimizer.

INLINE ASSEMBLY Option 1: Basic asm

Syntax:

    asm ( AssemblerInstructions ) ;

Example:

    uint32_t x, y ;
    ...
    // Rotate x right by 1 bit:
    asm (
        "LDR R0,x \n\t"
        "ROR R0,R0,1 \n\t"
        "STR R0,y "
    ) ;

The instructions are written as one or more strings separated by whitespace. All but the last must end with \n\t. References to labels like "x" and "y" require that they be global (declared outside all functions).

INLINE ASSEMBLY Option 2: Extended asm

Syntax:

    asm { volatile } ( AssemblerTemplate : OutputOperands { : InputOperands { : Clobbers } } ) ;

Curly braces surround optional components; the curly braces are NOT part of what is written. Colons separate components.

Do not write instructions to load an input variable or to store a result. The compiler figures out how to do that from InputOperands and OutputOperands.

AssemblerTemplate: Instruction templates with operand placeholders and options
OutputOperands: Specifies C variables to be used as destination operands
InputOperands: Specifies C variables to be used as source operands
Clobbers: A list of other items modified by side effects of the code

INLINE ASSEMBLY Option 2: Extended asm

Example:

    uint32_t x, y ;
    // Rotate x right by 1 bit:
    asm (
        "ROR %[dst],%[src],1"   // Template
        : [dst] "=r" (y)        // Output operand
        : [src] "r" (x)         // Input operand
    ) ;

Code to load and store the operands is generated by the compiler. Registers to use are chosen by the compiler.
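The rotate example can be wrapped in a function so that it builds both on the target and on a development host. This is only a sketch under stated assumptions: the name ror1 and the preprocessor guard are not from the slides, the guard assumes a GCC-compatible compiler targeting Thumb-2 (as on Cortex-M3/M4), and `__asm__` is GCC's alternate spelling of `asm` that is also accepted in strict ISO C modes. On any other target the function falls back to plain C.

```c
#include <stdint.h>

/* Rotate x right by one bit. */
static inline uint32_t ror1(uint32_t x)
{
    uint32_t y;
#if defined(__GNUC__) && defined(__thumb2__)
    /* The compiler picks the registers and generates the loads and stores. */
    __asm__ ("ror %[dst], %[src], #1"
             : [dst] "=r" (y)      /* output operand */
             : [src] "r" (x));     /* input operand  */
#else
    /* Portable fallback (assumption: used only when not building for Thumb-2). */
    y = (x >> 1) | (x << 31);
#endif
    return y;
}
```

A call such as ror1(1) moves the low bit into bit 31.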

AssemblerTemplate Component

    asm { volatile } ( AssemblerTemplate : OutputOperands { : InputOperands { : Clobbers } } ) ;

One or more instruction templates, written as strings, separated by whitespace.

Instruction operands may be specified explicitly:

    "MOV R0,0 \n\t"

Or by reference to the OutputOperands or InputOperands:

    "MOV %[identifier],0 \n\t"

Input/Output Operand Components

    asm { volatile } ( AssemblerTemplate : OutputOperands { : InputOperands { : Clobbers } } ) ;

Form of an operand list:

    [ identifier ] constraint ( expression ), ... [ identifier ] constraint ( expression )

One or more entries, separated by commas.

Identifier: an input or output operand used in the AssemblerTemplate.
Constraint: specifies what kind of values (variable, integer constant, floating-point constant, etc.) may be used.
Expression: a C expression, usually just a variable name.

INLINE ASSEMBLY EXAMPLE

    int result, value, numbits ;
    ...
    value = ... ;
    numbits = ... ;
    asm (
        "ASR %[dst],%[src],%[shift]"       // AssemblerTemplate
        : [dst] "=r" (result)              // OutputOperand
        : [src] "r" (value),               // InputOperands
          [shift] "ri" (numbits)
    ) ;
    printf("%d ASR %d = %d\n", value, numbits, result) ;

The strings "=r", "r", and "ri" are the operand constraints.

INLINE ASSEMBLY Operand Constraints

    Constraint   Operand is allowed to be ...
    r            One of the core registers (R0-R15)
    w            One of the floating-point registers (S0-S31)
    i            An integer constant
    X            Any kind of operand is allowed

Note that r, w, and i are lowercase, while X is uppercase.

Operand constraints are specified as a string containing one or more option letters, possibly preceded by modifiers (next).

INLINE ASSEMBLY Constraint Modifiers

When used as a prefix to 'r' or 'w' in the constraint string of an OutputOperand:

=   The first use of the register is as the write-only output of an instruction. The register may be used by a subsequent instruction as an input or reused as an output.

+   The first use of the register is to provide an input value to an instruction, but it is used again later as an output, either by the same or a subsequent instruction.

&   An output register that should not be a reused input register, usually because an instruction later in the same asm statement needs one of the input operands. Only used in the constraint string of an OutputOperand. Only used with "=", as in "=&r".

InputOperands are read-only.

Constraint Modifier "="

"=" tells the compiler that the register's original contents are irrelevant and may be overwritten.

    asm (
        "MOV %[reg],0 \n\t"            // %[reg] is first used as an output
        "MSR APSR_nzcvq,%[reg] "       // and used second as an input
        : [reg] "=r" (temp)
        :                              // No InputOperands
        : "cc"
    ) ;

"temp" is a variable in C that may not otherwise be used.

Constraint Modifier "+"

"+" tells the compiler that the operand's original contents are needed, but are not preserved.

    asm (
        "CMP %[reg],100 \n\t"          // %[reg] is first used as an input
        "IT HI \n\t"
        "MOVHI %[reg],100 "            // and used second as an output
        : [reg] "+r" (score)
        :                              // No InputOperands
        : "cc"                         // CMP modifies the flags
    ) ;

The C variable "score" will be compared to 100 and possibly assigned the value 100. Because CMP changes the condition flags, "cc" is listed as a clobber.
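A minimal sketch of the same read-then-write pattern wrapped in a function follows. The function name clamp_to_100, the unsigned operand type (the HI condition is an unsigned comparison), and the Thumb-2 guard with its plain C fallback are assumptions added for illustration.

```c
#include <stdint.h>

/* Limit score to at most 100. */
static inline uint32_t clamp_to_100(uint32_t score)
{
#if defined(__GNUC__) && defined(__thumb2__)
    /* "+r": the register holds the input on entry and the result on exit. */
    __asm__ ("cmp   %[reg], #100 \n\t"
             "it    hi           \n\t"
             "movhi %[reg], #100"
             : [reg] "+r" (score)
             :                      /* no InputOperands */
             : "cc");               /* CMP modifies the flags */
#else
    /* Portable fallback (assumption: used only when not building for Thumb-2). */
    if (score > 100u) {
        score = 100u;
    }
#endif
    return score;
}
```

Because the operand is both read and written, it appears only in the output list with "+r" and not again as an input.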

Constraint Modifier "&"

The OutputOperand "quot" must not reuse the same register as used for "dvnd" or "dvsr", since both are needed later in the MLS.

"&" tells the compiler that this output is written before the asm statement has finished reading its inputs, so it must not be assigned the same register as any input operand.

    asm (
        "SDIV %[quot],%[dvnd],%[dvsr] \n\t"
        "MLS %[rem],%[quot],%[dvsr],%[dvnd] "
        : [quot] "=&r" (quotient), [rem] "=r" (remainder)
        : [dvnd] "r" (dividend), [dvsr] "r" (divisor)
    ) ;
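The quotient-and-remainder example could be packaged as below. This is a sketch, not a definitive implementation: the divmod_t type, the function name divmod, and the guard macros are assumptions (the guard requires Thumb-2 with hardware integer divide, which ACLE signals with `__ARM_FEATURE_IDIV`), and a plain C fallback is used elsewhere. The divisor must not be zero.

```c
#include <stdint.h>

/* Hypothetical result type holding both outputs. */
typedef struct {
    int32_t quotient;
    int32_t remainder;
} divmod_t;

/* Signed division returning quotient and remainder (divisor != 0). */
static inline divmod_t divmod(int32_t dividend, int32_t divisor)
{
    divmod_t r;
#if defined(__GNUC__) && defined(__thumb2__) && defined(__ARM_FEATURE_IDIV)
    /* "=&r" keeps quot out of the registers holding dvnd and dvsr,
     * because MLS still needs both inputs after SDIV has written quot. */
    __asm__ ("sdiv %[quot], %[dvnd], %[dvsr]        \n\t"
             "mls  %[rem], %[quot], %[dvsr], %[dvnd]"
             : [quot] "=&r" (r.quotient), [rem] "=r" (r.remainder)
             : [dvnd] "r" (dividend), [dvsr] "r" (divisor));
#else
    /* Portable fallback: same arithmetic, remainder = dividend - quotient * divisor. */
    r.quotient  = dividend / divisor;
    r.remainder = dividend - r.quotient * divisor;
#endif
    return r;
}
```

Both SDIV and C division round toward zero, so the remainder takes the sign of the dividend in either path.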

INLINE ASSEMBLY Example: Constraints and Modifiers

    static inline int32_t ASR(int32_t value, uint32_t numbits)
    {
        int32_t result ;

        asm ("ASR %[dst],%[src],%[shift]"
            : [dst] "=r" (result)          // OutputOperands
            : [src] "r" (value),           // InputOperands
              [shift] "ir" (numbits)
        ) ;
        return result ;
    }

The "=" is required because this is an output.

The constraint "ir" allows this operand to be either a register or an integer constant.

The Clobbers Component

    asm { volatile } ( AssemblerTemplate : OutputOperands { : InputOperands { : Clobbers } } ) ;

A comma-separated list of strings, each specifying a resource (a register or the flags) that is modified by the template as a side effect and is not listed as an OutputOperand.

    asm (
        "MOV R0,0 \n\t"
        "MSR APSR_nzcvq,R0 "
        :                      // No OutputOperands
        :                      // No InputOperands
        : "cc", "r0"           // Clobbers flags and R0
    ) ;

Using a specific register hinders the optimizer. The optimizer only knows about registers that the compiler has assigned, not any that you specify literally, like R0 here.

In the clobbers list, registers must be specified using lower case.

The Clobbers Component

    uint32_t newflags ;
    ...
    newflags = 0 ;
    asm (
        "MSR APSR_nzcvq,%[src]"
        :                              // No OutputOperands
        : [src] "r" (newflags)         // InputOperand
        : "cc"                         // Clobbers flags
    ) ;

The compiler chooses which register to use for "newflags" and generates instructions to load it with a zero.

Combining INLINE FUNCTIONS with INLINE ASSEMBLY

    static inline int32_t ASR(int32_t value, uint32_t numbits)
    {
        int32_t result ;

        asm ("ASR %[dst],%[src],%[shift]"
            : [dst] "=r" (result)          // OutputOperands
            : [src] "r" (value),           // InputOperands
              [shift] "ir" (numbits)
        ) ;
        return result ;
    }

Function call:

    y = ASR(x, 5) ;

Code generated with constraint "ir":

    LDR R0,x
    ASR R0,R0,5
    STR R0,y

Code generated with constraint "r":

    LDR R0,x
    LDR R1,=5
    ASR R0,R0,R1
    STR R0,y

"ir" allows ASR to use a register or a constant for its 3rd operand.

64-bit Operands

    int64_t dst, src ;
    ...
    // 64-bit arithmetic right shift
    asm (
        "ASRS %[dstHi],%[srcHi],1 \n\t"
        "RRX %[dstLo],%[srcLo] "
        : [dstLo] "=r" (((uint32_t *) &dst)[0]),
          [dstHi] "=&r" (((uint32_t *) &dst)[1])
        : [srcLo] "r" (((uint32_t *) &src)[0]),
          [srcHi] "r" (((uint32_t *) &src)[1])
        : "cc"
    ) ;

[dstHi] uses "=&r" because ASRS writes it before RRX reads [srcLo]; without the "&", the compiler could assign both operands to the same register.
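One way to package the 64-bit shift is sketched below. It departs from the slide in one respect, stated here as an assumption: the two 32-bit halves are split and rejoined with shifts instead of pointer casts, which avoids depending on byte order. The function name asr64_by1 and the Thumb-2 guard with its C fallback are also hypothetical.

```c
#include <stdint.h>

/* Arithmetic right shift of a 64-bit value by one bit. */
static inline int64_t asr64_by1(int64_t src)
{
#if defined(__GNUC__) && defined(__thumb2__)
    uint32_t srcLo = (uint32_t) src;
    uint32_t srcHi = (uint32_t) ((uint64_t) src >> 32);
    uint32_t dstLo, dstHi;

    /* ASRS shifts the high word and leaves its old bit 0 in the carry;
     * RRX rotates that carry into bit 31 of the low word.
     * dstHi is early-clobber because srcLo is read after it is written. */
    __asm__ ("asrs %[dstHi], %[srcHi], #1 \n\t"
             "rrx  %[dstLo], %[srcLo]"
             : [dstLo] "=r" (dstLo), [dstHi] "=&r" (dstHi)
             : [srcLo] "r" (srcLo), [srcHi] "r" (srcHi)
             : "cc");
    return (int64_t) (((uint64_t) dstHi << 32) | dstLo);
#else
    /* Portable fallback: logical shift, then copy the sign bit back in. */
    uint64_t u    = (uint64_t) src;
    uint64_t sign = u & 0x8000000000000000ull;
    return (int64_t) ((u >> 1) | sign);
#endif
}
```

The conversions back to int64_t assume a two's complement target, which holds for the compilers and processors discussed here.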

The Optional Volatile Keyword

    asm { volatile } ( AssemblerTemplate : OutputOperands { : InputOperands { : Clobbers } } ) ;

Prevents compiler optimizations that may modify, move, or even discard your code.

Only use it when needed, because it may prevent legitimate optimizations that are beneficial.

The Optional Volatile Keyword

    int k, temp ;
    ...
    for (k = 0; k < 10000; k++)
    {
        asm volatile (
            "MOV %[register],%[register]"
            : [register] "+r" (temp)
        ) ;
    }

Suppose this asm statement was used simply to create a delay. Without the volatile keyword, the optimizer might consider it useless and remove it (and thus the loop as well).
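The delay loop might be wrapped as follows. This is a sketch with several assumptions: the function name delay_loop and its iteration parameter are hypothetical, temp is initialized here (the slide leaves it uninitialized), the asm path is guarded for Thumb-2, and the fallback uses a volatile variable to keep the loop from being optimized away. The function returns the number of iterations executed only to make it easy to check.

```c
#include <stdint.h>

/* Spin for the given number of iterations and return how many were run. */
static inline uint32_t delay_loop(uint32_t iterations)
{
    uint32_t temp = 0;
    uint32_t k;

    for (k = 0; k < iterations; k++) {
#if defined(__GNUC__) && defined(__thumb2__)
        /* volatile keeps the optimizer from discarding this do-nothing move. */
        __asm__ volatile ("mov %[reg], %[reg]"
                          : [reg] "+r" (temp));
#else
        /* Portable fallback: a volatile store the optimizer must keep. */
        volatile uint32_t sink = temp;
        (void) sink;
#endif
    }
    return k;
}
```

The actual delay depends on the clock rate and on the code the compiler generates for the loop, so it would need to be calibrated on the target.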

Resolving Dependencies (An Ineffective Solution)

Goal: prevent the optimizer from moving other code relative to your asm statement.

    float series, term ;
    uint32_t temp ;
    ...
    // Change rounding mode to truncate
    asm volatile (
        "VMRS %[reg],FPSCR \n\t"
        "ORR %[reg],%[reg],0x3 << 22 \n\t"
        "VMSR FPSCR,%[reg] "
        : [reg] "=r" (temp)
    ) ;
    series += term ;

The programmer wants to change the rounding mode BEFORE performing the floating-point addition (series += term).

The volatile keyword prevents the optimizer from moving the asm statement, but does NOT prevent it from moving the floating-point addition!

Resolving Dependencies (An Effective Solution)

Goal: prevent the optimizer from moving other code relative to your asm statement.

    float series, term ;
    uint32_t temp ;
    ...
    // Change rounding mode to truncate
    asm volatile (
        "VMRS %[reg],FPSCR \n\t"
        "ORR %[reg],%[reg],0x3 << 22 \n\t"
        "VMSR FPSCR,%[reg] "
        : [reg] "=r" (temp),
          "=X" (series)        // artificial dependency
    ) ;
    series += term ;

Note on the volatile keyword: remove it. It doesn't solve this dependency problem.

The "=X" (series) OutputOperand creates an "artificial dependency" that tells the optimizer that the asm statement changes the value of "series" (although it actually does not).

COMBINING INLINE FUNCTIONS AND INLINE ASSEMBLY

- An asm statement simply generates code wherever it appears.
- Encapsulate it in a function to be able to use it more than once.
- Use an inline function to eliminate the call/return overhead.
- Use extended asm to allow the compiler to choose the registers.