1
Performance Optimization for Embedded Software
Presented by: Yingjun Lyu
2
What is Software Optimization?
The process of modifying a software system so that it works more efficiently or uses fewer resources.
3
Do you Optimize your Program?
4
When to Optimize?
A better approach: design first, code from the design, then profile the code. Keep performance goals in mind throughout.
5
Levels of Optimization
Design level
Algorithms and data structures
Source code level, e.g. while(1) vs for(;;)
Build level
Compile level
Assembly level
Run time
6
The Code Optimization Process
Build —> Optimize —> Check outputs
Build —> Generate tests —> Optimize —> Check outputs
7
Basic C Optimization Techniques
Choose the right data type.
Example: a processor does not support 32-bit multiplication. Using a 32-bit type in a multiply forces the compiler to emit a sequence of 16-bit operations. What if only 16-bit precision is needed?
Solution: use intrinsics to leverage embedded processor features.
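A minimal sketch of the data-type point, assuming a hypothetical target with a native 16-bit multiplier. The names `mul16` and `dot16` are illustrative and do not come from the slides. Real intrinsics are vendor-specific, so this portable version only shows the idea of keeping operands at 16 bits while accumulating in 32 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* 16 x 16 -> 32-bit multiply. The operands stay 16-bit, so a target with a
   native 16-bit multiplier does not need a multi-instruction sequence.
   (Hypothetical helper; a real port might map this to a vendor intrinsic.) */
static inline int32_t mul16(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}

/* Dot product over 16-bit samples with a 32-bit accumulator. */
int32_t dot16(const int16_t *x, const int16_t *y, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += mul16(x[i], y[i]);
    }
    return acc;
}
```

If the inputs had been declared as 32-bit integers, the same loop would request full 32-bit multiplies even though the data never needs that precision.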
8
An intrinsic function is a function available for use in a given programming language whose implementation is handled specially by the compiler.
9
Function calling conventions
Definition: an implementation-level (low-level) scheme for how callees receive parameters from their caller and how they return a result.
Stack-based or register-based?
10
Restrict and pointer aliasing
When the compiler knows that pointers do not alias, it can schedule memory accesses in parallel.
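A short sketch of the C99 `restrict` qualifier, assuming the caller really does pass non-overlapping buffers. The function name `vec_add` is illustrative only.

```c
#include <stddef.h>

/* restrict is a promise from the programmer that out, a and b never overlap.
   With that promise the compiler may load a[i + 1] and b[i + 1] before the
   store to out[i] has completed, or combine several iterations. */
void vec_add(float *restrict out, const float *restrict a,
             const float *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

Calling this with overlapping buffers is undefined behavior, so the qualifier belongs only where the no-alias guarantee actually holds.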
11
Loops
Communicate loop count information: specify the loop count bounds to the compiler.
Hardware loops: the loop body is kept in a buffer or prefetched.
12
General Loop Transformation
Loop unrolling
Multisampling
Partial summation
Software pipelining
13
Loop unrolling: A loop body is duplicated one or more times. The loop count is then reduced by the same factor to compensate.
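As a rough illustration, here is a hand-unrolled sum with an unrolling factor of 4. The function names are made up for this sketch, and in practice a compiler will often perform this transformation itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference version: one addition per loop iteration. */
int32_t sum_rolled(const int32_t *v, size_t n)
{
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += v[i];
    }
    return sum;
}

/* Unrolled by 4: the body is duplicated four times and the loop count
   drops by the same factor. A cleanup loop handles leftover elements. */
int32_t sum_unrolled4(const int32_t *v, size_t n)
{
    int32_t sum = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        sum += v[i];
        sum += v[i + 1];
        sum += v[i + 2];
        sum += v[i + 3];
    }
    for (; i < n; i++) {
        sum += v[i];
    }
    return sum;
}
```

The unrolled version executes fewer branch and counter instructions, at the price of a larger loop body.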
14
Multisampling: independent output values that overlap in their input source data are computed together, so the shared inputs are reused.
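One way to picture multisampling is a small FIR-style filter that produces two outputs per pass. This is only a sketch: the name `fir_multisample2` and its parameters are assumptions, and `x` is assumed to hold at least `n_out + taps - 1` samples.

```c
#include <stddef.h>
#include <stdint.h>

/* Computes y[i] = sum over k of h[k] * x[i + k].
   Two adjacent outputs are computed together because y[i] and y[i + 1]
   read overlapping inputs: each x value is loaded once and used twice. */
void fir_multisample2(const int16_t *x, const int16_t *h, int32_t *y,
                      size_t n_out, size_t taps)
{
    size_t i = 0;
    for (; i + 2 <= n_out; i += 2) {
        int32_t acc0 = 0;
        int32_t acc1 = 0;
        int16_t x0 = x[i];
        for (size_t k = 0; k < taps; k++) {
            int16_t x1 = x[i + k + 1];
            acc0 += (int32_t)h[k] * x0;  /* y[i]     uses x[i + k]     */
            acc1 += (int32_t)h[k] * x1;  /* y[i + 1] uses x[i + k + 1] */
            x0 = x1;                     /* reuse the load next round  */
        }
        y[i] = acc0;
        y[i + 1] = acc1;
    }
    if (i < n_out) {                     /* odd output count: last one alone */
        int32_t acc = 0;
        for (size_t k = 0; k < taps; k++) {
            acc += (int32_t)h[k] * x[i + k];
        }
        y[i] = acc;
    }
}
```

The two accumulators are independent, so a processor with more than one execution unit can work on both at once.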
15
Partial Summation: The computation for one output sum is divided into multiple smaller, or partial, sums.
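A minimal sketch of partial summation on a dot product, using two partial sums. The name `dot_partial2` is illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* One output sum is split into two partial sums (even and odd indices).
   The two accumulation chains do not depend on each other, so they can
   proceed in parallel; they are combined once at the end. */
int32_t dot_partial2(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t s0 = 0;
    int32_t s1 = 0;
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        s0 += (int32_t)a[i] * b[i];
        s1 += (int32_t)a[i + 1] * b[i + 1];
    }
    if (i < n) {                 /* leftover element when n is odd */
        s0 += (int32_t)a[i] * b[i];
    }
    return s0 + s1;
}
```

For integers the result matches the straightforward loop as long as nothing overflows. With floating-point data, reordering the additions can change rounding slightly.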
16
Software pipelining: A sequence of instructions is transformed into a pipeline of several copies of that sequence
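The idea can be sketched by hand on a simple scaling loop, although compilers for embedded targets normally do this scheduling themselves. The function name, the gain format (a fraction of 256) and the two-stage split are assumptions made for the illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* out[i] = in[i] * gain / 256, written as a two-stage software pipeline:
   while iteration i is being computed and stored, the load for
   iteration i + 1 is already issued. */
void scale_pipelined(const int16_t *in, int16_t *out, size_t n, int16_t gain)
{
    if (n == 0) {
        return;
    }
    int16_t loaded = in[0];                  /* prologue: load for i = 0 */
    for (size_t i = 0; i + 1 < n; i++) {
        int16_t next = in[i + 1];            /* stage 1 of iteration i + 1 */
        out[i] = (int16_t)(((int32_t)loaded * gain) / 256); /* stage 2 of i */
        loaded = next;
    }
    /* epilogue: finish the last iteration, nothing left to load */
    out[n - 1] = (int16_t)(((int32_t)loaded * gain) / 256);
}
```

The prologue and epilogue are the extra code a pipelined loop needs, which is one reason the technique increases code size.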
17
Is there any cost for performance optimization?
18
Example: Loop Unrolling
19
Code Size Optimization
Why? Code size determines the amount of memory the code occupies at program run time. Smaller code can also reduce the amount of instruction cache the device needs.
20
Compiler flags (configure the compiler)
Optimize for code size. Example: the -Os command-line option in GCC.
Optimize for performance: -O3.
-O3 or -Os? Critical code is optimized for speed, while the bulk of the code may be optimized for size.
21
“Premium encodings”: the most commonly used instructions can be represented in a reduced binary footprint.
Example: integer add instructions on a 32-bit device are represented with a premium 16-bit encoding.
Drawback: performance degradation.
22
Tuning the ABI for code size
ABI: application binary interface, an interface between a given program and the OS, system libraries, etc.
To reduce code size, there are two areas of interest: calling convention and alignment.
23
Calling Convention
Fewer instructions are required to set up parameters passed via registers than parameters passed via the stack.
24
Space-time Tradeoff
The cost depends on the unrolling factor: larger factors can increase cache misses and register pressure.
25
Space-time Tradeoff
26
Improve Performance through memory layout optimization
Vectorization of loops
Computation performed across multiple loop iterations can be combined into single vector instructions.
27
An important concern for vectorizing:
Loop dependence analysis: array accesses, data modification, conditional statements, etc.
Challenge: pointer aliasing.
Solution: use the restrict keyword.
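To make the dependence concern concrete, the sketch below contrasts a loop whose iterations are independent with one that carries a dependence from one iteration to the next. Both function names are illustrative, and whether a given compiler vectorizes the first loop depends on the target and flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Independent iterations: y[i] depends only on x[i] and y[i].
   restrict rules out aliasing between x and y, so several iterations
   can be combined into one vector instruction. */
void scale_add(float *restrict y, const float *restrict x, float a, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}

/* Loop-carried dependence: v[i] needs the value of v[i - 1] produced by
   the previous iteration, so the iterations cannot simply run side by side. */
void prefix_sum(int32_t *v, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        v[i] += v[i - 1];
    }
}
```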
28
Array-of-structures or Structure-of-arrays?
Hint: memory is most efficiently accessed sequentially.
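The two layouts can be compared with a small sketch that sums one field over all points. The type and function names are hypothetical and chosen only for this illustration.

```c
#include <stddef.h>

/* Array-of-structures: the x, y, z of one point sit next to each other. */
struct point_aos {
    float x;
    float y;
    float z;
};

/* Structure-of-arrays: all x values are contiguous, then all y, then all z. */
struct points_soa {
    const float *x;
    const float *y;
    const float *z;
};

/* AoS: consecutive x values are sizeof(struct point_aos) bytes apart,
   so the unused y and z fields are pulled through the cache as well. */
float sum_x_aos(const struct point_aos *p, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        sum += p[i].x;
    }
    return sum;
}

/* SoA: the loop walks one array sequentially, which suits both the cache
   and vector loads. */
float sum_x_soa(const struct points_soa *p, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        sum += p->x[i];
    }
    return sum;
}
```

Which layout is better depends on the access pattern: code that always touches every field of one point at a time may still prefer the array-of-structures form.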
29
Source Code Level Optimization
Performance bug: a bug that causes significant performance degradation.
PerfChecker: a static-analysis tool that detects performance bugs in mobile applications.
32
GUI lagging is the most dominant bug type (75.7%)
Long-running operations in main threads
33
View holder design pattern
34
[1] Oshana and Kraeling. Software Engineering for Embedded Systems: Methods, Practical Techniques, and Applications. Chapter 11: Optimizing Embedded Software for Performance.
[2] Oshana and Kraeling. Software Engineering for Embedded Systems: Methods, Practical Techniques, and Applications. Chapter 12: Optimizing Embedded Software for Memory.
[3] Heydemann, K., Bodin, F., Knijnenburg, P. M. W. and Morin, L. (2006). UFS: a global trade-off strategy for loop unrolling for VLIW architectures. Concurrency Computat.: Pract. Exper., 18: 1413– doi: /cpe.1014
[4] Yepang Liu, Chang Xu, and Shing-Chi Cheung. Characterizing and detecting performance bugs for smartphone applications. In Proceedings of the 36th International Conference on Software Engineering (ICSE 2014). ACM, New York, NY, USA, DOI=
[5]