Published by Annabelle Ellis. Modified over 8 years ago.
1
Embedded Systems Programming: Writing Optimised C Code for ARM
2
Why write optimised C code?

For embedded systems, size and/or speed are of key importance
The compiler optimisation phase can only do so much
In order to write optimal C code you need to know details of the underlying hardware and the compiler
3
What compilers can’t do

    void memclr(char *data, int N)
    {
        for (; N > 0; N--) {
            *data = 0;
            data++;
        }
    }

Is N == 0 on the first pass through the loop?
– 0 - 1 is dangerous!
Is the data array 4-byte aligned?
– Could store using int
Is N a multiple of 4?
– Could clear in blocks of 4 at a time
Compilers have to be conservative!
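The questions above are ones the programmer can often answer even though the compiler cannot. A minimal sketch, assuming the caller guarantees a 4-byte-aligned buffer whose length is a non-zero whole number of 32-bit words; `memclr_words` and its parameters are hypothetical names, not from the slides:

```c
#include <stdint.h>

/* Hypothetical variant of memclr (name and signature are assumptions).
 * The caller promises that data points at 32-bit words and that
 * n_words >= 1, so the loop can count down with do..while. */
void memclr_words(uint32_t *data, unsigned int n_words)
{
    do {
        *data++ = 0;              /* one 32-bit store clears 4 bytes */
    } while (--n_words != 0);
}
```

Because the guarantees are stated in the interface rather than discovered by the compiler, calling this with `n_words == 0` is an error on the caller's side.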
4
An example program

    /* program showing inefficient
     * variable and loop
     * usage craig Nov 04 */
    int checksum_1(int *data)
    {
        char i;
        int sum = 0;

        for (i = 0; i < 64; i++)
            sum += data[i];
        return sum;
    }

The program might seem fine, even resource friendly
Using a char saves space
for loops make good assembler
Let’s look at the assembler code
5
        .text
        .align  2
        .global checksum_1
        .type   checksum_1,function
    checksum_1:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 1, current_function_anonymous_args = 0
        mov     ip, sp
        stmfd   sp!, {fp, ip, lr, pc}
        sub     fp, ip, #4
        mov     r1, r0
        mov     r0, #0                  @ sum = 0
        mov     r2, r0                  @ i = 0
    .L6:
        ldr     r3, [r1, r2, asl #2]    @ data[i]
        add     r0, r0, r3              @ sum += data[i]
        add     r3, r2, #1              @ i++
        and     r2, r3, #255
        cmp     r2, #63                 @ i < 64
        bls     .L6
        ldmea   fp, {fp, sp, pc}
    .Lfe1:
        .size   checksum_1,.Lfe1-checksum_1
6
What is wrong?

The use of char means that the compiler has to mask the counter down to 8 bits on every iteration, using
– and r2, r3, #255
The loop variable requires a register and initialisation
If the loop is executed often then the test and branch are quite an overhead
7
Variable sizes

In general the compiler will use 32-bit registers for local variables, but will have to narrow them whenever they are used as 8- or 16-bit values
If you can, use unsigned ints; if you can’t, cast explicitly
Using signed shorts can be quite a problem for compilers
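Applied to the earlier checksum_1 example, this advice amounts to changing only the type of the loop counter. A minimal sketch; `checksum_1u` is a hypothetical name, and whether the masking instruction actually disappears depends on the compiler and its options:

```c
/* Same loop as checksum_1, but the counter is an unsigned int, so it
 * already matches the 32-bit register width and should not need to be
 * masked back to 8 bits after each increment.
 * checksum_1u is a hypothetical name, not from the slides. */
int checksum_1u(const int *data)
{
    unsigned int i;
    int sum = 0;

    for (i = 0; i < 64; i++)
        sum += data[i];
    return sum;
}
```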
8
Watch your shorts!

    short add(short a, short b)
    {
        return a + (b >> 1);
    }

Becomes ….

        mov     ip, sp
        stmfd   sp!, {fp, ip, lr, pc}
        sub     fp, ip, #4
        mov     r1, r1, asl #16
        mov     r0, r0, asl #16
        mov     r0, r0, asr #16
        add     r0, r0, r1, asr #17
        mov     r0, r0, asl #16
        mov     r0, r0, asr #16
        ldmea   fp, {fp, sp, pc}

The above C code turns into this rather nasty assembler
The GNU C compiler is very cautious when confronted with short variables
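One way to avoid the repeated shift pairs is to keep the arguments and the result as int. A minimal sketch; `add_int` is a hypothetical name, and its result matches the short version only while the sum fits in 16 bits:

```c
/* Hypothetical int-based variant of add(): arguments and result stay
 * int, so nothing has to be narrowed to 16 bits inside the function. */
int add_int(int a, int b)
{
    return a + (b >> 1);
}
```

If a 16-bit value is really required, the caller can narrow once with an explicit cast at the point where the result is stored.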
9
Loops #1

As well as using a char for a loop counter, the loop counter could be redundant
Terminate loops by counting down to 0; this reduces register usage and means no separate compare against the end value
Use do..while instead of for loops when the body is known to run at least once
10
Efficient loop C

    /*
     * Program to show efficient use of
     * variables and loops
     */
    int checksum_2(int *data)
    {
        int sum = 0, i = 64;

        do {
            sum += *(data++);
        } while (--i != 0);
        return sum;
    }
11
Efficient loop assembler

    checksum_2:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 1, current_function_anonymous_args = 0
        mov     ip, sp
        stmfd   sp!, {fp, ip, lr, pc}
        sub     fp, ip, #4
        mov     r1, r0
        mov     r0, #0              @ sum = 0
        mov     r2, #64             @ i = 64
    .L6:
        ldr     r3, [r1], #4        @ *(data++)
        add     r0, r0, r3          @ sum += *(data++)
        subs    r2, r2, #1          @ --i
        bne     .L6
        ldmea   fp, {fp, sp, pc}
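checksum_2 hard-codes 64 elements. The same count-down pattern can be sketched for a run-time length, with one caveat: do..while always executes its body once, so a zero length has to be handled before the loop. `checksum_n` and its parameter `n` are hypothetical, not from the slides:

```c
/* Hypothetical generalisation of checksum_2 to a run-time length n.
 * do..while always runs the body once, so n == 0 is handled up front. */
int checksum_n(const int *data, unsigned int n)
{
    int sum = 0;

    if (n == 0)
        return 0;
    do {
        sum += *data++;
    } while (--n != 0);       /* count down to zero */
    return sum;
}
```

If the caller can guarantee `n >= 1`, the guard can be dropped, which is the situation the fixed-size checksum_2 is in.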
12
Loop unrolling

If a loop is going to be repeated often then the test and branch can be quite an overhead
If the iteration count is a multiple of 4 and the loop is executed a lot then the loop can be unrolled
This increases code size but is more speed efficient
Counts that are not multiples of 4 can be handled but are less efficient
13
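An unrolled loop

    /*
     * Program to show efficient use of
     * variables and loops & loop unrolling
     */
    int checksum_2(int *data)
    {
        int sum = 0, i = 64;

        do {
            /* four elements per pass, so i drops by 4 each time */
            sum += *(data++);
            sum += *(data++);
            sum += *(data++);
            sum += *(data++);
            i -= 4;
        } while (i != 0);
        return sum;
    }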
14
    checksum_2:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 1, current_function_anonymous_args = 0
        mov     ip, sp
        stmfd   sp!, {fp, ip, lr, pc}
        sub     fp, ip, #4
        mov     r2, r0
        mov     r0, #0
        mov     r1, #64
    .L6:
        ldr     r3, [r2], #4
        add     r0, r0, r3
        ldr     r3, [r2], #4
        add     r0, r0, r3
        ldr     r3, [r2], #4
        add     r0, r0, r3
        ldr     r3, [r2], #4
        add     r0, r0, r3
        subs    r1, r1, #4
        bne     .L6
        ldmea   fp, {fp, sp, pc}
15
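Loop unrolling when N is not a multiple of 4

    /* Program to show use of
     * loop unrolling
     */
    int checksum_2(int *data, unsigned int N)
    {
        int sum = 0;
        unsigned int i;

        /* unrolled part: N/4 passes of four elements each */
        for (i = N / 4; i != 0; i--) {
            sum += *(data++);
            sum += *(data++);
            sum += *(data++);
            sum += *(data++);
        }
        /* remainder: the last N & 3 elements, one at a time */
        for (i = N & 3; i != 0; i--)
            sum += *(data++);
        return sum;
    }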