1
Code Tuning Techniques
Relatively small coding changes to improve efficiency
Often a tradeoff with code size, memory, and readability
Often depends on compiler optimizations
These rules don’t always work
May need to test them
Reference: Code Complete, Second Edition, Steve McConnell, 2004
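Because these rules do not always pay off, each change is worth timing before it is kept. The following is a minimal sketch of such a measurement, assuming the standard clock() function is precise enough for the workload. The names sum_baseline, sum_unrolled, and time_sum are hypothetical and only serve the illustration.

```c
#include <stddef.h>
#include <time.h>

/* Baseline: one element per iteration. */
long sum_baseline(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Candidate: two elements per iteration (unrolled once). */
long sum_unrolled(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s += a[i];
        s += a[i + 1];
    }
    if (i < n)                  /* odd element count: handle the last one */
        s += a[i];
    return s;
}

/* Returns the CPU seconds spent calling fn(a, n) reps times.
   Results are accumulated into *sink so the calls are not optimized away. */
double time_sum(long (*fn)(const int *, size_t),
                const int *a, size_t n, int reps, long *sink)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        *sink += fn(a, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_sum once with each function on the same data gives two durations to compare; the tuned version is only worth keeping if it is both correct and measurably faster with the compiler and flags actually in use.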
2
Stop When You Know the Answer
Break from the loop when the item is found
    negInputFound = FALSE;
    for (i=0; i<count; i++){
        if (input[i] < 0) {
            negInputFound = TRUE;
            break;
        }
    }
14% time savings in C++
3
Loops, Unswitching
Same comparison is performed each iteration
    for (i=0; i<count; i++){
        if (sumType == SUMTYPE_NET)
            netSum += amount[i];
        else
            grossSum += amount[i];
    }
4
Loops, Unswitching
Pull comparison out of the loop
    if (sumType == SUMTYPE_NET) {
        for (i=0; i<count; i++)
            netSum += amount[i];
    } else {
        for (i=0; i<count; i++)
            grossSum += amount[i];
    }
19% time savings in C++
5
Loops, Unrolling
Perform two iterations in one
Reduce loop overhead
Original loop:
    i = 0;
    while(i < cnt){
        a[i] = i;
        i++;
    }
Unrolled loop:
    i = 0;
    while(i < cnt - 1){
        a[i] = i;
        a[i+1] = i+1;
        i += 2;
    }
    if (i == cnt-1)
        a[cnt-1] = cnt-1;
34% improvement in C++
6
Minimize Work in Loops
Dereferencing takes time
Original loop:
    for (i=0; i<max; i++)
        a[i] = b[i] + str1->str2->str3->str4;
Dereference once, before the loop:
    x = str1->str2->str3->str4;
    for (i=0; i<max; i++)
        a[i] = b[i] + x;
19% time savings in C++
7
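Sentinel Values
Add a marker to an array to indicate the end
Reduces loop overhead
Original search, 3 comparisons needed per iteration:
    while ((!found) && (i<count))
        if (item[i] == testVal)
            found = TRUE;
        else
            i++;
With a sentinel (the array needs room for one extra element):
    item[count] = testVal;
    while (item[i] != testVal)
        i++;
The value was found if i < count after the loop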
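The sentinel search can be wrapped in a function that also reports whether the value was really present. This is a minimal sketch, assuming the caller guarantees that item[] has room for count + 1 elements; the function name find_with_sentinel is hypothetical.

```c
#include <stddef.h>

/* Sentinel search. Assumes item[] has room for count + 1 elements,
   so that item[count] can temporarily hold the sentinel.
   Returns the index of testVal, or -1 if it is not among the
   first count items. */
long find_with_sentinel(int *item, size_t count, int testVal)
{
    int saved = item[count];      /* keep whatever was in the spare slot */
    size_t i = 0;

    item[count] = testVal;        /* the sentinel guarantees the loop ends */
    while (item[i] != testVal)    /* one comparison per iteration */
        i++;
    item[count] = saved;          /* restore the spare slot */

    return (i < count) ? (long)i : -1;
}
```

Stopping at index count means only the sentinel matched, so the function reports "not found" in that case.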
8
Busy Loop Inside
Reduce the number of loop condition checks by putting the long loop inside the small loop
    for (col=0; col<100; col++)
        for (row=0; row<5; row++)
            sum += table[row][col];
500 inner + 100 outer = 600 loop checks
    for (row=0; row<5; row++)
        for (col=0; col<100; col++)
            sum += table[row][col];
500 inner + 5 outer = 505 loop checks
33% time savings in C++
9
Strength Reduction
Replacing complex instructions with simple instructions
Multiplication with addition
Original loop:
    for (i=0; i<saleCount; i++)
        comm[i] = (i+1) * rev * base * disc;
With the multiplication replaced by addition:
    incComm = rev * base * disc;
    cumComm = incComm;
    for (i=0; i<saleCount; i++){
        comm[i] = cumComm;
        cumComm += incComm;
    }
12% performance improvement in C++
10
Int vs. Float
Use integer operations rather than float
A form of strength reduction
Loss of accuracy
    float i;
    for (i=0; i<99; i++)
        a[(int)i] = 0;
Change i to int
71% performance improvement in C++
11
Explicit Caching
Remember results which will be reused in the future
Dynamic Programming
Original function:
    double hypotenuse (double A, double B){
        return(sqrt(A*A + B*B));
    }
With the last result cached:
    double hypotenuse (double A, double B){
        if ((A==oldA) && (B==oldB))
            return (oldHyp);
        else {
            oldA=A;
            oldB=B;
            oldHyp = sqrt(A*A+B*B);
            return (oldHyp);
        }
    }
74% performance improvement in C++
12
Initialize at Compile Time
Don’t call library functions when you can use a constant
    unsigned int log2(unsigned int x) {
        return (unsigned int) (log(x)/log(2));
    }
log(2) never changes
    const double LOG2 = ;
    unsigned int log2(unsigned int x) {
        return (unsigned int) (log(x)/LOG2);
    }
38% performance improvement in C++
May need tables in order to generalize
13
Avoid Slow Library Functions
Rewrite the function as a table lookup
Loss of accuracy
    unsigned int log2(unsigned int x) {
        if (x<2) return 0;
        if (x<4) return 1;
        if (x<8) return 2;
        …
        if (x< ) return 30;
        return 31;
    }
93% performance improvement in C++
14
Precompute Results
Store results in a table. Lookup at runtime
Need to index the table
Original computation:
    return (loanAmt / ((1.0 - pow((1.0 + interestRate / 12.0), -months)) / (interestRate / 12.0)));
Need to compute integer index for the table
    int interestInd = (interestRate - MINRATE) * 100;
    return (loanAmt / loanTable[interestInd][months]);
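The table itself has to be filled once before the first lookup. The sketch below shows one way to do that, under stated assumptions: the values of MINRATE, RATE_STEPS, and MAX_MONTHS and the function names init_loan_table and payment_lookup are hypothetical, and the power term is built up by repeated multiplication rather than a library call.

```c
#define MINRATE     0.01   /* hypothetical: lowest annual rate in the table */
#define RATE_STEPS  20     /* hypothetical: rates 0.01, 0.02, ..., 0.20 */
#define MAX_MONTHS  360    /* hypothetical: longest loan term in months */

static double loanTable[RATE_STEPS][MAX_MONTHS + 1];

/* Fill the table once, e.g. at program start.
   Entry [r][m] holds (1 - (1 + rate/12)^-m) / (rate/12). */
void init_loan_table(void)
{
    for (int r = 0; r < RATE_STEPS; r++) {
        double monthly = (MINRATE + r / 100.0) / 12.0;
        double growth = 1.0;          /* (1 + monthly)^m, built up step by step */
        for (int m = 1; m <= MAX_MONTHS; m++) {
            growth *= 1.0 + monthly;
            loanTable[r][m] = (1.0 - 1.0 / growth) / monthly;
        }
    }
}

/* Runtime lookup: one index computation and one division.
   The rate must lie on a table step and months in 1..MAX_MONTHS. */
double payment_lookup(double loanAmt, double interestRate, int months)
{
    /* Round to the nearest step so that floating-point error
       does not truncate the index to the row below. */
    int interestInd = (int)((interestRate - MINRATE) * 100.0 + 0.5);
    return loanAmt / loanTable[interestInd][months];
}
```

The rounding in payment_lookup is a design choice of this sketch: a plain cast truncates, and a rate such as 0.12 can land just below its intended index after the subtraction and multiplication.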
15
Operating Systems
Allow the processor to perform several tasks at virtually the same time
Ex. Web Controlled Car with a camera
- Car is controlled via the internet
- Car has its own webserver
- Web interface allows user to control car and see camera images
- Car also has “auto brake” feature to avoid collisions
[Figure: web interface view with Fwd, Back, Left, and Right buttons]
16
Multiple Tasks
Assume that one microcontroller is being used
At least four different tasks must be performed
- Send video data: continuous while a user is connected
- Service motion buttons: whenever a button is pressed, may last seconds
- Detect obstacles: continuous at all times
- Auto brake: whenever an obstacle is detected, may last seconds
Detect and Auto brake cannot occur together
3 tasks may need to occur concurrently
17
Prioritized Task Scheduling
Sending Video Data and Detecting Obstacles must happen concurrently
- Both tasks never complete
Servicing Motion Buttons must be concurrent with Sending Video Data
- Video should not stop when car moves
CPU must switch between tasks quickly
Some tasks must take priority
- Auto Brake must have highest priority
18
Sharing Global Resources
Global resources may be required by multiple tasks
- ADC, comparators, timers, I/O pins
Shared access must be controlled to avoid interference
Ex. Task 1 and Task 2 need to use the ADC
- They cannot use the ADC at the same time
- One task must wait for the other
Operating system guarantees that resource conflicts are resolved
19
Layered OS Architecture
[Figure: layer diagram with Application, Library Functions, System Calls, and Microcontroller]
OS provides an abstraction to hide details of hardware
Ex. delay(int) library function might set up a timer-based interrupt
Using library functions incurs overhead
20
Processes vs. Threads
Context of a task is its register values, program counter, and stack
All tasks have their own context
Context switch is when one task stops and the next starts
- Must save the old context and load the new
- This is time consuming
OS typically gives tasks access to memory (e.g. via malloc)
Processes each have their own private memory
- Requires memory protection
Threads share memory
RTOSes usually implement tasks as threads
21
Memory Management
Programs can request memory dynamically with malloc()
Fixed-size array:
    int valarr[10];
Dynamically allocated array:
    int *valarr;
    valarr = (int *) malloc(10 * sizeof(int));
Dynamically allocated memory must be explicitly released
- A locally declared array is released automatically on function return
    free(valarr);
Dynamic memory allocation is flexible but harder to deal with
- Must free the memory manually
- Cannot access freed memory
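The two obligations above (check the request, release exactly once) can be sketched as a pair of helpers. The names make_values and release_values are hypothetical; only malloc and free come from the standard library.

```c
#include <stdlib.h>

/* Allocates an array of n ints, all set to zero.
   Returns NULL if n is 0 or the request is denied.
   The caller owns the result and must release it. */
int *make_values(size_t n)
{
    if (n == 0)
        return NULL;
    int *valarr = malloc(n * sizeof *valarr);
    if (valarr == NULL)           /* the request may be denied */
        return NULL;
    for (size_t i = 0; i < n; i++)
        valarr[i] = 0;
    return valarr;
}

/* Frees the array and clears the caller's pointer, so the stale
   address cannot be reused by accident after the release. */
void release_values(int **p)
{
    free(*p);
    *p = NULL;
}
```

Clearing the pointer is a convention of this sketch, not a requirement of free(); it makes an accidental second release harmless, because free(NULL) does nothing.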
22
OS Memory Management
A program cannot know the state of dynamic memory allocation
- Which memory locations are used and which are available?
Operating system keeps tables describing which memory locations are available
The program must request memory from the OS
- OS may deny request if there is no memory available
OS also protects memory
- Enforces memory access permissions
23
Scheduler
OS manages the execution state of each task
3 main states
1. Running – The task is currently running
2. Ready – The task is not running but it is ready to run
3. Blocked – The task is not ready because it is waiting for an event
Only one task can be running at a time
A task can only run if it is first ready (not blocked)
Scheduler must keep track of the state of each task
Scheduler must decide which ready task should run
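The three states and the rules around them can be modeled in a few lines. This is only a sketch of the bookkeeping, not a real kernel: the type and function names are hypothetical, and the selection policy (first ready task in the array) is an arbitrary assumption.

```c
#include <stddef.h>

typedef enum { TASK_RUNNING, TASK_READY, TASK_BLOCKED } task_state_t;

typedef struct {
    const char  *name;
    task_state_t state;
} task_t;

/* A blocked task becomes ready when the event it waits for arrives.
   Tasks in other states are left unchanged. */
void task_event_arrived(task_t *t)
{
    if (t->state == TASK_BLOCKED)
        t->state = TASK_READY;
}

/* Moves the running task (if any) back to ready, then runs the first
   ready task in the array. Blocked tasks are never selected.
   Returns the index of the running task, or -1 if all are blocked. */
int schedule(task_t *tasks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (tasks[i].state == TASK_RUNNING)
            tasks[i].state = TASK_READY;

    for (size_t i = 0; i < n; i++) {
        if (tasks[i].state == TASK_READY) {
            tasks[i].state = TASK_RUNNING;   /* at most one running task */
            return (int)i;
        }
    }
    return -1;
}
```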
24
Preemption
A non-preemptive scheduler allows a task to run until it gives up control of the CPU
- Task may call a library function (sleep) to give up the CPU
- Needs to be awakened by an event, like an interrupt
- Not much flexibility for OS to meet deadlines
A preemptive scheduler allows the OS to stop a running task and start another task
- OS has the power to influence the completion of tasks
- OS must be awakened periodically to make scheduling decisions
- May implement the OS kernel as a high priority timer-based interrupt
25
Scheduling Algorithms
Round-Robin: Scheduler keeps an ordered list of ready tasks
- First task is assigned a fixed-size time slice to execute
- After time slice is done, task is placed at the end of the list and next task executes for its time slice
Very simple, no priorities
[Figure: timeline of Task 1 and Task 2 showing task execution and context switch time]
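The ordered ready list behaves like a circular queue: the task at the front runs for one slice and is then appended at the back. A minimal sketch follows, assuming non-negative task ids and a small fixed capacity; all names (ready_list_t, ready_push, ready_pop, round_robin_step) are hypothetical, and the time slice itself is not modeled.

```c
#include <stddef.h>

#define MAX_TASKS 8   /* hypothetical capacity of the ready list */

typedef struct {
    int    ids[MAX_TASKS];
    size_t head;      /* index of the first task in the list */
    size_t count;     /* number of ready tasks */
} ready_list_t;

/* Appends a task id at the end of the list.
   Returns 0 on success, -1 if the list is full. */
int ready_push(ready_list_t *q, int id)
{
    if (q->count == MAX_TASKS)
        return -1;
    q->ids[(q->head + q->count) % MAX_TASKS] = id;
    q->count++;
    return 0;
}

/* Removes and returns the first task id, or -1 if the list is empty. */
int ready_pop(ready_list_t *q)
{
    if (q->count == 0)
        return -1;
    int id = q->ids[q->head];
    q->head = (q->head + 1) % MAX_TASKS;
    q->count--;
    return id;
}

/* One scheduling step: take the first task, let it run for one time
   slice (not modeled here), and place it at the end of the list.
   Returns the id of the task that ran, or -1 if nothing is ready. */
int round_robin_step(ready_list_t *q)
{
    int id = ready_pop(q);
    if (id >= 0)
        ready_push(q, id);
    return id;
}
```

With tasks 1, 2, and 3 in the list, successive steps run 1, 2, 3, 1, and so on, which is the "no priorities" behavior described above.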
26
Prioritized Scheduling
Fixed Priority Preemptive: Scheduler keeps an ordered list of ready tasks, ordered by priority
- First task is assigned a fixed-size time slice to execute
- After time slice is done, scheduler chooses highest priority ready task for next time slice
- Next task might be the same as the previous task, if it is high priority
Starvation may occur
[Figure: timeline of a low priority task and a high priority task]
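A short simulation makes the starvation risk concrete: as long as the high priority task stays ready, it wins every slice. The sketch assumes that a larger number means higher priority; the names prio_task_t, pick_highest, and run_slices are hypothetical.

```c
#include <stddef.h>

typedef struct {
    int priority;    /* larger value = higher priority (assumption of this sketch) */
    int ready;       /* nonzero if the task is ready to run */
    int slices_run;  /* number of time slices the task has received */
} prio_task_t;

/* Returns the index of the highest-priority ready task,
   or -1 if no task is ready. Ties go to the lower index. */
int pick_highest(const prio_task_t *tasks, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (tasks[i].ready &&
            (best < 0 || tasks[i].priority > tasks[best].priority))
            best = (int)i;
    }
    return best;
}

/* Simulates the given number of time slices.
   Each slice goes to whichever task pick_highest selects. */
void run_slices(prio_task_t *tasks, size_t n, int slices)
{
    for (int s = 0; s < slices; s++) {
        int i = pick_highest(tasks, n);
        if (i >= 0)
            tasks[i].slices_run++;
    }
}
```

If a priority-5 task and a priority-1 task are both ready, the low priority task receives no slices at all until the high priority task blocks or finishes.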
27
Atomic Updates
Tasks may need to share global data and resources
For some data, updates must be performed together to make sense
Ex. Our system samples the level of water in a tank
- tank_level is level of water
- time_updated is last update time
    tank_level = // Result of computation
    time_updated = // Current time
These updates must occur together for the data to be consistent
Interrupt could see new tank_level with old time_updated
28
Mutual Exclusion
While one task updates the shared variables, another task cannot read them
Task 1:
    tank_level = ?;
    time_updated = ?;
Task 2:
    printf("%i %i", tank_level, time_updated);
Two code segments should be mutually exclusive
If Task 2 is an interrupt, it must be disabled while Task 1 updates the variables
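When both tasks are threads, one way to make the two segments mutually exclusive is a Pthreads mutex. The sketch below assumes a POSIX system; the names tank_lock, tank_update, and tank_read are hypothetical. It does not cover the interrupt case, because an interrupt handler cannot block on a lock.

```c
#include <pthread.h>
#include <time.h>

static pthread_mutex_t tank_lock = PTHREAD_MUTEX_INITIALIZER;
static int    tank_level;
static time_t time_updated;

/* Writer (Task 1): both fields change inside one locked region. */
void tank_update(int level, time_t now)
{
    pthread_mutex_lock(&tank_lock);
    tank_level   = level;
    time_updated = now;
    pthread_mutex_unlock(&tank_lock);
}

/* Reader (Task 2): copies both fields under the same lock, so it
   never sees a new tank_level paired with an old time_updated. */
void tank_read(int *level, time_t *when)
{
    pthread_mutex_lock(&tank_lock);
    *level = tank_level;
    *when  = time_updated;
    pthread_mutex_unlock(&tank_lock);
}
```

The reader copies the pair out and prints it after unlocking, which keeps slow I/O outside the locked region.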
29
Semaphores
A semaphore is a flag which indicates that execution is safe
May be implemented as a binary variable: 1 continue, 0 wait
TakeSemaphore():
- If semaphore is available (1) then take it (set to 0) and continue
- If semaphore is not available (0) then block until it is available
ReleaseSemaphore():
- Set semaphore to 1 so that another task can take it
Only one task can have a semaphore at one time
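On a POSIX system, the two operations can be sketched on top of an unnamed semaphore from semaphore.h. TakeSemaphore and ReleaseSemaphore are the names used above; InitSemaphore and TryTakeSemaphore are hypothetical additions for this illustration, and the code assumes a platform where sem_init is supported.

```c
#include <errno.h>
#include <semaphore.h>

static sem_t sem;

/* Creates the semaphore in the "available" state (value 1).
   Returns 0 on success, -1 on error. */
int InitSemaphore(void)
{
    return sem_init(&sem, 0, 1);   /* 0: shared between threads of one process */
}

/* Blocks until the semaphore is available, then takes it (1 -> 0). */
void TakeSemaphore(void)
{
    while (sem_wait(&sem) == -1 && errno == EINTR)
        continue;                  /* interrupted by a signal: wait again */
}

/* Non-blocking variant: returns 1 if the semaphore was taken,
   0 if another task currently holds it. */
int TryTakeSemaphore(void)
{
    return sem_trywait(&sem) == 0;
}

/* Makes the semaphore available again (0 -> 1). */
void ReleaseSemaphore(void)
{
    sem_post(&sem);
}
```

Every TakeSemaphore must be paired with exactly one ReleaseSemaphore; releasing twice would raise the value to 2 and let two tasks in at once.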
30
Critical Regions
Task 1:
    TakeSemaphore();
    tank_level = ?;
    time_updated = ?;
    ReleaseSemaphore();
Task 2:
    TakeSemaphore();
    printf("%i %i", tank_level, time_updated);
    ReleaseSemaphore();
Semaphores are used to protect critical regions
Two critical regions sharing a semaphore are mutually exclusive
Each critical region is atomic, cannot be separated
31
POSIX Threads (Pthreads)
IEEE POSIX c: Standard for a C language API for thread control
All pthreads in a process share:
- Process ID
- Heap
- File descriptors
- Shared libraries
Each pthread maintains its own:
- Stack pointer
- Registers
- Scheduling properties (such as policy or priority)
- Set of pending and blocked signals
32
Thread-safeness
Ability to execute multiple threads concurrently without making shared data inconsistent
Don’t use library functions that aren’t thread-safe
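A common reason a function is not thread-safe is hidden static state shared by all callers. The sketch below contrasts the two designs with a pair of hypothetical functions, label_unsafe and label_safe; the standard library shows the same split between strtok, which keeps internal state, and the POSIX function strtok_r, which takes that state as a parameter.

```c
#include <stdio.h>

/* NOT thread-safe: every caller shares one static buffer, so a second
   thread can overwrite the text while the first is still reading it. */
const char *label_unsafe(int id)
{
    static char buf[32];
    snprintf(buf, sizeof buf, "task-%d", id);
    return buf;
}

/* Thread-safe: the caller supplies the buffer, so no state is shared
   between threads. Returns buf for convenience. */
const char *label_safe(int id, char *buf, size_t len)
{
    snprintf(buf, len, "task-%d", id);
    return buf;
}
```

Even in a single thread the problem is visible: two calls to label_unsafe return the same pointer, and the second call replaces the text produced by the first.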
33
Pthreads API
Four types of functions in the API
- Thread management: Routines that work directly on threads (creating, detaching, joining, etc.)
- Mutexes: Routines that deal with synchronization
- Condition variables: Routines that address communications between threads that share a mutex
- Synchronization: Routines that manage read/write locks and barriers
The pthread.h header file needs to be included in the source file
Compile with gcc -pthread
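Three of these groups usually appear together. The sketch below uses thread management, a mutex, and a condition variable so that one thread can wait for a value published by another; the variable and function names (data_lock, data_cond, producer, wait_for_value, run_once) are hypothetical.

```c
#include <pthread.h>

static pthread_mutex_t data_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  data_cond = PTHREAD_COND_INITIALIZER;
static int data_ready = 0;
static int data_value = 0;

/* Producer thread: publishes a value and signals the waiting thread. */
void *producer(void *arg)
{
    pthread_mutex_lock(&data_lock);
    data_value = *(int *)arg;
    data_ready = 1;
    pthread_cond_signal(&data_cond);
    pthread_mutex_unlock(&data_lock);
    return NULL;
}

/* Blocks until the producer has published, then returns the value. */
int wait_for_value(void)
{
    int v;
    pthread_mutex_lock(&data_lock);
    while (!data_ready)                             /* guards against spurious wakeups */
        pthread_cond_wait(&data_cond, &data_lock);  /* releases the lock while waiting */
    v = data_value;
    pthread_mutex_unlock(&data_lock);
    return v;
}

/* Starts the producer, waits for its value, then joins the thread.
   Returns the value, or -1 if the thread could not be created. */
int run_once(int value)
{
    pthread_t t;
    if (pthread_create(&t, NULL, producer, &value) != 0)
        return -1;
    int v = wait_for_value();
    pthread_join(t, NULL);     /* value stays alive until the thread is done */
    return v;
}
```

The flag data_ready carries the actual condition; the condition variable only wakes the waiter so it can re-check the flag under the mutex.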
34
Thread Management
pthread_create
- Creates a new thread and makes it executable
- Arguments
  - Thread: pthread_t pointer to return result
  - Attr: Initial attributes of the thread
  - Start_routine: Code for the thread to run
  - Arg: Argument for the code (void *)
pthread_exit
- Terminates a thread
- Does not close files on exit
35
Thread Management
Creates a set of threads, all running PrintHello
PrintHello takes an argument, the thread number
    int main (int argc, char *argv[])
    {
        pthread_t threads[NUM_THREADS];
        int rc;
        long t;
        for(t=0; t<NUM_THREADS; t++){
            printf("In main: creating thread %ld\n", t);
            rc = pthread_create(&threads[t], NULL, PrintHello, (void *)t);
            if (rc){
                printf("ERROR; return code is %d\n", rc);
                exit(-1);
            }
        }
        pthread_exit(NULL);
    }
36
Thread Management
Code run by each thread
Prints its own ID number
    void *PrintHello(void *threadid)
    {
        long tid;
        tid = (long)threadid;
        printf("Hello World! It's me, thread #%ld!\n", tid);
        pthread_exit(NULL);
    }
37
Joining Threads
Joining threads is a way of performing synchronization
Master blocks on pthread_join until worker exits
Worker must be made joinable via its attributes
38
Joining Example
    int main (int argc, char *argv[])
    {
        pthread_t aThread;
        pthread_attr_t attr;
        int rc;
        long t = 0;
        void *status;
        pthread_attr_init(&attr);
        pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
        rc = pthread_create(&aThread, &attr, BusyWork, (void *)t);
        pthread_attr_destroy(&attr);
        … // Do something
        rc = pthread_join(aThread, &status);
    }
pthread_attr_* define attributes of the thread (make it joinable)
pthread_attr_destroy frees the attribute structure
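The status argument of pthread_join receives whatever the worker returned, which gives the master a simple way to collect a result. The sketch below fills in a hypothetical body for BusyWork (summing 1..n) and wraps the create/join sequence in a hypothetical helper, run_and_join; passing small integers through the void * argument is a shortcut of this illustration.

```c
#include <pthread.h>
#include <stdint.h>

/* Worker: sums 1..n and hands the result back through its return value. */
void *BusyWork(void *arg)
{
    long n = (long)(intptr_t)arg;
    long sum = 0;
    for (long i = 1; i <= n; i++)
        sum += i;
    return (void *)(intptr_t)sum;
}

/* Creates one joinable worker, blocks until it exits, and returns
   the worker's result, or -1 if thread creation or joining failed. */
long run_and_join(long n)
{
    pthread_t aThread;
    pthread_attr_t attr;
    void *status;

    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
    int rc = pthread_create(&aThread, &attr, BusyWork, (void *)(intptr_t)n);
    pthread_attr_destroy(&attr);               /* attributes are no longer needed */
    if (rc != 0)
        return -1;

    if (pthread_join(aThread, &status) != 0)   /* master blocks here */
        return -1;
    return (long)(intptr_t)status;
}
```

Returning from the start routine has the same effect as calling pthread_exit with that value, so the joined status is the computed sum.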