Introduction to Embedded Systems Rabie A. Ramadan 4.

1 Introduction to Embedded Systems Rabie A. Ramadan 4

2 Memory Models: Stacks. A stack is a region of memory that is dynamically allocated to the program in a last-in, first-out (LIFO) pattern. A stack pointer (typically a register) contains the memory address of the top of the stack. Stacks are typically used to implement procedure calls.
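To make the LIFO behaviour and the role of the stack pointer concrete, here is a minimal sketch of an array-backed stack in C. The names `Stack`, `stack_push`, `stack_pop` and the capacity of 8 are assumptions made for illustration; a hardware call stack is managed by the compiler and processor, not by code like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define STACK_CAPACITY 8  /* assumed size, chosen only for illustration */

typedef struct {
    int data[STACK_CAPACITY];
    size_t sp;  /* index of the next free slot; plays the role of the stack pointer */
} Stack;

void stack_init(Stack *s) {
    assert(s != NULL);
    s->sp = 0;
}

/* Push a value; returns false instead of writing past the array. */
bool stack_push(Stack *s, int value) {
    assert(s != NULL);
    if (s->sp == STACK_CAPACITY) {
        return false;
    }
    s->data[s->sp++] = value;
    return true;
}

/* Pop the most recently pushed value; returns false if the stack is empty. */
bool stack_pop(Stack *s, int *out) {
    assert(s != NULL && out != NULL);
    if (s->sp == 0) {
        return false;
    }
    *out = s->data[--s->sp];
    return true;
}
```

Values come back in the reverse of the order in which they were pushed, which is the property procedure calls rely on.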

3 Memory Models: Stacks. In C, the compiler produces code that pushes onto the stack the location of the instruction to execute upon returning from the procedure, the current value of some or all of the machine registers, and the arguments to the procedure, and then sets the program counter equal to the location of the procedure code. Stack frame: the data for a procedure that is pushed onto the stack. When a procedure returns, the compiler-generated code pops its stack frame, retrieving the program location at which to resume execution.

5 Memory Models: Stacks. It can be disastrous if the stack pointer is incremented beyond the memory allocated for the stack (stack overflow). This results in overwriting memory that is being used for other purposes. The problem becomes particularly difficult with recursive programs, where a procedure calls itself. Embedded software designers often avoid using recursion to circumvent this difficulty.
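As a small illustration of why recursion is often avoided, the sketch below computes the same sum twice. The function names are hypothetical. The recursive version needs one stack frame per array element (unless the compiler happens to optimize the recursion away), while the iterative version uses a fixed amount of stack regardless of `n`.

```c
#include <assert.h>
#include <stddef.h>

/* Recursive: each element adds one stack frame, so stack use grows with n. */
long sum_recursive(const int *a, size_t n) {
    assert(a != NULL || n == 0);
    if (n == 0) {
        return 0;
    }
    return a[0] + sum_recursive(a + 1, n - 1);
}

/* Iterative: a single frame and a loop, so stack use is independent of n. */
long sum_iterative(const int *a, size_t n) {
    assert(a != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += a[i];
    }
    return total;
}
```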

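6 Misuse or misunderstanding of the stack. foo() returns the address of its local variable b, and the caller stores that address in c. Once foo() has returned, its stack frame has been popped, so c points to a stack location that will be overwritten by the next items pushed onto the stack. Using c then causes an addressing problem.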
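The code for this slide is not part of the transcript, so the following is only a sketch of the pattern being described, with a possible fix. The buggy `foo` in the comment is an assumption based on the names `foo`, `b` and `c` mentioned on the slide; `foo_into` is a hypothetical replacement in which the caller owns the storage.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed shape of the buggy pattern (do not use):
 *
 *   int *foo(int a) { int b = a * 10; return &b; }
 *
 * b lives in foo's stack frame, which is popped when foo returns,
 * so the returned pointer is dangling.
 */

/* One possible fix: the caller provides the storage, so the result
 * outlives the call. */
void foo_into(int a, int *out) {
    assert(out != NULL);
    *out = a * 10;
}
```

With this version, a caller can write `int c; foo_into(10, &c);` and `c` is an ordinary variable in the caller's own frame.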

7 Memory Protection Units. A key issue in systems that support multiple simultaneous tasks is preventing one task from disrupting the execution of another. Many processors provide memory protection in hardware. Tasks are assigned their own address space, and if a task attempts to access memory outside its own address space, a segmentation fault or other exception results. This will typically result in termination of the offending application.

8 Which part of the memory is utilized? Age, salary, myList, and twice are stored on the stack.

9 Which part of the memory is utilized? Heap memory, since the memory is allocated using malloc.

10 What is wrong with this code? If you run the above code, it may not give a segmentation fault immediately: free() returns the memory to the heap, and it is then up to the heap implementation when to take it back into its pool.
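The slide's code is not in the transcript; the remark about free() suggests it uses memory after freeing it. Under that assumption, one common defensive habit is to clear the pointer at the point of release, so that a later accidental use dereferences NULL rather than stale heap memory. `release` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdlib.h>

/* Free the buffer and clear the caller's pointer in one step.
 * Calling it twice is harmless because free(NULL) does nothing. */
void release(char **p) {
    assert(p != NULL);
    free(*p);
    *p = NULL;
}
```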

11 What is wrong with this code? We have a pointer 'p' to which no memory has been allocated. The garbage address held by 'p' is then passed to strcat(). When strcat() accesses 'p', the behavior is undefined, and the usual result is a segmentation fault.
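A sketch of the safer counterpart: the destination is a real buffer of known size, and the append is refused if it would not fit. `append_checked` is a hypothetical helper, not a standard library function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append src to the string already in dst, which has room for dst_size bytes.
 * Returns 0 on success, -1 if dst is not terminated or the result would not fit. */
int append_checked(char *dst, size_t dst_size, const char *src) {
    assert(dst != NULL && src != NULL);
    const char *end = memchr(dst, '\0', dst_size);
    if (end == NULL) {
        return -1;  /* no terminator inside the buffer */
    }
    size_t used = (size_t)(end - dst);
    size_t need = strlen(src);
    if (need >= dst_size - used) {
        return -1;  /* no room for src plus the terminator */
    }
    memcpy(dst + used, src, need + 1);
    return 0;
}
```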

12 What is wrong with this code? We access the second command-line argument (argv[1]) in the function func() without checking whether the user has provided it. If the user did not provide it, argv[1] is a null pointer, and dereferencing it will typically cause a segmentation fault.
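The missing check can be sketched as follows. `first_argument` is a hypothetical helper; the point is only that `argc` is consulted before `argv[1]` is used.

```c
#include <assert.h>
#include <stddef.h>

/* Return argv[1] if the user supplied it, otherwise NULL. */
const char *first_argument(int argc, char *argv[]) {
    assert(argv != NULL);
    return (argc > 1) ? argv[1] : NULL;
}
```

A caller would test the result against NULL and print a usage message instead of passing it on to func().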

13 What is wrong with this code? We allocate some bytes to pointer 'p' but then write far past those bytes in a loop. This is undefined behavior, and the usual result is a segmentation fault.

14 Memory Models: Dynamic Memory Allocation. General-purpose software applications often have indeterminate requirements for memory, depending on parameters and/or user input. To support such applications, computer scientists have developed dynamic memory allocation schemes, in which a program can at any time request that the operating system allocate additional memory. The memory is allocated from a data structure known as a heap, which facilitates keeping track of which portions of memory are in use by which application.

15 Memory Models: Dynamic Memory Allocation. Memory allocation occurs via a library or operating system call (such as malloc in C). When the program no longer needs access to memory that has been so allocated, it deallocates the memory (by calling free in C). It is possible for a program to inadvertently accumulate memory that is never freed. This is known as a memory leak. For embedded applications, which typically must continue to execute for a long time, it can be disastrous: the program will eventually fail when physical memory is exhausted.

16 Memory Models: Dynamic Memory Allocation. Memory fragmentation occurs when a program chaotically allocates and deallocates memory in varying sizes. A fragmented memory has allocated and free memory chunks interspersed, and often the free memory chunks become too small to use. In this case, defragmentation is required. Defragmentation and garbage collection are both very problematic for real-time systems. Straightforward implementations of these tasks require all other executing tasks to be stopped while the defragmentation or garbage collection is performed. Implementations using such "stop the world" techniques can have substantial pause times, sometimes running for many milliseconds.
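The slides do not prescribe a remedy, but one widely used way to sidestep fragmentation in embedded code is a statically sized pool of equal-sized blocks: since every block is interchangeable, free space can never be split into pieces that are too small to use. The sketch below is an illustration under that assumption; `BLOCK_SIZE`, `BLOCK_COUNT` and the `pool_*` names are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE 32   /* assumed payload size in bytes */
#define BLOCK_COUNT 4   /* assumed number of blocks */

/* A free block stores the link to the next free block in its own storage. */
typedef union Block {
    union Block *next;
    unsigned char payload[BLOCK_SIZE];
} Block;

static Block pool[BLOCK_COUNT];
static Block *free_list;

/* Thread all blocks onto the free list. */
void pool_init(void) {
    free_list = NULL;
    for (size_t i = 0; i < BLOCK_COUNT; i++) {
        pool[i].next = free_list;
        free_list = &pool[i];
    }
}

/* Take one block in constant time; returns NULL when the pool is exhausted. */
void *pool_alloc(void) {
    Block *b = free_list;
    if (b != NULL) {
        free_list = b->next;
    }
    return b;
}

/* Return a block obtained from pool_alloc to the free list. */
void pool_free(void *p) {
    assert(p != NULL);
    Block *b = p;
    b->next = free_list;
    free_list = b;
}
```

Allocation and release both take constant time and never need a stop-the-world pass, at the cost of wasting space when requests are smaller than a block.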

17 Programs

18 Code Compression. Memory is one of the key driving factors in embedded system design: larger memory means increased chip area, more power dissipation, and higher cost. Memory therefore imposes constraints on the size of the application programs. Code compression techniques address the problem by reducing the program size.

19 Traditional Code Compression. Compression is done off-line (prior to execution). The compressed program is loaded into memory. Decompression is done during program execution (online).

20 Dictionary-based Approach. Take advantage of commonly occurring instruction sequences by using a dictionary. The repeating occurrences are replaced by a codeword that points to the index of the dictionary entry that contains the pattern.
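A minimal sketch of the idea, using 8-bit patterns. The encoding is an assumption made for illustration and is not taken from the slides: each word costs one flag bit, followed by either a dictionary index of `index_bits` bits (match) or the 8 raw bits (no match). The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the dictionary index holding word, or -1 if it is not present. */
int dict_find(const uint8_t *dict, size_t dict_len, uint8_t word) {
    assert(dict != NULL || dict_len == 0);
    for (size_t i = 0; i < dict_len; i++) {
        if (dict[i] == word) {
            return (int)i;
        }
    }
    return -1;
}

/* Size of the compressed program in bits under the assumed encoding:
 *   matched word:   1 flag bit + index_bits
 *   unmatched word: 1 flag bit + 8 raw bits */
size_t compressed_bits(const uint8_t *prog, size_t len,
                       const uint8_t *dict, size_t dict_len,
                       unsigned index_bits) {
    assert(prog != NULL || len == 0);
    size_t bits = 0;
    for (size_t i = 0; i < len; i++) {
        if (dict_find(dict, dict_len, prog[i]) >= 0) {
            bits += 1 + index_bits;
        } else {
            bits += 1 + 8;
        }
    }
    return bits;
}
```

For the five-word program `{0xA5, 0x3C, 0xA5, 0xFF, 0x3C}` and the two-entry dictionary `{0xA5, 0x3C}`, four words match at 2 bits each and one does not at 9 bits, giving 17 bits against 40 uncompressed (dictionary storage not counted).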

21 Improved Dictionary-based Approach. Improve the dictionary-based compression technique by considering mismatches. Step 1: Determine the instruction sequences that differ in only a few bit positions (Hamming distance). Step 2: Store that information in the compressed program. Step 3: Update the dictionary (if necessary). The compression ratio will depend on how many bit changes are considered during compression.
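Step 1 can be sketched for the 1-bit case as follows, again on 8-bit patterns. The function names are hypothetical, and bit positions are counted from the left starting at 0 (so the sixth bit from the left is position 5).

```c
#include <assert.h>
#include <stdint.h>

/* Hamming distance: the number of bit positions in which a and b differ. */
unsigned hamming8(uint8_t a, uint8_t b) {
    uint8_t x = (uint8_t)(a ^ b);
    unsigned count = 0;
    while (x != 0) {
        count += x & 1u;
        x >>= 1;
    }
    return count;
}

/* If word differs from the dictionary entry in exactly one bit, return that
 * bit's position counted from the left (0 = most significant); otherwise -1. */
int single_mismatch_pos(uint8_t word, uint8_t entry) {
    if (hamming8(word, entry) != 1) {
        return -1;
    }
    uint8_t x = (uint8_t)(word ^ entry);
    assert(x != 0);  /* guaranteed by the distance check above */
    int pos = 0;
    while ((x & 0x80u) == 0) {
        x = (uint8_t)(x << 1);
        pos++;
    }
    return pos;
}
```

A compressor would store the dictionary index together with this position, and the decompressor would flip that one bit after the lookup.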

22 Example. This example considers only 1-bit changes. The third pattern (from the top) in the original program differs from the first dictionary entry (index 0) in the sixth bit position (from the left). The compression ratio for this example is 95%.

23 Code Compression Using Bit-Masks. Your reading homework (link). A presentation is required: I will randomly select one of you to explain it next time.

24 Memory Optimization Techniques

25 Platform-Independent Code Transformations. Code rewriting techniques for access locality and regularity, consisting of loop (and sometimes also data flow) transformations. Should this algorithm be implemented directly?

26 Code Rewriting Techniques for Access Locality and Regularity. A direct implementation results in high storage and bandwidth requirements (assuming that N is large), because the b[] signals have to be written to an off-chip background memory in the first loop and read back in the second loop.

27 Code Rewriting Techniques for Access Locality and Regularity. Rewriting the code using a loop merging transformation means that the b[] signals can be stored in registers up to the end of the accumulation, since they are consumed immediately after they have been produced. In the overall algorithm, this reduces memory bandwidth requirements significantly.
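The slide's own code is not in the transcript, so the following is a sketch of the same transformation on an assumed computation: b[i] is produced as a[i] * a[i] and then accumulated. In the two-loop form b[] is a full array; after merging, each b value is a scalar that can live in a register.

```c
#include <assert.h>
#include <stddef.h>

/* Two loops: all of b[] is written in the first loop and read back in the second. */
long accumulate_two_loops(const int *a, int *b, size_t n) {
    assert((a != NULL && b != NULL) || n == 0);
    for (size_t i = 0; i < n; i++) {
        b[i] = a[i] * a[i];
    }
    long acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += b[i];
    }
    return acc;
}

/* Merged loop: each b value is consumed right after it is produced,
 * so no intermediate array is needed. */
long accumulate_merged(const int *a, size_t n) {
    assert(a != NULL || n == 0);
    long acc = 0;
    for (size_t i = 0; i < n; i++) {
        int b = a[i] * a[i];
        acc += b;
    }
    return acc;
}
```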

28 Code Rewriting Techniques to Improve Data Reuse. It is important to optimize data transfers and storage to utilize the memory hierarchy efficiently. The compiler literature up to now has focused on improving data reuse by performing loop transformations. Hierarchical data reuse copies are added to the code, exposing the different levels of reuse.

29 Code Rewriting Techniques to Improve Data Reuse. Depends on knowledge of the memory hierarchy levels and their sizes. Still hard to implement as well as to understand. Only parts of the arrays are accessed in the inner loops, so make those parts ready in buffers.

30 Types of loop transformations © 2006 Elsevier Loop permutation changes order of loops. Index rewriting changes the form of the loop indexes. Loop unrolling copies the loop body. Loop splitting creates separate loops for operations in the loop body. Loop fusion or loop merging combines loop bodies. Loop padding adds data elements to an array to change how the array maps into memory.
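As one example from this list, loop unrolling can be sketched as below. The unroll factor of 4 and the function names are assumptions for illustration; a leftover loop handles lengths that are not a multiple of 4.

```c
#include <assert.h>
#include <stddef.h>

/* Reference version: one addition per iteration. */
long sum_rolled(const int *a, size_t n) {
    assert(a != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += a[i];
    }
    return total;
}

/* Unrolled by 4: the loop body is copied four times, so the loop test and
 * increment run a quarter as often. */
long sum_unrolled4(const int *a, size_t n) {
    assert(a != NULL || n == 0);
    long total = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        total += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    }
    for (; i < n; i++) {  /* remaining 0 to 3 elements */
        total += a[i];
    }
    return total;
}
```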

31 Loop permutation © 2006 Elsevier Changes the order of loop indices. Can help reduce the time needed to access matrix elements. 2-D arrays in C are stored in row-major order, so access the data row by row. Example: matrix-vector multiplication.
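The slide's example code is not in the transcript; the sketch below shows the same permutation on a matrix-vector product. The matrix is stored as a flat row-major array so that the stride of each access is visible in the index expression. Function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Column-outer order: the inner loop walks down a column, so consecutive
 * accesses to m are cols elements apart in the row-major array. */
void matvec_column_outer(const int *m, const int *v, int *out,
                         size_t rows, size_t cols) {
    assert(m != NULL && v != NULL && out != NULL);
    for (size_t r = 0; r < rows; r++) {
        out[r] = 0;
    }
    for (size_t c = 0; c < cols; c++) {
        for (size_t r = 0; r < rows; r++) {
            out[r] += m[r * cols + c] * v[c];
        }
    }
}

/* Row-outer order (loops permuted): the inner loop walks along a row,
 * so consecutive accesses to m are adjacent in memory. */
void matvec_row_outer(const int *m, const int *v, int *out,
                      size_t rows, size_t cols) {
    assert(m != NULL && v != NULL && out != NULL);
    for (size_t r = 0; r < rows; r++) {
        int acc = 0;
        for (size_t c = 0; c < cols; c++) {
            acc += m[r * cols + c] * v[c];
        }
        out[r] = acc;
    }
}
```

Both functions compute the same result; only the order in which memory is touched differs.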

32 Loop fusion © 2006 Elsevier Combines loop bodies.
Original loops:
for (i = 0; i < N; i++) x[i] = a[i] * b[i];
for (i = 0; i < N; i++) y[i] = a[i] * c[i];
After loop fusion:
for (i = 0; i < N; i++) { x[i] = a[i] * b[i]; y[i] = a[i] * c[i]; }
How might this help improve performance?

33 Buffer management © 2006 Elsevier In embedded systems, buffers are often used to communicate between subsystems. Excessive dynamic memory management wastes cycles and energy with no functional improvement. Many embedded programs use arrays that are statically allocated. Several loop transformations have been developed to make buffer management more efficient; in this example, the initialization of b[i][j] is moved closer to its use.
Before:
for (i=0; i<N; ++i) for (j=0; j<N-L; ++j) b[i][j] = 0;
for (i=0; i<N; ++i) for (j=0; j<N-L; ++j) for (k=0; k<L; ++k) b[i][j] += a[i][j+k];
After:
for (i=0; i<N; ++i) for (j=0; j<N-L; ++j) { b[i][j] = 0; for (k=0; k<L; ++k) b[i][j] += a[i][j+k]; }

34 Memory Estimation. One of the techniques is based on live elements (signals). It requires a dependency graph. In computer science, a dependency graph is a directed graph representing the dependencies of several instructions on each other.

35 Example

36 Let's build the dependency graph

38 Dependences: Instruction Dependency. The operation performed by a stage depends on the operation(s) performed by other stage(s). E.g., conditional branch: instruction I4 cannot be executed until the branch condition in I3 is evaluated and stored.

39 Dependences: Data Dependency. A source operand of instruction Ii depends on the result of executing a preceding instruction Ij (i > j). E.g., the operand of Ii cannot be fetched until the result of Ij is saved.

40 Data Dependency: write after write; read after write; write after read; read after read (does not cause a stall).
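The three stalling cases can be detected mechanically from the registers each instruction reads and writes. The sketch below assumes a simplified three-operand instruction format (one destination, two sources, register numbers as plain integers); the `Instr` type and the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified instruction: dest = src1 op src2. A value of -1 means "unused". */
typedef struct {
    int dest;
    int src1;
    int src2;
} Instr;

enum {
    DEP_NONE = 0,
    DEP_RAW  = 1,  /* read after write: later reads what earlier writes */
    DEP_WAR  = 2,  /* write after read: later writes what earlier reads */
    DEP_WAW  = 4   /* write after write: both write the same register */
};

/* Return a bitmask of the data dependences of `later` on `earlier`.
 * Read after read is not reported because it does not cause a stall. */
unsigned classify(const Instr *earlier, const Instr *later) {
    assert(earlier != NULL && later != NULL);
    unsigned deps = DEP_NONE;
    if (earlier->dest >= 0 &&
        (later->src1 == earlier->dest || later->src2 == earlier->dest)) {
        deps |= DEP_RAW;
    }
    if (later->dest >= 0 &&
        (earlier->src1 == later->dest || earlier->src2 == later->dest)) {
        deps |= DEP_WAR;
    }
    if (earlier->dest >= 0 && earlier->dest == later->dest) {
        deps |= DEP_WAW;
    }
    return deps;
}
```

For example, with I1: R1 = R2 + R3 and I2: R4 = R1 + R5, `classify` reports a read-after-write dependence because I2 reads R1 before I1's result is guaranteed to be written.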

41 Read after write

42 Example. Consider the execution of the following sequence of instructions on a five-stage pipeline consisting of IF, ID, OF, IE, and IS. Show all types of data dependency.

43 Answer

44 Memory Modeling. Based on the dependency and data flow graph, all variables that need to be preserved over more than one control step are stored in registers. The goal is to minimize the number of registers assigned to the variables, because the register count impacts the area of the resulting design.

45 Register Allocation by Graph Coloring. The lifetime of each variable is computed first. A graph is constructed whose nodes represent variables. The existence of an edge indicates that the lifetimes overlap, i.e., the two variables cannot share the same register; a register can only be shared by variables with non-overlapping lifetimes. Thus, the problem of minimizing the register count for a given set of variables and their lifetimes is equivalent to the graph coloring problem: assign a color to each node of the graph such that the total number of colors is minimum and no two adjacent nodes share the same color.
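A minimal sketch of the procedure, under stated assumptions: lifetimes are given as half-open intervals [start, end) in control steps, and colors are assigned by a simple greedy heuristic that gives each variable the lowest-numbered register not used by an overlapping, already-colored variable. Greedy coloring is not guaranteed to find the minimum on arbitrary graphs; the names and the `MAX_VARS` limit are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_VARS 16  /* assumed upper bound on the number of variables */

typedef struct {
    int start;  /* first control step in which the variable is live */
    int end;    /* one past the last control step in which it is live */
} Lifetime;

/* Two variables interfere (share an edge) if their lifetimes overlap. */
static bool overlaps(Lifetime a, Lifetime b) {
    return a.start < b.end && b.start < a.end;
}

/* Greedy coloring: reg[i] receives the register index chosen for variable i.
 * Returns the number of distinct registers used. */
int allocate_registers(const Lifetime *life, size_t n, int *reg) {
    assert(life != NULL && reg != NULL && n <= MAX_VARS);
    int used = 0;
    for (size_t i = 0; i < n; i++) {
        bool taken[MAX_VARS] = { false };
        for (size_t j = 0; j < i; j++) {
            if (overlaps(life[i], life[j])) {
                taken[reg[j]] = true;
            }
        }
        int r = 0;
        while (taken[r]) {
            r++;
        }
        reg[i] = r;
        if (r + 1 > used) {
            used = r + 1;
        }
    }
    return used;
}
```

With lifetimes [0,3), [1,4), [3,6) and [4,7), the first and third variables never overlap and can share a register, as can the second and fourth, so two registers suffice.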

46 The Phase Order Problem. In which order do we apply a number of optimizations to the program to achieve the greatest benefit? Original code. First optimization method: uses only one register. Can we optimize it more? Now the pipeline can be more efficient.

47 Multiple Memory Access Optimization © 2006 Elsevier Many microprocessors have instructions that load or store two or more registers. A typical example is the ARM7 processor: it has 'LDM' (load multiple) and 'STM' (store multiple) instructions.
