Storage 30-Nov-18

Parts of a computer
For purposes of this talk, we will assume three main parts to a computer:
- Main memory, once upon a time called core, but these days called RAM (Random Access Memory). RAM consists of a very long sequence of bits, organized into bytes (8-bit units) or words (longer units).
- Peripheral memory, these days called disks (even when they aren't) or drives. Peripheral memory consists of a very, very long sequence of bits, organized into pages of words or bytes. Peripheral memory is thousands of times slower than RAM.
- The CPU (Central Processing Unit), which manipulates these bits and moves them back and forth between main memory and peripheral memory.

It's all bits
Everything in a computer is represented by a sequence of bits: integers, floating point numbers, characters, and, most importantly, instructions. Bits are the ultimate flexible representation, at least until we have working quantum computers, which use qubits (quantum bits).
Modern languages use strong typing to prevent you from accidentally treating a floating point number as a boolean, or a string as an integer. A weakly typed language provides some protection, but there are ways around it.
But it wasn't always this way...

Storage is storage
At one time, words representing machine instructions and words representing data could be intermixed. Strong typing was a thing of the future; it was the programmer's responsibility to avoid executing data, or doing arithmetic on instructions. Both of these things could be done, either accidentally or deliberately.
Machine instructions are just a sequence of bits, so they can be manipulated like any other sequence of bits. Hence, programmers could change any instruction into any other instruction (of the same size), or rewrite whole blocks of instructions. A self-modifying program is one that changes its own instructions.

Self-modifying programs
Once upon a time, self-modifying programs were thought of as a good thing. Just think of how flexible your programs could be! (...yes, and smoking was once considered good for your health.)
The usual way to step through an array was by adding one to the address part of a load or store instruction. You could write some really clever self-modifying programs. But, as the poet Piet Hein says:

    Here's a good rule of thumb:
    Too clever is dumb.

Preparation for next example
In the next example, we will talk about how a higher-level language might be translated into assembly language. Here are some of the assembly instructions we will use:
- The load instruction copies a value from a memory location into a special register called the accumulator. Example: load 53 gets whatever is in location 53 and puts it into the accumulator.
- The enter instruction puts a given value into the accumulator. Example: enter 53 puts 53 itself into the accumulator.
- All arithmetic is done in the accumulator. Example: add 53 adds the contents of location 53 to the accumulator.
- The store instruction copies a value from the accumulator into memory. Example: store 53 puts whatever is in the accumulator into location 53.

Procedure calls
Consider the following:

    a = add(b, c);
    ...
    function add(x, y) {
        return x + y;
    }

Here's how it might have been translated to assembly language in the old days. (The values shown in red on the original slide, such as the return address stored into location 70 and the copies of b and c in locations 71 and 72, are filled in as the program runs.)

    42 [ 0 ]                          // a
    43 [ 10 ]                         // b
    44 [ 15 ]                         // c

    20 [ load from 43 ]               // addr of b
    21 [ store in 71 ]
    22 [ load from 44 ]               // addr of c
    23 [ store in 72 ]
    24 [ enter 27 ]                   // the return addr
    25 [ store in addr part of 70 ]
    26 [ jump to 73 ]
    27 [ store in 42 ]                // addr of a

    70 [ jump to 27 ]                 // gets return addr
    71 [ 10 ]                         // will receive b
    72 [ 15 ]                         // will receive c
    73 [ load value at addr 71 ]
    74 [ add value at addr 72 ]
    75 [ jump to 70 ]

Problems with the previous code
In this example, storage was static: you always knew where everything was (and it didn't move around).
If you called a function, you told it where to return to, by storing the return address in the function itself. Hence, you could call the function from (almost) anywhere, and it would find its way back. You stored the parameter values in the function itself.
This worked fine until recursion was invented. Recursion requires:
- Multiple return addresses
- Multiple copies of parameters and local variables
In other words, recursion requires dynamic storage.
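To see the problem concretely, here is a minimal C sketch (mine, not the slides'): each call to factorial needs its own copy of n and its own return address, so the parameter cannot live at one fixed, compiler-assigned location.

    /* Minimal sketch of why recursion needs dynamic storage. */
    #include <stdio.h>

    int factorial(int n) {             /* n gets a fresh stack slot per call */
        if (n <= 1) return 1;
        return n * factorial(n - 1);   /* this inner call must not clobber our n */
    }

    int main(void) {
        printf("%d\n", factorial(5));  /* prints 120 */
        return 0;
    }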

The end of an era
What really killed off self-modifying programs was the advent of timesharing computers. Multiple users, or at least multiple programs, could share the computer, taking turns. But there isn't always enough main memory to satisfy everybody: when one program is running, another program (or parts of it) may need to be copied to disk, and brought back in again later. This is really, really slow. If only the data changed, not the program, we wouldn't have to save the program (which is often the largest part) over and over and over...
Besides, with the new emphasis on understandable programs, self-modifying programs were turning out to be a Really Bad Idea. And think about what a security nightmare self-modifying programs could be!

An aside: compilers and loaders
Although self-modifying code is a bad idea, it is still necessary for computers to be able to create and modify machine instructions.
This is what a compiler does: it creates machine instructions. A loader takes a compiled program and puts it somewhere in computer memory; it can't always put it in the same place, so it has to be able to modify the addresses in the instructions.
Still, compilers and loaders don't modify themselves.

Static and dynamic storage
In the beginning, storage was static: you declared your variables at the beginning of the program, and that was all you got. A procedure or function with, say, three parameters got three words in which to store them. The parameters went in a fixed, known location in memory, assigned to them by the compiler. Recursion had not yet been invented.
The programming language Algol 60 introduced recursive functions and procedures. Parameters went onto a stack; hence, parameters were dynamically assigned to memory locations, not by the compiler, but by the running program itself. Storage was dynamically allocated and deallocated as needed.

Stacks
Stacks obey a simple regimen: last in, first out (LIFO). When you enter a function or procedure or method, storage is allocated for you on the stack; when you leave, the storage is released. In Java, this is even more fine-grained: storage is allocated and deallocated for individual blocks, and even for individual for statements.
Since this is so well-defined, your compiler writes the code to do it for you. But it's still dynamic, done by your running program.
Since virtually every language supports recursion these days (and all the popular languages do), computers typically provide machine-language instructions to simplify stack operations.
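As a tiny illustration (a C sketch of my own; the slide's point is language-independent), the storage for a block's variables comes and goes with the block:

    /* Stack storage is allocated and released per block. */
    #include <stdio.h>

    int main(void) {
        for (int i = 0; i < 3; i++) {   /* i exists only while the loop runs */
            int square = i * i;         /* allocated on block entry... */
            printf("%d\n", square);
        }                               /* ...released on block exit */
        /* i and square no longer exist here */
        return 0;
    }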

Heaps
Stacks are great, but they have their limitations. Suppose you want to write a method to read in an array:
- You enter the method, and declare the array, thus dynamically allocating space for it
- You read values into the array
- You return from the method and POOF! your array is gone
You need something more flexible: something where you have control over allocation and deallocation. The invention that allows this (which came somewhat later than the stack; I'm not sure when) is the heap. You explicitly get storage via malloc (C) or new (Java), and the storage remains until you are done with it.
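The scenario above, sketched in C (my own illustration; function names are hypothetical). The stack version hands back a pointer to storage that vanishes on return; the heap version's storage survives until it is explicitly freed.

    /* Stack vs. heap lifetime. */
    #include <stdlib.h>

    int *read_stack(void) {
        int values[10];                            /* allocated on the stack */
        return values;                             /* BUG: POOF! gone when we return */
    }

    int *read_heap(size_t n) {
        int *values = malloc(n * sizeof *values);  /* allocated on the heap */
        return values;                             /* fine: survives until free() */
    }

    int main(void) {
        int *a = read_heap(10);
        if (a != NULL) {
            a[0] = 42;          /* the heap array is still alive here */
            free(a);            /* ...and stays alive until we free it */
        }
        return 0;
    }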

Stacks vs. heaps
- Stack allocation and deallocation is very regular; heap allocation and deallocation is unpredictable.
- Stack allocation and deallocation is handled by the compiler; heap allocation is at the whim of the programmer. Heap deallocation may also be up to the programmer (C, C++) or handled by the programming language system (Java).
- Values on stacks are typically small and uniform in size. (In Java, arrays and objects don't go on the stack; references to them do.) Values on the heap can be any size.
- Stacks are tightly packed, with no wasted space; deallocation can leave gaps in the heap.

Implementing a heap
A heap is a single large area of storage. When the program requests a block of storage, it is given a pointer (reference) to some part of this storage that is not already in use. The task of the heap routines is to keep track of which parts of the heap are available and which are in use; to do this, the heap routines create a linked list of blocks of varying sizes.
Every block, whether available or in use, contains header information about the block. We will describe a simple implementation in which each block header contains two items of information:
- a pointer to the next block, and
- the size of this block

    +------------------------+
    | pointer to next block  |  (header)
    | size of this block     |  (header)
    | user data (an Object)  |  <- user gets from here on down
    |          ...           |
    +------------------------+

Anatomy of a block
Here is our simple block:

    ptr-2:    pointer to next block   (header)
    ptr-1:    size of this block      (header)
    ptr:      first word of user data (an Object)
      :
    ptr+N-1:  last word of user data

The user gets N words, from ptr to the end of the block. (Java Objects hold more information than this, for example, the class of the object.)
Notice that our implementation will return a pointer to the first word available to the user. Data with negative offsets are header data: ptr-1 contains the size of this block, including header information, and ptr-2 will be used to construct a free space list of available blocks.
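To make the offsets concrete, here is a small runnable sketch of this layout (my own, not the slides' code), modeling a word-addressed heap as a C array with "pointers" that are just indexes into it:

    /* The two-word block header at negative offsets from ptr. */
    #include <stdio.h>

    #define HEAP_WORDS 20
    long heap[HEAP_WORDS];          /* the entire heap */

    #define NEXT(p) heap[(p) - 2]   /* free-list link, stored at ptr-2 */
    #define SIZE(p) heap[(p) - 1]   /* block size incl. header, at ptr-1 */

    int main(void) {
        long freeList = 2;          /* one big free block; user area starts at 2 */
        NEXT(freeList) = 0;         /* 0 marks the end of the free list */
        SIZE(freeList) = HEAP_WORDS;
        printf("block at %ld: next=%ld, size=%ld\n",
               freeList, NEXT(freeList), SIZE(freeList));
        return 0;
    }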

The heap, I
Initially, the user has no blocks, and the free space list consists of a single block. In our implementation, we will allocate space from the end of the block. (In the diagrams below, blocks are identified by their user pointers, and free is the head of the free space list.)
To begin, let's assume that the user asks for a block of two words.

    [ 0-19]  free block (user ptr 2):   next = 0, size = 20    <- free

The heap, II
The user has asked for a block of size 2. The free block is reduced in size from 20 to 16 (two words asked for by the user, plus two for a new header). The new block has size 4, and its next field is not used.

    [ 0-15]  free block (user ptr 2):   next = 0, size = 16    <- free
    [16-19]  given to user (ptr 18):    size = 4, next unused

Next, assume the user asks for a block of three words.

The heap, III
The user has asked for a block of size 3. The free block is reduced in size from 16 to 11 (three words asked for by the user, plus two for a new header). The new block has size 5, and its next field is not used.

    [ 0-10]  free block (user ptr 2):   next = 0, size = 11    <- free
    [11-15]  given to user (ptr 13):    size = 5, next unused
    [16-19]  in use (ptr 18):           size = 4

Next, assume the user asks for a block of just one word.

The heap, IV
The user has asked for a block of size 1. The free block is reduced in size from 11 to 8 (one word for the user, plus two for a new header). The new block has size 3, and its next field is not used.

    [ 0- 7]  free block (user ptr 2):   next = 0, size = 8     <- free
    [ 8-10]  given to user (ptr 10):    size = 3, next unused
    [11-15]  in use (ptr 13):           size = 5
    [16-19]  in use (ptr 18):           size = 4

Next, the user releases the second block (at 13).

The heap, V
The user has released the block of size 5. The freed block is added to the front of the free space list: its next field is set to the old value of free, and free is set to point to this block.

    [ 0- 7]  free block (user ptr 2):   next = 0, size = 8
    [ 8-10]  in use (ptr 10):           size = 3
    [11-15]  free block (user ptr 13):  next = 2, size = 5     <- free
    [16-19]  in use (ptr 18):           size = 4

Next, the user requests a block of size 4. The first block on the free list isn't large enough, so we have to go to the next free block.

The heap, VI
The user has requested a block of size 4. The first free block (at 13) can supply only three words, so the request is filled from the second free block (at 2). That block is reduced in size from 8 to 2 (four words for the user, plus two for a new header); its next field does not change. The user gets a pointer to the new block.

    [ 0- 1]  free block (user ptr 2):   next = 0, size = 2
    [ 2- 7]  given to user (ptr 4):     size = 6, next unused
    [ 8-10]  in use (ptr 10):           size = 3
    [11-15]  free block (user ptr 13):  next = 2, size = 5     <- free
    [16-19]  in use (ptr 18):           size = 4

Now the user releases the smallest block (at 10). Again, this will be added to the beginning of the free space list.

The heap, VII
The user releases the smallest block (at 10). The freed block is added to the front of the free space list: its next field is set to the old value of free, and free is set to point to this block.

    [ 0- 1]  free block (user ptr 2):   next = 0, size = 2
    [ 2- 7]  in use (ptr 4):            size = 6
    [ 8-10]  free block (user ptr 10):  next = 13, size = 3    <- free
    [11-15]  free block (user ptr 13):  next = 2, size = 5
    [16-19]  in use (ptr 18):           size = 4

Now the user requests a block of size 4. Currently, we cannot satisfy this request: we have enough space, but no single block is large enough. However, free blocks 10 and 13 are adjacent to each other, so we can coalesce them.

The heap, VIII
Blocks at 10 and 13 have now been coalesced. The size of the new block is the sum of the sizes of the old blocks, and we had to adjust the links.

    [ 0- 1]  free block (user ptr 2):   next = 0, size = 2
    [ 2- 7]  in use (ptr 4):            size = 6
    [ 8-15]  free block (user ptr 10):  next = 2, size = 8     <- free
    [16-19]  in use (ptr 18):           size = 4

Now we can give the user a block of size 4.
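The whole walkthrough can be reproduced in a few dozen lines. This is my own runnable C sketch, not code from the slides: "pointers" are word indexes into an array, allocation is first-fit and carves the requested words plus a two-word header from the end of a free block (so the free block keeps its own header), release pushes blocks onto the front of the free list, and coalesce merges any free block with a free block that immediately follows it in memory.

    /* A word-addressed toy heap: 20 words, two-word headers. */
    #include <stdio.h>

    #define HEAP_WORDS 20
    static long heap[HEAP_WORDS];
    static long freeList;            /* user pointer of first free block; 0 = none */

    #define NEXT(p) heap[(p) - 2]    /* free-list link, at ptr-2 */
    #define SIZE(p) heap[(p) - 1]    /* block size incl. header, at ptr-1 */

    static void initHeap(void) {     /* heap I: one big free block */
        freeList = 2;
        NEXT(freeList) = 0;
        SIZE(freeList) = HEAP_WORDS;
    }

    /* First fit; carve n+2 words from the END of the chosen free block.
       The shrunken free block keeps its own two-word header. */
    static long allocate(long n) {
        for (long p = freeList; p != 0; p = NEXT(p)) {
            if (SIZE(p) >= n + 4) {
                SIZE(p) -= n + 2;        /* shrink the free block */
                long q = p + SIZE(p);    /* user pointer of the new block */
                SIZE(q) = n + 2;         /* new header; next field unused */
                return q;
            }
        }
        return 0;                        /* no single block is large enough */
    }

    static void release(long p) {        /* push onto the front of the free list */
        NEXT(p) = freeList;
        freeList = p;
    }

    /* Merge any free block with a free block immediately after it in memory. */
    static void coalesce(void) {
        for (int merged = 1; merged; ) {
            merged = 0;
            for (long p = freeList; p != 0 && !merged; p = NEXT(p)) {
                long *link = &freeList;
                for (long q = freeList; q != 0; link = &NEXT(q), q = NEXT(q)) {
                    if (q != p && q - 2 == p - 2 + SIZE(p)) {
                        SIZE(p) += SIZE(q);   /* absorb q into p */
                        *link = NEXT(q);      /* unlink q from the free list */
                        merged = 1;
                        break;
                    }
                }
            }
        }
    }

    int main(void) {
        initHeap();
        long a = allocate(2);            /* heap II:  returns 18 */
        long b = allocate(3);            /* heap III: returns 13 */
        long c = allocate(1);            /* heap IV:  returns 10 */
        release(b);                      /* heap V */
        long d = allocate(4);            /* heap VI:  returns 4 */
        release(c);                      /* heap VII */
        long e = allocate(4);            /* fails: returns 0 */
        coalesce();                      /* heap VIII: blocks 10 and 13 merge */
        long f = allocate(4);            /* now succeeds: returns 12 */
        printf("a=%ld b=%ld c=%ld d=%ld e=%ld f=%ld\n", a, b, c, d, e, f);
        return 0;
    }

Running it reproduces the states above: the size-4 request that fails in heap VII succeeds after coalescing.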

Pointers
Allocating storage from the heap is easy:

    Person p = new Person();

In Java, you request storage from the heap with new; there is no other way to get storage on the heap, and all Objects are on the heap.
In C and C++ you get a pointer to the new storage; in Java you get a reference. The implementation is identical; the difference is that there are more operations on pointers than on references. C and C++ provide operations on pointers; for example, they let you do arithmetic on pointers, such as p++. Pointers are pervasive in C and C++; you can't avoid them.
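For example, here is the kind of pointer arithmetic meant above (a small C sketch of my own):

    #include <stdio.h>

    int main(void) {
        int a[3] = {10, 20, 30};
        int *p = a;                        /* p points at a[0] */
        p++;                               /* pointer arithmetic: p now points at a[1] */
        printf("%d %d\n", *p, *(p + 1));   /* prints "20 30" */
        return 0;
    }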

Advantages/disadvantages
Pointers give you:
- Greater flexibility and (maybe) convenience
- A much more complicated syntax
- More ways to create hard-to-find errors
- Serious security holes
References give you:
- Less flexibility (no pointer arithmetic)
- Simpler syntax, more like that of other variables
- Much safer programs with fewer mysterious bugs
Pointer arithmetic is inherently unsafe: you can accidentally point to the wrong thing, and you cannot be sure of the type of the thing you are pointing to.

Deallocation
There are two potential errors when de-allocating (freeing) storage yourself:
- De-allocating too soon, so that you have dangling references (pointers to storage that has been freed and possibly reused). A dangling reference is not a null link; it points to something, you just don't know what.
- Forgetting to de-allocate, so that unused storage accumulates and you have a memory leak.
If you have to de-allocate storage yourself, a good strategy is to keep track of which function or method "owns" the storage. The function that owns the storage is responsible for de-allocating it; ownership can be transferred to another function or method. You just need a clearly defined policy for determining ownership. In practice, this is easier said than done.
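Both errors, sketched in C (my own illustration; variable names are hypothetical):

    #include <stdlib.h>

    int main(void) {
        /* Error 1: de-allocating too soon (a dangling reference). */
        int *p = malloc(sizeof *p);
        int *q = p;            /* two references to the same storage */
        free(p);               /* storage released (and possibly reused)... */
        /* q is now dangling: it is not null, it still points somewhere,
           you just don't know what. Using *q is undefined behavior. */
        (void)q;

        /* Error 2: forgetting to de-allocate (a memory leak). */
        int *r = malloc(100 * sizeof *r);
        r = NULL;              /* the last reference is gone; that storage
                                  can never be freed now */
        return 0;
    }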

Discipline
Most C/C++ advocates say:
- It's just a matter of being disciplined
- I'm disciplined, even if other people aren't
- Besides, there are good tools for finding memory problems
However: virtually all large C/C++ programs have memory problems.

Garbage collection
Garbage is storage that has been allocated but is no longer available to the program. It's easy to create garbage:
- Allocate some storage and save the pointer to it in a variable
- Assign a different value to that variable
A garbage collector automatically finds and de-allocates garbage. This is far safer (and more convenient) than having the programmer do it: dangling references cannot happen, and memory leaks, while not impossible, are pretty unlikely.
Practically every modern language, not including C++, uses a garbage collector.

Garbage collection algorithms
There are two well-known algorithms (and several not so well known ones) for doing garbage collection:
- Reference counting
- Mark and sweep

Reference counting
When a block of storage is allocated, it includes header data that contains an integer reference count. The reference count keeps track of how many references the program has to that block.
Any assignment to a reference variable modifies reference counts:
- If the variable previously referenced an object (was not null), the reference count of that object is decremented
- If the new value is an object (not null), the reference count for the new object is incremented
When a reference count reaches zero, the storage can immediately be garbage collected.
For this to work, the reference count has to be at a known displacement from the reference (pointer). If arbitrary pointer arithmetic is allowed, this condition cannot be guaranteed.
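Here is a minimal sketch of the scheme in C (my own code and names; the slides give none). Every object carries its count in a header field, and every assignment to a reference variable goes through one helper that adjusts both counts and immediately reclaims anything that drops to zero.

    #include <stdlib.h>
    #include <stdio.h>

    typedef struct Obj {
        int refCount;          /* header: how many references exist */
        int value;             /* the user's data */
    } Obj;

    static Obj *newObj(int v) {
        Obj *o = malloc(sizeof *o);
        o->refCount = 0;       /* no references yet; assign() will count them */
        o->value = v;
        return o;
    }

    /* Make *dest refer to src, maintaining both reference counts. */
    static void assign(Obj **dest, Obj *src) {
        if (src != NULL) src->refCount++;             /* one more reference */
        if (*dest != NULL && --(*dest)->refCount == 0) {
            printf("collecting object %d\n", (*dest)->value);
            free(*dest);       /* count hit zero: reclaim immediately */
        }
        *dest = src;
    }

    int main(void) {
        Obj *a = NULL;
        assign(&a, newObj(1)); /* object 1's count becomes 1 */
        assign(&a, newObj(2)); /* object 1's count drops to 0: collected */
        assign(&a, NULL);      /* object 2 collected too */
        return 0;
    }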

Problems with reference counting
If object A points to object B, and object B points to object A, then each is referenced, even if nothing else in the program references either one. This fools the garbage collector, which doesn't collect either object A or object B.
Thus, reference counting is imperfect and unreliable; memory leaks still happen. However, reference counting is a simple technique and is occasionally used.
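A C sketch of the cycle (my own illustration, managing a bare refCount field by hand rather than through a full collector):

    #include <stdlib.h>
    #include <stdio.h>

    typedef struct Obj {
        int refCount;
        struct Obj *ref;       /* one reference field is enough for a cycle */
    } Obj;

    int main(void) {
        Obj *a = calloc(1, sizeof *a);
        Obj *b = calloc(1, sizeof *b);
        a->refCount = 1;             /* referenced by variable a */
        b->refCount = 1;             /* referenced by variable b */
        a->ref = b; b->refCount++;   /* A points to B */
        b->ref = a; a->refCount++;   /* B points to A: a cycle */

        a->refCount--;               /* the program drops its reference to A */
        b->refCount--;               /* ...and to B */
        printf("counts: %d %d\n", a->refCount, b->refCount);   /* prints 1 1 */
        /* Neither count is zero, so neither object is ever collected,
           even though the program can no longer reach them: a leak. */
        return 0;
    }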

Mark and sweep
When memory runs low, languages that use mark-and-sweep temporarily pause the program and run the garbage collector:
- The collector marks every block
- It then does an exhaustive search, starting from every reference variable in the program, and unmarks all the storage it can reach
- When done, every block that is still marked must not be accessible from the program; it is garbage that can be freed
In order for this technique to work:
- It must be possible to find every block (so they are in a linked list)
- It must be possible to find and follow every reference; the mark has to be at a known displacement from the reference
Again, this is not compatible with arbitrary pointer arithmetic.
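Here is a toy mark-and-sweep collector in C (my own sketch; all names are hypothetical). One liberty: the slide marks every block and then unmarks the reachable ones, while this sketch uses the equivalent and more common convention of marking what is reachable and sweeping whatever is left unmarked. Every object lives on a master linked list so the sweep can find it, and each object holds at most one reference to keep the sketch short.

    #include <stdlib.h>
    #include <stdio.h>

    typedef struct Obj {
        struct Obj *allNext;   /* header: linked list of every block */
        int marked;            /* header: the mark bit */
        struct Obj *ref;       /* the object's one reference field */
        int value;
    } Obj;

    static Obj *allObjects = NULL;

    static Obj *newObj(int v) {
        Obj *o = calloc(1, sizeof *o);   /* zeroed: unmarked, no reference */
        o->value = v;
        o->allNext = allObjects;         /* link into the master list */
        allObjects = o;
        return o;
    }

    static void mark(Obj *o) {           /* follow references from a root */
        while (o != NULL && !o->marked) {
            o->marked = 1;               /* stopping at marks handles cycles */
            o = o->ref;
        }
    }

    static void sweep(void) {            /* walk the master list... */
        Obj **link = &allObjects;
        while (*link != NULL) {
            Obj *o = *link;
            if (o->marked) {
                o->marked = 0;           /* clear for the next collection */
                link = &o->allNext;
            } else {
                *link = o->allNext;      /* unreachable: unlink... */
                printf("sweeping object %d\n", o->value);
                free(o);                 /* ...and reclaim */
            }
        }
    }

    int main(void) {
        Obj *root = newObj(1);   /* reachable from a program variable */
        root->ref = newObj(2);   /* reachable through root */
        newObj(3);               /* garbage: nothing refers to it */
        mark(root);              /* mark from every root; here, just one */
        sweep();                 /* frees only object 3 */
        return 0;
    }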

Problems with mark and sweep
Mark-and-sweep is a complex algorithm that takes substantial time. Unlike reference counting, it must be done all at once; nothing else can be going on, so the program stops responding during garbage collection.
This is unsuitable for many real-time applications.

Garbage collection in Java
Java uses mark-and-sweep. Mark-and-sweep is highly reliable, but may cause unexpected slowdowns.
You can ask Java to do garbage collection at a time you feel is more appropriate: the call is System.gc(); but not all implementations respect your request.
This problem is known and is being worked on; there is also a "Real-time Specification for Java".

No garbage collection in C or C++
C and C++ do not have garbage collection; it is up to the programmer to explicitly free storage when it is no longer needed by the program.
C and C++ have pointer arithmetic, which means that pointers might point anywhere. There is no way to do reference counting, and no way to do mark-and-sweep, if the programming language does not have strict control over pointers.
Pointer arithmetic and garbage collection are incompatible: it is essentially impossible to have both.

The End