L6: Malloc Lab: Writing a Dynamic Storage Allocator
15-213 “The course that gives CMU its Zip!”
October 30, 2006
Topics: Memory Allocator (Heap)
Reminders: Malloc Lab due Nov 10, 2006
Section A (Donnie H Kim), recitation8.ppt (some slides from lecture notes)
Things that matter in this lab:
- Performance goals
  - Maximizing throughput
  - Maximizing memory utilization
- Implementation Issues (Design Space)
  - Free Block Organization
  - Placement Policy
  - Splitting
  - Coalescing
- And some advice
Some useful background
So what is memory allocation?
Allocators request additional heap memory from the operating system using the sbrk function.
Process memory layout, from high addresses to low:
- kernel virtual memory (memory invisible to user code)
- stack (top marked by %esp)
- memory mapped region for shared libraries
- run-time heap (via malloc), whose top is the “brk” ptr
- uninitialized data (.bss)
- initialized data (.data)
- program text (.text)
Malloc Package
#include <stdlib.h>
void *malloc(size_t size)
- If successful: returns a pointer to a memory block of at least size bytes, (typically) aligned to an 8-byte boundary. If size == 0, returns NULL.
- If unsuccessful: returns NULL (0) and sets errno.
void free(void *p)
- Returns the block pointed at by p to the pool of available memory.
- p must come from a previous call to malloc or realloc.
void *realloc(void *p, size_t size)
- Changes the size of block p and returns a pointer to the new block.
- Contents of the new block are unchanged up to the minimum of the old and new sizes.
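The three calls above fit together roughly as in the following minimal sketch. The helper name `grow_and_sum` is hypothetical and exists only for illustration; the point is the usual pattern of checking every return value and keeping the old pointer until realloc succeeds.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocates n ints, grows the array to 2n with realloc,
 * and returns the sum of the first n values, which realloc must preserve.
 * Returns -1 if any allocation fails. */
long grow_and_sum(size_t n)
{
    assert(n > 0);

    int *a = malloc(n * sizeof *a);
    if (a == NULL)
        return -1;
    for (size_t i = 0; i < n; i++)
        a[i] = (int)i;

    /* Keep the old pointer until realloc succeeds: on failure the old
     * block is still valid and still has to be freed. */
    int *bigger = realloc(a, 2 * n * sizeof *a);
    if (bigger == NULL) {
        free(a);
        return -1;
    }
    a = bigger;

    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];

    free(a);
    return sum;
}
```

Assigning the result of realloc straight back to `a` would leak the original block whenever realloc fails, which is why the sketch uses a second pointer.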
Allocation Examples
p1 = malloc(4)
p2 = malloc(5)
p3 = malloc(6)
free(p2)
p4 = malloc(2)
Performance goals
- Maximizing throughput (Temporal)
  - Defined as the number of requests that the allocator completes per unit time
- Maximizing Memory Utilization (Spatial)
  - Defined as the ratio of the requested memory size to the actual memory size used
- There is a tension between maximizing throughput and utilization!
- Find an appropriate balance between the two goals!
Keep this in mind; we will come back to these issues.
Implementation Issues
- Free Block Organization
  - How do we keep track of the free blocks?
  - How do we know how much memory to free given just a pointer?
- Placement Policy
  - How do we choose an appropriate free block?
- Splitting
  - What do we do with the extra space when allocating a structure that is smaller than the free block it is placed in?
- Coalescing
  - How do we reinsert a freed block?
(Figure: p0; free(p0); p1 = malloc(1))
Implementation Issues 1: Free Block Organization
- Identifying which blocks are free and which are allocated
- Available design choices for managing free blocks
  - Implicit List
  - Explicit List
  - Segregated List
- Header, Footer organization
  - Storing information about the block (size, allocated, freed)
Keeping Track of Free Blocks
- Method 1: Implicit list using lengths (links all blocks)
- Method 2: Explicit list among the free blocks, using pointers within the free blocks
- Method 3: Segregated free list
  - Different free lists for different size classes
- Method 4: Blocks sorted by size
  - Can use a balanced tree (e.g. Red-Black tree) with pointers within each free block, and the length used as a key
Free Block Organization
Free Block with Header
Format of allocated and free blocks:
- Header (1 word): size | a
- Payload
- Optional padding
where
- a = 1: allocated block; a = 0: free block
- size: block size
- payload: application data (allocated blocks only)
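The one-word header above can be sketched as follows. This assumes block sizes are always multiples of 8, so the low three bits of the size are zero and the lowest bit is free to hold the allocated flag. The function names `pack`, `hdr_size`, and `hdr_alloc` are illustrative, not part of the lab handout.

```c
#include <assert.h>
#include <stdint.h>

/* Combine a block size and an allocated bit into one 4-byte header word.
 * Assumption: size is a multiple of 8, so its low 3 bits are unused. */
uint32_t pack(uint32_t size, uint32_t alloc)
{
    assert(size % 8 == 0 && alloc <= 1);
    return size | alloc;
}

/* Recover the size by masking off the low 3 bits. */
uint32_t hdr_size(uint32_t hdr)
{
    return hdr & ~0x7u;
}

/* Recover the allocated flag from the lowest bit. */
uint32_t hdr_alloc(uint32_t hdr)
{
    return hdr & 0x1u;
}
```

This is the same idea as the PACK, GET_SIZE, and GET_ALLOC style of macros, written as functions so the compiler can type-check the arguments.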
Free Block Organization
Free Block with Header and Footer
Format of allocated and free blocks:
- Header: size | a
- Payload and padding
- Boundary tag (footer): size | a
where
- a = 1: allocated block; a = 0: free block
- size: total block size
- payload: application data (allocated blocks only)
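With a footer that mirrors the header, a block can reach both of its neighbors in constant time. The sketch below assumes 4-byte headers and footers and a block pointer `bp` that points at the payload; all helper names (`rd`, `wr`, `set_block`, and so on) are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WSIZE 4  /* assumed header/footer size in bytes */

/* Read and write one 4-byte word at an arbitrary heap address. */
uint32_t rd(const char *p)       { uint32_t v; memcpy(&v, p, WSIZE); return v; }
void     wr(char *p, uint32_t v) { memcpy(p, &v, WSIZE); }

/* bp points at the payload; the header sits one word before it. */
char    *hdrp(char *bp)     { return bp - WSIZE; }
uint32_t blk_size(char *bp) { return rd(hdrp(bp)) & ~0x7u; }

/* The footer is the last word of the block. */
char *ftrp(char *bp) { return bp + blk_size(bp) - 2 * WSIZE; }

/* Write a matching header and footer for a block of `size` bytes. */
void set_block(char *bp, uint32_t size, uint32_t alloc)
{
    assert(size % 8 == 0 && size >= 2 * WSIZE && alloc <= 1);
    wr(hdrp(bp), size | alloc);
    wr(ftrp(bp), size | alloc);  /* ftrp reads the header written just above */
}

/* Next block: skip forward by this block's size. */
char *next_blkp(char *bp) { return bp + blk_size(bp); }

/* Previous block: its footer is the word right before this block's header. */
char *prev_blkp(char *bp) { return bp - (rd(bp - 2 * WSIZE) & ~0x7u); }
```

Without the footer, `prev_blkp` would need a scan from the start of the heap, which is what makes coalescing with the previous block expensive in a header-only design.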
Implementation Issues 2: Placement Policy
“Placement Policy” choices
- First Fit
  - Search the free list from the beginning and choose the first free block that fits
- Next Fit
  - Start the search where the previous search left off
- Best Fit
  - Examine every free block and choose the smallest one that fits
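The three policies differ only in where the search starts and when it stops. As a minimal sketch, assume the free list is summarized as an array of free block sizes (a simplification; a real allocator walks headers or list pointers). The function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* First fit: index of the first block that can hold `need` bytes, or -1. */
int first_fit(const size_t *free_sizes, int n, size_t need)
{
    assert(free_sizes != NULL);
    for (int i = 0; i < n; i++)
        if (free_sizes[i] >= need)
            return i;
    return -1;
}

/* Next fit: like first fit, but starts at *rover (where the previous
 * search stopped), wraps around once, and updates *rover on success. */
int next_fit(const size_t *free_sizes, int n, size_t need, int *rover)
{
    assert(free_sizes != NULL && rover != NULL);
    for (int k = 0; k < n; k++) {
        int i = (*rover + k) % n;
        if (free_sizes[i] >= need) {
            *rover = i;
            return i;
        }
    }
    return -1;
}

/* Best fit: index of the smallest block that can hold `need`, or -1.
 * Always examines every free block. */
int best_fit(const size_t *free_sizes, int n, size_t need)
{
    assert(free_sizes != NULL);
    int best = -1;
    for (int i = 0; i < n; i++)
        if (free_sizes[i] >= need &&
            (best < 0 || free_sizes[i] < free_sizes[best]))
            best = i;
    return best;
}
```

For free sizes {32, 16, 64, 24} and a 24-byte request, first fit picks the 32-byte block while best fit picks the 24-byte block, at the cost of scanning the whole list.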
Implementation Issues 3: Splitting
“Splitting” design choices
- Using the entire free block
  - Simple, fast
  - Introduces internal fragmentation (a good placement policy might reduce this)
- Splitting
  - Split the free block into two parts when the second part can be used for other requests (reduces internal fragmentation)
(Figure: p1 = malloc(1))
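The split decision usually comes down to one comparison: is the leftover large enough to be a block of its own? A minimal sketch, assuming a 16-byte minimum block size (header, footer, and an 8-byte payload); `MIN_BLOCK` and `place_size` are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_BLOCK 16  /* assumed minimum block size in bytes */

/* Place a request of `asize` bytes (already aligned, overhead included)
 * into a free block of `fsize` bytes. Returns the size of the allocated
 * block and stores the size of the leftover free block in *remainder
 * (0 when the block is not split). */
size_t place_size(size_t fsize, size_t asize, size_t *remainder)
{
    assert(asize <= fsize && remainder != NULL);

    if (fsize - asize >= MIN_BLOCK) {
        /* The leftover can stand alone as a free block: split. */
        *remainder = fsize - asize;
        return asize;
    }

    /* The leftover is too small to be a block: hand out the whole
     * thing and accept the extra internal fragmentation. */
    *remainder = 0;
    return fsize;
}
```

Raising the threshold above `MIN_BLOCK` is one way to trade utilization for throughput, since fewer tiny remainders are created and later coalesced.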
Implementation Issues 4: Coalescing
- False fragmentation
  - Free memory is chopped into small, unusable free blocks
  - Coalesce adjacent free blocks to get a bigger free block
- Coalescing: a policy decision about when to perform it
  - Immediate coalescing
    - Merge any adjacent free blocks each time a block is freed
  - Deferred coalescing
    - Merge free blocks some time later, e.g. when an allocation request fails
- “Bidirectional Immediate Coalescing”, proposed by Donald Knuth, is good enough for this lab
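Bidirectional immediate coalescing has four cases, depending on whether the previous and next blocks are allocated. The sketch below only computes the resulting size and whether the merged block now starts at the previous block; the `blk_t` summary struct is hypothetical, and a real allocator would read the same information from the neighbors' boundary tags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of a neighboring block. */
typedef struct {
    size_t size;   /* total block size in bytes */
    int    alloc;  /* 1 = allocated, 0 = free   */
} blk_t;

/* Size of the free block that results from freeing a block of `size`
 * bytes between `prev` and `next`. Sets *starts_at_prev to 1 when the
 * merged block begins at the previous block's address. */
size_t coalesced_size(blk_t prev, size_t size, blk_t next, int *starts_at_prev)
{
    assert(starts_at_prev != NULL);
    *starts_at_prev = 0;

    if (!next.alloc)        /* next block free: absorb it */
        size += next.size;

    if (!prev.alloc) {      /* previous block free: merge into it */
        size += prev.size;
        *starts_at_prev = 1;
    }

    /* Both neighbors allocated: size is unchanged, nothing to merge. */
    return size;
}
```

After computing the new size, an implementation would rewrite the header at the start of the merged block and the footer at its end, and in an explicit list it would also unlink the absorbed neighbors from the free list.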
Performance goals
- Maximizing throughput (Temporal)
  - Defined as the number of requests that the allocator completes per unit time
- Maximizing Memory Utilization (Spatial)
  - Defined as the ratio of the requested memory size to the actual memory size used
- There is a tension between maximizing throughput and utilization!
- Find an appropriate balance between the two goals!
Performance goal (1) - Throughput
Throughput is mostly determined by the time spent searching for a free block, so how you keep track of free blocks determines search time.
- Naïve allocator
  - Never frees a block; just extends the heap whenever a new block is needed. Throughput is extremely high, but…? (Memory is never reused, so utilization is poor.)
- Implicit Free List
  - The allocator traverses the free blocks indirectly, by walking all of the blocks in the heap. Slow: search time grows with the total number of blocks, allocated and free.
- Explicit Free List
  - The allocator traverses the free blocks directly, visiting only the free blocks in the heap
- Segregated Free List
  - The allocator traverses only the particular free list that is likely to hold an appropriate free block
Performance goal (2) - Memory Utilization
Poor memory utilization is caused by fragmentation, which comes in two forms: internal and external fragmentation.
- Internal Fragmentation
  - Based on previous requests
  - Causes
    - The allocator imposes a minimum block size (depending on the allocator's choice of block format)
    - Satisfying alignment requirements
- External Fragmentation
  - Based on future requests
  - Aggregate free memory is enough, but no single free block is large enough to handle the request
Internal Fragmentation
- For some block, internal fragmentation is the difference between the block size and the payload size.
- Caused by the overhead of maintaining heap data structures, padding for alignment purposes, or explicit policy decisions (e.g., not to split the block).
- Depends only on the pattern of previous requests, and thus is easy to measure.
(Figure: a block with its payload in the middle and internal fragmentation on both sides)
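Internal fragmentation per block can be computed directly from the size adjustment the allocator applies. The sketch below assumes 8 bytes of overhead (a 4-byte header plus a 4-byte footer), 8-byte alignment, and a 16-byte minimum block; the constants and the names `block_size_for` and `internal_frag` are assumptions chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

#define ALIGNMENT 8   /* assumed alignment in bytes                   */
#define OVERHEAD  8   /* assumed: 4-byte header + 4-byte footer       */
#define MIN_BLOCK 16  /* assumed minimum block size in bytes          */

/* Block size actually used for a payload request: add the overhead,
 * round up to a multiple of ALIGNMENT, and enforce the minimum size. */
size_t block_size_for(size_t payload)
{
    assert(payload > 0);
    size_t asize = (payload + OVERHEAD + (ALIGNMENT - 1))
                   & ~(size_t)(ALIGNMENT - 1);
    return asize < MIN_BLOCK ? MIN_BLOCK : asize;
}

/* Internal fragmentation: block size minus payload size. */
size_t internal_frag(size_t payload)
{
    return block_size_for(payload) - payload;
}
```

Under these assumptions a 1-byte request occupies a 16-byte block and wastes 15 bytes, while a 24-byte request occupies 32 bytes and wastes only 8, so small requests dominate internal fragmentation.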
External Fragmentation
Occurs when there is enough aggregate heap memory, but no single free block is large enough.
p1 = malloc(4)
p2 = malloc(5)
p3 = malloc(6)
free(p2)
p4 = malloc(6)   oops!
External fragmentation depends on the pattern of future requests, and thus is difficult to measure.
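The condition for external fragmentation can be stated as a simple predicate over the free blocks: the total is enough, but the largest single block is not. In this minimal sketch the free blocks are again summarized as an array of sizes, and the function name is illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 when a request of `need` units fails only because the free
 * memory is scattered: the sum of the free blocks would cover it, but
 * no single free block does. Returns 0 otherwise. */
int externally_fragmented(const size_t *free_sizes, int n, size_t need)
{
    assert(free_sizes != NULL && n >= 0);

    size_t total = 0, largest = 0;
    for (int i = 0; i < n; i++) {
        total += free_sizes[i];
        if (free_sizes[i] > largest)
            largest = free_sizes[i];
    }
    return total >= need && largest < need;
}
```

For example, with hypothetical free blocks of 5 and 2 words, a 6-word request fails even though 7 words are free in total.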
The Malloc Lab
Assumptions
Assumptions made in Malloc Lab:
- The standard C library malloc always returns a payload pointer that is aligned to 8 bytes, and so should yours
- 64-bit architecture
  - Pointers are 8 bytes long!
  - size_t is now 8 bytes (unsigned long)
  - But requested sizes will fit in 4 bytes (less than 2^32 bytes), so you may use 4-byte headers and footers and get away with it
(Figure legend: free word, allocated word; an allocated block of 4 words and a free block of 2 words)
Porting to 64-bit Machine
Porting the code in your CS:APP text book to 64-bit:
    sizeof(long) == 4   // 32-bit
    sizeof(long) == 8   // 64-bit
The only significant difference is in the definitions of the GET and PUT macros.
Changes (to keep our 32-bit headers and footers):
    #define GET(p)      (*(size_t *)(p))                 // 32 bits
    #define GET(p)      (*(unsigned int *)(p))           // 64 bits
    #define PUT(p, val) (*(size_t *)(p) = (val))         // 32 bits
    #define PUT(p, val) (*(unsigned int *)(p) = (val))   // 64 bits
The check on mem_sbrk's return value also changes, because a pointer no longer fits in an int:
    if ((int)(bp = mem_sbrk(size)) < 0)                  // 32 bits
    if ((long)(bp = mem_sbrk(size)) < 0)                 // 64 bits
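The reason for switching to `unsigned int` is that each access must touch exactly 4 bytes; a `size_t` access on a 64-bit machine would read or write 8 bytes and clobber the neighboring word. The sketch below checks this on two adjacent words. It assumes `unsigned int` is 4 bytes, which holds on the usual 64-bit Linux targets; the function name `put_is_4_bytes` is made up for the demonstration.

```c
#include <assert.h>
#include <stddef.h>

/* 4-byte header access, as on the slide (with an explicit cast on val). */
#define GET(p)      (*(unsigned int *)(p))
#define PUT(p, val) (*(unsigned int *)(p) = (unsigned int)(val))

/* Writes two adjacent 4-byte words and confirms that writing the first
 * one does not disturb the second. Returns 1 if both values survive. */
int put_is_4_bytes(void)
{
    unsigned int words[2] = {0, 0};

    assert(sizeof(unsigned int) == 4);  /* assumption stated in the text */

    PUT(&words[1], 0xAAAAAAAAu);        /* neighbor, e.g. start of payload */
    PUT(&words[0], 24 | 1);             /* header: size 24, allocated      */

    return GET(&words[0]) == 25 && GET(&words[1]) == 0xAAAAAAAAu;
}
```

A type such as `uint32_t` from `<stdint.h>` would state the 4-byte intent more directly than `unsigned int`, at the cost of differing from the textbook macros.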
Using MACROS: why?
    #include <stdio.h>

    /* Read or write one 8-byte word (e.g. a pointer) at address p */
    #define GET8(p)      (*(unsigned long *)(p))
    #define PUT8(p, val) (*(unsigned long *)(p) = (unsigned long)(val))

    void test(void *p, void *pval)
    {
        unsigned long *newpval;

        /* Reading and writing pointers the hard way */
        *(unsigned long *)p = (unsigned long)pval;
        newpval = (unsigned long *)(*(unsigned long *)p);
        printf("pval=%p newpval=%p\n", pval, (void *)newpval);

        /* Reading and writing pointers the easy way */
        PUT8(p, pval);
        newpval = (unsigned long *)GET8(p);
    }

    int main()
    {
        char *pval = (char *)0x99;
        char buf[128];

        test(&buf[0], pval);
        return 0;
    }
Approach Advice
- Start with the implicit list implementation in your text book, and understand every detail of it
- When you finish your implicit list, start thinking about your heap checker
  - The more time you spend on this, the more time you will save later
- Go on and start implementing an explicit list with several placement policies
  - Modularize, and save each of your placement policies for comparison
- When you finish your explicit list, you will want to add more checks to your heap checker; do this right away
- Once you feel your explicit list is robust, move on to the segregated free list
  - We are looking for a good segregated free list implementation
- You can go further by trying other schemes such as balanced trees, but a solid segregated free list implementation is good enough for full credit
- You can also try some tweaks based on the given trace files
Heap Checker (10 pts)
Basic Checks Guidelines (5/10 pts)
Check Heap (while working on the implicit list)
- Check the epilogue and prologue blocks
- Each block’s address alignment (8 bytes)
- Heap boundaries
- Check each block’s header and footer
  - Size (minimum size, alignment)
  - prev/next allocate/free bit consistency (explicit list)
  - Header and footer matching each other
- Check your coalescing
  - All blocks are coalesced correctly (no two consecutive free blocks in the heap)
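Several of the basic checks above can be done in a single walk over the heap. The following is a minimal sketch, not the checker the lab expects: it assumes the CS:APP implicit-list layout (4-byte header before each payload, 4-byte footer at the end of each block, a size-0 allocated epilogue header) and a 16-byte minimum block. The names `word_at` and `check_heap` and the parameter convention are assumptions for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Read one 4-byte word from the heap. */
uint32_t word_at(const char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Walks the blocks from payload pointer `first_bp` to the epilogue.
 * `heap_end` is the address just past the epilogue header.
 * Returns the number of problems found (0 means the heap looks sane). */
int check_heap(const char *first_bp, const char *heap_end, int verbose)
{
    assert(first_bp != NULL && heap_end != NULL);
    int errors = 0, prev_free = 0;
    const char *bp = first_bp;

    for (;;) {
        if (bp > heap_end) {                    /* heap boundaries */
            if (verbose) fprintf(stderr, "walked past the end of the heap\n");
            return errors + 1;
        }
        uint32_t hdr  = word_at(bp - 4);
        uint32_t size = hdr & ~0x7u;

        if (size == 0) {                        /* epilogue header */
            if (!(hdr & 1)) {
                if (verbose) fprintf(stderr, "epilogue not marked allocated\n");
                errors++;
            }
            break;
        }
        if ((uintptr_t)bp % 8 != 0) {           /* payload alignment */
            if (verbose) fprintf(stderr, "%p: not 8-byte aligned\n", (const void *)bp);
            errors++;
        }
        if (size < 16 || size > (size_t)(heap_end - bp)) {
            if (verbose) fprintf(stderr, "%p: bad size %u\n", (const void *)bp, size);
            return errors + 1;                  /* size untrustworthy: stop walking */
        }
        if (hdr != word_at(bp + size - 8)) {    /* header/footer match */
            if (verbose) fprintf(stderr, "%p: header != footer\n", (const void *)bp);
            errors++;
        }
        int is_free = !(hdr & 1);
        if (is_free && prev_free) {             /* missed coalescing */
            if (verbose) fprintf(stderr, "%p: two consecutive free blocks\n", (const void *)bp);
            errors++;
        }
        prev_free = is_free;
        bp += size;
    }
    return errors;
}
```

Returning an error count rather than aborting on the first problem makes it easy to call the checker after every malloc and free while debugging, and to print everything that is wrong at once.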
Heap Checker (10 pts)
Free List Checks Guidelines (5/10 pts)
Check Free List (while working on the explicit free list)
- All next/prev pointers are consistent (if A’s next pointer points to B, B’s prev pointer should point to A)
- All free list pointers point between mem_heap_lo() and mem_heap_hi()
- Count the free blocks by iterating over every block, count them again by traversing the free list by pointers, and see if the two counts match
- You are encouraged to add more checks as you wish
Check Segregated Free List (while working on the segregated free list)
- All blocks in each list bucket fall within the bucket's size range
- Be creative
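The pointer consistency and range checks above can be sketched for a doubly linked free list as follows. The `free_blk` struct is a hypothetical stand-in for the prev/next pointers stored inside a free block's payload, and `lo` and `hi` play the role of the values returned by mem_heap_lo() and mem_heap_hi().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of the link fields stored inside a free block. */
typedef struct free_blk {
    struct free_blk *prev;
    struct free_blk *next;
} free_blk;

/* Walks the list from `head`. Returns the number of free blocks on the
 * list, or -1 if a pointer falls outside [lo, hi] or a prev link does
 * not point back at the node we came from.
 * Note: this sketch does not detect cycles; a real checker might bound
 * the walk by the number of blocks in the heap. */
int check_free_list(const free_blk *head, const void *lo, const void *hi)
{
    assert(lo != NULL && hi != NULL);

    int count = 0;
    const free_blk *prev = NULL;

    for (const free_blk *b = head; b != NULL; prev = b, b = b->next) {
        if ((const char *)b < (const char *)lo ||
            (const char *)b > (const char *)hi)
            return -1;          /* pointer outside the heap */
        if (b->prev != prev)
            return -1;          /* A->next == B but B->prev != A */
        count++;
    }
    return count;
}
```

The returned count can then be compared with the number of free blocks found by walking every block in the heap, which is the third check on the slide.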
Style (10 pts)
This will be some of the most difficult and sophisticated code you have written so far in your career.
Things we are looking for:
- Explain your high-level design at the front of your code (2 pts)
- Each function should be preceded by a header comment (2 pts)
- Comment properly inside each function (2 pts)
- Decompose into functions and use as few global variables as possible (2 pts)
- Use macros, inline functions, and the C preprocessor wisely (2 pts)
Please try to write clean code that is readable and self-explanatory!
- For you
- For your Teaching Staff
- And for world peace
Debugging Techniques
Guidelines for Debugging
- Testing your code intensively even when it seems to work is good programming practice; try to learn the process from this lab
- You can print out all the information and monitor it
  - Do this when you have just started
  - Do this when the trace file is small
- You can also print error messages only when something is wrong
  - Printing and monitoring everything becomes painful when trace files are huge
  - In that case, just print errors
Debugging Tips
Guidelines for using mdriver’s options
- Use the ./mdriver -c <file> option to run a particular trace file just once, checking only correctness
  - By default, ./mdriver runs your allocator multiple times to estimate its throughput, using the k-best measurement scheme (if you are interested, refer to ch 9 and the mdriver source code)
- Use the ./mdriver -v <level> option to set the verbosity level
- It is sometimes useful to have layers of debugging depth
  - Can also use #define, #ifdef, #if
- Make sure to turn all checking routines off completely when measuring performance; they do affect performance
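One common way to get layers of debugging depth that vanish completely in performance runs is to route all debug output and checks through macros controlled by a compile-time flag. The sketch below is one possible arrangement, not something the lab handout prescribes: `DEBUG_LEVEL`, `dbg_printf`, `dbg_assert`, `CHECKHEAP`, `fake_checker`, and `demo_free` are all names assumed for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed compile-time knob: build with -DDEBUG_LEVEL=1 to enable checks.
 * With the default of 0, every debug macro expands to nothing. */
#ifndef DEBUG_LEVEL
#define DEBUG_LEVEL 0
#endif

#if DEBUG_LEVEL >= 1
#define dbg_printf(...)  fprintf(stderr, __VA_ARGS__)
#define dbg_assert(e)    assert(e)
#define CHECKHEAP(call)  ((void)(call))
#else
#define dbg_printf(...)  ((void)0)
#define dbg_assert(e)    ((void)0)
#define CHECKHEAP(call)  ((void)0)
#endif

int checks_run = 0;  /* counts checker invocations in this sketch */

/* Stand-in for a real heap checker. */
int fake_checker(void)
{
    return ++checks_run;
}

/* Stand-in for an allocator routine instrumented with debug macros. */
void demo_free(void *p)
{
    dbg_assert(p != NULL);
    dbg_printf("free(%p)\n", p);
    CHECKHEAP(fake_checker());
    (void)p;  /* avoid an unused-parameter warning when debugging is off */
}
```

Because the disabled macros do not evaluate their arguments, the checker call costs nothing when `DEBUG_LEVEL` is 0, which matters when mdriver is measuring throughput.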
More Hints?
Going further (beyond a solid segregated list)
- Before trying this, make sure your allocator is doing what you intended, using your heap and free list checkers
- If you think you have implemented a solid segregated free list, try focusing on the trace files that give you the worst performance results
More Hints?
Some possible points of attack
- In malloc(), you have to adjust the requested size to meet the alignment and minimum block size requirements
  - It turns out that how you adjust the size affects the performance on some trace files
- Sometimes it is better to force your allocator to avoid splitting a free block, by using a block larger than the requested size
  - This will obviously increase internal fragmentation, but it can also increase throughput by avoiding repeated splitting and coalescing
- How much will you extend your heap when you have to extend it?
- How do you classify blocks into free lists?
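For the last question, one simple classification is power-of-two size classes. The sketch below is only one of many possible schemes, and the boundaries are arbitrary assumptions: class 0 holds blocks of up to 16 bytes, each following class doubles the upper limit, and the last class holds everything larger. `NUM_CLASSES` and `size_class` are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CLASSES 10  /* assumed number of segregated free lists */

/* Map a block size to a free list index.
 * Class 0: up to 16 bytes, class 1: 17..32, class 2: 33..64, ...,
 * class NUM_CLASSES-1: everything larger than the previous limit. */
int size_class(size_t size)
{
    assert(size > 0);

    int idx = 0;
    size_t limit = 16;
    while (idx < NUM_CLASSES - 1 && size > limit) {
        limit <<= 1;
        idx++;
    }
    return idx;
}
```

Tuning the class boundaries against the trace files, for instance giving common small sizes their own exact-size lists, is the kind of adjustment this slide is hinting at.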
Questions?