Linux Kernel Development Memory Management Pavel Sorokin Gyeongsang National University
2 Overview
Unlike user space, the kernel is not always afforded the capability to easily allocate memory.
- Pages
- Zones
- Memory allocation procedures
- Slab layer
3 Pages
- Physical pages are the basic unit of memory management
- Each architecture enforces its own page size
    - most 32-bit architectures use 4 KB pages
    - some 64-bit architectures (for example, Alpha) use 8 KB pages, while others such as x86-64 use 4 KB
- The kernel represents every physical page on the system with a struct page structure

    struct page {
            page_flags_t          flags;
            atomic_t              _count;
            atomic_t              _mapcount;
            unsigned long         private;
            struct address_space  *mapping;
            pgoff_t               index;
            struct list_head      lru;
            void                  *virtual;
    };

- flags – stores the status of the page
- _count – how many references there are to this page
- virtual – stores the page's virtual address
- The goal of the page structure is to describe physical memory, not the data contained therein: the data may move (for example, between a cache and a physical page), while the structure stays tied to the physical page
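Because one struct page exists per physical page, the cost of these descriptors grows with the amount of memory. The following is a minimal user-space sketch of that arithmetic, not kernel code. The function names are hypothetical, and the descriptor size passed in is an assumption (the real size of struct page depends on kernel version and configuration).

```c
#include <assert.h>
#include <stdint.h>

/* Number of physical pages covering mem_bytes for a given page size. */
uint64_t page_count(uint64_t mem_bytes, uint64_t page_size)
{
    assert(page_size != 0);
    return mem_bytes / page_size;
}

/*
 * Total bytes spent on page descriptors, one descriptor per page.
 * struct_size is an assumed sizeof(struct page), supplied by the caller.
 */
uint64_t page_struct_overhead(uint64_t mem_bytes, uint64_t page_size,
                              uint64_t struct_size)
{
    return page_count(mem_bytes, page_size) * struct_size;
}
```

For instance, under the assumption of 4 GB of memory, 8 KB pages, and a 40-byte descriptor, page_struct_overhead(4ULL << 30, 8192, 40) evaluates to 20 MB spread over 524,288 pages, a small fraction of total memory.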
4 Zones
- Because of hardware limitations, the kernel cannot treat all pages as identical, so it divides pages into different zones
    - ZONE_DMA – pages that are capable of undergoing DMA
    - ZONE_NORMAL – normal, regularly mapped pages
    - ZONE_HIGHMEM – "high memory", pages that are not permanently mapped into the kernel's address space
- Zones do not have any physical relevance; they are simply logical groupings used by the kernel to keep track of pages
- Each zone is represented by a struct zone
    - lock – spin lock that protects the structure from concurrent access
    - free_pages – number of free pages in this zone
    - name – NULL-terminated string holding the name of the zone
5 Zones
- ZONE_DMA – some architectures cannot perform DMA (direct memory access) to all memory addresses
    - on x86, ZONE_DMA consists of memory from 0 to 16 MB
- ZONE_HIGHMEM – some architectures cannot directly map all physical memory into the kernel's address space; what counts as high memory varies by architecture
    - on x86, ZONE_HIGHMEM consists of memory above 896 MB
- ZONE_NORMAL – whatever is left over after ZONE_DMA and ZONE_HIGHMEM
    - on x86, ZONE_NORMAL consists of memory from 16 MB to 896 MB
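The x86 boundaries above can be written down as a small classification function. This is a user-space sketch for illustration only: the enum mirrors the zone names from the slides, while x86_zone_for_phys and the limit macros are hypothetical names, and the real kernel assigns pages to zones at boot rather than computing the zone per address like this.

```c
#include <assert.h>
#include <stdint.h>

enum zone_type { ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM };

#define MB                (1024ULL * 1024ULL)
#define X86_DMA_LIMIT     (16 * MB)   /* ZONE_DMA covers 0 to 16 MB */
#define X86_HIGHMEM_START (896 * MB)  /* ZONE_HIGHMEM starts at 896 MB */

/* Map a physical address to its zone using the x86 limits quoted above. */
enum zone_type x86_zone_for_phys(uint64_t phys_addr)
{
    if (phys_addr < X86_DMA_LIMIT)
        return ZONE_DMA;
    if (phys_addr < X86_HIGHMEM_START)
        return ZONE_NORMAL;
    return ZONE_HIGHMEM;
}
```

On a machine with less than 896 MB of memory, every address falls into the first two branches, which matches the point that ZONE_HIGHMEM can be empty.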
6 Memory Allocation Procedures
- The kernel provides one low-level mechanism for requesting memory, along with several interfaces for allocating and freeing it
- Core function: allocates 2^order contiguous physical pages and returns a pointer to the first page's page structure, or NULL on error
    struct page * alloc_pages(unsigned int gfp_mask, unsigned int order)
- To convert a given page to its logical address
    void * page_address(struct page * page)
- Function that allocates pages and returns the logical address of the first page
    unsigned long __get_free_pages(unsigned int gfp_mask, unsigned int order)
- If only one page is needed
    struct page * alloc_page(unsigned int gfp_mask)
    unsigned long __get_free_page(unsigned int gfp_mask)
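The order argument is a power-of-two exponent: an order of n requests 2^n contiguous pages. The sketch below models that relationship in user space. MODEL_PAGE_SIZE, pages_for_order, and order_for_size are hypothetical names introduced here, and the 4 KB page size is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL  /* assumed page size for this sketch */

/* Number of contiguous pages requested by a given order: 2^order. */
unsigned long pages_for_order(unsigned int order)
{
    return 1UL << order;
}

/*
 * Smallest order whose allocation covers size bytes.
 * Sizes are assumed to be far below the range where the shift overflows.
 */
unsigned int order_for_size(size_t size)
{
    unsigned int order = 0;

    while ((MODEL_PAGE_SIZE << order) < size)
        order++;
    return order;
}
```

Note that a request for three pages rounds up to order 2, which is four pages; the allocator hands out power-of-two blocks only, so the fourth page goes unused unless the caller finds a use for it.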
7 Memory Allocation Procedures
- When pages are no longer needed, they should be freed
- A family of functions frees allocated pages
    void __free_pages(struct page *page, unsigned int order)
    void free_pages(unsigned long addr, unsigned int order)
    void free_page(unsigned long addr)
- Be careful to free only pages you allocated: passing the wrong page, address, or order can result in corruption
8 Memory Allocation Procedures
- For more general byte-sized allocations, the kernel provides other functions
- Function that allocates byte-sized chunks; the memory is physically contiguous
    void * kmalloc(size_t size, int flags)
- Function that allocates byte-sized chunks, but the memory is only virtually contiguous
    void * vmalloc(unsigned long size)
- The matching functions for freeing
    void kfree(const void * ptr)
    void vfree(const void * ptr)
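The usual calling pattern is to allocate, check for NULL, and release the memory with the matching free function. The sketch below shows that pattern in user space. The kmalloc() and kfree() defined here are stand-ins built on malloc() and free() so the example can run outside the kernel; they are not the kernel implementations, the GFP_KERNEL value is a placeholder, and struct sample with its helpers is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define GFP_KERNEL 0  /* placeholder value; the stand-in ignores flags */

/* User-space stand-ins with the signatures shown on the slide. */
void *kmalloc(size_t size, int flags)
{
    (void)flags;
    return malloc(size);
}

void kfree(const void *ptr)
{
    free((void *)ptr);
}

struct sample {
    int  id;
    char name[16];
};

/* Allocate and initialize an object; returns NULL if allocation fails. */
struct sample *sample_create(int id, const char *name)
{
    struct sample *s = kmalloc(sizeof(*s), GFP_KERNEL);

    if (!s)
        return NULL;  /* the caller must handle allocation failure */
    s->id = id;
    strncpy(s->name, name, sizeof(s->name) - 1);
    s->name[sizeof(s->name) - 1] = '\0';
    return s;
}

/* Release an object obtained from sample_create(). */
void sample_destroy(struct sample *s)
{
    kfree(s);
}
```

The same shape applies to vmalloc() and vfree(); the difference lies in the memory returned, not in how the caller checks and releases it.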
9 Flags of Memory Allocation
- gfp_mask flag
- The flags are broken up into three categories
    - action modifiers
    - zone modifiers
    - types
- All the flags are declared in
10 Flags of Memory Allocation
Action modifiers
Specify how the kernel is supposed to allocate the requested memory

    Flag            Description
    __GFP_WAIT      The allocator can sleep
    __GFP_HIGH      The allocator can access emergency pools
    __GFP_IO        The allocator can start disk I/O
    __GFP_FS        The allocator can start filesystem I/O
    __GFP_COLD      The allocator should use cache-cold pages
    __GFP_NOWARN    The allocator will not print failure messages
    __GFP_REPEAT    The allocator will repeat the allocation if it fails, but the allocation can still fail
    __GFP_NOFAIL    The allocator will repeat the allocation indefinitely; the allocation cannot fail
    __GFP_NORETRY   The allocator will never retry if the allocation fails
    __GFP_NO_GROW   Used internally by the slab layer
    __GFP_COMP      Add compound page metadata; used internally by the hugetlb code
11 Flags of Memory Allocation
Zone modifiers
Specify from which memory zone the allocation should originate

    Flag            Description
    __GFP_DMA       Allocate only from ZONE_DMA
    __GFP_HIGHMEM   Allocate from ZONE_HIGHMEM or ZONE_NORMAL

- By default, the kernel allocates memory from ZONE_NORMAL
- If neither flag is specified, the kernel fulfills the allocation from either ZONE_DMA or ZONE_NORMAL, with a strong preference for ZONE_NORMAL
12 Flags of Memory Allocation
Type flags
Specify the action and zone modifiers required to fulfill a particular type of transaction

    Flag           Modifier flags                                        Description
    GFP_ATOMIC     __GFP_HIGH                                            Allocation is high priority and must not sleep
    GFP_NOIO       __GFP_WAIT                                            Allocation can block, but must not initiate disk I/O
    GFP_NOFS       (__GFP_WAIT | __GFP_IO)                               Allocation can block and initiate disk I/O, but will not initiate a filesystem operation
    GFP_KERNEL     (__GFP_WAIT | __GFP_IO | __GFP_FS)                    Normal allocation; might block
    GFP_USER       (__GFP_WAIT | __GFP_IO | __GFP_FS)                    Normal allocation; might block. For user-space processes
    GFP_HIGHUSER   (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HIGHMEM)    Allocation from ZONE_HIGHMEM; might block. For user-space processes
    GFP_DMA        __GFP_DMA                                             Allocation from ZONE_DMA. For device drivers
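Each type flag is just a bitwise OR of modifiers, so what an allocation is permitted to do can be read off by testing individual bits. The sketch below composes the type flags exactly as listed in the table. The numeric bit values are illustrative assumptions chosen for this example, not values taken from the kernel headers, and the gfp_may_* helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Modifier bits; the numeric values are illustrative only. */
#define __GFP_DMA     0x01u
#define __GFP_HIGHMEM 0x02u
#define __GFP_WAIT    0x10u
#define __GFP_HIGH    0x20u
#define __GFP_IO      0x40u
#define __GFP_FS      0x80u

/* Type flags composed as in the table above. */
#define GFP_ATOMIC   (__GFP_HIGH)
#define GFP_NOIO     (__GFP_WAIT)
#define GFP_NOFS     (__GFP_WAIT | __GFP_IO)
#define GFP_KERNEL   (__GFP_WAIT | __GFP_IO | __GFP_FS)
#define GFP_USER     (__GFP_WAIT | __GFP_IO | __GFP_FS)
#define GFP_HIGHUSER (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HIGHMEM)
#define GFP_DMA      (__GFP_DMA)

/* True if an allocation with this mask is allowed to sleep. */
bool gfp_may_sleep(unsigned int mask)
{
    return (mask & __GFP_WAIT) != 0;
}

/* True if an allocation with this mask may start disk I/O. */
bool gfp_may_start_io(unsigned int mask)
{
    return (mask & __GFP_IO) != 0;
}

/* True if an allocation with this mask may start filesystem I/O. */
bool gfp_may_start_fs(unsigned int mask)
{
    return (mask & __GFP_FS) != 0;
}
```

Read this way, GFP_NOFS is GFP_KERNEL with the filesystem bit removed, and GFP_ATOMIC carries none of the bits that would let the allocator block.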
13 Flags of Memory Allocation
Which flag to use when
In most cases, GFP_KERNEL or GFP_ATOMIC is used

    Situation                               Solution
    Process context, can sleep              Use GFP_KERNEL
    Process context, cannot sleep           Use GFP_ATOMIC, or perform your allocations with GFP_KERNEL at an earlier or later point when you can sleep
    Interrupt handler                       Use GFP_ATOMIC
    Softirq                                 Use GFP_ATOMIC
    Tasklet                                 Use GFP_ATOMIC
    Need DMA-able memory, can sleep         Use (GFP_DMA | GFP_KERNEL)
    Need DMA-able memory, cannot sleep      Use (GFP_DMA | GFP_ATOMIC), or perform your allocation at an earlier point when you can sleep
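The decision table can be condensed into a single function: pick GFP_KERNEL only when sleeping is allowed, otherwise GFP_ATOMIC, and OR in GFP_DMA when DMA-able memory is required. The following is a sketch under stated assumptions: enum exec_context and choose_gfp are hypothetical names, and the flag values are illustrative rather than taken from the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values; only the composition matters here. */
#define GFP_DMA    0x01u
#define GFP_ATOMIC 0x20u
#define GFP_KERNEL (0x10u | 0x40u | 0x80u)

/* Execution contexts from the table above (hypothetical enum). */
enum exec_context {
    CTX_PROCESS_CAN_SLEEP,
    CTX_PROCESS_CANNOT_SLEEP,
    CTX_INTERRUPT_HANDLER,
    CTX_SOFTIRQ,
    CTX_TASKLET
};

/* Choose an allocation mask for the given context and DMA requirement. */
unsigned int choose_gfp(enum exec_context ctx, bool need_dma)
{
    /* Only process context that may sleep can use GFP_KERNEL. */
    unsigned int mask = (ctx == CTX_PROCESS_CAN_SLEEP) ? GFP_KERNEL
                                                       : GFP_ATOMIC;

    if (need_dma)
        mask |= GFP_DMA;
    return mask;
}
```

The function cannot express the table's other option for contexts that cannot sleep, which is to restructure the code so the allocation happens earlier or later, when GFP_KERNEL is safe. That remains a design decision for the caller.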
14 Slab Layer
- Free list – a mechanism that facilitates frequent allocation and deallocation of data
- A free list contains a block of available, already allocated data structures
- When code requires a new instance of a data structure, it can grab one of the structures off the free list rather than allocate a new one
- When a data structure is no longer needed, it is returned to the free list instead of being deallocated
- The main problem is that there is no global control of free lists
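A free list as described above can be sketched in a few lines. This is a generic user-space illustration, not code from the kernel: struct item, struct free_list, and the three functions are hypothetical, and malloc() stands in for whatever allocator backs the list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct item {
    struct item *next;  /* links items while they sit on the free list */
    int payload;
};

struct free_list {
    struct item *head;  /* available, already allocated items */
    size_t cached;      /* number of items currently on the list */
};

/* Take an item off the free list, allocating only if the list is empty. */
struct item *item_get(struct free_list *fl)
{
    struct item *it = fl->head;

    if (it) {
        fl->head = it->next;
        fl->cached--;
    } else {
        it = malloc(sizeof(*it));
        if (!it)
            return NULL;
    }
    it->next = NULL;
    it->payload = 0;
    return it;
}

/* Return an item to the free list instead of deallocating it. */
void item_put(struct free_list *fl, struct item *it)
{
    it->next = fl->head;
    fl->head = it;
    fl->cached++;
}

/* Give all cached items back to the underlying allocator. */
void free_list_drain(struct free_list *fl)
{
    while (fl->head) {
        struct item *it = fl->head;

        fl->head = it->next;
        free(it);
    }
    fl->cached = 0;
}
```

Nothing outside the owning code ever calls free_list_drain(), which is the lack of global control noted above: under memory pressure, the rest of the system has no way to ask this list to shrink.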
15 Slab Layer
- The slab layer was created to solve the problem of global free list control
- Frequently used data structures tend to be allocated and freed often, so cache them
- Free lists are arranged contiguously to prevent memory fragmentation
- Free lists improve performance for frequently allocated and freed data structures, because a freed object can be handed straight to the next allocation
- If part of the cache is made per-processor, allocations and frees can be performed without an SMP lock
16 Slab Layer
Design of the slab layer
- The slab layer divides different objects into groups called caches
- There is one cache per object type
- The caches are then divided into slabs
- Each slab contains some number of objects
- Each slab is in one of three states
    - full – all objects in the slab are allocated; no free objects
    - partial – the slab has some allocated objects and some free objects
    - empty – no allocated objects in the slab; all objects are free
- When the kernel requests a new object
    - the request is satisfied from a partial slab, if one exists
    - otherwise, the request is satisfied from an empty slab, if one exists
    - otherwise, a new empty slab is allocated
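The three slab states and the order in which slabs are tried can be modelled with a pair of small functions. This is a simplified user-space sketch: struct slab_model, slab_state_of, and pick_slab are hypothetical, and a flat array replaces the separate lists a real cache would keep.

```c
#include <assert.h>

enum slab_state { SLAB_EMPTY, SLAB_PARTIAL, SLAB_FULL };

/* Simplified slab: only the counts needed to derive its state. */
struct slab_model {
    unsigned int inuse;     /* allocated objects in the slab */
    unsigned int capacity;  /* total objects the slab can hold */
};

/* Derive the state of a slab from its object counts. */
enum slab_state slab_state_of(const struct slab_model *s)
{
    if (s->inuse == 0)
        return SLAB_EMPTY;
    if (s->inuse == s->capacity)
        return SLAB_FULL;
    return SLAB_PARTIAL;
}

/*
 * Pick the slab that should satisfy the next allocation:
 * the first partial slab, else the first empty slab.
 * Returns -1 when every slab is full and a new slab must be created.
 */
int pick_slab(const struct slab_model *slabs, int count)
{
    int first_empty = -1;

    for (int i = 0; i < count; i++) {
        enum slab_state st = slab_state_of(&slabs[i]);

        if (st == SLAB_PARTIAL)
            return i;
        if (st == SLAB_EMPTY && first_empty < 0)
            first_empty = i;
    }
    return first_empty;
}
```

Preferring partial slabs keeps allocations packed together, so empty slabs stay empty and remain candidates for being released.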
17 Slab Layer
[Figure: relationship between caches, slabs, and objects (a cache contains slabs; each slab contains objects)]
- Each cache is represented by a kmem_cache_s structure
- The kmem_cache_s structure contains three lists: slabs_full, slabs_partial, slabs_empty
- Each slab is represented by a struct slab

    struct slab {
            struct list_head  list;       /* full, partial, or empty list */
            unsigned long     colouroff;  /* offset for the slab coloring */
            void              *s_mem;     /* first object in the slab */
            unsigned int      inuse;      /* allocated objects in the slab */
            kmem_bufctl_t     free;       /* first free object, if any */
    };
18 Slab Layer
Slab layer memory management
- Memory for new slabs is allocated with __get_free_pages()
- Memory for slabs is released with kmem_freepages()
- The slab layer invokes the page allocator only when a given cache has no partial or empty slabs
- The slab layer is managed on a per-cache basis through a simple interface, which is exported to the entire kernel
- The interface allows the creation and destruction of caches and the allocation and freeing of objects within the caches