

Process Address Space
- The Process's Address Space
- The Memory Descriptor
- Memory Regions
- Page Fault Exception Handler
- Creating and Deleting a Process Address Space
- Managing the Heap

Why the Simple Approach Works for the Kernel
- A kernel function gets dynamic memory in a fairly straightforward manner, by calling __get_free_pages( ) or alloc_pages( ).
- The simple approach works for two reasons:
  - The kernel is the highest-priority component of the system, so there is no point in deferring its requests.
  - The kernel trusts itself: all kernel functions are assumed to be error-free.

User Mode Processes
- When allocating memory to a User Mode process, the situation is different:
  - Process requests for dynamic memory are considered non-urgent, so the kernel tries to defer allocating dynamic memory to User Mode processes.
  - When a User Mode process asks for dynamic memory, it does not get additional page frames. Instead, it gets the right to use a new range of linear addresses, which becomes part of its address space. This interval is called a memory region.
  - User programs cannot be trusted, so the kernel must be prepared to catch all addressing errors caused by processes in User Mode.
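The deferred allocation described above can be observed from user space. The following is a minimal sketch, assuming a Linux system where mincore( ) reports page residency for anonymous mappings; the names toy_lazy_allocation and TOY_NPAGES are illustrative and not part of the slides. It maps a region, counts resident pages, touches one page, and counts again.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* for mincore() and MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define TOY_NPAGES 16            /* hypothetical region size, in pages */

/* Count how many pages of [addr, addr + len) are resident in RAM. */
static int toy_count_resident(void *addr, size_t len)
{
    unsigned char vec[TOY_NPAGES];
    int n = 0;

    if (mincore(addr, len, vec) != 0)
        return -1;
    for (size_t i = 0; i < TOY_NPAGES; i++)
        n += vec[i] & 1;
    return n;
}

/*
 * Map a new anonymous region, then touch its first page.
 * Stores the resident-page counts before and after the write.
 * Returns 0 on success, -1 on failure.
 */
int toy_lazy_allocation(int *before, int *after)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    size_t len = (size_t)page * TOY_NPAGES;
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    *before = toy_count_resident(p, len);  /* region exists, no access yet */
    p[0] = 1;                              /* first access: page fault     */
    *after = toy_count_resident(p, len);

    munmap(p, len);
    return (*before < 0 || *after < 0) ? -1 : 0;
}
```

On a typical Linux system the first count is expected to be lower than the second, because the page frame is assigned only when the page is first accessed. The exact numbers depend on the kernel and the environment.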

Process's Address Space
- The address space of a process consists of all linear addresses that the process is allowed to use.
- Each process sees a different set of linear addresses; the addresses used by one process bear no relationship to those used by another.
- The kernel may dynamically modify a process address space by adding or removing intervals of linear addresses, called memory regions.

Process's Address Space: Typical Contents
- A memory map of the executable file's code, called the text section
- A memory map of the executable file's initialized global variables, called the data section
- A memory map of the zero page containing uninitialized global variables, called the bss section
- A memory map of the zero page used for the process's user-space stack
- An additional text, data, and bss section for each shared library
- Any memory-mapped files
- Any shared memory segments
- Any anonymous memory mappings (heap), such as those associated with malloc()
Notes:
- Zero page: a page consisting of all zeros, used for purposes such as these.
- BSS: block started by symbol.

Memory Region
- The initial address and the length of a memory region must be multiples of 4096 (the page size).
- Situations in which a process gets new memory regions:
  - Creating a new process: the new process is assigned a fresh address space, and thus a set of memory regions.
  - Loading an entirely different program: the process ID is unchanged, but the old memory regions are released and new ones are assigned after loading.
  - Performing a "memory mapping" on a file: the kernel assigns a new memory region to the process.
  - Creating an IPC shared memory region.
- Situations in which the kernel expands an existing memory region:
  - The User Mode stack has used up all the addresses of its memory region.
  - The process expands its dynamic area (the heap).
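The alignment rule above can be sketched in a few lines. This is an illustration only, assuming the 4096-byte page size stated on the slide; TOY_PAGE_SIZE, toy_page_align, and toy_region_is_valid are hypothetical names, not kernel identifiers.

```c
#include <stdbool.h>

#define TOY_PAGE_SIZE 4096UL     /* page size assumed by the slide */

/* Round a length up to the next multiple of the page size. */
unsigned long toy_page_align(unsigned long len)
{
    return (len + TOY_PAGE_SIZE - 1) & ~(TOY_PAGE_SIZE - 1);
}

/* A region is acceptable only if both its start and length are page multiples. */
bool toy_region_is_valid(unsigned long start, unsigned long len)
{
    return len != 0 &&
           start % TOY_PAGE_SIZE == 0 &&
           len % TOY_PAGE_SIZE == 0;
}
```

A request for 4097 bytes, for example, is rounded up to two pages (8192 bytes) before a region is created.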

System calls related to memory region creation and deletion

| System call | Description |
|---|---|
| brk( ) | Changes the heap size of the process |
| execve( ) | Loads a new executable file, thus changing the process address space |
| _exit( ) | Terminates the current process and destroys its address space |
| fork( ) | Creates a new process, and thus a new address space |
| mmap( ), mmap2( ) | Creates a memory mapping for a file, thus enlarging the process address space |
| mremap( ) | Expands or shrinks a memory region |
| remap_file_pages( ) | Creates a non-linear mapping for a file |
| munmap( ) | Destroys a memory mapping for a file, thus contracting the process address space |
| shmat( ) | Attaches a shared memory region |
| shmdt( ) | Detaches a shared memory region |
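As an illustration of two of the calls listed above, the sketch below creates a file memory mapping with mmap( ) and destroys it with munmap( ). It assumes a POSIX system; the function name toy_map_file and the use of a temporary file are choices made for this example only.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* for fileno() under strict compiler modes */
#endif
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Write a short message to a temporary file, map the file into the
 * address space, and check that the mapped bytes match the file contents.
 * Returns 0 on success, -1 on failure.
 */
int toy_map_file(void)
{
    static const char msg[] = "mapped";
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;

    if (fwrite(msg, 1, sizeof msg, f) != sizeof msg || fflush(f) != 0) {
        fclose(f);
        return -1;
    }

    /* mmap() adds a new memory region to the process address space. */
    char *p = mmap(NULL, sizeof msg, PROT_READ, MAP_PRIVATE, fileno(f), 0);
    if (p == MAP_FAILED) {
        fclose(f);
        return -1;
    }

    int same = (memcmp(p, msg, sizeof msg) == 0);

    /* munmap() removes the region again. */
    int rc = munmap(p, sizeof msg);
    fclose(f);
    return (same && rc == 0) ? 0 : -1;
}
```

The kernel rounds the mapping up to whole pages, so even this 7-byte mapping occupies a full page of linear addresses.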

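The Memory Descriptor
- All information related to the process address space is included in a data structure of type mm_struct, called the memory descriptor, which is referenced by the mm field of the process descriptor.
- All memory descriptors are linked in a doubly linked list through their mmlist fields. The list is protected against concurrent accesses in multiprocessor systems by the mmlist_lock spin lock.
- mm_users and mm_count:
  - mm_users: the number of lightweight processes that share the mm_struct data structure (the secondary usage counter).
  - mm_count: the main usage counter. All the users counted in mm_users together count as one unit in mm_count, and the memory descriptor is freed only when mm_count drops to zero.
- Whenever a process in Kernel Mode modifies a Page Table entry for a "high" linear address (above TASK_SIZE), it should also update the corresponding entry in the sets of Page Tables of all processes in the system. Because touching every set of Page Tables is costly, Linux defers this update.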
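The relationship between the two counters can be sketched with a small user-space model. This is a simplified illustration, not kernel code: struct toy_mm_counts and the toy_ functions are hypothetical, and they only mimic the idea that all mm_users references together hold one mm_count reference.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two usage counters of a memory descriptor. */
struct toy_mm_counts {
    int  mm_users;          /* lightweight processes sharing the address space  */
    int  mm_count;          /* main counter: all mm_users together count as one */
    bool regions_released;  /* set when the last user goes away                 */
    bool freed;             /* set when the descriptor itself is released       */
};

/* A fresh descriptor has one user, which accounts for one unit of mm_count. */
void toy_mm_init(struct toy_mm_counts *mm)
{
    mm->mm_users = 1;
    mm->mm_count = 1;
    mm->regions_released = false;
    mm->freed = false;
}

/* Drop one main reference; free the descriptor when none are left. */
void toy_mmdrop(struct toy_mm_counts *mm)
{
    if (--mm->mm_count == 0)
        mm->freed = true;
}

/* Drop one user; the last user releases the regions and its mm_count unit. */
void toy_mmput(struct toy_mm_counts *mm)
{
    if (--mm->mm_users == 0) {
        mm->regions_released = true;
        toy_mmdrop(mm);
    }
}
```

For example, if two lightweight processes share a descriptor (mm_users = 2, mm_count = 1) and a kernel thread temporarily borrows it (mm_count = 2), the descriptor survives the exit of both processes and is freed only when the kernel thread drops its reference.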

mm_struct

| Type | Field | Description |
|---|---|---|
| struct vm_area_struct * | mmap | Pointer to the head of the list of memory region objects |
| struct rb_root | mm_rb | Pointer to the root of the red-black tree of memory region objects |
| struct vm_area_struct * | mmap_cache | Pointer to the last referenced memory region object |
| unsigned long (*)( ) | get_unmapped_area | Method that searches an available linear address interval in the process address space |
| void (*)( ) | unmap_area | Method invoked when releasing a linear address interval |
| unsigned long | mmap_base | Identifies the linear address of the first allocated anonymous memory region or file memory mapping (see the section "Program Segments and Process Memory Regions" in Chapter 20) |
| unsigned long | free_area_cache | Address from which the kernel will look for a free interval of linear addresses in the process address space |
| pgd_t * | pgd | Pointer to the Page Global Directory |
| atomic_t | mm_users | Secondary usage counter |
| atomic_t | mm_count | Main usage counter |
| int | map_count | Number of memory regions |
| struct rw_semaphore | mmap_sem | Memory regions' read/write semaphore |
| spinlock_t | page_table_lock | Memory regions' and Page Tables' spin lock |
| struct list_head | mmlist | Pointers to adjacent elements in the list of memory descriptors |
| unsigned long | start_code | Initial address of executable code |
| unsigned long | end_code | Final address of executable code |
| unsigned long | start_data | Initial address of initialized data |
| unsigned long | end_data | Final address of initialized data |
| unsigned long | start_brk | Initial address of the heap |
| unsigned long | brk | Current final address of the heap |

mm_struct (continued)

| Type | Field | Description |
|---|---|---|
| unsigned long | start_stack | Initial address of User Mode stack |
| unsigned long | arg_start | Initial address of command-line arguments |
| unsigned long | arg_end | Final address of command-line arguments |
| unsigned long | env_start | Initial address of environment variables |
| unsigned long | env_end | Final address of environment variables |
| unsigned long | rss | Number of page frames allocated to the process |
| unsigned long | anon_rss | Number of page frames assigned to anonymous memory mappings |
| unsigned long | total_vm | Size of the process address space (number of pages) |
| unsigned long | locked_vm | Number of "locked" pages that cannot be swapped out (see Chapter 17) |
| unsigned long | shared_vm | Number of pages in shared file memory mappings |
| unsigned long | exec_vm | Number of pages in executable memory mappings |
| unsigned long | stack_vm | Number of pages in the User Mode stack |
| unsigned long | reserved_vm | Number of pages in reserved or special memory regions |

mm_struct (continued)

| Type | Field | Description |
|---|---|---|
| unsigned long | def_flags | Default access flags of the memory regions |
| unsigned long | nr_ptes | Number of Page Tables of this process |
| unsigned long [] | saved_auxv | Used when starting the execution of an ELF program (see Chapter 20) |
| unsigned int | dumpable | Flag that specifies whether the process can produce a core dump of the memory |
| cpumask_t | cpu_vm_mask | Bit mask for lazy TLB switches (see Chapter 2) |
| mm_context_t | context | Pointer to table for architecture-specific information (e.g., LDT's address in 80x86 platforms) |
| unsigned long | swap_token_time | When this process will become eligible for having the swap token (see the section "The Swap Token" in Chapter 17) |
| char | recent_pagein | Flag set if a major Page Fault has recently occurred |
| int | core_waiters | Number of lightweight processes that are dumping the contents of the process address space to a core file (see the section "Deleting a Process Address Space" later in this chapter) |
| struct completion * | core_startup_done | Pointer to a completion used when creating a core file (see the section "Completions" in Chapter 5) |
| struct completion | core_done | Completion used when creating a core file |
| rwlock_t | ioctx_list_lock | Lock used to protect the list of asynchronous I/O contexts (see Chapter 16) |
| struct kioctx * | ioctx_list | List of asynchronous I/O contexts (see Chapter 16) |
| struct kioctx | default_kioctx | Default asynchronous I/O context (see Chapter 16) |
| unsigned long | hiwater_rss | Maximum number of page frames ever owned by the process |
| unsigned long | hiwater_vm | Maximum number of pages ever included in the memory regions of the process |

Memory Regions (ULK) / Memory Areas (LKD)
- Each memory region descriptor identifies a linear address interval.
- Memory regions owned by a process never overlap, and the kernel tries to merge a new region with the regions adjacent to it.
- All the regions owned by a process are linked in a simple list.
- Successive regions can be separated by an area of unused memory addresses.
- A process may own up to MAX_MAP_COUNT (65536) different memory regions.
Abbreviations:
- ULK: Understanding the Linux Kernel
- LKD: Linux Kernel Development

vm_area_struct

    struct vm_area_struct {
        struct mm_struct *vm_mm;            /* associated mm_struct */
        unsigned long vm_start;             /* VMA start, inclusive */
        unsigned long vm_end;               /* VMA end, exclusive */
        struct vm_area_struct *vm_next;     /* list of VMA's */
        pgprot_t vm_page_prot;              /* access permissions */
        unsigned long vm_flags;             /* flags */
        struct rb_node vm_rb;               /* VMA's node in the tree */
        union {   /* links to address_space->i_mmap or i_mmap_nonlinear */
            struct {
                struct list_head list;
                void *parent;
                struct vm_area_struct *head;
            } vm_set;
            struct prio_tree_node prio_tree_node;
        } shared;
        struct list_head anon_vma_node;     /* anon_vma entry */
        struct anon_vma *anon_vma;          /* anonymous VMA object */
        struct vm_operations_struct *vm_ops; /* associated ops */
        unsigned long vm_pgoff;             /* offset within file */
        struct file *vm_file;               /* mapped file, if any */
        void *vm_private_data;              /* private data */
    };

Adding or removing a linear address interval

Descriptors related to the address space of a process
- The list might be long, and fast search is expected.

Lists and Trees of Memory Areas
- Memory areas are accessed via both the mmap (linked list) and the mm_rb (red-black tree) fields of the memory descriptor.
- The linked list is used when every node needs to be traversed.
- The red-black tree is used when locating a specific memory area in the address space.

Red-Black Tree
1. Every node must be either red or black.
2. The root of the tree must be black.
3. The children of a red node must be black.
4. Every path from a node to a descendant leaf must contain the same number of black nodes. When counting the number of black nodes, null pointers are counted as black nodes.
- These four rules ensure that any red-black tree with n internal nodes has a height of at most 2 log2(n + 1).
- Up to version 2.4.9, the Linux kernel used AVL trees instead.
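The four rules can be checked mechanically. The sketch below is a user-space illustration with a hypothetical node type (struct rb_toy_node), not the kernel's struct rb_node; it verifies rules 2 to 4 on a given tree, while rule 1 is guaranteed by the enum type.

```c
#include <stddef.h>

enum rb_toy_color { RB_TOY_RED, RB_TOY_BLACK };

/* Hypothetical node type used only for this check. */
struct rb_toy_node {
    enum rb_toy_color color;
    struct rb_toy_node *left, *right;
};

/*
 * Return the black height of the subtree rooted at n, counting a null
 * pointer as one black node, or -1 if rule 3 or rule 4 is violated.
 */
int rb_toy_black_height(const struct rb_toy_node *n)
{
    if (n == NULL)
        return 1;

    /* Rule 3: the children of a red node must be black. */
    if (n->color == RB_TOY_RED) {
        if ((n->left && n->left->color == RB_TOY_RED) ||
            (n->right && n->right->color == RB_TOY_RED))
            return -1;
    }

    int lh = rb_toy_black_height(n->left);
    int rh = rb_toy_black_height(n->right);

    /* Rule 4: every path must contain the same number of black nodes. */
    if (lh < 0 || rh < 0 || lh != rh)
        return -1;

    return lh + (n->color == RB_TOY_BLACK ? 1 : 0);
}

/* Return 1 if the tree satisfies the red-black rules, 0 otherwise. */
int rb_toy_is_valid(const struct rb_toy_node *root)
{
    /* Rule 2: the root must be black. */
    if (root != NULL && root->color != RB_TOY_BLACK)
        return 0;
    return rb_toy_black_height(root) >= 0;
}
```

Rule 4 is what keeps the tree balanced: no root-to-leaf path can be more than twice as long as any other, which gives the logarithmic height bound.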

Example of red-black trees

Memory Region Access Rights
- Each memory region consists of a set of pages that have consecutive page numbers.
- Page access rights included in a memory region descriptor may be combined arbitrarily. It is possible, for instance, to allow the pages of a region to be executed but not read.
- To implement this protection scheme efficiently, the read, write, and execute access rights associated with the pages of a memory region must be duplicated in all the corresponding Page Table entries.

READ, WRITE, EXECUTE, SHARE
- The Read, Write, Execute, and Share access rights of a memory region are scaled down to the protection bits available in an 80x86 Page Table entry according to the following rules:
  - If the page has both the write and the share access rights, the Read/Write bit is set.
  - If the page has the read or execute access right but does not have either the write or the share access right, the Read/Write bit is cleared.
  - If the page does not have any access rights, the Present bit is cleared so that each access generates a Page Fault exception. However, to distinguish this condition from the real page-not-present case, Linux also sets the Page size bit to 1.
- Linux can reuse the Page size bit this way because the 80x86 chip checks it in Page Directory entries, but not in Page Table entries.
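The scaling rules above can be written as a small function. This is a sketch under stated assumptions: the TOY_VM_ flags and struct toy_pte_bits are hypothetical stand-ins, and the code models only the three bits mentioned on the slide rather than a real Page Table entry.

```c
#include <stdbool.h>

/* Hypothetical region access-right flags for this illustration. */
#define TOY_VM_READ   0x1u
#define TOY_VM_WRITE  0x2u
#define TOY_VM_EXEC   0x4u
#define TOY_VM_SHARED 0x8u

/* The three Page Table entry bits discussed on the slide. */
struct toy_pte_bits {
    bool present;
    bool read_write;
    bool page_size;
};

/* Scale region access rights down to Page Table entry bits. */
struct toy_pte_bits toy_scale_rights(unsigned int flags)
{
    struct toy_pte_bits bits = { true, false, false };

    /* No read, write, or execute right: mark the page as not present, */
    /* and set Page size to tell it apart from a truly missing page.   */
    if ((flags & (TOY_VM_READ | TOY_VM_WRITE | TOY_VM_EXEC)) == 0) {
        bits.present = false;
        bits.page_size = true;
        return bits;
    }

    /* Read/Write is set only when both write and share are granted. */
    bits.read_write = (flags & TOY_VM_WRITE) && (flags & TOY_VM_SHARED);
    return bits;
}
```

Note that a private writable page ends up with the Read/Write bit cleared. The first write to it then raises a Page Fault, which is what allows the kernel to implement Copy On Write.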

Memory Region Handling
- find_vma( ): locates the first memory region whose vm_end field is greater than addr
- find_vma_intersection( ): finds a region that overlaps a given interval
- get_unmapped_area( ): finds a free interval
  - arch_get_unmapped_area( )
  - arch_get_unmapped_area_topdown( )
- insert_vm_struct( ): inserts a region in the memory descriptor list

find_vma()

    struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
    {
        struct vm_area_struct *vma = NULL;
        struct rb_node *rb_node;

        if (mm == NULL)
            return NULL;

        /* hit? check the cached region first */
        vma = mm->mmap_cache;
        if (vma && vma->vm_end > addr && vma->vm_start <= addr)
            return vma;

        /* rb-tree: look for the first region with vm_end > addr */
        rb_node = mm->mm_rb.rb_node;
        vma = NULL;
        while (rb_node) {
            struct vm_area_struct *vma_tmp;

            vma_tmp = rb_entry(rb_node, struct vm_area_struct, vm_rb);
            if (vma_tmp->vm_end > addr) {
                vma = vma_tmp;
                if (vma_tmp->vm_start <= addr)
                    break;          /* addr lies inside this region */
                rb_node = rb_node->rb_left;
            } else
                rb_node = rb_node->rb_right;
        }

        /* update the cache value */
        if (vma)
            mm->mmap_cache = vma;
        return vma;
    }

find_vma()
- The result of the find_vma() function is cached in the mmap_cache field of the memory descriptor.
- Checking the cached result is quick, and the hit rate is about 30~40% in practice.
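The lookup and its cache can be tried out in user space. The sketch below is a simplification: it replaces the red-black tree with a binary search over a sorted array, and struct toy_vma, struct toy_mm, and the toy_ functions are hypothetical names. It also shows how an interval lookup can be built on top of the address lookup.

```c
#include <stddef.h>

/* Hypothetical region: covers [vm_start, vm_end). */
struct toy_vma {
    unsigned long vm_start;
    unsigned long vm_end;
};

/* Hypothetical descriptor: regions sorted by address, never overlapping. */
struct toy_mm {
    struct toy_vma *vmas;
    size_t count;
    struct toy_vma *cache;   /* last region returned by toy_find_vma() */
};

/* Return the first region whose vm_end is greater than addr, or NULL. */
struct toy_vma *toy_find_vma(struct toy_mm *mm, unsigned long addr)
{
    if (mm == NULL)
        return NULL;

    /* Cache hit: addr lies inside the last region found. */
    struct toy_vma *vma = mm->cache;
    if (vma && vma->vm_end > addr && vma->vm_start <= addr)
        return vma;

    /* Binary search stands in for the red-black tree walk. */
    size_t lo = 0, hi = mm->count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (mm->vmas[mid].vm_end > addr)
            hi = mid;
        else
            lo = mid + 1;
    }

    vma = (lo < mm->count) ? &mm->vmas[lo] : NULL;
    if (vma)
        mm->cache = vma;
    return vma;
}

/* Return the first region that overlaps [start, end), or NULL. */
struct toy_vma *toy_find_vma_intersection(struct toy_mm *mm,
                                          unsigned long start,
                                          unsigned long end)
{
    struct toy_vma *vma = toy_find_vma(mm, start);

    if (vma && end <= vma->vm_start)
        vma = NULL;          /* the region found begins after the interval */
    return vma;
}
```

The returned region does not necessarily contain addr: if addr falls in a hole between two regions, the lookup returns the next region, and the caller must compare addr with vm_start to tell the two cases apart.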

Overall scheme for the Page Fault handler

Figure 8-5. The flow diagram of the Page Fault handler

Page Fault Exception Handler
- Handling a Faulty Address Outside the Address Space
- Handling a Faulty Address Inside the Address Space
  - Minor fault: the Page Fault has been handled without blocking the current process.
  - Major fault: the Page Fault forced the current process to sleep, most likely because time was spent filling the page frame assigned to the process with data read from disk.
- Demand Paging
  - ZERO page
- Copy On Write
  - Why?
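The visible effect of Copy On Write can be sketched with fork( ). A user program cannot observe the page copy itself, so the example below, with the hypothetical names toy_cow_demo and toy_shared_value, only demonstrates the resulting behavior: a write in the child does not change the parent's copy of the same variable.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Lives in the data section; initially backed by the same frame in both processes. */
static volatile int toy_shared_value = 1;

/*
 * Fork a child that overwrites the variable, then return the value
 * seen by the parent afterwards. Returns -1 on failure.
 */
int toy_cow_demo(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: the write faults on a read-only page and gets a private copy. */
        toy_shared_value = 2;
        _exit(toy_shared_value == 2 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    /* Parent: its own page was never modified. */
    return toy_shared_value;
}
```

Deferring the copy until the first write means that a fork( ) followed immediately by execve( ) never pays for duplicating pages that the child would have discarded anyway.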

Creating and Deleting a Process Address Space
- The kernel invokes the copy_mm( ) function while creating a new process, to set up the address space of the new process.
- Processes can be created by calling clone( ).
- When a process terminates, the kernel invokes the exit_mm( ) function to release the address space owned by that process.

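Managing the Heap
- Each Unix process owns a specific memory region called the heap, which is used to satisfy the process's dynamic memory requests.
- Related functions:
  - malloc(size): requests size bytes of dynamic memory
  - calloc(n,size): requests an array of n elements of size bytes each, initialized to zero
  - free(addr): releases the memory previously allocated at addr
  - brk(addr): modifies the size of the heap directly by setting its new end address
  - sbrk(incr): like brk( ), except that incr specifies the increment or decrement of the heap size in bytes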
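The effect of sbrk( ) on the end of the heap can be observed directly. The following is a minimal sketch, assuming a Linux system with glibc and a single-threaded program that does not call malloc( ) in between; toy_grow_heap is a hypothetical helper name.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* for sbrk() under strict compiler modes */
#endif
#include <unistd.h>

/*
 * Grow the heap by incr bytes, measure how far the program break moved,
 * then shrink the heap back. Returns the measured growth, or -1 on failure.
 */
long toy_grow_heap(long incr)
{
    char *old_brk = sbrk(0);             /* current end of the heap */
    if (old_brk == (char *)-1)
        return -1;

    if (sbrk(incr) == (void *)-1)        /* ask the kernel to move it */
        return -1;

    char *new_brk = sbrk(0);
    long grown = (long)(new_brk - old_brk);

    sbrk(-incr);                         /* restore the original break */
    return grown;
}
```

In practice programs rarely call brk( ) or sbrk( ) themselves. The C library's malloc( ) uses them (or mmap( )) behind the scenes and hands out smaller pieces of the memory it obtains.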