CSC 660: Advanced Operating Systems
Process Address Space
Topics
  - Process Address Space
  - Virtual Memory Areas
  - Page Tables
  - Page Fault Handler
User vs. Kernel Memory
  - Process requests for memory are non-urgent.
      - A process will not immediately use all of the memory it requests.
      - The kernel defers process memory requests until necessary.
  - User processes cannot be trusted.
      - The kernel must catch addressing errors.
Process Address Space
  0x00000000
      Code
      Data
      Heap
      Stack
      Env/Argv
  0xBFFFFFFF
mm_struct (defined in <linux/sched.h>)
  Type                      Name        Purpose
  struct vm_area_struct *   mmap        List of VMAs
  struct rb_root            mm_rb       Tree of VMAs
  pgd_t *                   pgd         Page global directory
  struct list_head          mmlist      List of mm_structs
  atomic_t                  mm_users    Number of users
  atomic_t                  mm_count    Primary usage count
  unsigned long             start_code  Start of code region
  unsigned long             end_code    End of code region
mm_struct
  - How to access? current->mm
  - What about kernel threads?
      - The kernel mapping is identical for all tasks.
      - current->mm is NULL.
      - Use the address space of the previous task: current->active_mm
  - How to allocate? allocate_mm() uses the mm_cachep slab cache.
Virtual Memory Areas
  - Each VMA is a unique memory object.
      - Contiguous, page-aligned range of the address space.
      - A VMA has a single set of permissions.
  - Common VMAs in a process address space
      - text (code)
      - data, bss (zero-filled)
      - shared libs
      - mmap, shared memory, heap
  - Defined in <linux/mm.h>.
[Figure: Virtual Memory Areas]
vm_area_struct (defined in <linux/mm.h>)
  Type                      Name          Purpose
  struct mm_struct *        mm            Associated address space
  unsigned long             vm_start      Start address (inclusive)
  unsigned long             vm_end        End address (exclusive)
  struct vm_area_struct *   vm_next       List of VMAs
  pgprot_t                  vm_page_prot  Access permissions
  unsigned long             vm_flags      VMA flags
  struct file *             vm_file       Mapped file (if any)
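The inclusive start and exclusive end make address lookup a simple half-open interval test. The following is a minimal user-space sketch in Python, not kernel code: the `VMA` tuple and the `vma_containing` helper are hypothetical names, and the list is assumed to be sorted by `vm_start` with no overlaps. Unlike the kernel's own lookup, this helper only returns an area that actually contains the address.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional, Sequence


class VMA(NamedTuple):
    """Hypothetical stand-in for the vm_start/vm_end fields of vm_area_struct."""
    vm_start: int  # first address in the area (inclusive)
    vm_end: int    # first address past the area (exclusive)
    name: str


def vma_containing(vmas: Sequence[VMA], addr: int) -> Optional[VMA]:
    """Return the VMA whose [vm_start, vm_end) range holds addr, else None.

    Assumes vmas is sorted by vm_start and the areas do not overlap.
    """
    starts = [v.vm_start for v in vmas]
    i = bisect_right(starts, addr) - 1  # last area starting at or below addr
    if i >= 0 and addr < vmas[i].vm_end:
        return vmas[i]
    return None


# Three areas taken from the bash listing in these slides.
vmas = [
    VMA(0x08048000, 0x080E0000, "text"),
    VMA(0x080E0000, 0x080E6000, "data"),
    VMA(0xBFFEB000, 0xC0000000, "stack"),
]
```

Because `vm_end` is exclusive, address 0x080e0000 belongs to the data area rather than the text area, even though it is the text area's `vm_end`.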
VMA Flags
  Flag           Effect
  VM_READ        Pages can be read from.
  VM_WRITE       Pages can be written to.
  VM_EXEC        Pages can be executed.
  VM_SHARED      Pages are shared between processes.
  VM_IO          Mapping of device I/O space.
  VM_RESERVED    Area must not be paged to disk.
  VM_LOCKED      Pages in region are locked.
  VM_GROWSUP     VMA can grow upward.
  VM_GROWSDOWN   VMA can grow downward.
Hardware and Flags
  - VMA flags are mapped onto the protection bits of the MMU hardware.
  - x86 without PAE
      - Two bits: Read/Write, User/Supervisor.
      - Read implies Execute.
      - Write implies Read.
  - x86 with PAE
      - The NX flag marks a page as non-executable.
VMA Operations
  - void open(struct vm_area_struct *area)
      - Invoked when the VMA is added to an address space.
  - void close(struct vm_area_struct *area)
      - Invoked when the VMA is removed from an address space.
  - struct page *nopage(struct vm_area_struct *area, unsigned long address, int unused)
      - Invoked by the page fault handler when a page not present in physical memory is accessed.
Viewing Process VMAs
  Contents of /proc/<pid>/maps for a bash process:
  08048000-080e0000 r-xp 00000000 21:01 977341 /bin/bash
  080e0000-080e6000 rw-p 00097000 21:01 977341 /bin/bash
  080e6000-08162000 rw-p 080e6000 00:00 0
  b7ceb000-b7cf3000 r-xp 00000000 21:01 390964 /lib/tls/i686/cmov/libnss_files-2.3.2.so
  b7cf3000-b7cf4000 rw-p 00008000 21:01 390964 /lib/tls/i686/cmov/libnss_files-2.3.2.so
  b7cf4000-b7cfc000 r-xp 00000000 21:01 390969 /lib/tls/i686/cmov/libnss_nis-2.3.2.so
  b7cfc000-b7cfd000 rw-p 00007000 21:01 390969 /lib/tls/i686/cmov/libnss_nis-2.3.2.so
  b7cfd000-b7d0e000 r-xp 00000000 21:01 390961 /lib/tls/i686/cmov/libnsl-2.3.2.so
  b7d0e000-b7d0f000 rw-p 00011000 21:01 390961 /lib/tls/i686/cmov/libnsl-2.3.2.so
  b7e6f000-b7f91000 r-xp 00000000 21:01 390956 /lib/tls/i686/cmov/libc-2.3.2.so
  b7f91000-b7f9a000 rw-p 00121000 21:01 390956 /lib/tls/i686/cmov/libc-2.3.2.so
  b7f9a000-b7f9c000 rw-p b7f9a000 00:00 0
  b7f9c000-b7f9e000 r-xp 00000000 21:01 390958 /lib/tls/i686/cmov/libdl-2.3.2.so
  b7f9e000-b7f9f000 rw-p 00001000 21:01 390958 /lib/tls/i686/cmov/libdl-2.3.2.so
  b7f9f000-b7fd3000 r-xp 00000000 21:01 390931 /lib/libncurses.so.5.4
  b7fd3000-b7fdb000 rw-p 00034000 21:01 390931 /lib/libncurses.so.5.4
  b7fdb000-b7fdc000 rw-p b7fdb000 00:00 0
  b7fe9000-b7feb000 rw-p b7fe9000 00:00 0
  b7feb000-b8000000 r-xp 00000000 21:01 390951 /lib/ld-2.3.2.so
  b8000000-b8001000 rw-p 00015000 21:01 390951 /lib/ld-2.3.2.so
  bffeb000-c0000000 rw-p bffeb000 00:00 0
  ffffe000-fffff000 ---p 00000000 00:00 0
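Each line of this listing has the fields address range, permissions, file offset, device, inode, and an optional path. As an illustration, a small Python sketch can split one line into those fields. `MapEntry` and `parse_maps_line` are hypothetical names chosen for this example, and the parser assumes the whitespace-separated layout shown above.

```python
from typing import NamedTuple


class MapEntry(NamedTuple):
    start: int    # start address (inclusive)
    end: int      # end address (exclusive)
    perms: str    # e.g. "r-xp": read, write, execute, private/shared
    offset: int   # offset field (hexadecimal in the listing)
    dev: str      # device as major:minor
    inode: int    # inode number, 0 for anonymous mappings
    path: str     # backing file, "" for anonymous mappings


def parse_maps_line(line: str) -> MapEntry:
    """Parse one line in the format of the listing above."""
    parts = line.split(None, 5)  # the path, if present, is the sixth field
    addr, perms, offset, dev, inode = parts[:5]
    path = parts[5].strip() if len(parts) > 5 else ""
    start, end = (int(x, 16) for x in addr.split("-"))
    return MapEntry(start, end, perms, int(offset, 16), dev, int(inode), path)


text = parse_maps_line("08048000-080e0000 r-xp 00000000 21:01 977341 /bin/bash")
anon = parse_maps_line("080e6000-08162000 rw-p 080e6000 00:00 0")
text_kb = (text.end - text.start) // 1024  # size of the bash text mapping in KB
```

Subtracting start from end gives the size of each area: 608 KB for the bash text mapping and 496 KB for the anonymous area that follows the bash data mapping.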
Viewing Process VMAs
  The same mappings as summarized by pmap:
  08048000    608K r-x--  /bash
  080e0000     24K rw---  /bash
  080e6000    496K rw---  [ anon ]
  b7ceb000     32K r-x--  /libnss_files-2.3.2.so
  b7cf3000      4K rw---  /libnss_files-2.3.2.so
  b7cf4000     32K r-x--  /libnss_nis-2.3.2.so
  b7cfc000      4K rw---  /libnss_nis-2.3.2.so
  b7cfd000     68K r-x--  /libnsl-2.3.2.so
  b7d0e000      4K rw---  /libnsl-2.3.2.so
  b7e6f000   1160K r-x--  /libc-2.3.2.so
  b7f91000     36K rw---  /libc-2.3.2.so
  b7f9a000      8K rw---  [ anon ]
  b7f9c000      8K r-x--  /libdl-2.3.2.so
  b7f9e000      4K rw---  /libdl-2.3.2.so
  b7f9f000    208K r-x--  /libncurses.so.5.4
  b7fd3000     32K rw---  /libncurses.so.5.4
  b7fdb000      4K rw---  [ anon ]
  b7fe9000      8K rw---  [ anon ]
  b7feb000     84K r-x--  /ld-2.3.2.so
  b8000000      4K rw---  /ld-2.3.2.so
  bffeb000     84K rw---  [ stack ]
  ffffe000      4K -----  [ anon ]
Page Tables
  - The MMU translates virtual addresses to physical addresses.
      - It uses page tables to perform the translation.
      - The operating system must manage the page tables.
      - Each process has its own page table.
  - Addresses are divided into two parts:
      - Page number (p): index into the page table.
      - Page offset (d): offset within the page.
  [Figure: 32-bit address split into page number p (20 bits) and page offset d (12 bits)]
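The split into page number and offset is plain bit arithmetic. The sketch below assumes 4K pages (12 offset bits) and models a single-level page table as a Python dict from page number to frame number. `split_address` and `translate` are hypothetical helper names, and the frame number used in the example is made up.

```python
PAGE_SHIFT = 12                 # assumption: 4K pages, so 12 offset bits
PAGE_SIZE = 1 << PAGE_SHIFT


def split_address(vaddr: int) -> tuple[int, int]:
    """Return (page number p, page offset d) for a virtual address."""
    return vaddr >> PAGE_SHIFT, vaddr & (PAGE_SIZE - 1)


def translate(page_table: dict[int, int], vaddr: int) -> int:
    """Translate vaddr using a toy one-level table mapping p to a frame number."""
    p, d = split_address(vaddr)
    if p not in page_table:
        # A real MMU would raise a page fault exception here.
        raise LookupError(f"page fault at {vaddr:#010x}")
    return (page_table[p] << PAGE_SHIFT) | d


# Page 0x08048 mapped to an arbitrary, made-up frame 0x1A2B3.
toy_table = {0x08048: 0x1A2B3}
paddr = translate(toy_table, 0x08048123)
```

The offset passes through translation unchanged; only the page number is replaced by the frame number.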
[Figure: Page Tables]
Multi-level Page Tables
  - How large is the page table?
      - 32-bit address space, 4K pages.
      - 2^20 entries at 32 bits (4 bytes) per entry = 4 MB per process.
      - Much worse on 64-bit architectures.
      - Most page table entries are unused.
  - Solution
      - Multi-level page tables.
      - Divide the address into one part for each level of tables plus one part for the offset.
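The size estimate can be checked with a few lines of arithmetic. In this sketch `flat_page_table_bytes` is a hypothetical helper, and the 64-bit case assumes 48 usable address bits and 8-byte entries, which is one common configuration rather than something stated on the slide.

```python
def flat_page_table_bytes(va_bits: int, page_shift: int, entry_bytes: int) -> int:
    """Size of a single-level page table that covers the whole address space."""
    entries = 1 << (va_bits - page_shift)  # one entry per virtual page
    return entries * entry_bytes


MB = 1 << 20
GB = 1 << 30

size_32 = flat_page_table_bytes(32, 12, 4)  # 2^20 entries * 4 bytes
size_48 = flat_page_table_bytes(48, 12, 8)  # 2^36 entries * 8 bytes (assumed)
```

Under these assumptions a flat table costs 4 MB per process on 32-bit x86 and 512 GB per process with 48-bit addresses, which is why the table is split into levels that are allocated only when used.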
Linux 4-level Page Table
  - Converted from 3 to 4 levels in 2.6.11.
x86 Page Tables
  - x86 paging
      - Page directory (10), page table (10), offset (12)
  - x86 extended paging (4MB pages)
      - Page directory (10), offset (22)
  - PAE paging
      - The Page Directory Pointer Table allows 36-bit physical addresses.
      - Logical addresses are still 32 bits in size: the 4GB per-process limit remains.
  - x86-64 paging
      - Global (9), Upper (9), Middle (9), Table (9), Offset (12)
      - 48-bit addressing allows 256TB of address space.
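The bit widths listed for each paging mode determine how a virtual address is cut into table indices. The following Python sketch applies those widths to example addresses. `split_levels` is a hypothetical helper, with widths given from the top-level table down to the last-level table.

```python
def split_levels(vaddr: int, level_bits: list[int], offset_bits: int) -> tuple[list[int], int]:
    """Split vaddr into per-level table indices (top level first) and an offset."""
    offset = vaddr & ((1 << offset_bits) - 1)
    rest = vaddr >> offset_bits
    indices = []
    # Peel indices off from the lowest level upward, then restore top-down order.
    for bits in reversed(level_bits):
        indices.append(rest & ((1 << bits) - 1))
        rest >>= bits
    indices.reverse()
    return indices, offset


two_level = split_levels(0x08048123, [10, 10], 12)           # x86 paging
extended = split_levels(0x08048123, [10], 22)                # x86 4MB pages
four_level = split_levels(0x7FFFFFFFFFFF, [9, 9, 9, 9], 12)  # x86-64 paging
```

For address 0x08048123, regular x86 paging selects page directory entry 32 and page table entry 72, while extended paging uses the same directory entry and treats the remaining 22 bits as the offset into a 4MB page.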
Translation Lookaside Buffers
  - Address translation is expensive.
      - 4-level tables: each virtual address access => 5 memory accesses.
  - Solution: TLB
      - Caches address translations.
      - Flushed during a context switch.
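The cost argument can be made concrete with a toy model. The Python sketch below is a simplification, not a description of real hardware: `ToyMMU` is a hypothetical class, the page table is a flat dict, each TLB miss is charged one memory access per table level, and the TLB is an unbounded dict that is emptied on every context switch.

```python
class ToyMMU:
    """Counts memory accesses with a TLB in front of a multi-level table walk."""

    def __init__(self, page_table: dict[int, int], levels: int = 4):
        self.page_table = page_table  # page number -> frame number
        self.levels = levels          # table levels walked on a TLB miss
        self.tlb: dict[int, int] = {}
        self.memory_accesses = 0

    def access(self, vaddr: int) -> int:
        p, d = vaddr >> 12, vaddr & 0xFFF  # assumes 4K pages
        frame = self.tlb.get(p)
        if frame is None:
            # TLB miss: one memory read per level of the page table.
            self.memory_accesses += self.levels
            frame = self.page_table[p]
            self.tlb[p] = frame
        self.memory_accesses += 1  # the access to the data itself
        return (frame << 12) | d

    def context_switch(self, page_table: dict[int, int]) -> None:
        """Install another process's table and flush cached translations."""
        self.page_table = page_table
        self.tlb.clear()


mmu = ToyMMU({1: 7})
first = mmu.access(0x1004)   # miss: 4 table reads + 1 data access
second = mmu.access(0x1008)  # hit: 1 data access only
```

In this model the first access costs five memory accesses and the second costs one. After `context_switch` the same virtual page misses again, which shows why flushing the TLB makes context switches expensive.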
Page Cache
  - Primary kernel disk cache.
  - Each page consists of multiple block buffers.
  - Page types: files, block devices, swap space.
  - Reads and writes use the cache, except with O_DIRECT.
  - pdflush daemon
      - Write operations are deferred by the page cache until:
          - Free memory shrinks below a specified threshold.
          - Dirty data is older than a specified threshold.
      - Uses multiple threads to avoid device congestion.
Page Lifecycle
  1. Page is read into memory from disk.
      - File read, mmap, read-ahead, page fault.
  2. Page is made dirty by a process writing to it.
  3. Page is removed from the cache.
      - A dirty page expires or the kernel needs memory.
      - Non-dirty pages can simply be deallocated.
      - Dirty pages have to be written back to disk.
Page Fault Handler
  - Causes of page faults
      - An attempt to access a virtual address that is not mapped to a physical page.
      - An attempt to access a virtual address in a way that is forbidden (e.g., a write to a read-only page).
  - The CPU raises a page fault exception on a fault.
  - The kernel calls do_page_fault().
[Figure: Page Fault Handler]
do_page_fault() (arch/i386/mm/fault.c)
  - If the page fault came from the kernel:
      - If the address is valid, handle the page fault.
      - If the address is invalid, kernel oops.
      - If in interrupt context, kernel oops.
  - Else the page fault came from a user process:
      - If the address is NULL or a kernel address, seg fault.
      - If the address is not in a VMA, seg fault.
          - Except if the address is just below the stack: grow the stack.
      - If the permissions do not permit the action, seg fault.
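The decision steps above can be sketched as a small classifier. This is a simplified user-space model in Python, not the logic of arch/i386/mm/fault.c. `Area` and `classify_fault` are hypothetical names. A kernel-mode address counts as valid only if it lies inside an area. "Just below the stack" is assumed to mean within one page below an area that grows downward.

```python
from typing import NamedTuple, Sequence

TASK_SIZE = 0xC0000000  # first kernel address; user space ends at 0xBFFFFFFF
PAGE_SIZE = 4096


class Area(NamedTuple):
    start: int                # inclusive
    end: int                  # exclusive
    perms: str                # e.g. "r-x" or "rw-"
    grows_down: bool = False  # models VM_GROWSDOWN


def classify_fault(addr: int, write: bool, areas: Sequence[Area],
                   from_kernel: bool = False, in_interrupt: bool = False) -> str:
    """Return "handle", "grow_stack", "segv", or "oops" for a faulting access."""
    area = next((a for a in areas if a.start <= addr < a.end), None)
    if from_kernel:
        # Assumption: a kernel-mode fault is valid only inside a known area.
        if in_interrupt or area is None:
            return "oops"
        return "handle"
    if addr == 0 or addr >= TASK_SIZE:
        return "segv"  # NULL or kernel address
    if area is None:
        # Assumption: within one page below a downward-growing area.
        for a in areas:
            if a.grows_down and a.start - PAGE_SIZE <= addr < a.start:
                return "grow_stack"
        return "segv"
    needed = "w" if write else "r"
    if needed not in area.perms:
        return "segv"  # permissions do not permit the action
    return "handle"


areas = [
    Area(0x08048000, 0x080E0000, "r-x"),                   # bash text
    Area(0xBFFEB000, 0xC0000000, "rw-", grows_down=True),  # stack
]
```

With these two areas, a read from the text area is handled, a write to it yields a seg fault, and a write a few bytes below 0xbffeb000 grows the stack.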