Linux Virtual Memory for Intel Processor

Debzani Deb

Overview
- Overview of virtual memory.
- The support the Intel architecture provides for virtual memory.
- How Linux uses that hardware support to implement virtual memory.
- Process address space.
- Page fault handler.
- Additional improvements in kernel 2.6.
- References.

Introduction
In a "virtual memory" environment, a large logical address space is simulated with a small amount of physical memory (RAM) and some disk storage (swap space). The logical addresses the processor issues are converted to physical addresses during program execution. The implementation requires extensive hardware assistance and a great deal of complex OS code and execution time.
Virtual memory can be implemented with:
- Paging: fixed-size memory blocks.
- Segmentation: variable-size memory blocks.
Fetch technique: demand paging. Replacement technique: Least Recently Used (LRU) algorithm.

Why Virtual Memory?
- A process may be too big for physical memory (RAM).
- There may be more active processes than physical memory can hold.
Solution: virtual memory, where a large virtual address space (4 GB) for each process is simulated with a small amount of physical memory (RAM) and some disk storage (swap space).
[Figure: Process 1 (50 MB), Process 2 (50 MB) and Process 3 (30 MB) competing for RAM alongside the OS (8 MB).]

Virtual Memory
[Figure: pages of Process 1 and Process 2 being moved in and out of RAM (which also holds the 8 MB OS) as Process 1 runs and then sleeps, and Process 2 is scheduled, runs and faults.]
The system works because the principle of locality holds.
Thrashing: the system swaps pages in and out all the time, and no real work is done.

IA-32 Virtual Memory
The IA-32 architecture supports either pure segmentation or segmentation combined with paging.
- Logical address: a 16-bit segment selector and a 32-bit offset.
- Linear address (LA), or virtual address (VA): the base address of the segment plus the offset. This 32-bit address can address 4 GB of memory.
- Physical address (PA): a 32-bit address in RAM.
Logical address -> segmentation unit -> linear address -> paging unit -> physical address
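The segmentation step (linear address = segment base + offset) can be sketched in C. This is a minimal sketch under stated assumptions: `struct segment` and `logical_to_linear()` are hypothetical names, the limit is treated as a byte count (the granularity bit is ignored), and the general-protection fault the processor would raise is modeled as a `false` return.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a loaded segment: base and limit only. */
struct segment {
    uint32_t base;   /* linear address where the segment starts */
    uint32_t limit;  /* last valid offset, in bytes */
};

/* Translate a logical address (segment, offset) into a linear address.
 * Returns false where the processor would raise a general-protection fault. */
static bool logical_to_linear(const struct segment *seg, uint32_t offset,
                              uint32_t *linear)
{
    if (offset > seg->limit)
        return false;              /* offset lies outside the segment */
    *linear = seg->base + offset;  /* 32-bit arithmetic wraps modulo 2^32 */
    return true;
}
```

With a flat segment (base 0, limit 0xFFFFFFFF), as Linux uses, the linear address equals the offset. With base 0x1000 and limit 0xFFF, offset 0x10 maps to linear address 0x1010, while offset 0x1000 is rejected.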

[Figure: IA-32 virtual memory]

IA-32 Segmentation (1)
Six segment registers hold segment selectors so that they can be retrieved quickly.
- CS (code segment register) points to a segment containing program instructions. It also includes a 2-bit Current Privilege Level (CPL) field: 0 means kernel mode and 3 means user mode.
- DS (data segment register) points to a segment containing static and global data.
- SS (stack segment register) points to a segment containing the current program stack.
- ES, FS and GS are general-purpose segment registers and may refer to arbitrary data segments.
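Since the CPL is held in the low two bits of the selector loaded in CS, decoding a selector shows where each field lives. A small sketch, assuming the standard IA-32 selector layout (13-bit descriptor index, 1-bit table indicator, 2-bit requested privilege level); `decode_selector()` and `struct selector_fields` are illustrative names.

```c
#include <stdint.h>

/* Fields of a 16-bit IA-32 segment selector. */
struct selector_fields {
    unsigned index; /* bits 15..3: descriptor index in the GDT or LDT */
    unsigned ti;    /* bit 2: table indicator, 0 = GDT, 1 = LDT */
    unsigned rpl;   /* bits 1..0: requested privilege level (the CPL when in CS) */
};

static struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;
    f.index = sel >> 3;
    f.ti    = (sel >> 2) & 1u;
    f.rpl   = sel & 3u;
    return f;
}
```

As an illustration, selector 0x73 decodes to GDT index 14 with RPL 3; loaded into CS, it would put the processor at CPL 3 (user mode).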

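IA-32 Segmentation (2)
Each segment is described by an 8-byte segment descriptor. Descriptors are stored in the Global Descriptor Table (GDT), or in a Local Descriptor Table (LDT), and are selected by the index in a segment selector. A descriptor contains:
- the 32-bit base address of the segment;
- a 20-bit limit;
- a 4-bit type that denotes the segment type and access rights;
- a DPL (Descriptor Privilege Level) field: 0 means the segment can be used only in kernel mode, 3 means it can be used in both modes.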
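The descriptor does not store its fields contiguously: the base and the limit are split into pieces. The sketch below reassembles them, assuming the descriptor is given as a little-endian 64-bit integer; the names are hypothetical, and the S, P, AVL, L and D/B bits are ignored.

```c
#include <stdint.h>

struct descriptor_fields {
    uint32_t base;   /* 32-bit segment base address */
    uint32_t limit;  /* 20-bit limit (bytes, or 4 KB units if g is set) */
    unsigned type;   /* 4-bit type: segment kind and access rights */
    unsigned dpl;    /* 2-bit descriptor privilege level */
    unsigned g;      /* granularity bit */
};

static struct descriptor_fields decode_descriptor(uint64_t d)
{
    struct descriptor_fields f;
    /* limit: bits 15..0 in bits 0-15, bits 19..16 in bits 48-51 */
    f.limit = (uint32_t)(d & 0xFFFFu) | (uint32_t)((d >> 48) & 0xFu) << 16;
    /* base: bits 23..0 in bits 16-39, bits 31..24 in bits 56-63 */
    f.base  = (uint32_t)((d >> 16) & 0xFFFFFFu) | (uint32_t)((d >> 56) & 0xFFu) << 24;
    f.type  = (unsigned)(d >> 40) & 0xFu;
    f.dpl   = (unsigned)(d >> 45) & 3u;
    f.g     = (unsigned)(d >> 55) & 1u;
    return f;
}
```

For example, a flat code segment at DPL 3 is encoded as 0x00CFFA000000FFFF: base 0, limit 0xFFFFF in 4 KB units (G = 1), type 0xA (execute/read code), DPL 3.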

IA-32 Protection
Intel uses four privilege levels, 0-3, with 0 being the most privileged.
- The privilege level of the executing program is determined by the privilege level of the code segment currently executing.
- CPL (Current Privilege Level): bits 0 and 1 of the CS (code segment) register. The processor changes the CPL when control is transferred to a code segment with a different privilege level.
- DPL (Descriptor Privilege Level): bits in the segment descriptor. When the currently executing code attempts to access a segment, the segment's DPL is compared with the CPL.
A program can access a data segment only if its CPL is numerically less than or equal to the segment's DPL: code running at level 3 cannot access segments with DPL 0, while code running at level 0 can access segments at every level.
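For data segments the rule is that the accessing code must be at least as privileged (numerically no larger) than the segment's DPL. A simplified sketch: besides the CPL, the processor also considers the requested privilege level (RPL) in the selector and uses the numerically larger of the two; code-segment and stack-segment rules are not modeled, and the function name is hypothetical.

```c
#include <stdbool.h>

/* Data-segment access rule. Level 0 is the most privileged, so access is
 * allowed only if the effective level max(CPL, RPL) is numerically <= DPL. */
static bool can_access_data_segment(unsigned cpl, unsigned rpl, unsigned dpl)
{
    unsigned effective = cpl > rpl ? cpl : rpl;
    return effective <= dpl;
}
```

Kernel code (CPL 0) with an RPL 0 selector can reach both DPL 0 and DPL 3 segments; user code (CPL 3) is refused a DPL 0 segment.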

Segmentation in Linux
There is no mode bit to disable segmentation, so Linux has to use it, but it prefers paging over segmentation because paging is simpler and more portable.
- Linux uses four segments, listed below.
- All processes use the same segment descriptors.
- The GDT is defined in arch/i386/kernel/head.S.
- Each time the CPL in CS changes, DS and SS are changed correspondingly; SS refers to the same segment as DS, so there is no separate stack segment.

Segments used by Linux:
Segment       Type                  DPL   Accessed by
Kernel Code   Code, Read, Execute   0     Kernel
Kernel Data   Data, Read, Write     0     Kernel
User Code     Code, Read, Execute   3     Both
User Data     Data, Read, Write     3     Both

Protection in Linux
The segments overlap in the linear address space: each starts at linear address 0 and spans the full 4 GB (see arch/i386/kernel/head.S). Access is therefore effectively allowed to the entire virtual address space through any of them.
The address space of every process is split into two parts:
- 0-3 GB: user space;
- 3-4 GB: kernel space.
The boundary is PAGE_OFFSET = 0xC0000000.
- A process in user mode (CPL = 3) can access only addresses below 3 GB (only segments with DPL = 3).
- A process in kernel mode (CPL = 0, e.g. after a system call) can access both parts, through segments with DPL 0 or 3.
Because the segments overlap, this protection is enforced at the page level, not at the segment level, through the R/W and U/S bits of each page table entry.
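The 3 GB/1 GB split boils down to a range check against PAGE_OFFSET. As an illustration only, assuming PAGE_OFFSET = 0xC0000000: `range_in_user_space()` is a hypothetical helper, similar in spirit to the kernel's `access_ok()` check but not its implementation. It avoids computing `addr + size`, which could overflow 32 bits.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_OFFSET 0xC0000000u  /* user/kernel boundary (3 GB) on i386 */

/* Hypothetical helper: does [addr, addr + size) lie entirely below
 * PAGE_OFFSET? Written without addr + size to avoid 32-bit wraparound. */
static bool range_in_user_space(uint32_t addr, uint32_t size)
{
    return addr <= PAGE_OFFSET && size <= PAGE_OFFSET - addr;
}
```

A buffer ending exactly at 0xC0000000 (exclusive) is accepted; one byte more, or any address at or above PAGE_OFFSET with a non-zero size, is rejected.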

IA-32 Paging
RAM is partitioned into fixed-size page frames, and the linear address space is divided into pages of the same size.
- The processor uses the information in page directories and page tables (stored in RAM) to map linear addresses to physical addresses and to generate page-fault exceptions.
- Translation Lookaside Buffers (TLBs) cache the most recently used page directory and page table entries to reduce translation time.
- Intel supports 4 KB, 2 MB and 4 MB page sizes.
Paging is controlled by three flags in the processor's control registers, which the OS sets during initialization:
- PG (paging, in CR0): available on all Intel processors starting with the 80386; enables paging.
- PSE (page size extensions, in CR4): introduced with the Pentium processor; permits large pages (4 MB, or 2 MB when PAE is set).
- PAE (physical address extension, in CR4): introduced with the Pentium Pro processor; extends physical addresses to 36 bits (64 GB) and supports 4 KB and 2 MB pages.
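The effect of the three flags can be expressed as a function of the control register values, passed in as plain integers (user-space code cannot read the real registers). This sketch assumes PG is bit 31 of CR0 and PSE and PAE are bits 4 and 5 of CR4; `max_page_size()` and `phys_limit()` are hypothetical helpers. The second one checks the arithmetic behind "36 bits = 64 GB": 2^36 bytes = 64 x 2^30 bytes.

```c
#include <stdint.h>

#define CR0_PG  (1u << 31)  /* paging enable */
#define CR4_PSE (1u << 4)   /* page size extensions (4 MB pages) */
#define CR4_PAE (1u << 5)   /* physical address extension (36-bit) */

/* Largest page size (bytes) usable under the given CR0/CR4 values, or 0 if
 * paging is off. With PAE, 2 MB pages are selected through the PS bit of a
 * page directory entry whether or not CR4.PSE is set. */
static uint32_t max_page_size(uint32_t cr0, uint32_t cr4)
{
    if (!(cr0 & CR0_PG))
        return 0;
    if (cr4 & CR4_PAE)
        return 2u << 20;   /* 2 MB */
    if (cr4 & CR4_PSE)
        return 4u << 20;   /* 4 MB */
    return 4u << 10;       /* 4 KB only */
}

/* Bytes addressable with the given number of physical address bits. */
static uint64_t phys_limit(unsigned bits)
{
    return (uint64_t)1 << bits;
}
```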

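Page Tables and Directories
With 4 KB pages, a 32-bit linear address is divided into three fields:
- Directory: the most significant 10 bits (index into a 1024-entry page directory).
- Table: the middle 10 bits (index into a 1024-entry page table).
- Offset: the least significant 12 bits (each page is 4 KB).
With 4 MB pages, the most significant 10 bits index the page directory and the remaining 22 bits are the page offset; no page table is used. (With PAE, 2 MB pages use a different split: 2 bits select a page-directory-pointer entry, 9 bits index the page directory and 21 bits are the offset.)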
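The address split and the two-level walk it drives can be sketched as follows. Assumptions: no PAE; page tables are simulated with arrays, so `tables[i]` stands in for the page table the i-th directory entry points to (real hardware follows the physical address in bits 31..12 of the entry); all names are hypothetical.

```c
#include <stdint.h>

/* 4 KB pages: 10-bit directory index, 10-bit table index, 12-bit offset. */
#define DIR_INDEX(la)   (((la) >> 22) & 0x3FFu)
#define TABLE_INDEX(la) (((la) >> 12) & 0x3FFu)
#define PAGE_OFF(la)    ((la) & 0xFFFu)
/* 4 MB pages: 10-bit directory index, 22-bit offset, no page table. */
#define LARGE_OFF(la)   ((la) & 0x3FFFFFu)

#define ENTRY_PRESENT 0x1u
#define PDE_PS        0x80u  /* page size bit of a directory entry */

/* Returns 0 and sets *pa on success, -1 where the CPU would page-fault. */
static int translate(const uint32_t *dir, uint32_t **tables,
                     uint32_t la, uint32_t *pa)
{
    uint32_t pde = dir[DIR_INDEX(la)];
    if (!(pde & ENTRY_PRESENT))
        return -1;
    if (pde & PDE_PS) {                       /* 4 MB page: one level */
        *pa = (pde & 0xFFC00000u) | LARGE_OFF(la);
        return 0;
    }
    uint32_t pte = tables[DIR_INDEX(la)][TABLE_INDEX(la)];
    if (!(pte & ENTRY_PRESENT))
        return -1;
    *pa = (pte & 0xFFFFF000u) | PAGE_OFF(la); /* frame base + offset */
    return 0;
}
```

For example, linear address 0xC0048123 splits into directory index 0x300, table index 0x048 and offset 0x123; if that page table entry maps frame 0x00ABC000, the physical address is 0x00ABC123.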

Page Directory and Page Table Entries
With 32-bit addresses and 4 KB pages, each entry contains:
- Base address: 20 bits, bits 12 through 31.
- Present: when set, the page is in RAM.
- Read/Write: when set, the page can be read and written.
- User/Supervisor: when set, the page can be accessed at any privilege level; when clear, only in supervisor (kernel) mode.
- Accessed: set each time the paging unit accesses the entry.
- PCD (page-level cache disable) and PWT (page-level write-through): control caching of the page.
- Dirty: applies to page table entries only; set when the page is written to.
- Global: introduced with the Pentium Pro; applies to page table entries only. When set, it marks a global page, whose TLB entry is not flushed on a context switch.
- Page Size: applies to page directory entries only. When 1, the entry refers directly to a 2 MB/4 MB page frame; when 0, it points to a page table of 4 KB pages.
The hardware checks these flags to decide whether the requested kind of access can be performed.
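The flag bits can be named and tested as below. A sketch assuming the non-PAE entry layout (Present in bit 0 through Global in bit 8); the `PG_*` macro names and the helpers are hypothetical, and the check examines a single entry, whereas the hardware combines the directory and table entries.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag bits of a non-PAE page directory / page table entry. */
#define PG_PRESENT  (1u << 0)
#define PG_RW       (1u << 1)  /* set: writable */
#define PG_USER     (1u << 2)  /* set: any privilege level; clear: supervisor only */
#define PG_PWT      (1u << 3)  /* page-level write-through */
#define PG_PCD      (1u << 4)  /* page-level cache disable */
#define PG_ACCESSED (1u << 5)
#define PG_DIRTY    (1u << 6)  /* page table entries only */
#define PG_PSE      (1u << 7)  /* page size, directory entries only */
#define PG_GLOBAL   (1u << 8)  /* page table entries only */

/* Physical base of the frame: bits 31..12. */
static uint32_t entry_frame(uint32_t e) { return e & 0xFFFFF000u; }

/* Would a user-mode (CPL 3) access be allowed by this single entry?
 * Supervisor accesses are not modeled. */
static bool user_access_ok(uint32_t e, bool write)
{
    if (!(e & PG_PRESENT) || !(e & PG_USER))
        return false;
    return !write || (e & PG_RW);
}
```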

Paging in Linux (1)
Linux uses three-level paging so that it can adapt to 64-bit architectures:
- Page Global Directory (PGD)
- Page Middle Directory (PMD)
- Page Table
A linear address is therefore divided into four parts: three table indexes and a page offset.
What happens on IA-32, which uses only two-level page tables? Linux folds the PMD into the PGD: the PMD has a single entry, which is the PGD entry itself. On IA-32 there are 1024 entries in the PGD, one entry in the PMD and 1024 entries in each page table.
Each process has its own PGD. During a context switch, the PGD base address of the next process is loaded into CR3, and the TLB is flushed (except for global pages).
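The folding can be modeled in a few lines. This is a hypothetical, simplified model: the function names echo the kernel's `pgd_offset()` and `pmd_offset()`, but their signatures and types here are invented for illustration, and entries are plain pointers rather than hardware entries. The point is that `pmd_offset()` returns the PGD entry itself, so generic three-level lookup code runs unchanged on two-level hardware.

```c
#include <stddef.h>
#include <stdint.h>

#define PGDIR_SHIFT  22
#define PAGE_SHIFT   12
#define PTRS_PER_PGD 1024
#define PTRS_PER_PTE 1024   /* PTRS_PER_PMD would be 1 */

typedef struct { uint32_t *table; } pgd_t;  /* points to a page table */
typedef pgd_t pmd_t;                         /* PMD folded into the PGD */

static pgd_t *pgd_offset(pgd_t *pgd_base, uint32_t addr)
{
    return pgd_base + (addr >> PGDIR_SHIFT);
}

/* Folding: the single "PMD entry" is the PGD entry itself. */
static pmd_t *pmd_offset(pgd_t *pgd, uint32_t addr)
{
    (void)addr;              /* no PMD index bits on two-level hardware */
    return (pmd_t *)pgd;
}

static uint32_t *pte_offset(pmd_t *pmd, uint32_t addr)
{
    if (pmd->table == NULL)
        return NULL;         /* no page table yet: a real walk would fault */
    return pmd->table + ((addr >> PAGE_SHIFT) & (PTRS_PER_PTE - 1));
}
```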

Paging in Linux (2)
Linux can use PAE, and on processors with PSE it uses the Page Size (PS) flag of a PGD entry to select a different page size for the region that entry maps. This allows 4 MB and 4 KB pages to be mixed:
- The kernel uses large (4 MB) pages and one-level translation, which reduces the number of TLB entries and the memory used by page tables.
- Applications use 4 KB pages.

PAE   PS of PGD entry   Page size   Physical address size
0     0                 4 KB        32 bit
0     1                 4 MB        32 bit
1     0                 4 KB        36 bit
1     1                 2 MB        36 bit
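The table can be turned into a function, together with the arithmetic behind "reduces the number of TLB entries". A sketch with hypothetical names, assuming at best one TLB entry per mapped page.

```c
#include <stdint.h>

/* Page size selected by the PAE setting and the PS flag of a PGD entry. */
static uint32_t page_size(int pae, int ps)
{
    if (!ps)
        return 4u << 10;                   /* 4 KB */
    return pae ? (2u << 20) : (4u << 20);  /* 2 MB with PAE, else 4 MB */
}

/* Pages (and so, at best, TLB entries) needed to map len bytes. */
static uint32_t pages_needed(uint32_t len, uint32_t psize)
{
    return len / psize + (len % psize != 0);  /* round up */
}
```

Mapping 8 MB needs 2048 pages of 4 KB but only two pages of 4 MB.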

Paging in Linux (3)
include/asm-i386/page.h:
    #define PAGE_SHIFT 12                    /* 4 KB pages: 2^12 bytes */
    #define PAGE_SIZE  (1UL << PAGE_SHIFT)   /* 4096 bytes */
    #define PAGE_MASK  (~(PAGE_SIZE-1))      /* clears the offset-within-page bits */
Page table definitions: include/asm-i386/pgtable.h and include/asm-i386/pgtable-2level.h
Page table lookup code: mm/memory.c
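The three page.h macros are typically used to split an address into a page number and an in-page offset, or to round it to a page boundary. The helpers below are hypothetical stand-alone sketches (the kernel has a similar `PAGE_ALIGN` macro); the macros are repeated so the block compiles on its own.

```c
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/* Start of the page containing addr. */
static unsigned long page_base(unsigned long addr) { return addr & PAGE_MASK; }

/* Offset of addr inside its page. */
static unsigned long page_offset_of(unsigned long addr) { return addr & ~PAGE_MASK; }

/* Page number (index of the page in the address space). */
static unsigned long page_number(unsigned long addr) { return addr >> PAGE_SHIFT; }

/* Round addr up to the next page boundary (unchanged if already aligned). */
static unsigned long page_align_up(unsigned long addr)
{
    return (addr + PAGE_SIZE - 1) & PAGE_MASK;
}
```

For 0x08049ABC this gives page base 0x08049000, offset 0xABC and page number 0x8049.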

Paging in Linux (4)
The linear address space is split into two parts:
- User space (0-3 GB) can be addressed in both modes.
- Kernel space (3-4 GB) can be accessed only in kernel mode.
PAGE_OFFSET is defined as 0xC0000000 (3 GB).
Kernel paging (4 MB pages):
- Kernel code and data are stored in a group of reserved page frames, which are never dynamically assigned or swapped to disk.
- The kernel maintains a set of page tables rooted at the master kernel Page Global Directory.
How does the kernel initialize its own page tables? swapper_pg_dir is initialized statically during kernel compilation.
- Phase 1: the kernel can address the first 8 MB of RAM either through linear addresses identical to the physical ones or through the 8 MB starting at 0xC0000000.
- Phase 2: only linear addresses starting at 0xC0000000 are translated, onto physical addresses starting at 0.
Where does paging start? In arch/i386/kernel/head.S.
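For the directly mapped part of RAM, converting between kernel linear addresses and physical addresses is a subtraction or addition of PAGE_OFFSET; the kernel's `__pa()` and `__va()` macros perform this conversion. The functions below are stand-alone sketches with hypothetical names, valid only for directly mapped low memory; `phase1_translate()` models the two provisional 8 MB mappings of phase 1.

```c
#include <stdint.h>

#define PAGE_OFFSET 0xC0000000u

/* Direct mapping (phase 2): physical address = kernel linear - PAGE_OFFSET. */
static uint32_t virt_to_phys_sketch(uint32_t kvaddr) { return kvaddr - PAGE_OFFSET; }
static uint32_t phys_to_virt_sketch(uint32_t paddr)  { return paddr + PAGE_OFFSET; }

/* Phase 1: the first 8 MB of RAM is reachable through two linear ranges,
 * [0, 8 MB) identity-mapped and [PAGE_OFFSET, PAGE_OFFSET + 8 MB).
 * Returns 0 and sets *pa, or -1 if la is not mapped. */
static int phase1_translate(uint32_t la, uint32_t *pa)
{
    const uint32_t eight_mb = 8u << 20;
    if (la < eight_mb) {
        *pa = la;
        return 0;
    }
    if (la >= PAGE_OFFSET && la - PAGE_OFFSET < eight_mb) {
        *pa = la - PAGE_OFFSET;
        return 0;
    }
    return -1;
}
```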

Physical Memory Management
Physical memory is divided into three zones: DMA, Normal and HighMem. Page frames are allocated from these zones.
- Each physical page frame is associated with a page descriptor, and all page descriptors are stored in the mem_map array.
- Requesting page frames: alloc_pages() allocates groups of contiguous page frames using the buddy system.
- If alloc_pages() cannot find a free page frame, it calls try_to_free_pages() to reclaim some; try_to_free_pages() selects pages to reclaim according to an LRU algorithm.
- Memory for small data structures is allocated by the slab allocator.
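The buddy system can be illustrated with a toy allocator. It is a sketch, not the kernel's code: it manages a hypothetical pool of 16 page frames and tracks free blocks in a boolean table instead of the kernel's per-zone free lists. Allocation splits a larger block, keeping the lower half; freeing merges a block with its buddy (the block whose frame number differs only in bit `order`) whenever that buddy is also free.

```c
#include <stdbool.h>

#define MAX_ORDER 4               /* largest block: 2^4 = 16 pages */
#define NR_PAGES  (1 << MAX_ORDER)

/* free_block[o][pfn]: the block of 2^o pages starting at pfn is free. */
static bool free_block[MAX_ORDER + 1][NR_PAGES];

static void buddy_init(void)
{
    free_block[MAX_ORDER][0] = true;  /* whole pool is one free block */
}

/* Allocate 2^order contiguous pages; returns the first pfn or -1. */
static int buddy_alloc(int order)
{
    for (int o = order; o <= MAX_ORDER; o++) {
        for (int pfn = 0; pfn < NR_PAGES; pfn += 1 << o) {
            if (!free_block[o][pfn])
                continue;
            free_block[o][pfn] = false;
            while (o > order) {       /* split, freeing the upper halves */
                o--;
                free_block[o][pfn + (1 << o)] = true;
            }
            return pfn;
        }
    }
    return -1;
}

/* Free a block of 2^order pages starting at pfn, coalescing buddies. */
static void buddy_free(int pfn, int order)
{
    while (order < MAX_ORDER) {
        int buddy = pfn ^ (1 << order);
        if (!free_block[order][buddy])
            break;                    /* buddy in use: stop merging */
        free_block[order][buddy] = false;
        pfn &= ~(1 << order);         /* merged block starts at lower pfn */
        order++;
    }
    free_block[order][pfn] = true;
}
```

After allocating one page, another page and a 4-page block, the pool is fragmented and a 16-page request fails; once all three are freed, the buddies coalesce and the whole pool can be allocated again.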

Process Address Space
The linear address space is split into two parts:
- User space (0-3 GB) changes with each context switch and is accessible in both modes.
- Kernel space (3-4 GB) remains constant and is accessible only in kernel mode.
Memory descriptor: mm_struct. One exists for each process and is shared among its threads. Kernel threads have no memory descriptor of their own; they borrow the one of the previously running process.
[Figure: process address space layout. Above PAGE_OFFSET (0xC0000000): kernel code and data. Below it, user code and data: file name, environment, arguments, stack, shared libraries, heap, data, code and header.]

Memory Regions
A process rarely uses its full address space. Each address space consists of several non-overlapping, page-aligned regions that are in use; each region contains pages with the same protection and purpose.
- The list of mapped regions of a process can be read from /proc/PID/maps.
- Regions are described by vm_area_struct.
- If a file is memory-mapped, its file pointer is available through vm_file.
- Related functions: do_mmap(), find_vma(), get_unmapped_area().

[Figure: process address space. The memory descriptor's mmap field points to a linked list of memory regions (each with start, end and next fields) that cover parts of the linear address space; mmap_cache points to the region found most recently.]
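Region lookup through the cache and the region list can be sketched as follows. The structures are simplified stand-ins for `mm_struct` and `vm_area_struct` that keep only the fields used here, and `find_vma_sketch()` is a hypothetical name. Like the kernel's `find_vma()`, it returns the first region whose end lies above the address, which may start above that address; the 2.6 kernel searches a red-black tree rather than walking the list.

```c
#include <stddef.h>

struct vma {
    unsigned long vm_start;   /* first address in the region */
    unsigned long vm_end;     /* first address after the region */
    struct vma *vm_next;      /* next region, sorted by address */
};

struct mm {
    struct vma *mmap;         /* list of regions */
    struct vma *mmap_cache;   /* last region found */
};

/* First region with vm_end > addr, or NULL if there is none. */
static struct vma *find_vma_sketch(struct mm *mm, unsigned long addr)
{
    struct vma *v = mm->mmap_cache;
    if (v && v->vm_start <= addr && addr < v->vm_end)
        return v;                          /* cache hit */
    for (v = mm->mmap; v; v = v->vm_next) {
        if (addr < v->vm_end) {
            mm->mmap_cache = v;            /* remember for next lookup */
            return v;
        }
    }
    return NULL;
}
```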

Page Faulting
Demand fetching: a page is fetched from swap space only when the hardware raises a page-fault exception, which the OS traps before allocating a page. A number of pages following the faulting page are prefetched.
There are two types of page fault:
- Major: the page has to be read from disk, which is expensive.
- Minor: no disk access is needed, e.g. the page is still in the swap cache, or the fault is a protection fault.
The architecture-specific function do_page_fault() decides what type of fault occurred and how it can be handled. If it is a valid fault in a valid memory region, it calls the architecture-independent function handle_mm_fault(), which allocates the required page table entries and calls handle_pte_fault().
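Minor faults caused by demand paging can be observed from user space. This Linux-specific sketch uses the standard `mmap()` and `getrusage()` calls (`ru_minflt` counts minor faults and `ru_majflt` major ones); the helper name is hypothetical, and the exact count can vary, for example when transparent huge pages are in use.

```c
#define _DEFAULT_SOURCE          /* exposes MAP_ANONYMOUS in strict modes */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Touch npages fresh anonymous pages and return the number of minor page
 * faults taken meanwhile, or -1 on error. The pages are not backed by RAM
 * until first written, so each first write is a demand-paging fault. */
static long minor_faults_for_touching(size_t npages)
{
    long psize = sysconf(_SC_PAGESIZE);
    size_t len = npages * (size_t)psize;
    struct rusage before, after;

    volatile char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    if (getrusage(RUSAGE_SELF, &before) != 0) {
        munmap((void *)p, len);
        return -1;
    }
    for (size_t i = 0; i < len; i += (size_t)psize)
        p[i] = 1;                          /* first write to each page */
    if (getrusage(RUSAGE_SELF, &after) != 0) {
        munmap((void *)p, len);
        return -1;
    }
    munmap((void *)p, len);
    return after.ru_minflt - before.ru_minflt;
}
```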

[Figure: do_page_fault() flow diagram]

handle_mm_fault() Call Graph
handle_mm_fault() allocates the required page table entries if they do not exist, then calls handle_pte_fault(), which calls the handler that matches the entry's properties:
- do_no_page(): first-time allocation of the page.
- do_swap_page(): the page has been swapped out to disk.
- do_wp_page(): Copy on Write (COW) page.
- do_anonymous_page(): first access to an anonymous page.
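The dispatch in the call graph can be summarized as a decision function. A hypothetical sketch: the `struct fault_info` fields stand in for the page table entry and region properties the kernel inspects. In 2.6-era kernels, `do_no_page()` itself hands regions that are not file-backed to `do_anonymous_page()`; this sketch flattens that step and ignores other cases such as nonlinear file mappings.

```c
#include <stdbool.h>

/* Hypothetical summary of the state examined when a fault is handled. */
struct fault_info {
    bool pte_present;   /* page is in RAM */
    bool pte_none;      /* entry is empty: page never touched */
    bool pte_writable;  /* R/W bit of the entry */
    bool write_access;  /* the faulting access was a write */
    bool file_backed;   /* region maps a file */
};

enum handler { DO_NO_PAGE, DO_ANONYMOUS_PAGE, DO_SWAP_PAGE, DO_WP_PAGE, NO_ACTION };

static enum handler pick_handler(const struct fault_info *f)
{
    if (!f->pte_present) {
        if (f->pte_none)                 /* first-time access */
            return f->file_backed ? DO_NO_PAGE : DO_ANONYMOUS_PAGE;
        return DO_SWAP_PAGE;             /* entry records a swap location */
    }
    if (f->write_access && !f->pte_writable)
        return DO_WP_PAGE;               /* copy-on-write */
    return NO_ACTION;                    /* e.g. only accessed/dirty bits change */
}
```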

Copy on Write (COW)
During fork(), the kernel duplicates the parent's address space for the child. A full copy would require:
- allocating page frames for the child's page tables;
- allocating page frames for the child's pages;
- copying the parent's pages into the child's pages.
Linux uses a more efficient copy-on-write approach:
- The pages are shared between parent and child, and both processes' page table entries for them are write-protected, so the shared pages cannot be modified.
- Whenever either process tries to write, a write fault occurs. The kernel then copies the page into a new page frame and marks it writable; the original page frame remains write-protected.
- When the other process later tries to write, the kernel checks whether it is now the only owner; if so, the page is made writable without copying.
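The copy-on-write logic can be simulated with reference counts. This is a toy model with hypothetical names: "frames" are tiny 16-byte arrays, a `struct pte` holds only a frame number and a writable flag, and `cow_share()` plays the part of fork() for a single page.

```c
#include <stdbool.h>
#include <string.h>

#define NFRAMES  8
#define FRAME_SZ 16               /* tiny hypothetical "pages" */

static char frames[NFRAMES][FRAME_SZ];
static int  refcount[NFRAMES];    /* number of entries mapping each frame */

struct pte { int frame; bool writable; };

static int alloc_frame(void)
{
    for (int i = 0; i < NFRAMES; i++)
        if (refcount[i] == 0) { refcount[i] = 1; return i; }
    return -1;
}

/* fork(): share the frame and write-protect both entries. */
static void cow_share(struct pte *parent, struct pte *child)
{
    child->frame = parent->frame;
    refcount[parent->frame]++;
    parent->writable = child->writable = false;
}

/* Write fault on a write-protected entry. */
static bool cow_write_fault(struct pte *pte)
{
    if (refcount[pte->frame] == 1) {  /* sole owner: no copy needed */
        pte->writable = true;
        return true;
    }
    int nf = alloc_frame();
    if (nf < 0)
        return false;                 /* out of frames */
    memcpy(frames[nf], frames[pte->frame], FRAME_SZ);
    refcount[pte->frame]--;           /* old frame stays write-protected */
    pte->frame = nf;
    pte->writable = true;
    return true;
}

static bool write_byte(struct pte *pte, int off, char val)
{
    if (!pte->writable && !cow_write_fault(pte))
        return false;
    frames[pte->frame][off] = val;
    return true;
}
```

After `cow_share()`, the first write by either side copies the frame; the other side, now the sole owner, just has its entry made writable on its next write.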

What's Different in 2.6
- NUMA: the big change is Linux's new support for NUMA servers, high-end systems with multiple processors, each with a separate memory pool directly connected to it.
- PAE: support for Intel's Physical Address Extension allows access to up to 64 GB of RAM in paged mode. Linux can now run applications that access large blocks of memory; for example, bigger databases are now supported on Linux.
- Reverse mapping: multiple virtual pages (pages shared by different processes) might point to the same physical page. Reverse mapping lets the kernel find every page table entry that maps a given physical page, which is useful when the kernel wants to free that page.
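A reverse map answers the question "which page table entries point at this frame?". The sketch below keeps, for one frame, a small hypothetical array of (process, virtual page) pairs; the 2.6 kernel's actual reverse-mapping structures are different. Freeing the frame means visiting each mapping and clearing its entry, modeled here by a callback.

```c
#define MAX_MAPPINGS 4

/* One (process, virtual page) pair that maps the frame. */
struct mapping { int pid; unsigned long vpage; };

/* Hypothetical per-frame reverse map. */
struct frame_rmap {
    struct mapping maps[MAX_MAPPINGS];
    int count;
};

/* Record a new mapping of the frame; -1 if the table is full. */
static int rmap_add(struct frame_rmap *r, int pid, unsigned long vpage)
{
    if (r->count == MAX_MAPPINGS)
        return -1;
    r->maps[r->count].pid = pid;
    r->maps[r->count].vpage = vpage;
    r->count++;
    return 0;
}

/* Unmap the frame everywhere: clear_pte stands in for clearing the page
 * table entry in each process. Returns the number of entries cleared. */
static int rmap_unmap_all(struct frame_rmap *r,
                          void (*clear_pte)(int pid, unsigned long vpage))
{
    int n = r->count;
    for (int i = 0; i < n; i++)
        clear_pte(r->maps[i].pid, r->maps[i].vpage);
    r->count = 0;
    return n;
}
```

Without such a map, the kernel would have to scan the page tables of every process to find the entries pointing at a frame it wants to free.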

References
- IA-32 Intel® Architecture Software Developer's Manual, Volume 3: System Programming Guide (Document 253668), Chapters 3 and 4.
- Bovet, D., and Cesati, M. Understanding the Linux Kernel. O'Reilly, 2001 (Chapters 2, 7, 8 and 16).
- Virtual Memory Management for the Linux 2.4 Kernel: description and code documentation. http://home.earthlink.net/~jknapka/linux-mm/vmoutline.html
- Deitel and Deitel. Operating Systems. Prentice Hall, 2004.
- Pranevich, J. The Wonderful World of Linux 2.6.