Linux Virtual Memory for Intel Processor


1 Linux Virtual Memory for Intel Processor
Debzani Deb

2 Overview
Overview of virtual memory.
Support available in the Intel architecture for virtual memory.
How Linux uses that hardware support to implement virtual memory.
Process address space.
Page fault handler.
Additional improvements in kernel 2.6.
References.

3 Introduction
In a “virtual memory” environment, a large logical address space is simulated with a small amount of physical memory (RAM) and some disk storage (swap space). The logical addresses the processor issues are converted to physical addresses during program execution. The implementation requires extensive hardware assistance and a lot of complex OS code, and it adds run-time overhead.
Virtual memory can be implemented with:
Paging: fixed-size memory blocks.
Segmentation: variable-size memory blocks.
Fetch technique: demand paging.
Replacement technique: the Least Recently Used (LRU) algorithm, or in practice an approximation of it.

4 Why Virtual Memory?
A process may be too big for physical memory (RAM), and there may be more active processes than physical memory can hold.
Solution: “virtual memory”, where a large virtual address space (4 GB on IA-32) is simulated for each process with a small amount of physical memory (RAM) and some disk storage (swap space).
[Figure: the OS (8 MB), Process 1 (50 MB), Process 2 (50 MB) and Process 3 (30 MB) competing for RAM.]

5 Virtual Memory
[Figure: pages of Process 1 and Process 2 moving between RAM and swap space as Process 1 runs and then sleeps, and Process 2 is scheduled, runs and faults; the OS occupies 8 MB of RAM.]
The system works because the principle of locality holds.
Thrashing: the system swaps pages in and out all the time, and no real work is done.

6 IA-32 Virtual Memory
The IA-32 architecture supports either pure segmentation or segmentation combined with paging.
Logical address: a segment selector (16 bit) and an offset (32 bit).
Linear address (LA), also called virtual address (VA): the base address of the segment plus the offset. This 32-bit address can address 4 GB of memory.
Physical address (PA): a 32-bit address in RAM.
[Figure: logical address → segmentation unit → linear address → paging unit → physical address.]

7 IA-32 Virtual Memory

8 IA-32 Segmentation (1)
Segment Registers (6)
Segment registers hold segment selectors so that they can be retrieved quickly.
CS (code segment register) points to a segment containing program instructions. It also includes a 2-bit Current Privilege Level (CPL) field: 0 means kernel mode and 3 means user mode.
DS (data segment register) points to a segment containing static and external data.
SS (stack segment register) points to a segment containing the current program stack.
ES, FS and GS are general-purpose segment registers that may refer to arbitrary data segments.

9 IA-32 Segmentation (2)
Segment Descriptors (8 bytes)
Each segment is described by a unique segment descriptor, stored in the Global Descriptor Table (GDT). A descriptor contains:
a 32-bit base address of the segment,
a 20-bit limit,
a 4-bit Type field that denotes the segment type and access rights,
a DPL (Descriptor Privilege Level) field: 0 means use is restricted to kernel mode, 3 means the segment can be used in both modes.

10 IA-32 Protection
Intel uses 4 privilege levels, 0 to 3, with 0 being the most privileged.
The privilege level of the executing program is determined by the privilege level of the code segment currently executing.
CPL (Current Privilege Level): bits 0 and 1 of the CS (code segment) register. The processor changes the CPL when program control is transferred to a code segment with a different privilege level.
DPL (Descriptor Privilege Level): bits in the segment descriptor. When the currently executing code attempts to access a segment, the segment's DPL is compared with the CPL in CS.
Code running at a less privileged level (numerically higher CPL) cannot access data segments with a more privileged (numerically lower) DPL, while code at level 0 can access segments at every level.

11 Segmentation in Linux
There is no mode bit to disable segmentation.
Linux prefers paging to segmentation because of its simplicity and portability.
Linux uses 4 main segments, and all processes use the same logical addresses and segment descriptors.
The GDT is defined in /arch/i386/kernel/head.S.
Each time the CPL in CS changes, DS and SS are changed correspondingly; SS uses the same data segment as DS.
Segments used by Linux (type; DPL; accessed by):
Kernel Code: Code, Read, Execute; DPL 0; kernel mode only
Kernel Data: Data, Read, Write; DPL 0; kernel mode only
User Code: Code, Read, Execute; DPL 3; both modes
User Data: Data, Read, Write; DPL 3; both modes

12 Protection in Linux
The segments overlap in the linear address space (/arch/i386/kernel/head.S). Each one covers the whole 4 GB, so access is effectively allowed to the entire virtual address space through any of the above segments.
Every process's address space has two parts: 0 to 3 GB is the user part and 3 GB to 4 GB is the kernel part. The boundary is set by PAGE_OFFSET = 0xC0000000.
A process in user mode (CPL = 3) can access only addresses below 3 GB (and only segments with DPL = 3).
A process in kernel mode (CPL = 0, e.g. after a system call) can access both parts, using segments with DPL 0 or 3.
Because the segments overlap, the user/kernel boundary and any distinction between code and data are enforced at the page level, not at the segment level, through the R/W and U/S bits of each page.

13 IA-32 Paging
RAM is partitioned into fixed-size page frames, and the linear address space is divided into pages of the same size.
The processor uses the information in page directories and page tables (stored in RAM) to map linear addresses to physical addresses and to generate page fault exceptions.
Translation Lookaside Buffers (TLBs) cache the most recently used page directory and page table entries to reduce access time.
Intel supports 4 KB, 2 MB and 4 MB page sizes.
Paging is controlled by three flags in the processor's control registers, which the OS sets during initialization:
PG (paging): available in all IA-32 processors; enables paging.
PSE (page size extensions): introduced in the Pentium processor; permits large pages (4 MB, or 2 MB when PAE is set).
PAE (physical address extension): introduced in the Pentium Pro processor; extends physical addresses to 36 bits (64 GB) and supports page sizes of 4 KB and 2 MB.

14 Page Tables and Directories
With 4 KB pages, a 32-bit linear address is divided into 3 fields:
Page directory: the most significant 10 bits (1024 entries).
Page table: the intermediate 10 bits (1024 entries).
Offset: the least significant 12 bits (each page is 4 KB).
With 4 MB pages, the most significant 10 bits select the page directory entry and the remaining 22 bits are the offset within the page; page tables are not used. (With PAE, a 2 MB page uses a 21-bit offset.)

15 Page Directory and Page Table Entries
With 32-bit addresses and 4 KB pages, each entry contains a 20-bit base address (bits 12 through 31) and the following flags:
Present: when set, the page is in RAM.
Read/Write: when set, the page can be both read and written.
User/Supervisor: when set, the page is accessible at user privilege level (as well as by the kernel); when clear, only the supervisor (kernel) can access it.
Accessed: set each time the paging unit accesses the entry.
PCD (page-level cache disable) and PWT (page-level write-through): control caching of the page.
Dirty: applies to page table entries only; set when the page is written to.
Global: introduced in the Pentium Pro; applies to page table entries only. When set, it marks a global page and prevents the page from being flushed from the TLB when a context switch occurs.
Page Size: applies to page directory entries only. When 1, the entry refers directly to a 2 MB/4 MB page frame; when 0, it points to a page table of 4 KB pages.
These flags are checked by the hardware to see whether the requested kind of access can be performed.

16 Paging in Linux (1)
Linux uses 3-level paging to adapt to 64-bit architectures: the Page Global Directory (PGD), the Page Middle Directory (PMD) and the page table. A linear address is divided into four parts: three table offsets and a page offset.
What happens with IA-32, which uses only two-level page tables? Linux folds the middle level, so the PMD entry is the PGD entry itself. On IA-32 there are 1024 entries in the PGD, one entry in the PMD and 1024 entries in each page table.
Each process has its own PGD. During a context switch, the PGD base address of the process executing next is loaded into CR3, and the TLB is flushed (except for global pages).

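17 Paging in Linux (2)
Linux can use PAE, but does not use PSE.
Linux also uses the Page Size (PS) flag of a PGD entry to choose a different page size for that entry, mixing 4 MB and 4 KB pages. The kernel uses large pages (4 MB) and one-level translation to reduce TLB entries and memory, while applications use 4 KB pages.
PAE = 0, PS = 0: 4 KB pages, 32-bit physical addresses
PAE = 0, PS = 1: 4 MB pages, 32-bit physical addresses
PAE = 1, PS = 0: 4 KB pages, 36-bit physical addresses
PAE = 1, PS = 1: 2 MB pages, 36-bit physical addresses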

18 Paging in Linux (3)
include/asm-i386/page.h:
#define PAGE_SHIFT 12
#define PAGE_SIZE (1UL << PAGE_SHIFT)
#define PAGE_MASK (~(PAGE_SIZE-1))
Page table definitions: include/asm-i386/pgtable.h and include/asm-i386/pgtable-2level.h
Page table lookup code: mm/memory.c

19 Paging in Linux (4)
The linear address space is split into two parts. User space (0 to 3 GB) can be addressed in both modes, while kernel space (3 GB to 4 GB) can be accessed only in kernel mode. PAGE_OFFSET is defined as 0xC0000000 (3 GB).
Kernel paging (4 MB pages): kernel code and data are stored in a group of reserved page frames, which are never dynamically assigned or swapped to disk. The kernel maintains a set of page tables rooted at the master kernel Page Global Directory.
How does the kernel initialize its own page tables? swapper_pg_dir is initialized statically, during kernel compilation.
Phase 1: the kernel can address the first 8 MB of RAM either through linear addresses identical to the physical ones or through the 8 MB starting at 0xC0000000.
Phase 2: only linear addresses starting at 0xC0000000 are mapped, onto physical addresses starting at 0.
Where does paging start? /arch/i386/kernel/head.S

20 Physical Memory Management
Physical memory is divided into three zones: DMA, Normal and HighMem. Page frames are assigned from these zones.
Each physical page frame is associated with a page descriptor, and all page descriptors are stored in the mem_map array.
Requesting page frames: alloc_pages() allocates groups of contiguous page frames using the buddy system. If alloc_pages() cannot find free page frames, it calls try_to_free_pages() to reclaim some. try_to_free_pages() chooses the pages to reclaim with an approximation of the LRU algorithm.
Memory for small data structures is allocated by the slab allocator.

21 Process Address Space
The linear address space is split into two parts. User space (0 to 3 GB) changes with each context switch and is accessible in both modes. Kernel space (3 GB to 4 GB) remains constant and is accessible only in kernel mode.
Memory descriptor: mm_struct. One structure exists for each process and is shared among its threads. Kernel threads have no memory descriptor of their own; they borrow the one of the previously running process.
[Figure: address space layout. Above PAGE_OFFSET (0xC0000000): kernel code and data. Below it, user code and data, including the file name, environment and arguments, the stack, shared libraries, heap, data, code and header.]

22 Memory Regions
The full address space is rarely used. Each address space consists of several non-overlapping, page-aligned regions that are in use, and each region contains pages with the same protection and purpose.
The mapped regions of a process are listed in /proc/PID/maps.
Regions are described by vm_area_struct. If a file is memory-mapped, the file pointer is available through vm_file.
Related functions: do_mmap(), find_vma(), get_unmapped_area().

23 Process Address Space
[Figure: the memory descriptor's mmap field points to a linked list of memory regions, each with start, end and next fields covering part of the linear address space; mmap_cache points to the most recently used region.]

24 Page Faulting
Demand fetching: a page is fetched from swap space only when the hardware raises a page fault exception, which the OS traps. The OS then allocates a page frame and reads the page in. A number of pages after the faulting page are prefetched.
Two types of page fault:
Major: the page has to be read from disk, which is expensive.
Minor: the page is already in the swap cache, or the fault is a protection fault.
The architecture-specific function do_page_fault() decides what type of fault occurred and how it can be handled. If it is a valid page fault in a valid memory region, it calls the architecture-independent function handle_mm_fault(), which allocates the required page table entries and calls handle_pte_fault().

25 do_page_fault() Flow Diagram

26 handle_mm_fault() Call Graph
handle_mm_fault() allocates the required page table entries if they don't exist, then calls handle_pte_fault().
handle_pte_fault() calls the handler that matches the state of the entry:
do_no_page: first-time allocation
do_swap_page: the page was swapped out to disk
do_wp_page: copy-on-write (COW) page
do_anonymous_page: anonymous access

27 Copy on Write (COW)
During fork(), the kernel duplicates the parent's address space for the child. Done naively, this requires:
allocating page frames for the page tables of the child process,
allocating page frames for the pages of the child process,
copying the pages of the parent process to the pages of the child process.
Linux uses a more efficient copy-on-write approach. The pages are shared between parent and child, and their page table entries are write-protected so the pages can't be modified. Whenever either process tries to write, a write fault occurs. The kernel then duplicates the page into a new page frame and marks the copy writable, while the original page frame remains write-protected. When the other process later tries to write to it, the kernel checks whether that process is now the only owner; if so, the page simply becomes writable.

28 What's Different in 2.6
The big change is Linux's new support for NUMA servers: high-end systems with multiple processors, each with a separate memory pool directly connected to it.
Support for Intel's PAE (Physical Address Extension) allows access to up to 64 GB of RAM in paged mode, so Linux can run applications that access large blocks of memory. For example, bigger databases are now supported on Linux.
Reverse mapping: multiple virtual pages (for example, pages shared by different processes) may point to the same physical page. Reverse mapping lets the kernel find all the page table entries that map a given physical page, which is useful when the kernel wants to free that particular page.

29 References
IA-32 Intel® Architecture Software Developer's Manual, Volume 3: System Programming Guide, chapters 3 and 4.
Bovet, D., and Cesati, M. Understanding the Linux Kernel. O'Reilly (chapters 2, 7, 8 and 16).
Virtual memory management for the Linux 2.4 kernel: description and code documentation.
Deitel and Deitel. Operating Systems. Prentice Hall, 2004.
Pranevich, J. The Wonderful World of Linux 2.6.

